Restoration of fragmentary Babylonian texts using recurrent neural networks

Edited by Emilie Pagé-Perron, University of California, Los Angeles, CA, and accepted by Editorial Board Member Elsa M. Redmond July 7, 2020 (received for review February 27, 2020)
September 1, 2020
117 (37) 22743-22751

Significance

The documentary sources for the political, economic, and social history of ancient Mesopotamia constitute hundreds of thousands of clay tablets inscribed in the cuneiform script. Most tablets are damaged, leaving gaps in the texts written on them, and the missing portions must be restored by experts. This paper uses available digitized texts for training advanced machine-learning algorithms to restore daily economic and administrative documents from the Persian empire (sixth to fourth centuries BCE). As the amount of digitized texts grows, the model can be trained to restore damaged texts belonging to other genres, such as scientific or literary texts. Therefore, this is a first step for a large-scale reconstruction of a lost ancient heritage.

Abstract

The main sources of information regarding ancient Mesopotamian history and culture are clay cuneiform tablets. Many of these tablets are damaged, leading to missing information. Currently, the missing text is manually reconstructed by experts. We investigate the possibility of assisting scholars by modeling the language using recurrent neural networks and automatically completing the breaks in ancient Akkadian texts from Achaemenid-period Babylonia.
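The idea of completing a break with a language model can be illustrated as ranking candidate fillers by how probable the model finds each completed line. The sketch below is not the authors' method: it swaps the recurrent network for a much simpler character bigram model with add-one smoothing so that it runs on the standard library alone. The function names, the `[...]` gap marker and the toy training lines are hypothetical and are not taken from the Atrahasis code or datasets.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for the paper's RNN: a character bigram language model.
BOS, EOS = "^", "$"  # start/end-of-line markers (arbitrary choice)


def train_bigram(lines):
    """Count character bigrams over padded lines; return counts and vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for line in lines:
        padded = BOS + line + EOS
        vocab.update(padded)
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def log_prob(text, counts, vocab, alpha=1.0):
    """Add-alpha smoothed log-probability of one line under the bigram model."""
    padded = BOS + text + EOS
    v = len(vocab)
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev, Counter())  # unseen context -> empty row
        total += math.log((row[nxt] + alpha) / (sum(row.values()) + alpha * v))
    return total


def rank_restorations(damaged, candidates, counts, vocab):
    """Fill the single '[...]' gap with each candidate, best-scoring first."""
    prefix, suffix = damaged.split("[...]")
    scored = [(log_prob(prefix + c + suffix, counts, vocab), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]


# Invented toy lines (not from the paper's datasets) in a hyphenated transliteration style.
corpus = ["a-na be-li-ia", "a-na a-bi-ia", "a-na be-li-ia ar-da-ka"]
counts, vocab = train_bigram(corpus)
print(rank_restorations("a-na [...]-li-ia", ["be", "xu", "qo"], counts, vocab))
```

Here the candidate "be" ranks first because its character transitions were seen in training. A recurrent network would replace the bigram counts with a hidden state conditioned on the whole preceding context, and could generate candidates rather than only score a given list; the ranking step itself could stay the same.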

Data Availability

All study data are included in the article, SI Appendix, and Datasets S1–S6. Atrahasis can be accessed on the Babylonian Engine Website (https://babylonian.herokuapp.com/). Our source code is available at GitHub (https://github.com/DHALab/Atrahasis).

Acknowledgments

This research was supported by the Ministry of Science & Technology, Israel, Grant 89540 for the project “Human-Computer Collaboration for Studying Life and Environment in Babylonian Exile” of Sh.G. and Amos Azaria, as part of Sh.G.’s Babylonian Engine initiative. We thank Eugene McGarry for language editing, Avital Romach for her assistance with the final proofs of this paper, Klaus Wagensonner for tablet photographs, and Moshe Shtekel for designing the web tool for Atrahasis. We especially thank the anonymous reviewers for their detailed remarks and corrections.

Supporting Information

Appendix (PDF)
Dataset_S01 (TXT)
Dataset_S02 (TXT)
Dataset_S03 (TXT)
Dataset_S04 (TXT)
Dataset_S05 (CSV)
Dataset_S06 (CSV)

References

1
N. Veldhuis, “Cuneiform: Changes and developments” in The Shape of Script: How and Why Writing Systems Change, S. D. Houston, Ed. (School for Advanced Research Press, 2012), pp. 3–24.
2
J. Huehnergard, C. Woods, “Akkadian and Eblaite” in The Cambridge Encyclopedia of the World’s Ancient Languages, R. D. Woodard, Ed. (Cambridge University Press, 2004), pp. 218–280.
3
M. P. Streck, “Akkadian in general” in The Semitic Languages: An International Handbook, S. Weninger, G. Khan, M. P. Streck, J. C. E. Watson, Eds. (De Gruyter, 2011), pp. 330–339.
4
M. P. Streck, “Großes Fach Altorientalistik. Der Umfang des keilschriftlichen Textkorpus” in Mitteilungen der Deutschen Orient-Gesellschaft (Deutsche Orient-Gesellschaft, 2010), vol. 142, pp. 35–58.
5
C. Gütschow, Methoden zur Restaurierung von ungebrannten und gebrannten Keilschrifttafeln (PeWe-Verlag, 2012).
6
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016).
7
A. Radford et al., Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).
8
J. Cohen et al., “iClay: Digitizing cuneiform” in VAST 2004: The 5th International Symposium on Virtual Reality, Archaeology and Cultural Heritage, Y. Chrysanthou, K. Cain, N. Silberman, F. Niccolucci, Eds. (The Eurographics Association, 2004), pp. 135–143.
9
H. Mara, S. Krömker, S. Jakob, B. Breuckmann, “Gigamesh and Gilgamesh – 3D multiscale integral invariant cuneiform character extraction” in VAST: International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage, A. Artusi, M. Joly, G. Lucet, D. Pitzalis, A. Ribes, Eds. (The Eurographics Association, 2010), pp. 131–138.
10
G. Earl et al., “Reflectance transformation imaging systems for ancient documentary artefacts” in Electronic Visualisation and the Arts (EVA 2011), S. Dunn, J. P. Bowen, K. C. Ng, Eds. (BCS: The Chartered Institute for IT, 2011), pp. 147–154.
11
H. Hameeuw, G. Willems, New visualization techniques for cuneiform texts and sealings. Akkadica 132, 163–178 (2011).
12
M. Pauzi, M. Asyraf, “Digital preservation of Malaysian historical artefact using 3D scanner: A case study of Mah Meri mask,” PhD dissertation, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia (2017).
13
D. Fisseler, F. Weichert, G. G. W. Müller, M. Cammarosano, “Extending philological research with methods of 3D computer graphics applied to analysis of cultural heritage” in Eurographics Workshop on Graphics and Cultural Heritage, R. Klein, P. Santos, Eds. (The Eurographics Association, 2014), pp. 165–172.
14
B. Bogacz, N. Gertz, H. Mara, “Character retrieval of vectorized cuneiform script” in 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (IEEE Computer Society, 2015), pp. 326–330.
15
B. Bogacz, M. Klingmann, H. Mara, “Automating transliteration of cuneiform from parallel lines with sparse data” in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (IEEE Computer Society, 2017), vol. 1, pp. 615–620.
16
B. Bogacz, H. Mara, “Automatable annotations – Image processing and machine learning for script in 3D and 2D with Gigamesh” in Kodikologie und Paläographie im Digitalen Zeitalter 4 – Codicology and Palaeography in the Digital Age 4, H. Busch, F. Fischer, P. Sahle, Eds. (Books on Demand, 2017), pp. 137–149.
17
M. P. Streck, “Babylonian and Assyrian” in The Semitic Languages: An International Handbook, S. Weninger, G. Khan, M. P. Streck, J. C. E. Watson, Eds. (De Gruyter, 2011), pp. 359–395.
18
J. Hackl, “Zur Sprachsituation im Babylonien des ersten Jahrtausends v.Chr. Ein Beitrag zur Sprachgeschichte des jüngeren Akkadischen” in Mehrsprachigkeit vom Alten Orient bis zum Esperanto, S. Fink, M. Lang, M. Schretter, Eds. (Zaphon, 2018), pp. 209–238.
19
M. Jursa, “Accounting in Neo-Babylonian institutional archives: Structure, usage, and implications” in Creating Economic Order: Record-keeping, Standardization, and Development of Accounting in the Ancient Near East, M. Hudson, C. Wunsch, Eds. (CDL Press, 2004), pp. 145–198.
20
M. Jursa, Neo-Babylonian Legal and Administrative Documents. Typology, Contents and Archives (Ugarit-Verlag, 2005).
21
M. Jursa, Aspects of the Economic History of Babylonia in the First Millennium BCE (Ugarit-Verlag, 2010).
22
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
23
A. A. Akın, M. D. Akın, Zemberek, an open source NLP framework for Turkic languages. Structure 10, 1–5 (2007).
24
N. Gupta, P. Mathur, Spell checking techniques in NLP: A survey. Int. J. Adv. Res. Comput. Sci. Software Eng. 2, 217–221 (2012).
25
B. Kaur, Review on error detection and error correction techniques in NLP. Int. J. Adv. Res. Comput. Sci. Software Eng. 4, 851–853 (2014).
26
S. Singh, S. Singh, “Review of real-word error detection and correction methods in text documents” in 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA) (IEEE Computer Society, 2018), pp. 1076–1081.
27
R. Altarawneh, Spelling detection errors techniques in NLP: A survey. Int. J. Comput. Appl. 172, 1–5 (2017).
28
O. Streiter, E. W. De Luca, “Example-based NLP for minority languages: Tasks, resources and tools” in Proceedings of the Workshop “Traitement Automatique Des Langues Minoritaires et des Petites Langues”, 10e Conference TALN, O. Streiter, Ed. (Batz-sur-Mer, France, 2003), pp. 233–242.
29
S. Nirenburg, Language Engineering for Lesser-Studied Languages (IOS Press, 2009).
30
V. B. Juloux, A. R. Gansell, A. Di Ludovico, CyberResearch on the Ancient Near East and Neighboring Regions: Case Studies on Archaeological Data, Objects, Texts, and Digital Archiving (Brill, 2018).
31
S. P. Singh et al., “Frequency based spell checking and rule based grammar checking” in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT, 2016), pp. 4435–4439.
32
A. Sahala, M. Silfverberg, A. Arppe, K. Lindén, “BabyFST: Towards a finite-state based computational model of ancient Babylonian” in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (European Language Resources Association [ELRA], 2020), pp. 3886–3894.
33
P. Gakis, C. Panagiotakopoulos, K. Sgarbas, C. Tsalidis, Design and implementation of an electronic lexicon for modern Greek. Lit. Ling. Comput. 27, 155–169 (2012).
34
N. Suguna, K. G. Thanushkodi, Predicting missing attribute values using k-means clustering. J. Comput. Sci. 7, 216–224 (2011).
35
T. Homburg, C. Chiarcos, “Word segmentation for Akkadian cuneiform” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), N. Calzolari et al., Eds. (European Language Resources Association [ELRA], 2016), pp. 4067–4074.
36
M. Sukhareva, I. Khait, E. Pagé-Perron, C. Chiarcos, “Machine translation and automated analysis of the Sumerian language” in Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Association for Computational Linguistics Anthology, B. Alex et al., Eds. (Association for Computational Linguistics, 2017), pp. 10–16.
37
C. Chiarcos et al., Annotating a low-resource language with LLOD technology: Sumerian morphology and syntax. Information 9, 290 (2018).
38
A. Z. Aktaş, B. Yesiltepe, T. Aşuroğlu, Computerized Hittite cuneiform sign recognition and knowledge-based system application examples. Eur. Sci. J. 15, 32–54 (2019).
39
A. Sahala, M. Silfverberg, A. Arppe, K. Lindén, “Automated phonological transcription of Akkadian cuneiform text” in Proceedings of The 12th Language Resources and Evaluation Conference (LREC 2020) (European Language Resources Association [ELRA], 2020), pp. 3528–3534.
40
P. Majumder, M. Mitra, B. B. Chaudhuri, “N-gram: A language independent approach to IR and NLP” in International Conference on Universal Knowledge and Language (2002).
41
R. P. N. Rao et al., A Markov model of the Indus script. Proc. Natl. Acad. Sci. U.S.A. 106, 13685–13690 (2009).
42
N. Yadav et al., Statistical analysis of the Indus script using n-grams. PLOS One 5, e9506 (2010).
43
G. Marra, A. Zugarini, S. Melacci, M. Maggini, “An unsupervised character-aware neural approach to word and context representation learning” in International Conference on Artificial Neural Networks, V. Kůrková, Y. Manolopoulos, B. Hammer, L. Iliadis, I. Maglogiannis, Eds. (Springer, 2018), pp. 126–136.
44
Y. Kim, Y. Jernite, D. Sontag, A. M. Rush, “Character-aware neural language models” in Thirtieth AAAI Conference on Artificial Intelligence, D. Schuurmans, M. Wellman, Eds. (AAAI Press, 2016), pp. 2741–2749.
45
Z. Zhang, Y. Huang, P. Zhu, H. Zhao, “Effective character-augmented word embedding for machine reading comprehension” in Natural Language Processing and Chinese Computing, M. Zhang, V. Ng, D. Zhao, S. Li, H. Zan, Eds. (Springer International Publishing, Cham, 2018), pp. 27–39.
46
Y. Assael, T. Sommerschield, J. Prag, “Restoring ancient text using deep learning: A case study on Greek epigraphy” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), S. Padó, R. Huang, Eds. (Association for Computational Linguistics, 2019), pp. 6368–6375.
47
S. Tyndall, “Toward automatically assembling Hittite-language cuneiform tablet fragments into larger texts” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, H. Li, C.-Y. Lin, M. Osborne, G. G. Lee, J. C. Park, Eds. (Association for Computational Linguistics, 2012), pp. 243–247.
48
T. Collins et al., “Computer-assisted reconstruction of virtual fragmented cuneiform tablets” in 2014 International Conference on Virtual Systems & Multimedia (VSMM), H. Thwaites, S. Kenderdine, J. Shaw, Eds. (IEEE Computer Society, 2014), pp. 70–77.
49
T. Collins, S. Woolley, E. Gehlken, E. Ch’ng, “Computational aspects of model acquisition and join geometry for the virtual reconstruction of the Atrahasis cuneiform tablet” in 2017 23rd International Conference on Virtual System & Multimedia (VSMM), L. Goodman, A. Addison, Eds. (IEEE Computer Society, 2017), pp. 1–6.
50
M. Jursa, Das Archiv des Bēl-rēmanni (NINO, 1999).
51
O. Pedersén, Archive und Bibliotheken in Babylon: Die Tontafeln der Grabung Robert Koldeweys 1899-1917 (SDV Saarländische Druckerei und Verlag, 2005).
52
C. Waerzeggers, “The network of resistance: Archives and political action in Babylonia before 484 BCE” in Xerxes and Babylonia: The Cuneiform Evidence, C. Waerzeggers, M. Seire, Eds. (Peeters, 2018), pp. 89–134.
53
L. E. Pearce, C. Wunsch, Documents of Judean Exiles and West Semites in Babylonia in the Collection of David Sofer (CDL Press, 2014).
54
M. W. Stolper, Entrepreneurs and Empire: The Murašû Archive, the Murašû Firm, and Persian Rule in Babylonia (NINO, 1985).
55
M. W. Stolper, “Kasr texts: Excavated–but not in Berlin” in Studies Presented to Robert D. Biggs, M. T. Roth, W. Farber, M. W. Stolper, P. von Bechtolsheim, Eds. (The Oriental Institute of the University of Chicago, 2007), pp. 243–283.
56
E. Frahm, M. Jursa, Neo-Babylonian Letters and Contracts from the Eanna Archive (Yale University Press, 2011).
57
K. Kleber, “Uruk: The fate of the Eanna archive, the Gimil-Nanāya B archive, and their archaeological evidence” in Xerxes and Babylonia: The Cuneiform Evidence, C. Waerzeggers, M. Seire, Eds. (Peeters, 2018), pp. 73–87.
58
E. von Dassow, “Introducing the witnesses in Neo-Babylonian documents” in Ancient Near Eastern, Biblical, and Judaic Studies in Honor of Baruch A. Levine, B. A. Levine, R. Chazan, W. W. Hallo, L. H. Schiffman, Eds. (Eisenbrauns, 1999), pp. 3–22.
59
L. B. Bregstein, “Seal Use in Fifth Century B.C. Nippur, Iraq: A Study of Seal Selection and Sealing Practices in the Murašû Archive” (Doctoral dissertation, University of Pennsylvania, Philadelphia, PA, 1993).
60
E. Ehrenberg, Uruk: Late Babylonian Seal Impressions on Eanna-Tablets (Philipp von Zabern, 1999).
61
E. Gehlken, “Uruk: Spätbabylonische Wirtschaftstexte aus dem Eanna-Archiv” in Ausgrabungen in Uruk-Warka. Endberichte 5 (Philipp von Zabern, 1990).
62
H. Lanz, Die Neubabylonischen Harrânu-Geschäftsunternehmen (Schweitzer, 1976).
63
S. Holtz, Neo-Babylonian Court Procedure (Brill, 2009).
64
B. Wells, F. R. Magdalene, C. Wunsch, The assertory oath in Neo-Babylonian and Persian administrative texts. Revue Internationale des Droits de l’Antiquité 57, 13–29 (2010).
65
J. MacGinnis, Letter Orders from Sippar and the Administration of the Ebabbara in the Late-Babylonian Period (Bonami, 1995).
66
S. Zawadzki, The Rental Houses in the Neo-Babylonian Period (VI-V Centuries BC) (Wydawnictwo Agade, 2018).
67
C. Wunsch, “Debt, interest, pledge and forfeiture in the Neo-Babylonian and early Achaemenid period: The evidence from private archives” in Debt and Economic Renewal in the Ancient Near East, M. Hudson, M. Van De Mieroop, Eds. (CDL Press, 2002), pp. 221–255.
68
J. Hackl, M. Jursa, M. Schmidl, Spätbabylonische Privatbriefe (Ugarit-Verlag, 2014).
69
M. T. Roth, Babylonian Marriage Agreements: 7th-3rd Centuries BC (Butzon u. Bercker, 1989).
70
M. P. Streck, Orthographie. B. Akkadisch im II. und I. Jt. Reallexikon der Assyriologie 10, 137–140 (2003).
71
M. P. Streck, “Innovations in the neo-Babylonian lexicon” in Languages in the Ancient Near East: Proceedings of the 53e Rencontre Assyriologique Internationale, L. E. Kogan et al., Eds. (Eisenbrauns, 2010), pp. 647–660.

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 117 | No. 37
September 15, 2020
PubMed: 32873650

Submission history

Published online: September 1, 2020
Published in issue: September 15, 2020

Keywords

  1. Babylonian heritage
  2. cuneiform script
  3. Late Babylonian dialect
  4. Achaemenid empire
  5. neural networks

Notes

This article is a PNAS Direct Submission. E.P. is a guest editor invited by the Editorial Board.
*Initiated by Pierre Briant of the Collège de France in 2000, this website is entirely dedicated to the history, material culture, texts, and art of the Achaemenid Empire. Since we began our study, the Babylonian text section has grown to include 2,709 texts (accessed 27 May 2020); it is administered by the Histoire et Archéologie de l’Orient Cunéiforme team of Francis Joannès (Unité Mixte de Recherche ArScAn 7041, CNRS, Nanterre, France).
†As shown by Streck and, more recently, by Hackl, there is no sharp linguistic distinction between the Neo- and Late Babylonian dialects (18) (see also Materials and Methods).
‡Kasr has, in fact, a mixed private and institutional background. See ref. 51 for an overview of cuneiform archives from Achaemenid-period Babylonia and their time spans. A more detailed discussion of each text group is found in ref. 20.
§Designations of archives are listed in parentheses following each city name. Although the description on the Achemenet website mentions them, the Ur archives are not yet represented in that collection. Archives already mentioned above, such as Murašu from Nippur, are not included in this list.
¶The Murašu texts were damaged during their transport out of Nippur (54), and the Kasr texts only partially survived a grim sequence of events triggered by the First World War. Many of them had already suffered fire damage in antiquity, during or after the Achaemenid period (55).
#Take, for example, hatru, a very common word in the Achaemenid-period Murašu archive from Nippur. As shown by Stolper (54), the different spellings of this term leave the quality of its middle, dental consonant uncertain: (lú) ha-ad/t/ṭ-ru/ri; its variants include (lú) ha-d/ṭa-ri, (lú) ha-dar/tár/ṭár, and (lú) ha-d/ṭa-ad/t/ṭ-ri.

Authors

Affiliations

Faculty of Engineering, Bar-Ilan University, Ramat-Gan 5290002, Israel;
Yonatan Lifshitz
Alpha Program, Davidson Institute of Science Education, Weizmann Institute of Science, Rehovot 7610001, Israel;
Elad Aaron
Faculty of Social Sciences and Humanities, Digital Humanities Ariel Lab, Ariel University, Ariel 40700, Israel
Faculty of Social Sciences and Humanities, Digital Humanities Ariel Lab, Ariel University, Ariel 40700, Israel

Notes

2 To whom correspondence may be addressed.
Author contributions: E.F. and Sh.G. designed research; E.F., Y.L., E.A., and Sh.G. performed research; E.F. and Sh.G. contributed new reagents/analytic tools; E.F., Y.L., E.A., and Sh.G. analyzed data; and E.F. and Sh.G. wrote the paper.
1 E.F. and Sh.G. contributed equally to this work.

Competing Interests

The authors declare no competing interest.
