Automated reconstruction of ancient languages using probabilistic models of sound change
Edited by Nick Chater, University of Warwick, Coventry, United Kingdom, and accepted by the Editorial Board December 22, 2012 (received for review March 19, 2012)

Abstract
One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system’s reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis about the relationship between the function of a sound and its probability of changing that was first proposed in 1955.
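The "within one character" figure can be read as a string edit distance between each automatic reconstruction and its manual counterpart. The sketch below is only an illustration under that assumption: it uses the standard Levenshtein distance, treats each Unicode code point as one character (phonetic transcriptions may need a different segmentation), and the function names and placeholder strings are hypothetical, not taken from the article or its data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fraction_within(pairs, max_distance=1):
    """Share of (automatic, manual) pairs whose distance is <= max_distance."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    hits = sum(1 for auto, manual in pairs
               if edit_distance(auto, manual) <= max_distance)
    return hits / len(pairs)


# Placeholder strings only (assumption): these are not real reconstructions.
example_pairs = [("abc", "abc"), ("abcd", "abxd"), ("abcd", "xyzd")]
print(fraction_within(example_pairs))
```

Two of the three placeholder pairs differ by at most one edit, so the function reports two thirds for this toy input.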
Footnotes
- 1 To whom correspondence should be addressed. E-mail: bouchard{at}stat.ubc.ca.
Author contributions: A.B.-C., D.H., T.L.G., and D.K. designed research; A.B.-C. and D.H. performed research; A.B.-C. and D.H. contributed new reagents/analytic tools; A.B.-C., D.H., T.L.G., and D.K. analyzed data; and A.B.-C., D.H., T.L.G., and D.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. N.C. is a guest editor invited by the Editorial Board.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1204678110/-/DCSupplemental.
See Commentary on page 4159.
Freely available online through the PNAS open access option.
Article Classifications
- Physical Sciences
- Computer Sciences
- Social Sciences
- Psychological and Cognitive Sciences