Supporting Appendix

A tutorial is available at, under "Phylogenetic Methods."

In an ab initio attempt to determine the linguistic position of Gaulish within the Indo-European family, we compiled a glossary of 35 Gaulish language items (lexicon, grammar, and one phonological item) by consulting bilingual (mainly Gaulish-Latin) inscriptions from southern France and northern Italy. By carefully restricting ourselves to bilingual inscriptions, we aim to avoid an ascertainment bias in our primary data, which would compromise both the phylogenetic analysis and the time estimation. For example, if we were instead to choose Gaulish features translated by etymological means, our data would be biased against Gaulish autapomorphies, and moreover biased toward the oldest and best-attested languages. It is possible that the ancient authors of the inscriptions were not perfectly bilingual and may have distorted some of the language items, but if this had happened extensively, then the network would not be expected to be treelike. Another consideration is the possibility that cisalpine and transalpine Gaulish constituted different languages, but for the phylogenetic analysis this is not crucial as long as cis- and transalpine Gaulish are more closely related to each other than to the other languages that we included in the analysis, namely Classical Greek, Latin, Italian, French, Occitan, Spanish, Old Irish, Modern Irish, Scots Gaelic, Welsh, Breton, and English.


Our data on the Gaulish language are drawn from a recent compilation of Gaulish records (1), the description of the La Graufesenque pottery finds (2), and a study on classical authors’ accounts of Gaulish plant names (3). Gaulish records include inscriptions on stones, coins, and instruments, as well as contemporary accounts of classical authors. These different types of records vary in their degree of reliability, so we consider only the least controversial records, namely, (i) translations obtained from bilingual inscriptions and inscriptions accompanied by unambiguous depictions, (ii) translations obtained from internal information (in this category we list those rare cases where translations can be deduced from the internal system of a Gaulish record in the absence of accompanying translations or depictions), and (iii) translations recorded by authors who were approximately contemporaneous with the inscriptions (dating from around the birth of Christ) and who are furthermore known to have visited the Gaulish area. Hence, we disregard most of the available monolingual inscriptions, as well as conclusions based on etymological considerations. In the following, Gaulish is shown in bold capitals to distinguish it from Latin. Square brackets indicate missing parts of the inscriptions.

Category I: Translations Obtained from Bilingual Inscriptions and Depictions. Bilingual inscription of the bifacial stone block of Todi.

Side A

Side B





















On side B, the N’s in "TRUTIKNI" and "KARNITU" are damaged. The translation of the Latin text (A = B) is "(The tomb) of Ategnatos son of Drutos; Coisis son of Drutos, his youngest brother, arranged and erected it." Although Todi is situated in Umbria, distant from Gaulish areas, the Gaulish identity of this inscription is established by the presence of the ending -IKNOS, and the word KARNITU, both known from inscriptions in Gaul (ref. 1, p. 76). The equivalence of the Latin and Gaulish inscriptions is evident from the presence of identical personal names. The following language features can be correlated: -IKNOS (Gaulish) is a patronymic suffix that the inscriptor translated by F(ILIVS), i.e., "son of"; the suffixes -OS and -I (Gaulish) in TRUTIKNOS/TRUTIKNI correspond to Latin masculine nominative and genitive endings, respectively.

Bilingual inscription of the stone of Vercelli.

















The translation of the Latin text is "Limit of the land which Acisius Argantoco-materecus gave in common to gods and men - (in the boundaries) where four stones have been erected." It may be inferred from the inscription that the inscribed stone block is one of the four stones in question. The name AKISIOS ARKATOKOKMATEREKOS (an erroneous K is assumed in ref. 1) corresponds to ACISIVS ARGANTOCOMATER-ECUS in Latin. COMVNEM DEIS ET HOMINIBVS corresponds to TEUOXTONION, postulated by Michel Lejeune (see ref. 1, p. 78) to be a compound adjective, "TEUO-" corresponding to Indo-European "divine" and "XTONION" corresponding to Greek "χθόνιος," i.e., "terrestrial, mortal, human." Others see it as a compound dvandva noun, "of gods and men" (4). For ATOM (see ref. 5 for alternative readings), an Indo-European root *anto- (edge, border) has been suggested (6); however, its translation appears less certain than other items in our glossary, so we exclude it from our analysis.

Inscriptions of the bronze instruments found at Visignot and Couchey (Département Côte d'Or).







These two inscribed bronze instruments found at two localities in the Département Côte d'Or are stylistically identical, indicating the same manufacturer for both, but one inscription is in Latin (Visignot), the other in Gaulish (Couchey); "leaf" designates the inscribed leaf ornament at the end of the lines. In this sense the twin inscriptions are bilingual. The translation of the Latin text is "To the god of Alise, Paullinus (has offered this) for his son Contedoius. VSLM." The Latin abbreviation "VSLM" stands for the widespread votive phrase "uotum soluit libens merito", i.e., the person offering the instrument "has willingly and deservedly fulfilled his vow". Evidently the donor had earlier vowed to make an offering in return for a divine favor to his son; once the supplication was deemed fulfilled, the donor was gratefully fulfilling his side of the bargain. The Latin text indicates that the word IEVRV signifies "has offered". The alternative translation "has made" appears much less likely because "AVVOT" is used for this purpose in the La Graufesenque pottery inscriptions (see below). It follows that "V" is the dative suffix of ALISANV.

Inscription on one of the stone blocks of the Pilier des Nautes Parisiaques (Paris).


This inscription dates from the reign of Tiberius (AD 14-37) and is accompanied by a representation of a bull with three birds (cranes). It thus appears safe to assume that Gaulish "TARVOS" corresponds to Latin "Taurus," Gaulish "TRI" to Latin "tri," and "GARANVS" may be (an inflected form of?) the Gaulish word for "crane."

Inscriptions on pottery manufactured at La Graufesenque.

The potters at La Graufesenque (2) shared kilns and kept an account, inscribed on the wet clay, of each potter's items in the loaded kiln. The accounts use Roman numerals and modified Latin names for the types of vessels (e.g., Latin "parapsidi" is changed phonetically to PARAXIDI; "parapsidi" as a Greek loan into Latin is recorded in Petronius 34,2) and mainly Gaulish words and grammar in headings and in the few accompanying remarks. Some of the accounts are in (faulty or Gallicized) Latin, which yields approximate translations for the Gaulish counterparts. Thus, the frequent heading of the type TUθOS CINTUX LUXTODOS corresponds to Latin "furnus primus oneratus," i.e., "first kiln-load." "Furnus" actually is a domestic oven rather than an industrial kiln ("fornax"). Possibly Gaulish did not differentiate between the two types of oven (neither does, for example, German), but see refs. 1 and 2 for different interpretations. Gaulish ordinal numbers from "first" to "tenth" are also discernible from these headings: CINTUX[, ALOS/ALLOS, TR[], PETUAR[], PINPETOS, SUEXOS, SEXTAMETOS, OXTUMETO[]/OXTUMET[, NAMET[, DECOMETOS/DECAMETOS. Furthermore, the translation "grand total" for SUMMA UXSEDIA at the ends of the accounts is obvious from the arithmetic, which includes running totals. UXSEDIA is also used to qualify vessel types, perhaps reminiscent of the English idiomatic "grand." ETI is translated as "item" (or incorrectly "idem"), i.e., "likewise/and," and is used to link two items. For linking two or three potters’ names, the idiomatic construction "(name) DUCI (name) TONI (name)" is used and translated in Latin by "(name) et (name) et (name)". The word "AVVOT" used by the potters in their signatures corresponds to Latin "fecit" (has made) used by the same potters, for example: "Iullus fe(cit)" and "Iullo(s) avot" (ref. 1, pp. 118-122).
The identification of verbs (AVVOT, IEVRV) in Gaulish then permits the conclusion that the word order in the majority of available Gaulish sentences is of the type "subject-verb" (ref. 1, p. 68).

Category II: Translations Obtained from Internal Information.

Coligny lunar calendar.

The Coligny lunar calendar periodically introduced lunar leap months (one lunar month is 29.53 days) to keep pace with the solar year (365.25 days). Hence, the entry "M XIII LAT CCCLXXXV" announcing such a leap month evidently means "month(s) 13 day(s) 385". M and MID are used alternatively for "month" throughout the calendar.
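The arithmetic behind this reading can be checked with a short sketch. It uses only the mean month and year lengths quoted above; the variable and function names are illustrative. Thirteen mean lunar months come to roughly 384 days, close to the inscribed 385. The remaining difference of about one day is not explained here.

```python
# Mean lengths as quoted in the text.
LUNAR_MONTH_DAYS = 29.53
SOLAR_YEAR_DAYS = 365.25

def lunar_year_days(months: int) -> float:
    """Length in days of a year made up of `months` mean lunar months."""
    return months * LUNAR_MONTH_DAYS

ordinary = lunar_year_days(12)        # year without a leap month
leap = lunar_year_days(13)            # year with one leap month
drift = SOLAR_YEAR_DAYS - ordinary    # days lost against the sun per ordinary year

print(f"12 lunar months: {ordinary:.2f} days")
print(f"13 lunar months: {leap:.2f} days")
print(f"shortfall of an ordinary lunar year: {drift:.2f} days")
print(f"inscribed 385 days minus 13 mean months: {385 - leap:.2f} days")
```

The shortfall of about 11 days per ordinary year is what makes a periodic thirteenth month necessary.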

Plomb de Larzac

The Larzac lead tablet (Plomb de Larzac) found in Hospitalet du Larzac is as yet incompletely interpreted but contains lists of female names, apparently accompanied by their familial relationships. In some cases the names occur in more than one relationship, e.g.: Aia duxtir Adiegas and Adiega matir Alias. It can be concluded that "matir" is "mother" and "duxtir" is "daughter"; "-a" is a feminine nominative and "-as" or "-ias" is the corresponding genitive.

Category III: Translations Reported by Classical Authors.

Between 1896 and 1914, Alfred Holder published three volumes of what he considered Celtic names and lexicon (7). This work contains an estimated 50,000 entries (assuming about 10 entries per page), which include classical attestations as well as etymological deductions. Lambert (1), in his chapter 15 and appendix, is more stringent and extracts from Holder's work 43 words attested by classical authors (as well as several etymologically deduced words). The complementary publication of André (3) critically discusses 84 putatively Gaulish plant names attested by classical authors, mainly Plinius and Marcellus of Bordeaux (4th century AD). Starting out from these 127 attested words, we impose two further stringency criteria: first, we require that the classical author explicitly states that the word is used in the language of the "Galli," "Celti," etc. It transpires that only 17 of Lambert’s 43 words were explicitly stated to be used in the Gaulish/Celtic language, while the other 26 words might arguably be Latin or Greek words that describe a Gaulish custom or object. In addition to the 17 words, we came across four additional Celtic words while perusing Holder's quotations and Plinius’ Naturalis Historia: "gaesus" was reportedly Celtic for "strong man," and Plinius differentiates between "viriae" (Celtiberian) and "viriolae" (Celtic), yielding an additional Celtiberian word for the list. Plinius furthermore describes "marga columbina," a type of hard, flaky marl used in Gaul for fertilizing (evidently marlstone), which, according to him, the Gauls referred to as "eglecopala." This might fit with "pala" found on Lepontic gravestones, suggested to mean "stone" (8). These 21 words quoted as being used in Gaulish or Celtic are listed in Table 2. We dealt with André’s (3) plant names separately.

Our second condition is that the classical author is known to have lived around the time of the Gaulish inscriptions (i.e., around the birth of Christ) and to have resided in Gaul or northern Italy, to ensure that he is likely to have been acquainted personally with the Gaulish language. In the case of a gloss or scholium, it is equally necessary to determine whether the author of the gloss meets this criterion. Ausonius, for example, was born in Bordeaux, but at a time (AD 310) when his familiarity with the Celtic language (which by then may have been completely replaced by Latin as a written language; ref. 1) may be a matter of dispute. On the other hand, Varro (born in 116 BC in Rieti, central Italy) lived at the relevant time but is not known to have visited Gaul or northern Italy. For most of the authors in Table 2, we similarly found insufficient biographical confirmation (9) and had to exclude them. Only Plinius (born in Como in AD 23/24) meets this criterion, having travelled widely in the Roman army. Hence, only the three words alauda, eglecopala, and viriolae (Celtiberian: viriae) pass the criteria.

Similarly, from the plant list of André (3) we extracted references that (i) specified that the plant name was Gaulish, (ii) stemmed from a contemporary and local author (which again left only Plinius), (iii) did not leave in doubt the botanical identity of the plant according to André (3), and (iv) bore both Latin and Greek names of the plant to avoid problems in translation for our word list. Surprisingly perhaps, two plant names survived these rigors, namely vela and vettonica, identified by André (3) as Sisymbrium officinale and Sisymbrium irio, respectively.

In conclusion, the yield of reliably Gaulish words from classical sources is meagre, and our discovery of words that are cited as Gaulish but are not widely discussed in the literature ("eglecopala" and "gaesus") presumably indicates a significant reporting bias: words seem to have been underrepresented in the literature if they bear no resemblance to Indo-European and particularly to Celtic languages. To avoid biasing linguistic distances, we therefore dispensed with Category III for our further analyses.

Translation into European Languages

In Table 1, the confirmed Gaulish words and language items are listed and translated into several relevant European languages, with Basque as a negative control. In accordance with the lexicostatistical rules set by Swadesh (10), each translation is chosen to reflect the most commonly used term in a given language. The gold standard in the table is the ancient authors’ own translations, wherever available. A possible disadvantage of this standard is that the bilingualism of the potters, etc., may have been suboptimal. For example, the standard obliges us to translate TUθOS with "furnus" (oven) rather than "fornax" (kiln), and XTONION with "homo" rather than "humanus," thus ignoring Lejeune’s Greek translation "χθόνιος", i.e., "terrestrial, mortal, human". Woodhouse (11) was used for translating the list into Greek and Russell (12) was consulted for Celtic grammar. The lists for the more exotic languages were compiled and/or corrected by native speakers as indicated in the Acknowledgements.

The frequency of the phonetic sequence or phoneme -ps- was determined for the Romance languages by comparing Latin "capsa" to the modern Romance equivalents "caisse," "cassa," "caixa," etc. For the non-Romance languages, biblical and folkloristic electronic texts were downloaded from the internet and searched for the occurrence of indigenous -ps- by word processor. Modern loans containing -ps- were infrequent in these texts, and we further excluded loans by inspection of every hit. The following frequencies were obtained: Old Irish 0/120,000 words, Breton 0/13,500 words, Scots Gaelic 0/4,200 words, Welsh 0/15,500 words, Basque 0/14,600 words, English 30/57,000 words, and Latin 92/24,000 words.
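Because the searched texts differ greatly in size, the raw counts are easier to compare once normalized. The following minimal sketch converts the counts reported above into occurrences per 1,000 words. The per-1,000 scale and the names used are illustrative choices and are not part of the original analysis.

```python
# (occurrences of indigenous -ps-, words searched), as reported in the text.
counts = {
    "Old Irish": (0, 120_000),
    "Breton": (0, 13_500),
    "Scots Gaelic": (0, 4_200),
    "Welsh": (0, 15_500),
    "Basque": (0, 14_600),
    "English": (30, 57_000),
    "Latin": (92, 24_000),
}

def per_thousand(hits: int, words: int) -> float:
    """Occurrences per 1,000 words of text."""
    return 1000 * hits / words

rates = {lang: per_thousand(hits, words) for lang, (hits, words) in counts.items()}

for lang, rate in rates.items():
    print(f"{lang:13s} {rate:.2f} per 1,000 words")
```

On this scale English comes to roughly 0.5 and Latin to roughly 3.8 occurrences per 1,000 words, against zero for the Celtic languages and Basque.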

Coding of Item Translations. We grouped the translated items across languages according to similarity, with a translated item being considered different if its appearance (often the only information we have for Gaulish) is unrecognizable. (In principle, grouping of item translations according to etymology, as in ref. 13, would be feasible if the underlying language tree were known. However, etymology assumes a particular language tree, so including etymologically interpolated data into the reconstruction of the uncertain Indo-European/Celtic language tree would amount to circular reasoning.) For example, we score SV syntax as being different from VS syntax, and Spanish "hija" as being mutated and different from Latin "filia," notwithstanding their direct etymological relationship. Inevitably, chance convergence will incur an error rate, which we assess by comparison with Basque as a negative control. Similarity of items was scored in Table 1 by identical letters in parentheses. For instance, "TARVOS" in Gaulish has a recognizable relationship to "taurus" in Latin and "tarb" in Old Irish, so all three receive the same letter, in this case "a". The English and Basque translations ("bull", "zezen") are not recognizably similar to Gaulish "TARVOS" and receive different letters, in this case "b" and "z". In some cases, a translated item (for LUXTODOS, CINTUX, PINPETOS, and OXTUMETO) could only be classified by subsuming it within the spectrum of translated forms for that item.
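The letter coding can be pictured with a small sketch built on the TARVOS example alone. The dictionary layout, the key "bull", and the function name are hypothetical conveniences for illustration. They are not the format of the published table, and the sketch is not the network method used in the analysis.

```python
from itertools import combinations

# Letter codes for a single item, taken from the TARVOS example in the text.
# Identical letters mean "recognizably similar"; different letters mean not.
coded_items = {
    "bull": {
        "Gaulish": "a",    # TARVOS
        "Latin": "a",      # taurus
        "Old Irish": "a",  # tarb
        "English": "b",    # bull
        "Basque": "z",     # zezen
    },
}

def matches(lang1: str, lang2: str, items: dict) -> int:
    """Count the items on which two languages carry the same letter code."""
    return sum(
        1
        for codes in items.values()
        if lang1 in codes and lang2 in codes and codes[lang1] == codes[lang2]
    )

languages = sorted(coded_items["bull"])
for first, second in combinations(languages, 2):
    print(f"{first} / {second}: {matches(first, second, coded_items)} matching item(s)")
```

With all 35 items entered in the same way, the pairwise counts would give the number of shared codings between any two languages, including the comparison with the Basque control.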

Assessment of the Negative Control. In order to classify neutrally the translated items in Table 1, our coding procedure scores item replacements, irrespective of whether these are caused by change of appearance (e.g. Latin "filia" vs. Spanish "hija"), or whether these are caused by outright internal replacement or loans (e.g. Welsh "llawn" vs. Breton "karget"), as we would be unlikely to distinguish reliably between these possibilities for the sparse Gaulish corpus. This approach will fail to detect item replacements, as defined here, whenever etymologically unrelated items have converged by chance (a popular example is Spanish "mucho" vs. English "much"). As can be determined from Table 1, our negative control Basque scores an average of only 5 matching items out of 35 items when compared to our Indo-European languages. This value corresponds to the error of the coding procedure if we assume that no Basque item is etymologically related to Indo-European. Some may argue (14) that Basque matches might reflect loan events from or even ancestral relationships with Indo-European; in both cases the actual coding error would be lower. In either case, the relatively low error justifies the procedure and is not expected to obscure the phylogenetic network.
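Expressed as a proportion, the Basque result works out as follows. This is a minimal sketch using only the two figures given above; the variable names are illustrative.

```python
matching_items = 5   # average number of Basque matches reported in the text
total_items = 35     # number of items in the Gaulish glossary

# If no Basque item is related to Indo-European, every match is a chance
# convergence, so this proportion is the coding error; otherwise it is an
# upper bound on that error.
error_rate = matching_items / total_items

print(f"coding error (at most): {error_rate:.1%}")
```

This gives a coding error of at most about 14%, or one item in seven.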

1. Lambert, P.-Y. (1994) La Langue Gauloise (Editions Errance, Paris).

2. Marichal, R. (1988) Les Graffites de La Graufesenque, Gallia (Centre National de la Recherche Scientifique, Paris), Supplement 47.

3. André, J. (1985) Etudes Celtiques 22, 179–198.

4. Eska, J. F. (1994) Münchener Studien zur Sprachwissenschaft 55, 7–39.

5. Lejeune, M. (1988) in Recueil des Inscriptions Gauloises II, Gallia (Centre National de la Recherche Scientifique, Paris), Supplement 45.

6. Tibiletti Bruno, M. G. (1976) Atti della Accademia Nazionale dei Lincei. Rendiconti: Classe di Scienze Morali, Storiche e Filologiche 31, 355–378.

7. Holder, A. (1896–1914) Alt-Celtischer Sprachschatz (Teubner, Leipzig), reprint (1961) (Akademische Druck- und Verlagsanstalt, Graz, Austria).

8. Risch, E. (1984) Schriftenreihe des raetischen Museums Chur. 28, 22–36.

9. Hornblower, S. & Spawforth, A. (1996) The Oxford Classical Dictionary (Oxford Univ. Press, Oxford), 3rd Ed.

10. Swadesh, M. (1955) Int. J. Am. Linguist. 21, 121–137.

11. Woodhouse, S. C. (1987) English-Greek Dictionary: A Vocabulary of the Attic Language (Routledge and Kegan Paul, London).

12. Russell, P. (1995) An Introduction to the Celtic Languages (Longman, Essex, U.K.).

13. Forster, P., Toth, A. & Bandelt, H.-J. (1998) J. Quant. Linguist. 5, 174–187.

14. Hamel, E. & Vennemann, T. (May 2002) Spektrum der Wissenschaften, pp. 32–42 (German); trans. (July 2002) Le Scienze 407, 62–71 (Italian); trans. (September 2002) Pour la Science 299, 24–33 (French); trans. (January 2003) Investigacion y Ciencia, pp. 62–71 (Spanish).