Sound–meaning association biases evidenced across thousands of languages

Damián E. Blasi, Søren Wichmann, Harald Hammarström, Peter F. Stadler, and Morten H. Christiansen
PNAS published ahead of print September 12, 2016 https://doi.org/10.1073/pnas.1605782113
Damián E. Blasi
(a) Department of Comparative Linguistics and Psycholinguistics Laboratory, University of Zürich, CH-8006 Zurich, Switzerland; (b) Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, 07745 Jena, Germany; (c) Discrete Biomathematics Group, Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
For correspondence: damianblasi@gmail.com
Søren Wichmann
(d) University of Leiden, 2311 BV Leiden, The Netherlands; (e) Kazan Federal University, Kazan, Russia, 420000
Harald Hammarström
(b) Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
Peter F. Stadler
(c) Discrete Biomathematics Group, Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany; (f) Interdisciplinary Center for Bioinformatics, Department of Computer Science, University of Leipzig, 04107 Leipzig, Germany; (g) Santa Fe Institute, Santa Fe, NM 87501
Morten H. Christiansen
(h) Department of Psychology, Cornell University, Ithaca, NY 14853; (i) Interacting Minds Centre, Aarhus University, 8000 Aarhus C, Denmark
Edited by Anne Cutler, University of Western Sydney, Penrith South, NSW, Australia, and approved July 25, 2016 (received for review April 13, 2016)


Significance

The independence between sound and meaning is believed to be a crucial property of language: across languages, sequences of different sounds are used to express similar concepts (e.g., Russian “ptitsa,” Swahili “ndege,” and Japanese “tori” all mean “bird”). However, a careful statistical examination of words from nearly two-thirds of the world’s languages reveals that unrelated languages very often use (or avoid) the same sounds for specific referents. For instance, words for tongue tend to have l or u, “round” often appears with r, and “small” with i. These striking similarities call for a reexamination of the fundamental assumption of the arbitrariness of the sign.

Abstract

It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. By analyzing word lists covering nearly two-thirds of the world's languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages (linguistic families or isolates). Prominent among these relations are property words ("small" and i, "full" and p or b) and body part terms ("tongue" and l, "nose" and n). The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed. Our results therefore have important implications for the language sciences, given that nonarbitrary associations have been proposed to play a critical role in the emergence of cross-modal mappings, the acquisition of language, and the evolution of our species' unique communication system.

  • linguistics
  • cognitive sciences
  • language evolution
  • iconicity
  • sound symbolism

Although there is substantial debate in the language sciences over how best to characterize the features of spoken language, there is nonetheless a general consensus that the relationship between sound and meaning is largely arbitrary (1–3). Plenty of exceptions exist, however, within individual languages. For instance, ideophones, a class of words found in many languages, convey a communicative function (or meaning) through the depiction of sensory imagery (4). In the Mel language Kisi (spoken in Sierra Leone), hábá means "(human) wobbly, clumsy movement," and hábá-hábá-hábá means "(human) prolonged, extreme wobbling"; here, repetition serves as a way to convey intensity. More generally, "iconicity" is the most researched and best-known case of nonarbitrary associations between sound and meaning (5, 6). It refers to the resemblance between certain aspects of the acoustic basis of speech and their referents. "Systematicity," in contrast, refers to (statistical) regularities that are common to a particular set of words, created by historical contingencies and analogical processes (5). For example, word-initial gl- in English evokes the idea of a visual phenomenon (as in glare, glance, glimmer) (7). At a larger scale, there is evidence that the phonological properties of whole morphosyntactic classes of words (like verbs and nouns) are distinct in several languages (8).

The evidence of recurring regularities in sound–meaning mappings across multiple languages is considerably more modest, despite its potential importance for fundamental questions about language evolution and the role of basic perceptual biases in cognition. For example, certain shape–sound associations, known as the bouba-kiki effect (9–11), are believed to rely on the ability that humans [and perhaps also other primate species (12)] have for associating stimuli across different modalities (13). Other plausible sources of cross-linguistic associations include, for instance, the relationship across many animal species between vocalization frequency and animal size (14), the mimicry of referents via unconscious mouth gesturing (15), and the persistence of vestiges of a conjectured early human language (16).

Experimental studies support the hypothesis that humans are indeed sensitive to such associations. It has been demonstrated several times that participants perform above chance when asked to pair up words with opposite meanings (antonyms) in languages unknown to them (17), and that English speakers might even be able to decide on the concreteness of words from languages to which they have not been exposed (18). However, this evidence for nonarbitrary sound–meaning associations pertains only to narrow pockets of the vocabulary, making it unclear whether a more general pressure toward arbitrariness may overpower such potential biases when considering a more semantically diverse selection of the vocabulary (2, 19).

A further issue concerns current studies of nonarbitrariness in sound–meaning correspondences. Save for a single exception (20), cross-linguistic corpus studies of nonarbitrary associations have tended to rely on a small number of languages (maximally 200). They have also tended to focus on small, semantically restricted sets of words, ranging from phonation-related organs (21) to South American animals (15), spatial orientation (demonstratives) (14, 22), repair initiators (like huh? in English) (23), and the conceptualization of magnitude in Australian languages (24). These studies involve confirmatory analyses that aim to test specific hypotheses regarding sound–meaning correspondences; as a consequence, they are guided by a priori intuitions or, indirectly, by findings from other disciplines. These limitations may help explain, at least in part, why language scientists typically consider nonarbitrary associations to be marginal phenomena that may only apply to small, strictly circumscribed regions of the vocabulary (3). In this paper, we therefore conduct a comprehensive set of analyses involving a semantically diverse set of words from close to two-thirds of the world's languages.

Testing Associations on a Global Scale

The availability of a large collection of word lists allows us to search for statistically robust associations in an unsupervised, theory-neutral manner. The data consist of 28–40 lexical items from 6,452 word lists, with a subset of 328 word lists having up to 100 items (25). Words are transcribed into a phonologically simplified system consisting of 34 consonants and 7 vowels, which we refer to collectively as "symbols" (Table S1). These words belong to what is often referred to as "basic vocabulary," including, for instance, pronouns, body part terms, property words, motion verbs, and nouns describing natural phenomena (26). The word lists include both languages and dialects, spanning 62% of the world's languages and about 85% of its lineages (Fig. 1). A lineage is a maximal set of languages that can be shown to have a common ancestor. Such a set may have only one member (an isolate) or multiple members (a family).

Fig. 1.

Geographic distribution of the 6,452 word lists from the ASJP database (25). Colors distinguish different linguistic macroareas, regions with relatively little or no contact between them (but with much internal contact between their populations). These are North America (orange), South America (dark green), Eurasia (blue), Africa (green), Papua New Guinea and the Pacific Islands (red), and Australia (fuchsia).

Table S1.

ASJP symbols and their description

Regarding the classification of languages, the Glottolog genealogical classification is preferable to the available alternatives. It is the only one that classifies every living or extinct language while providing brief pointers to justifications for all choices taken. However, a less conservative independent classification was also used in the main test (see below). We stratify languages geographically by dividing the world's landmass into six largely independent linguistic macroareas: North America, South America, Eurasia, Africa, Greater New Guinea, and Australia. These regions have a history of attested contact within them but little contact between them in prehistoric times (27). To guarantee that only truly global associations were selected, we screened the sound–meaning associations. We kept only those where the concept and symbol were attested in languages from at least 10 different lineages and in no fewer than three different macroareas.

We aim to capture robust and widespread tendencies in sound–meaning associations, where “tendency” should be understood as a systematic bias in the frequency with which certain words tend to carry specific symbols in contrast to their baseline occurrence in other words. Crucially, a strong tendency does not imply that a signal has an extremely high frequency of occurrence, and conversely a very frequent sound–meaning co-occurrence is not sufficient evidence to discount chance. Importantly, whatever advantage a sound–meaning pairing might confer in terms of learning or processing, it has to be considered in the context of a myriad of competing factors that shape the phonetic and phonological fabric of words, from articulatory production costs (28) to systemic constraints due to the similarity with other lexical elements (29).

Our statistical approach consists of a series of tests in which the presence of a symbol in a word is contrasted against a suitable subset of other words, and the bias is then evaluated across lineages. To begin, we calculate, for each concept and symbol, a genealogically balanced average ratio of the times they co-occur in a word of a language for which both symbol and concept are attested. We then simulate the same quantity based on the rest of the concepts and compare it with the observed quantity (Materials and Methods). The associated P value roughly estimates the chance of finding the same or a more extreme (genealogically balanced) average by picking any word other than the target one. Notice that this covers both recurring sound–meaning pairings and their complement: sound–meaning associations that are observed less often than expected under our null model.
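The core comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each language is assumed to be a `(lineage, words)` pair, where `words` maps concept names to ASJP strings of single-character symbols. The null model is assumed to replace the target word with a randomly chosen word for another concept in the same language. The function names are hypothetical.

```python
import random
from collections import defaultdict


def balanced_rate(languages, concept, symbol, pick=None):
    """Genealogically balanced share of languages whose word for `concept`
    contains `symbol`: average within each lineage, then across lineages.

    languages: list of (lineage, words) pairs; words maps concept -> string.
    pick: optional function choosing the word to test (used by the null).
    """
    per_lineage = defaultdict(list)
    for lineage, words in languages:
        # Only languages in which both the concept and the symbol are attested.
        if concept not in words or not any(symbol in w for w in words.values()):
            continue
        word = pick(words) if pick else words[concept]
        per_lineage[lineage].append(1.0 if symbol in word else 0.0)
    if not per_lineage:
        raise ValueError("no language attests both concept and symbol")
    means = [sum(v) / len(v) for v in per_lineage.values()]
    return sum(means) / len(means)


def permutation_p(languages, concept, symbol, n_sim=1000, seed=0):
    """Compare the observed rate with rates obtained by testing a random
    word for a different concept in each language (assumed null model)."""
    rng = random.Random(seed)
    # The null needs at least one other concept per language.
    languages = [(lin, w) for lin, w in languages if len(w) > 1]
    observed = balanced_rate(languages, concept, symbol)

    def other_word(words):
        return rng.choice([w for c, w in words.items() if c != concept])

    sims = [balanced_rate(languages, concept, symbol, pick=other_word)
            for _ in range(n_sim)]
    # Upper tail for positive associations, lower tail for negative ones.
    p_upper = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    p_lower = (1 + sum(s <= observed for s in sims)) / (n_sim + 1)
    return observed, p_upper, p_lower
```

In a toy dataset where every word for nose contains n and no other word does, the observed balanced rate is 1 and the upper-tail P value reaches its minimum of 1/(n_sim + 1).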

Crucially, a sequence of tests needs to be applied to ensure that potential associations are not statistical artifacts (see Materials and Methods and SI Materials and Methods for more details). First, we used two independent worldwide language classifications with contrasting degrees of conservativeness (30, 31). Second, we controlled the false discovery rate (the expected proportion of false positives among the accepted associations) at 5%, for each classification independently. This avoids an inflated number of associations due to multiple comparisons.
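The text does not name the specific false-discovery-rate procedure. The sketch below assumes the widely used Benjamini–Hochberg step-up rule, applied separately to the P values obtained under each classification.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected at false discovery rate q
    (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Largest rank k such that the k-th smallest P value is <= k * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])
```

One natural way to combine the two classifications under this sketch is to keep only the associations whose indices are returned for both.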

Third, word length is trivially correlated with the chance of finding any particular symbol. There is considerable variance in the (genealogically balanced) length of the words in our dataset. Some pronouns, negation, and basic verbs (like say and give) consist of only about three symbols on average, whereas some color words and body part terms contain over five (Fig. S1). We filter out associations that also emerge when all of the symbols of all of the words of each language are randomly permuted while keeping word lengths fixed.
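A minimal sketch of the length-preserving permutation follows, assuming single-character symbols and a hypothetical function name. Rerunning the association test on many such permuted lists gives a baseline that reflects word length but carries no sound–meaning link.

```python
import random


def permute_symbols(words, rng):
    """Shuffle all symbols of one language's word list across its words,
    keeping the length of every word fixed.

    words: dict mapping concept -> ASJP string (one character per symbol).
    """
    concepts = list(words)
    pool = [s for c in concepts for s in words[c]]
    rng.shuffle(pool)
    permuted, start = {}, 0
    for c in concepts:
        n = len(words[c])
        permuted[c] = "".join(pool[start:start + n])
        start += n
    return permuted
```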

Fig. S1.

On the Left, genealogically balanced average of the number of characters for each of the 40 concepts with most coverage in ASJP. The horizontal bars represent approximate 95% CI for the average. On the Right, distribution of the genealogically balanced average for all of the concepts in ASJP. In both graphs, the vertical blue bar represents the mean value across all concepts in ASJP.

Fourth, beyond the mere number of symbols, word length might be a confound because different phonotactic restrictions may apply to words of different lengths. For instance, suppose a language allows only consonant–vowel structures and also prohibits word-initial liquids. In that language, no monosyllabic word will carry a liquid. To remedy this, we performed a test similar to the first one described, but this time we compared words only with length-matched words for different concepts.
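One way to express the length-matched comparison is sketched below. Rather than simulating, it contrasts the presence of the symbol in the target word with its expected presence among words of exactly the same length for other concepts in the same language. It then averages that difference within and across lineages. This is an illustrative variant built on our own assumptions (data layout, exact-length matching, a simple difference as the bias measure), not the procedure from Materials and Methods.

```python
from collections import defaultdict


def length_matched_bias(languages, concept, symbol):
    """Genealogically balanced mean of (observed presence - length-matched
    baseline) for `symbol` in the word for `concept`.

    languages: list of (lineage, words) pairs; words maps concept -> string.
    """
    per_lineage = defaultdict(list)
    for lineage, words in languages:
        if concept not in words:
            continue
        target = words[concept]
        pool = [w for c, w in words.items()
                if c != concept and len(w) == len(target)]
        if not pool:
            continue  # no length-matched comparison words in this language
        baseline = sum(symbol in w for w in pool) / len(pool)
        per_lineage[lineage].append((1.0 if symbol in target else 0.0) - baseline)
    if not per_lineage:
        raise ValueError("no language has length-matched comparison words")
    means = [sum(v) / len(v) for v in per_lineage.values()]
    return sum(means) / len(means)
```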

Finally, to filter out associations due to areal contact or unresolved genealogy, we looked for associations that could be detected independently within each linguistic macroarea. Thus, we restricted our attention to associations that passed all these statistical controls and that showed a bias consistent with the worldwide trend in at least three macroareas, with no single area showing a bias in the opposite direction.
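The areal criterion reduces to a simple decision rule. The sketch below assumes that the per-macroarea tests have already been summarized as the sign of any detected bias, with 0 when none was detected. The function name is hypothetical.

```python
def passes_areal_filter(global_sign, area_signs, min_areas=3):
    """Keep an association if at least `min_areas` macroareas show a bias in
    the worldwide direction and none shows the opposite bias.

    global_sign: +1 for a positive association, -1 for a negative one.
    area_signs: dict macroarea -> +1, -1, or 0 (no detectable bias).
    """
    agreeing = sum(1 for s in area_signs.values() if s == global_sign)
    opposing = sum(1 for s in area_signs.values() if s == -global_sign)
    return agreeing >= min_areas and opposing == 0
```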

It should be noted that the overall testing scheme is conservative and is likely to have a high false-negative rate. A further factor works against our analyses. The core set of concepts we use was originally gathered for its exceptional phylogenetic persistence and resistance to borrowing, which makes these concepts less likely to be adapted to potential functional biases that might underlie specific sound–meaning associations. Moreover, it is not clear a priori whether the granularity of our phonetic descriptions is sufficiently fine to capture widespread sound–meaning relations. For instance, the oppositions between voiced and unvoiced consonants and between rounded and unrounded vowels have been suggested to matter for sound symbolism (22, 32), but each feature pair is usually conflated under a single symbol in the database. For these reasons, the associations found in our analyses should be regarded as a lower-bound estimate of the presence of nonarbitrariness in sound–meaning pairings.

SI Materials and Methods

Positional Test.

For each language and signal, we simulate random positions of the relevant signal-associated symbol, drawn from all of the available positions in the word according to the consonant/vowel distinction. Concretely, we calculate the number of times the phone is initial when its simulated counterpart is not, averaging genealogically and respecting the vowel and consonant template of each word. We then compare this quantity in the original word list against n = 1,000 simulations and retain those cases in which the original bias is larger than in 95% of the simulated cases. These results can be observed in Table S6.
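A simplified sketch of the positional test follows. It compares the overall (genealogically balanced) rate of word-initial occurrence with simulated rates, rather than counting paired cases as described above. It also assumes that the seven ASJP vowels are the symbols i, e, E, 3, a, u, o. All names are hypothetical.

```python
import random
from collections import defaultdict

VOWELS = set("ieE3auo")  # assumed: the seven ASJP vowel symbols


def initial_rate(items, rng=None):
    """Genealogically balanced share of words in which the symbol is initial.

    items: (lineage, word, symbol) triples with symbol occurring in word.
    With rng, the symbol's position is instead drawn at random among the
    word's slots of the same class (vowel or consonant).
    """
    per_lineage = defaultdict(list)
    for lineage, word, symbol in items:
        if rng is None:
            initial = word[0] == symbol
        else:
            is_vowel = symbol in VOWELS
            slots = [k for k, ch in enumerate(word) if (ch in VOWELS) == is_vowel]
            initial = rng.choice(slots) == 0
        per_lineage[lineage].append(1.0 if initial else 0.0)
    means = [sum(v) / len(v) for v in per_lineage.values()]
    return sum(means) / len(means)


def positional_test(items, n_sim=1000, seed=0):
    """Flag an initial-position bias when the observed rate exceeds the
    simulated rate in more than 95% of the simulations."""
    rng = random.Random(seed)
    observed = initial_rate(items)
    simulated = [initial_rate(items, rng) for _ in range(n_sim)]
    share_exceeded = sum(s < observed for s in simulated) / n_sim
    return observed, share_exceeded > 0.95
```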

Areal and Population Test.

For each positive signal, we calculated great-circle distances, that is, the length in kilometers of the shortest path between two points on the surface of the Earth. Each distance was measured from a language having both the relevant symbol and concept (but not necessarily the signal) to its nearest language from a different lineage that has the positive signal; we denote this distance $d_{nn}$. The hypothesis is that a small distance from a language that has a signal will influence the likelihood of signal presence in a given language. Only signals belonging to the group of 28–40 better attested concepts were used for the analysis, and only one dialect per language was chosen. Extinct languages were excluded from the analyses.
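A sketch of the distance computation follows. It assumes languages are given as dictionaries with hypothetical keys `lat`, `lon`, `lineage`, and `has_signal`. It also approximates the Earth by a sphere of mean radius 6,371 km (the haversine formula), so a geodesic on an ellipsoid would differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_signal_distance(lang, others):
    """d_nn: distance from `lang` to the nearest language of a different
    lineage that has the signal (infinity if there is none)."""
    return min(
        (great_circle_km(lang["lat"], lang["lon"], o["lat"], o["lon"])
         for o in others
         if o["has_signal"] and o["lineage"] != lang["lineage"]),
        default=math.inf,
    )
```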

For the testing, we used a logistic mixed-effects model:

$$\operatorname{logit}\big(E[\text{signal presence}]\big) = \alpha + \big(\beta_{d_{nn}} + \beta_{d_{nn}}^{\text{lineage}}\big)\log(1 + d_{nn}) + \beta_{\text{pop}}\log(\text{population}) + \alpha^{\text{lineage}},$$

where the superscripted coefficients ($\beta_{d_{nn}}^{\text{lineage}}$ and $\alpha^{\text{lineage}}$) are random effects structured according to lineage. Lineage enters as a random intercept to account for the varying baseline presence of the signals within lineages. It also enters as a random slope to capture the fact that lineages have spread across the globe at different rates. The logarithmic transforms reduce the effect of population and distance outliers. P values were estimated through an asymptotic likelihood ratio test. Apart from the estimated coefficients, we calculated the genealogically balanced mean difference in the probability of having a signal between two reference points, varying one predictor at a time. For population, the difference was calculated between fixing all languages' populations at 10,000 individuals and at a single individual. For $d_{nn}$, it was calculated between 1,000 km and 0 km. The value of 1,000 km is roughly the maximum radius of linguistic areas as defined in AUTOTYP (56), and 0 km corresponds to the situation where both languages are spoken at the same place. The results can be observed in Table S5.
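The sketch below makes the reference-point comparison concrete. It evaluates the model's linear predictor at two values of $d_{nn}$ for a single language, holding population fixed. The coefficient values are invented for illustration, natural logarithms are assumed, and the genealogical averaging over languages is omitted.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def signal_probability(alpha, beta_dnn, beta_pop, dnn_km, population,
                       lineage_intercept=0.0, lineage_slope=0.0):
    """Predicted probability of signal presence under the logistic model,
    with the lineage random intercept and slope supplied explicitly."""
    eta = (alpha
           + (beta_dnn + lineage_slope) * math.log(1 + dnn_km)
           + beta_pop * math.log(population)
           + lineage_intercept)
    return inv_logit(eta)


# Hypothetical coefficients, for illustration only.
coef = dict(alpha=0.5, beta_dnn=-0.2, beta_pop=0.0)
near = signal_probability(**coef, dnn_km=0, population=10_000)
far = signal_probability(**coef, dnn_km=1_000, population=10_000)
print(f"P(signal) at 0 km: {near:.3f}, at 1,000 km: {far:.3f}")
```

With a negative distance coefficient, the predicted probability at 1,000 km is lower than at 0 km, which is the direction the hypothesis above anticipates.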

Word Similarity Test.

Ideally, a proper phylogenetic test in the context of language history would comprise two things: some kind of data carrying a phylogenetic signal (like cognate sets or collections of regular sound changes), and a sound evolutionary model that would lead to a tree or a distribution of trees. Unfortunately, such trees exist for only a handful of language families (57, 58). Instead, we approach the question of both the phylogenetic stability and the ancestry of signals by analyzing word form similarity, which serves as a proxy for cognacy. Suppose signals do render words less prone to change and are prehistoric vestiges. Then, after controlling for concept, symbol, and lineage, we would expect the similarity among words to be predicted by signals.

The distance between words used here is the Levenshtein distance, which has found several uses in linguistics and often correlates with perceptual, processing, and other meaningful lexical distances (59, 60). The Levenshtein distance $LD(x,y)$ between strings x and y is defined as the minimum number of character substitutions, insertions, or deletions necessary to make the two strings identical. For instance, "Zultus" and "sulus" mean star in Uyghur and Sakha (two Turkic languages), respectively. They have a Levenshtein distance of 2: the substitution of "Z" by "s" and the deletion of the "t" that the Sakha word lacks. The normalized Levenshtein distance is simply

$$l = \frac{LD(x,y)}{\max(|x|,|y|)}.$$
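The distance and its normalized version can be sketched with the standard dynamic-programming recurrence. Comparisons are case-sensitive, which matters because ASJP distinguishes symbols such as e and E.

```python
def levenshtein(x, y):
    """Minimum number of substitutions, insertions, and deletions turning x into y."""
    prev = list(range(len(y) + 1))  # distances from "" to prefixes of y
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,                # delete cx
                            curr[j - 1] + 1,            # insert cy
                            prev[j - 1] + (cx != cy)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(x, y):
    """LD(x, y) / max(|x|, |y|); defined as 0 for two empty strings."""
    longest = max(len(x), len(y))
    return levenshtein(x, y) / longest if longest else 0.0


print(levenshtein("Zultus", "sulus"))                       # 2
print(round(normalized_levenshtein("Zultus", "sulus"), 3))  # 0.333
```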

We considered every family with at least six languages and every combination of concept and symbol. For each, we calculated the Levenshtein distance between all members of two groups of word pairs for the concept. The first group contains pairs in which both words contain the symbol of the combination (signal-sharing). The second contains pairs that share at least one symbol, but not the symbol relevant for the combination (non–signal-sharing). For instance, consider a family with three languages having the forms ana, ena, and ete for the concept "rock," and the combination rock-n. The two groups are then (ana,ena) and (ena,ete). Families with fewer than three distances in either group were excluded from the analysis.
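The grouping step for a single family and combination can be sketched as follows, reproducing the rock-n example. The function names are hypothetical, and forms are assumed to be strings of single-character symbols.

```python
from itertools import combinations


def build_groups(forms, symbol):
    """Split all pairs of a family's forms for one concept into
    signal-sharing pairs (both contain `symbol`) and non-signal-sharing
    pairs (share some symbol, but not `symbol` itself)."""
    sharing, non_sharing = [], []
    for a, b in combinations(forms, 2):
        common = set(a) & set(b)
        if symbol in common:
            sharing.append((a, b))
        elif common:
            non_sharing.append((a, b))
    return sharing, non_sharing


def family_is_usable(sharing, non_sharing, min_pairs=3):
    """Families with fewer than `min_pairs` distances in a group are excluded."""
    return len(sharing) >= min_pairs and len(non_sharing) >= min_pairs
```

In the toy example, the pair (ana, ete) shares no symbol and is therefore left out of both groups.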

To summarize this information, we calculated, for each family, the probability that a distance drawn from the signal-sharing group is smaller than one drawn from the non–signal-sharing group, $\Pr(l_s < l_{-s})$. The larger this quantity, the more reliable an estimator of word form similarity the association is.
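Enumerating every pair of distances gives this probability exactly. The sketch below treats ties as unfavourable, an assumption the text does not settle. The quantity is closely related to the Mann–Whitney U statistic divided by the number of pairs.

```python
def prob_signal_closer(sharing_dists, non_sharing_dists):
    """Pr(l_s < l_-s): share of all (signal-sharing, non-signal-sharing)
    distance pairs in which the signal-sharing distance is strictly smaller."""
    favourable = sum(1 for s in sharing_dists for t in non_sharing_dists if s < t)
    return favourable / (len(sharing_dists) * len(non_sharing_dists))


print(prob_signal_closer([0.2, 0.4], [0.3, 0.5, 0.6]))  # 5 of the 6 pairs are favourable
```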

Then we implemented the following β regression mixed model with a logistic link function and a constant precision parameter:

$$\operatorname{logit}\big(E[\Pr(l_s < l_{-s})]\big) = \sum_{i \in \text{concepts}} \beta_i I_i + \sum_{j \in \text{symbols}} \beta_j I_j + \alpha_{\text{signalhood}} + \alpha^{\text{lineage}},$$

where the indexes i and j run over the sets of concepts and symbols, respectively, and $I_i$ and $I_j$ are the corresponding indicator variables. The coefficient "signalhood" indicates whether the combination of concept and symbol is found in Table S2. "Signalhood" was coded as a single level common to all individual positive signals. $\alpha^{\text{lineage}}$ stands for a random intercept according to lineage. A few values of $\Pr(l_s < l_{-s})$ were identical to 1, accounting for less than 0.5% of the data. To cope with them, we applied the transformation $t(x) = (x(N-1) + 0.5)/N$ to the values (61). Lineages with a large number of distance pairs to compare provide more robust evidence. To account for this, we weighted each observation by the logarithm of the number of such pairs involved; however, the results did not differ considerably from the unweighted case. Overall, the model quality is heavily dominated by lineage: the explained deviance is 86% with the lineage random effect and 3% without it.
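The boundary transformation is straightforward to apply before fitting. In this sketch, N is taken to be the number of observations, as is usual for this transformation; the text does not define N explicitly. The function name is hypothetical.

```python
def squeeze(values, n=None):
    """Map proportions in [0, 1] into the open interval (0, 1) with
    t(x) = (x (N - 1) + 0.5) / N, so that a beta regression can use them.
    N defaults to the number of values supplied."""
    n = len(values) if n is None else n
    return [(x * (n - 1) + 0.5) / n for x in values]
```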

Strong Worldwide Associations

Our analysis detected 74 (positive and negative) sound–meaning associations, involving 30 concepts and 23 symbols. All of these associations are referred to as “signals” (Table 1; more detail is provided in Tables S2 and S3).

Table 1.

Summary of signals found in the ASJP database

Table S2.

Complete list of positive signals found in the ASJP database

Table S3.

Complete list of negative signals found in the ASJP database

Signals will be described in terms of the most relevant information about them: the frequency of the symbol in the words corresponding to the concept (p), the ratio between that frequency and the frequency in other words (RR), the number of lineages that were analyzed for the global association (nl), and the ratio between the number of areas where the association was independently found and the total number of tested areas (as/at).
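The first two descriptors can be illustrated with a deliberately simplified computation that pools all word lists. The values reported below are genealogically balanced, so this sketch only shows what p and RR measure, not how they were estimated. Its function name and data layout are hypothetical.

```python
def frequency_and_ratio(word_lists, concept, symbol):
    """p: share of words for `concept` that contain `symbol`;
    RR: p divided by the share of all other words containing `symbol`
    (undefined, raising ZeroDivisionError, if no other word contains it).

    word_lists: list of dicts mapping concept -> ASJP string.
    """
    target = [w[concept] for w in word_lists if concept in w]
    others = [x for w in word_lists for c, x in w.items() if c != concept]
    p = sum(symbol in x for x in target) / len(target)
    baseline = sum(symbol in x for x in others) / len(others)
    return p, p / baseline
```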

Some concepts are associated with more than one signal, and these signals are expected to be correlated. Across languages, there are often preferences or restrictions on the co-occurrence of symbols within one and the same word, for either diachronic or synchronic phonotactic reasons. As an example, high front vowels are known to trigger palatalization (33), so it is not surprising that the voiceless palato-alveolar affricate C appears with i in the signals of small. In a set of testable pairs of signals (Materials and Methods), signals sharing a concept are significantly associated about 41% of the time, against only 8% for signals involving different concepts (Table S4).

Table S4.

Dependencies between signals involving the same concept

The signals found in our analysis show a mixture of well-known and new associations. In line with the considerable literature on magnitude sound symbolism, the concept small was found to be associated with the high front vowel i (RR = 1.58, p = 0.61, nl = 78, as/at = 3/5). This is consistent with findings linking vowel quality (height) and size (14, 17). Small was also associated with the palatal consonant C (RR = 5.12, p = 0.41, nl = 61, as/at = 3/4), again in agreement with previous work (14, 24).

We also observed a strong association between round and r sounds (RR = 2.48, p = 0.37, nl = 56, as/at = 4/5). Most recent research has emphasized the role of consonants in shape–sound associations like this one (34, 35). However, the usual hypothesis in this direction has concerned the correlation between vowel roundedness and round objects (11). That association appears as a tendency in our analyses but does not reach the minimum statistical threshold established above. Both small and round have been linked to the phenomenon of cross-modal mapping (10, 13, 36). Another property word, full, is endowed with a pair of signals involving voiced (RR = 1.91, p = 0.22, nl = 213, as/at = 4/6) and unvoiced bilabial stops (RR = 2.11, p = 0.13, nl = 231, as/at = 5/6).

Some of the strongest signals found correspond to body parts. Tongue was very strongly associated with the lateral "l" (RR = 2.77, p = 0.41, nl = 280, as/at = 6/6). It was also associated with the mid and low front vowels e (RR = 1.54, p = 0.11, nl = 322, as/at = 5/6) and E (RR = 1.73, p = 0.11, nl = 164, as/at = 4/6). Nose was found to be associated most strongly with the alveolar nasal n (RR = 1.47, p = 0.35, nl = 334, as/at = 4/6) and the high back vowel u (RR = 1.38, p = 0.35, nl = 325, as/at = 4/6). The link between nose and nasality has been noted previously (37). In particular, it has been cited in support of the conjecture that terms for body parts involved in phonation make use of the distinctive qualities provided by the relevant organ (21).

Breasts was associated with the bilabial nasal consonant m (RR = 1.63, p = 0.32, nl = 320, as/at = 4/6) and the high back vowel u (RR = 1.46, p = 0.37, nl = 317, as/at = 4/6). Similar associations were found in the nursery terms for mother, a concept with which breasts often colexifies. It has been suggested that this might be due to the mouth configuration of suckling babies or to the sounds feeding babies produce (38, 39).

This study lends support to a number of associations that were either elicited in experiments or conjectured on the basis of a much smaller number of languages. However, it also provides telling negative evidence on others. Alongside the association between high front vowels and the concept of small, there have been reports of a connection between low back vowels and the notion of big (22). However, neither big (nl = 73) nor large (nl = 74) showed any relevant signature of association with o in our sample at the global level. Similarly, an analogous front/back vowel opposition has been proposed to hold between proximal and distal pronouns. The purported explanation is that proximal referents tend to be small, whereas distal referents are usually large (22). The concepts this (nl = 71) and that (nl = 74), however, do not show any association with i and o, respectively.

Origins and Nature of the Associations

As discussed in the previous sections, there are multiple theories that attempt to elucidate why humans find that some sounds are more convenient or salient in association with certain meanings. How these hypothesized mechanisms lead to the widespread biases in vocabularies we find here is a complex question that is unlikely to be fully answered by the inspection of wordlists. Nonetheless, we can attempt to evaluate some of the potential consequences of those theories given the coarse level of detail of our data.

Functional advantages might increase the likelihood of signals being borrowed across languages in contact with one another, thus producing spatial diffusion patterns (39) (Fig. 2). The existence of opposing factors, however, obscures definitive inferences in this direction. Basic vocabulary items are particularly resistant to borrowing, but unresolved genealogy involving nearby languages would be confounded with borrowing. Along the same lines, large populations have been claimed to be more efficient at gaining and retaining nonarbitrary sound–meaning associations that have a potential functional value (39). This claim is coherent with recent evidence from some Austronesian languages showing that larger populations gain new words at a faster rate (40).

Fig. 2.

Competing configurations of the spatial distribution of the tested languages. Blue and fuchsia dots represent languages with and without a specific signal, respectively. In the panel to the Left, the likelihood of a language having the signal is correlated with its geographical distance to its nearest neighbor, and on the Right, there is no spatial structure.

We assessed whether present-day log population size and the log distance to the nearest genealogically unrelated language bearing the (positive) signal predict signal presence, using a mixed-effects logistic model (Table S5 and SI Materials and Methods). At α = 0.05, log population was significant in about one-third of the cases, but the effect was small and as often positive as negative, which rules out a consistent role for population. Only one-fifth of the signals were sensitive to the distance to the nearest neighbor with the signal, and in all of these cases the effect was in the direction predicted by our model. On average, compared with the case in which a language and its nearest genealogically unrelated signal-bearing neighbor are spoken in exactly the same place, the probability of finding the signal in the language drops by 28%.
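
The exact distance computation is given in SI Materials and Methods, not here. As a minimal sketch of the distance predictor, assume each language is a single point with latitude and longitude, distances are great-circle (haversine) distances, and the predictor is log(1 + d) in kilometres so that co-located languages yield 0 rather than log 0. The `Language` record and the function names are hypothetical; the sketch only builds the predictor and does not fit the mixed-effects model that produced the 28% figure.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Language:
    name: str
    lineage: str      # genealogical group the language belongs to
    lat: float        # decimal degrees
    lon: float        # decimal degrees
    has_signal: bool  # does the word for the concept carry the signal?


EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Language, b: Language) -> float:
    """Great-circle distance between two language locations, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def log_distance_to_signal_neighbor(target: Language,
                                    sample: List[Language]) -> Optional[float]:
    """log(1 + km) to the nearest signal-bearing language of a different lineage.

    Returns None when no such language exists. The +1 offset is an assumption.
    """
    distances = [
        haversine_km(target, other)
        for other in sample
        if other is not target and other.has_signal and other.lineage != target.lineage
    ]
    return math.log1p(min(distances)) if distances else None
```

Restricting candidates to other lineages mirrors the text's "genealogically unrelated" condition, so that shared inheritance is not counted as spatial proximity.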

Table S5.

Spatial and population analysis

From a historical perspective, it has been suggested that sound–meaning associations might be evolutionarily preserved features of spoken language (41), potentially hindering regular sound change (17). Furthermore, it has been claimed that widespread sound–meaning associations might be vestiges of one or more large-scale prehistoric protolanguages (16). Tellingly, some of the signals found here feature prominently in reconstructed “global etymologies” (42, 43) that have been used for deep phylogeny inference (44). If signals are inherited from an ancestral language spoken in remote prehistory, we might expect them to be distributed similarly to inherited, cognate words; that is, their distribution should to a large extent be congruent with the nodes defining their linguistic phylogeny (see Fig. 3 for illustration).

Fig. 3.

Genealogical trees of languages where leaves are words for specific referents. In the figure to the Left, cognate classes (depicted as different shapes) are associated with signal presence (blue shapes), whereas to the Right there is no such correspondence.

A direct evaluation of this hypothesis is infeasible because etymological dictionaries are lacking for all but a few families. However, it can be tested indirectly, given that cognate words are expected to be more similar to one another than noncognates (45). We investigated whether the presence of the signal-bearing symbol was a better indicator of overall form similarity between words than other shared symbols, using a beta mixed-effects regression model that distinguishes the effects of symbols, concept, and lineage (SI Materials and Methods). The model is heavily dominated by the effect of lineage, and signal presence, although significant, has a negligible effect in the direction opposite to that predicted: the genealogically balanced average effect is a decrease of less than 0.5% in similarity for words sharing a signal-related symbol compared with words sharing some other symbol.
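
The similarity measure actually used is specified in SI Materials and Methods. As an illustration only, the sketch below assumes a length-normalized Levenshtein similarity (the kind of edit-distance measure discussed in refs. 59 and 60) and the Smithson–Verkuilen transformation (ref. 61), which moves scores of exactly 0 or 1 into the open interval that beta regression requires. Function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - LD / longer length: identical words score 1, maximally different ones 0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def squeeze(y: float, n: int) -> float:
    """Map y in [0, 1] into (0, 1) for beta regression: (y * (n - 1) + 0.5) / n."""
    return (y * (n - 1) + 0.5) / n
```

For example, two hypothetical ASJP-style forms `nam` and `nom` differ by one substitution and score 2/3.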

Consistency in word position is important for establishing cognacy (45, 46). Further support for the idea that signals are not residues of deep history comes from the position within the word in which they occur, in particular whether they show a clear word-initial bias. Overall, we find that signals have no consistent cross-linguistic preference or dispreference for word-initial position beyond well-established cross-linguistic phonotactic patterns, such as the avoidance of liquids and the prevalence of dorsal and labial stops in word-initial position (47, 48) (SI Materials and Methods and Table S6).
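
A simple way to quantify a word-initial bias for one language is sketched below, assuming each character of an ASJP-transcribed word counts as one symbol. The function names are hypothetical, and unlike the analysis behind Table S6 this fragment does not pool languages or control for general phonotactic preferences.

```python
from typing import List, Optional


def initial_share(words: List[str], symbol: str) -> Optional[float]:
    """Among the words that contain `symbol`, the fraction that begin with it."""
    containing = [w for w in words if symbol in w]
    if not containing:
        return None
    return sum(w.startswith(symbol) for w in containing) / len(containing)


def initial_bias(concept_words: List[str], other_words: List[str],
                 symbol: str) -> Optional[float]:
    """Positive: the symbol is word-initial more often in the concept's words
    than in the rest of the vocabulary; None if either side lacks the symbol."""
    a = initial_share(concept_words, symbol)
    b = initial_share(other_words, symbol)
    return None if a is None or b is None else a - b
```

Comparing against the rest of the same word list, rather than against an absolute rate, keeps language-wide phonotactic habits from masquerading as a signal-specific bias.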

Table S6.

Analysis of word-initial position bias

These results suggest that, although the presence of signals in some families may be symptomatic of a particularly pervasive cognate set, this is not the usual case. Hence, the explanation for the observed prevalence of sound–meaning associations across the world has to be found elsewhere (49).

Conclusion

We have demonstrated that a substantial proportion of words in the basic vocabulary are biased to carry or to avoid specific sound segments, both across continents and across linguistic lineages. Given that our analyses suggest that neither phylogenetic persistence nor areal dispersal is likely to explain the widespread presence of these signals, we are left with the alternative that the signals are due to factors common to our species, such as sound symbolism, iconicity, communicative pressures, or synesthesia. We expect future research to further elucidate the role and interaction of these factors in driving the observed sound–meaning association biases, and to extend the scope of our findings to a broader portion of the vocabulary.

The outcome of our analyses has consequences for historical-comparative linguistics, where it has been suggested that a small set of ultraconserved words is particularly useful for establishing ancient genealogical relations beyond the limits of the comparative method (44). However, some of these words are involved in the signals discovered here: we is associated with the alveolar nasal, hear with the velar nasal, and ash with the vowel u. Thus, proposals of far-reaching etymologies based on words of similar form and meaning should be accompanied by an evaluation of whether the observed lexical similarities might have resulted from the kinds of signal discussed in this paper rather than from common inheritance. More generally, even though it is unclear whether signals emerge during the invention of lexical roots or during their historical development, our findings have implications for the study of the dynamics of lexical phonology.

In summary, our results provide insights into the constraints that affect how we communicate, suggesting that despite the immense flexibility of the world’s languages, some sound–meaning associations are preferred by culturally, historically, and geographically diverse human groups.

Materials and Methods

Basic Vocabulary Word Lists.

The dataset used for this study is drawn from version 16 of the Automated Similarity Judgment Program (ASJP) database (25). ASJP comprises 6,895 word lists from around 62% of the world’s languages, covering 85% of families, isolates, and unclassified languages [using the Ethnologue (50) for these statistics]. After removing artificial languages, pidgins, creoles, and varieties whose ISO 639-3 code cannot be confirmed, 6,447 word lists remain, corresponding to 4,298 different languages and 359 lineages. The database was constructed not for the specific purpose of studying sound symbolism but for identifying genealogical relations among languages. For this reason, it generally covers only the 40-item subset of the 100-item Swadesh list (51) whose items are assumed to remain stable as languages diverge into different lineages over time (52). Of these word lists, 328 additionally contain the remaining 60 Swadesh list items.

Words are rendered in a unified transcription system, which facilitates cross-linguistic comparison but ignores phonetic details such as vowel length, nasalization, tone, and retroflexion. Vowel quality distinctions are merged into seven categories (high front, mid front, low front, high and mid central, low central, high back, and mid and low back) (see ref. 53 for a discussion of the system).
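
The seven-way vowel merger can be pictured as a lookup table. The sketch below assumes the standard ASJP vowel symbols i, e, E, 3, a, u, and o (the symbols i, o, and u are the ones referred to in the results); the mapping and function name are illustrative, and the real transcription system also has modifier symbols that this fragment simply ignores.

```python
from typing import Set

# Assumed ASJP vowel symbols and the seven categories they stand for.
ASJP_VOWELS = {
    "i": "high front",
    "e": "mid front",
    "E": "low front",
    "3": "high/mid central",
    "a": "low central",
    "u": "high back",
    "o": "mid/low back",
}


def vowel_categories(word: str) -> Set[str]:
    """Vowel categories present in an ASJP-transcribed word.

    Any character that is not one of the seven vowel symbols is ignored.
    """
    return {ASJP_VOWELS[ch] for ch in word if ch in ASJP_VOWELS}
```

Because presence is recorded per word, a form such as `nom` contributes one instance of the mid/low back category regardless of how many times o occurs.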

Each 40-item word list provides translational equivalents, when available, for the following items: blood, bone, breast, come, die, dog, drink, ear, eye, fire, fish, full, hand, hear, horn, I, knee, leaf, liver, louse, mountain, name, new, night, nose, one, path, person, see, skin, star, stone, sun, tongue, tooth, tree, two, water, we, and you (sg). The additional Swadesh list items contained in some of the word lists are as follows: all, ash, bark, belly, big, bird, bite, black, burn, claw, cloud, cold, dry, earth, eat, egg, feather, flesh, fly, foot, give, good, grease, green, hair, head, heart, hot, kill, know, lie, long, man, many, moon, mouth, neck, not, rain, red, root, round, sand, say, seed, sit, sleep, small, smoke, stand, swim, tail, that, this, walk, what, white, who, woman, and yellow.

Associations Between Symbols and Concepts.

The fundamental statistic in our analysis is $p_{ij}$, the maximum-likelihood estimator (i.e., the sample frequency) of the probability that the word for concept $i$ contains at least one instance of symbol $j$, when randomly choosing a lineage, then a language within that lineage, and then a dialect within that language (if any), in that order. Naturally, this calculation is restricted to the set of dialects of languages for which both the concept and the phone are attested (which we will refer to as $S_{ij}$); for each of those sets, this quantity is formally

$$p_{ij} = \frac{1}{|L|}\sum_{k=1}^{|L|}\left(\frac{1}{|L_k|}\sum_{l=1}^{|L_k|}\frac{1}{|L_{kl}|}\sum_{d=1}^{|L_{kl}|}\pi_{ijkld}\right).$$

The sets $L$, $L_k$, and $L_{kl}$ are, respectively, the set of all lineages, the set of languages within lineage $k$, and the set of dialects of language $l$ within lineage $k$. $\pi_{ijkld}$ is a binary variable that takes the value 1 if there is at least one instance of symbol $j$ in the word for concept $i$ in dialect $d$ of language $l$ from lineage $k$ (always within the set $S_{ij}$) and 0 otherwise.
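
To make the three-level weighting concrete, the sketch below computes this genealogically balanced average from nested lineage → language → dialect data. The nested-dictionary layout, the name `balanced_average`, and the example data are illustrative assumptions, not the authors' code.

```python
from statistics import mean
from typing import Dict

# Hypothetical layout: lineage -> language -> dialect -> pi_ijkld (0 or 1).
Nested = Dict[str, Dict[str, Dict[str, int]]]


def balanced_average(data: Nested) -> float:
    """Average over dialects within each language, then over languages within
    each lineage, then over lineages, so every lineage carries equal weight."""
    return mean(
        mean(mean(dialects.values()) for dialects in languages.values())
        for languages in data.values()
    )


example = {
    "A": {"a1": {"d1": 1, "d2": 0}, "a2": {"std": 1}},  # two languages, three dialects
    "B": {"b1": {"std": 0}},                            # one language, one dialect
}
print(balanced_average(example))  # 0.375
```

A plain average over the four dialects in `example` would give 0.5; the balanced average gives 0.375 because lineage B, with a single language, counts as much as lineage A.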

This computation is conservative in that every lineage influences the aggregated statistic to the same extent, regardless of how many languages it contains; on the other hand, it minimizes the bias arising from the genealogical dependence among the languages’ words. To avoid testing associations whose coverage is insufficiently wide, we evaluated only those associations for which $S_{ij}$ comprises at least 10 lineages in each of at least three different macroareas.
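
The coverage requirement can be expressed as a small filter. This sketch assumes each dialect in $S_{ij}$ is summarized as a (lineage, macroarea) pair and that a lineage spoken in several macroareas counts toward each of them; both the input format and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def has_sufficient_coverage(entries: Iterable[Tuple[str, str]],
                            min_lineages: int = 10,
                            min_macroareas: int = 3) -> bool:
    """entries: (lineage, macroarea) pairs for the dialects in S_ij.

    True if at least `min_macroareas` macroareas each contain at least
    `min_lineages` distinct lineages.
    """
    lineages_by_area = defaultdict(set)
    for lineage, area in entries:
        lineages_by_area[area].add(lineage)
    qualifying = sum(len(lins) >= min_lineages for lins in lineages_by_area.values())
    return qualifying >= min_macroareas
```

Counting distinct lineages rather than languages matches the lineage-level weighting of $p_{ij}$: a single large family cannot satisfy the threshold on its own.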

Conversely, for each dialect of each language, we calculated the proportion of words other than the word for concept $i$ that contain symbol $j$; we denote this proportion $\pi_{-ijkld}$ and its genealogically balanced average $p_{-ij}$. These probabilities are used to produce $n_{\mathrm{sim}} = 1{,}000$ Monte Carlo simulations of the presence or absence of symbol $j$ for all of the languages in $S_{ij}$; the set of genealogically balanced averages resulting from these simulations is called $\zeta_{ij}$. The purpose is to compare $\zeta_{ij}$ with $p_{ij}$ to answer the question: does symbol $j$ appear much more (or much less) often in the words for concept $i$ than in randomly picked words from the same languages? The two-tailed P value for a particular concept $i$ and symbol $j$ is then (54)

$$P = \frac{1}{n_{\mathrm{sim}}+1}\left(2\min\left\{\left|\{x\in\zeta_{ij} : x \ge p_{ij}\}\right|,\ \left|\{x\in\zeta_{ij} : x \le p_{ij}\}\right|\right\}+1\right),$$

where $|\cdot|$ denotes the cardinality of a set.
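
A minimal sketch of the Monte Carlo test follows, assuming each dialect's $\pi_{-ijkld}$ is used as an independent Bernoulli probability for the presence of symbol $j$, and that each simulated value in $\zeta_{ij}$ is averaged in the same lineage-balanced way as $p_{ij}$. The data layout and function names are hypothetical, and capping P at 1 is an added safeguard that the text does not state.

```python
import random
from statistics import mean
from typing import Dict, List

# Hypothetical layout: lineage -> language -> dialect -> pi_{-ijkld}, the share of
# words other than concept i's word that contain symbol j in that dialect.
Probs = Dict[str, Dict[str, Dict[str, float]]]


def simulate_balanced(probs: Probs, rng: random.Random) -> float:
    """One simulation: draw presence/absence of j per dialect, then average
    over dialects, languages and lineages (lineage-balanced, like p_ij)."""
    return mean(
        mean(
            mean(1.0 if rng.random() < p else 0.0 for p in dialects.values())
            for dialects in languages.values()
        )
        for languages in probs.values()
    )


def two_tailed_p(p_obs: float, zeta: List[float]) -> float:
    """Empirical two-tailed P value: (2 * min(#upper, #lower) + 1) / (n_sim + 1)."""
    upper = sum(x >= p_obs for x in zeta)
    lower = sum(x <= p_obs for x in zeta)
    # Cap at 1: the raw formula can exceed 1 when many simulated values tie with p_obs.
    return min(1.0, (2 * min(upper, lower) + 1) / (len(zeta) + 1))


def association_p(p_obs: float, probs: Probs, n_sim: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)  # fixed seed for reproducibility
    zeta = [simulate_balanced(probs, rng) for _ in range(n_sim)]
    return two_tailed_p(p_obs, zeta)
```

The +1 terms follow the empirical P value correction of ref. 54, so the smallest attainable P with 1,000 simulations is 1/1,001 rather than 0.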

The large number of tests performed requires control of type I errors. We performed a false discovery rate (FDR) analysis with the FDR rejection threshold fixed at 0.05, which means that, on average, no more than 5% of the associations we retain are expected to be false positives. For this purpose, we used the method described in ref. 55. The basic idea is that the distribution of P values is a mixture of a uniform distribution, corresponding to the tests in which no association beyond chance is present, and a distribution of true positives concentrated near P = 0. The method estimates the mixture proportion of the uniform component from the P values lying between 1 and a threshold that is adjusted to reduce the false nondiscovery rate.
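
The method of ref. 55 (the fdrtool R package) fits the mixture model described above. For illustration only, the sketch below swaps in a simpler, related technique: Storey's λ-based estimate of the null proportion π0, combined with Benjamini–Hochberg step-up adjusted values scaled by π0. It is not a reimplementation of fdrtool, and λ = 0.5 is an assumed tuning value.

```python
from typing import List


def estimate_pi0(pvals: List[float], lam: float = 0.5) -> float:
    """Storey-type estimate of the null (uniform) proportion from P values above lam."""
    if not pvals:
        return 1.0
    return min(1.0, sum(p > lam for p in pvals) / (len(pvals) * (1.0 - lam)))


def qvalues(pvals: List[float], pi0: float = 1.0) -> List[float]:
    """Benjamini-Hochberg step-up adjusted values, scaled by pi0."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        k = order[rank - 1]
        running_min = min(running_min, pi0 * pvals[k] * m / rank)
        q[k] = running_min
    return q


def discoveries(pvals: List[float], fdr: float = 0.05) -> List[int]:
    """Indices of tests retained at the given FDR level."""
    pi0 = estimate_pi0(pvals)
    return [k for k, qv in enumerate(qvalues(pvals, pi0)) if qv <= fdr]
```

Scaling by an estimated π0 below 1 makes the procedure less conservative than plain Benjamini–Hochberg, which is the same motivation behind estimating the uniform component in the text.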

This entire procedure was repeated with a different, less conservative genealogical classification: the one provided by the World Atlas of Language Structures (WALS) (30). For our analysis, we only considered associations that fell below the defined FDR level under both classifications. The estimated proportion of true nulls (the uniform component of the mixture) was around 0.65 under both classifications.

To address possible confounds due to word length, we performed two extra tests on the associations that passed the previous test. First, we repeated the same global test using the Glottolog classification, this time comparing $p_{ij}$ with simulations obtained from words with exactly the same number of symbols in each language (and dialect). Second, for each language (and dialect) in $S_{ij}$, in each of $n = 1{,}000$ independent simulations we sampled without replacement, from the words other than $i$, as many random symbols as there are in word $i$. For each word $i$, this effectively produces a random counterpart equivalent to shuffling all of the symbols of all of the words of a language while keeping word lengths constant. The same association test based on the Glottolog classification was performed over each of these sets. In both procedures, we imposed a stricter cutoff: if any of the simulations yielded a value as extreme as or more extreme than $p_{ij}$, we rejected the association.
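
The second, shuffled-symbol control can be sketched as follows, under the assumptions that each character of an ASJP-transcribed word counts as one symbol and that one word list stands for one language or dialect. The function names are hypothetical, and in the full procedure the random counterparts would be passed through the same genealogically balanced test, which this fragment omits.

```python
import random
from typing import List


def shuffled_counterpart(words: List[str], i: int, rng: random.Random) -> str:
    """Random stand-in for words[i]: len(words[i]) symbols drawn without
    replacement from the pooled symbols of all other words in the list.

    Raises ValueError if the other words hold fewer symbols than words[i].
    """
    pool = [sym for k, w in enumerate(words) if k != i for sym in w]
    return "".join(rng.sample(pool, len(words[i])))


def shuffled_presence_rate(words: List[str], i: int, symbol: str,
                           n: int = 1000, seed: int = 0) -> float:
    """Share of n random counterparts of words[i] that contain `symbol`."""
    rng = random.Random(seed)
    hits = sum(symbol in shuffled_counterpart(words, i, rng) for _ in range(n))
    return hits / n
```

Because each counterpart has exactly the length of the original word, longer words are not credited with extra chances of containing the symbol.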

Finally, for each macroarea with at least 10 independent lineages in $S_{ij}$, we tested for a significant direction of association as in the main association test, computing both the empirical and the random probabilities from the languages of that area alone, except that we flagged each macroarea-specific association with P ≤ 0.1. Note that this does not amount to a softer rejection threshold than in the worldwide case: we keep only associations that display a bias consistent with the worldwide trend in at least one-half of the macroareas, with the extra condition that no macroarea may exhibit a bias in the opposite direction.
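
The macroarea consistency rule can be written as a short predicate. This sketch assumes that the direction of each macroarea-specific bias is encoded as +1 (over-represented) or −1 (avoided), that "one-half of the macroareas" refers to all macroareas meeting the 10-lineage requirement, and that the no-opposite-bias condition applies to flagged (P ≤ 0.1) macroareas. The names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: macroarea -> (P value, direction of bias, +1 or -1),
# for every macroarea with at least 10 lineages in S_ij.
AreaResults = Dict[str, Tuple[float, int]]


def consistent_across_areas(results: AreaResults, global_direction: int,
                            alpha: float = 0.1) -> bool:
    """Keep an association only if at least half of the eligible macroareas show a
    flagged bias in the worldwide direction and none shows a flagged opposite bias."""
    flagged = [d for p, d in results.values() if p <= alpha]
    agreeing = sum(d == global_direction for d in flagged)
    opposing = sum(d == -global_direction for d in flagged)
    return opposing == 0 and agreeing >= len(results) / 2
```

The two conditions work together: the lenient 0.1 flag makes it easier to count agreeing areas, but it also makes it easier to trip the veto on opposite biases.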

To summarize: only associations that satisfied all of the requirements of the overall association test (with the Glottolog and WALS classifications independently), passed the word-length and matched-length tests, and showed a consistent preference in at least one-half of the macroareas were considered “signals.”

Association Between Signals.

As in the previous case, we analyzed sets of languages for which both the concept and the symbol associated with each of a pair of signals were present in at least 10 lineages in each of at least three macroareas. The association between two signals, which we refer to here as A and B, was tested by means of a simple mixed-effects logistic model:

$$\operatorname{logit}\big(\Pr(\text{signal A present})\big) = \alpha_{\text{signal B presence}} + \alpha_{\text{lineage}},$$

where $\alpha_{\text{signal B presence}}$ is the coefficient related to the presence of signal B, and $\alpha_{\text{lineage}}$ is a random coefficient structured according to lineage. We applied a 5% FDR threshold to the results of all pairwise comparisons between signals belonging to the core 40 words; about 12% of the 2,062 cases satisfied this condition. The results for associations between same-concept signals, and the genealogically balanced average effect of the presence of signal B on the presence of signal A, can be found in Table S4.
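
Because the model has no separate intercept, the presence coefficient for signal B can be read as a two-level effect, with one value when B is absent and another when it is present. Assuming that reading, the sketch converts fitted coefficients into predicted probabilities of signal A for a lineage with a given random effect. The coefficient values in the usage example are invented for illustration, not estimates from this study.

```python
import math
from typing import Tuple


def inv_logit(x: float) -> float:
    """Inverse of logit(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probabilities(alpha_b_absent: float, alpha_b_present: float,
                            alpha_lineage: float = 0.0) -> Tuple[float, float]:
    """P(signal A present) when signal B is absent and when it is present,
    for a lineage whose random effect is alpha_lineage (0 = typical lineage)."""
    return (inv_logit(alpha_b_absent + alpha_lineage),
            inv_logit(alpha_b_present + alpha_lineage))


# Illustrative, invented coefficients: B's presence raises the log-odds of A by log(3).
p_without_b, p_with_b = predicted_probabilities(0.0, math.log(3))
```

With these invented values, the odds of A triple when B is present, which moves the probability from 0.5 to 0.75 for a typical lineage.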

Acknowledgments

We acknowledge the comments of Bernard Comrie, Brent O. Berlin, Stephen C. Levinson, Mark Dingemanse, Russell Gray, and Eric W. Holman. We also thank Jeremy Collins and Stefany Moreno for assistance with other aspects of the manuscript. H.H.’s research was made possible thanks to the financial support of the Language and Cognition Department at the Max Planck Institute for Psycholinguistics, Max-Planck-Gesellschaft, and European Research Council’s Advanced Grant 269484 (“INTERACT”) to Stephen C. Levinson. S.W.’s research was supported by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/European Research Council Grant Agreement 295918 and by a subsidy of the Russian Government to support the Program of Competitive Development of Kazan Federal University. D.E.B.’s research was funded by the Max Planck Institute International Research School.

Footnotes

  • ¹To whom correspondence should be addressed. Email: damianblasi{at}gmail.com.
  • Author contributions: D.E.B., S.W., H.H., and M.H.C. designed research; D.E.B. performed research; D.E.B. analyzed data; and D.E.B., S.W., H.H., P.F.S., and M.H.C. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605782113/-/DCSupplemental.

References

  1. Saussure FD (1916) Cours de Linguistique Générale, eds Bally C, Sechehaye A, Riedlinger A (Payot, Paris).
  2. Hockett CF (1960) The origin of speech. Sci Am 203:89–96.
  3. Pinker S (1999) Words and Rules: The Ingredients of Language (Perseus Books, New York).
  4. Dingemanse M (2012) Advances in the cross-linguistic study of ideophones. Lang Linguist Compass 6(10):654–672.
  5. Dingemanse M, Blasi DE, Lupyan G, Christiansen MH, Monaghan P (2015) Arbitrariness, iconicity, and systematicity in language. Trends Cogn Sci 19(10):603–615.
  6. Monaghan P, Shillcock RC, Christiansen MH, Kirby S (2014) How arbitrary is language? Philos Trans R Soc Lond B Biol Sci 369(1651):20130299.
  7. Bergen BK (2004) The psychological reality of phonaesthemes. Language 80:290–311.
  8. Monaghan P, Christiansen MH, Chater N (2007) The phonological-distributional coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognit Psychol 55(4):259–305.
  9. Köhler W (1929) Gestalt Psychology (Liveright, New York).
  10. Ramachandran VS, Hubbard EM (2001) Synaesthesia—a window into perception, thought and language. J Conscious Stud 8(12):3–34.
  11. Maurer D, Pathman T, Mondloch CJ (2006) The shape of boubas: Sound-shape correspondences in toddlers and adults. Dev Sci 9(3):316–322.
  12. Ludwig VU, Adachi I, Matsuzawa T (2011) Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proc Natl Acad Sci USA 108(51):20661–20665.
  13. Cuskley C, Kirby S (2013) Synaesthesia, cross-modality and language evolution. Oxford Handbook of Synaesthesia, eds Simner J, Hubbard EM (Oxford Univ Press, Oxford, UK), pp 869–907.
  14. Hinton L, Nichols J, Ohala JJ, eds (2006) Sound Symbolism (Cambridge Univ Press, New York).
  15. Berlin B (2006) The first congress of ethnozoological nomenclature. J R Anthropol Inst 12(s1):S23–S44.
  16. Imai M, Kita S (2014) The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philos Trans R Soc Lond B Biol Sci 369(1651):20130298.
  17. Nuckolls JB (1999) The case for sound symbolism. Annu Rev Anthropol 28:225–252.
  18. Reilly J, Hung J, Westbury C (2016) Non arbitrariness in mapping word form to meaning: Cross-linguistic formal markers of word concreteness. Cogn Sci, doi:10.1111/cogs.12361.
  19. Monaghan P, Christiansen MH, Fitneva SA (2011) The arbitrariness of the sign: Learning advantages from the structure of the vocabulary. J Exp Psychol Gen 140(3):325–347.
  20. Wichmann S, Holman EW, Brown CH (2010) Sound symbolism in basic vocabulary. Entropy 12:844–858.
  21. Urban M (2011) Conventional sound symbolism in terms for organs of speech: A cross-linguistic study. Folia Linguist 45(1):199–214.
  22. Johansson N, Zlatev J (2013) Motivations for sound symbolism in spatial deixis: A typological study of 101 languages. Public J Semiotics 5(1):3–20.
  23. Dingemanse M, Torreira F, Enfield NJ (2013) Is “huh?” a universal word? Conversational infrastructure and the convergent evolution of linguistic items. PLoS One 8(11):e78273.
  24. Haynie H, Bowern C, Lapalombara H (2014) Sound symbolism in the languages of Australia. PLoS One 9(4):e92852.
  25. Wichmann S, et al. (2013) The ASJP Database (Version 16). Available at asjp.clld.org/. Accessed July 2, 2013.
  26. Tadmor U, Haspelmath M, Taylor B (2010) Borrowability and the notion of basic vocabulary. Diachronica 27(2):226–246.
  27. Hammarström H, Donohue M (2014) Some principles on the use of macro-areas in typological comparison. Lang Dynam Change 4(1):167–187.
  28. Napoli DJ, Sanders N, Wright R (2014) On the linguistic effects of articulatory ease, with a focus on sign languages. Language 90(2):424–456.
  29. Vitevitch MS, Luce PA (2016) Phonological neighborhood effects in spoken word perception and production. Annu Rev Linguistics 2:75–94.
  30. Haspelmath M, Dryer MS, Gil D, Comrie B (2005) The World Atlas of Language Structures (Oxford Univ Press, Oxford, UK).
  31. Nordhoff S, Hammarström H, Forkel R, Haspelmath M (2013) Glottolog 2.1. Available at glottolog.org. Accessed July 2, 2013.
  32. Lockwood G, Dingemanse M (2015) Iconicity in the lab: A review of behavioral, developmental, and neuroimaging research into sound-symbolism. Front Psychol 6:1624.
  33. Bateman N (2011) On the typology of palatalization. Lang Linguist Compass 5(8):588–602.
  34. Nielsen A, Rendall D (2011) The sound of round: Evaluating the sound-symbolic role of consonants in the classic Takete-Maluma phenomenon. Can J Exp Psychol 65(2):115–124.
  35. Fort M, Martin A, Peperkamp S (2015) Consonants are more important than vowels in the Bouba-kiki effect. Lang Speech 58(Pt 2):247–266.
  36. Bankieris K, Simner J (2015) What is the link between synaesthesia and sound symbolism? Cognition 136:186–195.
  37. Greenberg JH, Ferguson CA, Moravcsik EA, eds (1978) Universals of Human Language: Phonology (Stanford Univ Press, Stanford, CA), Vol 2.
  38. Jakobson R (1960) Why “mama” and “papa.” Perspectives in Psychological Theory, Dedicated to Heinz Werner, eds Kaplan B, Wapner S (International Universities Press, New York), pp 124–134.
  39. Traunmüller H (1994) Sound symbolism in deictic words. Tongues and Texts Unlimited. Studies in Honour of Tore Jansson on the Occasion of his Sixtieth Anniversary, eds Janson T, Aili H, af Trampe P (Stockholms Universitet, Institutionen för Klassiska Språk, Stockholm), pp 213–234.
  40. Bromham L, Hua X, Fitzpatrick TG, Greenhill SJ (2015) Rate of language evolution is affected by population size. Proc Natl Acad Sci USA 112(7):2097–2102.
  41. Nygaard LC, Cook AE, Namy LL (2009) Sound to meaning correspondences facilitate word learning. Cognition 112(1):181–186.
  42. Ruhlen M (1994) On the Origin of Languages: Studies in Linguistic Taxonomy (Stanford Univ Press, Stanford, CA).
  43. Starostin SA, Bronnikov Y (2009) Languages of the World Etymological Database, Part of the Tower of Babel–Evolution of Human Language Project. Available at starling.rinet.ru/cgi-bin/main.cgi?flags=eygtnnl. Accessed May 25, 2015.
  44. Pagel M, Atkinson QD, Calude AS, Meade A (2013) Ultraconserved words point to deep language ancestry across Eurasia. Proc Natl Acad Sci USA 110(21):8471–8476.
  45. Steiner L, Stadler PF, Cysouw M (2011) A pipeline for computational historical linguistics. Lang Dynam Change 1(1):89–127.
  46. Jäger G (2015) Support for linguistic macrofamilies from weighted sequence alignment. Proc Natl Acad Sci USA 112(41):12752–12757.
  47. Proctor MI (1995) Gestural characterization of a phonological class: The liquids. PhD dissertation (Yale University, New Haven, CT).
  48. MacNeilage PF, Davis BL (2000) On the origin of internal structure of word forms. Science 288(5465):527–531.
  49. Campbell L, Poser WJ (2008) Language Classification (Cambridge Univ Press, New York).
  50. Lewis P, Simons G, Fennig C (2013) Ethnologue: Languages of the World (SIL International, Dallas, TX).
  51. Swadesh M (1955) Towards greater accuracy in lexicostatistic dating. Int J Am Linguist 21(2):121–137.
  52. Holman EW, et al. (2008) Explorations in automated language classification. Folia Linguist 42(2):331–354.
  53. Brown CH, Holman EW, Wichmann S (2013) Sound correspondences in the world’s languages. Language 89(1):4–29.
  54. North BV, Curtis D, Sham PC (2002) A note on the calculation of empirical P values from Monte Carlo procedures. Am J Hum Genet 71(2):439–441.
  55. Strimmer K (2008) fdrtool: A versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics 24(12):1461–1462.
  56. Nichols J, Witzlack-Makarevich A, Bickel B (2013) The AUTOTYP Genealogy and Geography Database: 2013 Release (University of Zürich, Zurich). Available at www.autotyp.uzh.ch/. Accessed May 29, 2015.
  57. Dunn M, Greenhill SJ, Levinson SC, Gray RD (2011) Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473(7345):79–82.
  58. Gray RD, Drummond AJ, Greenhill SJ (2009) Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323(5913):479–483.
  59. Gooskens C, Heeringa W (2004) Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data. Lang Var Change 16:189–207.
  60. Nerbonne J, Heeringa W, Kleiweg P (1999) Edit distance and dialect proximity. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, eds Sankoff D, Kruskal J (CSLI Press, Stanford, CA), pp v–xv.
  61. Smithson M, Verkuilen J (2006) A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods 11(1):54–71.
Sound–meaning association biases evidenced across thousands of languages
Languages share similar sound–meaning associations
Damián E. Blasi, Søren Wichmann, Harald Hammarström, Peter F. Stadler, Morten H. Christiansen
Proceedings of the National Academy of Sciences Sep 2016, 201605782; DOI: 10.1073/pnas.1605782113

Copyright © 2018 National Academy of Sciences.