# Overtone-based pitch selection in hermit thrush song: Unexpected convergence with scale construction in human music

^{a}Department of Music, Cornish College of the Arts, Seattle, WA 98121; ^{b}Department of Cognitive Biology, Faculty of Life Sciences, University of Vienna, Vienna A-1090, Austria; ^{c}Theoretical Neuroscience Group, Philipps University of Marburg, 35032 Marburg, Germany; and ^{d}Section for Computational Sensomotorics, Hertie Institute for Clinical Brain Research, Center for Integrative Neuroscience, Bernstein Center for Computational Neuroscience, and University Clinic Tübingen, 72076 Tübingen, Germany

Edited by Dale Purves, Duke University, Durham, NC, and approved October 8, 2014 (received for review April 3, 2014)

## Significance

The song of the hermit thrush, a common North American songbird, is renowned for its apparent musicality and has attracted the attention of musicians and ornithologists for more than a century. Here we show that hermit thrush songs, like much human music, use pitches that are mathematically related by simple integer ratios and follow the harmonic series. Our findings add to a small but growing body of research showing that a preference for small-integer ratio intervals is not unique to humans and are thus particularly relevant to the ongoing nature/nurture debate about whether musical predispositions such as the preference for consonant intervals are biologically or culturally driven.

## Abstract

Many human musical scales, including the diatonic major scale prevalent in Western music, are built partially or entirely from intervals (ratios between adjacent frequencies) corresponding to small-integer proportions drawn from the harmonic series. Scientists have long debated the extent to which principles of scale generation in human music are biologically or culturally determined. Data from animal “song” may provide new insights into this discussion. Here, by examining pitch relationships using both a simple linear regression model and a Bayesian generative model, we show that most songs of the hermit thrush (*Catharus guttatus*) favor simple frequency ratios derived from the harmonic (or overtone) series. Furthermore, we show that this frequency selection results not from physical constraints governing peripheral production mechanisms but from active selection at a central level. These data provide the most rigorous empirical evidence to date of a bird song that makes use of the same mathematical principles that underlie Western and many non-Western musical scales, demonstrating surprising convergence between human and animal “song cultures.” Although there is no evidence that the songs of most bird species follow the overtone series, our findings add to a small but growing body of research showing that a preference for small-integer frequency ratios is not unique to humans. These findings thus have important implications for current debates about the origins of human musical systems and may call for a reevaluation of existing theories of musical consonance based on specific human vocal characteristics.

Many human musical scales, including the diatonic major scale prevalent in Western music, are built partially or entirely from intervals (ratios between adjacent frequencies) corresponding to small-integer ratios drawn from the harmonic series (1). A long-running debate concerns the extent to which principles underlying the structure of human musical scales derive from biological aspects of auditory perception and/or vocal production or are historical cultural “accidents” (2–4). The songs of nonhuman animals, such as birds or whales, potentially offer a valuable perspective on this debate. On the one hand, features of human music that are culturally bound, or dependent on specific characteristics of the human voice or auditory system, should be absent in animal vocalizations. On the other hand, aspects of human music observed in the vocalizations of other species seem likely to be partially determined by general physical or biological constraints rather than solely by cultural practices. Such shared features would complement recent research suggesting that common motor constraints shape both human song and that of some bird species (5).

The physical principles underlying vocal production in songbirds are well understood (6–10) and do not differ fundamentally from those of other vertebrates. Sound is produced by tissue vibrations in the syrinx, a bird-specific organ located at the base of the trachea. Flow-driven vibrations of fleshy membranes within the syrinx (in songbirds, the medial and lateral labia) generate a periodic source signal that is filtered by the air column within the trachea and mouth and then emitted to the environment. These principles are important in formulating various alternative hypotheses considered below.

Naturalists have long wondered whether birdsong could be said to have musical properties (11–13). However, early studies on pitch selection tended to be anecdotal, based on a small sample size, or lacking in analytical rigor. Two more recent studies specifically comparing pitch selection in birdsong and human musical scales concluded that birdsong does not make preferential use of musical intervals found in commonly used Western musical scales (14, 15). However, because these studies each examined only one species [the white-throated sparrow (*Zonotrichia albicollis*) and the nightingale wren (*Microcerculus philomela*), respectively], a conclusion that birdsong in general does not exhibit musical properties seems premature. Indeed, other studies have shown preferential use of consonant intervals in tropical boubou shrikes (*Laniarius aethiopicus*) (16) and musician wrens (*Cyphorhinus arada*) (17), although in the first case no rigorous statistical analysis was presented.

Here, we investigated songs of the hermit thrush (*Catharus guttatus*), a medium-sized North American songbird whose famously “musical”-sounding song has attracted the attention of ornithologists and musicians alike (18) but has not yet been subjected to detailed pitch analysis. Its songs are composed of elements (the smallest unit of song construction, seen as continuous uninterrupted traces on spectrograms) that may exhibit either a variable pitch, such as trills and slides, or a stable pitch—pure, non-frequency-modulated, “flutelike” sounds. These stable sounds, which we refer to as “notes” (Fig. 1), are characterized by strong fundamental frequencies and very weak higher harmonics, making them ideally suited for an analysis of pitch relationships (15). Males typically sing 6–10 different song types, defined as nearly identical sequences of elements, durations, and frequencies. In a number of early- and mid-20th-century studies, hermit thrush song was variously attributed with use of major, minor, and pentatonic scales (19, 20) and claimed to follow the overtone series (21). However, these early studies again suffered from small sample sizes and anecdotal reporting and were not based on rigorous acoustic analysis. More recent hermit thrush studies have focused on regional differences and song-type ordering, rather than pitch selection (22, 23).

Here we tested the overtone hypothesis, which predicts that the frequencies of the individual song notes are integer multiples (harmonics) of an implied (but not actually sung) base frequency (hereafter *f*_{i}). This hypothesis seems plausible because, unlike some previous claims, it does not attribute human-specific music-theoretical concepts to hermit thrush song. Moreover, the subjective impression of trained musicians listening to hermit thrush songs (played at one-sixth of the original speed to shift the speed and frequency of the songs into a range more suitable for human hearing) was that most notes indeed seemed to follow an overtone series (see Fig. 2 and Audio File S1 for the corresponding sound example). However, determining whether a set of notes are harmonics of a frequency not present in the set requires a rigorous procedure to estimate and evaluate *f*_{i}. To this end, we used two different statistical approaches, an ordinary least-squares regression model and a generative Bayesian estimator. Both approaches were used to test the hypothesis that a song is an exchangeable sequence of frequencies that are integer multiples of some implied *f*_{i}, versus the null hypothesis that songs are generated by drawing frequencies out of a random log-normal distribution (see *Materials and Methods* for details). By using a Bayesian approach in addition to the least-squares regression model, we can evaluate whether our analyses represent a rigorous test of the overtone hypothesis rather than a post hoc explanation that minimizes an error measure by “memorizing” the data. Because the Bayesian evaluation scores notes that were left out when the model parameters were estimated (see *Materials and Methods*), it is statistically more rigorous than least-squares fitting.

## Results and Discussion

From our collection of 114 song types produced by 14 male hermit thrushes across a wide time range, and spanning North America, we analyzed all song types containing 10 or more notes for which a single stable frequency could accurately be determined (shown as “elements with stable pitch” in Fig. 1), for a total of 71 song types. The prominent first element of each song type, the “introductory whistle,” was omitted because its pitch varies over time, often rising or falling by up to 200 cents or more, and thus cannot accurately be assigned to a single frequency.

Using the least-squares regression model we found that the frequencies of the notes from 57 of 71 songs followed a distribution that is significantly closer to an overtone series than to a random log-normal distribution, indicating that notes from these songs typically approximated integer multiples of an inferred *f*_{i}. According to the Bayesian estimator, 61 of 71 songs were significantly overtone-related. Although these two statistical approaches are conceptually quite different, both agreed in their classification of 61 of the 71 songs, 54 of which were classified as harmonic by both models, and 7 as nonharmonic (the Bayesian estimator classified an additional 7 songs as harmonic, and the least-squares regression model an additional 3). Moreover, both approaches agreed on the *f*_{i} values for 57 songs (sound examples of songs classified as harmonic and nonharmonic can be found in Audio Files S2–S13 and Table S1).

To ascertain that our statistical models correctly classify as harmonic those songs whose note frequencies closely approximated an overtone series while rejecting nonharmonic songs, we evaluated the validity of the Bayesian estimator and least-squares regression model on a computer-generated “ground truth” dataset, consisting of 1,100 sequences comprising 14 notes each (the average number of notes in hermit thrush songs from our database). Note frequencies in these sequences were drawn from distributions spanning the continuum from strictly harmonic to a distribution with a frequency “jitter” or logarithmic SD (*η*) from an exact overtone series equal to 0.3 (visually indistinguishable from a log-normal distribution; see Fig. 3 and *Materials and Methods* for details). Both regression and Bayesian approaches classified 10% or fewer of all sequences generated from a distribution with *η* ≥ 0.04 as harmonic (at a significance threshold α = 0.05), whereas more than 90% of all sequences generated from a distribution with *η* ≤ 0.015 were classified as harmonic (Fig. 4). Note that a distribution with *η* = 0.015 still exhibits a clear harmonic structure (Fig. 3). In comparison, human singers singing the well-known song “Happy Birthday” show an empirical deviation of *η* = 0.014 from the intended frequency (*Materials and Methods*).

To further validate our statistical models we analyzed 10 commercial recordings of the alphorn, an instrument made out of a holeless, conical bore tube that is physically capable only of producing notes from the overtone series above a base frequency produced by the entire length of the instrument. Both statistical models identified all alphorn recordings as overtone-related and correctly identified *f*_{i} for each example. As expected, the alphorn recordings yielded a better least-squares fit to the overtone series than the hermit thrush songs (Mann–Whitney *U* = 80, *P* < 0.001), because the alphorn deviates only minimally from a pure overtone series. However, the residual error of the least-squares fit for 57 of 71 hermit thrush songs was less than twice the mean residual error of the fit for the alphorn recordings, and 52 of these 57 songs were classified as significantly harmonic by the least-squares regression model. This indicates that most hermit thrush songs from our sample were only moderately more “out-of-tune” than alphorn recordings.

One possible explanation for the strong bias toward overtone-related pitches in hermit thrush songs could be that the birds couple their vocal fundamental frequency to the resonances of their vocal tract, in a manner similar to wind instruments like the alphorn. If this is true, birds would generate a harmonic series using the acoustic resonances of their tube-like trachea (24), and the observed frequency distribution would follow from basic physical production constraints. However, there is considerable evidence against the source–tract coupling in birds (25, 26) that would be required by this hypothesis. Equally critically, the implied fundamentals for the hermit thrush songs were drastically lower than those predicted using measured tracheal lengths. We calculated the predicted *f*_{i} for a half-open air-filled tube using the formula *f*_{i} = *c*/(4*L*), where *c* is the speed of sound in warm, moist air (350 m/s) and *L* is the tube length (27). Fully stretched, the lengths of two hermit thrush tracheas were 32 and 35 mm, predicting *f*_{i} values of 2,734 and 2,500 Hz (27), far higher than our estimated values, which all fell between 180 and 720 Hz. Furthermore, hermit thrush songs often contain occasional nonovertone notes, as well as smooth frequency glides over a large frequency range, which would be impossible with a fixed-length trachea if source–tract coupling were present. All of these data are inconsistent with the physical constraints on peripheral production mechanisms required by the “alphorn” hypothesis.
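The tracheal-resonance prediction quoted above can be checked with a few lines of arithmetic. The sketch below assumes the quarter-wavelength formula for a tube closed at one end, with the speed of sound and tracheal lengths taken from the text; the function name is illustrative only.

```python
# Quarter-wave resonance of a half-open tube: f = c / (4 * L).
SPEED_OF_SOUND_M_PER_S = 350.0  # warm, moist air, as stated in the text


def quarter_wave_resonance_hz(tube_length_m, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Lowest resonance (Hz) of a tube open at one end and closed at the other."""
    return speed_of_sound / (4.0 * tube_length_m)


for length_mm in (32.0, 35.0):
    resonance = quarter_wave_resonance_hz(length_mm / 1000.0)
    print(f"{length_mm:.0f} mm trachea -> {resonance:.0f} Hz")
# 32 mm trachea -> 2734 Hz
# 35 mm trachea -> 2500 Hz
```

Both values lie well above the 180 to 720 Hz range estimated for the implied base frequencies, which is the mismatch the argument relies on.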

Alternatively, a bird could theoretically select specific harmonics from a fixed-pitched source by varying its vocal tract filter, as occurs during overtone or “throat” singing in humans (28, 29). However, this would require both a very low source fundamental frequency and a vocal tract flexible enough to allow the bird to pick out many different harmonics of that source, neither of which conditions can plausibly be met by hermit thrush vocal anatomy.

Thus, the production of notes from the harmonic series does not result directly from the physics of the vocal apparatus but instead seems to derive from voluntary central control of muscular and neural parameters. Supporting the idea that it is possible for songbirds to learn to select pitches from the overtone series, white-crowned sparrows, whose songs do not normally follow the overtone series, can learn to sing hermit thrush songs in experimental conditions (30).

Our results show that pitches within *C. guttatus* songs are related by small-integer ratio intervals from the overtone series, indicating that pitch selection in these songs obeys mathematical constraints found within common human musical scales. Because these constraints are found in hermit thrush songs from across North America, over a span of more than 50 y, our data suggest that this is a species-typical characteristic of hermit thrush song. Although the presence of small-integer ratio intervals is neither a necessary nor a sufficient condition for something to be considered music [e.g., some music does not contain any defined pitch intervals, whereas a doorbell or other nonmusical signal may ring a perfect fifth (3:2) or other simple integer ratio], these intervals have a preferential status in practically all known human musical cultures (31) and thus may be considered a characteristic feature of music.

Although pitch selection in birdsong has only been studied quantitatively in a few species, other bird species, such as Java sparrows (*Padda oryzivora*) (32), European starlings (*Sturnus vulgaris*) (33), and pigeons (*Columba livia*) (34) have been shown to discriminate between consonance and dissonance, and newly hatched domestic chicks (*Gallus gallus*) display a preference for consonant intervals (35). Outside the avian kingdom, octave generalization has been shown in rhesus macaques (*Macaca mulatta*) (36), and pairs of dengue vector mosquitoes (*Aedes aegypti*) converge on buzzing frequencies that are a perfect fifth apart before mating (37). Given that few rigorous studies have concentrated on pitch selection or perception in nonhuman animals, our findings lead us to predict that future studies may show a preference for consonant intervals in more species.

A leading hypothesis for human consonance preference suggests that early (even prenatal) exposure to the harmonic-rich human voice, combined with the specific characteristics of the human vocal tract, provides an acquired “template” for musical attractiveness (4, 38, 39). Because hermit thrushes’ notes, like those of most birdsong, lack strong harmonics (Fig. 1), this explanation cannot apply in this species. Thus, early exposure to broad-band harmonic sounds (like the human voice) is not necessary for an organism to favor notes chosen from the harmonic series. Why, then, would hermit thrushes consistently choose to sing using pitches from the harmonic series? One possibility is that the acoustical predictability of overtone series provides an objective yardstick for females evaluating a singing male’s pitch accuracy. Another possibility, not mutually exclusive, is that frequencies related by small-integer ratios may be more easily remembered or processed in the auditory system, representing a form of sensory bias that may characterize both humans (31) and at least some species of birds.

Our results, combined with the above-mentioned evidence for preferential use of consonant intervals in other animal species, support the assertion that some aspects of human scales may be partially based on shared biological principles. More generally, these results, along with recent work on rhythmic entrainment in animals (40, 41), suggest that a number of perceptual and motor mechanisms providing the biological bases for human music may be shared with some other species and call for a reevaluation of long-held assumptions about the species-specific nature and origin of human musical preferences.

## Materials and Methods

### Recordings and Acoustic Analysis.

High-quality recordings of hermit thrush songs from 14 birds were acquired from three sources: the Borror Laboratory of Bioacoustics, Kevin J. Colver (Kevin J. Colver Productions, Elk Ridge, UT), and Bernie Krause (Wild Sanctuary, Glen Ellen, CA). Recordings and passages in which a poor signal-to-noise ratio or overlapping vocalizations prevented accurate analysis of the songs, or in which the focal bird sang fewer than 12 songs, were discarded. The longest available uninterrupted passage of the hermit thrush’s vocalizations was selected from each recording. In the case of one bird there were two suitable uninterrupted passages and we used data from both. Acoustic analysis was performed using the software Praat (42) and custom Praat macros (written by W.T.F.). Recordings for each bird were segmented into individual songs, defined as bouts of vocalization separated by more than 2 s of silence. These extracted songs were saved as separate files, numbered consecutively, and labeled according to start time and duration. The accuracy of automated extraction of the fundamental pitch of each note was verified by ear and by visual inspection of spectrograms with overlaid pitch tracks.

### Song Types and Ordering in Song Series.

Hermit thrush songs are made up of a small number of song types (22, 23); to avoid pseudoreplication we analyzed only the first appearance of each song type of each bird. Song types were defined as identical sequences of elements, as determined by visual inspection of the sonograms, and by comparing the average frequencies of the introductory whistles and the frequencies of the postintroductory whistle notes of each song. Each of our birds sang between 6 and 10 song types, for a total of 114 different song types (Table 1). None of these song types was shared between birds. Each bird typically cycled through all his song types within about 14 songs. Song types were never immediately repeated. The largest observed repeat interval was 32 songs and the shortest 2 songs. Individual birds varied in the predictability with which they presented their song types.

### Pitch Extraction.

Praat’s autocorrelation-based pitch extraction algorithm was used to estimate the average fundamental frequency (“F0” or “pitch” hereafter) of each note with a steady, determinable pitch and that lasted for at least 10 ms. The minimum F0 of a note measured from any bird was 1,545 Hz and the maximum was 7,925 Hz. The mean range for each song type was 1,190 cents (very close to an octave, which corresponds to 1,200 cents). The mean range covered by each bird (including all song types) was 2,537 cents, representing slightly more than two octaves, with a minimum range of 2,201 cents and a maximum of 3,002 cents.
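Because pitch ranges here are reported in cents, a short conversion sketch may help. It uses the standard definition (1,200 cents per octave, so an interval in cents is 1,200 times the base-2 logarithm of the frequency ratio); the function names are illustrative and not part of the original analysis.

```python
import math


def interval_in_cents(f_low_hz, f_high_hz):
    """Size of the interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


def ratio_from_cents(cents):
    """Frequency ratio corresponding to an interval given in cents."""
    return 2.0 ** (cents / 1200.0)


print(interval_in_cents(1000.0, 2000.0))          # octave (2:1) -> 1200.0
print(round(interval_in_cents(2000.0, 3000.0)))   # perfect fifth (3:2) -> 702
print(round(ratio_from_cents(1190.0), 3))         # mean song-type range as a ratio
```

A range of 1,190 cents therefore corresponds to a frequency ratio just under 2:1, which is why it is described as very close to an octave.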

### Estimating the Harmonicity of the Songs.

To estimate the harmonicity of the hermit thrush songs we used two approaches, one based on an ordinary least-squares regression model and a second on a Bayesian estimator of harmonicity. Both models assume that *f*_{i} lies between 100 and 1,000 Hz (although identical results were obtained when considering a range from 20 to 5,000 Hz) and both consider only the first 16 overtones, given that higher overtones are less clearly separated on a logarithmic basis.

### Ordinary Least-Squares Regression Model.

The ordinary least-squares regression model makes no particular assumptions about the distribution of *f*_{i} besides those outlined above. For each song, a best-fitting base frequency *f*_{i} is computed by finding the value of *f*_{i} that minimizes the mean square error (logarithm of the frequency deviation) of a linear regression that fits the pitches of each note in a song to integer multiples of *f*_{i}. To test the hypothesis H1 that a song is an exchangeable sequence of frequencies that are integer multiples of some implied *f*_{i} versus the null hypothesis H0 that songs are generated by drawing frequencies out of a log-normal distribution, we generated, for each song, 1,000 randomly generated songs with the same number of notes as the actual song, but using frequencies taken from a log-normal distribution (restricted between 0 and 10,000 Hz) modeling the frequency distribution of all hermit thrush songs in our sample. For each of these randomly generated songs, a best fit for *f*_{i} was computed using the least-squares approach described above. If fewer than 5% of the randomly generated songs showed a better fit (smaller mean square error) to an overtone series than the actual song, H0 was rejected at a significance threshold of 0.05.
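The procedure described above can be sketched as a grid search followed by a Monte Carlo comparison. This is a simplified illustration under stated assumptions, not the authors' implementation: the grid step, the assignment of each note to its nearest overtone in log-frequency, and the log-normal parameters (`log_mean`, `log_sd`) are all hypothetical choices, since the text does not give them.

```python
import numpy as np

MAX_OVERTONE = 16                       # only the first 16 overtones are considered
OVERTONES = np.arange(1, MAX_OVERTONE + 1)


def fit_base_frequency(freqs_hz, f_min=100.0, f_max=1000.0, step=1.0):
    """Grid-search the base frequency that minimizes the mean squared
    log-frequency deviation of the notes from their nearest overtone."""
    log_f = np.log(np.asarray(freqs_hz, dtype=float))
    candidates = np.arange(f_min, f_max + step, step)
    # Log of every overtone of every candidate: (n_candidates, 1, n_overtones).
    log_targets = np.log(candidates[:, None, None] * OVERTONES[None, None, :])
    # Deviation of each note from its closest overtone, for each candidate.
    deviation = np.abs(log_f[None, :, None] - log_targets).min(axis=2)
    mse = (deviation ** 2).mean(axis=1)
    best = int(np.argmin(mse))
    return float(candidates[best]), float(mse[best])


def draw_null_song(rng, n_notes, log_mean, log_sd, f_cap=10_000.0):
    """Draw note frequencies from a log-normal distribution capped at f_cap."""
    notes = []
    while len(notes) < n_notes:
        f = rng.lognormal(log_mean, log_sd)
        if f < f_cap:
            notes.append(f)
    return np.array(notes)


def overtone_p_value(freqs_hz, log_mean, log_sd, n_random=1000, seed=0):
    """Fraction of random songs that fit an overtone series better than the song."""
    rng = np.random.default_rng(seed)
    _, observed_mse = fit_base_frequency(freqs_hz)
    n_better = 0
    for _ in range(n_random):
        fake = draw_null_song(rng, len(freqs_hz), log_mean, log_sd)
        _, fake_mse = fit_base_frequency(fake)
        n_better += fake_mse < observed_mse
    return n_better / n_random


# Toy song built from exact overtones 5..10 of 400 Hz (made-up data).
song = 400.0 * np.array([5, 6, 8, 7, 9, 10, 6, 8, 5, 7])
base, mse = fit_base_frequency(song)
p = overtone_p_value(song, log_mean=np.log(3000.0), log_sd=0.3, n_random=200)
print(base, mse, p)
```

For this toy song the recovered base frequency is 400 Hz with zero error, so no random song can fit better and H0 is rejected. Real songs would show nonzero error, and the outcome would depend on the null distribution fitted to the actual data.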

### Generative Bayesian Model.

The generative Bayesian model computes the posterior distribution of the implied base *f*_{i} in a manner analogous to performing a regression analysis over the measured frequencies of all individual song notes (dependent variable) relative to those predicted from integer multiples (predictor) of all possible *f*_{i} between 100 Hz and 1,000 Hz. Because the predictor values (represented by the integer multiples 1, 2, 3, 4, …, with *f*_{i} being the frequency associated with the multiple 1) are unknown, this is an instance of a latent variable problem. The model therefore generates distributions over predictor values, rather than point estimates.

A song is parameterized by a (strictly positive) base frequency *b* and an overtone distribution *q* = (*q*_{1}, …, *q*_{M}), where *q*_{m} is the probability of observing overtone *m* in a song and *M* is the maximal possible overtone. A song is generated by drawing *K* notes *n*_{k} ∈ {1, …, *M*} out of *q*, where *k* = 1, …, *K* is the sequence index. These notes are latent variables, because they are not directly observable. In a perfectly harmonic song, the notes and the corresponding observable pitch frequencies *f*_{k} would be related by

$$f_k = b \cdot n_k. \tag{1}$$

To allow for a certain amount of random frequency deviation (hereafter jitter), which might be attributed to inaccurate singing or measurement errors, we included this jitter *ρ* multiplicatively:

$$f_k = b \cdot n_k \cdot \rho, \tag{2}$$

where *ρ* is drawn from a log-normal distribution with mean 0 and SD *σ* (i.e., log *ρ* is normally distributed with mean 0 and SD *σ*). Factorizing the jitter from the “ideal” frequency *b · n*_{k} allows us to use a frequency-independent relative jitter model, because the frequency dependence is already captured by the first two factors on the right-hand side of Eq. **2**.

To complete the model, we specified priors on *q* and *b*, because they are a priori unknown and are thus treated as random variables. Because the notes *n*_{k} are multinomial variables (each note can take exactly one of *M* values), *q* is a multinomial distribution. The canonical prior on a multinomial distribution is the Dirichlet distribution (see ref. 43 for details). We used a symmetric Dirichlet distribution with concentration parameter *α*. To choose a prior on *b*, note that there is a scaling ambiguity in the model: Substituting *b* by *b*/*z* and *n*_{k} by *z · n*_{k} with *z* integer and positive would lead to the same observable frequencies *f*_{k}. To remove this ambiguity, we limited the range of *b* to *b* ∈ [*b*_{min}, *b*_{max}] with a density

$$p(b) \propto b, \tag{3}$$

that is, we prefer higher base frequencies linearly. The constant of proportionality in Eq. **3** was chosen to ensure normalization:

$$p(b) = \frac{2b}{b_{\max}^{2} - b_{\min}^{2}}, \qquad b \in [b_{\min}, b_{\max}]. \tag{4}$$

For base frequency inference we computed the posterior distribution of *b* given a song using standard Bayesian approaches (43). This computation involves an intractable integral over *q*, which we approximate by its value at the maximum (maximum-a-posteriori, or MAP, approximation). The one-dimensional integral over *b* is carried out numerically. Given the MAP estimate of *b*, inferring a note *n*_{k} is done by finding the *n*_{k} that maximizes the probability of generating the corresponding *f*_{k}. For the predictions, we used the MAP parameter estimates for *b* and *q* to compute predictive probability distributions for those frequencies that had not been used to make these estimates. We set *b* ∈ [100 Hz, 1,000 Hz] and *α* = 2. The value of *α* is not critical.
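To make the inference step more concrete, the sketch below evaluates a posterior over the base frequency on a grid. It is a deliberately simplified stand-in for the model described above: the overtone distribution *q* is held fixed and uniform instead of being estimated under a Dirichlet prior, and the jitter SD, grid step, and toy song are assumptions made only for illustration.

```python
import numpy as np


def base_frequency_posterior(freqs_hz, sigma=0.01, b_min=100.0, b_max=1000.0,
                             step=0.5, max_overtone=16):
    """Grid posterior over the base frequency b, with a prior proportional to b
    and a fixed uniform overtone distribution q (a simplifying assumption)."""
    freqs = np.asarray(freqs_hz, dtype=float)
    b_grid = np.arange(b_min, b_max + step, step)
    overtones = np.arange(1, max_overtone + 1)
    log_q = np.full(max_overtone, -np.log(max_overtone))

    # Log-normal log-density of each note around each ideal frequency b * m.
    log_f = np.log(freqs)[None, :, None]
    log_ideal = np.log(b_grid)[:, None, None] + np.log(overtones)[None, None, :]
    log_pdf = (-0.5 * ((log_f - log_ideal) / sigma) ** 2
               - np.log(sigma * np.sqrt(2.0 * np.pi)) - log_f)

    # Marginalize the latent overtone index of each note (log-sum-exp over m).
    joint = log_pdf + log_q[None, None, :]
    peak = joint.max(axis=2, keepdims=True)
    log_note = peak[..., 0] + np.log(np.exp(joint - peak).sum(axis=2))

    # Notes are treated as exchangeable; add the log of the linear prior on b.
    log_post = log_note.sum(axis=1) + np.log(b_grid)
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * step            # normalize to a density on the grid
    return b_grid, post


# Toy song: overtones 5..10 of 400 Hz (made-up data).
song = 400.0 * np.array([5, 6, 7, 8, 9, 10, 6, 8])
b_grid, post = base_frequency_posterior(song)
b_map = b_grid[np.argmax(post)]
print(b_map)
```

Restricting *b* to the grid range and capping the overtone index is what removes the scaling ambiguity in this toy case: 200 Hz would need overtone 20 to reach 4,000 Hz, so only 400 Hz explains every note.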

To determine a suitable value for the jitter SD *σ*, we recorded 10 human singers singing “Happy Birthday” two times each. Because this song is diatonic rather than purely harmonic, we used a song model assuming a chromatic scale, by replacing Eq. **2** with

$$f_k = b \cdot 2^{n_k/12} \cdot \rho, \tag{5}$$

where the SD of *ρ* is not critical but should be small enough so that neighboring steps on the scale do not mix owing to jitter. We chose 0.01. For each recording, we inferred *b* and the *n*_{k} as described above and computed the reconstructed frequency of each note; the deviation of the sung frequencies from these reconstructed frequencies gives the empirical deviation *σ*.

The above generative model can be used not only to infer the base frequency and notes but also to determine how probable is an observed song under H0 versus H1. To avoid overfitting we used leave-one-out cross-validation (43), that is, we inferred *b* and *q* from all frequency observations in a song but one and then computed the predictive probability of the left-out observation.

### Validity Testing of Both Models on a Ground Truth Dataset.

To verify that both statistical approaches used here produce sensible results we used a ground truth dataset (GT). A ground truth is a dataset that shares relevant features with the data of interest but has controllable statistical properties. Our GT is a random sequence generator (technically, an exchangeable sequence generator) that can be continuously adjusted between producing perfectly harmonic sequences of frequencies and sequences with a nearly log-normal frequency distribution.

We generated these sequences from our Bayesian model with *q* set to the average hermit thrush song overtone distribution and *b* = 393.21 Hz, which is the exponentiated average log-base frequency of the hermit thrush songs. The harmonicity of a generated sequence can be controlled by varying the jitter SD *η*. For very small *η*, the resulting frequency density is composed of distinct, evenly spaced peaks. This can be seen in Fig. 3 for *η* = 0.005 and *η* = 0.015. Drawing a sequence from such a density yields a harmonic sequence, because virtually all probability is contained in the peaks. For larger *η*, the peaks begin to overlap until the harmonic structure disappears and the density looks log-normal (*η* = 0.3 in Fig. 3). We drew sequences with *K* = 14 notes, which corresponds to the rounded average song length in our sample of hermit thrush songs. To validate that our hypothesis comparison is indeed sensitive enough to separate sequences whose frequency distribution corresponds to an overtone series from sequences whose frequencies are derived from a log-normal frequency distribution, we drew 100 sequences for several values of *η* between 0.005 and 0.3, for a total of 1,100 sequences.
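A generator of this kind can be sketched in a few lines. The base frequency, sequence length, and jitter values below come from the text; the overtone distribution `q` is a placeholder (uniform over overtones 4 to 12), because the average hermit thrush overtone distribution is not reported here.

```python
import numpy as np


def draw_gt_sequence(rng, eta, q, base_hz=393.21, n_notes=14):
    """Draw one sequence: overtone indices from q, times a log-normal jitter."""
    q = np.asarray(q, dtype=float)
    overtones = rng.choice(np.arange(1, len(q) + 1), size=n_notes, p=q)
    jitter = rng.lognormal(mean=0.0, sigma=eta, size=n_notes)
    return base_hz * overtones * jitter


# Placeholder overtone distribution (assumption): uniform over overtones 4..12.
q = np.zeros(16)
q[3:12] = 1.0 / 9.0

rng = np.random.default_rng(0)
nearly_harmonic = draw_gt_sequence(rng, eta=0.005, q=q)
nearly_lognormal = draw_gt_sequence(rng, eta=0.3, q=q)

# With small jitter, each note sits close to an integer multiple of the base.
ratios = nearly_harmonic / 393.21
print(np.round(ratios, 2))
```

Sweeping `eta` from 0.005 to 0.3 and drawing 100 sequences per value would reproduce the structure of the dataset described above, although the exact jitter values used are not listed in the text.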

We also used the GT to determine the minimum number of notes *K* that a sequence (or song) should contain so that we have enough statistical power to determine whether the detected harmonicity of the sequence corresponds to the true harmonicity. Specifically, we searched for a *K* such that both *p*(H1|harmonic song) > 1 − 0.05 = 0.95 and *p*(H0|random song) > 0.95. Given the value of 0.014 obtained for the empirical deviation *σ* with human singers, we treated sequences with a jitter of *η* ≤ 0.01 as clearly harmonic, and sequences with *η* ≥ 0.03 as clearly nonharmonic. From the GT, we computed the log-ratio that each note contributes on average in favor of (or against) each hypothesis. We obtained a log-ratio of 0.290 for *η* = 0.03 in favor of H0 and 1.015 for *η* = 0.01 in favor of H1. Assuming that we have no initial preference for either H0 or H1, a posterior probability above 0.95 requires a cumulative log-ratio of about ln(0.95/0.05) ≈ 2.94, so the larger per-note log-ratio implies *K* = 2.9 and the smaller implies *K* = 10.1. We therefore chose *K* = 10 as the minimal number of notes for which we have enough statistical power to determine the harmonicity of a song or sequence. Note that using a slightly larger or smaller value of *K* did not significantly affect our results.
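The step from per-note log-ratios to a minimum song length can be reproduced with a short calculation. The sketch assumes natural logarithms, equal prior probability for H0 and H1, and independent per-note contributions that simply add up; under those assumptions the two reported log-ratios give values close to the 2.9 and 10.1 quoted in the text.

```python
import math


def min_notes(per_note_log_ratio, target_posterior=0.95):
    """Number of notes needed for the accumulated log-ratio to reach the
    posterior odds implied by target_posterior, starting from even odds."""
    required_log_odds = math.log(target_posterior / (1.0 - target_posterior))
    return required_log_odds / per_note_log_ratio


for log_ratio in (1.015, 0.290):
    print(f"log-ratio {log_ratio:.3f} per note -> K = {min_notes(log_ratio):.2f}")
```

The weaker per-note evidence is the binding constraint, which is why a threshold of roughly 10 notes follows.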

## Acknowledgments

We thank the Borror Laboratory of Bioacoustics, Kevin Colver, and Bernie Krause for recordings; Chris Hill and Sue Anne Zollinger for information on vocal anatomy; and Neeltje Boogert, Drew Rendall, W. Andrew Schloss, Ford Doolittle, Tacye Phillipson, Andrew Horn, and Neil Banas for constructive input. E.L.D. thanks Canada Council for the Arts, and W.T.F. and B.G. acknowledge the support of European Research Council Advanced Grant 230604. D.M.E. acknowledges support from the EU Commission, KoroiBot FP7-ICT-2013-10/611909, Deutsche Forschungsgemeinschaft DFG GI 305/4-1 and Graduiertenkolleg-IRTG-1901-BrainAct, the Human Brain Project, and Medical Research Council Fellowship G0501319.

## Footnotes

^{1}E.L.D. and B.G. contributed equally to this work.

^{2}To whom correspondence should be addressed. Email: tecumseh.fitch@univie.ac.at.

Author contributions: E.L.D. and W.T.F. designed research; E.L.D. and W.T.F. performed research; E.L.D., B.G., and D.M.E. analyzed data; and E.L.D., B.G., D.M.E., and W.T.F. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406023111/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- (1) Harkleroad L
- (2) Deutsch D; Carterette EC, Kendall R
- (5) Tierney AT, Russo FA, Patel AD
- (6) Larsen ON, Goller F
- (9) Mindlin GB, Laje R
- (11) Darwin C
- (12) Armstrong EA
- (13) Hartshorne C
- (16) Thorpe WH, Hall-Craggs J, Hooker B, Hooker T, Hutchison R
- (17) Doolittle E, Brumm H
- (20) Mathews FS
- (24) Benade AH
- (27) Titze IR
- (31) Wallin NL, Merker B, Brown S; Trehub SE
- (35) Chiandetti C, Vallortigara G
- (37) Cator LJ, Arthur BJ, Harrington LC, Hoy RR
- (38) Schwartz DA, Howe CQ, Purves D
- (39) Ross D, Choi J, Purves D
- (42) Boersma P, Weenink D (1992–2010) *Praat: Doing Phonetics by Computer* (Univ of Amsterdam), Version 5.1.29. Available at www.praat.org/. Accessed June 18, 2012.
- (43) Bishop CM

## Article Classifications

- Biological Sciences
- Psychological and Cognitive Sciences