The motor origins of human and avian song structure

Edited by Dale Purves, Duke University Medical Center, Durham, NC, and approved July 20, 2011 (received for review March 30, 2011)
August 29, 2011
108 (37) 15510-15515

Abstract

Human song exhibits great structural diversity, yet certain aspects of melodic shape (how pitch is patterned over time) are widespread. These include a predominance of arch-shaped and descending melodic contours in musical phrases, a tendency for phrase-final notes to be relatively long, and a bias toward small pitch movements between adjacent notes in a melody [Huron D (2006) Sweet Anticipation: Music and the Psychology of Expectation (MIT Press, Cambridge, MA)]. What is the origin of these features? We hypothesize that they stem from motor constraints on song production (i.e., the energetic efficiency of their underlying motor actions) rather than being innately specified. One prediction of this hypothesis is that any animals subject to similar motor constraints on song will exhibit similar melodic shapes, no matter how distantly related those animals are to humans. Conversely, animals who do not share similar motor constraints on song will not exhibit convergent melodic shapes. Birds provide an ideal case for testing these predictions, because their peripheral mechanisms of song production have both notable similarities and differences from human vocal mechanisms [Riede T, Goller F (2010) Brain Lang 115:69–80]. We use these similarities and differences to make specific predictions about shared and distinct features of human and avian song structure and find that these predictions are confirmed by empirical analysis of diverse human and avian song samples.
Song exists in every human culture and exhibits a vast diversity of forms (1). What regularities exist amid this diversity, and what is their origin? This question is relevant to debates over the evolutionary and biological foundations of music (2, 3). One widespread feature is a tendency to create melodies from a limited set of stable pitches or pitch intervals (i.e., the use of musical “scales”) (4, 5). This may reflect a human cognitive tendency to create sound sequences from a small inventory of discrete elements, which are generatively recombined to form unique patterns. [This propensity is well known from language, in which unique sentences are produced from a finite set of phonemes (5).] Although musical scales and pitch intervals have recently attracted attention from biologists (6, 7), much less attention has been paid to the melodic “shape” of human song (i.e., how pitch patterns unfold over time). Cross-cultural research has revealed several widespread aspects of melodic shape, including (i) a predominance of arch-shaped and descending melodic contours in musical phrases, (ii) a tendency for phrase-final notes to be relatively long, and (iii) a bias toward “smooth” pitch contours (i.e., a statistical bias toward small pitch movements between adjacent pitches in a melody) (8–10) (Fig. 1).
Fig. 1.
Example human melody, illustrating arch-shaped and descending melodic contours, relatively long phrase-final notes, and smooth pitch contours. (A) A German folk song from the Essen database, in Western music notation (Audio File S1). The song consists of three phrases. (B) The same song represented as a pattern of pitch vs. time in semitones (ST) from the tonic pitch (F4 or 349 Hz, the last pitch of the melody). The blue horizontal lines show the pitch values of individual tones and their relative durations, and phrase boundaries are indicated by dashed vertical lines (in this song, all tones have one of two possible durations, short and long: note how phrase-final tones are always long). Red dots within each phrase show the mean pitch of the first, second, and last third of each phrase. (C) Melodic contour shapes (ascending, arch, and descending) assigned to the three phrases according to the pitch patterns of the red dots in B (see text for details). (D) The melody in B with the individual notes in each phrase randomly reordered in time, illustrating the more jagged contours that result (i.e., a tendency for larger jumps between adjacent pitches, compared with the original melody, which has a relatively “smooth” pitch contour by comparison). In the Essen database, phrases had an average of 9.5 notes each, excluding rests (SD = 3.4) and an average range of 8.9 semitones (st) between the highest and lowest note of the phrase (SD = 3.7). Songs had an average of 5.6 phrases each (SD = 2.8).
What is the origin of these features? In the study of speech, universal design principles are often taken as the hallmark of biological specialization for language (11). In contrast, we hypothesize that many widespread features in the melodic shape of song stem from motor constraints on song production, rather than being innately specified. By “motor constraints” we mean that the sound-producing actions underlying different melodic shapes vary in their energetic cost and that less costly actions are favored (cf. 12). Consequently, certain melodic shapes are widespread. A brief analogy helps to illustrate this argument: when humans swim they can adopt many different motor patterns, including the “butterfly” (which involves simultaneously lifting both arms out of the water and rotating them about the shoulder) and the “crawl” (which involves lifting one arm out of the water at a time, while rolling the body to the side). Both are effective ways of swimming, but because of the biomechanics of the human body in water, the crawl is less energetically expensive than the butterfly. According to a motor constraint hypothesis, this difference accounts for the fact that the crawl is a much more commonly observed swimming pattern than the butterfly.
The motor constraint hypothesis for human song claims that energetic costs (i.e., the metabolic energy required for production) underlie widespread features of melodic shape described above and leads to testable predictions. One prediction of this hypothesis is that any animals subject to similar motor constraints on song will exhibit similar melodic shapes, no matter how distantly related those animals are to humans. Conversely, animals who do not share the same motor constraints on song will not exhibit convergent melodic shapes. Birds provide an ideal case for testing these predictions. The ancestors of humans and birds diverged more than 250 million years ago, and the functional anatomy of humans and birds differs in many respects (13). Nevertheless, birdsong specialists have recently emphasized several (convergently evolved) commonalities in the vocal production biomechanics of birds and humans, while also noting specific differences (14). Two major commonalities are (i) birds and humans use respiratory air pressure to drive sound-producing oscillations in membranous tissues (the vocal folds in humans; the labial folds in birds), and (ii) the resulting sounds are filtered by a vocal tract whose shape can be rapidly changed to emphasize or attenuate certain frequencies (15–20). A major difference between birds and humans concerns the fact that birds have two sets of oscillating membranes that can be controlled independently (in the two-sided syrinx), whereas humans only have one set (in the larynx) (14).
On the basis of our motor constraint hypothesis, we use these biomechanical similarities and differences to predict specific acoustic similarities and differences in the melodic shape of human and avian song, as detailed below. Testing these predictions requires empirical comparison of human and avian song structure based on diverse samples in both domains. Hence the selection of materials for analysis was an important part of this study. For human data we chose folk songs, using the largest cross-cultural database of digitally encoded folk songs that have been segmented into individual phrases: the Essen Musical database (21). This corpus contains music notation for 9,467 folk songs (52,899 phrases) from 32 geographic regions including Austria, China, France, Germany, Ireland, Luxembourg, The Netherlands, Nova Scotia, Russia, Scotland, Switzerland, and Yugoslavia (Table S1). Although this represents a small sample of the world's cultures, the inclusion of many songs from China (>2,000) allows us to test whether the patterns we study are primarily European or are more general in nature. Of course, any widespread song features in this database cannot be claimed as “universals,” yet they merit attention because the motor constraint hypothesis predicts that these patterns will be found across the diversity of human cultures. A sample folk song from this database is shown in Fig. 1 A and B.
For bird data we chose songs in which all notes had either a pure-tone quality or a harmonic structure with a fundamental frequency as the harmonic with the most power. That is, we focused on “tonal” birdsongs, to conduct empirical comparisons of pitch patterns in avian and human song. In addition, we focused on birdsongs with at least five notes, low background noise, and significant pitch variation (Experimental Methods). Using multiple sources, including research libraries and published recordings, we compiled a taxonomically diverse sample of birdsongs from 54 songbird families, with 80 species represented (one song per species; Table S2). Three birdsongs from our database are shown in Fig. 2. In gathering birdsongs we focused on songbird (oscine) families because they are the richest source of tonal songs in the avian world. Like humans, oscines are vocal learners. Some suboscines also produce songs, but these songs are not learned from an auditory model (22). Although we focus on songbirds in this study, our motor constraint hypothesis should also apply to suboscines that produce tonal songs with peripheral vocal mechanisms akin to those in songbirds.
Fig. 2.
Example bird songs (Audio Files S2, S3, S4, S5, and S6). (A) Waveform and spectrogram of a field sparrow song, Spizella pusilla (family Emberizidae), to illustrate the pitch contours of individual notes. (B) Two notes from the birdsong represented as pitch–time contours, with the mean pitches of each third of the note shown by red dots, as in Fig. 1B. Shown below the pitch contours are the melodic contour shapes assigned to these two notes. (Shape classification was done for all birdsong notes: the notes in B are shown for illustrative purposes. Several more examples of birdsong note shape classification are given in Figs. S3–S7.) (C) Waveform and spectrogram of a Eurasian treecreeper song, Certhia familiaris (family Certhiidae), to illustrate the tendency for long final notes. (D) Waveform and spectrogram of a summer tanager, Piranga rubra (family Thraupidae), to illustrate a tendency for large pitch jumps between adjacent notes. Birdsongs in our corpus averaged 2 s in duration (SD = 1.2), had 13.7 notes on average (SD = 8.1), and an average range of 9.0 semitones between the median frequencies of the highest and lowest notes of the song (SD = 4.5). On the basis of the median frequency of each note in the corpus, the average frequency of birdsong notes was 3,730 Hz (SD = 1,584). Field sparrow image courtesy of Kelly Colgan Azar. Eurasian treecreeper image courtesy of Sergey Yeliseev. Summer tanager image courtesy of Jeff Whitlock.
Our comparative analyses followed a common strategy: for each of the three widespread aspects of melodic shape under investigation, we examined our human and birdsong corpora, first confirming its existence in human song and then testing for specific similarities and differences to birdsong, on the basis of the predictions of the motor constraint hypothesis. In conducting this work, it is important to consider whether our selection criteria for birdsongs (i.e., tonal songs with significant pitch variation) may have biased us toward finding certain kinds of pitch patterns. Our criteria excluded birdsongs containing a more varied selection of sounds, including buzzy, noisy, or click-like notes containing inharmonic frequencies (e.g., the songs of European starlings, Sturnus vulgaris). Instead, we focus on birdsongs with notes containing strong fundamental frequency contours, resulting in clear pitch patterns. Crucially, however, the mere existence of clear pitch patterns in birdsongs does not, a priori, imply that these patterns will follow any particular shape. Thus, we believe the outcome of our study is not predetermined by our selection criteria.
One other possible concern with our approach is that our analysis of human song relies on notation rather than acoustic recordings, whereas our analysis of birdsong focuses on recordings. Ideally, our analyses would focus on recordings in both domains, but extensive audio corpora of culturally diverse human folk songs, consisting of single melodic lines segmented into individual notes and phrases, are not readily available. Thus, our analysis of notation raises the question of how closely the patterns we observe reflect those found in actual human singing. This is an important question, because it is known that pitch and timing patterns in music performance are not identical to the values indicated by notation (23). For example, a singer may “bend” (lower or raise) the pitch of a note, or alter a note's duration, for expressive effect. However, our analyses of pitch patterns do not focus on the fine-grained nuances of melodic shape but on broad features such as overall contour and the average size of pitch intervals in a melody, which are similar in performance and notation (24). Furthermore, our analysis of phrase-final lengthening is conservative, because research shows that in music performance the final notes of phrases are even longer than indicated by notation (25, 26). Thus, we fully expect that our notation-based findings will be replicated when broad-based audio corpora of human song are analyzed in the future.

Results

Melodic Contour and Respiratory Constraints.

Human song phrases, like spoken utterances, are produced during exhalations in which air pressure beneath the vocal folds (“subglottal pressure”) is regulated to influence the loudness and pitch of sound (26). Subglottal pressure during these exhalations has a characteristic profile: it rises rapidly near the beginning of the utterance, stays relatively steady or declines slightly during most of the utterance (with smaller modulations involved in regulating loudness and pitch), and falls sharply at the end (27). Humans can control the pitch of their voice independently of subglottal pressure (via the tension of the vocal folds), yet other things being equal, higher pressure leads to faster vocal fold vibration and hence higher pitch (28). Thus, higher pitches should be easier to produce when subglottal pressure is high, and vice versa. On the basis of this motor constraint one would expect two types of pitch contours to predominate in human song phrases: arch-shaped contours that rise to a peak and then fall, and descending contours that start high and gradually lower in pitch over the course of a phrase.
Following the method of Huron (9), we classified all song phrases in our human corpus into one of nine melodic contour shapes: rising, falling, rising-falling (arch), falling-rising, etc. Specifically, each phrase was converted to a pattern of pitch vs. time, divided into three equal time segments, and the mean pitch of each segment was taken. The resulting three pitch values formed a pattern that was classified into one of nine possible shapes, depending on whether the first and last pitches were higher, lower, or equivalent to the middle pitch. Fig. 3A shows a histogram of the resulting shapes. Confirming Huron's earlier findings (based on European folk songs), arch and descending contours were the most common melodic shapes. Notably, arch contours were significantly more common than their inverse shape (V-shape), and descending contours were significantly more common than their inverse ascending shape (both P < 0.0001, binomial test). (These patterns held for the corpus as a whole and for Chinese songs analyzed separately.) These distributional biases are predicted by the motor-constraint hypothesis, because the biomechanical relationship between subglottal pressure and vocal fold vibration rate should make arch and descending pitch contours more energetically efficient to produce than their inverse shapes.
Fig. 3.
Distribution of melodic contour shapes in (A) human song phrases and (B) individual birdsong notes. Following ref. 9, all contours were assigned to one of nine possible shapes, as described in the text.
Songbird vocalizations are primarily produced during controlled exhalation, and given the similarities in the myoelastic-aerodynamic sound-producing mechanisms in humans and birds (14), the motor constraint hypothesis predicts that arch and descending contours will predominate in birdsong. In testing this prediction, one important difference between birds and humans was taken into account. Unlike humans, birds tend to breathe between the individual notes of their songs, likely because this allows them to sing longer songs by constantly replenishing the small air sacs that supply the lungs (17). This means that each note is produced by a separate small exhalation, which leads to the prediction that arch and descending contours will predominate at the level of individual notes. To test this prediction we used computer software to track the fundamental frequency contours of all notes in our corpus (n = 1,092) on sound spectrograms (Experimental Methods). We then converted each frequency contour into a pitch contour and classified its shape using methods identical to those described above for human song phrases (Fig. 2B; note that conversion of avian contours from a linear Hz scale to a logarithmic semitone scale did not influence the pattern of results; Experimental Methods). Fig. 3B shows the resulting distribution. Just as with human song, arch and descending contours were common shapes. Furthermore, arches were significantly more common than V-shaped contours, and descending contours were significantly more common than ascending contours, as predicted by the motor constraint hypothesis (both P < 0.0001, binomial test).
The analysis above depended on reducing each pitch–time contour to three pitch values, to classify them into different shapes. This procedure suggested that arch and declining contours were dominant shapes in both human and avian song. As an independent check on this finding, we conducted an ancillary analysis that used much more pitch information from each contour. Specifically, for human song phrases and birdsong notes, we normalized the duration of all pitch–time contours and then averaged them, to produce average melodic shapes in each domain. The average shapes of both human song phrases and birdsong notes reflected the dominance of arch and declining contours, supporting the findings of our main analysis (Fig. S1).
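The averaging step in this ancillary analysis can be illustrated with a short Python sketch. It is only a plausible reading of "normalized the duration of all pitch–time contours and then averaged them": the function names `resample` and `average_contour`, the use of linear interpolation, and the choice of 50 resampling points are assumptions made for illustration, not details taken from the analysis itself.

```python
def resample(times, pitches, n_points=50):
    """Linearly interpolate one contour onto n_points equally spaced
    positions in normalized time [0, 1].

    Assumes times are strictly increasing and there are at least two points.
    """
    t0, t1 = times[0], times[-1]
    norm = [(t - t0) / (t1 - t0) for t in times]  # duration normalized to 1
    out = []
    j = 0
    for i in range(n_points):
        x = i / (n_points - 1)
        # Advance to the input segment [norm[j], norm[j + 1]] that contains x.
        while j < len(norm) - 2 and norm[j + 1] < x:
            j += 1
        span = norm[j + 1] - norm[j]
        w = (x - norm[j]) / span if span > 0 else 0.0
        out.append(pitches[j] + w * (pitches[j + 1] - pitches[j]))
    return out


def average_contour(contours, n_points=50):
    """Average several (times, pitches) contours point by point
    after normalizing each one to the same duration."""
    resampled = [resample(t, p, n_points) for t, p in contours]
    return [sum(column) / len(column) for column in zip(*resampled)]


# Hypothetical toy data: one arch-shaped contour and one flat contour
# with very different durations.
toy = [([0, 1, 2], [0, 4, 0]), ([0, 10], [2, 2])]
print(average_contour(toy, n_points=5))
```

Because each contour is mapped onto the same normalized time axis before averaging, a long phrase and a short note contribute equally to the average shape.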

Phrase-Final Note Duration and Articulatory Constraints.

In human song (and speech) the last note of phrases tends to be relatively long (25, 26). To examine our corpus for this pattern, we computed the relative duration of all phrase-final notes in our human songs. Specifically, for each phrase we computed each note's duration relative to the average duration of all notes in that phrase (Experimental Methods). Across all phrases in the corpus, we then averaged the relative duration of phrase-final notes and nonfinal notes. Phrase-final notes were significantly longer than nonfinal notes [mean (SD) of 1.58 (0.74) vs. 0.93 (0.39), t = 134.3, P < 0.0001]. Again, this held for the corpus as a whole and for Chinese songs analyzed separately.
What motor constraint might underlie this pattern? Human song (like speech) is characterized by rapid changes in the shape of the vocal tract, which serve to change its resonating properties (26). These movements typically cease momentarily at song phrase boundaries (e.g., when drawing the next breath), and the ease of slowing the articulators before coming to a complete stop, vs. stopping abruptly, could underlie the tendency for relatively long notes at phrase endings (29). Like humans, many birds also actively and rapidly change the shape of their vocal tract during song production to emphasize or attenuate certain frequencies (15, 30). Because these movements cease at the end of a song, the motor constraint hypothesis predicts that birdsong will tend to show long final notes. For each birdsong we computed the relative duration of each note by dividing each note's duration by the mean note duration in the song. By averaging the relative duration of song-final notes vs. nonfinal notes, we found that final notes tended to be relatively long, just as in human song [mean (SD) of 1.33 (0.74) vs. 0.97 (0.55), t = 7.6, P < 0.0001]. A birdsong with a relatively long final note is shown in Fig. 2C.

Small Pitch Jumps and Vibratory Constraints.

It has long been observed that the jumps between adjacent pitches in human songs tend to be small (i.e., there is an overall bias toward small pitch intervals and hence “smooth” melodic contours) (8, 10, 31). One simple way to demonstrate this is to randomly shuffle the order of pitches in a musical phrase. The resulting random phrase usually has larger jumps between adjacent pitches than the original phrase (10) (Fig. 1D). This shows that the melodic shapes of human musical phrases are smoother than one would expect simply on the basis of the distribution of pitches within a phrase. According to the motor constraint hypothesis, this bias toward small pitch intervals is due to the fact that small pitch jumps are easier to produce than large ones, because large jumps require sudden contraction or relaxation in the muscles controlling vocal fold tension. This hypothesis predicts that birds that have two sets of sound-producing labial folds should be less influenced by this constraint. This is because birds, unlike humans, can adjust tension separately in the labia on the two sides of their syrinx. Thus, by maintaining separate tensions on the left and right pair of labia, large pitch jumps can be made by alternating sound production between these structures, without demanding sudden large changes in the tension of labia on either side (18).
To test this idea, we quantified the degree of bias toward small intervals in human song and birdsong. Inspired by von Hippel (10), we created a measure called the “interval compression ratio” (ICR), defined as the mean absolute interval size (in semitones) for a melody with its pitches randomly reordered, divided by the mean absolute interval size for a melody with its pitches in their original order. (For this analysis, birdsongs were first converted to sequences of discrete frequencies, using the median frequency of each birdsong note. Note that this conversion did not involve mapping birdsong notes onto human musical scales, nor did it force the intervals computed between the discrete frequencies into integer values to resemble human music. Instead, pitch intervals in birdsong could take on continuous values, e.g., 2.3 semitones. Further details in SI Experimental Methods.)
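As a rough illustration of the ICR definition, the following Python sketch computes the ratio for a sequence of pitches given in semitones. The text does not state how many random reorderings were used, so averaging over `n_shuffles` shuffles with a fixed `seed` is an assumption of this sketch, as are the function names.

```python
import random
from statistics import mean


def mean_abs_interval(pitches):
    """Mean absolute interval (in semitones) between adjacent pitches."""
    return mean(abs(b - a) for a, b in zip(pitches, pitches[1:]))


def interval_compression_ratio(pitches, n_shuffles=1000, seed=0):
    """ICR: mean absolute interval of randomly reordered pitches divided by
    the mean absolute interval of the pitches in their original order.

    Assumption: the shuffled value is averaged over n_shuffles reorderings.
    """
    original = mean_abs_interval(pitches)
    if original == 0:
        raise ValueError("ICR is undefined when all pitches are identical")
    rng = random.Random(seed)
    shuffled_means = []
    for _ in range(n_shuffles):
        shuffled = list(pitches)
        rng.shuffle(shuffled)
        shuffled_means.append(mean_abs_interval(shuffled))
    return mean(shuffled_means) / original


# Hypothetical toy sequences (not from either corpus); pitches may be
# non-integer, as they are for birdsong notes.
stepwise = [0, 2, 4, 5, 7, 9, 11, 12]   # small steps between neighbors
zigzag = [0, 12, 2, 11, 4, 9, 5, 7]     # same pitches, large jumps
print(interval_compression_ratio(stepwise) > 1)  # biased toward small intervals
print(interval_compression_ratio(zigzag) < 1)    # biased toward large intervals
```

An ICR above 1 means the original order has smaller intervals than a random ordering of the same pitches would, which is the sense in which a melody is "compressed" toward small intervals.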
The larger the ICR, the more biased a melody is toward small intervals. ICR values for human song phrases and birdsongs were positively skewed and hence were compared using nonparametric statistics. The median ICRs for birdsong and human song phrases were 1.21 and 1.46, respectively (Fig. S2 shows the distributions of avian and human ICR values). This difference was statistically significant (Mann-Whitney U test, U = 1,099,510, P < 0.01), indicating that human songs were more biased toward small pitch intervals, confirming the predictions of the motor constraint hypothesis. A similar result was obtained via a Monte Carlo analysis in which subsets of the human song corpus were chosen for comparison with birdsong, on the basis of matching the number of notes in human song phrases and birdsongs (SI Experimental Methods). The larger ICR value for human vs. avian song held for the Chinese songs analyzed separately, when songs with at least six scale tones were analyzed (SI Results). A birdsong with relatively large pitch jumps between adjacent notes is shown in Fig. 2D.
It is worth noting that the decision to exclude birdsongs with minimal pitch variation during our selection of 80 birdsongs for this study should not bias our analysis, because the ICR analysis is agnostic to the absolute amount of pitch variation between notes. It simply compares the average absolute interval size when the pitches of a sequence are randomly reordered vs. when they are in their original order. Hence, as long as the notes of a birdsong have nonzero variance among their median pitch values (which is very likely in biological signals such as birdsongs), the ICR analysis should be valid.

Discussion

What governs the melodic shapes of human songs? By comparing human and avian song we provide evidence that motor constraints, rather than innate factors, are the origin of several widespread features in the structure of human song phrases. Of course, once a regularity exists it can be exploited for communicative functions. Phrase-final lengthening, for example, is regulated in human musical performance as a way of marking structural boundaries (32). In the case of birdsong, it would be interesting to study whether metabolically costly features in song (i.e., those features that go against the motor constraints discussed in this article) are particularly attractive to females, who could potentially use these features to assess male vigor (cf. 12).
A motor constraint hypothesis motivates further comparative work with other species, because it makes testable predictions. For example, it predicts that the pitch contours of vocalizations made while inhaling [e.g., certain “ingressive” sounds made by primates (33)] should not show a bias toward arch and declining pitch contours, because they will not have the characteristic pressure profiles associated with exhalation. Similarly, the motor constraint hypothesis predicts that species that sing without rapidly and actively changing the shape of their vocal tract should not show a tendency to lengthen final notes. Many frogs, for example, produce songs without any salient active changes in vocal tract shape (they produce sound by pumping air through the larynx into a sac that distends passively) (34). Hence the motor constraint hypothesis predicts that frog song, unlike human and avian song, will not show long final notes.
The present work extends a long tradition of comparing the structure of birdsong and human music (e.g., 35–38) but is distinguished by applying empirical methods to a diverse sample of birdsongs and human songs, in the context of hypothesis-driven research. We believe that such an approach can be used to discover many other similarities and differences between animal songs and human music.

Experimental Methods

Pitch and duration values for human songs were imported from the Essen database into MATLAB (MathWorks) using custom-written software in Python. In the Essen database, pitch values are coded as scale steps from the tonic or structurally central pitch of the melody, and duration values are coded as multiples of the shortest note in the song. For melodic contour analyses of phrases, pitch values were converted into continuous functions of pitch vs. time (pitch in semitones from the tonic pitch of the song, time relative to shortest note) (Fig. 1B), and rests were eliminated. Only phrases with at least five notes were used (77% of all phrases in the corpus), to ensure that the pitch–time contours had enough material to assign a meaningful shape. These contours were then sampled at 50 equally spaced time points and then divided into consecutive segments of approximately equal duration (17, 16, and 17 points). The mean pitch of each segment was computed, and the resulting three pitch values were classified into one of nine contour shapes as described in the text. For this analysis pitch values were marked as equivalent if the differences between them did not exceed 0.2 semitones.
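The contour classification procedure described above can be sketched in Python as follows. The 50 sample points, the 17/16/17 split, and the 0.2-semitone tolerance come from the text; sampling at the midpoints of 50 equal time bins, the shape labels, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
from statistics import mean

N_SAMPLES = 50           # stated in the text
SEGMENTS = (17, 16, 17)  # stated in the text
TOLERANCE_ST = 0.2       # stated in the text

# Shape labels are illustrative names chosen for this sketch.
# Key: (first third vs. middle third, last third vs. middle third).
SHAPES = {
    ("lower", "lower"): "arch",
    ("higher", "higher"): "V-shape",
    ("lower", "higher"): "ascending",
    ("higher", "lower"): "descending",
    ("equal", "equal"): "horizontal",
    ("equal", "higher"): "horizontal-ascending",
    ("equal", "lower"): "horizontal-descending",
    ("lower", "equal"): "ascending-horizontal",
    ("higher", "equal"): "descending-horizontal",
}


def sample_contour(notes, n_samples=N_SAMPLES):
    """notes: list of (pitch_in_semitones, duration), rests already removed.
    Returns the pitch at n_samples equally spaced time points
    (bin midpoints, an assumption of this sketch)."""
    pitches = [p for p, _ in notes]
    ends = list(accumulate(d for _, d in notes))  # end time of each note
    total = ends[-1]
    samples = []
    for i in range(n_samples):
        t = (i + 0.5) * total / n_samples
        idx = min(bisect_right(ends, t), len(notes) - 1)
        samples.append(pitches[idx])
    return samples


def compare(value, reference, tol=TOLERANCE_ST):
    """Compare a segment mean with the middle-segment mean, within tolerance."""
    if value - reference > tol:
        return "higher"
    if reference - value > tol:
        return "lower"
    return "equal"


def classify_contour(notes):
    """Assign one of nine contour shapes to a phrase (or birdsong note)."""
    s = sample_contour(notes)
    a, b, _ = SEGMENTS
    first, middle, last = mean(s[:a]), mean(s[a:a + b]), mean(s[a + b:])
    return SHAPES[(compare(first, middle), compare(last, middle))]


# Hypothetical five-note phrases with equal note durations.
print(classify_contour([(0, 1), (2, 1), (4, 1), (2, 1), (0, 1)]))  # arch
print(classify_contour([(7, 1), (5, 1), (4, 1), (2, 1), (0, 1)]))  # descending
```

The same function can be applied to a birdsong note once its frequency contour has been converted to semitones and expressed as a sequence of short constant-pitch steps.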
For the phrase-final duration analysis of human music, note durations in each phrase were first expressed relative to the shortest note in the phrase [e.g., the third phrase of Fig. 1A would have durations of (1, 1, 1, 1, 1, 1, 1, 2, 2)]. The average duration of all notes in the phrase was then computed, and each note's duration was expressed relative to this average duration. In doing this analysis, phrases ending in rests were excluded, and if a phrase had internal rests, the durations of these rests were excluded from the analysis. [In the preceding example, the average duration would be 1.22, and the relative note durations would thus be (0.82, 0.82, 0.82, 0.82, 0.82, 0.82, 0.82, 1.64, 1.64).] These relative duration values were used to compute the average duration of phrase-final vs. nonfinal notes across phrases in the corpus (N = 36,313 phrases).
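The worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and pooling all nonfinal notes across phrases before averaging is an assumption about how the corpus-level means were formed.

```python
def relative_durations(durations):
    """Express each note's duration relative to the mean duration
    of all notes in the phrase (rests already excluded)."""
    average = sum(durations) / len(durations)
    return [d / average for d in durations]


def final_vs_nonfinal(phrases):
    """Mean relative duration of phrase-final notes and of nonfinal notes.

    Assumption: nonfinal notes are pooled across all phrases.
    """
    finals, nonfinals = [], []
    for durations in phrases:
        rel = relative_durations(durations)
        finals.append(rel[-1])
        nonfinals.extend(rel[:-1])
    return sum(finals) / len(finals), sum(nonfinals) / len(nonfinals)


# Durations of the third phrase of Fig. 1A, as given in the text.
phrase = [1, 1, 1, 1, 1, 1, 1, 2, 2]
print([round(r, 2) for r in relative_durations(phrase)])
# [0.82, 0.82, 0.82, 0.82, 0.82, 0.82, 0.82, 1.64, 1.64]
```

The same functions apply to birdsongs by passing the note durations of a whole song instead of a phrase.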
Birdsongs were selected according to the criteria stated in the Introduction. Furthermore, all birdsongs consisted of a sequence of notes preceded and followed by a long pause relative to the duration of the notes and were excluded if the song had minimal pitch variation (less than one semitone difference between the highest and lowest notes, according to the median frequency of each note) or if any of the notes had two simultaneous, distinct pitches (likely made with the “two voice” properties of the syrinx). We sought songs meeting our criteria from the Cornell Laboratory of Ornithology, the Borror Laboratory of Bioacoustics, the British Museum Library, CDs accompanying Nature's Music, The Singing Life of Birds, and Music of the Birds (39–41), and 12 Internet sources. Aiming for taxonomic diversity, we originally hoped to collect one song for each of the 84 songbird (oscine) families (42). However, because of our strict criteria and the nature of available materials, we found samples for only 54 families. We then sampled one more species from 26 of these families (sampling when possible from the most speciose families), for a total of 80 songs and species (Table S2). Narrow-band spectrograms were made of each song using SIGNAL (Engineering Design; SI Experimental Methods). The duration and fundamental frequency of each note were extracted from spectrograms using an automatic spectral contour detection algorithm (SI Experimental Methods). The resulting frequency contours were converted to pitch contours [i.e., all frequency points in the contour were converted to semitones from the mean frequency of the note using the formula ST = 12*log2(F/mean(F)), where F is the frequency of a data point in Hz and mean(F) is the mean frequency of the note].
(Note that this conversion did not involve mapping birdsong notes into the discrete pitches or intervals of human musical scales but simply converted frequency contours in a linear Hz scale into pitch contours in a logarithmic semitone scale, to study human and avian pitch contours in a comparable way.) Bird note pitch contours were then sampled at 50 equally spaced time points and classified into shapes using methods identical to those for human song phrases. (Note that a significant predominance of arch and descending contours vs. their inverse shapes was found whether frequency contours in Hz or pitch contours in semitones were used as the basis for birdsong analysis.)
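The frequency-to-pitch conversion given above, ST = 12*log2(F/mean(F)), translates directly into Python. The sketch below also includes a helper for the interval between two frequencies, which is the same logarithmic relation applied to a pair of values; the function names and the sample frequencies are illustrative only.

```python
import math


def hz_to_semitones(freqs_hz):
    """Convert a frequency contour in Hz to semitones relative to the
    mean frequency of the contour: ST = 12 * log2(F / mean(F))."""
    mean_f = sum(freqs_hz) / len(freqs_hz)
    return [12 * math.log2(f / mean_f) for f in freqs_hz]


def interval_in_semitones(f1_hz, f2_hz):
    """Signed pitch interval from f1 to f2 in (possibly non-integer) semitones."""
    return 12 * math.log2(f2_hz / f1_hz)


# Hypothetical frequency samples from a single note (Hz).
contour = hz_to_semitones([3500.0, 3800.0, 3900.0, 3600.0])
print([round(st, 2) for st in contour])

# A doubling of frequency is one octave, i.e., 12 semitones.
print(interval_in_semitones(2000.0, 4000.0))  # 12.0
```

No rounding to scale steps is applied, so intervals keep continuous values, in line with the note above that birdsong pitches were not mapped onto human musical scales.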
For song-final duration analyses of birdsongs, the durations of notes in each song were first expressed relative to the shortest note in the song. Analysis then proceeded in a manner identical to that for human song phrases.

Acknowledgments

We thank Rindy Anderson, Tim Gentner, John Iversen, and Susan Peters for helpful comments; Eleanor Selfridge-Field and Craig Sapp for help with the Essen database, including access to the Chinese folk songs; Lola Cuddy, Christopher Sturdy, and Ronald Weisman for their contributions to an early implementation of the motor constraint hypothesis in the context of birdsong; and staff members at the Cornell Laboratory of Ornithology, the Borror Laboratory of Bioacoustics, and the British Museum Library for their help in locating birdsongs for our study. This study was supported by Neurosciences Research Foundation as part of its program on music and the brain at The Neurosciences Institute, where A.D.P. is the Esther J. Burnham Senior Fellow.

Supporting Information

Supporting Information (PDF)
sa01.wav
sa02.wav
sa03.wav
sa04.wav
sa05.wav
sa06.wav

References

1
B Nettl, R Stone, eds The Garland Encyclopedia of World Music (Garland Publications, New York, 1998).
2
NL Wallin, B Merker, S Brown, eds The Origins of Music (MIT Press, Cambridge, MA, 2001).
3
AD Patel, Music, biological evolution, and the brain. Emerging Disciplines, ed M Bailar (Rice Univ Press, Houston, TX), pp. 91–144 (2010).
4
D Reck Music of the Whole Earth (Da Capo Press, New York, 1997).
5
AD Patel Music, Language, and the Brain (Oxford Univ Press, New York, 2008).
6
KZ Gill, D Purves, A biological rationale for musical scales. PLoS ONE 4, e8144 (2009).
7
JH McDermott, AJ Lehr, AJ Oxenham, Individual differences reveal the basis of consonance. Curr Biol 20, 1035–1041 (2010).
8
D Huron Sweet Anticipation: Music and the Psychology of Expectation (MIT Press, Cambridge, MA, 2006).
9
D Huron, The melodic arch in Western folksongs. Computing Musicol 10, 3–23 (1996).
10
P von Hippel, Redefining pitch proximity: Tessitura and mobility as constraints on melodic intervals. Music Percept 17, 315–327 (2000).
11
N Chomsky Rules and Representations (Columbia Univ Press, New York, 1980).
12
J Podos, S Nowicki, Performance limits on birdsong. Nature's Music: The Science of Birdsong, eds P Marler, H Slabbekoorn (Elsevier, Amsterdam), pp. 318–342 (2004).
13
K Schmidt-Nielsen, How birds breathe. Sci Am 225, 72–79 (1971).
14
T Riede, F Goller, Peripheral mechanisms for vocal production in birds—differences and similarities to human speech and singing. Brain Lang 115, 69–80 (2010).
15
S Nowicki, Vocal tract resonances in oscine bird sound production: Evidence from birdsongs in a helium atmosphere. Nature 325, 53–55 (1987).
16
S Nowicki, P Marler, How do birds sing? Music Percept 5, 391–426 (1988).
17
RA Suthers, F Goller, C Pytte, The neuromuscular control of birdsong. Philos Trans R Soc Lond B Biol Sci 354, 927–939 (1999).
18
RA Suthers, How birds sing and why it matters. Nature's Music: The Science of Birdsong, eds P Marler, H Slabbekoorn (Elsevier, Amsterdam), pp. 272–295 (2004).
19
J Sundberg, The acoustics of the singing voice. Sci Am 236, 82–84, 86, 88–91 (1977).
20
IR Titze, The human instrument. Sci Am 298, 94–101 (2008).
21
E Selfridge-Field Essen Musical Data Package. Center for Computer Assisted Research in the Humanities, Stanford University, Technical Report No. 1 (CCARH, Menlo Park, CA, 1995).
22
DE Kroodsma, M Konishi, A suboscine bird (eastern phoebe, Sayornis phoebe) develops normal song without auditory feedback. Anim Behav 42, 477–487 (1991).
23
C Palmer, Music performance. Annu Rev Psychol 48, 115–138 (1997).
24
A Rakowski, Intonation variants of musical intervals in isolation and in musical contexts. Psychol Music 18, 60–72 (1990).
25
J Sundberg, Emotive transforms. Phonetica 57, 95–112 (2000).
26
B Lindblom, J Sundberg, The human voice in speech and singing. Springer Handbook of Acoustics, ed T Rossing (Springer, New York), pp. 669–712 (2007).
27
J Slifka, Respiratory system pressures at the start of an utterance. Dynamics of Speech Production and Perception, eds P Divenyi, et al. (IOS Press, Amsterdam), pp. 45–57 (2006).
28
F Alipour, RC Scherer, On pressure-frequency relations in the excised larynx. J Acoust Soc Am 122, 2296–2305 (2007).
29
S Myers, B Hansen, The origin of vowel length neutralization in final position: Evidence from Finnish speakers. Nat Lang Linguist Theory 25, 157–193 (2007).
30
T Riede, RA Suthers, NH Fletcher, WE Blevins, Songbirds tune their vocal tract to the fundamental frequency of their song. Proc Natl Acad Sci USA 103, 5543–5548 (2006).
31
HJ Watt, Functions of the size of interval in the songs of Schubert and of the Chippewa [i.e., Ojibway] and Teton Sioux [i.e., Lakota] Indians. Br J Psychol 14, 370–386 (1924).
32
N Todd, A model of expressive timing in tonal music. Music Percept 3, 3–58 (1985).
33
R Eklund, Pulmonic ingressive phonation: Diachronic and synchronic characteristics, distribution and function in animal and human sound production and in human speech. J Int Phonetic Assoc 38, 235–324 (2008).
34
C Gans, Sound production in the Salientia: Mechanisms and evolution of the emitter. Am Zool 13, 1179–1194 (1973).
35
C Hartshorne, The relation of birdsong to music. Ibis 100, 421–445 (1958).
36
LF Baptista, RA Keister, Why birdsong is sometimes like music. Perspect Biol Med 48, 426–443 (2005).
37
D Rothenberg Why Birds Sing: A Journey Into the Mystery of Bird Song (Basic Books, New York, 2005).
38
H Taylor, Decoding the song of the pied butcherbird: An initial survey. Transcultural Music Rev 12, 1–30 (2008).
39
P Marler, H Slabbekoorn Nature's Music: The Science of Birdsong (Elsevier, Amsterdam, 2004).
40
D Kroodsma The Singing Life of Birds (Houghton Mifflin, New York, 2004).
41
L Elliot Music of the Birds: A Celebration of Birdsong (NatureSound Studio, New York, 1999).
42
R Howard, A Moore A Complete Checklist of Birds of the World (Macmillan, London, 1984).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 108 | No. 37
September 13, 2011
PubMed: 21876156

Submission history

Published online: August 29, 2011
Published in issue: September 13, 2011

Keywords

  1. birdsong
  2. evolution
  3. music

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Adam T. Tierney
Auditory Neuroscience Laboratory, Northwestern University, Evanston, IL 60208
The Neurosciences Institute, San Diego, CA 92121
Frank A. Russo
Department of Psychology, Ryerson University, Toronto, ON, Canada M5B 2K3
Aniruddh D. Patel1 [email protected]
The Neurosciences Institute, San Diego, CA 92121

Notes

1
To whom correspondence should be sent. E-mail: [email protected].
Author contributions: A.T.T., F.A.R., and A.D.P. designed research; A.T.T. and A.D.P. performed research; A.T.T. and F.A.R. analyzed data; and A.T.T. and A.D.P. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
