Pitch perception: A dynamical-systems perspective
^{*}Laboratorio de Estudios Cristalográficos, Consejo Superior de Investigaciones Científicas, E-18071 Granada, Spain; ^{‡}Istituto Lamel, Consiglio Nazionale delle Ricerche, I-40129 Bologna, Italy; and ^{¶}Institut Mediterrani d'Estudis Avançats, Consejo Superior de Investigaciones Científicas–Universitat de les Illes Balears, E-07071 Palma de Mallorca, Spain

Communicated by Leo P. Kadanoff, University of Chicago, Chicago, IL (received for review December 13, 1999)
Abstract
Two and a half millennia ago Pythagoras initiated the scientific study of the pitch of sounds; yet our understanding of the mechanisms of pitch perception remains incomplete. Physical models of pitch perception try to explain from elementary principles why certain physical characteristics of the stimulus lead to particular pitch sensations. There are two broad categories of pitch-perception models: place or spectral models consider that pitch is mainly related to the Fourier spectrum of the stimulus, whereas for periodicity or temporal models its characteristics in the time domain are more important. Current models from either class are usually computationally intensive, implementing a series of steps more or less supported by auditory physiology. However, the brain has to analyze and react in real time to an enormous amount of information from the ear and other senses. How is all this information efficiently represented and processed in the nervous system? A proposal of nonlinear and complex systems research is that dynamical attractors may form the basis of neural information processing. Because the auditory system is a complex and highly nonlinear dynamical system, it is natural to suppose that dynamical attractors may carry perceptual and functional meaning. Here we show that this idea, scarcely developed in current pitch models, can be successfully applied to pitch perception.
The pitch of a sound is where we perceive it to lie on a musical scale. For a pure tone with a single frequency component, pitch rises monotonically with frequency. However, more complex signals also elicit a pitch sensation. Some instances are presented in Fig. 1. These are sounds produced by the nonlinear interaction of two or more periodic sources, by amplitude or frequency modulation. All such stimuli, which may be termed complex tones, produce a definite pitch sensation, and all of them exhibit a certain spectral periodicity. Many natural sounds have this quality, including vowel sounds in human speech and vocalizations of many other animals. Evidence for the importance of spectral periodicity in sound processing by humans is that noisy stimuli exhibiting this property also elicit a pitch sensation. An example is repetition pitch: the pitch of ripple noise (1), which arises naturally when the sound from a noisy source interacts with a delayed version of itself, produced, for example, by a single or multiple echo. It is clear that an efficient mechanism for the analysis and recognition of complex tones represents an evolutionary advantage for an organism. In this light, the pitch percept may be seen as an effective one-parameter categorization of sounds possessing some spectral periodicity (2–5).
Virtual Pitch
For a harmonic stimulus like Fig. 1b (a periodic signal), there is a natural physical solution to the problem of encoding it with a single parameter: take the fundamental component of the stimulus as the pitch and all other components are naturally recorded as the higher harmonics of the fundamental. This is what nature does. However, a harmonic stimulus like Fig. 1c, which is high-pass filtered so that the fundamental and some of the first higher harmonics are eliminated, nevertheless maintains its pitch at the frequency of the absent fundamental. The stimulus (Fig. 1e) obtained by amplitude modulation of a sinusoidal carrier of 1 kHz by a sinusoidal modulant of 200 Hz is also of this type. Because the carrier and modulant are rationally related, the stimulus is harmonic; the partials are integer multiples of the absent fundamental ω_{0} = 200 Hz. The perception of pitch for this kind of stimulus is known as the problem of the missing fundamental, virtual pitch, or residue perception (6). The first physical theory for the phenomenon was proposed by von Helmholtz (7), who attributed it to the generation of difference combination tones in the nonlinearities of the ear. A passive nonlinearity fed by two sources with frequencies ω_{1} and ω_{2} generates combination tones of frequency ω_{C} (see the Appendix for clarification of the concepts from nonlinear dynamics used throughout this paper). For a harmonic complex tone, such as Fig. 1e, the difference combination tone ω_{C} = ω_{2} − ω_{1} between two successive partials has the frequency of the missing fundamental ω_{0}. In a crucial experiment, however, Schouten et al. (8) demonstrated that the residue cannot be described by a difference combination tone: if we shift all of the partials in frequency by the same amount Δω (Fig. 1f), the difference combination tone remains unchanged. But the perceived pitch shifts, with a linear dependence on Δω.
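The arithmetic behind this argument can be checked in a few lines. The following is only an illustrative sketch: the helper names `am_partials` and `difference_tone` are hypothetical, and the 50 Hz shift is an arbitrary example value rather than one taken from the experiments.

```python
def am_partials(carrier_hz, modulant_hz, shift_hz=0.0):
    """Three partials of a sinusoidally amplitude-modulated sinusoid,
    optionally shifted rigidly in frequency by shift_hz."""
    return [carrier_hz - modulant_hz + shift_hz,
            carrier_hz + shift_hz,
            carrier_hz + modulant_hz + shift_hz]


def difference_tone(partials):
    """Difference combination tone between the two lowest successive partials."""
    return partials[1] - partials[0]


harmonic = am_partials(1000.0, 200.0)                 # partials at 800, 1000, 1200 Hz
shifted = am_partials(1000.0, 200.0, shift_hz=50.0)   # partials at 850, 1050, 1250 Hz

# The difference tone is the same in both cases, although only the
# unshifted partials are integer multiples of 200 Hz.
print(difference_tone(harmonic), difference_tone(shifted))
```

Because the difference tone does not move with the shift, it cannot by itself account for a pitch that varies linearly with Δω.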
A Dynamical-Systems Perspective
Such a complex tone is no longer harmonic. How does nature encode an inharmonic complex tone into a single pitch? Intuitively, the shifted pseudofundamental depicted in Fig. 1g might seem to be a better choice than the unshifted fundamental, which corresponds to the difference combination tone. However, from a mathematical point of view, this is not obvious. The ratios between successive partials of the shifted stimulus are irrational and we cannot represent them as higher harmonics of a nonzero fundamental frequency; the true fundamental would have frequency zero. Some kind of approximation is needed. The approximation of two arbitrary frequencies, ω_{1} and ω_{2}, by the harmonics of a third, ω_{R}, is equivalent to the mathematical problem of finding a strongly convergent sequence of pairs of rational numbers with the same denominator that simultaneously approximates the two frequency ratios, ω_{1}/ω_{R} and ω_{2}/ω_{R}. If we consider the approximation to only one frequency ratio there exists a general solution given by the continued-fraction algorithm (9). However, for two frequency ratios a general solution is not known. Some algorithms have been proposed that work for particular values of the frequency ratios or that are weakly convergent (10). We developed an alternative approach (11). The idea is to equate the distances between appropriate harmonics of the pseudofundamental and the pair of frequencies we wish to approximate. In this way the two approximations are equally good or bad. The problem can then be solved by a generalization of the Farey sum. This approach enables the hierarchical classification of a type of dynamical attractors found in systems with three frequencies: three-frequency resonances [p, q, r].
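For the single-ratio case mentioned above, the continued-fraction algorithm is easy to sketch. The code below illustrates only that one-ratio case, not the generalized Farey approach of ref. 11; the function name `convergents` and the example ratio 1050/850 (two partials of a 200-Hz-spaced stimulus shifted by 50 Hz) are assumptions chosen for illustration.

```python
from fractions import Fraction


def convergents(x, n_terms):
    """Successive continued-fraction convergents of a positive number x."""
    h_prev, h = 1, int(x)   # numerators h_{-1}, h_0
    k_prev, k = 0, 1        # denominators k_{-1}, k_0
    result = [Fraction(h, k)]
    frac = x - int(x)
    for _ in range(n_terms - 1):
        if frac == 0:       # x is rational and the expansion has terminated
            break
        x = 1 / frac
        a = int(x)          # next partial quotient
        frac = x - a
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


ratio = Fraction(1050, 850)      # omega_2 / omega_1 for the example stimulus
approximations = convergents(ratio, 5)
print(approximations)            # the convergents 1, 5/4 and 21/17
```

The second convergent, 5/4, has the form (k + 1)/k with k = 4, which is the harmonic number of the lower partial in this example.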
A classification of three-frequency resonances allows us to propose how nature might encode an inharmonic complex tone into a single pitch percept. The pitch of a complex tone corresponds to a one-parameter categorization of sounds by a physical frequency whose harmonics are good approximations to the partials of the complex. This physical frequency is naturally generated as a universal response of a nonlinear dynamical system (the auditory system, or some specialized subsystem of it) under the action of an external force, namely the stimulus. Psychophysical experiments with multicomponent stimuli suggest that the lowest-frequency components are usually dominant in determining residue perception (6). Thus we represent the external force as a first approximation by the two lowest-frequency components of the stimulus. For pitch-shift experiments with small frequency detuning Δω, such as those of Schouten et al., the vicinity of these two lowest components ω_{1} = kω_{0} + Δω and ω_{2} = (k + 1)ω_{0} + Δω to successive multiples of some missing fundamental ensures that (k + 1)/k is a good rational approximation to their frequency ratio. Hence, we concentrate on a small interval between the frequencies ω_{1}/k and ω_{2}/(k + 1) around the missing fundamental of the nonshifted case. In the [p, q, r] notation defined in the Appendix, these frequencies correspond to the three-frequency resonances [−1, 0, k] and [0, −1, k + 1], respectively. We suppose that the residue should be associated with the largest three-frequency resonance in this interval: the daughter of these resonances, [−1, −1, 2k + 1]. If this reasoning is correct, the three-frequency resonance formed between the two lowest-frequency components of the complex tone and the response frequency P = (ω_{1} + ω_{2})/(2k + 1) gives rise to the perceived residue pitch P.
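The predicted residue pitch is a one-line formula, and evaluating it makes the linear pitch shift explicit. This is a minimal sketch of the two-lowest-component prediction only (no dominance effect); the function name `residue_pitch` and the numerical values are illustrative assumptions.

```python
def residue_pitch(omega_0, k, delta):
    """Response frequency P = (omega_1 + omega_2) / (2k + 1) for the two
    lowest partials omega_1 = k*omega_0 + delta, omega_2 = (k+1)*omega_0 + delta."""
    omega_1 = k * omega_0 + delta
    omega_2 = (k + 1) * omega_0 + delta
    return (omega_1 + omega_2) / (2 * k + 1)


omega_0, k = 200.0, 4
unshifted = residue_pitch(omega_0, k, 0.0)    # equals the missing fundamental
shifted = residue_pitch(omega_0, k, 50.0)     # 1900 / 9 Hz

# Substituting the partials gives P = omega_0 + 2*delta/(2k + 1),
# so the pitch-shift line has slope 2/(2k + 1).
slope = (shifted - unshifted) / 50.0
print(unshifted, shifted, slope)
```

For these values the shifted pitch lies between ω_{2}/(k + 1) = 210 Hz and ω_{1}/k = 212.5 Hz, as expected for a resonance inside the interval bounded by its parents.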
Results
As we showed in earlier work (12), there is good agreement between the pitch perceived in experiments and the three-frequency resonance produced by the two lowest-frequency components of the complex tone for intermediate harmonic numbers 3 ≤ k ≤ 8. For high and low k values there are systematic deviations from these predictions. Such deviations, noted in pitch-perception modeling, are explained by the dominance effect: there is a frequency window of preferred stimulus components, so that not all components are equally important in determining residue perception (13). To describe these slope deviations for high and low k values within our approach, we must, instead of taking the lowest-frequency components, use some effective k that depends on the dominance effect. In this, we also take into account the presence of difference combination tones, which provide some components with ks not present in the original stimulus. In Fig. 2 we have superimposed the predicted three-frequency resonances, including the dominance effect, on published experimental pitch-shift data (8, 14, 15). For stimuli consisting only of high-k components, the window of the dominance region is almost empty, and difference combination tones of lower k can become more important than the primary components in determining the pitch of the stimulus. The result of this modification is a saturation of the slopes that correctly describes the experimental data. A saturation of slopes can also be seen in the experimental data for low values of k. This effect too can be explained in terms of the dominance region. For a 200-Hz stimulus spacing, the region is situated at about 800 Hz; this implies that stimulus components with harmonic numbers n and n + 1, other than the two lowest partials (i.e., n > k), become more important for determining the three-frequency resonance that provides the residue pitch. Again, incorporating this modification, we can correctly predict the experimental data.
But for the more complex case of low-k stimuli, not only quantitative, but also qualitative differences arise between the two-lowest-component theory and experiment. The most interesting feature seen in the data of Fig. 2 is a second series of pitch-shift lines clustered around the pitch of 100 Hz. This too can be explained within the framework of our ideas. Recall that for small frequency detuning Δω, the frequency ratio between adjacent stimulus components can be approximated by the quotient of two integers differing by unity: ω_{2}/ω_{1} ≈ (n + 1)/n. However, if we relax the small detuning constraint, so that Δω becomes large, we can move to a case where ω_{2}/ω_{1} can better be approximated by (n + 2)/(n + 1). But, by the usual Farey sum operation between rational numbers, we know that there exists between these two regions an interval in which the frequency ratio can be better approximated by (2n + 3)/(2n + 1). In this interval, then, the main three-frequency resonance is [−1, −1, 4n + 4], giving a response frequency P = (ω_{1} + ω_{2})/(4n + 4), which produces a pitch-shift line with slope 1/(2n + 2) around ω_{0}/2 = 100 Hz for the case analyzed. Of course, if prefiltering produces a saturation of the slopes of the primary pitch-shift lines, the same should occur for these secondary lines. In Fig. 2 we show our predictions for the secondary lines taking into account the dominance effect. The agreement, both qualitative and quantitative, is impressive. Moreover, a small group of data points indicates the existence of a further level of pitch-shift lines clustered around 50 Hz in a region between a primary and a secondary pitch-shift line. We can understand this level in the same way as above, and we plot our prediction for its pitch-shift line in Fig. 2.
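The slope and center of the secondary lines follow from a short calculation, sketched here under one labeled assumption: δ denotes the residual detuning measured from the point at which ω_{2}/ω_{1} equals (2n + 3)/(2n + 1) exactly. The symbol δ is introduced only for this illustration; the spacing ω_{2} − ω_{1} = ω_{0} is kept fixed, as in the pitch-shift experiments.

```latex
\begin{align*}
\frac{n+1}{n} \oplus \frac{n+2}{n+1} &= \frac{2n+3}{2n+1}, \\
\omega_1 = (2n+1)\,\frac{\omega_0}{2} + \delta, \qquad
\omega_2 &= (2n+3)\,\frac{\omega_0}{2} + \delta, \\
P = \frac{\omega_1 + \omega_2}{4n+4}
  = \frac{(4n+4)\,\omega_0/2 + 2\delta}{4n+4}
  &= \frac{\omega_0}{2} + \frac{\delta}{2n+2}.
\end{align*}
```

With ω_{0} = 200 Hz the line is centered on ω_{0}/2 = 100 Hz and has slope 1/(2n + 2), in agreement with the values quoted above.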
This hierarchical arrangement of the perception of pitch of complex tones is entirely consistent with the universal devil's staircase structure that dynamical-systems theory predicts for the three-frequency resonances in quasiperiodically forced dynamical systems. Further evidence comes from psychophysical experiments with pure tones. These, presented under particular experimental conditions, also elicit a residue sensation. The extremes of the three-frequency staircase correspond to subharmonics of only one external frequency, and thus these are the expected responses when only one stimulus component is present. As the results of Houtgast (16) show, these subharmonics are indeed perceived.
Discussion
A dynamical attractor can be studied by means of time or frequency analysis. Both are common techniques in dynamical-systems analysis, but one is not inherently more fundamental than the other, nor are these the only two tools available. For this reason, and because our reasoning makes no use of a particular physiological implementation, our results cannot be included directly either in the spectral (17) or the temporal (18) classes of models of pitch perception. What we have proposed is not a model, but a mathematical basis for the perception of pitch that uses the universality of responses of dynamical systems to address the question of why the auditory system should behave as it does when confronted by stimuli consisting of complex tones. Not all pitch perception phenomena are explicable in terms of universality; nor should they be, because some will depend on the specific details of the neural circuitry. However, this is a powerful way of approaching the problem and is capable of explaining many experimental data considered difficult to understand. Future pitch models can surely incorporate these results in their frameworks. Spectral models (17) can use these ideas because they make consistent use of different kinds of harmonic templates, and three-frequency resonances offer in a natural way optimized candidates for the base frequency of such templates without the need to include stochastic terms. Temporal models (18) can apply these results because they need some kind of locking of neural spiking to the fine structure of the stimulus, and three-frequency resonances are the natural extension of phase locking to the more complicated case of quasiperiodic forcing that is typically related to the perception of complex tones. A dynamical-systems viewpoint can then integrate spectral and temporal hypotheses into a coherent unified approach to pitch perception incorporating both sets of ideas.
We have shown that universal properties of dynamical responses in nonlinear systems are reflected in the pitch perception of complex tones. In previous work (12), we argued that a dynamical-systems approach backs up experimental evidence for subcortical pitch processing in humans (19). The experimental evidence is not conclusive: studies with monkeys have found that raw spectral information is present in the primary auditory cortex (20). However, whether this processing occurs in, or before, the auditory cortex, the dynamical mechanism we envisage greatly facilitates processing of information into a single percept. Pitch processing may then prove to be an example in which universality in nonlinear dynamics can help to explain complex experimental results in biology. The auditory system possesses an astonishing capability for processing pitch-related information in real time; what we have demonstrated here is how, at a fundamental level, this can be so.
Acknowledgments
We thank Fernando Acosta for his help in the preparation of Fig. 2. D.L.G. conceived the idea, and together with J.H.E.C. and O.P. carried out the research; J.H.E.C. and D.L.G. cowrote the paper. J.H.E.C. acknowledges the financial support of the Spanish Consejo Superior de Investigaciones Científicas, and Plan Nacional del Espacio Contract ESP981347. O.P. acknowledges the Spanish Ministerio de Ciencia y Tecnologia, Proyecto CONOCE, Contract BFM20001108.
Appendix: Universality in Nonlinear Systems
Nonlinear systems exhibit universal responses under external forcing:
Harmonics from Periodically Forced Passive Nonlinearities.
A single frequency periodically forcing a passive (sometimes termed static) nonlinearity generates higher harmonics (overtones) 2ω_{1}, 3ω_{1}, . . . of a fundamental ω_{1}, given by pω_{1} + ω_{H} = 0 with p integer. This is seen in acoustics as harmonic distortion.
Combination Tones from Quasiperiodically Forced Passive Nonlinearities.
A passive nonlinearity forced quasiperiodically by two sources generates combination tones ω_{1} − ω_{2}, ω_{1} + ω_{2}, . . . , which are solutions of the equation pω_{1} + qω_{2} + ω_{C} = 0, where p and q are integers. They are found as distortion products in acoustics.
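A small numerical experiment shows these distortion products appearing. The sketch assumes a memoryless quadratic nonlinearity y = x + 0.5x² (the coefficient 0.5, the sample rate, and the helper name `amplitude` are arbitrary illustrative choices) driven by two tones at 800 and 1000 Hz, and measures spectral amplitudes by direct projection over a one-second window.

```python
import math


def amplitude(signal, freq_hz, sample_rate):
    """Amplitude of the freq_hz component, by projection onto cosine and sine.
    Exact when the window holds a whole number of periods."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


sample_rate = 8000               # samples per second, one-second window
f1, f2 = 800, 1000               # forcing frequencies in Hz
x = [math.cos(2 * math.pi * f1 * i / sample_rate)
     + math.cos(2 * math.pi * f2 * i / sample_rate)
     for i in range(sample_rate)]
y = [v + 0.5 * v * v for v in x]     # passive quadratic nonlinearity (assumed form)

for f in (200, 1600, 1800, 2000):    # f2 - f1, 2*f1, f1 + f2, 2*f2
    print(f, round(amplitude(x, f, sample_rate), 6),
          round(amplitude(y, f, sample_rate), 6))
```

The input has no energy at 200 Hz, whereas the output contains the difference tone f2 − f1, the sum tone f1 + f2, and the second harmonics of both inputs.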
Subharmonics, or Two-Frequency Resonances from Periodically Forced Dynamical Systems.
With a periodically forced active nonlinearity (a dynamical system), more complex subharmonic responses ω_{1}/r, 2ω_{1}/r, . . . , (r − 1)ω_{1}/r known as mode lockings or two-frequency resonances are generated. These are given by pω_{1} + rω_{2R} = 0, where p and r are integers and ω_{2R} is the resonant response. As some parameter is varied, different resonances are found that remain stable over an interval. A classical representation of this, known as the devil's staircase, is shown in Fig. 3.
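Mode-locking plateaus of this kind can be reproduced with the sine circle map, a standard textbook model of a periodically forced oscillator. Its use here is an assumption made for illustration; the text does not state which system was used for Fig. 3, and the function names and iteration counts below are likewise hypothetical.

```python
import math


def circle_map_lift(theta, omega, k):
    """One step of the lifted sine circle map (theta is not reduced mod 1)."""
    return theta + omega - (k / (2.0 * math.pi)) * math.sin(2.0 * math.pi * theta)


def winding_number(omega, k, n_iter=2000, n_transient=500):
    """Average advance of theta per iteration after discarding a transient."""
    theta = 0.0
    for _ in range(n_transient):
        theta = circle_map_lift(theta, omega, k)
    start = theta
    for _ in range(n_iter):
        theta = circle_map_lift(theta, omega, k)
    return (theta - start) / n_iter


k = 1.0                              # critical coupling
w_low = winding_number(0.1, k)       # close to 0: locked to the 0/1 resonance
w_mid = winding_number(0.5, k)       # close to 1/2: locked to the 1/2 resonance
w_high = winding_number(0.9, k)      # close to 1: locked to the 1/1 resonance
print(w_low, w_mid, w_high)
```

Scanning omega over a fine grid and plotting the winding number against it would trace out a staircase of such plateaus.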
We see that the resonances are hierarchically arranged. The local ordering can be described by the Farey sum: If two rational numbers a/c and b/d satisfy ad − bc = 1 we say that they are unimodular or adjacents and we can find between them a unique rational with minimal denominator. This rational is called the mediant and can be expressed as a Farey sum operation a/c ⊕ b/d = (a + b)/(c + d). The resonance characterized by the mediant is the widest between those represented by the adjacents (21).
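The Farey sum is simple to compute. The sketch below uses Python's standard `fractions` module; the helper names `are_adjacent` and `mediant` are illustrative.

```python
from fractions import Fraction


def are_adjacent(x, y):
    """True if a/c and b/d satisfy |ad - bc| = 1 (unimodular pair)."""
    return abs(x.numerator * y.denominator - y.numerator * x.denominator) == 1


def mediant(x, y):
    """Farey sum a/c (+) b/d = (a + b)/(c + d)."""
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


parents = (Fraction(1, 2), Fraction(1, 3))
print(are_adjacent(*parents), mediant(*parents))   # adjacents, mediant 2/5

# The ratios (n + 1)/n and (n + 2)/(n + 1) are adjacents for every n;
# for n = 4 their mediant is 11/9, that is (2n + 3)/(2n + 1).
n = 4
secondary = mediant(Fraction(n + 1, n), Fraction(n + 2, n + 1))
print(secondary)
```

Applying `mediant` repeatedly to neighboring pairs, starting from 0/1 and 1/1, generates successive levels of the hierarchy.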
Three-Frequency Resonances from Quasiperiodically Forced Dynamical Systems.
Quasiperiodically forced dynamical systems show a great variety of qualitative responses that fall into three main categories: there are periodic attractors, quasiperiodic attractors, and chaotic and nonchaotic strange attractors. Here we concentrate on the three-frequency resonances produced by two-frequency quasiperiodic attractors as the natural candidates for modeling the residue (22). Three-frequency resonances are given by the nontrivial solutions of the equation pω_{1} + qω_{2} + rω_{3R} = 0, where p, q, and r are integers, ω_{1} and ω_{2} are the forcing frequencies, and ω_{3R} is the resonant response, and can be written compactly in the form [p, q, r]. Combination tones are three-frequency resonances of the restricted class [p, q, 1]. This is the only type of response possible from a passive nonlinearity, whereas a dynamical system such as a forced oscillator is an active nonlinearity with at least one intrinsic frequency, and can exhibit the full panoply of three-frequency resonances, which include subharmonics of combination tones. Three-frequency resonances obey hierarchical ordering properties very similar to those governing two-frequency resonances in periodically forced systems. In the interval (ω_{2}/p, ω_{1}/q), we may define a generalized Farey sum between any pair of adjacents as a_{1}/c ⊕ a_{2}/d = (a_{1} + a_{2})/(c + d). The daughter three-frequency resonance characterized by the generalized mediant is the widest between its parents characterized by the adjacents (11). Thus, three-frequency resonances are ordered very similarly to their counterparts in two-frequency systems, and form their own devil's staircase (Fig. 4).
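The [p, q, r] notation lends itself to a direct numerical check. The sketch below assumes the sign convention of the resonance equation above and takes the daughter of two parent resonances to be their component-wise sum, as for the resonance [−1, −1, 2k + 1] discussed in the main text. The function names and the example frequencies (850 and 1050 Hz, k = 4) are illustrative assumptions.

```python
def response_frequency(resonance, omega_1, omega_2):
    """Solve p*omega_1 + q*omega_2 + r*omega_3R = 0 for omega_3R."""
    p, q, r = resonance
    return -(p * omega_1 + q * omega_2) / r


def daughter(parent_a, parent_b):
    """Generalized mediant of two resonances: component-wise sum of [p, q, r]."""
    return tuple(a + b for a, b in zip(parent_a, parent_b))


omega_1, omega_2, k = 850.0, 1050.0, 4
upper_parent = (-1, 0, k)         # response omega_1 / k
lower_parent = (0, -1, k + 1)     # response omega_2 / (k + 1)
child = daughter(upper_parent, lower_parent)

for res in (lower_parent, child, upper_parent):
    print(res, response_frequency(res, omega_1, omega_2))
```

The daughter's response frequency, (ω_{1} + ω_{2})/(2k + 1), falls between those of its two parents.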
Footnotes
 Received December 13, 1999.
 Accepted February 12, 2001.
 Copyright © 2001, The National Academy of Sciences
References
 Bregman A S
 Roberts B, Bayley P J
 Moore B C J
 Keidel W D, Neff W D
 de Boer E
 von Helmholtz H L F
 Kinchin A Y
 Pantev C, Hoke M, Lütkenhöner B, Lehnertz K
 González D L, Piro O
 Reguera D, Rubi M, Vilar J
 Cartwright J H E, González D L, Piro O