Physical basis of two-tone interference in hearing

Communicated by T. Gold, Cornell University, Ithaca, NY (received for review September 28, 2000)
Abstract
The cochlea uses active amplification to capture faint sounds. It has been proposed that the amplifier comprises a set of self-tuned critical oscillators: each hair cell contains a force-generating dynamical system that is maintained at the threshold of an oscillatory instability, or Hopf bifurcation. While the active response to a pure tone provides frequency selectivity, exquisite sensitivity, and wide dynamic range, its intrinsic nonlinearity causes tones of different frequency to interfere with one another in the cochlea. Here we determine the response to two tones, which provides a framework for understanding how the ear processes the more complex sounds of speech and music. Our calculations of two-tone suppression and the spectrum of distortion products generated by a critical oscillator accord with experimental observations of basilar membrane motion and the nervous response. We discuss how the response of a set of self-tuned oscillators, covering a range of characteristic frequencies, represents the structure of a complex sound. The frequency components of the stimulus can be inferred from the timing of neural spikes elicited by the vibrating hair cells. Passive prefiltering by the basilar membrane improves pitch discrimination by reducing interference between tones. Our analysis provides a general framework for examining the relation between the physical nature of the peripheral detection apparatus and psychophysical phenomena such as the sensation of dissonance and auditory illusions.
The nonlinear nature of sound detection has been known for more than 250 years, ever since Tartini described the perception of combination tones that are not present in a complex sound stimulus (1). More recently, experimental techniques have made it possible to observe nonlinearities directly in the basilar membrane motion (2–8) and in the signals of the auditory nerve (9) and, in some instances, to trace them to the transduction process in the hair cells of the inner ear (10, 11). Nonlinearities imply that the response to two tones is not simply a superposition of single-frequency responses: different frequencies interfere with and distort one another (10). Manifestations of this interference, which have been identified experimentally, include two-tone suppression (7–9) and the generation of distortion product (DP) frequencies (5, 6). Here we demonstrate that these physiological observations are a direct consequence of the active system of signal detection (12), which the ear uses to amplify weak signals. This insight permits us to draw conclusions about the way that the auditory system infers information about pitch and to shed light on a variety of psychophysical observations.
Active Amplification
That an active mechanism operates in hearing was predicted as long ago as 1948 by Gold (13), who argued that the cochlea might work analogously to a regenerative radio receiver and use a source of energy to counteract the limiting effects of friction and actively amplify the stimulus. General acceptance of this viewpoint has been hampered by the lack of a quantitative theory to confront with experimental data and also by doubts about how the hypothesized amplificatory feedback could be correctly regulated. The recently introduced notion of self-tuned criticality (14) addresses these issues. According to this concept, each hair cell contains a force-generating dynamical system that is poised on the verge of an oscillatory instability (a Hopf bifurcation) and is kept at that critical point by a self-adjustment mechanism. Such a critical oscillator is especially responsive to weak sinusoidal stimuli applied at its characteristic frequency, and calculations have demonstrated that a Hopf resonance provides frequency selectivity, extreme sensitivity, and a broad dynamic range as a result of nonlinear amplification (14, 15).
Here, we recall the response of a mechanical amplifier, which is self-tuned to a Hopf bifurcation, to a pure tone of frequency f, characterized by the amplitude F_{f} of a periodic stimulus force. The system responds with a deflection X, which is dominated by the Fourier amplitude X_{f} at the same frequency. Stimulus and response are related by an expansion of the form (14)
F_{f} = A(f)X_{f} + B(f)|X_{f}|^{2}X_{f} + … [1]
where the principal nonlinearity is cubic. The complex coefficients A(f) and B(f) are frequency-dependent. The Hopf bifurcation is characterized by the fact that A vanishes for a characteristic frequency f_{c}:
A(f) ≃ α(f − f_{c}) [2]
with a complex coefficient α. If the stimulus frequency is close to the characteristic frequency, the linear term in Eq. 1 is insignificant and the system displays a nonlinear amplified response, |X_{f}| ∼ |F_{f}|^{1/3}. This occurs when |f − f_{c}| < Δf_{a}, where Δf_{a} ≡ 7|B|^{1/3}|F_{f}|^{2/3}/(4|α|) denotes the bandwidth of active amplification, which depends on the stimulus amplitude. For frequencies outside this bandwidth, the response is linear, |X_{f}| ∼ |F_{f}|/|f − f_{c}|. Within the sensitive bandwidth Δf_{a} the system amplifies with a gain |X_{f}|/|F_{f}| ∼ |F_{f}|^{−2/3}, which becomes very high for weak stimuli. The power-law response of the system allows it to operate over a wide dynamic range; it compresses the 12 orders of magnitude in stimulus intensity that the ear can hear into deflections that vary by only a factor of 100.
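The compression claimed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it assumes that A = α(f − f_{c}) is purely imaginary and B is real (as holds for the simplest cubic normal form), so that |F_{f}|^2 = |X_{f}|^2[(|α|·detuning)^2 + (|B||X_{f}|^2)^2]. The function name and the unit values of |α| and |B| are illustrative choices. Since intensity scales as the square of the force, 12 orders of magnitude in intensity correspond to 6 orders in |F_{f}|.

```python
def single_tone_amplitude(force, detuning, alpha_abs=1.0, b_abs=1.0):
    """Return |X_f| for a pure tone of force amplitude |F_f| = force.

    Assumption (not from the paper): A is purely imaginary and B is real, so
    |F_f|^2 = |X_f|^2 * ((alpha_abs*detuning)^2 + (b_abs*|X_f|^2)^2).
    """
    # The on-resonance value (|F|/|B|)^(1/3) is an upper bound on |X_f|.
    lo, hi = 0.0, (force / b_abs) ** (1.0 / 3.0)
    for _ in range(200):  # bisection: the left-hand side grows monotonically
        mid = 0.5 * (lo + hi)
        lhs = mid**2 * ((alpha_abs * detuning) ** 2 + (b_abs * mid**2) ** 2)
        if lhs < force**2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for force in (1e-6, 1e-4, 1e-2, 1.0):
        on = single_tone_amplitude(force, detuning=0.0)   # cube-root regime
        off = single_tone_amplitude(force, detuning=1.0)  # linear regime
        print(f"|F|={force:.0e}  on-resonance |X|={on:.3e}  "
              f"gain={on / force:.1e}  off-resonance |X|={off:.3e}")
```

On resonance, a millionfold change in force changes the deflection a hundredfold, while the gain |X_{f}|/|F_{f}| grows as the stimulus weakens.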
Two-Tone Interference
Two-tone suppression and the generation of DPs can be explained by the two-tone interferences generated at a Hopf bifurcation. Because the response of a Hopf oscillator is generic near its critical point, we can describe the main features of these effects without detailed knowledge of the underlying molecular mechanisms.
We are interested in the response to two tones (frequencies f_{1} and f_{2}, frequency difference Δf = f_{2} − f_{1}) with amplitudes F_{f1} and F_{f2} acting on a critical Hopf oscillator with f_{c} = f_{1}. The spectrum of the response contains the corresponding amplitudes X_{f1} and X_{f2}. Both amplitudes, however, are systematically smaller than each would be in the absence of the other tone, as is observed experimentally (7). This two-tone suppression is illustrated in Fig. 1, which displays the numerical solutions of a simple model for a Hopf bifurcation (see Appendix A). The diagram shows that the nonlinear amplification of a tone at the oscillator's characteristic frequency can be extinguished by the presence of the second tone, especially when |F_{f2}| > |F_{f1}| and Δf < Δf_{a}. Two-tone suppression near a Hopf bifurcation is generic and follows from nonlinearities in the expansion of the Fourier modes (see Appendix B). Analysis reveals that the presence of X_{f2} in the response spectrum generates an effective linear term in the equation for X_{f1}. This mode thus behaves as if the oscillator were not tuned precisely to the bifurcation point. The corresponding loss of amplification would lead to an increased detection threshold when a second tone (or noise) is introduced. This phenomenon is referred to as masking (16).
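Two-tone suppression of this kind can be reproduced with a short simulation. The sketch below is not the authors' code. It assumes a cubic normal form, τ dZ/dt = −(ɛ − 2πiτf_{c})Z − |Z|^2 Z + F̃(t), with τ = 1 and f_{c} = f_{1}, and works in a frame rotating at f_{1} (Z = W e^{2πif_{1}t}), where the equation becomes dW/dt = −ɛW − |W|^2 W + F_{1} + F_{2}e^{2πiΔft}. Averaging W over whole beat periods isolates the component at f_{1}. The function name, step size, and force values are illustrative.

```python
import cmath


def amplitude_at_f1(force1, force2, delta_f, eps=0.0, dt=0.01,
                    transient=200.0, n_beats=10):
    """Return the response amplitude at f_1 = f_c for a two-tone stimulus.

    Integrates the assumed rotating-frame normal form with RK4 and averages
    W over an integer number of beat periods after discarding a transient.
    """
    omega = 2.0 * cmath.pi * delta_f

    def rhs(w, t):
        drive = force1 + force2 * cmath.exp(1j * omega * t)
        return -eps * w - abs(w) ** 2 * w + drive

    steps_per_beat = round(1.0 / (delta_f * dt))
    n_skip = round(transient / dt)
    n_avg = n_beats * steps_per_beat
    w, total = 0j, 0j
    for i in range(n_skip + n_avg):
        t = i * dt
        if i >= n_skip:
            total += w  # accumulate the zero-frequency (f_1) component
        k1 = rhs(w, t)
        k2 = rhs(w + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = rhs(w + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = rhs(w + dt * k3, t + dt)
        w += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return abs(total / n_avg)


if __name__ == "__main__":
    alone = amplitude_at_f1(0.01, 0.0, delta_f=0.25)
    masked = amplitude_at_f1(0.01, 1.0, delta_f=0.25)
    print(f"tone alone: {alone:.4f}, with stronger second tone: {masked:.4f}")
```

With the weak tone alone, the amplitude settles at the cube-root value F_{1}^{1/3}. Adding a much stronger second tone reduces it sharply, which is the masking effect described above.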
The nonlinearities of a Hopf bifurcation also generate new frequencies. The amplitudes X_{f1} and X_{f2} couple to the amplitudes with frequencies 2f_{1} − f_{2} and 2f_{2} − f_{1}, which are therefore also present in the response (see Appendix B). They subsequently excite further DPs. This leads to a hierarchy of DPs with frequencies f_{k} ≡ f_{1} + (k − 1)Δf, where k is a positive or negative integer. For large |k − 3/2| > Δf_{a}/Δf the amplitudes decrease exponentially,
|X_{f_{k}}| ∼ exp(−λ|k − 3/2|). [3]
This characteristic spectrum of DPs is apparent in Fig. 2, where the response of a simple model (see Appendix A) is displayed for three different values of Δf, together with the corresponding waveforms. The coefficient λ^{−1}, characterizing the number of strongly excited DPs, decreases as Δf increases. These findings are consistent with experimental data (6). For Δf < Δf_{a}, a large number of modes is excited and deviations from the exponential law appear. In the limit of vanishing Δf, a singular limit is attained for which the DP amplitudes decay as a power law |X_{f_{k}}| ∼ |k − 3/2|^{−v}, with v = 4/3 (see Appendix B). This is confirmed by numerical solutions of the simple model described in Appendix A, for which we find v ≃ 1.31 ± 0.05.
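The hierarchy of DPs can be made visible by Fourier-analysing a simulated response. The sketch below is illustrative only and makes the same assumptions as a generic cubic normal form at the critical point (ɛ = 0, τ = 1, f_{c} = f_{1}), written in the frame rotating at f_{1}. One beat period of the steady-state motion is sampled and transformed, so that FFT bin m corresponds to frequency f_{1} + mΔf, that is, to k = m + 1. The function name, forces, and step size are assumptions made for the example.

```python
import cmath

import numpy as np


def dp_spectrum(force1, force2, delta_f, dt=0.002, transient=40.0):
    """Return complex amplitudes of the response at f_1 + m*delta_f.

    Entry m of the result (negative m via negative indexing) belongs to
    k = m + 1 in f_k = f_1 + (k - 1)*delta_f. Assumed model: rotating-frame
    cubic normal form dW/dt = -|W|^2 W + F1 + F2*exp(2*pi*i*delta_f*t).
    """
    omega = 2.0 * cmath.pi * delta_f
    steps_per_beat = round(1.0 / (delta_f * dt))
    n_skip = round(transient / dt)

    def rhs(w, t):
        return -abs(w) ** 2 * w + force1 + force2 * cmath.exp(1j * omega * t)

    w = 0j
    samples = np.empty(steps_per_beat, dtype=complex)
    for i in range(n_skip + steps_per_beat):
        t = i * dt
        if i >= n_skip:
            samples[i - n_skip] = w  # record exactly one beat period
        k1 = rhs(w, t)
        k2 = rhs(w + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = rhs(w + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = rhs(w + dt * k3, t + dt)
        w += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return np.fft.fft(samples) / steps_per_beat


if __name__ == "__main__":
    spectrum = dp_spectrum(0.5, 0.5, delta_f=1.0)
    for k in range(-2, 6):
        print(f"k={k:2d}  |X_fk|={abs(spectrum[k - 1]):.3e}")
```

The entries k = 1 and k = 2 are the two primaries; k = 0 and k = 3 are the DPs at 2f_{1} − f_{2} and 2f_{2} − f_{1}, and the amplitudes fall off on either side of the primaries.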
It has previously been suggested that two-tone suppression might be explained by a passive nonlinearity sandwiched between two linear bandpass filters (17). Such a system also could produce a prominent DP at frequency 2f_{1} − f_{2} on the assumption that the nonlinearity is cubic (17). This picture has no physical foundation, however, and also fails to account for the observed amplitude dependence of the DP spectrum. By contrast, a self-tuned Hopf bifurcation simultaneously describes two-tone suppression and DP generation, as well as explaining why the nonlinear response to a single tone at the characteristic frequency is cubic (14). Moreover, the model has a sound physical basis, because the presence of an active system of amplification is well established (12).
Passive Prefilter
Although a Hopf bifurcation provides excellent amplification, especially for weak signals, its ability to filter frequencies is less impressive. An oscillator with f_{c} = f_{1} can have a significant contribution X_{f2} in its response spectrum, even when Δf > Δf_{a}. This interference, which would pose problems for the detection of complex sounds, can be reduced by prefiltering the stimulus. It is well established that the mechanical properties of the basilar membrane provide such a filter (19), which, owing to the tonotopic organization of hair cells (20), is centered on the characteristic frequency of each oscillator. The bandwidth Δf_{p} of this passive prefilter sets a frequency interval above which two-tone interference is suppressed, whatever the level of the stimulus. This bandwidth is therefore roughly equal to the critical bandwidth Δf_{CB} (16) measured at moderate to high intensities, for which the active system does not significantly sharpen the tuning. At low intensities, Δf_{a} ≪ Δf_{p} and the active amplifier is itself very effective at suppressing interference; in this case we expect the critical bandwidth to diminish, as is indeed observed experimentally (18).
Pitch Extraction
Our analysis contributes to the longstanding debate about whether frequency is encoded by place or by timing. An example of the response of a full set of hair cells, from which frequency information must be derived, is shown in Fig. 3. Could frequency be represented by the spatial distribution of the neural response, which is maximal where the disturbance is greatest? This notion (which was originally promoted by Békésy's experiments on cochlear mechanics) suffers a drawback. The passive filter is too broad to account for the observed pitch discrimination (19) and, although the active amplifier certainly sharpens the tuning for weak stimuli, it has little effect at high intensities. Many have argued that the majority of information about frequency is derived from the detailed time course of the response of the hair cells, via the timing of neural spikes (21, 22). When stimulated by two tones, each excited oscillator vibrates in a pattern that contains components at f_{1} and f_{2} and also at the DP frequencies. Both the absolute and the relative sizes of each component vary along the cochlea, as the characteristic frequency of the hair cells changes (Fig. 3). The nervous system is provided with only partial information about each of these complex waveforms, in the form of a time series of nervous spikes. We argue below that clear identification of a frequency component is possible only if it is dominant in the motion. This suggests a role for basilar membrane resonance in contraposition to place coding. By prefiltering the stimulus and limiting interference, it permits accurate inference of frequency from nervous timing.
We base our model of pitch discrimination on the generic nervous response of hair cells: (i) spikes are elicited whenever the hair bundle deflection traverses a threshold (for frequencies below 5 kHz, at least); and (ii) hair cells with a range of different thresholds are present at each characteristic frequency (23). How might the nervous system extract information about frequency and intensity from the resulting spike trains? A simple, but effective, algorithm is to compute a histogram of interspike intervals T, summing over all the hair cells in the cochlea (and integrating over a fraction of a second). Perceived tones correspond to peaks in the histogram and are assigned pitch 1/T, while the perceived loudness of a tone is related to the height of the peak. Examples of this procedure are shown in Fig. 4a. Three regimes are apparent, depending on the ratio of the stimulus frequencies. (i) When the two frequencies are very close, Δf ≪ Δf_{p}, a single tone is perceived with pitch (f_{1} + f_{2})/2. In addition, there are strong loudness fluctuations at the beat frequency f_{2} − f_{1}. The quality of the beats depends on Δf; the ratio of silence to loudness diminishes as the frequencies approach one another (reflecting the waveform in Fig. 2b). (ii) When the two frequencies differ by a few percent, Δf < Δf_{p}, the perceived pitch is more ambiguous. The histogram becomes much broader, making pitch assignment less accurate. Furthermore, in the case where two pitches can be discriminated, both undergo rapid fluctuations in loudness with different phases. The resulting variability of the perceived pitch would account for the roughness of sensation that is experienced in this situation (24). (iii) At larger frequency differences, Δf > Δf_{p}, the two pitches f_{1} and f_{2} are accurately and clearly distinguished. Although we do not rule out the possibility that the nervous system uses a more sophisticated algorithm to infer pitch, the results summarized in Fig. 4b concur with a wide variety of psychophysical observations (16).
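The interval-histogram algorithm described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that a spike is emitted at each upward crossing of a threshold (the text says only that the deflection "traverses" a threshold), locates crossings by linear interpolation, and pools first-order interspike intervals over a handful of thresholds. The function names, the test waveform, and the bin layout are hypothetical.

```python
import numpy as np


def upward_crossings(t, x, threshold):
    """Times at which x(t) crosses `threshold` from below (assumed spike rule)."""
    idx = np.flatnonzero((x[:-1] < threshold) & (x[1:] >= threshold))
    frac = (threshold - x[idx]) / (x[idx + 1] - x[idx])  # linear interpolation
    return t[idx] + frac * (t[idx + 1] - t[idx])


def interval_histogram(t, x, thresholds, bin_edges):
    """Histogram of interspike intervals, summed over all thresholds."""
    counts = np.zeros(len(bin_edges) - 1)
    for threshold in thresholds:
        spikes = upward_crossings(t, x, threshold)
        counts += np.histogram(np.diff(spikes), bins=bin_edges)[0]
    return counts


def dominant_pitch(counts, bin_edges):
    """Pitch 1/T assigned to the tallest peak of the interval histogram."""
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return 1.0 / centers[np.argmax(counts)]


if __name__ == "__main__":
    t = np.arange(0.0, 50.0, 0.001)
    x = np.sin(2.0 * np.pi * 2.0 * t)      # pure tone, frequency 2 (arbitrary units)
    edges = np.arange(0.005, 2.0, 0.01)    # interval bins of width 0.01
    counts = interval_histogram(t, x, [0.2, 0.5, 0.8], edges)
    print("inferred pitch:", dominant_pitch(counts, edges))
```

For a pure tone every threshold yields the same interval, so the histogram has a single sharp peak at T = 1/f. For an interfering two-tone waveform the intervals depend on threshold and on the beat phase, which broadens the histogram as described in regime (ii).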
They also add to our understanding of the enigmatic relation between harmony and the ratio of small integers, on which musical scales are based. Helmholtz (1) overturned the Pythagorean doctrine by arguing that consonant intervals are not perfect harmonies, but simply less jarring dissonances. He ascribed dissonance to close, but inexact, matches in frequency between some of the harmonics that are generated when notes are played on a musical instrument. Subsequent experiments using pure tones lent weight to his argument (24). No preference for integer frequency ratios was expressed; rather, the roughness of two tones was found to be most intrusive when the difference amounted to a few percent and to diminish smoothly as Δf increased. Helmholtz attributed the roughness to beats, but his explanation is unsatisfactory because it fails to explain why dissonance persists above absolute frequencies of 1 kHz, when the beats are too fast to be distinguished. It is little surprise that his linear argument proves inadequate for a nonlinear system. We argue that dissonance arises from the difficulty of inferring frequency components from partial information about a complex waveform, which results in an indeterminacy of pitch. Both dissonance and pitch discrimination depend on the degree of interference between the two tones in the cochlea, so we would expect the interval of dissonance to have the same dependence on frequency as the interval of pitch discrimination, as is indeed observed (24). We also predict that these intervals should diminish at low intensities, when the active amplifier is most effective at sharpening the filter.
In addition, our analysis accounts for two different types of auditory illusion and indicates that they have distinct origins. Many investigators have confirmed that the DP frequencies can be heard (25). Although the DPs are generated by the Hopf resonance, they are not produced as the strongest component. Thus the active amplifier cannot, by itself, account for their audibility. Nevertheless, any nonlinearity in the prefilter would generate small DP components (1), which subsequently would be magnified by oscillators of the corresponding characteristic frequency. The Tartini effect, then, can be explained by the combination of a nonlinear prefilter and active amplifier. The second type of illusion is the residue pitch (26, 27). When f_{1} and f_{2} are neighboring harmonics, this pitch is identified as the missing fundamental Δf. But when the frequencies are less simply related, one or more pitches close to, but not equal to, Δf are heard. These pitches are derived from the complex waveform of hair-bundle motion and appear as peaks in the histogram of spike intervals (Fig. 5). This adds substance to previous suggestions (26–28) that they are artefacts that arise from the coding of hair-bundle vibration as a time series of nervous spikes.
Summary
In summary, we have provided a unifying physical description of two-tone interference effects and shown that many aspects of our perception of sound may be traced to the physiology of the inner ear, where they originate in one (or a combination) of the three stages of sound detection: prefiltering by the basilar membrane, active amplification by hair cells, and neural coding of hair-bundle motion.
Acknowledgments
We thank S. Camalet, E. F. Evans, A. J. Hudspeth, P. Martin, and J. Prost for fruitful discussions. F.J. is grateful for the hospitality of the Cavendish Laboratory and acknowledges support from the Engineering and Physical Sciences Research Council. T.D. is a Royal Society University Research Fellow.
Appendix A: Numerical Model
Active Amplifier.
Numerical results are solutions of the complex differential equation
τ dZ/dt = −(ɛ − 2πiτf_{c})Z − |Z|^{2}Z + F̃(t),
where τ is a time scale and the Hopf bifurcation occurs for ɛ = 0. We use frequencies for which f_{1}/Δf is integer and define the response X = x_{0}Re(Z). The amplitudes X_{f} = 2Δf ∫ dt X(t)e^{−2πift}, obtained by a Fourier transform of the limit cycle, satisfy Eqs. 1, 2, A4, and A5 with α = (2πiτ/x_{0})(F_{f}/F̃_{f}) and B = B̄/2 = C = α/(2πiτx_{0}^{2}). The active bandwidth Δf_{a} is defined as the value of |f − f_{c}| for which the single-frequency response falls to half the peak amplitude. It varies with force, Δf_{a} ∼ |F_{f}|^{2/3}: comparison with the experimental basilar membrane response (4) suggests that Δf_{a}/f_{c} = 0.01, 0.1, and 1 correspond, respectively, to sound pressure levels 10, 40, and 70 dB. Data in Figs. 1 and 2 were obtained with τf_{c} = 1, x_{0} = 1, slightly on the oscillating side of the bifurcation, ɛ = −10^{−3}. In Fig. 2, F̃_{f1} = F̃_{f2} = 0.5, giving Δf_{a}/f_{c} ≃ 1.22 (a high-level stimulus). In Figs. 3 and 4, a moderate stimulus was used, such that Δf_{a}/f_{c} ≃ 0.05. Data in Fig. 5 were obtained by using stimuli of slightly higher amplitude, corresponding to Δf_{a}/f_{c} = 0.1.
Passive Prefilter.
Results in Figs. 3 and 4 were obtained by using a passive prefilter χ(f), which multiplies F_{f} before excitation of the Hopf resonance. This is equivalent to the coefficients α and B in Eqs. 1 and 2 having the functional form χ^{−1}(f). The form χ(f) = [(f − f_{c})^{2}/(Δf_{p})^{2} + 1]^{−1} with bandwidth Δf_{p}/f_{c} = 0.15 is a fair approximation of the postmortem amplitude response of the basilar membrane (3). The prefilter suppresses two-tone interferences for Δf > Δf_{p}. It does not affect the relative amplitudes of DPs.
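As a small worked example, the prefilter formula above can be evaluated directly. The sketch uses the stated form χ(f) = [(f − f_{c})^{2}/(Δf_{p})^{2} + 1]^{−1} with Δf_{p}/f_{c} = 0.15; the function name and the sample frequencies are illustrative choices, not taken from the paper.

```python
def prefilter_gain(f, f_c, bandwidth_ratio=0.15):
    """Passive prefilter chi(f) = 1 / ((f - f_c)^2 / df_p^2 + 1), df_p = bandwidth_ratio * f_c."""
    delta_f_p = bandwidth_ratio * f_c
    return 1.0 / (((f - f_c) / delta_f_p) ** 2 + 1.0)


if __name__ == "__main__":
    f_c = 1.0  # characteristic frequency of the oscillator (arbitrary units)
    for f2 in (1.0, 1.05, 1.15, 1.3, 1.5):
        # Effective force on this oscillator is chi(f2) times the applied force.
        print(f"f2/f_c = {f2:.2f}  chi = {prefilter_gain(f2, f_c):.3f}")
```

The gain is 1 at the characteristic frequency, falls to 1/2 at a detuning of Δf_{p}, and drops rapidly beyond it, so a second tone with Δf > Δf_{p} reaches the oscillator strongly attenuated.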
Neural Response.
Data in Figs. 4 and 5 were obtained by using a set of 100 oscillators with characteristic frequencies in the range f_{c}/f_{1} = 0.5–1.5. A neural spike was elicited by an oscillator every time its response X traversed a given threshold. Histograms of interspike intervals were constructed, averaging over all positive thresholds for each oscillator and then summing over all oscillators.
Appendix B: Generic Two-Tone Distortions Near a Hopf Bifurcation
In the presence of a stimulus
F(t) = Re[F_{f1}e^{2πif_{1}t} + F_{f2}e^{2πif_{2}t}] [A1]
containing two different frequencies f_{1} and f_{2}, the response of a dynamical system close to a Hopf bifurcation contains all Fourier amplitudes with frequencies f = nf_{1} + mf_{2}, where n and m are positive or negative integers. For simplicity, we choose stimuli with commensurate frequencies, for which f_{1}/Δf is integer and the response is given by
X(t) = Re ∑_{k} X_{f_{k}}e^{2πif_{k}t} [A2]
with f_{k} = f_{1} + (k − 1)Δf.
Close to the Hopf bifurcation and for small amplitudes X_{f}, we can write a general expansion of the form
F_{f} = 𝒜(f)X_{f} + ∑_{f′,f″} ℬ(f, f′, f″)X_{f′}X_{f″}X_{f−f′−f″} + O(X^{5}). [A3]
Throughout this paper, we ignore quadratic terms that occur if the symmetry X → −X of the active system is broken; they renormalize the coefficients B and B̄ of the cubic terms in Eq. A4 below and are only involved in the generation of higher harmonics and difference tones. For small stimulus and large Δf, the dominant terms in the expansion (A3) for the response at frequencies f_{1} and f_{2} are given by
F_{f1} ≃ A(f_{1})X_{f1} + B(f_{1})|X_{f1}|^{2}X_{f1} + B̄(f_{1}, f_{2})|X_{f2}|^{2}X_{f1}
F_{f2} ≃ A(f_{2})X_{f2} + B(f_{2})|X_{f2}|^{2}X_{f2} + B̄(f_{2}, f_{1})|X_{f1}|^{2}X_{f2} [A4]
where A(f) = 𝒜(f), B(f) = ℬ(f, f, f) and B̄(f_{1}, f_{2}) = 2ℬ(f_{1}, f_{1}, f_{2}). We choose one of the stimuli to be at the characteristic frequency, f_{1} = f_{c}, therefore A(f_{1}) = 0. For F_{f1} = F and F_{f2} = 0, it follows that X_{f2} = 0 and the nonlinear single-tone response |X_{f1}| ≃ |F|^{1/3}|B|^{−1/3} is recovered. The stimulus F_{f2} generates X_{f2}, which in the linear regime obeys X_{f2} ≃ F_{f2}/(αΔf). The mode X_{f2} creates an effective linear term for X_{f1} with A_{eff} ≃ B̄|X_{f2}|^{2}. The mode X_{f1} therefore behaves as if the system were mistuned and displays a suppressed response that is linear for small F_{f1}. Similarly, X_{f1} creates an effective linear term that renormalizes A(f_{2}) and suppresses the response X_{f2}.
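The effect of the effective linear term can be illustrated by solving the f_{1} equation numerically. The sketch below is an illustration under stated assumptions rather than the authors' calculation: it takes X_{f2} from the linear estimate |X_{f2}| ≃ |F_{f2}|/(|α|Δf), sets B̄ = 2B as in the simple model of Appendix A, and assumes B and B̄ are real and positive so that the two terms add in phase, giving (A_{eff} + B|X_{f1}|^2)|X_{f1}| = |F_{f1}| with A_{eff} = B̄|X_{f2}|^2. Function name and parameter values are hypothetical.

```python
def suppressed_amplitude(force1, force2, delta_f, alpha_abs=1.0, b_abs=1.0):
    """|X_f1| at f_1 = f_c in the presence of a second tone.

    Assumptions (for illustration): linear estimate for X_f2, B-bar = 2B,
    and real positive B so that A_eff and B|X|^2 add in phase.
    """
    x2 = force2 / (alpha_abs * delta_f)   # linear off-resonance response
    a_eff = 2.0 * b_abs * x2**2           # effective linear coefficient
    lo, hi = 0.0, (force1 / b_abs) ** (1.0 / 3.0)  # unsuppressed value is an upper bound
    for _ in range(200):                  # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if (a_eff + b_abs * mid**2) * mid < force1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for force1 in (1e-4, 2e-4, 4e-4):
        free = suppressed_amplitude(force1, 0.0, delta_f=1.0)
        masked = suppressed_amplitude(force1, 1.0, delta_f=1.0)
        print(f"F1={force1:.0e}  alone={free:.3e}  with second tone={masked:.3e}")
```

Without the second tone the cube-root law is recovered; with it, the response is much smaller and scales linearly with F_{f1}, as expected for a mistuned oscillator.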
Distortion products result from nonlinear terms X_{f1}^{2}X^{*}_{f2} and X_{f2}^{2}X^{*}_{f1}, which couple to the Fourier modes with frequencies 2f_{1} − f_{2} and 2f_{2} − f_{1}. For example, in the absence of a stimulus of frequency 2f_{1} − f_{2}, i.e., F_{2f1−f2} = 0, the corresponding amplitude obeys
0 ≃ A(2f_{1} − f_{2})X_{2f1−f2} + CX_{f1}^{2}X^{*}_{f2} [A5]
where C = ℬ(2f_{1} − f_{2}, f_{1}, −f_{2}). This leads to a DP amplitude |X_{2f1−f2}| ≃ |C/(αΔf)| |X_{f1}^{2}X_{f2}|. This mode together with X_{f1} generates, via the same coupling, the DP at 3f_{1} − 2f_{2}. Recursively, we therefore obtain a hierarchy of DPs with frequencies f_{k} = f_{1} + (k − 1)Δf, whose amplitudes decay exponentially according to Eq. 3 with λ ≃ ln(7Δf/12Δf_{a}) for Δf ≫ Δf_{a}.
For smaller Δf, the response involves a large number of terms of the expansion (A3) and deviations from a pure exponential appear. An interesting limit occurs for Δf → 0 where the linear coefficients A(f_{k}) ≃ αΔf(k − 1) and the frequency dependence of ℬ(f, f′, f″) can be neglected for a large number of modes around f_{1}. With this approximation Eq. A3 becomes
F_{f_{k}} ≃ B ∑_{l,m} X_{f_{l}}X_{f_{m}}X^{*}_{f_{l+m−k}} [A6]
which corresponds to X(t) ≃ B^{−1/3}F^{1/3}(t). The spectrum of X(t) exhibits, for large k, a power law decay of DPs, |X_{f_{k}}| ∼ |k − 3/2|^{−v}, centered around the critical frequency. This can be demonstrated explicitly in the case F_{f1} = F_{f2} = F, for which the response to a two-tone stimulus for small Δf is X(t) ≃ (2F/B)^{1/3}cos^{1/3}(2π(f_{1} + Δf/2)t)cos^{1/3}(πΔft). The spectrum X_{f_{k}} in this limit is therefore simply given by the convolution of the spectrum C_{n} with itself, where C_{n} are the Fourier components of C(t) ≡ cos^{1/3}(2πft) = ∑_{n}C_{n}e^{2πinft}. These Fourier components decay for large n as |C_{n}| ∼ |n|^{−v} with v = 4/3. This power law reflects singularities of dC/dt. Indeed, at the zeros t_{0} for which C(t_{0}) = 0, C(t) ∼ (t − t_{0})^{1/3}, so that dC/dt ∼ (t − t_{0})^{−2/3} diverges with a power law that determines the exponent v.
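The exponent v = 4/3 for the Fourier components of cos^{1/3} can be checked numerically. The sketch below samples one period of the real cube root of a cosine, takes its discrete Fourier transform, and fits the slope of log|C_{n}| against log n over odd harmonics (the even ones vanish by symmetry). The sample count and fitting range are arbitrary choices made for the illustration, and the fit is only a rough numerical check.

```python
import numpy as np


def cube_root_cosine_coefficients(n_samples=2**20):
    """Fourier coefficients C_n (n >= 0) of the real cube root of cos over one period."""
    phase = 2.0 * np.pi * np.arange(n_samples) / n_samples
    waveform = np.cbrt(np.cos(phase))   # signed real cube root
    return np.fft.rfft(waveform) / n_samples


def fitted_exponent(coeffs, n_min=101, n_max=1001):
    """Least-squares estimate of v in |C_n| ~ n^(-v), using odd n only."""
    n = np.arange(n_min, n_max + 1, 2)
    slope, _intercept = np.polyfit(np.log(n), np.log(np.abs(coeffs[n])), 1)
    return -slope


if __name__ == "__main__":
    v = fitted_exponent(cube_root_cosine_coefficients())
    print(f"fitted exponent v = {v:.3f} (expected 4/3 = {4.0 / 3.0:.3f})")
```

The cube-root singularity at each zero crossing is what sets the slow algebraic decay; a smooth waveform would instead show Fourier components that fall off faster than any power of n.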
Abbreviation
DP, distortion product
 Received September 28, 2000.
 Accepted May 23, 2001.
 Copyright © 2001, The National Academy of Sciences
References
1. Helmholtz H L F
2. …
3. …
4. …
5. …
6. Robles L, Ruggero M A, Rich N C
7. Ruggero M A, Robles L, Rich N C
8. …
9. …
10. …
11. Hudspeth A J, Choe Y, Mehta A D, Martin P
12. …
13. Gold T
14. Camalet S, Duke T, Jülicher F, Prost J
15. …
16. Zwicker E, Fastl H
17. …
18. …
19. von Békésy G
20. Russell I J, Nilsen K E
21. Rose J E, Brugge J F, Anderson D J, Hind J E
22. Evans E F
23. …
24. …
25. …
26. …
27. …
28. Rose J E, Brugge J F, Anderson D J, Hind J E