Brain potentials to native phoneme discrimination reveal the origin of individual differences in learning the sounds of a second language

Edited by Willem J. M. Levelt, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, and approved September 4, 2008
October 21, 2008
105 (42) 16083-16088


Human beings differ in their ability to master the sounds of their second language (L2). Phonetic training studies have proposed that differences in phonetic learning stem from differences in psychoacoustic abilities rather than speech-specific capabilities. We aimed at finding the origin of individual differences in L2 phonetic acquisition in natural learning contexts. We consider two alternative explanations: a general psychoacoustic origin vs. a speech-specific one. For this purpose, event-related potentials (ERPs) were recorded from two groups of early, proficient Spanish-Catalan bilinguals who differed in their mastery of the Catalan (L2) phonetic contrast /e-ε/. Brain activity in response to acoustic change detection was recorded in three different conditions involving tones of different length (duration condition), frequency (frequency condition), and presentation order (pattern condition). In addition, neural correlates of speech change detection were also assessed for both native (/o/-/e/) and nonnative (/o/-/ö/) phonetic contrasts (speech condition). Participants' discrimination accuracy, reflected electrically as a mismatch negativity (MMN), was similar between the two groups of participants in the three acoustic conditions. Conversely, the MMN was reduced in poor perceivers (PP) when they were presented with speech sounds. Therefore, our results support a speech-specific origin of individual variability in L2 phonetic mastery.
Learning a nonnative language sound system is notoriously difficult and often results in the foreign accent that characterizes nonnative speakers. This difficulty in learning a new speech sound system also has consequences for comprehension of the nonnative language. For example, most Japanese native speakers may consider that the English words /rock/ and /lock/ are the same word (1), because the Japanese speech-sound system does not distinguish between the phonemes /r/ and /l/. However, individuals differ significantly in the degree to which they master a nonnative phonetic code, and factors such as the age of acquisition, the amount of exposure, and motivational constraints have a crucial role in final phonetic attainment (2). Despite very early, extended exposure to an L2, many individuals continue to have considerable difficulties in the perception and production of some foreign sounds (3), whereas other nonnative speakers cannot be distinguished from native speakers (4). Discovering the origin of such individual differences is crucial for predicting success in L2 acquisition and for designing learning protocols that maximize the success of L2 learning. The goal of the present study is to advance our knowledge of the origin of this individual variability. Specifically, we assess whether such individual differences stem from differences in domain-general psychoacoustic processes or, rather, from differences in specific speech perception abilities.
One prolific line of research has explored the neural differences between individuals trained to differentiate a difficult Hindi retroflex phonetic contrast (5–7). The results showed significant neuroanatomical and functional differences between fast and slow learners involving auditory (including Heschl's gyrus) and parieto-occipital areas, especially in the left hemisphere. Nevertheless, these studies have not explored individual differences in the final attainment of phoneme perception, but differences between fast and slow learners. Hence, the individual differences observed could be revealing not only differences in perceptual abilities, but also differences in the ease with which the training task is performed (5–7). In fact, in these studies, most individuals managed to perceive the contrast at the end of the training period. Therefore, the question still remains of why in real life, with extended exposure, some individuals attain native-like performance and others behave as if they were “deaf” to some foreign contrasts. Also, it is difficult to extrapolate results from laboratory learning to real-life contexts, because the two settings differ in many fundamental ways and it is possible that, in a laboratory setting, different brain mechanisms are recruited from those used in a real-life situation. For example, in training studies, participants learn to discriminate new nonnative contrasts under strict laboratory conditions (quiet environments, attention focused on the stimuli) with carefully selected stimuli (highly simplified consonant-vowel sequences, synthesized speech) and procedures (the use of reward or feedback during training), and in relatively short periods of training [e.g., <30 min (ref. 5–7)].
In contrast, in natural situations, L2 learning is achieved through exposure to very varied stimuli (different speakers, rates of speaking, dialectal variation), without explicit attention being paid to the subtle physical differences entailed in phoneme contrasts, and across long-term exposures (usually spanning several years). In the present study, we explore the origin of the differences between good perceivers (GP) and PP of L2 phonemes who have learned the L2 in a natural environment.
To address this issue, it is necessary to compare GP and PP from the same population in which individuals have not only been extensively exposed to an L2, but this L2 learning has taken place through social interaction maximally equivalent to the learning of the native language (L1). Also, we will need an L2 phonetic contrast with a high degree of individual variability in its final attainment. The ideal situation is to test individuals who were born, grew up, and are actually living in a fully bilingual society. This situation can be found in the Barcelona area of Spain, where two languages, Catalan and Spanish, are coofficial and widely spoken, and where there is a Catalan-specific vocalic contrast that Spanish natives find very hard to perceive.
Previous research has shown that it is very difficult to learn a new phonetic contrast when the native language has a single phoneme category falling approximately in between (8). This is, for example, the problem that Japanese listeners must face when learning the English /r-l/ contrast (9). Spanish natives encounter a parallel acoustic-phonetic configuration when they have to learn the contrast between two Catalan midfront vowels (/e-ε/), given that Spanish only has one midfront vowel (/e/) between the two Catalan vowels [Catalan and Spanish are Romance languages differing in their phonetic repertoires (10)]. In previous studies, we observed large individual differences in the performance of Spanish natives on this vowel contrast (11–14). In the present study, two groups of Spanish-Catalan bilinguals were selected, who were maximally different in their capacity to perceive this L2 contrast. All participants had spent the first years of their lives as Spanish monolinguals and from the age of four they were continuously exposed to Catalan. Participants (n = 31) were selected from a group of 126 individuals as a function of their performance on three different behavioral tasks testing their perception of the Catalan-specific contrast /e-ε/. Participants in the GP group (n = 16) performed in all three tasks within the range of Catalan natives, and participants in the PP group (n = 15) performed below the range of Catalan natives in all three tasks.
To evaluate the neural correlates of acoustic and speech stimuli discrimination, participants' MMN was measured (15, 16). The MMN is elicited when the auditory perceptual system detects a mismatch between a neural representation of a frequently repeated stimulus (the standard) and a stimulus deviating in at least one parameter (the deviant). This ERP component peaks between 100 and 250 ms with a negative fronto-central scalp distribution. Importantly, the amplitude of the MMN is directly related to the magnitude of the perceived change and, hence, it is considered a measure of individual auditory discrimination accuracy (16, 17). In addition, because the MMN is elicited without participants' awareness of the change, it is not influenced by engagement of cognitive processes related to task demands or strategies. The MMN is sensitive to changes both in tones and speech sounds. It correlates with behavioral differences between GP and PP of sounds (18, 19), with nonnative language experience (20–22), and with training (23).
Also, the neural activity underlying the MMN is attributed to two sets of neural generators: a superior temporal and a frontal generator (15). The former is associated with processing the auditory sensory input against a formed memory trace, whereas the latter is related to an involuntary attention switch toward a detected change in the auditory input (24–27). Even though ERPs cannot directly measure the activity of the MMN generators, several studies indicate that their activity can be inferred from the amplitude and latency of the MMN (28, 29). Thus, the MMN can not only provide real-time information on potential differences between GP and PP of nonnative phonetic contrasts when tested with acoustic and speech stimuli, but also contribute to differentiating between perceptual and attentional origins of such differences.
In the present ERP study, participants were instructed to ignore auditory stimulation while watching a silent movie. Acoustic discrimination accuracy was assessed in three conditions in which pure tones were presented: duration, frequency, and pattern conditions (see Table 1). In these three conditions, the experimental paradigm was designed to maximally tax the auditory perceptual system, in an effort to increase the likelihood of observing differences between participants. For this reason, stimuli were presented at a relatively low intensity and at a fast rate. In addition, the acoustic differences between the stimuli in the duration and frequency conditions were relatively small. In the duration and frequency conditions, participants' discrimination accuracy of duration and frequency changes was evaluated. We chose these parameters because they seem to be relevant for phonemic perception, as shown by several studies that link difficulties in duration and/or frequency coding and speech perception disabilities (30–32). In each condition, stimuli were presented following an oddball paradigm, in which a tone was presented frequently (standard), whereas three other tones deviating in one parameter (frequency or duration) were presented at a lower probability (deviants). The third acoustic condition, the pattern condition, was designed to assess participants' capacity to extract patterns from an auditory scenario. This ability is relevant for language perception because speech segmentation makes use of distributional information to isolate words (33, 34). In this condition, the standard stimulus corresponded to a sequence of two alternating pure tones (differing in frequency), presented in a predictable manner. In some cases, the presentation pattern was violated by repeating one of the two tones. In that case, repetition of a tone was the deviant stimulus (see Table 1) (35).
Participants' speech perception capacities were assessed by presenting native and nonnative phonetic contrasts following an oddball paradigm (see Table 1) (16, 20–22). Two blocks were created, the native and the nonnative one, depending on the status of the deviant phonemes. In the native block, Spanish vowels were presented. The standard stimulus was the Spanish vowel /o/, and the deviant phoneme was the Spanish vowel /e/. In the nonnative block, the standard stimulus was the same vowel /o/ presented in the native block, whereas the deviant phoneme was the Finnish vowel /ö/.
Table 1.
Experimental design and stimuli
    Condition    Standard    Deviants
    Duration     200 ms      120 ms, 80 ms, 40 ms
    Frequency    1,000 Hz    1,030 Hz, 1,060 Hz, 1,090 Hz
Acoustic discrimination abilities were assessed in three conditions (duration, frequency, and pattern). Speech-specific discrimination was evaluated in two conditions (native and nonnative phoneme). Relevant features of standard and deviant stimuli are listed.
If L2 phonetic learning abilities rely on general acoustic processing, we would expect to find significant differences in the MMN amplitude between the two groups in all of the conditions, regardless of whether they involve speech or nonspeech stimuli. If, on the contrary, differences in L2 phonetic processing are caused by a speech-specific mechanism, differences in the MMN between GP and PP should be observed only when processing phonetic stimuli.


To test MMN reliability, the amplitudes of the MMN component at the frontal Fz electrode were compared with the zero level separately for each group of participants (see Table 2). A repeated-measures ANOVA was performed for each condition separately. Deviants eliciting a reliable MMN at least for one group were included in the analysis. The ANOVA performed for each condition included the factors “laterality” (right vs. left), “frontality” (frontal vs. central), “deviant type” (when necessary), and “group of participants” (GP vs. PP), unless specifically described. Only significant effects are reported.
Table 2.
The t test of the MMN mean amplitude for the duration, frequency, pattern, native phoneme, and nonnative phoneme conditions at Fz
Condition             Latency window, ms   Poor perceivers                     Good perceivers
                                           Mean amplitude, μV   t        df    Mean amplitude, μV   t         df
    120 ms            255–295              −0.47                1.54     12    −0.43                1.79      15
    80 ms             230–270              −0.06                0.15     12    −0.40                1.40      15
    40 ms             140–180              −0.60                1.81     12    −1.31                4.54**    15
    1,030 Hz
    1,060 Hz          180–220              −0.53                1.67     13    −0.69                1.73      14
    1,090 Hz          165–205              −0.64                3.04*    13    −0.72                2.14*     14
    native eMMN       80–120               −0.84                4.69**   13    −2.18                7.60**    15
    native lMMN       125–165              −0.85                2.78*    13    −2.20                12.40**   15
    nonnative eMMN    80–120               −1.11                3.73*    13    −1.71                4.35**    15
    nonnative lMMN    145–185              −1.58                4.93**   13    −1.94                4.60**    15
Significant differences: *, P < 0.05; **, P < 0.001. df, degrees of freedom.
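As a rough illustration of the reliability test summarized in Table 2, the following Python sketch averages a deviant-minus-standard difference wave over a latency window and compares per-participant means with zero using a one-sample t statistic. The function names, the inclusive window convention, and the assumption that each epoch runs from −100 to 500 ms at 500 Hz are illustrative choices, not details of the analysis software used in the study.

```python
import math
import statistics

def mean_amplitude(diff_wave, window_ms, srate_hz=500, baseline_ms=100):
    """Mean of a difference wave (microvolts) over an inclusive latency window.

    Assumes sample 0 lies baseline_ms before stimulus onset (an assumption).
    """
    ms_per_sample = 1000 / srate_hz
    start = int(round((window_ms[0] + baseline_ms) / ms_per_sample))
    stop = int(round((window_ms[1] + baseline_ms) / ms_per_sample))
    return statistics.fmean(diff_wave[start:stop + 1])

def one_sample_t(values):
    """Signed t statistic against zero and its degrees of freedom (n - 1)."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values) / standard_error, n - 1
```

With this convention a negative mean amplitude gives a negative t; Table 2 appears to list t values without sign.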

MMN Elicited in the Acoustic Conditions.

Fig. 1 shows the grand-average MMN difference waveforms for the statistically reliable MMNs in the acoustic conditions for the GP and PP groups. For the duration condition, the largest deviant (40 ms) elicited a reliable MMN for the GP and a marginal MMN for the PP (P = 0.095). Neither the difference between groups (F(1, 27) = 1.76, P > 0.05) nor any interaction was significant (all F(1, 27) < 1). Regarding the frequency condition, an MMN was observed for both groups of participants when presented with the largest deviant (1,090 Hz). The ANOVA did not show significant differences between groups (F(1, 27) < 1) or any group factor interactions. In the pattern condition, repetition of a tone elicited a reliable MMN for both groups of participants. The ANOVA did not show any significant differences between groups (F(1, 28) < 1). In short, no difference in the MMN amplitude was observed between GP and PP when presented with acoustic (nonspeech) materials.
Fig. 1.
Acoustic conditions. MMN obtained for the GP and PP groups in the duration condition for deviant 3 (A), frequency condition for deviant 3 (B), and pattern condition (C) at the Fz electrode. Latency windows of the MMN are indicated in gray.

MMN Elicited in the Speech Condition.

For both deviant phonemes (native /o/-/e/ and nonnative /o/-/ö/), two MMN subcomponents (early and late peaks) were identified. As indicated in Table 2, both subcomponents were significantly different from 0 for both groups of participants and for each deviant type. For that reason, the factor “subcomponent,” early (e)MMN and late (l)MMN, was added to the ANOVA. The analysis revealed a significant group effect (see Fig. 2): GP showed a larger MMN than PP (F(1, 28) = 6.32, P < 0.05; GP = −1.86 μV, PP = −1.11 μV). The factor group did not interact with any other factor. Also, the amplitude of the MMN was larger over central than over frontal electrodes (F(1, 28) = 5.33, P < 0.05; frontal (F3, F4) = −1.42 μV, central (C3, C4) = −1.55 μV).
Fig. 2.
Phoneme condition. MMN obtained for GP and PP for native (A) and nonnative (B) phoneme contrasts at the Fz and LM electrodes. Two MMN subcomponents were identified: eMMN (light gray) and lMMN (dark gray). Mean amplitude values for both groups at frontal (F3, F4) and supratemporal (LM, RM) electrodes (C) were averaged for both MMN subcomponents (eMMN, lMMN) in the native (Nph) and nonnative (n-Nph) phonetic blocks. Error bars indicate SE.
A subsequent ANOVA was carried out for the MMN amplitudes comprising the factors “MMN generator” [frontal: F3, F4 vs. supratemporal: left mastoid (LM), right mastoid (RM)], laterality (right: F4, RM vs. left: F3, LM), deviant type (native vs. nonnative), and subcomponent (eMMN vs. lMMN). The analysis revealed differences between groups of participants (F(1, 28) = 4.28, P < 0.05; GP = −0.58 μV, PP = −0.20 μV). As expected, there was a generator effect (F(1, 28) = 186.78, P < 0.001) due to the reversal of the frontal negativity in supratemporal sites (frontal = −1.42 μV, mastoids = 0.64 μV). This analysis also showed an interaction between group of participants and MMN generator (F(1, 28) = 6.52, P < 0.05). Post hoc analyses showed significant differences between the two groups at the frontal generator (t(238) = 5.03, P < 0.001; GP = −1.80 μV, PP = −1.03 μV), but not at the temporal one (t(238) = 0.33, P > 0.05; GP = 0.64 μV, PP = 0.63 μV) (see Fig. 2). That is, the amplitude of the MMN was larger for GP than for PP at the frontal sites, most likely reflecting the MMN frontal generator.


The main goal of this study was to explore the neural correlates of individual differences in perceiving L2 phonetic contrasts. More specifically, we tested whether such individual variability stems from differences in the general psychoacoustic abilities of the perceivers or is rather linked to their speech-specific abilities. To do so, two groups of early and highly proficient Spanish (L1)-Catalan (L2) bilinguals differing in their perception of an L2 phonetic contrast were presented with several acoustic and phonetic conditions. Regarding the general acoustic capabilities, no significant differences were obtained between the two groups of participants in any of the three acoustic conditions: frequency, duration, and pattern. This lack of differences suggests that both groups were equally skilled at processing acoustic (nonspeech) material. Importantly, the lack of reliable MMN signatures for less deviant stimuli both in the duration and frequency conditions shows that our paradigm was good at examining the limits of the participants' auditory system, because both rougher and finer discriminatory abilities were evaluated. The sensitivity of the paradigm to test the limits of the participants' auditory system is an important feature of our design. In the duration condition, the MMN elicited by the largest deviant was numerically larger for GP than for PP. However, the fact that this difference between groups is far from significant, together with the use of a sensitive design, suggests that the groups do not differ in their discriminatory abilities for this type of material.
The absence of differences between GP and PP in the three acoustic domains suggests that the perceptual analysis of simple sound features and their representation in neural memory traces is not at the basis of the behavioral differences between the GP and PP. Hence, the hypothesis that differences in general psychoacoustic abilities are at the basis of individual differences in learning phonemic contrasts is not supported by our data (5). A different picture emerges when the speech auditory capacities are tested.
Significant differences between GP and PP were found when the two groups were presented with speech sounds: GP showed larger MMN responses to phonetic stimuli (both native /o/-/e/ and nonnative /o/-/ö/) than PP. This result reveals differences in the sensitivity of these individuals to processing phonetic contrasts, suggesting, contrary to training studies (5–7), as discussed below, a speech-specific origin of the individual variability in L2-phoneme mastery.
Besides this general result, another interesting difference between the two groups emerged. The difference in the amplitude of the MMN between the groups was present at frontal electrodes, but absent at supratemporal ones. The activity of the MMN generators cannot be directly assessed by ERPs, but it can be inferred from the amplitude and latency of the MMN (28, 29). Hence, the differences between GP and PP seem to rely on distinct activity of the frontal MMN generators. This observation is important, because it can shed some light on the origin of the observed individual differences. As mentioned, a functional dissociation is proposed for the temporal and frontal generators of the MMN. The temporal generator is associated with sensory processing and the comparison of sensory information with memory representations, whereas the frontal generator is associated with the triggering of involuntary attention (36, 37). According to this model, the lack of differences between the two groups at temporal electrodes would reveal that both groups are equally able to represent the phonetic auditory sensory information and to integrate this information into memory representations. However, given the differences at frontal sites, the two groups would differ in the way the disparity between an incoming mismatching phonetic stimulus and the standard phonetic neural representation triggers involuntary attention. Thus, the two groups appear to differ in the way their perception system is able to extract relevant features of speech sounds, because GP seem particularly good at marking the deviant features of phonemes as salient information.
How can this failure prevent PP from perceiving (and learning) nonnative sounds? It is proposed that the capacity to behaviorally discriminate between sounds depends on two stages (16, 23). The first one (reflected by the MMN) is the capacity of the perceptual system to automatically generate a neural signal of stimulus change. The second one is the capacity of higher cognitive processes to “read” the neural signal and eventually to create new perceptual categories. Given these assumptions, one could argue that PP fail to boost the neural code produced by the processing at the temporal areas, preventing the second stage from taking place.
Our results are also relevant beyond the L2-acquisition context, in that they show striking individual differences in native phonetic discrimination. To our knowledge, this is the first experimental study observing individual variability in native vowel perception in a totally normal population. Although there are some studies revealing native phonetic deficits in language processing pathologies, the differences between our two populations are of a different nature. The crucial difference lies in the fact that perceptual acoustic deficits are also observed in language pathologies (38). Conversely, our PP participants do not have any sensory deficits (as far as the MMN can tell), but they differ in the relevance the speech perception system gives to the stimuli.
The cooccurrence of subtle native phonetic deficits with poor L2 phonetic mastery has already been suggested (39). In addition, based on several neuroimaging and neurophysiological studies about the neural organization of language, Perani and Abutalebi (40) claim that “L2 seems to be acquired through the same neural devices responsible for L1 acquisition.” In the same vein, Golestani and Zatorre (6) argue that “the successful learning of a nonnative phonetic contrast results in the recruitment of the same areas that are involved during the processing of native contrasts.” Hence, the relationship between L1 and L2 proficiency might lie in the use of shared neural mechanisms. In accordance with this proposal, our findings suggest that native and nonnative phonetic processing is sustained by the same speech ability, the resources or efficiency of which vary across individuals.
The apparent discrepancies between our results and those of the training studies reviewed in the introduction (5–7) merit consideration. The main conclusion of these studies was that faster learners show better acoustic processing of rapidly changing stimuli than slower learners. As described, the ways phonemes are learned in natural settings and in laboratory training are very different. For example, differential activation of striatal regions has been shown in learning nonnative contrasts (the /r-l/ contrast in Japanese natives) for training involving mere passive observation compared with feedback (41). This result is evidence that different learning mechanisms are recruited depending on the learning context.
Another aspect that differentiates our study from the training ones reviewed in the introduction is the type of phonemes studied. The Golestani (5) and Tremblay (23) studies analyzed the differences in learning plosive consonants. The acoustic differences between plosives are found in rapid changes in the spectral domain. Vowels, however, are characterized as relatively stable (steady-state frequency patterns). Given the specialization of the left auditory cortex in rapidly changing nonspeech stimuli (42, 43), it is not inconceivable that the use of this type of phoneme increased the chances of observing a relevant role of temporal structures in the Golestani (5) and Tremblay (23) studies. Still, it has to be kept in mind that our participants were not randomly selected (like those studied in the training studies), but carefully chosen according to their L2 phonetic perception, so they do not represent the average normal listener but rather the two extremes of normality. It is an open question whether equivalent results would be obtained if a population like the one we studied were preselected on the basis of their plosive discrimination performance.
In summary, the results obtained in the present study indicate that poor L2 contrast perception is correlated with both native and nonnative phoneme discriminatory abilities. Consequently, native phonetic capabilities may predict successful learning of a new phonetic system.


Selection of the Experimental Sample.

To select the participants, 126 healthy Spanish (L1)-Catalan (L2) bilinguals were tested in three behavioral tasks. Participants were early bilinguals who had lived all their lives in a fully bilingual society, Catalonia. As a result of this early and extensive exposure to both languages, they became highly skilled bilinguals. All participants were first exposed to their L2 at the age of four at the latest (when mandatory schooling started) and had equivalent language experience in both languages, as assessed by a questionnaire of language use (44). They were undergraduate students from the University of Barcelona. They received course credit for their participation.
All participants performed three behavioral tasks designed to evaluate their ability to perceive the Catalan /e/-/ε/ vowel contrast. This Catalan-specific vowel contrast is very difficult for Spanish native speakers to discriminate (11–14, 45, 46). The three tasks were an identification task (12), a gating task (46), and a lexical decision task (13). Each task was intended to assess different aspects of phonological knowledge. From the performance of a native group, a cutoff point was calculated to establish native performance (the native group's mean minus 2.5 times its standard deviation). For a detailed description of the three tasks and selection criteria see ref. 3.
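The cutoff rule and the group definitions given in the introduction (GP within the native range in all three tasks, PP below it in all three) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that higher scores mean better performance and that the sample standard deviation is used; neither detail is stated in the text.

```python
import statistics

def native_cutoff(native_scores, k=2.5):
    """Cutoff for native-like performance: native mean minus k standard deviations."""
    # statistics.stdev is the sample SD; the source does not say which SD was used.
    return statistics.fmean(native_scores) - k * statistics.stdev(native_scores)

def classify(scores, cutoffs):
    """Label a participant from per-task scores and per-task native cutoffs.

    Returns "GP" if every task is at or above its cutoff, "PP" if every task
    is below its cutoff, and None for mixed profiles (not selected).
    """
    within_native_range = [s >= c for s, c in zip(scores, cutoffs)]
    if all(within_native_range):
        return "GP"
    if not any(within_native_range):
        return "PP"
    return None
```

Returning None for mixed profiles reflects the fact that only 31 of the 126 screened individuals were retained.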

Experimental Groups.

From this population, two groups of participants were selected according to their performance in the behavioral tasks. The group of so-called GP was composed of 16 participants (12 females, age range 20–26), who scored within the performance range of a group of natives in all three tasks. The group of so-called PP included 15 participants (15 females, age range 20–24), who performed below natives in all three tasks; t test comparisons for each task separately confirmed that the two groups differed in their performance (P < 0.001 for all).
Participants were paid for their participation. They all signed the corresponding written informed consent. The Edinburgh Handedness Inventory (47) was administered to participants; all of them but one from the GP group were right-handed. To check the left lateralization of the language areas, participants were administered a dichotic listening test (48). The results showed that for all participants language was lateralized to the left hemisphere. None of the participants reported having any hearing or language difficulties or having received specific musical training.

Stimuli and Procedure.

Participants were presented with a protocol to evaluate central sound representation (49) that comprised three conditions tapping general acoustic perception (duration, frequency, and pattern conditions) and one tapping specific language perception (phonetic condition) (see Table 1).
In the duration condition, the stimuli were pure tones of 1,000 Hz. The duration of the standard tone was 200 ms (including 10 ms of rise/fall times) and the durations of the three deviant tones were 120, 80 and 40 ms.
In the frequency condition, the auditory stimuli were pure tones of 50 ms (including 10 ms of rise/fall times). The frequency of the standard tone was 1,000 Hz, whereas the frequencies of the deviant tones were 1,030, 1,060, and 1,090 Hz.
In the duration and frequency conditions, the probability of the standard tone was always 0.8 (600 standard tones per block) and, for each deviant, the probability was 0.066 (50 presentations of each deviant tone per block). Tones were presented in random order with the restriction that the first five tones of each block were always standards and that at least one standard tone was presented between two deviants. The stimulus onset asynchrony (SOA) was 314 ms.
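One way to generate a block that satisfies these constraints is sketched below in Python. This is a minimal illustration, not the stimulus-delivery code used in the study: the labels "S", "D1", "D2", and "D3" and the slot-sampling approach are assumptions introduced here.

```python
import random

def oddball_sequence(n_standard=600, deviants=("D1", "D2", "D3"),
                     n_per_deviant=50, n_lead_in=5, seed=None):
    """Build one oddball block as a list of stimulus labels.

    Each deviant occupies a distinct slot immediately after some standard
    (from the n_lead_in-th standard onward), so the block starts with
    n_lead_in standards and no two deviants are ever adjacent.
    """
    rng = random.Random(seed)
    labels = [d for d in deviants for _ in range(n_per_deviant)]
    rng.shuffle(labels)
    # Slot k means "right after the k-th standard" (1-based).
    slots = sorted(rng.sample(range(n_lead_in, n_standard + 1), len(labels)))
    deviant_at = dict(zip(slots, labels))
    sequence = []
    for k in range(1, n_standard + 1):
        sequence.append("S")
        if k in deviant_at:
            sequence.append(deviant_at[k])
    return sequence

block = oddball_sequence(seed=0)
# Onset time of each tone in seconds, given the 314-ms SOA.
onsets = [i * 0.314 for i in range(len(block))]
```

With the default arguments the block holds 750 tones, so the standard has a probability of 600/750 = 0.8 and each deviant a probability of 50/750, about 0.067.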
In the pattern condition, 400 trains of tones were presented. Each train consisted of six alternating pure tones of either 500 or 1,000 Hz (2,400 tones altogether). Tones lasted 50 ms, including 10 ms rise/fall times. Tones within and between the trains were presented at a constant SOA of 128 ms. Stimulus trains were presented in a predictable way (ABABAB-BABABA-BABABA-ABABAB…), in which A represents the 500 Hz tone, B the 1,000 Hz tone, and the hyphen the boundary between trains. The deviant event was the repetition, at the start of a train, of the last tone presented in the preceding train (in this example, the first tone of the second and fourth trains).
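The deviant rule in the pattern condition can be made concrete with a short Python sketch. The helper names are hypothetical, and the sketch only identifies where deviants fall in a given series of trains; how often repetitions occurred is not specified here and is not modeled.

```python
def make_train(first_tone, length=6):
    """Return an alternating train such as 'ABABAB', starting with first_tone."""
    other = {"A": "B", "B": "A"}
    tones = [first_tone]
    while len(tones) < length:
        tones.append(other[tones[-1]])
    return "".join(tones)

def deviant_trains(trains):
    """Indices of trains whose first tone repeats the last tone of the previous train."""
    return [i for i in range(1, len(trains))
            if trains[i][0] == trains[i - 1][-1]]

# The example series from the text: ABABAB-BABABA-BABABA-ABABAB
example = [make_train("A"), make_train("B"), make_train("B"), make_train("A")]
print(deviant_trains(example))  # [1, 3]: the second and fourth trains open with a repetition
```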
Last, in the phonetic condition, participants were presented with the same synthesized phonemes used by Näätänen et al. (20). The only difference across the phonemes was the value of the second formant frequency (F2), whereas the F0 (105 Hz), F1 (450 Hz), F3 (2,540 Hz), and F4 (3,500 Hz) were kept constant. The standard phoneme during the two blocks was the vowel /o/ with an F2 of 851 Hz and a probability of 0.8. In the native block, the deviant phoneme was the vowel /e/, with an F2 of 1,940 Hz. In the nonnative block, the deviant phoneme was the Finnish vowel /ö/, with an F2 of 1,533 Hz. The formant values of the /o/ and /e/ stimuli, although based on the Finnish language, are similar to Spanish ones (50). The deviant stimuli had a probability of 0.2. Every block contained 500 stimuli (400 standards and 100 deviants) with a constant SOA of 488 ms. The duration of all phonemes was 200 ms including 10 ms of rise/fall times. The stimuli were presented at random, but the first five phonemes of a block were always standards, and there was at least one standard stimulus before every deviant one.
Sixteen presentation lists were created including two presentations of each condition. First, the four conditions appeared in a random order and, after a short break, they were repeated in the reverse order. Lists were counterbalanced between groups.
During the EEG recording, participants sat in a comfortable armchair in an electrically shielded soundproof room while watching a silent movie. The experimental session lasted ≈1 h, including a 10-min break. All of the stimuli were delivered binaurally through headphones (Sennheiser HD 435 Manhattan) at an intensity of 70 dB.

Electrophysiological Recording.

The ERPs were recorded from the scalp by using tin electrodes mounted in an electrocap (Electro-Cap International) and located at six standard positions [three frontal (F3, Fz, F4) and three central (C3, Cz, C4)] and the two mastoids (LM and RM). Eye movements were measured with electrodes attached to the infraorbital ridge and on the outer canthus of the right eye. The common EEG/electrooculogram (EOG) reference was attached to the tip of the nose. Electrode impedances were kept <5 kΩ. The electrophysiological signals were filtered on-line with a bandpass of 0.1–100 Hz and digitized at a rate of 500 Hz.
ERPs were averaged off-line for standard and deviant stimuli separately for each participant and condition. Eye movements were corrected by means of independent component analysis (ICA) implemented in the Brain Vision Analyzer software package (v. 1.05; Brain Products). Epochs with EEG exceeding ±100 μV at any channel, activity <0.5 μV, or voltage step/sampling >50 μV within intervals of 200 ms were automatically rejected off-line. Also, standard stimulus epochs occurring immediately after deviant stimulus epochs were excluded from the analysis. In all cases, epochs were 600 ms long, including a prestimulus baseline of 100 ms. The baseline was corrected, and a linear DC detrend procedure was performed on the individual segments. Participants with <70% artifact-free epochs in one stimulus type were excluded from the subsequent analysis of that particular condition. One participant (GP group) was excluded in the frequency condition, one participant (PP group) was excluded in the duration condition, and one participant (PP group) was excluded in all of the conditions. Individual ERPs were digitally band-pass filtered between 0.1 and 30 Hz (with a slope of 12 dB/oct).
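The rejection criteria above can be sketched in plain Python. This is not the Brain Vision Analyzer implementation, and it rests on one possible reading of the criteria, which is an assumption: the amplitude limit applies to every sample, the voltage step is the difference between successive samples, and "activity" is the max-min range within any 200 ms window. Function names are hypothetical.

```python
def reject_epoch(epoch, fs=500, amp_limit=100.0, min_activity=0.5,
                 max_step=50.0, window_ms=200):
    """Return True if any channel violates a criterion.

    epoch: list of channels, each a list of samples in microvolts.
    """
    win = int(round(window_ms * fs / 1000))  # 100 samples at 500 Hz
    for channel in epoch:
        # Absolute amplitude beyond +/- amp_limit.
        if any(abs(v) > amp_limit for v in channel):
            return True
        # Jump between successive samples larger than max_step.
        if any(abs(b - a) > max_step for a, b in zip(channel, channel[1:])):
            return True
        # Too little activity (assumed: range) in any sliding window.
        for start in range(len(channel) - win + 1):
            segment = channel[start:start + win]
            if max(segment) - min(segment) < min_activity:
                return True
    return False


def participant_included(n_clean, n_total, min_fraction=0.70):
    """Keep a participant with at least 70% artifact-free epochs."""
    return n_clean / n_total >= min_fraction


# A 600 ms epoch at 500 Hz has 300 samples per channel.
sawtooth = [[(i % 20) - 10.0 for i in range(300)]]
flat = [[0.0] * 300]
print(reject_epoch(sawtooth), reject_epoch(flat))
```

The inclusion check would be applied per stimulus type and condition, following the exclusion rule stated in the text.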

Data Analysis.

The MMN was identified in the difference waveforms obtained by subtracting the standard ERPs from those elicited by deviant stimuli. The MMN was measured for each participant group and condition as the mean amplitude in a 40 ms latency window centered on its maximum peak (Table 2). To test whether a significant MMN was elicited by deviant stimuli, one-sample t tests were carried out (separately for each group of participants) to compare the amplitudes of the MMN component at Fz against the zero level. Results are shown in Table 2.
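The measurement steps above (difference wave, peak, 40 ms mean amplitude, test against zero) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function names are hypothetical, and the 100–300 ms peak search range is an assumption made for the example; the text does not state which search range was used.

```python
import math
import statistics


def mmn_mean_amplitude(standard, deviant, fs=500, baseline_ms=100,
                       search_ms=(100, 300), window_ms=40):
    """Return (mean amplitude, peak latency in ms) of the MMN.

    standard, deviant: averaged ERPs at one electrode, same length,
    starting baseline_ms before stimulus onset.
    """
    diff = [d - s for d, s in zip(deviant, standard)]

    def to_index(ms):
        return int(round((ms + baseline_ms) * fs / 1000))

    lo, hi = to_index(search_ms[0]), to_index(search_ms[1])
    # The MMN is a negativity, so the peak is the most negative sample.
    peak = min(range(lo, hi + 1), key=lambda i: diff[i])
    half = int(round(window_ms / 2 * fs / 1000))
    segment = diff[peak - half:peak + half + 1]
    latency_ms = peak / fs * 1000 - baseline_ms
    return sum(segment) / len(segment), latency_ms


def one_sample_t(values):
    """t statistic and degrees of freedom for a test against zero."""
    n = len(values)
    t = statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))
    return t, n - 1


# Toy data: flat standard, deviant with a negative deflection at 150 ms.
standard = [0.0] * 300
deviant = [-max(0.0, 5.0 - 0.25 * abs(i - 125)) for i in range(300)]
print(mmn_mean_amplitude(standard, deviant))
```

The per-participant amplitudes at Fz would then be passed to `one_sample_t` separately for each group; obtaining a P value from the t statistic requires a t distribution, which is outside this sketch.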
A repeated-measures ANOVA was performed for each condition separately. Only deviants eliciting a reliable MMN for at least one group were included in the analysis. The factors laterality (left hemisphere: F3, C3; and right hemisphere: F4, C4), frontality (frontal location: F3, F4; and central location: C3, C4), and deviant type (when necessary) were included in the ANOVA as within-subjects factors, whereas group of participants (GP and PP) was the between-subjects factor.
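The electrode-to-factor coding described above can be made explicit with a short Python sketch. It only shows how each electrode maps onto the laterality and frontality levels and how marginal means per level could be computed for one participant; it does not perform the ANOVA itself. The names and the example amplitudes are hypothetical.

```python
# Each electrode carries one level of laterality and one of frontality.
ELECTRODE_FACTORS = {
    "F3": ("left", "frontal"),
    "F4": ("right", "frontal"),
    "C3": ("left", "central"),
    "C4": ("right", "central"),
}


def marginal_means(amplitudes):
    """Average MMN amplitude per factor level for one participant.

    amplitudes: dict mapping electrode name to mean amplitude (microvolts).
    """
    by_level = {}
    for electrode, levels in ELECTRODE_FACTORS.items():
        for level in levels:
            by_level.setdefault(level, []).append(amplitudes[electrode])
    return {level: sum(v) / len(v) for level, v in by_level.items()}


# Made-up amplitudes, for illustration only.
example = {"F3": -2.0, "F4": -1.0, "C3": -1.5, "C4": -0.5}
print(marginal_means(example))
```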
Because two peaks were identified in the MMN elicited by the deviant phonemes (eMMN and lMMN), the factor subcomponent (eMMN and lMMN) was also included in the analysis (together with the factors frontality, laterality, and group of participants). A second ANOVA was performed to compare the electrophysiological response elicited by the deviant phonemes at each MMN generator. Thus, the factor MMN generator (frontal: F3, F4 vs. supratemporal: LM, RM) was included in the analysis together with the factors laterality, subcomponent, and group of participants. Significance levels of the F ratios were adjusted with the Greenhouse–Geisser correction and the corrected P values are reported.


We thank X. Mayoral for his technical support and I. Ivanova and M. Gillon Dowens for their comments on previous versions of this manuscript. This work was supported by the Generalitat de Catalunya Grants SGR2005-00953, SGR2005-01025, and SEJ-2007-60751, and Project Consolider-Ingenio 2010 (CSD 2007-00012). B.D. was supported by a Spanish Government postgraduate fellowship.


W Strange, S Dittman, Effects of discrimination training on the perception of /r-l/ by Japanese adults learning English. Percept Psychophys 36, 131–145 (1984).
D Birdsong Second Language Acquisition and the Critical Period Hypothesis (Lawrence Erlbaum, Mahwah, NJ, 1999).
N Sebastián-Gallés, C Baus Twenty-first Century Psycholinguistics: Four Cornerstones, ed A Cutler (Erlbaum, New York), pp. 279–292 (2005).
T Bongaerts Second language acquisition and the Critical Period Hypothesis, ed D Birdsong (Erlbaum, Mahwah, NJ), pp. 133–159 (1999).
N Golestani, T Paus, RJ Zatorre, Anatomical correlates of learning novel speech sounds. Neuron 35, 997–1010 (2002).
N Golestani, RJ Zatorre, Learning new sounds of speech: Reallocation of neural substrates. NeuroImage 21, 494–506 (2004).
N Golestani, N Molko, S Dehaene, D LeBihan, C Pallier, Brain structure predicts the learning of foreign speech sounds. Cereb Cortex 17, 575–582 (2007).
CT Best, GW McRoberts, R Goodell, Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener's native phonological system. J Acoust Soc Am 109, 775–794 (2001).
H Goto, Auditory perception by normal Japanese adults of the sounds “l” and “r”. Neuropsychologia 9, 317–323 (1971).
P Ladefoged, I Maddieson The Sounds of the World's Languages (Blackwell Publishers, Oxford, 1996).
L Bosch, A Costa, N Sebastián-Gallés, First and second language vowel perception in early bilinguals. Eur J Cogn Psych 12, 189–222 (2000).
C Pallier, L Bosch, N Sebastián-Gallés, A limit on behavioral plasticity in speech perception. Cognition 64, B9–17 (1997).
N Sebastián-Gallés, S Echeverría, L Bosch, The influence of initial exposure on lexical representation: Comparing early and simultaneous bilinguals. J Mem Lang 52, 240–255 (2005).
N Sebastián-Gallés, A Rodríguez-Fornells, R de Diego-Balaguer, B Díaz, First- and second-language phonological representations in the mental lexicon. J Cogn Neurosci 18, 1277–1291 (2006).
R Näätänen, PT Michie, Early selective-attention effects on the evoked potential: A critical review and reinterpretation. Biol Psychol 8, 81–136 (1979).
R Näätänen, The perception of speech sounds by the human brain as reflected by the Mismatch Negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology 38, 1–21 (2001).
E Amenedo, C Escera, The accuracy of sound duration representation in the human brain determines the accuracy of behavioural perception. Eur J Neurosci 12, 2570–2574 (2000).
R Näätänen, E Schröger, S Karakas, M Tervaniemi, P Paavilainen, Development of a memory trace for a complex sound in the human brain. NeuroReport 4, 503–506 (1993).
H Lang, et al., Pitch discrimination performance and auditory event-related potentials. Psychophysiol Brain Res 1, 294–298 (1990).
R Näätänen, et al., Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385, 432–434 (1997).
S Nenonen, A Shestakova, M Huotilainen, R Näätänen, Speech-sound duration processing in a second language is specific to phonetic categories. Brain Lang 92, 26–32 (2005).
I Winkler, et al., Brain responses reveal the learning of foreign language phonemes. Psychophysiology 36, 638–642 (1999).
K Tremblay, N Kraus, T McGee, The time course of auditory perceptual learning: Neurophysiological changes during speech-sound training. NeuroReport 9, 3557–3560 (1998).
MH Giard, F Perrin, J Pernier, P Bouchet, Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology 27, 627–640 (1990).
R Näätänen, The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behav Brain Sci 13, 201–288 (1990).
C Escera, K Alho, I Winkler, R Näätänen, Neural mechanisms of involuntary attention to acoustic novelty and change. J Cogn Neurosci 10, 590–604 (1998).
E Yago, C Escera, K Alho, MH Giard, Cerebral mechanisms underlying orienting of attention towards auditory frequency changes. NeuroReport 12, 2583–2587 (2001).
P Paavilainen, et al., Evidence for the different additivity of the temporal and frontal generators of Mismatch Negativity: A human auditory event-related potential study. Neurosci Lett 349, 79–82 (2003).
T Rinne, K Alho, RJ Ilmoniemi, J Virtanen, R Näätänen, Separate time behaviors of the temporal and frontal Mismatch Negativity sources. NeuroImage 12, 14–19 (2000).
P Tallal, Auditory perception, phonics and reading disabilities in children. J Acoust Soc Am 62, S100 (1977).
B Wright, et al., Deficits in auditory temporal and spectral resolution in language-impaired children. Nature 387, 176–178 (1997).
T Baldeweg, A Richardson, S Watkins, C Foale, J Gruzelier, Impaired auditory frequency discrimination in dyslexia detected with mismatch evoked potentials. Ann Neurol 45, 495–503 (1999).
JR Saffran, RN Aslin, EL Newport, Statistical learning by 8-month-old infants. Science 274, 1926–1928 (1996).
JR Saffran, EL Newport, RN Aslin, Word segmentation: The role of distributional cues. J Mem Lang 35, 606–621 (1996).
M Atienza, et al., Effects of temporal encoding on auditory object formation: A Mismatch Negativity study. Cogn Brain Res 16, 359–371 (2003).
S Shalgi, LY Deouell, Direct evidence for differential roles of temporal and frontal components of auditory change detection. Neuropsychologia 45, 1878–1888 (2007).
LY Deouell, The frontal generator of the Mismatch Negativity revisited. J Psychophysiol 21, 188–203 (2007).
T Kujala, The role of early auditory discrimination deficits in language disorders. J Psychophysiol 21, 239–250 (2007).
R Sparks, L Ganschow, The impact of native language learning problems on foreign language learning: Case study illustrations of the linguistic coding deficit hypothesis. Mod Lang J 77, 58–74 (1993).
D Perani, J Abutalebi, The neural basis of first and second language processing. Curr Opin Neurobiol 15, 202–206 (2005).
E Tricomi, MR Delgado, BD McCandliss, JL McClelland, JA Fiez, Performance feedback drives caudate activation in a phonological learning task. J Cogn Neurosci 18, 1029–1043 (2006).
T Zaehle, T Wüstenberg, M Meyer, L Jäncke, Evidence for rapid auditory perception as the foundation of speech processing: A sparse temporal sampling fMRI study. Eur J Neurosci 20, 2447–2456 (2004).
MF Joanisse, JS Gati, Overlapping neural regions for processing rapid temporal cues in speech and nonspeech signals. NeuroImage 19, 64–79 (2003).
A Costa, M Hernández, N Sebastián-Gallés, Bilingualism aids conflict resolution: Evidence from the ANT task. Cognition 106, 59–86 (2008).
C Pallier, A Colomé, N Sebastián-Gallés, The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries. Psychol Sci 12, 445–449 (2001).
N Sebastián-Gallés, S Soto-Faraco, Online processing of native and non-native phonemic contrasts in early bilinguals. Cognition 72, 111–123 (1999).
RC Oldfield, The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia 9, 97–113 (1971).
E Azañón-Gracia, N Sebastián-Gallés, Dichotic listening test in Spanish: Pairs of disyllabic words. Rev Neurol 41, 657–663 (2005).
S Corbera, MJ Corral, C Escera, MA Idiazábal, Abnormal speech sound representation in developmental stuttering. Neurology 65, 1246–1252 (2005).
E Martínez Celdrán Fonética (Teide, Barcelona, 1984).

Information & Authors


Published in

Proceedings of the National Academy of Sciences
Vol. 105 | No. 42
October 21, 2008
PubMed: 18852470


Submission history

Received: May 23, 2008
Published online: October 21, 2008
Published in issue: October 21, 2008


Keywords

  1. mismatch negativity
  2. event-related potentials
  3. bilingualism



This article is a PNAS Direct Submission.



Begoña Díaz [email protected]
Grup de Recerca Neurociència Cognitiva, Parc Científic UB and Hospital Sant Joan de Déu (Edifici Docent), Santa Rosa 39-57, Esplugues, Barcelona 08950, Spain;
Departament de Psicologia Bàsica, Facultat de Psicologia, Universitat de Barcelona, Passeig de la Vall d'Hebron 171, Barcelona 08035, Spain;
Cristina Baus
Departamento de Psicología Cognitiva, Facultad de Psicología, Universidad de La Laguna, Campus Guajara S/N Tenerife 38205, Spain; and
Carles Escera
Cognitive Neuroscience Research Group, Department of Psychiatry and Clinical Psychobiology, Faculty of Psychology, University of Barcelona, Passeig Vall d'Hebron 171, Barcelona 08035, Spain
Albert Costa
Grup de Recerca Neurociència Cognitiva, Parc Científic UB and Hospital Sant Joan de Déu (Edifici Docent), Santa Rosa 39-57, Esplugues, Barcelona 08950, Spain;
Departament de Psicologia Bàsica, Facultat de Psicologia, Universitat de Barcelona, Passeig de la Vall d'Hebron 171, Barcelona 08035, Spain;
Núria Sebastián-Gallés
Grup de Recerca Neurociència Cognitiva, Parc Científic UB and Hospital Sant Joan de Déu (Edifici Docent), Santa Rosa 39-57, Esplugues, Barcelona 08950, Spain;
Departament de Psicologia Bàsica, Facultat de Psicologia, Universitat de Barcelona, Passeig de la Vall d'Hebron 171, Barcelona 08035, Spain;


To whom correspondence should be addressed. E-mail: [email protected]
Author contributions: B.D., C.E., and N.S.-G. designed research; B.D. and C.B. performed research; B.D. and C.E. analyzed data; and B.D., C.B., C.E., A.C., and N.S.-G. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
