Infants who are rarely spoken to nevertheless understand many words

Significance. Decades of research, largely conducted in Western, child-centered contexts, have highlighted the speech that parents direct to children as the primary driver of language development, as opposed to the speech that children overhear. This conclusion has driven interventions globally to encourage parents to speak more to their children. Yet there is dramatic cross-cultural variation in how much adults speak to infants, raising the question of how children robustly learn language across contexts. We show that Tseltal infants, who are rarely spoken to, show knowledge of common nouns, as well as knowledge of greeting terms they could only learn through overhearing. These findings suggest that, for some infants, learning from overhearing may be critical to developing language.


Supporting Information Text
More Information on Honorific Greetings. Greetings within this community follow a call-and-response format wherein adults exchange honorific terms based on sex, age, status, and sometimes kinship. Honorific greetings occur whenever community members encounter one another, including when passing in the road, when joining a shared car, or when calling from the edge of a property to announce their presence to its inhabitants. The core honorific terms are each two syllables long: me'tik (for an older woman), tatik (for an older man), kantsil (for a younger woman), and tatil (for a younger man), and are delivered with a dipping intonation with stress and variable degrees of lengthening on the second syllable (for sample recordings, see 1). Adults greet each person in a group individually; each recipient responds with the appropriate honorific using the same intonation. In greeting, honorific terms are also often produced at a high volume, as they are frequently used between community members greeting each other from a distance outdoors. As these honorifics are the basic address forms, they are also used without the drawn-out intonation whenever politely addressing a Tseltal person, and can be used to politely reference people. Tseltal honorifics are thus likely to be highly salient linguistic items, even for a learner with limited visual access and limited language knowledge. Critically, infants are excluded from the greeting routine: there is no honorific for addressing a baby. Thus, any recognition of these honorific greetings necessarily reflects language knowledge learned through overhearing.

Method
Participant Recruitment. To recruit eligible participants, a research assistant from the community and the first author called at family compounds within a two-hour hike of the testing location. Mothers with young infants were often summoned by family members or neighbors, and, if they expressed interest in the study, were typically scheduled to participate the following day.
When asked at this initial contact, mothers often provided an approximate age for their infants; they were encouraged to look for evidence of infants' exact birth dates (or as near an estimate as possible) before their scheduled participation. In cases where the child's birth date remained uncertain, as when the child was known to have been officially registered several weeks or months after their actual birth, a specific date of birth was estimated by triangulating the child's birth relative to those of infants born during the same period (mothers of similar-aged infants often remembered whether another infant from the same community was born before or after their own child). We sought to enroll approximately 30 infants, to match the sample size in Bergelson & Swingley (2012). We successfully recruited 33 infants, but only tested infants on the procedures reported in the main text if their caregivers reported that infants almost exclusively heard Tseltal at home and were not yet walking (the latter criterion reflecting prior ethnographic work linking the onset of walking to increases in child-directed language).
The consent procedure we used was the product of previously conducted informational interviews with mothers in the community, and provided a structured format for participants to specify their comfort level with different potential forms of data (e.g., text transcripts, voice recordings, photos and videos including their or their children's faces) and different potential audiences (e.g., community members, local versus foreign academics and students, etc.). Given that mothers' primary associations with having to sign a physical document (typically just with an "X") were the doctor's office and the bank (both contexts with amplified colonial power dynamics), consent for each task was given verbally (elicited in Tseltal by a local research assistant).
Paired-Picture Stimuli Preparation. All visual and auditory stimuli can be found at the online repository for this study (1, 2).
Experiment 1. Nouns in Experiment 1 were drawn from responses on initial interviews with mothers (N = 16) regarding Tenejapan infancy, including the first solid foods that infants eat, the beverages that they first sip, the animals that they are most frequently around, and the first recognizable words (and conventionalized animal sounds) that they eventually produce.
Responses were highly consistent across caregivers (who themselves were often reporting on observations and experiences across multiple children). We next collected multiple images for the nouns that had been repeatedly mentioned across caregivers and, with feedback from a local research assistant, narrowed the images down to 2-4 per noun. A sample of adults (N = 9) was then asked to name these images, and we used only those nouns and images that elicited the target noun as the first or second word that adults produced. Next, we excluded nouns whose images translated poorly to the digital displays, often because naturalistic examples of them were difficult to uniquely identify, had a scale that was difficult to convey without context, or were too dark, monochromatic, or low-contrast (e.g., chenek, "beans").
Following (3), we constructed candidate noun-pairs by pairing one animate (or animate-adjacent, in the case of tep, "shoe") and one inanimate noun, avoiding pairing nouns with matching onsets (e.g., wakax and waj), and trying where possible to match syllable counts and approximate levels of salience and/or frequency, informed by mothers' reports. For example, we paired the exciting and apparently early-produced ts'i' ("dog") with k'ajk' ("fire"), rather than with ja' ("water," a notably less thrilling referent). We used our set of named images to construct image-pairs that were visually equated in terms of area and brightness, which we then presented to infants (N = 2) without audio to ensure that infants looked at both images when they were presented side by side. Infants almost never looked at the images for tuts' ("spoon"; then paired with wakax, "cow"), and overwhelmingly looked at the images for karo ("car"; then paired with mayil, "squash"), leading us to pair wakax and ch'uhmte'e ("cow" and "squash" [chayote]), and to introduce tep ("shoe") as the distractor for karo. Looking-time preferences for this final set were measured with one infant before being finalized.
We recorded audio prompts for all four carrier phrases with each noun, before constructing a single ordered list of trials such that: (a) each noun appeared with two distinct carrier phrases, (b) each carrier phrase was used an equal number of times (n = 8 trials), and (c) adjacent trials neither used the same carrier phrase nor tested the same noun-pair.
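The three ordering constraints above can be checked mechanically. A minimal sketch in Python, using a hypothetical tuple encoding of trials and generic carrier-phrase labels p1-p4 (this is an illustration of the constraints, not the authors' stimulus-preparation code, and the miniature list below is much smaller than the real 32-trial design):

```python
from collections import Counter

def check_trial_order(trials, n_per_phrase):
    """Verify constraints (a)-(c) on an ordered trial list.

    `trials` is a list of (noun_pair, target_noun, carrier_phrase)
    tuples in presentation order (a hypothetical encoding).
    """
    # (a) each noun appears with two distinct carrier phrases
    phrases_by_noun = {}
    for _, noun, phrase in trials:
        phrases_by_noun.setdefault(noun, set()).add(phrase)
    if any(len(ps) != 2 for ps in phrases_by_noun.values()):
        return False
    # (b) each carrier phrase is used an equal number of times
    phrase_counts = Counter(phrase for _, _, phrase in trials)
    if any(c != n_per_phrase for c in phrase_counts.values()):
        return False
    # (c) adjacent trials share neither carrier phrase nor noun-pair
    for (pair1, _, phr1), (pair2, _, phr2) in zip(trials, trials[1:]):
        if pair1 == pair2 or phr1 == phr2:
            return False
    return True

# Miniature example: 2 noun-pairs x 4 trials each, 4 carrier phrases,
# each phrase used twice (the real design had 8 noun-pairs, 32 trials,
# and each of the four carrier phrases used on 8 trials)
trials = [
    ("alal-ixim", "alal", "p1"), ("ts'i'-k'ajk'", "ts'i'", "p2"),
    ("alal-ixim", "ixim", "p3"), ("ts'i'-k'ajk'", "k'ajk'", "p4"),
    ("alal-ixim", "alal", "p2"), ("ts'i'-k'ajk'", "ts'i'", "p1"),
    ("alal-ixim", "ixim", "p4"), ("ts'i'-k'ajk'", "k'ajk'", "p3"),
]
```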

Ruthe Foushee, Mahesh Srinivasan

Experiment 2. Basic honorific terms were identified via observation and consultation with local research assistants. We collected a set of potential target faces via online image searches, and culled it in collaboration with research assistants, before asking the same sample of adults cited above (N = 9) how they would greet each pictured addressee. We used only images that uniformly elicited a single honorific greeting (i.e., no matter the age or status of the 'greeter'). Sample audio recordings of adults producing honorifics as greetings for our face stimuli can be found at the online repository for this study (1). In both previous informational interviews and the stimuli validation task, we also asked Tseltal adults how they would greet an infant. This question was nearly uniformly met with laughter.
To produce the critical trials, the same female voice recorded all four honorific greeting prompts, which were paired with the image-pairs and presented in a single pseudorandom order across children, constructed to avoid immediate repetitions of the same image-pairs or target honorifics.
Manual Gaze Coding. To prepare for the manual coding of infants' gaze, we first clipped the raw study-session videos into individual test trials, starting and ending with the appearance of the visual stimuli on the displays. Trained research assistants used Datavyu (4) to code these pre-clipped trial videos in three serial 'passes' (5), conducted over the entire set of clips. In the first coding pass, research assistants segmented the video clips into two or three consecutive periods, depending on experiment: (1) from the appearance of the visual stimuli to the onset of the mother's speech ("pre"), (2; only for Experiment 1) from the onset of the mother's speech to the onset of the target noun ("sp"), and (3) from the onset of the target word to the end of the trial ("target"). At the end of the first pass, research assistants hid the columns they had just coded. The second and third coding passes were completed without sound. In them, research assistants coded infants' leftward looks over the entire duration of the trial (second pass), then infants' rightward looks over the entire duration of the trial (third pass). See the online repository for the video-coding manual.
Pair-based Mean Difference Score Calculation. To clarify the calculation of the pair-based mean difference score, we will use Fig. S1, which illustrates the trials testing the noun-pair alal-ixim ("baby-corn"). Infants saw two alal-ixim image-pairs: in one, baby-a appears on the left, with corn-a on the right; in the other, corn-b appears on the left, with baby-b on the right. Each image-pair is presented twice, both times using the same carrier sentence: in one presentation, the target word is alal ("baby"), and in the other, the target word is ixim ("corn"). The four trials corresponding to each noun-pair are separated by interleaved trials testing other noun-pairs (the actual position of the four alal-ixim trials in the complete list of trials that infants saw is noted via the Trial # labels in Fig. S1).
Importantly, the difference in fixation proportions is always computed within image-pair, that is, based on trial-level data using identical visual stimuli and the same carrier sentence. Thus, with useable data for all four trials testing alal-ixim, an infant's pair-based difference score for the noun-pair would be the mean of two values: (1) the proportion of the infant's gaze during the analysis window directed to baby-a when the prompt was banti ay te alal? ("where is the baby?") minus the proportion of the infant's gaze during the analysis window directed to baby-a when the prompt was instead banti ay te ixim? ("where is the corn?"), and (2) the proportion of the infant's gaze during the analysis window directed to baby-b when the prompt was ilawil te alal ("look at the baby") minus the proportion of their gaze during the analysis window directed to baby-b when the prompt was instead ilawil te ixim ("look at the corn"). Given the pair-based nature of this calculation, as soon as an infant is missing (or has had excluded) data for one of the four trials, the data for the other trial presenting the same image-pair and carrier sentence are dropped as well. In this situation, the infant's overall difference score for the noun-pair will reflect the difference score computed for just one image-pair. While a pair-based (unaveraged) difference score remains calculable for a given noun-pair when only a single trial is missing, it becomes impossible to calculate as soon as an infant is missing two trials and the two trials that remain come from different image-pairs (that is, if one trial is from the baby-a/corn-a image-pair, and the other is from the baby-b/corn-b image-pair); both remaining trials are then dropped as well. This level of detail is intended to illustrate the value of our second analysis, which, by seeking evidence of word recognition in infants' within-trial looking time preferences, enables us to take advantage of all non-excluded trials, regardless of whether useable data also exist for their 'matches.'
Modeling Pair-Based Mean Difference Scores. As reported in brief in the main text, we modeled infants' difference scores to ensure that our positive impression of infants' performance was not driven by a few particularly high-scoring infants. Specifically, we fit linear mixed effects models to children's difference scores for all item-pairs, including random intercepts for each subject and item to account for individual- and item-level variability. Null models were fit using the lme4 package in R (6, 7), using the following syntax: lmer(difference_score ∼ 1 + (1|subject) + (1|item_pair)). In both experiments, models with this random effects structure resulted in a singular fit. Thus, we refit the models without the random intercepts for item (which showed zero variance): lmer(difference_score ∼ 1 + (1|subject)). We used the anova function to compare these models to models without an intercept (lmer(difference_score ∼ 0 + (1|subject))). Code for all analyses and plots (8), here and in the main text, can be found at https://github.com/foushee/tseltal-infants (2).
Experiment 1. As reported in the main text, we can interpret the intercept of the mixed effects model as the expected value of children's scores. The intercept was significantly positive (β0 = 0.10, 95% CI: [0.03, 0.17], t(21.45) = 2.77, p = 0.011, Cohen's d = 0.28; see Table S3), and of similar magnitude to our unadjusted calculation of the mean, suggesting that our results are not driven by the robust word recognition of a few exceptional participants.
Experiment 2. The analogous model for Experiment 2 (see Table S4) showed a similar pattern, though this did not reach significance (at the α = 0.05 level).
Modeling the Ratio of Target/Non-target Looking in Pre-naming and Post-naming Windows. We follow (3) in complementing our pair-based measure of word recognition with an analysis that enabled us to analyze all of the (non-excluded) trial-level data for each infant. Specifically, we fit mixed effects logit models to infants' ratios of target versus non-target looking times (in number of 20 ms bins) on each trial, with trial phase (pre-naming/post-naming, reference level = pre-naming) as a fixed effect, and random intercepts for subject and item. The pre-naming phase was defined as extending from the onset of the visual stimuli to 366 ms after the onset of the target word, and the post-naming phase was defined as extending from 367 to 3500 ms after the onset of the target word. We fit these models using the lme4 package in R (6, 7) using the following syntax: glmer(cbind(num_target_20ms_bins, num_nontarget_20ms_bins) ∼ 1 + trial_phase + (1|subject) + (1|item_pair), family="binomial") (Model 1 in Tables S5 and S6). In the case of a singular fit, as when fitting the model to the data for Experiment 2, we dropped the random intercept for item and refit the model.
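The dependent measure for these models (counts of 20 ms bins of target versus non-target looking within a window) can be derived from coded gaze as in the following sketch (Python; the interval-based gaze representation is a hypothetical format invented for illustration, not the authors' coding output):

```python
def count_looking_bins(looks, window_start_ms, window_end_ms, bin_ms=20):
    """Count 20 ms bins of target vs. non-target looking in a window.

    `looks` is a list of (start_ms, end_ms, side) gaze intervals, with
    times relative to target-word onset and side in {"target", "nontarget"};
    bins where the infant looked at neither image are not counted.
    """
    n_target = n_nontarget = 0
    t = window_start_ms
    while t + bin_ms <= window_end_ms:
        midpoint = t + bin_ms / 2
        for start, end, side in looks:
            if start <= midpoint < end:  # classify each bin by its midpoint
                if side == "target":
                    n_target += 1
                else:
                    n_nontarget += 1
                break
        t += bin_ms
    return n_target, n_nontarget

# Post-naming window (367-3500 ms): an infant who switched from the
# non-target to the target image 1000 ms after word onset
looks = [(0, 1000, "nontarget"), (1000, 3500, "target")]
```

These per-trial counts would then supply the two columns of the cbind(...) response in the glmer call above.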
Below and in the main text, we exponentiate the coefficients of these models to report the effects in terms of odds ratios (raw, unexponentiated coefficients can be found in Tables S5 and S6). We additionally report odds ratios for models fit to the data for all infants and updated to explore the effect of age (in months, mean-centered) on children's target/non-target looking times: glmer(cbind(num_target_20ms_bins, num_nontarget_20ms_bins) ∼ 1 + trial_phase + age_months + trial_phase:age_months + (1|subject) + (1|item_pair), family="binomial") (Model 3 in Tables S5 and S6). Odds ratios for trial phase that are greater than 1 (positive model coefficients) are consistent with word knowledge, as they indicate that infants showed a greater looking time preference for the target image after hearing the target word than they had before hearing it. Similarly, odds ratios greater than 1 for the interaction between trial phase and child age (post-naming:age) suggest increasing word recognition with age, such that older infants showed a greater positive effect of trial phase than did younger infants. Finally, we obtain significance levels for each parameter via Wald tests (using the Anova function in the car package; 9).
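The conversion from logit-model coefficients to the reported odds ratios is simple exponentiation; a minimal illustration (the coefficient value below is made up, not one of the paper's estimates):

```python
import math

def to_odds_ratio(beta):
    # A logit coefficient is a change in log-odds; exponentiating it
    # yields the multiplicative change in the odds themselves.
    return math.exp(beta)

# A hypothetical trial-phase coefficient of 0.25 corresponds to
# post-naming odds of on-target looking ~1.28x the pre-naming odds;
# a coefficient of 0 corresponds to an odds ratio of exactly 1
# (no change in target preference after naming).
```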
Experiment 1. Of the three models we tested (Table S5), the best-fit model showed a reliable effect of trial phase (post-naming).


Fig. S1. Visual Schematic of Trials Corresponding to a Single Noun-Pair. Trial numbers shown on the left reflect the true sequential position of the trials as they were presented to infants, interleaved with trials testing other noun-pairs. Difference scores for a given infant are computed first within each image-pair, then averaged to obtain a single measure of word recognition at the level of the noun-pair. Thus, a difference score for the noun-pair alal-ixim ("baby-corn") can be computed only if an infant provides useable gaze data on TRIAL 6 and TRIAL 15, and/or on TRIAL 10 and TRIAL 20 (but not if the infant only provides useable gaze data on, e.g., TRIAL 6 and TRIAL 10, or TRIAL 6 and TRIAL 20).

Fig. S3. Example Honorific Paired-Picture Trial from Experiment 2. Video still taken from a trial testing the honorific-pair tatik-kantsil (OLD MAN-YOUNG WOMAN). Here, the infant is looking to the appropriate addressee (the older man) after hearing her mother deliver the honorific greeting tatik.

Fig. S4. Looking Time Durations (in s) by Trial Phase in Experiments 1 (top) and 2 (bottom). Vertical dashed lines mark the median duration, labeled in s. The analysis window (second panel in each row) was 3133 ms long; trials where infants looked for less than one-third of its duration (1044 ms, or 1.04 s) were excluded.

Experiment 1. In Experiment 1, there were up to 16 data points/infant for the first, pair-based analysis (1 paired difference score for each image-pair, 2 image-pairs/noun-pair; observed range = 2-16, M = 10.33, Med = 10, Mode = 10), and up to 32 data points/child for the second, trial-phase logistic regression analysis (1 data point for each trial; observed range = 16-32, M = 25, Med = 26, Modes = 16, 25, 31). That is, we were able to use 94 trials in the second analysis that we were forced to drop in the first (out of 528 trials total; 18% of all trials).
Experiment 2. In Experiment 2, there were up to 4 data points/child for the pair-based analysis (1 paired difference score for each image-pair, 2 image-pairs/honorific-pair; observed range = 1-4, M = 2.88, Med = 3, Modes = 2, 4), and up to 8 data points/child for the second, trial-phase logistic regression analysis (1 data point for each critical trial; observed range = 3-8, M = 6, Med = 6, Mode = 8). We were thus able to 'redeem' data from 25 trials (out of 115 trials total; 22% of all trials), and from three children (who had useable data on a combination of trials that precluded computing a mean difference score for either honorific-pair).