Cetaceans are the next frontier for vocal rhythm research

While rhythm can facilitate and enhance many aspects of behavior, its evolutionary trajectory in vocal communication systems remains enigmatic. We can trace evolutionary processes by investigating rhythmic abilities in different species, but research to date has largely focused on songbirds and primates. We present evidence that cetaceans—whales, dolphins, and porpoises—are a missing piece of the puzzle for understanding why rhythm evolved in vocal communication systems. Cetaceans not only produce rhythmic vocalizations but also exhibit behaviors known or thought to play a role in the evolution of different features of rhythm. These behaviors include vocal learning abilities, advanced breathing control, sexually selected vocal displays, prolonged mother–infant bonds, and behavioral synchronization. The untapped comparative potential of cetaceans is further enhanced by high interspecific diversity, which generates natural ranges of vocal and social complexity for investigating various evolutionary hypotheses. We show that rhythm (particularly isochronous rhythm, when sounds are equally spaced in time) is prevalent in cetacean vocalizations but is used in different contexts by baleen and toothed whales. We also highlight key questions and research areas that will enhance understanding of vocal rhythms across taxa. By coupling an infraorder-level taxonomic assessment of vocal rhythm production with comparisons to other species, we illustrate how broadly comparative research can contribute to a more nuanced understanding of the prevalence, evolution, and possible functions of rhythm in animal communication.

Method S1. Calculating interval CVs from parameters reported in the cetacean literature.
Few studies have explicitly focused on quantifying rhythm in cetacean vocalizations, but some provide metrics that we can retroactively use to do so. Most cetacean papers that measure temporal features of vocalizations present summary statistics for inter-event intervals (IEIs; the durations of silences separating consecutive events) rather than inter-onset intervals (IOIs; the durations of time between the starts of consecutive events). This is not necessarily an issue: silences can be just as important as, or more important than, sounds in rhythm production and perception (1-3); IEIs and IOIs are highly correlated if event duration is relatively consistent; and CVs (coefficients of variation: the standard deviation divided by the mean, expressed as a percentage) can still be calculated for IEIs to get an initial sense of rhythmic regularity. Given this precedent, our quantification of rhythm in cetacean vocalizations is generally derived from IEIs, but we recommend that cetacean researchers report IOIs in the future to foster comparability and consistency with other rhythm researchers.
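As an illustration of the calculation described in Method S1, the following minimal Python sketch derives IOIs and IEIs from event onset and offset times and computes a CV either from raw intervals or from a reported mean ± SD. It is not the authors' code. The function names and the onset/offset times are hypothetical, and the use of the sample standard deviation is an assumption, since source studies do not always state which estimator they report. The summary-statistic example uses the inter-A-call interval values listed in this supplement (5.222±1.351 s).

```python
import statistics

def inter_onset_intervals(onsets):
    """IOIs: time from the start of one event to the start of the next."""
    return [nxt - cur for cur, nxt in zip(onsets, onsets[1:])]

def inter_event_intervals(onsets, offsets):
    """IEIs: silence from the end of one event to the start of the next."""
    return [next_on - off for off, next_on in zip(offsets, onsets[1:])]

def cv_percent(intervals):
    """CV from raw intervals (assumption: sample SD, i.e. n - 1 denominator)."""
    return 100.0 * statistics.stdev(intervals) / statistics.mean(intervals)

def cv_from_summary(mean, sd):
    """CV from a reported mean and standard deviation."""
    return 100.0 * sd / mean

# Hypothetical event times in seconds: four events, each 0.5 s long.
onsets = [0.0, 1.5, 3.0, 4.5]
offsets = [0.5, 2.0, 3.5, 5.0]

iois = inter_onset_intervals(onsets)
ieis = inter_event_intervals(onsets, offsets)
print("IOI CV (%):", cv_percent(iois))
print("IEI CV (%):", cv_percent(ieis))

# Reported summary statistics: 5.222 +/- 1.351 s.
print("CV from summary (%):", round(cv_from_summary(5.222, 1.351), 1))
```

When event durations are constant, as in this toy sequence, IOIs and IEIs differ only by a fixed offset, so they have the same standard deviation. The IEI CV is still larger in general, because the IEI mean is smaller.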
On a methodological note, no universal threshold exists for how close to 0% a CV must be to be perceived as isochronous (4). Humans still characterize a sound sequence as isochronous when the sound onsets are distorted from isochrony by ~4-17% (5), but similar (and essential) psychophysical studies have not been done for most other species. Rather than setting an arbitrary and likely inappropriate CV threshold for isochrony for non-human animal vocalizations, we propose considering species- and vocalization-specific CV values along a "more-to-less-isochronous" continuum (Figure 2), with several human-derived metrics (5-7) as guideposts. As additional psychophysical work is done on non-human animals, these guideposts should be updated to include other species.
As mentioned in the main text, we are interested in rhythm over any timescale. Given the average duration of cetacean vocalizations, timescales are typically on the order of seconds to minutes for mysticetes and milliseconds to seconds for odontocetes.

(9) and (C) a heterochronous sequence of pulses in fin whale song (10). In (C), short (8.55±0.24 s, CV=2.8%) and long (11.34±0.09 s, CV=0.8%) inter-pulse intervals alternate.

Vocal learning hypothesis
Vocal learning abilities are a preadaptation for rhythm production and perception abilities (11). Advanced vocal learning abilities are a preadaptation for a specific form of rhythmic entrainment: the ability to spontaneously perceive a beat and synchronize bodily movements to it (i.e., beat perception and synchronization, BPS) (12).
Species with more advanced vocal learning abilities will have more advanced rhythm production and perception abilities.
Cetaceans are one of just eight animal groups with confirmed vocal learners, and both mysticetes (e.g., humpback whales, bowhead whales) and odontocetes (e.g., orcas, beluga whales, Risso's dolphins, bottlenose dolphins) are represented (13). Odontocetes may have more advanced vocal learning abilities than mysticetes, and are typically grouped with humans at the pinnacle of such abilities (13).
Only species with the most advanced vocal learning abilities (e.g., the ability to imitate novel sounds or the vocalizations of other species (13)) will be able to spontaneously perceive and synchronize to externally generated acoustic rhythms (i.e., be capable of BPS).
Certain odontocetes can imitate novel sounds and vocalizations from other species (this ability has not yet been recorded in mysticetes) and should be capable of BPS (13).
Species with enhanced breathing control will have advanced vocal rhythm production abilities. For example, they will be better able to produce and imitate vocal rhythmic patterns than species with limited breathing control, as breathing and vocalizing typically both rely on breath control.
As conscious breathers, cetaceans have extremely advanced behavioral control of breathing (17). A spectrum of abilities also exists, with significant inter-specific variation in breathing anatomy, function, and capacity (17). Odontocetes are generally more extreme in behaviors related to breathing (e.g., dive depth, dive duration, swimming speed) than mysticetes, and may thus have more advanced breathing control (17). Unlike other species for which this hypothesis has been considered, however, mysticetes and odontocetes are capable of recirculating air and can produce many vocalizations on a single breath (18). Cetaceans could thus be a counterexample to the key prediction of this hypothesis, given that vocalizing and breathing are disconnected in cetaceans in a way rarely seen among mammals (18).

Sexual selection hypothesis
Rhythm, and other musical abilities, evolved due to (runaway) sexual selection for complex acoustic displays (19,20).
Vocalizations with more rhythmic structure or complexity should be sexually selected and hence indicate increased fitness of the vocalizer and/or enhanced mate preference of the listener.
Mysticete song is likely under sexual selection and is rhythmic (21,22), while non-song vocalizations are not thought to be under sexual selection and seem to be less rhythmic (23,24). Quantifying song rhythmic structure and complexity and comparing it with various measures of male reproductive success (e.g., number of mating opportunities, length of consortships, number of paternities) across individuals could indicate whether vocal rhythm specifically is under sexual selection in mysticetes.
Some odontocetes produce rhythmic vocalizations during courtship (25,26), but it is unknown if these displays are under sexual selection; similar analyses comparing vocalization rhythmicity and reproductive success could be done for those species.

Mother-infant bonding hypothesis
Rhythmic communication and entrainment evolved to establish an emotional bond during mother-infant interactions, to ensure that mothers would become committed to extended care of infants (27,28).
Species with extended maternal care periods (and where both mothers and calves vocalize) should have more advanced vocal rhythmic abilities than those with short care periods.
Cetaceans have prolonged, but very variable, periods of calf care (29). Weaning age is later in odontocetes (~16.5 months) than mysticetes (~7 months) (29). Post-weaning maternal care is limited in mysticetes, while some odontocetes stay with their mothers for life (29).
Child-directed communication ("motherese") should be more rhythmic than communication directed at other age classes (30).
Evidence of motherese has been shown for certain mysticetes (31) and odontocetes (32). Such evidence has manifested as vocalizations specific to mother-calf contexts (e.g., grey whales) or spectrally modified vocalizations (e.g., common bottlenose dolphins) (31,32). Very few studies have specifically investigated motherese in the form of rhythmic/temporal modifications to vocalizations, although a recent study found that common bottlenose dolphin mothers altered spectral, but not temporal, features of whistles in the presence of their calves (32).

Group display hypothesis
Individual rhythms (in particular, isochrony) evolved as a byproduct of group displays, largely due to the need to synchronize during displays (33,34). Synchronized group displays promote cohesion and cooperation, and also signal group quality to outsiders (33,34).
Group-living animals will have more rhythmic communication than solitary animals.
While mysticetes do coalesce on shared breeding or feeding grounds, they typically have relatively simple social structures and small group sizes outside of the breeding season (35). Outside of the mother-calf dyad, most mysticetes are thus considered solitary (35,36). This contrasts with group-living odontocetes (37). Odontocetes typically live in groups, which fall along spectrums of size (from few to thousands) and stability (undifferentiated relationships, weak community structure, fission-fusion networks, long-term groups, etc.) (37).
Species where individuals regularly synchronize behaviors will have advanced individual rhythm production and perception abilities versus species that rarely synchronize.
Cetaceans, particularly odontocetes, synchronize many different types of behaviors, including breathing, swimming, migrating, and vocalizing (25,38-40). There is anecdotal evidence linking behavioral synchronization to vocal rhythms for at least one odontocete species, the Atlantic spotted dolphin (38). Indo-Pacific bottlenose dolphins in Shark Bay, Western Australia would be an interesting case study: there, males work together in long-term alliances to gain reproductive access to females (25). They synchronize their behaviors and their (isochronous) vocalizations when cooperatively guarding females from rival alliances (25).

Table S2. Extended version of Table 2 with rhythm descriptions (column 5) following the definitional framework (8). See Table 2 caption for details.
1 Individual sound units (43); called 'parts' in (44); include call types A, B, C, D, and E (45)
2 An organized combination of calls (44); in A-only and B-only phrases, the phrase is the single A or B call, respectively; in non-contiguous A-B phrases, the phrase comprises the A call, the B call, and the intervening silent interval; in contiguous A-B phrases, the phrase comprises the combined A and B call (with no silent interval separating them)
3 "One or more phrases repeated in a regular cadence" (44)
4 Each individual sound (47)
A sequence of star wars vocalizations "with shorter and more consistent inter-song intervals" (
A sequence of star wars vocalizations "with relatively long inter-song intervals" (
A sequence of star wars vocalizations "with the shortest inter-[vocalization] intervals exhibiting a bimodal distribution" (23)
Some gunshots were missed due to intermittent audio signal
Gunshots, downsweeps, moans, low-frequency pulsive calls, etc.

Figure S1. Definitional framework for characterizing vocal rhythms, adapted from (8). (A) The temporal structure of vocalizations can be described using this decision tree. Visual examples depict the timing of a sequence of vocalizations (black dots). Waveforms (top) and spectrograms (bottom) show (B) an isochronous sequence of sperm whale echolocation clicks (inter-click intervals=1.48±0.04 s, CV=2.7%) (9) and (C) a heterochronous sequence of pulses in fin whale

Table S1. Extended version of Table 1 with added details in italics. See Table 1 caption for details.

Table S3. Coefficients of variation (CVs) for examples of isochronous (I) and heterochronous (H) rhythm in mysticete (top) and odontocete (bottom) vocalizations. Species are arranged by phylogenetic relatedness (41). For family names and the behavioral context in which each vocalization is produced, see Table 2. The "Unit" column gives the acoustic unit of interest in the vocalization, and rhythm is considered at the level of the inter-unit interval. The "Reference" column gives the reference (and, when appropriate, relevant figures/tables) for the values in the "CVs" column. The CVs used in Figure 2 are bolded. We conceptualized heterochronous rhythms as multiple overlaid isochronous rhythms and calculated CVs separately for each constituent isochronous rhythm.
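The treatment of heterochronous rhythms as overlaid isochronous rhythms, with a CV calculated separately for each constituent, can be sketched in Python as follows. This is an illustrative sketch only, not the procedure used to build Table S3. It assumes a strictly alternating two-class sequence (such as short and long inter-pulse intervals), the interval values are hypothetical, and the function names are invented for the example.

```python
import statistics

def cv_percent(intervals):
    """Coefficient of variation (sample SD / mean), as a percentage."""
    return 100.0 * statistics.stdev(intervals) / statistics.mean(intervals)

def split_alternating(intervals):
    """Split a strictly alternating sequence into its two interval classes.

    Assumption: intervals at even positions belong to one class and
    intervals at odd positions to the other.
    """
    return intervals[0::2], intervals[1::2]

# Hypothetical inter-pulse intervals in seconds, alternating short/long.
intervals = [8.5, 11.3, 8.6, 11.4, 8.4, 11.3]

short, long_ = split_alternating(intervals)
pooled_cv = cv_percent(intervals)
short_cv = cv_percent(short)
long_cv = cv_percent(long_)

print(f"pooled CV: {pooled_cv:.1f}%")
print(f"short-interval CV: {short_cv:.1f}%")
print(f"long-interval CV: {long_cv:.1f}%")
```

Pooling all intervals gives a high CV that hides the regularity of each interval class, which is why a per-constituent CV is the more informative summary for a sequence of this kind.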

PU terminal phrase inter-call interval: 3.8±0.2 s (5.3%)
Average whistle inter-loop interval CV calculated from 16 animals was 21.1% (ranged from 9% to 46%)
A whistle or a buzz
A single whistle and buzz separated by a silent interval
"composed of between 2 and 159 individual clicks"
"composed of 6-18 individual burst-pulse units"
Two types of calls (A and B)
Inter-A-call interval (type II vocal sequence): 5.222±1.351 s (25.9%)
Inter-A-call interval (type IX vocal sequence): 2.611±0.367 s (14.1%)