Climate-induced phenological shifts in a Batesian mimicry complex

Significance Climate change can degrade ecological interactions by separating interacting species in space and time, but this is not the case in one of the best-studied examples of mimicry in which hoverflies (mimics) imitate stinging wasps and bees (models). While there is no evidence of the emergence of mimics and models tracking climate change in the same way, historical records suggest that the mimicry complex is undergoing complex shifts in evolutionary pressures under climate change through changes in the relative emergence patterns of model-mimic pairs. This finding is based on the community-level description of mimetic relationships (comparing 2,352 pairs of species) and the most comprehensive demonstration of the importance of phenology for the fitness of mimics, models, and predators. Climate-induced changes in spatial and temporal occurrence of species, as well as species traits such as body size, each have the potential to decouple symbiotic relationships. Past work has focused primarily on direct interactions, particularly those between predators and prey and between plants and pollinators, but studies have rarely demonstrated significant fitness costs to the interacting, coevolving organisms. Here, we demonstrate that changing phenological synchrony in the latter part of the 20th century has different fitness outcomes for the actors within a Batesian mimicry complex, where predators learn to differentiate harmful “model” organisms (stinging Hymenoptera) from harmless “mimics” (hoverflies, Diptera: Syrphidae). We define the mimetic relationships between 2,352 pairs of stinging Hymenoptera and their Syrphidae mimics based on a large-scale citizen science project and demonstrate that there is no relationship between the phenological shifts of models and their mimics. Using computer game-based experiments, we confirm that the fitness of models, mimics, and predators differs among phenological scenarios, creating a phenologically antagonistic system. Finally, we show that climate change is increasing the proportion of mimetic interactions in which models occur first and reducing mimic-first and random patterns of occurrence, potentially leading to complex fitness costs and benefits across all three actors. Our results provide strong evidence for an overlooked example of fitness consequences from changing phenological synchrony.

Climate-induced changes in spatial and temporal occurrence of species, as well as species traits such as body size, each have the potential to decouple symbiotic relationships. Past work has focused primarily on direct interactions, particularly those between predators and prey and between plants and pollinators, but studies have rarely demonstrated significant fitness costs to the interacting, coevolving organisms. Here, we demonstrate that changing phenological synchrony in the latter part of the 20th century has different fitness outcomes for the actors within a Batesian mimicry complex, where predators learn to differentiate harmful "model" organisms (stinging Hymenoptera) from harmless "mimics" (hoverflies, Diptera: Syrphidae). We define the mimetic relationships between 2,352 pairs of stinging Hymenoptera and their Syrphidae mimics based on a large-scale citizen science project and demonstrate that there is no relationship between the phenological shifts of models and their mimics. Using computer game-based experiments, we confirm that the fitness of models, mimics, and predators differs among phenological scenarios, creating a phenologically antagonistic system. Finally, we show that climate change is increasing the proportion of mimetic interactions in which models occur first and reducing mimic-first and random patterns of occurrence, potentially leading to complex fitness costs and benefits across all three actors. Our results provide strong evidence for an overlooked example of fitness consequences from changing phenological synchrony. hover flies | Batesian mimicry | phenology | mismatch | climate change T he biological consequences of climate change for individual species have been documented in detail: distributions are moving poleward (1), phenological events are changing [most often documented as an advance in spring phenological events (2)], and organism size is reducing (3). These changes also have consequences for ecological interactions, as interacting species may become separated in space (4) or time (5,6). The challenge for the study of these eco-evolutionary processes is the linking of (i) ecological data on the nature of relationships among taxa, (ii) the shifting spatiotemporal associations among those taxa, and (iii) data on the fitness costs and benefits that result from changes in the strength of interactions. Here, we provide a comprehensive evaluation of a putative symbiosis that has the potential to provide significant insights into community-level, climate-driven ecological change: model-mimic complexes. Mimetic relationships can be cooperative [e.g., Müllerian mimicry, where multiple defended species evolve a common phenotype as a result of selection to share the burden of educating predators (7)] or parasitic [e.g., Batesian mimicry, where an undefended species exhibits the phenotype of a defended species to benefit from the learned aversion of predators to that phenotype (8)]. Batesian mimics (8) exploit a range of sensory modalities to enhance their similarity to defended models (9). One taxon that exploits multiple sensory cues is the hoverflies (Diptera: Syrphidae), which have evolved to produce visual (10), behavioral (11), and acoustic (12) cues that resemble those of stinging Hymenoptera.
Previous work has suggested that the order of appearance of mimics and models is important to the success of mimicry. In a classic study, Mostler demonstrated that avian predators learn to avoid stinging Hymenoptera quickly, but that being presented with palatable Diptera can reverse that learning (13). Highfidelity syrphid mimics of stinging Hymenoptera in North America emerge predominantly in spring before their models (14,15). It has been suggested that this phenological pattern has evolved so that the mimics emerge before naïve fledglings begin to feed independently and, to begin with, indiscriminately (16). Once the predators have been educated by the Hymenoptera models, the mimics show a second peak in emergence later in the summer. While consistent with theory on Batesian mimicry, other work has failed to support this idea, showing that mimics and models are largely synchronous and independent of fledging dates (17). However, no studies have attempted to test this idea comprehensively over large datasets and temporal ranges. A further source of uncertainty stems from the role of contemporary climate change in altering the spatiotemporal patterns of occurrence of mimics and models (18), with consequences for predator learning and, ultimately, the fitness of mimics, models, and the predators themselves.
Our understanding of the evolutionary consequences of global environmental change remains poorly developed, partly because Significance Climate change can degrade ecological interactions by separating interacting species in space and time, but this is not the case in one of the best-studied examples of mimicry in which hoverflies (mimics) imitate stinging wasps and bees (models). While there is no evidence of the emergence of mimics and models tracking climate change in the same way, historical records suggest that the mimicry complex is undergoing complex shifts in evolutionary pressures under climate change through changes in the relative emergence patterns of modelmimic pairs. This finding is based on the community-level description of mimetic relationships (comparing 2,352 pairs of species) and the most comprehensive demonstration of the importance of phenology for the fitness of mimics, models, and predators.
of the difficulty in measuring the consequences of natural selection and partly due to interactions between phenotypic plasticity and adaptive change (19). In this paper we make use of alternative approaches to the study of evolutionary change under warming climates to quantify shifting phenological antagonism in a Batesian mimicry complex. This is achieved through (i) defining the mimetic relationships within a large pool of Batesian models and mimics, (ii) quantifying the phenological trends between those models and mimics, (iii) estimating empirically the fitness consequences of phenological asynchrony for mimics, models and predators, and finally (iv) demonstrating changes in optimal phenological patterns under recent climate change.
Results and Discussion Study 1: Community-Level Mimetic Networks. We selected 42 species of Syrphidae and 56 species of bees and wasps from UK species lists based on their abundance in UK biological records and their taxonomic and morphological distinctness (see SI Appendix, Table S1 for the species list). All of the 2,352 species pairs are known to have co-occurred spatially in at least one 10 × 10 km grid square based on biological records, with Jaccard overlap indices (the ratio of shared squares to the number of squares in which one or both species was found) of 0.9-33.3% (mean 12.6%). A web interface was created that randomly paired single representative images from the Syrphidae and Hymenoptera and requested users to rate the pairing on a scale from 1 (not at all similar) to 10 (extremely similar). The experiment ran from 18 March 2016-28 March 2017, during which the 2,352 potential pairwise combinations had been rated a total of 30,300 times, with a minimum of 3 ratings and a maximum of 29 ratings for individual pairs. The modal rating was 1 (indicating negligible similarity), and the mean rating 3.0 ± 2.2 SD (SI Appendix, Fig. S1). We consider any mean ratings ≥5 to be indicative of a mimetic relationship, which was the case for 237 pairwise combinations (hereafter "highfidelity pairs"). These 237 species pairs had Jaccard overlap indices of 2.6-28.4% (mean 13.5%). Fig. 1 shows a matrix of those mimetic similarities, highlighting these "islands of mimicry" in the wider Batesian complex that form the basis for our subsequent analyses, and example pairs for different mean similarity scores (SI Appendix, Fig. S2 for a larger version of Fig. 1 with species labeled). Our results correlate significantly with data from experiments using pigeons (20) (r = 0.757, P = 0.030, SI Appendix, Fig.  S3), indicating that our human ratings are meaningful. Study 2: Comparative Phenology of Models and Mimics. We use the rank biserial correlation (RBC, a correlation between rank data [e.g., emergence date] and a categorical variable [e.g., species] [21]) to quantify the degree of overlap in phenology, where RBC = 0 is random occurrence, RBC = −1 is all Hymenoptera models emerging before Syrphidae mimics, and RBC = 1 is all Syrphidae mimics emerging before Hymenoptera models. Average RBC was significantly lower than zero (median = −0.015, V = 1709500000, P < 0.001) for the entire community of 56 Hymenoptera and 42 Syrphidae, showing that, on average, the Syrphidae emerge later than the Hymenoptera. When only highfidelity model-mimic pairs were included, median RBC was slightly but significantly greater from zero (median = 0.093, V = 12092000, P < 0.001; SI Appendix, In other words, we find no evidence that models and mimics are advancing their phenology at the same rate. Study 3: Fitness Consequences of Phenological Mismatch. We quantified the fitness consequences of phenological change using a computer game-based behavioral experiment within which human participants could act as "predators" and make decisions concerning the profitability of three pairs of prey stimuli: (i) Apis mellifera and Eristalis tenax; (ii) Vespula vulgaris and Chrysotoxum cautum; and (iii) Bombus terrestris and Criorhina ranunculi (SI Appendix, Fig. S8 for stimuli and SI Appendix, Fig. S9 for an example screenshot). Participants were presented with all three model-mimic pairs in one of three phenological scenarios involving 25 models and 25 mimics: (i) mimics on average first, (ii) models on average first, or (iii) random presentation with equal mean order of presentation. Mimic-and model-first scenarios were created by increasing the relative probability of the later species from 0 to 100% in increments of 2% over the 50 screens (example sequences are shown in SI Appendix, Table S2), resulting in a mean RBC of 0.677 or −0.677 (SD = 0.103) depending on whether the model or mimic occurs earlier. Participants gained 5 points for clicking a mimic, lost 10 points for clicking a model, and leaving the insect did not change the score.
The phenological scenario had a significant effect on predation rates on models (χ 2 = 49.218, df = 2, P < 0.001) and mimics (χ 2 = 34.544, df = 2, P < 0.001), and on the score achieved by human predators (χ 2 = 51.282, df = 2, P < 0.001; SI Appendix, Table S3 for full model results). Random presentation produced the highest fitness (survival rate) in mimics, and these were significantly higher than in model-first (z = 3.073, P = 0.006) or mimic-first scenarios (z = 5.773, P < 0.001; Fig. 2A). The model-first sequence of prey items produced the highest fitness (survival rate) in models, and those outcomes were significantly higher than random (z = 3.050, P = 0.006) or mimicfirst (z = 6.983, P < 0.001; Fig. 2B), in agreement with the theory underpinning Batesian mimicry. However, random presentation produced significantly lower predator scores than mimic-first (z = 5.390, P < 0.001) or model-first (z = 6.849, P < 0.001; Fig. 2C). These results highlight the phenological antagonism among the three actors within the mimicry system: models benefit from educating predators (model-first), mimics benefit from unpredictability (random), and predators benefit from consistent education on either prey item (model-first or mimic-first).
Temporal Trends in Fitness. Finally, we can infer temporal trends in fitness outcomes for model-mimic pairs under contemporary climate change based on how their RBCs change over time. From Study 3 we know that there are differences in fitness outcomes where the order of appearance of models and mimics corresponds to mean RBCs of < −0.677 ("model-first"), −0.677 < RBC < 0.677 ("random"), or >0.677 ("mimic-first"), and so we can apply those thresholds to the biological recording data to infer fitness consequences of real-world sequences. Therefore, for each year, we classify each of the 237 high-fidelity pairs based on their RBC into a model-first, random, or mimicfirst pattern. There was a significant increase between 1960 and 2005 in the proportion of interactions in which the RBC corresponded to a model-first pattern (ρ = 0.454, P = 0.001, Fig. 3A), a significant decline in the proportion of mimic-first sequences (ρ = −0.427, P = 0.003, Fig. 3B), and a weakly significant decline in the proportion of pairs in which the species occurred randomly (ρ = −0.295, P = 0.044, Fig. 3C; SI Appendix, Fig. S10 for sensitivity analysis around thresholds). Hence, we can infer a positive fitness trend for models from increasing model-first pairings (where models perform best) and decreasing mimic-first pairings (where models perform worst), a mixed fitness trend for mimics due to the decreasing proportion of random pairs (where mimics perform best) and a decrease in mimic-first pairs (where mimics perform worst), and positive fitness benefits for predators from the increasing proportions of model-first pairs (where predators perform best) and decreasing proportion of random pairs (where predators perform worst).
The evidence is building for a significant impact of phenological decoupling in a wide variety of systems. Snowshoe hares that molt after snowmelt show significant increases in mortality (22), many studies have shown that avian fitness is compromised if peak abundance of food does not coincide with chick rearing (23), and tritrophic studies suggest that oak-caterpillar-bird systems may have little room for buffering from warming springs (24). These exemplar studies are being carried out against a back drop of dynamic shifts in the degree of phenological synchrony over the past few decades (5,25). However, while previous studies have tended to find the negative aspects of phenological shifts, our data suggest that climate change will result in an increase in phenologically optimal emergence patterns that benefit (at least in part) all three actors within the mimicry system.
Building on past work (13), we have now developed comprehensive evidence to support the theory (26) that the evolutionary costs and benefits of mimicry to models, mimics, and predators depend upon the relative phenological patterns of models and mimics. We have used these findings to help understand the implications of changes in the temporal overlap of models and mimics in a classical Batesian mimicry system. The different actors within the mimicry system each experience costs and benefits from different phenological patterns: hymenopteran models benefit in all cases from increased model-first, decreased mimic-first, and decreased random patterns because these simplify-and, hence, accelerate-predator learning of aposematic signals. Mimics benefit from the decrease in mimic-first patterns and increase in model-first patterns because these accelerate predator learning of aposematic cues. However, mimics may suffer from a decline in randomness (their optimal phenological scenario according to Study 3), if predators are able to shift prey preferences as mimics increase in relative abundance later in the season. Finally, predators benefit from the reduction of randomness if they are able to respond by exploiting mimics when they become numerically dominant, but also show more subtle responses to declines in mimic-first and increased in model-first patterns. The results illustrate the benefits of integrating mechanistic and observational data to study large-scale eco-evolutionary processes within a phenologically antagonistic Batesian mimicry complex.

Methods
Study 1: Community-Level Mimetic Networks. Rather than relying upon subjective, isolated descriptions of mimetic relationships between particular species from published work, we derived a full matrix of mimetic associations using an extensive, online citizen science project. Human ratings of visual similarity correlate with data from experiments with birds and with ratings based on morphometric analysis (10) and provide a Gestalt perspective on similarity that avoids issues with the definition of particular traits and with variations in both size and shape that can complicate computational image analysis (27). The online mimicry experiment, which can still be found at www.mimicryexperiment.net, was based around a simple PHP script that randomly selected two images-one from a pool of 42 Hymenoptera images and one from a pool of 56 Syrphidae images (one image for each species). The landing page of the online study contained brief details about the project and contact details for the lead author if participants required any further information. Participants were instructed to click on a button if they consented to take part. Study 1 was approved by the University of Leeds Faculty of Biological Sciences Research Ethics Committee (ref BIOSCI . Species were selected for inclusion based on a hierarchical process: first, the most common species were selected from the highest ranked abundance in the Hoverfly Recording Scheme (HRS) and the Bees, Wasps, and Ants Recording Scheme (BWARS) datasets (see below for details of those schemes). The rationale behind this criterion was that more common species are more likely to interact and, therefore, to have a true mimetic relationship if species with a similar morphology were present. We calculated the proportion of these putative model-mimic pairs that are known to have co-occurred in the same 10 km grid square in the HRS and BWARS dataset. Secondly, congeners with close morphological similarity were excluded. There are many genera of UK Syrphidae and Hymenoptera with similar morphology and we refrained from using physically similar congeners to reduce phylogenetic autocorrelation and redundant morphological variation in the dataset (e.g., Melanostoma mellinum was included, but Melanostoma scalare was excluded, while Lasioglossum leucozonium was included but Lasioglossum villosulum was excluded). Thirdly, additional species were added to incorporate further morphological diversity where particularly distinct morphologies were known to the authors (particularly the rarer C. cautum, C. ranunculi, and Arctophila superbiens from among the Syrphidae and Vespula rufa from among the Hymenoptera, which are all thought to be involved in mimetic relationships [28]). Since the pairing of images was done at random, the number of comparisons between pairs of images was not equal across the dataset. The ratings gathered during the experiment exhibited a highly positively skewed distribution, with 53.2% of raw ratings and 56.1% of mean ratings being <3, while only 5.1% of raw ratings and 0.3% of mean pair ratings were >7 (SI Appendix, Fig. S1). Representative pairs of images are shown in Fig. 1 for mean pair similarities of 7.5, 4.9, 3.1, and 1.0, along with a matrix of similarity values for all 2,352 comparisons. To validate the online experiment, we compared the ratings given to comparisons between eight of the hoverflies in our study that had previously been compared with V. vulgaris in pigeon experiments (20). In that previous experiment, pigeons were trained to peck at an image of V. vulgaris in return for a food reward, and then shown different Syrphidae images. The peck rate in response to the Syrphidae images was assumed to be proportional to the pigeon's perception of the similarity of the Syrphidae to the original training stimulus. Despite the low number of species in the pigeon data, as in previous studies that have used human ratings there was a significant correlation between our human ratings experiment and the pigeon peck rate data (r = 0.757, P = 0.030, SI Appendix, Fig. S3), indicating that our human ratings reflect morphological similarity in much the same way that birds might assess it.
Study 2: Comparative Phenology of Models and Mimics. We extracted biological records from two extensive, long-term biological recording schemes, the HRS and BWARS datasets, to recreate past trends in phenology in model and mimic communities. Both HRS and BWARS are citizen science projects that rely on ad hoc recording by a community of recorders, and require compulsory fields relating to the species identity, location of sighting, and name of recorder, with desirable fields that describe ecological variables (e.g., flowers visited, pollen collected, prey/host). While abundance data are very occasionally recorded for each submitted record on a broad categorical scale, these data are rarely available and so have not been incorporated into this analysis. Validation of records for both schemes involves checking the validity of species names and geographical coordinates, while subsequent verification is based on identification difficulty, known spatial distributions, known seasonal phenology, and scarcity (29,30). Before analysis, we cleaned the biological records to remove records from before 1960, records without ordinal dates, and records without valid species names. After this processing, the HRS contained 620,460 records of 288 hoverfly species from between 1960 and 2014 at time of analysis (27 January 2015). The BWARS dataset contained 451,624 records of 547 species from between 1960 and 2013 at time of analysis (27 January 2015). Biological recording data from the HRS and BWARS datasets show spatiotemporal patterns that are consistent with other UK biological recording data: a strong increase in the number of records through the latter part of the 20th Century (SI Appendix, Fig. S4) and a strong concentration of records in the south of the country and around centers of population density (SI Appendix, Fig. S5). Further descriptive statistics for the Syrphidae can be found in ref. 31.
For each year in which a species was recorded, we calculated the 5th, 50th, and 95th percentile flight dates (representing the leading edge, median, and trailing edge of the flight period, respectively) and then conducted linear regressions of these dates against a general measure of mean annual UK temperature (central England temperature [CET] [32]). CET uses an average of values from a set of three long-term meteorological stations in central England to create an averaged trend over the country. We calculated Pearson correlation coefficients to provide a measure of the strength of the trends, and regression coefficients to provide a measure of the rate of change in phenology. A comparison of model and mimic shifts across the three flight dates can be seen in SI Appendix, Fig. S6.
To quantify the relative phenology of the model and mimic communities, we assumed that the sequence of occurrence is the relevant phenological metric for learning within a Batesian mimicry system. We calculated RBC of the flight dates of all potential model-mimic pairs and for the subset of pairs for which mean similarity ratings were ≥5. The RBC approach gives a correlation coefficient ranging from −1 (no overlap, first sample entirely before second sample) to +1 (no overlap, first sample entirely after second sample). An RBC of 0 indicates random occurrence. To illustrate the relationship between RBCs and differences in flight dates within the BWARS and HRS datasets, the RBCs calculated for the real-world species pairs from the BWARS and HRS datasets are plotted against the difference in median flight dates in SI Appendix, Fig. S7.
Study 3: Fitness Costs of Phenological Mismatch. We designed a computer game within which human participants could act as "predators" and make decisions concerning the profitability of different "prey". A psychological approach was selected above computational algorithms for a number of reasons. First, as we note elsewhere in the manuscript, we have demonstrated that human ratings of similarity are correlated with those of avian model systems. Second, the available algorithms for quantifying modelmimic similarity such as the distance transform method (27), and the neural net approach of Bain et al. (33) are highly data hungry, requiring extensive data to parameterize them compared with our "Gestalt" system based on human assessments. There is no reason to expect that these computational methods would produce a more ecologically relevant result Mimetic fidelity is derived from a large citizen science study, phenological trends are derived from >1 million biological records, and the three categories of emergence are defined using known fitness consequences from behavioral experiments. Shaded areas are 95% confidence intervals.
than the use of human scores. Indeed, the neural network classifier of Bain et al. (33) ranked the similarity of hoverfly species to wasps in a very similar manner to humans (R2 = 0.74, P < 0.001; [34]). Third, we were interested in generating a fitness measure that was more than just similarity but also included speed-accuracy trade-offs that are important to prey survival (35). The game was built in the Vizard (WorldViz) virtual reality environment, programmed in Python, and displayed using an Oculus Rift DK2 immersive virtual reality headset (full field of view horizontal visual angle = 100°) while participants were seated. Participants were given an opportunity to read an information sheet describing the study and ask any questions before giving written informed consent to take part. Study 3 was approved by the University of Leeds Faculty of Biological Sciences Research Ethics Committee (ref . The experiment began with a short training phase during which participants were asked to "eat" (click) red triangles and leave blue circles, and a scoreboard kept track of their points. All participants performed well during this phase. Following the training phase, participants were presented with the experimental stimuli. These stimuli comprised three pairs of insect images chosen to represent the three broad groups of modelmimic relationships within the UK Syrphidae-Hymenoptera community: (i) A. mellifera and E. tenax; (ii) V. vulgaris and C. cautum; and (iii) B. terrestris and C. ranunculi (SI Appendix, Fig. S8 for stimuli). Participants were presented with a series of sequences of insect images against a grass background to enhance the noise in the image and reduce the use of image artifacts (e.g., cropping of images) as cues to the identity of the insects (SI Appendix, Fig.  S9). Each screen contained only one insect at any time. The insects were displayed in random positions and orientations on each trial, all within a 30°c entral visual angle (horizontally and vertically). The insect images covered ∼5°visual angle. The three pairs of insects were presented in one of three phenological scenarios involving 25 models and 25 mimics: (i) mimics more likely to be first, (ii) models more likely to be first, or (iii) random presentation (example sequences are shown in SI Appendix, Table S2). Sequences were generated by increasing the probability of occurrence of one image from 0 to 100% in increments of 2% over the 50 presentations, producing a series of unique model-first sequences with a mean RBC of 0.678 (SD = 0.103). Sequences were reversed to give the mimic-first sequences to retain the same RBC. Each participant was asked to decide whether or not to "eat" an insect by pressing a key on a keyboard, after which a score counter would change to reflect whether the decision was "correct": consuming harmless Syrphidae increased the score by 5 points, consuming stinging Hymenoptera reduced the score by 10 points, and leaving the insect did not change the score. Participants were told that "eating" some types of insects would reduce their score while eating others would increase it, and that the aim was to score as many points as possible. We analyzed the data from 45 participants to evaluate the consequences of phenological scenario on (i) mimic survival, (ii) model survival, and (iii) final participant scores (as a proxy for predator fitness). Each participant experienced all three of the phenological scenarios, where each scenario was presented using a different model-mimic pair so that the participants also saw all three model pairs (e.g., a participant may have seen Bombus pratorum and C. ranunculi in the model-first scenario, A. mellifera and E. tenax in the mimic-first scenario, and V. vulgaris and C. cautum in the random scenario). For tests (i) and (ii), we used generalized linear mixed effects models with binomial errors in the lme4 package in R to analyze the survival or predation of each target as a binary variable. The trial number (from 1 to 150) was a covariate and phenological scenario was a fixed effect, while the model-mimic pair and the participant ID were entered as random effects. For analysis (iii), the same models were run but with general linear mixed effects models with the participant score as the response and normal error distribution. The residuals of the model for analysis (iii) were checked to ensure that the data met the assumptions of normality and homogeneous variance. Full model results can be found in SI Appendix, Table S3.