Recurrent viral capture of cellular phosphodiesterases that antagonize OAS-RNase L

Significance

Horizontal gene transfer (HGT) of host genes to virus genomes is a common but underappreciated means of RNA virus evolution. We used a combination of sequence comparisons and protein structure predictions to reconstruct the evolutionary history of viral phosphodiesterase (PDE) genes. PDEs encoded by nidoviruses and rotaviruses appear to be descended from host AKAP7-like PDE genes and are potent antagonists of the antiviral OAS-RNase L pathway. By characterizing the evolutionary history of these genes, we shed light on how HGT shaped virus interactions with host immunity. By integrating evolutionary approaches, we overcome challenges posed by vast sequence divergence across viral and cellular PDE diversity to show that cellular AKAP7-like PDEs were recurrently acquired by two families of RNA viruses.

Phosphodiesterases (PDEs) encoded by viruses are putatively acquired by horizontal transfer of cellular PDE ancestor genes. Viral PDEs inhibit the OAS-RNase L antiviral pathway, a key effector component of the innate immune response. Although the function of these proteins is well characterized, the origins of these gene acquisitions are less clear. Phylogenetic analysis revealed at least five independent PDE acquisition events by ancestral viruses. We found evidence that PDE-encoding genes were horizontally transferred between coronaviruses belonging to different genera. Three clades of viruses within Nidovirales (merbecoviruses [MERS-CoV], embecoviruses [HCoV-OC43], and toroviruses) encode independently acquired PDEs, and a clade of rodent alphacoronaviruses acquired an embecovirus PDE via recent horizontal transfer. Among rotaviruses, the PDE of rotavirus A was acquired independently from the rotavirus B and G PDEs, which share a common ancestor. Conserved motif analysis suggests a link between all viral PDEs and a similar ancestor among the mammalian AKAP7 proteins despite low levels of sequence conservation. Additionally, we used ancestral sequence reconstruction and structural modeling to reveal that sequence and structural divergence are not well correlated among these proteins. Specifically, merbecovirus PDEs are as structurally divergent from the ancestral protein and the solved structure of human AKAP7 PDE as they are from each other. In contrast, comparisons of rotavirus B and G PDEs reveal virtually unchanged structures despite evidence for loss of function in one, suggesting impactful changes that lie outside conserved catalytic sites. These findings highlight the complex and volatile evolutionary history of viral PDEs and provide a framework to facilitate future studies.

coronaviruses | evolution | horizontal gene transfer | innate immunity | OAS-RNase L
Horizontal gene transfer (HGT) is a major force in ancient and ongoing evolution among diverse viruses and hosts. Within the virosphere, the acquisition and repurposing of host genes has shaped evolutionary history spanning from the origins of viruses themselves (1) to more specialized interactions with host immune pathways. Viruses in distinct unrelated orders, nidoviruses and rotaviruses, have independently acquired host mRNAs encoding eukaryotic LigT-like 2H-phosphoesterases, specifically 2′-5′ phosphodiesterases (PDEs) with similarity to eukaryotic protein family members (2).
The role of PDEs in both coronavirus and rotavirus interactions with host innate antiviral immunity has been well characterized. Coronaviruses have large, single-stranded, positive-sense RNA genomes of ~27 to 32 kilobases, the largest among vertebrate RNA viruses (3,4), whereas rotaviruses have segmented double-stranded RNA (dsRNA) genomes. The two virus families are located at distant positions within the broader RNA virus phylogeny, which itself is difficult to infer confidently given the great diversity among RNA virus families (5). As with other RNA viruses, dsRNA molecules are produced during transcription and replication of coronaviruses and rotaviruses. These molecules directly activate host immune signaling proteins, including melanoma differentiation-associated protein 5 (MDA5) (6)(7)(8) and oligoadenylate synthetases (OAS) (9). These sensors activate interferon responses and RNase L, respectively, to restrict viral replication.
Some viruses can replicate unrestricted by the OAS-RNase L pathway because they encode 2′-5′ PDEs. These include the embecovirus and merbecovirus subgenera within the betacoronavirus genus, a rodent clade of alphacoronaviruses, and group A, B, and G rotaviruses. Their PDEs degrade the OAS-produced second messenger 2′-5′ oligoadenylate, preventing it from inducing RNase L dimerization and cleavage of single-stranded viral and cellular RNA (Fig. 1). The role of viral PDEs during infection has been demonstrated for the prototypical betacoronavirus mouse hepatitis virus (10), MERS-CoV (11,12), and group A rotavirus (13). It is further supported by biochemical assays showing 2′-5′ oligoadenylate degradation by other viral PDEs (14,15). Notably, the selective pressure to encode a PDE may vary by cell type within an organism (16,17), so viruses of the same host may face different pressures when infecting different cell types.

pnas.org
In contrast to the extensive research on PDE function, the origin and evolutionary history of viral PDEs remain obscure. Based on structural similarity, the PDEs encoded by mouse hepatitis virus (18) and group A rotaviruses (15,19) have been linked to the PDE domain of mammalian AKAP7 proteins. Previously predicted structures for PDEs encoded by MERS-CoV (11), toroviruses (14), and group B and G rotaviruses (15) reveal an AKAP7 PDE-like fold and retain catalytic histidine (HxS/T) motifs.
The interface between viral PDEs and the OAS-RNase L pathway offers an attractive opportunity to study evolutionary pressures related to the acquisition, fixation, and evolution of host-derived viral genes. Rotavirus (13) and coronavirus (10,12,17) replication is restricted in the absence of an active PDE, and these genes are present across diverse viruses. Together, these observations provide strong evidence that PDE acquisition can confer fitness advantages.
The capacity of viral genomes to accommodate new genomic material is a key determinant of their ability to evolve via HGT. Broadly considered, viral PDEs fall into two types of genomic architecture. PDEs encoded by rodent alphacoronaviruses (20), embecoviruses (14), and merbecoviruses (11) share four features:
- they are ~240 to 300 amino acids long;
- they are translated from a discrete viral mRNA;
- they presumably exist in the cell as autonomous proteins;
- they exhibit high amino acid sequence divergence relative to conserved regions of the replicase gene.

Alternatively, PDEs may act as functional protein subdomains of as few as ~115 amino acids. The rotavirus A, B, and G PDEs consist of 142, 115, and 116 amino acids, respectively. Each is organized as the C-terminal subdomain of the >750 amino acid VP3 protein.
The torovirus PDE is a 141 amino acid C-terminal domain of the >4,000 amino acid polyprotein 1a, though whether it is proteolytically processed into an autonomous protein is unclear.
In this study, we identified several striking features of viral PDE evolution. We used a combination of phylogenetics, conserved motif analysis, and AlphaFold-based structural predictions to characterize three aspects of these proteins: the origin of viral PDEs, their diversification following horizontal transfer, and the relationship of divergent viral PDEs to each other and to mammalian AKAP7. Viral PDEs are similar in function and predicted structure to each other and to mammalian AKAP7. Despite this similarity, we exclude the possibility that a single horizontal transfer event gave rise to viral PDEs and instead propose a more complex and volatile process of diversification. These findings advance our understanding of the history of viral PDEs as diverse and potent immune antagonists. They also provide a detailed characterization of how host-derived viral genes originate and evolve.

Viral PDEs Derive from Several AKAP-Like Horizontal Transfer Events. We aligned 173 viral PDE amino acid sequences using MAFFT (version 7): 18 torovirus, 10 alphacoronavirus, 55 embecovirus, 27 merbecovirus, 8 rotavirus G, 20 rotavirus B, and 35 rotavirus A. Because of high sequence divergence, the alignment required intensive manual refinement (see Methods for details). Next, we constructed an unrooted maximum-likelihood (ML) phylogenetic tree (Fig. 2A). The PDE tree contains five major branches, reflecting five putative acquisition events:
- rotavirus A (RVA) VP3;
- the common ancestor of rotavirus B (RVB) and rotavirus G (RVG) VP3;
- toroviruses;
- embecoviruses;
- merbecoviruses.

Some branches had low bootstrap support, likely owing to sequence divergence, but the topology was sufficiently robust to reveal major events in viral PDE evolution.
We found evidence for three independent acquisitions of PDEs by ancestral nidoviruses. The nidovirus PDE phylogeny is incongruent with the nidovirus RdRp phylogeny (Fig. 2B), and, with one exception, the PDEs are not syntenic between virus groups. Consequently, we found no support for a recent common ancestor of all nidovirus PDEs. In contrast to the three acquisitions of an ancestral cellular PDE, the rodent alphacoronavirus PDE and the embecovirus PDE are syntenic and share high sequence identity. This synteny suggests that at least minimal homology-guided recombination played a role in transferring the PDE gene from one clade to the other. Embecovirus PDEs exhibit higher within-clade diversity, suggesting an older origin. This supports the hypothesis that alphacoronavirus PDEs were acquired via recombination with embecoviruses (20) rather than independently (Fig. 2A) (21). Given the overlapping ecological niches of rodent betacoronaviruses and alphacoronaviruses, we cannot exclude the possibility that ancestral viruses in each clade independently acquired a similar cellular PDE. However, the greater intra-clade divergence among embecovirus PDEs supports an older origin, and independently acquired PDEs would be unlikely to retain such high identity.
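Tree incongruence of this kind (PDE tree versus RdRp tree) can be quantified in several ways; the text does not state that a particular metric was used. One common choice is the Robinson-Foulds (RF) distance, which counts the bipartitions found in one unrooted tree but not the other. The sketch below computes it under two assumptions: trees are represented as nested Python tuples of taxon names, and the taxa are placeholders. Both choices are hypothetical; real analyses would parse the Newick trees written by IQ-Tree 2.

```python
def _collect_clades(node, clades):
    """Return the leaf set below `node`, appending every node's leaf set to `clades`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*[_collect_clades(child, clades) for child in node])
    clades.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree written as nested tuples.

    Each split is stored as the side that does NOT contain a fixed anchor taxon,
    so the same bipartition is recognized wherever the tuple happens to be rooted.
    """
    clades = []
    taxa = _collect_clades(tree, clades)
    anchor = min(taxa)
    result = set()
    for clade in clades:
        side = taxa - clade if anchor in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # drop trivial single-leaf splits
            result.add(side)
    return result, taxa


def robinson_foulds(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    splits_a, taxa_a = splits(tree_a)
    splits_b, taxa_b = splits(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


# Hypothetical five-taxon trees: same leaves, different groupings.
pde_like = (("A", "B"), "C", ("D", "E"))
rdrp_like = (("A", "C"), "B", ("D", "E"))
print(robinson_foulds(pde_like, rdrp_like))
```

A distance of 0 means identical unrooted topologies. In practice, topological conflict also has to be weighed against branch support, which the raw RF count ignores.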
The rotavirus PDE phylogeny supports, with high confidence, two independent PDE gene captures: one on the RVA VP3 branch and one in an RVB/RVG VP3 common ancestor. We constructed an ML tree of 137 rotavirus VP3 genes representing all known rotavirus genotypes (RVA, RVB, RVC, RVG, RVH, RVI, and RVJ), with the PDE domain trimmed from the RVA, RVB, and RVG VP3s (Fig. 2C). This PDE-trimmed VP3 phylogeny matches the PDE phylogeny. In both trees, RVA VP3 is extremely distant from the RVB and RVG VP3s, which derive from a recent common ancestor. This supports a parsimonious scenario in which that ancestor acquired a PDE that diversified at the same rate as the rest of the gene. Surprisingly, despite the synteny of the RVA and RVB/RVG PDEs, their origins are clearly independent of each other.
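The trimming step described above can be sketched on an aligned set of sequences. The step removes the C-terminal PDE domain from RVA, RVB, and RVG VP3 before the VP3 tree is built. This minimal sketch makes two assumptions:
- the domain boundary is a single alignment column (the value used here is arbitrary);
- sequence names carry a group prefix such as `RVA|` (a hypothetical naming convention).

```python
def trim_c_terminal_domain(alignment, boundary, groups_with_domain):
    """Mask columns at or after `boundary` in sequences from groups that carry the
    C-terminal PDE, then drop alignment columns that are gaps in every sequence."""
    masked = {}
    for name, seq in alignment.items():
        group = name.split("|", 1)[0]  # hypothetical "GROUP|strain" naming
        if group in groups_with_domain:
            seq = seq[:boundary] + "-" * (len(seq) - boundary)
        masked[name] = seq
    lengths = {len(s) for s in masked.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    width = lengths.pop()
    keep = [i for i in range(width) if any(s[i] != "-" for s in masked.values())]
    return {name: "".join(s[i] for i in keep) for name, s in masked.items()}


# Toy alignment: columns 5 onward stand in for the PDE domain.
toy = {
    "RVA|strain1": "MKT-LAHSTW",
    "RVB|strain2": "MRTQLGHTSW",
    "RVC|strain3": "MKTQL-----",
}
print(trim_c_terminal_domain(toy, 5, {"RVA", "RVB", "RVG"}))
```

Masking the domain rather than deleting it outright keeps the shared VP3 columns aligned. Afterwards, only columns that are empty in every sequence are discarded.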
Previous work identified mammalian AKAP7 central domains (CD) as structural and functional homologs of viral PDEs (22,23). However, low sequence conservation leaves the evolutionary link between the viral and cellular PDEs uncertain. While generating the viral PDE phylogeny (Fig. 2A), we identified several AKAP7-like motifs that are largely conserved in the viral PDEs (Fig. 2D). These motifs proved critical for anchoring the low-identity alignment. Because no other eukaryotic PDEs share them, they conclusively link the viral PDEs to a cellular AKAP7-like ancestor. One short motif (blue in Fig. 2D) is shared among all viral PDEs except those of merbecoviruses but is not present in any AKAP7. Although convergent evolution is formally possible, it is more parsimonious that this motif also derives from the AKAP7-like ancestor. The motif sequence is AD(D/K/Q). These amino acids are enriched in residues 1-48 of full-length mammalian AKAP7, a region that includes the nuclear localization sequence (NLS; residues 40-47). This suggests that the conserved viral motif is a remnant of an N-terminal truncation that was repeatedly selected for during optimization of OAS-RNase L suppression activity (22).
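The AD(D/K/Q) motif can be expressed directly as a regular expression and scanned for in each viral PDE. The motifs described above were identified by manual inspection of the alignment, so this is only an illustrative sketch. It removes gaps before searching, which means it finds the motif anywhere in a sequence rather than at a fixed alignment position. The sequence names and residues are made up.

```python
import re

# AD(D/K/Q) motif from the text, written as a regular expression.
AD_MOTIF = re.compile(r"AD[DKQ]")


def find_motif(aligned_seq, pattern=AD_MOTIF):
    """Return 1-based, ungapped start positions of `pattern` in an aligned sequence."""
    ungapped = aligned_seq.replace("-", "")
    return [m.start() + 1 for m in pattern.finditer(ungapped)]


def motif_presence(alignment, pattern=AD_MOTIF):
    """Map each sequence name to True if the motif occurs anywhere in it."""
    return {name: bool(find_motif(seq, pattern)) for name, seq in alignment.items()}


# Hypothetical sequences (not real PDEs).
seqs = {"virus_1": "MS-ADKLLVH", "virus_2": "MSAEKLL-VH"}
print(find_motif(seqs["virus_1"]))
print(motif_presence(seqs))
```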
Finally, a PDE encoded by ORF4 of a shrew alphacoronavirus was recently reported (21). This raises the possibility of an additional independent PDE acquisition or of virus-to-virus horizontal transfer. Its phylogenetic position remains uncertain: owing to sparse sampling, it sits at the end of a long branch with very low support (SI Appendix, Fig. S1A). Several of the key motifs, including the second catalytic motif, are degraded, indicating that this protein is almost certainly nonfunctional (SI Appendix, Fig. S1B). This observation further highlights the prevalence of PDE gene capture by HGT.
Ancestral Sequence Reconstruction of Nidovirus PDEs. Our phylogenetic analysis revealed clear distinctions between nidovirus PDE clades, supporting independent acquisition of PDEs by merbecoviruses, embecoviruses, and toroviruses. Nevertheless, two factors cloud confidence: the high sequence divergence among PDEs and the shared, recombination-driven history of embecovirus and alphacoronavirus PDEs. We therefore combined ancestral sequence reconstruction with AlphaFold2 structural modeling to further investigate relationships among viral PDEs (24). We initially focused on embecovirus NS2 and merbecovirus NS4b proteins given their higher within-clade sequence diversity, which likely reflects a combination of more thorough sampling and an older acquisition. The embecovirus analysis proceeded in four steps:
1. We aligned 55 embecovirus NS2 sequences with the HsAKAP7 PDE domain.
2. We constructed an ML tree of embecovirus NS2 protein sequences, rooted with the HsAKAP7 outgroup.
3. We used this tree as the basis for ancestral reconstruction in FastML (Fig. 3A).
4. We aligned ancestral embecovirus NS2 sequences from major nodes with HsAKAP7 and calculated amino acid identity.

The deepest ancestral node exhibited substantially higher amino acid identity to HsAKAP7 than shallower nodes or extant sequences.
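Percent identity between an aligned ancestral sequence and HsAKAP7 can be computed with different denominators, and the text does not state which one was used. This minimal sketch assumes identity is counted only over alignment columns where both sequences have a residue, skipping gapped columns. The node names and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical aligned sequences: a reference PDE and two reconstructed nodes.
reference = "MHKSTLVDE"
nodes = {"deep_node": "MHKSTIVDQ", "shallow_node": "MYRSA-VEQ"}
for name, seq in nodes.items():
    print(name, round(percent_identity(reference, seq), 1))
```

Counting over all columns, or over the length of the shorter sequence, would give lower values for gappy alignments. Comparisons between nodes are only meaningful if one definition is used consistently.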
We conducted the same analysis with 28 merbecovirus NS4b sequences. These were of particular interest because of the wide range in size of these proteins (246 to 285 AA), largely due to a highly variable N terminus. Ancestral sequence reconstruction produced an AncMerbecovirus (AncMerbeco) NS4b with 21.7% identity to HsAKAP7. This is again substantially higher than for ancestral sequences at other reconstructed nodes or among observed sequences (Fig. 3B). The AncMerbeco NS4b is 246 AA long, the same as the NS4b encoded by human MERS-CoV isolates and substantially shorter than the 285 AA NS4b encoded by most HKU4 isolates. We also reconstructed ancestral sequences of alphacoronavirus and torovirus PDEs. For alphacoronaviruses, this allowed comparisons not only with HsAKAP7 but also with ancestral embecovirus PDEs, which is critical for defining the relationship between them (SI Appendix, Fig. S2).

Nidovirus PDE Structural Models Show Similar Divergence from
We compared AlphaFold2 structural models of the reconstructed ancestral PDEs to the solved structure of human (Hs)AKAP7 PDE to determine whether there were differences in RMSD, a quantitative estimate of structural similarity. The AncEmbeco (Fig. 4A) and AncAlphaCoV (Fig. 4B) NS2 proteins had the same overall amino acid identity to HsAKAP7 (~27%). Their RMSDs relative to the HsAKAP7 structure were also nearly identical, indicating similar divergence from a cellular ancestor. This suggests that they may have a common origin, supporting a scenario of HGT between these clades. In contrast, AncMerbeco NS4b has the lowest amino acid identity to HsAKAP7 (~21%) and the highest RMSD (Fig. 4C), supporting an independent origin. Given the substantial within-clade diversity of these proteins (Fig. 2A), this origin reflects an ancient acquisition event. The torovirus PDE encoded at the 3′ end of ORF1a, by comparison, has relatively high identity to HsAKAP7 PDE (~39%) and the lowest RMSD value (Fig. 4D). Coupled with the limited within-clade diversity among torovirus PDEs (Fig. 2A), this suggests that the torovirus PDE was acquired independently and more recently than the other nidovirus PDEs.
A potentially complicating feature of the analysis is the presence of indels in viral PDE sequences. The HsAKAP7 PDE structure comprises just the central 206 AA of the 315 AA HsAKAP7 protein. Relative to it, nidovirus PDEs have additional domains at the N termini (merbecoviruses) or C termini (embecoviruses and alphacoronaviruses). Poor modeling of these domains might inflate RMSD values, so to account for this possibility we predicted the structures of the core PDE domains of nidovirus PDEs. AlphaFold2 did not successfully predict the AncMerbeco core PDE, so we used the sequence from a more recent node, the AncMERS core PDE. Overall, similarity to HsAKAP7 PDE was generally unchanged regardless of whether the full-length or core viral PDE structural model was analyzed (SI Appendix, Fig. S3). This bolsters the value of comparing full-length PDE structural models despite variable and disordered termini.

Motif and Structural Analysis Support Embecovirus-to-Alphacoronavirus Horizontal Transfer of NS2. Our prior phylogenetic and structural analyses were consistent with a relationship between embecovirus and alphacoronavirus PDEs, such that one was derived from the other via recombination. Because these viruses fall into different coronavirus genera, a single introduction event is implausible. This leaves two possibilities: a recombination event or acquisition of a similar cellular ancestor. Given the longer branch lengths in the embecovirus PDE phylogeny, we hypothesize that the acquisition event occurred in ancestral embecoviruses and that ORF2 was subsequently transferred to the alphacoronavirus clade. To refine our understanding of this history, we conducted additional sequence and structural analyses.
We first analyzed conservation of key motifs between AncEmbeco NS2 and the reconstructed ancestral PDE sequences from the other viral clades. Compared with AncEmbeco NS2, AncAlphaCoV NS2 had the fewest amino acid differences. It carried only three substitutions outside the catalytic His motif, whose second residue frequently toggles among PDEs (Fig. 5A). This supports the phylogenetic inference that the embecovirus and alphacoronavirus NS2 proteins are closely related. We then overlaid AlphaFold2-predicted structural models of reconstructed PDEs and the known HsAKAP7 PDE structure. Specifically, we used core PDE domain structural models to capture similarities and differences in the PDE fold itself. This also avoids disordered regions and the N-terminal or C-terminal domains that AlphaFold2 failed to model. Modeling the core region better approximates the HsAKAP7 PDE structure, which was solved for just the 206 AA central PDE domain rather than the entire protein.
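Counting amino acid differences "outside" a motif amounts to a column-by-column comparison of two aligned ancestral sequences that skips the motif columns. This minimal sketch makes three assumptions:
- motif columns are supplied by the user as 1-based alignment positions;
- columns with a gap in either sequence are ignored, so indels are not counted as substitutions;
- the sequences are hypothetical stand-ins.

```python
def substitutions(seq_a, seq_b, excluded_columns=(), gap="-"):
    """List (column, residue_a, residue_b) for substitutions between two aligned sequences.

    Columns are 1-based alignment positions. Columns listed in `excluded_columns`
    (e.g., a catalytic motif) are skipped, as are columns with a gap in either
    sequence, so indels are not counted as substitutions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    excluded = set(excluded_columns)
    return [
        (col, a, b)
        for col, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if col not in excluded and gap not in (a, b) and a != b
    ]


# Hypothetical ancestral sequences; columns 1-3 stand in for a His motif.
anc_a = "HKSLVTD"
anc_b = "HRSLITE"
print(substitutions(anc_a, anc_b, excluded_columns=range(1, 4)))
```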
The RMSD obtained by comparing the AncEmbeco and AncAlphaCoV NS2 models was comparable to that between AncEmbeco and AncHKU24 (Fig. 5 B and C), although amino acid identity was substantially lower. AncHKU24 was reconstructed at a more recent node in the embecovirus PDE phylogeny. Comparing the AncEmbeco and AncAlphaCoV NS2 models to the HsAKAP7 PDE structure revealed that the viral PDEs are much closer to each other in amino acid identity and RMSD than either is to HsAKAP7 PDE (Fig. 5 D and E). Although this could reflect convergence in their structural evolution, the combination of higher amino acid identity and structural model similarity strongly supports a common origin via recombination. Finally, we compared the AncEmbeco NS2 model to the AncMERS NS4b model, because AlphaFold2 failed to predict the AncMerbeco core PDE structure. Their divergence is comparable to that between the NS2 PDEs and HsAKAP7 PDE (Fig. 5F), supporting a single origin of the embecovirus and alphacoronavirus PDEs.

striking intra-clade diversification and structural divergence. In contrast, the embecovirus PDE phylogeny exhibits shorter branch lengths (Fig. 2B). Embecovirus PDEs also have higher sequence identity with each other, even when accounting for greater sampling in the dataset. This supports their use as a comparator for merbecovirus NS4b. Relative to AncEmbeco NS2, two reconstructed sequences at recently inferred nodes have high sequence conservation (Fig. 6A), and their modeled structures have a low RMSD value. This suggests minimal structural change throughout the course of embecovirus NS2 evolution (Fig. 6B).
We also compared how the radiation of embecovirus NS2 altered structure across the phylogeny by overlaying the modeled structures of the reconstructed NS2 PDEs ancestral to clades defined by the prototypical betacoronavirus mouse hepatitis virus (AncMHV) and bank vole (Myodes glareolus) betacoronavirus isolate-Grimso (AncGrimso) (Fig. 6C).This confirmed that PDE structural divergence is relatively constrained even among diverse embecoviruses.
In contrast, the merbecovirus NS4b has undergone substantially greater sequence divergence and structural variation. There has been notable extension, truncation, insertion, and deletion in the N terminus of the protein (Fig. 6D). This was likely facilitated by the fact that NS4b is encoded on a bicistronic ORF4ab viral subgenomic mRNA. That arrangement lets NS4b tolerate mutations that move its start codon upstream or downstream of the ancestral site within the region that overlaps ORF4a. However, even accounting for sequence divergence, the structural variation among merbecovirus NS4b proteins is remarkable:
- AncMerbeco and AncHKU25 NS4b (the latter reconstructed at a less basal node; Fig. 6E) share 69.4% amino acid identity, comparable to AncEmbeco and AncGrimso (Fig. 6B).
- Yet their RMSD is 2.71 Å, compared with 1.52 Å for the embecovirus NS2 proteins; the deviation between reconstructed sequences at recently inferred nodes of the embecovirus phylogeny is 1.83 Å (Fig. 6C).
- The reconstructed AncMERS and AncHKU4 NS4b sequences share just 26.6% amino acid identity and have an RMSD >3 Å, reflecting marked diversification among merbecovirus PDEs (Fig. 6F).
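RMSD values such as those above come from FATCAT 2.0 alignments (Methods), which allow flexible superposition and find residue correspondences themselves. As a simpler point of reference, this sketch computes rigid-body RMSD for two C-alpha traces that are already paired residue by residue. It uses the closed-form quaternion solution (Horn; Theobald's QCP formulation), so no linear-algebra library is needed. This is not FATCAT: residue pairing is assumed to be given, and the coordinates are made up.

```python
import math


def _centered(coords):
    """Shift (x, y, z) points so that their centroid sits at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[i] for p in coords) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(matrix, max_sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(v * v for row in a for v in row)
        if off <= 1e-24 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(xs, ys):
    """Minimum RMSD between two equal-length, residue-paired traces after rigid superposition."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    x, y = _centered(xs), _centered(ys)
    # s[a][b] = sum_i x_i[a] * y_i[b]: the 3x3 cross-covariance matrix.
    s = [[sum(p[a] * q[b] for p, q in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    e0 = sum(c * c for p in x for c in p) + sum(c * c for q in y for c in q)
    lam = _largest_eigenvalue(key)
    return math.sqrt(max(0.0, (e0 - 2.0 * lam) / len(x)))


# Hypothetical C-alpha coordinates (in angstroms) and a rotated, translated copy.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (4.1, 5.2, 3.1), (7.4, 6.0, 4.4)]
angle = math.radians(30)
model_b = [(x * math.cos(angle) - y * math.sin(angle) + 5.0,
            x * math.sin(angle) + y * math.cos(angle) - 2.0,
            z + 1.0) for x, y, z in model_a]
print(superposed_rmsd(model_a, model_b))  # effectively zero: a rigid copy superposes exactly
```

Because rigid superposition cannot bend one model to fit another, RMSDs from this kind of calculation are generally not directly comparable to FATCAT's flexible-alignment values.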

Sequence and Structural Analysis of Rotavirus PDEs Supports Two Horizontal Transfer Events. Rotavirus PDE and VP3 phylogenies are consistent with independent acquisitions of PDEs in the RVA VP3 ancestor and in a common ancestor of RVB and RVG VP3 (Fig. 2 A and C). As with the nidovirus PDEs, ancestral sequence reconstruction was successful in producing sequences with the highest amino acid identity to HsAKAP7 at the deepest node of the tree (Fig. 7 A and C).
Relative to HsAKAP7, the AncRVA and AncRVB/G PDEs exhibited similar amino acid identity (34 to 36%) and similar RMSDs (~2.4 to 2.8 Å) (Fig. 7 B and D). However, their amino acid identity to each other was considerably lower than to HsAKAP7 PDE. Their modeled structures were also relatively dissimilar, with an RMSD >3 Å (Fig. 7E). This result strongly suggests that these PDEs are similarly diverged from HsAKAP7 and from each other, consistent with multiple origins of the rotavirus PDEs despite their synteny. A comparison of modeled structures of reconstructed RVB and RVG sequences reveals a stark contrast (Fig. 7F). These reconstructed PDE sequences had nearly 50% amino acid identity and an RMSD <1 Å, consistent with the phylogenetic inference that the RVB and RVG PDEs arose from a single horizontal transfer event in a common ancestral VP3.

Loss of PDE Function in Rotavirus G Is Not Correlated with Major Structural Changes. A previous study compared the enzymatic activity of PDEs encoded by RVA, RVB, and RVG and found that the PDE of the RVG strain tested was inactive (15). A T90N amino acid substitution in the second catalytic motif is associated with this loss of enzymatic activity, but reverting the substitution did not rescue function. These findings point to cryptic functional determinants outside the catalytic sites, as previously reported for MHV NS2 (25). The AncRVG and AncChicken RVG PDEs contain identical key motifs and a threonine at position 90 (T90; Fig. 8A). In contrast, the inactive RVG PDE and those of closely related viruses carry substitutions at position 90 and in other conserved motifs (Fig. 8A). To determine whether structural divergence might underlie the loss of function, we compared structural models of the RVB active and RVG inactive PDEs. These two PDEs are similarly diverged from the AncRVB/G model in both sequence identity and RMSD (Fig. 8 B and C). The modeled structures of the RVB active and RVG inactive PDEs are similar (Fig. 8D), so no obvious structural change explains the loss of function of the RVG PDE. Whether amino acid changes in the key conserved motifs are involved is unclear and is an important topic for future experimental study.

Discussion
In this study, we traced and characterized the acquisition and evolution of viral PDEs. These proteins are among the clearest examples of acquisition of a host gene by RNA viruses, in contrast to the more abundant examples among DNA viruses. Previous work examining the role of gene flow in RNA virus evolution has largely considered conserved genes such as those encoding capsids, RNA-dependent RNA polymerases, and helicases (26). In contrast, we characterize an example of gene transfer shaping virus interactions with host immunity, with potential implications for host range and virulence. A recent report proposed bacterial origins for eukaryotic OAS genes, suggesting that both sides of the interface between viral PDEs and OAS-RNase L arose via HGT (27). Understanding how HGT has influenced virus-innate immune interfaces benefits from a better understanding of how newly acquired viral genes emerge and subsequently evolve. Here, we took advantage of recent advances in protein structure prediction to study the evolution of viral PDEs. The amino acid sequences of viral PDEs can undergo extensive substitution without loss of function, so identity between viral and host sequences is often very low. This complicates phylogeny-only approaches to the history of these proteins and motivated our integration of sequence-based and structure prediction-based comparisons. Previous work has linked these proteins to an AKAP7-like ancestor (2,23) based on similarity to the solved HsAKAP7, MHV NS2, and RVA PDE structures (15,18,19,28). However, these analyses provide only a limited view of PDE evolution.
Our phylogenetic analysis substantially expands on prior efforts (23), incorporating 173 PDE sequences from embecoviruses, merbecoviruses, toroviruses, alphacoronaviruses, RVAs, RVBs, and RVGs. The nidovirus PDE tree includes several long branches that are incongruent with the RdRp phylogeny. This structure suggests multiple independent horizontal transfer events and one virus-to-virus transfer via recombination (Fig. 2B). Given this incongruence, there is no support for a single introduction into the nidovirus or coronavirus ancestor followed by loss on some branches. The relatively large RMSD values between embecovirus and merbecovirus PDEs and the absence of synteny further argue against a single introduction even into the betacoronavirus common ancestor. Comparison of the rotavirus PDE and PDE-trimmed VP3 phylogenies supports two introductions, one into an ancestral RVA VP3 and one into an RVB/G VP3 common ancestor (Fig. 2C). The syntenic but independent acquisition of PDEs by rotaviruses suggests that the VP3 segment is particularly tolerant of gaining new genetic material. Future studies could test this idea by experimentally recapitulating horizontal transfer events using the established rotavirus A molecular clone (13,29).
Our characterization of several horizontal transfer events of orthologous ancestral AKAPs, and of their common use in antagonizing the OAS-RNase L pathway, has intriguing biological implications. The cellular role of the AKAP7 PDE remains mysterious, so whether these proteins were evolutionarily repurposed after viral acquisition is currently unclear. Importantly, these horizontal transfer events are recurrent but phylogenetically restricted. They occurred multiple times only in rotaviruses and nidoviruses, even though the OAS-RNase L pathway restricts diverse viral families (9). Among RNA viruses, constraints on genome length imposed by error rate and packaging capacity may be a limiting factor (30). In coronaviruses, this constraint is alleviated by their proofreading exonuclease and resultant large genomes.
The ability of some coronaviruses to tolerate insertions of PDE genes raises the question of why other coronavirus subgenera, including sarbecoviruses such as SARS-CoV-2, lack PDEs. Two possibilities can be proposed. One is that OAS-RNase L activity, and thus the selective pressure to encode an antagonist, is cell-type specific (17,31). The cellular tropism of sarbecoviruses in their natural rhinolophid hosts is unknown. They may primarily infect cells that lack a potent OAS-RNase L pathway, similar to MHV infection of liver hepatocytes (17). Alternatively, they may infect cells that do not express the prenylated p46 isoform of OAS1, which has been linked to restriction of SARS-CoV-2 replication (32)(33)(34)(35)(36). The other possibility is that rhinolophid bats acquired loss-of-function mutations in OAS genes, as seen with primate OAS1 (37), eliminating the pressure for sarbecoviruses to antagonize the pathway.
Merbecovirus NS4b highlights the potential for highly variable evolutionary trajectories following acquisition of host PDE genes. These proteins exhibit striking sequence divergence and structural deviation from both HsAKAP7 and other viral PDEs, as well as extensive divergence from each other. Unlike other viral PDEs, and like AKAP7 proteins, merbecovirus PDEs contain an NLS and are abundant in the nucleus (12,38,39). However, the merbecovirus NLS is bipartite, in contrast to the monopartite AKAP7 NLS, and shows no evidence of being derived from the cellular ancestral NLS. This suggests that it arose de novo early after acquisition of the PDE. Consistent with its nuclear localization, NS4b has additional functions in immune antagonism distinct from disruption of RNase L activation (12,(38)(39)(40)(41)(42)(43). This multifunctionality distinguishes NS4b from all other viral PDEs and likely subjects it to unique selective pressures, presumably accounting for its volatile evolutionary history.
Although the evolutionary history of viral PDEs is coming into better focus, our understanding of their roles in viral replication and their impacts on fitness remains limited. With the exception of MHV, the fitness benefits provided by coronavirus PDEs have not been explored, especially during infection and transmission of viruses in their natural hosts. When bat coronaviruses move into new hosts, the loss of some accessory protein functions is common, suggesting that the advantages these proteins provide are host specific. More sophisticated cell culture systems from a variety of species are now being developed, such as primary epithelial cells (44,45) and bat intestinal organoids (46). These systems open the possibility of new insights into how the combination of HGT and natural selection shapes the evolution of RNA viruses and their hosts.

Phylogenetic Analysis.
Viral PDEs. All alignments (SI Appendix) were generated using the MAFFT plug-in (47) with default parameters in Geneious Prime v2022.2.1 and were manually refined as necessary. ML trees with 100 bootstrap replicates were generated using IQ-Tree 2 (48) on the Los Alamos National Laboratory server. Different substitution models (with 20 rate categories) were necessary for different alignments to maximize tree stability and branch support. The all-viral PDE tree is unrooted, as no clear outgroup exists for all viral PDEs. ML trees for individual clades of viral PDEs are rooted with HsAKAP7. Alignments (.fasta*) and IQ-Tree logs (.log*) containing detailed parameters for every PDE tree can be found in the Supplementary Files. The ViralPDEs.fasta file contains accessions for all viral PDE sequences analyzed.

Nidovirus RdRps. To generate the nidovirus RdRp tree (Fig. 2B), we aligned 145 nidovirus RdRp sequences with MAFFT (default parameters). These sequences represent toroviruses, alphacoronaviruses, betacoronaviruses, gammacoronaviruses, and deltacoronaviruses. We then generated an ML tree in IQ-Tree 2 using a GTR20 substitution model, 20 rate categories, and 100 bootstrap replicates. Raw files associated with the analysis are in SI Appendix.

Rotavirus VP3. We aligned 137 rotavirus VP3 amino acid sequences using MAFFT and trimmed the PDE domains from the RVA, RVB, and RVG sequences in the alignment. We then used IQ-Tree 2 to generate an ML tree using a JTT substitution model, 20 rate categories, and 100 bootstrap replicates.

ML tree visualization. All .treefile outputs from IQ-Tree 2 were imported into FigTree v1.4.4 for visualization and coloring. Branch support values and labels were added manually in Adobe Illustrator 2023.
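The bootstrap replicates mentioned above are generated inside IQ-Tree 2. The text does not say whether the standard nonparametric bootstrap or IQ-TREE's ultrafast approximation was used. For orientation only, this sketch shows the standard nonparametric resampling step, in which alignment columns are drawn with replacement. Tree inference on each replicate and the tallying of clade support are omitted, and the toy alignment is hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Yield pseudo-alignments built by resampling alignment columns with replacement."""
    names = list(alignment)
    width = len(alignment[names[0]])
    if any(len(alignment[name]) != width for name in names):
        raise ValueError("sequences must be aligned (equal length)")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Each replicate has the original width; some columns repeat, others are absent.
        columns = [rng.randrange(width) for _ in range(width)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


toy = {"seq1": "MKTLAH", "seq2": "MRTLGH", "seq3": "MKSIAH"}
replicates = list(bootstrap_replicates(toy, 100, seed=1))
print(len(replicates), replicates[0])
```

A bootstrap value on a branch is then the percentage of replicate trees that recover that branch. This is why highly divergent alignments, with few informative columns, tend to give the low support noted for some branches of the PDE tree.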
Sequence Motif Analysis. We used previously defined motifs (14,23) to guide manual scanning of the viral PDE/AKAP alignments and to identify conserved motifs in viral PDEs.
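The motif definitions used here come from refs. 14 and 23 and are not reproduced in this section. As a hedged illustration of scanning an alignment for a motif, the sketch below assumes the H-x-[S/T]-x signature characteristic of 2H phosphoesterases (the enzyme family that includes the AKAP7 PDE domain) and reports each hit by alignment column, so hits can be compared across sequences. The sequences are hypothetical toy data, not viral or AKAP7 sequences.

```python
import re

# Assumed example pattern: the H-x-[S/T]-x signature of 2H phosphoesterases.
# The lookahead lets overlapping hits be reported.
HXTX = re.compile(r"(?=(H.[ST].))")


def find_motifs(aligned_seq, pattern=HXTX):
    """Return (alignment_column, motif) pairs for each hit in a gapped sequence."""
    # Alignment column of every residue, so hits can be compared across sequences.
    columns = [i for i, c in enumerate(aligned_seq) if c != "-"]
    ungapped = aligned_seq.replace("-", "")
    return [(columns[m.start()], m.group(1)) for m in pattern.finditer(ungapped)]


# Hypothetical toy alignment (not real viral or AKAP7 sequences).
alignment = {
    "reference": "MK-HLTQAAGHKSV",
    "viral_pde": "MKRHITQAAGYKSV",
}

ref_columns = {col for col, _ in find_motifs(alignment["reference"])}
for name, seq in alignment.items():
    hits = find_motifs(seq)
    shared = [col for col, _ in hits if col in ref_columns]
    print(name, hits, "shared with reference:", shared)
```

In this toy case the reference carries two motif copies while the "viral" sequence retains only the first, which is the kind of presence/absence pattern summarized in a motif schematic. Automated scanning like this would complement, not replace, the manual inspection described above.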
Ancestral Sequence Reconstruction and Analysis. We generated alignments of each viral PDE group with the HsAKAP7 PDE domain (as described above) and fed these alignments into FastML (49,50), along with the ML tree generated for each alignment with IQ-Tree 2. Ancestral sequences were then reconstructed using a JTT substitution model and otherwise default parameters (branch-length optimization, Gamma distribution, marginal reconstruction) and exported for analysis. The raw files (outFile_seq_marginal.txt*, outTreefileAncestor.txt*, outTreeFileNewick.txt*) are available on FigShare. After downloading the inferred ancestral sequences, we identified major nodes and aligned them to HsAKAP7 to confirm that deeper nodes have higher amino acid percent identity to the eukaryotic reference PDE. We also analyzed the evolutionary trajectory of conserved AKAP-like motifs.
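The depth check described above compares each reconstructed node with HsAKAP7 by percent identity. The text does not state how gaps were handled, so this minimal sketch assumes identity is computed over alignment columns in which neither sequence is gapped, then checks that identity does not increase from deep nodes toward the tips. Node names and sequences are hypothetical toy data, not reconstructions from this study.

```python
def percent_identity(a, b):
    """Percent identity of two sequences taken from the same alignment.

    Assumed convention: identical residues divided by the number of columns
    in which neither sequence has a gap ('-').
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def identity_by_depth(nodes, reference):
    """Identity of each node to the reference, with nodes ordered deepest first.

    Returns the identities and whether they never increase toward the tips.
    """
    ids = [percent_identity(seq, reference) for _, seq in nodes]
    non_increasing = all(x >= y for x, y in zip(ids, ids[1:]))
    return ids, non_increasing


# Hypothetical aligned toy sequences, ordered from the deepest node to a tip.
reference = "ACDEFGHIKL"
nodes = [
    ("deep_node", "ACDEFGHIKV"),
    ("intermediate_node", "ACDEWGHI-V"),
    ("shallow_node", "ACQEWGRI-V"),
]
ids, ok = identity_by_depth(nodes, reference)
for (name, _), pid in zip(nodes, ids):
    print(f"{name}: {pid:.1f}% identity to reference")
print("identity decreases toward the tips:", ok)
```

Other conventions (for example, dividing by the reference length) give different absolute values, so identities should only be compared when computed the same way.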
To quantify the similarities (or differences) between different model structures, we conducted pairwise alignments in FATCAT 2.0 (53) to calculate RMSD values.
Data, Materials, and Software Availability. Phylogenetic data (raw data from phylogenetic analysis of viral PDEs: .fasta multiple sequence alignments, IQ-Tree log files, and IQ-Tree .treefile files), FastML data (.txt files with raw data from ancestral sequence reconstruction), and AlphaFold files (raw output from AlphaFold structural modeling: a .fasta file of amino acid sequences and .pdb files of the highest-confidence prediction for each sequence) have been deposited in FigShare (https://figshare.com/projects/Viral_PDE_Evolution/164854) (54). All other data are included in the manuscript and/or supporting information.
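The structural comparisons in this study report RMSD values from FATCAT 2.0, which performs flexible alignment: it determines residue equivalences itself and can introduce twists between rigid segments. To show what an RMSD value measures, the sketch below swaps in the simpler rigid-body Kabsch superposition on coordinates whose residue pairing is already known. It is not FATCAT's algorithm, it assumes NumPy is available, and the coordinates are toy values.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (same units as input, e.g. angstroms) between paired N x 3 coordinates
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must be N x 3 arrays of paired atoms")
    # Remove translation by centring both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy "C-alpha" coordinates; Q is P rotated 30 degrees about z and shifted.
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(P, Q), 6))        # same shape, so RMSD is ~0
print(round(kabsch_rmsd(P, 1.5 * P), 3))  # a scaled copy cannot be superposed exactly
```

Because a rigid fit cannot absorb hinge motions, a rigid RMSD for the same residue pairs will generally be at least as large as a flexible-alignment RMSD.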

Fig. 2. Phylogenetic and sequence analysis of viral PDEs reveals multiple independent acquisitions and conservation of ancestral motifs. (A) Unrooted ML tree of viral PDE amino acid sequences showing five PDE acquisitions from a cellular ancestor (black arrows) and one acquisition via recombination (red arrow). (B) Unrooted ML tree of coronavirus and torovirus full-length RdRp amino acid sequences. (C) Unrooted ML tree of rotavirus VP3 amino acid sequences, with the C-terminal PDE domains trimmed from the RVA, RVB, and RVG sequences. (D) Schematic showing the conservation or absence of AKAP-like motifs in viral PDEs.

Fig. 3. Ancestral sequence reconstruction (ASR) of embecovirus and merbecovirus PDEs. (A) ML tree of 55 embecovirus PDE sequences, collapsed into major clades and rooted with the HsAKAP7 PDE domain sequence (AA 82-287), showing the sequence identity of reconstructed sequences at major nodes to HsAKAP7. (B) ML tree of 27 merbecovirus PDE sequences, collapsed into major clades and rooted with the HsAKAP7 PDE domain sequence (AA 82-287), showing the sequence identity of reconstructed sequences at major nodes to HsAKAP7.

Fig. 4. Structural similarity between HsAKAP7 and ancestral nidovirus PDEs. (A-D) Overlay of the previously determined HsAKAP7 structure with the predicted structures of the indicated ancestral nidovirus PDEs. The number of aligned amino acids (from the query sequence) and the RMSD were calculated using FATCAT.

Fig. 5. Comparative analysis of AncEmbeco and AncAlphaCoV PDEs. (A) Motif analysis with AncEmbeco as the reference. Amino acid differences in key motifs are shown in bold, in the color corresponding to each viral PDE. (B) Overlay and RMSD comparison of AncEmbeco NS2 and AncAlphaCoV NS2. (C) Overlay and RMSD comparison of AncEmbeco NS2 and AncHKU24 NS2. (D) Overlay and RMSD comparison of HsAKAP7 PDE and AncAlphaCoV NS2. (E) Overlay and RMSD comparison of HsAKAP7 PDE and AncEmbeco NS2. (F) Overlay and RMSD comparison of AncEmbeco NS2 and AncMERS NS4b.

Fig. 6. Variable sequence and structural PDE diversity within Nidovirus subgenera. (A) Schematic of embecovirus NS2 at key ancestral nodes with catalytic motifs (orange), deletions (horizontal lines), and insertions (black) indicated, along with protein length and amino acid identity to the AncEmbeco sequence. (B) Overlay of AlphaFold-predicted structures of the reconstructed AncEmbeco and derived AncGrimso sequences, with amino acid identity and the RMSD as calculated in FATCAT. (C) Overlay of AlphaFold-predicted structures of the AncMHV and AncGrimso sequences, with amino acid identity and the RMSD as calculated in FATCAT. (D) Schematic of merbecovirus NS4b at key ancestral nodes with catalytic motifs (orange), deletions (horizontal lines), and insertions (black) indicated, along with protein length and amino acid identity to the AncMerbeco sequence. (E) Overlay of AlphaFold-predicted structures of the AncMerbeco sequence and a shallower node (AncHKU25) whose AA identity to each other is similar to that between the sequences in panel B, with the RMSD from a FATCAT comparison. (F) Overlay of AlphaFold-predicted structures of shallower ancestral nodes with different AA sequence identities, with RMSDs from FATCAT comparisons.

Fig. 7. Ancestral sequence reconstruction and structural analysis of rotavirus PDEs. (A) Ancestral sequence reconstruction (ASR) of RVA PDEs at major nodes with AA identity to HsAKAP7. (B) Overlay of the AncRVA PDE structure predicted in AlphaFold with the solved structure of the HsAKAP7 PDE. (C) ASR of RVB/G PDEs at major nodes with AA identities to the HsAKAP7 PDE sequence. (D) Overlay of the AncRVB/G PDE structure predicted in AlphaFold with the solved structure of the HsAKAP7 PDE. (E) Overlay of the AncRVA PDE structure modeled in AlphaFold with the structural model of the AncRVB/G PDE. (F) Overlay of the AncRVB Porcine clade PDE structural model and the AncRVG Waterbird clade PDE structural model.

Fig. 8. Sequence motif and structural analysis of RVB and RVG functional divergence. (A) ML tree of RVB and RVG PDEs, with previously assayed PDEs (15) indicated. (B) Overlay and similarity metrics for the predicted structures of the AncRVB/G PDE and the active RVB PDE tested by Ogden et al. (15). (C) Overlay and similarity metrics for the predicted structures of the AncRVB/G PDE and the inactive RVG PDE tested by Ogden et al. (15). (D) Overlay and similarity metrics for the predicted structures of the active RVB PDE and the inactive RVG PDE tested by Ogden et al. (15). RMSDs for all overlays were calculated in FATCAT.

Figure Design. Figure 1 was produced using BioRender. All other figures were produced in Adobe Illustrator 2023.