BIOLOGICAL SCIENCES / EVOLUTION
Novel sex pheromone desaturases in the genomes of corn borers generated through gene duplication and retroposon fusion




*Department of Entomology, New York State Agricultural Experiment Station, Cornell University, Geneva, NY 14456;
National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604; and
Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259-B21 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
Contributed by Wendell L. Roelofs, January 17, 2007 (received for review December 11, 2006)
| Abstract |
|---|
… Δ11 and Δ14 desaturase genes exist in the genomes of the European and Asian corn borers (Ostrinia nubilalis and Ostrinia furnacalis, respectively). Furthermore, an entirely novel class of desaturase gene has arisen in the Ostrinia lineage and is derived from duplication of the Δ11 desaturase gene and subsequent fusion with a retroposon. Interestingly, the genes have been maintained over relatively long evolutionary time periods in corn borer genomes, and they have not been recognizably pseudogenized, suggesting that they maintain functional integrity. The existence of cryptic desaturase genes in moth genomes indicates that the evolution of moth sex pheromone desaturases in general is much more complex than previously recognized.
Ostrinia | phylogeny | pseudogene | biosynthesis | evolution
… Δ9 (C16 > C18); Δ9 (C18 > C16); Δ9 (C14–C26); Δ10, Δ11, Δ12; and Δ14. One important question for researchers studying moth sex pheromones is how this unusual sex pheromone desaturase multigene family evolved and diversified.
Understanding how multigene families arise and diversify in general has been a subject of considerable interest over the last 40 years (3). The moth sex pheromone desaturases constitute a particularly interesting multigene family because they encode enzymes that are potential players in the establishment of barriers to reproduction and, therefore, could ultimately contribute to speciation (4). For example, the recruitment of the Δ14 enzyme into the female sex pheromone biosynthetic pathway of the Asian corn borer (ACB), Ostrinia furnacalis, resulted in the establishment of a novel sex pheromone blend that ultimately led to the divergence of this species from the European corn borer (ECB), Ostrinia nubilalis. Because sex pheromone desaturases are key players in moth reproduction, an understanding of the mechanisms that are responsible for generating their diversity may shed light on the molecular processes that underlie speciation in this group of insects. Thus, we initiated research to examine the diversity of sex pheromone desaturase genes and their patterns of evolution in the genomes of the ECB and ACB, two representative moth species for which a considerable amount of knowledge concerning their pheromone blends and chemical communication systems has been amassed over the past few decades (1, 4–9). Surprisingly, we found that an even larger number of "cryptic" sex pheromone desaturase genes exist within corn borer genomes and that these novel genes were generated as a result of a fusion event with a retroposon.
| Results |
|---|
… Δ14 gene sequences and 10 Δ11 desaturase genes. Two of the three Δ14 genes possessed nearly identical nucleotide sequences [uncorrected p distance (p) = 0.0003], but they possessed divergent upstream promoter regions (p = 0.54). The sole nucleotide difference resulted in an amino acid change at the eighth codon (excluding the start codon). A third gene was much more divergent from the other two sequences [average p distance (p̄) = 0.23] at the nucleotide level and also showed a number of amino acid changes spread throughout the deduced protein sequence. The 10 Δ11 desaturase genes consisted of 5 that were fully intact and 5 that were truncated. The five intact genes possessed three exons and two introns. The exon regions of two of these genes matched the published ECB Δ11 sequence (4). The only differences between these two sequences were found in intron 2, in which they differed in length by 87 nucleotides and diverged at an additional 73 nucleotide sites (p = 0.08). We denoted the gene containing the shorter intron as "S" and the gene containing the longer intron as "L." The other three intact sequences were very divergent from the published ECB Δ11 sequence (p = 0.287). One of these, which we called "ECB ψΔ11," had four nonsense mutations within the first 65 codons. The remaining five Δ11 genes lacked exon 1 but possessed two exons and two introns that were homologous to exons 2 and 3 and introns 1 and 2 of the intact Δ11 genomic sequences (Fig. 1). The truncated genes were identical to each other in these regions but possessed distinct upstream regions. In addition, they were highly divergent from the two intact genes at their homologous regions (p = 0.40).
… Δ14 genes and five Δ11 genes. The coding regions of the two Δ14 genes differed only on the basis of their intron 3 sequences, which were highly divergent (p = 0.478). The remaining exon/intron regions were identical. With respect to the Δ11 gene complement, three were fully intact, whereas the other two were truncated. Of the three intact genes, only one matched the published ACB Δ11 mRNA sequence. In contrast, the other two intact genes and the two truncated genes were highly divergent from the intact gene (p = 0.58, range = 0.39–0.46) and also substantially divergent from one another (p = 0.34, range = 0.25–0.51) at their exon and intron regions. One of the truncated genes, which we called "ACB ψΔ11," had two nonsense mutations within the first 20 codons. The other truncated gene was identical in structure (Fig. 1) to the ECB truncated genes in that it was missing a homologue of exon 1 but possessed exons 2 and 3 and introns 1 and 2 of the intact ACB Δ11 genomic sequence.
As mentioned previously, the truncated Δ11 gene sequences in both the ECB and ACB showed a considerable amount of nucleotide sequence divergence from the intact genomic sequences. Interestingly, the results from BLAST analyses indicated that the 5' upstream region corresponded to a reverse transcriptase (RT) ORF in the ECB truncated genes and the single ACB truncated gene that was similar in structure to the ECB truncated genes. A phylogenetic analysis revealed the ORF to be a long interspersed nuclear element (LINE) related to the RTE-1 LINE family of Caenorhabditis elegans (10) and previously unknown LINE families from the purple sea urchin Strongylocentrotus purpuratus and the silkworm moth Bombyx mori (Fig. 2). The ECB–ACB LINE differed from these other LINEs by a substantial amount at the amino acid level: p = 0.63, range = 0.61–0.65 in comparison with C. elegans RTE-1; p = 0.55, range = 0.53–0.58 in comparison with the LINE from S. purpuratus; and p = 0.26, range = 0.24–0.32 in comparison with the LINE from B. mori. Thus, we have designated the ECB–ACB LINE as a new family known as ezi, which is the colloquial Mandarin Chinese word for "moth." In addition, we designated the LINE from B. mori as a different family known as kaikoga, which is the Japanese word for "silkworm moth."
… Δ11 element (Fig. 1) encodes an endonuclease domain. The structure of the kaikoga element is also shown in Fig. 1. This element possesses 5' and 3' UTRs as well as an endonuclease domain. We were not able to determine the number of copies of the ezi element that exist within the ECB or ACB genomes. However, multiple BLAST searches of the B. mori genome database using the B. mori kaikoga element revealed the presence of at least 200 copies within this genome, although none of the copies were associated with a sex pheromone desaturase. Similar searches of the B. mori genome database using the ezi element indicated that this family is not present in the B. mori genome.
A phylogenetic analysis of all Δ11 desaturase genes from both the ACB and ECB, along with the sequences from several other representative species, is shown in Fig. 3. There are four groups of corn borer Δ11 genes evident in this phylogeny: (i) an ezi-Δ11 group that is composed of genes lacking exon 1 of the fully intact Δ11 gene; (ii) an ezi-Δ11 group that is composed of fully intact Δ11 genes but whose Δ11-homologous region is highly divergent from the "normal" Δ11 gene; (iii) a group consisting of the ψΔ11 pseudogene; and (iv) the normal Δ11 gene group. It should be pointed out that the ACB member of the ezi-Δ11 group lacks an ezi LINE region and contains only a Δ11-homologous region. This finding could have resulted if the duplication event that gave rise to this gene occurred in the common ancestor of the ECB and ACB followed by insertion of the ezi segment only in the ECB subsequent to its divergence from the ACB. Alternatively, the ezi region could have been present in this gene in the common ancestor of the ECB and ACB followed by excision from the gene in the ACB subsequent to its divergence from the ECB. In light of the fact that retroposons normally do not excise themselves once they insert in a genomic location (11), this scenario seems unlikely.
… Δ11 gene and was subsequently amplified. After the ezi element was integrated into the ancestral Δ11 duplicate, a region of homology was established between the new ezi-Δ11 fusion gene and other ezi elements in other parts of the genome. Subsequent unequal crossover events occurred, resulting in amplification of the ezi-Δ11 fusion genes. This mode of amplification could also result, from time to time, in excision of the ezi element from certain fusion genes (as in the case of the ACB ezi-Δ11 gene) or incorporation of an extra ezi element into an existing fusion gene (as in the case of two ECB ezi-Δ11 genes) (Fig. 1).
To determine when the fusion event between the ancestral Δ11 and the ezi LINE took place, we attempted to use a standard molecular dating method (12) based on the formula d = 2rt, in which d is the level of nucleotide sequence divergence, r is the rate of substitution, and t is the time since divergence. If we examine the level of divergence at synonymous sites by using the modified Nei–Gojobori method (13), we find that the average level (d̄S) between the normal Δ11 and ezi-Δ11 genes is 0.927 ± 0.145, which is quite high and well above the saturation level. This considerable level of divergence at synonymous sites suggests that the ezi-Δ11 genes are older than the most recent common ancestor of the ECB and ACB. When we compared intact ACB vs. ECB Δ11 gene sequences with each other and ACB vs. ECB ezi-Δ11 gene sequences with each other, we found a difference in rate at synonymous sites (1.3 × 10⁻⁸ for the former and 4.1 × 10⁻⁸ for the latter, assuming a date of 1 Mya for the ECB–ACB divergence). This disparity in evolutionary rate confounds attempts to date the origin of the ezi-Δ11 gene fusion event. Nevertheless, if we attempt to place a date by using the faster ezi-Δ11 gene rate, we obtain a date of 11.3 ± 1.8 Mya. The use of the slower Δ11 rate produces a date of 35.7 ± 5.6 Mya. Alternatively, if we assume that the rate of synonymous site variation among nuclear genes in Drosophila (15.6 substitutions per site per 10⁹ years) (14) is about the same as it is in the ECB and ACB, we can use this rate to obtain a date of 29.7 ± 4.7 Mya, which is closer to the Δ11 rate. When nucleotide sites have reached the saturation level, it is better to use the deduced amino acid sequence for molecular dating because it evolves more slowly (15). There are no amino acid substitutions between the normal Δ11 genes of the ECB and ACB, but there are several between the ECB and ACB ezi-Δ11 genes. The rate of deduced amino acid substitution between these genes is 1.8 × 10⁻⁸. If we use this rate along with the Poisson-corrected amino acid distance estimate (16) between the normal Δ11 and ezi-Δ11 genes (P = 0.305 ± 0.041), we obtain a corresponding date of 8.5 ± 1.1 Mya, which is close to the date of 11.3 ± 1.8 Mya calculated by using the faster ezi-Δ11 gene rate.
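The dating arithmetic in this paragraph can be retraced with a short Python sketch that solves d = 2rt for t. The function name `divergence_time_mya` and the rate labels are illustrative, and scaling the standard error of d by the same factor 1/(2r) is an assumption about how the reported error ranges were obtained, so small rounding differences from the published values are possible.

```python
def divergence_time_mya(distance: float, rate_per_site_per_year: float) -> float:
    """Solve d = 2rt for t and return the result in millions of years."""
    return distance / (2.0 * rate_per_site_per_year) / 1e6


# Synonymous-site divergence between normal and ezi-fused genes (from the text).
D_S, D_S_SE = 0.927, 0.145

# Substitution rates per site per year quoted in the text.
rates = {
    "faster (ezi-fused gene) rate": 4.1e-8,
    "slower (normal gene) rate": 1.3e-8,
    "Drosophila nuclear gene rate": 15.6e-9,  # 15.6 per site per 10^9 years
}

for label, rate in rates.items():
    t = divergence_time_mya(D_S, rate)
    se = divergence_time_mya(D_S_SE, rate)  # assumed linear error scaling
    print(f"{label}: {t:.1f} +/- {se:.1f} Mya")

# Amino acid based estimate: Poisson-corrected distance and amino acid rate.
t_aa = divergence_time_mya(0.305, 1.8e-8)
print(f"amino acid estimate: {t_aa:.1f} Mya")
```

Because t scales inversely with r, the roughly threefold gap between the two corn borer rates translates directly into the roughly threefold gap between the two date estimates.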
… Δ11 genes retain their functionality. If these genes had reverted to a nonfunctional state, we would have seen evidence of a shift toward neutral evolution in which dS would not be statistically different from the level of nonsynonymous substitutions per site, dN. However, the difference in magnitude between dS and dN is consistent with a pattern of purifying selection because the results of statistical tests (Z-test and Fisher's exact test) were significant for dS > dN in all comparisons between the ACB and ECB ezi-Δ11 gene duplicates themselves as well as between them and the normal ACB and ECB Δ11 genes. To examine this in more detail, we conducted several simulation analyses aimed at examining the process of pseudogenization. In our simulations, we found t1/2 (the time required for an intact ORF to be interrupted in half of the simulation replications) to be 0.12 million years by using the inferred ezi-Δ11 substitution rate. The resultant probability that an ezi-Δ11 gene retains its ORF was thus found to be 4.5 × 10⁻²⁹, assuming a divergence date of 11.1 Mya, or 3.1 × 10⁻⁷⁵, assuming a divergence date of 29.7 Mya. If we use the Drosophila substitution rate, we obtain t1/2 = 0.35 million years, which gives a probability of 2.9 × 10⁻¹⁰, assuming a divergence date of 11.1 Mya, or 2.8 × 10⁻²⁶, assuming a divergence date of 29.7 Mya. Because the ORFs have remained intact even though the probability of their doing so under a model of neutral evolution is so small, they must be subject to purifying selection. Similarly, in our simulations designed to test whether the observed number of frameshift mutations is equal to or less than what would be expected by chance, we found that ezi-Δ11 functional integrity is probably maintained (Pdis < 0.001). We also simulated whether the ratio of the observed number of nonsynonymous (NA) and synonymous (NS) substitutions deviates from the neutral expectation over the evolutionary time frames inferred above and found that the result was highly significant in either case (PNa/Ns < 0.001).
| Discussion |
|---|
… Δ11 pseudogene in both the ACB and ECB genomes, the ψΔ11 gene (Fig. 3). Yet, the more interesting findings concern the presence of several duplicate Δ11 and Δ14 genes in both the ECB and ACB genomes.
Elucidating the mechanisms by which new genes originate and gain new function(s) has been a subject of intense interest in the study of multigene families and evolutionary genomics over the past several decades (3, 12, 14, 18, 19). The recruitment of transposable elements into gene duplicates is believed to be one mechanism by which gene duplicates can evolve new functions (20, 21). This process has been shown to play a role in the generation of new genes in a variety of animal and plant taxa (20–26). One way that this type of cooption occurs is through fusion or chimerism in which the mobile element is incorporated into a gene duplicate and the chimeric gene subsequently takes on a new function if it survives in the genome. In this study, we have identified several genes in the ECB and ACB that were derived from the fusion of a LINE with a Δ11 sex pheromone desaturase gene. The LINE, which we call ezi, represents a novel family of retroposons related to the RTE-1 family in C. elegans (10).
To explain the origin of these fusion genes, we hypothesize that a Δ11 gene was duplicated in an ancestor of the ECB and ACB. Our molecular dating analyses suggest that this event took place some time during the Miocene. In light of our results, we feel confident in saying that most, if not all, members of the genus Ostrinia should possess ezi-Δ11 genes. However, it is less clear whether other members in the same subfamily (Pyraustinae) or family (Crambidae) will also possess these genes. Future laboratory investigations will be required to answer this question. After this initial duplication event occurred, there appear to have been two distinct fusion events: one for the ezi-Δ11 genes and one for the ezi-Δ11 genes. However, the ezi element was integrated into different positions within the desaturase-homologous region of the ezi-Δ11 gene (i.e., in intron 1) versus the ezi-Δ11 gene (i.e., upstream of exon 1) (Fig. 1). After these integrations took place, unequal crossover resulted in amplification of the new ezi-Δ11 fusion genes (Fig. 4). It should be pointed out that the occurrence of independent integrations in the ezi-Δ11 versus ezi-Δ11 genes suggests that there is a "signal" sequence in the 5' region of the normal Δ11 gene that facilitates the insertion of retroposons. If this is the case, it could be that more LINE-desaturase fusion genes exist in the genomes of other moth species. An important question is whether the ezi-Δ11 gene duplicates are pseudogenes.
We did not find any evidence to suggest that the ezi-Δ11 genes in the ECB and ACB are classical pseudogenes because their reading frames are intact. Of course, other mechanisms of pseudogenization are possible (e.g., promoter nonfunctionalization). Yet, if the ECB and ACB genes were classical pseudogenes, it would be surprising that the ezi-Δ11 gene duplicates have remained intact for at least 1 million years since the divergence of the ECB and ACB. One would have expected some sort of frameshift or nonsense mutation to have occurred in the reading frames of these genes over the course of this time period. In contrast, our analyses strongly suggest that these genes have retained functionality and have been subjected to purifying selection during their evolution. Thus, the function(s) of these genes and whether or not they influence sex pheromone biosynthesis should be investigated. Such studies would provide an interesting opportunity to learn how LINEs contribute to the evolution of novel gene sequences.
What, then, are the implications concerning these ezi-Δ11 fusion genes as well as the other cryptic Δ11 and Δ14 genes in corn borer genomes? Certainly, the existence of cryptic sex pheromone desaturases in moth genomes indicates that their evolution is much more complex than previously recognized. In addition, if these genes are functional, or possess the capacity to become functional, they could potentially serve as raw material from which new pheromone blends could arise if the genes were coopted into sex pheromone biosynthesis pathways. Of course, it is entirely possible that these genes do not function in sex pheromone biosynthesis at all, and they may have been coopted to perform some other unrelated function. Exploration of these possibilities would provide important insights into how novel gene functions arise in genomes in general and, in this particular case, how they affect the process of mate attraction in moths.
| Materials and Methods |
|---|
DNA sequences from all clones were edited by using Lasergene sequence analysis software (DNASTAR, Madison, WI) with minor editing after visual inspection. The edited sequences were then used to query both the entire GenBank database and the complete genome databases of B. mori, Aedes aegypti, Anopheles gambiae, and Drosophila melanogaster by using BLAST searches targeting both nucleotide and protein sequences. The nucleotide sequences for the closest matches for each of the clones were compiled into an alignment with the nucleotide sequences of the ECB and ACB desaturase clones. The alignment was created with the computer program CLUSTAL-X (27), followed by visual inspection and manual adjustment.
Phylogenetic analyses were conducted by using the maximum likelihood (ML) and neighbor-joining (NJ) methods. The former were conducted by using the computer program PHYLIP 3.65 (28), and the latter were conducted by using the computer program MEGA 3.1 (29). For phylogenetic analysis of Δ11 plus ezi-Δ11 sequences, trees were constructed from nucleotide sequences by using the F84 + Γ model for ML analyses and the Tamura–Nei + Γ model for NJ analyses. In both cases, the Γ shape parameter (α) was 0.7, which was estimated by using the maximum likelihood method (16). For phylogenetic analysis of LINE sequences, trees were constructed from protein sequences by using the Jones–Taylor–Thornton (30) model for ML analysis and the Poisson model (16) for NJ analysis. The statistical reliability of internal branches was assessed by using 1,000 bootstrap pseudoreplicates for ML analyses and 1,500 pseudoreplicates for NJ analyses.
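To make the bootstrap step concrete, the sketch below shows how a single pseudoreplicate can be drawn from an alignment by resampling columns with replacement. It is a generic illustration in Python, not the procedure implemented in PHYLIP or MEGA; the function name `bootstrap_alignment`, the toy alignment, and the fixed random seed are all hypothetical. Tree inference on each pseudoreplicate is left out.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap pseudoreplicate of an alignment.

    Columns (sites) are sampled with replacement, and the same sampled
    columns are used for every sequence so that sites stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment of three sequences.
toy = {"seq1": "ACGTAC", "seq2": "ACGAAC", "seq3": "TCGAAT"}
rng = random.Random(42)  # seeded only to make the illustration repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
for rep in replicates:
    print(rep)
```

Support for an internal branch is then the fraction of pseudoreplicate trees in which that branch appears.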
The method of Zhang and Webb (31), as implemented in the program PSEUDOGENE (31), was used to compute the rate at which an ORF becomes disrupted by using both the inferred ezi-Δ11 substitution rate and the Drosophila synonymous substitution rate (14). We assumed that the insertion–deletion rate was 15% of this estimate (32). By evaluating the magnitude of the probability that an ezi-Δ11 gene retains an intact ORF in relation to the divergence time, we then gauged the relative likelihood that the gene has remained subject to selective constraints. The method of Dupanloup and Kaessmann (33), as implemented in the program ReEVOLVER 1.0 (33), was used to compute the probability (Pdis) that the observed number of frameshift mutations is equal to or less than what would be expected by chance and the probability (PNa/Ns) that the ratio of NA to NS deviates from the neutral expectation. Both of these probabilities are computed through comparison of the observed value with the frequency distribution generated through simulation. The simulation requires the reconstruction of ancestral sequences, which we conducted by using the likelihood method (34) as implemented in ReEVOLVER. We used the inferred ezi-Δ11 substitution rate and assumed that the insertion–deletion rate was 15% of this estimate.
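The relation between an ORF half-life and the probability of retaining an intact ORF can be approximated with a simple closed form. The sketch below assumes that ORF disruption behaves like a constant-rate decay process, so that the retention probability after time T is 0.5^(T / t1/2). This is an illustrative simplification and not the simulation performed by PSEUDOGENE; the function name is hypothetical. With the half-lives and dates quoted in the Results, it gives values within roughly an order of magnitude of those reported.

```python
def orf_retention_probability(elapsed_my: float, half_life_my: float) -> float:
    """Probability that an ORF is still intact after `elapsed_my` million
    years, assuming disruption is a constant-rate process with the given
    half-life (a simplifying assumption, not the PSEUDOGENE algorithm)."""
    if half_life_my <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_my / half_life_my)


# Half-lives (million years) and divergence dates (Mya) quoted in the text.
for half_life in (0.12, 0.35):
    for date in (11.1, 29.7):
        p = orf_retention_probability(date, half_life)
        print(f"t1/2 = {half_life} My, T = {date} My: P(intact) = {p:.1e}")
```

Even under the slower Drosophila-based half-life, the probability of neutral retention is vanishingly small, which is the basis for inferring selective constraint.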
| Acknowledgements |
|---|
|
|
|---|
| Footnotes |
|---|
Abbreviations: ACB, Asian corn borer; ECB, European corn borer; LINE, long interspersed nuclear element; ML, maximum likelihood; NJ, neighbor-joining.
To whom correspondence should be addressed. E-mail: wlr1{at}cornell.edu
Author contributions: B.X. and W.L.R. designed research; B.X. performed research; B.X., A.P.R., M.K., and N.O. analyzed data; and B.X., A.P.R., and W.L.R. wrote the paper.
The authors declare no conflict of interest.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. EF113390–EF113404 and EF125923–EF125927).
© 2007 by The National Academy of Sciences of the USA