BIOLOGICAL SCIENCES / MICROBIOLOGY
Kinetic analysis of a complete poxvirus transcriptome reveals an immediate-early class of genes






*Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037;
Department of Molecular Biology and
DNA Array Core Facility, The Scripps Research Institute, La Jolla, CA 92037;
Department of Microbiology, University of Alabama at Birmingham, Birmingham, AL 35294; and ¶School of Biochemistry and Molecular Biology, Australian National University, Canberra ACT 0200, Australia
Contributed by Howard M. Grey, December 12, 2007 (received for review November 9, 2007)
| Abstract |
|---|
|
|
|---|
gene transcription | genome tiling array | microarray | vaccinia virus
Like other orthopoxviruses, VACV has a linear, double-stranded DNA genome that is nearly 200 kb long. The most common laboratory strain of VACV, Western Reserve (VACWR), is predicted to encode 223 ORFs. This number includes 12 ORFs (VACWR-001 through VACWR-012 and VACWR-218 through VACWR-207) that are repeated at each end of the genome, leaving a total of 211 unique ORFs.
VACV has a broad cellular tropism in vitro and potential host range in vivo, but there is no clearly identified animal reservoir for the virus in nature. As a result of an exclusively cytoplasmic life cycle, VACV encodes its own enzymes and proteins required for gene transcription, genome replication, virion production, and morphogenesis and, for the most part, does not depend on host cell proteins for these processes. In addition, VACV infection induces a rapid and massive shutdown of host gene expression that acts at several levels (1–4).
Expression kinetics have been described for a variety of VACV genes. These studies, and work done to define promoters and transcription complexes, have led to the definition of four temporal gene classes and three distinct promoter types. The promoters have been named early, intermediate, and late, with each promoter associated with one gene class (5). In addition, some genes have elements of early and late promoters in their upstream region, giving rise to a fourth class referred to as early/late.
The virion contains early transcription complexes bound to DNA in position to promote transcription of early genes upon entry into host cells (6, 7). Despite analogy with other large, double-stranded DNA viruses, such as herpes viruses, and some use of the term in the literature (8), an immediate-early class of genes has not come to be included in the established model of poxvirus gene expression (5, 9, 10).
In terms of broad functions, besides proteins required for expression of the subsequent gene classes, early genes tend to encode virulence factors for modulation of host responses (11–14). Proteins encoded by late genes are generally required for virion morphogenesis and structure or early gene transcription factors packaged in newly made viral cores (15–17).
Although there is a wealth of information in the literature regarding VACV gene expression, this knowledge is highly fragmented with respect to experimental settings and methodologies used, and substantial gaps exist. Experimental data showing expression kinetics are only available for approximately two-thirds of the annotated ORFs, leaving predictions based on canonical promoter motifs as the only guide for 46 ORFs (www.poxvirus.org). For 23 ORFs lacking prototypic promoter sequences, even predictions are not possible.
Although microarrays have been used to show viral gene expression (18, 19), those arrays did not allow resolution between the ORFs and the untranslated regions. Tiling arrays, however, provide a comprehensive and unbiased sampling of transcriptional activity by using overlapping probes covering the entire region of interest (20). In this study, we simultaneously measured transcription of the entire VACV genome during infection and derived a complete picture of gene expression. The close relationship between the orthopoxviruses makes it reasonable to assume that the findings apply across the group (10).
| Results and Discussion |
|---|
|
|
|---|
An overview of the tiling array technique is presented in Fig. 1A. It shows probe signal intensity from a sample taken 2 h postinfection (hpi) covering approximately 7 kb of the forward strand including VACWR-124 and -127. The signal intensities of probes lying completely within an ORF are pooled to calculate the median signal intensity and used for significance testing. Visual examination of the data provided confidence in the power of the method to resolve adjacent ORFs and also revealed unexpected complexities in VACV transcription. For example, a stretch of high signal continues well past the end of VACWR-027, indicating that this early transcript has an uncharacteristically long 3' untranslated region (Fig. 1A).
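The pooling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the 1-based inclusive coordinates, and the toy probe data are assumptions made for the example, and only the 25-nt probe length comes from the array design.

```python
from statistics import median

def orf_median_signal(probes, orf_start, orf_end, probe_len=25):
    """Median signal of probes lying completely within an ORF.

    probes: list of (start, signal) pairs on one strand.
    Coordinates are assumed to be 1-based and inclusive.
    """
    inside = [
        signal
        for start, signal in probes
        # Keep a probe only if both of its ends fall inside the ORF.
        if start >= orf_start and start + probe_len - 1 <= orf_end
    ]
    return median(inside) if inside else None

# Toy data (hypothetical): the ORF spans positions 9..60.
toy_probes = [(1, 0.2), (5, 0.3), (9, 5.0), (13, 6.0),
              (17, 7.0), (33, 8.0), (37, 9.0), (41, 1.0)]
toy_median = orf_median_signal(toy_probes, 9, 60)
```

Probes that straddle an ORF boundary (here the probes starting at 1, 5, 37, and 41) are excluded, so signal from neighboring regions does not enter the ORF summary.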
The reproducibility of the technique was assessed by using biological replicate RNA samples from 1, 2, 4, and 8 hpi. Pearson correlation coefficients (R) for median ORF expression levels were between 0.93 and 0.99. A representative scatter plot from two independent 2-hpi RNA samples is shown in Fig. 1B. In addition, analyzing RNA extracted from uninfected HeLa cells detected no VACV gene expression (SI Table 1), demonstrating that there was no significant cross-hybridization between the VACV probes and human RNA. Finally, we benchmarked our results against data from the literature for a set of genes with well established kinetics (three early and three late). To compare expression profiles between individual ORFs, the expression values were standardized. In concordance with their known expression profile, the expression of the three early genes (VACWR-034, -059, and -190) (23–25) was initiated at 0.5 hpi and peaked at 2 hpi (Fig. 1C). In contrast, expression of the three late genes (VACWR-052, -101, and -150) (26–28) was initiated at approximately 2–4 hpi and peaked or was still rising at 8 or 24 hpi (Fig. 1D). Taken together, these results support our experimental approach and illustrate its high throughput, reproducibility, and general agreement with established methods.
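As an illustration of the reproducibility measure, the Pearson correlation coefficient between two replicate vectors of median ORF expression can be computed as in the sketch below. The function name and the input vectors are hypothetical; only the choice of Pearson's R comes from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical median ORF signals from two biological replicates.
replicate_1 = [1.0, 2.5, 4.0, 7.5, 9.0]
replicate_2 = [1.2, 2.4, 4.3, 7.1, 9.4]
r = pearson_r(replicate_1, replicate_2)
```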
The Vast Majority of Annotated VACV ORFs Are Expressed During the Viral Cycle. Median expression values and associated P values were determined for all annotated ORFs (SI Table 1). In terms of overall VACV transcription, within 30 min after infection 61 genes were transcribed, and by 1 hpi another 32 genes were transcribed (Fig. 2). By 24 hpi, 197 genes (93%) had been transcribed at some point during infection. Transcripts from 14 ORFs were never detected in this analysis. Seven of these were reported to be expressed, but 4 (VACWR-092, -097, -134, and -162) were detected by using other cell lines (29–31). The study demonstrating expression of VACWR-064 used a higher viral dose (MOI = 50) than what was used in our analysis. In fact, at 4 hpi there was borderline expression of VACWR-064 (P = 2.5 × 10⁻⁵) in our analysis, suggesting that the lack of expression might be due to a lower sensitivity in our assay. For VACWR-074 and VACWR-100 we have no explanation for the inconsistencies observed.
No "immediate-early/late" gene class was identified, suggesting that immediate-early genes are distinct from the early genes. When early promoter sequences (either verified or putative) for the two classes were compared, only minor differences were seen in the consensus promoter sequences, primarily just upstream of the transcription start site. Thus, although only one type of early promoter has been identified, it is possible that minor changes in promoter sequences or other as-yet-unrecognized sequences allow transcription complexes to preferentially assemble on the promoters of immediate-early genes in virions. Another possibility is that hitherto-unrecognized enhancer elements exist for the immediate-early genes. This is not unreasonable because the alternative would be that the 134 genes with early promoter elements will be transcriptionally active upon initiation of infection, giving the virus relatively poor control over gene expression in the early stages of infection.
We would also like to point out that there is a difference between defining "functional" versus "kinetic" classes of viral genes. Early poxvirus genes are traditionally defined functionally because they are expressed before viral DNA replication. The term immediate-early used here is a kinetic description of preferential expression before the early genes, and no functional distinction has been made between these classes. Either way, our identification of an immediate-early class of genes forces a revision of the current paradigm of poxvirus gene expression.
A second surprise of the cluster analysis was the lack of an intermediate gene class. A few genes in the late class were expressed at maximum levels at 4 hpi, which may suggest that they are intermediate genes. However, even when the analysis was biased in favor of detecting an intermediate class, it was never identified with statistical significance. Genes reported to belong to an intermediate class or having intermediate promoters were either distributed among the immediate-early (VACWR-072), early (VACWR-077), and late (VACWR-119 and VACWR-120) classes, or did not fit well into any class (VACWR-086) (SI Table 2). Some genes with intermediate promoters also have additional promoter elements in their upstream regions (29, 33). The complexity introduced by promoter combinations might explain why they fail to form a distinct cluster. It is also possible that shorter time intervals would be required for distinction between intermediate and late genes (34).
As a test of the biological relevance of the cluster analysis, we examined whether the expression of genes in the late class depended on viral DNA replication. The ORF expression levels were determined at 8 hpi in the presence and absence of cytosine β-arabinofuranoside (AraC), a nucleoside analog commonly used to block DNA replication. Of the 60 late genes, the transcription of 52 was either completely (n = 33) or partially (reduced by at least 30%, n = 19) blocked upon AraC treatment (SI Table 2). In contrast, transcription was not inhibited by AraC for any of the 134 genes from the other classes (SI Fig. 7). These results confirm that late gene transcription by VACV depends, for the most part, on DNA replication and strengthen the validity of the clustering analysis.
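The AraC classification can be sketched as a simple rule. Several points here are assumptions rather than details given in the text: that "completely blocked" means the ORF is no longer detected in the treated sample, that the 30% cutoff is inclusive, and that signals are compared on a linear scale. The function and parameter names are hypothetical.

```python
def arac_effect(untreated, treated, treated_detected, min_reduction=0.30):
    """Classify the effect of AraC on one ORF at a single time point.

    untreated, treated: median ORF signals (assumed linear scale, untreated > 0).
    treated_detected: whether the ORF passed significance testing with AraC.
    Returns "complete", "partial", or "none".
    """
    if not treated_detected:
        # Assumption: loss of detectable signal counts as a complete block.
        return "complete"
    reduction = (untreated - treated) / untreated
    # Assumption: the threshold is inclusive (reduction of at least 30%).
    return "partial" if reduction >= min_reduction else "none"

# Hypothetical examples.
effects = [
    arac_effect(100.0, 60.0, True),   # 40% reduction
    arac_effect(100.0, 90.0, True),   # 10% reduction
    arac_effect(100.0, 0.0, False),   # not detected with AraC
]
```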
Tiling Array Analysis Is Largely Unaffected by Run-On Transcription. The current data processing does not distinguish between run-on and true transcription. Having assigned genes into kinetic classes, the impact of run-on transcription from upstream genes was assessed at 8 hpi. ORFs were divided into four groups based on the expression kinetics of their upstream neighboring ORF: Early ORFs located after early ORFs, early after late, late after early, and late after late. Median transcript levels for each ORF were compared with those of the 100-nt untranslated regions on either end (SI Fig. 6B). ORFs lacking flanking untranslated regions were not included. This showed that irrespective of the kinetic class of the upstream neighbor, ORF transcripts were, on average, much more abundant than transcripts from the flanking regions. Nevertheless, these analyses will be further refined to perform de novo transcript mapping of the virus. That will help to discover novel transcripts and more precisely determine the boundaries of transcription initiation and termination.
Functions of Genes in the Different Kinetic Classes. After the clustering of the genes by their expression profiles, a detailed analysis of gene functions within each kinetic class was performed. Each gene was assigned to one of five functional categories based on literature annotations and predicted functions (SI Table 3): DNA replication, immune evasion/virulence, transcription, virion core proteins, and virion membrane proteins (SI Table 2). Genes of unknown function and genes with functions outside of these categories were excluded. In addition, for ORFs that are fragments of a larger ancestor gene, only the ORF proximal to the promoter was placed into the corresponding category, and the downstream ORF(s) were denoted as "pseudo" and excluded. The analysis showed marked differences between genes in the two earliest classes, which were mainly involved in DNA replication, evasion/virulence, and transcription, and the early/late and late genes, which had a larger proportion of genes encoding virion core and membrane proteins (Fig. 4A). Strikingly, more than half of the immediate-early genes were of unknown function (SI Table 2), suggesting that the genes expressed first during infection are not well studied.
2 hpi and the lack of genes required for replication in the late class are consistent with current knowledge of the orthopoxvirus replicative cycle. VACV encodes a broad range of proteins that contribute to virulence in vivo, and some have established roles in the modulation of host defense (35–37). We found that at least 27 of these were expressed (SI Table 2). These genes were most frequently found in the two earliest classes (Fig. 4B) but were also present in the later classes. This implies that VACV continues to express immunomodulatory proteins throughout infection to provide a more favorable environment for viral growth in the face of the host's immune response.
Gene products important for RNA transcription were also represented in all temporal classes but were most abundant in the early class. In agreement with the literature, we found that the RNA polymerase subunits were expressed by genes of the early classes. To reconcile this observation with the presence of these proteins in virions, we looked at their expression levels later in infection and noted that although they peak at 2 hpi, levels remained relatively high at 8 hpi.
In general, genes coding for structural components of the virion core or membrane were largely expressed late, but there were exceptions. VACWR-156 (membrane phosphoglycoprotein) was immediate-early; VACWR-051 (involved in plaque and EEV formation) and VACWR-159 (transmembrane phosphoprotein) were early; and VACWR-181 (membrane glycoprotein) and VACWR-187 (EEV membrane glycoprotein) were early/late. Viral membrane proteins are often also expressed on the surface of infected cells (38). The broad expression time and cellular localization suggest that these proteins might play more than one role during infection. In fact, VACV proteins with dual functions are described in refs. 39 and 40.
Immediate-Early Genes Are More Highly Expressed than Genes of Other Classes. To this point, our analysis has been based on relative expression levels. Next, we compared absolute median expression levels amongst the four temporal classes identified. The most striking result was that RNA levels for the immediate-early genes were significantly higher (P = 0.003–0.023) than for genes of the other three temporal classes throughout infection (Fig. 4D). Indeed, despite a dramatic decline in levels after the peak at 2 hpi, immediate-early transcript levels still exceeded those of late genes up to 24 hpi. These data further support the division of immediate-early and early genes. Looking at individual genes, there were several exhibiting expression levels clearly higher than the average in their class. Some examples are: immediate-early, VACWR-059 (double-stranded RNA-binding protein) and VACWR-184 (unknown); early, VACWR-018 (unknown); early/late, VACWR-131 (core protein); and late, VACWR-169 (unknown). Because of their exceptionally high expression levels, these genes might be of special interest for future investigations.
The signal intensities were also compared for genes possessing different functions. This showed that RNA levels were highest for genes involved in DNA replication, followed by those associated with evasion/virulence and transcription through 4 hpi (Fig. 4C). As expected, given their predominance in the late gene class, levels for virion core and membrane ORFs did not dominate until 24 hpi.
VACV Transcriptome: Filling Experimental Knowledge Gaps Regarding VACV Gene Transcription. A comparison was made between the tiling array data and published data from the poxvirus database (SI Table 2). Of the 197 genes transcribed in HeLa cells, 20 were not previously shown to be expressed and lacked association with a typical early or late promoter. Their transcription was studied in more detail. We assume that an ORF is expressed from its promoter (and not merely due to run-on transcription) if it fulfills at least one of the following criteria: (i) ORF expression level is higher than that of the upstream region, (ii) ORF encodes known epitopes (therefore demonstrating translation), or (iii) ORF exhibits different expression kinetics than the upstream ORF. From this, it was concluded that at least 8 of the 20 ORFs exhibit true transcription.
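The three criteria combine as a logical OR: satisfying any one of them is taken as evidence of expression from the ORF's own promoter. A minimal sketch follows, with hypothetical parameter names and toy inputs.

```python
def expressed_from_own_promoter(orf_signal, upstream_signal,
                                has_known_epitope,
                                kinetic_class, upstream_kinetic_class):
    """Return True if at least one of the three criteria is met."""
    higher_than_upstream = orf_signal > upstream_signal          # criterion (i)
    translated = has_known_epitope                               # criterion (ii)
    distinct_kinetics = kinetic_class != upstream_kinetic_class  # criterion (iii)
    return higher_than_upstream or translated or distinct_kinetics

# Hypothetical ORFs: the first meets criterion (iii) only, the second meets none.
orf_a = expressed_from_own_promoter(4.0, 6.0, False, "early", "late")
orf_b = expressed_from_own_promoter(4.0, 6.0, False, "late", "late")
```

An ORF that fails all three tests is not shown to be silent; it simply cannot be distinguished from run-on transcription by these criteria, which is consistent with the "at least 8 of the 20" wording.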
Quantitative reverse-transcriptase PCR (qRT-PCR) was used to validate the transcription of these 20 genes and another 13 genes sampled from the four classes. This showed that all 33 genes generated detectable qRT-PCR products (SI Table 2). There was a significant correlation (P < 10⁻⁷) of the fold-change in transcript levels between 2 and 8 hpi obtained with the tiling array analysis and qRT-PCR (SI Fig. 8). This suggests that tiling arrays can be used as a semiquantitative measurement of gene expression and that the signal strengths derived for individual ORFs here are a good representation of relative gene expression levels.
Of the 125 genes with kinetic expression previously described, 117 (94%) were found to be concordant with our data (SI Table 2), whereas 8 genes (VACWR-042, -071, -099, -111, -115, -142, -181, and -191) showed conflicting results. These discrepancies might be explained by methodological differences, such as the use of a different viral dose or cell type, which may alter the expression profile. Furthermore, we provide empirical evidence of the kinetic expression for 62 of the remaining ORFs, for which data were lacking. This underscores the advantages of and necessity for the simultaneous study of expression of all genes using a systematic approach.
To get an overview of transcription throughout the genome, a map of the transcriptome was constructed (Fig. 5). Visual inspection supported by statistical analysis (see Materials and Methods) revealed a strong preference for colocalization of genes within the same kinetic class. For all four classes, the occurrence of a neighboring gene within the same kinetic class was significantly higher than expected (SI Table 4). Six regions were identified in which genes of a certain class were significantly over-represented with respect to their overall genomic distribution (Fig. 5).
We have rendered the first complete picture of an orthopoxvirus transcriptome and demonstrated that the vast majority of annotated VACV genes are expressed. This also led to the surprising discovery of an immediate-early class of genes, more than half of which have unknown function despite being expressed at very high levels. Our study demonstrates the power of a genome-wide approach, compels a revision of the current understanding of orthopoxvirus gene regulation, and suggests many lines of investigation in orthopox virology and pathogenesis.
| Materials and Methods |
|---|
|
|
|---|
Infection and Flow Cytometry.
HeLa cells were incubated at 10⁷ cells per 100 µl of complete media with VACWR at an MOI of 10:1. After 60 min, 30 × 10⁶ cells were distributed to 225-cm² flasks with 30 ml of complete media and cultured at 37°C. The method of infection was tested by using a GFP-encoding VACV (42) and/or staining with an anti-VACV serum (ViroStat), followed by analysis by flow cytometry. AraC was used at 40 µg/ml during infection in some samples to block viral DNA replication. Expression was considered significantly decreased when median signal intensity was reduced by at least 30%. Productive infection or replication block was tested by enumerating the plaques formed in the AraC-treated samples at 2 and 24 hpi.
RNA Preparation, Labeling, and Hybridization. Cells were harvested and resuspended in TRIzol, and RNA was purified according to the manufacturer's protocol. RNA clean-up was performed by using RNeasy columns. Ribosomal RNA was depleted by using the RiboMinus transcriptome isolation kit. RNA samples were chemically labeled with biotin, using the ULS aRNA labeling kit, and hybridized to the arrays. Arrays were scanned by using the Affymetrix GeneChip Scanner 3000 7G and standard Affymetrix protocol as described in ref. 43.
Q-PCR. Q-PCR was performed as described in ref. 44. For each primer set, expression levels were quantified relative to that of human 18S rRNA. Single amplicons and the absence of primer-dimers were confirmed by melting-curve analysis and gel electrophoresis. Samples prepared in the absence of reverse transcription were run for each primer pair to confirm specificity and to rule out signal from contaminating genomic DNA.
Affymetrix GeneChip NimbleExpress Tiling Array Design. The genome tiling array was built with NimbleGen probe synthesis technology. Packaging of the array was developed in collaboration with Affymetrix, using 25-mer probes covering both strands of the VACV genome (NCBI: AY243312.1) with a 4-nt spacing (97,334 probes). The array includes 15,308 negative control probes specific for Arabidopsis thaliana and 14,399 synthetic "antigenomic" probes with varying GC content.
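The tiling layout (25-mer probes at 4-nt spacing) can be illustrated by enumerating probe start positions along one strand. This is a sketch under simplifying assumptions: 1-based coordinates, a linear sequence with no special handling of the ends, and a toy sequence length. It is not meant to reproduce the reported probe count.

```python
def tile_starts(sequence_length, probe_len=25, spacing=4):
    """Start positions (1-based) of probes tiled along one strand.

    Only probes that fit entirely within the sequence are kept.
    """
    last_start = sequence_length - probe_len + 1
    return list(range(1, last_start + 1, spacing))

# Toy example on a hypothetical 100-nt sequence.
starts = tile_starts(100)
overlap = 25 - 4  # adjacent probes share 21 nt
```

With this spacing, every position away from the sequence ends is covered by several overlapping probes, which is why probe-level medians can be computed over even short ORFs.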
Background Subtraction and Data Normalization. Probe level data were log2-transformed to stabilize their variance. The fluorescence signal resulting from nonspecific hybridization was subtracted by using the synthetic and A. thaliana probes as empirical estimators. As a strong dependence of nonspecific signal on GC content was observed (SI Fig. 9), the median background probe signal at the corresponding GC content was subtracted from the signal of each VACV probe. Probes with GC content <3 nt were not well represented and were grouped. After removal of nonspecific background, a dependence of the specific signal on GC content was still apparent. This was minimized by mapping all probe signals to their corresponding values in the empirical distribution of probes with a GC content equal to 8. To enable the direct comparison of signal intensities across different arrays, probe signals were quantile-normalized (45). For replicate samples, quantile-normalized signals for each probe were averaged.
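Two of the steps above, GC-matched background subtraction and quantile normalization, can be sketched as follows. This is a simplified illustration with hypothetical function names and toy inputs. It omits the grouping of poorly represented GC bins and the mapping of signals to the GC = 8 distribution, and the tie handling in the quantile step (ties broken by position) is a simplification.

```python
from collections import defaultdict
from statistics import median

def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())

def background_by_gc(control_probes):
    """Median control-probe signal for each GC count.

    control_probes: list of (sequence, log2_signal) pairs.
    """
    by_gc = defaultdict(list)
    for seq, signal in control_probes:
        by_gc[gc_count(seq)].append(signal)
    return {gc: median(values) for gc, values in by_gc.items()}

def subtract_background(seq, signal, background):
    """Subtract the GC-matched background from one probe signal."""
    return signal - background[gc_count(seq)]

def quantile_normalize(arrays):
    """Give equal-length signal lists (one per array) the same distribution."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values sharing each rank across arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Toy inputs (hypothetical).
bg = background_by_gc([("GGCC", 1.0), ("GGCC", 3.0), ("ATAT", 0.5)])
corrected = subtract_background("GCGC", 5.0, bg)
norm = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```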
Data Summarization and Significance Testing.
Median quantile-normalized probe intensities for probes lying completely within each ORF were used as representative ORF signal intensity. Probes for identical ORFs (VACWR-001 through VACWR-012 and VACWR-218 through VACWR-207) were combined. The significance of signal intensity in each ORF was calculated by using the binomial distribution (20). A P value of 10⁻⁵ was used as a threshold, because this approximately corresponds to a value of 10⁻³ after adjusting for the number of tests performed (n = 223).
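The relationship between the per-ORF threshold and the adjusted value can be checked with a few lines of arithmetic, alongside a generic binomial tail probability. The text cites ref. 20 for the test without giving details, so the sign-test form used here (counting probes in an ORF that exceed a background level, with a null probability of 0.5) and the Bonferroni-style multiplication are assumptions for illustration only.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumed sign-test setup: 40 of 45 probes in an ORF exceed the background level.
p_value = binomial_upper_tail(40, 45)

# Adjusting the per-test threshold for the number of tests (assumed Bonferroni-style).
n_tests = 223
per_test_threshold = 1e-5
adjusted_threshold = per_test_threshold * n_tests  # about 2.2e-3

is_significant = p_value <= per_test_threshold
```

The product comes out a little above 2 × 10⁻³, which matches the text's description of the threshold as approximately 10⁻³ in order of magnitude.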
Conversion to Standard Units.
To observe the correlations between temporal expression patterns, the data were converted to standard units: Standard Units = (S – µ)/σ, where S = probe signal, µ = mean probe signal over the time course, and σ = standard deviation.
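The conversion to standard units is a z-score taken over each time course. A minimal sketch is given below; whether the sample or the population standard deviation was used is not stated, so the sample form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, stdev

def to_standard_units(profile):
    """Convert one time-course profile to standard units (z-scores).

    profile: signals for one ORF (or probe) across the time points.
    Assumes the sample standard deviation (n - 1 denominator).
    """
    mu = mean(profile)
    sigma = stdev(profile)
    if sigma == 0:
        raise ValueError("flat profile: standard deviation is zero")
    return [(s - mu) / sigma for s in profile]

# Toy profile (hypothetical signals at three time points).
standardized = to_standard_units([1.0, 2.0, 3.0])
```

After this transformation every profile has mean 0 and unit standard deviation, so profiles are compared by shape rather than by absolute signal level.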
Gene Clustering and Bootstrapping Analysis. Standardized expression profiles for each ORF were clustered by using the "cutHclust" function of the ClassDiscovery library (46) in the R statistical programming language (47). Data from 1–24 hpi were used. Data from uninfected cells and 0.5 hpi were excluded as their signal intensities were considerably lower and introduced excessive noise. All parameters were left at default values except for k, which was varied from 2 to 7. The "BootstrapClusterTest" function was used with values of k in the same range to determine the number of biologically relevant clusters. Other parameters included: cutHclust, metric = Pearson, and nTimes = 100. Data from 10 iterations were pooled. The robustness of clusters was measured by averaging the number of times that two patterns ended up in the same cluster through 1,000 iterations on a subset of the data. A Pearson correlation test was run comparing expression values for each gene with the average for each cluster. R > 0.7 was considered as significant.
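The robustness measure, how often two expression patterns land in the same cluster across resampled runs, can be sketched as below. This does not reimplement "BootstrapClusterTest"; it only shows the co-clustering frequency given cluster assignments that are assumed to have been produced elsewhere. The function name, data layout, and ORF labels are hypothetical.

```python
def coclustering_frequency(assignments, gene_a, gene_b):
    """Fraction of iterations in which two genes share a cluster.

    assignments: list of dicts mapping gene -> cluster label, one dict per
    resampling iteration. Iterations lacking either gene are skipped.
    """
    usable = [a for a in assignments if gene_a in a and gene_b in a]
    if not usable:
        return None
    together = sum(a[gene_a] == a[gene_b] for a in usable)
    return together / len(usable)

# Toy assignments from four hypothetical iterations.
runs = [
    {"ORF_A": 1, "ORF_B": 1, "ORF_C": 2},
    {"ORF_A": 2, "ORF_B": 2, "ORF_C": 1},
    {"ORF_A": 1, "ORF_B": 3},
    {"ORF_A": 1, "ORF_B": 1, "ORF_C": 1},
]
freq_ab = coclustering_frequency(runs, "ORF_A", "ORF_B")
freq_ac = coclustering_frequency(runs, "ORF_A", "ORF_C")
```

Cluster labels are compared only within an iteration, so the measure is unaffected by labels being permuted between runs.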
Gene Colocalization Statistics. The cluster of each gene and the kinetics of its neighbors was noted in genomic space. The position of each gene was shuffled randomly 10,000 times while keeping track of the neighboring genes' kinetics during each iteration. Each pairwise distribution of neighboring genes was calculated in this manner. The observed frequency of colocalization was converted into a Z score (and P value), using standard deviations obtained through the randomization process.
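The randomization procedure can be sketched as a permutation test on the order of kinetic classes along the genome. This simplified version pools all classes into a single count of same-class neighbors, whereas the text describes pairwise distributions; the function names, the fixed random seed, and the toy gene order are assumptions for illustration.

```python
import random
from statistics import mean, pstdev

def same_class_neighbors(classes):
    """Number of adjacent gene pairs that share a kinetic class."""
    return sum(a == b for a, b in zip(classes, classes[1:]))

def colocalization_z(classes, n_shuffles=10000, seed=0):
    """Z score of the observed same-class neighbor count against shuffled orders."""
    rng = random.Random(seed)
    observed = same_class_neighbors(classes)
    shuffled = list(classes)
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomize gene positions, keep class totals
        null_counts.append(same_class_neighbors(shuffled))
    return (observed - mean(null_counts)) / pstdev(null_counts)

# Toy genome (hypothetical): three perfectly clustered blocks of ten genes each.
toy_order = ["IE"] * 10 + ["E"] * 10 + ["L"] * 10
z = colocalization_z(toy_order, n_shuffles=1000)
```

Shuffling preserves the number of genes in each class, so the null distribution reflects only the arrangement of classes along the genome and not their relative sizes.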
| Footnotes |
|---|
Freely available online through the PNAS open access option.
Author contributions: E.A. and J.A.G. contributed equally to this work; E.A., S.R.H., and A.S. designed research; E.A., J.A.G., M.S., and J.A.H. performed research; J.A.G., L.S., C.O., S.R.H., and B.P. contributed new reagents/analytic tools; E.A., J.A.G., V.P., R.C.H., E.J.L., J.S., H.M.G., B.P., and A.S. analyzed data; and E.A., J.A.G., D.C.T., H.M.G., and A.S. wrote the paper.
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/cgi/content/full/0711573105/DC1.
© 2008 by The National Academy of Sciences of the USA