## New Research In

### Physical Sciences

### Social Sciences

#### Featured Portals

#### Articles by Topic

### Biological Sciences

#### Featured Portals

#### Articles by Topic

- Agricultural Sciences
- Anthropology
- Applied Biological Sciences
- Biochemistry
- Biophysics and Computational Biology
- Cell Biology
- Developmental Biology
- Ecology
- Environmental Sciences
- Evolution
- Genetics
- Immunology and Inflammation
- Medical Sciences
- Microbiology
- Neuroscience
- Pharmacology
- Physiology
- Plant Biology
- Population Biology
- Psychological and Cognitive Sciences
- Sustainability Science
- Systems Biology

# Analysis of factorial time-course microarrays with application to a clinical study of burn injury

Contributed by Wing Hung Wong, March 8, 2010 (sent for review November 4, 2009)

## Abstract

Time-course microarray experiments are capable of capturing dynamic gene expression profiles. It is important to study how these dynamic profiles depend on the multiple factors that characterize the experimental condition under which the time course is observed. Analytic methods are needed to simultaneously handle the time course and factorial structure in the data. We developed a method to evaluate factor effects by pooling information across the time course while accounting for multiple testing and nonnormality of the microarray data. The method effectively extracts gene-specific response features and models their dependency on the experimental factors. Both longitudinal and cross-sectional time-course data can be handled by our approach. The method was used to analyze the impact of age on the temporal gene response to burn injury in a large-scale clinical study. Our analysis reveals that 21% of the genes responsive to burn are age-specific, among which expressions of mitochondria and immunoglobulin genes are differentially perturbed in pediatric and adult patients by burn injury. These new findings in the body’s response to burn injury between children and adults support further investigations of therapeutic options targeting specific age groups. The methodology proposed here has been implemented in R package “TANOVA” and submitted to the Comprehensive R Archive Network at http://www.r-project.org/. It is also available for download at http://gluegrant1.stanford.edu/TANOVA/.

Time-course microarray experiments are able to capture the temporal profiles for thousands of genes simultaneously to provide information on the dynamic changes of gene expression. A popular approach for analyzing time-course microarray data is to use clustering methods to group genes with similar temporal patterns (1–4). Cluster analysis may find functionally related genes, but the biological significance of the resulting gene clusters is often unclear. This is especially so when the time-course experiments are performed under different conditions that are defined by a set of experimental factors. Such experiments are called factorial time-course experiments.

The simplest factorial structure is obtained when there is only one binary factor. In this case, we have two experimental conditions corresponding to the two levels of the factor and under each condition we have a time course of microarray profiles. Then an important question is to detect genes showing different temporal patterns in the two conditions. This problem has been analyzed by many authors. Storey et al. modeled gene expression over time by splines and characterized the differences in temporal expression using gene-specific summary statistics based on the fitted smooth curves (5). Yuan and Kendziorski developed a hidden Markov model to utilize the dependency structure of the time-course data for inferring temporally differentially expressed (DE) genes (6). Tai and Speed treated time-course measurements as a multivariate normal vector and proposed an empirical Bayes approach to rank DE genes (7). Wei and Li identified DE genes by modeling both temporal dependency over time and spatial dependency within gene regulatory networks (8).

However, many time-course microarray experiments have more complicated designs that involve more than one factor. For example, in the clinical study that motivated this work, gene expression of blood samples from patients was measured at different times after burn injury. We are interested in genes that exhibit dynamic response to burn injury (i.e., show different dynamic patterns in burn patients over time as compared to normal controls), and in how the response may depend on additional factors such as age. In this case, we have a factorial study with two binary factors (burn status and age group: children/adults) and longitudinal time-course measurements (measurements from the same subject at several time points). To our knowledge there are no existing methods for analyzing microarray studies with such complex designs.

To meet this need, we develop a method to simultaneously handle the time course and factorial structure in microarray data. Our method can be applied to two types of time-course data: longitudinal and cross-sectional sampling. To analyze factor effects on gene expression, we draw on ideas from the classical statistical method of analysis of variance (ANOVA). ANOVA structure is used to model the dependency of gene expression on the experimental factors (Fig. 1*A*). For example, for some genes the structure may be interactive (i.e., the effect of one factor depends on the level of another factor), while for other genes the structure may be additive (i.e., the effect of one factor does not depend on the level of another factor). If the gene expression is influenced by only one of the factors, its ANOVA structure has the corresponding main effect. The remaining genes are not affected by any factor and are constantly expressed. For each gene, we conduct a series of statistical tests on the factor effects to select the best ANOVA structure to model its expression pattern and classify the genes into five mutually exclusive groups (C1–C5) based on their ANOVA structure (Fig. 1*B*). Our test is designed to detect the ANOVA structure as long as it is significant in some projected direction of the time-course vector—it is not necessary for the same structure to be present in all time points. The direction with a significant ANOVA structure represents a gene-specific response feature. In this way, our method simultaneously estimates the factorial models of the factor effects and the response timing feature of the genes. Furthermore, our method can be extended to handle cross-sectional experiments where the microarray measurements at each time point are taken from a different set of subjects. The source code and R package “TANOVA” have been submitted to the Comprehensive R Archive Network at http://www.r-project.org/ and are available for download at http://gluegrant1.stanford.edu/TANOVA/.

Our method was motivated by a large-scale clinical study of burn patients to characterize the gene expression impact of demographic factors important to patients’ outcomes after injury. This clinical study by the Inflammation and Host Response to Injury program includes gene expression data of blood samples from pediatric and adult patients measured at different times after severe burn injury and from healthy controls. Here we present results of our investigation, based on the proposed time-course ANOVA (TANOVA) methodology, on the effect of age on a patient’s genomic response to burn injury. The analysis identified many genes responsive to burn injury, including those with responses that are age-specific.

## Methods

### Overview of TANOVA.

We use two genes from the burn study to illustrate the TANOVA method (Fig. 1*C*). Each gene was measured at two time points (early and middle stage) for pediatric and adult patients as well as healthy controls (see *Results* for details). Gene SLC2A3 shows a significant difference between patients and controls (i.e., burn effect), but not between children and adults (i.e., no age effect). The burn effect presents at both time points. The other gene IGLJ3 shows an age-dependent difference between patients and controls (i.e., interactive effect), and the interactive effect is significant only at the middle stage, but not at the early stage. As the response pattern and timing of individual genes may be directly related to the differences in clinical outcomes of pediatric and adult burn patients and are of great interest in medical studies, we would like to capture them by TANOVA.

In the TANOVA method, temporal expression of a gene is represented by a vector (in this example, we have vectors of length two). We are interested in detecting the ANOVA structure along an optimal direction in the time space. The optimal direction is estimated from the data vector by finding the projection that has the strongest ANOVA signal of interest. If the ANOVA structure is not significant along the optimal direction, it would not be significant on other directions. Therefore, the direction is “optimal” because it best captures the ANOVA structure by pooling information across time points. In our example, gene SLC2A3 will be classified into the C3 group for the significant burn effect along the direction (0.697, 0.717)′. The coordinates of the optimal direction correspond to the two time points. Their similar magnitudes indicate the burn effect presents at both stages. On the other hand, gene IGLJ3 will be classified into the C1 group because of the interactive effect of burn and age along the direction (0.086, 0.996)′. This direction is heavily tilted toward the second time point; reflecting the interactive effect is only significantly present at the middle stage. The optimal direction (called “ANOVA direction”) is gene specific and depends on the type of the ANOVA structure. It captures the response timing of the gene to the ANOVA signal and is useful for summarizing and discovering the dynamic gene expression pattern. Next, we present the mathematical formulation of the above idea.

### Statistical Modeling and Estimation.

In longitudinal experiments, gene expression from the same individual is measured over *p* time points, which we treat as a *p*-dimensional column vector. Let **y**_{g,ijk}∈*R*^{p} be the expression vector of gene *g* in condition (*α*_{i},*β*_{j}), where *α*_{i}(*i* = 1,…,*I*) and *β*_{j}(*j* = 1,…,*J*) represent the two experimental factors and *k* (*k* = 1,2,…,*n*_{ij}) is the index for replicates. We model **y**_{g,ijk} as independently sampled from a multivariate distribution with mean *μ*_{g,ij}∈*R*^{p} and a gene-specific variance-covariance matrix **Σ**_{g}. The time-course dependence is captured by the variance-covariance matrix **Σ**_{g}. Factor effects are modeled in the structure of *μ*_{g,ij}. In a two-way factorial experiment, *μ*_{g,ij} have the following possible ANOVA structures (the gene index is dropped for simplicity): *μ*_{ij} = ** ν** +

*α*_{i}+

*β*_{j}+

*γ*_{ij}(Eq.

**S1**),

*μ*_{ij}=

**+**

*ν*

*α*_{i}+

*β*_{j}(Eq.

**S2**),

*μ*_{ij}=

**+**

*ν*

*α*_{i}(Eq.

**S3**),

*μ*_{ij}=

**+**

*ν*

*β*_{j}(Eq.

**S4**), and

*μ*_{ij}=

**(Eq.**

*ν***S5**). In the interactive model (Eq.

**S1**),

**∈**

*ν**R*

^{p}is the baseline gene expression across conditions.

*γ*_{ij}∈

*R*

^{p}is the interaction term. Eq.

**S2**is the additive model. Genes of models

**S3**and

**S4**have the main effect for only one of the factors. Model

**S5**represents genes whose expressions are not influenced by either factor. In these models, constraints , , and are imposed for identifiability.

The above models represent different ANOVA structures of the gene expression. To select an appropriate ANOVA structure for a gene, factor effects are evaluated through a series of statistical tests (Fig. 1*B*). In the first step, we assess if the gene is affected by either factors. This is done by treating Eq. **S5** as the null hypothesis and testing it against the alternative that mean expression of the gene is not constant across all combinations of the two conditions. If the null hypothesis is not rejected, the gene is classified to model **S5**. For a nonconstant gene, we then test if there is an interaction effect. A gene with a significant interaction effect is classified to model **S1**. For a noninteractive gene, main effects of α and β are further tested. If both factor effects are significant, the gene is classified to model **S2**. Otherwise, a gene with only α- or β-effect is classified to model **S3** or **S4**, respectively. Some nonconstant noninteraction genes may not show significant α- or β-effect alone because the effect of a single factor is weak, but are collectively strong. We also classify these genes into model **S2**.

A difficulty in carrying out this stepwise testing approach is that the ANOVA models **S1**–**S5** are specified for vectors in *R*^{p}, whereas traditional ANOVA tests are designed for scalar observations. To handle this difficulty we assume that there is a gene-specific direction λ (called the ANOVA direction) for which the relevant ANOVA structure exists after projection along that direction. We attempt to estimate this direction in our construction of the test statistics for each stage of the stepwise testing. For example, in testing model **S5** against the alternative that mean expression of the gene is not constant across all combinations of the two conditions, we assume the model **y**_{ijk} = ** ν** +

*θ*

_{ij}·

**+**

*λ***e**

_{ijk}, where

**,**

*ν***,**

*λ***e**

_{ijk}∈

*R*

^{p}, and . Here

**e**

_{ijk}represents the random error and is assumed to be distributed with mean 0 and a gene-specific variance-covariance matrix

**Σ**.

*θ*

_{ij}is a scalar, representing the one-way ANOVA signal along the direction

**. Under this model it is possible to derive an optimal estimator for**

*λ***for each gene. The estimation of**

*λ***incorporates information of the ANOVA signal-to-noise ratio at each time point as well as the correlations between time points. The estimated direction will be tilted toward the time points that have strong ANOVA signals. After projection to the gene-specific direction, we now have a factorial structure with scalar observation. We then compute the robust nonparametric ANOVA test statistics as in ref. 9 and identify genes with various interesting ANOVA structure under strict control of false discovery rates. The robust ANOVA statistic is similar to the**

*λ**F*statistic, but can handle unbalanced factorial designs and protect against outliers in the data. The detailed derivations for the procedures, including estimates for the covariance matrix, the ANOVA direction, the test statistics for the ANOVA structure, as well as extensions to handle cross-sectional time-course experiments are given in

*SI Text*.

### Use of Regularization to Handle Longer Time Courses.

In high-dimensional time spaces, there are a large number of directions to search for, but not all of them are biologically interpretable. To avoid overfitting it is desirable to use regulation techniques that penalize projection directions that represent high-frequency fluctuations in time. If we think of an ANOVA direction as a function *λ*(*t*) over a grid of time points *t*, we expect the function *λ*(*t*) to be smooth rather than jagged as true biological responses are not likely to be wildly fluctuating. To enforce smoothness, we represent *λ*(*t*) as an expansion of natural cubic splines, i.e., . In matrix form, ** λ** =

**H**

**, where**

*θ***H**is a

*p*×

*M*(

*M*<

*p*) basis matrix of natural cubic splines defined on the set of time points. We place knots uniformly over the integers 1,2,…,

*p*representing the time points. With the above representation, the ANOVA direction is restricted to a lower-dimensional subspace, and the method of ref. 10 can be extended to derive the optimal estimator of

**; see**

*λ**SI Text*for the rigorous derivation.

### Patient Enrollment, Sample Collection, and Processing.

The study was reviewed and approved by the Institutional Review Boards at each participating site. Details of the protocols are described in *SI Text*.

## Results

### Validation of Methodology on Simulated Data.

Datasets consisting of 1,000 genes were generated in the setting of two-way factorial experiments. The expression vector **y**_{g,ijk}, which represents the expression of gene *g* over three time points of replicate experiment *k* under condition (*α*_{i},*β*_{j}), was generated from a three-dimensional normal distribution with mean *μ*_{g,ij} and a gene-specific variance-covariance matrix **Σ**_{g}. Three replicates were simulated in each condition. We generated the mean vector as follows: *μ*_{g,ij} = *ν*_{g} + *α*_{g,i} + *β*_{g,j} + *γ*_{g,ij}·*λ*_{g} for genes 1–100, *μ*_{g,ij} = *ν*_{g} + (*α*_{g,i} + *β*_{g,j})·*λ*_{g} for genes 101–200, *μ*_{g,ij} = *ν*_{g} + *α*_{g,i}·*λ*_{g} for genes 201–300, *μ*_{g,ij} = *ν*_{g} + *β*_{g,j}·*λ*_{g} for genes 301–400, and *μ*_{g,ij} = *ν*_{g} for the genes 401–1,000. For each gene, the ANOVA direction *λ*_{g} was randomly sampled from the unit sphere. Our goal was to identify genes with (*i*) any significant factor effects (genes 1–400), (*ii*) significant interaction effects (genes 1–100), and (*iii*) significant effect involving factor α (genes 1–300). We compared the performance of our method (TANOVA) to the method of the linear mixed-effects modeling (11) in which factors are modeled by fixed effect components and time effects by random components (see *SI Text* for details). As shown in Table 1, by explicitly modeling the direction of interest, TANOVA can provide substantially improved sensitivity and specificity in detecting genes with significant ANOVA structures. We also compared our method with ANOVA analysis (9) based on various other features from the time profile, such as the observation at just the first time point, or the average over all times, or the projection of the expression vector along the true direction *λ*_{g}. The results show that TANOVA’s performance is substantially better than those resulting from ad hoc usage of the dynamic profile (such as using the time average) and generally comparable to the performance of the ideal ANOVA analysis that assumes the true direction. Details are presented in *SI Text* and Fig. S1.

To test our method in the case of longer time courses, we generated data on 1,000 genes under the interaction model on random directions in a two-way factorial experiment setting. There were three replicates in each condition, and the number of time points *p* varied from 5 to 30. To reflect the correlations between time points, the ANOVA direction of each gene was generated from an autoregressive moving average model: *λ*(*t*) = *λ*(*t* - 1) + *ε*(*t*) + *ε*(*t* - 1), where *ε*(*t*) (*t* = 1,…,*p*) was independently normally distributed and *λ*(1) was generated randomly for each gene. To regularize the ANOVA direction, natural cubic splines with four degrees of freedom was used. It can be seen in Fig. 2, as the number of time points became larger while the sample size was fixed, inference of the ANOVA direction became less accurate because of more parameters to be estimated and a larger space to search. But regularization always improved the estimation, and the improvement was significant for data with large number of time points. We also conducted similar simulations to validate the performance of the method for cross-sectional time course data. The results can be found in *SI Text* and Fig. S2.

### A Large-Scale Clinical Study of Gene Expression Response to Burn Injury.

We applied TANOVA to a large-scale clinical study of burn patients. Severe thermal injury is life threatening: It leads to inflammation and metabolic dysfunctions that require months of hospitalization and can last for a year after the injury (12–14). Demographic factors of patients, in addition to their clinical and genomic characteristics, are determinants of the outcomes of virtually every common disease. For burn injury, the patient’s age group is a well-documented risk factor: Adult patients have significantly higher morbidity and mortality rates compared to pediatric patients, even through the average injury severity level measured by total body surface area (TBSA) and inhalation injury is often higher in the pediatric group (12).

Although it is well recognized clinically that children exhibit unique physiological characteristics in recovery from injury not seen in adults, the molecular mechanism is not well understood. To elucidate the underlying gene effect that contributes to the age-dependent differences, we extracted relevant data from the large-scale burn study (see *SI Text* for details) and used TANOVA to identify the impact of age on temporal gene expression response to injury. TBSA, inhalation injury, and gender are potential confounding factors in our analysis, which were balanced by patient selection. The resulting dataset “Burn + Age” (Table S1) includes burn/control groups and two age groups: children (aged from 0 to 11 years old) and adults (aged from 22 to 86 years old), and Table 2 shows the statistics of selected demographic, clinical, and outcome information of the two age groups of patients. For each patient, two time-point measurements were considered in the analysis: the early stage (0–10 days) and the middle stage (11–49 days) post burn. Since there is a single array for each control subject, the array was duplicated to obtain pseudo time-course data. This is reasonable because the gene expression of controls is expected to be relatively stable between the two time points. To achieve a better balance on the confounding factors between the two age groups, we sampled two datasets by randomly deleting one adult with inhalation injury and four adult patients without inhalation injury. The resulting datasets both have 26 adult patients with inhalation injury ratio of 18/8 (yes/no), perfectly balanced with the pediatric group. TANOVA was then applied to each of the two datasets to classify the probe sets into five groups (C1–C5). The classification was done by a stepwise significance test as shown in Fig. 1*B*. Threshold of each test was chosen to control the false discovery rate (FDR) < 0.01. We analyzed the common probe sets in each group shared by both datasets.

### Age Effect in Burn Injury.

The analysis was done on the 8,639 probe sets with the coefficient of variation > 0.06 and median expression across arrays > 7 (log2 scale). The number of shared probe sets in C1, C2, C3, and C4 are 866, 642, 5,807, and 73, respectively. Expression patterns of four representative probe sets from groups C1, C2, C3, and C4 are displayed in Fig. S3. Genes in C1 and C2 show significant age effect after burn injury, while C3 is affected only by burn and C4 only by age. Thus the vast majority of probe sets (C1 + C2 + C3 = 7,315) are perturbed by burn injury, and 21% of these probe sets [(C1 + C2)/(C1 + C2 + C3) = 1,508/7,315] reveal differences between adults and children.

TANOVA captures response timing of a gene in its ANOVA direction. Magnitude of the coordinates in the ANOVA direction reflects the intensity of the ANOVA signal at the corresponding time points. For each probe set, we calculate the middle stage weight *w*, defined as , where *v* = (*v*_{1},*v*_{2})^{′} is the ANOVA direction of the probe sets. *w* reflects the distribution of the ANOVA signal between the two time points. For example, gene IGLJ3 (Fig. S3*A*) has an ANOVA direction (0.086,0.996)′ with *w* = 0.992, indicating the interaction effect was concentrated at the middle stage. This is clearly shown in Fig. S3*A*, as IGLJ3 expression is up-regulated in adult burn patients (compared to controls) and down-regulated in pediatric patients at the middle stage. Similar weights in the ANOVA direction suggest the gene response is constitutive and span the two stages. For example, the ANOVA direction (0.697,0.717)′ of gene SLC2A3 indicates its burn response lasts for the two stages (Fig. S3*C*). A group of genes sharing similar ANOVA directions may be biologically related as their dynamic response is likely to be coordinated (*SI Text*). Negative signs in the ANOVA direction reveal the ANOVA signal is embedded into the expression change at the corresponding time points (*SI Text*). Thus, the ANOVA direction captures dynamic response features of genes.

In the C1 group, 866 probe sets show significant age-dependent changes over the time course after burn injury. Hypermetabolism and inflammation are two hallmarks of the body’s response to burn injury (15, 16). Our analysis reveals that significantly enriched pathways in C1 include glycosphingolipid biosynthesis-globoseries (5 genes, *p* value: 8.19E-5), primary immunodeficiency signaling (10 genes, *p* value: 0.001), keratan sulfate biosynthesis (5 genes, *p* value: 0.007), galactose metabolism (5 genes, *p* value: 0.01), ubiquinone biosynthesis (8 genes, *p* value: 0.01) , death receptor signaling (7 genes, *p* value: 0.01), and mitochondrial dysfunction (12 genes, *p* value: 0.01) (the *p* values are calculated from the hypergeometric test; see *SI Text* for the details of pathway analysis). These specific metabolic and inflammatory genes and pathways are differentially perturbed in the two age groups and are potential contributors to the pathophysiological differences in pediatric and adult patients.

Genes in C2 are influenced by both age and burn, but the two factor effects are independent. As shown in Fig. 2*B*, C2 genes can also contribute to pediatric and adult differences because expression levels are different between the two age groups. Significantly enriched pathways in C2 include notch signaling (4 genes, *p* value: 0.002), mitochondrial dysfunction (11 genes, *p* value: 0.004), aminophosphonate metabolism (4 genes, *p* value: 0.005), clathrin-mediated endocytosis signaling (11 genes, *p* value: 0.005), virus entry via endocytic pathways (9 genes, *p* value: 0.005), and ubiquinone biosynthesis (7 genes, *p* value: 0.006).

C3 genes are burn responsive without age group differences. As expected, significant pathways in C3 are involved in cellular immune response and metabolic processes, such as glutamate metabolism (14 genes, *p* value: 0), *T* helper cell differentiation (14 gene, *p* value: 0), fatty acid biosynthesis (5 genes, *p* value: 0), nucleotide sugars metabolism (6 genes, *p* value: 0), role of pattern recognition receptors in recognition of bacteria and viruses (34 genes, *p* value: 0.005), CD28 signaling in *T* helper cells (49 genes, *p* value: 0.006), and calcium-induced *T* lymphocyte apoptosis (29 genes, *p* value: 0.007). Finally, C4 genes are related only to age and not burn injury. It includes genes (RHOB, PPP1CA, and MAP2K7) involved in production of nitric oxide and reactive oxygen species in macrophage, which has been related to the aging process (15).

Taking these results together, burn injury introduces an integrated cell-wide response in blood leukocyte cells. A number of important inflammation and metabolic processes are perturbed differentially in pediatric and adult patients, who are candidates for further studying the age differences in burn injury.

### Mitochondrial Function Is Differentially Perturbed in Pediatric and Adult Burn Patients.

Mitochondrial dysfunction and mitochondria related ubiquinone biosynthesis are among the top enriched pathways in both C1 and C2 groups, suggesting that mitochondrial functions are significantly affected by both burn and age. A limited number of previous animal studies have found that mitochondrial genes and function are decreased in skeletal muscle, liver, and cardiac muscle following burn injury (16). Fig. 3 shows that in pediatric patients genes of the subunits of NADH ubiquinone dehydrogenases (mitochondria complex I) are significantly down-regulated after burn starting at the early stage and lasting till the middle stage, which is consistent with the recent report that mitochondrial oxidative function is impaired in severely burned children (17). In contrast, mitochondria complex I genes in adult patients are much less affected by burn injury. Because of the intense therapeutic interest to correct the perturbed mitochondrial functions after injury (17), it is intriguing to further investigate these differential responses of mitochondrial genes for therapeutic options in children and adults.

### Burn Response of Immunoglobulin Genes is Age Specific.

ANOVA direction captures response timing of a gene to a particular ANOVA signal. Fig. S4 shows the distribution of response timing in different groups to the interaction effect (C1), additive burn and age effect (C2), and the main effect of burn (C3). Response of the majority of genes spans both stages. However, there is a group of C1 genes that respond differentially in children and adults specifically at the middle stage, as indicated by the bump at the right tail in the histogram of C1. We found among these 42 probe sets (32 genes) with the middle stage weight *w*≥0.9, there are 29 probe sets (20 genes) belonging to the immunoglobulin family. In adult patients, these immunoglobulin genes are significantly up-regulated at the middle stage post injury (> 5-fold on average), which is consistent with the observed changes in serum immunoglobulin levels in severely burned adult patients (18). Strikingly, little change or down-regulation is observed of these genes over the same period in pediatric patients (Fig. 4), suggesting a much tampered immunoglobulin production in children after burn injury. While we have not determined the detailed characteristics of the immunoglobulins produced nor the clinical impact of the significantly higher and prolonged activation of immunoglobulin response in adult patients (up to 49 days), evidences exist on the introduction of previously unexposed self-antigen and resulting self-reactive antibodies after injury that leads to inflammation and tissue damage (19, 20).

### Interaction Genes Are Significantly Associated with the Clinical Outcome.

Age is an important risk factor after burn injury. Among patients included in our TANOVA analysis, adults have a death rate of 35%, much higher than the death rate of pediatric patients (7.7%). Since interaction genes (C1 group) are differentially perturbed between the two age groups, they may contribute to the difference in clinical outcome in pediatric and adult burn patients. Moreover, we hypothesize that the interaction genes may be associated with the survival in burn injury in general. To test this hypothesis, we analyzed gene expression data from 71 adult patients (including the 31 adults used in TANOVA analysis) together with their clinical outcome. The 71 patients are classified into survival (54 patients) and nonsurvival (17 patients) groups. A nearest shrunken centroid method (21) was applied to select signature genes associated with the survival/nonsurvival phenotype. Using early stage data, the nearest shrunken centroid method identifies 302 probe sets with the cross-validation error of about 32%. Among them, 92 probe sets are in C1 (*p* value: 1.3E-27), 17 in C2 (*p* value: 0.78), 132 in C3 (*p* value: 0.09), and 61 in C5 (*p* value: 6.1E-5) (the *p* values are calculated from the hypergeometric test). The interaction genes (C1) are significantly overrepresented. Similarly, the nearest shrunken centroid method identifies 635 probe sets with the cross-validation error of about 30% using middle stage data. These probe sets are overrepresented by C1 (95 probe sets, *p* value: 2.8E-7) and C3 (459 probe sets, *p* value: 1.4E-63), but not C2 (31 probe sets, *p* value: 0.98), C4 (1 probe set, *p* value: 0.97), or C5 (49 probe sets, *p* value: 0.99). Again these results show that the interaction genes are significantly associated with the patient survival. Further studies are needed to reveal the underlying mechanisms through which these genes affect the clinical outcome.

## Discussion

We have developed a method to analyze factorial time-course microarray data. The TANOVA method differs from other existing methods in two important aspects. First, it was designed to simultaneously handle the time course and factorial structure in microarray data; second, the dynamic response feature of a gene is effectively captured in our method by ANOVA direction, the direction in time space that has the strongest ANOVA signal of interest. The estimated directional vector provides quantitative information about the response timing and expression pattern of a gene (Figs. S5 and S6). With multiple time points, a gene could have a large number of response features as when to respond (response timing) and how to respond (expression pattern). The ANOVA direction extracts the most informative feature for the evaluation of factor effects and provides guidance for biologists to study the dynamic behavior of a gene or a group of genes. In addition, it is possible that other directions orthogonal to the ANOVA direction contain interesting signals. We can remove the signals on the ANOVA direction from the data **y**_{ijk} to get the residual and apply TANOVA to find additional signals. Treating time-course observations as a vector ignores the inherent time order but renders the proposed method a broader applicability. It is applicable to longitudinal factorial data in general. If one wants to utilize the ordering of time points, regularization of the ANOVA direction by cubic splines can be imposed.

This method is particularly useful in analyzing time-course microarrays from clinical studies to characterize the gene expression effect of multiple factors of interest, such as demographics, clinical information, treatments, or clinical outcomes. Application of TANOVA to a large-scale clinical study of pediatric and adult burn patients reveals that, while burn injury invokes an overwhelming genome-wide response over the early and middle stages (up to 49 days) post injury, 21% of the genes perturbed by injury are age-dependent and enriched on specific pathways of metabolism and inflammation. In addition, dramatic differences are observed in mitochondrial function and immunoglobulin production post burn between children and adults. Furthermore, the interaction genes are significantly associated with patients’ outcomes after injury. Taken together, these findings provide insights on the uniqueness in the body’s response to injury between children and adults and underscore the importance of further investigations of therapeutic options targeting specific age groups.

## Acknowledgments

We thank Ted Anderson for helpful discussions. In particular, we wish to acknowledge the efforts of many individuals at participating institutions of the Glue Grant Program that generated the clinical and genomic data reported here. This study was supported by NIH U54 GM-062119 and SHC 86400.

## Footnotes

^{1}To whom correspondence may be addressed. E-mail: whwong{at}stanford.edu or wxiao1{at}partners.org.Author contributions: B.Z., R.T., R.D., W. Xiao, and W.H.W. designed research; B.Z., W. Xu, and W.H.W. performed research; D.H., R.T., R.D., and IHRIPcontributed new reagents/analytic tools; B.Z., W. Xu, and W. Xiao analyzed data; and B.Z., W. Xiao, and W.H.W. wrote the paper.

↵

^{2}A complete list of the investigators of the Inflammation and Host Response to Injury Program is available at http://www.gluegrant.org/institutions.htm.The authors declare no conflict of interest.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE19743).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1002757107/-/DCSupplemental.

Freely available online through the PNAS open access option.

## References

- ↵
- Wakefield J,
- Zhou C,
- Self S

- ↵
- ↵
- ↵
- ↵
- Storey JD,
- Xiao W,
- Leek JT,
- Tompkins RG,
- Davis RW

- ↵
- ↵
- ↵
- Wei Z,
- Li H

- ↵
- Zhou B,
- Wong WH

- ↵
- ↵
- Pinheiro JC,
- Bates DM

- ↵
- ↵
- ↵
- ↵
- ↵
- Padfield KE,
- et al.

- ↵
- ↵
- ↵
- Zhang M,
- et al.

- ↵
- Zhang M,
- et al.

- ↵
- Tibshirani R,
- Hastie T,
- Narasimhan B,
- Chu G

## Citation Manager Formats

### More Articles of This Classification

### Physical Sciences

### Applied Mathematics

### Biological Sciences

### Medical Sciences

### Related Content

- No related articles found.