Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance

Lin et al. 10.1073/pnas.0407114101.

Supporting Information

Files in this Data Supplement:

Supporting Figure 6
Supporting Table 1
Supporting Figure 7
Supporting Figure 8
Supporting Table 2
Supporting Figure 9
Supporting Figure 10
Supporting Table 3
Supporting Table 4




Supporting Figure 6

Fig. 6. Interquartile range (IQR) versus the rank of the median expression level for log-transformed (A) and two-component noise model (TCM)-transformed (B) data. These two plots visually represent the effects of the data transformation that we employ in our analysis. The goal of the transformation is to stabilize the replicate variance, namely, to ensure that the variance is uniform and the errors are symmetric across the range of expression values. Simple log-transformation of the data is adequate at higher intensity levels; however, it tends to artificially amplify the variance at lower intensity levels. Each dot on these plots corresponds to a particular gene/time point of the experiment, using only data from time points 1–5 (synchronous period). The dots represent the dependence of the interquartile range (a robust measure of variance or spread of the replicates) on the rank of the median intensity level for that spot. Log-transformed intensity values yield a replicate residual distribution that is highly dependent on the intensity value. The objective of the TCM transformation is to remove such dependence and produce data with uniform replicate residuals.
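The diagnostic plotted in Fig. 6 can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name iqr_vs_median_rank, the input layout (one list of replicate intensities per gene/time point), and the choice of the inclusive quartile method are all assumptions made here. The transform argument defaults to log2; the TCM transformation itself is not reproduced because its exact form is not given in this supplement.

```python
import math
from statistics import median, quantiles


def iqr_vs_median_rank(replicates_by_spot, transform=math.log2):
    """Return one (rank_of_median, iqr) pair per gene/time-point spot.

    replicates_by_spot: list of lists of positive replicate intensities
    (hypothetical layout). Rank 1 is the spot with the lowest median.
    """
    summaries = []
    for values in replicates_by_spot:
        transformed = [transform(v) for v in values]
        # Quartiles of the transformed replicates (inclusive method is an
        # assumption; other quartile definitions give slightly different IQRs).
        q1, _, q3 = quantiles(transformed, n=4, method="inclusive")
        summaries.append((median(transformed), q3 - q1))

    # Rank the spots by their median transformed intensity.
    order = sorted(range(len(summaries)), key=lambda i: summaries[i][0])
    ranks = [0] * len(summaries)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank

    return [(ranks[i], summaries[i][1]) for i in range(len(summaries))]


# Illustrative (made-up) replicate intensities for two spots.
example = [[8, 16, 32, 64, 128], [1, 2, 4, 8, 16]]
points = iqr_vs_median_rank(example)
```

Plotting the IQR against the rank for every spot gives a panel of the kind shown in A; a transformation that stabilizes the replicate variance should produce a roughly flat band instead of a trend.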





Supporting Figure 7

Fig. 7. Validation of F test scores by using a list of 89 genes (see Table 1) whose expression patterns have been shown in the available literature by using quantitative or semiquantitative methods to be altered during the hair-growth cycle. For each of the key P values (indicated by red "X" marks), the fraction of known hair cycle-associated genes selected is on the x axis, and the total number of filtered probe sets selected is on the y axis.
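The quantities on the two axes of Fig. 7 can be computed as sketched below. This is an illustration under stated assumptions: the function name validation_curve and the dictionary layout are hypothetical, each known gene is assumed to map to exactly one probe set identifier, and selection is taken to mean a P value at or below the threshold.

```python
def validation_curve(p_values, known_ids, thresholds):
    """For each P-value threshold, return (threshold, fraction of known
    hair cycle-associated genes selected, total number selected).

    p_values: dict mapping probe set id -> P value (hypothetical layout).
    known_ids: set of probe set ids for the literature-derived gene list.
    """
    points = []
    for threshold in thresholds:
        selected = {pid for pid, p in p_values.items() if p <= threshold}
        fraction_known = len(selected & known_ids) / len(known_ids)
        points.append((threshold, fraction_known, len(selected)))
    return points


# Illustrative (made-up) P values, not data from the study.
toy_p = {"a": 0.001, "b": 0.02, "c": 0.2, "d": 0.04}
toy_known = {"a", "c"}
curve = validation_curve(toy_p, toy_known, [0.01, 0.05])
```

A useful threshold is one that recovers a large fraction of the known genes while keeping the total number of selected probe sets manageable.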





Supporting Figure 8

Fig. 8. Q value analysis of the false discovery rate (FDR). The Q value of a given gene, qi, can be interpreted as the false discovery rate associated with rejecting all hypotheses with Q values less than qi. The estimated fraction of true null hypotheses as a function of a tuning parameter λ (A), Q value as a function of P value (B), total number of significant tests as a function of FDR estimate (C), and expected number of false positives among significant tests as a function of the number of significant tests (D) are shown.
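For readers unfamiliar with Q values, the following sketch shows one common way to compute them from a list of P values. It is an assumption that this estimator matches the one used for Fig. 8; the function names pi0_estimate and q_values are hypothetical. The fraction of true nulls is estimated at a single value of the tuning parameter (here lam) as the number of P values above lam divided by m(1 - lam), and each Q value is the smallest estimated FDR over all rejection regions that contain the corresponding test.

```python
def pi0_estimate(p_values, lam):
    """Estimate the fraction of true null hypotheses at tuning parameter lam."""
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def q_values(p_values, pi0=1.0):
    """Return Q values in the same order as p_values.

    With pi0 = 1.0 (the conservative default) this reduces to
    Benjamini-Hochberg adjusted P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[index] / rank)
        q[index] = running_min
    return q


# Illustrative (made-up) P values, not data from the study.
toy = [0.01, 0.04, 0.03, 0.5]
toy_q = q_values(toy)
toy_pi0 = pi0_estimate(toy, 0.25)
```

Counting the tests with Q values at or below a chosen cutoff gives the number of significant tests at that FDR estimate, and multiplying that count by the cutoff gives the expected number of false positives, which are the quantities plotted in C and D.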





Supporting Figure 9

Fig. 9. Selection of K = 30 clusters is based on cross-validated per-point logP (log probability) score (A) and BIC (Bayesian information criterion) scores (B). The cross-validated per-point logP score plot indicates that the score plateaus at ≈30 clusters, and the BIC score plot, which incorporates higher penalties for more clusters, shows a peak in the score at 20 clusters. Comparison of the profile patterns between 20 and 30 clusters showed that additional unique profile patterns are gained by having more clusters, and thus, the number of profile clusters was set at K = 30.
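The BIC-based part of this comparison can be illustrated as follows. This is a sketch under assumptions: the exact BIC form used for Fig. 9 is not stated here, so the common "higher is better" form (log-likelihood minus half the number of free parameters times the log of the number of data points) is assumed because the legend describes a peak. The function names and all numbers in the example are hypothetical.

```python
import math


def bic_score(log_likelihood, n_params, n_points):
    """BIC in the 'higher is better' form: logL - (d / 2) * ln(N)."""
    return log_likelihood - 0.5 * n_params * math.log(n_points)


def best_k(scores_by_k):
    """Return the number of clusters K with the highest score."""
    return max(scores_by_k, key=scores_by_k.get)


# Made-up log-likelihoods and parameter counts for three candidate K values.
# The fit improves as K grows, but the penalty grows with the parameter count.
n_points = 1000
candidates = {
    10: {"log_likelihood": -5200.0, "n_params": 100},
    20: {"log_likelihood": -4700.0, "n_params": 200},
    30: {"log_likelihood": -4600.0, "n_params": 300},
}
scores = {
    k: bic_score(c["log_likelihood"], c["n_params"], n_points)
    for k, c in candidates.items()
}
chosen = best_k(scores)
```

As the legend notes, a BIC peak is only one input: the cross-validated score and inspection of the profile patterns can justify a larger K than the penalized score alone would select.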





Supporting Figure 10

Fig. 10. K30 profile clusters. The 12 clusters that show a hair-growth pattern can be further classified into four groups of clusters: genes whose expression profiles peak at early (clusters 1-4), middle (clusters 5 and 6), or late (clusters 7 and 8) anagen, and genes that show a sharp decline in expression level at catagen (clusters 9-12). There are two main types of catagen-related expression patterns: genes that drop in expression level at catagen (cluster 13) and genes that peak at catagen (clusters 14-15). The nine clusters that follow an anti-hair-growth pattern can be broken down into three groups: genes whose expression levels rise sharply at catagen (clusters 16-18), rise slowly at catagen (clusters 19-21), or decline during anagen (clusters 22-24). There are six clusters (clusters 25-30) that contain all of the genes whose expression profiles do not fit into any of the three general profile patterns. For each time point, the standard deviation and the minimum and maximum values of the cluster are shown. Blue lines indicate expression profiles for individual genes, and yellow lines indicate mean expression profiles for clusters.

This Article

  1. PNAS November 9, 2004 vol. 101 no. 45 15955-15960