Nonrandom variations in human cancer ESTs indicate that mRNA heterogeneity increases during carcinogenesis

Virtually all cancer biological attributes are heterogeneous. Because of this, it is currently difficult to reconcile results of cancer transcriptome and proteome experiments. It is also established that cancer somatic mutations arise at rates higher than suspected, but yet are insufficient to explain all cancer cell heterogeneity. We have analyzed sequence variations of 17 abundantly expressed genes in a large set of human ESTs originating from either normal or cancer samples. We show that cancer ESTs have greater variations than normal ESTs for >70% of the tested genes. These variations cannot be explained by known and putative SNPs. Furthermore, cancer EST variations were not random, but were determined by the composition of the substituted base (b0) as well as that of the bases located upstream (up to b − 4) and downstream (up to b + 3) of the substitution event. The replacement base was also not randomly selected but corresponded in most cases (73%) to a repetition of b − 1 or of b + 1. Base substitutions follow a specific pattern of affected bases: A and T substitutions were preferentially observed in cancer ESTs. In contrast, cancer somatic mutations [Sjoblom T, et al. (2006) Science 314:268–274] and SNPs identified in the genes of the current study occurred preferentially with C and G. On the basis of these observations, we developed a working hypothesis that cancer EST heterogeneity results primarily from increased transcription infidelity.

Virtually all cancer biological attributes are heterogeneous. Because of this, it is currently difficult to reconcile results of cancer transcriptome and proteome experiments. It is also established that cancer somatic mutations arise at rates higher than suspected, but yet are insufficient to explain all cancer cell heterogeneity. We have analyzed sequence variations of 17 abundantly expressed genes in a large set of human ESTs originating from either normal or cancer samples. We show that cancer ESTs have greater variations than normal ESTs for >70% of the tested genes. These variations cannot be explained by known and putative SNPs. Furthermore, cancer EST variations were not random, but were determined by the composition of the substituted base (b0) as well as that of the bases located upstream (up to b ؊ 4) and downstream (up to b ؉ 3) of the substitution event. The replacement base was also not randomly selected but corresponded in most cases (73%) to a repetition of b ؊ 1 or of b ؉ 1. Base substitutions follow a specific pattern of affected bases: A and T substitutions were preferentially observed in cancer ESTs. In contrast, cancer somatic mutations [Sjoblom T, et al. (2006) Science 314:268 -274] and SNPs identified in the genes of the current study occurred preferentially with C and G. On the basis of these observations, we developed a working hypothesis that cancer EST heterogeneity results primarily from increased transcription infidelity.
bioinformatics ͉ transcription ͉ oncology C ancer is a genetic disease caused by sequential accumulation of mutations in oncogenes and tumor suppressor genes (1). Recent work reveals that cancer somatic mutations were more frequent than initially suspected. A large sequencing effort directed toward human colorectal and breast cancer DNA gene coding domains as well as exon-intron boundaries led to the identification of 1,307 novel confirmed somatic mutations (2). This study further showed that the subset of affected genes varies both with cancer type and within the same cancer type with individual tumors. Thus, the heterogeneity of cancer somatic mutations has clearly been established.
Virtually all biochemical, biological, and clinical attributes are heterogeneous within human cancer of the same histological subtype (3). Somatic mutations, although occurring at rates higher than suspected, remain relatively rare (3.1 per 10 6 bases), leading on average to 90 amino acid substitutions in a given tumor (2). Thus, somatic mutations alone cannot explain the large number of variants observed in systematic proteomic approaches.
The possibility that transcription [a process mediated by DNA-dependent RNA polymerases (RNAP)], as well as posttranscriptional enzymatic RNA base changes (4-6), might con-tribute to molecular heterogeneity of cancer has thus far not been considered. Recent work using in vitro transcription assays revealed that T7 RNAP, yeast RNAP II, and bacterial RNAP permit template-strand misalignment, leading to transcription infidelity (7,8). This discovery led to the notion that in vivo transcription infidelity might increase when abasic sites or unrepaired DNA lesions are encountered (7,8).
We sought to compare ESTs deriving from normal and cancer samples (9). It is known that a great deal of sequence variations are present in ESTs, specifically those available in noncurated databases (10,11). EST sequence heterogeneity has thus far been considered as noise arising from (i) a high degree of sequencing errors, (ii) chimeric ESTs, (iii) intronic and inter-ORF sequences, (iv) pseudogenes and paralogues, (v) SNPs, or (vi) somatic mutations.
Here we show a higher nonrandom heterogeneity in ESTs originating from cancer samples as compared with those from normal samples. Bioinformatic filtering of these sequences did not suppress increased cancer EST heterogeneity. Cancer EST sequence variation is determined by the sequence of the DNA template. The base substituted to the normal base (defined by Watson-Crick complementarities) occurs nonrandomly and corresponds in most cases to the base located immediately upstream or downstream of the substitution event.

Results
We accessed the noncurated EST database available on human dbEST from the National Center for Biotechnology Information (June 2005 version) (11) and classified ESTs according to sample sources. This led to the creation of three different sets of sequences containing ESTs from normal (N) tissue (Ϸ2.8 ϫ 10 6 sequences), cancer (C) tissue (Ϸ2.6 ϫ 10 6 sequences), and unknown origin (Ϸ0.7 ϫ 10 6 sequences). This last set was not considered in this analysis. We then selected 17 genes on the basis of their large representation in the database. Each EST sequence was then aligned against its mRNA Reference Sequence (RefSeq) by using MegaBLAST 2.2.13 software (12). We then measured the proportion of ESTs deviating from RefSeq at any given position. Fig. 1 provides a graphical representation of these variations occurring in the normal and cancer sets for a representative gene (VIM coding for vimentin: gene ID 7431). Supporting information (SI) Fig. 6 provides the same graphical representation for the remaining 16 genes. The data show that sequence variation occurred most frequently in the cancer set and further that the phenomenon appeared most predominant in specific mRNA sites.  (13). Both putative and biologically validated SNPs leading to EST variations (n ϭ 442) were excluded from further analysis.
We next (i) tested the statistical significance of the differences in sequence variation occurring between cancer and normal ESTs, and (ii) when the statistical test was significant we determined whether the difference arose from sequence variation occurring in the normal (N Ͼ C) or cancer (C Ͼ N) sample. The statistical method is provided in SI Fig. 7 (14).
Data in Fig. 2 show that, for 15 of 17 abundantly expressed genes, statistically significant sequence variations arising from the cancer set, C Ͼ N, largely exceeded variations originating in the normal set, N Ͼ C. Furthermore, sequence variations C Ͼ N consistently and largely exceeded that of the estimated error resulting from multiple testing as defined by the location-based estimator (LBE) (shown as vertical bars). Variations N Ͼ C occurring at rates greater than that of LBE were found in 12 of 17 genes, but the ratio between statistically significant sequence variations and LBE was much lower in the normal set than in the cancer set. A more detailed analysis of these data is provided as SI Fig. 8.
Random sequencing errors cannot account for differences in sequence variation occurring between the normal and cancer sets. This interpretation is supported by the fact that libraries originating from cancer or from normal samples are processed essentially in the same manner. Mathematical analysis is consistent with this interpretation (SI Fig. 7).
Random sequencing errors being excluded, we sought to eliminate other sources of EST heterogeneity by filtering pro-cedures. Our initial requirements were that EST aligned to RefSeq with 100% identity on at least 16 consecutive bases, and with Ն90% identity on at least 50 bases. As shown in Table 1, this yielded 2,281 and 725 statistically significant differences C Ͼ N and N Ͼ C, respectively, that were distinct from putative or biologically validated SNPs. The constraints of the statistical test were met at 7,644 positions of 23,930 cumulated positions of the 17 genes. The second filter required that each EST aligned to RefSeq continuously on Ͼ70% of its length. The third filter removed ESTs with sequence more closely related to paralogues and pseudogenes than to the bona fide RefSeq. The fourth filter deleted from analysis the first and last 50 bases of each EST alignment to remove mismatches at the 3Ј and 5Ј borders of EST created by the MegaBLAST program. As shown in Table 1,  filters 2-4 decreased statistically significant sequence variation events C Ͼ N and N Ͼ C to 1,300 and 374, respectively. The C Ͼ N versus N Ͼ C ratio of sequence variations (C/N) increased from 3.15 to 3.48 (ϩ10%), and the same ratio subtracted from LBE increased from 6.95 to 7.95 (ϩ14%). The effect of filtering on C/N ratio subtracted from LBE was even more pronounced (25%) when one considered only the set of 13 genes with prefiltering C/N ratio Ͼ2 (SI Fig. 9).
We next hypothesized that cancer EST variations occurred because of differences in the base composition on genomic DNA template. This analysis was performed exclusively by using EST variations where C Ͼ N was statistically significant. To avoid bias that might be introduced by the filtering procedures, we used all available nonfiltered data. Results of Fig. 3 show the difference in base composition observed when comparing sequences upstream and downstream of substitution events (heterogeneous, n ϭ 2,281) with those where no substitution event was detected (nonheterogeneous, n ϭ 12,273). The criteria for nonheterogeneous sites were cancer set variations Ͻ0.5% and not statistically different from normal set variations. It must be emphasized that the 2,281 C Ͼ N heterogeneous positions used at this stage of analysis are defined on a statistical basis and hence contain a limited number of false positives as well as false negatives. Furthermore, the populations are necessarily incomplete because statistical tests cover only 32% of gene lengths. In this analysis we refer to the base undergoing substitution as b0, bases located on pre-mRNA (pmRNA) sequence 5Ј end are referred to as b Ϫ n, and bases located on 3Ј end are referred to as b ϩ n. The data show first that not all four bases were equally susceptible to variation: b0 ϭ A (33%) Ϸ T (32%) Ͼ Ͼ C (21%) Ͼ Ͼ G (14%). Furthermore, the compositions of the four bases upstream and three bases downstream of the site of event were statistically different from those of the sites without significant EST variation (Fig. 3). Specifically, sites where variations occur were more frequently preceded and followed by A Ն G Ͼ T Ϸ C. Thus, the occurrence of cancer EST heterogeneity is not random, but is determined first by the nature of the base undergoing substitution and second by the nature of the bases that immediately precede and follow the event.
It is clear from Fig. 4 that the replacement base was also not selected randomly. A was preferentially replaced by C (P ϭ 2.8 ϫ 10 Ϫ125 ), T by G (P ϭ 5.7 ϫ 10 Ϫ29 ), and G by A (P ϭ 2.2 ϫ 10 Ϫ32 ). Substitution of C showed a more even distribution, with a slight paucity of T (P ϭ 0.007).
To identify the underlying causes of such preferential base replacement, we distinguished two sets of informative and noninformative events.
Informative events were situations where the substituted base was different from either the preceding base (b Ϫ 1) or the following base (b ϩ 1) (n ϭ 1,676) (SI Fig. 10). Noninformative events were situations not matching these criteria. When informative events were analyzed, two cases were encountered: Table 1. Variations in ESTs before and after sequential application of electronic filters The C Ͼ N column provides the number of positions where statistically significant sequence variations are in excess in the cancer set. N Ͼ C provides the number of positions where sequence variations originate from the normal set. (C Ͼ N)/(N Ͼ C) ratio was calculated from pooled data. ͓(C Ͼ N) Ϫ LBE͔ and ͓(N Ͼ C) Ϫ LBE͔ were calculated for each gene. Negative results were treated as 0. Results of all genes were pooled to yield ͓(C Ͼ N) Ϫ LBE͔/͓(N Ͼ C) Ϫ LBE͔ ratio. The data show the results of each filtering procedure applied sequentially on the 17 studied genes.    Fig. 11). T was preferentially replaced by G (71%). For b Ϫ 1 substitutions, the pattern of relative influence of base composition was b0 For b ϩ 1 substitutions, the relative influence of the surrounding base followed a pattern of b0 In the second subset of informative events, the replacement base did not correspond to either b Ϫ 1 or b ϩ 1 (n ϭ 347). Affected bases were in the following order: A (47%) Ͼ T (29%) Ͼ C (14%) Ͼ G (10%). A was most commonly replaced by C (91% of the cases), T by C (50%) and A (42%), C by G (46%), and G by C (73%). Thus, when replacement base does not correspond to b Ϫ 1 or b ϩ 1, the replacement base is not randomly selected, but C is in large excess.
We next considered the set of noninformative events, i.e., situations where (i) b Ϫ 1 ϭ b ϩ 1 and where (ii) b Ϫ 1 ϭ b0 ϭ b ϩ 1 (SI Fig. 10). When b Ϫ 1 and b ϩ 1 were identical but different from b0 (n ϭ 339), substituted bases were in the order T (34.8%) Ͼ G (23.6%) Ͼ C (21.2%) Ͼ A (20.4%) and followed the same pattern of preference as in Fig. 4: T3G, G3A, and A3C. Substitutions occurring on the central base of repeat of three identical bases (n ϭ 266) were observed in the following order: A (46.2%) Ͼ T (36.9%) Ͼ G (10.5%) Ͼ C (6.4%). In this case, the most common substitution events were A3C and T3C and A. Rare GGG substitutions were most commonly replaced by GCG and CCC by CAC.
Thus, when substitutions occur within three consecutive identical bases and when substitutions did not correspond to either b Ϫ 1 or b ϩ 1, then C was the most common replacement base. When the replacement base corresponds to b Ϫ 1, the most common substitution was A3C; when the replacement base corresponds to b ϩ 1, the most common substitution was T3G. It can therefore be concluded that neither the base undergoing substitution nor the replacement base was selected randomly. Both phenomena followed predictable patterns defined by the composition of the base undergoing substitution and that of bases located upstream and downstream of this event.
Because EST sequences used in this analysis originated from many patients and several laboratories, we next sought to obtain biological validation of the differences in mRNA heterogeneity between normal and cancer cells isolated from the same patient. Preliminary analysis of denaturing HPLC (DHPLC) profiles revealed the ability to detect three bases cDNA heterogeneity occurring at a rate of 2.5-5% (data not shown). We selected the ENO1 gene because (i) it appears clearly affected by the increased EST heterogeneity in the cancer set ( Fig. 2) and (ii) its sequence allowed identification of primers that do not match on a genome-wide basis with paralogues or pseudogenes. The difference in cumulated percentage of base substitutions between cancer and normal ESTs for the ENO1 gene-amplified fragment that was calculated from dbEST data is 7.25%, i.e., within DHPLC sensitivity range. We also selected TMSB4X as the negative control because we found no statistical difference in EST heterogeneity between normal and cancer sets and were also able to identify genome-specific primers. The results show a difference in heterogeneity of RT-PCR-amplified products of ENO1 gene isolated from kidney normal and cancer cells of the same patient (Fig. 5A). This contrasted with the absence of difference observed with the TMSB4X gene using the same samples (Fig. 5B). The lengths of amplified fragments were similar for both genes, and the same polymerases were used in both experiments. PCR infidelity did not create variations for the TMSB4X gene, so one can therefore reasonably assume that it is also the case for ENO1 and thus that the observed difference between normal and cancer profiles reflects the heterogeneity of mRNA sequence. To further establish this conclusion, we repeated the analysis of ENO1 using two different PCR preparations and again observed significant differences between cancer and normal tissue of the same patient (Fig. 5C). Finally, we verified the reproducibility of the loading procedure to DHPLC (Fig. 5D). These data establish that mRNA heterogeneity is different when one compares RNA from cancer to normal tissues isolated from the same patient. We have repeated the procedure using different genes and different cancer types and obtained results indicating that cancer mRNA heterogeneity is both cancer-and gene-specific (data not shown). Because DHPLC does not allow probing the extent and position of sequence heterogeneity, a detailed analysis of cancer versus normal mRNA sequences using highly accurate sequencing procedures is needed.

Discussion
The primary conclusions of this analysis are that cancer EST heterogeneity is greatly increased compared with that of normal ESTs and is not random noise, but follows a specific pattern defined by the genomic DNA template and determined first by  5. DHPLC elution profiles of double-stranded DNA amplified from cDNA obtained from kidney cancer and adjacent normal tissues. mRNA isolated from cancerous and adjacent normal kidney tissues from the same patient were reverse-transcribed, and the double-stranded cDNA PCR-amplified by using primers specific to the ENO1 gene (A) and the TMSB4X gene (B) was loaded on a DHPLC system. DNA elution was followed by absorbance (mV) in function of retention time (minutes). Normal and cancerous elution profiles are shown in green and black, respectively. Variations due to repeated PCR applied to ENO1 cDNA are shown in C (green and red, normal; black and blue, cancer). Variations due to DHPLC loading of the ENO1 PCR products are shown in D with the same color coding. the composition of the base undergoing substitution and second by the composition of the bases that immediately precede or follow this event. Applying stringent filters did not erase the differences between cancer and normal groups. Therefore, cancer EST variations reflect the fact that a small but significant proportion of cancer cell mRNA are heterogeneous and do not carry the information predicted by simple base-pairing to the human genome.
Direct comparison of mRNA heterogeneity between cancer and normal cells from the same patient indicate that differences exist for some but not all mRNA. It is also most probable that genes affected are cancer-specific. We speculate that a large sequencing effort of normal and cancer cDNA libraries from the same patient using accurate pyrosequencing method (15,16) will lead to the definition of cancer-specific molecular signature of mRNA base substitution.
There are five limitations of our current study. First, the method of analysis does not take into account variations of EST sequences other than single base substitutions. Testing the possibility that deletion(s) and/or insertion(s) are also present requires a different analytical strategy. Second, the analysis is restricted to substitution events considered independent of one another. A longitudinal analysis of the relationship between substitution events is needed. Third, the statistical method chosen for this first study was conservative. The less stringent Fisher exact test will be needed to achieve complete coverage of gene lengths and allow testing of low-abundance genes. A large proportion (76%) of tested genes showed increased cancer EST heterogeneity and C/N Ͼ 2. However, selection criteria are biased toward genes highly transcribed and with relatively short transcripts. Hence, the conclusion may not extend to lowabundance genes and mRNA. Fourth, the analysis did not discriminate between different cancer types or cancer cell lines. We verified that a greater statistically significant EST heterogeneity persisted in the cancer set after removing ESTs produced from cultured cancer cells (data not shown). Fifth, we are currently unable to determine whether EST heterogeneity occurs in normal cells following a pattern similar to or different from that of cancer cells. Indeed, the number of N Ͼ C events is not in large excess of the estimated rate of false positives. Thus, N Ͼ C events contain a large proportion of false positives that prevent meaningful data interpretation. To address this important issue we need larger sets of data comparing the mRNA sequence of cancer and normal cells from the same individual.
The next issue is to define the origin of cancer EST heterogeneity. The first source is mutations affecting genomic DNA. We have excluded from our analysis all putative and biologically validated SNPs. Thus, except if a large number of SNPs are currently unsuspected, it is unlikely that SNPs are responsible for cancer mRNA heterogeneity. Furthermore, within the SNP pool of the 17 studied genes, C and G are preferentially affected (68% of 442 SNPs), whereas A and T are preferentially affected in cancer ESTs (65% of 2,281 substitution events).
Alternatively, one can speculate that somatic mutations of cancer cell DNA account for mRNA heterogeneity. Similar to SNPs of the current study, colon and breast cancer somatic mutations were shown to affect preferentially C and G bases (80.6% of cases). However, in-depth efforts of breast and colon cancer DNA sequencing that included 14 of 17 genes used in our study led to an estimated somatic mutation rate of 3.1 mutations per 10 6 bases (2). Sites of cancer EST variations are 3 to 4 orders of magnitude more commonly encountered than those of somatic mutations. This is not to say that all cancer cell mRNA carries 10 mutations every 100 bases but rather that up to 10 bases per 100 bases can be substituted on any given cancer mRNA. Thus, it is likely that EST heterogeneity occurs not at the genomic level, but rather at the pmRNA level.
Two mechanisms are envisioned. First, RNA editing is able to change C to U, U to C, and A to I read as G (4)(5)(6). In the cancer set we observed at the cDNA level 5.7% C3T, 9.2% T3C, and 4.7% A3G changes. Thus, mRNA editing cannot account for Ͼ20% of single base substitutions described here. Indeed, the most common base substitutions, A3C (24.6%) and T3G (16.8%), represent base family changes that are not explained by known human enzymatic RNA editing processes.
The alternate hypothesis is therefore to consider that cancer mRNA heterogeneity occurs as a result of transcription infidelity, leading most frequently (73%) to repeat of b Ϫ 1 or b ϩ 1.
Cases of transcription infidelity have been reported. In the Brattleboro rat with diabetes insipidus resulting from a ϩ 2 frameshift in the vasopressin gene, GA deletion of the GAGAG sequence of the mRNA reverts part of the transcript to normal and improves the phenotype (17,18). Transcriptional frameshift events affecting repetitive A sequence of ␤-amyloid and ubiquitin B yield proteins with alternate reading frames that are detected by immunological staining of Alzheimer's disease plaques (19,20). Transcriptional infidelity of dog AP3B1 gene yields to the addition or removal of a single A within a poly(A) stretch (21). In the rat p53 gene A 6 transcription leads to the insertion of an extra A in 9% of cloned transcripts (22).
In vitro assays of transcription infidelity using multisubunit yeast RNAP II show misalignment and b ϩ 1 base replacement (7,8). Forward misalignment results from extrahelical flipping out of the substituted base on DNA template. Crystal structure data show that space is available within the RNAP active site to transiently accommodate the flipped-out base and hence allow misalignment. Thus, a molecular mechanism explaining cases where replacement base is b ϩ 1 is documented. However, no molecular model currently explains our most common observation, i.e., repetition of b Ϫ 1. Consistent with the hypothesis that cancer EST heterogeneity is due to transcription infidelity is the finding that bases most influential of the substitution events are not only the substituted base itself (b0), but also the four bases located upstream and the three downstream of this event. This pattern corresponds in the elongation complex with the first four RNA-DNA base-pairing and to the three transcription-driven melted bases (23). In vitro studies of RNAP infidelity established that DNA grip three bases downstream of the misalignment event is critical at controlling transcription fidelity (24,25).
Although this is not the first report of transcription infidelity, our analysis introduces the notion that the phenomenon might occur on a previously unsuspected scale in cancer samples where it could affect a large proportion of genes. Furthermore, we show that transcription infidelity represents a nonrandom phenomenon driven in part by genomic DNA sequence. The underlying causes of the increase of transcription infidelity in cancer samples are currently unknown.
The relatively low selectivity of RNAP makes misincorporation unavoidable (26). Transcript-assisted transcriptional proofreading with strong Mg 2ϩ dependence corrects base misincorporation events (27). Most of the RNA in erroneous complexes are cleaved, but a fraction is extended past the misincorporation site (28). We compared in vitro efficacy of transcription-assisted transcriptional proofreading by measuring the k cat /K d ratio from data of Zenkin et al. (27) obtained with an in vitro model where b Ϫ 1 ϭ G and b ϩ 1 ϭ C. Interestingly, the highest efficacy of repair was for substitution with A. Among 138 substitutions that in our pool correspond to the same sequence context as that of Zenkin et al. (27), only seven substitutions with A were observed. Furthermore, Zenkin et al. (27) show that C3G substitutions led to equal ratio of cleavage versus extended RNA; we observed 67 C3G substitutions of 138. We thus speculate that increased transcription infidelity in the cancer set is due in part to defective transcription-assisted transcriptional proofreading. Alterna-tively, heterogeneous mRNA might accumulate in cancer cells because of a decrease in their degradation rate.
Cancer mRNA heterogeneity might explain the relative lack of reproducibility of cancer microarray transcriptome experiments (29). If heterogeneous cancer mRNA are translated, this would cause the occurrence of a myriad of protein variants, possibly explaining the complexity of current cancer proteomic results (30).
If the hypothesis of increased transcription infidelity during carcinogenesis is confirmed, we will have to recognize that EST sequencing efforts contributed by many laboratories and often considered of limited value have allowed the emergence of a new paradigm, perhaps opening new avenues toward a better understanding of cancer biology. Filtering Procedure. ESTs that did not align continuously with RefSeq on 70% of its length were removed.

Materials and Methods
For pseudogenes and paralogues filter, pseudogene sequences were downloaded from the http://pseudogene.org database by Ensembl identification number. Paralogue sequences were generated by sequence alignment of RefSeq against human RefSeq mRNA by using BLASTN, MegaBLAST, or Discontiguous MegaBLAST with default parameters (except W ϭ 16 for MegaBLAST). Homologous sequences were defined by BLASTN results completed by MegaBLAST and Discontiguous MegaBLAST results. ESTs deriving from pseudogene or paralogue sequences (Ps/Pa) were removed based on the following calculation: Cost 1 corresponds to the number of mismatches between RefSeq and EST. Cost 2 is calculated as the number of mismatches between EST and RefSeq ϩ the number of mismatches between Ps/Pa and RefSeq Ϫ [2 ϫ the number of common mismatches (with the same replacement base) between Ps/Pa and EST] Ϫ the number of common mismatches (with different replacement base) between Ps/Pa and RefSeq. When Cost 2 is lower than Cost 1, it is assumed that the EST sequence corresponds more closely to Ps/Pa, and the EST is removed.
Fifty bases were deleted on both 5Ј and 3Ј extremities of each aligned EST.

DHPLC Profiles of Kidney Cancer Tissue Versus Normal Adjacent Tissue.
cDNA from cancerous and adjacent normal kidney tissues obtained from the same individual (BioChain; CliniSciences, Montrouge, France) were amplified by PCR with oligonucleotides complementary to the ENO1 gene (positions 1505-1524 and 1788-1812), the TMSB4X gene (positions 311-335 and 590-614), and the Pfx polymerase (Invitrogen, Cergy-Pontoise, France). These cDNA of 300 bp were then purified on Nucleospin Extract II columns (Macherey-Nagel, Hoerdt, France), visualized on agarose gel, quantified, and normalized. Another PCR cycle was performed by using 50 ng of each cDNA with a Pfx DNA polymerase. Samples were then denatured for 5 min at 95°C and slowly renatured by decreasing the temperature by 1°C for 38 seconds and 72 cycles. The double-stranded DNA samples were then injected on a DHPLC system (Transgenomic, Elancourt, France). The temperature of the oven for each gene was selected by using Navigator software. For the ENO1 gene the temperature was 61.5°C, and for TMSB4X the temperature was 55°C. DNA elution was followed by absorbance (millivolts) in function of retention time (minutes). Curves were visualized and normalized with Navigator software.