Genomic fossils as a snapshot of the human transcriptome

Ronen Shemesh*, Amit Novik*, Sarit Edelheit*, and Rotem Sorek*,§

*Compugen Ltd., 72 Pinchas Rosen Street, Tel Aviv 69512, Israel; and Department of Human Genetics and Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel

Edited by Francisco J. Ayala, University of California, Irvine, CA (received for review October 26, 2005)

Abstract

Processed pseudogenes (PPGs) are cDNA sequences that were generated through reverse transcription of mature, spliced mRNAs and subsequently reinserted at a new genomic location. These cDNA sequences are usually no longer transcribed and are considered “dead on arrival.” Here we show that PPGs can be used to generate a map of the transcriptome. By analyzing thousands of human PPGs, we discovered hundreds of previously unidentified transcript variants. Experimental verification of a subset of these variants by RT-PCR indicates that most of them are still expressed in the human transcriptome. Furthermore, we demonstrate that PPGs can enable the identification of ancient splice variants that were once expressed but are now extinct. Our results show that the genome itself carries a “virtual cDNA library” that can readily be used to analyze both present and ancestral transcripts. Our approach can be applied to other sequenced metazoan genomes to computationally annotate splicing variation, even when expressed sequences are unavailable.
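The abstract does not spell out how a PPG reveals a transcript variant, so the following is only a hypothetical sketch of the underlying idea, not the authors' pipeline. Because a PPG is a copy of a spliced mRNA, aligning it back to its parent gene shows which exons that mRNA contained. The sketch assumes the alignment has already been computed as non-overlapping blocks in genomic coordinates, that known transcripts are given as chains of exon indices, and that PPGs may be 5'-truncated, so only the exon span the PPG actually covers is compared. All function names and the 90% coverage threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Half-open genomic interval [start, end); an assumption of this sketch.
Interval = Tuple[int, int]


def covered_exons(exons: List[Interval], blocks: List[Interval],
                  min_frac: float = 0.9) -> Tuple[int, ...]:
    """Return indices of parent-gene exons covered by the PPG alignment.

    Assumes the alignment blocks do not overlap each other, so summing
    per-block overlaps does not double count. The 0.9 threshold is a
    hypothetical choice, not a value from the paper.
    """
    chain = []
    for i, (exon_start, exon_end) in enumerate(exons):
        overlap = sum(max(0, min(exon_end, b_end) - max(exon_start, b_start))
                      for b_start, b_end in blocks)
        if overlap >= min_frac * (exon_end - exon_start):
            chain.append(i)
    return tuple(chain)


def skipped_exons(chain: Sequence[int]) -> List[int]:
    """Exons lying between the first and last covered exon but absent from the PPG."""
    if not chain:
        return []
    return [i for i in range(chain[0], chain[-1] + 1) if i not in chain]


def is_contiguous_subchain(chain: Sequence[int], transcript: Sequence[int]) -> bool:
    """True if `chain` appears as an unbroken run inside a known exon chain."""
    n = len(chain)
    return any(tuple(transcript[k:k + n]) == tuple(chain)
               for k in range(len(transcript) - n + 1))


def is_novel(chain: Sequence[int], known: List[Sequence[int]]) -> bool:
    """Flag the PPG's exon chain as a candidate variant if no known transcript explains it."""
    return not any(is_contiguous_subchain(chain, t) for t in known)


if __name__ == "__main__":
    # Toy parent gene with four exons and one known full-length transcript.
    exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
    known = [(0, 1, 2, 3)]
    # A 5'-truncated PPG whose source mRNA skipped exon 2.
    blocks = [(300, 400), (700, 800)]

    chain = covered_exons(exons, blocks)
    print(chain, skipped_exons(chain), is_novel(chain, known))
```

Comparing against contiguous runs of known exon chains, rather than whole transcripts, is a design choice made here so that a truncated PPG matching part of a known isoform is not mistakenly reported as novel.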

Footnotes

  • § To whom correspondence should be addressed. E-mail: sorek{at}post.tau.ac.il.

  • R. Shemesh and A.N. contributed equally to this work.

  • Author contributions: R. Shemesh, A.N., and R. Sorek designed research; R. Shemesh, A.N., S.E., and R. Sorek performed research; R. Shemesh, A.N., and R. Sorek analyzed data; and R. Sorek wrote the paper.

  • Conflict of interest statement: No conflicts declared.

  • This paper was submitted directly (Track II) to the PNAS office.

  • Abbreviation: PPG, processed pseudogene.
