Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns

  1. Alexei Fedorov*,
  2. Xiaohong Cao*,
  3. Serge Saxonov*,
  4. Sandro J. de Souza§,
  5. Scott W. Roy*, and
  6. Walter Gilbert*
  *Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138; and §Laboratory of Computational Biology, Ludwig Institute for Cancer Research, São Paulo Branch, Rua Professor Antonio Prudente, 109-4. andar, CEP 01509-010, São Paulo, Brazil
  Contributed by Walter Gilbert

Abstract

Do introns delineate elements of protein tertiary structure? This issue is crucial to the debate about the role and origin of introns. We present an analysis of the full set of proteins with known three-dimensional structures that have homologs with intron positions recorded in GenBank. A computer program was generated that maps on a reference sequence the positions of all introns in homologous genes. We have applied this program to a set of 665 nonredundant protein sequences with defined three-dimensional structures in the Protein Data Bank (PDB), which yielded 8,217 introns in 407 proteins. For the subset of proteins corresponding to ancient conserved regions (ACR), we find that there is a correlation of phase-zero introns with the boundary regions of modules and no correlation for the phase-one and phase-two positions. However, for a subset of proteins without prokaryotic counterparts (131 non-ACR proteins), a set of presumably modern proteins (or proteins that have diverged extremely far from any ancestral form), we do not find any correlation of phase-zero intron positions with three-dimensional structure. Furthermore, we find an anticorrelation of phase-one intron positions with module boundaries: they actually have a preference for the interior of modules. This finding is explicable as a preference for phase-one introns to lie in glycines, between G|G sequences, the preference for glycines being anticorrelated with the three-dimensional modules. We interpret this anticorrelation as a sign that a number of phase-one introns, and hence many modern introns, have been inserted into G|G “protosplice” sequences.
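The phase classification used above can be made concrete with a short sketch. It is not the authors' program: the function name `intron_positions` and the exon lengths in the example are hypothetical, and the only input assumed is the list of coding-exon lengths (in nucleotides) of a single gene. It relies on the standard definition of phase: a phase-zero intron lies between two codons, a phase-one intron after the first base of a codon, and a phase-two intron after the second base.

```python
from typing import List, Tuple


def intron_positions(exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (codon_index, phase) for every intron of one gene.

    exon_lengths: lengths of the coding exons in nucleotides, in 5'->3' order
    (hypothetical input format; UTRs are assumed to be already excluded).
    codon_index: number of complete codons that precede the intron.
    phase: 0 if the intron falls between codons, 1 if it follows the first
    base of a codon, 2 if it follows the second base.
    """
    positions = []
    coding_bases = 0
    # An intron follows every exon except the last one.
    for length in exon_lengths[:-1]:
        coding_bases += length
        positions.append((coding_bases // 3, coding_bases % 3))
    return positions


# Illustrative values only: three coding exons of 90, 100 and 62 nucleotides.
example = intron_positions([90, 100, 62])
```

Once every intron of every homolog has been reduced to a codon index and a phase in this way, positions can be transferred to the reference sequence through an alignment and tallied separately for each phase, which is the kind of comparison the abstract reports against module boundaries.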

Footnotes

  • Present address: Genzyme Corporation, 5 Mountain Road, Framingham, MA 01701.

  • Present address: Stanford Medical Informatics, 251 Campus Drive, Medical School Office Building X-215, Stanford, CA 94305.

  • To whom reprint requests should be addressed. E-mail: gilbert{at}nucleus.harvard.edu.

  • Abbreviations: EID, exon/intron database; ACR, ancient conserved regions.