Proceedings of the National Academy of Sciences

Published online on September 7, 2006, 10.1073/pnas.0606239103
PNAS | September 19, 2006 | vol. 103 | no. 38 | 14056-14061
OPEN ACCESS ARTICLE



BIOLOGICAL SCIENCES / EVOLUTION
Evolution of protein structural classes and protein sequence families

In-Geol Choi* and Sung-Hou Kim*†‡

*Physical Biosciences Division, Lawrence Berkeley National Laboratory, and †Department of Chemistry, University of California, Berkeley, CA 94720

Contributed by Sung-Hou Kim, July 30, 2006


    Abstract

In protein structure space, protein structures cluster into four elongated regions when mapped based solely on similarity among the 3D structures. These four regions correspond to the four major classes of present-day proteins defined by the contents of secondary structure types and their topological arrangement. Evolution of and restriction to these four classes suggest that, in most cases, the evolution of genes may have been constrained or selected to those genetic changes that result in structurally stable proteins occupying one of the four "allowed" regions of the protein structure space: "structural selection," an important component of natural selection in gene evolution. Our studies on tracing the "common structural ancestor" for each protein sequence family of known structure suggest that: (i) recently emerged proteins belong mostly to three classes; (ii) the proteins that emerged earlier evolved to gain a new class; and (iii) the proteins that emerged earliest evolved to become the present-day proteins in the four major classes, with the fourth-class proteins becoming the most dominant population. Furthermore, our studies also show that not all present-day proteins evolved from one single set of proteins in the last common ancestral organism, but new common ancestral proteins were "born" at different evolutionary times, not traceable to one or two ancestral proteins: "the multiple birth model" for the evolution of protein sequence families.

protein fold classes | common structural ancestor | evolutionary age | protein structure universe


The protein universe (1), the totality of all proteins in all organisms on Earth, is vast. However, an estimate of the order of magnitude can be made (Table 1): Although the currently known genome sizes range from 10^6 to 10^11 DNA base pairs, the number of genes is estimated to range only from 10^3 to <10^5 per organism (www.ncbi.nlm.nih.gov/genomes). Taking into account the estimated 13.6 million species of living organisms on Earth (2), which is very likely to be an underestimate, there are >10^10 to 10^12 different proteins in all organisms from the three domains of life (Eukarya, Bacteria, and Archaea) on Earth. However, this vast number of proteins is predicted to consist of only ≈10^5 sequence domain families (3), the members of each family having similar amino acid sequences (4). The sizes of the sequence domain families have a power law distribution (Fig. 1): Most families have a small number of members, but some have a very large number of members. We expect a similar distribution for sequence families. Most of these ≈10^5 sequence families are estimated to belong to ≈10^4 structural families (5–7), because some sequence families turn out to have the same 3D structural fold. Some protein structures consist of more than one domain and, at present, ≈10^3 structures of fold domains are known (8).
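The order-of-magnitude estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that the total is simply the number of species multiplied by the number of genes per organism; the function names are hypothetical and the rounding to the nearest power of ten is our simplification.

```python
import math

# Figures quoted in the text; the rounding rule below is an assumption.
SPECIES_ON_EARTH = 13.6e6        # estimated number of living species (ref. 2)
GENES_PER_ORGANISM = (1e3, 1e5)  # quoted range of genes per organism

def order_of_magnitude(x):
    """Return the exponent of the power of ten nearest to x on a log scale."""
    return round(math.log10(x))

def protein_universe_bounds(species, genes_range):
    """Exponents of the lower and upper estimates of the total protein count."""
    low = species * genes_range[0]
    high = species * genes_range[1]
    return order_of_magnitude(low), order_of_magnitude(high)

if __name__ == "__main__":
    lo, hi = protein_universe_bounds(SPECIES_ON_EARTH, GENES_PER_ORGANISM)
    print(f"about 10^{lo} to 10^{hi} proteins")
```

With the quoted inputs this gives exponents of 10 and 12, in line with the range stated in the text.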


Table 1. The estimated orders of magnitude of the total numbers in various categories for all proteins in all organisms on Earth

 


Fig. 1. The family sizes of protein sequence domains in Pfam database (3) (release 16; 7,677 Pfam families) have a power law distribution. The Pfam families (x axis) were sorted by their family size. The number of members in a given Pfam family (y axis) was truncated from >1,000 in this plot. The dotted line indicates the median family size (41) of all protein sequence families in Pfam.

There has been a long history of attempts to classify known protein structures based on subjective analysis of the secondary structure contents of proteins and their topological arrangements in the structures (9–12) and on objective analysis of 3D coordinates of Cα atoms in protein structures (1, 13, 14). These attempts resulted in, among others, two excellent databases of protein structure classification, CATH (14) and SCOP (11). A recent study, based solely on objective similarity among the 3D structures represented by Cα atoms and using a much larger structure database and multidimensional scaling, revealed that all of the known protein folds (15) and protein structures (16) cluster into four elongated regions in the very sparsely populated protein structure space (Fig. 2). Interestingly, these four groups correspond approximately to the four classes defined by Levitt and Chothia (9) and used in SCOP, the Structural Classification of Proteins (11).


Fig. 2. A global view of the protein structure space (16). The 1,898 nonredundant protein structures from Protein Data Bank are mapped in the 3D space to visualize the major feature of the map. The protein structure space is sparsely populated, and all of the proteins of known structures cluster mostly into four elongated regions, which correspond approximately to four SCOP classes (all-α, all-β, α+β, and α/β) of protein structures indicated by red, yellow, purple, and cyan spheres, respectively. The small proteins and multidomain protein classes are represented by green and black spheres, respectively. All structural class assignments were based on the SCOP classification. Three axes are drawn in to visualize high-population regions of all-α, all-β, and α/β class proteins, and the "origin" is represented by a large orange ball at the point where two of the axes meet.

The fact that most proteins are structured and that the protein structure space is very sparsely populated and restricted mostly to the four elongated regions suggests that mutations in genes encoding proteins have been constrained to those resulting in a structurally viable protein occupying one of the four allowed regions of the protein structure space: structural selection or "designability" (17, 18).

To obtain information on the evolution of these structural classes, we present a simple way of estimating the evolutionary ages of the common structural ancestor (CSA) of each protein sequence family of known fold. Assigning the age of the CSA of a protein family to each representative protein in the protein structure space (16) makes it possible to embed the evolutionary information into the map of the protein structure universe. We assign the age of the CSA of a protein family to be the same as the age of the most recent common ancestral organism that presumably contained the CSA of the family. Finally, we convert the map of the protein structure universe into the map of the ages of CSAs. Based on the analysis of these maps of the protein structure universe and the evolutionary ages of the CSAs, we propose a model for the evolution of protein structural classes and a model for the evolution of protein sequence families.

We start with the following facts and assumptions:

  1. There is a key difference between the evolution of organisms vs. the evolution of proteins: the current model of evolution of organisms has the absolute requirement of reproduction of organisms and, thus, all present-day organisms ultimately come from one common ancestor organism. However, the evolution of proteins, therefore genes, does not need to follow the evolutionary path of organismic reproduction. Rather, the evolution of proteins is directly related to improved, unaltered, or diversified molecular functions, and the protein function is directly related to protein structure.
  2. Protein structures are more conserved than sequences in evolution, thus most proteins in a given sequence family have similar or related molecular structures.
  3. All information about protein structures is derived from the proteins of present-day organisms, and the protein universe of the present-day organisms represents a time-sliced view of all proteins at their various stages of evolution.


    Results
Evolutionary Age of CSAs. Mapping the protein structure universe revealed four major clusters of protein structures (1, 15, 16). An examination of the map suggested a hint of embedded evolutionary time in the map. To estimate the "age" of a protein structure in the map, we define the term CSA: For a given protein structure, all its sequence homologues are searched from a sequence database, for example, from the Pfam database (3), and all of the organisms that contain the genes coding for the members of that sequence family are identified. We then find the most recent common ancestor (MRCA) node of these organisms in the phylogenetic tree of life constructed based on the small subunit rRNA gene as described in Materials and Methods. We make an assumption that the CSA of the protein and its family members was present in the MRCA organism (Fig. 3a), and that the age of the CSA is represented by the phylogenetic distance between the MRCA and the reference node in the tree. The procedure is shown schematically in Fig. 3b.


Fig. 3. Schematic diagram for building a phylogenetic tree representing all of the organisms that contain the proteins of known structures or their sequence homologues (a) and assigning the age of the CSA of a protein family (b). The MRCA organism of the organisms represented by the members of a protein family is traced in the tree (red solid line). We then assume that the CSA resided in the MRCA organism, and we assign the phylogenetic distance (the sum of black thick solid lines) from present day to MRCA as the age of the CSA.
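The MRCA lookup and age assignment described above can be illustrated on a toy tree. This is a minimal sketch, not the authors' implementation: the tree, its node names, and its branch lengths are hypothetical, and we assume a clock-like (ultrametric) tree so that the age can be taken as the distance from the present-day tips back to the MRCA.

```python
# Toy rooted tree: child -> (parent, branch length to parent). Hypothetical data.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "n1": ("n2", 1.0),
    "D": ("root", 3.0),
    "n2": ("root", 1.0),
    "root": (None, 0.0),
}

def path_to_root(node, tree):
    """Nodes from `node` up to and including the root."""
    path = [node]
    while tree[node][0] is not None:
        node = tree[node][0]
        path.append(node)
    return path

def mrca(taxa, tree):
    """Most recent common ancestor node of a non-empty collection of taxa."""
    taxa = list(taxa)
    common = set(path_to_root(taxa[0], tree))
    for t in taxa[1:]:
        common &= set(path_to_root(t, tree))
    # Walking upward from any taxon, the first shared node is the most recent one.
    for node in path_to_root(taxa[0], tree):
        if node in common:
            return node
    raise ValueError("taxa do not share a root")

def distance_to_root(node, tree):
    """Sum of branch lengths from `node` up to the root."""
    return sum(tree[n][1] for n in path_to_root(node, tree))

def csa_age(taxa, tree):
    """Assumed age of the CSA: distance from present-day tips back to the MRCA."""
    parents = {p for p, _ in tree.values()}
    tips = [n for n in tree if n not in parents]
    tree_height = max(distance_to_root(t, tree) for t in tips)
    return tree_height - distance_to_root(mrca(taxa, tree), tree)
```

In this toy tree, a family found only in A and B gets a young CSA (its MRCA is the shallow node n1), whereas a family found in B and D traces back to the root and gets the oldest age.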

Evolution of the Relative Abundance of the Protein Structural Classes. When each protein structure in the protein structure space (Fig. 2) is represented by the relative age of the CSA (Fig. 4a) of the protein family to which it belongs, we see a general trend: the proteins with young CSA age (blue) belong mostly to three classes (α, β, and α+β classes), those with middle age (green or yellow) belong to the same three classes plus α/β class, and, finally, the majority of the CSAs of old age (red) belong to α/β class. This observation suggests that recently born and still-evolving proteins belong to all-α or all-β class (as well as their random mixtures, α+β class), but the majority of the "mature" proteins belong to α/β class. The trends of the evolution of the protein structural classes are more easily visible in a distribution of structural classes across the evolutionary ages (Fig. 4b Upper) or the relative percent population of structural classes in a given evolutionary age (Fig. 4b Lower).


Fig. 4. Evolution of the relative abundance of the protein structural classes. (a) The "age map" of CSAs. The color gradient, from blue (the youngest) to red (the oldest), represents the relative age of the CSAs of the protein families represented by each of the nonredundant protein structures and their sequence homologues. We used the average age of 22 nearest neighbors (the median of the number of statistically significant neighbors; DaliLite z score ≥ 2) of each point to reveal the major trends by smoothing out the noise. The proteins near the origin are youngest, and those near the end of α/β axis are the oldest. (b) Relative abundance of structural classes to which all of the protein families belong vs. the ages of their CSAs. (b Upper) The total number of CSAs in each structural class is normalized to 1, so that the population density of each fold class is plotted vs. relative evolutionary ages. (b Lower) The sum of CSAs in all four classes at a given age is normalized to 1, and the percentage population for four classes is plotted for that age. Both are alternative presentations of Fig. 4a. (c) The "chain length map" of the protein structure space, where each of the nonredundant protein structures is represented by its chain length: from blue for proteins of short chain length (<50 residues) to red for those of long chain length (>300 residues). The proteins near the origin are short, and those near the ends of the feature axes are large.
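The two normalizations described for Fig. 4b can be sketched as follows. This is an illustrative sketch only: the number of age bins is not given in the text, so `n_bins` is a hypothetical parameter, and the function names are ours.

```python
from collections import Counter

def class_age_counts(records, n_bins=10):
    """Count CSAs per (class, age bin); ages are relative ages in [0, 1]."""
    counts = Counter()
    for cls, age in records:
        b = min(int(age * n_bins), n_bins - 1)  # an age of 1.0 falls in the last bin
        counts[(cls, b)] += 1
    return counts

def normalize_per_class(counts):
    """Fig. 4b Upper style: each class sums to 1 across age bins."""
    totals = Counter()
    for (cls, _), c in counts.items():
        totals[cls] += c
    return {key: c / totals[key[0]] for key, c in counts.items()}

def normalize_per_age(counts):
    """Fig. 4b Lower style: each age bin sums to 1 across classes."""
    totals = Counter()
    for (_, b), c in counts.items():
        totals[b] += c
    return {key: c / totals[key[1]] for key, c in counts.items()}
```

The per-class view shows where in time each class is concentrated, whereas the per-age view shows which class dominates at a given age; both are computed from the same table of counts.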

We also notice that the protein chain lengths correlate significantly (Spearman's rank correlation coefficient r = 0.3098, P < 2 × 10^-16) with the ages of CSAs (Fig. 4c). Combining these observations with the assumption that the present-day proteins represent the entire spectrum of proteins at different stages of evolution from their respective CSAs, we propose a scenario for the evolution of protein structural classes: ancestral proteins of small and short secondary structures primarily in three classes (α, β, and α+β classes) evolve to medium-sized proteins of four classes (α, β, α+β, and α/β classes) in roughly similar proportions, then to larger proteins with a preponderance in α/β class, as schematically shown in Fig. 5.
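For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration on made-up inputs; it does not reproduce the reported value, and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation coefficient of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient measures how consistently longer chains go with older CSAs without assuming a linear relationship between chain length and age.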


Fig. 5. Proposed scenario for the evolution of protein structural classes extrapolated from the age map of common structural ancestors. The age map (Fig. 4a) is a snapshot at present time of the global evolutionary process of protein structural classes. The observation that the age map is highly correlated with the chain length map (Fig. 4c) suggests an evolutionary history of structural classes in which ancestral proteins of small and short secondary structures primarily in three classes (α, β, and α+β classes) evolve to medium-sized proteins in four classes (α, β, α+β, and α/β classes) with the least amount in α/β class, then to larger proteins with a preponderance in α/β class.

Evolution of Protein Families: Multiple Birth Model. We have expanded our approach to estimate the evolutionary ages of all curated protein sequence families in Pfam. As was evident from the ages of the CSAs of the proteins with known structural folds, not all present-day proteins evolved from the proteins of the last common ancestor; rather, new CSAs can be traced to various points throughout evolutionary time. Combining this information with the assumption that the protein universe of the present-day organisms represents a time-sliced view of all proteins at their various stages of evolution, we propose a possible scenario for the evolution of protein families as illustrated in Fig. 6. We hypothesize that, although all present-day organisms may have evolved from the last common ancestral organisms by organismic replication from an ancestor to a descendant organism, most of the present-day protein families did not evolve from ancestor proteins that existed in the last common ancestral organism (Fig. 6a), as expected for "the single birth model" of protein family evolution (19); instead, new CSAs were born throughout evolutionary time (Fig. 6b, the multiple birth model of protein evolution), and they evolved to the present-day proteins or died out.


Fig. 6. Model for the evolution of protein families: (a) Single birth model of protein families, where all of the present-day protein families evolved from proteins that existed in the last common ancestral organism. Each colored circle represents a CSA of a protein sequence family of present day, whose evolutionary path is schematically shown by the tree of the same color. The large rectangular box surrounding the circles indicates the single birth event of the CSAs of all protein sequence families, some of which died out. (b) Multiple birth model of protein families, where the CSAs of the present-day protein families (represented by circles in squares) emerged at different points in the evolutionary time.


    Discussion
We emphasize that our studies are aimed at gaining a coarse-grained global view and overall trends associated with the evolution of the protein structure classes and sequence families. In our multistep processes projecting the evolutionary ages onto the protein structural space map, many details, such as the effect of horizontal gene transfer and the sampling of only those globular proteins for which the 3D structures are known, are "smoothed out" to extract the major trends of evolution of the protein structure classes and sequence families. For example, we remove those proteins that may have entered an organism through horizontal gene transfer by the jackknife test as described in Materials and Methods. Some of our conclusions are consistent with those of others. For example, the α/β class proteins as the most ancient proteins have also been suggested by a parsimonious scenario of fold occurrence in genomes (20), and birth, death, and diversification of genes have been described in ref. 21.

Several questions are raised by the features of the protein structural space and its evolutionary implications. Some of them are as follows:

  1. How is the gene for a new CSA born? Because the new CSA has no traceable single ancestral protein, we propose that the new gene for the CSA was constructed of multiple gene fragments, for example, by multiple recombination events mediated by phages, viruses, or other mechanisms.
  2. Is the protein structure space constantly expanding or has it reached an equilibrium state? One possible argument for the equilibrium state is that a newly born protein evolves into gradually larger-sized proteins of improved, neutral, or diversified functions until it reaches an equilibrium, at which point destabilizing effects of the large size (of the protein, thus, its gene) outweigh the additional changes in function or diversity.
  3. What is the implication of Fig. 4b, which reveals three evolutionary stages where the relative abundance of the four major protein structure classes changed their relative ranking? One possible implication is that there were three evolutionary periods when the Earth environment changed dramatically.


    Materials and Methods
Construction of a Phylogenetic Tree Representing All of the Organisms That Contain the Proteins of Known Structures or Their Sequence Homologues. We used the 1,898 protein chains representing a nonredundant set of all of the known protein structures in Protein Data Bank (PDB) [PDB_select 25 data set (22) used by Hou et al. (16) for mapping the protein structure space] as a reference data set. For each chain, we identified the protein domain family in the Pfam database (Release 16.0) (3) to which it belongs and all of the organisms represented by the members of the family. To reconstruct the phylogenetic tree of organisms covering all members of the retrieved protein families, we combined the taxonomic sources of all members of the protein families and extracted nonredundant species (65,532 organisms).

To simplify the tree structure, we grouped the nonredundant organisms at a higher level of taxonomic classification. When the fourth level of taxonomic classification listed in the Pfam database is used for grouping, the final number of taxa is reduced to 468. Among them, the gene sequences of small subunit rRNA (16S rRNA of prokaryotes or 18S rRNA of eukaryotes) of 345 taxa were available in the European ribosomal RNA database (23) (Table 2, which is published as supporting information on the PNAS web site). For each of these 345 taxa, we chose the longest small subunit rRNA sequence (but not shorter than 1,200 bases). Using these prealigned rRNA sequences, a universal tree of life for the 345 taxa was constructed with the neighbor joining (NJ) and maximum likelihood (ML) methods by using the PHYLIP package (24). For the NJ tree, we used 100 bootstrapped sequence replicates and obtained a consensus tree. Because the consensus tree does not produce branch lengths, the branch lengths of the consensus tree were recalculated by the maximum likelihood method while keeping the NJ tree topology. The ML tree was built under the assumption of constant rate with the F84 model of sequence evolution. For both tree-building procedures, we used a bacterial taxon (Aquifex pyrophilus) as an outgroup. Because both trees were topologically similar and the evolutionary ages of protein structural classes calculated by the method (see below) showed no disparity in terms of shape of distribution and overall trend (Fig. 7, which is published as supporting information on the PNAS web site), we selected the ML tree as a reference tree for the purpose of obtaining a global view of the phylogenetic relationship among the organisms at a higher taxonomic level.
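The choice of one representative rRNA sequence per taxon can be sketched as a simple filter. This is a hypothetical illustration: the input format, the function name, and the decision not to count alignment gap characters as bases are our assumptions, not details given in the text.

```python
MIN_LENGTH = 1200  # minimum accepted sequence length in bases, from the text

def pick_representatives(sequences, min_length=MIN_LENGTH):
    """Map each taxon to its longest sequence with at least `min_length` bases.

    `sequences` is an iterable of (taxon, sequence) pairs. Gap characters
    ('-' and '.') from a prealigned file are not counted as bases (assumption).
    """
    best = {}
    for taxon, seq in sequences:
        n_bases = sum(1 for ch in seq if ch not in "-.")
        if n_bases < min_length:
            continue  # too short to be used for this taxon
        if taxon not in best or n_bases > best[taxon][0]:
            best[taxon] = (n_bases, seq)
    return {taxon: seq for taxon, (_, seq) in best.items()}
```

Taxa whose sequences are all shorter than the threshold simply drop out of the result under this sketch.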

Estimating the Relative Age of the CSAs. We make the assumption that the CSA of a given protein sequence family appeared most recently in an organism at the MRCA node and the age of the CSA is represented by the branch length between the MRCA and a common reference node. To determine the MRCA node in the tree, we mapped all members of the Pfam family to which a given protein of known structure belongs on the tree. To remove the effect of horizontal gene transfer on estimating the CSA ages, we first tested the congruency of multiple MRCA nodes of the organisms represented by a protein family by using a jackknife operation: (i) each MRCA node is identified for all member organisms minus one, and the evolutionary age of that MRCA is estimated; (ii) the ages of the multiple MRCAs are examined, and those that are statistically outside of the monomodal distribution are removed; and (iii) the median value of the remaining ages is taken. The ages of CSAs were normalized to be in the range of 0 to 1. These relative ages are assigned to each of the 1,898 nonredundant protein structures in the protein structure space to visualize the major features of the distribution of the ages vs. protein structure classes (see below).
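The jackknife steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the text does not say how ages "statistically outside of the monomodal distribution" are identified, so the rule used here (more than `mad_cutoff` median absolute deviations from the median) is hypothetical, as are the function names. `age_fn` stands for any routine that returns the MRCA age of a list of organisms.

```python
from statistics import median

def jackknife_age(taxa, age_fn, mad_cutoff=3.0):
    """Median of leave-one-out MRCA ages after dropping outlying replicates."""
    taxa = list(taxa)
    if len(taxa) < 3:
        return age_fn(taxa)  # too few members for a leave-one-out test
    # One replicate per member organism, each computed with that member left out.
    ages = [age_fn(taxa[:i] + taxa[i + 1:]) for i in range(len(taxa))]
    centre = median(ages)
    mad = median(abs(a - centre) for a in ages)
    if mad == 0:
        return centre  # most replicates agree exactly; keep their common value
    kept = [a for a in ages if abs(a - centre) <= mad_cutoff * mad]
    return median(kept)

def normalize(ages):
    """Rescale a dict of ages linearly onto the range 0 to 1."""
    lo, hi = min(ages.values()), max(ages.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in ages.items()}
```

A different outlier rule would change which replicates are discarded, so the cutoff should be read as a placeholder rather than as the criterion used in the study.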

Mapping the Relative Ages of CSAs on the Protein Structure Space. Mapping of the relative ages of proteins of known structure is done in two stages: First, 1,898 nonredundant protein structures are positioned (mapped) in the protein structure space based on their all-to-all structural similarities as described in Hou et al. (15, 16) and briefly summarized below. As mentioned earlier, we used the PDB_select 25 data set, which contained 1,949 protein chains with <25% pairwise sequence identity. Of those, 51 chains were further removed because of low resolution or length requirements of the DaliLite (25) program that we used to calculate the similarity of protein structures. The remaining data set has 1,898 chains. The pairwise structural similarity for the 1,898 protein chains was measured by using the DaliLite program. The 1,898 x 1,898 similarity score matrix [s_ij] (where i = 1,...,1,898; j = 1,...,1,898) was converted to a dissimilarity matrix [d_ij], the "distance matrix," by using

Formula 1

where s_99.95 is the 99.95 percentile value of the maximum value among all off-diagonal s_ij values (i.e., i ≠ j). The dissimilarity matrix was then subjected to the classical multidimensional scaling (MDS) procedure (26) to find the positional coordinates in a multidimensional (1,898-dimensional) space of the protein structure universe. We used s_99.95 to prevent a few extremely large similarity scores from dominating the distribution feature of the structural space map. To capture and visualize the major features of the high dimensional space, we represent the protein structure space in three dimensions (Fig. 2) by using the three components with highest eigenvalues, which are substantially greater than the rest.
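Classical MDS itself can be illustrated in plain Python. The sketch below starts from a dissimilarity matrix; the conversion from similarity scores (Formula 1) is not reproduced here. It uses power iteration with deflation as a simple stand-in for a library eigensolver, which assumes the leading eigenvalues are positive and well separated; the function names are hypothetical.

```python
import math

def double_center(d):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix d."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(b, iters=500):
    """Dominant eigenvalue and unit eigenvector of b by power iteration."""
    n = len(b)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v

def classical_mds(d, k=3):
    """Coordinates (n rows, k columns) from the k leading eigenpairs of B."""
    b = double_center(d)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflation: subtract the extracted component before the next pass.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For distances that come from points in a plane, the first two coordinates recover the original pairwise distances up to rotation and reflection. For a matrix as large as the one in the text, a dense symmetric eigensolver would be the practical choice.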

Second, we represent the relative age of each of the nonredundant protein structures by the relative age of the CSA of the sequence family to which that particular protein belongs. Then, we apply population averaging, replacing the age of each CSA by the average age of its 22 nearest neighbors weighted on the distances in the map. The number of nearest neighbors was chosen as the median of the number of statistically significant score pairs (DaliLite z score ≥2) of the 1,898 protein chains. The weighted population averaging is used to visualize the major trends of the "age map of CSAs" and to smooth out the "noise" due to factors such as horizontal transfer of genes, sparse sampling of protein families, and the members of each family (Fig. 4).
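The population averaging step can be sketched as a weighted nearest-neighbor smoother. The weighting scheme is not specified in the text, so inverse-distance weights are an assumption here, as is the exclusion of the point itself from its own neighbor set; the function name and the `eps` guard are hypothetical.

```python
import math

def smooth_ages(coords, ages, k=22, eps=1e-9):
    """Replace each age by the inverse-distance-weighted mean of its k nearest neighbours."""
    n = len(coords)
    if n < 2:
        return list(ages)  # nothing to average over
    k = min(k, n - 1)
    smoothed = []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        nearest = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )[:k]
        weights = [1.0 / (dist + eps) for dist, _ in nearest]
        total = sum(weights)
        smoothed.append(
            sum(w * ages[j] for w, (_, j) in zip(weights, nearest)) / total
        )
    return smoothed
```

With k = 22, as in the text, an isolated outlying age is pulled toward the ages of the surrounding structures, which is what reveals the broad gradient in the age map.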


    Acknowledgements
We thank Drs. Jingtong Hou, Gregory Sims, and Se-Ran Jun for our weekly discussions on the subjects of this work as well as other related subjects and Drs. David Eisenberg, Norman Pace, and Yun S. Song, whose expertise helped us to improve our thoughts, for valuable comments. This work has been supported by National Institutes of Health Grant GM62412.


    Footnotes
 

Abbreviations: CSA, common structural ancestor; MRCA, most recent common ancestor.

‡To whom correspondence should be addressed. E-mail: shkim@lbl.gov

Freely available online through the PNAS open access option.

Author contributions: I.-G.C. and S.-H.K. designed research; I.-G.C. performed research; I.-G.C. and S.-H.K. analyzed data; and I.-G.C. and S.-H.K. wrote the paper.

The authors declare no conflict of interest.

© 2006 by The National Academy of Sciences of the USA


    References

  1. Holm, L & Sander, C. (1996) Science 273, 595–603.
  2. Hawksworth, PM & Kalin-Arroyo, MT. (1995) Global Biodiversity Assessment (Cambridge Univ Press, Cambridge, UK).
  3. Bateman, A, Coin, L, Durbin, R, Finn, RD, Hollich, V, Griffiths-Jones, S, Khanna, A, Marshall, M, Moxon, S & Sonnhammer, ELL, et al. (2004) Nucleic Acids Res 32, D138–D141.
  4. Dayhoff, MO. (1976) Fed Proc 35, 2132–2138.
  5. Wolf, YI, Grishin, NV & Koonin, EV. (2000) J Mol Biol 299, 897–905.
  6. Denton, M & Marshall, C. (2001) Nature 410, 417.
  7. Coulson, AF & Moult, J. (2002) Proteins 46, 61–71.
  8. Andreeva, A, Howorth, D, Brenner, SE, Hubbard, TJP, Chothia, C & Murzin, AG. (2004) Nucleic Acids Res 32, D226–D229.
  9. Levitt, M & Chothia, C. (1976) Nature 261, 552–558.
  10. Richardson, JS. (1977) Nature 268, 495–500.
  11. Murzin, AG, Brenner, SE, Hubbard, T & Chothia, C. (1995) J Mol Biol 247, 536–540.
  12. Chothia, C, Hubbard, T, Brenner, S, Barns, H & Murzin, A. (1997) Annu Rev Biophys Biomol Struct 26, 597–627.
  13. Michie, AD, Orengo, CA & Thornton, JM. (1996) J Mol Biol 262, 168–185.
  14. Orengo, CA, Michie, AD, Jones, S, Jones, DT, Swindells, MB & Thornton, JM. (1997) Structure (London) 5, 1093–108.
  15. Hou, J, Sims, GE, Zhang, C & Kim, SH. (2003) Proc Natl Acad Sci USA 100, 2386–2390.
  16. Hou, J, Jun, SR, Zhang, C & Kim, SH. (2005) Proc Natl Acad Sci USA 102, 3651–3656.
  17. Li, H, Helling, R, Tang, C & Wingreen, N. (1996) Science 273, 666–669.
  18. Tiana, G, Shakhnovich, BE, Dokholyan, NV & Shakhnovich, EI. (2004) Proc Natl Acad Sci USA 101, 2846–2851.
  19. Chothia, C, Gough, J, Vogel, C & Teichmann, SA. (2003) Science 300, 1701–1703.
  20. Winstanley, HF, Abeln, S & Deane, CM. (2005) Bioinformatics 21, Suppl 1, i449–i458.
  21. Koonin, EV, Wolf, YI & Karev, GP. (2002) Nature 420, 218–223.
  22. Hobohm, U & Sander, C. (1994) Protein Sci 3, 522–524.
  23. Wuyts, J, Perriere, G & Van de Peer, Y. (2004) Nucleic Acids Res 32, D101–D103.
  24. Felsenstein, J. (1989) Cladistics 5, 164–166.
  25. Holm, L & Park, J. (2000) Bioinformatics 16, 566–567.
  26. Havel, TF, Kuntz, ID & Crippen, GM. (1983) J Theor Biol 104, 359–381.
