A glimpse at the organization of the protein universe
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
The amino acid sequences of most natural proteins enable them to fold into specific structures that generate biological activity and, simultaneously, to avoid misfolding and aggregation (1). The data available at present suggest that the overall architecture (the “fold”) of these structures is much more highly conserved during evolution than the sequences that encode them. These folds have therefore emerged as ideal candidates for classifying proteins (Fig. 1) and hence for beginning to bring order to the protein universe (2). The continuing advances in structural biology, and particularly the recent emergence of structural genomics initiatives in which particular emphasis is placed on the discovery of new folds (3), are providing an opportunity to build up a comprehensive map of the protein universe. Of particular significance is the fact that the number of distinct structural archetypes, or folds, is thought to be relatively small, fewer than about 10,000 by most estimates, with many different sequences able to encode the same basic fold of the polypeptide chain (4). A key question in the analysis of protein sequences and structures is how they relate to their functions. Clues to the answer will not only begin to enlighten us as to the fundamental organization of the protein universe, and the location within it of natural proteins, but will also provide a means of predicting the functions of those proteins for which this information is not yet defined by experiment. The ability to predict function will be of tremendous value, for example, in interpreting …





