Nature of the protein universe
Contributed by Michael Levitt, May 9, 2009 (received for review April 20, 2009)
Abstract
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by ≈15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.
Footnotes
- ¹E-mail: michael.levitt{at}stanford.edu
- Author contributions: M.L. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.
- The author declares no conflict of interest.
- This article contains supporting information online at www.pnas.org/cgi/content/full/0905029106/DCSupplemental.
- Freely available online through the PNAS open access option.




