Proceedings of the National Academy of Sciences

Vol. 96, Issue 8, 4285-4288, April 13, 1999

Biochemistry
Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles

(genomic / bioinformatics / metabolic pathways / structural complexes)

Matteo Pellegrini*, Edward M. Marcotte*, Michael J. Thompson, David Eisenberg, and Todd O. Yeates†

Molecular Biology Institute, Department of Energy Laboratory of Structural Biology and Molecular Medicine, and Department of Chemistry and Biochemistry, University of California, Box 951570, Los Angeles, CA 90095-1570

Contributed by David S. Eisenberg, January 20, 1999


    ABSTRACT

Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.


    INTRODUCTION

The fully sequenced genomes of numerous organisms offer large amounts of information about cellular biology (see the genomes listed at the web site of The Institute for Genomic Research: www.tigr.org). It is a central challenge of bioinformatics to use this information in discovering the function of proteins. Functional assignments of genes come primarily from biochemical experimentation, which can be extended by matching recently sequenced proteins to those that have already been characterized (1). For the exceptionally well studied genome of Escherichia coli (2), these and related techniques (3, 4) have led to tentative functional assignments of slightly more than half of its proteins (5). The problem of assigning functions to the remaining proteins is addressed here.

Our computational method detects proteins that participate in a common structural complex or metabolic pathway. Proteins within these groups are defined as functionally linked. The underlying hypothesis is that functionally linked proteins evolve in a correlated fashion, and, therefore, they have homologs in the same subset of organisms. For instance, we expect to find flagellar proteins in bacteria that possess flagella but not in other organisms. In short, we show that if two proteins have homologs in the same subset of fully sequenced organisms, they are likely to be functionally linked. We exploit this property systematically to map links between all the proteins coded by a genome. In general, pairs of functionally linked proteins have no amino acid sequence similarity with each other and, therefore, cannot be linked by conventional sequence-alignment techniques.


    METHODS

To represent the subset of organisms that contain a homolog, we constructed a phylogenetic profile for each protein. This profile is a string with n entries, each one bit, where n corresponds to the number of genomes (16 in the present article). We indicate the presence of a homolog to a given protein in the nth genome with an entry of unity at the nth position. If no homolog is found, the entry is zero. Proteins are clustered according to the similarity of their phylogenetic profiles. Similar profiles show a correlated pattern of inheritance and, by implication, functional linkage. The method predicts that the functions of uncharacterized proteins are likely to be similar to those of characterized proteins within the same cluster (Fig. 1).
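The profile construction and the grouping of identical profiles can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the genome list, the protein names P1 to P4, and the homolog assignments are invented toy data, and the function names `build_profile` and `cluster_by_profile` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the article): three comparison genomes in
# a fixed column order, and for each query protein the set of genomes in
# which a homolog was detected.
GENOMES = ["S. cerevisiae", "H. influenzae", "B. subtilis"]
HOMOLOGS = {
    "P1": {"H. influenzae", "B. subtilis"},
    "P2": {"S. cerevisiae", "H. influenzae"},
    "P3": {"H. influenzae", "B. subtilis"},
    "P4": set(),
}


def build_profile(homolog_genomes, genomes):
    """Return a bit string: '1' where a homolog exists, '0' where it does not."""
    return "".join("1" if g in homolog_genomes else "0" for g in genomes)


def cluster_by_profile(profiles):
    """Group protein names that share exactly the same profile string."""
    clusters = defaultdict(list)
    for protein, profile in profiles.items():
        clusters[profile].append(protein)
    return dict(clusters)


PROFILES = {name: build_profile(found, GENOMES) for name, found in HOMOLOGS.items()}
CLUSTERS = cluster_by_profile(PROFILES)

for profile, members in CLUSTERS.items():
    print(profile, members)
```

In this toy example P1 and P3 share a profile and would be treated as candidates for functional linkage. Grouping only exact matches is the simplest form of clustering; profiles that differ by a bit or two need a distance measure.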



Fig. 1.   Our method of analyzing protein phylogenetic profiles is illustrated schematically for the hypothetical case of four fully sequenced genomes (from E. coli, Saccharomyces cerevisiae, Haemophilus influenzae, and Bacillus subtilis) in which we focus on seven proteins (P1-P7). For each E. coli protein, we construct a profile, indicating which genomes code for homologs of the protein. We next cluster the profiles to determine which proteins share the same profiles. Proteins with identical (or similar) profiles are boxed to indicate that they are likely to be functionally linked. Boxes connected by lines have phylogenetic profiles that differ by one bit and are termed neighbors.

We computed phylogenetic profiles for the 4,290 proteins encoded by the genome of E. coli by aligning (6) each protein sequence (Pi) with the proteins from 16 other fully sequenced genomes (listed at the web site of The Institute for Genomic Research: www.tigr.org). Proteins coded by the nth genome are defined as including a homolog of Pi if they align to Pi with a score that is deemed statistically significant.‡


    RESULTS AND DISCUSSION

To test whether proteins with similar phylogenetic profiles are functionally linked, we examined the phylogenetic profiles for two proteins that are known to participate in structural complexes, the ribosome protein RL7 and the flagellar structural protein FlgL, as well as a protein known to participate in a metabolic pathway, the histidine biosynthetic protein HIS5. We first identified all other E. coli ORFs with phylogenetic profiles identical to those of these three proteins and then those ORFs with profiles that differ by one bit. The results are shown in Fig. 2.
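The search described above (identical profiles first, then profiles that differ by one bit) amounts to a Hamming-distance query. The sketch below assumes profiles are stored as equal-length bit strings; the names `hamming` and `profile_neighborhood` and the four example profiles are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def profile_neighborhood(query, profiles, max_bits=1):
    """Split the other proteins into those with an identical profile and
    those within `max_bits` differing bits of the query profile."""
    query_profile = profiles[query]
    identical, near = [], []
    for name, profile in profiles.items():
        if name == query:
            continue
        distance = hamming(query_profile, profile)
        if distance == 0:
            identical.append(name)
        elif distance <= max_bits:
            near.append(name)
    return identical, near


# Hypothetical 4-bit profiles, for illustration only.
toy_profiles = {"Q": "1101", "A": "1101", "B": "1100", "C": "0010"}
identical, near = profile_neighborhood("Q", toy_profiles)
print(identical, near)
```

Here A matches the query exactly, B differs by one bit, and C is excluded because it differs in all four positions.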



Fig. 2.   Proteins with phylogenetic profiles in the neighborhood of ribosomal protein RL7 (A), flagellar structural protein FlgL (B), and histidine biosynthetic protein His5 (C). In each case, we first found all proteins with profiles identical to our query proteins; the proteins we found are shown in the double boxes. We then found all the proteins with profiles that differed from our query proteins by one bit; these are shown in the single boxes. Proteins in bold participate in the same complex or pathway as the query protein, and proteins in italics participate in a different but related complex or pathway. Proteins with identical profiles are shown within the same box. Single lines between boxes represent a one-bit difference between the two profiles. All neighboring proteins whose profiles differ by one bit from the query protein are shown. Homologous proteins are connected by a dashed line or are indented. Each protein is labeled by a four-digit E. coli gene number, a SwissProt gene name, and a brief description. Note that proteins within a box or in boxes connected by a line have similar functions. Hypothetical proteins (i.e., those of unknown function) are prime candidates for functional and structural studies. Proteins in the double boxes in A, B, and C have 11, 6, and 10 ones, respectively, in their phylogenetic profiles, of a possible 16 for the 17 genomes presently sequenced.

Homologs of ribosome protein RL7 are found in 10 of 11 eubacterial genomes and in yeast but not in archaeal genomes. We find that more than half of the E. coli proteins with the RL7 phylogenetic profile or profiles that differ from it by one bit have functions associated with the ribosome (Fig. 2A). Because none of these proteins have significant amino acid sequence similarity to RL7, the functional relationships to the ribosome, had they not been determined already, could not be inferred by sequence comparisons. This finding supports the idea that proteins with similar profiles are likely to belong to a common group of functionally linked proteins. Several other proteins with these profiles have no assigned function and are listed accordingly as hypothetical. The testable prediction of the clustering of phylogenetic profiles is that these as yet uncharacterized proteins have functions associated with the ribosome.

The comparisons of the phylogenetic profiles of flagellar proteins (Fig. 2B) further support the idea that proteins with similar profiles are likely to be functionally linked; 10 flagellar proteins share a common profile. Their homologs are found in a subset of five bacterial genomes: those of Aquifex aeolicus, Borrelia burgdorferi, B. subtilis, Helicobacter pylori, and Mycobacterium tuberculosis. Other proteins that appear in neighboring clusters (groups of proteins that share a common profile) include various flagellar proteins and cell-wall maintenance proteins. Flagellar and cell-wall maintenance proteins may be biochemically linked, because flagella are inserted through the cell wall. For example, the lytic murein transglycosylase MltD has a phylogenetic profile that differs by only one bit from that of the flagellar structural protein FlgL. This transglycosylase cuts the cell wall for unknown reasons. Therefore, another prediction is that this enzyme may participate in flagellar assembly.

Fig. 2 A and B include proteins in structural complexes, whereas Fig. 2C shows proteins involved in amino acid metabolism. We find that more than half of the proteins with phylogenetic profiles similar (within one bit) to that of the histidine synthesis protein His5 are involved in amino acid metabolism. With the 16 currently available fully sequenced genomes, however, phylogenetic profiles are not able to separate the metabolic pathways of specific amino acids. Instead, because of the limitations of currently available data, a histidine biosynthesis protein seems to have the same profile as a tryptophan, arginine, and cysteine synthesis protein. It is probable that, as more genomes are fully sequenced and the number of entries in phylogenetic profiles is increased, similar but distinct amino acid metabolic pathways will cluster separately in phylogenetic-profile space.

The examples included in Fig. 2 show that proteins with phylogenetic profiles similar to a query protein are likely to be functionally linked with it. We next show the converse: that groups of proteins known to be functionally linked often have similar phylogenetic profiles. As shown in Table 1, we chose groups of E. coli proteins that share a common keyword in their SwissProt (7) annotation, reflecting well known families of functionally linked proteins. Because homologous proteins coded by the same genome necessarily have similar profiles, they were eliminated from the groups. For each group, we computed the number of protein pairs that are neighbors; neighbors were defined as proteins whose profiles differ by fewer than 3 bits. For a group of n proteins there are at most [n(n - 1)]/2 possible neighbor pairs.
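The neighbor count for a group can be sketched as follows, taking "fewer than 3 bits" to mean a Hamming distance of at most 2. The function names and the four example profiles are hypothetical and are not taken from Table 1.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_neighbor_pairs(profiles, max_diff=2):
    """Count unordered pairs whose profiles differ by at most `max_diff` bits."""
    return sum(1 for a, b in combinations(profiles, 2) if hamming(a, b) <= max_diff)


def max_pairs(n):
    """Upper bound on the number of pairs in a group of n proteins: n(n - 1)/2."""
    return n * (n - 1) // 2


# Hypothetical group of four 4-bit profiles.
group = ["1111", "1110", "1100", "0000"]
print(count_neighbor_pairs(group), "of", max_pairs(len(group)), "possible pairs")
```

For this toy group, four of the six possible pairs are neighbors; the two pairs that fail are those separated by 3 or 4 bits.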


                              
Table 1.   Phylogenetic profiles link proteins with similar keywords

The similarity of the phylogenetic profiles of the proteins that share a common keyword was evaluated by a statistical test; we compared the number of neighbors found in our keyword groups to the average number of neighbors found in a group of the same size but with randomly selected E. coli proteins. We found that, on average, the random sets contain very few neighbors compared with the keyword groups, even though the keyword groups contain only a fraction of all possible neighbor pairs. Thus, proteins that are functionally linked are far more likely to be neighbors in profile space than randomly selected proteins. However, we find only a fraction of all possible neighbors within a group. Therefore, not all functionally linked proteins have similar profiles; they may fall into multiple clusters in profile space. It is interesting to note that hypothetical proteins are also more likely to be neighbors than random proteins, suggesting that many hypothetical proteins are part of uncharacterized pathways or complexes.
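A comparison against random groups of the same size can be sketched as a simple resampling loop. The article does not state how many random groups were drawn or how they were sampled, so the number of trials, the fixed seed, and sampling without replacement are all assumptions here, as are the function names and the toy pools.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_neighbor_pairs(profiles, max_diff=2):
    """Count unordered pairs whose profiles differ by at most `max_diff` bits."""
    return sum(1 for a, b in combinations(profiles, 2) if hamming(a, b) <= max_diff)


def mean_random_neighbor_pairs(pool, group_size, trials=1000, seed=0):
    """Average neighbor-pair count over `trials` random groups of `group_size`
    profiles drawn without replacement from `pool` (assumed sampling scheme)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(trials):
        total += count_neighbor_pairs(rng.sample(pool, group_size))
    return total / trials


# Hypothetical pools: one where every profile is identical, and one where
# every pair of profiles is at least 3 bits apart.
tight_pool = ["0000"] * 10
sparse_pool = ["00000000", "11100000", "00011100", "11111111"]
print(mean_random_neighbor_pairs(tight_pool, 4))
print(mean_random_neighbor_pairs(sparse_pool, 3))
```

A keyword group would then be compared by computing `count_neighbor_pairs` on its own profiles and checking it against the random-group average for the same group size.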

A second indication that functionally linked proteins are likely to have similar phylogenetic profiles comes from the analysis of classes of proteins obtained from the EcoCyc library (Encyclopedia of Escherichia coli Genes and Metabolism, ref. 8). We selected several classes that contain more than 10 members and that represent well known biochemical pathways. The results of our analysis are listed in Table 2. The conclusions that we draw from this analysis are similar to those found with the keyword groups: members of the group are far more likely to have neighboring profiles than members of a randomly selected control group.


                              
Table 2.   Phylogenetic profiles link proteins in EcoCyc classes

Finally, we attempted to determine the ability of our method to predict the function of uncharacterized proteins. We equate the function of a protein with that of its neighbors in phylogenetic-profile space. This equation is accomplished by means of the keyword annotations found in the SwissProt database. To test the efficacy of this method, we compared the keywords of each characterized protein to those of the neighbors in phylogenetic-profile space. All of the neighbors, in this case, were other proteins with identical profiles. We found that on average 18% of the neighbor keywords overlapped the known keywords of the query protein. By comparison, random proteins had only a 4% overlap with the same set of neighbors. We make the rough estimate that, for more than half of E. coli proteins, we can assign the general function correctly by examining the functions of their phylogenetic-profile neighbors. This estimate should also hold true for the ability of phylogenetic profiles to assign functions to uncharacterized proteins.
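One way to read the keyword comparison is as the fraction of keywords carried by a protein's profile neighbors that also appear among the protein's own keywords. The article does not give the exact formula, so the definition below is an assumption, and the function name and keyword lists are hypothetical examples rather than actual SwissProt entries.

```python
def keyword_overlap(query_keywords, neighbor_keyword_lists):
    """Fraction of all neighbor keywords (pooled over neighbors) that also
    occur among the query protein's keywords. Assumed definition."""
    query = set(query_keywords)
    pooled = [kw for keywords in neighbor_keyword_lists for kw in keywords]
    if not pooled:
        return 0.0  # no neighbor annotations to compare against
    return sum(kw in query for kw in pooled) / len(pooled)


# Hypothetical annotations for a query protein and its three neighbors.
query_keywords = ["Ribosomal protein", "RNA-binding"]
neighbor_keywords = [
    ["Ribosomal protein"],
    ["RNA-binding", "Transferase"],
    ["Hypothetical protein"],
]
print(keyword_overlap(query_keywords, neighbor_keywords))
```

Averaging this quantity over all characterized proteins, and repeating it with randomly chosen query proteins against the same neighbors, would give the kind of comparison reported in the text (18% versus 4%).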


    CONCLUSION

The phylogenetic profile of a protein describes the presence or absence of homologs in organisms. Proteins that make up multimeric structural complexes are likely to have similar profiles. Also, proteins that are known to participate in a given biochemical pathway are likely to be neighbors in the space of phylogenetic profiles. These findings indicate that comparing profiles is a useful tool for identifying the complex or pathway in which a protein participates. Finally, we were able to make functional assignments of uncharacterized proteins by examining the function of proteins with identical phylogenetic profiles.

As the number of fully sequenced genomes increases, scientists will be able to construct longer and potentially more informative protein phylogenetic profiles. There are at least 100 genome projects underway that are due to be completed within the next few months. These data will enable the construction of profiles 100 bits rather than 16 bits in length. Because the number of profile patterns grows exponentially with the number of fully sequenced genomes, the results of 100-bit comparisons should be considerably more informative than those with 16 bits. Furthermore, because the newly sequenced genomes will include several eukaryotic organisms, protein phylogenetic profiles also should become a useful tool for studying structural complexes and metabolic pathways in these higher organisms.
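The exponential growth mentioned above follows from the fact that a profile of n bits can take 2^n distinct patterns. The short calculation below illustrates the difference between 16 and 100 genomes; the function name is hypothetical.

```python
def n_profile_patterns(n_genomes):
    """Number of distinct presence/absence patterns for `n_genomes` bits."""
    return 2 ** n_genomes


patterns_16 = n_profile_patterns(16)    # 65,536 possible profiles
patterns_100 = n_profile_patterns(100)  # roughly 1.27e30 possible profiles
print(patterns_16)
print(f"{patterns_100:.3e}")
```

With 4,290 E. coli proteins and only 65,536 possible 16-bit patterns, unrelated proteins can share a profile by chance; at 100 bits the pattern space is so much larger that chance matches should become far less likely.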


    ACKNOWLEDGEMENTS

This work was supported by a postdoctoral fellowship from the Sloan Foundation and the Department of Energy (to M.P.), by a Hollaender postdoctoral fellowship from the Department of Energy and the Oak Ridge Institute for Science and Education (to E.M.M.), and by grants from the Department of Energy and the National Institutes of Health.


    Note Added in Proof

A data structure similar to the one described in Methods has been proposed independently by Regan and Gaasterland (9) for describing the distribution of proteins in genomes.


    ABBREVIATION

EcoCyc, Encyclopedia of Escherichia coli Genes and Metabolism.


    FOOTNOTES

* M.P. and E.M.M. contributed equally to this work.

† To whom reprint requests should be addressed. e-mail: yeates@mbi.ucla.edu.

‡ The statistical significance of an alignment score is described by the probability (P) of obtaining a higher score when the sequences are shuffled. To compute a P value threshold, we first consider the total number of sequence comparisons that we are performing. If there are n proteins in E. coli and m in all other genomes, this number is n × m. If we were to compare this number of random sequences, we would expect one pair to yield a P value of 1/(n × m) by chance. We, therefore, set this P value as our threshold.
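The threshold in this footnote can be written as a small calculation. The value of n below is the E. coli protein count given in Methods; the value of m is a made-up placeholder, because the article does not state the total number of proteins in the other genomes. Treating a P value exactly equal to the threshold as significant is also an assumption.

```python
def p_value_threshold(n_query_proteins, m_other_proteins):
    """P value expected once by chance among n x m comparisons: 1 / (n x m)."""
    return 1.0 / (n_query_proteins * m_other_proteins)


def is_significant(p_value, n_query_proteins, m_other_proteins):
    """True if an alignment's P value is at or below the threshold
    (the handling of the boundary case is an assumption)."""
    return p_value <= p_value_threshold(n_query_proteins, m_other_proteins)


N_ECOLI = 4290           # E. coli proteins, from Methods
M_OTHER = 30000          # hypothetical placeholder, not from the article

threshold = p_value_threshold(N_ECOLI, M_OTHER)
print(f"threshold = {threshold:.2e}")
print(is_significant(1e-12, N_ECOLI, M_OTHER))
print(is_significant(1e-3, N_ECOLI, M_OTHER))
```

The effect is that the cutoff tightens automatically as more genomes, and therefore more comparisons, are added.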


    REFERENCES

1. Bork, P., Dandekar, T., Diaz-Lazcoz, Y., Eisenhaber, F., Huynen, M. & Yuan, Y. (1998) J. Mol. Biol. 283, 707-725.
2. Blattner, F. R., Plunkett, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., et al. (1997) Science 277, 1453-1474.
3. Tatusov, R. L., Mushegian, A. R., Bork, P., Brown, N. P., Hayes, W. S., Borodovsky, M., Rudd, K. E. & Koonin, E. V. (1996) Curr. Biol. 6, 279-291.
4. Andrade, M. A. & Sander, C. (1997) Curr. Opin. Biotechnol. 8, 675-683.
5. Riley, M. (1998) Nucleic Acids Res. 26, 54.
6. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402.
7. Bairoch, A. & Apweiler, R. (1998) Nucleic Acids Res. 26, 38-42.
8. Karp, P., Riley, M., Paley, S. & Pellegrini-Toole, A. (1998) Nucleic Acids Res. 26, 50-53.
9. Gaasterland, T. & Regan, M. A. (1998) Microb. Comp. Genomics 3, 177-192.

Copyright © 1999 by The National Academy of Sciences  0027-8424/99/964285-4$2.00/0