Identification of direct residue contacts in protein–protein interaction by message passing

  1. Martin Weigt (a,b,1),
  2. Robert A. White (a,c,1),
  3. Hendrik Szurmant (c),
  4. James A. Hoch (c,2), and
  5. Terence Hwa (a,2)

  (a) Center for Theoretical Biological Physics and Department of Physics, University of California at San Diego, La Jolla, CA 92093-0374;
  (b) Institute for Scientific Interchange, Viale S. Severo 65, I-10133 Torino, Italy; and
  (c) Division of Cellular Biology, Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037

Edited by Alan Fersht, University of Cambridge, Cambridge, United Kingdom, and approved November 11, 2008 (received for review June 18, 2008)

  (1) M.W. and R.A.W. contributed equally to this work.

Abstract

Understanding the molecular determinants of specificity in protein–protein interaction is an outstanding challenge of postgenome biology. The availability of large protein databases generated from sequences of hundreds of bacterial genomes enables various statistical approaches to this problem. In this context, covariance-based methods have been used to identify correlations between amino acid positions in interacting proteins. However, these methods have an important shortcoming: they cannot distinguish between directly and indirectly correlated residues. We developed a method that combines covariance analysis with global inference analysis, adapted from statistical physics. Applied to a set of >2,500 representatives of the bacterial two-component signal transduction system, the combination of covariance with global inference successfully and robustly identified residue pairs that are proximal in space without resorting to ad hoc tuning parameters, both for heterointeractions between sensor kinase (SK) and response regulator (RR) proteins and for homointeractions between RR proteins. The spectacular success of this approach illustrates the effectiveness of the global inference approach in identifying direct interactions based on sequence information alone. We expect this method to be applicable soon to interaction surfaces between proteins present in only 1 copy per genome as the number of sequenced genomes continues to expand. Use of this method could significantly increase the potential targets for therapeutic intervention, shed light on the mechanism of protein–protein interaction, and establish the foundation for the accurate prediction of interacting protein partners.
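
To make the covariance step concrete, the following minimal sketch computes the mutual information between two columns of a sequence alignment, a common covariance measure. It is an illustration only, not the authors' implementation: the function name `mutual_information` and the toy columns are hypothetical, and the sketch covers only the pairwise covariance part, not the global inference (message passing) that the paper combines with it.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns.

    col_a and col_b are equal-length sequences of residue symbols,
    one symbol per aligned sequence. Frequencies are raw counts with
    no pseudocounts or sequence reweighting (a simplifying assumption).
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy columns taken from four aligned sequences.
covarying = mutual_information("AAVV", "DDEE")    # residues change together
independent = mutual_information("AAVV", "KRKR")  # no statistical dependence
```

A high value for a column pair indicates only that the two positions covary. As the abstract notes, such a pairwise score cannot by itself tell whether the correlation is direct or mediated by other positions, which is the gap the global inference step is meant to close.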

Footnotes

  • (2) To whom correspondence may be addressed. E-mail: hoch@scripps.edu or hwa@ucsd.edu
  • Author contributions: M.W., R.A.W., and T.H. designed research; M.W. and R.A.W. performed research; M.W. and R.A.W. contributed new reagents/analytic tools; M.W., R.A.W., H.S., J.A.H., and T.H. analyzed data; and M.W., R.A.W., H.S., J.A.H., and T.H. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/cgi/content/full/0805923106/DCSupplemental.

  • Despite significant structural homology, sequence homology between the Spo0B interaction domain and the HisKA domain is poor (E = 0.5 for HMM match to Spo0B) and only SK residues on the α1-helix can be reliably matched to Spo0B.

  • Interestingly, all false positives within the first 60 pairs include residues localized to the α1-helix. A rationale for the occurrence of these apparent false positives is given in the legend of Table S1.

  • ArcA numbering used throughout for clarity; for accurate MicA and PhoP numbering, deduct 2 from the ArcA numbering.

  • The possible E86-R108 salt bridge is not realized in the ArcA structure because of a likely crystallographic artifact. In the crystal lattice, residue R108 forms a salt bridge with an aspartyl residue in a neighboring ArcA dimer, a contact not available in solution (data not shown).
