A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation

Brylinski and Skolnick. 10.1073/pnas.0707684105.

Supporting Information

Files in this Data Supplement:

SI Figure 5
SI Text
SI Figure 6
SI Figure 7
SI Figure 8
SI Figure 9
SI Table 1




SI Figure 5

Fig. 5. The performance of FINDSITE and LIGSITECSC is compared to that of randomly selected patches on a target protein surface. Predictions were made using the top-ranked PROSPECTOR_3 templates (A) and MODELLER models (B). The results are presented as the cumulative fraction of proteins for which the distance between the center of mass of the ligand in the native complex and the center of the best of the top five predicted binding sites is ≤ the distance on the x axis (Left), and as the rank of the best pocket selected from the top five predictions (Right).





SI Figure 6

Fig. 6. The average distance between the centers of predicted and observed pockets vs. the number of holo-templates (including homologous proteins for each template) that share a common ligand-binding site (cluster multiplicity) is presented for the top-ranked FINDSITE predictions for crystal structures and protein models. Based on the cluster multiplicity, proteins can be classified as Easy, Medium, and Hard targets for threading-based ligand-binding site prediction.





SI Figure 7

Fig. 7. The performance of ligand-binding site prediction on the negative dataset of protein-protein interfaces is assessed in terms of the fraction of residues involved in protein-protein interactions predicted to be ligand binding. We show the worst of the top five solutions obtained for crystal structures (A), top-ranked PROSPECTOR_3 templates (B), and TASSER models (C). The inset shows the failure rate for the top predictions.





SI Figure 8

Fig. 8. The fingerprint profiles for the HIV-1 protease inhibitor activity class: the MDDR (A) and MDDR-PDB (B) profiles are compared to those predicted by FINDSITE using all threading holo-templates (including proteins homologous to HIV-1 protease, C) as well as the threading templates with <35% sequence identity (D). Frequency stands for the relative frequency with which a particular bit position is set on in the query compounds (either known actives present in MDDR and MDDR-PDB or ligands predicted by FINDSITE). Only frequencies >0.5 are shown.





SI Figure 9

Fig. 9. The enrichment behavior is presented for known HIV-1 protease inhibitors (the MDDR and MDDR-PDB sets) and for the ligand templates predicted by FINDSITE from either the homologous or the weakly homologous (<35% sequence identity to HIV-1 protease) set of threading templates. The area shaded in gray corresponds to the enrichment that would be obtained by a random ranking of compounds in the screening library.





SI Table 1

Table 1. FINDSITE precision and sensitivity are presented for the 20 most accurately predicted molecular functions in the benchmark dataset, assigned using the Gene Ontology classification

Top 20 function predictions

| GO identifier | Description | Frequency in the dataset | Precision (PPV) | Sensitivity (TPR) |
|---|---|---|---|---|
| GO:0003824 | Enzyme activity | 0.81 | 0.93 | 0.89 |
| GO:0005488 | Ligand binding | 0.55 | 0.84 | 0.72 |
| GO:0016740 | Transferase activity | 0.27 | 0.75 | 0.65 |
| GO:0016787 | Hydrolase activity | 0.23 | 0.86 | 0.67 |
| GO:0016491 | Oxidoreductase activity | 0.20 | 0.81 | 0.75 |
| GO:0043167 | Ion binding | 0.30 | 0.82 | 0.56 |
| GO:0046872 | Metal ion binding | 0.19 | 0.83 | 0.54 |
| GO:0000166 | Nucleotide binding | 0.18 | 0.79 | 0.82 |
| GO:0043169 | Cation binding | 0.17 | 0.86 | 0.61 |
| GO:0017076 | Purine nucleotide binding | 0.17 | 0.77 | 0.83 |
| GO:0030554 | Adenyl nucleotide binding | 0.14 | 0.80 | 0.80 |
| GO:0046914 | Transition metal ion binding | 0.15 | 0.86 | 0.64 |
| GO:0005524 | ATP binding | 0.14 | 0.79 | 0.83 |
| GO:0016772 | Transferase activity, transferring phosphorus-containing groups | 0.12 | 0.74 | 0.59 |
| GO:0005515 | Protein binding | 0.10 | 0.36 | 0.11 |
| GO:0016301 | Kinase activity | 0.09 | 0.71 | 0.66 |
| GO:0005506 | Iron ion binding | 0.09 | 0.84 | 0.73 |
| GO:0003676 | Nucleic acid binding | 0.08 | 0.67 | 0.37 |
| GO:0048037 | Cofactor binding | 0.07 | 0.67 | 0.36 |
| GO:0046906 | Tetrapyrrole binding | 0.06 | 0.88 | 0.92 |

Top 20 Matthews correlation coefficients

| GO identifier | Description | MCC | Precision (PPV) | Sensitivity (TPR) |
|---|---|---|---|---|
| GO:0019825 | Oxygen binding | 1.00 | 1.00 | 1.00 |
| GO:0004879 | Ligand-dependent nuclear receptor activity | 1.00 | 1.00 | 1.00 |
| GO:0004601 | Peroxidase activity | 1.00 | 1.00 | 1.00 |
| GO:0004146 | Dihydrofolate reductase activity | 1.00 | 1.00 | 1.00 |
| GO:0004114 | 3',5'-Cyclic-nucleotide phosphodiesterase activity | 1.00 | 1.00 | 1.00 |
| GO:0003707 | Steroid hormone receptor activity | 1.00 | 1.00 | 1.00 |
| GO:0016410 | N-acyltransferase activity | 0.96 | 0.92 | 1.00 |
| GO:0008080 | N-acetyltransferase activity | 0.96 | 0.92 | 1.00 |
| GO:0030170 | Pyridoxal phosphate binding | 0.94 | 1.00 | 0.89 |
| GO:0004497 | Monooxygenase activity | 0.93 | 1.00 | 0.86 |
| GO:0004194 | Pepsin A activity | 0.91 | 0.83 | 1.00 |
| GO:0004521 | Endoribonuclease activity | 0.89 | 0.80 | 1.00 |
| GO:0046906 | Tetrapyrrole binding | 0.89 | 0.88 | 0.92 |
| GO:0020037 | Heme binding | 0.89 | 0.88 | 0.92 |
| GO:0004190 | Aspartic-type endopeptidase activity | 0.87 | 0.88 | 0.88 |
| GO:0008081 | Phosphoric diester hydrolase activity | 0.87 | 1.00 | 0.75 |
| GO:0004112 | Cyclic-nucleotide phosphodiesterase activity | 0.87 | 1.00 | 0.75 |
| GO:0004197 | Cysteine-type endopeptidase activity | 0.84 | 1.00 | 0.71 |
| GO:0016407 | Acetyltransferase activity | 0.84 | 0.85 | 0.85 |
| GO:0016747 | Transferase, groups other than amino-acyl groups | 0.84 | 1.00 | 0.71 |


SI Table 2

Table 2. The molecular function according to Gene Ontology is compared to that assigned by FINDSITE to the HIV-1 protease sequence using the homologous as well as the weakly homologous (<35% sequence identity) set of ligand-bound threading templates

| GO term | Gene Ontology | FINDSITE prediction, homologous set | FINDSITE prediction, weakly homologous set |
|---|---|---|---|
| GO:0003824 (catalytic activity) | + | + | + |
| GO:0016787 (hydrolase activity) | + | + | + |
| GO:0008233 (peptidase activity) | + | + | + |
| GO:0004175 (endopeptidase activity) | + | + | + |
| GO:0004190 (aspartic-type endopeptidase activity) | + | + | + |
| GO:0004194 (pepsin A activity) | - | - | + |





SI Text

1. Benchmark set of protein-ligand complexes

Protein structures determined by x-ray crystallography to a resolution ≤2.5 Å that are 50-400 residues in length were selected from the Protein Data Bank (1) using the following criteria. Organic molecules, cofactors, single nucleotides, and short peptides composed of standard or modified amino acids are considered as ligands. To exclude very small as well as very large molecules, we set the minimum and maximum number of ligand atoms to 6 and 100, respectively. Nonspecifically bound ligands that form contacts with fewer than six protein residues, as well as covalently bound ligands that usually require enzymatic action to bind, were rejected. Interatomic contacts are calculated by the LPC algorithm (2, 3). In contrast to methods that determine a contact between two units (amino acids, ligands) using interatomic distances, LPC is based on interatomic contact surface analysis. Moreover, to simplify the evaluation of results, we rejected structures that contain more than one ligand in a binding pocket. Subsequently, the redundant set of protein-ligand complexes was subjected to a single-linkage clustering procedure, using a cutoff of 35% amino acid sequence identity between clusters. From each cluster, the centroid was selected as a representative protein. In addition, homologous proteins (members of the same cluster) were accepted into the dataset if their ligands were found to occupy different binding pockets and to have relatively low chemical similarity (Tanimoto coefficient (4), TC, below 0.5). Finally, we selected those proteins for which at least one holo-template is present in the PDB. In this manner, a representative benchmark dataset of 901 protein-ligand complexes was compiled and is available at http://cssb.biology.gatech.edu/skolnick/files/FINDSITE.
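The single-linkage step described above can be illustrated with a minimal sketch. It assumes that pairwise sequence identities are already available through a caller-supplied function (here the hypothetical `identity`), and that two proteins are linked when their identity is at or above the cutoff; whether the original procedure used a strict or non-strict comparison is not stated in the text.

```python
from itertools import combinations


def single_linkage_clusters(names, identity, cutoff=0.35):
    """Group proteins so that any pair with identity >= cutoff shares a cluster.

    `identity(a, b)` is a hypothetical callback returning the fractional
    sequence identity of two proteins; it is not part of FINDSITE.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if identity(a, b) >= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy identities: A-B and B-C are linked, so A, B and C form one cluster
# even though A and C alone fall below the cutoff.
toy = {frozenset("AB"): 0.60, frozenset("BC"): 0.40}


def toy_identity(a, b):
    return toy.get(frozenset((a, b)), 0.10)


toy_clusters = single_linkage_clusters(["A", "B", "C", "D"], toy_identity)
```

Single linkage chains clusters through any one sufficiently similar pair, which is why a representative then has to be chosen per cluster.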

2. Overview of PROSPECTOR_3 threading algorithm

Threading or fold recognition is a technique to match sequences to proteins adopting very similar structures, where a template sequence does not have to be evolutionarily related to the target sequence (5). In general, for a given target sequence, template structures are identified from the library of known protein structures (the fold library) by threading the probe sequence through the template structure and selecting the best alignment of the target sequence to the template structure as assessed by a scoring function (6, 7). FINDSITE uses structure templates selected from a non-redundant PDB library by the threading algorithm PROSPECTOR_3 (8, 9), which was designed to identify analogous as well as homologous templates. On the basis of the score significance and the consensus of template alignments, target proteins are categorized according to the confidence of the threading prediction. Score significance is evaluated in terms of the Z-score of the sequence assigned to a given structure based on the average of the best alignment given by Dynamic Programming over the template library. Threading templates selected by PROSPECTOR_3 are ranked according to their Z-score. FINDSITE requires threading templates with Z-scores ≥4.
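As an illustration of Z-score ranking and the Z-score ≥4 requirement, the sketch below standardizes raw alignment scores against the mean and standard deviation over a template library. This is a generic Z-score, not PROSPECTOR_3's actual scoring code; the function names, the use of the population standard deviation, and the toy scores are all assumptions.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Map template name -> Z-score of its raw score over the whole library."""
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {name: (score - mu) / sigma for name, score in raw_scores.items()}


def select_templates(raw_scores, z_min=4.0):
    """Return (template, Z-score) pairs with Z >= z_min, best first."""
    ranked = sorted(z_scores(raw_scores).items(), key=lambda kv: kv[1], reverse=True)
    return [(name, z) for name, z in ranked if z >= z_min]


# Toy library: 30 background templates with similar scores and one clear hit.
toy_library = {f"bg{i}": 9.0 if i % 2 else 11.0 for i in range(30)}
toy_library["hit"] = 40.0
selected = select_templates(toy_library)
```

In this toy library only the outlier passes the threshold; the background templates all have Z-scores near zero.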

3. Structure alignment by TM-align

Traditional structural superposition requires a priori knowledge of equivalent residues. The most commonly used metric is the rmsd, after the optimal translation and rotation of one structure with respect to the other (10). However, even for proteins sharing the same global topology, the rmsd could be misleadingly high due to even a small number of local deviations or different protein lengths. A number of structure comparison algorithms were proposed to overcome these problems (11, 12). In contrast to simple structure superpositioning, protein structure alignment approaches attempt to establish equivalences between a pair of structures based on their three-dimensional conformation where the equivalent positions are not a priori given. Various structural alignment algorithms using different search algorithms and scoring functions have been developed to identify the optimal structure alignment (13-15).

FINDSITE uses the TM-align structure alignment program (13) to superimpose ligand-bound templates identified by PROSPECTOR_3 onto a putative (or native) target structure. TM-align is an extension of STRUCTAL (11) and SAL (16) that combines the TM-score rotation matrix and Dynamic Programming. The main advantage of TM-align is that the TM-score rotation matrix is used instead of the RMSD rotation matrix. The TM-score weights small inter-structural distances more strongly than large distances; thus, it is more sensitive to the global structure topology than the RMSD rotation matrix and provides more accurate structure alignments (13). The accuracy of structure alignment is crucial for binding site detection by FINDSITE, which clusters binding pockets upon superimposition of the identified threading template structures. Good structure alignments provided by TM-align ensure that common binding pockets are located in spatial proximity in the superimposed holo-templates. The second important factor is the reference structure used for template superposition. If the crystal structure is unavailable, a protein model can be used as the reference structure. However, an inaccurate model can result in an imprecise structure alignment that would affect the detection and ranking of binding sites. We observed that FINDSITE tolerates an rmsd from the crystal structure of 8-10 Å. The insensitivity of FINDSITE to the structural distortion of protein models used as reference structures may be explained by TM-align's ability to find reliable structural alignments even for models with a high rmsd from the crystal structure. In addition, we used a relatively large cutoff distance of 8 Å to cluster the templates' binding pockets to tolerate some errors in the template structure.
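To make the contrast between the TM-score and the rmsd concrete, the following sketch evaluates both for a fixed list of distances between aligned residue pairs. It uses the published TM-score form (12), TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L - 15)^(1/3) - 1.8, but it does not search for the optimal superposition as TM-align does; the function names and the toy distances are illustrative assumptions.

```python
import math


def tm_score(distances, target_length):
    """TM-score for given aligned-pair distances (in Å) under one fixed superposition.

    Assumes target_length is large enough for d0 to be positive
    (roughly more than 18 residues).
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy case: a 100-residue protein with a well-superimposed core (90 residues
# at 1 Å) and a displaced tail (10 residues at 20 Å).
toy_distances = [1.0] * 90 + [20.0] * 10
toy_rmsd = rmsd(toy_distances)
toy_tm = tm_score(toy_distances, target_length=100)
```

For these toy distances the rmsd is about 6.4 Å, dominated by the ten displaced residues, while the TM-score stays at about 0.84 because each large distance contributes almost nothing to the sum.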

4. PROSITE ligand-binding/interaction motifs

The PROSITE database (17) is an annotated collection of motif descriptors, i.e., patterns and generalized profiles, designed to identify conserved regions in protein sequences. In this study, we used patterns rather than profiles; the latter are generally less specific and are aimed at characterizing protein domains over their entire length. First, from all PROSITE patterns (1319 patterns, release 20.11, 01-May-2007), we selected those describing biologically significant regions and residues involved in ligand binding (90 patterns). Next, we used the selected patterns to detect ligand-binding signatures in the target sequences using the ScanProsite tool (18), and the best identified patterns were taken into consideration.
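For readers unfamiliar with PROSITE pattern syntax, the sketch below converts a pattern into a Python regular expression and scans a sequence with it. It is a simplified stand-in for ScanProsite, not its implementation: it handles `x`, `[..]`, `{..}`, repeat counts, and the `<` and `>` anchors outside brackets only. The example pattern is the well-known P-loop motif (PS00017), chosen for illustration and not necessarily one of the 90 patterns used in this study; the test sequence is made up.

```python
import re

# One pattern element: optional '<', a residue specification, an optional
# repeat count such as (4) or (2,4), and an optional '>'.
_ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a (simplified) PROSITE pattern into a regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, residue, low, high, end = match.groups()
        if residue == "x":
            core = "."                            # any amino acid
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"     # any amino acid except these
        else:
            core = residue                        # single letter or [ABC] set
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
hit = re.search(p_loop, "MSGAPGSGKSTLA")  # made-up sequence
```

Here `hit.group()` is the matched segment and `hit.start()` its zero-based offset in the sequence.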

5. Evaluation metrics for prediction accuracy

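The accuracy of predictions made by ScanProsite and FINDSITE was assessed by the following evaluation metrics:

Accuracy: $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$

Specificity: $\mathrm{SPC} = \frac{TN}{TN + FP}$

Sensitivity: $\mathrm{TPR} = \frac{TP}{TP + FN}$

Precision: $\mathrm{PPV} = \frac{TP}{TP + FP}$

Matthews correlation coefficient: $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.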
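The five metrics follow their standard confusion-matrix definitions, which the sketch below computes from the four counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
import math


def _ratio(numerator, denominator):
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, sensitivity, precision and MCC from raw counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy counts for one GO term across 100 proteins.
toy_metrics = evaluation_metrics(tp=8, tn=85, fp=2, fn=5)
```

Unlike accuracy, the MCC stays informative when positives are rare, which is the case for most of the low-frequency GO terms listed in SI Table 1.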

6. Protein structure modeling

TASSER (19-21) and MODELLER9v1 (22, 23) were used to generate weakly homologous protein models from threading templates identified by PROSPECTOR_3. Only threading templates with <35% sequence similarity to the target protein were used in the modeling procedure.

TASSER is a coarse-grained, threading-based structure assembly/refinement procedure. Initial full-length models built up from threading templates are submitted to TASSER's parallel hyperbolic Monte Carlo algorithm (24), which utilizes 40-80 replicas (with the number of replicas dependent on target protein size), each simulated at a different temperature. The simulation procedure consists of 200 MC steps for each replica and 1000 attempts at replica exchange. Subsequently, the structures generated for the 16 lowest-temperature replicas are subjected to SPICKER (25), an iterative clustering procedure. The cluster centroid of the top cluster selected by SPICKER is taken as the final model of a target protein. Lastly, for each final model, the all-atom representation is reconstructed from the Cα coordinates using PULCHRA (26).

MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints. All-atom models are calculated based on the alignment of a sequence to be modeled with the template structure. In our study, MODELLER was provided with the same input alignment generated by PROSPECTOR_3. However, only threading templates with a Z-score ≥7.5 are used in the modeling procedure, since we found that spatial restraints derived from templates with a Z-score <7.5 frequently cannot be satisfied and usually result in poor quality models. The gaps in the alignment were filled with backbone fragments built using RAPPER (27), an ab initio conformational sampling method in dihedral space. For each fragment, a set of 100 conformations is generated, and the best one is chosen based on RAPPER's scoring function. The side chains of the generated fragments were optimized using the SCWRL3 (28) rotamer library. For each target protein, 100 all-atom models are generated and ranked according to MODELLER's objective function. The top model is selected and used as the final model for further analysis.

In addition to protein models generated by TASSER and MODELLER, the top-ranked PROSPECTOR_3 templates were also used as targets for ligand binding site prediction. According to the threading alignment, the side-chains of a template were mutated to those of the target protein using the SCWRL3 (28) rotamer library.

7. Case study: Ligand-based virtual screening for HIV-1 protease inhibitors

HIV-1 protease was selected as an example to demonstrate the performance of ligand templates predicted by FINDSITE in ligand-based virtual screening. HIV-1 protease is an aspartic protease that cleaves the nascent polyproteins during viral replication (29, 30). The important role of HIV-1 protease in the life cycle of HIV has motivated the development of HIV-1 protease inhibitors that prevent the cleavage of viral polyproteins by obstructing the active site (31-34).

One of the most commonly used computational techniques for high-throughput ligand-based virtual screening is a similarity search that employs molecular fingerprints (35, 36). Molecular fingerprints consist of linear bit strings encoding chemical and structural properties of organic compounds that can be easily compared using a variety of similarity metrics (37). In the case when multiple query compounds are available, class-specific profiles have been shown to increase the performance of similarity search calculations by amplifying the consensus bit positions (38, 39). Class-specific profiles are typically constructed from characteristic patterns of bits in compounds known to exhibit a particular biological activity. Here, we show that ligands bound to threading templates identified by FINDSITE can be used: 1) as multiple query compounds and 2) to generate pocket-specific profiles in ligand-based virtual screening.
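As a small illustration of fingerprint comparison, the sketch below computes the Tanimoto coefficient (4) between two bit-string fingerprints, TC = c / (a + b - c), where a and b are the numbers of bits set on in each fingerprint and c is the number set on in both. Representing fingerprints as Python integers is a choice made for this example only; the toy bit strings are not real Daylight fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as integers (bit i = feature i)."""
    common = bin(fp_a & fp_b).count("1")   # c: bits set on in both
    union = bin(fp_a | fp_b).count("1")    # a + b - c: bits set on in either
    return common / union if union else 0.0


# Two 4-bit toy fingerprints sharing two of four distinct on-bits.
toy_tc = tanimoto(0b1101, 0b1011)
```

Identical fingerprints give a TC of 1.0 and fingerprints with no common on-bits give 0.0.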

a. Dataset

The compound library used in this study consists of 895 active molecules and 123,331 background molecules. Active molecules (MDL activity index: 71523, HIV-1 protease inhibitor) were extracted from the MDL Drug Data Report (MDDR, 2007.2 version 2.3 SP2). The Asinex Platinum Collection (September 2007) of lead-like compounds was used as the background set.

b. Ligand templates

The FINDSITE prediction methodology is presented in Fig. 1 and described in detail in the main text. The amino acid sequence of HIV-1 protease was retrieved from the Swiss-Prot database (40) (AC: O90777) and subsequently used as input to PROSPECTOR_3. From all threading templates identified by PROSPECTOR_3, 269 holo-templates that share a common binding pocket were selected by FINDSITE and ranked as the top prediction. Clustering the ligands bound to these templates using a Tanimoto coefficient cutoff of 0.7 resulted in 119 clusters. Representative molecules selected from the clusters were then used as multiple query compounds in ligand-based virtual screening for HIV-1 protease inhibitors. In addition, the pocket-specific profile was generated from consensus bits set on in the fingerprints generated for FINDSITE's query ligands. Since all threading templates (including proteins homologous to HIV-1 protease) were used to construct the query ligands and the profile, we refer to this set as the homologous set. We also compiled a weakly homologous set and the corresponding pocket-specific profile by using only those proteins with <35% sequence identity to the target. The weakly homologous set consists of 36 representative molecules obtained by the clustering of 70 ligands bound to holo-templates that share the top-ranked binding pocket.

The results of virtual screening using ligands predicted by FINDSITE were compared to the results obtained using known HIV-1 protease inhibitors. We compiled two sets of inhibitors. The first is composed of all active molecules extracted from the MDDR (895 compounds). Since FINDSITE selects potential ligands from the molecules bound to threading templates (proteins with experimentally solved structures), we created a second set of known binders that consists of those MDDR inhibitors for which at least one similar compound (with a TC ≥0.7) can be found in the PDB. The second set of known inhibitors is referred to as MDDR-PDB and contains 431 compounds, selected based on the similarity to the molecules present in the MSDchem library (41) (Release: 01-2007_09_13). Subsequently, the active compounds in the MDDR as well as in the MDDR-PDB set were clustered using a TC similarity cutoff of 0.7, and representative molecules from each cluster were used as multiple query compounds in the ligand-based virtual screening experiment. The clustering procedure identified 216 and 65 representative molecules for the MDDR and the MDDR-PDB set, respectively. The class-specific profile for HIV-1 protease inhibitors was also generated for each set of known actives and used in molecular fingerprint scaling, as described below.

c. Similarity search by fingerprint profile scaling

The basic idea of fingerprint profile scaling is to apply weight factors according to observed bit frequencies in the multiple template compounds to emphasize the similarity of compounds having similar biological activity. Here, the performance of the similarity search using the profiles generated by FINDSITE was compared to that employing the class-specific profile derived from known HIV-1 protease inhibitors present in MDDR. In this study, we employ the 1024-bit version of Daylight fingerprints (42) and the similarity search profiling adapted from Bajorath and coworkers (38, 39, 43). The representative molecules selected by FINDSITE were used as ligand templates to rank the screening library using mTC, as described in Materials and Methods, but now the fingerprint overlap was measured by the averaged Tanimoto coefficient (44): $\mathrm{aveTC} = \frac{TC + TC'}{2}$, where TC' is the Tanimoto coefficient calculated for bit positions set to zero rather than to one as in the traditional TC (4). In profile scaling, we set the frequency cutoff to 0.5, i.e., bit positions with a frequency >0.5 are considered consensus bits. The linearly scaled weight factors (from 0 to 50) were then applied to consensus bit positions while calculating aveTC (e.g., consensus bits with a frequency of 1.0 were counted 50 times, those with a frequency ≤0.5 were counted once) to induce a frequency-dependent effect that emphasizes the class-specific features. The results were assessed by calculating the enrichment behavior, i.e., the fraction of active compounds recovered in the top-ranked sample of the screening library.
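The profile scaling described above can be sketched as follows, with fingerprints held as equal-length lists of 0/1 values. Several details are assumptions made only for this illustration: the weight is interpolated linearly from 1 at the frequency cutoff to 50 at a frequency of 1.0, the same weights are applied to both the on-bit and the off-bit Tanimoto terms, and all function names are hypothetical. The original implementation may differ on each of these points.

```python
def bit_frequencies(fingerprints):
    """Relative frequency with which each bit position is set on in the query set."""
    count = len(fingerprints)
    return [sum(column) / count for column in zip(*fingerprints)]


def scaling_weights(frequencies, cutoff=0.5, max_weight=50.0):
    """Weight 1 for non-consensus bits; linear ramp up to max_weight at frequency 1.0."""
    weights = []
    for f in frequencies:
        if f > cutoff:  # consensus bit
            weights.append(1.0 + (max_weight - 1.0) * (f - cutoff) / (1.0 - cutoff))
        else:
            weights.append(1.0)
    return weights


def weighted_tanimoto(a, b, weights):
    """Tanimoto coefficient in which each bit position counts `weight` times."""
    both = sum(w for x, y, w in zip(a, b, weights) if x and y)
    either = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return both / either if either else 0.0


def ave_tanimoto(a, b, weights=None):
    """Average of TC (on-bits) and TC' (off-bits), optionally profile-scaled."""
    if weights is None:
        weights = [1.0] * len(a)
    inverted_a = [1 - x for x in a]
    inverted_b = [1 - x for x in b]
    tc = weighted_tanimoto(a, b, weights)
    tc_prime = weighted_tanimoto(inverted_a, inverted_b, weights)
    return 0.5 * (tc + tc_prime)


# Toy 4-bit query set; bit 0 is set on in every query compound.
queries = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0]]
profile = bit_frequencies(queries)
weights = scaling_weights(profile)
```

With these weights, a library compound that shares the fully conserved bit 0 with a query scores markedly higher than it would under the unscaled coefficient, which is the frequency-dependent effect the scaling is meant to induce.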

d. Molecular function prediction for HIV-1 protease

First, we note the high accuracy of FINDSITE's molecular function prediction for the HIV-1 protease sequence, which is presented in SI Table 2. Using the homologous set, FINDSITE accurately predicted all GO terms associated with HIV-1 protease according to Gene Ontology, with no false positives. For the weakly homologous set, all GO terms were also correctly assigned; however, there is one false positive: pepsin A activity, which is a child term of aspartic-type endopeptidase activity.

e. Results of ligand-based virtual screening

The class-specific profiles generated for known HIV-1 protease inhibitors as well as the pocket-specific profiles created by FINDSITE are presented in SI Fig. 8. Close agreement is found between the profiles generated from the known actives and those predicted by FINDSITE, even if only proteins with <35% sequence identity to HIV-1 protease were taken into account. Ligand templates selected by FINDSITE were subsequently used as multiple query compounds to rank the screening library consisting of known actives as well as the background molecules (assumed to be inactive). In SI Fig. 9, the performance of ligand-based virtual screening using FINDSITE ligands and profiles (presented in SI Fig. 8 C and D) is compared to that obtained by using known HIV-1 protease inhibitors (MDDR and MDDR-PDB sets) as multiple query compounds and the corresponding class-specific profiles (shown in SI Fig. 8 A and B). Overall, the enrichment achieved using FINDSITE ligand templates is lower than that obtained using known HIV-1 protease inhibitors: the enrichment factor calculated for the top 1% of the ranked library is 74, 74, 42, and 40 for known inhibitors from the MDDR and MDDR-PDB sets, and for ligands predicted by FINDSITE for the homologous and the weakly homologous set, respectively.

Because HIV-1 protease is a well-studied drug target for which many crystal structures complexed with small molecules are available, the performance of ligand-based virtual screening does not decrease if the set of query compounds is limited to those present in the PDB. Therefore, the relatively lower performance of FINDSITE mostly results from the false positive ligand templates selected for ligand-based virtual screening that assign a high rank to some of the background compounds; this results in the lower enrichment factor. The difference between the FINDSITE curves obtained for homologous and weakly homologous sets can be further explained by the smaller number of predicted ligands subsequently used as the query compounds: 119 and 36, respectively. Nevertheless, we note that the ligand templates predicted by FINDSITE solely from the HIV-1 protease amino acid sequence give satisfactory enrichment for the ligand-based virtual screening experiment: 40% or more of known inhibitors are found in the top 1% of the ranked screening library.
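The enrichment factor quoted above can be illustrated with a minimal sketch. It assumes the usual definition, the fraction of actives recovered in the top-ranked sample divided by the fraction of the library that the sample represents; the function name, the rounding of the sample size, and the toy ranking are assumptions.

```python
def enrichment_factor(ranked_is_active, top_fraction=0.01):
    """Enrichment factor for a ranked library (best-scoring compound first).

    `ranked_is_active` is a list of booleans, True where the compound at that
    rank is a known active. Assumes the library contains at least one active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    n_top = max(1, round(n_total * top_fraction))
    hits = sum(ranked_is_active[:n_top])
    # (hits / n_actives) / (n_top / n_total), written to avoid rounding noise.
    return hits * n_total / (n_actives * n_top)


# Toy library of 1000 compounds with 10 actives, 4 of them ranked in the top 1%.
toy_ranking = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984
toy_ef = enrichment_factor(toy_ranking)
```

A random ranking gives an enrichment factor of about 1. With a 1% sample, an enrichment factor of 40 means that 40% of the actives are recovered; for the library used here (895 actives among 124,226 compounds) that corresponds to roughly 358 actives among the top 1,242 or so compounds.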

f. Conclusions

In this case study, we have shown that ligand molecules predicted by FINDSITE can be used as multiple query compounds in ligand-based virtual screening. Furthermore, the pocket-specific profiles generated from FINDSITE ligand fingerprints are in good agreement with those created for known binders belonging to a particular activity class. The present approach seems to capture the most pronounced chemical features of predicted binding sites. Even if only limited activity information is available for a given target protein and no homologous proteins are present in the structural databases, FINDSITE still may be able to provide ligand templates for ligand-based virtual screening.

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res 28:235-242.

2. Sobolev V, Edelman M (1995) Proteins 21:214-225.

3. Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M (1999) Bioinformatics 15:327-332.

4. Tanimoto TT, IBM Internal Report, November 17, 1957.

5. Jones DT, Hadley C (2000) in Bioinformatics: Sequence, Structure and Databanks, eds Higgins D, Taylor WR (Springer, Heidelberg, Germany), pp 1-13.

6. Meller J, Elber R (2001) Proteins 45:241-261.

7. Bienkowska JR, Rogers RG, Jr, Smith TF (1999) J Comput Biol 6:299-311.

8. Skolnick J, Kihara D (2001) Proteins 42:319-331.

9. Skolnick J, Kihara D, Zhang Y (2004) Proteins 56:502-518.

10. Kabsch W (1978) Acta Crystallogr A 34:827-828.

11. Levitt M, Gerstein M (1998) Proc Natl Acad Sci USA 95:5913-5920.

12. Zhang Y, Skolnick J (2004) Proteins 57:702-710.

13. Zhang Y, Skolnick J (2005) Nucleic Acids Res 33:2302-2309.

14. Shindyalov IN, Bourne PE (1998) Protein Eng 11:739-747.

15. Holm L, Sander C (1993) J Mol Biol 233:123-138.

16. Kihara D, Skolnick J (2003) J Mol Biol 334:793-802.

17. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ (2006) Nucleic Acids Res 34:D227-230.

18. de Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N (2006) Nucleic Acids Res 34:W362-365.

19. Zhang Y, Arakaki AK, Skolnick J (2005) Proteins 61(Suppl 7):91-98.

20. Zhang Y, Skolnick J (2004) Biophys J 87:2647-2655.

21. Zhang Y, Skolnick J (2004) Proc Natl Acad Sci USA 101:7594-7599.

22. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000) Annu Rev Biophys Biomol Struct 29:291-325.

23. Sali A, Blundell TL (1993) J Mol Biol 234:779-815.

24. Zhang Y, Skolnick J (2001) J Chem Phys 115:5027-5032.

25. Zhang Y, Skolnick J (2004) J Comput Chem 25:865-871.

26. Feig M, Rotkiewicz P, Kolinski A, Skolnick J, Brooks CL, 3rd (2000) Proteins 41:86-97.

27. DePristo MA, de Bakker PI, Lovell SC, Blundell TL (2003) Proteins 51:41-55.

28. Canutescu AA, Shelenkov AA, Dunbrack RL, Jr (2003) Protein Sci 12:2001-2014.

29. Debouck C, Gorniak JG, Strickler JE, Meek TD, Metcalf BW, Rosenberg M (1987) Proc Natl Acad Sci USA 84:8903-8906.

30. Farmerie WG, Loeb DD, Casavant NC, Hutchison CA, 3rd, Edgell MH, Swanstrom R (1987) Science 236:305-308.

31. Ghosh AK, Chapsal BD, Weber IT, Mitsuya H (2007) Acc Chem Res, 10.1021/ar7001232.

32. Marastoni M, Bazzaro M, Bortolotti F, Tomatis R (2003) Bioorg Med Chem 11:2477-2483.

33. Narendra Babu SN, Rangappa KS (2007) Bioorg Med Chem, 10.1016/j.bmc.2007.10.052.

34. Wlodawer A, Vondrasek J (1998) Annu Rev Biophys Biomol Struct 27:249-284.

35. Ewing T, Baber JC, Feher M (2006) J Chem Inf Model 46:2423-2431.

36. Willett P (2006) Drug Discov Today 11:1046-1053.

37. Willett P, Barnard JM, Downs GM (1998) J Chem Inf Comput Sci 38:983-996.

38. Xue L, Godden JW, Stahura FL, Bajorath J (2003) J Chem Inf Comput Sci 43:1218-1225.

39. Xue L, Stahura FL, Bajorath J (2004) J Chem Inf Comput Sci 44:2032-2039.

40. Gasteiger E, Jung E, Bairoch A (2001) Curr Issues Mol Biol 3:47-55.

41. Golovin A, Dimitropoulos D, Oldfield T, Rachedi A, Henrick K (2005) Proteins 58:190-199.

42. Anonymous (2007) Daylight Theory Manual (Daylight Chemical Information Systems, Inc., Aliso Viejo, CA).

43. Xue L, Godden JW, Stahura FL, Bajorath J (2004) J Chem Inf Comput Sci 44:1275-1281.

44. Xue L, Godden JW, Stahura FL, Bajorath J (2003) J Chem Inf Comput Sci 43:1151-1157.

This Article: PNAS January 8, 2008 vol. 105 no. 1 129-134