Biological spectra analysis: Linking biological activity profiles to molecular structure

  1. Anton F. Fliri*,
  2. William T. Loging,
  3. Peter F. Thadeio, and
  4. Robert A. Volkmann*,
  Pfizer Global Research and Development, Groton, CT 06340
  Communicated by Larry E. Overman, University of California, Irvine, CA, October 25, 2004 (received for review September 4, 2004)

  Fig. 1.

    A cross section of the druggable proteome. Proteins are clustered on the basis of sequence homology. Proteins in close proximity in this dendrogram are members of the same gene family and share sequence and structural similarity in their regulatory and ligand-binding domains. Cerep's BioPrint database (23) consists of >100 in vitro assays. Forty-two of the 92 assays used in our studies (shown in red) are G-protein-coupled receptors (GPCRs); the rest encompass a wide range of functionally diverse proteins from a number of protein superfamilies.


  Fig. 2.

    Biological activity spectra of the antifungal agents clotrimazole and tioconazole. These spectra were constructed from 92 bioassay data points in Cerep's BioPrint array. The bioassay proteins, listed in Data Set 2, are located on the x axis. The associated percent inhibition values (A), determined for each compound at a drug concentration of 10 μM, are displayed as a two-dimensional spectrum in B and as a heat map in C. The heat map uses the same x-axis layout and expresses percent inhibition values by color: a white-to-green-to-black gradient spans 0% to 100% inhibition. This coloring scheme is applied to all heat maps shown in this publication.
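    The heat-map coloring in C can be sketched as a mapping from percent inhibition to an RGB color. The caption specifies only the white-to-green-to-black endpoints, so this sketch assumes a piecewise-linear gradient with pure green (RGB 0, 255, 0) at 50% inhibition; `inhibition_to_rgb` and `spectrum_to_colors` are hypothetical helpers, and the input values are invented for illustration.

```python
def inhibition_to_rgb(percent: float) -> tuple:
    """Map percent inhibition (0-100) to an (R, G, B) tuple on a
    white -> green -> black gradient.

    Assumption: the gradient is piecewise linear, with pure green at 50%.
    """
    p = min(max(percent, 0.0), 100.0)  # clamp values outside 0-100%
    white, green, black = (255, 255, 255), (0, 255, 0), (0, 0, 0)
    if p <= 50.0:
        t, start, end = p / 50.0, white, green  # 0-50%: white fades to green
    else:
        t, start, end = (p - 50.0) / 50.0, green, black  # 50-100%: green darkens to black
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


def spectrum_to_colors(percent_values):
    """Color one biospectrum (one heat-map row), keeping the assay order."""
    return [inhibition_to_rgb(v) for v in percent_values]


# Illustrative (made-up) values for three assays, not BioPrint data.
print(spectrum_to_colors([0, 50, 100]))
# -> [(255, 255, 255), (0, 255, 0), (0, 0, 0)]
```

    Because each row is colored independently, the same function applies unchanged to every heat map that uses this scheme.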


  Fig. 4.

    Hierarchical clustering of biospectra: the azole section of the linkage map. (A) The azole section obtained by hierarchical clustering of 1,567 molecules (see Fig. 3). (B) The new y-axis dendrogram section containing azole derivatives 1–13, obtained by hierarchical clustering of 1,571 molecules after compounds 10–13 were added to the database.
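    When compounds 10–13 are added, the y-axis dendrogram is recomputed over all 1,571 molecules. As a rough preview of where a new compound may land, its biospectrum can be ranked against the biospectra already in the database. This sketch assumes Pearson correlation as the similarity measure (the CCS score is not defined in these captions); `rank_by_similarity` and the spectra are hypothetical, and full re-clustering, as in B, can still reorganize neighboring branches.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length biospectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat spectrum has undefined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def rank_by_similarity(query, database, top_k=3):
    """Rank database biospectra by similarity to a query biospectrum.

    Returns up to top_k (name, similarity) pairs, most similar first.
    """
    scores = [(name, pearson(query, spectrum)) for name, spectrum in database.items()]
    scores.sort(key=lambda item: item[1], reverse=True)  # stable for ties
    return scores[:top_k]


# Invented spectra over five assays; "new" stands in for a newly added compound.
database = {
    "m1": [90, 80, 10, 5, 50],
    "m2": [85, 75, 5, 0, 45],
    "m3": [5, 10, 90, 85, 20],
}
new = [91, 81, 11, 6, 51]
for name, score in rank_by_similarity(new, database, top_k=2):
    print(name, round(score, 3))
```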


  Fig. 3.

    Hierarchical clustering of 1,567 compounds by using percent inhibition values. (A) Heat map with x-axis and y-axis dendrograms obtained for the complete SAR matrix. The heat map holds over 140,000 data points (a 92 × 1,567 matrix of assays × molecules). (B) A portion of the heat map (23 molecules in 92 assays) containing clotrimazole (1) and tioconazole (2), which are described in Fig. 2. The data are organized by two classification schemes (dendrograms): one oriented horizontally on top (x-axis dendrogram) and the other oriented vertically on the left side (y-axis dendrogram). Proteins in the x-axis dendrogram are color coded according to membership in designated protein superfamilies: blue, G-protein-coupled receptors; pink, enzymes; green, ion channels; purple, transporters; orange, receptors; black, steroid receptors. The dendrograms provide an unbiased organization of the biospectra of individual molecules (shown on the y axis). The x-axis dendrogram clusters proteins into groups based on the similarity of their interaction profiles, using the percent inhibition values of the 1,567 molecules as the measure. Proteins with similar distributions of percent inhibition values (similar ligand-binding domain characteristics) appear on proximate branches of the x-axis dendrogram. The y-axis dendrogram, on the other hand, clusters molecules by a similarity ranking obtained by comparing their biospectra with the UPGMA algorithm (molecule comparison). Biospectra similarity between clusters and individual molecules is measured by using confidence in CCS values. Clusters to the right of the red line of the y-axis dendrogram have CCS values >0.80. A similar scoring method for comparing molecular structures based on IR spectra similarity has been described by Varmuza et al. (24).
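    The y-axis clustering can be sketched as average-linkage (UPGMA) clustering of biospectra. In this sketch, the similarity of two biospectra is assumed to be their Pearson correlation, standing in for the CCS score, which these captions do not define; distance is 1 − similarity, and merging stops once no pair of clusters reaches the 0.80 cutoff marked by the red line. The function names and toy spectra are hypothetical. The x-axis (protein) dendrogram uses the same routine on the transposed matrix, so that each assay is described by its inhibition values across molecules.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has undefined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def upgma(profiles, min_similarity=0.80):
    """Average-linkage (UPGMA) clustering of named profiles.

    profiles: dict of name -> list of percent inhibition values (same order).
    Returns (clusters, merges); merges lists (left, right, similarity).
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    def cluster_distance(c1, c2):
        # UPGMA: unweighted mean over all member-to-member distances.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    max_distance = 1.0 - min_similarity
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        d = cluster_distance(clusters[i], clusters[j])
        if d > max_distance:
            break  # no remaining pair of clusters meets the similarity cutoff
        merges.append((clusters[i], clusters[j], 1.0 - d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters, merges


def transpose(profiles, assay_names):
    """Assay profiles across molecules, for the x-axis (protein) dendrogram."""
    molecules = list(profiles)
    return {assay: [profiles[m][k] for m in molecules]
            for k, assay in enumerate(assay_names)}


# Toy spectra over five assays (invented, not BioPrint data).
spectra = {
    "m1": [90, 80, 10, 5, 50],
    "m2": [85, 75, 5, 0, 45],
    "m3": [5, 10, 90, 85, 20],
    "m4": [8, 13, 93, 88, 23],
}
clusters, merges = upgma(spectra)
print(sorted(clusters))  # -> [('m1', 'm2'), ('m3', 'm4')]
```

    Recomputing the average over all member pairs at every step keeps the sketch short but scales poorly; for a 92 × 1,567 matrix, a library routine with average linkage would be the practical choice.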

