CHEMISTRY
Charting biologically relevant chemical space: A structural classification of natural products (SCONP)







*Department of Chemical Biology, Max Planck Institute of Molecular Physiology, Otto-Hahn-Strasse 11, D-44227 Dortmund, Germany;
Fachbereich 3, Chemische Biologie, Universität Dortmund, Otto-Hahn-Strasse 6, D-44227 Dortmund, Germany;
Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland; and
Department of Nephrology and Hypertension, Department of Clinical Research, University of Berne, Freiburgstrasse 15, CH-3010 Berne, Switzerland
Edited by Richard A. Lerner, The Scripps Research Institute, La Jolla, CA, and approved September 9, 2005 (received for review May 3, 2005)
| Abstract |
|---|
11β-hydroxysteroid dehydrogenase type 1 with activity in cells guided by SCONP and protein structure similarity clustering. 11β-Hydroxysteroid dehydrogenase type 1 is a target in the development of new therapies for the treatment of diabetes, the metabolic syndrome, and obesity.
chemical biology | compound libraries | hydroxysteroid dehydrogenase | cheminformatics
A systematic structure-orientated organizing principle of the known NPs combined with annotations of biological origin and pharmacological activity would chart the regions of chemical space explored by nature, provide a structural rationalization and categorization of NP diversity, and also provide guidance for the development of NP-like compound libraries.
Statistical analyses of different NP databases have been performed in a few cases (7–10); however, a systematic and annotated structural categorization of NPs leading to development principles for compound library design is missing.
Here, we introduce a structural classification of NPs (SCONP) as an idea- and hypothesis-generating tool to define structural relationships between different NP classes in a tree-like arrangement and for the design of NP-derived compound libraries.
| Materials and Methods |
|---|
From the biological source (BSRC) field in the DNP, the genus of the source organism was extracted, and its taxonomic classification was identified in the Integrated Taxonomic Information System (ITIS) database (www.itis.usda.gov). For species not listed in ITIS, information was amended by using the National Center for Biotechnology Information taxonomy browser (www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html) and joined to the ITIS taxonomic tree at the lowest possible taxonomic level. The canonical Simplified Molecular Input Line Entry Specification (SMILES) (12) strings, without stereochemistry, of the structures in the DNP and in the MDL Drug Data Report (MDDR) database (www.mdl.com) were matched against each other to join biological activity information contained in the MDDR to the DNP data set. Both database-joining operations were performed by using PIPELINEPILOT software (www.scitegic.com).
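The structure-based join described above can be illustrated with a minimal sketch. It assumes that canonical, stereo-free SMILES strings have already been generated upstream by a cheminformatics toolkit; the record layout, field names, and toy entries are hypothetical and are not taken from the DNP or MDDR.

```python
from collections import defaultdict


def join_on_smiles(dnp_records, mddr_records):
    """Attach MDDR activity annotations to DNP entries sharing a SMILES key.

    Every DNP record is kept (duplicates included); entries without an MDDR
    match simply receive an empty activity list.
    """
    activities_by_smiles = defaultdict(set)
    for rec in mddr_records:
        activities_by_smiles[rec["smiles"]].add(rec["activity"])

    joined = []
    for rec in dnp_records:
        activities = sorted(activities_by_smiles.get(rec["smiles"], ()))
        joined.append({**rec, "activities": activities})
    return joined


# Hypothetical toy records; the SMILES keys are assumed to be canonical and
# stereo-free already, so plain string equality is a valid match criterion.
dnp = [
    {"name": "np_a", "smiles": "Oc1ccccc1", "genus": "GenusOne"},
    {"name": "np_b", "smiles": "C1CCCCC1", "genus": "GenusTwo"},
]
mddr = [
    {"smiles": "Oc1ccccc1", "activity": "cytotoxic"},
    {"smiles": "Oc1ccccc1", "activity": "antiseptic"},
]
result = join_on_smiles(dnp, mddr)
```

Because stereochemistry is stripped before matching, stereoisomers collapse onto the same key, so an activity annotation may be attached to more than one DNP entry.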
Inhibition of 11β-Hydroxysteroid Dehydrogenases (11βHSDs). The 11βHSD type 1 (11βHSD1)-dependent oxoreduction of cortisone and 11βHSD type 2 (11βHSD2)-dependent oxidation of cortisol were measured in lysates of stably transfected HEK-293 cells as described in ref. 13. The rate of conversion of cortisone to cortisol or the reverse reaction was determined by using [1,2,6,7-³H]-labeled substrate at a final concentration of 200 nM cortisone or 25 nM cortisol, respectively, in the presence of inhibitor (0–200 µM). Data represent mean ± SD of at least four independent experiments. To exclude the possibility that the observed reduction in 11βHSD1 activity is due to promiscuous inhibition of the enzyme, the inhibition of the oxoreduction of cortisone by 3, 4, and 5 was measured in the presence of 0.1% Triton X-100 (14). The calculated IC50 values were comparable with those obtained in the absence of the detergent.
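This section does not state how the IC50 values were calculated (the assay follows ref. 13). As an illustration only, the sketch below shows one simple and commonly used estimate: linear interpolation on log concentration between the two measured points that bracket 50% residual enzyme activity. The function name and the dose-response data are hypothetical.

```python
import math


def ic50_log_interpolation(concentrations_um, percent_activity):
    """Estimate IC50 (µM) by interpolating on log10(concentration).

    Points at zero concentration (the uninhibited control) are skipped
    because log10(0) is undefined. Returns None if activity never crosses 50%.
    """
    points = sorted(
        (c, a) for c, a in zip(concentrations_um, percent_activity) if c > 0
    )
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 50.0 > a_hi:
            fraction = (a_lo - 50.0) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    return None


# Hypothetical residual activities (% of control) at increasing inhibitor doses.
doses_um = [0, 0.1, 1, 10, 100]
activity = [100, 90, 70, 30, 10]
ic50 = ic50_log_interpolation(doses_um, activity)  # about 3.16 µM
```

A full four-parameter curve fit would normally be preferred when enough data points are available; the interpolation is shown only because it is easy to verify by hand.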
Nuclear Translocation Assay. HEK-293 cells (300,000 cells per well) were grown on poly(L-lysine)-coated glass slides in six-well plates containing 2 ml of DMEM supplemented with 10% FCS. Cells were transfected according to the calcium phosphate precipitation method with 1 µg of GFP-human glucocorticoid receptor (GR) expression plasmid (15) and either 0.5 µg of human 11βHSD1 plasmid or empty pcDNA3 vector (Invitrogen) per well. After 6 h, cells were washed twice with medium that had been charcoal-stripped twice to remove steroids, followed by incubation in this medium for another 18 h. Cells then were preincubated for 30 min with inhibitor or GR antagonist as indicated, followed by the addition of 500 nM cortisone and further incubation for 40 min. Cells were washed and fixed with 4% paraformaldehyde for 10 min, and localization of GFP-GR was determined by fluorescence microscopy. Three independent transfection experiments were carried out, in each of which 300 fluorescent cells per sample were evaluated by an observer who was blinded to the cell treatment.
For further descriptions of cheminformatics methods, a summary of the library synthesis, and the transactivation assay, see Supporting Materials and Methods and Scheme 1, which is published as supporting information on the PNAS web site.
| Results |
|---|
However, in general, any subsequent synthesis effort planned on the basis of our analysis (see below) must take this self-limitation into account and possibly compensate for it by synthesizing different stereoisomers with the same underlying structural scaffold. The biological activity of NPs and of compounds derived from them is certainly determined to a major extent by their chirality. For a useful structure-based organizing principle of NPs, however, abstraction of structural information by focusing on 2D structures seemed acceptable. This choice is supported by several previous studies showing that cheminformatics analyses employing 3D molecular descriptors do not perform better than analyses based on 2D molecular structures (16–18).
Duplicates were not eliminated from the data set because these may represent stereoisomers or may be differently annotated (e.g., biological origin or function). The resulting set of 171,045 structures was processed. We found that 154,428 of the structures contain rings (90%). Because the overwhelming majority of the compounds used in medicinal chemistry and chemical biology research also contain rings (19), subsequent analysis focused on the ring-containing NPs.
Cheminformatics analysis of the ring-containing NPs started with the removal of acyclic substituents. Cyclic substituents were regarded as being part of the scaffold. We extracted 31,011 unique scaffolds [i.e., frameworks as unions of ring systems (20) including also exocyclic double bonds and possible linking chains between rings] from the ring-containing NPs. After careful visual inspection of this initial result, it was found that redundancies occurred because of different glycosylation patterns of a given aglycon (scaffold) (for an example, see Fig. 4, which is published as supporting information on the PNAS web site).
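The removal of acyclic substituents can be sketched as iterative pruning of terminal atoms on the molecular graph, which leaves the ring systems and any chains linking them. This is a simplified illustration and not the procedure used in the study: the atom indices and bond list are hypothetical, bond orders are ignored (so exocyclic double bonds are not retained), and no deglycosylation is attempted.

```python
def extract_framework(bonds):
    """Return the atoms remaining after all terminal atoms are stripped.

    `bonds` is a list of (atom_index, atom_index) pairs for heavy atoms.
    Atoms of degree <= 1 are removed repeatedly, so side chains vanish while
    rings and ring-to-ring linkers survive. Acyclic molecules yield an empty set.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    leaves = [atom for atom, nbrs in neighbors.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in neighbors:
            continue  # already removed via another path
        for nbr in neighbors.pop(atom):
            neighbors[nbr].discard(atom)
            if len(neighbors[nbr]) <= 1:
                leaves.append(nbr)
    return set(neighbors)


# Hypothetical molecule: ring A (atoms 0-5), a two-atom linker (6, 7),
# ring B (atoms 8-13), one substituent on ring B (14), a two-atom side
# chain on ring A (15, 16).
ring_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
linker = [(0, 6), (6, 7), (7, 8)]
ring_b = [(8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 8)]
side_chains = [(10, 14), (3, 15), (15, 16)]
framework = extract_framework(ring_a + linker + ring_b + side_chains)
```

In this toy case the framework consists of atoms 0 through 13, i.e., both rings plus the linker, whereas atoms 14 to 16 are discarded as acyclic substituents.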
Glycosidic moieties, unless they contain complex sugars (for example, in glycosidic antibiotics, such as aminoglycosides, vancomycin, and others), usually exert a modulating effect on the aglycon's biological activity, e.g., modulation of solubility or release from an inactive precursor by a glycosidase (21–23). Therefore, we introduced an additional step of in silico deglycosylation of standard O-glycosides before extracting scaffolds. The analysis was performed with a special computational algorithm written in JAVA (Sun Microsystems) by using recursive substructure matching. Altogether, 25,337 NP molecules were deglycosylated (14.8% of the whole NP data set) by removing 112 sugar moieties from the parent glycosides, yielding in the end 149,513 NP structures with rings, which in turn provided 24,891 unique deglycosylated scaffolds (a reduction of the number of distinct scaffolds by 20%).
In an initial attempt to develop a structural organizing principle for NPs, cluster analysis was performed on the NP structures. Jarvis–Patrick clustering (24) of the NPs' molecular fingerprints using the Tanimoto coefficient (25) as quantitative similarity measure (Tanimoto index threshold of 0.9) led to a grouping of NPs into 42 large clusters (size cutoff of 100 NP molecules), 1,507 medium clusters (size cutoff of 10 NP molecules), 2,851 small clusters (size cutoff of 5 NP molecules), and on the order of 50,000 singletons. Thus, this clustering procedure did not provide a useful principle of structural classification.
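For readers unfamiliar with the clustering step, the following sketch shows the Tanimoto coefficient on fingerprints represented as sets of on-bits, together with the textbook form of Jarvis–Patrick clustering (two items are joined if each is in the other's k-nearest-neighbor list and they share at least k_min neighbors). The fingerprint type, the parameter values, and the way the 0.9 similarity threshold enters the actual analysis are not given in this section; all values below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jarvis_patrick(fingerprints, k, k_min):
    """Cluster items with the classic Jarvis-Patrick shared-neighbor rule."""
    n = len(fingerprints)
    # k nearest neighbors of each item (self excluded, ties broken by index).
    nearest = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (-tanimoto(fingerprints[i], fingerprints[j]), j),
        )
        nearest.append(set(others[:k]))

    # Union-find to merge items that satisfy the pairwise criterion.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in nearest[j] and j in nearest[i]
            if mutual and len(nearest[i] & nearest[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical fingerprints: two tight groups and one unrelated item.
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 12, 15},
    {20, 21},
]
clusters = jarvis_patrick(fps, k=2, k_min=1)
```

In this toy run the unrelated item ends up as a singleton, which mirrors the problem noted in the text: structurally rare compounds fall outside every cluster and their relationship to other NPs is not captured.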
For an alternative hierarchical tree-like classification of NP scaffolds, a computational algorithm was developed that identifies the parent scaffold of each individual scaffold. It searches for substructures in each scaffold that represent NP scaffolds and groups them hierarchically according to decreasing number of rings, i.e., scaffold size. Thus, a parent scaffold represents a substructure of a respective query scaffold. This hierarchical analysis generated a scaffold tree with several hierarchical levels and with single rings being the roots, which were grouped into carbocycles and N- and O-heterocycles. Each scaffold on a given hierarchical level can be annotated with several individual NPs and represents a node from which further arborization may lead to more complex scaffolds. For the identification of "parent scaffolds," a set of rules was used that provides results close to the way of thinking of synthetic chemists. First, the parent had to be a substructure of the child scaffold. Second, no breaking of ring bonds in a child was allowed. If several candidates were available, the parent was selected so as to maximize the number of heteroatoms in the parent and then by choosing the parent of maximum size. If it was not possible to make a decision based on these rules, the most frequent scaffold was simply chosen as the parent scaffold (see Supporting Materials and Methods for a detailed description of the algorithm).
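The tie-breaking order among candidate parents can be written as a lexicographic comparison. The sketch below assumes that candidate generation has already enforced the first two rules (the candidate is a substructure of the child and no ring bond of the child is broken); the `Candidate` type, the use of heavy-atom count as the size measure, and the example values are hypothetical. The detailed algorithm is given in the Supporting Materials and Methods and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    n_heteroatoms: int
    size: int       # assumed size measure, e.g., number of heavy atoms
    frequency: int  # number of NPs in the data set containing this scaffold


def choose_parent(candidates):
    """Pick a parent: most heteroatoms, then largest, then most frequent."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c.n_heteroatoms, c.size, c.frequency),
    )


# Hypothetical candidates for one child scaffold.
candidates = [
    Candidate("carbocycle_a", n_heteroatoms=0, size=10, frequency=9000),
    Candidate("n_heterocycle_b", n_heteroatoms=1, size=9, frequency=500),
    Candidate("n_heterocycle_c", n_heteroatoms=1, size=10, frequency=300),
]
parent = choose_parent(candidates)
```

Here the carbocycle loses despite being the most frequent, because heteroatom count is compared first; the two heterocycles are then separated by size, so frequency is never consulted.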
Such a modular tree-like arrangement of NPs (see Fig. 1) allows for the representation of very rare scaffolds representing only a few or only one NP as a branch in the tree and thereby for their highly dynamic and readily extendable structural classification and correlation with other scaffolds. Regular clustering approaches would define them as singletons, and structural similarities with other NPs would not be detected.
250 Å³ (see Fig. 6A, which is published as supporting information on the PNAS web site). A statistical evaluation of the volumes found in a data sample of 18,402 cavities extracted from the Protein Data Bank (www.pdb.org) revealed that most cavities fall into a volume range between 300 and 800 Å³ (26). Thus, the average volume of the two- to four-ring NPs correlates with and can be mapped to the average dimensions of protein cavities, taking into account that ligands of proteins often do not fill the whole volume of a given cavity. A similar analysis of the volume distribution of 30,000 drugs from the World Drug Index (Thomson Derwent, Philadelphia, www.thomsonderwent.com) revealed furthermore that the volumes of the two- to four-ring-containing NPs are also comparable with those found in drugs (see Fig. 6B).
The NP scaffold tree can be used as a strategic and guiding tool for the selection of underlying frameworks for NP-inspired compound library development in different ways. The most logical and apparent approach is to select the scaffold of a given NP and close structural neighbors to guide the synthesis of compound libraries. These NP-derived compound collections should yield relatively high hit rates at comparably small library size. Such an approach should be of general value and applicability, and current experience with several NP-derived compound collections prepared by others and by us confirms this expectation (refs. 27–31; see also references within ref. 29 for recent reviews of NP-guided compound library synthesis).
In addition to the most logical use of the NP scaffold tree, we envisioned that, for compound library development, structurally simplified analogs with prevailing biological relevance (not necessarily identical activity) might be found by brachiation through the branches of the tree. Structural simplification would then be sought by identification of scaffolds according to the classification principle used for the construction of the tree, i.e., core structures of less complex NPs, which form a substructure of the scaffold characteristic for the query structure. Notably, these scaffolds still encode the property of binding to proteins because they occur in NPs. We also note that this brachiation does not imply an evolutionary or biosynthetic relationship between the scaffolds involved; rather, the argument is based exclusively on structural, i.e., chemical, relationships. We also stress that if structural simplification extends too far, the guiding biological activity will most probably be lost.
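Brachiation toward simpler scaffolds amounts to following parent links from a query scaffold to the root of its branch. The sketch below assumes the tree is stored as a child-to-parent map; the scaffold names and ring counts are hypothetical placeholders, not scaffolds from the SCONP tree.

```python
def brachiate(scaffold, parent_of):
    """Return the chain scaffold -> parent -> ... -> root of the branch."""
    path = [scaffold]
    seen = {scaffold}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:
            raise ValueError("parent map contains a cycle")
        path.append(parent)
        seen.add(parent)
    return path


# Hypothetical branch of a scaffold tree (child -> parent).
parent_of = {
    "pentacycle_x": "tetracycle_x",
    "tetracycle_x": "tricycle_x",
    "tricycle_x": "bicycle_x",
    "bicycle_x": "monocycle_x",
}
ring_count = {
    "pentacycle_x": 5,
    "tetracycle_x": 4,
    "tricycle_x": 3,
    "bicycle_x": 2,
    "monocycle_x": 1,
}

path = brachiate("pentacycle_x", parent_of)
# Shortlist simplified scaffolds of intermediate complexity (two or three rings).
shortlist = [s for s in path if 2 <= ring_count[s] <= 3]
```

The traversal only proposes structurally related candidates; it says nothing about whether biological activity is retained at a given level of simplification.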
Such criteria could, for instance, be biological activity of compound classes represented by individual scaffolds or biological origin. Such information can be encoded for the different hierarchical levels and scaffolds of the tree (for an annotation example with biological origin, see Fig. 7, which is published as supporting information on the PNAS web site). Biological activity might be a very relevant second criterion, but currently it can be applied only to a relatively small fraction of the NPs analyzed. Comparison of the NPs in the DNP with the MDL Drug Data Report (MDDR) database (www.mdl.com), which lists biological activity of 153,366 compounds, revealed that biological activity was reported in the MDDR for only 2,110 of the NPs listed in the DNP. Furthermore, biological activity is very ill-defined, ranging from the description of precise targets to very general annotations, e.g., "cytotoxic." It also is not general and may change frequently because it depends on the type and number of assays to which the NPs were and will be subjected.
The ability to bind to proteins is a hierarchically higher consideration than bioactivity because it also addresses the fact that NPs are biosynthesized by and therefore bind to proteins irrespective of activity in an assay. Thus, functional, mechanistic, evolutionary, and structural analogies between possible protein targets appear to be particularly suitable second criteria for selection of a structurally simpler scaffold for library development.
To investigate whether this approach may indeed be viable, we used the naturally occurring 11βHSD1 and 11βHSD2 ligand glycyrrhetinic acid (GA, 1; Fig. 2A) as query compound. 11βHSD1, which catalyzes the conversion of inactive cortisone into active cortisol, is a promising target for the development of new drugs (13, 32), and isoenzyme-selective 11βHSD1 inhibitors are very actively being sought.
Analysis of the pentacyclic core structure of the NP GA leads to an assignment of the NP scaffold to a branch within the NP tree (see Fig. 2A). Brachiation in the direction of reduced complexity leads to a subset of two- and three-ring systems that were chosen because they occur most frequently (see above) and because of their accessibility by chemical synthesis.
For the choice of the precise scaffold of a compound library, we used similarity between possible target proteins as the second decisive criterion (see above). For this purpose, the established grouping of proteins into protein families according to evolutionary relationship and function is, a priori, certainly the most appropriate and logical. As an alternative to these groupings, we have recently introduced protein structure similarity clustering (PSSC) (4) as an abstracting guiding principle for compound library development. PSSC focuses on sheer structural similarity in the ligand-sensing cores of proteins to group them into a structure similarity cluster. The structure of a known ligand of a cluster member protein, e.g., an NP, is used for the development of ligands for other cluster member proteins. Following this logic, we previously identified a protein structure similarity cluster containing the 11βHSDs, acetylcholinesterase, and Cdc25A phosphatase (4). Dysidiolide (2) is a known naturally occurring inhibitor of Cdc25A (33). This NP embodies the 1,2,3,4,4a,5,6,7-octahydronaphthalene scaffold, which also was identified as a possible library scaffold by the brachiation approach (see Fig. 2A). Thus, we investigated whether a compound library based on this scaffold would yield inhibitors of 11βHSD. 11βHSD inhibitors with this scaffold have not been described before.
The substitution patterns of the octahydronaphthalene system in dysidiolide and of the unsaturated C/D ring system in GA match only partly. Therefore, and to increase the diversity of the library, substitution patterns with different degrees of overlap between the GA and the dysidiolide core structures were introduced into the basic core scaffold during library synthesis (see Fig. 2B for a general structural description of the library). The scaffolds themselves were synthesized employing the Robinson annulation as key step. For library synthesis on solid support, the scaffolds were equipped with a secondary alcohol for attachment to the polymeric carrier. Diversification of the basic scaffolds was achieved by means of aldol and Wittig reactions and Pd(0)-catalyzed coupling reactions (see Supporting Materials and Methods and Table 2, which is published as supporting information on the PNAS web site, for a description of the synthesis route and characterization of compounds 3–5).
A collection of 162 compounds was synthesized and investigated biochemically for inhibition of 11βHSD1 and 11βHSD2 (see Supporting Materials and Methods) (13). Compounds displaying IC50 values within a cutoff of 10 µM were considered as hits.
The collection contained 30 11βHSD1 inhibitors with IC50 values of 0.31–9.1 µM. Four 11βHSD1 inhibitors were in the nanomolar range (IC50 values of 0.31–0.74 µM). Three compounds inhibited 11βHSD2 with IC50 values of 2.0–6.6 µM. The results for the most relevant compounds are given in Table 1. Most remarkably, even at this comparably small library size, the hits indicated a pronounced degree of selectivity for the isoenzymes 11βHSD1 and 11βHSD2. Twenty-eight compounds selectively inhibited 11βHSD1. To demonstrate cellular activity of the previously undescribed inhibitor class, translocation and transactivation assays (see Supporting Materials and Methods for the description of the assays) were performed for compound 5 (see Table 1), one of the most potent and selective 11βHSD1 inhibitors (IC50 = 0.35 µM).
11βHSDs (34) and in which the GR is present at only very low levels were transfected with a GFP-human GR expression plasmid (15, 35). In the absence of cortisol, GR is cytosolic. If the potent GR agonist cortisol is produced from externally added cortisone through reduction by 11βHSD1, it binds GR and induces the translocation of the receptor to the nucleus and stimulation of transactivation.
As shown in Fig. 3, upon addition of cortisone to cells expressing 11βHSD1, a dose-dependent induction of translocation of GR into the nucleus was observed, and stimulation of transactivation also was recorded. Glucocorticoid-dependent nuclear translocation of GR and transactivation were blocked in the presence of the unspecific 11βHSD inhibitor GA (1) (Fig. 3). Both nuclear translocation and GR-dependent transactivation were abolished upon coincubation of cells with cortisone and the selective 11βHSD1 inhibitor 5, indicating the capability of 5 to efficiently inhibit the conversion of cortisone to cortisol in intact cells. Less than 50% of GR molecules localized to the nucleus in the presence of 3 µM compound 5 (see Fig. 8, which is published as supporting information on the PNAS web site), and transactivation by GR diminished to 40% (see Fig. 9, which is published as supporting information on the PNAS web site). Nuclear translocation and transactivation were completely blocked at 10 µM 5.
| Discussion |
|---|
Its most logical application for compound library development is the selection of library scaffolds based on relevance to nature and leading to compound collections that can be regarded as biologically prevalidated. Because NPs emerge by means of biosynthesis by proteins and often fulfill various biological functions through interaction with multiple proteins, the investigation of such compound collections in biochemical and biological screens should yield high hit rates at comparably small library size. Current experience with NP-inspired compound libraries synthesized and evaluated biochemically and biologically by others and by us supports this notion (see above), and the compound collection investigated here provides further proof-of-principle. Although the choice of NP-derived scaffolds currently is based on individual insight and knowledge about particular NP classes, the SCONP tree provides a systematization based on NP structure and allows for a statistically founded choice.
The systematic application of the NP scaffold tree for the establishment of a larger compound library from several smaller compound collections should allow for the assembly of an elaborate, yet still comparatively small, screening library that is diverse because it builds on the diversity found in nature and enriched in biologically relevant members because the scaffolds of the individual sublibraries are biologically prevalidated. Such a library as a whole also should yield comparably high hit rates in different biochemical and biological screens at comparably small size.
In addition to this most logical application of SCONP, a less obvious and probably less general application of the scaffold tree may be viable, one that would be very valuable if successful. Brachiation within the tree from an outer branch to structurally less complex scaffolds on an inner branch, by way of intermediate naturally occurring parent scaffolds and leading to structurally simplified compound classes with similar biological activity, was demonstrated to be possible for the pentacyclic unspecific 11βHSD inhibitor GA and the octahydronaphthalene scaffold identified in this way. If such a brachiation is performed, a second criterion is required for the choice of the smaller scaffold because typically several scaffolds will be identified on inner branches and because it is not obvious how far the brachiation can be allowed to proceed without loss of the desired biological activity. In the example pursued here, PSSC was chosen as the second criterion, but other arguments, such as evolutionary and mechanistic relationships between possible target proteins and biological activity, are very likely to be equally applicable or superior.
The successful experimental verification of this less obvious approach demonstrates that brachiation may indeed be a viable procedure for the identification of structurally simpler compound classes that retain the ability to bind to proteins and retain biological activity. However, we stress that generality cannot be claimed based on the example detailed above alone and that we do not intend to do so. Rather, reduction from larger to smaller scaffolds is probably possible in only a limited number of cases and would depend on the complexity of the scaffold. Future investigations of further examples will be required to determine whether or not this far-reaching and more radical use of the NP scaffold tree is generally applicable.
| Footnotes |
|---|
This paper was submitted directly (Track II) to the PNAS office.
Abbreviations: NP, natural product; SCONP, structural classification of NPs; DNP, Dictionary of Natural Products; PSSC, protein structure similarity clustering; 11βHSD, 11β-hydroxysteroid dehydrogenase; 11βHSD1, 11βHSD type 1; 11βHSD2, 11βHSD type 2; GR, glucocorticoid receptor; GA, glycyrrhetinic acid.
¶ To whom correspondence should be addressed. E-mail: herbert.waldmann{at}mpidortmund.mpg.de.
© 2005 by The National Academy of Sciences of the USA