Structural basis of sequence-specific collagen recognition by SPARC
Edited by John Kuriyan, University of California, Berkeley, CA, and approved October 14, 2008 (received for review August 26, 2008)

Abstract
Protein interactions with the collagen triple helix play a critical role in collagen fibril formation, cell adhesion, and signaling. However, structural insight into sequence-specific collagen recognition is limited to an integrin-peptide complex. A GVMGFO motif in fibrillar collagens (O denotes 4-hydroxyproline) binds 3 unrelated proteins: von Willebrand factor (VWF), discoidin domain receptor 2 (DDR2), and the extracellular matrix protein SPARC/osteonectin/BM-40. We report the crystal structure at 3.2 Å resolution of human SPARC bound to a triple-helical 33-residue peptide harboring the promiscuous GVMGFO motif. SPARC recognizes the GVMGFO motifs of the middle and trailing collagen chains, burying a total of 720 Å² of solvent-accessible collagen surface. SPARC binding does not distort the canonical triple helix of the collagen peptide. In contrast, a critical loop in SPARC is substantially remodelled upon collagen binding, creating a deep pocket that accommodates the phenylalanine residue of the trailing collagen chain (“Phe pocket”). This highly restrictive specificity pocket is shared with the collagen-binding integrin I-domains but differs strikingly from the shallow collagen-binding grooves of the platelet receptor glycoprotein VI and microbial adhesins. We speculate that binding of the GVMGFO motif to VWF and DDR2 also results in structural changes and the formation of a Phe pocket.
Collagen, the most abundant protein in vertebrates, is characterized by a triple-helical structure of 3 polypeptide chains containing repetitive glycine-X-X′ triplets; the X and X′ positions, respectively, are often occupied by the imino acids proline and 4-hydroxyproline (O). The 28 human collagens have numerous essential functions in tissue formation, stability and homeostasis, and mutations in collagen genes cause many human diseases (1). Collagens I–III, V and XI form supramolecular fibrils that lend mechanical stability to the vertebrate body. Normal collagen fibrillogenesis in vivo requires a number of globular proteins interacting with the triple helix, such as small leucine-rich repeat proteoglycans (2). Cellular interactions with collagen are mediated by a diverse group of transmembrane collagen receptors, including integrins, discoidin domain receptors (DDRs) and members of the Ig superfamily (3–5). Although the structure of the collagen triple helix has been known for >50 years (6), protein-collagen interactions remain poorly understood at the atomic level.
Synthetic triple-helical peptides have been invaluable in mapping specific binding sites within collagen (7). The major integrin-binding site in fibrillar collagens I and II is a GFOGER motif (8, 9). The atomic details of this important interaction were revealed by a crystal structure of the integrin α2 I-domain bound to a 21-residue triple-helical peptide, providing the only example to date of a vertebrate protein-collagen complex (10). More recent studies have identified a GVMGFO motif, conserved in collagens I–III and situated ≈100 residues (≈30 nm) upstream of the GFOGER motif, as a binding hotspot for 3 structurally and functionally distinct proteins: the plasma protein von Willebrand factor (VWF), whose interaction with collagen contributes to haemostasis (11); the receptor tyrosine kinase DDR2 (12); and the extracellular matrix protein SPARC (13).
SPARC (also called osteonectin or BM-40) is a small evolutionarily conserved glycoprotein that modulates cell-matrix interactions and collagen assembly (14, 15). SPARC is essential for embryo development in invertebrates (16, 17) and has been suggested to act as a chaperone for basement membrane collagen IV (17). Mice lacking SPARC develop early onset cataract, lax skin and bone loss; this phenotype is likely to result, at least in part, from perturbed collagen fibrillogenesis or cell adhesion (14, 15). Human SPARC consists of an acidic 52-residue segment followed by a follistatin-like (FS) domain and an α-helical domain (EC) containing 2 unusual calcium-binding EF-hands and the collagen-binding site (18–20). Tissue-derived SPARC is processed by proteolytic cleavage in the EC domain, resulting in a ≈10-fold increase in collagen affinity. This effect can be mimicked by recombinant deletion of helix C (ΔαC mutant) (20). To obtain insight into the molecular mechanism of collagen recognition by SPARC, we have determined the crystal structure of a complex between SPARC FS-EC ΔαC and a triple-helical collagen peptide containing the GVMGFO motif.
Results and Discussion
Structure of the SPARC-Collagen Complex.
SPARC FS-EC ΔαC binds with high affinity to residues 564–590 of the human collagen α1(III) chain (corresponding to 397–423 of the triple-helical domain) (13). A 33-residue peptide containing this collagen III region preceded by 2 GPO repeats forms a stable triple helix and a 1:1 complex with SPARC FS-EC ΔαC (data not shown). We determined the crystal structure of this SPARC-collagen complex at 3.2 Å resolution by a combination of molecular replacement and heavy atom phasing [supporting information (SI) Table S1 and Fig. S1].
The collagen peptide is a straight triple helix of ≈95 Å length (Fig. 1). The 3 collagen chains are coiled around each other with the characteristic 1-residue stagger, leading to the designation of a leading, middle and trailing chain (10). The triple-helical twist does not vary much along the collagen peptide: the GPO-rich N terminus is close to an ideal 7₅ helical symmetry, whereas the remainder of the helix assumes a twist halfway between 7₅ and 10₇ symmetry (21, 22). SPARC is bound to the C-terminal half of the collagen peptide, burying 720 Å² of solvent-accessible collagen surface. Residues 21–23 of the middle chain and residues 17–27 of the trailing chain make contacts closer than 4.0 Å with SPARC; the leading chain does not contribute to SPARC binding (Fig. 2 and Table S2). Thus, the SPARC-binding site observed in the complex structure spans 4 collagen triplets, GQOGVMGFOGPK (570–581 of the collagen α1(III) chain). In principle, SPARC could equally well bind to the leading and middle collagen chains, whose relative disposition is identical to that of the middle and trailing chains. As in the case of the integrin-collagen complex (10), the observed mode of binding is likely to be determined by crystal packing forces.
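As a rough consistency check on the lengths quoted in this article (a 33-residue peptide of ≈95 Å, and ≈100 residues spanning ≈30 nm between the GVMGFO and GFOGER motifs), the following minimal Python sketch multiplies the residue count by the axial rise per residue. The rise of 2.86 Å is a commonly quoted textbook value for the collagen triple helix and is an assumption here, not a number taken from this paper; the function name is likewise illustrative.

```python
# Assumption: axial rise per residue of a collagen triple helix, in Å.
# This is a textbook value, not a parameter reported in the paper.
RISE_PER_RESIDUE_A = 2.86


def helix_length_angstrom(n_residues, rise=RISE_PER_RESIDUE_A):
    """Approximate end-to-end length of a straight triple helix in Å."""
    return n_residues * rise


if __name__ == "__main__":
    # 33-residue peptide used for crystallization.
    print(f"33 residues:  {helix_length_angstrom(33):.1f} Å")
    # Approximate spacing between the GVMGFO and GFOGER motifs (10 Å = 1 nm).
    print(f"100 residues: {helix_length_angstrom(100) / 10:.1f} nm")
```

Under this assumption the estimates come out close to the rounded figures given in the text; small differences are expected because the stagger between chains and the local variation in helical twist are ignored.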
Fig. 1. Crystal structure of SPARC FS-EC ΔαC bound to a 33-residue collagen peptide (stereoview). The FS and EC domains of SPARC are in green and cyan, respectively. Disulphide bridges are in pale pink, the glycan attached to N99 is in gray, and a calcium ion is shown as a purple sphere. The collagen peptide is shown as a Cα ribbon (leading chain, yellow; middle chain, orange; trailing chain, red). The chain termini, selected helices and the location of the αC deletion are labeled.
Fig. 2. Details of the SPARC-collagen interaction. (A) Sequence of the collagen peptide, indicating the 1-residue stagger between the chains. The sequence numbering of the leading strand is indicated at the top, and the SPARC-binding motifs of the middle and trailing chains are colored. (B) Surface representation of the collagen binding site of SPARC. The FS and EC domains of SPARC are in green and cyan, respectively. The collagen peptide is shown as a Cα ribbon (leading chain, yellow; middle chain, orange; trailing chain, red). Selected residues of the middle and trailing chains are shown in atomic detail and are labeled. (C) Interactions of SPARC with the trailing chain. SPARC is shown as a ribbon diagram with semitransparent surface rendering. Selected SPARC residues are shown in atomic detail and are labeled. The collagen leading and middle chains are shown as a Cα ribbon and the trailing chain is shown in atomic detail. Hydrogen bonds are indicated by dashed lines. (D) Interactions of SPARC with the collagen middle chain.
Only the SPARC EC domain interacts with the collagen peptide, although the FS domain comes close (4.5 Å) to V20 of the middle chain (Fig. 2B). The collagen-binding site of the SPARC EC domain is composed of the long αA helix and the adjacent αE-αF loop (Fig. 1). These 2 elements form a deep pocket that accommodates F23 of the collagen trailing chain (Fig. 2C and Table S2). This “Phe pocket,” the most notable feature of the SPARC-collagen interface, is delineated by several apolar SPARC residues (F146, M150, W153, L242) and, surprisingly, a salt bridge between R149 and E246. The refined side chain conformation of the collagen F23 in the Phe pocket is unusual and allows the phenyl ring of F23 to approach SPARC residue W153 edge-on, suggestive of a C-H···π hydrogen bond (23). A similar interaction may also be formed with the carboxylate group of E246.
SPARC residues W153 and L242 are also involved in the second major apolar SPARC-collagen contact, with M21 of the collagen middle chain (Fig. 2D and Table S2). Finally, F23 of the middle chain is situated atop SPARC residue M245 at the apex of the αE-αF loop. In addition to these apolar interactions, a total of 6 direct hydrogen bonds are formed between SPARC and the collagen peptide, 5 with the collagen main chain and 1 with the hydroxyl group of O24 of the middle chain. Notably, all polar residues at the rim of the Phe pocket (R149, E246, the indole nitrogen of W153) are engaged in hydrogen bonding with the collagen peptide (Fig. 2 C and D).
Previous studies have shown that the glycan attached to N99 in the FS domain modulates the affinity of SPARC binding to fibrillar collagens I and V (24, 25). In our structure, the N-acetylglucosamine moieties attached to N99 are at >10 Å distance from the collagen peptide (Fig. 1), but the glycan may cause steric hindrance when SPARC binds to fibrillar collagens. Recently, an atomic model of the collagen I fibril structure has been reported, based on diffraction patterns of fibrils from rat tail tendon (26). Inspection of this structure shows that the SPARC-binding site is largely exposed on the flat surface of the fibril's gap zone (data not shown). Even so, a certain amount of local unfolding (disorder) of the fibril surface would appear to be required for SPARC binding to occur.
Structural Changes Within SPARC.
Collagen binding is accompanied by substantial structural rearrangements within SPARC. Most notably, the crucial Phe pocket is not present in free SPARC, but is only formed upon collagen binding by extensive remodelling of the αE-αF loop in the EC domain (Fig. 3). In the 3 different structures of the uncomplexed SPARC EC domain (18–20), this loop acts like a flap, with I243 and P244 plugging the pocket and burying most of the critical W153 side chain (Fig. 3A). In the complex with collagen, the αE helix is extended by 2 irregular turns (there are prolines at positions 237, 241 and 244), thus opening up the pocket and completing its rim by forming the R149-E246 salt bridge (Fig. 3 B and C).
Fig. 3. Structural changes within SPARC upon collagen binding. (A) Surface representation of the collagen binding region in apo SPARC. The FS and EC domains are in green and cyan, respectively. Residues implicated in collagen binding by mutagenesis (20) are in purple and labeled. W153 is shown in blue and the location of P244 (see Structural Changes within SPARC) is indicated. (B) Surface representation of the collagen-binding site in the SPARC-collagen complex. The color scheme is the same as in A. The side chain of F23 of the collagen trailing chain is shown in atomic detail (red) to facilitate comparison with Figs. 1 and 2. (C) Stereoview of a superposition of the region surrounding the Phe pocket in apo SPARC (light brown) and the SPARC-collagen complex (cyan). Selected residues are shown in atomic detail and are labeled. F23 of the collagen trailing chain is shown as a semitransparent space-filling model (red).
The SPARC-collagen crystals grown at pH 5.5 are missing a calcium ion in the second EF-hand (Fig. S2), but we are confident that the remodelling of the αE-αF loop is a consequence of collagen binding and not of incomplete calcium loading. Crucially, loss of 1 of the 2 calcium ions does not compromise the native structure of the EF-hand pair. The partially and fully calcium-loaded EF-hand structures can be superimposed with a root-mean-square deviation of 0.59 Å for 44 Cα atoms, whereas complete removal of calcium from SPARC is known to result in large conformational changes and abolish collagen binding (27). The integrity of the EF-hand pair in the partially loaded state appears to be maintained by hydrogen bonding between the calcium ligands and the C256-C272 disulphide bridge at the empty EF-hand. Besides the αE-αF loop, there are other differences between free and collagen-bound SPARC, including a reorientation of the FS domain and changes in the αA-αB loop (Fig. S2); however, these changes are distant from the SPARC-collagen interface and likely to be the result of crystal packing forces.
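For readers unfamiliar with the root-mean-square deviation quoted above for the EF-hand pair, the following minimal Python sketch shows how such a value is computed from two matched sets of Cα coordinates. It assumes the two coordinate sets have already been optimally superimposed (the fitting step itself is not shown), and the function name and toy coordinates are illustrative only; this is not the program used by the authors.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in Å between two equally long lists of (x, y, z) tuples.

    Assumption: the two structures are already superimposed, so the
    function only measures the residual deviation between atom pairs.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy example: two atoms, each displaced by 1 Å along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b))
```

In the comparison described in the text, the two lists would each hold the 44 Cα positions of the partially and fully calcium-loaded EF-hand pairs.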
Collagen binding to the integrin I-domain results in a conformational change that is linked to integrin signal transduction (10). Whether the structural changes within SPARC are functionally important is unknown. Computational docking of collagen triple helices is often used in the absence of experimental protein-collagen complex structures (11, 28, 29). Our observation that collagen binding causes substantial structural changes within SPARC cautions against the uncritical use of such docking procedures.
Agreement with Previous Mutagenesis.
In ref. 20, we showed that SPARC residues R149, N156, L242, M245 and E246 are important for binding to fibrillar collagen I and basement membrane collagen IV. All of these residues are now seen to make multiple interactions with the collagen peptide, with the exception of N156 (Figs. 2 and 3). This agreement is remarkable given the unexpected structural changes within SPARC upon collagen binding. The critical role of N156 demonstrated by mutagenesis could be explained if this residue were involved in a water-mediated hydrogen bond with collagen (water molecules are not resolved in our structure). Indeed, a canonical structural water molecule (30) is predicted to be present in an appropriate position on the trailing chain, bound between the peptide carbonyl group of G22 and the hydroxyl group of O24. Collagen triple helices are highly hydrated (30) and other structural water molecules may also contribute to the SPARC-collagen interaction.
Collagen Binding by Vertebrate and Invertebrate SPARCs.
The collagen-binding residues of human SPARC are highly conserved in all vertebrate and invertebrate SPARC sequences (Fig. S3 and data not shown), suggesting that all SPARCs bind collagen in a similar manner as the human orthologue. Indeed, collagen binding by invertebrate SPARCs has been demonstrated by genetic (17) and biochemical (31) experiments. Lacking an equivalent of helix αB, invertebrate SPARCs have a much shorter connection between helices αA and αD (Fig. S3) and may therefore not be subject to the same proteolytic activation mechanism as mammalian SPARCs (20).
Prediction of SPARC-Binding Sites in Collagens I–IV.
The detailed collagen sequence requirements for SPARC binding have not been probed biochemically (13). From our structure, we predict an absolute requirement for phenylalanine at position 23 of the trailing chain. The apolar nature of the Phe pocket would appear to preclude binding of polar side chains, whereas aliphatic side chains would not be able to fully occupy the available space. We further predict a preference for apolar side chains at positions 21 and 23 of the middle chain, and at position 20 of the trailing chain. Finally, position 24 of both chains should be hydroxyproline to allow (water-mediated) hydrogen bonding interactions with N156 and E246 (Fig. 2 C and D).
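The rules above can be illustrated with a minimal Python sketch for the simplest case, a homotrimer, in which the middle and trailing chains have the same sequence. The requirements then reduce to a hexapeptide G-x-x'-G-F-O, where x and x' correspond to positions 20 and 21 and are preferably apolar. The function name, the set of residues counted as apolar, and the 0 to 2 score are assumptions made for illustration and are not the procedure used in this study; the sketch does not handle heterotrimers or check the Gly-X-X′ triplet register.

```python
import re

# Assumption: residues treated as apolar for scoring (one-letter code).
APOLAR = set("AVLIMFW")


def find_candidate_sites(seq):
    """Scan one chain of a homotrimeric collagen for G-x-x'-G-F-O motifs.

    `seq` uses one-letter codes with O for 4-hydroxyproline. Returns a
    list of (start_index, hexapeptide, score), where score counts how
    many of the two variable positions carry an apolar residue.
    """
    sites = []
    # A lookahead is used so that overlapping candidates are not skipped.
    for match in re.finditer(r"(?=(G(.)(.)GFO))", seq):
        hexapeptide, x, x_prime = match.group(1), match.group(2), match.group(3)
        score = int(x in APOLAR) + int(x_prime in APOLAR)
        sites.append((match.start(), hexapeptide, score))
    return sites


if __name__ == "__main__":
    # Toy sequence (not a real collagen) containing the three hexapeptides
    # mentioned in the text, separated by GPO triplets.
    toy = "GPOGVMGFOGPOGATGFOGPOGAAGFO"
    for start, motif, score in find_candidate_sites(toy):
        print(start, motif, score)
```

With this scoring, GVMGFO and GAAGFO satisfy both apolar preferences, whereas GATGFO satisfies only one, because threonine is not in the assumed apolar set.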
We have used these structure-derived rules to predict further SPARC-binding sites in the fibrillar collagens I–III and in basement membrane collagen IV. Collagen I is a [α1(I)]₂α2(I) heterotrimer, and we assumed that the 2 α1(I) chains are adjacent (i.e., not leading and trailing). Collagens II and III are homotrimers. The major SPARC-binding site in the fibrillar collagens I–III, GVMGFO, is ≈600 residues from the C terminus of the triple helix (13). The only other plausible site from sequence analysis is a GATGFO sequence (GAAGFO in collagen III) halfway between the GVMGFO motif and the C terminus. Rotary shadowing electron micrographs indeed show a secondary SPARC-binding site in this region (13). The 6 distinct mammalian collagen IV chains assemble into 3 heterotrimers, of which the [α1(IV)]₂α2(IV) heterotrimer is the most abundant; invertebrates have only 1 type of collagen IV heterotrimer (1, 32). There is experimental evidence that the α2(IV) chain in the human [α1(IV)]₂α2(IV) heterotrimer is trailing (33). We have identified 4 putative SPARC-binding sites in the human [α1(IV)]₂α2(IV) heterotrimer, 2 of which are almost perfect matches for the structure-derived consensus (Fig. 4). Rotary shadowing electron micrographs indeed show multiple SPARC-binding sites along the human collagen IV triple helix (34), but the data do not permit the precise locations to be determined. It would be interesting to approach this question using peptide libraries (7), but the synthesis of the required heterotrimeric collagen peptides is technically challenging.
Fig. 4. Putative SPARC-binding sites in collagen IV. Shown are partial sequences of human collagen III (SwissProt entry P02461) and collagen IV (α1 chain, P02462; α2 chain, P08572). The SPARC-binding site in collagen III is highlighted; residues that are predicted to be strictly required for SPARC binding (see Prediction of SPARC-Binding Sites in Collagens I–IV) are in red, residues that should be apolar are in orange. The same coloring scheme is used to indicate the 4 putative SPARC-binding sites in collagen IV.
Comparison with Other Collagen-Binding Proteins.
The GVMGFO motif recognized by SPARC bears little similarity to the GFOGER motif recognized by integrins, yet the 2 proteins appear to achieve sequence specificity by similar means. SPARC uses a restrictive pocket to bind the phenylalanine side chain of the GVMGFO motif (Fig. 2 B and C). Similarly, the crucial glutamic acid side chain of the integrin-binding GFOGER is bound in a deep pocket, coordinating the divalent metal ion at the metal ion-dependent adhesion site (10). SPARC and the integrin I-domain bury a total of 720 Å² and 660 Å² of solvent-accessible collagen surface, respectively, with both proteins undergoing structural changes upon collagen binding. One notable difference is that the collagen peptide is kinked in the complex with the integrin I-domain, but remains completely straight in the complex with SPARC.
VWF and DDR2 recognize the same GVMGFO motif as SPARC, and the phenylalanine side chain in collagen is critical for all 3 interactions (11, 12). It is thus reasonable to assume that SPARC, VWF and DDR2 share a common mode of collagen recognition, in particular with regard to the critical phenylalanine residue. Whether the 3 proteins compete for collagen in a physiological situation remains to be studied.
The collagen-binding A3 domain of VWF has been crystallized and its collagen binding site mapped by NMR, antibody blocking and mutagenesis experiments (35–38). The collagen-binding site delineated by these studies is a shallow groove, with no obvious Phe pocket. We hypothesize that a Phe pocket is formed upon collagen binding to the VWF A3 domain and that W982, which lies buried beneath the collagen-binding site, is part of this pocket. In support of this hypothesis, there is evidence for conformational variability in this region: the 4 crystallographically independent structures of the A3 domain show substantial differences in the vicinity of W982 (35, 36), and fluorescence experiments have shown that W982 exists in multiple rotamers (39).
The collagen-binding site in the discoidin domain of DDR2 is also a shallow groove, into which a generic (GPO)n triple helix could be docked (29). How DDR2 achieves sequence specificity was not revealed by this study, however. A likely candidate for binding the phenylalanine side chain of the GVMGFO motif is the surface-exposed side chain of W52, whose mutation to alanine abolishes collagen binding (29). Given that the collagen-binding groove of DDR2 is made up of several long loops, pronounced structural changes upon collagen binding are to be expected.
Platelet glycoprotein VI (GPVI) binds to certain GPO-rich collagen sequences (40). Structure determination of the GPVI ectodomain has revealed a shallow groove appropriate for binding a (GPO)n triple helix (28), but the structural basis for sequence-specific collagen binding remains unknown. Finally, a number of microbial collagen receptors bind triple-helical collagen without any apparent sequence specificity. The structure of Staphylococcus aureus CNA in complex with a (GPO)n collagen peptide has revealed a mode of binding termed the “collagen hug” (41). The collagen triple helix is clamped by the 2 CNA domains, 1 of which accommodates the GPO repeats of the peptide in a relatively featureless and mostly hydrophobic groove.
In summary, our structure has revealed how the hydrophobic GVMGFO motif in fibrillar collagens is recognized by SPARC. An unexpected conformational change in SPARC creates a deep specificity pocket that binds the phenylalanine side chain of the GVMGFO motif. The same motif is also recognized by VWF and DDR2, and we predict that collagen binding to these proteins results in the formation of similar Phe pockets.
Methods
Peptide Synthesis, Protein Expression, and Complex Formation.
The collagen peptide was prepared by solid-phase synthesis as described in SI Materials and Methods. SPARC FS-EC ΔαC was produced in human embryonic kidney 293 cells as described in ref. 20. The SPARC-peptide complex was formed by mixing protein and peptide in a ≈1:1.5 ratio and purified by size exclusion chromatography as described in SI Materials and Methods.
Crystallization, Data Collection, and Structure Determination.
The SPARC-collagen complex was crystallized as described in SI Materials and Methods. X-ray diffraction data were collected and processed as described in SI Materials and Methods. The structure of the SPARC complex was solved by a combination of molecular replacement and SIRAS techniques, and refined to an R-factor of 0.254 (Rfree = 0.320) at 3.2 Å resolution, as described in SI Materials and Methods. Data processing, phasing and refinement statistics are listed in Table S1.
Acknowledgments
We thank Noemi Fukuhara, Federico Carafoli, and the staff at beamline 10.1 at the SRS Daresbury for help with data collection; Peter Brick for help with SHARP; and Jordi Bella for help with collagen refinement and analysis. This work was supported by grants from the Wellcome Trust (E.H.) and from Shriner's Hospital for Children (H.P.B.). E.H. is a Wellcome Senior Research Fellow.
Footnotes
- ¹To whom correspondence should be addressed. E-mail: e.hohenester@imperial.ac.uk
- Author contributions: E.H., T.S., R.W.F., and H.P.B. designed research; E.H., T.S., C.G., and H.P.B. performed research; E.H., T.S., and H.P.B. analyzed data; and E.H., R.W.F., and H.P.B. wrote the paper.
- The authors declare no conflict of interest.
- This article is a PNAS Direct Submission.
- Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.PDB.org (PDB ID code 2V53).
- This article contains supporting information online at www.pnas.org/cgi/content/full/0808452105/DCSupplemental.
- © 2008 by The National Academy of Sciences of the USA
References
- ↵
- ↵
- Ameye L,
- Young MF
- ↵
- ↵
- ↵
- Sweeney SM,
- et al.
- ↵
- ↵
- ↵
- Knight CG,
- et al.
- ↵
- Xu Y,
- et al.
- ↵
- ↵
- Lisman T,
- et al.
- ↵
- Konitsiotis AD,
- et al.
- ↵
- Giudici C,
- et al.
- ↵
- ↵
- Rentz TJ,
- et al.
- ↵
- ↵
- Martinek N,
- Shahab J,
- Saathoff M,
- Ringuette M
- ↵
- ↵
- ↵
- ↵
- Bella J,
- Eaton M,
- Brodsky B,
- Berman HM
- ↵
- ↵
- ↵
- Xie RL,
- Long GL
- ↵
- Kaufmann B,
- et al.
- ↵
- Orgel JP,
- Irving TC,
- Miller A,
- Wess TJ
- ↵
- ↵
- Horii K,
- Kahn ML,
- Herr AB
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Bienkowska J,
- et al.
- ↵
- ↵
- ↵
- Romijn RA,
- et al.
- ↵
- ↵
- Jarvis GE,
- et al.
- ↵