Xenoprotein engineering via synthetic libraries
Edited by David Baker, University of Washington, Seattle, WA, and approved April 20, 2018 (received for review December 29, 2017)

Significance
Combinatorial protein libraries—prepared via molecular biology-based approaches—are invaluable tools for protein engineering. The inclusion of noncanonical amino acids in such libraries is of considerable interest. However, at present no approach competes with chemical synthesis in terms of the variety and number of noncanonical amino acids that can be simultaneously incorporated into a protein molecule. Here, we describe selection from synthetic libraries as a strategy for protein engineering. The approach enables identification of small (∼30 aa), functional protein variants comprising a virtually unlimited variety of noncanonical amino acids. Increasing the throughput of synthetic library screening, which was achieved through this effort, is anticipated to improve the utility of synthetic libraries for identifying polypeptide-based ligands with de novo function.
Abstract
Chemical methods have enabled the total synthesis of protein molecules of ever-increasing size and complexity. However, methods to engineer synthetic proteins comprising noncanonical amino acids have not kept pace, even though this capability would be a distinct advantage of the total synthesis approach to protein science. In this work, we report a platform for protein engineering based on the screening of synthetic one-bead one-compound protein libraries. Screening throughput approaching that of cell surface display was achieved by a combination of magnetic bead enrichment, flow cytometry analysis of on-bead screens, and high-throughput MS/MS-based sequencing of identified active compounds. Direct screening of a synthetic protein library by these methods resulted in the de novo discovery of mirror-image miniprotein-based binders to a ∼150-kDa protein target, a task that would be difficult or impossible by other means.
Xenoproteins, protein molecules composed of noncanonical amino acids, might exhibit function not readily achieved by the use of proteinogenic amino acids alone (1–9) and other favorable properties such as protease stability or altered immunogenicity (10). Chemical protein synthesis is a powerful approach for incorporating a virtually unlimited variety of noncanonical amino acids into a protein molecule for biophysical and structure–function studies (11–20). However, no experimental approach exists to differentiate beneficial from deleterious mutations in a synthetic protein molecule with significant throughput. Such an approach would facilitate the discovery of functional xenoproteins, since incorporation of noncanonical amino acids into protein molecules is frequently deleterious (21–23).
Identification of functional variants from synthetic combinatorial libraries is a potential solution to the challenge of xenoprotein engineering (24–26). The synthesis of small proteins is straightforward with solid-phase peptide synthesis (SPPS); however, the screening of high-diversity synthetic libraries presents a formidable challenge. Generally, synthetic libraries have been limited to peptide and peptidomimetic structures comprising relatively short polymers with a limited number of varied positions. This is in contrast to molecular biology-based screening and selection strategies, which can routinely examine at least 10⁷–10⁸ variants of large protein molecules, such as antibody fragments, that contain point mutations across the entire polypeptide chain (27).
The one-bead one-compound (OBOC) approach (28) is amenable to the preparation of high-diversity synthetic libraries, provided that sufficiently small resin beads are employed. For example, 1 g of 10-μm TentaGel resin contains ∼2 × 10⁹ beads. The screening of at least ∼3 × 10⁶ compounds by the OBOC approach has been reported (29); however, typical screens examine fewer than 10⁶ compounds (30). The challenge of handling high-diversity OBOC libraries can be attributed to the labor associated with manually screening large numbers of beads (31) and to de novo sequencing of active compounds using the limited amount of material present on 30- or 10-μm beads (4 pmol or 100 fmol, respectively, for amine loadings of 0.2 mmol/g).
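The per-bead quantities quoted above follow from dividing the resin loading by the number of beads per gram. The short Python sketch below reproduces that arithmetic. It assumes uniform loading across beads, and the function names are illustrative only. The bead count for the 30-μm resin is not stated in this paragraph, so it is back-calculated here from the quoted 4 pmol per bead rather than taken from a measurement.

```python
def per_bead_loading_mol(loading_mmol_per_g: float, beads_per_g: float) -> float:
    """Moles of synthesis sites on a single bead, assuming uniform loading."""
    return loading_mmol_per_g * 1e-3 / beads_per_g


def implied_beads_per_g(loading_mmol_per_g: float, per_bead_mol: float) -> float:
    """Beads per gram implied by a resin loading and a per-bead quantity."""
    return loading_mmol_per_g * 1e-3 / per_bead_mol


# 10-um resin: 0.2 mmol/g loading and ~2e9 beads per gram (values from the text)
ten_um_mol = per_bead_loading_mol(0.2, 2e9)

# 30-um resin: 4 pmol per bead at 0.2 mmol/g (values from the text);
# the bead count is derived here, not reported in this paragraph
thirty_um_beads = implied_beads_per_g(0.2, 4e-12)

print(f"10-um bead: {ten_um_mol * 1e15:.0f} fmol per bead")
print(f"30-um resin: about {thirty_um_beads:.1e} beads per gram (implied)")
```

The 40-fold difference in material per bead is the practical trade-off between library diversity per gram of resin and the amount of peptide available for sequencing.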
We set out to render practical the screening of high-diversity synthetic libraries and to explore the use of synthetic protein libraries for engineering de novo binding activity into a mirror-image miniprotein molecule (MIM). This task was chosen to illustrate the utility of chemical synthesis for xenoprotein engineering, since a MIM is composed entirely of noncanonical amino acids, with the exception of glycine. The overall approach is shown in Fig. 1. Key steps are the synthesis of folded protein variants bound to 30-μm beads, high-throughput analysis of on-bead screens by flow cytometry, and de novo sequencing of identified active compound mixtures by a recently developed liquid chromatography/tandem mass spectrometry (LC-MS/MS) approach (32). These steps were individually optimized and then combined to discover MIM-based binders to a monoclonal antibody target. The results constitute important proof of concept for the synthetic library approach to xenoprotein engineering, which should in principle be applicable to the discovery of functional xenoproteins based on a variety of other small protein scaffolds.
Fig. 1. An approach to identifying binders from synthetic protein libraries. Thirty-micrometer beads displaying xenoprotein variants are prepared by a combination of stepwise SPPS and on-bead folding. The resulting beads are incubated with a protein target bearing a fluorescent label (red star), and beads displaying functional xenoprotein variants are isolated by fluorescence-activated sorting (FACS). The sequences of xenoproteins contained on sorted beads are then determined by de novo MS/MS-based peptide sequencing.
Results and Discussion
EETI-II Is a Robust Scaffold for the Display of Chemical Diversity.
As a molecular scaffold (33) for the generation of a mirror-image protein library we selected the trypsin inhibitor from Ecballium elaterium, EETI-II (Fig. 2). This 29-residue protein molecule is amenable to chemical synthesis (34), is known to oxidatively fold spontaneously despite sequence variation of the trypsin-binding loop (23, 35–38), and has been successfully engineered to bind αvβ3 and αvβ5 integrins based on the Arg–Gly–Asp motif using yeast-surface display (39, 40). The broad tolerance of the EETI-II molecule to loop expansion and sequence variation (38) is typical of cystine knot proteins generally (41), and we anticipated that this feature could be leveraged to prepare synthetic libraries of folded variants for the selection of novel binding proteins. To confirm this important property of the EETI-II molecule, we tested the ability of a number of synthetic EETI-II variants, containing nonnative loop sequences, to fold spontaneously (Fig. 2C and SI Appendix, section 3). In all cases (20 shown), a single major product containing three disulfide bonds was formed. These data suggest that in a combinatorial library of EETI-II variants, many will be present as folded cystine knots, which likely retain the disulfide connectivity and general tertiary structure of the native EETI-II molecule (37, 42).
Fig. 2. EETI-II is a robust scaffold for the display of chemical diversity. (A) Amino acid sequences of EETI-II and a synthetic library scaffold based on mirror-image EETI-II. d-amino acids are in lowercase; cysteine residues (all in disulfide form) are underlined. Residues corresponding to the trypsin-binding loop are shown in green. (B) Cartoon rendering of the EETI-II molecule, based on Protein Data Bank ID code 1W7Z. The trypsin-binding loop is shown in green, and the three disulfide bonds as yellow sticks. (C) LC-MS data showing the spontaneous oxidative folding of a synthetic EETI-II variant with a nonnative trypsin-binding loop sequence. A loss of 6 Da was observed upon treatment with soluble redox buffer (SI Appendix, section 3), consistent with the formation of three disulfide bonds. Calculated and observed monoisotopic masses are indicated. (D) Stability of l- and d-forms of two engineered EETI-II variants to proteinase K. The fraction of intact protein remaining at each time point was determined by LC-MS–based quantitation (average of two measurements).
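The 6-Da mass loss used as evidence of folding reflects the two hydrogen atoms lost for each disulfide bond formed from a pair of cysteine thiols. The sketch below is a minimal illustration of that arithmetic. The function name is hypothetical, and the only input beyond the text is the monoisotopic mass of hydrogen.

```python
H_MONOISOTOPIC_DA = 1.007825  # monoisotopic mass of 1H in daltons


def disulfide_mass_loss(n_disulfides: int) -> float:
    """Expected monoisotopic mass loss: each disulfide bond removes two H atoms."""
    return n_disulfides * 2 * H_MONOISOTOPIC_DA


for n in (1, 2, 3):
    print(f"{n} disulfide(s): -{disulfide_mass_loss(n):.3f} Da")
```

By the same arithmetic, a two-disulfide intermediate differs from the fully oxidized product by about 2 Da, which is why the two species can be told apart in the LC-MS data.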
The susceptibility of l-polypeptides to protease digestion is a major drawback to their use in vivo, which has motivated the development of binding molecules based on d-polypeptides (43, 44) and d-proteins (10). Cystine knot proteins sometimes exhibit protease resistance, but this is not general (45). To demonstrate a significant advantage of d-protein molecules, we investigated the stability of several mirror-image EETI-II variants to proteinase K. Minimal degradation was observed over a period of 24 h, whereas the corresponding l-proteins survived less than 1 h under identical conditions (Fig. 2D and SI Appendix, section 4). Taken together, these data support the notion that mirror-image EETI-II is a robust scaffold for the generation of binding molecules, which may be of practical utility in biotechnology.
Protein-Based Fluorophores Enable Flow Cytometry Analysis of On-Bead Assays.
Flow cytometry is a powerful technique for high-throughput analysis of cell-surface display libraries (46–48), and we sought to adopt its use for bead-based libraries, to increase screening throughput (49–51). An initial goal was to maximize the fluorescence contrast between beads displaying a known protein ligand versus beads displaying nonbinding ligands, after incubation with a fluorescently labeled target protein. We used SPPS to prepare 30-μm TentaGel beads functionalized with the streptavidin (SA)-binding peptide StrepTag II (52) and assayed them for binding to a variety of commercially available SA conjugates alongside samples of beads displaying random peptides. SA conjugates were chosen based on compatibility with 633-nm excitation, to minimize bead autofluorescence, and included SA-Alexa Fluor 633 (SA-AF633), SA-allophycocyanin (SA-APC), and SA-quantum dot 655 (SA-QD).
Of the fluorophores examined, only the protein-based APC gave acceptable contrast between StrepTag II beads and library beads (Fig. 3A and SI Appendix, section 6). The comparatively poor performance of AF633 and QD fluorophores was due to a combination of 10- to 15-fold higher background fluorescence of the library beads, and approximately fivefold lower fluorescence of the bound StrepTag II beads. The propensity of highly charged, small-molecule fluorophores such as AF to bind nonspecifically to library beads has been noted (53) and may be responsible for the high level of background binding observed with SA-AF633.
Fig. 3. Flow cytometry is amenable to monitoring on-bead binding assays. (A) Fluorescence histograms showing the contrast between library beads (cyan) and StrepTag II beads (maroon) achieved by the use of different fluorescent SA reagents. (B) Fluorescence histograms showing protein target-dependent binding to beads functionalized with the indicated peptide ligands. (C) A plot of the fluorescence means obtained by treatment of biotin-functionalized beads (four different bead loadings) with increasing concentrations of SA-APC. Gly-functionalized beads served as a negative control. (D) A plot of the fluorescence means obtained by treatment of IgG1 binder-functionalized beads with increasing concentrations of a fixed molar ratio of SA-APC and biotinylated polyclonal IgG1. Library beads served as a negative control.
SA-APC enabled detection of protein-specific binding to beads functionalized with ligands to a variety of different protein targets. For example, incubation of beads functionalized with either the nine-residue linear epitope (HA epitope) of an anti-hemagglutinin monoclonal antibody (clone 12CA5; anti-HA mAb 12CA5) (Fig. 3B, Middle) or an IgG1 Fc-binding peptide (Fig. 3B, Right) (54) with either SA-APC or a mixture of SA-APC and the appropriate biotinylated protein target resulted in target-dependent gains in fluorescence intensity. The ∼100-fold difference in fluorescence intensity observed for bound versus unbound beads was comparable to that observed in cell-surface binding assays (55); therefore, we concluded that flow cytometry is suitable for the analysis of on-bead binding assays, and potentially for on-bead screens (discussed below).
On-Bead Binding Assays Are Highly Sensitive.
For cell-surface binding assays, the fluorescence intensity of a cell displaying a bound ligand is directly proportional to ligand expression level (47, 55). To account for this effect, which could otherwise lead to the identification of weak binders displayed at high copy number, target binding is normalized based on binding to a coexpressed affinity tag, in a two-color experiment. We sought to determine whether a similar procedure would be required for beads, which reportedly exhibit substantial variation in amine loading (56) (moles of synthetic product per individual bead). To address this question, we prepared biotinylated TentaGel samples of varying biotin loading and assayed them by flow cytometry after incubation with SA-APC.
The fluorescence intensity of bound biotin beads varied only fivefold over a 1,000-fold range of biotin loadings (Fig. 3C), and a similar outcome was obtained for beads functionalized with varying levels of StrepTag II (SI Appendix, Fig. S38). These outcomes were not due to fluorescence quenching at high bead loadings: When incubated with mixtures of unlabeled SA and SA-APC, the fluorescence intensity of biotin-functionalized beads varied proportionally to the ratio of fluorescent and unlabeled SA (SI Appendix, Fig. S39). We concluded that a strategy to normalize bead fluorescence based on amine loading would not be required for success with bead-based screens, since the fluorescence intensity of a bound bead was largely invariant with bead loading. This result could be understood if the amount of protein target present on a bound bead were small relative to the lowest bead loading studied. Prior studies of on-bead binding assays found that only ∼0.002 μmol/g of the ligand present on a bead is involved in binding (57), which supports this interpretation.
For a variety of particle-based screen technologies, avidity effects can result in the identification of weak ligands. For example, SA-binding peptides with Kd of ∼300 μM (58) are readily detected with both on-bead assays (28, 59) and by phage display (60). A loading of 2 μmol/g was selected for use in library synthesis, to minimize the identification of weak binders to the SA-APC component of stain reagents (59). Fig. 3D shows the differentiation of IgG1 binder-functionalized beads of 2 μmol/g loading from library beads, over a range of stain reagent concentrations.
Functional EETI-II Can Be Prepared on Beads.
We used an on-bead binding assay to test the combination of SPPS and on-bead folding for preparing functional miniproteins on beads, which was an important prerequisite for MIM library preparation. l-EETI-II was prepared by Fmoc chemistry SPPS on beads uniformly attenuated to 20 μmol/g amine loading, to minimize oxidative polymerization during on-bead folding (61) while retaining sufficient material for liquid chromatography-mass spectrometry (LC-MS) characterization (discussed below). Following SPPS and removal of side-chain protecting groups, beads were treated with the same conditions used for solution-based folding studies (SI Appendix, section 7). As a negative control, beads functionalized with EETI-II trypsin-binding loop sequence only were evaluated. These were expected not to bind trypsin, since intact disulfide bonds are required for trypsin binding (62).
EETI-II beads exhibited trypsin-dependent on-bead binding relative to library beads. In contrast, beads functionalized with trypsin-binding loop only were indistinguishable from library beads under the same conditions (Fig. 4 A and B and SI Appendix, section 8). These results were suggestive of the presence of folded EETI-II on beads, since the binding loop alone was insufficient for trypsin binding.
Fig. 4. Folded EETI-II can be prepared on beads. (A) Fluorescence histograms showing the contrast between library beads (cyan) and either EETI-II beads (maroon, Right) or trypsin-binding loop beads (maroon, Left) after incubation with a mixture of SA-APC (50 nM) and biotinylated trypsin (100 nM). (B) Fluorescence means obtained by treatment of the same samples with a fixed concentration of SA-APC (50 nM) and one of two different concentrations of biotinylated trypsin. Trypsin-dependent binding was observed for the EETI-II beads only. (C) LC-MS data showing the spontaneous oxidative folding of EETI-II while bound to beads. (D) LC-MS data showing the spontaneous oxidative folding of an analog of the engineered EETI-II variant 2.5F, while bound to beads; † denotes a three-disulfide-containing product and * denotes a two-disulfide-containing product. For C and D, the indicated mass values correspond to monoisotopic masses.
As a second means of characterizing synthetic EETI-II on beads, we used LC-MS to follow the progress of on-bead oxidative folding. EETI-II was synthesized on beads functionalized with a PAM ester linker (63), which was stable to the conditions of side-chain deprotection but cleavable by stronger acid in a separate step. By cleavage of bead-bound EETI-II either before or after a folding treatment, we were able to evaluate the outcome of on-bead folding by LC-MS.
On-bead folding conditions effected the conversion of EETI-II to a single three-disulfide-linked product (64), as for the solution-based folding studies (Fig. 4C). To investigate the prospect of preparing folded EETI-II variants with expanded loops—necessary to accommodate large libraries—we examined an analog of the engineered EETI-II variant 2.5F (39) in the same assay. In this case, the desired three-disulfide product formed in an approximately one-to-three ratio with a two-disulfide intermediate (Fig. 4D). These data support the notion that folded, synthetic EETI-II variants with randomized loop sequences can be prepared on beads. However, the possibility of synthetic coproducts participating in screens must be kept in mind, as for any bead or cell surface-based screen. In the case of libraries based on EETI-II, the two-disulfide folding intermediate may be a common and significant coproduct.
High-Diversity Mirror-Image EETI-II Libraries Were Prepared on Bilayer Beads.
Having demonstrated the preparation of beads displaying functional EETI-II, we proceeded to construct a mirror-image EETI-II-based library, for use in screening (Fig. 5). Several considerations factored into the design of this strategy. First was the need to display a low density of synthetic protein in the accessible portion of the bead, while maintaining an adequate amount of material for sequencing. This need suggested the use of spatially segregated resin particles (59), which were prepared by an enzymatic shaving approach (65) (SI Appendix, section 9). Second was our inability to reliably assign de novo sequences to MS/MS spectra obtained from polypeptides longer than ∼15 aa residues. This limitation suggested the use of a “coding structure” of amino acid sequence identical to the varied portion of the MIM, contained in the bead interior, and coupled to a cleavable linker for release postscreening. The coding structure was installed by removal of an Aloc group on the bead interior, after SPPS of the MIM “constant region” (Fig. 5).
Fig. 5. (A–C) Strategy for the preparation of a mirror-image EETI-II-based library.
Manual split-pool synthesis (28) of the MIM library was performed using 5 g of resin, which comprised more than 2 × 10⁸ individual beads (SI Appendix, section 10). For each split, 15 protected d-amino acid monomers were employed (Cys, Arg, Ile, Asn, and Gln were excluded). The use of 15 possible monomers at each of nine varied positions equated to a theoretical diversity of 3.8 × 10¹⁰ compounds, roughly 200 times the number of beads used for synthesis. Thus, the totality of possible sequences was undersampled, as for many combinatorial protein libraries (27). We hypothesized that higher diversity would increase the odds of identifying consensus binding sequence motifs without a major penalty, since nonredundant libraries could still be redundant with respect to such motifs.
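The degree of undersampling can be checked with a few lines of arithmetic. The sketch below computes the theoretical diversity and its ratio to the bead count from the figures given in the text. The final estimate of the fraction of sequence space represented is an added illustration, not a result from the study: it assumes that each bead carries an independent, uniformly random sequence, which is an idealization of split-pool synthesis.

```python
import math

monomers = 15      # d-amino acid monomers used at each split (from the text)
positions = 9      # varied positions (from the text)
beads = 2e8        # lower bound on beads in 5 g of resin (from the text)

diversity = monomers ** positions          # number of possible sequences
fold_undersampled = diversity / beads      # possible sequences per bead

# Assumption: beads sample sequences uniformly and independently, so the
# expected fraction of sequences present at least once is 1 - exp(-beads/diversity)
fraction_present = -math.expm1(-beads / diversity)

print(f"theoretical diversity: {diversity:.2e}")
print(f"sequences per bead:    {fold_undersampled:.0f}")
print(f"fraction represented:  {fraction_present:.4f}")
```

Under these assumptions only about half a percent of all possible nine-residue sequences is physically present, which is why redundancy at the level of short motifs, rather than full sequences, matters for a successful screen.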
Protein Target-Dependent Binding Limits the Enrichment Attainable by Fluorescence-Activated Bead Sorting.
With mirror-image EETI-II-based libraries in hand, we sought to determine the enrichment that could be achieved by flow cytometry selection for on-bead binding, where enrichment is defined as (positive beads/total beads)_sorted / (positive beads/total beads)_analyzed (46). Flow cytometry routinely achieves enrichments of ∼10,000 (46, 66) for cell-surface display libraries. However, the sorting of beads is thought to be complicated by high background due to autofluorescence (67). To understand the severity of this issue, and other variables that might affect enrichment, we studied the frequency of sorted beads (>6,000 fluorescence counts) for two different bead samples, underivatized beads and library beads, under a variety of assay conditions (Table 1 and SI Appendix, section 11.1). Sort frequency sets an upper bound on enrichment, which can be expressed as (total beads_analyzed/total beads_sorted) × (positive beads_sorted/positive beads_analyzed). Because the second factor (the recovery of positive beads) cannot exceed 1, the enrichment can be no greater than the reciprocal of the sort frequency.
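The relationship between enrichment and sort frequency can be made concrete with a short calculation. The sketch below implements the two expressions given above. The function names and the bead counts in the example are hypothetical, chosen only to show that the enrichment equals the sort-frequency bound multiplied by the recovery of positive beads.

```python
def enrichment(pos_sorted: int, total_sorted: int,
               pos_analyzed: int, total_analyzed: int) -> float:
    """(positive/total)_sorted divided by (positive/total)_analyzed."""
    return (pos_sorted / total_sorted) / (pos_analyzed / total_analyzed)


def max_enrichment(total_sorted: int, total_analyzed: int) -> float:
    """Upper bound on enrichment: the reciprocal of the sort frequency."""
    return total_analyzed / total_sorted


# Hypothetical counts, for illustration only
total_analyzed, pos_analyzed = 1_000_000, 10
total_sorted, pos_sorted = 500, 8

observed = enrichment(pos_sorted, total_sorted, pos_analyzed, total_analyzed)
bound = max_enrichment(total_sorted, total_analyzed)
recovery = pos_sorted / pos_analyzed

print(f"observed enrichment: {observed:.0f}")
print(f"upper bound:         {bound:.0f}")
print(f"bound x recovery:    {bound * recovery:.0f}")
```

Because the number of positive beads in a library is unknown before sequencing, only the upper bound can be computed directly from a sort, which is how the maximum enrichments in this section are reported.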
Table 1. Binding of protein targets to chemical functionality on beads limits the enrichment attainable by flow cytometry selection
Enrichment was limited primarily by the binding of protein targets to chemical functionality on beads. The process of solid-phase synthesis did result in a 10-fold increase in the frequency of autofluorescent beads, as determined by comparison of library beads and underivatized beads in the absence of fluorescent protein (Table 1, entries 1 and 2); however, incubation of underivatized beads with either of two SA-APC conjugates resulted in more than 10³-fold gains in sort frequency (Table 1, entries 3 and 4). Library beads displayed a higher degree of protein binding compared with underivatized beads, with sort frequencies 10³- to 10⁴-fold above the autofluorescence level (Table 1, entries 5 and 6 vs. entry 2). These sort frequencies corresponded to maximum enrichments of 15 or 5 for anti-HA mAb 12CA5 or thrombin, respectively, which were sufficiently poor as to make sorting untenable.
To minimize nonspecific binding of protein targets, we conducted assays in complex media. For both underivatized beads and library beads, use of FBS buffer resulted in lower sort frequencies compared with BSA alone (Table 1, entries 7–10). In FBS, library beads exhibited only slightly higher sort frequencies compared with underivatized beads, corresponding to maximum enrichment factors of ∼2 × 10³. Enrichment factors on this order are close to those obtained for the sorting of bacterial and yeast cells, suggesting that acceptable conditions for on-bead screening had been identified. However, since functional MIM variants were anticipated to be less frequent than one in several thousand, we concluded that a preenrichment step would be required for a successful screen. Magnetic bead enrichment was explored, since this procedure is used to reduce the initial size of cell surface display libraries before sorting (68) and is capable of retaining even modest-affinity (Kd ∼ micromolar) binders with high yield (69).
A Two-Stage Magnetic Bead Selection/On-Bead Screen Improves Enrichment for Active Beads.
We investigated the utility of magnetic bead selection for improving enrichments achieved in screens of the mirror-image EETI-II library against anti-HA mAb 12CA5 and human α-thrombin, respectively. Library beads (400 mg, ∼2 × 10⁷ beads) were incubated with SA-coated magnetic microparticles conjugated to the appropriate biotinylated target protein (SI Appendix, section 11.3). Retained library beads were isolated with the goal of maximizing recovery and incubated with the appropriate SA-APC conjugate in secondary screens. Two screens were carried out with anti-HA mAb, to assess reproducibility.
Magnetic bead selections for thrombin or 12CA5 binding retained 1–6% of library beads, corresponding to maximum enrichments of 16–90. Combined with the enrichment obtained by subsequent on-bead screens, overall maximum enrichments of up to 8.2 × 10⁴ were obtained (Table 2). To evaluate whether the actual enrichments achieved were sufficient to identify functional variants, the material present on sorted beads was sequenced as described (32) (SI Appendix, section 11).
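Maximum enrichments from sequential stages multiply, because each stage acts only on the beads retained by the previous one. The sketch below illustrates this for a two-stage procedure. The 6% retained fraction is the upper end of the range quoted above; the second-stage sort frequency is a hypothetical value chosen to correspond to a maximum enrichment of about 2 × 10³, and the function names are illustrative.

```python
def stage_max_enrichment(retained_fraction: float) -> float:
    """Maximum enrichment of one stage if every positive bead is retained."""
    return 1.0 / retained_fraction


def overall_max_enrichment(retained_fractions) -> float:
    """Product of the per-stage maxima for stages applied in sequence."""
    overall = 1.0
    for fraction in retained_fractions:
        overall *= stage_max_enrichment(fraction)
    return overall


magnetic_retained = 0.06   # 6% of library beads retained (upper end of quoted range)
sort_frequency = 5e-4      # hypothetical second-stage sort frequency

magnetic_bound = stage_max_enrichment(magnetic_retained)
overall_bound = overall_max_enrichment([magnetic_retained, sort_frequency])

print(f"magnetic stage bound: {magnetic_bound:.1f}")
print(f"two-stage bound:      {overall_bound:.2e}")
```

These are upper bounds; the enrichment actually realized is lower by the fraction of positive beads lost at each stage, which is why high-yield recovery in the magnetic step matters.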
Table 2. Two-stage magnetic bead selection/on-bead screens can yield overall enrichments approaching 10⁵
For the first of two screens against 12CA5, 9 of 16 sequence assignments contained a motif, yp*e*d/e, where * is any amino acid (SI Appendix, Fig. S55). This was in contrast to assignments from a single-pass screen of 1.5 × 10⁶ beads, which lacked discernible similarity (SI Appendix, Fig. S53), suggesting that the magnetic bead selection preferentially retained positive beads. For the replicate 12CA5 screen, seven yp*e*d/e-containing sequences were obtained; however, these comprised a smaller fraction of the total (180), consistent with the lower overall enrichment obtained for the replicate screen (SI Appendix, Figs. S57 and S58). Likewise, no obvious similarity was discernible among sequences obtained for the thrombin screen, which had the lowest overall enrichment (SI Appendix, Fig. S60). These results highlight the importance of the magnetic bead selection and suggest that the protocol has yet to be optimized (70). Significantly higher enrichments can probably be achieved, since magnetic bead selection can enrich yeast cells 10⁴-fold with near-quantitative yield (69).
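Counting motif-containing sequences among sequencing assignments is a simple pattern-matching task. The sketch below expresses the yp*e*d/e motif as a regular expression, with lowercase one-letter codes standing for d-amino acids. The example sequences are invented for illustration and are not sequences from the screens, and the helper name is hypothetical.

```python
import re

# y, p, any residue, e, any residue, then d or e
MOTIF = re.compile(r"yp.e.[de]")


def has_motif(sequence: str) -> bool:
    """True if the yp*e*d/e motif occurs anywhere in the sequence."""
    return MOTIF.search(sequence) is not None


# Hypothetical nine-residue variable regions (not data from the study)
candidates = ["gayphewdk", "yptesekfa", "ypaedwwkl", "kwyfdlsaa"]

hits = [seq for seq in candidates if has_motif(seq)]
print(f"{len(hits)} of {len(candidates)} sequences contain the motif: {hits}")
```

Using `search` rather than `match` allows the motif to sit at any register within the varied loop, which fits a library in which the motif position is not fixed in advance.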
Sequence Motifs Are Predictive of Reproducible On-Bead Binding.
To assess platform fidelity, we synthesized a select group of candidate 12CA5 binders for evaluation in an on-bead binding assay (SI Appendix, section 12). This experiment is analogous to the use of cell-surface binding assays to characterize active clones following selection by cell-surface display and would demonstrate the feasibility of our approach for identifying rare beads based on a reproducible screen criterion. Candidate sequences containing the yp*e*d motif were selected for evaluation, alongside several unrelated sequences.
With one exception, compounds containing the yp*e*d motif gave reproducible, 12CA5-dependent binding, similar to positive control beads functionalized with the HA epitope (Fig. 6, Table 3, and SI Appendix, Fig. S61). In contrast, candidate compounds lacking the yp*e*d motif were indistinguishable from negative control. These findings demonstrate the potential of our platform for isolating rare beads displaying active compounds and confirm the commonsense premise that emergence of a consensus binding sequence is indicative of a successful screen.
Fig. 6. Replicate preparations of identified active library beads exhibit reproducible on-bead binding. Comparable 12CA5-dependent binding was observed for beads functionalized with either (A) HA epitope or (B) putative 12CA5-binding MIMs.
Table 3. On-bead binding activity of putative 12CA5-binding MIMs
Orthogonal Binding Assays Confirm the Activity of Mirror-Image EETI-II-Based Binders.
To confirm the activity of the identified MIM-based 12CA5 binders, select MIMs were synthesized for evaluation in off-bead assays (Fig. 7A and SI Appendix, section 13). In a biolayer interferometry assay (BLI), all MIMs tested exhibited a robust binding response to 12CA5 (SI Appendix, section 14.1). For three MIMs examined further, strong responses were also observed over a range of 12CA5 dilutions (SI Appendix, section 14.2). Individual kinetic traces were fit to a 1:1 binding model, which for compound 1 yielded average values of kon = 4.5 (±1.9) × 10⁴ M⁻¹ s⁻¹, koff = 2.2 (±0.8) × 10⁻³ s⁻¹, and Kd = 50 nM (22–120 nM) (Fig. 7B). Equilibrium binding responses were consistent with the Kd values obtained from kinetic analysis (50% of Rmax achieved at 22 nM), which provided an elementary test of self-consistency for the BLI data (71).
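For a 1:1 binding model, the dissociation constant is the ratio of the off-rate to the on-rate, and the equilibrium response reaches half of its maximum when the analyte concentration equals Kd. The sketch below applies both relationships to the average rate constants reported for compound 1. The function names are illustrative, and the calculation ignores the reported uncertainties.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Kd (M) for a 1:1 binding model: k_off (1/s) divided by k_on (1/(M s))."""
    return k_off / k_on


def fraction_bound(concentration: float, kd: float) -> float:
    """Equilibrium fractional occupancy (response / Rmax) for 1:1 binding."""
    return concentration / (concentration + kd)


k_on = 4.5e4    # M^-1 s^-1, average value reported for compound 1
k_off = 2.2e-3  # s^-1, average value reported for compound 1

kd = dissociation_constant(k_on, k_off)
print(f"Kd from kinetics: {kd * 1e9:.0f} nM")

# Occupancy at a few analyte concentrations (in M), for orientation
for conc_nm in (22, 50, 120):
    print(f"{conc_nm:>4} nM -> {fraction_bound(conc_nm * 1e-9, kd):.2f} of Rmax")
```

The ratio of the average rate constants gives about 49 nM, in line with the reported value of 50 nM.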
Fig. 7. Orthogonal binding assays confirm the activity of 12CA5-binding MIMs. (A) LC-MS analysis of representative biotinylated 12CA5-binding MIM 1. (B) BLI assay, showing association of exogenous 12CA5 to immobilized MIM 1. (C) FP competition binding assay, showing the displacement of fluorescent HA epitope by unlabeled HA epitope (red) and MIM 1 (black).
Binding activity was investigated further using a fluorescence polarization-monitored competition assay (SI Appendix, section 15). MIM 1 was evaluated first and competed with fluorescent HA epitope for binding to 12CA5 (Fig. 7C, black trace). Based on the IC50 obtained from self-competition with unlabeled HA epitope (red trace), a Kd value of 460 ± 100 nM was obtained, ∼10-fold weaker than the value obtained from BLI. Based on reported discrepancies between the results of surface-based and other binding assays (72–74), and the ability of the fluorescence polarization assay to accurately measure the affinity of fluorescent HA epitope (SI Appendix, Fig. S79), we evaluated the remaining MIMs using the competition assay (Table 4 and SI Appendix, section 15.3). Affinities ranged from 460 nM to 6.1 μM; binders containing aromatic residues in the * positions of the sequence motif yp*e*d displayed the highest affinity.
Table 4. Affinity of mirror-image EETI-II–based binders (compounds 1–7) to 12CA5
“Hot-Spot” Residues Determine Binding Affinity of MIMs for 12CA5.
Identification of a sequence motif associated with 12CA5 binding implies that the identified residues are “hot spots,” since they are conserved among MIM-based binders. The interactions of l-peptide HA epitope Tyr98-Ala108 (YPYDVPDYALA) with anti-HA mAbs are also facilitated by a limited number of hot-spot residues: Asp101, Asp104, and Tyr105 (75–77). The d-polypeptide yp*e*d binding motif bears a striking resemblance to the l-peptide epitope, in terms of both sequence similarity (yp*e vs. YP*D) and overall side-chain composition. However, the hot-spot residues are different (yp*e*d vs. D**DY). In crystal structures of anti-HA mAbs complexed with HA peptide Tyr100-Ala108, the side chain of Tyr105 is sandwiched between side chains of the mAb H3 loop (78, 79). Energetically, this is the most important interaction in the mAb–peptide complex. For the MIM-based binders, we speculate that the conserved d-tyrosine residue fulfills a similar role, and that the d-polypeptide traverses the mAb binding “cleft” formed by loops H2 and H3 in the opposite direction to the l-peptide epitope.
To investigate a possible role of constraint in facilitating the binding interaction between 12CA5 and the MIM-based binders, select MIMs were exhaustively reduced/alkylated for evaluation in the fluorescence polarization (FP) assay (compounds 1 and 5; SI Appendix, section 16.1). When assayed side by side with the corresponding MIMs containing intact disulfide bonds, these “unfolded” variants exhibited approximately twofold weaker binding affinity (SI Appendix, section 16.2). The minimal role of constraint in stabilizing the anti-HA/engineered mirror-image EETI-II interface is in contrast to interactions between engineered cyclic peptide binders and targets such as integrin (80) or SA (60), and for the interaction of an engineered thioredoxin loop with an anti-hapten mAb (81), where constraint significantly improves binding affinity. Constraint may be comparatively unimportant in facilitating binding to mAbs that target linear epitopes, as for the mAb 12CA5 studied here (82). Potentially, the β-Ala residues play a role in offsetting a potential benefit of constraint. Further studies on a wide range of protein targets will be required to evaluate the general utility of constraint for increasing the binding activity of engineered knottins.
Significance.
The methods described here form the basis of a platform for the discovery of xenoprotein molecules with de novo binding activity. As illustrated by the identification of mirror-image EETI-II-based “mimotopes” (83), discovery of MIM-based binding molecules is one application of this strategy. Mirror-image peptide and protein-based binding molecules are of longstanding interest in biotechnology (84). However, to date their discovery by mirror-image phage or cell-surface display has required chemical synthesis of the pertinent target molecule (43, 85, 86), which has restricted the practical size limit of proteins that could realistically be targeted. Direct screening of synthetic MIM libraries is a potential solution to this limitation. Further work will be required to evaluate the general utility of engineered knottins for this application, and of other small protein scaffolds based on helical bundles (87), β-sheets (88), or de novo designs (89–91).
For de novo protein engineering, higher-diversity libraries generally yield superior outcomes. For example, the affinities of antibodies identified by selection from nonimmune libraries improve markedly with increasing library size (66, 92). Thus, increasing the throughput with which synthetic libraries can be screened is expected to considerably extend their value. In this work, high throughput was achieved by interfacing a multistage selection and screening procedure with a high-throughput LC-MS/MS–based sequencing approach, applicable to large numbers of small beads (32). Both of these steps were necessary to practically explore the utility of synthetic protein libraries for de novo engineering and should facilitate progress in combinatorial chemistry generally. Our work establishes significant parallels between flow cytometry analysis of bead-based (49) versus cell-surface display libraries (46, 47, 93) and highlights the importance of multistage, high-yield selection procedures in the case of beads, since synthetic libraries cannot be propagated and re-sorted to improve enrichment.
Other uses of synthetic protein libraries may be found. For example, screens for retention of binding could be used to identify positions of an engineered binding protein (identified through a molecular biology-based method) that are compatible with noncanonical amino acids. Such an approach could be useful for modifying the protease stability of biotherapeutics, which is currently achieved through rational design (94). In the future, synthetic libraries may be used to empirically evaluate the success of de novo xenoprotein designs, as has recently been achieved with yeast surface display and de novo designs based on proteinogenic amino acids (91, 95). The combination of design and synthetic library screening could prove extremely powerful, with experimental feedback informing design strategy improvements.
Conclusion
The potential of synthetic protein libraries for identifying functional xenoproteins has been demonstrated. By using a combination of magnetic bead enrichment and flow cytometry, the screening of at least 2 × 10⁷ synthetic protein variants was rendered feasible on the timescale of hours. This is a significant improvement over state-of-the-art bead-based screening methods, which examine ∼100-fold fewer compounds, of smaller molecular weight (∼1 kDa), in a typical screen (51, 96). Throughput could be further improved by the use of even smaller beads, which in principle contain sufficient material for MS/MS-based sequencing (32). With continued development, and as synthetic methods continue to evolve, synthetic protein libraries may become a powerful tool for the discovery of folded, nonnatural polymers with biological function.
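The throughput figures quoted in the Conclusion can be turned into rough numbers with a short calculation. The sketch below is illustrative only: the library size and the ∼100-fold comparison come from the text, whereas the screen duration of 4 h is a hypothetical value chosen because the text specifies only "hours". The resulting rate is an average over the whole procedure (magnetic enrichment plus flow cytometry), not a cytometer event rate.

```python
def average_rate(n_variants, hours):
    """Average number of variants processed per second over a whole screen."""
    return n_variants / (hours * 3600)

LIBRARY_SIZE = 2e7       # from the text: at least 2 x 10^7 variants
FOLD_IMPROVEMENT = 100   # from the text: ~100-fold fewer in typical bead screens
ASSUMED_HOURS = 4        # hypothetical; the text says only "hours"

# Size of a typical bead-based screen implied by the ~100-fold comparison.
typical_screen = LIBRARY_SIZE / FOLD_IMPROVEMENT

# Overall average rate implied by the assumed duration.
rate = average_rate(LIBRARY_SIZE, ASSUMED_HOURS)

print(f"typical bead-based screen: ~{typical_screen:.0e} compounds")
print(f"average rate at {ASSUMED_HOURS} h: ~{rate:.0f} variants per second")
```

A different assumed duration scales the rate proportionally; the implied size of a typical earlier screen (about 2 × 10⁵ compounds) does not depend on that assumption.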
Methods
Peptide Synthesis.
Thirty-micrometer TentaGel M NH2 microspheres (M30352) (97) (Rapp Polymere GmbH) and Fmoc chemistry SPPS were employed throughout. For library synthesis, a protocol for Boc-chemistry SPPS (98) was adapted for use with manual Fmoc SPPS. Fmoc deprotection was achieved by treatment with 20% piperidine in dimethylformamide (30 s flow wash; 2 × 3-min batch wash). For the synthesis of defined peptidyl resins, a manual flow-based method for Fmoc SPPS (99) was employed. Methods for the preparation of spatially segregated and attenuated loading resin beads are described in SI Appendix.
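As a rough guide to how much of a synthesis is spent on Fmoc removal, the deprotection schedule given above can be summed per cycle and over a full chain. This is a sketch under stated assumptions: the wash times are taken from the text, while the number of cycles (30, roughly one per residue of a ∼30-aa miniprotein) is a hypothetical value, and the same treatment is assumed at every cycle. Coupling and other wash steps are not included.

```python
FLOW_WASH_S = 30        # from the text: 30 s flow wash
BATCH_WASHES = 2        # from the text: 2 x 3-min batch wash
BATCH_WASH_S = 3 * 60   # one batch wash, in seconds

ASSUMED_CYCLES = 30     # hypothetical: about one cycle per residue of a ~30-aa chain

def deprotection_seconds(cycles):
    """Total piperidine treatment time for the given number of cycles."""
    per_cycle = FLOW_WASH_S + BATCH_WASHES * BATCH_WASH_S
    return cycles * per_cycle

per_cycle_min = deprotection_seconds(1) / 60
total_min = deprotection_seconds(ASSUMED_CYCLES) / 60

print(f"{per_cycle_min} min per cycle, {total_min} min over {ASSUMED_CYCLES} cycles")
```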
Flow Cytometry.
All studies employed a FACS Aria III cytometer (BD Biosciences) equipped with 488-, 561-, and 633-nm lasers, and a 130-μm nozzle (operated at 10 psi sheath pressure). All experiments using SA-APC–based stain reagents employed 633-nm excitation and 660-nm (20-nm bandwidth) detection. Fluorescence measurements were recorded based on a forward scatter threshold. Detector voltages and laser delays were set using a sample of biotin-functionalized TentaGel that had been stained with SA-APC (the fluorescence mean of this sample was set to ∼100,000 counts—the middle of the dynamic range). Before analysis, all bead samples were passed through a 70-μm cell strainer to minimize inlet tube clogs.
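The optical and fluidic settings above can be summarized as a small consistency check. The sketch assumes that the 20-nm bandwidth is the full width of a band-pass filter centered at 660 nm, and it uses the nominal 30-μm bead diameter while ignoring any swelling of the resin in buffer; the class and function names are hypothetical and do not correspond to any instrument software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BandpassFilter:
    """Band-pass filter described by its center wavelength and full bandwidth."""
    center_nm: float
    bandwidth_nm: float

    def window(self):
        """Return the (low, high) transmission limits in nm."""
        half = self.bandwidth_nm / 2
        return (self.center_nm - half, self.center_nm + half)

def fits_through(bead_um, *openings_um):
    """True if the bead diameter is smaller than every opening it must pass."""
    return all(bead_um < opening for opening in openings_um)

# SA-APC detection: 660 nm center, 20 nm bandwidth (from the text).
apc_filter = BandpassFilter(center_nm=660, bandwidth_nm=20)
window = apc_filter.window()

# Nominal 30-um beads, 70-um cell strainer, 130-um nozzle (from the text).
beads_pass = fits_through(30, 70, 130)

print(window, beads_pass)
```

Under these assumptions the detection window spans 650 to 670 nm, and the nominal bead diameter is well below both the strainer mesh and the nozzle orifice.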
Acknowledgments
We thank Anne Fischer and D. Tyler McQuade (Defense Advanced Research Projects Agency, DARPA) for their support and guidance and Jeremiah Johnson, Stephen Kent, John Lampe, Thomas Nielsen, Glenn Paradis, Amy Rabideau, Michael Santos, K. Dane Wittrup, and Chi Zhang for encouragement and insightful discussions. This work was supported by DARPA Award 023504-001 (to B.L.P. and T.F.J.) and a STAR Postdoctoral fellowship from Novo Nordisk (to A.B. and Z.P.G.). We acknowledge use of the Biophysical Instrumentation Facility at Massachusetts Institute of Technology (MIT) (NIH S10 OD016326; Deborah Pheasant, Director) and the Swanson Biotechnology Center High Throughput Screening Facility at MIT’s Koch Institute (Jaime Cheah, Core Leader).
Footnotes
¹To whom correspondence may be addressed. Email: zgates@mit.edu or blp@mit.edu.
Author contributions: Z.P.G., A.A.V., A.J.Q., A.B., E.D.E., M.D.S., T.F.J., and B.L.P. designed research; Z.P.G., A.A.V., A.J.Q., A.B., Z.-N.C., E.D.E., K.H.H., A.J.M., S.K.M., M.D.S., E.A.S., E.D.S., S.Z.T., F.T., J.M.W., J.L.W., and B.L.P. performed research; Z.P.G., A.A.V., A.J.Q., A.B., and B.L.P. analyzed data; and Z.P.G., A.A.V., and B.L.P. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1722633115/-/DCSupplemental.
Published under the PNAS license.
References
- Liu CC, et al.
- Young TS, et al.
- Xiao H, et al.
- Uppalapati M, et al.
- Baca M, Kent SBH
- Romanelli A, Shekhtman A, Cowburn D, Muir TW
- Wu Z, et al.
- Shogren-Knaak M, et al.
- Valiyaveetil FI, Leonetti M, Muir TW, Mackinnon R
- Torbeev VY, et al.
- Liu Z, et al.
- Murakami M, et al.
- Thompson RE, et al.
- Reinert ZE, Lengyel GA, Horne WS
- Simon MD, et al.
- DeGrado WF, Sosnick TR
- Epton R
- Lowe G, Quarrell R
- Wu X, Upadhyaya P, Villalona-Calero MA, Briesewitz R, Pei D
- Das S, et al.
- Vinogradov AA, et al.
- Mong SK, et al.
- Kimura RH, Cheng Z, Gambhir SS, Cochran JR
- Schumacher TN, et al.
- Garton M, et al.
- Needels MC, et al.
- Müller K, et al.
- Mendes KR, et al.
- Jee J-E, et al.
- DeLano WL, Ultsch MH, de Vos AM, Wells JA
- Munson MC, Barany G
- Wentzel A, Christmann A, Krätzner R, Kolmar H
- Vágner J, et al.
- Houghten RA
- Pinilla C, Appel JR, McPherson SE, Houghten RA
- Rini JM, Schulze-Gahmen U, Wilson IA
- James LC, Roversi P, Tawfik DS
- Mandal K, et al.
- Braisted AC, Wells JA
- Baker EG, Bartlett GJ, Porter Goff KL, Woolfson DN
- Francisco JA, Campbell R, Iverson BL, Georgiou G
- Checco JW, et al.
- Rocklin GJ, et al.
Article Classifications
- Biological Sciences
- Biophysics and Computational Biology
- Physical Sciences
- Chemistry