Environmental control programs the emergence of distinct functional ensembles from unconstrained chemical reactions

Significance
We show that materials with different structure and function can emerge from the same starting materials under different environmental conditions, such as order of reactant addition or inclusion of minerals. The discoveries we report were made possible by using analytical tools more common in omics/systems biology for functional and structural characterization, retasked for exploring and manipulating complex reaction networks. We demonstrate not only that environments can differentiate fixed sets of starting materials (both mixtures of pure amino acids and the classic Miller–Urey “prebiotic soup” model), but also that this differentiation has functional consequences. It has often been said that biology is “chemistry with history”, and this work shows how this process can start.


Samples were used directly from the synthesis procedures, as described. Where too much material was present (saturating the MS detector through excess signal), all samples in the series were diluted 1 in 10, to allow injection in the 2-5 μl volume range while optimising MS signal.
The instrument was calibrated before each set of analytical replicates (each of which was completed before progressing to the next analytical replicate).

Amino Acids
Where amino acids (AAs) are discussed, they are frequently identified using standard single-letter notation: A = alanine; D = aspartic acid; G = glycine; H = histidine; V = valine. All those incorporating stereocentres are the L-enantiomer.

Peptide synthesis
Peptide standards (for identification of different G4A sequence permutations) were synthesised separately using a standard solid phase technique (Fmoc Ala and Gly Wang resin; coupling with DIC/HOBt and a TFA cleavage; Fmoc deprotection with 20% piperidine/DMF) using a Biotage Initiator+ Alstra Peptide Synthesiser. DIC and TFA were purchased from Sigma Aldrich, and protected amino acids and Wang resin were purchased from Activotec.

Effect of Soluble Salts (G, A, H)
In this set of experiments, one solution containing equimolar amounts of three different amino acids was reacted with different soluble salts under successive dehydration-hydration cycles.
5. 3.5 ml of HPLC water was added in cycles 2, 3, 5, 6, 8 and 9.
6. Each dehydration-hydration cycle was performed on a multiwell hotplate at 130 °C for 12 h (a fixed arbitrary cycle time; all repeat reactions were performed together to avoid error).
7. Once finished, all the samples were diluted by adding 6 ml of HPLC water.
8. 500 µl was taken for LC-MS analysis. The remaining sample was dialysed with a Float-A-Lyzer G2 device (500-1000 Da cut-off, 5 ml) for 20 h.
9. Once the dialysis was completed, the samples were left to freeze-dry for 48 h.
10. The solid product material was redissolved in 6 ml of water, filtered through 0.22 μm syringe filters, and stored at 4 °C to be used without further treatment.

Untargeted LC-MS & fingerprinting analysis approach
Each reaction (performed in triplicate) was analysed three times by LC-MS, giving a total of 9 repeats (3 experimental x 3 analytical repeats). A qualitative overview of product distribution vs LC-MS intensity was obtained using a bespoke script in the R environment, 1 with files input in the ".mzML" format and the xcms library 3 used for data extraction and peak picking. The procedure was as follows (results in following sections):
i. Input all data in groups (9 experiments, in 7 groups).
ii. Independently  with 'bubbles' plotted around each set of experiments (each environment) representing two standard deviations around their mean (using ellipse3d function from the rgl library).
vi. Principal component discriminant function analysis (PC-DFA) was also performed (using the MASS library), 14 using the first five principal components (these accounted for the overwhelming majority of variance in all cases, see Section 2.2.2). This facilitated sharper observation of the differences between product populations (plotting the first three DFs), but was qualitatively similar to the results of simple (unsupervised) PCA.
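The scaling and principal component step of this procedure can be sketched as follows. This is a minimal pure-Python illustration, not the bespoke R script used in this work: the intensity table is hypothetical, the function names are invented for the example, and a simple power iteration stands in for whichever PCA routine was actually used. Only the first principal component is computed.

```python
import math

def autoscale(rows):
    """Mean-centre each feature (column) and scale it to unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]

def first_component(scaled, iterations=200):
    """Power iteration on X^T X; returns (loadings, scores, explained fraction)."""
    p = len(scaled[0])
    # Uniform starting vector (an assumption; it must not be orthogonal to PC1).
    loadings = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in scaled]
        updated = [sum(s * row[j] for s, row in zip(scores, scaled))
                   for j in range(p)]
        norm = math.sqrt(sum(u * u for u in updated))
        loadings = [u / norm for u in updated]
    scores = [sum(x * w for x, w in zip(row, loadings)) for row in scaled]
    total = sum(x * x for row in scaled for x in row)
    return loadings, scores, sum(s * s for s in scores) / total

# Hypothetical feature intensities: rows are samples, columns are features.
# The first three rows mimic one environment, the last three another.
intensities = [
    [1.0, 10.0, 5.0], [1.1, 10.5, 5.2], [0.9, 9.8, 4.9],
    [3.0, 20.0, 1.0], [3.2, 21.0, 1.1], [2.9, 19.5, 0.9],
]
scaled = autoscale(intensities)
loadings, scores, explained = first_component(scaled)
```

In this toy table the two groups of samples fall on opposite sides of the first component, which is the kind of separation the 'bubble' plots are used to inspect.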

Notes and variations on this process:
• No attempt was made at this stage to identify unknown products: the intention of this analysis is to obtain an overview of product distribution, a 'fingerprint', since thorough quantification & identification of every species present is neither practical nor necessary.
• Given these aims, peak picking algorithm settings were deliberately not stringent, to include as many features as possible. We note that while some noise may have been included as a result, its effect is likely to have been negligible: this is demonstrated by the observation that qualitatively similar differentiation of populations is obtained when product peaks are filtered to include only potential product peptide masses from the AAs used (see Figure S3), and by the systematic variation of several peaks (see Figures S5 to S7 for example EICs). Furthermore, LC-MS/MS analysis of some species to identify isomers (see Figure S27c for a typical example) demonstrates that peptide products are present as expected.
• Isobaric species (those with the same mass) are not resolved in MS detection, and chromatographic separation frequently did not completely resolve manifolds of isobaric species resulting from different sequence permutations (e.g. GGGAG, GGAGG, GAGGG). In many cases it is therefore likely that several species have been included in the same 'feature', manifested as broad manifolds of coeluting peaks. Since in many cases the shape (composition distribution) and size (amount of species present) of these features tend to vary in a robust (reproducible) manner, this is not problematic for the conclusions drawn.
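To illustrate why sequence permutations are isobaric, the sketch below enumerates the distinct linear sequences of a G4A pentapeptide and computes their m/z. It assumes singly protonated ions ([M+H]+) and standard monoisotopic residue masses; the function names are hypothetical and this is not part of the analysis scripts used in this work.

```python
from itertools import permutations

# Monoisotopic masses in Da (residue = amino acid minus water).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711}
WATER = 18.01056
PROTON = 1.00728

def sequence_permutations(composition):
    """All distinct linear sequences that share the given composition."""
    return sorted({"".join(p) for p in permutations(composition)})

def protonated_mz(sequence):
    """m/z of the singly protonated linear peptide (assumed ion type)."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER + PROTON

g4a_sequences = sequence_permutations("GGGGA")
g4a_mz = {seq: round(protonated_mz(seq), 4) for seq in g4a_sequences}
```

All five sequences share one m/z, so the mass spectrometer alone cannot distinguish them; any discrimination has to come from retention time or fragmentation.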

General observations
• In all experiments, analysis reveals that many product populations are clearly and consistently different as a result of the variation of reaction environment: this can be observed in PCA (Figure S3) and PC-DFA analysis (Figure S2), in extracted ion chromatograms (selected examples demonstrate reproducible differences), and in peak intensity data (Figures S8 to S10).
• PCA yields qualitatively similar results to PC-DFA in demonstrating this, but with less sharp separation. That is, the populations which can be observed to be similar, and those which are clearly resolved, in plots of PCA (Figure S3) are generally those of which similar observations can be made in PC-DFA plots (Figure S2). That PC-DFA, a supervised technique, provides sharper resolution than PCA (an unsupervised technique) is unsurprising; the qualitative similarity reflects the robust and reproducible nature of the difference between populations.
• In all cases, plotting contributions (Figure S4) to the principal components demonstrates that population difference is not defined by a few 'key' features/species; instead, many provide similar (small) contributions.
• Since in most cases experimental repeats produced extremely similar results, in cases where results are not very similar (large 'bubbles') we suspect that this is largely due to material loss during sample work-up (filtering; dialysis; filtering; dissolution), for example inconsistency in dialysis membranes. This is consistent with observations during work on these systems (e.g. LC-MS analysis of undialysed samples).
• When the feature list was 'filtered' to exclude all masses not corresponding to a plausible oligomer or the amino acids used (from a combinatorial list of possible peptide products from the AAs combined, as "Peptide mass product distributions", +/-0.01 Da), the resulting plots (Figure S3) are qualitatively broadly similar to those unbiased by product expectations (the same populations are resolved/unresolved), demonstrating the robustness of the approach and that differences result from 'real' condensation products, not analytical artefacts.
• [...] with the reaction in which all amino acids were added together clearly resolved from all. In PC-DFA some of these pairs are resolved (although clearly adjacent), but this separation is not robustly observed across all analyses.
• The reaction pattern is consistent with the trends observed in preliminary binary cross-reactivity tests ("Intensity" = sum of MS intensity accounted for by putative combinatorial products), where G/A hetero-oligomerisation clearly dominates. For example, products of GA reactions are likely to resemble those of AG reactions if G/A hetero-oligomerisation rates are very much larger than those of either possible homo-oligomerisation. While our approach in this work has been non-deterministic, interested in observing difference, these observations point to the potential for deliberate 'programming' using modelling of rate measurements. However, as we observe that simple thermodynamic considerations are not adequate, this will require a more advanced approach.
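The mass-based filtering described in these observations can be sketched as below. This is an illustrative reconstruction under stated assumptions rather than the script used in this work: it considers only linear oligomers of G, A and H from dimer to hexamer, assumes [M+H]+ ions, and uses a hypothetical list of feature m/z values; only the +/-0.01 Da window is taken from the text.

```python
from itertools import combinations_with_replacement

# Monoisotopic masses in Da (residue = amino acid minus water).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "H": 137.05891}
WATER = 18.01056
PROTON = 1.00728

def candidate_mz(monomers="GAH", max_length=6):
    """[M+H]+ m/z for every composition of 2..max_length residues.

    Sequence is ignored because permutations of one composition are isobaric.
    """
    table = {}
    for length in range(2, max_length + 1):
        for combo in combinations_with_replacement(sorted(monomers), length):
            mz = sum(RESIDUE_MASS[r] for r in combo) + WATER + PROTON
            table["".join(combo)] = mz
    return table

def filter_features(feature_mzs, candidates, tolerance=0.01):
    """Keep features within `tolerance` Da of any candidate composition."""
    kept = []
    for mz in feature_mzs:
        matches = [name for name, ref in candidates.items()
                   if abs(mz - ref) <= tolerance]
        if matches:
            kept.append((mz, matches))
    return kept

candidates = candidate_mz()
features = [318.1410, 250.0000, 133.0610]  # hypothetical picked-feature m/z
kept = filter_features(features, candidates)
```

Here the first and last features are retained (compositions AGGGG and GG) and the middle one is discarded, mirroring the idea that only masses attributable to condensation products survive the filter.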
Figure S1. Plot of data from preliminary cross-reactivity investigation for different G, A and H amino acid combinations. "Intensity" is the combined intensity corresponding to the masses of putative oligomeric products (trimer and larger) produced when reacted in simple binary mixtures in the same conditions as used in Section 2.

Figure S16.

Figure S19. Table of selected features ordered by m/z from the experiments where the order of addition was varied. Features were selected from a full list based on absolute MS intensity (appearing in top 20 for at least one condition); this is an arbitrary reduction of data for more detailed display, and it is important to note that no conclusion should be drawn on the significance of this selection due to the non-linear relationship between abundance and intensity. Intensities are averaged over experimental and analytical replicates.
Figure S20.

Sequence permutation distribution difference between populations
As outlined above, our aim in LC-MS analysis was to characterise product distribution without the bias/distraction associated with product expectations. We see, both in population-level analyses and in simple observation of extracted ion chromatograms (EICs) of particular m/z values, that product distribution differs clearly and consistently. Since the (secondary & higher) structure and function of oligomeric species depend not only on their composition (e.g. which AAs are incorporated), but also on the sequence of monomers, it is instructive to ask: 'Is the sequence of oligomer products altered by the conditions being manipulated?'.
Answering this question unequivocally is difficult, however: it requires identifying and separating very similar species, including those of identical mass. In many cases such isomeric species are extremely difficult to resolve using chromatography, even more so when the chromatography method is general rather than optimised to resolve specific sequence variants.
Below, we show an example where discrete peaks in chromatograms can be assigned to particular species, and demonstrate that different product ensembles can incorporate different sequence permutation distributions (Figure S26; the basis of these assignments is explained in Figure S27). Note: It was necessary to use synthetic standards (produced by standard SPPS) to confirm the identity of each peak, as robust unequivocal de novo assignment is not possible solely based on MS² data. MS² analysis of each peak did yield fragments consistent with β- and γ-series derived from the sequences finally assigned; however, other peaks were also observed which were consistent with other sequences. For example, MS² spectra of the first peak, which corresponds to AGGGG, included a strong peak with m/z = 151.0502: this is consistent with a GG β-fragment produced from a peptide with an N-terminal GG, but inconsistent with simple β- or γ-fragments of the AGGGG sequence. We speculate that this might result from a McLafferty rearrangement. We include this note to illustrate that robust unequivocal de novo assignment of abiotic peptide sequence, where many of the possible sequence permutations are present, is not facile, even in this case with only three monomers (in contrast to biological samples, where complexity is limited, facilitating database approaches). We refrain from drawing conclusions based on such an approach, as they are likely to be flawed. This, together with our intention to move beyond these simple systems, is the basis for our preferring tools developed for untargeted metabolomics (no specific product expectations) over the more obvious tools developed for proteomics.

Reactivity testing using pNPA
In these experiments, the effects of product populations on the breakdown of para-nitrophenyl acetate (pNPA, colourless) to release para-nitrophenol (pNP, yellow) were observed, following this potentially very complex reaction system through the evolution of the yellow colour characteristic of free pNP (absorbance at 405 nm).
Processing: Data were output from the instrument software in a spreadsheet format. Typical time-resolved traces can be observed in Figure S29. Initial rates were extracted using Microsoft Excel as the gradient (not constrained to the origin) of the plot of Abs405 (in AU) against time (in seconds) over the first 60 minutes (close to linear in all cases).
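The initial-rate extraction amounts to an ordinary least-squares line with a free intercept, fitted to the points in the first 60 minutes. The sketch below shows that calculation in Python for illustration only (the work itself used Microsoft Excel); the function name and the synthetic absorbance trace are hypothetical.

```python
def initial_rate(times_s, absorbances, window_s=3600.0):
    """Least-squares gradient (AU per second) and intercept over the window.

    The fit is not constrained to pass through the origin.
    """
    points = [(t, a) for t, a in zip(times_s, absorbances) if t <= window_s]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_a = sum(a for _, a in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in points)
    slope = sxy / sxx
    intercept = mean_a - slope * mean_t
    return slope, intercept

# Synthetic trace: linear for the first hour, then a later point that has
# levelled off and is excluded by the 60-minute window.
times = [0, 600, 1200, 1800, 2400, 3000, 3600, 7200]
abs405 = [0.050 + 1.0e-4 * t for t in times[:-1]] + [0.30]
rate, offset = initial_rate(times, abs405)
```

Restricting the fit to the window keeps later, non-linear parts of the trace from biasing the gradient.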

Notes: We note that while this is a common assay for esterase activity, catalysis of ester hydrolysis by the condensation products is not the only possible reaction type. We are interested in the effect of the complete ensemble of products on the reaction system and have made no attempt to identify the mechanism of pNP release (the complex set of competing pathways may include: ionic strength effects on uncatalysed reaction, inhibition of hydrolysis by recognition, disassembly of active catalytic assemblies on pNPA or pNP recognition, and other pathways).

Box plot (g) compares the rate of pNP release from ensembles produced from reactions with different mixing histories, diluted to a constant concentration of 0.5 mg/ml (rather than dissolving whatever products are yielded by a reaction to a fixed volume).

Recognition assay using ThT
This was carried out following an adaptation of an established approach. 15

Procedure
Carbon-coated copper grids (200 mesh) were glow discharged in air for 30 seconds. The support film was touched onto the peptide solution surface for 10 seconds, and excess solution was removed using filter paper. 20 μl of negative stain (Nanovan; Nanoprobes) was applied and the mixture was blotted again using filter paper to remove any excess stain. The dried specimens were then imaged using an FEI Tecnai T20 Transmission Electron Microscope (TEM) operating at 200 kV, fitted with a Gatan 794 Multiscan camera. Images were collected and converted to .tiff files using Gatan Microscopy Suite software.

Observation of different properties of gels produced on addition of Ca²⁺ salts
Following the observed dependence of structure formation on the mixing history of the amino acids (Section 2.3.3), differences in gelation ability were studied by crosslinking the peptides with Ca²⁺.
This was performed by mixing 500 µl of each peptide solution (prepared in Section 2.1.4) with 2.5 µl of 1 M CaCl2, vortexing, and leaving to stand overnight at room temperature. Gelation was verified by the inverted vial method (see Figure S32, in which the immobile samples remained in the position shown for more than 1 h), and the products were observed using TEM following gelation, revealing dramatically different morphology (Figure S33).
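For reference, the final CaCl2 concentration implied by these volumes is roughly 5 mM. The short calculation below makes this explicit; it assumes the two volumes are simply additive, and the function name is hypothetical.

```python
def final_concentration_mM(stock_M, stock_volume_ul, sample_volume_ul):
    """Concentration after mixing, assuming volumes are additive."""
    total_ul = stock_volume_ul + sample_volume_ul
    return stock_M * 1000.0 * stock_volume_ul / total_ul

# 2.5 ul of 1 M CaCl2 added to 500 ul of peptide solution.
cacl2_mM = final_concentration_mM(1.0, 2.5, 500.0)
```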

Environment-Directed Complex Mixture Experiments: Product Analysis
LC-MS analysis was accomplished in an adaptation of the general procedure described in [...]. The LC-MS data were then processed and plotted as described in Section 2.2.1: peak picking and grouping, and gap-filling from raw data where no peaks were observed. PCA was performed on the resulting data as earlier (m/z and rt coordinates for each feature, with corresponding intensity for each sample), again with scaling. The results of this analysis are shown below, along with some sample EICs illustrating variance.

[...] 20 for at least one condition); this is an arbitrary reduction of data for more detailed display, and it is important to note that no conclusion should be drawn on the significance of this selection due to the non-linear relationship between abundance and intensity. Intensities are averaged over experimental and analytical replicates.

Figure S40.

Formula assignment for SD mix: As in our previous work considering patterns in these kinds of complex mixtures, we have made very tentative formula assignments of the most influential features observed, to illustrate the kinds of compositions that might be present. We followed a simplified version of a procedure we reported previously for this task. 18 This employed a script in R using the Rdisop library 19 to assign compositions on the basis of m/z, limited to the elements carbon, hydrogen, nitrogen, oxygen, and sodium, and to a 10 ppm error. We discarded assignments falling outside the following elemental composition rules as implausible: 20

Since the identification of species is outside the remit of this work (and has no bearing on our conclusions), more in-depth analysis to determine formulae was not pursued.
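The idea of assigning compositions within a ppm window can be illustrated with a brute-force search. The sketch below is not the Rdisop algorithm or the script used in this work: it simply enumerates C/H/N/O/Na compositions up to hypothetical element limits, treats each as a singly charged cation, and keeps those within 10 ppm of a measured m/z. No elemental composition rules are applied, and the example m/z is chosen for illustration.

```python
from itertools import product

# Monoisotopic atomic masses in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "Na": 22.9897693}
ELECTRON = 0.00054858

def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def cation_mz(counts):
    """m/z of a singly charged cation with the given elemental counts."""
    return sum(MONOISOTOPIC[e] * n for e, n in counts.items()) - ELECTRON

def assign_formulae(mz, limits, tolerance_ppm=10.0):
    """Return (formula, ppm error) for every composition within tolerance."""
    elements = list(limits)
    hits = []
    for combo in product(*(range(limits[e] + 1) for e in elements)):
        counts = dict(zip(elements, combo))
        theoretical = cation_mz(counts)
        if theoretical <= 0:  # skips the empty composition
            continue
        error = ppm_error(mz, theoretical)
        if abs(error) <= tolerance_ppm:
            formula = "".join(f"{e}{n}" for e, n in counts.items() if n)
            hits.append((formula, round(error, 2)))
    return hits

# Hypothetical search limits and an illustrative m/z (protonated glycine).
limits = {"C": 6, "H": 14, "N": 3, "O": 4, "Na": 1}
hits = assign_formulae(76.0393, limits)
```

Because exhaustive enumeration grows quickly with mass, a real workflow would need tighter limits, plausibility rules, or a dedicated decomposition library.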
Figure S41.