The Folding Pathway of an Ig Domain is Conserved On and Off the Ribosome

Proteins that fold cotranslationally may do so in a restricted configurational space, due to the volume occupied by the ribosome. How does this environment, coupled with the close proximity of the ribosome, affect the folding pathway of a protein? Previous studies have shown that the cotranslational folding process for many proteins, including small, single domains, is directly affected by the ribosome. Here, we investigate the cotranslational folding of an all-β immunoglobulin domain, titin I27. Using an arrest peptide-based assay and structural studies by cryo-EM, we show that I27 folds in the mouth of the ribosome exit tunnel. Simulations that use a kinetic model for the force dependence of escape from arrest accurately predict the fraction of folded protein as a function of linker length. We used these simulations to probe the folding pathway on and off the ribosome. Our simulations, which also reproduce experiments on mutant forms of I27, show that I27 folds while still sequestered in the mouth of the ribosome exit tunnel, by essentially the same pathway as free I27, with only subtle shifts of critical contacts from the C to the N terminus.

Significance Statement

Most proteins need to fold into a specific three-dimensional structure in order to function. The mechanism by which isolated proteins fold has been thoroughly studied by experiment and theory. However, in the cell, proteins do not fold in isolation but are synthesized as linear chains by the ribosome during translation. It is therefore natural to ask at which point during synthesis proteins fold, and whether this differs from the folding of isolated protein molecules. By studying the folding of a well-characterized protein domain, titin I27, stalled at different points during translation, we show that it already folds in the mouth of the ribosome exit tunnel, and that the mechanism is almost identical to that of the isolated protein.


Introduction
To what extent is the cotranslational folding pathway of a protein influenced by the presence of the ribosome and by the vectorial emergence of the polypeptide chain during translation? […]

[…] force-measurement assay (modified from (2)). I27, preceded by a His-tag, is placed L residues away from the last amino acid of the SecM AP, which in turn is followed by a 23-residue C-terminal tail derived from E. coli LepB. Constructs are translated for 15 min in the PURE in vitro translation system, and the relative amounts of arrested and full-length peptide chains produced are determined by SDS-PAGE. The fraction of full-length protein, fFL, reflects the force exerted on the AP by the folding of I27 at linker length L. At short linker lengths (top), there is not enough room in the exit tunnel for I27 to fold, little force is exerted on the AP, and the ribosome stalls efficiently on the AP (fFL ≈ 0). At intermediate linker lengths (middle), there is enough room for I27 to fold, but only if the linker segment is stretched; force is therefore exerted on the AP, and stalling is reduced (fFL > 0). At long linker lengths (bottom), I27 has already folded when the ribosome reaches the last codon in the AP, and again little force is exerted on the AP (fFL […]

[…] grey, the peptidyl-tRNA with the nascent chain in green, and an additional density corresponding to I27 at the ribosome tunnel exit in red. The black cartoon eye and dashed lines indicate the angle of view in panel (C). The density contour level for feature visualization is at 1.7 times the root-mean-square deviation (1.7 RMSD). (B) Rigid-body fit of the I27 domain (PDB 1TIT) to the cryo-EM density map, displayed from high (left) to low (right) contour levels at 2.6, 2.0 and 1.4 RMSD, respectively. N and C represent the N and C termini of the I27 domain, respectively. (C) View looking into the exit tunnel (arrow) with density for the nascent chain (nc) in dark green.
Ribosomal proteins uL29 (blue; PDB 4UY8), uL24 (light green; the β hairpin close to the I27 domain was re-modelled based on PDB 5NWY) and the fitted I27 domain (red) are shown in cartoon mode; 23S RNA and proteins not contacting I27 are shown as density only. The density contour level is at 5 RMSD, excluding the tRNA, nascent chain and I27 domain, which are displayed at 1.7 RMSD.

Coarse-grained molecular dynamics simulations recapitulate I27 folding on the ribosome

In all studies to date (1,2,29), the yield of folded protein in arrest peptide experiments has been used as a proxy for the pulling forces exerted on the nascent chain at different points during translation. Here, to further elucidate the molecular origins of these forces and to provide a quantitative interpretation of the observed folding yield of I27, we have calculated force profiles based on coarse-grained MD simulations (see Methods). Briefly, in the MD model, the 50S subunit of the E. coli ribosome (PDB 3OFR) (36) and the nascent chain are explicitly represented using one bead per amino acid, placed at the position of the Cα atom, and three beads (for P, C4′, N3) per RNA base (Figure 3A). The interactions within the protein were given by a standard structure-based model (37-39), which allowed it to fold and unfold. Interactions between the protein and ribosome beads were purely repulsive (40), and the ribosome beads were fixed in space, as in previous simulation studies (18). I27 was covalently attached to unstructured linkers having the same sequences as those used in the force-profile experiments (Figure 3B), and the C terminus of the linker was tethered to the last P atom in the A-site tRNA (41) with a harmonic potential, allowing the force exerted by the folding protein to be measured directly. The potential was chosen to be stiff enough that displacements caused by typical pulling forces were smaller than 1 Å.
For each linker length L, we used umbrella sampling to determine the average force exerted on the AP by the protein in the folded and unfolded states while arrested, as well as the populations of those two states (Figure 3C). We also estimated the folding and unfolding rates directly from folding/unfolding simulations. […] blue. Pseudo-atoms shown in grey are not used in the simulations. The instantaneous force exerted on the AP is calculated from the variation in the distance x between the C-terminal Pro pseudo-atom and the next pseudo-atom in the linker (see inset […] protein is in the unfolded or folded state, from which the fraction of full-length protein obtained with a given linker length and incubation time can be determined from a kinetic model, as described in Methods. The calculated fFL profile for I27 is shown in Figure 3D (see also SI Appendix, Fig. S5), both for the full solution of the kinetic model and for an approximation in which the folding and unfolding rates are assumed to be faster than the escape rate ("pre-equilibrium"). The two results are very consistent with each other, as well as with the experimental profile. The peak in the folding yield arises as a consequence of two opposing effects, the force exerted by the folding protein and the population of the folded state, which respectively decrease and increase as the linker length increases. In the simulations with the I27[L=35] construct, the folded I27 domain is seen to occupy positions that largely overlap with the cryo-EM structure (Supporting Video S2). Overall, these results suggest that the MD model provides a good representation of the folding behaviour of the I27 domain in the ribosome exit tunnel.
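The two routes to fFL described above can be sketched as follows, assuming a two-state scheme in which the arrested chain interconverts between unfolded (U) and folded (F) states with rates kf and ku, and escapes arrest from each state with its own force-dependent rate. This is a minimal illustration, not the authors' code: the function names are hypothetical, the escape rates are taken as given inputs, and the pre-equilibrium form (a population-weighted average of the two escape rates) is our reading of the approximation described in the text. In both cases fFL = 1 − S(t), where S(t) is the probability that the ribosome is still arrested after the incubation time t.

```python
import math


def survival_full(kf, ku, esc_u, esc_f, t, pu0, pf0):
    """Probability S(t) that the ribosome is still arrested after time t.

    Assumed two-state scheme: U <-> F with folding rate kf and unfolding
    rate ku (both > 0); escape from arrest occurs from U at rate esc_u and
    from F at rate esc_f. (pu0, pf0) are the initial populations.
    """
    # Rate matrix M acting on the column vector (U, F).
    a, b = -(kf + esc_u), ku
    c, d = kf, -(ku + esc_f)
    s = 0.5 * (a + d)                               # mean eigenvalue
    q = math.sqrt((0.5 * (a - d)) ** 2 + b * c)     # half the eigenvalue gap
    # exp(Mt) = e^{st} [cosh(qt) I + sinh(qt)/q (M - sI)], written via the two
    # eigenvalue exponentials so that a large q*t cannot overflow.
    e_plus, e_minus = math.exp((s + q) * t), math.exp((s - q) * t)
    ch = 0.5 * (e_plus + e_minus)
    sh_over_q = (e_plus - e_minus) / (2.0 * q)
    u = ch * pu0 + sh_over_q * ((a - s) * pu0 + b * pf0)
    f = ch * pf0 + sh_over_q * (c * pu0 + (d - s) * pf0)
    return u + f


def survival_pre_equilibrium(pu, pf, esc_u, esc_f, t):
    """Fast folding/unfolding: escape proceeds at the population-weighted rate."""
    return math.exp(-(pu * esc_u + pf * esc_f) * t)


def fraction_full_length(survival):
    """fFL = 1 - S(t)."""
    return 1.0 - survival


# Hypothetical rates (s^-1) for a 15-min (900 s) incubation.
kf, ku, esc_u, esc_f, t = 10.0, 10.0, 1e-3, 2e-3, 900.0
pu, pf = ku / (kf + ku), kf / (kf + ku)
print(fraction_full_length(survival_full(kf, ku, esc_u, esc_f, t, pu, pf)))
print(fraction_full_length(survival_pre_equilibrium(pu, pf, esc_u, esc_f, t)))
```

When folding and unfolding are fast compared with escape, as in the hypothetical rates above, the two estimates agree closely; they diverge only when one of the folding or unfolding rates approaches the escape rate.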
To show that the simulation model is not specific to I27, we have also applied it to two other proteins with different topologies for which experimental force profiles have been recorded, Spectrin R16 (all-α fold) and S6 (α/β fold) (2, 29). In these cases, we also obtain force profiles similar to experiment (SI Appendix, Fig. S6 and S7).

Force profiles of I27 variants probe the folding pathway
To test whether the cotranslational folding pathway is the same as that observed for the isolated I27 domain in vitro, we investigated three destabilised variants of I27, both by simulation and experiment. One mutation in the core, Leu 58 to Ala (L58A), located in β-strand E (Figure 4A), destabilizes the protein by 3.2 kcal mol⁻¹ and removes interactions that form early during folding of the isolated domain, playing a key role in formation of the folding nucleus (φ-value = 0.8) (26). Two further mutations, M67A and deletion of the N-terminal A-strand, remove interactions that form late in the folding of I27 (i.e., both mutants have low φ-values (26, 27)). The A-strand is the first part of I27 to emerge from the ribosome, while M67 is located in a part of I27 that cryo-EM shows to be in very close proximity to a β hairpin loop of ribosomal protein uL24 in I27-[…]

The simulated force profile for the L58A variant predicts a much lower force peak than for wild-type I27; likewise, the experimental force peak is lower and broader than for wild-type, extending from L = 37-53 residues (Figure 4B). The fFL values are very similar to those obtained for I27[L58A,W34E], a non-folding variant of I27[L58A]. Therefore, the weak forces seen at L ≈ 40-50 residues are not due to a folding event, indicating that I27[L58A] does not exert an appreciable force due to folding near the ribosome.

The A-strand comprises the first seven residues of I27, and removal of this strand, I27[-A], results in a destabilisation of 2.78 kcal mol⁻¹; however, both the simulated and experimental force profiles for I27[-A] are very similar to those for wild-type I27 (Figure 4C). Residue M67 is located in the E-F loop, and mutation to alanine results in a destabilisation of 2.75 kcal mol⁻¹; for this variant, folding commences at L ≈ 35 residues, as for wild-type I27, but the peak is much broader (Figure 4D).
Non-folding control experiments

[…] uL29 have been suggested to form a potential interaction site for nascent proteins such as trigger factor (43), signal recognition particle (44) and SecYE (45). Here we have explored the hypothesis that the broad force peak of mutant M67A might be due to interactions between an exposed hydrophobic cavity on I27[M67A], resulting from the mutation, and hydrophobic surface residues of ribosomal proteins uL23 and uL29. By introducing such interactions into the model, we are able to obtain a broad peak in the force profile very similar to that seen in experiment (Figure 4D).

The folding pathway is only subtly affected by the presence of the ribosome

To compare the folding pathways when the protein folds near the tunnel exit or outside the ribosome, we estimated φ-values based on the transition paths of I27 folding on the ribosome from our coarse-grained simulations, using a method introduced previously (46). The transition paths are those regions of the trajectory where the protein crosses the folding barrier, here defined as crossing between Q = 0.3 and Q = 0.7. For each linker length, 30 transition paths were collected from MD simulations. To reduce the uncertainty in the experimental reference data, we only compared with experimental φ-values if the change in folding stability between the mutant and the wild type is sufficiently large (|ΔΔG| > 7 kJ/mol) (47). As seen in Figure 5A, when the linker is long (L = 51 residues) and I27 is allowed to […]

[…] L = 31, the simulated φ-values are higher at the N terminus and lower at the C terminus than the experimental values, reflecting a change in the importance of these regions when I27 folds in the confines of the ribosome. Middle row: Relative probability that, if a particular contact is formed, the protein is on a folding trajectory, $p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}}$.
When the protein is constrained, the limiting factor is the formation of a few key contacts. A cartoon of the ribosome with I27 in red is shown on each panel. Bottom row: The top ten most important contacts are coloured in cyan on the native structure.

To obtain a more detailed picture of the relative importance of different native contacts in the folding mechanism, we computed the conditional probability of being on a transition path (TP), given the formation of a contact $q_{ij}$ between residues i and j, $p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}}$, as a measure of the importance of each contact in determining a successful folding event. $p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}}$ is closely related to the frequency of the contact on transition paths, $p(q_{ij}\mid \mathrm{TP})$, but is effectively normalized by the probability that the contact is formed in non-native states, $p(q_{ij})_{\mathrm{nn}}$, and can be expressed as:

$$p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}} = \frac{p(q_{ij}\mid \mathrm{TP})\; p(\mathrm{TP})_{\mathrm{nn}}}{p(q_{ij})_{\mathrm{nn}}} \qquad [1]$$

where $p(\mathrm{TP})_{\mathrm{nn}}$ is the fraction of non-native states that are on transition paths at equilibrium. The subscript nn means that only the non-native segments of a trajectory are included, i.e., unfolded states and transition paths; the native, folded state is not included in the calculation, since native contacts are always formed in this state. The simulations suggest that formation of native contacts between the N and C termini is somewhat more important when folding takes place in the mouth of the exit tunnel (L = 31 residues) than far outside the ribosome (L = 51 residues) (Figure 5D-F, upper left-hand corner of the panels). This is likely due to the greater difficulty of forming these contacts (examples are shown in Figure […]

Our experiments show that I27 variants destabilized in regions of the protein that are unstructured, or only partially structured, in the transition state are still able to commence folding close to the ribosome. The force profiles reveal that the onset of folding of mutants with the A-strand deleted, or with the Met 67 to Ala mutation in the E-F loop, is the same as for wild-type, although their destabilisation is similar to that of L58A (Figure 4). The broader peak observed experimentally for M67A is harder to interpret. A plausible explanation is that the mutation introduces non-specific interactions of the folded domain with the ribosome surface, and we have shown that incorporating such interactions into the simulations can reproduce the results. An additional factor may be that the mutation is in a region that interacts closely with ribosomal protein uL24 in the wild-type cryo-EM structure (SI Appendix, Fig. S3).

Our simulations reproduce the onset of folding in the three mutant variants of I27 (Figure 4), which gives us confidence in using them to investigate how confinement within the ribosome affects the folding pathway of I27.
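Eq. [1] and the transition-path definition used here (crossings between Q = 0.3 and Q = 0.7) can be turned into a per-frame estimate as sketched below. This is an illustrative sketch, not the analysis code behind Figure 5: it assumes a single trajectory of Q values and a boolean per-frame indicator for one contact, it assumes the trajectory starts in the unfolded state, and the function names are hypothetical. Because TP frames are a subset of the non-native frames, the right-hand side of Eq. [1] reduces to the fraction of contact-formed non-native frames that lie on transition paths, which provides a simple cross-check.

```python
def transition_path_mask(q_traj, q_low=0.3, q_high=0.7):
    """Flag frames lying on transition paths between q_low and q_high.

    A transition path is the stretch of frames after the last visit to one
    boundary (q <= q_low or q >= q_high) and before the first visit to the
    other boundary.
    """
    on_tp = [False] * len(q_traj)
    last_side, last_idx = None, None
    for i, q in enumerate(q_traj):
        if q <= q_low:
            side = "unfolded"
        elif q >= q_high:
            side = "folded"
        else:
            continue  # barrier region: classified once a boundary is reached
        if last_side is not None and side != last_side:
            for j in range(last_idx + 1, i):
                on_tp[j] = True
        last_side, last_idx = side, i
    return on_tp


def non_native_mask(q_traj, on_tp, q_low=0.3, q_high=0.7):
    """Frames on a transition path, or last committed to the unfolded side."""
    nn, side = [], "unfolded"  # assumption: trajectory starts unfolded
    for q, tp in zip(q_traj, on_tp):
        if q <= q_low:
            side = "unfolded"
        elif q >= q_high:
            side = "folded"
        nn.append(tp or side == "unfolded")
    return nn


def p_tp_given_contact(contact_formed, on_tp, nn):
    """Eq. [1]: p(TP|q)_nn = p(q|TP) * p(TP)_nn / p(q)_nn.

    Requires at least one TP frame and one non-native frame with the contact.
    """
    nn_frames = [i for i, keep in enumerate(nn) if keep]
    tp_frames = [i for i in nn_frames if on_tp[i]]
    p_tp_nn = len(tp_frames) / len(nn_frames)
    p_q_nn = sum(contact_formed[i] for i in nn_frames) / len(nn_frames)
    p_q_given_tp = sum(contact_formed[i] for i in tp_frames) / len(tp_frames)
    return p_q_given_tp * p_tp_nn / p_q_nn


# Toy trajectory: one folding transition, then unfolding by a direct jump.
q = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.5, 0.85, 0.2]
contact = [False, True, True, True, True, True, True, True, False]
tp = transition_path_mask(q)
nn = non_native_mask(q, tp)
print(p_tp_given_contact(contact, tp, nn))
```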
We used simulations to investigate the folding of I27 arrested on the ribosome at various linker lengths, using a Bayesian method for testing the importance of specific contacts on the folding pathway, as well as by computing φ-values (Figure 5). Overall, we find that the mechanism and pathway of folding are robust towards variation in linker length and relatively insensitive to the presence of the ribosome; small but significant changes are observed only for contacts near the N and C termini. These shifts are consistent with the greater importance of forming N-terminal contacts when the C terminus is sequestered within the exit tunnel, possibly to compensate for the loss of contacts at the C terminus.

In our kinetic modelling, we obtained similar results with or without the assumption that folding and unfolding are fast relative to the escape rate, suggesting that this "pre-equilibrium" assumption is justified, at least for this protein. The reason for its validity in the case of I27 can be seen by comparing the folding and unfolding rates with the force-dependent escape rate of ~2.4 × 10⁻³ s⁻¹ obtained at the highest forces of ~20 pN (cf. Fig. 3C). Folding and unfolding rates at different linker lengths can be obtained by combining the linker-length dependence of the rates from simulation with the known folding/unfolding rates for isolated I27 from experiment (SI Appendix, Fig. S9). The presence of the ribosome increases the unfolding rate at shorter linker lengths so that it is faster than the maximum escape rate, while not slowing the folding rate enough for it to drop below the escape rate. Note that the unfolding rate does drop below the maximum escape rate at larger linker lengths, but by that point the folded population is already almost 100%, so the pre-equilibrium assumption still gives accurate results.
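The force-dependent escape rate used in this comparison follows the Bell expression given in Methods, $k(F) = k_0 \exp(\beta F \Delta x^{\ddagger})$ with $k_0$ = 3.4 × 10⁻⁴ s⁻¹ and $\Delta x^{\ddagger}$ = 3.2 Å, and the arrest survival probability is $S(t) = \exp(-\int_0^t k(F(t'))\,dt')$. The sketch below evaluates both for a sampled force trace. It is a minimal illustration under stated assumptions: the default temperature of 310.15 K (37 °C) is our choice, the rectangle-rule integral is the simplest possible quadrature, and `pre_equilibrium_plausible` with its factor-of-10 margin is a hypothetical rule of thumb comparing the two-state relaxation rate kf + ku with the fastest escape rate, not a criterion taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def escape_rate(force_pn, k0=3.4e-4, dx_angstrom=3.2, temperature_k=310.15):
    """Bell expression k(F) = k0 * exp(F * dx / (k_B * T)), returned in s^-1."""
    work_j = (force_pn * 1e-12) * (dx_angstrom * 1e-10)  # F * dx in joules
    return k0 * math.exp(work_j / (K_B * temperature_k))


def survival_from_force_trace(forces_pn, dt_s, **bell_kwargs):
    """S(t) = exp(-integral of k(F(t')) dt') for a trace sampled every dt_s seconds."""
    integral = sum(escape_rate(f, **bell_kwargs) for f in forces_pn) * dt_s
    return math.exp(-integral)


def pre_equilibrium_plausible(kf, ku, max_escape_rate, margin=10.0):
    """Hypothetical check: relaxation rate kf + ku well above the fastest escape."""
    return kf + ku > margin * max_escape_rate


# Escape rate at a few forces (pN) under the assumed temperature.
for force in (0.0, 10.0, 20.0):
    print(force, escape_rate(force))
```

Using kf + ku rather than ku alone mirrors the point made above: a slow unfolding rate does not invalidate the approximation as long as the overall relaxation between the two states remains fast.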
Although this assumption appears to be justified in the case of I27, it is probably not true in general, and it will be interesting to investigate it for slower-folding proteins in the future.

The arrest peptide experiments, in which a protein exerts a force due to folding, in some ways resemble atomic force microscopy or optical tweezer experiments, in which an external force is applied to the protein termini. It is important to note, however, that the nature and effect of the forces exerted on the folding protein by tethering to the ribosome are very different from those of an external force pulling on both termini. For example, forces of the magnitude seen in this work (up to ~20 pN) tend to have very little effect on the unfolding rate when applied to the termini of I27, due to the similarity in extension of the folded and transition states (61); by contrast, folding rates are dramatically slowed, even by very small forces, due to the large difference in extension between the unfolded and transition states (56). The forces arising from tethering to the ribosome are due to the folding of the protein itself rather than to an external device. They arise from the constriction of the available configuration space, particularly for folded and partially folded states, as well as from any attractive interactions between the protein and the ribosome. Our simulations suggest that for I27, reducing the linker length speeds up unfolding and slows folding by similar factors. Thus, comparisons with the effects of forces applied in AFM and optical tweezer experiments need to be made with care.

We have previously shown that α-helical proteins can fold co-translationally (2) […] intermediates (11,65) or folding by different pathways on the ribosome (2).
This mechanistic difference may relate partly to the low contact order of helical proteins, which allows partially folded states to be more stable than for all-β proteins. The situation for multidomain proteins is likely to be still more complicated, as some studies have already indicated (11,23,66,67

In vitro transcription and translation
Transcription and translation were performed using the commercially available PUREfrex in […] performed at 37 °C, 500 r.p.m. for exactly 30 min. The resulting force profile was slightly higher than that obtained at 15 min but has essentially the same shape (SI Appendix, Fig. S5). The reproducibility of force profile data has been discussed previously (2).

Cloning and purification of ribosome-nascent chain complexes
The I27 construct at L = 35, which is at the peak of fFL (Figure 1B), was studied by cryo-EM. The SecM AP in these constructs was substituted with the TnaC AP (34) for more stable arrest, and the constructs were engineered to maintain a linker length of 35 amino acid residues. An N-terminal 8× His tag was introduced to enable purification. The amino acid sequence of the construct used was (I27 in bold and TnaC AP underlined):

AANLKVKELSGSGSGSGGPNILHISVTSKWFNIDNKIVDHRP**
The construct was engineered into a pBAD expression vector, under the control of an arabinose-inducible promoter. The translation-initiation region was optimized as described in (68). The plasmid was transformed into the E. coli KC6 ΔsmpB ΔssrA strain. Four colonies were picked and tested for expression of the RNCs at 37°C in lysogeny broth (LB).

Large-scale purification of RNCs was carried out based on a protocol described in (34). Briefly, a single colony of the KC6 cells found to express the RNCs was picked and cultured in LB at 37°C to an A600 of 0.5. Expression was induced with 0.3% arabinose and carried out for 1 hour. Thereafter, the cells were chilled on ice, harvested by centrifugation, and resuspended in Buffer A at pH 7.5 (50 mM HEPES-KOH, 250 mM KOAc, 2 mM tryptophan, 0.1% DDM, 0.1% Complete protease inhibitor). Cells were lysed by passing the suspension three times through an EmulsiFlex (Avestin) at 8000 psi at 4°C. The lysate was cleared of cell debris by centrifugation at 30,000 × g for 30 min in a JA25-50 rotor (Beckman Coulter). The supernatant was loaded on a 750 mM sucrose cushion (in Buffer A) and centrifuged at 45,000 × g for 24 hours in a Ti70 rotor (Beckman Coulter) to obtain a crude ribosomal pellet, which was resuspended in 200 µl Buffer A by shaking gently on ice.

RNCs from the crude suspension were purified via their His tags by affinity purification using Talon (Clontech) beads, which were pre-incubated with 10 µg/ml tRNA to reduce non-specific binding of ribosomes. The suspension was incubated with the beads for 1 hour at […]

Cryo-EM sample preparation, data collection, processing and accession codes
Approximately 4 A260/ml units of RNCs were loaded on Quantifoil R2/2 grids coated with carbon (3 nm thick) and vitrified using the Vitrobot Mark IV (FEI-Thermo) following the manufacturer's instructions. Cryo-EM data were collected at the Cryo-EM National Facility at the Science for Life Laboratory in Stockholm, Sweden.

Data were acquired on a 300 keV Titan Krios microscope (FEI) equipped with a K2 camera and a direct electron detector (both from Gatan). The camera was calibrated to achieve a pixel size of 1.06 Å at the specimen level. Thirty frames were acquired with an electron dose of 0.926 e⁻/Å² per frame and a total dose of 27.767 e⁻/Å², at defocus values between −1 and −3 µm. The first two frames were discarded and the rest were aligned using MotionCor2 (69). Raw images were cropped into squares by RELION 2.1 beta 1 (70). Power spectra, defocus values and resolution estimates were determined using Gctf (71), and all 2,613 micrographs were manually inspected in real space, of which 2,613 were retained. In total, 468,015 particles were automatically picked by Gautomatch (http://www.mrc-lmb.cam.ac.uk/kzhang/) using the E. coli 70S ribosome as a template. Single particles were processed by RELION 2.1 beta 1 (70). After 80 rounds of 2D classification, 384,039 particles were subjected to 3D refinement using the E. coli 70S ribosome as reference structure, followed by 160 rounds of 3D classification without masking and 25 rounds of tRNA-focused sorting. One major class containing 301,510 particles (64% of the total) was further refined, including with a 50S mask, resulting in a final reconstruction with an average resolution of 3.2 Å (FSC = 0.143 criterion). The local resolution was calculated by ResMap (72). The final map was obtained by local B-factoring followed by low-pass filtering to 4.5 Å in RELION 2.1 beta 1 (70), to best visualize the I27 domain.
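The 0.143 criterion quoted above refers to the gold-standard Fourier shell correlation (FSC) between independently refined half-maps: the reported resolution is the inverse of the spatial frequency at which the FSC curve drops below 0.143. RELION computes this internally; the hypothetical helper below only illustrates the criterion on an already-computed FSC curve. Locating the crossing by linear interpolation between shells is an assumption here, and software packages may differ in this detail.

```python
def fsc_resolution(freqs, fsc, threshold=0.143):
    """Resolution (Å) at which an FSC curve first drops below `threshold`.

    freqs: increasing spatial frequencies in 1/Å; fsc: matching FSC values.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(freqs)):
        if fsc[i] < threshold <= fsc[i - 1]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # Linear interpolation of the frequency at which FSC == threshold.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Toy FSC curve (not data from this study).
print(fsc_resolution([0.1, 0.2, 0.3, 0.4], [1.0, 0.9, 0.3, 0.0]))
```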
For interpretation of the cryo-EM density, the cryo-EM structure model (PDB 4UY8) of the E. coli TnaC-stalled ribosome was fitted into the corresponding density using UCSF Chimera (73). The NMR model (PDB 1TIT) of the I27 domain was fitted into the extra density of the TnaC-stalled ribosome using UCSF Chimera (73). Since the I27 domain has the shape of a flat ellipsoid, we validated the orientation of the fitted I27 model by testing fits along all four major and minor axes, covering all possible orientations of the model within the density. Briefly, the models in the four different orientations were converted into densities (8 Å) by UCSF Chimera, and the cross-correlation coefficients between each model map and the isolated I27 density were calculated by RELION 2.1 beta 1 (70). Finally, the uL24 β hairpin was remodeled, because the tip of the hairpin is shifted due to the presence of the I27 domain.
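The orientation test described above amounts to scoring each candidate model map against the isolated I27 density and keeping the best-scoring orientation. A minimal pure-Python sketch, assuming the maps have already been resampled onto the same grid and flattened into lists; it uses a mean-subtracted (Pearson) correlation, which may differ in detail from the coefficient RELION reports, and the function names and toy maps are hypothetical.

```python
import math


def cross_correlation(map_a, map_b):
    """Normalized (Pearson) cross-correlation of two equally sized, flattened maps."""
    n = len(map_a)
    mean_a, mean_b = sum(map_a) / n, sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / math.sqrt(var_a * var_b)


def best_orientation(model_maps, target):
    """Return the name of the best-correlating model map and all scores."""
    scores = {name: cross_correlation(m, target) for name, m in model_maps.items()}
    return max(scores, key=scores.get), scores


# Toy example with two candidate orientations.
target = [0.0, 1.0, 2.0, 3.0]
candidates = {"orientation_1": [0.0, 1.0, 2.0, 3.0], "orientation_2": [3.0, 2.0, 1.0, 0.0]}
print(best_orientation(candidates, target))
```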

Coarse-grained molecular simulations
The 50S subunit of the E. coli ribosome (PDB 3OFR (36)) and the nascent chain are explicitly represented using one bead at the position of the α-carbon atom of each amino acid, and three beads (for P, C4′, N3) per RNA residue (Figure 2A). The interactions within the protein were given by a standard structure-based model (37-39), which allowed it to fold and unfold. Interactions between the protein and ribosome beads were purely repulsive (40) […] where $r_{ij}$ is the distance between residues i and j; the range and the strength of the interaction are fixed at 6 Å and 5 kJ/mol, respectively. Residues of I27[M67A] that are involved in the attractive interactions are defined as those whose heavy atoms are within 4.5 Å of any heavy atom of residue 67 in the native state.

To calculate the pulling force exerted on the nascent chain by the folding of I27, the bond between the last and second-to-last amino acids of the SecM AP was modelled by a harmonic potential as a function of the distance $x$ between these two atoms (Figure 3B),

$$V(x) = \tfrac{1}{2}\,k_b\,(x - x_0)^2$$

where $x_0$ is a reference distance. Here $x_0$ is set to 3.8 Å, the approximate distance between adjacent Cα atoms in protein structures, and $k_b$ is a spring constant, set to 3000 kJ mol⁻¹ nm⁻². The value of $k_b$ was chosen so that the average displacement $x - x_0$ remains below 1 Å for forces up to ~500 pN, which is much larger than the forces actually exerted by the folding protein. The pulling force on the nascent chain was measured from the extension of this bond as $F = -k_b\,(x - x_0)$.

I27 was covalently attached to unstructured linkers having the same sequences as those used in the force-profile experiments (see Figure 2B). Linker amino acids are repulsive to both the ribosome and I27 beads, with interaction energy as described in Eq. 2.
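The stiffness quoted above can be checked by converting $k_b$ from kJ mol⁻¹ nm⁻² to pN/nm and applying $F = -k_b(x - x_0)$. The sketch below assumes the SI value of Avogadro's constant; the helper names are hypothetical and it simply applies and inverts the harmonic force law.

```python
AVOGADRO = 6.02214076e23  # mol^-1


def kj_mol_nm_to_pn(value):
    """Convert a force from kJ mol^-1 nm^-1 to pN."""
    joules = value * 1e3 / AVOGADRO      # energy per molecule, per nm
    return joules / 1e-9 * 1e12          # J/m = N, then N -> pN


def tether_force_pn(x_angstrom, x0_angstrom=3.8, kb=3000.0):
    """F = -k_b (x - x0); kb in kJ mol^-1 nm^-2, distances in Å, result in pN."""
    extension_nm = (x_angstrom - x0_angstrom) / 10.0
    return -kj_mol_nm_to_pn(kb * extension_nm)


def extension_angstrom(force_pn, kb=3000.0):
    """Bond extension (Å) that balances a pulling force of force_pn."""
    kb_pn_per_nm = kj_mol_nm_to_pn(kb)   # roughly 4980 pN/nm for kb = 3000
    return force_pn / kb_pn_per_nm * 10.0


for force in (20.0, 500.0):
    print(force, extension_angstrom(force))
```

At 20 pN, roughly the highest force quoted in the Discussion, the bond stretches by only about 0.04 Å, whereas a force of ~500 pN is needed to stretch it by about 1 Å.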
The protein in its arrested state is subject to a force $F(t)$, which will fluctuate, for example when the protein folds or unfolds. The rate of escape from arrest has been shown to be force-dependent (25); here we approximate the sensitivity to force using the phenomenological expression originally proposed by Bell (42),

$$k(F) = k_0 \exp\left(\beta F \Delta x^{\ddagger}\right) \qquad [5]$$

where $k_0$ is the zero-force rupture rate, $\Delta x^{\ddagger}$ is the distance from the free energy minimum to the transition state, and $\beta = 1/(k_B T)$, where $k_B$ is Boltzmann's constant and $T$ the absolute temperature. Although expressions for force-dependent rates with a stronger theoretical basis exist, we use the Bell equation because of its simplicity and because its parameters have previously been estimated from optical tweezer experiments for the SecM AP. In all cases, we set $k_0$ (Eq. 5) to 3.4 × 10⁻⁴ s⁻¹ and $\Delta x^{\ddagger}$ to 3.2 Å, based on the values determined by Goldman et al. (25), who estimated $k_0$ and $\Delta x^{\ddagger}$ to lie in the ranges 0.5 × 10⁻⁴ to 20 × 10⁻⁴ s⁻¹ and 1-8 Å, respectively.

We define the probability of remaining arrested on the ribosome as $S(t) = 1 - f_{FL}(t)$ and assume that $\dot{S} = -k(F(t))\,S$, hence

$$S(t) = \exp\left(-\int_0^t k\big(F(t')\big)\,dt'\right)$$

The escape of I27 from the ribosome can be described using the kinetic model shown in SI Appendix, Fig. S8, which explicitly takes into account the linker-length-dependent folding and unfolding rates of the I27 nascent chain on the ribosome, $k_f(L)$ and $k_u(L)$, and the force-dependent rates of escape from the ribosome, $k(F_f(L))$ and $k(F_u(L))$. […] The solution to the kinetic model can be simplified if we further assume that escape from the ribosome is slow relative to the folding and unfolding of the protein.
In this situation, we can approximate $S(t)$ in terms of the mean forces experienced when the protein is unfolded, $F_u$, or folded, $F_f$, and the unfolded and folded populations, $P_u$ and $P_f$, respectively:

$$S(t) \approx \exp\left\{-\left[P_u\,k(F_u) + P_f\,k(F_f)\right]t\right\}$$

The equilibrium properties of the system for each linker length were obtained from umbrella sampling using the fraction of native contacts $Q$ as the reaction coordinate, allowing $P_f$, $P_u$ and $F_u$, $F_f$ to be determined (Figure 3C). The definition of $Q$ has been described previously (48): $Q$ is computed as a sum over the $N$ pairs of native contacts $(i, j)$, where $r_{ij}(X)$ is the distance between i and j in configuration $X$, $r_{ij}^0$ is the distance between i and j in the native state, and $\lambda = 1.2$ accounts for fluctuations when the contact is formed. A boundary of $Q = 0.5$ is used to separate folded from unfolded states.

In order to characterize the folding mechanism, we used transition paths from folding simulations for the L = 51 case at 291 K. Fifty independent simulations, each started from fully […] in which $p(q_{ij}\mid \mathrm{TP})$ is the probability that the native contact $q_{ij}$ between residues i and j is formed on transition paths as defined above. We also characterized the importance of individual contacts in determining the folding mechanism using $p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}}$, defined in Eq. 1 of the main text, i.e., the probability of being on a transition path given that contact $q_{ij}$ is formed and the protein is not yet folded. Having already calculated $p(q_{ij}\mid \mathrm{TP})$ above, evaluating $p(\mathrm{TP}\mid q_{ij})_{\mathrm{nn}}$ required $p(q_{ij})_{\mathrm{nn}}$, the probability of a contact being formed in all