Robust nonequilibrium pathways to microcompartment assembly

Cyanobacteria sequester photosynthetic enzymes into microcompartments which facilitate the conversion of carbon dioxide into sugars. Geometric similarities between these structures and self-assembling viral capsids have inspired models that posit microcompartments as stable equilibrium arrangements of the constituent proteins. Here we describe a different mechanism for microcompartment assembly, one that is fundamentally nonequilibrium and yet highly reliable. This pathway is revealed by simulations of a molecular model resolving the size and shape of a cargo droplet, and the extent and topography of an elastic shell. The resulting metastable microcompartment structures closely resemble those of carboxysomes, with a narrow size distribution and faceted shells. The essence of their assembly dynamics can be understood from a simpler mathematical model that combines elements of classical nucleation theory with continuum elasticity. These results highlight important control variables for achieving nanoscale encapsulation in general, and for modulating the size and shape of carboxysomes in particular.

Spatial segregation is an ubiquitous strategy in biology for organizing the crowded, active viscera of the cell [1][2][3][4].Viral capsids exemplify this organization at very small scales, sequestering genetic material from the cytosol and recapturing it for delivery to new hosts.Extensive work has explored the structure, stability, and assembly dynamics of viruses, highlighting generic design principles and physical origins of the spontaneous assembly process [5][6][7][8][9][10].Overall capsid structure typically follows from the arrangements of neighboring proteins that are preferred by their noncovalent interactions.Strong preferences yield regular and highly stable structures, but at the same time impair kinetic accessibility by producing deep kinetic traps [11].
Bacterial microcompartments serve a very different biomolecular purpose from viruses but have striking structural similarities, namely, a quasi-icosahedral protein shell that assembles around a fluctuating cargo [2,[12][13][14].This comparison raises the question: Do the same assembly principles, based on a balance between equilibrium stability and kinetic accessibility, apply to microcompartments as well?Here we focus on a paradigmatic example of a microcompartment, the carboxysome.Carboxysomes play an essential role in the carbon fixation pathway of photosynthetic cyanobacteria [15].The shell proteins of the carboxysome, which have hexameric and pentameric crystallographic structures, assemble to encapsulate a condensed globule of protein including the enzymes RuBisCO and carbonic anhydrase [16][17][18].These vital, organelle-like structures regulate a microscopic environment to enhance catalytic efficiency, which has made them an attractive target for bioengineering applications [19][20][21][22].
Like some viral capsids, carboxysomes feature pronounced facets and vertices, and crystal structures of shell components suggest highly symmetric local protein arrangements [16,18,23].Unlike in viral capsids, however, these apparent preferences do not directly indicate the assembled structure, nor do they clearly point to a characteristic microcompartment size [24].Indeed, experimental evidence has not constrained a particular mechanism for assembly: live cell images support the notion that the cargo assembles first and is subsequently coated by the shell [12], while direct observations of partially formed carboxysomes in electron micrographs point to a cooperative mechanism for assembly [25].Identifying essential variables for controlling or emulating carboxysome assembly awaits a clearer understanding of its dynamical pathways and underlying driving forces.
Theoretical and computational models for nanoshell assembly have primarily followed approaches inspired by viral capsid assembly.These approaches have emphasized the role of preferred shell curvature and shell-cargo interactions as a template for the final structure [26].For example, irreversible growth of shells comprising monomers that prefer a bent binding geometry has been shown in simulations to successfully assemble empty shells [27,28].The assumption of such a preferred local curvature, however, is at odds with structural models of the carboxysome based on crystallography of shell proteins, which feature tiling of the shell by hexameric proteins that appear to bind optimally with zero curvature [18,25,29] Indeed, a number of experimental measurements bolster the view that bacterial microcompartments are bounded by protein sheets that are essentially flat away from localized regions of high curvature.The predominant constituents of diverse microcompartments (e.g., β-carboxysomes, α-carboxsyomes, eut microcompartments, and pdu microcompartments) crystallize as layers of flat sheets comprising hexamers whose lateral contacts are genetically conserved [18].Recent atomic force microscopy measurements have demonstrated that these constituents also form flat monolayers in physiological buffer conditions [29].Furthermore, in vivo tomography data strongly suggest that biological carboxysomes have extended flat faces with curvature sharply localized at the joint of a facet [25].
Here we introduce and explore a model based on a mechanistic perspective that is fundamentally different from the one commonly applied to virus assembly and that does not require assuming an innately preferred curvature for contacts between shell proteins.The basic components are a cargo species that is prone to aggregation, a shell species that spontaneously forms flat, hexagonally symmetric elastic sheets, and an attractive interaction between the inside of the shell and the cargo, depicted in Figs.1A, S1-S3.These ingredients appear to be the essential constituents for carboxysome biogenesis in vitro-mutagenesis experiments have shown that pentameric proteins sometimes presumed to stabilize shell curvature are in fact not necessary for the formation of faceted shell structures [24].For such a basic model, thermodynamic considerations imply that finite encapsulated structures have negligible weight at equilibrium (Sec.S3).Nevertheless, we show that regularly sized microcompartments can emerge reliably in the course of natural dynamics.

I. METHODS
We regard the protein shell of a growing carboxysome as a thin elastic sheet, whose energy can be discretized and expressed as a sum over the edges ij of a triangulated surface [30].Its Hamiltonian is where l ij is the length of an edge, whose deviations from the preferred length l 0 experience a restoring force with stiffness .The bending rigidity κ sets the energy scale for developing a nonzero angle θ ij between normal vectors of triangles sharing the edge ij [30].
We associate each triangular face in the discretized sheet with a protein monomer.The term H steric in (1) imposes steric constraints that prevent overlap of these monomers, by placing a hard sphere of diameter 0.45l 0 at the center of each triangle in the sheet.Because the deformations of individual shell proteins are expected to be small relative to the bending fluctuations between adjacent monomers, we work in a limit where l 2 0 κ.
The cohesive interactions that bind adjacent shell proteins in the sheet are accounted for separately, with an energy H bind that tightly associates the vertices of contacting monomers.In the limit that bound vertices coincide exactly, the thermodynamic influence of H bind depends only on the connectivity of the sheet, contributing a binding affinity factor K for each constrained vertex, as described in SI.
We represent aggregating cargo as an Ising lattice gas on an FCC lattice with chemical potential µ c , restricted to configurations with a single connected cargo droplet.Nearest neighbors are coupled through an interaction energy c , where σ i = {0, 1} for occupied and unoccupied lattice sites, respectively, ij indicates that the sum is taken over nearest neighbors, and N L is the number of lattice sites.The lattice spacing used throughout is l 0 , the average length of a shell monomer edge.The primary cargo species in a carboxysome, the protein RuBisCO, has a diameter L 0 that is larger than l 0 ; adopting a finer lattice spacing amounts to averaging approximately over configurational fluctuations of RuBisCO about a close-packed lattice.
Interactions between cargo and shell species in our model mimic a short-ranged directional attraction suggested by structural data [31], and the steric repulsion intrinsic to compact macromolecules.For a particular lattice site i and shell monomer j, where nj is the outward facing normal vector of shell monomer j.The notation ∈ V(i) means that the point specified by the vector lies within the volume V(i) occupied by lattice cell i.The full interaction is the sum of H int over all i and j.This interaction, which contributes a favorable energy if the inward-facing normal lies within a cargo-occupied lattice site and an unfavorable energy if the outward-facing normal vector does so, is depicted schematically in Fig. S5.The thermal relaxation of a shell with a fixed number of vertices and fixed connectivity, subject to the energetics described above, can be simulated with a straightforward Monte Carlo dynamics.In each move, a single vertex is chosen at random.As depicted in Fig. S2 (A), a random perturbation in three-dimensional space is made to the selected vertex, attempting to change its position from a to a .The resulting energy difference, ∆H shell + ∆H int , is used in a standard Metropolis acceptance criterion, acc(a → a ) = min 1, e −β(∆H shell +∆Hint) . ( This procedure ensures that the configurations sampled in the Markov chain are consistent with a Boltzmann distribution. A Monte Carlo dynamics for growth of the shell is significantly more involved.In order that the procedure satisfy detailed balance, care must be taken to ensure that every step is reversible and that algorithmic asymmetries of forward and reverse moves are fully accounted for.Our goal is to draw samples from a grand canonical distribution, p(X, N s , {σ}) = Ξ −1 z Ns s e −β(H shell +H bind +Hcargo+Hint) , (5) where Ξ denotes the grand canonical partition function, X gives the coordinates of the shell protein configuration, N s is the number of shell monomers, and z s ∝ e βµs is an activity set by the concentration of free shell monomers in solution.We implement two types of Monte Carlo moves that allow the structure to grow, increasing the number of monomers by adding new vertices at the edge of the shell.Fusion of pre-existing vertices can also occur, representing the binding between monomers in the shell that are nearby in space but not necessarily in connectivy.Details of these moves and their reverse counterparts, along with acceptance criteria that ensure detailed balance with respect to p(X, N s , {σ}), are described in Sec.SI 1.In microcompartment growth simulations, they are performed in tandem with standard trial moves that change the occupation state of the cargo lattice, acting to grow or shrink the cargo droplet.
The routes of Monte Carlo trajectories propagated in this way are influenced by energetic parameters like , κ, and µ, and by the shell binding affinity K.The pathways are also shaped by the relative frequencies of proposing each type of move.A basic time scale τ is set by the duration of a "sweep" in which each vertex experiences a single attempted spatial displacement (on average).For every n c sweeps we attempt roughly N boundary moves that add or remove material at the surface of the cargo droplet, where N boundary is the number of lattice sites defining the droplet surface.This procedure establishes an effective rate k 0 c = (2n c τ ) −1 at which cargo monomers arrive from solution at a given surface site.Similarly, a basic rate k 0 s = (2n s τ ) −1 for shell monomer arrival at a given site on the shell's perimeter is established by attempting N perim shell monomer addition/removal moves for every n s sweeps, where N perim is the number of edges at the shell boundary.In experiments these arrival rates are approximately proportional to the solution concentrations of the respective monomers.As a result, microcompartment mass changes slowly on the time scale of elastic fluctuations.Dynamics of vertex fusion and fission also proceed much more rapidly than growth.In essence, all processes other than monomer addition and removal follow along adiabatically.For the fate of assembly, the key dynamical parameter in our model is therefore the ratio k 0 c /k 0 s of arrival rates for cargo and shell species.

II. MOLECULAR SIMULATIONS
Fig. 1 depicts an example assembly pathway of this molecular model.Trajectories are initiated with a small droplet of cargo (comprising a few hundred cargo monomers) and a handful of proximate shell monomers, as described in Sec.S5.Under conditions favorable for assembly, such a droplet faces no thermodynamic barrier to growth; absent interactions with the shell, its radius R would increase at a constant average rate as cargo material arrives from the droplet's surroundings.Growth of the shell, on the other hand, is impeded by the energetic cost of wrapping an elastic sheet around a highly curved object.In early stages of this trajectory, shown in Fig. 1 (C), the net cost of encapsulation is considerable, and the population of shell monomers remains small as a result.
As the droplet grows, this elastic penalty is eventually overwhelmed by attractions between shell and cargo, similar to the mechanism of curvature generation by nanoparticles adsorbed on membrane surfaces [32][33][34].At a characteristic droplet size R * , encapsulation becomes thermodynamically favorable.If the arrival rate of shell monomers is much higher than that of cargo (e.g., due to a higher concentration in solution), then a nearly complete shell will quickly develop, Fig. 1 (D), hampering the incorporation of additional cargo.The nascent shell that results, while sufficient to block cargo arrival, is highly defective and far from closed.A slow healing process ensues, dominated by relaxation of grain boundaries between growth faces, Fig. 1 (E).This annealing process is, in part, achieved by relocating topological defects in the structure, a phenomenon that has been previously encountered in ground state calculations for closed elastic shells [35].
In the vast majority of the hundreds of assembly trajectories we have generated, healing leads ultimately to a completely closed structure with exactly 12 fivefold defects.Placement of these defects is often irregular and unlikely to yield minimum elastic energy, but further evolution of the structure is extraordinarily slow.Subsequent defect dynamics could produce more ideal shell structures as in Ref. [35], and transient shell opening could allow additional cargo growth.But these relaxation processes require the removal of shell monomers that are bound to several others and that interact strongly with enclosed cargo.Under the conditions of interest, the time scale for such a removal is vastly longer than the assembly trajectories we propagate.Our model microcompartments are equilibrated with respect to neither shell geometry nor droplet size, yet they are profoundly metastable, requiring extremely rare events to advance towards the true equilibrium state.
In addition to generating molecular trajectories of microcompartment assembly, we have performed umbrella sampling simulations to compute the equilibrium free energy of the molecular model as a function of N s and the number N c of cargo monomers.Results, presented in Sec.S4, underscore the mechanistic features described above: c /k 0 s = 0.001, all other parameters given in Sec.S4).Several structures along the assembly trajectory are shown near the corresponding values of R and θ.Encapsulation is thermodynamically favorable for R > R * (lower dashed line).The free energy barrier to encapsulation is smaller than the thermal energy kBT for R > R ‡ (Eq.S63) (dot-dashed line).(B) Histograms of microcompartment radius when θ reaches π (at which point dynamics cease).Ten thousand independent trajectories were collected for k 0 c /k 0 s = 0.1, 0.01, and 0.001.Radius values are given in a unit of length that is comparable to the shell monomer size (see Eq. S37).
a small critical nucleus size for cargo condensation, and a size-dependent thermodynamic bias on encapsulation that becomes sharply favorable at a characteristic length scale.In the next section we examine the origins and consequences of these features in a more idealized context.

III. MINIMALIST MODEL
The scenario described above for our molecular model can be cast in a simpler light by considering as dynamical variables only the amounts of cargo and shell material in an idealized geometry.In this minimalist approach we focus on the radius R of a spherical droplet and the polar angle θ subtended by a contacting spherical cap of shell material, as depicted in Fig. 2 (A).Taking the cargo and shell species to have uniform densities ρ and ν within the droplet and cap, respectively, the geometric parameters R and θ can be simply related to the monomer popula- In the spirit of classical nucleation theory, we estimate a free energy landscape in the space of R and θ through considerations of surface tension and bulk thermodynamics.For our system these contributions include free energy of the condensed cargo phase, surface tension of a bare cargo droplet, free energy of a macroscopic elastic shell, and line tension of a finite shell, together with the energetics of shell bending and shell-cargo attraction.As shown in the Sec.S3, this free energy F (R, θ) can be written in the form where the energy scales a c and a s and the length scales R c , R * , and R s (all of which can be related to parameters of the molecular model) transparently reflect the thermodynamic biases governing growth and encapsulation.
The function F (R, θ) can be minimum only at R = 0 and R = ∞.As in the molecular model, finite microcompartments can appear and persist only as kinetically trapped nonequilibrium states.Contours of this schematic free energy surface, plotted in Fig. 2 (A), are consistent with basic thermodynamic trends observed for the molecular model.Most importantly, they manifest the prohibitive cost of encapsulating small droplets.Only for radii R > R * is shell growth thermodynamically favorable.This characteristic length scale is determined by a straightforward balance among the shell's bending rigidity, the macroscopic driving force for flat shell growth, and the attraction between shell and cargo, as detailed in Sec.S3.Whether encapsulation occurs immediately once the droplet exceeds the critical size R * depends on kinetic factors that are only partly governed by F (R, θ).Line tension of the shell, represented by the final term in Eq. 6, effects a barrier to encapsulation.As the droplet size increases, the height of this barrier declines, and encapsulation becomes facile only when it is sufficiently low.As with our molecular model, the subsequent dynamics may deviate significantly from equilibrium expecations based on the free energy surface.To explore these nonequilibrium outcomes, we must supplement the free energy surface F (R, θ) with a correspondingly minimal set of dynamical rules.
We take the rate of shell monomer addition to be a product of the per-edge rate of monomer arrival k 0 s and the number of edges that can bind new monomers, Similarly accounting for the number of sites where cargo can be added, we take the rate of cargo monomer addition to be The ratio of addition and removal rates is completely specified by the constraint of detailed balance.Defining the shell monomer and cargo monomer free energy differences we require that the rates of monomer removal satisfy and Stochastic trajectories for this minimalist description were generated by advancing the populations N s and N c in time with a kinetic Monte Carlo algorithm based on the transition rates k + s , k + c , k − s , and k − c .By construction, these rates are consistent with the equilibrium statistics implied by ( 6), but the geometry dependence of addition and removal rates ensures that typical trajectories do not simply follow a gradient descent on the free energy surface F (R, θ).The sampled paths shown in Fig. 2 (A) (initiated with N s = 1 and N c = 10, and terminated when N s ≥ 2N 2/3 c as described in Sec.S4) indeed exhibit strong geometric biases.
Most notably, θ increases rapidly at late times in these successful assembly trajectories, despite equilibrium forces that encourage droplet growth at the expense of reduced shell coverage.This route reflects an imbalance in the base addition rates k 0 s and k 0 s of cargo and shell material, which suppresses droplet growth while the shell develops.If the addition of shell material does not outpace that of cargo, then the angle θ decreases in time-the microcompartment grows but never advances towards closure, Figs.S5 and S6.Since the addition rate of cargo scales with the droplet's surface area, while the rate of adding shell material scales only with the shell's perimeter, successful encapsulation generally requires that the imbalance in addition rates be substantial, which could be straightforwardly achieved through a large excess of shell monomers in solution relative to cargo.When the shell is able to envelop the cargo with only modest changes in droplet size, the distribution of sizes can be quite narrow, as shown in Fig. 2 (B).

IV. DISCUSSION
The minimalist model defined by Eqs.7-13 explains the general course of our more detailed molecular assembly trajectories.But its assumption of ideally spherical geometry is a rough approximation, as evidenced by asymmetric droplet shapes in the molecular trajectory of Fig. 1.Fluctuations in cargo droplet shape are discouraged by its surface tension, but they are inevitable on small length scales.In extreme cases they lead to highly aspherical microcompartments akin to distended structures that have been observed in experiments [24].More common shape fluctuations have mechanistic influences that are more subtle but nonetheless important.These act to reduce local curvature and thus facilitate growth of a nearly flat shell patch over considerable areas even in the early stages of assembly.Enveloping the entire cargo globule still awaits sufficient cargo growth to reduce the average curvature below a threshold of roughly 1/R * .The necessary curvature that ultimately develops in the shell is also spatially heterogeneous.Protein sheets, which generally resist stretching more strongly than bending, tend to concentrate curvature at topological defects and along lines connecting them [36].As a kinetic consequence, bending is advanced primarily by discrete, localized events, in particular the formation of fivefold defects during shell growth.Indeed, the dynamics of encapsulation are noticeably punctuated by the appearance of topological defects, as seen in Movie S1.
Systems out of equilibrium often support rich spatial and temporal patterns, but in many cases they are highly variable and challenging to harness or control.By contrast, the encapsulation process we have described is quite reliable.Fig. 3 (A) shows closed structures result-ing from assembly trajectories of our molecular model.They are not perfectly icosahedral (Figs.S7 (A) and S7 (B)), echoing the discernibly nonideal geometries observed in electron micrographs of carboxysomes [25].Our assemblies are nonetheless roughly isotropic and distinctly faceted, as quantified by the distributions of sphericity and defect angles shown in Fig. 3 (B) and Fig. S7.Gently curved regions do sometimes appear, as they do in the laboratory.In our model this smoothness reflects the presence of paired defects (one fivefold and one sevenfold) that sometimes form in the process of shell closure, screening the elastic interactions that give rise to faceting [36].
In published [26] and concurrent work, Hagan et al. have computationally examined assembly dynamics of microcompartments stabilized by spontaneous curvature of the protein shell.Experimental data suggest that key shell proteins of the carboxysome tend, by contrast, to bind in an unbent geometry.But existing measurements do not definitively exclude a role for mechanically preferred curvature in carboxysome assembly.Quantitative characterization of the elastic properties of shell protein sheets in the absence of cargo would help clarify this issue.Systematic measurement of microcompartment geometry as a function of protein concentrations would provide a more stringent test of our nonequilibrium mechanistic hypothesis, which can be achieved with reconstitution experiments.
The nonequilibrium assembly mechanism revealed by our model has a particularly spare set of essential requirements: One needs only a monomer that assembles planar, elastic sheets and has a directional affinity for cargo that is prone to aggregation.Tuning the concentrations of these components provides a direct route to generating monodisperse, nanoscale, self-organizing structures.The simplicity of the process presents opportunities to investigate the dynamics of shell assembly in highly controlled experimental settings, for example using colloidal particles.Such a platform has already been demonstrated for a related example of kinetically defined microsctructure assembly, namely bicontinuous gels stabilized by colloidal interfaces [37].Our results suggest that kinetic control can be used for precise, nanoscale assembly in a surprisingly general context.

SUPPLEMENTARY MATERIALS
FIG. S1.A still from the animation of an assembly trajectory is shown with the shell components in gray and the cargo as spheres centered at the cargo lattice sites.In order to clearly show the shell structure, only occupied lattice sites that are above a distance cutoff of 1.5l0 are shown in our visualizations.

S1. MONTE CARLO DYNAMICS OF THE MOLECULAR MODEL
The Monte Carlo dynamics of our molecular model for microcompartment assembly involves several kinds of trial moves, each constructed to be reversible and to preserve the grand canonical probability distribution p(X, N s , {σ}) in Eq. [5].Here we detail these moves and their associated acceptance probabilities.For the cargo we require only a cargo addition/removal move.For the shell we employ: vertex displacement moves, simple monomer insertion/deletion moves, wedge insertion/deletion moves, and vertex fusion/fission moves.

A. Vertex distribution function
Representing the protein shell as a triangulated sheet introduces a subtlety to the grand canonical distribution function p(X, N ).Triangles in the sheet each correspond to a molecular monomer with 3 × 3 degrees of freedom, corresponding to the positions of each of its constituent vertices.Physically, a pair of such monomers would possess 2 × 3 × 3 degrees of freedom even when bound.Represented as a "sheet" comprising two juxtaposed triangles (with a total of four vertices), however, a bound dimer possesses only 4 × 3 degrees of freedom.The implied constraint mimics strong and short-ranged forces of cohesion between bound monomers, which limit the fluctuations of a bound vertex to a very small volume v a about its binding partner.For a favorable binding energy bind that is constant within this volume, the binding equilibrium for a pair of monomers is parameterized by the resulting vertex affinity K = v a e −β bind [38].In setting v a = 0, we implicitly take bind → −∞ such that K remains nonzero and finite.
The triangulated sheet representation does not resolve fluctuations of bound vertices on the irrelevantly small length scale v 1/3 a .These fluctuations have effectively been integrated out, with each bound pair contributing a factor of K to the probability distribution for sheet vertices: FIG. S2.A vertex is randomly displaced and the energy difference between the initial and final states is computed.The move is then accepted or rejected so as to guarantee detailed balance as detailed in the text.
where N bind = 3N s − N vertex , and X denotes the configuration of the set of N vertex distinct vertices comprising the triangulated sheet.The activity of shell monomers, has units of (length) −9 reflecting the 9 degrees of freedom for an unbound monomer.As written in Eq.S2, λ is an arbitrary length scale that sets the compensatingly arbitrary zero of chemical potential µ s .

B. Metropolis acceptance probability
Our simulations are designed to sample the probability distribution p( X, N vertex , {σ}).In a Metropolis Monte Carlo scheme, detailed balance with respect to this distribution is ensured by accepting proposed moves with probability where Γ is shorthand for the variables ( X, N vertex , {σ}) that define the system's microscopic state, and gen(Γ → Γ ) is the probability of proposing Γ as a trial state from the existing state Γ.Details of our Monte Carlo moves, and the resulting changes in equilibrium probability p(Γ) that result, are described below.For those that change the number of shell monomers, acceptance probabilties involve factors of z s and K in addition to the energetics described by H shell and H int .Several of these trial moves have asymmetric generation probabilities, gen(Γ → Γ) = gen(Γ → Γ ), which must be accounted for as well.

C. Vertex displacement
Elastic relaxation of the shell is accomplished by conventional Monte Carlo displacement moves.Such a move selects a vertex at random and attempts to displace it by an amount ∆a that is chosen at random from a symmetric distribution, as depicted in Fig. S2.The standard acceptance criterion is given in Eq. [4].
For a state Γ with N vertex (Γ) shell monomers, a Monte Carlo sweep is defined as a sequence of N vertex (Γ) displacement moves, each acting on a randomly chosen vertex.

D. Simple monomer insertion/deletion
Our shell growth moves are depicted in Fig. S3.They are typically attempted less than once per sweep, at a rate proportional to the number N perim of edges at the perimeter of the sheet.Precisely, in each sweep we propose a shell monomer insertion move with probability N perim k 0 s τ (where k 0 s τ is presumed to be much less than 1/N perim ).First, one of the N perim edges is selected at random.Each of these edges corresponds to a particular monomer i.The type of growth move that is proposed depends on the local connectivity and geometry of the sheet.In a simple monomer insertion move (Fig. S3 (A)), addition of a new shell monomer is proposed by adding a single vertex that is connected to the two existing vertices defining the selected edge.Because the vertices of bound monomers are constrained to coincide exactly along the shared edge, a new monomer defined by three vertices can be introduced by adding just one new vertex.The position of the new vertex is generated by first identifying an "ideal" insertion site a ideal , lying in the same plane as monomer i, at a distance √ 3/3l 0 from the midpoint of the selected edge, and equidistant from the edge's two endpoints.The position a of the new vertex is chosen randomly within a small volume v add centered on a ideal .The new monomer's outward normal vector is chosen to have a positive projection onto that of monomer i.The generation probability for such a simple monomer insertion is The reverse of a simple addition move deletes a shell monomer by removing a single vertex and its two edges.We first select one of the N perim (Γ ) edges at the perimeter of the sheet.A monomer added by simple insertion can be removed by selecting either of its two exposed edges.The generation probability for simple deletion is therefore The grand canonical probability of Γ relative to the initial configuration is The two factors of K account for constraining positions of two of the vertices defining the new monomer.The energy difference ∆H due to changes in H shell + H int involves only the elastic energy of the edges of the new monomer and the interaction with nearby cargo lattice sites.The Metropolis acceptance criterion for the simple monomer insertion move is thus In the acceptance probability for simple monomer deletion, acc(Γ → Γ), the second argument of the min function is inverted.Some surface edges are not eligible for simple insertion in our protocol.Specifically, in many configurations it is possible to add a shell monomer by filling in a "wedge" at the sheet's perimeter, as depicted in Fig. S3 (B).We do not attempt simple insertion moves at the edges defining such a wedge.When such an edge is selected, we instead propose the wedge insertion move described below.

E. Wedge insertion/deletion
In some shell configurations a new monomer can be added without introducing any new vertices at all.Geometrically, this corresponds to filling in an empty wedge at the sheet's boundary, as sketched in Fig S3 (B).We detect this possibility by checking if two disconnected perimeter vertices share a neighbor and are within a distance l < l max .Since the new monomer is completely constrained in this case, the corresponding generation probability is whose factor of 2 reflects that the same insertion can result from selecting either of the two existing edges that define the wedge.The generation probability for the reverse move is simply The grand canonical probability of Γ relative to the initial configuration is As with the simple insertion move, factors of K account for constraining positions of vertices defining the new monomer.The energy difference ∆H due to changes in H shell + H int again involves only volume exclusion, the elastic energy of the edges of the new monomer, and the interaction with nearby cargo lattice sites.The resulting Metropolis acceptance criterion for wedge insertion is In the acceptance probability for wedge monomer deletion, acc(Γ → Γ), the second argument of the min function is inverted.

F. Vertex fusion/fission
The insertion moves described above represent events in which shell monomers arrive from bulk solution and bind to the growing sheet.A separate move is required to allow for binding between monomers that already belong to the sheet.In the triangulated sheet description, such an event fuses one or more existing vertices, so we term them "fusion" moves (and "fission" in reverse).At each sweep we attempt a fusion move with probability N fuse k 0 fuse τ , where N fuse is the number of vertices eligible for fusion events.The reverse moves, i.e., fission, are attempted with probability N fiss k 0 fuse τ , where N fiss is the number of eligible fission events.In each case one of the eligible moves is selected at random and then proposed as detailed below.
Some fusion moves act to close gaps between monomers that already share a common vertex, as depicted in Fig. S4.In such a "stitching" move, two vertices i and j, initially separated by a distance less than l fuse , are merged into a single vertex.The new vertex is placed at the midpoint of i and j.The generation probability for a stitching move is The corresponding change in grand canonical weight is The reverse of a stitching move splits one surface vertex i into two.One of the new vertices is placed at a position randomly distributed within a spherical volume v fuse = 4πl 3  fuse /3 centered on the existing vertex.The other new vertex is placed opposite the original vertex, such that i coincides with the midpoint of the new vertices.The resulting generation probability The acceptance probability for a stitching fusion move is thus In the acceptance probability for the corresponding fission move, acc(Γ → Γ), the second argument of the min function is inverted.For a shell that is sufficiently large and sufficiently curved, two different parts of the perimeter can meet.When this happens, binding of the two parts should be quite facile, since it is not necessary to await arrival of a new monomer from the dilute solution.Physically, this union event is just another instance of two shell monomers binding.Algorithmically, it is more complicated, involving a highly nonlocal change in vertex connectivity.
We permit such a nonlocal connection move only when two edges at the sheet's perimeter nearly coincide.Denoting the vertices of one edge as i 1 and i 2 , and those of the other edge as j 1 and j 2 , we require that i 1 and j 1 are separated by a distance less than l fuse and that i 2 and j 2 also reside within the short distance l fuse .One new fused vertex is placed at the midpoint of i 1 and j 1 ; the other is placed at the midpoint of i 2 and j 2 .The resulting generation probability is In the acceptance probability for the corresponding nonlocal fission move, acc(Γ → Γ), the second argument of the min function is inverted.
Counting fission moves is more involved than counting fusion moves: For the local stitching scenario, a vertex i on the perimeter is eligible for fission along an edge ij if the vertex j is not on the perimeter.In the case of nonlocal fission, a perimeter vertex i is eligible to be split if it has a neighbor j which is also on the perimeter and if removal of the edge does not lead to two topologically disconnected domains on the shell.In practice, we test this condition efficiently using a breadth first search.

G. Cargo addition/removal
For a given configuration {σ} the cargo droplet possesses N boundary sites adjacent to the droplet boundary.This set includes unoccupied sites with one or more occupied neighbors, where a new cargo monomer can be accommodated.It also includes occupied sites that are incompletely coordinated, where a monomer could be removed without changing the droplet's connectivity.During each sweep we attempt with probability N boundary k 0 c τ to change the cargo configuration.We select one of the boundary sites at random and attempt to change its state, either inserting or deleting a cargo monomer.The Metropolis acceptance probability for this procedure is acc(Γ → Γ ) = min 1, e −β(∆Hcargo+∆Hint) . (S20)

H. Topological defects
In the absence of cargo, the ground state of a shell protein sheet features only six-fold coordinated vertices except at the perimeter.In order to close, however, the shell must develop either vacancies or else undercoordinated vertices.Five-fold coordinate sites can arise as a consequence of insertion or fusion moves, and they play an important role in encapsulation.
The Monte Carlo moves we have described could also produce overcoordinated vertices.Because there is no evidence for seven-fold (or higher) symmetric protein assemblies in the carboxysome literature, we prohibit these defects from forming, with a single exception.A seven-fold defect is allowed to result from fusion that causes two fivefold defects to share a neighbor.This arrangement can be viewed as accommodating a vacancy in the monomer sheet, rather than creating a heptameric arrangement of shell monomers.The large favorable energy for such a fusion event makes the distinction between vacancy and excess coordination thermodynamically unimportant.

I. Constraints and parameterization
The constraints on connectivity we impose in these dynamics -there is exactly one connected cargo droplet and exactly one connected sheet -are artificial.This is done as a matter of convenience, with substantial computational savings.The constraint should not significantly affect assembly dynamics provided that nucleation of cargo and shell are both slow compared to their growth.The physical conditions we examine fall in this regime.
The set of parameters governing our molecular simulations are listed in Tables S2 and S3.One key feature is that elastic relaxation is very fast relative to monomer addition, i.e., k 0 s τ 1 and k 0 c τ 1.Another is that shell growth is highly unfavorable in the absence of cargo, as guaranteed by small values of z s K 2 v add and z s K 3 .The potent attraction between shell and cargo makes simple shell monomer addition feasible but reversible when cargo is present on the interior side of the new monomer.Wedge insertion next to cargo is even more favorable and is typically accepted with high probability.
In some cases, two disconnected edges can come together so as to enclose a triangular domain, but one which was not explicitly occupied by a monomer before the fusion event.In the triangulated sheet representation, the newly enclosed domain is assumed to be occupied by a shell monomer.Such a move therefore implicitly entails the arrival and binding of a new monomer.Because this binding is strongly downhill in free energy for the physical conditions of interest, we treat it as inevitable and irreversible, and we do not represent it explicitly.

A. Macroscopic equilibrium
In order to assess the equilibrium arrangements of cargo and shell species, we view their assembly as a chemical transformation, in which N s shell capsomers S combine with N c cargo subunits C to form a complex A Ns,Nc , which might or might not be a completely closed microcompartment.This process can be represented as a simple kinetic scheme At thermodynamic equilibrium, stability criteria require that the chemical potentials µ associated with the constituent materials are balanced.(S22) Assuming these particles are present at very dilute concentration, the chemical potential of each species j can be written as a function of its density, the temperature T , and a density-independent standard state chemical potential µ (0) , µ j (T ) = µ Simulations were visualized using the VMD software package [39].In order to clearly show the shell structure, only occupied lattice sites that are above a distance cutoff of 1.5l 0 are shown in our visualizations.The cargo is depicted either as density isosurfaces or spheres centered at the lattice sites.In the Movie S1, the shell is depicted as dynamic bonds with a distance cutoff of 1.5l 0 , which leads to unconnected vertices appearing bonded in some frames.S3.Simulation parameters for the elastic shell model.All energies are given in units of kBT , and lengths in units of 0.

500.
stretching energy scale κ 12.5 bending energy scale l0 1. equilibrium shell bond length lmax 1.5 l0 shell bond length cutoff lmin 0.5 l0 shell bond length cutoff l fuse 0.5 l0 distance cutoff for fusion attempts l add 0.5 l0 radius for fusion proposal ln K 7 binding affinity scale parameter ln z -16.5 activity parameter k 0 s τ 0.0075 rate of shell insertion attempts

S4. FREE ENERGY CALCULATIONS FROM THE MOLECULAR MODEL
We performed a series of umbrella sampling calculations to compare the thermodynamics of our molecular model with the minimalist model.First, we considered a small cargo droplet N c = 135, corresponding to a sphere with diameter 4l 0 .As shown in Fig. S8, the cost of deforming the shell makes encapsulation uphill in free energy.In the case of a droplet with N c = 429, a sphere of diameter 6l 0 , which is above the nucleation barrier for cargo growth, we find that there is a modest barrier to the initial growth which becomes downhill as N s increases.defect vertices in a structure, of which there are N d .We also define B(v i ) to be the set of bonds connected to vertex v i which has elements ij.Then, the faceting parameter is We quantified the size of assembled microcompartments according to their "effective" radii.We defined the effective radius as  S11.Histograms of assembled microcompartments' effective diameter from simulation of our molecular model (blue).We also show experimental data (red) from Ref. [25].
Fig. S11, together with experimental data reported in Ref. [25].It is challenging to make a quantitative comparison in this case because the number of experimental observations is small.

FIG. 1 .
FIG.1.Model dynamics of microcompartment assembly.(A) Cargo monomers (green spheres) in our molecular model occupy sites of an FCC lattice, experiencing short-ranged attraction to their nearest neighbors and also to protein monomers comprising the shell.Each shell monomer corresponds to a triangle in a discretized shell (gray) that resists bending and stretching according to an elastic Hamiltonian.Closure of the shell requires the presence of topological defects in the triangulated surface, vertices that are connected to only five neighbors.Red spheres here and in other figures highlight the locations of these defects.(B) A fully assembled microcompartment includes at least twelve of these defects.Here and in other figures, we show the boundary of the cargo droplet as a translucent green surface.(C) At early time t in the assembly process, high curvature of the cargo droplet limits shell growth to an area that is relatively flat while maintaining contact with the cargo.(D) Droplet growth reduces curvature until encapsulation becomes thermodynamically favorable and kinetically facile (Movie S1).(E) The nearly closed shell effectively halts cargo aggregation, but the approach to a simply connected envelope proceeds slowly as defects reposition and combine to heal grain boundaries.

FIG. 2 .
FIG. 2. Minimalist model for dynamics of cargo growth and encapsulation.(A)Our phenomenological model resolves only the radius R of a cargo droplet, and the polar angle θ of a spherical cap that coats it.Black contours indicate lines of constant free energy F (R, θ).Green lines show the course of ten kinetic Monte Carlo trajectories under conditions favorable for microcompartment assembly (k 0 c /k 0 s = 0.001, all other parameters given in Sec.S4).Several structures along the assembly trajectory are shown near the corresponding values of R and θ.Encapsulation is thermodynamically favorable for R > R * (lower dashed line).The free energy barrier to encapsulation is smaller than the thermal energy kBT for R > R ‡ (Eq.S63) (dot-dashed line).(B) Histograms of microcompartment radius when θ reaches π (at which point dynamics cease).Ten thousand independent trajectories were collected for k 0 c /k 0 s = 0.1, 0.01, and 0.001.Radius values are given in a unit of length that is comparable to the shell monomer size (see Eq. S37).

FIG. 3 .
FIG. 3. Structural variation among microcompartments generated by our molecular model.(A) Structures from the ensemble evincing the distinct regimes of sphericity and faceting.Note that two dimensional projections can obscure some of the variation.(B) Typical assemblies are not substantially elongated, as indicated by a histogram of the sphericity α (Sec.S7).(C) Faceting necessitates relatively sharp angles between shell monomers surrounding a fivefold defect (Sec.S7).A histogram of this angle θ defect (averaged over all defects in an assembled structure) underscores the typically strong faceting visible in (A).

FIG
FIG.S4.A schematic depiction of the stitching fusion move.

K 2
FIG.S5.The cargo molecules, represented as a lattice gas, interact with the shell through an interaction Hint that is favorable (by an amount γin) if the monomer's inward facing normal vector terminates inside a cargo-occupied lattice cell.The interaction is unfavorable (by a large amount γout) if the outward facing normal vector terminates inside a cargo-occupied lattice cell.The latter contribution serves to bias the shell fluctuations so that shell monomers are sterically occluded by the cargo.

N
s µ s + N c µ c = µ Ns,Nc .
FIG. S6.Histograms of the radii of the encapsulated structures computed from kinetic Monte Carlo simulations.(A) κ = 100.(B) κ = 300.Data was collected from 10 4 independent KMC simulations.Cargo radii are given relative to the length scale = (2πν) .
FIG. S7.Trajectories of the minimal model illustrating runaway growth.The cargo and shell arrival rates in this case are equal, k 0 c /k 0 s = 1.The trajectories do not encapsulate after 10 6 steps of KMC dynamics.Cargo radius is given relative to the length scale = (2πν) −1/2 .

Finally
FIG. S10.Histograms of Q6 and Ŵ10 from final structures of the simulations of the microscopic model.

TABLE S1 .
Notation used to define the state of our molecular model and to formulate Monte Carlo moves that satisfy detailed balance.
S49)which assumes an idealized spherical droplet.While this measure of the radius is only approximate, it readily provides insight into the degree of size heterogeneity.A distribution of assembled compartment size is shown in