# Design principles for self-assembly with short-range interactions

See allHide authors and affiliations

Edited by Peter G. Wolynes, University of California, San Diego, La Jolla, CA, and approved February 4, 2011 (received for review September 20, 2010)

## Abstract

Recent experimental advances have opened up the possibility of equilibrium self-assembly of functionalized nanoblocks with a high degree of controllable specific interactions. Here, we propose design principles for selecting the short-range interactions between self-assembling components to maximize yield. We illustrate the approach with an example from colloidal engineering. We construct an optimal set of local interactions for eight colloidal particles (coated, e.g., with DNA strands) to assemble into a particular polytetrahedral cluster. Maximum yield is attained when the interactions between the colloids follow the design rules: All energetically favorable interactions have the same strength, as do all unfavorable ones, and the number of components and energies fall within the proposed range. In general, it might be necessary to use more component than strictly required for enforcing the ground state configuration. The results motivate design strategies for engineering components that can reliably self-assemble.

Self-assembly is a process in which simple building blocks spontaneously assemble into structures of higher complexity. Many complex biological structures (virus shells, microtubules, etc.) have achieved robust assembly, presumably by evolving specific interactions between the components.

A fundamental challenge is to determine how to engineer components that reliably assemble into complex structures. Part of this challenge requires understanding the principles for choosing component interactions. Current experimental methods allow components to interact via surface properties, charge, polarizability, magnetic dipoles, etc. (1, 2); technology is rapidly evolving to allow for novel components (3) or adhesive elements (e.g., DNA) to bind components to each other in a highly specific fashion (4–8).

A simple method for choosing component interactions, for assembly of a desired structure, is to choose short-range interactions between neighbor components so that only the desired conformation is allowed. Such “local rules” can be thought of as templates for energetically favorable arrangement of components and suggest the number of component types that are required to assemble a structure. A prominent and early example of this general approach is the theory of virus shells; Crick and Watson (9) and Caspar and Klug (10) showed how virus shells can assemble out of a small number of irregularly shaped but configurable capsid proteins with local interactions. The field of DNA self-assembly (11–13) is based on the idea of tuning local binding interactions between DNA strands to facilitate assembly of complex structures. Theories for viral shell assembly show that local rules (14, 15) specify the assembly of the desired structure by limiting the combinatoric space of all possible arrangements. Local-rule-based approaches have been used extensively by Winfree and collaborators (16, 17), who have demonstrated that the number of component types needed for assembly is bounded by the algorithmic complexity of the structure.

The major issue with specifying the properties of components through only local rules is that this neglects the role of thermodynamics and energetics for determining the yield of the assembled structures. Clearly, the energy of every bond must be tuned to maximize yield: If the bond strength is too low, the structures will fall apart; if it is too high, mistakes are more difficult to correct. On the other hand, energetic considerations could also change the number of components required to maximize yield. Although local rules give a lower bound on the number of different components required to assemble a structure, it is possible that increasing the component heterogeneity above this lower bound could improve yield.

To date, these considerations have been dealt with empirically. The goal of this paper is to introduce a simple model to determine how the number of components as well as their interaction energies can be tuned to maximize yield. We show that to maximize yield, it is in general advantageous to use more component types than strictly required by local rules.

We demonstrate these principles on an example from colloidal engineering, inspired by recent analyses of colloidal hard sphere clusters (18, 19). A cluster of *N* identical colloids has strong degeneracy of the ground state when 6 ≤ *N* ≤ 10 (20), so that the quilibrium probabilities for the different clusters is determined by entropy, with rotational entropy strongly suppressing the yield of the most symmetric clusters (18). One way to favor such clusters is to use specific binding between the spheres by labeling them with DNA (5, 8, 9). We consider the specific case of eight particle clusters, where the ground state is 16-fold degenerate, and show how to design the labels to optimize the yield for the most symmetric cluster. The example shows how the design principles outlined here can be used in practice.

## The Model

We consider a general scenario for assembly, in which a set of objects interact with each other through local, short-ranged, interactions to form a desired structure or pattern. The objects are endowed with two sets of labels: geometrical labels, which generates the pattern, and energetic labels, which specifies the interaction energies with other objects. As an example, consider the self-assembly of a puzzle: The images on each piece of the puzzle are the geometrical labels-proper alignment of the pieces gives the desired pattern. The energetic labels are the contours of each puzzle piece, which determines the set of allowed nearest neighbors. The same geometric labels can have different physical labels, which reduces the degeneracy of the desired pattern. Fig. 1 demonstrates the tiling of a square lattice by (*i*) two types of physical monomers with two different geometrical labels and (*ii*) six types of physical monomers (each of which has four favorable interactions) and two types of geometrical labels. Although both patterns give the same geometric pattern (a two dimensional checkerboard of 1 and 2) they are coded for by different energetic interactions.

This simple case generalizes easily to monomers arranged on or off lattices in any dimension. We assume an infinite reservoir of monomers so that the probability of occurrence of any particular monomer type is constant. There could be various global constraints imposed on how the monomers are arranged. For instance, one might require that there exist a fixed linear order to the monomers being arranged on the lattice (21–23). Such global constraints restrict the total number of allowed configurations. We denote this total number as *γ*^{N}, where *N* represents the number of monomers in the system, and γ, the number of allowed configurations per monomer. The energetic coordination number of each lattice site, the number of nearest neighbor sites that energetically interact with the site, is denoted by *z*.

We would like to derive principles for choosing interactions between the objects to maximize the yield of a desired configuration. This general question has been previously addressed in the problem of protein design, and we adapt some of the arguments invented there (23–25) for the present class of examples.

### Colloidal Clusters

For concreteness, we present our arguments by solving the aforementioned example from colloidal engineering; the approach followed gives a general prescription that could be applied to any problem in which our assumptions hold. Our example involves assembling a particular cluster of eight colloidal spheres. Recent work (18, 20) has demonstrated that there are 16 different rigid hard sphere clusters of eight particles and that these different clusters have identical number of contacts (18) and hence the same total binding energy. Such clusters have also been studied extensively using the short-ranged Morse potential (26–29). For identical particles, the equilibrium probability of the clusters is dictated by entropy alone, with experimental measurements of equilibrium probabilities for the different clusters agreeing quite well with theoretical predictions (18). Highly symmetrical clusters have the lowest equilibrium probabilities, with the *T*_{d} cluster (Fig. 2) being observed only approximately 1% of the time. Studies of atomic clusters have also highlighted the importance of permutational symmetry (30).

The question is to what extent can the equilibrium yield of the *T*_{d} cluster be enhanced by using nonidentical spheres—in particular, by using specific binding between the spheres (e.g., labeling them with DNA) (5, 8, 19).

In what follows, we will answer how many sphere types are required and with what energies they should interact with each other to maximize yield. Within our general framework, the coated spheres are the energetic labels, and the cluster type, the desired arrangement of geometrical labels. Energetic interactions between the spheres are assumed to be pairwise and mediated through contact. The interaction energy between the spheres is defined by the interaction matrix. The *i*,*j*th element of this matrix is the energetic cost of sphere type *i* making contact with sphere type *j*. Our objective is to find the interaction matrix so that the *T*_{d} cluster is selected in thermodynamic equilibrium and without kinetic arrest.

### Random Interactions.

Before addressing this design problem, it is useful to first consider the behavior of a typical case, in which eight sphere types are present. Each element of the 8 × 8 interaction matrix is chosen independently from a Gaussian distribution with mean 0 and standard deviation 5.

The partition function is composed of all the distinct permutations of distinguishable particles for the 16 packings. For each cluster, we enumerate the 8! permutations of the spheres. Then we discard the permutations that are rotated versions of those already considered. This means that clusters with higher rotational symmetry will have a fewer number of configurations and are less favored entropically. This is equivalent to the rigorous method where all permutation-inversion isomers are considered and those related by geometrical symmetry operations discarded (26, 31).

The total number of configurations for this system *γ*^{N} is 504,000. Next, we calculate the energy of each configuration. The packing type and arrangement of spheres within it determine the contacts present in a given configuration. We simply sum the energy of all the contacts to calculate the energy of the configuration. The histogram in Fig. 2 shows the distribution of energies of all configuration for a typical random interaction matrix. The red curve shows that the distribution of energies is well fit by a Gaussian.

That the energy distribution is well approximated by a Gaussian follows directly from the Random Energy Model (REM) (24, 25, 32–34), which assumes that the energy levels are independent and identically distributed (i.i.d.). In this case, the central limit theorem implies that in the thermodynamic limit *N* → ∞, the energy distribution is given by [1]where σ is the standard deviation of the interaction energies, computed over the a priori occurrence probability of pairwise interactions. In the present example, we assume no prior knowledge of the possible configurations. The a priori probability is a uniform distribution.

Although the assumption of independent interactions is incorrect, it nonetheless well approximates the mean and variance of the energy distribution. To show this, we enumerated the energy distribution *p*(*E*) for different random interactions matrices. Fig. 3 compares the predicted mean and standard deviation to the result of the enumerations. REM almost exactly predicts the mean, presumably because any correlations in pairwise interactions are averaged out over all the configurations. The standard deviation, however, is systematically about 0.8 times the prediction. REM always overestimates the variance of energy distribution because it assumes complete independence of interactions, when clearly correlations are present.

Given *p*(*E*), we can compute the entropy per monomer as [2]

Here and henceforth we express *S* in units of the Boltzmann constant *k*_{B}. It is evident that there exists a critical energy, [3]below which no corresponding configuration (or equivalently zero entropy) exists. This notion of entropy depletion has been studied extensively in the context of the glass transition (35–37). The partition function can be evaluated using the saddle point approximation, implying that above a critical standard deviation of the energy distribution, [4]the entropy is dominated by constant contribution at the critical energy *E*_{c}. The free energy *F*_{REM} = *E* - *TS*, then takes the form, [5]

## An Optimal Set of Interactions

The previous section demonstrated that a typical interaction matrix leads to a Gaussian *p*(*E*), with mean and variance quantitatively consistent with i.i.d. interactions. The main result, Eqs. **4** and **5**, show that for kinetic trapping to be avoided the spread in interaction energies must be bounded by [6]For the case of the eight-sphere colloidal clusters, this corresponds to the constraint that *σ* < 1.2*k*_{B}*T*.

We now turn to determining the interaction matrix so that a specific desired configuration is selected from all random permutations. For this to happen, we need the free energy of this desired configuration to be smaller than all of the other states, namely for *F*_{∗}(*T*) < *F*_{REM}(*T*). We seek the form of an optimal interaction matrix, which maximizes the yield of the desired configuration over all the others.

Without loss of generality, we normalize the energies in the interaction matrix to have zero mean. Then, we claim that the *A* × *A* optimal interaction matrix is completely characterized by three parameters: (*i*) the alphabet size *A* or the number of energetic labels; (*ii*) the number of favorable interactions *n*_{f}; and (*iii*) the standard deviation of the interaction energies *σ*. In our example of eight particle clusters, alphabet sizes of *A* = 1,2,8 are shown in Fig. 2 *A*, *B*, and *C*, respectively. The case of *A* = 1 corresponds to identical particles and as discussed does not select the *T*_{d} cluster. After developing our design principles, we will compare the two- and eight-letter alphabets.

For a given alphabet size *A*, a general interaction matrix has *A*(*A* - 1)/2 parameters; the large reduction in the number of parameters occurs because of two special properties of the optimal matrix.

### It Is Always Possible to Design the Interaction Matrix So That All Interactions in the Ground State Are Favorable.

Namely, no monomer interacts with any other monomer in the designed ground state with an interaction energy that is larger than the mean of the interaction matrix. Fulfilling this condition requires that we choose a large enough *n*_{f} and *A* for the configuration in question. There always exists an *A* ≤ *z*, the energetic coordination number, for which only favorable interactions are possible. To see this in our lattice construct, consider a given configuration as a graph, with each vertex corresponding to a monomer with a unique energetic label. Each edge connects two letters that contact in the ground state, so that each vertex has *z* edges. We now color each vertex such that connected vertices have different colors. The Graph Coloring Theorem (38) demonstrates that a greedy algorithm can color the graph with *z* colors. If we assign an unfavorable interaction energy to same-color interactions and favorable to all others, the resulting interaction matrix has a ground state with all favorable interactions.

### All Favorable Interactions Have the Same Strength, As Do All Unfavorable Interactions.

The free energy of the desired configuration, *F*_{∗}(*T*), is a summation over favorable interactions, whereas the free energy of the other configurations *F*_{REM}(*T*) is a function of the variance of the energy level distribution. The energy of the ground state is , summing up the interaction energies *η*_{ij} of all monomers *i* and *j* that contact in the ground state. Recall that all such interactions are favorable. Let us fix the energy *F*_{∗}(*T*) of the ground state and ask what is the effect of letting the different (favorable) interactions have different strengths. This will clearly increase *σ*^{2}, the variance of the bond energies, and hence will increase the width of the distribution for the configurations not in the ground state. This in turn decreases the energy gap between *F*_{∗}(*T*) and *F*_{REM}(*T*). To maximize the gap for a given ground state energy *F*_{∗}(*T*), all favorable interactions should therefore have the same strength. A similar argument applies to the unfavorable interactions: Making these nonidentical also increases *σ*^{2} and hence decreases the energy gap.

### The Optimal Interaction Matrix.

Together, these results imply that the optimal interaction matrix has the following, rather simple, form: [7]

Each alphabet has a total of *A*^{2} interactions of which *n*_{f} are favorable by design. We have defined *n* = *n*_{f}/*A* as the average number of favorable interactions per letter. All favorable interactions have the same energy. The energy of the other *A*^{2} - *n*_{f} interactions is also equal but with an undesired energy cost Δ compared to favorable interactions. The weighting of *E*_{f}(*E*_{f} + Δ) with ()) ensures that *σ*^{2} is the variance of the energies. The form of the interaction matrices with *A* = 6 thus implied is shown in Fig. 4 for *n* = 1,2,3,4,5; the red and blue squares refer to favorable and unfavorable interactions, respectively. Ideally, we can select the favorable partners by assuming that if letter *i* interacts favorably with letters *c*_{1},⋯,*c*_{n}, then letter *i* + *k* will interact favorably with letters *c*_{1} + *k*,⋯,*c*_{n} + *k*, where additions are cyclic over *A*. For the trivial case *n* = 1, this gives the diagonal matrix.

We emphasize that the interactions proposed above are only optimal within the constraints of the assumptions of the model, mainly, for independent configuration energies, and in the thermodynamic limit. In practice, for finite-size systems with correlations in the energy distribution, other forms of interactions could result in higher yield. Nonetheless, later in this paper we test the proposed optimal distribution within the eight-particle colloidal cluster and demonstrate that deviations from the optimal form reduce yield.

### Bounds on *A* and *σ*.

The analysis thus far has suggested the structure of the optimal interaction matrix, with the number of favorable interactions *n* set by the form of the desired ground state. The allowed values of *A* and *σ* are set by the requirement that the ground state free energy *F*_{∗}(*T*) < *F*_{REM}(*T*); this translates into the constraint [8]Eq. **8** is a quadratic constraint on *σ*/*T*; straightforward algebra shows that when [9]there is a finite range of *σ*/*T* for which this constraint is satisfied. When *A* obeys Eq. **9**, the range of allowed *σ*/*T* is given by [10]Here the upper bound comes from the constraint that kinetic trapping is avoided, Eq. **6**. Above this upper limit, the energy barriers are too high to be traversed at temperature *T*. Below the lower limit, the desired configuration does not have sufficiently low enough energy to overcome the entropic advantage of assembly into random conformations.

As *n* increases, the minimum allowable alphabet size *A* grows. It is worth remarking that there is a lower bound on the allowed *A*; because we require at minimum a two-letter alphabet with *n* = 1, if [11]no desired configuration can be assembled even if the local-rules predict so.

### The Physical Example.

The bounds above are derived from ensemble averages over all microstates. The results also apply for off-lattice systems where *z* is the average coordination number. Furthermore, although no longer sharp, the bounds are also useful for design strategies of finite systems. For a system size *N*, we want to maximize the desired configuration’s occurrence probability, [12]For finite clusters the bounds on alphabet size and interaction energies are no longer sharp. A finite probability of observing the desired configuration is possible even with violations of the bounds.

We demonstrate these claims by returning to the eight-particle cluster example. This will serve as both a practically important test of the ideas and a prescription for how to apply them to more general examples. There are two alphabets that result in all favorable interaction for only the *T*_{d} cluster (or satisfy the local rule criterion): first, the two-letter alphabet of Fig. 2*B*, with the interaction matrix, —where *f* and *u* represent favorable and unfavorable interactions, respectively, with *A* = 2, *n*_{f} = 3, and *n* = 1.5; second, the eight-letter alphabet (Fig. 2*C*), where the interaction matrix has the same form as the cluster’s contact matrix, with *A* = 8, *n*_{f} = 36, and *n* = 4.5.

Let’s compare the two alphabets. Fig. 5 *Top* shows the allowed range of *σ* Eq. **10** by plotting the energy gap Δ, as a function of the alphabet size *A*, for *n* = 1.5,2.5,3.5,4.5. Here, Δ is the energy difference between favorable and unfavorable interactions (*f*-*u*) and is in general much smaller than the energy gap between the designed ground state and the Gaussian band of excited states. With increasing *A*, the allowed range of Δ increases. This can prove useful for systems where the number of components is more easily tuned than the interactions.

Both of the proposed alphabets violate the bound in Eq. **9**. This means that although the *T*_{d} cluster has the lowest energy, it is not favored entropically. However, the variance of the energy distribution is overestimated on the right hand side of Eq. **8** due to neglect of correlations. These correlations are especially important for finite systems and can result in a larger energy gap *F*_{∗}(*T*) - *F*_{REM}(*T*) and a lower minimum alphabet size compared to theory prediction. To get a better estimate of the minimum alphabet size we calculate the actual energy standard deviation σ by fully enumerating all configurations as explained earlier for random interactions. The predicted σ is indeed 1.7 times larger than the actual value. Solving Eq. **8** with the revised σ for *F*_{REM} gives a minimum alphabet size of 6.8 for the eight-letter alphabet and 2.3 for the two-letter alphabet. The eight-letter alphabet satisfies the lower bound on size for its actual energy variance, while the two-letter alphabet barely violates it. The eight-letter alphabet then is expected to have a larger energy gap for assembly and a higher yield. To better understand this we consider the energy of the *T*_{d} cluster –or the ground state energy.

Eq. **7** gives the favorable interaction energy and in turn the ground state energy as a function of *A* and *n* for a given σ. We showed that, with a fixed *σ*, a lower ground state energy means a larger gap between the desired cluster and the Gaussian band of random packings, resulting in a higher probability. Fig. 5 *Bottom* plots the ground state energy as a function of alphabet size for four different values of *n*. The two points corresponding to the proposed alphabets are marked with arrows. The eight-letter alphabet has a lower ground state energy. To maximize yield then, we should select the eight-letter alphabet, assign the favorable and unfavorable interaction energies according to Eq. **7**, and select the maximum practical σ less than *σ*_{c}.

To test if the proposed interactions are indeed optimal, we enumerate the full partition function of eight-sphere rigid packings with interactions. In Fig. 6 *Top* the equilibrium probability of *T*_{d} cluster is plotted for both alphabets’ optimal interaction matrix (with σ varying) as a function of the enumerated energy distribution standard deviation. The eight-letter alphabet results in a higher yield over the entire range. At the onset of kinetic freezing (*σ* = *σ*_{c}), the eight-letter alphabet enhances yield by 20% over the two-letter alphabet. The optimal specific interactions result in an equilibrium yield of 70% compared to less than 1% for the identical spheres. The largest contribution to yield is the difference between the ground state energy and the free energy of all other clusters. The entropy of the ground state (the two-letter alphabet is 48 fold degenerate, the eight-letter alphabet twofold) is less important but also enhances the yield.

We proposed as a design principle that all favorable interactions should be equivalent, as should all the unfavorable interactions. This minimized the standard deviation of the random clusters while maintaining the same ground state energy, enhancing the energy gap. To computationally test this, we randomly varied the favorable and unfavorable interactions while maintaining their respective means and the form of the interaction matrix constant. The results are plotted in Fig. 6 *Bottom*. As the interactions become more dissimilar within their kind, the distribution of cluster energies becomes larger, resulting in a reduction of yield. This confirms our claims.

Are these interaction energies physical? We only considered rigid packings with the maximum number of contacts. Experiments (18) show that a 4*k*_{B}*T* attraction from depletion interactions results in an equilibrium distribution of only rigid packings with the maximum number of contacts for eight microspheres. Specific interactions with an energy gap of 2.5*k*_{B}*T* can be introduced using a complementary DNA strand (39, 40). DNA-induced interactions between colloids have been experimentally measured (8) and are easily tunable using temperature to the desired range.

## Conclusions

The ideas herein were developed to provide a guide for the experimental design of interactions between components used in self-assembly. Recent experimental advances in controlling the shape of colloidal scale structures (3) and the highly specific interactions between them (1, 2, 4–8) have led to new possibilities for controlling the equilibrium self-assembly of complex structures. The fundamental issue that this paper sought to address is how to choose both the number and strength of the interactions between the blocks to maximize equilibrium yield of a designed structure.

Our analysis has led to a number of general principles for implementing this strategy in experiments, under the assumption that all components interact with local rules or short-range interactions. (*i*) Optimal interaction matrix: To maximize thermodynamic yield and kinetic accessibility, the interaction matrix between components should obey Eq. **7**; all favorable interactions should be of identical magnitude between different component types, as should all unfavorable interactions. (*ii*) Optimal interaction strength: The magnitude of the energy difference between favorable and unfavorable interactions obeys strict bounds as a function of alphabet size and number of favorable interactions (Fig. 5). There is in general an optimal interaction strength for maximizing yield; this idea has been previously observed in simulations of virus shell assembly and patchy particles aggregation (41, 42). (*iii*) Optimal alphabet size: The number of energetic labels (the alphabet size) could well exceed the number of geometric labels required for satisfying the local rules. An enhanced alphabet results in a wider energetic range for assembly, resulting in improved yield for systems where energies are not easily tunable.

To demonstrate utility of these ideas for off-lattice and finite systems of experimental interest, we considered the example of designing specific short-range interactions (such as DNA-mediated attractions) that would favor the *T*_{d} ploytetrahedron packing of eight microspheres. Two alphabets were proposed that would select for this cluster from geometrical considerations alone. A thermodynamic analysis revealed the optimal interactions and the superior alphabet. The equilibrium probability was enhanced from less than 1% for identical spheres to roughly 70% with optimal interactions.

These general ideas and constructions should prove useful for design of functionalized nanoblocks for self-assembly.

## Acknowledgments

We thank V. N. Manoharan and N. Arkus for a stimulating collaboration. We also thank D. J. Wales for his insightful comments on cluster symmetry. This research was supported by Defense Advanced Research Planning Agency Programmable Matter and the National Science Foundation through the Harvard Materials Research Science and Engineering Center and the Division of Mathematical Sciences.

## Footnotes

- ↵
^{1}To whom correspondence should be addressed. E-mail: hormoz{at}fas.harvard.edu.

Author contributions: S.H. and M.P.B. designed research, performed research, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

## References

- ↵
- Whitesides GM,
- Grzybowski B

- ↵
- Rothemund PWK

- ↵
- ↵
- Glotzer SC

- ↵
- ↵
- ↵
- Crocker J,
- Russel W,
- Chaikin P

- ↵
- ↵
- ↵
- Caspar DL,
- Klug A

- ↵
- ↵
- Rothemund PWK,
- Papadakis N,
- Winfree E

- ↵
- ↵
- Berger B,
- Shor PW,
- Tucker-Kellogg L,
- King J

- ↵
- ↵
- Rothemund PWK,
- Winfree E

- ↵
- ↵
- Meng G,
- Arkus N,
- Brenner MP,
- Manoharan VN

- ↵
- Arkus N

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- Malins A,
- Williams SR,
- Eggers J,
- Tanaka H,
- Royall CP

- ↵
- ↵
- ↵
- ↵
- Wales DJ

- ↵
- ↵
- Bryngelson JD,
- Wolynes PG

- ↵
- Doye JPK,
- Wales DJ

- ↵
- Mezard M,
- Parisi G,
- Virasoro M

- ↵
- ↵
- ↵
- West DB

- ↵
- ↵
- ↵
- ↵
- Wilber AW,
- et al.

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Physics