Electrostatic origin of the genome packing in viruses

Communicated by Richard S. Stein, University of Massachusetts, Amherst, MA, September 26, 2006 (received for review February 15, 2006)
Abstract
Many ssRNA/ssDNA viruses bind their genome by highly basic semiflexible peptide arms of capsid proteins. Here, we show that nonspecific electrostatic interactions control both the length of the genome and genome conformations. Analysis of available experimental data shows that the genome length is linear in the net charge on the capsid peptide arms, irrespective of the actual amino acid sequence, with a proportionality coefficient of 1.61 ± 0.03. This ratio is conserved across all ssRNA/ssDNA viruses with highly basic peptide arms and differs from the one-to-one charge balance expected of specific binding. Genomic nucleotides are predicted to occupy a radially symmetric spherical shell detached from the viral capsid, in agreement with experimental data.
Viruses present one of the most elegant examples of spontaneous self-assembly. They can be produced both in vitro and in vivo, with particles of the highest degree of monodispersity. At the basic level, virus particles consist of a viral genome surrounded by a protein capsid. The genome can be single-stranded or double-stranded, composed of RNA or DNA, and stored in one or more polynucleotide chains. The surrounding capsid is assembled from repeat units of similar or identical proteins (Fig. 1).
The structure of the capsid is generally well understood, with high-resolution images available from cryo-electron microscopy and x-ray analysis (1). The most common viruses, the so-called icosahedral viruses, have capsids of nearly spherical shape. Specifics of the protein structure, capsid bending rigidity, and protein packing constraints may cause virus capsids to buckle into an icosahedral shape. Still, deviations from a perfectly spherical shape are generally minimal (2).
Conformations of the encapsidated genome are far less clear. Bacteriophages (3–5), for example, use external forces to package their genomes. The strong preassembled capsids of bacteriophages can withstand very high internal pressure, and the genome is densely packed inside. Many ssRNA viruses (6), on the other hand, bind their genome directly to capsid proteins, and the genome may even be a prerequisite for capsid assembly.
In this work we look at the subclass of ssRNA viruses that bind their genome by long, highly basic peptide arms (Fig. 1). These arms extend from capsid proteins, and their role in genome-selective binding has been under extensive investigation. Yet most experimental studies have targeted the identification of specific binding domains. Here we develop an electrostatic model for genome binding and demonstrate that nonspecific electrostatic interactions dominate in these ssRNA viruses and control both the length and the conformations of the genome.
Electrostatic interactions are by their nature long-ranged. In the densely packed environment of ssRNA viruses, contributions arise from all charged constituents, which demands a self-consistent treatment. All of the charges inside the capsid, including charged amino acid residues, charged polynucleotides, and salt ions, generate an electrostatic field $\varphi$. This field is uniquely defined by Poisson's equation in terms of the charge distribution. At the same time, the field influences the arrangement of the charges within it. The spatial placement of the charged constituents of a virus and the magnitude of the field inside it are therefore mutually dependent. Here, we develop a self-consistent model that accounts for this dependence. We find that genomic nucleotides are concentrated in a spherical shell detached from the virus capsid, whereas the genome length is linear in the net charge of the capsid peptide arms.
The proposed mechanism differs from other models of genome packing. Dense packing of nucleotides, as in bacteriophages (3), would make the genome length proportional to the capsid volume. Alternatively, direct surface adsorption of the viral genome (6, 7) would make the genome length proportional to the capsid surface area or to the number of capsid proteins. Both trends fail for viruses with highly basic peptide arms, as one can deduce from the references listed in Table 1: the genome length of these viruses is not uniquely related to the capsid geometry. Yet there is a direct correlation between the genome length and the net charge on the capsid peptide arms, as demonstrated here.
Results
The interior of the single-stranded viruses considered here consists of several major components: highly basic peptide arms, negatively charged genomic polynucleotides, salt ions, and counterions. All of these components interact via a common self-consistent field. In this section, we show that the general self-consistent problem can be split into separate problems for the capsid peptide arms and for the genomic nucleotides. The nonlinear behavior of the self-consistent problem is then treated analytically. We begin with the capsid peptide arms.
Self-Consistent Model for the Capsid Peptide Arms.
In known viruses, capsid peptide arms are (semi)flexible and are typically spaced 2–4 nm apart. Each arm contains as many as 120 amino acid residues with a net positive charge of up to 30 (for references see Table 1). The high concentration of charge near the inner surface of the capsid places these flexible arms in the category of synthetic polyelectrolyte brushes, which in turn are known to be well described by continuum theories (48, 49). We take this analogy as the basis of our model.
In the Gaussian approximation, the free energy of a single peptide chain in the brush consists of a stretching energy, $(1/2a_p^2)\int_0^{N_p}(dz/dn)^2\,dn$, and the interaction energy with the mean-field electrostatic potential $\varphi(z)$. Here $z$ is the distance from the capsid inner surface (Fig. 1), $a_p$ is the Kuhn segment length, $n$ is the index of the segment along the chain, and $N_p$ is the chain length. We use a common approximation for charged brushes (48): the chains are strongly stretched, and all interactions are carried by the electrostatic field. The latter assumption explicitly ignores excluded-volume effects and requires that viral components are not closely packed; we justify it self-consistently in the Discussion. The free energy of a single peptide arm in the brush may thus be written as (50, 51)

$$\beta F[z(n)] = \int_0^{N_p}\left[\frac{1}{2a_p^2}\left(\frac{dz}{dn}\right)^2 + \beta q f_p\,\varphi\big(z(n)\big)\right]dn, \tag{1}$$

where $q$ is the electronic charge, $\beta = 1/k_BT$ is the inverse temperature, $k_B$ is the Boltzmann constant, and $f_p$ is the fraction of residues that are charged.
The equilibrium properties of the brush are then determined by the partition function $Z = \sum\exp(-\beta F[z(n)])$, with the summation carried over all possible chain configurations. The assumption of strong stretching implies that $Z$ and the other equilibrium properties of the system are dominated by the chain conformations with the lowest free energy. Hence, every peptide chain may be assumed to adopt its lowest-free-energy conformation.
Eq. 1 bears a striking resemblance to the action of a classical particle moving in the potential $-qf_p\varphi(z)$, with the segment index $n$ playing the role of time. The minimization problem is then equivalent to finding the classical path of a moving particle, and conservation of energy yields

$$\frac{1}{2a_p^2}\left(\frac{dz}{dn}\right)^2 = \beta q f_p\left[\varphi(z) - \varphi(z_0)\right]. \tag{2}$$

The constant of integration here is chosen to eliminate the pulling force on the open end of the chain, located at $z_0$: $(dz/dn)|_{z=z_0} = 0$.
We now enforce the monodispersity of the peptide chains to find the electrostatic potential $\varphi$. The total number of segments in a chain extending to distance $z_0$ is

$$N_p = \int_0^{z_0}\frac{dz}{dz/dn} = \int_0^{z_0}\frac{dz}{a_p\sqrt{2\beta q f_p\left[\varphi(z)-\varphi(z_0)\right]}}, \tag{3}$$

and it must be independent of the end point $z_0$. This condition, often referred to as the “equal travel time constraint” by analogy with a classical particle (50, 51), can only be satisfied by a parabolic potential,

$$\beta q\,\varphi(z) = \beta q\,\varphi(0) - \frac{\pi^2 z^2}{8 f_p a_p^2 N_p^2}, \tag{4}$$

as shown in more detail in refs. 50 and 51. Outside the polypeptide brush, the electrostatic field is screened, and the potential is nearly flat (Fig. 1).
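A quick numerical check of the equal travel time constraint may help here. The sketch below is illustrative, not the authors' calculation: it evaluates the travel time of Eq. 3 for a hypothetical trial potential $\beta q f_p\varphi(z) = \text{const} - Bz^p$. For $p = 2$ with curvature $B = \pi^2/(8a_p^2N_p^2)$, every chain end $z_0$ should give the same travel time $N_p$; a linear trial potential ($p = 1$, chosen only for contrast) gives a travel time that grows as $\sqrt{z_0}$. The function name `travel_time` and all parameter values are assumptions made for illustration.

```python
import math


def travel_time(z0, B, p, a_p=1.0, n_steps=20000):
    """Number of segments needed to go from the free end z0 to the surface z = 0.

    Assumes the hypothetical trial potential beta*q*f_p*phi(z) = const - B*z**p, so that
    energy conservation gives (dz/dn)**2 = 2 * a_p**2 * B * (z0**p - z**p).
    """
    # Substitute z = z0*(1 - s**2) to remove the integrable 1/sqrt singularity at z = z0.
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h  # midpoint rule: s = 0 is never evaluated
        z = z0 * (1.0 - s * s)
        speed = a_p * math.sqrt(2.0 * B * (z0**p - z**p))  # |dz/dn|
        total += (2.0 * z0 * s / speed) * h  # dn = |dz| / |dz/dn|, with |dz| = 2*z0*s*ds
    return total


N_p, a_p = 100.0, 1.0  # hypothetical chain length and Kuhn length
B_parabolic = math.pi**2 / (8.0 * a_p**2 * N_p**2)

for z0 in (0.5, 1.0, 3.0):
    # Parabolic potential: the travel time should equal N_p for every z0.
    print(z0, travel_time(z0, B_parabolic, p=2, a_p=a_p))

# Linear trial potential: the travel time scales as sqrt(z0), violating monodispersity.
print(travel_time(4.0, 1e-4, p=1) / travel_time(1.0, 1e-4, p=1))
```

The substitution keeps the integrand finite at the turning point, so a plain midpoint rule is sufficient; any other quadrature that handles the endpoint singularity would serve equally well.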
The parabolic potential (Eq. 4) is very robust. In the complex environment of the viral lumen, the electrostatic field is created by all charged constituents. However, the constraint imposed by the brush (Eq. 3) must still be satisfied, so the parabolic profile and the strength of the potential $\varphi$ are unaffected by the other charges in the system. This conserved shape of the potential will prove crucial for understanding the organization of the genome molecules discussed below. Only the thickness of the brush and the associated cutoff of the parabolic potential are influenced by the other charges (Fig. 1).
Conformation of Genomic Nucleotides.
We now turn to the conformations of the viral genome. The single-stranded genome of ssRNA viruses may be conveniently approximated by a flexible polyelectrolyte chain. Its statistics are then described by the Green function $G(\mathbf r, \mathbf r'; n)$, the partition function of a chain subsection that extends from position $\mathbf r$ to $\mathbf r'$ and comprises $n$ segments. It satisfies the second-order differential equation (52, 53)

$$\left[\frac{\partial}{\partial n} - \frac{a_n^2}{6}\nabla^2 - \beta q f_n\,\varphi(\mathbf r)\right]G(\mathbf r,\mathbf r';n) = \delta(\mathbf r-\mathbf r')\,\delta(n), \tag{5}$$

which is equivalent to the imaginary-time Schrödinger equation of quantum mechanics. Here $a_n$ and $f_n$ are the Kuhn segment length and the fraction of charged segments of the polynucleotide, respectively. Only approximate solutions of Eq. 5 are known, even for the simpler case of a single polyelectrolyte chain (54, 55). In general, the potential $\varphi(\mathbf r)$ accounts for intersegment interactions and depends nonlinearly on the Green function $G$, which makes an analytical solution of Eq. 5 impossible.
The situation changes inside the viral capsid. We have shown in the previous section that the monodispersity of the capsid peptide arms leads to a parabolic electrostatic potential $\varphi$ and breaks the nonlinear coupling between $\varphi$ and $G$. Genomic polynucleotides now interact with the a priori known parabolic potential of Fig. 1 and Eq. 4, and Eq. 5 can then be readily solved. Furthermore, the majority of the encapsidated genome may be expected to reside inside the potential well created by the capsid peptide arms (Fig. 1).
We now expand $G$ in terms of the eigenstates $\psi_\tau$ and eigenvalues $E_\tau$ of the time-independent Schrödinger equation, $G(\mathbf r,\mathbf r';n) = \sum_\tau\psi_\tau(\mathbf r)\psi_\tau(\mathbf r')\exp(-E_\tau n)$, with the eigenstates determined by (54)

$$\frac{a_n^2}{6}\left(\frac{d^2\psi_\tau}{dz^2} - \omega^2 z^2\psi_\tau\right) = -E_\tau\,\psi_\tau, \tag{6}$$

where $\omega^2 = 3\pi^2 f_n/(4 f_p a_n^2 a_p^2 N_p^2)$ and energies are measured from the bottom of the potential well. Once again, the coordinate $z$ measures the distance from the capsid inner surface toward the interior of the capsid. The full problem is posed in spherical geometry. However, the viral genome is largely localized in the vicinity of the capsid, where variations in radius are small. Curvature effects are therefore minimal and are neglected in Eq. 6.
The problem of viral genome packing has thus been mapped onto a quantum harmonic oscillator (56). The critical condition for genome binding is the existence of at least one bound state inside the potential well of Fig. 1, with the ground state generally dominating the Green function $G$ (53). With the boundary conditions $\psi_0(0) = \psi_0(\infty) = 0$, the ground state of the viral genome corresponds to the first excited state of a symmetric harmonic oscillator,

$$\psi_0(z) \propto z\exp\left(-\frac{\omega z^2}{2}\right), \tag{7}$$

with the corresponding energy $E_0 = \omega a_n^2/2$. Small perturbative corrections associated with the cutoff of the parabolic potential are not considered here (Fig. 1). Contributions from the excited states of the harmonic oscillator are negligible, since their relative contribution to the Green function is of order $\exp(-2\omega a_n^2\Lambda/3) \to 0$. Here $\Lambda$ is the total length of the genomic polynucleotide, which is much larger than the length $N_p$ of the capsid peptide arms, so that $\omega a_n^2\Lambda \approx \Lambda/N_p \gg 1$.
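The mapping onto a harmonic oscillator can be checked by diagonalizing the eigenvalue problem numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it discretizes $-(a_n^2/6)\psi'' + (a_n^2\omega^2/6)z^2\psi = E\psi$ on $0 < z < z_{\text{end}}$ with $\psi = 0$ at both ends, using second-order finite differences; the cutoff $z_{\text{end}} = 8\,\omega^{-1/2}$, the grid size, and the function name `half_oscillator_levels` are arbitrary choices. The lowest eigenvalue should approach $E_0 = \omega a_n^2/2$, and the gap to the next level should approach $2\omega a_n^2/3$, the rate in the exponent quoted above.

```python
import numpy as np


def half_oscillator_levels(omega, a_n=1.0, n_points=1000, n_levels=3):
    """Lowest eigenpairs of (a_n^2/6) (psi'' - omega^2 z^2 psi) = -E psi with psi(0) = psi(z_end) = 0."""
    z_end = 8.0 / np.sqrt(omega)  # about 8 z_max; the bound states are negligible beyond this point
    h = z_end / (n_points + 1)
    z = h * np.arange(1, n_points + 1)  # interior grid points only (Dirichlet boundaries)
    c = a_n**2 / 6.0
    # Central differences for -c * d^2/dz^2 plus the parabolic well c * omega^2 * z^2.
    main = 2.0 * c / h**2 + c * omega**2 * z**2
    off = np.full(n_points - 1, -c / h**2)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    energies, states = np.linalg.eigh(hamiltonian)  # eigenvalues in ascending order
    return z, energies[:n_levels], states[:, :n_levels]


omega, a_n = 1.0, 1.0  # hypothetical units: lengths in units of a_n, omega in units of a_n**-2
z, E, psi = half_oscillator_levels(omega, a_n)
print("E_0:", E[0], "expected", omega * a_n**2 / 2)
print("E_1 - E_0:", E[1] - E[0], "expected", 2 * omega * a_n**2 / 3)
print("peak of psi_0^2 at z =", z[np.argmax(psi[:, 0] ** 2)], "expected", omega**-0.5)
```

A dense `eigh` call is used only for brevity; for much finer grids a tridiagonal eigensolver would be the more economical choice.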
We therefore predict that the nucleotide density should vary as

$$\rho_n(z) \propto \psi_0^2(z) \propto z^2\exp\left(-\frac{z^2}{z_{\max}^2}\right), \tag{8}$$

with $z_{\max} = \omega^{-1/2}$. Significantly, the nucleotide density is not monotonic and has a pronounced maximum at a distance $z_{\max}$ from the capsid inner surface (Fig. 1). Genomic nucleotides are concentrated in a spherical shell that is separated from the virus capsid by a gap of low nucleotide density. The characteristic width of this shell, defined by the points where the nucleotide density is half its maximum value, is $\Delta z \approx 1.16\,z_{\max}$. The nucleotide density vanishes in the vicinity of the capsid, with a gap of $\approx 0.48\,z_{\max}$ if measured at half density, or $0.28\,z_{\max}$ at a visually insignificant 20% density. Known WT viruses correspond to $z_{\max}$ = 1–4 nm.
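The numbers quoted for the shell width and the gap follow from the density profile alone. A short sketch, assuming the normalized profile $\rho_n(z)/\rho_n(z_{\max}) = x^2\exp(1 - x^2)$ with $x = z/z_{\max}$ and using plain bisection (our choice of root finder, not necessarily the authors'), locates the points where the density falls to a given fraction of its maximum:

```python
import math


def profile(x):
    """Nucleotide density z^2 exp(-z^2/z_max^2), normalized to 1 at its maximum; x = z / z_max."""
    return x * x * math.exp(1.0 - x * x)


def crossing(level, lo, hi, iters=200):
    """Bisection for profile(x) = level on [lo, hi]; assumes a single sign change there."""
    f_lo = profile(lo) - level
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = profile(mid) - level
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The profile rises on (0, 1) and falls on (1, inf), so each level < 1 has one root on each side.
inner_half = crossing(0.5, 0.0, 1.0)
outer_half = crossing(0.5, 1.0, 5.0)
inner_20 = crossing(0.2, 0.0, 1.0)

print("half-maximum width / z_max:", outer_half - inner_half)  # compare with the quoted ~1.16
print("gap at half density / z_max:", inner_half)  # compare with the quoted ~0.48
print("gap at 20% density / z_max:", inner_20)  # compare with the quoted ~0.28
```

Because the profile is expressed in units of $z_{\max}$, these ratios are independent of $\omega$ and hence of the brush parameters.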
Total Genome Length.
We may now estimate the total length of the viral genome. The criterion for capture of the genome by the peptide brush is related to the depth of the potential well (Fig. 1): at least one bound state must exist, or $6E_0/a_n^2 < \omega^2H^2$. With $E_0 = \omega a_n^2/2$, this sets the minimum width of the potential well, that is, the minimum brush thickness $H$:

$$H > H_{\min} = \left(\frac{3}{\omega}\right)^{1/2} = \sqrt{3}\,z_{\max}. \tag{9}$$

Intuitively, the brush thickness should decrease with the capture of the oppositely charged genome, and adsorption of a longer genome is favorable as long as $H > H_{\min}$. Naturally occurring viruses may then be expected to have $H = H_{\min}$, corresponding to the energetically most favorable amount of genome.
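For orientation, the length scales of the well can be computed directly from the model parameters. The helper below is a hypothetical convenience function (not from the paper) that simply chains the expressions in the text: $\omega^2 = 3\pi^2f_n/(4f_pa_n^2a_p^2N_p^2)$, $z_{\max} = \omega^{-1/2}$, and the bound-state condition $6E_0/a_n^2 < \omega^2H^2$ with $E_0 = \omega a_n^2/2$, which gives $H_{\min} = (3/\omega)^{1/2}$. The input values in the usage lines are placeholders, not data for any particular virus.

```python
import math


def well_length_scales(f_n, f_p, a_n, a_p, N_p):
    """Return (omega, z_max, H_min) for the parabolic well created by the peptide brush.

    Lengths a_n and a_p must share one unit; omega then has units of length**-2.
    """
    omega = math.sqrt(3.0 * math.pi**2 * f_n / (4.0 * f_p * a_n**2 * a_p**2 * N_p**2))
    z_max = omega**-0.5  # peak of the nucleotide density
    E_0 = omega * a_n**2 / 2.0  # ground-state energy of the half oscillator
    # A bound state exists if 6*E_0/a_n**2 < omega**2 * H**2, i.e. H > sqrt(6*E_0/a_n**2) / omega.
    H_min = math.sqrt(6.0 * E_0 / a_n**2) / omega
    return omega, z_max, H_min


# Placeholder inputs (lengths in nm); replace with values for a specific virus.
omega, z_max, H_min = well_length_scales(f_n=0.37, f_p=0.2, a_n=0.6, a_p=0.4, N_p=40)
print(f"z_max = {z_max:.2f} nm, H_min = {H_min:.2f} nm, ratio = {H_min / z_max:.4f}")
```

Whatever the inputs, the ratio $H_{\min}/z_{\max}$ stays at $\sqrt{3}$, while $\omega$ scales as $1/N_p$, so longer peptide arms widen the well.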
We now relate the thickness $H$ to the net charge on the capsid peptide arms, $qf_p\Sigma$, and the net charge on the genomic polynucleotides, $-qf_n\Lambda$. Here $\Lambda$ is the virus genome length per capsid, and $\Sigma$ is the total number of residues in the peptide arms. Integration of the Poisson–Boltzmann equation gives (48): where $\rho_s$ is the salt concentration outside the peptide brush, $\varepsilon\varepsilon_0$ is the dielectric permittivity of the medium, and the last term describes the net charge of the displaced salt ions and counterions. For viruses, this expression evaluates† to $\Lambda \approx (f_p/f_n)\Sigma$. The viral genome length is therefore related to the charge on the capsid peptide arms. Still, the universality of this relation is limited by the varying amino acid content of the peptide arms. A more universal parameter is the net charge on the peptide arms. We therefore predict that the total genome length $\Lambda$ of a virus should be proportional to the net charge $Q$ on its capsid peptide arms,

$$\Lambda = \eta Q. \tag{11}$$

The proportionality coefficient $\eta \approx (f_p\Sigma/Q)/f_n$ is expected to be conserved across different viruses. However, its value may differ from unity: ion condensation may prevent the dissociation of ions from some nucleotides and amino acid residues, leaving the fractional charges $f_n$ and $f_p$ below their maximum values of 1 and $Q/\Sigma$, respectively. We address this issue in the Discussion.
At first sight, Eq. 11 states that the dissociated charge on the genomic nucleotides should compensate the opposite charge on the capsid peptide arms. We stress that this is not a trivial consequence of charge neutrality, as salt ions and counterions always keep the system neutral. Rather, Eq. 11 describes the limit on polyelectrolyte adsorption by the oppositely charged brush. The predicted trend is linear only for the charge densities associated with the observed viruses.†
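The footnote marked † gives the Poisson–Boltzmann charge balance in closed form, $f_p\Sigma - f_n\Lambda = (S/2\pi l_BH_0)\{h + \pi^{3/2}l_BH_0^2\rho_s[e^{h^2}\mathrm{erf}(h) - e^{-h^2}\mathrm{erfi}(h)]\}$, with $h = H/H_0$. A minimal sketch for evaluating it is below. It uses `math.erf` from the Python standard library and a truncated Maclaurin series for $\mathrm{erfi}$ (adequate for moderate $h$, roughly $h \lesssim 5$); the function names and all numerical inputs are hypothetical. Units must be consistent: $S$ an area, $l_B$ and $H_0$ lengths, $\rho_s$ a number density.

```python
import math


def erfi(x, terms=80):
    """Imaginary error function from its Maclaurin series, 2/sqrt(pi) * sum x^(2k+1) / (k! (2k+1))."""
    total, power, factorial = 0.0, x, 1.0
    for k in range(terms):
        total += power / (factorial * (2 * k + 1))
        power *= x * x
        factorial *= k + 1
    return 2.0 / math.sqrt(math.pi) * total


def charge_mismatch(S, l_B, H0, rho_s, H):
    """Right-hand side of the footnote expression for f_p*Sigma - f_n*Lambda."""
    h = H / H0
    salt = math.pi**1.5 * l_B * H0**2 * rho_s * (
        math.exp(h * h) * math.erf(h) - math.exp(-h * h) * erfi(h)
    )
    return S / (2.0 * math.pi * l_B * H0) * (h + salt)


# Hypothetical inputs in nm-based units: S in nm^2, lengths in nm, rho_s in nm^-3.
print(charge_mismatch(S=500.0, l_B=0.7, H0=3.0, rho_s=0.06, H=2.0))
```

Without salt ($\rho_s = 0$) the expression reduces to $Sh/(2\pi l_BH_0)$, which is a convenient sanity check for any implementation.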
Discussion
We first compare our predictions with several published density profiles. According to Eq. 8, the nucleotide density is expected to be nonuniform, with maximum density at a distance $z_{\max}$ from the virus capsid. To verify this prediction, we use published cryo-electron microscopy measurements on the hepatitis B virus (27) and on a mutant of flock house virus (21), as shown in Fig. 2. The radial density for the hepatitis B virus was obtained by azimuthally averaging the data in figure 4i of ref. 27. These viruses were chosen for their basic residue content, with a nearly uniform distribution of positive charges in the peptide arms. Both particles encapsidate non-native cellular ssRNA molecules. We observe that the genome of these viruses is indeed concentrated in a single spherical shell near the capsid, and the thickness and location of this shell agree with our predictions. The nucleotide shell is indeed detached from the protein capsid. We note that the gap between the nucleotide density and the virus capsid predicted by Eq. 8 is driven solely by the electrostatic potential inside the brush and by the fluctuating conformations of the genomic nucleotides; it is not related to the amino acid sequence of the capsid peptide arms.
In these virus particles, the average volume per nucleotide is ≈1,700 Å³, much larger than the 655-Å³ volume of hydrated RNA. Viral nucleotides are therefore loosely packed. Still, viruses with very long genomes may run into packing constraints, which broadens the density peak (13). Also, some WT viruses, including WT flock house virus, have their basic residues organized into two separate domains. Both domains are positively charged, resulting in two concentric shells of nucleotide density (21, 35).
We now return to the main question of the encapsidated genome size. According to Eq. 11, electrostatic interactions tie the genome size to the charge on the peptide arms, $\Lambda = \eta Q$. To verify this prediction, we ran a quantitative comparison with various known WT viruses, as shown in Table 1 and Fig. 3. Representative particles were selected from sufficiently different virus families, infecting both animals and plants. We find that the ratio of the genome length to the charge on the capsid peptide arms is indeed conserved, with $\eta = 1.61 \pm 0.03$. As is evident from Fig. 3, this ratio is universal for all virus families considered.
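Table 1 and Fig. 3 are not reproduced in this text, so the sketch below uses made-up placeholder numbers. It shows one standard way, assumed here rather than taken from the paper, to estimate a proportionality coefficient such as $\eta$ together with its standard error: an ordinary least-squares fit of $\Lambda = \eta Q$ constrained through the origin. The function name `fit_through_origin` is hypothetical.

```python
import math


def fit_through_origin(Q, L):
    """Least-squares slope eta of L = eta * Q with zero intercept, and its standard error."""
    if len(Q) != len(L) or len(Q) < 2:
        raise ValueError("need at least two paired (Q, Lambda) values")
    sum_qq = sum(q * q for q in Q)
    eta = sum(q * l for q, l in zip(Q, L)) / sum_qq
    residual = sum((l - eta * q) ** 2 for q, l in zip(Q, L))
    std_err = math.sqrt(residual / (len(Q) - 1) / sum_qq)  # one fitted parameter
    return eta, std_err


# Placeholder data only: net arm charge Q per capsid and genome length Lambda in nucleotides.
Q_demo = [1000.0, 2000.0, 4000.0]
L_demo = [1650.0, 3150.0, 6500.0]
eta, err = fit_through_origin(Q_demo, L_demo)
print(f"eta = {eta:.3f} +/- {err:.3f}")
```

Forcing the intercept to zero encodes the prediction of Eq. 11 directly; a free intercept could be added as a consistency check.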
This result is in good agreement with a qualitative estimate of $\eta$ based on simplified models of ion condensation. In the basic Manning condensation theory (58), counterion dissociation is favorable when the distance between adjacent charged segments along the chain backbone is above the Bjerrum length, roughly 7 Å in pure water. This translates into every second nucleotide being charged, or $f_n \approx 0.5$. At the same time, up to 35% of the peptide residues may carry charge, subject to the actual amino acid sequence. A higher basic content in the peptide arms would not increase their effective charge and is not observed in WT viruses. The hepatitis B virus is the only exception known to us, with ≈60% basic residues and relatively short peptide arms (see Table 1 and references therein). Setting $f_p \approx Q/\Sigma$, we get $\eta \approx 2$. More precise numerical estimates of ion condensation that account for salt ions and chain flexibility (59, 60) give $f_p \approx 0.67\,Q/\Sigma$, $f_n \approx 0.37$, and $\eta \approx 1.8$, close to the best-fit value $\eta = 1.61$ of Fig. 3.
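The two condensation estimates above are simple ratios, $\eta \approx (f_p\Sigma/Q)/f_n$. The sketch below reproduces that arithmetic and applies $\Lambda = \eta Q$; the function names are hypothetical, and the default $\eta = 1.61$ is the fitted value from the text.

```python
def eta_from_condensation(arm_ratio, f_n):
    """eta ~ (f_p * Sigma / Q) / f_n, where arm_ratio = f_p / (Q / Sigma)."""
    return arm_ratio / f_n


def predicted_genome_length(Q, eta=1.61):
    """Genome length per capsid, Lambda = eta * Q, for a net arm charge Q (Eq. 11)."""
    return eta * Q


# Manning-type estimate: all arm charges dissociated (f_p = Q/Sigma), every second nucleotide charged.
eta_manning = eta_from_condensation(1.0, 0.5)
# Refined estimate quoted in the text: f_p ~ 0.67 Q/Sigma and f_n ~ 0.37.
eta_refined = eta_from_condensation(0.67, 0.37)

print(f"Manning estimate: {eta_manning:.2f}, refined estimate: {eta_refined:.2f}")
print(predicted_genome_length(1000))  # hypothetical capsid with net arm charge Q = 1000
```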
The predicted genome size is strictly followed by WT viruses. Small deviations in either direction would decrease the capsid cohesive energy and make the virus less stable. WT viruses, which have gone through long periods of mutation, are therefore constrained to this ratio. Deviations in genome length are more likely in mutants. Unfortunately, currently published data on mutants are still too rare and incomplete for a conclusive quantitative comparison. Mutants often encapsidate several short pieces of RNA, and the total amount of RNA per capsid is often unknown. Several of the published results are listed in Table 1.
In conclusion, we have shown that the length of the viral genome is unambiguously determined by nonspecific electrostatic interactions. Mutations and evolution seem to have favored the energetically most stable configurations in WT viruses. Our theory explains the observed insensitivity of viruses to sequence variations in the peptide arms (17) and may also help in the design of new virus-like particles. Still, specific interactions should not be completely discarded. During the onset of assembly, charged residues are sparse, and specific interactions may increase the rate of assembly and the yield of particles containing the native genome. Both native and non-native genomes should follow the trend of Eq. 11 and Fig. 3, in agreement with observations (20).
Materials and Methods
Amino acid sequences of capsid proteins were obtained from protein databanks and verified through the references listed in Table 1. The only exception is Sesbania mosaic virus, whose sequence was derived from its genome (39) because of incomplete data from protein sequencing. The total uncertainty in the charge of capsid peptide arms is estimated at plus or minus one residue, mainly attributed to sequence variations between virus species and the uncertainty in distinguishing flexible peptide arms from the bulk of capsid protein.
The number of capsid proteins in viruses is often known from their crystal structures (1). The number of capsid proteins of alfalfa mosaic virus, whose crystal structure is not known, was obtained by comparing the surface area of the crystallizable T = 1 mutant of the virus (1) with the area derived from electron micrographs of the virus (8). The number of capsid proteins in luteoviruses was estimated from the mass ratios reported in refs. 14, 24, and 25.
Acknowledgments
We thank R. Nossal, E. DiMarzio, C. Woodcock, K. Belaya, and C. Forrey for valuable comments and discussions. This work was supported by National Science Foundation Grant DMR-0605833 and National Institutes of Health Grant 1R01HG002776-01.
Footnotes
 *To whom correspondence should be addressed. Email: muthu{at}polysci.umass.edu

Author contributions: V.A.B. and M.M. performed research and wrote the paper.

The authors declare no conflict of interest.

† Substitution of Eq. 4 into Eq. 10 gives $f_p\Sigma - f_n\Lambda = \frac{S}{2\pi l_B H_0}\left\{h + \pi^{3/2} l_B H_0^2\rho_s\left[e^{h^2}\operatorname{erf}(h) - e^{-h^2}\operatorname{erfi}(h)\right]\right\}$, where $S$ is the capsid inner surface area, $l_B = q^2\beta/4\pi\varepsilon\varepsilon_0$ is the Bjerrum length, $h = H/H_0$ is the reduced brush thickness, and $H_0^2 = 8f_pa_p^2N_p^2/\pi^2$ defines the characteristic length of the extended peptide arms. Using the values associated with the WT and mutant viruses (Table 1) and setting $H = H_{\min}$, we find that $f_p\Sigma - f_n\Lambda \sim 10^1$, much smaller than either $f_p\Sigma$ or $f_n\Lambda$.
 © 2006 by The National Academy of Sciences of the USA
References
1. Reddy VS, Natarajan P, Okerberg B, Li K, Damodaran KV, Morton RT, Brooks CL, Johnson JE.
5. Evilevitch A, Lavelle L, Knobler CM, Raspaud E, Gelbart WM.
6. Fields BN, Knipe DM.
13. Zhang W, Fisher BR, Olson NH, Strauss JH, Kuhn RJ, Baker TS.
15. Krol MA, Olson NH, Tate J, Johnson JE, Baker TS, Ahlquist P.
19. Smith TJ, Chase E, Schmidt T, Perry KL.
20. Schneemann A, Marshall D.
21. Tihova M, Dryden KA, Le TL, Harvey SC, Johnson JE, Yeager M, Schneemann A.
22. Dong XF, Natarajan P, Tihova M, Johnson JE, Schneemann A.
24. Rajeshwari R, Murant AF.
25. Scott KP, Farmer MJ, Robinson DJ, Torrance L, Murant AF.
27. Zlotnik A, Cheng N, Conway JF, Steven AC, Wingfield PT.
31. Zlotnik A, Seres P, Singh S, Johnson JM.
34. Qu C, Liljas L, Opalka N, Brugidou C, Yeager M, Beachy RN, Fauquet CM, Johnson JE, Lin T.
37. Forsell K, Suomalainen M, Garoff H.
38. Rice CM, Strauss JH.
44. Owen KE, Kuhn RJ.
45. Frolov I, Frolova E, Schlesinger S.
46. Tellinghuisen TL, Hamburger AE, Fisher BR, Ostendorp R, Kuhn RJ.
53. de Gennes PG.
56. Landau LD, Lifshitz EM.