The weighted-volume derivative of a space-filling diagram

  1. Herbert Edelsbrunner*, and
  2. Patrice Koehl,§
  1. *Department of Computer Science, Duke University, Durham, NC 27708-0129; Raindrop Geomagic, Research Triangle Park, NC 27709; and Department of Structural Biology, Stanford University, Stanford, CA 94305-5126
  1. Communicated by Michael Levitt, Stanford University School of Medicine, Stanford, CA (received for review December 12, 2002)

Abstract

Computing the volume occupied by individual atoms in macromolecular structures has been the subject of research for several decades. This interest has grown in the recent years, because weighted volumes are widely used in implicit solvent models. Applications of the latter in molecular mechanics simulations require that the derivatives of these weighted volumes be known. In this article, we give a formula for the volume derivative of a molecule modeled as a space-filling diagram made up of balls in motion. The formula is given in terms of the weights, radii, and distances between the centers as well as the sizes of the facets of the power diagram restricted to the space-filling diagram. Special attention is given to the detection and treatment of singularities as well as discontinuities of the derivative.

It is widely believed that characterizing the geometry of a protein molecule is essential for understanding its folding process as well as its interactions with other biomolecules and small ligands. Among all geometric measures, volume is probably the most fundamental property to study. The relevance of calculations of molecular volume is seen in the correlation of volume, or change in volume, with experimental factors such as energetics (1) as well as with physical chemical properties of the molecule such as charge, temperature, and pressure (2, 3). There is a large amount of current research that focuses on the calculations of standard atomic volumes (46). These atomic volumes have been considered as parameters that can be used to measure the quality of a three-dimensional protein structure (7) as well as probes to quantify structural variations in proteins that occur during evolution (8). Atomic volumes have also been used in packing (911) and solvation-energy calculations (12, 13). The latter energy function is central to computational biology, because solvation is an important factor determining the structure and thermodynamics properties of molecules in aqueous solution. All solvent effects on a molecule can be included in an effective solvation potential, W = W elec + W np, in which the first term accounts for the molecule–solvent electrostatics polarization, and the second term accounts for the molecule–solvent van der Waals interactions and for the formation of a cavity in the solvent. W elec is usually described by a continuum model such as the generalized Born model (13, 14). Interestingly, an essential element of such a model is the knowledge of the volume associated with the individual atoms of the solute. Many solvation models describe the nonpolar part of the solvation energy W np as a weighted sum of the solvent-exposed or -accessible surface area of each atom of the solute (15, 16). There is evidence, however, that for small solute the hydrophobic term W np is not proportional to the surface area (17) but rather to the excluded volume of the molecule (18). A volume-dependent solvation term was originally introduced by Gibson and Scheraga (19) as the hydration shell model. Biomolecular simulations with implicit solvent that rely on iterative energy calculations such as energy minimization and molecular dynamics require efficient, exact algorithms for the evaluation of the solvation-energy function and its derivatives. In this article, we provide a complete geometric description of the derivatives of the weighted volume of a molecule when its atoms are in motion.

Lee and Richards (20) define the accessible surface of a protein as the van der Waals surface of the molecule expanded by the radius of the solvent sphere about each atom center. The solvent-excluded volume is the volume enclosed by the accessible surface. Computational methods that evaluate the solvent-excluded volume can be divided into approximate and exact methods. Most of the approximate methods rely on numerical integration (2123), whereas most analytical methods (9, 12, 2427) point out that overlap volumes of four or more atoms do not occur in real molecules. An alternative approach for computing the volume and studying the packing of balls was originally developed by Voronoi (28) and later was applied to protein by Richards (9). It has been used successfully for the calculation of protein constituents, the description of protein motions, and the analysis of cavities in proteins (2, 8, 29, 30). A drawback of this approach is that it requires information on the first shell of hydration so that polyhedra may be defined for atoms at the surface of the molecule. It was also developed for equal-size balls and its applications to proteins with atoms that have different radii required approximations (30). Pavani and Ranghino (22) proposed a method for computing the volume of a molecule by using the inclusion–exclusion principle. In their implementation, only intersections of up to three balls were considered. This idea was generalized to any levels of intersections by Gibson and Scheraga (26), applying a theorem from mathematics that higher-order unions and intersections can always be reduced to lower-order unions and intersections (31). Doing the reduction correctly, however, remains computationally difficult and expensive. The alpha-shape theory solves this problem exactly by using Delaunay triangulations and their filtrations (32). Alpha shapes have been used to compute the surface area and volume of proteins as well as for detecting and measuring cavities in proteins (33). Fig. 1 provides a simple overview of the application of the alpha-shape theory to compute the volume of a protein.

Figure 1

Computing geometric properties of a molecule. As is common in biology, atoms are treated as intersecting balls, the union of which forms the space-filling diagram. Gerstein et al. (8) and Richards (9) proposed to study this union by using the Voronoi diagram (or more exactly the power diagram since the balls have different radii). That diagram divides the space into convex cells, one per atom. In the two-dimensional example shown here, the edges that separate these cells are shown as solid inside and dotted outside the union of disks. Note that the power cells of some surface atoms extend to infinity. The superimposed Delaunay triangulation (thick solid and dashed lines) is the dual of the power diagram obtained by drawing a line segment between two ball centers if their convex cells share a common edge. Despite their different appearance, the Delaunay triangulation and the power diagram contain exactly the same information. The key to connecting the power diagram to the molecule is to consider the intersection of the two; this defines a convex decomposition of the space-filling diagram (i.e. the light shaded area inside the disks, divided into convex regions by the solid part of the Voronoi edges). The dual of this decomposition is the dual complex (thick solid lines and shaded triangles). The dual complex is a subset of the Delaunay triangulation and contains all simplices (tetrahedra, triangles, and edges) that correspond to overlapping atoms. As demonstrated in ref. 32, the dual complex contains all information about a molecule required to compute its surface and volume. In this article we show that the same dual complex can be used to compute the derivative of the volume.


The distinction between approximate and exact computation also applies to existing methods for computing the derivatives of the solvent-excluded volume with respect to atomic coordinates (3436). All these methods have to take a large number of singularities into account, where approximations are usually required (36). This problem of singularities is even more acute for surface-area calculation (37, 38). The alpha-shape theory proposes a robust solution to this problem by implementing arbitrary precision arithmetic to avoid numerical problems and systematically resolving all singularities without explicitly perturbing the positions of the ball centers. The latter method is referred to as Simulation of Simplicity (39). In this article we describe an extension of the alpha-shape method that implements the weighted volume derivative theorem described below to provide an efficient, robust, exact computation of the derivatives of volumes. There is an inherent difficulty in using a potential based on volume for energy minimization or molecular dynamics. Although the volume is continuous in the position of the atoms, its derivatives may not be. We examine this issue within the framework of the alpha-shape method and relate discontinuities to combinatorial changes in the subcomplex of the Delaunay triangulation that is dual to the space-filling diagram of the molecule.

Background and Results introduces the background and states the main result of this paper. Derivation uses geometric arguments to prove the formula for the weighted-volume derivative. Discontinuities and Implementation discuss the continuity of the derivatives and the implementation of the theorem, and Discussion concludes the article.

Background and Results

As common in biology, we model an atom as a ball and a molecule as the union of a finite collection of such balls. This union is referred to as the space-filling diagram of the molecule. In this section we explain how we approach the problem of computing the derivatives of the weighted volume of a three-dimensional space-filling diagram.

Geometric Structures.

Let B i = (z i, ϱi) be the ball with center z i and radius ϱi for 0 ≤ i < n and call F = ∪i B i a space-filling diagram. The corresponding state is the point z ∈ ℝ3n that lists the 3n center coordinates in sequence. The weighted volume of F is a map W:ℝ3n → ℝ, and its derivative at z is a linear map DW z:ℝ3n → ℝ defined by DW z(t) = 〈w, t〉, where w ∈ ℝ3n is the gradient of W at z. Specifying the derivative is equivalent to giving the gradient.

Denote the sphere bounding B i by S i. The power distance of a point x ∈ ℝ3 from S i is πi(x) = ∥xz i2 − ϱFormula, and the power cell of S i is the set of points P i for which S i minimizes the power distance. We assume or simulate the generic case, in which every power cell is a three-dimensional convex polyhedron, and every facet, edge, and vertex belongs to exactly two, three, and four power cells. The power diagram consists of all power cells and their facets, edges, and vertices. As illustrated in Fig. 1, the power diagram decomposes the space-filling diagram into convex cells of the form P iF, and these cells share facets, edges, and vertices, which are the intersections of F with the facets, edges, and vertices of the power diagram. The dual complex K of that decomposition consists of all vertices, edges, triangles, and tetrahedra that are convex hulls of sphere centers with cells that intersect in nonempty cells, facets, edges, and vertices.

Sizes and Distances.

We use fractions to express the sizes of geometric entities in the decomposition of the space-filling diagram. For example, βi = vol(P iF)/vol(B i) is the fraction of the ith ball that belongs to its power cell. The volume of F = ∪i B i is therefore (4π/3) ∑i βiϱFormula, and given real weights αi, the weighted volume is W = (4π/3) ∑i αiβiϱFormula. The formula for the derivative of W needs fractions of disks, which we now discuss. Two spheres intersect in a circle S ij = S iS j, which bounds a disk B ij. The fraction of the disk that belongs to the power diagram is Formula As explained in ref. 32, it is straightforward to compute these fractions from the dual complex K of F by using the principle of inclusion–exclusion. More details on how to perform this calculation are provided in Appendix. We will also need the area of the facet P iP jF, which we compute as βij A ij. An expression for A ij = area(B ij) in terms of radii and distances between centers of spheres will be given in Derivation (see Eq. 7).

The derivative of W also requires the distance between the line L ij passing through the centers z i and z j and the line L ijk, the points of which have equal power distance from S i, S j, and S k. This distance δij,k is equal to the distance between the center z ij of S ij and the point y ijk on L ijk closest to z ij. We note that y ijk is the unique point in the plane of z i, z j, and z k that has equal power distance to all three spheres.

Averages.

The derivative of W also refers to the weighted-average vector from the center to the boundary of a facet P iP jF. The weight is the infinitesimal contribution to the area of the facet as we rotate the vector. That weight is constant over each edge and each arc in the boundary of the facet. We therefore compute the weighted-average vector as the weighted sum of the average vectors of the edges and arcs, with the weights being the areas of the subtended sectors. The average vector from the center to an edge xy = P iP jP kF is the vector to the midpoint, V xy = (x + y)/2 − z ij. Consider next an arc ab on the circle S ij, and let ϕ = ϕab be half the angle, as in Fig. 2. Assuming the center is the origin, the radius is 1, and the arc midpoint lies on the abscissa, we get the average by integration Formula To give a formula without the normalizing assumptions, we use the midpoint M ab of the arc ab. The average vector is V ab = v ab(M abz ij).

Figure 2

The average of the solid arc ab lies on the line connecting the center z ij and the midpoint M ab of the arc.


In general, the facet P iP jF has a boundary that consists of several edges and arcs. The weighted-average vector from the center to that boundary is the weighted sum of the individual averages, Formula where h xy = δij,k is the distance from z ij to the line spanned by xy, and ϱij is the radius of S ij. The first sums range over all edges xy, and the second sums range over all arcs ab of the facet. Note that the denominator is twice the area of the facet. We provide a method to compute the weighted-average vector using inclusion–exclusion in Appendix.

Main Result.

The main result of this article is a complete description of the weighted-volume derivative. We need some notation. The weight of the ith ball is αi ∈ ℝ, which may be positive or negative. We write ζij = ∥z iz j∥ for the distance between two centers and U ij = (z iz j)/ζij for the unit vector between the same. Recall also the definitions of βij, A ij, and V ij given above.

Weighted-Volume Derivative Theorem.

The derivative of the weighted volume of the space-filling diagram with state z is DW z(t) = 〈w, t〉, where Formula for 0 ≤ i < n. The sum is over all edges z i z j in K.

We illustrate the formula with a small example consisting of three spheres that intersect in two points, the projection of which along the shared edge of the power diagram is shown in Fig. 3. In the unweighted case we have αi = 1, w ij = 1, and x ij = 0 for all i and j. The derivative thus simplifies to a sum of vectors between the centers: ∑ βij A ij U ij. As shown by Csikós (40), this formula generalizes to dimensions different from three. We note that DW is not everywhere continuous. Specifically, it may not be continuous when two or more spheres coincide or three or more spheres intersect in a common circle (see Discontinuities).

Figure 3

Three equal-sized balls with different weights of 1.0, 2.0, and 3.0. For each pair we get components of the gradient normal to (dotted) and parallel to (dashed) the separating facet. The solid vectors are the sums of the components as well as the portions of the gradient that correspond to the different balls.


Derivation

In this section we derive the formula claimed in the Weighted-Volume Derivative Theorem. It is the sum of contributions from locally direction-preserving and locally distance-preserving components of the motion.

Direction-Preserving Motion.

A direction is determined by two spheres and is preserved if one sphere stays put and the other moves along that line. To study the volume derivative for such a motion, we restrict out attention to two spheres S i and S j with nonempty intersection S ij and define F = B iB j. Let ζi and ζj be the signed distances of the centers from the plane of S ij, as illustrated in Fig. 4. The sum of the two distances is ζij = ζi + ζj = ∥z iz j∥. We have ϱFormula − ϱFormula = ζFormula − ζFormula = ζiji − ζj), and therefore Formula and Formula It is geometrically obvious that the weighted-volume derivative is the area of the disk bounded by S ij times the sum of the weighted derivatives of ζi and ζj. Because the square radius of the disk is ϱFormula = ϱFormula − ζFormula, the area of the disk is Formula where the product ranges over the four different ways of assigning signs. The derivatives of ζi and ζj with respect to ζij can be computed from Eqs. 5 and 6, and the weighted-volume derivative is Formula Formula Alternatively, we could work analytically, compute the weighted-volume contributions of the two balls by integration, and get dW/dζij by differentiation. The result is the same.

Figure 4

The two spheres intersect in a circle with center z ij and radius ϱij.


Distance-Preserving Motion.

The distance between the centers of two spheres S i and S j is preserved if one rotates about the other. We consider what happens to the facet P iP jF that separates the two corresponding cells. As illustrated in Fig. 5, that facet is the intersection of a disk and a convex polygon. The boundary consists of edges and circular arcs. Let T i normal to U ij be the motion vector of S i, and let R i,j be the line normal to both vectors that passes through the center z ij of the disk. The infinitesimal motion of the facet consists of a rotation about R i,j composed with a translation along T i, but only the rotation has a nonzero contribution to the volume derivative. On one side of the line we trade volume of B i for that of B j, and on the other side we do it the other way round. The net gain or loss is the accumulation of the contributions of the sectors over the edges and arcs in the boundary of the facet. Here we count a sector over an edge negative if the line defined by the edge separates z ij and the facet.

Figure 5

Infinitesimal motion of a facet between two balls B i and B j. The two bounding spheres intersect in the dotted circle S ij with center z ij, perpendicular to the edge z i z j. The (shaded) facet P iP jF between the corresponding two power cells is the intersection of the disk bounded by S ij with the convex polygon common to the two cells. The solid boundary of the facet consists of edges and circular arcs. The only relevant component of the infinitesimal motion of the facet is a rotation around the line R i,j perpendicular to the edge z i z j and the motion vector T i. On one side of this line (dark shaded area), we trade volume of B i for that of B j, and on the other side (light shaded area) we do it the other way around.


We first consider an edge xy defined by S i, S j, and a third sphere S k. We compute the derivative of the signed volume swept out by the triangle z ij xy with respect to the rotation angle θ, Formula where h xy = δij,k is the distance from z ij to the line passing through x and y, and V xy = (x + y)/2 − z ij is the vector to the midpoint. Note that the result is two thirds the area of the triangle times the scalar product divided by the distance between the two ball centers.

Second, we consider a circular arc ab, which is part of the circle S ij. We compute the derivative of the volume swept out by the disk sector z ij ab with respect to θ, Formula where 2ϕab is the angle of the arc, and V ab is the average vector from the center to the arc. Again, the result is two thirds the area of the sector times the scalar product divided by the distance between the two ball centers. As explained in Background and Results, V ab = v ab(M abz ij), where M ab is the midpoint of the arc and v ab = (sin ϕab)/ϕab.

Assembling the Relations.

Let t be the velocity vector of the state z and T i = [t 3i+1, t 3i+2, t 3i+3]T the velocity vector of z i. Because the derivative is linear, we can decompose the motion into components and add the contributions. For every ordered pair i, j, we consider the direction-preserving component 〈U ij, T iU ij and the distance-preserving component T i − 〈U ij, T iU ij of T i.

The contribution of the direction-preserving component is the fraction of the disk B ij that belongs to the power diagram times what is given in Eq. 8. This is βij A ij w ijU ij, T i〉, where w ij is given in Weighted-Volume Derivative Theorem. The contribution of the distance-preserving component is the weight difference times two thirds the area times the weighted average vector divided by the distance between the ball centers. The factor 2/3 is necessary to change the factor 1/2, which is used in the area computation of sectors, to 1/3, which arises in the volume computation of cones. The contribution is then x ijβij A ijV ij, T i〉, where x ij is given in Weighted-Volume Derivative Theorem. It is also the weight difference times the sum of contributions over all edges, given in Eq. 9, plus the sum of contributions over all arcs, given in Eq. 10. Adding the terms for all ordered pairs gives Formula for the weighted-volume derivative. We can write this more succinctly as DW z(t) = 〈w, t〉, where w is as defined in Weighted-Volume Derivative Theorem. We have βij = 0 unless z i z j is an edge of K, which implies that the summation can be limited to the edges in K.

Discontinuities

The weighted-volume derivative, DW, maps a state z to the gradient w = wz of W at z. From the formula in Weighted-Volume Derivative Theorem, we glean that DW is continuous except at states at which either two or more spheres coincide or three or more spheres intersect in a common circle. The set of such states is a (3n − 3)-dimensional subset of ℝ3n, which should be compared with the (3n − 1)-dimensional subset at which the weighted-area derivative is discontinuous. To see that there are only these two types of discontinuities, we note that A ij is everywhere continuous, x ij, w ij, and U ij are continuous except when z i = z j, and βij and V ij are continuous as long as the facet P iP jF varies continuously. States at which that facet changes abruptly either have z i = z j or contain at least one sphere, other than S i and S j, that contains the circle S ij.

Every state z at which DW is discontinuous has the property that every open neighborhood contains states with combinatorially different dual complexes. In other words, when z passes through a discontinuity of DW, then the dual complex passes a moment at which it may undergo a combinatorial change. Note, however, that not every combinatorial change in the dual complex corresponds to a discontinuity in DW.

Implementation

To compute the volume and its derivatives with respect to the coordinates of the center of the balls, we first need the dual complex of the space-filling diagram. For that purpose, we have written a new version of the alpha-shape software, alphavol, specific to molecular simulation applications. alphavol first performs a regular triangulation of the set of centers of the balls by using the algorithm described by Edelsbrunner and Shah (41). Singularities are resolved by using the Simulation of Simplicity (39), whereas numerical imprecisions are avoided by using arbitrary precision arithmetic. The software is kept efficient through the application of a floating-point filter. The dual complex is computed by using the same care for singularities and numerical uncertainties. alphavol includes implementations of the volume and weighted-volume derivative formulas. Computation of the volume and its derivatives of a 250-residue protein by using alphavol requires ≈0.37 sec on a 1,000-Mhz Pentium processor. The 0.37 sec roughly breaks down to 0.11 sec for computing the regular triangulation, 0.03 sec for generating the dual complex, 0.17 sec to compute the weighted volume, and 0.06 sec to compute the weighted-volume derivatives.

The weighted-volume derivative formula, Eq. 4, was tested against the numerically estimated derivatives, Formula where z 3i+j is the jth coordinate of the ith atom center. The average relative difference between analytical and numerical derivatives was computed as a relative root mean square error: Formula Using δ = 0.0001, we find μ(W) equal on the average to 5.1 × 10−8 and never larger than 9 × 10−8 for a collection of 100 proteins varying in size from 50 residues or ≈400 atoms to 500 residues or ≈4,000 atoms. The good agreement between the analytical and numerical derivatives of the weighted volume shows that our approach and implementation are correct. alphavol is available online at http://biogeometry.duke.edu/software/proshape.

Discussion

The software alphavol, which computes the weighted volume of a union of balls and its derivatives with respect to the coordinates of the centers of the balls, provides a fast, accurate, and robust method for computing the interaction of water with nonpolar atoms of a molecule in a hydration shell model (19). ALPHAVOL implements analytical formulas for computing the weighted volume and its derivatives and explicitly deals with the problem of discontinuities. We have made preliminary steps toward inserting alphavol into the molecular dynamics software encad (42) and gromacs (43), but it is too early to say anything about the corresponding results.

Measuring the volume occupied by individual atoms in macromolecular structures has been the subject of scientific investigation for several decades. In recent years, this interest has increased, because volumes of specific atom types are used in a wide range of implicit solvent models. One of the most widespread uses of atomic volumes in molecular mechanics calculations is in the generalized Born model of electrostatic solvation (14), where the atom volume appears in the calculation of the desolvation of one atom by another atom. In the present implementation of the generalized Born model, ball overlaps are ignored, corrected by scaling factors, or dealt with by using a Gaussian representation of the ball. We expect that the formalism introduced in this article for computing volumes and their derivatives that explicitly and accurately deal with ball overlaps should prove useful for the generalized Born model.

Voronoi polyhedra have been widely used to compute the volume occupied by the atoms of proteins (4, 69) and recently DNA (5) in solution. One difficulty in applying the Voronoi-polyhedra approach to macromolecular structures is that the treatment of atoms at the surface requires special attention because their power cells are infinite. Possible solutions to this problem are the construction of a layer of solvent molecules around the macromolecule (44) or limiting the power diagram to the space-filling diagram (29, 33). This article describes the latter approach and extends it by providing exact expression for the derivatives of the atomic volume. These derivatives are expected to be useful for including potential based on packing in molecular mechanics or molecular dynamics calculation.

The mathematical tools used in this paper to express the volume derivative of a space-filling diagram have also been used to compute the weighted surface-area derivatives (R. Bryant, H.E., P.K., and M. Levitt, unpublished data).

Acknowledgments

This work was supported in part by National Science Foundation Grant CCR-00-86013. P.K. acknowledges support from the Department of Energy (DE-F603-95ER62135).

Footnotes

  • § To whom correspondence should be addressed. E-mail: koehl{at}csb.stanford.edu.

References