The kernel energy method of quantum mechanical approximation carried to fourth-order terms

  1. Lulu Huang,
  2. Lou Massa, and
  3. Jerome Karle,§
  1. Laboratory for the Structure of Matter, Naval Research Laboratory, 4555 Overlook Avenue SW, Washington, DC 20375-5320; and
  2. Departments of Chemistry and Physics, Hunter College and the Graduate School, City University of New York, 695 Park Avenue, New York, NY 10021
  1. Contributed by Jerome Karle, December 3, 2007 (received for review November 1, 2007)

Abstract

It is now possible to calculate the ab initio quantum mechanics of very large biological molecules. Two things lead to this perspective, namely, (i) the advances of parallel supercomputers, and (ii) the discovery of a quantum formalism called quantum crystallography and the use of quantum kernels, a method that is well suited for parallel computation. The kernel energy method (KEM) carried to second order has been used to calculate the quantum mechanical ab initio molecular energy of peptides, protein (insulin and collagen), DNA, and RNA and the interaction of drugs with their biochemical molecular targets. The results were found to have good accuracy. In this article, the accuracy of the KEM is investigated up to an approximation including fourth-order interactions among kernels. Remarkable accuracy is achieved in the calculation of the energy of the ground state of the important biological molecule Leu1-zervamicin, whose crystal structure is known and used in the calculations.

The kernel energy method (KEM) combines structural crystallographic information with quantum mechanical theory. KEM may be described as the determination of the quantum mechanical molecular energy by the use of the parts of a whole molecule, which are referred to as kernels. Because the kernels are much smaller than a full biological molecule, the calculations of kernels and multiple kernels are practicable. Subsequently, kernel contributions are summed in a manner affording an estimate of the energy for the whole molecule. Thus, the task of obtaining a quantum mechanical energy has been simplified for biological molecules as large as those containing thousands of atoms. The computational time is much reduced by employing the KEM, and the accuracy obtained appears to be quite satisfactory, as shown in previous work.

The first applications of the KEM (1) referred to above, involved a large number of peptides of various shapes and sizes, giving good accuracy, which was retained throughout a wide range of basis functions and computational methods (2). The KEM has been applied to the protein insulin (3), various A, B, and Z DNA (4), RNA (5), drug-target and ribosomal A-site RNA (6), and a collagen triple helix (7) with favorable results. The overall theoretical background for the application of quantum mechanics with crystallography may be found in refs. 815. References that review the quantum mechanical methods related to the KEM may be found in ref. 10.

This article is devoted to calculating the KEM energy to an order of approximation including terms up to a fourth order of interaction among the kernels. Given that extending the accuracy of the KEM is a subject of interest, the problem that presents itself is the derivation of the approximation to higher orders and its application, to test whether an expected improvement in accuracy results from computation. That is the problem addressed here.

To review, in the KEM, the use of atomic coordinates from x-ray crystallography leads to a reduction of computational effort and an extraction of quantum information from the crystallography. Central to the KEM is the concept of the kernel. Kernels are the quantum pieces into which the full molecule is mathematically broken. All quantum calculations are carried out on kernels and multiple kernels. Because the kernels are chosen to be smaller than a full biological molecule, the calculations are accomplished efficiently, and the computational time is much reduced. Subsequently, the properties of the full molecule are reconstructed from those of the kernels and multiple kernels. Thus, a quantum realization of the aphorism that the whole is the sum of its parts is obtained.

It is assumed that the crystal structure is known for a molecule under study. With known atomic coordinates, the molecule is mathematically broken into tractable pieces called kernels. The kernels are chosen such that each atom occurs in only one kernel. Schematically defined kernels, double, triple, and quadruple kernels, are shown in supporting information (SI) Fig. 1, and only these objects are used for all quantum calculations. The total molecular energy is reconstructed therefrom, by summation over the contributions of the kernels and multiple kernels up to the highest order of interaction to be imposed. In our previous work, only an approximation including up to double kernels had been invoked, and in that case, the total energy is calculated according to the formula E n total = Σ1≤i<jn Eij − (n − 2)Σ1≤in Ei.

In this article, we extend the KEM to higher orders of approximation. The purpose of this, of course, is to increase the accuracy of the KEM calculations. We anticipate that this higher order of accuracy can be applied to obtain the energy of enormous biological molecules when it is not computationally feasible otherwise to treat the entire molecule as a whole because of its size. When a structure of interest has known crystallographic coordinates, one may easily define kernels, which altogether represent the entire composite molecule. The use of single kernels and multiple kernels is an approximation that is made to obtain a simplification in the quantum calculation. The validity of this approximation, up to the use of double kernels, for a variety of large biological molecules has been mentioned above. In this article, we derive an extension of the KEM approximation, to higher orders of interaction, and show how increased accuracy may be obtained in its application. Remarkable accuracy, as we indicate below, can be achieved.

Results for the Total Molecular Energy Expanded in Contributions of Kernels and Their Interactions

Molecular Energy as a Sum Over Kernel Energies.

The formulas for invoking the KEM up to orders of approximation including double, triple, and quadruple energies are displayed as Eqs. 1, 2, and 3, respectively. Formula Formula Formula

KEM Molecular Energy Approximation, Obtained from the Kernel Interaction Energies, up to Fourth Order.

Using the notion of interaction energies, we may decompose the total energy of a molecule composed of n kernels into its single, double, triple, and quadruple kernel contributions as follows (1618): Formula The various terms ΔE in this last equation are the interaction energies between and among kernels. We will define each of these interaction energies explicitly as they are used in the derivations below.

Double-kernel interactions.

The formula for the total energy, calculated up to only an inclusion of the interactions between double kernels is Formula where Ei is a single-kernel energy and ΔEij is the interaction energy between two single kernels (i and j) defined as ΔEij = Eij − (Ei + Ej).

By using this definition of the double-kernel interaction energy, the total energy becomes Formula In the above expression, and in all of the derivations below, we shall find that collecting coefficients of similar terms is simplified by using average quantities. We emphasize that such simplification is the purpose of introducing average quantities. We now define the average quantities associated with Eq. 1b. The single-kernel energies E 1, E 2, E 3, …, En have an average energy that we call Ei*, defined as Ei* = Σ1≤in Ei/n. The double-kernel energies, E 12, E 13, E 14, … E (n−1)n will also have an average energy value that we call Eij*. The number of double kernels is given by the binomial coefficient Formula This means the average of double-kernel energies is given by the expression Formula and similarly the average of the double-kernel interaction energies is Formula This last average can also be written as ΔEij* = Eij* − (Ei* + Ej*). Summing the double-kernel interaction energies, we have Formula By using average values, Eq. 1b becomes Formula Noting that Ei* is the average energy of Ei(i = 1 ⋯ n), and Ej* is the average energy of Ej(j = 1 ⋯ n), so that Ei* = Ej* and Ei* + Ej* = 2Ei*, the total energy becomes Formula Reverting back from average values, to their expressions over sums of energies, we have the desired result (see Eq. 1).

Triple-kernel interactions.

The formula for the total energy, calculated up to only an inclusion of the interactions among triple kernels is Formula where, as before, Ei is the energy of a single kernel, ΔEij is the interaction energy between two single kernels, and ΔEijk is the interaction energy among three single kernels. The interaction energy for a given triple kernel is obtained from its total energy by subtracting the energy of its single kernels and all of its double-kernel interaction energies. What remains after subtraction is the triple-kernel interaction energy, ΔEijk = (EijkEiEjEk) − ΔEij − ΔEik − ΔEjk. Using this in Eq. 2a, we have for the total energy of the full molecule, Formula As before, by using the definition of the average quantities Ei*, Eij*, ΔEij*, and ΔEij* = Eij* − (Ei* + Ej*), so too here, it will be useful to define the analogous generalizations associated with the triple kernels. The number of triple kernels E 123, E 124, E 134, …, E (n−2)(n−1)n, is given by the binomial coefficient Formula and the average of the triple-kernel energies and interaction energies is therefore Formula respectively. Also, ΔEijk* = (Eijk* − Ei* − Ej* − Ek*) − [Eij* − (Ei* + Ej*)] − [Eik* − (Ei* + Ek*)] − [Ejk* − (Ej* + Ek*)].

Now, by using average quantities, the contribution of the triple-kernel interaction energies to the total molecular energy becomes Formula Altogether, we have for the total molecular energy, expressed entirely in the average expressions for single-, double-, and triple-kernel contributions, that Eq. 2b becomes Formula Using Ei* = Ej* = Ek* and Eij* = Eik* = Ejk*, we have Formula Collecting coefficients of similar terms, gives Formula Now, reverting back from the average quantities in this last equation to the equivalent expressions in sums over individual energies that define the averages, we obtain Formula The reader may be convinced that [(n − 2)(n − 3)]/2 = Σ1≤in−3 i by evaluating the first few terms of the above equation (say for n = 4, 5, 6, 7, 8, …). Thus, we obtain the desired result (see Eq. 2).

Quadruple-kernel interactions.

The formula for the total energy, calculated up to only an inclusion of the interactions among quadruple kernels is Formula where in the equation above the various symbols retain their previously used meanings, and ΔEijkl is the interaction energy among four single kernels (i, j, k, l). ΔEijkl is defined as the total energy of the quartet of kernels after subtracting the energy of its single kernels, and the interaction energies of its double kernels and its triple kernels. The quadruple interaction energy is written as ΔEijkl = EijklEiEjEkEl − ΔEij − ΔEik − ΔEil − ΔEjk − ΔEjl − ΔEkl − ΔEijk − ΔEijl − ΔEikl − ΔEjkl. The contribution of such terms to the total molecular energy would be obtained by evaluation of their sum Σ1≤i<j<k<lnΔEijkl = Σ1≤i<j<k<ln{EijklEiEjEkEl − [Eij − (Ei + Ej)] − [Eik − (Ei + Ek)] − [Eil − (Ei + El)] − [Ejk − (Ej + Ek)] − [Ejl − (Ej + El)] − [Ekl − (Ek + El)] − {(EijkEiEjEk) − [Eij − (Ei + Ej)] − [Eik − (Ei + Ek)] − [Ejk − (Ej + Ek)]} − {(EijlEiEjEl) − [Eij − (Ei + Ej)] − [Eil − (Ei + El)] − [Ejl − (Ej + El)]} − {(EiklEiEkEl) − [Eik − (Ei + Ek)] − [Eil − (Ei + El)] − [Ekl − (Ek + El)]} − {(EjklEjEkEl) − [Ejk − (Ej + Ek)] − [Ejl − (Ej + El)] − [Ekl − (Ek + El)]}}. When this sum is added to the contributions of single, double, and triple kernels, one obtains the rather lengthy expression for the total molecular energy, Formula As before, the use of average energy quantities will allow a simplification of the energy expression above so that the individual contributions of single, double, triple, and quadruple kernels can be gathered together and separated from one another.

We retain the previous meanings of the expressions for averages Ei*, Eij*, ΔEij*, ΔEij* = Eij* − (Ei* + Ej*), Eijk*, ΔEijk*, and ΔEijk* = (Eijk* − Ei* − Ej* − Ek*) − [Eij* − (Ei* + Ej*)] − [Eik* − (Ei* + Ek*)] − [Ejk* − (Ej* + Ek*)]. Eijkl* is the average energy of the quadruple kernels E 1234, E 1245, E 1345, …, E (n−3)(n−2)(n−1)n. The number of such quadruple kernels is given by the binomial coefficient Formula Therefore, the average energy for the quadruple kernels is Formula Now, introducing into Eq. 3b the averages of all kernel and multiple kernel energy terms, we obtain Formula By using Ei* = Ej* = Ek* = El*, Eij* = Eik* = Eil* = Ejk* = Ejl* = Ekl*, Eijk* = Eijl* = Eikl* = Ejkl*, Eq. 3c simplifies to Formula Gathering the coefficients of similar terms, yields the expression Formula Reverting back from the average energies in the last equation to their expressions as sums over the individual energy terms gives the total molecular energy as Formula The reader may be convinced that Formula and Formula by evaluating the first few terms of these equations for various values of n. Using these last two equations in Eq. 3f gives the desired result (see Eq. 3).

Application to Leu1-zervamicin of the Fourth-Order Approximation of KEM.

We test the accuracy achievable with the above formulas by application to the important biological molecule Leu1-zervamicin, shown in SI Fig. 2. It is an antibiotic that transports potassium ions across cell membranes. Groups of zervamicin molecules assemble to form channels that serve to allow ion passage. The molecule has hydrophobic side chains extending from the peptide residues on one side and polar side chains extending from other peptide residues on the other side. A side chain of particular interest is a side chain of (residue 11) glutamine. This side chain is attached on the hydrophobic side of the peptide, as determined from a crystal-structure investigation by Karle et al. (19). The side chain acts as a gate that allows K+ ions to proceed through a channel membrane. Glutamine 11, by a swinging motion, acts to open and close the channel each time an ion goes through it. If the structural arrangement of the zervamicin molecules in a cell membrane is similar to that in the crystal, it is possible that the zervamicin crystal structures offer a model of a gating mechanism on the level of atomic resolution. The results of calculation related to Leu1-zervamicin are listed in Table 1.

View this table:
Table 1.

The HF STO-3G energy calculated up to fourth-order approximation of KEM


The program GAUSSIAN was used to do the calculations, related to each of the formulas 1, 2, and 3 applied to the Leu1-zervamicin molecule. The results are listed in Table 1 in order of increasing accuracy, reading across the table, for the double-, triple-, and quadruple-interaction approximations. For each of these approximations, the result listed as the exact energy is the brute force Hartree–Fock (HF) calculation on the full molecule using the limited basis of functions called STO-3G. The table also lists the corresponding energy calculated by means of the KEM equations. Eq. 1, 2, and 3 are used to obtain the results listed for the double-, triple-, and quadruple-kernel approximations, respectively. The differences between the brute-force and the KEM calculations are listed in atomic units (au). Finally the results are listed on the basis of energy difference per atom in units of Kcal/mol. As expected, the difference between the exact and the KEM results decrease as the order of the KEM approximations taken in to account increase from double to triple to quadruple interactions. The numerical differences are −0.0234 au, −0.0017 au, and 0.0000 au for the double, triple, and quadruple interactions, respectively.

Discussion and Conclusions

The KEM has successfully combined x-ray structures with quantum mechanical theory to obtain the molecular energy of a variety of large biological molecules. Because the kernels are much smaller than a full biological molecule, the calculations of kernels and multiple kernels are practicable, and the task of obtaining a quantum mechanical energy has been simplified for molecules containing thousands of atoms. The question arises, what are the limits of accuracy based on the idea of quantum kernels. To answer this question, this article is devoted to calculating the KEM energy to an order of approximation including terms up to a fourth order of interaction among the kernels. In this first set of test calculations, a limited basis set of atomic orbitals and the HF approximation were used. However, as we have indicated previously, the KEM (2) applies to any basis set and to all of the usual quantum mechanical methods based on orbitals.

We have derived the energy equations that allow the generalization of the kernel approximation to higher orders of interaction. Eq. 1 takes into account the double-kernel interactions, and this is the equation we have used for most of the results published heretofore (1, 37). The generalizations of this equation to include triple- and quadruple-kernel interactions are new and are given by Eqs. 2 and 3, respectively.

The numerical test of these equations using the molecule Leu1-zervamicin indicate that remarkable accuracy is achievable. This molecule contains 265 atoms and was broken into seven kernels as indicated in SI Fig. 3. All three of the equations were applied to calculate the energy of the molecule by using these seven kernels. The principal guideline for choosing kernel boundaries is simply to sever only single bonds between amino acids. We consider a kernel to be of convenient and optimal size if it contains ≈100 atoms.

The standard of accuracy for these calculations was the brute-force HF calculation of the energy for the full molecule in the same basis. As the results of Table 1 show, the accuracy of the KEM increases with each order of approximation. For example, by using Table 1 consider the calculation difference on the basis of difference per atom. At the level of double-kernel interactions (Eq. 1) the magnitude of difference is 10−2 Kcal/mol per atom, an already small error. If triple-kernel interactions (Eq. 2) are invoked the magnitude of difference is reduced by an order of magnitude to 10−3 Kcal/mol per atom. Finally, the use of quadruple interactions (Eq. 3) induces a further reduction in difference by an additional four orders of magnitude to 10−7 Kcal/mol per atom. For this molecule at least, and those similar to it, one need not contemplate going beyond the quadruple level of accuracy in the KEM approximation. The results here allow one to conclude that ab initio accuracy is obtainable for biological molecules within the KEM, carrying the approximation up to quadruple interactions. For large enough molecules, for which brute force calculations are not feasible, the KEM calculations will still be practicable, because the kernels and multiple kernels can be chosen to be very much smaller than the full molecule.

In previous papers, we have shown that the KEM provides an effective simplification in the task of solving the Schrödinger equation for the case of large biological molecules. The point of this article, then, is to demonstrate that the accuracy of KEM can be vastly increased by incorporating higher-order interactions among the kernels. To make this point, we have calculated the higher-order interactions associated with the molecule Leu1-zervamicin listed in Table 1. Moreover, because the KEM constructs a whole from the sum of its parts, it is especially suitable for parallel computation.

It is important to mention the computational complexity of the method that has yielded the results of Table 1. In the simplest terms, one may state that the complexity of an HF calculation is proportional to n 4, where n is the number of atoms (and in a rough way this is proportional to the number of basis functions used). If a system of n atoms is broken into nK single kernels, then the number of double kernels, triple kernels, and quadruple kernels would be of the order of nK2, nK 3, and nK 4, respectively, and each individual multiple kernel would require computational effort of the order of (n/nK)4. Thus, the total computational effort, in the case of using at most double, triple, or quadruple kernels, would be of the order of (n 4/nK 2), (n 4/nK), and n 4, respectively. It is clear that a great saving in time would be associated with the use of Eq. 1 and Eq. 2 (which incorporate, at most, double kernels and triple kernels) because of the way a power of nK enters the denominator in the description of their computational effort. Because Eq. 1 and Eq. 2 are capable of quite good results, as may be seen from Table 1, in most cases there would not be a need to go beyond their level of accuracy. Eq. 3 (which incorporates quadruple kernels) is even more accurate but at the cost of the increased computational effort noted above. Reasons to expend such cost would include the advantage of parallel computation organized by the assignment of the kernels and multiple kernels to individual parallel nodes of computation, the possibility of an organized omission of certain quadruples bound to be negligible based on a criterion of distances among kernels, and obtaining separate interaction energies of second, third, and fourth order. An HF calculation on a full molecule, for example, would not yield the separate interaction energies of each order. Most importantly, the use of quadruple kernels would allow the computation of extremely large molecules that would not otherwise be possible if the space required to store the large number of energy integrals were insufficient. For example, in an HF calculation where the number of integrals to be stored would grow as n 4, the corresponding memory space required in the case of individual quadruple calculations would only grow as (n/nK)4.

There is a further point worth remembering. In the KEM, the fragment calculations are carried out on multiple kernels and single kernels whose ruptured bonds have been mended by the attachment of H atoms. The total contribution of hydrogen atoms introduced to saturate the broken bonds tends to zero. This happens because the effect on the energy of the hydrogen atoms that have been added with positive sign are effectively canceled, in the overall expression of the energy, containing contributions from equal numbers of hydrogen that enter with negative sign. This cancellation of the mending hydrogen atom energy effects contributes to the accuracy achieved by the KEM, at every order of kernel-interaction approximation.

We mention in passing that the calculations of this article proceed from crystallographic coordinates, and they are therefore not intended to account for biological molecules in an aqueous environment. However, water molecules of crystallization can, of course, be accounted for by simply treating them as a separate kernel. This article is focused on calculation of the molecular energy, but we are aware that in some circumstances it would be of interest to calculate other physical properties of a molecule such as the charge distribution, dipole moment, polarizabilities, etc., but we leave these properties for future investigations.

The KEM, suggested here for especially high accuracy, might find application in many problems in which the object of calculation might concern the quantum mechanics of large molecules. These include the rational design of drugs, peptide folding, and the study of weak interactions among biological molecules.

Method of Calculation

The calculations of this article were carried out by using the HF linear combination of atomic orbitals (LCAO) method (20) for approximating the solutions to the molecular Schrödinger equation.

The basis functions chosen for implementation of the HF equations are Gaussian-type functions (21). The STO-3G basis (22, 23) is a minimal basis set which contracts three Gaussian functions to approximate one Slater-type orbital, and these are the basis functions that have been used here.

The calculation results described in this article were completed by using Gaussian ‘03 on a p670 IBM supercomputer running the AIX 5.2 IBM operating system.

Acknowledgments

We acknowledge improvements in our manuscript by suggestions of a referee in the discussion of this article. The research reported in this article was supported by the Office of Naval Research. L.M. thanks the U.S. Navy Summer Faculty Research Program administered by the American Society of Engineering Education for the opportunity to spend summers at the Naval Research Laboratory. LM studies were supported by the U.S. Army Breast Cancer Award W81XWH-06-1-0658 and National Institutes of Health Grants NIGMS MBRS SCORE5 S06GM606654 and RR-03037 from the National Center For Research Resources).

Footnotes

  • §To whom correspondence should be addressed. E-mail: jerome.karle{at}nrl.navy.mil
  • Author contributions: L.H., L.M., and J.K. designed research, performed research, and wrote the paper.

  • The authors declare no conflict of interest.

  • This article contains supporting information online at www.pnas.org/cgi/content/full/0711297105/DC1.

References

« Previous | Next Article »Table of Contents