Inflationary dynamics for matrix eigenvalue problems

Contributed by Eric J. Heller, March 5, 2008 (received for review October 30, 2007)
Abstract
Many fields of science and engineering require finding eigenvalues and eigenvectors of large matrices. The solutions can represent oscillatory modes of a bridge, a violin, the disposition of electrons around an atom or molecule, the acoustic modes of a concert hall, or hundreds of other physical quantities. Often only the few eigenpairs with the lowest or highest frequency (extremal solutions) are needed. Methods that have been developed over the past 60 years to solve such problems include the Lanczos algorithm, Jacobi–Davidson techniques, and the conjugate gradient method. Here, we present a way to solve the extremal eigenvalue/eigenvector problem, turning it into a nonlinear classical mechanical system with a modified Lagrangian constraint. The constraint induces exponential inflationary growth of the desired extremal solutions.
Physical problems of importance to many fields of science are routinely reduced to an eigenvalue problem for a real symmetric, or Hermitian, N × N matrix A:

Aψ_{n} = e_{n}ψ_{n},   [1]

where ψ_{n} is the nth eigenvector with eigenvalue e_{n}. Realistic simulations can generate matrices of dimension N = 10^{9} or more, but often most of the matrix elements vanish for physical reasons, yielding a sparse matrix. Well-established methods exist that focus on finding extremal eigenpairs (or internal eigenpairs, with various preconditioning strategies), such as conjugate gradient methods (1), Jacobi–Davidson techniques (2), and Lanczos algorithms (3–5).
The bulk of the numerical effort in these approaches goes into the iteration step, e.g., multiplying the matrix A by a vector φ. The methods differ as to how information from each iteration is used. Lanczos and Arnoldi methods use a Krylov space spanned by the initial (often random) vector φ_{1} and its iterates φ_{n} = Aφ_{n−1}.
Within the class of Krylov space approaches, diagonalizing in the full Krylov space is variationally optimal and in principle cannot be bested by any other use of the Krylov vectors (such as the one we propose here). However, if this were the end of the story there would be little need for restarting algorithms (such as implicitly restarted Arnoldi, which is used in the ARPACK library and in the MATLAB environment, for example), and many other strategies that have been proposed. One reason for these strategies, and the continued activity in the field, is that, in practice, there are memory issues in storing the matrix A and/or the Krylov vectors and numerical stability issues that limit the performance of the ideal Lanczos method.
Furthermore, the subject of optimal approaches to large matrix eigenvalue problems remains active because of special requirements associated with different problems (such as the need for interior eigenpairs, the number of eigenpairs needed, the accuracy required, etc.), and because of the existence of classes of matrices with special convergence characteristics (diagonal dominance, large or small spread of diagonal elements, near degeneracy of the lowest eigenvalues, etc.). Non-Krylov-space approaches, especially the Jacobi–Davidson method, have been invented to address some of these issues. The Davidson method, which uses preconditioning, has been invaluable for quantum chemistry applications, especially where several lowest eigenpairs are needed, the spread of diagonal elements is large, and the matrices, although sparse, still contain too many nonzero elements to store.
Our purpose in this article is to report an approach that starts from a fresh premise. Although it also relies on matrix–vector multiplication and is not immune to the issues that limit the standard approaches, it is different enough in its design and implementation to deserve special attention. The main idea is to replace a large real symmetric (or Hermitian) N-dimensional eigenvalue problem with an N-dimensional classical trajectory problem, in which the potential energy minimum corresponds to the minimum eigenvalue e_{0} and the coordinates of this minimum correspond to the eigenvector ψ_{0}. Even with a billion dimensions, for the proper choice of parameters one quickly finds the minimum under "time" evolution (discrete iterations). There are no false local minima. We show that, under this dynamics, the lowest eigenvectors grow exponentially fast relative to their neighbors, through a Lagrange multiplier that regulates this inflation. Nearby eigenpairs are also easily found. Implementation of the algorithm is simple, flexible, and robust, and convergence rates are easily proved analytically.
Associating various kinds of eigenvalue problems with dynamical systems is not new; it is especially popular in the context of quantum mechanics. Examples include the Pechukas approach to random matrix eigenvalue spectra (6), the Miller–Meyer–McCurdy classical analog for electronic degrees of freedom (7), the powerful and popular idea of replacing quantum statistical mechanics with a classical polymer bead system (8), and the well known association of classical driven, damped oscillators with quantum transitions in spectroscopy (9). We single out the Car–Parrinello (CP) method, and especially Car's damped CP method, proposed in the context of density functional theory (DFT), as most closely related to the present work (10). However, we treat here the general real symmetric (or Hermitian) eigenvalue problem; there is no connection to DFT, the Kohn–Sham equations, or even quantum systems. Nonetheless, the potential is there in the future to try to construct on-the-fly CP-like methods that are not DFT based.
The real symmetric eigenvalue problem is equivalent to finding the principal axes or "normal modes" of the harmonic potential defined by A(x⃗) = Σ_{i,j} x_{i} A_{i,j} x_{j}, where the x_{i} are thought of as orthogonal coordinates (we will not explicitly write down expressions for the Hermitian case, but the extension is trivial). We adopt a Lagrangian approach initially, but several modifications to follow may take us away from a strict Lagrangian dynamics. We take the Lagrangian to be

L = ½ Σ_{i} ẋ_{i}² − ½ Σ_{i,j} x_{i} A_{i,j} x_{j} + ½ λ(t) (Σ_{i} x_{i}² − 1),   [2]

where the Lagrange multiplier λ enforces normalization. Lagrange's equations require λ(t) = A(x⃗(t)) = Σ_{i,j} x_{i}(t) A_{i,j} x_{j}(t), i.e., the potential energy at time t. The corresponding Euler–Lagrange equations read

ẍ_{i} = −Σ_{j} (A_{i,j} − λ δ_{i,j}) x_{j}.   [3]

Defining p_{i} = ẋ_{i} and applying a naïve Euler integrator with time step δt gives the discrete-time mapping

p_{i}(t + δt) = p_{i}(t) − Σ_{j} (A_{i,j} − λ δ_{i,j}) x_{j}(t) δt,
x_{i}(t + δt) = x_{i}(t) + p_{i}(t + δt) δt.   [4]

(More sophisticated discretizations, such as Verlet, also work well.) It is revealing to make a linear transformation to the (as yet unknown) normal (eigenvector) momenta π_{i} and coordinates ξ_{i}. Iterating Eq. 4 is then mathematically equivalent to iterating

π_{i}(t + δt) = π_{i}(t) − (e_{i} − λ) ξ_{i}(t) δt,
ξ_{i}(t + δt) = ξ_{i}(t) + π_{i}(t + δt) δt,   [5]

which is a discrete area-preserving map corresponding to a set of N independent harmonic oscillators.
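For concreteness, the map of Eq. 4 can be implemented in a few lines. The following is a minimal NumPy sketch (ours, not from the original text); the test matrix, time step, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def inflation_step(A, x, p, dt):
    """One step of the discrete-time map (Eq. 4); the multiplier lam is
    taken as the current Rayleigh quotient (the potential energy)."""
    Ax = A @ x
    lam = x @ Ax / (x @ x)
    p = p - (Ax - lam * x) * dt   # momentum kick under the force -(A - lam*I) x
    x = x + p * dt                # position update uses the *new* momentum
    return x, p

# arbitrary symmetric test matrix
rng = np.random.default_rng(0)
N = 50
A = rng.standard_normal((N, N))
A = (A + A.T) / 2

x = rng.standard_normal(N)
p = np.zeros(N)
dt = 0.1                          # safely below the stability bound discussed below
for _ in range(4000):
    x, p = inflation_step(A, x, p, dt)
    n = np.linalg.norm(x)
    x, p = x / n, p / n           # absorb the inflationary growth of the norm
lam = x @ A @ x                   # approaches the lowest eigenvalue e_0
```

Renormalizing each step merely absorbs the inflationary growth of the norm; because the map is linear in (x⃗, p⃗), a uniform rescaling of both does not alter the dynamics.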
For a sufficiently small time step (not necessarily the regime we want to be in numerically), and assuming temporarily that λ(t) is constant, we have

ξ_{i}(t) = a_{i} cos(ω_{i}t + δ_{i}),  ω_{i} = (e_{i} − λ)^{1/2},   [6]

if e_{i} > λ, where δ_{i} is a real phase shift [which, along with a_{i}, depends on the initial conditions ξ_{i}(0) and π_{i}(0)]. On the other hand, for low eigenvalues e_{i} < λ, the normal coordinates evolve as

ξ_{i}(t) = b_{i} e^{κ_{i}t} + c_{i} e^{−κ_{i}t},  κ_{i} = (λ − e_{i})^{1/2},   [7]

where the first term obviously dominates at long times. Of course, λ(t) is not constant in practice, but it quickly settles into a slow decrease. Thus, we see that eigenmodes with eigenvalues e_{i} < λ(t) (a set that always includes the ground state, by the variational theorem) are exponentially inflating at any given time t, while the higher modes remain simple oscillators. We will say more about the optimal choice of δt below.
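Both regimes of Eqs. 6 and 7 can be checked numerically for a single mode, holding λ fixed (a toy verification; the values e = ±4 and λ = 0 are arbitrary choices of ours):

```python
import numpy as np

def mode_trajectory(e, lam, dt, steps):
    """Iterate one normal mode of Eq. 5 with lam held constant:
    pi' = pi - (e - lam) xi dt ;  xi' = xi + pi' dt."""
    xi, pi = 1.0, 0.0
    traj = []
    for _ in range(steps):
        pi -= (e - lam) * xi * dt
        xi += pi * dt
        traj.append(xi)
    return np.array(traj)

dt, steps = 0.01, 2000
osc = mode_trajectory(e=4.0, lam=0.0, dt=dt, steps=steps)    # e > lam: bounded oscillation (Eq. 6)
infl = mode_trajectory(e=-4.0, lam=0.0, dt=dt, steps=steps)  # e < lam: exponential inflation (Eq. 7)
rate = np.log(infl[-1] / infl[-501]) / (500 * dt)            # empirical late-time growth rate
# rate should approach kappa = sqrt(lam - e) = 2
```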
The normalization Σ_{i} x_{i}² = 1 can no longer be enforced by the Lagrange multiplier λ(t) when finite, and possibly large, time steps δt are taken. That is not a problem, because the vector norm is easily imposed numerically before each time step, or even at the very end of the calculation (we may easily generalize the previous expression for λ to λ = Σ_{i,j} x_{i}A_{i,j}x_{j}/Σ_{i} x_{i}², making Eqs. 4 and 5 equally valid for x⃗ of any norm). However, λ(t) retains the more important job of regulating inflation by controlling the border between inflating and noninflating states. This is a key point. From this point of view, there is no reason for λ(t) in Eq. 4 or 5 to be strictly defined as the current potential energy estimate at time t. Instead, we may replace it with λ̃(t), a time-dependent parameter under our control, chosen to optimize convergence by inflating the desired eigenmodes as rapidly as possible relative to the other modes. One may show, through a simple rescaling of variables, that this replacement is mathematically equivalent to adding a time-dependent damping term to Eq. 4: p_{i}(t + δt) = ⋯ − γ(t)p_{i}(t)δt, where λ̃ = λ + γ²/4.
To calculate the rate of convergence of the inflation method, it is sufficient to consider inflation of the ground state coordinate ξ_{0} relative to the first excited state coordinate ξ_{1}, since ground state inflation relative to higher excited states is obviously at least as fast. From Eqs. 6 and 7, we have ξ_{0}(t)/ξ_{1}(t) ∼ e^{(κ_{0} − κ_{1})t}. The exponent is peaked, and thus the ground state is approached fastest, when the inflation border λ is precisely equal to e_{1}, the eigenvalue of the first excited state. In that case the ground state coordinate grows relative to every other at the fastest possible rate,

ξ_{0}(t)/ξ_{i}(t) ∼ e^{(e_{1} − e_{0})^{1/2} t} = e^{√ε_{01} t},  i ≥ 1.   [8]

In practice we do not know the energy e_{1} a priori, but given an estimate of the gap ε_{01} = e_{1} − e_{0}, which can be obtained in several ways, we can set λ̃(t) = λ(t) + ε_{01}, ensuring that as λ(t) approaches the ground state energy e_{0} we get the maximal possible inflation of the ground state. When working with a class of similar physical systems, the simplest approach is to use the typical value of the gap as our initial estimate for ε_{01} and to set the initial value of λ̃ accordingly. One can do much better in a particular case by performing some number of iterations using an initial guess for ε_{01} (to cleanse the estimate of high-lying eigenpairs), and then varying this guess while noting which value gives the steepest descent of μ = |(A − λ(t))x⃗(t)|² (for normalized x⃗), a standard measure of error that requires no additional matrix–vector multiplications. The optimal estimate of the gap will result in the error decaying as μ ∼ e^{−2√ε_{01}t}, providing an obvious consistency check on our estimate.
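The effect of the shifted border λ̃ = λ + ε_{01} can be demonstrated on a simple test case. The sketch below (our construction, with arbitrary parameters) uses a discrete 1-D Laplacian, whose eigenvalues 2 − 2cos(kπ/(N+1)) are known exactly, so the gap estimate can be fed in directly; `gap_estimate` plays the role of ε_{01}:

```python
import numpy as np

def inflate(A, steps, dt, gap_estimate=0.0, seed=1):
    """Inflation iterations with the shifted border lam_tilde = lam + gap_estimate;
    gap_estimate = 0 recovers the unshifted dynamics."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    p = np.zeros_like(x)
    for _ in range(steps):
        Ax = A @ x
        lam = x @ Ax / (x @ x)
        p -= (Ax - (lam + gap_estimate) * x) * dt
        x += p * dt
        n = np.linalg.norm(x)
        x /= n
        p /= n
    Ax = A @ x
    lam = x @ Ax
    mu = np.sum((Ax - lam * x) ** 2)   # residual error measure for the unit vector x
    return lam, mu

# discrete 1-D Laplacian: spectrum known in closed form, small lowest gap
N = 200
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
e = 2 - 2 * np.cos(np.pi * np.arange(1, N + 1) / (N + 1))

lam0, mu0 = inflate(A, 3000, 0.5)                            # border at lam
lam1, mu1 = inflate(A, 3000, 0.5, gap_estimate=e[1] - e[0])  # border at lam + eps_01
# mu1 ends up many orders of magnitude below mu0
```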
The convergence of the algorithm is limited by the time step δt, which we would like to take as large as possible so as to reach large times quickly. Strict accuracy is not a concern, because we are interested only in inflating the lowest few modes relative to the others, not in faithful integration of the second-order differential equations. However, too large a time step will destabilize the stable oscillators corresponding to very high eigenvalues (very large e_{i} − λ in Eq. 5), causing inflation of the wrong modes. A short analysis shows that Eq. 5 remains stable for large e_{i} only if the time step obeys

δt < 2/ω_{max},  ω_{max}² = e_{max} − e_{0},

where ω_{max}² is an upper bound on the possible values of e_{i} − λ; we have found that this bound holds rather well in practice. Combining this result with the optimal rate of convergence in continuous time (Eq. 8), we find that the number of discrete steps required for convergence scales as

n ∼ t/δt ∼ [(e_{max} − e_{0})/(e_{1} − e_{0})]^{1/2}.   [9]

Of course, to choose an appropriate time step δt in a particular calculation, we need a rough estimate of the spectral range e_{max} − e_{0}. In this regard, it is interesting to note that starting off with too large a time step does not ruin the calculation, as is revealed by continuing the iterations at a smaller time step: the standard convergence measure μ(t) often recovers quickly, dropping dramatically in just a few steps. This is easily understood from our earlier analysis: too large a step may inflate some of the highest-lying eigenpairs, with eigenvalues e_{i} ∼ e_{max}, ruining the measure μ. However, these eigenpairs are precisely the ones killed most rapidly as soon as the time step is reduced to the stable region. That is, inflation has already been working well over most of the spectrum, but this was not reflected in the error measure; the errant eigenpairs are easily eliminated once the time step is reduced.
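The stability border δt < 2/ω_max is easy to confirm for a single oscillatory mode of Eq. 5 (a scalar toy check of ours; ω_max = 5 and the step counts are arbitrary):

```python
def stable(omega2, dt, steps=400):
    """True if a single oscillatory mode of Eq. 5 (e - lam = omega2 > 0)
    stays bounded under time step dt; False if it blows up."""
    xi, pi = 1.0, 0.0
    for _ in range(steps):
        pi -= omega2 * xi * dt
        xi += pi * dt
        if abs(xi) > 1e6:
            return False
    return True

omega_max = 5.0                        # so the bound is dt < 2/5 = 0.4
ok = stable(omega_max**2, dt=0.39)     # just below the border: bounded
bad = stable(omega_max**2, dt=0.41)    # just above the border: divergence
```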
The square root in Eq. 9 is a result of using a second-order differential equation in Eq. 3, as opposed to a first-order one. It is instructive to consider the more straightforward idea of applying exp(−βA) to a trial vector φ(0), solving dφ(β)/dβ = −Aφ(β) in discrete steps. In a specific basis, this gives dx_{i}(β)/dβ = −Σ_{j} A_{i,j}x_{j}(β). Why is it not an even better idea to discretize this first-order equation in (imaginary) time β? It will inflate the ground state relative to the first excited state as e^{(e_{1} − e_{0})β}, causing the ground state to dominate the excited states at large β. The answer is that it works, of course, but only for much smaller time steps and with slower convergence. To prevent runaway growth of the high-lying eigenmodes, the time step δβ must be chosen so that δβ ∼ 1/(e_{max} − e_{0}), as compared with δt ∼ 2/(e_{max} − e_{0})^{1/2} in the Lagrangian case, and the number of steps needed for convergence scales as β/δβ ∼ (e_{max} − e_{0})/(e_{1} − e_{0}), i.e., the square of the number of time steps needed in the Lagrangian method (Eq. 9). The same speedup was found for the second-order damped Car–Parrinello method (10), and it also applies to the conjugate gradient method.
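For comparison, here is a sketch of the first-order iteration dx⃗/dβ = −Ax⃗ with explicit Euler steps and per-step normalization (our illustration; the discrete 1-D Laplacian test matrix and step count are arbitrary). Note how many iterations the small stable δβ forces:

```python
import numpy as np

# Explicit-Euler relaxation of dx/dbeta = -A x with per-step normalization.
# Discrete 1-D Laplacian test matrix: spectrum lies in (0, 4).
N = 100
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

rng = np.random.default_rng(2)
x = rng.standard_normal(N)
dbeta = 0.25                      # stability limits dbeta to ~1/(e_max - e_0)
for _ in range(20000):            # far more steps than the second-order map needs
    x -= (A @ x) * dbeta
    x /= np.linalg.norm(x)
lam = x @ A @ x                   # converges to e_0, at the slower first-order rate
```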
It is tempting to try differential equations of even higher order m ≥ 3, e.g., d^{m}x_{i}/dt^{m} = −Σ_{j} A_{i,j}x_{j} + ⋯, but this does not work: it is impossible for all m roots ω_{i} ∼ (e_{i} + ⋯)^{1/m} to lie in the same half-plane, for either sign of (e_{i} + ⋯), and thus inflation will always occur for both large and small e_{i}.
The above analysis uncovers a problem common to iterative eigenpair methods: slow convergence to the ground state if one or more excited-state energies are very close to the ground-state energy. A very satisfactory solution exists for this problem in the present context. Instead of waiting for inflation to separate out the ground state from these low-lying excited states, we admit the low-lying excited states into our calculation by choosing a window size w and performing inflation with λ̃(t) = λ(t) + w. This choice optimally inflates away all modes with energy e_{i} > e_{0} + w, requiring only ∼[(e_{max} − e_{0})/w]^{1/2} steps for convergence, and the resulting vector φ contains (to any desired accuracy) only contributions from the k states within the energy window [e_{0}, e_{0} + w]. Subsequently, φ is iterated an additional k − 1 times, saving the vectors φ_{n} and Aφ_{n} after each iteration, and finally the matrix A is constructed and diagonalized explicitly in the subspace spanned by φ_{0} … φ_{k−1}. The parameter k may be incremented until convergence to the lowest eigenpair is achieved. We note that the diagonalization is numerically trivial for moderate k, so the only significant additional cost is that associated with storing the 2k vectors φ_{n} and Aφ_{n}. A trade-off between time and storage constraints determines the optimal window size w: a larger w requires fewer iteration steps but requires more vectors to be stored simultaneously in memory before the final diagonalization. We have successfully used this approach on various matrices (see below for details).
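A toy realization of the window procedure (ours, with arbitrary illustrative choices): the diagonal test matrix has a nearly degenerate bottom pair with gap 0.001, and the window [e_0, e_0 + w] with w = 0.5 contains k = 2 states:

```python
import numpy as np

def window_ground_state(A, w, k, steps, dt, seed=0):
    """Inflate with border lam + w, then diagonalize A in the small
    subspace spanned by the k surviving iterates phi_0 ... phi_{k-1}."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    p = np.zeros_like(x)
    for _ in range(steps):
        Ax = A @ x
        lam = x @ Ax / (x @ x)
        p -= (Ax - (lam + w) * x) * dt
        x += p * dt
        n = np.linalg.norm(x)
        x /= n
        p /= n
    V = [x]                            # phi_0 (unit norm after the loop)
    for _ in range(k - 1):             # k - 1 further iterates phi_n = A phi_{n-1}
        v = A @ V[-1]
        for u in V:                    # Gram-Schmidt against earlier vectors
            v -= (u @ v) * u
        V.append(v / np.linalg.norm(v))
    V = np.array(V).T                  # N x k orthonormal basis
    ev, U = np.linalg.eigh(V.T @ A @ V)
    return ev[0], V @ U[:, 0]          # best subspace estimate of (e_0, psi_0)

# nearly degenerate bottom pair: e_0 = 0, e_1 = 0.001
D = np.diag([0.0, 0.001, 1.0, 2.0, 3.0, 4.0, 5.0])
e0, psi0 = window_ground_state(D, w=0.5, k=2, steps=300, dt=0.5)
# e0 ~ 0 even though the tiny gap would make plain inflation very slow
```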
Several obvious generalizations can be implemented. The governing equations of the inflation method may easily be extended to nonorthogonal basis vectors. Also, the eigenvectors associated with the first excited state, second excited state, and so on can easily be found by evolving several vectors simultaneously and including constraints that enforce their mutual orthogonality, i.e., φ⃗^{α}·φ⃗^{β} = 0 for α ≠ β. For example, this may be accomplished by adding a term ½ν_{αβ} Σ_{i} x_{i}^{α} x_{i}^{β} to the Lagrangian, where ν_{αβ} is a Lagrange multiplier that can be shown to equal Σ_{i,j} x_{i}^{α} A_{i,j} x_{j}^{β}. Alternatively, one may begin by dynamically evolving a single random vector with an appropriate window w, chosen to include all eigenvalues of interest. After eigenpairs lying outside the window have been inflated away, one performs a k × k diagonalization using k iterates of the initial vector, as discussed above, to obtain approximations for the k lowest-lying eigenpairs. The k approximate vectors may then be evolved individually, while enforcing the orthogonality constraints, to obtain any desired accuracy for each eigenpair.
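As one concrete, simplified realization of the orthogonality constraint (our sketch: previously found vectors are projected out explicitly after each step, standing in for the multipliers ν_{αβ}; the matrix and parameters are toy choices):

```python
import numpy as np

def inflate_vector(A, steps, dt, shift, ortho=(), seed=0):
    """Inflate one vector; components along the already-found vectors in
    `ortho` are projected out each step, enforcing phi_a . phi_b = 0."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    p = np.zeros_like(x)
    for _ in range(steps):
        Ax = A @ x
        lam = x @ Ax / (x @ x)
        p -= (Ax - (lam + shift) * x) * dt
        x += p * dt
        for u in ortho:               # projection replaces the nu_ab multipliers
            x -= (u @ x) * u
            p -= (u @ p) * u
        n = np.linalg.norm(x)
        x /= n
        p /= n
    return x

A = np.diag(np.arange(10, dtype=float))    # toy spectrum 0, 1, ..., 9
v0 = inflate_vector(A, 2000, 0.3, shift=1.0, seed=0)
v1 = inflate_vector(A, 2000, 0.3, shift=1.0, ortho=(v0,), seed=1)
# v0 and v1 approximate the two lowest eigenvectors,
# with Rayleigh quotients near 0 and 1
```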
We have applied the inflation method to a considerable variety of large matrices, including diagonally dominant sparse, random sparse, and full matrices, with diagonal elements unevenly or evenly spaced, and eigenvectors weakly, moderately, or strongly mixed, up to a size of 10^{9} × 10^{9} (see Fig. 1 for a selection, which shows results for some test matrices taken from ref. 11). The limiting factor is not the dimension of the matrix per se, but rather the number N_{nz} of its nonzero elements: both the storage and the time required for an iteration scale linearly with N_{nz}. These traits are common to all methods employing matrix–vector multiplication. The advantage of the above-described diagonalization in the subspace spanned by φ_{0} … φ_{k−1} is most visible in the model of spin-polarized electrons on a triangular lattice (see Fig. 1D). This is a numerically very difficult problem because it has a high density of low-lying excitations. The low-lying states can be efficiently separated from the ground state by the diagonalization, leading to considerably better convergence.
In Fig. 2 we show that multiple eigenpairs can be obtained simultaneously by dynamically evolving a single initial vector. Here we perform 6 × 6 diagonalization after every 6 steps, and show convergence of the first 4 eigenpairs. The algorithm, in fact, allows us to obtain an arbitrary number of lowest eigenpairs. For this we use the diagonalization in a subspace generated by inflating several eigenpairs under an orthogonality constraint.
We have investigated a few significant variations of the ideas presented here and have found thus far that the inflation approach works best. For example, the idea of using a dynamical system to find eigenpairs suggests the following alternative: instead of a Lagrangian constraint, consider a "soft" normalization constraint, imposed by adding a smooth quartic potential term, making the complete pseudopotential

V(x⃗) = ½ Σ_{i,j} x_{i} A_{i,j} x_{j} + (κ/2) (Σ_{i} x_{i}² − 1)².

Again, damped dynamics is used. In the limit of large κ, the quartic potential enforces unit norm of the solution vector, just as the Lagrangian constraint does. Now consider the nature of the extrema of V. With no loss of generality, we make a unitary transformation to the (unknown) normal coordinates η_{i},

V = ½ Σ_{i} e_{i} η_{i}² + (κ/2) (Σ_{i} η_{i}² − 1)².

The extrema of this potential are given (in the η basis) by

∂V/∂η_{i} = η_{i} [e_{i} + 2κ (Σ_{j} η_{j}² − 1)] = 0 for all i.

Besides the trivial extremum at the origin (all η_{i} = 0), we have up to 2N extrema corresponding to the N possible eigenstates, where a single η_{i}² = 1 − e_{i}/(2κ), while η_{j} = 0 for all j ≠ i. For sufficiently strong constraint coefficient (κ > e_{0}/2), we easily check that all extrema are saddles with at least one unstable direction, with the exception of the extremum associated with the ground state (i.e., the global minimum). Thus, for almost all (i.e., all but a set of measure zero) initial conditions, the trajectory leads downhill to the true minimum. It does not dally long on any intermediate saddles it encounters, because of their exponential instability (see Fig. 3). The global minimum is doubled into two equivalent solutions, related by an overall factor of −1 and corresponding to the same eigenstate. It also does not matter whether we adhere strictly to the rules of classical mechanics in getting to the minimum; any stepping method leading to the minimum will do. Note that the solution is exact and, up to its overall norm, independent of κ, provided only that κ > e_{0}/2.
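Since any downhill stepping scheme will do, the soft-constraint variant can be sketched as plain gradient descent on V (our toy example: a diagonal matrix with e_0 = 1, κ = 2 satisfying κ > e_0/2, and an ad hoc step size):

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # toy spectrum, e_0 = 1
kappa = 2.0                               # soft-constraint strength, kappa > e_0/2
eta = 0.05                                # ad hoc descent step size

rng = np.random.default_rng(3)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
for _ in range(5000):
    # gradient of V = x.A.x/2 + (kappa/2)(|x|^2 - 1)^2
    grad = A @ x + 2 * kappa * (x @ x - 1) * x
    x -= eta * grad
lam = x @ A @ x / (x @ x)                 # -> e_0 = 1
norm2 = x @ x                             # -> 1 - e_0/(2 kappa) = 0.75
```

At the minimum, the squared norm is 1 − e_0/(2κ) = 0.75 rather than 1, but the direction of x⃗ is the exact ground-state eigenvector, illustrating the κ-independence of the solution.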
The inflationary approach proposed here is quite competitive with standard methods. We have not yet investigated preconditioning. Although the inflationary method is quite general, it seems especially well suited to physics and quantum chemistry problems, because of its dynamical underpinnings.
Acknowledgments
We thank Ernest Davidson for very helpful comments. This work was supported in part by National Science Foundation Grant PHY0545390.
Footnotes
 §To whom correspondence should be addressed. E-mail: heller@physics.harvard.edu

Author contributions: E.J.H., L.K., and F.P. designed research; E.J.H., L.K., and F.P. performed research; and E.J.H., L.K., and F.P. wrote the paper.

The authors declare no conflict of interest.
 © 2008 by The National Academy of Sciences of the USA
References
1. Vorst HA
11. Davis T