Localized bases of eigensubspaces and operator compression

Communicated by Ingrid Daubechies, Princeton University, Princeton, NJ, November 24, 2009 (received for review November 21, 2007)
Abstract
Given a complex local operator, such as the generator of a Markov chain on a large network, a differential operator, or a large sparse matrix that comes from the discretization of a differential operator, we would like to find its best finite-dimensional approximation with a given dimension. The answer to this question is often given simply by the projection of the original operator to its eigensubspace of the given dimension that corresponds to the smallest or largest eigenvalues, depending on the setting. The representation of such subspaces, however, is far from unique, and our interest is to find the most localized bases for these subspaces. The reduced operator using these bases would have sparsity features similar to those of the original operator. We will discuss different ways of obtaining localized bases, and we will give an explicit characterization of the decay rate of these basis functions. We will also discuss efficient numerical algorithms for finding such basis functions and the reduced (or compressed) operator.
Introduction
Given a local operator or a large sparse matrix A and a number M, it is well known that, in some appropriate sense, the M × M matrix that best approximates A is its projection to the eigensubspaces corresponding to the M largest or smallest eigenvalues, depending on the problem. The question of interest in this paper is the following: Assuming that the operator is local or that the matrix A is sparse, how do we represent these subspaces most efficiently, i.e., how do we find basis vectors of these eigensubspaces that require the fewest degrees of freedom to represent?
Questions of this nature arise in many different contexts. Here are a few examples.
Low-Rank Approximation of a Large Matrix.
In many applications, ranging from DNA microarray analysis, facial and object recognition, to web search models, we encounter the following problem (1): Given a large N × N matrix A, one wants to find the best approximation of A by a low-rank matrix D, i.e., minimize ‖A − D‖ subject to rank(D) ≤ M, [1] where the norm is usually taken as the spectral norm ‖·‖_{2} or the Frobenius norm ‖·‖_{F}. The solution to this problem is easily formulated in terms of the singular value decomposition (SVD) of A. Finding it efficiently, however, is a nontrivial matter (1, 2).
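As a quick illustration of problem [1], the minimizer is given by truncating the SVD (the Eckart–Young theorem). The sketch below is a minimal numpy check of this fact; the sizes N = 50 and M = 10 are illustrative choices, not taken from the text.

```python
import numpy as np

def best_rank_M(A, M):
    """Best rank-M approximation of A in the spectral or Frobenius norm:
    truncate the singular value decomposition (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :M] @ np.diag(s[:M]) @ Vt[:M, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
D = best_rank_M(A, 10)

# the optimal spectral-norm error equals the (M+1)-st singular value of A
err = np.linalg.norm(A - D, 2)
s = np.linalg.svd(A, compute_uv=False)
```

Computing the full SVD this way costs O(N^3), which is exactly what becomes prohibitive for the large sparse matrices considered in this paper.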
Approximation of Differential Operators with Multiscale Coefficients.
Consider an elliptic equation with multiscale coefficients: −∇·(a^{ε}(x)∇u) = f in D, [2] u = 0 on ∂D, [3] where D is a smooth domain in ℝ^{d}. Here, we used the notation a^{ε}(x) for the coefficient to signal the fact that it varies over many disparate scales. Our objective is to reduce this operator to an operator with fewer degrees of freedom or much lower information content. When the scales of a^{ε}(x) are far apart, this is a standard problem in homogenization theory, which tells us that the reduced operator takes the same form as in Eq. 2, except that the coefficient is replaced by the homogenized coefficient (3, 4). However, we are interested in the case when standard homogenization theory does not apply, i.e., when there is no particular structure in a^{ε} and, in particular, when there is no clear separation of scales. We then ask the question: Given an integer M, what is the M × M matrix A_{0} that best approximates the operator in Eq. 2?
Electronic Structure Analysis.
The objective of electronic structure analysis using, for example, density functional theory, can be formulated as finding the eigensubspace of the Hamiltonian that corresponds to its lowest eigenvalues (5). This is a major problem in computational quantum chemistry and materials science (6). Localized orbitals such as Wannier functions have been proposed, and it is expected that they will lead to the so-called linear-scaling algorithms (7).
Reduction of Markov Chains on a Large Network.
Many problems can be formulated as Markov chains on networks. Examples include chemical kinetic systems, diffusion and chemical reaction on surfaces, image processing, and, of course, dynamics on networks such as communication networks. Often such networks are rather large, and it is of interest to reduce them to a much smaller state space. One classical idea is to lump the Markov chain (8). However, lumpability is a rather stringent condition, which in many ways is analogous to the scale-separation property discussed above for differential operators with multiscale coefficients. Just as in the case when the coefficients do not have separated scales, we should consider the situation when the Markov chain is not lumpable.
In principle, these problems can be considered as special cases of model reduction. In fact, the solutions to these problems are given quite simply by different variants of the SVD. However, the standard SVD algorithms are too costly when M ≫ 1. In addition, the outputs of the SVD are eigenvectors, which are global objects. This is sometimes not as convenient as working with local ones.
The viewpoint we will take is the following: Our real interest is the subspace itself, not the particular basis functions. Although the eigenvectors form a natural basis set, we are free to choose other basis sets that may better suit our purposes. In particular, for the problems discussed above, it is of special interest to choose a localized basis set. This will enable us to obtain sparse representations of the basis functions and the reduced operator. This philosophy is similar to that of the multiresolution analysis of operators (9), except that we are now dealing with a high-dimensional eigensubspace of the operator, not the whole space. For this reason, the basis functions have to be adapted to the operator, unlike the wavelet bases, which are universal.
Our work is motivated by the work done in electronic structure analysis where similar issues have been discussed (6, 7). In particular, the ideas of Wannier functions (10, 11) are meant to provide a set of localized bases. However, although these ideas are intuitively very attractive, they still remain quite vague (except for the case of crystals) and the present work provides a starting point for making them precise.
There are two main results in this paper. The first is a theoretical result that shows how the rate of decay of the basis functions depends on the choice of the weight functions in the setting when weight functions are used to find the localized basis. The second is a general algorithm for computing such localized basis functions. This algorithm has already found applications in electronic structure analysis (12), image segmentation (13), and homogenization problems.
There is a large body of literature on topics closely related to the work presented here. We have already mentioned the work on low-rank approximation, lumping of Markov chains, electronic structure analysis, and operator compression using wavelet bases (9, 14). In addition, algebraic multigrid methods and generalized finite element methods (15–18) are also related. We will comment on some of these connections throughout the paper.
Problem Formulation
We begin with the problem of finding the best approximation of a large symmetric matrix by a much smaller matrix. Given a large N × N symmetric positive definite matrix A and an integer M, M ≪ N, find a matrix Q ∈ ℝ^{N×M}, rank(Q) = M, such that Q minimizes tr(Q^{+}AQ), [4] where Q^{+} = (Q^{T}Q)^{−1}Q^{T} is the Moore–Penrose generalized inverse of Q. It is easy to see that the real object here is the subspace spanned by the column vectors of Q: The value of the objective function does not change as long as this subspace does not change. This means that the functional tr(Q^{+}AQ) is defined on the Grassmannian manifold formed by all M-dimensional subspaces of ℝ^{N}. We can also limit the column vectors of Q to be orthonormal. In this case the problem can be formulated as follows: Find a set of orthonormal basis vectors U = (u_{1},u_{2},…,u_{M}) that minimizes tr(U^{T}AU). [5]
Note that there is no need to start with a matrix: We could have just as well started with a self-adjoint operator. This would correspond to the case when N = ∞. Much of what will be discussed below applies equally well to this case.
Similar questions can be asked about the subspace spanned by the eigenvectors corresponding to the M largest eigenvalues; e.g., we may consider maximizing the objective function tr(U^{T}AU). [6] This is the formulation relevant to the problem of finding low-rank approximations of matrices and the reduction of Markov chains given the transition probability matrix (1, 8).
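The variational characterizations [5] and [6] are easy to check numerically. In the sketch below (sizes and the test matrix are illustrative assumptions), the eigenvectors of the M smallest eigenvalues attain the minimum of tr(U^{T}AU), and any other orthonormal M-frame gives a value between the sum of the M smallest and the sum of the M largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 40, 5
B = rng.standard_normal((N, N))
A = B @ B.T + N * np.eye(N)        # symmetric positive definite test matrix

w, V = np.linalg.eigh(A)           # eigenvalues in ascending order
U_min = V[:, :M]                   # minimizer of [5]
U_max = V[:, -M:]                  # maximizer of [6]
obj_min = np.trace(U_min.T @ A @ U_min)
obj_max = np.trace(U_max.T @ A @ U_max)

# a random orthonormal M-frame lies between the two extremes
Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
obj_rand = np.trace(Q.T @ A @ Q)
```

Since the objective depends only on the span of the columns, any orthonormal basis of the same subspace gives the same value, which is the freedom exploited throughout this paper.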
At first sight, the problem discussed here also bears some similarity to the procedure in the algebraic multigrid method (18). There, we are given a large stiffness matrix A = A_{h} corresponding to a finite element discretization on a fine grid with grid size h, and one wants to find a set of localized functions in the finite element space that solve the minimization problem [7], where Q is the matrix formed by these localized functions. Formally, this seems quite similar to [5], except for the requirement of orthonormality in [5]. If the orthonormality condition is removed in [5], Q^{T} should be replaced by Q^{+} as in [4].
Localized Basis Functions
Wannier Basis Functions.
Let us first look at a simple example: the 1D Laplace operator with periodic boundary condition with period 2π. Let M = 2n + 1. The eigensubspace of interest is V_{M} = span{e^{ikx} : k = −n, …, n}. [8] The basis functions shown above are the Bloch functions in quantum mechanics (6). We will call these the eigenbases. In this representation, the projected (compressed) operator on V_{M} takes the form A_{0} = diag(0,1,1,4,4,…,n^{2},n^{2}). This is a diagonal operator and is obviously very sparse. However, the basis functions themselves are not at all sparse.
To construct a localized basis set, we apply the discrete inverse Fourier transform on a uniform grid of size M = 2n + 1 to the eigenbases above, and we get ψ_{j}(x) = Σ_{k=−n}^{n} e^{ik(x−x_{j})}, [9] up to a normalization constant, where x_{j} = 2jπ/(2n + 1). These are in fact the Wannier basis functions (11). We call this the Dirichlet basis because these functions correspond to the Dirichlet kernel in the theory of Fourier series (19). The corresponding compressed operator takes the form [10], up to a constant factor. A_{0} is a circulant matrix. In Figs. 1 and 2, we plot, respectively, the vector u_{0} and the gray map of A_{0} (darker color represents larger amplitudes of the matrix elements). Both show obvious decay behavior.
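The Dirichlet basis [9] can be generated directly on a grid. The sketch below (grid resolution and n are illustrative choices) confirms that ψ_{0} peaks at its center x_{0} = 0, is small far from the center, and is orthogonal to ψ_{1}.

```python
import numpy as np

n = 10
M = 2 * n + 1                        # dimension of V_M
x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
k = np.arange(-n, n + 1)

def psi(j):
    """Dirichlet (Wannier) basis function centered at x_j = 2*pi*j/M,
    normalized so that psi_j(x_j) = 1."""
    xj = 2 * np.pi * j / M
    return np.exp(1j * np.outer(x - xj, k)).sum(axis=1).real / M

psi0, psi1 = psi(0), psi(1)
```

The orthogonality check below uses the fact that the equispaced trapezoid rule is exact for trigonometric polynomials of degree below the grid size.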
The Dirichlet basis has another attractive feature besides being localized: It is translation invariant. Obviously, this property is a consequence of the fact that the original operator is translation invariant and is not expected to hold in more general cases. However, even in the general case, it is still possible that some features of translational invariance are retained. For example, we may expect that the functions ψ _{j} have similar shapes for different j. This is definitely not the case for the eigenbases.
Approximate δ-Functions.
We can view the Dirichlet kernel as the projection of the δ-function, which is the most localized function, onto the subspace V_{M}: The projection of δ(x) to V_{M} is D_{n}(x) = Σ_{k=−n}^{n} e^{ikx}, which is the Dirichlet kernel. The slow decay of the oscillatory tail of D_{n} is a result of the Gibbs phenomenon caused by the lack of smoothness of δ(x). An obvious idea for obtaining basis functions with better decay properties is to use filtering (20). We define δ_{σ}(x) = Σ_{k=−n}^{n} σ(k/n)e^{ikx}, [11] where σ(η) is a filter. The decay properties of δ_{σ} depend on the smoothness of the filter σ. The following result is well known.
Lemma 1. Assume that σ satisfies the following conditions for a positive integer p:

(i) σ(η) is real and even on (−∞,+∞).

(ii) σ(0) = 1, and if p ≥ 2, σ^{(l)}(0) = 0 for 1 ≤ l ≤ p − 1.

(iii) σ(η) = 0 for |η| ≥ 1.

(iv) σ ∈ C^{p−1}(−∞,+∞) and σ^{(p)} is of bounded variation.

Then we have the decay estimate [12], where C is a constant independent of n and x, and ‖·‖_{BV} is the bounded-variation norm.
A filter σ(η) satisfying the above conditions is called a pth-order filter (20). The Dirichlet kernel corresponds to the case σ(η) = χ_{[−1,1]}(η), which is a zeroth-order filter in the sense of [12]. If we take σ(η) = 1 − |η| for η ∈ [−1,1], which is a first-order filter, we obtain the Fejér kernel.
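The effect of the filter order on the tail can be seen numerically. The sketch below (n and the grid are illustrative) builds the filtered kernel [11] for the zeroth-order (Dirichlet) and first-order (Fejér) filters and compares their normalized tails away from the peak.

```python
import numpy as np

n = 32
k = np.arange(-n, n + 1)
x = np.linspace(-np.pi, np.pi, 801)

def filtered_kernel(sigma):
    """delta_sigma(x) = sum_{|k|<=n} sigma(k/n) e^{ikx}  (Eq. 11)."""
    return (sigma(k / n)[None, :] * np.exp(1j * np.outer(x, k))).sum(axis=1).real

# zeroth-order filter chi_[-1,1] gives the Dirichlet kernel;
# first-order filter 1 - |eta| gives the Fejer kernel
dirichlet = filtered_kernel(lambda e: (np.abs(e) <= 1).astype(float))
fejer = filtered_kernel(lambda e: np.maximum(1 - np.abs(e), 0))
```

The smoother first-order filter trades a slightly wider central peak for a markedly smaller oscillatory tail, which is exactly the compromise quantified by [12].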
If we use the filtered kernel to form the basis functions, the components of the compressed operator for the Laplacian can be found as in [13], up to a normalization constant. The behavior of A_{0} is similar to that shown in Fig. 2, i.e., the entries decay away from the diagonal. More results on the basis functions that correspond to the filtered δ-functions can be obtained similarly from Eq. 11.
Localization Using Weight Function.
Another natural idea for obtaining localized bases is to use a weight function. More precisely, consider the following variational problem: minimize ∫ w(x)|φ(x)|^{2} dx over φ ∈ V_{M} with ‖φ‖_{L^{2}} = 1, [14] where the weight function w(x) is an even, nonnegative periodic function on [−π,π) with w(0) = 0. Such ideas have been explored in ref. 21 for electronic structure computation, where w(x) = x^{2} is used.
The minimization problem [14] is equivalent to an eigenvalue problem Bc = λc, [15] where φ = Σ_{j} c_{j}ψ_{j}, the ψ_{j} are the eigenbases, and B is the matrix with entries B_{jk} = ⟨ψ_{j}, wψ_{k}⟩. The smallest eigenvalue of B is λ_{min}.
Take the Laplace operator as an example: Let M = 2n + 1 and ψ_{j}(x) = e^{ijx}, j = −n, −n + 1, …, n. Using the weight function x^{2}, we obtain the localized bases shown in Fig. 3. It is easy to see that these basis functions decay faster than the Dirichlet kernel.
The eigenvalue problem [15] can be expressed in a more explicit form. Assume that the weight function has the Fourier representation w(x) = Σ_{m} w_{m}e^{imx}; since w(x) is real and even, the coefficients w_{m} are real and w_{m} = w_{−m}. Then [14] is equivalent to [16], where P_{M} is the projection operator onto the subspace V_{M}. The eigenvalue problem then becomes Wc = λc, [17] where W_{jk} = w_{j−k}, −n ≤ j,k ≤ n. [18] Note that W is a real symmetric Toeplitz matrix (22).
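A small numerical sketch of [14]–[18] (all sizes are illustrative assumptions): assemble the matrix for the weight w(x) = x^{2} in the eigenbasis by quadrature, which by [18] is Toeplitz, take the eigenvector of the smallest eigenvalue, and synthesize the localized basis function φ = Σ_{j} c_{j}e^{ijx}.

```python
import numpy as np

n = 16
M = 2 * n + 1
k = np.arange(-n, n + 1)
x = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
dx = x[1] - x[0]
w = x ** 2                                  # weight function w(x) = x^2

# W_{jk} = (1/2pi) \int w(x) e^{-ijx} e^{ikx} dx : real symmetric Toeplitz
E = np.exp(1j * np.outer(x, k))             # columns are the eigenbases e^{ikx}
W = ((E.conj().T * w) @ E).real * dx / (2 * np.pi)
W = (W + W.T) / 2

vals, vecs = np.linalg.eigh(W)
c = vecs[:, 0]                              # minimizer of [14] in coefficient space
phi = (E @ c).real
i0 = len(x) // 2                            # grid index of x = 0
phi = phi / phi[i0]                         # fix sign and scale at the center
```

The resulting φ is a single bump centered at x = 0 with a fast-decaying tail, and its coefficient profile c_{j} approaches the effective filter discussed below as M grows.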
It is natural to ask whether we can obtain a characterization of the basis functions obtained using weight functions. To answer this question, we will find an effective filter for a given weight function, when applied to the example V_{M}. If we set σ_{n}(j/n) = c_{j}, we have φ(x) = Σ_{j} σ_{n}(j/n)e^{ijx}. [19] Assuming that, as M → ∞, σ_{n} has a limit σ, the localized function obtained by minimizing [14] can then be regarded as an approximation of the δ-function with filter σ. This observation suggests considering the limiting problem of [16] and [17].
Assume that c is given by c_{j} = σ(j/n), where σ is smooth and equal to zero outside (−1,1). The key is to understand the behavior of Wc as M → ∞, where W is defined as in Eq. 18. In this direction, we have
Lemma 2. Assume that the weight function w is smooth in (−π,π). If w^{(2l)}(0) is the first nonzero term in the Taylor expansion of w at the origin, then we have [20] pointwise as M → ∞.
According to Lemma 2, only the local behavior of w at the origin matters in the limit. This is easy to understand intuitively because, as M becomes large, the basis function becomes more localized around zero, and hence it only feels the local behavior of w there.
The proof of Lemma 2 is based on the observation that W can be regarded as the discretization of a differential operator. Because a zero boundary condition is assumed, we Taylor expand σ around η_{j} = j/n; the odd-order terms vanish due to the symmetry of the Fourier coefficients w_{j}. An elementary computation then gives the limit [20] as M, and hence n, goes to infinity. If w^{(2l)}(0) is the first nonzero term in the Taylor expansion of w, the claim follows from the definition of the limiting operator.
Given the weight function w, the effective filter is characterized by the eigenvalue problem (−1)^{l}σ^{(2l)} = λσ in (−1,1), [21] with boundary conditions σ^{(m)}(±1) = 0, 0 ≤ m ≤ l − 1. [22] The smoothness of σ improves as l increases. The effective filter functions can be found by solving the eigenvalue problem [21] and [22]. For w(x) = x^{2}, the effective filter function is given explicitly by σ(η) = cos(πη/2). The effective filter functions corresponding to the weight functions x^{2} and x^{6} are shown in Fig. 4. Putting Lemma 1 and Lemma 2 together, we arrive at
Corollary 1. Assume that w is smooth in (−π,π). If w^{(2l)}(0) is the first nonzero term in the Taylor expansion of w at the origin, then [23] holds, where C is a constant independent of n and x.
The basis functions produced with the weight functions x^{2}, x^{4}, and x^{6} are shown in Fig. 3. Only one basis function is shown for each case, due to the translation invariance property.
Another interesting weight function is proposed in ref. 23: w(x) = 1 − χ_{[−R_{c},R_{c}]}, where R_{c} is the localization radius and χ is the characteristic function.
Corollary 1 suggests choosing w(x) = x^{2l} with large l. However, this is only part of the story; the other important part is numerical stability, i.e., the condition number of the system. It can be shown that the condition number increases like O((π/ε_{M})^{l}), where ε_{M} is the minimal eigenvalue of W in the space V_{M}. Therefore, the best weight function has to be a compromise between smoothness and numerical stability. Numerical results show that the weight functions x^{6} and x^{8} are good choices.
Given a weight function, Lemma 2 tells us how to find the effective filter. It is natural to consider the inverse problem: Given a filter σ, can we find a weight function w that corresponds to this filter? This problem is closely related to the inverse Toeplitz eigenvalue problem, which is quite nontrivial. Some existence results are proved by Landau in ref. 24.
Results
In the general case, we will have to resort to numerical computations to find the bases discussed above. The standard numerical technique for finding eigensubspaces is the subspace iteration method (25, 26). This is a generalization of the power method for computing leading eigenvalues and eigenvectors of a matrix. Subspace iteration consists of two main steps: multiplication of a set of approximate basis vectors by the matrix A and orthogonalization (25, 26). We propose a localized version of this algorithm, in which the orthogonalization step is replaced by a localization step. The resulting algorithm has a complexity of O(N) instead of O(N + M ^{3}).
To begin with, we choose beforehand the localization regions S_{i}, which will be the supports of the basis functions, and the weight functions w_{i} centered in S_{i}, which are appropriate translations of the weight function analyzed above. The choice of these regions has to be informed by an understanding of the specific problem. Adaptive algorithms for choosing these sets can be contemplated but have not yet been investigated.
Knowing the current basis functions φ_{i}, we find the updated basis functions by the following steps:

1. Purification: Compute φ′_{i} = p(A)φ_{i}, i = 1,2,…,M.

2. Localization: For each i = 1,2,…,M, denote by V_{i} the linear space spanned by those φ′_{j} whose support overlaps with that of φ′_{i}. Find ψ_{i} in V_{i} by solving the variational problem of minimizing ‖ψ_{i}‖_{2,w_{i}} over ψ_{i} ∈ V_{i} with ‖ψ_{i}‖_{2} = 1. [24] Here, ‖·‖_{2,w_{i}} is the weighted l^{2} norm with weight function w_{i}.

3. Truncation: The updated φ_{i} is the truncation of ψ_{i} on S_{i}: φ_{i} = χ_{S_{i}}ψ_{i}, [25] where χ_{S_{i}} is the indicator function of S_{i}.
In the algorithm, p(A) is some polynomial. A good choice is the Chebyshev polynomial (26). For an illustration of the algorithm and the resulting localized bases, we consider a differential operator with ε-periodic coefficient: A = −∇·(a(x/ε)∇) with a(y) = 1 + 0.5 cos(2πy) on the interval [0, 21] with periodic boundary condition. We take M = 21, and therefore the corresponding coarse grid size is one. The small parameter ε is chosen to be one-eighth, and the operator is discretized with a standard second-order finite difference scheme with fine grid size 1/64. Four weight functions are used: x^{2}, x^{6}, and 1 − χ_{[−R_{c},R_{c}]} with R_{c} = 1 and 2; the resulting localized bases are shown in Fig. 5. In Fig. 6, we compare the localized bases with the basis functions from the generalized finite element method (15, 17) with support sizes two and four (corresponding to the choices R_{c} = 1 and 2, respectively). One can see that the generalized (or multiscale) finite element basis with support size four compares much better with the bases produced using our approach. Detailed results on the application of the ideas presented here to homogenization problems can be found in ref. 14.
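The purification–localization–truncation loop can be sketched schematically. The version below is a deliberate simplification, not the paper's algorithm: the localization step [24] is replaced by plain truncation to S_{i}, p(A) is a crude linear shift rather than a Chebyshev polynomial, and the matrix and all sizes are illustrative rather than those of the numerical example above.

```python
import numpy as np

# 1D periodic Laplacian on N grid points; we want its low eigensubspace,
# so the "purification" polynomial p(A) = cI - A amplifies the low modes.
N, M, steps = 64, 8, 50
A = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=0) - np.roll(np.eye(N), -1, axis=0)
p_of_A = 4.1 * np.eye(N) - A

centers = np.arange(M) * (N // M)       # localization regions S_i around these points
radius = N // M                          # half-width of each S_i

def truncate(v, c):
    """Keep v only on S_i = {j : periodic distance(j, c) <= radius}."""
    j = np.arange(N)
    d = np.minimum(np.abs(j - c), N - np.abs(j - c))
    return np.where(d <= radius, v, 0.0)

Phi = np.stack([truncate(np.ones(N), c) for c in centers], axis=1)
for _ in range(steps):
    Phi = p_of_A @ Phi                                          # purification
    Phi = np.stack([truncate(Phi[:, i], c)
                    for i, c in enumerate(centers)], axis=1)    # localization
    Phi /= np.linalg.norm(Phi, axis=0)                          # keep columns O(1)

# compressed operator on the span of the localized basis
Q, _ = np.linalg.qr(Phi)
A0 = Q.T @ A @ Q
```

Note that no global orthogonalization is performed; the columns stay linearly independent because truncation pins each one to its own region, which is the structural point of replacing orthogonalization by localization.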
Other applications of the algorithm include electronic structure calculation (12) and image segmentation (13). Some issues of accuracy and convergence in the context of electronic structure calculation are discussed in ref. 27.
Discussion
There is a long list of problems that need to be investigated further. On the algorithmic side, the most pressing problem is to speed up the convergence of the algorithms discussed here. On the numerical analysis side, it is crucial that we have a clear understanding of the effect of truncating the support of the basis functions. Of course, the application of these ideas remains largely unexplored, although initial progress has been made for some of the problems discussed in the Introduction.
Acknowledgments
W.E. and J.L. are supported in part by Office of Naval Research Grant N00014-01-0674, Department of Energy Grant DE-FG02-03ER25587, and National Science Foundation Grant DMS-0708026. T.L. is partially supported by the National Science Foundation of China under Grant 10401004 and the National Basic Research Program under Grant 2005CB321704.
Footnotes
^{1}To whom correspondence should be addressed. E-mail: tieli{at}pku.edu.cn.

Author contributions: W.E., T.L., and J.L. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

The authors declare no conflict of interest.
References
1.
2.
3. Bensoussan A, Lions JL, Papanicolaou G
4. Zhikov VV, Kozlov SM, Oleinik OA
5. Jeltsch R, Wanner G, E W
6. Martin R
7.
8. Kemeny JG, Snell JL
9.
10.
11.
12. García-Cervera CJ, Lu J, E W
13.
14.
15.
16.
17.
18. Xu J, Zikatanov L
19. Zygmund A
20.
21.
22. Gray RM
23. Gao W, E W
24.
25. Golub G, Van Loan C
26. Saad Y
27.
Article Classifications
 Physical Sciences
 Applied Mathematics