Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods

Contributed by R. R. Coifman, February 2, 2005
Abstract
In the companion article, a framework for structural multiscale geometric organization of subsets of ℝⁿ and of graphs was introduced. Here, diffusion semigroups are used to generate multiscale analyses in order to organize and represent complex structures. We emphasize the multiscale nature of these problems and build scaling functions of Markov matrices (describing local transitions) that lead to macroscopic descriptions at different scales. The process of iterating or diffusing the Markov matrix is seen as a generalization of some aspects of the Newtonian paradigm, in which local infinitesimal transitions of a system lead to global macroscopic descriptions by integration. This article deals with the construction of fast order-N algorithms for data representation and for homogenization of heterogeneous structures.
In the companion article (1), it is shown that the eigenfunctions of a diffusion operator, A, can be used to perform global analysis of the set and of functions on a set. Here, we present a construction of a multiresolution analysis of functions on the set related to the diffusion operator A. This allows one to perform a local analysis at different diffusion scales.
This is motivated by the fact that in many situations one is interested not in the data themselves but in functions on the data, and in general these functions exhibit different behaviors at different scales. This is the case in many problems in learning, in analysis on graphs, in dynamical systems, etc. The analysis through the eigenfunctions of the Laplacian considered in the companion article (1) is global and is affected by global characteristics of the space; it can be thought of as a global Fourier analysis. The multiscale analysis proposed here is in the spirit of wavelet analysis.
We refer the reader to refs. 2–4 for further details and applications of this construction, as well as a discussion of the many relationships between this work and the work of many other researchers in several branches of mathematics and applied mathematics. Here, we would like to at least mention the relationship with fast multipole methods (5, 6), algebraic multigrid (7), and lifting (8, 9).
Multiscale Analysis of Diffusion
Construction of the Multiresolution Analysis. Suppose we are given a self-adjoint diffusion operator A as in ref. 1 acting on ℒ²(X, μ) for a metric measure space (X, d, μ). We interpret A as a dilation operator and use it to define a multiresolution analysis. It is natural to discretize the semigroup {A^t}_{t≥0} of the powers of A at a logarithmic scale, for example at the times

t_j = 1 + 2 + 2² + · · · + 2^j = 2^{j+1} – 1.

For a fixed ε ∈ (0,1), we define the approximation spaces by

V_j = 〈{φ_i : λ_i^{t_j} ≥ ε}〉,

where the φ_i are the eigenvectors of A and λ_i the corresponding eigenvalues, ordered by decreasing eigenvalue. We will denote by P_j the orthogonal projection onto V_j. The set of subspaces {V_j}_j is a multiresolution analysis in the sense that it satisfies the following properties:

lim_{j→–∞} V_j = ℒ²(X, μ).

V_{j+1} ⊆ V_j for every j ≥ 0.

{φ_i : λ_i^{2^{j+1}–1} ≥ ε} is an orthonormal basis for V_j.
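As a minimal numerical illustration of these approximation spaces (the operator, the size n = 64, and ε = 0.1 below are our own illustrative choices, not taken from the text), one can take a lazy random walk on a path graph as A and watch dim V_j shrink along the dyadic times t_j = 2^{j+1} – 1:

```python
import numpy as np

# Illustrative diffusion operator: lazy random walk on a path graph,
# symmetrized so that it is self-adjoint with spectrum in [0, 1].
n = 64
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0                                  # path-graph weights
d = W.sum(axis=1)
A = 0.5 * (np.eye(n) + np.diag(d ** -0.5) @ W @ np.diag(d ** -0.5))
lam = np.sort(np.linalg.eigvalsh(A))[::-1]             # decreasing eigenvalues

# dim V_j = #{i : lam_i^(t_j) >= eps} at the dyadic times t_j = 2^(j+1) - 1
eps = 0.1
dims = [int((lam ** (2 ** (j + 1) - 1) >= eps).sum()) for j in range(6)]
print(dims)  # non-increasing: V_{j+1} is contained in V_j
```

The printed dimensions decrease with j, mirroring the nesting V_{j+1} ⊆ V_j, while the eigenvector of eigenvalue 1 survives at every scale.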
We can also define the detail subspaces W_{j+1} as the orthogonal complement of V_{j+1} in V_j, so that we have the familiar relation between approximation and detail subspaces as in the classical wavelet multiresolution constructions:

V_j = V_{j+1} ⊕^⊥ W_{j+1}.
This is very much in the spirit of a Littlewood–Paley decomposition induced by the diffusion semigroup (10). However, in each subspace V_j and W_j we have the orthonormal basis of eigenfunctions, but we would like to replace them with localized orthonormal bases of scaling functions as in wavelet theory. Generalized Heisenberg principles (see Extension of Empirical Functions of the Data Set) put a lower bound on how much localization can be achieved at each scale j, depending on the spectrum of the operator A and on the space on which it acts. We would like to have basis elements that are as localized as the Heisenberg principle allows at each scale and that span (approximately) V_j. We do all this while avoiding computation of the eigenfunctions.
We start by fixing a precision ε > 0 and assume that A is represented on the basis Φ_0 = {δ_k}_{k∈X}. We consider the columns of A, which can be interpreted as a set of functions on X. We use a local multiscale Gram–Schmidt procedure, described below, to carefully but efficiently orthonormalize these columns into a basis Φ_1 = {ϕ_{1,k}}_{k∈X_1} (X_1 is defined as this index set) for the range of A up to precision ε. This is a linear transformation we represent by a matrix G_0. This yields a subspace that is ε-close to V_1. Essentially, Φ_1 is a basis for a subspace that is ε-close to the range of A, with basis elements that are well localized and orthogonal. Obviously, |X_1| ≤ |X|, but the inequality may already be strict, since part of the range of A may be below the precision ε. Whether this is the case or not, we then have a map M_0 from X to X_1, which is the composition of A with the orthonormalization by G_0. We can also represent A in the basis Φ_1; we denote this matrix by A_1 (see the diagram in Fig. 1).
We now proceed by looking at the columns of A_1², which are the images of the basis functions under the compressed operator up to precision ε, by unraveling the bases on which the various elements are represented. Again we can apply a local Gram–Schmidt procedure to orthonormalize this set: This yields a matrix G_1 and an orthonormal basis Φ_2 = {ϕ_{2,k}}_{k∈X_2} for the range of A_1² up to precision ε, and hence for the range of the corresponding power of A up to precision 2ε. Moreover, depending on the decay of the spectrum of A, |X_2| << |X_1|. The matrix M_1, which is the composition of G_1 with A_1², is then of size |X_2| × |X_1|, and is a representation of A⁴ acting on Φ_2.
After j steps in this fashion, we will have a representation of A^{1+2+2²+···+2^j} = A^{2^{j+1}–1} onto a basis Φ_j = {ϕ_{j,k}}_{k∈X_j} that spans a subspace which is jε-close to V_j. Depending on the decay of the spectrum of A, we expect |X_j| << |X|; in fact, in the ideal situation§ the spectrum of A decays fast enough so that there exists γ < 1 such that |X_j| < γ^{2^{j+1}–1}|X|. This subspace is spanned by "bump" functions at scale j, as defined by the corresponding power of the diffusion operator A. The "centers" of these bump functions can be identified with X_j, so that we can think of X_j as a coarser version of X. The basis Φ_j is naturally identified with the set of Dirac δ-functions on X_j; however, one can extend these functions, defined on the "compressed" graph X_j, to the whole initial graph X by expressing them, through the matrices M_0, ..., M_{j–1}, as linear combinations of the functions in Φ_0. Since every function in Φ_0 is defined on X, so is every function in Φ_j. Hence, any function on the compressed space X_j can be extended naturally to the whole X. In particular, one can compute low-frequency eigenfunctions on X_j and then extend them to the whole X. This is of course completely analogous to the standard construction of scaling functions in the Euclidean setting (5, 11, 12). Observe that each point in X_j can be considered as a "local aggregation" of points in X_{j–1}, which is completely dictated by the action of the operator A on functions on X: A itself is dictating the geometry with respect to which it should be analyzed, compressed, and applied to any vector.
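The compression cascade can be sketched as follows. This is a dense illustrative stand-in, not the algorithm of the text: an SVD-based orthonormalization replaces the local multiscale Gram–Schmidt, the operator is a lazy walk on a cycle of our choosing, and we track the dyadic powers A^{2^j} only.

```python
import numpy as np

def compress_dyadic_powers(A, eps=1e-10, levels=4):
    """T_j represents A^(2^j) on an orthonormal basis Phi_j of its numerical
    range. SVD truncation stands in for the local multiscale Gram-Schmidt
    (dense, hence illustrative only -- no sparsity is exploited here)."""
    T, bases = A.copy(), []
    for _ in range(levels):
        U, s, _ = np.linalg.svd(T)
        r = int((s > eps * s[0]).sum())     # numerical rank at precision eps
        Q = U[:, :r]                        # basis Phi_{j+1}, in Phi_j coords
        M = Q.T @ T                         # A^(2^j) mapped to the new basis
        T = M @ M.T                         # represents A^(2^(j+1))
        bases.append(Q)
    return T, bases

# Lazy random walk on a cycle: self-adjoint, spectrum in [0, 1] (our choice)
n = 64
C = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
A = 0.5 * (np.eye(n) + C / 2.0)

T4, bases = compress_dyadic_powers(A, levels=4)
E = np.linalg.multi_dot(bases)              # Phi_4 expressed on Phi_0
err = np.linalg.norm(E @ T4 @ E.T - np.linalg.matrix_power(A, 16))
print(T4.shape, err)
```

Extending the compressed matrix back to Φ_0 coordinates recovers A^16 up to the prescribed precision, while the basis shrinks as low-eigenvalue directions fall below ε.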
We have thus computed and efficiently represented the powers A^{2^j}, for j > 0, that describe the behavior of the diffusion at different time scales. This applies to the solution of discretized partial differential equations, to Markov chains, and to learning and related classification problems.
Wavelet Transforms and Green's Function. The construction immediately suggests an associated fast scaling function transform: Suppose we are given f on X and want to compute 〈f, ϕ_{j,k}〉 for all scales j and corresponding "translations" k. Being given f is equivalent to saying we are given (〈f, ϕ_{0,k}〉)_{k∈X}. Then we can compute (〈f, ϕ_{1,k}〉)_{k∈X_1} = M_0(〈f, ϕ_{0,k}〉)_{k∈X}, and so on for all scales. The matrices M_j are sparse (since A_j and G_j are), so this computation is fast. This generalizes the classical scaling function transform. We will see later that wavelets can be constructed as well and a fast wavelet transform is possible.
In the same way, any power of A can be applied fast to a function f. In particular, the Green's function (I – A)^{–1} can be applied fast to any function, since

(I – A)^{–1} = Σ_{k≥0} A^k = Π_{j≥0} (I + A^{2^j}),

and each term of the product can be applied fast to f.
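The product formula can be checked numerically; the sketch below applies (I – A)^{–1} to a vector using 12 factors of the telescoping product, for a random self-adjoint A with spectral norm 0.8 (our choice; the identity requires norm less than 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
S = rng.standard_normal((n, n))
S = (S + S.T) / 2
A = 0.8 * S / np.linalg.norm(S, 2)          # self-adjoint, spectral norm 0.8
f = rng.standard_normal(n)

g, P = f.copy(), A.copy()
for _ in range(12):                          # factors (I + A^(2^j)), j = 0..11
    g = g + P @ g                            # apply one factor to g
    P = P @ P                                # A^(2^(j+1)) by repeated squaring
direct = np.linalg.solve(np.eye(n) - A, f)   # dense reference solution
print(np.allclose(g, direct))
```

Each factor is one matrix–vector product plus one squaring, so only log-many (compressed) powers are ever needed.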
The construction of the multiscale bases can be done in time Ο(n log² n), where n = |X|, if the spectrum of A has fast enough decay. The decomposition of a function f onto the scaling functions and wavelets we construct can be done in the same time, as can the computation of (I – A)^{–1}f.
The Orthogonalization Process. We sketch here how the orthogonalization works (for details, refer to refs. 2 and 3). Suppose we start from a δ-local basis Φ = {ϕ_x}_{x∈X} (in our case, ϕ_x is going to be a bump A^l δ_x). We greedily build a first layer of basis functions Φ_0 = {ϕ_{0,x_k}}_k as follows. We let ϕ_{0,x_0} be a basis function with greatest ℒ²-norm. Then we let ϕ_{0,x_1} be a basis function with largest ℒ²-norm among the basis functions with support disjoint from the support of ϕ_{0,x_0} but not farther than δ from it. By induction, after ϕ_{0,x_0}, ..., ϕ_{0,x_l} have been chosen, we let ϕ_{0,x_{l+1}} be a basis function with largest ℒ²-norm among those having a support that does not intersect any of the supports of the basis functions already constructed but is not farther than δ from the closest such support. We stop when no such choice can be made. One can think of Φ_0 roughly as a 2δ-lattice.
At this point, Φ_0 in general spans a subspace much smaller than the one spanned by Φ. We construct a second layer as follows: we orthogonalize each remaining ϕ_x to the functions in Φ_0. Observe that since the support of ϕ_x is small, this orthogonalization is local, in the sense that each ϕ_x needs to be orthogonalized only to the few elements of Φ_0 that have an intersecting support. In this way, we get a set that is orthogonal to Φ_0 but is not orthonormal itself. We orthonormalize it exactly as we did to get Φ_0 from Φ. We proceed by building as many layers as necessary to span the whole space 〈Φ〉 (up to the specified precision ε).
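The greedy first-layer selection can be sketched as follows, for identical triangular bumps on a sampled circle (the bump shape, n, and δ are illustrative choices; the actual basis functions are diffusion bumps A^l δ_x):

```python
import numpy as np

n, delta = 200, 6
x = np.arange(n)

def bump(c, width=4):
    """Triangular bump centered at c on a cycle of n points (illustrative)."""
    d = np.minimum(np.abs(x - c), n - np.abs(x - c))   # circular distance
    return np.where(d <= width, 1.0 - d / (width + 1.0), 0.0)

phis = [bump(c) for c in range(n)]
supports = [set(np.nonzero(f)[0]) for f in phis]
norms = [float(np.linalg.norm(f)) for f in phis]

def set_dist(s, t):
    """Circular distance between two support index sets."""
    return min(min(abs(int(a) - int(b)), n - abs(int(a) - int(b)))
               for a in s for b in t)

chosen = [int(np.argmax(norms))]                        # greatest-norm start
while True:
    # candidates: support disjoint from all chosen, within delta of nearest
    cands = [i for i in range(n)
             if all(not (supports[i] & supports[j]) for j in chosen)
             and min(set_dist(supports[i], supports[j]) for j in chosen) <= delta]
    if not cands:
        break                                           # no admissible choice
    chosen.append(max(cands, key=lambda i: norms[i]))   # largest-norm candidate
print(len(chosen))  # roughly a 2*delta-lattice of bump centers
```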
Wavelets. We would like to construct bases {ψ_{j+1,k}}_k for the spaces W_{j+1}, j ≥ 0, such that V_{j+1} ⊕^⊥ W_{j+1} = V_j. To achieve this, after having built Φ_j and Φ_{j+1}, we can apply our modified Gram–Schmidt procedure with geometric pivoting to the functions of Φ_j projected onto the orthogonal complement of V_{j+1}, which will yield an orthonormal basis of wavelets for the orthogonal complement of V_{j+1} in V_j. Observe that each wavelet is the result of an orthogonalization process that is local, so the computation is again fast. To achieve numerical stability, we orthogonalize at each step the remaining ϕ_{j,k}s both to the wavelets built so far and to Φ_{j+1}. Wavelet subspaces can be recursively split further to obtain diffusion wavelet packets (4), which allow the application of the classical fast algorithms (13) for denoising (14), compression (15), and discrimination (16).
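A sketch of the wavelet subspace for one step of the hierarchy, with eigenspaces of a lazy cycle walk (our choice) standing in for V_0 and V_1; the wavelets are obtained here by projecting the finer basis off the coarser space and re-orthonormalizing via SVD, rather than by the localized Gram–Schmidt of the text.

```python
import numpy as np

n = 64
C = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
A = 0.5 * (np.eye(n) + C / 2.0)             # lazy walk on a cycle
lam, phi = np.linalg.eigh(A)
lam, phi = lam[::-1], phi[:, ::-1]          # decreasing eigenvalues

eps = 0.1
E0 = phi[:, lam >= eps]                     # orthonormal basis of V_0 (t_0 = 1)
E1 = phi[:, lam ** 3 >= eps]                # basis of V_1 (t_1 = 3); V_1 in V_0
R = (np.eye(n) - E1 @ E1.T) @ E0            # project the V_0 basis off V_1
U, s, _ = np.linalg.svd(R)
wav = U[:, : int((s > 1e-10).sum())]        # orthonormal basis of W_1
print(E0.shape[1], E1.shape[1], wav.shape[1])
```

The wavelet count equals dim V_0 – dim V_1, and the wavelets are orthogonal to V_1, i.e., V_1 ⊕⊥ W_1 = V_0.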
Examples and Applications
Example 3.1: Multiresolution diffusion on the homogeneous circle. To compare with classical constructions of wavelets, we consider the unit circle, sampled at 256 points, and the classical isotropic heat diffusion on it. The initial orthonormal basis Φ_0 is given by the set of δ-functions at each point, and we build the diffusion wavelets at all scales, which clearly relate to splines and multiwavelets. The spectrum of the diffusion operator does not decay very fast (see Figs. 2 and 3).
Example 3.2: Dumbbell. We consider a dumbbell-shaped manifold, sampled at 1,400 points, and the diffusion associated with the (discretized) Laplace–Beltrami operator as discussed in ref. 1. See Fig. 4 for plots of some scaling functions and wavelets: They exhibit the expected locality and multiscale features, dependent on the intrinsic geometry of the manifold.
Example 3.3: Multiresolution diffusion on a nonhomogeneous circle. We can apply the construction of diffusion wavelets to nonisotropic diffusions arising from partial differential equations, to tackle problems of homogenization in a natural way. The literature on homogenization is vast (see, e.g., refs. 17–21 and references therein).
Our definition of scales, driven by the differential operator, in general results in highly nonuniform and nonhomogeneous spatial and spectral scales, and in corresponding coarse equations of the system, which have high precision.
For example, we can consider the nonhomogeneous heat equation on the circle

∂u/∂t = ∂/∂x (c(x) ∂u/∂x),  [4]

where c(x) is a positive function close to 0 at certain points and almost 1 at others. We want to represent the intermediate- and large-scale/time behavior of the solution by compressing powers of the operator representing the discretization of the spatial differential operator ∂/∂x(c(x)(∂/∂x)). The spatial differential operator on the right-hand side of Eq. 4 is a matrix T that, when properly normalized, can be interpreted as a non-translation-invariant random walk. Our construction yields a multiresolution associated to this operator that is highly nonuniform, with most scaling functions concentrated around the points where the conductivity is highest, for several scales. The compressed matrices representing the (dyadic) powers of this operator can be viewed as multiscale homogenized versions, at a certain scale that is time and space dependent, of the original operator (see Fig. 5).
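A sketch of the discretized operator of Eq. 4 and its random-walk normalization (the conductivity profile c below, with a dip near x = 0.5, is an illustrative choice):

```python
import numpy as np

n = 128
x = np.arange(n) / n
c = 1.0 - 0.95 * np.exp(-((x - 0.5) ** 2) / 0.005)  # conductivity dips near 0.5

# Flux form of d/dx(c(x) d/dx): weights are conductivities at half-grid points
ch = 0.5 * (c + np.roll(c, -1))                     # c at points i + 1/2
W = np.zeros((n, n))
i = np.arange(n)
W[i, (i + 1) % n] = ch                              # flux through i + 1/2
W[i, (i - 1) % n] = np.roll(ch, 1)                  # flux through i - 1/2
P = W / W.sum(axis=1, keepdims=True)                # non-translation-invariant walk
print(P.sum(axis=1)[:3])
```

The weight matrix is symmetric (detailed balance with respect to the conductivities), and the row-stochastic normalization is the random walk whose dyadic powers the construction compresses.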
While the examples above illustrate classical settings, the construction of diffusion wavelets carries over unchanged to weighted graphs, by considering the generator of the diffusion associated to the natural random walk (and Laplacian) on the graph. It then allows a natural multiscale analysis of functions of interest on such a graph. We expect this to have a wide range of applications to the analysis of large data sets, document corpora, network traffic, etc., which are naturally modeled by graphs.
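For a weighted graph, the operators involved can be sketched as follows (the dense random weight matrix is an illustrative choice): the random walk D^{–1}W and its self-adjoint conjugate D^{–1/2}WD^{–1/2} share the same spectrum, so either can serve as the generator of the diffusion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                          # symmetric weighted graph
d = W.sum(axis=1)                               # degrees
P = W / d[:, None]                              # random walk D^{-1} W
A = np.diag(d ** -0.5) @ W @ np.diag(d ** -0.5) # self-adjoint conjugate
lp = np.sort(np.linalg.eigvals(P).real)
la = np.sort(np.linalg.eigvalsh(A))
print(np.allclose(lp, la, atol=1e-8))           # conjugate: same spectrum
```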
Extension of Empirical Functions of the Data Set
An important aspect of the multiscale analysis developed so far involves the relation of the spectral theory on the set to the localization, on and off the set, of the corresponding eigenfunctions and diffusion scaling functions and wavelets. In addition to the theoretical interest of this topic, the extension of functions defined on a set X to a larger set X̄ is of critical importance in applications such as statistical learning. To this end, we construct a set of functions, termed geometric harmonics, that allow one to extend a function f off the set X, and we explain how this provides a multiscale analysis of f. For a more detailed study of geometric harmonics, the reader is referred to ref. 22.
Construction of the Extension: The Geometric Harmonics. Let us specify the mathematical setting. Let X be a set contained in a larger set X̄, and let μ be a measure on X. Suppose that one is given a positive semidefinite symmetric kernel k(·, ·) defined on X̄ × X̄, and define K: L²(X, μ) → L²(X, μ) by

(Kf)(x) = ∫_X k(x, y)f(y)dμ(y).

Let {ψ_j} and {λ_j} be the eigenfunctions and eigenvalues of this operator. Note that under weak hypotheses, the operator K is compact, and its eigenfunctions form a basis of L²(X, μ). Then, by definition, if λ_j ≠ 0,

ψ_j(x) = (1/λ_j) ∫_X k(x, y)ψ_j(y)dμ(y),

where this identity holds for x ∈ X. Now, if we let x be in X̄, the right-hand side of this equation is well defined, and this allows one to extend ψ_j as a function defined on X̄. This procedure, which goes by the name of Nyström extension, has already been suggested to overcome the problem of large-scale data sets (23) and to speed up data processing (24).
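A sketch of the Nyström extension for a Gaussian kernel on random planar points (the kernel, bandwidth, and data below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(40, 2))           # the set X, uniform weights
Xbar = rng.uniform(-1, 1, size=(10, 2))        # points off X to extend to

def k(a, b, s=0.5):
    """Gaussian kernel between two point sets (illustrative choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

K = k(X, X) / len(X)                           # integral operator on L^2(X, mu)
lam, psi = np.linalg.eigh(K)
lam, psi = lam[::-1], psi[:, ::-1]             # decreasing eigenvalues

# Nystrom: psi_j(x) = (1/lambda_j) * mean_y k(x, y) psi_j(y)
psi_ext = k(Xbar, X) @ psi[:, 0] / (len(X) * lam[0])   # extension off X
psi_self = k(X, X) @ psi[:, 0] / (len(X) * lam[0])     # same formula on X
print(psi_ext.shape, np.allclose(psi_self, psi[:, 0]))
```

On X itself the integral formula reproduces the eigenfunction exactly, which is what justifies evaluating the same expression at points off X.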
From the above, each extension is constructed as an integral of the values over the smaller set X and consequently verifies some sort of mean value theorem. We call these functions geometric harmonics.
From the numerical analysis point of view, one has to be careful, as λ_j → 0 when j → +∞, and one can extend only the eigenfunctions ψ_j for which λ_j ≥ δλ_0, where δ > 0 is a preset number. We can now safely define the extension of a function f from X to X̄ by

f̄(x) = Σ_{j: λ_j ≥ δλ_0} 〈f, ψ_j〉_X ψ̄_j(x)

for x ∈ X̄, where ψ̄_j denotes the Nyström extension of ψ_j and 〈·, ·〉_X is the inner product of L²(X, μ). This way, the extension operation has condition number 1/δ. We immediately notice that for f̄ to approximately coincide with f on X, one must have that most of the energy of f be concentrated in the first few eigenfunctions ψ_j.
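The thresholded extension can be sketched as follows (the kernel, data, δ, and test function are illustrative choices): only eigenfunctions with λ_j ≥ δλ_0 are kept, and f is recovered on X to the extent that its energy sits on the retained eigenfunctions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(60, 1))
f = np.sin(3 * X[:, 0])                       # a smooth empirical function on X

d2 = (X[:, None, 0] - X[None, :, 0]) ** 2
K = np.exp(-d2 / 0.1) / len(X)                # Gaussian kernel operator on X
lam, psi = np.linalg.eigh(K)
lam, psi = lam[::-1], psi[:, ::-1]            # decreasing eigenvalues

delta = 1e-8
keep = lam >= delta * lam[0]                  # eigenfunctions safe to extend
coeffs = psi[:, keep].T @ f                   # <f, psi_j>_X for retained modes
f_rec = psi[:, keep] @ coeffs                 # f restricted back to X
print(int(keep.sum()), float(np.abs(f_rec - f).max()))
```

The numerically unsafe modes are discarded, yet the smooth f is reproduced on X almost exactly, since its energy is concentrated on the leading eigenfunctions.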
Let us give three examples of geometric harmonics. The first example is related to potential theory. Assume that X is a smooth closed hypersurface of ℝⁿ, dμ = dx, and consider the Newtonian potential kernel in ℝⁿ. Then the geometric harmonics have the form ψ̄_j(x) = (1/λ_j)∫_X k(x, y)ψ_j(y)dy and are obviously harmonic in the domain with boundary X. If f is a function on X representing the single-layer density of charges on X, then the extension f̄ is, by construction, a sum of harmonic functions and a harmonic extension of f.
For the second example, consider a Hilbert basis {e_j} of a subspace V. For instance, this could be a wavelet basis of some finite-scale shift-invariant space. Then the diagonalization of the restriction of the kernel k(x, y) = Σ_j e_j(x)e_j(y) to a set X generates geometric harmonics, and an extension procedure from empirical functions on X to functions of V.
The third example is of particular importance because it generalizes the prolate spheroidal wave functions introduced in the context of signal processing in refs. 25 and 26. Assume that X ⊂ ℝⁿ and consider the space V_B of bandlimited functions with fixed band B > 0 (we call these functions B-bandlimited). Following the procedure explained in the second example, we can construct geometric harmonics that are B-bandlimited. It can be shown that this comes down to diagonalizing the kernel

k_B(x, y) = ∫_{|ξ|≤B} e^{i〈x–y,ξ〉} dξ = (2πB/|x – y|)^{n/2} J_{n/2}(B|x – y|),

where x and y belong to X, and J_ν is the Bessel function of the first kind and of order ν. From the first equality sign, we see that the geometric harmonics arise from a principal component analysis of the set of all restrictions of B-bandlimited complex exponentials to X.
It can be verified that, in addition to being orthogonal on the set X, these B-bandlimited geometric harmonics are also orthogonal over the whole space ℝⁿ. Moreover, ψ̄_j minimizes the Rayleigh quotient ‖f‖²_{ℒ²(ℝⁿ)}/‖f‖²_{ℒ²(X)} under the constraint that f be orthogonal to {ψ_0, ψ_1, ..., ψ_{j–1}}. In other words, ψ̄_j is the B-bandlimited extension of ψ_j that has minimal energy on ℝⁿ. As a consequence, f̄ is the B-bandlimited extension of f that has minimal energy off the set X. This type of extension is optimal in the sense that it is the average of all B-bandlimited extensions of f. It also suggests that this extension satisfies Occam's razor in that it is the "simplest" among all bandlimited extensions: Any other extension is equal to f̄ plus an orthogonal bandlimited function that vanishes on X.
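In one dimension the kernel above reduces to the sinc kernel sin(B(x – y))/(π(x – y)), and diagonalizing its restriction to an interval recovers the prolate spheroidal wave functions of refs. 25 and 26. A discretized sketch (the grid, band, and quadrature rule are illustrative choices):

```python
import numpy as np

B = 20.0
x = np.linspace(-1, 1, 400)                 # X = [-1, 1], discretized
h = x[1] - x[0]                             # simple quadrature weight
D = x[:, None] - x[None, :]
# sin(B(x-y))/(pi(x-y)) via np.sinc, which handles the diagonal x = y
K = (B / np.pi) * np.sinc(B * D / np.pi) * h
lam = np.linalg.eigvalsh(K)[::-1]           # concentration eigenvalues in (0, 1)
near_one = int((lam > 0.5).sum())           # roughly the Shannon number 2B/pi
print(round(float(lam[0]), 4), near_one)
```

The top eigenvalue sits just below 1, and the number of well-concentrated eigenfunctions is close to the classical time–bandwidth count 2B/π.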
Multiscale Extension. For a given function f on X, we have constructed a minimal-energy B-bandlimited extension f̄. In the case when X is a smooth compact submanifold of ℝⁿ, we can now relate the spectral theory on the set X to that of the ambient space ℝⁿ.
On the one hand, any bandlimited function of band B > 0 restricted to X can be expanded to exponential accuracy in terms of the eigenfunctions of the Laplace–Beltrami operator Δ with eigenvalues not exceeding CB² for some small constant C > 0. On the other hand, it can be shown that every eigenfunction of the Laplace–Beltrami operator satisfying this condition extends as a bandlimited function with band C′B. Both of these statements can be proved by observing that eigenfunctions on the manifold are well approximated by restrictions of bandlimited functions.
We conclude that any empirical function f on X can be approximated as a linear combination of eigenfunctions of Δ, and that these eigenfunctions can be extended to different distances: If the eigenvalue is ν², then the corresponding eigenfunction can be extended as a ν-bandlimited function off the set X to a distance Cν^{–1}. This observation constitutes a formulation of the Heisenberg principle involving the Fourier analysis on and off the set X: any empirical function can be extended as a sum of "atoms" whose numerical supports in the ambient space are related to their frequency content on the set.
The generalized Heisenberg principle is illustrated in Fig. 6, where we show the extension of the functions f_j(θ) = cos(2πjθ) for j = 1, 2, 3, and 4, from the unit circle to the plane. For each function, we used Gaussian kernels, and the scale was adjusted as the maximum scale that would preserve a given accuracy.
Conclusion
We have introduced a multiscale structure for the efficient computation of large powers of a diffusion operator, and its Green function, based on a generalization of wavelets to the general setting of discretized manifolds and graphs. This has application to the numerical solution of partial differential equations and to the analysis of functions on large data sets and learning. We have shown that a global (with eigenfunctions of the Laplacian) or local (with diffusion wavelets) analysis on a manifold embedded in Euclidean space can be extended outside the manifold in a multiscale fashion by using bandlimited functions.
Acknowledgments
We thank James C. Bremer, Jr., Naoki Saito, Raanan Schul, and Arthur D. Szlam for their useful comments and suggestions during the preparation of the manuscript. This work was partially funded by the Defense Advanced Research Projects Agency and the Air Force Office of Scientific Research.
Footnotes

† To whom correspondence should be addressed. E-mail: coifmanronald@yale.edu.

Author contributions: R.R.C., M.M., and S.W.Z. designed research; R.R.C., S.L., A.B.L., M.M., B.N., F.W., and S.W.Z. performed research; S.L. and M.M. analyzed data; and R.R.C., S.L., and M.M. wrote the paper.

§ By Weyl's theorem on the distribution function of the spectrum of the Laplace–Beltrami operator, this is the case when A is an accurate enough discretization of the Laplace–Beltrami operator on a smooth compact Riemannian manifold with smooth boundary.
 Copyright © 2005, The National Academy of Sciences
References

Coifman, R. R., Lafon, S., Lee, A. B., Maggioni, M., Nadler, B., Warner, F. & Zucker, S. W. (2005) Proc. Natl. Acad. Sci. USA 102, 7426–7431. pmid:15899970

Coifman, R. R. & Maggioni, M. (2004) Tech. Rep. YALE/DCS/TR-1289 (Dept. of Computer Science, Yale Univ., New Haven, CT).

Coifman, R. R. & Maggioni, M. (2004) Tech. Rep. YALE/DCS/TR-1303 (Yale Univ.) and Appl. Comp. Harmonic Anal., in press.

Bremer, J. C., Jr., Coifman, R. R., Maggioni, M. & Szlam, A. D. (2004) Tech. Rep. YALE/DCS/TR-1304 (Yale Univ.) and Appl. Comp. Harmonic Anal., in press.

Beylkin, G., Coifman, R. R. & Rokhlin, V. (1991) Comm. Pure Appl. Math. 44, 141–183.

Brandt, A. (1986) Appl. Math. Comp. 19, 23–56.

Stein, E. (1970) Topics in Harmonic Analysis Related to the Littlewood–Paley Theory (Princeton Univ. Press, Princeton), Vol. 63.

Meyer, Y. (1990) Ondelettes et Opérateurs (Hermann, Paris).

Daubechies, I. (1992) Ten Lectures on Wavelets (Soc. for Industrial and Appl. Math., Philadelphia).

Coifman, R. R., Meyer, Y., Quake, S. & Wickerhauser, M. V. (1993) in Progress in Wavelet Analysis and Applications (Toulouse, 1992) (Frontières, Gif-sur-Yvette, France), pp. 77–93.

Donoho, D. L. & Johnstone, I. M. (1994) Ideal Denoising in an Orthonormal Basis Chosen from a Library of Bases (Stanford Univ., Stanford, CA).

Coifman, R. R. & Saito, N. (1994) C. R. Acad. Sci. 319, 191–196.

Babuska, I. (1976) in Numerical Solutions of Partial Differential Equations (Academic, New York), Vol. 3, pp. 89–116.

Gilbert, A. C. (1998) Appl. Comp. Harmonic Anal. 5, 1–35.

Beylkin, G. & Coult, N. (1998) Appl. Comp. Harmonic Anal. 5, 129–155.

Hackbusch, W. (1985) Multigrid Methods and Applications (Springer, New York).

Coifman, R. R. & Lafon, S. (2004) Appl. Comp. Harmonic Anal., in press.

Williams, C. & Seeger, M. (2001) in Advances in Neural Information Processing Systems 13, eds. Leen, T. K., Dietterich, T. G. & Tresp, V. (MIT Press, Cambridge, MA), pp. 682–688.

Slepian, D. & Pollak, H. O. (1961) Bell Syst. Tech. J. 40, 43–64.

Slepian, D. (1964) Bell Syst. Tech. J. 43, 3009–3058.