# Ergodic theorem, ergodic theory, and statistical mechanics

Edited by Kenneth A. Ribet, University of California, Berkeley, CA, and approved January 9, 2015 (received for review November 13, 2014)

## Abstract

This perspective highlights the mean ergodic theorem established by John von Neumann and the pointwise ergodic theorem established by George Birkhoff, proofs of which were published nearly simultaneously in PNAS in 1931 and 1932. These theorems were of great significance both in mathematics and in statistical mechanics. In statistical mechanics they provided a key insight into a 60-y-old fundamental problem of the subject, namely, the rationale for the hypothesis that time averages can be set equal to phase averages. The evolution of this problem is traced from the origins of statistical mechanics and Boltzmann's ergodic hypothesis to the Ehrenfests' quasi-ergodic hypothesis, and then to the ergodic theorems. We discuss communications between von Neumann and Birkhoff in the fall of 1931 leading up to the publication of these papers and related issues of priority. These ergodic theorems initiated a new field of mathematical research called ergodic theory that has thrived ever since, and we discuss some recent developments in ergodic theory that are relevant for statistical mechanics.

George D. Birkhoff (1) and John von Neumann (2) published separate and virtually simultaneous path-breaking papers in which the two authors proved slightly different versions of what came to be known (as a result of these papers) as the ergodic theorem. The techniques that they used were strikingly different, but they arrived at very similar results. The ergodic theorem, when applied, say, to a mechanical system such as one might meet in statistical mechanics or in celestial mechanics, allows one to conclude remarkable results about the average behavior of the system over long periods of time, provided that the system is metrically transitive (a concept to be defined below). These two papers not only provided a key insight into a 60-y-old fundamental problem of statistical mechanics, namely the rationale for the hypothesis that time averages can be set equal to phase averages, but also initiated a new field of mathematical research called ergodic theory, which has thrived for more than 80 y. Research in ergodic theory since 1932 has further expanded the connection between the ergodic theorem and this core hypothesis of statistical mechanics.

The justification for this hypothesis is a problem that the originators of statistical mechanics, J. C. Maxwell (3) and L. Boltzmann (4), wrestled with beginning in the 1870s as did other early workers, but without mathematical success. J. W. Gibbs in his 1902 work (5) argued for his version of the hypothesis based on the fact that using it gives results consistent with experiments. The 1931–1932 ergodic theorem applied to the phase space of a mechanical system that arises in statistical mechanics and to the one-parameter group of homeomorphisms representing the time evolution of the system asserts that for almost all orbits, the time average of an integrable function on phase space is equal to its phase average, provided that the one-parameter group is metrically transitive. Hence, the ergodic theorem transforms the question of equality of time and phase averages into the question of whether the one-parameter group representing the time evolution of the system is metrically transitive.

To be more specific about statistical mechanics systems, consider a typical situation in gas dynamics where one has a macroscopic quantity of a dilute gas enclosed in a finite container. The molecules are in motion, colliding with each other and with the hard walls of the container. The molecules can be assumed, for instance, to be hard spheres (billiard balls) bouncing off each other, or alternatively may be assumed to be polyatomic molecules with internal structure whose collisions are governed by short-range repelling potentials. One may also choose to include the effects of external forces, such as gravity, on the molecules. We assume that the phase space *M* consists of a surface of constant energy. This assumption, together with the finite extent of the container, ensures that *M* is compact and that the invariant measure derived from Liouville’s theorem is finite. The equations of motion, say in Hamiltonian form, can be written in local coordinates as a first-order system of ordinary differential equations

$$\frac{dx_i}{dt} = X_i(x_1, \ldots, x_n), \qquad i = 1, \ldots, n.$$

First, the number of variables in the equations is enormous, perhaps on the order of Avogadro’s number, and the equations are quite complex. The system is perfectly deterministic in principle; hence, given the initial positions and momenta of all of the molecules at an initial time, the system evolves deterministically. However, there is no chance of knowing these initial conditions exactly and little chance of integrating these equations to find the solutions. Given therefore that we only have partial information about the system, a statistical approach to the analysis of such systems is appropriate and necessary. Maxwell (6) and Boltzmann (7) began such a project, which was further developed and elaborated by Gibbs (5).

The system of differential equations above generates a flow, which we denote $P_t$: the point $P_t(x)$ of *M* represents the system at time *t* if it was in state *x* at time 0. Each $P_t$ is a homeomorphism of *M* onto itself defined for all *t* (as the system is time reversible), and the flow satisfies the group property $P_{t+s} = P_t P_s$. The flow preserves the Liouville measure *μ*, which is finite in this case.

Now if *f* is an integrable function on the phase space *M*, one may argue that a physical measurement of *f* on a system that is in state *x* takes some time to carry out, so that what is actually observed is a time average

$$\frac{1}{T}\int_0^T f(P_t(x))\,dt.$$

This is a time average of the values of *f* over a part of the path of the solution of the equations with value *x* at time 0. One then idealizes the observed value as the limit as *T* tends to infinity of the time average above. The first problem is, of course, to know that this limit exists, except perhaps for a negligible set of *x*. Then the assumption of equality of time averages and phase averages would assert that the limit of the time average above is independent of *x* and equal to the phase average

$$\frac{1}{\mu(M)}\int_M f\,d\mu$$

for all but a negligible set of *x*, where *μ* is the Liouville invariant measure. The significant and useful point is that this phase average can be calculated in many cases, whereas the time average cannot.

Another way of phrasing this equality is to use for *f* the indicator function of a measurable subset *A* of *M* (a function that is equal to 1 on *A* and 0 outside of *A*). Then the time averages above record the fraction of time that the orbit spends in *A*, and the basic hypothesis of statistical mechanics asserts that this is equal for almost all orbits to the Liouville measure of *A* (assuming that the measure of the total space *M* has been normalized to 1). Yet another way of expressing this equality is to assert that for all but a negligible set of states of the gas, the observed value of a function *f* will be equal to the average value of *f* taken over *M*, that is, an average value of *f* taken over an ensemble (to use Gibbs’s language) of all possible states with the same energy. The constant energy surface *M* with its invariant volume element is what Gibbs called the microcanonical ensemble.

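The ergodic theorems of Birkhoff and von Neumann assert first of all the existence of the limit as $T \to \infty$ of the time average for any measure-preserving flow $P_t$ and secondly, under the assumption that the flow is metrically transitive, the equality

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T f(P_t(x))\,dt = \frac{1}{\mu(M)}\int_M f\,d\mu.$$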

The difference between the two theorems is that Birkhoff proved that the time averages on the left side, as functions of *x*, converge pointwise almost everywhere (the limit in general can be identified as the conditional expectation of *f* with respect to the sigma field of invariant sets, to use the language of probability theory). In the case of metric transitivity, this function is just the constant function equal to the integral of *f* over *M*. von Neumann proved that these functions of *x* converge in mean square [that is, in $L^2(M, \mu)$] to the projection of *f* onto the closed subspace of invariant functions, which in the metrically transitive case is one dimensional, consisting of the constant functions. Birkhoff assumes the function *f* is bounded and measurable, whereas von Neumann assumes the more general condition that the function *f* is square integrable. Although both theorems were originally formulated and proved for measure preserving one-parameter groups generated by first-order differential equations on compact manifolds, subsequent work has shown, using the same arguments, that these results are valid for a much broader class of dynamical systems, including one-parameter families of measure preserving transformations of a finite measure space that need not be defined by systems of differential equations. Later work also showed that Birkhoff’s theorem holds for any integrable function *f*. Thus, these theorems are theorems about one-parameter groups of automorphisms of measure spaces with no mention of topology. The theorems also clarify what is meant in statistical mechanics by the informal term negligible set: it is a set of *μ*-measure zero. It should be added that the time average does not have to be taken from 0 to *T*, but can be taken over any intervals from *S* to *T* as long as *T* − *S* tends to infinity.

Before moving on to subsequent developments in ergodic theory, it is worth pausing to examine the sequence of events leading to the proofs and publication of the two ergodic theorems: the pointwise ergodic theorem of Birkhoff and the mean ergodic theorem of von Neumann. Much of this was laid out in a subsequent paper of Birkhoff and Koopman (8) in the March 1932 issue of the PNAS. von Neumann was very much aware of the results of M. H. Stone (9) on spectral theory of one-parameter groups of unitary operators and the results of Koopman (10) that used Stone’s results to analyze one-parameter groups of measure preserving transformations. Koopman had indeed suggested to von Neumann that he might use these results to resolve the problem of equality of time and phase averages, and von Neumann writes that Andre Weil had made the very same suggestion to him. von Neumann seized on the notion of metric transitivity, introduced, somewhat ironically, by Birkhoff and Smith (11) a few years earlier in 1928, and proved his mean ergodic theorem under the hypothesis of metric transitivity. See the article by Mackey (12) for more details.

According to Birkhoff and Koopman, von Neumann communicated his result personally to both of them on October 22, 1931, and pointed out to them that his result raised the important question of whether a pointwise result might be valid. Birkhoff then went to work and, by different methods, quickly established his pointwise ergodic theorem. He submitted his paper to PNAS on December 1, 1931, for appearance in the December 1931 issue. One presumes that he sent copies to Koopman and von Neumann, who would have noticed that Birkhoff had not given von Neumann adequate credit and recognition for his result. von Neumann evidently planned to include his ergodic theorem and its proof in a much longer paper he was writing for the *Annals of Mathematics*, but he then apparently quickly drafted a short paper for PNAS with his proof of the mean ergodic theorem and submitted it to PNAS on December 10, 1931. It appeared in the January 1932 issue. One suspects that these events led Koopman and Birkhoff to write and publish their paper in PNAS 2 months later, which set matters straight and clearly acknowledged von Neumann’s priority. It should also be noted that E. Hopf (13) presented a slightly different proof of the mean ergodic theorem and some improvements on the Birkhoff theorem in a paper that appeared in the January 1932 issue of PNAS. For whatever reason, the Birkhoff paper and its result have over time become the better known of the two, but in light of these historical details, the von Neumann paper deserves at least equal billing.

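There are also ergodic theorems for a single measure-preserving map *P* and its iterates $P^n$: the averages of *f* over the first *N* iterates converge, and the limit is the phase average when *P* is metrically transitive; that is,

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} f(P^n(x)) = \frac{1}{\mu(M)}\int_M f\,d\mu.$$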

The convergence is pointwise for almost all *x* for integrable *f*, and in the mean for square integrable *f*. One way to conceive of metric transitivity and the ergodic theorem for a single transformation is that for almost all points *x*, the iterates $P^n(x)$ of *x* under *P* are distributed in some sense evenly throughout the space, so that taking the average of a function *f* over these points gives a good approximation to the integral of *f* over the space, and the more iterates one includes in the average, the better the approximation. In this sense, it is like a numerical integration scheme.
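The numerical integration analogy can be tried out directly. The following is a minimal sketch, not taken from the papers discussed here: it assumes a rotation of the unit interval, *x* ↦ *x* + *α* (mod 1) with *α* irrational, as the measure-preserving map, and the names `birkhoff_average`, `ALPHA`, and the test function `f` are illustrative choices. The integral of `f` over the interval is 1/2, so the printed error shows how well the average over the iterates approximates it.

```python
import math


def birkhoff_average(f, x0, alpha, n_terms):
    """Average f over the first n_terms iterates of x -> x + alpha (mod 1)."""
    total = 0.0
    for n in range(n_terms):
        # Compute the n-th iterate directly rather than by repeated addition,
        # so floating-point round-off does not accumulate along the orbit.
        x = (x0 + n * alpha) % 1.0
        total += f(x)
    return total / n_terms


def f(x):
    """Test function whose integral over [0, 1) is exactly 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2


# Assumed rotation number: the golden-ratio conjugate, which is irrational.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

averages = {n: birkhoff_average(f, 0.2, ALPHA, n) for n in (10, 1000, 100000)}
for n, avg in averages.items():
    print(n, avg, abs(avg - 0.5))
```

The starting point 0.2 is arbitrary. For this particular map the averages converge for every starting point, which is stronger than the almost-everywhere statement of the general theorem.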

Finally, we need to define metric transitivity, a concept, as previously noted, that was introduced by Birkhoff and Smith (11). A one-parameter group of measure preserving transformations $P_t$ (or a single transformation *P*) on a measure space *M* with measure *μ* is metrically transitive provided that any measurable set invariant under $P_t$ (or *P*) must have zero measure or its complement must have zero measure. This means that the flow is indecomposable or irreducible in the sense that one cannot decompose it into a union of two disjoint subflows. It also means that there are no nonconstant measurable functions invariant under the flow (or the transformation *P*).

It is heuristically reasonable to argue, owing to the molecular chaos in gas dynamics, that there are no nonconstant continuous invariants or so-called first integrals of the motion. However, more is required for metric transitivity—namely no nonconstant measurable invariants of the motion. In the example from gas dynamics, the total energy is clearly an invariant of the motion, but we have restricted the flow to a surface of constant energy. In addition, total momentum is normally preserved, but although momentum is preserved in collisions between molecules, collisions with the walls do not preserve momentum, so this possible invariant of the motion disappears. Although the term metric transitivity is still in use, current terminology, due to von Neumann, is that any flow or single transformation with this property is simply called ergodic.

It is worth observing that metric transitivity is a necessary and sufficient condition for the validity of the ergodic theorem. To see that it is necessary, assume the ergodic theorem holds and apply the statement of the theorem to the indicator function *f* of a supposed invariant measurable set *A*; that is, *f* is equal to 1 on *A* and 0 on the complement of *A*. Because *A* is invariant, the time averages on the left side are always equal to *f*, but the right side is a constant function. Hence, *f* is a constant function, and the alleged invariant set is of measure zero or its complement is of measure zero.
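This argument can be illustrated on a map that is not metrically transitive. The sketch below is a hypothetical example, not one from the text: it assumes the rational rotation *x* ↦ *x* + 1/2 (mod 1) of the unit interval, for which the set *A* = [0, 1/4) ∪ [1/2, 3/4) is invariant and has measure 1/2. The names `indicator_A` and `time_average` are illustrative. The time average of the indicator function reproduces the indicator function itself, so it depends on the starting point and is not equal to the measure of *A*.

```python
def indicator_A(x):
    """Indicator of A = [0, 1/4) U [1/2, 3/4), invariant under x -> x + 1/2 (mod 1)."""
    return 1.0 if (x % 0.5) < 0.25 else 0.0


def time_average(f, x0, n_terms, step=0.5):
    """Average f over the first n_terms iterates of x -> x + step (mod 1)."""
    total = 0.0
    for n in range(n_terms):
        # n-th iterate computed directly to avoid accumulated round-off.
        x = (x0 + n * step) % 1.0
        total += f(x)
    return total / n_terms


measure_of_A = 0.5
inside = time_average(indicator_A, 0.1, 1000)   # orbit starts in A
outside = time_average(indicator_A, 0.3, 1000)  # orbit starts outside A

print(inside, outside, measure_of_A)
```

An orbit that starts in *A* never leaves it and one that starts outside never enters, so neither time average equals the phase average 1/2, and equality of time and phase averages fails for this map.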

It is interesting to look back at the early history of statistical mechanics to see how the founders of the subject handled the topic of time averages and space averages. Boltzmann (4) coined the terms ergoden or ergodische (which we translate as ergodic) from the Greek ἔργον (work or energy) and ὁδός (path).

In their influential 1911 article, Ehrenfest and Ehrenfest (17) summarized and discussed problems with the ergodic hypothesis and then proposed instead the quasi-ergodic hypothesis as a replacement. This hypothesis states that some orbit of the flow will pass arbitrarily close to every point of phase space, or in other words, this orbit is topologically dense in the phase space. This hypothesis is far more plausible than the old ergodic hypothesis, and it does imply that any continuous function invariant under the flow is constant. Some authors [von Plato (18)] have argued that, despite what Boltzmann wrote most of the time in his articles about the ergodic hypothesis, he probably really meant something like what was later termed the quasi-ergodic hypothesis. However, the quasi-ergodic hypothesis does not imply metric transitivity. For instance, it is not even true that a minimal flow (every orbit is dense) with an invariant measure is metrically transitive [see Furstenberg (19) for examples]. Therefore, although the original ergodic hypothesis was too strong to be plausible, the quasi-ergodic hypothesis was too weak to establish equality of time and phase averages. Further mathematical progress had to await the concept of metric transitivity and the ergodic theorems of 1931 and 1932. For more details, see the survey article of Mackey (20).

One reaction to the Birkhoff and von Neumann ergodic theorems might be that they do not really solve the problem of equating time averages and phase averages but only reduce it to a possibly equally difficult problem of proving metric transitivity. For instance, how can one prove that a one-parameter flow is metrically transitive, and indeed how does one know that metrically transitive systems exist at all? At this point, let us transfer to current terminology and simply call metrically transitive transformations ergodic, as von Neumann suggested.

As to the existence of ergodic transformations, Oxtoby and Ulam (21) showed that on a compact polyhedron *M* equipped with a finite Lebesgue–Stieltjes measure, the set of all ergodic measure preserving homeomorphisms is a dense $G_\delta$ set in the group of all measure preserving homeomorphisms of *M*.

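von Neumann in his *Annals of Mathematics* paper (23) provides an intriguing and powerful answer to the existence problem. He shows that any one-parameter flow or any single transformation on a finite measure space can be written as a possibly continuous sum of ergodic ones. To make this precise, form the product of the unit interval *I*, with Lebesgue measure, and a measure space *M*; let *p* be the projection to the first factor *I* and denote the part of the product lying over *r* in *I* by $M_r$. Given an ergodic transformation $P_r$ on each $M_r$, depending measurably on *r*, one can piece together these $P_r$ into a single transformation *P* of the product. This *P* is not ergodic, because $p^{-1}(J)$ is an invariant set for every measurable subset *J* of *I*, but it is displayed as a continuous sum of ergodic ones, the $P_r$.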

Irrational rotations on a torus provide important examples of ergodic flows. We represent points on a 2D torus *T* by pairs of complex numbers $(z_1, z_2)$ of absolute value 1 and define the flow by

$$P_t(z_1, z_2) = (e^{2\pi i t} z_1,\; e^{2\pi i b t} z_2),$$

where *b* is irrational. All orbits are dense, so this flow is quasi-ergodic, and it is also ergodic. To see this, assume there is an invariant measurable set *A* and let *f* be the indicator function of this set (1 on *A* and 0 on its complement). Expand *f* in a double Fourier series and use the invariance of *A* under the flow to see that every Fourier coefficient other than the constant term vanishes. Hence *f* is constant, and this establishes ergodicity.
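The Fourier argument can be written out in a few lines. The sketch below assumes the flow is normalized as $P_t(z_1, z_2) = (e^{2\pi i t} z_1, e^{2\pi i b t} z_2)$; the factor $2\pi$ is a choice of convention, and the coefficients $c_{mn}$ are notation introduced only for this illustration.

$$
\begin{aligned}
f(z_1, z_2) &= \sum_{m, n \in \mathbb{Z}} c_{mn}\, z_1^{m} z_2^{n}, \\
f(P_t(z_1, z_2)) &= \sum_{m, n \in \mathbb{Z}} c_{mn}\, e^{2\pi i (m + n b) t}\, z_1^{m} z_2^{n}, \\
f \circ P_t = f \ \text{for all } t \quad &\Longrightarrow \quad c_{mn}\left(e^{2\pi i (m + n b) t} - 1\right) = 0 \ \text{for all } t \text{ and all } m, n .
\end{aligned}
$$

If $c_{mn} \neq 0$, then $m + nb = 0$, and because $b$ is irrational this forces $m = n = 0$. So $f = c_{00}$ almost everywhere, and because $f$ takes only the values 0 and 1, the set $A$ has measure zero or full measure.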

An important set of examples for the subsequent development of ergodic theory is the shift transformations. Let *F* be a finite set of *n* elements and assign a probability measure to *F*; that is, nonnegative numbers $p_1, \ldots, p_n$ that sum to 1. Form the product *M* of the measure space *F* with itself using the integers as the index set. Thus, *M* consists of doubly infinite sequences of points of *F*, $s = (s_i)$ with each $s_i$ in *F*. The shift transformation is defined by $(Ps)_i = s_{i+1}$, so *P* has the effect of shifting a sequence *s* one place. *P* also preserves the product measure on *M*, and it is easily seen to be ergodic. Indeed, any invariant set is, in the language of probability theory, statistically independent of the family of subsets of *M* that are defined by any finite set of indices. Hence, it is a so-called tail event, and by the Kolmogorov zero–one law of probability theory, it has probability zero or one. Hence, shifts are ergodic.

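Specializing to the case $n = 2$ with $p_1 = p_2 = 1/2$, *M* is the probability model of doubly infinite sequences of the results of tossing a fair coin. Letting *f* be the function on *M* that is equal to 1 if the coordinate $s_0$ of *s* is heads and 0 if it is tails, the average of *f* over the first *N* iterates in the ergodic theorem is

$$\frac{1}{N}\sum_{k=0}^{N-1} f(P^k(s)),$$

which is simply the proportion of heads in the first *N* tosses. The ergodic theorem asserts that this proportion converges for almost all *s* to the integral of *f* over *M*, which is $1/2$.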
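The coin-tossing case is easy to simulate. The following is a minimal sketch under stated assumptions: a pseudorandom generator with a fixed seed stands in for a typical point *s* of *M*, heads is coded as 1, and the function name `heads_fraction` is an illustrative choice. Only the coordinates of *s* with index 0 through *N* − 1 enter the average, so only those are generated.

```python
import random


def heads_fraction(num_tosses, seed=0):
    """Ergodic average of f(s) = s_0 over the first num_tosses shifts of s."""
    rng = random.Random(seed)
    # Coordinates s_0, ..., s_{N-1} of a pseudorandom fair-coin sequence,
    # with 1 standing for heads and 0 for tails.
    tosses = [rng.randint(0, 1) for _ in range(num_tosses)]
    # Shifting k places and reading coordinate 0 gives s_k, so the ergodic
    # average is the mean of the first N coordinates.
    return sum(tosses) / num_tosses


for n in (100, 10000, 200000):
    print(n, heads_fraction(n))
```

For large *N* the printed fractions settle near 1/2, the integral of *f* over *M*. Different seeds play the role of different points *s*.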

Another set of examples of significance for ergodic theory as well as statistical mechanics is geodesic flows, in particular geodesic flows on compact Riemannian manifolds of negative curvature. First, consider the 2D case of a surface *M*. Geodesic flow takes place on the unit tangent bundle *T* of such a surface, which consists of pairs $(x, v)$, where *x* is a point of the surface *M* and *v* is a unit tangent vector at *x*. Then *T* consists of points of *M* with a circle above each point consisting of unit tangent vectors, so *T* is a 3D compact manifold. Geodesic flow $P_t$ on *T* moves a point *x* with tangent vector *v* a distance *t* along the geodesic determined by *v*; that is, $P_t(x, v) = (y, w)$, where *y* is the endpoint of the geodesic of length *t* and *w* is its tangent vector at *y*. Hedlund (24) and Hopf (25) independently established that such flows are ergodic and also later extended the result to higher dimensional manifolds with negative curvature. A key part of the reasoning was the fact that negative curvature makes nearby geodesics diverge exponentially from each other, so that the flow has sensitive dependence on initial conditions in that, even if two points *u* and *v* of *T* are very close together, $P_t(u)$ and $P_t(v)$ move far apart as *t* grows. This property is called hyperbolicity, and it is known to be key in many proofs showing that flows or single transformations are ergodic. It is also a property that a system of colliding gas molecules will have. If one perturbs very slightly the position and momentum of a gas molecule, then in its next collision, it will bounce off the other molecule in a quite different direction, a situation that is repeated in further collisions, resulting in rapid divergence of its path from the path of the unperturbed molecule.

There is a substantial and fascinating body of papers in ergodic theory pursuing a program that uses hyperbolicity to help prove that various approximations to a system of gas molecules confined in a container are ergodic. The program has not yet succeeded in proving that an actual model of a gas is ergodic, but the results are getting close. The first paper was the path-breaking paper of Sinai (26). Sinai redefined the problem from one where the gas molecules are contained in a cubical enclosure with reflecting walls to a model where the particles move in a cube with periodic boundary conditions. The molecules collide with each other but not with the walls. This is not physically realistic, but it can shed light on the original problem by analogy. Much progress has been made on this model, especially by Simanyi and Szasz (27) and Simanyi (28), where, using hyperbolicity as indicated above, a nearly complete resolution of ergodicity issues is established for such Sinai flows [see also the survey article by Szasz (29)].

Going back to the more realistic case of molecules in a container with reflecting walls, one should begin in dimension two with a single ball—a billiard flow in a planar region or table. If the table is a rectangle, the momenta of the ball over time can take on only a finite number of values so the flow on a phase space of fixed energy cannot be ergodic. However, if one has a billiard table of more complex geometry, the situation becomes more interesting. For instance, if the billiard table is a polygon, then Kerckhoff et al. (30) show that for topologically almost all polygons, the billiard flow on a phase space of constant energy is ergodic. In particular, one has to stay away from rational polygons where all of the angles are rational multiples of *π*. The authors point out an interesting corollary of this result, which is that a mechanical system of two particles of masses

Finally, Simanyi (31) has established that a system consisting of two hard spheres contained in a cube of any dimension at least two, bouncing off each other and off the hard walls, is ergodic on any surface of constant energy. This appears to be the rigorous ergodicity result that comes closest to an actual gas. Taken together, this whole array of theorems, of which we have mentioned only a few, suggests that the hypothesis of ergodicity (or metric transitivity) for a physical system like that of gas dynamics mentioned at the outset of this essay is very plausible.

## Footnotes

^{1}Email: ccmoore{at}math.berkeley.edu.

Author contributions: C.C.M. wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

This article is part of the special series of PNAS 100th Anniversary articles to commemorate exceptional research published in PNAS over the last century. See the companion articles, “Proof of the ergodic theorem” on page 656 in issue 12 of volume 17 and “Proof of the quasi-ergodic hypothesis” on page 70 in issue 1 of volume 18, and see Core Concepts on page 1914.

## References

1. Birkhoff GD
2. von Neumann JV
3. Maxwell JC
4. Boltzmann L
5. Gibbs JW
6. Maxwell JC
7. Boltzmann L
8. Birkhoff GD, Koopman BO
9. Stone MH, Operational Methods and Group Theory
10. Koopman BO
11. Birkhoff GD, Smith PA
12. Mackey
13. Hopf E
14. Plancherel M
15. Rosenthal A
16. Poincaré H
17. Ehrenfest P, Ehrenfest T. *Begriffliche Grundlagen der statistischen Auffassung in der Mechanik*, Enc Math Wiss, Vol 4, no 6 (Teubner, Leipzig, Germany); trans Moravcsik M (1959) [*The Conceptual Foundations of the Statistical Approach in Mechanics*] (Cornell Univ Press, Ithaca, NY)
18. von Plato
19. Furstenberg
20. Mackey
21. Oxtoby, Ulam
22. Markus L, Meyer K
23. von Neumann
24. Hedlund
25. Hopf E. *Ber Verh Sachs Akad Leipzig* 91:261–304
26. Sinai Ya
27. Simanyi N, Szasz D
28. Simanyi N
29. Szasz D
30. Kerckhoff et al.
31. Simanyi N

## Article Classifications

- Physical Sciences
- Mathematics
