# Learning to predict the cosmological structure formation

^{a}Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213;^{b}McWilliams Center for Cosmology, Carnegie Mellon University, Pittsburgh, PA 15213;^{c}Center for Computational Astrophysics, Flatiron Institute, New York, NY 10010;^{d}Berkeley Center for Cosmological Physics, University of California, Berkeley, CA 94720;^{e}Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720;^{f}Kavli Institute for the Physics and Mathematics of the Universe, University of Tokyo Institutes for Advanced Study, The University of Tokyo, Chiba 277-8583, Japan;^{g}Computer Science Department, University of British Columbia, Vancouver, BC V6T1Z4, Canada;^{h}Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213


Edited by Neta A. Bahcall, Princeton University, Princeton, NJ, and approved May 23, 2019 (received for review December 17, 2018)

## Significance

Understanding the evolution of the Universe requires both accurate observation of the sky and fast prediction of structures in the Universe. *N*-body simulation is an effective approach to predicting structure formation of the Universe, though it is computationally expensive. Here, we build a deep neural network to predict structure formation of the Universe. It outperforms the traditional fast-analytical approximation and accurately extrapolates far beyond its training data. Our study shows that deep learning is an accurate alternative to the traditional way of generating approximate cosmological simulations and that it can generate complex 3D simulations in cosmology. This suggests that deep learning can provide a powerful alternative to traditional numerical simulations in cosmology.

## Abstract

Matter evolved under the influence of gravity from minuscule density fluctuations. Nonperturbative structure formed hierarchically over all scales and developed non-Gaussian features in the Universe, known as the cosmic web. Fully understanding the structure formation of the Universe is one of the holy grails of modern astrophysics. Astrophysicists survey large volumes of the Universe and compare the observed data against large ensembles of computer simulations to extract the full information of our own Universe. However, evolving billions of particles over billions of years, even with the simplest physics, is a daunting task. We build a deep neural network, the Deep Density Displacement Model (D^{3}M).

Astrophysicists require a large number of simulations to extract information from observations (1–8). At its core, modeling structure formation of the Universe is a computationally challenging task; it involves evolving billions of particles with the correct physical model in a large volume over billions of years (9–11). To simplify this task, we either simulate a large volume with simpler physics or a smaller volume with more complex physics. To produce the cosmic web (12) in a large volume, we select gravity, the most important component of the theory, to simulate at large scales. A gravity-only *N*-body simulation is the most popular and effective numerical method to predict the full 6D phase-space distribution of a large number of massive particles whose positions and velocities evolve over time in the Universe (13). Nonetheless, *N*-body simulations are computationally expensive, which makes comparing the *N*-body–simulated large-scale structure (for different underlying cosmological parameters) with the observed Universe a challenging task. We propose to use a deep model that predicts the structure formation as an alternative to *N*-body simulations.

Deep learning (14) is a fast-growing branch of machine learning, where recent advances have led to models that reach and sometimes exceed human performance across diverse areas, from analysis and synthesis of images (15–17), sound (18, 19), text (20, 21), and videos (22, 23) to complex control and planning tasks as they appear in robotics and game play (24–26). This new paradigm is also significantly impacting a variety of domains in the sciences, from biology (27, 28) to chemistry (29, 30) and physics (31, 32). In particular, in astronomy and cosmology, a growing number of recent studies are using deep learning for a variety of tasks, ranging from analysis of cosmic microwave background (33–35), large-scale structure (36, 37), and gravitational lensing effects (38, 39) to classification of different light sources (40–42).

The ability of these models to learn complex functions has motivated many to use them to understand the physics of interacting objects, leveraging image, video, and relational data (43–53). However, modeling the dynamics of billions of particles in *N*-body simulations poses a distinct challenge.

In this work, we show that a variation on the architecture of a well-known deep-learning model (54) can efficiently transform the first-order approximations of the displacement field and approximate the exact solutions, thereby producing accurate estimates of the large-scale structure. Our key objective is to demonstrate that this approach is an accurate and computationally efficient alternative to expensive cosmological simulations, and, to this end, we provide an extensive analysis of the results in the following section.

The outcome of a typical *N*-body simulation depends on both the initial conditions and on cosmological parameters which affect the evolution equations. A striking discovery is that the Deep Density Displacement Model (

## Setup

We build a deep neural network, D^{3}M, and train it against FastPM, an approximate *N*-body–simulation scheme that is based on a particle-mesh (PM) solver. FastPM quickly approaches a full *N*-body simulation with high accuracy and provides a viable alternative to direct *N*-body simulations for the purpose of our study.

A significantly faster approximation of *N*-body simulations is produced by second-order Lagrangian perturbation theory (2LPT), which bends each particle’s trajectory with a quadratic correction (58). 2LPT is used in many cosmological analyses to generate a large number of cosmological simulations for comparing astronomical datasets against the physical model (59, 60) or for computing the covariance of the dataset (61–63). We regard 2LPT as an efficient way to generate a relatively accurate description of the large-scale structure, and we therefore select 2LPT as the reference model for comparison with D^{3}M.

We generate 10,000 pairs of ZAs as input and accurate FastPM approximations as the target. We use simulations of

An important choice in our approach is training with a displacement field rather than a density field. Displacement field Ψ and density field ρ are two ways of describing the same distribution of particles. An equivalent way to describe a density field is the overdensity field, defined as δ = ρ/ρ̄ − 1, where ρ̄ is the mean density (see also Eq. **1** and *SI Appendix*, Fig. S1).
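To make the link between the two descriptions concrete, here is a minimal sketch of how a displacement field on a uniform grid could be turned into an overdensity field. It assumes that each particle sits at x = q + Ψ(q), where q is its grid position, that the box is periodic, and that particles are assigned to the nearest grid point. The function name `displacement_to_overdensity` and the nearest-grid-point assignment are illustrative choices, not the scheme used in this work.

```python
import numpy as np

def displacement_to_overdensity(psi, box_size):
    """Paint particles displaced from a uniform grid onto the same grid.

    psi: array of shape (3, n, n, n) holding the displacement of each
         particle from its grid position q, in the units of box_size.
    Returns delta = rho / mean(rho) - 1 on an n^3 mesh, using
    nearest-grid-point assignment (an illustrative assumption).
    """
    n = psi.shape[1]
    cell = box_size / n
    # Unperturbed positions q on a uniform grid, shape (3, n, n, n).
    q = np.stack(np.meshgrid(*[np.arange(n) * cell] * 3, indexing="ij"))
    # Displaced positions x = q + Psi(q), wrapped into the periodic box.
    x = (q + psi) % box_size
    # Index of the nearest grid point for every particle.
    idx = np.rint(x / cell).astype(int) % n
    counts = np.zeros((n, n, n))
    # Unbuffered accumulation, so repeated indices are all counted.
    np.add.at(counts, (idx[0], idx[1], idx[2]), 1.0)
    return counts / counts.mean() - 1.0
```

A zero displacement field gives δ = 0 everywhere, and moving a single particle by one cell leaves one empty cell (δ = −1) next to a doubly occupied one (δ = +1). The mapping in this direction is many-to-one once particle trajectories cross, which is one way to see why the displacement field carries more information than the density field.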

## Results and Analysis

Fig. 1 shows the displacement vector field as predicted by D^{3}M (*Left*) and the associated point-cloud representation of the structure formation (*Right*). It is possible to identify structures such as clusters, filaments, and voids in this point-cloud representation. We proceed to compare the accuracy of

### Point-Wise Comparison.

Let

### Two-Point Correlation Comparison.

As suggested by Fig. 2, the denser regions seem to have a higher error for all methods—that is, more nonlinearity in structure formation creates larger errors for both

Cosmologists often use compressed summary statistics of the density field in their studies. The most widely used of these statistics are the two-point correlation function (2PCF)

Because FastPM, 2LPT, and

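We focus on the Fourier-space representation of the two-point correlation. Because the matter and the displacement power spectrum take the same form, in what follows, we drop the subscript for matter and displacement field and use *P*(*k*) for both. We compare each prediction with the ground truth (FastPM) through the transfer function *T*(*k*), the square root of the ratio of the predicted to the true power spectrum,

$$T(k) = \sqrt{\frac{P_{\rm pred}(k)}{P_{\rm true}(k)}}, \tag{5}$$

and the correlation coefficient *r*(*k*), which is a form of normalized cross-power spectrum,

$$r(k) = \frac{P_{\rm pred \times true}(k)}{\sqrt{P_{\rm pred}(k)\,P_{\rm true}(k)}}. \tag{6}$$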

Fig. 3*A* shows the average power spectrum, transfer function

Now, we turn to the

### Three-Point Correlation Comparison.

The three-point correlation function (3PCF) expresses the correlation of the field of interest among three locations in the configuration space, which is equivalently defined as bispectrum in Fourier space. Here, we concentrate on the 3PCF for computational convenience:

We compare the 3PCF calculated from FastPM, 2LPT, and *B* shows the ratio of the binned multipole coefficients of the two 3PCFs for several triangle configurations,

## Generalizing to New Cosmological Parameters

So far, we train our model using a “single” choice of cosmological parameters

Here, we report an interesting observation: The

### Varying Primordial Amplitude of Scalar Perturbations *A*_{s}.

After training the

Fig. 5*A* shows the transfer function and correlation coefficient for both

### Varying Matter Density Parameter Ω_{m}.

We repeat the same experiments, this time changing *C* and *D* show the two-point statistics for density field predicted by using different values of

## Conclusions

To summarize, our deep model

Looking forward, we expect that replacing FastPM with exact *N*-body simulations would improve the performance of our method. As the complexity of our

## Materials and Methods

### Dataset.

The full simulation data consists of 10,000 simulations of boxes with ZA and FastPM as input–output pairs, with an effective volume of 20 (Gpc/h)^{3} (

### Model and Training.

The *SI Appendix*, Fig. S2. In the training phase, we use the Adam Optimizer (72) with a learning rate of 0.0001, and first- and second-moment exponential decay rates equal to 0.9 and 0.999, respectively. We use the mean-squared error as the loss function (Loss Function) and

### Details of the D^{3}M Architecture.

The contracting path follows the typical architecture of a convolution network. It consists of two blocks, each of which consists of two successive convolutions of stride 1 and a down-sampling convolution with stride 2. The convolution layers use 3

We take special care in the padding and cropping procedure to preserve the shifting and rotation symmetry in the up-sampling layer of the expansive path. Before the transposed convolution, we apply a periodic padding of length 1 on the right, down, and back sides of the box [padding = (0,1,0,1,0,1) in PyTorch], and after the transposed convolution, we discard one column on the left, up, and front sides of the box and two columns on the right, down, and back sides [crop = (1,2,1,2,1,2)].
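The bookkeeping behind this padding and cropping can be checked with a few lines of arithmetic. The sketch below tracks the side length of the box along one axis through a stride-2 down-sampling convolution and through the pad, transposed-convolution, crop sequence described above. It assumes a kernel of size 3 for both operations and periodic padding of 1 on each side for the ordinary convolutions. The function names and the example side lengths are hypothetical.

```python
def conv_out(n, kernel=3, stride=1, pad_left=1, pad_right=1):
    """Side length after a convolution applied to an explicitly padded axis."""
    return (n + pad_left + pad_right - kernel) // stride + 1

def transposed_conv_out(n, kernel=3, stride=2):
    """Side length after a transposed convolution with no implicit padding."""
    return (n - 1) * stride + kernel

def upsample_length(n, kernel=3, stride=2):
    """Pad (0, 1), apply the transposed convolution, then crop (1, 2)."""
    padded = n + 0 + 1            # periodic padding on the right side only
    out = transposed_conv_out(padded, kernel, stride)
    return out - 1 - 2            # drop 1 column on the left, 2 on the right

# Illustrative side lengths (assumed, not taken from the text).
same = conv_out(32, stride=1)     # stride-1 convolution keeps the size
half = conv_out(32, stride=2)     # stride-2 convolution halves it
back = upsample_length(half)      # the up-sampling step restores it
```

Under these assumptions a side of length n becomes 2(n + 1 − 1) + 3 = 2n + 3 after the transposed convolution, and removing 1 + 2 columns leaves exactly 2n, so the up-sampling step undoes the stride-2 down-sampling without changing the grid alignment.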

A special feature of the

The expansive building block then follows a 1

#### Padding and Periodic Boundary.

It is common to use constant or reflective padding in deep models for image processing. However, these approaches are not suitable for our setting. The physical model we are learning is constructed on a spatial volume with a periodic boundary condition. This is sometimes also referred to as a torus geometry, where the boundaries of the simulation box are topologically connected—that is,

We find that the periodic padding strategy significantly improves the performance and expedites the convergence of our model, compared with the same network using a constant padding strategy. This is not surprising, as one expects that it is easier to train a model that can explain the data than to train one that cannot.
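The effect of the padding choice can be illustrated with a toy example. The sketch below applies a 3 × 3 × 3 box-average filter to a random field, once with periodic (wrap-around) padding and once with zero padding, using NumPy's `np.pad`. The filter and the helper name `smooth` are illustrative assumptions and stand in for a learned convolution layer.

```python
import numpy as np

def smooth(field, mode):
    """3x3x3 box average of a 3D field, padded by one cell with `mode`.

    mode="wrap" gives periodic padding; mode="constant" pads with zeros.
    """
    nx, ny, nz = field.shape
    padded = np.pad(field, 1, mode=mode)
    out = np.zeros((nx, ny, nz))
    # Sum the 27 shifted copies of the padded field.
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += padded[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out / 27.0

rng = np.random.default_rng(0)
field = rng.normal(size=(8, 8, 8))
shifted = np.roll(field, shift=3, axis=0)   # translate around the torus

# With periodic padding, filtering commutes with the translation.
periodic_ok = np.allclose(smooth(shifted, "wrap"),
                          np.roll(smooth(field, "wrap"), 3, axis=0))
# With zero padding, cells near the box faces break that symmetry.
constant_ok = np.allclose(smooth(shifted, "constant"),
                          np.roll(smooth(field, "constant"), 3, axis=0))
```

With periodic padding the filtered field shifts along with its input, as it should on a torus, while zero padding treats the box faces as special and breaks the translation symmetry of the simulation volume.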

#### Loss Function.

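We train D^{3}M to minimize the mean-squared error on particle displacements,

$$\mathcal{L} = \frac{1}{N}\sum_{i}\left(\hat{\boldsymbol{\Psi}}_{i} - \boldsymbol{\Psi}_{t,i}\right)^{2},$$

where *i* labels the particles, $\hat{\boldsymbol{\Psi}}$ and $\boldsymbol{\Psi}_{t}$ are the predicted and true displacements, and *N* is the total number of particles. This loss function is proportional to the integrated squared error, and by using a Fourier transform and Parseval’s theorem, it can be rewritten as

$$\int \left|\hat{\boldsymbol{\Psi}}(\mathbf{q}) - \boldsymbol{\Psi}_{t}(\mathbf{q})\right|^{2} d^{3}q = \int \left|\hat{\boldsymbol{\Psi}}(\mathbf{k}) - \boldsymbol{\Psi}_{t}(\mathbf{k})\right|^{2} d^{3}k = \int d^{3}k\, \left|\boldsymbol{\Psi}_{t}(\mathbf{k})\right|^{2}\left[(1 - T)^{2} + 2T(1 - r)\right], \tag{10}$$

where **q** is the particle's grid position and **k** is the corresponding wavevector, T is the transfer function defined in Eq. **5**, and r is the correlation coefficient defined in Eq. **6**, which characterize the similarity between the predicted and true fields, in amplitude and phase, respectively. Eq. **10** shows that our simple loss function jointly captures both of these measures: As T and r approach 1, the loss function approaches 0.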
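The equivalence between the real-space and Fourier-space forms of the loss can be checked numerically. The sketch below is a simplified illustration for a single scalar field on a periodic grid. It assumes a per-mode transfer function T = |pred(k)|/|true(k)| and a per-mode correlation coefficient r = Re[pred(k) true*(k)]/(|pred(k)| |true(k)|), rather than the shell-averaged quantities used in the analysis. The function name is hypothetical.

```python
import numpy as np

def mse_decomposition(pred, true):
    """Return the mean-squared error computed in real and in Fourier space.

    The Fourier-space value sums |true(k)|^2 [(1 - T)^2 + 2 T (1 - r)]
    over all modes, with per-mode T and r (a simplifying assumption).
    """
    n_cells = pred.size
    mse_real = np.mean((pred - true) ** 2)

    fp, ft = np.fft.fftn(pred), np.fft.fftn(true)
    power_p = np.abs(fp) ** 2                 # |pred(k)|^2
    power_t = np.abs(ft) ** 2                 # |true(k)|^2
    cross = np.real(fp * np.conj(ft))         # Re[pred(k) true*(k)]

    T = np.sqrt(power_p / power_t)            # amplitude agreement
    r = cross / np.sqrt(power_p * power_t)    # phase agreement
    integrand = power_t * ((1 - T) ** 2 + 2 * T * (1 - r))

    # Parseval for NumPy's unnormalized FFT: sum|x|^2 = sum|X|^2 / n_cells.
    mse_fourier = integrand.sum() / n_cells ** 2
    return mse_real, mse_fourier

rng = np.random.default_rng(1)
true = rng.normal(size=(8, 8, 8))
pred = true + 0.1 * rng.normal(size=true.shape)
mse_real, mse_fourier = mse_decomposition(pred, true)
```

The two numbers agree to floating-point precision, and both vanish when the prediction equals the target, which corresponds to T = r = 1 for every mode.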

#### Data Availability.

The source code of our implementation is available at https://github.com/siyucosmo/ML-Recon. The code to generate the training data is also available at https://github.com/rainwoodman/fastpm.

## Acknowledgments

We thank Angus Beane, Peter Braam, Gabriella Contardo, David Hogg, Laurence Levasseur, Pascal Ripoche, Zack Slepian, and David Spergel for useful suggestions and comments; Angus Beane for comments on the paper; and Nick Carriero for help on Center for Computational Astrophysics (CCA) computing clusters. The work was supported partially by the Simons Foundation. The FastPM simulations were generated on the computer cluster Edison at the National Energy Research Scientific Computing Center, a US Department of Energy Office of Science User Facility operated under Contract DE-AC02-05CH11231. The training of the neural network model was performed on the CCA computing facility and the Carnegie Mellon University AutonLab computing facility. The open-source software toolkit nbodykit (73) was used for the clustering analysis. Y.L. was supported by the Berkeley Center for Cosmological Physics and the Kavli Institute for the Physics and Mathematics of the Universe, established by the World Premier International Research Center Initiative of the MEXT, Japan. S. Ho was supported by NASA Grants 15-WFIRST15-0008 and Research Opportunities in Space and Earth Sciences Grant 12-EUCLID12-0004; and the Simons Foundation.

## Footnotes

^{1}To whom correspondence may be addressed. Email: shirleyho{at}flatironinstitute.org or siyuh{at}andrew.cmu.edu.

Author contributions: S. He, Y.L., S. Ho, and B.P. designed research; S. He, Y.L., Y.F., and S. Ho performed research; S. He, Y.L., S.R., and B.P. contributed new reagents/analytic tools; S. He, Y.L., Y.F., and W.C. analyzed data; and S. He, Y.L., Y.F., S. Ho, S.R., and W.C. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The source code of our implementation is available at https://github.com/siyucosmo/ML-Recon. The code to generate the training data is available at https://github.com/rainwoodman/fastpm.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1821458116/-/DCSupplemental.

- Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

## References

- ↵
- ↵
- D. J. Eisenstein et al.

- ↵
- ↵
- ↵
- M. Scodeggio et al.

- ↵
- Ž. Ivezić et al.

- ↵
- L. Amendola et al.

- ↵
- D. Spergel et al.

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- G. Huang,
- Z. Liu,
- L. Van Der Maaten,
- K. Q. Weinberger

- ↵
- T. Karras,
- T. Aila,
- S. Laine,
- J. Lehtinen

- ↵
- I. Guyon et al.

- I. Gulrajani,
- F. Ahmed,
- M. Arjovsky,
- V. Dumoulin,
- A. C. Courville

- ↵
- A. Van Den Oord et al.

- ↵
- M. F. Balcan,
- K. Q. Weinberger

- D. Amodei et al.

- ↵
- Z. Hu,
- Z. Yang,
- X. Liang,
- R. Salakhutdinov,
- E. P. Xing

- ↵
- I. Guyon et al.

- A. Vaswani et al.

- ↵
- E. Denton,
- R. Fergus

- ↵
- J. Donahue et al.

- ↵
- ↵
- ↵
- S. Levine,
- C. Finn,
- T. Darrell,
- P. Abbeel

- ↵
- T. Ching et al.

- ↵
- ↵
- ↵
- J. Gilmer,
- S. S. Schoenholz,
- P. F. Riley,
- O. Vinyals,
- G. E. Dahl

- ↵
- G. Carleo,
- M. Troyer

- ↵
- C. Adam-Bourdarios et al.

- ↵
- S. He,
- S. Ravanbakhsh,
- S. Ho

*Proceedings of the 33rd International Conference on Machine Learning* (Journal of Machine Learning Research, 2016), Vol. 48. (2018).

- ↵
- N. Perraudin,
- M. Defferrard,
- T. Kacprzak,
- R. Sgier

- ↵
- J. Caldeira et al.

- ↵
- S. Ravanbakhsh et al.

- ↵
- A. Mathuriya et al.

- ↵
- Y. D. Hezaveh,
- L. P. Levasseur,
- P. J. Marshall

- ↵
- F. Lanusse et al.

- ↵
- J. Dy,
- A. Krause

- N. Kennamer,
- D. Kirkby,
- A. Ihler,
- F. J. Sanchez-Lopez

- ↵
- E. J. Kim,
- R. J. Brunner

- ↵
- M. Lochner,
- J. D. McEwen,
- H. V. Peiris,
- O. Lahav,
- M. K. Winter

- ↵
- P. W. Battaglia,
- J. B. Hamrick,
- J. B. Tenenbaum

- ↵
- D. D. Lee,
- U. von Luxburg,
- R. Garnett,
- M. Sugiyama,
- I. Guyon

- P. Battaglia et al.

- ↵
- R. Mottaghi,
- H. Bagherinezhad,
- M. Rastegari,
- A. Farhadi

- ↵
- M. B. Chang,
- T. Ullman,
- A. Torralba,
- J. B. Tenenbaum

- ↵
- C. Cortes,
- D. D. Lee,
- M. Sugiyama,
- R. Garnett

- J. Wu,
- I. Yildirim,
- J. J. Lim,
- B. Freeman,
- J. Tenenbaum

- ↵
- J. Wu,
- J. J. Lim,
- H. Zhang,
- J. B. Tenenbaum,
- W. T. Freeman

- ↵
- N. Watters et al.

- ↵
- A. Lerer,
- S. Gross,
- R. Fergus

- ↵
- P. Agrawal,
- A. V. Nair,
- P. Abbeel,
- J. Malik,
- S. Levine

- ↵
- K. Fragkiadaki,
- P. Agrawal,
- S. Levine,
- J. Malik

- ↵
- J. Tompson,
- K. Schlachter,
- P. Sprechmann,
- K. Perlin

- ↵
- O. Ronneberger,
- P. Fischer,
- T. Brox

- ↵
- Y. B. Zel’dovich

- ↵
- ↵
- Y. Feng,
- M.-Y. Chu,
- U. Seljak,
- P. McDonald

- ↵
- ↵
- ↵
- ↵
- K. S. Dawson et al.

- ↵
- K. S. Dawson et al.

- ↵
- DESI Collaboration
- et al.

- ↵
- Y. Feng,
- U. Seljak,
- M. Zaldarriaga

- ↵
- K. C. Chan

*Phys. Rev. D* **89**, 083515.

- ↵
- A. Perko,
- L. Senatore,
- E. Jennings,
- R. H. Wechsler

- ↵
- ↵
- ↵
- F. Milletari,
- N. Navab,
- S.-A. Ahmadi

- ↵
- P. Berger,
- G. Stein

- ↵
- M. A. Aragon-Calvo

- ↵
- D. Kingma,
- J. Ba

- ↵
- N. Hand et al.

## Article Classifications

- Physical Sciences
- Astronomy