# Improved surrogates in inertial confinement fusion with manifold and cycle consistencies

^{a}Center for Applied Scientific Computing (CASC), Lawrence Livermore National Laboratory, Livermore, CA 94550; ^{b}Center for Extreme Data Management Analysis and Visualization (CEDMAV), University of Utah, Salt Lake City, UT 84112; ^{c}Design Physics Division, Lawrence Livermore National Laboratory, Livermore, CA 94550


Edited by David A. Weitz, Harvard University, Cambridge, MA, and approved March 16, 2020 (received for review September 25, 2019)

## Significance

Neural networks have demonstrated remarkable success in predictive modeling. However, when applied to surrogate modeling, they 1) are often nonrobust, 2) require large amounts of data, and 3) are inadequate for estimating the inversion process; i.e., they do not capture parameter sensitivities well. We propose a different form of self-consistency regularization by incorporating an inverse surrogate into the learning process and show that it leads to highly robust, self-consistent surrogate models for complex scientific applications.

## Abstract

Neural networks have become the method of choice in surrogate modeling because of their ability to characterize arbitrary, high-dimensional functions in a data-driven fashion. This paper advocates for the training of surrogates that are 1) consistent with the physical manifold, resulting in physically meaningful predictions, and 2) cyclically consistent with a jointly trained inverse model; i.e., backmapping predictions through the inverse results in the original input parameters. We find that these two consistencies lead to surrogates that are superior in terms of predictive performance, are more resilient to sampling artifacts, and tend to be more data efficient. Using inertial confinement fusion (ICF) as a test-bed problem, we model a one-dimensional semianalytic numerical simulator and demonstrate the effectiveness of our approach.

Across scientific disciplines, researchers commonly design and evaluate experiments by comparing empirical observations with simulated predictions from numerical models. Simulations can provide insights into the underlying phenomena and are often instrumental to effective experiment design. Unfortunately, the most reliable, high-fidelity simulators are often too expensive to allow extensive calibration or parameter estimation. Hence, it is common to use ensembles of simulations to train a surrogate model that approximates the simulator over a large range of inputs, thereby enabling parameter studies as well as sensitivity analysis (1). Furthermore, one often fits a second—inverse—model to guide adaptive sampling and to identify parameters that drive the surrogate model into consistency with experiment.

Until recently, surrogate modeling has largely been restricted to one or at most a handful of scalar outputs. Consequently, scientists have been forced to distill their rich observational and simulated data into simple summary indicators or hand-engineered features such as the integral of an image, the peak of a time history, or the width of a spectral line. Such feature engineering severely limits the effectiveness of the entire analysis chain as most information from both experiments and simulations is either highly compressed or entirely ignored. Unsurprisingly, surrogate models designed to predict these features are often underconstrained, ill-conditioned, and not very informative.

Neural networks (NNs) have become a popular option to address this challenge due to their ability to handle more complex, multivariate datatypes, such as images, time series, or energy spectra. In a number of application areas, ranging from particle physics (1) to porous media flows and many other scientific problems (2), NNs are able to effectively capture correlations across high-dimensional data signatures and produce high-quality surrogates, predictors, or classifiers. Inverse problems tend to be ill-posed, yet deep neural networks have shown remarkable progress in addressing such challenging problems (3). Notable examples are in imaging (4) and, more recently, approaches leveraging novel regularizers such as structural priors (5, 6) or generative models (7, 8) for traditionally challenging inverse problems.

As a result there has been renewed interest in building better surrogates using neural networks for scientific problems. These include incorporating known scientific constraints into the training process (9, 10) or reducing dimensionality for better uncertainty quantification (11). However, surrogate forward models are often constructed in isolation such that they are inconsistent with an inverse model, leading to an implausible overall system in which the intuitive cycle of mapping inputs to outputs and back to inputs produces wildly varying results. Not only can an inverse prediction from the surrogate output be far away from the initial input, but even univariate sensitivities, i.e., inferring changes in predictions with respect to a single input parameter, are often unintuitive.

To address these issues, this paper advocates for the training of manifold and cyclically consistent (MaCC) surrogates using a multimodal and self-consistent neural network that outperforms the current state of the art on a wide range of metrics. Using a semianalytic model of inertial confinement fusion (ICF) (12, 13) as a test-bed problem, we propose a MaCC surrogate, containing two distinct components: 1) an autoencoding network to approximate the low-dimensional latent manifold and to accurately capture the correlations between multimodal outputs of a simulator, i.e., multiple images and a set of scalar quantities, and 2) an inverse (or pseudoinverse because of the ill-posed nature) neural network that trains alongside the surrogate network. Cyclical consistency has emerged as a powerful regularization technique in unsupervised problems in the past few years (14–16), improving the state of the art in a variety of applications including image-to-image translation (14), domain adaptation (17), visual question answering (18), and voice conversions (19). We propose a direct coupling between forward and inverse models to enforce cyclical consistency, which regularizes the training to produce higher-fidelity and more robust models.

## Main Findings

We find that manifold consistency significantly improves the predictive capabilities, while the cycle consistency helps in smoothing the high-dimensional function space in the outputs, resulting in improved resilience to sampling artifacts and data scarcity. Surprisingly, we find that cyclical consistency generalizes even to other inverse models (from data bootstraps) not accessed during training, demonstrating a tight coupling between the input and output spaces.

## Surrogate Design for ICF

In any surrogate-based technique, the challenge is to build a high-fidelity mapping from the process inputs, say target and laser settings for ICF, to process outputs, such as ICF implosion neutron yield and X-ray diagnostics. Developing surrogates in the ICF context is particularly challenging. The physics of ICF ignition is predicated on interactions among multiple strongly nonlinear physics mechanisms that have multivariate dependence on a large number of controllable parameters. This presents the designer with a complicated response function that has sharp, nonlinear features in a high-dimensional input space. While this is challenging, deep neural network solutions have made building surrogates for scalar-valued outputs relatively routine (20). However, to take full advantage of the rich range of diagnostic techniques, we require surrogates that can also replicate a wide range of array-valued image data. In ICF, the images can be produced by different particles (X-rays, neutrons) at different energies (hyperspectral), at different times, and from different lines of sight. These complicated modalities are more difficult to ingest, and techniques for learning them can introduce large model capacity and an associated need for excessive amounts of data. Thus, our principal design task is to develop a neural network surrogate that can handle multiple data modalities, can produce predictions acceptable for precision physics, and can be trained without requiring unreasonably large amounts of data.

## Predictive Surrogates with Neural Networks

Formally, the surrogate modeling problem is defined as follows: given a set of input parameters x ∈ X and the corresponding simulator outputs y ∈ Y, learn a forward mapping F such that ŷ = F(x) closely approximates the simulator response.

In this paper, we propose two consistency requirements to improve surrogate modeling: first, a manifold consistency that ensures the predictions are physically meaningful and, second, a notion of cyclical consistency (14, 15) between the forward and inverse models. For the former, we use an autoencoder to embed all output quantities into a low-dimensional manifold, Z, and repose surrogate modeling as learning a mapping from the inputs into Z, from which the decoder of the autoencoder recovers the full multimodal outputs.

## Notations

Since we have several networks interacting with each other, we clarify our notation for the rest of this paper. We refer to the inputs corresponding to a set of samples by matrix X, while each sample is denoted as x. Similarly, the collections of outputs and latent representations are denoted as Y and Z, while their individual realizations are y and z, respectively. The predictions from the trained models F and G are referred to as Ŷ and X̂, with individual realizations ŷ and x̂, respectively.

## Methods

### Multimodal Prediction Using an Autoencoder.

Exploiting the correlation between multimodal outputs should lead to a better forward model because it disambiguates simulations that may otherwise appear similar in some aggregated response function. A straightforward multimodal forward model would predict each output modality through a separate network, ignoring these correlations; instead, we predict into the shared latent space of an autoencoder trained jointly on all output modalities.
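As a minimal, hypothetical illustration (a linear autoencoder fit by PCA on synthetic data, not the network used in the paper), the idea of embedding correlated multimodal outputs into a shared low-dimensional latent space Z and decoding them back can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multimodal outputs: flattened "images" and scalars that share a
# common low-dimensional structure (rank-4 here), mimicking correlated modalities.
n, d_img, d_sc, k = 500, 64, 8, 4
latent = rng.normal(size=(n, k))
Y = np.hstack([latent @ rng.normal(size=(k, d_img)),   # image modality
               latent @ rng.normal(size=(k, d_sc))])   # scalar modality

# Linear autoencoder via PCA: encoder projects onto top-k principal directions.
Y_mean = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
encode = lambda y: (y - Y_mean) @ Vt[:k].T   # Y -> Z
decode = lambda z: z @ Vt[:k] + Y_mean       # Z -> Y

Z = encode(Y)
Y_hat = decode(Z)
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
print(rel_err)  # near zero: rank-4 outputs are captured by a 4-dim latent space
```

A surrogate in this setting would then predict into Z and reuse `decode` to recover all modalities at once, rather than fitting one network per modality.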

### Design.

As shown in Fig. 1*A*, the output space in our setup is composed of a set of images (treated as different channels) and a set of diagnostic scalars.

Since the exact parameterization of the underlying low-dimensional manifold is not known a priori, we approximate it from data using an autoencoder.

### Cyclical Regularization in Surrogates.

While the surrogate model introduced above performs well, it is important to recognize a number of implicit assumptions in the process and consider how they might affect the quality of the model. One of the most important and often disregarded assumptions is the choice of loss function used to construct F. We formulate the training objective for the surrogate as

$$\hat{F} = \arg\min_{\theta}\; \sum_i \rho\big(F(x_i;\theta),\, z_i\big), \qquad [\mathbf{2}]$$

where z_i is the latent encoding of the output y_i, θ denotes the network parameters, and ρ is the choice of loss function, most commonly the Euclidean distance.

Conceptually, the challenge in using [**2**] to define F is twofold: First, since we cannot build a customized ρ and the space of θs is large, there likely exist many different solutions that fit the training data equally well yet generalize differently. Second, the cyclical penalty in Eq. **3** still makes the Euclidean assumption, but it is more appropriate in the latent space Z, which is trained to be close to a full-dimensional, Euclidean space [although this cannot be guaranteed (22)]. We also expect the cyclical regularization to account for some of the nonisotropic error behavior. The cycle regularization directly in the data (or pixel) space can be unstable when the mapping between the two domains is not isomorphic, as is likely the case in a surrogate problem. Although this problem still persists, it is mitigated to a large extent by including cycle regularization in the latent space instead (similar observations have been reported by ref. 23 for image translation tasks). We explore this further in *Experiments and Results*.

Consequently, the optimization objective for MaCC surrogates can be expressed as

$$\min_{F,\,G}\; \sum_i \rho\big(F(x_i),\, z_i\big) \;+\; \lambda\,\mathcal{L}_{cyc}(F, G), \qquad [\mathbf{4}]$$

where $\mathcal{L}_{cyc}$ denotes the bidirectional consistency penalty of Eq. **3** and λ controls its strength.

In this context, the bidirectional consistency penalty in Eq. **3** encourages the surrogate F to be consistent with the pseudoinverse in different ways. The first term is not affected by the mode collapse in the inverse since it is entirely computed in the output space alone. As a result, it encourages the high-dimensional output function to be smoothly varying, while the second term constrains the forward model to make predictions closer to the data manifold.
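The two cycle terms described above can be sketched with toy, exactly invertible linear stand-ins for F and G (the exact form of Eq. **3** is assumed here, not taken from the paper; a simple mean-squared error is used for both directions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: forward F maps inputs x to latent z; pseudoinverse G maps back.
# A near-identity matrix keeps the pair well conditioned and exactly invertible.
A = np.eye(5) + 0.1 * rng.normal(size=(5, 5))
F = lambda x: x @ A.T                  # surrogate: X -> Z (toy, linear)
G = lambda z: z @ np.linalg.inv(A).T   # pseudoinverse: Z -> X (exact here)

def cycle_penalty(F, G, X, Z):
    """Bidirectional consistency: z -> G -> F should return z (term computed
    entirely in the output/latent space), and x -> F -> G should return x
    (term pulling forward predictions toward the data manifold)."""
    term_out = np.mean((F(G(Z)) - Z) ** 2)
    term_in = np.mean((G(F(X)) - X) ** 2)
    return term_out + term_in

X = rng.normal(size=(100, 5))
Z = X @ A.T
print(cycle_penalty(F, G, X, Z))  # ~0 for an exactly invertible pair
```

For a real surrogate the pair is never exactly invertible, and the penalty is added to the prediction loss with a weight λ as in the objective above.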

We observe that due to the ill-conditioned nature of the inverse problem, a neural network takes significantly longer to converge than the forward network. To address this challenge, we first pretrain the inverse network; i.e., we train a standalone pseudoinverse neural network until convergence. We then load this pretrained model and resume training with the forward model, which is trained from scratch using the cyclical consistency. This process is sometimes referred to as a "warm start." During cyclic training, the pseudoinverse continues to train with the loss in Eq. **3**. Note that optimizing F according to Eq. **4** necessarily biases the model toward a particular pseudoinverse G. However, as is discussed in more detail below, the resulting F is highly consistent with a diverse set of Gs, different from the one used during training, constructed by bootstrapping the training data. In other words, by including the consistency regularization, the surrogate F converges to a solution where the resulting residuals are better guided by the characteristics of G. This achieves the same effect as explicitly constructing a specialized loss function ρ to better fit the data characteristics. As we show in our experiments, surrogates obtained using existing neural network solutions are inconsistent with the inverse model and result in nonsmooth, nonrobust models in practice.
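The warm-start schedule can be sketched under simplifying assumptions (linear models in place of the paper's networks, least squares for the pretraining step, and a plain gradient-descent loop with an assumed weight `lam` for the cycle term):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem: the true forward map is linear, z = x @ W_true.
n, d = 200, 4
W_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))  # well conditioned by construction
X = rng.normal(size=(n, d))
Z = X @ W_true

# Step 1 (warm start): pretrain the pseudoinverse G alone, here by
# least squares mapping outputs back to inputs.
W_g, *_ = np.linalg.lstsq(Z, X, rcond=None)

# Step 2: train the forward model F from scratch with a cycle term, minimizing
# (1/n)||X W_f - Z||^2 + lam * (1/n)||X W_f W_g - X||^2 by gradient descent.
lam, lr = 0.1, 1e-2
W_f = np.zeros((d, d))
for _ in range(2000):
    pred = X @ W_f
    grad = 2 * X.T @ (pred - Z) / n                          # prediction loss
    grad += lam * 2 * X.T @ ((pred @ W_g - X) @ W_g.T) / n   # cycle term
    W_f -= lr * grad

err = np.linalg.norm(X @ W_f - Z) / np.linalg.norm(Z)
print(err)  # small: F converges while staying consistent with the pretrained G
```

In this noiseless toy case both loss terms share the same minimizer, so the cycle term costs nothing; in practice it trades a little prediction accuracy for self-consistency, which Fig. 4*A* quantifies.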

### A New Self-Consistency Test for Surrogates.

Given the limitations of commonly used error metrics in surrogate evaluation, we introduce a metric for surrogate fidelity that couples the performance of both the forward and inverse models. We create a test set by varying only a single input parameter using a linear scan of 100 steps (from min to max), while fixing all other parameters. These 100 samples are then passed through the forward model and subsequently through the inverse model before obtaining back input parameter predictions. We check whether the predictions are consistent with the “ground truth,” i.e., the linear scan. This is conceptually similar to partial dependency tests in statistics and effectively captures sensitivities of the forward and inverse models.
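A minimal sketch of this linear-scan self-consistency test, using hypothetical invertible linear stand-ins for the trained F and G (for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy forward/inverse pair standing in for the trained models.
A = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
F = lambda x: x @ A.T
G = lambda y: y @ np.linalg.inv(A).T

# Linear scan of 100 steps over the first input parameter, others held fixed.
scan = np.linspace(0.0, 1.0, 100)            # parameter 0 from min to max
X_test = np.tile([0.5, 0.5, 0.5], (100, 1))
X_test[:, 0] = scan

# Push through F then G; a self-consistent pair recovers the scan.
X_back = G(F(X_test))
consistency_err = np.max(np.abs(X_back[:, 0] - scan))
print(consistency_err)  # ~0 for this exactly invertible toy pair
```

With real, imperfect models the recovered scan deviates from the straight line, and the size of that deviation is the self-consistency measure.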

Given the underdetermined nature of the inverse process, it is possible that the achieved self-consistency is biased by the specific solution of G. Hence, we propose to evaluate the consistency with respect to different solutions from the space of possible pseudoinverse models. To this end, we use multiple random subsets of the original training set (bootstraps) and obtain independent estimates of G. We find that the cyclical consistency remains valid for MaCC across all of these models, indicating that the self-consistency achieved is actually statistically meaningful. The consistency measure is given by
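The bootstrap evaluation can be sketched as follows (toy linear models and a plain least-squares fit per bootstrap are assumptions of this illustration, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data from a linear "simulator"; F stands in for the fixed trained surrogate.
n, d = 300, 3
W = np.eye(d) + 0.1 * rng.normal(size=(d, d))
X = rng.normal(size=(n, d))
Y = X @ W
F = lambda x: x @ W

def fit_G(Xb, Yb):
    """Fit an independent pseudoinverse on one bootstrap resample."""
    Wg, *_ = np.linalg.lstsq(Yb, Xb, rcond=None)
    return lambda y: y @ Wg

# Measure the input-space cycle error of the fixed F against each bootstrap G_b.
errors = []
for _ in range(10):
    idx = rng.integers(0, n, size=n)   # bootstrap resample with replacement
    G_b = fit_G(X[idx], Y[idx])
    errors.append(np.mean((G_b(F(X)) - X) ** 2))

print(np.mean(errors))  # low across all bootstraps -> consistency is not an artifact of one G
```

A surrogate whose consistency held only for the particular G used in training would show large errors against these held-out pseudoinverses.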

## Experiments and Results

### Dataset.

Our training dataset is composed of input parameter settings and the corresponding outputs from the semianalytical ICF simulator described in ref. 12, where each output is a collection of four multienergy images along with a set of diagnostic scalars; full details of the dataset are provided in *SI Appendix*.

### Experimental Details.

First, we train the autoencoder with a 32-dimensional latent space until convergence, which requires about 600 epochs. Additionally, we use a pretrained inverse that is trained for about 2,500 epochs. The architectural details for all networks are available in *SI Appendix*.

### Baselines.

We compare the performance of the surrogate across all of the proposed metrics with several baselines, described next: 1) For a non-NN baseline, we train an extremely randomized tree model that predicts directly into the latent space, Z, coupled with the pretrained decoder. This is similar to recent work (20) in ICF that uses decision trees to initialize a surrogate that maps only to scalars. 2) For an NN baseline, we consider a network (trained with and without cycle consistency) that takes in the inputs and predicts the images and scalars via two separate networks. We construct this baseline with a similar architecture and approximately the same number of parameters, the main difference being that it does not use the manifold consistency. In addition, we also create other baselines through ablation studies, described in *SI Appendix*.

### Results.

#### Qualitative evaluation.

Fig. 2*A* shows random samples from the simulator and their corresponding predictions obtained using our surrogate, demonstrating that MaCC captures details very accurately, across the four energy channels. Next, Fig. 2*B* illustrates the residual error images for 20 randomly chosen examples (only one energy band shown) obtained using predictions from the baseline and MaCC. All images are intensity normalized by the same maximum intensity value. In most cases, MaCC predicts higher-quality outputs, where smaller residuals indicate higher-fidelity predictions.

We evaluate the quantitative performance of the surrogates using widely adopted metrics such as MSE (Fig. 3*A*).

#### Cycle-consistency score.

We show the results for one particular pseudoinverse trained with a random subset of the training data; results with other pseudoinverses are included in *SI Appendix*. In Fig. 4*A*, we show how cyclical regularization impacts the quality of the surrogate model against its tendency to be self-consistent. We observe that a small amount of cyclical regularization is sufficient to obtain a self-consistent surrogate without degrading predictive quality.

### Benefits of Cyclical Consistency.

Cyclical consistency acts as a regularization technique that helps in smoothing out the prediction space, and as a result we expect to see gains in predictive performance of the forward model when there are fewer training data available, as well as in improved robustness to perturbed inputs. We see both of these to be the case and discuss the results next.

#### Behavior in small data regimes.

We observe improved predictive performance of the forward model when there are significantly fewer training samples, as shown in Fig. 3. We train different surrogates while providing access only to a fraction of the training set. It must be noted that the autoencoder used in this experiment has been trained on the full 100,000-sample dataset, but it is unsupervised; i.e., it only approximates the physics manifold without any knowledge of the forward process. We evaluate the performance of all models on the same 10,000-sample validation set as before to make them comparable. Additionally, we show generalization when an "oracle" inverse is available, in which the inverse has access to the entire dataset, as an upper bound. The benefit makes it clear that the inverse has useful gradients to improve the quality of the forward model, in some cases reducing prediction error considerably.

#### Robustness to sampling artifacts.

At test time, we add a small amount of uniform random noise to the input parameters, as shown in Fig. 4*B*. On the *y* axis we show the sensitivity to local perturbations, i.e., the difference in MSE between predictions on perturbed and clean inputs, plotted against the perturbation magnitude defined in Eq. **6** on the *x* axis. We observe that the cyclical regularization results in significantly more robust models, while having very similar prediction errors on clean data, as seen in Fig. 4*A*. To ensure that the perturbations are not extreme, we pick noise magnitudes that are small relative to the parameter ranges.
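This sensitivity measure can be sketched as follows (the exact perturbation-magnitude definition of Eq. **6** is not reproduced here; plain uniform noise of half-width `eps` and a toy linear surrogate are assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in surrogate and reference outputs on clean inputs.
d = 4
W = np.eye(d) + 0.1 * rng.normal(size=(d, d))
F = lambda x: x @ W
X = rng.uniform(0.0, 1.0, size=(500, d))
Y_true = F(X)

def sensitivity(F, X, Y_true, eps):
    """Difference in MSE between predictions on perturbed and clean inputs,
    with uniform noise in [-eps, eps] added to the inputs at test time."""
    noise = rng.uniform(-eps, eps, size=X.shape)
    mse_clean = np.mean((F(X) - Y_true) ** 2)        # zero here by construction
    mse_noisy = np.mean((F(X + noise) - Y_true) ** 2)
    return mse_noisy - mse_clean

for eps in (0.01, 0.05, 0.1):
    print(eps, sensitivity(F, X, Y_true, eps))  # sensitivity grows with perturbation size
```

A more robust surrogate corresponds to a flatter curve of this quantity versus `eps`, which is what Fig. 4*B* compares across models.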

#### Discussion.

In this paper, we introduced MaCC surrogates, which contain two distinct elements: a pretrained autoencoder that enforces the surrogate to map input parameters into the latent space of the physical manifold and a jointly trained pseudoinverse network that enforces cyclical consistency between inputs and predictions.

## Footnotes

^{1}To whom correspondence may be addressed. Email: anirudh1{at}llnl.gov.

Author contributions: R.A., J.J.T., P.-T.B., and B.K.S. designed research; R.A. performed research; R.A., J.J.T., and B.K.S. contributed new reagents/analytic tools; R.A. analyzed data; and R.A., J.J.T., P.-T.B., and B.K.S. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1916634117/-/DCSupplemental.

- Copyright © 2020 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

## References

1. M. Paganini, L. de Oliveira, B. Nachman
2. Y. Zhu, N. Zabaras
3. L. Ardizzone, J. Kruse, C. Rother, U. Köthe
4.
5. D. Ulyanov, A. Vedaldi, V. Lempitsky
6. A. Shocher, N. Cohen, M. Irani
7. R. A. Yeh et al.
8. A. Bora, A. Jalal, E. Price, A. G. Dimakis
9. Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris
10.
11. R. K. Tripathy, I. Bilionis
12. J. Gaffney, P. Springer, G. Collins, *APS Division of Plasma Physics Meeting* (APS, 2014). http://meetings.aps.org/link/BAPS.2014.DPP.PO5.11. Accessed 7 April 2020.
13. A. L. Kritcher et al.
14. J.-Y. Zhu, T. Park, P. Isola, A. A. Efros
15. Z. Yi, H. Zhang, P. Tan, M. Gong
16. Y. Choi et al.
17. J. Hoffman et al.
18. M. Shah, X. Chen, M. Rohrbach, D. Parikh
19. H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo
20. K. D. Humbird, J. L. Peterson, R. G. McClarren
21. I. Tolstikhin, O. Bousquet, S. Gelly, B. Schoelkopf
22. G. Arvanitidis, L. K. Hansen, S. Hauberg
23. M. Binkowski, D. Hjelm, A. Courville
24. D. P. Kingma, J. Ba
## Article Classifications

- Physical Sciences
- Computer Sciences