# Statistical finite elements for misspecified models

Edited by Nancy M. Reid, University of Toronto, Toronto, ON, Canada, and approved November 18, 2020 (received for review July 23, 2020)

## Significance

Science and engineering have benefited greatly from the ability of finite element methods (FEMs) to simulate nonlinear, time-dependent complex systems. The recent advent of extensive data collection from such complex systems now raises the question of how to systematically incorporate these data into finite element models, consistently updating the solution in the face of mathematical model misspecification with physical reality. This article describes general and widely applicable methodology for the coherent synthesis of data with FEM models, providing a data-driven probability distribution that captures all sources of uncertainty in the pairing of FEM with measurements.

## Abstract

We present a statistical finite element method for nonlinear, time-dependent phenomena, illustrated in the context of nonlinear internal waves (solitons). We take a Bayesian approach and leverage the finite element method to cast the statistical problem as a nonlinear Gaussian state–space model, updating the solution, in receipt of data, in a filtering framework. The method is applicable to problems across science and engineering for which finite element methods are appropriate. The Korteweg–de Vries equation for solitons is presented because it reflects the necessary complexity while being suitably familiar and succinct for pedagogical purposes. We present two algorithms to implement this method, based on the extended and ensemble Kalman filters, and demonstrate effectiveness with a simulation study and a case study with experimental data. The generality of our approach is demonstrated in *SI Appendix*, where we present examples from additional nonlinear, time-dependent partial differential equations (Burgers equation, Kuramoto–Sivashinsky equation).

The central role of physically derived, nonlinear, time-dependent partial differential equations (PDEs) in scientific and engineering research is undisputed, as is the need for numerical intervention in order to understand their behavior. The finite element method (FEM) has emerged as the foremost strategy for this numerical intervention, yet when the discretized solutions are compared with empirical evidence, elements of model mismatch are revealed that must be handled with appropriate statistical formalisms (1–3). To address this problem of model misspecification, in this paper we introduce stochastic forcing inside the PDE and update the FEM-discretized PDE solution with data in a filtering context.

Stochastic forcing is introduced through a random function within the governing equations. This represents an unknown process, which may have been omitted in the formulation of the physical model. For an elliptic linear PDE with coefficients $\mathrm{\Lambda}$, this can be expressed as

$$\left\{\begin{array}{l}{L}_{\mathrm{\Lambda}}u=f+{\xi}_{\theta},\quad {\xi}_{\theta}\sim \mathcal{G}\mathcal{P}\left(0,{C}_{\theta}\right),\\ u := u\left(x\right),\; f := f\left(x\right),\; x\in \mathrm{\Omega}\subset {\mathbb{R}}^{d}.\end{array}\right.$$

The push forward of the Gaussian random field ${\xi}_{\theta}$, with covariance parameters $\theta$, induces a probability measure over the space of admissible solutions to the above. To embed this into a finite element model, we start with the weak form

$${\mathcal{A}}_{\mathrm{\Lambda}}\left(u,v\right)=\langle f,v\rangle +\langle {\xi}_{\theta},v\rangle ,$$

where ${\mathcal{A}}_{\mathrm{\Lambda}}\left(\cdot ,\cdot \right)$ is the bilinear form generated from ${L}_{\mathrm{\Lambda}}$ and $\langle \cdot ,\cdot \rangle$ is the appropriate Hilbert space inner product. Discretizing with finite elements $u\left(x\right)={\sum}_{i=1}^{M}{u}_{i}{\varphi}_{i}\left(x\right)$, $v\left(x\right)={\sum}_{i=1}^{M}{v}_{i}{\phi}_{i}\left(x\right)$ yields the Gaussian measure over the solution FEM coefficients $\mathbf{u}=\left({u}_{1},{u}_{2},\dots ,{u}_{M}\right)\in {\mathbb{R}}^{M}$:

$$p\left(\mathbf{u}\;|\;\mathrm{\Lambda},\theta \right)=\mathcal{N}\left({\mathbf{A}}^{-1}\mathbf{b},{\mathbf{A}}^{-1}\mathbf{G}\left(\theta \right){\mathbf{A}}^{-\mathsf{T}}\right),$$

where ${\mathbf{A}}_{ij}={\mathcal{A}}_{\mathrm{\Lambda}}\left({\varphi}_{i},{\phi}_{j}\right)$, ${\mathbf{b}}_{i}=\langle f,{\phi}_{i}\rangle$, and $\mathbf{G}{\left(\theta \right)}_{ij}=\langle {\varphi}_{i},{C}_{\theta}{\phi}_{j}\rangle$. This defines a (finite-dimensional) prior distribution over the FEM model, which represents all assumed knowledge before observing data. The mean is the standard Galerkin solution, and the covariance results from the action of the discretized PDE operator on the covariance $\mathbf{G}\left(\theta \right)$; further details are contained in *SI Appendix*, section 1. This approach was first developed in ref. 4, and we demonstrate its generality by extending it to nonlinear, time-dependent PDEs.

An area in which nonlinear and time-dependent problems are ubiquitous is ocean dynamic processes, where essentially all problems stem from a governing system of nonlinear, time-dependent equations (e.g., the Navier–Stokes equations). The ocean dynamics community has grown increasingly cognizant of the importance of accurate uncertainty quantification (5, 6), with many possible applications [e.g., rogue waves (7), turbulent flow (8)] for our proposed methodology.
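The elliptic construction above can be sketched in a few lines. The following is our illustration, not the authors' code: it assembles the prior $\mathcal{N}\left({\mathbf{A}}^{-1}\mathbf{b},{\mathbf{A}}^{-1}\mathbf{G}\left(\theta \right){\mathbf{A}}^{-\mathsf{T}}\right)$ for a 1D Poisson problem $-u''=f+{\xi}_{\theta}$ with homogeneous Dirichlet boundaries, $P_1$ elements, and a squared-exponential forcing covariance; the mass-lumped approximation of $\mathbf{G}\left(\theta \right)$ is a simplifying assumption.

```python
import numpy as np

def statfem_prior(M=49, tau=0.1, ell=0.2):
    """Prior N(A^{-1} b, A^{-1} G A^{-T}) for -u'' = f + xi on [0, 1],
    u(0) = u(1) = 0, with P1 elements and constant forcing f = 1."""
    h = 1.0 / (M + 1)
    x = np.linspace(h, 1.0 - h, M)                 # interior nodes
    # Stiffness matrix A_ij = <phi_i', phi_j'> (tridiagonal for P1 elements)
    A = (2.0 * np.eye(M) - np.eye(M, k=1) - np.eye(M, k=-1)) / h
    b = h * np.ones(M)                             # load vector <f, phi_i>
    # G(theta)_ij = <phi_i, C_theta phi_j>, mass-lumped: h * k(x_i, x_j) * h
    k = tau**2 * np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * ell**2))
    G = h * k * h
    mean = np.linalg.solve(A, b)                   # standard Galerkin solution
    cov = np.linalg.solve(A, np.linalg.solve(A, G).T).T   # A^{-1} G A^{-T}
    return x, mean, cov
```

The mean is the usual deterministic FEM solution; only the covariance carries the effect of the stochastic forcing.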

An example process is nonlinear internal waves (solitons), which are observed as waves of depression or elevation along a pycnocline in a density-stratified fluid and are of broad interest to both the scientific and engineering communities (9–13). The classical mathematical model for solitons is the Korteweg–de Vries (KdV) equation (14):

$$\left\{\begin{array}{l}{u}_{t}+\alpha u{u}_{x}+\beta {u}_{xxx}+c{u}_{x}=0,\\ u := u\left(x,t\right),\; x\in \left[0,L\right],\; t\in \left[0,T\right],\end{array}\right.$$

[1]

where $u$ is the pycnocline displacement and the coefficients $\alpha$, $\beta$, and $c$ are determined by physical parameters. Eq. **1** is readily interpretable: waves propagate at wave speed $c$, nonlinear steepening results from $u{u}_{x}$, and dispersion is due to ${u}_{xxx}$. Relative coefficient values determine the dominating regime, and waves can vary from quasilinear to highly nonlinear.

Despite KdV being well studied (15) and widely applied (16–18), its relative simplicity makes it prone to model mismatch. To compensate for this mismatch, we update the FEM-discretized solution with observations in a filtering context. Using simulated and experimental data, the resulting statistical FEM (statFEM) is shown to:

*i*) approximate the data-generating process with statistically coherent uncertainty quantification;

*ii*) synthesize physics and data to give an interpretable posterior distribution;

*iii*) utilize sparsely observed data to reconstruct observed phenomena;

*iv*) enable the application of simpler physical models, updated with observations.

For practitioners faced with data, we believe these benefits are of importance, and we demonstrate the generality of our method in *SI Appendix* with further examples. Code to replicate the analysis is freely available online.

## A Nonlinear, Time-Evolving Statistical FEM

A Gaussian process (GP), ${\xi}_{\theta}$, is introduced inside the governing equations, representing an unknown forcing process in space and time, with time-varying parameters $\theta$. For a general nonlinear PDE, this is given by

$${u}_{t}+{L}_{\mathrm{\Lambda}}u+{F}_{\mathrm{\Lambda}}\left(u\right)+{\xi}_{\theta}=0,\quad {\xi}_{\theta}\sim \mathcal{G}\mathcal{P}\left(0,{C}_{\theta}\right),$$

[2]

where ${L}_{\mathrm{\Lambda}}$ and ${F}_{\mathrm{\Lambda}}$ represent linear and nonlinear differential operators, respectively, with coefficients $\mathrm{\Lambda}$. The push forward of the Gaussian measure ${\xi}_{\theta}$ induces a probability measure over the space of admissible solutions to Eq. **2** and characterizes our prior belief in the model based on modeling assumptions. The kernel of the covariance operator ${C}_{\theta}$ is given by

$$\mathbb{E}\left[{\xi}_{\theta}\left(x,t\right){\xi}_{\theta}\left({x}^{\prime},{t}^{\prime}\right)\right]={k}_{\theta}\left(x,{x}^{\prime}\right)\cdot \delta \left(t,{t}^{\prime}\right).$$

The exact form of this covariance can be decided upon by domain experts so that the induced uncertainty is physically motivated. For example, ${k}_{\theta}$ can be chosen to be a Matérn covariance function to reflect the unknown forcing having derivatives up to a known order. We assume a white noise process in time to facilitate the application of standard Kalman methods to solve the filtering problem; this is also conventional in stochastic differential equations (19). When computing the prior defined by Eq. **2**, we use fixed parameters $\theta$. When conditioning on data, we take an empirical Bayes approach and estimate $\theta$ through the log-marginal posterior.

Coefficients $\mathrm{\Lambda}$ of Eq. **2** are assumed to be known, and we choose to update the numerical solution to the model, acknowledging that estimating $\mathrm{\Lambda}$ [using, e.g., maximum likelihood methods (20), Markov chain Monte Carlo (21), or inversion methods in general (22)] is also of utmost interest.

Discretizing Eq. **2** using finite elements in space with an implicit or explicit Euler method in time,^{†} denote by ${\mathbf{u}}^{n}\in {\mathbb{R}}^{M}$ the FEM coefficients at time $n\mathrm{\Delta}t$, for time step $\mathrm{\Delta}t$. Further analysis of this discretization will be the focus of future work. For a potentially nonlinear system of equations ${\mathcal{F}}_{\mathrm{\Lambda}}$, the resultant system can be expressed as (full construction is given in *SI Appendix*, sections 2 and 3)

$${\mathcal{F}}_{\mathrm{\Lambda}}\left({\mathbf{u}}^{n},{\mathbf{u}}^{n-1}\right)+{\mathbf{e}}_{n-1}=0.$$

The vector ${\mathbf{e}}_{n-1}$ represents Galerkin-discretized increments of a Brownian motion process, ${\mathbf{e}}_{n-1}\sim \mathcal{N}\left(\mathbf{0},{\mathrm{\Delta}}_{t}\mathbf{G}\left({\mathit{\theta}}_{n}\right)\right)$.
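As a minimal illustration of this stochastic time-stepping (our sketch, with a linear heat equation ${u}_{t}-{u}_{xx}+{\xi}_{\theta}=0$ standing in for the general nonlinear case), one implicit-Euler step solves $\left(\mathbf{M}+{\mathrm{\Delta}}_{t}\mathbf{A}\right){\mathbf{u}}^{n}=\mathbf{M}{\mathbf{u}}^{n-1}-{\mathbf{e}}_{n-1}$ with ${\mathbf{e}}_{n-1}\sim \mathcal{N}\left(\mathbf{0},{\mathrm{\Delta}}_{t}\mathbf{G}\right)$:

```python
import numpy as np

def p1_matrices(M, h):
    """P1 mass and stiffness matrices on a uniform interior grid."""
    mass = h / 6.0 * (4.0 * np.eye(M) + np.eye(M, k=1) + np.eye(M, k=-1))
    stiff = (2.0 * np.eye(M) - np.eye(M, k=1) - np.eye(M, k=-1)) / h
    return mass, stiff

def stochastic_step(u_prev, mass, stiff, G, dt, rng):
    """One step of F(u^n, u^{n-1}) + e_{n-1} = 0 for implicit Euler:
    F = (M + dt*A) u^n - M u^{n-1},  e_{n-1} ~ N(0, dt*G)."""
    e = rng.multivariate_normal(np.zeros(len(u_prev)), dt * G)
    return np.linalg.solve(mass + dt * stiff, mass @ u_prev - e)
```

With $\mathbf{G}=\mathbf{0}$ this reduces to the deterministic FEM update; the ensemble approach below repeats this step with independent draws of ${\mathbf{e}}_{n-1}$.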

Unlike the elliptic example in the Introduction, for Eq. **2** the induced probability measure on the FEM coefficients is not available in closed form, and we present two approximations, based on the extended Kalman filter (EKF) and the ensemble Kalman filter (EnKF). The first linearizes about the current solution with the Jacobian $\mathbf{J}$ of the nonlinear ${\mathcal{F}}_{\mathrm{\Lambda}}$ (evaluated at the current solution) to give a Gaussian approximation of $p\left({\mathbf{u}}^{n}\;|\;{\mathit{\theta}}_{1:n},\mathrm{\Lambda}\right)\approx \mathcal{N}\left({\mathbf{m}}_{n},{\mathbf{C}}_{n}\right)$. The second uses an ensemble in which a perturbed system is solved, with realizations from ${\mathbf{e}}_{n}$, at each time step. Summary statistics (e.g., mean, covariance) are then computed from this ensemble.

For the prior, the deterministic FEM solution is identically equal to the mean in the EKF approach. However, we have found in numerical experiments that, when using large time steps, the EKF method, due to its use of the Jacobian, inflates the covariance at points of high gradient and reduces it at points of near-zero gradient. This does not occur with the EnKF approach.

Algorithm 1: EKF algorithm |
---|

for $n\le {n}_{t}$ do |

(Prediction step) |

Solve $\mathcal{F}\left({\mathbf{m}}_{n|n-1},{\mathbf{m}}_{n-1|n-1}\right)=\mathbf{0}$. |

${\widehat{\mathbf{C}}}_{n|n-1}={\left({\mathbf{J}}^{n}\right)}^{-1}\left({\mathbf{J}}^{n-1}{\mathbf{C}}_{n-1|n-1}{\left({\mathbf{J}}^{n-1}\right)}^{\top}\right){\left({\mathbf{J}}^{n}\right)}^{-\top}$ |

Estimate: |

${\text{arg max}}_{{\mathit{\theta}}_{n},{\sigma}_{n}}\left\{\mathrm{log}p\left({\mathbf{y}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)\right.$ |

$\left.+\mathrm{log}p\left({\mathit{\theta}}_{n}\right)+\mathrm{log}p\left({\sigma}_{n}\right)\right\}$. |

${\mathbf{C}}_{n|n-1}={\widehat{\mathbf{C}}}_{n|n-1}+{\mathrm{\Delta}}_{t}{\left({\mathbf{J}}^{n}\right)}^{-1}\mathbf{G}\left({\mathit{\theta}}_{n}\right){\left({\mathbf{J}}^{n}\right)}^{-\top}$. |

${\mathbf{S}}_{n}={\mathbf{H}}_{n}{\mathbf{C}}_{n|n-1}{\mathbf{H}}_{n}^{\top}+{\sigma}_{n}^{2}\mathbf{I}$. |

(Analysis step) |

${\mathbf{m}}_{n|n}={\mathbf{m}}_{n|n-1}+{\mathbf{C}}_{n|n-1}{\mathbf{H}}_{n}^{\top}{\mathbf{S}}_{n}^{-1}\left({\mathbf{y}}_{n}-{\mathbf{H}}_{n}{\mathbf{m}}_{n|n-1}\right)$. |

${\mathbf{C}}_{n|n}={\mathbf{C}}_{n|n-1}-{\mathbf{C}}_{n|n-1}{\mathbf{H}}_{n}^{\top}{\mathbf{S}}_{n}^{-1}{\mathbf{H}}_{n}{\mathbf{C}}_{n|n-1}$. |
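The analysis step of *Algorithm 1* is the standard Kalman update; the following self-contained sketch (our illustration, with generic inputs) implements exactly those two lines:

```python
import numpy as np

def ekf_analysis(m_pred, C_pred, y, H, sigma):
    """Kalman analysis step: condition the Gaussian prediction
    N(m_pred, C_pred) on data y = H u + eta, eta ~ N(0, sigma^2 I)."""
    S = H @ C_pred @ H.T + sigma**2 * np.eye(len(y))   # innovation covariance
    K = C_pred @ H.T @ np.linalg.inv(S)                # Kalman gain
    m_post = m_pred + K @ (y - H @ m_pred)
    C_post = C_pred - K @ H @ C_pred
    return m_post, C_post
```

For a two-state system with one observed component and unit noise, the update halves the observed component's variance and leaves the unobserved one untouched when the prior covariance is diagonal.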

### Conditioning on Data.

Data ${\mathbf{y}}_{n}\in {\mathbb{R}}^{N}$ are observed at time $n\mathrm{\Delta}t$ on the grid ${\mathbf{x}}_{\text{obs}}$. These data are corrupted with noise ${\mathit{\eta}}_{n}\sim \mathcal{N}\left(\mathbf{0},{\sigma}_{n}^{2}\mathbf{I}\right)$, independent of the model ${\mathbf{u}}^{n}$, to give the data-generating process ${\mathbf{y}}_{n}={\mathbf{H}}_{n}{\mathbf{u}}^{n}+{\mathit{\eta}}_{n}$, where the linear observation operator ${\mathbf{H}}_{n}:{\mathbb{R}}^{M}\to {\mathbb{R}}^{N}$ maps from the computed solution grid to the observation grid, using the FEM interpolant.
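For a 1D mesh with $P_1$ elements, the FEM interpolant makes ${\mathbf{H}}_{n}$ a sparse matrix of hat-function weights; a minimal sketch (our construction, assuming a sorted 1D mesh):

```python
import numpy as np

def interpolation_matrix(x_mesh, x_obs):
    """Linear observation operator H: R^M -> R^N mapping P1 FEM nodal
    values to observation locations by piecewise-linear interpolation."""
    M, N = len(x_mesh), len(x_obs)
    H = np.zeros((N, M))
    for n, xo in enumerate(x_obs):
        j = np.searchsorted(x_mesh, xo)        # insertion index of xo
        if j == 0:
            H[n, 0] = 1.0
        elif j == M:
            H[n, -1] = 1.0
        else:
            w = (xo - x_mesh[j - 1]) / (x_mesh[j] - x_mesh[j - 1])
            H[n, j - 1], H[n, j] = 1.0 - w, w
    return H
```

Each row sums to one (the hat functions form a partition of unity), so linear functions are reproduced exactly on the observation grid.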

The filtered distribution $p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n},{\mathit{\theta}}_{1:n},{\sigma}_{1:n},\mathrm{\Lambda}\right)$, where ${\mathbf{y}}_{1:n}=\left({\mathbf{y}}_{1},{\mathbf{y}}_{2},\dots ,{\mathbf{y}}_{n}\right)$, is our primary object of interest.

^{‡} We take a Bayesian interpretation and refer to this as the filtered posterior distribution, or just the posterior when the context is clear. However, as this is a filtering problem, non-Bayesian methods are perfectly valid. We assume that all distributions are Gaussian, so the posterior can be computed with standard methods in data assimilation (23); we use the EKF (24) and the EnKF (stochastic form) (25).

The initial conditions are known (i.e., they are given a Dirac measure). For time $n$, we make a tentative prediction step according to the PDE model ${\mathcal{F}}_{\mathrm{\Lambda}}$, propagating uncertainty in the previous time step [described by $p\left({\mathbf{u}}^{n-1}\;|\;{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)$], to give the prediction measure $p\left({\mathbf{u}}^{n}\;|\;{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)$. Parameters $\left({\mathit{\theta}}_{n},{\sigma}_{n}\right)$ are then estimated, and the full prediction step is completed to estimate $p\left({\mathbf{u}}^{n}\;|\;{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)$. Data observed at time $n{\mathrm{\Delta}}_{t}$ are then conditioned on to give the updated filtering distribution $p\left({\mathbf{u}}^{n}\;|\;{\mathbf{y}}_{1:n},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)$.

We assume the parameters are independent across time [i.e., $p\left({\mathit{\theta}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathit{\theta}}_{n-1}\right)=p\left({\mathit{\theta}}_{n}\right)$]. Parameters may also be time constant, which is discussed in the maximum likelihood setting in ref. 26 and in the hierarchical Bayesian setting in ref. 27.

*SI Appendix* also contains a possible modification of the method that accounts for time-invariant parameters. The following procedure provides an overview of the method, and *Algorithms 1* and *2* give pseudocode versions.

#### Conditioning procedure.

At time $n$, assume that the measure on the previous time step is described by

$$p\left({\mathbf{u}}^{n-1}\;|\;{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)=\mathcal{N}\left({\mathbf{m}}_{n-1|n-1},{\mathbf{C}}_{n-1|n-1}\right).$$

Then, proceed as follows:

*i*)

Compute the tentative prediction step:

$$\begin{array}{ll}\hfill & p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)\hfill \\ \hfill & \hspace{0.17em}=\int p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{u}}^{n-1}\right)p\left(\mathrm{d}{\mathbf{u}}^{n-1}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)\hfill \\ \hfill & \hspace{0.17em}\approx \mathcal{N}\left({\widehat{\mathbf{m}}}_{n|n-1},{\widehat{\mathbf{C}}}_{n|n-1}\right).\hfill \end{array}$$

*ii*)

Maximize the EKF log-marginal posterior to estimate parameters:

$$\begin{array}{ll}\hfill & {\text{arg max}}_{{\mathit{\theta}}_{n},{\sigma}_{n}}\left\{\mathrm{log}p\left({\mathbf{y}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)\right.\hfill \\ \hfill & \text{\hspace{1em}}\left.+\mathrm{log}p\left({\mathit{\theta}}_{n}\right)+\mathrm{log}p\left({\sigma}_{n}\right)\right\},\hfill \end{array}$$

where $\widehat{\mathbf{G}}\left({\mathit{\theta}}_{n}\right)={\mathrm{\Delta}}_{t}{\left({\mathbf{J}}^{n}\right)}^{-1}\mathbf{G}\left({\mathit{\theta}}_{n}\right){\left({\mathbf{J}}^{n}\right)}^{-\top}$,

$$\begin{array}{ll}\hfill & p\left({\mathbf{y}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)\hfill \\ \hfill & \text{\hspace{1em}}=\mathcal{N}\left({\mathbf{H}}_{n}{\widehat{\mathbf{m}}}_{n|n-1},{\mathbf{H}}_{n}{\widehat{\mathbf{C}}}_{n|n-1}{\mathbf{H}}_{n}^{\top}+{\mathbf{H}}_{n}\widehat{\mathbf{G}}\left({\mathit{\theta}}_{n}\right){\mathbf{H}}_{n}^{\top}+{\sigma}_{n}^{2}\mathbf{I}\right).\hfill \end{array}$$

*iii*)

Compute the full prediction step:

$$\begin{array}{ll}\hfill & p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n-1}\right)\hfill \\ \hfill & \hspace{0.17em}=\int p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{u}}^{n-1},{\mathit{\theta}}_{n}\right)p\left(\mathrm{d}{\mathbf{u}}^{n-1}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n-1},{\sigma}_{1:n-1}\right)\hfill \\ \hfill & \hspace{0.17em}\approx \mathcal{N}\left({\mathbf{m}}_{n|n-1},{\mathbf{C}}_{n|n-1}\right).\hfill \end{array}$$

*iv*)

Complete the analysis step:

$$\begin{array}{ll}\hfill & p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)\hfill \\ \hfill & \text{\hspace{1em}}\propto p\left({\mathbf{y}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{u}}^{n},{\sigma}_{n}\right)p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n-1}\right)\hfill \\ \hfill & \text{\hspace{1em}}=\mathcal{N}\left({\mathbf{m}}_{n|n},{\mathbf{C}}_{n|n}\right).\hfill \end{array}$$

The prior $p\left({\mathbf{u}}^{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathit{\theta}}_{n},\mathrm{\Lambda}\right)$ is recovered if only the full prediction step (step 3) is completed at each iteration; completing the full sequence gives the posterior. Optimization of the log-marginal posterior is done using the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (28) with starting points set to the previous estimates. This log-marginal posterior is calculable due to the Gaussian assumption made in step 1. Prior information on hyperparameters is incorporated through $p\left({\mathit{\theta}}_{n}\right)$ and $p\left({\sigma}_{n}\right)$, which regularizes the optimization problem.
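A sketch of this hyperparameter step (our illustration, under strong simplifying assumptions: a single scalar $\theta$, and a hypothetical helper `G_of_theta` returning the Jacobian-mapped forcing covariance $\widehat{\mathbf{G}}\left(\theta \right)$) using SciPy's L-BFGS-B implementation:

```python
import numpy as np
from scipy.optimize import minimize

def fit_hyperparameters(y, H, m_pred, C_hat, G_of_theta, x0):
    """Empirical Bayes sketch: maximize the Gaussian log-marginal posterior
    over (theta, sigma) with L-BFGS-B, warm-started at the previous
    estimates x0, as described in the text."""
    def neg_log_post(params):
        theta, sigma = params
        # marginal covariance H (C_hat + G(theta)) H^T + sigma^2 I
        S = H @ (C_hat + G_of_theta(theta)) @ H.T + sigma**2 * np.eye(len(y))
        r = y - H @ m_pred
        _, logdet = np.linalg.slogdet(S)
        log_lik = -0.5 * (logdet + r @ np.linalg.solve(S, r))
        # truncated-Gaussian log-priors N_+(1, 1) and N_+(0, 1), constants dropped
        log_prior = -0.5 * (theta - 1.0) ** 2 - 0.5 * sigma**2
        return -(log_lik + log_prior)

    res = minimize(neg_log_post, x0, method="L-BFGS-B",
                   bounds=[(1e-6, None), (1e-6, None)])
    return res.x
```

The bounds keep both parameters positive, matching the truncated priors; the paper's full version estimates a vector $\theta$ per time step in the same way.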

### Simulation Study.

We condition on data generated from an extended Korteweg–de Vries (eKdV) equation with a cubic nonlinear term:

$${w}_{t}+\alpha w{w}_{x}+\beta {w}_{xxx}+\epsilon {w}^{3}{w}_{x}=0,$$

setting $\alpha =1$, $\beta =0.01$, and $\epsilon =20$. The misspecified KdV model (Eq. **3**) has the same coefficient values, with initial conditions set to a wave of depression $u\left(x,0\right)=-0.3\,{\text{sech}}^{2}\left(x-15\right)$ on the space–time grid $\left(x,t\right)\in \left[0,20\right]\times \left[0,50\right]$. Boundary conditions are periodic. The Gaussian random forcing ${\xi}_{\theta}$ has spatial covariance ${k}_{\theta}\left(x,{x}^{\prime}\right)={\tau}^{2}\mathrm{exp}\left(-\parallel x-{x}^{\prime}{\parallel}^{2}/\left(2{\ell}^{2}\right)\right)$ (we refer to $\tau$ as the scale parameter and $\ell$ as the length parameter).

Algorithm 2: EnKF algorithm |
---|

for $n\le {n}_{t}$ do |

(Prediction step) |

for $i\le {N}_{ens}$ do |

Solve $\mathcal{F}\left({\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]},{\mathbf{u}}^{n-1,\left[i\right]}\right)=0$. |

Compute ${\mathbf{m}}_{n|n-1}$ and ${\widehat{\mathbf{C}}}_{n|n-1}$ from $\left\{{\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]}\right\}$ |

Estimate: |

${\text{arg max}}_{{\mathit{\theta}}_{n},{\sigma}_{n}}\left\{\mathrm{log}p\left({\mathbf{y}}_{n}\text{\hspace{0.5em}}|\text{\hspace{0.5em}}{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)\right.$ |

$\left.+\mathrm{log}p\left({\mathit{\theta}}_{n}\right)+\mathrm{log}p\left({\sigma}_{n}\right)\right\}$ |

for $i\le {N}_{ens}$ do |

Solve $\mathcal{F}\left({\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]},{\mathbf{u}}^{n-1,\left[i\right]}\right)+{\mathbf{e}}_{n-1}^{\left[i\right]}=0$. |

Compute ${\mathbf{m}}_{n|n-1}$ and ${\mathbf{C}}_{n|n-1}$ from $\left\{{\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]}\right\}$ |

${\mathbf{S}}_{n}={\mathbf{H}}_{n}{\mathbf{C}}_{n|n-1}{\mathbf{H}}_{n}^{\top}+{\sigma}_{n}^{2}\mathbf{I}$. |

(Analysis step) |

for $i\le {N}_{ens}$ do |

${\mathbf{u}}^{n,\left[i\right]}={\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]}+{\mathbf{C}}_{n|n-1}{\mathbf{H}}_{n}^{\top}{\mathbf{S}}_{n}^{-1}\left({\mathbf{y}}_{n}+{\mathit{\eta}}_{n}^{\left[i\right]}-{\mathbf{H}}_{n}{\mathbf{u}}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}^{n,\left[i\right]}\right)$ |
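The EnKF analysis step above updates each ensemble member against perturbed observations; a self-contained sketch (our illustration, with the ensemble stored row-wise):

```python
import numpy as np

def enkf_analysis(U_pred, y, H, sigma, rng):
    """Stochastic EnKF analysis: update each ensemble member with
    perturbed observations. U_pred has shape (N_ens, M)."""
    N_ens = U_pred.shape[0]
    m = U_pred.mean(axis=0)
    X = U_pred - m                              # ensemble anomalies
    C = X.T @ X / (N_ens - 1)                   # sample covariance C_{n|n-1}
    S = H @ C @ H.T + sigma**2 * np.eye(len(y)) # innovation covariance
    K = C @ H.T @ np.linalg.inv(S)              # Kalman gain
    eta = rng.normal(0.0, sigma, size=(N_ens, len(y)))
    return U_pred + (y + eta - U_pred @ H.T) @ K.T
```

With a standard-normal prior ensemble, a direct observation $y=1$, and unit noise, the updated ensemble mean should approach the exact Kalman value of $0.5$ as the ensemble grows.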

$${u}_{t}+\alpha u{u}_{x}+\beta {u}_{xxx}+{\xi}_{\theta}=0$$

[3]

KdV is discretized following ref. 29, using ${P}_{1}$ trial functions and ${P}_{0}$ testing functions, with a Crank–Nicolson method in time. The data-generating process is simulated using Dedalus (30) with 1,024 grid points in space. This is then downsampled to 20 grid points and jittered with synthetic Gaussian observational noise (mean 0, variance $0.001^{2}$) to give the simulated dataset.

We assume ${\mathbf{y}}_{n}=\mathbf{H}{\mathbf{u}}^{n}+{\mathit{\eta}}_{n}$, where ${\mathbf{u}}^{n}$ is the Galerkin discretized solution to Eq. **3** and ${\mathit{\eta}}_{n}\sim \mathcal{N}\left(0,{\sigma}_{n}^{2}\mathbf{I}\right)$. Hyperparameters ${\mathit{\theta}}_{n}=\left({\tau}_{n},{\ell}_{n}\right)$ and the noise level ${\sigma}_{n}$ are estimated at each step by maximizing the log-marginal posterior, with the weakly informative truncated Gaussian priors ${\tau}_{n}\sim {\mathcal{N}}_{+}\left(1,{1}^{2}\right)$, ${\ell}_{n}\sim {\mathcal{N}}_{+}\left(1,{1}^{2}\right)$, and ${\sigma}_{n}\sim {\mathcal{N}}_{+}\left(0,{1}^{2}\right)$. The EnKF is used in this section, with ${n}_{x}=400$, ${n}_{t}=2{,}001$, and ${N}_{ens}=400$. Results are presented in Fig. 1.

For a fixed set of hyperparameters ${\tau}_{n}=0.0025$, ${\ell}_{n}=1$ for all $n$, the data-generating process and estimated prior are shown in Fig. 1*A*, appearing visually mismatched in the mean via phase shift, increased oscillations, and increased wave interactions. Note that the stochastic forcing induces an uncertainty about the PDE solution, represented by the $95\%$ credible intervals shown. Note also that the data-generating process is approximately contained within the credible intervals.

Fig. 1*B* shows that the posterior mean approximates the data-generating process, and the posterior uncertainty bounds have shrunk as a result of conditioning, indicating high certainty about the posterior mean values. Model discrepancy between the data and the statFEM solution has been corrected for. The space–time view of the posterior, shown in Fig. 1*C*, shows that the posterior has incorporated the complex soliton interactions in the data, not present in the prior.

Parameter estimates (Fig. 1*D*) indicate that the length and noise parameters are both stable, with the noise being slightly overestimated (i.e., ${\sigma}_{n}\approx 0.003>0.001$). Times at which the noise is not identified result in it being set to the lower bound. The scale parameter quantifies the accuracy of the model prediction step at each time step. In this case, model predictions vary in their accuracy and appear approximately bounded to within $\left(1{0}^{-5},1{0}^{-1}\right)$.

## Case Study: Experimental Data

We now apply the method to the experimental data collected in ref. 31. Experiments were conducted to study weakly nonlinear models for internal waves in lakes and consisted of generating internal waves in a two-layer stratified system, inside a clear acrylic tank of dimensions $6\times 0.3\times 0.29\ \text{m}$. The tank contained an upper layer of fresh water and a lower layer of saline water, with a density difference of $\mathrm{\Delta}\rho =20\ \text{kg}\,{\text{m}}^{-3}$. The tank was able to rotate in order to establish the initial conditions, which were an inclined plane of angle $\mathit{\vartheta}=0.5^{\circ}$. This initial condition mimics the shear induced by strong winds in lakes. At time $t=0$, the tank is rotated to restore it to the horizontal.

Data were recorded at three spatially equidistant locations in the tank using ultrasonic wave gauges (Fig. 2), taking measurements approximately every $0.01\ \text{s}$, up to $T=1{,}000\ \text{s}$; we use data up to $T=300\ \text{s}$. Data are measured in voltages and are postprocessed to give pycnocline displacements in meters. These data are plotted in Fig. 3, where the small measurement error is visually apparent. Transient behavior is observed before steepening, and a soliton wave train forms; three such steepening events are observed in the data we analyze. As $T\to 1{,}000\ \text{s}$, dissipation results in the wave profile approaching a flat steady-state profile.

Fig. 2.

Fig. 3.

Our physical model is an eKdV equation with a linear dissipation term, included for computational simplicity; we acknowledge that for laminar boundaries, other methods are preferred (31, 32). Including some form of dissipation is important, as otherwise the model becomes impractically mismatched by the end of the simulation. The eKdV is given by

$${u}_{t}+\alpha u{u}_{x}+\beta {u}_{xxx}+c{u}_{x}+\nu u+{\xi}_{\theta}=0$$

[4]

for $u := u\left(x,t\right)$, $x\in \left[0,L\right]$, $t\in \left[0,T\right]$, with coefficients

$$\alpha =\frac{3}{2}\frac{c\left({h}_{1}-{h}_{2}\right)}{{h}_{1}{h}_{2}},\quad \beta =\frac{c{h}_{1}{h}_{2}}{6},\quad c=\sqrt{\frac{{g}^{\prime}{h}_{1}{h}_{2}}{H}},\quad {g}^{\prime}=\frac{\mathrm{\Delta}\rho \,g}{{\rho}_{0}}.$$

We set ${\xi}_{\theta}$ as a GP as described previously, with spatial covariance kernel ${k}_{\theta}$ set to a squared exponential with scale and length hyperparameters $\tau$ and $\ell$. For the experiment under consideration, we have ${h}_{1}=0.232\ \text{m}$, ${h}_{2}=0.058\ \text{m}$, $H={h}_{1}+{h}_{2}=0.29\ \text{m}$, and ${\rho}_{0}=1{,}000\ \text{kg}\,{\text{m}}^{-3}$. The dissipation coefficient $\nu$ is an inverse timescale, set to $3\times 1{0}^{-3}\ {\text{s}}^{-1}$.

Incorporation of reflective boundary conditions is done by solving the eKdV equation across the extended domain $\left[0,2L\right]$ with periodic boundary conditions and summing solutions in the (reflected) subdomains $\left[0,L\right]$, $\left[L,2L\right]$:

$${u}_{\text{exact}}\left(x,t\right) := u\left(x,t\right)+u\left(2L-x,t\right),\quad x\in \left[0,L\right].$$

Details on the derivation are in ref. 31. Solutions to the deterministic version of Eq. **4** at the locations of the wave gauges are shown in Fig. 3. We show the deterministic solution instead of the prior due to the accumulation of errors over large simulation times.

The eKdV model does not capture the observed behavior exactly: the model waves have higher velocity than the observations, and model amplitudes are slightly larger than observed amplitudes. We conjecture that this is due to misparameterization of the dissipation, but in any case, the model is misspecified. Rather than estimating the eKdV parameters using inversion techniques, we sequentially update the model with observations to give the posterior $p\left({\mathbf{u}}^{n}\;|\;{\mathbf{y}}_{1:n},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)$.
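The reflection construction is a cheap post-processing step; a minimal sketch (our illustration, assuming the periodic solve is sampled at ${x}_{j}=j\,\mathrm{\Delta}x$ on $\left[0,2L\right)$):

```python
import numpy as np

def reflect_solution(u_ext):
    """Given a periodic solution u_ext sampled at x_j = j*dx, j = 0..2n-1,
    on [0, 2L), return u_exact(x_j) = u(x_j) + u(2L - x_j) for j = 0..n."""
    n = len(u_ext) // 2
    idx = (-np.arange(n + 1)) % len(u_ext)    # grid index of 2L - x_j
    return u_ext[: n + 1] + u_ext[idx]
```

Modes antisymmetric about $x=L$ cancel under this sum, while symmetric modes double, which is what enforces the reflective (zero-flux) behavior at the wall.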

As before, we assume ${\mathbf{y}}_{n}=\mathbf{H}{\mathbf{u}}^{n}+{\mathit{\eta}}_{n}$ with known noise ${\mathit{\eta}}_{n}\sim \mathcal{N}\left(\mathbf{0},1.3588\times 1{0}^{-8}\,\mathbf{I}\right)$. As we solve on the extended domain $\left[0,2L\right]$ and sum solutions, the observation operator $\mathbf{H}$ is taken to be the sum of the appropriate function values given by our FEM interpolant (a linear operation). The observation points are unchanging, as is the solution mesh, so $\mathbf{H}$ is constant in time. The hyperparameters of ${\xi}_{\theta}$, ${\mathit{\theta}}_{n}=\left({\tau}_{n},{\ell}_{n}\right)$, must be estimated at each iteration by maximizing the log-marginal posterior. Because the data are sparse in space (three observations at each time step), we use a projection method to estimate the hyperparameters. This linearly projects the predicted mean ${\mathbf{m}}_{n|n-1}$ forward, with coefficients estimated from the data points: ${y}_{i}^{n}={a}_{n}+{b}_{n}{\mathbf{m}}_{n|n-1}\left({x}_{i}\right)$. Parameters ${a}_{n},{b}_{n}$ are estimated to give the best least squares linear projection from the prediction to the data. This gives a projected dataset via the linear shift ${\tilde{\mathbf{y}}}_{n}={a}_{n}+{b}_{n}{\mathbf{m}}_{n|n-1}$. The estimated hyperparameters are then given by ${\text{arg max}}_{{\mathit{\theta}}_{n}}\left\{\mathrm{log}\,p\left({\tilde{\mathbf{y}}}_{n}\;|\;{\mathbf{y}}_{1:n-1},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)+\mathrm{log}\,p\left({\mathit{\theta}}_{n}\right)\right\}$, in which the observed data ${\mathbf{y}}_{n}$ are replaced with the projected data ${\tilde{\mathbf{y}}}_{n}$. We project to a grid of 100 points uniformly spaced across the solution grid. Note that this is used only for the parameter estimation step; we do not use ${\tilde{\mathbf{y}}}_{n}$ as the data in the analysis step.
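The projection step can be sketched as follows (our illustration, with hypothetical inputs: `m_pred_at_obs` is the predicted mean evaluated at the gauge locations, `m_pred_grid` on the projection grid):

```python
import numpy as np

def project_data(y_obs, m_pred_at_obs, m_pred_grid):
    """Fit y_i ~ a + b * m_pred(x_i) by least squares from the sparse
    observations, then form the projected data y~ = a + b * m_pred on
    the full projection grid."""
    X = np.column_stack([np.ones_like(m_pred_at_obs), m_pred_at_obs])
    (a, b), *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    return a + b * m_pred_grid
```

When the data truly are an affine shift of the predicted mean, the projection recovers that shift exactly, densifying three gauge readings into a grid-wide pseudo-dataset for hyperparameter estimation only.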

We set weakly informative priors: ${\tau}_{n}\sim {\mathcal{N}}_{+}\left(1,{1}^{2}\right)$ and ${\ell}_{n}\sim {\mathcal{N}}_{+}\left(1,{1}^{2}\right)$. The posterior is computed using the ensemble method with ${n}_{x}=200$, ${n}_{t}=1{,}001$, and ${N}_{ens}=2{,}048$. Results are shown in Fig. 3. The posterior mean values of $p\left({\mathbf{u}}^{n}\;|\;{\mathbf{y}}_{1:n},{\mathit{\theta}}_{1:n},{\sigma}_{1:n}\right)$ at the wave gauges are shown in Fig. 3*A* and offer a close fit to the data in comparison with the eKdV solution. The credible intervals shrink about the data (compare Fig. 3*B*) and are not visible on the figure.

Posterior wave profiles are shown in Fig. 3*B* and demonstrate that, given the data, the method yields a sensible estimate of the underlying wave profile and is hence able to reconstruct the wave profile from sparse observations in space (Fig. 3*D*). Furthermore, the provided uncertainty quantification is physically sensible, with bounds contracting about the data and expanding near the boundaries.

The hyperparameters, ${\mathit{\theta}}_{n}$, are shown in Fig. 3*C*. The scale parameter varies between two distinct levels, indicating that the model predictions vary in their accuracy. The amplitude of these mismatch scales shows that, in this case, model mismatch is a cumulative effect that takes some time before it becomes apparent (Fig. 3*A*, eKdV solution). Repeated conditioning on data helps to mitigate these long-timescale effects through continual updating. A space–time view of the posterior mean wave profile is shown in Fig. 3*D*, demonstrating that the general behavior of the flow (e.g., reflective boundary conditions, dissipation, wave train formation) is indeed captured.

## Conclusions

We present a data-driven approach to the FEM that assimilates observations into nonlinear, time-dependent PDEs by embedding model-misspecification uncertainty in the governing equations and sequentially updating the discretized equations with observations in a filtering context. The examples presented using the KdV equation (and the additional systems studied in *SI Appendix*) demonstrate that the method can approximate the data-generating process to give an interpretable posterior distribution, which reconstructs the observed phenomena.

This work sets the foundation for future studies of embedding data within FEM models. The underlying Kalman framework permits drawing on ideas from high-dimensional data assimilation, which, for the systems studied here, were not needed. Techniques from Bayesian inversion can also be used to provide uncertainty quantification for physical quantities of interest, which will also allow for more accurate prediction. Finally, we believe that developing similar methodology for alternative discretizations (e.g., spectral methods) could be of great benefit, allowing even broader application.
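The underlying Kalman framework invoked above centers on the ensemble analysis (update) step. For reference, a minimal sketch of a generic perturbed-observation ensemble Kalman update is given below; it follows the standard EnKF formulation, not the authors' specific implementation, and all names are illustrative:

```python
import numpy as np

def enkf_analysis(ensemble, H, y, R, rng=None):
    """Perturbed-observation EnKF analysis step.
    ensemble: (n_x, N) forecast ensemble, one member per column
    H: (n_y, n_x) linear observation operator
    y: (n_y,) observation; R: (n_y, n_y) observation noise covariance."""
    rng = np.random.default_rng() if rng is None else rng
    n_x, N = ensemble.shape
    # Sample forecast covariance from ensemble anomalies
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = X @ X.T / (N - 1)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    # Perturb observations so the analysis ensemble has correct spread
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return ensemble + K @ (Y - H @ ensemble)
```

With small observation noise and a direct observation operator, the analysis ensemble collapses toward the data, mirroring the shrinking credible intervals seen at the wave gauges.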

## Data Availability

Data and code have been deposited on GitHub (https://github.com/connor-duffin/statkdv-paper).

## Acknowledgments

We thank the two anonymous referees whose suggestions greatly improved the manuscript. C.D. was supported by a Bruce and Betty Green Postgraduate Research Scholarship and an Australian Government Research Training Program Scholarship at the University of Western Australia. C.D. and E.C. were supported by Australian Research Council Industrial Transformation Research Hub Grant IH140100012. E.C. was supported by Australian Research Council Industrial Transformation Training Centre Grant IC190100031. M.G. was supported by Engineering and Physical Sciences Research Council Grants EP/R034710/1, EP/R018413/1, EP/R004889/1, and EP/P020720/1 and a Royal Academy of Engineering Research Chair.

## Supporting Information

Appendix (PDF)


## Information & Authors


#### Copyright

© 2021. This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).


#### Submission history

**Published online**: December 28, 2020

**Published in issue**: January 12, 2021


#### Notes

This article is a PNAS Direct Submission.

*It is available at https://github.com/connor-duffin/statkdv-paper.

†Crank–Nicolson may also be used to ensure stability.

‡From here on in, we implicitly condition on PDE coefficients $\mathrm{\Lambda}$.

#### Competing Interests

The authors declare no competing interest.
