# The difficulty of computing stable and accurate neural networks: On the barriers of deep learning and Smale’s 18th problem

Edited by Ronald DeVore, Texas A&M University, College Station, TX; received April 16, 2021; accepted October 26, 2021

## Significance

Instability is the Achilles’ heel of modern artificial intelligence (AI) and a paradox, with training algorithms finding unstable neural networks (NNs) despite the existence of stable ones. This foundational issue relates to Smale’s 18th mathematical problem for the 21st century on the limits of AI. By expanding methodologies initiated by Gödel and Turing, we demonstrate limitations on the existence of (even randomized) algorithms for computing NNs. Despite numerous existence results of NNs with great approximation properties, only in specific cases do there also exist algorithms that can compute them. We initiate a classification theory on which NNs can be trained and introduce NNs that—under suitable conditions—are robust to perturbations and exponentially accurate in the number of hidden layers.

## Abstract

Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force. However, current DL methods typically suffer from instability, even when universal approximation properties guarantee the existence of stable neural networks (NNs). We address this paradox by demonstrating basic well-conditioned problems in scientific computing where one can prove the existence of NNs with great approximation qualities; however, there does not exist any algorithm, even randomized, that can train (or compute) such a NN. For any positive integers $K > 2$ and *L*, there are cases where simultaneously 1) no randomized training algorithm can compute a NN correct to *K* digits with probability greater than 1/2; 2) there exists a deterministic training algorithm that computes a NN with *K* − 1 correct digits, but any such (even randomized) algorithm needs arbitrarily many training data; and 3) there exists a deterministic training algorithm that computes a NN with *K* − 2 correct digits using no more than *L* training samples. These results imply a classification theory describing conditions under which (stable) NNs with a given accuracy can be computed by an algorithm. We begin this theory by establishing sufficient conditions for the existence of algorithms that compute stable NNs in inverse problems. We introduce fast iterative restarted networks (FIRENETs), which we both prove and numerically verify are stable. Moreover, we prove that only $\mathcal{O}(|\log(\epsilon)|)$ layers are needed for an *ϵ*-accurate solution to the inverse problem.


Deep learning (DL) has demonstrated unparalleled accomplishments in fields ranging from image classification and computer vision (1–3), to voice recognition and automated diagnosis in medicine (4–6), to inverse problems and image reconstruction (7–12). However, there is now overwhelming empirical evidence that current DL techniques typically lead to unstable methods, a phenomenon that seems universal and present in all of the applications listed above (13–21) and in most of the new artificial intelligence (AI) technologies. These instabilities are often detected by what has become commonly known in the literature as "adversarial attacks." Moreover, the instabilities can be present even in random cases and not just worst-case scenarios (22); see Fig. 1 for an example of AI-generated hallucinations. There is a growing awareness of this problem in high-stakes applications and society as a whole (20, 23, 24), and instability seems to be the Achilles' heel of modern AI and DL (Fig. 2, *Top* row). This is, for example, a problem in real-world clinical practice. Facebook and New York University's 2019 FastMRI challenge reported that networks that performed well in terms of standard image quality metrics were prone to false negatives, failing to reconstruct small, but physically relevant, image abnormalities (25). Subsequently, the 2020 FastMRI challenge (26) focused on pathologies, noting, "Such hallucinatory features are not acceptable and especially problematic if they mimic normal structures that are either not present or actually abnormal. Neural network models can be unstable as demonstrated via adversarial perturbation studies (19)." For similar examples in microscopy, see refs. 27 and 28. The tolerance level for false positives/negatives varies among applications. However, for scenarios with a high cost of misanalysis, it is imperative that false negatives/positives be avoided. AI-generated hallucinations therefore pose a serious danger in applications such as medical diagnosis.

Fig. 1.

Fig. 2.

Nevertheless, classical approximation theorems show that a continuous function can be approximated arbitrarily well by a neural network (NN) (29, 30). Thus, stable problems described by stable functions can always be solved stably with a NN. This leads to the following fundamental question:

## Question.

*Why does DL lead to unstable methods and AI-generated hallucinations, even in scenarios where one can prove that stable and accurate neural networks exist?*

## Foundations of AI for Inverse Problems.

To answer the above question, we initiate a program on the foundations of AI, determining the limits of what DL can achieve in inverse problems. It is crucial to realize that an existence proof of suitable NNs does not always imply that they can be constructed by a training algorithm. Furthermore, it is not difficult to compute stable NNs: the zero network, for example, is stable, but not particularly useful. The key problem is to compute NNs that are both stable and accurate (30, 31). Scientific computing itself is based on the pillars of stability and accuracy. However, there is often a trade-off between the two. There may be barriers preventing the existence of stable and accurate algorithms, and sometimes accuracy must be sacrificed to secure stability.

## Main Results.

We consider the canonical inverse problem of an underdetermined system of linear equations:

$$\text{Given measurements } y = Ax + e \in \mathbb{C}^{m}, \text{ recover } x \in \mathbb{C}^{N}.$$

[1]

Here, $A\in \mathbb{C}^{m\times N}$ represents a sampling model (*m* < *N*), such as a subsampled discrete Fourier transform in MRI, and *x* the unknown quantity. The problem in Eq. **1** forms the basis for much of inverse problems and image analysis. The vector *e* models noise or perturbations. Our results demonstrate fundamental barriers preventing NNs (despite their existence) from being computed by algorithms. This helps shed light on the intricate question of why current algorithms in DL produce unstable networks, despite the fact that stable NNs often exist in the particular application. We show the following:

1) *Theorems 1* and *2*: There are well-conditioned problems (suitable condition numbers bounded by 1) where, paradoxically, mappings from training data to suitable NNs exist, but no training algorithm (even randomized) can compute approximations of the NNs from the training data.

2) *Theorem 2*: The existence of algorithms computing NNs depends on the desired accuracy. For any $K\in \mathbb{Z}_{\ge 3}$, there are well-conditioned classes of problems where simultaneously 1) algorithms may compute NNs to *K* − 1 digits of accuracy, but not *K*; 2) achieving *K* − 1 digits of accuracy requires arbitrarily many training data; and 3) achieving *K* − 2 correct digits requires only one training datum.

3) *Theorems 3* and *4*: Under specific conditions that are typically present in, for example, MRI, there are algorithms that compute stable NNs for the problem in Eq. **1**. These NNs, which we call fast iterative restarted networks (FIRENETs), converge exponentially in the number of hidden layers. Crucially, we prove that FIRENETs are robust to perturbations (Fig. 2, *Bottom* row), and they can even be used to stabilize unstable NNs (Fig. 3).

Fig. 3.

4) There is a trade-off between stability and accuracy in DL, with limits on how well a stable NN can perform in inverse problems. Fig. 4 demonstrates this with a U-net trained on images consisting of ellipses: the network is quite stable, yet when a detail not present in the training set is added to the image, the network washes it out almost entirely. FIRENETs offer a blend of both stability and accuracy. However, they are by no means the end of the story. Tracing out the optimal stability vs. accuracy trade-off is crucial for applications and will no doubt require a myriad of different techniques to tackle different problems and stability tolerances.

Fig. 4.

## Fundamental Barriers

We first consider basic mappings used in modern mathematics of information, inverse problems, and optimization. Given a matrix $A\in \mathbb{C}^{m\times N}$ and a vector $y\in \mathbb{C}^{m}$, we consider the following three popular minimization problems:

$$\begin{aligned}
(P_1)\quad &\operatorname*{argmin}_{x\in\mathbb{C}^N} F_1^A(x)\coloneqq \|x\|_{l^1_w}, \quad\text{s.t. } \|Ax-y\|_{l^2}\le \epsilon,\\
(P_2)\quad &\operatorname*{argmin}_{x\in\mathbb{C}^N} F_2^A(x,y,\lambda)\coloneqq \lambda\|x\|_{l^1_w}+\|Ax-y\|_{l^2}^2,\\
(P_3)\quad &\operatorname*{argmin}_{x\in\mathbb{C}^N} F_3^A(x,y,\lambda)\coloneqq \lambda\|x\|_{l^1_w}+\|Ax-y\|_{l^2},
\end{aligned}$$

known respectively as quadratically constrained basis pursuit [we always assume existence of a feasible *x* for $(P_1)$], unconstrained least absolute shrinkage and selection operator (LASSO), and unconstrained square-root LASSO. Such sparse regularization problems are often used as benchmarks for Eq. **1**, and we prove impossibility results for computing the NNs that can approximate these mappings. Our results initiate a classification theory on which NNs can be computed to a certain accuracy.

The parameters *λ* and *ϵ* are positive rational numbers, and the weighted $l^1_w$ norm is given by $\|x\|_{l^1_w}\coloneqq \sum_{l=1}^{N} w_l |x_l|$, where each weight $w_l$ is a positive rational. Throughout, we let

$$\Xi_j(A,y)\ \text{be the set of minimizers for}\ (P_j).$$

[2]
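The mappings $\Xi_j$ are defined through convex optimization problems, and an approximate minimizer of, e.g., $(P_2)$ can be produced by standard proximal gradient (ISTA) iterations. The following is a minimal sketch for real-valued data (the paper works over $\mathbb{C}$ and shows that $\Xi_j$ cannot be computed to arbitrary accuracy by any algorithm; this sketch only illustrates the objective, and `ista_weighted_lasso` is our hypothetical helper, not the authors' method):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1: componentwise shrinkage toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_weighted_lasso(A, y, lam, w, iters=2000):
    """Approximately minimize lam * ||x||_{l1,w} + ||Ax - y||_2^2
    by proximal gradient descent (ISTA), for real-valued data."""
    # The smooth part has gradient 2 A^T (Ax - y) with Lipschitz constant
    # 2 ||A||^2, so tau below is a safe step size guaranteeing descent.
    tau = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - 2.0 * tau * (A.T @ (A @ x - y)), tau * lam * w)
    return x
```

Each iteration takes a gradient step on the data-fidelity term and then applies the (weighted) soft-thresholding proximal map of the $l^1_w$ penalty.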

Let $A\in \mathbb{C}^{m\times N}$ and let $\mathcal{S}=\{y_k\}_{k=1}^{R}\subset \mathbb{C}^{m}$ be a collection of samples ($R\in\mathbb{N}$). We consider the following key question:

### Question.

*Given a collection* Ω *of pairs* $(A,\mathcal{S})$, *does there exist a neural network approximating* $\Xi_j$, *and if so, can such an approximation be trained or determined by an algorithm?*

To make this question precise, note that *A* and the samples in $\mathcal{S}$ will typically never be exact, but can be approximated/stored to arbitrary precision. For example, this would occur if *A* was a subsampled discrete cosine transform. Thus, we assume access to rational approximations $\{y_{k,n}\}_{k=1}^{R}$ and $A_n$ with

$$\|y_{k,n}-y_k\|_{l^2}\le 2^{-n},\qquad \|A_n-A\|\le 2^{-n},\qquad \forall n\in\mathbb{N},$$

[3]

where $\|\cdot\|$ refers to the usual Euclidean operator norm. The bounds $2^{-n}$ are simply for convenience and can be replaced by any other sequence converging to zero. We also assume access to rational $\{x_{k,n}\}_{k=1}^{R}$ with

$$\inf_{x^*\in \Xi_j(A_n,y_{k,n})}\|x_{k,n}-x^*\|_{l^2}\le 2^{-n},\qquad \forall n\in\mathbb{N}.$$

[4]

Hence, the training data associated with $(A,\mathcal{S})\in\Omega$ must be

$$\iota_{A,\mathcal{S}}\coloneqq \left\{\left(y_{k,n},A_n,x_{k,n}\right)\,\middle|\; k=1,\dots,R,\ \text{and}\ n\in\mathbb{N}\right\}.$$

[5]

This set is formed of arbitrary precision rational approximations of finite collections of data associated with $(A,\mathcal{S})$. Given a collection Ω of pairs $(A,\mathcal{S})$, the class of all such admissible training data is denoted by

$$\Omega_{\mathcal{T}}\coloneqq \left\{\iota_{A,\mathcal{S}}\ \text{as in Eq. } \mathbf{5}\ \middle|\ (A,\mathcal{S})\in\Omega,\ \text{Eqs. } \mathbf{3}\text{ to }\mathbf{4}\text{ hold}\right\}.$$
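Eqs. **3** to **5** only require rational approximations to accuracy $2^{-n}$; such dyadic approximations are straightforward to generate. A small illustrative sketch (the helper `dyadic_approx` is ours, not from the paper):

```python
import math
from fractions import Fraction

def dyadic_approx(v, n):
    """Return dyadic rationals v_n (exact Fractions) with ||v_n - v||_2 <= 2^(-n),
    mimicking the accuracy requirement on training data in Eq. 3."""
    m = len(v)
    # Rounding each entry to the grid 2^(-p) gives per-entry error <= 2^(-p-1),
    # hence l2 error <= sqrt(m) * 2^(-p-1); choose p so this is <= 2^(-n).
    p = n + math.ceil(0.5 * math.log2(m))
    scale = 2 ** p
    return [Fraction(round(x * scale), scale) for x in v]
```

Increasing *n* refines the same data to higher precision, which is exactly the sense in which an algorithm "accesses" $\iota_{A,\mathcal{S}}$.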

Statements addressing the above question are summarized in *Theorems 1* and *2*. We use $\mathcal{N}_{m,N}$ to denote the class of NNs from $\mathbb{C}^m$ to $\mathbb{C}^N$. We use standard definitions of feedforward NNs (32), precisely given in *SI Appendix*.

Despite the existence of NNs guaranteed by *Theorem 1*, computing or training such a NN from training data is most delicate. The following is stated precisely and proved in *SI Appendix*. We also include results for randomized algorithms, which are common in DL (e.g., stochastic gradient descent).

#### Remark 1 (condition and class size).

The statement in Theorem 2 refers to the standard condition numbers used in optimization and scientific computing. For precise definitions, see *SI Appendix*. The class Ω we construct is infinite. Similarly, one can design a finite class Ω with the same conclusion by allowing the sample size *R* to be infinite.

#### Remark 2 (distributions on training data).

In DL it is often the case that one assumes some probability distribution on the training data. This is not needed for Theorem 2. However, having a probability distribution on the training data ${\iota}_{A,\mathcal{S}}$ would not invalidate statement 1 in Theorem 2. In particular, there is no (computable) probability distribution that would make statement 1 in Theorem 2 cease to be true. This follows from the probabilistic part of statement 1 in Theorem 2, as the existence of such a (computable) distribution and an algorithm would yield a randomized algorithm violating statement 1 in Theorem 2.

#### Remark 3 (on the role of K in Theorem 2).

The result should be understood as fixing an integer *K* (and *L*) and then $\Omega=\Omega(K,L)$ depends on *K* and *L*. However, given a particular Ω, one can ask, what is the largest *K* such that one can compute *K* correct digits? Note that we typically have $K=\lfloor \log(\epsilon^{-1})\rfloor$, where $\epsilon>0$ is the so-called breakdown epsilon of the problem (33), i.e., the largest $\epsilon>0$ for which all algorithms will fail to provide *ϵ* accuracy. When the breakdown epsilon $\epsilon>0$, it is typically impossible to check whether an algorithm fails (33). Thus, even if an algorithm would succeed with probability 1/2, one could never trust the output.

#### Remark 4 (Gödel, Turing, Smale, and Theorem 2).

Theorems 1 and 2 demonstrate basic limitations on the existence of algorithms that can compute NNs despite their existence. This relates to Smale's 18th problem, "What are the limits of intelligence, both artificial and human?", from the list of mathematical problems for the 21st century (34), which echoes the Turing test from 1950 (35). Smale's discussion is motivated by the results of Gödel (36) and Turing (37) establishing impossibility results on what mathematics and digital computers can achieve (38). Our results are, however, stronger than what can be obtained with Turing's techniques: Theorem 2 holds even for any randomized Turing or BSS machine that can solve the halting problem. It immediately opens up a classification theory on which NNs can be computed by randomized algorithms, and Theorem 3 is a first step in this direction. See also the work by Niyogi, Smale, and Weinberger (39) on existence results of algorithms for learning.

### Numerical Example.

To highlight the impossibility of computing NNs (Theorem 2), despite their existence by Theorem 1, we consider the following numerical example: Consider the problem $(P_3)$, with $w_l=1$ and $\lambda=1$. Theorem 2 is stated for a specific input class $\Omega=\Omega(K)$ depending on the accuracy parameter *K*, and in this example we consider three different such classes. In Theorem 2, we required that *K* > 2 so that $K-2>0$, but this is not necessary to show the impossibility statement 1, so we consider *K* = 1, 3, 6. Full details of the following experiment are given in *SI Appendix*.

To show that it is impossible to compute NNs that can solve $(P_3)$ to arbitrary accuracy, we consider FIRENETs $\Phi_{A_n}$ (the NNs in Theorem 3) and learned ISTA (LISTA) networks $\Psi_{A_n}$ based on the architecture choice from ref. 40. The networks are trained to high accuracy on training data of the form of Eq. **5** with *R* = 8,000 training samples and *n* given as in Table 1. In all cases *N* = 20, $m=N-1$, and the $x_{k,n}$ s minimizing $(P_3)$ with input data $(y_{k,n},A_n)$ are all 6-sparse. The choice of *N*, *m*, and sparsity is to allow for fast training; other choices are certainly possible.

Table 1.

| $\Psi_{A_n}$ | $\Phi_{A_n}$ | $\|A_n-A\|\le 2^{-n}$, $\|y_n-y\|_{l^2}\le 2^{-n}$ | $10^{-K}$ | $\Omega(K)$ |
|---|---|---|---|---|
| 0.2999690 | 0.2597827 | n = 10 | 10^{-1} | K = 1 |
| 0.3000000 | 0.2598050 | n = 20 | 10^{-1} | K = 1 |
| 0.3000000 | 0.2598052 | n = 30 | 10^{-1} | K = 1 |
| 0.0030000 | 0.0025980 | n = 10 | 10^{-3} | K = 3 |
| 0.0030000 | 0.0025980 | n = 20 | 10^{-3} | K = 3 |
| 0.0030000 | 0.0025980 | n = 30 | 10^{-3} | K = 3 |
| 0.0000030 | 0.0000015 | n = 10 | 10^{-6} | K = 6 |
| 0.0000030 | 0.0000015 | n = 20 | 10^{-6} | K = 6 |
| 0.0000030 | 0.0000015 | n = 30 | 10^{-6} | K = 6 |

We demonstrate statement 1 from Theorem 2 on FIRENETs $\Phi_{A_n}$ and LISTA networks $\Psi_{A_n}$. Shown is the shortest $l^2$ distance between the output from the networks and the true solution of the problem $(P_3)$, with $w_l = 1$ and *λ* = 1, for different values of *n* and *K*. Note that none of the networks can compute the existing correct NN (which exists by Theorem 1 and coincides with $\Xi_3$) to $10^{-K}$ digits accuracy, while all of them are able to compute approximations that are accurate to $10^{-K+1}$ digits [for the input class $\Omega(K)$]. This agrees exactly with Theorem 2.

Table 1 shows the errors for both LISTA and FIRENETs. Both network types are given input data $(y_n, A_n)$, approximating the true data $(y, A)$. As is clear from Table 1, none of the networks are able to compute an approximation to the true minimizer in $\Xi_3(A,y)$ to *K* digits accuracy. However, both networks compute an approximation with *K* − 1 digits accuracy. These observations agree precisely with Theorem 2.

### The Subtlety and Difficulty of Removing Instabilities and the Need for Additional Assumptions.

Theorem 2 shows that the problems $(P_j)$ cannot, in general, be solved by any training algorithm. Hence, any attempt at using the problems $(P_j)$ as approximate solution maps of the general inverse problem in Eq. **1**, without additional assumptions, is doomed to fail. This is not just the case for reconstruction using sparse regularization, but also applies to other methods. In fact, any stable and accurate reconstruction procedure must be "kernel aware" (22), a property that most DL methods do not enforce. A reconstruction method $\Psi:\mathbb{C}^m\to\mathbb{C}^N$ lacks kernel awareness if it approximately recovers two vectors

$$\|\Psi(Ax)-x\|\le\epsilon \quad\text{and}\quad \|\Psi(Ax')-x'\|\le\epsilon$$

[6]

whose difference $\|x-x'\|\gg 2\epsilon$ is large, but where the difference lies close to the null space of *A* (which is nontrivial due to *m* < *N*) so that $\|A(x-x')\|<\epsilon$. In particular, by applying Eq. **6** and the triangle inequality twice, we have that

$$\|\Psi(Ax)-\Psi(Ax')\|\ge\|x-x'\|-2\epsilon,$$

[7]

implying instability, as it requires only a perturbation $e=A(x'-x)$ of size $\|e\|<\epsilon$ for $\Psi(Ax+e)=\Psi(Ax')$ to reconstruct the wrong image. The issue here is that if we want to accurately recover *x* and $x'$, i.e., we want Eq. **6** to hold, then we cannot simultaneously have that $x-x'$ lies close to the kernel. Later we shall see conditions that circumvent this issue for our model class, thereby allowing us to compute stable and accurate NNs.

While training can encourage the conditions in Eq. **6** to hold, it is not clear how many of the defense techniques in DL, simultaneously, will protect against the condition $\|A(x-x')\|<\epsilon$. One standard attempt to remedy instabilities is adversarial training (41). However, while this strategy can potentially avoid Eq. **6**, it may yield poor performance. For example, consider the following optimization problem, which generates a reconstruction in the form of a NN given samples $\Theta=\{(y_s,x_s): s=1,\dots,R,\ Ax_s=y_s\}$ and $\epsilon,\lambda>0$:

$$\min_{\varphi\in\mathcal{N}_{m,N}}\ \sum_{s=1}^{R}\ \max_{\|e\|_{l^2}\le\epsilon}\left\{\|x_s-\varphi(y_s)\|_{l^2}^2+\lambda\|x_s-\varphi(y_s+e)\|_{l^2}^2\right\}.$$

[8]

In other words, for each training point $(y,x)\in\Theta$ we find the worst-case perturbation *e* in the *ϵ*-ball around *y*. This is a simplified model of what one might do using generative adversarial networks (GANs) to approximate adversarial perturbations (42, 43). For simplicity, assume that *A* has full row rank *m* and that we have access to exact measurements $y_s=Ax_s$. Suppose that our sample is such that $\min_{i\ne j}\|y_i-y_j\|_{l^2}>2\epsilon$. In this case, $\varphi$ minimizes Eq. **8** if and only if $\varphi(y_s+e)=x_s$ for all *e* with $\|e\|_{l^2}\le\epsilon$. A piecewise affine network achieving this can easily be constructed using ReLU (rectified linear unit) activation functions. Now suppose that $x_2$ is altered so that $x_1-x_2$ lies in the kernel of *A*. Then for any minimizer $\varphi$, we must have $\varphi(y_1+e)=\varphi(y_2+e)=(x_1+x_2)/2$ for any *e* with $\|e\|_{l^2}\le\epsilon$, and hence we can never be more than $\|x_1-\varphi(y_1)\|=\|x_1-x_2\|_{l^2}/2$ accurate over the whole test sample. Similar arguments apply to other methods aimed at improving robustness, such as adding noise to training samples (known as "jittering") (Fig. 4). Given such examples and Theorem 2, we arrive at the following question:

### Question.

*Are there sufficient conditions on A that imply the existence of an algorithm that can compute a neural network that is both stable and accurate for the problem in Eq.* **1**?
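Before turning to sufficient conditions, the null-space mechanism behind Eqs. **6** and **7** is easy to verify numerically: two far-apart vectors whose difference lies in the kernel of *A* produce (numerically) identical measurements, so no reconstruction map can recover both accurately and stably. A minimal sketch (matrix sizes and the random seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 3, 6                       # underdetermined: m < N
A = rng.standard_normal((m, N))

# The last rows of Vt from the SVD span ker(A) (A has full row rank here).
_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                        # unit vector with A @ v ≈ 0

x = rng.standard_normal(N)
x_prime = x + 5.0 * v             # far from x ...
e = A @ (x_prime - x)             # ... yet the induced measurement change

# ||x - x'|| is large while ||A(x - x')|| is numerically zero: by Eq. 7, any
# method that recovers both x and x' to accuracy eps << 5/2 must be unstable.
print(np.linalg.norm(x - x_prime))   # ≈ 5
print(np.linalg.norm(e))             # ≈ 0 (machine precision)
```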

## Sufficient Conditions for Algorithms to Compute Stable and Accurate NNs

Sparse regularization, such as the problems $(P_j)$, forms the core of many state-of-the-art reconstruction algorithms for inverse problems. We now demonstrate a sufficient condition (from compressed sensing) guaranteeing the existence of algorithms for stable and accurate NNs. Sparsity in levels is a standard sparsity model for natural images (44–47), as images are sparse in levels in X-lets (wavelets, curvelets, shearlets, etc.).

This model has been used to explain the effectiveness of compressed sensing (46, 48–52) in real-life applications (53). For simplicity, we assume that each $s_l>0$ and that $w_i=w_{(l)}$ if $M_{l-1}+1\le i\le M_l$ (the weights in the $l^1_w$ norm are constant in each level). For a vector *c* that is compressible in the wavelet basis, $\sigma_{\mathbf{s},\mathbf{M}}(x)_{l^1_w}$ is expected to be small if *x* is the vector of wavelet coefficients of *c* and the levels correspond to wavelet levels (54). In general, the weights are a prior on the anticipated support of the vector (55), and we discuss some specific optimal choices in *SI Appendix*.

For $\mathcal{I}\subset\{1,\dots,N\}$, let $P_{\mathcal{I}}\in\mathbb{C}^{N\times N}$ denote the projection $(P_{\mathcal{I}}x)_i=x_i$ if $i\in\mathcal{I}$ and $(P_{\mathcal{I}}x)_i=0$ otherwise. The key kernel-aware property that allows for stable and accurate recovery of $(\mathbf{s},\mathbf{M})$-sparse vectors for the inverse problem Eq. **1** is the weighted robust null space property in levels (wrNSPL). We highlight that if *A* satisfies the wrNSPL, then

$$\|x-x'\|_{l^2}\le C\|A(x-x')\|_{l^2},\qquad\forall x,x'\in\Sigma_{\mathbf{s},\mathbf{M}},$$

where $C=C(\rho,\gamma)>0$ is a constant depending only on *ρ* and *γ* (*SI Appendix*). This ensures that if $\|x-x'\|_{l^2}\gg 2\epsilon$, then we cannot, simultaneously, have that $\|A(x-x')\|<\epsilon$, causing the instability in Eq. **7**. Below, we give natural examples of sampling in compressed imaging where such a property holds, for known *ρ* and *γ*, with large probability. We can now state a simplified version of our result (the full version with explicit constants is given and proved in *SI Appendix*).

Hence, up to the small error term $\sigma_{\mathbf{s},\mathbf{M}}(x)_{l^1_w}$, as $n\to\infty$ (with exponential convergence), we recover *x* stably with an error proportional to the measurement error $\|Ax-y\|_{l^2}$. The explicit constant in front of the $\|Ax-y\|_{l^2}$ term can be thought of as an asymptotic local Lipschitz constant for the NNs as $n\to\infty$ and thus measures stability with respect to inexact input *y*. The error of order $\sigma_{\mathbf{s},\mathbf{M}}(x)_{l^1_w}$ measures how far the vector *x* is from the model class of sparse in levels vectors. In the full version of Theorem 3, we also bound the error when we only approximately apply the nonlinear maps of the NNs and show that these errors can only accumulate slowly as *n* increases. In other words, we also gain a form of numerical stability of the forward pass of the NN. We call our NNs FIRENETs.

### Remark 5 (unrolling does not in general yield an algorithm producing an accurate network).

Unrolling iterative methods has a rich history in DL (9, 56). Note, however, that Theorem 2 demonstrates that despite the existence of an accurate neural network, there are scenarios where no algorithm exists that can compute it. Thus, unrolling optimization methods can work only under certain assumptions. Our results are related to the work of Ben-Tal and Nemirovski (57) (see also ref. 58), which shows how key assumptions such as the robust nullspace property help bound the error of the approximation to a minimizer in terms of error bounds on the approximation to the objective function. This is related to robust optimization (59).
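In its simplest form, unrolling fixes a number of iterations of a scheme such as ISTA and reads each iteration as a NN layer: an affine map followed by soft thresholding, which is itself expressible with ReLUs. A schematic sketch for the LASSO objective with shared, untrained weights, in the spirit of LISTA but emphatically not the FIRENET construction itself:

```python
import numpy as np

def soft_threshold(z, t):
    # Soft thresholding written with ReLUs: max(z - t, 0) - max(-z - t, 0).
    return np.maximum(z - t, 0.0) - np.maximum(-z - t, 0.0)

def unrolled_ista_net(A, y, lam, n_layers):
    """Apply n_layers ISTA steps for lam*||x||_1 + ||Ax - y||_2^2, viewed as
    a feedforward NN: every layer computes x -> sigma(W x + B y) with fixed
    weights W and B (in LISTA these would be learned from data)."""
    tau = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # safe step size
    W = np.eye(A.shape[1]) - 2.0 * tau * (A.T @ A)  # shared layer weight
    B = 2.0 * tau * A.T                             # input-injection weight
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(W @ x + B @ y, tau * lam)
    return x
```

The point of Remark 5 is precisely that such a construction, without further assumptions on *A*, carries no guarantee of converging to the desired accuracy, however deep the network is made.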

In the case that we do not know *ρ* or *γ* (the constants in the definition of the wrNSPL), we can perform a log-scale grid search for suitable parameters. By increasing the width of the NNs to $\mathcal{O}(N\log(n))$, we can still gain exponential convergence in *n* by choosing the parameters in the grid search that lead to the vector with minimal $F_3^A$ [the objective function of $(P_3)$]. In other cases, such as Theorem 4 below, it is possible to prove probabilistic results where *ρ* and *γ* are known.

### Examples in Image Recovery.

As an application, we consider Fourier and Walsh (binary) sampling, using Haar wavelets as a sparsifying transform. Our results can also be generalized to infinite-dimensional settings via higher-order Daubechies wavelets. Theorem 3 is quite general and there are numerous other applications where problem-dependent results similar to Theorem 4 can be shown.
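To fix ideas, the measurement matrix $A=DP_{\mathcal{I}}V\Psi^*$ used in this setting can be assembled explicitly in small dimensions. A toy sketch with $d=1$, Fourier sampling, an orthonormal Haar transform, and the scaling *D* taken as the identity (the sampling pattern and sizes are arbitrary illustrative choices, not the paper's probabilistic sampling strategy):

```python
import numpy as np

def haar_matrix(N):
    """Orthonormal discrete Haar transform matrix (N a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        # Recursion: averages stacked on top of details, normalized by 1/sqrt(2).
        H = np.vstack([np.kron(H, [1.0, 1.0]),
                       np.kron(np.eye(n), [1.0, -1.0])]) / np.sqrt(2.0)
    return H

N = 8
V = np.fft.fft(np.eye(N)) / np.sqrt(N)     # unitary DFT matrix
Psi = haar_matrix(N)                        # unitary Haar transform, Psi^* = Psi^T
I_idx = [0, 1, 2, 4]                        # toy sampling pattern, m = 4 < N
A = V[I_idx, :] @ Psi.T                     # A = P_I V Psi^*  (D = identity)

c = np.random.default_rng(2).standard_normal(N)   # signal
x = Psi @ c                                        # its Haar coefficients
y = A @ x                                          # measurements P_I V c
assert np.allclose(y, (V @ c)[I_idx])              # consistent with y = P_I V c
```

Recovering *x* from *y* then recovers the signal via $c=\Psi^* x$, which is the structure the wrNSPL-based guarantees below are about.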

Let $K=2^r$ for $r\in\mathbb{N}$, and set $N=K^d$, so that the objective is to recover a vectorized *d*-dimensional tensor $c\in\mathbb{C}^N$. Let $V\in\mathbb{C}^{N\times N}$ correspond to the *d*-dimensional discrete Fourier or Walsh transform (*SI Appendix*). Let $\mathcal{I}\subset\{1,\dots,N\}$ be a sampling pattern with cardinality $m=|\mathcal{I}|$ and let $D=\text{diag}(d_1,\dots,d_m)\in\mathbb{C}^{m\times m}$ be a suitable diagonal scaling matrix, whose entries along the diagonal depend only on $\mathcal{I}$. We assume we can observe the subsampled, scaled, and noisy measurements $y=DP_{\mathcal{I}}Vc+e\in\mathbb{C}^m$, where the projection $P_{\mathcal{I}}$ is treated as an $m\times N$ matrix by ignoring the zero entries.

To recover a sparse representation of *c*, we consider Haar wavelet coefficients. Denote the discrete *d*-dimensional Haar wavelet transform by $\Psi\in\mathbb{C}^{N\times N}$ and note that $\Psi^*=\Psi^{-1}$ since $\Psi$ is unitary. To recover the wavelet coefficients $x=\Psi c$ of *c*, we consider the matrix $A=DP_{\mathcal{I}}V\Psi^*$ and observe that $y=Ax+e=DP_{\mathcal{I}}Vc+e$. A key result in this work is that we can design a probabilistic sampling strategy (*SI Appendix*), for both Fourier and Walsh sampling in *d* dimensions, requiring no more than $m\gtrsim(s_1+\dots+s_r)\cdot\mathcal{L}$ samples, that ensures with high probability that *A* satisfies the wrNSPL with certain constants. The sparsity in levels structure (*Definition 1*) is chosen to correspond to the *r* wavelet levels. Here $\mathcal{L}$ is a logarithmic term in *N*, *m*, *s*, and $\epsilon_{\mathbb{P}}^{-1}$ [where $\epsilon_{\mathbb{P}}\in(0,1)$ is a probability]. This result is crucial, as it makes *A* kernel aware for vectors that are approximately $(\mathbf{s},\mathbf{M})$-sparse and allows us (using Theorem 3) to design NNs that can stably and accurately recover approximately $(\mathbf{s},\mathbf{M})$-sparse vectors. Moreover, due to the exponential convergence in Theorem 3, the depth of these NNs depends only logarithmically on the error *δ*. Below follows a simplified version of our result (the full precise version is given and proved in *SI Appendix*).

## Balancing the Stability and Accuracy Trade-Off

Current DL methods for image reconstruction can be unstable in the sense that 1) a tiny perturbation, in either the image or the sampling domain, can cause severe artifacts in the reconstructed image (Fig. 2, *Top* row) and/or 2) a tiny detail in the image domain might be washed out in the reconstructed image (lack of accuracy), resulting in potential false negatives. Inevitably, there is a stability–accuracy trade-off for this type of linear inverse problem, making it impossible for any reconstruction method to become arbitrarily stable without sacrificing accuracy, or vice versa. Here, we show that the NNs computed by our algorithm (FIRENETs) are stable with respect to adversarial perturbations and accurate for images that are sparse in wavelets (cf. Theorem 4). As most images are sparse in wavelets, these networks also show great generalization properties on unseen images.

### Adversarial Perturbations for AUTOMAP and FIRENETs.

Fig. 2 (*Top* row) shows the stability test, developed in ref. 19, applied to the automated transform by manifold approximation (AUTOMAP) (60) network used for MRI reconstruction with 60% subsampling. The stability test is run on the AUTOMAP network to find a sequence of perturbations $\|e_1\|_{l^2} < \|e_2\|_{l^2} < \|e_3\|_{l^2}$. As can be seen from Fig. 2, *Top* row, the network reconstruction completely deforms the image and the reconstruction is severely unstable (similar results for other networks are demonstrated in ref. 19).

In contrast, we have applied the same stability test to the FIRENETs reported in this paper. Fig. 2 (*Bottom* row) shows the results for the constructed FIRENETs, where we rename the perturbations $\tilde{e}_j$ to emphasize that these perturbations are sought for the FIRENETs and have nothing to do with the adversarial perturbations for AUTOMAP. Despite the search for adversarial perturbations, the reconstruction remains stable. The error in the reconstruction was also found to be at most of the same order as the perturbation (as expected from the stability in Theorem 3). In applying the test to FIRENETs, we tuned the parameters in the gradient ascent algorithm considerably (much more so than was needed for AUTOMAP, where finding instabilities was straightforward) to find the worst reconstruction results, yet the reconstruction remained stable. Note also that this is just one form of stability test, and there are likely many other tests for creating instabilities in NNs for inverse problems. This highlights the importance of results such as Theorem 3, which guarantees stability regardless of the perturbation.

To demonstrate the generalization properties of our NNs, we show the stability test applied to FIRENETs for a range of images in *SI Appendix*. This shows stability across different types of images and highlights that conditions such as Definition 2 allow great generalization properties.

### Stabilizing Unstable NNs with FIRENETs.

Our NNs also act as a stabilizer. For example, Fig. 3 shows the adversarial example for AUTOMAP (taken from Fig. 2), but now also shows what happens when we take the reconstruction from AUTOMAP as an input to our FIRENETs. Here we use the fact that our networks can be viewed as approximations of unrolled and restarted iterative methods, allowing us to use the output of AUTOMAP as an additional input for the reconstruction. We see that FIRENETs fix the output of AUTOMAP and stabilize the reconstruction. Moreover, the full concatenation of the networks itself remains stable to adversarial attacks.
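Viewing the network as an unrolled iterative method is what makes this concatenation possible: the iteration accepts an arbitrary starting point, so another network's reconstruction can simply be fed in as the initial iterate. A minimal sketch with unrolled ISTA for $\min_x \tfrac12\|Ax-y\|_2^2 + \lambda\|x\|_1$, used here as a simpler stand-in for FIRENET's restarted primal–dual iterations (all problem sizes and constants are illustrative): whether the initial iterate is zero or a corrupted estimate, the layers drive it to the same reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 60, 20, 0.05
A = rng.standard_normal((m, n)) / np.sqrt(m)    # well-conditioned toy forward map
x_true = np.zeros(n)
x_true[:3] = [1.0, -2.0, 0.5]                   # sparse ground truth
y = A @ x_true                                  # noise-free measurements

def soft(z, t):
    """Soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def unrolled_ista(y, A, lam, x0, layers=1000):
    """Each 'layer' is one ISTA step for min 0.5||Ax - y||^2 + lam ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
    x = x0.copy()
    for _ in range(layers):
        x = soft(x - step * (A.T @ (A @ x - y)), step * lam)
    return x

x_cold = unrolled_ista(y, A, lam, np.zeros(n))  # cold start from zero
x_bad = x_true + rng.standard_normal(n)         # corrupted estimate, e.g. another NN's output
x_warm = unrolled_ista(y, A, lam, x_bad)        # warm start: same fixed point
```

Because the underlying optimization problem is convex, the iteration forgets its initialization; FIRENET's restarting and reweighting scheme additionally yields the exponential (in depth) convergence of Theorem 3, which plain ISTA does not have.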

### The Stability vs. Accuracy Trade-Off and False Negatives.

It is easy to produce a perfectly stable network: The zero network is the obvious candidate! However, this network would obviously have poor performance and produce many false negatives. The challenge is to simultaneously ensure performance and stability. Fig. 4 highlights this issue. Here we have trained two NNs to recover a set of ellipses images from noise-free and noisy Fourier measurements. The noise-free measurements are generated as $y = Ax$, where $A \in \mathbb{C}^{m\times N}$ is a subsampled discrete Fourier transform, with $m/N = 0.15$ and $N = 1024^2$. The noisy measurements are generated as $y = Ax + ce$, where $A$ is as before, the real and imaginary components of $e \in \mathbb{C}^{m}$ are drawn from the standard normal distribution $\mathcal{N}(0,1)$, and $c \in \mathbb{R}$ is drawn from the uniform distribution $\mathrm{Unif}([0,100])$. The noise $ce \in \mathbb{C}^{m}$ is generated on the fly during the training process.

The trained networks use a standard benchmarking architecture for image reconstruction and map $y \mapsto \varphi(A^{*}y)$, where $\varphi : \mathbb{C}^{N} \to \mathbb{R}^{N}$ is a trainable U-net NN (8, 61). Training networks with noisy measurements, using for example this architecture, has previously been proposed as a way to create NNs that are robust toward adversarial attacks (62). As we can see from Fig. 4 (*Bottom* row), this is indeed the case: it does create a NN that is stable with respect to worst-case perturbations. However, a key issue is that it also produces false negatives due to its inability to reconstruct details. Similarly, as reported in the 2019 FastMRI challenge, trained NNs that performed well in terms of standard image quality metrics were prone to false negatives: They failed to reconstruct small, but physically relevant image abnormalities (25). Pathologies, generalization, and AI-generated hallucinations were subsequently a focus of the 2020 challenge (26). FIRENET, on the other hand, has guaranteed performance (on images approximately sparse in wavelet bases) and stability, given specific conditions on the sampling procedure. The challenge is to determine the optimal balance between accuracy and stability, a well-known problem in numerical analysis.

## Concluding Remarks

1) (Algorithms may not exist—Smale’s 18th problem) There are well-conditioned problems where accurate NNs exist, but no algorithm can compute them. Understanding this phenomenon is essential to addressing Smale’s 18th problem on the limits of AI. Moreover, the limitations established in this paper suggest a classification theory describing the conditions needed for the existence of algorithms that can compute stable and accurate NNs (remark 5).

2) (Classifications and Hilbert’s program) The strong optimism regarding the abilities of AI is comparable to the optimism surrounding mathematics in the early 20th century, led by D. Hilbert. Hilbert believed that mathematics could prove or disprove any statement and, moreover, that there were no restrictions on which problems could be solved by algorithms. Gödel (36) and Turing (37) turned Hilbert’s optimism upside down with their foundational contributions establishing impossibility results on what mathematics and digital computers can achieve. Hilbert’s program on the foundations of mathematics led to a rich mathematical theory and to modern logic and computer science, where substantial efforts were made to classify which problems can be computed. We have sketched a similar program for modern AI, where we provide certain sufficient conditions for the existence of algorithms to produce stable and accurate NNs. We believe that such a program on the foundations of AI is necessary and will act as an invaluable catalyst for the advancement of AI.

3) (Trade-off between stability and accuracy) For inverse problems there is an intrinsic trade-off between stability and accuracy. We demonstrated NNs that offer a blend of both stability and accuracy for the sparsity in levels class. Balancing these two interests is crucial for applications and will no doubt require a myriad of future techniques to be developed.

4) Tracing out the optimal stability vs. accuracy trade-off remains largely an open problem and depends on several factors, such as the model class one wishes to recover, the error tolerance of the application, and the error metric used. We have shown stability and accuracy results in the $l^2$ norm, since it is common in the literature to measure noise via this norm. We expect a program quantifying the stability and accuracy trade-off to be of particular relevance in the increasing number of real-world implementations of machine learning in inverse problems.

5)

6) (Future work—Which NNs can be computed?) There is an enormous literature (29, 30, 64–66) on the existence of NNs with great approximation qualities. However, Theorem 2 shows that only certain accuracy may be computationally achievable. Our results are just the beginning of a mathematical theory studying which NNs can be computed by algorithms. This opens the door to a theory covering other sufficient (and potentially necessary) conditions guaranteeing stability and accuracy, and to extensions to other inverse problems such as phase retrieval (67, 68). One can also prove similar computational barriers in other settings via the tools developed in this paper.

## Methods: The Solvability Complexity Index Hierarchy

Our proof techniques for the fundamental barriers in Theorem 2 stem from the mathematics behind the solvability complexity index (SCI) hierarchy (33, 69–78). The SCI hierarchy generalizes the fundamental problems of Smale (79, 80) on the existence of algorithms, as well as work by McMullen (81) and Doyle and McMullen (82). We extend and refine these techniques, in particular those of ref. 33, and generalize the mathematics behind the extended Smale’s ninth problem (33, 34), which also builds on the SCI hierarchy. More precisely, to prove our results we develop the concept of sequential general algorithms. General algorithms are a key tool in the mathematics of the SCI hierarchy. Sequential general algorithms extend this concept and capture the notion of adaptive and/or probabilistic choice of training data. The architectures of the NNs in Theorem 3 are based on unrolled primal–dual iterations for $(P_3)$. In addition to providing stability, the wrNSPL allows us to prove exponential convergence through a careful restarting and reweighting scheme. Full theoretical derivations are given in *SI Appendix*.

## Data Availability

All the code and data used to produce the figures in this paper are available from GitHub, https://www.github.com/Comp-Foundations-and-Barriers-of-AI/firenet.

## Acknowledgments

We are grateful to Kristian Haug for allowing us to use parts of code for unrolled primal–dual algorithms in TensorFlow. This work was supported by a Research Fellowship at Trinity College, Cambridge University (to M.J.C.), as well as by a Leverhulme Prize and a Royal Society University Research Fellowship (to A.C.H.).

## Supporting Information

### Materials/Methods, Supplementary Text, Tables, Figures, and/or References

Appendix 01 (PDF)


## References

1. A. Krizhevsky, I. Sutskever, G. E. Hinton, “Imagenet classification with deep convolutional neural networks” in *Advances in Neural Information Processing Systems*, F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger, Eds. (Curran Associates, Inc., 2012), pp. 1097–1105.
2. K. He, X. Zhang, S. Ren, J. Sun, “Deep residual learning for image recognition” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2016), pp. 770–778.
3. R. Girshick, J. Donahue, T. Darrell, J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2014), pp. 580–587.
4. G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. *IEEE Signal Process. Mag.* **29**, 82–97 (2012).
5. J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, V. Svetnik, Deep neural nets as a method for quantitative structure-activity relationships. *J. Chem. Inf. Model.* **55**, 263–274 (2015).
6. G. E. Dahl, D. Yu, L. Deng, A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. *IEEE Trans. Audio Speech Lang. Process.* **20**, 30–42 (2011).
7. U. S. Kamilov et al., Learning approach to optical tomography. *Optica* **2**, 517–522 (2015).
8. K. H. Jin, M. T. McCann, E. Froustey, M. Unser, Deep convolutional neural network for inverse problems in imaging. *IEEE Trans. Image Process.* **26**, 4509–4522 (2017).
9. M. T. McCann, K. H. Jin, M. Unser, Convolutional neural networks for inverse problems in imaging: A review. *IEEE Signal Process. Mag.* **34**, 85–95 (2017).
10. K. Hammernik et al., Learning a variational network for reconstruction of accelerated MRI data. *Magn. Reson. Med.* **79**, 3055–3071 (2018).
11. S. Arridge, P. Maass, O. Öktem, C. B. Schönlieb, Solving inverse problems using data-driven models. *Acta Numer.* **28**, 1–174 (2019).
12. G. Ongie et al., Deep learning techniques for inverse problems in imaging. *IEEE J. Sel. Areas Inf. Theory* **1**, 39–56 (2020).
13. C. Szegedy et al., “Intriguing properties of neural networks” in *International Conference on Learning Representations* (2014). https://openreview.net/forum?id=kklr_MTHMRQjG (Accessed 3 March 2022).
14. S. M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, “Deepfool: A simple and accurate method to fool deep neural networks” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2016), pp. 2574–2582.
15. S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, “Universal adversarial perturbations” in *IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2017), pp. 86–94.
16. N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in computer vision: A survey. *IEEE Access* **6**, 14410–14430 (2018).
17. N. Carlini, D. Wagner, “Audio adversarial examples: Targeted attacks on speech-to-text” in *2018 IEEE Security and Privacy Workshops (SPW)* (IEEE, 2018), pp. 1–7.
18. Y. Huang et al., “Some investigations on robustness of deep learning in limited angle tomography” in *International Conference on Medical Image Computing and Computer-Assisted Intervention*, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger, Eds. (Springer, 2018), pp. 145–153.
19. V. Antun, F. Renna, C. Poon, B. Adcock, A. C. Hansen, On instabilities of deep learning in image reconstruction and the potential costs of AI. *Proc. Natl. Acad. Sci. U.S.A.* **117**, 30088–30095 (2020).
20. S. G. Finlayson et al., Adversarial attacks on medical machine learning. *Science* **363**, 1287–1289 (2019).
21. I. Y. Tyukin, D. J. Higham, A. N. Gorban, “On adversarial examples and stealth attacks in artificial intelligence systems” in *2020 International Joint Conference on Neural Networks* (IEEE, 2020), pp. 1–6.
22. N. M. Gottschling, V. Antun, B. Adcock, A. C. Hansen, The troublesome kernel: Why deep learning for inverse problems is typically unstable. arXiv [Preprint] (2020). https://arxiv.org/abs/2001.01258 (Accessed 5 January 2020).
23. N. Baker et al., “Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence” (Tech. Rep. 1478744, USDOE Office of Science, 2019).
24. R. Hamon, H. Junklewitz, I. Sanchez, *Robustness and Explainability of Artificial Intelligence - From Technical to Policy Solutions* (Publications Office of the European Union, 2020).
25. F. Knoll et al., Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge. *Magn. Reson. Med.* **84**, 3054–3070 (2020).
26. M. J. Muckley et al., Results of the 2020 fastMRI challenge for machine learning MR image reconstruction. *IEEE Trans. Med. Imaging* **40**, 2306–2317 (2021).
27. C. Belthangady, L. A. Royer, Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. *Nat. Methods* **16**, 1215–1225 (2019).
28. D. P. Hoffman, I. Slavitt, C. A. Fitzpatrick, The promise and peril of deep learning in microscopy. *Nat. Methods* **18**, 131–132 (2021).
29. A. Pinkus, Approximation theory of the MLP model in neural networks. *Acta Numer.* **8**, 143–195 (1999).
30. R. DeVore, B. Hanin, G. Petrova, Neural network approximation. *Acta Numer.* **30**, 327–444 (2021).
31. B. Adcock, N. Dexter, The gap between theory and practice in function approximation with deep neural networks. *SIAM J. Math. Data Sci.* **3**, 624–655 (2021).
32. C. F. Higham, D. J. Higham, Deep learning: An introduction for applied mathematicians. *SIAM Rev.* **61**, 860–891 (2019).
33. A. Bastounis, A. C. Hansen, V. Vlačić, The extended Smale’s 9th problem – On computational barriers and paradoxes in estimation, regularisation, computer-assisted proofs and learning. arXiv [Preprint] (2021). https://arxiv.org/abs/2110.15734 (Accessed 29 October 2021).
34. S. Smale, Mathematical problems for the next century. *Math. Intell.* **20**, 7–15 (1998).
35. A. Turing, Computing machinery and intelligence. *Mind* **LIX**, 433–460 (1950).
36. K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. *Monatsh. Math. Phys.* **38**, 173–198 (1931).
37. A. Turing, On computable numbers, with an application to the Entscheidungsproblem. *Proc. Lond. Math. Soc.* **42**, 230–265 (1936).
38. S. Weinberger, *Computers, Rigidity, and Moduli: The Large-Scale Fractal Geometry of Riemannian Moduli Space* (Princeton University Press, Princeton, NJ, 2004).
39. P. Niyogi, S. Smale, S. Weinberger, A topological view of unsupervised learning from noisy data. *SIAM J. Comput.* **40**, 646–663 (2011).
40. X. Chen, J. Liu, Z. Wang, W. Yin, “Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds” in *Advances in Neural Information Processing Systems*, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Q. Weinberger, Eds. (Curran Associates, Inc., 2018), pp. 9061–9071.
41. A. Raj, Y. Bresler, B. Li, “Improving robustness of deep-learning-based image reconstruction” in *International Conference on Machine Learning*, H. Daumé III, A. Singh, Eds. (PMLR, 2020), pp. 7932–7942.
42. I. Goodfellow et al., “Generative adversarial nets” in *Advances in Neural Information Processing Systems*, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, K. Q. Weinberger, Eds. (Curran Associates, Inc., 2014), pp. 2672–2680.
43. M. Arjovsky, S. Chintala, L. Bottou, “Wasserstein generative adversarial networks” in *International Conference on Machine Learning*, D. Precup, Y. W. Teh, Eds. (PMLR, 2017), vol. **70**, pp. 214–223.
44. B. Adcock, A. C. Hansen, C. Poon, B. Roman, “Breaking the coherence barrier: A new theory for compressed sensing” in *Forum of Mathematics, Sigma* (Cambridge University Press, 2017), vol. **5**.
45. A. Bastounis, A. C. Hansen, On the absence of uniform recovery in many real-world applications of compressed sensing and the restricted isometry property and nullspace property in levels. *SIAM J. Imaging Sci.* **10**, 335–371 (2017).
46. B. Adcock, A. C. Hansen, *Compressive Imaging: Structure, Sampling, Learning* (Cambridge University Press, 2021).
47. C. Boyer, J. Bigot, P. Weiss, Compressed sensing with structured sparsity and structured acquisition. *Appl. Comput. Harmon. Anal.* **46**, 312–350 (2019).
48. E. J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. *IEEE Trans. Inf. Theory* **52**, 489–509 (2006).
49. E. J. Candès, J. K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements. *Commun. Pure Appl. Math.* **59**, 1207–1223 (2006).
50. D. L. Donoho, Compressed sensing. *IEEE Trans. Inf. Theory* **52**, 1289–1306 (2006).
51. D. L. Donoho, J. Tanner, Counting faces of randomly-projected polytopes when the projection radically lowers dimension. *J. Am. Math. Soc.* **22**, 1–53 (2009).
52. A. Cohen, W. Dahmen, R. DeVore, Compressed sensing and best *k*-term approximation. *J. Am. Math. Soc.* **22**, 211–231 (2009).
53. A. Jones, A. Tamtögl, I. Calvo-Almazán, A. Hansen, Continuous compressed sensing for surface dynamical processes with helium atom scattering. *Sci. Rep.* **6**, 27776 (2016).
54. R. A. DeVore, Nonlinear approximation. *Acta Numer.* **7**, 51–150 (1998).
55. M. P. Friedlander, H. Mansour, R. Saab, Ö. Yilmaz, Recovering compressively sampled signals using partial support information. *IEEE Trans. Inf. Theory* **58**, 1122–1134 (2012).
56. V. Monga, Y. Li, Y. C. Eldar, Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. *IEEE Signal Process. Mag.* **38**, 18–44 (2021).
57. A. Ben-Tal, A. Nemirovski, Lectures on modern convex optimization (2020–2021). https://www2.isye.gatech.edu/~nemirovs/LMCO_LN.pdf (Accessed 5 February 2022).
58. Y. E. Nesterov, A. Nemirovski, On first-order algorithms for l1/nuclear norm minimization. *Acta Numer.* **22**, 509–575 (2013).
59. A. Ben-Tal, L. El Ghaoui, A. Nemirovski, *Robust Optimization* (Princeton Series in Applied Mathematics, Princeton University Press, 2009).
60. B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, M. S. Rosen, Image reconstruction by domain-transform manifold learning. *Nature* **555**, 487–492 (2018).
61. J. Long, E. Shelhamer, T. Darrell, “Fully convolutional networks for semantic segmentation” in *Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition* (IEEE, 2015), pp. 3431–3440.
62. M. Genzel, J. Macdonald, M. März, Solving inverse problems with deep neural networks – Robustness included? *IEEE Trans. Pattern Anal. Mach. Intell.* (2022).
63. A. Bastounis, A. C. Hansen, V. Vlačić, The mathematics of adversarial attacks in AI – Why deep learning is unstable despite the existence of stable neural networks. arXiv [Preprint] (2021). https://arxiv.org/abs/2109.06098 (Accessed 13 September 2021).
64. H. Bölcskei, P. Grohs, G. Kutyniok, P. Petersen, Optimal approximation with sparsely connected deep neural networks. *SIAM J. Math. Data Sci.* **1**, 8–45 (2019).
65. I. Daubechies, R. DeVore, S. Foucart, B. Hanin, G. Petrova, Nonlinear approximation and (deep) ReLU networks. *Constr. Approx.* **55**, 127–172 (2021).
66. W. E, S. Wojtowytsch, On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics. *CSIAM Trans. Appl. Math.* **1**, 387–440 (2020).
67. E. J. Candès, T. Strohmer, V. Voroninski, PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. *Commun. Pure Appl. Math.* **66**, 1241–1274 (2013).
68. A. Fannjiang, T. Strohmer, The numerics of phase retrieval. *Acta Numer.* **29**, 125–228 (2020).
69. A. C. Hansen, On the solvability complexity index, the *n*-pseudospectrum and approximations of spectra of operators. *J. Am. Math. Soc.* **24**, 81–124 (2011).
70. A. C. Hansen, O. Nevanlinna, Complexity issues in computing spectra, pseudospectra and resolvents. *Banach Cent. Publ.* **112**, 171–194 (2016).
71. J. Ben-Artzi, M. J. Colbrook, A. C. Hansen, O. Nevanlinna, M. Seidel, Computing spectra – On the solvability complexity index hierarchy and towers of algorithms. arXiv [Preprint] (2020). https://arxiv.org/abs/1508.03280 (Accessed 15 June 2020).
72. J. Ben-Artzi, A. C. Hansen, O. Nevanlinna, M. Seidel, New barriers in complexity theory: On the solvability complexity index and the towers of algorithms. *C. R. Math.* **353**, 931–936 (2015).
73. M. Colbrook, “The foundations of infinite-dimensional spectral computations,” PhD thesis, University of Cambridge, Cambridge, UK (2020).
74. M. J. Colbrook, Computing spectral measures and spectral types. *Commun. Math. Phys.* **384**, 433–501 (2021).
75. M. Colbrook, A. Horning, A. Townsend, Computing spectral measures of self-adjoint operators. *SIAM Rev.* **63**, 489–524 (2021).
76. J. Ben-Artzi, M. Marletta, F. Rösler, Computing the sound of the sea in a seashell. *Found. Comput. Math.* (2021).
77. M. J. Colbrook, A. C. Hansen, The foundations of spectral computations via the solvability complexity index hierarchy. arXiv [Preprint] (2021). https://arxiv.org/abs/1908.09592 (Accessed 6 August 2020).
78. H. Boche, V. Pohl, “The solvability complexity index of sampling-based Hilbert transform approximations” in *2019 13th International Conference on Sampling Theory and Applications (SampTA)* (IEEE, 2019), pp. 1–4.
79. S. Smale, The fundamental theorem of algebra and complexity theory. *Am. Math. Soc. Bull.* **4**, 1–36 (1981).
80. S. Smale, Complexity theory and numerical analysis. *Acta Numer.* **6**, 523–551 (1997).
81. C. McMullen, Families of rational maps and iterative root-finding algorithms. *Ann. Math.* **125**, 467–493 (1987).
82. P. Doyle, C. McMullen, Solving the quintic by iteration. *Acta Math.* **163**, 151–180 (1989).
83. R. Strack, Imaging: AI transforms image reconstruction. *Nat. Methods* **15**, 309 (2018).
84. Q. Fan et al., MGH-USC Human Connectome Project datasets with ultra-high b-value diffusion MRI. *Neuroimage* **124** (Pt. B), 1108–1114 (2016).

## Information & Authors

### Information

#### Copyright

Copyright © 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).


#### Submission history

**Received**: April 16, 2021

**Accepted**: October 26, 2021

**Published online**: March 16, 2022

**Published in issue**: March 22, 2022


#### Notes

This article is a PNAS Direct Submission.

### Authors

#### Competing Interests

The authors declare no competing interest.
