# Toward the first quantum simulation with quantum speedup

^{a}Department of Computer Science, University of Maryland, College Park, MD 20742;^{b}Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742;^{c}Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, MD 20742;^{d}Division of Computing and Communication Foundations, National Science Foundation, Alexandria, VA 22314;^{e}IonQ, Inc., College Park, MD 20740;^{f}Department of Mathematics and Statistics, Dalhousie University, Halifax, NS B3H 4R2, Canada

See allHide authors and affiliations

Edited by John Preskill, California Institute of Technology, Pasadena, CA, and approved August 10, 2018 (received for review January 30, 2018)

## Significance

Near-term quantum computers will have limited numbers of qubits and will only be able to reliably perform limited numbers of gates. Therefore, it is crucial to identify applications of quantum processors that use the fewest possible resources. We argue that simulating the time evolution of spin systems is a classically hard problem of practical interest that is among the easiest to address with early quantum devices. We develop optimized implementations and perform detailed resource analyses for several leading quantum algorithms for this problem. By evaluating the best approaches to small-scale quantum simulation, we provide a detailed blueprint for what could be an early practical application of quantum computers.

## Abstract

With quantum computers of significant size now on the horizon, we should understand how to best exploit their initially limited abilities. To this end, we aim to identify a practical problem that is beyond the reach of current classical computers, but that requires the fewest resources for a quantum computer. We consider quantum simulation of spin systems, which could be applied to understand condensed matter phenomena. We synthesize explicit circuits for three leading quantum simulation algorithms, using diverse techniques to tighten error bounds and optimize circuit implementations. Quantum signal processing appears to be preferred among algorithms with rigorous performance guarantees, whereas higher-order product formulas prevail if empirical error estimates suffice. Our circuits are orders of magnitude smaller than those for the simplest classically infeasible instances of factoring and quantum chemistry, bringing practical quantum computation closer to reality.

While a scalable quantum computer remains a long-term goal, recent experimental progress suggests that devices capable of outperforming classical computers will soon be available (refs. 1⇓⇓⇓–5; www.research.ibm.com/ibm-q/). Multiple groups have already developed programmable devices with several qubits and two-qubit gate fidelities around 98% (6), and similar devices with around 50 qubits are under active development. While the error rates of these early machines severely limit the total number of gates that can be reliably performed, future improvements should lead to machines with more qubits and more-reliable gates. This raises the exciting possibility of solving practical problems that are beyond the reach of classical computation. Such an outcome would be a landmark in the development of quantum computers and would begin an era in which they serve not only as test beds for science but as practical computing machines.

Reaching this goal will require not only significant experimental advances but also careful quantum algorithm design and implementation. Here we address the latter issue by developing explicit circuits, and thereby producing concrete resource estimates, for practical quantum computations that can outperform classical computers. Through this work, we aim to identify applications for small quantum computers that help to motivate the significant investment required to develop scalable, fault-tolerant quantum computers.

There has been considerable previous research on compiling quantum algorithms into explicit circuits (see *SI Appendix*, section A for more detail). However, to the best of our knowledge, none of these studies aimed to identify minimal examples of superclassical quantum computation, and typical resource counts were high. Our work is also distinct from recent work on quantum computational supremacy (7), where the goal is merely to accomplish a superclassical task, regardless of its practicality. Instead, we aim to pave the way toward practical quantum computations (which may not be far beyond the threshold for supremacy).

Arguably, the most natural application of quantum computers is to the problem of simulating quantum dynamics (8). Quantum computers can simulate a wide variety of quantum systems, including fermionic lattice models (9), quantum chemistry (10), and quantum field theories (11). However, simulations of spin systems with local interactions likely have less overhead, so we focus on them as an early candidate for practical quantum simulation. Note that analog quantum simulation (4, 5) is another promising approach to simulating spin systems that may be easier to realize in the short term. However, analog simulators lack universal control, and it can be challenging to ensure correctness of their output. Here we focus on digital simulation for its greater flexibility, for the prospect of invoking fault tolerance, and for its role as a stepping-stone to other forms of quantum computation.

Efficient quantum algorithms for simulating quantum dynamics have been known for over two decades (12). Recent work has led to algorithms with significantly improved asymptotic performance as a function of various parameters such as the evolution time and the allowed simulation error (13⇓⇓⇓–17). Our work investigates whether these alternative algorithms can be advantageous for simulations of relatively small systems, and aims to lay the groundwork for an early practical application of quantum computers.

## 1. Target System

To produce concrete benchmarks, we focus on a specific simulation task. Specifically, we consider a one-dimensional nearest-neighbor Heisenberg model with a random magnetic field in the z direction. This model is described by the Hamiltonian

This Hamiltonian has been considered in recent studies of self-thermalization and many-body localization (see *SI Appendix*, section B for more detail). Despite intensive investigation, the details of a transition between thermal and localized phases remain poorly understood. A major challenge is the difficulty of simulating quantum systems with classical computers; indeed, the most extensive numerical study we are aware of was restricted to, at most, 22 spins (18).

Hamiltonian simulation can efficiently access any feature that could be observed experimentally (and more), and there are several proposals for exploring self-thermalization by simulating dynamics (19⇓–21). Since all of these approaches involve only very simple state preparations and measurements, we focus on the cost of simulating dynamics. We consider evolution times comparable to the number of spins, since the system must evolve for this long for self-thermalization to take place (or even for information to propagate across the system, owing to the Lieb–Robinson bound).

Specifically, we produce gate counts for simulations with

## 2. Implementations

There are many distinct quantum algorithms for Hamiltonian simulation, some of which are summarized in Table 1. We implement algorithms based on high-order product formulas (PFs) (13), direct application of the Taylor series (TS) (15), and the recent quantum signal processing method (QSP) (16), all of which are introduced in *SI Appendix*, section C. We expect these to be among the most efficient approaches to digital quantum simulation. In particular, approaches based on quantum walk (14, 23) appear to incur greater overhead (as discussed in *SI Appendix*, section D).

To produce concrete circuits, we implement quantum simulation algorithms in a quantum circuit description language called Quipper (24) (see *SI Appendix*, section E for more details). Wherever possible, we tighten the analysis of algorithm parameters and manually optimize the implementation. We also process all circuits using an automated tool we developed for large-scale quantum circuit optimization (25). Our implementation is available in a public repository (https://github.com/njross/simcount).

We express our circuits over the set of two-qubit cnot gates, single-qubit Clifford gates, and single-qubit z rotations

Our analysis ignores many practical details such as architectural constraints, instead aiming to give a broad overview of potential implementation costs that can be refined for specific systems. When counting qubits, we assume that measured ancillas can be reused later.

### PF Algorithm.

The PF approach approximates the exponential of a sum of operators by a product of exponentials of the individual operators. The asymptotic complexity of this approach can be improved with higher-order Suzuki formulas (27). By splitting the evolution into r segments and making r sufficiently large, we can ensure that the simulation is arbitrarily precise. The main challenge in making these algorithms concrete is to choose an explicit r that ensures some desired upper bound on the error. *SI Appendix*, section F gives a detailed description of these implementation details.

We present two bounds, which we call the “analytic” and “minimized” bounds, that slightly strengthen previous analysis (13). However, bounds of this type are far from tight (28⇓–30). Thus, we develop an improved bound that exploits commutation relations among terms in the target Hamiltonian. For a one-dimensional system of n spins with nearest-neighbor couplings, evolving for time

Naive computation of the commutator bound takes time

Unfortunately, even the commutator bound can be very loose. To address this, we report empirical error estimates by extrapolating the error seen in direct classical simulations of small instances [as also explored in previous work on simulating many-body dynamics (28) and quantum chemistry (29, 30)]. While these empirical bounds do not provide rigorous guarantees on the simulation error, they may nevertheless be useful in practice, and they improve the cost of PF algorithms by several orders of magnitude.

### TS Algorithm.

The TS algorithm directly implements the (truncated) Taylor series of the evolution operator for a carefully chosen constant time using a procedure for implementing linear combinations of unitary operations (15). This segment is then simply repeated until the entire evolution time has been simulated. The circuit for a segment is built using three subroutines: a state preparation procedure, a reflection about the *SI Appendix*, section G) also includes a concrete error analysis that establishes rigorous, non-asymptotic bounds on the simulation parameters.

The aforementioned

### QSP Algorithm.

The QSP algorithm of Low and Chuang (16, 17) effectively implements a linear combination of unitary operators by a different mechanism. This algorithm applies a sequence of operations called “phased iterates” that manifest each eigenvalue of the Hamiltonian as a rotation acting on an ancilla qubit. By carefully choosing a sequence of rotation angles for that qubit, we induce the desired evolution.

The circuit for each phased iterate is built from subroutines similar to the TS algorithm. However, computing the M rotation angles for the phased iterates requires finding the roots of a polynomial of degree

One way to alleviate this problem is to consider a segmented implementation of the QSP algorithm. In this approach, we divide the evolution time into r segments, each of which is sufficiently short that the classical preprocessing is tractable. Since the optimality of the QSP approach to Hamiltonian simulation relies essentially on simulating the entire evolution as a single segment, the segmented approach has higher asymptotic complexity. However, it allows us to develop a complete implementation, and the overhead for moderate values of n is not too high.

For the full version of the algorithm, we consider an empirical error bound on the Jacobi–Anger expansion, giving a modest improvement. Numerical evidence suggests that the additional savings from an empirical error bound for the overall algorithm would not be significant. For the segmented version of the algorithm, we instead used an analytic error bound so that the algorithm remains rigorous (and because an empirical Jacobi–Anger error bound did not give much improvement in that case).

*SI Appendix*, section H discusses our implementation of QSP algorithms in detail.

## 3. Results

Fig. 1 compares gate counts for the PF algorithm (with commutator and empirical error bounds), the TS algorithm, and the QSP algorithm (in both segmented and nonsegmented versions). The TS algorithm uses significantly more qubits than the QSP algorithm (as shown in Fig. 2) while also requiring more gates, so the latter is clearly preferred. In contrast, the QSP algorithm has only slightly greater space requirements than the PF algorithm.

Surprisingly, despite being more involved, the QSP algorithm outperforms the rigorously bounded PF algorithm even for small system sizes. In particular, among the rigorously analyzed algorithms, the segmented QSP algorithm has the best performance, improving over the PF algorithm by about an order of magnitude for cnot count and by almost two orders of magnitude for T count.

Empirical error bounds improve the performance of the PF algorithm by two to three orders of magnitude, making it the preferred approach if rigorous performance guarantees are not required. For the cnot count, the empirical PF algorithm improves over the full QSP algorithm by about an order of magnitude. The advantage in the T count is less significant, but still indicates that the PF algorithm is dominant, especially considering its lower qubit count.

Although we expected that higher-order product formulas would not be advantageous for small system sizes, we find that the fourth- and sixth-order formulas had the best performance for our benchmark system with tens to hundreds of qubits, as shown in Fig. 3. The fourth-order formula with commutator bound gives the best available PF algorithm with a rigorous performance guarantee. Using empirical error bounds, the sixth-order formula outperforms the fourth-order formula for systems of about 30 or more qubits, making the former the method of choice for simulations just beyond the reach of classical computers (again, provided a heuristic error bound can be tolerated). These results suggest that higher-order formulas may be advantageous for other quantum simulations, such as those for quantum chemistry, even though they have not usually been considered (30, 32).

For a system of 50 qubits—which is presumably close to the limits of direct classical simulation for circuits such as ours (33)—the TS algorithm uses 171 qubits and the QSP algorithm uses 67, whereas the PF algorithm uses only 50. [Recent work has demonstrated simulation of 56-qubit computations, but only for circuits of much smaller depth than those considered in our work (34).] At this size, the segmented QSP algorithm is the best rigorously analyzed approach, using about

For comparison, previous estimates of gate counts for factoring, discrete logarithms, and quantum chemistry simulations are significantly larger, as illustrated in Fig. 4. First, consider factoring a 1,024-bit number, which is beyond the factorization of RSA-768 that was achieved classically in 2009 (36). The best implementation we are aware of uses 3,132 qubits and about *SI Appendix*, section A.) Quantum algorithms for classically hard instances of the elliptic curve discrete logarithm problem have roughly comparable cost (37). For quantum chemistry, a natural target for a problem just beyond the reach of classical computing is a simulation of FeMoco, the primary cofactor of nitrogenase, an enzyme that catalyzes the nitrogen fixation process. Even for a fairly low-precision simulation, and using nonrigorous estimates of the product formula error, the best implementation we are aware of uses 111 qubits and

For a more detailed discussion of the results, see *SI Appendix*, section I.

## 4. Discussion

The work described in this paper represents progress toward a possible early application of quantum computers, solving a practical problem that is beyond the reach of classical computation. Our optimized implementations can perform a 50-spin quantum simulation using fewer qubits, and dramatically fewer gates (by over five orders of magnitude in the case of quantum chemistry), than comparable instances of other classically hard problems. Of course, our results only represent upper bounds. While we attempted to optimize the implementation wherever possible, it is likely that further improvements can be found, and it is conceivable that another algorithm (or computational task) may offer better performance. Our work establishes a concrete set of benchmarks that we hope can be improved through future studies.

Demonstrations of digital quantum simulation performed to date (38⇓⇓–41) have been limited in scope, primarily using the first-order formula [except for some limited applications of the second-order formula (38, 39)]. Our results show that higher-order formulas are useful even for simulations of small systems. In the near term, it could be fruitful to demonstrate the utility of these formulas experimentally. Even relatively small experiments might be able to probe the validity of our empirical error bounds.

We have also identified some avenues for future improvement of quantum simulation algorithms. We saw that rigorous error bounds for product formulas are very loose, even with our newly developed commutator bound. This motivates attempting to prove stronger rigorous error bounds for product formulas. Also, the difficulty of computing the angles needed to perform the QSP algorithm prevents us from taking full advantage of the algorithm in practice, so it would be useful to develop a more efficient classical procedure for specifying these angles.

After the initial version of this paper was posted, Haah et al. (42) developed a new quantum algorithm for simulating the dynamics of spatially local Hamiltonians. Their approach uses bounds on the propagation of information in a local system to give a simulation with complexity only linear in the product of the system size and the evolution time (up to log factors), asymptotically improving over all previous methods. While the gate count estimates presented in figure 3 of ref. 42 suggest that, for the example considered in this paper, the PF and QSP methods remain favorable for systems with up to about 100 spins, the techniques developed in ref. 42 are likely to be useful in relatively near-term simulations.

Further reduction of the gate count could be especially significant if it led to a simulation with sufficiently few gates to be performed without invoking fault tolerance. With our current estimate of millions of cnot gates for a superclassical simulation, this is likely out of reach at present. However, further improvement could obviate the need for error correction in a system with highly accurate gates, making an early demonstration of superclassical simulation more accessible.

Finally, our work has considered an idealized system. Practical devices will come with architectural constraints, may use different basic operations than those considered here, may allow parallelization of gates, and will likely require fault tolerance. By incorporating such features, we hope the work begun here will lead to a blueprint for an early practical quantum computation.

## Acknowledgments

We thank Zhexuan Gong, Alexey Gorshkov, Guang Hao Low, Chris Monroe, and Nathan Wiebe for helpful discussions. This work was supported, in part, by the Army Research Office (Multidisciplinary University Research Initiative Award W911NF-16-1-0349), the Canadian Institute for Advanced Research, and the National Science Foundation (Grant 1526380). This material was partially based on work supported by the National Science Foundation during D.M.’s assignment at the foundation. Any opinion, finding, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

## Footnotes

- ↵
^{1}To whom correspondence should be addressed: Email: amchilds{at}umd.edu.

Author contributions: A.M.C., D.M., Y.N., N.J.R., and Y.S. designed research, performed research, and wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The implementations of quantum algorithms for the simulation of Hamiltonian dynamics in the Quipper quantum programming language have been deposited on GitHub and are available at https://github.com/njross/simcount.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1801723115/-/DCSupplemental.

Published under the PNAS license.

## References

- ↵
- ↵
- Debnath S, et al.

- ↵
- Song C, et al.

- ↵
- ↵
- ↵
- Linke NM, et al.

- ↵
- Harrow AW,
- Montanaro A

- ↵
- ↵
- Wecker D, et al.

- ↵
- Wecker D,
- Bauer B,
- Clark BK,
- Hastings MB,
- Troyer M

- ↵
- Jordan SP,
- Lee KSM,
- Preskill J

- ↵
- ↵
- Berry DW,
- Ahokas G,
- Cleve R,
- Sanders BC

- ↵
- Berry DW,
- Childs AM

- ↵
- Berry DW,
- Childs AM,
- Cleve R,
- Kothari R,
- Somma RD

- ↵
- Low GH,
- Chuang IL

- ↵
- Low GH,
- Chuang IL

- ↵
- Luitz DJ,
- Laflorencie N,
- Alet F

- ↵
- ↵
- Schreiber M, et al.

- ↵
- Smith J, et al.

- Berry DW,
- Childs AM,
- Cleve R,
- Kothari R,
- Somma RD

- ↵
- Berry DW,
- Childs AM,
- Kothari R

- ↵
- Green AS,
- Lumsdaine PL,
- Ross NJ,
- Selinger P,
- Valiron B

- ↵
- Nam Y,
- Ross NJ,
- Su Y,
- Childs AM,
- Maslov D

- ↵
- Ross NJ,
- Selinger P

- ↵
- ↵
- Raeisi S,
- Wiebe N,
- Sanders BC

- ↵
- Babbush R,
- McClean J,
- Wecker D,
- Aspuru-Guzik A,
- Wiebe N

- ↵
- Reiher M,
- Wiebe N,
- Svore KM,
- Wecker D,
- Troyer M

- ↵
- Maslov D

- ↵
- Poulin D, et al.

- ↵
- Häner T,
- Steiger DS

- ↵
- Pednault E, et al.

- ↵
- Kutin SA

- ↵
- Kleinjung T, et al.

- ↵
- Roetteler M,
- Naehrig M,
- Svore KM,
- Lauter K

- ↵
- ↵
- Lanyon BP, et al.

- ↵
- Barends R, et al.

- ↵
- O’Malley PJJ, et al.

- ↵
- Haah J,
- Hastings MB,
- Kothari R,
- Low GH

## Citation Manager Formats

## Article Classifications

- Physical Sciences
- Computer Sciences