Main

Taming the exponential complexity of many-body quantum systems remains one of the biggest challenges in modern physics. Exact numerical simulations provide the gold standard in accuracy. However, the computational cost quickly becomes prohibitive above a few tens of particles, and even rapid developments in computing power cannot outpace the exponential scaling of the complexity of fully solving a many-body quantum system. Although there are efficient methods for estimating the ground states of various quantum systems captured by local Hamiltonians, complexity becomes even more of an obstacle for time evolution. Simulating the time evolution of a given quantum state under the action of a local Hamiltonian is BQP-complete in the worst case. For this reason, one cannot hope to find universal classical methods that can accurately and efficiently simulate this evolution for all times and all local Hamiltonians1. Although the ultimate goal may be the development of flexible and reliable quantum simulators2,3,4 able to directly realize many models of interest, in the near term we must continue to rely upon classical computers to simulate quantum matter.

To that end, many highly effective numerical techniques have been developed for studying many-body quantum systems subject to controlled and clear approximations. Leading the charge are tensor network methods5,6: variational methods built on tensor networks, particularly matrix product states in one dimension and projected entangled pair states in two dimensions. These methods work well for ground states and the short-time evolution of generic non-integrable systems, but are limited in how they can capture dynamics, a state of affairs sometimes dubbed the ‘entanglement barrier’. One may encounter indefinite oscillations, and long-time dynamics can hence be captured only for some specific and fine-tuned instances of weak ergodicity breaking7. Disorder can also lessen the burden of simulating long-time dynamics with tensor network methods8,9. That said, the core limitation stems from the generation of entanglement, as highly entangled systems require large bond dimensions, giving rise to computationally intractable situations, particularly in two dimensions. Quantum Monte Carlo techniques10 are also widely used, including for non-equilibrium dynamics11,12. However, they suffer from the well-known sign problem and stability issues. Dynamical mean-field theory can also capture quantum dynamics13,14, but again stability matters arise. Recently, machine learning has been used to capture many-body dynamics, which constitutes a strikingly interesting approach15,16,17, but here questions about the predictive power and the explanatory value emerge. These obstacles all reflect the computational hardness of the task and highlight the need for altogether different tools for many-body dynamics.

In this work, we develop a radically different approach to time evolution in closed quantum systems. Combining the established method of continuous unitary transforms (CUTs), also known as ‘flow equations’18,19,20,21,22,23,24,25, with the newly developed method of scrambling transforms (sketched in Fig. 1), we present a flexible and powerful approach to diagonalizing large Hamiltonians and computing time evolution to very long times. The key ingredient in our work is the use of scrambling transforms to improve the convergence properties of CUT-based methods, which significantly extends their accuracy and range of validity. We demonstrate the potential of this technique by computing the dynamics of disordered quantum systems in one and two dimensions. The limitation is very different from that of tensor network approaches: here it is not entanglement that provides the limitation but the accuracy of the approximate transform used. Heuristically, we find that in practice, this restriction is less severe than overcoming the entanglement barrier.

Fig. 1: Cartoon illustration of the scrambling process.
figure 1

a, The conventional CUT process, which uses a single unitary transform U to diagonalize a Hamiltonian H, smoothly transforming it from the initial basis (l = 0) to the diagonal basis (l → ∞). b, The scrambling transform S first induces ‘effective disorder’, even in completely clean systems, which allows established CUT techniques to then take over and efficiently diagonalize the scrambled Hamiltonian SHS† with a second unitary transform U.

We will focus on a generic system of interacting fermions, captured as

$$\begin{aligned}H&=\mathop{\sum}\limits_{i,\,j\in {{{\mathcal{L}}}}}{H}_{ij}^{\,(2)}:{c}_{i}^{{\dagger}}c_{j}:+\mathop{\sum}\limits_{i,\,j,k,q\in {{{\mathcal{L}}}}}{H}_{ijkq}^{\,(4)}:{c}_{i}^{{\dagger} }c_{j}{c}_{k}^{{\dagger} }{c}_{q}:,\\ &=:{H}^{\,(2)}+{H}^{\,(4)},\end{aligned}$$
(1)

where : . . . : represents normal ordering with respect to the vacuum, and \(| {{{\mathcal{L}}}}| =:L\) is the system size. We make no assumptions as to the form of the couplings nor the dimensionality of the system. The complexity of the calculation is set by the total number of lattice sites L, not by their geometry or the size of the local Hilbert spaces. A two-dimensional (2D) or three-dimensional system can be unfolded onto a one-dimensional (1D) system with long-range hopping, as sketched in Fig. 2, which does not pose a problem for CUT-based techniques.

Fig. 2: Unfolding a two-dimensional system.
figure 2

Illustration of how a 2D lattice can be mapped onto a 1D chain with correlated long-range hopping, which can be easily handled with CUT-based techniques.
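To make the unfolding concrete, the short sketch below (Python/NumPy; the function name and interface are illustrative and not taken from the reference implementation) flattens an Lx × Ly square lattice onto a 1D index i = ix + Lx*iy and writes the nearest-neighbour hopping as a single L × L matrix; vertical bonds simply become hops of range Lx along the chain.

```python
import numpy as np

def unfold_square_lattice(Lx, Ly, J=1.0):
    """Nearest-neighbour hopping on an Lx x Ly lattice, written as a single
    L x L matrix over the flattened 1D site index i = ix + Lx * iy."""
    L = Lx * Ly
    H2 = np.zeros((L, L))
    for iy in range(Ly):
        for ix in range(Lx):
            i = ix + Lx * iy
            if ix + 1 < Lx:            # horizontal bond: nearest neighbour in the chain
                j = (ix + 1) + Lx * iy
                H2[i, j] = H2[j, i] = J
            if iy + 1 < Ly:            # vertical bond: hop of range Lx in the chain
                j = ix + Lx * (iy + 1)
                H2[i, j] = H2[j, i] = J
    return H2
```

The quartic tensor can be unfolded in exactly the same way, so the only change relative to a strictly 1D chain is that some couplings become longer ranged.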

Flow equation methods diagonalize the Hamiltonian by successively applying infinitesimal unitary transforms \(\mathrm{d}U(l)=\exp (-\eta (l)\,{{{\rm{d}}}}l)\approx 1-\eta (l)\,{{{\rm{d}}}}l\), where η(l) is the generator and l represents a fictitious ‘flow time’ such that l = 0 is the initial Hamiltonian. The parameterized Hamiltonian \(H(l):= U(l)HU^{\dagger }(l)\) becomes diagonal in the limit l → ∞, where the full unitary transform U(l) is given by a time-ordered exponential of the generator over flow time l. The diagonalization procedure can be recast as solving the equation of motion dH/dl = [η(l), H(l)] (refs. 23,24). We store H(2) as a matrix with O(L2) entries and H(4) as a tensor of order four with O(L4) real entries, and we employ a similar procedure for the generator η(l) =: η(2)(l) + η(4)(l). This allows the relevant commutators to be computed efficiently as the sum of all one-point contractions of pairs of matrices or tensors26, at a cost polynomial in system size. In all of the following, we truncate at fourth order, corresponding to \({{{\mathcal{O}}}}({L}^{4})\) stored coefficients. The main consequence of fermionic statistics is the minus signs that arise when computing the contractions. The method can be applied to bosons with minor changes.
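As a minimal illustration of the flow at the quadratic level only (the full calculation also flows the quartic tensor H(4) through analogous contractions26, which we omit here), the following sketch integrates dH/dl = [η(l), H(l)] with the Wegner generator discussed below, using simple Euler steps; all function names are illustrative.

```python
import numpy as np

def wegner_generator(H2):
    """eta = [H0, V]: H0 is the diagonal part of the quadratic Hamiltonian, V the rest."""
    H0 = np.diag(np.diag(H2))
    V = H2 - H0
    return H0 @ V - V @ H0

def flow_quadratic(H2, dl=1e-3, steps=50000):
    """Euler integration of dH/dl = [eta, H] at the quadratic level.
    The off-diagonal weight ||V(l)||_2^2 is non-increasing along the flow."""
    H = H2.copy()
    for _ in range(steps):
        eta = wegner_generator(H)
        H = H + dl * (eta @ H - H @ eta)
    return H
```

Applied to a disordered hopping matrix, this drives the off-diagonal elements towards zero and leaves the single-particle energies on the diagonal; the production calculation instead uses an adaptive Runge–Kutta integrator (Methods).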

A common choice of generator is η(l) := [H0(l), V(l)], where H0(l) and V(l) are, respectively, the diagonal and off-diagonal parts of the Hamiltonian. In the following, we use the symbol V for off-diagonal elements. This is often known as the Wegner generator23,24. That the flow diagonalizes the Hamiltonian can be seen from the fact that the squared off-diagonal norm \(\parallel V(l){\parallel }_{2}^{2}\) is non-increasing in the fictitious time l, as \({{{\rm{d}}}}\parallel V(l){\parallel }_{2}^{2}/{{{\rm{d}}}}l=-2\parallel \eta (l){\parallel }_{2}^{2}\le 0\) (see, for example, ref. 27). Convergence relies upon the model in question having a clear separation of energy scales in the initial basis. Models where this is not true (such as homogeneous systems and disordered systems with many near-degeneracies) cannot be fully diagonalized by this generator, as such configurations act like unstable fixed points of the flow. Perturbing the Hamiltonian away from this fixed point can allow the flow to begin. However, small perturbations can result in long convergence times, whereas large perturbations improve convergence but risk changing the underlying physics. Here, we resolve this by introducing scrambling transforms: targeted unitary transforms aimed at lifting degeneracies that the Wegner procedure alone is unable to resolve. As they are unitary, they cannot change the underlying physics. They simply act to ‘prepare’ the Hamiltonian in a basis more amenable to being diagonalized by the conventional Wegner flow (Fig. 1). The (infinitesimal) scrambling transform takes the form \(\mathrm{d}S(l)=\exp (-\lambda (l)\,{{{\rm{d}}}}l)\), with a generator λ(l) given by

$${\lambda }_{ij}(l):= \begin{cases}\operatorname{sgn}(i-j\,){H}_{ij}^{\,(2)}(l):{c}_{i}^{{\dagger} }{c}_{j}:,&{{\text{if}}}\,\,| {H}_{ij}^{\,(2)}(l)| \ge \delta h,\\ 0,&{{\mbox{otherwise}}},\end{cases}$$
(2)

with \(\delta h=\varepsilon | {H}_{ii}^{\,(2)}(l)-{H}_{jj}^{\,(2)}(l)|\), where ε > 0 is the threshold parameter, which controls how easily the scrambling transform triggers. For ε = 0, this reduces to the Toda–Mielke generator27,28. Here, we use ε = 0.5. The full scrambling transform S(l) can be written as a time-ordered integral over dS(l). It is employed at the beginning of the flow and during the diagonalization procedure if degeneracies are encountered (see Supplementary Fig. 1 for details).
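A sketch of the coefficient matrix entering equation (2), again restricted to the quadratic level and with illustrative names; the magnitude of each off-diagonal coupling is compared against the threshold δh built from the corresponding diagonal elements.

```python
import numpy as np

def scrambling_generator(H2, eps=0.5):
    """lambda_ij = sgn(i - j) * H2[i, j] if |H2[i, j]| >= eps * |H2[i, i] - H2[j, j]|, else 0.
    The result is antisymmetric and therefore generates a unitary (orthogonal) transform."""
    L = H2.shape[0]
    lam = np.zeros_like(H2)
    for i in range(L):
        for j in range(L):
            if i != j and abs(H2[i, j]) >= eps * abs(H2[i, i] - H2[j, j]):
                lam[i, j] = np.sign(i - j) * H2[i, j]
    return lam
```

Setting eps=0 includes every off-diagonal element and recovers the Toda–Mielke generator.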

The scrambling transform used here is quadratic and does not induce any new higher-order terms. However, the action of the Wegner generator will typically lead to the generation of new terms containing six or more fermionic operators, like the way that such terms arise in renormalization group procedures. The central approximation of the CUT technique is that the Hamiltonian must be truncated and terms above a certain order neglected. We shall present rigorous error bounds later; for the moment, we emphasize that in cases where the method is insufficiently exact, higher-order terms can be systematically included until the desired precision is reached, at a cost polynomial in the system size.

We will investigate initial local Hamiltonians of the form \(H=\sum_{i\in {{{\mathcal{L}}}}}\)\([{h}_{i}:{n}_{i}:+J(:{c}_{i}^{{\dagger} }{c}_{i+1}:+\;\mathrm{H.c.})+{\Delta }_{0}:{n}_{i}{n}_{i+1}:]\), using open boundary conditions, with J = 1 and Δ0 = 0.1. In one dimension, this Hamiltonian maps onto the XXZ chain by a Jordan–Wigner transform. We diagonalize these Hamiltonians in both one and two dimensions, for two different choices of hi: random disorder (\({h}_{i}\in [-d,d]\)) and quasi-periodic potentials (QP). For the latter, in one spatial dimension, \({h}_{i}:= d\cos (2\pi i/\phi +\theta )\), with \(\phi := (1+\sqrt{5})/2\) and θ a (real) randomly chosen phase that plays the role of a ‘disorder realization’. In two dimensions, \({h}_{i}:= d(\cos(2\pi {i}_{x}/\phi +\theta )+\cos (2\pi {i}_{y}/{\phi }_{2}+{\theta }_{2}))\), where (ix, iy) represent the coordinates of lattice site i, \({\phi }_{2}=1+\sqrt{2}\) and θ2 is another random phase. For simplicity, we refer to d as the ‘disorder strength’ in both cases. This model (and its spin chain equivalent) has been extensively studied in the context of many-body localization in the presence of both random and QP potentials, particularly in one dimension29,30,31,32 but also in two dimensions33. The end point is an (approximately) diagonal Hamiltonian:

$$\tilde{H}=\mathop{\sum}\limits_{i\in {{{\mathcal{L}}}}}{\tilde{h}}_{i}:{\tilde{n}}_{i}:+\mathop{\sum}\limits_{i,\,j\in {{{\mathcal{L}}}}}{\Delta }_{ij}:{\tilde{n}}_{i}{\tilde{n}}_{j}:+{{{\mathcal{R}}}},$$
(3)

where \({{{\mathcal{R}}}}\) represents neglected higher-order terms, typically of order \(O({\Delta }_{0}^{2})\) and higher. The interaction coefficients decay exponentially with distance in strongly (quasi)disordered systems, \({\Delta }_{ij}\propto {{\mathrm{e}}}^{-| i-j| /\xi }\) (refs. 26,33,34,35,36,37), in which case the \({\tilde{n}}_{i}\) operators are known as local integrals of motion. We emphasize, however, that the form of equation (3) does not assume localization or the existence of local integrals of motion. This construction is equally valid whether the unitary transform is quasilocal (as for many-body localization) or entirely non-local.
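For concreteness, the initial Hamiltonians defined above (before equation (3)) can be assembled in the matrix/tensor storage format as follows; this is an illustrative NumPy sketch for the 1D chain with a quasi-periodic potential, not the production tensor flow equation code.

```python
import numpy as np

def build_1d_chain(L, d, J=1.0, Delta0=0.1, theta=0.0):
    """H^(2) as an L x L matrix (potential + hopping) and H^(4) as an
    L x L x L x L tensor holding the nearest-neighbour interaction."""
    phi = (1 + np.sqrt(5)) / 2
    h = d * np.cos(2 * np.pi * np.arange(L) / phi + theta)   # quasi-periodic potential
    H2 = np.diag(h)
    for i in range(L - 1):
        H2[i, i + 1] = H2[i + 1, i] = J                      # hopping term
    H4 = np.zeros((L, L, L, L))
    for i in range(L - 1):
        H4[i, i, i + 1, i + 1] = Delta0                      # Delta0 :n_i n_{i+1}:
    return H2, H4
```

For random disorder, the line defining h is simply replaced by draws from the uniform distribution on [−d, d]; the 2D case combines this construction with the unfolding sketched earlier.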

Once the Hamiltonian has been diagonalized, it is possible to obtain a closed-form solution (within a given truncation scheme) to the Heisenberg equation of motion for any operator O expressed in the diagonal basis. The operator must first be transformed according to the flow equation \({{{\rm{d}}}}O/{{{\rm{d}}}}l=[\overline{\eta }(l\,),O(l\,)]\), where \(\overline{\eta }(l\,)\) collectively denotes both the scrambling and Wegner generators. This transformed operator also contains valuable information about the locality of the unitary transform and can be used to extract both a localization length and a measure of the ‘complexity’ of the diagonalization procedure, which can be linked to the existence of Lieb–Robinson bounds in flow time38. Specifically, the transformed creation operator takes the form \({c}_{i}^{{\dagger} }={\sum }_{j}\,{A}_j^{(i)}{\tilde{c}}_j^{{\dagger} }+{\sum }_{j,k,q}{B}_{jkq}^{(i)}{\tilde{c}}_{j}^{{\dagger} }{\tilde{c}}_{k}^{{\dagger} }{\tilde{c}}_{q}\), and higher-order terms are neglected. A measure of the complexity of the transformed operators is given by the fraction of non-zero terms in this operator expansion. Intuitively, we would expect \({c}_{i}^{{\dagger}}\) to remain sparse in a localized phase but not in a delocalized phase. Supporting analysis is shown in Supplementary Information Section 3. In practice, we choose a cutoff value ϵ = 10−6 below which we consider terms to be zero. The complexity is defined as

$$\chi (\epsilon )=\frac{| \{x\in (A\cup B)| {x}^{2} > {\epsilon }^{2}\}| }{| \{x\in (A\cup B)\}| },$$
(4)

where (A ∪ B) represents the set of all coefficients \({A}_{j}^{(i)}\) and \({B}_{jkq}^{(i)}\) in the operator expansion of \({c}_{i}^{{\dagger} }\). We also define \(\overline{\chi (\epsilon )}=| \{x\in (A\cup B)| {x}^{2} > {\epsilon }^{2}\}|\). The results in Fig. 3 demonstrate a qualitative difference between one and two spatial dimensions. In one dimension, we find a phase where \(\overline{\chi (\epsilon )}\) tends to a constant and χ(ϵ) ∝ (1/L)3 for large system sizes, indicating a ‘low complexity’ situation at strong disorder, as well as a higher complexity phase at small values of d where \(\overline{\chi (\epsilon )}\) increases rapidly with system size, suggestive of thermalization. In two dimensions, we find that \(\overline{\chi (\epsilon )}\) always increases, although for a quasi-periodic potential at large values of d, it increases sufficiently slowly that the normalized complexity χ(ϵ) still vanishes. By contrast, for small values of d in two dimensions, the complexity χ(ϵ) remains much larger than zero for all system sizes studied here. This suggests a slow crossover from a high complexity phase (consistent with the expectation of thermalization at small values of d) to a low complexity phase with anomalous thermalization properties. This notion of complexity is reminiscent of circuit complexity39,40.
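Equation (4) translates directly into a few lines of code. The sketch below (illustrative names) acts on the coefficient arrays A (length L) and B (shape L × L × L) of a single transformed creation operator and also returns the unnormalized count \(\overline{\chi (\epsilon )}\).

```python
import numpy as np

def complexity(A, B, eps=1e-6):
    """chi(eps): fraction of operator-expansion coefficients with magnitude above eps.
    Also returns the unnormalized count, i.e. the numerator of equation (4)."""
    coeffs = np.concatenate([np.ravel(A), np.ravel(B)])
    count = int(np.sum(coeffs ** 2 > eps ** 2))
    return count / coeffs.size, count
```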

Fig. 3: The complexity χ of the transformed creation operator \({c}_{i}^{{\dagger} }\) in the middle of the system.
figure 3

a–d, Results are averaged over disorder realizations, with Ns ∈ [20, 1,024] samples depending on system size (Supplementary Information). Error bars show the standard deviation. a,b, Results in one dimension (L = 8, 10, 12, 16, 24, 36, 48, 64, 100) for random (a) and QP potentials (b). c,d, The same in two dimensions (L2 = 9, 16, 25, 36, 49, 64, 100) for random (c) and QP potentials (d). Dashed black lines close to the origin in a and b are fits with the form χ ∝ 1/L3, which suggests localization and is valid for large systems and strong disorder. Insets show the unnormalized complexity (that is, the numerator of equation (4)), which tends to a constant in strongly disordered 1D chains but grows in two dimensions even for strong disorder.

Previous works that used CUT methods to compute non-equilibrium dynamics33,36,41 employed a computationally costly inversion of the unitary transform to obtain time-evolved operators in the original basis. Here, we circumvent this limitation and directly obtain the infinite-temperature autocorrelation function. This highly non-trivial quantity fully characterizes the transport properties of the system. The thermal expectation value of any arbitrary operator O is given by \(\langle O\rangle =\operatorname{Tr}[\exp (-\beta H)O]/\operatorname{Tr}[\exp (-\beta H)]\), where β = 1/T is the inverse temperature (in units of kB = 1). In the limit T → , the expectation value becomes a uniform average over eigenstates, which in the diagonal basis are trivial product states. We approximate this average for large systems by randomly sampling \({{{{\mathcal{N}}}}}_{s}\in [50,256]\) half-filled eigenstates. Specifically, we compute

$$C(t)=4\langle ({n}_{i}(t)-1/2)({n}_{i}(0)-1/2)\rangle .$$
(5)

To minimize boundary effects, we choose i to be in the centre of the system. This is a highly demanding quantity that can be extremely challenging to compute with other methods but can be obtained very efficiently with the flow equation approach. Results for system size L = 100 (L2 = 10 × 10 for two dimensions) are shown in Fig. 4. The results remain reasonable far beyond the naive expectation that the accuracy should break down beyond timescales \(tJ \approx1/{\Delta }_{0}^{2}\), corresponding to the typical inverse magnitude of the terms cut off by the truncation, as verified by comparison with exact diagonalization (Supplementary Fig. 7). Figure 5 shows results for system sizes up to L = 100 (L2 = 10 × 10 in two dimensions), along with a linear fit indicating the L →  behaviour. Strikingly, very little dependence on system size is observed in one dimension, although in two dimensions there is a slow trend towards decreasing values of C(t) as the system size increases, except for strong QP potentials. This is consistent with the expectation that many-body localization may be ultimately unstable in two dimensions. For weak random disorder in two dimensions, the linear fit for large values of L breaks down, suggesting that our results probably overestimate the long-time value of C(t) as L → . Additionally, at any finite order of truncation, there may still exist higher-order processes that could contribute to thermalization on very long timescales. Nonetheless, for a given truncation scheme, we can make precise statements about the validity of this technique.
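To illustrate the sampling behind equation (5), the sketch below evaluates C(t) at the quadratic level only: it keeps the A coefficients of the transformed operator and the single-particle energies of the diagonal Hamiltonian, and averages over randomly drawn half-filled product eigenstates. The B coefficients and quartic corrections used in the actual calculation are omitted for brevity, the real part of the correlator is taken, and all names are illustrative.

```python
import numpy as np

def correlator_quadratic(A, h_tilde, times, n_samples=100, rng=None):
    """C(t) = 4 <(n_i(t) - 1/2)(n_i(0) - 1/2)> at infinite temperature,
    keeping only the quadratic part c_i^dag ~ sum_j A_j c~_j^dag of the
    transformed operator; average over random half-filled eigenstates."""
    rng = np.random.default_rng() if rng is None else rng
    L, A2 = len(A), A ** 2
    C = np.zeros(len(times))
    for _ in range(n_samples):
        occ = np.zeros(L)
        occ[rng.choice(L, L // 2, replace=False)] = 1.0   # random half-filled eigenstate
        nbar = A2 @ occ                                   # <n_i> in this eigenstate
        w, v = A2 * occ, A2 * (1.0 - occ)                 # occupied / empty mode weights
        for k, t in enumerate(times):
            phase = np.exp(1j * h_tilde * t)
            osc = (w @ phase) * (v @ np.conj(phase))      # sum_{jk} w_j v_k e^{i(h_j - h_k)t}
            C[k] += 4.0 * (nbar ** 2 + osc.real - nbar + 0.25)
    return C / n_samples
```

At t = 0 this reproduces C(0) = 1 for any half-filled eigenstate, as required by the definition in equation (5).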

Fig. 4: Infinite-temperature correlation functions shown for a variety of disorder types, strengths, and dimensionalities.
figure 4

a–d, Infinite-temperature correlation functions: a, 1D, random, L = 100; b, 1D, QP, L = 100; c, 2D, random, L2 = 10 × 10; d, 2D, QP, L2 = 10 × 10. Grey dot-dashed vertical lines indicate the approximate timescale beyond which accuracy cannot be guaranteed. However, the results typically remain reasonable until much longer timescales. Black dashed lines show the long-time average computed directly without explicit time evolution, which is valid at strong (quasi)disorder only. The results are averaged over Ns ∈ [50, 128] disorder realizations. Error bars indicate the variance over disorder realizations. For both D = 1 and D = 2, the QP potential exhibits the most robust localization at large values of d/J, but by contrast also exhibits more complete thermalization at low values of d/J due to the underlying single-particle phase transition at d/J = 2.

Fig. 5: Finite-size scaling results for the long-time average of the infinite-temperature correlation function \(\overline{C(t)}\), averaged over a time window of \(Jt\in [50,1{0}^{3}]\), and then averaged over Ns ∈ [20, 1,024] disorder realizations depending on system size.
figure 5

a–d, Results for various system sizes, disorder types and strengths, and dimensionalities: a, 1D, random; b, 1D, QP; c, 2D, random; d, 2D, QP. Error bars indicate the variance over disorder realizations. The black squares indicate the results obtained from exact diagonalization, shown for small system sizes only. System sizes are L = 8, 10, 12, 16, 24, 36, 48, 64 and 100 in one dimension, and L2 = 9, 16, 25, 36, 49, 64 and 100 in two dimensions. e, The result of linearly extrapolating to L → ∞. Error bars indicate the uncertainty (variance) in the fits (shown as dashed lines in a–d) to extract the scaling behaviour. Lines are guides to the eye.

To do so, we develop an incremental bound on the error in the unitary transform. If at each flow time step we discard all newly generated terms above fourth order, we obtain

$$\begin{aligned}H(l+{{{\rm{d}}}}l)&=H(l)+{{{\rm{d}}}}l[\eta (l),H(l)]\\ &=:H(l)+{{{\rm{d}}}}l(dH(l)+A(l)),\end{aligned}$$
(6)

where dH(l) = [η(2)(l), H(2)(l)] + [η(2)(l), H(4)(l)] + [η(4)(l), H(2)(l)] represents the terms of the flow that are kept and \(A(l)=[{\eta }^{(4)}(l),{H}^{\,(4)}(l)]+{{{\mathcal{T}}}}\) represents the truncation error, where the higher-order terms \({{{\mathcal{T}}}}\) are assumed to be negligible in what follows. The norm of the truncation error A(l) at each infinitesimal time step is upper bounded by

$$\begin{array}{ll}\Vert A(l){\Vert }_{2}&=\left\Vert\left[{\eta }^{(4)}(l),{H}^{(4)}(l)\right]\right\Vert _{2}\\&\le \sqrt{2}\left\Vert{\eta }^{(4)}(l)\right\Vert_{2}\,\left\Vert{H}^{(4)}(l)\right\Vert_{2}\\ &\le 2\left\Vert{H}_{0}^{(4)}(l)\right\Vert_{2}\,\left\Vert{V}^{(2)}(l)\right\Vert_{2}\,\left\Vert{H}^{(4)}(l)\right\Vert_{2}\end{array}$$
(7)

using the submultiplicativity of the \(\parallel \cdot {\parallel }_{2}\) norm42. The total truncation error accumulated along the flow can be written as an integral of equation (7) over flow time:

$${\varepsilon }_{T}^{{l}_{\max }}\le 2\int\nolimits_{0}^{{l}_{\max }}{{{\rm{d}}}}l\left\Vert {H}_{0}^{(4)}(l)\right\Vert _{2}\left\Vert {V}^{(2)}(l)\right\Vert _{2}\left\Vert {H}^{(4)}(l)\right\Vert _{2}$$
(8)

Typical values of V(2)(l) decay exponentially in flow time, that is, \({[{V}^{\,(2)}(l)]}_{ij}\propto \exp (-{({h}_{i}-{h}_{j})}^{2}l){[{V}^{\,(2)}(0)]}_{ij}\). Assuming random disorder drawn from a box distribution of width [−d, d], such that the mean value of this squared energy difference is 2d2/3, and that the largest parts of the interaction tensor remain proportional to the initial interaction strength (as new terms induced by the flow should always be smaller than the initial interactions), the error can be approximated as

$${\varepsilon }_{T}\propto J{\Delta }_{0}^{2}\int\nolimits_{0}^{{l}_{\max }}\,{{{\rm{d}}}}l\operatorname{e}^{-2{d}^{\,2}l/3}=\frac{3}{2}\frac{J{\Delta }_{0}^{2}}{{d}^{\,2}}\quad\text{as}\quad{l}_{\max }\to \infty.$$
(9)

For weak (quasi)disorder, the disorder bandwidth d is replaced by the effective bandwidth \(\tilde{d}\ge d\) induced by the scrambling transform (Supplementary Fig. 2). A numerical analogue can be computed by replacing the Hilbert–Schmidt (or Frobenius) norms in equation (7) with tensor Frobenius norms; the typical truncation error at each flow time step is well below 1% (Supplementary Fig. 6).
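A sketch of the numerical analogue mentioned above: the right-hand side of equation (7) evaluated with Frobenius norms of the stored arrays. Here we take the density–density terms :n_i n_j: as the 'diagonal' quartic part H0(4), in keeping with equation (3); the function name and the choice of diagonal part are illustrative assumptions.

```python
import numpy as np

def truncation_error_bound(H2, H4):
    """Per-step bound 2 ||H0^(4)||_2 ||V^(2)||_2 ||H^(4)||_2 from equation (7),
    with every norm evaluated as a Frobenius norm of the stored array."""
    L = H2.shape[0]
    V2 = H2 - np.diag(np.diag(H2))               # off-diagonal quadratic part
    H4_0 = np.zeros_like(H4)
    for i in range(L):
        for j in range(L):
            H4_0[i, i, j, j] = H4[i, i, j, j]    # density-density ('diagonal') part
    frob = lambda x: np.linalg.norm(x.ravel())
    return 2.0 * frob(H4_0) * frob(V2) * frob(H4)
```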

The above analysis indicates that energy differences below \(O(\,J{\Delta }_{0}^{2}/{d}^{\,2})\) cannot be reliably resolved, implying that the method will break down on timescales of the order \(t\propto {d}^{\,2}/(\,J{\Delta }_{0}^{2})\) when oscillations at corresponding frequencies \(\omega \approx O(\,J{\Delta }_{0}^{2}/{d}^{\,2})\) become relevant for the dynamics. The accuracy of the method can be systematically improved by incorporating additional higher-order terms into the truncated Hamiltonian, which allows accurate simulations of quantum dynamics to even longer times (proportional to \(1/{\Delta }_{0}^{3}\) at the next order of approximation) with a computational cost that remains polynomial in the system size. Future developments in massively parallel implementations of the tensor flow equation method26 used in this work, as well as advances in computer hardware, will facilitate the extension of this method to larger system sizes, longer timescales, stronger interactions and additional physical systems (including both driven37,43 and dissipative44 systems, which have been previously studied with CUT-based techniques). Scrambling transforms may be of interest in a variety of other contexts, as they are essentially a way of transforming a highly entangled system into a simpler representation that is easier to simulate.

We end the discussion by briefly comparing our findings with those of tensor network methods. Standard tensor network methods are challenged in time evolution by the exponentially growing bond dimension that is required to accommodate states faithfully in time. Some steps have already been taken to allow tensor networks to access longer times45,46,47,48, for example, with folding techniques47 or adaptive mode transformations46. The ideas introduced here show that one can go further than that, as we find that the fermionic mode transformations do not have to be linear. There are good reasons to believe that this is a favourable way to reach long simulation times. In fact, once the Hamiltonian is diagonalized, in principle, all times are available and the accumulating errors can be upper bounded. First connections between flow equation and tensor network methods have been made49, in anticipation of the time-dependent variational principle based on a differential geometric picture50. It is conceivable that the ideas introduced here can be further merged with tensor network techniques, as one could think of final Hamiltonians that are not treated as fully diagonal ones. There is also the intriguing possibility of combining scrambling transforms and other CUT-based techniques with tensor network approaches, such as entanglement-based CUTs51, which may allow tensor network methods to break through the entanglement barrier.

In this work, we have introduced a flow-based method equipped with scrambling transforms that can simulate interacting fermionic quantum many-body systems to good accuracy for intermediate and long times. Such a classical development can also be seen as a challenge to dynamical quantum simulators2,4, which aim to probe non-equilibrium properties of quantum matter beyond the reach of classical computers. These are exciting avenues for future progress.

Methods

Computing and integrating the flow equation

All commutators computed in this work follow the scheme of ref. 26, in which the representation of the Hamiltonian in terms of a quadratic component (stored in memory as a matrix) and a quartic component (stored as a tensor) allows the commutators to be recast in terms of matrix/tensor contractions, which are highly optimized linear algebra operations that can be performed efficiently on modern computing hardware. A complete description is contained in Supplementary Information Section 2. We used vacuum normal ordering such that higher-order terms in the running Hamiltonian have no feedback onto lower-order terms. The incorporation of additional non-perturbative corrections due to different choices of normal ordering has previously been done for the time-independent scenario52, but has been left for future work in the non-equilibrium setting. This would require specifying a particular reference state with respect to which the corrections are computed. Calculations for all system sizes with more than a total of 16 lattice sites were performed on graphics processing units (GPUs; specifically, NVIDIA RTX A5000 GPUs with 24 GB RAM and NVIDIA RTX 2080Ti GPUs with 12 GB RAM) using single-precision arithmetic.

The flow equation \({{{\rm{d}}}}H(l)/{{{\rm{d}}}}l=[\overline{\eta }(l),H(l)]\) was solved using a mixed fourth- and fifth-order Runge–Kutta integration method as implemented in the JAX library53, which applies an adaptive step-size algorithm. The maximum integration time was \({l}_{\max }=\text{1,000}\), and the integration was stopped before then if the Hamiltonian was diagonalized to the target accuracy, which we chose to be when \(\max [| {V}^{\,(2)}| ] < 1{0}^{-6}\) and \(\max [| {V}^{\,(4)}| ] < 1{0}^{-3}\). Results for longer integration times showed no significant increase in accuracy, despite incurring a significantly higher computational cost. This is because the running Hamiltonian H(l) approaches full diagonalization only asymptotically at large values of l, so using larger values of \({l}_{\max }\) leads to diminishing returns.
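A minimal sketch of the integration loop, restricted to the quadratic level and with illustrative names: it uses the adaptive Dormand–Prince integrator shipped with JAX (an embedded fourth/fifth-order Runge–Kutta pair) and stops once the convergence criterion quoted above is met. The production calculation additionally flows the quartic tensor and monitors max|V(4)|.

```python
import jax.numpy as jnp
from jax.experimental.ode import odeint

def rhs(H2, l):
    """dH/dl = [eta, H] with the Wegner generator eta = [H0, V], quadratic level."""
    H0 = jnp.diag(jnp.diag(H2))
    V = H2 - H0
    eta = H0 @ V - V @ H0
    return eta @ H2 - H2 @ eta

def flow_to_convergence(H2, l_max=1000.0, chunk=10.0, tol=1e-6):
    """Integrate in chunks of flow time; stop early once max |V^(2)| < tol."""
    l = 0.0
    while l < l_max:
        H2 = odeint(rhs, H2, jnp.array([0.0, chunk]))[-1]
        l += chunk
        if jnp.max(jnp.abs(H2 - jnp.diag(jnp.diag(H2)))) < tol:
            break
    return H2
```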

Computing the dynamics

The transformed number operator was reconstructed from the transformed creation and annihilation operators for large fictitious time:

$${n}_{i}(l\to \infty )={c}_{i}^{{\dagger} }(l\to \infty )\times {c}_{i}(l\to \infty ),$$
(10)

with the creation operator given by transformed creation and annihilation operators:

$${c}_{i}^{\dagger}(l\to \infty )=\mathop{\sum}\limits_{j}{A}_j^{(i)}{\tilde{c}}_{j}^{{\dagger} }+\mathop{\sum}\limits_{j,k,q}{B}_{jkq}^{(i)}{\tilde{c}}_{j}^{{\dagger} }{\tilde{c}}_{k}^{{\dagger} }{\tilde{c}}_{q}$$
(11)

and the annihilation operator obtained by taking its Hermitian conjugate \({c}_{i}(l\to \infty )={({c}_{i}^{\dagger }(l\to \infty ))}^{\dagger}\). Multiplying these together allowed us to reconstruct the number operator, including terms up to sixth order in the fermionic creation/annihilation operators for the diagonal basis, \({\tilde{c}}_{i}^{{\dagger} }\) and \({\tilde{c}}_{i}\). The number operator was then evolved in time in the diagonal basis according to the Heisenberg equation of motion, while neglecting newly generated higher-order terms, resulting in a closed-form solution. This step was performed on CPUs rather than GPUs due to memory limitations and is a prime candidate for future efficiency improvements. At long times, near-degenerate single-particle eigenvalues can still lead to divergent terms in this solution (consistent with the expectation that the simulation of a BQP-hard problem will eventually run into accuracy issues on a classical computer). However, these terms were strongly suppressed, arising only very rarely and at very long times. To prevent these rare scenarios from dominating the averaged data, we excluded disorder realizations in which the maximum value of C(t) exceeded 1.1. (Alternatively, we could have used the typical rather than the mean value of C(t).) See Supplementary Information for full details of the calculation and for an explanation of where the divergent terms arise. In one dimension, the divergent terms were rare enough to have essentially no effect. The long-time average was obtained directly by setting all off-diagonal terms in the transformed number operator to zero (as when time-evolved, they acquire oscillating phases that average to zero). For systems with more than 36 lattice sites in total, we neglected the sixth-order contributions and kept only the quadratic and quartic terms when computing the dynamics. For the systems considered here, the sixth-order terms have a negligible effect, which can be seen from the qualitative agreement between small and larger systems.
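As a sketch of the long-time average described above, the snippet below keeps only the dephasing-free (diagonal) part of the quadratic piece of the transformed number operator, \({\bar{n}}_{i}={\sum }_{j}{({A}_{j}^{(i)})}^{2}{\tilde{n}}_{j}\), and averages over sampled half-filled eigenstates; the quartic and sixth-order contributions used in the actual calculation are omitted and all names are illustrative.

```python
import numpy as np

def long_time_average(A, n_samples=100, rng=None):
    """Quadratic-level long-time average of C(t): off-diagonal terms of the
    transformed number operator dephase and are dropped, leaving
    n_bar = sum_j A_j^2 n_j in each sampled half-filled eigenstate."""
    rng = np.random.default_rng() if rng is None else rng
    L, A2 = len(A), A ** 2
    values = []
    for _ in range(n_samples):
        occ = np.zeros(L)
        occ[rng.choice(L, L // 2, replace=False)] = 1.0
        nbar = A2 @ occ
        values.append(4.0 * (nbar - 0.5) ** 2)   # 4 <(n_bar - 1/2)(n_i(0) - 1/2)>
    return float(np.mean(values))
```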

Rescaling the correlation function

As the norm of the number operator ni is not precisely conserved by the unitary transform, we rescaled the correlation function for each disorder realization according to the ansatz C(t) → c1(C(t) − c2), where c1 and c2 are determined by minimizing the error with respect to the short-time dynamics of the non-interacting system (as many-body interactions are essentially irrelevant at very short times). This is computationally efficient, as we get the exact dynamics of the non-interacting system essentially for free in this formalism by just retaining the quadratic components of the Hamiltonian and relevant observables. The rescaling employed in this work is justified a posteriori by the clear agreement between the rescaled C(t) and the exact result, which were computed for system sizes small enough for the comparison to be practical. Further analysis may be found in Supplementary Information Section 9. For small enough systems, an alternative would be to construct the operator as a matrix in the full Hilbert space and renormalize it by hand. However, this is not practical for systems as large as those considered here. We emphasize that the norm is preserved to high accuracy for sufficiently strong disorder, and the effects of this rescaling are most important for weakly disordered systems. This is independent of any error introduced in the eigenvalues and reflects the difficulty in simultaneously preserving the unitary evolution of both the Hamiltonian and the number operator within the same truncation scheme. The norm of the operator could, in principle, be exactly preserved by constructing the unitary transforms subject to additional constraints24. However, in practice this is challenging to implement. This underscores the need for further work in developing more flexible generators for the types of CUT developed here, perhaps in concert with machine learning approaches to design data-driven generators tailored to specific problems and hard-to-satisfy constraints.
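The rescaling can be cast as a linear least-squares problem: writing c1(C(t) − c2) = aC(t) + b, with a = c1 and b = −c1c2, the two coefficients follow from a fit of the interacting correlator to the non-interacting reference at short times. A sketch with illustrative names:

```python
import numpy as np

def rescale_correlator(C_raw, C_free, n_short=20):
    """Fit C_raw to the non-interacting reference C_free over the first
    n_short time points via C -> c1 * (C - c2), then rescale the full trace."""
    x, y = C_raw[:n_short], C_free[:n_short]
    design = np.column_stack([x, np.ones_like(x)])        # linear model a*x + b
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    c1, c2 = a, -b / a
    return c1 * (C_raw - c2), (c1, c2)
```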