New Results on Learning and Reconstruction of Quantum States

Scott Aaronson (University of Texas at Austin)
Weizmann Institute, November 6, 2017

Measurements in QM are Destructive
One qubit $\alpha|0\rangle+\beta|1\rangle$ has infinitely many bits in the amplitudes $\alpha,\beta$, but when you measure, the state “collapses” to just 1 bit! (0 with probability $|\alpha|^2$, 1 with probability $|\beta|^2$)
This destructiveness opens cryptographic possibilities, like quantum key distribution and quantum money, but it’s also a huge practical limitation!

And there’s a lot to be destroyed!
An n-qubit pure state requires $2^n$ complex numbers to specify, even approximately:
$|\psi\rangle = \sum_{x\in\{0,1\}^n} \alpha_x |x\rangle$
Yet measuring yields at most n bits (Holevo’s Theorem).
So should we say that the $2^n$ complex numbers are “really there” in a single copy of $|\psi\rangle$, or “just in our heads”?
A probability distribution over n-bit strings also involves $2^n$ real numbers. But probabilities don’t interfere!

Interlude: Mixed States
$\rho$: a $D\times D$ Hermitian positive semidefinite matrix with $\mathrm{Tr}(\rho)=1$.
The most general kind of state in QM: encodes (everything measurable about) a distribution where each superposition $|\psi_i\rangle$ occurs with probability $p_i$, that is, $\rho=\sum_i p_i|\psi_i\rangle\langle\psi_i|$.
A yes-or-no measurement can be represented by a $D\times D$ Hermitian psd matrix $E$ with all eigenvalues in [0,1].
The measurement “accepts” $\rho$ with probability $\mathrm{Tr}(E\rho)$, and “rejects” $\rho$ with probability $1-\mathrm{Tr}(E\rho)$.
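To make the definitions on this slide concrete, here is a minimal NumPy sketch that builds a mixed state from an ensemble and evaluates the acceptance probability $\mathrm{Tr}(E\rho)$. The helper names `density_matrix` and `accept_probability`, and the particular ensemble, are illustrative choices and do not come from the talk.

```python
import numpy as np

def density_matrix(probs, states):
    """Return rho = sum_i p_i |psi_i><psi_i| (each state is normalized first)."""
    dim = len(states[0])
    rho = np.zeros((dim, dim), dtype=complex)
    for p, psi in zip(probs, states):
        psi = np.asarray(psi, dtype=complex)
        psi = psi / np.linalg.norm(psi)
        rho += p * np.outer(psi, psi.conj())
    return rho

def accept_probability(E, rho):
    """Probability that the two-outcome measurement E accepts rho: Tr(E rho)."""
    return float(np.real(np.trace(E @ rho)))

# Illustrative ensemble: |0> with probability 1/2, |+> with probability 1/2.
rho = density_matrix([0.5, 0.5], [[1, 0], [1, 1]])
E = np.array([[1, 0], [0, 0]], dtype=complex)  # projector onto |0>
p_accept = accept_probability(E, rho)
print(p_accept, 1 - p_accept)  # accept and reject probabilities
```

Here $|0\rangle$ is accepted with certainty and $|+\rangle$ with probability 1/2, so the mixture is accepted with probability 0.75.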

Quantum State Tomography
Task: Given lots of copies of an unknown D-dimensional quantum mixed state $\rho$, produce an approximate classical description of $\rho$.
O’Donnell-Wright and Haah et al., STOC 2016: $\tilde{\Theta}(D^2)$ copies of $\rho$ are necessary and sufficient for this.
Experimental Record: Innsbruck group (2005). 8 particles, ~656,000 measurement settings!
Keep in mind: $D = 2^n$

Quantum Computing Skepticism
[Photos: Levin, Goldreich, ’t Hooft, Davies, Kalai]
Some physicists and computer scientists believe quantum computers will be impossible for a fundamental reason.
For many of them, the problem is that a quantum computer would “manipulate an exponential amount of information” using only polynomial resources.
But in what senses is it really an exponential amount?

Can we tame the exponential beast?
Idea: “Shrink quantum states down to reasonable size” by asking what they’ll actually be used for.
In a sequence of works over 14 years, I’ve been trying to use tools from computational learning theory to do exactly that:
• Postselected learning theorem [A. 2004]
• Quantum Occam’s Razor Theorem [A. 2006]
• Full characterization of quantum advice [A.-Drucker 2010]
• Shadow tomography theorem [A. 2017]
• Online learning of quantum states [A.-Chen-Hazan-Nayak 2017]
I’ll tell you about the last two, and the others where relevant…

Shadow Tomography
Given: Unknown D-dimensional mixed state $\rho$, known 2-outcome measurements $E_1,\ldots,E_M$.
Goal: Estimate $\mathrm{Tr}(E_i\rho)$ to within additive error $\varepsilon$, for every $i\in[M]$, with success probability $\ge 1-\delta$.
How many copies of $\rho$ must we measure?
Clearly $\tilde{O}(D^2)$ copies suffice, and also $\tilde{O}(M)$ copies. But what about poly(log D, log M)?
Main Result: It’s possible to do Shadow Tomography using only $\tilde{O}(\log^4 M\cdot\log D)\cdot\mathrm{poly}(1/\varepsilon)$ copies.
Doesn’t this violate Holevo’s Theorem? Nope…
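For comparison, the $\tilde{O}(M)$-copy baseline mentioned on this slide can be simulated classically: measure each $E_i$ on its own batch of fresh copies and average the outcomes. The sketch below is only that naive baseline, not the shadow tomography procedure. The per-measurement sample count comes from Hoeffding's inequality plus a union bound; the function names and the example state are assumptions made for illustration.

```python
import math
import numpy as np

def copies_per_measurement(M, eps, delta):
    """Hoeffding + union bound: copies per E_i so that all M estimates
    are within eps simultaneously with probability at least 1 - delta."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps ** 2))

def naive_shadow_tomography(rho, measurements, eps, delta, rng):
    """Estimate Tr(E_i rho) for every i, using a fresh batch of copies per E_i."""
    k = copies_per_measurement(len(measurements), eps, delta)
    estimates = []
    for E in measurements:
        p = float(np.real(np.trace(E @ rho)))
        p = min(max(p, 0.0), 1.0)        # guard against rounding error
        accepts = rng.binomial(k, p)     # each fresh copy accepts w.p. Tr(E rho)
        estimates.append(accepts / k)
    return estimates, k * len(measurements)

# Illustrative example: one qubit, three projective measurements.
rho = np.diag([0.7, 0.3]).astype(complex)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
measurements = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.outer(plus, plus)]
rng = np.random.default_rng(0)
estimates, total_copies = naive_shadow_tomography(rho, measurements, 0.1, 0.05, rng)
print(estimates, total_copies)
```

The total cost grows like $M\log M/\varepsilon^2$, which is what the main result improves to polylogarithmic in $M$.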

Interlude: Gentle Measurement
Winter 1999, A. 2004
Suppose a measurement of a mixed state $\rho$ yields a certain outcome with probability $\ge 1-\varepsilon$.
Then after the measurement, we still have a state $\rho'$ that’s $\sqrt{\varepsilon}$-close to $\rho$ in trace distance.
Moreover, we can apply M such measurements in succession, and they’ll all accept w.p. $\ge 1-2M\sqrt{\varepsilon}$.
Often combined with amplification: measure k copies of $\rho$ and take a majority to push down error by 1/exp(k).
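A small numerical check can make the gentle measurement statement tangible. The sketch below assumes a projective measurement and conditions on the likely ("accept") outcome, then compares the disturbance with $\sqrt{\varepsilon}$. The example state and the helper names are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of the Hermitian matrix rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def measure_and_accept(rho, P):
    """Apply projector P and condition on acceptance.
    Returns (acceptance probability, post-measurement state)."""
    p_accept = float(np.real(np.trace(P @ rho)))
    return p_accept, P @ rho @ P / p_accept

# Illustrative state: mostly |0>, with a little coherent and incoherent noise.
eps0 = 0.01
psi = np.array([np.sqrt(1 - eps0), np.sqrt(eps0)], dtype=complex)
rho = 0.9 * np.outer(psi, psi.conj()) + 0.1 * np.eye(2) / 2
P = np.array([[1, 0], [0, 0]], dtype=complex)  # projector onto |0>

p_accept, rho_after = measure_and_accept(rho, P)
eps = 1 - p_accept                      # the measurement accepts w.p. 1 - eps
disturbance = trace_distance(rho, rho_after)
print(eps, disturbance, np.sqrt(eps))   # disturbance should not exceed sqrt(eps)
```

For a pure state the disturbance equals $\sqrt{\varepsilon}$ exactly, so the bound cannot be improved in general.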

Why Doesn’t Gentle Measurement Immediately Solve the Problem?
Implies a promise gap version of shadow tomography: Given measurements $E_1,\ldots,E_M$ and real numbers $c_1,\ldots,c_M$, and promised that for each i, either $\mathrm{Tr}(E_i\rho)\ge c_i$ or $\mathrm{Tr}(E_i\rho)\le c_i-\varepsilon$, we can decide which for every i using only $O(\log M/\varepsilon^2)$ copies of $\rho$.
When there’s no promise, we can never rule out that we’re on the knife-edge between acceptance and rejection, in which case measuring $\rho$ is dangerous!
Our main contribution is to solve this.

Implications
• Given an n-qubit state $|\psi\rangle$, for any fixed polynomial p, by measuring $n^{O(1)}$ copies of $|\psi\rangle$, we can learn $|\psi\rangle$’s behavior on every accepting/rejecting circuit with p(n) gates.
• Given any problem in PromiseBQP/qpoly, $n^{O(1)}$ copies of the advice state are enough to reconstruct the whole truth table … even without knowing the promise!
• Likewise, in any 1-way communication protocol where Alice sends q qubits and Bob has an m-bit input, $\tilde{O}(qm^4)$ copies of Alice’s message let Bob learn everything.
• Any scheme for quantum software copy-protection must require computational assumptions.

Private-Key Quantum Money
My original motivation for shadow tomography.
Scheme where a bank prepares states $|\$\rangle$ that it can verify, but no one else can feasibly copy.
Wiesner 1970 (!): Money scheme that’s unconditionally secure, but where the bank needs to store a separate set of measurement bases for every bill in circulation.
Bennett et al. 1982: Can avoid that using pseudorandom functions, but then it’s no longer unconditionally secure.
Tradeoff Theorem (A. 2016): Every money scheme requires either a computational assumption, or else a giant database maintained by the bank.
This follows immediately from the shadow tomography theorem! (Why?) Makes my earlier proof obsolete.

Proving the shadow tomography theorem involves combining several previous ingredients. I’ll start with the problem of…
The Absent-Minded Advisor
Can you hand all your grad students the same $n^{O(1)}$-qubit quantum state $|\psi\rangle$, so that by measuring their copy of $|\psi\rangle$ in a suitable basis, each student can learn the {0,1} answer to their n-bit thesis question?
NO [Ambainis, Nayak, Ta-Shma, Vazirani 1999]
Indeed, quantum communication is no better than classical for this task as $n\to\infty$.

Turning the lemon into lemonade…
Suppose Alice wants to describe a D-dimensional mixed state $\rho$ to Bob, well enough for Bob to estimate $\mathrm{Tr}(E_i\rho)$ to within $\pm\varepsilon$, for any of M two-outcome measurements $E_1,\ldots,E_M$ known to both players.
Postselected Learning Theorem (A. 2004): In that case, it suffices for Alice to send Bob only $\tilde{O}(\log D\cdot\log M)$ classical bits, for fixed $\varepsilon$ (trivial bounds: ~$D^2$, ~$M$).

How does the theorem work?
[Figure: states labeled $\rho_1$, $\rho_2$, $\rho_3$, and $I$]
Alice is trying to describe $\rho$ to Bob (actually an amplified version).
Initially, Bob knows nothing about $\rho$, so he guesses it’s the maximally mixed state $\rho_0=I$ (actually $I/D$).
Then Alice helps Bob improve, by repeatedly telling him a measurement $E_{i(t)}$ on which his current guess $\rho_{t-1}$ badly fails.
Bob lets $\rho_t$ be the state obtained by starting from $\rho_{t-1}$, then performing $E_{i(t)}$ and postselecting on getting the right outcome.
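One round of Bob's update can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: it omits the amplification step, treats the "right outcome" as the more likely one under Alice's state, and uses the standard update $\rho\mapsto\sqrt{E}\rho\sqrt{E}/\mathrm{Tr}(E\rho)$ for postselection. The names `postselect` and `worst_measurement` and the list `targets` (standing in for the values $\mathrm{Tr}(E_i\rho)$ that Alice knows) are hypothetical.

```python
import numpy as np

def psd_sqrt(E):
    """Square root of a Hermitian psd matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(E)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def postselect(rho_guess, E, accepted):
    """Update the guess by measuring E and postselecting on the given outcome.
    Returns (new guess, probability that the postselection succeeds)."""
    op = E if accepted else np.eye(E.shape[0]) - E
    root = psd_sqrt(op)
    unnormalized = root @ rho_guess @ root
    prob = float(np.real(np.trace(unnormalized)))
    return unnormalized / prob, prob

def worst_measurement(rho_guess, targets, measurements, eps):
    """Alice's step: index of a measurement where the guess is off by more
    than eps, or None if the guess is already good on all of them."""
    for i, E in enumerate(measurements):
        if abs(float(np.real(np.trace(E @ rho_guess))) - targets[i]) > eps:
            return i
    return None

# Illustrative run: D = 4, one measurement that Alice's state always passes.
measurements = [np.diag([1.0, 1.0, 0.0, 0.0])]
targets = [1.0]
guess = np.eye(4) / 4                         # maximally mixed starting guess
i = worst_measurement(guess, targets, measurements, eps=0.1)
guess, prob = postselect(guess, measurements[i], accepted=targets[i] >= 0.5)
done = worst_measurement(guess, targets, measurements, eps=0.1) is None
```

In this toy run a single postselection, succeeding with probability 1/2, already brings Bob's guess into agreement with Alice's state on the one measurement.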

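Crucial Claim: Bob’s iterative learning procedure will “converge” on a state $\rho_T$ that behaves like $\rho^*$ on all measurements $E_1,\ldots,E_M$, after at most $T=O(\log D^*)$ iterations. (Here $\rho^*$ is the amplified version of $\rho$, and $D^*$ is its dimension.)
Proof: Let $p_t$ = Pr[first t postselections all succeed]. Then $\Omega(1/D^*)\le p_t\le(2/3)^t$. The upper bound holds because if $p_t$ wasn’t less than, say, $(2/3)p_{t-1}$, learning would’ve ended! The lower bound holds because the maximally mixed state is a mixture containing $\rho^*$ with weight $1/D^*$, and $\rho^*$ itself passes every amplified test with high probability. Solving, we find that $t=O(\log D^*)$.
So it’s enough for Alice to tell Bob about $T=\tilde{O}(\log D)$ measurements $E_{i(1)},\ldots,E_{i(T)}$, using log(M) bits per measurement.
Complexity theory consequence: BQP/qpoly $\subseteq$ PostBQP/poly (Open whether BQP/qpoly=BQP/poly)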
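The "solving" step in this proof is a one-line calculation. As a sketch, assume a lower bound of the form $p_t\ge c/D^*$ for some constant $c>0$ (the constant $c$ is introduced here only for illustration), together with the upper bound $p_t\le(2/3)^t$ that follows from each round shrinking the success probability by a factor of at least 2/3:

```latex
\frac{c}{D^*} \;\le\; p_t \;\le\; \left(\frac{2}{3}\right)^{t}
\quad\Longrightarrow\quad
\left(\frac{3}{2}\right)^{t} \;\le\; \frac{D^*}{c}
\quad\Longrightarrow\quad
t \;\le\; \log_{3/2}\frac{D^*}{c} \;=\; O(\log D^*).
```

So the learning procedure cannot continue for more than logarithmically many rounds in the dimension of the amplified state.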

Alas, postselected learning doesn’t suffice for the shadow tomography theorem, because it gives us no idea how to find the classical description of $\rho$ given copies of $\rho$. So we’ll need to combine with another result called the…
Quantum OR Bound
• Let $\rho$ be an unknown mixed state, and let $E_1,\ldots,E_M$ be known 2-outcome measurements. Suppose we’re promised that either
  (i) there exists an i such that $\mathrm{Tr}(E_i\rho)\ge c$, or else
  (ii) $\mathrm{Tr}(E_i\rho)\le c-\varepsilon$ for all $i\in[M]$.
• Then we can decide which, with high probability, given only $O((\log M)/\varepsilon^2)$ copies of $\rho$.

[A. 2006] claimed a proof of the Quantum OR Bound, based on just applying amplified $E_i$’s in a random order.
[Harrow-Lin-Montanaro, SODA 2017] discovered an error in my proof, but also fixed it!
They give two procedures, one of which is to prepare a control qubit in the state $(|0\rangle+|1\rangle)/\sqrt{2}$, then repeatedly apply amplified $E_i$’s, while also periodically measuring the control qubit to see if it’s decohered (in which case we’re in case (ii)).
It remains open whether my original, simpler procedure is also sound.

Gentle Search Procedure
Lemma: Let $\rho$ be an unknown mixed state, and let $E_1,\ldots,E_M$ be known two-outcome measurements. Suppose we’re promised that there exists an i such that $\mathrm{Tr}(E_i\rho)\ge c$. Then we can find a j such that $\mathrm{Tr}(E_j\rho)\ge c-\varepsilon$, with probability $\ge 1-\delta$, using $O(\log^4 M/\varepsilon^2)$ copies of $\rho$, times a factor depending on $\delta$.
Proof Sketch: Reduce search to decision using binary search (“oldest trick in the book!”). As we recurse, the promise degrades from $c$ to $c-\varepsilon$ to $c-2\varepsilon$, etc. That, plus the need for fresh copies of $\rho$ at each level, is what produces the $\log^4 M$. I conjecture it’s not tight.
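The binary-search reduction in the proof sketch can be illustrated with a purely classical stand-in. In the sketch below, `decision_oracle` plays the role of the Quantum OR Bound: it is only required to answer correctly when the promise holds, and it answers "yes" anywhere inside the gap. It reads the true acceptance probabilities directly, which a real procedure cannot do. All names and the example values are assumptions made for illustration.

```python
def decision_oracle(values, threshold, eps):
    """Stand-in for the OR bound: must say True if some value >= threshold,
    must say False if all values <= threshold - eps; free to say either
    inside the gap (this version says True there)."""
    return max(values) > threshold - eps

def gentle_search(values, c, eps):
    """Binary search for an index j with values[j] >= c - eps * (number of levels),
    given the promise that some values[i] >= c."""
    lo, hi = 0, len(values)
    threshold = c
    levels = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decision_oracle(values[lo:mid], threshold, eps):
            hi = mid          # left half holds a value above threshold - eps
        else:
            lo = mid          # left half has nothing large, so the right half does
        threshold -= eps      # the promise degrades by eps per level
        levels += 1
    return lo, levels

# Illustrative acceptance probabilities Tr(E_i rho); index 3 satisfies the promise.
values = [0.86, 0.10, 0.10, 0.95]
j, levels = gentle_search(values, c=0.9, eps=0.05)
print(j, values[j], 0.9 - 0.05 * levels)
```

In this run the search returns index 0, whose value 0.86 is below the promised threshold 0.9 but above the degraded one, which is exactly the kind of loss the lemma allows. To end with total loss $\varepsilon$, each of the $\log_2 M$ levels would be run with gap $\varepsilon/\log_2 M$.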

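Putting Everything Together
To do shadow tomography, we run my postselected learning algorithm, but in each iteration we use the Gentle Search Procedure to find a measurement $E_i$ on which the current hypothesis $\rho_{t-1}$ badly fails, i.e. such that $\mathrm{Tr}(E_i\rho_{t-1})$ differs substantially from $\mathrm{Tr}(E_i\rho)$.
Carefully Chernoff-bounding, union-bounding, etc. etc., I end up using $\tilde{O}(\log^4 M\cdot\log D)\cdot\mathrm{poly}(1/\varepsilon)$ copies of $\rho$.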

Lower Bound for Shadow Tomography
Theorem: Given an unknown distribution $\mathcal{D}$ over n-bit strings, and known Boolean functions $f_1,\ldots,f_M$, we need $\Omega(n/\varepsilon^2)$ samples from $\mathcal{D}$ to approximate $E_{x\sim\mathcal{D}}[f_i(x)]$ to within $\pm\varepsilon$ for all $i\in[M]$ (i.e., shadow tomography requires at least that many samples, for reasons having nothing to do with QM).
I invite you to prove this yourself! (My approach used distributions with a single $\varepsilon$-sized Fourier coefficient $s\in\{0,1\}^n$. It argued that each sample has mutual information only $O(\varepsilon^2)$ with s, but approximating $E_{x\sim\mathcal{D}}[r\cdot x \pmod 2]$ for every r entails learning s.)
Note: In the classical case, the lower bound $\Omega((\log D)/\varepsilon^2)$ is tight.
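The hard distributions in this argument are easy to write down for small n. The sketch below assumes the natural construction $\mathcal{D}_s(x)=(1+\varepsilon(-1)^{s\cdot x})/2^n$, which has a single nonzero non-trivial Fourier coefficient of size $\varepsilon$ at $s$; the exact form used in the proof is not given on the slide, so treat this as one plausible instance. It then checks that the parity expectations single out $s$.

```python
from itertools import product

def parity(r, x):
    """Inner product r.x modulo 2."""
    return sum(ri * xi for ri, xi in zip(r, x)) % 2

def biased_distribution(s, eps):
    """D_s(x) = (1 + eps * (-1)^(s.x)) / 2^n over all n-bit strings x."""
    n = len(s)
    return {x: (1 + eps * (-1) ** parity(s, x)) / 2 ** n
            for x in product((0, 1), repeat=n)}

def expected_parity(dist, r):
    """E_{x ~ dist}[r.x mod 2]."""
    return sum(p * parity(r, x) for x, p in dist.items())

s, eps = (1, 0, 1), 0.2
dist = biased_distribution(s, eps)
for r in product((0, 1), repeat=len(s)):
    print(r, round(expected_parity(dist, r), 6))
```

Every nonzero $r\ne s$ gives expectation exactly 1/2, while $r=s$ gives $(1-\varepsilon)/2$, so estimating all parities accurately enough reveals $s$ even though each individual sample carries very little information about it.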

Computational Cost of Shadow Tomography
Our procedure uses $\tilde{O}(LM) + D^{O(\log\log D)}$ time, where L is the circuit complexity of implementing a single $E_i$.
Brandão et al. 2017: By combining our ideas with recent SDP algorithms, achieve shadow tomography with poly(log M, log D) copies and $\tilde{O}(LM) + D^{O(1)}$ time.
Under strong assumptions on the $E_i$’s: polylog(D)M time.
Note: Even writing the input takes ~$MD^2$ bits, and output takes ~M bits, unless implicit representations are used.
If “hyper-efficient” shadow tomography is possible, then BQP/qpoly=BQP/poly and there’s no quantum copy-protection.

Online Learning of Quantum States
Another result, by A.-Chen-Hazan-Nayak (in preparation).
Theorem: Let $\rho$ be an unknown D-dimensional state. Suppose we’re given 2-outcome measurements $E_1,E_2,\ldots$ in sequence, each followed by the value of $\mathrm{Tr}(E_t\rho)$ to within $\varepsilon/2$. Each time, we’re challenged to guess a hypothesis state $\omega$ such that $|\mathrm{Tr}(E_t\omega)-\mathrm{Tr}(E_t\rho)|\le\varepsilon$. We can do this so that we fail at most $O((\log D)/\varepsilon^2)$ times.
Proof #1: Adapt my postselected learning theorem
Proof #2: Upper-bound sequential fat-shattering dimension
Proof #3: Semidefinite programming blah blah
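To give a feel for what an online learner of this kind can look like, here is a minimal sketch of a matrix multiplicative weights style update: the hypothesis is proportional to the exponential of the negated, accumulated loss gradients. This is a generic online-learning technique swapped in for illustration; the slide does not specify the authors' algorithm, and the class name, the absolute-error loss, and the step size `eta` are assumptions.

```python
import numpy as np

def hermitian_exp(H):
    """Matrix exponential of a Hermitian matrix, shifted for numerical stability
    (the shift cancels once the result is normalized to trace 1)."""
    vals, vecs = np.linalg.eigh(H)
    vals = vals - vals.max()
    return (vecs * np.exp(vals)) @ vecs.conj().T

class OnlineStateLearner:
    def __init__(self, dim, eta=0.2):
        self.eta = eta
        self.grad_sum = np.zeros((dim, dim), dtype=complex)

    def hypothesis(self):
        """Current hypothesis state: exp(-eta * sum of gradients), trace-normalized."""
        w = hermitian_exp(-self.eta * self.grad_sum)
        return w / np.real(np.trace(w))

    def predict(self, E):
        return float(np.real(np.trace(E @ self.hypothesis())))

    def update(self, E, b):
        """Gradient step for the loss |Tr(E omega) - b|, after feedback b arrives."""
        self.grad_sum += np.sign(self.predict(E) - b) * E

# Illustrative run: the same measurement is presented repeatedly with feedback 0.7.
learner = OnlineStateLearner(dim=4)
E = np.diag([1.0, 0.0, 0.0, 0.0]).astype(complex)
initial_error = abs(learner.predict(E) - 0.7)
for _ in range(50):
    learner.update(E, 0.7)
final_error = abs(learner.predict(E) - 0.7)
print(initial_error, final_error)
```

The hypothesis starts at the maximally mixed state and is always a valid density matrix; in this toy run its prediction moves from 0.25 to within a few hundredths of the feedback value.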

The Non-Realizable Case
Theorem: Now suppose there need not be any actual state $\rho$. Even then, for any sequence $b_1,b_2,\ldots$, we can ensure that our hypothesis states $\omega_1,\omega_2,\ldots$ satisfy a regret bound relative to the best fixed state in hindsight.
These results complement my earlier Quantum Occam’s Razor Theorem (A. 2006), which showed that for any distribution $\mathcal{D}$ over two-outcome measurements, it takes only O(log D) sample measurements from $\mathcal{D}$, and their outcomes, before we can approximately predict $\mathrm{Tr}(E\rho)$ for most further $E\sim\mathcal{D}$.

Open Problems
• Shadow tomography with fewer copies of $\rho$? Maybe even log(M), independent of the Hilbert space dimension D???
• Is shadow tomography possible without using “collective” and “nondemolition” measurements?
• Improve computational complexity beyond Brandão et al.?
• Do better for subclasses of states and measurements (PAC-learning stabilizer states: Rocchetto 2017)
• Find more applications of these results, whether in complexity theory or experimental physics!
