Designing Scalable Quantum Computer Architectures: Layout and Initialization

By

DEAN ELBERT COPSEY
B.S. (University of California, Davis) 1986
M.S. (University of California, Davis) 1996

DISSERTATION

Submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

Computer Science

in the

OFFICE OF GRADUATE STUDIES

of the

UNIVERSITY OF CALIFORNIA, DAVIS

Approved:

Committee in charge

2005
Designing Scalable Quantum Computer Architectures: Layout and Initialization Copyright 2005 by Dean Elbert Copsey
To Michael,
ten years and still going strong.
May the road rise up to meet you.

and, in loving memory of
Homer Copsey-Waggoner
1992-2004
Until we meet again.
Dean Elbert Copsey
September 2005
Computer Science

Designing Scalable Quantum Computer Architectures: Layout and Initialization

Abstract

As the complexity of quantum computers increases, typical architectural issues such as communication, layout, and efficient design need to be addressed. This document addresses some basic architectural ideas for quantum computers. To be useful in reasonable calculations, quantum states will need constant error correction. This need guides how best to lay out physical components to minimize error-correction overhead. I propose a layout that minimizes communication overhead, and discuss the implementation of error-correction algorithms on that layout. I compare the cost overhead of purely local communication to communication by teleportation, and calculate the break-even point at which teleportation becomes efficient. Additionally, the overhead of error correction can be reduced by using a memory hierarchy to more efficiently store data not currently being computed on. The main requirement is the same as for a classical computer's cache: temporal locality. I show that an important quantum routine can be rearranged to take advantage of a small quantum memory cache, and compute the achieved savings. In a quantum system, complex operators are built up from the basic operators allowed by a given technology model. I show that the set of operators required to implement any complex operator in an error-corrected system can be approximated to arbitrary precision, given two elementary operators. I give results for all the operators in the set. Finally, I examine methods to initialize a quantum system. Quantum operators are reversible, so data cannot simply be written over. Instead, initialization entails compressing
the entropy of a set of quantum bits into a small subset of those bits, leaving the rest of the bits in a known, non-random state. I examine three such compression algorithms. The best of these itself requires a pool of known states, and so cannot be used directly. The other two algorithms, however, produce less-than-optimal results. I explore why they produce suboptimal results, and propose that one of the suboptimal algorithms be used to compress the entropy of a subsystem, using the resulting known state to run the optimal algorithm.
Contents

List of Figures v
List of Tables vii

1 Introduction 1
  1.1 Design Issues 2
  1.2 Document Structure 3

2 Background 4
  2.1 Quantum Computer Basics 5
  2.2 Quantum Error Correction 10
  2.3 Fault-tolerant Computation 13
  2.4 Quantum Algorithms 15
    2.4.1 Shor's Factoring Algorithm 16
    2.4.2 Teleportation 18
    2.4.3 Teleporting Encoded Data 20
    2.4.4 EPR Generation and Purification 20
  2.5 Some Potential Quantum Computer Systems 22
    2.5.1 NMR 22
    2.5.2 Ion Traps 24
    2.5.3 Solid-State Technologies: The Skinner-Kane Model 25

3 Gates for the Skinner-Kane Model 28
  3.1 Two Qubit Operators 29
  3.2 Single Qubit Operators 32
    3.2.1 Methodology and Results 33
    3.2.2 Future Directions 35

4 Quantum Memory Hierarchy 37
  4.1 Motivation 37
  4.2 Background 38
  4.3 Memory Hierarchy 39

5 Error-Correction and Circuit Design 42
  5.1 Error Correction Algorithms 44
    5.1.1 The [[7,1,3]] Code 45
    5.1.2 Concatenated Codes 46
  5.2 Communication Costs and Error Correction 47
    5.2.1 Error Correction Costs 48
    5.2.2 Multilevel Error Correction 50
  5.3 Avoiding Correlated Errors 51
  5.4 Teleportation 52

6 System Initialization: An Analysis of the Schulman-Vazirani Algorithm 54
  6.1 Introduction 55
  6.2 The Schulman-Vazirani Heat Engine 57
    6.2.1 The Simplified Schulman-Vazirani Algorithm 58
    6.2.2 Expected Values and Variance 59
    6.2.3 Correlation and Covariance 60
  6.3 Akira and Kitagawa's Model 62
    6.3.1 The Simulator 63
    6.3.2 Analysis 66
  6.4 The Distillation Model 67
    6.4.1 Size and Number of Trays 68
    6.4.2 How Cold is Cold Enough? 69
    6.4.3 Effects of Correlation 70
    6.4.4 Simulation Setup and Results 72
  6.5 The Schumacher Operator 74
  6.6 Conclusions 75
  6.7 Future Directions 76

7 Conclusions and Future Work 77
  7.0.1 The Future of Quantum Computing 79

A Operator Approximation 80

B Hand-modeling the H-Tree Layout 90

C Modeling the Schulman-Vazirani Algorithm 93
  C.1 sim.cc: C++ Code for Simulating the Distill Variant of the Schulman-Vazirani Algorithm 93
    C.1.1 FindAlt.tcl 94
    C.1.2 FindMax.tcl 95
    C.1.3 SimAlt.sh 98
    C.1.4 sim.cc 99
  C.2 akira.cc: C++ Code for Simulating the Akira-Kitagawa Variant 119
  C.3 schumacher.cc: C++ Code for Simulating Cleve and DiVincenzo's Implementation of the Schumacher Operator 141

Bibliography 151
List of Figures

2.1 Bloch sphere representation of a qubit 6
2.2 Basic quantum operations 6
2.3 Creating a cat state 8
2.4 Measuring X12, the difference between (parity of) |ψ1⟩ and |ψ2⟩ 12
2.5 Syndrome measurement for the 3-bit code. The meter boxes indicate measurement, and the double lines indicate classical communication controlling the application of the Z operator 12
2.6 Tree structure of concatenated codes 14
2.7 Summing over the quantum Fourier transform vectors 18
2.8 Quantum teleportation 19
2.9 The basic quantum bit technology proposed by Kane, with modifications by Skinner. Qubits are embodied by the coupled nuclear and electronic spin of a phosphorus atom embedded in silicon under high magnetic field (2 T) at low temperature (100 mK) 26
3.1 Implementation of a rotation about the swap axis in the Skinner-Kane model 30
3.2 Approximating an operator 33
3.3 Pulse sequence to approximate an H operator 34
4.1 Trading computational ease for density 40
4.2 Quantum Fourier transform on nine qubits 40
4.3 Locality in the quantum Fourier transform 40
5.1 Two-rail layout for the three-qubit phase-correction code 43
5.2 Schematic layout of the H-tree structure of a concatenated code. The branches labeled Di are for logical data qubits, and consist of two rails of eleven qubits each: seven qubits for data and four for ancillae. The branch labeled A1 is for creating, verifying, and uncreating the cat state 44
5.3 Measuring the error syndrome for the [[7,1,3]] error-correction code 45
5.4 Swap channel 48
5.5 Cost of teleportation compared to swapping. The B-values chosen illustrate break-even points for different levels of recursion 53
6.1 Simplified Schulman-Vazirani algorithm 58
6.2 Distribution of an (ai, bi) pair before CNOT (left), and after (right) 59
6.3 Distribution of qubits after one application of the Schulman-Vazirani algorithm 60
6.4 Distributions of a, b with correlation +0.125 (left) and -0.125 (right). (The dashed line is the distribution for independent qubits.) 62
6.5 The Schulman-Vazirani boost operator: a CNOT followed by a controlled swap between a and b with inverted control by c 64
6.6 Results of applying the Akira-Kitagawa algorithm through several iterations 65
6.7 Correlation between qubits after a single application of the boost operator. The correlation value is the maximum correlation between this qubit and any other 65
6.8 The distillation model 67
6.9 The distillation algorithm 68
6.10 Effects of correlation between ai's, between bi's, and between ai and bi 70
6.11 Removal of highly correlated bits. Bits with correlation > 0.1 were relocated to the right end of the graph. Three iterations resulted in twelve cold (P1,out < 0.01) qubits 72
6.12 The Schumacher operator. The input bits have a probability of 0.2 of measuring to 1. Successive iterations discard most of the hot bits, to extract more cold bits. Ancilla bits consumed by the process are not shown 74
B.1 Counting operators for a parity measurement in [[7,1,3]] 91
List of Tables

2.1 Phase correction for a 3-qubit code 12
4.1 Overhead of recursive error correction for a single-qubit operator 38
4.2 Overhead of [[8,3,3]] concatenated with [[7,1,3]], on a per-qubit basis 40
5.1 Comparison of the cost of swapping an encoded qubit to the cost of teleporting it. The B-values are the distance between adjacent qubits 52
6.1 Schumacher operator for five qubits 56
6.2 Results of applying the Schulman-Vazirani boost operator, and the three-bit Schumacher operator 66
6.3 Distillation results for 128, 256, and 384 bits 73
Acknowledgments

I thank my committee members, especially my adviser, Fred Chong, for his unfailing optimism and encouragement, and his many lessons in research, teaching, and community work. I also thank the other faculty with whom I co-wrote papers, John "Kubi" Kubiatowicz, Isaac Chuang, and Mark Oskin, for teaching me much about research. Additionally, I thank the other students I've worked with, both at U.C. Davis and in the Quantum Architecture Research Center: my lab mates and fellow architects Tzvetan Metodiev, Darshan Thaker, Jedediah Crandall, Paul Sultana, Ravishankar Rao, and John Oliver; the Berkeley crowd under Kubi, including Mark Whitney, Nemanja Isailovich, and Yatish Patel; and the folks at MIT under Isaac Chuang: Ken Brown and Andrew Cross. Finally, I would like to thank Venkatesh Akella, Khaled Abdel-Ghaffar, Charles "Chip" Martel, and Umesh Vazirani, all of whom have influenced my research. In addition, I thank Matt Farrens for his advice and guidance on teaching. I thank the department's wonderful administrative staff who kept things running smoothly, but especially the graduate coordinators, who made sure I stayed on the true path to doctorhood: Barbara Weston, Kim Reinking, and Mary Reid. I thank Hewlett-Packard Company for allowing me the opportunity of returning to college to finish my degree. If it hadn't been for their massive layoffs, I would still be happily plugging away, writing firmware and programming FPGAs. I thank my parents for their continued financial and emotional support during the twenty-plus years and two careers this degree has taken. And, finally, I thank my partner Michael for putting up with my moods, sharing coffee and comics on the porch, and enduring the lean times to make my dream come true.
Chapter 1

Introduction

The only known solutions to many important problems require exponential resources on a classical computer. Quantum computers can solve some of these problems with polynomial resources, which has led a great number of researchers to explore quantum information processing technologies [6, 14, 15, 30, 31, 35, 32]. The largest quantum computers to date have involved only a small number of components (fewer than ten). Several different technologies have been demonstrated, the two most successful being liquid-phase nuclear magnetic resonance and trapped ions [27, 29, 40, 50]. As the complexity of quantum computers increases, typical architectural issues such as communication and layout will need to be addressed. The goal of computer architecture is to structure the individual bits and wires of a computer system into modules, allowing architects to reason about how best to implement the modules (top-down design), as well as how to facilitate the interworking of the modules (glue logic). Quantum computation is still in its infancy. Physicists are still grappling with ways to produce sufficient numbers of interacting quantum devices to solve non-trivial problems. When they succeed, the next step will be to assemble the devices into something useful. This document addresses some basic architectural ideas for quantum computers.
1.1 Design Issues

Quantum states are fragile compared to classical bits. The probability of introducing an error while applying a quantum operator (the quantum equivalent of a logic gate) is around ten orders of magnitude greater than for a classical logic gate. Quantum data can be protected via error correction, using more physical components to represent the data, but at a cost of increased complexity for operators. This has profound implications for how best to lay out the physical components to minimize error-correction overhead. I propose a layout that minimizes communication overhead, given some assumptions about the underlying technology model, and discuss the implementation of an error-correction algorithm using the layout. Additionally, I show that there is a break-even point at which using teleportation, a mechanism to communicate quantum data over large distances, is more efficient than strictly local communication. Other classic architectural ideas can be leveraged to further reduce the overhead of error correction. One such idea is using a memory hierarchy to more efficiently store data that is not currently being computed on. The issues involved are similar to those of a classical computer; in particular, arranging the sequence of operators to take advantage of spatial and temporal locality. I show that an important quantum routine can efficiently use cache blocks, and I compute the reduction in complexity. In a quantum system, complex operators are built up from the basic operators allowed by the particular technology model. There are many open questions about efficiently building complex operators. I chose the following goal: showing that a standard set of universal operators, required to implement any operator in an error-corrected system, can be efficiently approximated to the precision required for a given technology model. I give results for all of the necessary operators.
One last design issue is how to initialize a quantum system. Classical system initialization usually includes a memory check, in which patterns are written to memory to ensure its integrity. If a particular value is required in a location, it is simply written to that location. Quantum systems are not, in general, so easy. Quantum operators are reversible, so data cannot simply be written over. An arbitrary state can only be created from a known starting point. The goal of initialization in a quantum system, then, is creating a pool of known zeroes that can be manipulated to perform calculations. I examine three algorithms to do this. The best requires a pool of known states in order to run, whereas the other two produce less-than-optimal results. I explore why the latter algorithms produce suboptimal results, and propose a combination of the first algorithm with one of the others to bootstrap system initialization.

1.2 Document Structure

For reference and nomenclature, Chapter 2 gives background material, including: terminology and concepts; an overview of error correction and fault-tolerant quantum computing; some important algorithms, including applications (factoring, solution-space searches) and utility routines (teleportation, EPR pair generation and purification); and, finally, a discussion of technologies, including nuclear magnetic resonance, which, while not scalable, was used for the most successful implementation to date and serves as a good introductory point; ion traps; and phosphorus in silicon. Chapter 3 looks more closely at efficiently implementing operators for the phosphorus-in-silicon models. Chapter 4 explores the implications of using a memory hierarchy in a quantum computer, focusing on the factoring algorithm from Chapter 2. Chapter 5 provides detailed error-correction algorithms, and discusses tradeoffs for using teleportation as a communication scheme. Finally, Chapter 6 compares three system initialization schemes.
Chapter 2

Background

While a bit in a classical computer represents either zero or one, a quantum bit (qubit) can be thought of as simultaneously representing both states. More precisely, the state of a qubit is described by probability amplitudes for measuring states representing zero or one. The amplitudes are complex values, with real and imaginary parts, and only turn into real probabilities upon external observation. Unlike classical probabilistic computation, the amplitudes for different computational pathways can cancel each other out through interference. The actual probabilities are determined by the squared modulus of the amplitude, which is the amplitude multiplied by its complex conjugate (hereafter referred to, somewhat inaccurately, as the square of the amplitude). The key to achieving exponential speedup is that quantum computers directly and simultaneously manipulate probability amplitudes to perform a computation. A system with n qubits has the ability to be in 2^n states simultaneously, each with its own probability amplitude. For example, two qubits can be in a superposition of the states |00⟩, |01⟩, |10⟩, and |11⟩. The work of a quantum computer is to manipulate qubits and the associated amplitude vectors in a useful manner. Any operation on a single qubit can affect all 2^n states. This is often called quantum parallelism, and is a useful way to think about what gives quantum computers such high potential speedups over classical computers. However, only one of these 2^n states can ever be measured. More precisely, measuring a qubit vector is equivalent to calculating the squares of the amplitudes, and probabilistically choosing one state. The amplitude vector then collapses, with a value of one for the chosen state, and zeroes for all other states. For this reason, quantum computers
are best at NP problems where only a single answer is needed, and the answer can be verified in polynomial time. Designers of quantum algorithms must be very clever about how to get useful answers out of their computations. Grover's search algorithm [22], for example, iteratively skews probability amplitudes in a qubit vector until the probability for the desired value is near 1 and the probability for other values is close to 0. The algorithm can be used to search the entire solution space of an NP problem for a solution. It iterates O(√N) times, where N is the size of the solution space, at which point a qubit vector representing the keys can be measured. The desired key is found with high probability. Another option is to arrange the computation such that it does not matter which one of many highly probable results is measured from a qubit vector. This method is used in Shor's algorithm for prime factorization of large numbers [43] (see Section 2.4.1), building upon modular exponentiation of all states and the quantum Fourier transform, an exponentially faster version of the classical discrete Fourier transform. Essentially, the factorization is encoded within the period of a set of highly probable values, from which the desired result can be obtained no matter which value is measured. Since prime factorization of large numbers is the basis of many modern cryptographic security systems, Shor's algorithm has received much attention.

2.1 Quantum Computer Basics

In general, qubits are denoted in Dirac's bra/ket notation. |0⟩ represents a qubit in the zero state¹, and is pronounced "ket zero." A generic qubit, |ψ⟩, is represented by α|0⟩ + β|1⟩, where |α|² and |β|² are the probabilities of measuring 0 or 1, respectively. |0⟩ and |1⟩ are also sometimes referred to as the computational basis. Another useful way of thinking about a qubit is the Bloch sphere (see Figure 2.1). |0⟩ is up along the ẑ-axis, and |1⟩ is down. Generically, |ψ⟩ = cos(φ/2)|0⟩ + e^{iθ} sin(φ/2)|1⟩.
Operations on a qubit are equivalent to rotations of the Bloch sphere.²

¹ |0⟩ represents the column vector [1 0]ᵀ, where the values are the amplitudes of the possible states. Similarly, |1⟩ represents [0 1]ᵀ. The bra notation, ⟨ψ|, is used to represent the adjoint (conjugate transpose) of the ket notation.

² It is interesting to note that a vector on the Bloch sphere has only two degrees of freedom. This is because all operators are unitary: they preserve a total probability of unity. Hence, the phase for |0⟩ can be divided out, and kept as a constant. All operators are multiplicative, so this global constant makes no difference, and cannot actually be observed. In general, the zero state for any set of qubits can be thought of as having a real, non-negative value. Unfortunately, the Bloch sphere model does not scale to multiple qubits, but it is useful as a visualization tool for single-qubit operations on sets of qubits.
Figure 2.1: Bloch sphere representation of a qubit

X gate (bit flip, NOT): X = [[0, 1], [1, 0]]; X(α|0⟩ + β|1⟩) = β|0⟩ + α|1⟩
Z gate (phase flip): Z = [[1, 0], [0, -1]]; Z(α|0⟩ + β|1⟩) = α|0⟩ - β|1⟩
H gate (Hadamard): H = (1/√2)[[1, 1], [1, -1]]; H(α|0⟩ + β|1⟩) = ((α+β)/√2)|0⟩ + ((α-β)/√2)|1⟩
T gate: T = [[1, 0], [0, e^{iπ/4}]]; T(α|0⟩ + β|1⟩) = α|0⟩ + e^{iπ/4}β|1⟩
Controlled-NOT (controlled X): maps a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩ to a|00⟩ + b|01⟩ + d|10⟩ + c|11⟩
Swap: maps a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩ to a|00⟩ + c|01⟩ + b|10⟩ + d|11⟩

Figure 2.2: Basic quantum operations
Figure 2.2 gives a few basic quantum operations that are used throughout this text. These include one-bit operations such as the bit-flip (X), phase-flip (Z), Hadamard (H), and π/8 (T) gates, as well as the two-bit controlled-NOT, CNOT. These are given in both their circuit and matrix representations. The matrix representation involves multiplying the operator by the amplitude vector of the quantum states. The X, Y, and Z operators are equivalent to rotating the Bloch sphere by π around the x̂-, ŷ-, and ẑ-axes, respectively. The T operator rotates around the ẑ-axis by π/4 (it is called π/8 since it is equivalent to diag(e^{-iπ/8}, e^{iπ/8}) up to a global phase). Any n-qubit unitary operator may be composed from single-qubit operators and the CNOT operator. A minimal universal set of operators, able to approximate any unitary operator to arbitrary precision, is CNOT, H, and T. Another interpretation of the above operators is that the bit-flip exchanges the probabilities of the two states, while the phase flip changes the sign (phase) between them. The Hadamard takes the two states and mixes them to a halfway state. The controlled-NOT does a bit-flip if the control qubit is 1: CNOT|x⟩|y⟩ = |x⟩|x ⊕ y⟩, where ⊕ is the usual XOR operator. These basic gates, along with qubit measurement, form the set of operations used for quantum computation. To illustrate that quantum computation is potentially more powerful than classical computation, it is useful to look at entanglement. If two qubits |x⟩ and |y⟩ are joined to form a system |xy⟩, the result is the tensor product (denoted by ⊗) of the vector representations:

(α|0⟩ + β|1⟩) ⊗ (γ|0⟩ + δ|1⟩) = αγ|00⟩ + αδ|01⟩ + βγ|10⟩ + βδ|11⟩

Any single-qubit operator, or tensor product of single-qubit operators, may be applied to the system and the qubits remain independent.
However, if one starts with |00⟩ and applies a Hadamard operator to the first qubit, and then uses that qubit as the control in a CNOT on the second qubit (see Figure 2.3), the resulting superposition of states, (1/√2)|00⟩ + (1/√2)|11⟩, cannot be split into the tensor product of two qubits. The two qubits share information that neither qubit has alone, and there is no concept of state for the individual qubits. The qubits are now tied together, or entangled: whatever value is measured for the first qubit will also be measured for the second qubit. The amplitudes for |01⟩ and |10⟩ are zero. This particular state is known as an EPR pair (after Einstein, Podolsky, and Rosen, who were among the first to investigate such states). It is also called a Bell state, or a cat state, after Schrödinger's infamous thought experiment. Cat states are very important, and are used extensively in quantum computation.

Figure 2.3: Creating a cat state

The group of operators given above is sufficient to approximate an arbitrary n-qubit unitary operator to any desired accuracy. For n > 1, however, the approximation is not necessarily efficient, and may be exponential in n. That said, a few more operators are generally used for descriptions of computations. They are R_x(θ), R_y(θ), and R_z(θ), rotations by θ about the x̂-, ŷ-, and ẑ-axes. R_z(π/2) is used often enough to deserve its own name, S. Any two of the arbitrary-rotation operators can be used to efficiently implement any single-qubit unitary operator, U:

U = R_x(α) R_y(β) R_x(γ)

for some α, β, and γ.

Much like classical gates, there are some basic relationships between quantum operators that are important³:

X² = Y² = Z² = H² = I
X† = X, Y† = Y, Z† = Z
XZ = -iY
HZH = X
HXH = Z
S² = Z
T² = S

³ The symbol † indicates the adjoint (conjugate transpose) of the operator. Since all quantum operators are unitary, the operator's adjoint is also its inverse.
SZ = ZS = S†
SXS† = Y

There are also several relationships involving CNOT:

1. X applied to the control input is equivalent to applying X to both outputs.
2. X applied to the target input is equivalent to applying X to the target output.
3. Z applied to the control input is equivalent to applying Z to the control output.
4. Z applied to the target input is equivalent to applying Z to both outputs.
5. Hadamards applied before and after to the target bit convert a CNOT to a controlled-Z operator.
6. Two qubits may be swapped with three CNOTs, with the middle CNOT applied in the opposite direction (swapping target and control).

One final operator is needed to make quantum computation useful: measurement. Measurement is the one operation allowed in quantum computation that has no inverse. For a quantum system, any observable, such as energy or momentum, can be projected onto one of the eigenstates (basis states) of the measurement operator. Measurement is equivalent to choosing one of the states represented by the qubit(s) based on the probabilities determined by the amplitude vector. For example, if a system of two qubits is in the state (1/2)|00⟩ + (1/2)|01⟩ + (1/√2)|11⟩, the probability of measuring either 00 or 01 is 1/4; the probability of measuring 11 is 1/2. If only the first qubit were measured, the probability of measuring 0 is 1/2, and the system would be in the state (1/√2)(|00⟩ + |01⟩) (see footnote 4); if a 1 were measured (also with probability 1/2), the system will be in the state |11⟩. Note that measurement collapses the wave function representing the superposition of states, leaving the system in a state consistent with the measured value: if a qubit is measured as zero, it will be measured as zero from then on, unless another operator is applied (but see footnote 4); if only part of a set of entangled qubits is measured, the rest of the qubits will be in a superposition of states consistent with the value(s) measured.
⁴ In some schemes, measurement may be destructive, effectively randomizing the qubit. Subsequent measurements may not give the same result, and the resulting state is different from the state before the measurement.
Measurements are possible in bases other than the computational basis. In terms of the Bloch sphere, the usual measurement operator described really measures zero for up and one for down, but could just as easily measure left and right (along the ŷ-axis). Usually, though, such a measurement is made by rotating the ŷ-axis to the ẑ-axis (R_x(π/2)), using the usual measurement operator, and rotating back. Measuring in the up-down direction is the equivalent of measuring the Z operator, and in the left-right direction, the Y operator. It is also possible to measure a more complicated operator, using a basis with multiple qubits. An example using measurement in other bases is quantum error correction (see the next section), where multi-qubit operators are measured to extract partial information about a block of qubits. The measurement operators essentially return the parity of a sub-block of qubits, rotated to appropriate bases. Much like measuring one qubit of a cat state completely determines the state of the other qubit, the series of parity measurements completely determines the value of the error, which can then be corrected.

2.2 Quantum Error Correction

Classical bits are highly immune to noise. One can expect a classical bit to fail once every 10^16 operations (about one bit per year in a typical system). Qubits, on the other hand, can be expected to fail on one out of every 10^5 to 10^7 operators. Quantum phenomena are constantly evolving with time. Atoms decay. Electrons change orbitals by absorbing or emitting photons. Magnetic spin states of nuclei and electrons flip due to external magnetic fields. A quantum system cannot be isolated to the point where it is completely stable. Hence, if qubits are entangled, they will slowly decohere (lose their unique state properties) due to entanglement with the environment. Since quantum states form a continuum, errors are not limited to full phase- or bit-flips.
CHAPTER 2. BACKGROUND

An error can be a slight deviation from the intended quantum state. As errors accumulate, quantum data stored on the qubits is corrupted and lost. The environment, in the form of a classical controller, acts on a qubit with each application of an operator. The actual applied operator is an approximation of the desired operator, implemented by a classically-controlled physical process, so the operator itself has a non-zero chance of introducing error.[5]

[5] Non-operator decoherence also occurs. The major mode of non-operator decoherence is dephasing, which has implications for error-correction strategies for stored data.

One way to reduce the effect of decoherence is to encode the state of a single logical qubit over several physical qubits. Peter Shor originally proved the feasibility of quantum error correction [44], using a three-qubit repetition code to encode either quantum amplitude or phase. He then showed that the qubits comprising a three-qubit code for protecting amplitude could each be encoded in the three-qubit code for protecting phase. Here is how: a logical qubit representing 0 (|0_L⟩) can be encoded as |000⟩, and |1_L⟩ as |111⟩. The superposition α|0_L⟩ + β|1_L⟩ is encoded as α|000⟩ + β|111⟩. When the logical qubit is measured, if one of the physical qubits is different from the other two, one can assume that it was inadvertently flipped along the way. However, it would be better to determine that a physical qubit had been flipped without having to measure it, since measurement destroys the quantum state. One can measure the difference in value (parity) between any two physical qubits using an ancilla (an extra |0⟩) and a circuit like the one in Figure 2.4. By performing two such measurements (see Figure 2.5), one can determine whether a single qubit's value differs from the other two, and correct it (see Table 2.1).

Protecting amplitude information is not enough, however, since phase information is just as important. Shor noted that a similar circuit (without the Hadamard gates, and with the CNOTs turned around) could be used to measure the difference in phase between two qubits. By encoding each of the three qubits in the amplitude-flip code with the three-qubit phase-flip code (nine qubits total), one can measure and correct any single phase or amplitude error. Furthermore, the process of interacting the ancilla qubits with the encoded qubits produces an entangled state. After the measurement, the remaining qubits are in a state consistent with the measurement. That is, if qubit 2 is out of phase with the others, then applying a Z gate to it will exactly fix the error!

The nine-qubit logical codewords for the states |0_L⟩ and |1_L⟩ are

    |0_L⟩ = (|000⟩ + |111⟩)(|000⟩ + |111⟩)(|000⟩ + |111⟩) / (2√2)
    |1_L⟩ = (|000⟩ − |111⟩)(|000⟩ − |111⟩)(|000⟩ − |111⟩) / (2√2)

To convert |0_L⟩ to |1_L⟩ requires three single-qubit operators, meaning that at least three independent errors would have to occur before the error-correction scheme fails. Hence, any single error can be reliably corrected.
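The two-parity-measurement decoding just described can be sketched classically. The Python toy below is purely illustrative (real syndrome extraction uses ancilla qubits rather than direct reads); it locates and corrects any single bit flip on a three-bit codeword:

```python
# Classical sketch of 3-qubit repetition-code decoding (cf. Table 2.1).
# Two parity checks locate any single bit flip without reading the
# data bits themselves.

def encode(bit):
    return [bit, bit, bit]

def syndrome(q):
    # parities of qubit pairs (1,2) and (2,3), mirroring the two
    # ancilla measurements of Figure 2.5
    return (q[0] ^ q[1], q[1] ^ q[2])

# syndrome -> 0-based index of the qubit to flip (None = no error)
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def correct(q):
    idx = CORRECTION[syndrome(q)]
    if idx is not None:
        q[idx] ^= 1
    return q

for flipped in range(3):     # inject each possible single bit flip
    word = encode(1)
    word[flipped] ^= 1
    assert correct(word) == [1, 1, 1]
print("all single bit flips corrected")
```

Note that the decoder never learns the encoded value, only which qubit (if any) disagrees with its neighbors; this mirrors the fact that the parity measurements reveal relative, not absolute, values.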
Figure 2.4: Measuring X12, the difference between (parity of) |ψ1⟩ and |ψ2⟩.

Figure 2.5: Syndrome measurement for the 3-bit code. The meter boxes indicate measurement, and the double lines indicate classical communication controlling the application of the Z operator.

    Z01  Z12  Error type       Action
    0    0    no error         no action
    0    1    qubit 3 flipped  flip qubit 3
    1    0    qubit 1 flipped  flip qubit 1
    1    1    qubit 2 flipped  flip qubit 2

    Table 2.1: Phase correction for a 3-qubit code

The amazing thing about the code is that applying a Z (phase change) operator to each of the nine qubits takes |0_L⟩ to |1_L⟩ and vice versa. It is the same as applying a logical X operator to the encoded qubit! Similarly, applying an X operator to each of the physical qubits performs a logical Z operation. Shor's code, based on the classical error-correction method of repetition, is termed a [[9,1,3]] code: nine physical qubits, encoding one logical qubit, with a Hamming distance of three. A code with a distance d is able to correct (d−1)/2 errors.

The [[9,1,3]] code contains all of the elements of the quantum error-correcting codes presented in this document. To avoid actually measuring the state of the individual qubits, ancilla qubits are used for parity measurements. In addition, the measurement of the ancillae determines a unique error syndrome, a mapping from the
measured values to the operations necessary to correct the error(s). Much like classical linear error codes, measuring the parity of subsets of the bits determines the error syndrome. The parity measurement tells nothing about the absolute value of the bits, just the relative value. Unlike a classical code, however, the parity measurements are made in a variety of bases: the parity measurement in the computational basis tells which bits have amplitude errors, and the parity measurement in the Hadamard-rotated basis tells which bits have phase errors.

Shortly after Shor described the [[9,1,3]] code, many researchers [8, 46] showed how to create quantum error-correction codes based on classical block codes. One important code is the Steane [[7,1,3]] code, which is used throughout most of this document. A generalization of block codes resulted in stabilizer codes, such as the [[5,1,3]] code [28], which is the smallest (densest) known encoding of a single qubit, and the [[8,3,3]] code [1, 17, 47], the densest code encoding three qubits. Larger codes such as [[16,10,3]] or [[23,1,7]] give greater density or larger Hamming distances. For more on quantum error-correction codes, the reader is directed to the literature [7, 34].

To summarize, errors in quantum circuits are not limited to full phase or bit flips, but can be any complex-valued linear combination of the two. However, when the error syndrome of an error code is determined, the parity measurements collapse the error waveform in the error-measurement basis. Measuring the error effectively quantizes it, so that only X, Y, and Z operators need be applied to correct it.[6]

2.3 Fault-tolerant Computation

Qubits are subject to decoherence when they interact with the environment. Applying an operator to a qubit is just such an interaction. On the other hand, if an operator could be applied directly to the encoded qubit(s), errors could be detected and corrected.
Some stabilizer codes allow easy application of some logical operators, as the nine-qubit code demonstrated. The Steane [[7,1,3]] code is even more flexible, in that the logical X is applied by applying X to all seven encoding qubits. The same is true for the Z, H, Y, and CNOT operators. The S operator requires applying ZS (= S^(−1)). The last operator required to create a universal set, the T operator, requires a slightly more complicated procedure.

[6] This is not entirely true. The measurement and correction will return a valid codeword, or superposition of codewords. If more than (d−1)/2 errors occur, where d is the Hamming distance, then the error syndrome may indicate that no reliable correction is possible. If more than (d+1)/2 errors occur, the corrections indicated by the error syndrome may take the code to some erroneous superposition of codewords.

Figure 2.6: Tree structure of concatenated codes: a logical qubit, a first level of encoding, and a second level of encoding.

Any logical operator may be applied in a fault-tolerant manner, as long as p, the probability of an error for a physical operator, is smaller than 1/c, where c is the number of ways two errors can cause an erroneous result during the logical operator application and subsequent error correction. Hence, the overall probability of a non-recoverable error for the logical operator is cp^2, an improvement when p is less than the threshold of 1/c. For the [[7,1,3]] code, c is about 1.7 × 10^4, assuming a CNOT operator can be applied to any two qubits in the system.

If a logical qubit is encoded in n physical qubits, it is possible to encode each of those n qubits with an m-qubit code to produce an mn encoding. Such concatenation of codes can reduce the overall probability of error even further. For example, concatenating the [[7,1,3]] code with itself gives a [[49,1,7]] code with an overall probability of error of c(cp^2)^2 (see Figure 2.6). Concatenating it k times gives (cp)^(2^k)/c, while the size of the circuit increases by d^k and the time complexity increases by t^k, where d is the increase in circuit complexity for a single encoding and t is the increase in operation time for a single encoding. For a circuit of size p(n), and a desired probability of success of 1 − ε, k must be chosen such that [34]:

    (cp)^(2^k) / c ≤ ε / p(n)

The number of operators to achieve this result is O(poly(log p(n)/ε) p(n)), provided p is below the threshold 1/c. The same results hold for codes other than [[7,1,3]], although there is no guarantee that performing logical operators on these other codes is efficient. For instance, stabilizer codes allow fairly easy application of the operators X, Y, and Z to the encoded qubits. However, for a given stabilizer code, an arbitrary operator may be difficult to perform in a fault-tolerant manner.
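The inequality above can be turned into a small calculation. The Python sketch below finds the smallest concatenation level k satisfying (cp)^(2^k)/c ≤ ε/p(n), using the value of c quoted for the [[7,1,3]] code; the physical error rate, target ε, and circuit size are illustrative assumptions of this sketch:

```python
def concat_level(c, p, eps, circuit_size):
    """Smallest k with (c*p)**(2**k) / c <= eps / circuit_size.

    Requires p below the threshold 1/c, so that each extra level of
    concatenation shrinks the logical error rate."""
    assert c * p < 1, "p must be below the threshold 1/c"
    target = eps / circuit_size
    k = 0
    while (c * p) ** (2 ** k) / c > target:
        k += 1
    return k

# c as quoted for the [[7,1,3]] code; p, eps, and circuit size are
# illustrative assumptions, not values from the text
c, p = 1.7e4, 1e-5
k = concat_level(c, p, eps=0.01, circuit_size=1e9)
print(k, "levels;", 7 ** k, "physical qubits per logical qubit")
```

The doubly-exponential suppression is visible here: each extra level squares the (already small) quantity cp, so only a handful of levels is ever needed, while the qubit overhead grows as 7^k.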
It is important to note that interacting two qubits in a logical encoding is not fault-tolerant. An error on one qubit can propagate through such interactions, even using an intermediate ancilla,
permanently corrupting the encoded data. For example, in Shor's three-qubit example in Section 2.2, a phase error on the ancilla can back-propagate across the CNOT, causing a phase error on the second qubit. Although the measured value of the ancilla would be correct, the data would be left with an uncorrected error. One solution, outlined in detail in Chapter 5, is to use a cat state in place of the ancilla. The cat state can be verified beforehand, since its properties are known. After the parity-calculating CNOTs, the cat state is uncreated (by reversing the order of the CNOT and H operators used to create it), condensing the parity onto a single qubit that can then be measured.

2.4 Quantum Algorithms

There are many problems with quantum solutions whose theoretical asymptotic running time is better than that of the best known classical algorithm for the same problem. In general, the speedup of quantum algorithms is due to quantum parallelism, the ability to change the amplitudes of all states with a single operator. The downsides to quantum algorithms are (1) the answer is a single value, and (2) certain self-referential operations are not permissible ("if i then i = 0"), since all operators must be reversible. Because the result of a quantum algorithm is a single value, quantum algorithms are best at hard problems whose answers are simple to verify, such as NP-decision problems. Two algorithms that solve NP-decision problems are Shor's order-finding algorithm (Section 2.4.1) and Grover's unordered-search algorithm. Shor's algorithm can be used to factor n-bit numbers in O(n^3) time. Grover's algorithm can be used to find a pattern that meets some criterion from all possible patterns, and so can be used to solve NP-hard problems. The running time for Grover's algorithm is O(2^(n/2)) for a problem whose solution space is O(2^n).
A classical algorithm solving the same problem has to search through the entire solution space, giving a running time of O(2^(n−1)). While not as dramatic an improvement as Shor's algorithm, Grover's algorithm still outperforms any known classical algorithm for NP-hard problems. Because of the requirement that operators be reversible, certain algorithms are not feasible. For example, Chapter 6 discusses several randomized methods for sorting |0⟩ and (|0⟩ + e^(iθ)|1⟩)/√2 qubits to opposite ends of a register as a mechanism for system initialization. A better system-initialization algorithm might be to simply sort the |0⟩ and |1⟩ qubits; unfortunately, doing so
requires a statement of the form "if |ψ_i⟩ = |0⟩ then SWAP(|ψ_i⟩, |ψ_{i−1}⟩)". The resultant absolute ordering retains no information about the initial order of the values, and so is not reversible.

A brief description of Shor's algorithm is in the next subsection. Grover's algorithm, adiabatic algorithms, and numerous others, though important, are not discussed here. The interested reader is referred to [22] and [34] for a discussion of Grover's algorithm, and to any of several excellent web-based encyclopedias for discussions of the remainder.

2.4.1 Shor's Factoring Algorithm

Perhaps the biggest motivation for research into quantum computation is Peter Shor's algorithm for factoring large numbers. Factoring is considered to be a hard problem: the best classical algorithms known require exponential time in the number of bits. Rivest, Shamir, and Adleman [39] have used it for the trapdoor function for RSA security: a message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result [is] divided by the publicly specified product, n, of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power d is used, where e·d ≡ 1 (mod (p−1)(q−1)). The security of the system rests in part on the difficulty of factoring the published divisor, n. Clearly, an algorithm that makes factoring exponentially easier is of immense interest.

Shor's algorithm [43] can factor large numbers in time polynomial in the size of the representation (i.e., the number of bits), using modular exponentiation and an inverse quantum Fourier transform. The algorithm factors a large composite number, n, by finding the order of an element x of the group Z_n. That is, for some integer x that is relatively prime to and smaller than n, the algorithm finds r such that x^r ≡ 1 (mod n).
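The classical scaffolding around order finding can be illustrated directly. In the Python toy below, a brute-force loop stands in for the quantum order-finding step, and a factor of n is then extracted from x^(r/2) ± 1 as described in the text:

```python
from math import gcd

def order(x, n):
    """Smallest r with x**r = 1 (mod n). A brute-force stand-in for
    the quantum order-finding step."""
    r, acc = 1, x % n
    while acc != 1:
        acc = (acc * x) % n
        r += 1
    return r

def try_factor(x, n):
    """Shor's classical reduction: a factor of n from the order of x."""
    if gcd(x, n) != 1:
        return gcd(x, n)        # lucky guess: x already shares a factor
    r = order(x, n)
    if r % 2 != 0:
        return None             # odd order: retry with another x
    y = pow(x, r // 2, n)
    if y == n - 1:              # x**(r/2) = -1 (mod n): retry
        return None
    f = gcd(y - 1, n)
    return f if 1 < f < n else gcd(y + 1, n)

print(try_factor(7, 15))        # order of 7 mod 15 is 4 -> factor 3
```

The brute-force `order` loop is, of course, exponential in the number of bits of n; the entire point of the quantum algorithm is to replace that one step.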
If r is even, and x^(r/2) ≢ ±1 (mod n), then (x^(r/2) + 1)(x^(r/2) − 1) = kn, where k is an integer, since x^r = kn + 1. With a very low probability of failure, O(1/n), at least one of (x^(r/2) + 1) or (x^(r/2) − 1) shares a nontrivial factor with n. The algorithm relies on the fact that if x^r ≡ 1 (mod n), then x^(br+k) ≡ x^k (mod n), for any positive integers b and k. That is, f(a) = x^a (mod n) is a periodic function with a period of r, or, equivalently, a = br + k for some b and k.
The first step of the algorithm is to make sure that n is not prime, and has more than one prime factor, since the probability of success also depends on the number of prime factors. Both of these requirements have polynomial-time tests. The algorithm proceeds as follows:

1. Start with two quantum registers, A and B. B should have ⌈log2 n⌉ qubits (large enough to hold n), while A should have log2 q = 2⌈log2 n⌉ + 1 qubits, where q is the smallest power of 2 greater than 2n^2. (Notice that the largest value that can be held by A is q − 1.)

2. Put A and B into the uniform superposition, (1/√q) Σ_{a=0}^{q−1} |a⟩|0⟩, by applying an H operator to each qubit in A.

3. Calculate x^a (mod n) in the B register: (1/√q) Σ_{a=0}^{q−1} |a⟩|x^a (mod n)⟩. This can be done using standard arithmetic: modular multiplication is built up from modular addition, and modular exponentiation from modular multiplication, in O((log n)^3) time.

4. Finally, apply the quantum Fourier transform[7], |a⟩|x^a⟩ → (1/√q) Σ_{c=0}^{q−1} e^(2πiac/q) |c⟩|x^a⟩, and measure A to get c.

To find r from c, it is important to look at the result of the Fourier transform, to see what values of c are most probable. The system A,B after the transform is in the following state:

    |A,B⟩ = (1/q) Σ_{a=0}^{q−1} Σ_{c=0}^{q−1} e^(2πiac/q) |c⟩|x^a⟩

Remembering that a = br + k for integers b and k, the probability of measuring a particular c and x^k is

    (1/q^2) | Σ_{a : x^a = x^k} e^(2πiac/q) |^2 = (1/q^2) | Σ_{b=0}^{⌊(q−k−1)/r⌋} e^(2πi(br+k)c/q) |^2
                                               = (1/q^2) | e^(2πikc/q) Σ_{b=0}^{⌊(q−k−1)/r⌋} e^(2πibrc/q) |^2

where the term e^(2πikc/q) is a constant with magnitude 1, and can be factored out of the summation. The terms being summed are all vectors on a unit circle in the complex plane (see Figure 2.7). If rc/q is close to an integer, then the terms in the summation are all about the same (that is, close to 1 + 0i), and constructively interfere. However, if rc/q is not close to an integer, the terms are spread out around the unit circle in the complex plane, and destructively interfere to a sum close to 0.
[7] The quantum Fourier transform is analogous to the discrete fast Fourier transform, and can be applied bit-wise, using controlled rotations. See Figure 4.2 on p. 40 for an example. For an in-depth discussion, see [34]. The running time for the fast quantum Fourier transform is O((log n)^2).
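The constructive-versus-destructive interference that makes the measurement of c informative can be checked numerically. The sketch below averages the unit vectors e^(2πibrc/q) for a value of c where rc/q is an integer and for one where it is not (all parameter values are illustrative):

```python
import cmath

def amplitude(r, c, q, terms=32):
    """Magnitude of the mean of e^(2*pi*i*b*r*c/q) over b = 0..terms-1:
    near 1 when r*c/q is close to an integer, near 0 otherwise."""
    s = sum(cmath.exp(2j * cmath.pi * b * r * c / q) for b in range(terms))
    return abs(s) / terms

q, r = 512, 8                # illustrative: q a power of 2, period r
print(amplitude(r, 64, q))   # r*c/q = 1 exactly: full constructive sum
print(amplitude(r, 70, q))   # r*c/q far from an integer: near-total cancellation
```

This is the same quantity pictured in Figure 2.7: the first call corresponds to the tightly clustered vectors of panel (a), the second to the spread-out vectors of panel (b).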
Figure 2.7: Summing over the quantum Fourier transform vectors. The two graphics show 32 vectors being summed together. In (a), the vectors cover from 2πibrc/q = 0 to π/2 (1/4 revolution), while in (b), the vectors cover from 2πibrc/q = 0 to 5π/2 (1-1/4 revolutions). The center of the small white circle indicates the resulting mean of the vectors (the sum of the vectors divided by the number of vectors). Notice that in (a) the small white circle is close to the outside of the unit circle, indicating an amplitude close to 1, whereas in (b) the amplitude is much smaller.

The terms constructively interfere when rc mod q < r/2, or, equivalently, when −r/2 < rc − sq < r/2 for some integer s. Rearranging gives |c/q − s/r| < 1/(2q), implying that c/q approximates s/r. One can find the closest approximation s/r, with r < n, for c/q by truncating the continued fraction.[8] Given r, both x^(r/2) ± 1 can easily be calculated and tested for common factors with n.

[8] Euler's continued-fraction algorithm uniquely represents a real number x as x = a_0 + 1/(a_1 + 1/(a_2 + ...)), where the a_i are integers. If the series {a_0, a_1, a_2, ...} is truncated, the result is an approximation of x, with more terms giving a closer approximation. If x is rational, the series a_i will terminate; if irrational, it won't.

2.4.2 Teleportation

It is possible to use many quantum error-correction codes in a quantum computer, but converting between them can be problematic. Decoding and re-encoding can randomly propagate errors across qubits and compromise reliability. Fortunately, there is a special way to convert between codes that avoids this problem. This method involves the quantum primitive of teleportation [19]. As it turns out, teleportation is not only a good way to convert between codes, but it is also a good way to transport quantum data between different parts of the system.

Quantum teleportation is the re-creation of a quantum state at a destination using the
shared quantum state of an EPR pair, (|00⟩ + |11⟩)/√2, split between the source and the destination, and two measurement results that must be communicated as classical bits along conventional wires or other media.

Figure 2.8: Quantum Teleportation

Figure 2.8 gives a schematic of the process, also described by this (typical) narrative: suppose Bob and Carol have the two qubits, |b⟩ and |c⟩, of an EPR pair. Carol takes |c⟩ to New York, leaving Bob (and |b⟩) in Davis. Next, Alice, who has the quantum state |a⟩, wants to send that state to Carol. She and Bob perform a CNOT, using |a⟩ as the source, and then Alice performs an H operator on |a⟩. Next, they measure |a⟩ and |b⟩, and send the two one-bit results to Carol over classical media. (In the figure, quantum data is denoted by solid single lines, while classical data is represented by solid double lines.) Based on the sent bits, Carol applies either an X gate, a Z gate, or both to patch up |c⟩, which now has the original state of |a⟩.

One way to think about what is happening is as follows. The CNOT operator transfers amplitude information from |a⟩ to |b⟩ (and phase information from |b⟩ to |a⟩), causing a partial rotation of |b⟩ with respect to |c⟩. When |b⟩ is measured, this angular difference is maintained, meaning that |c⟩ has the amplitude characteristics of |a⟩, modulo a bit flip. The H operator applied to |a⟩ rotates it so that its new phase (as rotated by the CNOT with |b⟩) is measured. The two measurements provide information about the amplitude and phase errors of |c⟩ relative to the original state of |a⟩, and the classical bits transmit that information. Two interesting asides are:

1. If the original state of |a⟩ were not destroyed, teleportation would violate the no-cloning theorem [54]: duplicating both its phase and amplitude potentially allows enough information to be gathered to violate Heisenberg's uncertainty principle; and,
2. Until the classical information is obtained, Carol cannot ascertain anything about |a⟩. Her qubit still has equal probabilities of measuring to |0⟩ or |1⟩, since the two classical bits that indicate whether or not to flip amplitude and phase can be any of 00, 01, 10, and 11 with equal probability. That is, teleportation does not provide a means of faster-than-light communication, despite being (in Einstein's words) "spooky action at a distance."

2.4.3 Teleporting Encoded Data

In order to teleport encoded data, the source and destination qubits of an EPR pair are encoded using the source and destination codes (which can differ). The source qubit is then teleported using the logical CNOT, H, and measurement operators of the source code, and the X and Z operators of the destination code, just as in the basic teleportation algorithm. If the qubits in the EPR pair are encoded in different codes, this provides a means to convert between error-correction codes without first decoding the data. Since EPR pairs have well-known properties, the encoded halves of the pair can be verified before use, to reduce the possibility of error (see the next section on purification).

It is possible to teleport encoded data one qubit at a time, error correcting after the entire block has been transferred. However, if transporting EPR pairs is expensive compared to transporting ancillae, it makes sense to transport the encoded data as a block instead. Whether to teleport blocks or single qubits depends on the cost of creating and transporting valid EPR pairs.

2.4.4 EPR Generation and Purification

Teleportation plays an important role in quantum computation. It uses quantum entanglement as a resource, and depends on entangled qubits being available at the source and destination. Creating entanglement, however, is a purely local operation. (The circuit required for creating a cat state was described in Section 2.1.)
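The teleportation narrative of Section 2.4.2 can be verified with a minimal statevector simulation. In the sketch below (the qubit ordering and gate matrices are this example's own conventions), Carol's classically-controlled corrections recover |a⟩ for all four measurement outcomes:

```python
import numpy as np

# Minimal statevector check of the teleportation circuit (Figure 2.8).
# Qubit order |a b c>, with a the most significant bit of the state index.

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
CNOT = np.eye(4)[:, [0, 1, 3, 2]]           # control = high bit

rng = np.random.default_rng(1)
a = rng.normal(size=2) + 1j * rng.normal(size=2)
a /= np.linalg.norm(a)                      # arbitrary state |a> to send

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2) on b,c
state = np.kron(a, bell)                    # |a> alongside the EPR pair

state = np.kron(CNOT, I2) @ state           # CNOT: a controls b
state = np.kron(H, np.eye(4)) @ state       # H on a

for m_a in (0, 1):                          # all four measurement outcomes
    for m_b in (0, 1):
        base = 4 * m_a + 2 * m_b
        c = state[base:base + 2]            # Carol's qubit, post-measurement
        c = c / np.linalg.norm(c)
        if m_b:
            c = X @ c                       # classical bits select the
        if m_a:
            c = Z @ c                       # corrections, as in the text
        assert np.isclose(abs(np.vdot(a, c)), 1.0)
print("state |a> recovered for all four outcomes")
```

Note that before the corrections, each of the four outcomes occurs with probability 1/4 regardless of |a⟩, which is exactly the no-faster-than-light observation in aside 2 above.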
This means the qubits of an EPR-pair need to be transported to the locations where they will be used, while ensuring that their state is still pristine. An error in the EPR pair renders it useless for teleportation, since data teleported with it will be corrupted. Transporting the pair of qubits depends on the technology involved. The next section goes into more details, but in general, there are two methods of moving data around: using the
SWAP gate to move values from location to location, or forcing the physical implementation of a qubit to change locations. In either case, the type and/or number of operations involved is likely to cause errors, either by rotating the phase of the cat state or by (partially) inverting one of the bits.

The nice thing about an EPR pair is that it is a commodity state: pure entanglement can be supplied by any valid EPR pair. Since the state of an EPR pair is well-defined, it is easy to verify. The two qubits will have even parity in the usual measurement (Z) basis, as well as in the Hadamard-rotated (X) basis. This parity can be measured using the circuit in Figure 2.4. Moreover, an EPR pair is highly tolerant of any noise that might be introduced by the parity measurement: if one of the operands of a CNOT is a |0⟩, the other operand is unchanged. If the probability of error during EPR generation is low, a single ancilla can be used to verify multiple EPR pairs. (But if the ancilla measures to |1⟩, then all of the pairs must be discarded.)

Making sure that the EPR pairs are valid after transport is slightly more difficult, since the qubits are no longer close enough to allow a single ancilla to verify the pair. Instead, the pair must be purified in the following fashion: given three EPR pairs, two of the pairs can be sacrificed to verify the state of the third. As in the single-ancilla case, one of the EPR pairs is used as a CNOT target for both qubits in the pair to be verified. If there is an error, the qubits in the sacrificial pair won't measure the same. The same operation is performed again, but in the rotated basis, to verify the phase. As in the local case, the two sacrificial pairs can be used to verify multiple EPR pairs if the probability of error is sufficiently small. Finally, the measurements in the rotated basis do not actually require Hadamard gates.
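The claim that a valid EPR pair has even parity in both the Z and X bases is easy to check numerically; the sketch below evaluates the ZZ and XX parity expectations directly on the state vector (a shortcut for, not a simulation of, the ancilla circuit of Figure 2.4):

```python
import numpy as np

# A valid EPR pair (|00> + |11>)/sqrt(2) has even parity in the Z basis
# (<ZZ> = +1) and in the Hadamard-rotated X basis (<XX> = +1); a pair
# with a flipped bit fails the Z-basis check.

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

epr = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(epr @ np.kron(Z, Z) @ epr)            # +1: even parity of values
print(epr @ np.kron(X, X) @ epr)            # +1: even parity of phases

flipped = np.array([0, 1, 1, 0]) / np.sqrt(2)   # one bit inverted
print(flipped @ np.kron(Z, Z) @ flipped)    # -1: the error is detected
```

Passing both checks pins the state down to the single Bell state above, which is why any pair that passes is a usable "commodity" resource.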
Using the fact that the CONTROLLED-Z operator is directionless (it changes the phase of |11⟩, which is part of both the control and target qubits' states), and that rotating to the X basis before applying a CNOT is equivalent to applying a CONTROLLED-Z operator, the relative phase of the EPR-pair qubits can be measured simply by applying the CNOT in the opposite direction (see page 9).
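The two identities used here can be confirmed with 4×4 matrices: CONTROLLED-Z is unchanged when control and target are swapped, and conjugating a CNOT by Hadamards on both qubits reverses its direction (a small numeric check; the matrix conventions are this example's own):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.eye(4)[:, [0, 1, 3, 2]]       # control = first qubit
CNOT_rev = np.eye(4)[:, [0, 3, 2, 1]]   # control = second qubit
CZ = np.diag([1, 1, 1, -1])             # phases only |11>
SWAP = np.eye(4)[:, [0, 2, 1, 3]]

# CZ is directionless: swapping the two qubits leaves it unchanged
assert np.allclose(SWAP @ CZ @ SWAP, CZ)

# Hadamard-conjugating the target of a CNOT turns it into CZ...
assert np.allclose(np.kron(np.eye(2), H) @ CNOT @ np.kron(np.eye(2), H), CZ)

# ...so conjugating by H on both qubits reverses the CNOT's direction
HH = np.kron(H, H)
assert np.allclose(HH @ CNOT @ HH, CNOT_rev)
print("identities verified")
```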
2.5 Some Potential Quantum Computer Systems

2.5.1 NMR

The most successful implementation of a quantum computer to date is the seven-qubit nuclear magnetic resonance (NMR) computer [49]. NMR is an excellent proving ground for the basic ideas of quantum computing, since it operates under normal conditions, namely room temperature and atmospheric pressure. Additionally, NMR systems are well understood and are in widespread use in medicine and analytical chemistry.

The basic idea behind the usual NMR systems is that some atomic nuclei have magnetic spin states. If the nucleus is placed in a magnetic field, it behaves much like a bar magnet, preferentially choosing the lower-energy state with its spin aligned to the external field. One difference is that spin is quantized: the spin states have discrete energies, with the difference proportional to the strength of the magnetic field. The energy difference between states determines the resonance frequency; the nucleus will strongly absorb and re-emit photons at this frequency. Incident photons at the resonant frequency are effectively scattered, allowing them to be detected.

The amount of magnetic field seen by the nucleus depends on the strength of the magnetic field created by the cloud of electrons around it. The denser the electron cloud, the greater the effective shielding, and the lower the energy difference between the spin states. The density of the electrons in a molecule varies from atom to atom, even for atoms of the same type, due to differences in each atom's neighbors. For example, the standard zero in NMR for analytical chemistry is tetramethylsilane (TMS). TMS has a silicon atom attached to four methyl (CH3) groups. Silicon is more electropositive than most elements found in organic chemistry, and tends to lend electrons to the methyl groups, further shielding and thus lowering the resonant frequency of the hydrogen nuclei.[9]
The molecules used in NMR quantum computation typically have extremely electronegative atoms, such as fluorine, and extremely electropositive atoms, such as iron, to maximize the difference in electron density seen by each atom. This increases the separation between the resonant frequencies.

[9] Regular carbon, ¹²C, is spin 0 (has no spin), and doesn't resonate. ¹³C is spin-1/2 with two spin states (±1/2), but is relatively rare, meaning its resonant scattering is hard to detect in normal compounds.

For quantum computation, nuclei with two spin states are ideal, since they form a natural
qubit. Such nuclei include ¹³C, ¹⁹F, ¹H, ¹⁵N, and ³¹P. By applying a short microwave[10] pulse at the resonant frequency, the nucleus can be forced to oscillate between the spin states. The period of oscillation depends on the strength of the pulse. Single-qubit operators are applied using timed pulses, allowing for arbitrary rotations around the x̂-axis. When the pulse is turned off, the qubit precesses around the ẑ-axis.[11] Two-qubit gates are mediated by the difference in energy between the anti-aligned spin configurations ↑n↓e and ↓n↑e, which is much smaller than the splitting of a single qubit. Hence, low-frequency pulses induce oscillation between |01⟩ and |10⟩, or a rotation in the SWAP direction.[12] Using a series of tuned high- and low-frequency pulses, any arbitrary operator can be applied.

Implementation Issues

NMR computation occurs in a large ensemble of molecules in a liquid medium at room temperature. A large number of molecules is required in order to scatter enough photons to be detected. The molecules are initially at thermal equilibrium, instead of in some known, initialized state (such as all |0⟩). Hence the results of any computation will also be a blend of the possible answers, with signal strengths relative to the corresponding starting state. One way to obtain meaningful results is to use a technique called temporal averaging, wherein the same computation is run multiple times, but with the starting qubits swapped around to different positions. With enough different starting points, it is possible to cancel out the effects of the equilibrium state, leaving only a pure result. However, the number of runs required is exponential in the number of qubits, and, since the computed signal is the difference between measured states, the required sensitivity and precision of the measurements also increase exponentially. This places a limitation on the practical number of qubits available in an NMR system. Furthermore, there is no way to measure intermediate results.
[10] Actually, a radially polarized oscillating magnetic field.

[11] The relevant physics is beyond the scope of this document. However, it is similar to the way a gyroscope precesses when resisting an orthogonal rotation, due to the addition of spin vectors.

[12] This is a simplification. In general, a ↑ nucleus is balanced by a ↓ electron cloud. Hence, bonding electrons can transfer spin between nuclei, even when the nuclei are separated by more than one bond.

Without measurement, there is no way to perform error correction, so all calculations must be performed well within the expected decoherence times. Additionally, any arithmetic that could be performed on a classical computer for the algorithms in Section 2.4 must instead be implemented on the quantum computer,
before measurement. Finally, coming up with molecules with adequate separation between a large number of resonant frequencies is difficult. The seven-qubit quantum computer used five ¹⁹F and two ¹³C nuclei as qubits, along with a highly electropositive iron complex.

What makes NMR computing so attractive is that well-defined pulses give very clean gates. If the nuclei could be initialized, then temporal averaging wouldn't be necessary, and quantum computers would only be limited by the separation between resonant frequencies and by decoherence times. This motivated Schulman and Vazirani's algorithm, which is discussed in Section 6.2. On the other hand, if the qubits were also independently addressable and measurable, then there would be no need for separation between resonant frequencies, nor a limitation on the length of computation, and computation would be truly scalable. This is the motivation behind the scheme proposed by Kane (see Section 2.5.3).

2.5.2 Ion Traps

A second technology that has been demonstrated for quantum computing is ion traps. As the name implies, qubits are encoded in the state of a system of ions.[13] The ions are trapped, or held in place, by an oscillating, saddle-shaped electromagnetic field, causing each ion to follow a circular path. There have been several ion-trap schemes proposed and built. The first, proposed by Cirac and Zoller [9], involves ions in a linear trap. Single-qubit gates are implemented by using lasers to excite the nth ion from the ground electronic state, |0⟩ = |g⟩_n, to an excited state, |1⟩ = |e⟩_n. Multiple-qubit gates involve transferring one of the electronic qubit states to a vibrational state shared by the group of qubits being operated on. Quantum mechanics allows a set of transitions while forbidding others. A two-qubit gate is accomplished by exciting an allowed transition using lasers, and then converting the vibrational qubit back to an electronic qubit.
In fact, the smallest demonstrated two-qubit system involved a single ion [33]: one qubit was the electronic state, the second a vibrational state. This single-ion system could effectively implement a CNOT gate!

[13] Typical ions used are from the alkaline-earth family, and include Be⁺, Ca⁺, Sr⁺, and Ba⁺.

In general, an ion-trap system can be initialized by laser-cooling the qubits in the following way: a laser is tuned slightly below the electron-transition resonant frequency of the ion. Ions
that are moving toward the laser beam perceive a Doppler-shifted energy that is just high enough to absorb. The direction of the absorbed momentum is opposite the direction of movement, so the ion must emit a photon to re-balance its momentum and energy. Ions moving in the opposite direction perceive photons that have too low an energy to absorb. Hence, ions that absorb and re-radiate photons are forced to lose vibrational energy. This is repeated until the ion is in its lowest vibrational energy state.

The drawback of the Cirac and Zoller method is that as the number of qubits in the trap increases, the energy separation between vibrational states decreases.[14] Kielpinski et al. proposed a solution [26]: create an array of multiple traps such that qubits can move between the traps. Their demonstration system involved four qubits moving between a pair of traps. The traps were created from micromachined alumina, with spaced electrodes trapping the four qubits. For scalability, they proposed using a ladder-like structure of arrays. The steps of the ladder would be used for processing, and the rails for communication and storage.

The current drawback to the Kielpinski scheme is that moving ions is a slow process. In order to minimize errors during movement, the qubits must be accelerated very slowly. Movement is measured in milliseconds, while other operations are measured in microseconds. Even so, moving an ion heats it (adds vibrational quanta), especially when the ion needs to be separated from other ions in a trap.

2.5.3 Solid-State Technologies: The Skinner-Kane Model

While experimentalists have examined many technologies for quantum computation in addition to NMR and ion traps, the most promising are solid-state technologies, such as Josephson junctions [55, 51], SQUIDs [11], electron-spin-resonance transistors [53], and the Skinner-Kane model of phosphorus embedded in silicon [25, 45].
Of the solid-state proposals, I've focused on the Skinner-Kane scheme, although I believe many of my results can be applied to any two-dimensional fixed-frame scheme. The key features of the Skinner-Kane platform are: 1. Quantum bits are laid out in silicon in a 2D fashion, similar to traditional CMOS VLSI. 14 Cirac and Zoller proposed a solution, which was to have a two-dimensional array of traps, with a read-write head ion trapped in a plane slightly above it. The head would be trapped by perpendicular oscillating electromagnetic fields.
2. Quantum interactions are near-neighbor between bits. 3. Unlike ion traps, quantum bits cannot move physically, but quantum data can be swapped between neighbors. 4. The control structures necessary to manipulate the bits prevent an ultra-dense, nanometer-scale 2D grid of bits. Instead, we have linear structures of bits which can cross, but there is a minimum distance between intersections to allow classical control structures, such as wire traces and single-electron transistors [37]. These four assumptions apply to several solid-state technologies 15. For concreteness, I have focused on an updated version of Kane's phosphorus-in-silicon nuclear-spin proposal [45]. Figure 2.9 illustrates important dimensions of the Kane scheme.

[Figure 2.9: The basic quantum bit technology proposed by Kane, with modifications by Skinner. Qubits are embodied by the coupled nuclear and electronic spin of a phosphorus atom embedded in silicon under a high magnetic field (2 T) at low temperature (100 mK). The schematic shows classical control gates (A, S, S, A) 20 nm above two 31P+ donors spaced 15-100 nm apart, over a ground plane.]

Shown are two phosphorus atoms spaced 15-100 nm apart. Quantum states are stored in relatively stable electron-donor (e, 31P+) spin pairs, where the electron (e) and the donor nucleus (n) have opposite spins. The basis states, |0⟩ and |1⟩, are defined as the superposition states |0⟩ ∝ |↑e↓n⟩ + |↓e↑n⟩ and |1⟩ ∝ |↑e↓n⟩ − |↓e↑n⟩. Twenty nanometers above the phosphorus atoms lie three classical control gates: one A-gate and two S-gates. Precisely timed pulses on these gates provide arbitrary one- and two-qubit quantum operators. Single-qubit operators are composed of pulses on the A-gates, modulating the hyperfine interaction between electron and nucleus to provide ẑ-axis rotations. A globally applied static magnetic field provides rotations around the x̂-axis. By changing the pulse widths, any desired rotational operator may be applied.
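The claim that ẑ-axis (A-gate) and x̂-axis (static-field) rotations suffice for any single-qubit operator can be checked numerically. The sketch below uses standard rotation matrices, not the dissertation's pulse-level model, and composes a Hadamard from an Rz-Rx-Rz Euler sequence, up to a global phase:

```python
import numpy as np

def Rx(theta):
    """Rotation by theta about the x-axis (driven by the static field)."""
    return np.array([[np.cos(theta/2), -1j*np.sin(theta/2)],
                     [-1j*np.sin(theta/2), np.cos(theta/2)]])

def Rz(theta):
    """Rotation by theta about the z-axis (A-gate hyperfine pulse)."""
    return np.array([[np.exp(-1j*theta/2), 0],
                     [0, np.exp(1j*theta/2)]])

# Euler decomposition: H equals Rz(pi/2) Rx(pi/2) Rz(pi/2) up to a global phase.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U = Rz(np.pi/2) @ Rx(np.pi/2) @ Rz(np.pi/2)
phase = U[0, 0] / H[0, 0]          # extract the global phase (here -i)
assert np.allclose(U, phase * H)
```

The same three-rotation pattern produces any single-qubit operator by varying the three angles, which is the content of Eq. 3.1 later in the text.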
15 On-going research in my group includes applying my results for Skinner-Kane to ion traps, and comparing with other layout schemes.
Two-qubit interactions are mediated by the S-gates, which move an electron from one nucleus to the next 16. A SWAP operator can be implemented by moving e1 to n2 and e2 to n1, allowing an interchange of state between the electrons and nuclei, and then moving the electrons back. (This is actually a multi-step process, since the electrons must be moved one at a time.) The allowable spacing between phosphorus atoms is currently a topic of debate within the physics community, with conservative estimates of 100 nm, and more aggressive estimates of 15 nm. The tradeoff is between noise immunity and difficulty of manufacture. In my research, I used a figure (60 nm) that lies between these two. This choice implies that the A- and S-gates are spaced 20 nm apart, which allows for reasonable trace widths. Quantum computing systems display a characteristic tension between computation and communication. Fundamentally, technologies that transport data well do so because they are resistant to interaction with the environment or other quantum bits; on the other hand, technologies that compute well do so precisely because they do interact. Nuclear-spin-based, solid-state technologies are good at providing scalable computation but complicate communication, because their information carriers are non-mobile: the Kane proposal's phosphorus atoms do not move, so transporting a state to another part of the chip is laborious and requires swapping states between adjacent atoms. As an architect, I have looked at optimizing the tensions between communication, error correction, and computation, and between classical control and quantum effects. The rest of this document describes my work and conclusions. In Chapter 3, I look at building the basic operators of computation and error correction, given a classical control model. In Chapter 4, I detail the implications of error correction on building quantum data registers.
Chapter 5 describes the implications and costs error correction has when building quantum communication channels. Finally, in Chapter 6, I investigate entropy-removal algorithms for system initialization and |0⟩ ancilla generation, both of which are necessary for building scalable quantum systems. 16 Kane's original proposal used a single electrode to induce electronic coupling between adjacent nuclei, much like NMR. In order to work properly, however, the placement of the phosphorus atoms within the silicon crystalline structure would have to be exact. Skinner's extension eliminates this requirement.
Chapter 3
Gates for the Skinner-Kane Model

As discussed in the previous chapter, operators in the Skinner-Kane model are implemented using a series of pulses on two types of electrodes, or gates. The A-gates control the position of the non-bonded valence electron relative to the nucleus, either permitting or prohibiting the exchange interaction that implements a rotation around the ẑ-axis. The S-gates are positioned between the phosphorus atoms, allowing electrons to be moved between the nuclei. Skinner and Kane describe a two-qubit SWAP operator. However, the description I present here (and in [23]) seems clearer for non-physicists, since it clearly shows that the implementation of SWAP requires performing single-qubit exchange operators between the electron of one qubit and the nucleus of the other. Single-qubit operators, however, are problematic. While it is possible to have pure magnetic (x̂-axis) evolution, hyperfine (ẑ-axis) evolution occurs only in conjunction with magnetic evolution. Skinner and Kane's solution for the single-qubit operator is to approximate a finite duration, t, of hyperfine evolution with a large number, a, of short Δt = t/a steps of combined hyperfine and magnetic evolution, corrected on the fly by time-reversed Δt/2 steps of solely magnetic interaction. The problem is that the approximation is not close unless the number of steps is very large. To approximate an operator within tolerable levels of error, the number of steps needs to be much larger than the number proposed by Skinner and Kane. It seems desirable to find a short set of pulses of ẑ′-axis (hyperfine and magnetic) evolution or x̂-axis (pure magnetic) evolution that will produce any arbitrary operator. Using the parameters of the original
Kane proposal, I will show in this chapter that it is possible to compose pulses to approximate the I, H, X, Z, S, and T operators to within an error of < 10^-5, using from one to three full-evolution periods. The search method is a simple minimization algorithm, using random starting points. In the process, I found many local minima that were not close enough. This work is preliminary, since a better understanding of the parameters involved may produce a better tool for searching for accurate operators.

3.1 Two Qubit Operators

As suggested by Skinner [45], the basic two-qubit operator is SWAP, which is accomplished by shuttling electrons between adjacent nuclei, allowing the electrons of the qubits to exchange state with the other qubits' nuclei, then shuttling the electrons back. The exchange interaction occurs any time the A-gate is off, allowing the electron to orbit the nucleus. When an electron exchanges with its own nucleus, the hyperfine exchange operator implements a rotation around the ẑ-axis: a rotation by π is the Z operator, and by π/2, the S operator 1. For the cross-exchange, however, a rotation by π implements a SWAP operator, and by π/2, the √SWAP operator. A √SWAP operator and its inverse (a rotation by 3π/2), combined with several single-qubit operators, can be used to implement a CNOT operator [5, 13]. The steps for the SWAP operator are shown in Figure 3.1. In the figure and the following, n1 (n2) refers to the first (second) nucleus, whose electron is e1 (e2). Electrons are moved by turning the A- and S-gates off and on: a gate drawn as a solid box is on (positive voltage); an open box is off (zero voltage). The essence of the operator is: first, n1's electron, e1, is moved out of the way (t = 1, 2, 3), and e2 is moved to n1 (t = 1, 2, 3, 4). The A-gate over n1 is then turned off, allowing e2 and n1 to exchange state (t = 5).
At this point, the original qubits are held on the two electrons, e1 and e2, and the two nuclei, n1 and n2. Next, e2 is moved out of the way (t = 6-12), e1 is moved to n2 (t = 8-12), and e1 and n2 swap state (t = 13). Finally, the electrons are moved back to their original nuclei (t = 14-17). This work only scratches the surface of two-qubit operators for the Skinner-Kane model. 1 This is not quite true. Magnetic evolution (x̂-rotation) occurs simultaneously with the hyperfine interaction, whether the A-gate is on or off. Operators to produce a pure exchange (including Skinner's original proposal) are discussed in the next section.
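The relation between the π/2 cross-exchange rotation and a full SWAP can be sanity-checked numerically. The √SWAP matrix below is the standard textbook form, assumed here rather than derived from the pulse sequence above; applying it twice yields SWAP:

```python
import numpy as np

# Basis order |00>, |01>, |10>, |11>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

# sqrt(SWAP): the pi/2 point of the cross-exchange rotation (standard form).
SQRT_SWAP = np.array([[1, 0, 0, 0],
                      [0, (1+1j)/2, (1-1j)/2, 0],
                      [0, (1-1j)/2, (1+1j)/2, 0],
                      [0, 0, 0, 1]])

assert np.allclose(SQRT_SWAP @ SQRT_SWAP, SWAP)   # two pi/2 exchanges = SWAP
```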
[Figure 3.1: Implementation of a rotation about the swap axis in the Skinner-Kane model. Timeline diagram, t = 0 through t = 18, showing the A- and S-gate settings and the positions of electrons e1 and e2 relative to nuclei n1 and n2 at each step; the gate-state columns are omitted here.]
Isailovic and Whitney give a complete classical control circuit for the SWAP operator described above in [23]. Here are some additional thoughts:

- For the gate pulses that control electron movement, smaller slew rates allow for a smoother transfer when moving an electron between gates, since the electron's wave function is modified more slowly. This reduces unintended noise and external entanglement, but also increases the time required for an operator.

- The electrons must be separated by at least two gates when shuttling; otherwise, an electron might be attracted to the wrong gate. Even this may be optimistic, since electrons can easily tunnel short distances and inadvertently swap. It may be necessary in practice to add a third S-gate between qubits, with a negative voltage applied to it to truncate the electron's wave function and prevent tunneling.

- There is no requirement that the cross-exchange (n1-e2 and n2-e1) interactions be symmetric, except that the result must map to the qubit (|0⟩/|1⟩) subspace. Nor is there a restriction on the number of cross-exchange interactions allowed. Using three or more cross-exchange interactions may allow for faster implementations of important two- and three-qubit operators, such as CNOT, and the Toffoli (C-C-NOT) and Fredkin (C-SWAP) operators. (Goan and Milburn [16] give an asymmetric, non-adiabatic implementation of a CNOT operator, but it is based on Kane's original proposal, in which two-qubit interactions are mediated by an induced pseudo-bond between adjacent phosphorus atoms, and not by explicit electron shuttling.)

One final note is that the timing of the gate pulses is somewhat critical. In the next section, I discuss single-qubit operators. However, when no operator is applied, the electron-nucleus pair still experiences the exchange interaction (A-gate off, ẑ-axis rotation) and magnetic evolution (x̂-axis rotation).
A complete rotation around either axis is the equivalent of an identity (I) operator. All operators must occur in the same time frame as one or more complete identity operators; otherwise, they will add unintended rotations.
3.2 Single Qubit Operators

In the Skinner-Kane model, the qubit precesses around the ẑ-axis any time the A-gate is at 0 V. There is a global clock to control the A-gate pulses, with the minimum pulse being one clock tick. The pulse-clock frequency is chosen so that a complete rotation around the ẑ-axis takes 256 clock ticks. Similarly, the global static magnetic field is adjusted so that, when the A-gate is on, a complete rotation around the x̂-axis requires 96 clock ticks. Using Euler's theorem for constructing arbitrary rotations from a sequence of rotations about distinct axes [34], any single-qubit operator can be composed by partial rotations around these two axes. In general, a single-qubit operator U can be composed from three such rotations,

U = Rx(φ1) Rz(φ2) Rx(φ3). (3.1)

There are several difficulties with this. First, the x̂-axis rotation occurs at all times, whether the A-gate is on or off. Hence, when the A-gate is off, the actual per-clock-tick rotation is around the axis given by the vector sum of the two rotations, (1/256)ẑ + (1/96)x̂, and has a magnitude of 2π√(1/256² + 1/96²) radians. Skinner proposed obtaining pure hyperfine (ẑ-axis) rotation by using the Trotter approximation, e^(-iH_A t/ℏ) ≈ (e^(+iH_B Δt/2ℏ) e^(-i(H_A+H_B)Δt/ℏ) e^(+iH_B Δt/2ℏ))^a, composing a finite duration, t, of hyperfine evolution from a large number, a, of short Δt = t/a steps of hyperfine and magnetic evolution corrected, on the fly, by time-reversed Δt/2 steps of solely magnetic interaction. The problem with this approach is that it is time consuming. The time-reversed e^(+iH_B Δt/2ℏ) terms are each the inverse of a single clock-tick rotation around the x̂-axis, that is, one clock tick shy of a complete 2π rotation. These need to be repeated 24 times to create a complete Z operator. The outside rotations can be combined to require only 254 clock ticks, but this still requires 25 complete 256-clock-tick cycles for a single gate.
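Skinner's Trotter-style correction can be illustrated numerically. The sketch below uses toy 2×2 Hamiltonians, stand-ins rather than the actual Kane hyperfine and Zeeman terms, to show that the symmetric formula approaches pure H_A evolution only as the number of steps a grows, which is why so many cycles are needed:

```python
import numpy as np

# Toy stand-ins for the hyperfine (H_A) and magnetic (H_B) terms; only the
# structure of the Trotter formula is being exercised here.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_herm(H):
    """exp(-i H) for a Hermitian matrix, via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w)) @ V.conj().T

def trotter_error(t, a, HA, HB):
    """Max-entry error of the symmetric formula
    exp(-i HA t) ~ (exp(+i HB dt/2) exp(-i (HA+HB) dt) exp(+i HB dt/2))^a."""
    dt = t / a
    step = expm_herm(-HB*dt/2) @ expm_herm((HA + HB)*dt) @ expm_herm(-HB*dt/2)
    return np.abs(np.linalg.matrix_power(step, a) - expm_herm(HA*t)).max()

HA, HB = 0.5*Z, 0.5*X
errors = [trotter_error(1.0, a, HA, HB) for a in (1, 4, 16, 64)]
assert errors[-1] < errors[0]   # accuracy requires many steps, hence many cycles
```

The error shrinks roughly quadratically in the step size, so reaching a tight tolerance forces the large step counts (and long gate times) criticized in the text.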
Similarly, an S operator requires 13 complete cycles, while the S† operator requires 37. A more efficient scheme, suggested by Euler's theorem, is to create operators directly from the two available rotations (around the x̂- and ẑ′-axes). The second difficulty is that, in Eq. 3.1, φ1 and φ3 must be integer multiples of 2π/96, and φ2 must be a multiple of 2π/256 (if it were actually around ẑ), since the pulses are an integer number of clock ticks. The third difficulty is that the three rotations combined must take the same amount
of time as one or more 256-clock-tick identity operators, or else the operator will introduce a phase change relative to the rest of the system. The next section describes my search for sequences of pulses to create each of the operators of the universal set, X, Z, H, S, and T, given a particular model 2. Within this model, it is possible to approximate each of the operators to within a small error, so that applying the approximated operator followed by the inverse of the exact operator results in an error probability, p. For sustainable, fault-tolerant computation, p must be less than the Threshold Theorem's threshold, targeted to less than 10^-5 for the Kane technology.

3.2.1 Methodology and Results

FINDAPPROXOP(OPERATOR A, INT ticks[], INT maxtick)
  do
    better_i = 0
    distance = GETDISTANCE(A, MAKEOP(ticks, maxtick))
    for i = 1 to SIZEOF(ticks) do
      ticks[i] = ticks[i] + 1
      if distance > GETDISTANCE(A, MAKEOP(ticks, maxtick))
        better_i = i; better_dir = +1
        distance = GETDISTANCE(A, MAKEOP(ticks, maxtick))
      fi
      ticks[i] = ticks[i] - 2
      if distance > GETDISTANCE(A, MAKEOP(ticks, maxtick))
        better_i = i; better_dir = -1
        distance = GETDISTANCE(A, MAKEOP(ticks, maxtick))
      fi
      ticks[i] = ticks[i] + 1
    od
    if better_i != 0 then ticks[better_i] = ticks[better_i] + better_dir fi
  until better_i == 0

Figure 3.2: Approximating an operator. Given an ideal operator, A; the transition times between one-tick operators; and the maximum number of clock ticks for the operator, iteratively refine an approximation to the operator by moving transitions.

Single-qubit operators are implemented with a sequence of pulses on the A-gate, and each 2 This investigation used a different model than the Skinner-Kane model, which has the actual x̂-axis and a convolved ẑ′-axis. In the original Kane model, the unaltered axis was the ẑ-axis.
This investigation used the same convolved axis as the Skinner-Kane model, but used the ẑ-axis instead of the x̂-axis as the pure axis, with an identity-operator period of 256 clock ticks. Regardless, the technique involved can be used with any two distinct axes to find a set of operators.
pulse corresponds to an operator for one of two small rotations. The entire sequence corresponds to the composition of these operators. The simulation of on-off pulses on an A-gate is straightforward. Given a set of clock ticks, {a, b, ..., c}, on which to transition, and starting with the A-gate on, the operator produced is T_Z^a T_Kx^(b-a) · · · T_Z^(tmax-c), where tmax is the last clock tick in a complete (multiple-of-256) cycle, and T_Z and T_Kx are the single-clock-tick rotations around the ẑ-axis and the convolved axis, respectively. For a pulse train generating U and approximating an operator A, the goal was to minimize ε = Σ_ij |u_ij − a_ij|, since ε² ≥ max_ψ (1 − ⟨ψ|A†U|ψ⟩). (For any |ψ⟩, ⟨ψ|I|ψ⟩ = 1, where I is the identity operator, and |⟨ψ|J|ψ⟩| < 1 when J ≠ I.) ε was minimized by moving each transition point ±1 tick while keeping the other points constant, and selecting the result with the smallest ε. This process was repeated until no better result was obtained, that is, until a local minimum was found. Keeping in mind that the total number of clock ticks must be a multiple of the ticks for an identity operator (in this case, 256n), there are still 2^(256n) possible operators. Two goals were to keep the number of complete cycles (n) small, and to keep the number of transitions small to simplify the classical control circuitry. Since the iterative process will find a local minimum for any starting point, randomly selecting starting points proved as effective as any starting-point selection strategy tried. My first experiment was to approximate an H operator. I started with one complete cycle, using two, four, and six transitions, but did not find an acceptable solution (although one might exist).

[Figure 3.3: Pulse sequence to approximate an H operator. The pulse sequence approximates an H operator to ε = 1.094 × 10^-5. A-gate transitions occur at clock ticks 1, 20, 83, 152, 337, and 338. Waveform plot of A-gate voltage versus clock tick omitted.]
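A minimal Python sketch of this search follows. The pure ẑ- and x̂-axis per-tick rotations and the phase-removing distance are simplifying assumptions (the dissertation uses a convolved axis and tracks absolute phase through the identity-period constraint), so it will not reproduce the exact ε values reported in the text:

```python
import numpy as np

# Idealized per-tick rotations (an assumption; the real model convolves the
# axes): a full z rotation takes 256 clock ticks, a full x rotation takes 96.
TZ = np.diag([np.exp(-1j*np.pi/256), np.exp(1j*np.pi/256)])
c, s = np.cos(np.pi/96), np.sin(np.pi/96)
TX = np.array([[c, -1j*s], [-1j*s, c]])

def make_op(ticks, tmax):
    """Compose the operator for a pulse train: segments alternate between
    the two per-tick rotations at each transition tick."""
    U, last, on_x = np.eye(2, dtype=complex), 0, False
    for t in list(ticks) + [tmax]:
        U = np.linalg.matrix_power(TX if on_x else TZ, t - last) @ U
        last, on_x = t, not on_x
    return U

def distance(A, U):
    """Elementwise epsilon after removing a global phase (a simplification)."""
    phase = np.exp(-1j * np.angle(np.trace(U.conj().T @ A)))
    return float(np.abs(A - phase * U).sum())

def find_approx_op(A, ticks, tmax):
    """Greedy descent in the spirit of Figure 3.2: nudge each transition by
    +/-1 tick, keep the best improving move, stop at a local minimum."""
    best = distance(A, make_op(ticks, tmax))
    improved = True
    while improved:
        improved = False
        for i in range(len(ticks)):
            for d in (1, -1):
                trial = list(ticks)
                trial[i] += d
                if all(a < b for a, b in zip([0] + trial, trial + [tmax])):
                    e = distance(A, make_op(trial, tmax))
                    if e < best:
                        best, ticks, improved = e, trial, True
    return ticks, best

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
start = [1, 20, 83, 152, 337, 338]       # the transitions reported in the text
ticks, eps = find_approx_op(H, start, 512)
assert eps <= distance(H, make_op(start, 512))   # descent never worsens epsilon
```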
However, with two cycles (512 clock ticks) and transitions at ticks 1, 20, 83, 152, 337, and 338, I obtained ε = 1.094 × 10^-5, giving a probability of error of O(10^-10). Figure 3.3 shows the waveform. (This result is probably better than could be expected from a real pulse waveform with slew-rate errors and jitter.) My results for the other operators were not as stellar, but are still acceptable. (They also required fewer trials to find.) X, Z, and S gave an ε on the order of 3 × 10^-3, for an error probability around 10^-5. ε for T was around 8 × 10^-3, which is larger than the targeted reliability, but still tolerable: for the [[7,1,3]] code, the T operator is applied indirectly, using a specially prepared, verified state, CNOTs, and measurement. The T operator itself is never directly applied to data qubits. All of the actual pulse trains and results are given in Appendix A.

3.2.2 Future Directions

Single-Qubit Operators There is still much to be done to characterize this work. To start, the space of possible operators is huge. Since each tick can be either on or off, there are 2^256 combinations, each producing an operator. In the work above, I searched over a relatively small space of size (512 choose 6), starting with a random set of transitions and iteratively moving the transitions to minimize ε. A better approach might be to characterize families of two-transition operators with a fixed distance between the transitions, followed by four-transition families, etc. This would allow any arbitrary operator to be defined. However, with encoded qubits, there are a limited number of operators that actually need to be applied. Finding multiple acceptable approximations would be useful from the standpoint of reducing the introduction of systematic errors [12].

Multi-Qubit Operators For two-qubit operators, there is no reason that the cross-exchange operators need to be the same. The only requirement is that, to be valid, the resulting two-qubit operator must leave the qubits in the subspace defined by |0⟩ and |1⟩.
Interesting non-swap operators may be created
using three or more cross-exchange operators. A SWAP operator requires the equivalent of two S operators (the time required for electron movement is negligible). A CNOT requires five single-qubit (T and T†) operators and two √SWAP operators, so it is equivalent to nine single-qubit operators. A CNOT created using fewer exchanges would speed computation. Three-qubit operators, such as the Toffoli (C-C-NOT) and Fredkin (C-SWAP) operators, can be built up from SWAPs and CNOTs. During the SWAP operator, however, one of the qubits is converted to an electron-electron qubit, which is less stable, but highly mobile. It may be possible to find more efficient ways to create multi-qubit operators by temporarily leaving the |0⟩/|1⟩ subspace, performing arbitrary exchanges, and then returning to the |0⟩/|1⟩ subspace.
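As a concreteness check on building CNOT from exchange primitives, one standard circuit identity (a generic construction, not the dissertation's pulse-level sequence) obtains controlled-Z from two √SWAPs plus single-qubit phase gates, and then CNOT by Hadamard conjugation of the target:

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
S = np.diag([1.0, 1j])                     # pi/2 phase gate

E = np.array([[1, 0, 0, 0],                # sqrt(SWAP), standard matrix form
              [0, (1+1j)/2, (1-1j)/2, 0],
              [0, (1-1j)/2, (1+1j)/2, 0],
              [0, 0, 0, 1]])

# Controlled-Z from two sqrt(SWAP)s sandwiching a Z on the first qubit,
# cleaned up by single-qubit phase gates:
CZ = np.kron(S, I2) @ np.kron(I2, S.conj().T) @ E @ np.kron(Z, I2) @ E
assert np.allclose(CZ, np.diag([1, 1, 1, -1]))

# CNOT by conjugating CZ with Hadamards on the target qubit:
CNOT = np.kron(I2, H) @ CZ @ np.kron(I2, H)
assert np.allclose(CNOT, [[1, 0, 0, 0], [0, 1, 0, 0],
                          [0, 0, 0, 1], [0, 0, 1, 0]])
```

Counting exchanges and single-qubit operators in candidate identities like this one is exactly the bookkeeping needed to decide whether a faster CNOT exists.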
Chapter 4
Quantum Memory Hierarchy

Chapter 2 introduced a number of quantum concepts, some theoretical (algorithms, error correction, teleportation), and some practical (decoherence rates, primitive operators). A computer architect's goal is to reason about and abstract the structural components and organization of a processor, given some basic design rules. In this chapter, I present some considerations about a memory structure for quantum computing, given those concepts. The main goal of this chapter is to provide insight into the feasibility of using a memory hierarchy to reduce the overhead due to error-correction encoding. The first insight is that a code used during computation must have a fault-tolerant universal set of operators, but codes that are used solely for storage require only the smaller and easier-to-implement set of operators used for error correction. This chapter demonstrates that denser codes can be effectively used for caching qubits that are not currently being used for computation; that such caching (even with a modestly dense code) results in significantly fewer qubits in the system; and that caching is feasible for an important quantum procedure, the quantum Fourier transform.

4.1 Motivation

The Steane error-correction code, [[7, 1, 3]], can be used for fault-tolerant quantum computation. Concatenated versions of [[7, 1, 3]], however, are not very dense. Denser codes exist, with two modest examples being [[8,3,3]] and [[16,10,3]]. These codes cannot be used for fault-tolerant
computation, however, because fault-tolerant universal sets of operators do not exist for them 1.

Recursion level (k) | Storage overhead | Operation overhead | Min. time overhead
0                   | 1                | 1                  | 1
1                   | 7                | 153                | 5
2                   | 49               | 23,409             | 25
3                   | 343              | 3,581,577          | 125
4                   | 2,401            | 547,981,281        | 625

Table 4.1: Overhead of recursive error correction for a single qubit operator. The [[7, 1, 3]] code can be concatenated with itself to form [[49, 1, 7]], [[343, 1, 15]], etc. The operation overhead is estimated in [38] as the number of operators required to measure the error syndrome after applying a single operator to the encoded qubits. Since error-syndrome measurement can be performed in parallel, the time overhead is estimated to be smaller than the operation overhead.

One possibility, explored in this chapter, is to use [[7,1,3]] as the code for the actual computation portion of a quantum computer, and use denser codes for qubits not currently active in computation. This idea is analogous to a classical memory hierarchy, with small, fast registers for computation, and a larger but slower bank of memory for bulk storage of data. For concreteness, this chapter focuses on the quantum Fourier transform (QFT) from Shor's algorithm. Due to its exponential speedup over any known classical algorithm, Shor's algorithm is one of the primary motivations to study quantum computing. In particular, I provide an analysis of the spatial and temporal locality of the QFT. Combined with reasonable estimates of quantum device technology, this analysis provides the basis for evaluating a memory hierarchy for a quantum computer.

4.2 Background

The quantum processor assumed for this chapter computes in the [[7,1,3]] code, or a recursive (concatenated) variant ([[49, 1, 7]], [[343, 1, 15]], etc.). The reason for this is the relative ease with which most quantum primitives are performed on this code. Table 4.1 [38] shows the storage and operator overhead of using these codes.
The numbers are for a single logical qubit, with rows 1 Different families of error-correction codes exist, and not all families support a universal set of operators. There are codes that have a universal set of fault-tolerant operators and are denser than concatenated [[7, 1, 3]] codes, but they still encode only a single qubit. To get significant density requires encoding multiple qubits together.
depicting increased levels of concatenation and the ability to correct more errors. To factor a 1024-bit product of two primes requires roughly p(n) = 5.2 × 10^11 (520 billion) operators on a quantum computer [38]. An additional assumption for this chapter was an aggressive, but not overly optimistic, quantum device technology with a decoherence rate per operator of p = 10^-6, which implies that, to reliably execute Shor's algorithm, the operators must be performed on logical qubits in a [[343,1,15]] code word (i.e., the recursion level k is equal to three) [38].

4.3 Memory Hierarchy

The underlying architectural question is: should the [[343, 1, 15]] code be used throughout the quantum processor, or should teleportation be used as a code-conversion tool, exploiting denser codes for storage, with the [[343, 1, 15]] code used only for the computational components of the system? If a denser code is used for storage, the costs will be increased access time and an imposed structure on the quantum data, since multiple qubits will be encoded together. There will be fewer physical qubits required for the computation, however, and fewer operators required to error-correct the data. One method of visualizing a [[343, 1, 15]] code is as a tree of codes, with each node in the tree being a [[7,1,3]] code (see Figure 2.6 on page 14). It is straightforward to replace the topmost node of this tree with either the [[5,1,3]] code or the [[8,3,3]] code and thereby generate a [[245,1,15]] code or a [[392,3,15]] code. While the number of physical qubits for the [[392,3,15]] code is larger (392 compared to 343), this code encapsulates three logical qubits instead of just one, requiring 62% fewer physical qubits per logical qubit. The per-qubit overheads are calculated in Table 4.2, and should be compared to the values given in Table 4.1. This denser packing creates a quantum memory hierarchy that is analogous to classical memory hierarchies.
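The table entries and the density claim reduce to simple arithmetic, sketched below (the 153x operation and 5x time per-level factors are taken from Table 4.1):

```python
# Recursion-level overheads for concatenated [[7,1,3]] (Table 4.1): each level
# multiplies storage by 7, operations by 153, and minimum time by 5.
for k, (s, o, t) in enumerate([(1, 1, 1), (7, 153, 5), (49, 23409, 25),
                               (343, 3581577, 125), (2401, 547981281, 625)]):
    assert (s, o, t) == (7**k, 153**k, 5**k)

# Replacing the top [[7,1,3]] node with [[8,3,3]] yields [[392,3,15]]:
# 8 * 49 = 392 physical qubits encoding 3 logical qubits.
per_qubit_392 = 8 * 7**2 / 3          # ~130.7 physical qubits per logical qubit
per_qubit_343 = 7**3                  # 343 for concatenated [[7,1,3]]
savings = 1 - per_qubit_392 / per_qubit_343
assert round(savings * 100) == 62     # the "62% fewer physical qubits" claim
```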
Instead of a classical cache based upon high-speed SRAM, a quantum cache is based upon denser error correction codes. The rest of this chapter assumes the top-level code is [[8,3,3]], giving a cache line size of three. (The line size is determined by the number of qubits encoded by the top-level code.) Fortunately, like their classical counterparts, quantum algorithms can be structured to
[Figure 4.1: Trading computational ease for density. Processor qubits, cache lines, and memory pages, connected by teleportation: toward the processor, more physical qubits and less complex operations; toward memory, greater density and a more complex code.]

Code                              | Storage overhead | Operation overhead       | Min. time overhead
none                              | 1                | 1                        | 1
[[8, 3, 3]]                       | 2.67             | 276/3 = 92               | 7
[[8·7, 3, 7]] = [[56, 3, 7]]      | 18.67            | 92 × 153 = 14,076        | 35
[[8·7·7, 3, 15]] = [[392, 3, 15]] | 130.67           | 14,076 × 153 = 2,153,628 | 175

Table 4.2: Overhead of [[8, 3, 3]] concatenated with [[7, 1, 3]], on a per-qubit basis.

[Figure 4.2: Quantum Fourier transform on nine qubits. Circuit diagram omitted.]
[Figure 4.3: Locality in the quantum Fourier transform. Circuit diagram omitted.]
exhibit spatial and temporal locality to take advantage of caching. The quantum Fourier transform (QFT), integral to Shor's algorithm, can use such caching techniques to increase locality. Figure 4.2 shows a nine-qubit QFT. In Figure 4.3, the operators have been reordered so that groups of operators involve two of three sets of three qubits each. (The Rm gates are rotate-phase-by-π/2^m gates.) In this case, the third set of qubits can be cached while operators are applied to the first and second sets; the first set is then cached while operators are applied to the second and third sets; and finally, the second set is cached while operators are applied to the first and third sets. The three logical qubits being cached require 392 physical qubits, compared to 1,029 if they had not been cached, a space savings of more than 60%. Operationally, there is also considerable savings. The savings is actually greater than that suggested by Tables 4.1 and 4.2 (see footnote). The operator overhead of the [[343, 1, 15]] code in Table 4.1 is incurred per logical operator. Since operators typically are not applied to the cache and memory (with the exception of teleportation), their overhead is relative to the refresh rate of the cache. A general assumption is that qubits decohere much more slowly when operators aren't being applied 2. If we assume the decoherence rate for an operator is ten times what it is for an idle qubit in the same time frame, then the operator overhead for the [[343,1,15]] code is about fifteen times that of the [[392, 3, 15]] code. The time overhead is also not completely described by the tables, since the main difference in operator time is due not to refreshing the cache qubits, but rather to the time taken to teleport qubits into and out of the cache 3. In the nine-qubit example, there are 54 operators, counting swaps as three CNOT operators 4.
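The reordering argument can be checked mechanically. Enumerating the gates of a nine-qubit QFT (one H per qubit plus one controlled rotation per qubit pair, an idealized gate count rather than the swap-inclusive count of 54 used in the text) shows every gate touches at most two of the three three-qubit blocks, so one block can always sit in the cache:

```python
from collections import defaultdict

# Gate list for an n-qubit QFT: one H per qubit plus a controlled rotation
# for every qubit pair.
n = 9
gates = [(q, q) for q in range(n)]                              # Hadamards
gates += [(c, t) for t in range(n) for c in range(t + 1, n)]    # controlled-R's
assert len(gates) == 9 + 36

block = lambda q: q // 3    # three cache-line-sized blocks: {0-2}, {3-5}, {6-8}
schedule = defaultdict(list)
for c, t in gates:
    schedule[frozenset({block(c), block(t)})].append((c, t))

# Three passes, each holding two blocks in the processor while the third is
# cached; intra-block gates fit into any pass that includes their block.
passes = [frozenset(p) for p in ({0, 1}, {1, 2}, {0, 2})]
cross = sum(len(schedule[p]) for p in passes)
intra = sum(len(g) for k, g in schedule.items() if len(k) == 1)
assert cross + intra == len(gates)    # every gate fits some two-block pass
```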
The teleportation operator requires applying six operators (including the two measurements), and qubits must be teleported in both directions. Hence, the total cost of teleporting between the processor and the cache is twelve operators. Two teleportations are required in our example. The first can overlap the six operators applied to just the second set of qubits, but we pay the full penalty for the second. The total overhead is eighteen operators, or about 33%. 2 The savings in operators is potentially much greater than in Table 4.2, depending on the properties of the underlying quantum physical system. In many systems, the main source of non-operator decoherence is dephasing. This allows the use of a top-level code that corrects only phase errors, such as Shor's 3-qubit repetition code. 3 Since teleportation can be performed in parallel with some computational operators, the real times are between those in Table 4.1 and Table 4.2. 4 If the system has a fundamental SWAP operator, the total number of operators drops to 48.
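The overhead arithmetic in this paragraph can be restated directly:

```python
# Teleportation overhead for the nine-qubit QFT example.
ops_qft = 54                  # QFT operators, counting each swap as 3 CNOTs
teleport = 6                  # operators per teleport, incl. two measurements
round_trip = 2 * teleport     # a qubit set moves into and out of the cache
# Two round trips are needed; the first overlaps the six operators applied to
# the qubits remaining in the processor, the second is paid in full.
overhead = (round_trip - 6) + round_trip
assert overhead == 18
assert round(100 * overhead / ops_qft) == 33   # ~33% time overhead
```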
Chapter 5

Error-Correction and Circuit Design

This chapter discusses the costs associated with error correction, assuming a solid-state implementation with physical qubits at fixed locations. It shows that the most efficient layout of data qubits is dictated by the error-correction mechanism, and that a concatenated error-correction code results in an H-tree structure. It explores the costs of error correction when inter-qubit communication (that is, moving qubit values) is taken into account, using a concatenated [[7, 1, 3]] code for fault tolerance, as suggested by Aharonov and Ben-Or [3, 2]. Finally, the cost of communication using the SWAP operator over a swap channel is compared to that of using a teleportation channel.

As described previously, the Skinner-Kane proposal is a scalable, solid-state model that addresses individual qubits using voltage pulses on classical electrodes. In the Skinner-Kane system, the need for classical control wires limits the possible arrangement of qubits [36]. The inter-gate distance for the qubits (on the order of 20 nm) is smaller than the smallest allowable trace width (estimated to be 100 nm at 100 mK, to avoid noticeable quantum effects such as stair-stepped voltages). Because the wires are relatively large, qubits must be arranged in such a way that wires can be routed to them. Oskin et al. [38] concluded that qubits could be laid out in straight lines, with several planes of wires to carry the classical control signals to the qubits, narrow (5 nm) vias to act as the actual gates, and junctions separated by a minimum number of qubits (calculated to be 22) in order to accommodate the wires.

Much of the activity on a quantum computer is likely to consist of error-correcting the data between the real computational steps. Hence, laying out qubits to facilitate error correction
Figure 5.1: Two-rail layout for the three-qubit phase-correction code. The schematic on the left shows qubit placement and communication, where the D_i indicate data qubits and the A_i are cat-state ancillae. The column of D_i's and A_i's forms a swapping channel, and can also interact with the data and cat-state ancillae. The open qubit-swapping channel at the bottom brings in fresh ancillae and removes used ancillae. The same layout is shown as a quantum circuit on the right, with the operators required to create and verify an ancillary cat state, and to measure the parity of a pair of data qubits.

is important to minimize communication costs. Physical qubits that encode the same logical qubit (the error-correction block) should be near one another. However, it is crucial that two qubits in an error-correction block do not interact, which leads to the following proposal: a two-rail layout, with two parallel lines of interacting qubits and classical support wires coming in from the sides (see the schematic on the far left of Figure 5.1). To minimize the total number of traces (hence decreasing the distance between branches), the qubits in the left rail can only interact with their neighbors in the right rail, while those in the right rail can communicate with their neighbors in both rails. The qubits in the right rail are strictly for communication and data transfer, while those in the left rail hold encoded data and ancillary cat states. Freshly validated ancillae are fed in from the bottom, while ancillae used during error correction and computation exit the bottom. Measurement (presumably) requires special hardware, so qubits are swapped out the top of the right rail to be measured.
The inherent tree structure of concatenated error-correction codes, and the need for lower-level blocks in an encoding to be near one another, lead to a tree structure for laying out the qubits. Since the goal is to minimize communication costs, and the design is limited to a quasi-linear layout
Figure 5.2: Schematic layout of the H-tree structure of a concatenated code. The branches labeled D_i are for logical data qubits, and consist of two rails of eleven qubits each (seven qubits for data and four for ancillae). The branch labeled A_1 is for creating, verifying, and uncreating the cat state.

in two dimensions, an H-tree structure is best. This limits the distance between any two qubits to O(√N), where N is the total number of qubits 1. The bottom branches of the tree correspond to the two-rail structure, each holding a singly-encoded qubit. Second-level encoding uses the next larger branch size (its logical qubits are singly-encoded), third-level the size above that, and so forth. Above the maximum encoding level, higher-level branches simply hold multiple logical qubits (see Figure 5.2). Communication between branches at a level occurs along the trunk of the level above them.

5.1 Error Correction Algorithms

For concreteness, I have focused on the [[7, 1, 3]] code throughout this chapter. The [[7, 1, 3]] code is transversal in many operators; that is, application of the logical operator to the encoded qubit simply requires that the physical operator be applied to each physical qubit in the code. In particular, all of the operators required for error correction (X, Y, Z, H, and CNOT) are transversal in the [[7, 1, 3]] code. Universality, the ability to compose any operator, requires at least one more operator, such as T. Application of the T operator is more involved, requiring a specially prepared state, but once the T operator has been applied at the top level, only transversal operators are required at lower levels 2.

1 The assumption of a large distance between junctions falls apart when the qubits are large relative to the control structures, as with ion traps. It may be more feasible in such a case to create islands, or even grids, of qubits, rather than a tree.
This is a topic of ongoing research.
2 It is possible to use another code with transversal operators, such as [[23, 1, 7]]. Steane [48] showed that, in a sea-of-qubits model, the [[23, 1, 7]] code had lower overhead than the [[49, 1, 7]] concatenated code. Calculating overall costs, including communication, is a topic of continuing research.
Figure 5.3: Measuring the error syndrome for the [[7, 1, 3]] error-correction code.

5.1.1 The [[7, 1, 3]] Code

Error correction using the [[7, 1, 3]] code consists of measuring the error syndrome (parities of the encoding qubits in various bases) and correcting the codeword based on the measured syndrome. As shown in Figure 5.3, the qubits are rotated to the different measurement bases using Hadamard gates 3. Parity is then measured in much the same way as with a classical code, using two-qubit CNOT operators acting as XORs. Conceptually, the parity can be measured in the same way as for the three-qubit code in Section 2.2, gathering the parity on ancilla |0⟩'s. To perform a fault-tolerant measurement, however, a cat state is used in place of a |0⟩ (see [34]). Figure 5.3 shows all six parity measurements using cat states. (Cat-state creation and verification are not shown in the figure.) As shown in Figure 5.3, measuring parity consists of the following:

1. Prepare a cat state from four ancillae, using a Hadamard gate and three CNOT gates.
2. Verify the cat state by taking the parity of each pair of qubits. If any pair has odd parity, return to step 1. This requires six additional ancillae, one for each pair.
3. Perform a CNOT between each of the qubits in the cat state and the data qubits whose parity is to be measured (see Figure 5.3).
4. Uncreate the cat state by applying the same operators used to create it, in reverse order. After applying the Hadamard gate to the final qubit, A_0, that qubit contains the parity.
5. Measure A_0:

3 The actual measurements are of the generators of the code, i.e., the n-qubit operators under which the code is invariant. For more information, the reader is directed to the literature, especially [18, 17, 21, 20].
A. Given A_0 = α|0⟩ + β|1⟩, create the three-qubit state α|000⟩ + β|111⟩ by using A_0 as the control for two CNOT gates, with two fresh |0⟩ ancillae as the targets.
B. Measure each of the three qubits.
6. Use the majority measured value as the parity of the cat state.

Each parity calculation and measurement has a small probability of introducing an error, either in the measurement or in the data qubits. Hence, the entire syndrome measurement must be repeated until two measurements agree. The resulting syndrome determines which qubit, if any, has an error, and which of the X, Z, or Y operators should be applied to correct it. After correction, the probability of an error in the encoded data is O(p^2).

For the Steane [[7, 1, 3]] code, each parity measurement requires twelve ancillae: four for the cat state to capture the parity, six to verify the cat state, and two additional qubits to measure it. The six parity measurements are each performed at least twice, for a minimum of 144 ancillae to measure the error syndrome! The minimum number of operators required for an error-correction cycle is 38 Hadamards, 288 CNOTs, and 108 measurements. With maximum parallelization, the time required is 24S + 156C + M, where S is the time required for a single-qubit operator, C is the time required for a CNOT, and M is the time required for a measurement, assuming all but the last measurement are performed in parallel with other operators.

5.1.2 Concatenated Codes

The two-level concatenated code, [[7, 1, 3]] concatenated with itself, is measured in the same way as the [[7, 1, 3]] code, except that the qubits are encoded, and each parity measurement uses a 12-qubit cat state instead of the expected 28-qubit (7 × 4) cat state. That is because, in the [[7, 1, 3]] code, a logical Z consists of a Z on each physical qubit. The parity of the logical qubit is the same as that of its physical qubits, and a logical qubit is a valid codeword.
Thus, three four-qubit subsets of the qubits always have even parity, and the parity of the remaining three qubits is the same as that of the logical qubit. The error-syndrome measurement for concatenated codes, using the following steps, is analogous to the singly-encoded [[7, 1, 3]] case, except that the lower-level encodings must be error-corrected between operators:
1. Prepare 12 ancillae in a cat state.
2. Verify the cat state (66 ancillae for pairwise verification, although more efficient verification schemes can be found).
3. Perform CNOTs between the cat-state qubits and the qubits encoding the data qubits whose parity is to be measured.
4. Error-correct the four logical data qubits as in the previous section.
5. Uncreate the cat state, and measure the resulting qubit.

As in the singly-encoded case, each syndrome measurement must be repeated. Since an error on a lower-level qubit invalidates the upper-level measurement, a correction on a lower-level qubit means that the current parity must be remeasured. The high-level measurement must be repeated until two measurements agree. The resulting syndrome determines which logical qubit, if any, has an error. The appropriate X, Z, or Y operator can then be applied to correct the error. After the correction operator is applied to a logical qubit, that qubit must be error-corrected. The probability of an error in the encoded data is O(p^4) after correction.

For concatenated codes, each parity measurement requires 154 Hadamards, 1307 CNOTs, and 174 measurements. Using the same assumptions as for the non-concatenated case, the time required is 26S + 201C + M.

Of course, the [[7, 1, 3]] code can be concatenated more than once. The error-correction procedure for higher levels of concatenation is similar to the above. The probability of error for each parity measurement is O(p^(2^k)) for a code concatenated k - 1 times.

5.2 Communication Costs and Error Correction

In this section, I discuss the communication costs of the error-correction algorithms of Section 5.1, under the constraint of having only nearest-neighbor interactions. First, I analyze the growth rate of errors when using SWAP operations. Second, I analyze quantum teleportation as an alternative to SWAP operations for long-distance communication. Finally, I show that there is a level
Figure 5.4: Swap channel.

of encoding at which teleportation is more efficient, both in terms of distance and in terms of the accumulating probability of correlated errors between redundant qubits in our codewords.

5.2.1 Error Correction Costs

The error-correction algorithms in the previous section assume an ideal situation, where any qubit can interact with any other qubit. In realistic models, qubits can only interact with their nearest neighbors, so before applying a two-qubit operator, one of the operand qubits must be moved adjacent to the other. One way to move quantum data is to use a swap channel. By applying SWAPs between pairs of qubits, the values of half of the qubits are propagated in one direction, while the remaining values are propagated in the reverse direction (see Figure 5.4). A swap channel can be used to supply |0⟩ ancillae for the purpose of error correction, remove used ancillae, and allow for data-qubit movement 4. Figure 5.1 illustrates this for the three-qubit example, using two columns of qubits, one for the data and cat-state qubits, and one for communication. The layout in Figure 5.2 (page 44) can be applied to the [[7, 1, 3]] code, giving a minimum time for an error-correction parity check of

t_ecc = 12 (t_cc + t_cv + t_p + t_cu + t_m)    (5.1)

where
t_cc is the time for cat-state creation;
t_cv is the time for cat-state verification;

4 A second way of moving quantum data is to move the physical implementation of the data. Such a scheme has been proposed (and demonstrated) for ion traps. There are inherent obstacles to physically moving the qubit, just as there are for the swap channel. The first is that movement is controlled by classical signals that introduce motional (phononic) noise. The second is that movement is extremely slow, taking milliseconds instead of microseconds. Nonetheless, such movement is feasible.
An interesting future-work topic is the comparison of movement, swapping, and teleportation.
t_p is the time to entangle the cat state with the parity qubits;
t_cu is the time to uncreate the cat state; and
t_m is the time to perform a triply-redundant measurement.

For [[7, 1, 3]] in the ideal, parallel, sea-of-qubits model,

t_cc = t_single + 3 t_cnot
t_cv = 6 t_cnot + t_meas
t_p = t_cnot, and
t_cu = 3 t_cnot + t_single

where
t_single is the time required for a single-qubit operator;
t_cnot is the time required for a CNOT operator;
t_swap is the time required for a SWAP operator; and
t_meas is the time required for redundant measurement.

If communication by swapping is used,

t_cc = max(t_single, t_swap) + 6 t_swap + 3 max(t_cnot, t_swap)    (5.2)
t_cv = max(t_single, t_swap) + 9 t_swap + 11 max(t_cnot, t_swap)   (5.3)
t_p ≤ 7 t_swap + 4 max(t_cnot, t_swap)                             (5.4)
t_cu = t_swap + 3 t_cnot + t_single + t_meas                       (5.5)

In the Kane model, t_single < t_swap < t_cnot < t_meas. Including parallelism between parity measurements, the minimum time for a syndrome measurement is t_ecc = 221 t_swap + 210 t_cnot + t_single + t_meas. Since measurement is fully parallelizable, these times assume that there are enough measurement units to perform measurement in parallel with the other operations in the error-correction cycle.
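The serial (non-parallelized) total implied by Eqs. 5.1-5.5 can be tallied mechanically. The sketch below assumes the Kane-model ordering given in the text (so max(t_single, t_swap) = t_swap and max(t_cnot, t_swap) = t_cnot); note that the quoted figure of 221 t_swap + 210 t_cnot + t_single + t_meas is lower than this serial tally because parity measurements overlap.

```python
# Tally swap-channel error-correction time as symbolic operator counts.
from collections import Counter

t_cc = Counter(swap=1 + 6, cnot=3)                 # Eq. 5.2
t_cv = Counter(swap=1 + 9, cnot=11)                # Eq. 5.3
t_p  = Counter(swap=7, cnot=4)                     # Eq. 5.4 (upper bound)
t_cu = Counter(swap=1, cnot=3, single=1, meas=1)   # Eq. 5.5

# Eq. 5.1: twelve parity checks per syndrome measurement, run serially.
t_ecc = Counter()
for term in (t_cc, t_cv, t_p, t_cu):
    for op, count in term.items():
        t_ecc[op] += 12 * count

assert t_ecc == Counter(swap=300, cnot=252, single=12, meas=12)
```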
5.2.2 Multilevel Error Correction

For the singly concatenated code, the data movement in the upper level is more complicated, although Eq. 5.1 still holds. The first step in an error-correction parity calculation is to create and verify the 12-qubit cat state. Figure 5.2 on page 44 shows how the ancillae branches are incorporated into the data branches. After verification, the cat state is moved to the appropriate data branches, where it is CNOTed with the data qubits. The cat state is then moved back and uncreated, while the data branches are error-corrected. Finally, a Hadamard is applied to the last cat-state ancilla, which is then redundantly measured. (The layout in Figure 5.2 is not necessarily optimal.) For [[7, 1, 3]] concatenated with itself k times,

t_cc,k ≈ log2(a_k) t_cnot + ((5/2) a_k - 3) t_swap            (5.6)
t_cv,k = 2 a_k t_cnot + (a_k (a_k - 2) + 2) t_swap            (5.7)
t_p,k = a_k + 3 t_b,k + 3 t_b,k-1 + t_ecc,k-1                 (5.8)
t_cu,k = t_cc,k + t_single + t_m                              (5.9)
a_k = 4 · 3^(k-1)                                             (5.10)

t_b,k = { 1,                             k = 1
          B,                             k = 2
          t_b,k-1 + (n + a_1) t_b,k-2,   k = 3
          t_b,k-1 + 2⌈n/2⌉ t_b,k-2,      k > 3 }              (5.11)

where the subscript k indicates the level of encoding, a_k is the number of qubits in the cat state at level k, t_b,k is the branch distance between logical qubits at level k, B is the minimum number of qubits between two branches for a given architectural model, and n is the number of physical qubits in the non-concatenated code.

With communication by swap channel, the SWAP operator becomes very important. In the sea-of-qubits model, SWAPs are not required. In the Skinner-Kane model, using a concatenated [[7, 1, 3]] H-tree layout, SWAPs account for over 80% of all operations.
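The recurrences in Eqs. 5.10-5.11 are easy to evaluate directly. A sketch follows; B = 22, n = 7, and a_1 = 4 are the values used elsewhere in the chapter, and the 2⌈n/2⌉ factor for k > 3 follows the equation as printed.

```python
# Cat-state size a_k (Eq. 5.10) and branch distance t_b,k (Eq. 5.11)
# for the concatenated [[7,1,3]] H-tree.
from math import ceil

def cat_size(k):
    """Number of qubits in the level-k cat state (Eq. 5.10)."""
    return 4 * 3 ** (k - 1)

def branch_distance(k, B=22, n=7, a1=4):
    """Branch distance between logical qubits at level k (Eq. 5.11)."""
    if k == 1:
        return 1
    if k == 2:
        return B
    if k == 3:
        return branch_distance(2, B, n, a1) + (n + a1) * branch_distance(1, B, n, a1)
    return (branch_distance(k - 1, B, n, a1)
            + 2 * ceil(n / 2) * branch_distance(k - 2, B, n, a1))

assert cat_size(1) == 4     # four-qubit cat state for [[7,1,3]]
assert cat_size(2) == 12    # 12-qubit cat state for the concatenated code
assert branch_distance(2) == 22
```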
5.3 Avoiding Correlated Errors

An important assumption in quantum error correction is that errors in the block of qubits of a codeword are uncorrelated. That is, one error in a codeword does not make a second error more likely. To avoid such correlation, it is important that qubits in a codeword do not interact with each other. Unfortunately, a 2D layout cannot avoid indirect interaction of qubits in a block. When moving through a swap channel, all of the qubits in the block must pass through the same line of physical locations. Although the qubits are not swapped with one another, swapping them with some of the same qubits that flow in the opposite direction cannot be avoided. For example, if two qubits of a codeword, d_0 and d_1, both swap with a third qubit q_0, there is some probability that d_0 and d_1 will become correlated with each other through q_0. This occurs when both SWAPs experience a partial failure. In general, if p is the probability of a failure of a SWAP gate, the probability of an error from swapping a logical qubit is

n_k b_k p + C(n_k, 2) b_k p^2 + C(n_k, 3) b_k p^3 + ...,    (5.12)

where n_k is the number of physical qubits in the code at level k, b_k is the number of qubits between branches at level k, C(n, j) is the binomial coefficient, and the higher-order terms are due to correlation between the qubits. From Eq. 5.12, it is clear that correlated errors are dominated by uncorrelated errors when n_k p ≪ 1. Despite that, the first term in Eq. 5.12 grows exponentially with k, since p is constant for a given architecture. Hence, some new mechanism is necessary to transport logical qubits when n_k b_k p ≪ 1 no longer holds. One such method worth exploring is to use intermediate error correction during transport: imagine small twigs growing off the larger trunks of the H-tree. Alternatively, teleportation could be used, since EPR pairs can be purified, whereas generic quantum data cannot be.
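A sketch evaluating Eq. 5.12 numerically, with illustrative (not architecture-specific) values for n_k, b_k, and p, makes the dominance of the linear term concrete when n_k p ≪ 1:

```python
# Probability of an error while swapping a level-k logical qubit past b_k
# intervening qubits, including correlated higher-order terms (Eq. 5.12).
from math import comb

def swap_error_probability(n_k, b_k, p, terms=4):
    """Sum of C(n_k, j) * b_k * p**j for j = 1..terms."""
    return sum(comb(n_k, j) * b_k * p ** j for j in range(1, terms + 1))

# Illustrative values only: n_k * p << 1, so the uncorrelated term dominates.
p, n_k, b_k = 1e-5, 7, 22
total = swap_error_probability(n_k, b_k, p)
linear = n_k * b_k * p
assert (total - linear) / total < 1e-3   # correlated terms are negligible here
```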
By calculating the number of basic computation and communication operations necessary to use teleportation for long-distance communication, it is possible to quantify the level of encoding at which teleportation is more efficient than swapping (see the next section).
k    Teleportation    Swapping, B = 22    Swapping, B = 61    Swapping, B = 285
1        864                  1                   1                     1
2        864                 22                  61                   285
3        864                 77                 194                   866
4        864                363                 948                 4,308
5        864              1,199               3,032                13,560
6        864              4,543              11,680                52,672

Table 5.1: Comparison of the cost of swapping an encoded qubit to the cost of teleporting it. The B-values are the distance between adjacent qubits.

5.4 Teleportation

In the two-rail H-tree model, the cost of teleportation is roughly constant 5. The main cost would be movement of EPR pairs to the qubits to be teleported, except that the EPR pairs can be positioned ahead of time, effectively creating an always-available teleportation channel. (This is similar to the assumption of a constant stream of |0⟩ ancillae being available for error correction.) Hence, the actual cost of teleportation is due to entanglement, swapping the data to the measurement area, and the measurement itself. Table 5.1 lists the number of SWAP operators required to move an unencoded qubit from one level-k codeword to the adjacent codeword for different minimum branch distances, as well as the total operations required to teleport the same qubit. Since a teleportation channel pre-communicates EPR pairs, it has a fixed cost 5. To determine when to use teleportation, the number of computation and communication operations within the teleportation circuit is compared to the swapping costs from the previous section. The crossover point shows at what level k of the tree to start using teleportation instead of swapping for communication. Figure 5.5 illustrates this tradeoff. We can see that for B = 22, teleportation should be used when k ≥ 5.

5 This assumes that the entire word is teleported at the lowest-level encoding in parallel, and ignores the cost of error correction, since the qubit will need to be error-corrected in either case.
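Reading off the crossover directly from Table 5.1 can be automated; the sketch below copies the table's values verbatim and reproduces the B = 22 conclusion:

```python
# Find the encoding level at which teleportation's fixed cost of 864
# operations beats swapping. Swap costs are copied from Table 5.1.
TELEPORT_COST = 864
swap_cost = {
    22:  [1, 22, 77, 363, 1199, 4543],
    61:  [1, 61, 194, 948, 3032, 11680],
    285: [1, 285, 866, 4308, 13560, 52672],
}

def crossover_level(B):
    """Smallest level k (1-based) where swapping costs more than teleporting."""
    for k, cost in enumerate(swap_cost[B], start=1):
        if cost > TELEPORT_COST:
            return k
    return None

assert crossover_level(22) == 5   # matches the k >= 5 conclusion in the text
```

For the larger branch distances the crossover comes earlier, as the table suggests: at B = 61 swapping already loses at level 4.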
Figure 5.5: Cost of teleportation compared to swapping (distance swapped versus level of recursion, log scale). The B-values chosen illustrate break-even points for different levels of recursion.
Chapter 6

System Initialization: An Analysis of the Schulman-Vazirani Algorithm

All proposed schemes for scalable quantum computation require ancilla qubits in a known state. While it is possible to run small demonstrations of quantum computation starting with a mixed state using nuclear magnetic resonance (NMR) [49], such a scheme requires exponential resources, either in time or space. Some hardware schemes propose initialization by measuring, and inverting |1⟩ qubits to |0⟩. However, measurement may be destructive, leaving the measured qubit in an essentially random state. Even on systems such as ion traps, where measurement is not considered destructive, measurement is the least reliable of all the operators, since it forces the system to interact with the outside universe; the value of the resulting state may not be reliable enough for sustainable computation.

In this chapter, I present and analyze algorithms to initialize part of a quantum system to a known state. The algorithms I analyze are: Schulman and Vazirani's quantum heat engine [41], which is attractive since it requires no qubits in a known state; Akira and Kitagawa's adaptation of the Schulman-Vazirani algorithm [4], which emphasizes the use of the boost operator; and Schumacher's noiseless-coding-theorem operator, as described by Cleve and DiVincenzo [10], which requires ancillary |0⟩ qubits to perform arithmetic. I mainly analyze Schulman and Vazirani's algorithm, since they suggest it should have asymptotic performance similar to the Schumacher operator. In general, the algorithms presented achieve initialization by compressing the entropy 1 in the system into a subset of the qubits, leaving the remaining qubits in a known |0⟩ state. Since the phases of the original qubits are unknown, and all phase information is compressed into the non-|0⟩ qubits, the algorithms I explore can be modeled using purely classical Monte Carlo simulation. My simulation model is an adaptation of that suggested by Akira and Kitagawa. It views binary (0/1) data through multiple arrays of indices, sorted by the expected probability values associated with each index after application of the Schulman-Vazirani algorithm (see Section 6.3 for a description of the Akira-Kitagawa simulation method). Additionally, I explore the effect of correlation (and the removal of correlated qubits) on the Schulman-Vazirani algorithm, since both the Schulman-Vazirani paper and the Akira-Kitagawa paper suggest correlation destroys the effectiveness of the algorithms. (In the end, correlation seems to play a small part, indicating the flaw lies elsewhere.)

6.1 Introduction

Schumacher presented a quantum analogue to Shannon's noiseless coding theorem [10], in which a system of N qubits can be compressed to S + δN qubits (via a compression operator), where S is the von Neumann entropy of the system. The compressed qubits can be restored with fidelity (1 - ε), given N - (S + δN) |0⟩ qubits and the inverse of the compression operator. Since quantum operators are reversible, this means the original N - (S + δN) discarded qubits must also have been |0⟩'s. Schumacher gives a three-qubit operator as a demonstration (see Table 6.1 on page 56 and Table 6.2 on page 66).

Schulman and Vazirani's compression algorithm was developed specifically for initializing the atomic nuclei of large polymers in an NMR system (but is easily adapted to other models).
An NMR quantum system at equilibrium favors the lowest-energy spin state, |0⟩, with a bias ε = P_0 - P_1 = tanh(ΔE/kT), where P_m is the probability of a qubit measuring to |m⟩, k is Boltzmann's constant, T is the temperature, and ΔE is the difference in energy between the states. Schulman and Vazirani [41] noted this bias could be exploited by compressing the entropy of the system into infinite-temperature (zero-bias) qubits, leaving the remaining qubits in a state suitably cold to be used as a starting state for computation. They called their algorithm a quantum heat engine, analogous to a Carnot heat engine, since it reversibly moves heat to one area of the system, leaving the rest of the system colder. A Monte Carlo simulation of the Schulman-Vazirani algorithm runs in O(N^2) time, although a real quantum system could exploit parallelism to run in O(N).

1 Information-theoretic entropy is a measure of the randomness in a system (as is entropy in statistical thermodynamics). Shannon [42] showed that the two are inherently related, and gave a closed form for entropy based on the probability distribution of states within a system. Randomness (and hence entropy) is a measure of information content, in that a completely predictable system provides no information.

Unencoded Data Stream    Encoded Data Stream
0 0 0 0 0                0 0 0 0 0
0 0 0 0 1                0 0 0 0 1
0 0 0 1 0                0 0 0 1 0
0 0 1 0 0                0 0 0 1 1
0 1 0 0 0                0 0 1 0 0
1 0 0 0 0                0 0 1 0 1
0 0 0 1 1                0 0 1 1 0
0 0 1 0 1                0 0 1 1 1
0 0 1 1 0                0 1 0 0 0
0 1 0 0 1                0 1 0 0 1

Table 6.1: Schumacher operator for five qubits. If the bias toward |0⟩ of the unencoded input data stream is > 60%, the most probable input states result in |0⟩'s in the leftmost positions of the output states after encoding.

One model particularly well-suited to the Schulman-Vazirani algorithm is the Skinner-Kane proposal. As previously described, the Skinner-Kane proposal [24, 45] is a low-temperature, solid-state analogue of liquid-state NMR, using 31P nuclei with two spin states embedded in a zero-spin 28Si substrate. The nuclei are controlled with a set of anodes, and the entire system is kept at one hundred millikelvin. The bias between spin states at equilibrium can be very high.

Akira and Kitagawa [4] concluded that any algorithm, such as Schulman-Vazirani, based solely on the expected probabilities of individual qubits cannot approach the Schumacher limit (N - (S + δN)), instead being limited to N(1 - S/N + δ). They give a construction based on the Schulman-Vazirani boost operator as proof, claiming that binary correlation prevents the Schulman-Vazirani algorithm from approaching the Schumacher limit. I simulated their algorithm on a 128-qubit array.
Additionally, I simulated an alternative construction, also based on Schulman-Vazirani, that performs better than Akira and Kitagawa's, but still falls short of the Schumacher limit. My implementation focuses on a simplification of the original algorithm, rather than on a single operator. While the results fall within the same bounds as Akira-Kitagawa's, N(1 - S/N + δ), I reached a different conclusion: the low results are not due to correlation, but rather are inherent to the algorithm. As with my simulation of the Akira-Kitagawa scheme, the bulk of my results for this implementation are for an array of 128 qubits. I make no claim that my implementation of the algorithm is optimal. Rather, my results point to new directions for modifying the Schulman-Vazirani algorithm to approach optimality, perhaps by introducing a Schumacher operator on a small number of qubits (small Schumacher operators can be implemented directly from basic gates).

Cleve and DiVincenzo [10] detail an algorithm implementing the generalized N-qubit Schumacher operator in polynomial time. However, their algorithm requires approximately N + log N |0⟩ qubits, which must be generated somehow, for implementation of the classical arithmetic on a quantum computer. (They also give a better implementation that requires O(√N) qubits.) I simulated their algorithm using classical arithmetic only, using the GMP arbitrary-precision package, once again with 128-bit inputs. The running time of the simulation is O(n^4), assuming a straightforward implementation of multiplication in O(n^2).

The remainder of this chapter analyzes the above algorithms and simulation results. Section 6.2 introduces the Schulman-Vazirani algorithm in more detail, including an analysis of the effects of correlation. Section 6.3 introduces Akira and Kitagawa's simulation model and algorithm, and discusses simulation results. Section 6.4 describes the new variant of the Schulman-Vazirani algorithm, and compares it with Akira and Kitagawa's.
Section 6.5 analyzes the Schumacher operator, along with a simulation of Cleve and DiVincenzo's construction. Section 6.7 looks at future directions this work should pursue.

6.2 The Schulman-Vazirani Heat Engine

The algorithm proposed by Schulman and Vazirani is quite complex. In this section, I analyze a simplified version of their algorithm, and examine the expected probabilities of measuring |0⟩, and the correlation between qubits, after a single application of the algorithm. A detailed analysis of the Monte Carlo simulation of the algorithm is given in Sections 6.3 and 6.4.
SCHULMAN-VAZIRANI-OPERATOR(QUBIT a[], QUBIT b[])
    // Note: the first element of each array has index 0.
    for j = 1 to n - 1 do
        // If a_j and b_j differ, set a_j to 1 (using CNOT).
        a_j = a_j ⊕ b_j
        for i = j downto 1 do
            // Inverted-control CONTROLLED-SWAP
            if a_i == 0 then
                swap b_i and b_{i-1}
            fi
            swap a_i and a_{i-1}
        od
    od

Figure 6.1: Simplified Schulman-Vazirani algorithm.

6.2.1 The Simplified Schulman-Vazirani Algorithm

Schulman and Vazirani's original algorithm was targeted towards liquid-state NMR computing. It involves the atomic nuclei of a large polymer arranged into several tapes, with a special nucleus serving as a uniquely addressable qubit. However, the underlying idea is quite elegant and simple. The algorithm concentrates bias in a manner opposite that of von Neumann's "Simulating Fair Coin Flips From Biased Coins." In von Neumann's method [52], a trial consists of two coin flips. If the results of both flips are the same, the trial is discarded. Otherwise, the value of the trial is the value of the first flip. Since heads-tails has the same probability as tails-heads, all of the bias is in the discarded same-valued pairs. In contrast, the Schulman-Vazirani algorithm sorts qubits in two pairs of arrays by whether or not corresponding qubits differ, effectively collecting bias at one end of one array. (The second array is made warmer in the process.)

The pseudocode in Figure 6.1 outlines the Schulman-Vazirani method. First, the qubits are divided into two groups, arrays a and b. If qubits a_i and b_i differ, a_i is inverted. In the process, a_i becomes hotter, but b_i becomes cooler (the ratio of the probabilities of |0,0⟩ to |0,1⟩ increases; see Figure 6.2). The a_i bit is then used to determine whether the b_i bit should be swapped toward the cold end of the b array.

The algorithm separates the b_i into three groups: the original b_0 becomes the pivot, with
the cooled b qubits from the same-value pairs to the left of the pivot, and the hot b qubits to the right. The final position of the pivot falls into a limited range with high probability. The group of b_i's in that range forms the third group, and varies from cold to hot. This is illustrated in Figure 6.3, which gives the expected probabilities, P_{1,out}, of measuring a 1 after an application of the algorithm. (The probability of measuring a 1 before the application of the algorithm, P_{1,in}, is 0.2.) (The values in the figure were calculated using Eq. 6.1 in the next section.) All of the a_i except a_0 become warmer, and the entire a array is reversed. The original a_0 only participates insofar as it is moved from the left to the right, and so doesn't change temperature.

Figure 6.2: Distribution of an (a_i, b_i) pair before CNOT (left), and after (right).

6.2.2 Expected Values and Variance

Assume the original bits in a and b are independent. Let P_0 be the probability that a bit is 0, and P_1 = 1 − P_0 the probability that a bit is 1. Similarly, let P_{00}, P_{01}, P_{10}, P_{11} be the probabilities that both bits are 0, bit a is 0 and bit b is 1, etc. Since the bits are independent, P_{00} = P_0^2, etc. Also, let n_{00}, n_{01}, n_{10}, n_{11} be the number of (a, b) pairs with a = 0, b = 0; a = 0, b = 1; etc. For simplicity, let n be the total number of pairs, N = 2n the total number of bits, L = n_{00} + n_{11} the number of same-valued pairs, and K = n − L the number of different-valued pairs. The expected distribution of b bits after a single iteration is

E[b_i] = \sum_{L=0}^{n} \Pr[L \mid n] \sum_{n_{00}=0}^{L} \Pr[n_{00} \mid L] \begin{cases} n_{11}/(n_{00}+n_{11}), & 0 \le i < L \\ 1/2, & L \le i < n \end{cases}   (6.1)

\Pr[L \mid n] = (P_{00} + P_{11})^L \, (P_{01} + P_{10})^K \binom{n}{L}   (6.2)
Figure 6.3: Distribution of qubits after one application of the Schulman-Vazirani algorithm. The y-axis is the probability, P_{1,out}, of a qubit measuring to 1 after the algorithm. The x-axis is a label: the 50 qubits on the left are the a_i, and the 50 on the right, the b_i. The input probability of a qubit measuring to 1, P_{1,in}, is 0.2 for all qubits (P_{0,in} = 0.8).

\Pr[n_{00} \mid L] = \left(\frac{P_{00}}{P_{00}+P_{11}}\right)^{n_{00}} \left(\frac{P_{11}}{P_{00}+P_{11}}\right)^{n_{11}} \binom{L}{n_{00}}   (6.3)

This is just a binomial distribution 2, scaled to be between n_{11}/(n_{00}+n_{11}) and 1/2. For a binary variable X ∈ {0, 1}, X^2 = X, hence Var(X) = E[X] − E[X]^2. The expected distribution of a bits is

E[a_i] = \begin{cases} 2 P_0 P_1, & 0 \le i < n-1 \\ P_1, & i = n-1 \end{cases}

Notice that if P_0 > 1/2, then 2 P_0 P_1 > P_1, so the a bits become warmer.

6.2.3 Correlation and Covariance

Correlation ρ(X, Y) is a measure of similarity for the random variables X and Y. Positive ρ(X, Y) indicates that they tend to vary together, with ρ(X, Y) = 1 indicating perfect correlation.

2 A binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability P. The values cluster around nP, since this is the most likely outcome. Hence, there is a sharp, but smooth, transition between no and yes values, or in this case, between same-valued pairs and mixed-valued pairs.
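Because the simplified operator involves only CNOTs and controlled swaps, a run on computational-basis inputs can be simulated with classical bits. The following Python sketch (an illustration of my own, not the dissertation's C++ simulator) applies the pseudocode of Figure 6.1 to random inputs and estimates P_{1,out} for each b qubit; for P_{1,in} = 0.2, the cold end comes out near P_{11}/(P_{00}+P_{11}) ≈ 0.059 and the hot end near 1/2, matching Eq. 6.1:

```python
import random

def sv_operator(a, b):
    """Simplified Schulman-Vazirani operator of Figure 6.1, acting on
    classical bits: CNOT a[j] with b[j], then bubble the result down,
    carrying b[j] toward the cold end when the pair was same-valued."""
    n = len(a)
    for j in range(1, n):
        a[j] ^= b[j]                      # 0 iff a[j] and b[j] agreed
        for i in range(j, 0, -1):         # i = j downto 1
            if a[i] == 0:                 # inverted-control C-SWAP
                b[i], b[i - 1] = b[i - 1], b[i]
            a[i], a[i - 1] = a[i - 1], a[i]

def p1_out(n=30, p1=0.2, runs=2000, seed=1):
    """Monte-Carlo estimate of P_{1,out} for each b qubit."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(runs):
        a = [int(rng.random() < p1) for _ in range(n)]
        b = [int(rng.random() < p1) for _ in range(n)]
        sv_operator(a, b)
        for i, bit in enumerate(b):
            counts[i] += bit
    return [c / runs for c in counts]

p_out = p1_out()
# Cold end near P11/(P00+P11) = 0.04/0.68, roughly 0.059; hot end near 1/2.
print(p_out[0], p_out[-1])
```

The array size and run count here are kept small for speed; Figure 6.3 uses 50 + 50 qubits.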
Negative ρ(X, Y) indicates that they tend to vary inversely. A correlation of 0 indicates uncorrelated variables (see Figure 6.4). The formula for correlation is the covariance scaled by the square root of the product of the variances:

\rho(X, Y) = \frac{E[(X - E[X])(Y - E[Y])]}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}

After one iteration of the Schulman-Vazirani algorithm, if the input bits are independent, the correlation ρ(b_i, b_j), 0 ≤ i < j < n, is

\rho(b_i, b_j) = \sum_{L=0}^{n} \Pr[L \mid n] \sum_{n_{00}=0}^{L} \Pr[n_{00} \mid L] \big[ (0 - E[b_i])(0 - E[b_j])\, p_{00} + (0 - E[b_i])(1 - E[b_j])\, p_{01} + (1 - E[b_i])(0 - E[b_j])\, p_{10} + (1 - E[b_i])(1 - E[b_j])\, p_{11} \big]   (6.4)

where

p_{00} = \begin{cases} \frac{n_{00}(n_{00}-1)}{L(L-1)}, & i < L,\ j < L \\ \frac{1}{2}\,\frac{n_{00}}{L}, & i < L,\ L \le j \\ 1/4, & L \le i < j \end{cases}   (6.5)

p_{01} = \begin{cases} \frac{n_{00}\, n_{11}}{L(L-1)}, & i < L,\ j < L \\ \frac{1}{2}\,\frac{n_{00}}{L}, & i < L,\ L \le j \\ 1/4, & L \le i < j \end{cases}   (6.6)

p_{10} = \begin{cases} \frac{n_{11}\, n_{00}}{L(L-1)}, & i < L,\ j < L \\ \frac{1}{2}\,\frac{n_{11}}{L}, & i < L,\ L \le j \\ 1/4, & L \le i < j \end{cases}   (6.7)

p_{11} = \begin{cases} \frac{n_{11}(n_{11}-1)}{L(L-1)}, & i < L,\ j < L \\ \frac{1}{2}\,\frac{n_{11}}{L}, & i < L,\ L \le j \\ 1/4, & L \le i < j \end{cases}   (6.8)

Clearly, correlation between a_i and b_i is bad. Positive correlation reduces the effective P_0, since both P_{00} and P_{11} increase by ρP_0P_1, but the increase to P_{11} is relatively larger (see Figure 6.4). Hence, even though L increases, the cooled qubits are warmer than they otherwise would have been.
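For binary variables, the correlation above reduces to a one-line computation, since Var(X) = E[X] − E[X]^2 and Cov(X, Y) = Pr[X = 1, Y = 1] − E[X]E[Y]. A minimal Python helper:

```python
from math import sqrt

def binary_correlation(p_x1, p_y1, p_both1):
    """Correlation of two {0,1} variables from Pr[X=1], Pr[Y=1], and
    Pr[X=1, Y=1], using Var(X) = E[X] - E[X]^2 (X^2 = X for binary X)."""
    cov = p_both1 - p_x1 * p_y1
    return cov / sqrt((p_x1 - p_x1 ** 2) * (p_y1 - p_y1 ** 2))

# Independent bits: Pr[X=1, Y=1] = Pr[X=1] Pr[Y=1], so rho is 0.
print(binary_correlation(0.2, 0.2, 0.2 * 0.2))   # -> 0.0
# Perfectly correlated bits (X = Y): rho is 1.
print(binary_correlation(0.2, 0.2, 0.2))         # -> 1.0
```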
Figure 6.4: Distributions of a, b with correlation +0.125 (left) and −0.125 (right). (The dashed line is the distribution for independent qubits.)

If the input qubits are not all at the same temperature, the size of L can increase to include all of the output qubits, with a slope such that there is no significant change in temperature. Negative correlation decreases L, so fewer cooled qubits are produced, but they are colder. However, the algorithm only produces positive correlation.

The effect of positive correlation between b_{i−1} and b_i, given in Eq. 6.4, is less obvious. Positive ρ(b_{i−1}, b_i) favors input arrays that are mostly 0's or mostly 1's, giving lower probability to input arrays with the expected number of 1's, since most such inputs will have neighboring pairs that differ. This leads to a flattening of the input probabilities (the less probable extremes become more probable, with a commensurate decrease in the probability of mixed inputs), with a subsequent widening of the pivot zone. As above, in the extreme case, the pivot zone will include all qubits, resulting in no change in temperature.

Interestingly, correlation between the a_i's has no effect on the output b_i's; its main effect is raising the output correlation among the a_i's. This really isn't surprising, since correlation doesn't affect the probability of a bit being a 0 or a 1. Since the a's and b's are independent, only correlation between a's and b's or between b's and b's has an impact on the expected values of the b's.

6.3 Akira and Kitagawa's Model

The purpose of this section is to reproduce the results from Akira and Kitagawa [4] as a comparison baseline for the full Schulman-Vazirani algorithm in the next section (Section 6.4).
Akira and Kitagawa considered the central element in the Schulman-Vazirani algorithm to be the boost operator (equivalent to a CNOT followed by an inverted-control C-SWAP, see Table 6.2 and
Figure 6.5). In [4], they describe a method to simulate a boost-operator-only algorithm, using a Monte-Carlo simulation, as suggested by Schulman and Vazirani in [41]. Akira and Kitagawa do not get the optimal results anticipated by Schulman and Vazirani. Nonetheless, their simulation method allows exploration of several different initialization algorithms.

The Akira-Kitagawa simulation method is a clever way of taking the results of a Monte-Carlo simulation, rearranging them according to the expected value, and running subsequent Monte-Carlo simulations based on those results. From a high-level view, their algorithm runs a function on a set of bits to determine statistical information about the function. An array of indices to the statistical results is sorted by some criterion (in this case, the probability of measuring to 0, low to high). The indices are sorted (instead of the results of a particular run), since the interesting result is the expected value after many runs. The results of a single run are 0's and 1's, and have little meaning outside of a statistical context. The pointer array allows the data to be viewed in different ways: unsorted for the initial function application, sorted according to expected values for a second function application. Applying the function a second time, using the view through the sorted indices, allows a second set of expected values to be generated, and a second array of indices sorted according to these expected values. Further applications of the function (with concomitant data gathering) can be performed by applying the function through the unsorted indices, then through successive arrays of sorted indices.

6.3.1 The Simulator

The simulation algorithm starts by separating the qubits into groups of three. For each of the groups, the boost operator is applied conditional on cooling a.
Enough runs are performed to achieve a desired accuracy in the expected value 3. The qubit indices are then sorted according to the expected value, and the result stored in an array of indices. (Note that the actual qubits themselves are not sorted. The ordering is reused for subsequent iterations, and thus must be stored.) The procedure is then iterated, except that the view of the qubits is filtered through the array of indices: the first application of the boost operator is on unsorted qubits, and each successive application on the result of the previous sort, with the expected values sorted again, and the results of the sort stored as an array. This is continued up to a number of iterations specified by the user, after which no further gains are expected.

Figure 6.5: The Schulman-Vazirani boost operator: a CNOT followed by a controlled-swap between a and b with inverted control by c.

Figure 6.6 on p. 65 shows the iteration process. The zeroth iteration starts with all of the qubits initialized to 0 or 1, with the probability of 1 being P_{1,in} = 0.2. The qubits are divided into groups of three, corresponding to a, b, and c 4 in Table 6.2: qubits 0, 1, and 2 form one group; qubits 3, 4, and 5, another; etc. The boost operator is applied to each group, and the outcomes (0 or 1) for each of the three qubits are recorded. Initialization and boosting is repeated a user-specified number of times to achieve a certain level of precision in the expected results (the first iteration). After the first iteration, qubits 0, 3, 6, ... have approximately the same expectation, as do qubits 1, 4, 7, ..., and qubits 2, 5, 8, .... (Qubits 126 and 127 do not participate.) The indices of the qubits are sorted based on expected value. Iteration 1 in the figure shows the view of the qubits through the sorting filter, giving three distinct groups for a, b, and c, as well as two left-over qubits at P_{1,in}. Iteration 2 starts with initialized qubits (iteration 0), runs them through a single boost operation (iteration 1), then a second boost operation using the sorted-index filter. This is repeated a user-specified number of times, with the resulting expectations once again sorted (into roughly nine groups), and stored, to be used as part of the third iteration, and so forth. Eventually, some qubits will be statistically indistinguishable from P_{1,out} = 0.5 (infinitely hot), and can be removed. Similarly, some qubits will be close enough to P_{1,out} = 0 to be removed.

3 The total number of runs, I, required to achieve a desired accuracy ε is O(ε^{−2}). The expected value depends on the correlation, and the accuracy of the correlation improves only as I^{−1/2}. Akira and Kitagawa also inverted bits with an expected value greater than 0.5. In my simulations, I found that such bits were always < 0.5 + ε.
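The iteration scheme (apply the boost through the unsorted view, then through each stored sorted-index view, re-estimating and re-sorting each time) can be sketched in Python. The boost operator is written here directly as the permutation of Table 6.2; the group sizes, run counts, and omission of the two leftover qubits are simplifications of my own:

```python
import random

# Boost operator taken directly from Table 6.2, as a permutation on the
# eight classical basis states abc (a is the high-order bit): it swaps
# 011 <-> 100 and 110 <-> 111, and fixes the other states.
BOOST = {0b000: 0b000, 0b001: 0b001, 0b010: 0b010, 0b011: 0b100,
         0b100: 0b011, 0b101: 0b101, 0b110: 0b111, 0b111: 0b110}

def akira_kitagawa(n_qubits=126, p1=0.2, iterations=3, runs=2000, seed=7):
    """Sketch of the Akira-Kitagawa iteration loop.

    Each iteration replays, on freshly initialized bits, the boost
    through the unsorted view and then through every previously stored
    sorted-index view; the qubits themselves are never moved, only the
    arrays of indices are sorted by estimated expected value.
    """
    rng = random.Random(seed)
    orders = [list(range(n_qubits))]       # the unsorted view
    expect = [p1] * n_qubits
    for _ in range(iterations):
        counts = [0] * n_qubits
        for _ in range(runs):
            bits = [int(rng.random() < p1) for _ in range(n_qubits)]
            for view in orders:            # replay all stored views
                for g in range(0, n_qubits - 2, 3):
                    i, j, k = view[g], view[g + 1], view[g + 2]
                    v = BOOST[bits[i] << 2 | bits[j] << 1 | bits[k]]
                    bits[i], bits[j], bits[k] = v >> 2 & 1, v >> 1 & 1, v & 1
            for q in range(n_qubits):
                counts[q] += bits[q]
        expect = [c / runs for c in counts]
        orders.append(sorted(range(n_qubits), key=lambda q: expect[q]))
    return expect

e = akira_kitagawa()
print(min(e), max(e))   # coldest well below p1 = 0.2, hottest approaching 0.5
```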
Boosting and sorting only occur on the active set of qubits (those that are neither infinitely hot nor cold enough), as can be seen by looking at the unordered hot qubits of iteration 12.
Figure 6.6: Results of applying the Akira-Kitagawa algorithm through several iterations.

Figure 6.7: Correlation between qubits after a single application of the boost operator. The correlation value is the maximum correlation between this qubit and any other.
Table 6.2: Results of applying the Schulman-Vazirani boost operator, and the three-bit Schumacher operator.

    Input (a b c)   Boost (a b c)   Schumacher (a b c)
    0 0 0           0 0 0           0 0 0
    0 0 1           0 0 1           0 0 1
    0 1 0           0 1 0           0 1 0
    0 1 1           1 0 0           1 0 0
    1 0 0           0 1 1           0 1 1
    1 0 1           1 0 1           1 0 1
    1 1 0           1 1 1           1 1 0
    1 1 1           1 1 0           1 1 1

6.3.2 Analysis

As can be seen in Table 6.2, qubit a is zero for the inputs with two or more zeroes. a is cooled by the boost if |011⟩ has a smaller amplitude than |100⟩, or, assuming no correlation between the qubits,

E[a] > \frac{E[b]\,E[c]}{1 - E[b] - E[c] + 2\,E[b]\,E[c]}   (6.9)

If the qubits are correlated, Eq. 6.9 doesn't hold in general, since Pr[|abc⟩ = |100⟩] ≠ E[a](1 − E[b])(1 − E[c]). In fact, knowing the expected values of a, b, and c for an arbitrary |abc⟩ is not enough to determine the probability state vector. Even knowing the three binary correlations, and the fact that the sum of probabilities is 1, there are seven equations but eight unknown amplitudes. The Akira-Kitagawa model only measures the expected values, and so is underspecified and unlikely to be optimal. Using 128 qubits with P_{1,in} = 0.2, the algorithm produces 11 good |0⟩ qubits.

Akira and Kitagawa claim that their algorithm is optimal, since they have focused on the boost operator. However, their implementation has the drawback of strongly correlating a's and b's (see Figure 6.7). Since correlation is a problem, this is undesirable.

An interesting aside is that the Schulman-Vazirani boost operator is similar to a three-qubit Schumacher operator (see Table 6.2). A logical follow-up is: what is the behavior of the Akira-Kitagawa algorithm if the boost operator is replaced by the three-qubit Schumacher operator? What if a four-qubit Schumacher operator is used? At what point do the results start to approach the Schumacher limit?

4 The a, b, and c are different from the a and b of the Schulman-Vazirani algorithm (Section 6.2).
The Schulman-Vazirani algorithm divides the qubits into two large groups, a and b, of many qubits; the Akira-Kitagawa algorithm divides the qubits into many groups of three qubits, a, b, and c.
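The properties claimed above are easy to check mechanically. The sketch below encodes the boost column of Table 6.2 as a permutation, verifies that output a is 0 exactly for the inputs with two or more zeroes, and confirms the direction of the Eq. 6.9 threshold: for uncorrelated inputs, the boost cools a exactly when E[a] exceeds E[b]E[c]/(1 − E[b] − E[c] + 2E[b]E[c]):

```python
# Boost column of Table 6.2 as a permutation on basis states abc
# (a is the high-order bit): it swaps 011 <-> 100 and 110 <-> 111.
BOOST = {0b000: 0b000, 0b001: 0b001, 0b010: 0b010, 0b011: 0b100,
         0b100: 0b011, 0b101: 0b101, 0b110: 0b111, 0b111: 0b110}

# Reversible: the boost is a permutation of the eight states.
assert sorted(BOOST.values()) == list(range(8))

# Output a is 0 exactly for the inputs with two or more zeroes.
for s, t in BOOST.items():
    assert (t >> 2 & 1 == 0) == (bin(s).count("1") <= 1)

def a_after_boost(ea, eb, ec):
    """E[a] after the boost, for uncorrelated inputs with the given biases."""
    out = 0.0
    for s, t in BOOST.items():
        if t >> 2 & 1:                    # output states with a = 1
            pr = 1.0
            for bit, e in zip((s >> 2 & 1, s >> 1 & 1, s & 1), (ea, eb, ec)):
                pr *= e if bit else 1 - e
            out += pr
    return out

def threshold(eb, ec):
    """Eq. 6.9: the boost cools a exactly when E[a] exceeds this value."""
    return eb * ec / (1 - eb - ec + 2 * eb * ec)

# E[a] = 0.2 is above the threshold (~0.0588), so a is cooled ...
assert a_after_boost(0.2, 0.2, 0.2) < 0.2
# ... while E[a] = 0.03 is below it, and a is warmed instead.
assert a_after_boost(0.03, 0.2, 0.2) > 0.03
print(round(threshold(0.2, 0.2), 4))      # -> 0.0588
```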
Figure 6.8: The Distillation Model. A distillation column (a) is used to separate two species of differing volatility. In each tray, the more volatile species concentrates in the vapor (dashed arrows), which percolates up, while the less volatile species concentrates in the liquid (solid arrows), which overflows its tray to the tray below. The distillation column was the inspiration for the distillation model (b), which separates cold |0⟩ qubits from hot (|0⟩ + |1⟩) qubits. The Schulman-Vazirani operator gradates a set of qubits from cold to hot. Like a distillation tray, it boosts the concentration of the desired species (cold qubits), but does not completely separate them. Unlike a distillation column, which has only two regions (liquid and vapor), the Schulman-Vazirani operator produces a continuum. Similar regions from each source tray (circles) can be moved to the same destination tray (arrows), instead of just up or down a tray. Some qubits can even remain in the same tray. (c) shows a simplification of (b): a single tray (solid box) operates on part of the active set of qubits (solid line), sorted by expected temperature. The active set is re-sorted after the application of the Schulman-Vazirani operator (dotted line), and a new set of qubits is placed in the tray (dotted box).

6.4 The Distillation Model

Intuitively, one could take the results of running the Schulman-Vazirani algorithm, sort the qubits into groups with similar probabilities, and rerun the algorithm on each of these groups, sorting out qubits that are too hot or cold enough, and returning the in-between qubits to the appropriate group. In a sense, this is similar to a fractional distillation column, where each distillation tray (group) feeds purer product to the trays above, and returns a less pure product to the trays below.
The distillation model is in between Schulman and Vazirani's original proposal and the Akira-Kitagawa model. The original Schulman-Vazirani proposal operates on all qubits in the active set at the same time. 5 The Akira-Kitagawa model sorts the results between iterations based on

5 The Schulman-Vazirani proposal divides the qubits into several tape segments. Only one pair of tapes is active at a given time.
expected values, but the tray size is limited to three qubits. The distillation model applies the Schulman-Vazirani operator to different trays sequentially, extracting coolness from the hottest tray and trickling it down to the cold tray. This section describes the results of varying parameters in the distillation model, such as changing the number of trays, adjusting tray overlap, varying the definitions of "too hot" and "cold enough," and removing highly correlated qubits (as well as varying the definition of "highly correlated"). (For the complete set of adjustable parameters, see Appendix C.1.)

DISTILL(QUBIT input array[])
 1  Divide qubits into two groups, a[] and b[]
 2  SCHULMANVAZIRANIOPERATOR(QUBIT a[], QUBIT b[])
 3  Sort the qubits by temperature
 4  Remove those qubits that are infinitely hot or cold enough
 5  The remaining qubits form the active set
 6  Divide the active set into tray(s) as specified by user
 7  for user-specified number of times do
 8      for each tray, hottest to coldest do
 9          Divide tray into two halves, a[] and b[]
10          SCHULMANVAZIRANIOPERATOR(QUBIT a[], QUBIT b[])
11          Remove infinitely hot qubits and qubits that are cold enough
12          Re-sort active set
13          Divide active set into trays
14      od
15  od

Figure 6.9: The Distillation Algorithm. The user specifies the number of trays, how much the trays should overlap, and how many times the routine should be run. Deciding which qubits to remove is non-trivial. One method is to simulate the algorithm using multiple runs with pseudo-random qubits to determine the expected temperature of each qubit, similar to the Akira-Kitagawa simulator (Section 6.3.1).
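The skeleton of Figure 6.9 can be sketched in Python. This is a simplified illustration of my own, not the dissertation's sim.cc: hot/cold removal and correlation pruning are omitted, the tray and run counts are toy-sized, and expected temperatures are estimated by replaying the tray schedule on fresh pseudo-random bits, in the spirit of the Akira-Kitagawa simulator:

```python
import random

def sv_operator(a, b):
    """The Schulman-Vazirani operator of Figure 6.1, on classical bits."""
    n = len(a)
    for j in range(1, n):
        a[j] ^= b[j]
        for i in range(j, 0, -1):
            if a[i] == 0:
                b[i], b[i - 1] = b[i - 1], b[i]
            a[i], a[i - 1] = a[i - 1], a[i]

def estimate(schedule, n, p1, runs, rng):
    """Expected value of each qubit after replaying `schedule`; each
    schedule entry is an index view whose first half plays a[] and
    whose second half plays b[]."""
    counts = [0] * n
    for _ in range(runs):
        bits = [int(rng.random() < p1) for _ in range(n)]
        for view in schedule:
            half = len(view) // 2
            a = [bits[q] for q in view[:half]]
            b = [bits[q] for q in view[half:]]
            sv_operator(a, b)
            for q, v in zip(view, a + b):
                bits[q] = v
        for q in range(n):
            counts[q] += bits[q]
    return [c / runs for c in counts]

def distill(n=40, p1=0.2, trays=2, runs=1000, seed=3):
    """One pass of the Figure 6.9 skeleton (no hot/cold removal)."""
    rng = random.Random(seed)
    schedule = [list(range(n))]                      # initial full-width pass
    exp = estimate(schedule, n, p1, runs, rng)
    order = sorted(range(n), key=lambda q: exp[q])   # coldest first
    size = n // trays
    for t in range(trays - 1, -1, -1):               # hottest tray first
        schedule.append(order[t * size:(t + 1) * size])
        exp = estimate(schedule, n, p1, runs, rng)
        order = sorted(range(n), key=lambda q: exp[q])
    return exp

e = distill()
print(min(e), max(e))   # a gradient from cold (well below 0.2) to hot
```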
6.4.1 Size and Number of Trays

After the initial application of the Schulman-Vazirani operator across all of the input qubits, the qubits form three rough groups: the cooled b-qubits; the hot b-qubits that will be discarded; and the rest of the b-qubits (the pivot region), along with the a-qubits (see Figure 6.3). While the pivot region of the b-qubits grows more slowly than the total number of qubits, it still
seems reasonable to divide the qubits into groups, and try to extract the maximum coolness from each group. The distillation algorithm selects subgroups of the active set of qubits (trays), and applies the Schulman-Vazirani operator (see Figure 6.8). After applying the Schulman-Vazirani operator to each subgroup, the algorithm sorts the active set again, removing those qubits that are too hot, or cold enough. The actual sort must be predetermined, based on the expected temperatures of the qubits. (Schulman and Vazirani [41] suggest using a Monte-Carlo method; since the phase does not matter to the Schulman-Vazirani operator, a classical simulation can determine expected temperatures.) The algorithm selects trays from the hot end of the active set first, in order to maximize the number of cool qubits available at the cool end of the active set.

The user selects the size and overlap of the trays used by the algorithm. (Monte-Carlo simulation can help determine optimal size and overlap, as well as the optimal number of runs through the active set. Some results for different numbers of subgroups are shown in Table 6.3 on p. 73. See Section 6.4.4 for the details of the simulation setup for these results.) The version of the distillation algorithm described by this chapter uses trays with a fixed fraction of the active set, and the overlap is a fixed fraction of the size of the tray. This is not necessarily optimal: better results might be obtainable by using variable-sized trays and overlap, with the size determined by temperature, or some other factor.

6.4.2 How Cold is Cold Enough?

In a quantum system, the reliability of the available operators defines the lower limit on qubit coldness. Fault-tolerant quantum computing requires that the probability of an ancilla |0⟩ measuring to 1 is less than 10^{−4}, perhaps much less.
The Schulman-Vazirani algorithm can produce arbitrarily cold qubits, up to the system limit, with a slight trade-off between coldness and the total number of cold qubits, since there is always a slight gradient in the temperature of the resulting cold qubits. A second limitation, though, comes from calculating the sorting. A fundamental limitation of the Monte-Carlo method used is that the accuracy of the results is limited by I, the number
of samples taken. The accuracy of the expected value for correlation is O(I^{−1/2}). Since the expected value depends on the correlation, the accuracy of expected values is also O(I^{−1/2}), instead of O(I^{−1}) as one might guess. The measured accuracy of expected values is roughly

\epsilon_{\mathrm{exp}} = \frac{2}{\sqrt{I \log_{10} I}}   (6.10)

This was the minimum value used for "cold enough." The maximum value for "too hot" was 0.5 − ε_exp. In the simulations performed, each run took 20,000 samples, for a cold-enough value of 0.00682 (see Section 6.4.4).

Figure 6.10: Effects of correlation between a_i's, between b_i's, and between a_i and b_i. The plots in (a) and (b) show the effects of input correlation between a_i and a_{i−1} (labelled ρ(a[i], a[i−1])); b_i and b_{i−1} (ρ(b[i], b[i−1])); and a_i and b_i (ρ(a[i], b[i])). The y-axis represents the change in expected value, E[correlated] − E[uncorrelated]. The right graphic also plots E[uncorrelated] (not to scale) to show the pivot point. The graphs show results for 5,000,000 samples, giving an expected accuracy in the data of < 0.0003.

6.4.3 Effects of Correlation

To determine the overall effect of correlation on the Schulman-Vazirani operator, I ran several tests using correlated bits as the input to the operator. I looked at three separate sources of
CHAPTER 6. SYSTEM INITIALIZATION: AN ANALYSIS OF THE SCHULMAN-VAZIRANI ALGORITHM71 correlation: in-stream correlation between the a-bits (ρ(a i,a i 1 ) > 0), in-stream correlation between the b-bits (ρ(b i,b i 1 ) > 0), and cross-stream correlation (ρ(a i,b i ) > 0). To create correlation, I generated a uniform random variable, RAND 6. If RAND > ρ(a i,a i 1 ), bit a i would be a copy of of bit a i 1 ; otherwise, it would be generated randomly according to the input probability, P 1,in. Bits with correlation ρ(b i,a i ) or ρ(a i,b i ) were generated in a similar manner. The results are shown in Figure 6.10. It is clear that correlation between b i s flattens the pivot (transition) region, as does correlation between a i and b i. Correlation between the a i s has no impact on the expected values of the b i s, but does impact the correlation on the output a i s. The intuition behind these results was discussed in Section 6.2.3. What is not intuitive is that although every b i is CNOTed with a corresponding a i, only a small number of b s those on the cold side of the pivot region actually become strongly correlated with a s. These qubits are strongly correlated with all of the a s, as well as with their immediate b neighbors. This is because this small number of correlated b s can originate from any position in the original b array, but always wind up in the same region of the output array. Since the a-array is only reversed, the corresponding a s can originate at any position in the array, and wind up spread across the array. (This is in contrast to the Akira-Kitagawa model, where most of the qubits wind up strongly correlated to another qubit.) A simple solution, then, is to remove these highly correlated b qubits before the next iteration. However, similar results are obtained when the correlated qubits are left in place, perhaps because the active set remains larger. 
Future research can determine the actual importance of correlation in the algorithm by simply removing the correlation. At each level of iteration, the expected values of each qubit are known. Rather than reusing the correlated qubits (as I have done in the simulation described in the next section), it would be fairly simple to regenerate fresh bits using the expected values. The fresh bits would have no correlation. (Preliminary results show that correlation, in fact, is not that important.)

6 Using the GNU C Library (glibc) random() function as a pseudo-random source, seeded by the current time value. Glibc's random() and rand() functions are the same. Both are more uniformly random than the original Unix rand() function.
Figure 6.11: Removal of highly correlated bits. Bits with correlation > 0.1 were relocated to the right end of the graph. Three iterations resulted in twelve cold (P_{1,out} < 0.01) qubits.

6.4.4 Simulation Setup and Results

The results of distillation are given in Table 6.3. In all cases, P_{1,in} = 0.2. Each iteration took 20,000 samples, giving ε_exp = 0.00682 (from Eq. 6.10). The resulting cold bits ranged from 0.0002 < P_{1,cold} < ε_exp. The results were largely immune to most of the tunable parameters, with the exceptions being the number of trays and over/underlap. b-array bits with correlation > 0.1 were removed.

The simulations were run on dual-processor Dell PowerEdge servers with two Intel Xeon 2.80 GHz hyperthreaded processors. At most three simulations were run at any given time, leaving an additional thread to handle the operating system (Red Hat Enterprise Linux 3) and other tasks. The running time is roughly linear in the number of trays. In large part, this is because the number of iterations was one more than twice the number of trays (the amount of overlap, and the speed with which cold bits are removed, are also factors). The running time is quadratic in the size of the input array. This is expected, not only because of the nested-loop structure of the basic algorithm, but also because the simulator compiles binary correlation statistics for each pair of bits in order to remove the most highly correlated bits.
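Eq. 6.10 reproduces the quoted cold-enough threshold directly; a short check:

```python
from math import sqrt, log10

def expected_value_accuracy(samples):
    """Eq. 6.10: measured accuracy of expected values for I samples."""
    return 2 / sqrt(samples * log10(samples))

# 20,000 samples per run gives the cold-enough value used in the text.
eps = expected_value_accuracy(20000)
print(round(eps, 5))    # -> 0.00682
too_hot = 0.5 - eps     # the matching too-hot threshold
```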
Distillation Results

    Total Bits   Cold Bits   Running Time   Trays   Over/Underlap
    128          13          25s            2       1.28–1.3
    128          13          35s            3       0.94–0.96
    128          13          65s            7       0.92
    128          13          82s            9       0.90
    128          13          108s           10      0.74
    128          13          112s           11      0.74–0.76
    128          13          108s           12      1.2
    256          32          269s           9       1.3
    256          32          357s           12      1.36
    384          55          1144s          19      1.28–1.30
    384          44          731s           12      1.36
    512          82          1515s          16      1.36
    512          82          2269s          27      1.20
    512          63          1084s          12      1.36

Table 6.3: Distillation results for 128, 256, 384, and 512 bits. The overlap value scales the size of the tray. Values greater than 1 indicate overlap; less than 1, underlap. The tray is scaled toward the center of the active bits (away from, for underlap). The non-maximal value of 12 trays for 128, 384, and 512 bits is included to show the quadratic run time. The 128-bit 12-tray run time is low, since the overlap was 1.2 instead of 1.36. Larger overlaps look at more bits per tray, increasing the run time. The run times listed in the table are averages for a single run of the controlling Tcl script (with multiple runs of the simulation code).

Finding parameters to give the maximum number of cold bits is more complicated. The best results were found for a given range of trays for each array size: 2–13 for 128 bits, 5–13 for 256 bits, 5–19 for 384 bits, and 8–31 for 512 bits. For each set of trays, the overlap value was adjusted between 0.5 and 1.9 in increments of 0.02. A bash shell script kept three copies of a Tcl program running, each with a different number of trays. The Tcl program varied the overlap parameter to find a local maximum using a modified binary search, with the assumption that there was a single maximal value. For each of the overlap parameters and number of trays tried, the main body of code (sim.cc) was recompiled and run. The resulting answer included both the number of cold bits and a value between 0 and 1, proportional to the area between the resulting curve and 0.5.
(This was necessary to break ties between two runs with the same number of cold bits.) The results of the distillation algorithm can be compared to those obtained using Cleve and DiVincenzo's Schumacher algorithm (see the next section). The Schumacher operator is the
theoretical limit, and as expected, it outperforms the distillation algorithm.

Figure 6.12: The Schumacher operator. The input bits have a probability of 0.2 of measuring to 1. Successive iterations discard most of the hot bits, to extract more cold bits. Ancilla bits consumed by the process are not shown.

6.5 The Schumacher Operator

The Schumacher operator is asymptotically the maximum obtainable compression. Cleve and DiVincenzo [10] detail an algorithm implementing the Schumacher operator in polynomial time. In fact, the algorithm uses only classical arithmetic, and is straightforward to simulate. The downside to the algorithm is that it requires scratch registers initialized to |0⟩'s, both for classical arithmetic and to store intermediate results.

The basic idea of the Schumacher operator is to effectively sort the input states using two keys (see Table 6.1 on page 56). The primary key is the rank: the number of ones in the state. The secondary key is the state's binary value. The operator converts a given state to its index in the sort. Cleve and DiVincenzo's algorithm calculates the rank, the number of elements in lower ranks (giving the base position for the rank), and the offset into the rank for a given value. The sum of the base position for a rank and the offset into the rank gives the index for that state 7.

7 Note that the states are not actually sorted, just indexed. Also, since the index for a given state is unique, the operator is a permutation.
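The rank-base-plus-offset indexing can be sketched with classical arithmetic. The Python function below is my illustration of that computation; for three bits it reproduces the Schumacher column of Table 6.2 (e.g. 011 → 100 and 100 → 011), and it is a permutation for any width:

```python
from math import comb

def schumacher_index(state, n_bits):
    """Index of a basis state under the two-key sort (rank, then value).

    The base of rank k is sum_{i<k} C(n_bits, i); the offset adds, for
    each one-bit at position p (LSB = bit 0), the number of same-rank
    states whose corresponding one-bit lies strictly to the right."""
    rank = bin(state).count("1")
    base = sum(comb(n_bits, i) for i in range(rank))
    offset, k = 0, rank
    for p in range(n_bits - 1, -1, -1):   # scan one-bits left to right
        if state >> p & 1:
            offset += comb(p, k)
            k -= 1
    return base + offset

# For three bits this reproduces the Schumacher column of Table 6.2:
table = {s: schumacher_index(s, 3) for s in range(8)}
assert table[0b011] == 0b100 and table[0b100] == 0b011
assert table[0b110] == 0b110 and table[0b111] == 0b111
# The map is a permutation, as required of a reversible operator.
assert sorted(table.values()) == list(range(8))
```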
Determining the rank, and the base position for that rank, is straightforward. If there are N bits, then there is C(N, 0) = 1 value with all zeroes, C(N, 1) = N values with one 1, etc. The smallest value with k ones is at \sum_{i=0}^{k-1} \binom{N}{i}. Within rank k, if the leftmost one is bit i, then there are C(i, k) values in the rank with the leftmost one to the right of bit i (the least significant bit is bit 0). Similarly, within values with the leftmost one in bit i, if the second leftmost one-bit is in position j, then there are C(j, k − 1) values with the second bit to the right of bit j. We can sum over the positions of all k one-bits to get the absolute position of the value in rank k.

Like the Schulman-Vazirani algorithm, the Schumacher operator can be iterated to further cool the warm bits. In the simulation, the number of bits to include in each iteration was (arbitrarily) chosen to maximize the number of cold bits produced for that iteration. The results are shown in Figure 6.12. There are 128 bits total. Schumacher's noiseless quantum coding theorem says that for a given density operator, ρ, the system can be encoded, with arbitrarily high fidelity, into a minimum of 93 qubits. Figure 6.12 suggests that we can compress the qubits further, using as few as 85 qubits to represent the data, resulting in 43 reliable |0⟩ qubits. However, the fidelity is not arbitrary. The Monte-Carlo simulation used only sampled one million iterations, meaning that the error is only known to be below about 10^{−6}; the fidelity is not arbitrary. A similar simulation with 100,000 samples resulted in 52 |0⟩ qubits (P_1 < 10^{−5}).

6.6 Conclusions

Given the above, it appears that the reason the Schulman-Vazirani algorithm doesn't approach the Schumacher limit is not because of correlation, but because at least half the bits in the working set are made warmer in each iteration.
No matter how the bits are paired, or the order in which the boost operator is applied, this remains true. However, unlike the Schumacher operator, the Schulman-Vazirani algorithm can create cold qubits without ancillae, so it could be used to provide an initial set of cold qubits for the Schumacher operator.
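The rank-and-offset indexing described in Section 6.5, together with the 93-qubit Schumacher bound quoted above, can be checked classically. The sketch below is a hypothetical Python re-implementation of the indexing arithmetic (the dissertation's own simulator is not reproduced here); `schumacher_index` and its helper names are illustrative, not taken from the original code:

```python
from math import comb, log2, ceil

def schumacher_index(bits):
    """Index of a bit-string under the rank-then-value sort.

    Primary key: rank (number of ones); secondary key: binary value.
    The base position of rank k is sum_{i<k} C(N, i); the offset within
    the rank sums C(position, ones-remaining) over the one-bit positions,
    exactly as in the Cleve-DiVincenzo construction described above.
    bits[0] is the least significant bit.
    """
    n, k = len(bits), sum(bits)
    base = sum(comb(n, i) for i in range(k))
    offset, remaining = 0, k
    for pos in range(n - 1, -1, -1):          # scan from the leftmost bit
        if bits[pos]:
            offset += comb(pos, remaining)    # values with this one-bit further right
            remaining -= 1
    return base + offset

# The mapping is a permutation: every N-bit string gets a unique index.
N = 8
idxs = sorted(schumacher_index([(v >> b) & 1 for b in range(N)])
              for v in range(2 ** N))
assert idxs == list(range(2 ** N))

# Schumacher bound for 128 qubits with P(1) = 0.2: about n * H(p) qubits.
H = lambda p: -p * log2(p) - (1 - p) * log2(1 - p)
print(ceil(128 * H(0.2)))   # 93
```

The permutation check confirms the footnote's observation that the index for a given state is unique, and the entropy calculation reproduces the 93-qubit minimum for 128 bits at P_1 = 0.2.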
6.7 Future Directions

The experiments suggest that correlation does not play an important role in the Schulman-Vazirani algorithm. An additional experiment has been run8 that compares the results of using correlated bits with using freshly generated bits with the same P_1. Initial indications are that the impact of correlation is negligible: the results are not significantly different from those of the distillation method or the Akira-Kitagawa algorithm.

One approach not tried in this experiment is to match qubits whose expected values result in a cold-enough qubit after the boost operation. While this might be an interesting experiment, one of the qubits will still be made warmer. However, this might be a way to determine (and extract) the maximum obtainable coolness.

As noted in Section 6.3, the Schulman-Vazirani boost operator is similar to a three-bit Schumacher operator. Could a larger Schumacher operator be substituted for better performance? Small Schumacher operators can be built without ancillary 0 qubits. What are their correlation properties? How many bits would need to be operated on in order to approach Schumacher behavior?

The Schulman-Vazirani algorithm's performance is not as good as could be desired. However, it can produce arbitrarily cold qubits, if one is willing to discard most of the input qubits. The Schumacher operator can be constructed to require O(√n) ancillae. The Schulman-Vazirani algorithm could be applied to a small subset of qubits, and the resulting cold qubits could be used to bootstrap a series of Schumacher operators, each roughly squaring the number of cold qubits.

8 The results have not been verified, though, due to the tragic death of Matthew Godwin, the master's student doing the work.
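The bootstrap strategy sketched above can be quantified under simple assumptions. Suppose a Schumacher stage on n qubits consumes about c·√n clean ancillae (the O(√n) construction mentioned above) and yields roughly n(1 − H(p)) cold qubits, where H is the binary entropy of the one-probability p. Then k cold qubits can serve a block of n ≈ (k/c)² qubits, so the cold-qubit count roughly squares per stage. The constant c = 1 and the yield model in this Python sketch are illustrative assumptions, not results from the simulations:

```python
from math import log2

def H(p):
    """Binary entropy of a bit with P(1) = p."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bootstrap(cold, p=0.2, c=1.0, stages=4):
    """Cold-qubit counts after repeated Schumacher stages.

    Toy model only: assumes a stage on n qubits consumes c*sqrt(n)
    ancillae (so n = (cold/c)^2 is the largest block the available cold
    qubits can serve) and extracts about n*(1 - H(p)) cold qubits.
    """
    history = [cold]
    for _ in range(stages):
        n = int((cold / c) ** 2)        # block size the ancillae can serve
        cold = int(n * (1 - H(p)))      # cold qubits extracted from the block
        history.append(cold)
    return history

# Starting from a small seed (e.g. from Schulman-Vazirani), growth is
# roughly quadratic per stage.
print(bootstrap(8))
```

Even under this crude model, a handful of seed qubits from the Schulman-Vazirani algorithm grows to hundreds of thousands of cold qubits in four stages, which is the sense in which each stage "roughly squares" the supply.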
Chapter 7

Conclusions and Future Work

This document covers a wide range of topics that will be important for building a scalable quantum computer. Some, such as trading complexity for speed when using a memory hierarchy and determining the feasibility of different communication methods, are familiar to classical computer architects. Others, such as composing operators and compressing entropy for system initialization, are uniquely quantum. Like most research, the areas I explored still have many open questions.

In Chapter 3, I explored single- and two-qubit operator implementations for the Skinner-Kane model, given its two basic single-clock-tick operators. Although the operators are not orthogonal, several standard operators can be efficiently approximated. I also observed that the model's symmetric two-qubit operator family may not be the best choice for implementing arbitrary multiqubit operators. Instead, I have proposed using multiple cross-exchange operators to simplify implementing standard two- and three-qubit operators. Additionally, there is the open question of characterizing and parameterizing the single-qubit operator space to facilitate searching for operator implementations.

Chapter 4 examined the feasibility of creating a quantum memory hierarchy using error-correction codes of different densities. Not only are significant savings possible, but, like their classical counterparts, quantum algorithms can efficiently take advantage of a memory hierarchy. That chapter does not include the overhead of communication, and ignores the potential additional savings from using a phase-only code for the top level of error correction.

Error correction is essential to quantum computation. Chapter 5 introduced an H-tree
layout scheme to map concatenated error-correction codes to hardware, and gave step-by-step algorithms for error correction using the [[7, 1, 3]] code, concatenated with itself. Using the Skinner-Kane model, the cost overhead for communicating inside the H-tree was calculated and compared to the cost of using teleportation for communication, giving a parameterized cross-over point at which teleportation is more efficient.

One of the drawbacks inherent in the Skinner-Kane model is that communication is limited to fairly narrow wires. Although I showed that correlated errors introduced during communication are negligible, I did not discuss bandwidth issues. One reason is that all of the calculations were done by hand-counting operators! With the quantum simulator, investigating the implications of bandwidth should be straightforward. A second direction to pursue is how this work relates to other implementation schemes, such as ion traps or other solid-state technologies. The ion-trap model does not rely on the SWAP operator, using the MOVE operator instead, and, like some other technologies, does not have the same space constraints as Skinner-Kane. At what point is teleportation more efficient in these other technologies?

Chapter 6 dealt with system initialization. Not all quantum technology schemes require initialization (NMR is a good example), but to be scalable, all schemes require a source of reliable 0 ancillae for error correction, as well as for EPR pair generation and purification. Some systems (ion traps) have a reliable measurement operator that leaves qubits in the state measured, while others (Skinner-Kane) do not. It is these latter systems that need to be initialized. Chapter 6 compared two different implementations of the Schulman-Vazirani algorithm and an algorithm implementing the Schumacher operator.
In that chapter, I addressed the question of whether the Schulman-Vazirani algorithm could be used for initializing a system, and if not, why not. Previous researchers [4] concluded that the algorithm results in too much correlation between intermediate qubits. In my simulations, removing high-correlation qubits did not produce results approaching the Schumacher limit, so my conclusion was that the algorithm adds too much entropy during each iteration, eventually balancing the entropy removed. Nonetheless, the Schulman-Vazirani algorithm could be used to provide ancillae for implementing the Schumacher operator algorithm. Future directions for this work might answer the following questions: How large a Schumacher operator is feasible, given its high-order running time? Could the Schulman-Vazirani algorithm be made more efficient, perhaps by applying the
Schumacher operator to a small number of qubits, instead of using the three-qubit boost operator? And finally, since a real system will introduce errors during initialization, how reliable are the resulting ancillae?

7.0.1 The Future of Quantum Computing

Without being clairvoyant, it is next to impossible to predict when quantum computing will become practical (or at least feasible). The current direction of research seems to be primarily in the area of ion traps, so much of the current architectural research is focused in that area. It is very likely that a sizable ion-trap system (> 50 qubits) will be built in the next five years. I believe that in the long run, though, a solid-state system will be more scalable: just as silicon transistors replaced vacuum tubes, Skinner-Kane (or another solid-state technology1) will replace ion traps. Regardless, research tends to be synergistic: the work I've done based on Skinner-Kane has spawned research on a simulator for ion-trap systems that is being revamped for solid-state systems.

The future is out there, waiting to be discovered, one experiment, one step, at a time. This document only scratches the surface of many intriguing facets of quantum-computer architectural research. I see it as a road map to future directions, in many technologies and platforms; a play in which I am both a character and an actor. And, the curtain has only just now risen!

1 Quantum dots in particular are interesting, because of their speed. They have the drawback of having short-lived states.
Appendix A

Operator Approximation

The following Octave (MATLAB-style) code implements the individual rotations available in the (modified) Skinner-Kane model. The important functions are: rotkanex(phi), the operator for a rotation of phi radians around the new convolved axis; Kxtick(ticks), the operator that occurs when the A-gate is off for ticks clock ticks; and cycle(ticks, maxtick), the operator that occurs when the A-gate is on for ticks(1) ticks following t = 0, off for ticks(2) ticks, etc., and on again by maxtick (ticks is an array of values). The final important function, like(ticks, gate, maxtick), tries to adjust the values in ticks to closely approximate gate. Each value in ticks is adjusted by a single tick, keeping track of how much the resulting operator is like gate. The closest match is then iterated over, stopping when a local minimum is reached. The end of the code shows the sequences of clock-tick counts for close approximations to several single-qubit operators: X, Z, H, S, and T. The various (random) starting points are not given, just the resulting closest match (local minimum).

% The usual Pauli matrices (and the Hadamard built from them)
X = [ 0, 1; 1, 0 ]
Z = [ 1, 0; 0, -1 ]
H = (X + Z)/sqrt(2)

function sx = rotx (phi)
  sx = [cos(phi/2), -i*sin(phi/2); -i*sin(phi/2), cos(phi/2)];
endfunction
function sy = roty (phi)
  y = [0, -i; i, 0];
  sy = cos(phi/2) * eye(2) - i * sin(phi/2) * y;
endfunction

function sz = rotz (phi)
  sz = [1, 0; 0, exp(i * phi)];
endfunction

% Convolving two simultaneous rotations. th is atan(mag(x)/mag(z)),
% where mag(x) and mag(z) are the magnitudes of the rotations.
function xz = xztheta(th)
  X = [ 0, 1; 1, 0 ];
  Z = [ 1, 0; 0, -1 ];
  xz = X * sin(th) + Z * cos(th);
endfunction

%theta = atan(8./3.)/2 % angle from [1; 0], the z-axis
%kanehf = xztheta(theta)

% Convolving for the Skinner-Kane model. mag(x)/mag(z) is 256/96 = 8/3.
% (In one tick, magnetic evolution rotates 2 * pi / 256, hyperfine rotates
% 2 * pi / 96.)
function kx = rotkanex(phi)
  %theta = atan(8./3.)/2;
  theta = atan(8./3.);
  kanehf = xztheta(theta); % hyperfine + magnetic
  kx = eye(2) * cos(phi/2) + kanehf * i * sin(phi/2);
endfunction

% Rotating by pi around the convolved axis
Kx = rotkanex(pi) / i

% In one tick, magnetic evolution rotates 2 * pi / 256
function gate = Ztick(ticks)
  gate = rotz(2 * pi * ticks / 256.)';
endfunction

% In one tick, hyperfine rotates 2 * pi / 96
function gate = Xtick(ticks)
  gate = rotx(2 * pi * ticks / 96)';
endfunction

% So in one tick, kanex rotates 2 * pi * sqrt(1/96.^2 + 1/256.^2)
function gate = Kxtick(ticks)
  kphi = 2 * pi * sqrt(1/96.^2 + 1/256.^2);
  gate = rotkanex(ticks * kphi)';
endfunction

% Given an array of cycle counts (ticks), assume the A-gate is on for
% ticks(1) cycles, off for ticks(2) cycles, etc., and then on until
% maxtick; that is, assume first a rotation around Z, then around Kx,
% etc.
function final = cycle(ticks, maxtick)
  Zturn = 1;
  final = eye(2);
  count = 0;
  for this = ticks
    this = abs(this);
    if (Zturn)
      final = Ztick(this) * final;
    else
      final = Kxtick(this) * final;
    endif
    Zturn = !Zturn;
    count = count + this;
  endfor
  if (count > maxtick)
    result = count - maxtick
    return
  else
    final = Ztick(maxtick - count) * final;
  endif
  if (final(1,1) > final(2,1))
    final = final/final(1,1)*abs(final(1,1));
  else
    final = final/final(2,1)*abs(final(2,1));
  endif
endfunction

function result = like(ticks, gate, maxtick)
  Zturn = 1;
  final = eye(2);
  count = 0;
  for this = ticks
    this = abs(this);
    if (Zturn)
      final = Ztick(this) * final;
    else
      final = Kxtick(this) * final;
    endif
    %this, final
    Zturn = !Zturn;
    count = count + this;
  endfor
  if (count > maxtick)
    result = count - maxtick
    return
  else
    final = Ztick(maxtick - count) * final;
    %this, final
  endif
  % Normalize on the larger value in the first column
  if (final(1,1) > final(2,1))
    final = final/final(1,1)*abs(final(1,1));
  else
    final = final/final(2,1)*abs(final(2,1));
  endif
  % Sum of squares of differences of columns, then sum of rows
  %result = sum(sumsq(abs(final - gate)),2);
  result = sum(sum(abs(final - gate)),2);
endfunction

function result = likex(foo)
  result = like(foo, [0,1;1,0], 512);
endfunction

%[foo,info] = fsolve('likex', [104;24;0;24])
global bar = [100;22;10;24]
bar = [96;21;13;24;53;4]
bar = [92;7;3;15;14;23;51;5]

function fx(arg, val)
  global bar
  if (arg > 0 && arg <= length(bar)); bar(arg) = val; endif
  disp(bar'), likex(bar)
endfunction

function result = likeh(foo)
  %foo
  result = like(foo, [1,1;1,-1]/sqrt(2), 512);
  %result
endfunction

global barh = [112,32,0,0]
barh = [1,19,65,69]       % ans = -0.0477
barh = [1 19 63 69 185 1] % ans = 0.0057988

function fh(arg, val)
  global barh
  if (arg > 0 && arg <= length(barh)); barh(arg) = val; endif
  disp(barh), likeh(barh')
endfunction

global barh2 = [0,1,20,83,152,337,338]
function fh2(arg, val)
  global barh2
  if (arg > 0 && arg <= length(barh2)); barh2(arg) = val; endif
  xbarh2 = diff(barh2);
  disp(barh2), likeh(xbarh2')
endfunction

% vect is a set of pulse boundaries, starting with 0, and ending with a
% multiple of the period. It is assumed that the first pulse is on;
% that is, a Z rotation.
function [answer, baz] = isearch(vect, gate)
  baz = sort(vect);
  answer = like(diff(baz), gate, baz(length(baz)));
  idx = 1;
  type = 0;
  while (idx)
    idx = 0;
    for i = 2:length(vect)-1
      if (baz(i) < baz(i+1))
        baz(i) += 1;
        tmp = like(diff(baz), gate, baz(length(baz)));
        if (tmp < answer); idx = i; ofs = 1; answer = tmp; endif
        type = 1;
        baz(i) -= 1;
      endif
      if (baz(i) > baz(i-1))
        baz(i) -= 1;
        tmp = like(diff(baz), gate, baz(length(baz)));
        if (tmp < answer); idx = i; ofs = -1; answer = tmp; endif
        type = 1;
        baz(i) += 1;
      endif
    endfor
    for i = 2:length(vect)-2
      if (baz(i+1) < baz(i+2))
        baz(i) += 1; baz(i+1) += 1;
        tmp = like(diff(baz), gate, baz(length(baz)));
        if (tmp < answer); idx = i; ofs = 1; answer = tmp; endif
        type = 2;
        baz(i) -= 1; baz(i+1) -= 1;
      endif
      if (baz(i) > baz(i-1))
        baz(i) -= 1; baz(i+1) -= 1;
        tmp = like(diff(baz), gate, baz(length(baz)));
        if (tmp < answer); idx = i; ofs = -1; answer = tmp; endif
        type = 2;
        baz(i) += 1; baz(i+1) += 1;
      endif
    endfor
    if (idx); baz(idx) += ofs; endif
    answer, baz
  endwhile
endfunction

function [answer, baz] = risearch(vect, gate)
  for i = 2:length(vect)-1
    vect(i) = floor(rand() * vect(length(vect)));
  endfor
  [answer, baz] = isearch(vect, gate);
endfunction

function template()
  tmplt = [0 0 0 0 0 0 0 512];
  [answer, baz] = risearch(tmplt, H);
  b = [answer, baz]
  for i = 1:1000
    fprintf(stderr, "Iteration %d, ans = %7.5f\n", i, b(1,1))
    [answer, baz] = risearch(tmplt, H);
    if (answer < b(1,1))
      b = [answer, baz; b]
    endif
  endfor
  fdisp(stderr, b)

% Find H:
% tmplt = [0 0 0 0 0 0 0 512]; [answer, baz] = risearch(tmplt, H);
% b = [answer, baz];
% for i=1:10000;
%   fprintf(stderr, "(H:) Iteration %d, ans = %7.5f\n", i, b(1,1));
%   [answer, baz] = risearch(tmplt, H);
%   if (answer < b(1,1));
%     b = [answer, baz; b], fdisp(stderr, b);
%   endif; endfor;
% Best two-pulse sequence:
% 0.02017, [0, 7, 98, 140, 205, 256]
% Best three-pulse sequences:
% 0.01381, [0, 41, 47, 94, 143, 181, 202, 256]
% 0.01369, [0, 18, 34, 115, 303, 463, 501, 512]
% 0.00894, [0, 0, 61, 115, 333, 371, 505, 512]
% 0.00330, [0, 24, 53, 116, 296, 461, 486, 512]
% 1.0940e-05, [0, 1, 20, 83, 152, 337, 338, 512]

% Find X:
% tmplt = [0 0 0 0 0 0 0 512];
% [answer, baz] = risearch(tmplt, X);
% b = [answer, baz]; for i=1:1000;
%   fprintf(stderr, "(X:) Iteration %d, ans = %7.5f\n", i, b(1,1));
%   [answer, baz] = risearch(tmplt, X);
%   if (answer < b(1,1)); b = [answer, baz; b], fdisp(stderr, b);
%   endif; endfor;
% Best for X: (3 pulses, 2 periods)
% 0.01086, [0, 9, 49, 253, 314, 339, 453, 512]
% 0.00796, [0, 148, 214, 254, 263, 362, 480, 512]
% 0.00601, [0, 13, 18, 256, 262, 331, 463, 512]
% 0.00383, [0, 48, 86, 315, 389, 430, 450, 512]
% 0.00378, [0, 180, 220, 386, 421, 441, 490, 512]

% Find Z:
% tmplt = [0 0 0 0 0 0 0 512]; [answer, baz] = risearch(tmplt, Z);
% b = [answer, baz]; for i=1:1000;
%   fprintf(stderr, "(Z:) Iteration %d, ans = %7.5f\n", i, b(1,1));
%   [answer, baz] = risearch(tmplt, Z); if (answer < b(1,1));
%   b = [answer, baz; b], fdisp(stderr, b); endif; endfor;
% 0.15381, [0, 65, 120, 120, 206, 258, 400, 512]
% 0.13150, [0, 40, 43, 86, 321, 333, 458, 512]
% 0.07690, [0, 2, 128, 211, 221, 336, 467, 512]
% 0.01179, [0, 71, 115, 170, 193, 280, 331, 512]
% 0.00575, [0, 14, 51, 203, 205, 248, 461, 512]
% 0.00363, [0, 19, 19, 98, 139, 373, 505, 512]

% Find T:
% T = [1, 0; 0, (1+i)/sqrt(2)]; tmplt = [0 0 0 0 0 0 0 512];
% [answer, baz] = risearch(tmplt, T);
% b = [answer, baz];
% for i=1:10000;
%   fprintf(stderr, "(T:) Iteration %d, ans = %7.5f\n", i, b(1,1));
%   [answer, baz] = risearch(tmplt, T);
%   if (answer < b(1,1)); b = [answer, baz; b], fdisp(stderr, b);
%   endif; endfor;
% 0.01373, [0, 27, 66, 149, 279, 337, 452, 512]
% 0.01235, [0, 32, 85, 138, 248, 324, 382, 512]
% 0.01211, [0, 53, 84, 225, 256, 344, 446, 512]
% 0.00973, [0, 48, 244, 269, 314, 405, 459, 512]
% 0.00823, [0, 78, 154, 179, 200, 374, 473, 512]

% Find S:
% S = [1, 0; 0, J]; tmplt = [0 0 0 0 0 0 0 512];
% [answer, baz] = risearch(tmplt, S); b = [answer, baz];
% for i=1:10000;
%   fprintf(stderr, "(S:) Iteration %d, ans = %7.5f\n", i, b(1,1));
%   [answer, baz] = risearch(tmplt, S); if (answer < b(1,1));
%     b = [answer, baz; b], fdisp(stderr, b); endif; endfor;
% 0.01425, [0, 13, 243, 268, 357, 372, 422, 512]
% 0.00314, [0, 60, 145, 207, 212, 382, 478, 512]
endfunction
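The convolved-axis construction above is easy to sanity-check outside Octave. The sketch below is a hypothetical Python translation of xztheta and rotkanex (using plain 2x2 complex lists, so no external libraries are assumed); it verifies that a pi rotation about the convolved axis, divided by i, reproduces the axis operator Kx = X sin(theta) + Z cos(theta), and that Kx squares to the identity:

```python
import cmath
import math

def mat_add(a, b):
    return [[a[r][c] + b[r][c] for c in range(2)] for r in range(2)]

def mat_scale(s, a):
    return [[s * a[r][c] for c in range(2)] for r in range(2)]

def xztheta(th):
    """Convolved axis X*sin(th) + Z*cos(th), as in the Octave code."""
    X = [[0, 1], [1, 0]]
    Z = [[1, 0], [0, -1]]
    return mat_add(mat_scale(math.sin(th), X), mat_scale(math.cos(th), Z))

def rotkanex(phi):
    """Rotation by phi about the convolved Skinner-Kane axis."""
    theta = math.atan(8.0 / 3.0)   # mag(x)/mag(z) = 256/96 = 8/3
    kanehf = xztheta(theta)
    ident = [[1, 0], [0, 1]]
    return mat_add(mat_scale(cmath.cos(phi / 2), ident),
                   mat_scale(1j * cmath.sin(phi / 2), kanehf))

# A pi rotation, divided by i, gives back the convolved-axis operator Kx.
kx = mat_scale(1 / 1j, rotkanex(math.pi))
axis = xztheta(math.atan(8.0 / 3.0))
assert all(abs(kx[r][c] - axis[r][c]) < 1e-12
           for r in range(2) for c in range(2))

# Kx is a unit axis operator: it squares to the identity.
sq = [[sum(kx[r][k] * kx[k][c] for k in range(2)) for c in range(2)]
      for r in range(2)]
assert abs(sq[0][0] - 1) < 1e-12 and abs(sq[0][1]) < 1e-12
```

With tan(theta) = 8/3, the axis components are sin(theta) = 8/sqrt(73) and cos(theta) = 3/sqrt(73), which the assertions confirm numerically.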
Appendix B

Hand-modeling the H-Tree Layout

The [[7,1,3]] code, concatenated, can be fit to an H-tree. At the lowest level of the H-tree are dual rails for storing, manipulating, and error-correcting the encoded data. As discussed in Section 2.2 and Chapter 5, fault-tolerant computation requires that each logical operator be followed by an error-correction cycle. The error-correction cycle requires a group of parity measurements (six for [[7, 1, 3]]) that must be repeated at least twice to determine the error syndrome.1 Figure B.1 shows the actual operators required for a single parity measurement, and how those operators can occur in parallel.

In the figure, the top set of eleven lines is split into seven data qubits (at the top) and four ancillae (at the bottom). The next set of eleven lines is for communication and movement. New (unused) ancillae are fed in from the trunk at the bottom, to the communication rail. (The other four lines in this group are not actually part of the circuit!) It is assumed that all measurement occurs above the top qubit of the communication rail; the path to the measurement unit is not shown. The first four ancillae are used to create the measurement cat state. The next six are used to verify the cat state (measurement is implied by taking the qubit to the top of the communication rail; used qubits are returned to lower positions in the communication rail, eventually to be flushed back to the trunk). Next, the cat state is interacted with four data qubits (note that the CNOT operators are drawn backwards), and then uncreated. Two new ancillae are used to create the three-qubit state for redundant measurement.

In the figure, the different stages of parity calculation and measurement are circled, and

1 In some cases it may be realistic to not repeat the parity measurements if the original syndrome indicated no error.
Figure B.1: Counting operators for a parity measurement in [[7, 1, 3]]
the total numbers of operators (C for CNOTs, W for SWAPs, and S for single-qubit operators) are tabulated in the lower right corner. These values correspond to the variables in the equations in Chapter 5.
Appendix C

Modeling the Schulman-Vazirani Algorithm

C.1 sim.cc: C++ Code for Simulating the Distill Variant of the Schulman-Vazirani Algorithm

This section contains the code for the Schulman-Vazirani simulator. The simulator has a number of tunable parameters that are set during compilation. Finding optimal values for these parameters was accomplished using a binary search. The binary search was written in Tcl because of its ease of use and rapid-prototyping ability. The procedures of the binary search are in the file FindMax.tcl. FindMax.tcl is designed to be sourced by another Tcl script, which provides the function to maximize. A local maximum is found by computing the two endpoints and a midpoint. The best endpoint and the midpoint are then recursed upon, to a predefined precision. Intermediate results are cached, so that an endpoint need not be recalculated. As the iterations return, if the best point was the midpoint, the remaining endpoint is recursed upon as well, to verify that the midpoint was indeed a local maximum.

The script FindAlt.tcl attempts to maximize the best over/underlap parameter for a given tray size. (The name FindAlt is a misnomer. It was originally intended to determine whether alternation, running the Schulman-Vazirani algorithm over the entire active subset of the array after running a set of trays, produced better results.) FindAlt provides a function for FindMax's find
procedure that compiles and runs sim.cc (with -D-defined parameters), and returns the result. It also provides an entry point, DoOverlap, with the two endpoint values for the overlap parameter and the targeted precision as parameters. DoOverlap expects three variables to be globally defined: alt_full, alt_regions, and depth. These global variables are defined in the calling bash shell script, SimAlt.sh. SimAlt.sh was written to keep three copies of FindAlt.tcl running concurrently, each with a different number of trays or alternation setting. (For runs in which the array size was not 128, the compiler flag -DASIZE=<array size> was included in FindAlt.tcl.) Other parameters were tuned in a similar fashion, although with a single Tcl script. The resulting optimized parameters were hard-coded into sim.cc as default values. One possible bit of future work would be to reevaluate the hard-coded parameters based on the best tray and overlap parameters.

C.1.1 FindAlt.tcl

source "FindMax.tcl"

proc SimOverlap {overlap} {
    global alt_full alt_regions depth
    set filebase "output/simalt"
    set outputfile "${filebase}=${alt_regions}_${alt_full}_${overlap}.out"
    set errorfile  "${filebase}=${alt_regions}_${alt_full}_${overlap}.err"
    set execfile   "${filebase}=${alt_regions}_${alt_full}_${overlap}"
    #set depth [expr 2*$alt_regions + 5]
    #if {$depth < 10} {set depth 10}
    #set depth 4
    exec g++ -g -O4 -DITER=20000 -DREMOVE_HIGH_CORR -DALTERNATE_REGIONS \
        -DSORT_BY_MAX_CORR -DCORR_RATIO=10 -DCORR_REMOVE=0.10 \
        -DDEPTH=$depth \
        -DALT_OVERLAP=$overlap -DALT_FULL=$alt_full \
        -DALT_REGIONS=$alt_regions Rand32.o sim.cc -o $execfile
    if [catch {exec ./$execfile > $outputfile 2> $errorfile} foo] {
        puts "while running $execfile: $foo, ignoring"
        return "NoAnswer"
    }
    exec rm $execfile
    set result [exec sed -e {/^[^#].*/d} -e {s/#result://} $errorfile]
    puts "R:$alt_regions, F:$alt_full, O:$overlap, res:$result"
    return $result
}

proc DoOverlap {xlo xhi maxhi} {
    global alt_full alt_regions depth
    #set xlo 0.9
    #set xhi 1.7
    #set maxhi 128
    set answer [find SimOverlap $xlo $xhi $maxhi]
    set xanswer [expr {1.0 * $answer/$maxhi * ($xhi - $xlo) + $xlo}]
    puts "#answer: $xanswer $::f_x($answer) $answer (R:$alt_regions, F:$alt_full)"
}

C.1.2 FindMax.tcl

# Given a function f(x(i)), find the maximum in the range i = lo to hi.
# If f_x(lo) > f_x(hi), recurse, with mid = (lo+hi)/2. If it
# turns out that mid is the maximum, check the other side (xmid to xhi).
# Rather than relying on floating-point arithmetic, we'll actually be
# looking at integer i, where x(i) = i * (xmax - xmin) / imax + xmin.
# That is, i ranges from 0 to imax, and defines our precision.
# Stop when hi <= lo + 1.
array set f_x {}
set rdepth 0

proc find_helper {fn lo hi} {
    global f_x rdepth
    #puts "#rdepth: [incr rdepth]"
    incr rdepth
    if {$rdepth > 20} {return}
    #puts "lo, hi: $lo, $hi"; flush stdout
    # Always check to make sure there's a value in f_x(lo), f_x(hi). If
    # there is no such element, then get the element. If the fn fails to
    # retrieve the element, incr/decr lo/hi and try again until success.
    while {$lo < $hi} {
        set foo [array get f_x $lo]
        set xlo [x $lo]
        if {0 == [llength $foo]} {
            set f_x($lo) [$fn $xlo]
            puts "#lo: $xlo $f_x($lo) $lo"; flush stdout
        }
        if {![string is double $f_x($lo)]} {
            # Not a valid answer, increment lo
            incr lo
        } else break
    }
    while {$lo < $hi} {
        set foo [array get f_x $hi]
        set xhi [x $hi]
        if {0 == [llength $foo]} {
            set f_x($hi) [$fn $xhi]
            puts "#hi: $xhi $f_x($hi) $hi"; flush stdout
        }
        if {![string is double $f_x($hi)]} {
            # Not a valid answer, decrement hi
            incr hi -1
        } else break
    }
    if {$hi <= $lo + 1} {
        # END CONDITION MET
        # This is the end: lo == (mid = (lo + hi)/2), due to integer arith
        if {$f_x($hi) > $f_x($lo)} { return $hi }
        return $lo
    }
    # Create a new midpoint
    set mid [expr {int(($hi + $lo)/2)}]
    # Recurse
    if {$f_x($hi) > $f_x($lo)} {
        set answer [find_helper $fn $mid $hi]
        if {$answer == $mid} {
            # Try the other side
            set answer [find_helper $fn $lo $mid]
        }
    } else {
        set answer [find_helper $fn $lo $mid]
        if {$answer == $mid} {
            # Try the other side
            set answer [find_helper $fn $mid $hi]
        }
    }
    #puts "#unwind rdepth: [incr rdepth -1]"
    incr rdepth -1
    return $answer
}

proc x {i} {
    global xmax xmin imax
    return [expr {$i * ($xmax - $xmin) / $imax + $xmin}]
}

proc find {fn xmin xmax imax} {
    global maxhi rdepth
    set rdepth 0
    array unset ::f_x *
    set ::xmin $xmin
    set ::xmax $xmax
    set ::imax $imax
    return [find_helper $fn 0 $imax]
}

proc fn1 {x} {
    set xx [expr {$x - 0.75}]
    return [expr {1 / ($xx * $xx + 1./3)}]
}

proc fn2 {x} {
    return [expr {log($x) - $x}]
}

proc fn3 {x} {
    # Same as fn2, but with noise
    return [expr {[fn2 $x] + rand()*.1 - .05}]
}

proc fn4 {x} {
    # Don't answer for some values
    if {$x == 1./4} {
        puts "Divide by zero not allowed"
        return "XXX"
    }
    set x [expr {1 / ($x - 1./4)}]
    set x [expr {$x*$x}]
    return $x
}

proc trash {} {
    global f_x
    # A crazy function
    set xmin 0.
    set xmax 2.
    set imax 16
    set answer [find fn4 $xmin $xmax $imax]
    set xanswer [x $answer]
    puts "#answer: $xanswer $::f_x($answer) $answer"
}
#\
trash

C.1.3 SimAlt.sh

run() {
    (
        echo set alt_regions $1
        echo set alt_full $2
        echo set depth $3
        echo source FindAlt.tcl
        echo DoOverlap 0.5 1.9 70
    ) | tclsh
}

for i in 0 1 2; do
    for j in 5 8 11; do
        reg=$((j+i))
        depth=$((2*reg+1))
        echo regions: $reg, depth: $depth
        run $reg 0 $depth
        run $reg 1 $depth
    done &
done | tee -a progress

C.1.4 sim.cc

// $File: sim.cc$
//
// Run simulations of the Schulman-Vazirani algorithm.
//
// $Log: sim.cc,v $
// Revision 1.11 2004/06/04 18:59:28 copsey
// Fixed A[0] <- A[0]^B[0]. Unnecessary warming.
//
// Revision 1.10 2004/06/04 17:51:09 copsey
// ALTERNATE_REGIONS is fixed. We now look at similarly sized regions,
// instead of top half and bottom half (bug from having only two regions).
//
// Revision 1.8 2004/05/31 00:40:50 copsey
// This version does it all: it sorts by exp, and a function of corr and exp,
// then iterates. The problem is, it doesn't give the results I'm looking for.
//
// Revision 1.5 2004/05/27 05:13:39 copsey
// Not quite done, but this works with the new statically defined arrays,
// with pointers to the real data.
//
// Revision 1.4 2004/05/25 19:36:58 copsey
// This version is good for looking at near-neighbor correlation, and
// running on contiguous pieces of the array. However, we need to collect
// different pieces together, meaning that we need an overall correlation so
// the array can be rearranged.
//
// Revision 1.2 2004/05/11 05:51:40 copsey
// Rand32 object returns correct range of random bytes
//
// Revision 1.1 2004/05/08 20:51:36 copsey
// Initial revision
//
//
static char rcsid[] = "$Id: sim.cc,v 1.11 2004/06/04 18:59:28 copsey Exp copsey $";

#include <iostream>
using namespace std;
#include "sim_types.h"
#include "Rand32.h"
#include "Corr32.h"
#include "math.h"
#include <vector>
#include <algorithm>

// CONSTANTS possibly defined by external -D flags.
#ifndef ASIZE
#define ASIZE 128
#endif
const int a_size = ASIZE;

#ifndef DEPTH
#define DEPTH 4
#endif
const int depth = DEPTH;

#ifndef PP0
#define PP0 0.8
#endif
const double P0 = PP0;

#ifndef PCA
#define PCA 0.1
#endif
const double Pca = PCA;

#ifndef PCB
#define PCB 0.1
#endif
const double Pcb = PCB;

#ifndef ITER
#define ITER 50000
#endif
const int iter = ITER;

#ifndef EPS
// For 1.6e5, 1.6e6 and 1.6e7 iterations, the constant is .127, .146, .117,
// or < log(10)/log(iter). Extrapolating,
#define EPS (2 * sqrt(log(10.)/(log((double) iter)*iter)))
// set iter 5e4; expr 2 * sqrt(log(10.)/(log($iter)*$iter))
//#define EPS sqrt(1./iter)
// set iter 5e4; expr sqrt(1./$iter)
#endif
const double eps = EPS;

// How much larger should E(j) be such that E(i) + eps_sort < E(j)?
// Since we sort out the cold-enough and too-hot bits, this should be 0.
#ifndef EPS_SORT
#define EPS_SORT 0
#endif
const double eps_sort = EPS_SORT;

// How far below 0.5 is too hot?
#ifndef EPS_HIGH
#define EPS_HIGH eps
#endif
const double eps_high = EPS_HIGH;

// How small is cold enough?
#ifndef EPS_LOW
#define EPS_LOW eps
#endif
const double eps_low = EPS_LOW;

// For development purposes, calculate fewer correlation samples, since
// they are expensive.
#ifndef CORR_RATIO
#define CORR_RATIO 1
#endif
const int corr_ratio = CORR_RATIO;

#ifndef CORR_EPS
#define CORR_EPS (2 * sqrt(corr_ratio*log(10.)/(log((double) iter)*iter)))
//#define CORR_EPS sqrt((double) corr_ratio/iter)
#endif
const double corr_eps = CORR_EPS;

#ifndef CORR_REMOVE
#define CORR_REMOVE sqrt((double) corr_ratio/iter)
#endif
const double corr_remove = CORR_REMOVE;

// ->sort corr_scale*corr + eps*(1-corr_scale)
#ifndef CORR_SCALE
#define CORR_SCALE 10./13.
#endif
const double corr_scale = CORR_SCALE;

// How many rounds to wait before adding a correlated bit back in
#ifndef CORR_INSERT
#define CORR_INSERT 0
#endif
const int corr_insert = CORR_INSERT;

// For distilling subregions, how many should there be, and how much
// should they overlap (> 1.0) or underlap? Also, should a full distill
// be performed?
#ifndef ALT_REGIONS
#define ALT_REGIONS 3
#endif
const int alt_regions = ALT_REGIONS;

#ifndef ALT_OVERLAP
#define ALT_OVERLAP 1.2
#endif
const double alt_overlap = ALT_OVERLAP;

#ifndef ALT_FULL
#define ALT_FULL 1
#endif
const int alt_full = ALT_FULL;

// class StatArray
//
// This is the workhorse, where all of the data gathering goes on. Since
// we want to distill, sort, and iterate, we need current data (expected
// values and correlations), and overall values for correlation, since
// we'll want to keep track of the maximum correlation observed for a bit.
//
// Additionally, we'll want to be able to sort the arrays, and keep track
// of the result: we'll want to sort in between distill runs. This
// requires a list of positions, and a tag (pointer) to the actual data
// for each position (thanks to Akira and Kitagawa). Data will fall off
// the ends of the array as being either too hot (P1 > 0.5-epsilon) or too
// cold (P1 < epsilon), so we'll also need a begin and an end value.
//
// A's will (most likely) come from the warmer bits, since the expected
// value of b[i] will decrease if a[i] < 0.5 (not considering
// correlation). Highly correlated bits need to be in the A's, since
// correlation in the A's appears to not affect the outcome, except for
// higher output correlation in the A's. See below for how correlation
// relates to expected P1.
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 104 // // The value for epsilon is determined by the noise in the sampling, which // is O(1/sqrt(n)) (n is the number of samples taken). The constant is // around 0.2. Since its a binary distribution, the probability for a // given noise is approx uniform across a region ( +/- 0.2/sqrt(n)), and // then falls off superexponentially (eˆ{-\lambda \lambdaˆi / i!, // \lambda=p0*n, Poisson approximation). (That s my story, and I m // sticking to it!) class StatArray { public: // Data // XXX: To hell with it. There is only one instance of this beast, we // might as well statically assign sizes. uint32 array[a size]; vector<int> order[depth+1]; // The actual bits being worked on // How the bits should be reordered for // each level. Extra guard level int low[depth+1]; int size[depth+1]; // Index to the bottom of the current order // Size (elements) of current data set int high[depth+1]; int level; // Current level < depth // cov, exp, etc., are in the same order as array. int exp[a size]; int cov[a size][a size][4]; // Count * E[X] // Covariant data: pairwise, keeps // track of 00, 01, 10, and 11 // occurences double corr[a size][a size]; double max corr[a size]; int wmax corr[a size]; double min corr[a size]; // Correlation, computed in MkCorr // Max correlation observed // Which bit does it correlate with? // Min correlation observed int wmin corr[a size]; // and with whom. // scalars uint32 exp count; uint32 corr count; // Total number of exp samples // Total number of cov samples // Methods void NewData(Rand32 gen);
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 105 void MkExp(void); void MkCov(void); void MkCorr(void); // Gathers statistics for exp // Gathers statistics for cov // Turns cov into correlation void ClearStats(void); inline double E(int idx) { return double (GetExp(idx))/exp count; inline double Var(int idx) { double EE = E(idx); return EE EE*EE; double Corr(int a, int b); void MkMaxCorr(void); void MkMinCorr(void); void Distill(void) attribute ((noinline)) ; void Distill(int level) attribute ((noinline)) ; int main(); // The main routine. Used to be main for the file! void StatArray::GreedyMinimizeABCorrelation(); // View and modify the data through this level s ordering inline uint32& GetArray(int idx) { return array[order[level][idx]]; inline int& GetExp(int idx) { return exp[order[level][idx]]; inline int& GetCov(int a, int b, int idx) { return cov[order[level][a]][order[level][b]][idx]; inline double& GetCorr(int a, int b) { return corr[order[level][a]][order[level][b]]; inline double& GetMaxCorr(int idx) { return max corr[order[level][idx]]; inline int& GetWMaxCorr(int idx) { return wmax corr[order[level][idx]]; inline double& GetMinCorr(int idx) { return min corr[order[level][idx]]; inline int& GetWMinCorr(int idx) { return wmin corr[order[level][idx]]; inline uint32& PrevArray(int idx) { return array[order[level 1][idx]]; // View the data through a lower level s ordering (for sorting) inline double E(int level, int idx) { return double (GetExp(level, idx))/exp count; inline uint32& GetArray(int level, int idx) { return array[order[level][idx]]; inline int GetExp(int level, int idx) { return exp[order[level][idx]]; inline int GetCov(int a, int b, int level, int idx) {
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 106 return cov[order[level][a]][order[level][b]][idx]; inline double& GetCorr(int level, int a, int b) { return corr[order[level][a]][order[level][b]]; inline double GetMaxCorr(int level, int idx) { return max corr[order[level][idx]]; inline int GetWMaxCorr(int level, int idx) { return wmax corr[order[level][idx]]; inline double GetMinCorr(int level, int idx) { return min corr[order[level][idx]]; inline int GetWMinCorr(int level, int idx) { return wmin corr[order[level][idx]]; StatArray(); double debuge[a size]; void DebugE() attribute ((noinline)); void DebugEL(int l) attribute ((noinline)); void DebugC() attribute ((noinline)); void DebugA() attribute ((noinline)); ; void StatArray::DebugE() { for(int i = 0; i < a size; i++) clog << E(i) << " "; clog << endl; void StatArray::DebugEL(int l) { for(int i = 0; i < a size; i++) clog << E(l,i) << " "; clog << endl; void StatArray::DebugC() { for(int i = 0; i < a size; i++) clog << count ones(getarray(i)) << " "; clog << endl; void StatArray::DebugA() { for(int i = 0; i < a size; i++) clog << GetArray(i) << " ";
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 107 clog << endl; StatArray::StatArray() { int i; exp count = 0; corr count = 0; level = 0; low[0] = 0; size[0] = a size; high[0] = low[0] + size[0]; for(i = 0; i < depth + 1; i++) order[i].resize(a size); for (i = 0; i < a size; i++) { order[0][i] = i; max corr[i] = 0.; wmax corr[i] = a size 1; min corr[i] = 1.; wmin corr[i] = 1; ClearStats(); inline void StatArray::ClearStats() { for(int i = 0; i < a size; i++) { exp[i] = 0; max corr[i] = 0.; wmax corr[i] = a size 1; min corr[i] = 1.; wmin corr[i] = 1; for(int j = 0; j < a size; j++) { corr[i][j] = 0; for(int k = 0; k < 4; k++) { cov[i][j][k] = 0;
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 108 exp count = 0; corr count = 0; inline void StatArray::NewData(Rand32 gen) { for (int i = 0; i < a size; i++) { array[i] = gen.get32bits(); void IntSort(int *begin, int *end, bool fn(int, int)) { // Simple insertion sort. Works in place, although it s probably not // the best performer. However, if a == b in the original array, // their ordering is preserved. int size = end begin, i, j; for(i = 1; i < size; i++) { for(j = i; j > 0 && fn(begin[j], begin[j 1]); j ) { swap(begin[j], begin[j 1]); void StatArray::MkExp(void) { int i; uint32 val; // indices into order array // actual array values exp count += UINT32 SIZE; for(i = 0; i < a size; i++) { val = GetArray(i); GetExp(i) += count ones(val); void StatArray::MkCov(void) { int a, b; uint32 val a, val b; // indices into order array // actual array values
    corr_count += UINT32_SIZE;
    // Upper triangle only
    for(a = low[level]; a < high[level]; a++) {
        val_a = GetArray(a);
        for(b = a+1; b < high[level]; b++) {
            val_b = GetArray(b);
            GetCov(a, b, 0) += count_ones(~val_a & ~val_b);
            GetCov(a, b, 1) += count_ones(~val_a &  val_b);
            GetCov(a, b, 2) += count_ones( val_a & ~val_b);
            GetCov(a, b, 3) += count_ones( val_a &  val_b);
        }
    }
}

inline double StatArray::Corr(int a, int b) {
    // a and b are indices into the order[level] vector
    double Ea = E(a), Eb = E(b);
    if (0 == Ea || 0 == Eb)
        return 0.;
    double Va = Var(a), Vb = Var(b);
    double C = (0 - Ea) * (0 - Eb) * GetCov(a,b, 0)
             + (0 - Ea) * (1 - Eb) * GetCov(a,b, 1)
             + (1 - Ea) * (0 - Eb) * GetCov(a,b, 2)
             + (1 - Ea) * (1 - Eb) * GetCov(a,b, 3);
    C /= corr_count * sqrt(Va*Vb);
    if (C > corr_eps)
        return C;
    else
        return 0;
}

void StatArray::MkCorr(void) {
    int a, b;
    for(a = low[level]; a < high[level]; a++) {
        for(b = a+1; b < high[level]; b++) {
            //corr[a][b] = Corr(a, b);
            GetCorr(a, b) = Corr(a, b);
        }
    }
}

void StatArray::MkMaxCorr(void) {
    int a, b;
    double c;
    for(a = low[level]; a < high[level]; a++) {
        GetMaxCorr(a) = 0;
        for(b = a+1; b < high[level]; b++) {
            // Upper triangle, remember?
            /*
            if (b > a)
                c = Corr(a,b);
            else
                c = Corr(b,a);
            */
            c = GetCorr(a, b);
            if (c < corr_eps)
                continue;
            if (c > GetMaxCorr(a)) {
                GetMaxCorr(a) = c;
                GetWMaxCorr(a) = b;
            }
        }
    }
}

void StatArray::MkMinCorr(void) {
    // Slightly different than above. We only care about the minimum
    // correlation between a's and b's (top and bottom half of the arrays).
    int a, b;
    double c;
    for(a = low[level]+size[level]/2; a < low[level]+size[level]; a++) {
        GetMinCorr(a) = 1.;
        GetWMinCorr(a) = -1;
        for(b = low[level]; b < low[level]+size[level]/2; b++) {
            // Upper triangle, remember?
            if (b > a)
                c = Corr(a,b);
            else
                c = Corr(b,a);
            if (b == a)
                continue;
            if (c < GetMinCorr(a)) {
                GetMinCorr(a) = c;
                GetWMinCorr(a) = b;
            }
        }
    }
}

// cswap: if bit in C is 1, swap corresponding bits in A and B
inline void cswap(uint32 C, uint32 &A, uint32 &B) {
    uint32 Atmp;
    Atmp = (~C & A) | (C & B);
    B    = (~C & B) | (C & A);
    A    = Atmp;
}

inline void swap(uint32 &A, uint32 &B) {
    A = A^B;
    B = A^B;
    A = A^B;
}

void StatArray::Distill(void) {
    int i;
    for(i = 0; i <= level; i++) {
        Distill(i);
    }
}

void StatArray::Distill(int ilvl) {
    int i, j;
//#define A(x) GetArray(ilvl, x+low[ilvl]+size[ilvl]/2)
//#define B(x) GetArray(ilvl, x+low[ilvl])
#define A(x) GetArray(ilvl, x+size[ilvl]/2)
#define B(x) GetArray(ilvl, x)
    // Save A[0] from being unnecessarily warmed, start at A[1]
    for (j = low[ilvl] + 1; j < low[ilvl] + size[ilvl]/2; j++) {
        A(j) = A(j)^B(j);               // CNot(B, A) -> (B, A^B)
        for (i = j; i > low[ilvl]; i--) {
            cswap(~A(i), B(i), B(i-1));
            swap(A(i), A(i-1));
        }
    }
#undef A
#undef B
}

//void StatArray::Distill(int level, int low, int high) {
//
void StatArray::GreedyMinimizeABCorrelation() {
    //* Yeesh, this is sucky!
    // Assume the array is already in max_corr order. This puts high
    // correlation bits into the A's.
    //
    // There are two more rules: the first is, make sure that b[i] > a[i],
    // since we don't want to throw away progress already made. The
    // second is that, given Rule 1, a[i],b[i] correlation should be
    // minimized. And last, we will be only moving around elements in
    // order[level].
    int b, a, k;
    double c;
    for (a = high[level] - 1; a >= high[level] - size[level]/2; a--) {
        // Starting at the top of the A's, look for a B that meets our
        // criteria
        for (b = a - size[level]/2; b >= low[level]; b--) {
            if (E(b) > E(a))
                continue;
            c = GetCorr(b,a);
            if (c < min_corr[a]) {
                min_corr[a] = c;
                wmin_corr[a] = b;
            }
            if (corr_eps > c)
                break;
        }
        if (wmin_corr[a] >= 0) {
            // wmin_corr was initialized to -1, so we have a match
            swap(order[level][a - size[level]/2], order[level][wmin_corr[a]]);
        } else {
            clog << "Warning: no match found for bit " << a << ". ";
            clog << "(a,b)=(" << E(a) << "," << E(a - size[level]/2) << ") ";
            clog << "rho(a,b)=" << GetCorr(b,a) << endl;
        }
    }
}

StatArray *global_arr;

bool ExpSortFn(int a, int b) {
    extern StatArray *global_arr;
    // This is a little tricky: We are being given the current level view
    // in a and b, so we don't want to dereference them again:
    return global_arr->E(0,a) + eps_sort < global_arr->E(0,b);
}

inline double EC(double C, double E) {
    return C * corr_scale + E * (1 - corr_scale);
}

bool CorrSortFn(int a, int b) {
    extern StatArray *global_arr;
    double GMa = global_arr->GetMaxCorr(0,a);
    double GMb = global_arr->GetMaxCorr(0,b);
    double corr_delta = abs(GMb - GMa);
    double Ea = global_arr->E(0,a), Eb = global_arr->E(0,b);
    //if (corr_eps > corr_delta) return 0;
    //return GMa < GMb;
    //if (corr_eps > corr_delta) return (Ea < Eb);
    //return (GMa + Ea) < (GMb + Eb);
    return EC(GMa, Ea) < EC(GMb, Eb);
}
int StatArray::main() {
    int i, j, k, l;
    int corr_bot = high[level];
    int save_high = high[level];
    int save_low = low[level];
    srandom(time(NULL));
    Rand32 gen(P0);
    extern StatArray *global_arr;
    global_arr = this;

    while (level < depth) {
        // Gather data
        for (j = 0; j < iter; j++) {
            NewData(gen);
            Distill();
            MkExp();
            // Save some time, at least while developing
            // XXX: This isn't being used.
#if CORR_RATIO > 1
            if (0 == (j % corr_ratio))
#endif
                MkCov();
            if (0 == (j % 1000))
                clog << "\r" << j << flush;
        }
        clog << endl;

        // *** XXX *** XXX *** XXX ***
        // Copy this level to the next
        order[level+1] = order[level];
        low[level+1] = low[level];
        high[level+1] = high[level];
        size[level+1] = size[level];
        // Look at the data with the new ordering
        level++;

#ifdef ALTERNATE_REGIONS
        if (save_low < low[level]) {
            clog << "Restoring low to ";
            low[level] = save_low;
            clog << low[level] << endl;
        }
        if (save_high > high[level]) {
            high[level] = save_high;
        }
#endif

        // *** XXX *** XXX *** XXX ***
#ifdef REMOVE_HIGH_CORR
        // Okay, this is somewhat successful; what if we put them back at
        // some point?
        // XXX: Tunable Parameter!
#if CORR_INSERT != 0
        if (0 == level % corr_insert) {
            IntSort(&order[level][0], &order[level][a_size], ExpSortFn);
        }
#endif
        // Remove the high correlation bits. Start with the B's, since
        // that's where the really high correlation bits are formed.
        for (i = corr_bot - 1; i > low[level]; i--) {
            if (Corr(i-1, i) > corr_remove) {
                // Swap with corr_bot
                corr_bot--;
                swap(order[level][i], order[level][corr_bot]);
            }
        }
        if (corr_bot < high[level])
            high[level] = corr_bot;
#endif

        // *** XXX *** XXX *** XXX ***
        // Sort the next level by temperature
        clog << "Starting E sort..." << endl;
        //stable_sort(order[level].begin(), order[level].end(), ExpSortFn);
        IntSort(&order[level][low[level]], &order[level][high[level]],
                ExpSortFn);

        // Get new range for low and size
        for (low[level] = 0; E(low[level]) < eps_low; low[level]++);
        for (i = low[level]; i < high[level] && E(i) < 0.5 - eps_high; i++);
        high[level] = i;
        size[level] = high[level] - low[level];

#ifdef ALTERNATE_REGIONS
        // Distill alternate regions, starting at the high end and working
        // down. If alt_full, then region == alt_regions means distill
        // the whole thing.
        save_low = low[level];
        save_high = high[level];
        // Starting region is alt_regions, starting level is 1
        int region = ((level - 1) % (alt_regions + 1 + alt_full));
        cerr << "Region: " << region << endl;
        if (region < alt_regions) {
            int unregion = alt_regions - region - 1;
            high[level] = high[level] - (int) ((double) region/alt_regions
                                               * size[level]/alt_overlap);
            if (high[level] > save_high)
                high[level] = save_high;
            low[level] = low[level] + (int) ((double) unregion/alt_regions
                                             * size[level]/alt_overlap);
            size[level] = high[level] - low[level];
            if (low[level] < save_low)
                low[level] = save_low;
        }
#endif

        // Calculate all pairwise correlations, and the maximum
        // correlation for each bit. XXX: Should these two functions be
        // merged?
        MkCorr();
        MkMaxCorr();

        // *** XXX *** XXX *** XXX ***
#ifdef SORT_BY_MAX_CORR
        // Now re-sort, with an eye towards putting maximally correlated
        // bits into the A's.
        IntSort(&order[level][low[level]], &order[level][high[level]],
                CorrSortFn);
        // Now sort A's by exp, B's by exp
        IntSort(&order[level][low[level]],
                &order[level][low[level]+size[level]/2 - 1], ExpSortFn);
        IntSort(&order[level][low[level]+size[level]/2],
                &order[level][high[level]], ExpSortFn);
        // Finally, rearrange the B's to minimize a[i],b[i] correlation
        GreedyMinimizeABCorrelation();
#endif

        // Output the data for this level
        clog << "Level " << level << endl;
        clog << "low, high:" << low[level] << ", " << high[level] << endl;
        for (i = 0; i < a_size; i++) {
            //for (l = 0; l <= level; l++) {
            cout.width(3);
            //cout << right << order[l][i] << " ";
            cout << right << i << " ";
            cout.width(9);
            cout.fill(' ');
            //cout << left << E(l, i) << " ";
            cout << left << E(i) << " ";
            cout.width(9);
            cout.fill(' ');
            //cout << left << GetMaxCorr(l, i) << " ";
            cout << left << GetMaxCorr(i) << " ";
            cout << right << order[level][i] << " ";
            //}
            //cout.width(0);
            cout << exp[i] << " " << /* GetWMaxCorr(i) << " " <<
                GetMinCorr(i) << " " << GetWMinCorr(i) << */ endl;
        }
        cout << endl << endl;

        if (level < depth)
            ClearStats();
        else {
            // This is the end...
            IntSort(&order[level][0], &order[level][0] + a_size, ExpSortFn);
            for(i = 0; i < a_size; i++) {
                cout << i << " " << left << E(i) << endl;
            }
            cout << endl << endl;
            double Result = 0;
            for (i = 0; i < a_size; i++) {
                Result += E(i);
            }
            Result /= 128;
            Result = save_low + (1 - Result);
            cerr << "#Result:" << Result << endl;
        }
    }
    return 0;
}

int main() {
    StatArray arr;
    return arr.main();
}
C.2 akira.cc: C++ Code for Simulating the Akira-Kitagawa Variant

// $File: akira.cc $
//
// Run simulations of Schulman-Vazirani algorithm, using Akira's
// variation.
//
// $Log: akira.cc,v $
// Revision 1.10  2004/06/04 17:51:09  copsey
// ALTERNATE_REGIONS is fixed. We now look at similarly sized regions,
// instead of top half and bottom half (bug from having only two regions).
//
// Revision 1.8  2004/05/31 00:40:50  copsey
// This version does it all: it sorts by exp, and a function of corr and exp,
// then iterates. The problem is, it doesn't give the results I'm looking for.
//
// Revision 1.5  2004/05/27 05:13:39  copsey
// Not quite done, but this works with the new statically defined arrays,
// with pointers to the real data.
//
// Revision 1.4  2004/05/25 19:36:58  copsey
// This version is good for looking at near-neighbor correlation, and
// running on contiguous pieces of the array. However, we need to collect
// different pieces together, meaning that we need an overall correlation so
// the array can be rearranged.
//
// Revision 1.2  2004/05/11 05:51:40  copsey
// Rand32 object returns correct range of random bytes
//
// Revision 1.1  2004/05/08 20:51:36  copsey
// Initial revision
//
//
static char rcsid[] =
    "$Id: akira.cc,v 1.10 2004/06/04 17:51:09 copsey Exp copsey $";

#include <iostream>
using namespace std;
#include "sim_types.h"
#include "Rand32.h"
//#include "Corr32.h"
#include "math.h"
#include <vector>
#include <algorithm>

// CONSTANTS possibly defined by external defs.
#ifndef ASIZE
#define ASIZE 128
#endif
const int a_size = ASIZE;

#ifndef DEPTH
#define DEPTH 4
#endif
const int depth = DEPTH;

#ifndef PP0
#define PP0 0.8
#endif
const double P0 = PP0;

#ifndef PCA
#define PCA 0.1
#endif
const double Pca = PCA;

#ifndef PCB
#define PCB 0.1
#endif
const double Pcb = PCB;

#ifndef ITER
#define ITER 50000
#endif
const int iter = ITER;

#ifndef EPS
// For 1.6e5, 1.6e6 and 1.6e7 iterations, the constant is .127, .146, .117
// or < log(10)/log(iter). Extrapolating,
#define EPS 2 * sqrt(log(10.)/(log((double) iter)*iter))
// set iter 5e4; expr 2 * sqrt(log(10.)/(log($iter)*$iter))
//#define EPS sqrt(1./iter)
// set iter 5e4; expr sqrt(1./$iter)
#endif
const double eps = EPS;

// How much larger should E(j) be such that E(i) + eps_sort < E(j)?
// Since we sort out the cold-enough and too-hot bits, this should be
// 0.
#ifndef EPS_SORT
#define EPS_SORT 0
#endif
const double eps_sort = EPS_SORT;

// How far below 0.5 is too hot?
#ifndef EPS_HIGH
#define EPS_HIGH eps
#endif
const double eps_high = EPS_HIGH;

// How small is cold enough?
#ifndef EPS_LOW
#define EPS_LOW eps
#endif
const double eps_low = EPS_LOW;

// For development purposes, calculate fewer correlation samples, since
// they are expensive.
#ifndef CORR_RATIO
#define CORR_RATIO 1
#endif
const int corr_ratio = CORR_RATIO;

#ifndef CORR_EPS
#define CORR_EPS 2 * sqrt(corr_ratio*log(10.)/(log((double) iter)*iter))
//#define CORR_EPS sqrt((double) corr_ratio/iter)
#endif
const double corr_eps = CORR_EPS;

#ifndef CORR_REMOVE
#define CORR_REMOVE sqrt((double) corr_ratio/iter)
#endif
const double corr_remove = CORR_REMOVE;

// ->sort corr_scale*corr+eps*(1-corr_scale)
#ifndef CORR_SCALE
#define CORR_SCALE 10./13.
#endif
const double corr_scale = CORR_SCALE;

// How many rounds to wait before adding correlated bit back in
#ifndef CORR_INSERT
#define CORR_INSERT 0
#endif
const int corr_insert = CORR_INSERT;

// For distilling subregions, how many should there be, and how much
// should they overlap (> 1.0) or underlap. Also, should a full distill
// be performed?
#ifndef ALT_REGIONS
#define ALT_REGIONS 3
#endif
const int alt_regions = ALT_REGIONS;

#ifndef ALT_OVERLAP
#define ALT_OVERLAP 1.2
#endif
const double alt_overlap = ALT_OVERLAP;

#ifndef ALT_FULL
#define ALT_FULL 1
#endif
const int alt_full = ALT_FULL;

// Constants for Akira bit array
const int SKIP_FLAG   = 1 << 0x0;
const int INVERT_FLAG = 1 << 0x1;

// class StatArray
//
// This is the workhorse, where all of the data gathering goes on. Since
// we want to distill, sort, and iterate, we need current data (expected
// values and correlations), and overall values for correlation, since
// we'll want to keep track of the maximum correlation observed for a bit.
//
// Additionally, we'll want to be able to sort the arrays, and keep track
// of the result; we'll want to sort in between distill runs. This
// requires a list of positions, and a tag (pointer) to the actual data
// for each position (thanks to Akira and Kitagawa). Data will fall off
// the ends of the array as being either too hot (P1 > 0.5-epsilon) or too
// cold (P1 < epsilon), so we'll also need a begin and end value.
//
// A's will (most likely) come from the warmer bits, since the expected
// value of b[i] will decrease if a[i] < 0.5 (not considering
// correlation). Highly correlated bits need to be in the A's, since
// correlation in the A's appears to not affect the outcome, except for
// higher output correlation in the A's. See below for how correlation
// relates to expected P1.
//
// The value for epsilon is determined by the noise in the sampling, which
// is O(1/sqrt(n)) (n is the number of samples taken). The constant is
// around 0.2. Since it's a binary distribution, the probability for a
// given noise is approx. uniform across a region (+/- 0.2/sqrt(n)), and
// then falls off superexponentially (e^{-\lambda} \lambda^i / i!,
// \lambda = p0*n, Poisson approximation). (That's my story, and I'm
// sticking to it!)
class StatArray {
public:
    // Data
    // XXX: To hell with it. There is only one instance of this beast, we
    // might as well statically assign sizes.
    uint32 array[a_size];           // The actual bits being worked on
    int* order[depth+1];            // How the bits should be reordered for
                                    // each level. Extra guard level.
    int *flags[depth+1];            // &1 = True if value > 0.5 after boost
                                    // &2 = True if boosting won't help value
    //int *skip[depth+1];
    int low[depth+1];               // Index to the bottom of the current order
    int size[depth+1];              // Size (elements) of current data set
    int high[depth+1];
    int level;                      // Current level < depth

    // cov, exp, etc., are in the same order as array.
    int exp[a_size];                // Count * E[X]
    int cov[a_size][a_size][4];     // Covariant data: pairwise, keeps
                                    // track of 00, 01, 10, and 11
                                    // occurrences
    double corr[a_size][a_size];    // Correlation, computed in MkCorr
    double max_corr[a_size];        // Max correlation observed
    int wmax_corr[a_size];          // Which bit does it correlate with?
    double min_corr[a_size];        // Min correlation observed
    int wmin_corr[a_size];          // ... and with whom.

    // scalars
    uint32 exp_count;               // Total number of exp samples
    uint32 corr_count;              // Total number of cov samples

    // Methods
    void NewData(Rand32 gen);
    void MkExp(void);               // Gathers statistics for exp
    void MkCov(void);               // Gathers statistics for cov
    void MkCorr(void);              // Turns cov into correlation
    void ClearStats(void);
    inline double E(int idx) { return double(GetExp(idx))/exp_count; }
    inline double Var(int idx) { double EE = E(idx); return EE - EE*EE; }
    double Corr(int a, int b);
    void MkMaxCorr(void);
    void MkMinCorr(void);
    void MkFlags(void);
    void Akira(void) __attribute__((noinline));
    void Akira(int level) __attribute__((noinline));
    /*
    void Distill(void) __attribute__((noinline));
    void Distill(int level) __attribute__((noinline));
    */
    int main();                     // The main routine. Used to be main() for the file!
    void GreedyMinimizeABCorrelation();

    // View and modify the data through this level's ordering
    inline uint32& GetArray(int idx) { return array[order[level][idx]]; }
    inline int& GetExp(int idx) { return exp[order[level][idx]]; }
    inline int& GetCov(int a, int b, int idx) {
        return cov[order[level][a]][order[level][b]][idx]; }
    inline double& GetCorr(int a, int b) {
        return corr[order[level][a]][order[level][b]]; }
    inline double& GetMaxCorr(int idx) { return max_corr[order[level][idx]]; }
    inline int& GetWMaxCorr(int idx) { return wmax_corr[order[level][idx]]; }
    inline double& GetMinCorr(int idx) { return min_corr[order[level][idx]]; }
    inline int& GetWMinCorr(int idx) { return wmin_corr[order[level][idx]]; }
    inline bool GetInvert(int idx) {
        return 0 != (INVERT_FLAG & flags[level][order[level][idx]]); }
    inline bool SetInvert(int idx, bool inv) {
        if (inv)
            flags[level][order[level][idx]] |= INVERT_FLAG;
        else
            flags[level][order[level][idx]] &= ~INVERT_FLAG;
        return GetInvert(idx);
    }
    inline bool GetSkip(int idx) {
        return 0 != (SKIP_FLAG & flags[level][order[level][idx]]); }
    inline bool SetSkip(int idx, bool inv) {
        if (inv)
            flags[level][order[level][idx]] |= SKIP_FLAG;
        else
            flags[level][order[level][idx]] &= ~SKIP_FLAG;
        return GetSkip(idx);
    }

    // View the data through a lower level's ordering (for sorting)
    inline double E(int level, int idx) {
        return double(GetExp(level, idx))/exp_count; }
    inline uint32& GetArray(int level, int idx) {
        return array[order[level][idx]]; }
    inline int GetExp(int level, int idx) { return exp[order[level][idx]]; }
    inline int GetCov(int a, int b, int level, int idx) {
        return cov[order[level][a]][order[level][b]][idx]; }
    inline double& GetCorr(int level, int a, int b) {
        return corr[order[level][a]][order[level][b]]; }
    inline double GetMaxCorr(int level, int idx) {
        return max_corr[order[level][idx]]; }
    inline int GetWMaxCorr(int level, int idx) {
        return wmax_corr[order[level][idx]]; }
    inline double GetMinCorr(int level, int idx) {
        return min_corr[order[level][idx]]; }
    inline int GetWMinCorr(int level, int idx) {
        return wmin_corr[order[level][idx]]; }
    inline bool GetInvert(int level, int idx) {
        return 0 != (0x1 & flags[level][order[level][idx]]); }
    inline bool GetSkip(int level, int idx) {
        return 0 != (SKIP_FLAG & flags[level][order[level][idx]]); }

    StatArray();

    double debugE[a_size];
    void DebugE() __attribute__((noinline));
    void DebugEL(int l) __attribute__((noinline));
    void DebugC() __attribute__((noinline));
    void DebugA() __attribute__((noinline));
};

void StatArray::DebugE() {
    for(int i = 0; i < a_size; i++)
        clog << E(i) << " ";
    clog << endl;
}

void StatArray::DebugEL(int l) {
    for(int i = 0; i < a_size; i++)
        clog << E(l,i) << " ";
    clog << endl;
}

void StatArray::DebugC() {
    for(int i = 0; i < a_size; i++)
        clog << count_ones(GetArray(i)) << " ";
    clog << endl;
}

void StatArray::DebugA() {
    for(int i = 0; i < a_size; i++)
        clog << GetArray(i) << " ";
    clog << endl;
}

StatArray::StatArray() {
    int i;
    exp_count = 0;
    corr_count = 0;
    level = 0;
    low[0] = 0;
    size[0] = a_size;
    high[0] = low[0] + size[0];
    for(i = 0; i < depth + 1; i++) {
        order[i] = new int[a_size];
        //invert[i] = new bool[a_size];
        flags[i] = new int[a_size];
    }
    for (i = 0; i < a_size; i++) {
        order[0][i] = i;
        max_corr[i] = 0.;
        wmax_corr[i] = a_size - 1;
        min_corr[i] = 1.;
        wmin_corr[i] = -1;
    }
    ClearStats();
}

inline void StatArray::ClearStats() {
    for(int i = 0; i < a_size; i++) {
        exp[i] = 0;
        max_corr[i] = 0.;
        wmax_corr[i] = a_size - 1;
        min_corr[i] = 1.;
        wmin_corr[i] = -1;
        for(int j = 0; j < a_size; j++) {
            corr[i][j] = 0;
            for(int k = 0; k < 4; k++) {
                cov[i][j][k] = 0;
            }
        }
    }
    exp_count = 0;
    corr_count = 0;
}

inline void StatArray::NewData(Rand32 gen) {
    for (int i = 0; i < a_size; i++) {
        array[i] = gen.get32bits();
    }
}

void IntSort(int *begin, int *end, bool fn(int, int)) {
    // Simple insertion sort. Works in place, although it's probably not
    // the best performer. However, if a == b in the original array,
    // their ordering is preserved.
    int size = end - begin, i, j;
    struct {int a, b; bool lt; int x; double E0a, E0b/*, Ea, Eb*/;} f;
    extern StatArray *global_arr;
    StatArray *G = global_arr;
    f.x = 0;
#define dbg(x) ((x), f.a=begin[j], f.b=begin[j-1], f.lt = fn(f.a, f.b), \
                f.E0a=G->E(0,f.a), f.E0b=G->E(0,f.b) \
                /*, f.Ea=G->E(f.a), f.Eb=G->E(f.b)*/)
    for(i = 1; i < size; i++) {
        dbg(j = i);
        while (j > 0 && fn(begin[j], begin[j-1])) {
            dbg(swap(begin[j], begin[j-1]));
            dbg(j--);
        }
    }
}

void StatArray::MkExp(void) {
    int i;          // indices into order array
    uint32 val;     // actual array values
    exp_count += UINT32_SIZE;
    for(i = 0; i < a_size; i++) {
        val = GetArray(i);
        GetExp(i) += count_ones(val);
    }
}

void StatArray::MkCov(void) {
    int a, b;               // indices into order array
    uint32 val_a, val_b;    // actual array values
    corr_count += UINT32_SIZE;
    // Upper triangle only
    for(a = low[level]; a < high[level]; a++) {
        val_a = GetArray(a);
        for(b = a+1; b < high[level]; b++) {
            val_b = GetArray(b);
            GetCov(a, b, 0) += count_ones(~val_a & ~val_b);
            GetCov(a, b, 1) += count_ones(~val_a &  val_b);
            GetCov(a, b, 2) += count_ones( val_a & ~val_b);
            GetCov(a, b, 3) += count_ones( val_a &  val_b);
        }
    }
}

inline double StatArray::Corr(int a, int b) {
    // a and b are indices into the order[level] vector
    double Ea = E(a), Eb = E(b);
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 130 if (0 == Ea 0 == Eb) return 0.; double Va = Var(a), Vb = Var(b); double C = (0 Ea) * (0 Eb) * GetCov(a,b, 0) + (0 Ea) * (1 Eb) * GetCov(a,b, 1) + (1 Ea) * (0 Eb) * GetCov(a,b, 2) + (1 Ea) * (1 Eb) * GetCov(a,b, 3) ; C /= corr count * sqrt(va*vb); if (C > corr eps) return C; else return 0; void StatArray::MkCorr(void) { int a, b; for(a = low[level]; a < high[level]; a++) { for(b = a+1; b < high[level]; b++) { //corr[a][b] = Corr(a, b); GetCorr(a, b) = Corr(a, b); void StatArray::MkMaxCorr(void) { int a, b; double c; for(a = low[level]; a < high[level]; a++) { GetMaxCorr(a) = 0; for(b = a+1; b < high[level]; b++) { // Upper triangle, remember? /* if (b > a) c = Corr(a,b); else c = Corr(b,a); */ c = GetCorr(a, b); if (c < corr eps) continue;
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 131 if (c > GetMaxCorr(a) ) { GetMaxCorr(a) = c; GetWMaxCorr(a) = b; void StatArray::MkMinCorr(void) { // Slightly different than above. We only care about the minimum // correlation between a s and b s (top and bottom half of the arrays) int a, b; double c; for(a = low[level]+size[level]/2; a < low[level]+size[level]; a++) { GetMinCorr(a) = 1.; GetWMinCorr(a) = 1; for(b = low[level]; b < low[level]+size[level]/2; b++) { // Upper triangle, remember? if (b > a) c = Corr(a,b); else c = Corr(b,a); if (b == a) continue; if (c < GetMinCorr(a)) { GetMinCorr(a) = c; GetWMinCorr(a) = b; // Akira says to skip triplet if ea < (eb+ec)/(1+eb*ec) inline bool akira skip (double ea, double eb, double ec) { return ea < (eb+ec)/(1+eb*ec); void StatArray::MkFlags(void) { int i;
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 132 for (i = low[level]; i < high[level]; i++) { if (E(i) > 0.5 /*+ eps high*/) { //GetInvert(i) = true; SetInvert(i, true); // We ll be sorting these, so invert exp[i] GetExp(i) = exp count GetExp(i); /* for (i = low[level]; i + 2 < high[level]; i+=3) { SetSkip(i, akira skip(e(i), E(i+1), E(i+2))); */ // cswap: if bit in C is 1, swap corrersponding bits in A and B inline void cswap(uint32 C, uint32 &A, uint32 &B) {\ uint32 Atmp; Atmp = ( C & A) (C & B); B = ( C & B) (C & A); A = Atmp; inline void swap(uint32 &A, uint32 &B) { A = AˆB; B=AˆB; A=AˆB; inline void akira op(uint32 &A, uint32 &B1, uint32 &B2) { // CNot B2, A A = AˆB2; cswap( A, B1, B2); void StatArray::Akira(void) { int i;
APPENDIX C. MODELING THE SCHULMAN-VAZIRANI ALGORITHM 133 for(i = 0; i <= level; i++) { Akira(i); void StatArray::Akira(int ilvl) { int j; for (j = low[ilvl]; j+2 < high[ilvl]; j += 3) { if (! GetSkip(ilvl, j)) { akira op(getarray(ilvl, j+2), GetArray(ilvl, j+1), GetArray(ilvl, j+0)); /* for (j = low[ilvl]; j < high[ilvl]; j++) { if (GetInvert(ilvl, j)) GetArray(ilvl, j) = GetArray(ilvl, j); */ //void StatArray::Distill(int level, int low, int high) { // void StatArray::GreedyMinimizeABCorrelation() { //* Yeesh, this is sucky! // Assume the array is already in max corr order. This puts high // correlation bits into the A s. // // There are two more rules: the first is, make sure that b[i] > a[i] // since we don t want to throw away progress already made. The // second is that, given Rule 1, a[i],b[i] correlation should be // minimized. And last, we will be only moving around elements in // order[level]. int b, a, k; double c; for (a = high[level] 1; a >= high[level] size[level]/2; a ) {
    // Starting at the top of the A's, look for a B that meets our
    // criteria
    for (b = a - size[level]/2; b >= low[level]; b--) {
      if (E(b) > E(a)) continue;
      c = GetCorr(b,a);
      if (c < min_corr[a]) {
        min_corr[a] = c;
        wmin_corr[a] = b;
      }
      if (corr_eps > c) break;
    }
    if (wmin_corr[a] >= 0) {
      // wmin_corr was initialized to -1, so we have a match
      swap(order[level][a - size[level]/2], order[level][wmin_corr[a]]);
    } else {
      clog << "Warning: no match found for bit " << a << ". ";
      clog << "(a,b)=(" << E(a) << "," << E(a - size[level]/2) << ") ";
      clog << "rho(a,b)=" << GetCorr(b,a) << endl;
    }
  }
}

StatArray *global_arr;

bool ExpSortFn(int a, int b) {
  extern StatArray *global_arr;
  // This is a little tricky: We are being given the current level view
  // in a and b, so we don't want to dereference them again:
  return global_arr->E(0,a) + eps_sort < global_arr->E(0,b);
}

inline double EC(double C, double E) {
  return C * corr_scale + E * (1 - corr_scale);
}

bool CorrSortFn(int a, int b) {
  extern StatArray *global_arr;
  double GMa = global_arr->GetMaxCorr(0,a);
  double GMb = global_arr->GetMaxCorr(0,b);
  double corr_delta = abs(GMb - GMa);
  double Ea = global_arr->E(0,a), Eb = global_arr->E(0,b);

  //if (corr_eps > corr_delta) return 0;
  //return GMa < GMb;
  //if (corr_eps > corr_delta) return (Ea < Eb);
  //return (GMa + Ea) < (GMb + Eb);
  return EC(GMa, Ea) < EC(GMb, Eb);
}

int StatArray::main() {
  int i, j, k, l;
  int corr_bot = high[level];
  int save_high = high[level];
  int save_low = low[level];

  srandom(time(NULL));
  Rand32 gen(p0);
  extern StatArray *global_arr;
  global_arr = this;

  while (level < depth) {
    // Gather data
    for (j = 0; j < iter; j++) {
      NewData(gen);
      Akira();
      MkExp();
      // Save some time, at least while developing
      // XXX: This isn't being used.
#if CORR_RATIO > 1
      if (0 == (j % corr_ratio))
#endif
        MkCov();
      if (0 == (j % 1000)) clog << "\r" << j << flush;
    }
    clog << endl;

    // *** XXX *** XXX *** XXX ***
    // Copy this level to the next
    for (i = 0; i < a_size; i++) {
      order[level+1][i] = order[level][i];
      flags[level+1][i] = flags[level][i];
    }
    low[level+1] = low[level];
    high[level+1] = high[level];
    size[level+1] = size[level];

    // Look at the data with the new ordering
    level++;

#ifdef ALTERNATE_REGIONS
    if (save_low < low[level]) {
      clog << "Restoring low to ";
      low[level] = save_low;
      clog << low[level] << endl;
    }
    if (save_high > high[level]) {
      high[level] = save_high;
    }
#endif

    // *** XXX *** XXX *** XXX ***
#ifdef REMOVE_HIGH_CORR
    // Okay, this is somewhat successful, what if we put them back at
    // some point?
    // XXX: Tunable Parameter!
#if CORR_INSERT != 0
    if (0 == level % corr_insert) {
      IntSort(&order[level][0], &order[level][a_size], ExpSortFn);
    }
#endif
    // Remove the high correlation bits.  Start with the B's, since
    // that's where the really high correlation bits are formed.
    for (i = corr_bot-1; i > low[level]; i--) {
      if (Corr(i-1, i) > corr_remove) {
        // Swap with corr_bot
        corr_bot--;
        swap(order[level][i], order[level][corr_bot]);
      }
    }
    if (corr_bot < high[level])
      high[level] = corr_bot;
#endif
    // *** XXX *** XXX *** XXX ***

    MkFlags();

    // Sort the next level by temperature
    clog << "Starting E sort..." << endl;
    //stable_sort(order[level].begin(), order[level].end(), ExpSortFn);
    IntSort(&order[level][low[level]], &order[level][high[level]], ExpSortFn);

    // Get new range for low and size
    for (low[level] = 0; E(low[level]) < eps_low; low[level]++);
    for (i = low[level]; i < high[level] && E(i) < 0.5 - eps_high; i++);
    high[level] = i;
    size[level] = high[level] - low[level];
    save_low = low[level];
    save_high = high[level];

#ifdef ALTERNATE_REGIONS
    // Distill alternate regions, starting at the high end and working
    // down.  If alt_full, then region == alt_regions means distill
    // the whole thing.
    // Starting region is alt_regions, starting level is 1
    int region = ((level - 1) % (alt_regions + alt_full));
    cerr << "Region: " << region << endl;
    if (region < alt_regions) {
      int unregion = alt_regions - region - 1;
      high[level] = high[level]
        - (int) ((double) region/alt_regions * size[level]*alt_overlap);
      if (high[level] > save_high) high[level] = save_high;
      low[level] = low[level]
        + (int) ((double) unregion/alt_regions * size[level]*alt_overlap);
      size[level] = high[level] - low[level];
      if (low[level] < save_low) low[level] = save_low;
    }
#endif

    // Calculate all pairwise correlations, and the maximum
    // correlation for each bit.  XXX: Should these two functions be
    // merged?
    MkCorr();
    MkMaxCorr();

    // *** XXX *** XXX *** XXX ***
#ifdef SORT_BY_MAX_CORR
    // Now re-sort, with an eye towards putting maximally correlated
    // bits into the A's.
    IntSort(&order[level][low[level]], &order[level][high[level]],
            CorrSortFn);
    // Now sort A's by exp, B's by exp
    IntSort(&order[level][low[level]],
            &order[level][low[level]+size[level]/2 - 1], ExpSortFn);
    IntSort(&order[level][low[level]+size[level]/2],
            &order[level][high[level]], ExpSortFn);
    // Finally, rearrange the B's to minimize a[i],b[i] correlation
    GreedyMinimizeABCorrelation();
#endif

    // Output the data for this level
    clog << "Level " << level << endl;
    clog << "low, high:" << low[level] << ", " << high[level] << endl;
    for (i = 0; i < a_size; i++) {
      //for (l = 0; l <= level; l++) {
      cout.width(3);
      //cout << right << order[l][i] << ' ';
      cout << right << i << " ";
      cout.width(9);
      cout.fill(' ');
      //cout << left << E(l, i) << ' ';
      cout << left << E(i) << " ";
      cout.width(9);
      cout.fill(' ');
      //cout << left << GetMaxCorr(l, i) << ' ';
      cout << left << GetMaxCorr(i) << " ";
      cout << right << GetWMaxCorr(0,i) << " ";
      cout << right << order[level][i] << " ";
      //}
      //cout.width(0);
      cout << exp[i] << " " <<
        /* GetWMaxCorr(i) << " " <<
           GetMinCorr(i) << " " <<
           GetWMinCorr(i) << */ endl;
    }
    cout << endl << endl;

    if (level < depth)
      ClearStats();
    else {
      // This is the end...
      IntSort(&order[level][0], &order[level][0] + a_size, ExpSortFn);
      for (i = 0; i < a_size; i++) {
        cout << i << " " << left << E(i) << endl;
      }
      cout << endl << endl;

      double Result = 0.;
      for (i = 0; i < a_size; i++) {
        Result += E(i);
      }
      Result /= 128;
      Result =
#ifdef ALTERNATE_REGIONS
        save_low
#else
        low[level]
#endif
        + (1 - Result);
      cerr << "#Result:" << Result << endl;
    }
  }
  return 0;
}
int main() {
  StatArray arr;

  return arr.main();
}
C.3 schumacher.cc: C++ Code for Simulating Cleve and DiVincenzo's Implementation of the Schumacher Operator

// Building a Schumacher operator: Cleve and DiVincenzo (96) in
// "Schumacher's Quantum Data Compression as a Quantum Computation" show
// how to build the Schumacher operator.  The only operators used are
// Not, CNot, and Toffoli, which all have classical analogues, so we can
// simulate the Schumacher operator using a classical Monte Carlo
// algorithm.

// The basic idea of the Schumacher operator is to sort the input states
// using two keys.  The primary key is the number of ones in the state,
// and the secondary key is the canonical ordering, i.e., value A is
// greater than value B if bit i is a one in A and a zero in B, and all
// bits to the left of (more significant than) bit i are equal.
//
// Counting 1's is straightforward enough.  If there are N bits, then there
// is one value with all zeroes C(N, 0), N values with 1 one C(N, 1), etc.
// The smallest value with k ones is at sum(i = 0 to k-1, C(N, i)).
//
// Within a rank k (that is, all values with k ones), if the leftmost one
// is in bit i, then there are C(i, k) values in the rank with the
// leftmost one to the right of bit i.  Similarly, within values with the
// leftmost one in bit i, if the second leftmost one is in bit j, then
// there are C(j, k-1) values with the second one to the right of
// bit j.  We can sum over the positions of all k one-bits to get the
// absolute position of the value in rank k.

#include <iostream>
using namespace std;
#include <cstdlib>
#include <ctime>
//#include <stdio.h>
#include <gmp.h>
// Using the GMP (Gnu Multiple Precision) library;
//   mpz_t is an arbitrary precision integer
//   mpq_t is an arbitrary precision quotient of integers
//   mpf_t is an arbitrary precision floating point with limited precision
//     mantissa.
//
// Link with -lgmpxx -lgmp (or specify libgmp.a for fast execution!)
#include "sim_types.h"
#include "Rand32.h"

const int foobar = 111;

inline uint32 choose(int n, int k) {
  int i, c;

  if (n < k) return 0;
  if (n == 0) return 1;
  c = 1;
  for (i = 1; i <= k; i++) {
    c *= n - i + 1;
    c /= i;
  }
  return c;
}

inline void choose(mpz_t c, int n, int k) {
  int i;

  mpz_set_ui(c, 1);
  if (n < k) {
    mpz_set_ui(c, 0);
    return;
  }
  if (0 == n || 0 == k) {
    return;
  }
  for (i = 1; i <= k; i++) {
    //c *= n - i + 1;
    mpz_mul_ui(c, c, n - i + 1);
    //c /= i;
    mpz_tdiv_q_ui(c, c, i);
  }
}

inline int count_ones(const mpz_t n) {
  int result;

  result = mpz_popcount(n);
  return result;
  /*
  mpz_t z32;
  uint32 n32;
  mpz_t ntmp;

  // Make a temporary copy of n
  mpz_init_set(ntmp, n);
  // init z32 with space for 32 bits.
  mpz_init2(z32, 32);
  while (mpz_cmp_ui(ntmp, 0)) {
    // Get the bottom 32 bits...
    mpz_tdiv_r_2exp(z32, ntmp, 32);
    // and shift the rest
    mpz_tdiv_q_2exp(ntmp, ntmp, 32);
    n32 = mpz_get_ui(z32);
    result += count_ones(n32);
  }
  mpz_clear(ntmp);
  mpz_clear(z32);
  return result;
  */
}

// bits is N, n is the value to convert
void schu(mpz_t result, const int bits, const mpz_t n) {
  int i, j, k;
  char dbg_str[80];
  int rank = count_ones(n);
  mpz_t rank_base;
  mpz_t in_rank;
  mpz_t tmp;            // Intermediate for C(n,k)

  mpz_init2(tmp, bits);

  // First find the bottom of the rank
  //rank_base = 0;
  // Init to zero, w/ space for bits bits
  mpz_init2(rank_base, bits);
  for (i = 0; i < rank; i++) {
    //rank_base += choose(bits, i);
    choose(tmp, bits, i);
    mpz_add(rank_base, rank_base, tmp);
  }

  // Next find the place in the rank
  //in_rank = 0;
  // Init to zero, w/ space for bits bits
  mpz_init2(in_rank, bits);
  for (j = bits - 1, k = rank; k >= 0 && j >= 0; j--) {
    //mpz_tdiv_q_2exp(tmp, n, j);
    if (mpz_tstbit(n, j)) {
      //in_rank += choose(j, k);
      choose(tmp, j, k);
      mpz_add(in_rank, in_rank, tmp);
      k--;
    }
  }
  //return rank_base + in_rank;
  mpz_add(result, rank_base, in_rank);
  mpz_clear(tmp);
  mpz_clear(in_rank);
  mpz_clear(rank_base);
}

// UnSchu - the inverse of the schu operator
//
// Much like schu, we must determine the rank, but we do it by counting
// the elements in the ranks below.  The element within the rank is
// determined by subtracting off the sum of the ranks below.  Next,
// we place the bits by subtracting off the first C(k-i-1, j-i) until we
// get zero.
void unschu(mpz_t result, const int bits, const mpz_t n) {
  int i, j, k;
  char dbg_str[80];
  mpz_t rank_base;
  mpz_t in_rank;
  mpz_t tmp;            // Intermediate for C(n,k)

  // Initialize tmp=0 to have bits bits
  mpz_init2(tmp, bits);

  // First find the rank
  //rank_base = 0;
  // Init to zero, w/ space for N (=bits) bits
  mpz_init2(rank_base, bits);
  for (k = 0; mpz_cmp(n, rank_base) >= 0; k++) {
    //rank_base += choose(bits, k);
    choose(tmp, bits, k);
    mpz_add(rank_base, rank_base, tmp);
  }
  // We've overshot, so back up one
  // rank_base -= choose(bits, k);  k--;
  mpz_sub(rank_base, rank_base, tmp);
  k--;

  // Subtract off rank_base:
  mpz_init2(in_rank, bits);
  // in_rank = n - rank_base;
  mpz_sub(in_rank, n, rank_base);

  // Next, figure out where the bits go
  //result = 0;
  mpz_set_ui(result, 0);
  for (i = bits - 1; i >= 0; i--) {
    //if (choose(i, k) <= in_rank) {
    //  // Put a bit at i, and decrement k
    //  result += 1 << i;
    //}
    choose(tmp, i, k);
    if (mpz_cmp(tmp, in_rank) <= 0) {
      mpz_setbit(result, i);
      mpz_sub(in_rank, in_rank, tmp);
      k--;
    }
  }
}

inline void get_bits(mpz_t n, int bits, Rand32 P0) {
  int i;
  uint32 tmp = 0;

  // Get bits above 32-bit multiple:
  i = bits % UINT32_SIZE;
  if (0 != i) {
    tmp = P0.get32bits();
    tmp &= ((1 << i) - 1);
  }
  bits -= i;
  // Start with these bits first
  mpz_set_ui(n, tmp);
  for (i = 0; i < bits; i += UINT32_SIZE) {
    // Ignore overflow
    mpz_mul_2exp(n, n, UINT32_SIZE);
    mpz_add_ui(n, n, P0.get32bits());
  }
}

#ifndef ITER
#define ITER 20000
#endif
const int iter = ITER;

#ifndef SECOND
#define SECOND 45
#endif
const int second_bits = SECOND;

#ifndef THIRD
#define THIRD 52
#endif
const int third_bits = THIRD;

#ifndef FOURTH
#define FOURTH 56
#endif
const int fourth_bits = FOURTH;

const double eps = 0.0044;

int main() {
  const int bits = 128;
  int i, j;
  mpz_t n;
  mpz_t sch; //, bit;
  Rand32 P0(.8);
  int bucket[bits];
  double result;

  srandom(time(NULL));

  mpz_init2(n, bits);
  mpz_init2(sch, bits);

#if defined UNSCHU_TEST
  const int test_int = (1<<27) + (1<<19) + (1<<16) + (1<<5) + 0x9;
  mpz_set_ui(n, test_int);
  schu(sch, bits, n);
  unschu(sch, bits, sch);
  cout << test_int << endl;
  cout << (unsigned int) (mpz_get_ui(sch)) << endl;
#endif //defined UNSCHU_TEST

  //cout << "#iter = " << iter << endl;
  //cout << "#second_bits = " << second_bits << endl;
  //cout << "#eps = " << eps << endl;

  for (j = 0; j < bits; j++) {
    bucket[j] = 0;
  }

  for (i = 0; i < iter; i++) {
    //n = P0.get32bits() & ((1<<bits) - 1);
    get_bits(n, bits, P0);
    //gmp_printf("n: %032Zx, ", n);
    schu(sch, bits, n);

#if DEPTH > 1
    // Recurse on bits 0 to second_bits: Mask out low bits into n,
    // high bits into sch.  Schu on n, then recombine.
    //n = sch % (1 << (bits-second_bits));
    mpz_tdiv_r_2exp(n, sch, bits - second_bits);
    //sch = sch / (1 << (bits-second_bits));
    mpz_tdiv_q_2exp(sch, sch, bits - second_bits);
    //schu(n, second_bits, n);
    schu(sch, second_bits, sch);
    //sch <<= second_bits;
    mpz_mul_2exp(sch, sch, second_bits);
    //sch += n;
    mpz_add(sch, sch, n);
#if DEPTH > 2
    // Let's run a third iteration, just for fun
    mpz_tdiv_r_2exp(n, sch, bits - third_bits);
    mpz_tdiv_q_2exp(sch, sch, bits - third_bits);
    //schu(n, third_bits, n);
    schu(sch, third_bits, sch);
    mpz_mul_2exp(sch, sch, third_bits);
    mpz_add(sch, sch, n);
#if DEPTH > 3
    // Let's run a fourth iteration, just for fun
    mpz_tdiv_r_2exp(n, sch, bits - fourth_bits);
    mpz_tdiv_q_2exp(sch, sch, bits - fourth_bits);
    //schu(n, fourth_bits, n);
    schu(sch, fourth_bits, sch);
    mpz_mul_2exp(sch, sch, fourth_bits);
    mpz_add(sch, sch, n);
#endif //DEPTH > 3
#endif //DEPTH > 2
#endif //DEPTH > 1

    //gmp_printf("sch: %032Zx \n", sch);
    // Keep track of results in buckets
    // Put the high-order bits in the low-indexed buckets
    for (j = 0; j < bits; j++) {
      bucket[bits - j - 1] += mpz_tstbit(sch, j);
    }
    if (0 == (i % 1000)) cerr << "\r" << i;
  }
  cerr << endl;

  result = 0.;
  for (j = 0; j < bits; j++) {
    cout << j << ": " << (bucket[j] / (double) iter) << endl;
    result += (bucket[j] / (double) iter);
  }
  result /= bits/2;
  for (j = 0; bucket[j] <= iter*eps && j < bits; j++);
  result = j + 1 - result;
  cerr << "#Result: " << result << endl;
  cout << endl << endl;
  return 0;
}
Bibliography

[1] A. R. Calderbank, E. M. Rains, P. W. Shor, and N. J. A. Sloane. Quantum error correction and orthogonal geometry. Phys. Rev. Lett., 78:405–409, 1997.
[2] D. Aharonov. Noisy Quantum Computation. PhD thesis, The Hebrew University, Jerusalem, 1999.
[3] D. Aharonov and M. Ben-Or. Fault tolerant computation with constant error. In Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, pages 176–188, 1997.
[4] A. SaiToh and M. Kitagawa. Numerical analysis of boosting scheme for scalable NMR quantum computation. arXiv e-print quant-ph/0305097, 2003.
[5] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. Smolin, and H. Weinfurter. Elementary gates for quantum computation. Phys. Rev. A, 52:3457–3467, 1995. arXiv e-print quant-ph/9503016.
[6] Charles H. Bennett and David P. DiVincenzo. Quantum information and computation. Nature, 404:247–255, 2000.
[7] A. Calderbank et al. Quantum error correction via codes over GF(4). IEEE Trans. Inf. Theory, 44(4):1369–1387, 1998.
[8] A. Calderbank and P. Shor. Good quantum error-correcting codes exist. Phys. Rev. A, 54:1098, 1996.
[9] J. I. Cirac and P. Zoller. Quantum computations with cold trapped ions. Phys. Rev. Lett., 74:4091–4094, 1995.
[10] R. Cleve and D. P. DiVincenzo. Schumacher's quantum data compression as a quantum computation. Phys. Rev. A, 54:2636–2650, 1996. arXiv e-print quant-ph/9603009.
[11] M. W. Coffey. Quantum computing based on a superconducting quantum interference device: exploiting the flux basis. Journal of Modern Optics, 49(14):2389–2398, 2002.
[12] D. Copsey et al. Toward a scalable, silicon-based quantum computing architecture. IEEE Journal of Selected Topics in Quantum Electronics, 9, 2003.
[13] D. P. DiVincenzo. Two-bit gates are universal for quantum computation. Phys. Rev. A, 51(2):1015, 1994.
[14] David P. DiVincenzo. Quantum computation. Science, 270(5234):255, 1995. arXiv e-print quant-ph/9503016.
[15] N. Gershenfeld and I. L. Chuang. Quantum computing with molecules. Scientific American, June 1998.
[16] H. Goan and G. J. Milburn. Silicon-based electron-mediated nuclear spin quantum computer. Unpublished, 2000.
[17] D. Gottesman. A class of quantum error-correcting codes saturating the quantum Hamming bound. Phys. Rev. A, 54:1862–1868, 1996.
[18] D. Gottesman. Theory of fault-tolerant quantum computation. Phys. Rev. A, 57(1):127–137, 1998. arXiv e-print quant-ph/9702029.
[19] D. Gottesman and I. L. Chuang. Quantum teleportation is a universal computational primitive. Nature, 402:390–392, 1999.
[20] Daniel Gottesman. Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology, 1997.
[21] Daniel Gottesman. A theory of fault-tolerant quantum computation. arXiv e-print quant-ph/9702029, 1997.
[22] L. Grover. A fast quantum mechanical algorithm for database search. In Proc. 28th Annual ACM Symposium on the Theory of Computation, pages 212–219, New York, 1996. ACM Press.
[23] Nemanja Isailovic, Mark Whitney, Dean Copsey, Yatish Patel, Frederic T. Chong, and Isaac L. Chuang. Datapath and control for quantum wires. In ACM Transactions on Architecture and Code Optimization (ACM TACO 2004), New York, 2004. ACM Press.
[24] B. E. Kane, N. S. McAlpine, A. S. Dzurak, et al. Single-spin measurement using single-electron transistors to probe two-electron systems. Phys. Rev. B, 61(4):2961–2972, January 2000.
[25] Bruce Kane. A silicon-based nuclear spin quantum computer. Nature, 393:133–137, 1998.
[26] D. Kielpinski, C. Monroe, and D. J. Wineland. Architecture for a large-scale ion trap quantum computer. Nature, 417:709, 2002.
[27] E. Knill, R. Laflamme, R. Martinez, and C. H. Tseng. An algorithmic benchmark for quantum information processing. Nature, 404(6776):368–370, March 2000.
[28] R. Laflamme, C. Miquel, J.-P. Paz, and W. H. Zurek. Perfect quantum error correction code. Phys. Rev. Lett., 77:198, 1996. arXiv e-print quant-ph/9602019.
[29] D. Leibfried, R. Blatt, C. Monroe, and D. Wineland. Quantum dynamics of single trapped ions. Reviews of Modern Physics, 75(1):281–324, January 2003.
[30] Seth Lloyd. Quantum-mechanical computers. Scientific American, 273(4):44, October 1995.
[31] Y. Makhlin, G. Schön, and A. Shnirman. Quantum-state engineering with Josephson-junction devices. Reviews of Modern Physics, 73(2):357–400, 2001.
[32] M. I. Dykman, P. M. Platzman, and P. Seddighrad. Qubits with electrons on liquid helium. Phys. Rev. B, 67:155403, 2003.
[33] C. Monroe, D. M. Meekhof, B. E. King, W. M. Itano, and D. J. Wineland. Demonstration of a fundamental quantum logic gate. Phys. Rev. Lett., 75:4714, 1995.
[34] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, UK, 2000.
[35] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, England, 2000.
[36] Mark Oskin, Fred Chong, Isaac Chuang, and John Kubiatowicz. Building quantum wires: The long and the short of it. Unpublished, 2002.
[37] Mark Oskin, Fred Chong, Isaac Chuang, and John Kubiatowicz. Building quantum wires: The long and the short of it. In Proc. International Symposium on Computer Architecture (ISCA 2003), New York, 2003. ACM Press.
[38] Mark Oskin, Frederic T. Chong, and Isaac L. Chuang. A practical architecture for reliable quantum computers. IEEE Computer, 35(1):79–87, January 2002.
[39] R. L. Rivest, A. Shamir, and L. M. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Technical Report MIT/LCS/TM-82, MIT, 1978.
[40] C. A. Sackett, D. Kielpinski, B. E. King, C. Langer, V. Meyer, C. J. Myatt, M. Rowe, Q. A. Turchette, W. M. Itano, D. J. Wineland, and C. Monroe. Experimental entanglement of four particles. Nature, 404:256–258, 2000.
[41] L. Schulman and U. Vazirani. Molecular scale heat engines and scalable quantum computation. In 31st STOC, 1999.
[42] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, July and October 1948.
[43] P. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In Proc. 35th Annual Symposium on Foundations of Computer Science, page 124, Los Alamitos, CA, 1994. IEEE Press.
[44] P. W. Shor. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A, 52:R2493, 1995.
[45] A. J. Skinner, M. E. Davenport, and B. E. Kane. Hydrogenic spin quantum computing in silicon: A digital approach. Phys. Rev. Lett., 90:087901, February 2003.
[46] A. Steane. Multiple particle interference and quantum error correction. Proc. R. Soc. London A, 452:2551–2576, 1996.
[47] A. Steane. Simple quantum error correcting codes. Phys. Rev. Lett., 77:793–797, 1996.
[48] Andrew M. Steane. Overhead and noise threshold of fault-tolerant quantum error correction. arXiv e-print quant-ph/0207119, 2002.
[49] L. M. K. Vandersypen et al. Experimental realization of Shor's quantum factoring algorithm using nuclear magnetic resonance. Nature, 414:883, 2001.
[50] Lieven M. K. Vandersypen, Matthias Steffen, Gregory Breyta, Costantino S. Yannoni, Richard Cleve, and Isaac L. Chuang. Experimental realization of order-finding with a quantum computer. Phys. Rev. Lett., 85(25):5453–5455, December 2000.
[51] D. Vion, A. Aassime, A. Cottet, P. Joyez, H. Pothier, C. Urbina, D. Esteve, and M. H. Devoret. Manipulating the quantum state of an electrical circuit. Science, 296:886, 2002.
[52] J. von Neumann. Various techniques used in connection with random digits. In Applied Math Series, volume 12, pages 36–38. National Bureau of Standards, 1951. Notes by G. E. Forsythe. Reprinted in von Neumann's Collected Works, Vol. 5, Pergamon Press (1963), pages 768–770.
[53] Rutger Vrijen, Eli Yablonovitch, Kang Wang, Hong Wen Jiang, Alex Balandin, Vwani Roychowdhury, Tal Mor, and David DiVincenzo. Electron spin resonance transistors for quantum computing in silicon-germanium heterostructures. arXiv e-print quant-ph/9905096, 1999.
[54] W. K. Wootters and W. H. Zurek. A single quantum cannot be cloned. Nature, 299:802–803, 1982.
[55] Y. Yu, S. Han, X. Chu, S.-I. Chu, and Z. Wang. Coherent temporal oscillations of macroscopic quantum states in a Josephson junction. Science, 296(5569):889–892, May 2002.