Optimal Electric Field Estimation and Control for Coronagraphy

Tyler D. Groff
A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy
Recommended for Acceptance by the Department of Mechanical and Aerospace Engineering
Adviser: Dr. N. Jeremy Kasdin
September 2012

Abstract

Detecting and characterizing extrasolar planets has become a very relevant field in Astrophysics. There are several methods to achieve this, but by far the most difficult and potentially most rewarding approach is direct imaging of the planets. Coronagraphs can be used to image the area surrounding a star with sufficient contrast to detect orbiting planets. However, coronagraphs exhibit an extreme sensitivity to optical aberrations, which causes starlight to leak into the search area. To solve this problem we use deformable mirrors to correct the field, recovering a small search area of high contrast (commonly referred to as a "dark hole") where we can once again search for planets. These coronagraphs require focal plane wavefront control techniques to achieve the necessary contrast levels. These correction algorithms are iterative and the control methods require an estimate of the electric field at the science camera, which requires nearly all of the images taken for the correction. In order to maximize science time the amount of time required for correction must be minimized, which means reducing the number of exposures required for correction. Given the large number of images required for estimation, the ideal choice is to use fewer exposures to estimate the electric field. With a more efficient monochromatic estimation in hand, we also seek to apply this correction over as broad a bandwidth as possible. This allows us to spectrally characterize a target without having to repair the field for every wavelength. This thesis derives and demonstrates an optimal estimator that uses prior knowledge to create the estimate of the electric field. In this way we can optimally estimate the electric field by minimizing the number of exposures required to estimate under an error constraint.

With an optimal estimator in place for monochromatic light, we also demonstrate a controller that can suppress the field over a bandwidth when provided with this monochromatic estimate. The challenges, current levels of performance, and future directions of this work are discussed in detail.

Acknowledgements

Many people have contributed towards my successes in life over a very long period of time. First, I thank my adviser, Dr. N. Jeremy Kasdin. You have been an excellent mentor over the years and I consider you a great friend. Thank you for trusting me to use your laboratory; with a resource like that it is hard not to be successful. Your enthusiasm for this work has kept me engaged and made working with you very enjoyable. I greatly appreciate the time I have spent debating and discussing all technical aspects of our work. I have learned a lot from you and it has most certainly left a great impression. I also thank my committee and readers, Drs. David Spergel, Robert Vanderbei, Mike Littman, and Craig Arnold. Your continued advice over the past five years has been greatly appreciated, as has the unique perspective each of you has taken on the work I present here. I will always be grateful for your continuous presence and willingness to discuss any challenge that I have faced.

In addition to my adviser and committee, the faculty here at Princeton have been incredibly supportive. Dr. Dick Miles's continued support and interest in my graduate career has been to my great benefit. Much of my understanding of optics can be credited to him. Dr. Robert Stengel has taught me a great deal about estimation and control, and I have very much appreciated his ongoing interest in my research. I also thank Ed Turner for providing so many opportunities to work at the Subaru telescope, which has played a substantial role in my career. I also thank the support staff in the department, particularly Jessica O'Leary, Jill Ray, and Candy Reed, who always seem to be able to solve any problem. The post-docs in our group over my time have been limited to Mike McElwain and Alexis Carlotti. I have enjoyed working with both of you, and look forward to more.

In addition to great friendship, I owe a debt of gratitude to the older students from my research group. Drs. Amir Give'on, Laurent Pueyo, Jason Kay, Eric Cady, and Dmitry Savransky have all contributed to my understanding and development in this field. Amir, we did not overlap at Princeton but your continued presence at JPL has afforded us the opportunity to compare notes and work together several times. Laurent, I love getting to banter about math and wavefront control with you. I always walk away knowing more, and I look forward to seeing you whenever I can. Jason Kay, it feels like an eternity since I walked in the door and you taught me everything about the lab, and I miss the loud banter and hatred for equipment crashes. I will never forget our train ride to Boston. Eric Cady, I value all of our work and conversations together and I am happy to see you so often as a colleague and friend. Best enabler ever. Dmitry Savransky, I also appreciate our continued friendship and I am grateful to see you on a regular basis. Thanks for tolerating my general impatience with computers. Our symbiotic approach to understanding optics and computers is sorely missed. To the younger students in our research group, it's been fun having you around. A J, you are picking up all the little quirks in the lab quickly and you have made my job much easier. It's been nice working with someone in there again and I'm glad that isn't over.

One nice aspect of this field is its closeness and friendliness. As a result I have spent a great deal of time with many individuals outside of Princeton and I consider them my extended research group. Drs. Olivier Guyon, Ruslan Belikov, Rémi Soummer, Frantz Martinache, and Stuart Shaklan in particular have all made significant contributions to the quality of the work presented here, and have given me much to think about. They have been very generous in lending advice and providing thoughtful conversation, and have quickly become people I consider to be very good friends. I have many friends from my time at Princeton who are not part of my group: Mac Haas, Mike Burke, Andy Stewart, Josh Proctor, James Michael, Will Larrison, Katie Quaranta, my fellow fifth years, the bonfire attendees, the softball team, and rock climbing crews have all made my time at Princeton quite enjoyable.

From my college years at Tufts University I would like to thank the ME faculty there, particularly my advisers Dr. Gary Leisk and Dr. Robert White. They have gone above and beyond, and I am glad to stay in contact with them to this day. I also would like to thank the folks at DFM Engineering, where I truly learned how to do mechanical design and learned to love building telescopes. They are truly a family, and I have learned so much from them. I especially thank Dr. Frank Melsheimer and Kate Melsheimer for opening their home to me and taking me under their wing. I also thank the Astrogeeks of OELS. Steve Lee, Dave Olson, Ben Reed, and Poti Doukas have been a constant point of support. Steve, Dave, and Ben have been some of the most important and constant mentors in my entire life, and Poti quickly joined their ranks. I also thank all Astrogeeks and the entirety of the Outdoor Education Laboratory Schools. It was through this program that I discovered astronomy, and my continued participation has helped keep my wonder of the universe alive. It is in this spirit that I consider the work presented here to merely be part of a life series entitled "I Wonder..."

I end by thanking my wonderful family. My Grandparents Roy, Sally, Shirley, Jim, Dean. My Parents Dean and Lauri. Your unwavering support through my entire life has gotten me to this point, and I could not have done it without you. My Brother Shawn. Thank you for serving our country, particularly in a time of war. Keep throwing rocks down hills, just make sure nobody is at the bottom... To the rest of my family, aunts and uncles and in-laws, I thank you for your support as well, and for adding richness to my life. My beautiful and intelligent wife, Kimberley. Your support has been unparalleled by anyone. Your constancy, kindness, and intelligence have made my life (and this thesis) so much better. I like hanging out with you, and I want you to know that you are the love of my life.

None of the work would have been possible without substantial NASA funding. This work was funded by NASA Grant #NNX9AB96G and the NASA Earth and Space Science Graduate Fellowship. This dissertation carries the number T-3243 in the records of the Department of Mechanical and Aerospace Engineering.

To the love of my life, Kimberley. This would all be meaningless without you.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Symbols
Notation

1 Introduction
    Science Motivation
    Coronagraphs and Wavefront Control
    The High Contrast Imaging Laboratory
    Two Deformable Mirrors in Series
    Fourier Optics
    Propagation: Fresnel Transform
    Imaging: Fourier Transform
    Controllability of Amplitude and Phase
    Pupil Plane Controllability: Angular Spectrum
    Image Plane Controllability: The Propagation Factor
    Numerical Transform
    Thesis Overview
    Chapter Assumptions

2 Focal Plane Wavefront Control
    Monochromatic Wavefront Control
    Wavelength Dependence of the Image Plane
    Continuous Bandwidth Constraint
    Windowed Stroke Minimization
    Extrapolating Estimates in Wavelength
    Chapter Assumptions

3 Batch Process Electric Field Estimation
    Linearity of the Electric Field
    Pairwise Images
    DM Diversity: Batch Process Estimation
    Probe Shapes

4 Kalman Filter Estimation
    Constructing the Optimal Filter
    Sensor and Process Noise
    Iterative Kalman Filter
    Optimal Probes: Using the Control Signal
    Chapter Assumptions

5 Laboratory Results
    Monochromatic Performance
    DM Diversity Performance
    Kalman Filter Performance
    Broadband Performance
    Prior to Single Mode Photonic Crystal Fiber
    Photonic Crystal Single Mode Fiber Upgrade
    Final Remarks

6 Sources and Propagation of Error
    Precision of a Contrast Measurement
    Estimation Algorithms and Propagating Error
    Accuracy of Wavelength Extrapolation
    DM Controllable Space
    Experiment Stability and Laser Power
    Stability of Laser Power
    Final Remarks on Error

7 Conclusions and Future Directions
    Parameter Adaptive Filtering
    Dual Controller
    Including Alternate Sensors
    Establishing a Reference
    Applying Reference to the Time Update
    Bias Estimation
    Final Remarks

Bibliography

List of Tables

1.1 Coordinates in each plane
Definition of all Kalman Filter Matrices
Definition of Kalman Filter Vectors

List of Figures

1.1 Telescope Diffraction
Atmospheric Adaptive Optics
HCIL Optical Layout
Ideal vs. Aberrated PSF
HCIL Filter Mechanism
Single DM FPWC Experiments
Light Propagation
Fourier Imaging Schematic
Angular Spectrum Schematic
DM Nominal Shapes
Controllability of Amplitude Aberrations
Numerical Dimension of Planes
DM Actuation Over Control History
Extrapolating Estimates in Wavelength
Feedback Block Diagram
Detector Noise
Monochromatic Correction with DM-Diversity
Monochromatic Correction with Kalman Filter: 4 Image Pairs
Monochromatic Correction with Kalman Filter: 3 Image Pairs
5.4 Monochromatic Correction with Kalman Filter: 2 Image Pairs
Monochromatic Correction with Kalman Filter: 1 Image Pair
Broadband Correction: Pre-PCSM Extrapolated Results
Broadband Correction: Pre-PCSM Extrapolate Individual Filters
Broadband Correction: Extrapolated Results
Broadband Correction: Extrapolated Estimate Individual Filters
Broadband Correction: Direct Estimate Results
Broadband Correction: Direct Estimate Individual Filters
Phase of Propagation Uncertainty
Interferometric Measurement of Superposition
Interferometric Measurement of the Influence Function
Contrast Stability
Performance Comparison
Sensor Schematic

List of Symbols

i: Imaginary Number
λ: Wavelength
E: The complex valued electric field
n: Normal Vector, any vector normal to the Optical Axis
q: Point from which the field is known
p: Point of the unknown field
r_p/q: Vector from the unknown to the known field
Σ: The Surface of Integration in the Rayleigh-Sommerfeld Integral
S: Differential Surface Unit
z: Free space propagation distance
L{ }: The lens operator
f: Focal length of an imaging optic
ξ: First coordinate in an Intermediate Plane
η: Second coordinate in an Intermediate Plane
u: First coordinate in the Pupil Plane
v: Second coordinate in the Pupil Plane
x: First coordinate in the Image Plane
y: Second coordinate in the Image Plane
F{ }: The Fourier Transform Operator
A: Amplitude Distribution, Typically at a Pupil Plane
φ: Phase Difference
D: The diameter of the pupil
Shorthand for the Fourier Transform
pup: Subscript Indicating the Pupil Plane
g(u, v): Arbitrary Aberrated Field (Complex)
im: Subscript Indicating the Image Plane
C{ }: Arbitrary Linear Operator
I: Intensity
DH: Subscript Indicating the Dark Hole
I_DH: A Scalar, Average Dark Hole Intensity
R: Real Part of a Complex Variable
λ: The Central Wavelength Being Estimated
a_q: Amplitude for a Single DM Actuator
h: Deformable Mirror Physical Height Change
I: Imaginary Part of a Complex Variable
u: Vector of DM Actuation Signals
M: Matrix Mapping DM Actuation to Image Plane Intensity
b: Matrix Operator Mapping Deformable Mirror-Aberrated Field to Image Plane Intensity
d: Inner Product of the Aberrated Field
J: Cost Function
µ: Lagrange Multiplier
u_opt: The Optimal DM Command
w(λ): Intensity Weight as a Function of Wavelength
Δλ: Bandwidth [Meters]
λ_1: The Lower Bounding Wavelength
λ_2: The Upper Bounding Wavelength
δ: Scalar Weight on the Lagrange Multiplier
α: Amplitude of the Aberrated Field at the Pupil
β: Phase of the Aberrated Field at the Pupil
z: Noisy Sensor Measurement
x: The Current State, or Electric Field
n: Sensor Noise
H: Observation Matrix
k: Current Discrete Point in Time in Estimation
P: Covariance of the Electric Field
R: Sensor Noise Covariance Matrix
K: Gain Matrix
Φ: Time Update in a Discrete Time Filter
Γ: Linear Propagation of Control to Image Plane Electric Field
w: Process Noise
Λ: Linear Propagation of Process Noise to Image Plane Electric Field
Q: Process Noise Covariance Matrix

Notation

Search Area: The area in the image plane where the coronagraph has been designed to produce high contrast for the detection of dim companions.

Dark Hole: A region in the aberrated image where wavefront control has been used to recover high contrast.

Focal Plane Wavefront Control (Controller): This terminology refers to the control law being used in the correction algorithm.

Focal Plane Wavefront Correction: This encompasses the entire algorithm used to correct the wavefront, including the state estimator and control law.

<, >: Is the Inner Product of any Matrix and is used to evaluate the Intensity, I_Σ, of the electric field in a given area of the image plane.

Matrix Inner Product: an inner product that produces a scalar value, intended to describe a scalar value of the intensity in the dark hole, I_DH.

Scalar Inner Product: an element by element inner product of each value in a vector, intended to find the intensity distribution of the electric field in the dark hole, I_im(x, y).

Chapter 1

Introduction

1.1 Science Motivation

Detecting and characterizing extrasolar planets has become a major point of focus in the astrophysical community. Directly imaging solar systems opens up a parameter space unavailable to current indirect detection methods such as radial velocity and transit photometry. These methods are highly successful, but biased towards large planets very close to their parent star. Even with impressive advances in these detection limits, they are only sensitive to orbits that intersect our line of sight with the parent star. Apart from astrometry, direct detection is the only method sensitive to face-on orbits that do not cross our line of sight. It is also capable of taking reliable spectral measurements of the planet and does not require an orbital fit to certify the planet's existence. So long as we can efficiently detect planets through direct imaging, we can dramatically increase the number of detectable systems. Much like galactic astronomy, planetary science relies on a large number of observations of many systems to build up our understanding of the time evolution in solar systems. Models that describe the formation of solar systems and their major orbiting bodies need as large and as detailed a data set as possible. The increased parameter space and the spectral information that direct imaging has to offer make it a very compelling method for this purpose.

Planets are generally classified as jovian or terrestrial bodies, which in our solar system also correlates to their mass and proximity to the Sun. It is expected that the same mass correlation will hold, making current detection methods more sensitive to jovian planets. In fact, over 700 planets have been detected, but the vast majority are between 1 and 1, times the mass of Earth [1]. This is largely due to the fact that most detection methods are more sensitive to higher mass planets with a small orbital period. These detection schemes do not directly image the orbiting body, but measure the effect of the planet on the stellar signal in the form of a periodic Doppler shift or drop in intensity. The Kepler mission uses transit photometry to detect orbiting bodies, and has been highly successful [13, 12]. It is even capable of detecting Earth analogs (and has already come close) [4], but is incapable of spectrally characterizing the planet in any way. A major disadvantage of the indirect detection techniques is that they are fundamentally finding a best fit solution of periodic data to a Keplerian orbit. This requires observations over at least one orbit to make a detection, which means these observations are biased to high mass planets very close to their parent star. Directly imaging a planet does not exhibit the same sensitivity to mass (though reflected area does play a role in the intensity of the planet) and only requires enough observations to rule out the possibility of the body being a background star. Direct imaging also opens a new parameter space of orbits that are observed to be face-on, which are undetectable by the indirect techniques. Since we are gathering light directly from the planet we can also directly measure the spectra of a planet, and directly observe the projection of its separation from the parent star.
To spectrally characterize any detectable planet, we seek imaging methods that are capable of directly imaging Earth-analogs. The Terrestrial Planet Finder (TPF) telescopes in the late 1990s to early 2000s were NASA's original space-based concepts for such a mission. One was an imaging interferometer with a satellite constellation to produce a long baseline, commonly referred to as TPF-I [2]. The second was a coronagraphic imager based on a 4 × 8 m elliptical mirror, referred to as TPF-C. These did not become funded missions, but the science requirements developed for them are still used as a baseline for today's mission concepts. More recently, another concept has been developed where the telescope is flown in formation with a second satellite mounted to a starshade, or occulter, designed to create a diffraction limited shadow from the star, allowing the off-axis planet to pass by unobstructed into the telescope [16, 71]. Simultaneously, more advanced coronagraphic imaging concepts have been developed, making these the two leading concepts for a direct imaging mission [67, 42, 38, 73, 33, 37, 15]. Each mission has its own set of challenges, but the main objective of each is to mitigate the diffractive effects of the telescope's finite aperture. In all likelihood, a combined mission concept will maximize performance with regard to the number of targets that can be detected and characterized, and will mitigate risk involved with the mission. Of the two missions, the coronagraph applies most broadly to both ground and space instrumentation. This thesis focuses on the technology development for the coronagraph concept.

1.2 Coronagraphs and Wavefront Control

The two primary obstacles to imaging very dim objects orbiting extremely close to their parent star are diffraction effects from the telescope and aberrations to the field from imperfections in the optical system. The finite aperture of the telescope results in a diffraction pattern that leaks starlight into the region where a dim companion would otherwise be seen. As shown in Fig. 1.1, this is not an issue of angular resolution. Neglecting any errors, an aperture larger than approximately 2 meters is capable of distinguishing two objects separated by 1 astronomical unit (AU) within 10 parsecs of the Earth. Fig. 1.1 shows that it is the relative intensity of the planet and the diffracted light that limits the detectability of a planet.
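The resolution argument can be checked with a short calculation. The sketch below is illustrative only: it assumes the Rayleigh criterion (1.22 λ/D) as the resolution limit, a wavelength of 635 nm (the HCIL laser wavelength, used here purely for convenience), and a star at a distance of 10 parsecs. The function names are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0

def separation_arcsec(separation_au, distance_pc):
    """Angle subtended by a projected separation.

    By definition of the parsec, 1 AU seen from 1 pc subtends 1 arcsecond,
    so the small-angle result is simply AU divided by parsecs.
    """
    return separation_au / distance_pc

def rayleigh_limit_arcsec(wavelength_m, aperture_m):
    """Rayleigh resolution limit, 1.22 * lambda / D, for a circular aperture."""
    return 1.22 * wavelength_m / aperture_m * ARCSEC_PER_RADIAN

# Assumed inputs: 1 AU orbit, star at 10 pc, 635 nm light, 2 m aperture.
planet_sep = separation_arcsec(1.0, 10.0)
limit_2m = rayleigh_limit_arcsec(635e-9, 2.0)
print(f"planet separation: {planet_sep:.3f} arcsec")
print(f"2 m Rayleigh limit: {limit_2m:.3f} arcsec")
```

Under these assumptions the planet sits outside the Rayleigh limit of the 2 m aperture, so resolving the two sources is not the obstacle; the difficulty is the brightness of the diffracted starlight at that separation.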
Diffracted starlight typically limits the detectable intensity of a companion to roughly one or two orders of magnitude dimmer than its parent star. Most generally, a coronagraph can be defined as an optical system contained within the telescope that modifies the diffraction pattern imposed by the telescope's finite aperture. By attenuating the diffracted light at small angular separations, the coronagraph lowers the detection limit of a dim companion. The degree of suppression is quantified as contrast, a dimensionless parameter in the image plane. The contrast of any point in the image plane is defined as a fraction of the peak power in the point spread function (PSF) of the unobstructed aperture. A region of high contrast is commonly referred to as the search area and the closest point to the star's centroid that achieves the targeted contrast level is defined as the inner working angle (IWA) of the coronagraph. As the designed contrast and IWA decrease, the coronagraph becomes more difficult to manufacture and simultaneously becomes more sensitive to optical aberrations.

[Figure 1.1 panels: normalized intensity versus angular separation in λ/D for the two telescope apertures.]

Figure 1.1: Intensity profile of an image from a circular aperture with diameter (a) 2 meters, and (b) 1 meters. The off-axis source with unitary amplitude (red-dashed line) indicates that the object is resolvable by either telescope. The off-axis source with a peak intensity 10^10 lower than the star's peak intensity (solid-green line) shows the relative power of an Earth-like planet and the diffracted energy from the star. If we were to solve the diffraction problem by making the telescope larger the aperture would have to be greater than 1 km in diameter.

Since a coronagraph is fundamentally sensitive to perturbations in the incident field (e.g. a second slightly off-axis source) it exhibits an extreme sensitivity to optical aberrations that distort the field, as demonstrated in Fig. 1.4. These can be aberrations that are incident on the telescope (as is the case when considering atmospheric turbulence) or they can be due to imperfections in the optical system itself. In either case we seek to correct these wavefront errors using deformable mirrors (DMs), computer controlled mirrors with high precision actuators bonded to the back surface. There are many DM technologies, with varying levels of actuator stroke, density, and precision. In all cases the purpose is to produce achromatic phase shifts that vary arbitrarily across the plane (limited by the number and spacing of the actuators). Adaptive optics (AO) systems attempt to correct the atmospheric distortion incident on the telescope. The challenges of atmospheric AO are in the speed of correction ( 1 s - 1 s of Hz), correcting over a large field of view, and the potential for a high degree of nonlinearity [53, 51]. As shown in Fig. 1.2, a typical AO system samples the beam going to the science instrument using a dichroic mirror to measure the field at the pupil plane with some form of wavefront sensing device. For the purpose of this thesis we will consider the atmospheric AO problem to be solved by an upstream AO system, or by considering a space-based observatory.

Figure 1.2: Diagram of a generic ground-based atmospheric adaptive optics system. The wavefront sensor and DM are capable of correcting high speed aberrations that appear as phase at the DM (commonly a pupil) plane. Static and quasi-static speckle that the AO system cannot control leaves a residual that must be corrected with focal plane wavefront control techniques.

To correct imperfections in the optical system, we are trying to repair the wavefront to a significantly higher degree of precision (since we are conceptually trying to repair residual static errors). This generally allows us to take approximations that make the problem linear, but the correction time is much slower. As a result, we require common-path techniques that account for the aberrations that reach the image plane. This thesis focuses on model based methods to estimate the electric field at the image plane using only the science camera, and control laws based on the estimated electric field at the image plane (rather than pupil plane) measurements. This estimation and control problem is commonly referred to as focal plane wavefront correction (FPWC).

1.3 The High Contrast Imaging Laboratory

The High Contrast Imaging Laboratory (HCIL) at Princeton tests coronagraphs and wavefront control algorithms for quasi-static speckle suppression. The collimating optic is a six inch off-axis parabola (OAP) followed by two first generation Boston Micromachines kilo-DMs in series and a shaped pupil coronagraph, which is imaged with a second six inch OAP (Figure 1.3). We use a shaped pupil coronagraph, shown in Figure 1.4(a), and described in detail in Belikov et al. [5]. This coronagraph produces a discovery space with a theoretical contrast of in two 90° regions as shown in Figure 1.4(b). At the Princeton HCIL, the aberrations in the system result in an uncorrected average contrast of approximately in the area immediately surrounding the core of the point spread function (PSF), which agrees with the simulations shown in Figure 1.4(d).
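Contrast, as defined in Section 1.2, is the image intensity divided by the peak of the PSF of the unobstructed aperture, and an average contrast is that quantity averaged over a region of interest. A minimal sketch of the bookkeeping follows. The function names, the image values, and the mask are hypothetical and chosen only for illustration; they are not HCIL measurements.

```python
def contrast_map(image, unobstructed_peak):
    """Divide every pixel by the peak of the unobstructed-aperture PSF."""
    return [[pixel / unobstructed_peak for pixel in row] for row in image]

def mean_contrast(image, mask, unobstructed_peak):
    """Average contrast over the pixels where mask is True."""
    contrast = contrast_map(image, unobstructed_peak)
    selected = [c
                for c_row, m_row in zip(contrast, mask)
                for c, m in zip(c_row, m_row) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)

# Hypothetical 2x3 image in detector counts. The last column stands in for
# bright pixels outside the region of interest.
image = [[2.0e-3, 4.0e-3, 9.0e-1],
         [1.0e-3, 5.0e-3, 8.0e-1]]
mask = [[True, True, False],
        [True, True, False]]
peak_counts = 1.0e3  # assumed peak of the unobstructed PSF, same units
average = mean_contrast(image, mask, peak_counts)
```

Normalizing by the unobstructed peak rather than the peak of the coronagraphic image keeps contrast values comparable between coronagraphs with different throughput.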
Since the coronagraph is a binary mask, its contrast performance is fundamentally achromatic, subject only to the physical scaling of the PSF with wavelength. The lab can be configured with either a 635 nm monochromatic laser diode input, or a Koheras supercontinuum source. As shown in Fig. 1.5, before the supercontinuum source is injected into the laboratory experiment, it is first collimated by a 90° off-axis parabolic element designed specifically for collimating/coupling of polychromatic light from a fiber. After the light is collimated it passes through a filter wheel where a set of interference filters allows us to sample narrow bandwidths in a Δλ/λ = 2% range around λ = 635 nm. After the light passes through the filter wheel it is recoupled with a second off-axis element into a second fiber made by Koheras which is designed to be continuously single mode over the entire visible and near-infrared spectrum. This allows us to reproduce the wavelength nature of light coming from a star, the importance of which is discussed in Ch. 5. Since the collimating/coupling elements rigidly attach the fiber tips to the 90° OAPs, alignment of the beam is determined entirely by tip-tilt variation of the collimated beam. To precisely recouple the light back into the delivery fiber, the collimating element is rigidly mounted to the filter wheel and the coupling element is mounted to a tip-tilt stage. To eliminate ghosts, all interference filters have a small wedge between their exterior surfaces. To guarantee a quality alignment for all of the filters for a fixed tip-tilt, they must all be clocked inside the filter wheel so that when they are positioned within the beam, the wedge is aligned in the same direction. To guarantee stability of the coupling (which is sensitive on the sub-micron level) the entire optical train is sealed from the outside environment, eliminating any air flow through the system. With the system very compact and light, sealed, and highly rigid (since the tip-tilt mechanism is very stiff) we observe that the coupling is reliable over a period of weeks to months once it has been aligned. Since the original HCIL experiment had proved to be limited by the stability of its old HeNe laser and its free space coupling into a fiber, this was a critical design parameter for the filtering scheme.

Figure 1.3: Optical layout of the Princeton HCIL. Collimated light is incident on two DMs in series and then propagates through a shaped pupil; the core of the PSF is removed with an image plane mask, and the 90° search areas are reimaged on the final camera.

[Figure 1.4 panels: (a) Shaped Pupil, (b) Ideal PSF, (c) Aberrated Pupil, (d) Aberrated PSF. Pupil axes in mm; PSF axes in λ/D.]

Figure 1.4: Example of the effect of an aberrated field incident on a Shaped Pupil coronagraph. The aberrations are simulated by Fresnel propagating the measured nominal shapes of the HCIL DMs to the pupil plane. Other sources of aberrations are not included because they have not been measured. (d) The PSF of the shaped pupil with the simulated aberrations. The figures are in a log scale, and the log of contrast is shown in the colorbars.

Figure 1.5: Optical Layout of the Princeton HCIL's Filter Wheel. The light from the SuperK supercontinuum fiber is collimated by a Thorlabs reflective collimator (c, passes through a filter wheel which contains narrow band interference filters, and is recoupled into a Koheras Photonic Crystal continuously Single Mode (PCSM) fiber with another reflective coupler (RC8FC). The system is rigid with the exception of the tip-tilt mechanism, which is used to align the beam for coupling back into the PCSM fiber that delivers light onto the bench.

The two source configurations allow for testing of control algorithms in both monochromatic and broadband light (typically 1 2% of the central wavelength). The monochromatic experiments allow us to test controller performance very quickly, while leaving the results independent of any chromatic effects. Once a particular algorithm has been proven in monochromatic light, we can use the polychromatic configuration to test its performance over a larger bandwidth.

1.4 Two Deformable Mirrors in Series

Focal plane wavefront control techniques have primarily been developed and tested at the Jet Propulsion Laboratory (JPL) High Contrast Imaging Testbed (HCIT) [24], the Subaru telescope's Phase Induced Amplitude Apodization (PIAA) testbed (which has been deconstructed) [33], more recently at the NASA Ames Coronagraph Experiment (ACE) [6], and Princeton's HCIL described in Section 1.3. JPL's HCIT is the only experiment in vacuum, and has tested several coronagraphs using the Electric Field Conjugation (EFC) algorithm. The primary goal at the HCIT is to test the limit of ultimate achievable contrast and IWA of each coronagraph and estimation scheme. The Subaru telescope's PIAA testbed was used for the initial verification of the PIAA coronagraph [33], which uses a pair of highly aspheric mirrors as a pupil remapping system to achieve nearly lossless apodization of the pupil for high contrast imaging at low inner working angle. This experiment has moved to JPL's HCIT [34] and progress with the PIAA coronagraph at the Subaru telescope has shifted to the Subaru Coronagraphic Extreme Adaptive Optics (SCExAO) system [5]. The ACE experiment focuses on low inner working angle coronagraphy using the PIAA coronagraph, primarily as a technology demonstrator for critical hardware required in an Explorer class mission. Princeton's HCIL is unique compared to these in that we focus on the development of estimation and control schemes, their efficacy for a true observatory environment, and their ability to relax coronagraphic tolerance. The most distinctive component of the HCIL, the two DMs in series, means that both amplitude and phase aberrations are fully controllable over the entire image plane [55]. The experiments at JPL, Ames, and Subaru only use one DM.
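Why a single DM leaves amplitude errors uncorrected on one side of the image plane can be seen in a linearized, one-dimensional toy model. The sketch below is not the thesis's optical model: it assumes small aberrations, a single phase-only DM conjugate to the pupil, and a plain discrete Fourier transform as the pupil-to-image propagation. All names and values are hypothetical. A real amplitude error a gives an image-plane field FT(a), while a real DM phase ψ gives i·FT(ψ); forcing cancellation at spatial frequency +K then fixes the DM contribution at −K.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform (pupil plane to image plane in this toy)."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * m / n)
                for m, s in enumerate(samples)) for k in range(n)]

def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * m / n)
                for k, c in enumerate(spectrum)) / n for m in range(n)]

N, K = 16, 3  # pupil samples and speckle spatial frequency (toy values)

# Real-valued amplitude error across the pupil: a small cosine ripple.
amplitude_error = [1e-3 * math.cos(2 * math.pi * K * m / N + 0.7)
                   for m in range(N)]
speckle = dft(amplitude_error)  # speckles appear at +K and at -K (index N-K)

# Pick the DM phase spectrum that cancels the speckle at +K:
# i * Psi[K] = -speckle[K]  =>  Psi[K] = i * speckle[K].
dm_spectrum = [0j] * N
dm_spectrum[K] = 1j * speckle[K]
# A physical DM phase is real, which forces the value at -K.
dm_spectrum[N - K] = dm_spectrum[K].conjugate()
dm_phase = idft(dm_spectrum)
max_imag = max(abs(p.imag) for p in dm_phase)  # numerically zero: phase is real

dm_field = dft([p.real for p in dm_phase])
corrected = [a + 1j * d for a, d in zip(speckle, dm_field)]
ratio_plus = abs(corrected[K]) / abs(speckle[K])           # side being nulled
ratio_minus = abs(corrected[N - K]) / abs(speckle[N - K])  # opposite side
```

In this toy model the speckle at +K is removed while the field at −K doubles, so the energy is moved to the other side rather than eliminated. A pure phase error behaves differently: its image-plane field has the same symmetry as the DM contribution, so one real DM phase can cancel it on both sides at once.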
This means they are only capable of correcting phase perturbations on both sides of the image plane, and energy from amplitude aberrations can only be shifted from one side of the PSF to the other. As a result they can reach high levels of contrast on only a single side of the image plane, as shown in Fig. 1.6. The ability to correct symmetrically in the image plane allows us to double the discovery space for planets, but makes the control problem (particularly in broadband light) much more challenging. The

presence of two DMs at planes that are not conjugate to the pupil plane will be an underlying theme of the mathematical development for wavefront estimation and control, as well as of many of the experimental challenges addressed in this thesis. By doubling the discovery space, two DMs in series increase the likelihood of detecting an exoplanet in any mission and add redundancy to the wavefront control system. As a result, many wavefront control architectures for planet finding missions assume a 2-DM system [43, 69, 31, 45, 44, 7, 14]. With Princeton's HCIL being the only laboratory with this capability, the limitations and results of our work provide a unique and relevant body of information for future coronagraphic instrumentation.

Figure 1.6: Single DM FPWC results from (a) JPL's HCIT [23], (b) the Subaru PIAA testbed [33], and (c) NASA's ACE [7]. All three facilities use a single DM for correction, which is why all the results only exhibit a dark hole on a single side of the image plane. As will be shown in Ch. 5, at least two DMs are needed to achieve any amount of amplitude controllability for symmetric dark hole correction.

Pueyo [59] proved that, to first order, two DMs in series could correct both amplitude and phase, showed it was achievable over a bandwidth [54], and has indicated its necessity for coronagraphy on segmented apertures [56]. Kay [4] developed a DM-independent estimation scheme to avoid the compounding effect of optical model error in a two-DM system and used this to generate symmetric dark holes over the largest published areas of the image plane, albeit at more modest contrast levels. We have also begun to develop algorithms that are capable of creating symmetric dark holes over finite bandwidths [29]. Overall, these experiments represent the only body of work dedicated to symmetric

dark hole generation and this thesis is a continuation of that effort.

1.5 Fourier Optics

All of the control software and coronagraph designs that appear in this thesis were produced using Fourier optics. To more fully appreciate their validity and limitations, the relevant integrals are derived here beginning with the Rayleigh-Sommerfeld diffraction integral.

Figure 1.7: Relevant coordinates, vectors, and frames to propagate light from $\Sigma'$ to $\Sigma$.

Looking at Fig. 1.7, we begin by assuming a field on an arbitrary surface, $\Sigma'$, originating from the point $O'$. We wish to propagate this field to a plane, $\Sigma$, centered about $O$, a distance $z$ away. Using a vector notation from Kasdin and Paley [36], we define the chief ray from $O'$ to $O$ as

$$\mathbf{n} = \frac{\mathbf{r}_{O/O'}}{\|\mathbf{r}_{O/O'}\|}. \tag{1.5.1}$$

The Rayleigh-Sommerfeld integral evaluates the incident electric field on an arbitrary point, $p$, in the second plane, $\Sigma$, from every point, $q$, in the first plane, $\Sigma'$. We define the vectors

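from $O'$ to $q$ and from $O'$ to $p$ as

$$\mathbf{r}_{q/O'} = u'\,\mathbf{e}_u + v'\,\mathbf{e}_v, \quad \text{and} \tag{1.5.2}$$

$$\mathbf{r}_{p/O'} = u\,\mathbf{e}_u + v\,\mathbf{e}_v + z\,\mathbf{e}_z. \tag{1.5.3}$$

Thus, we evaluate the vector from $q$ to $p$ as

$$\mathbf{r}_{p/q} = \mathbf{r}_{p/O'} - \mathbf{r}_{q/O'} \tag{1.5.4}$$

$$= (u - u')\,\mathbf{e}_u + (v - v')\,\mathbf{e}_v + z\,\mathbf{e}_z. \tag{1.5.5}$$

With all propagation vectors in place, the Rayleigh-Sommerfeld diffraction integral [25] describes the field at point $p$ as

$$E(p) = \frac{1}{i\lambda} \iint_{\Sigma'} E(q)\, \frac{\cos(\mathbf{n}, \mathbf{r}_{p/q})}{\|\mathbf{r}_{p/q}\|}\, e^{i\frac{2\pi}{\lambda}\|\mathbf{r}_{p/q}\|}\, ds \tag{1.5.6}$$

$$= \frac{1}{i\lambda} \iint_{\Sigma'} E(q)\, \frac{\mathbf{n} \cdot \hat{\mathbf{r}}_{p/q}}{\|\mathbf{r}_{p/q}\|}\, e^{i\frac{2\pi}{\lambda}\|\mathbf{r}_{p/q}\|}\, ds, \tag{1.5.7}$$

where

$$\hat{\mathbf{r}}_{p/q} = \frac{\mathbf{r}_{p/q}}{\|\mathbf{r}_{p/q}\|} \tag{1.5.8}$$

is the unit vector pointing from $q$ to $p$. Even though the result is scalar, the integral is vector-based and quite complicated to solve. We simplify Eq. 1.5.7 by first applying the paraxial approximation. This assumes that lateral deviation from the chief ray is so small that every ray vector has propagated the same distance as the chief ray, making

$$\|\mathbf{r}_{p/q}\| \approx z. \tag{1.5.9}$$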

We also assume that for any combination of $p$ and $q$, $\hat{\mathbf{r}}_{p/q}$ is parallel to $\mathbf{n}$, which means

$$\mathbf{n} \cdot \hat{\mathbf{r}}_{p/q} \approx 1. \tag{1.5.10}$$

Applying these simplifications to the terms preceding the exponential in Eq. 1.5.7, the Rayleigh-Sommerfeld diffraction integral simplifies to

$$E(p) = \frac{1}{i\lambda z} \iint_{\Sigma'} E(q)\, e^{i\frac{2\pi}{\lambda}\|\mathbf{r}_{p/q}\|}\, ds, \tag{1.5.11}$$

which is the vector form of the Huygens-Fresnel integral. The application of the paraxial approximation means that Eq. 1.5.11 is limited to narrow-field imaging. Fortunately, in coronagraphy we are concerned with very small angular separations.

1.5.1 Propagation: Fresnel Transform

It may at first appear strange that we do not apply the approximation of Eq. 1.5.9 to the exponential and further simplify Eq. 1.5.11. This is because the $1/\lambda$ term is very large (of order $10^{6}$ m$^{-1}$ at visible wavelengths), which amplifies small errors due to the approximation and causes rapid $2\pi$-periodic errors in the phase of the integrand. We instead make the Fresnel approximation by taking a first order expansion of the $\|\mathbf{r}_{p/q}\|$ term in the exponential. In scalar coordinates, this becomes

$$\|\mathbf{r}_{p/q}\| = z\sqrt{1 + \left(\frac{u - u'}{z}\right)^2 + \left(\frac{v - v'}{z}\right)^2} \tag{1.5.12}$$

$$\approx z\left[1 + \frac{1}{2z^2}\left[(u - u')^2 + (v - v')^2\right]\right] + O(2). \tag{1.5.13}$$
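To get a feel for the size of the term dropped in Eq. 1.5.13, the sketch below compares the exact path length with its first order expansion for a single worst-case ray. The numbers are illustrative assumptions (a 10 mm square aperture, 550 nm light, a 0.47 m propagation), and the worst case is taken to be a ray joining opposite corners, so that $(u-u')^2 + (v-v')^2 = 2D^2$. The variable names are hypothetical.

```python
import numpy as np

# Illustrative values (assumptions, not a definitive model of the testbed).
wavelength = 550e-9   # m
aperture = 10e-3      # m, side length D of the aperture
z = 0.47              # m, propagation distance

# Worst-case lateral separation squared: opposite corners, 2 * D^2.
rho2 = 2.0 * aperture**2

# Exact path length (Eq. 1.5.12) and its first order expansion (Eq. 1.5.13).
exact = np.sqrt(z**2 + rho2)
fresnel = z * (1.0 + rho2 / (2.0 * z**2))

# Phase error, in radians, made by the expansion inside the exponential.
phase_error = 2.0 * np.pi / wavelength * (fresnel - exact)

# Leading neglected term of the expansion, expressed as a phase.
next_term = np.pi * rho2**2 / (4.0 * wavelength * z**3)

# Distance at which the neglected term equals one radian.
z_one_radian = (np.pi * rho2**2 / (4.0 * wavelength)) ** (1.0 / 3.0)

print(f"phase error at z = {z} m: {phase_error:.3f} rad")
print(f"leading neglected term:   {next_term:.3f} rad")
print(f"one-radian distance:      {z_one_radian * 100:.1f} cm")
```

Under these assumptions the neglected term is about half a radian at 0.47 m and reaches one radian at roughly 38.5 cm, so the expansion is only marginally safe for short propagations with such an aperture.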

With this approximation we find that the Fresnel integral is

$$E(u, v, z) = \frac{e^{i\frac{2\pi}{\lambda}z}}{i\lambda z} \iint_{-\infty}^{+\infty} E(u', v')\, e^{i\frac{\pi}{\lambda z}\left[(u - u')^2 + (v - v')^2\right]}\, du'\, dv' \tag{1.5.14}$$

$$= \mathcal{F}_z\{E(u', v')\}. \tag{1.5.15}$$

These approximations only hold for a narrow field, and the following equations will not accurately reflect the image distortion at large angular separation. The validity of the Fresnel approximation can be determined by evaluating the next highest term in Eq. 1.5.13. The propagation distance required for this term to contribute $\ll 1$ radian in the exponential of Eq. 1.5.11 is

$$z_{min}^3 \gg \frac{\pi}{4\lambda}\left[(u - u')^2 + (v - v')^2\right]^2. \tag{1.5.16}$$

Goodman [25] points out that this is a conservative estimate and makes arguments for a softer constraint if the aperture has no fine structure and is illuminated by uniform plane waves. Otherwise it is safer to apply Eq. 1.5.16 to determine the validity of the integral. In the presence of deformable mirrors and shaped pupil coronagraphs the HCIL does have sinusoidally varying transmission and phase across virtually any plane, so we must in fact check that Eq. 1.5.16 is satisfied. Thus, for the HCIL's standard 10 mm pupil and 550 nm light (the shortest wavelength used in the laboratory), a Fresnel transform only accurately reflects the diffraction pattern beyond a distance of $z_{min} = 38.5$ cm. The shortest free space propagation in the laboratory is between the two DMs, a distance of 47 cm, which exceeds $z_{min}$ by a factor of 1.2. Knowing this is a conservative calculation, this is likely large enough, particularly since the incident field on DM1 is a uniform plane wave and there are no amplitude variations. The distance from DM2 to the pupil plane, which is the last plane to which the Fresnel integral must be applied, is 1.13 m $\approx 3 z_{min}$. Since the amplitude non-uniformity at progressive planes is a direct result of phase to amplitude mixing, larger

distances would actually make the non-uniformity in the field worse. Thus it is likely that larger propagation distances for such a small aperture would not solve the problem, and the only solution is to use a higher fidelity integral if more precision is required. To date, the Fresnel integral has not been found to be a limiting factor, but this should be taken as a point of caution since two DMs in series have never produced dark holes below the levels reported in this thesis.

1.5.2 Imaging: Fourier Transform

The advantage of Fourier optics is the simplicity with which we can relate the pupil and image planes. To show this, we will consider a field incident on a lens, $E_{in}$, and evaluate the field one focal length downstream of the optic. We first define a lens operator, which describes the field exiting the lens as

$$E_{out} = \mathcal{L}\{E_{in}\}. \tag{1.5.17}$$

In a true optic with finite thickness (even a mirror), the operator would be nonlinear and require a continuous integral in $z$ over the entire displacement of the lens surface, or sag. This complicates computing the diffractive effect of a lens, so we seek an approximation to simplify the computation. We will first assume that the lens has a parabolic profile. In ray optics, a parabolic lens has the merit of maintaining an equal path length between the focal point and collimated plane of the optic, regardless of the ray we choose. This implies a constant phase change over the entire field at the output, but to see a similar advantage in diffractive optics we must assume that the optic is infinitely thin. With its effect being confined to a plane, its parabolic shape only contributes a phase to the incident field. Since the primary imaging optic is an F/10 parabolic surface with a 6 inch diameter, and we underfill the mirror by a factor of 15, the sag of the optic is less than a millimeter (< 1%) of the relevant aperture. Additionally, we are observing sufficiently on-axis that any effect on the

image from the mirror curvature will not be significant, making the infinitely thin, parabolic phase assumption valid in our system. Defining this parabolic phase to be a function of the focal length of the optic, $f$, the lens operator simplifies to

$$\mathcal{L}\{E_{in}(\xi, \eta)\} = E_{in}(\xi, \eta)\, e^{-i\frac{\pi}{\lambda f}(\xi^2 + \eta^2)}. \tag{1.5.18}$$

The thin lens approximation is increasingly valid for larger $F/\# = f/D$ systems because the sag is small. To propagate the output field from the lens to the image plane using Eq. 1.5.14, we will also assume that the optic is of infinite extent. The most rigorous way to quantify this is to show that nearly 100% of the energy is contained within the optic, which typically requires that the imaging optic be oversized compared to the incident beam by at least a factor of two. Since our optic is oversized by a factor of 15, we are in fact containing nearly all of the energy from the pupil. Applying the Fresnel integral (Eq. 1.5.14) to the field $\mathcal{L}\{E_1(\xi, \eta)\}$, we find the field one focal length downstream of the optic (Fig. 1.8) to be

$$E_{im}(x, y, f) = \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f} \iint_{-\infty}^{+\infty} \mathcal{L}\{E_1(\xi, \eta)\}\, e^{i\frac{\pi}{\lambda f}\left[(x - \xi)^2 + (y - \eta)^2\right]}\, d\xi\, d\eta \tag{1.5.19}$$

$$= \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f} \iint_{-\infty}^{+\infty} E_1(\xi, \eta)\, e^{-i\frac{\pi}{\lambda f}(\xi^2 + \eta^2)}\, e^{i\frac{\pi}{\lambda f}\left[(x - \xi)^2 + (y - \eta)^2\right]}\, d\xi\, d\eta$$

$$= \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f}\, e^{i\frac{\pi}{\lambda f}(x^2 + y^2)} \iint_{-\infty}^{+\infty} E_1(\xi, \eta)\, e^{-i\frac{2\pi}{\lambda f}\left[\xi x + \eta y\right]}\, d\xi\, d\eta$$

$$= \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f}\, e^{i\frac{\pi}{\lambda f}(x^2 + y^2)}\, \mathcal{F}\{E_1(\xi, \eta)\}. \tag{1.5.20}$$

Neglecting the piston in phase, this shows that the electric field at this final plane is a quadratic phase factor multiplied by the Fourier transform, $\mathcal{F}\{\cdot\}$, of the incident field on an infinitely thin parabolic lens of infinite spatial extent. We define this plane as the image plane, which corresponds to the focal point of the optic in a ray trace. Now we will consider the special case shown in Fig. 1.8, where we generate the incident

field on the optic, $E_1$, by Fresnel propagating an arbitrary field, $E_0$, from one focal length upstream of the lens.

Figure 1.8: The relevant planes for Fourier imaging in astronomical optics are the field, $E_0$, at an arbitrary plane a distance $z$ prior to the imaging optic and the field immediately incident on the plane of the optic, $E_1$. In both cases the image plane electric field, $E_{im}$, is located one focal length, $f$, downstream of the optic.

In order to describe the image plane field in Eq. 1.5.20 as a function of $E_0$ we will relate it to the field $E_1$ via the Fresnel integral (Eq. 1.5.14). It is pointed out by Goodman [25] (and many optics textbooks) that this is simply the convolution

$$E_1(\xi, \eta) = E_0 * h = \iint_{-\infty}^{+\infty} E_0(u, v)\, h(\xi - u, \eta - v)\, du\, dv, \tag{1.5.21}$$

where the kernel of this integral is

$$h(\xi, \eta) = \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f}\, e^{i\frac{\pi}{\lambda f}\left[\xi^2 + \eta^2\right]}. \tag{1.5.22}$$

Recognizing that we ultimately seek the Fourier transform of $E_1(\xi, \eta)$, not $E_1(\xi, \eta)$ itself, we take the Fourier transform of Eq. 1.5.21 directly. By applying the Fourier convolution

theorem, we find

$$\mathcal{F}\{E_1(\xi, \eta)\} = \mathcal{F}\{E_0\}\, \mathcal{F}\{h\} \tag{1.5.23}$$

$$= \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f}\, \mathcal{F}\{E_0\} \iint_{-\infty}^{+\infty} e^{i\frac{\pi}{\lambda f}\left[\xi^2 + \eta^2\right]}\, e^{-i\frac{2\pi}{\lambda f}\left[\xi x + \eta y\right]}\, d\xi\, d\eta$$

$$= \frac{e^{i\frac{2\pi}{\lambda}f}}{i\lambda f}\, \mathcal{F}\{E_0\}\, e^{-i\frac{\pi}{\lambda f}(x^2 + y^2)} \iint_{-\infty}^{+\infty} e^{i\frac{\pi}{\lambda f}\left[(x - \xi)^2 + (y - \eta)^2\right]}\, d\xi\, d\eta$$

$$= e^{i\frac{2\pi}{\lambda}f}\, e^{-i\frac{\pi}{\lambda f}(x^2 + y^2)}\, \mathcal{F}\{E_0\}. \tag{1.5.24}$$

Applying Eq. 1.5.24 to Eq. 1.5.20, the quadratic phase factors cancel and the image plane field becomes

$$E_{im}(x, y, f) = \frac{e^{i\frac{4\pi}{\lambda}f}}{i\lambda f}\, \mathcal{F}\{E_0(u, v)\}. \tag{1.5.25}$$

Thus, apart from a piston phase term, Eq. 1.5.25 shows that the electric field one focal length downstream of the optic, defined earlier as the image plane, is an exact Fourier transform of the electric field incident one focal length upstream of the optic. We define this as the pupil plane. Throughout this thesis the image and pupil are defined as exact Fourier conjugates of one another. For the purposes of FPWC we will use the image plane as a reference, requiring that a pupil be exactly one focal length in front of an imaging optic. The constant phase term in Eq. 1.5.25 is of no consequence since the original field is based on an arbitrary reference field with zero phase, and it is neglected in many texts. The pupil plane at the HCIL is 10 mm in diameter, designed to inscribe the kilo-DM aperture. We use a 152.4 mm diameter OAP with a 1.524 meter focal length to form the primary image. The system is F/152.4 at the primary image, and the overfill factor of the imaging optic is 15.24, more than adequate to use the infinite optics approximation in the control algorithms.
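The quoted focal ratio and overfill factor follow from simple ratios, sketched below. The pupil diameter, optic diameter, and focal length are taken here as 10 mm, 152.4 mm, and 1.524 m, values inferred from the quoted F/152.4 and 15.24 figures, and the 635 nm wavelength used for the size of one resolution element ($\lambda f / D$) is an assumption for illustration only.

```python
# Assumed geometry (inferred from the quoted focal ratio and overfill factor).
pupil_diameter = 10e-3      # m
optic_diameter = 152.4e-3   # m, 6 inch off-axis parabola
focal_length = 1.524        # m
wavelength = 635e-9         # m, assumed illustrative wavelength

# Focal ratio of the beam at the primary image: F/# = f / D.
f_number = focal_length / pupil_diameter

# How much larger the imaging optic is than the beam it receives.
overfill = optic_diameter / pupil_diameter

# Physical size of one resolution element, lambda * f / D, in the image plane.
resolution_element = wavelength * f_number

print(f"F/# = {f_number:.1f}")
print(f"overfill factor = {overfill:.2f}")
print(f"lambda f / D = {resolution_element * 1e6:.1f} micrometers")
```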

1.6 Controllability of Amplitude and Phase

Before we start producing control algorithms, we must prove that by placing two DMs in non-conjugate planes we make both amplitude and phase aberrations controllable, allowing us to create symmetric dark holes in the image plane. We will do so in two different ways. First, we follow the work of Pueyo [59] and make an argument based in the pupil plane that two DMs in series are capable of correcting both amplitude and phase. Since we ultimately seek controllability at the image plane, we will also prove controllability by directly modeling the effect on the image plane electric field from an arbitrary perturbation at a non-conjugate plane. In addition to proving the controllability of both amplitude and phase at the image plane, the result will also provide a model that we can use in the estimation and control algorithms of Chs. 2, 3, and 4. In both cases we will treat the modulation of the DM shape as a perturbation to the electric field at an arbitrary plane, $p$, upstream of the pupil, as shown in Fig. 1.9. We account for the fact that the DM is of finite size by including the aperture function, $A_p$, but the perturbation induced by the DM is exclusively in phase.

Figure 1.9: Phase perturbations at plane $p$ are propagated to the pupil plane, defined as being one focal length away from the imaging optic. The lens acts as a Fourier transforming device, producing the image plane electric field one focal length after the optic.

1.6.1 Pupil Plane Controllability: Angular Spectrum

Beginning with the controllability proof by Pueyo [59], we decompose the intermediate plane, $p$, shown in Fig. 1.9 into a Fourier series. We then Fresnel propagate this field to the pupil plane and apply the pupil function, $A_{pup}$, to find the pupil plane electric field, $E_{pup}$. Since we are ultimately decomposing the solution to the Fresnel integral into a Fourier series, this is commonly referred to as the angular spectrum approximation. Given a unit-amplitude input field incident on a DM at plane $p$ that is inducing a phase perturbation, $\phi$, over the aperture, $A_p$, the electric field is given by

$$E_p(\xi, \eta) = A_p(\xi, \eta)\, e^{i\phi(\xi, \eta)}. \tag{1.6.1}$$

Next we expand the exponential as a Taylor series and take a first order approximation, assuming that the phase perturbations are small enough that the second order term in the expansion is negligible. The linearized field is

$$E_p(\xi, \eta) \approx A_p(\xi, \eta)\left[1 + i\phi(\xi, \eta)\right]. \tag{1.6.2}$$

We now decompose the phase perturbations at plane $p$ (Fig. 1.9), $\phi(\xi, \eta)$, into a sum of spatial frequencies. Following notation similar to Pueyo [59], we describe $\phi$ as a Fourier series in Cartesian coordinates. Summing over the integers $(n, m) \in [-\infty, \infty]$ with amplitude $b_{m,n}$, the linearized phase perturbation induced by the DM is

$$\phi(\xi, \eta) = \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(m\xi + n\eta)}. \tag{1.6.3}$$
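The size of the error made by the linearization in Eq. 1.6.2 can be checked numerically. The sketch below is only an illustration under assumed values: it applies a single real sinusoidal phase ripple of amplitude `b` radians (a hypothetical 4 cycles across an assumed 10 mm aperture) and compares $e^{i\phi}$ with $1 + i\phi$. The Taylor remainder bounds the error by $b^2/2$.

```python
import numpy as np

def linearization_error(b, cycles=4, aperture=10e-3, samples=1001):
    """Largest |exp(i*phi) - (1 + i*phi)| for a sinusoidal phase of amplitude b.

    All arguments are illustrative assumptions: b is in radians, cycles is the
    number of ripple periods across the aperture, aperture is in meters.
    """
    xi = np.linspace(-aperture / 2, aperture / 2, samples)
    phi = b * np.cos(2 * np.pi * cycles * xi / aperture)
    exact = np.exp(1j * phi)
    linear = 1 + 1j * phi
    return np.max(np.abs(exact - linear))

for b in (0.01, 0.1, 0.5):
    err = linearization_error(b)
    print(f"b = {b:4.2f} rad: max error = {err:.2e} (bound b^2/2 = {b**2 / 2:.2e})")
```

The error grows quadratically with the ripple amplitude, which is why the first order model is only trusted for small DM strokes.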

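Applying Eq. 1.6.3 to Eq. 1.6.2, the linearized field at plane $p$ becomes

$$E_p(\xi, \eta) \approx A_p(\xi, \eta) + iA_p(\xi, \eta) \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(m\xi + n\eta)} \tag{1.6.4}$$

$$= E_{nom} + E_{pert}, \tag{1.6.5}$$

where we have defined the unperturbed, or nominal, component of the field as $E_{nom}$ and the perturbed component of the field induced by the DM as $E_{pert}$. The field due to the DM perturbation can now be propagated a distance $z$ to a second plane via a Fresnel transformation. For simplicity in notation we assume that this is the pupil plane, as shown in Fig. 1.9. Applying Eq. 1.5.14, the effect of the perturbation, $E_{pert}$, at the pupil plane is

$$E_{pert,pup}(u, v) = i \sum_{m,n} b_{m,n}\, \mathcal{F}_z\left\{A_p(\xi, \eta)\, e^{i\frac{2\pi}{D}(m\xi + n\eta)}\right\} \tag{1.6.6}$$

$$= i \sum_{m,n} b_{m,n}\, \frac{e^{i\frac{2\pi}{\lambda}z}}{i\lambda z} \iint_{-\infty}^{+\infty} A_p(\xi, \eta)\, e^{i\frac{2\pi}{D}(m\xi + n\eta)}\, e^{i\frac{\pi}{\lambda z}\left[(u - \xi)^2 + (v - \eta)^2\right]}\, d\xi\, d\eta. \tag{1.6.7}$$

Applying the coordinate transformations

$$(u - \xi)^2 = \left(u - \xi - \frac{m\lambda z}{D}\right)^2 - \left(\frac{m\lambda z}{D}\right)^2 + 2u\left(\frac{m\lambda z}{D}\right) - 2\xi\left(\frac{m\lambda z}{D}\right),$$

$$(v - \eta)^2 = \left(v - \eta - \frac{n\lambda z}{D}\right)^2 - \left(\frac{n\lambda z}{D}\right)^2 + 2v\left(\frac{n\lambda z}{D}\right) - 2\eta\left(\frac{n\lambda z}{D}\right),$$

the sum of spatial frequencies can be pulled out of the integral, making the field

$$E_{pert,pup}(u, v) = i \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(mu + nv)}\, e^{-i\frac{\pi\lambda z}{D^2}(m^2 + n^2)}\, \frac{e^{i\frac{2\pi}{\lambda}z}}{i\lambda z} \iint_{-\infty}^{+\infty} A_p(\xi, \eta)\, e^{i\frac{\pi}{\lambda z}\left[\left(u - \xi - \frac{m\lambda z}{D}\right)^2 + \left(v - \eta - \frac{n\lambda z}{D}\right)^2\right]}\, d\xi\, d\eta. \tag{1.6.8}$$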
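The completing-the-square step that takes Eq. 1.6.7 to Eq. 1.6.8 can be sanity-checked numerically in one dimension. The sketch below compares the phase of the integrand before and after the coordinate transformation for random sample points. The wavelength, distance, and aperture size are illustrative assumptions, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for the check only).
wavelength = 635e-9   # m
z = 1.6               # m
D = 10.8e-3           # m

count = 1000
m = rng.integers(-16, 17, size=count)        # spatial frequency index
u = rng.uniform(-D / 2, D / 2, size=count)   # output-plane coordinate
xi = rng.uniform(-D / 2, D / 2, size=count)  # input-plane coordinate

shift = m * wavelength * z / D

# Phase of the integrand in Eq. 1.6.7 (one dimension).
before = np.pi / (wavelength * z) * (u - xi) ** 2 + 2 * np.pi / D * m * xi

# Phase after the transformation: shifted kernel, a tilt in u, and a
# constant rotation that depends on the square of the spatial frequency.
after = (
    np.pi / (wavelength * z) * (u - xi - shift) ** 2
    + 2 * np.pi / D * m * u
    - np.pi * wavelength * z / D**2 * m**2
)

max_mismatch = np.max(np.abs(before - after))
print(f"largest phase mismatch: {max_mismatch:.2e} rad")
```

The mismatch is at the level of floating point round-off, which supports the signs of the shifted kernel and of the quadratic rotation term.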

The result is the angular spectrum approximation. It is the same perturbation expansion as Eq. 1.6.7, but simplifies the integrand of the Fresnel transform by using shifted output coordinates,

$$u' = u - \frac{m\lambda z}{D}, \tag{1.6.9}$$

$$v' = v - \frac{n\lambda z}{D}. \tag{1.6.10}$$

We indicate this coordinate shift in our operator notation with a subscript, $\mathcal{F}_z\{\cdot\}_{(u',v')}$. Applying this notation, Eq. 1.6.8 is written as

$$E_{pert,pup}(u, v) = i \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(mu + nv)}\, e^{-i\frac{\pi\lambda z}{D^2}(m^2 + n^2)}\, \mathcal{F}_z\{A_p(\xi, \eta)\}_{(u',v')}. \tag{1.6.11}$$

Eq. 1.6.11 shows that the field at the pupil due to the perturbation at plane $p$ can be computed as a sum of shifted and complex weighted copies of the Fresnel transformed nominal field, $A_p$. Each term is shifted according to the corresponding spatial frequency in the summation. Examining the value of this shift for the given laboratory parameters,

$\lambda_{max} \approx 700$ nm
$n_{max}, m_{max} = 16$ cycles/aperture
$z_{max} \approx 1.6$ m
$D = 10.8$ mm,

we see that the maximum shift relevant for wavefront control is

$$\frac{n_{max}\lambda_{max}z_{max}}{D} = 1.65 \text{ mm}. \tag{1.6.12}$$

In our nominal control configuration at 10 cycles/aperture in 635 nm light this value becomes less than 1 mm. Even so, this is a significant fraction of the 10 mm pupil diameter used

in the laboratory, so these shifts should not be neglected if this transformation were to be used to compute the field for control. Interestingly, the sharp edges of a shaped pupil mean that we cannot truncate the series to a small set for fear of introducing a Gibbs effect into the numerical model. This precludes the utility of the angular spectrum factor for control, but it is convenient for rigorously proving that (to first order) two DMs can control both amplitude and phase aberrations. Recalling from Eq. 1.6.5 that the entire field incident on the pupil is a sum of the nominal field and the perturbed component, we write the incident field on the pupil plane as

$$E_{pup}(u, v) = \mathcal{F}_z\{A_p(\xi, \eta)\} + i \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(mu + nv)}\, e^{-i\frac{\pi\lambda z}{D^2}(m^2 + n^2)}\, \mathcal{F}_z\{A_p(\xi, \eta)\}_{(u',v')}. \tag{1.6.13}$$

Applying the pupil function, $A_{pup}(u, v)$, to the incident field we find

$$E_{pup}(u, v) = A_{pup}(u, v)\, \mathcal{F}_z\{A_p(\xi, \eta)\} + i \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(mu + nv)}\, e^{-i\frac{\pi\lambda z}{D^2}(m^2 + n^2)}\, A_{pup}(u, v)\, \mathcal{F}_z\{A_p(\xi, \eta)\}_{(u',v')}. \tag{1.6.14}$$

We will make one more simplification to Eq. 1.6.14 by assuming that the aperture function of the DM, $A_p(\xi, \eta)$, sufficiently overfills the pupil plane aperture, $A_{pup}$, such that

$$A_{pup}(u, v)\, \mathcal{F}_z\{A_p\}_{(u,v)} \approx A_{pup}(u, v)\, \mathcal{F}_z\{A_p\}_{(u',v')} \approx A_{pup}(u, v). \tag{1.6.15}$$

This assumption is non-intuitive because we have decomposed the field into spatial frequencies, but it is equivalent to assuming that the Fresnel ringing from the edges of the DM is negligible because it is blocked by the aperture at the pupil plane, $A_{pup}(u, v)$. Under this assumption the perturbed field at the pupil plane, Eq. 1.6.14, simplifies to

$$E_{pup}(u, v) \approx A_{pup}(u, v)\left[1 + i \sum_{m,n} b_{m,n}\, e^{i\frac{2\pi}{D}(mu + nv)}\, e^{-i\frac{\pi\lambda z}{D^2}(m^2 + n^2)}\right]. \tag{1.6.16}$$
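The two quantities that depend on propagation distance in this development, the lateral shift $m\lambda z/D$ and the phase rotation $\pi\lambda z m^2/D^2$ in Eq. 1.6.16, are easy to evaluate. The sketch below does so for the laboratory parameters quoted above (700 nm, 16 cycles/aperture, 1.6 m, 10.8 mm) and for the nominal configuration, under the assumption that the nominal case uses the same distance and aperture. The helper name is hypothetical, and only a one-dimensional spatial frequency ($n = 0$) is considered.

```python
import numpy as np

def shift_and_rotation(cycles, wavelength, z, D):
    """Return the lateral shift (m) and phase rotation (rad) of one term.

    cycles is the spatial frequency in cycles per aperture, with n = 0.
    """
    shift = cycles * wavelength * z / D
    rotation = np.pi * wavelength * z * cycles**2 / D**2
    return shift, rotation

# Worst case quoted in the text.
worst_shift, worst_rotation = shift_and_rotation(16, 700e-9, 1.6, 10.8e-3)

# Nominal configuration: 10 cycles/aperture in 635 nm light (same z and D assumed).
nominal_shift, nominal_rotation = shift_and_rotation(10, 635e-9, 1.6, 10.8e-3)

print(f"worst case: shift = {worst_shift * 1e3:.2f} mm, rotation = {worst_rotation:.2f} rad")
print(f"nominal:    shift = {nominal_shift * 1e3:.2f} mm, rotation = {nominal_rotation:.2f} rad")
```

With these inputs the worst-case shift evaluates to about 1.66 mm, matching the quoted 1.65 mm to within rounding of the inputs, and the nominal shift is about 0.94 mm. The rotation of several radians shows how strongly a purely imaginary perturbation at plane $p$ is mixed into real and imaginary parts by the time it reaches the pupil.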

To prove controllability of amplitude and phase, Pueyo [59] first points out that if the phase perturbations are small enough we may approximate the aberrated electric field, $E_{abr}$, with arbitrary amplitude aberrations, $A_{abr}$, and phase aberrations, $\phi_{abr}$, as

$$E_{abr}(u, v) = A_{abr}(u, v)\, e^{i\phi_{abr}(u, v)} \tag{1.6.17}$$

$$\approx A_{abr}(u, v)\left(1 + i\phi_{abr}(u, v)\right). \tag{1.6.18}$$

Thus, in the linear approximation, phase aberrations are purely imaginary and amplitude aberrations are purely real in the pupil. He then points out that the quadratic exponential in Eq. 1.6.16 effectively rotates the phase of each term in the series, making the contribution of each term a mixed complex value rather than a purely imaginary term. By linearizing this quadratic term and assuming there is a second DM exactly at the pupil plane, Pueyo [59] used a phasor representation to show that both real and imaginary aberrations can be perfectly conjugated at the pupil plane if we use a second DM at plane $p$ to perturb the field. Since each component of the series summation can exactly conjugate an aberration at the pupil, it follows that these aberrations have been suppressed in the image plane [59].

1.6.2 Image Plane Controllability: The Propagation Factor

The argument made in Pueyo [59] was the first rigorous proof that DMs at non-conjugate planes were capable of controlling both amplitude and phase, but it required several approximations and linearizations of field perturbations at both the intermediate and pupil planes. We would like to understand the effect of a truly arbitrary perturbation from a DM at plane $p$ on the image plane because this is ultimately where we want to control the field. Doing so necessitates propagating a non-conjugate plane all the way to the image plane so that we can compute the control effect, and evaluate how well we can correct a complex-valued field as a function of location in the image plane.

Figure 1.10: Interferometric measurement of the uncontrolled shape of the HCIL's two kilo-DMs: (a) DM1 nominal shape, (b) DM2 nominal shape (axes $x$ and $y$ in mm, surface profile in nm). Note the complex structure of the nominal surface. The amplitude of the high spatial frequency component is 20–30 nm on each surface. Low order spherical and cylindrical modes have amplitudes on the order of 100s of nanometers.

Since we are also trying to produce a numerical model for control, we begin by assessing the accuracy of numerically propagating the DM surfaces via the Fresnel integral. As shown in Fig. 1.10, the nominal surface of each DM is quite complex and contains high spatial frequency errors with amplitudes of approximately 5% of the wavelength used in the experiment. Indeed, the high resolution of the DM actuation also implies commands that oscillate rapidly across the aperture. Capturing this well would require very high sampling when we solve the Fresnel integral numerically, dramatically increasing our sensitivity to numerical error and making the computation very slow (a serious drawback for real-time control). We seek to simplify our transformation from plane $p$ to the image plane so that we will be less sensitive to the discretization of our numerical integrator. This simplification will also clearly demonstrate how the image plane electric field, $E_{im}(x, y)$, changes as we move the DM a distance $z$ from the pupil. Beginning with Eq. 1.6.1, we Fresnel propagate an arbitrary field at plane $p$ to the pupil

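plane. Applying the pupil function $A_{pup}$, the total field at the pupil plane is

$$E_{pup}(u, v) = A_{pup}(u, v)\, \mathcal{F}_z\left\{A_p(\xi, \eta)\, e^{i\phi(\xi, \eta)}\right\} \tag{1.6.19}$$

$$= A_{pup}(u, v)\, \frac{e^{i\frac{2\pi}{\lambda}z}}{i\lambda z} \iint_{-\infty}^{+\infty} A_p(\xi, \eta)\, e^{i\phi(\xi, \eta)}\, e^{i\frac{\pi}{\lambda z}\left[(u - \xi)^2 + (v - \eta)^2\right]}\, d\xi\, d\eta \tag{1.6.20}$$

$$= \left[A_{pup}A_p\, e^{i\phi}\right] * h(u, v), \tag{1.6.21}$$

where the kernel of the convolution is

$$h(u, v) = \frac{e^{i\frac{2\pi}{\lambda}z}}{i\lambda z}\, e^{i\frac{\pi}{\lambda z}(u^2 + v^2)}. \tag{1.6.22}$$

Using the Fourier transform relationship between the image and pupil plane given by Eq. 1.5.25, the field at the image plane is

$$E_{im}(x, y) = \frac{e^{i\frac{4\pi}{\lambda}f}}{i\lambda f}\, \mathcal{F}\left\{\left[A_{pup}A_p\, e^{i\phi}\right] * h(u, v)\right\} \tag{1.6.23}$$

$$= \frac{e^{i\frac{4\pi}{\lambda}f}}{i\lambda f}\, \mathcal{F}\left\{A_{pup}A_p\, e^{i\phi}\right\} \mathcal{F}\{h(u, v)\} \tag{1.6.24}$$

$$= -\frac{e^{i\frac{2\pi}{\lambda}(2f + z)}}{\lambda^2 f z}\, \mathcal{F}\left\{A_{pup}A_p\, e^{i\phi}\right\} \iint_{-\infty}^{+\infty} e^{i\frac{\pi}{\lambda z}(u^2 + v^2)}\, e^{-i\frac{2\pi}{\lambda f}\left[ux + vy\right]}\, du\, dv \tag{1.6.25}$$

$$= -\frac{e^{i\frac{2\pi}{\lambda}(2f + z)}}{\lambda^2 f z}\, \mathcal{F}\left\{A_{pup}A_p\, e^{i\phi}\right\} e^{-i\frac{\pi z}{\lambda f^2}(x^2 + y^2)} \iint_{-\infty}^{+\infty} e^{i\frac{\pi}{\lambda z}\left[\left(u - \frac{z}{f}x\right)^2 + \left(v - \frac{z}{f}y\right)^2\right]}\, du\, dv \tag{1.6.26}$$

$$= \frac{e^{i\frac{2\pi}{\lambda}(2f + z)}}{i\lambda f}\, \mathcal{F}\left\{A_{pup}A_p\, e^{i\phi}\right\} e^{-i\frac{\pi z}{\lambda f^2}(x^2 + y^2)}. \tag{1.6.27}$$

By applying a quadratic phase term at the image plane, Eq. 1.6.27 accounts for the propagation of an arbitrary field from plane $p$ to the pupil plane. We refer to this quadratic term as the propagation factor. For control, the power of this result lies in the fact that we do not have to integrate the field twice to compute the field at the image plane from a perturbation at plane $p$. Note that in most scenarios, particularly with a shaped pupil coronagraph,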

we can simplify Eq. 1.6.27 by recognizing that the aperture function will underfill the DM aperture, making $A_{pup}A_p \approx A_{pup}$. This is not a critical simplification, but makes the result slightly easier to understand since the PSF is dominated by the coronagraph.

The propagation factor in Eq. 1.6.27 mixes the effect of the DM's perturbation to the electric field, $\phi$, between real and imaginary parts in the image plane. One DM at a non-conjugate plane cannot correct both amplitude and phase aberrations by itself, but rather corrects a specific mixture of the two dictated by the propagation factor. Having established one DM a distance $z = z_p$ away from the pupil, we will include a second DM at yet another plane, $q$, a distance $z_q$ away from the pupil. By assuming that the contribution of each DM to the image plane electric field is additive for sufficiently small stroke, as proven in Pueyo and Kasdin [54] and Pueyo et al. [55], we aim to show sufficient coverage of the real and imaginary parts of the electric field in the image plane. It is important to note that showing independent and simultaneous control of the real and imaginary parts of the field is equivalent to demonstrating control over amplitude and phase. The choice of one over the other is a matter of convenience in representing the field. We will find in the estimation and control chapters that it is more convenient to represent the image plane in real and imaginary parts, so for the sake of consistency we will also prove controllability of the image plane electric field in the same manner.

The choice of the propagation distance for each DM, $z_p$ and $z_q$, to the pupil is not critical except to guarantee enough phase to amplitude mixing once the field reaches the pupil. In a conventional system, one DM would be conjugate to the pupil and the second non-conjugate to the pupil. However, this has the disadvantage of requiring additional re-imaging optics to conjugate one mirror to the pupil. We will show that conjugating one DM to the pupil also reduces our coverage of the image plane, meaning that we cannot maintain good controllability over both the real and imaginary parts of the field across the entire image plane. To clarify this second point, we examine the contribution of a DM in Eq. 1.6.27 when it is non-conjugate to the pupil. Applying Euler's formula to Eq. 1.6.27, we find the real and imaginary components

of the field to be

$$E_{im}(x, y) = -\frac{e^{i\frac{2\pi}{\lambda}(2f + z)}}{\lambda f}\, \mathcal{F}\left\{A_{pup}A_p\, e^{i\phi}\right\} \left[\sin\left(\frac{\pi z}{\lambda f^2}(x^2 + y^2)\right) + i\cos\left(\frac{\pi z}{\lambda f^2}(x^2 + y^2)\right)\right], \tag{1.6.28}$$

where the factor $1/i$ of Eq. 1.6.27 has been absorbed into the bracket. We now see that the contribution of the propagation factor for a single DM oscillates between being purely real and purely imaginary across the image plane. The consequence is that one DM non-conjugate to the pupil exhibits regions in the image plane with poor mixing between real and imaginary components of the field. Using relevant parameters from the HCIL, and choosing $z_p = z_1$ for DM1 and $z_q = z_2$ for DM2, Fig. 1.11 shows the overall effect this has in the image plane by plotting the magnitude of the real term in Eq. 1.6.28, $\left|\sin\left(\frac{\pi z}{\lambda f^2}(x^2 + y^2)\right)\right|$. Fig. 1.11(a) and Fig. 1.11(b) show that there is a rapid oscillation in the degree to which the control effect becomes either purely real or purely imaginary, even at low to mid spatial frequencies. By choosing $z_2$ such that its propagation factor oscillates with a different period across the image plane, we have the ability to simultaneously control the real and imaginary parts of the field over the entire image plane. For example, Fig. 1.11(a) shows that the magnitude of the real component of the first DM's propagation factor is very close to zero at approximately $10\,\lambda/D$. Looking at Fig. 1.11(b), we see that the magnitude of the real term for the second DM's propagation factor is much larger there. The fact that the mixing from each propagation factor is different means that we can arbitrarily correct either the real or imaginary part of the field at this location of the image plane. Another way to see this is to actuate the same spatial frequency, $w$, on each DM, each with its own amplitude, $b_1$ and $b_2$. Scaling by the wavelength, $\lambda$, the aperture size, $D$, and the focal length of the imaging optic, $f$, we choose DM1's perturbation to be

$$\phi_1(\xi, \eta) = b_1\cos\left(\frac{2\pi w D\xi}{\lambda f}\right), \tag{1.6.29}$$

where $\xi$ is the physical coordinate in the DM1 plane.

Figure 1.11: Effect of the DM propagation on the real and imaginary parts of the electric field at the image plane, with axes in units of $\lambda/D$. Panels: (a) DM1 propagation factor, (b) DM2 propagation factor, (c) real part combined, (d) imaginary part combined. (a) and (b) show the magnitude of the real part due to the angular spectrum factor. (c) and (d) overlay the contribution of both DMs to the real and imaginary parts of the field respectively, indicating that there is good coverage of both real and imaginary terms in the search area up to the control limit of the DM. This indicates that the system has a high degree of controllability over both amplitude and phase in monochromatic light.

We also apply the same shape for

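DM2, making its perturbation to the field

$$\phi_2(\sigma, \tau) = b_2\cos\left(\frac{2\pi w D\sigma}{\lambda f}\right), \tag{1.6.30}$$

where $\sigma$ is the physical coordinate in the DM2 plane. We now assume that $b_1$ and $b_2$ are small enough that

$$A_1(\xi, \eta)\, e^{i\phi_1(\xi, \eta)} \approx A_1(\xi, \eta)\left(1 + i\phi_1(\xi, \eta)\right), \tag{1.6.31}$$

$$A_2(\sigma, \tau)\, e^{i\phi_2(\sigma, \tau)} \approx A_2(\sigma, \tau)\left(1 + i\phi_2(\sigma, \tau)\right). \tag{1.6.32}$$

We can then use Eq. 1.6.27 to write the perturbation incident on the image plane from DM1 and DM2 as

$$E_{pert,DM1} = \frac{e^{i\frac{2\pi}{\lambda}(2f + z_1)}}{i\lambda f}\, \mathcal{F}\left\{iA_{pup}\, b_1\cos\left(\frac{2\pi w D\xi}{\lambda f}\right)\right\} e^{-i\frac{\pi z_1}{\lambda f^2}(x^2 + y^2)} \tag{1.6.33}$$

$$= \frac{b_1}{2}\, \frac{e^{i\frac{2\pi}{\lambda}(2f + z_1)}}{\lambda f}\, e^{-i\frac{\pi z_1}{\lambda f^2}(x^2 + y^2)}\, \mathcal{F}\{A_{pup}\} * \left(\delta(x - wD) + \delta(x + wD)\right) \tag{1.6.34}$$

$$= \frac{b_1}{2}\, \frac{e^{i\frac{2\pi}{\lambda}(2f + z_1)}}{\lambda f}\, e^{-i\frac{\pi z_1}{\lambda f^2}(x^2 + y^2)} \left[\mathcal{F}\{A_{pup}\}_{(x - wD)} + \mathcal{F}\{A_{pup}\}_{(x + wD)}\right], \tag{1.6.35}$$

and

$$E_{pert,DM2} = \frac{e^{i\frac{2\pi}{\lambda}(2f + z_2)}}{i\lambda f}\, \mathcal{F}\left\{iA_{pup}\, b_2\cos\left(\frac{2\pi w D\sigma}{\lambda f}\right)\right\} e^{-i\frac{\pi z_2}{\lambda f^2}(x^2 + y^2)} \tag{1.6.36}$$

$$= \frac{b_2}{2}\, \frac{e^{i\frac{2\pi}{\lambda}(2f + z_2)}}{\lambda f}\, e^{-i\frac{\pi z_2}{\lambda f^2}(x^2 + y^2)}\, \mathcal{F}\{A_{pup}\} * \left(\delta(x - wD) + \delta(x + wD)\right) \tag{1.6.37}$$

$$= \frac{b_2}{2}\, \frac{e^{i\frac{2\pi}{\lambda}(2f + z_2)}}{\lambda f}\, e^{-i\frac{\pi z_2}{\lambda f^2}(x^2 + y^2)} \left[\mathcal{F}\{A_{pup}\}_{(x - wD)} + \mathcal{F}\{A_{pup}\}_{(x + wD)}\right], \tag{1.6.38}$$

respectively, where the factor $i$ inside each transform has cancelled the $i$ in the denominator. With the same spatial frequency applied to both DMs, we apply two shifted copies of the PSF at the same locations in the image plane. They will each have an amplitude chosen by the amplitude of the perturbation applied to the DM, and a phase dictated by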

their propagation factor. Since their total contribution to the image plane is

$$E_{pert} = E_{pert,DM1} + E_{pert,DM2} \tag{1.6.39}$$

$$= \frac{e^{i\frac{4\pi}{\lambda}f}}{2\lambda f} \left[b_1\, e^{i\frac{2\pi}{\lambda}z_1}\, e^{-i\frac{\pi z_1}{\lambda f^2}(x^2 + y^2)} + b_2\, e^{i\frac{2\pi}{\lambda}z_2}\, e^{-i\frac{\pi z_2}{\lambda f^2}(x^2 + y^2)}\right] \left[\mathcal{F}\{A_{pup}\}_{(x - wD)} + \mathcal{F}\{A_{pup}\}_{(x + wD)}\right], \tag{1.6.40}$$

we can vary $b_1$ and $b_2$ relative to each other to produce whatever ratio of real and imaginary parts we like at the image plane location $(x, y) = (\pm wD, 0)$. If we describe the perturbation from each DM surface as the sum of all controllable spatial frequencies we can extend this effect to any point in the image plane. Thus, the coverage provided by complementary propagation factors gives us sufficient degrees of freedom to simultaneously correct real and imaginary components of the field. As seen in Eq. 1.6.40, it is this ability that allows us to create symmetric dark holes in the image plane, as opposed to the single sided correction shown in Fig. 1.6. To demonstrate the quality of coverage over the entire image plane, Fig. 1.11(c) and Fig. 1.11(d) overlay the contribution of both DMs to the real and imaginary parts of the field, respectively. Their relative separations favor coverage for the real part of the field at small working angles, but both real and imaginary components exhibit good coverage and never go to zero (note that inside the core of the PSF the value does not matter). In the example shown, the only nulls are nearly at the $16\lambda/D$ controllable limit of the HCIL DMs (a consequence of the fact that the maximum spatial frequency a DM can directly command is half of the number of actuators across the aperture). We have now shown that the absolute value of $z_p$ and $z_q$ matters much less for controllability than their relative magnitudes. Looking to Eq. 1.6.28, if $z_p/z_q$ were a multiple of $2\pi$ they would have identical propagation factors and we would have poor coverage in certain areas of the image plane. This effect was also recognized by Shaklan and Green [63], but in the pupil plane.
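The coverage argument can be illustrated with a small numerical sketch. At an image plane radius of $k\,\lambda f/D$ the propagation phase $\pi z(x^2 + y^2)/(\lambda f^2)$ reduces to $\pi z\lambda k^2/D^2$, so the focal length drops out. The values below are assumptions chosen to resemble the laboratory: 635 nm light, a 10 mm pupil, DM2 at 1.13 m and DM1 at 1.60 m from the pupil. For simplicity the piston terms $e^{i2\pi z/\lambda}$ of Eq. 1.6.40 are omitted, which is an additional assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

# Illustrative parameters (assumptions, not measured values).
wavelength = 635e-9   # m
D = 10e-3             # m, pupil diameter
z1 = 1.60             # m, DM1 to pupil
z2 = 1.13             # m, DM2 to pupil

def propagation_phase(z, k):
    """Quadratic propagation phase at a working angle of k lambda/D."""
    return np.pi * z * wavelength * k**2 / D**2

def solve_amplitudes(target, k):
    """Real amplitudes (b1, b2) such that b1*p1 + b2*p2 equals target.

    p1 and p2 are the propagation factors of the two DMs at k lambda/D.
    """
    p1 = np.exp(-1j * propagation_phase(z1, k))
    p2 = np.exp(-1j * propagation_phase(z2, k))
    # Two real unknowns, two real equations (real and imaginary parts).
    M = np.array([[p1.real, p2.real],
                  [p1.imag, p2.imag]])
    b1, b2 = np.linalg.solve(M, [target.real, target.imag])
    return b1, b2, p1, p2

k = 10.0  # working angle in lambda/D
real_dm1 = abs(np.sin(propagation_phase(z1, k)))
real_dm2 = abs(np.sin(propagation_phase(z2, k)))
print(f"|real term| at {k:.0f} lambda/D: DM1 = {real_dm1:.3f}, DM2 = {real_dm2:.3f}")

target = 0.3 - 0.7j  # arbitrary complex field value to synthesize
b1, b2, p1, p2 = solve_amplitudes(target, k)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, residual = {abs(b1 * p1 + b2 * p2 - target):.1e}")
```

Under these assumptions DM1 alone has almost no real component near $10\,\lambda/D$ while DM2 does, and the pair of real amplitudes reproduces an arbitrary complex target there. The same linear system becomes singular wherever the two propagation factors are parallel, which is the poor-coverage condition described above.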
They point out that a particular spatial frequency of the field can effectively be reconstructed if it is propagated by the Talbot distance, $z_t = 2D^2/\lambda w^2$ in our

notation. This is equivalent to the argument made here for controllability in the image plane. Furthermore, had we chosen the DM at plane $q$ to be conjugate to the pupil by including re-imaging optics, $z_q$ would be zero and the DM at plane $q$ would not be able to compensate for regions where we have poor control with the non-conjugate DM at plane $p$. Having demonstrated good controllability of both the real and imaginary parts of the image plane electric field over the entire controllable space, we can guarantee good controllability of both amplitude and phase aberrations incident on the system. Additionally, Eq gives us a model for computing the control effect of a DM at a non-conjugate plane. We will use this propagation factor extensively in lieu of applying multiple numerical transformations to produce our estimation and control algorithms.

1.7 Numerical Transform

Since the purpose of the DMs is to produce arbitrary surfaces that correct an arbitrary set of aberrations at the image plane (within the controllable limit of the DM), we cannot rely on analytical integration to find solutions for the DM shapes. Thus, our control laws require that we use numerical integration techniques to relate the DM, pupil, and image planes using the Fourier optics techniques described in §1.5. To that end we will use a matrix formulation to take the two-dimensional Fourier transform. For the purposes of defining zero at the center of each plane we will consider the dimension of the pupil and image as a quarter plane, defined by the number of elements $(N_{pup}, M_{pup})$ and $(N_{im}, M_{im})$ respectively. The dimension of each plane is defined in Table 1.1.

Pupil Coordinates    Dimension
$u$                  $(2N_{pup}) \times 1$
$v$                  $(2M_{pup}) \times 1$

Image Coordinates    Dimension
$x$                  $(2N_{im} + 1) \times 1$
$y$                  $(2M_{im} + 1) \times 1$

Table 1.1: Coordinates in each plane

Keeping in mind the dimensions in Table 1.1, we discretize the integration along the coordinates $(u, v)$ in the pupil as $(u_{k_1}, v_{k_2})$, where $(k_1, k_2)$ are integers from $-N_{pup} : N_{pup}$ and $-M_{pup} : M_{pup}$ respectively. Likewise, we discretize the image plane coordinates, $(x, y)$, as $(x_{j_1}, y_{j_2})$, where $(j_1, j_2)$ are integers from $-N_{im} : 0 : N_{im}$ and $-M_{im} : 0 : M_{im}$ respectively. Using this notation the pupil field, $A$, is discretized with indices $A_{k_1,k_2}$. Using the discretized pupil and image plane coordinates, we write the Fourier transform in Eq as a finite sum. Following a method similar to [72, 15], we describe the discrete form of the Fourier transform, denoted by a hat, $\hat{\ }$, at a particular pixel location in the image plane as

$$\hat{A}_{j_1,j_2} = \sum_{k_1=-N_{pup}}^{N_{pup}} \sum_{k_2=-M_{pup}}^{M_{pup}} e^{i\frac{2\pi}{\lambda f} v_{k_2} y_{j_2}}\, A_{k_1,k_2}\, e^{i\frac{2\pi}{\lambda f} u_{k_1} x_{j_1}}. \qquad (1.7.1)$$

Using the column vectors defined in Table 1.1, each element of the image $\hat{A}_{j_1,j_2}$ can be directly encoded into a two-dimensional matrix by writing the elements of the Fourier integrand as

$$\hat{A} = e^{i\frac{2\pi}{\lambda f}(y v^T)}\, A\, e^{i\frac{2\pi}{\lambda f}(u x^T)}\, \frac{du\, dv}{\lambda f}, \qquad (1.7.2)$$

where $du$ and $dv$ are the physical dimensions of each pixel in the pupil plane, $\lambda$ is the wavelength under consideration, and $f$ is the focal length of the imaging optic. Describing an

individual element of $A$ as $a_{k_1,k_2}$, the elements of the resultant matrices are

$$e^{i\frac{2\pi}{\lambda f}(y v^T)} = \exp\left( i\frac{2\pi}{\lambda f} \begin{bmatrix} y_{-M_{im}} v_{-M_{pup}} & \cdots & y_{-M_{im}} v_{M_{pup}} \\ \vdots & \ddots & \vdots \\ y_{M_{im}} v_{-M_{pup}} & \cdots & y_{M_{im}} v_{M_{pup}} \end{bmatrix} \right) \qquad (1.7.3)$$

$$A = \begin{bmatrix} a_{-M_{pup},-N_{pup}} & \cdots & a_{-M_{pup},N_{pup}} \\ \vdots & \ddots & \vdots \\ a_{M_{pup},-N_{pup}} & \cdots & a_{M_{pup},N_{pup}} \end{bmatrix} \qquad (1.7.4)$$

$$e^{i\frac{2\pi}{\lambda f}(u x^T)} = \exp\left( i\frac{2\pi}{\lambda f} \begin{bmatrix} u_{-N_{pup}} x_{-N_{im}} & \cdots & u_{-N_{pup}} x_{N_{im}} \\ \vdots & \ddots & \vdots \\ u_{N_{pup}} x_{-N_{im}} & \cdots & u_{N_{pup}} x_{N_{im}} \end{bmatrix} \right) \qquad (1.7.5)$$

Where $1/\lambda f$ scales the amplitude of the result and $du\,dv$ scales the resultant image plane in physical units. Note that the constant phase factor and imaginary term (which itself is only a $\pi/2$ phase shift across the plane) have been left out because they are unnecessary for the control code. The first summation of Eq is found for every element along the entire first index by multiplying the last two matrices of Eq . We define this intermediate matrix as

$$G(j_1, k_2) = A\, e^{i\frac{2\pi}{\lambda f}(u x^T)}\, \frac{du\, dv}{\lambda f}. \qquad (1.7.6)$$

The rows of $G(j_1, k_2)$ encode the values of the pupil index along the second dimension, $k_2$, and the columns encode the values of the image plane index along the first dimension, $j_1$. Thus, we index an individual element of $G$ as $g_{j_1,k_2}$. The final matrix multiplication completes the summation, making the two matrix multiplications equivalent to evaluating

$$g_{j_1,k_2} = \sum_{k_1=-N_{pup}}^{N_{pup}} A_{k_1,k_2}\, e^{i\frac{2\pi}{\lambda f} u_{k_1} x_{j_1}} \qquad (1.7.7)$$

$$\hat{A}_{j_1,j_2} = \sum_{k_2=-M_{pup}}^{M_{pup}} e^{i\frac{2\pi}{\lambda f} v_{k_2} y_{j_2}}\, g_{j_1,k_2} \qquad (1.7.8)$$

for every discrete location in the image plane, $(j_1, j_2)$. Comparing the two matrix multiplications in Eq to the scalar equations (Eq , 1.7.8), each element described by $g_{j_1,k_2}$ is encoded in the dimensionality of the matrix $G(j_1, k_2)$. The final multiplication will encode the scalar value described by $\hat{A}_{j_1,j_2}$ in each pixel of the matrix, thereby constructing the two-dimensional Fourier transform. The computational merit of this method is well described and compared to other numerical methods in [72]. One of the key advantages of this method over a standard FFT is that the physical dimension of the second plane can be changed, which is critical to astronomical imaging where the extent of the pupil plane is orders of magnitude larger than the image plane. The flexibility of the numerical transform with regard to dimension in Eq makes it much simpler to sample the image plane appropriately. With regard to choosing the spatial sampling in each plane, there are several factors to consider. In the image plane, we will seek to normalize to the peak value of the PSF, which is located at $(0, 0)$. The peak value may not be accurately reflected in an even set of pixels, since the exact center is not contained within any one pixel. Thus, an odd number of pixels is chosen to sample the image plane. In the pupil plane, the choice is the opposite. In many cases, we seek to exploit symmetry principles to simplify the computation of the transform as in [15]. Two more practical reasons apply to the computation of the DM shapes and manufacturing of the pupil. The DMs we use at the Princeton HCIL have an even set of actuators across the surface, which means that the functions they can generate are fundamentally based on an even set across the aperture. Thus we can more accurately represent their surface with an even data set.
More importantly, real world manufacturing processes define shapes based on the physical coordinates of a boundary, the precision of which is embedded in the numerical precision of the coordinates. There is a subtle difference between defining the aperture of the coronagraph as a set of boundaries (as in the manufacturing process) and defining the aperture as a two-dimensional set of values (as in the design process using numerical transforms). Defining an odd set in the manner shown in Fig. 1.12(a) artificially

oversizes the pupil by half a pixel in each direction. This can be repaired, but at the cost of shifting the definition of the center within a particular discrete value. No matter how you define the odd set, the edge and the center cannot share a common reference. If one is edge defined, the other is guaranteed to be center defined. To solve this, an even set of pixels is chosen, but Fig. 1.12(b) demonstrates that this can also be done incorrectly. In this case the definition of the physical coordinates has resulted in a half pixel shift of the pupil, which will artificially impose a tip-tilt discrepancy in the phase computed for the image plane electric field. Fig. 1.12(c) demonstrates the proper definition of physical coordinates for an even discretization in the pupil plane. This set produces pixels whose physical coordinates are edge defined. Unlike the odd case, both the center and edges of the pupil are edge defined, dramatically simplifying the physical mapping of the pixel values to the boundary coordinates required to manufacture the pupil. In summary, the appropriate method for discretizing these planes is to create an edge-defined even set for the pupil (or any intermediary plane) and a center-defined odd set in the final image plane. The matrix representation of the numerical Fourier transform shown in this section is highly reliable and efficient while maintaining the flexibility necessary to dimension each plane appropriately for astronomical imaging.
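The two-step matrix product and the sampling rules summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the code used in the thesis: the function names (`edge_defined_even`, `center_defined_odd`, `matrix_fourier_transform`), the 635 nm wavelength, the 1 m focal length, the 1 cm square aperture, and the negative sign in the exponent are all assumptions made for the example.

```python
import numpy as np


def edge_defined_even(n_half, width):
    """Pixel-center coordinates of 2*n_half pixels whose outer edges sit at +/- width/2."""
    n = 2 * n_half
    step = width / n
    return (np.arange(n) - n_half + 0.5) * step, step


def center_defined_odd(n_half, step):
    """Coordinates of 2*n_half + 1 pixels, with the middle pixel centered on zero."""
    return np.arange(-n_half, n_half + 1) * step


def matrix_fourier_transform(A, u, v, x, y, wavelength, focal_length, du, dv):
    """Evaluate the transform as two matrix products.

    A has shape (len(v), len(u)); the result has shape (len(y), len(x)).
    The negative exponent sign is an assumed convention.
    """
    k = 2.0 * np.pi / (wavelength * focal_length)
    left = np.exp(-1j * k * np.outer(y, v))   # kernel over (y, v)
    right = np.exp(-1j * k * np.outer(u, x))  # kernel over (u, x)
    return (left @ A @ right) * du * dv / (wavelength * focal_length)


# Hypothetical values chosen only for illustration.
wavelength = 635e-9      # m
focal_length = 1.0       # m
D = 0.01                 # m, width of a square open aperture

u, du = edge_defined_even(64, D)                  # even, edge-defined pupil grid
v, dv = edge_defined_even(64, D)
lam_f_over_D = wavelength * focal_length / D
x = center_defined_odd(40, 0.25 * lam_f_over_D)   # odd grid spanning +/- 10 lambda f / D
y = center_defined_odd(40, 0.25 * lam_f_over_D)

A = np.ones((v.size, u.size))                     # fully open square pupil
E_im = matrix_fourier_transform(A, u, v, x, y, wavelength, focal_length, du, dv)
peak = E_im[40, 40]                               # pixel centered on (0, 0)
```

Because the image grid is chosen independently of the pupil grid, the image plane can be sampled at a fraction of $\lambda f/D$ without zero-padding the pupil, and the odd image grid places one pixel exactly on the PSF peak used for normalization.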

(a) Odd Set: Oversized Pupil    (b) Bad Even Set: Shifted Pupil    (c) Good Even Set: Edge Defined

Figure 1.12: (a) and (b) represent poor choices for the numerical dimension of a pupil plane, whereas (c) demonstrates the ideal way to define an aperture for producing a coronagraph.

1.8 Thesis Overview

In this thesis we will use the tools developed in this chapter to develop FPWC algorithms to correct aberrations from the system optics that degrade the high contrast regions of our coronagraphic image. By doing so, we will be able to recover small regions of high contrast where we may once again search for planets. Ch. 2 develops the control laws and strategies that suppress the aberrated field in both monochromatic and broadband light. To reduce the number of exposures required for wavefront estimation we also develop an extrapolation technique that uses a single monochromatic estimate in the broadband wavefront control algorithm. In Ch. 3 we introduce the DM diversity estimation algorithm, the most common estimation scheme used for FPWC at any laboratory. It is a batch process estimator that computes the field via a left pseudo-inverse, which provides least squares minimal error on the field. The measurements are differential measurements at the image plane with conjugate DM shapes. We also address how to choose the probe shapes so that the problem is well posed and we guarantee measurements with high signal-to-noise. In Ch. 4 we develop a new estimation scheme to replace the DM diversity estimator, utilizing a Kalman filter. The filter still uses the same probe scheme, but uses fewer measurements since we are able to close the loop on the state estimate. We demonstrate how this allows us to operate more efficiently and robustly, since we can rely on measurements taken when the aberrated field was brighter. Finally we discuss a method by which we can use the control signal itself to probe the field, enabling closed loop estimation using only a single measurement each iteration. Ch. 5 and Ch. 6 show the experimental results using these estimation and control schemes, and then discuss the modeling and experimental limitations that restrict our ability to suppress the field. In Ch.
7 we make our final conclusions and point to possible future directions of this work, particularly in the area of electric field estimation. As might be surmised from this overview, the challenge of this problem is not just the optics; the level of precision we must adhere to makes this an inherently challenging estimation and control problem. Beyond that, our fundamental measurement of error, the accuracy and

precision with which we can manipulate the electric field, is completely unobservable. As such, our criterion for estimator and controller performance is not simply quantified by the ultimate achievable contrast, but by the efficiency and robustness of the correction algorithm as well. At any point, this could be limited by the experiment, our model of the experiment from which we derive the estimation and control laws, or the correction algorithm itself. Improving our efficiency and robustness drives many of the choices we make in the mathematical development in this thesis because without improvements in all three of these categories a wavefront control system will never work in a real space observatory.

1.9 Chapter Assumptions

The following assumptions were made in the derivations of this chapter.

From Rayleigh-Sommerfeld to Huygens-Fresnel:
- The paraxial approximation; the magnitude of any propagation vector is equal to that of the chief ray.
- The on-axis approximation, which assumes that the dot product of any ray with the chief ray is equal to one.

From Huygens-Fresnel to the Fresnel Integral:
- A first order binomial expansion of the propagation distance in the exponential, the validity of which is determined by Eq

From Fresnel to Fourier Transform Imaging:
- Infinitely thin optics of infinite extent (or large enough to effectively capture all of the energy).
- The optic is modeled as only applying a quadratic phase to the field incident on the optic.

Pupil Plane Controllability: Angular Spectrum:
- The phase perturbations at the $p$-th plane upstream of the pupil are small enough that a first order linearization may be taken in the exponential.
- The pupil aperture is undersized enough compared to the aperture at the $p$-th plane that its Fresnel propagation (and the shifted Fresnel integral) has no effect on the field at the pupil plane. In principle this need only be undersized enough to cover the Fresnel ringing induced by the propagation of the aperture at the $p$-th plane.

Image Plane Controllability: The Propagation Factor:
- The contribution of each DM perturbation is additive in the image plane.
- For the analytical proof of controllability using a single spatial frequency, we assume that the control amplitudes are small enough that a first order linearization of each DM plane is valid and that the DM can be described as a superposition of spatial frequencies. This does not rule out controllability for larger stroke; in that scenario an argument was made based on coverage of phase mixing.
- A continuous set of spatial frequencies is controllable up to the controllable limit of the DM.

Chapter 2

Focal Plane Wavefront Control

The goal of any wavefront correction algorithm in high contrast imaging is to reduce the intensity of the aberrated field to a level that makes a planet detectable. Quantifying a detection limit is in itself complicated because in addition to the residual aberrated field we must account for photon and detector noise, background from the exozodiacal light, and integration time. For simplicity in the control algorithms and in quantifying their performance we simply quote directly measured values of the normalized intensity relative to the peak power of the PSF. In contrast to conventional adaptive optics systems, where a non-common path wavefront sensor is used to control large perturbations in the field in a fast feedback loop, focal plane wavefront control (FPWC) does not include a wavefront sensor. The non-common path of the wavefront sensor in conventional AO makes it impossible to accurately measure the electric field at the science camera to the required level of precision (better than a part in $10^{5}$ for an Earth-like planet). Extreme-AO systems currently under development will attempt to calibrate the non-common path to reach contrast levels of , but this has yet to be proven [48, 8]. In order to eliminate all non-common path elements the field at the science camera must be measured directly. Since the science detector can only measure intensity the field must be estimated, which is the topic of Chs. 3 and 4. We also seek a control law for the

DMs, which are upstream of the pupil (Fig. 1.3), based on the estimated electric field at the image plane. To do so requires that we map the effect of the DMs on the image plane electric field, which will rely on the propagation equations derived in Ch. 1. As a result the control laws derived in this chapter are model based, requiring that we have a good measurement of critical physical parameters such as aperture sizes, focal lengths, and propagation distances between critical planes. Most critical is the DM model, which describes the mirror shape as a function of the applied voltages.

2.1 Monochromatic Wavefront Control

There are a variety of control laws we may choose for this problem. Energy Minimization is one of the first attempts at a model-based control law; it computes actuator commands that minimize the intensity in a region of the image plane [49, 11]. This controller proved to be numerically unstable because the field at the image plane must be inverted to compute the DM commands. Since this matrix is driven towards zero it comes close to being singular, making the computed control highly inaccurate. Electric field conjugation regularizes the inversion by driving the field to a targeted value (typically the theoretical PSF) rather than attempting to drive the field to zero [23, 24]. This guarantees inversion, but is highly sensitive to the linearity of the control since it does not regulate the actuator strength. There is also no guarantee that the targeted field is reachable. Rather than using feedback control in this manner, we will seek to optimize the control effect in some fashion. Rather than using our control signal from the DM to minimize the average contrast in our dark hole, $I_{DH}$, we seek a solution that minimizes the deformation across the DM's surface under the constraint that this achieve our targeted contrast level, $10^{-C}$. The algorithm developed by Pueyo et al.
[55] achieves this by minimizing the sum of the squares of all the actuator strengths subject to the constraint that it achieve a specified average contrast in the area where we seek to create a dark hole. To compute the effect of a DM's control signal and the incident aberrations on the

average contrast value in the image plane, we must propagate the resulting pupil plane field to the image plane via the Fourier and Fresnel transformations developed in Ch. 1. Assuming an aberrated field being added to the nominal field incident on the pupil function, $A(u, v)$, the pupil plane electric field is given by

$$E_{pup}(u, v) = A(u, v)(1 + g(u, v))e^{i\phi(u,v)}, \qquad (2.1.1)$$

where $g(u, v)$ is the complex aberrated electric field and $\phi(u, v)$ is the total phase perturbation induced by the DM. The DM perturbation will be added to a pre-existing (also referred to as "nominal") DM shape, $\phi_0(u, v)$, about which we will ultimately linearize the phase induced by the DM. Recall that by applying the propagation factor, Eq , we can account for the fact that the DMs are not conjugate to the pupil after we have computed $I_{DH}$. Assuming that the phase perturbation from the DM, $\phi$, is adequately small we take a first order approximation of the exponential. Linearizing the pupil plane electric field about the phase induced by the nominal DM surface, $\phi_0$, we find

$$E_{pup}(u, v) = A(u, v)(1 + g(u, v))e^{i\phi_0}(1 + i(\phi(u, v) - \phi_0)). \qquad (2.1.2)$$

Additionally, we assume that the product $g(\phi - \phi_0)$ is negligible compared to the other terms since they will both be small and of approximately the same magnitude. Thus

$$E_{pup}(u, v) = A(u, v)e^{i\phi_0}(1 + g(u, v) + i(\phi(u, v) - \phi_0(u, v))). \qquad (2.1.3)$$

If we assume that we are starting from $\phi_0 = 0$, we eliminate the exponential, making

$$E_{pup}(u, v) = A(u, v)(1 + g(u, v) + i\phi(u, v)). \qquad (2.1.4)$$
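As a rough numerical check of the linearization above, the sketch below compares the exact pupil field of Eq. 2.1.1 with the first-order form of Eq. 2.1.4 for a few DM phase amplitudes. The grid size, the aberration $g$, and the sinusoidal DM phase are hypothetical choices made only for illustration. Since the neglected terms are $(1+g)(e^{i\phi} - 1 - i\phi)$ and $ig\phi$, the error should scale roughly as $\phi^2/2 + |g||\phi|$.

```python
import numpy as np


def exact_pupil_field(A, g, phi):
    """Pupil field with the full exponential, A (1 + g) exp(i phi)."""
    return A * (1.0 + g) * np.exp(1j * phi)


def linearized_pupil_field(A, g, phi):
    """First-order form A (1 + g + i phi), dropping the g*phi cross term."""
    return A * (1.0 + g + 1j * phi)


n = 64
coords = (np.arange(n) - n / 2 + 0.5) / n      # edge-defined even grid on [-1/2, 1/2]
U, V = np.meshgrid(coords, coords)
A = np.ones((n, n))                            # open square aperture (assumption)
# Hypothetical complex aberration with |g| <= sqrt(2) * 1e-3.
g = 1e-3 * (np.cos(2 * np.pi * 2 * V) + 1j * np.sin(2 * np.pi * 5 * U))

errors = {}
for amplitude in (1e-1, 1e-2, 1e-3):           # DM phase amplitude in radians
    phi = amplitude * np.cos(2 * np.pi * 3 * U)
    diff = exact_pupil_field(A, g, phi) - linearized_pupil_field(A, g, phi)
    errors[amplitude] = np.max(np.abs(diff))
```

In this sketch the worst-case error drops by roughly two orders of magnitude for each decade of phase amplitude until the $g\phi$ cross term takes over, which is the regime in which the two dropped terms are of comparable size.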

Applying Eq , the linearized form of the image plane electric field about $\phi_0(u, v) = 0$ is

$$E_{im}(x, y) = \mathcal{F}\{A(u, v)\} + \mathcal{F}\{A(u, v)g(u, v)\} + i\mathcal{F}\{A(u, v)\phi(u, v)\}. \qquad (2.1.5)$$

The linearization has simplified our representation of the image plane electric field by making the effect of each component additive. Potentially the most useful outcome of this is the flexibility with which we can account for a non-conjugate DM. In this event, we may generalize the Fourier transform of $\phi$ in Eq as a linear operator, $C\{\cdot\}$, that includes the propagation of the DM by applying the propagation factor in Eq . Knowing this field, we now compute the image plane intensity to be

$$I_{im}(x, y) = \left| C\{A(u, v)\} + C\{A(u, v)g(u, v)\} + iC\{A(u, v)\phi(u, v)\} \right|^2. \qquad (2.1.6)$$

We then integrate the image plane intensity over the dark hole to find

$$I_{DH} = \int_{DH} \left| C\{A\} + C\{Ag\} + iC\{A\phi\} \right|^2 dx\, dy. \qquad (2.1.7)$$

With the scalar intensity, $I_{DH}$, in place we seek to describe this in a way that will be useful for control. We first discretize the integral in Eq as

$$I_{DH} = \sum_{i,j} \left| C\{A\}_{(i,j)} + C\{Ag\}_{(i,j)} + iC\{A\phi\}_{(i,j)} \right|^2 \Delta x\, \Delta y, \qquad (2.1.8)$$

where the pair $(i, j)$ describes a discrete point in the two-dimensional plane. The physical dimension of each point, $(\Delta x, \Delta y)$, matches the pixel size of our detector. We now rewrite the quadratic sum Eq as an inner product:

$$\begin{aligned}
I_{DH} = {} & \langle C\{A\}, C\{A\} \rangle + \langle C\{Ag\}, C\{Ag\} \rangle + \langle iC\{A\phi\}, iC\{A\phi\} \rangle \\
& + 2\Re\{\langle C\{A\} + C\{Ag\}, iC\{A\phi\} \rangle\} + 2\Re\{\langle C\{A\}, C\{Ag\} \rangle\}. \qquad (2.1.9)
\end{aligned}$$

Eq now gives us the ability to consider the contribution of each component to the intensity. The middle three terms will fully describe the interaction of the DM actuation with the aberrated field. Conventional wisdom says that by design the coronagraph will achieve the required contrast levels when there are no aberrations. Thus by design, $C\{A\}$ should be negligible compared to the aberrated field and the DM actuation. Consequently, the effects of $\langle C\{A\}, C\{A\} \rangle$, $\langle C\{A\}, C\{Ag\} \rangle$, and $\langle C\{A\}, iC\{A\phi\} \rangle$ on the value of $I_{DH}$ are negligible compared to the other three terms in Eq . Here we must note that efforts are being made to improve coronagraph performance (e.g. IWA and throughput) by relaxing the contrast level of the nominal PSF to be equal with the amplitude aberrations present in the system [39]. In this case, the DMs are being used to reach contrast levels below the PSF (being equivalent to pupil mapping [57, 58]), which requires that this term be accounted for in the controller. As will be shown in Ch. 3, the estimator includes the nominal PSF in the state estimate of the image plane electric field. If we compute the scalar contrast value from a single image this will also include the contribution of the nominal PSF in the measurement. Thus, we do in fact account for the contribution of the nominal field in the cost function and the controller will be capable of suppressing beyond the nominal value of the PSF. Reordering Eq so that each term reflects a measurable quantity in the laboratory, the intensity to be used for our cost function is

$$I_{DH} = \langle C\{A\phi\}, C\{A\phi\} \rangle + 2\Re\{\langle C\{A(1 + g)\}, iC\{A\phi\} \rangle\} + \langle C\{A(1 + g)\}, C\{A(1 + g)\} \rangle. \qquad (2.1.10)$$

Now we impose a matrix inner product so that we can describe the electric field and the DM control effect, our state and control variables, as column matrices. These are formed by stacking the columns of the two-dimensional fields to create a single column matrix, an approach used throughout this thesis.
The state in our control algorithm is the column matrix describing the aberrated field at the image plane, $C\{Ag\}$. However, we have yet to

parameterize the DM commands into a control matrix, $u$. At the moment we could solve for $C\{A\phi\}$, but what we seek are actuation commands for the DM, not its field at the image plane. One control approach might be to solve for $C\{A\phi\}$, compute the inverse transform, and use an arbitrary physical model of the DM (which relates the surface height to the voltage commands) to compute the actuator strengths. This requires real time transformations during the control loop, which will make the algorithm slower and more reliant on computational power. Instead, we will introduce a physical model of the DM so that the optimization can directly solve for the amplitude of each actuator on the DM. This will put less demand on the computer, and each control step will be faster. Letting $H(x, y)$ be the height of the DM surface, the resulting phase perturbation induced by the DM, $\phi(u, v)$, is

$$\phi(u, v) = \frac{2\pi}{\lambda} H(x, y). \qquad (2.1.11)$$

For control, we wish to describe the DM surface height, $H(x, y)$, as a combination of the two-dimensional height maps imposed by each actuator. Since we are using a DM with a continuous face sheet, the contribution of any actuator will be highly localized but will still deform the entire DM surface. As a result, we must describe the contribution of the $q$-th actuator as a two-dimensional phase map over the entire plane of the DM surface, $h_q(x, y)$. The continuous membrane means that the combination of all actuators is nonlinear, and very complicated to compute [9]. However, we operate in an extremely low stroke regime. We will show in Ch. 6 that even with actuation levels nearly 4 times larger than our peak-to-valley actuator commands, the combination of actuators is close to linear (Fig. 6.2). Thus, we can describe $H(x, y)$ as a superposition of $h_q(x, y)$ over all actuators, $N_{act}$. The phase contribution of the DM can then be described as

$$\phi(u, v) = \frac{2\pi}{\lambda} \sum_{q=1}^{N_{act}} h_q(u, v). \qquad (2.1.12)$$

Finally, we wish to make our control matrix, $u$, a column matrix made up of the control

signal from each actuator, $u_q$. To do so we describe $h_q(u, v)$ as a characteristic shape with unitary amplitude, commonly referred to as an influence function, $f_q(u, v)$. To find $h_q(u, v)$ we simply multiply $f_q(u, v)$ by the control amplitude, $a_q$. Describing $h_q(u, v)$ with influence functions, the phase perturbation induced by the DM is

$$\phi(u, v) = \frac{2\pi}{\lambda} \sum_{q=1}^{N_{act}} a_q f_q(u, v), \qquad (2.1.13)$$

which sums the $q$-th 2-D phase map, or influence function $f_q(u, v)$, for all $N_{act}$ actuators to reconstruct $\phi(u, v)$. The strength of each influence function is determined by $a_q$. Substituting Eq into Eq , we write $C\{A\phi\} = C\{Af\}u$, where $u$ is a column matrix of actuator strengths, $u = [a_1 \dots a_k]^T$, and $f$ is a matrix describing the perturbation of each influence function, $f_q$, at the pupil. In this formulation, we are specifically applying a matrix inner product and $C\{Af\}$ can be written as a matrix of dimension $N_{pix} \times N_{act}$. To simplify the notation we define this matrix as

$$G = C\{Af\} \quad [N_{pix} \times N_{act}]. \qquad (2.1.14)$$

This allows us to write

$$\langle C\{A\phi\}, C\{A\phi\} \rangle = u^T G^{*} G u. \qquad (2.1.15)$$

Applying the matrix form of the control amplitudes to Eq we find

$$I_{DH}(\lambda_0) = \frac{4\pi^2}{\lambda_0^2} u^T M u + \frac{4\pi}{\lambda_0} u^T \Im\{b\} + d. \qquad (2.1.16)$$

Where

$$M = \langle C\{Af\}, C\{Af\} \rangle = G^{*} G \qquad (2.1.17)$$

$$b = \langle C\{A(1 + g)\}, C\{Af\} \rangle = G^{*} C\{A(1 + g)\} \qquad (2.1.18)$$

$$d = \langle C\{A(1 + g)\}, C\{A(1 + g)\} \rangle = C\{A(1 + g)\}^{*} C\{A(1 + g)\}. \qquad (2.1.19)$$

Conceptually, $d$ is the scalar intensity contribution from the aberrated field, $b$ is a column matrix representing the interaction of the DM electric field with the aberrated field, and $M$ describes the additive contribution of the DM to the image plane intensity. Having represented $I_{DH}$ in a quadratic form with regard to a control matrix, we can use Eq to produce an optimal control strategy. Recalling that the targeted contrast is $10^{-C}$, the optimization problem in monochromatic light is stated as

$$\begin{aligned}
\text{minimize} \quad & \sum_{k=1}^{N_{act}} a_k^2 = u^T u \\
\text{subject to} \quad & I_{DH}(\lambda_0) \le 10^{-C}.
\end{aligned} \qquad (2.1.20)$$

To solve the optimization problem we create a cost function, $J$. Incorporating the constraint for the central wavelength into the minimization via a Lagrange multiplier, $\mu$, yields

$$\begin{aligned}
J &= u^T u + \mu\left(I_{DH} - 10^{-C}\right) \\
&= u^T u + \mu\left(\frac{4\pi^2}{\lambda_0^2} u^T M u + \frac{4\pi}{\lambda_0} u^T \Im\{b\} + d - 10^{-C}\right) \\
J &= u^T\left(I + \mu\frac{4\pi^2}{\lambda_0^2} M\right)u + \mu\frac{4\pi}{\lambda_0} u^T \Im\{b\} + \mu\left(d - 10^{-C}\right). \qquad (2.1.21)
\end{aligned}$$

The cost function is quadratic in form, guaranteeing a single minimum. Recognizing that $M = M^T$, we take the partial derivative to find the actuation that minimizes the cost function. Evaluating

$$\left.\frac{\partial J}{\partial u^T}\right|_{u_{opt}} = 2\left(I + \mu\frac{4\pi^2}{\lambda_0^2} M\right)u_{opt} + \mu\frac{4\pi}{\lambda_0} \Im\{b\} = 0 \qquad (2.1.22)$$

and solving for the optimal control input, we find

$$u_{opt} = -\mu\left(\frac{\lambda_0}{2\pi} I + \mu\frac{2\pi}{\lambda_0} M\right)^{-1} \Im\{b\}. \qquad (2.1.23)$$

To find the value of $u_{opt}$, all that is left is to find the value of $\mu$ that minimizes the cost function, Eq . This is typically done via a line search on $\mu$, evaluating $u$ with Eq for each value of $\mu$ until Eq reaches a minimum. Pueyo et al. [55] have rigorously shown that this optimization is in fact a quadratic subprogram to the full nonlinear problem. They have also shown that for a single iteration, the controller is guaranteed to achieve the targeted contrast level, $10^{-C}$, provided the electric field is known perfectly and the DM control magnitude is small enough that it remains within the bounds of the current linearization. Since the sub-program is convex, we can reach its global minimum if we relinearize about the new DM shape at each iteration of the correction algorithm (or every time we apply a new DM command). This is computationally expensive, so we tend not to do this in the experiment. However, the nature of the controller is such that the solution will not deviate dramatically from the optimization we get by re-linearizing each time. Since the controller is trying to minimize stroke, the control tends to remain in the regime of a particular linearization for as long as possible. Additionally, the magnitude of actuation is proportional to the contrast it is trying to suppress. Since each control step is operating on lower contrast levels the actuation magnitude will also tend to decrease, making the deviation from the last control shape smaller with each iteration. As an example, Fig. 2.1 shows the evolution of the mode, median, and absolute peak to valley deformation for DM1 and DM2 as a function of the control history. The vast majority of the stroke is used to eliminate strong aberrations in the first 5 iterations. After this, the extrema and mode reduce dramatically, frequently a factor of 1 less stroke than the first iteration.
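The control step of Eq. 2.1.23 and the search over $\mu$ can be sketched numerically as follows. This is a minimal sketch under stated assumptions rather than the laboratory implementation: the control effect matrix `G`, the field `e` (standing in for $C\{A(1+g)\}$), the wavelength, the target contrast, and the $\mu$ grid are all synthetic, and the stopping rule (take the first $\mu$ on an increasing grid for which the linearized $I_{DH}$ meets the target) is an assumed simplification of the line search. Because $u$ is real, only the real part of $M = G^{*}G$ contributes to $u^T M u$, so the sketch uses it explicitly.

```python
import numpy as np


def dark_hole_intensity(u, G, e, wavelength):
    """Linearized dark hole intensity: sum over pixels of |e + i (2 pi / lambda) G u|^2."""
    field = e + 1j * (2.0 * np.pi / wavelength) * (G @ u)
    return float(np.vdot(field, field).real)


def quadratic_intensity(u, G, e, wavelength):
    """Same intensity written in the quadratic form of Eq. 2.1.16."""
    c = 2.0 * np.pi / wavelength
    M = (G.conj().T @ G).real
    b = G.conj().T @ e
    d = np.vdot(e, e).real
    return float(c**2 * (u @ M @ u) + 2.0 * c * (u @ b.imag) + d)


def stroke_minimization_step(G, e, wavelength, target, mu_grid):
    """Return (u, mu, intensity) for the first mu on the grid that meets the target."""
    c = 2.0 * np.pi / wavelength
    M = (G.conj().T @ G).real            # u is real, so only Re{M} enters u^T M u
    b_imag = (G.conj().T @ e).imag
    identity = np.eye(G.shape[1])
    for mu in mu_grid:
        # Eq. 2.1.23 evaluated with a linear solve instead of an explicit inverse.
        u = np.linalg.solve(identity / c + mu * c * M, -mu * b_imag)
        intensity = dark_hole_intensity(u, G, e, wavelength)
        if intensity <= target:
            break
    return u, mu, intensity


# Synthetic, hypothetical problem: 200 dark hole pixels, 50 actuators.
rng = np.random.default_rng(1)
n_pix, n_act = 200, 50
wavelength = 635e-9                                   # m (assumed)
c = 2.0 * np.pi / wavelength
G = 1e-2 * (rng.standard_normal((n_pix, n_act))
            + 1j * rng.standard_normal((n_pix, n_act))) / np.sqrt(2 * n_pix)
u_true = 1e-9 * rng.standard_normal(n_act)            # nm-scale surface error
residual = 1e-5 * (rng.standard_normal(n_pix)
                   + 1j * rng.standard_normal(n_pix)) / np.sqrt(2 * n_pix)
e = -1j * c * (G @ u_true) + residual                 # mostly correctable field

target = 1e-8                                         # 10^{-C} with C = 8
M = (G.conj().T @ G).real
mu_scale = n_act / (c**2 * np.trace(M))               # sets a dimensionless mu grid
mu_grid = mu_scale * np.logspace(-3, 6, 91)
I_start = dark_hole_intensity(np.zeros(n_act), G, e, wavelength)
u_opt, mu_opt, I_final = stroke_minimization_step(G, e, wavelength, target, mu_grid)
```

Stopping at the smallest $\mu$ that satisfies the constraint keeps the stroke small, since the norm of the regularized solution grows monotonically with $\mu$ toward the unregularized least squares command.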
Throughout the entire control history the median never deviates far from zero, meaning that a DC drift never develops. Perhaps the best balance is to re-linearize when the contrast performance is

poor (requiring large stroke), and discontinue re-linearizing as the contrast improves (since the second order term neglected in Eq will become less significant). Neither of these techniques guarantees that we reach the global optimum for the subprogram, but for a space telescope it is probably worth the computational savings since it will not deviate significantly.

[Figure 2.1 panels: (a) DM1 Actuation Per Iteration; (b) DM2 Actuation Per Iteration. Each panel plots Height Change (nm) against Iteration for the Mode, Median, Max, and Min.]

Figure 2.1: Time history of the mode, median, and peak-to-valley actuation levels for DM1 and DM2. The starting contrast is and final contrast is at the 3 th iteration of the control algorithm.

Pueyo et al. [55] also show how to account for multiple DMs at non-conjugate planes. As shown in §1.6.2, using two DMs in planes non-conjugate to the pupil we can take advantage of their relative propagation to produce phase induced amplitude distributions at the pupil. This makes both the real and imaginary parts of the image plane controllable, allowing the creation of symmetric dark holes about the PSF. Thus, we can generate a dark hole within the entire search area made available by the coronagraph. By virtue of the linearity and small value approximations made to reach Eq and Eq , the effect is purely additive since the cross-terms, or cross-talk, between the DMs will be negligible. Including the propagation factors from Eq into the transform for DM1, $C_1\{\cdot\}$, and DM2, $C_2\{\cdot\}$,

the image plane electric field with two DMs in series is

$$E_{im}(x, y) = \mathcal{F}\{A(u, v)(1 + g(u, v))\} + iC_1\{A(u, v)\phi_1\} + iC_2\{A(u, v)\phi_2\}. \qquad (2.1.24)$$

Applying the inner product, we find the intensity at the image plane to be

$$\begin{aligned}
I_{DH} = {} & \langle C_1\{A\phi_1\}, C_1\{A\phi_1\} \rangle + \langle C_2\{A\phi_2\}, C_2\{A\phi_2\} \rangle \\
& + \langle C_1\{A\phi_1\}, C_2\{A\phi_2\} \rangle + \langle C_2\{A\phi_2\}, C_1\{A\phi_1\} \rangle \\
& + 2\Re\{\langle C\{A(1 + g)\}, iC_1\{A\phi_1\} \rangle\} + 2\Re\{\langle C\{A(1 + g)\}, iC_2\{A\phi_2\} \rangle\} \\
& + \langle C\{A(1 + g)\}, C\{A(1 + g)\} \rangle. \qquad (2.1.25)
\end{aligned}$$

Since $I_{DH}$ is a scalar value, we can maintain the same form for Eq and Eq by simply augmenting the control matrix, $u = [u_{DM1}\ u_{DM2} \dots u_{DMi}]^T$. Using the same superposition principle and influence function set for the second DM, we define their control effect matrices, $G_{DM1}$ and $G_{DM2}$, as

$$G_{DM1} = C_1\{Af_1\} \qquad (2.1.26)$$

$$G_{DM2} = C_2\{Af_2\}. \qquad (2.1.27)$$

With two DMs, the matrices in the control law given by Eq and Eq now take the form

$$M = \begin{bmatrix} G_{DM1}^{*} G_{DM1} & G_{DM1}^{*} G_{DM2} \\ G_{DM2}^{*} G_{DM1} & G_{DM2}^{*} G_{DM2} \end{bmatrix} \qquad (2.1.28)$$

$$b = \begin{bmatrix} G_{DM1}^{*}\, C\{A(1 + g)\} \\ G_{DM2}^{*}\, C\{A(1 + g)\} \end{bmatrix} \qquad (2.1.29)$$

$$d = C\{A(1 + g)\}^{*} C\{A(1 + g)\}. \qquad (2.1.30)$$
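One way to see where the block structure of Eqs. 2.1.28 to 2.1.30 comes from is to stack the two control effect matrices side by side and form the same products as in the single-DM case. The sketch below does this with random stand-in matrices; the function name `two_dm_control_matrices` and the sizes are hypothetical, and `e` stands in for $C\{A(1+g)\}$.

```python
import numpy as np


def two_dm_control_matrices(G1, G2, e):
    """Build M, b, d for the augmented control vector u = [u_DM1, u_DM2]."""
    G = np.hstack([G1, G2])            # N_pix x (N_act1 + N_act2)
    M = G.conj().T @ G                 # reproduces the 2 x 2 blocks of Eq. 2.1.28
    b = G.conj().T @ e                 # stacked blocks of Eq. 2.1.29
    d = float(np.vdot(e, e).real)      # scalar, independent of the number of DMs
    return G, M, b, d


rng = np.random.default_rng(2)
n_pix, n1, n2 = 30, 4, 5               # hypothetical sizes
G1 = rng.standard_normal((n_pix, n1)) + 1j * rng.standard_normal((n_pix, n1))
G2 = rng.standard_normal((n_pix, n2)) + 1j * rng.standard_normal((n_pix, n2))
e = rng.standard_normal(n_pix) + 1j * rng.standard_normal(n_pix)
G, M, b, d = two_dm_control_matrices(G1, G2, e)

# The off-diagonal block carries the DM1-DM2 cross terms of the intensity.
cross_block = M[:n1, n1:]
```

Written this way, adding a further DM only appends columns to the stacked matrix, while the solve for the actuator commands keeps the same form.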

Thus, the computation remains the same regardless of the number of DMs. The only change is in the dimension of $u$, $M$, and $b$, which encode the different propagation factors for each DM. Using this control law is not equivalent to other multi-DM concepts such as multiconjugate AO (MCAO) [52], woofer-tweeter concepts [48], or wide field AO [61]. As was shown in §1.6, it is the different propagation distances that give two DMs in series their power in the control algorithm. It is also worth mentioning that in the event of a DM failure this provides redundancy in the mission, mitigating the risk involved in flying an unproven technology in space. Having developed a monochromatic wavefront control algorithm we now have confidence in its controllability with regard to both amplitude and phase aberrations in the image plane. Once we are provided with an electric field measurement we may solve the optimal control problem to suppress the field below a specified level on both sides of the image plane.

2.2 Wavelength Dependence of the Image Plane

The Stroke Minimization algorithm in §2.1 only operates on a single wavelength, $\lambda_0$, which means that suppression of the field is neither optimal nor guaranteed over a bandwidth. Pueyo [59] showed through simulation that the bandwidth must be less than 1-2% of the central wavelength. Practically, we can also define a single wavelength by our image plane resolution, requiring that the difference in plate scale between the maximum and minimum wavelengths is less than a pixel (0.1 pixel width). However, the primary purpose of directly imaging an exoplanet is to measure its spectrum. Using the monochromatic control algorithm, we would have to correct for the aberrations at each wavelength separately to obtain a full spectrum of the planet, which would take prohibitively long. Instead, we would like to make the control algorithm effective over a bandwidth.
This has the potential to improve the efficiency of spectral characterization and enables detection in a broadband image, critical in a photon limited system. Moving to broadband algorithms requires that
we have a good understanding of the nature of coherence from starlight. We already understand that the star is spatially coherent because of the large propagation distances; the radius of curvature of the wavefronts from the star is so large, and the point sources from different locations on the star are so close, that we effectively image parallel, plane wavefronts. We also understand that the light from a star has an extremely short coherence length, requiring equal path interferometers to interfere the light with itself. What we must address is how to integrate in wavelength to compute the broadband image. Knowing that the emission of a star is effectively from random radiators (in that they are not in phase with one another) we can assume that each wavelength will not interfere with the others. As a result we may integrate the intensity over wavelength to compute what our broadband image should be, making it very simple to augment the optimization problem to accommodate the additional wavelengths.

Eq shows which terms are dependent for the specified wavelength in monochromatic light, $\lambda_0$. For arbitrary wavelength, $\lambda$, the dark hole intensity is

$$I_{DH}(\lambda) = w(\lambda)\frac{4\pi^2}{\lambda^2}\, u^T M_\lambda u + w(\lambda)\frac{4\pi}{\lambda}\, u^T \Im\{b_\lambda\} + w(\lambda)\, d_\lambda. \tag{2.2.1}$$

We have included a normalization function, $w(\lambda)$, in Eq to account for the fact that the relative intensity of each wavelength will vary. To simplify the normalization, $w(\lambda)$ is defined so that $w(\lambda_0) = 1$, where $\lambda_0$ will be centered in the control bandwidth, $\Delta\lambda$. It is also important to note that in addition to the chromaticity of the coronagraph (if any) and aberrations, $M_\lambda$, $b_\lambda$, and $d_\lambda$ vary in wavelength because of the transform, $\mathcal{C}\{\cdot\}$, making

$$M_\lambda = \langle G_\lambda, G_\lambda \rangle \tag{2.2.2}$$

$$b_\lambda = \langle \mathcal{C}_\lambda\{A(1 + g_\lambda)\}, G_\lambda \rangle \tag{2.2.3}$$

$$d_\lambda = \langle \mathcal{C}_\lambda\{A g_\lambda\}, \mathcal{C}_\lambda\{A g_\lambda\} \rangle, \tag{2.2.4}$$

where $M_\lambda$ is simply the transformation from $u$ to image plane intensity. Every control effect matrix, $M_\lambda$, may be precomputed for each wavelength, assuming it remains within the linear
regime of the DM shape. However, $b_\lambda$ requires a measurement of the aberrated field to be computed. Since $d_\lambda$ is the intensity distribution of the aberrated field, this simply requires an exposure of the aberrated field at that wavelength. With $b_\lambda$ and $d_\lambda$ requiring an estimate of the current state, we will require more exposures if the field is to be evaluated at multiple wavelengths.

2.3 Continuous Bandwidth Constraint

Knowing the wavelength dependence of the electric field in the image plane, we now seek an optimization that suppresses the field to a specified contrast level over a bandwidth, $\Delta\lambda$, centered about our central wavelength, $\lambda_0$. Thus, our statement of the problem becomes

$$\begin{aligned}
\text{minimize} \quad & \sum_{k=1}^{N_{act}} a_k^2 = u^T u \\
\text{subject to} \quad & \frac{1}{\Delta\lambda} \int_{\lambda_0 - \Delta\lambda/2}^{\lambda_0 + \Delta\lambda/2} I_{DH}(\lambda)\, d\lambda \le 10^{-C}.
\end{aligned} \tag{2.3.1}$$

There are two problems with this formulation. First, we have produced a numerically intractable problem, where the numerical minimization requires that a continuous integral be evaluated many times. Second, it requires an electric field measurement (still assumed to be perfectly known) over the full integral. Since the functional dependence in wavelength is unknown (despite our assumption to this point that the field is provided to us) this drives the number of required field measurements to infinity. Thus, the optimization problem should be solved for a discrete set of $N_\lambda$ wavelengths, changing the optimization problem to

$$\begin{aligned}
\text{minimize} \quad & \sum_{k=1}^{N_{act}} a_k^2 = u^T u \\
\text{subject to} \quad & \frac{1}{N_\lambda} \sum_{i=1}^{N_\lambda} I_{DH}(\lambda_i) \le 10^{-C}.
\end{aligned} \tag{2.3.2}$$

In this formulation there is an implicit assumption that if we constrain a discrete set of wavelengths to fall under a targeted contrast value this will correspond to suppression of all wavelengths between them. For example, if we are provided two monochromatic estimates that bound a bandwidth we can find the cost function for Eq in the same manner as §2.1 to find a set of DM commands that suppress those two wavelengths. However, there is no guarantee that when we image the entire bandwidth we will maintain the targeted contrast level. We must discretize the wavelengths in the optimization but we lose our ability to guarantee suppression over the full bandwidth. In the end, successfully suppressing the field depends upon the size of the band and the number of discrete wavelengths chosen.

To help guarantee suppression between the optimized wavelengths, we appeal to the concept of maintaining small phase shifts so that there are no dramatic changes in the wavefront. For any two wavelengths in the optimization, $(\lambda_1, \lambda_2)$, the bandwidth between them should be small enough that $\lambda_2 < 2\lambda_1$. Since we target bandwidths of $\Delta\lambda/\lambda_0$ = 1 2% there isn't a dramatic shift in the relative phase for these wavelengths, and our expectation is that the contrast should be maintained across the entire band.

Looking further, the optimization does not guarantee a particular contrast level for each wavelength, only that their average be below $10^{-C}$. In other words, formulating the problem in this way does not give us the freedom to weight the contribution of each wavelength to the optimization. For characterization we need the speckles to be suppressed equally well over all wavelengths, but suppression at one wavelength may be more important than at others. For example, if we bound a spectral feature that is much dimmer we would require higher contrast at this wavelength.
While Eq is undoubtedly the optimal solution with regard to suppressing a bandwidth below a particular value, it will not necessarily guarantee our ability to obtain a spectral measurement. Therefore we will continue to further constrain the problem to get the desired properties out of the controller.
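The limitation of the averaged constraint can be illustrated with a short sketch. It is only an illustration under stated assumptions: the intensity follows the quadratic form of Eq. 2.2.1 with a real actuator vector, and the two-wavelength numbers, the helper names, and the tuple layout of `terms` are hypothetical.

```python
import numpy as np

def dark_hole_intensity(u, M, b, d, lam, w=1.0):
    """I_DH(lambda) in the form of Eq. 2.2.1 for a real actuator vector u."""
    k = 2.0 * np.pi / lam
    return w * (k**2 * np.real(u @ M @ u) + 2.0 * k * (u @ np.imag(b)) + d)

def band_average(u, terms):
    """Left-hand side of the discretized constraint (Eq. 2.3.2).

    terms : list of (lam, w, M, b, d) tuples, one per discrete wavelength
            (hypothetical layout).
    Returns the per-wavelength intensities and their mean.
    """
    I = np.array([dark_hole_intensity(u, M, b, d, lam, w)
                  for lam, w, M, b, d in terms])
    return I, I.mean()

# Hypothetical two-wavelength case with no DM actuation (u = 0), so that
# I_DH(lambda_i) reduces to w * d_lambda.
n_act = 3
u = np.zeros(n_act)
M = np.eye(n_act, dtype=complex)
b = np.zeros(n_act, dtype=complex)
terms = [(0.95, 1.0, M, b, 1.5e-6),   # lower wavelength, brighter residual
         (1.05, 1.0, M, b, 0.1e-6)]   # upper wavelength, dimmer residual

I_each, I_mean = band_average(u, terms)
target = 10.0 ** (-6)                 # contrast target 10^-C with C = 6
sum_constraint_met = bool(target >= I_mean)
each_constraint_met = [bool(target >= Ii) for Ii in I_each]
```

In this toy case the averaged constraint is satisfied even though the lower wavelength alone misses the target, which is the behavior the separate per-wavelength constraints of the next section are meant to rule out.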

2.4 Windowed Stroke Minimization

As discussed in section 2.3 we seek to make the problem of correcting over a bandwidth computationally tractable by discretizing the integral in Eq to a summation of finite wavelengths. We cannot avoid the problem of guaranteeing suppression between wavelengths, but we can try to guarantee suppression of each wavelength in Eq by using multiple constraints in the optimization. Rather than simply summing the intensities we will impose a separate constraint for each wavelength. In this formulation, we will choose three wavelengths, one at the center of the bandwidth, $\lambda_0$, and two more providing the boundaries for the problem, $\lambda_1$, $\lambda_2$, to define a window over which the correction will be made. Applying three separate constraints, the optimization becomes

$$\begin{aligned}
\text{minimize} \quad & \sum_{k=1}^{N_{act}} a_k^2 = u^T u \\
\text{subject to:} \quad & I_{DH}(\lambda_0) \le 10^{-C_{\lambda_0}}, \\
& I_{DH}(\lambda_1) \le 10^{-C_{\lambda_1}}, \\
& I_{DH}(\lambda_2) \le 10^{-C_{\lambda_2}}
\end{aligned} \tag{2.4.1}$$

where

$$\lambda_1 = \gamma_1 \lambda_0, \qquad \lambda_2 = \gamma_2 \lambda_0.$$

Now we will find the optimal control law by augmenting the minimization with three Lagrange multipliers, one for each discrete value of the intensity. Following the same procedure as
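in §2.1, we write the cost function as

$$\begin{aligned}
J ={}& u^T u + \mu_0 \left( I_{DH}(\lambda_0) - 10^{-C_{\lambda_0}} \right) + \mu_1 \left( I_{DH}(\lambda_1) - 10^{-C_{\lambda_1}} \right) + \mu_2 \left( I_{DH}(\lambda_2) - 10^{-C_{\lambda_2}} \right) \\
J ={}& u \left[ I + \frac{4\pi^2}{\lambda_0^2} \left( \mu_0 M_{\lambda_0} + \mu_1 \frac{w(\lambda_1)}{\gamma_1^2} M_{\lambda_1} + \mu_2 \frac{w(\lambda_2)}{\gamma_2^2} M_{\lambda_2} \right) \right] u^T \\
&+ \frac{4\pi}{\lambda_0} \left[ \mu_0 \Im\{b_{\lambda_0}\} + \mu_1 \frac{w(\lambda_1)}{\gamma_1} \Im\{b_{\lambda_1}\} + \mu_2 \frac{w(\lambda_2)}{\gamma_2} \Im\{b_{\lambda_2}\} \right] u^T \\
&+ \left[ \mu_0 \left( d_{\lambda_0} - 10^{-C_{\lambda_0}} \right) + \mu_1 w(\lambda_1) \left( d_{\lambda_1} - 10^{-C_{\lambda_1}} \right) + \mu_2 w(\lambda_2) \left( d_{\lambda_2} - 10^{-C_{\lambda_2}} \right) \right].
\end{aligned} \tag{2.4.2}$$

Taking the partial derivative of the resulting cost function yields the optimal DM command for a subset of wavelengths spanning the entire bandwidth $\Delta\lambda$. As before, the optimal command is determined by performing a line search on $\mu_0$. We now have an optimization across three variables, complicating the task of minimizing the function. We still have the same problem as in §2.3 that the globally optimal solution in the three dimensional space $(\mu_0, \mu_1, \mu_2)$ does not necessarily guarantee the targeted suppression at all three wavelengths. However, with three Lagrange multipliers we can guarantee suppression of all three wavelengths by restricting the search to a single dimension. We write the Lagrange multipliers of the two bounding wavelengths as weighted values of the first so that $\mu_1 = \delta_1 \mu_0$ and $\mu_2 = \delta_2 \mu_0$. Applying this relationship, Eq becomes

$$\begin{aligned}
J ={}& u \left[ I + \mu_0 \frac{4\pi^2}{\lambda_0^2} \left( M_{\lambda_0} + \delta_1 \frac{w(\lambda_1)}{\gamma_1^2} M_{\lambda_1} + \delta_2 \frac{w(\lambda_2)}{\gamma_2^2} M_{\lambda_2} \right) \right] u^T \\
&+ \mu_0 \frac{4\pi}{\lambda_0} \left[ \Im\{b_{\lambda_0}\} + \delta_1 \frac{w(\lambda_1)}{\gamma_1} \Im\{b_{\lambda_1}\} + \delta_2 \frac{w(\lambda_2)}{\gamma_2} \Im\{b_{\lambda_2}\} \right] u^T \\
&+ \mu_0 \left[ \left( d_{\lambda_0} - 10^{-C_{\lambda_0}} \right) + \delta_1 w(\lambda_1) \left( d_{\lambda_1} - 10^{-C_{\lambda_1}} \right) + \delta_2 w(\lambda_2) \left( d_{\lambda_2} - 10^{-C_{\lambda_2}} \right) \right].
\end{aligned} \tag{2.4.3}$$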

Taking the partial derivative and evaluating at zero,

$$\begin{aligned}
\left. \frac{\partial J}{\partial u^T} \right|_{u = u_{opt}} ={}& u_{opt} \left[ 2I + 2\mu_0 \frac{4\pi^2}{\lambda_0^2} \left( M_{\lambda_0} + \delta_1 \frac{w(\lambda_1)}{\gamma_1^2} M_{\lambda_1} + \delta_2 \frac{w(\lambda_2)}{\gamma_2^2} M_{\lambda_2} \right) \right] \\
&+ \mu_0 \frac{4\pi}{\lambda_0} \left[ \Im\{b_{\lambda_0}\} + \delta_1 \frac{w(\lambda_1)}{\gamma_1} \Im\{b_{\lambda_1}\} + \delta_2 \frac{w(\lambda_2)}{\gamma_2} \Im\{b_{\lambda_2}\} \right] = 0,
\end{aligned} \tag{2.4.4}$$

we can use the value of $\mu_0$ that minimizes Eq to compute the following optimal command:

$$u_{opt} = -\mu_0 \left[ \Im\{b_{\lambda_0}\} + \delta_1 \frac{w(\lambda_1)}{\gamma_1} \Im\{b_{\lambda_1}\} + \delta_2 \frac{w(\lambda_2)}{\gamma_2} \Im\{b_{\lambda_2}\} \right] \left[ \frac{\lambda_0}{2\pi} I + \mu_0 \frac{2\pi}{\lambda_0} \left( M_{\lambda_0} + \delta_1 \frac{w(\lambda_1)}{\gamma_1^2} M_{\lambda_1} + \delta_2 \frac{w(\lambda_2)}{\gamma_2^2} M_{\lambda_2} \right) \right]^{-1}. \tag{2.4.5}$$

By parameterizing the three Lagrange multipliers we can weigh their effect on the cost function, thus allowing us to control the degree to which each constraint is satisfied. If we choose $\delta_1 = \delta_2 = 1$ we have made each contrast target equally important, making the problem equivalent to solving Eq We can also control the degree to which achieving the bandwidth affects the optimization. If we need a very soft correction outside of the central wavelength then we can choose $\delta_1$ and $\delta_2$ to be less than one. We may also find that we need to preferentially weight one side of the bandwidth to accommodate variance in absorption and emission from the planet. Eq and Eq have simplified the problem of optimal broadband correction by writing a cost function with a single Lagrange multiplier, while leaving the degree of freedom to weight the required performance of each wavelength. This parameterization constrains the path of the original 3D optimization to lie along a vector. The direction of the vector is arbitrary, set by the user based on the values chosen for $(\delta_1, \delta_2)$. To evaluate this cost function we must account for the wavelength dependence of each matrix, which triples our
computational cost. The matrices describing the impact of the DMs on the electric field, $M_\lambda$, can be precomputed since this is simply a linear map from unit DM actuation to image plane intensity. The only difference between the three is the wavelength used in the transform, $\mathcal{C}_\lambda\{\cdot\}$. They only need to be re-evaluated when the system is re-linearized. Since $b_\lambda$ and $d_\lambda$ require a measurement of the current electric field, they cannot be precomputed and must be measured for each wavelength.

Practically, a windowed optimization allows us to seek commands to suppress over a given filter set in the instrument. In the most conservative case the bounding wavelengths would be the edges of the filter. In a more aggressive control mode the window could be chosen to span the smallest and largest wavelengths across a set of filters. The disadvantage of this approach is that it does not solve the problem of guaranteeing correction at the intermediate wavelengths. It does, however, provide the freedom to arbitrarily weight the contrast performance separately for each wavelength. This degree of freedom is actually rather useful, since the relative intensity of the planet to its parent star is a function of wavelength. For example, in the visible spectrum Earth is $10^{10}$ times dimmer than the sun but in the infrared it is only $10^{6}$ times dimmer, and Des Marais et al. [17] show that we can expect order of magnitude fluctuations in the reflectance of a terrestrial body with an atmosphere within a 2% band. Thus we can use our weighting values $(\delta_1, \delta_2)$ to relax the DM commands with respect to wavelengths that we do not expect to require such stringent contrast levels, purely based on the blackbody spectrum of the parent star.
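The single-multiplier control law can be sketched as follows. This is a hedged illustration rather than the controller used in the thesis: it assumes a real actuator vector (so only the real part of each $M_\lambda$ contributes), an achromatic toy model for the DM effect, and a simple geometric sweep standing in for the line search on $\mu_0$. All function names, the toy data, and the sweep parameters are hypothetical.

```python
import numpy as np

def intensity(u, M, b, d, lam, w=1.0):
    """I_DH(lambda) in the form of Eq. 2.2.1 for a real actuator vector u."""
    k = 2.0 * np.pi / lam
    return w * (k**2 * np.real(u @ M @ u) + 2.0 * k * (u @ np.imag(b)) + d)

def windowed_command(mu0, lam0, M, b, gamma, delta, w):
    """Optimal command of Eq. 2.4.5 for a fixed multiplier mu0.

    M, b are lists ordered [center, lower, upper]; gamma, delta, w hold the
    two bounding-wavelength values. The system matrix is real and symmetric,
    so the row-vector form b A^-1 equals solve(A, b).
    """
    weight = [1.0, delta[0] * w[0], delta[1] * w[1]]
    g = [1.0, gamma[0], gamma[1]]
    M_eff = sum(wt / gi**2 * np.real(Mi) for wt, gi, Mi in zip(weight, g, M))
    b_eff = sum(wt / gi * np.imag(bi) for wt, gi, bi in zip(weight, g, b))
    A = lam0 / (2.0 * np.pi) * np.eye(len(b_eff)) + mu0 * (2.0 * np.pi / lam0) * M_eff
    return -mu0 * np.linalg.solve(A, b_eff)

def line_search(lam0, M, b, d, gamma, delta, w, targets,
                mu0=1e-3, growth=1.5, max_iter=200):
    """Grow mu0 geometrically until all three contrast targets are met."""
    lams = [lam0, gamma[0] * lam0, gamma[1] * lam0]
    ws = [1.0, w[0], w[1]]
    for _ in range(max_iter):
        u = windowed_command(mu0, lam0, M, b, gamma, delta, w)
        I = [intensity(u, Mi, bi, di, li, wi)
             for Mi, bi, di, li, wi in zip(M, b, d, lams, ws)]
        if all(t >= Ii for Ii, t in zip(I, targets)):
            return u, mu0, I
        mu0 *= growth
    raise RuntimeError("no mu0 in the searched range met all three targets")

# Hypothetical toy problem: the aberrated field at each wavelength is exactly
# cancelled by the actuation a_true, so every target is reachable.
rng = np.random.default_rng(1)
n_pix, n_act, lam0 = 40, 6, 1.0
gamma, delta, w = (0.95, 1.05), (0.8, 0.8), (1.0, 1.0)
G = 1e-2 * (rng.normal(size=(n_pix, n_act)) + 1j * rng.normal(size=(n_pix, n_act)))
a_true = 1e-3 * rng.normal(size=n_act)
lams = [lam0, gamma[0] * lam0, gamma[1] * lam0]
E = [-1j * (2.0 * np.pi / l) * (G @ a_true) for l in lams]
M = [G.conj().T @ G] * 3
b = [G.conj().T @ Ei for Ei in E]
d = [np.vdot(Ei, Ei).real for Ei in E]
targets = [1e-9] * 3

u_opt, mu_opt, I_opt = line_search(lam0, M, b, d, gamma, delta, w, targets)
```

Setting `delta` below one, as here, underweights the bounding wavelengths in the cost function; because the sweep stops at the first $\mu_0$ that meets every target, the returned command stays smaller in norm than the full cancelling actuation.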
With regard to the control problem, we will find in the following sections that there is a significant amount of error introduced in the estimates at bounding wavelengths, regardless of whether they are provided by a direct estimate or an extrapolation method (§2.5). The ability to underweight the bounding wavelengths in the cost function gives us the ability to soften the effect of those errors. In fact, we often found the best performance from the results in Ch. 5 when the bounding wavelengths were slightly underweighted. Noting this behavior, an interesting adaptive control scheme would be to modulate $\delta_1$ and $\delta_2$ based on an estimate
of the error introduced by the extrapolated fields at the bounding wavelengths. This could go so far as using these values to adjust the functional form of the extrapolation that we will derive in §2.5.

2.5 Extrapolating Estimates in Wavelength

The Windowed Stroke Minimization algorithm of §2.4 solves the bandwidth problem, but its implementation can be quite complicated because its wavelength dependence requires multiple transforms and field estimates. The wavelength dependent matrices, $b_\lambda$ and $d_\lambda$, represent the component of the electric field from coupling between aberrations and the DM-induced perturbations and the intensity distribution of the aberrated field, respectively. Computing both requires an estimate of the electric field at each iteration of the quadratic subprogram. In FPWC the estimate is what drives the correction time, not the controller. If we could measure the field directly, one iteration of the correction algorithm would only require one image per wavelength to measure the field plus one more to see the control effect. However, as will be discussed in Ch. 3, the electric field cannot be measured directly but must be estimated. This involves taking many exposures to estimate the field (the number of which is a major topic in Ch. 3), between two and eight exposures per wavelength. Once provided with an electric field estimate, we still only require one image to measure the control effect. As such, we only save one exposure per wavelength (two for Windowed Stroke Minimization) if we have to directly estimate each field. Thus we gain very little in controller performance by going to such a complicated algorithm. In terms of correction efficiency we would almost be better off correcting each wavelength individually using the monochromatic control law in §2.1. If we can eliminate our need to estimate every wavelength, we can reduce the number of exposures per iteration by as much as 6%.
We still require a field estimate at a single wavelength, but from that we will attempt to extrapolate what the field is at other
wavelengths. To do this, we will first make some assumptions about the electric field to write a functional relationship describing how the aberrated field evolves in wavelength as it deviates from the estimate at the central wavelength, $\lambda_0$. To include the wavelength dependence of the transform, $\mathcal{C}_\lambda\{\cdot\}$, we will characterize the variance of the aberrations at the pupil plane. An arbitrary aberrated field at the pupil may be described as

$$E_{pup,abr} = \alpha(u, v, \lambda)\, e^{i\beta(u, v, \lambda)}. \tag{2.5.1}$$

For the sake of computational efficiency, we will describe the functional form of $\alpha(u, v, \lambda)$ and $\beta(u, v, \lambda)$ by assuming that the errors are induced by optics and that these errors are effectively located at the pupil plane. Uniform amplitude variation in wavelength (such as the change in reflectivity of a mirror as a function of wavelength) will be absorbed by the normalization of the field, since it is analogous to intensity fluctuation in wavelength. Thus $\alpha(u, v, \lambda)$ describes the spatial variation of amplitude across the pupil as a function of wavelength. If we assume the amplitude variations due to system optics are analogous to reflectivity variations, i.e., coating errors across the surface of the mirror, the amplitude aberrations become independent of wavelength, making Eq

$$E_{pup,abr} = \alpha(u, v)\, e^{i\beta(u, v, \lambda)}. \tag{2.5.2}$$

This assumption does not necessarily hold in the presence of reflectivity variations that exist at non-conjugate planes because of phase induced amplitude errors that arrive at the pupil. We acknowledge this as a limitation, but we will keep the assumption to maintain a simple functional relationship since this computation must be made during the control loop. Our task will be to see how effective this assumption is in the experiment.

Moving to $\beta(u, v, \lambda)$, we assume that the phase errors are from shaping errors within the system optics. We assume that these errors exist at the pupil plane.
Assuming a particular height perturbation to the shape of the optic, $h(u, v)$, we can write the phase errors
$\beta$ as

$$\beta(u, v, \lambda) = \frac{2\pi}{\lambda} h(u, v). \tag{2.5.3}$$

This means that the phase errors in the pupil plane are inversely proportional to wavelength. This makes intuitive sense since a fixed perturbation induced by an optic applies a smaller phase disturbance as the wavelength increases. Defining the incident phase at our estimated wavelength, $\lambda_0$, as

$$\beta_0(u, v) = \frac{2\pi}{\lambda_0} h(u, v), \tag{2.5.4}$$

we can rewrite Eq as a function of $\beta_0(u, v)$, $\lambda_0$, and $\lambda$. The phase perturbation at the pupil then becomes

$$\beta(u, v, \lambda) = f(\beta_0(u, v), \lambda) = \frac{\lambda_0}{\lambda} \beta_0(u, v). \tag{2.5.5}$$

Applying Eq and Eq to Eq , we find that the wavelength dependent aberrations at the pupil can be approximated as

$$E_{pup,abr}(u, v, \lambda) = \alpha(u, v)\, e^{i \frac{\lambda_0}{\lambda} \beta_0(u, v)}. \tag{2.5.6}$$

Assuming an estimate of the electric field at the image plane, $E_{est}(x, y, \lambda_0)$, that is only from pupil plane aberrations, our estimate of the pupil plane aberrations is given by

$$g_0(u, v) = \mathcal{F}^{-1}_{\lambda_0}\{E_{est}(x, y, \lambda_0)\}. \tag{2.5.7}$$

We now equate $g_0(u, v)$ to Eq , making the phase of the pupil estimate

$$e^{i\beta_0} = \frac{g_0}{\alpha}, \qquad i\beta_0 = \ln\left( \frac{g_0}{\alpha} \right). \tag{2.5.8}$$

Applying Eq , we shift the phase found in Eq to get

$$\begin{aligned}
i\beta_\lambda &= i \frac{\lambda_0}{\lambda} \beta_0 = \frac{\lambda_0}{\lambda} \ln\left( \frac{g_0}{\alpha} \right) \\
\alpha(u, v)\, e^{i\beta_\lambda} &= \alpha\, e^{\frac{\lambda_0}{\lambda} \ln\left( \frac{g_0}{\alpha} \right)} = \alpha\, e^{\frac{\lambda_0}{\lambda} \left( \ln(g_0) - \ln(\alpha) \right)} \\
\alpha(u, v)\, e^{i\beta_\lambda} &= \alpha^{1 - \frac{\lambda_0}{\lambda}}\, g_0^{\frac{\lambda_0}{\lambda}} = \frac{g_0^{\lambda_0/\lambda}}{|g_0|^{\lambda_0/\lambda - 1}}.
\end{aligned} \tag{2.5.9}$$

Reapplying the linear transform for the new wavelength, $\mathcal{F}_\lambda\{\cdot\}$, we compute the extrapolated field from $\lambda_0$ to $\lambda$ to be

$$E_{extrap}(x, y, \lambda) = \mathcal{F}_\lambda\left\{ \frac{\mathcal{F}^{-1}_{\lambda_0}\{E_{est}(x, y, \lambda_0)\}^{\lambda_0/\lambda}}{\left| \mathcal{F}^{-1}_{\lambda_0}\{E_{est}(x, y, \lambda_0)\} \right|^{\lambda_0/\lambda - 1}} \right\}. \tag{2.5.10}$$

Using Eq we now have the ability to extrapolate an estimate made at $\lambda_0$ to bounding wavelengths. To minimize the bandwidth between estimates, we choose to estimate at the central wavelength and extrapolate the field for the bounding wavelengths.

In applying the extrapolation, we run into a numerical complication. Since the estimate covers a finite area, the inverse transform of its shape will convolve with the field we seek, imposing itself on the extrapolation. Fortunately the area being estimated and controlled is typically smaller than the image, so we can mitigate the effect by filling in the unknown area with the square root of the intensity found in the image measuring the control effect. Not knowing the phase, we are only adding partial information in this region but it serves to soften the effect of the finite area on the extrapolation. Fig. 2.2 shows an example of an estimate extrapolation over a $\Delta\lambda$ = 1% bandwidth. Overlaid on the images are boxes defining the estimation area, inside of which a complex field is provided in Fig. 2.2(b). We then use Eq to compute the field at the bounding wavelengths, Fig. 2.2(a) and Fig. 2.2(c).
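The extrapolation step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two wavelength-dependent transforms are passed in as caller-supplied callables (hypothetical parameters `inv_transform_lam0` and `fwd_transform_lam`), the phase is taken on its principal branch, consistent with the small-phase regime, and the plain FFTs in the usage lines are stand-ins that ignore the change of plate scale with wavelength.

```python
import numpy as np

def scale_pupil_phase(g0, lam0, lam):
    """Pupil-plane step of Eq. 2.5.9: keep |g0|, scale its phase by lam0/lam.

    Algebraically equal to g0**(lam0/lam) / abs(g0)**(lam0/lam - 1) on the
    principal branch of the logarithm, but it avoids 0/0 where g0 vanishes.
    """
    return np.abs(g0) * np.exp(1j * (lam0 / lam) * np.angle(g0))

def extrapolate_field(E_est, lam0, lam, inv_transform_lam0, fwd_transform_lam):
    """Eq. 2.5.10 with caller-supplied transforms (hypothetical callables).

    inv_transform_lam0 : image plane -> pupil plane at lam0
    fwd_transform_lam  : pupil plane -> image plane at lam
    """
    g0 = inv_transform_lam0(E_est)
    return fwd_transform_lam(scale_pupil_phase(g0, lam0, lam))

# Hypothetical check of the pupil-plane step: a 0.2 rad phase at lam0 becomes
# 0.1 rad at twice the wavelength, with the amplitude unchanged.
g0 = np.array([0.5 * np.exp(0.2j)])
g_long = scale_pupil_phase(g0, lam0=1.0, lam=2.0)

# Stand-in transforms: plain FFTs, used here only to show the call pattern.
# With lam == lam0 the extrapolation must return the original estimate.
E_est = np.array([[1.0 + 1.0j, 0.2], [0.1j, -0.3]])
E_same = extrapolate_field(E_est, 1.0, 1.0, np.fft.ifft2, np.fft.fft2)
```

Writing the pupil step as amplitude times a rescaled phase, rather than as the ratio of powers, is a numerical convenience only; a real implementation would also need the padding of the estimation area described in the text.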

[Figure 2.2 panels: (a) Lower Wavelength, $E_{extrap}(\lambda_1)$; (b) Central Wavelength, $E_{est}(\lambda_0)$; (c) Upper Wavelength, $E_{extrap}(\lambda_2)$]

Figure 2.2: Example of wavelength extrapolation using Eq in a bandwidth of $\Delta\lambda$ = 1%. The central wavelength (b) is used to extrapolate the field at a wavelength at the bottom (a) and top (c) of the window. The evolution of the aberrations is more complicated than a physical scaling law.

The extrapolated field estimates for the upper and lower wavelengths are then taken from the area inside the boxes that defines the correction area. At this stage, it is possible to mitigate our uncertainty in the phase outside of the estimation area with a Gerchberg-Saxton like algorithm. We would recursively transform to each wavelength, replacing the areas outside of the estimate with the newly computed complex field. However, the search area defined by the image plane mask and the camera resolution may limit the accuracy of a Gerchberg-Saxton loop. These algorithms have been shown to require upwards of 2 cycles, each involving multiple 2D Fourier transforms, and are very costly with regard to computation time [4]. Additionally, the level of accuracy gained by such a technique is lost to the time evolution of the aberrations. This would be a function of computational power and the time scale of the aberrations. In a space observatory the computational power is low and on a ground telescope the speckles evolve quickly, making speckle evolution a very real possibility in either scenario. The idea is certainly worth pursuing in a highly stable laboratory environment to test if it can improve performance, but is unlikely to be of benefit as a true observation mode.

By extrapolating $(b_{\lambda_1}, d_{\lambda_1})$ and $(b_{\lambda_2}, d_{\lambda_2})$, Eq provides all the necessary information for the Windowed Stroke Minimization algorithm, Eq and Eq We can now attempt
broadband suppression using only a single monochromatic estimate. However, the simplification made for both amplitude and phase requires that errors in non-conjugate planes have a negligible effect on the aberrations in the pupil. Reflectivity variation across any mirror is generally very low since chemical vapor deposition is a very stable and reliable process. These errors are typically so low that they only become a limiting factor in extreme interferometric problems where the null must be very deep, such as the visible nuller coronagraph [46]. In this case the variations in amplitude from non-conjugate surfaces, such as the DMs (Fig. 1.3), will be negligible and Eq will be relatively accurate. However, by virtue of the fact that we use two DMs to correct amplitude via the propagation of phase deformations we cannot say the same for the assumption made on amplitude. The phase induced amplitude aberrations from DM1 and DM2 are significant due to the large nominal phase errors present on these surfaces. Accounting for these errors will add higher order wavelength dependence to $\alpha(u, v, \lambda)$, complicating the form of the transformation we found in Eq Since the spatial frequencies of these aberrations are mostly of very high order we will continue with our original assumption and hope that this error does not contribute significantly at the low spatial frequencies we are considering.

2.6 Chapter Assumptions

The following assumptions were made in the derivations of this chapter.

Monochromatic Wavefront Control:
- Linear approximation made for the DM field.
- The control effect relies on the angular spectrum factor when a DM is non-conjugate to the pupil.
- $g(u, v)$ and $\phi(u, v)$ are both small and the product $g\phi$ is negligible.
- For clarity, the control laws are written in a form that assumes $\phi = 0$.
- By design, $\langle \mathcal{C}\{A\}, \mathcal{C}\{A\} \rangle$, $\langle \mathcal{C}\{A\}, \mathcal{C}\{Ag\} \rangle$, and $\langle \mathcal{C}\{A\}, i\,\mathcal{C}\{A\phi\} \rangle$ are negligible.
- The DM response is small enough that its response to voltage is linear, and superposition of influence functions holds.
- Re-linearization of the control matrices is not necessary at each control step because of the rapid reduction of actuation levels during control.
- The effect of multiple DMs is additive, a symptom of the first three assumptions.

2.3, 2.4, Broadband Wavefront Control and Extrapolation:
- The bandwidth, $\Delta\lambda/\lambda_0$, is small enough that wavelengths inside those constraining the controller will also be suppressed.
- Amplitude variations are wavelength independent and fixed to the pupil plane.
- Phase variations scale as $\lambda_0/\lambda$ and are fixed to the pupil plane.

Chapter 3

Batch Process Electric Field Estimation

The control algorithms developed in Ch. 2 require that we provide an estimate of the electric field at the image plane. The level of accuracy and precision that focal plane wavefront correction requires from the electric field estimate generally precludes using a separate wavefront sensor because it introduces non-common path errors. However, the final science camera is only capable of imaging the magnitude squared of the electric field. All phase information of the complex field is lost. Therefore we must modulate the field in some manner to make both amplitude and phase observable at the science detector. To accomplish this, there are generally two levels of algorithms that have been developed. Nonlinear estimation schemes based on Gerchberg-Saxton algorithms can accommodate large phase deformations (many multiples of the wavelength), but their uncertainty is too large for the controllers of Ch. 2 to reach extremely high levels of contrast [4, 2, 1, 6, 3]. To create dark holes in high contrast images we require the second type, high precision estimation schemes that only operate in the regime of small phase perturbations. One approach is to image multiple planes and converge on the estimate using a more precise version of a Gerchberg-Saxton type algorithm [4, 18, 19, 22]. Another approach is to use
algorithms that modulate the aberrations with the deformable mirror itself. This requires a model accurate and precise enough to predict the DM's effect on the image plane at intensity levels equal to or lower than our desired contrast level, and the field must be measured fast enough for the control to be effective. The speed is directly tied to the stability of the field, which is in turn dependent on the instrument stability. As will be shown in §6.5, the stability must be quantified as a function of the contrast and we will see that this affects the performance of our broadband experiments in Ch. 5. For precision estimation, we have used the DM-Diversity estimation scheme as a baseline to provide the electric field estimate to the stroke minimization controller [11, 23] because of its widespread use and success in multiple laboratories [7, 24, 28]. This chapter derives the algorithm and addresses its advantages and limitations.

3.1 Linearity of the Electric Field

To produce the model for this estimation scheme, we begin as we did in Ch. 2 with a model relating the electric field at the DM/pupil plane to the electric field at the image plane. Using an arbitrary linear operator, $\mathcal{C}\{\cdot\}$, to account for the DM being at a plane non-conjugate to the pupil we rewrite Eq as

$$E_{im}(x, y) = \mathcal{C}\{A(u, v)\} + \mathcal{C}\{A(u, v) g(u, v)\} + i\,\mathcal{C}\{A(u, v) \phi(u, v)\}. \tag{3.1.1}$$

In the end we will still use matrix forms to compute the intensity distribution, but rather than applying a matrix inner product to describe a single scalar value as in Eq , we seek the intensity at each pixel in the image. This requires calculating the magnitude squared of each element in the image, so we will be evaluating the inner product for each scalar value in the image. Given a particular DM shape, $+\phi$, the intensity distribution at the image plane
is given by

$$\begin{aligned}
I^{+} ={}& \langle \mathcal{C}\{A\}, \mathcal{C}\{A\} \rangle + \langle \mathcal{C}\{Ag\}, \mathcal{C}\{Ag\} \rangle + \langle i\,\mathcal{C}\{A\phi\}, i\,\mathcal{C}\{A\phi\} \rangle \\
&+ 2\Re\{\langle \mathcal{C}\{A\} + \mathcal{C}\{Ag\}, i\,\mathcal{C}\{A\phi\} \rangle\} + 2\Re\{\langle \mathcal{C}\{A\}, \mathcal{C}\{Ag\} \rangle\}.
\end{aligned} \tag{3.1.2}$$

We can now describe the interaction of DM actuation and aberrations over the entire control area where we intend to create a dark hole. The only approximation made in this intensity distribution is the linearization of the DM shape. Thus, while we have not actually linearized about the aberrated field it must be small enough that the second order term in the linearization used to produce Eq is negligible (in a single control step). Correspondingly, we will find in the following sections that the estimate of the aberrated field is directly dependent on the linearization of the DM shape.

3.2 Pairwise Images

In Ch. 2 we linearized about the DM shape so that we might create a quadratic cost function to solve for the optimal control law provided a field. The goal of this chapter is to estimate the electric field in the image plane given an image described by Eq We will do this by modulating the DM and measuring the effect on the intensity distribution of the image plane. The additive component of the DM in Eq is of no help, but the cross term of the DM effect with the aberrated field will tell us how the modulation of the DM interacts with the aberrated field to change the intensity distribution. To make this interaction the sole observable quantity, we must eliminate all the other terms since they will add bias and noise to the measurement. As pointed out by Bordé and Traub [11], we cannot simply subtract the image taken prior to applying the probe (making $\phi = 0$) because this does not eliminate the additive component of $\phi$. As described by Bordé and Traub [11], applying the negative
of the DM shape, $-\phi$, the intensity distribution becomes

$$\begin{aligned}
I^{-} ={}& \langle \mathcal{C}\{A\}, \mathcal{C}\{A\} \rangle + \langle \mathcal{C}\{Ag\}, \mathcal{C}\{Ag\} \rangle + \langle i\,\mathcal{C}\{A\phi\}, i\,\mathcal{C}\{A\phi\} \rangle \\
&- 2\Re\{\langle \mathcal{C}\{A\} + \mathcal{C}\{Ag\}, i\,\mathcal{C}\{A\phi\} \rangle\} + 2\Re\{\langle \mathcal{C}\{A\}, \mathcal{C}\{Ag\} \rangle\}.
\end{aligned} \tag{3.2.1}$$

If we subtract Eq from Eq we find the residual to be

$$I^{+} - I^{-} = 4\Re\{\langle \mathcal{C}\{A\} + \mathcal{C}\{Ag\}, i\,\mathcal{C}\{A\phi\} \rangle\}. \tag{3.2.2}$$

Thus taking difference images leaves us with the product of the DM probe field, $\mathcal{C}\{A\phi\}$, with the aberrated and nominal field. Difference imaging has the added benefit of removing any static incoherent light sources, such as detector bias, stray light, and planet light, leaving only the coherent component of the field to be measured. The only residual left in the measurement is the interaction of the DM probe with the nominal field, $\langle \mathcal{C}\{A\}, i\,\mathcal{C}\{A\phi\} \rangle$. When the aberrations are much larger than the nominal field this will have a negligible effect. Once the aberrations have been suppressed to a level close to that of the nominal field this will become significant. However, recent progress showing that wavefront control is equivalent to computing the profile for a pupil mapping coronagraph [3, 73] has shown that we should in principle be capable of using our control to go below the nominal field the coronagraph is designed for [39]. In this case it is not a residual because we want to include this component of the field in the estimate so we can suppress it.

3.3 DM Diversity: Batch Process Estimation

The final step is to manipulate Eq to separate out the aberrated field in a matrix form. To do so, we recognize that we only want the real part of the scalar inner product
between the two quantities and rewrite Eq as

$$\begin{aligned}
I^{+} - I^{-} &= 4 \left( \Re\{\mathcal{C}\{A\} + \mathcal{C}\{Ag\}\}\, \Re\{i\,\mathcal{C}\{A\phi\}\} + \Im\{\mathcal{C}\{A\} + \mathcal{C}\{Ag\}\}\, \Im\{i\,\mathcal{C}\{A\phi\}\} \right) \\
&= 4 \begin{bmatrix} \Re\{i\,\mathcal{C}\{A\phi\}\} & \Im\{i\,\mathcal{C}\{A\phi\}\} \end{bmatrix} \begin{bmatrix} \Re\{\mathcal{C}\{A\} + \mathcal{C}\{Ag\}\} \\ \Im\{\mathcal{C}\{A\} + \mathcal{C}\{Ag\}\} \end{bmatrix}.
\end{aligned} \tag{3.3.1}$$

Eq separates the probe and aberrated fields into independent matrices but this equation alone still leaves us with an underdetermined system, meaning the solution is non-unique. The right pseudo-inverse would give the minimal norm solution, $\|x\|_{min}$, instead of providing an estimate with minimal least-squares error. To complete the DM-Diversity estimator developed by Give'on et al. [23], we must produce an overdetermined system so that we can write it as an unweighted batch process that produces an estimate of the aberrated field at the current control iteration with least-squares minimal error. The linearized interaction of the DM probe and the aberrated field, Eq , can be augmented by taking multiple difference images using $j$ pre-determined shapes. The image $I_j^{+}$ is taken with one deformable mirror shape, $\phi_j$, while $I_j^{-}$ is the image taken with the negative of that shape, $-\phi_j$, applied to the deformable mirror. The difference of each conjugate pair is then used to construct a vector of noisy measurements,

$$z = \begin{bmatrix} I_1^{+} - I_1^{-} \\ \vdots \\ I_j^{+} - I_j^{-} \end{bmatrix}. \tag{3.3.2}$$

Defining $x$ as the image plane electric field state, we write $z$ as a linear equation in $x$ and include additive noise, $n$,

$$z = Hx + n \tag{3.3.3}$$

which defines $H$ as the observation matrix that relates the observed quantity to the state we seek to estimate. By writing $x$ as the real and imaginary parts of the electric field at a
specific pixel,

$$x = \begin{bmatrix} \Re\{\mathcal{C}\{Ag\}\} \\ \Im\{\mathcal{C}\{Ag\}\} \end{bmatrix}, \tag{3.3.4}$$

we can construct $H$ so that it contains the real and imaginary parts of the $j$th DM perturbation, $\mathcal{C}\{A\phi_j\}$, in each row. With multiple pairs of images it takes the form

$$H = 4 \begin{bmatrix} \Re\{\mathcal{C}\{A\phi_1\}\} & \Im\{\mathcal{C}\{A\phi_1\}\} \\ \vdots & \vdots \\ \Re\{\mathcal{C}\{A\phi_j\}\} & \Im\{\mathcal{C}\{A\phi_j\}\} \end{bmatrix}. \tag{3.3.5}$$

The product $Hx$ will then match the intensity distribution in the measurement $z$. With at least three measurements, $j \ge 3$, we can take a left pseudo-inverse to solve for the estimate of the real and imaginary parts of the aberrated field at each pixel in the image plane with least-squares minimal error:

$$\hat{x} = (H^T H)^{-1} H^T z. \tag{3.3.6}$$

To write the system in full matrix form, the state $x$ is stacked vertically for each pixel and the observation matrix for a single pixel, $H$, is ordered into a larger block diagonal matrix. In most cases, there are enough pixels in the dark hole that the dimension of $H$ becomes very large and too cumbersome for most mathematical programs to handle the matrix inverse. Instead, we construct $H$ as shown here and solve for $x$ pixel by pixel so that enough memory is left to manage the experiment. As we see from Eq , the DM-Diversity algorithm is simply a least-squares batch process estimator [68]. The pseudo-inverse minimizes the error because it is effectively averaging the elements of $H$ when the inverse is taken. The power of this algorithm also comes from the difference imaging of conjugate probe shapes that are applied to the DM. We are left only with the time-varying camera noise in our measurement, meaning that the sensor noise $n$ will follow a zero-mean Poisson distribution [35]. The problem becomes invertible

using two image pairs to construct $z$ and $H$, but a minimum of 3 image pairs must be used to create an overdetermined system that will produce a unique estimate with least-squares minimal error from the available data [68]. Practically, we find that 4 image pairs must be used to get a good enough estimate at the Princeton HCIL, largely to average model errors and detector noise. Consequently, 8 images are taken per iteration to estimate the electric field with the DM-Diversity algorithm. The algorithm has the advantage of being simple and relatively robust. The disadvantage is that the algorithm is fundamentally limited by DM model uncertainty. The robustness of the algorithm also comes with a high cost in exposures that must be repeated at every iteration, a major disadvantage in a system where the time required for detection will be exposure limited.

3.4 Probe Shapes

With the DM-Diversity estimator in place we can explore the choice of the probe shapes, $\phi_j$. They must be chosen to modulate the estimation area well, otherwise the difference between $I_j^{+}$ and $I_j^{-}$ would be so small that $z$ would come close to zero. Even worse, the observation matrix is constructed by computing the probe effect in the estimation area. If we choose a probe that does not modulate the field in the estimation area well we will get rows that are effectively zero, making $H$ poorly conditioned. While our choice of probe shapes is somewhat arbitrary, we must take care that they modulate the estimation area well enough to produce a well posed problem in Eq . We guarantee this by choosing shapes based on analytical functions for which we know the Fourier transform. The DMs being non-conjugate to the pupil plane will have little effect on this computation, since we have shown that the angular spectrum factor will simply add an additional phase distribution in the image plane. Following Give'on et al. [23], we will simplify the problem of coverage/shape of the dark hole by choosing two symmetric rectangular regions that span the region we wish to estimate. Mathematically we produce a rectangle of width $w_x$ and height $w_y$ by multiplying two rect

functions, one for each dimension. Applying the inverse Fourier transform, the DM shape required to produce this rectangle in the image plane is

$$ \mathcal{F}^{-1}\{\mathrm{rect}(w_x x)\,\mathrm{rect}(w_y y)\} = \mathrm{sinc}(w_x u)\,\mathrm{sinc}(w_y v). \tag{3.4.1} $$

We offset the rectangle from the center by a distance $a$ in the $x$ direction and a distance $b$ in the $y$ direction by convolving it with two pairs of delta functions, one pair for each coordinate. The inverse Fourier transform of two symmetric delta functions is

$$ \mathcal{F}^{-1}\left\{ \tfrac{1}{2}\left[\delta(x-a) + \delta(x+a)\right] \tfrac{1}{2}\left[\delta(y-b) + \delta(y+b)\right] \right\} = \cos(au)\cos(bv). \tag{3.4.2} $$

Applying an arbitrary amplitude, $c$, and the pupil function, $A$, the two offset rectangles in the image plane generated by the DM shape $\phi$ are

$$ \mathcal{F}\{A\phi\} = \mathcal{F}\{A\} * c\,\mathrm{rect}(w_x x)\,\mathrm{rect}(w_y y) * \left[\delta(x-a) + \delta(x+a)\right] * \left[\delta(y-b) + \delta(y+b)\right] \tag{3.4.3} $$

$$ = \mathcal{F}\{cA\,\mathrm{sinc}(w_x u)\} * \mathcal{F}\{cA\,\mathrm{sinc}(w_y v)\} * \mathcal{F}\{cA\cos(au)\} * \mathcal{F}\{cA\cos(bv)\} $$

$$ = \mathcal{F}\{cA\,\mathrm{sinc}(w_x u)\,\mathrm{sinc}(w_y v)\cos(au)\cos(bv)\}. \tag{3.4.4} $$

Inverse transforming, the shape we would like the DM to approximate is

$$ \phi = c\,\mathrm{sinc}(w_x u)\,\mathrm{sinc}(w_y v)\cos(au)\cos(bv). \tag{3.4.5} $$

The coordinate offsets of the delta functions and the widths of the rect functions are equal to the frequencies of the cosine and sinc functions, respectively. Assuming linearity of the DM actuation, Eq , we have produced a phase distribution for one DM that results in a unit amplitude in two rectangular regions of the image plane. As discussed in Give'on

et al. [23], we must keep in mind that the true distribution at the image plane includes a convolution with the nominal PSF, $\mathcal{F}\{A\}$. The PSF will alter the field so that the field from a DM shape given by Eq will not have exact unit amplitude and the edges of the rectangle will extend by one radius of the PSF. The distribution will still be relatively uniform, so we are guaranteed to modulate the area under consideration with a reasonable expectation that each pixel in the dark hole will also be modulated. If we take the magnitude squared of Eq , we see that the intensity provided by the probe shape will be proportional to the square of the amplitude, $c$, of the shape produced by Eq . Generally, the amplitude of these shapes is prescribed by the normalized intensity of the aberrated field. When we probe, we want to make a significant effect so that there is a good signal in $z$, but we do not want to actuate so strongly that we wash out the aberrations. Thus we choose an actuation amplitude equal to the square root of the average contrast in the previous iteration [11]. The experimental results using probe pairs with the DM-Diversity estimation algorithm [23] are found in Ch. 5. This estimator is used to supply an estimate to both the windowed and monochromatic forms of the stroke minimization algorithm developed in Ch
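The probe shape of Eq. 3.4.5 can be sampled directly on the actuator grid. The following is a minimal sketch, not the HCIL implementation: the function name `probe_shape`, the pupil coordinate range of [-0.5, 0.5], the 32 x 32 grid, and all numerical parameter values are assumptions made for illustration. It also assumes the unnormalized convention sinc(t) = sin(t)/t, since the Fourier normalization is not restated here.

```python
import numpy as np


def probe_shape(n_act, w_x, w_y, a, b, mean_contrast):
    """Sample phi = c sinc(w_x u) sinc(w_y v) cos(a u) cos(b v) on the DM grid."""
    # Assumed pupil coordinates: actuators span [-0.5, 0.5] in u and v.
    coords = np.linspace(-0.5, 0.5, n_act)
    u, v = np.meshgrid(coords, coords)

    # Amplitude set to the square root of the prior iteration's mean contrast.
    c = np.sqrt(mean_contrast)

    # np.sinc(t) is sin(pi t) / (pi t); rescale the argument to get sin(t) / t.
    def sinc(t):
        return np.sinc(t / np.pi)

    return c * sinc(w_x * u) * sinc(w_y * v) * np.cos(a * u) * np.cos(b * v)


# One conjugate probe pair for a hypothetical 32 x 32 actuator DM.
phi_plus = probe_shape(32, w_x=20.0, w_y=40.0, a=60.0, b=0.0, mean_contrast=1e-6)
phi_minus = -phi_plus
```

Because the sinc and cosine factors are bounded by one, the peak of the sampled shape never exceeds the chosen amplitude, which makes the square-root-of-contrast scaling easy to audit before a shape is sent to the mirror.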

Chapter 4

Kalman Filter Estimation

The DM-Diversity algorithm described in Ch. 3 is quite effective, but it is limited by the fact that it is only a batch process method. As shown in Fig. 4.1, it does not close the loop on the state estimate. Therefore all state estimate information, $\hat{x}$, acquired about the electric field in the prior control step is lost. Thus we start over at each iteration, requiring that we take a full set of estimation images to estimate the field again.

Figure 4.1: Block diagram of a standard FPWC control loop. At any time step, $k$, only the intensity measurements, $z_k$, provide any feedback to estimate the current state, $x_k$, for control. The red dashed lines show additional feedback from the prior electric field (or state) estimate, $\hat{x}_k$, and the control signal, $u_k$, used to suppress it.

In addition to being very costly with regard to exposures, the measurements will become progressively noisier as we reach higher contrast levels. If we include feedback of the state estimate we will have a certain degree of robustness to new, noisy measurements by including information from

prior measurements with better signal-to-noise. Since we have already demonstrated a model-based controller, we should be able to use this model to predict the change in the electric field after the controller has applied a DM command. In doing so we want to consider the relative effect of process and detector noise to optimally combine an extrapolation of the state estimate with new measurement updates. This is exactly the problem a discrete time Kalman filter solves.

4.1 Constructing the Optimal Filter

A Kalman filter includes prior state estimate history by extrapolating a new estimate of the state using a model, then optimally updating the estimate with a sequence of measurements taken at discrete intervals. For the time being, the Kalman filter estimator will still use DM probe pairs as described in Ch. 3 for the measurement update. In this way we are not testing whether there is a better way to obtain information regarding the electric field, but rather changing the way we use the information from the applied probe shapes to reconstruct the electric field estimate. In particular, we use the prior information to improve our ability to estimate the field in later iterations. This will allow us to use fewer measurements to reconstruct the electric field. Following the notation used in Stengel [68], we begin by assuming we have a state, defined as the electric field estimate from the prior iteration, $\hat{x}_{k-1}(+)$. The plus indicates that the estimate was updated at the prior iteration with some measurement, be it from an initialization or a prior estimation and control step. Prior to any additional measurements, we will extrapolate from $\hat{x}_{k-1}(+)$ to the current time step, $\hat{x}_k(-)$, by an arbitrary function (to be defined later in the chapter). We also seek a metric to gauge the uncertainty in the estimate.
Following Stengel [68], we define the extrapolated state estimate covariance, $P_k(-)$, as the expected value of the outer product of the error between the estimate and the true state, $x_k$:

$$ P_k(-) = E\left[ (\hat{x}_k(-) - x_k)(\hat{x}_k(-) - x_k)^{T} \right]. \tag{4.1.1} $$

We now seek to optimally include new measurements to improve the state and covariance estimates. These noisy measurements, $z_k = y_k + n_k$, will still be difference images of probe pairs. As discussed in Ch. 3, the conjugate pairs allow us to construct a linear observation matrix, $H_k$, which stems from Eq . If we were not in a low aberration regime the observer would have to be nonlinear. This is not impossible for a Kalman filter, but can make it highly biased [68] and computationally expensive. As we decide how to optimally update the field, we must also have an estimate of the measurement noise covariance, which we define as

$$ R_k = E[n_k n_k^{T}]. \tag{4.1.2} $$

To properly demonstrate the conditions under which the Kalman filter is truly optimal, we would have to show that propagating the state estimate is a Gauss-Markov sequence (indicating that optimality requires white Gaussian inputs and Gaussian initial conditions) [68]. This is rather tedious and can be found in many textbooks that discuss the Kalman filter, so it will not be re-derived here. However, it is worth demonstrating that, like the batch process method, the Kalman filter does produce an estimate with least-squares minimal error. Since the Kalman filter operates on the estimate in closed loop, the weighted cost function used to derive the batch process solution,

$$ J = \tfrac{1}{2} \left[ H_k \hat{x}_k - z_k \right]^{T} R_k^{-1} \left[ H_k \hat{x}_k - z_k \right], \tag{4.1.3} $$

will not adequately represent the error contributions in the system. We must also include an estimate of the state covariance, since this will also propagate error in the estimate update. Defining the error as both the difference between the noisy observation and the estimated observation, $H_k \hat{x}_k(+) - z_k$, and the difference between the current estimate and the estimate extrapolation, $\hat{x}_k - \hat{x}_k(-)$, we write the quadratic cost function as

$$ J = \tfrac{1}{2} \left[ \hat{x}_k - \hat{x}_k(-) \right]^{T} P_k(-)^{-1} \left[ \hat{x}_k - \hat{x}_k(-) \right] + \tfrac{1}{2} \left[ H_k \hat{x}_k - z_k \right]^{T} R_k^{-1} \left[ H_k \hat{x}_k - z_k \right]. \tag{4.1.4} $$
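For a single pixel, the weighted batch cost of Eq. 4.1.3 has a closed-form minimizer. The sketch below illustrates it with synthetic numbers: the probe fields, noise level, and true state are invented for the example, and four probe pairs are assumed, as in the DM-Diversity estimator of Ch. 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probe fields C{A phi_j} at one pixel for four probe pairs.
probe_field = rng.normal(size=4) + 1j * rng.normal(size=4)
H = 4.0 * np.column_stack([probe_field.real, probe_field.imag])  # shape (4, 2)

x_true = np.array([1e-3, -2e-3])   # assumed [Re, Im] of the aberrated field
sigma = 1e-4                       # assumed measurement noise level
R = sigma**2 * np.eye(4)
z = H @ x_true + sigma * rng.normal(size=4)


def batch_cost(x, H, z, R):
    """Weighted quadratic cost of Eq. 4.1.3."""
    r = H @ x - z
    return 0.5 * r @ np.linalg.solve(R, r)


# Weighted least-squares minimizer and its covariance.
R_inv = np.linalg.inv(R)
x_hat = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)
P_batch = np.linalg.inv(H.T @ R_inv @ H)
```

With a noise covariance proportional to the identity, as assumed here, the weighted solution coincides with the unweighted pseudo-inverse estimate of Eq. 3.3.6.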

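We can formulate the cost in matrix form as

$$ J = \frac{1}{2} \begin{bmatrix} \hat{x}_k - \hat{x}_k(-) \\ H_k \hat{x}_k - z_k \end{bmatrix}^{T} \begin{bmatrix} P_k(-) & 0 \\ 0 & R_k \end{bmatrix}^{-1} \begin{bmatrix} \hat{x}_k - \hat{x}_k(-) \\ H_k \hat{x}_k - z_k \end{bmatrix} \tag{4.1.5} $$

$$ = \frac{1}{2} \left( \begin{bmatrix} I \\ H_k \end{bmatrix} \hat{x}_k - \begin{bmatrix} \hat{x}_k(-) \\ z_k \end{bmatrix} \right)^{T} \begin{bmatrix} P_k(-) & 0 \\ 0 & R_k \end{bmatrix}^{-1} \left( \begin{bmatrix} I \\ H_k \end{bmatrix} \hat{x}_k - \begin{bmatrix} \hat{x}_k(-) \\ z_k \end{bmatrix} \right) $$

$$ = \frac{1}{2} \left( \tilde{H}_k \hat{x}_k - \tilde{z}_k \right)^{T} \tilde{R}_k^{-1} \left( \tilde{H}_k \hat{x}_k - \tilde{z}_k \right), \tag{4.1.6} $$

where we have now defined a new set of augmented matrices as

$$ \tilde{H}_k = \begin{bmatrix} I \\ H_k \end{bmatrix}, \tag{4.1.7} $$

$$ \tilde{z}_k = \begin{bmatrix} \hat{x}_k(-) \\ z_k \end{bmatrix}, \tag{4.1.8} $$

$$ \tilde{R}_k = \begin{bmatrix} P_k(-) & 0 \\ 0 & R_k \end{bmatrix}. \tag{4.1.9} $$

This allows us to write an analog to the weighted cost function used to compute the batch process estimator. Taking the partial derivative with respect to the state estimate $\hat{x}_k$ and evaluating at the optimal update, $\hat{x}_k(+)$, we find

$$ \left. \frac{\partial J}{\partial \hat{x}_k} \right|_{\hat{x}_k(+)} = \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k \hat{x}_k(+) - \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{z}_k. \tag{4.1.10} $$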
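The augmented system of Eqs. 4.1.7 to 4.1.9 can be checked numerically. The sketch below, using invented values for a single pixel and a single probe pair, solves the augmented weighted least-squares problem and compares the result with the standard gain form of the Kalman update; the two agree through the matrix inversion lemma.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative single-pixel quantities (all values are assumptions).
H = rng.normal(size=(1, 2))                    # one probe pair
R = np.array([[0.04]])                         # measurement noise covariance
P_minus = np.array([[0.5, 0.1], [0.1, 0.3]])   # extrapolated state covariance
x_minus = np.array([0.2, -0.1])                # extrapolated state estimate
z = np.array([0.7])                            # difference-image measurement

# Augmented matrices of Eqs. 4.1.7 to 4.1.9.
H_aug = np.vstack([np.eye(2), H])
z_aug = np.concatenate([x_minus, z])
R_aug = np.block([[P_minus, np.zeros((2, 1))],
                  [np.zeros((1, 2)), R]])

# Minimizer of the augmented cost, Eq. 4.1.6.
W = np.linalg.inv(R_aug)
x_plus_aug = np.linalg.solve(H_aug.T @ W @ H_aug, H_aug.T @ W @ z_aug)

# Standard gain form of the Kalman update for comparison.
K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)
x_plus_gain = x_minus + K @ (z - H @ x_minus)
```

Note that `H_aug` has three rows for two unknowns even though only one measurement was taken, which is why the augmented problem is always solvable with a left pseudo-inverse.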

Setting the partial derivative to zero, the optimal state update is

$$ \hat{x}_k(+) = \left( \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k \right)^{-1} \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{z}_k \tag{4.1.11} $$

$$ = \left[ P_k(-)^{-1} + H_k^{T} R_k^{-1} H_k \right]^{-1} \left[ P_k(-)^{-1} \hat{x}_k(-) + H_k^{T} R_k^{-1} z_k \right] $$

$$ = \left[ P_k(-) - P_k(-) H_k^{T} \left( H_k P_k(-) H_k^{T} + R_k \right)^{-1} H_k P_k(-) \right] \left[ P_k(-)^{-1} \hat{x}_k(-) + H_k^{T} R_k^{-1} z_k \right] $$

$$ = \hat{x}_k(-) + P_k(-) H_k^{T} \left[ H_k P_k(-) H_k^{T} + R_k \right]^{-1} \left[ z_k - H_k \hat{x}_k(-) \right]. \tag{4.1.12} $$

From Eq , we define the optimal gain to be

$$ K_k = P_k(-) H_k^{T} \left[ H_k P_k(-) H_k^{T} + R_k \right]^{-1}. \tag{4.1.13} $$

Eq optimally combines the prior estimate history with measurement updates to minimize the total error contributions based on the expected state and measurement covariance. Much like the batch process method, the Kalman filter produces a solution that minimizes a quadratic cost function, Eq. 4.1, but it is also subject to the constraining dynamic equations given by $\hat{x}_k(-)$ and $P_k(-)$. However, looking at Eq there is a major advantage of the Kalman filter in its minimization of the cost function. For the augmented matrix $\tilde{H}_k$ to be overdetermined, we only require a single measurement. Thus, at a fundamental level the Kalman filter is formulated in such a way that it solves a least-squares, left pseudo-inverse problem regardless of the number of measurements taken. This gives us the freedom to minimize the number of exposures required to estimate the field to a precision adequate for suppressing the field to the target contrast level. While the form of the cost functions is the same, they are evaluating different criteria. Consequently, we cannot use the cost functions to directly compare their optimality. Instead we look to the only other metric of comparison, the covariance at each iteration. To update the state covariance estimate, $P_k(+)$, we continue to use the augmented matrices in

Eqs . Beginning with the expected value function shown in Table 4.1, we write

$$ P_k(+) = E\left[ (\hat{x}_k(+) - x_k)(\hat{x}_k(+) - x_k)^{T} \right] \tag{4.1.14} $$

$$ = E\left[ \left[ (\tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k)^{-1} \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{n}_k \right] \left[ (\tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k)^{-1} \tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{n}_k \right]^{T} \right] $$

$$ = \left[ (\tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k)^{-1} \tilde{H}_k^{T} \tilde{R}_k^{-1} \right] E[\tilde{n}_k \tilde{n}_k^{T}] \left[ (\tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k)^{-1} \tilde{H}_k^{T} \tilde{R}_k^{-1} \right]^{T} $$

$$ = (\tilde{H}_k^{T} \tilde{R}_k^{-1} \tilde{H}_k)^{-1} $$

$$ = \left[ P_k(-)^{-1} + H_k^{T} R_k^{-1} H_k \right]^{-1}. \tag{4.1.15} $$

For comparison, we evaluate the covariance of a weighted form of the batch process method described in Ch. 3, which is

$$ P = E\left[ (x - \hat{x})(x - \hat{x})^{T} \right] \tag{4.1.16} $$

$$ = E\left[ \left( (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} n \right) \left( (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} n \right)^{T} \right] $$

$$ = \left( (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} \right) E[n n^{T}] \left( (H^{T} R^{-1} H)^{-1} H^{T} R^{-1} \right)^{T} $$

$$ = \left( H^{T} R^{-1} H \right)^{-1}. \tag{4.1.17} $$

As shown in Eq , the state covariance of the batch process method resets after every control step, and is tied to the noise in that particular set of measurements. However, the covariance of the Kalman filter is also a function of the prior state covariance. Looking at Eq , $H_k^{T} R_k^{-1} H_k$ is guaranteed to be positive semidefinite. Thus additional measurements taken at each iteration will act to reduce the magnitude of the covariance, since adding a positive semidefinite term before the inversion can only make the result smaller. We can use the contrast normalization for the measurements, I, to get an idea of the estimator's robustness. Looking ahead to §6.1, we see that I is a function of exposure time. Thus if we do not take a long enough exposure in the probe images $R_k$ will become quite large, indicating a poor signal-to-noise ratio. This exposure time is based on the detection limit for a given laser power, as discussed in §6.1. For the 2 mW laser power used in the

monochromatic experiments, this is on the order of 1 ms to detect contrast levels. In the broadband experiments, the power levels are on the order of a microwatt for any given wavelength which means we require exposure times on the order of 1 s of seconds. In the batch-process estimator, we are stuck with these measurements and will receive an estimate with large covariance. In this case the control will not be effective, which is why we often see jumps in contrast when using this estimator once we reach low contrast levels. In the case of the Kalman filter, this high covariance is dampened by the contribution of prior covariance estimates via $P_k(-)$, stabilizing the state estimate and its covariance in the event that we take a bad measurement. Since we cannot guarantee that a probe will provide good signal, particularly at low contrast levels, this is an extremely attractive component of the Kalman filter estimator.

To complete the filter we need to propagate the prior estimate, $\hat{x}_{k-1}(+)$, to the current time step. The filter extrapolates to the current state estimate, $\hat{x}_k(-)$, by applying a time update to the prior state estimate via the state transition matrix, $\Phi_{k-1}$, and numerically propagating the control output from stroke minimization at the prior iteration, $u_{k-1}$, via a linear transformation described by $\Gamma_{k-1}$. We also have a disturbance from the process noise, $w_{k-1}$, which is propagated to the current state of the electric field via the linear transformation $\Lambda_{k-1}$. Assuming these components are additive, the state estimate extrapolation is

$$ \hat{x}_k(-) = \Phi_{k-1} \hat{x}_{k-1}(+) + \Gamma_{k-1} u_{k-1} + \Lambda_{k-1} w_{k-1}. \tag{4.1.18} $$

We will apply the linearized optical model used to develop the batch process estimation method and both control algorithms described in Ch. 2 and Ch. 3. Using a linearized model avoids generating arbitrary bias in the estimate at each pixel, a common problem with a nonlinear filter [21].
The first step in propagating the state forward in time is to update any dynamic variation between the discrete time steps with the state transition matrix, $\Phi_{k-1}$. In this system, $\Phi_{k-1}$ captures any variation of the field due to temperature fluctuations,

vibration, or air turbulence that perturb the optical system. To simplify the model, we recognize that there is no reliable way to measure or approximate small changes in the optical system over time with alternate sensors; we assume that the state remains constant between control steps, making the state transition matrix $\Phi_{k-1} = \Phi = I$. The process noise is any disturbance input into the system. The dominant contributor to this will be errors in our expectation of the DM actuation, which will be discussed in greater detail following this derivation. For the time being, we make the standard assumption that the process noise is zero-mean Gaussian white noise, so its expected value is zero. Thus, the expected value of the state when we extrapolate is

$$ \hat{x}_k(-) = \Phi_{k-1} \hat{x}_{k-1}(+) + \Gamma_{k-1} u_{k-1}. \tag{4.1.19} $$

The covariance of the process noise will be handled in the covariance extrapolation. For the same reason that we may treat $\Phi$ as a constant matrix, the optical system is assumed to be stable enough that the linearized propagation of the control $u_{k-1}$ is constant, making $\Gamma_{k-1} = \Gamma$. This matrix must map the control effect of the DM actuators to the image plane electric field, but we need to sort it such that every pair of rows in $\Gamma u$ is the real and imaginary parts of a particular pixel. To begin, we look to the control effect matrices produced in Ch. 2. Recalling Eq , we can produce a vector of complex values via

$$ Gu = C\{iA\phi\}. \tag{4.1.20} $$

To produce $\Gamma$, we simply need to sort the control effect into the real and imaginary parts per pixel. We have to take the real and imaginary parts pixel by pixel so that each block element of the matrix forms as

$$ \begin{bmatrix} \Re\{G\}_{n,:} \\ \Im\{G\}_{n,:} \end{bmatrix} = \begin{bmatrix} \Re\{G_{DM1}\}_{n,:} & \Re\{G_{DM2}\}_{n,:} \\ \Im\{G_{DM1}\}_{n,:} & \Im\{G_{DM2}\}_{n,:} \end{bmatrix}, \tag{4.1.21} $$

where $(n,:)$ indicates that we have taken all columns of the $n$th row in $G$. This block, $G_{n,:}$, maps the effect of every DM actuator onto the $n$th pixel. We must reorganize the control matrix in this manner for the sake of $H_k$. If we were to reorganize the state and control vectors so that the real and imaginary components were stacked such that $x = [\Re\{C\{Ag\}\}\ \ \Im\{C\{Ag\}\}]^{T}$, $H_k$ would be arranged in a sparse form rather than as a block diagonal matrix. Thus, each submatrix for $\Gamma$ shown in Table 4.1 is of dimension $2 \times 2N_{DM}$ and represents the control effect on a single pixel. Following Stengel [68], we use the state transition matrix to propagate the prior covariance estimate forward. Applying an additive term for the process noise, $Q_{k-1}$, the extrapolated covariance estimate is

$$ P_k(-) = \Phi_{k-1} P_{k-1}(+) \Phi_{k-1}^{T} + Q_{k-1}, \tag{4.1.22} $$

where

$$ Q_k = E[w_k w_k^{T}]. \tag{4.1.23} $$

The details of how we formulate $Q_k$ and the sensor noise, $R_k$, will be addressed in more detail in §4.2. Combining Eq , Eq , and Eq with the extrapolation equations used in the cost function, we have the discrete time Kalman filter. This form of the filter consists of five equations that describe the state estimate extrapolation, covariance estimate extrapolation, filter gain computation, state estimate update, and covariance estimate update at the $k$th iteration [68]:

$$ \hat{x}_k(-) = \Phi_{k-1} \hat{x}_{k-1}(+) + \Gamma_{k-1} u_{k-1} \tag{4.1.24} $$

$$ P_k(-) = \Phi_{k-1} P_{k-1}(+) \Phi_{k-1}^{T} + Q_{k-1} \tag{4.1.25} $$

$$ K_k = P_k(-) H_k^{T} \left[ H_k P_k(-) H_k^{T} + R_k \right]^{-1} \tag{4.1.26} $$

$$ \hat{x}_k(+) = \hat{x}_k(-) + K_k \left[ z_k - H_k \hat{x}_k(-) \right] \tag{4.1.27} $$

$$ P_k(+) = \left[ P_k(-)^{-1} + H_k^{T} R_k^{-1} H_k \right]^{-1} \tag{4.1.28} $$
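The five filter equations translate almost line for line into code. The sketch below applies them to a single pixel with $\Phi = I$; the function name `kalman_step`, the toy dimensions, and every numerical value are assumptions for illustration rather than the laboratory implementation.

```python
import numpy as np


def kalman_step(x_prev, P_prev, u_prev, z, H, Gamma, Q, R):
    """One estimation step for a single pixel, assuming Phi = I."""
    # State and covariance extrapolation (Eqs. 4.1.24 and 4.1.25).
    x_minus = x_prev + Gamma @ u_prev
    P_minus = P_prev + Q
    # Optimal gain (Eq. 4.1.26).
    S = H @ P_minus @ H.T + R
    K = P_minus @ H.T @ np.linalg.inv(S)
    # State and covariance update (Eqs. 4.1.27 and 4.1.28).
    x_plus = x_minus + K @ (z - H @ x_minus)
    P_plus = np.linalg.inv(np.linalg.inv(P_minus) + H.T @ np.linalg.inv(R) @ H)
    return x_plus, P_plus, P_minus


rng = np.random.default_rng(3)
n_act = 6                                    # toy stand-in for 2 * N_DM
Gamma = 1e-2 * rng.normal(size=(2, n_act))   # hypothetical per-pixel control effect
H = 4.0 * rng.normal(size=(2, 2))            # two probe pairs at this pixel
Q = 1e-6 * Gamma @ Gamma.T                   # process noise at the image plane
R = 1e-6 * np.eye(2)                         # sensor noise

x_prev = np.array([1e-2, -5e-3])
P_prev = 1e-4 * np.eye(2)
u_prev = rng.normal(size=n_act)
# Simulated truth: the model prediction plus a small unmodeled error.
x_true = x_prev + Gamma @ u_prev + np.array([1e-3, 1e-3])
z = H @ x_true + 1e-3 * rng.normal(size=2)

x_plus, P_plus, P_minus = kalman_step(x_prev, P_prev, u_prev, z, H, Gamma, Q, R)
```

Running the step pixel by pixel in this way mirrors the block diagonal structure of $H_k$ and avoids forming the full $(2 N_{pix}) \times (2 N_{pix})$ matrices.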

A fundamental property of the Kalman filter is that the optimal gain, Eq , is not based on measurements, but rather on estimates of the state covariance, $P_k(-)$, process noise from the actuation, $Q_{k-1}$, and sensor noise, $R_k$. This means that the optimality of the estimate is closely related to the accuracy and form of these matrices; this will be discussed at length in §4.2. The gain matrix, $K_k$, is ultimately what balances uncertainty in the prior state estimate against uncertainty in the measurements $z_k$ when computing the final state estimate update, $\hat{x}_k(+)$.

| Matrix | Dimension |
|---|---|
| $\Phi = I$ | $(2 N_{pix}) \times (2 N_{pix})$ |
| $\Gamma = \begin{bmatrix} \begin{bmatrix} \Re\{G_{DM1}\} & \Re\{G_{DM2}\} \\ \Im\{G_{DM1}\} & \Im\{G_{DM2}\} \end{bmatrix}_{1} \\ \vdots \\ \begin{bmatrix} \Re\{G_{DM1}\} & \Re\{G_{DM2}\} \\ \Im\{G_{DM1}\} & \Im\{G_{DM2}\} \end{bmatrix}_{n} \end{bmatrix}$ | $(2 N_{pix}) \times (2 N_{DM})$ |
| $\Lambda = \Gamma$ | $(2 N_{pix}) \times (2 N_{DM})$ |
| $P = E[(x - \hat{x})(x - \hat{x})^{T}]$ | $(2 N_{pix}) \times (2 N_{pix})$ |
| $Q_k = \Lambda E[w_k w_k^{T}] \Lambda^{T}$ | $(2 N_{pix}) \times (2 N_{pix})$ |
| $H_k = \mathrm{diag}\left( \begin{bmatrix} \Re\{G_{DM2}\phi_{k1}\} & \Im\{G_{DM2}\phi_{k1}\} \\ \vdots & \vdots \\ \Re\{G_{DM2}\phi_{kj}\} & \Im\{G_{DM2}\phi_{kj}\} \end{bmatrix}_{n} \right)$ | $(N_{pix} N_{pairs}) \times (2 N_{pix})$ |
| $R_k = E[n_k n_k^{T}]$ | $(N_{pix} N_{pairs}) \times (N_{pix} N_{pairs})$ |
| $K_k$ is computed | $(2 N_{pix}) \times (N_{pix} N_{pairs})$ |

Table 4.1: Definition of all filter matrices. $N_{DM}$ is the number of actuators on a single DM, $N_{pix}$ is the number of pixels in the area targeted for dark hole generation, and $N_{pairs}$ is the number of image pairs taken while applying positive and negative shapes to the deformable mirror.

With $H_k$, $z_k$, $\Gamma$, $\hat{x}$, and $u_k$ constructed, the dimension and form of the rest of the filter follows. Table 4.1 and Table 4.2 define all the matrices and vectors in the filter equations

for this problem and provide their dimensionality for clarity. The initialization of the covariance, P, is critical for the performance of the filter. In our system this cannot be measured, so we must initialize with a reasonable guess. We can use the final covariance matrix from a prior control attempt (to maximum achievable contrast) to initialize the filter in the future so that its form might be more accurate. We compute the process noise assuming a standard zero-mean variance on the actuation of the DMs, $w$. The sensor noise is determined statistically from the readout noise that the detector exhibits when taking dark frames. The focal plane measurements $z_k$ are identical to those of Ch. 3, and are constructed into a vertical stack of difference images taken in a pair-wise fashion to produce $j$ measurements for $n$ pixels. Likewise $H_k$ takes on a similar form, and is a matrix constructed from the effect of a specific deformable mirror shape $\phi_j$ on the real and imaginary parts of the electric field in the image plane. Finally, we compute the covariance update, $P_k(+)$, based on the added noise from the new measurements. The estimated state is a vertical stack of the real and imaginary parts of the electric field at each pixel of the dark hole in the image plane. The control signal $u$ is a vertical stack of the actuators of each DM, with DM1 being stacked on top of DM2. Since we are only considering process noise at the DMs, the process disturbance $w$ is a vertical stack of the variance expected from each actuator. Recalling Eq , $H_k$ is constructed by separating the real and imaginary parts of the DM probe field. Thus it will be underdetermined unless at least 2 pairs of images are used in the measurement, one of the major limitations of the batch process method in Ch. 3.
This will result in a non-unique solution to the state when using a batch process, and will only provide the solution with the smallest quadratic norm since it must be solved via the right pseudo-inverse. On the other hand, the Kalman filter only requires a single measurement as an update to the state. Therefore it isn't necessary for the matrix to be square or overdetermined, and we maintain a favorable dimensionality when updating the state.
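A small numerical example makes the contrast with the batch method concrete. In the sketch below (single pixel, one probe pair, all values invented), the observation matrix has rank one, so the batch normal equations cannot be inverted, yet the Kalman update remains well defined because the prior covariance supplies the missing information.

```python
import numpy as np

# One probe pair at one pixel: a single row for two unknowns.
H1 = 4.0 * np.array([[0.3, -0.2]])
R1 = np.array([[1e-4]])
z1 = H1 @ np.array([0.05, 0.02])     # noise-free measurement of an assumed field

# The batch problem is underdetermined: rank 1 for 2 unknowns.
rank_H = np.linalg.matrix_rank(H1)

# The Kalman update still works with an assumed prior.
P_minus = 0.01 * np.eye(2)
x_minus = np.zeros(2)
K = P_minus @ H1.T @ np.linalg.inv(H1 @ P_minus @ H1.T + R1)
x_plus = x_minus + K @ (z1 - H1 @ x_minus)
P_plus = np.linalg.inv(np.linalg.inv(P_minus) + H1.T @ np.linalg.inv(R1) @ H1)

# Direction the probe cannot see: orthogonal to the single row of H1.
unseen = np.array([0.2, 0.3]) / np.hypot(0.2, 0.3)
var_unseen = unseen @ P_plus @ unseen
```

The variance along the unobserved direction stays at its prior value, while the variance along the probed direction shrinks. A different probe at the next iteration would then reduce the remaining uncertainty.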

| Variable | Dimension |
|---|---|
| $z = \begin{bmatrix} \begin{bmatrix} I_1^{+} - I_1^{-} \\ \vdots \\ I_j^{+} - I_j^{-} \end{bmatrix}_{1} \\ \vdots \\ \begin{bmatrix} I_1^{+} - I_1^{-} \\ \vdots \\ I_j^{+} - I_j^{-} \end{bmatrix}_{n} \end{bmatrix}$ | $(N_{pix} N_{pairs}) \times 1$ |
| $\hat{x} = \begin{bmatrix} \Re\{E_1\} \\ \Im\{E_1\} \\ \vdots \\ \Re\{E_n\} \\ \Im\{E_n\} \end{bmatrix}$ | $(2 N_{pix}) \times 1$ |
| $u = \begin{bmatrix} DM1 \\ DM2 \end{bmatrix}$ | $(2 N_{DM}) \times 1$ |
| $w = \begin{bmatrix} \sigma_{DM1} \\ \sigma_{DM2} \end{bmatrix}$ | $(2 N_{DM}) \times 1$ |

Table 4.2: Definition of filter vectors. $N_{DM}$ is the number of actuators on a single DM, $N_{pix}$ is the number of pixels in the area targeted for dark hole generation, and $N_{pairs}$ is the number of image pairs taken while applying positive and negative shapes to the deformable mirror.

In the scenario of coronagraphic imaging, we are photon limited, which typically means exposure time is by far the limiting factor when estimating the field. However, there are a lot of mathematical operations involved with a Kalman filter. It is worth looking at the computation time required to compute the update since it will ultimately limit the speed with which we may estimate the field. The number of mathematical operations follows from the dimension of the matrices given in Table 4.1 and Table 4.2. Thus, the computation is directly dependent on the size of the dark hole, the number of actuators on the DMs, and the number of measurements taken at each step. For a fixed dark hole size and a set number of actuators, all we can do is attempt to minimize the number of measurement updates per iteration. Presumably we could bin the camera to reduce the number of pixels, $N_{pix}$, but

this will not benefit a true observatory since the image plane is Nyquist sampled (implying a loss of necessary spatial information if we bin the detector). The number of actuators is not a limiting factor in any space or ground telescope design to date, but this does bring to light a general control and estimation challenge for the next generation of extremely large telescopes (ELTs). AO studies for ELTs are investigating DMs with 4, actuators or more, meaning there will be a numerical challenge for any estimation and control scheme (even a conventional atmospheric AO scheme) [41]. In this case, we would likely have to come up with a way to reduce the dimension of the problem. However, in current observatory scenarios the highest available actuator count is limited to 4, actuators. In this case we will certainly be limited by exposure time even if we estimate the field over the entire controllable area of the DM (since exposure times suitable for high contrast detection are on the order of minutes to hours). In the laboratory experiment, with 2 mW of laser power using the Ripple3 coronagraph [5], the computation time for the estimation step is on the order of seconds, which is much faster than the field variability at the level. Since readout takes approximately .6 seconds on the Starlight Xpress SXV-M9 camera, exposure time is a significant fraction of the time required to estimate the field, even in the case of high laser power levels. Thus, the speed of the Kalman filter computation is not limiting our current achievable contrast levels.

4.2 Sensor and Process Noise

Two important design parameters for the performance of the filter are the process noise, $Q_{k-1}$, and the sensor noise, $R_k$. In order for the filter to operate optimally in the laboratory we must make reasonable assumptions for the values that exist in the laboratory.
Rather than running a simulation to find the most likely values, we appeal to physical scaling of the two largest known sources of error in the system. Our sensor noise will be the dark current and read noise inherent to the detector. Process noise will largely come from errors in the

actuation shape. Defining the control perturbation for the Kalman filter such that it is treated as an additive error allows us to estimate the value of the process noise $Q$ from physical arguments, since we do not have a way to measure it. For a disturbance, $w_k$, we define the covariance of the process noise at the actuators as

$$ Q'_k = E\left[ w_k w_k^{T} \right]. \tag{4.2.1} $$

Following the definitions for $Q$ in Stengel [68], the process noise at the image plane is

$$ Q_k = \Lambda Q'_k \Lambda^{T}, \tag{4.2.2} $$

where $\Lambda$ propagates the process noise to the image plane. To propagate process noise onto the image plane, $Q_k = \Lambda E[w w^{T}] \Lambda^{T}$, we assume $w$ is solely from the variance in DM actuation. This allows us to choose $\Lambda_{k-1} = \Lambda = \Gamma$. We can trace errors of the actuation shape to two sources. The first is the resolution of the digital-to-analog (D/A) converter, which is 14 bits over a 25 Volt range. This gives us a resolution of approximately .15 Volts, which corresponds to approximately .3 nm vertical resolution. This is much more precise than the surface knowledge, which is the true limitation. Poor knowledge of the surface comes from the inherent nonlinearity in the voltage-to-actuation gain as a function of voltage, the variance in this gain from actuator to actuator, and the accuracy of the superposition model used to construct the mirror surface that covers the 32x32 actuator array of the Boston Micromachines Kilo-DM. Physical models, such as those found in Blain et al. [9], have been constructed to produce a more accurate surface prediction over the full 1.5 µm stroke range with an rms error of 1 nanometers. Since we operate in a low actuation regime, superposition is still a relatively safe assumption, as evidenced by current laboratory success. The Kalman filter presents an elegant solution where we can treat actuation errors as additive process noise and include them in the estimator in a statistical fashion, rather than deterministically in a physical model. Since there is no physical reasoning to justify

varying $Q_k$ at each iteration, it will be kept constant throughout the entire control history ($Q_k = Q$ = constant). Two versions of the actuator-space covariance can be considered in this case. The first is where we simply have no correlation between actuators, giving a purely diagonal matrix with a magnitude corresponding to the actuation variance, $\sigma_u^{2}$. Note that while the actuator-space covariance is diagonal, the process noise at the image plane, $Q$, will not be diagonal because of the linear transformation to propagate this variance to the image plane via $\Gamma$. The second version which we may consider is one that has symmetric off-diagonal elements. This will treat uncertainty due to inter-actuator coupling and errors in the superposition model statistically. As a first step, we will not consider inter-actuator coupling to help avoid a poorly conditioned matrix. This helps guarantee that the Kalman filter itself will be well behaved. Thus the process noise for the filter will be

$$ Q = \sigma_u^{2}\, \Gamma I \Gamma^{T}. \tag{4.2.3} $$

Following Howell [35], the noise from both the incident light and dark noise in a CCD detector follows a Poisson distribution. This will lead to an estimator that is not truly optimal for short exposures, but the Poisson distribution will more closely approximate a Gaussian distribution with non-zero mean for an adequately long exposure. We could subtract a median dark frame, but differencing pairwise images allows us to construct a linear observer in matrix form. The noise in each measurement will have zero mean and will become more Gaussian as the exposure time increases. The statistics for the HCIL detector at relevant exposure times are shown in Fig . Here we simplify the noise statistics by assuming it is uncorrelated and constant from pixel to pixel, making $R_k$ a diagonal matrix of the mean pixel covariance in units of contrast.
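The construction of $\Gamma$ from the complex control effect matrices and the resulting process noise of Eq. 4.2.3 can be sketched in a few lines. The matrices `G1` and `G2` below are random stand-ins for the control effect of each DM, and the sizes and the value of `sigma_u` are arbitrary assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_act = 3, 5      # toy sizes: pixels in the dark hole, actuators per DM

# Hypothetical complex control effect matrices for each DM.
G1 = rng.normal(size=(n_pix, n_act)) + 1j * rng.normal(size=(n_pix, n_act))
G2 = rng.normal(size=(n_pix, n_act)) + 1j * rng.normal(size=(n_pix, n_act))
G = np.hstack([G1, G2])                    # shape (n_pix, 2 * n_act)

# Interleave rows so each pixel contributes a [real; imaginary] pair.
Gamma = np.empty((2 * n_pix, 2 * n_act))
Gamma[0::2] = G.real
Gamma[1::2] = G.imag

# Uncorrelated actuation errors with an assumed standard deviation sigma_u.
sigma_u = 1e-3
Q_act = sigma_u**2 * np.eye(2 * n_act)     # diagonal in actuator space
Q = Gamma @ Q_act @ Gamma.T                # Eq. 4.2.3, at the image plane

# Check the ordering: Gamma @ u stacks Re and Im of G @ u pixel by pixel.
u = rng.normal(size=2 * n_act)
field = G @ u
```

Although `Q_act` is diagonal, `Q` generally has off-diagonal terms because every actuator influences every pixel through `Gamma`.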
Since we do not have the ability to measure this variance actively, the assumption is that the CCD and laser source have been thermally stabilized, which will keep the variance constant

over time, thus defining the sensor noise as

$$ R = \sigma_{CCD}\, I_{N_{pairs} \times N_{pairs}}. \tag{4.2.4} $$

Having appealed to physical scaling in the HCIL, we now have close approximations of the true process and sensor noise exhibited in the experiment. With an appropriate, nonzero, initialization of the covariance we will be able to produce an effective optimal gain that will leverage the data at each iteration as much as possible to produce a new state estimate update.

[Figure 4.2 panels: (a) 5 ms Dark, (b) 2 ms Dark. Each panel is a histogram titled "Distribution of Dark Frame" showing Occurrence versus Counts.]

Figure 4.2: Average counts across the detector for a dark frame. At short exposure times this follows a Poisson distribution. At much longer exposure times it should be well approximated by a Gaussian distribution, but is still slightly Poisson at 2 ms.

4.3 Iterative Kalman Filter

An additional advantage of the Kalman filter is that we may apply the filter iteratively, feeding the newly computed state $\hat{x}_k(+)$ and covariance update $P_k(+)$ back into the filter again, setting $u_{k-1}$ to zero. For sufficiently small control this will help account for nonlinearity in the actuation and better filter noise in the system, limited only by the accuracy of

the observation matrix, $H_k$. With no control update, the control signal will be set to zero when we iterate the filter. Following a notation similar to Gelb et al. [21], the $j$th iteration of feedback into the iterative Kalman filter at the $k$th control step is

$$\hat{x}_{j,k}(-) = \hat{x}_{j-1,k-1}(+) \qquad (4.3.1)$$

$$P_{j,k}(-) = \Phi_{j-1,k-1} P_{j-1,k-1}(+) \Phi_{j-1,k-1}^T + Q_{j-1,k-1} \qquad (4.3.2)$$

$$K_{j,k} = P_{j,k}(-) H_{j,k}^T \left[ H_{j,k} P_{j,k}(-) H_{j,k}^T + R_{j,k} \right]^{-1} \qquad (4.3.3)$$

$$\hat{x}_{j,k}(+) = \hat{x}_{j,k}(-) + K_{j,k} \left[ z_{j,k} - H_{j,k} \hat{x}_{j,k}(-) \right] \qquad (4.3.4)$$

$$P_{j,k}(+) = \left[ P_{j,k}(-)^{-1} + H_{j,k}^T R_{j,k}^{-1} H_{j,k} \right]^{-1}. \qquad (4.3.5)$$

The power of iterating the filter lies in what we are fundamentally trying to achieve. For a successful control signal, we will have suppressed the field. This means that the magnitude of the probe signal will be lower than the control perturbation, which guarantees that $H_k$ will better satisfy the linearity condition than $\Gamma u$. As a result, if we iterate the filter on itself during a given control step we can use the discrepancy between the image predicted by $H_k \hat{x}_k(+)$ and the measurements, $z_k$, in Eq. 4.3.4 to filter out any error due to nonlinear terms not accounted for in $\Gamma$. In this way, we can accommodate a small amount of nonlinearity in the extrapolation of the state without having to resort to a nonlinear, or extended, Kalman filter. This means that we do not have to re-linearize about $\hat{x}_k(+)$, as would be the case for an iterative extended Kalman filter (IEKF). It also avoids having to concern ourselves with any bias introduced into the estimate by a nonlinear filter.

4.4 Optimal Probes: Using the Control Signal

In §3.4 we discussed the choice of probe shapes to create a well posed problem. In principle, we have found shapes that adequately probe the field by perturbing the field as uniformly as possible. However, nobody has ever looked deeply into the true merit of these functions or

how to choose the best shapes to probe the dark hole. In any dark hole there are discrete aberrations that are much brighter than others, requiring that we apply more amplitude to those spatial frequencies. Conversely, the bright speckles raise the amplitude of the probe shape, which is then too bright to take a good measurement of dimmer speckles. Excluding this issue, we also cannot truly generate the analytical functions described in §3.4. Even the DM with the highest actuator density available, the Boston Micromachines 4K-DM, can only approximate each function with 64 actuators. We account for the true shape in the model, but this shape does not truly probe each pixel in the dark hole with equal weight, which was the primary advantage of the analytical function for a probe shape in the first place.

Fortunately, we can once again appeal to the mathematical model for estimation and control to help determine an adequate probe shape. Once one estimate has been provided, the control law determines a shape to suppress all the speckles in the dark hole. Since it has suppressed the field, this control shape necessarily probes the aberrated field in the dark hole. If we apply the conjugate of the control shape we will increase the energy of the aberrated field. Thus, we can rely on the controller to compute shapes that optimally probe the aberrated field and automatically choose amplitudes appropriate for the intensity at each pixel in the dark hole.

4.5 Chapter Assumptions

4.1 Filter Construction: The linear form of the filter derived here requires the linearized models of the image plane electric field developed in Ch. 1 and Ch. 2. The merit of this linearization over an extended Kalman filter and our ability to accommodate any nonlinearities via filter recursion is discussed in this section.

4.2 Sensor and Process Noise:

All process noise is limited to uncertainty in the DM actuation, and does not account for inter-actuator coupling. All sensor noise is limited to the dark current of the CCD. The noise has zero mean, and over long exposures the Poisson distribution describing this noise will closely approximate white Gaussian noise.
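The process noise of Eq. 4.2.3 and the iterated update of Eqs. 4.3.1 to 4.3.5 can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the HCIL implementation: the state is a two-element vector, only one image pair is used so that $H$ is a single row and $R$ a scalar, $\Phi = I$ because the control is zero on each pass, and the same $Q$ is added on every pass as Eq. 4.3.2 is written. The Jacobian `gamma`, the noise levels and all function names are hypothetical.

```python
def process_noise(gamma, sigma_u):
    """Q = sigma_u^2 * Gamma * Gamma^T for a 2 x n_act matrix Gamma (Eq. 4.2.3)."""
    return [[sigma_u ** 2 * sum(a * b for a, b in zip(row_i, row_j))
             for row_j in gamma] for row_i in gamma]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kalman_step(x, P, Q, H, z, R):
    """One pass of Eqs. 4.3.1-4.3.5 with Phi = I and a scalar measurement."""
    # Time update (4.3.1, 4.3.2): zero control, so the state is carried over.
    x_minus = list(x)
    P_minus = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    # Gain (4.3.3): the bracketed term is a scalar for a single image pair.
    PHt = [sum(P_minus[i][j] * H[j] for j in range(2)) for i in range(2)]
    S = sum(H[i] * PHt[i] for i in range(2)) + R
    K = [PHt[i] / S for i in range(2)]
    # State update (4.3.4).
    residual = z - sum(H[i] * x_minus[i] for i in range(2))
    x_plus = [x_minus[i] + K[i] * residual for i in range(2)]
    # Covariance update in information form (4.3.5).
    P_inv = inv2(P_minus)
    info = [[P_inv[i][j] + H[i] * H[j] / R for j in range(2)] for i in range(2)]
    return x_plus, inv2(info)

def iterate_filter(x, P, Q, H, z, R, n_iter):
    """Feed the updated state and covariance back into the filter n_iter times."""
    for _ in range(n_iter):
        x, P = kalman_step(x, P, Q, H, z, R)
    return x, P

# Hypothetical 2 x 3 Jacobian: a diagonal actuator covariance still gives
# off-diagonal terms in Q once it is propagated through Gamma.
gamma = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
Q = process_noise(gamma, sigma_u=0.1)
print("Q =", Q)

H, z, R = [1.0, 0.5], 1.0, 0.01
for n in (1, 2, 3):
    x, P = iterate_filter([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], Q, H, z, R, n)
    print(n, "pass(es): residual =", z - (H[0] * x[0] + H[1] * x[1]))
```

In this linear toy problem each extra pass shrinks the measurement residual by the factor $R / (H P(-) H^T + R)$. It does not reproduce the nonlinear actuation errors that motivate the iteration in the experiment.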

Chapter 5

Laboratory Results

In Chapters 2, 3, and 4 we developed a number of estimation and control algorithms that make up the focal plane wavefront correction algorithm designed to recover regions of high contrast in finite areas of the image plane. In this chapter we present the results of experiments testing these correction algorithms at the Princeton HCIL and their ability to produce symmetric dark holes in both monochromatic and broadband light. Since the purpose is to develop and test the performance of the controller itself, we do not apply any post-processing techniques to remove so-called incoherent components of the electric field [33]. This is not done because it is possible that model error will look like incoherent light and will be falsely subtracted, adding uncertainty to our performance. More importantly, our purpose is to test the performance of the correction algorithms, which is different from testing our ability to detect a planet. The values reported in this thesis demonstrate a situational performance during an observation, rather than relying on a post-processing technique to achieve a higher contrast level.

5.1 Monochromatic Performance

To begin, we demonstrate the monochromatic performance of the Princeton HCIL using the stroke minimization correction algorithm derived in §2.1. This allows us to compare

the performance of the DM Diversity and Kalman filter estimators in the simplest scenario, where we do not have to consider the effect of bandwidth on their performance. Overall, we demonstrate our ultimate achievable contrast and the ability of the Kalman filter to suppress the field more efficiently by requiring fewer estimation exposures.

5.1.1 DM Diversity Performance

As discussed in Ch. 3, the DM diversity estimator can produce a unique solution with as few as two measurements, each a difference image of conjugate DM shapes. This is the number of measurements used at the JPL HCIT for estimation at each iteration, but at the Princeton HCIL there is enough uncertainty in the system that we require a minimum of three measurements (6 images) to take advantage of the averaging effect of the left pseudoinverse. Practically, we find that the DM diversity estimator requires four measurements (8 images total) to reach our ultimate achievable contrast levels. This is the baseline image set for comparing to the performance of the Kalman filter in §5.1.2.

Figure 5.1: Experimental results of sequential DM correction using the DM-Diversity estimation algorithm. The dark hole is a square opening from λ/D on both sides of the image plane. (a) The aberrated image. (b) Contrast plot. (c) The corrected image. Image units are log(contrast).

The laboratory starts at an initial contrast of (Fig. 5.1(a)). Using the least-squares estimation technique it is capable of reaching an average contrast of in a (7-10) × (-2-2) λ/D region

within 30 iterations (Fig. 5.1(c)) on both sides of the image plane, a unique capability that is a result of the two deformable mirrors in the system. The size of the dark hole is limited by our certainty in the DM shape. As we increase the outer working angle of the dark hole, we need better certainty of the DM at higher spatial frequencies to maintain the same contrast level. This is compounded by the fact that we have two DMs in series, but this enables us to create symmetric dark holes in the image plane.

5.1.2 Kalman Filter Performance

In this section we correct the field using the Kalman filter from Ch. 4 for estimation, using four, three, two, and one pair of images as a measurement update to assess the degradation in performance as information is lost. We begin with four measurements (four image pairs), to compare its performance using the same number of measurements as the batch process estimator used to produce the results of §5.1.1. Using 4 pairs, the filter achieved a contrast of in (7-10) × (-2-2) λ/D symmetric dark holes within 20 iterations of the controller, shown in Fig. 5.2. Note that this used a total of 160 estimation images, which is the same amount of information available to the batch process method in §5.1.1 when it achieved a contrast of in 20 iterations.

Figure 5.2: Experimental results of sequential DM correction using the discrete time extended Kalman filter with 4 image pairs to build the image plane measurement, $z_k$. The dark hole is a square opening from λ/D on both sides of the image plane. (a) The aberrated image. (b) Contrast plot. (c) The corrected image. Image units are log(contrast).
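The batch process (DM diversity) estimate that the Kalman filter is compared against is a least-squares solve through the left pseudo-inverse. The sketch below shows that solve for a single pixel, assuming a linear observation model $z = Hx$ with one row of $H$ per image pair and a two-element state; the actual observation matrix is defined in Ch. 3, and the probe rows, field values and perturbations used here are hypothetical.

```python
def left_pseudo_inverse_estimate(H, z):
    """Least-squares estimate x = (H^T H)^{-1} H^T z for an N x 2 matrix H."""
    # Entries of the 2x2 normal matrix H^T H and the vector H^T z.
    a = sum(h[0] * h[0] for h in H)
    b = sum(h[0] * h[1] for h in H)
    d = sum(h[1] * h[1] for h in H)
    g0 = sum(h[0] * zi for h, zi in zip(H, z))
    g1 = sum(h[1] * zi for h, zi in zip(H, z))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

# Hypothetical probe responses (one row per image pair) and true field.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_true = [0.3, -0.2]
noise = [0.03, -0.01, -0.02, 0.02]  # fixed, made-up measurement errors
z = [h[0] * x_true[0] + h[1] * x_true[1] + n for h, n in zip(H, noise)]

# Two pairs give a unique but unaveraged solution; four pairs are averaged.
print("2 pairs:", left_pseudo_inverse_estimate(H[:2], z[:2]))
print("4 pairs:", left_pseudo_inverse_estimate(H, z))
```

With only two rows the solve fits the two measurements exactly, so their errors pass straight into the estimate. Additional rows overdetermine the problem, which is the averaging effect the text attributes to the left pseudo-inverse.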

When the number of image pairs is reduced to three, the correction algorithm was still able to reach a contrast level of using only 120 estimation images, as shown in Fig. 5.3.

Figure 5.3: Experimental results of sequential DM correction using the discrete time extended Kalman filter with 3 image pairs to build the image plane measurement, $z_k$. The dark hole is a square opening from λ/D on both sides of the image plane. (a) The aberrated image. (b) Contrast plot. (c) The corrected image. Image units are log(contrast).

Having proven that we can successfully reach very close to the same limits with fewer exposures, we now tune the covariance initialization and noise matrices and attempt using only two pairs of images. By reducing the number of image pairs to two, we are using half as many images as the four-pair corrections shown above and have reached a point where the batch process method will no longer provide a solution that takes advantage of the averaging effect of the left pseudo-inverse solution. After further tuning the covariance and noise matrices, the contrast achieved after 30 iterations of the correction algorithm was , shown in Fig. 5.4. Note that this is better than the case which used three pairs because we have improved the covariance initialization and increased the number of times the filter is iterated in a single control step. In fact, it should be noted that making the filter iterative is critical to its performance since it accounts for nonlinearity, particularly in the propagation of the control.

Reducing the number of measurements to a single pair, we find a very interesting result. The quality of the measurement at any particular time step of the algorithm is now dependent on the quality of that particular probe shape. As a result, if the probe does not happen to

modulate the field well, the field estimate gets worse. It is also important to cycle through the probe shapes. A single probe may not modulate a specific location of the field well, so we must choose a different probe shape to guarantee that we adequately cover the entire dark hole.

Figure 5.4: Experimental results of sequential DM correction using the discrete time extended Kalman filter with 2 image pairs to build the image plane measurement, $z_k$. The dark hole is a square opening from λ/D on both sides of the image plane. (a) The aberrated image. (b) Contrast plot. (c) The corrected image. Image units are log(contrast).

Starting from an aberrated field with an average contrast of , Fig. 5.5(a), we achieved a contrast of in 30 iterations and in 43 iterations of control, Fig. 5.5(c). Looking at the contrast plot in Fig. 5.5(b), the sensitivity of a single measurement update to the quality of the probe is very clear. What is interesting, however, is that the modulation damps out over the control history. While we do not suppress as quickly in earlier iterations as in the case with more probes, we achieve our ultimate contrast levels in almost the exact same number of iterations. This is a direct result of developing good coverage across the dark hole over time by changing the probe shape at each iteration. Thus, even with one measurement update at each iteration, the prior state estimate history stabilizes the estimate in the presence of the measurement update's poor signal-to-noise at high contrast levels. What is further encouraging is that this is still applying the arbitrary probe shapes derived in Ch. 3. If we were to intelligently choose our probes, as discussed in Ch. 4, we would expect to see a dramatic improvement in the rate of convergence for a single measurement update.
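The exposure bookkeeping behind these comparisons is simple enough to write down. In the sketch below each measurement is one pair of images, as in §5.1.1 where three measurements are 6 images and four are 8; the helper name is hypothetical.

```python
def estimation_images(n_pairs, n_iterations):
    """Total estimation exposures: two images per pair, every iteration."""
    return 2 * n_pairs * n_iterations

# Images needed per control iteration for each case tested in this section.
for pairs in (4, 3, 2, 1):
    print(pairs, "pair(s):", estimation_images(pairs, 1), "images per iteration")

# A single pair over the 43 iterations of control quoted above.
print("single pair, 43 iterations:", estimation_images(1, 43), "images")
```

The per-iteration cost falls linearly with the number of pairs, so the single-pair case stays cheaper overall even when it needs more iterations to converge.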
A very promising aspect of this estimation scheme is that its performance did not degrade significantly as the amount of measurement data was reduced. With only 86 estimation images it was capable of reaching the same final contrast (within measurement uncertainty) achieved by the DM diversity algorithm in §5.1.1, which achieved a contrast of in 30 iterations. The batch process required 240 images to maintain an estimate of the entire control history, achieving a contrast of . Thus by making the estimation method more dependent on a model we were able to reduce our need to measure deterministic perturbations in the image plane electric field.

Figure 5.5: Experimental results of sequential DM correction using the discrete time extended Kalman filter with one image pair to build the image plane measurement, $z_k$. The dark hole is a square opening from λ/D on both sides of the image plane. (a) The aberrated image. (b) Contrast plot. (c) The corrected image. Image units are log(contrast).

5.2 Broadband Performance

To take spectra of any planets we discover in our dark hole, we need to extend the experimental results of §5.1 to broadband light so that the planet is detectable in more than one wavelength. In §2.4 we developed the Windowed Stroke Minimization algorithm with estimate extrapolation to accomplish this task. Here we show the results of these experiments, and point to an interesting laboratory limitation that required an upgrade of the optical fiber used as the point source in the experiment. The results in §5.2.1 are prior to this upgrade, and are included for the sake of comparison. §5.2.2 shows the most current results from the

Princeton HCIL producing symmetric dark holes in a targeted 10% band around λ = 633 nm. In all cases, the results we present are in a dark hole region with dimension λ/D. The contrast measurement is pinned to the central wavelength so that we can pin the performance to a fixed sky angle, $\alpha$, defined as $\alpha = \tan^{-1}(n\lambda/D)$. In a 10% band the physical shift is less than one pixel at the HCIL, and the controller corrects an area of λ/D. In this way we do not have areas that the controller has not corrected leaking light into the dark hole, skewing our measurement of the controller performance.

5.2.1 Prior to Single Mode Photonic Crystal Fiber

In the first broadband experiments at the HCIL, the output fiber shown in Fig. 1.5 was simply a 633 nm single mode fiber. The correction was performed at 620 rather than 633 nm as in the other experiments, partially because of filter availability. In this experiment we have tested the performance of the Windowed Stroke Minimization algorithm of §2.4 over a 10% bandwidth. The estimates for the filters bounding the 10% target bandwidth are computed using the estimate extrapolation technique developed in §2.5. Starting at an average contrast of (Fig. 5.6(c)) over the five filters spanning our 10% bandwidth (600, 620, 633, 640, 650 nm), Fig. 5.6(d) shows an average contrast of when we use the filter extrapolation technique. Note that while the central wavelength of the 650 nm filter does not exactly reach 5% above our central wavelength, it has a relatively wide bandwidth that reaches 558 nm at its FWHM. Starting at a contrast level of over the full bandwidth, the controller reached a contrast limit of . Looking at the wavelength performance, we see that even the central wavelength is not suppressed particularly well. The dark holes exhibit a good average contrast, but there is a lot of variance within them and their edges are not well defined.
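The reason for pinning the measurement to a fixed sky angle can be made concrete with the definition $\alpha = \tan^{-1}(n\lambda/D)$ given above. The sketch below evaluates that angle and shows where a fixed sky angle falls, in units of λ/D, at another wavelength in the band. The aperture `D`, the edge location `n` and the 5% offset are hypothetical illustration values, not HCIL parameters, and the second helper uses the small-angle approximation.

```python
import math

def sky_angle(n, wavelength, aperture):
    """alpha = atan(n * lambda / D), in radians."""
    return math.atan(n * wavelength / aperture)

def position_in_lambda_over_d(n_central, central_wavelength, wavelength):
    """Location of the fixed sky angle n_central * lambda_0 / D, expressed in
    units of lambda / D at another wavelength (small-angle approximation)."""
    return n_central * central_wavelength / wavelength

lambda_0 = 633e-9   # central wavelength from the text, in metres
D = 0.01            # hypothetical aperture diameter, in metres
n = 7.0             # hypothetical dark hole edge, in lambda_0 / D

print("sky angle [rad]:", sky_angle(n, lambda_0, D))
# The same sky angle sits at a smaller multiple of lambda/D at a longer wavelength.
print("edge at +5% wavelength:", position_in_lambda_over_d(n, lambda_0, 1.05 * lambda_0))
```

Because a fixed sky angle corresponds to a different multiple of λ/D at each wavelength, quoting contrast at the central wavelength keeps the evaluated region tied to one location on the sky.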
Additionally, we see a rapid degradation of the dark hole field as a function of wavelength, to the point where it is virtually indistinguishable when we reach the bounding wavelengths in the optimization at 600 and 650 nm. Compared to a typical monochromatic experiment, these images depict an abnormally high amount of structure in the dark hole and appear to be highly sensitive to variance in low to mid-spatial frequency aberrations. The chromatic dependence of these errors, particularly at the shorter wavelengths, indicates that the 633 nm single mode fiber is inadequate for the broadband experiments. This is a result of the multimode output (primarily TEM01 and TEM10 modes) at shorter wavelengths and the high degree of attenuation at longer wavelengths. We chose to reproduce the results in this section after upgrading the fiber in the laboratory.

Figure 5.6: Pre-PCSM Extrapolated results. (a) Pre-PCSM Extrapolation Results (average, left, right, and initial contrast versus λ). (b) Pre-PCSM Full Bandwidth Contrast. (c) Pre-PCSM 10% Mean Initial. (d) Pre-PCSM 10% Mean Contrast. Image axes are in λ/D and image units are log10(contrast).

5.2.2 Photonic Crystal Single Mode Fiber Upgrade

Given the non-single mode nature of the output beam at shorter wavelengths (and our sensitivity to such aberrations), the poor coupling efficiency, and the high attenuation of the 633 nm single mode fiber at longer wavelengths, we chose to upgrade the fiber delivery to a Koheras Photonic Crystal continuously Single Mode (PCSM) fiber. We chose a fiber option that fully spanned the bandwidth we operate over, with a 5 µm core. This provided the smallest mode field diameter available, ± 0.5 µm, providing a numerical aperture (NA) of 0.1-0.14 across the visible spectrum (NA being the sine of the divergence half-angle). This is comparable to the µm mode field diameter and NA 0.1-0.14 of a single mode 633 nm fiber between 633 and 680 nm. Overall, the PCSM fiber has a lower level of attenuation, is continuously single mode, and roughly matches the beam divergence angle expected from the original fiber, which we have found in the past to well approximate a point source. Since the field from a star is effectively planar, our ability to provide single-mode light at all wavelengths allows us to more accurately demonstrate the controller under conditions true to a real observation.

Fig. 5.8 shows the overall results of applying the same extrapolation technique after the new fiber had been installed. Fig. 5.8(a) shows marked contrast improvement at all wavelengths, the out of band wavelengths improving on the order of 3 1. As we would have expected, the shorter wavelengths improved more than the longer wavelengths because their output no longer contains higher TEM modes. Comparing Fig. 5.8(b) to Fig. 5.6(b) we also see that we have a slight improvement in the inner working angle of the dark hole, which is consistent with the fact that we eliminated very low order modes, such as TEM01 and TEM10, by upgrading to the new fiber. Very little of the energy in Fig. 5.6(d) is (intentionally) below the cutoff wavelength for single mode output of the 633 nm SM fiber, which is why the IWA improvement is not as evident when comparing to Fig. 5.8(d).

Looking at the progression of the final dark hole in wavelength, Fig. 5.9, we see that the central wavelength is deeply suppressed while the intensity of the dark hole rises rapidly away from it. For the filters inside the 10% optimization bandwidth (600, 620, 640, and 650 nm) we see that the contrast degradation is a result of small scale aberrations growing in intensity. Outside of these wavelengths the dark hole degrades rapidly, to the point that it is not distinguishable in the 550 and 740 nm images. While the average contrast does degrade from the slight shift in the dark hole location, it is also due to speckles within the dark hole increasing in intensity. This indicates that we are somewhat limited by the accuracy of our extrapolation, which tends to introduce fine structure into the dark hole. Note, however, that when comparing Fig. 5.9(e) with Fig. 5.7(d) we see that the fine structure at the central wavelength is gone. This is entirely due to the fiber upgrade, since no other modifications were made to the experiment.

The accuracy of the functional relationship of the phase and amplitude among wavelengths will ultimately bound the achievable bandwidth; therefore, as a metric, these results are also compared to estimating each wavelength separately. As discussed in §2.5, improving this functional relationship requires that we establish a higher order relationship of the electric field that captures more of the system model. For the time being, we compare the performance of the simplest (and fastest) extrapolation technique we may physically motivate to multiple estimates, which will be slower but presumably more accurate at longer wavelengths. Fig. 5.10 shows the overall performance of multiple estimates vs. single estimates. When estimating each wavelength separately the contrast reaches in a 10% band (Fig. 5.10(d)) and over the full bandwidth (Fig. 5.10(b)). There is no improvement compared to the contrast achieved in the 10% band and contrast over the full spectrum using the estimate extrapolation technique.

Shaklan et al. [65] show that the ultimate achievable contrast is a function of the correction bandwidth. They show that this limitation is from propagation induced amplitude distributions in the field from surface figure errors on the optics, and the fact that we have a finite controllable bandwidth using two DMs in series (or a Michelson configuration). If we assume that the DM surfaces (Fig. 1.1) are the worst figures in our system and apply this to the derivation in Shaklan et al. [65], the HCIL optical system should be capable of reaching at least over a 2% bandwidth, indicating that both methods are well above the fundamental limitations of this optical system (Figs. 5.10(b), 5.10(a)) and that these results are largely limited by higher sensitivity to estimation error and system stability. Comparing the contrast as a function of wavelength in Fig. 5.8(a) and Fig. 5.10(a), the bandwidth has been suppressed much more uniformly when multiple estimates are used in lieu of the extrapolation technique. Since the bounding wavelengths were only slightly underweighted in the optimization (µ = 0.75) we expected a relatively uniform suppression as in Fig. 5.10(a). This indicates that the accuracy of the extrapolation was the limiting factor in allowing the controller to evenly suppress the

bandwidth. However, the ultimate contrast of the central wavelength is not nearly as low in the direct estimate as it was when applying the estimate extrapolation. Comparing with the dark hole at the central wavelength using estimate extrapolation (Fig. 5.9(e)), we see that the dark hole using direct measurements (Fig. 5.11(e)) exhibits much more residual structure. However, Figs. 5.11(c) to 5.11(g) show that the region bounding the corrected area persists better than the dark hole in the extrapolation case, Figs. 5.9(c) to 5.9(g). Since both reached roughly the same average contrast in the 10% band, we may have fundamentally bottomed out the achievable contrast for symmetric dark holes (at the HCIL) over that bandwidth. In other words, we can either have all five filters at a modest contrast level or we can have one wavelength highly suppressed at the cost of worse contrast in the others. This would be related to the inherent chromaticity of our pupil from highly aberrated, non-conjugate planes, and could be beyond the effective bandwidth achievable using only two DMs in series.

However, another distinct possibility is that we have reached a stability limit in the experimental setup. Since we required three individual estimates to achieve the results shown in Fig. 5.10, the estimation step took roughly three times longer than in the extrapolation case. The low power of the filtered broadband light requires exposure times of 40 seconds. With 8 exposures required per estimate using the batch process estimator, the estimation step went from 5 to 15 minutes per iteration. As will be shown in §6.5, the system is only stable to over such a long period (independent of power fluctuations). Thus, the extrapolation method reached the limit of system variance over a 5 minute interval at the central wavelength, but at the cost of less accurate estimates over the bandwidth due to an inaccurate extrapolation.
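The timing argument can be reproduced with a line of arithmetic. The sketch below takes the exposure time as 40 s, an assumption chosen because it reproduces the roughly five minutes quoted for a single estimate, and it ignores camera readout and DM settling time. The helper name is hypothetical.

```python
def estimation_minutes(n_estimates, exposures_per_estimate, exposure_seconds):
    """Time spent exposing during the estimation step of one control iteration."""
    return n_estimates * exposures_per_estimate * exposure_seconds / 60.0

# One extrapolated estimate versus three separate estimates, 8 exposures each.
print("extrapolation:", estimation_minutes(1, 8, 40.0), "min per iteration")
print("three estimates:", estimation_minutes(3, 8, 40.0), "min per iteration")
```

The factor of three in estimation time is what exposes the direct-estimate experiment to slower drifts in the laboratory.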
On the other hand, the longer time frame required to take multiple estimates meant that we compromised the stability of the experiment, but we were able to more evenly suppress the field over the bandwidth. As a result, we cannot prove that we have reached a fundamental limit in the laboratory without getting more laser power or improving system stability. The sensitivity of the correction algorithm to laboratory stability demonstrates the power of the extrapolation

technique. To take full advantage of an observatory's stability, we clearly want to reduce the time required to produce estimates of the electric field over the optimization bandwidth. Furthermore, the advantage of establishing an augmented cost function and using extrapolated wavelengths is that it automatically extends the optimal estimator developed in Ch. 4 to broadband light, because this method only requires a single monochromatic estimate. It is therefore worthwhile to continue pursuing more accurate and sophisticated extrapolation techniques. The most promising direction we currently see is to augment the Kalman filter to include the extrapolation. This potentially allows us to produce estimates of multiple wavelengths using incomplete measurements at every wavelength. Thus, the estimator would not independently produce an estimate at each wavelength but would average uncertainty in the wavelength dependence across all three estimates.

5.3 Final Remarks

In this chapter we have demonstrated the Kalman filter estimator, the windowed stroke minimization algorithm, an estimate extrapolation technique, and the ability to create symmetric dark holes via two DMs in series. The Kalman filter was tested against the original DM-Diversity batch process method using the monochromatic Stroke Minimization algorithm developed by Pueyo et al. [55]. The experiments show that the Kalman filter's ability to optimally estimate the field, balancing prior state estimate feedback with new measurements, dramatically boosts the efficiency of the estimation stage and stabilizes the estimate at higher contrast levels. This efficiency is a function of both the number of exposures and the exposure time. As it stands, the HCIL suppresses three orders of magnitude, which is well within the dynamic range of our camera. As a result, exposure time does not affect the efficiency of the results shown in this section.
However, as we continue to higher contrast levels ($10^{4}$ for a 16 bit camera) the exposure time at each iteration will also contribute to the efficiency, since we will have to change the exposure time. In this case state estimate feedback will become even more important because the reliance on prior estimates will reduce our dependence on extremely long exposure times at very high contrast levels.

Overall, the results presented here show that the number of exposures required for estimation can be halved without sacrificing achievable contrast or dark hole area. Additionally, thanks largely to model improvements, the convergence rate to the ultimate achievable contrast has increased dramatically. The monochromatic wavefront suppression has matured greatly. The Windowed Stroke Minimization algorithm is the first controller written so that it explicitly suppresses a bandwidth, and the initial results are promising. The extrapolation technique allows us to further improve the efficiency of the estimation stage by removing the requirement that we obtain a field estimate for every wavelength in the optimization. The ultimate achievable contrast in a 10% band is a little more than one order of magnitude worse than the best performance demonstrated with the monochromatic algorithm, but this is within a factor of two of the original symmetric dark hole results using monochromatic light presented in Pueyo et al. [55]. There is still a great deal of work to be done to directly optimize the bandwidth and improve the estimate extrapolation, but given the simplicity of the physically motivated computations the broadband results shown above are very promising.
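As a rough check on the dynamic-range remark in the final remarks above, the sketch below counts the decades spanned by an ideal 16 bit detector. It is an upper bound only: bias offset, read noise and dark counts all reduce the usable range, and none of those HCIL camera properties are modelled here. The helper name is hypothetical.

```python
import math

def adc_decades(bits):
    """Orders of magnitude between one count and full scale for an ideal ADC."""
    return math.log10(2 ** bits - 1)

print("16 bit full scale:", 2 ** 16 - 1, "counts")
print("decades available:", round(adc_decades(16), 2))
```

Three orders of magnitude of suppression therefore fit inside a single exposure setting, while going much beyond four forces the exposure time to change between iterations.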

Figure 5.7: Pre-PCSM Extrapolate Individual Filters. Panels: (a) λ = 550 nm, (b) λ = 577 nm, (c) λ = 600 nm, (d) λ = 620 nm, (e) λ = 633 nm, (f) λ = 640 nm, (g) λ = 650 nm (8.541e-6), (h) λ = 670 nm, (i) λ = 694 nm, (j) λ = 740 nm. Image units are log10(contrast).

Figure 5.8: Extrapolated results. (a) Contrast Plot, δ1 = δ2 = 0.75 (average, left, right, and initial contrast versus λ). (b) Extrapolated Full Bandwidth Contrast. (c) Extrapolated 10% Mean Initial. (d) Extrapolated 10% Mean Contrast. Image axes are in λ/D and image units are log10(contrast).

Figure 5.9: Extrapolated Estimate Individual Filters. Panels: (a) λ = 550 nm, (b) λ = 577 nm, (c) λ = 600 nm, (d) λ = 620 nm, (e) λ = 633 nm, (f) λ = 640 nm, (g) λ = 650 nm, (h) λ = 670 nm (1.655e-5), (i) λ = 694 nm, (j) λ = 720 nm, (k) λ = 740 nm. Image units are log10(contrast).

Figure 5.10: Direct Estimate results. (a) Contrast Plot, Multiple Estimates, 10% Band (average, left, right, and initial contrast versus λ). (b) Direct Full Bandwidth Contrast. (c) Direct 10% Mean Initial. (d) Direct 10% Mean Contrast. Image axes are in λ/D and image units are log10(contrast).

Figure 5.11: Direct Estimate Individual Filters. Panels: (a) λ = 550 nm, (b) λ = 577 nm, (c) λ = 600 nm, (d) λ = 620 nm, (e) λ = 633 nm (4.46e-6), (f) λ = 640 nm, (g) λ = 650 nm (6.671e-6), (h) λ = 670 nm, (i) λ = 694 nm, (j) λ = 720 nm, (k) λ = 740 nm. Image units are log10(contrast).