Schedule for ExoStat2014
Second Workshop on Modern Statistical and Computational Methods for Analysis of Kepler Data
Hosted by The Department of Statistics at Carnegie Mellon University, Pittsburgh, PA, USA

The following rooms will be used for the workshop:
Location: Baker Hall off of Frew Street (Google maps: type Baker Hall CMU)
All talks and discussions: Baker A51 (Giant Eagle Auditorium)
Workrooms: Baker A50 (coffee lounge), A51 (Giant Eagle Auditorium), 154A, 154T

Wednesday, June 18, 2014
9:00-9:45 am Registration and Breakfast (Baker A50 - The Coffee Lounge)
9:45-10:00 am Welcome Session (Baker A51 - Giant Eagle Auditorium) Jessi Cisewski
10:00-10:25 am Toward Mitigating the Impact of Correlated Noise on Detection and Characterization of Kepler Candidates Rebekah Dawson, UC - Berkeley
10:25-10:50 am Exoplanet population inference and the rate of Earth analogs from noisy, incomplete catalogs Daniel Foreman-Mackey, NYU
10:50-11:15 am Coffee Break (Baker A50 - The Coffee Lounge)
11:15-11:40 am Measuring Stellar rotation periods with Gaussian Processes Ruth Angus, University of Oxford
11:40-12:05 pm Fitting stellar spectra with some help from Gaussian processes Ian Czekala, Harvard-Smithsonian CfA
12:05-12:30 pm Taming Spitzer's Systematics with UCF's POET Pipeline Joseph Harrington, University of Central Florida
12:30-2:00 pm Lunch (On your own)
2:00-2:30 pm Work time preparation (Baker A51 - Giant Eagle Auditorium)
2:30-5:30 pm Work time
5:30-6:00 pm Work time wrap up (Baker A51 - Giant Eagle Auditorium)
7:00 pm Group dinner at The Porch at Schenley (221 Schenley Dr, Pittsburgh, PA 15213) http://www.theporchatschenley.com

Thursday, June 19, 2014
9:00-9:30 am Breakfast (Baker A50 - The Coffee Lounge)
9:30-9:55 am Hierarchical Bayesian Analyses of the Planetary Mass-Radius Relation from Kepler Data Eric Ford, Penn State
9:55-10:20 am Inferring the Eccentricity Distribution of Short-Period Planet Candidates Detected by Kepler in Occultation Megan Shabram, Penn State
10:20-10:45 am Approximate Bayesian Computation for Exoplanets Jessi Cisewski, Carnegie Mellon University
10:45-11:10 am Coffee Break (Baker A50 - The Coffee Lounge)
11:10-11:35 am Hierarchical Bayesian Models applied to the Planet Mass-Radius-Flux Distribution Leslie Rogers, Caltech
11:35-12:00 pm Update on SysSim: Determining the Distribution of Exoplanetary Architectures Darin Ragozzine, Florida Institute of Technology
12:00-12:25 pm Open Source Cloud Computing for Transiting Planet Discovery Peter McCullough, STScI
12:25-2:00 pm Lunch (On your own)
2:00-2:30 pm Work time preparation (Baker A51 - Giant Eagle Auditorium)
2:30-5:30 pm Work time
5:30-6:00 pm Work time wrap up (Baker A51 - Giant Eagle Auditorium)

Friday, June 20, 2014
9:00-9:30 am Breakfast (Baker A50 - The Coffee Lounge)
9:30-9:55 am Simple-ABC: A Python package for Approximate Bayesian Computation Robert Morehead, Penn State
9:55-10:20 am How Rocky Are They? The Composition Distribution of Kepler's Sub-Neptune Planet Candidates within 0.15 AU Angie Wolfgang, UC - Santa Cruz
10:20-10:45 am How Low Can You Go? The Photoeccentric Effect for Planets of Various Sizes Ellen Price, Caltech
10:45-11:10 am Coffee Break (Baker A50 - The Coffee Lounge)
11:10-11:35 am Transit Timing Posteriors through Importance Sampling Benjamin Montet, Caltech/Harvard
11:35-12:00 pm Making Kepler more precise at the pixel level, New York University
12:00-1:30 pm Lunch (On your own)
1:30-2:00 pm Work time preparation (Baker A51 - Giant Eagle Auditorium)
2:00-5:00 pm Work time
5:00-5:30 pm Work time wrap up (Baker A51 - Giant Eagle Auditorium)

Saturday, June 21, 2014
9:00-9:30 am Breakfast (Baker A50 - The Coffee Lounge)
9:30-11:00 am Work time
11:00-11:20 am Coffee Break (Baker A50 - The Coffee Lounge)
11:20-12:30 pm Closing Session (Baker A51 - Giant Eagle Auditorium)
Abstracts

Wednesday, June 18, 2014

10:00-10:25 am Rebekah Dawson, UC - Berkeley
Toward Mitigating the Impact of Correlated Noise on Detection and Characterization of Kepler Candidates.
A realistic treatment of stellar and instrumental noise is essential for detecting planets near the signal-to-noise limit and for characterizing higher signal-to-noise transits, particularly in determining uncertainties in parameters. For example, correlated noise can masquerade as a transit (especially for long-period objects with many trials in orbital phase) and outshine true transits whose durations are well separated from the dominant noise scales. I compare the performance of three likelihood functions (a traditional χ² likelihood, the Carter and Winn wavelet likelihood that assumes 1/f noise, and a more flexible wavelet likelihood) in detecting and characterizing candidates from Kepler light curves. I also assess the effects of extra in-transit noise caused by the planet occulting star spots or other features on the stellar disk.

10:25-10:50 am Daniel Foreman-Mackey, NYU
Exoplanet population inference and the rate of Earth analogs from noisy, incomplete catalogs.
Many exoplanets have been discovered and characterized; it is now possible to study the population as a whole. Although no true extra-solar Earth analog is known, hundreds of planets have been found around Sun-like stars that are either Earth-sized but on shorter periods, or else on year-long orbits but somewhat larger. Under strong assumptions, these exoplanet catalogs have been used to make an extrapolated estimate of the rate at which Sun-like stars host Earth analogs. These studies are complicated by the fact that every catalog is censored by non-trivial selection effects and detection efficiencies, and every property (period, radius, etc.) is measured noisily.
We have developed a general hierarchical probabilistic framework for making justified inferences about the population of exoplanets, taking into account survey completeness and, for the first time, observational uncertainties. We are able to make fewer assumptions about the distribution than previous studies; we only require that the occurrence rate density be a smooth function of period and radius (employing a Gaussian process). By applying our method to synthetic catalogs, we demonstrate that it produces more accurate estimates of the whole population than standard procedures based on weighting by inverse detection efficiency. We apply the method to an existing catalog of small planet candidates around G stars (Petigura et al. 2013). We confirm a previous result that the radius distribution changes slope near Earth radius and we find that the rate density of Earth analogs is about 0.02 (per star per natural logarithmic bin in period and radius) with large uncertainty. This number is much smaller than previous estimates made with the same data but stronger assumptions.

11:15-11:40 am Ruth Angus, University of Oxford
Measuring Stellar rotation periods with Gaussian Processes.
With space-based missions like Kepler it has become possible to measure stellar rotation periods directly from light curves. Active regions on the surfaces of stars rotate in and out of view, producing periodic variations detectable in Kepler data. Unfortunately, none of the commonly used methods for measuring periodic signals is ideally suited to photometric stellar rotation periods. Periodogram and wavelet-based methods are not suited to quasi-periodic, non-sinusoidal signals and often detect harmonics or sub-harmonics of the true rotation period. The autocorrelation function (ACF) method (McQuillan et al. 2012) is able to distinguish a true period from its harmonics in most cases and is better adapted to quasi-periodic, non-sinusoidal signals.
General practice within the field is to use the ACF method, or some combination of ACF, periodogram and wavelets; however, one major drawback of all of the methods listed above is that uncertainties on rotation periods are difficult to quantify. In an ideal world, physically motivated models would be used to measure rotation periods. Unfortunately, the physics driving the production of variability in stellar light curves is poorly understood, and parameters such as spot lifetime and differential rotation (shear) are highly degenerate. In the absence of a well-motivated physical model, we opt to use a highly flexible semi-parametric model: a Gaussian process (GP) with a quasi-periodic covariance function. This GP model should be well suited to quasi-periodic, non-sinusoidal signals; it will be capable of modelling noise and physical signals simultaneously and will provide rotation period measurements with realistic uncertainties.
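To make the idea of a quasi-periodic covariance function concrete, here is a minimal sketch in Python. The abstract does not give the kernel, so the functional form below (a squared-exponential envelope multiplied by a sine-squared periodic term) is an assumption, chosen because it is a common way to write such a kernel. The function name and the parameters amplitude, decay_time, smoothness and period are hypothetical.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amplitude, decay_time, smoothness, period):
    """Covariance between times t1 and t2 under an ASSUMED quasi-periodic form.

    k(dt) = amplitude^2 * exp(-dt^2 / (2 * decay_time^2))
                        * exp(-smoothness * sin^2(pi * dt / period))
    """
    # Matrix of all pairwise time lags.
    dt = np.subtract.outer(np.asarray(t1, dtype=float), np.asarray(t2, dtype=float))
    # Slow decay: correlations fade as active regions evolve.
    envelope = np.exp(-dt**2 / (2.0 * decay_time**2))
    # Periodic term: points one period apart are strongly correlated.
    periodic = np.exp(-smoothness * np.sin(np.pi * dt / period) ** 2)
    return amplitude**2 * envelope * periodic

# Hypothetical example: 10-day rotation period, 30-day decay time.
times = np.linspace(0.0, 50.0, 101)  # days, spacing 0.5
cov = quasi_periodic_kernel(times, times, amplitude=1.0,
                            decay_time=30.0, smoothness=1.0, period=10.0)
cov_one_period = cov[0, 20]   # lag of 10 days (one full period)
cov_half_period = cov[0, 10]  # lag of 5 days (half a period)
```

With these illustrative values the covariance at a lag of one full period is larger than at half a period, which is the behaviour that lets such a model prefer the true rotation period while still allowing the signal to drift over time.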
11:40-12:05 pm Ian Czekala, Harvard-Smithsonian CfA
Fitting stellar spectra with some help from Gaussian processes.
We present a novel approach for making accurate and unbiased inferences of fundamental stellar parameters (e.g., effective temperature, surface gravity, metallicity) from spectroscopic observations, with reference to a library of synthetic spectra. The forward-modeling formalism we have developed is generic (easily adaptable to data from any instrument or covering any wavelength range) and modular, in that it can incorporate external prior knowledge or additional data (e.g., broadband photometry) and account for instrumental and non-stellar effects on the spectrum (e.g., parametric treatments of extinction, spots, etc.). We use covariance kernels to model the systematic discrepancies between the observations and the synthetic spectral library as correlated noise, ensuring that issues like uncertainties in atomic or molecular constants do not strongly bias the parameter inferences. In addition to extracting a set of unbiased inferences of the (posterior) probability distributions for basic stellar parameters, our modeling approach also maps out problematic spectral regions in the synthetic libraries that could be used as a basis for improving the models. As a demonstration, we present some preliminary results from modeling optical spectra of well-characterized exoplanet host stars. A basic set of adaptable software that performs this modeling approach will be released publicly.

12:05-12:30 pm Joseph Harrington, University of Central Florida
Taming Spitzer's Systematics with UCF's POET Pipeline.
Spitzer was designed for 10% absolute and 1% relative photometry, but exoplanet eclipses, for which Spitzer has become famous, require relative photometry as good as 0.01%. The community has identified numerous systematics, including latent images, which depend on the nature of the previous observation, and intrapixel sensitivity variation.
Our POET (Photometry for Orbits, Eclipses, and Transits) pipeline removes these by applying a large selection of candidate models, none of which is consistently the best. POET uses Bayesian methods to explore the model phase space and select the best model. It is written in object-oriented Python and Python-wrapped C and is highly automated, stopping only when a human decision is required. It can model multiple datasets at once, sharing common parameters. This makes it simple enough that several undergraduates have used it to write lead-author ApJ papers. We are preparing open-source releases of several key components, including our Multi-Core Markov-chain Monte Carlo (MCcubed) code, which implements both the simplistic Metropolis-Hastings and the self-tuning differential-evolution samplers. We present an overview of the current POET techniques and the path of discovery taken by the community.
With Tom Loredo (Cornell) and the UCF Exoplanets Team

Thursday, June 19, 2014

9:30-9:55 am Eric Ford, Penn State
Hierarchical Bayesian Analyses of the Planetary Mass-Radius Relation from Kepler Data.
NASA's Kepler mission is revolutionizing our knowledge of the frequency of planets and how it depends on orbital period, planet size and host star properties. Astronomers have much more limited information about the distribution of planet masses. Characterizing the relationship between planet masses and radii is a key goal for the coming years. I will describe the results of preliminary calculations meant to understand the practicality of performing a Hierarchical Bayesian analysis of the planet mass-radius relationship.

9:55-10:20 am Megan Shabram, Penn State
Inferring the Eccentricity Distribution of Short-Period Planet Candidates Detected by Kepler in Occultation.
We explore the eccentricity distribution of a sample of short-period planet candidates observed in both transit and occultation by Kepler.
We use hierarchical Bayesian analysis to calculate the posterior eccentricity distribution. We explore the sensitivity of our results to the choice of priors and demonstrate the robustness of our results by analyzing simulated observations. If we assume a Rayleigh distribution for eccentricities, then the posterior mode for the dispersion of the eccentricity distribution is σ ≈ 0.08. However, we find that a Rayleigh distribution of eccentricities is inadequate to model plausible eccentricity distributions. Fortunately, we demonstrate that a broad range of realistic eccentricity distributions can be well described by a two-component Gaussian mixture model with zero mean for both h = e cos ω and k = e sin ω. Using this model for a sample of 60 planet candidates with both transit and occultation measurements, we find that the posterior modes for the mixture fractions and dispersions are f_low = 0.87 ± 0.048, σ_low = 0.012 ± 0.005, and f_high = 0.13 ± 0.048, σ_high = 0.19 ± 0.043, respectively. Our results provide support for two populations of short-period planet candidates: one population with nearly circular orbits, and a smaller fraction that have managed to retain significant eccentricities. Next, we explore how the eccentricity distribution correlates with stellar effective temperature, planet radius, orbital period, and stellar metallicity. We find evidence that systems around higher-metallicity stars are more dynamically complex, but current results are limited by uncertainties in photometrically derived stellar metallicity and by sample size.
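The link between the (h, k) parametrization and the eccentricity itself can be illustrated with a short simulation. If h and k are independent zero-mean Gaussians with the same dispersion σ, then e = sqrt(h² + k²) follows a Rayleigh distribution with scale σ, so a two-component mixture in (h, k) gives a mixture of two Rayleigh distributions in e. The sketch below is only an illustration of that relation, not the authors' hierarchical analysis: it plugs in the posterior-mode values quoted above as fixed numbers, and the function name and the rule of discarding draws with e ≥ 1 are assumptions.

```python
import numpy as np

def sample_eccentricities(n, f_low, sigma_low, sigma_high, rng):
    """Draw eccentricities from a two-component, zero-mean Gaussian mixture in (h, k)."""
    # Assign each draw to the low- or high-dispersion component.
    is_low = rng.random(n) < f_low
    sigma = np.where(is_low, sigma_low, sigma_high)
    # h = e cos(omega) and k = e sin(omega) share the component's dispersion.
    h = rng.normal(0.0, sigma)
    k = rng.normal(0.0, sigma)
    e = np.hypot(h, k)
    # ASSUMPTION: discard unphysical draws with e >= 1 (bound orbits only).
    return e[e < 1.0]

rng = np.random.default_rng(0)
ecc = sample_eccentricities(100_000, f_low=0.87, sigma_low=0.012,
                            sigma_high=0.19, rng=rng)
fraction_nearly_circular = np.mean(ecc < 0.05)
```

Most simulated eccentricities fall well below 0.05, with a long tail from the high-dispersion component, which is the two-population picture described in the abstract.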
10:20-10:45 am Jessi Cisewski, Carnegie Mellon University
Approximate Bayesian Computation for Exoplanets.
A standard Bayesian statistical analysis relies on the specification of a likelihood. Unfortunately, the likelihood is not always known or tractable. Approximate Bayesian computation (ABC) provides a framework for performing inference in cases where the likelihood is not available. I will talk about the work initiated last summer at SAMSI: ABC for the eccentricity distribution of exoplanets.

11:10-11:35 am Leslie Rogers, Caltech
Hierarchical Bayesian Models applied to the Planet Mass-Radius-Flux Distribution.
The Kepler Mission, combined with ground-based radial velocity (RV) follow-up and transit timing variations (TTV) dynamical analyses, has revolutionized the observational constraints on sub-Neptune-size planet compositions. Kepler's unprecedentedly large and homogeneous samples of planets with both mass and radius constraints open the possibility of statistical studies of the underlying planet composition distribution. This presentation will describe the application of hierarchical Bayesian models to constrain the underlying planet composition distribution from a sample of noisy mass-radius measurements. This approach represents a promising avenue toward a quantitative measurement of the amount of physical scatter in small planet compositions, the identification of planet sub-populations that may be tied to distinct formation pathways, and empirical constraints on the dominant compositional trends in the planet sample. Near-term goals include distilling composition-distribution insights from the current sample of Kepler planets with RV masses, and informing the target selection for future transiting planet RV follow-up surveys. This presentation reports on work in progress, and is aimed at catalyzing discussion and collaboration during the workshop.
11:35-12:00 pm Darin Ragozzine, Florida Institute of Technology
Update on SysSim: Determining the Distribution of Exoplanetary Architectures.
I will review the SysSim project I described last year at SAMSI: the goal of this project is to create a forward model that can explain the architectures of exoplanetary systems as observed by Kepler. After the overview, I'll discuss the progress that we've made on this project (which has only recently started in earnest). I will discuss the method we've developed for drawing simultaneously from the period and period ratio distribution and why this is important. I hope to then encourage additional discussion on how to practically develop an Approximate Bayesian Computing method that can account for multiple semi-dependent statistical comparisons between synthetic data and Kepler observations.

12:00-12:25 pm Peter McCullough, STScI
Open Source Cloud Computing for Transiting Planet Discovery.
We provide an update on the development of the open-source software suite designed to detect exoplanet transits using high-performance and cloud computing resources (https://github.com/openexo). Our collaboration continues to grow as we are developing algorithms and codes related to the detection of transit-like events, especially in Kepler data, Kepler 2.0 and TESS data when available. Extending the work of Berriman et al. (2010, 2012), we describe our use of the XSEDE-Gordon supercomputer and Amazon EMR cloud to search for aperiodic transit-like events in Kepler light curves. Such events may be caused by circumbinary planets or transiting bodies, either planets or stars, with orbital periods comparable to or longer than the observing baseline such that only one transit is observed.
As a bonus, we use the same code to find stellar flares too; whereas transits reduce the flux in a box-shaped profile, flares increase the flux in a fast-rise, exponential-decay (FRED) profile that nevertheless can be detected reliably with a square-wave finder.

Friday, June 20, 2014

9:30-9:55 am Robert Morehead, Penn State
Simple-ABC: A Python package for Approximate Bayesian Computation.
Approximate Bayesian computation (ABC) has enormous potential to become a useful addition to an astronomer's statistical toolbox. Though ABC is relatively simple to implement, there is always a barrier to the adoption of new methodologies when there is a lack of software tools available in a researcher's preferred computing environment. Since Python is rapidly becoming the default programming language in astronomical data analysis, we have recently started work on Simple-ABC, a publicly available Python package for ABC based on easy-to-manipulate model objects. I will give a brief overview of the package and share some early results based on a simplified model of the population of Kepler exoplanets.
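For readers new to ABC, the basic rejection algorithm can be sketched in a few lines of Python. This is a generic illustration and does not use or reproduce the Simple-ABC package or its interface. The toy problem (inferring the scale of a Rayleigh eccentricity distribution from the sample mean), the uniform prior, the tolerance and all function names are assumptions made for the example.

```python
import numpy as np

def abc_rejection(observed, prior_sample, simulate, summary, tolerance, n_draws, rng):
    """Basic ABC rejection sampler: keep prior draws whose simulated
    summary statistic lands within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                     # 1. draw a parameter from the prior
        fake = simulate(theta, len(observed), rng)    # 2. simulate a data set
        if abs(summary(fake) - s_obs) < tolerance:    # 3. compare summaries
            accepted.append(theta)                    # 4. keep close matches
    return np.array(accepted)

rng = np.random.default_rng(1)
# Hypothetical "observed" eccentricities with true Rayleigh scale 0.1.
observed = rng.rayleigh(scale=0.1, size=200)

posterior = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(0.0, 0.5),                    # assumed prior
    simulate=lambda sigma, n, r: r.rayleigh(scale=sigma, size=n),  # forward model
    summary=np.mean,                                               # summary statistic
    tolerance=0.005,
    n_draws=20_000,
    rng=rng,
)
posterior_mean = posterior.mean()
```

The accepted draws approximate the posterior for the Rayleigh scale without ever evaluating a likelihood; only the ability to simulate data is required. Shrinking the tolerance improves the approximation at the cost of fewer accepted draws.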
9:55-10:20 am Angie Wolfgang, UC - Santa Cruz
How Rocky Are They? The Composition Distribution of Kepler's Sub-Neptune Planet Candidates within 0.15 AU.
The Kepler Mission has found thousands of planetary candidates with radii between 1 and 4 R⊕. As these planets have no analogues in our own Solar System, they provide an unprecedented opportunity to understand the range and distribution of planetary compositions allowed by planet formation and evolution. Although a mass measurement is usually required to constrain the possible compositions of an individual super-Earth-sized planet, we can address this question for the Kepler sample without such measurements, via a statistical approach that leverages interior structure models which, by accounting for the thermal evolution of a gaseous envelope around a rocky core, yield radii largely independent of mass. In particular, we apply Hierarchical Bayesian Modeling to a complete subsample of Kepler planet candidates to find the current-day composition distribution, which shows that gaseous envelopes are most likely to be between 0.1% and 1% of the planet's total mass. We also address the gaseous/rocky transition and illustrate its sensitivity to the uncertainty in the planets' radii. Finally, we illustrate that this composition distribution does not result in a one-to-one relationship between the population of sub-Neptune masses and radii; accordingly, dynamical studies which wish to use Kepler data must adopt a probabilistic approach to accurately represent the range of possible masses at a given radius.

10:20-10:45 am Ellen Price, Caltech
How Low Can You Go? The Photoeccentric Effect for Planets of Various Sizes.
It is well known that the light curve of a transiting planet contains information about the planet's orbital period and size relative to the host star.
We can also extract the eccentricity from the light curve because eccentricity changes the transit duration and shape compared to the transit of a planet on a circular orbit. This is one manifestation of what we call the photoeccentric effect. So far, this approach has only been used to study large planets with high signal-to-noise (S/N) transit data, which raises the question of how well the photoeccentric effect can constrain eccentricities for smaller planets or in cases with lower S/N. We explore the limits of the photoeccentric effect with analytic and numerical techniques. For a set of planetary and stellar parameters we are able to predict the best-case uncertainty in eccentricity that can be measured via the photoeccentric effect. We develop a rule of thumb that for per-point photometric uncertainties σ = {10⁻³, 10⁻⁴, 10⁻⁵} in Kepler long-cadence data, the critical values of planet-star radius ratio are Rp/R* = {0.1, 0.05, 0.03}; for smaller values of Rp/R*, eccentricity is not well constrained. This clears the path to study the precise eccentricities of a larger and more diverse collection of planets in the Kepler sample.
Co-authors: Leslie A. Rogers, Rebekah I. Dawson, John A. Johnson

11:10-11:35 am Benjamin Montet, Caltech/Harvard
Transit Timing Posteriors through Importance Sampling.
Due to the high precision of Kepler light curves, the first unambiguous observations of transit timing variations (TTVs) are now in hand. TTVs provide the potential to precisely characterize transiting exoplanets and detect additional nontransiting companions to transiting planets. I will discuss the need for full posterior distributions of transit times and the deficiencies in the current datasets, with a focus on stellar astrophysics and planetary dynamics. I will then provide a status update on our preliminary efforts to produce such posteriors through importance sampling for all the Kepler planet candidates.
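As background for the importance-sampling approach mentioned above, the following Python sketch shows the generic technique on a one-dimensional toy problem. It is not the speaker's pipeline: the Gaussian "posterior" for a transit-time offset, the broader Gaussian proposal, the numerical values and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

def log_gaussian(x, mean, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def importance_weights(samples, log_target, log_proposal):
    """Self-normalized importance weights w_i proportional to target / proposal."""
    log_w = log_target(samples) - log_proposal(samples)
    log_w -= np.max(log_w)          # subtract the maximum for numerical stability
    w = np.exp(log_w)
    return w / np.sum(w)

rng = np.random.default_rng(2)

# ASSUMED toy setup: a broad proposal and a narrower target posterior
# for a transit-time offset (arbitrary time units).
proposal_mean, proposal_sigma = 0.25, 0.2
samples = rng.normal(proposal_mean, proposal_sigma, size=50_000)

weights = importance_weights(
    samples,
    log_target=lambda x: log_gaussian(x, 0.30, 0.05),
    log_proposal=lambda x: log_gaussian(x, proposal_mean, proposal_sigma),
)

# Posterior summaries are weighted averages over the proposal draws.
posterior_mean = np.sum(weights * samples)
# Effective sample size: how many independent posterior draws the weights are worth.
effective_sample_size = 1.0 / np.sum(weights**2)
```

The effective sample size is a useful diagnostic: when the proposal matches the target poorly, a few draws carry almost all of the weight and the estimate becomes unreliable.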
11:35-12:00 pm, New York University
Making Kepler more precise at the pixel level.
I discuss two pieces of vaporware we are developing to improve the precision of the light curves of stars delivered by Kepler. In the first, we use a regression of pixels against other pixels to build a predictive model for every pixel in the raw data (TPF). The structure of the model is based on strong beliefs about causal structure: where pixels that talk to spatially separated stars are covariant, they must be affected by a confounder, which in this case is the spacecraft. In the second, we are replacing the rigid, pre-computed apertures used by Kepler to produce SAP photometry with soft apertures generated by optimizing an empirical estimate of the expected signal-to-noise. These optimized apertures produce photometry devoid of some of the trends induced by spacecraft variability. I make some comments about combining these two projects and the opportunities the combination might afford.