arxiv:0712.3028v2 [astro-ph] 7 Jan 2008



Similar documents
A quantum model for the stock market

Face Hallucination and Recognition

SAT Math Must-Know Facts & Formulas

Betting Strategies, Market Selection, and the Wisdom of Crowds

TERM INSURANCE CALCULATION ILLUSTRATED. This is the U.S. Social Security Life Table, based on year 2007.

Advanced ColdFusion 4.0 Application Development Server Clustering Using Bright Tiger

ASYMPTOTIC DIRECTION FOR RANDOM WALKS IN RANDOM ENVIRONMENTS arxiv:math/ v2 [math.pr] 11 Dec 2007

COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION

Chapter 3: JavaScript in Action Page 1 of 10. How to practice reading and writing JavaScript on a Web page

Secure Network Coding with a Cost Criterion

3.5 Pendulum period :40:05 UTC / rev 4d4a39156f1e. g = 4π2 l T 2. g = 4π2 x1 m 4 s 2 = π 2 m s Pendulum period 68

Risk Margin for a Non-Life Insurance Run-Off

Pay-on-delivery investing

Life Contingencies Study Note for CAS Exam S. Tom Struppeck

3.3 SOFTWARE RISK MANAGEMENT (SRM)

Early access to FAS payments for members in poor health

Chapter 1 Structural Mechanics

Teamwork. Abstract. 2.1 Overview

Fixed income managers: evolution or revolution

SAT Math Facts & Formulas

Market Design & Analysis for a P2P Backup System

DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS

Fast Robust Hashing. ) [7] will be re-mapped (and therefore discarded), due to the load-balancing property of hashing.

Pricing and Revenue Sharing Strategies for Internet Service Providers

A Latent Variable Pairwise Classification Model of a Clustering Ensemble

A New Statistical Approach to Network Anomaly Detection

Finance 360 Problem Set #6 Solutions

Risk Margin for a Non-Life Insurance Run-Off

Betting on the Real Line

Energy Density / Energy Flux / Total Energy in 3D

WHITE PAPER UndERsTAndIng THE VAlUE of VIsUAl data discovery A guide To VIsUAlIzATIons

GREEN: An Active Queue Management Algorithm for a Self Managed Internet

Physics 100A Homework 11- Chapter 11 (part 1) The force passes through the point A, so there is no arm and the torque is zero.

Australian Bureau of Statistics Management of Business Providers

CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS

Insertion and deletion correcting DNA barcodes based on watermarks

DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS

Breakeven analysis and short-term decision making

Normalization of Database Tables. Functional Dependency. Examples of Functional Dependencies: So Now what is Normalization? Transitive Dependencies

Business schools are the academic setting where. The current crisis has highlighted the need to redefine the role of senior managers in organizations.

GWPD 4 Measuring water levels by use of an electric tape

Figure 1. A Simple Centrifugal Speed Governor.

Bite-Size Steps to ITIL Success

Art of Java Web Development By Neal Ford 624 pages US$44.95 Manning Publications, 2004 ISBN:

The Whys of the LOIS: Credit Risk and Refinancing Rate Volatility

An Idiot s guide to Support vector machines (SVMs)

Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation

Let s get usable! Usability studies for indexes. Susan C. Olason. Study plan

Simultaneous Routing and Power Allocation in CDMA Wireless Data Networks

Leakage detection in water pipe networks using a Bayesian probabilistic framework

CONDENSATION. Prabal Talukdar. Associate Professor Department of Mechanical Engineering IIT Delhi

Multi-Robot Task Scheduling

FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS. Karl Skretting and John Håkon Husøy

Chapter 3: e-business Integration Patterns

Lecture 3: Linear methods for classification

Chapter 2 Traditional Software Development

Business Banking. A guide for franchises

Advantages and Disadvantages of Sampling. Vermont ASQ Meeting October 26, 2011

Tort Reforms and Performance of the Litigation System; Case of Medical Malpractice [Preliminary]

Design and Analysis of a Hidden Peer-to-peer Backup Market

Mean-field Dynamics of Load-Balancing Networks with General Service Distributions

The Simple Pendulum. by Dr. James E. Parks

AA Fixed Rate ISA Savings

A Supplier Evaluation System for Automotive Industry According To Iso/Ts Requirements

Statistical Machine Learning

Introduction the pressure for efficiency the Estates opportunity

A short guide to making a medical negligence claim

On the relationship between radiance and irradiance: determining the illumination from images of a convex Lambertian object

APPENDIX 10.1: SUBSTANTIVE AUDIT PROGRAMME FOR PRODUCTION WAGES: TROSTON PLC

Virtual trunk simulation

Hybrid Process Algebra

Local coupling of shell models leads to anomalous scaling

On Capacity Scaling in Arbitrary Wireless Networks

No longer living together: how does Scots cohabitation law work in practice?

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, cm


Oligopoly in Insurance Markets

Capacity of Multi-service Cellular Networks with Transmission-Rate Control: A Queueing Analysis

Comparison of Traditional and Open-Access Appointment Scheduling for Exponentially Distributed Service Time

STUDY MATERIAL. M.B.A. PROGRAMME (Code No. 411) (Effective from ) II SEMESTER 209MBT27 APPLIED RESEARCH METHODS IN MANAGEMENT

l l ll l l Exploding the Myths about DETC Accreditation A Primer for Students

Discounted Cash Flow Analysis (aka Engineering Economy)

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES. l l l. l l

LADDER SAFETY Table of Contents

7. Dry Lab III: Molecular Symmetry

Learning from evaluations Processes and instruments used by GIZ as a learning organisation and their contribution to interorganisational learning

Swingtum A Computational Theory of Fractal Dynamic Swings and Physical Cycles of Stock Market in A Quantum Price-Time Space

The guaranteed selection. For certainty in uncertain times

ELECTRONIC FUND TRANSFERS. l l l. l l. l l l. l l l

NCH Software FlexiServer

SQL. Ilchul Yoon Assistant Professor State University of New York, Korea. on tables. describing schema. CSE 532 Theory of Database Systems

A Description of the California Partnership for Long-Term Care Prepared by the California Department of Health Care Services

Transcription:

A practica guide to Basic Statistica Techniques for Data Anaysis in Cosmoogy Licia Verde arxiv:0712.3028v2 [astro-ph] 7 Jan 2008 ICREA & Institute of Space Sciences (IEEC-CSIC), Fac. Ciencies, Campus UAB Torre C5 pare 2 Beaterra (Spain) Abstract This is the summary of 4 Lectures given at the XIX Canary isands winter schoo of Astrophysics The Cosmic Microwave Background, from quantum Fuctuations to the present Universe. Eectronic address: verde@ieec.uab.es 1

Contents I. INTRODUCTION 3 II. Probabiities 4 A. What s probabiity: Bayesian vs Frequentist 4 B. Deaing with probabiities 5 C. Moments and cumuants 7 D. Usefu trick: the generating function 8 E. Two usefu distributions 8 1. The Poisson distribution 8 2. The Gaussian distribution 9 III. Modeing of data and statistica inference 10 A. Chisquare, goodness of fit and confidence regions 11 1. Goodness of fit 12 2. Confidence region 13 B. Likeihoods 13 1. Confidence eves for ikeihood 14 C. Marginaization, combining different experiments 14 D. An exampe 16 IV. Description of random fieds 16 A. Gaussian random fieds 18 B. Basic toos 19 1. The importance of the Power spectrum 21 C. Exampes of rea word issues 22 1. Discrete Fourier transform 23 2. Window, seection function, masks etc 24 3. pros and cons of ξ(r) and P(k) 25 4.... and for CMB? 26 5. Noise and beams 27 6. Aside: Higher orders correations 29 2

V. More on Likeihoods 30 VI. Monte Caro methods 33 A. Monte Caro error estimation 33 B. Monte Caro Markov Chains 34 C. Markov Chains in Practice 36 D. Improving MCMC Efficiency 37 E. Convergence and Mixing 38 F. MCMC Output Anaysis 40 VII. Fisher Matrix 40 VIII. Fisher matrix 41 IX. Concusions 43 Acknowedgments 43 X. Questions 43 Acknowedgments 45 XI. Tutorias 45 References 47 Figures 49 Tabes 51 I. INTRODUCTION Statistics is everywhere in Cosmoogy, today more than ever: cosmoogica data sets are getting ever arger, data from different experiments can be compared and combined; as the statistica error bars shrink, the effect of systematics need to be described, quantified and accounted for. As the data sets improve the parameter space we want to expore aso grows. In the ast 5 years there were more than 370 papers with statistic- in the tite! 3

There are many exceent books on statistics: however when I started using statistics in cosmoogy I coud not find a the information I needed in the same pace. So here I have tried to put together a starter kit of statistica toos usefu for cosmoogy. This wi not be a rigorous introduction with theorems, proofs etc. The goa of these ectures is to a) be a practica manua: to give you enough knowedge to be abe to understand cosmoogica data anaysis and/or to find out more by yoursef and b) give you a bag of tricks (hopefuy) usefu for your future work. Many usefu appications are eft as exercises (often with hints) and are thus proposed in the main text. I wi start by introducing probabiity and statistics from a Cosmoogist point of view. Then I wi continue with the description of random fieds (ubiquitous in Cosmoogy), foowed by an introduction to Monte Caro methods incuding Monte-Caro error estimates and Monte Caro Markov Chains. I wi concude with the Fisher matrix technique, usefu for quicky forecasting the performance of future experiments. II. PROBABILITIES A. What s probabiity: Bayesian vs Frequentist Probabiity can be interpreted as a frequency P = n N (1) where n stands for the successes and N for the tota number of trias. Or it can be interpreted as a ack of information: if I knew everything, I know that an event is surey going to happen, then P = 1, if I know it is not going to happen then P = 0 but in other cases I can use my judgment and or information from frequencies to estimate P. The word is divided in Frequentists and Bayesians. In genera, Cosmoogists are Bayesians and High Energy Physicists are Frequentists. For Frequentists events are just frequencies of occurrence: probabiities are ony defined as the quantities obtained in the imit when the number of independent trias tends to infinity. Bayesians interpret probabiities as the degree of beief in a hypothesis: they use judgment, prior information, probabiity theory etc... 4

As we do cosmoogy we wi be Bayesian. B. Deaing with probabiities In probabiity theory, probabiity distributions are fundamenta concepts. They are used to cacuate confidence intervas, for modeing purposes etc. We first need to introduce the concept of random variabe in statistics (and in Cosmoogy). Depending on the probem at hand, the random variabe may be the face of a dice, the number of gaaxies in a voume δv of the Universe, the CMB temperature in a given pixe of a CMB map, the measured vaue of the power spectrum P(k) etc. The probabiity that x (your random variabe) can take a specific vaue is P(x) where P denotes the probabiity distribution. The properties of P are: 1. P(x) is a non negative, rea number for a rea vaues of x. 2. P(x) is normaized so that 1 dxp(x) = 1 3. For mutuay excusive events x 1 and x 2, P(x 1 + x 2 ) = P(x 1 ) + P(x 2 ) the probabiity of x 1 or x 2 to happen is the sum of the individua probabiities. P(x 1 + x 2 ) is aso written as P(x 1 Ux 2 ) or P(x 1.OR.x 2 ). 4. In genera: P(a, b) = P(a)P(b a) ; P(b, a) = P(b)P(a b) (2) The probabiity of a and b to happen is the probabiity of a times the conditiona probabiity of b given a. Here we can aso make the (apparenty tautoogica) identification P(a, b) = P(b, a). For independent events then P(a, b) = P(a)P(b). Exercise: Wi it be sunny tomorrow? answer in the frequentist way and in the Bayesian way 2. Exercise: Produce some exampes for rue (iv) above. 1 for discrete distribution 2 These ectures were given in the Canary Isands, in other ocations answer may differ... 5

- Whie Frequentists ony consider distributions of events, Bayesians consider hypotheses as events, giving us Bayes theorem: P(H D) = P(H)P(D H) P(D) (3) where H stands for hypothesis (generay the set of parameters specifying your mode, athough many cosmoogists now aso consider mode themseves) and D stands for data. P(H D) is caed the posterior distribution. P(H) is caed the prior and P(D H) is caed ikeihood. Note that this is nothing but equation 2 with the apparenty tautoogica identity P(a, b) = P(b, a) and with substitutions: b H and a D. Despite its simpicity Eq. 3 is a reay important equation!!! The usua points of heated discussion foow: How do you chose P(H)?, Does the choice affects your fina resuts? (yes, in genera it wi). Isn t this then a bit subjective? Exercise: Consider a positive definite quantity (ike for exampe the tensor to scaar ratio r or the optica depth to the ast scattering surface τ). What prior shoud one use? a fat prior in the variabe? or a ogaritmic prior (i.e. fat prior in the og of the quantity)? for exampe CMB anaysis may use a fat prior in nr, and in Z = exp( 2τ). How is this reated to using a fat pror in r or in τ? It wi be usefu to consider the foowing: effectivey we are comparing P(x) with P(f(x)), where f denotes a function of x. For exampe x is τ and f(x) is exp( 2τ). Reca that: P(f) = P(x(f)) df 1. The Jacobian of the transformation appears here to conserve probabiities. Exercise: Compare Fig 21 of Sperge et a (2007)(26) with figure 13 of Sperge et a 2003 (25). Consider the WMAP ony contours. Ceary the 2007 paper uses more data than the 2003 paper, so why is that the constraints ook worst? If you suspect the prior you are correct! Find out which prior has changed, and why it makes such a difference. Exercise: Under which conditions the choice of prior does not matter? (hint: compare the WMAP papers of 2003 and 2007 for the fat LCDM case). dx 6

C. Moments and cumuants Moments and cumuants are used to characterize the probabiity distribution. In the anguage of probabiity distribution averages are defined as foows: f(x) = dxf(x)p(x) (4) These can then be reated to expectation vaues (see ater). For now et us just introduce the moments: ˆµ m = x m and, of specia interest, the centra moments: µ m = (x x ) m. - Exercise: show that ˆµ 0 = 1 and that the average x = ˆµ 1. Aso show that µ 2 = x 2 x 2 Here, µ 2 is the variance, µ 3 is caed the skewness, µ 4 is reated to the kurtosis. If you dea with the statistica nature of initia conditions (i.e. primordia non-gaussianity) or non-inear evoution of Gaussian initia conditions, you wi encounter these quantities again (and again..). Up to the skewness, centra moments and cumuants coincide. For higher-order terms things become more compicated. To keep things a simpe as possibe et s just consider the Gaussian distribution (see beow) as reference. Whie moments of order higher than 3 are non-zero for both Gaussian and non-gaussian distribution, the cumuants of higher orders are zero for a Gaussian distribution. In fact, for a Gaussian distribution a moments of order higher than 2 are specified by µ 1 and µ 2. Or, in other words, the mean and the variance competey specify a Gaussian distribution. This is not the case for a non-gaussan distribution. For non-gaussian distribution, the reation between centra moments and cumuants κ for the first 6 orders is reported beow. µ 1 = 0 (5) µ 2 = κ 2 (6) µ 3 = κ 3 (7) µ 4 = κ 4 + 3(κ 2 ) 2 (8) µ 5 = κ 5 + 10κ 3 κ 2 (9) µ 6 = κ 6 + 15κ 4 κ 2 + 10(κ 3 ) 2 + 15(κ 2 ) 3 (10) 7

D. Usefu trick: the generating function The generating function aows one, among other things, to compute quicky moments and cumuants of a distribution. Define the generating function as Z(k) = exp(ikx) = dx exp(ikx)p(x) (11) Which may sound famiiar as it is a sort of Fourier transform... Note that this can be written as an infinite series (by expanding the exponentia) giving (exercise) (ik) n Z(k) = ˆµ n (12) n=0 n! So far nothing specia, but now the neat trick is that the moments are obtained as: ˆµ n = ( i n ) dn dk nz(k) k=0 (13) and the cumuants are obtained by doing the same operation on nz. Whie this seems just a neat trick for now it wi be very usefu shorty. E. Two usefu distributions Two distributions are widey used in Cosmoogy: these are the Poisson distribution and the Gaussian distribution. 1. The Poisson distribution The Poisson distribution describes an independent point process: photon noise, radioactive decay, gaaxy distribution for very few gaaxies, point sources... It is an exampe of a discrete probabiity distribution. For cosmoogica appications it is usefu to think of a Poisson process as foows. Consider a random process (for exampe a random distribution of gaaxies in space) of average density ρ. Divide the space in infinitesima ces, of voume δv so sma that their occupation can ony be 0 or 1 and the probabiity of having more than one object per ce is 0. Then the probabiity of having one object in a given ce is P 1 = ρδv and the probabiity of getting no object in the ce is therefore P 0 = 1 ρδv. Thus for one ce the generating function is Z(k) = n P n exp(ikn) = 1 + ρδv (exp(ik) 1) and for a voume V with V/δV ces, we have Z(k) = (1 + ρδv (exp(ik) 1)) V/δV exp[ρv (exp(ik) 1)]. 8

With the substitution ρv λ we obtain Z(k) = exp[λ(exp(ik) 1)] = n=0 λ n /n! exp( λ) exp(ikn). Thus the Poission probabiity distribution we recover is: P n = λn exp[ λ] (14) n! Exercise: show that for the Poisson distribution n = λ and that σ 2 = λ. - 2. The Gaussian distribution The Gaussian distribution is extremey usefu because of the Centra Limit theorem. The Centra Limit theorem states that the sum of many independent and identicay distributed random variabes wi be approximatey Gaussiany distributed. The conditions for this to happen are quite mid: the variance of the distribution one starts off with has to be finite. The proof is remarkaby simpe. Let s take n events with probabiity distributions P(x i ) and < x i >= 0 for simpicity, and et Y be their sum. What is P(Y )? The generating function for Y is the product of the generating functions for the x i : [ ] m= n ( (ik)m Z Y (k) = µ m 1 1 k 2 < x 2 n > +...) (15) m! 2 n m=0 for n then Z Y (k) exp[ 1/2k 2 < x 2 >]. By recaing the definition of generating function (eq. 11 we can see that the probabiity distribution which generated this Z is [ 1 P(Y ) = 2π < x2 > exp 1 Y 2 ] (16) 2 < x 2 > that is a Gaussian! Exercise: Verify that higher order cumuants are zero for the Gaussian distribution. Exercise: Show that the Centra imit theorem hods for the Poisson distribution. Beyond the Centra Limit theorem, the Gaussian distribution is very important in cosmoogy as we beieve that the initia conditions, the primordia perturbations generated from 9

infation, had a distribution very very cose to Gaussian. (Athough it is crucia to test this experimentay.) We shoud aso remember that thanks to the Centra Limit theorem, when we estimate parameters in cosmoogy in many cases we approximate our data as having a Gaussian distribution, even if we know that each data point is NOT drawn from a Gaussian distribution. The Centra Limit theorem simpifies our ives every day... There are exceptions though. Let us for exampe consider N independent data points drawn from a Cauchy distribution: P(x) = [πσ(1 + [(x x)/σ] 2 ] 1. This is a proper probabiity distribution as it integrates to unity, but moments diverge. One can show that the numerica mean of a finite number N of observations is finite but the popuation mean (the one defined through the integra of equation (4) with f(x) = x) is not. Note aso that the scatter in the average of N data points drawn from this distribution is the same as the scatter in 1 point: the scatter never diminishes regardess of the sampe size... III. MODELING OF DATA AND STATISTICAL INFERENCE To iustrate this et us foow the exampe from (28). If you have an urn with N red bas and M bue bas and you draw from the urn, probabiity theory can te you what the chances are of you to pick a red ba given that you has so far drawn m bue and n red ones... However in practice what you want to do is to use probabiity to te you what is the distribution of the bas in the urn having made a few drawn from it! In other words, if you knew everything about the Universe, probabiity theory coud te you what the probabiities are to get a given outcome for an observation. However, especiay in cosmoogy, you want to make few observations and draw concusions about the Universe! With the added compication that experiments in Cosmoogy are not quite ike experiments in the ab: you can t poke the Universe and see how it reacts, and in many cases you can t repeat the observation, and you can ony see a sma part of the Universe! Keeping this caveat in mind et s push ahead. Given a set of observations often you want to fit a mode to the data, where the mode is described by a set of parameters α. Sometimes the mode is physicay motivated (say CMB anguar power spectra etc.) or a convenient function (e.g. initia studies of arge scae structure were fitting gaaxies correation functions with power aws). Then you want to 10

define a merit function, that measures the agreement between the data and the mode: by adjusting the parameters to maximize the agreement one obtains the best fit parameters. Of course, because of measurement errors, there wi be errors associated to the parameter determination. To be usefu a fitting procedure shoud provide a) best fit parameters b) error estimates on the parameters c) possiby a statistica measure of the goodness of fit. When c) suggests that the mode is a bad description of the data, then a) and b) make no sense. Remember at this point Bayes theorem: whie you may want to ask: What is the probabiity that a particuar set of parameters is correct?, what you can ask to a figure of merit is Given a set of parameters, what is the probabiity that that this data set coud have occurred?. This is the ikeihood. You may want to estimate parameters by maximizing the ikeihood and somehow identify the ikeihood (probabiity of the data given the parameters) with the ikeihood of the mode parameters. A. Chisquare, goodness of fit and confidence regions Foowing Numerica recipes ((24), Chapter 15) it is easier to introduce mode fitting and parameter estimation using the east-squares exampe. Let s say that D i are our data points and y( x i α) a mode with parameters α. For exampe if the mode is a straight ine then α denotes the sope and intercept of the ine. The east squares is given by: χ 2 = i w i [D i y(x i α)] 2 (17) and you can show that the minimum variance weights are w i = 1/σ1 2. - Exercise: if the points are correated how does this equation change? - Best fit vaue parameters are the parameters that minimize the χ 2. Note that a numerica exporation can be avoided: by soving χ 2 / α i 0 you can find the best fit parameters. 11

1. Goodness of fit In particuar, if the measurement errors are Gaussiany distributed, and (as in this exampe) the mode is a inear function of the parameters, then the probabiity distribution of for different vaues of χ 2 at the minimum is the χ 2 distribution for ν n m degrees of freedom (where m is the number of parameters and n is the number of data points. The probabiity that the observed χ 2 even for a correct mode is ess than a vaue ˆχ 2 is P(χ 2 < ˆχ 2, ν) = P(ν/2, ˆχ 2 /2) = Γ(ν/2, ˆχ 2 /2) where Γ stands for the incompete Gamma function. Its compement, Q = 1 P(ν/2, ˆχ 2 /2) is the probabiity that the observed χ 2 exceed by chance ˆχ 2 even for a correct mode. See numerica recipes (24) chapters 6.3 and 15.2 for more detais. It is common that the chi-square distribution hods even for modes that are non inear in the parameters and even in more genera cases (see an exampe ater). The computed probabiity Q gives a quantitative measure of the goodness of fit when evauated at the best fit parameters (i.e. at χ 2 min ). If Q is a very sma probabiity then a) the mode is wrong and can be rejected b) the errors are reay arger than stated or c) the measurement errors were not Gaussiany distributed. If you know the actua error distribution you may want to Monte Caro simuate synthetic data stets, subject them to your actua fitting procedure, and determine both the probabiity distribution of your χ 2 statistic and the accuracy with which mode parameters are recovered by the fit (see section on Monte Caro Methods). On the other hand Q may be too arge, if it is too near 1 then aso something s up: a) errors may have been overestimated b) the data are correated and correations were ignored in the fit. c) In principe it may be that the distribution you are deaing with is more compact than a Gaussian distribution, but this is amost never the case. So make sure you excude cases a) and b) before you invest a ot of time in exporing option c). Postscript: the Chi-by eye rue is that the minimum χ 2 shoud be roughy equa to the number of data-number of parameters (giving rise to the widespread use of the so-caed reduced chisquare). Can you possiby rigorousy justify this statement? 12

2. Confidence region Rather than presenting the fu probabiity distribution of errors it is usefu to present confidence imits or confidence regions: a region in the m-dimensiona parameter space (m being the number of parameters), that contain a certain percentage of the tota probabiity distribution. Obviousy you want a suitaby compact region around the best fit vaue. It is customary to choose 68.3%, 95.4%, 99.7%... Eipsoida regions have connections with the norma (Gaussian) distribution but in genera things may be very different... A natura choice for the shape of confidence intervas is given by constant χ 2 boundaries. For the observed data set the vaue of parameters α 0 minimize the χ 2, denoted by χ 2 min. If we perturb α away from α 0 the χ 2 wi increase. From the properties of the χ 2 distribution it is possibe to show that there is a we defined reation between confidence intervas, forma standard errors, and χ 2. We In tabei we report the χ 2 for the conventionas 1, 2,and 3 σ as a function of the number of parameters for the joint confidence eves. In genera, et s spe out the foowing prescription. If µ is the number of fitted parameters for which you want to pot the join confidence region and p is the confidence imit desired, find the χ 2 such that the probabiity of a chi-square variabe with µ degrees of freedom being ess than χ 2 is p. For genera vaues of p this is given by Q described above (for the standard 1,2,3 σ see tabe above). P.S. Frequentists use χ 2 a ot. B. Likeihoods One can be more sophysticated than χ 2, if P(D) (D is data) is known. Remember from the Bayes theorem (eq.3) the probabiity of the data given the mode (Hypothesis) is the ikeihood. If we set P(D) = 1 (after a, you got the data) and ignore the prior, by maximizing the ikeihood we find the most ikey Hypothesis, or, often, the most ikey parameters of a given mode. Note that we have ignored P(D) and the prior so in genera this technique does not give you a goodness of fit and not an absoute probabiity of the mode, ony reative probabiities. Frequentists rey on χ 2 anayses where a goodness of fit can be estabished. In many cases (thanks to the centra imit theorem) the ikeihood can be we approxi- 13

mated by a muti-variate Gaussian: 1 L = exp 1 (D y) (2π) n/2 detc 1/2 i Cij 1 2 (D y) j (18) ij where C ij = (D i y i )(D j y j ) is the covariance matrix. Exercise: when are ikeihood anayses and χ 2 anayses the same? 1. Confidence eves for ikeihood For Bayesian statistics, confidence regions are found as regions R in mode space such that R P( α D)d α is, say, 0.68 for 68% confidence eve and 0.95 for 95% confidence. Note that this encoses the prior information. To report resuts independenty of the prior the ikeihood ratio is used. In this case compare the ikeihood at a particuar point in mode space L( α) with the vaue of the maximum ikeihood L max. Then a mode is said acceptabe if [ ] L( α) 2 n threshod (19) L max Then the threshod shoud be caibrated by cacuating the distribution of the ikeihood ratio in the case where a particuar mode is the true mode. There are some cases however when the vaue of the threshod is the corresponding confidence imit for a χ 2 with m degrees of freedom, for m the number of parameters. Exercise: in what cases? 3 C. Marginaization, combining different experiments Of a the mode parameters α i some of them may be uninteresting. Typica exampes of nuisance parameters are caibration factors, gaaxy bias parameter etc, but aso it may 3 Soution: The data must have Gaussian errors, the mode must depend ineary on the parameters, the gradients of the mode with respect to the parameters are not degenerate and the parameters do not affect the covariance. 14

be that we are interested on constraints on ony one cosmoogica parameter at the time rather than on the joint constraints on 2 or more parameters simutaneousy. One then marginaizes over the uninteresting parameters by integrating the posterior distribution: P(α 1..α j D) = dα j+1,...dα m P( α D) (20) if there are in tota m parameters and we are interested in j of them (j < m). Note that if you have two independent experiments, the combined ikeihood of the two experiments is just the product of the two ikeihoods. (of course if the two experiments are non independent then one woud have to incude their covariance). In many cases one of the two experiments can be used as a prior. A word of caution is on order here. We can aways combine independent experiments by mutipying their ikeihoods, and if the experiments are good and sound and the mode used is a good and compete description of the data a is we. However it is aways important to: a) think about the priors one is using and to quantify their effects. b) make sure that resuts from independent experiments are consistent: by mutipying ikeihood from inconsistent experiments you can aways get some sort of resuts but it does not mean that the resut actuay makes sense... Sometimes you may be interested in pacing a prior on the uninteresting parameters before marginaization. The prior may come from a previous measurement or from your beief. Typica exampes of this are: marginaization over caibration uncertainty, over point sources ampitude or over beam errors for CMB studies. For exampe for marginaization over, say, point source ampitude, it is usefu to know of the foowing trick for Gaussian ikeihoods: P(α 1..α m 1 D)= da (2π) m 2 C 1 2 [ 1 exp 1 2πσ 2 2 e [ 1 2 (C i (Ĉi+AP i ))Σ 1 ij (C j (Ĉj+AP j ))] (21) ] (A Â)2 σ 2 repeated indices are summed over and C denotes the determinant. Here, A is the ampitude of, say, a point source contribution P to the C anguar power spectrum, A is the m th parameter which we want to marginaize over with a Gaussian prior with variance σ 2 around Â. The trick is to recognize that this integra can be written as: P(α 1..α m 1 D) = C 0 exp [ 1 ] 2 C 1 2C 2 A + C 3 A 2 da (22) 15

(where C 0...3 denote constants) and that this kind of integra is evauated by using the substitution A A C 2 /C 3 giving something exp[ 1/2(C 1 C 2 2 /C 3)]. It is eft as an exercise to write the constants expicity. D. An exampe Let s say you want to constrain cosmoogy by studying custers number counts as a function of redshift. Here we foow the paper of Cash 1979 (3). The observation of a discrete number N of custers is a Poisson process, the probabiity of which is given by the product P = Π N i=1 [en i i exp( e i )/n i!] (23) where n i is the number of custers observed in the i th experimenta bin and e i is the expected number in that bin in a given mode: e i = I(x)δx i with i being the proportiona to the probabiity distribution. Here δx i can represent an interva in custers mass and/or redshift. Note: this is a product of Poisson distributions, thus one is assuming that these are independent processes. Custers may be custered, so when can this be used? For unbinned data (or for sma bins so that bins have ony 0 and 1 counts) we define the quantity: N C 2 n P = 2(E n I i ) (24) i=1 where E is the tota expected number of custers in a given mode. The quantity C between two modes with different parameters has a χ 2 distribution! (so a that was said in the χ 2 section appies, even though we started from a highy non-gaussian distribution.) IV. DESCRIPTION OF RANDOM FIELDS Let s take a break from probabiities and consider a sighty different issue. In comparing the resuts of theoretica cacuations with the observed Universe, it woud be meaningess to hope to be abe to describe with the theory the properties of a particuar patch, i.e. to predict the density contrast of the matter δ( x) = δρ(x)/ρ at any specific point x. Instead, it is possibe to predict the average statistica properties of the mass distribution 4. In 4 A very simiar approach is taken in statistica mechanics. 16

addition, we consider that the Universe we ive in is a random reaization of a the possibe Universes that coud have been a reaization of the true underying mode (which is known ony to Mother Nature). A the possibe reaizations of this true underying Universe make up the ensambe. In statistica inference one may sometime want to try to estimate how different our particuar reaization of the Universe coud be from the true underying one. Thinking back at the exampe of the urn with coored bas, it woud be ike considering that the particuar urn from which we are drawing the bas is ony one possibe reaization of the true underying distribution of urns. For exampe, say that the true distribution has a 50-50 spit in red and bue bas but that the urn can have ony an odd number of bas. Ceary the exact 50-50 spit cannot be reaized in one particuar urn but it can be reaized in the ensambe... Foowing the cosmoogica principe (e.g. Peebes 1980 (23)), modes of the Universe have to be homogeneous on the average, therefore, in widey separated regions of the Universe (i.e. independent), the density fied must have the same statistica properties. A crucia assumption of standard cosmoogy is that the part of the Universe that we can observe is a fair sampe of the whoe. This is cosey reated to the cosmoogica principe since it impies that the statistics ike the correation functions have to be considered as averages over the ensembe. But the pecuiarity in cosmoogy is that we have just one Universe, which is just one reaization from the ensembe (quite fictitious one: it is the ensembe of a possibe Universes). The fair sampe hypothesis states that sampes from we separated part of the Universe are independent reaizations of the same physica process, and that, in the observabe part of the Universe, there are enough independent sampes to be representative of the statistica ensembe. The hypothesis of ergodicity foows: averaging over many reaizations is equivaent to averaging over a arge (enough) voume. The cosmoogica fied we are interested in, in a given voume, is taken as a reaization of the statistica process and, for the hypothesis of ergodicity, averaging over many reaizations is equivaent to averaging over a arge voume. Theories can just predict the statistica properties of δ( x) which, for the cosmoogica principe, must be a homogeneous and isotropic random fied, and our observabe Universe is a random reaization from the ensembe. In cosmoogy the scaar fied δ( x) is enough to specify the initia fuctuations fied, and we utimatey hope aso the present day distribution of gaaxies and matter. Here ies one 17

of the big chaenges of modern cosmoogy. (see e.g.??) A fundamenta probem in the anaysis of the cosmic structures, is to find the appropriate toos to provide information on the distribution of the density fuctuations, on their initia conditions and subsequent evoution. Here we concentrate on power spectra and correation functions. A. Gaussian random fieds Gaussian random fieds are cruciay important in cosmoogy, for different reasons: first of a it is possibe to describe their statistica properties anayticay, but aso there are strong theoretica motivations, namey infation, to assume that the primordia fuctuations that gave rise to the present-day cosmoogica structures, foow a Gaussian distribution. Without resorting to infation, for the centra imit theorem, Gaussianity resuts from a superposition of a arge number of random processes. The distribution of density fuctuations δ defined as 5 δ = δρ/ρ cannot be exacty Gaussian because the fied has to satisfy the constraint δ > 1, however if the ampitude of the fuctuations is sma enough, this can be a good approximation. This seems indeed to be the case: by ooking at the CMB anisotropies we can probe fuctuations when their statistica distribution shoud have been cose to its primordia one; possibe deviations from Gaussianity of the primordia density fied are sma. If δ is a Gaussian random fied with average 0, its probabiity distribution is given by: P n (δ 1,, δ n ) = DetC 1 (2π) n/2 [ exp 1 ] 2 δt C 1 δ (25) where δ is a vector made by the δ i, C 1 denotes the inverse of the correation matrix which eements are C ij = δ i δ j. An important property of Gaussian random fieds is that the Fourier transform of a Gaussian fied is sti Gaussian. The phases of the Fourier modes are random and the rea and imaginary part of the coefficients have Gaussian distribution and are mutuay independent. Let us denote the rea and imaginary part of δ k by Reδ k and Imδ k respectivey. Their 5 Note that δ = 0 18

joint probabiity distribution is the bivariate Gaussian: P(Reδ k, Imδ k )dreδ k dimδ k = 1 [ exp Reδ2 k + Imδ 2 ] k dreδ 2πσk 2 2σk 2 k dimδ k (26) where σ 2 k is the variance in Reδ k and Imδ k and for isotropy it depends ony on the magnitude of k. Equation (26) can be re-written in terms of the ampitude δ k and the phase φ k : P( δ k, φ k )d δ k dφ k = 1 [ exp δ k 2 ] δ 2πσk 2 2σk 2 k d δ k dφ k (27) that is δ k foows a Rayeigh distribution. From this foows that the probabiity that the ampitude is above a certain threshod X is: [ P( δ k 2 1 > X) = exp δ k 2 ] [ δ X σk 2 2σk 2 k d δ k = exp X ]. (28) δ k 2 Which is an exponentia distribution. The fact that the phases of a Gaussian fied are random, impies that the two point correation function (or the power spectrum) competey specifies the fied. P.S. If your advisor now asks you to generate a Gaussian random fied you know how to do it. (It you are not famiiar with Fourier transforms see next section) - The observed fuctuation fied however is not Gaussian. The observed gaaxy distribution is highy non-gaussian principay due to gravitationa instabiity. To competey specify a non-gaussian distribution higher order correation functions are needed 6 ; conversey deviations from Gaussian behavior can be characterized by the higher-order statistics of the distribution. B. Basic toos The Fourier transform of the (fractiona) overdensity fied δ is defined as: δ k = A d 3 rδ( r) exp[ i k r] (29) 6 For non pathoogica distributions. For a discussion see e.g. (17). 19

with inverse δ( r) = B d 3 kδ k exp[i k r] (30) and the Dirac deta is then given by δ D ( k) = BA d 3 r exp[±i k r] (31) Here I chose the convention A = 1, B = 1/(2π) 3, but aways beware of the FT conventions. The two point correation function (or correation function) is defined as: ξ(x) = δ( r)δ( r + x) = < δ k δ k > exp[i k r] exp[i k ( r + x)]d 3 kd 3 k (32) because of isotropy ξ( x ) (ony a function of the distance not orientation). Note that in some cases when isotropy is broken one may want to keep the orientation information (see e.g. redshift space distortions, which affect custering ony aong the ine-of sight ). The definition of the power spectrum P(k) foows : < δ k δ k >= (2π) 3 P(k)δ D ( k + k ) (33) again for isotropy P(k) depends ony on the moduus of the k-vector, athough in specia cases where isotropy is broken one may want to keep the direction information. Since δ( r) is rea. we have that δ k = δ k, so < δ k δ k >= (2π) 3 d 3 xξ(x) exp[ i k x]δ d ( k k ) (34). The power spectrum and the correation function are Fourier transform pairs: ξ(x) = 1 P(k) exp[i (2π) k r]d 3 k (35) 3 P(k) = ξ(x) exp[ i k x]d 3 x (36) At this stage the same amount of information is encosed in P(k) as in ξ(x). From here the variance is σ 2 =< δ 2 (x) >= ξ(0) = 1 (2π) 3 P(k)d 3 k (37) or better σ 2 = 2 (k)d nk where 2 (k) = 1 (2π) 3k3 P(k) (38) 20

and the quantity 2 (k) is independent form the FT convention used. Now the question is: on what scae is this variance defined? Answer: in practice one needs to use fiters: the density fied is convoved with a fiter (smoothing) function. There are two typica choices: f = 1 f = exp[ 1/2x 2 /R (2π) 3/2 R G] 2 Gaussian f G 3 k = exp[ k 2 RG/2] 2 (39) 1 Θ(x/R (4π)RT 3 T ) TopHat f k = roughy R T 5R G. 3 (kr T ) 3[sin(kR T) kr T cos(kr T )] (40) Remember: Convoution in rea space is a mutipication in Fourier space; Mutipication in rea space is a convoution in Fourier space. - Exercise: consider a muti-variate Gaussian distribution: P(δ 1..δ n ) = 1 (2π) n/2 detc 1/2 exp[ 1 2 δt C 1 δ] (41) where C ij =< δ i δ j > is the covariance. Show that is δ i are Fourier modes then C ij is diagona. This is an idea case, of course but this is teing us that for Gaussian fieds the different k modes are independent! which is aways a nice feature. Another question for you: if you start off with a Gaussian distribution (say from Infation) and then eave this Gaussian fied δ to evove under gravity, wi it remain Gaussian forever? Hint: think about present-time Universe, and think about the dark matter density at, say, the center of a big gaaxy and in a arge void. 1. The importance of the Power spectrum The structure of the Universe on arge scaes is argey dominated by the force of gravity (which we think we know we) and no too much by compex mechanisms (baryonic physics, gaaxy formation etc.)- or at east that s the hope... Theory (see ectures on infation) give us a prediction for the primordia power spectrum: ( ) n k P(k) = A (42) 21 k 0

n - the spectra index is often taken to be a constant and the power spectrum is a power aw power spectrum. However there are theoretica motivations to generaize this to ( k P(k) = A k 0 ) n(k0 )+ 1 dn 2 d n k n(k/k 0) (43) as a sort of Tayor expansion of n(k) around the pivot point k 0. dn/d nk is caed the running of the spectra index. Note that different authors often use different choices of k 0 (sometimes the same author in the same paper uses different choices...) so things may get confused... so et s report expicity the conversions: ( k A(k 1 ) = A(k 0 ) k 0 ) n(k0 )+1/2(dn/d n k)n(k 1 /k 0 ) (44) Exercise: Prove the equations above. Exercise:Show that given the above definition of the running of the spectra index, n(k) = n(k 0 ) + dn/d nk n(k/k 0 ). It can be shown that as ong as inear theory appies and ony gravity is at pay δ << 1, different Fourier modes evove independenty and the Gaussian fied remains Gaussian. In addition, P(k) changes ony in ampitude and not in shape except in the radiation to matter dominated era and when there are baryon-photon interactions and baryons-dark matter interactions (see Lectures on CMB). In detai, this is described by inear perturbation growth and by the transfer function. C. Exampes of rea word issues Say that now you go and try to measure a P(k) from a reaistic gaaxy cataog. What are the rea word effects you may find? We have mentioned before redshift space distortions. Here we concentrate on other effects that are more genera (and not so specific to arge-scae structure anaysis). 22

1. Discrete Fourier transform In the rea word when you go and take the FT of your survey or even of your simuation box you wi be using something ike a fast Fourier transform code (FFT) which is a discrete Fourier transform. If your box has side of size L, even if δ(r) in the box is continuous, δ k wi be discrete. The k-modes samped wi be given by k = ( 2π L ) (i, j, k) where k = 2π L (45) The discrete Fourier transform is obtained by pacing the δ(x) on a attice of N 3 grid points with spacing L/N. Then: δk DFT = 1 exp[ i N k r]δ( r) (46) 3 δ DFT ( r) = k r exp[i k r]δ DFT k (47) Beware of the mapping between r and k, some routines use a weird wrapping! There are different ways of pacing gaaxies (or partice in your simuation) on a grid: Nearest grid point, Coud in ce, tranguar shaped coud etc... For each of these remember(!) then to deconvove the resuting P(k) for their effect. Note that and thus δ k ( ) x 3 N 3 δk DFT 1 2π k 3δDFT k (48) P(k) < δdft 2 > ( k) 3 since δ D (k) δk ( k) 3 (49) The discretization introduces severa effects: The Nyquist frequency: k Ny = 2π L N 2 is that of a mode which is samped by 2 grid points. Higher frequencies cannot be propery samped and give aiasing (spurious transfer of power) effects. You shoud aways work at k < k Ny. There is aso a minimum k (argest possibe scae) that you finite box can test :k min > 2π/L. This is one of the many reason why one needs ever arger N-body simuations... In addition DFT assume periodic boundary conditions, if you do not have periodic boundary conditions then this aso introduces aiasing. 23

2. Window, seection function, masks etc Seection function: Gaaxy surveys are usuay magnitude imited, which means that as you ook further away you start missing some gaaxies. The seection function tes you the probabiity for a gaaxy at a given distance (or redshift z) to enter the survey. It is a mutipicative effect aong the ine of sight in rea space. Window or mask You can never observe a perfect (or even better infinite) squared box of the Universe and in CMB studies you can never have a perfect fu sky map (we ive in a gaaxy...). The mask (sky cut in CMB jargon) is a function that usuay takes vaues of 0 or 1and is defined on the pane of the sky (i.e. it is constant aong the same ine of sight). The mask is aso a rea space mutipication effect. In addition sometimes in CMB studies different pixes may need to be weighted differenty, and the mask is an extreme exampe of this where the weights are either 0 or 1. Aso this operation is a rea space mutipication effect. Let s reca that a mutipicaton in rea space (where W( x) denotes the effects of window and seection functions) δ true ( x) δ obs ( x) = δ true ( x)w( x) (50) is a convoution in Fourier space: δ true ( k) δ obs ( k) = δ true ( k) W( k) (51) the sharper W( r) is the messier and deocaized W( k) is. As a resut it wi coupe different k- modes even if the underying ones were not correated! Discreteness Whie the dark matter distribution is amost a continuous one the gaaxy distribution is discrete. We usuay assume that the gaaxy distribution is a samping of the dark matter distribution. The discreteness effect give the gaaxy distribution a Poisson contribution (aso caed shot noise contribution). Note that the Poisson contribution is non Gaussian: it is ony in the imit of arge number of objects (or of modes) that it approximates a Gaussian. Here it wi suffice to say that as ong as a gaaxy number density is high enough (which wi need to be quantified and checked for any practica appication) and we have enough modes, we say that we wi have a superposition of our random fied (say the dark matter one characterized by its P(k)) pus a white noise contribution coming from the discreteness which ampitude depends on the 24

average number density of gaaxies (and shoud go to zero as this go to infinity), and we treat this additiona contribution as if it has the same statistica properties as the underying density fied (which is an approximation). What is the shot noise effect on the correation properties? Foowing (23) we recognize that our random fied is now given by f( x) = n( x) = n[1 + δ( x)] = i δ D ( x x i ) (52) where n denotes average number of gaaxies: n =< i δ D ( x x i ) >. Then, as done when introducing the Poisson distribution, we divide the voume in infinitesima voume eements δv so that their occupation can ony be 0 or 1. For each of these voumes the probabiity of getting a gaaxy is δp = ρ( x)δv, the probabiity of getting no gaaxy is δp = 1 ρ( x)δv and < n i >=< n 2 i >= nδv. We then obtain a doube stochastic process with one eve of randomness coming from the underying random fied and one eve coming from the Poisson samping. The correation function is obtained as: ij δ D ( r 1 r i )δ D ( r 2 r j ) = n 2 (1 + ξ 12 ) + nδ D ( r 1 r 2 ) (53) thus < n 1 n 2 >= n 2 [1+ < δ 1 δ 2 > d ] where < δ 1 δ 2 > d = ξ(x 12 ) + 1 n δd ( r 1 r 2 ) (54) and in Fourier space ( < δ k1 δ k2 > d = (2π) 3 P(k) + 1 n ) δ d ( k 1 + k 2 ) (55) This is not a compete surprise: the power spectrum of a superposition of two independent processes is the sum of the two power spectra... 3. pros and cons of ξ(r) and P(k) Let us briefy recap the pros and cons of working with power spectra or correation functions. Power spectra Pros: Direct connection to theory. Modes are uncorreated (in the idea case). The average density ends up in P(k = 0) which is usuay discarted, so no accurate knowedge of the mean density is needed. There is a cear distinction between inear and non-inear scaes. 25

Smoothing is not a probem (just a mutipication) Cons: window and seection functions act as compicated convoutions, introducing mode couping! (this is a serious issue) Correation function Pros: no probem with window and seection function Cons: scaes are correated. covariance cacuation a rea chaenge even in the idea case. Need to know mean densities very we. No cear distinction between inear and non-inear scaes. No direct correspondence to theory. 4.... and for CMB? If we can observe the fu sky the the CMB temperature fuctuation fied can be nicey expanded in spherica harmonics: T(ˆn) = a m Y m (ˆn). (56) >0 m= where a m = dω n T(ˆn)Y m(ˆn). (57) and thus < a m 2 >= a m a m = δ δ mm C (58) C is the anguar power spectrum and C = 1 (2 + 1) m= a m 2 (59) Now what happens in the presence of rea word effects such as a sky cut? Anaogousy to the rea space case: ã m = dω n T(ˆn)W(ˆn)Y m(ˆn) (60) where W(ˆn) is a position dependent weight that in particuar is set to 0 on the sky cut. As any CMB observation gets pixeized this is ã m = Ω p p T(p)W(p)Ym (p) (61) where p runs over the pixes and Ω p denotes the soid ange subtended by the pixe. 26

Ceary this can be a probem (this can be a nasty convoution), but et us initiay ignore the probem and carry on. The pseudo-c s (Hivon et a 2002)(15) are defined as: Ceary C C but C = 1 (2 + 1) m= ã m 2 (62) C = G C (63) where <> denotes the ensambe average. We notice aready two things: as expected the effect of the mask is to coupe otherwhise uncorreated modes. In arge scae structure studies usuay peope stop here: convove the theory with the various rea word effects incuding the mask and compare that to the observed quantities. In CMB usuay we go beyond this step and try to deconvove the rea word effects. First of a note that G 1 2 = 2 2 + 1 4π 3 (2 3 + 1)W 3 ( 1 2 3 0 0 0 ) 2 (64) where W = 1 2 + 1 W m 2 and W m = m dω n W(ˆn)Y m (ˆn) (65) So if you are good enough to be abe to invert G and you can say that < C > is the C you want then C = G 1 C (66) Not for a experiments it is viabe (possibe) to do this ast step. In adddition to this, the instrument has other effects such as noise and a finite beam. 5. Noise and beams Instrumenta noise and the finite resoution of any experiment affect the measured C. The effect of the noise is easiy found: the instrumenta noise is an independent random process with a Gaussian distribution superposed to the temperature fied. In a m space a m a signa m + a noise m. 27

Whie a noise = 0, in the power spectrum this gives rise to the so-caed noise bias: where C noise m a biased estimator. C measured = C signa + C noise (67) = (2 + 1) m a noise m 2. As the expectation vaue of C noise is non zero, this is Note that the noise bias disappears if one computes the so-caed cross C obtained as a cross-correation between different, uncorreated, detectors (say detector a and b) as a noise,a m a noise,b m = 0. One is however not getting something for nothing: when one computes the covariance (or the associated error) for auto and for cross correation C (exercise!) the covariance is the same and incudes the extra contribution of the noise. It is ony that the cross-c are unbiased estimators. Every experiment sees the CMB with a finite resoution given by the experimenta beam (simiar concept to the Point Spread Function for optica astronomy). The observed temperature fied is smoothed on the beam scaes. Smoothing is a convoution in rea space: T i = dω n T(ˆn)b( ˆn ˆn ) (68) where we have considered a symmetric beam for simpicity. The beam is often we approximated by a Gaussian of a given Fu Width at Haf Maximum. σ b = 0.425FWHM. Thus in harmonic space the beam effect is a mutipication: Remember that C measured = C sky e 2 σ 2 b (69) and in the presence of instrumenta noise C measured = C sky e 2 σb 2 + C noise (70) Of course, one can aways deconvove for the effects of the beam to obtain an estimate of C measured as cose as possibe to C sky : C measured = C sky + C noise e 2 σ 2 b. (71) That is why it is often said that the effective noise bows up at high (sma scaes) and why it is important to know the beam(s) we. 28

Exercise: What happens if you use cross-c s? Exercise: What happens to C measured if the beam is poory reconstructed? - Note that the signa to noise of a CMB map depends on the pixe size (by smoothing the map and making arger pixes the noise per pixe wi decrease as Ω pix, Ω pix being the new pixe soid ange), on the integration time σ pix = s/ t where s is the detector sensitivity and t the time spent on a given pixe and on the number of detectors σ pix = s/ M where M is the number f detectors. To compare maps of different beam sizes it is usefu to have a noise measure that in independent of Ω pix : w = (σpix 2 Ω pix) 1. - Exercise: Compute the expression for C noise t = observing time s = detector sensitivity (in µk/ s) n = number of detectors N = number of pixes f sky = fraction of the sky observed given: Assume uniform noise and observing time uniformy distributed. You may find (18) very usefu. 6. Aside: Higher orders correations From what we have earned so far we can concude that the power spectrum (or the correation function) competey characterizes the statistica properties of the density fied if it is Gaussian. But what if it is not? Higher order correations are defined as: δ 1...δ m where the detas can be in rea space giving the correation function or in Fourier space giving power spectra. At this stage, it is usefu to present here the Wick s theorem (or cumuant expansion 29

theorem). The correation of order m can in genera be written as sum of products of unreducibe (connected) correations of order for = 1...m. For exampe for order 3 we obtain: δ 1 δ 2 δ 3 f = (72) δ 1 δ 2 δ 3 + (73) δ 1 δ 2 δ 3 + (3cyc.terms) (74) δ 1 δ 2 δ 3 (75) and for order 6 (but for a distribution of zero mean): < δ 1...δ 6 > f = (76) < δ 1 δ 2 >< δ 3 δ 4 >< δ 5 δ 6 > +..(15terms) (77) < δ 1 δ 2 >< δ 3 δ 4 δ 5 δ 6 > +..(15terms) (78) < δ 1 δ 2 δ 3 >< δ 4 δ 5 δ 6 > +...(10terms) (79) < δ 1..δ 6 > (80) For computing covariances of power spectra, it is usefu to be famiiar with the above expansion of order 4. V. MORE ON LIKELIHOODS Whie the CMB temperature distribution is Gaussian (or very cose to Gaussian) the C distribution is not. At high the Centra Limit Theorem wi ensure that the ikeihood is we approximated by a Gaussian but at ow this is not the case. L(T C th ) exp[ (TS 1 T)/2] det(s) (81) where T denotes a vector of the temperature map, C th denotes the C given by a theoretica mode (e.g. a cosmoogica parameters set), and S ij is the signa covariance: S ij = (2 + 1) C th 4π P ( ˆn i ˆn j ) (82) and P denote the Legendre poynomias. 30

If we then expand T in spherica harmonics we obtain: ] L(T C th ) exp[ 1/2 a m 2 /C th C th Isotropy means that we can sum over m s thus: 2 n L = [ ( ) C th (2 + 1) n + C data ( ) ] C data 1 C th (83) (84) where C data = m a m 2 /(2 + 1). - Exercise: show that for an experiment with (gaussian) noise the expression is the same but with the substitution C th C th + N with N denoting the power spectrum of the noise. Exercise: show that for a partia sky experiment (that covers a fraction of sky f sky you can approximatey write: n L f sky n L (85) Hint: Think about how the number of independent modes scaes with the sky area. As an aside... you coud ask: But what do I do with poarization data?. We... if the a T m are Gaussiany distributed aso the ae m and ab m wi be. So we can generaize the approach above using a vector (a T ma E ma B m). Let us consider a fu sky, idea experiment. Start by writing down the covariance, foow the same steps as above and show that: 2 n L = { ( ) ( C BB C TT C EE (C TE ) 2 ) (2 + 1) n + n Ĉ BB Ĉ TT Ĉ EE (ĈTE ) 2 where C denotes C th + ĈTT C EE C TT + C TT C EE and Ĉ denotes C data. It is easy to show that for a noisy experiment then C XY denotes the noise power spectrum and X, Y = {T, E, B}. Ĉ EE 2ĈTE C TE } + ĈBB 3, (86) (C TE ) 2 C BB C XY + N XY where N Exercise: generaize the above to partia sky coverage: for added compication take f TT sky fsky EE fbb sky. (this is often the case as the sky cut for poarization may be different from that of temperature( the foregrounds are different) and in genera the cut (or the weighting) for B may need to be arger than that for E. 31

Foowing (27) et us now expand in Tayor series Equation (84) around its maximum by writing Ĉ = C th (1 + ǫ). For a singe mutipoe, 2 n L = (2 + 1)[ǫ n(1 + ǫ)] (2 + 1) ( ǫ 2 2 ǫ3 3 + O(ǫ4 ) ). (87) We note that the Gaussian ikeihood approximation is equivaent to the above expression truncated at ǫ 2 : 2 n L Gauss, (2 + 1)/2[(Ĉ C th )/C th ] 2 (2 + 1)ǫ 2 /2. Aso widey used for CMB studies is the ognorma ikeihood for the equa variance approximation (Bond et a 1998): approximation is 2 n L LN = (2 + 1) 2 [ n ( Ĉ C th )] 2 (2 + 1) ( ǫ 2 Thus our approximation of ikeihood function is given by the form, 2 ǫ3 2 ). (88) n L = 1 3 n L Gauss + 2 3 n L LN, (89) where and n L Gauss 1 (C th 2 n L LN = 1/2 (z th Ĉ)Q (C th Ĉ ), (90) ẑ )Q (z th ẑ ), (91) where z th = n(c th + N ), ẑ = n(ĉ + N ) and Q is the oca transformation of the curvature matrix Q to the ognorma variabes z, Q = (C th + N )Q (Ĉth + N ). (92) The curvature matrix is the inverse of the covariance matrix evauated at the maximum ikeihood. However we do not want to adopt the equa variance approximation, so Q for us wi be in inverse of the covariance matrix. as: The eements of the covariance matrix, even for a idea fu sky experiment can be written C = 2 (Cth )2 2 + 1 32 (93)

In the absence of noise the covariance is non zero: this is the cosmic variance. Note that for the atest WMAP reease (22; 26)at ow the ikeihood is computed directy from the maps m. The standard ikeihood is given by L( m S)d m = exp [ 1 2 mt (S + N) 1 m ] d m S + N 1/2 (2π) 3np/2, (94) where m is the data vector containing the temperature map, T, as we as the poarization maps, Q, and U, n p is the number of pixes of each map, and S and N are the signa and noise covariance matrix (3n p 3n p ), respectivey. As the temperature data are competey dominated by the signa at such ow mutipoes, noise in temperature may be ignored. This simpifies the form of ikeihood as [ exp 1 m ] t ( S 2 P + N P ) 1 m d m L( m S)d m = S P + N P 1/2 (2π) np ( exp 1T t S 1 2 T T ) dt S T 1/2 (2π) np/2, (95) where S T is the temperature signa matrix (n p n p ), the new poarization data vector, m = ( Q p, Ũ p ) and S P is the signa matrix for the new poarization vector with the size of 2n p 2n p. At the time of writing, in CMB parameter estimates for < 2000, the ikeihood cacuation is the botteneck of the anaysis. VI. MONTE CARLO METHODS A. Monte Caro error estimation Let s go back to the issue of parameter estimation and error cacuation. Here is the conceptua interpretation of what it means that en experiment measures some parameters (say cosmoogica parameters). There is some underying true set of parameters α true that are ony known to Mother Nature but not to the experimenter. There true parameters are statisticay reaized in the observabe universe and random measurement errors are then incuded when the observabe universe gets measured. This reaization gives the measured data D 0. Ony D 0 is accessibe to the observer (you). Then you go and do what you have to do to estimate the parameters and their errors (chi-square, ikeihood, etc...) and get α 0. Note that D 0 is not a unique reaization of the true mode given by α true : there coud be infinitey many other reaizations as hypothetica data sets, which coud have 33

been the measured one: D 1, D 2,... each of them with a sighty different fitted parameters α 1, α 2... α 0 is one parameter set drawn from this distribution. The hypotetica ensambe of universes described by α i is caed ensambe, and one expects that the expectation vaue α i = α true. If we knew the distribution of α i α true we woud know everything we need about the uncertainties in our measurement α 0. The goa is to infer the distribution of α i α true without knowing α true. Here s what we do: we say that hopefuy α 0 is not too wrong and we consider a fictitious word where α 0 was the true one. So it woud not be such a big mistake to take the probabiity distribution of α 0 α i to be that of α true α i. In many cases we know how to simuate α 0 α i and so we can simuate many synthetic reaization of words where α 0 is the true underying mode. Then mimic the observation process of these fictitious Universes repicating a the observationa errors and effects and from each of these fictitious universe estimate the parameters. Simuate enough of them and from α i S α 0 you wi be abe to map the desired muti-dimensiona probabiity distribution. With the advent of fast computers this technique has become increasingy widespread. As ong as you beieve you know the underying distribution and that you beieve you can mimic the observation repicating a the observationa effects this technique is extremey powerfu and, I woud say, indispensabe. B. Monte Caro Markov Chains When deaing with high dimensiona ikeihoods (i.e. many parameters) the process of mapping the ikeihood (or the posterior) surface can become very expensive. For exampe for CMB studies the modes considered have from 6 to 11+ parameters. Every mode evauation even with a fact code such as CAMB can take up to minutes per iteration. A grid-based ikeihood anaysis woud require prohibitive amounts of CPU time. For exampe, a coarse grid ( 20 grid points per dimension) with six parameters requires 6.4 10 7 evauations of the power spectra. At 1.6 seconds per evauation, the cacuation woud take 1200 days. Christensen & Meyer (2000) (4) proposed using Markov Chain Monte Caro (MCMC) to investigate the ikeihood space. This approach has become the standard too for CMB anayses. MCMC is a method to simuate posterior distributions. In particuar one simuates samping the posterior distribution P(α x), of a set of parameters α given event x, obtained 34

via Bayes Theorem P(α x) = P(x α)p(α) P(x α)p(α)dα, (96) where P(x α) is the ikeihood of event x given the mode parameters α and P(α) is the prior probabiity density; α denotes a set of cosmoogica parameters (e.g., for the standard, fat ΛCDM mode these coud be, the cod-dark matter density parameter Ω c, the baryon density parameter Ω b, the spectra sope n s, the Hubbe constant in units of 100 km s 1 Mpc 1 ) h, the optica depth τ and the power spectrum ampitude A), and event x wi be the set of observed Ĉ. The MCMC generates random draws (i.e. simuations) from the posterior distribution that are a fair sampe of the ikeihood surface. From this sampe, we can estimate a of the quantities of interest about the posterior distribution (mean, variance, confidence eves). The MCMC method scaes approximatey ineary with the number of parameters, thus aowing us to perform ikeihood anaysis in a reasonabe amount of time. A propery derived and impemented MCMC draws from the joint posterior density P(α x) once it has converged to the stationary distribution. The primary consideration in impementing MCMC is determining when the chain has converged. After an initia burnin period, a further sampes can be thought of as coming from the stationary distribution. In other words the chain has no dependence on the starting ocation. Another fundamenta probem of inference from Markov chains is that there are aways areas of the target distribution that have not been covered by a finite chain. If the MCMC is run for a very ong time, the ergodicity of the Markov chain guarantees that eventuay the chain wi cover a the target distribution, but in the short term the simuations cannot te us about areas where they have not been. It is thus crucia that the chain achieves good mixing. If the Markov chain does not move rapidy throughout the support of the target distribution because of poor mixing, it might take a prohibitive amount of time for the chain to fuy expore the ikeihood surface. Thus it is important to have a convergence criterion and a mixing diagnostic. Pots of the samped MCMC parameters or ikeihood vaues versus iteration number are commony used to provide such criteria (eft pane of Figure 3). However, sampes from a chain are typicay seriay correated; very high autocorreation eads to itte movement of the chain and thus makes the chain to appear to have converged. For a more detaied discussion see (10). Using a MCMC that has not fuy expored the ikeihood surface for determining cosmoogica parameters wi yied wrong resuts. See right pane of Figure 3). 35

C. Markov Chains in Practice Here are the necessary steps to run a simpe MCMC for the CMB temperature power spectrum. It is straightforward to generaize these instructions to incude the temperaturepoarization power spectrum and other datasets. The MCMC is essentiay a random wak in parameter space, where the probabiity of being at any position in the space is proportiona to the posterior probabiity. 1) Start with a set of cosmoogica parameters {α 1 }, compute the C 1 L 1 = L(C 1th Ĉ). and the ikeihood 2) Take a random step in parameter space to obtain a new set of cosmoogica parameters {α 2 }. The probabiity distribution of the step is taken to be Gaussian in each direction i with r.m.s given by σ i. We wi refer beow to σ i as the step size. The choice of the step size is important to optimize the chain efficiency (see discussion beow) 3) Compute the C 2th for the new set of cosmoogica parameters and their ikeihood L 2. 4.a) If L 2 /L 1 1, take the step i.e. save the new set of cosmoogica parameters {α 2 } as part of the chain, then go to step 2 after the substitution {α 1 } {α 2 }. 4.b) If L 2 /L 1 < 1, draw a random number x from a uniform distribution from 0 to 1. If x L 2 /L 1 do not take the step, i.e. save the parameter set {α 1 } as part of the chain and return to step 2. If x < L 2 /L 1, take the step, i.e. do as in 4.a). 5) For each cosmoogica mode run four chains starting at randomy chosen, weseparated points in parameter space. When the convergence criterion is satisfied and the chains have enough points to provide reasonabe sampes from the a posteriori distributions (i.e. enough points to be abe to reconstruct the 1- and 2-σ eves of the marginaized ikeihood for a the parameters) stop the chains. It is cear that the MCMC approach is easiy generaized to compute the joint ikeihood of WMAP data with other datasets. 36

D. Improving MCMC Efficiency MCMC efficiency can be seriousy compromised if there are degeneracies among parameters. The typica exampe is the degeneracy between Ω m and H for a fat cosmoogy or that between Ω m and Ω Λ for a non-fat case (see e.g. reference (26)). The Markov chain efficiency can be improved in different ways. Here we report the simpest way. Reparameterization We describe beow the method we use to ensure convergence and good mixing. Degeneracies and poor parameter choices sow the rate of convergence and mixing of the Markov Chain. There is one near-exact degeneracy (the geometric degeneracy) and severa approximate degeneracies in the parameters describing the CMB power spectrum Bond et a (1994) (1), Efstathiou& Bond(1984) (2). The numerica effects of these degeneracies are reduced by finding a combination of cosmoogica parameters (e.g., Ω c, Ω b, h, etc.) that have essentiay orthogona effects on the anguar power spectrum. The use of such parameter combinations removes or reduces degeneracies in the MCMC and hence speeds up convergence and improves mixing, because the chain does not have to spend time exporing degeneracy directions. Kosowsky, Miosavjevic & Jimenez (2002) (19) and Jimenez et a (2003) (16) introduced a set of reparameterizations to do just this. In addition, these new parameters refect the underying physica effects determining the form of the CMB power spectrum (we wi refer to these as physica parameters). This eads to particuary intuitive and transparent parameter dependencies of the CMB power spectrum. For the 6 parameters LCDM mode these norma or physica parameters are: the physica energy densities of cod dark matter, ω c Ω c h 2, and baryons, ω b Ω b h 2, the characteristic anguar scae of the acoustic peaks, where a dec is the scae factor at decouping, θ A = r s(a dec ) D A (a dec ), (97) r s (a dec ) = c (98) H 0 3 adec dx [( ) 0 1 + 3Ω b 4Ω γ ((1 Ω)x2 + Ω Λ x 1 3w + Ω m x + Ω rad ) ] 1/2 37

is the sound horizon at decouping, and d A (a dec ) = a S κ (r dec ) H 0 Ω 1 (99) where 1 r(a dec ) = Ω 1 a dec dx [(1 Ω)x 2 + Ω Λ x 1 3w + Ω m x + Ω rad ] 1/2 (100) and S κ (r) as usua coincides with the argument if the curvature κ is 0, is a sin function for Ω > 1 and a sinh function otherwhise. Here H 0 denotes the Hubbe constant and c is the speed of ight, Ω m = Ω c + Ω b, Ω Λ denotes the dark energy density parameters, w is the equation of state of the dark energy component, Ω = Ω m + Ω Λ and the radiation density parameter Ω rad = Ω γ + Ω ν, Ω γ, Ω ν are the the photon and neutrino density parameters respectivey. For reionization sometmes the parameter Z exp( 2τ) is used, where τ denotes the optica depth to the ast scattering surface (not the decouping surface). These reparameterizations are usefu because the degeneracies are non-inear, that is they are not we described by eipses in parameter space. For degeneracies that are we approximated by eipses in parameter space it is possibe to find the best reparameterization automaticay. This is what the code CosmoMC (5; 6) (see tutorias) does. To be more precise it computes the parameters covariance matrix from which the axes of the muti-dimensiona degeneracy eipse can be found. Then it performs a rotation and re-scaing of the coordinates (i.e. the parameters) to transform the degeneracy eipse in an azimutay symmetric contour. See discussion at http://cosmoogist.info/notes/cosmomc.pdf for more information. This technique can improve the MCMC efficiency up to a factor of order 10. Step size optimization The choice of the step size in the Markov Chain is crucia to improve the chain efficiency and speed up convergence. If the step size is too big, the acceptance rate wi be very sma; if the step size is too sma the acceptance rate wi be high but the chain wi exhibit poor mixing. Both situations wi ead to sow convergence. E. Convergence and Mixing Before we even start this section: thou sha aways use a convergence and mixing criterion when running MCMC s. Let s iustrate here the method proposed by Geman & Rubin 1992 (8) as an exampe. They advocate comparing severa sequences drawn from different starting points and check- 38

ing to see that they are indistinguishabe. This method not ony tests convergence but can aso diagnose poor mixing. Let us consider M chains starting at we-separated points in parameter space; each has 2N eements, of which we consider ony the ast N: {y j i } where i = 1,.., N and j = 1,.., M, i.e. y denotes a chain eement (a point in parameter space) the index i runs over the eements in a chain the index j runs over the different chains. We define the mean of the chain and the mean of the distribution ȳ j = 1 N ȳ = 1 NM We then define the variance between chains as and the variance within a chain as The quantity B n = 1 M 1 W = ˆR = 1 M(N 1) N yi, j (101) i=1 NM ij=1 y j i. (102) M (ȳ j ȳ) 2, (103) j=1 (y j i ȳ j ) 2. (104) ij N 1 W + B ( ) N n 1 + 1 M W (105) is the ratio of two estimates of the variance in the target distribution: the numerator is an estimate of the variance that is unbiased if the distribution is stationary, but is otherwise an overestimate. The denominator is an underestimate of the variance of the target distribution if the individua sequences did not have time to converge. The convergence of the Markov chain is then monitored by recording the quantity ˆR for a the parameters and running the simuations unti the vaues for ˆR are aways < 1.03. Needess to say that the cosmomc package offers severa convergence and mixing diagnostic toos as part of the getdist routine. Question: how does the MCMC sampe the prior if a one actuay computes is the ikeihood? 39

F. MCMC Output Anaysis Now that you have your mutipe chains and the convergence criterium says they are converged whot do you do? First discard burn in and merge the chains. Since the MCMC passes objective tests for convergence and mixing, the density of points in parameter space is proportiona to the posterior probabiity of the parameters. (Note that cosmomc saves repeated steps as the same entry in the fie but with a weight equa to the repetitions: the MCMC gives to each point in parameter space a weight proportiona to the number of steps the chain has spent at that particuar ocation.). The marginaized distribution is obtained by projecting the MCMC points. This is a great advantage compared to the grid-based approach where muti-dimensiona integras woud have to be performed. The MCMC basicay performs a Monte Caro integration. the density of points in the n-dimensiona space is proportiona to the posterior, and best fit parameters and muti-dimensiona confidence eves can be found as iustrated in the ast cass. Note that the goba maximum ikeihood vaue for the parameters does not necessariy coincide with the expectation vaue of their marginaized distribution if the ikeihood surface is not a muti-variate Gaussian. A virtue of the MCMC method is that the addition of extra data sets in the joint anaysis can efficienty be done with minima computationa effort from the MCMC output if the incusion of extra data set does not require the introduction of extra parameters or does not drive the parameters significanty away from the current best fit. If the ikeihood surface for a subset of parameters from an externa (independent) data set is known, or if a prior needs to be added a posteriori, the joint posterior surface can be obtained by mutipying the new probabiity distribution with the posterior distribution of the MCMC output. To be more precise: as the density of point (i.e. the weight) is directy proportiona to the posterior, then this is achieved by mutipying the weight by the new probabiity distribution. The cosmomc package aready incudes this faciity. VII. FISHER MATRIX What if you wanted to forecast how we a future experiment can do? There is the expensive but more accurate way and the cheap and quick way (but often ess accurate). The 40

expensive way is in the same spirit of Monte-Caros simuations discussed earier: simuate the observations and estimate the parameters as you woud do on the data. However often you want to have a much quicker way to forecasts parameters uncertainties, especiay if you need to quicky compare many different experimenta designs. This technique is the Fisher matrix approach. VIII. FISHER MATRIX The question of how accuratey one can measure mode parameters from a given data set (without simuating the data set) was answered more than 70 years ago by Fisher (1935) (9). Suppose your data set is given by m rea numbers x 1...x m arranged in a vector (they can be CMB map pixes, P(k) of gaaxies etc...) x is a random variabe with probabiity distribution which depends in some way on the vector of mode parameters α. The Fisher information matrix id defined as: 2 L F ij = (106) α i α j where L = n L. In practice you wi choose a fiducia mode and compute the above at the fiducia mode. In the one parameter case et s note that if the Likeihood is Gaussian then L = 1/2(α α 0 )/σα 2 where α 0 is the vaue that maximizes the ikeihood and sigma is the error on the parameter α. Thus the second derivative wrt α of L is 1/σα 2 as ong as σ α does not depend on α. In genera we can expand L in Tayor series around its maximum (or the fiducia mode). There by definition the first derivative of L wrt the parmeters is 0. L = 1 d 2 L 2 dα 2(α α 0) 2 (107) When 2 L = 1, which in the case when we can identify it with χ 2 with corresponds to 68% (or 1-σ) then 1/ d 2 L/dα 2 is the 1 sigma dispacement of α from α 0. The generaization to many variabes is beyond our scope here (see Kenda & Stuard 1977 (17)) et s just say that an estimate of the covariance for the parameters is given by: σ 2 α i,α j (F 1 ) ij (108) From here it foows that if a other parameters are kept fixed 1 σ αi (109) F ii 41

(i.e. the reciproca of the square root of the diagona eement ii of the Fisher matrix.) But if a other parameters are estimated from the data as we then the marginaized error is σ α = (F 1 ) 1/2 ii (110) (i.e. square root of the eement ii of the inverse of the Fisher matrix -perform a matrix inversion here!) In genera, say that you have 5 parameters and that you want to pot the joint 2D contours for parameters 2 and 4 marginaized over a other parameters 1,3,5. Then you invert F ij, take the minor 22, 24, 42, 24 and invert it back. The resuting matrix, et s ca it Q, describes a Gaussian 2D ikeihood surface in the parameters 2 and 4 or, in other words, the chisquare surface for parameters 2,4 - marginaized over a other parameters- can be described by the equation χ 2 = kq(α k α fiducia k )Q kq (α q αq fiducia ). From this equation, getting the errors corresponds to finding the quadraitic equation soution χ 2 = χ 2. For correspondence between χ 2 and confidence region see the earier discussion. If you want ti make pots, the equation for the eiptica boundary for the joint confidence region in the sub-space of parameters of interest is: = δ αq 1 δα. Note that in many appications the ikeihood for the data is assumed to be Gaussian and the data-covariance is assumed not to depend on the parameters. For exampe for the appication to CMB the fisher matrix is often computed as: F ij = (2 + 1) 2 C α j C α j (C + Ne σ2 2 ) 2 (111) This approximation is good in the high, noise dominated regime. In this regime the centra imit theorem ensures a Gaussian ikeihood and the cosmic variance contribution to the covariance is negigibe and thus the covariance does not depend on the cosmoogica parameters. However at ow in the cosmic variance dominated regime, this approximation over-estimates the errors (this is one of the few cases where the Fisher matrix approach over-estimates errors) by about a factor of 2. In this case it is therefore preferabe to go for the more numericay intensive option of computing eq. 106 with the exact form for the ikeihood Eq 86. Before we concude we report here a usefu identity: 2L = n det(c) + (x y) i C 1 ij (x y) T j = Tr[n C + C 1 D] (112) 42

where C stands for the covariance matrix of the data, repeated indices are summed over, x denotes the data and y the mode fitting the data and D is the data matrix defined as ( x y)( x y) t. We have used the identity ndet(c) = Tr nc. IX. CONCLUSIONS Wether you are a theorist or an experimentaist in cosmoogy, these days you cannot ignore the fact that to make the most of your data, statistica techniques need to be empoyed, and used correcty. An incorrect treatment of the data wi ead to nonsensica resuts. A given data set can reve a ot about the universe, but there wi aways things that are beyond the statistica power of the data set. It is crucia to recognize it. I hope I have given you the basis to be abe to earn this by yoursef. When in doubt aways remember: treat your data with respect. Acknowedgments We woud ike to thank Aain Connes, Jeff Harvey, Abert Schwarz and Nathan Seiberg for comments on the manuscript, and the ITP, Santa Barbara for hospitaity. This work was supported in part by DOE grant DE-FG05-90ER40559, and by RFFI grants 01-01-00549 and 00-15-96557. X. QUESTIONS Here are some of the questions (and their answers...) Q: What about the forgotten P(D) in the Bayes theorem? A: P(D) becomes important when one wants to compare different cosmoogica modes (not just different parameter vaues within the same cosmoogica mode). There is a somewhat extensive iterature in cosmoogy aone on this: Bayesian evidence is used to do this mode seection. Bayesian evidence cacuation can be, in many cases, numericay intensive. Q: What do you do when the ikeihood surface is very compicated for exampe is mutipeaked? A: Luckiy enough in CMB anaysis this is amost never the case (exceptions being, for exam- 43

pe, the cases where sharp osciations are present in the C th as it happens in transpanckian modes.) In other contexts this case is more frequent. In these cases, when additiona peaks can t be suppressed by a motivated prior, there are severa options: a) if m is the number of parameters, report the confidence eves by considering ony regions with m-dimensiona ikeihoods above a threshod before marginaizing (thus oca maxima wi be incuded in the confidence contours if significant enough) or b) simpy marginaize as described here, but expect that the marginaized peaks wi not coincide with the m-dimensiona peaks. The most chaenging issue when the ikeihood surface is compicated is having the MCMC to fuy expore the surface. Techniques such as Hamitonian Monte Caro are very powerfu in these cases see e.g. (11; 12) for? Q: The Fisher matrix approach is very interesting, is there anything I shoud ook out A: The Fisher matrix approach assumes that the parameters og-ikeihood surface can be quadraticay approximated around the maximum. This may or may not be a good approximation. It is aways a good idea to check at east in a reference case wether this approach significanty underestimated the errors. In many Fisher matrix impementation the ikeihood for the data is assumed to be Gaussian and the data-covariance is assumed not to depend on the parameters. Whie this approximation greaty simpifies the cacuations (Exercise: show why this is the case), it may significanty mis-estimate the size of the errors. In addition, the Fisher matrix cacuation often requires numerica evauation of second derivatives. Numerica derivatives aways need a ot of care and attention: you have been warned. Q: Does cosmomc use the samping described here? A: The recipe reported here is the so caed Metropois-Hasting agorithm. Cosmomc offers aso other samping agorithms: sice samping, or a spit between sow and fast parameters or the earn-propose option where chains automaticay adjust the proposa distribution by repeatedy computing the parameters covariance matrix on the fy. These techniques can greaty improve the MCMC efficiency. See for exampe http://cosmoogist.info/notes/cosmomc.pdf or (10) for more detais. 44

Acknowedgments I am indebited to A. Tayor for his ectures on statistics for beginners given at ROE in 1997. Aso a ot of my understanding of the basics come from Ned Wright Journa Cub in statistics on his web page. A ot of the techniques reported here were deveoped and/or tested as part of the anaysis of WMAP data between 2002 and 2006. Thanks aso to E. Linder and Y. Wang for reading and sending comments on the manuscript. Last, but not east, I woud ike to thank the organizers of the XIX Canary isands winter schoo, for a very stimuating schoo. XI. TUTORIALS Two tutoria sessions were organized as part of the Winter Schoo. You may want to try to repeat the same steps. The goas of the first tutoria were: Downoad and insta Heapix (7; 14). make sure you have a fortran 90 compier instaed and possiby aso IDL instaed The LAMBDA site (20) contains a ot of extremey usefu information: browse the page. Find the on-ine cacuator for the theory C given some cosmoogica parameters, generate a set of C and save them as a fits fie. Browse the hep pages of Heapix and find out what it does. In particuar the routine symfast enabes one to generate a map from a set of theory C. The routine anafast enabes one to compute C from a map. Using symfast, generate two maps with two different random number generators for the C you generated above. Seect a beam of FWHM of, say, 30, but for now do not imose any gaaxy cut and do not add noise. Compare the maps. To do that use moview. Why are they different? 45

Using anafast compute the C for both maps. Using your favorite potting routine (the IDL part of heapix offers utiities to do that) pot the orgina C and the two reaizations. Why do they differ? Deconvove for the beam, and make sure you are potting the correct units, the correct factors of ( + 1) etc. Why do the power spectra sti differ? Try to compute the C for max = 3000 and for a non fat mode with camb. It takes much onger than for a simpe, fat LCDM mode. A code ike CMBwarp (16) offers a shortcut. try it by downoading it at http://www.astro.princeton.edu/~rauj/cmbwarp/index.htm. Keep in mind however that this offers a fast fitting routine for a fine grid of pre-computed C, it is not a Botzmann code and thus is vaid ony within the quoted imits! The goas of the second tutoria were: Downoad and insta the CosmoMC (5) package (and read the instructions). Remember that WMAP data and ikeihood need to be dowoaded separatey from the LAMBDA site. Do this foowing the instructions. Take a ook at the params.ini fie. In here you set up the MCMC Set a chain to run. get famiar with the getdist program and distparams.ini fie. This program checks convergence for you and compute basic parameter estimation from converged chains. downoad from LAMBDA a set of chains, the suggested one was for the fat, quintessence mode for WMAP data ony. pot the marginaized probabiity distribution for the parameter w. the chaenge: Appy now a Hubbe constant prior to this chain, take the HST key project (13) constraint of H 0 = 72 ± 8 km/s/mpc and assume it has a Gaussian distribution. This procedure is caed importance samping. 46

References [1] Bond, J. R., Crittenden, R., Davis, R. L., Efstathiou, G., Steinhardt, P. J. (1994), Phys. Rev. Lett., 72, 13 [2] Bond, J. R., Efstathiou, G. (1984) ApJLett,285, L45 [3] Cash, W. (1979) ApJ 228 939 947 [4] Christensen, N. and Meyer, R. (2001) PRD, 64, 022001 [5] http://cosmoogist.info/cosmomc/ [6] Lewis, A., Bride, S. (2002) Phys. Rev. D, 66, 103511 [7] Górski, K. M. and Hivon, E. and Banday, A. J. and Wandet, B. D. and Hansen, F. K. and Reinecke, M. and Bartemann, M.(2005) ApJ 622 759-771 [8] Geman A., Rubin D. (1992) Statistica Science, 7, 457 [9] Fisher R.A. (1935) J. Roy. Stat. Soc., 98, 39 [10] Giks, W. R., Richardson, S., & Spiegehater, D. J. (1996), Markov Chain Monte Caro in Practice (London: Chapman and Ha) [11] Hajian A. (2007) Phys. Rev. D, 75, 083525 [12] Tayor J. F., Ashdown M. A. J., Hobson M. P. (2007) arxiv:0708.2989 [13] Freedman et a. (2000) ApJ, 553, 47 72 [14] http://heapix.jp.nasa.gov/ [15] Hivon, E. and Górski, K. M. and Netterfied, C. B. and Cri, B. P. and Prunet, S. and Hansen, F. (2002) ApJ 567 2 17 [16] Jimenez R., Verde, L., Peiris H., Kosowsky A. (2003) PRD, 70, 3005 [17] Kenda, M. and Stuart, A., The advanced theory of statistics., London: Griffin, 1977, 4th ed. [18] Knox, L.(1995) PrD, 52, 4307-4318. [19] Kosowsky, A., Miosavjevic, M., Jimenez, R. (2002) Phys. Rev. D, 66, 63007 [20] http://ambda.gsfc.nasa.gov/ [21] Martinez V. and Saar E. (2002), Statistics of the Gaaxy Distribution Chapman & Ha/CRC Press [22] Page, L. and Hinshaw, G. and Komatsu, E. and Nota, M. R. and Sperge, D. N. and Bennett, C. L. and Barnes, C. and Bean, R. and Doré, O. and Dunkey, J. and Hapern, M. and Hi, 47

R. S. and Jarosik, N. and Kogut, A. and Limon, M. and Meyer, S. S. and Odegard, N. and Peiris, H. V. and Tucker, G. S. and Verde, L. and Weiand, J. L. and Woack, E. and Wright, E. L. (2007) ApJS, 170 335 376 [23] Peebes, P. J. E., The arge-scae structure of the universe, Princeton, N.J., Princeton University Press, 1980. [24] Press, W. H. and Teukosky, S. A. and Vettering, W. T. and Fannery, B. P., Numerica recipes in FORTRAN. The art of scientific computing Cambridge: University Press, 1992. [25] Sperge, D. N. and Verde, L. and Peiris, H. V. and Komatsu, E. and Nota, M. R. and Bennett, C. L. and Hapern, M. and Hinshaw, G. and Jarosik, N. and Kogut, A. and Limon, M. and Meyer, S. S. and Page, L. and Tucker, G. S. and Weiand, J. L. and Woack, E. and Wright, E. L. (1979) ApJS 148 175 194 [26] Sperge, D. N. and Bean, R. and Doré, O. and Nota, M. R. and Bennett, C. L. and Dunkey, J. and Hinshaw, G. and Jarosik, N. and Komatsu, E. and Page, L. and Peiris, H. V. and Verde, L. and Hapern, M. and Hi, R. S. and Kogut, A. and Limon, M. and Meyer, S. S. and Odegard, N. and Tucker, G. S. and Weiand, J. L. and Woack, E. and Wright, E. L., (2007) ApJS 170, 377 408 [27] Verde, L. and Peiris, H. V. and Sperge, D. N. and Nota, M. R. and Bennett, C. L. and Hapern, M. and Hinshaw, G. and Jarosik, N. and Kogut, A. and Limon, M. and Meyer, S. S. and Page, L. and Tucker, G. S. and Woack, E. and Wright, E. L., (2003) ApJS 148, 195 211 [28] Wa J. V., Jenkins C. R. (2003) Practica statistic s for Astronomers, Cambridge University Press. 48

Figures FIG. 1 Exampe of inear fit to the data points D i in 2D. 49

Periodic boundary conditions Smooth transition sharp transition Non periodic boundary conditions FIG. 2 The importance of periodic boundary conditions 50

FIG. 3 Unconverged Markov chains. The eft pane shows a trace pot of the ikeihood vaues versus iteration number for one MCMC (these are the first 3000 steps from run). Note the burn-in for the first 100 steps. In the right pane, red dots are points of the chain in the (n, A) pane after discarding the burn-in. Green dots are from another MCMC for the same data-set and the same mode. It is cear that, athough the trace pot may appear to indicate that the chain has converged, it has not fuy expored the ikeihood surface. Using either of these two chains at this stage wi give incorrect resuts for the best fit cosmoogica parameters and their errors. Figure from ref (27) Tabes 51

TABLE I χ 2 for the conventionas 1,2,and 3 σ as a function of the number of parameters for the joint confidence eves. σ p 1 2 3 1-σ 68.3% 1.00 2.30 3.53 90% 2.71 4.61 6.25 2-σ 95.4% 4.00 6.17 8.02 3-σ 99.73% 9.00 11.8 14.2 52