Statistical Distributions in Astronomy

Save this PDF as:

Size: px
Start display at page:

Transcription

1 Data Mining In Modern Astronomy Sky Surveys: Statistical Distributions in Astronomy Ching-Wa Yip Bloomberg 518

2 From Data to Information We don t just want data. We want information from the data. Information Database Sensors Data Analysis or Data Mining

3 Why Do We Use Statistics to Analyze Data? Describe the data: What is the mean, median, mode? What is the standard deviation? What is the distribution? What are the outliers? Find trends in the data: Is there a correlation between two variables? How well are they correlated? What is the predicted value of a variable?

4 R A free statistical software. Many packages available for performing from basic to advanced statistical analyses. Popular in multiple disciplines. Popular in industry.

5 Interactive Data Language (IDL) A programming language for data analysis and plotting. Many Procedures for manipulating FITS file. Popular among astronomers. (Age map of a galaxy.)

6 Other Programming Languages & Resources Python (free; getting popular among astronomers) Matlab C/C++/C# Java Numerical Recipes (William Press et al.) Performs many numerical calculations Can be used with different programming languages LAPACK Performs algebraic and matrix calculations Can be used with different programming languages

7 Future: Data Analysis using Database Automated data analysis: Select data from DB using C# routines with SQL scripts embedded Perform computations Output results to DB, if necessary (MS SQL Server. Source: Alex Szalay)

8 Basic Description of the Data: Mean, Median, Mode Mean = average value Median = middle value Mode = most common value E.g. 10 sampled values = (3) (4) (7) (8) (2) (9) (6) (10) (5) (1) n = 10 Mean = Sum of values / n = 6.03 Median = ( ) / 2 Mode =

9 Basic Description of the Data: Mean, Median, Mode Mean = average value Median = middle value Mode = most common value E.g. 10 sampled values = (3) (4) (7) (8) (2) (9) (6) (10) (5) (1) n = 10 Mean = Sum of values / n = 6.03 Median = ( ) / 2 Mode = 9.6

10 Width of a Distribution: Standard Deviation (or SD, σ ) σ Width (+299,000 km/s)

11 Shape of a Distribution In general, sampled values may not form a well-quantified distribution (or, not analytical). We can use the binned histogram to report the shape of the distribution. Advanced: We can fit a linear combination of multiple functions.

12 Basic Statistics in R

13 Key Concepts in Statistics Variables Population Distribution vs. Sampling Distribution Central Limit Theorem (CLT)

14 A Big Bag of Marbles Suppose we want to measure the average and variance of the weight of a big bag of marbles. How to do it? We draw a sample from the population.

15 Random Variable (X) X s are Independent: the outcome of any one experiment does not influence the outcomes of others (random draws). X s are Identically Distributed: every X is drawn from the same distribution (same bag of marbles). X 1 X 2 X 3 X 4 X n-1 X n E.g., 1.1g 0.8g 1.0g 0.9g 0.7g 1.2g

16 Random Variable (X) X s are Independent: the outcome of any one experiment does not influence the outcomes of others (random draws). X s are Identically Distributed: every X is drawn from the same distribution (same bag of marbles). X 1 X 2 X 3 X 4 X n-1 X n E.g., 1.1g 0.8g 1.0g 0.9g 0.7g 1.2g IID Variable

17 Types of Random Variables Discrete variables E.g., Photon Count: 10, 1, 6, 2, 14, 3, 5, 7, E.g., Binary States: 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, Continuous variables E.g., Luminosity (in units of solar luminosity): 0.14, 2.46, 1.57, 4.52,

18 Height of Trees in Forest Sample Vs. Population

19 Color of Galaxies in Universe Credit: Miguel Angel Aragon Calvo

20 Statistical Distributions There are many statistical distributions available to describe experimental results. The most common ones in Astronomy are: Gaussian Distribution Poisson Distribution Planck Distribution

21 Gaussian Distribution (or Bell Curve, Normal Distribution ) (Carl Friedrich Gauss, )

22 Uncertainty in Physical Measurements For many experiments and observations concerning physical phenomena, we find that performing the procedure twice under (what seem!!!) identical conditions results in two different outcomes. Such kind of outcomes follows a Gaussian Distribution. For example: Michelson and Morley s speed-of-light experiment.

23 (+299,000 km/s) (Credit: Wikipedia)

24 Random Error (+299,000 km/s) (Credit: Wikipedia)

25 Main Lesson Michelson and Morley repeated the experiment many times to estimate the speed-of-light. They only knew the mean and SD of the population (X i ) exist, but they did not know of the values. It works because of the Central Limit Theorem. *Read first two paragraphs of the Handout.

26 Poisson Distribution The probability of the number of events (occurring in a fixed period of time) given a known, expected count. Probability (Siméon Poisson, ) k (= number of events)

27 Probability Examples of Poisson Distributions I know from experience that I get 4 phone calls a day (λ = 4). Then there is about 20% chance I will get 3 phone calls today. Number of events

28 Probability Examples of Poisson Distributions I know from experience that I get 4 photons a day (λ = 4). Then there is about 20% chance I will get 3 photons today. Number of events

29 Probability: Chance of Occurrence Discrete Variable (e.g., 1, 2, 3, 4, 5, 6) Continuous Variable E.g., Experimental results follow a Gaussian (also called Normal) X Probability of getting X between two values is the area under the Standard Normal Distribution between the two said values. A Standard Normal Distribution is a Normal Distribution with total area = 1.

30 Random Number: Gaussian Distribution as a Case Study When we analyze big datasets, sampling methods are commonly used. Random number plays an important role in sampling strategies. T 3 T 6 T 5 T 1 T 4 T 2 T 7 Suppose we have a piece of metal with temperature measurement T(x, y). We want to estimate the average temperature without looking at all data (i.e., the population).

31 Distribution of 10 Random Numbers from Normal Distribution

32 Distribution of 100 Random Numbers from Normal Distribution

33 Distributions in Astronomy Gaussian Distribution Poisson Distribution Planck Distribution Etc.

34 Emission Lines in Galaxies: Gaussian Fitting single or multiple Gaussian functions to the line profiles in a galaxy spectrum.

35 Frequency Colors of Galaxies in Nearby Universe: Multiple Gaussians Color (Redder ) (Yip, Connolly, Szalay, et al. 2004)

36 Seeing Disk of Stars: Gaussian

37 Cause of Seeing Disk of Stars: Atmospheric Disturbance

38 Photon Counts in Astronomical Images: Poisson Distribution Signal-to-Noise Ratio (SNR) = Count/Random Error of Count = Count (for Poisson) That is, when Count increases, SNR increases.

39 Cosmic Microwave Background: Planck Distribution Universe expands and cools down. The photons were hot, but cold now (3 Kelvin).

40 Black Body Radiation from the Sun: Planck Distribution (Max Planck, ) Why is the Sun not a perfect Black Body (or Planck Distribution)?

Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications

Data Mining In Modern Astronomy Sky Surveys: Concepts in Machine Learning, Unsupervised Learning & Astronomy Applications Ching-Wa Yip cwyip@pha.jhu.edu; Bloomberg 518 Human are Great Pattern Recognizers

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

Experimental Methods -Statistical Data Analysis - Assignment

Experimental Methods -Statistical Data Analysis - Assignment In this assignment you will investigate daily weather observations from Alice Springs, Woomera and Charleville, from 1950 to 2006. These are

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

Data Mining and Visualization

Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research

Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

Data Provided: A formula sheet and table of physical constants is attached to this paper. DARK MATTER AND THE UNIVERSE

Data Provided: A formula sheet and table of physical constants is attached to this paper. DEPARTMENT OF PHYSICS AND ASTRONOMY Autumn Semester (2014-2015) DARK MATTER AND THE UNIVERSE 2 HOURS Answer question

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

PROBABILITIES AND PROBABILITY DISTRIBUTIONS

Published in "Random Walks in Biology", 1983, Princeton University Press PROBABILITIES AND PROBABILITY DISTRIBUTIONS Howard C. Berg Table of Contents PROBABILITIES PROBABILITY DISTRIBUTIONS THE BINOMIAL

Normal distribution Approximating binomial distribution by normal 2.10 Central Limit Theorem

1.1.2 Normal distribution 1.1.3 Approimating binomial distribution by normal 2.1 Central Limit Theorem Prof. Tesler Math 283 October 22, 214 Prof. Tesler 1.1.2-3, 2.1 Normal distribution Math 283 / October

Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

Lecture 8: Radiation Spectrum The information contained in the light we receive is unaffected by distance The information remains intact so long as the light doesn t run into something along the way Since

F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

Characteristics and statistics of digital remote sensing imagery

Characteristics and statistics of digital remote sensing imagery There are two fundamental ways to obtain digital imagery: Acquire remotely sensed imagery in an analog format (often referred to as hard-copy)

Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

Astronomy 114 Summary of Important Concepts #1 1

Astronomy 114 Summary of Important Concepts #1 1 1 Kepler s Third Law Kepler discovered that the size of a planet s orbit (the semi-major axis of the ellipse) is simply related to sidereal period of the

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

GCSE Statistics Revision notes

GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

RESULTS FROM A SIMPLE INFRARED CLOUD DETECTOR

RESULTS FROM A SIMPLE INFRARED CLOUD DETECTOR A. Maghrabi 1 and R. Clay 2 1 Institute of Astronomical and Geophysical Research, King Abdulaziz City For Science and Technology, P.O. Box 6086 Riyadh 11442,

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

5. The Nature of Light. Does Light Travel Infinitely Fast? EMR Travels At Finite Speed. EMR: Electric & Magnetic Waves

5. The Nature of Light Light travels in vacuum at 3.0. 10 8 m/s Light is one form of electromagnetic radiation Continuous radiation: Based on temperature Wien s Law & the Stefan-Boltzmann Law Light has

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

Take away concepts. What is Energy? Solar Energy. EM Radiation. Properties of waves. Solar Radiation Emission and Absorption

Take away concepts Solar Radiation Emission and Absorption 1. 2. 3. 4. 5. 6. Conservation of energy. Black body radiation principle Emission wavelength and temperature (Wein s Law). Radiation vs. distance

Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion

Statistics Basics Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion Part 1: Sampling, Frequency Distributions, and Graphs The method of collecting, organizing,

Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

Nominal Scaling. Measures of Central Tendency, Spread, and Shape. Interval Scaling. Ordinal Scaling

Nominal Scaling Measures of, Spread, and Shape Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning The lowest level of

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

Implications of Big Data for Statistics Instruction 17 Nov 2013

Implications of Big Data for Statistics Instruction 17 Nov 2013 Implications of Big Data for Statistics Instruction Mark L. Berenson Montclair State University MSMESB Mini Conference DSI Baltimore November

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

Preview of Period 3: Electromagnetic Waves Radiant Energy II

Preview of Period 3: Electromagnetic Waves Radiant Energy II 3.1 Radiant Energy from the Sun How is light reflected and transmitted? What is polarized light? 3.2 Energy Transfer with Radiant Energy How

Physics Open House. Faraday's Law and EM Waves Change in the magnetic field strength in coils generates a current. Electromagnetic Radiation

Electromagnetic Radiation (How we get most of our information about the cosmos) Examples of electromagnetic radiation: Light Infrared Ultraviolet Microwaves AM radio FM radio TV signals Cell phone signals

Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

INTERMEDIATE STATISTICS and LEAST SQUARES METHOD

INTERMEDIATE STATISTICS and LEAST SQUARES METHOD Educational Objective: To introduce several intermediate concepts in statistics for scientific measurements and data analysis, propagation of errors and

Investigating electromagnetic radiation Announcements: First midterm is 7:30pm on 2/17/09 Problem solving sessions M3-5 and T3-4,5-6. Homework due at 12:50pm on Wednesday. We are covering Chapter 4 this

Lecture 8: Signal Detection and Noise Assumption

ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,

Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

7. Normal Distributions

7. Normal Distributions A. Introduction B. History C. Areas of Normal Distributions D. Standard Normal E. Exercises Most of the statistical analyses presented in this book are based on the bell-shaped

John Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text

Law of Averages - pg. 294 Moore s Text When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So, if the coin is tossed a large number of times, the number of heads and the

GeoGebra Statistics and Probability

GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,

Module 4: Data Exploration

Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

MAT 12O ELEMENTARY STATISTICS I

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 12O ELEMENTARY STATISTICS I 3 Lecture Hours, 1 Lab Hour, 3 Credits Pre-Requisite:

Statistics, Data Mining and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data. and Alex Gray

Statistics, Data Mining and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data Željko Ivezić, Andrew J. Connolly, Jacob T. VanderPlas University of Washington and Alex

Session 1.6 Measures of Central Tendency

Session 1.6 Measures of Central Tendency Measures of location (Indices of central tendency) These indices locate the center of the frequency distribution curve. The mode, median, and mean are three indices

Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

Blackbody Radiation References INTRODUCTION

Blackbody Radiation References 1) R.A. Serway, R.J. Beichner: Physics for Scientists and Engineers with Modern Physics, 5 th Edition, Vol. 2, Ch.40, Saunders College Publishing (A Division of Harcourt

Week 1. Exploratory Data Analysis

Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

Nuclear Physics Lab I: Geiger-Müller Counter and Nuclear Counting Statistics

Nuclear Physics Lab I: Geiger-Müller Counter and Nuclear Counting Statistics PART I Geiger Tube: Optimal Operating Voltage and Resolving Time Objective: To become acquainted with the operation and characteristics

Light. What is light?

Light What is light? 1. How does light behave? 2. What produces light? 3. What type of light is emitted? 4. What information do you get from that light? Methods in Astronomy Photometry Measure total amount

Lecture 3 : Hypothesis testing and model-fitting

Lecture 3 : Hypothesis testing and model-fitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and model-fitting

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

Grade 6 Standard 3 Unit Test A Astronomy. 1. The four inner planets are rocky and small. Which description best fits the next four outer planets?

Grade 6 Standard 3 Unit Test A Astronomy Multiple Choice 1. The four inner planets are rocky and small. Which description best fits the next four outer planets? A. They are also rocky and small. B. They

Unit 1: Introduction to Quality Management

Unit 1: Introduction to Quality Management Definition & Dimensions of Quality Quality Control vs Quality Assurance Small-Q vs Big-Q & Evolution of Quality Movement Total Quality Management (TQM) & its

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

Didacticiel Études de cas

1 Theme Data Mining with R The rattle package. R (http://www.r project.org/) is one of the most exciting free data mining software projects of these last years. Its popularity is completely justified (see

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

Origins of the Cosmos Summer 2016. Pre-course assessment

Origins of the Cosmos Summer 2016 Pre-course assessment In order to grant two graduate credits for the workshop, we do require you to spend some hours before arriving at Penn State. We encourage all of

1. 2. 3. 4. Find the mean and median. 5. 1, 2, 87 6. 3, 2, 1, 10. Bellwork 3-23-15 Simplify each expression.

Bellwork 3-23-15 Simplify each expression. 1. 2. 3. 4. Find the mean and median. 5. 1, 2, 87 6. 3, 2, 1, 10 1 Objectives Find measures of central tendency and measures of variation for statistical data.

Name: Date: Use the following to answer questions 2-4:

Name: Date: 1. A phenomenon is observed many, many times under identical conditions. The proportion of times a particular event A occurs is recorded. What does this proportion represent? A) The probability

Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

How Matter Emits Light: 1. the Blackbody Radiation

How Matter Emits Light: 1. the Blackbody Radiation Announcements n Quiz # 3 will take place on Thursday, October 20 th ; more infos in the link `quizzes of the website: Please, remember to bring a pencil.

Dealing with large datasets

Dealing with large datasets (by throwing away most of the data) Alan Heavens Institute for Astronomy, University of Edinburgh with Ben Panter, Rob Tweedie, Mark Bastin, Will Hossack, Keith McKellar, Trevor

Light as a Wave. The Nature of Light. EM Radiation Spectrum. EM Radiation Spectrum. Electromagnetic Radiation

The Nature of Light Light and other forms of radiation carry information to us from distance astronomical objects Visible light is a subset of a huge spectrum of electromagnetic radiation Maxwell pioneered

Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

Homework #4 Solutions ASTR100: Introduction to Astronomy Fall 2009: Dr. Stacy McGaugh

Homework #4 Solutions ASTR100: Introduction to Astronomy Fall 2009: Dr. Stacy McGaugh Chapter 5: #50 Hotter Sun: Suppose the surface temperature of the Sun were about 12,000K, rather than 6000K. a. How

Astro 130, Fall 2011, Homework, Chapter 17, Due Sep 29, 2011 Name: Date:

Astro 130, Fall 2011, Homework, Chapter 17, Due Sep 29, 2011 Name: Date: 1. If stellar parallax can be measured to a precision of about 0.01 arcsec using telescopes on Earth to observe stars, to what distance

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

COOKBOOK. for. Aristarchos Transient Spectrometer (ATS)

NATIONAL OBSERVATORY OF ATHENS Institute for Astronomy, Astrophysics, Space Applications and Remote Sensing HELMOS OBSERVATORY COOKBOOK for Aristarchos Transient Spectrometer (ATS) P. Boumis, J. Meaburn,

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE

STAT 315: HOW TO CHOOSE A DISTRIBUTION FOR A RANDOM VARIABLE TROY BUTLER 1. Random variables and distributions We are often presented with descriptions of problems involving some level of uncertainty about

SSO Transmission Grating Spectrograph (TGS) User s Guide

SSO Transmission Grating Spectrograph (TGS) User s Guide The Rigel TGS User s Guide available online explains how a transmission grating spectrograph (TGS) works and how efficient they are. Please refer

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

Unit 1: Introduction to Six Sigma I

Unit 1: Introduction to Six Sigma I Six Sigma & its Goals Six Sigma as a Performance Measure, Problem Solving Tool, & Management Philosophy Problem Solving Tools Used in Six Sigma & its Evolution Variation

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,