Basic Descriptive Statistics & Probability Distributions


1 Basic Descriptive Statistics & Probability Distributions
Scott Oser, Lecture #2, Physics 509

2 Outline
Last time: we discussed the meaning of probability, did a few warmups, and were introduced to Bayes' theorem. Today we cover more basics: basic descriptive statistics; covariance and correlation; properties of the Gaussian distribution; the binomial distribution; application of binomial distributions to sports betting; the multinomial distribution.

3 Basic Descriptive Statistics
[Figure: "WHAT IS THIS DISTRIBUTION?"]
Often the probability distribution for a quantity is unknown. You may be able to sample it with finite statistics, however. Basic descriptive statistics is the procedure of encoding various properties of the distribution in a few numbers.

4 The Centre of the Data: Mean, Median, & Mode
Mean of a data set:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Mean of a PDF = expectation value of x:
$$\langle x \rangle = \int dx\, P(x)\, x$$
Median: the point with 50% probability above & 50% below. (If a tie, use an average of the tied values.) Less sensitive to tails!
Mode: the most likely value.
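The three measures of the centre can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and include one point far out on the tail to show how the mean and median respond differently.

```python
import statistics

# Hypothetical sample (illustrative values only), with one point far out on the tail.
data = [2.0, 3.0, 3.0, 4.0, 5.0, 19.0]

mean = statistics.fmean(data)     # (1/N) * sum of x_i
median = statistics.median(data)  # even N: average of the two middle values
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

The single large value pulls the mean well above the median, which is the sense in which the median is less sensitive to tails.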

5 Variance V & Standard Deviation (a.k.a. RMS)
Variance of a distribution:
$$V(x) = \sigma^2 = \int dx\, P(x)\,(x-\mu)^2$$
$$V(x) = \int dx\, P(x)\, x^2 - 2\mu \int dx\, P(x)\, x + \mu^2 \int dx\, P(x) = \langle x^2 \rangle - \mu^2 = \langle x^2 \rangle - \langle x \rangle^2$$
Variance of a data sample (regrettably has the same notation as variance of a distribution, so be careful!):
$$V(x) = \sigma^2 = \frac{1}{N}\sum_i (x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$$
An important point we'll return to: the above formula underestimates the variance of the underlying distribution, since it uses the mean calculated from the data instead of the true mean of the true distribution.
$$V(x) = \sigma^2 = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$$
This is unbiased if you must estimate the mean from the data.
$$V(x) = \sigma^2 = \frac{1}{N}\sum_i (x_i - \mu)^2$$
Use this if you know the true mean $\mu$ of the underlying distribution.
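The sample-variance formulas above can be checked against each other numerically. The following is a small sketch with a made-up four-point sample; the variable names are illustrative, not part of the lecture.

```python
import statistics

data = [1.0, 2.0, 4.0, 7.0]  # hypothetical sample
n = len(data)
xbar = statistics.fmean(data)

# 1/N form, and the equivalent "mean of squares minus square of mean" form
var_biased = sum((x - xbar) ** 2 for x in data) / n
var_identity = statistics.fmean([x * x for x in data]) - xbar ** 2

# 1/(N-1) form, for when the mean is estimated from the same data
var_unbiased = sum((x - xbar) ** 2 for x in data) / (n - 1)

print(var_biased, var_identity, var_unbiased)
```

The standard library exposes the same two estimators as `statistics.pvariance` (1/N) and `statistics.variance` (1/(N-1)).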

6 FWHM & Quartiles/Percentiles
FWHM = Full Width Half Max. It means what it sounds like: measure across the width of a distribution at the point where $P(x) = \frac{1}{2} P_{max}$. For Gaussian distributions, FWHM $= 2\sqrt{2\ln 2}\,\sigma \approx 2.35\sigma$.
Quartiles, percentiles, and even the median are rank statistics. Sort the data from lowest to highest. The median is the point where 50% of data are above and 50% are below. The quartile points are those at which 25%, 50%, and 75% of the data are below that point. You can also extend this to percentile rank, just like on a GRE exam.
FWHM or some other width parameter, such as (75th percentile data point) − (25th percentile data point), is often more robust than the RMS, which is more sensitive to events on the tails.
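As a rough illustration of the robustness claim, the sketch below compares the RMS with the interquartile range (75th minus 25th percentile) before and after adding a single outlier. The data are invented, and `statistics.quantiles` uses its default interpolation rule, which is only one of several conventions for rank statistics.

```python
import math
import statistics

# FWHM of a Gaussian in units of sigma: 2*sqrt(2 ln 2), about 2.35
fwhm_over_sigma = 2.0 * math.sqrt(2.0 * math.log(2.0))

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

core = [float(v) for v in range(1, 20)]  # hypothetical data: 1, 2, ..., 19
with_outlier = core + [1000.0]           # same data plus one tail event

print("FWHM/sigma:", fwhm_over_sigma)
print("RMS:", statistics.pstdev(core), "->", statistics.pstdev(with_outlier))
print("IQR:", iqr(core), "->", iqr(with_outlier))
```

The RMS changes by a large factor when the outlier is added, while the interquartile range barely moves.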

7 Higher Moments
Of course you can calculate the $r$th moment of a distribution if you really want to. For example, the third central moment is called the skew, and is sensitive to the asymmetry of the distribution (exact definition may vary; here's a unitless definition):
$$\text{skew} = \gamma = \frac{1}{N\sigma^3}\sum_i (x_i - \bar{x})^3$$
Kurtosis (or curtosis) is the fourth central moment, with varying choices of normalizations. For fun you are welcome to look up the words leptokurtotic and platykurtotic, but since I speak Greek I don't have to.
Warning: Not every distribution has well-defined moments. The integral or sum will sometimes not converge!

8 A bad distribution: the Cauchy distribution
Consider the Cauchy, or Breit-Wigner, distribution. Also called a Lorentzian. It is characterized by its centroid $M$ and its FWHM $\Gamma$.
$$P(x \mid \Gamma, M) = \frac{1}{\pi}\,\frac{\Gamma/2}{(x-M)^2 + (\Gamma/2)^2}$$
A Cauchy distribution has infinite variance, and its higher moments diverge as well! Unfortunately the Cauchy distribution actually describes the mass peak of a particle, or the width of a spectral line, so this distribution actually occurs!
[Figure: Cauchy (black) vs. Gaussian (red)]
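Two properties of the Cauchy distribution can be checked directly from its density: the parameter $\Gamma$ really is the full width at half maximum, and the integrand of the second moment, $x^2 P(x)$, does not fall to zero at large $x$, which is why the variance integral diverges. The sketch below assumes the standard Lorentzian form with centroid `m` and FWHM `gamma`; the function name and parameter values are illustrative.

```python
import math

def cauchy_pdf(x, m, gamma):
    """Cauchy (Breit-Wigner, Lorentzian) density with centroid m and FWHM gamma."""
    half = gamma / 2.0
    return (1.0 / math.pi) * half / ((x - m) ** 2 + half ** 2)

m, gamma = 0.0, 2.0  # hypothetical parameters
peak = cauchy_pdf(m, m, gamma)
at_half_width = cauchy_pdf(m + gamma / 2.0, m, gamma)

# x^2 * P(x) tends to the constant gamma / (2*pi) instead of vanishing,
# so the second-moment integral grows without bound.
far_out = 1.0e6
second_moment_integrand = far_out ** 2 * cauchy_pdf(far_out, m, gamma)

print(peak, at_half_width, second_moment_integrand)
```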

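9 Covariance & Correlation
The covariance between two variables is defined by:
$$\mathrm{cov}(x,y) = \langle (x-\mu_x)(y-\mu_y) \rangle = \langle xy \rangle - \langle x \rangle \langle y \rangle$$
This is the most useful thing they never tell you in most lab courses! Note that cov(x,x) = V(x).
The correlation coefficient is a unitless version of the same thing:
$$\rho = \frac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y}$$
If x and y are independent variables (P(x,y) = P(x)P(y)), then
$$\mathrm{cov}(x,y) = \int\!\!\int dx\,dy\,P(x,y)\,xy - \left(\int\!\!\int dx\,dy\,P(x,y)\,x\right)\left(\int\!\!\int dx\,dy\,P(x,y)\,y\right)$$
$$= \int dx\,P(x)\,x \int dy\,P(y)\,y - \left(\int dx\,P(x)\,x\right)\left(\int dy\,P(y)\,y\right) = 0$$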
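For a data sample, the same definitions apply with averages in place of integrals. The sketch below uses the 1/N normalization and a made-up pair of lists; `covariance` and `correlation` are illustrative helper names.

```python
import math
import statistics

def covariance(xs, ys):
    """Sample covariance with 1/N normalization: <(x - <x>)(y - <y>)>."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """Unitless correlation coefficient rho = cov(x, y) / (sigma_x * sigma_y)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical measurements
ys = [2.0, 4.0, 5.0, 9.0]

cov_xy = covariance(xs, ys)
# Equivalent form: <xy> - <x><y>
cov_alt = statistics.fmean([x * y for x, y in zip(xs, ys)]) - statistics.fmean(xs) * statistics.fmean(ys)
rho = correlation(xs, ys)

print(cov_xy, cov_alt, rho)
```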

10 More on Covariance
[Figure: correlation coefficients for some simulated data sets.]
Note the bottom right panel: while independent variables must have zero correlation, the reverse is not true! Correlation is important because it is part of the error propagation equation, as we'll see.
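A simple way to see that zero correlation does not imply independence is to take $y = x^2$ with $x$ values symmetric about zero: $y$ is completely determined by $x$, yet the covariance vanishes. The data below are a made-up example, not the simulated sets from the slide.

```python
import statistics

def covariance(xs, ys):
    """Sample covariance with 1/N normalization."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # symmetric about zero
ys = [x * x for x in xs]          # fully dependent on x

cov_xy = covariance(xs, ys)
print(cov_xy)  # zero, even though y is a function of x
```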

11 Variance and Covariance of Linear Combinations of Variables
Suppose we have two random variables X and Y (not necessarily independent), and that we know cov(X,Y). Consider the linear combinations W = aX + bY and Z = cX + dY. It can be shown that
cov(W,Z) = cov(aX+bY, cX+dY)
= cov(aX,cX) + cov(aX,dY) + cov(bY,cX) + cov(bY,dY)
= ac cov(X,X) + (ad + bc) cov(X,Y) + bd cov(Y,Y)
= ac V(X) + bd V(Y) + (ad + bc) cov(X,Y)
Special case is V(X+Y):
V(X+Y) = cov(X+Y, X+Y) = V(X) + V(Y) + 2 cov(X,Y)
Very special case: the variance of the sum of independent random variables is the sum of their individual variances!

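12 Gaussian Distributions
By far the most useful distribution is the Gaussian (normal) distribution:
$$P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Mean $= \mu$, Variance $= \sigma^2$. Note that width scales with $\sigma$.
Area out on tails is important: use lookup tables or the cumulative distribution function. In the plot, the red area ($> 2\sigma$) is 2.3%.
68.3% of area within $\pm 1\sigma$; 95.4% of area within $\pm 2\sigma$; 99.7% of area within $\pm 3\sigma$.
90% of area within $\pm 1.645\sigma$; 95% of area within $\pm 1.960\sigma$; 99% of area within $\pm 2.576\sigma$.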
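Instead of a lookup table, the areas can be computed from the cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library; `area_within` and `half_width_for` are illustrative helper names.

```python
from statistics import NormalDist

unit = NormalDist(mu=0.0, sigma=1.0)  # unit Gaussian

def area_within(k):
    """Probability that a Gaussian variable lies within +/- k sigma of its mean."""
    return unit.cdf(k) - unit.cdf(-k)

def half_width_for(area):
    """Half-width, in units of sigma, of the central interval containing `area`."""
    return unit.inv_cdf(0.5 + area / 2.0)

upper_tail_2sigma = 1.0 - unit.cdf(2.0)  # one-sided area beyond +2 sigma

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.4f}")
for area in (0.90, 0.95, 0.99):
    print(f"{area:.0%} within +/- {half_width_for(area):.3f} sigma")
print(f"area above +2 sigma: {upper_tail_2sigma:.4f}")
```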

13 Why are Gaussian distributions so critical?
They occur very commonly. The reason is that the average of several independent random variables often approaches a Gaussian distribution in the limit of large N.
Nice mathematical properties: infinitely differentiable, symmetric. The sum or difference of two Gaussian variables is always itself Gaussian in its distribution. Many complicated formulas simplify to linear algebra, or even simpler, if all variables have Gaussian distributions.
The Gaussian distribution is often used as a shorthand for discussing probabilities. A "5 sigma result" means a result with a chance probability that is the same as the tail area of a unit Gaussian:
$$2\int_5^\infty dt\, P(t \mid \mu=0, \sigma=1)$$
This way of speaking is used even for non-Gaussian distributions!
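The two-sided tail area in the expression above has a closed form in terms of the complementary error function, $2\int_n^\infty dt\,P(t) = \mathrm{erfc}(n/\sqrt{2})$. A minimal sketch (the function name is illustrative):

```python
import math

def two_sided_tail(n_sigma):
    """Two-sided tail area of a unit Gaussian beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

p_5sigma = two_sided_tail(5.0)
print(f"chance probability of a 5 sigma result: {p_5sigma:.3e}")
```

Note that some fields quote the one-sided tail instead, which is half this value.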

14 Why you should be very careful with Gaussians
The major danger of Gaussians is that they are overused. Although many distributions are approximately Gaussian, they often have long non-Gaussian tails. While 99% of the time a Gaussian distribution will correctly model your data, many foul-ups result from that other 1%.
It's usually good practice to simulate your data to see if the distributions of quantities you think are Gaussian really follow a Gaussian distribution.
Common example: the ratio of two numbers with Gaussian distributions is itself often not very Gaussian (although in certain limits it may be).

15 A slightly non-trivial example
Two measurements (X & Y) are drawn from two separate normal distributions. The first distribution has mean = 5 & RMS = 2. The second has mean = 3 & RMS = 1. The correlation coefficient of the two distributions is $\rho = -0.5$. What is the distribution of the sum Z = X + Y?

16 A slightly non-trivial example
First, recognize that the sum of two Gaussians is itself Gaussian, even if there is a correlation between the two. To see this, imagine that we drew two truly independent Gaussian random variables X and W. Then we could form a linear combination Y = aX + bW. Y would clearly be Gaussian, although correlated with X. Then Z = X + Y = X + aX + bW = (a+1)X + bW is itself the sum of two truly independent Gaussian variables. So Z must be a Gaussian.

17 A slightly non-trivial example
Now, recognizing that Z is Gaussian, all we need to figure out are its mean and RMS. First the mean:
$$\langle X+Y \rangle = \int\!\!\int dx\,dy\,P(x,y)\,(x+y) = \int\!\!\int dx\,dy\,P(x,y)\,x + \int\!\!\int dx\,dy\,P(x,y)\,y = \langle X \rangle + \langle Y \rangle$$
This is just equal to 5 + 3 = 8.

18 A slightly non-trivial example
Now for the RMS. Use V(Z) = cov(Z,Z) = cov(X+Y, X+Y):
$$V(Z) = \mathrm{cov}(X,X) + 2\,\mathrm{cov}(X,Y) + \mathrm{cov}(Y,Y) = \sigma_x^2 + 2\rho\,\sigma_x\sigma_y + \sigma_y^2$$
$$= (2)(2) + 2(2)(1)(-0.5) + (1)(1) = 3$$
So Z is a Gaussian with mean = 8 and RMS of $\sigma = \sqrt{3}$.
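The arithmetic of this example can be reproduced with the general formula for the covariance of two linear combinations, cov(aX+bY, cX+dY) = ac V(X) + bd V(Y) + (ad+bc) cov(X,Y), evaluated with a = b = c = d = 1. This is a sketch; the function and variable names are illustrative.

```python
import math

def cov_linear_combinations(a, b, c, d, var_x, var_y, cov_xy):
    """cov(aX + bY, cX + dY) in terms of V(X), V(Y), and cov(X, Y)."""
    return a * c * var_x + b * d * var_y + (a * d + b * c) * cov_xy

mean_x, sigma_x = 5.0, 2.0
mean_y, sigma_y = 3.0, 1.0
rho = -0.5

cov_xy = rho * sigma_x * sigma_y  # covariance from the correlation coefficient
mean_z = mean_x + mean_y          # means add, correlated or not
var_z = cov_linear_combinations(1, 1, 1, 1, sigma_x ** 2, sigma_y ** 2, cov_xy)
sigma_z = math.sqrt(var_z)

print(mean_z, var_z, sigma_z)
```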

19 Binomial Distributions
Many outcomes are binary: yes/no, heads/tails, etc.
Ex. You flip N unbalanced coins. Each coin has probability p of landing heads. What is the probability that you get m heads (and N−m tails)?
The binomial distribution:
$$P(m \mid p, N) = p^m (1-p)^{N-m}\, \frac{N!}{m!\,(N-m)!}$$
First term: probability of m coins all getting heads.
Second term: probability of N−m coins all getting tails.
Third term: number of different ways to pick m different coins from a collection of N total to be heads.

20 Binomial distributions
$$P(m \mid p, N) = p^m (1-p)^{N-m}\, \frac{N!}{m!\,(N-m)!}$$
Mean = Np
Variance = Np(1−p)
Notice that the mean and variance both scale linearly with N. This is understandable: flipping N coins is the sum of N independent binomial variables (one per coin). When N gets big, the distribution looks increasingly Gaussian!
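The mean and variance quoted above can be verified by summing over the distribution directly. The sketch below uses `math.comb` for the combinatorial factor; the values N = 20 and p = 0.3 are arbitrary choices for illustration.

```python
import math

def binomial_pmf(m, n, p):
    """Probability of exactly m heads in n flips, each with heads probability p."""
    return math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)

n, p = 20, 0.3  # hypothetical values
pmf = [binomial_pmf(m, n, p) for m in range(n + 1)]

total = sum(pmf)                                                # should be 1
mean = sum(m * q for m, q in enumerate(pmf))                    # should be N*p
variance = sum((m - mean) ** 2 * q for m, q in enumerate(pmf))  # should be N*p*(1-p)

print(total, mean, variance)
```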

21 But a binomial distribution isn't a Gaussian!
Gaussian approximation fails out on the tails...

22 More on the binomial distribution
In the limit of large Np, the Gaussian approximation is decent so long as $P(m=0) \approx P(m=N) \approx 0$, provided you don't care much about tails.
Beware a common error: $\sigma = \sqrt{Np(1-p)}$, not $\sigma = \sqrt{m} = \sqrt{Np}$. The latter is only true if $p \ll 1$. The error is not always just the simple square root of the number of entries!
Use a binomial distribution to model most processes with two outcomes: detection efficiency (either we detect or we don't); cut rejection; win-loss records (although beware correlations between teams that play in the same league).
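To see how far off the naive square-root error can be, consider a hypothetical detection-efficiency measurement with N = 100 trials and true efficiency p = 0.9. The numbers are invented for illustration.

```python
import math

n_trials = 100  # hypothetical number of trials
p = 0.9         # hypothetical detection efficiency

expected_detected = n_trials * p
sigma_binomial = math.sqrt(n_trials * p * (1.0 - p))  # correct: sqrt(N p (1 - p))
sigma_naive = math.sqrt(expected_detected)            # common error: sqrt(m)

print(f"expected detections: {expected_detected:.0f}")
print(f"binomial sigma: {sigma_binomial:.2f}, naive sqrt(m): {sigma_naive:.2f}")
```

With these values the naive estimate overstates the uncertainty by more than a factor of three; the two only agree when p is much less than 1.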

23 An example from the world of sports...
Consider a best-of-seven series... the first team to win four games takes the prize. We have a model which predicts that Team A is favoured in any game with p = 0.6. What is the probability that A wins the series? How could we approach this problem?

24 Best of 7 series: brute force
Easiest approach may be simply to list the possibilities:
A. Win in 4 straight games. Probability = $p^4$
B. Win in 5 games. Four choices for which game the team gets to lose. Probability = $4p^4(1-p)$
C. Win in 6 games. Choose 2 of the previous five games to lose. Probability = $C(5,2)\,p^4(1-p)^2 = 10p^4(1-p)^2$
D. Win in 7 games. Choose 3 of the previous six games to lose. Probability = $C(6,3)\,p^4(1-p)^3 = 20p^4(1-p)^3$
$$\mathrm{Prob}(p) = p^4\left[1 + 4(1-p) + 10(1-p)^2 + 20(1-p)^3\right]$$

25 Best of 7 series: outcomes
$$\mathrm{Prob}(p) = p^4\left[1 + 4(1-p) + 10(1-p)^2 + 20(1-p)^3\right]$$
Symmetry evident between p and 1−p, which makes good logical sense.
For p = 0.6, probability of series win is only 71%.
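The brute-force enumeration generalizes to any "first to k wins" series: the winner takes the final game, and the losses are distributed among the games before it. The sketch below is one way to code that sum; the function name and the `wins_needed` parameter are illustrative.

```python
import math

def series_win_probability(p, wins_needed=4):
    """Probability of winning a first-to-`wins_needed` series with per-game win probability p."""
    q = 1.0 - p
    total = 0.0
    for losses in range(wins_needed):
        # Choose which of the games before the clinching win were lost.
        ways = math.comb(wins_needed - 1 + losses, losses)
        total += ways * p ** wins_needed * q ** losses
    return total

prob_favourite = series_win_probability(0.6)
prob_underdog = series_win_probability(0.4)
print(f"P(win series | p=0.6) = {prob_favourite:.4f}")
print(f"P(win series | p=0.4) = {prob_underdog:.4f}")
```

The two probabilities sum to one, which is the symmetry between p and 1−p noted on the slide.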

26 Best of 7 series: online betting studies
Efficient market hypothesis: if a market mis-estimates a risk, smart investors will figure this out and bet accordingly, driving the odds back to the correct value. There is significant evidence that this hypothesis (almost) holds in many real-life markets. See A Random Walk Down Wall Street for details.*
Does this work for online sports betting?
* Warning: reading this book may endanger your career in physics by getting you interested in quantitative analysis of financial markets.

27 Best of 7 series: online betting studies
I got interested in this during the 2006 baseball playoffs, as my beloved Cardinals came very close to collapsing entirely, yet went on to win the World Series.
I used a coin flip model to predict series odds: all games treated as independent, with equal probability. In the simplest case, assume p = 0.5. More complicated case: use Bill James' Pythagorean Theorem to predict the winning percentage of a matchup:
$$p = \frac{(\text{Runs Scored})^2}{(\text{Runs Scored})^2 + (\text{Runs Allowed})^2}$$
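The Pythagorean formula itself is a one-liner. The sketch below only evaluates the expression as written on the slide; the run totals are invented, and how the lecture turned two teams' values into a single matchup probability is not specified here.

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Bill James' Pythagorean estimate of winning percentage (exponent 2)."""
    scored_sq = runs_scored ** 2
    return scored_sq / (scored_sq + runs_allowed ** 2)

# Hypothetical season totals, for illustration only.
p_estimate = pythagorean_expectation(800, 700)
print(f"estimated winning percentage: {p_estimate:.3f}")
```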

28 My brother is stupid. Younger brothers always are.
He objected to my coin flip model: Assigning 50/50 odds is ludicrous when you know the Astros will start Clemens. If you want to estimate p, you should only look at recent records to estimate odds.
How dare he deny my math! But the proof is in the pudding...

29 What do the markets say?
Odds for St. Louis to win the NL Central title (going into the final weekend): coin flip model says 74.6%. Betting market said 74%.
After St. Louis loses some ground to Houston, coin flip model says 59%. The betting markets predicted 61%. My brother's recency prior for p predicts 45%.
Going into the next-to-last day of the season, my coin flip model says 89% for the Cards. Betting market is mixed: odds for the Cards to win are all over the map, but odds for Houston to win are right at 11%. Opportunity for arbitrage.
Last day: coin flip and markets both predict ~93%.

30 In the middle of the first round of playoffs
Series                     Coin Flip   Betting Markets
St Louis over San Diego    68.8%       66.1%
Dodgers over Mets          50.0%       44.6%
Twins over A's             31.3%       36.0%
Yanks over Detroit         68.8%       85.5%
One way to view the betting market odds is actually as an estimator of the p value for a matchup. For example, the market felt (wrongly) that Detroit was badly overmatched.
In the end, CARDS WIN!

31 Negative binomial distribution
In a regular binomial distribution, you decide ahead of time how many times you'll flip the coin, and calculate the probability of getting k heads. In the negative binomial distribution, you decide how many heads you want to get, then calculate the probability that you have to flip the coin N times before getting that many heads. This gives you a probability distribution for N:
$$P(N \mid k, p) = \binom{N-1}{k-1}\, p^k (1-p)^{N-k}$$
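The negative binomial distribution gives a second route to the best-of-seven answer: the series is won exactly when the fourth win arrives on game N = 4, 5, 6, or 7. The sketch below codes the formula above and sums it over those values; the function name is illustrative.

```python
import math

def negative_binomial_pmf(n_flips, k, p):
    """Probability that the k-th head arrives on exactly flip number n_flips."""
    if n_flips < k:
        return 0.0
    return math.comb(n_flips - 1, k - 1) * p ** k * (1.0 - p) ** (n_flips - k)

# Best-of-seven with p = 0.6: fourth win lands on game 4, 5, 6, or 7.
series_prob = sum(negative_binomial_pmf(n, 4, 0.6) for n in range(4, 8))

# Normalization check: summed over all N >= k, the probabilities approach 1.
normalization = sum(negative_binomial_pmf(n, 3, 0.5) for n in range(3, 400))

print(series_prob, normalization)
```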

32 Multinomial distribution
We can generalize a binomial distribution to the case where there are more than two possible outcomes. Suppose there are k possible outcomes, and we do N trials. Let $n_i$ be the number of times that the $i$th outcome comes up, and let $p_i$ be the probability of getting outcome i in one trial. The probability of getting a certain distribution of $n_i$ is then:
$$P(n_1, n_2, \ldots, n_k \mid p_1, \ldots, p_k) = \frac{N!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$
Note that there are important constraints on the parameters:
$$\sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} n_i = N$$

33 What is the multinomial distribution good for?
Any problem in which there are several discrete outcomes (the binomial distribution is a special case).
Note that unlike the binomial distribution, which basically predicts one quantity (the number of heads; you get the number of tails for free), the multinomial distribution is a joint probability distribution for several variables (the various $n_i$, of which all but one are independent). If you care about just one of these, you can marginalize over the others (sum them over all of their possible values) to get the probability distribution for the one you care about. This marginal distribution is binomial.
A common application: binned data! If you sample independent trials from a distribution and bin the results, the numbers you predict for each bin follow the multinomial distribution.
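The marginalization claim can be checked numerically for a three-outcome case: summing the multinomial probability over the counts of the other outcomes should reproduce a binomial probability for the outcome of interest. The probabilities and counts below are arbitrary, and the helper names are illustrative.

```python
import math

def multinomial_pmf(counts, probs):
    """Multinomial probability of the given counts, one count per outcome."""
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)  # exact integer division at every step
    prob = float(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def binomial_pmf(m, n, p):
    return math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)

probs = [0.2, 0.3, 0.5]  # hypothetical outcome probabilities (sum to 1)
n_trials = 6
n1 = 2                   # count of the outcome we care about

# Marginalize: sum over every allowed split of the remaining trials.
marginal = sum(
    multinomial_pmf([n1, n2, n_trials - n1 - n2], probs)
    for n2 in range(n_trials - n1 + 1)
)

print(marginal, binomial_pmf(n1, n_trials, probs[0]))
```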

34 Dealing with binned data
Very often you're going to deal with binned data. Maybe there are too many individual data points to handle efficiently. Maybe you binned it to make a pretty plot, then want to fit a function to the plot.
Some gotchas:
Nothing in the laws of statistics demands equal binning. Consider binning with equal statistics per bin.
Beware bins with few data points. Many statistical tests implicitly assume Gaussian errors, which won't hold for small numbers. General rule of thumb: rebin until every bin has >5 events.
Always remember that binning throws away information. Don't do it unless you must.
Try to make the bin size smaller than any relevant feature in the data. If statistics don't permit this, then you shouldn't be binning, at least for that part of the distribution.


More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Recursive Estimation

Recursive Estimation Recursive Estimation Raffaello D Andrea Spring 04 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March 8, 05 Notes: Notation: Unlessotherwisenoted,x, y,andz denoterandomvariables, f x

More information

6 PROBABILITY GENERATING FUNCTIONS

6 PROBABILITY GENERATING FUNCTIONS 6 PROBABILITY GENERATING FUNCTIONS Certain derivations presented in this course have been somewhat heavy on algebra. For example, determining the expectation of the Binomial distribution (page 5.1 turned

More information

The overall size of these chance errors is measured by their RMS HALF THE NUMBER OF TOSSES NUMBER OF HEADS MINUS 0 400 800 1200 1600 NUMBER OF TOSSES

The overall size of these chance errors is measured by their RMS HALF THE NUMBER OF TOSSES NUMBER OF HEADS MINUS 0 400 800 1200 1600 NUMBER OF TOSSES INTRODUCTION TO CHANCE VARIABILITY WHAT DOES THE LAW OF AVERAGES SAY? 4 coins were tossed 1600 times each, and the chance error number of heads half the number of tosses was plotted against the number

More information

Multivariate Analysis of Variance (MANOVA): I. Theory

Multivariate Analysis of Variance (MANOVA): I. Theory Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

More information

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i ) Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Stat 20: Intro to Probability and Statistics

Stat 20: Intro to Probability and Statistics Stat 20: Intro to Probability and Statistics Lecture 16: More Box Models Tessa L. Childers-Day UC Berkeley 22 July 2014 By the end of this lecture... You will be able to: Determine what we expect the sum

More information

Statistics 100A Homework 3 Solutions

Statistics 100A Homework 3 Solutions Chapter Statistics 00A Homework Solutions Ryan Rosario. Two balls are chosen randomly from an urn containing 8 white, black, and orange balls. Suppose that we win $ for each black ball selected and we

More information

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a

More information

Measures of Central Tendency and Variability: Summarizing your Data for Others

Measures of Central Tendency and Variability: Summarizing your Data for Others Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :

More information

WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

More information

Experimental Designs (revisited)

Experimental Designs (revisited) Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described

More information

Example: Find the expected value of the random variable X. X 2 4 6 7 P(X) 0.3 0.2 0.1 0.4

Example: Find the expected value of the random variable X. X 2 4 6 7 P(X) 0.3 0.2 0.1 0.4 MATH 110 Test Three Outline of Test Material EXPECTED VALUE (8.5) Super easy ones (when the PDF is already given to you as a table and all you need to do is multiply down the columns and add across) Example:

More information

The Assumption(s) of Normality

The Assumption(s) of Normality The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information

The Binomial Probability Distribution

The Binomial Probability Distribution The Binomial Probability Distribution MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2015 Objectives After this lesson we will be able to: determine whether a probability

More information

Math 431 An Introduction to Probability. Final Exam Solutions

Math 431 An Introduction to Probability. Final Exam Solutions Math 43 An Introduction to Probability Final Eam Solutions. A continuous random variable X has cdf a for 0, F () = for 0 <

More information

3: Summary Statistics

3: Summary Statistics 3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

More information

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random

More information

Section 7C: The Law of Large Numbers

Section 7C: The Law of Large Numbers Section 7C: The Law of Large Numbers Example. You flip a coin 00 times. Suppose the coin is fair. How many times would you expect to get heads? tails? One would expect a fair coin to come up heads half

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Quantitative Methods for Finance

Quantitative Methods for Finance Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

Chapter 5: Normal Probability Distributions - Solutions

Chapter 5: Normal Probability Distributions - Solutions Chapter 5: Normal Probability Distributions - Solutions Note: All areas and z-scores are approximate. Your answers may vary slightly. 5.2 Normal Distributions: Finding Probabilities If you are given that

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Ch. 3.1 # 3, 4, 7, 30, 31, 32

Ch. 3.1 # 3, 4, 7, 30, 31, 32 Math Elementary Statistics: A Brief Version, 5/e Bluman Ch. 3. # 3, 4,, 30, 3, 3 Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange. 3) High Temperatures The reported high temperatures

More information

Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific

Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific Name: The point value of each problem is in the left-hand margin. You

More information

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

Intro to Data Analysis, Economic Statistics and Econometrics

Intro to Data Analysis, Economic Statistics and Econometrics Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development

More information

Chapter 5. Discrete Probability Distributions

Chapter 5. Discrete Probability Distributions Chapter 5. Discrete Probability Distributions Chapter Problem: Did Mendel s result from plant hybridization experiments contradicts his theory? 1. Mendel s theory says that when there are two inheritable

More information

A Quick Algebra Review

A Quick Algebra Review 1. Simplifying Epressions. Solving Equations 3. Problem Solving 4. Inequalities 5. Absolute Values 6. Linear Equations 7. Systems of Equations 8. Laws of Eponents 9. Quadratics 10. Rationals 11. Radicals

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution 8 October 2007 In this lecture we ll learn the following: 1. how continuous probability distributions differ

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit? ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

More information

Economics 1011a: Intermediate Microeconomics

Economics 1011a: Intermediate Microeconomics Lecture 12: More Uncertainty Economics 1011a: Intermediate Microeconomics Lecture 12: More on Uncertainty Thursday, October 23, 2008 Last class we introduced choice under uncertainty. Today we will explore

More information

Mind on Statistics. Chapter 8

Mind on Statistics. Chapter 8 Mind on Statistics Chapter 8 Sections 8.1-8.2 Questions 1 to 4: For each situation, decide if the random variable described is a discrete random variable or a continuous random variable. 1. Random variable

More information

A New Interpretation of Information Rate

A New Interpretation of Information Rate A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes

More information

Texas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows:

Texas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows: Texas Hold em Poker is one of the most popular card games, especially among betting games. While poker is played in a multitude of variations, Texas Hold em is the version played most often at casinos

More information

You flip a fair coin four times, what is the probability that you obtain three heads.

You flip a fair coin four times, what is the probability that you obtain three heads. Handout 4: Binomial Distribution Reading Assignment: Chapter 5 In the previous handout, we looked at continuous random variables and calculating probabilities and percentiles for those type of variables.

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

MAS108 Probability I

MAS108 Probability I 1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper

More information

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0

More information

Solution Let us regress percentage of games versus total payroll.

Solution Let us regress percentage of games versus total payroll. Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

More information