Basic Descriptive Statistics & Probability Distributions
- Ralf Montgomery
- 8 years ago
Transcription
1 Basic Descriptive Statistics & Probability Distributions. Scott Oser, Lecture #2, Physics 509
2 Outline
Last time we discussed the meaning of probability, did a few warmups, and were introduced to Bayes' theorem. Today we cover more basics:
- Basic descriptive statistics
- Covariance and correlation
- Properties of the Gaussian distribution
- The binomial distribution
- Application of binomial distributions to sports betting
- The multinomial distribution
3 Basic Descriptive Statistics
[Figure: "What is this distribution?"]
Often the probability distribution for a quantity is unknown. You may be able to sample it with finite statistics, however. Basic descriptive statistics is the procedure of encoding various properties of the distribution in a few numbers.
4 The Centre of the Data: Mean, Median, & Mode
Mean of a data set: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Mean of a PDF = expectation value of x: $\mu = \langle x \rangle = \int dx\, P(x)\, x$
Median: the point with 50% probability above and 50% below. (If a tie, use an average of the tied values.) Less sensitive to tails!
Mode: the most likely value.
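The three measures of centre can be compared on a small sample. The sketch below is only an illustration: the data values are made up, and the single large entry is there to show the claim that the median is less sensitive to tails than the mean.

```python
import statistics

# Hypothetical sample; the extra entry in `with_outlier` is a deliberate tail event.
sample = [2.0, 3.0, 3.0, 4.0, 5.0]
with_outlier = sample + [100.0]

mean_clean = statistics.mean(sample)          # (1/N) * sum of x_i
median_clean = statistics.median(sample)      # middle value of the sorted data
mode_clean = statistics.mode(sample)          # most frequent value

mean_out = statistics.mean(with_outlier)      # pulled strongly toward the outlier
median_out = statistics.median(with_outlier)  # average of the two middle values

print("without outlier:", mean_clean, median_clean, mode_clean)
print("with outlier:   ", mean_out, median_out)
```

One tail event moves the mean from 3.4 to 19.5, while the median only shifts from 3.0 to 3.5.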
5 Variance V & Standard Deviation (a.k.a. RMS)
Variance of a distribution:
$V(x) = \sigma^2 = \int dx\, P(x)\,(x-\mu)^2$
$V(x) = \int dx\, P(x)\, x^2 - 2\mu \int dx\, P(x)\, x + \mu^2 \int dx\, P(x) = \langle x^2 \rangle - \mu^2 = \langle x^2 \rangle - \langle x \rangle^2$
Variance of a data sample (regrettably has the same notation as variance of a distribution, so be careful!):
$V(x) = \sigma^2 = \frac{1}{N}\sum_i (x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$
An important point we'll return to: the above formula underestimates the variance of the underlying distribution, since it uses the mean calculated from the data instead of the true mean of the true distribution.
$V(x) = \sigma^2 = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$ : this is unbiased if you must estimate the mean from the data.
$V(x) = \sigma^2 = \frac{1}{N}\sum_i (x_i - \mu)^2$ : use this if you know the true mean of the underlying distribution.
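A short numerical check of the sample-variance formulas may help. This is a sketch on made-up data: it computes the 1/N form two ways (directly, and as the mean of squares minus the squared mean) and then the 1/(N-1) form. The standard-library calls `statistics.pvariance` and `statistics.variance` are included only as a cross-check.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n

# 1/N form, using the mean estimated from the same data (biased low).
var_biased = sum((x - mean) ** 2 for x in data) / n
# Equivalent shortcut: mean of squares minus square of mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2
# 1/(N-1) form, unbiased when the mean is estimated from the data.
var_unbiased = sum((x - mean) ** 2 for x in data) / (n - 1)

print(var_biased, var_shortcut, var_unbiased)
print(statistics.pvariance(data), statistics.variance(data))
```

For this sample the 1/N form gives 4 and the 1/(N-1) form gives 32/7, about 4.57. The gap shrinks as N grows.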
6 FWHM & Quartiles/Percentiles
FWHM = Full Width Half Max. It means what it sounds like: measure across the width of a distribution at the point where $P(x) = \frac{1}{2} P_{max}$. For Gaussian distributions, FWHM = 2.35σ.
Quartiles, percentiles, and even the median are rank statistics. Sort the data from lowest to highest. The median is the point where 50% of data are above and 50% are below. The quartile points are those at which 25%, 50%, and 75% of the data are below that point. You can also extend this to percentile rank, just like on a GRE exam.
FWHM or some other width parameter, such as (75th-percentile data point) - (25th-percentile data point), is often more robust than the RMS, which is more sensitive to events on the tails.
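The Gaussian FWHM factor can be checked directly. The sketch below uses `statistics.NormalDist` from the Python standard library. It evaluates the exact factor $2\sqrt{2\ln 2}$, confirms that the density at half that width is half the peak, and also computes the 75%-25% interquartile width of a unit Gaussian for comparison. The variable names are illustrative.

```python
import math
from statistics import NormalDist

unit = NormalDist(mu=0.0, sigma=1.0)

# FWHM of a Gaussian in units of sigma: 2*sqrt(2 ln 2), about 2.35.
fwhm_factor = 2.0 * math.sqrt(2.0 * math.log(2.0))

# Density at half the FWHM, relative to the peak; should be one half.
half_ratio = unit.pdf(fwhm_factor / 2.0) / unit.pdf(0.0)

# Interquartile width (75th minus 25th percentile) in units of sigma.
iqr_factor = unit.inv_cdf(0.75) - unit.inv_cdf(0.25)

print(f"FWHM = {fwhm_factor:.4f} sigma, P(half width)/P(peak) = {half_ratio:.4f}")
print(f"IQR  = {iqr_factor:.4f} sigma")
```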
7 Higher Moments
Of course you can calculate the r-th moment of a distribution if you really want to. For example, the third central moment is called the skew, and is sensitive to the asymmetry of the distribution (exact definition may vary; here's a unitless definition):
skew = $\gamma = \frac{1}{N\sigma^3}\sum_i (x_i - \bar{x})^3$
Kurtosis (or curtosis) is the fourth central moment, with varying choices of normalizations. For fun you are welcome to look up the words leptokurtotic and platykurtotic, but since I speak Greek I don't have to.
Warning: Not every distribution has well-defined moments. The integral or sum will sometimes not converge!
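The unitless skew can be sketched in a few lines. The function below follows the definition with σ taken as the 1/N sample RMS, which is an assumption, since the slide notes that conventions vary. The two data sets are made up to show a symmetric case and a right-tailed case.

```python
import math

def skew(data):
    """Unitless skew: (1 / (N sigma^3)) * sum (x_i - mean)^3, with 1/N sigma."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum((x - mean) ** 3 for x in data) / (n * sigma ** 3)

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]     # mirror-symmetric about 3
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]  # long tail toward large values

skew_symmetric = skew(symmetric)
skew_right = skew(right_tailed)
print(skew_symmetric, skew_right)
```

A symmetric sample gives zero skew, and a tail toward large values gives positive skew.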
8 A bad distribution: the Cauchy distribution
Consider the Cauchy, or Breit-Wigner, distribution. Also called a Lorentzian. It is characterized by its centroid M and its FWHM Γ.
$P(x; M, \Gamma) = \frac{1}{\pi}\,\frac{\Gamma/2}{(x-M)^2 + (\Gamma/2)^2}$
A Cauchy distribution has infinite variance, and its higher moments do not converge either!
Unfortunately the Cauchy distribution actually describes the mass peak of a particle, or the width of a spectral line, so this distribution actually occurs!
[Figure: Cauchy (black) vs. Gaussian (red)]
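Two properties of the Cauchy form can be checked numerically. In this sketch, Γ is the full width at half maximum, so the density at M ± Γ/2 should be half the peak. The truncated second moment uses a closed form I derived for the range [M-L, M+L], which is not from the lecture, and it should grow without bound as L increases. The function names and parameter values are illustrative.

```python
import math

def cauchy_pdf(x, centroid, fwhm):
    """Cauchy (Breit-Wigner) density with centroid M and full width Gamma."""
    half = fwhm / 2.0
    return (1.0 / math.pi) * half / ((x - centroid) ** 2 + half ** 2)

def truncated_second_moment(cutoff, fwhm):
    """Integral of (x-M)^2 P(x) over [M-cutoff, M+cutoff] (closed form)."""
    a = fwhm / 2.0
    return (a / math.pi) * (2.0 * cutoff - 2.0 * a * math.atan(cutoff / a))

M, gamma = 0.0, 2.0
peak = cauchy_pdf(M, M, gamma)
at_half_width = cauchy_pdf(M + gamma / 2.0, M, gamma)

# The "variance" keeps growing roughly linearly with the cutoff.
for cutoff in (10.0, 100.0, 1000.0):
    print(cutoff, truncated_second_moment(cutoff, gamma))
```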
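9 Covariance & Correlation
The covariance between two variables is defined by:
$\mathrm{cov}(x,y) = \langle (x-\mu_x)(y-\mu_y) \rangle = \langle xy \rangle - \langle x \rangle \langle y \rangle$
This is the most useful thing they never tell you in most lab courses! Note that cov(x,x) = V(x).
The correlation coefficient is a unitless version of the same thing:
$\rho = \frac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y}$
If x and y are independent variables (P(x,y) = P(x)P(y)), then
$\mathrm{cov}(x,y) = \int dx\,dy\,P(x,y)\,xy - \left(\int dx\,dy\,P(x,y)\,x\right)\left(\int dx\,dy\,P(x,y)\,y\right) = \int dx\,P(x)\,x \int dy\,P(y)\,y - \left(\int dx\,P(x)\,x\right)\left(\int dy\,P(y)\,y\right) = 0$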
10 More on Covariance
[Figure: correlation coefficients for some simulated data sets.]
Note the bottom right: while independent variables must have zero correlation, the reverse is not true! Correlation is important because it is part of the error propagation equation, as we'll see.
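The point that zero correlation does not imply independence can be seen on a tiny data set. The sketch below uses the definitions cov(x,y) = ⟨xy⟩ - ⟨x⟩⟨y⟩ and ρ = cov(x,y)/(σ_x σ_y). The data are made up: one set is exactly linear in x, and the other is y = x² on points symmetric about zero, which depends fully on x yet has zero correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """cov(x, y) = <xy> - <x><y>, averaged over the sample."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def correlation(xs, ys):
    """Unitless correlation coefficient rho = cov(x, y) / (sigma_x sigma_y)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * x + 1.0 for x in xs]   # perfectly correlated with x
parabola = [x * x for x in xs]         # fully determined by x, yet uncorrelated

rho_linear = correlation(xs, linear)
rho_parabola = correlation(xs, parabola)
print(rho_linear, rho_parabola)
```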
11 Variance and Covariance of Linear Combinations of Variables
Suppose we have two random variables X and Y (not necessarily independent), and that we know cov(X,Y). Consider the linear combinations W = aX + bY and Z = cX + dY. It can be shown that
cov(W,Z) = cov(aX+bY, cX+dY)
= cov(aX,cX) + cov(aX,dY) + cov(bY,cX) + cov(bY,dY)
= ac cov(X,X) + (ad + bc) cov(X,Y) + bd cov(Y,Y)
= ac V(X) + bd V(Y) + (ad + bc) cov(X,Y)
Special case is V(X+Y):
V(X+Y) = cov(X+Y, X+Y) = V(X) + V(Y) + 2 cov(X,Y)
Very special case: the variance of the sum of independent random variables is the sum of their individual variances!
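The linear-combination identity holds exactly for sample covariances too, so it can be verified on any data. In this sketch the data and the coefficients a, b, c, d are arbitrary illustrative values.

```python
def mean(values):
    return sum(values) / len(values)

def cov(us, vs):
    """Sample covariance <uv> - <u><v>; cov(u, u) is the variance V(u)."""
    return mean([u * v for u, v in zip(us, vs)]) - mean(us) * mean(vs)

xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 2.0]
a, b, c, d = 2.0, -1.0, 0.5, 3.0

ws = [a * x + b * y for x, y in zip(xs, ys)]  # W = aX + bY
zs = [c * x + d * y for x, y in zip(xs, ys)]  # Z = cX + dY

lhs = cov(ws, zs)
rhs = a * c * cov(xs, xs) + b * d * cov(ys, ys) + (a * d + b * c) * cov(xs, ys)

# Special case: V(X+Y) = V(X) + V(Y) + 2 cov(X, Y)
sums = [x + y for x, y in zip(xs, ys)]
var_sum = cov(sums, sums)
var_sum_formula = cov(xs, xs) + cov(ys, ys) + 2.0 * cov(xs, ys)

print(lhs, rhs)
print(var_sum, var_sum_formula)
```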
12 Gaussian Distributions
By far the most useful distribution is the Gaussian (normal) distribution:
$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Mean = μ, Variance = σ². Note that width scales with σ.
Area out on tails is important: use lookup tables or the cumulative distribution function. In the plot, the red area (>2σ) is 2.3%.
- 68.27% of area within 1σ
- 95.45% of area within 2σ
- 99.73% of area within 3σ
- 90% of area within 1.645σ
- 95% of area within 1.960σ
- 99% of area within 2.576σ
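Instead of a lookup table, the tail and central areas can be read off the cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library, and the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

unit = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Fraction of a Gaussian's area within +/- k sigma of the mean."""
    return unit.cdf(k) - unit.cdf(-k)

# One-sided area above +2 sigma (by symmetry, equal to the area below -2 sigma).
upper_tail_2sigma = unit.cdf(-2.0)

# Half-width, in sigma, of the central interval holding 90% of the area.
k_for_90_percent = unit.inv_cdf(0.95)

for k in (1.0, 2.0, 3.0):
    print(f"within {k:.0f} sigma: {100 * area_within(k):.2f}%")
print(f"area above 2 sigma: {100 * upper_tail_2sigma:.2f}%")
print(f"90% of area within {k_for_90_percent:.3f} sigma")
```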
13 Why are Gaussian distributions so critical?
- They occur very commonly. The reason is that the average of several independent random variables often approaches a Gaussian distribution in the limit of large N.
- Nice mathematical properties: infinitely differentiable, symmetric. Sum or difference of two Gaussian variables is always itself Gaussian in its distribution.
- Many complicated formulas simplify to linear algebra, or even simpler, if all variables have Gaussian distributions.
- Gaussian distribution is often used as a shorthand for discussing probabilities. A "5 sigma result" means a result with a chance probability that is the same as the tail area of a unit Gaussian: $2\int_5^{\infty} dt\, P(t;\, \mu=0, \sigma=1)$. This way of speaking is used even for non-Gaussian distributions!
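The "n sigma" shorthand can be turned into a probability with the unit Gaussian's cumulative distribution function. The sketch below computes the two-sided tail area $2\int_n^\infty dt\,P(t; 0, 1)$ using `statistics.NormalDist`. The function name is illustrative.

```python
from statistics import NormalDist

unit = NormalDist(mu=0.0, sigma=1.0)

def two_sided_tail(n_sigma):
    """Chance probability of a fluctuation at least n_sigma from the mean, either sign."""
    # cdf(-n) is the lower tail; by symmetry the upper tail is the same size.
    return 2.0 * unit.cdf(-n_sigma)

three_sigma = two_sided_tail(3.0)
five_sigma = two_sided_tail(5.0)
print(f"3 sigma: {three_sigma:.3e}")
print(f"5 sigma: {five_sigma:.3e}")
```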
14 Why you should be very careful with Gaussians...
The major danger of Gaussians is that they are overused. Although many distributions are approximately Gaussian, they often have long non-Gaussian tails. While 99% of the time a Gaussian distribution will correctly model your data, many foul-ups result from that other 1%. It's usually good practice to simulate your data to see if the distributions of quantities you think are Gaussian really follow a Gaussian distribution. Common example: the ratio of two numbers with Gaussian distributions is itself often not very Gaussian (although in certain limits it may be).
15 A slightly non-trivial example
Two measurements (X & Y) are drawn from two separate normal distributions. The first distribution has mean=5 & RMS=2. The second has mean=3 & RMS=1. The correlation coefficient of the two distributions is ρ = -0.5. What is the distribution of the sum Z=X+Y?
16 A slightly non-trivial example (continued)
First, recognize that the sum of two Gaussians is itself Gaussian, even if there is a correlation between the two. To see this, imagine that we drew two truly independent Gaussian random variables X and W. Then we could form a linear combination Y = aX + bW. Y would clearly be Gaussian, although correlated with X. Then Z = X + Y = X + aX + bW = (a+1)X + bW is itself the sum of two truly independent Gaussian variables. So Z must be a Gaussian.
17 A slightly non-trivial example (continued)
Now, recognizing that Z is Gaussian, all we need to figure out are its mean and RMS. First the mean:
$\langle X+Y \rangle = \int dX\,dY\,P(X,Y)\,(X+Y) = \int dX\,dY\,P(X,Y)\,X + \int dX\,dY\,P(X,Y)\,Y = \langle X \rangle + \langle Y \rangle$
This is just equal to 5 + 3 = 8.
18 A slightly non-trivial example (continued)
Now for the RMS. Use V(Z) = cov(Z,Z) = cov(X+Y, X+Y):
$V(Z) = \mathrm{cov}(X,X) + 2\,\mathrm{cov}(X,Y) + \mathrm{cov}(Y,Y) = \sigma_x^2 + 2\sigma_x\sigma_y\rho + \sigma_y^2 = (2)(2) + 2(2)(1)(-0.5) + (1)(1) = 3$
So Z is a Gaussian with mean = 8 and RMS σ = sqrt(3) ≈ 1.73.
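The arithmetic of this example (X with mean 5 and RMS 2, Y with mean 3 and RMS 1, correlation -0.5) can be written out as a few lines of code. This is only a restatement of the formulas above with illustrative variable names.

```python
import math

mu_x, sigma_x = 5.0, 2.0   # first measurement: mean and RMS
mu_y, sigma_y = 3.0, 1.0   # second measurement: mean and RMS
rho = -0.5                 # correlation coefficient between X and Y

mean_z = mu_x + mu_y                 # means always add
cov_xy = rho * sigma_x * sigma_y     # cov(X, Y) = rho * sigma_x * sigma_y
var_z = sigma_x ** 2 + 2.0 * cov_xy + sigma_y ** 2
rms_z = math.sqrt(var_z)

print(f"Z = X + Y has mean {mean_z}, variance {var_z}, RMS {rms_z:.3f}")
```

With ρ = 0 the variance would be 5. The negative correlation reduces it to 3.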
19 Binomial Distributions
Many outcomes are binary: yes/no, heads/tails, etc.
Ex. You flip N unbalanced coins. Each coin has probability p of landing heads. What is the probability that you get m heads (and N-m tails)?
The binomial distribution:
$P(m \mid p, N) = p^m\,(1-p)^{N-m}\,\frac{N!}{m!\,(N-m)!}$
First term: probability of m coins all getting heads.
Second term: probability of N-m coins all getting tails.
Third term: number of different ways to pick m different coins from a collection of N total to be heads.
20 Binomial distributions
$P(m \mid p, N) = p^m\,(1-p)^{N-m}\,\frac{N!}{m!\,(N-m)!}$
Mean = Np
Variance = Np(1-p)
Notice that the mean and variance both scale linearly with N. This is understandable: flipping N coins is the sum of N independent binomial variables.
When N gets big, the distribution looks increasingly Gaussian!
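The binomial formula and its mean and variance can be checked by direct summation. The sketch below uses `math.comb` for the combinatorial factor, and the values N = 10 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb

def binomial_pmf(m, p, n):
    """P(m | p, N): probability of exactly m heads in n flips."""
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)

n, p = 10, 0.3
pmf = [binomial_pmf(m, p, n) for m in range(n + 1)]

total = sum(pmf)                                  # normalisation, should be 1
mean = sum(m * pm for m, pm in enumerate(pmf))    # should equal N p
variance = sum(m * m * pm for m, pm in enumerate(pmf)) - mean ** 2  # N p (1-p)

print(total, mean, variance)
```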
21 But a binomial distribution isn't a Gaussian!
Gaussian approximation fails out on the tails...
22 More on the binomial distribution
In the limit of large Np, the Gaussian approximation is decent so long as P(m=0) ≈ P(m=N) ≈ 0, provided you don't care much about tails.
Beware a common error: $\sigma = \sqrt{Np(1-p)}$, not $\sigma = \sqrt{m} = \sqrt{Np}$. The latter is only true if p ≪ 1. The error is not always just the simple square root of the number of entries!
Use a binomial distribution to model most processes with two outcomes:
- Detection efficiency (either we detect or we don't)
- Cut rejection
- Win-loss records (although beware correlations between teams that play in the same league)
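The size of the "square root of the number of entries" error depends strongly on p. The sketch below compares the two expressions for a hypothetical detector with N = 1000 trials, once at high efficiency (p = 0.95) and once for a rare process (p = 0.001). The numbers are made up for illustration.

```python
import math

def binomial_sigma(n, p):
    """Correct binomial spread: sqrt(N p (1 - p))."""
    return math.sqrt(n * p * (1.0 - p))

def naive_sigma(n, p):
    """Common shortcut sqrt(N p); only valid when p << 1."""
    return math.sqrt(n * p)

n = 1000
high_eff = (binomial_sigma(n, 0.95), naive_sigma(n, 0.95))
rare = (binomial_sigma(n, 0.001), naive_sigma(n, 0.001))

print("p = 0.95 :", high_eff)   # shortcut overestimates the spread several-fold
print("p = 0.001:", rare)       # the two agree closely
```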
23 An example from the world of sports...
Consider a best-of-seven series: the first team to win four games takes the prize. We have a model which predicts that Team A is favoured in any game with p=0.6. What is the probability that A wins the series? How could we approach this problem?
24 Best of 7 series: brute force
Easiest approach may be simply to list the possibilities:
A. Win in 4 straight games. Probability = $p^4$
B. Win in 5 games. Four choices for which game the team gets to lose. Probability = $4p^4(1-p)$
C. Win in 6 games. Choose 2 of the previous five games to lose. Probability = $C(5,2)\,p^4(1-p)^2 = 10p^4(1-p)^2$
D. Win in 7 games. Choose 3 of the previous six games to lose. Probability = $C(6,3)\,p^4(1-p)^3 = 20p^4(1-p)^3$
$\mathrm{Prob}(p) = p^4\left[1 + 4(1-p) + 10(1-p)^2 + 20(1-p)^3\right]$
25 Best of 7 series: outcomes
$\mathrm{Prob}(p) = p^4\left[1 + 4(1-p) + 10(1-p)^2 + 20(1-p)^3\right]$
Symmetry evident between p and 1-p, which makes good logical sense.
For p=0.6, probability of series win is only 71%.
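The brute-force enumeration can be written as a short function. This sketch sums, over the number of games lost, the number of ways to place those losses before the clinching win. The function name and the `wins_needed` parameter are illustrative generalisations, not part of the lecture.

```python
from math import comb

def series_win_probability(p, wins_needed=4):
    """Probability of winning a first-to-`wins_needed` series with per-game win chance p."""
    q = 1.0 - p
    total = 0.0
    for losses in range(wins_needed):
        # The last game is a win; the losses fall among the games before it.
        ways = comb(wins_needed - 1 + losses, losses)
        total += ways * p ** wins_needed * q ** losses
    return total

prob_favourite = series_win_probability(0.6)
prob_underdog = series_win_probability(0.4)
print(f"p = 0.6: {prob_favourite:.4f}")
print(f"p = 0.4: {prob_underdog:.4f}")
```

The two printed values sum to 1, which is the symmetry between p and 1-p: one of the two teams must win the series.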
26 Best of 7 series: online betting studies
Efficient market hypothesis: if a market mis-estimates a risk, smart investors will figure this out and bet accordingly, driving the odds back to the correct value.
There is significant evidence that this hypothesis (almost) holds in many real-life markets. See A Random Walk Down Wall Street for details.*
Does this work for online sports betting?
* Warning: reading this book may endanger your career in physics by getting you interested in quantitative analysis of financial markets.
27 Best of 7 series: online betting studies
I got interested in this during the 2006 baseball playoffs, as my beloved Cardinals came very close to collapsing entirely, yet went on to win the World Series.
I used a coin flip model to predict series odds:
- All games treated as independent, with equal probability.
- In simplest case, assume p=0.5.
- More complicated case: using Bill James' "Pythagorean Theorem" to predict winning percentage of matchup:
$p = \frac{(\text{Runs Scored})^2}{(\text{Runs Scored})^2 + (\text{Runs Allowed})^2}$
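The Pythagorean formula is a one-liner in code. The sketch below implements only the expression as written on the slide. How the lecture turned it into a per-matchup p is not stated, so that step is left out. The run totals are made up.

```python
def pythagorean_win_fraction(runs_scored, runs_allowed):
    """Bill James' estimate: RS^2 / (RS^2 + RA^2)."""
    rs2 = runs_scored ** 2
    return rs2 / (rs2 + runs_allowed ** 2)

# Hypothetical season totals for illustration only.
p_example = pythagorean_win_fraction(800, 700)
p_even = pythagorean_win_fraction(500, 500)
print(f"800 scored, 700 allowed: p = {p_example:.3f}")
print(f"500 scored, 500 allowed: p = {p_even:.3f}")
```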
28 My brother is stupid. Younger brothers always are.
He objected to my coin flip model: "Assigning 50/50 odds is ludicrous when you know the Astros will start Clemens. If you want to estimate p, you should only look at recent records to estimate odds."
How dare he deny my math! But the proof is in the pudding...
29 What do the markets say?
Odds for St. Louis to win NL Central title (going into final weekend):
- Coin flip model says 74.6%. Betting market said 74%.
- After St. Louis loses some ground to Houston, coin flip model says 59%. The betting markets predicted 61%. My brother's recency prior for p predicts 45%.
- Going into next to last day of season, my coin flip model says 89% for Cards. Betting market is mixed: odds for Cards to win are all over the map, but odds for Houston to win are right at 11%. Opportunity for arbitrage.
- Last day: coin flip and markets both predict ~93%.
30 In the middle of the first round of playoffs

| Matchup | Coin Flip | Betting Markets |
|---|---|---|
| St Louis over San Diego | 68.8% | 66.1% |
| Dodgers over Mets | 50.0% | 44.6% |
| Twins over A's | 31.3% | 36.0% |
| Yanks over Detroit | 68.8% | 85.5% |

One way to view the betting market odds is actually as an estimator of the value of p for a matchup. For example, the market felt (wrongly) that Detroit was badly overmatched.
In the end, CARDS WIN!
31 Negative binomial distribution
In a regular binomial distribution, you decide ahead of time how many times you'll flip the coin, and calculate the probability of getting k heads. In the negative binomial distribution, you decide how many heads you want to get, then calculate the probability that you have to flip the coin N times before getting that many heads. This gives you a probability distribution for N:
$P(N \mid k, p) = \binom{N-1}{k-1}\, p^k\, (1-p)^{N-k}$
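The negative binomial form connects directly to the best-of-seven problem: the chance that the fourth win arrives on game N, summed over N = 4..7, is the series win probability. The sketch below checks this for p = 0.6. The function name is illustrative.

```python
from math import comb

def negative_binomial_pmf(n_flips, k, p):
    """P(N | k, p): probability that the k-th head arrives exactly on flip N."""
    if n_flips < k:
        return 0.0
    return comb(n_flips - 1, k - 1) * p ** k * (1.0 - p) ** (n_flips - k)

p, k = 0.6, 4
# Team A clinches on game 4, 5, 6 or 7 of a best-of-seven series.
series_win = sum(negative_binomial_pmf(n, k, p) for n in range(4, 8))

# Summed over many N the distribution should approach total probability 1.
normalisation = sum(negative_binomial_pmf(n, k, p) for n in range(k, 200))

print(f"series win probability: {series_win:.4f}")
print(f"normalisation: {normalisation:.6f}")
```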
32 Multinomial distribution
We can generalize a binomial distribution to the case where there are more than two possible outcomes. Suppose there are k possible outcomes, and we do N trials. Let $n_i$ be the number of times that the i-th outcome comes up, and let $p_i$ be the probability of getting outcome i in one trial. The probability of getting a certain distribution of $n_i$ is then:
$P(n_1, n_2, \ldots, n_k \mid p_1, \ldots, p_k) = \frac{N!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} \cdots p_k^{n_k}$
Note that there are important constraints on the parameters:
$\sum_{i=1}^{k} p_i = 1 \qquad \sum_{i=1}^{k} n_i = N$
33 What is the multinomial distribution good for?
Any problem in which there are several discrete outcomes (binomial distribution is a special case).
Note that unlike the binomial distribution, which basically predicts one quantity (the number of heads; you get the number of tails for free), the multinomial distribution is a joint probability distribution for several variables (the various $n_i$, of which all but one are free, since they must sum to N). If you care about just one of these, you can marginalize over the others (sum them over all of their possible values) to get the probability distribution for the one you care about. This obviously will have a binomial distribution.
A common application: binned data! If you sample independent trials from a distribution and bin the results, the numbers you predict for each bin follow the multinomial distribution.
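The marginalization claim can be tested numerically for a small case. The sketch below evaluates the multinomial probability for three outcomes, sums over the second count (the third is fixed by the constraint that the counts add to N), and compares the result with the binomial distribution for the first count alone. The probabilities and N = 5 are arbitrary illustrative values.

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """N! / (n_1! ... n_k!) * p_1^n_1 ... p_k^n_k, with N = sum(counts)."""
    coefficient = factorial(sum(counts))
    for n_i in counts:
        coefficient //= factorial(n_i)  # stays an exact integer at each step
    probability = float(coefficient)
    for n_i, p_i in zip(counts, probs):
        probability *= p_i ** n_i
    return probability

def marginal_first(n1, total, probs):
    """Marginal P(n1) for three outcomes: sum over n2, with n3 = total - n1 - n2."""
    return sum(
        multinomial_pmf((n1, n2, total - n1 - n2), probs)
        for n2 in range(total - n1 + 1)
    )

def binomial_pmf(m, p, n):
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)

probs = (0.2, 0.3, 0.5)
total = 5
marginals = [marginal_first(n1, total, probs) for n1 in range(total + 1)]
binomials = [binomial_pmf(n1, probs[0], total) for n1 in range(total + 1)]

for n1, (m, b) in enumerate(zip(marginals, binomials)):
    print(n1, round(m, 6), round(b, 6))
```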
34 Dealing with binned data
Very often you're going to deal with binned data. Maybe there are too many individual data points to handle efficiently. Maybe you binned it to make a pretty plot, then want to fit a function to the plot.
Some gotchas:
- Nothing in the laws of statistics demands equal binning. Consider binning with equal statistics per bin.
- Beware bins with few data points. Many statistical tests implicitly assume Gaussian errors, which won't hold for small numbers. General rule of thumb: rebin until every bin has >5 events.
- Always remember that binning throws away information. Don't do it unless you must.
- Try to make bin size smaller than any relevant feature in the data. If statistics don't permit this, then you shouldn't be binning, at least for that part of the distribution.
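One way to get "equal statistics per bin" is to place the bin edges at quantiles of the data. The sketch below is one possible approach, not a method prescribed by the lecture. It uses `statistics.quantiles` for the interior edges and a simple counting helper. The skewed data set and the helper names are made up for illustration.

```python
import bisect
import statistics

def equal_statistics_edges(data, n_bins):
    """Bin edges placed at quantiles so each bin holds about the same number of points."""
    interior = statistics.quantiles(data, n=n_bins)  # n_bins - 1 cut points
    return [min(data)] + interior + [max(data)]

def bin_counts(data, edges):
    """Count the entries falling in each [edge_i, edge_i+1) bin."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        # Clamp so that the maximum value lands in the last bin.
        index = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
        counts[index] += 1
    return counts

# Hypothetical, strongly skewed data: squares of 1..100.
data = [x ** 2 for x in range(1, 101)]

edges = equal_statistics_edges(data, 4)
counts = bin_counts(data, edges)

# For comparison: four equal-width bins over the same range.
width = (max(data) - min(data)) / 4
equal_width_edges = [min(data) + i * width for i in range(5)]
equal_width_counts = bin_counts(data, equal_width_edges)

print("quantile edges:", edges, "counts:", counts)
print("equal-width counts:", equal_width_counts)
```

The quantile edges give the same count in every bin. Equal-width bins pile most of this skewed sample into the first bin and leave the last ones sparse.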
WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook
More informationExperimental Designs (revisited)
Introduction to ANOVA Copyright 2000, 2011, J. Toby Mordkoff Probably, the best way to start thinking about ANOVA is in terms of factors with levels. (I say this because this is how they are described
More informationExample: Find the expected value of the random variable X. X 2 4 6 7 P(X) 0.3 0.2 0.1 0.4
MATH 110 Test Three Outline of Test Material EXPECTED VALUE (8.5) Super easy ones (when the PDF is already given to you as a table and all you need to do is multiply down the columns and add across) Example:
More informationThe Assumption(s) of Normality
The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew
More informationAlgebra I Vocabulary Cards
Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression
More informationThe Binomial Probability Distribution
The Binomial Probability Distribution MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2015 Objectives After this lesson we will be able to: determine whether a probability
More informationMath 431 An Introduction to Probability. Final Exam Solutions
Math 43 An Introduction to Probability Final Eam Solutions. A continuous random variable X has cdf a for 0, F () = for 0 <
More information3: Summary Statistics
3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
More informationSTT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables
Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables Discrete vs. continuous random variables Examples of continuous distributions o Uniform o Exponential o Normal Recall: A random
More informationSection 7C: The Law of Large Numbers
Section 7C: The Law of Large Numbers Example. You flip a coin 00 times. Suppose the coin is fair. How many times would you expect to get heads? tails? One would expect a fair coin to come up heads half
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationQuantitative Methods for Finance
Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain
More informationDef: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.
Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.
More informationChapter 5: Normal Probability Distributions - Solutions
Chapter 5: Normal Probability Distributions - Solutions Note: All areas and z-scores are approximate. Your answers may vary slightly. 5.2 Normal Distributions: Finding Probabilities If you are given that
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationCh. 3.1 # 3, 4, 7, 30, 31, 32
Math Elementary Statistics: A Brief Version, 5/e Bluman Ch. 3. # 3, 4,, 30, 3, 3 Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange. 3) High Temperatures The reported high temperatures
More informationContemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific
Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific Name: The point value of each problem is in the left-hand margin. You
More informationLecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationUsing Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
More informationIntro to Data Analysis, Economic Statistics and Econometrics
Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development
More informationChapter 5. Discrete Probability Distributions
Chapter 5. Discrete Probability Distributions Chapter Problem: Did Mendel s result from plant hybridization experiments contradicts his theory? 1. Mendel s theory says that when there are two inheritable
More informationA Quick Algebra Review
1. Simplifying Epressions. Solving Equations 3. Problem Solving 4. Inequalities 5. Absolute Values 6. Linear Equations 7. Systems of Equations 8. Laws of Eponents 9. Quadratics 10. Rationals 11. Radicals
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationLecture 3: Continuous distributions, expected value & mean, variance, the normal distribution
Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution 8 October 2007 In this lecture we ll learn the following: 1. how continuous probability distributions differ
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationQuestion: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?
ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the
More informationEconomics 1011a: Intermediate Microeconomics
Lecture 12: More Uncertainty Economics 1011a: Intermediate Microeconomics Lecture 12: More on Uncertainty Thursday, October 23, 2008 Last class we introduced choice under uncertainty. Today we will explore
More informationMind on Statistics. Chapter 8
Mind on Statistics Chapter 8 Sections 8.1-8.2 Questions 1 to 4: For each situation, decide if the random variable described is a discrete random variable or a continuous random variable. 1. Random variable
More informationA New Interpretation of Information Rate
A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes
More informationTexas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows:
Texas Hold em Poker is one of the most popular card games, especially among betting games. While poker is played in a multitude of variations, Texas Hold em is the version played most often at casinos
More informationYou flip a fair coin four times, what is the probability that you obtain three heads.
Handout 4: Binomial Distribution Reading Assignment: Chapter 5 In the previous handout, we looked at continuous random variables and calculating probabilities and percentiles for those type of variables.
More informationCharacteristics of Binomial Distributions
Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation
More informationAMS 5 CHANCE VARIABILITY
AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and
More informationHow To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More informationMAS108 Probability I
1 QUEEN MARY UNIVERSITY OF LONDON 2:30 pm, Thursday 3 May, 2007 Duration: 2 hours MAS108 Probability I Do not start reading the question paper until you are instructed to by the invigilators. The paper
More informationUNIT I: RANDOM VARIABLES PART- A -TWO MARKS
UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0
More informationSolution Let us regress percentage of games versus total payroll.
Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)
More information