the number of organisms in the squares of a haemocytometer? the number of goals scored by a football team in a match?

Poisson Random Variables (Rees)

Examples: What is the distribution of:
- the number of organisms in the squares of a haemocytometer?
- the number of hits on a web site in one hour?
- the number of goals scored by a football team in a match?
- the number of cracks in a rail track?
- the number of sultanas in a slice of fruit cake?

The Poisson distribution often provides a good model for the number of events occurring in time or space. Space can be linear, area or volume.

To decide whether to use the Binomial or the Poisson distribution, consider whether there is a sample size involved (i.e. an upper limit on the number of events). If so, use the Binomial; if not, use the Poisson.

Consider, for example, the number of calls arriving at a telephone exchange. We assume that:
- events which occur in time intervals that do not overlap are independent;
- the underlying rate $\lambda$ ("lambda") at which calls arrive is constant.

Under these conditions, if $Y$ is the random variable denoting the number of calls actually made in one hour, then $Y$ has a Poisson distribution with parameter $\lambda$. We write $Y \sim P(\lambda)$. The probability distribution of $Y$ is

$$\Pr(Y = r) = \frac{e^{-\lambda} \lambda^r}{r!}$$

where $e$ is the well-known mathematical constant ($e \approx 2.718$). Most calculators have an $e^x$ button; this is called the exponential function.

The formula for Poisson probabilities simplifies when $r = 0$ ("no calls"), because $\lambda^0 = 1$ and $0! = 1$. So $\Pr(Y = 0) = e^{-\lambda}$.

Example: The mean number of bacteria in a single cell of a haemocytometer is 2. Let $N$ be the number of bacteria actually observed in a cell. Find:
(i) $\Pr(N = 0)$
(ii) $\Pr(N = 3)$
(iii) $\Pr(N > 2)$

We assume that $N$ has a Poisson distribution with rate parameter 2. Then:

$$\Pr(N = n) = \frac{e^{-\lambda} \lambda^n}{n!} = \frac{e^{-2}\, 2^n}{n!}$$

(i) $\Pr(N = 0) = \frac{e^{-2}\, 2^0}{0!} = e^{-2} = 0.1353$ (the formula simplifies for $N = 0$).

(ii) $\Pr(N = 3) = \frac{e^{-2}\, 2^3}{3!} = \frac{8}{6}\, e^{-2}$. Check that this gives 0.1804.

(iii) We can calculate $\Pr(N > 2)$ as

$$\Pr(N > 2) = 1 - \Pr(N \le 2) = 1 - (p(0) + p(1) + p(2)) = 1 - \left(e^{-2} + \frac{e^{-2}\, 2^1}{1!} + \frac{e^{-2}\, 2^2}{2!}\right) = 1 - (0.1353 + 0.2707 + 0.2707) = 1 - 0.6767 = 0.3233$$

We can also use the NCST tables to calculate Poisson probabilities. Table 2 of Lindley and Scott (pp. 24-32) gives $\Pr(Y \le r)$ for $\lambda$ from 0 up to 20. Using p. 25 gives $\Pr(N \le 2) = 0.6767$ immediately. Rees gives a short version of Table 2 in Table C.2 (4th ed.). Note that Rees uses $m$ instead of $\lambda$, while Lindley & Scott use $\mu$.

Example: Telephone calls arrive in an office at a rate of 5 calls per hour. Find the probability that there are:
(i) exactly 2 calls in one hour;
(ii) exactly 1 call in 15 minutes;
(iii) 10 or fewer calls in 2 hours.

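We use Table 2 (you should check the first two answers by using the formula!).

(i) $X \sim P(5)$: $\Pr(X = 2) = \Pr(X \le 2) - \Pr(X \le 1) = 0.1247 - 0.0404 = 0.0843$ (the formula gives 0.0842; the difference comes from rounding in the tabulated values).

(ii) $X \sim P(5/4)$: $\Pr(X = 1) = \Pr(X \le 1) - \Pr(X = 0) = 0.6446 - 0.2865 = 0.3581$

(iii) $X \sim P(10)$: $\Pr(X \le 10) = 0.5830$

Note how the value of $\lambda$ adjusts to take account of the time interval; for example, if the number of calls in one hour is $P(5)$ then the number of calls in two hours is $P(10)$.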
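The table look-ups above can also be checked with the Poisson formula directly. The following is a minimal sketch in Python; the function names `poisson_pmf` and `poisson_cdf` are illustrative choices, not part of the notes, and only the standard library is used.

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """Pr(Y = r) for Y ~ P(lam), i.e. e^(-lam) * lam^r / r!."""
    return exp(-lam) * lam**r / factorial(r)


def poisson_cdf(r, lam):
    """Pr(Y <= r): the cumulative probability that the tables list."""
    return sum(poisson_pmf(k, lam) for k in range(r + 1))


# Telephone example: calls arrive at a rate of 5 per hour.
p_two_in_one_hour = poisson_pmf(2, 5)        # (i)   rate 5 for one hour
p_one_in_15_min = poisson_pmf(1, 5 / 4)      # (ii)  rate 5/4 for 15 minutes
p_at_most_10_in_2h = poisson_cdf(10, 10)     # (iii) rate 10 for two hours

# Haemocytometer example: mean of 2 bacteria per cell.
p_more_than_two = 1 - poisson_cdf(2, 2)

for label, value in [
    ("Pr(X = 2),  X ~ P(5)", p_two_in_one_hour),
    ("Pr(X = 1),  X ~ P(5/4)", p_one_in_15_min),
    ("Pr(X <= 10), X ~ P(10)", p_at_most_10_in_2h),
    ("Pr(N > 2),  N ~ P(2)", p_more_than_two),
]:
    print(f"{label}: {value:.4f}")
```

Scaling the rate before calling the function mirrors the point made in the notes: the parameter must match the length of the interval being considered.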

Some properties of the Poisson distribution

- The Poisson distribution is always skewed, but the distribution becomes more nearly symmetrical as the rate parameter $\lambda$ increases.
- The expected value of a Poisson distribution is the rate parameter $\lambda$.
- The variance of a Poisson distribution is also the rate parameter $\lambda$. Thus the standard deviation is $\sqrt{\lambda}$.

Note that in relative terms, the spread gets less as the rate increases: the ratio of standard deviation to mean is $\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$.

Let $X$ be the number of accidents per quarter at an accident black spot. Suppose that $X \sim P(4)$. Then the probability distribution of $X$ looks like:

[Figure: probability distribution of $X \sim P(4)$]

Note that any value between 1 and 8 is likely to occur and more extreme values are possible!

The theoretical mean and standard deviation of a Poisson random variable are (Rees 6.12):

Mean = $\lambda$, S.D. = $\sqrt{\lambda}$.

For $X \sim P(4)$, Mean = $\lambda$ = 4 and S.D. = $\sqrt{\lambda}$ = 2.

RECAP: Binomial and Poisson

These two distributions are both used as models for counts. The Binomial is used when there is a clear upper limit $n$ on the number of events recorded; there is no theoretical upper limit for the Poisson.

Binomial: There are $n$ independent trials, each with probability of success $p$.

$$\Pr(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}$$

The mean is $np$ and the s.d. is $\sqrt{np(1 - p)}$.

Poisson: Events occur independently at rate $\lambda$.

$$\Pr(Y = r) = \frac{e^{-\lambda} \lambda^r}{r!}$$

The mean is $\lambda$ and the s.d. is $\sqrt{\lambda}$.
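The means and standard deviations quoted in the recap can be confirmed numerically by summing over the probability function. The sketch below is illustrative only: the helper `mean_and_sd`, the truncation of the Poisson sum at r = 60, and the Binomial values n = 10, p = 0.3 are assumptions made for the example, not values from the notes.

```python
from math import comb, exp, factorial, sqrt


def poisson_pmf(r, lam):
    """Pr(Y = r) for Y ~ P(lam)."""
    return exp(-lam) * lam**r / factorial(r)


def binomial_pmf(r, n, p):
    """Pr(X = r) for n independent trials with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def mean_and_sd(probabilities):
    """Mean and s.d. of a count whose probabilities are listed for r = 0, 1, 2, ..."""
    mean = sum(r * pr for r, pr in enumerate(probabilities))
    variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probabilities))
    return mean, sqrt(variance)


# X ~ P(4): there is no upper limit, so the sum is cut off at r = 60,
# beyond which the remaining probability is negligible.
poisson_mean, poisson_sd = mean_and_sd([poisson_pmf(r, 4) for r in range(61)])

# Binomial with illustrative values n = 10, p = 0.3 (upper limit is n).
n, p = 10, 0.3
binomial_mean, binomial_sd = mean_and_sd([binomial_pmf(r, n, p) for r in range(n + 1)])

print(f"Poisson(4):       mean {poisson_mean:.4f}, s.d. {poisson_sd:.4f}")
print(f"Binomial(10,0.3): mean {binomial_mean:.4f}, s.d. {binomial_sd:.4f}")
```

The Binomial sum stops at n because of the upper limit on the count, whereas the Poisson sum has to be truncated by hand; this is the practical face of the distinction drawn in the recap.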

Normal Random Variables (Rees)

Example: Consider relative frequency histograms of heights of classes of students. The sizes of the classes are (a) 50 (b) 250 (c) 1000 (d) …

As the class size increases, we can reduce the width of a bar in the histogram. We can imagine that with an infinite class size the histogram could be made smooth. The resulting smooth curve is often very similar to a particular form: the Normal distribution.

Here are some other examples of variables that could have Normal distributions:
- the weight of a bag of cement;
- the time taken to walk to the pub;
- the volume of beer in your glass.

We use the Normal distribution as a model for many naturally occurring variables. For example, we might use it to calculate the proportion of weights or lengths that fall between two limits.

Notation: A Normal distribution is determined by its mean, $\mu$, and its standard deviation, $\sigma$. If $X$ has a Normal distribution with mean $\mu$ and standard deviation $\sigma$ we write $X \sim N(\mu, \sigma^2)$.

Beware!! The second parameter is the variance and not the standard deviation. Thus, $N(5, 9)$ refers to a Normal distribution with mean 5 and standard deviation 3.

First, a special Normal distribution.

Definition: We say that $Z$ has the standard Normal distribution if $Z \sim N(0, 1)$. The mean of $Z$ is 0 and its standard deviation is 1.

Here is what the standard Normal distribution looks like:

[Figure: density of the standard Normal distribution]

Examples of other Normal distributions

[Figure: $N(5, 1)$, mean 5 and standard deviation 1]

[Figure: $N(10, 1)$, mean 10 and standard deviation 1]

[Figure: $N(8, 4)$, mean 8 and standard deviation 2]

Probability Density Functions

These plots of Normal distributions are examples of probability density functions; the name can be abbreviated to density. These are similar to the probability function for a discrete random variable, but there are some important differences. Instead of lumps of probability at certain values, the probability of getting exactly any particular value is zero! Instead we have to consider the probability of being between two values.

The density functions are scaled so that the area under the curve between two values equals the probability of falling between them. This implies that the total area under the curve equals one.

For some continuous random variables, there is an explicit formula for the probabilities. For others we have to use statistical tables or a computer package.
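The link between area and probability can be illustrated numerically. The sketch below assumes the usual formula for the Normal density, which the notes do not state, and approximates areas with the trapezium rule; the function names and the number of steps are arbitrary choices made for the example.

```python
from math import exp, pi, sqrt


def normal_density(x, mean, sd):
    """Height of the Normal density curve at x (standard formula, assumed here)."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def area_under_density(a, b, mean, sd, steps=10_000):
    """Trapezium-rule approximation to the area under the density between a and b."""
    width = (b - a) / steps
    total = 0.5 * (normal_density(a, mean, sd) + normal_density(b, mean, sd))
    total += sum(normal_density(a + i * width, mean, sd) for i in range(1, steps))
    return total * width


# N(8, 4): mean 8, variance 4, so standard deviation 2.
mean, sd = 8, 2
total_area = area_under_density(mean - 10 * sd, mean + 10 * sd, mean, sd)
area_6_to_10 = area_under_density(6, 10, mean, sd)   # Pr(6 < X < 10)
area_at_point = area_under_density(8, 8, mean, sd)   # Pr(X = 8 exactly)

print(f"total area      : {total_area:.4f}")
print(f"area from 6 to 10: {area_6_to_10:.4f}")
print(f"area at a point : {area_at_point:.4f}")
```

The last value shows why the probability of any exact value is zero for a continuous variable: an interval of zero width encloses no area.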

There is no formula to allow the direct calculation of probabilities for a standard Normal distribution. We have to use a computer or tables such as Table 4 of Lindley & Scott.

NCST tables give $\Pr(Z \le z)$ for values of $z \ge 0$. The symbol $\Phi(z)$ is usually used for this probability. This is called the Cumulative Probability Function of the standard Normal distribution. It is the area under the Probability Density Function of the standard Normal distribution to the left of $z$.

Example: If $Z \sim N(0, 1)$, use NCST Table 4 to find:

(i) $\Pr(Z \le 1)$
Tabulated value: $\Pr(Z \le 1) = 0.8413$

(ii) $\Pr(Z > 2)$
Tabulated value: $\Pr(Z \le 2) = 0.9772$
So: $\Pr(Z > 2) = 1 - 0.9772 = 0.0228$

(iii) $\Pr(Z < -1)$
By symmetry: $\Pr(Z < -1) = \Pr(Z > +1) = 1 - \Pr(Z \le 1) = 1 - 0.8413 = 0.1587$

Note how we use symmetry to help us here and how a quick picture keeps us on the right track.

When we have a general Normal $N(\mu, \sigma^2)$, we can still use Table 4, provided that everything is put on a standard scale. This works because of the following properties of the Normal distribution.

If $X$ is Normal with mean $\mu$ and s.d. $\sigma$, and if $a$ and $b$ are constants (with $b > 0$), then
- $X - a$ is Normal with mean $\mu - a$ and s.d. $\sigma$;
- $X / b$ is Normal with mean $\mu / b$ and s.d. $\sigma / b$.

So:

$$Z = \frac{X - \mu}{\sigma}$$

is Normal with mean 0 and s.d. 1.

Definition: If $x$ is a value from a distribution with a mean of $\mu$ and a standard deviation of $\sigma$ then the standard score (or z-score) of $x$ is

$$z = \frac{x - \mu}{\sigma}$$

If $x$ comes from a Normal distribution, $z$ comes from a standard Normal distribution $N(0, 1)$.

In the previous example, the questions referred to values from the standard Normal distribution. How do we find probabilities for $N(\mu, \sigma^2)$? (Rees 7.3 gives some examples.) In each case, the original question is converted to a question about the standard Normal distribution. The tables are then used as before. We may also need to use linear interpolation if the required value is between tabulated values.

Example: The random variable $X \sim N(5, 9)$. Find:

(i) $\Pr(X < 8)$
(ii) $\Pr(X < 3)$
(iii) $\Pr(2 \le X \le 11)$

First, recall that the second parameter (9) is the variance, so the standard deviation is $\sqrt{9} = 3$.

(i) $\Pr(X < 8) = \Pr\left(Z < \frac{8 - 5}{3}\right) = \Pr(Z < 1) = 0.8413$

(ii) $\Pr(X < 3) = \Pr\left(Z < \frac{3 - 5}{3}\right) = \Pr(Z < -0.667) = 1 - \Pr(Z < 0.667) = 1 - \left(\tfrac{1}{3}(0.7454) + \tfrac{2}{3}(0.7486)\right) = 1 - 0.7475 = 0.2525$

Note the use of linear interpolation here to get a more accurate answer: 0.667 lies two-thirds of the way from 0.66 to 0.67, whose tabulated values are 0.7454 and 0.7486.

(iii) $\Pr(2 \le X \le 11) = \Pr\left(\frac{2 - 5}{3} \le Z \le \frac{11 - 5}{3}\right) = \Pr(-1 \le Z \le 2) = \Pr(Z \le 2) - (1 - \Pr(Z \le 1)) = 0.9772 - (1 - 0.8413) = 0.8185$

In each case, the numerical values in the original question are converted to z-scores before the NCST tables are used.

Example: IQ tests are constructed so that the mean is 100 and the standard deviation is 15. What percentage of the population will get a score of more than 120?

$$\Pr(IQ > 120) = \Pr\left(Z > \frac{120 - 100}{15}\right) = \Pr(Z > 1.333) = 1 - \Pr(Z < 1.333) = 1 - 0.9088 = 0.0912 = 9.12\%$$

Sometimes we need to reverse the above process. NCST Table 5 allows us to do this.
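The standardise-then-look-up procedure used in these examples can be sketched in Python. This is an illustration rather than part of the notes: it replaces the table with the error function `math.erf`, using the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, and the names `phi` and `normal_cdf` are hypothetical.

```python
from math import erf, sqrt


def phi(z):
    """Standard Normal cumulative probability Pr(Z <= z), in place of Table 4."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mean, variance):
    """Pr(X <= x) for X ~ N(mean, variance): convert x to a z-score, then use phi."""
    z = (x - mean) / sqrt(variance)   # the second parameter is the variance
    return phi(z)


# X ~ N(5, 9), so the standard deviation is 3.
p_below_8 = normal_cdf(8, 5, 9)
p_below_3 = normal_cdf(3, 5, 9)
p_2_to_11 = normal_cdf(11, 5, 9) - normal_cdf(2, 5, 9)

# IQ ~ N(100, 15^2): proportion scoring more than 120.
p_iq_over_120 = 1 - normal_cdf(120, 100, 15**2)

print(f"Pr(X < 8)        = {p_below_8:.4f}")
print(f"Pr(X < 3)        = {p_below_3:.4f}")
print(f"Pr(2 <= X <= 11) = {p_2_to_11:.4f}")
print(f"Pr(IQ > 120)     = {p_iq_over_120:.4f}")
```

Results computed this way can differ from table-based answers in the fourth decimal place, because the tables round each entry before the subtraction is done.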

Example: What is the IQ score such that only 1% of the population do better?

From NCST Table 5, we have: $\Pr(Z > 2.3263) = 1\%$

The standardisation is reversed by multiplying by the standard deviation and then adding in the mean. So the required IQ score is:

$$IQ = 100 + 2.3263 \times 15 = 134.9$$

Suppose the Normal distribution is used to model a continuous variable and that we wish to find the probability of getting a particular value.

Example: In a certain population, male heights are Normally distributed with mean 170 cm and standard deviation 5 cm. If heights are recorded to the nearest cm, then the probability that an individual is 170 cm should be taken as being the probability of being in the interval (169.5, 170.5). The corresponding z-score interval is (-0.1, 0.1).

From tables, $\Pr(Z < 0.1) = 0.5398$.

So $\Pr(-0.1 < Z < 0.1) = 2(0.5398 - 0.5) = 0.0796$.

The idea that was used in this example is called a Continuity Correction.

Example: Suppose that student female heights have a Normal distribution with mean $\mu = 163$ cm and with standard deviation $\sigma = 6$ cm. Let $X$ be the height of a randomly chosen student. Find the probabilities that:
(i) $X$ is greater than 170 cm;
(ii) $X$ is more than 164 cm and less than 171 cm;
(iii) $X$ is 149 cm or less.

(i) $\Pr(X > 170) = \Pr\left(Z > \frac{170.5 - 163}{6}\right) = \Pr(Z > 1.25) = 1 - \Pr(Z < 1.25) = 1 - 0.8944 = 0.1056$

Note the use of the continuity correction, which assumes that heights are taken to the nearest cm. Heights of 171 or more are included and 170 or less are excluded, so the division is taken to be at 170.5 cm.

(ii) $\Pr(164 < X < 171) = \Pr\left(\frac{164.5 - 163}{6} < Z < \frac{170.5 - 163}{6}\right) = \Pr(0.25 < Z < 1.25) = 0.8944 - 0.5987 = 0.2957$

(iii) $\Pr(X \le 149) = \Pr\left(Z < \frac{149.5 - 163}{6}\right) = \Pr(Z < -2.25) = 1 - \Pr(Z < 2.25) = 1 - 0.9878 = 0.0122$

Example: What is the height such that 10% of female students are taller?

For the standard Normal distribution, NCST Table 5 tells us that 10% of the population are bigger than 1.2816. So the required height is:

$$163 + 1.2816 \times 6 = 170.7 \text{ cm}$$

Models using Discrete and Continuous Distributions

The previous sections have introduced the Binomial, Poisson and Normal distributions. These can all be derived using probability theory from sets of assumptions; we did this for the Binomial distribution.

These are all useful as models for real data and allow us to make predictions. If the process that generated the data matches the assumptions of a distribution, then we can be confident in the predictions. The predictions will also be good if the assumptions are close to reality.

Example: Consider a binary variable. Random sampling with replacement from any population implies that the data will follow a Binomial distribution. Random sampling without replacement from a large population implies that the Binomial distribution will be a good approximation.

A distribution that is chosen empirically can also make useful predictions, for example the proportion of student heights within a range.
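As an illustration of predicting the proportion of student heights within a range, the female-height examples above can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch under the notes' assumptions (mean 163 cm, s.d. 6 cm, heights recorded to the nearest cm); the helper `prob_recorded_between` is a hypothetical name introduced here.

```python
from statistics import NormalDist

# Model for female student heights: mean 163 cm, standard deviation 6 cm.
heights = NormalDist(mu=163, sigma=6)


def prob_recorded_between(dist, low, high):
    """Pr(low <= recorded value <= high) for values recorded to the nearest unit.

    The continuity correction widens the interval by half a unit at each end.
    """
    return dist.cdf(high + 0.5) - dist.cdf(low - 0.5)


# (i) greater than 170 cm: recorded as 171 or more, so the boundary is 170.5.
p_over_170 = 1 - heights.cdf(170.5)

# (ii) more than 164 cm and less than 171 cm: recorded as 165, ..., 170.
p_165_to_170 = prob_recorded_between(heights, 165, 170)

# (iii) 149 cm or less: the boundary is 149.5.
p_149_or_less = heights.cdf(149.5)

# Height exceeded by 10% of students: the reverse look-up (Table 5 in the notes).
height_top_10_percent = heights.inv_cdf(0.90)

print(f"(i)   {p_over_170:.4f}")
print(f"(ii)  {p_165_to_170:.4f}")
print(f"(iii) {p_149_or_less:.4f}")
print(f"10% of students are taller than {height_top_10_percent:.1f} cm")
```

`inv_cdf` plays the role of the reverse table: it returns the value below which the given proportion of the distribution lies, already rescaled by the mean and standard deviation.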

Regression

We are often interested in the relationship between two or more variables. This can arise from surveys in which several variables are measured on each unit or from experiments in which some variables are modified and other variables observed.

Note: Data sets arising from these two types of situation cannot be distinguished in general, but the interpretation is different.

Example: A study for an environmental impact assessment measured the flow rate against the depth at a site on a stream.

[Table: Depth (m) against Flow Rate (m/s)]

Note that the choice of which depths to use was made in advance; that is why they are spaced at fixed intervals.

[Figure: scatter plot of Flow Rate against Depth]

The plot suggests a straight line relationship.

It is often useful to be able to predict the values of one variable from another variable. To do this, we need to formulate a model. The model should allow for the variability that is present. A possible model is:

Flow = $\alpha + \beta \times$ Depth

with variability about the line being independent samples from a Normal distribution with unknown variance $\sigma^2$. More generally:

$$y_i = \alpha + \beta x_i + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma^2)$ and independent. $\epsilon$ is the Greek letter epsilon. This is called a linear regression model.

Notes:
- The model includes two parts: the functional part and the part which models the variability about the function.
- Many of the laws of physics and chemistry started out as empirical observations of this sort. The observed variability was often just measurement error.
- In other sciences, such as biology, there is often variability inherent in the material which is much greater than any errors.

Fitting the Model

There is a general method for fitting models of this type. It is called the Method of Least Squares. This method is optimal for predicting the $y$ variable from the $x$ variable(s) if the model for the variability is as specified above.

To fit the model, we minimise squared deviations in the $y$ direction, i.e. find $\hat{\alpha}$, $\hat{\beta}$ to minimise:

$$\sum_i (y_i - \alpha - \beta x_i)^2$$

Notes:
- At school, this model is often given as $y = mx + c$ and the line is fitted by eye.
- Predicting $x$ from $y$ gives different answers.
- If the $x$ values were chosen, different methods are needed if we wish to predict $x$ from $y$. This is needed in assay systems.

We calculate:

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

Note: $S_{yy} = (n - 1) \times \text{Variance}(y) = \sum (y_i - \bar{y})^2$

Then:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}$$

$$\hat{\alpha} = \frac{1}{n}\left(\sum y_i - \hat{\beta} \sum x_i\right) = \bar{y} - \hat{\beta}\bar{x}$$

Thus the line goes through the point $(\bar{x}, \bar{y})$.

The fitted line $y = \hat{\alpha} + \hat{\beta}x$ is said to be the regression of $Y$ on $X$.

Note: The recommended Casio calculators provide short cut ways of carrying out the calculations.
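The least squares formulas above translate directly into a few lines of code. The following Python sketch uses made-up data purely for illustration (the flow rate measurements are not reproduced here), and the function name `least_squares_line` is a hypothetical choice.

```python
def least_squares_line(xs, ys):
    """Return (alpha_hat, beta_hat) for the regression of y on x.

    Uses S_xx = sum(x^2) - (sum x)^2 / n and S_xy = sum(xy) - (sum x)(sum y) / n.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x**2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    beta_hat = s_xy / s_xx
    alpha_hat = (sum_y - beta_hat * sum_x) / n   # equals y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data, chosen only to demonstrate the calculation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = least_squares_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(f"fitted line: y = {alpha_hat:.3f} + {beta_hat:.3f} x")
# The fitted line passes through the point (x_bar, y_bar).
print(f"fitted value at x_bar: {alpha_hat + beta_hat * x_bar:.3f}, y_bar: {y_bar:.3f}")
```

Swapping the roles of `xs` and `ys` fits the regression of $X$ on $Y$, which in general is a different line; this is the sense in which predicting $x$ from $y$ gives different answers.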

Regression Summary (so far)

- Data were collected on a variable ($y$) at given values of another variable ($x$).
- A scatter plot of the two variables suggested a straight line relationship.
- Variability about the straight line appeared to be roughly constant.
- Least squares was used to estimate $\alpha$ and $\beta$, the model parameters.
- In the example, there was only one $y$ value for each $x$ value, but the method is also applicable when there are many $y$ values (flow rate at different points with the same depth).
- If the model assumptions are correct, the fitted line $y = \hat{\alpha} + \hat{\beta}x$ gives the best estimate of $y$ for any given value of $x$. This value is called the fitted value at $x$.
- The model can also make predictions of $y$ for values of $x$ where no measurements were taken.

Example: Flow rate data.

$n = 11$
$\sum x_i = 6.05$, $\bar{x} = 0.55$
$\sum y_i = 50.9$, $\bar{y} = 50.9 / 11 = 4.63$
$\sum x_i^2$ = …, $S_{xx}$ = …
$\sum x_i y_i$ = …, $S_{xy} = 2.63$
$\sum y_i^2$ = …, $S_{yy}$ = …
$\hat{\beta} = S_{xy} / S_{xx} = 2.63 / S_{xx}$ = …
$\hat{\alpha}$ = …

The fitted model can be used to predict the flow rate for any specified depth, i.e. $\hat{y} = \hat{\alpha} + \hat{\beta}x$. However, the value of $\hat{\alpha}$ suggests that it could be dangerous to extrapolate from this model!

Digression: The word regression literally means stepping back. The term originated when the results from units that were measured on two occasions were compared. The best (or worst) on the first occasion was rarely the best (or worst) on the second occasion. Comparing the second set of results showed that, on average, units in the first set had regressed towards the mean. Although most uses of regression are not like this, the name has stuck.


Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

More information

Notes on Continuous Random Variables

Notes on Continuous Random Variables Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS 1. If two events (both with probability greater than 0) are mutually exclusive, then: A. They also must be independent. B. They also could

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Lecture Notes Module 1

Lecture Notes Module 1 Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

The Standard Normal distribution

The Standard Normal distribution The Standard Normal distribution 21.2 Introduction Mass-produced items should conform to a specification. Usually, a mean is aimed for but due to random errors in the production process we set a tolerance

More information

AP STATISTICS 2010 SCORING GUIDELINES

AP STATISTICS 2010 SCORING GUIDELINES 2010 SCORING GUIDELINES Question 4 Intent of Question The primary goals of this question were to (1) assess students ability to calculate an expected value and a standard deviation; (2) recognize the applicability

More information

Basic Probability and Statistics Review. Six Sigma Black Belt Primer

Basic Probability and Statistics Review. Six Sigma Black Belt Primer Basic Probability and Statistics Review Six Sigma Black Belt Primer Pat Hammett, Ph.D. January 2003 Instructor Comments: This document contains a review of basic probability and statistics. It also includes

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

Normal Distribution as an Approximation to the Binomial Distribution

Normal Distribution as an Approximation to the Binomial Distribution Chapter 1 Student Lecture Notes 1-1 Normal Distribution as an Approximation to the Binomial Distribution : Goals ONE TWO THREE 2 Review Binomial Probability Distribution applies to a discrete random variable

More information

ST 371 (IV): Discrete Random Variables

ST 371 (IV): Discrete Random Variables ST 371 (IV): Discrete Random Variables 1 Random Variables A random variable (rv) is a function that is defined on the sample space of the experiment and that assigns a numerical variable to each possible

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe

Introduction to the Practice of Statistics Fifth Edition Moore, McCabe Introduction to the Practice of Statistics Fifth Edition Moore, McCabe Section 5.1 Homework Answers 5.7 In the proofreading setting if Exercise 5.3, what is the smallest number of misses m with P(X m)

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distribution James H. Steiger November 10, 00 1 Topics for this Module 1. The Binomial Process. The Binomial Random Variable. The Binomial Distribution (a) Computing the Binomial pdf (b) Computing

More information

Math 461 Fall 2006 Test 2 Solutions

Math 461 Fall 2006 Test 2 Solutions Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two

More information

Review. March 21, 2011. 155S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results

Review. March 21, 2011. 155S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 7 Estimates and Sample Sizes 7 1 Review and Preview 7 2 Estimating a Population Proportion 7 3 Estimating a Population

More information

table to see that the probability is 0.8413. (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: 60 38 = 1.

table to see that the probability is 0.8413. (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: 60 38 = 1. Review Problems for Exam 3 Math 1040 1 1. Find the probability that a standard normal random variable is less than 2.37. Looking up 2.37 on the normal table, we see that the probability is 0.9911. 2. Find

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Unit 4 The Bernoulli and Binomial Distributions

Unit 4 The Bernoulli and Binomial Distributions PubHlth 540 4. Bernoulli and Binomial Page 1 of 19 Unit 4 The Bernoulli and Binomial Distributions Topic 1. Review What is a Discrete Probability Distribution... 2. Statistical Expectation.. 3. The Population

More information

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

12.5: CHI-SQUARE GOODNESS OF FIT TESTS 125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

More information

Math 151. Rumbos Spring 2014 1. Solutions to Assignment #22

Math 151. Rumbos Spring 2014 1. Solutions to Assignment #22 Math 151. Rumbos Spring 2014 1 Solutions to Assignment #22 1. An experiment consists of rolling a die 81 times and computing the average of the numbers on the top face of the die. Estimate the probability

More information

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

. 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches) PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

More information

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1

Lecture 2: Discrete Distributions, Normal Distributions. Chapter 1 Lecture 2: Discrete Distributions, Normal Distributions Chapter 1 Reminders Course website: www. stat.purdue.edu/~xuanyaoh/stat350 Office Hour: Mon 3:30-4:30, Wed 4-5 Bring a calculator, and copy Tables

More information

Without data, all you are is just another person with an opinion.

Without data, all you are is just another person with an opinion. OCR Statistics Module Revision Sheet The S exam is hour 30 minutes long. You are allowed a graphics calculator. Before you go into the exam make sureyou are fully aware of the contents of theformula booklet

More information

Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Point and Interval Estimates

Point and Interval Estimates Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Lesson 17: Margin of Error When Estimating a Population Proportion

Lesson 17: Margin of Error When Estimating a Population Proportion Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

More information

Questions and Answers

Questions and Answers GNH7/GEOLGG9/GEOL2 EARTHQUAKE SEISMOLOGY AND EARTHQUAKE HAZARD TUTORIAL (6): EARTHQUAKE STATISTICS Question. Questions and Answers How many distinct 5-card hands can be dealt from a standard 52-card deck?

More information

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

More information

Lesson 4 Measures of Central Tendency

Lesson 4 Measures of Central Tendency Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

Chapter 5: Normal Probability Distributions - Solutions

Chapter 5: Normal Probability Distributions - Solutions Chapter 5: Normal Probability Distributions - Solutions Note: All areas and z-scores are approximate. Your answers may vary slightly. 5.2 Normal Distributions: Finding Probabilities If you are given that

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

Math 202-0 Quizzes Winter 2009

Math 202-0 Quizzes Winter 2009 Quiz : Basic Probability Ten Scrabble tiles are placed in a bag Four of the tiles have the letter printed on them, and there are two tiles each with the letters B, C and D on them (a) Suppose one tile

More information