Poisson Random Variables (Rees: )

Examples: What is the distribution of:
- the number of organisms in the squares of a haemocytometer?
- the number of hits on a web site in one hour?
- the number of goals scored by a football team in a match?
- the number of cracks in a rail track?
- the number of sultanas in a slice of fruit cake?

The Poisson distribution often provides a good model for the number of events occurring in time or space. Space can be linear, area or volume.

To decide whether to use the Binomial or the Poisson distribution, consider whether there is a sample size involved (i.e. an upper limit on the number). If so, use the Binomial; if not, use the Poisson.

Consider, for example, the number of calls arriving at a telephone exchange. We assume that:
- events which occur in time intervals that do not overlap are independent;
- the underlying rate $\lambda$ ("lambda") at which calls arrive is constant.

Under these conditions, if $Y$ is the random variable denoting the number of calls actually made in one hour, then $Y$ has a Poisson distribution with parameter $\lambda$. We write

$$Y \sim P(\lambda).$$

The probability distribution of $Y$ is

$$\Pr(Y = r) = \frac{e^{-\lambda} \lambda^{r}}{r!}$$

where $e$ is the well known mathematical constant ($e = 2.71828\ldots$). Most calculators have an $e^x$ button; this is called the exponential function.

The formula for Poisson distribution probabilities simplifies when $r = 0$ ("no calls"), because $\lambda^0 = 1$ and $0! = 1$. So

$$\Pr(Y = 0) = e^{-\lambda}.$$

Example: The mean number of bacteria in the single cell of a haemocytometer is 2. Let $N$ be the number of bacteria actually observed in a cell. Find:
(i) $\Pr(N = 0)$
(ii) $\Pr(N = 3)$
(iii) $\Pr(N > 2)$

We assume that $N$ has a Poisson distribution with rate parameter 2. Then:

$$\Pr(N = n) = \frac{e^{-\lambda} \lambda^{n}}{n!} = \frac{e^{-2}\, 2^{n}}{n!}$$

(i) $\Pr(N = 0) = \frac{e^{-2}\, 2^{0}}{0!} = e^{-2}$ (simplifies for $N = 0$) $= 0.1353$

(ii) $\Pr(N = 3) = \frac{e^{-2}\, 2^{3}}{3!} = \frac{8}{6}\, e^{-2} = 0.1804$

(iii) We can calculate $\Pr(N > 2)$ as

$$\Pr(N > 2) = 1 - \Pr(N \le 2) = 1 - \bigl(p(0) + p(1) + p(2)\bigr) = 1 - \left(e^{-2} + \frac{e^{-2}\, 2^{1}}{1!} + \frac{e^{-2}\, 2^{2}}{2!}\right) = 1 - (0.1353 + 0.2707 + 0.2707) = 1 - 0.6767 = 0.3233$$

We can also use the NCST tables to calculate Poisson probabilities. Table 2 of Lindley and Scott (pp 24-32) gives $\Pr(Y \le r)$ for $\lambda$ from 0 up to 20. Using p25 gives $\Pr(N \le 2) = 0.6767$ immediately. Rees gives a short version of Table 2 in Table C.2 (4th ed.). Note that Rees uses $m$ instead of $\lambda$, while Lindley & Scott use $\mu$.

Example: Telephone calls arrive in an office at a rate of 5 calls per hour. Find the probability that there are:
(i) exactly 2 calls in one hour;
(ii) exactly 1 call in 15 minutes;
(iii) 10 or fewer calls in 2 hours.

We use Table 2 (you should check the first two answers by using the formula!).

(i) $X \sim P(5)$: $\Pr(X = 2) = \Pr(X \le 2) - \Pr(X \le 1) = 0.1247 - 0.0404 = 0.0843$

(ii) $X \sim P(5/4)$: $\Pr(X = 1) = \Pr(X \le 1) - \Pr(X = 0) = 0.6446 - 0.2865 = 0.3581$

(iii) $X \sim P(10)$: $\Pr(X \le 10) = 0.5830$

Note how the value of $\lambda$ adjusts to take account of the time interval; for example, if the number of calls in one hour is $P(5)$ then the number of calls in two hours is $P(10)$.
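The two Poisson examples above can be checked directly from the formula rather than from tables. The following is a minimal sketch using only the Python standard library; the function names `poisson_pmf` and `poisson_cdf` are illustrative and do not come from the notes.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """Pr(Y = r) for Y ~ P(lam), i.e. exp(-lam) * lam**r / r!."""
    return math.exp(-lam) * lam**r / math.factorial(r)


def poisson_cdf(r: int, lam: float) -> float:
    """Pr(Y <= r), the cumulative probability that Poisson tables list."""
    return sum(poisson_pmf(k, lam) for k in range(r + 1))


# Haemocytometer example: mean of 2 bacteria per cell.
p_none = poisson_pmf(0, 2.0)
p_three = poisson_pmf(3, 2.0)
p_more_than_two = 1.0 - poisson_cdf(2, 2.0)

# Telephone example: 5 calls per hour, so the rate scales with the interval.
p_two_in_hour = poisson_pmf(2, 5.0)          # one hour      -> P(5)
p_one_in_quarter = poisson_pmf(1, 5.0 / 4)   # 15 minutes    -> P(5/4)
p_ten_or_fewer = poisson_cdf(10, 5.0 * 2)    # two hours     -> P(10)

for name, value in [
    ("Pr(N = 0)", p_none),
    ("Pr(N = 3)", p_three),
    ("Pr(N > 2)", p_more_than_two),
    ("2 calls in 1 hour", p_two_in_hour),
    ("1 call in 15 minutes", p_one_in_quarter),
    ("10 or fewer calls in 2 hours", p_ten_or_fewer),
]:
    print(f"{name}: {value:.4f}")
```

Answers obtained by subtracting two rounded table entries can differ from the formula in the fourth decimal place, which is worth bearing in mind when comparing the two routes.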

Some properties of the Poisson distribution

- The Poisson distribution is always skewed, but the distribution becomes more nearly symmetrical as the rate parameter $\lambda$ increases.
- The expected value of a Poisson distribution is the rate parameter $\lambda$.
- The variance of a Poisson distribution is also the rate parameter $\lambda$. Thus the standard deviation is $\sqrt{\lambda}$.

Note that in relative terms, the spread gets less as the rate increases, because the ratio of standard deviation to mean is $\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$.

Let $X$ be the number of accidents per quarter at an accident black spot. Suppose that $X \sim P(4)$. Then the probability distribution of $X$ looks like:

[Figure: probability distribution of $X \sim P(4)$]

Note that any value between 1 and 8 is likely to occur and more extreme values are possible!

The theoretical mean and standard deviation of a Poisson random variable are (Rees 6.12): Mean $= \lambda$, S.D. $= \sqrt{\lambda}$. For $X \sim P(4)$, Mean $= \lambda = 4$ and S.D. $= \sqrt{\lambda} = 2$.

RECAP: Binomial and Poisson

These two distributions are both used as models for counts. The Binomial is used when there is a clear upper limit $n$ on the number of events recorded; there is no theoretical upper limit for the Poisson.

Binomial: There are $n$ independent trials, each with probability of success $p$.

$$\Pr(X = r) = \binom{n}{r} p^{r} (1 - p)^{n - r}$$

The mean is $np$ and the s.d. is $\sqrt{np(1 - p)}$.

Poisson: Events occur independently at rate $\lambda$.

$$\Pr(Y = r) = \frac{e^{-\lambda} \lambda^{r}}{r!}$$

The mean is $\lambda$ and the s.d. is $\sqrt{\lambda}$.
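The stated mean and variance can be checked numerically for the accident example $X \sim P(4)$. This is a rough sketch rather than a proof: the infinite sums are truncated at 60 terms, an arbitrary cut-off chosen here because the remaining tail probability is negligible for a rate of 4.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """Pr(X = r) for X ~ P(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


lam = 4.0
support = range(60)  # truncation point is an assumption; the tail is negligible here

# Mean and variance computed from the definition of expectation.
mean = sum(r * poisson_pmf(r, lam) for r in support)
variance = sum((r - mean) ** 2 * poisson_pmf(r, lam) for r in support)
sd = math.sqrt(variance)

# Probability that the count lies between 1 and 8 inclusive.
p_1_to_8 = sum(poisson_pmf(r, lam) for r in range(1, 9))

print(f"mean = {mean:.4f}, variance = {variance:.4f}, s.d. = {sd:.4f}")
print(f"relative spread s.d./mean = {sd / mean:.4f}")
print(f"Pr(1 <= X <= 8) = {p_1_to_8:.4f}")
```

Changing `lam` and re-running shows the relative spread shrinking like $1/\sqrt{\lambda}$ as the rate grows.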

Normal Random Variables (Rees: )

Example: Consider relative frequency histograms of heights of classes of students. The sizes of the classes are (a) 50 (b) 250 (c) 1000 (d)

As the class size increases, we can reduce the width of a bar in the histogram. We can imagine that with an infinite class size the histogram could be made smooth. The resulting smooth curve is often very similar to a particular form: the Normal distribution.

Here are some other examples of variables that could have Normal distributions:
- the weight of a bag of cement;
- the time taken to walk to the pub;
- the volume of beer in your glass.

We use the Normal distribution as a model for many naturally occurring variables. For example, we might use it to calculate the proportion of weights or lengths that fall between two limits.

Notation: A Normal distribution is determined by its mean, $\mu$, and its standard deviation, $\sigma$. If $X$ has a Normal distribution with mean $\mu$ and standard deviation $\sigma$ we write

$$X \sim N(\mu, \sigma^2).$$

Beware!! The second parameter is the variance and not the standard deviation. Thus, $N(5, 9)$ refers to a Normal distribution with mean 5 and standard deviation 3.

First, a special Normal distribution.

Definition: We say that $Z$ has the standard Normal distribution if $Z \sim N(0, 1)$. The mean of $Z$ is 0 and its standard deviation is 1.

Here is what the standard Normal distribution looks like:

[Figure: density of the standard Normal distribution]

Examples of other Normal distributions:

[Figure: $N(5, 1)$, mean 5 and standard deviation 1]

[Figure: $N(10, 1)$, mean 10 and standard deviation 1]

[Figure: $N(8, 4)$, mean 8 and standard deviation 2]

Probability Density Functions

These plots of Normal distributions are examples of probability density functions; the name can be abbreviated to density. These are similar to the probability function for a discrete random variable, but there are some important differences.

Instead of lumps of probability at certain values, the probability of getting exactly any particular value is zero! Instead we have to consider the probability of being between two values. The density functions are scaled so that the area under the curve between the values equals the probability. This implies that the total area under the curve equals one.

For some continuous random variables, there is an explicit formula for the probabilities. For others we have to use statistical tables or a computer package.
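To illustrate the idea that area under a density equals probability, the sketch below integrates a Normal density numerically with the trapezium rule. The density formula used, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, is the standard one but is not stated in these notes, and the function names, integration limits and step count are choices made here for illustration.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x (standard formula, not given in the notes)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area(a: float, b: float, mu: float = 0.0, sigma: float = 1.0,
         steps: int = 10_000) -> float:
    """Trapezium-rule approximation to the area under the density on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h


# Total area: (-10, 10) stands in for the whole real line for N(0, 1).
total_area = area(-10.0, 10.0)

# Probability of falling within one standard deviation of the mean.
within_one_sd = area(-1.0, 1.0)

# An interval of zero width has zero area, hence zero probability.
single_point = area(1.0, 1.0)

print(f"total area = {total_area:.4f}")
print(f"Pr(-1 < Z < 1) = {within_one_sd:.4f}")
print(f"Pr(Z = 1) = {single_point:.4f}")
```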

There is no formula to allow the direct calculation of probabilities for a standard Normal distribution. We have to use a computer or tables such as Table 4 of Lindley & Scott.

NCST tables give $P(Z \le z)$ for values of $z \ge 0$. The symbol $\Phi(z)$ is usually used for this probability. This is called the Cumulative Probability Function of the standard Normal distribution. It is the area under the Probability Density Function of the standard Normal distribution to the left of $z$.

Example: If $Z \sim N(0, 1)$, use NCST Table 4 to find:

(i) $\Pr(Z \le 1)$
Tabulated value: $\Pr(Z \le 1) = 0.8413$

(ii) $\Pr(Z > 2)$
Tabulated value: $\Pr(Z \le 2) = 0.9772$
So: $\Pr(Z > 2) = 1 - 0.9772 = 0.0228$

(iii) $\Pr(Z < -1)$
By symmetry: $\Pr(Z < -1) = \Pr(Z > +1) = 1 - \Pr(Z \le 1) = 1 - 0.8413 = 0.1587$

Note how we use symmetry to help us here and how a quick picture keeps us on the right track.

When we have a general Normal $N(\mu, \sigma^2)$, we can still use Table 4, provided that everything is put on a standard scale. This works because of the following properties of the Normal distribution.

If $X \sim$ Normal with mean $\mu$ and s.d. $\sigma$, and if $a$ and $b$ are constants (with $b > 0$), then
- $(X - a) \sim$ Normal with mean $(\mu - a)$ and s.d. $\sigma$;
- $X / b \sim$ Normal with mean $\mu / b$ and s.d. $\sigma / b$.

So:

$$Z = \frac{X - \mu}{\sigma} \sim \text{Normal with mean 0 and s.d. 1}$$

Definition: If $x$ is a value from a distribution with a mean of $\mu$ and a standard deviation of $\sigma$, then the standard score (or z-score) of $x$ is

$$z = \frac{x - \mu}{\sigma}$$

If $x$ comes from a Normal distribution, $z$ comes from a Standard Normal Distribution $N(0, 1)$.

In the previous example, the questions referred to values from the standard Normal distribution. How do we find probabilities for $N(\mu, \sigma^2)$? (Rees 7.3 gives some examples.) In each case, the original question is converted to a question about the standard Normal distribution. The tables are then used as before. We may also need to use linear interpolation if the required value is between tabulated values.

Example: The random variable $X \sim N(5, 9)$. Find:

(i) $\Pr(X < 8)$
(ii) $\Pr(X < 3)$
(iii) $\Pr(2 \le X \le 11)$

First, recall that the second parameter (9) is the variance, so the standard deviation is $\sqrt{9} = 3$.

(i) $\Pr(X < 8) = \Pr\left(Z < \frac{8 - 5}{3}\right) = \Pr(Z < 1) = 0.8413$

(ii) $\Pr(X < 3) = \Pr\left(Z < \frac{3 - 5}{3}\right) = \Pr(Z < -0.667) = 1 - \Pr(Z < 0.667) = 1 - \left(0.7454 + \tfrac{2}{3}(0.7486 - 0.7454)\right) = 1 - 0.7475 = 0.2525$

Note the use of linear interpolation here, between the tabulated values at $z = 0.66$ and $z = 0.67$, to get a more accurate answer.

(iii) $\Pr(2 \le X \le 11) = \Pr\left(\frac{2 - 5}{3} \le Z \le \frac{11 - 5}{3}\right) = \Pr(-1 \le Z \le 2) = \Pr(Z \le 2) - (1 - \Pr(Z \le 1)) = 0.9772 - (1 - 0.8413) = 0.8185$

In each case, the numerical values in the original question are converted to z-scores before the NCST tables are used.

Example: IQ tests are constructed so that the mean is 100 and the standard deviation is 15. What percentage of the population will get a score of more than 120?

$$\Pr(IQ > 120) = \Pr\left(Z > \frac{120 - 100}{15}\right) = \Pr(Z > 1.333) = 1 - \Pr(Z < 1.333) = 1 - 0.9088 = 0.0912 = 9.12\%$$

Sometimes we need to reverse the above process. NCST Table 5 allows us to do this.
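The standardise-then-look-up procedure used in the examples above can be mirrored on a computer. The sketch below evaluates $\Phi(z)$ through the error function, using the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; this identity and the function names `phi` and `normal_cdf` are not part of the notes and are introduced here only for illustration.

```python
import math


def phi(z: float) -> float:
    """Standard Normal cumulative probability Pr(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr(X <= x) for X ~ N(mu, sigma^2): standardise, then apply phi."""
    z = (x - mu) / sigma
    return phi(z)


# X ~ N(5, 9): the second parameter is the variance, so sigma = 3.
mu, sigma = 5.0, math.sqrt(9.0)
p_less_8 = normal_cdf(8, mu, sigma)
p_less_3 = normal_cdf(3, mu, sigma)
p_2_to_11 = normal_cdf(11, mu, sigma) - normal_cdf(2, mu, sigma)

# IQ ~ N(100, 15^2): proportion scoring more than 120.
p_iq_over_120 = 1.0 - normal_cdf(120, 100, 15)

print(f"Pr(X < 8)        = {p_less_8:.4f}")
print(f"Pr(X < 3)        = {p_less_3:.4f}")
print(f"Pr(2 <= X <= 11) = {p_2_to_11:.4f}")
print(f"Pr(IQ > 120)     = {p_iq_over_120:.4f}")
```

Because no table rounding or interpolation is involved, the computed values may differ from the table-based answers in the last decimal place.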

Example: What is the IQ score such that only 1% of the population do better?

From NCST Table 5, we have: $\Pr(Z > 2.3263) = 1\%$

The standardisation is reversed by multiplying by the standard deviation and then adding in the mean. So the required IQ score is:

$$IQ = 100 + 2.3263 \times 15 = 134.9$$

Suppose the Normal distribution is used to model a continuous variable and that we wish to find the probability of getting a particular value.

Example: In a certain population, male heights are Normally distributed with mean 170 cm and standard deviation 5 cm. If heights are recorded to the nearest cm, then the probability that an individual is 170 cm should be taken as being the probability of being in the interval (169.5, 170.5). The corresponding z-score interval is (-0.1, 0.1).

From tables $P(Z < 0.1) = 0.5398$. So $P(-0.1 < Z < 0.1) = 2(0.5398 - 0.5) = 0.0796$.

The idea that was used in this example is called a Continuity Correction.

Example: Suppose that student female heights have a Normal distribution with mean $\mu = 163$ cm and with standard deviation $\sigma = 6$ cm. Let $X$ be the height of a randomly chosen student. Find the probabilities that:
(i) $X$ is greater than 170 cm;
(ii) $X$ is more than 164 cm and less than 171 cm;
(iii) $X$ is 149 cm or less.

(i) $\Pr(X > 170) = \Pr\left(Z > \frac{170.5 - 163}{6}\right) = \Pr(Z > 1.25) = 1 - \Pr(Z < 1.25) = 1 - 0.8944 = 0.1056$

Note the use of the continuity correction, which assumes that heights are taken to the nearest cm. Heights of 171 or more are included and 170 or less are excluded, so the division is taken to be at 170.5 cm.

(ii) $\Pr(164 < X < 171) = \Pr\left(\frac{164.5 - 163}{6} < Z < \frac{170.5 - 163}{6}\right) = \Pr(0.25 < Z < 1.25) = 0.8944 - 0.5987 = 0.2957$

(iii) $\Pr(X \le 149) = \Pr\left(Z < \frac{149.5 - 163}{6}\right) = \Pr(Z < -2.25) = 1 - \Pr(Z < 2.25) = 1 - 0.9878 = 0.0122$

Example: What is the height such that 10% of female students are taller?

For the standard Normal distribution, NCST Table 5 tells us that 10% of the population are bigger than 1.2816. So the required height is: $163 + 1.2816 \times 6 = 170.7$ cm.

Models using Discrete and Continuous Distributions

The previous sections have introduced Binomial, Poisson and Normal distributions. These can all be derived using probability theory from sets of assumptions; we did this for the Binomial distribution.

These are all useful as models for real data and allow us to make predictions. If the process that generated the data matches the assumptions of a distribution, then we can be confident in the predictions. The predictions will also be good if the assumptions are close to reality.

Example: Consider a binary variable. Random sampling with replacement from any population implies that the data will follow a Binomial distribution. Random sampling without replacement from a large population implies that the Binomial distribution will be a good approximation.

A distribution that is chosen empirically can also make useful predictions, for example the proportion of student heights within a range.
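The female-height calculations above, including the continuity correction and the reverse (percentage point) look-ups, can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch of one way to check the arithmetic; the variable names are illustrative, and the half-centimetre boundaries assume, as in the notes, that heights are recorded to the nearest cm.

```python
from statistics import NormalDist

# Female student heights: mean 163 cm, standard deviation 6 cm.
heights = NormalDist(mu=163, sigma=6)

# (i) "greater than 170 cm" means recorded as 171 or more, so cut at 170.5.
p_over_170 = 1 - heights.cdf(170.5)

# (ii) "more than 164 and less than 171" means recorded as 165..170.
p_164_to_171 = heights.cdf(170.5) - heights.cdf(164.5)

# (iii) "149 cm or less" means recorded as 149 or below, so cut at 149.5.
p_149_or_less = heights.cdf(149.5)

# Reverse look-ups: inv_cdf plays the role of the percentage-point table.
height_top_10_percent = heights.inv_cdf(0.90)
iq_top_1_percent = NormalDist(mu=100, sigma=15).inv_cdf(0.99)

print(f"Pr(X > 170)        = {p_over_170:.4f}")
print(f"Pr(164 < X < 171)  = {p_164_to_171:.4f}")
print(f"Pr(X <= 149)       = {p_149_or_less:.4f}")
print(f"height exceeded by 10% = {height_top_10_percent:.1f} cm")
print(f"IQ exceeded by 1%      = {iq_top_1_percent:.1f}")
```

`inv_cdf(0.90)` returns the value with 90% of the distribution below it, which is the same as the value that 10% exceed.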

Regression

We are often interested in the relationship between two or more variables. This can arise from surveys in which several variables are measured on each unit or from experiments in which some variables are modified and other variables observed.

Note: Data sets arising from these two types of situation cannot be distinguished in general, but the interpretation is different.

Example: A study for an environmental impact assessment measured the flow rate against the depth at a site on a stream.

[Table: Depth (m) against Flow Rate (m/s)]

Note that the choice of which depths to use was made in advance; that is why they are spaced at fixed intervals.

[Figure: scatter plot of Flow Rate against Depth]

The plot suggests a straight line relationship.

It is often useful to be able to predict the values of one variable from another variable. To do this, we need to formulate a model. The model should allow for the variability that is present. A possible model is:

$$\text{Flow} = \alpha + \beta \times \text{Depth}$$

with variability about the line being independent samples from a Normal distribution with unknown variance $\sigma^2$. More generally:

$$y_i = \alpha + \beta x_i + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma^2)$ and independent. $\epsilon$ is the Greek letter epsilon. This is called a linear regression model.

Notes:
- The model includes two parts: the functional part and the part which models the variability about the function.
- Many of the laws of physics and chemistry started out as empirical observations of this sort. The observed variability was often just measurement error.
- In other sciences, such as biology, there is often variability inherent in the material which is much greater than any errors.

Fitting the Model

There is a general method for fitting models of this type. It is called the Method of Least Squares. This method is optimal for predicting the $y$ variable from the $x$ variable(s) if the model for the variability is as specified above.

To fit the model, we minimise squared deviations in the $y$ direction, i.e. find $\hat\alpha$, $\hat\beta$ to minimise:

$$\sum_i (y_i - \alpha - \beta x_i)^2$$

Notes:
- At school, this model is often given as $y = mx + c$ and the line is fitted by eye.
- Predicting $x$ from $y$ gives different answers.
- If the $x$ values were chosen, different methods are needed if we wish to predict $x$ from $y$. This is needed in assay systems.

We calculate:

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

Note: $S_{yy} = (n - 1) \times \text{Variance}(y) = \sum (y_i - \bar y)^2$

Then:

$$\hat\beta = \frac{S_{xy}}{S_{xx}}$$

$$\hat\alpha = \frac{1}{n}\left(\sum y_i - \hat\beta \sum x_i\right) = \bar y - \hat\beta \bar x$$

Thus the line goes through the point $(\bar x, \bar y)$.

The fitted line $y = \hat\alpha + \hat\beta x$ is said to be the regression of $Y$ on $X$.

Note: The recommended Casio calculators provide short cut ways of carrying out the calculations.
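The formulas above translate directly into a few lines of code. The sketch below is one possible implementation; the function name `fit_line` is illustrative, and the five data points are made up for demonstration (they are not the stream flow data).

```python
def fit_line(xs, ys):
    """Least squares fit of y = alpha + beta * x using S_xx and S_xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x**2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    beta_hat = s_xy / s_xx
    alpha_hat = sum_y / n - beta_hat * sum_x / n  # y-bar minus beta-hat times x-bar
    return alpha_hat, beta_hat


# Hypothetical data, invented for this illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# The fitted line passes through the point (x-bar, y-bar).
fitted_at_mean = alpha_hat + beta_hat * x_bar

print(f"alpha-hat = {alpha_hat:.4f}, beta-hat = {beta_hat:.4f}")
print(f"fitted value at x-bar = {fitted_at_mean:.4f}, y-bar = {y_bar:.4f}")
```

Swapping the roles of `xs` and `ys` fits the regression of $X$ on $Y$, which in general is a different line; this matches the remark that predicting $x$ from $y$ gives different answers.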

Regression Summary (so far)

- Data were collected on a variable ($y$) at given values of another variable ($x$).
- A scatter plot of the two variables suggested a straight line relationship.
- Variability about the straight line appeared to be roughly constant.
- Least squares was used to estimate $\alpha$ and $\beta$, the model parameters.
- In the example, there was only one $y$ value for each $x$ value, but the method is also applicable when there are many $y$ values (flow rate at different points with the same depth).
- If the model assumptions are correct, the fitted line $y = \hat\alpha + \hat\beta x$ gives the best estimate of $y$ for any given value of $x$. This value is called the fitted value at $x$.
- The model can also make predictions of $y$ for values of $x$ where no measurements were taken.

Example: Flow rate data.

$n = 11$, $\sum x_i = 6.05$, $\bar x = 0.55$, $\sum y_i = 50.9$, $\bar y = 50.9 / 11 = 4.627$, $S_{xy} = 2.63$

$$\hat\beta = \frac{S_{xy}}{S_{xx}} = \frac{2.63}{S_{xx}}, \qquad \hat\alpha = \bar y - \hat\beta \bar x$$

The fitted model can be used to predict the flow rate for any specified depth, i.e. $\hat y = \hat\alpha + \hat\beta x$. However, the value of $\hat\alpha$ suggests that it could be dangerous to extrapolate from this model!

Digression: The word regression literally means stepping back. The term originated when the results from units that were measured on two occasions were compared. The best (or worst) on the first occasion was rarely the best (or worst) on the second occasion. Comparing the second set of results showed that, on average, units in the first set had regressed towards the mean. Although most uses of regression are not like this, the name has stuck.


More information

Statistics - Written Examination MEC Students - BOVISA

Statistics - Written Examination MEC Students - BOVISA Statistics - Written Examination MEC Students - BOVISA Prof.ssa A. Guglielmi 26.0.2 All rights reserved. Legal action will be taken against infringement. Reproduction is prohibited without prior consent.

More information

Definition The covariance of X and Y, denoted by cov(x, Y ) is defined by. cov(x, Y ) = E(X µ 1 )(Y µ 2 ).

Definition The covariance of X and Y, denoted by cov(x, Y ) is defined by. cov(x, Y ) = E(X µ 1 )(Y µ 2 ). Correlation Regression Bivariate Normal Suppose that X and Y are r.v. s with joint density f(x y) and suppose that the means of X and Y are respectively µ 1 µ 2 and the variances are 1 2. Definition The

More information

Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

More information

Lecture 5 : The Poisson Distribution. Jonathan Marchini

Lecture 5 : The Poisson Distribution. Jonathan Marchini Lecture 5 : The Poisson Distribution Jonathan Marchini Random events in time and space Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume,

More information

Sampling Central Limit Theorem Proportions. Outline. 1 Sampling. 2 Central Limit Theorem. 3 Proportions

Sampling Central Limit Theorem Proportions. Outline. 1 Sampling. 2 Central Limit Theorem. 3 Proportions Outline 1 Sampling 2 Central Limit Theorem 3 Proportions Outline 1 Sampling 2 Central Limit Theorem 3 Proportions Populations and samples When we use statistics, we are trying to find out information about

More information

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

More information

Continuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage

Continuous Random Variables and Probability Distributions. Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage 4 Continuous Random Variables and Probability Distributions Stat 4570/5570 Material from Devore s book (Ed 8) Chapter 4 - and Cengage Continuous r.v. A random variable X is continuous if possible values

More information

Statistics 100 Binomial and Normal Random Variables

Statistics 100 Binomial and Normal Random Variables Statistics 100 Binomial and Normal Random Variables Three different random variables with common characteristics: 1. Flip a fair coin 10 times. Let X = number of heads out of 10 flips. 2. Poll a random

More information

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

More information

E205 Final: Version B

E205 Final: Version B Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random

More information

3. Continuous Random Variables

3. Continuous Random Variables 3. Continuous Random Variables A continuous random variable is one which can take any value in an interval (or union of intervals) The values that can be taken by such a variable cannot be listed. Such

More information

Hypothesis tests, confidence intervals, and bootstrapping

Hypothesis tests, confidence intervals, and bootstrapping Hypothesis tests, confidence intervals, and bootstrapping Business Statistics 41000 Fall 2015 1 Topics 1. Hypothesis tests Testing a mean: H0 : µ = µ 0 Testing a proportion: H0 : p = p 0 Testing a difference

More information

Sampling Distribution of a Normal Variable

Sampling Distribution of a Normal Variable Ismor Fischer, 5/9/01 5.-1 5. Formal Statement and Examples Comments: Sampling Distribution of a Normal Variable Given a random variable. Suppose that the population distribution of is known to be normal,

More information

Statistics revision. Dr. Inna Namestnikova. Statistics revision p. 1/8

Statistics revision. Dr. Inna Namestnikova. Statistics revision p. 1/8 Statistics revision Dr. Inna Namestnikova inna.namestnikova@brunel.ac.uk Statistics revision p. 1/8 Introduction Statistics is the science of collecting, analyzing and drawing conclusions from data. Statistics

More information

The Normal Curve. The Normal Curve and The Sampling Distribution

The Normal Curve. The Normal Curve and The Sampling Distribution Discrete vs Continuous Data The Normal Curve and The Sampling Distribution We have seen examples of probability distributions for discrete variables X, such as the binomial distribution. We could use it

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

12.1 Inference for Linear Regression

12.1 Inference for Linear Regression 12.1 Inference for Linear Regression Least Squares Regression Line y = a + bx You might want to refresh your memory of LSR lines by reviewing Chapter 3! 1 Sample Distribution of b p740 Shape Center Spread

More information

Pr(X = x) = f(x) = λe λx

Pr(X = x) = f(x) = λe λx Old Business - variance/std. dev. of binomial distribution - mid-term (day, policies) - class strategies (problems, etc.) - exponential distributions New Business - Central Limit Theorem, standard error

More information

GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

More information

PROBLEM SET 1. For the first three answer true or false and explain your answer. A picture is often helpful.

PROBLEM SET 1. For the first three answer true or false and explain your answer. A picture is often helpful. PROBLEM SET 1 For the first three answer true or false and explain your answer. A picture is often helpful. 1. Suppose the significance level of a hypothesis test is α=0.05. If the p-value of the test

More information

Statistics GCSE Higher Revision Sheet

Statistics GCSE Higher Revision Sheet Statistics GCSE Higher Revision Sheet This document attempts to sum up the contents of the Higher Tier Statistics GCSE. There is one exam, two hours long. A calculator is allowed. It is worth 75% of the

More information

MATHEMATICS FOR ENGINEERS STATISTICS TUTORIAL 4 PROBABILITY DISTRIBUTIONS

MATHEMATICS FOR ENGINEERS STATISTICS TUTORIAL 4 PROBABILITY DISTRIBUTIONS MATHEMATICS FOR ENGINEERS STATISTICS TUTORIAL 4 PROBABILITY DISTRIBUTIONS CONTENTS Sample Space Accumulative Probability Probability Distributions Binomial Distribution Normal Distribution Poisson Distribution

More information

Characteristics of Binomial Distributions

Characteristics of Binomial Distributions Lesson2 Characteristics of Binomial Distributions In the last lesson, you constructed several binomial distributions, observed their shapes, and estimated their means and standard deviations. In Investigation

More information

Topic 3 Correlation and Regression

Topic 3 Correlation and Regression Topic 3 Correlation and Regression Linear Regression I 1 / 15 Outline Principle of Least Squares Regression Equations Residuals 2 / 15 Introduction Covariance and correlation are measures of linear association.

More information

COURSE OUTLINE. Course Number Course Title Credits MAT201 Probability and Statistics for Science and Engineering 4. Co- or Pre-requisite

COURSE OUTLINE. Course Number Course Title Credits MAT201 Probability and Statistics for Science and Engineering 4. Co- or Pre-requisite COURSE OUTLINE Course Number Course Title Credits MAT201 Probability and Statistics for Science and Engineering 4 Hours: Lecture/Lab/Other 4 Lecture Co- or Pre-requisite MAT151 or MAT149 with a minimum

More information

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

More information

Chapter 2. The Normal Distribution

Chapter 2. The Normal Distribution Chapter 2 The Normal Distribution Lesson 2-1 Density Curve Review Graph the data Calculate a numerical summary of the data Describe the shape, center, spread and outliers of the data Histogram with Curve

More information

Normal distribution Approximating binomial distribution by normal 2.10 Central Limit Theorem

Normal distribution Approximating binomial distribution by normal 2.10 Central Limit Theorem 1.1.2 Normal distribution 1.1.3 Approimating binomial distribution by normal 2.1 Central Limit Theorem Prof. Tesler Math 283 October 22, 214 Prof. Tesler 1.1.2-3, 2.1 Normal distribution Math 283 / October

More information

Sample Size Determination

Sample Size Determination Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

More information

Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

More information

Inference for Regression

Inference for Regression Simple Linear Regression Inference for Regression The simple linear regression model Estimating regression parameters; Confidence intervals and significance tests for regression parameters Inference about

More information

Review. Lecture 3: Probability Distributions. Poisson Distribution. May 8, 2012 GENOME 560, Spring Su In Lee, CSE & GS

Review. Lecture 3: Probability Distributions. Poisson Distribution. May 8, 2012 GENOME 560, Spring Su In Lee, CSE & GS Lecture 3: Probability Distributions May 8, 202 GENOME 560, Spring 202 Su In Lee, CSE & GS suinlee@uw.edu Review Random variables Discrete: Probability mass function (pmf) Continuous: Probability density

More information

Chapter 6. The Standard Deviation as a Ruler and the Normal Model. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 6. The Standard Deviation as a Ruler and the Normal Model. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 6 The Standard Deviation as a Ruler and the Normal Model Copyright 2012, 2008, 2005 Pearson Education, Inc. The Standard Deviation as a Ruler The trick in comparing very different-looking values

More information

The basics of probability theory. Distribution of variables, some important distributions

The basics of probability theory. Distribution of variables, some important distributions The basics of probability theory. Distribution of variables, some important distributions 1 Random experiment The outcome is not determined uniquely by the considered conditions. For example, tossing a

More information

Histograms and density curves

Histograms and density curves Histograms and density curves What s in our toolkit so far? Plot the data: histogram (or stemplot) Look for the overall pattern and identify deviations and outliers Numerical summary to briefly describe

More information

Comment on the Tree Diagrams Section

Comment on the Tree Diagrams Section Comment on the Tree Diagrams Section The reversal of conditional probabilities when using tree diagrams (calculating P (B A) from P (A B) and P (A B c )) is an example of Bayes formula, named after the

More information

where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.

where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis. Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes

More information
