Statistics 104: Section 6!

TF: Deirdre (say: Dear-dra) Bloome
dbloome@fas.harvard.edu

Section Times: Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705
Office Hours: Thursday 6pm-7pm in SC 705, Friday 12pm-1pm in SC 601

Section Outline
1. Reminders
2. Comments on HW #5
3. Week in review
   a. Sampling distributions of $\bar{x}$ and $\hat{p}$
   b. Confidence intervals for $\bar{x}$ and $\hat{p}$
4. Practice problems and Midterm review!

Reminders

• Midterm exam: Monday October 19th, 8-9pm, Science Center C and D. The exam is a full hour exam, so please arrive slightly before 8pm. You need a calculator. You may bring 1 double-sided page of notes. Exam covers lectures

• Midterm review: Head TF Kevin Rader will hold an optional midterm review session this Saturday October 17th, 1-2:30pm in Science Center Lecture Hall D. He will go through about an hour of general review and 30 minutes of exam-like problems. The PowerPoint slides and problems discussed (with solutions) will be posted online shortly after the review. This review session will also be videotaped.

• Midterm practice: Try to complete the Fall 2008 Statistics 104 midterm (posted on our class website) as a practice exam. Write up your cheat sheet, and sit with a calculator for 1 hour completing that exam.

Comments on HW #5

Good work. A few notes:

• When looking for the variance of a sum of random variables, remember: adding n random variables is not the same as multiplying a single random variable by n! If $X_1, X_2, \ldots, X_n$ are iid (independent and identically distributed) and $\mathrm{Var}(X_i) = \sigma^2$, then $\mathrm{Var}(nX_1) = n^2\,\mathrm{Var}(X_1) = n^2\sigma^2$, but $\mathrm{Var}\left(\sum_i X_i\right) = \sum_i \mathrm{Var}(X_i) = n\sigma^2$. The variance of the sum is lower than the variance of a single random variable multiplied by n, because it is far less likely for all of the $X_i$'s to be very high or very low at once than it is for a single $X_i$ to be very high or very low.

• Two random variables X and Y are not necessarily independent if their correlation (or covariance) equals zero! Why not? Correlation measures only linear relationships. To prove independence, check a condition like $P(X = x, Y = y) = P(X = x)\,P(Y = y)$ for all x and y.

• When dealing with normal approximations to binomial probabilities, it is always a good idea to use the continuity correction. One way of remembering it is to write the probability you're trying to calculate in two ways. For example, if you want $P(X < 100)$, where X is binomially distributed, rewrite this as $P(X < 100) = P(X \le 99)$, so the number you use for the normal approximation is $\tfrac{1}{2}(100 + 99) = 99.5$.
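The two numerical points in these homework notes can be checked directly. The sketch below is only an illustration: the fair six-sided die, n = 3, and the Binomial(400, 0.25) example are assumptions chosen for convenience, not values from the homework. It computes the variance of a multiple and of a sum exactly, and compares the normal approximation with and without the continuity correction against the exact binomial probability.

```python
import math
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# --- Variance of a multiple vs. variance of a sum (exact, fair six-sided die) ---
faces = range(1, 7)  # hypothetical example distribution

def variance(outcomes):
    """Exact variance of a list of equally likely outcomes."""
    outcomes = list(outcomes)
    mean = Fraction(sum(outcomes), len(outcomes))
    return sum((x - mean) ** 2 for x in outcomes) / len(outcomes)

n = 3
sigma2 = variance(faces)                       # Var(X_1)
var_multiple = variance(n * x for x in faces)  # Var(n * X_1)
# Every ordered triple of rolls is equally likely, so this is Var(X_1 + X_2 + X_3).
var_sum = variance(sum(rolls) for rolls in product(faces, repeat=n))

print("Var(X1)      =", sigma2)
print("Var(n*X1)    =", var_multiple, "(n^2 * sigma^2)")
print("Var(sum X_i) =", var_sum, "(n * sigma^2)")

# --- Continuity correction for P(X < 100), X ~ Binomial(400, 0.25) (hypothetical) ---
trials, p = 400, 0.25
# Exact P(X <= 99), summing the binomial probabilities for k = 0, ..., 99.
exact = sum(math.comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(100))
approx = NormalDist(mu=trials * p, sigma=math.sqrt(trials * p * (1 - p)))
corrected = approx.cdf(99.5)   # cutoff halfway between 99 and 100
uncorrected = approx.cdf(100)  # no continuity correction

print(f"exact={exact:.4f} corrected={corrected:.4f} uncorrected={uncorrected:.4f}")
```

Using exact fractions for the die avoids any simulation noise, so the two variance identities hold exactly rather than approximately.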

Week in Review

1. Sampling Distributions

You all already know a lot about sampling distributions! Let's review them one more time.

We have collected information on characteristic X from each unit in a sample of size n. We would like to know about the mean value of X in the population. If X is continuous (meaning that it can take on an infinite number of values), we denote this population mean $\mu$. If X is binomial (meaning that it is the sum of iid¹ binary random variables), we denote this population mean p, and we interpret it as a proportion. What can we do?

• We could just sample our entire population and then calculate the mean based on the data from every unit in the population. But this is too expensive!

• Instead, we could draw multiple samples of size n, calculate the sample mean (denoted $\bar{x}$ for continuous variables and $\hat{p}$ for binomial variables) in each of these samples, and create a sampling distribution for the sample mean. We could then use the mean and standard deviation of this sampling distribution to draw inferences about the population mean. But this is also too expensive!

• Instead, we must rely on theoretical results regarding the sampling distribution of the sample mean.

There are two important theoretical underpinnings to our inferential procedures:

1. Law of Large Numbers: If an independent sample is drawn from a population with mean $\mu$, then as the number of observations in the sample increases (i.e., as n increases) the sample mean eventually becomes very close to (and stays close to) the population mean $\mu$. This suggests that for large samples, the center of our sampling distribution will be the true population mean.

2. Central Limit Theorem: Loosely, for a sequence of n iid¹ random variables $X_1, X_2, \ldots, X_n$ with finite mean $\mu$ and variance $\sigma^2 > 0$, as n increases, the distribution of the sample average of these random variables approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$. This suggests that the sampling distribution of the sample mean is approximately normal for large n, and that as our sample size increases the variability of our sampling distribution decreases. Note that this asymptotic ($n \to \infty$) normal distribution holds for the mean even if the underlying variable X is far from normally distributed!

How large is large enough? For $\bar{x}$ the rule of thumb is $n > 30$; for $\hat{p}$ the rule of thumb is $np > 10$ and $n(1-p) > 10$.

So, that's the theory. Here are the formulations we will use. For large enough n (check for yourself: what are the rules of thumb on sample size again?):

• For X with population mean $\mu$ and standard deviation $\sigma$, the sample mean is approximately distributed $\bar{x} \sim N(\mu,\ \sigma/\sqrt{n})$.

• For binomial X with population probability of success p in n trials, the proportion of successes is approximately distributed $\hat{p} \sim N\left(p,\ \sqrt{p(1-p)/n}\right)$ [note that this is just a special case of the rule stated above].

In both formulas the second argument of N is the standard deviation of the sampling distribution.

¹ Recall: iid stands for independent and identically distributed. In the binomial case, we mean that each of the n trials is independent of all of the others and each trial has probability p of success; in general, we mean that each $X_i$ is independent of all the others and they all have the same mean, variance, and family of distribution (e.g., Normal).
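The theory above can be illustrated with a small simulation. This is a sketch under assumptions that are not in the handout: the population is Exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed), the sample size is n = 50, and 2000 samples are drawn with a fixed seed. It checks that the sample means center near $\mu$, have spread near $\sigma/\sqrt{n}$, and that about 95% of them fall within 1.96 standard deviations of $\mu$.

```python
import math
import random
import statistics

rng = random.Random(0)        # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0          # Exponential(rate=1): mean 1, sd 1 (assumed population)
n, num_samples = 50, 2000     # illustrative choices, with n > 30

# Draw many samples of size n and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

center = statistics.fmean(sample_means)   # should be close to mu
spread = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)
theory_spread = sigma / math.sqrt(n)

# Share of sample means within 1.96 theoretical standard deviations of mu.
within = sum(abs(m - mu) <= 1.96 * theory_spread for m in sample_means) / num_samples

print(f"mean of sample means: {center:.3f} (theory {mu})")
print(f"sd of sample means:   {spread:.3f} (theory {theory_spread:.3f})")
print(f"share within 1.96 sd: {within:.3f} (theory about 0.95)")
```

The individual observations here are far from normal, yet the sample means behave approximately as the Central Limit Theorem describes.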

2. Confidence Intervals

The Law of Large Numbers (introduced in the previous section) tells us that our sample mean and sample proportion are good estimates of our population mean and population proportion (at least when our sample size is large). However, when we want to make inferences from our sample data to our population parameter of interest, we often prefer to report a reasonable range of values that we think may contain the true population value, rather than reporting a single sample value that almost certainly does not exactly equal the true population value.

How do we determine what this range of values should be? We use the sampling distribution of our statistic, of course! Because the sampling distribution of the sample mean is approximately Normal, we know that about 95% of sample means will lie within 1.96 standard deviations of its mean. We center the interval on our point estimate (this is $\bar{x}$ for the sample mean and $\hat{p}$ for the sample proportion). The values within 1.96 standard deviations of the point estimate represent other values that we think would be reasonable estimates of our population parameter.

Note: 1.96 is the value we use for a 95% confidence interval. For other levels of confidence, we would use a different critical value. For example, for a 99% confidence interval, our critical value would be 2.576 instead of 1.96, because for the Normal distribution 99% of the area is contained within 2.576 standard deviations of the mean.

We calculate a 95% confidence interval (for a sample with large enough n) as:

$\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ or $\left(\hat{p} - 1.96\sqrt{\hat{p}(1-\hat{p})/n},\ \ \hat{p} + 1.96\sqrt{\hat{p}(1-\hat{p})/n}\right)$

In these formulas:

• Point estimate: $\bar{x}$ or $\hat{p}$.

• Critical value from the Normal distribution: 1.96, which here is the 97.5th percentile. We allow an $\alpha = 5\%$ error rate in a 95% interval, which translates to a $100(1 - \alpha/2)$th percentile cut-off, denoted $Z_{\alpha/2}$. Question: what is the error rate for a 99% interval and what would be the critical value?

• Standard deviation of the point estimate: $\sigma/\sqrt{n}$ or $\sqrt{\hat{p}(1-\hat{p})/n}$.

• Margin of error = Critical value × Standard deviation of point estimate.

Note that we have plugged the sample estimate $\hat{p}$ into the standard error formula for the proportion, but we have kept the population parameter $\sigma$ in the formula for the sample mean $\bar{x}$. Why? We introduce extra uncertainty when we estimate $\sigma$ (note that there are two parameters to be estimated for the continuous variable case, $\mu$ and $\sigma$, but only one for the proportion, p). We will learn how to take this extra uncertainty into account soon!

Once we calculate our confidence interval, how do we interpret it? Say we find that our 95% confidence interval for p from some sample is (10%, 16%). Can we say that there is a 95% probability that our true population parameter p is between 10% and 16%? No!! Since p is a fixed (although unknown) value, it either is between 10% and 16% or it is not. There is no probability associated with a constant (other than 0 or 1); it either is within a certain range or it is not.

So, how can we interpret our interval? Note that although p is fixed, our interval is random because it changes based on the random sample we draw! Thus, we can associate probabilities with the interval. If we find that our 95% confidence interval for p from some sample is (10%, 16%), we can say that it was produced by a procedure that has a 95% probability of covering the true population parameter p. In other words, if we repeatedly draw samples of size n and calculate a 95% CI for each sample, then 95% of such CIs will cover the unknown population parameter (p for a proportion, $\mu$ for a mean).
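The repeated-sampling interpretation can be sketched in code. Everything specific below is an assumption for illustration: a Normal population with $\mu = 10$ and known $\sigma = 2$, samples of size 50, 1000 repetitions, and the helper names `critical_value` and `mean_ci`. The critical value comes from the standard library's `statistics.NormalDist().inv_cdf`.

```python
import math
import random
import statistics
from statistics import NormalDist

def critical_value(level):
    """Normal critical value Z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(sample, sigma, level=0.95):
    """Confidence interval for the mean when sigma is treated as known."""
    xbar = statistics.fmean(sample)
    margin = critical_value(level) * sigma / math.sqrt(len(sample))
    return xbar - margin, xbar + margin

rng = random.Random(104)                   # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 50, 1000   # hypothetical population and design

covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = mean_ci(sample, sigma)
    covered += low <= mu <= high           # mu is fixed; the interval is what varies

coverage = covered / reps
print(f"95% critical value: {critical_value(0.95):.3f}")
print(f"99% critical value: {critical_value(0.99):.3f}")
print(f"share of intervals covering mu: {coverage:.3f}")
```

Each individual interval either covers $\mu$ or does not; the 95% describes how often the procedure succeeds across repeated samples.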

3. Midterm Review

Today we will complete several practice problems based on previous midterms. Due to time limitations, we will not complete a general review of topics you may encounter on the exam. However, I want to provide a few notes for you:

• See Lecture 17, Slides for a review of the general topics that may be covered on the midterm.

• There are a number of previous midterms posted on the class website. These can be useful for practice. I particularly recommend that you take the Fall 2008 midterm as a practice exam. Write your cheat sheet (1 double-sided page of notes) and sit down with it and a calculator for 1 hour and take the exam.

• When looking at midterms prior to Fall 2008, be aware:
  o More attention has been paid recently to conditional probabilities, expectations, variances, and Bayes' Rule.
  o Don't forget about the material we saw at the beginning of the term: skewness and relationships between the mean and median; transformations; normal quantile plots; using the Normal table; regression (including the line, predictions, residuals, and $R^2$); correlation (and covariance); study design.

• Areas of particular student difficulty on past exams include:
  o Probability and conditional probability (independence of events and of random variables).
  o Sums and differences of random variables (use the rules we discussed last week). Remember that a linear combination of independent normal random variables is itself normally distributed.
  o Binomial random variables. Don't forget to check the sample size conditions before using the normal approximation, and don't forget the continuity correction. When the sample size conditions are not met, be ready to use the actual binomial probabilities.

Practice Problems!

1. Confidence Intervals: Financial Gains and Losses

Note: We will use the normal distribution for this problem, but there is evidence that financial fluctuations are not normally distributed. Why might that be? Does the fact that these fluctuations are not normally distributed impact our confidence interval calculations?

We are interested in measuring how the S&P 500 stock index fluctuates from day to day. We have the entire population over the course of the last 20+ years in a dataset, and our variable of interest is percent daily change = 100*(today's price − yesterday's price)/(yesterday's price).

A. If we were to sample individual daily stock fluctuations, where would 95% of our observations fall?

B. If we were to sample 50 observations at a time, where would 95% of our sample means ($\bar{x}$) fall?

C. Create a sample of about n = 50 observations (we will do this in Stata). What is the 95% confidence interval for the population mean, $\mu$, using this sample? Do we cover the true population mean?

D. Now, create 100 samples of size n = 50, and calculate the confidence interval for each of the samples. How many of these confidence intervals do we expect to contain the true mean? How many actually do?

E. What is the interpretation of your confidence interval from part C above? (Keep in mind what is truly random, because only random things can have probabilities other than zero or one!)

F. The S&P 500 has increased on 51.6% of the days over the course of the last year (258 days). Assuming this is a good random sample, what is the 95% confidence interval for the overall proportion of days on which the S&P 500 will increase?
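One way part F could be set up is sketched below, using only the two numbers given in the problem ($\hat{p} = 0.516$, n = 258) and the proportion interval from the Week in Review. The variable names are illustrative, and the sketch takes the problem's random-sample assumption at face value.

```python
import math
from statistics import NormalDist

p_hat, n = 0.516, 258   # values stated in part F

# Rule-of-thumb check before using the normal approximation.
large_enough = n * p_hat > 10 and n * (1 - p_hat) > 10

z = NormalDist().inv_cdf(0.975)                    # critical value for 95% confidence
std_error = math.sqrt(p_hat * (1 - p_hat) / n)     # sqrt(p_hat * (1 - p_hat) / n)
margin = z * std_error                             # margin of error
lower, upper = p_hat - margin, p_hat + margin

print(f"conditions met: {large_enough}")
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

Under these inputs the interval runs from roughly 45.5% to 57.7%, so it includes 50%.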

2. Past exam question I: LASIK Gone Wrong?

According to a recent study, 1% of all patients who undergo laser eye surgery (i.e., LASIK) to correct their vision have serious post-laser vision problems (All About Vision, 2006).

A. (9 points) A doctor has recently started treating patients with LASIK surgery. After treating three patients, it was observed that two had serious post-laser vision problems. If we assume that the true rate of these problems is 1% and that these three patients can be treated as a random sample, what is the probability that two or three (i.e., ≥ 2) of these patients would have been observed to have serious post-laser problems?

B. (1 point) Considering your calculation in part A, what conclusion would you make regarding this doctor?

C. (10 points) In a random sample of 1600 LASIK patients, what is the probability that more than 25 experienced serious post-laser vision problems if we assume that the true rate of these problems is 1%?

3. Past exam question II: Stranger at the door

Maria's dog Rio often barks when people are at the front door. If the person at the front door is a stranger, Rio barks 90% of the time. If the person at the front door is Maria's friend, Rio barks only 20% of the time. About 75% of people who come to the front door are Maria's friends. (Note: for this problem, everyone is either Maria's friend or a stranger.)

A. (7 points) What is the probability that Rio barks at the next person at the front door?

B. (7 points) If Rio is barking at someone at the front door, what is the probability that person is Maria's friend?

4. Past exam question III: Hold the door?

There is a sign in the university library elevator indicating a 16-person limit as well as a weight limit of 2500 pounds. Suppose that the weights of students, faculty, and staff are approximately normally distributed with a mean weight of 150 pounds and a standard deviation of 25 pounds. What is the probability that the total weight of a random sample of 16 people in the elevator will exceed the weight limit?
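The calculations behind problems 2 to 4 can be set up along the following lines. This is a sketch of one possible approach, not an official solution: it uses exact binomial probabilities for problem 2A, the normal approximation with a continuity correction for 2C, the law of total probability and Bayes' Rule for problem 3, and the rule for sums of independent normal random variables for problem 4 (independence of the 16 weights is an assumption). All variable names are illustrative.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Problem 2A: n = 3 is far too small for the normal approximation, so use exact values.
p_2a = binom_pmf(2, 3, 0.01) + binom_pmf(3, 3, 0.01)   # P(X >= 2)

# Problem 2C: np = 16 and n(1 - p) = 1584 both exceed 10, so approximate.
n, p = 1600, 0.01
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
p_2c = 1 - approx.cdf(25.5)   # P(X > 25) = P(X >= 26); cutoff halfway between 25 and 26

# Problem 3: total probability, then Bayes' Rule.
p_friend, p_stranger = 0.75, 0.25
p_bark = 0.20 * p_friend + 0.90 * p_stranger
p_friend_given_bark = 0.20 * p_friend / p_bark

# Problem 4: a sum of 16 independent N(150, 25) weights is N(16*150, sqrt(16)*25).
total_weight = NormalDist(16 * 150, math.sqrt(16) * 25)
p_over_limit = 1 - total_weight.cdf(2500)

print(f"2A: {p_2a:.6f}")
print(f"2C: {p_2c:.4f}")
print(f"3A: {p_bark:.3f}   3B: {p_friend_given_bark:.3f}")
print(f"4:  {p_over_limit:.4f}")
```

Note that the standard deviation of the total in problem 4 is $\sqrt{16}\cdot 25 = 100$ pounds, not $16 \cdot 25$, which is the sum-versus-multiple distinction from the HW #5 comments.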

5. Past exam question IV: Political opinions

The table below shows the political affiliation of American voters and the proportion favoring or opposing the death penalty within the 6 categories defined by three values of party affiliation and 2 opinions.

                 Death Penalty Opinion
Party            Favor        Oppose
Republican
Democrat
Other

A. (6 points) What is the probability that a randomly chosen voter favors the death penalty? What is the probability that a different randomly chosen voter is a Republican?

B. (6 points) Suppose you know that a randomly chosen voter is a Republican. What is the probability that he or she favors the death penalty?

C. (7 points) Are the events "choosing a Republican" and "choosing someone who favors the death penalty" independent? Justify your answer.

6. Past exam question V: Gas taxes

An investigator is interested in determining the relationship between gasoline prices and gasoline consumption in the US. She collects data on all 50 states and DC (n = 51) and measures the following variables:

price: price per gallon of gasoline (in US dollars in June, 2006)
usage: gasoline consumption (in barrels, per capita in 2006)

Below is the relevant output from Stata on the state-by-state data she collected (summary statistics, correlation table, and scatterplot). Use this output to answer the following questions.

. summarize price usage

    Variable    Obs    Mean    Std.Dev.    Min    Max
    price
    usage

. corr price usage
(obs=51)

                price    usage
    price
    usage

[Scatterplot: usage (vertical axis) versus price (horizontal axis)]

A. (5 points) Calculate the equation for the least squares regression line for predicting gasoline usage from gasoline price.

B. (5 points) Some politicians are hoping to add a $0.25 tax on every gallon of gasoline (which will in effect raise the average gasoline price exactly $0.25). Based on this regression model, how much would the average gasoline consumption change with this new tax?

C. (6 points) Below is the histogram of the residuals from the regression model you found in part A. Please give an estimate for the mean and median of this distribution of residuals, and also state (with brief reasoning) whether the distribution of residuals appears to be symmetric, left-skewed, or right-skewed:

Mean:
Median:
Skewness (circle one): a) Left-skewed b) Symmetric c) Right-skewed
Reasoning for skewness answer:

[Histogram of residuals: Percent (vertical axis) versus Residuals (horizontal axis)]
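Parts A and B rely on building a least squares line from summary statistics: the slope is $r \cdot s_y / s_x$ and the line passes through $(\bar{x}, \bar{y})$. The sketch below shows that calculation with placeholder numbers. Every value in it is hypothetical, because the figures from the Stata output are not reproduced in this transcription; substitute the real means, standard deviations, and correlation to answer the question.

```python
# All inputs are HYPOTHETICAL placeholders, not the values from the Stata output.
mean_price, sd_price = 3.00, 0.20   # dollars per gallon (hypothetical)
mean_usage, sd_usage = 10.0, 2.0    # barrels per capita (hypothetical)
r = -0.50                           # correlation of price and usage (hypothetical)

# Least squares line for predicting usage from price.
slope = r * sd_usage / sd_price                 # b1 = r * s_y / s_x
intercept = mean_usage - slope * mean_price     # b0 = y_bar - b1 * x_bar

def predict_usage(price):
    """Predicted per capita usage at a given price, under the fitted line."""
    return intercept + slope * price

# Part B: a price increase of `tax` shifts predicted usage by slope * tax.
tax = 0.25
predicted_change = slope * tax

print(f"usage_hat = {intercept:.2f} + ({slope:.2f}) * price")
print(f"predicted change in usage for a ${tax:.2f} tax: {predicted_change:.2f}")
```

The predicted change depends only on the slope, so the intercept is not needed for part B.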


Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.) Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

More information

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values.

MA 1125 Lecture 14 - Expected Values. Friday, February 28, 2014. Objectives: Introduce expected values. MA 5 Lecture 4 - Expected Values Friday, February 2, 24. Objectives: Introduce expected values.. Means, Variances, and Standard Deviations of Probability Distributions Two classes ago, we computed the

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Practice#1(chapter1,2) Name

Practice#1(chapter1,2) Name Practice#1(chapter1,2) Name Solve the problem. 1) The average age of the students in a statistics class is 22 years. Does this statement describe descriptive or inferential statistics? A) inferential statistics

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

The normal approximation to the binomial

The normal approximation to the binomial The normal approximation to the binomial The binomial probability function is not useful for calculating probabilities when the number of trials n is large, as it involves multiplying a potentially very

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem

MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides, you

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i ) Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

AP STATISTICS (Warm-Up Exercises)

AP STATISTICS (Warm-Up Exercises) AP STATISTICS (Warm-Up Exercises) 1. Describe the distribution of ages in a city: 2. Graph a box plot on your calculator for the following test scores: {90, 80, 96, 54, 80, 95, 100, 75, 87, 62, 65, 85,

More information

ECE302 Spring 2006 HW4 Solutions February 6, 2006 1

ECE302 Spring 2006 HW4 Solutions February 6, 2006 1 ECE302 Spring 2006 HW4 Solutions February 6, 2006 1 Solutions to HW4 Note: Most of these solutions were generated by R. D. Yates and D. J. Goodman, the authors of our textbook. I have added comments in

More information

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

consider the number of math classes taken by math 150 students. how can we represent the results in one number? ch 3: numerically summarizing data - center, spread, shape 3.1 measure of central tendency or, give me one number that represents all the data consider the number of math classes taken by math 150 students.

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1]. Probability Theory Probability Spaces and Events Consider a random experiment with several possible outcomes. For example, we might roll a pair of dice, flip a coin three times, or choose a random real

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Econometrics and Data Analysis I

Econometrics and Data Analysis I Econometrics and Data Analysis I Yale University ECON S131 (ONLINE) Summer Session A, 2014 June 2 July 4 Instructor: Doug McKee (douglas.mckee@yale.edu) Teaching Fellow: Yu Liu (dav.yu.liu@yale.edu) Classroom:

More information

AMS 5 Statistics. Instructor: Bruno Mendes mendes@ams.ucsc.edu, Office 141 Baskin Engineering. July 11, 2008

AMS 5 Statistics. Instructor: Bruno Mendes mendes@ams.ucsc.edu, Office 141 Baskin Engineering. July 11, 2008 AMS 5 Statistics Instructor: Bruno Mendes mendes@ams.ucsc.edu, Office 141 Baskin Engineering July 11, 2008 Course contents and objectives Our main goal is to help a student develop a feeling for experimental

More information

Math 35 Section 43376 Spring 2014. Class meetings: 6 Saturdays 9:00AM-11:30AM (on the following dates: 2/22, 3/8, 3/29, 5/3, 5/24, 6/7)

Math 35 Section 43376 Spring 2014. Class meetings: 6 Saturdays 9:00AM-11:30AM (on the following dates: 2/22, 3/8, 3/29, 5/3, 5/24, 6/7) Math 35 Section 43376 Spring 2014 Class meetings: 6 Saturdays 9:00AM-11:30AM (on the following dates: 2/22, 3/8, 3/29, 5/3, 5/24, 6/7) Instructor: Kathy Nabours Office: MTSC 133 Email: kathy.nabours@rcc.edu

More information

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

The Binomial Probability Distribution

The Binomial Probability Distribution The Binomial Probability Distribution MATH 130, Elements of Statistics I J. Robert Buchanan Department of Mathematics Fall 2015 Objectives After this lesson we will be able to: determine whether a probability

More information

SOLUTIONS: 4.1 Probability Distributions and 4.2 Binomial Distributions

SOLUTIONS: 4.1 Probability Distributions and 4.2 Binomial Distributions SOLUTIONS: 4.1 Probability Distributions and 4.2 Binomial Distributions 1. The following table contains a probability distribution for a random variable X. a. Find the expected value (mean) of X. x 1 2

More information

AMS 5 CHANCE VARIABILITY

AMS 5 CHANCE VARIABILITY AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and

More information

Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density HW MATH 461/561 Lecture Notes 15 1 Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density and marginal densities f(x, y), (x, y) Λ X,Y f X (x), x Λ X,

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Solution Let us regress percentage of games versus total payroll.

Solution Let us regress percentage of games versus total payroll. Assignment 3, MATH 2560, Due November 16th Question 1: all graphs and calculations have to be done using the computer The following table gives the 1999 payroll (rounded to the nearest million dolars)

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

More information

Random variables, probability distributions, binomial random variable

Random variables, probability distributions, binomial random variable Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007)

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007) COURSE DESCRIPTION Title Code Level Semester Credits 3 Prerequisites Post requisites Introduction to Statistics ECON1005 (EC160) I I None Economic Statistics (ECON2006), Statistics and Research Design

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

WEEK #22: PDFs and CDFs, Measures of Center and Spread

WEEK #22: PDFs and CDFs, Measures of Center and Spread WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook

More information

Universally Accepted Lean Six Sigma Body of Knowledge for Green Belts

Universally Accepted Lean Six Sigma Body of Knowledge for Green Belts Universally Accepted Lean Six Sigma Body of Knowledge for Green Belts The IASSC Certified Green Belt Exam was developed and constructed based on the topics within the body of knowledge listed here. Questions

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information