2013 MBA Jump Start Program. Statistics Module Part 3




2013 MBA Jump Start Program, Module 1: Statistics
Thomas Gilbert

Statistics Module Part 3: Hypothesis Testing (Inference); Regressions

Making an Investment Decision

A researcher in your firm has just invented a new flavor of ice cream. Given the short Seattle spring, you only had the opportunity to ask ten people about the taste: six liked it and four hated it. After a quick meeting with your co-founder, you have decided to abandon the last year of R&D that culminated in this amazingly different ice cream. Is this a reasonable decision?

The Power of Statistics

After sitting down with your consultants, you established that your target market comprises 25 million DINKs (dual-income, no-kids households). To be profitable, you need to sell your ice cream to 30% of that market over the course of the summer. How many people do you need to sample in order to be 95% confident that at least 30% of that market will like the ice cream? This is something you will be able to answer by the end of the winter quarter!

Estimating Parameters

- Parameter: a characteristic of the population (e.g., μ); a feature of the data-generating process.
- Statistic: an observed characteristic of a sample (e.g., ȳ).
- The goal: to estimate, that is, to use a statistic to approximate a parameter.

Sampling Variation

Sampling variation is the variability in the value of a statistic from sample to sample. It is the price we pay for working with a sample rather than the whole population. Example: the average exam grade of a class.
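Sampling variation can be made concrete with a short simulation. The sketch below assumes a hypothetical population of 200 exam grades (not course data) and draws many random samples of 20 students: the population mean is the parameter, each sample mean is a statistic, and the spread of the sample means is the sampling variation.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples (without replacement); return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

# Hypothetical population of 200 exam grades, clipped to 0..100 (illustration only).
rng = random.Random(42)
grades = [min(100, max(0, round(rng.gauss(75, 10)))) for _ in range(200)]

mu = statistics.fmean(grades)                 # the parameter: population mean
means = sample_means(grades, sample_size=20, num_samples=1000)

print(f"population mean (parameter): {mu:.2f}")
print(f"first five sample means (statistics): {[round(m, 2) for m in means[:5]]}")
print(f"spread of the sample means (std dev): {statistics.stdev(means):.2f}")
```

Each run of the sampling step gives a slightly different statistic, while the parameter stays fixed.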

From Data to Probability

Over the long run (with enough data), the accumulated relative frequency of an outcome converges to a constant: its probability. The Law of Large Numbers: the relative frequency of an outcome converges to a number, the probability of the outcome, as the number of observed outcomes increases.

GDP Growth

What has been the average annual Gross Domestic Product (GDP) growth in the U.S. since 1947? In Excel, you have annualized quarterly real GDP growth. Is this sample average the true average GDP growth? Is it next quarter's expected GDP growth?
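The Law of Large Numbers can be illustrated with a simulated fair coin. This is a minimal sketch assuming a hypothetical outcome with probability p = 0.5; it tracks the relative frequency of "heads" after each flip.

```python
import random

def running_relative_frequency(p, num_trials, seed=0):
    """Simulate num_trials Bernoulli(p) outcomes; return the relative frequency after each trial."""
    rng = random.Random(seed)
    successes = 0
    freqs = []
    for t in range(1, num_trials + 1):
        successes += rng.random() < p   # a True comparison counts as 1
        freqs.append(successes / t)
    return freqs

freqs = running_relative_frequency(p=0.5, num_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

Early relative frequencies can be far from 0.5; as the number of flips grows they settle near the underlying probability.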

Normal Models

Sample means are normally distributed (bell-shaped curve) if the individual values are normally distributed. However, we never have exactly normal distributions. The Central Limit Theorem shows that the sampling distribution of averages is approximately normal even if the underlying population is not normally distributed. The sample size needs to be large enough for averaging to smooth away deviations from normality.

Standard Error of the Mean

The standard error of the mean of a simple random sample of n measurements from a process or population with standard deviation σ is:

$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

The larger the sample size, the smaller the sampling variation from sample to sample. What is the standard error of our average GDP growth estimate?
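Both ideas on these slides can be checked by simulation. The sketch below assumes a right-skewed Exponential(1) population (mean 1, standard deviation σ = 1), chosen only for illustration. For each sample size n it compares the simulated spread of sample means with σ/√n, and reports the skewness of the sample means, which shrinks toward 0 (the value for a normal distribution) as n grows, in line with the Central Limit Theorem.

```python
import random
import statistics

def sample_means_exponential(n, num_samples, seed=0):
    """Means of num_samples samples of size n from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

def skewness(xs):
    """Average cubed deviation divided by the cubed (population) standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

sigma = 1.0  # standard deviation of the Exponential(1) population
results = {}
for n in (4, 25, 100):
    means = sample_means_exponential(n, num_samples=5000)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:>3}  simulated SE={results[n][0]:.3f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}  skewness={results[n][1]:.2f}")
```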

Concept of Statistical Test

We estimated the average annualized GDP growth at 3.3%. Is it different from 4%? Use a statistical test to answer this question: consider the plausibility of a specific claim. Such claims are called hypotheses.

Concept of Statistical Test (continued)

- Statistical hypothesis: a claim about a parameter of a population.
- Null hypothesis ($H_0$): specifies a default course of action; preserves the status quo.
- Alternative hypothesis ($H_a$): contradicts the assertion of the null hypothesis.

Hypotheses

Is average GDP growth (3.3%) different from 4%? Writing μ for the true average GDP growth: $H_0: \mu = 4\%$ versus $H_a: \mu \neq 4\%$.

Types of Errors

- Type I error: rejecting $H_0$ incorrectly. We believe that GDP growth is not 4% even though it is (a false positive).
- Type II error: failing to reject (accepting) $H_0$ incorrectly. We believe that GDP growth is 4% even though it is not (a false negative).
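The slides do not spell out the test statistic; a common choice is the z-statistic (estimate minus null value, divided by the standard error), compared with a normal distribution. Since the GDP series itself is not reproduced here, this sketch backs out the estimate and standard error from the 95% interval reported later in these slides (2.75% to 3.78%), under the assumption that the interval was built as x̄ ± 2·SE. The helper name `z_test` is hypothetical.

```python
import math

def z_test(estimate, null_value, se):
    """Two-sided test of H0: parameter = null_value, using a normal approximation."""
    z = (estimate - null_value) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Assumption: the reported interval 2.75%..3.78% is x_bar +/- 2*SE.
lower, upper = 2.75, 3.78
x_bar = (lower + upper) / 2      # 3.265; the slides round this to 3.3
se = (upper - lower) / 4         # the half-width equals 2 standard errors
z, p = z_test(x_bar, null_value=4.0, se=se)

print(f"x_bar={x_bar:.3f}%  SE={se:.4f}  z={z:.2f}  p={p:.4f}")
print("reject H0 at 5%" if abs(z) > 1.96 else "fail to reject H0 at 5%")
```

A large |z| (here well beyond 2) means a 3.3% sample average would be unusual if the true mean were 4%, which is why $H_0$ is rejected.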

Confidence Interval

In order to estimate the long-term tax revenue from closing a tax loophole, the White House needs to know what future GDP growth will be. It can use past GDP growth as a basis for long-term planning. Use confidence intervals to answer such questions: they convey information about the precision of the estimates.

Ranges for Parameters

A confidence interval is a range of plausible values for a parameter based on a sample. Constructing confidence intervals relies on the sampling distribution of the statistic. We will assume a normal model based on the Central Limit Theorem.

Confidence Interval for the Mean

We will use the estimated standard error of the mean, $SE(\bar{x}) = s/\sqrt{n}$, where $s$ is the sample standard deviation. Based on the normal distribution, random samples have the following property: in 95% of samples, the sample statistic lies within about two standard errors of the population parameter.

Confidence Interval for the Mean (continued)

As a result, the 95% confidence interval for the mean is

$$\bar{x} - 2\,SE(\bar{x}) \quad \text{to} \quad \bar{x} + 2\,SE(\bar{x})$$

What is your 95% confidence interval for annual GDP growth in the U.S. over the last 65 years?
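The interval formula translates directly into code. This is a sketch, not the course's Excel workflow: `mean_confidence_interval` is a hypothetical helper, and the growth figures are made-up illustrative numbers, not the GDP series used in class.

```python
import math
import statistics

def mean_confidence_interval(data, k=2.0):
    """Return (x_bar - k*SE, x_bar + k*SE) with SE = s / sqrt(n); k = 2 is the ~95% rule of thumb."""
    n = len(data)
    x_bar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # stdev uses the sample (n - 1) formula
    return x_bar - k * se, x_bar + k * se

# Hypothetical annualized growth rates in %, for illustration only.
growth = [3.1, 4.5, -1.2, 2.8, 5.0, 3.9, 1.7, 2.2, 6.1, 0.4]
lo, hi = mean_confidence_interval(growth)
print(f"95% CI for the mean: {lo:.2f}% to {hi:.2f}%")
```

In Excel, the same quantities come from AVERAGE, STDEV.S and SQRT applied to the data column.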

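Interpreting the Confidence Interval

What does this mean? We are 95% confident that μ (the true average GDP growth) lies between 2.75% and 3.78%. Might μ be 2%? It could be, but it is unlikely given the sample results.

Common Confusion: Wrong Interpretations!

- "95% of years witness GDP growth between 2.75% and 3.78%."
- "The average GDP growth is between 2.75% and 3.78%."

The first statement confuses the spread of individual years with the uncertainty about the mean; the second drops the 95% confidence qualifier and treats the interval as certain.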
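The "95% confident" wording refers to the procedure: across many repeated samples, about 95% of intervals built this way contain the true μ. The sketch below checks that by simulation, assuming a hypothetical normal population with μ = 3.3 and σ = 4 and samples of 100 observations.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, num_samples, seed=0):
    """Fraction of x_bar +/- 2*s/sqrt(n) intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        hits += (x_bar - 2 * se) <= mu <= (x_bar + 2 * se)
    return hits / num_samples

rate = coverage_rate(mu=3.3, sigma=4.0, n=100, num_samples=2000)
print(f"share of intervals containing mu: {rate:.3f}")  # expected to be near 0.95
```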

Practice Quiz #5 and Break

Please take a few minutes and complete the practice quiz on the next page: hypothesis testing for the mean return of PCAR. Then take a 10-minute break to stretch your legs!

Statistics Module Part 3: Hypothesis Testing (Inference); Regressions

Regression

We are interested in understanding how changes in one variable can be explained by movements in one or more other variables. A response variable in a dataset measures the outcome of a study. An explanatory variable explains or influences changes in a response variable. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes: y = f(x).

Regression (continued)

We use regression lines to predict y as a function of x: $y = a + b\,x$. How do we estimate a and b? How do we find the line that best fits y and x? An ordinary least squares (OLS) regression line of y on x is the line that minimizes the sum of the squared vertical distances between the data points and the line:

$$\min_{a,\,b} \sum_{i=1}^{n} \left(y_i - a - b\,x_i\right)^2$$

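Graphical Explanation

[Figure: graphical explanation of the OLS regression line]

OLS Regression

The slope coefficient is given by:

$$\hat{b} = \frac{\operatorname{Cov}(y, x)}{\operatorname{Var}(x)}$$

This is actually an estimated slope ("b hat"), and we also have an estimated intercept ("a hat"):

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

Using these estimates, we can calculate predicted values of y given the values of x:

$$\hat{y} = \hat{a} + \hat{b}\,x$$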
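The three OLS formulas can be computed directly. This sketch uses a small hypothetical dataset and hypothetical helper names (`ols_fit`, `sse`); it also confirms the least-squares property by checking that nudging either coefficient away from the OLS estimates increases the sum of squared vertical distances.

```python
import statistics

def ols_fit(x, y):
    """Simple OLS: b_hat = Cov(y, x) / Var(x), a_hat = y_bar - b_hat * x_bar."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    n = len(x)
    cov_yx = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # the (n - 1) cancels in the ratio
    b_hat = cov_yx / var_x
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

def sse(a, b, x, y):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat = ols_fit(x, y)
y_hat = [a_hat + b_hat * xi for xi in x]   # predicted values

# Moving either coefficient away from the OLS estimates increases the SSE.
best = sse(a_hat, b_hat, x, y)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(a_hat + da, b_hat + db, x, y) > best
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, SSE={best:.4f}")
```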

Regression in Excel

Let's regress PCAR returns on market returns:

- Go to Data Analysis.
- Select Regression.
- Highlight the y and x variables and press OK.
- Note the many options: labels; no intercept (y = b*x); confidence intervals; residuals and residual plots.

Regression Output

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.664232549 |
| R Square | 0.441204879 |
| Adjusted R Square | 0.435933227 |
| Standard Error | 0.066815657 |
| Observations | 108 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.373637157 | 0.373637157 | 83.69385395 | 4.62782E-15 |
| Residual | 106 | 0.4732192 | 0.004464332 | | |
| Total | 107 | 0.846856357 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0.019298413 | 0.006429671 | 3.001461975 | 0.003350688 | 0.006550965 | 0.032045861 | 0.002433332 | 0.036163495 |
| X Variable 1 | 1.274361693 | 0.139298335 | 9.148434508 | 4.62782E-15 | 0.998189203 | 1.550534182 | 0.90898099 | 1.639742395 |
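The summary statistics in this output are not independent numbers; most follow from the ANOVA sums of squares. This sketch recomputes them from the figures copied out of the table above, using the standard formulas (R² = SS_reg/SS_tot, adjusted R², residual standard error = √MS_res, F = MS_reg/MS_res, t = coefficient/standard error).

```python
import math

# Values copied from the Excel output above (PCAR on market returns).
n, k = 108, 1                          # observations, explanatory variables
ss_reg, ss_res = 0.373637157, 0.4732192
ss_tot = ss_reg + ss_res               # Total SS
b_hat, se_b = 1.274361693, 0.139298335

r2 = ss_reg / ss_tot                                 # R Square
multiple_r = math.sqrt(r2)                           # Multiple R
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)        # Adjusted R Square
ms_res = ss_res / (n - k - 1)                        # residual mean square
std_err = math.sqrt(ms_res)                          # Standard Error of the regression
f_stat = (ss_reg / k) / ms_res                       # F
t_slope = b_hat / se_b                               # t Stat for the slope

print(f"R={multiple_r:.6f}  R^2={r2:.6f}  adj R^2={adj_r2:.6f}  s={std_err:.6f}")
print(f"F={f_stat:.4f}  t={t_slope:.4f}  t^2={t_slope ** 2:.4f}")  # with one x, t^2 equals F
```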

Alternative Visualization

Build a scatter plot of the returns (you need to reverse the columns in order to have PCAR on the vertical axis and the market on the horizontal axis). Add a linear trendline (highlight the data on the chart and right-click). Note: you can also use the INTERCEPT() and SLOPE() functions.

Interpreting the Fitted Line

Interpreting the slope: the slope estimates the marginal PCAR return per unit of market return. While tempting, it is not correct to describe the slope as the change in y caused by changing x. Question: is the slope statistically different from 0?
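One way to answer the question is to form the t-statistic for $H_0$: slope = 0 and the 95% interval for the slope from the Excel output. The critical value 1.9826 for 106 degrees of freedom is an assumption taken from a t-table (Excel's T.INV.2T(0.05, 106) should return about the same); it reproduces the Lower 95% and Upper 95% columns above to about four decimals.

```python
# Slope estimate, standard error and residual df from the Excel output (PCAR on market returns).
b_hat, se_b, df = 1.274361693, 0.139298335, 106

t_stat = (b_hat - 0.0) / se_b          # H0: slope = 0
# Assumption: two-sided 95% t critical value for 106 df, approximately 1.9826.
t_crit = 1.9826
lower, upper = b_hat - t_crit * se_b, b_hat + t_crit * se_b

print(f"t = {t_stat:.2f}; 95% CI for slope: [{lower:.4f}, {upper:.4f}]")
print("slope differs from 0 at 5%" if abs(t_stat) > t_crit else "cannot reject slope = 0")
```

Since t is about 9 and the interval excludes 0, the slope is statistically different from 0, consistent with the reported p-value of 4.6E-15.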

Explaining Variation

R squared ($R^2$): in a simple regression with one explanatory variable, it is the squared correlation between x and y. It is the fraction of the variation in y accounted for by the variation in x. In our example (R Square = 0.441), 44% of the variation in PCAR returns can be explained by variation in market returns.

Regression Example

Relationship between age and blood pressure.
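The equivalence between the squared correlation and the "fraction of variation explained" (1 − SSE/SST) can be verified numerically. This sketch uses hypothetical return figures and a hypothetical helper name, `r_squared_two_ways`; the identity holds for any simple OLS fit that includes an intercept.

```python
import statistics

def r_squared_two_ways(x, y):
    """Return (squared correlation, 1 - SSE/SST) for a simple OLS fit of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)      # total variation in y (SST)
    r = sxy / (sxx * syy) ** 0.5                  # sample correlation
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    return r ** 2, 1 - sse / syy

# Hypothetical monthly returns (%), illustration only.
market = [1.2, -0.8, 2.5, 0.3, -1.9, 3.1, 0.7, -0.4]
stock = [1.9, -0.2, 3.8, -0.5, -2.6, 4.0, 1.6, 0.1]
r2_corr, r2_fit = r_squared_two_ways(market, stock)
print(f"r^2 = {r2_corr:.4f}, 1 - SSE/SST = {r2_fit:.4f}")  # identical up to rounding
```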

Regression Example (2)

Explaining home selling prices using multiple explanatory variables.

Caution!

Association (or correlation) does not imply causation! Use common sense:

- It could be collinearity.
- It could be a missing variable: you need to control for it.
- It could be a variable that is not independent.

Example: Someone says, "There is a strong positive correlation between the number of firefighters at a fire and the amount of damage the fire does. So sending lots of firefighters just causes more damage." Explain why this reasoning is wrong.
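The "missing variable" explanation can be illustrated with simulated data. In this hypothetical setup, fire size drives both the number of firefighters and the damage, while firefighters have no effect on damage at all. The naive regression of damage on firefighters still shows a large positive slope. Controlling for fire size, done here via the Frisch-Waugh-Lovell approach (regressing the part of damage not explained by size on the part of firefighters not explained by size), brings the firefighter coefficient close to 0. All numbers and helper names are made up for the illustration.

```python
import random
import statistics

def slope(x, y):
    """Simple OLS slope of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from the simple OLS regression of y on x."""
    b = slope(x, y)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(1)
size = [rng.uniform(1, 10) for _ in range(500)]            # hypothetical fire size
firefighters = [3 * s + rng.gauss(0, 2) for s in size]     # bigger fires get more crews
damage = [50 * s + rng.gauss(0, 40) for s in size]         # damage depends on size only

naive = slope(firefighters, damage)
# Controlling for size: regress damage residuals on firefighter residuals (both net of size).
controlled = slope(residuals(size, firefighters), residuals(size, damage))
print(f"naive slope: {naive:.2f}; slope controlling for fire size: {controlled:.2f}")
```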

Summary of Part 3

Hypothesis testing is the cornerstone of inference in statistics: we attempt to reject (or fail to reject) a null hypothesis. Standard errors of parameter estimates are key to the answer. Regressions are powerful tools for understanding relations between variables. Simple regression is very similar to a correlation. Multiple right-hand-side variables allow a decomposition of effects (or controls).

Summary of Statistics Module

- Part 1: Review of basic data analysis, such as means and standard deviations; histograms and distributions.
- Part 2: Review of co-variation analysis (covariance and correlation); working with random variables.
- Part 3: Inference and regressions.

The additional problem set at the end contains more problems on (almost) all topics. Good statistics, and a sound understanding of them, are too often overlooked, and that leads to poor decision making!