Elements of statistics (MATH0487-1)




Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012

Outline

1 Introduction to Statistics
  Why?; What?; Probability; Statistics; Some Examples; Making Inferences; Inferential Statistics; Inductive versus Deductive Reasoning
2 Basic Probability Revisited
3 Sampling
  Samples and Populations; Sampling Schemes; Deciding Who to Choose; Deciding How to Choose; Non-probability Sampling; Probability Sampling; A Practical Application; Study Designs; Classification; Qualitative Study Designs; Popular Statistics and Their Distributions; Resampling Strategies
4 Exploratory Data Analysis - EDA
  Why?; Motivating Example; What?; Data analysis procedures; Outliers and Influential Observations; How?; One-way Methods; Pairwise Methods; Assumptions of EDA
5 Estimation
  Introduction; Motivating Example; Approaches to Estimation: The Frequentist's Way; Estimation by Methods of Moments; Motivation; What?; How?; Examples; Properties of an Estimator; Recapitulation; Point Estimators and their Properties; Properties of an MME; Estimation by Maximum Likelihood; What?; How?; Examples; Profile Likelihoods; Properties of an MLE; Parameter Transformations
6 Confidence Intervals
  Importance of the Normal Distribution; Interval Estimation; What?; How?; Pivotal quantities; Examples; Interpretation of CIs; Recapitulation; Pivotal quantities; Examples; Interpretation of CIs
7 Hypothesis Testing
  General procedure; Hypothesis test for a single mean; P-values; Hypothesis tests beyond a single mean; Errors and Power
8 Table analysis
  Dichotomous Variables; Comparing two proportions using 2 × 2 tables; About RRs and ORs and their confidence intervals; Chi-square Tests; Goodness-of-fit test; Independence test; Homogeneity test
9 Introduction to Linear Regression
  Correlation Analysis; Simple Linear Regression; Centering; Inference on Regression Coefficients; Checking Model Assumptions; Relation between Experience and Wage
10 Frequently Asked Questions

Testing Hypotheses with Table Data
Several types of χ²-tests: the following tests give rise to test statistics that follow a χ²-distribution under their appropriate null hypothesis:
- Test of goodness of fit
- Test of independence
- Test of homogeneity or (no) association
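As an illustration of the first test in this list, the following minimal Python sketch computes the Pearson goodness-of-fit statistic, the sum over categories of (observed - expected)² / expected. The die-roll counts are hypothetical and do not come from the slides; the function name `chi_square_gof` is likewise an assumption made for this example. The critical value 11.070 is the standard 5% table value for 5 degrees of freedom.

```python
def chi_square_gof(observed, expected_probs):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    # Expected counts under the null hypothesis
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # No parameters are estimated from the data here, so df = categories - 1
    return stat, len(observed) - 1


# Hypothetical data: 60 rolls of a die, H0 = the die is fair
observed = [8, 9, 12, 11, 6, 14]
stat, df = chi_square_gof(observed, [1 / 6] * 6)

# Upper 5% critical value of the chi-square distribution with 5 df (from a table)
CRITICAL_5DF_05 = 11.070
reject = stat > CRITICAL_5DF_05
print(f"chi-square = {stat:.2f}, df = {df}, reject H0 at 5%: {reject}")
```

With these made-up counts the statistic stays below the critical value, so the null hypothesis of a fair die would not be rejected at the 5% level.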

Recall: Properties of the χ²-distribution

A family of χ²-distributions

Critical values of χ²

The χ²-table

The χ²-table via the R software

The χ² goodness-of-fit test

Example: Handgun survey


The χ² test of independence

The χ² test of no association (homogeneity)
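The tests of independence and of homogeneity use the same computation on a table of counts: expected counts are (row total × column total) / n, and the degrees of freedom are (rows - 1)(columns - 1). The sketch below assumes a hypothetical 2 × 2 table of treatment by response; the counts and the name `chi_square_table` are illustrative and are not taken from the slides.

```python
def chi_square_table(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if rows and columns are unrelated
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = treatment A / B, columns = response yes / no
table = [[30, 20],
         [20, 30]]
stat, df = chi_square_table(table)

# Upper 5% critical value of the chi-square distribution with 1 df (from a table)
CRITICAL_1DF_05 = 3.841
print(f"chi-square = {stat:.2f}, df = {df}, reject H0 at 5%: {stat > CRITICAL_1DF_05}")
```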

Example: treatment response


Quantifying Associations

Correlation Analysis

Correlation Coefficients


To Remember


Examples


Association and Causality

Bradford Hill's Criteria for Causality

Simple Linear Regression (SLR)

Example: Boxplot of % low income by education level

Simple Linear Regression: Regression Line

Regression Analysis represented by Regression Line Equation

Interpretation of Regression Model Components


Why is Linear Regression so popular?
- Linear regression can refer to simple linear regression (one predictor) or multiple linear regression (more than one predictor).
- Linear regression naturally extends to quadratic, cubic, ... regression to investigate curvilinear relationships.
- Linear regression is flexible, since it can deal with binary X, continuous X, categorical X, confounders, and interactions (leading to k-order regression models).

Example: Galton's study on height

Regression Formalism I

Regression Formalism II

Regression Formalism: Model Assumptions
The regression formalism naturally leads to four model assumptions:
- The relationship is linear
- The errors have the same variance
- The errors are independent of each other
- The errors are normally distributed
When we satisfy the assumptions, it means that we have used all of the information available from the patterns in the data. When we violate an assumption, it usually means that there is a pattern in the data that we have not included in our model, and we could actually find a model that fits the data better.
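The formulas of the "Regression Formalism" slides were not preserved in this transcription. As a sketch, the four assumptions correspond to the standard way of writing the simple linear regression model, with β₀, β₁ and σ² the unknown population parameters (this notation is assumed here, in line with the β₀, β₁ used elsewhere in the slides):

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad i = 1, \dots, n
```

Linearity is the form of the mean β₀ + β₁xᵢ, equal variance is the common σ², and independence and normality are the two properties stated for the errors εᵢ.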

Regression Formalism: Geometric Interpretation

Another example: City education versus income


Where is our Intercept?

Need for Centering
As in the city education versus income example:

Centering
Use c = 12 as the education level to center on:

Centering - New Regression Equation / Interpretation
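The equation on this slide was not preserved. As a sketch under the usual simple linear regression notation (β₀* is a symbol introduced here for the new intercept), centering the predictor at c = 12 rewrites the same line as:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i \\
    &= (\beta_0 + 12\,\beta_1) + \beta_1 (x_i - 12) + \varepsilon_i \\
    &= \beta_0^{*} + \beta_1 (x_i - 12) + \varepsilon_i ,
\qquad \beta_0^{*} = \beta_0 + 12\,\beta_1 = E[\,Y \mid x = 12\,]
\end{aligned}
```

The slope β₁ is unchanged; only the intercept changes, and it now describes the expected response at x = 12 rather than at x = 0.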

Using Sample Data to Estimate the Truth
So far, we have presented our fitted regression line as $\hat{y}_i = \beta_0 + \beta_1 x_i$, without having said anything about how to obtain the best such regression line.

Note: Sometimes linear regression is referred to as least squares regression. This is because the parameters of the regression model are often estimated by minimizing the sum of squared deviations of the observations from the fitted line. However, other estimation methods exist (beyond the scope of this course). Since we actually used a sample to estimate the population regression line, a more accurate notation would have been $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
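For simple linear regression the least squares criterion has a closed-form solution: the slope estimate is Sxy / Sxx and the intercept estimate is ȳ minus the slope estimate times x̄. The following minimal Python sketch applies these formulas to a small made-up data set; the data and the function name `fit_line` are assumptions for illustration only.

```python
def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and sum of cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} * x")
```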

Drawing Conclusions about Population Associations

Hypothesis Testing in Regression
Formulation of null hypothesis, alternative hypothesis, derivation of test statistic:

Would you have expected this statistic to follow a t-distribution?

Summary table: confidence intervals for difference in means
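The derivation on these slides did not survive the transcription. The standard formulation for testing the slope in simple linear regression, given here as a sketch under the model assumptions listed earlier (the symbols σ̂² and SE are introduced for this illustration), is:

```latex
H_0 : \beta_1 = 0 \quad \text{versus} \quad H_1 : \beta_1 \neq 0,
\qquad
T = \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)} \sim t_{n-2} \ \text{under } H_0,

\operatorname{SE}(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

The t-distribution with n - 2 degrees of freedom arises because σ² is unknown and is replaced by its estimate, which uses up two degrees of freedom for the estimated intercept and slope.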

Example: Education


The use of Confidence Intervals
It becomes easy once you have identified a pivotal quantity:
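For the regression slope, the pivotal quantity is the t-statistic with n - 2 degrees of freedom, which leads to the interval: slope estimate ± t critical value × standard error. The Python sketch below works this out for a small made-up sample. The data, the function name `slope_inference` and the choice to pass the critical value in as an argument are all assumptions for illustration; the value 3.182 is the usual table entry for the 97.5% quantile of the t-distribution with 3 degrees of freedom.

```python
import math


def slope_inference(x, y, t_crit):
    """Return slope estimate, its standard error, t-statistic and CI."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares and the standard error of the slope
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b1 / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci


# Hypothetical sample (not from the slides); n = 5, so df = n - 2 = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT_3DF_975 = 3.182  # t table value, 97.5% quantile, 3 df
b1, se_b1, t_stat, ci = slope_inference(x, y, T_CRIT_3DF_975)
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, t = {t_stat:.3f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this made-up example the interval contains 0, which agrees with the t-statistic being smaller in absolute value than the critical value: the two-sided test at the 5% level and the 95% confidence interval lead to the same conclusion.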

Statistical Modeling General Approach:

Statistical Modeling General Approach (continued):

Model Validity Checks
What do we have to check?
How do we have to check?

Residual Plots

LINE: Linear relationship
In applied statistics, a partial regression plot attempts to show the effect of adding an additional variable to the model (given that one or more independent variables are already in the model). Partial regression plots are also referred to as added variable plots or adjusted variable plots. Partial regression plots are formed by:
1 Computing the residuals of regressing the response variable against the independent variables but omitting Xi
2 Computing the residuals from regressing Xi against the remaining independent variables
3 Plotting the residuals from 1. against the residuals from 2.
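The three steps can be sketched in Python for the simplest case of two predictors, where "the remaining independent variables" is a single variable. The data are made up and noise-free (y = 1 + 2·x1 + 3·x2), and the helper name `residuals` is an assumption for this illustration. A known property of these plots is that the least squares slope through the plotted points equals the coefficient of the omitted variable in the full model, which is 2 here.

```python
def residuals(y, x):
    """Residuals from the simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data: x1 is the variable of interest, x2 is already in the model
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

e_y = residuals(y, x2)    # step 1: response on the other predictor(s), omitting x1
e_x1 = residuals(x1, x2)  # step 2: x1 on the other predictor(s)
points = list(zip(e_x1, e_y))  # step 3: the points of the partial regression plot

# Slope of the line through the plotted points (both residual sets have mean 0)
slope = sum(a * b for a, b in points) / sum(a * a for a, _ in points)
print(f"slope in the partial regression plot: {slope:.3f}")
```

The `points` list can be passed to any plotting tool; a clear linear trend in it suggests that x1 adds linear information beyond x2.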

LINE: Independent observations
The relevant question here is: Are all the subjects surveyed independent of one another? In order to answer this question, one needs information about how the data were collected.
Can the independence assumption be assessed graphically? If the lag plot ($Y_i$ versus $Y_{i-1}$) shows no structure, then there is no evidence against the randomness assumption.
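A lag plot only needs the pairs of consecutive observations. The Python sketch below builds these pairs and, as a rough numerical companion to the plot, computes their Pearson correlation (the lag-1 correlation). The series and the function names `lag_pairs` and `correlation` are hypothetical; a correlation near 0 does not prove independence, it only means that this particular check found no linear structure.

```python
import math


def lag_pairs(y):
    """Pairs (Y_{i-1}, Y_i) that make up a lag-1 plot."""
    return list(zip(y[:-1], y[1:]))


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    a_bar = sum(a for a, _ in pairs) / n
    b_bar = sum(b for _, b in pairs) / n
    sab = sum((a - a_bar) * (b - b_bar) for a, b in pairs)
    saa = sum((a - a_bar) ** 2 for a, _ in pairs)
    sbb = sum((b - b_bar) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)


# Hypothetical series with an obvious trend: consecutive values are strongly related
trending = [1, 2, 3, 4, 5, 6]
pairs = lag_pairs(trending)
r_lag1 = correlation(pairs)
print(pairs)
print(f"lag-1 correlation: {r_lag1:.3f}")
```

Plotting `pairs` as a scatter plot gives the lag plot itself; for the trending series above the points fall on a straight line, which is exactly the kind of structure that signals dependence.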

LINE


Graphical Model Validity Checks: Plotting y versus x

Graphical Model Validity Checks: Residual Analysis
Standardized residuals:

Example 1: Residual Analysis on Health Status

Example 2: Non-linearity

Example 3: Outliers

Example 4: Non-normality

Example 5: Heteroscedasticity

Minimal Practice: Fit a regression line and...

Check model assumptions
For instance, plot residuals versus x:

Compute standardized residuals:
Note that for a normal distribution, about 68.27% of the values lie within 1 standard deviation of the mean. Similarly, about 95.45% of the values lie within 2 standard deviations of the mean. Nearly all (99.73%) of the values lie within 3 standard deviations of the mean.
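The slide's exact formula for standardized residuals was not preserved. The Python sketch below assumes the simplest version, each residual divided by the residual standard error √(SSE / (n - 2)); other definitions additionally adjust for leverage. It also reproduces the three normal percentages quoted above from the error function. The data and the function names are hypothetical.

```python
import math


def normal_within(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


def standardized_residuals(x, y):
    """Residuals of the least squares line divided by the residual standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma_hat = math.sqrt(sum(e ** 2 for e in res) / (n - 2))
    return [e / sigma_hat for e in res]


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * normal_within(k):.2f}%")

# Hypothetical sample (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
z = standardized_residuals(x, y)
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2]  # candidates for a closer look
print([round(zi, 3) for zi in z], flagged)
```

If the normality assumption is reasonable, only about 5% of standardized residuals should fall outside ±2, which is why values beyond that range are commonly inspected as possible outliers.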

Plot standardized residuals versus x:
Model fit: Residual pattern for people with very little experience?

Caution: Outliers Check

Caution: Normality Check

Caution: Homoscedasticity Check

Recapitulation Questions?

Acknowledgements
Most slides on formal statistical inference, chi-square testing and regression are based on an excellent course series by Sandy Eckel, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, USA.