Sampling and Inference. The Quality of Data and Measures


1 Sampling and Inference: The Quality of Data and Measures

2 Why we talk about sampling: general citizen education; understand the data you'll be using; understand how to draw a sample, if you need to; make statistical inferences.

3 Why do we sample? Cost/benefit. [Figure: benefit (precision) and cost (hassle factor) plotted against N.]

4 How do we sample? Simple random sample (variant: systematic sample with a random start); stratified; cluster.
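The four designs named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the population of 1,000 unit IDs, the four strata defined by `u % 4`, the 50 blocks of 20 units, and every function name are hypothetical choices made only for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th unit after a random start inside the first interval."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start in [0, k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then draw a simple random sample in each."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

rng = random.Random(17871)
population = list(range(1000))                                # hypothetical unit IDs
srs = simple_random_sample(population, 50, rng)
sys_s = systematic_sample(population, 50, rng)
strat = stratified_sample(population, lambda u: u % 4, 10, rng)
blocks = {b: population[b * 20:(b + 1) * 20] for b in range(50)}  # hypothetical blocks
clus = cluster_sample(blocks, 3, rng)
print(len(srs), len(sys_s), {k: len(v) for k, v in strat.items()}, len(clus))
```

The stratified draw fixes the number of units per stratum, which is one way to guarantee that small groups appear in the sample at all.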

5 Stratification: divide the sample into subsamples based on known characteristics (race, sex, religiosity, continent, department). Benefit: preserve or enhance variability.

6 Stratification example. [Table: NES sample compared with a hypothetical sample, giving N and % for White Christians, Black Christians, White Jews, Black Jews, Other race/religion, and Missing (227, n.a.); percentages are computed on 1,487 valid obs.]

7 Cluster sampling: Block HH Unit Individual

8 Effects of samples. Obvious: influences marginals. Less obvious: allows effective use of time and effort; affects multivariate techniques. Sampling on the independent variable: greater precision in regression estimates. Sampling on the dependent variable: bias.

9 Sampling on Independent Variable. [Figure: two scatterplots of y against x.]

10 Sampling on Dependent Variable. [Figure: two scatterplots of y against x.]
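The contrast between sampling on x and sampling on y can be checked with a small simulation. The sketch below assumes a made-up linear relationship (true slope 1, normal errors) and arbitrary cutoffs; none of these values come from the slides.

```python
import random

def ols_slope(xs, ys):
    """Bivariate least-squares slope: cov(x, y) / var(x)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(5000)]
ys = [2.0 + 1.0 * x + rng.gauss(0, 2) for x in xs]   # hypothetical model, true slope = 1

full = ols_slope(xs, ys)

# Sampling on the independent variable: keep only low and high x.
keep_x = [(x, y) for x, y in zip(xs, ys) if x < 2 or x > 8]
slope_x = ols_slope(*zip(*keep_x))

# Sampling on the dependent variable: keep only cases with large y.
keep_y = [(x, y) for x, y in zip(xs, ys) if y > 8]
slope_y = ols_slope(*zip(*keep_y))

print(f"full sample   slope = {full:.3f}")
print(f"selected on x slope = {slope_x:.3f}")
print(f"selected on y slope = {slope_y:.3f}")
```

Selecting on x leaves the slope close to its true value, while selecting on y pulls the estimate toward zero, because low-x cases enter the sample only when their error term happens to be large.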

11 Sampling Consequences for Statistical Inference

12 Statistical Inference: Learning About the Unknown From the Known. Reasoning forward: distributions of sample means, when the population mean, s.d., and n are known. Reasoning backward: learning about the population mean when only the sample mean, s.d., and n are known.

13 Reasoning Forward

14 Exponential Distribution Example. Mean = 250,000; Median = 125,000; s.d. = 283,474; Min = 0; Max = 1,000, [Figure: histogram of inc; vertical axis: Fraction.]

15 Consider 10 random samples of n = 100 apiece. [Table: sample number and sample mean for each of the 10 samples. Figure: histogram of inc; vertical axis: Fraction.]

16 Consider 10,000 samples of n = 100. N = 10,000; Mean = 249,993; s.d. = 28,559; Skewness = ; Kurtosis = 2.92. [Figure: histogram of (mean) inc; vertical axis: Fraction.]

17 Consider 1,000 samples of various sizes. [Figure: three histograms of (mean) inc; vertical axis: Fraction.]
First panel: Mean = 250,105; s.d. = 90,891; Skew = 0.38; Kurt =
Second panel: Mean = 250,498; s.d. = 28,297; Skew = 0.02; Kurt =
Third panel: Mean = 249,938; s.d. = 9,376; Skew = ; Kurt =
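The "reasoning forward" exercise can be reproduced approximately. The sketch below assumes a pure exponential population with mean 250,000 (so its s.d. is also 250,000, which differs from the 283,474 reported for the slide's population) and sample sizes of 10, 100, and 1,000; both are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(2024)
POP_MEAN = 250_000        # assumed exponential population; its s.d. equals its mean

def sample_mean(n):
    """Mean of one random sample of size n from the assumed population."""
    return statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))

results = {}
for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(1000)]     # 1,000 samples of size n
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    predicted = POP_MEAN / math.sqrt(n)               # sigma / sqrt(n)
    print(f"n={n:5d}  mean of means={results[n][0]:10.0f}  "
          f"s.d. of means={results[n][1]:9.0f}  sigma/sqrt(n)={predicted:9.0f}")
```

The mean of the sample means stays near the population mean at every n, while their spread shrinks roughly in proportion to the square root of n.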

18 Difference of means example. State 1: Mean = 250,000 [Figure: histogram of inc]. State 2: Mean = 300,000 [Figure: histogram of inc2].

19 Take 1,000 samples of 10 from each state and compare them. First 10 samples:
Sample   State 1        State 2
1        ,410      <    365,
2        ,571      <    243,
3        ,574      >    438,
4        ,374      <    557,
5        ,934      >    189,
6        ,400      <    284,
7        ,115      <    210,
8        ,885      <    333,
9        ,678      <    314,
10       ,725      >    152,312

20 1,000 samples of … State 2 > State 1: 673 times. [Figure: (mean) inc2 plotted against (mean) inc.]

21 1,000 samples of … State 2 > State 1: 909 times. [Figure: (mean) inc2 plotted against (mean) inc.]

22 1,000 samples of 1,… State 2 > State 1: 1,000 times. [Figure: (mean) inc2 plotted against (mean) inc.]

23 Another way of looking at it: the distribution of Inc 2 - Inc 1 (diff). [Figure: three histograms of diff; vertical axis: Fraction.]
n = 10: Mean = 51,845; s.d. = 124,
n = 100: Mean = 49,704; s.d. = 38,
n = 1,: Mean = 49,816; s.d. = 13,932
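A rough replication of the two-state comparison is sketched below. It assumes both states' incomes are exponential with means 250,000 and 300,000 and that the three sample sizes are 10, 100, and 1,000; the distributional form and the variable names are assumptions, so the counts will resemble the slides' figures without matching them exactly.

```python
import random
import statistics

rng = random.Random(1996)
MEAN_1, MEAN_2 = 250_000, 300_000     # state means from the example; exponential shape assumed

def sample_mean(pop_mean, n):
    """Mean of one random sample of size n from an exponential population."""
    return statistics.fmean(rng.expovariate(1 / pop_mean) for _ in range(n))

wins = {}
diff_stats = {}
for n in (10, 100, 1000):
    diffs = [sample_mean(MEAN_2, n) - sample_mean(MEAN_1, n) for _ in range(1000)]
    wins[n] = sum(d > 0 for d in diffs)               # times State 2 > State 1
    diff_stats[n] = (statistics.fmean(diffs), statistics.stdev(diffs))
    print(f"n={n:5d}  State 2 > State 1: {wins[n]:4d} of 1000  "
          f"mean diff={diff_stats[n][0]:9.0f}  s.d. diff={diff_stats[n][1]:9.0f}")
```

The average difference sits near 50,000 at every sample size; what changes with n is how often sampling noise is large enough to reverse the sign.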

24 Reasoning Backward: when you know $n$, $\bar{X}$, and $s$, but want to say something about $\mu$

25 Central Limit Theorem: As the sample size $n$ increases, the distribution of the mean $\bar{X}$ of a random sample taken from practically any population approaches a normal distribution, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

26 Calculating Standard Errors. In general: $\text{std. err.} = \dfrac{s}{\sqrt{n}}$

27 Most important standard errors
Mean: $\dfrac{s}{\sqrt{n}}$
Proportion: $\sqrt{\dfrac{p(1-p)}{n}}$
Diff. of 2 means: $s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}$
Regression (slope) coeff.: $\dfrac{\text{s.e.r.}}{\sqrt{n}} \times \dfrac{1}{s_x}$
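The four standard errors can be written as small helper functions. The function names are hypothetical, and the slope version follows one reading of the slide, s.e.r./√n × 1/s_x, which is a large-sample form.

```python
import math

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(s_p, n1, n2):
    """Standard error of a difference of two means, using the pooled s.d. s_p."""
    return s_p * math.sqrt(1 / n1 + 1 / n2)

def se_slope(ser, n, s_x):
    """Large-sample standard error of a bivariate regression slope."""
    return ser / math.sqrt(n) / s_x

# Using the population s.d. from the exponential example with n = 100:
print(round(se_mean(283_474, 100), 1))
```

With s.d. = 283,474 and n = 100 this gives about 28,347, close to the 28,559 spread of the 10,000 simulated sample means reported earlier.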

28 If you know the sample mean, s.d., and n, what can you say about the population mean? In general, population mean = sample mean ± (arbitrary interval × standard error).

29 If n is sufficiently large, choose the interval using the normal curve. [Figure: normal curve centered on the mean, horizontal axis from -4σ to 4σ, with the 68%, 95%, and 99% regions marked.]
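As a sketch of how the interval is chosen from the normal curve, the function below converts a confidence level into a multiplier and applies it to a standard error. The sample mean of 250,000 and standard error of 28,347 are illustrative values, and `normal_interval` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_interval(mean, se, level):
    """Two-sided interval: mean +/- z * se, where z leaves `level` in the middle."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se

for level in (0.68, 0.95, 0.99):
    lower, upper = normal_interval(250_000, 28_347, level)   # illustrative inputs
    print(f"{level:.0%}: {lower:10.0f} to {upper:10.0f}")
```

The multipliers behind these three levels are roughly 1, 2, and 2.6 standard errors.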

30 Population mean using original example (n = 10). [Table: for each of 10 samples, the sample mean, s.d., s.e., and the lower and upper bounds of the 68%, 95%, and 99% intervals.]

31 Population mean using original example (n = 1000). [Table: for each of 10 samples, the sample mean, s.d., s.e., and the lower and upper bounds of the 68%, 95%, and 99% intervals.]

32 With mean = 238,226 and s.e. = 8,775, how likely is it that the true mean is 200,000 or less? Another way of asking this: the z-ratio. [Figure: normal curve centered on the sample mean, horizontal axis from -4σ to 4σ.]

33 Z: $z = \dfrac{\text{sample mean} - \text{test value}}{\text{standard error}}$; in this case, $z = \dfrac{238{,}226 - 200{,}000}{8{,}775} \approx 4.36$
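The z-ratio for this example can be computed directly from the sample mean of 238,226, the standard error of 8,775, and the test value of 200,000 given in the slides. The tail-area step uses the standard normal distribution, which assumes the sample is large enough for the normal approximation to hold.

```python
from statistics import NormalDist

sample_mean = 238_226
std_err = 8_775
test_value = 200_000

z = (sample_mean - test_value) / std_err
# Area of the standard normal curve more than z standard errors below the sample mean.
p_lower_tail = NormalDist().cdf(-z)

print(f"z = {z:.2f}")
print(f"lower-tail area = {p_lower_tail:.2e}")
```

A z-ratio above 4 puts the test value far out in the tail, so a true mean of 200,000 or less would be very hard to square with this sample.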

34 t (when the sample is small). [Figure: the z (normal) distribution compared with the t-distribution.]

35 Reading a z table

36 Reading a t table

37 Doing a t-test. Q: How likely is it that the residual vote rate in 1996 was 2.5% or less? Mean: ; s.d.: ; N: 1905. $\text{s.e.} = s/\sqrt{n}$. [Figure: histogram of blank96; vertical axis: Fraction.]

38 The picture. Mean: ; s.d.: ; N: . $\text{s.e.} = s/\sqrt{n}$, with $n = 1905$. [Figure: curve plotted against newz.]

39 The STATA output

    . ttest blank96=.025

    One-sample t test
    Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
     blank96 |
    Degrees of freedom: 1904

    Ho: mean(blank96) = .025
    Ha: mean < .025      Ha: mean ~= .025      Ha: mean > .025
    t =                  t =                   t =
    P < t =              P > |t| =             P > t =

40 Doing another t-test. Q: How likely is it that the residual vote rate in 1996 was equal to the rate in 1992 (i.e., blank96 - blank92 = 0)? Mean: ; s.d.: ; N: . $\text{s.e.} = s/\sqrt{n}$. [Figure: histogram of diff; vertical axis: Fraction.]

41 The picture. Mean: ; s.d.: ; N: 1448. $\text{s.e.} = s/\sqrt{n}$, with $n = 1448$. [Figure: curve plotted against newz.]

42 The STATA output

    . ttest blank96=blank92

    Paired t test
    Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
     blank96 |
     blank92 |
        diff |

    Ho: mean(blank96 - blank92) = mean(diff) = 0
    Ha: mean(diff) < 0   Ha: mean(diff) ~= 0   Ha: mean(diff) > 0
    t =                  t =                   t =
    P < t =              P > |t| =             P > t =

    . ttest diff9692=0

    One-sample t test
    Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
    diff9692 |
    Degrees of freedom:

    Ho: mean(diff9692) = 0
    Ha: mean < 0         Ha: mean ~= 0         Ha: mean > 0
    t =                  t =                   t =
    P < t =              P > |t| =             P > t =
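The two Stata commands on this slide test the same thing: a paired t test is a one-sample t test on the differences. The sketch below shows that equivalence with a handful of invented residual vote rates; the data values and function names are hypothetical and are not taken from the course dataset.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for Ho: mean(values) = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)     # s / sqrt(n)
    return (statistics.fmean(values) - mu0) / se, n - 1

def paired_t(a, b):
    """Paired t test: a one-sample t test on the pairwise differences a - b."""
    return one_sample_t([x - y for x, y in zip(a, b)])

# Hypothetical county-level residual vote rates (not the course data).
blank96 = [0.031, 0.024, 0.040, 0.018, 0.027, 0.035]
blank92 = [0.025, 0.026, 0.033, 0.015, 0.029, 0.030]

t_paired, df = paired_t(blank96, blank92)
diff9692 = [x - y for x, y in zip(blank96, blank92)]
t_diff, _ = one_sample_t(diff9692)
print(f"paired t = {t_paired:.3f}, one-sample t on diff = {t_diff:.3f}, df = {df}")
```

Both routes use the standard deviation of the differences, which is why pairing can be much more precise than treating the two years as independent samples.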

43 Final t-test. Q: Was there a relationship between residual vote and county size in 1996? Slope coeff: ; s.e.r.: ; N: 1861; $s_x$: . $\text{s.e.} = \dfrac{\text{s.e.r.}}{\sqrt{n}} \times \dfrac{1}{s_x}$. [Figure: scatterplot of blank96 against vap96_to, with fitted values.]

44 Calculating t: $t = \dfrac{\text{slope coeff.} - 0}{\text{s.e.}}$
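The slope test can be traced end to end on simulated data. In the sketch below the data-generating values (intercept, slope, noise) are invented and only the sample size of 1,861 echoes the slides. The exact OLS standard error divides s.e.r. by √(n−1)·s_x when s_x is the sample standard deviation; the √n version is a large-sample shortcut, and the code prints both.

```python
import math
import random
import statistics

def slope_test(xs, ys):
    """Bivariate OLS slope with exact and large-sample standard errors."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    ser = math.sqrt(rss / (n - 2))                  # standard error of the regression
    s_x = statistics.stdev(xs)
    se_exact = ser / (math.sqrt(n - 1) * s_x)       # equals ser / sqrt(sxx)
    se_approx = ser / math.sqrt(n) / s_x            # large-sample shortcut
    return b, se_exact, se_approx, b / se_exact

rng = random.Random(96)
n = 1861
xs = [rng.gauss(10, 1.5) for _ in range(n)]                 # hypothetical log population
ys = [-3.0 - 0.05 * x + rng.gauss(0, 0.6) for x in xs]      # hypothetical log residual rate

b, se_exact, se_approx, t = slope_test(xs, ys)
print(f"slope = {b:.4f}  s.e. (exact) = {se_exact:.5f}  "
      f"s.e. (sqrt(n) form) = {se_approx:.5f}  t = {t:.2f}")
```

At this sample size the two standard errors agree to several decimal places, so the choice between n and n−1 does not change the conclusion of the test.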

45 The STATA output

    . reg lblank96 lvap96

      Source |   SS   df   MS          Number of obs =
    ---------+----------------         F(  1,  1859) =
       Model |                         Prob > F      =
    Residual |                         R-squared     =
    ---------+----------------         Adj R-squared =
       Total |                         Root MSE      =

    lblank96 |  Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
      lvap96 |
       _cons |


More information

From this it is not clear what sort of variable that insure is so list the first 10 observations.

From this it is not clear what sort of variable that insure is so list the first 10 observations. MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is

More information

Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED

Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 2. Introduction to SAS PROC MIXED The MIXED procedure provides you with flexibility

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Philip J. Ramsey, Ph.D., Mia L. Stephens, MS, Marie Gaudard, Ph.D. North Haven Group, http://www.northhavengroup.com/

More information

NCSS Statistical Software. One-Sample T-Test

NCSS Statistical Software. One-Sample T-Test Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

More information

New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Chapter 2 Probability Topics SPSS T tests

Chapter 2 Probability Topics SPSS T tests Chapter 2 Probability Topics SPSS T tests Data file used: gss.sav In the lecture about chapter 2, only the One-Sample T test has been explained. In this handout, we also give the SPSS methods to perform

More information

The leverage statistic, h, also called the hat-value, is available to identify cases which influence the regression model more than others.

The leverage statistic, h, also called the hat-value, is available to identify cases which influence the regression model more than others. Outliers Outliers are data points which lie outside the general linear pattern of which the midline is the regression line. A rule of thumb is that outliers are points whose standardized residual is greater

More information

Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

MEAN SEPARATION TESTS (LSD AND Tukey s Procedure) is rejected, we need a method to determine which means are significantly different from the others.

MEAN SEPARATION TESTS (LSD AND Tukey s Procedure) is rejected, we need a method to determine which means are significantly different from the others. MEAN SEPARATION TESTS (LSD AND Tukey s Procedure) If Ho 1 2... n is rejected, we need a method to determine which means are significantly different from the others. We ll look at three separation tests

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population. SAMPLING & INFERENTIAL STATISTICS Sampling is necessary to make inferences about a population. SAMPLING The group that you observe or collect data from is the sample. The group that you make generalizations

More information

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information