Applied Data Analysis. Fall 2015




Course information: Labs
Anna Walsdorff, anna.walsdorff@rochester.edu, Tues. 9-11 AM
Mary Clare Roche, maryclare.roche@rochester.edu, Mon. 2-4 PM

Lecture outline
1. Practice questions
2. Inference and regression

Question 1
For women age 25-45 in the U.S. in 2005, with full-time jobs, the relationship between education (years of schooling completed) and personal income (dollars) can be summarized as follows:

            Education   Income
Mean        14.0        32,000
St. Dev.    2.4         26,000

The correlation is 0.34. Estimate the average income of those women who have finished high school, but have not gone on to college (12 years of education).

z-table

Question 1 answer
Convert 12 years of education to standard units: \( (12 - 14) / 2.4 = -0.8333 \)
Multiply by the correlation: \( 0.34 \times (-0.833) = -0.28 \)
Convert back to dollars: \( -0.28 \times \$26{,}000 + \$32{,}000 = \$24{,}720 \)
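The three steps in this answer (standardize, multiply by the correlation, convert back) can be sketched in Python. The function name `regression_estimate` and its parameter names are illustrative choices, not something from the course.

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression-method estimate of the average y among cases with a given x."""
    z_x = (x - mean_x) / sd_x       # x in standard units
    z_y = r * z_x                   # predicted y in standard units
    return mean_y + z_y * sd_y      # back to the original units of y


# Question 1: education (x) and income (y), correlation 0.34
income = regression_estimate(12, 14.0, 2.4, 32_000, 26_000, 0.34)
print(round(income))
```

Without intermediate rounding the estimate is about $24,633; the hand calculation reaches $24,720 because it rounds the standard-unit value to -0.28 first.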

Question 2
For the first-year students at a certain university, the correlation between SAT scores and first-year GPA was 0.60. The scatter diagram is football-shaped. Predict the percentile rank for the first-year GPA for a student whose percentile rank on the SAT was
1. 90%
2. 30%
3. 50%
4. unknown

z-table

Question 2 answer
1. On the z-table, 90% translates to 1.28. \( 1.28 \times 0.6 = 0.768 \). Going back to the z-table, 0.768 gives us about 78%.
2. On the z-table, 30% translates to -0.52. \( -0.52 \times 0.6 = -0.312 \). Going back to the z-table, -0.312 gives us about 38%.
3. 50%. A student at the average on the SAT is predicted to be at the average on GPA.
4. 50%. With no information about the SAT score, the best prediction is the average.
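The z-table lookups in this answer can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch that assumes, as the question does, a football-shaped (roughly bivariate normal) scatter; `predicted_percentile` is a made-up helper name.

```python
from statistics import NormalDist


def predicted_percentile(percentile_x, r):
    """Predicted percentile rank on y, given a percentile rank on x and correlation r."""
    std_normal = NormalDist()                        # mean 0, SD 1
    z_x = std_normal.inv_cdf(percentile_x / 100)     # percentile -> standard units
    z_y = r * z_x                                    # regression prediction
    return 100 * std_normal.cdf(z_y)                 # standard units -> percentile


for sat_percentile in (90, 30, 50):
    print(sat_percentile, round(predicted_percentile(sat_percentile, 0.60)))
```

The predictions are pulled toward 50% on both sides, which is the regression effect expressed in percentile terms.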

Question 3 As part of their training, air force pilots make two practice landings with instructors and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time. The conclusion: criticism helps the pilots while praise makes them perform worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this policy warranted by the facts?

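Question 3 answer
No, the air force is committing the regression fallacy. The results are probably due to the regression effect: pilots whose first landing was unusually poor or unusually good tend to land closer to average the second time, whatever the instructor says.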
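A small simulation can illustrate the regression effect in this setting. The model below is an assumption made purely for illustration (each landing score is a stable skill plus independent luck, with no feedback effect at all); none of the numbers come from the air force data.

```python
import random

random.seed(0)

# Hypothetical model: score = stable skill + independent luck on each landing.
# Praise and criticism play no role anywhere in this simulation.
pilots = []
for _ in range(10_000):
    skill = random.gauss(0, 1)
    first = skill + random.gauss(0, 1)
    second = skill + random.gauss(0, 1)
    pilots.append((first, second))

pilots.sort(key=lambda pair: pair[0])   # order by first-landing score
worst = pilots[:1000]                   # poorest 10% on the first landing
best = pilots[-1000:]                   # top 10% on the first landing


def mean_change(group):
    """Average change from the first landing to the second."""
    return sum(second - first for first, second in group) / len(group)


print("worst first landings, average change:", mean_change(worst))
print("best first landings, average change:", mean_change(best))
```

Under these assumptions the poorest group improves on average and the top group declines, even though no one was criticized or praised.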

Question 4 An admissions officer is trying to choose between two methods of predicting first-year scores. One method has an r.m.s. error of 12. The other has an r.m.s. error of 7. Other things being equal, which should she choose? Why?

Question 4 answer
The one with the smaller r.m.s. error (7), because its predictions will typically be closer to the actual scores.

Question 5 At a certain college, the first-year GPAs average about 3.0, with a SD of about 0.5; they are correlated at about 0.6 with high-school GPA. Person A predicts first-year GPAs just using the average. Person B predicts first-year GPAs by regression, using the high-school GPAs. Which person makes the smaller r.m.s. error? Smaller by what factor?

Question 5 answer
Person B, who uses more information. The r.m.s. error will be smaller by a factor of \( \sqrt{1 - r^2} = \sqrt{1 - 0.6^2} = 0.8 \).
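As a quick check of this factor, the sketch below compares the two r.m.s. errors directly. It relies on the standard results that predicting with the average has r.m.s. error equal to the SD, and that regression has r.m.s. error sqrt(1 - r^2) times the SD; the function name is illustrative.

```python
import math


def rms_error_of_regression(sd_y, r):
    """r.m.s. error of the regression line for predicting y from x."""
    return math.sqrt(1 - r ** 2) * sd_y


sd_gpa, r = 0.5, 0.6
error_a = sd_gpa                                 # Person A: predict the average
error_b = rms_error_of_regression(sd_gpa, r)     # Person B: predict by regression
print(error_b, error_b / error_a)
```

Person B's r.m.s. error comes to about 0.4 GPA points, against 0.5 for Person A.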

Question 6
Pearson and Lee obtained the following results for about 1,000 families (heights in inches):

            Husband height   Wife height
Mean        68.0             63.0
St. Dev.    2.7              2.5

r = 0.25

1. What percentage of the women were over 5'8"?
2. Of the women who were married to men of height 6 feet, what percentage were over 5'8"?

z-table

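Question 6 answer
1. \( (68 - 63) / 2.5 = 2 \), and the area above 2 on the z-table is 2.28%.
2. Husband's height in standard units: \( (72 - 68) / 2.7 = 1.48 \). Predicted wife's height in standard units: \( 1.48 \times 0.25 = 0.37 \), which is \( 0.37 \times 2.5 + 63 = 63.9 \) inches. Then \( (68 - 63.9) / 2.5 = 1.64 \), which leaves about 5% above. Strictly, the spread within this group is the r.m.s. error, \( \sqrt{1 - 0.25^2} \times 2.5 \approx 2.42 \), which gives \( z \approx 1.69 \) and about 4.5%.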
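Both parts of this answer can be checked with `statistics.NormalDist`. Note one deliberate difference from the hand calculation: for part 2 this sketch uses the r.m.s. error, sqrt(1 - r^2) times the SD, as the spread of wives' heights among husbands of a given height, rather than the full SD of 2.5, so the result is slightly smaller than 5%. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_h, sd_h = 68.0, 2.7    # husbands (inches)
mean_w, sd_w = 63.0, 2.5    # wives (inches)
r = 0.25

# Part 1: share of all wives taller than 5'8" (68 inches)
share_all = 1 - NormalDist(mean_w, sd_w).cdf(68)

# Part 2: wives of 6-foot (72 inch) husbands
z_h = (72 - mean_h) / sd_h                    # husband in standard units
predicted_w = mean_w + r * z_h * sd_w         # regression estimate of wife's height
spread_w = math.sqrt(1 - r ** 2) * sd_w       # r.m.s. error around that estimate
share_cond = 1 - NormalDist(predicted_w, spread_w).cdf(68)

print(round(100 * share_all, 2), round(100 * share_cond, 1))
```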

Inference
Up to this point, I have not been clear about the difference between the true regression line and the estimated regression line. This is just like the difference between \( \mu_x \) and \( \bar{x} \).
The true regression line is
\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]
The estimated regression line is
\[ \hat{y}_i = a + b x_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
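The distinction between the true and the estimated line can be made concrete by simulating data from a known line and then estimating it by least squares. Everything numeric here (the true intercept 2.0, true slope 0.5, noise SD 1, sample size 200) is a made-up assumption for illustration.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical "true" regression line, normally unknown to the analyst
beta0, beta1 = 2.0, 0.5

xs = [random.uniform(0, 10) for _ in range(200)]
ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]   # y = b0 + b1*x + error

# Least-squares estimates a and b computed from the sample
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

print("estimated intercept a:", a)
print("estimated slope b:", b)
```

The estimates land near 2.0 and 0.5 but not exactly on them; a different seed gives a different sample and a different estimated line, while the true line stays fixed.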

Return of the hypothesis test
b is an estimator, so it must have a sampling distribution and a standard error. If it has those, we can perform hypothesis tests.
\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0 \]
Now we just need to calculate a standard error.

Have to take my word for it
The standard error for the regression coefficient is
\[ s_b = \frac{b}{r} \sqrt{\frac{1 - r^2}{n - 2}} = \frac{s_y}{s_x} \sqrt{\frac{1 - r^2}{n - 2}} \]
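The two forms of the standard error agree because the slope is the correlation times the ratio of the SDs. A short check, assuming \( r \neq 0 \) so that dividing by \( r \) is allowed:

```latex
\[
b = r\,\frac{s_y}{s_x}
\;\Longrightarrow\;
\frac{b}{r} = \frac{s_y}{s_x}
\;\Longrightarrow\;
s_b = \frac{b}{r}\sqrt{\frac{1 - r^2}{n - 2}}
    = \frac{s_y}{s_x}\sqrt{\frac{1 - r^2}{n - 2}}
\]
```

The second form is the safer one to compute with, since it does not break down when \( r \) is close to zero.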

Under the null hypothesis, the test statistic is distributed as a t with n - 2 degrees of freedom.
\[ \frac{b}{s_b} \sim t_{n-2} \]

Is the effect of income on contacts real?

            Contacts   Income
Mean        3.60       4.230
St. Dev.    2.27       3.328

\[ b = 0.78 \times \frac{2.27}{3.328} = 0.532 \]
\[ a = 3.60 - 0.532(4.32) = 1.3 \]

The test
\[ s_b = \frac{b}{r} \sqrt{\frac{1 - r^2}{n - 2}} = \frac{0.532}{0.78} \sqrt{\frac{1 - 0.78^2}{10 - 2}} = 0.151 \]
\[ t = \frac{b}{s_b} = \frac{0.532}{0.151} \approx 3.5 \]
With 8 degrees of freedom, the two-sided p-value is below 0.01, so we reject the null hypothesis.
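The whole test can be reproduced from the summary statistics using only the standard library. This is a sketch, not the course's own procedure: the p-value is obtained by numerically integrating the t density with Simpson's rule (an implementation choice made here to avoid extra packages), and the helper names are invented. The inputs r = 0.78, n = 10 and the two SDs are the ones used in the calculation.

```python
import math


def slope_t_test(r, sd_x, sd_y, n):
    """Slope, its standard error, and the t statistic from summary statistics."""
    b = r * sd_y / sd_x
    s_b = (sd_y / sd_x) * math.sqrt((1 - r ** 2) / (n - 2))
    return b, s_b, b / s_b


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=10_000):
    """Two-sided p-value: Simpson's rule on the t density from 0 to |t| (steps even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # P(0 < T < |t|)
    return 1 - 2 * area


b, s_b, t_stat = slope_t_test(r=0.78, sd_x=3.328, sd_y=2.27, n=10)
p_value = two_sided_p(t_stat, df=10 - 2)
print(round(b, 3), round(s_b, 3), round(t_stat, 2), p_value)
```

Computed from these rounded inputs, the t statistic comes out near 3.5 and the two-sided p-value falls between 0.005 and 0.01, which supports rejecting the null hypothesis at the usual levels.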

What did we learn? If you had majored in international relations, you would not need this class.