Statistics E100 Fall 2013 Midterm I Exam Solutions


1. (9 points) Forbes magazine published data on the annual salary of the chief executive officer for 50 of the best small firms in 2012. The annual salary was reported in $1,000s. The salaries ranged from $44,000 to $2,105,000 and are depicted in the two plots below.

a) (3 points) The mean of these salaries is $730,000. What is your guess of the median: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the line in the middle of the box in the boxplot).

b) (3 points) What is your guess of the standard deviation for this data: $200,000, $400,000, $600,000, or $1,000,000?
$400,000 (the whole range is about $2 million; divide that by 5 to get a guess of the SD).

c) (3 points) What is your guess of the interquartile range for this data: $200,000, $400,000, $600,000, or $1,000,000?
$600,000 (it's the width of the box in the boxplot).

2. (26 points) The General Social Survey measured the annual income (income: measured in dollars) and education (educ: measured in number of years) for n = 1758 adults in the US. The results of a regression for this data are shown below:

a) (4 points) What is the correlation between income and educ?
r = +√(R²) = √0.154 ≈ 0.39. We know it must be positive since the slope is positive.

b) (5 points) What is the value for the slope in this model? What is its interpretation?
The slope in this model is $5168.32. This means that for every extra year of school completed, a person's family income is predicted to increase by $5168.32 on average.

c) (4 points) Interpret the value of R² for this regression.
R² for this model is reported to be 0.154. This means that 15.4% of the variability in income can be predicted by the number of years of school completed.

d) (4 points) College graduates will typically have 16 years of education after receiving their Bachelor's degree. Based on this model, what is the predicted annual income for a college graduate?
ŷ = a + b(x) = -36133 + 5168(x) = -36133 + 5168(16) = $46,555

e) (4 points) A recently graduated friend of yours (with 16 years of education) gets a job and doesn't want to tell you her income, but you know her residual is $25,000. What is your friend's actual income?
e = y - ŷ. Solving for y we get: y = ŷ + e = 46555 + 25000 = $71,555

f) (5 points) This friend of yours interprets this model to mean that more education causes increased income later in life. Briefly comment on your friend's interpretation.
This may not be a correct conclusion; this relationship may not be causal. Since this is based on a survey (an observational study), there may be confounding factors in this association. For example, people who are hard-working or more driven may end up getting more education and have more income. If you took away this extra education, they may make the same amount of income anyway.
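The arithmetic in question 2 can be checked with a short script. This is only a sketch: the intercept, the rounded slope, R², and the residual are the values quoted in the solutions, while the variable and function names are made up for illustration.

```python
import math

# Values quoted in the solutions; names are illustrative assumptions.
intercept = -36133.0   # a, in dollars
slope = 5168.0         # b, dollars per year of education (rounded from 5168.32)
r_squared = 0.154      # reported R^2


def predict_income(years_of_education: float) -> float:
    """Predicted income from the fitted line y-hat = a + b*x."""
    return intercept + slope * years_of_education


# The correlation is the square root of R^2, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

predicted = predict_income(16)   # part (d): 16 years of education
residual = 25000.0               # part (e): e = y - y_hat
actual = predicted + residual    # rearranged: y = y_hat + e

print(f"r         = {r:.3f}")
print(f"predicted = ${predicted:,.0f}")
print(f"actual    = ${actual:,.0f}")
```

Using the unrounded slope of 5168.32 would shift the prediction by a few dollars; the script follows the rounding used in the written solution.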

3. (15 points total, 3 points each) Multiple Choice

a) A survey of 122 families with epileptic children explored the behavior of the family dog in connection with epileptic seizures. Many families claimed that their dog was able to anticipate an upcoming seizure, and demonstrated its concern in a variety of ways. It was reported that anticipation time ranged from 10 seconds to 5 hours, with an average of 2.5 minutes. The shape of the distribution of anticipation times is likely
i) skewed right
ii) skewed left
iii) symmetric
iv) categorical

b) Which of the following is not a property of r, the correlation coefficient?
i) r is always between 0 and 1.
ii) r does not depend on the units of y or x.
iii) r measures the strength of the linear relationship between x and y.
iv) r does not depend on which of the two variables is labeled as x.

c) Mankiw was concerned that the highest score on the first Economics exam was only 99 (instead of 100). He decided to add one point to everyone's score. The effect of this would be:
i) The standard deviation would increase by 1.
ii) The median would change but the mean would not.
iii) The standard deviation would not change but the mean and median would increase.
iv) none of these

d) On a statistics exam with a mean of 76 and SD of 12, Tom scored one standard deviation above the mean, Mary had a score of x = 79, and Bill had a z-score of z = -0.5. Place these three students in order from lowest to the highest score.
i) Bill, Mary, Tom
ii) Mary, Tom, Bill
iii) Tom, Bill, Mary
iv) Tom, Mary, Bill

e) A simple random sample of 1200 adult Americans was selected and each person was asked the following question: "In light of the huge national deficit, should the government at this time spend additional money to establish a national system of health insurance?" 39% of those responding answered yes. Which of the following results is most likely?
i) Accurate and unbiased
ii) An understatement of the true percentage
iii) An overstatement of the true percentage
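For part (d), converting each z-score back to a raw score with x = mean + z × SD makes the ordering easy to read off. The sketch below uses the mean and SD stated in the question; the helper name `score_from_z` is a hypothetical one chosen for illustration.

```python
# Exam summary values stated in the question.
mean, sd = 76.0, 12.0


def score_from_z(z: float) -> float:
    """Convert a z-score to a raw exam score: x = mean + z * sd."""
    return mean + z * sd


scores = {
    "Tom": score_from_z(1.0),    # one standard deviation above the mean
    "Mary": 79.0,                # raw score given directly
    "Bill": score_from_z(-0.5),  # half a standard deviation below the mean
}

# Sort the names from lowest to highest raw score.
ordered = sorted(scores, key=scores.get)
print(scores)
print(ordered)
```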

4. (18 total points) A recent study examined the effectiveness of bicycle safety helmets in reducing head injuries. The data consist of a random sample of 837 cyclists who were involved in bicycle accidents in a one-year period; the data are summarized in the following two-way table.

                     Wearing Helmet (H)
Head Injury (I)      Yes      No      Total
Yes                  101      90      191
No                   418      228     646
Total                519      318     837

a) (4 points) What proportion of all cyclists in accidents wore a helmet AND suffered a head injury?
Let H = helmet and I = injury. Then: P(H and I) = 101/837 ≈ 0.121

b) (4 points) Among those cyclists wearing a helmet, what is the proportion of cyclists who suffered a head injury?
P(I | H) = 101/519 ≈ 0.195

c) (5 points) Ignoring any issues with study design, does this table suggest that helmets might be effective at decreasing head injuries? Justify your answer in 1-2 sentences.
We should compare P(I | H) = 101/519 ≈ 0.195 with P(I | H^C) = 90/318 ≈ 0.283. Since these proportions are so different, it looks like wearing a helmet may in fact decrease the chance of a head injury. The distribution of head injuries in these two groups is quite different (the rate of head injury is lower for those wearing a helmet).

d) (5 points) The cyclists were randomly selected from accident victims, but were not randomized to helmet use vs. no helmet use. Give one possible confounding variable here, and explain briefly why it may be a confounder (in 1-2 sentences).
Any variable that is related both to whether or not someone wears a helmet and to the chance of getting a head injury could be considered a possible confounder. For example: riding speed of the cyclist at the time of the accident. If bikers who choose not to wear helmets tend to ride faster, it may be the speed of travel at which they have their accident that causes the head injury and not the fact that they are not wearing a helmet. Another possibility: the general carelessness of the way the biker rides.
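The joint and conditional proportions in question 4 come straight from the cell counts of the two-way table. As a rough sketch, with dictionary keys and variable names invented for illustration, they could be computed like this:

```python
# Cell counts from the two-way table: (helmet status, injury status) -> count.
counts = {
    ("helmet", "injury"): 101,
    ("no_helmet", "injury"): 90,
    ("helmet", "no_injury"): 418,
    ("no_helmet", "no_injury"): 228,
}

total = sum(counts.values())
helmet_total = counts[("helmet", "injury")] + counts[("helmet", "no_injury")]
no_helmet_total = counts[("no_helmet", "injury")] + counts[("no_helmet", "no_injury")]

# Part (a): joint proportion, divided by the grand total.
p_helmet_and_injury = counts[("helmet", "injury")] / total

# Parts (b) and (c): conditional proportions, divided by the column totals.
p_injury_given_helmet = counts[("helmet", "injury")] / helmet_total
p_injury_given_no_helmet = counts[("no_helmet", "injury")] / no_helmet_total

print(f"P(H and I)      = {p_helmet_and_injury:.3f}")
print(f"P(I | helmet)   = {p_injury_given_helmet:.3f}")
print(f"P(I | no helmet) = {p_injury_given_no_helmet:.3f}")
```

The only difference between the joint and the conditional proportions is the denominator: the grand total for the joint proportion, the relevant column total for the conditional ones.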

5. (22 points) For men, binge drinking is defined as having five or more drinks in a row, and for women as having four or more drinks in a row. According to a study by the Harvard School of Public Health, 44% of college students engage in binge drinking and 56% do not binge drink (either drink moderately or abstain entirely). Another study has found that among young adult binge drinkers, 17% have been involved in an alcohol-related automobile accident. Among adults of the same age who are not binge drinkers, 9% have been involved in such accidents.

a) (5 points) Are the events binge drinking and being involved in an alcohol-related automobile accident independent? Support your statement numerically.
Let's define the events:
A = auto accident
B = binge drinker
We were given P(B) = 0.44, P(A | B) = 0.17 and P(A | B^C) = 0.09. Since P(A | B) = 0.17 ≠ P(A | B^C) = 0.09, we know that A and B are dependent.

b) (5 points) What is the probability that a randomly selected college student will be both a binge drinker and will have been involved in an alcohol-related automobile accident?
P(A and B) = P(A | B) * P(B) = 0.17(0.44) = 0.0748

c) (5 points) What is the probability that a randomly selected college student will be a binge drinker or will have been involved in an alcohol-related automobile accident?
P(A or B) = P(A) + P(B) - P(A and B) = 0.1252 + 0.44 - 0.0748 = 0.4904
Note: P(A) = P(A and B) + P(A and B^C) = 0.44(0.17) + 0.56(0.09) = 0.0748 + 0.0504 = 0.1252
Or it can be solved by: P(A or B) = P(B) + P(A and B^C) = 0.44 + 0.56(0.09) = 0.4904
One more way to do it: P(A or B) = 1 - P(A^C and B^C) = 1 - 0.56(0.91) = 1 - 0.5096 = 0.4904
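The three routes to P(A or B) can be verified numerically. The following is a minimal sketch using the probabilities given in the problem; the variable names are illustrative only.

```python
# Given quantities: A = auto accident, B = binge drinker.
p_b = 0.44               # P(B)
p_a_given_b = 0.17       # P(A | B)
p_a_given_not_b = 0.09   # P(A | B^C)

# Multiplication rule for the two joint probabilities.
p_a_and_b = p_a_given_b * p_b                   # P(A and B)
p_a_and_not_b = p_a_given_not_b * (1 - p_b)     # P(A and B^C)

# Law of total probability.
p_a = p_a_and_b + p_a_and_not_b                 # P(A)

# Three equivalent ways to get P(A or B).
union_addition_rule = p_a + p_b - p_a_and_b
union_disjoint_pieces = p_b + p_a_and_not_b
union_complement = 1 - (1 - p_b) * (1 - p_a_given_not_b)

print(f"P(A and B) = {p_a_and_b:.4f}")
print(f"P(A)       = {p_a:.4f}")
print(f"P(A or B)  = {union_addition_rule:.4f}, "
      f"{union_disjoint_pieces:.4f}, {union_complement:.4f}")
```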

d) (5 points) Given a college student has been involved in an alcohol-related automobile accident, what is the probability that he or she is a binge drinker?
Based on previous work: P(B | A) = P(A and B) / P(A) = 0.0748 / 0.1252 = 0.597
Note, this can be calculated based on the 2x2 table or directly by Bayes' theorem.

6. (12 points) The National Collegiate Athletic Association (NCAA) requires a Division I athlete (one that has an average high school GPA) to score at least 820 on the combined math and verbal parts of the SAT exam to compete in their first college year. In 2012, the scores of all students nationwide taking the SATs were approximately normally distributed with mean μ = 1012 and standard deviation σ = 219.

a) (6 points) What proportion of all students nationwide had scores less than 820?
P(X < 820) = P(Z < (820 - 1012)/219) = P(Z < -0.88) = 0.1894

b) (6 points) What value did a student have to receive on the SAT in order to be in the top 1% of all students nationwide?
Here, we need to first find the z-value that puts 1% of the distribution to the right of it (or 99% to the left of it, so we know it should be positive). Looking it up on the standard normal z-table, we see that a z-value of z* = 2.33 is the value we want. Thus in terms of SAT scores in the general population: X = μ + z*(σ) = 1012 + 2.33(219) = 1522
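The normal-distribution calculations in question 6 can be cross-checked without a z-table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the table lookup used in the written solution, so the results differ slightly from the table-based answers because z is not rounded to two decimals.

```python
from statistics import NormalDist

# SAT score model stated in the problem: Normal(mu = 1012, sigma = 219).
sat = NormalDist(mu=1012, sigma=219)

# Part (a): proportion of students scoring below 820.
z_820 = (820 - sat.mean) / sat.stdev
p_below_820 = sat.cdf(820)

# Part (b): score with 99% of the distribution to its left (top 1% cutoff).
z_star = NormalDist().inv_cdf(0.99)
top_1_percent_cutoff = sat.inv_cdf(0.99)

print(f"z for 820       = {z_820:.4f}")
print(f"P(X < 820)      = {p_below_820:.4f}")
print(f"z* for top 1%   = {z_star:.4f}")
print(f"top 1% cutoff   = {top_1_percent_cutoff:.1f}")
```

The unrounded values land close to the table-based answers of 0.1894 and 1522; the small gaps come from rounding z to -0.88 and z* to 2.33.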