Statistics E100 Fall 2013 Practice Midterm I - A Solutions




1. (16 points total) Below is the histogram for the number of medals won for the n = 203 countries that participated in the 2008 Summer Olympics in Beijing, along with the detailed summary statistics for this variable:

a. (5 points) Is this distribution symmetric, left-skewed, or right-skewed? How do you know?
The distribution is right-skewed (definitely not symmetric), which can be seen from the long right tail.

b. (5 points) The mean for these data is 4.70. Give a reasonable guess of the median for these data.
0, 1, 2, or 3. It should be less than or equal to 3, which is the third quartile, and greater than or equal to 0, the first quartile. (It should also be a whole number.)

c. (6 points) Based on the rule we used in class and in your text, are there any potential low or high outliers in the dataset? Show your work.
IQR = Q3 - Q1 = 3 - 0 = 3
Upper limit = Q3 + 1.5*IQR = 3 + 4.5 = 7.5
Lower limit = Q1 - 1.5*IQR = 0 - 4.5 = -4.5
There are upper outliers (the max is 110 medals, well above the upper limit) but no lower ones.
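The 1.5*IQR rule used in part 1c can be sketched in a few lines of Python. This is only an illustrative check: the quartiles and the maximum are the values quoted in the solution, and the function name `iqr_fences` is a hypothetical helper, not something from the course materials.

```python
# Values quoted in the solution to part 1c; the summary table itself
# is not available here, so these are taken as given.
q1, q3, max_medals = 0, 3, 110

def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier limits for the k*IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(q1, q3)
print(lower, upper)          # -4.5 7.5
print(max_medals > upper)    # True: the maximum is a high outlier
```

Because a medal count cannot be negative, no observation can fall below the lower limit of -4.5, which matches the conclusion that there are no low outliers.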

2. (16 points total) The following questions are multiple choice and DO NOT require any explanation or for you to show your work. Note: they are unrelated to each other.

a. (4 points) If the coefficient of determination (R^2) is 0.975 in a simple regression, then which of the following is true regarding the slope of the regression line?
a) All we can tell is that it must be positive.
b) It must be 0.975
c) It must be 0.987
d) Cannot tell the sign or the value.

b. (4 points) Heights of college women have a distribution that can be approximated by a normal curve with a mean of 65 inches and a standard deviation equal to 3 inches. About what proportion of college women are between 65 and 67 inches tall?
a) 0.75
b) 0.50
c) 0.25
d) 0.17

c. (4 points) Consider the annual salaries of mutual fund managers in the Boston area. The mean salary is $450,000 and the median salary is $380,000. The probability that the salary of a randomly selected mutual fund manager from the Boston area is larger than the mean of $450,000 is (circle the appropriate answer):
a) 0.5
b) = 0.5
c) 0.5
d) Cannot be determined
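For part 2b, the proportion between 65 and 67 inches can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `proportion_between` is hypothetical and only for illustration.

```python
from statistics import NormalDist

# Heights of college women as described in part 2b: mean 65 in, SD 3 in.
heights = NormalDist(mu=65, sigma=3)

def proportion_between(dist, low, high):
    """Area under the normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p = proportion_between(heights, 65, 67)
print(round(p, 2))   # 0.25
```

Since 67 inches is two thirds of a standard deviation above the mean, the area is roughly a quarter of the distribution, which is closest to option c) among the choices listed.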

3. (20 points total) An observational study collected the monthly unemployment rate in the entire US (unemployment: in percentage points, ranging from 4.4% to 10%) along with the monthly inflation rate in the entire US (inflation: in percentage points change per month, ranging from -1.92% to 1.22%). These data were taken from January 2003 until May 2012 (n = 113). The result of the regression is shown below:

a. (4 points) What is the correlation between inflation and unemployment?
r = -sqrt(R^2) = -sqrt(0.010) = -0.10 (must be negative since the slope is negative)

b. (4 points) What is the formula for the regression line to predict inflation from unemployment?
Inflation = 0.363 - 0.022 * unemployment

c. (4 points) June had an unemployment rate of 8.2%. What is the predicted inflation rate for June using this model?
0.363 - 0.022*8.2 = 0.183

d. (4 points) June had an inflation rate of 0.31%. What is June's residual value?
Y - Yhat = 0.31 - 0.183 = 0.127

e. (4 points) A governmental official sees the results of this regression and states that a good way to lower the inflation rate is to increase the unemployment rate. In one or two sentences, please comment on this official's statement.
Correlation is not the same as causation. The regression result only shows a linear association between the inflation rate and the unemployment rate. Because this is an observational study, we cannot draw a causal conclusion from it.
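The calculations in parts 3a through 3d can be reproduced with a short Python sketch. The intercept, slope, and R^2 are the values quoted in the solution (the regression output itself is not available here), and `predict_inflation` is a hypothetical helper name.

```python
import math

# Values quoted in the solution to Question 3.
intercept, slope = 0.363, -0.022
r_squared = 0.010

def predict_inflation(unemployment):
    """Predicted monthly inflation (in %) from the fitted line."""
    return intercept + slope * unemployment

# The correlation has the magnitude sqrt(R^2) and the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

y_hat = predict_inflation(8.2)   # June's predicted inflation
residual = 0.31 - y_hat          # observed minus predicted

print(round(r, 2), round(y_hat, 3), round(residual, 3))
```

Using `math.copysign` makes the sign convention explicit: taking the square root alone would give a positive number, so the sign has to be carried over from the slope.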

4. (12 points total) The mean length of stay in a hospital is useful for planning purposes. Suppose that the following is the distribution of the length of stay in a hospital after a minor operation.

Number of Days   1     2     3     4
Probability      0.4   0.3   0.2   0.1

a. (4 points) Calculate the mean (aka, expected value) for the length of stay.
Mean is E(X) = 0.4*1 + 0.3*2 + 0.2*3 + 0.1*4 = 2

b. (4 points) Calculate the standard deviation for the length of stay.
Variance = 0.4*(1-2)^2 + 0.3*(2-2)^2 + 0.2*(3-2)^2 + 0.1*(4-2)^2 = 1
Standard deviation = sqrt(variance) = 1

c. (4 points) A new policy in the hospital will add exactly one day to the length of stay for this operation for every patient. What will be the new mean and new standard deviation after this new policy is put in place?
Expected value is E(X) + 1 = 3. The standard deviation does not change, so it is still 1.

5. (21 points total) Michael Phelps and Ryan Lochte are two of the US's top swimmers, and they both will be swimming the 400 IM in the Olympics. Overall, Michael Phelps is known to have a 60% chance of winning the gold medal in the 400 IM. If Michael Phelps does not win the gold medal, Ryan Lochte has a 75% chance of winning the gold medal in the 400 IM. Overall, Ryan Lochte has a 30% chance of winning the gold medal in the 400 IM. Define:
MP: the event Michael Phelps wins the gold medal in the 400 IM
RL: the event Ryan Lochte wins the gold medal in the 400 IM

a. (3 points) Express the event "Michael Phelps wins the gold medal and Ryan Lochte does not win the gold medal" in terms of the events defined above.
MP and RL^c, that is, MP ∩ RL^c

b. (5 points) What is the overall probability that neither Michael Phelps nor Ryan Lochte wins the gold medal (someone else wins it)?
P(MP^c and RL^c) = 1 - P(MP) - P(RL) + P(MP and RL) = 1 - 0.6 - 0.3 + 0 = 0.1
Similarly, P(MP^c and RL^c) = P(RL^c | MP^c)*P(MP^c) = 0.25*0.4 = 0.1

c. (5 points) Given Ryan Lochte does not win the gold medal, what is the probability that Michael Phelps does win it?
P(MP | RL^c) = P(MP and RL^c) / P(RL^c) = 0.6/0.7 = 0.857

d. (4 points) Are events MP and RL independent? How do you know?
P(MP and RL) = 0 and P(MP)*P(RL) = 0.6*0.3 = 0.18. Since P(MP and RL) ≠ P(MP)*P(RL), we can say MP and RL are dependent (aka, not independent).
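The arithmetic in Question 4 and in parts 5b through 5d can be double-checked with a small Python sketch. All inputs are the numbers given in the problem statements; the helper `mean_sd` and the variable names are hypothetical and chosen only for this illustration.

```python
import math

# Question 4: length-of-stay distribution from the table.
stay = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}

def mean_sd(pmf):
    """Mean and standard deviation of a discrete distribution {value: prob}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    return mu, math.sqrt(var)

mu, sd = mean_sd(stay)
# Adding one day to every stay shifts the values but not the spread.
mu_new, sd_new = mean_sd({x + 1: p for x, p in stay.items()})

# Question 5: probabilities given in the problem statement.
p_mp, p_rl, p_rl_given_not_mp = 0.6, 0.3, 0.75
p_rl_and_not_mp = p_rl_given_not_mp * (1 - p_mp)        # multiplication rule
p_mp_and_rl = p_rl - p_rl_and_not_mp                    # about 0
p_neither = 1 - p_mp - p_rl + p_mp_and_rl               # part b
p_mp_given_not_rl = (p_mp - p_mp_and_rl) / (1 - p_rl)   # part c
# Part d: independence would require P(MP and RL) = P(MP) * P(RL).
independent = math.isclose(p_mp_and_rl, p_mp * p_rl, abs_tol=1e-9)

print(round(mu, 3), round(sd, 3), round(mu_new, 3), round(sd_new, 3))
print(round(p_neither, 3), round(p_mp_given_not_rl, 3), independent)
```

The comparisons use rounding and `math.isclose` because decimal probabilities such as 0.75 * 0.4 are not represented exactly in floating point.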

Similarly: P(RL | MP^c) = 0.75 is not equal to P(RL) = 0.3.

e. (4 points) Are events MP and RL disjoint? How do you know?
Since P(RL and MP) = 0, this implies that they are disjoint.
P(RL and MP) = P(RL) - P(RL and MP^c) = P(RL) - P(RL | MP^c)*P(MP^c) = 0.3 - (0.75)*(0.4) = 0

6. (12 points) At the 2008 Summer Olympics many of the top swimmers wore Speedo's LZR Racer swim suit, believing it to help them reduce their race times. You have been asked to design a high quality study to determine if the LZR Racer suit actually reduces race times relative to the classic swim suit used by swimmers. Thirty world class swimmers have agreed to participate in your study, 18 men and 12 women. Describe the important elements of your study design in outline or bulleted list format (you may include a figure/schema giving the outline of your study design if that helps explain your approach). Note: you will receive a higher grade on this question if you design a higher quality study.

The best design for this study would be a matched pairs study (using each swimmer as their own control), stratified by gender (since the classic suits for women are quite different from the classic suits for men). Each subject will swim once with the LZR Racer swim suit and then swim the same distance the next day using the classic swim suit, or in the reverse order, with the order randomly assigned. Their racing time using the LZR Racer swim suit will be subtracted from their time using the classic swim suit and compared.

- The subjects will be swimmers (individuals).
- A matched pairs study design (with each swimmer wearing both the LZR Racer swim suit and the classic swim suit), stratified by gender, will be used. The swimmers will swim a fixed course one day using one suit and then the same course the next day using the other suit, in a random order (study design).
- The subjects will be world class swimmer volunteers (individuals selected).
- All 30 swimmers (18 men and 12 women) will be entered into the study (sample size).
- The racing time using the LZR Racer swim suit will be subtracted from the time using the classic swim suit. The average improvement in time will be compared (response variable).
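The random assignment of suit order within each gender stratum could be carried out along the following lines. This is a minimal sketch under stated assumptions: the swimmer IDs are placeholders, the function `assign_orders` is hypothetical, and splitting each stratum into equal halves is one reasonable way to balance the two orders rather than something the solution specifies.

```python
import random

def assign_orders(swimmers, rng):
    """Randomly split one stratum into LZR-first and classic-first halves."""
    shuffled = list(swimmers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        s: ("LZR first" if i < half else "classic first")
        for i, s in enumerate(shuffled)
    }

# Fixed seed only so that the sketch is reproducible.
rng = random.Random(2008)

# Placeholder IDs for the 18 men and 12 women in the study.
men = [f"M{i:02d}" for i in range(1, 19)]
women = [f"W{i:02d}" for i in range(1, 13)]

# Randomize separately within each gender stratum.
schedule = {**assign_orders(men, rng), **assign_orders(women, rng)}

for swimmer, order in sorted(schedule.items()):
    print(swimmer, order)
```

Randomizing within each stratum keeps the two suit orders balanced among men and among women separately, so that any day-to-day effect (fatigue, practice) is not confounded with the suit or with gender.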