Stats Review Chapters 3-4

Similar documents
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Exercise 1.12 (Pg )

Exploratory data analysis (Chapter 2) Fall 2011

Name: Date: Use the following to answer questions 2-3:

Stats Review Chapters 9-10

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Variables. Exploratory Data Analysis

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

Final Exam Practice Problem Answers

First Midterm Exam (MATH1070 Spring 2012)

The Dummy s Guide to Data Analysis Using SPSS

Lecture 1: Review and Exploratory Data Analysis (EDA)

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

MTH 140 Statistics Videos

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Descriptive Statistics

Example: Boats and Manatees

AP * Statistics Review. Descriptive Statistics

Descriptive Statistics

MEASURES OF VARIATION

2 Describing, Exploring, and

Mind on Statistics. Chapter 2

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

Statistics 151 Practice Midterm 1 Mike Kowalski

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

Data Analysis Tools. Tools for Summarizing Data

CALCULATIONS & STATISTICS

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

Simple Regression Theory II 2010 Samuel L. Baker

Algebra I Vocabulary Cards

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Example: Find the expected value of the random variable X. X P(X)

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

table to see that the probability is (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: = 1.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 23. Inferences for Regression

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

Mathematics Online Instructional Materials Correlation to the 2009 Algebra I Standards of Learning and Curriculum Framework

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

Week 3&4: Z tables and the Sampling Distribution of X

How Does My TI-84 Do That

II. DISTRIBUTIONS distribution normal distribution. standard scores

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Simple linear regression

Diagrams and Graphs of Statistical Data

Study Guide for the Final Exam

Statistics E100 Fall 2013 Practice Midterm I - A Solutions

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

ELEMENTARY STATISTICS

Using SPSS, Chapter 2: Descriptive Statistics

Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!

3: Summary Statistics

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Module 3: Correlation and Covariance

CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

Univariate Regression

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

COMMON CORE STATE STANDARDS FOR

Curve Fitting in Microsoft Excel By William Lee

How To Write A Data Analysis

How To Understand And Solve A Linear Programming Problem

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

430 Statistics and Financial Mathematics for Business

a. mean b. interquartile range c. range d. median

Implications of Big Data for Statistics Instruction 17 Nov 2013

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Geostatistics Exploratory Analysis

What is the purpose of this document? What is in the document? How do I send Feedback?

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

Using Microsoft Excel to Plot and Analyze Kinetic Data

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

List of Examples. Examples 319

GeoGebra Statistics and Probability

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

The importance of graphing the data: Anscombe s regression examples

Statistics. Measurement. Scales of Measurement 7/18/2012

Chapter 1: Exploring Data

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.


MBA 611 STATISTICS AND QUANTITATIVE METHODS

Scatter Plot, Correlation, and Regression on the TI-83/84

03 The full syllabus. 03 The full syllabus continued. For more information visit PAPER C03 FUNDAMENTALS OF BUSINESS MATHEMATICS

Probability. Distribution. Outline

Introduction to Quantitative Methods

Calculator Notes for the TI-Nspire and TI-Nspire CAS

Transcription:

Stats Review Chapters 3-4 Created by Teri Johnson Math Coordinator, Mary Stangler Center for Academic Success Examples are taken from Statistics 4 E by Michael Sullivan, III And the corresponding Test Generator from Pearson Revised 12/13

Note: This review is composed of questions the textbook and the test generator. This review is meant to highlight basic concepts from the course. It does not cover all concepts presented by your instructor. Refer back to your notes, unit objectives, handouts, etc. to further prepare for your exam. A copy of this review can be found at www.sctcc.edu/cas. The final answers are displayed in red and the chapter/section number is the corner.

Find the Mean, Median, Mode Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69,70 Mean: add up all the numbers and divide by the amount of numbers 71+74+ 67+ 64+72+71+65+66+69+70 = 68.9 10 Median: Arrange numbers from smallest to largest then find the middle number 64, 65,66, 67,69,70,71, 71, 72, 74 The middle number is between 69 and 70. Finding the mean of these two numbers will give us the median which is 69.5 Mode: The most repeated number (can have none, one, or more than one) The mode is 71. 3.1

Find the Range and Sample Standard Deviation Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69,70 Range: The highest number the smallest number 74-64=10 Sample Standard Deviation: Use Excel or Calculator You get 3.28 3.2

Find the Five-Number Summary: Part 1 of 2 Data Set: 64, 65, 66, 67, 69, 70, 71, 71, 72, 74 Min: the smallest number 64 Max: the largest number 74 Median: the middle number 69.5 Q1: Find the median of the numbers from the minimum and the median (do not include the median) These numbers are 64, 65, 66, 67, 69. The median of these numbers are 66. So Q1 is 66. 3.4

Find the Five-Number Summary: Part 2 of 2 Q3: Find the median of the numbers from the median (not included) to the maximum. These numbers are 70, 71, 71, 72, 74. The median of these numbers are 71. So Q1 is 71. The five-number summary is 64 66 69.5 71 74 3.4

Find the Upper and Lower Fences Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69, 70 1) Find the IQR. IQR=Q3-Q1=71-66=5 2) Multiply the IQR by 1.5: 5*1.5=7.5 3) Upper Fence: Add 7.5 to Q3; 7.5+71=78.5 4) Lower Fence: Subtract 7.5 from Q1; 66-7.5=58.5 Any numbers outside this range would be considered outliers. 3.4

Construct a Boxplot Construct a box plot given the five-number summary 64 66 69.5 71 74 64 66 69.5 71 74 3.5

Relationship between Mean, Median and Shape Match the shape to the relationship between mean and median. Mean>Median Mean=Median Mean<Median Skewed Left Mean<Median Symmetric Mean=Median Skewed right Mean>Median If data is skewed, median is more representative of the typical observation. 3.1

Empirical Rule to Find the Probability: Part 1 of 2 At a tennis tournament, a statistician keeps track of every serve. She reported that the mean serve speed of a particular player was 104 mph and the standard deviation of the serve speeds was 8 mph. Assume that the statistician also gave us the information that the distribution of the serve speeds was bell shaped. Using the Empirical Rule, what proportion of the player's serves are expected to be between 112 mph and 120 mph? 3.2

Empirical Rule to Find the Probability: Part 1 of 2 Start by drawing the bell curve Compare this to the empirical rule curve on page 149 or find the percents as described on page 149. We can see that the probability is 13.5% or.135. *You could also find the z-scores, then find the probability. 3.2

Find the Mean: Part 1 of 2 Days Frequency 1-2 2 3-4 21 5-6 20 7-8 10 9-10 30 Step 1: Find the midpoint of each class (days) Days Class Midpoint, x i Frequency, f i 1-2 1 + 2 2 = 1.5 2 3-4 3.5 21 5-6 5.5 20 7-8 7.5 10 9-10 9.5 30 3.3

Find the Mean: Part 2 of 2 Step 2: Multiply the class midpoint by frequency Days Class Midpoint, x i Frequency, f i x i f i 1-2 1 + 2 2 1.5*2=3 = 1.5 2 3-4 3.5 21 73.5 5-6 5.5 20 110 7-8 7.5 10 75 9-10 9.5 30 285 Step 3: Find the sum of the x i f i column: 3+73.3+110+75+285=546.5 Step 4: Find the sum of the frequency column: 83 Step 5: Divide the number from step 3 by the number from step 4: 546.5/83=6.6 The mean is 6.6. 3.3

Find the Sample Standard Deviation: Part 1 of 2 (from previous problem) Step 1: Find the mean (see previous 2 slides). Mean=6.6 Step 2: Complete the following table (the first three columns were done in the previous 2 slides and column 4 is the mean) Days Class Midpoint, x i Frequency, f i x x i -x (x i x) 2 f i 1-2 1 + 2 2 = 1.5 2 6.6 1.5-6.6=-5.1 ( 5.1) 2 2 = 52.02 3-4 3.5 21 6.6-3.1 201.81 5-6 5.5 20 6.6-1.1 24.2 7-8 7.5 10 6.6.9 8.1 9-10 9.5 30 6.6 2.9 252.3 3.3

Find the Sample Standard Deviation: Part 2 of 2 (from previous problem) Days Class Midpoint, x i Frequency, f i x x i -x (x i x) 2 f i 1-2 1 + 2 2 6.6 1.5-6.6=-5.1 ( 5.1) = 1.5 2 2 = 52.02 2 3-4 3.5 21 6.6-3.1 201.81 5-6 5.5 20 6.6-1.1 24.2 7-8 7.5 10 6.6.9 8.1 9-10 9.5 30 6.6 2.9 252.3 Step 3: Find the total of the (x i x ) 2 f i column. Total =538.43 Step 4: Find the total of the frequency column. Total =83. Take this number and subtract 1. 83-1=82 Step 5: Take the number from Step 3 and divide by the number from step 4. 538.43/82=6.56622 Step 6: Take the square root of the number from step 5. 6.56622 = 2.6 3.3

Correlation Coefficient Match the Correlation with the Graph r=-.6 r=0 r=.99 should not use correlation 20 10 r=-.6 15 10 8 6 4 r=0 5 2 0 0 2 4 6 8 10 0 0 2 4 6 8 10 r=.999 12 10 8 6 4 2 0 0 2 4 6 8 10 20 15 10 5 0 0 2 4 6 8 10 Should not use correlation 4.1

Least-Squares Regression Line: Part 1 of 3 The data are the average one-way commute times (in minutes) for selected students and the number of absences for those students during the term. a) Find the equation of the regression line for the given data. Round the regression line values to the nearest hundredth. b) What would be the predicted number of absences if the commute time was 40 minutes? Is this a reasonable question? c) Interpret the Slope Commute time (x) Number of absences (y) 72 3 85 7 91 10 90 10 88 8 98 15 75 4 100 15 80 5 4.2

Least-Squares Regression Line: Part 2 of 3 a) Find the regression Line With Calculator y =.45x 30.3 By Hand First find the mean of x and y (labeled x and y), the sample standard deviation of x and y (labeled s x and s y ), and the correlation (r). x =86.556, y=8.556, s x =9.593, s y =4.39, r=.98 Slope (b 1 )=r s y s x =.98 4.39 9.593 =.45 Y-Intercept (b 0 )= y-b1 x =8.556-.45(86.556)=-30.3 The regression Line is y = b 1 x + b o so ours is y =.45x 30.3 4.2

Least-Squares Regression Line: Part 3 of 3 b) What would be the predicted number of absences if the commute time was 40 minutes? Is this a reasonable question? Tine is 40 minutes or x=40. Put this value into our least-squares regression line y =.45(40) 30.3=-12.3 This means that when the commute time is 40 minutes, than the number of absences is -12.3. This is not a reasonable question since 40 is outside the scope (i.e. 40 is not within the given range of x values). c) Interpret the slope The slope is.45. This means that for every minute we increase our commute the number of absences increases by.45. 4.2

Sum Of Residuals: Part 1 of 2 Find the sum of residuals Step 1: Find the least squares regression line (see previous slides) Step 2: Find the predicted y values (y) for each x Commute time (x) Number of absences (y) 72 3 85 7 91 10 90 10 88 8 98 15 75 4 100 15 80 5 Predicted y =.45(72)-30.3=2.1 7.95 10.65 10.2 9.3 13.8 3.45 14.7 5.7 4.3

Sum Of Residuals: Part 2 of 2 Step 3: Calculate the residuals: observed predicted or y-y Step 4: Calculate the residuals squared: (observed predicted) 2 or (y-y) 2 Step 5: Find the sum of the numbers in the column from step 4. This is the sum of residuals which equals 6.1875. Commute time (x) Number of absences (y) 72 3 2.1 85 7 7.95 Predicted y 91 10 10.65 90 10 10.2 88 8 9.3 98 15 13.8 75 4 3.45 100 15 14.7 80 5 5.7 Step 3 y-y.9 -.95 -.65 -.2-1.3 1.2.55.3 -.7 Step 4 (y-y) 2.81.9025.4225.04 1.69 1.44.3025.09.49 4.3

Coefficient of Determination (R 2 ) If the coefficient of determination (R 2 ) is 86.44% and the data shows a negative association, what is the linear correlation coefficient (r)? r = r 2 =.8644 =.9297 Since it has a negative association, r =-.9297. Interpret R 2 = 86.44% 86.44% of the variability in y (the response variable) is explained by the least-squares regression line. 4.3

Residual Plots What does the residual plot to the right suggest? There is an outlier Removing the outlier, what does the residual plot suggest? No pattern, linear model is appropriate 6 4 2 0-2 -4-6 -8-10 -12 0 2 4 6 8 10-14 4.3

Contingency Tables: Part 1 of 3 Is there an association between party affliction and gender? The following represents the gender and party affliction of registered voters based on random sample 802 adults. Female Male Republican 105 115 Democratic 150 103 Independent 150 179 a) Construct a frequency marginal distribution b) Construct a relative frequency marginal distribution c) Construct a conditional distribution of party affiliation by gender d) Is gender associated with party affiliation? If so, how? 4.4

Contingency Tables: Part 2 of 3 a) Construct a frequency marginal distribution Gender Party Female Male frequency marginal distribution Republican 105 115 =105+115=220 Democratic 150 103 253 Independent 150 179 329 =105+150+150=405 397 802 To do: Find the total for each row and column b) Construct a relative frequency marginal distribution Gender Party Female Male relative frequency marginal distribution Republican 105 115.274 Democratic 150 103.315 Independent 150 179.41 =405/802=.505.495 1 To do: Divide the row/column total by the sample size 4.4

Contingency Tables: Part 3 of 3 c) Construct a conditional distribution of party affiliation by gender Gender Party Female Male Republican =105/405=.259 =115/397=.290 Democratic.370.259 Independent.370.451 Total 1 1 To do: Divide the each cell by its column total d) Is gender associated with party affiliation? If so, how? Yes; males are more likely to be Independents and less likely to be democrats. 4.4