AP STATISTICS REVIEW (YMS Chapters 1-8)



Similar documents
Exercise 1.12 (Pg )

How To Write A Data Analysis

Fairfield Public Schools

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

MTH 140 Statistics Videos

Diagrams and Graphs of Statistical Data

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Summarizing and Displaying Categorical Data

Northumberland Knowledge

Expression. Variable Equation Polynomial Monomial Add. Area. Volume Surface Space Length Width. Probability. Chance Random Likely Possibility Odds

Descriptive Statistics

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Probability Distributions

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

430 Statistics and Financial Mathematics for Business

COMMON CORE STATE STANDARDS FOR

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

This curriculum is part of the Educational Program of Studies of the Rahway Public Schools. ACKNOWLEDGMENTS

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Exploratory Data Analysis

Correlation and Regression

THE BINOMIAL DISTRIBUTION & PROBABILITY

List of Examples. Examples 319

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

UNIT 1: COLLECTING DATA

6.4 Normal Distribution

Algebra I Vocabulary Cards

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

AP Statistics: Syllabus 1

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

Simple linear regression

Lesson 4 Measures of Central Tendency

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DATA INTERPRETATION AND STATISTICS

Stats on the TI 83 and TI 84 Calculator

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Common Core Unit Summary Grades 6 to 8

Algebra 1 Course Title

Without data, all you are is just another person with an opinion.

AP * Statistics Review. Descriptive Statistics

Manhattan Center for Science and Math High School Mathematics Department Curriculum

Algebra 1 Course Information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Exploratory data analysis (Chapter 2) Fall 2011

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

What is the purpose of this document? What is in the document? How do I send Feedback?

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

MEASURES OF VARIATION

How To Understand And Solve A Linear Programming Problem

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Statistics 2014 Scoring Guidelines

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

096 Professional Readiness Examination (Mathematics)

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

CALCULATIONS & STATISTICS

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Common Tools for Displaying and Communicating Data for Process Improvement

Name: Date: Use the following to answer questions 2-3:

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

AP STATISTICS (Warm-Up Exercises)

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Exploratory Data Analysis. Psychology 3256

Module 3: Correlation and Covariance

Bar Graphs and Dot Plots

Foundation of Quantitative Data Analysis

table to see that the probability is (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: = 1.

Interpreting Data in Normal Distributions

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Statistics E100 Fall 2013 Practice Midterm I - A Solutions

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

Chapter 2: Frequency Distributions and Graphs

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Geostatistics Exploratory Analysis

Chapter 2 Statistical Foundations: Descriptive Statistics

seven Statistical Analysis with Excel chapter OVERVIEW CHAPTER

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

AP Statistics Solutions to Packet 2

Measurement with Ratios

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Lecture 1: Review and Exploratory Data Analysis (EDA)

Big Ideas in Mathematics


5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Transcription:

AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with numbers describing data) e.g. weights of hamsters or amounts of chemicals in beverages A. Graphing one variable (univariate) data ALWAYS PLOT DATA 1. Categorical data Bar graphs (bars do not touch) Pie charts (percentages must sum to 100%). Quantitative data label carefully Dot plots can resemble probability curves Stem (& leaf) plots remember to put in the key (e.g. 8 means 8 mg. of salt) Split stems if too many data points Back to back for comparison of two samples Histogram put // for breaks in axis, use no fewer than 5 classes (bars), check to see if scale is misleading, look for symmetry & skewness Ogive cumulative frequency plot Time plot used for seasonal variation where the x-axis is time Box plot modified shows outliers Side by side are good for comparing quartiles, medians and spread B. Summary statistics for one variable data (use calculator with 1-variable stats) 1. Measures of central tendency (center) mean ( x, μ ) median (middle) mode (most). Measures of dispersion (spread) range (max min) quartile (5% = Q 1, 75% = Q 3 ) interquartile range (Q 1 Q 3 ) ( xi x) ( x ) i μ variance s = orσ = n 1 n standard deviation square root of variance (s,σ ) Mean, range, variance, and standard deviation are non resistant measures (strongly influenced by outliers). Use variance and standard deviation with approximately normal distributions only. Remember: the mean chases the tail.

The Normal Distributions (Chapter ) 1. If the mean = 0 and the standard deviation = 1, this is a standard, normal curve. x μ Use with z scores (standard scores), z =, where +z are scores are above the mean and σ z are scores below the mean. 3. To compare two observations from different circumstances, find the z score of each, then compare 4. Use z scores to find the p value, the probability (or proportion or percent) of the data that lies under a portion of the bell curve, p values represent area under the curve. Use shadenorm and normalcdf (to find the p value), or invnorm (to find z score) on the calculator 5. ALWAYS DRAW THE CURVE and shade to show your area. 6. 68% 95% 99.7% rule for area under the curve Examining Relationships (Chapter 3) A. Graphing two variable (bivariate) data DATA MUST BE QUANTITATIVE. Graph the explanatory variable (independent) on the x axis, the response variable (dependent) on the y axis 1. Scatterplots look for relationships between the variables.. Look for clusters of points and gaps. Two clusters indicate that the data should be analyzed to find reasons for the clusters. 3. If the points are scattered, draw an ellipse around the plot. The more elongated, the stronger the linear relationship. Sketch the major axis of the ellipse. This is a good model of the linear regression line. B. Analyzing two variable quantitative data when a linear relationship is suggested 1. Linear correlation coefficient (r) measures the strength of the linear relationship 1 r 1 r = 0 indicates no relationship (the ellipse is a perfect circle) r indicates an inverse relationship r is a non resistant measure (outliers strongly affect r) 1 xi x yi y r = ( )( ) (use calculator with variable stats) n 1 sx s y. Least squares regression line (LSRL) used for prediction; minimizes the vertical distances from each data point to the line drawn. (Linreg a+bx) y varies with respect to x, so choose the explanatory and response axes carefully (y is dependent on x) ŷ = predicted y value

s y ŷ = a + bx is the equation of the LSRL where b = slope = r ( ) sx is always on the line. and the point ( x, y) Do not extrapolate (predict a y value when the x value is far from the other x values). 3. Coefficient of Determination (r ) gives the proportion (%) of variation in the values of y that can be explained by the regression line. The better the line fits, the higher the value of r. To judge "fit of the line" look at r and r. If r = 0.7, then r =.49, so about half the variation in y is accounted for by the least squares regression line. 4. Residual (y ŷ ) vertical distance from the actual data point to the regression line. y ŷ = observed y value predicted y value where residuals sum to zero Residual plot scatterplot of (observed x values, predicted y values) or (x, ŷ ). Use calculator to plot residuals on y axis, original x values on x axis. Check: no pattern good linear relationship, curved pattern no linear relationship, plot widens larger x values do not predict y values well Outliers y values far from the regression line (have large residuals) Influential points x values far from the regression line (may have small residuals) More on Two-Variable Data (Chapter 4) A. Analyzing two variable quantitative data when the data is in a curved pattern: If the data appears curved in the shape of a power function or an exponential function, or or, use the calculator to fit an appropriate function to the data. B. Check for exponential regression 1. Take the log of the original y data (in your list). Replot the data (x, log y); it will look linear 3. Calculate a least squares regression line as if the transformed data was original data. Remember that the regression equation is now in the form log ŷ = a + bx. 4. Verify with residual plot (no pattern) 5. Remember to undo the transformation before making a prediction using the regression line. C. Check for power regression 1. Take the log of both the original x and original y data (in your lists). Replot the data (log x, log y); it will look linear 3. Calculate a least squares regression line as if the transformed data was original data. Remember that the regression equation is now in the form log ŷ = a + blog x 4. Verify with residual plot (no pattern) 5. Remember to undo the transformation before making a prediction using the regression line.

D. Cautions in analyzing data 1. Correlation does not imply causation. Only a well designed, controlled experiment may establish causation. Lurking variables (variables not identified or considered) may explain a relationship (correlation) between the explanatory and response variables by either confounding (a third variable affects the response variable only Hawthorne effect) or by common response (a third variable affects both the explanatory and response variables population growth affected both Methodist ministers and liquor imports). E. Relations in categorical data 1. From a two-way table of counts, find marginal and categorical distributions (in percents). Describe relationship between two categorical variables by comparing percents 3. Recognize Simpson s paradox and be able to explain it Producing Data (Chapter 5) A. Census contacts every individual in the population to obtain data Symbols μ and σ are parameters and are used only with population data B. Sample survey collects data from a part of a population in order to learn about the entire population Symbols x and s x are statistics and are used with sample data 1. Bad sampling designs result in bias in different forms voluntary response sample participants choose themselves, usually those with strong opinions choose to respond e.g. on line surveys, call in opinion questions convenience sample investigators choose to sample those people who are easy to reach e.g. marketing surveys done in a mall bias the design systematically favors certain outcomes or responses e.g. surveying pacifist church members about attitudes toward war. Good sampling designs simple random sample a group of n individuals chosen from a population in such a way that every set of n individuals has an equal chance of being the sample actually chosen; use a random number table or randint on the calculator stratified random sample divide the population into groups (strata) of similar individuals (by some chosen category) then choose a simple random sample from each of the groups multistage sampling design combines stratified and random sampling in stages systematic random sampling choosing every n th individual after choosing the first randomly

3. Cautions (even when the design is good) include: undercoverage when some groups of the population are left out, often because a complete list of the population from which the sample was chosen was not available. e.g. U.S. census has a task force to get data from the homeless because the homeless do not have addresses to receive the census forms in the mail nonresponse when an individual appropriately chosen for the sample cannot or does not respond response bias when an individual does not answer a question truthfully, e.g. a question about previous drug use may not be answered accurately wording of questions questions are worded to elicit a particular response, e.g. One of the Ten Commandments states, "Thou shalt not kill." Do you favor the death penalty? C. Observational study observes individuals in a population or sample, measures variables of interest, but does not in any way assign treatments or influence responses D. Experiment deliberately imposes some treatment on individuals (experimental units or subjects) in order to observe response. Can give give evidence for causation if well designed with a control group. 3 necessities: Control Randomize Replicate Control for lurking variables by assigning units to groups that do not get the treatment Randomize use simple random sampling to assign units to treatments/control groups Replicate use the same treatment on many units to reduce the variation due to chance The "best" experiments are double blind neither the investigators nor the subjects know which treatments are being used on which subjects. Placebos are often used. Block designs subjects are grouped before the experiment based on a particular characteristic or set of characteristics, then simple random samples are taken within each block. Matched pairs is one type of block design where two treatments are assigned, sometimes to the same subject, sometimes to two different subjects matched very closely. Remember how to sketch a design (p. 30) Probability: The Study of Randomness (Chapter 6) A. Basic definitions 1. Probability only refers to the long run ; never short term. Independent one event does not change (have an effect on) another event 3. Mutually exclusive (disjoint) events cannot occur at the same time, so there can be no intersection of events in a Venn diagram. Mutually exclusive events ALWAYS have an effect on each other so they can never be independent.

B. General rules 1. All probabilities for one event must sum to 1. P(A c ) = 1 P(A) A c is the complement of A 3. P(A or B) = P(A) + P(B) P(A and B) {or means U - union} 4. P(A and B) = P(A) P(B A) {and means I - intersection} 5. P( AandB) P(B A) = P(B given A) = P( A) (conditional probability) 6. If P(A and B) = 0 then A and B are mutually exclusive 7. If P(B A) = P(B), then A and B are independent 8. If P(A and B) = P(A) P(B), then A and B are independent Random Variables (Chapter 7) 1. Graphs, whether of continuous or discrete variables, must have area under a curve = 1. Histograms discrete smooth curves continuous. The graphs need not be symmetric.. To get the expected value or mean of a discrete random variable, multiply the number of items by the probability assigned to each item (usually given in a probability distribution table), then sum those products, μ x p i = i 3. To get the variance of a discrete random variable, useσ = ( x i μ) pi where p is the probability assigned to each item, x. 4. To find the sum or difference ( ± ) using two random variables, add or subtract the means to get the mean of the sum or difference of the variables, μx± y = μx ± μy. To get the standard deviation, add the variances, then take the square root of the sum, The Binomial and Geometric Distributions (Chapter 8) σ σ x + σ y =. A. The binomial distribution conditions: (1) used when there are only two options (success or failure), ( )there is a fixed number of observations, (3) all observations are independent, (4) the probability of success, p, is constant. 1. The mean of a binomial distribution is μ = np where p is the probability and n is the number of observations in the sample.. The standard deviation of the binomial distribution is σ = np( 1 p) 3. The graph of a binomial distribution is strongly right skewed (has a long right tail) unless the number in the sample is very large, then the distribution becomes normal.

4. binomial probability ncr(p) r (1 p) n r e.g. the probability of choosing a certain color is.35. If 8 people are in a room, the 8! 5 3 probability that exactly 5 of those will choose the color is (.35) (.65). 5!(8 5)! On calculator use nd DIST binompdf(8,.35, 5). To find the probability that 5 or less will choose the color, sum the individual probabilities of 0, 1,, 3, 4, & 5, OR use 1 (sum of probabilities of 6, 7, 8) On calculator use nd DIST binomcdf(8,.35,5). B. The geometric distribution conditions are the same as for the binomial except there is not a fixed number of observations because the task is to find out how many times it takes before a success occurs. This is sometimes called a waiting time distribution. 1. The mean of the geometric distribution is μ = 1 p. The standard deviation of the geometric distribution is σ = p 3. The graph of the geometric distribution is strongly right skewed always. 4.. geometric probability p n (1 ) 1 p e.g. the probability, when rolling a fair die, of rolling a particular number (say a 4) for the 1 7 1 1 first time on the 7th roll is (1 ) ( ). 6 6 On calculator, use nd DIST geometpdf( 1 6,7). 1 p To find the probability that rolling a particular number will take more than 7 rolls of the die, use 1 (the sum the geometric probabilities from 1 up to 7) On calculator use 1 nd DIST geometcdf( 1 6,7).