AQA STATISTICS 1 REVISION NOTES



Similar documents
GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

1 Correlation and Regression Analysis

Measures of Spread and Boxplots Discrete Math, Section 9.4

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

I. Chi-squared Distributions

Lesson 17 Pearson s Correlation Coefficient

Hypothesis testing. Null and alternative hypotheses

1. C. The formula for the confidence interval for a population mean is: x t, which was

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

PSYCHOLOGICAL STATISTICS

Math C067 Sampling Distributions

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

Quadrat Sampling in Population Ecology

5: Introduction to Estimation

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

Chapter 7: Confidence Interval and Sample Size

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Descriptive Statistics

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Properties of MLE: consistency, asymptotic normality. Fisher information.

Now here is the important step

Confidence Intervals for One Mean

Sampling Distribution And Central Limit Theorem

Output Analysis (2, Chapters 10 &11 Law)

1 Computing the Standard Deviation of Sample Means

NATIONAL SENIOR CERTIFICATE GRADE 12

Determining the sample size

FOUNDATIONS OF MATHEMATICS AND PRE-CALCULUS GRADE 10

This document contains a collection of formulas and constants useful for SPC chart construction. It assumes you are already familiar with SPC.

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

Chapter 7 Methods of Finding Estimators

STA 2023 Practice Questions Exam 2 Chapter 7- sec 9.2. Case parameter estimator standard error Estimate of standard error

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Mathematical goals. Starting points. Materials required. Time needed

TI-83, TI-83 Plus or TI-84 for Non-Business Statistics

Confidence Intervals

Maximum Likelihood Estimators.

Hypergeometric Distributions

Normal Distribution.

One-sample test of proportions

, a Wishart distribution with n -1 degrees of freedom and scale matrix.

How To Solve The Homewor Problem Beautifully

Soving Recurrence Relations

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE The absolute value of the complex number z a bi is

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation

Confidence intervals and hypothesis tests

Chapter 5 Unit 1. IET 350 Engineering Economics. Learning Objectives Chapter 5. Learning Objectives Unit 1. Annual Amount and Gradient Functions

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Overview of some probability distributions.

Confidence Intervals for Linear Regression Slope

A Mathematical Perspective on Gambling

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

LECTURE 13: Cross-validation

Lesson 15 ANOVA (analysis of variance)

Chapter 5: Basic Linear Regression

Practice Problems for Test 3

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

NATIONAL SENIOR CERTIFICATE GRADE 11

Predictive Modeling Data. in the ACT Electronic Student Record


3 Basic Definitions of Probability Theory

Repeating Decimals are decimal numbers that have number(s) after the decimal point that repeat in a pattern.

Mann-Whitney U 2 Sample Test (a.k.a. Wilcoxon Rank Sum Test)

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

CHAPTER 3 THE TIME VALUE OF MONEY

AP Calculus BC 2003 Scoring Guidelines Form B

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example: Probability ($1 million in S&P 500 Index will decline by more than 20% within a

Basic Elements of Arithmetic Sequences and Series

STATISTICAL METHODS FOR BUSINESS

Chapter 14 Nonparametric Statistics

AP Calculus AB 2006 Scoring Guidelines Form B

7.1 Finding Rational Solutions of Polynomial Equations

CHAPTER 3 DIGITAL CODING OF SIGNALS

Parametric (theoretical) probability distributions. (Wilks, Ch. 4) Discrete distributions: (e.g., yes/no; above normal, normal, below normal)

Incremental calculation of weighted mean and variance

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

3. Greatest Common Divisor - Least Common Multiple

Statistical inference: example 1. Inferential Statistics

FM4 CREDIT AND BORROWING

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Exam 3. Instructor: Cynthia Rudin TA: Dimitrios Bisias. November 22, 2011

2-3 The Remainder and Factor Theorems

The Fundamental Forces of Nature

G r a d e. 2 M a t h e M a t i c s. statistics and Probability

THE TWO-VARIABLE LINEAR REGRESSION MODEL

BINOMIAL EXPANSIONS In this section. Some Examples. Obtaining the Coefficients

A probabilistic proof of a binomial identity

MATHEMATICS P1 COMMON TEST JUNE 2014 NATIONAL SENIOR CERTIFICATE GRADE 12

Transcription:

AQA STATISTICS 1 REVISION NOTES AVERAGES AND MEASURES OF SPREAD www.mathsbox.org.uk Mode : the most commo or most popular data value the oly average that ca be used for qualitative data ot suitable if the data values are very varied Mea : importat as it uses all the data values Disadvatage affected by extreme values If the data is grouped use the mid-poit of each group as your x Media : the middle value whe the data are i order For data values the media is the + 1 th value 2 Not affected by extreme values For 10 values the media will be the 5½ th value halfway betwee the 5 th ad the 6 th values Rage biggest value smallest value - greatly affected by extreme values Iterquartile Rage Upper quartile Lower quartile - measures the spread of the middle 50% of the data ad is ot affected by extreme values 3 4 5 7 8 9 9 LQ M UQ IQR = 9-4 = 5 2 3 4 4 5 7 8 9 LQ M UQ IQR = 7.5 3.5 = 4 Stadard Deviatio Deviatio from the mea is the differece from a value from the mea value The stadard deviatio is the average of all of these deviatios Formulas to work out stadard deviatio are give i the q SCALING DATA Additio if you add a to each umber i the list of data : New mea = old mea + a New media = old media + a New mode = old mode + a Stadard Deviatio is UNCHANGED Multiplicatio - If you multiply each umber i the list of data by b : New mea = old mea b New media = old media b New mode = old mode b New Stadard Deviatio = old stadard deviatio b

PROBABILITY Outcome : each thig that ca happe i a experimet Sample Space : list of all the possible values q NOTATION A Ç B A ad B both happe A È B either A or B or both happe C' C does t happe C P(C ) = 1 P(C) P(A È B) = P(A) + P(B) P(A Ç B) Mutually Exclusive Evets two or more evets that caot happe at the same time P(A È B) = P(A) + P(B) Idepedet Evets the outcome of oe evet does ot affect the outcome of aother P( B ) = P(A) P(B) A Ç Coditioal Probability : whe the outcome of the first evet affects the outcome of a secod evet, the probability of the secod evet depeds o what has happeed P(B/A) meas the coditioal probability of B give A P(B/A) = P(A Ç B) P(A) A Ç so P( B )= P(A) P(B/A) q If the questio states that the evets are idepedet the a tree diagram might be a good idea multiply alog the braches the add the appropriate combiatios together q Probability of at least 1 = 1 Probability of oe q If you are asked to fid probabilities usig data i a table work out the row/colum totals before you start

BINOMIAL DISTRIBUTION q A questio is biomial if: Probability of a evet happeig is give (p) Number of people/trials/objects chose give () q EQUALS or EXACTLY use the formula P(x=r) Make sure you write it out with the values substituted i C r p r (1 p) r Check that your powers add to make (umber of trials) from your calculator make sure you write dow this value q MORE/LESS THAN/AT LEAST use tables Remember tables give less tha or equal to Make sure you list the umbers ad idetify which oes you eed to iclude P(X>5) 0 1 2 3 4 5 6 7 8 9 = 1 P(X 5) P(X<5) 0 1 2 3 4 5 6 7 8 = P(X 4) q MEAN ad VARIANCE Mea = p Variace = p(1-p) stadard deviatio = p(1 p) q ASSUMPTIONS Idepedet evets with a fixed probability of success Radomly selected q COMPARISONS You may be asked to calculate the mea ad stadard deviatio of a biomial distributio ad compare them to the mea ad stadard deviatio of a sample (table of results) - if both the meas are approximately the same AND both stadard deviatios (or variaces) are approximately equal the you ca say that biomial model appears to fit the data ad that it must be idepedet, radom observatios.

NORMAL DISTRIBUTION FINDING PROBABILITIES q State the mea ad variace (stadard deviatio) q Stadardise to fid the z value z = x mea stadard deviatio q Sketch a graph ad shade i the area to represet the probability P(z <1.2) P(z >-0.8) 1.2-0.8 q Use the table to fid the probability - take care with egative z-values P(z > -0.8) = P(z < 0.8) -0.8 0.8 P(z < -0.8) = 1 P(z < 0.8) -0.8 0.8 q ALWAYS CHECK YOUR ANSWER WITH YOUR GRAPH if your shaded area is more that ½ ad your aswer is 0.4 (for example) you kow you have goe wrog somewhere!! WORKING BACKWARDS to fid the mea/stadard deviatio or both (simultaeous equatios) q State the probability you kow ad sketch a graph P(X < 34) = 0.95 0.05 Stadard deviatio = 8 q Use tables to fid the appropriate z value q Write dow the equatio used to stadardise with all of the kow values substituted q Rearrage to fid the mea 34 1.6445 = 34 mea 8

SAMPLES ad PROBABILITIES If you are asked to calculate probabilities ivolvig samples remember to divide the (populatio) stadard deviatio by the square root of the sample size whe you stadardise. z = x mea s or z = x mea variace ESTIMATION estimatig the populatio mea from a sample mea ad fidig a cofidece iterval If a radom sample of size is take from a ormal populatio ad the sample mea x is foud, the the 95% cofidece iterval of the populatio mea is give by where 2 Z value x 1.96 www.mathsbox.org.uk s is either the populatio variace (if give) or a ubiased estimate of the populatio variace (foud from the sample see below) s 2, x + 1.96 s 2 q CASE 1 : Stadard deviatio or Variace of the populatio is stated i the questio - from the data you oly eed to calculate the mea - use your tables to fid the appropriate z value - write out the above expressios for the cofidece itervals with all your values substituted i - calculate the two values for your cofidece itervals ad state clearly (3 sf) CHECK your aswer add the two aswers together ad divide by 2 this should be the sample mea!!! q CASE 2 : Stadard deviatio or Variace of the populatio is UNKNOWN - you will eed to use the data to calculate the variace of the sample ad the a ubiased estimate of the populatio variace - your calculator will give you value of the stadard deviatio for the sample you have etered s x - square this to calculate the sample variace A ubiased estimate of the populatio variace is Sample variace 1 Use this as the value of s 2 i your cofidece iterval whe substitutig the values i q INTERPRETATION a 95% cofidece iterval tells us that if we took the same size sample 100 times the 95 of the cofidece itervals we would calculate should cotai the TRUE populatio mea.

CENTRAL LIMIT THEOREM you oly eed to use the cetral limit theorem if you are ot told that the sample is selected from a populatio which is Normally Distributed The theorem cocers the distributio of the sample meas ad as log as the sample size is large eough (greater tha 30) the the sample meas will be ormally distributed ad so we ca calculate cofidece itervals REGRESSION fidig the equatio of the lie of best fit least squares y = a + bx Gradiet the chage i y for each uit chage i x e.g for every 1 degree rise i temperature sales icrease by b Itercept - the value of y whe x is zero. e.g. Whe the temperature is 0 C ice cream sales are 10 q If you are asked to iterpret the values of a ad b, make sure you discuss it i the cotext of the questio NOT i terms of x ad y (see examples above) q If you are give a table of values use your calculator to fid a ad b I your workigs state a =. b =.. ad show your values substituted ito y = a + bx q If you are calculatig a ad b usig the formulae make sure you use the formula book showig how you substituted the values i Always work out b first Use b ad the meas of x ad y to work out a a = (mea of y) - b(mea of x) q TO PLOT THE REGRESSION LINE choose 2 differet values of x use your equatio y=a + bx to work out the predicted y-values - plot the two poits ad joi with a straight lie q RESIDUAL = OBSERVED(actual value) PREDICTED(usig equatio y= a + bx) - the smaller the residuals the greater the accuracy of the lie of best fit i predictig values - sometimes a average residual ca be used to make predictios usig the lie of best fit e.g if a idividual has a average residual of 5 the to predict for this particular perso usig the lie, 5 should be added to the value predicted usig the equatio. q RELIABILITY OF PREDICTIONS - Iterpolatio predictig usig a x-value withi the rage of x-values used to calculate the a ad b cosidered to be a reliable predictio - Extrapolatio predictig usig a x-value outside of the rage of x-values used to calculate a ad b UNRELIABLE estimate - because you are assumig that the liear tred cotiues idefiitely use your commo sese to explai why this may be icorrect - watch out for NEGATIVE (or urealistic) y values which may result for the x values suggested agai use your commo sese to explai why this is urealistic

q SCALING either the x or the y values will chage the equatio e.g y = 0.5 1.2x If the x values are doubled the the equatio becomes y = 0.5 1.2(2x) y = 0.5 2.4x CORRELATION If 5 is added o to each of the y values the the equatio becomes y + 5 = 0.5 1.2x y = - 4.5 1.2 x (always rearrage to get y = a + b x) The Product Momet Correlatio Coefficiet : is a umerical measures of the stregth ad type of correlatio deoted by r ad will lie i the rage -1 r 1 q idicates how well the data, whe plotted i a scatter graph, fits a straight lie patter q NOT APPROPRIATE if the data does ot follow a liear patter whe plotted (straight lie) so scatter graph is eeded to check this q If you are give a table of values use your calculator to fid r (If you have time it s a good idea to check that you have etered your values correctly) q If you are calculatig usig summary values -make sure you use the formula book showig how you substituted the values ito the formula INTERPRETING r make sure you do this i the cotext of the questio (ot just positive correlatio) e.g. There appears to be a fairly strog relatioship betwee temperature ad ice cream sales, higher temperatures appear to correspod to higher values of ice-cream sales ad vice versa. q Scalig data a liear trasformatio or scalig of oe or both of the variables will ot affect the correlatio coefficiet all of the poits will stay i the same positio RELATIVE to each other TAKE CARE q Not all correlatio will be liear For this data, the correlatio coefficiet is close to 0 This does ot mea that there is o correlatio but simply mea that there is o liear correlatio (patter appears to be quadratic) q Spurious Correlatio A strog correlatio betwee 2 variables does ot mea that oe thig causes the other high marks i a maths exam do ot ecessarily cause high marks i a Statistics exam, they are likely to both be depedet o a commo third variable : the studets mathematical ability q Outliers Oe or two outliers ca have a dramatic effect o a correlatio coefficiet