STATISTICS AND STANDARD DEVIATION

Similar documents

Chapter 13 Introduction to Linear Regression and Correlation Analysis

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Solving Quadratic Equations by Graphing. Consider an equation of the form. y ax 2 bx c a 0. In an equation of the form

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Example: Boats and Manatees

Pearson s Correlation Coefficient

So, using the new notation, P X,Y (0,1) =.08 This is the value which the joint probability function for X and Y takes when X=0 and Y=1.

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

INVESTIGATIONS AND FUNCTIONS Example 1

2. Simple Linear Regression

FINAL EXAM REVIEW MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 9 Descriptive Statistics for Bivariate Data

Univariate Regression

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1. a. standard form of a parabola with. 2 b 1 2 horizontal axis of symmetry 2. x 2 y 2 r 2 o. standard form of an ellipse centered

Correlation key concepts:

North Carolina Community College System Diagnostic and Placement Test Sample Questions

C3: Functions. Learning objectives

4. Continuous Random Variables, the Pareto and Normal Distributions

1.6. Piecewise Functions. LEARN ABOUT the Math. Representing the problem using a graphical model

DATA INTERPRETATION AND STATISTICS

MEASURES OF VARIATION

M122 College Algebra Review for Final Exam

AP Statistics Solutions to Packet 2

The Slope-Intercept Form

Exercise 1.12 (Pg )

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Simple linear regression

Zeros of Polynomial Functions. The Fundamental Theorem of Algebra. The Fundamental Theorem of Algebra. zero in the complex number system.

The Big Picture. Correlation. Scatter Plots. Data

6.4 Normal Distribution

MATH REVIEW SHEETS BEGINNING ALGEBRA MATH 60

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Florida Algebra I EOC Online Practice Test

LESSON EIII.E EXPONENTS AND LOGARITHMS

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

2.7 Applications of Derivatives to Business

POLYNOMIAL FUNCTIONS

Solving Absolute Value Equations and Inequalities Graphically

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

THE POWER RULES. Raising an Exponential Expression to a Power

The correlation coefficient

Algebra I Vocabulary Cards

Module 3: Correlation and Covariance

Session 7 Bivariate Data and Analysis

5.1. A Formula for Slope. Investigation: Points and Slope CONDENSED

INTRODUCTION TO ERRORS AND ERROR ANALYSIS

Zero and Negative Exponents and Scientific Notation. a a n a m n. Now, suppose that we allow m to equal n. We then have. a am m a 0 (1) a m

Exponential and Logarithmic Functions

Statistics. Measurement. Scales of Measurement 7/18/2012

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

CALCULATIONS & STATISTICS

Descriptive Statistics

15.1. Exact Differential Equations. Exact First-Order Equations. Exact Differential Equations Integrating Factors

Summer Math Exercises. For students who are entering. Pre-Calculus

Paper No 19. FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80

Key Concept. Density Curve

Common Tools for Displaying and Communicating Data for Process Improvement

Introduction to Basic Reliability Statistics. James Wheeler, CMRP

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

SAMPLE. Polynomial functions

Section 3-3 Approximating Real Zeros of Polynomials

Direct Variation. 1. Write an equation for a direct variation relationship 2. Graph the equation of a direct variation relationship

MATH 185 CHAPTER 2 REVIEW

2 3 Histograms, Frequency Polygons, and Ogives

CHAPTER TEN. Key Concepts

7.7 Solving Rational Equations

DISTANCE, CIRCLES, AND QUADRATIC EQUATIONS

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Section 3-7. Marginal Analysis in Business and Economics. Marginal Cost, Revenue, and Profit. 202 Chapter 3 The Derivative

EQUATIONS OF LINES IN SLOPE- INTERCEPT AND STANDARD FORM

Core Maths C2. Revision Notes

CURVE FITTING LEAST SQUARES APPROXIMATION

Functions and Graphs CHAPTER INTRODUCTION. The function concept is one of the most important ideas in mathematics. The study

NAME DATE PERIOD. 11. Is the relation (year, percent of women) a function? Explain. Yes; each year is

Simple Regression Theory II 2010 Samuel L. Baker

3.2 Measures of Spread

Higher. Polynomials and Quadratics 64

2.3 Quadratic Functions

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

SECTION 2-2 Straight Lines

Name Class Date. Additional Vocabulary Support

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

SECTION 5-1 Exponential Functions

Week 3&4: Z tables and the Sampling Distribution of X

STT315 Chapter 4 Random Variables & Probability Distributions KM. Chapter 4.5, 6, 8 Probability Distributions for Continuous Random Variables

II. DISTRIBUTIONS distribution normal distribution. standard scores

Slope-Intercept Form and Point-Slope Form

3.4 Statistical inference for 2 populations based on two samples

Chapter 23. Inferences for Regression

17. SIMPLE LINEAR REGRESSION II

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

6. The given function is only drawn for x > 0. Complete the function for x < 0 with the following conditions:

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Common Core Unit Summary Grades 6 to 8

Elementary Statistics

Transcription:

STATISTICS AND STANDARD DEVIATION

Statistics and Standard Deviation STSD-A Objectives... STSD 1 STSD-B Calculating Mean... STSD STSD-C Definition of Variance and Standard Deviation... STSD 4 STSD-D Calculating Standard Deviation... STSD 5 STSD-E Coefficient of Variation... STSD 7 STSD-F Normal Distribution and z-scores... STSD 8 STSD-G Chebshev s Theorem... STSD 15 STSD-H Correlation and Scatterplots... STSD 16 STSD-I Correlation Coefficient and Regression Equation... STSD 1 STSD-J Summar... STSD 5 STSD-K Review Eercise... STSD 7 STSD-L Appendi z-score Values Table... STSD 9 STSD-Y Inde... STSD 30 STSD-Z Solutions... STSD 3 STSD-A Objectives To calculate the mean and standard deviation of lists, tables and grouped data To determine the correlation co-efficient To calculate z-scores To use normal distributions to determine proportions and values To use Chebshev s theorem To determine correlation between sets of data To construct scatterplots and lines of best fit To calculate correlation coefficient and regression equation for data sets. STSD 1

STSD-B Calculating Mean The mean is a measure of central tendenc. It is the value usuall described as the average. The mean is determined b summing all of the numbers and dividing the result b the number of values. The mean of a population of N values (scores) is defined as the sum of all the scores, of the population,, divided b the number of scores, N. The population mean is represented b the Greek letter μ (mu) and calculated b using μ. N Often it is not possible to obtain data from an entire population. In such cases, a sample of the population is taken. The mean of a sample of n items drawn from the population is defined in the same wa and is denoted b, pronounced bar and calculated using Eample STSD-B1 Calculate the mean of the following student test results percentages.. n 9% 66% 99% 75% 69% 51% 89% 75% 54% 45% 69% n 9+ 66+ 99+ 75+ 69+ 51+ 89+ 75+ 54+ 45+ 69 11 784 71.7 11 write out formula add together all scores divide b number of scores The mean of the student test results is 71.7 % (rounded to d.p.). When calculating the mean from a frequenc distribution table, it is necessar to multipl each score b its frequenc and sum these values. This result is then divided b the sum of the frequencies. The formula for the mean calculated from a frequenc table is f f Calculations using this formula are often simplified b setting up a table as shown below. Eample STSD-B Calculate the mean number of pins knocked down from the frequenc table. Pins () Frequenc (f) f 0 0 0 1 1 1 1 1 4 3 0 3 0 0 4 4 8 5 4 0 6 9 54 7 11 77 8 13 104 9 8 7 10 8 80 Total f 60 f 40 mean Note: It is rare for an eact number to result from a mean calculation. f f 40 7 60 The mean number of pins knocked down was 7 pins. STSD

If the frequenc distrubution table has grouped data, intervals, it is necessar to use the mid-value of the interval in mean calculations. The mid-value for an interval is calculated b adding the upper and lower boundaries of the interval and dividing the result b two. mid value: upper + lower Eample STSD-B3 Calculate the mean height of students from the frequenc table. Height (cm) mid-value () Frequenc(f) f 140 144.9 140 + 145 1 14.5 14.5 145 149.9 147.5 1 147.5 150 154.9 15.5 305 155 159.9 157.5 6 945 160 164.9 16.5 5 81.5 165 169.9 167.5 335 170 174.9 17.5 1 17.5 175 179.9 177.5 355 Σ f 0 Σ f 315 mean f f 315 0 160.75cm Eercise STSD-B1 Calculate the mean of the following data sets. (a) Hocke goals scored. 5, 4, 3,,, 1, 0, 0, 1,, 3 The mean height is 160.75cm. (b) Points scored in basketball games (d) Babies weights Points Scored Frequenc (f) Bab Weight (kg) Freq (f) () 10 1.80.99 11 0 3.00 3.19 1 1 4 3.0 3.39 3 13 1 3.40 3.59 14 3 3.60 3.79 5 15 1 3.80 3.99 Total 10 Total 15 (c) Number of tping errors (e) ATM withdrawals Tping (f) Withdrawals (f) errors ($) 0 6 0 49 7 1 8 50 99 9 5 100 149 5 3 1 150 199 5 Total 0 00 49 50 99 Total 30 STSD 3

STSD-C Definition of Variance and Standard Deviation To further describe data sets, measures of spread or dispersion are used. One of the most commonl used measures is standard deviation. This value gives information on how the values of the data set are varing, or deviating, from the mean of the data set. Deviations are calculated b subtracting the mean,, from each of the sample values,, i.e. deviation. As some values are less than the mean, negative deviations will result, and for values greater than the mean positive deviations will be obtained. B simpl adding the values of the deviations from the mean, the positive and negative values will cancel to result in a value of zero. B squaring each of the deviations, the problem of positive and negative values is avoided. To calculate the standard deviation, the deviations are squared. These values are summed, divided b the appropriate number of values and then finall the square root is taken of this result, to counteract the initial squaring of the deviation. The standard deviation of a population, σ, of N data items is defined b the following formula. ( μ ) Σ σ where μ is the population mean. N For a sample of n data items the standard deviation, s, is defined b, s Σ ( ) n 1 where is the sample mean. NOTE: When calculating the sample standard deviation we divide b (n 1) not N. The reason for this is comple but it does give a more accurate measurement for the variance of a sample. Standard deviation is measured in the same units as the mean. It is usual to assume that data is from a sample, unless it is stated that a population is being used. To assist in calculations data should be set up in a table and the following headings used: μ OR ( μ ) OR ( ) Eample STSD-C1 Determine the standard deviation of the following student test results percentages. 9% 66% 99% 75% 69% 51% 89% 75% 54% 45% 69% ( ) 9 9 71.3 0.7 ( 0.7) 48.49 66 5.3 8.09 99 7.7 767.9 75 3.7 13.69 69.3 5.9 51 0.3 41.09 89 17.7 313.9 75 3.7 13.69 54 17.3 99.9 45 6.3 691.69 69.3 5.9 Σ 784 Σ( ) 978.19 Σ 784 71.3 n 11 ( ) Σ s n 1 978.19 11 1 17.6 The standard deviation of the test results is approimatel 17.6%. STSD 4

The variance is the average of the squared deviations when the data given represents the population. The lower case Greek letter sigma squared, σ, is used to represent the population variance. ( μ ) σ where μ is the population mean, and N is the population size. N The sample variance, which is denoted b s, is defined as s ( ) n 1 where is the sample mean, and n is the sample size. As variance is measured in squared units, it is more useful to use standard deviation, the square root of variance, as a measure of dispersion. STSD-D Calculating Standard Deviation The previousl mentioned formulae for standard deviation of a population, σ and a sample standard deviation, s, ( μ ) ( ) Σ Σ σ s N n 1 can be manipulated to obtain the following formula which are easier to use for calculations. These are commonl called computational formulae. ( Σ) ( Σ) Σ Σ N n σ s N n 1 To perform calculations again it is necessar to set up a table. The table heading in this case will be: Eample STSD-D1 Determine the standard deviation of the following student test results percentages. 9% 66% 99% 75% 69% 51% 89% 75% 54% 45% 69% 9 9 8464 66 4356 99 9801 75 565 69 4761 51 601 89 791 75 565 54 916 45 05 69 4761 Σ 784 Σ 58856 ( Σ) Σ n s n 1 58856 784 11 11 1 58856 55877.81 10 17.6 NOTE: This is approimatel the same value as calculated previousl. This value will actuall be more accurate as it onl uses rounding in the final calculation step. The standard deviation of the test scores is approimatel 17.6%. STSD 5

When data is presented in a frequenc table the following computational formulae for populations standard deviation, σ, and sample standard deviation, s, can be used. ( Σf) ( Σf) Σf Σf Σf Σf σ s Σf Σf 1 If the data is presented in a grouped or interval manner, the mid-values are used as with the calculation of the mean. The table heading for calculations will include. f f f Eamples STSD-D Calculate the standard deviations for each of the following data sets. (a) Number of pins knocked down in ten-pin bowling matches Pins () f f f 0 0 0 0 1 1 1 1 1 4 4 8 3 0 0 9 0 4 8 16 3 5 4 0 5 100 6 9 54 36 34 7 11 77 49 539 8 13 104 64 83 9 8 7 81 648 10 8 80 100 800 Σ f 60 Σ f 40 Σ f 384 s Σf ( Σf) Σf Σf 1 40 384 60.41 60 1 The standard deviation of the number of pins knocked down is approimatel.41 pins. (b) Heights of students Heights f f f 140 144.9 14.5 1 14.5 0306.5 0306.5 145 149.9 147.5 1 147.5 1756.5 1756.5 150 154.9 15.5 305 356.5 4651.5 155 159.9 157.5 6 945 4806.5 148837.5 160 164.9 16.5 5 81.5 6406.5 13031.5 165 169.9 167.5 335 8056.5 5611.5 170 174.9 17.5 1 17.5 9756.5 9756.5 175 179.9 177.5 355 31506.5 6301.5 Σ f 0 Σ f 315 Σ f 544731.5 s Σf ( Σf) Σf Σf 1 544731.5 0 1 38.33 315 0 The standard deviation of the heights is approimatel 38.33cm. STSD 6

Eercise STSD-D1 Calculate the standard deviations for each of the following data sets. (a) Hocke goals scored. 5, 4, 3,,, 1, 0, 0, 1,, 3 (b) Points scored in basketball games. (d) Babies weights Points Scored Frequenc (f) Bab Weight (kg) Freq (f) () 10 1.80.99 11 0 3.00 3.19 1 1 4 3.0 3.39 3 13 1 3.40 3.59 14 3 3.60 3.79 5 15 1 3.80 3.99 Total 10 Total 15 (c) Number of tping errors. (e) ATM withdrawals STSD-E Tping (f) Withdrawals (f) errors ($) 0 6 0 49 7 1 8 50 99 9 5 100 149 5 3 1 150 199 5 Total 0 00 49 50 99 Total 30 Co-efficient of Variation Without an understanding of the relative size of the standard deviation compared to the original data, the standard deviation is somewhat meaningless for use with the comparison of data sets. To address this problem the coefficient of variation is used. The coefficient of variation, CV, gives the standard deviation as a percentage of the mean of the data set. s CV 100% CV σ 100% μ for a sample for a population Eample STSD-E1 Calculate the coefficient of variation for the following data set. The price, in cents, of a stock over five trading das was 5, 58, 55, 57, 59. 5 704 58 3364 55 305 57 349 59 3481 Σ 81 Σ 1583 s.77 CV 100% 100% 4.93% 56. n 81 56.1 5 ( Σ) Σ n s n 1 1583 5 1.77 The coefficient of variation for the stock prices is 4.93%. The prices have not showed a large variation over the five das of trading. 81 5 STSD 7

The coefficient of variation is often used to compare the variabilit of two data sets. It allows comparison regardless of the units of measurement used for each set of data. The larger the coefficient of variation, the more the data varies. Eample STSD-E The results of two tests are shown below. Compare the variabilit of these data sets. Test 1 (out of 15 marks): 9 s Test (out of 50 marks): 7 s 8 s s CVtest1 100% CVtest 100% 8 100%.% 100% 9.6% 9 7 The results in the second test show a great variation than those in the first test. Eercise STSD-E1 1. Calculate the coefficient of variation for each of the following data sets. (a) Stock prices: 8, 10, 9, 10, 11 (b) Test results: 10, 5, 8, 9,, 1, 5, 7, 5, 8. Compare the variation of the following data sets. (a) Data set A: 35, 38, 34, 36, 38, 35, 36, 37, 36 Data set B: 36, 0, 45, 40, 5, 46, 6, 6, 3 (b) Bo s Heights: 141.6cm s 15.1cm Girl s Heights: 143.7cm s 8.4cm STSD-F Normal Distribution and z-scores Another use of the standard deviation is to convert data to a standard score or z-score. The z-score indicates the number of standard deviations a raw score deviates from the mean of the data set and in which direction, i.e. is the value greater or less than the mean? The following formula allows a raw score,, from a data set to be converted to its equivalent standard value, z, in a new data set with a mean of zero and a standard deviation of one. A z-score can be positive or negative: μ z sample z population s σ positive z-score raw score greater than the mean negative z-score raw score less than the mean. STSD 8

Eamples STSD-F1 1. Given the scores 4, 7, 8, 1, 5 determine the z-score for each raw score. 5 5 n 5 4 16 7 49 ( ) 8 64 s n 1 1 n 1 5 5.7386 Σ 5 Σ 155 raw score z-score meaning 4 4 5 z 0.37.7386 0.37 standard deviations below the mean 7 7 5 z 0.73.7386 0.73 standard deviations above the mean 8 8 5 z 1.1.7386 1.1 standard deviations above the mean 1 1 5 z 1.46.7386 1.46 standard deviations below the mean 5 5 5 z 0.7386 at the mean. Given a data set with a mean of 10 and a standard deviation of, determine the z-score for each of the following raw scores,. 8 8 10 8 is 1 standard deviations below the mean. z 1 10 10 10 z 0 16 16 10 z 3 10 is 0 standard deviations from the mean, it is equal to the mean. 16 is 3 standard deviations above the mean. The z-scores also allow comparisons of scores from different sources with different means and/or standard deviations. Eample STSD-F Jenn obtained results of 48 in her English eam and 75 in her Histor eam. Compare her results in the different subjects considering: English eam : class mean was 45 and the standard deviation.5 Histor eam : class mean was 78 and the standard deviation.4 48 45 z English 1.33.5 z Histor 75 78 1.5.4 Jenn s English result is 1.33 standard deviations above the class mean, while her Histor was 1.5 standard deviations below the class mean. STSD 9

It is also possible to determine a raw score for a given z-score, i.e. it is possible to find a value that is a specified number of standard deviations from a mean. The z-score formula is transformed to + s z sample μ + σ z population Eamples STSD-F3 A data set has a mean of 10 and a standard deviation of. Find a value that is: (i) 3 standard deviations above the mean z 3 + s z 10 + 3 16 16 is three standard deviations above the mean. (ii) standard deviations below the mean z + s z 10 + 6 6 is two standard deviations below the mean. Eercise STSD-F1 1. Given the scores 56, 8, 74, 69, 94 determine the z-score for each raw score.. Given a data set with a mean of 54 and a standard deviation of 3., determine the z-score for each of the following raw scores,. (i) 81 (ii) 57 3. Peter obtained results of 63 in Maths and 58 in Geograph. For Maths the class mean was 58 and the standard deviation 3.4, and for Geograph the class mean was 55 and the standard deviation.3. Compared to the rest of the class did Peter do better in Maths or Geograph? 4. A data set has a mean of 54 and a standard deviation of 3.. Find a value that is: (i) standard deviations below the mean. (ii) 1.5 standard deviations above the mean. The distribution of z-scores is described b the standard normal curve. Normal distributions are smmetric about the mean, with scores more concentrated in the centre of the distribution than in the tails. Normal distributions are often described as bell shaped. mean Data collected on heights, weights, reading abilities, memor and IQ scores often are approimated b normal distributions. The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of one. 1 0 1 STSD 10

For normall distributed data 50% of the data is below the mean and 50% above the mean. 50% 50% In a normal distribution, the distance between the mean and a given z-score corresponds to a proportion of the total area under the curve, and hence can be related to a proportion of a population. The total area under a normal distribution curve is taken as equal to 1 or 100%. The values of the proportions, written as decimals, for various z-scores are provide in a statistical table, Normal Distribution Areas (see Appendi). The values in the Normal Distribution Areas table give a proportion value for the area between the mean and the raw score greater than the mean, converted to a positive z-score. + z z As the distribution is smmetric, it is possible to calculate proportions for raw scores less than the mean. If the calculated z-score is negative, the corresponding positive value from the table is used. So values for z are the same distance from the mean whether the are a number of standard deviations more or a number of standard deviations less than the mean, and will result in the same proportion. A B z +z proportion A proportion B In a normal distribution approimatel: 68% of values lie within ± 1 s.d. of the mean 68% 1 1 95% of values lie within ± s.d. of the mean 95% 99% of values lie within ± 3 s.d. of the mean 99% 3 3 STSD 11

When using the Normal Distribution Areas table, the z-score is structured from the row and column headings of the table and the required proportion is found in the middle of the table at the intersection of the corresponding rows and columns. Eamples STSD-F4 Use the Normal Distribution Areas table to determine the proportions that correspond to the following z-scores. (i) z.00.0 +.00.0 row,.00 column proportion 0.477 47.7% (ii) z.1.1.10.1 +.00.1 row,.00 column proportion 0.481 48.1% (iii) z.1.1.1 +.0.1 row,.0 column proportion 0.4830 48.30% (iv) z. use z...0. +.00. row,.00 column proportion 0.4861 48.61% (v) z.1 use z.1.1. +.01. row,.01 column proportion 0.4864 48.64% z.00.01.0.0 0.477 0.4778 0.4783.1 0.481 0.486 0.4830. 0.4861 0.4864 0.4868 It is also possible to find a z-score for a required proportion from the Normal Distribution Areas table. It is necessar to find the proportion in the middle of the table and read back to the row and column headings to determine the z-score. Eamples STSD-F5 Use the Normal Distribution Areas table to determine the z-scores that correspond to the following proportions. (i) 48.1% 0.481.1 row,.00 column.1 +.00.10.1 The z-score would be either z.1 or z.1. (ii) 48.68% 0.4868. row,.0 column. +.0. The z-score would be either z. or z.. z.00.01.0.0 0.477 0.4778 0.4783.1 0.481 0.486 0.4830. 0.4861 0.4864 0.4868 STSD 1

Values from the Normal Distribution Areas table enable the determination of proportions of the data set that lie above a raw value, below a raw value or even between two raw values within the data set. Drawing a quick sketch of the distribution curve with the position of the mean and the raw scores(s) and shading the required area can assist with the understanding of the required calculations. Eamples STSD-F6 1. The weights of chips in packets have a mean of 50g and standard deviation of 3g. (a) Find the proportion of the packets of chips with a weight between 50g and 5g. 50 5 z 5 5 50 0.67 0.486 (from the table) 3 4.86% of the chip packets have a weight between 50g and 5g. (b) Find the proportion of the packets of chips with a weight between 47g and 50g. 47 50 z47 1 3 (from the table) 0.3413 47 50 34.13% of the chip packets have a weight between 47g and 50g. (c) Find the proportion of the packets of chips with a weight greater than 55g. 0.5 50 55 55 50 z55 1.67 3 0.455 50 55 Area > 55 0.5 0.455 0.0475 4.75% of the chip packets have a weight greater than 55g. (d) Find the proportion of the packets of chips with a weight less than 45g. 45 50 0.5 45 50 45 50 z45 1.67 3 0.455 Area < 45 0.5 0.455 0.0475 4.75% of the chip packets have a weight less than 45g. (e) Find the proportion of the packets of chips with a weight between 45g and 5g. 45 5 45 50 + 50 5 45 50 z45 1.67 3 0.455 5 50 z5 0.67 3 0.486 Area between 45 & 5 0.455 + 0.486 0.7011 70.11% of the chip packets have a weight between 45g and 5g. STSD 13

Eamples STSD-F6 continued 1. (f) Find the proportion of the packets of chips with a weight between 5g and 55g. 5 55 50 55 50 5 5 50 z5 0.67 3 0.486 55 50 z55 1.67 3 0.455 Area between 5 & 55 0.455 0.486 0.039 0.39% of the chip packets have a weight between 5g and 55g.. (a) If the compan selling the chips in the previous question rejects the 5% of the chip packets that are too light, at what weight should the chip packets be rejected? 5% 50g s 3g proportion 0.4500 below mean s z z 1.645 between 1.64 and 1.65 50 3 1.645 45.065 Packets should be rejected if the weigh 45g or less. (b) Between which weights do 80% of the chip packets fall? 45% ( ) 80% 40% + 40% 50g s 3g ± s z 50 ± 3 1.8 53.84 and 46.16 ( ) proportion 0.4 above and below mean z ± 1.8 80% of the packets weigh between 46.16g and 53.84g. 3. If there are 80 packets in a vending machine, how man packets would be epected to weight more than 55g? 50 55 0.5 50 55 55 50 z55 1.67 3 0.455 Area > 55 0.5 0.455 0.0475 Number > 55 0.0475 80 3.8 Approimatel 4 chip packets would be epected to have a weight of greater than 55g. Eercise STSD-F 1. If scores are normall distributed with a mean of 30 and a standard deviation of 5, what percentage of the scores is (i) greater than 30? (ii) between 8 and 30? (iii) greater than 37? (iv) between 8 and 34? (v) between 6 and 8? STSD 14

Eercise STSD-F continued. IQ scores have a mean of 100 and a standard deviation of 16. What proportion of the population would have an IQ of: (i) greater than 13? (ii) less than 91? (iii) between 80 and 10? (iv) between 80 and 91? (v) If Shane is smarter than 75% of the population, what is his IQ score? 3. The heights of bos in a school are approimatel normall distributed with a mean of 140cm and a standard deviation of 0cm. (a) Find the probabilit that a bo selected at random would have a height of less than 175cm. (b) If there are 400 bos in the school how man would be epected to be taller than 175cm? (c) If 15% of the bos have a height that is less than the girls mean height, what is the girls mean height? 4. Charlie is a sprinter who runs 00m in an average time of.4 seconds with a standard deviation of 0.6s. Charlie s times are approimatel normall distributed. (a) To win a given race a time of 1.9s is required. What is the probabilit that Charlie can win the race? (b) The race club that the sprinter is a member of has the record time for the 00m race at 0.7s. What is the likelihood that Charlie will be able to break the record? (c) A sponsor of athletics carnivals offers $100 ever time a sprinter breaks.5s. If Charlie competes in 80 races over a ear how much sponsorship mone can he epect? STSD-G Chebshev s Theorem A Russian mathematician, P.L. Chebshev, developed a theorem that approimated the proportion of data values ling within a given number of standard deviations of the mean regardless of whether the data is normall distributed or not. Chebshev s theorem states: For an data set, at least 1 k 1 of the values lie within k standard deviations either side of the mean. ( k > 1) So for an set of data at least: 1 3 1 75% 4 1 8 1 89% 3 9 of the values lie within standard deviations. of the values lie with in 3 standard deviations. This theorem allows the determination of the least percentage of values that must lie between certain bounds identified b standard deviations. STSD 15

Eamples STSD-G1 1. The heights of adult dogs in a town have a mean of 67.3cm and a standard deviation of 3.4cm. (a) What can be concluded from Chebshev s theorem about the percentage of dogs in the town that have heights between 58.8cm and 75.8cm? 58.8 67.3 75.8 67.3 z58.8.5 z75.8.5 3.4 3.4 1 1.5 At least 84% of the adult dogs would have heights between 1 0.16 58.8cm and 75.8cm. 84%. (b) What would be the range of heights that would include at least 75% of the dogs? 1 75% 1 k 1 0.75 1 k 1 1 0.75 k 1 0.5 k 1 k 0.5 4 k k 4 Chebshev s theorem suggests that 75% of the heights are within ± standard deviations of the mean. μ ± zσ 67.3 ± 3.4 74.1 and 60.5 75% of the dog s heights would range between 60.5cm and 74.1cm. Eercise STSD-G1 1. The weights of cattle have a mean of 434kg and standard deviation of 69kg. What percentage of cattle will weigh between 330.5kg and 537.5kg?. The age of pensioners residing in a retirement village has a mean of 74 ears and standard deviation of 4.5 ears. What is the age range of pensioners that contains at least 89% of the residents? 3. It was found that for a batch of softdrink bottles, the mean content was 994ml. If 75% of the bottles contained between 898ml and 1090ml, what was the standard deviation for the softdrink batch? 4. On a test the mean is 50 marks and standard deviation 11. At most, what percentage of the results will be less than 17 and greater than 83 marks? STSD-H Correlation and Scatterplots When two different data variables, quantities, are collected from the one source it is possible to determine if a relationship eists between the variables. A simple method of determining the relationship between two variables, if it eists, is b constructing a scatterplot. A scatterplot (scatter graph or scatter diagram) is a graph that is created b plotting one variable, quantit, on the horizontal ais and the other on the vertical ais. If one variable is likel to be dependent on the other, the dependent variable should be plotted on the vertical ais and the independent variable on the -ais. The scales on the vertical and horizontal aes do not need to be the same or even use the same units. Also the aes do not need to commence at zero. STSD 16

Eamples STSD-H1 1. The heights and weights of 10 students are recorded below. Construct a scatterplot for this data. Student Height (cm) Weight (kg) Adam 167 5 Brent 178 63 Charlie 173 69 David 155 41 Edd 171 6 Fred 167 49 Gar 158 4 Harr 169 54 Ian 178 70 John 181 61 Scatterplot of Weight against Height 80 70 60 Weight (kg) 50 40 30 Adam ( 167,5) John ( 181, 61) 0 10 0 150 160 170 180 190 Height (cm). For one week the midda temperature and the number of hot drinks sold were recorded. Construct a scatterplot for this data. Sun Mon Tues Wed Thur Fri Sat Temp ( C) 9 13 14 11 8 6 5 Number of Drinks 45 34 3 4 51 60 64 Scatterplot of Hot Drinks Sales Against Temperature 70 Number of Drinks Sold 60 50 40 30 0 10 0 0 1 3 4 5 6 7 8 9 10 11 1 13 14 15 Midda Temperature (oc) STSD 17

Eercise STSD-H1 Construct scatterplots for the following data sets. 1. Student A B C D E F G H I J Test 1 ( /50) 33 36 15 9 16 9 44 30 44 3 Final Eam % 75 87 34 56 39 45 9 69 93 59. Da Mon Tue Wed Thur Fri Sat Sun Temperature ( C) 4 8 30 33 3 35 31 Softdrink Sales (cans) 17 7 9 30 36 9 3. Name Ann Lee Ma Jan Tom Wes Height (cm) 176 181 173 169 178 180 Shoe Size 9 9.5 8.5 8 10.5 10 4. Speed (km/hr) 50 55 65 70 80 100 10 130 Fuel Econom (km/l) 18.9 18.6 18.1 17.3 16.7 14.7 13. 11. The points in a scatterplot often tend towards approimating a line. It is possible to summarise the points of a scatterplot b drawing a line through the plot as a whole, not necessaril through the individual points. This line is called the line of best fit. The line of best fit line need not pass through an of the original data points, but is used to represent the entire scatterplot. A line of best fit can be thought of as an average for the scatterplot, in a wa similar to a mean is the average of a list of values. To sketch a line of best fit for a scatterplot: 1. calculate the mean of the independent variable values,, and the mean of the dependent variable values, ;. plot this mean point, (, ) to the scatterplot; 3. sketch a through the mean point, that has a slope the follows the general trend of the points of scatterplot. Wherever possible the number of points below the line should equal the number of points above. Outling points need not strongl influence the line of best fit, and are often not included. The line of best fit can be used to predict values for data associated with the scatterplot. STSD 18

Eamples STSD-H 1. The heights and weights of 10 students are recorded below. Construct a scatterplot for this data and draw a line of best fit. Student Height (cm) Weight (kg) Adam 167 5 Brent 178 63 Charlie 173 69 David 155 41 Edd 171 6 Fred 167 49 Gar 158 4 Harr 169 54 Ian 178 70 John 181 61 Means Σ 1697 Σ 563 169.7 n 10 n 10 56.3 Weight Against Height 80 70 Weight (kg) 60 50 40 ( 169.7,56.3) 30 150 155 160 165 170 175 180 185 Height (cm). For one week the midda temperature and the number of hot drinks sold were recorded. Construct a scatterplot for this data. Sun Mon Tues Wed Thur Fri Sat means Temp ( C) 9 13 14 11 8 6 5 9.43 Number of Drinks 45 34 3 4 51 60 64 46.86 70 Drink Sales Against Temperature Drink Sales 60 50 40 30 ( 9.43, 46.86) 0 4 5 6 7 8 9 10 11 1 13 14 15 Temperature Eercise STSD-H Draw lines of best fit on scatterplots drawn in Eercise STSD-H1. STSD 19

Correlation is a measure of the relationship between two measures, variables, on sets of data. Correlation can be positive or negative. A positive correlation means that as one variable increases the other variable increases, eg. height of a child and age of the child. Negative correlation implies as one variable increases the other variable decrease, eg. value of a car and age of the car. If variables have no correlation there is no relationship between the variables, i.e. one measure does not affect the other. Scatterplots enable the visual determination of whether correlation eists between variables and the tpe of correlation. no correlation perfect positive correlation positive correlation perfect negative correlation negative correlation Eercise STSD-H3 1. For each of the following scatterplots determine if the correlation is perfect positive, positive, no correlation, negative or perfect negative. A B C D E F G H. Select the scatterplot from those above that would best describe the relationship between the following variables. (i) Height at 4 ears, height at 16 ears. (ii) (iii) (iv) (v) Age of a used car, price of a used car. Temperature at 6am, temperature at 3pm. Shoe size of mother, number of children in famil. Average eam result, class size. 3. For each of the scatterplots drawn in eercise STSD- H1 state the tpe of correlation between the variables. STSD 0

STSD-I Correlation Coefficient and Regression Equation It is possible to quantif the correlation between variables. This is done b calculating a correlation coefficient. A correlation coefficient measures the strength of the linear relationship between variables. Correlation coefficients can range from 1 to +1. A value of 1 represents a perfect negative correlation and a value of +1 represents a perfect positive correlation. If a data set has a correlation coefficient of zero there is no correlation between the variables. perfect positive correlation r 1 positive correlation r 0.7 no correlation r 0 perfect negative correlation r 1 negative correlation r 0.7 The most widel used tpe of correlation coefficient is Pearson s, r, simple linear correlation. The value of r is determined with the formula below r Σ( X X)( Y Y) ( X X) ( Y Y) Σ Σ This formula uses the sums of deviations from the means in both the X values and Y values. However for ease of calculation the following calculation formula is often used. r nσxy ΣXΣY ( nσx ( ΣX) ) nσy ( ΣY) ( ) where, n number of data points Σ X sum of the X values Σ Y sum of the Y values Σ XY sum of the product of each set of X and Y values Σ X sum of X Σ Y sum of Y ( Σ X ) the square of the sum of the X values ( Σ Y ) the square of the sum of they values STSD 1

To assist in calculations data can be set up in a table and the following headings used: X Y X Y XY Eample STSD-I1 Calculate the correlation coefficient for the height weight data below. Height, X Weight, Y X Y XY 167 5 7889 704 8684 178 63 31684 3969 1114 173 69 999 4761 11937 155 41 405 1681 6355 171 6 941 3844 1060 167 49 7889 401 8183 158 4 4964 1764 6636 169 54 8561 916 916 178 70 31684 4900 1460 181 61 3761 371 11041 Σ X 1697 Σ Y 563 Σ X 8867 Σ Y 3661 Σ XY 9638 r nσxy ΣXΣY nσx ΣX nσy ΣY ( ( ) ) ( ) ( ) 10 9638 1697 563 0.88998 ( 10 8867 1697 )( 10 3661 563 ) There is a positive correlation between the height and weight of students. NOTE: r, sometimes referred to as the co-efficient of determination, represents the proportion (percentage) of the relationship between the variables that can be eplained b a linear relationship. The greater the r value, the greater the linear relationship between the variables. Eample STSD-I For the previous height/weight eample r 0.88998 r 0.779685 approimatel 78% of the relationship can be eplained b a linear correlation. This is a moderatel strong correlation. Eercise STSD-I1 Determine the correlation co-efficient for the following data sets. 1. Temperature ( C) 9 13 14 11 8 6 5 Hot drink sales (cups) 45 34 3 4 51 60 64. Speed (km/hr) 50 55 65 70 80 100 10 130 Fuel econom (km/l) 18.9 18.6 18.1 17.3 16.7 14.7 13. 11. 3. Data collected of students results for sitting Test 1 and the final eam (eercise STSD H1 question 1) Determine the correlation co-efficient for this data. X : Test 1 results (out of 15 marks) Y : Final eam result (out of 50 marks) Σ X 99, Σ Y 649, Σ X 9849, Σ Y 46387, Σ XY 137, n 10 STSD

The relationship between two sets of data can be represented b a linear equation called a regression equation. The regression equation gives the variation of the dependent variable for a given change in the independent variable. It is etremel important to correctl determine which variable is dependent. The regression equation can be used to construct the regression line (line of best fit) on the associated scatterplot. Because the equation is for a line, the regression equation takes on the general linear equation format, m+ c. Usuall, however, for a regression equation this is written as α + β, where α is the -intercept and β the slope of the line. ΣXY nx Y α + β where β ΣX nx α Y βx The slope of the line depends on whether the correlation is positive or negative. A regression equation can be used to predict dependent variables from independent inputs within the range of the scatterplot values. It should not be used: to predict given to predict outside the bounds of given values. The stronger the correlation between the two variables the better the prediction made b the regression equation. Again setting up a table of values, with the following headings, can assist in calculations. X Y X XY Eamples STSD-I3 1. Calculate the correlation coefficient for the height/weight data below. The weight of a student should depend on the height rather than vice versa, so height is the independent,, variable and weight the dependent,, variable. Height, X Weight, Y X XY 167 5 7889 8684 178 63 31684 1114 173 69 999 11937 155 41 405 6355 171 6 941 1060 167 49 7889 8183 158 4 4964 6636 169 54 8561 916 178 70 31684 1460 181 61 3761 11041 Σ X 1697 Σ Y 563 Σ X 8867 Σ XY 9638 ΣX 1697 X 169.7 ΣXY nx Y 9638 10 169.7 56.3 n 10 β 8867 10 169.7 ΣY 563 ΣX nx Y 56.3 1.078655997... n 10 1.08 α Y βx 56.3 1.078.. 169.7 16.7 The regression equation is 1.08 16.7 OR w 1.08h 16.7 STSD 3

Eamples STSD-I3 continued. Use the regression equation to predict the following: (i) The weight of a student who is 160cm tall. h 160 w 1.08h 16.7 1.08 160 16.7 46.1kg The student should weigh approimatel 46.1kg. (ii) The weight of a student who is 175cm tall. h 175 w 1.08h 16.7 1.08 175 16.7 6.3kg The student should weigh approimatel 6.3kg. (iii) The weight of a student who is 185cm tall. Can not be predicted as input is outside range of recorded heights. (iv) The height of a student who weighs 65kg. Regression equation can not be used to predict height from weight. Eercise STSD-I 1. Determine the regression equation for each the following data sets. (Use sums calculated in the previous eercise and the equations to predict the requested values.) (a) Temperature ( C) 9 13 14 11 8 6 5 Hot drink sales (cups) 45 34 3 4 51 60 64 (i) (ii) Predict the drink sales when the temperature is 10 C. Predict the drink sales if the temperature is 5 C. (b) Speed (km/hr) 50 55 65 70 80 100 10 130 Fuel econom (km/l) 18.9 18.6 18.1 17.3 16.7 14.7 13. 11. (i) (ii) Predict the fuel econom for a speed of 75km/hr. Predict what speed a car would be travelling if it was getting 17.5km/l.. Data collected of students results for sitting Test 1 and the final eam (eercise STSD H1 question 1) X : Test 1 results (out of 15 marks) Y : Final eam result (out of 50 marks) Σ X 99, Σ Y 649, Σ X 9849, Σ Y 46387, Σ XY 137, n 10 Find the regression equation for this data. STSD 4

STSD-J Summar The mean is a measure of central tendenc. Standard deviation measures spread or dispersion of a data set. The coefficient of variation, CV, gives the standard deviation as a percentage of the mean of the data set. The z-score indicates how far, the number of standard deviations, a raw score deviates from the mean of the data set. The following formulae can be used to calculate the the given statistical measures. Statistical Measure Population Formula Sample Formula Mean Mean from frequenc table μ N n f f μ f f Standard Deviation ( Σ) Σ N σ N s Σ ( Σ) n 1 n Standard Deviation from frequenc table σ Σf ( Σ ) f Σf Σf s Σf ( Σf) Σf Σf 1 Coefficient of Variation CV σ 100% CV s 100% μ z-score μ z σ z s Raw Score from z-score μ + σ z + s z The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of one. The distance between the mean and a given z-score corresponds to a proportion of the total area under the curve, and hence can be related to a proportion of a population. The total area under a normal distribution curve is taken as equal to 1 or 100%. The values in the Normal Distribution Areas table give a proportion value for the area between the mean and the raw score greater than the mean, converted to a positive z-score. + z In a normal distribution approimatel: 68% of values lie within ± 1 s.d. of the mean; 95% of values lie within ± s.d. of the mean; and 99% of values lie within ± 3 s.d. of the mean. 1 Chebshev s theorem states: for an data set, at least ( 1 k ) deviations either side of the mean. ( k > 1). of the values lie within k standard STSD 5

A scatterplot is a graph that is created b plotting one variable, quantit, on the horizontal ais and the other on the vertical ais. To sketch a line of best fit for a scatterplot: 1. calculate the mean of the independent variable values,, and the mean of the dependent variable values, ;. plot this mean point, (, ) to the scatterplot; 3. sketch a line through the mean point, that has a slope that follows the general trend of the points of scatterplot. Correlation is a measure of the relationship between two measures, variables, on sets of data. Correlation can be positive or negative. no correlation perfect positive correlation positive correlation perfect negative correlation negative correlation A correlation coefficient measures the strength of the linear relationship between variables. The most widel used tpe of correlation coefficient is Pearson s, r, simple linear correlation. The value of r is determined with the calculation formula r nσxy ΣXΣY ( nσx ( ΣX) ) nσy ( ΣY) ( ) r, sometimes referred to as the co-efficient of determination, represents the proportion (percentage) of the relationship between the variables that can be eplained b a linear relationship. The relationship between two sets of data can be represented b a linear equation called a regression equation. ΣXY nx Y α + β where β ΣX nx α Y βx A regression equation can be used to predict dependent variables from independent inputs within the range of the scatterplot values. It should not be used: to predict, the independent variable, given, the dependent variable. to predict outside the bounds of given values. The stronger the correlation between the two variables the better the prediction made b the regression equation. STSD 6

STSD-K Review Eercise 1. For each of the following data sets calculate (i) the mean (ii) the standard deviation (iii) the coefficient of variation. (a) (b) (c) Store Sales for a week $55 $698 $547 $70 $645 $451 Student Mark in a 5 Mark Test Mark Frequenc 0 1 1 4 3 10 4 8 5 5 Dail Rainfall in millimetres Rainfall (mm) Frequenc (das) 0 4 5 9 8 10 14 4 15 19 3 0 4 4. A soft-drink filling machine uses cans with a maimum capacit of 340ml. The machine is set to output softdrink with a mean capacit of 330ml. It has been found that due to machine error the amount outputted varies with a standard deviation of 8ml and the amount outputted is normall distributed. (a) What proportion of cans will have between 330 ml and 340ml of softdrink? (b) What percentage of cans will have between 35ml and 340ml? (c) What percentage of cans will overflow? (d) If the smallest 5% of drinks must be rejected, what is the smallest amount which will be accepted? 3 (a) If a set of data has a mean of 76 and a standard deviation of 8.8, what is the interval that should contain at least 75% of the data? (b) A data set has a mean of 87 and a standard deviation of 98. At least what percentage of values should lie been 58 and 107? (c) A set of data has a mean of 468. If 89% of the data values lie between 336 and 600, what is the standard deviation for the data set? STSD 7

Eercise STSD-K continued 4. (a) Match each of the correlation coefficients with a scatterplot below. (i) r 0.6 (iv) r 0.9 (ii) r 0 (v) r 1 (iii) r 0.9 (b) Which scatterplot best approimates the correlation between each of the two variables below? (a scatterplot ma be used more than once) (i) das on a good diet, weight (ii) temperature outside, temperature in a non-air conditioned car (iii) hand span, height (iv) rainfall, level of water in river (v) length of finger nails, intelligence A B C D E F G 5. The average test results for a standard eamination and corresponding class size were recorded for five schools. The results are summarised in the table below. School Class Size Test Result A 8 8% B 33 50% C 5 80% D 14 98% E 0 90% Use the class size/test result data to: (a) construct a scatterplot (b) draw a line of best fit (c) determine the correlation coefficient (d) comment on the correlation (e) determine the regression equation (f) use the regression equation to predict (i) the epected result if a school had a class size of 30 students (ii) the epected result for a class of 10 students (iii) how man student there would be in a class if the test result was 75% STSD 8

STSD-L Appendi Normal Distribution Areas Table 0 z z 0.00 0.01 0.0 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.0 0.0000 0.0040 0.0080 0.010 0.0160 0.0199 0.039 0.079 0.0319 0.0359 0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753 0. 0.0793 0.083 0.0871 0.0910 0.0948 0.0987 0.106 0.1064 0.1103 0.1141 0.3 0.1179 0.117 0.155 0.193 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517 0.4 0.1554 0.1591 0.168 0.1664 0.1700 0.1736 0.177 0.1808 0.1844 0.1879 0.5 0.1915 0.1950 0.1985 0.019 0.054 0.088 0.13 0.157 0.190 0.4 0.6 0.57 0.91 0.34 0.357 0.389 0.4 0.454 0.486 0.517 0.549 0.7 0.580 0.611 0.64 0.673 0.704 0.734 0.764 0.794 0.83 0.85 0.8 0.881 0.910 0.939 0.967 0.995 0.303 0.3051 0.3078 0.3106 0.3133 0.9 0.3159 0.3186 0.31 0.338 0.364 0.389 0.3315 0.3340 0.3365 0.3389 1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.361 1.1 0.3643 0.3665 0.3686 0.3708 0.379 0.3749 0.3770 0.3790 0.3810 0.3830 1. 0.3849 0.3869 0.3888 0.3907 0.395 0.3944 0.396 0.3980 0.3997 0.4015 1.3 0.403 0.4049 0.4066 0.408 0.4099 0.4115 0.4131 0.4147 0.416 0.4177 1.4 0.419 0.407 0.4 0.436 0.451 0.465 0.479 0.49 0.4306 0.4319 1.5 0.433 0.4345 0.4357 0.4370 0.438 0.4394 0.4406 0.4418 0.449 0.4441 1.6 0.445 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.455 0.4535 0.4545 1.7 0.4554 0.4564 0.4573 0.458 0.4591 0.4599 0.4608 0.4616 0.465 0.4633 1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706 1.9 0.4713 0.4719 0.476 0.473 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767.0 0.477 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.481 0.4817.1 0.481 0.486 0.4830 0.4834 0.4838 0.484 0.4846 0.4850 0.4854 0.4857. 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916.4 0.4918 0.490 0.49 0.495 0.497 0.499 0.4931 0.493 0.4934 0.4936.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.495.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.496 0.4963 0.4964.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.497 0.4973 0.4974.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981.9 0.4981 0.498 0.498 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986 3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990 STSD 9

STSD-Y Inde Topic Page Chebshev's Theorem... STSD 15 Coefficient of Determination... STSD Co-efficient of Variation... STSD 7 Correlation... STSD 0 Correlation Coefficient... STSD 1 Deviation... STSD 3 Line of Best Fit... STSD 18 Mean... STSD Mean population... STSD Mean sample... STSD Measure of Central Tendenc... STSD Negative Correlation... STSD 0 No Correlation... STSD 0 Normal Distribution... STSD 11 Normal Distribution table... STSD 1, 9 Population mean... STSD Population standard deviation... STSD 3 Positive Correlation... STSD 0 Raw Score... STSD 10 Regression Equation... STSD 3 Regression Line... STSD 3 Sample mean... STSD Sample standard deviation... STSD 3 Scatterplot... STSD 16 Standard Deviation... STSD 3 Standard Deviation calculation... STSD 4 Standard Deviation population... STSD 3 Standard Deviation sample... STSD 3 Standard Normal Curve... STSD 10 Standard Normal Distribution... STSD 10 Variance... STSD 5 bar... STSD z-scores... STSD 8 STSD 30

STSD 31

SOLUTIONS Statistics and Standard Deviation STSD-Z Solutions STSD -B Calculating Means... STSD 33 STSD -D Calculating Standard Deviations... STSD 34 STSD -E Coefficient of Variation... STSD 35 STSD -F Normal Distribution and z-scores STSD F1 z-scores... STSD 36 STSD F Normal Distributions... STSD 37 STSD -G Chebshev s Theorem... STSD 40 STSD -H Correlation and Scatterplots STSD H1/H Scatterplots/Lines of Best Fit... STSD 41 STSD H3 Correlation... STSD 43 STSD -I Correlation Coefficient and Regression Equation STSD I1 Correlation Coefficient... STSD 44 STSD I Regression Equation... STSD 45 STSD -K Review Eercise... STSD 46 STSD 3

SOLUTIONS Statistics and Standard Deviation STSD B1 Calculating Means (a) 0+ 0+ 1+ 1+ + + + 3+ 3+ 4+ 5 mean n 11 3.09 11 The mean number of hocke goals scored is approimatel.09 goals. (b) Points Scored () Frequenc (f) f 10 1 10 11 0 0 1 4 48 13 1 13 14 3 4 15 1 15 Total Σ f 10 Σ f 18 f mean f 18 10 1.8 (c) The mean number of points scored is 1.8 points. Tping errors () Frequenc (f) f 0 6 0 1 8 8 5 10 3 1 3 Total Σ f 0 Σ f 1 The mean number of tping errors is 1.05 errors. f mean f 1 0 1.05 (d) Bab Weight (kg) Mid-value () Frequenc (f) f.80.99.8 3.9 5.8 3.00 3.19 3.1 1 3.1 3.0 3.39 3.3 3 9.9 3.40 3.59 3.5 7 3.60 3.79 3.7 5 18.5 3.80 3.99 3.9 7.8 Total Σ f 15 Σ f 5.1 f mean f 5.1 15 The mean bab weight is approimatel 3.47kg. 3.473 3.47 (e) Withdrawals ($) Mid-value () Frequenc (f) f 0 49 0+ 50 5 7 175 50 99 75 9 675 100 149 15 5 65 150 199 175 5 875 00 49 5 450 50 99 75 550 Total Σ f 30 Σ f 3350 f mean f 3350 The mean withdrawal was approimatel $111.67. 30 111.6 111.67 STSD 33

SOLUTIONS Statistics and Standard Deviation STSD-D1 Calculating Standard Deviations (a) 5 5 4 16 3 9 4 4 1 1 0 0 0 0 1 1 4 3 9 Σ 3 Σ 73 ( Σ) Σ n s n 1 3 73 11 11 1 1.58 The standard deviation of the number of goals scored is approimatel 1.58 goals. (b) Points () f f f 10 1 10 100 100 11 0 0 11 0 1 4 48 144 576 13 1 13 169 169 14 3 4 196 588 15 1 15 5 5 Σ f 10 Σ f 18 Σ f 1658 ( Σf) Σf 1658 18 Σf 10 s 1.48 Σf 1 10 1 The standard deviation of number of basketball points is approimatel 1.48 points. (c) Errors () f f f 0 6 0 0 0 1 8 8 1 8 5 10 4 0 3 1 3 9 9 Σ f 0 Σ f 1 Σ f 37 s Σf ( Σf) Σf 1 37 1 0 0.89 0 1 Σf The standard deviation of the number of tping errors is approimatel 0.89 errors. (d) Weights f f f.80.99.9 5.8 8.41 16.8 3.00 3.19 3.1 1 3.1 9.61 9.61 3.0 3.39 3.3 3 9.9 10.89 3.67 3.40 3.59 3.5 7 1.5 4.5 3.60 3.79 3.7 5 18.5 13.69 68.45 3.80 3.99 3.9 7.8 15.1 30.4 Σ f 15 Σ f 5.1 Σ f 18.47 ( Σf) Σf 18.47 5.1 Σf 15 s 0.33 Σf 1 15 1 The standard deviation of the bab weights is approimatel 0.33kg. STSD 34

SOLUTIONS Statistics and Standard Deviation STSD-D1 continued (e) Withdrawals f f f 0 49 5 7 175 65 4375 50 99 75 9 675 565 5065 100 149 15 5 65 1565 7815 150 199 175 5 875 3065 15315 00 49 5 450 5065 10150 50 99 75 550 7565 15150 Σ f 30 Σ f 3350 Σ f 538750 ( Σf) Σf 538750 3350 Σf 30 s 75.35 Σf 1 30 1 The standard deviation of the ATM withdrawals is approimatel $75.35. STSD-E1 Coefficient of Variation 1. (a) 8 + 10 + 9 + 10 + 11 n 5 48 9.6 5 8 + 10 + 9 + 10 + 11 466 ( Σ) Σ n s n 1 48 466 5 5 1 1.14 s CV 100% 1.14 100% 9.6 11.9% (b) 71 7.1 n 10 581 ( Σ) Σ n s n 1 71 581 10 10 1.9 s CV 100%.9 100% 7.1 41.1%. (a) Data Set A: 35 36.1 n 9 11751 ( Σ) Σ n s n 1 11751 9 1 1.36 35 9 s CV 100% 1.36 100% 36.1 3.77% Data Set B: 33 35.8 n 9 1517 ( Σ) Σ n s n 1 1517 9 1 10.75 There is greater relative variation in Data set B. 33 9 s CV 100% 10.75 100% 35.9 9.96% STSD 35

SOLUTIONS Statistics and Standard Deviation STSD-E1 continued. (b) Bos Heights: 141.6cm s 15.1cm Girls Heights: 143.7cm s 8.4cm There is greater relative variation in bos heights. s CV 100% 15.1 100% 141.6 10.7% s CV 100% 8.4 100% 143.7 5.8% STSD-F1 1. z-scores 375 75 n 5 ( ) s n n 1 14.17 56 3136 8 674 74 5476 69 4761 94 8836 Σ 375 Σ 8933 raw score z-score meaning 56 56 75 z 1.34 14.17 1.34 standard deviations below the mean 8 8 75 z 0.49 14.17 0.49 standard deviations above the mean 74 74 75 z 0.07 14.17 0.07 standard deviations below the mean 69 69 75 z 0.4 14.17 0.4 standard deviations below the mean 94 94 75 z 1.34 14.17 1.34 standard deviations above the mean. (i) 81 81 54 8.44 s.d. above the mean z 8.44 3. (ii) 57 57 54 0.94 s.d. above the mean z 0.94 3. 3. 63 58 58 55 z Maths 1.47 z Geograph 1.30 3.4.3 Peter s Maths result is 1.47 standard deviations above the class mean, while his geograph was 1.30 standard deviations above the class mean. So Peter did slightl better with his Maths result compared to the rest of the class. 4. (i) z + s z 54 + 3. 47.6 47.6 is two standard deviations below the mean. (ii) z 1.5 + s z 54 + 3. 1.5 58.8 58.8 is 1.5 standard deviations above the mean. STSD 36

SOLUTIONS Statistics and Standard Deviation STSD-F 1. (i) Normal Distributions 30 50% 50% of the scores are greater than 30, the mean. (ii) z 8 8 30 0.4 0.1554 5 15.54% of the scores are between 8 and 30. (iii) 8 30 30 37 0.5 30 37 37 30 z37 1.4 5 0.419 8.08% of the scores are greater than 37. Area > 37 0.5 0.419 0.0808 (iv) 8 30 8 34 8 30 34 30 z8 0.4 z34 0.8 5 5 0.1554 0.881 44.35% of the score are between 8 and 34. + 30 34 Area between 8 & 34 0.1554 + 0.881 0.4435 (v) 6 8 6 30 8 30 6 30 8 30 z6 0.8 z8 0.4 5 5 0.881 0.1554 13.7% of the score are between 6 and 8. Area between 6 & 8 0.881 0.1554 0.137. (i) 100 13 0.5 100 13 13 100 Area > 13 0.5 0.477 z13 16 0.08 0.477.8% of the population have IQs greater than 13. STSD 37

SOLUTIONS Statistics and Standard Deviation STSD-F. (ii) continued 91 100 50% 91 100 91 100 z91 0.565 16 0.13 Area < 91 0.5 0.13 0.877 8.77% of the population have IQs less than 91. (iii) + 80 10 80 100 100 10 80 100 z80 1.5 16 0.3944 10 100 z10 1.5 16 0.3944 Area between 80 & 10 0.3944 0.7888 78.88% of the population have IQs between 80 and 10. (iv) 80 91 80 100 91 100 80 100 z80 1.5 16 0.3944 91 100 z91 0.565 16 0.13 Area between 80 & 91 0.3944 0.13 0.181 18.1% of the population have IQs between 80 and 91. (v) 75% 50% + 5% ( ) proportion 0.5 above mean + s z z 0.675 100 + 16 0.675 between 0.67 and 0.68 110.8 Shane would have an IQ of over 110. STSD 38

SOLUTIONS Statistics and Standard Deviation STSD-F 3. (a) continued 140 175 50% 140 + 140 175 175 140 z175 1.75 0 0.4599 Area < 175 0.5 + 0.4599 0.9599 95.99% of the bos have a height of less than 175cm. (b) 0.5 140 175 140 175 175 140 Area > 175 0.5 0.4599 Bos > 175 0.0401 400 z175 1.75 0 0.0401 16 0.4599 Approimatel 16 bos have a height of greater than 175cm. (c) 15% 35% 140 proportion 0.35 ( below mean) s z z 1.035 140 0 1.035 between 1.03 and 1.04 119.3 The girls mean height is approimatel 119.3cm. 4. (a) (b) 1.9.4 1.9.4 z1.9 0.83 0.6 0.967 0.5 1.9.4 Area < 1.9 0.5 0.967 0.033 Charlie has approimatel 0% probabilit of winning the race. 0.7.4 0.5 0.7.4 (c) 0.7.4 z0.7.83 0.6 0.4977 Area < 0.7 0.5 0.4977 0.003 Charlie has approimatel 0.3% probabilit of breaking the record..4.5 50%.4.5.4 Area <.5 0.5 + 0.0675 z.5 0.17 0.6 0.5675 0.0675 Charlie epect $4540 in prize mone. +.4.5 Prize 0.5675 80 $100 $4540 STSD 39

SOLUTIONS Statistics and Standard Deviation STSD-G1 1.. z 330.5 Chebshev s Theorem 330.5 434 537.5 434 1.5 z537.5 1.5 69 69 1 1 1.5 1 0.4 55.5% 1 89% 1 k 1 0.89 1 k 1 1 0.89 0.11 k 1 k 9 0.11 k ± 3 At least 56% of the cattle would have weights between 330.5 kg and 537.5 kg. z + 3 + s z 74 + 4.5 3 87.5 z 3 s z 74 4.5 3 60.5 At least 89% of the pensioners would have ages between 60.5 ears and 87.5 ears. 3. 1 75% 1 k 1 0.75 1 k 1 1 0.75 0.5 k 1 k 4 0.5 k ± z + s z z s z 1090 994 + s OR 898 994 s 1090 994 s s 994 898 96 96 s 48 s 48 The standard deviation is 48ml. 4. 17 50 83 50 z17 3 z83 3 11 11 1 1 3 At least 89% of the test result would be between 17 and 83, therefore 100% 89% 11% of the results will be less than 17 1 0.1 and greater than 83 marks 89% STSD 40

SOLUTIONS Statistics and Standard Deviation STSD-H1/H 1. Scatterplots/Lines of Best Fit Student A B C D E F G H I J Test 1 ( /50) 33 36 15 9 16 9 44 30 44 3 Final Eam % 75 87 34 56 39 45 9 69 93 59 99 649 mean test1 9.9 mean test1 64.9 10 10 Scatterplot of Final Eam against Test 1 Final Eam Result (%) 100 90 80 70 60 50 40 30 0 10 0 ( 9.9, 64.9) 0 10 0 30 40 50 Test 1 Results (/50). Da Mon Tue Wed Thur Fri Sat Sun Temperature ( C) 4 8 30 33 3 35 31 Softdrink Sales (cans) 17 7 9 30 36 9 13 mean temp 30.43 7 190 mean sales 7.14 7 Scatterplot of Drink Sales against Temperature Softdrink Sales (cans) 40 35 30 5 0 ( 30.43,7.14) 15 0 5 30 35 40 Temperature (oc) STSD 41

SOLUTIONS Statistics and Standard Deviation STSD-H1/H continued 3. Name Ann Lee Ma Jan Tom Wes Height (cm) 176 181 173 169 178 180 Shoe Size 9 9.5 8.5 8 10.5 10 1057 55.5 mean height 176.17 mean shoesize 9.5 6 6 Shoe Size against Height 11 10 Shoe Size 9 8 ( 176.17,9.5) 7 6 168 170 17 174 176 178 180 18 Height (cm) 4. Speed (km/hr) 50 55 65 70 80 100 10 130 Fuel Econom (km/l) 18.9 18.6 18.1 17.3 16.7 14.7 13. 11. 670 18.7 mean speed 83.75 mean econom 16.09 8 8 Fuel Econom against Speed 0 18 Fuel Econom (km/l) 16 14 1 ( 83.75,16.09) 10 40 60 80 100 10 140 Speed (km/hr) STSD 4

SOLUTIONS Statistics and Standard Deviation STSD-H3 Correlation 1. A perfect positive B- negative C - positive D - positive E - negative F perfect negative G no correlation H perfect positive. (i) Height at 4ears, height at 16 ears. C (ii) Age of a used car, price of a used car. E (iii) Temperature at 6am, temperature at 3pm. C (iv) Shoe size of mother, number of children in famil. G (v) Average eam result, class size. E 3. 1. positive correlation the better a student did in Test 1 the better the did for the final eamination 3. positive correlation the taller a person the larger their shoe size. Shoe Size against Height Scatterplot of Final Eam against Test1 11 100 10 90 Final Eam (%) 80 70 60 50 Shoe Size 9 8 7 40 30 6 10 0 30 40 50 168 170 17 174 176 178 180 18 Test 1 Results (/50) Height (cm). positive correlation the hotter the temperature the more softdrinks sold. Scatterplot of Drink Sales against Temperature 4. negative correlation the faster a car the less economical it is. Fuel Econom against Speed 40 0 35 18 Softdrink Sales (cans) 30 5 0 Fuel Econom (km/l) 16 14 1 15 10 0 4 6 8 30 3 34 36 40 60 80 100 10 140 Temperature (oc) Speed (km/hr) STSD 43

SOLUTIONS Statistics and Standard Deviation STSD-I1 Correlation Coefficient 1. Temperature, X Hot Drink Sales Y X Y XY 9 45 81 05 405 13 34 169 1156 44 14 3 196 104 448 11 4 11 1764 46 8 51 64 601 408 6 60 36 3600 360 5 64 5 4096 30 Σ X 66 Σ Y 38 Σ X 69 Σ Y 1666 Σ XY 845 r 7 845 66 38 ( 7 69 66 )( 7 1666 38 ) 0.9901 There is a strong negative correlation, -0.9901, between the sales of hot drinks and the temperature. r 0.9803. Speed, X Econom, Y X Y XY 50 18.9 500 357.1 945 55 18.6 305 345.96 103 65 18.1 45 37.61 1176.5 70 17.3 4900 99.9 111 80 16.7 6400 78.89 1336 100 14.7 10000 16.09 1470 10 13. 14400 174.4 1584 130 11. 16900 15.44 1456 Σ X 670 Σ Y 18.7 Σ X 6350 Σ Y 14.73 Σ XY 1001.5 r 8 1001.5 670 18.7 ( 8 6350 670 )( 8 14.73 18.7 ) 0.99195 There is a strong negative correlation, 0.99, between the speed and the fuel econom. r 0.984 3. r 10 137 99 649 ( 10 9849 99 )( 10 46387 649 ) 0.930 There is a positive correlation, 0.93, between the result in test 1 and the result in final eam. r 0.865 STSD 44

SOLUTIONS Statistics and Standard Deviation STSD-I Regression Equation 1. (a) Temperature, X Hot Drink Sales, Y X XY 9 45 81 405 13 34 169 44 14 3 196 448 11 4 11 46 8 51 64 408 6 60 36 360 5 64 5 30 Σ X 66 Σ Y 38 Σ X 69 Σ XY 845 X Y ΣX n 66 7 9.43 ΣY n 38 7 46.86 ΣXY nx Y 845 7 9.43 46.86 α Y βx β ΣX nx 69 7 9.43 46.86 ( 3.57) 9.43 3.57 80.53 The regression equation is 3.57+ 80.53 (i) 10 3.57 10 + 80.53 45 It would be epected that 45 hot drinks would be sold at 10 C. (ii) Can not be predicted as input is outside range of recorded temperatures. (b) Speed, X Econom, Y X XY 50 18.9 500 945 55 18.6 305 103 65 18.1 45 1176.5 70 17.3 4900 111 80 16.7 6400 1336 100 14.7 10000 1470 10 13. 14400 1584 130 11. 16900 1456 Σ X 670 Σ Y 18.7 Σ X 6350 Σ XY 1001.5 X Y ΣX n 670 8 83.75 ΣY n 18.7 8 16.09 ΣXY nx Y 1001.5 8 83.75 16.09 β ΣX nx 6350 8 83.75 0.093 The regression equation is 0.093+ 3.88. α Y βx 16.09 ( 0.093) 83.75 3.88 (i) 75 0.093 75 + 3.88 16.91 It would be epected that at 75km/hr the econom would be appro. 16.9km/l. (ii) Regression equation can not be used to predict speed from econom.. X ΣX n 99 10 9.9 Y ΣY n 649 10 64.9 ΣXY nx Y 137 10 9.9 64.9 β ΣX nx 9849 10 9.9.016 ( ) α Y βx 64.9.016 9.9 4.6 The regression equation is.016+ 4.6. STSD 45

SOLUTIONS Statistics and Standard Deviation STSD-K Review Eercise 1. (a) Store Sales for a week (b) Sales () 55 304704 698 48704 547 9909 70 518400 645 41605 451 03401 Σ 3613 Σ 8943 CV n 3613 6 $60.17 s 100% 103.6 100% 60.17 17.15% ( Σ) Σ n s n 1 8943 6 1 103.6 (i) the mean sales is approimatel $60.17 (ii) the standard deviation of the sales is approimatel $103.6 (iii) the coefficient of variation is approimatel 17.15% Student Mark in a 5 Mark Test Mark ( ) Frequenc, ( ) f f 0 1 0 0 0 1 1 4 8 4 16 3 10 30 9 90 4 8 3 16 18 5 5 5 5 15 Σ f 30 Σ f 97 Σ f 361 f 3613 6 (c) f f 97 3.3 30 s Σf ( Σf) Σf 1 97 30 361 30 1 1.8 Σf CV s 100% 1.8 100% 3.3 39.6% (i) the mean mark is approimatel 3.3 (ii) the standard deviation of the marks is approimatel 1.8 (iii) the coefficient of variation is approimatel 39.6% Dail Rainfall in millimetres Rainfall f f 0 4 4 4 8 5 9 7 8 56 49 39 10 14 1 4 48 144 576 15 19 17 3 51 89 867 0 4 4 88 484 1936 Σ f 1 Σ f 47 Σ f 3779 f f ( Σf) Σf s f Σf s CV 100% 47 Σf 1 11.76 6.61 1 47 100% 3779 1 11.76 1 1 56.% 6.61 (i) the mean rainfall was approimatel 11.76mm (ii) the standard deviation of the rainfall was approimatel 6.61mm (iii) the coefficient of variation is approimatel 56.% STSD 46

SOLUTIONS Statistics and Standard Deviation STSD-K. (a) continued 330 340 340 330 z340 1.5 8 0.3944 A proportion of 0.3944 of the cans between 330 ml and 340ml of softdrink. (b) + 35 340 35 330 330 340 (c) 35 330 z35 0.65 8 0.340 ( between 0.6 & 0.63) 340 330 z340 1.5 8 0.3944 6.84% of the cans between 35 ml and 340ml of softdrink. 330 340 340 330 z340 1.5 8 0.3944 0.5 Area > 340 0.5 0.3944 0.1056 10.56% of the cans will overflow. Area between 35 ml & 340ml 0.340 + 0.3944 0.684 330 340 (d) 5% 330 45% ( ) proportion 0.45 below mean z 1.645 s z 330 8 1.645 between 1.64 and 1.65 316.84 The smallest amount of softdrink that would be accepted would be 316.84ml. 3. (a) (b) 1 75% 1 k 1 0.75 1 k 1 1 0.75 0.5 k 1 k 4 k ± 0.5 z z 58 107 58 87.5 98 107 87.5 98 z + s z 76 + 8.8 133.6 z s z 76 8.8 18.4 At least 75% of values would lie between 18.4 and 133.6. 1 1.5 1 0.16 84% At least 84% of the values should lie been 58 and 107. STSD 47

SOLUTIONS Statistics and Standard Deviation STSD-K continued 3. (c) 1 z + 3 z 3 89% 1 k s z s z 1 600 468 + s 3 OR 336 468 s 3 0.89 1 k 600 468 s 3s 468 336 1 13 13 1 0.89 0.11 s 44 s 44 k 3 3 1 k 9 k ± 3 0.11 The standard deviation is 44. 4. (a) (i) r 0.6 C (iv) r 0.9 D (ii) r 0 G (v) r 1 F (iii) r 0.9 E (b) (i) Das on a good diet, weight B (ii) temperature outside, temperature in a non-airconditioned car D (iii) hand span, height C (iv) rainfall, level of water in river C (v) length of finger nails, intelligence G 5. School Class Size, X Test Result, Y X Y XY A 8 8 784 674 96 B 33 50 1089 500 1650 C 5 80 65 6400 000 D 14 98 196 9604 137 E 0 90 400 8100 1800 Σ X 10 Σ Y 400 Σ X 3094 Σ Y 3338 Σ XY 9118 (a)/(b) 110 100 10 X 4 5 400 Y 80 5 Scatterplot of Test Result against Class size Test result (%) 90 80 70 60 50 ( 4,80) 40 10 15 0 5 30 35 Class Size STSD 48

SOLUTIONS Statistics and Standard Deviation STSD-K 5. (c) (d) (e) r continued 5 9118 10 400 ( 5 3094 10 )( 5 3338 400 ) 0.904 There is a negative correlation between the class size and test result. The smaller the class the better the test result. r 0.8175 ΣXY nx Y 9118 5 4 80 α Y βx β ΣX nx 3094 5 4 80 (.5) 4.5 134.05 The regression equation is.5+ 134.05 (f) (i) 30.5 30 + 134.05 66.5 (ii) (iii) It would be epected that the test result would be 66.5% when the class size was 30 students. Can not be predicted as input is outside range of recorded class sizes. Regression equation can not be used to predict class size from test result. STSD 49