Chapter 4: Average and standard deviation



Similar documents
Exercise 1.12 (Pg )

$ ( $1) = 40

Descriptive statistics; Correlation and regression

WEEK #22: PDFs and CDFs, Measures of Center and Spread

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

MEASURES OF VARIATION

Exploratory Data Analysis. Psychology 3256

Variables. Exploratory Data Analysis

Exploratory data analysis (Chapter 2) Fall 2011

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Statistics Revision Sheet Question 6 of Paper 2

Chapter 1: Exploring Data

Lesson 4 Measures of Central Tendency

Data Exploration Data Visualization

Name: Date: Use the following to answer questions 2-3:

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Descriptive Statistics

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

DATA INTERPRETATION AND STATISTICS

Interpreting Data in Normal Distributions

Lecture 1: Review and Exploratory Data Analysis (EDA)

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Means, standard deviations and. and standard errors

MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Describing and presenting data

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Measures of Central Tendency and Variability: Summarizing your Data for Others

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

AP * Statistics Review. Descriptive Statistics

Northumberland Knowledge

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Statistics Review PSY379

EXPLORING SPATIAL PATTERNS IN YOUR DATA

3.2 Measures of Spread

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Bar Graphs and Dot Plots

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Module 3: Correlation and Covariance

Motion Graphs. It is said that a picture is worth a thousand words. The same can be said for a graph.

Probability and Statistics

AMS 5 CHANCE VARIABILITY

How To Write A Data Analysis

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Frequency Distributions

Characteristics of Binomial Distributions

Chapter 11: r.m.s. error for regression

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

9. Sampling Distributions

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

If A is divided by B the result is 2/3. If B is divided by C the result is 4/7. What is the result if A is divided by C?

First Midterm Exam (MATH1070 Spring 2012)

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Common Tools for Displaying and Communicating Data for Process Improvement

What Does the Normal Distribution Sound Like?

COMMON CORE STATE STANDARDS FOR

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

2. Simple Linear Regression

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Topic 9 ~ Measures of Spread

Describing, Exploring, and Comparing Data

Unit 7: Normal Curves

Student Activity: To investigate an ESB bill

Diagrams and Graphs of Statistical Data

Skewness and Kurtosis in Function of Selection of Network Traffic Distribution

Descriptive Statistics and Measurement Scales

Module 4: Data Exploration

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

CHAPTER THREE. Key Concepts

Exploratory Data Analysis

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

+ Chapter 1 Exploring Data

Chapter 2 Data Exploration

Introduction to Statistics and Quantitative Research Methods

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

Unit 7 Quadratic Relations of the Form y = ax 2 + bx + c

What is the purpose of this document? What is in the document? How do I send Feedback?

Stat 20: Intro to Probability and Statistics

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Common Core Unit Summary Grades 6 to 8

FACTORING QUADRATICS and 8.1.2

Exploratory Data Analysis

MATH 4470/5470 EXPLORATORY DATA ANALYSIS ONLINE COURSE SYLLABUS

RECOMMENDED COURSE(S): Algebra I or II, Integrated Math I, II, or III, Statistics/Probability; Introduction to Health Science

3. What is the difference between variance and standard deviation? 5. If I add 2 to all my observations, how variance and mean will vary?

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

Algebra I Vocabulary Cards

5. Linear Regression

4. Continuous Random Variables, the Pareto and Normal Distributions

Organizing Your Approach to a Data Analysis

Measurement with Ratios

9.1 Measures of Center and Spread

Welcome to Math Video Lessons. Stanley Ocken. Department of Mathematics The City College of New York Fall 2013

Transcription:

Chapter 4: Average and standard deviation Context................................................................... 2 Average vs. median 3 Average................................................................. 4 Median................................................................. Average vs. median......................................................... 6 r.m.s. size 7 Determine the size of numbers?................................................ 8 Getting rid of the signs...................................................... 9 Getting rid of the signs...................................................... 10 Examples................................................................ 11 Standard deviation 12 Interpretation............................................................. 13 68% and 9% rules......................................................... 14 Examples................................................................ 1 Computing the SD......................................................... 16 Example................................................................ 17 Cross-sectional vs. longitudinal 18 Example: HANES.......................................................... 19 Definitions............................................................... 20 Statistical calculator........................................................ 21 1

Context We are looking at summarizing data We saw that the histogram is a nice graph to summarize data Sometimes we want to summarize even further, by just giving two numbers that describe the center and spread of the data for the center of the data we use: average (=mean), median for the spread of the data we use: standard deviation, interquartile range, range Other new terminology: r.m.s. size, cross-sectional studies, longitudinal studies 2 / 21 Average vs. median 3 / 21 Average Averageof alist = sum of entries number of entries Average of a histogram: imagine that the histogram consists of blocks on stiff weightless board. This average is the balance point of the histogram. See examples on overhead (Figure 6 on page 63) 4 / 21 Median Median is a number such that 0% of the data is smaller than this number, and 0% of the data is larger than this number Median of a list: order the entries from small to large, and take the middle entry. (If the list has an even number of entries, you can take the average of the middle two entries) Median of a histogram: find the point on the horizontal axis, so that half the area of the histogram is to the left, and half the area is to the right of that point. (You often have to guess this) See examples on overhead / 21 2

Average vs. median The average and median are both measures of the center of a distribution What are the differences? The average is more affected by the tail of the distribution. The average is pulled towards the tail. See overhead Three cases: Symmetric: average is about the same as the median Long left tail: average is smaller than the median Long right tail: average is larger than the median When to use which one? If the distribution is symmetric, it does not matter much If the distribution is asymmetric, the median is often a better description of the center. Sometimes people give both. 6 / 21 r.m.s. size 7 / 21 Determine the size of numbers? List: 1, -3,, -6, 3 We want to know the size of these numbers - some measure of how far they are from zero Average = 1 + ( 3) + + ( 6) + 3 = 0 = 0 So the average is not a good measure of the size of these numbers. Why? The positives cancel the negatives. So we want to get rid of the negative signs. How? 8 / 21 3

Getting rid of the signs List: 1, -3,, -6, 3 Option (a): We wipe out the signs and then take the average: 1 + 3 + + 6 + 3 = 18 = 3.6 Option (b): We square the entries, then take the average, and then take the square root again: 1 2 + ( 3) 2 + 2 + ( 6) 2 + 3 2 = 80 = 16 = 4 9 / 21 Getting rid of the signs Both options give about the same answer, and are reasonable Option (a) seems easier to compute For option (b) the math works out nicer. So we will use option (b). Option (b) is called r.m.s. size, and r.m.s. stands for root-mean-square : root of the mean (=average) of the squares 10 / 21 Examples Is the r.m.s. size of the following lists around 1, 10 or 20? List: 1,, -7, 8, 10, 9, -6,, 12, -17 List: 22, -18, -33, 7, 31, -12, 1, 24, -6, 16 List: 1, 2, 0, 0, -1, 0, 0, -3, 0, 1 Find the r.m.s. size of the following lists: List: 7, 7, 7, 7 List: 7, -7, 7, -7 11 / 21 4

Standard deviation 12 / 21 Interpretation The standard deviation (SD) is a measure of spread: If the data are spread out widely, then the SD is large If the data are close together, then the SD is small The SD is a measure for how far numbers are from their average. Most entries on the list are about 1 SD away from the average. Very few are more than 2 or 3 SDs away. 13 / 21 68% and 9% rules Rule of thumb that holds for many lists (but not all): About 68% (2 out of 3) of the entries on a list are within 1 SD from the average. The other 32% are farther away. About 9% (19 out of 20) of the entries on a list are within 2 SDs from the average. The other % are farther away. See overhead for pictures 14 / 21 Examples Take a group of 11 year old boys. The average height is 146 cm, and the SD is 8 cm. One boy was 170 cm tall. He was above average by... SDs. Another boy was 1. SDs below average in height. He was... cm tall. Here are the heights of four boys: 10cm, 130cm, 16cm, 140cm. For each boy, say whether his height was unusually short, about average, or unusually high. About what percentage of the boys had heights between 138cm and 14cm? About what percentage of the boys had heights between 130cm and 162cm? 1 / 21 Computing the SD SD = r.m.s. size of deviations from average: Step 1: compute the average of the list Step 2: compute the deviations from the average (deviation from average = entry - average) Step 3: compute the r.m.s. size of the deviations 16 / 21

Example Compute the SD of the following list: 1, 3, 4,, 7 Step 1: compute the average of the list: Ave = 1 + 3 + 4 + + 7 = 20 = 4 Step 2: compute the deviations from the average: -3, -1, 0, 1, 3 Step 3: compute the r.m.s. size of the deviations (take the root of the mean of the squares: ( 3) 2 + ( 1) 2 + 0 2 + 1 2 + 3 2 = 20 = 4 = 2 The SD of the list is 2. 17 / 21 Cross-sectional vs. longitudinal 18 / 21 Example: HANES HANES: Health And Nutrition Examination Survey (1976-1980) They examined a representative cross-section of 20,322 Americans age 1 to 74. They got data about: demographic variables: age, education, income physiological variables: height, weight, blood pressure, cholesterol levels dietary habits levels of lead and pesticides in the blood prevalence of diseases We look at height and weight, see Figure 3 on page 9 19 / 21 Definitions Cross-sectional: different subjects are compared to each other at one point in time Longitudinal: subjects are followed over time, and compared with themselves at different points in time When interpreting effect of age, be careful to know what type of study it is! 20 / 21 6

Statistical calculator We compute the SD as sum of thesquared deviations from theaverage nr. of entries Some calculators compute the SD as sum of thesquared deviations from theaverage nr.of entries 1 The difference between these two methods is usually small. 21 / 21 7