Statistics 13 Elementary Statistics


Statistics 13 Elementary Statistics
Summer Session I 2012
Lecture Notes 2: Methods for Describing Data
(Last update: June 25, 2012)

Describing Qualitative Data

Definition 2.1 A class is one of the categories into which qualitative data can be classified.

Definition 2.2 The class frequency is the number of observations in the data set that fall into a particular class.

Definition 2.3 The class relative frequency is the class frequency divided by the total number of observations in the data set; that is,

$$\text{class relative frequency} = \frac{\text{class frequency}}{\text{total number of observations}}$$

Definition 2.4 The class percentage is the class relative frequency multiplied by 100; that is,

$$\text{class percentage} = (\text{class relative frequency}) \times 100$$

Summary of Graphical Descriptive Methods for Qualitative Data

Bar Graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage.

Pie Chart: The categories (classes) of the qualitative variable are represented by slices of a pie (circle). The size of each slice is proportional to the class relative frequency.

Pareto Diagram: A bar graph with the categories (classes) of the qualitative variable (i.e., the bars) arranged by height in descending order from left to right.
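Definitions 2.2 to 2.4 and the Pareto ordering can be illustrated with a short sketch. The observations below are made up purely for illustration (they are not data from these notes), and the function name `class_summary` is likewise a hypothetical helper.

```python
from collections import Counter

# Hypothetical qualitative observations, invented for illustration only.
observations = ["traffic", "child care", "traffic", "weather",
                "overslept", "traffic", "child care", "traffic"]

def class_summary(data):
    """Return (class, frequency, relative frequency, percentage) rows,
    sorted by descending frequency as in a Pareto diagram."""
    counts = Counter(data)
    n = len(data)
    rows = []
    for label, freq in counts.most_common():
        rel = freq / n                    # Definition 2.3
        rows.append((label, freq, rel, 100 * rel))  # Definition 2.4
    return rows

summary = class_summary(observations)
for label, freq, rel, pct in summary:
    print(f"{label:<12}{freq:>3}{rel:>8.3f}{pct:>7.1f}%")
```

The class relative frequencies always sum to 1 and the class percentages to 100, which is a quick check on any such table.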

[Figure: Income of the patients, Control and Treatment groups. Examples of pie charts (top) and bar graphs (bottom). Income classes: Under $25,000; $25,000 to $50,000; $50,001 to $75,000; $75,001 to $100,000; Above $100,000; Prefer not to answer.]

[Figure: Reasons for arriving late at work (from Wikipedia). Example of a Pareto diagram.]

Describing Quantitative Data

Summary of Graphical Descriptive Methods for Quantitative Data

Dot Plot: The numerical value of each quantitative measurement in the data set is represented by a dot on a horizontal scale. When data values repeat, the dots are placed above one another vertically.

Stem-and-Leaf Display: The numerical value of the quantitative variable is partitioned into a stem and a leaf. The possible stems are listed in order in a column. The leaf for each quantitative measurement in the data set is placed in the corresponding stem row. Leaves for observations with the same stem value are listed in increasing order horizontally.

Histogram: The possible numerical values of the quantitative variable are partitioned into class intervals, each of which has the same width. These intervals form the scale of the horizontal axis. The frequency or relative frequency of observations in each class interval is determined. A vertical bar is placed over each class interval, with the height of the bar equal to either the class frequency or the class relative frequency.
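The histogram construction can be sketched as a table of class intervals and counts. The example uses the 27 ages listed in Example 2 below; the interval width of 10, the starting point 0, and the left-closed convention [low, high) are assumptions made for this sketch, since the notes do not state a boundary convention.

```python
# Ages of the 27 Alcan residents, copied from Example 2.
ages = [45, 1, 52, 42, 10, 40, 50, 40, 7, 46, 19, 35, 3, 11,
        31, 6, 41, 12, 43, 37, 8, 41, 48, 42, 55, 30, 58]

def histogram_table(values, start, width, n_classes):
    """Count observations in equal-width class intervals.

    Assumption: each interval is closed on the left and open on the
    right, [low, high).
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    n = len(values)
    # Each row: (lower limit, upper limit, frequency, relative frequency)
    return [(start + i * width, start + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]

table = histogram_table(ages, start=0, width=10, n_classes=6)
for low, high, freq, rel in table:
    print(f"[{low:>2}, {high:>2})  {freq:>2}  {rel:.3f}  {'*' * freq}")
```

A different boundary convention (for example, intervals closed on the right) moves values such as 10, 30, 40 and 50 into the neighbouring class, so the bar heights depend on that choice.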

Dotplots

Example 1 The outbreak of food poisoning on a sports day, Thailand 1990.

[Figure: Dot plots. Panel titles: "Age by sex" (groups F and M) and "Distribution of birthdate"; axis label: Frequency.]

Stem-and-Leaf Display

Example 2 The following data show the ages of the 27 residents of Alcan, Alaska. (Source: U.S. Bureau of the Census)

    45   1  52  42  10  40  50  40   7
    46  19  35   3  11  31   6  41  12
    43  37   8  41  48  42  55  30  58

The stem-and-leaf display for the data:

    0 | 13678
    1 | 0129
    2 |
    3 | 0157
    4 | 0011223568
    5 | 0258
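The stem-and-leaf display in Example 2 can be reproduced with a few lines of code. This is a minimal sketch that assumes two-digit non-negative integers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# Ages of the 27 Alcan residents, copied from Example 2.
ages = [45, 1, 52, 42, 10, 40, 50, 40, 7, 46, 19, 35, 3, 11,
        31, 6, 41, 12, 43, 37, 8, 41, 48, 42, 55, 30, 58]

def stem_and_leaf(values):
    """Return the display as a list of text rows, one per stem."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):              # sorting keeps leaves in increasing order
        leaves_by_stem[v // 10].append(v % 10)
    rows = []
    # List every possible stem, including stems with no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

display = stem_and_leaf(ages)
print("\n".join(display))
```

Listing the empty stem 2 matters: leaving it out would hide the gap between the children and the adults in this data set.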

Histograms

Example 3 Using the age data from above.

[Figure: Two histograms of age, one with Frequency and one with Relative Frequency on the vertical axis; age on the horizontal axis.]

The Meaning of Summation Notation $\sum_{i=1}^{n} x_i$

Sum the measurements of the variable that appears to the right of the summation symbol, beginning with the first measurement and ending with the $n$th measurement.

Example 4 A data set contains the observations 5, 1, 3, 2, 1. Then we set $x_1 = 5$, $x_2 = 1$, $x_3 = 3$, $x_4 = 2$, $x_5 = 1$. Then

a. $\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 5 + 1 + 3 + 2 + 1 = 12$

b. $\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 5^2 + 1^2 + 3^2 + 2^2 + 1^2 = 25 + 1 + 9 + 4 + 1 = 40$

c. $\sum_{i=1}^{5} (x_i - 1) = (x_1 - 1) + (x_2 - 1) + (x_3 - 1) + (x_4 - 1) + (x_5 - 1) = (x_1 + x_2 + x_3 + x_4 + x_5) - (1 + 1 + 1 + 1 + 1) = \sum_{i=1}^{5} x_i - 5 = 12 - 5 = 7$

d. $\sum_{i=1}^{5} (x_i - 1)^2 = (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 + (x_4 - 1)^2 + (x_5 - 1)^2 = 4^2 + 0^2 + 2^2 + 1^2 + 0^2 = 21$

e. $\left( \sum_{i=1}^{5} x_i \right)^2 = (x_1 + x_2 + x_3 + x_4 + x_5)^2 = (5 + 1 + 3 + 2 + 1)^2 = 12^2 = 144$

Definition 2.5 The mean of a set of quantitative data is the sum of the measurements, divided by the number of measurements contained in the data set.

Formula for a Sample Mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Symbols for the Sample Mean and the Population Mean

$\bar{x}$ = Sample mean
$\mu$ = Population mean
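The five sums in Example 4 and the sample mean of Definition 2.5 translate directly into code. The sketch below uses the observations from Example 4; the variable names are chosen here for illustration.

```python
# Observations from Example 4.
x = [5, 1, 3, 2, 1]
n = len(x)

sum_x = sum(x)                                  # (a) sum of x_i
sum_x_squared = sum(v ** 2 for v in x)          # (b) sum of x_i squared
sum_x_minus_1 = sum(v - 1 for v in x)           # (c) sum of (x_i - 1)
sum_sq_x_minus_1 = sum((v - 1) ** 2 for v in x) # (d) sum of (x_i - 1) squared
square_of_sum = sum(x) ** 2                     # (e) square of the sum

sample_mean = sum_x / n                         # Definition 2.5

print(sum_x, sum_x_squared, sum_x_minus_1, sum_sq_x_minus_1, square_of_sum)
print(sample_mean)
```

Parts (b) and (e) are worth comparing: the sum of the squares (40) is not the same quantity as the square of the sum (144), and the variance shortcut formula later in these notes uses both.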

Definition 2.6 The median of a quantitative data set is the middle number when the measurements are arranged in ascending (or descending) order.

Calculating a Sample Median M

Arrange the n measurements from the smallest to the largest.
1. If n is odd, M is the middle number.
2. If n is even, M is the mean of the middle two numbers.

Definition 2.7 A data set is said to be skewed if one tail of the distribution has more extreme observations than the other tail.

[Figure: Three relative frequency curves showing rightward skewness, symmetry, and leftward skewness, with the positions of the mean and the median marked on each.]

Definition 2.8 The mode is the measurement that occurs most frequently in the data set.

Definition 2.9 The range of a quantitative data set is equal to the largest measurement minus the smallest measurement.

Definition 2.10 The sample variance for a sample of n measurements is equal to the sum of the squared distances from the mean, divided by (n - 1). The symbol $s^2$ is used to represent the sample variance.

Formula for a Sample Variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

A shortcut formula:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n - 1}$$
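The median, mode, range, and the two variance formulas above can be checked against each other numerically. The following is a minimal sketch using the observations 5, 1, 3, 2, 1 from Example 4; the function names are hypothetical helpers, and `modes` returns a list because a data set can have more than one most frequent value.

```python
import math
from collections import Counter

def median(values):
    """Middle number if n is odd, mean of the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All measurements that occur most frequently."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

def data_range(values):
    """Largest measurement minus smallest measurement."""
    return max(values) - min(values)

def variance_definition(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_shortcut(values):
    """Shortcut: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(v * v for v in values) - sum(values) ** 2 / n) / (n - 1)

x = [5, 1, 3, 2, 1]
print(median(x), modes(x), data_range(x))
print(variance_definition(x), variance_shortcut(x))
```

For this data set both variance formulas give 2.8 up to floating-point rounding: the sum of squares is 40, the squared sum is 144, and (40 - 144/5) / 4 = 2.8.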

Definition 2.11 The sample standard deviation, $s$, is defined as the positive square root of the sample variance, $s^2$, or, mathematically,

$$s = \sqrt{s^2}$$

Symbols for Variance and Standard Deviation

$s^2$ = Sample variance
$s$ = Sample standard deviation
$\sigma^2$ = Population variance
$\sigma$ = Population standard deviation

Numerical Descriptive Measures

Central Tendency: Mean, Median, Mode
Variation: Range, Variance, Standard Deviation

Two ways to interpret the standard deviation: 1. Chebyshev's Rule and 2. Empirical Rule.

1. Chebyshev's rule applies to any data set, regardless of the shape of the frequency distribution of the data.
a. It is possible that very few of the measurements will fall within one standard deviation of the mean.
b. At least 3/4 of the measurements will fall within two standard deviations of the mean.
c. At least 8/9 of the measurements will fall within three standard deviations of the mean.
d. Generally, for any number $k$ greater than 1, at least $(1 - 1/k^2)$ of the measurements will fall within $k$ standard deviations of the mean.

2. The empirical rule is a rule of thumb that applies to data sets with frequency distributions that are mound shaped and symmetric, as follows:

[Figure: Mound-shaped, symmetric relative frequency curve of population measurements.]

a. Approximately 68% of the measurements will fall within one standard deviation of the mean.
b. Approximately 95% of the measurements will fall within two standard deviations of the mean.
c. Approximately 99.7% (essentially all) of the measurements will fall within three standard deviations of the mean.

    Interval                                 Chebyshev's rule          Empirical rule
    $\bar{x} \pm s$   ($\mu \pm \sigma$)     No useful bound           approx. 68%
    $\bar{x} \pm 2s$  ($\mu \pm 2\sigma$)    At least 3/4              approx. 95%
    $\bar{x} \pm 3s$  ($\mu \pm 3\sigma$)    At least 8/9              approx. 99.7%
    $\bar{x} \pm ks$  ($\mu \pm k\sigma$)    At least $1 - 1/k^2$

Example 5 Use Chebyshev's Theorem to give a lower bound on the percent of data in the interval $(\bar{x} - 2.5s, \bar{x} + 2.5s)$.

Answer: At least $1 - 1/2.5^2 = 1 - 0.16 = 0.84 = 84\%$ of the measurements will fall within the interval; i.e., the lower bound is 84%.

Definition 2.12 For any set of n measurements (arranged in ascending or descending order), the pth percentile is a number such that p% of the measurements fall below that number and (100 - p)% fall above it.

Definition 2.13 The sample z-score for a measurement $x$ is

$$z = \frac{x - \bar{x}}{s}$$

The population z-score for a measurement $x$ is

$$z = \frac{x - \mu}{\sigma}$$

Interpretation of z-scores for Mound-Shaped Distributions of Data

1. Approximately 68% of the measurements will have a z-score between -1 and 1.
2. Approximately 95% of the measurements will have a z-score between -2 and 2.
3. Approximately 99.7% (almost all) of the measurements will have a z-score between -3 and 3.

Definition 2.14 An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers typically are attributable to one of the following causes:

1. The measurement is observed, recorded, or entered into the computer incorrectly.
2. The measurement comes from a different population.

3. The measurement is correct, but represents a rare (chance) event.

Definition 2.15 The lower quartile $Q_L$ is the 25th percentile of a data set. The middle quartile $M$ is the median. The upper quartile $Q_U$ is the 75th percentile.

Definition 2.16 The interquartile range (IQR) is the distance between the lower and upper quartiles:

$$\text{IQR} = Q_U - Q_L$$

Elements of a Box Plot

1. A rectangle (the box) is drawn with the ends (the hinges) drawn at the lower and upper quartiles ($Q_L$ and $Q_U$). The median of the data is shown in the box, usually by a line.

2. The points at distances 1.5(IQR) from each hinge mark the inner fences of the data set. Lines (the whiskers) are drawn from each hinge to the most extreme measurement inside the inner fence. Thus,

Lower inner fence = $Q_L - 1.5(\text{IQR})$
Upper inner fence = $Q_U + 1.5(\text{IQR})$

A second pair of fences, the outer fences, appears at a distance of 3(IQR) from the hinges. One symbol (e.g., "*") is used to represent measurements falling between the inner and outer fences, and another (e.g., "0") is used to represent measurements that lie beyond the outer fences. Thus outer fences are not shown unless one or more measurements lie beyond them. We have

Lower outer fence = $Q_L - 3(\text{IQR})$
Upper outer fence = $Q_U + 3(\text{IQR})$

Different symbols can be used to represent the median and the extreme data points. Measurements beyond the outer fences are probably outliers.

Graphing Bivariate Relationships

One way to describe the relationship between two quantitative variables, called a bivariate relationship, is to plot the data in a scattergram (or scatterplot).

[Figure: Three scattergrams: a. Positive relationship, b. Negative relationship, c. No relationship.]
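Returning to the outlier tools defined earlier (the z-scores of Definition 2.13 and the box-plot fences), both screens can be sketched in a few lines using the Alcan ages from Example 2. One assumption should be stated plainly: the quartiles here come from Python's `statistics.quantiles` with its default "exclusive" method, which is only one of several percentile conventions and may differ slightly from the one used in a given textbook or software package. The function names are hypothetical helpers.

```python
import statistics

# Ages of the 27 Alcan residents, copied from Example 2.
ages = [45, 1, 52, 42, 10, 40, 50, 40, 7, 46, 19, 35, 3, 11,
        31, 6, 41, 12, 43, 37, 8, 41, 48, 42, 55, 30, 58]

def z_scores(values):
    """Sample z-scores: (x - mean) / s, with s computed using n - 1."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - mean) / s for v in values]

def box_plot_fences(values):
    """Quartiles, IQR, and the inner and outer fences.

    Assumption: quartiles use the default 'exclusive' method of
    statistics.quantiles; other conventions can shift them slightly.
    """
    q_l, _median, q_u = statistics.quantiles(values, n=4)
    iqr = q_u - q_l
    return {
        "Q_L": q_l,
        "Q_U": q_u,
        "IQR": iqr,
        "inner": (q_l - 1.5 * iqr, q_u + 1.5 * iqr),
        "outer": (q_l - 3 * iqr, q_u + 3 * iqr),
    }

def flag_outliers(values, fences):
    """Split values into those between the fences and those beyond the outer fences."""
    inner_lo, inner_hi = fences["inner"]
    outer_lo, outer_hi = fences["outer"]
    beyond_outer = [v for v in values if v < outer_lo or v > outer_hi]
    between = [v for v in values
               if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi]
    return between, beyond_outer

fences = box_plot_fences(ages)
between, beyond_outer = flag_outliers(ages, fences)
largest_abs_z = max(abs(z) for z in z_scores(ages))

print(fences)
print(between, beyond_outer)
print(round(largest_abs_z, 2))
```

Under these assumptions neither screen flags any of the 27 ages: every value lies inside the inner fences, and no z-score comes close to 3 in absolute value.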