Lecture 3: Measure of Central Tendency

Similar documents
3.2 Measures of Spread

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Lesson 4 Measures of Central Tendency

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY EXERCISE 8/5/2013. MEASURE OF CENTRAL TENDENCY: MODE (Mo) MEASURE OF CENTRAL TENDENCY: MODE (Mo)

Supply Chain Management: Risk pooling

Descriptive Statistics

Business Statistics: Intorduction

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Descriptive statistics parameters: Measures of centrality

Descriptive Statistics and Measurement Scales

Calculation example mean, median, midrange, mode, variance, and standard deviation for raw and grouped data

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Ch. 3.1 # 3, 4, 7, 30, 31, 32

MEASURES OF CENTER AND SPREAD MEASURES OF CENTER 11/20/2014. What is a measure of center? a value at the center or middle of a data set

Lecture 1: Review and Exploratory Data Analysis (EDA)

MEASURES OF VARIATION

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Measures of Central Tendency and Variability: Summarizing your Data for Others

Descriptive Statistics

Midterm Review Problems

Means, standard deviations and. and standard errors

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

MBA 611 STATISTICS AND QUANTITATIVE METHODS

S P S S Statistical Package for the Social Sciences

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Introduction; Descriptive & Univariate Statistics

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Describing Data: Measures of Central Tendency and Dispersion

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Activity 3.7 Statistical Analysis with Excel

3, 8, 8, 6, 4, 2, 8, 9, 4, 5, 1, 5, 7, 8, 9. 3 kg, 11 kg, 5 kg, 20 kg, 11 kg,

AP * Statistics Review. Descriptive Statistics

Levels of measurement in psychological research:

Statistics. Measurement. Scales of Measurement 7/18/2012

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Section 3 Part 1. Relationships between two numerical variables

Frequency Distributions

II. DISTRIBUTIONS distribution normal distribution. standard scores

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey):

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Descriptive Statistics

Exploratory Data Analysis

summarise a large amount of data into a single value; and indicate that there is some variability around this single value within the original data.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Mathematical goals. Starting points. Materials required. Time needed

Activity 4 Determining Mean and Median of a Frequency Distribution Table

Geostatistics Exploratory Analysis

1 Descriptive statistics: mode, mean and median

Interpreting Data in Normal Distributions

2 Describing, Exploring, and

Quantitative Methods for Finance

Describing, Exploring, and Comparing Data

Elementary Statistics

Analyzing and interpreting data Evaluation resources from Wilder Research

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Mathematical goals. Starting points. Materials required. Time needed

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Introduction to Statistics and Quantitative Research Methods

Chapter 21: Savings Models

Statistics Revision Sheet Question 6 of Paper 2

How To Understand And Solve A Linear Programming Problem

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11}

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Mental Questions. Day What number is five cubed? 2. A circle has radius r. What is the formula for the area of the circle?

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Data exploration with Microsoft Excel: univariate analysis

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Name: Date: Use the following to answer questions 3-4:

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Foundation of Quantitative Data Analysis

Lesson outline 1: Mean, median, mode

Thursday, November 13: 6.1 Discrete Random Variables

Exercise 1.12 (Pg )

Alabama Department of Postsecondary Education

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Module 4: Data Exploration

X: Probability:

Normal distribution. ) 2 /2σ. 2π σ

Statistics Review PSY379

Three had equal scores for the Progressive and Humanistic. schools. The other four had the following combination of

Descriptive statistics

Exploratory data analysis (Chapter 2) Fall 2011

Chapter 2 Statistical Foundations: Descriptive Statistics

Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; and Dr. J.A. Dobelman

Probability Distributions

2. Filling Data Gaps, Data validation & Descriptive Statistics

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

AP Statistics Solutions to Packet 2

Survey Research Data Analysis

Introduction to Quantitative Methods

Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

IBM SPSS Direct Marketing 23

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

Transcription:

Lecture 3: Measure of Central Tendency Donglei Du (ddu@unb.edu) Faculty of Business Administration, University of New Brunswick, NB Canada Fredericton E3B 9Y2 Donglei Du (UNB) ADM 2623: Business Statistics 1 / 38

Table of contents 1 Measure of central tendency: location parameter Introduction Arithmetic Mean Weighted Mean (WM) Median Mode Geometric Mean Mean for grouped data The Median for Grouped Data The Mode for Grouped Data Donglei Du (UNB) ADM 2623: Business Statistics 2 / 38

Layout 1 Measure of central tendency: location parameter Introduction Arithmetic Mean Weighted Mean (WM) Median Mode Geometric Mean Mean for grouped data The Median for Grouped Data The Mode for Grouped Data Donglei Du (UNB) ADM 2623: Business Statistics 3 / 38

Introduction Characterize the average or typical behavior of the data. There are many types of central tendency measures: Arithmetic mean Weighted arithmetic mean Geometric mean Median Mode Donglei Du (UNB) ADM 2623: Business Statistics 4 / 38

Arithmetic Mean The Arithmetic Mean of a set of n numbers AM = x 1 +...+x n n Arithmetic Mean for population and sample µ = x = N x i i=1 N n x i i=1 n Donglei Du (UNB) ADM 2623: Business Statistics 5 / 38

Example Example: A sample of five executives received the following bonuses last year ($000): 14.0 15.0 17.0 16.0 15.0 Problem: Determine the average bonus given last year. Solution: x = 14+15+17+16+15 5 = 77 5 = 15.4. Donglei Du (UNB) ADM 2623: Business Statistics 6 / 38

Example Example: the weight example (weight.csv) The R code: > weight <- read.csv("weight.csv") > sec_01a<-weight$weight.01a.2013fall # Mean > mean(sec_01a) #[1] 155.8548 Donglei Du (UNB) ADM 2623: Business Statistics 7 / 38

Will Rogers phenomenon Consider two sets of IQ scores of famous people. Group 1 IQ Group 2 IQ Albert Einstein 160 John F. Kennedy 117 Bill Gates 160 George Washington 118 Sir Isaac Newton 190 Abraham Lincoln 128 Mean 170 Mean 123 Let us move Bill Gates from the first group to the second group Group 1 IQ Group 2 IQ Albert Einstein 160 John F. Kennedy 117 Bill Gates 160 Sir Isaac Newton 190 George Washington 118 Abraham Lincoln 128 Mean 175 Mean 130.75 Donglei Du (UNB) ADM 2623: Business Statistics 8 / 38

Will Rogers phenomenon The above example shows the Will Rogers phenomenon: "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." Donglei Du (UNB) ADM 2623: Business Statistics 9 / 38

Properties of Arithmetic Mean It requires at least the interval scale All values are used It is unique It is easy to calculate and allow easy mathematical treatment The sum of the deviations from the mean is 0 The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero! It is easily affected by extremes, such as very big or small numbers in the set (non-robust). Donglei Du (UNB) ADM 2623: Business Statistics 10 / 38

The sum of the deviations from the mean is 0: an illustration Values deviations 3-2 4-1 8 3 Mean 5 0 Donglei Du (UNB) ADM 2623: Business Statistics 11 / 38

How Extremes Affect the Arithmetic Mean? The mean of the values 1,1,1,1,100 is 20.8. However, 20.8 does not represent the typical behavior of this data set! Extreme numbers relative to the rest of the data is called outliers! Examination of data for possible outliers serves many useful purposes, including Identifying strong skew in the distribution. Identifying data collection or entry errors. Providing insight into interesting properties of the data. Donglei Du (UNB) ADM 2623: Business Statistics 12 / 38

Weighted Mean (WM) The Weighted Mean (WM) of a set of n numbers WM = w 1x 1 +...+w n x n w 1 +...+w n This formula will be used to calculate the mean and variance for grouped data! Donglei Du (UNB) ADM 2623: Business Statistics 13 / 38

Example Example: During an one hour period on a hot Saturday afternoon Cabana boy Chris served fifty drinks. He sold: five drinks for $0.50 fifteen for $0.75 fifteen for $0.90 fifteen for $1.10 Problem: compute the weighted mean of the price of the drinks WM = 5(0.50)+15(0.75)+15(0.90)+15(1.10) 5+15+15+15 = 43.75 50 = 0.875. Donglei Du (UNB) ADM 2623: Business Statistics 14 / 38

Example Example: the above example The R code: > ## weighted mean wt <- c(5, 15, 15, 15)/50 x <- c(0.5,0.75,0.90,1.1) weighted.mean(x, wt) [1] 0.875 Donglei Du (UNB) ADM 2623: Business Statistics 15 / 38

Median The Median is the midpoint of the values after they have been ordered from the smallest to the largest Equivalently, the Median is a number which divides the data set into two equal parts, each item in one part is no more than this number, and each item in another part is no less than this number. Donglei Du (UNB) ADM 2623: Business Statistics 16 / 38

Two-step process to find the median Step 1. Sort the data in a nondecreasing order Step 2. If the total number of items n is an odd number, then the number on the (n+1)/2 position is the median; If n is an even number, then the average of the two numbers on the n/2 and n/2+1 positions is the median. (For ordinal level of data, choose any one on the two middle positions). Donglei Du (UNB) ADM 2623: Business Statistics 17 / 38

Examples Example: The ages for a sample of five college students are: 21, 25, 19, 20, 22 Arranging the data in ascending order gives: 19, 20, 21, 22, 25. The median is 21. Example: The heights of four basketball players, in inches, are: 76, 73, 80, 75 Arranging the data in ascending order gives: 73, 75, 76, 80. The median is the average of the two middle numbers Median = 75+76 2 = 75.5. Donglei Du (UNB) ADM 2623: Business Statistics 18 / 38

One more example Example: Earthquake intensities are measured using a device called a seismograph which is designed to be most sensitive for earthquakes with intensities between 4.0 and 9.0 on the open-ended Richter scale. Measurements of nine earthquakes gave the following readings: 4.5, L, 5.5, H, 8.7, 8.9, 6.0, H, 5.2 where L indicates that the earthquake had an intensity below 4.0 and a H indicates that the earthquake had an intensity above 9.0. Problem: What is the median earthquake intensity of the sample? Solution: Step 1. Sort: L, 4.5, 5.2, 5.5, 6.0, 8.7, 8.9, H, H Step 2. So the median is 6.0 Donglei Du (UNB) ADM 2623: Business Statistics 19 / 38

Example Example: the weight example (weight.csv) The R code: > weight <- read.csv("weight.csv") > sec_01a<-weight$weight.01a.2013fall # Median > median(sec_01a) #[1] 155 Donglei Du (UNB) ADM 2623: Business Statistics 20 / 38

Properties of Median It requires at least the ordinal scale All values are used It is unique It is easy to calculate but does not allow easy mathematical treatment It is not affected by extremely large or small numbers (robust) Donglei Du (UNB) ADM 2623: Business Statistics 21 / 38

Mode The number that has the highest frequency. Donglei Du (UNB) ADM 2623: Business Statistics 22 / 38

Example Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87 The score of 81 occurs the most often. It is the Mode! Donglei Du (UNB) ADM 2623: Business Statistics 23 / 38

Example Example: the weight example (weight.csv) The R code: > weight <- read.csv("weight.csv") > sec_01a<-weight$weight.01a.2013fall # Mode > names(table(sec_01a)[which.max(table(sec_01a))]) #[1] 155 Donglei Du (UNB) ADM 2623: Business Statistics 24 / 38

Properties of Mode Even nominal data have mode(s) All values are used It is not unique Modeless: if all data have different values, such as 1,1,1 Multimodal: if more than one value have the same frequency, such as 1,1,2,2,3. It is easy to calculate but does not allow easy mathematical treatment It is not affected by extremely large or small numbers (robust) Donglei Du (UNB) ADM 2623: Business Statistics 25 / 38

Geometric Mean (GM) Given a of a set of n numbers x 1,...,x n, the geometric mean is given by the following formula: GM = n x 1 x n If we know the initial and final value over a certain period of n (instead of the individual number sin each period), then GM = n final value initial value Donglei Du (UNB) ADM 2623: Business Statistics 26 / 38

Example Example: The interest rate on three bonds was 5%, 21%, and 4% percent. Suppose you invested $10000 at the beginning on the first bond, then switch to the second bond in the following year, and switch again to the third bond the next year. Problem: What is your final return? Solution: Your final return will be 10000 GM 3 = 10,000 1.097 3 = 1321.32, where GM = 3 1.05 1.21 1.04 1.097. Donglei Du (UNB) ADM 2623: Business Statistics 27 / 38

Example Example: the above example The R code: #geometric mean: R does not have a built-in function for #You can install and use another package > library(psych) > rates<-c(1.05, 1.21,1.04) > geometric.mean(rates) [1] 1.097327 Donglei Du (UNB) ADM 2623: Business Statistics 28 / 38

Example Example: The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000. Problem: What is your the the Geometric Mean rate of increase?. Solution: where 835, 000 GM 1 = 8 1.0127 1 = 1.27%. 755,000 Donglei Du (UNB) ADM 2623: Business Statistics 29 / 38

Arithmetic Mean vs Geometric Mean: the AM-GM inequality: If x 1,...,x n 0, then AM = x 1 +x 2 + +x n n with equality if and only if x 1 = x 2 = = x n. n x 1 x 2...x n = GM, Donglei Du (UNB) ADM 2623: Business Statistics 30 / 38

Properties of Geometric Mean Similar to arithmetic mean, expect used in different scenario It requires interval level All values are used It is unique It is easy to calculate and allow easy mathematical treatments Donglei Du (UNB) ADM 2623: Business Statistics 31 / 38

Mean for grouped data The mean of a sample of data organized in a frequency distribution is computed by the following formula: x = f 1x 1 +...+f k x k f 1 +...+f k, where f i is the frequency of Class i and x i is the class mid-point of Class i. Donglei Du (UNB) ADM 2623: Business Statistics 32 / 38

Example Example: Recall the weight example from Chapter 2: class freq. (f i ) mid point (x i ) f i x i [130, 140) 3 135 405 [140, 150) 12 145 1740 [150, 160) 23 155 3565 [160, 170) 14 165 2310 [170, 180) 6 175 1050 [180, 190] 4 185 740 62 9810 The mean for the grouped data is: x = 9810 62 158.2258. The real mean for the raw data is 155.8548. Donglei Du (UNB) ADM 2623: Business Statistics 33 / 38

Median for grouped data: Two-step procedure Step 1: identify the median class, which is the class that contains the number on the n/2 position. Step 2: Estimate the median value within the median class using the following formula: median = L+C n 2 CF, f where L is the lower limit of the median class CF is the cumulative frequency before the median class f is the frequency of the median class C is the class interval or size Donglei Du (UNB) ADM 2623: Business Statistics 34 / 38

Example Example: Recall the weight example from Chapter 2: class freq relative freq. cumulative freq. [130, 140) 3 0.05 3 [140, 150) 12 0.19 15 10 {}}{ [ 150, 160) 23 0.37 38 [160, 170) 14 0.23 52 [170, 180) 6 0.10 58 [180, 190] 4 0.06 62 median = 150+10 62 2 15 156.9565. 23 Donglei Du (UNB) ADM 2623: Business Statistics 35 / 38

An explanation of the median formula L n CF 2 1 2 3 C f U Donglei Du (UNB) ADM 2623: Business Statistics 36 / 38

Mode for grouped data: Two-step procedure Step 1: Identify the modal class, which is the class(es) that has the highest frequency(ies). Step 2: Estimate the modal(s) within the modal class (es) as the class midpoint(s). Donglei Du (UNB) ADM 2623: Business Statistics 37 / 38

Example Example: Recall the weight example from Chapter 2: class freq. (f i ) mid point (x i ) [130, 140) 3 135 [140, 150) 12 145 [150, 160) 23 155 [160, 170) 14 165 [170, 180) 6 175 [180, 190] 4 185 mode = 155. Donglei Du (UNB) ADM 2623: Business Statistics 38 / 38