Lesson 4 Measures of Central Tendency



Similar documents
Measures of Central Tendency and Variability: Summarizing your Data for Others

Descriptive statistics parameters: Measures of centrality

Descriptive Statistics

Descriptive Statistics and Measurement Scales

Frequency Distributions

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Lesson 9 Hypothesis Testing

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

AP * Statistics Review. Descriptive Statistics

Summarizing and Displaying Categorical Data

6.4 Normal Distribution

Exploratory Data Analysis. Psychology 3256

AP Statistics Solutions to Packet 2

Chapter 2: Descriptive Statistics

Lesson 7 Z-Scores and Probability

II. DISTRIBUTIONS distribution normal distribution. standard scores

SKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.

Chapter 6: Probability

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Exercise 1.12 (Pg )

Shape of Data Distributions

TEACHER NOTES MATH NSPIRED

Diagrams and Graphs of Statistical Data

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Means, standard deviations and. and standard errors

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Introduction; Descriptive & Univariate Statistics

MEASURES OF CENTER AND SPREAD MEASURES OF CENTER 11/20/2014. What is a measure of center? a value at the center or middle of a data set

2. Filling Data Gaps, Data validation & Descriptive Statistics

3: Summary Statistics

CALCULATIONS & STATISTICS

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

3.2 Measures of Spread

Exploratory Data Analysis

Adding and Subtracting Positive and Negative Numbers

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Lecture 2. Summarizing the Sample

1.5 Oneway Analysis of Variance

Common Tools for Displaying and Communicating Data for Process Improvement

MEASURES OF VARIATION

Data Exploration Data Visualization

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Describing Data: Measures of Central Tendency and Dispersion

Statistics. Measurement. Scales of Measurement 7/18/2012

What Does the Normal Distribution Sound Like?

Lecture 5 : The Poisson Distribution

3 Describing Distributions

Mathematical goals. Starting points. Materials required. Time needed

1 Descriptive statistics: mode, mean and median

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

Unit 7: Normal Curves

Midterm Review Problems

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Introduction to Quantitative Methods

Descriptive statistics; Correlation and regression

EXPLORING SPATIAL PATTERNS IN YOUR DATA

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Normal distribution. ) 2 /2σ. 2π σ

Lecture 1: Review and Exploratory Data Analysis (EDA)

CHAPTER THREE. Key Concepts

Descriptive Statistics

Interpreting Data in Normal Distributions

5/31/ Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

AP STATISTICS REVIEW (YMS Chapters 1-8)

Module 3: Correlation and Covariance

Without data, all you are is just another person with an opinion.

Descriptive Statistics

Statistics Review PSY379

5.1 Identifying the Target Parameter

THE BINOMIAL DISTRIBUTION & PROBABILITY

Independent samples t-test. Dr. Tom Pierce Radford University

1 Nonparametric Statistics

The Normal Distribution

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Correlation key concepts:

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Chapter 4. Probability and Probability Distributions

Chapter 2 Statistical Foundations: Descriptive Statistics

4. Continuous Random Variables, the Pareto and Normal Distributions

Mind on Statistics. Chapter 2

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

Descriptive Analysis

determining relationships among the explanatory variables, and

The Standard Normal distribution

Week 3&4: Z tables and the Sampling Distribution of X

Covariance and Correlation

Bar Graphs and Dot Plots

Variables. Exploratory Data Analysis

Statistics 100 Sample Final Questions (Note: These are mostly multiple choice, for extra practice. Your Final Exam will NOT have any multiple choice!

Transcription:

Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central Tendency Measures of Shape With frequency distribution you can an idea of a distribution s shape. If we trace the outline of the edges of the frequency bars you can idea about the shape. From this point on, I will draw these shapes to illustrate different point throughout the semester. Keep in mind what you are looking at is a line indicating the frequency or how many values in a distribution lie at a particular point on the scale: just like a histogram.

Modality measures the number of major peaks in a distribution. A single major peak is unimodal, and is the most common modality. Two major peaks is a bi-modal distribution. You could also have multi-modal distributions. Skewness measures the symmetry of a distribution. A symmetric distribution is most common, and is not skewed. If the distribution is not symmetric, and one side does not reflect the other, then it is skewed. Skewness is indicated by the tail or trailing frequencies of the distribution. If the tail is to the right it is a positive skew. If the tail is to the left then it is a negatively skewed distribution. For example, a positively skewed distribution would be: 1, 1, 2, 2, 2, 3, 3, 3, 9, 10. The outliers are on the high end of the scale. On the other hand a negatively skewed distribution might be: 1, 2, 9, 9, 9, 10, 10, 10, 11, 11, 11. Here the outliers are on the low end of the scale.

The normal distribution is one that is unimodal and symmetric. Most things we can measure in nature exhibit a normal distribution of values. Regression toward the mean is an idea that states values will tend to cluster around the mean with few values toward the trailing ends or tails of the distribution. As a result, most things we measure will tend to have a normal shape. Think about measures of height. There are very few people that are extremely tall or extremely short, but most tend to cluster around the average. With I.Q. scores, measures of weight, or most anything we can measure, the same pattern will repeat. Since most things we measure have more values close to the mean, we end up with mostly normally shaped distributions. Measures of Central Tendency Knowing where the center of a distribution is tells us a lot about a distribution. That s because most of the scores in a distribution will tend to cluster about the center. Measures of central tendency give us a number that describes where the center lies (and most scores as well). Mean The mean, or average score, is the arithmetic center of the distribution. You can find the mean by adding all the scores ( Σ X ) together and dividing by the number of values you added together (N). X 1 2 3 4 ΣX = 1 N = For the Population: µ = X N = 1 = 3 For the Sample: x = X n = 1 = 3

Note that we calculate the mean the same way for both the sample and the population we symbolize them differently. Other statistics will differ in how they are computed for the sample versus the population. Most students are familiar with these measures of central tendency, but there are several properties that may be new to you. 1) The first property of the mean is that it is the most reliable and most often used measure of central tendency. 2) The second property of the mean is that it need not be an actual score in the distribution. 3) The third property is that it is strongly influenced by outliers. 4) The fourth property is that the sum of the deviations about the mean must always equal zero. The last two properties need further explanation. An outlier is an extreme score. It is a score that lies apart from most of the rest of the distribution. If there are several outliers in a distribution it will often result in skewed shape to the distribution. Outliers tend to pull central tendency measures with them. Thus a distribution of values 1, 2, 3, 4, has an average of 3. Three does a good job of describing where most of the scores in this distribution lie. However, if there is an outlier, say by substituting 2 for the in the above distribution, then the mean changes a great deal. The new distribution 1, 2, 3, 4, 2 has a mean of 7. Seven is not really close to most of the other values in the distribution. Thus, the mean is a poor measure of the center when we have outliers, or a skewed distribution. A deviation is just a difference. A deviation from the mean is the difference between a score and the mean. So, when we say the sum of the deviations about the mean must always equal zero is just a way of saying that there are just as many differences between values above the mean and the center as there are differences between values below the mean and the center. Thus, for a simple distribution 1, 2, 3, 4, the average is 3. Let s use population symbols and say µ = 3. The deviations are the differences between the score and the mean. X X- µ 1 1-3 = -2 2 2-3 = -1 3 3-3 = 0 4 4-3 = 1-3 = 2 µ = 3 Σ( X µ ) = 0 Now if we add these deviations we will always get zero, no matter what original values we use. This concept will be important when we consider standard deviation because we will need to look at differences between values in our distribution and the mean. Median The median is the physical center of the distribution. It is the value in the middle when the values of the distribution are arranged sequentially. The distribution:

1, 1, 2, 2, 3, 4,,,, 6, 7 has a median value of 4 because there are five values above this point and five values below this point in the distribution (1, 2, 2, 3, 4,,,, 6, 7). If you have an even set of numbers then there will be two values that are at the center, and you average these two values together in order to determine the median. For example, if we take out one of the numbers in the distribution so that we have 1, 1, 2, 2, 3, 4,,,, 6 then the two values in the center are 3 and 4 (1, 1, 2, 2, 3, 4,,,, 6). The average is 3. and that is the median. The median is resistant to outliers. That is, outliers will generally not affect the median and it will not be affected as much as the mean. It is possible the median might move slightly in the direction of the skew or outliers in the distribution. Mode The mode is the most frequent value in the distribution. It is simply the value that appears most often. For example in the distribution: 1, 1, 2, 3, 3, 3, 4, 4, 4, 4,, 6, 7 there is only one mode (4). But, in the distribution: 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4,, 6 there are two modes (3 and 4). If there is only one of each value then there is no mode. The mode is not affected by outliers. Since it is only determined by one point on the scale, other values will have no effect. Skewness and Central Tendency We have already discussed how each measure is affected by outliers or skewed distribution. Let s consider this information further. In a positively skewed distribution the outliers will be pulling the mean down the scale a great deal. The median might be slightly lower due to the outlier, but the mode will be unaffected. Thus, with a negatively skewed distribution the mean is numerically lower than the median or mode. The opposite is true for positively skewed distributions. Here the outliers are on the high end of the scale and will pull the mean in that direction a great deal. The median might be slightly affected as well, but not the mode. Thus, with a positively skewed distribution the mean is numerically higher than the median or the mode.