Chapter 3: Central Tendency

Similar documents
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Lesson 4 Measures of Central Tendency

Descriptive Statistics and Measurement Scales

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

MEASURES OF VARIATION

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Module 3: Correlation and Covariance

Summarizing and Displaying Categorical Data

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Means, standard deviations and. and standard errors

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Measures of Central Tendency and Variability: Summarizing your Data for Others

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

CALCULATIONS & STATISTICS

Northumberland Knowledge

Descriptive Statistics

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY EXERCISE 8/5/2013. MEASURE OF CENTRAL TENDENCY: MODE (Mo) MEASURE OF CENTRAL TENDENCY: MODE (Mo)

Statistics Revision Sheet Question 6 of Paper 2

TEACHER NOTES MATH NSPIRED

3: Summary Statistics

Frequency Distributions

How To Write A Data Analysis

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Statistics Review PSY379

Describing, Exploring, and Comparing Data

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

II. DISTRIBUTIONS distribution normal distribution. standard scores

Lecture 1: Review and Exploratory Data Analysis (EDA)

Midterm Review Problems

Data Exploration Data Visualization

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

CHAPTER THREE. Key Concepts

Statistics. Measurement. Scales of Measurement 7/18/2012

Exercise 1.12 (Pg )

6.4 Normal Distribution

Foundation of Quantitative Data Analysis

Module 4: Data Exploration

Chapter 2 Statistical Foundations: Descriptive Statistics

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Chapter 2: Descriptive Statistics

AP * Statistics Review. Descriptive Statistics

One-Way ANOVA using SPSS SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Interpreting Data in Normal Distributions

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1

DATA INTERPRETATION AND STATISTICS

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Exploratory Data Analysis. Psychology 3256

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Describing Data: Measures of Central Tendency and Dispersion

Introduction to Quantitative Methods

Week 4: Standard Error and Confidence Intervals

Exploratory data analysis (Chapter 2) Fall 2011

Diagrams and Graphs of Statistical Data

Calculation example mean, median, midrange, mode, variance, and standard deviation for raw and grouped data

1.5 Oneway Analysis of Variance

3. What is the difference between variance and standard deviation? 5. If I add 2 to all my observations, how variance and mean will vary?

Study Guide for the Final Exam

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Session 7 Bivariate Data and Analysis

Introduction; Descriptive & Univariate Statistics

Descriptive Statistics

MEASURES OF CENTER AND SPREAD MEASURES OF CENTER 11/20/2014. What is a measure of center? a value at the center or middle of a data set

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

Descriptive statistics parameters: Measures of centrality

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics Chapter 2

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Basic Tools for Process Improvement

Bar Graphs and Dot Plots

Chapter 5 Analysis of variance SPSS Analysis of variance

6 3 The Standard Normal Distribution

Lecture 2. Summarizing the Sample

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS

1 One Dimensional Horizontal Motion Position vs. time Velocity vs. time

1 Descriptive statistics: mode, mean and median

TECHNIQUES OF DATA PRESENTATION, INTERPRETATION AND ANALYSIS

Quantitative vs. Categorical Data: A Difference Worth Knowing Stephen Few April 2005

Chapter 1: Exploring Data

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

Mathematical goals. Starting points. Materials required. Time needed

Distribution is a χ 2 value on the χ 2 axis that is the vertical boundary separating the area in one tail of the graph from the remaining area.

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Transcription:

Chapter 3: Central Tendency

Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that is the best representative for the entire set of data.

Central Tendency (cont'd.) By identifying the "average score," central tendency allows researchers to summarize or condense a large set of data into a single value. Thus, central tendency serves as a descriptive statistic because it allows researchers to describe or present a set of data in a very simplified, concise form. In addition, it is possible to compare two (or more) sets of data by simply comparing the average score (central tendency) for one set versus the average score for another set.

The Mean, the Median, and the Mode It is essential that central tendency be determined by an objective and well-defined procedure so that others will understand exactly how the "average" value was obtained and can duplicate the process. No single procedure always produces a good, representative value. Therefore, researchers have developed three commonly used techniques for measuring central tendency: the mean, the median, and the mode.

The Mean The mean is the most commonly used measure of central tendency. Computation of the mean requires scores that are numerical values measured on an interval or ratio scale. The mean is obtained by computing the sum, or total, for the entire set of scores, then dividing this sum by the number of scores.

The Mean (cont'd.) Conceptually, the mean can also be defined in the following ways: 1. The mean is the amount that each individual receives when the total (ΣX) is divided equally among all N individuals. 2. The mean is the balance point of the distribution because the sum of the distances below the mean is exactly equal to the sum of the distances above the mean.

Changing the Mean Because the calculation of the mean involves every score in the distribution, changing the value of any score will change the value of the mean. Modifying a distribution by discarding scores or by adding new scores will usually change the value of the mean. To determine how the mean will be affected for any specific situation you must consider: 1) how the number of scores is affected, and 2) how the sum of the scores is affected.

Changing the Mean (cont'd.) If a constant value is added to every score in a distribution, then the same constant value is added to the mean. Also, if every score is multiplied by a constant value, then the mean is also multiplied by the same constant value.

When the Mean Won t Work Although the mean is the most commonly used measure of central tendency, there are situations where the mean does not provide a good, representative value, and there are situations where you cannot compute a mean at all. When a distribution contains a few extreme scores (or is very skewed), the mean will be pulled toward the extremes (displaced toward the tail). In this case, the mean will not provide a "central" value.

When the Mean Won t Work (cont'd.) With data from a nominal scale it is impossible to compute a mean, and when data are measured on an ordinal scale (ranks), it is usually inappropriate to compute a mean. Thus, the mean does not always work as a measure of central tendency and it is necessary to have alternative procedures available.

The Median If the scores in a distribution are listed in order from smallest to largest, the median is defined as the midpoint of the list. The median divides the scores so that 50% of the scores in the distribution have values that are equal to or less than the median. Computation of the median requires scores that can be placed in rank order (smallest to largest) and are measured on an ordinal, interval, or ratio scale.

The Median (cont'd.) Usually, the median can be found by a simple counting procedure: 1. With an odd number of scores, list the values in order, and the median is the middle score in the list. 2. With an even number of scores, list the values in order, and the median is half-way between the middle two scores.

The Median (cont'd.) If the scores are measurements of a continuous variable, it is possible to find the median by first placing the scores in a frequency distribution histogram with each score represented by a box in the graph. Then, draw a vertical line through the distribution so that exactly half the boxes are on each side of the line. The median is defined by the location of the line.

The Median (cont'd.) One advantage of the median is that it is relatively unaffected by extreme scores. Thus, the median tends to stay in the "center" of the distribution even when there are a few extreme scores or when the distribution is very skewed. In these situations, the median serves as a good alternative to the mean.

The Mode The mode is defined as the most frequently occurring category or score in the distribution. In a frequency distribution graph, the mode is the category or score corresponding to the peak or high point of the distribution. The mode can be determined for data measured on any scale of measurement: nominal, ordinal, interval, or ratio.

The Mode (cont'd.) The primary value of the mode is that it is the only measure of central tendency that can be used for data measured on a nominal scale. In addition, the mode often is used as a supplemental measure of central tendency that is reported along with the mean or the median.

Bimodal Distributions It is possible for a distribution to have more than one mode. Such a distribution is called bimodal. (Note that a distribution can have only one mean and only one median.) In addition, the term "mode" is often used to describe a peak in a distribution that is not really the highest point. Thus, a distribution may have a major mode at the highest peak and a minor mode at a secondary peak in a different location.

Central Tendency and the Shape of the Distribution Because the mean, the median, and the mode are all measuring central tendency, the three measures are often systematically related to each other. In a symmetrical distribution, for example, the mean and median will always be equal.

Central Tendency and the Shape of the Distribution (cont'd.) If a symmetrical distribution has only one mode, the mode, mean, and median will all have the same value. In a skewed distribution, the mode will be located at the peak on one side and the mean usually will be displaced toward the tail on the other side. The median is usually located between the mean and the mode.

Reporting Central Tendency in Research Reports In manuscripts and in published research reports, the sample mean is identified with the letter M. There is no standardized notation for reporting the median or the mode. In research situations where several means are obtained for different groups or for different treatment conditions, it is common to present all of the means in a single graph.

Reporting Central Tendency in Research Reports (cont'd.) The different groups or treatment conditions are listed along the horizontal axis and the means are displayed by a bar or a point above each of the groups. The height of the bar (or point) indicates the value of the mean for each group. Similar graphs are also used to show several medians in one display.

Chapter 4: Variability

Variability The goal for variability is to obtain a measure of how spread out the scores are in a distribution. A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.

Central Tendency and Variability Central tendency describes the central point of the distribution, and variability describes how the scores are scattered around that central point. Together, central tendency and variability are the two primary values that are used to describe a distribution of scores.

Variability Variability serves both as a descriptive measure and as an important component of most inferential statistics. As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution. In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population.

Variability (cont'd.) When the population variability is small, all of the scores are clustered close together and any individual score or sample will necessarily provide a good representation of the entire set. On the other hand, when variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

Measuring Variability Variability can be measured with The range The standard deviation/variance In both cases, variability is determined by measuring distance.

The Range The range is the total distance covered by the distribution, from the highest score to the lowest score (using the upper and lower real limits of the range).

The Range (cont'd.) Alternative definitions of range: When scores are whole numbers or discrete variables with numerical scores, the range tells us the number of measurement categories. Alternatively, the range can be defined as the difference between the largest score and the smallest score.

The Standard Deviation Standard deviation measures the standard (average) distance between a score and the mean. The calculation of standard deviation can be summarized as a four-step process:

The Standard Deviation (cont'd.) 1. Compute the deviation (distance from the mean) for each score. 2. Square each deviation. 3. Compute the mean of the squared deviations. For a population, this involves summing the squared deviations (sum of squares, SS) and then dividing by N. The resulting value is called the variance or mean square and measures the average squared distance from the mean. For samples, variance is computed by dividing the sum of the squared deviations (SS) by n - 1, rather than N. The value, n - 1, is know as degrees of freedom (df) and is used so that the sample variance will provide an unbiased estimate of the population variance. 4. Finally, take the square root of the variance to obtain the standard deviation.

Properties of the Standard Deviation If a constant is added to every score in a distribution, the standard deviation will not be changed. If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location. The center of the distribution (the mean) changes, but the standard deviation remains the same.

Properties of the Standard Deviation (cont'd.) If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant. Multiplying by a constant will multiply the distance between scores, and because the standard deviation is a measure of distance, it will also be multiplied.

The Mean and Standard Deviation as Descriptive Statistics If you are given numerical values for the mean and the standard deviation, you should be able to construct a visual image (or a sketch) of the distribution of scores. As a general rule, about 70% of the scores will be within one standard deviation of the mean, and about 95% of the scores will be within a distance of two standard deviations of the mean.