1 Central Tendency
2 Central Tendency
- A single summary score that best describes the central location of an entire distribution of scores.
- Measures of central tendency:
  - Mean: the sum of all scores divided by the number of scores.
  - Median: the value that divides the distribution in half when observations are ordered.
  - Mode: the most frequent score.
3 Central Tendency Example: Mode
- 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- Mode: the most frequent observation
- Mode(s) for hotel rates: 264, 317, 384
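As a quick check of the mode(s) above, here is a minimal Python sketch. It assumes the standard-library `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

# Hotel rates copied from the slide (35 observations).
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

# multimode returns all values tied for the highest count,
# in the order in which they first appear in the data.
modes = multimode(hotel_rates)
print(modes)  # [264, 317, 384]
```

Each of the three modes occurs only twice, which illustrates why the mode can be a weak summary for data like these.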
4 Pros and Cons of the Mode
- Pros
  - Good for nominal data.
  - Good when there are two typical scores.
  - Easiest to compute and understand.
  - The score comes from the data set.
- Cons
  - Ignores most of the information in a distribution.
  - Small samples may not have a mode.
5 Central Tendency Example: Median
- 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- The median is the middle value when observations are ordered.
- To find the middle, count in (N+1)/2 scores when observations are ordered lowest to highest.
- Median hotel rate:
  - (35+1)/2 = 18, so the median is the 18th ordered score
  - 317
6 Finding the median with an even number of scores.
- 2, 2, 3, 5, 6, 7, 7, 7, 8, 9
- With an even number of scores, the median is the average of the middle two observations when observations are ordered.
- Find the average of the N/2 and the (N/2)+1 score.
  - N/2 = 5th score, (N/2)+1 = 6th score
- Add the middle two observations and divide by two.
  - (6+7)/2 = 6.5
- Median is 6.5
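The two counting rules (odd N and even N) can be sketched in one small Python function. This is an illustrative sketch, not a prescribed method; the function name `median` and the variable names are assumptions.

```python
def median(scores):
    """Median by position: the middle ordered score, or the mean of the two middle ones."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    half = n // 2
    if n % 2 == 1:
        # Odd N: the (N+1)/2-th score (1-based) sits at index N//2 (0-based).
        return ordered[half]
    # Even N: average the N/2-th and (N/2)+1-th scores (1-based).
    return (ordered[half - 1] + ordered[half]) / 2


# Even number of scores, from the slide.
even_scores = [2, 2, 3, 5, 6, 7, 7, 7, 8, 9]

# Odd number of scores: the 35 hotel rates from the earlier slides.
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

print(median(even_scores))  # 6.5
print(median(hotel_rates))  # 317
```

Sorting first matters: the hotel rates as listed are not in perfect order, so counting in from the raw list could pick the wrong score.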
7 Pros and Cons of Median
- Pros
  - Not influenced by extreme scores or skewed distributions.
  - Good with ordinal data.
  - Easier to compute than the mean.
- Cons
  - May not exist in the data.
  - Doesn't take actual values into account.
8 Mean: the average of all the scores
- \( \text{mean} = \frac{\sum x}{N} \)
- e.g., mean = 4.1
- The most commonly used measure of central tendency
- The balance point of a distribution (illustrated on the next slide)
9 Mean
- The mean is the balance point of a distribution.
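10 Mean
- Population: \( \mu = \frac{\sum X}{N} \)
  - \( \mu \) (mu): the population mean
  - \( \sum X \) (sigma): the sum of X; add up all scores
  - \( N \): the total number of scores in a population
- Sample: \( \bar{X} = \frac{\sum X}{n} \)
  - \( \bar{X} \) (X bar): the sample mean
  - \( \sum X \) (sigma): the sum of X; add up all scores
  - \( n \): the total number of scores in a sample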
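To make the formula and the "balance point" idea concrete, here is a short Python sketch. It uses the ten scores from the even-N median slide as sample data; the variable names are illustrative.

```python
import math

# Ten scores reused from the even-N median example.
scores = [2, 2, 3, 5, 6, 7, 7, 7, 8, 9]

n = len(scores)            # n, the number of scores in the sample
sum_x = sum(scores)        # sigma X: add up all scores
mean = sum_x / n           # X bar = sigma X / n

# Balance point: deviations above and below the mean cancel out.
deviations = [x - mean for x in scores]
balance = sum(deviations)

print(sum_x, n, mean)                          # 56 10 5.6
print(math.isclose(balance, 0, abs_tol=1e-9))  # True
```

The deviation sum is compared with a small tolerance because floating-point arithmetic may leave a tiny rounding residue instead of an exact zero.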
11 Central Tendency Example: Mean
- 52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
- Mean hotel rate: \( \bar{X} = \frac{\sum X}{n} = \frac{13289}{35} \approx 379.69 \)
- Mean hotel rate: $379.69
12 Pros and Cons of the Mean
- Pros
  - Mathematical center of a distribution.
  - Just as far from scores above it as it is from scores below it (the deviations from the mean sum to zero).
  - Good for interval and ratio data.
  - Does not ignore any information.
  - Inferential statistics is based on mathematical properties of the mean.
- Cons
  - Influenced by extreme scores and skewed distributions.
  - May not exist in the data.
13 The effect of skew on average.
- In a normal distribution, the mean, median, and mode are the same.
- In a skewed distribution, the mean is pulled toward the tail.
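A small numeric sketch can show the pull of the tail. Both data sets below are hypothetical (they are not from the slides): one is symmetric, the other has a single extreme high score that creates a right tail.

```python
from statistics import mean, median, mode

# Hypothetical symmetric scores: mean, median, and mode coincide.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Hypothetical right-skewed scores: one extreme value in the upper tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 20]

sym_summary = (mean(symmetric), median(symmetric), mode(symmetric))
skew_summary = (mean(right_skewed), median(right_skewed), mode(right_skewed))

print(sym_summary)   # (3, 3, 3)
print(skew_summary)  # (4.625, 2.5, 2)
```

In the skewed set the mean (4.625) sits well above the median (2.5) and the mode (2), because the single score of 20 drags it toward the tail while the other two measures barely move.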
14 Which average?
- Each measure contains a different kind of information.
- For example, all three measures are useful for summarizing the distribution of American household incomes.
  - In 1998, the income common to the greatest number of households was $25,000 (the mode).
  - Half the households earned less than $38,885 (the median).
  - The mean income was $50,600.
- Reporting only one measure of central tendency might be misleading and perhaps reflect a bias.
15 Describing Data with Tables & Graphs
16 Descriptive Statistics
- The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
  - What is the pattern of scores over the range of possible values?
  - Where, on the scale of possible scores, is a point that best represents the set of scores?
  - Do the scores cluster about their central point or do they spread out around it?
17 What is the pattern of scores?
- Graphs often make it easier to see certain characteristics and trends in a set of data.
- Graphs for quantitative data (number/count):
  - Stem and Leaf Display
  - Histogram
  - Frequency Polygon
- Graphs for qualitative data (class/category):
  - Bar Chart
  - Pie Chart
18 Histogram
- Consists of a number of bars placed side by side.
- The width of each bar indicates the interval size.
- The height of each bar indicates the frequency of the interval.
- There are no gaps between adjacent bars, which reflects the continuous nature of quantitative data.
19 Tables: Frequency Distributions
- (def) Organizes raw data or observations that have been collected by showing how often an observation occurs in each class.
- For quantitative data:
  - Ungrouped: list all possible scores that occur, and then indicate how often each score occurs.
    - (Maximizes information about individual scores)
  - Grouped: combine all possible scores into classes and indicate how often each score occurs within a class.
    - (Easier to see patterns in data, but lose information about individual scores)
- Do not use if you have more than 20 classes!
20 Organize these depression scores! (ranging from 1 to 10)
Ungrouped
1. Each observation should be included in only one class
2. List all classes, even those with zero frequencies
[Table: columns score and f; Total = 32]
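The two rules for an ungrouped table can be sketched in Python as follows. The scores below are hypothetical (the original raw scores are not listed here), and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical depression scores on the 1-10 scale (not the slide's data).
scores = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 10, 10]

counts = Counter(scores)

# List every possible score from 1 to 10, including those with zero frequency.
# A Counter returns 0 for values it has never seen.
table = [(score, counts[score]) for score in range(1, 11)]
total = sum(f for _, f in table)

print("score  f")
for score, f in table:
    print(f"{score:>5}  {f}")
print(f"Total  {total}")
```

Because each score maps to exactly one row, the frequencies add up to the number of observations, and the row for 7 still appears with a frequency of 0.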
21 Organize these depression scores! (ranging from 1 to 10)
Grouped
1. Find the lowest and highest scores
2. Classes should have (roughly) equal intervals
3. Each observation should be included in only one class
4. List all classes, even those with zero frequencies
[Table: columns score and f; classes 1-3 (no depression), 4-6 (moderate depression), 7-10 (high depression); f entries 8 and 4 visible; Total = 32]
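A grouped table following these steps might be built as in the sketch below. The class boundaries (1-3, 4-6, 7-10) come from the slide; the scores are hypothetical, and the function name `grouped_frequencies` is an assumption made for illustration.

```python
def grouped_frequencies(scores, classes):
    """Return [(low, high, f), ...] for inclusive (low, high) classes."""
    table = []
    for low, high in classes:
        # Count the scores that fall inside this class, bounds included.
        f = sum(1 for s in scores if low <= s <= high)
        table.append((low, high, f))
    # Sanity check: with non-overlapping classes that cover the whole range,
    # the class frequencies add up to the number of scores.
    if sum(f for _, _, f in table) != len(scores):
        raise ValueError("classes must cover every score exactly once")
    return table


# Hypothetical depression scores on the 1-10 scale (not the slide's data).
scores = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 10, 10]
classes = [(1, 3), (4, 6), (7, 10)]

grouped = grouped_frequencies(scores, classes)
for low, high, f in grouped:
    print(f"{low}-{high}: {f}")
```

Grouping makes the overall pattern easier to see, at the cost of no longer knowing which individual scores occurred within each class.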
22 Tables: Relative Frequency Distributions
- (def) A frequency distribution that also includes the proportion of each group.
- Divide the frequency of each class by the total frequency.
  - Example: 20/32 = .625
[Table: columns score, f, Relative f; rows 1-3 (no depression), 4-6 (moderate depression), 7-10 (high depression), Total]
23 Tables: Cumulative Frequency Distributions
- (def) A frequency distribution that also includes the total number of observations in each class and all lower-ranked classes.
- Cumulative f: add the frequency of each class to the sum of all classes ranked before it.
  - Example: cumulative f = 28
- Cumulative %: divide the cumulative frequency by the total sample size (i.e., percentile rank).
  - Example: 28/32 = .875
[Table: columns score, f, Relative f, Cumulative f, Cumulative %; rows 1-3 (no depression), 4-6 (moderate depression), 7-10 (high depression), Total]
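The relative and cumulative columns can be derived from the class frequencies as in this sketch. The frequencies 20, 8, and 4 are an assumption: they are consistent with the worked figures on these slides (20/32 = .625, cumulative 28, total 32), but which of the first two classes holds 20 and which holds 8 is a guess.

```python
from itertools import accumulate

classes = ["1-3 (no depression)", "4-6 (moderate depression)", "7-10 (high depression)"]
frequencies = [20, 8, 4]  # assumed assignment of frequencies to classes

total = sum(frequencies)

# Relative f: each class frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# Cumulative f: running sum of this class and all lower-ranked classes.
cumulative_f = list(accumulate(frequencies))

# Cumulative %: cumulative frequency as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative_f]

for row in zip(classes, frequencies, relative, cumulative_f, cumulative_pct):
    print(row)
```

Under this assumed ordering, the second class has a cumulative frequency of 28 and a cumulative percentage of 87.5, matching the 28/32 = .875 example.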
24 Graphs for quantitative data: Histogram
- Consists of bars placed side by side (to represent continuity of data); the width of each bar indicates a single interval, and the height indicates the frequency of the interval.
[Figure: histogram; x-axis: Level of Depression (1 to 10); y-axis: Frequency (0 to 10)]
25 Outliers
- (def) A very extreme score.
  - Is it accurate?
  - Should you segregate it from your summary?
  - Will it enhance your understanding?
26 Graphs: Shape
27 Review! Assume all scores go from low to high
- Draw and describe the shape:
  - IQ scores for the general population
  - Attractiveness scores for a bunch of supermodels
  - First graders' math scores on a college-level exam
  - Annual income level for a community consisting of equal numbers of relatively poor and relatively wealthy people
28 Graphs for quantitative data: Frequency Polygons
- A line graph that emphasizes the continuity of continuous variables.
- Uses a single point rather than a bar.
[Figure: frequency polygon; x-axis: Level of Depression (1 to 10); y-axis: Frequency (0 to 10)]
29 Graphs for quantitative data: Stem & Leaf
- A display for sorting data on the basis of leading and trailing digits.
30 [Table: columns Raw data, Stem, Leaf]
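A stem-and-leaf display can be sketched in a few lines of Python. The raw data below are hypothetical two-digit values (the slide's raw data are not listed here), the split into tens digit (stem) and units digit (leaf) is one common convention, and the function name `stem_and_leaf` is an assumption.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit).

    Assumes a non-empty list of non-negative integers.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        stems[stem].append(leaf)
    # Keep empty stems between the lowest and highest so gaps stay visible.
    low, high = min(stems), max(stems)
    return {stem: stems.get(stem, []) for stem in range(low, high + 1)}


# Hypothetical raw data (not from the slide).
raw_data = [21, 12, 39, 15, 27, 33, 21, 54, 38]

display = stem_and_leaf(raw_data)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, this display keeps every individual score recoverable: stem 2 with leaves 1, 1, 7 stands for 21, 21, and 27.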
31 Histogram
[Figure: histogram of Las Vegas Hotel Rates; x-axis: hotel rates; y-axis: Frequency]
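The interval frequencies behind such a histogram can be tallied with a short Python sketch. The interval width of 100 is an assumption (the width used in the original figure is not stated), and the function name `interval_counts` is illustrative.

```python
def interval_counts(values, width):
    """Count non-negative integer values per interval [k*width, (k+1)*width)."""
    top = max(values) // width
    counts = [0] * (top + 1)  # include empty intervals so no bar is skipped
    for value in values:
        counts[value // width] += 1
    return counts


# The 35 hotel rates from the central tendency slides.
hotel_rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

width = 100  # assumed interval size
counts = interval_counts(hotel_rates, width)

# Text histogram: one row per interval, bar length equals the frequency.
for k, f in enumerate(counts):
    low = k * width
    print(f"{low:>3}-{low + width - 1:<3} {'#' * f} ({f})")
```

With this width the 500-599 interval is empty and the upper intervals form a long right tail, which is consistent with the mean lying above the median for these rates.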
32 Shapes of Histograms
33 Frequency Polygons
- Uses a single point rather than a bar.
[Figure: frequency polygon; x-axis: Range; y-axis: Frequency]
34 Bar Graph
35 Pie Graph
36 Misleading Graphs