Section 2.4 Numerical Measures of Central Tendency
- Pearl Murphy
- 7 years ago
Definitions

Mean: The Mean of a quantitative dataset is the sum of the observations in the dataset divided by the number of observations in the dataset.

Median: The Median (m) of a quantitative dataset is the middle number when the observations are arranged in ascending order.

Mode: The Mode of a dataset is the observation that occurs most frequently in the dataset.

How to calculate these

Mean: There are two means, the Population Mean $\mu$ and the Sample Mean $\bar{x}$. Both are calculated in the same way, except that $\mu$ is calculated for the entire population and $\bar{x}$ is calculated for a sample taken from that population. From now on we will refer to $\bar{x}$, as in practice we never calculate $\mu$; after all, estimating $\mu$ rather than calculating it is the whole point of inferential statistics.
Dataset: $x_1, x_2, x_3, x_4, \ldots, x_n$, so there are n observations in this dataset.

Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median: Arrange the n observations in order from smallest to largest, then:
- if n is odd, the median (m) is the middle number,
- if n is even, the median is the mean of the middle two numbers.

Given a histogram, the median is the point on the X-axis such that half the area under the histogram lies to the left of the median and half lies to the right. An example of finding the median from a histogram with Class Intervals is shown in the examples below.

[Figure: histogram with the median marked; 50% of the area lies on each side.]
Mode: If given a dataset, the mode is the value with the highest relative frequency. If given a relative frequency distribution with class intervals, the mode is taken to be the midpoint of the class interval which has the highest relative frequency. This class interval is called the Modal Class. The mode measures data concentration and so can be used to locate the region in a large dataset where much of the data is concentrated.

NOTE: Unlike the mean and median, the mode of an ungrouped dataset is always one of the original observations.

Example: Calculate the Mean, Median and Mode for the following datasets.

A: Dataset: 5, 3, 8, 5, 6
$\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{5 + 3 + 8 + 5 + 6}{5} = \frac{27}{5} = 5.4$
Mode = 5
Median: 3, 5, 5, 6, 8, so m = 5
Note: 5.4 is not one of the original values in the dataset.

B: Dataset: 11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165
n = 12, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 1046/12 = 87.17$
Median: 11, 14, 23, 45, 56, 78, 93, 98, 123, 140, 165, 200, so m = (78 + 93)/2 = 85.5

C: Generate a dataset containing 9 numbers using the day, month and year of your birth and those of the people sitting to your left and right, i.e. DD/MM/YY.
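The calculations for datasets A and B can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics` module; the variable names and the `summarise` helper are illustrative and not part of the notes.

```python
from statistics import mean, median, multimode

dataset_a = [5, 3, 8, 5, 6]
dataset_b = [11, 140, 98, 23, 45, 14, 56, 78, 93, 200, 123, 165]


def summarise(data):
    """Return (mean, median, list of most frequent values) for a list of numbers."""
    return mean(data), median(data), multimode(data)


mean_a, median_a, modes_a = summarise(dataset_a)
mean_b, median_b, modes_b = summarise(dataset_b)

print("A:", mean_a, median_a, modes_a)
print("B:", round(mean_b, 2), median_b, len(modes_b), "tied values")
```

In dataset B every value occurs exactly once, so `multimode` returns all 12 values; in that situation the mode tells us nothing useful about the centre.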
D:
Class Interval    Frequency
2 -< 4            3
4 -< 6            18
6 -< 8            9
8 -< 10           7

Mode: The Modal Class is 4 -< 6, as its frequency of 18 is the highest. The mode is the midpoint of this class, so mode = 5.

Mean: Using the midpoint of each class, Mean = (3*3 + 5*18 + 7*9 + 9*7)/(3 + 18 + 9 + 7) = 225/37 = 6.08

Median: There are 37 observations in this dataset, so the median is the 19th observation. There are 3 observations in the first Class Interval 2 -< 4, and as 19 - 3 = 16 we need to find the 16th observation in the Class Interval 4 -< 6. Assuming the observations are distributed uniformly within each Class Interval, the 16th observation in the second interval should lie 16/18 = 0.89 of the way between 4 and 6. The distance between 4 and 6 is 2 units and 2*0.89 = 1.78, so we find: median (m) = 4 + 1.78 = 5.78
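The grouped-data steps in Example D can be sketched as follows. The function names are illustrative, and the median follows the interpolation used above (the middle observation is taken as number (n + 1)/2 and observations are assumed to be spread uniformly within a class); other textbooks use slightly different conventions.

```python
# Class intervals from Example D as (lower, upper, frequency); each class is lower -< upper.
classes = [(2, 4, 3), (4, 6, 18), (6, 8, 9), (8, 10, 7)]


def grouped_mean(classes):
    """Approximate the mean by treating every observation as its class midpoint."""
    n = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / n


def grouped_mode(classes):
    """Midpoint of the modal class (the class with the highest frequency)."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2


def grouped_median(classes):
    """Find the class holding the middle observation, then interpolate within it."""
    n = sum(f for _, _, f in classes)
    position = (n + 1) / 2            # the 19th observation when n = 37
    seen = 0                          # observations in the classes already passed
    for lo, hi, f in classes:
        if seen + f >= position:
            return lo + (position - seen) / f * (hi - lo)
        seen += f
    raise ValueError("no observations")


print(round(grouped_mean(classes), 2), grouped_mode(classes),
      round(grouped_median(classes), 2))
```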
2.4.4 Mean vs Median vs Mode - which measures the centre best?

Choosing which of these three measures to use in practice can sometimes seem like a difficult task. However, if we understand a little about the relative merits of each we should at least be able to make an informed decision.

- If the distribution is symmetric then Mean = Median
- If the distribution is Positively Skewed (to the right) then Mean > Median
- If the distribution is Negatively Skewed (to the left) then Median > Mean

So the difference between the mean and the median gives an indication of the direction and degree of skewness of a dataset.

[Slide not reproduced in this transcription.]

Note: The presence of outliers affects the mean but has little or no effect on the median. This can be seen from the diagrams and from the following example:
Example 2.4.5

Ten statistics graduates who are now working as statisticians are surveyed for their annual salary. The survey produced the following dataset:

60,000  20,000  19,000  22,000  21,500  21,000  18,000  16,000  17,500  20,000

Calculate the Mode, Median and Mean:
Mode = 20,000
Median = 20,000
Mean = 23,500

Notice that the distribution is positively skewed: the presence of the one high earner has pulled the Mean up so that it is 1,500 higher than 22,000, the highest of all the salaries other than 60,000. For this dataset the Mean is therefore not a good measure of the centre of the dataset. Notice also that the median would be unaffected if the 60,000 was changed to a value like 23,000, which is more in line with the rest of the data.

Because the mean is sensitive to outliers while the median is largely insensitive to them, a revised version of the mean called the trimmed mean is sometimes used.
2.4.6 Definition: Trimmed Mean

NOTE: This definition is NOT in the textbook.

A trimmed mean is computed by first ordering the data values from smallest to largest, then deleting a selected number of values from each end of the ordered list, and finally averaging the remaining values. The trimming percentage is the percentage of values deleted from EACH end of the ordered list.

So if a dataset contained 10 observations and we wanted to find a 20% trimmed mean, we would delete 2 observations from the top of the ordered dataset and 2 from the bottom, leaving 6 values. The mean of these 6 remaining values is the 20% Trimmed Mean.

Example: Compute a 10% trimmed mean for the dataset in Example 2.4.5 and compare it with the previous measures.

There are 10 observations in the dataset and 10% of 10 is 1, so we delete the largest and smallest observations, i.e. the values 60,000 and 16,000. The mean of the remaining values is then calculated:

10% Trimmed Mean = (17,500 + 18,000 + 19,000 + 20,000 + 20,000 + 21,000 + 21,500 + 22,000)/8 = 19,875

This is very similar to the median and mode for this data.
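The definition above translates almost directly into code. The sketch below uses an illustrative function name, `trimmed_mean`, and assumes that when the trimming percentage does not give a whole number of observations, the number deleted from each end is rounded down; the notes do not say how to handle that case.

```python
def trimmed_mean(data, percent):
    """Mean after deleting `percent`% of the values from EACH end of the sorted data."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)    # values to delete per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming percentage removes every value")
    return sum(kept) / len(kept)


# Salary dataset from Example 2.4.5
salaries = [60000, 20000, 19000, 22000, 21500, 21000, 18000, 16000, 17500, 20000]

print(trimmed_mean(salaries, 0))     # ordinary mean
print(trimmed_mean(salaries, 10))    # 10% trimmed mean
```

With a trimming percentage of 0 the function returns the ordinary mean, which makes the effect of removing the single 60,000 salary easy to see.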
2.4.7 Some more Examples

Sometimes we are not presented with a dataset but with a Histogram or a Stem and Leaf Diagram. It is still possible to measure the centre of the dataset from these graphs.

[MPG Histogram and Stem and Leaf Diagram not reproduced in this transcription.]

Example: Measurements were taken of the pulses of a certain number of UCD students; the observations are listed below. Find the median and mode of this dataset. What is the best way to present this data which will allow the median and mode to be calculated more easily?

Example: Would you expect the datasets described below to possess relative frequency distributions which are symmetric, skewed to the right or skewed to the left?
A. The salaries of people employed by UCD
B. The grades on an easy exam
C. The grades on a difficult exam
D. The amount of time spent by students in a difficult 3 hour exam
E. The amount of time students in this class studied last week
F. The age of cars on a used car lot
Example: The median age of the population in Ireland is now 32 years. The median age of the Irish population in 1986 was 27. Interpret these values and explain the trend. What implications does this data have for Irish society? What are the consequences for the entertainment industry in Ireland?
Section 2.5 Numerical Measures of Variability

When we want to describe a dataset, providing a measure of the centre of that dataset is only part of the story. Consider the following two distributions:

[Figure: two symmetric distributions, A and B, not reproduced in this transcription.]

Both of these distributions are symmetric and mean_A = mean_B, mode_A = mode_B and median_A = median_B. However, these two distributions are obviously different: the data in A is quite spread out compared to the data in B. This spread is technically called variability, and in this section we will examine how best to measure it.
2.5.1 Definitions

Range: The Range of a quantitative dataset is equal to the largest value minus the smallest value.

Sample Variance: The Sample Variance is equal to the sum of the squared distances from the mean divided by n-1.

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

An easier formula to use when calculating the variance is:

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
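The two formulas for the Sample Variance always give the same answer. The sketch below checks this on the dataset 5, 3, 8, 5, 6 from Example A of Section 2.4; the function names are illustrative only.

```python
def variance_definition(data):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def variance_shortcut(data):
    """Shortcut form: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


data = [5, 3, 8, 5, 6]
print(variance_definition(data), variance_shortcut(data))
```

The shortcut form needs only the running totals of $x_i$ and $x_i^2$, which is why it is easier by hand. In floating-point arithmetic it can lose accuracy when the values are large relative to their spread, so the definitional form is usually the safer choice in code.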
Sample Standard Deviation: The Sample Standard Deviation, s, is defined as the positive square root of the Sample Variance, $s^2$.

Which is best?

The meaning of the Range is easily seen from its definition. It is a very crude measure of the variability contained in a dataset, as it uses only the largest and smallest values and does not measure the variability of the rest of the dataset.

ExampleA: These two datasets have the same range, but do they have the same variability?
Dataset1: 1, 5, 5, 5, 9
Dataset2: 1, 2, 5, 8, 9
No. Dataset2 is obviously more spread out than Dataset1, which has three values clustered at 5.

The Sample Variance is a much better measure of the variability in the whole dataset. This is because the term $(x_i - \bar{x})$ in $s^2$ calculates the distance of each observation in the dataset from the centre of the dataset (as measured by the Sample Mean).
As some of the $x_i$ are smaller than $\bar{x}$ and some are larger, the positive and negative differences $(x_i - \bar{x})$ cancel each other out when added together (their sum is always zero). For this reason we square each $(x_i - \bar{x})$ term before adding them together and dividing by n-1 to get an average measure of the squared distance of each observation from the mean. The Sample Variance will therefore be small if all observations are close to the Sample Mean and large if the observations are far away from the mean. This is best illustrated by comparing the calculation of $s^2$ for the two datasets in ExampleA above.

Dataset1: 1, 5, 5, 5, 9, with $\bar{x} = 5$
$s^2 = [(1-5)^2 + (5-5)^2 + (5-5)^2 + (5-5)^2 + (9-5)^2]/4$
$= [(-4)^2 + (0)^2 + (0)^2 + (0)^2 + (4)^2]/4$
$= [16 + 0 + 0 + 0 + 16]/4 = 8$

Dataset2: 1, 2, 5, 8, 9, with $\bar{x} = 5$
$s^2 = [(1-5)^2 + (2-5)^2 + (5-5)^2 + (8-5)^2 + (9-5)^2]/4$
$= [(-4)^2 + (-3)^2 + (0)^2 + (3)^2 + (4)^2]/4$
$= [16 + 9 + 0 + 9 + 16]/4 = 12.5$

So the increased spread contained in Dataset2 is indeed measured by $s^2$.

Samples and Populations
You will have noticed that although we described $s^2$ as an average of the squared distances from the sample mean, we in fact divided the sum of the squares not by n but by n-1. There were n observations in the dataset, so surely the correct thing would be to divide by n and not n-1? The reason we divide by n-1 is that we are, as always, interested in Inferential Statistics and we want to use $s^2$ (the Sample Variance) to estimate the Population Variance, which we will denote by $\sigma^2$ (sigma squared). We will find later that $s^2$ with the n-1 provides a more accurate estimator of $\sigma^2$.

So again we have a Sample and a Population, and two Population Characteristics estimated by two Sample Statistics.

Population Characteristic                 Sample Statistic
Population Variance $\sigma^2$            Sample Variance $s^2$
Population Standard Deviation $\sigma$    Sample Standard Deviation s

Example: Two samples are chosen from a population:
Sample1: 10, 0, 1, 9, 10, 0, 8, 1, 1, 9
Sample2: 0, 5, 10, 5, 5, 5, 6, 5, 6, 5

Answer the following questions based on these two samples:
A. Examine both samples and identify which has the greater variability.
B. Calculate the Range for each sample. Does your result agree with your answer to part A?
C. Calculate the Standard Deviation for each sample. Does this result agree with your answer to part A?
D. Which of the two, Range or Standard Deviation, provides the better measure of variability?

Answers: Range_1 = 10, Range_2 = 10; s_1 = 4.5814, s_2 = 2.3944

Example: Once upon a time there were two lecturers, A and B, who each delivered the same course to two different classes. When exam time came both classes had the same average mark of 70%. The marks for Lecturer A's class, however, had a Standard Deviation of 25%, whereas the Standard Deviation for Lecturer B's class was 5%. Whose class would you rather be in?
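The Range and Standard Deviation for Sample1 and Sample2 can be checked with a short sketch. It relies on `statistics.stdev`, which divides by n-1 as in the definition of s; `data_range` is an illustrative helper.

```python
from statistics import stdev

sample1 = [10, 0, 1, 9, 10, 0, 8, 1, 1, 9]
sample2 = [0, 5, 10, 5, 5, 5, 6, 5, 6, 5]


def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


for name, sample in (("Sample1", sample1), ("Sample2", sample2)):
    print(name, "range =", data_range(sample), " s =", round(stdev(sample), 4))
```

Both samples have the same Range, yet the Standard Deviation of Sample1 is nearly twice that of Sample2, which matches what we see by eye: Sample1 sits at the extremes while Sample2 clusters around 5.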
Section 2.6 Interpreting the Standard Deviation - Chebyshev's Rule and the Empirical Rule

We have seen that the Variance, and hence the Standard Deviation, of a dataset provides us with a relative measure of the variability contained in a dataset: if we are given two datasets, the one with the larger Standard Deviation is the one which exhibits the greater variability. Is it possible for the Standard Deviation to give more than a relative measure of variability? Can we actually say how spread out the data is? The answer is yes, and we will see later how to give detailed answers for particular distributions. In the meantime there are two rules which will provide us with a good deal of information about some general datasets.

Chebyshev's Rule

This rule applies to any dataset (population or sample) regardless of the shape of the frequency distribution of the data. For k > 1, the proportion of observations which are within k Standard Deviations of the mean is at least $1 - 1/k^2$.
Computing this for several values of k gives:

k: Number of Standard Deviations    Proportion of the observations within k Standard Deviations of the Mean
2                                   At least 1 - 1/4 = 0.75
3                                   At least 1 - 1/9 ≈ 0.89
4                                   At least 1 - 1/16 ≈ 0.94
4.47                                At least 1 - 1/20 = 0.95
5                                   At least 1 - 1/25 = 0.96
10                                  At least 1 - 1/100 = 0.99

Note: Chebyshev's Rule provides us with an idea of the spread of distributions. Because it is meant to work for all distributions regardless of their shape, it doesn't give definite, specific results. Instead it tells us that at least a certain proportion of observations lie in a specified interval. The proportions in Chebyshev's Rule are therefore very conservative, and for certain distributions we may find a much higher proportion of observations within these intervals.

The Empirical Rule provides us with more definite statements about the proportion of observations in a specified interval. It only works for Symmetric Bell-Shaped (mound-shaped) distributions. This rule is also an approximation, and more or less data than is indicated by the rule may lie in each interval.
2.6.2 The Empirical Rule

For a Symmetric Bell-Shaped distribution:
- Approximately 68% of the observations are within 1 Standard Deviation of the Mean
- Approximately 95% of the observations are within 2 Standard Deviations of the Mean
- Approximately 99.7% of the observations are within 3 Standard Deviations of the Mean
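The two rules can be put side by side in a few lines. In this sketch `chebyshev_bound` and `EMPIRICAL_RULE` are illustrative names; the Empirical Rule figures are the approximate percentages quoted above and apply only to bell-shaped data.

```python
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}   # approximate, bell-shaped data only


def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_bound(k):.3f}, "
          f"Empirical approx. {EMPIRICAL_RULE[k]}")
```

For k = 1 Chebyshev's Rule only says "at least 0", which is why the function rejects it, while the Empirical Rule still gives the useful figure of about 68%.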
2.6.3 Some Examples

ExampleA: The following is a list of the times it takes 12 UCD students to get to college in the morning:

12, 23, 56, 14, 17, 21, 33, 42, 45, 38, 51, 29

Calculate $\bar{x}$ and s, and calculate the percentage of the data between $\bar{x} - 2s$ and $\bar{x} + 2s$ and also between $\bar{x} - 3s$ and $\bar{x} + 3s$. Compare these results with the predictions of Chebyshev's Rule. Assuming that the data is distributed in an approximate bell shape, use the Empirical Rule to calculate the percentage of the data within 2 Standard Deviations of the mean and within 3 Standard Deviations of the mean. Comment on your results.

$\bar{x}$ = 31.75, s = 14.78
2s = 2*14.78 = 29.56
3s = 3*14.78 = 44.34
$\bar{x} - 2s$ = 31.75 - 29.56 = 2.19
$\bar{x} + 2s$ = 31.75 + 29.56 = 61.31
$\bar{x} - 3s$ = 31.75 - 44.34 = -12.59, which we take as 0 since a travel time cannot be negative
$\bar{x} + 3s$ = 31.75 + 44.34 = 76.09
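The figures for ExampleA can be reproduced with the sketch below. `percent_within` is an illustrative helper that counts the observations lying within k Standard Deviations of the mean.

```python
from statistics import mean, stdev

# Travel times of the 12 students in ExampleA
times = [12, 23, 56, 14, 17, 21, 33, 42, 45, 38, 51, 29]


def percent_within(data, k):
    """Percentage of observations in the interval (mean - k*s, mean + k*s)."""
    xbar, s = mean(data), stdev(data)
    inside = [x for x in data if xbar - k * s <= x <= xbar + k * s]
    return 100 * len(inside) / len(data)


print("mean =", mean(times), " s =", round(stdev(times), 2))
for k in (2, 3):
    print(f"within {k} standard deviations: {percent_within(times, k):.0f}%")
```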
Interval                                        Actual    Chebyshev's     Empirical
$\bar{x} - 2s$ ~ $\bar{x} + 2s$ (2.19 ~ 61.31)  100%      at least 75%    approx. 95%
$\bar{x} - 3s$ ~ $\bar{x} + 3s$ (0 ~ 76.09)     100%      at least 89%    approx. 99.7%

This table illustrates very clearly how Chebyshev's Rule generally underestimates the amount of data in each interval. The Empirical Rule provides, in this case, more accurate results.

ExampleB: A lecturer in UCD has assigned some problems to be done by the 120 students in her class. When it comes time to collect the problems, 9 students inform her that "The dog ate my homework". From many years of teaching classes this size she has observed that the mean number of homeworks actually eaten by pets of all kinds is 3 homeworks and the standard deviation is 0.8 homeworks. Should the lecturer believe that the homeworks of all 9 students were eaten by their dogs or not?

By Chebyshev's Rule at least $1 - 1/k^2$ of the observations should lie in the interval ($\bar{x} - ks$, $\bar{x} + ks$). This gives the following table:
k: Number of Standard Deviations    Interval    At least this percentage of observations in interval
2                                   1.4, 4.6    75%
3                                   0.6, 5.4    88.9%
4                                   0, 6.2      93.8%
5                                   0, 7        96%
6                                   0, 7.8      97.2%
7                                   0, 8.6      98%
8                                   0, 9.4      98.4%

From this table we can see that 9 homeworks lies outside the interval for k = 7, so there is AT MOST about a 2% chance that dogs ate 9 homeworks in this class. Remembering that Chebyshev's Rule is extremely conservative, we could conclude that the chances are very high that some of the students just didn't do their homeworks.
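The table for ExampleB can be generated rather than worked out by hand. In this sketch `chebyshev_row` is an illustrative helper, and the `floor` argument reflects the assumption used in the table that the lower limit is set to 0 because a count of homeworks cannot be negative.

```python
MEAN_EATEN = 3.0   # mean number of homeworks eaten, from the example
SD_EATEN = 0.8     # standard deviation, from the example


def chebyshev_row(mean, sd, k, floor=0.0):
    """Return (low, high, minimum proportion) for k standard deviations."""
    low = max(mean - k * sd, floor)
    high = mean + k * sd
    return low, high, 1 - 1 / k**2


table = {k: chebyshev_row(MEAN_EATEN, SD_EATEN, k) for k in range(2, 9)}

for k, (low, high, proportion) in table.items():
    print(f"k={k}: ({low:.1f}, {high:.1f})  at least {100 * proportion:.1f}%")
```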
Example C: In Tombstone, Arizona Territory, people used Colt .45 revolvers, but they used different ammunition. Wyatt Earp knew that his brothers and Doc Holliday were the only ones in the territory who used Colt .45s with Winchester ammunition. The Earp brothers conducted tests on many different combinations of weapons and ammunition. They found that the dataset of observations produced by the combination of a Colt .45 with Winchester shells showed a Mean velocity of 936 feet/second and a Standard Deviation of 10 feet/second. The measurements were taken at a distance of 15 feet from the gun. When Wyatt examined the body of a cowboy shot in the back in cold blood, he concluded that he was shot at a distance of 15 feet and that the velocity of the bullet at impact was 1,000 feet/second. The dastardly Ike Clanton claimed that this cowboy was shot by the Earp brothers or Doc Holliday. Was Wyatt able to clear his good name using the Empirical Rule?
The distribution of this bullet velocity data should be approximately bell-shaped. This implies that the Empirical Rule should give a good estimate of the percentage of the data within each interval.

k: Number of Standard Deviations    Interval     Chebyshev's: at least    Empirical: approximately
2                                   916, 956     75%                      95%
3                                   906, 966     88.9%                    99.7%
4                                   896, 976     93.8%                    ~100%
5                                   886, 986     96%                      ~100%
6                                   876, 996     97.2%                    ~100%
7                                   866, 1006    98%                      ~100%

This table quite clearly demonstrates that, since the bullet velocity in the shooting was 1000 ft/sec and this lies more than 6 Standard Deviations away from the mean, the probability is extremely high that the Earps were not responsible for this shooting. This is especially evident from the column showing percentages from the Empirical Rule: practically 100% of bullet velocities should be between 896 and 976 ft/sec.
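The argument above amounts to asking how many Standard Deviations the observed velocity lies from the mean. A minimal sketch, with illustrative function names, is:

```python
MEAN_VELOCITY = 936   # ft/sec, from the test firings described above
SD_VELOCITY = 10      # ft/sec
observed = 1000       # ft/sec, velocity estimated at impact


def z_score(x, mean, sd):
    """Distance of x from the mean in units of standard deviations."""
    return (x - mean) / sd


def interval(mean, sd, k):
    """The interval (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd


def contains(mean, sd, k, x):
    """True if x lies within k standard deviations of the mean."""
    low, high = interval(mean, sd, k)
    return low <= x <= high


z = z_score(observed, MEAN_VELOCITY, SD_VELOCITY)

# Smallest whole number of standard deviations whose interval contains the observation
first_k = next(k for k in range(1, 20)
               if contains(MEAN_VELOCITY, SD_VELOCITY, k, observed))

print(z, first_k)
```

The observation is not captured until the interval is widened to 7 Standard Deviations, far beyond the 3 Standard Deviations that contain about 99.7% of bell-shaped data.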
Example C2: During The Troubles in Northern Ireland both Republicans and Loyalists used 9mm handguns; however, they used different brands of handgun and ammunition. The security forces in NI knew that the Republicans used Heckler and Koch 9mm handguns with Winchester ammunition. The security forces conducted tests on many different combinations of weapons and ammunition. They found that the dataset of observations produced by the combination of an H&K 9mm with Winchester shells showed a Mean velocity of 936 feet/second and a Standard Deviation of 10 feet/second. The measurements were taken at a distance of 15 feet from the gun. Forensic scientists examining the body of a shooting victim concluded that he was shot at a distance of 15 feet and that the velocity of the bullet at impact was 1,000 feet/second. Describe the distribution of the bullet velocities. Did they conclude that the shooter was a member of a Republican terrorist organisation or a Loyalist organisation?
The distribution of this bullet velocity data should be approximately bell-shaped. This implies that the Empirical Rule should give a good estimate of the percentage of the data within each interval.

k: Number of Standard Deviations    Interval     Chebyshev's: at least    Empirical: approximately
2                                   916, 956     75%                      95%
3                                   906, 966     88.9%                    99.7%
4                                   896, 976     93.8%                    ~100%
5                                   886, 986     96%                      ~100%
6                                   876, 996     97.2%                    ~100%
7                                   866, 1006    98%                      ~100%

This table quite clearly demonstrates that, since the bullet velocity in the shooting was 1000 ft/sec and this lies more than 6 Standard Deviations away from the mean, the probability is extremely high that Republicans were not responsible for this shooting. This is especially evident from the column showing percentages from the Empirical Rule: practically 100% of bullet velocities should be between 896 and 976 ft/sec.
2.6.4 Example to illustrate the difference between Chebyshev's Rule, the Empirical Rule and some actual data

A survey was conducted to measure the height of 14 year olds. A sample of 1052 children was measured, and the sample mean $\bar{x}$ and standard deviation s were calculated in inches. A bell-shaped symmetric distribution provided a good fit to the data. Applying Chebyshev's Rule and the Empirical Rule we get:

k: Number of SDevs    Interval                              Actual % of Obs. in Interval    Empirical Rule: % of Obs.    Chebyshev's Rule: % of Obs.
1                     ($\bar{x} - s$, $\bar{x} + s$)        72.1%                           68%                          >= 0%
2                     ($\bar{x} - 2s$, $\bar{x} + 2s$)      96.2%                           95%                          >= 75%
3                     ($\bar{x} - 3s$, $\bar{x} + 3s$)      99.2%                           99.7%                        >= 89%

Clearly in this instance Chebyshev's Rule underestimates the proportions very severely.
2.6.5 Estimating the Standard Deviation from the Range

According to the Empirical Rule, for Bell-Shaped distributions almost all of the data should be in the interval ($\bar{x} - 3s$, $\bar{x} + 3s$). So the Range should be approximately 6s, i.e. $(\bar{x} + 3s) - (\bar{x} - 3s) = 6s$. This gives us a crude but useful estimate of the Standard Deviation:

Standard Deviation ~ Range/6
Section 2.7 Numerical Measures of Relative Standing

While it is useful to know how to measure the centre of a dataset and the variability of a dataset, many times we want to be able to compare one observation with the rest of the observations in the dataset. Is one observation larger than many others? For example, suppose you get 35% on the exam for this course. You will probably feel quite bad about your performance, but what if 90% of the class actually did worse than you? Then you might feel a bit better about your 35%. So in some cases knowing how one observation compares with others can be more useful than just knowing the value of that observation. This section will introduce some different ways of measuring Relative Standing.
2.7.1 Definitions

Percentile: For any dataset the P-th percentile is the observation which is greater in value than P% of all the numbers. Consequently this observation will be smaller than (100-P)% of the data.

Z-Score: The Z-Score of an observation is the distance between that observation and the mean, expressed in units of standard deviations. So:

Sample Z-Score for an observation x: $Z = \frac{x - \bar{x}}{s}$

Population Z-Score for an observation x: $Z = \frac{x - \mu}{\sigma}$

The numerical value of the Z-Score reflects the relative standing of the observation. A large positive Z-Score implies that the observation is larger than most of the other observations. A large negative Z-Score indicates that the observation is smaller than almost all the other observations. A Z-Score of zero or close to 0 means that the observation is located close to the mean of the dataset.
ExampleA: The 50th percentile of a dataset is the median (the median, remember, is the value which is larger than half of the data).

ExampleB: Dataset: 15, 3, 1, 7, 5, 17, 19, 11, 9, 13
In this dataset the 80th percentile is the value 15, as 15 is greater than or equal to 80% of the data. This is easily seen if we arrange the data in ascending order:
1, 3, 5, 7, 9, 11, 13, 15, 17, 19

Exercise 2.79 in textbook: The distribution of scores on a nationally administered college achievement test has a median of 520 and a mean of 540.
a. Explain how it is possible for the mean to exceed the median for this distribution.
b. Suppose that you are told that the 90th percentile is 660. What does this mean?
c. Suppose you are told that you scored at the 94th percentile. What does this mean?

Answers:
a. The distribution is positively skewed (to the right).
b. 90% of the test scores are below 660 and 10% are above.
c. 94% of the test scores were below yours and only 6% were above.
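The percentile reading in ExampleB can be checked by counting. The helper below is illustrative and follows the "greater than or equal to" wording used in ExampleB; statistical packages use several slightly different percentile conventions, so their results may differ on small datasets.

```python
def percent_at_or_below(data, value):
    """Percentage of the observations that are less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


data = [15, 3, 1, 7, 5, 17, 19, 11, 9, 13]

print(percent_at_or_below(data, 15))   # standing of the value 15
print(percent_at_or_below(data, 10))   # standing of the median, (9 + 11)/2 = 10
```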
Example D: A sample of 120 statistics students was chosen and their exam results summarised. The mean and standard deviation were found to be $\bar{x}$ = 53% and s = 7%. Eric and Kenny are two students in this class. Eric's exam result was 47%; what was his Z-Score? If Kenny's Z-Score is 2, what was his percentage on the exam?

Z-Scores and the Empirical Rule

For a bell-shaped distribution the Empirical Rule tells us the following about Z-Scores:
1. Approximately 68% of the observations have a Z-Score between -1 and 1.
2. Approximately 95% of the observations have a Z-Score between -2 and 2.
3. Approximately 99.7% of the observations have a Z-Score between -3 and 3.

Example 2.14 in the textbook: Suppose a female bank employee believes that her salary is low as a result of sex discrimination. To substantiate her belief, she collects information on the salaries of her male counterparts. She finds that their salaries have a mean of $34,000 and a standard deviation of $2,000. Her salary is $27,000. Does this information support her claim of sex discrimination?

Answer:
Calculate her Z-Score with respect to her male counterparts:

$Z = \frac{x - \bar{x}}{s} = \frac{\$27{,}000 - \$34{,}000}{\$2{,}000} = -3.5$

So the woman's salary is 3.5 Standard Deviations below the mean of the male salary distribution. If the male salaries are distributed in a bell shape, then the Empirical Rule tells us that very few salaries in this distribution should have a Z-Score below -3. Therefore a Z-Score of -3.5 represents either a highly unusual observation from the male salary distribution or an observation from a different distribution.

Do you think her claim of sex discrimination is justified?

Answer: We need more data: on the collection technique the woman used, the length of time she has been in her job, her competence at her job, etc. If she truly chose a representative sample, if she had been employed there as long as the others and if she was good at her job, then one might conclude that she was discriminated against.
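Both Example D and the bank salary example use the same Z-Score formula, once forwards and once rearranged. A minimal sketch, with illustrative function names, is:

```python
def z_score(x, mean, sd):
    """Distance of x from the mean in units of standard deviations."""
    return (x - mean) / sd


def value_from_z(z, mean, sd):
    """Rearranged Z-Score formula: the observation that has Z-Score z."""
    return mean + z * sd


eric_z = z_score(47, 53, 7)                 # Example D: Eric scored 47%
kenny_mark = value_from_z(2, 53, 7)         # Example D: Kenny's Z-Score is 2
employee_z = z_score(27000, 34000, 2000)    # bank employee's salary

print(round(eric_z, 2), kenny_mark, employee_z)
```

Eric's mark is a little under one Standard Deviation below the class mean, which is unremarkable, whereas the bank employee's salary is 3.5 Standard Deviations below the mean of the male salaries.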
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More informationThe Normal Distribution
Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution
More informationc. Construct a boxplot for the data. Write a one sentence interpretation of your graph.
MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?
More information4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
More informationCA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction
CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationMATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
MATH 3/GRACEY PRACTICE EXAM/CHAPTERS 2-3 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) The frequency distribution
More informationMeans, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
More informationEXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!
STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.
More informationInterpreting Data in Normal Distributions
Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,
More informationDESCRIPTIVE STATISTICS & DATA PRESENTATION*
Level 1 Level 2 Level 3 Level 4 0 0 0 0 evel 1 evel 2 evel 3 Level 4 DESCRIPTIVE STATISTICS & DATA PRESENTATION* Created for Psychology 41, Research Methods by Barbara Sommer, PhD Psychology Department
More informationChapter 4. Probability and Probability Distributions
Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the
More informationMathematics (Project Maths Phase 1)
2012. M128 S Coimisiún na Scrúduithe Stáit State Examinations Commission Leaving Certificate Examination, 2012 Sample Paper Mathematics (Project Maths Phase 1) Paper 2 Ordinary Level Time: 2 hours, 30
More information4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"
Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses
More informationDescriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion
Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research
More informationNorthumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
More informationWhy Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationExploratory Data Analysis
Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationHISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
More informationExploratory Data Analysis. Psychology 3256
Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find
More informationFrequency Distributions
Descriptive Statistics Dr. Tom Pierce Department of Psychology Radford University Descriptive statistics comprise a collection of techniques for better understanding what the people in a group look like
More informationStandard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationPractice#1(chapter1,2) Name
Practice#1(chapter1,2) Name Solve the problem. 1) The average age of the students in a statistics class is 22 years. Does this statement describe descriptive or inferential statistics? A) inferential statistics
More information3: Summary Statistics
3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
More informationDescriptive Statistics
Descriptive Statistics Suppose following data have been collected (heights of 99 five-year-old boys) 117.9 11.2 112.9 115.9 18. 14.6 17.1 117.9 111.8 16.3 111. 1.4 112.1 19.2 11. 15.4 99.4 11.1 13.3 16.9
More information8. THE NORMAL DISTRIBUTION
8. THE NORMAL DISTRIBUTION The normal distribution with mean μ and variance σ 2 has the following density function: The normal distribution is sometimes called a Gaussian Distribution, after its inventor,
More informationMeasures of Central Tendency and Variability: Summarizing your Data for Others
Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: -Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationTopic 9 ~ Measures of Spread
AP Statistics Topic 9 ~ Measures of Spread Activity 9 : Baseball Lineups The table to the right contains data on the ages of the two teams involved in game of the 200 National League Division Series. Is
More informationDescribing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
More informationCenter: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
More information1 Descriptive statistics: mode, mean and median
1 Descriptive statistics: mode, mean and median Statistics and Linguistic Applications Hale February 5, 2008 It s hard to understand data if you have to look at it all. Descriptive statistics are things
More informationTHE BINOMIAL DISTRIBUTION & PROBABILITY
REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution
More informationIntroduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data
A Few Sources for Data Examples Used Introduction to Environmental Statistics Professor Jessica Utts University of California, Irvine jutts@uci.edu 1. Statistical Methods in Water Resources by D.R. Helsel
More informationThe Standard Normal distribution
The Standard Normal distribution 21.2 Introduction Mass-produced items should conform to a specification. Usually, a mean is aimed for but due to random errors in the production process we set a tolerance
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Exam Name 1) A recent report stated ʺBased on a sample of 90 truck drivers, there is evidence to indicate that, on average, independent truck drivers earn more than company -hired truck drivers.ʺ Does
More informationBiostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
More informationLecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000
Lecture 4 Nancy Pfenning Stats 000 Chapter 7: Probability Last time we established some basic definitions and rules of probability: Rule : P (A C ) = P (A). Rule 2: In general, the probability of one event
More informationLecture Notes Module 1
Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific
More information2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.
Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible
More information99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm
Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly
More informationPsychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!
Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Name: 1. The basic idea behind hypothesis testing: A. is important only if you want to compare two populations. B. depends on
More informationCh. 3.1 # 3, 4, 7, 30, 31, 32
Math Elementary Statistics: A Brief Version, 5/e Bluman Ch. 3. # 3, 4,, 30, 3, 3 Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange. 3) High Temperatures The reported high temperatures
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationChicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
More information5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.
The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution
More informationClassify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous
Chapter 2 Overview Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classify as categorical or qualitative data. 1) A survey of autos parked in
More informationMeasurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement
Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error.. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.
More informationWEEK #22: PDFs and CDFs, Measures of Center and Spread
WEEK #22: PDFs and CDFs, Measures of Center and Spread Goals: Explore the effect of independent events in probability calculations. Present a number of ways to represent probability distributions. Textbook
More informationBNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
More informationDescriptive statistics parameters: Measures of centrality
Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between
More informationStatistics Revision Sheet Question 6 of Paper 2
Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of
More informationLecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
More informationHow To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Chapter 1 Review 1. As part of survey of college students a researcher is interested in the variable class standing. She records a 1 if the student is a freshman, a 2 if the student
More informationFoundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
More informationOpgaven Onderzoeksmethoden, Onderdeel Statistiek
Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week
More informationNormal distribution. ) 2 /2σ. 2π σ
Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce
More information6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey):
MATH 1040 REVIEW (EXAM I) Chapter 1 1. For the studies described, identify the population, sample, population parameters, and sample statistics: a) The Gallup Organization conducted a poll of 1003 Americans
More information6 3 The Standard Normal Distribution
290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since
More informationz-scores AND THE NORMAL CURVE MODEL
z-scores AND THE NORMAL CURVE MODEL 1 Understanding z-scores 2 z-scores A z-score is a location on the distribution. A z- score also automatically communicates the raw score s distance from the mean A
More informationIntroduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.
Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative
More informationProbability Distributions
Learning Objectives Probability Distributions Section 1: How Can We Summarize Possible Outcomes and Their Probabilities? 1. Random variable 2. Probability distributions for discrete random variables 3.
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More information5.1 Identifying the Target Parameter
University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More information