Pie Charts. Proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics.


Graphical Representations of Data, Mean, Median and Standard Deviation

In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of values and the most likely values, as well as properties like symmetry, uni- or multi-modality, tail behavior, etc. To quantify the central value of the distribution of a given sample we define the average and the median. To quantify the spread (dispersion) of the sample with respect to its central value we define the standard deviation.

Pie Charts

The pie charts correspond to the proportion of ice-cream flavors sold annually by a given brand.

[Figure: pie charts of ice-cream flavors sold: Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream]

Pie Charts are a bad idea!

From the R manual page for the pie function:

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data. Cleveland (1985), page 264: "Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements." This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

Bar Charts

The same data are now represented with bar charts. Notice that this representation allows a very clear quantification of the differences between the ice-cream types.

[Figure: bar charts of ice-cream flavors sold: Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream]

Bar Charts

Strictly speaking, bar charts can be used to summarize either quantitative or qualitative data. Qualitative data types, such as nominal or ordinal variables, can be visualized using bar charts. The next data summary graphing techniques, Stem and Leaf Plots and Histograms, are used for quantitative variables.

Stem and Leaf Plots

A similar technique used to graphically represent data is the stem and leaf plot. Take a look at the Healthy Breakfast data set. This data file contains nutritional information and grocery shelf location for 77 breakfast cereals. Current research states that adults should consume no more than 30% of their calories in the form of fat, that they need about 50 grams (women) or 63 grams (men) of protein daily, and that they should provide for the remainder of their caloric intake with complex carbohydrates. One gram of fat contains 9 calories, and carbohydrates and proteins contain 4 calories per gram. A good diet should also contain grams of dietary fiber. Check out R code.

Frequency Histograms

Consider the table of families by income in the US in 1973 and corresponding percents. In this table, class intervals include the left endpoint but not the right endpoint. It is important to specify which of the endpoints are included in each class. Notice that in this case the class intervals do not all have the same length.

[Table: income level in $ and frequency, with class intervals 0-1,000; 1,000-2,000; 2,000-3,000; 3,000-4,000; 4,000-5,000; 5,000-6,000; 6,000-7,000; 7,000-10,000; 10,000-15,000; 15,000-25,000; 25,000-50,000; 50,000 and over]
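The left-closed convention can be made concrete with a small sketch. The Python snippet below is only an illustration: the function name `class_index` and the sample incomes are hypothetical, while the class boundaries are the ones listed for the income table.

```python
from bisect import bisect_right

# Left endpoints of the income classes, in dollars.
# The last class, "50,000 and over", has no right endpoint.
LEFT_ENDPOINTS = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000,
                  10000, 15000, 25000, 50000]


def class_index(income):
    """Return the index of the class [left, right) that contains income."""
    if income < 0:
        raise ValueError("income must be non-negative")
    # bisect_right counts how many left endpoints are <= income, so a value
    # equal to an endpoint falls in the class that starts at that endpoint.
    return bisect_right(LEFT_ENDPOINTS, income) - 1


# Hypothetical incomes, chosen to sit on or next to class boundaries.
for income in (999, 1000, 6999, 7000, 50000):
    print(income, "-> class", class_index(income))
```

An income of exactly 1,000 lands in the class starting at 1,000 and not in the class ending there, which is what "include the left point, but not the right point" means.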

We can draw a frequency histogram for the income level ranges specified by dividing the frequency counts by the total number of families. However, the ranges have different widths, so the area of each block is NOT equally proportional to the number of families with incomes in the corresponding class interval. We really WANT the area of each block to represent the proportion of families within the income class interval, so instead we use a density histogram.

Density Histograms

Density histograms are similar to frequency histograms, except that the heights of the rectangles are calculated by dividing the relative frequency by the class width:

$\text{height} = \dfrac{\text{frequency}}{\text{total number of families} \times \text{class width}}$

The resulting rectangle heights are called densities, and the vertical scale is called the density scale.

[Figure: Density Histogram for Income; horizontal axis: Income in $1000; vertical axis: Density]

NOTE: I used total subjects in income data set = 211,908,
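As a rough illustration of the height calculation, the sketch below divides each count by the total count times the class width. The class edges and counts are hypothetical (they are not the 1973 income data), and `density_heights` is a name made up for this example.

```python
def density_heights(edges, counts):
    """Height of each block: count / (total count * class width)."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    total = sum(counts)
    heights = []
    for left, right, count in zip(edges, edges[1:], counts):
        width = right - left
        heights.append(count / (total * width))
    return heights


# Hypothetical class edges (in $1000) and family counts.
edges = [0, 1, 2, 5, 10]
counts = [10, 20, 30, 40]

heights = density_heights(edges, counts)
# Area of a block = height * width; the areas are the relative frequencies.
areas = [h * (right - left) for h, left, right in zip(heights, edges, edges[1:])]

print("heights:", heights)
print("total area:", sum(areas))
```

Note that the widest class has the largest count but not the tallest block, because its count is spread over a wider interval.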

IMPORTANT! When comparing data sets with different sample sizes OR when drawing a histogram with varying class interval widths, it is NOT appropriate to compare raw frequency histograms. Why? When sample sizes are different, density scale histograms are BETTER.

Drawing Density Histograms

Once the distribution table of percentages is available, the next step is to draw a horizontal axis specifying the class intervals. Then we draw the blocks, remembering that

In a density histogram the areas of the blocks represent percentages.

So it is a mistake to set the heights of the blocks equal to the percentages in the table (that would be a relative frequency histogram, which we'll talk about next). To figure out the height of a block, divide the percentage by the class width of the interval. The table needed to calculate the heights of the blocks looks like this:

[Table: columns are income level in $, percent, class width (in $1,000s), and height; one row per class interval from 0-1,000 to 50,000 and over]

Distribution of family income in the US in 1973

[Figure: density histogram; horizontal axis: income in $; vertical axis: percent per $]

This is the resulting histogram. The areas of the blocks of a density histogram add up to 1. Notice that the class interval of incomes above $50,000 has been ignored.

Vertical scale

What is the meaning of the vertical scale in a histogram? Remember that the area of the blocks is proportional to the percents. A tall block means that a large chunk of area accumulates in a small portion of the horizontal scale. This implies that the density of the data is high in the intervals where the height is large. In other words, the data are more crowded in those intervals.

Another Example of a Density Histogram

Information is available from 131 hospitals. We show a histogram of the average length of stay, measured in days, for each hospital. The area of each block is proportional to the number of hospitals in the corresponding class interval.

[Figure: Histogram of the average length of stay in hospital; horizontal axis: length of stay (days)]

In this example all the intervals have the same length, so the heights of the blocks give all the information about the number of hospitals in each class.

There are 7 class intervals, corresponding to
6 to 8 days
8 to 10 days
10 to 12 days
12 to 14 days
14 to 16 days
16 to 18 days
18 to 20 days

Note that the class that corresponds to 14 to 16 days is empty and that the class with the highest count of hospitals is the one of 8 to 10 days.

Cross tabulation

In many situations we need to perform an exploratory analysis of data to observe possible associations with a discrete variable. For example, consider measuring the blood pressure of women and dividing them into two groups: one taking the contraceptive pill and the other not taking it. We can produce a table with the distribution of one group in one column and the distribution of the other in another column. This can be used to produce two histograms in order to make a visual comparison of the two groups. The variable that is used for the cross-tabulation is usually referred to as a covariable.

[Table: blood pressure (mm) classes, from "under" the lowest class to "over" the highest, with the percent of non-users and the percent of users in each class]

[Figure: two histograms, women not using the pill and women using the pill; horizontal axis: blood pressure (mm); vertical axis: percent per mm]

We observe that the histogram of pill users is slightly shifted to the right, suggesting an increase in blood pressure among women taking the pill.

Relative Frequency Histograms

These are relative frequency histograms, because the height of each bar is the fraction of times the value occurs, that is,

$\text{relative frequency} = \dfrac{\text{frequency of value(s)}}{\text{number of observations in the set}}$

Relative frequency histograms are also useful for comparing two samples with different sample sizes. The sum of all relative frequencies in a dataset is 1.
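A minimal sketch of the relative frequency calculation follows. The two samples are hypothetical class labels (not the blood pressure data), and `relative_frequencies` is an illustrative name; the point is only that samples of different sizes become comparable once counts are divided by the number of observations.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to (its count) / (number of observations)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    return {value: count / n for value, count in Counter(values).items()}


# Hypothetical samples of different sizes with the same proportions.
small_sample = ["low", "mid", "mid", "high"]
large_sample = ["low"] * 10 + ["mid"] * 20 + ["high"] * 10

small_rel = relative_frequencies(small_sample)
large_rel = relative_frequencies(large_sample)

print(small_rel)
print(large_rel)
print(sum(small_rel.values()), sum(large_rel.values()))
```

The raw counts differ by a factor of ten, but the relative frequencies agree, and in each sample they add up to 1.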

Problems

Data from the 1990 Census produce the following for houses in the New York City area that are either occupied by the owner or rented out.

1. The owner-occupied percents add up to 99.9% and the renter-occupied percents add up to 100.1%. Why?
2. The percentage of one-room units is much larger for renter-occupied housing. Is that because there is more renter-occupied housing in total?
3. Which are larger on the whole: the owner-occupied units or the renter-occupied units?

[Table: Number of Rooms, with percent Owner Occupied and percent Renter Occupied. Total Number: 785,120 owner occupied; 1,782,459 renter occupied]

The answer to the first question is that there is rounding involved in the calculation of the percentages. As for the second question, the fact that we are taking percentages accounts for the difference in totals, so a larger total of renter-occupied units does not explain the difference. What seems to be happening is that units for rent tend to be smaller than units occupied by their owners. This is more clearly seen from the comparison of the two histograms.

[Figure: histograms of number of rooms for owner-occupied and renter-occupied units]
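The rounding effect in the first answer is easy to reproduce. The sketch below uses hypothetical counts (equal-sized categories, not the Census data) and made-up helper names to show rounded percentages summing to slightly less or slightly more than 100.

```python
def rounded_percents(counts, digits=1):
    """Percent of the total in each category, rounded to `digits` decimals."""
    total = sum(counts)
    return [round(100 * count / total, digits) for count in counts]


def rounded_total(counts, digits=1):
    """Sum of the rounded percents (rounded again to hide float noise)."""
    return round(sum(rounded_percents(counts, digits)), digits)


# Hypothetical counts: three equal categories, then seven equal categories.
thirds = [1, 1, 1]
sevenths = [1] * 7

print(rounded_percents(thirds), rounded_total(thirds))      # each 33.3 -> 99.9
print(rounded_percents(sevenths), rounded_total(sevenths))  # each 14.3 -> 100.1
```

Each third rounds down to 33.3 and each seventh rounds up to 14.3, so the totals drift to 99.9 and 100.1 even though the underlying proportions add up to exactly 100%.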

Average and spread in a histogram

A histogram provides a graphical description of the distribution of a sample of data. If we want to summarize the properties of such a distribution we can measure the center and the spread of the histogram.

[Figure: Histogram of n1 (top) and Histogram of n2 (bottom); vertical axis: Density]

These two histograms correspond to samples with the same center. The spread of the sample on top is smaller than that of the sample on the bottom.

Average and median

In addition to summarizing a variable graphically to look for patterns, it's also useful to summarize it numerically. The three most useful measures of center (or central tendency) are the mean, the median (and other quantiles, or percentiles), and the mode. It turns out that the mean has the graphical interpretation of the center of gravity of the data. If you visualize the histogram of a variable as made of bricks sitting on a number line made of plywood, which in turn is put on top of a sawhorse, the mean is the place where the histogram would exactly balance.

To obtain an estimate of the center of the distribution we can calculate an average.

The average of a list of numbers equals their sum divided by how many there are.

Thus, if 18, 18, 21, 20, 19, 20, 20, 20, 19, 20 are the ages of 10 students in this class, the average is given by

$\dfrac{18 + 18 + 21 + 20 + 19 + 20 + 20 + 20 + 19 + 20}{10} = 19.5$

In the hospital data that we considered in the previous class, the data corresponded to the average length of stay of patients in each hospital in the survey. This means that the lengths of stay of all patients in a given hospital were added and the sum divided by the number of patients in that hospital. (Remember summation notation?)

Average and median

The median of a column of numbers is found by sorting the data from smallest to largest and finding the middle value in the list. If the sorted list has an odd number of elements, then the median is uniquely defined. If the sorted list has an even number of elements, the median is the mean of the two middle values.
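The two definitions above can be sketched in a few lines of Python, applied to the ages of the 10 students. The function names are illustrative; Python's standard `statistics` module offers equivalent `mean` and `median` functions.

```python
def average(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list (mean of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]

print(average(ages))  # 19.5
print(median(ages))   # 20.0
```

The list has an even number of entries, so the median is the mean of the fifth and sixth sorted values, which are both 20.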

[Figure: histogram of rainfall in Guarico, Venezuela, with the median and the mean marked; horizontal axis: mm; vertical axis: Density]

This histogram corresponds to the rainfall over periods of 10 days in an area of the central plains of Venezuela. The average or mean rainfall is mm. We observe that only about 30% of the observations are above the average. Notice that this histogram is not symmetric with respect to the average. The median of a histogram is the value with half the area to the left and half to the right. The median and average of a non-symmetric histogram are DIFFERENT.

[Figure: Histogram of dat; vertical axis: Density]

A symmetric histogram will look like this. In this case 50% of the data are above the average. In a symmetric histogram the median and the average coincide. By definition the median is the 50th percentile, although it's also sometimes useful to look at other percentiles; for example, the 25th percentile (also called the first quartile) is the place where 1/4 of the data is to the left of that place.

Average bigger than median: long right tail
Average about the same as median: symmetry
Average smaller than median: long left tail

The average is very sensitive to extreme observations, so when dealing with variables like income or rainfall, which exhibit very long tails, it is preferable to use the median as a measure of centrality. The relationship between the average and the median indicates the shape of the tails of a histogram.
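To see how sensitive the average is to one extreme observation, the sketch below reuses the ages of the 10 students and appends a single hypothetical 60-year-old (an assumption made up for this illustration). It relies on `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]
# Hypothetical extreme observation, added only to create a long right tail.
ages_with_outlier = ages + [60]

print("without outlier:", mean(ages), median(ages))
print("with outlier:   ", mean(ages_with_outlier), median(ages_with_outlier))
```

The median stays at 20 while the average moves from 19.5 to above 23, so the average ends up bigger than the median, as expected for a long right tail.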

Problem 1: According to the Department of Commerce, the mean and median price of new houses sold in the United States in mid 1988 were 141,200 and 117,800. Which of these numbers is the mean and which is the median? Explain your answer.

Problem 2: The number of deaths from cancer in the US has risen steadily over time. In 1985, about 462,000 people died of cancer, up from deaths in . A member of Congress says that these numbers show that no progress has been made in treating cancer. Explain how the number of people dying of cancer could increase even if treatment of the disease were improving. Then describe at least one variable that would be a more appropriate measure of the effectiveness of medical treatment for a potentially fatal disease.

A measure of size

Consider the sample 0, 5, -8, 7, -3. How big are these five numbers? If we consider the average as a measure of size then we obtain 0.2, which is a fairly small value compared to 7. The trouble is that in the average large negative quantities cancel large positive ones. To avoid this problem we need a measure of size that disregards signs. We proceed as follows:
1. Square all values.
2. Calculate the average of the resulting numbers.
3. Take the square root of the resulting mean.
This is called the root mean square size of the sample.
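The three steps above can be sketched as follows. The sample is taken as 0, 5, -8, 7, -3; the negative signs are an assumption inferred from the stated average of 0.2, and `rms_size` is an illustrative name.

```python
from math import sqrt


def rms_size(values):
    """Root mean square size: square, average, then take the square root."""
    squares = [v * v for v in values]          # step 1: square all values
    mean_square = sum(squares) / len(squares)  # step 2: average the squares
    return sqrt(mean_square)                   # step 3: take the square root


# Sample with signs assumed so that the average is 0.2.
sample = [0, 5, -8, 7, -3]

plain_average = sum(sample) / len(sample)
print("average:", plain_average)
print("r.m.s. size:", rms_size(sample))
```

The plain average is close to zero because the negative and positive values cancel, while the r.m.s. size (the square root of 29.4, roughly 5.4) reflects how large the entries actually are.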

For the previous data set we have

$\text{r.m.s. size} = \sqrt{\dfrac{0^2 + 5^2 + (-8)^2 + 7^2 + (-3)^2}{5}} = \sqrt{29.4} \approx 5.4$

We could have also considered the average disregarding the signs, which amounts to

$\dfrac{0 + 5 + 8 + 7 + 3}{5} = 4.6$

Unfortunately the mathematical properties of this way of measuring size are not as appealing as the ones of r.m.s.

Spread

As we saw at the beginning of the lecture, two samples can have the same center and be scattered along their ranges in different ways. To measure the way a sample is spread around its average we can use the standard deviation, or SD.

The SD of a list of numbers measures how far away they are from their average.

Thus a large SD implies that many observations are far from the overall average. Most observations will be within one SD of the average. Very few will be more than two SDs away.

Empirical Rule

SDs are a pain to compute by hand or with a calculator, and it's easy to make mistakes when doing so, so it's good to have a simple way to roughly approximate the SD of a list of numbers by looking at its histogram. If you start at the mean and go one SD either way, you'll capture about 2/3 of the data. Roughly 95% of the observations are within two SDs of the average. Roughly 99% of the observations are within three SDs of the average. These statements are more accurate when the distribution is symmetric.

Generally, more data is better than less data because more data mean less uncertainty (or a smaller give or take).

Empirical Rule and the Cereal Data Example

[Figure: Density Histogram; horizontal axis: milligrams of sodium; vertical axis: Density]

Let's look back at the cereal data example, which has a mean of about 160 mg of sodium. Using the empirical rule, what guess would you make for the SD?

Goldilocks and the Cereal Data Example

If you guessed 20 mg, that would be too small, because 140-180 mg ought to be about 2/3 of the data. If you guessed 100 mg, that would be too large, because 60-260 mg is more than 2/3 of the data. If you guessed 80 mg, then 80-240 mg ought to be about 2/3 of the data, and 0-320 mg would be about 95% of the data, which looks just about right!
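The intervals used in the Goldilocks argument are just the mean plus or minus a multiple of the guessed SD. A minimal sketch, with `sd_interval` as an illustrative helper name:

```python
def sd_interval(mean, sd, k=1):
    """Interval reaching k SDs on either side of the mean."""
    return (mean - k * sd, mean + k * sd)


MEAN_SODIUM = 160  # mg, the approximate mean quoted for the cereal data

for guess in (20, 100, 80):
    one_sd = sd_interval(MEAN_SODIUM, guess)
    two_sd = sd_interval(MEAN_SODIUM, guess, k=2)
    print(f"SD guess {guess} mg: about 2/3 in {one_sd}, about 95% in {two_sd}")
```

Each guess is then judged by comparing its one-SD interval against the histogram: the right guess is the one whose interval holds about 2/3 of the area.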

Calculating the SD

To calculate the standard deviation of a sample follow these steps:
1. Calculate the average.
2. Calculate the list of deviations from the average by taking the difference between each datum and the average.
3. Calculate the r.m.s. size of the resulting list.

SD = r.m.s. deviation from average.

Consider the list 20, 10, 15, 15. Then

$\text{average} = \dfrac{20 + 10 + 15 + 15}{4} = 15$

The list of deviations is 5, -5, 0, 0. Then

$\text{SD} = \sqrt{\dfrac{5^2 + (-5)^2 + 0^2 + 0^2}{4}} = \sqrt{12.5} \approx 3.5$

Using a calculator

Most scientific calculators will have a function to calculate the average and the SD of a sample. The steps needed to obtain those values vary from model to model. The important fact is that most calculators do not produce the SD as we have defined it here. They divide the sum of the squares of the deviations by the total number of data minus one. So, if you obtain the SD from your calculator (or spreadsheet), say SD+, then

$\text{SD} = \sqrt{\dfrac{\text{number of entries} - 1}{\text{number of entries}}} \times \text{SD}^{+}$

Some calculators have both SD and SD+. Please read the manual of your calculator regarding this fact.

Notice that the units of the SD are the same as those of the original data. So if the data were measured in years, the SD is also in years.
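The steps for the SD, and the conversion from a calculator-style value, can be sketched as below using the list 20, 10, 15, 15. The function names are illustrative; `sd_plus` stands for the calculator's version that divides by the number of entries minus one.

```python
from math import sqrt


def sd(values):
    """SD as defined here: r.m.s. size of the deviations from the average."""
    n = len(values)
    average = sum(values) / n
    deviations = [v - average for v in values]
    return sqrt(sum(d * d for d in deviations) / n)


def sd_plus(values):
    """Calculator-style SD: divide by (number of entries - 1) instead."""
    n = len(values)
    average = sum(values) / n
    return sqrt(sum((v - average) ** 2 for v in values) / (n - 1))


def sd_from_calculator(sd_plus_value, n):
    """Convert a calculator-style SD back to the SD defined here."""
    return sqrt((n - 1) / n) * sd_plus_value


data = [20, 10, 15, 15]

print("SD:", sd(data))
print("calculator SD:", sd_plus(data))
print("converted:", sd_from_calculator(sd_plus(data), len(data)))
```

For this list the SD is the square root of 12.5 (about 3.5), the calculator-style value is larger (about 4.1), and applying the conversion factor recovers the SD. In Python's standard `statistics` module, `pstdev` corresponds to the first definition and `stdev` to the second.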

Problems

Problem 1: Both the following lists have the same average of 50. Which one has the smaller SD and why? (Do no computations.)
1. 50, 40, 60, 30, 70, 25, 75
2. 50, 40, 60, 30, 70, 25, 75, 50, 50, 50
The second list has more entries at the average, so the SD is smaller.

Repeat for the following two lists:
1. 50, 40, 60, 30, 70, 25, 75
2. 50, 40, 60, 30, 70, 25, 75, 99, 1
The second list has two wild observations, 99 and 1, which are far away from the average, so the SD is larger.

Problem 2: Consider the list of numbers
1. Without doing any arithmetic, guess whether the average is around 1, 5 or 10. Only three of the numbers are smaller than 1, none are bigger than 10, so the average is around 5.
2. Without doing any arithmetic, guess whether the SD is around 1, 3 or 6. If the SD is 1, then the entries 0.6 and 9.9 are too far away from the average. The entries are too concentrated around 5 for the SD to be 6. So 3 is the most likely value.

Problem 3: The usual method for determining heart rate is to take the pulse and count the number of beats in a given time period. The results are generally reported as beats per minute; for instance, if the time period is 15 seconds, the count is multiplied by four. Take your pulse for two 15-second periods, two 30-second periods, and two 1-minute periods. Convert the counts to beats per minute and report the results. Which procedure do you think gives the best results? Why?


More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Readings: Ha and Ha Textbook - Chapters 1 8 Appendix D & E (online) Plous - Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability

More information

GCSE Statistics Revision notes

GCSE Statistics Revision notes GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

More information

Sheffield Hallam University. Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis. Glossary

Sheffield Hallam University. Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis. Glossary Sheffield Hallam University Faculty of Health and Wellbeing Professional Development 1 Quantitative Analysis Glossary 2 Using the Glossary This does not set out to tell you everything about the topics

More information

STATISTICS FOR PSYCH MATH REVIEW GUIDE

STATISTICS FOR PSYCH MATH REVIEW GUIDE STATISTICS FOR PSYCH MATH REVIEW GUIDE ORDER OF OPERATIONS Although remembering the order of operations as BEDMAS may seem simple, it is definitely worth reviewing in a new context such as statistics formulae.

More information

Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Math Review Large Print (18 point) Edition Chapter 4: Data Analysis

Math Review Large Print (18 point) Edition Chapter 4: Data Analysis GRADUATE RECORD EXAMINATIONS Math Review Large Print (18 point) Edition Chapter 4: Data Analysis Copyright 2010 by Educational Testing Service. All rights reserved. ETS, the ETS logo, GRADUATE RECORD EXAMINATIONS,

More information

10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation 10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Models for Discrete Variables

Models for Discrete Variables Probability Models for Discrete Variables Our study of probability begins much as any data analysis does: What is the distribution of the data? Histograms, boxplots, percentiles, means, standard deviations

More information

There are some general common sense recommendations to follow when presenting

There are some general common sense recommendations to follow when presenting Presentation of Data The presentation of data in the form of tables, graphs and charts is an important part of the process of data analysis and report writing. Although results can be expressed within

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab 1 Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab I m sure you ve wondered about the absorbency of paper towel brands as you ve quickly tried to mop up spilled soda from

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

STA201 Intermediate Statistics Lecture Notes. Luc Hens

STA201 Intermediate Statistics Lecture Notes. Luc Hens STA201 Intermediate Statistics Lecture Notes Luc Hens 15 January 2016 ii How to use these lecture notes These lecture notes start by reviewing the material from STA101 (most of it covered in Freedman et

More information

Data Analysis: Describing Data - Descriptive Statistics

Data Analysis: Describing Data - Descriptive Statistics WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

MCQ S OF MEASURES OF CENTRAL TENDENCY

MCQ S OF MEASURES OF CENTRAL TENDENCY MCQ S OF MEASURES OF CENTRAL TENDENCY MCQ No 3.1 Any measure indicating the centre of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of: (a) Skewness (b)

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Fall 2010 Practice Exam 1 Without Essay

Fall 2010 Practice Exam 1 Without Essay Class: Date: Fall 2010 Practice Exam 1 Without Essay Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A researcher wants to visually display the U.S. divorce

More information

Session 1.6 Measures of Central Tendency

Session 1.6 Measures of Central Tendency Session 1.6 Measures of Central Tendency Measures of location (Indices of central tendency) These indices locate the center of the frequency distribution curve. The mode, median, and mean are three indices

More information

GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

More information

Summarizing Data: Measures of Variation

Summarizing Data: Measures of Variation Summarizing Data: Measures of Variation One aspect of most sets of data is that the values are not all alike; indeed, the extent to which they are unalike, or vary among themselves, is of basic importance

More information

vs. relative cumulative frequency

vs. relative cumulative frequency Variable - what we are measuring Quantitative - numerical where mathematical operations make sense. These have UNITS Categorical - puts individuals into categories Numbers don't always mean Quantitative...

More information

Each exam covers lectures from since the previous exam and up to the exam date.

Each exam covers lectures from since the previous exam and up to the exam date. Sociology 301 Exam Review Liying Luo 03.22 Exam Review: Logistics Exams must be taken at the scheduled date and time unless 1. You provide verifiable documents of unforeseen illness or family emergency,

More information

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

COMPASS Numerical Skills/Pre-Algebra Preparation Guide. Introduction Operations with Integers Absolute Value of Numbers 13

COMPASS Numerical Skills/Pre-Algebra Preparation Guide. Introduction Operations with Integers Absolute Value of Numbers 13 COMPASS Numerical Skills/Pre-Algebra Preparation Guide Please note that the guide is for reference only and that it does not represent an exact match with the assessment content. The Assessment Centre

More information

Methods for Describing Data Sets

Methods for Describing Data Sets 1 Methods for Describing Data Sets.1 Describing Data Graphically In this section, we will work on organizing data into a special table called a frequency table. First, we will classify the data into categories.

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation

Statistical Foundations: Measures of Location and Central Tendency and Summation and Expectation Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #4-9/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location

More information

Graphical methods for presenting data

Graphical methods for presenting data Chapter 2 Graphical methods for presenting data 2.1 Introduction We have looked at ways of collecting data and then collating them into tables. Frequency tables are useful methods of presenting data; they

More information

Statistics Chapter 2

Statistics Chapter 2 Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

More information

Summarizing Your Data

Summarizing Your Data Summarizing Your Data Key Info So now you have collected your raw data, and you have results from multiple trials of your experiment. How do you go from piles of raw data to summaries that can help you

More information

An approach to Descriptive Statistics through real situations

An approach to Descriptive Statistics through real situations MaMaEuSch Management Mathematics for European Schools http://www.mathematik.unikl.de/ mamaeusch An approach to Descriptive Statistics through real situations Paula Lagares Barreiro 1 Federico Perea Rojas-Marcos

More information

1 Lesson 3: Presenting Data Graphically

1 Lesson 3: Presenting Data Graphically 1 Lesson 3: Presenting Data Graphically 1.1 Types of graphs Once data is organized and arranged, it can be presented. Graphic representations of data are called graphs, plots or charts. There are an untold

More information

Statistics Summary (prepared by Xuan (Tappy) He)

Statistics Summary (prepared by Xuan (Tappy) He) Statistics Summary (prepared by Xuan (Tappy) He) Statistics is the practice of collecting and analyzing data. The analysis of statistics is important for decision making in events where there are uncertainties.

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

More information

Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency ) plot /13/2004

Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency ) plot /13/2004 Graphs, and measures of central tendency and spread 9.07 9/13/004 Histogram If discrete or categorical, bars don t touch. If continuous, can touch, should if there are lots of bins. Sum of bin heights

More information

Sampling Distributions and the Central Limit Theorem

Sampling Distributions and the Central Limit Theorem 135 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 10 Sampling Distributions and the Central Limit Theorem In the previous chapter we explained

More information

2 Descriptive statistics with R

2 Descriptive statistics with R Biological data analysis, Tartu 2006/2007 1 2 Descriptive statistics with R Before starting with basic concepts of data analysis, one should be aware of different types of data and ways to organize data

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measures of Dispersion While measures of central tendency indicate what value of a variable is (in one sense or other) average or central or typical in a set of data, measures of

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability. Displaying data Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

More information

Cents and the Central Limit Theorem Overview of Lesson GAISE Components Common Core State Standards for Mathematical Practice

Cents and the Central Limit Theorem Overview of Lesson GAISE Components Common Core State Standards for Mathematical Practice Cents and the Central Limit Theorem Overview of Lesson In this lesson, students conduct a hands-on demonstration of the Central Limit Theorem. They construct a distribution of a population and then construct

More information

Descriptive Statistics and Measurement Scales

Descriptive Statistics and Measurement Scales Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

More information

STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I) The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

More information

CHINHOYI UNIVERSITY OF TECHNOLOGY

CHINHOYI UNIVERSITY OF TECHNOLOGY CHINHOYI UNIVERSITY OF TECHNOLOGY SCHOOL OF NATURAL SCIENCES AND MATHEMATICS DEPARTMENT OF MATHEMATICS MEASURES OF CENTRAL TENDENCY AND DISPERSION INTRODUCTION From the previous unit, the Graphical displays

More information

WHICH TYPE OF GRAPH SHOULD YOU CHOOSE?

WHICH TYPE OF GRAPH SHOULD YOU CHOOSE? PRESENTING GRAPHS WHICH TYPE OF GRAPH SHOULD YOU CHOOSE? CHOOSING THE RIGHT TYPE OF GRAPH You will usually choose one of four very common graph types: Line graph Bar graph Pie chart Histograms LINE GRAPHS

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Measures of Central Tendency. There are different types of averages, each has its own advantages and disadvantages.

Measures of Central Tendency. There are different types of averages, each has its own advantages and disadvantages. Measures of Central Tendency According to Prof Bowley Measures of central tendency (averages) are statistical constants which enable us to comprehend in a single effort the significance of the whole. The

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Chapter 2: Frequency Distributions and Graphs (or making pretty tables and pretty pictures)

Chapter 2: Frequency Distributions and Graphs (or making pretty tables and pretty pictures) Chapter 2: Frequency Distributions and Graphs (or making pretty tables and pretty pictures) Example: Titanic passenger data is available for 1310 individuals for 14 variables, though not all variables

More information

MAT 142 College Mathematics Module #3

MAT 142 College Mathematics Module #3 MAT 142 College Mathematics Module #3 Statistics Terri Miller Spring 2009 revised March 24, 2009 1.1. Basic Terms. 1. Population, Sample, and Data A population is the set of all objects under study, a

More information

Glossary of numeracy terms

Glossary of numeracy terms Glossary of numeracy terms These terms are used in numeracy. You can use them as part of your preparation for the numeracy professional skills test. You will not be assessed on definitions of terms during

More information

Utah Core Curriculum for Mathematics

Utah Core Curriculum for Mathematics Core Curriculum for Mathematics correlated to correlated to 2005 Chapter 1 (pp. 2 57) Variables, Expressions, and Integers Lesson 1.1 (pp. 5 9) Expressions and Variables 2.2.1 Evaluate algebraic expressions

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

MEI Statistics 1. Exploring data. Section 1: Introduction. Looking at data

MEI Statistics 1. Exploring data. Section 1: Introduction. Looking at data MEI Statistics Exploring data Section : Introduction Notes and Examples These notes have sub-sections on: Looking at data Stem-and-leaf diagrams Types of data Measures of central tendency Comparison of

More information