Comments 2 For Discussion Sheet 2 and Worksheet 2 Frequency Distributions and Histograms


Discussion Sheet 2

We have studied graphs (charts) used to represent categorical data. We now want to look at a table and a kind of graph for representing numerical (as opposed to categorical) data: frequency distributions and histograms.

We sometimes want to make tables that show us the shape of a numerical data set: for which numbers there are the most cases, whether it is skewed towards small or large values or is fairly symmetrical, whether there are large gaps in the data, whether there are unusually large or small values, and so on. It is important to get a feel for the shape or structure of a data set with which you are working.

Constructing Frequency Distributions

To make a frequency distribution or relative frequency distribution, follow these steps:

1. Calculate the range of the data: Range = largest value - smallest value.
2. Determine how many (between 5 and 20) classes (intervals) of data will be needed to cover the range. The rule of thumb is to use a number of classes approximately equal to the square root of the number of values in the data set, but no more than 20 and no fewer than 5. The idea is to pick a number of classes that will show the structure of the data without picking so many classes that there will only be a few numbers in each class.
3. Divide the range by the number of classes (intervals) to determine the width of each interval. The classes will all be of equal width, that is, each consists of an interval of the range that is the same size as each of the other intervals.
4. Determine the upper and lower bound of each class (interval). You are dividing the range into a set of intervals that do not overlap and that together cover the range from smallest to largest value. Adjust the bounds of each class so that no bound is a number in the data set. For example, if the data in the set are integers, you might change the boundaries to end with .5. We want each value in the data set to fall into one and only one of these classes.
5. Determine the number of data values that fall into each class. This is called the class frequency for that class.
6. Make a table listing the classes in one column and the class frequencies next to them in another column. This kind of table is called a frequency distribution.
7. Alternatively, we could determine what percentage of the total number of data values from the data set lie in each class by dividing the class frequency by the total and multiplying by 100. This is called the class relative frequency.
8. We could make a table, just as in Step 6 but using the class relative frequencies instead of the class frequencies. This kind of table is called a relative frequency distribution.
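The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the worksheet: the function names are made up, the square-root rule is rounded to the nearest whole number before the 5-to-20 limits are applied, and instead of moving the boundaries to values ending in .5, a value that lands exactly on a boundary is counted in the class to its right (the maximum stays in the last class).

```python
import math

def choose_num_classes(n):
    """Rule of thumb: about sqrt(n) classes, kept between 5 and 20."""
    return max(5, min(20, round(math.sqrt(n))))

def frequency_distribution(values, num_classes=None):
    """Return (lower, upper, frequency) tuples for equal-width classes."""
    k = num_classes or choose_num_classes(len(values))
    low, high = min(values), max(values)
    width = (high - low) / k  # Step 3: range divided by the number of classes
    if width == 0:
        raise ValueError("all values are equal, so there is no range to divide")
    counts = [0] * k
    for v in values:
        # Assumption: a value on a boundary goes to the class on its right,
        # except the maximum, which is kept in the last class.
        index = min(int((v - low) / width), k - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]

if __name__ == "__main__":
    for lower, upper, freq in frequency_distribution([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], 5):
        print(f"{lower:g}-{upper:g}: {freq}")
```

Dividing each frequency by the total and multiplying by 100 (Step 7) would turn the same output into a relative frequency distribution.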

1. The data on female cholesterol for a sample of 20 are given below. Make a frequency distribution for this data set.

Sex   Cholesterol
FEM   215
FEM   257
FEM   212
FEM   238
FEM   163
FEM   171
FEM   196
FEM   187
FEM   405
FEM   232
FEM   155
FEM   309

The frequency distribution should have 5 to 20 classes. The approximate number is √20 ≈ 4.47214 ≈ 5. If we extend the data from 150 (below the lowest value of 155) to 450 (above the highest value of 405), we could use 6 classes of size 50 to get from 150 to 450. Let's do that. So our class size is (450 - 150)/6 = 300/6 = 50.

Since we want non-overlapping classes (intervals) that cover the range from 150 to 450, we could have 150-200, 200-250, 250-300, 300-350, 350-400, and 400-450. Since the numbers 150, 200, 250, 300, 350, 400 and 450 do not appear among the values in our data set, we can use these as boundaries for the classes and have exactly one class into which to put every number in the data set. If one of these numbers, say 200, was among the data values, we could avoid problems by setting the boundaries as 150.5, 200.5, 250.5 and so on (since all of our data values are integers and thus none of them could be one of these boundaries).

With these classes, we then make a table and show the frequency of values in each class. If we look at the set of classes, we see this distribution of the data values:

150-200   167, 167, 198, 198, 163, 171, 196, 187, 155
200-250   234, 215, 212, 238, 234, 232
250-300   271, 257, 271
300-350   309
350-400   (none)
400-450   405

Of course, for a frequency distribution, we don't want the actual values in each class but the frequency (number of values) in each class. Counting them from above, we get

Class     Frequency
150-200   9
200-250   6
250-300   3
300-350   1
350-400   0
400-450   1
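The count above can be checked with a short Python sketch. The 20 values are taken from the class-by-class listing; the variable names are illustrative. Because no data value equals a class boundary, strict inequalities are enough here, and the final check confirms that every value landed in exactly one class.

```python
# The 20 cholesterol values, as listed class by class above.
cholesterol = [167, 167, 198, 198, 163, 171, 196, 187, 155,
               234, 215, 212, 238, 234, 232,
               271, 257, 271, 309, 405]

# Class boundaries 150, 200, ..., 450 (6 classes of width 50).
boundaries = list(range(150, 451, 50))

frequencies = {}
for lower, upper in zip(boundaries, boundaries[1:]):
    # No data value sits on a boundary, so strict comparisons are safe here.
    frequencies[f"{lower}-{upper}"] = sum(lower < v < upper for v in cholesterol)

# Every value should be counted once and only once.
assert sum(frequencies.values()) == len(cholesterol)

print("Class     Frequency")
for label, count in frequencies.items():
    print(f"{label}   {count}")
```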

2. Make a relative frequency distribution for these data.

Once we have a frequency distribution, the relative frequency distribution is easy to find. We just need to convert the frequency for each class into a percentage by dividing by the total number of data values and multiplying by 100:

100 × 9/20 = 45%
100 × 6/20 = 30%
100 × 3/20 = 15%
100 × 1/20 = 5%

This then gives us the relative frequency distribution:

Class     Percent
150-200   45
200-250   30
250-300   15
300-350   5
350-400   0
400-450   5

3. How are the frequency distribution and the relative frequency distribution the same and how are they different?

Both distributions have the same classes. For the frequency distribution, the actual count or frequency of data values in each class is shown. For the relative frequency distribution, the percentage of the total number of data values in each class is shown. They show the same shape (center, spread, skew, gaps, unusually high or low values, etc.), but it may be easier to judge sizes in percentages than in actual counts, especially when the number of data values in the data set is large.

Constructing Histograms

A histogram (a term usually credited to the statistician Karl Pearson) is essentially a bar graph (usually vertical) in which each category is a class from a frequency distribution or relative frequency distribution. Because the classes of a frequency distribution form a continuous set of intervals covering the range of data, the bars of a histogram lie next to each other and are not separated by spaces.

One axis (scale) of the graph is the set of categories from the frequency distribution. The other axis (scale) can be the number of data values from each class. If so, this is called a histogram and that axis is labeled Number. It can, alternatively, be the relative frequency of each class. If so, this is called a relative frequency histogram and the axis is labeled Percent.
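As a rough illustration, the sketch below converts class frequencies to percentages and draws a plain-text histogram. The function names are made up for this example, and the bars are drawn sideways (one row per class) only because that is easier to print than vertical bars; the rows are adjacent, with no gaps between classes.

```python
classes = ["150-200", "200-250", "250-300", "300-350", "350-400", "400-450"]
counts = [9, 6, 3, 1, 0, 1]

def relative_frequencies(frequencies):
    """Class relative frequency: class frequency / total * 100."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

def text_histogram(labels, heights):
    """One row per class; the bar length is the height rounded to a whole mark."""
    return [f"{label:>8} | {'#' * round(h)}" for label, h in zip(labels, heights)]

print("Number")
print("\n".join(text_histogram(classes, counts)))
print("Percent")
print("\n".join(text_histogram(classes, relative_frequencies(counts))))
```

Passing the counts gives the histogram; passing the percentages gives the relative frequency histogram for the same classes.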

4. Make a histogram of the cholesterol data.

[Figure: histogram of the cholesterol data. Horizontal axis: F Chol, with ticks at 150 and 300. Vertical axis: ticks at 2, 4, 6, 8 and 10.]

5. Which makes it easier to see the structure or shape of the data set, the frequency distribution or the histogram?

For most people it is easier to get a sense of the shape or structure of a distribution (center, spread, skew, gaps, unusually high or low values, etc.) from a picture than from a table of numbers. This means that for most people the histogram makes it easier to see the structure or shape of the data set than the frequency distribution does.

Worksheet 2

The data on female cholesterol for a sample of 20 used in Discussion Sheet 2 are given below:

Sex   Cholesterol
FEM   215
FEM   257
FEM   212
FEM   238
FEM   163
FEM   171
FEM   196
FEM   187
FEM   405
FEM   232
FEM   155
FEM   309

1. Make a relative frequency histogram of these data (you probably will want to use the relative frequency distribution you made in Discussion Sheet 2).

[Figure: relative frequency histogram of the cholesterol data. Horizontal axis: F Chol, with ticks at 150 and 300. Vertical axis: Percent, with ticks at 10, 20, 30, 40 and 50, shown alongside the count scale 2, 4, 6, 8 and 10.]

2. How are the size and shape of this relative frequency histogram the same as, and how are they different from, the histogram you made for the same data in Discussion Sheet 2?

The size and shape of the histogram and the relative frequency histogram are the same. The only difference is that the vertical axis is scaled with numbers (frequencies) for the histogram and with percents for the relative frequency histogram.

3. Which reveals more about the shape or structure of the data: the relative frequency distribution or the relative frequency histogram for the same data?

For most people it is easier to get a sense of the shape or structure of a distribution (center, spread, skew, gaps, unusually high or low values, etc.) from a picture than from a table of numbers. This means that for most people the relative frequency histogram makes it easier to see the structure or shape of the data set than the relative frequency distribution does.

4. Which reveals more about the shape or structure of the data: the histogram or the relative frequency histogram for the same data?

Since their size and shape are exactly the same, they reveal the same thing about the shape and structure of the data, so neither reveals more than the other.

5. Why might you use a relative frequency histogram instead of a simple histogram to picture a data set?

When there are a large number of values or an unusual number of values (for example, 23), percentages are more familiar and easier to interpret than the actual counts would be. In that case, we might get a better sense of the data from a relative frequency histogram than from a simple histogram.
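The claim that the two histograms have the same shape can be checked numerically. In this small Python sketch (names are illustrative), every bar is expressed as a fraction of the tallest bar; since each percentage is just the count multiplied by the same factor, 100 divided by the total, the two sets of fractions come out identical.

```python
counts = [9, 6, 3, 1, 0, 1]
total = sum(counts)
percents = [100 * c / total for c in counts]

def normalized(heights):
    """Express each bar as a fraction of the tallest bar."""
    tallest = max(heights)
    return [h / tallest for h in heights]

# Same relative bar heights means the same shape; only the axis scale differs.
same_shape = normalized(counts) == normalized(percents)
print(same_shape)
```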