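Variable - what we are measuring
Quantitative - numerical values for which arithmetic operations make sense. These have UNITS.
Categorical - puts individuals into categories.
Numbers don't always mean quantitative: a zip code is a number, but it is a categorical variable.

Frequency vs. relative frequency vs. cumulative frequency vs. relative cumulative frequency
Frequency - the count of individuals in a class
Relative frequency - the count in a class divided by the total count
Cumulative frequency - the running total of counts up to and including a class
Relative cumulative frequency - the cumulative frequency divided by the total count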
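The four kinds of frequency can be compared side by side with a short Python sketch. The function name frequency_table and the sibling counts are made up for illustration; they are not from the notes.

```python
from collections import Counter

def frequency_table(values):
    """Return one row per distinct value:
    (value, frequency, relative frequency,
     cumulative frequency, relative cumulative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # running total of counts seen so far
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((value, freq, freq / n, running, running / n))
    return rows

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
for row in frequency_table(siblings):
    print(row)
```

The last row always has a cumulative frequency equal to the number of observations and a relative cumulative frequency of 1.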

Two-Way Tables and Marginal Distributions

Distributions are of VARIABLES, not individual values!

To examine a marginal distribution:
1) Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals.
2) Make a graph to display the marginal distribution.

Note: Percents are often more informative than counts, especially when comparing groups of different sizes.
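Step 1 above can be sketched in Python as follows. The two-way table, its labels, and the function name marginal_distribution are hypothetical and chosen only to keep the arithmetic easy to check.

```python
# Hypothetical two-way table of counts: rows are one categorical variable
# (sex), columns are another (answer to a yes/no survey question).
table = {
    "Female": {"Yes": 30, "No": 20},
    "Male": {"Yes": 10, "No": 40},
}

def marginal_distribution(table, axis="row"):
    """Percent of ALL individuals that fall in each row (or column) total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    if axis == "row":
        totals = {name: sum(row.values()) for name, row in table.items()}
    else:
        totals = {}
        for row in table.values():
            for column, count in row.items():
                totals[column] = totals.get(column, 0) + count
    return {name: 100 * total / grand_total for name, total in totals.items()}

print(marginal_distribution(table, axis="row"))     # distribution of sex
print(marginal_distribution(table, axis="column"))  # distribution of answers
```

With these counts the row totals are 50 and 50 out of 100, and the column totals are 40 and 60 out of 100, so each marginal distribution sums to 100 percent.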

A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.

To examine or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution. Use a side-by-side bar graph or segmented bar graph to compare distributions.

There are three main ways to display quantitative data:
-Dotplots
-Stemplots
  -split
  -back-to-back
-Histograms
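Before moving on to quantitative displays, here is a small Python sketch of the conditional-distribution steps described above. The table and the name conditional_distribution are hypothetical. The key point is that each percent is taken out of the selected row's total, not out of the grand total.

```python
# Hypothetical two-way table of counts (rows: sex, columns: survey answer).
table = {
    "Female": {"Yes": 30, "No": 20},
    "Male": {"Yes": 10, "No": 40},
}

def conditional_distribution(table, row):
    """Percents of each column category among individuals in one row."""
    counts = table[row]
    row_total = sum(counts.values())
    return {column: 100 * count / row_total for column, count in counts.items()}

for sex in table:
    print(sex, conditional_distribution(table, sex))
```

Here 60 percent of the females but only 20 percent of the males answered "Yes", which is the kind of comparison a side-by-side or segmented bar graph would display.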

How to create a dotplot:
1) Draw a horizontal axis (a number line) and label it with the variable name.
2) Scale the axis from the minimum to the maximum value.
3) Mark a dot above the location on the horizontal axis corresponding to each data value.

How to make a stemplot:
1) Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2) Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3) Write each leaf in the row to the right of its stem. Arrange the leaves in increasing order out from the stem.
4) Provide a key that explains in context what the stems and leaves represent.

Splitting Stems and Back-to-Back Stemplots
When data values are bunched up, we can get a better picture of the distribution by splitting stems. Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems.

How to make a histogram:
1) Divide the range of data into classes of equal width.
2) Find the count (frequency) or percent (relative frequency) of individuals in each class.
3) Label and scale your axes and draw the histogram. The height of each bar equals the frequency (or relative frequency) of its class. Adjacent bars should touch, unless a class contains no individuals.
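The stemplot and histogram steps above can be sketched in Python. This is only an illustration under stated assumptions: the data are non-negative whole numbers, each class includes its left endpoint but not its right, and the names stemplot, class_counts, and scores are made up for this example.

```python
from collections import defaultdict

def stemplot(values):
    """Text stemplot: stem = all but the final digit, leaf = final digit."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting puts leaves in increasing order
        leaves[v // 10].append(v % 10)
    lines = []
    # List every possible stem from smallest to largest, even empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

def class_counts(values, start, width, n_classes):
    """Frequencies for equal-width classes [start, start + width), and so on."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

# Hypothetical test scores.
scores = [62, 67, 71, 74, 75, 78, 80, 83, 85, 85, 91, 98]

print("\n".join(stemplot(scores)))
print("Key: 6 | 2 means a score of 62")
print(class_counts(scores, start=60, width=10, n_classes=4))
```

With classes of width 10 starting at 60, the class counts match the number of leaves on each stem, which is why a stemplot turned on its side resembles a histogram.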

How to make a histogram (using your calculator):
1. Enter the data into L1. (Press the STAT button, highlight EDIT, select choice #1, and press ENTER.)
2. Turn on the stat plot. (Press 2nd and the Y= button to select STAT PLOT, highlight choice #1 and press ENTER, select ON and press ENTER, then select the histogram under TYPE and press ENTER.)
3. Adjust your window. (Press the WINDOW button. For Xmin, enter a minimum value smaller than the smallest observation. For Xmax, enter a maximum value larger than the largest observation. For Xscl, enter the width of your classes, i.e. what you are counting by to get from Xmin to Xmax. Set Ymin = 0 and adjust Ymax appropriately.) OR go to ZOOM and select #9 ZoomStat.

Using Histograms Wisely
Here are several cautions based on common mistakes students make when using histograms.
1) Don't confuse histograms and bar graphs.
2) Don't use counts (in a frequency table) or percents (in a relative frequency table) as data.
3) Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
4) Just because a graph looks nice, it's not necessarily a meaningful display of data.

Relative Frequency Histogram
This type of histogram displays proportions or percents rather than counts.

Cumulative Frequency Histogram (Ogive)
This type of display shows the running total of counts (or percents) up to and including each class.

Examine the Distribution
Look for the OVERALL pattern and any striking DEVIATIONS from that pattern. Describe the shape, center, and spread, and determine if there are any outliers (don't forget your SOCS: Shape, Outliers, Center, Spread!).

Shape
Skewed or symmetric?
Symmetric - the left and right sides of the histogram are approximately mirror images of each other
Skewed right - the right side of the histogram extends MUCH farther out than the left side ("tail" goes to the right)
Skewed left - the left side of the histogram extends MUCH farther out than the right side ("tail" goes to the left)
Uniform distribution - doesn't appear to have any modes; pretty much the same height across the whole distribution

Measures of Center

We have two ways of numerically measuring the center of a quantitative data set - the median and the mean. Both of these can be considered to give us the "average" of a data set.

Some issues with notation: There are two ways to write the mean, x̄ (sample mean) and μ (population mean). The choice depends on whether you are talking about the entire POPULATION of interest or just a SAMPLE from that population. Unless you are 100% positive you have the data from the ENTIRE population, use x̄. If you see μ being used, then the data must be from the entire population.

Comparing the Mean and Median
In a symmetric distribution the mean and median are VERY close together. In a skewed distribution the mean will be greater than or less than the median, depending upon the skew. The larger the difference between the two, the greater the skew.
If the mean is greater than the median, this suggests the distribution is skewed right.
If the mean is smaller than the median, this suggests the distribution is skewed left.
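A quick numerical check of the mean-versus-median comparison, using Python's standard statistics module. The salary figures are hypothetical and chosen so that one value sits far out in the right tail.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; 250 is far above the rest.
salaries = [30, 32, 35, 36, 40, 41, 45, 250]

print(mean(salaries))    # 63.625, pulled toward the long right tail
print(median(salaries))  # 38.0, barely affected by the extreme value

# Remove the extreme value and the two measures nearly agree again.
trimmed = salaries[:-1]
print(mean(trimmed), median(trimmed))  # 37 and 36
```

The mean exceeds the median by a wide margin only while the extreme value is present, which is the pattern the notes associate with a right skew.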

Measures of Spread

As with measures of center, we have two different ways to measure the spread in quantitative data - the quartiles and IQR, and the standard deviation and variance.

Standard Deviation (written as σ for a population or s for a sample) and Variance (written as σ² for a population or s² for a sample)
-The standard deviation gives a measure of the "average" distance that data points fall from the mean.
-s = 0 ONLY when there is NO SPREAD. This happens only when every observation is the SAME; otherwise s > 0.
-The more spread out the observations are, the greater s will be.
-s has the same units of measurement as the observations.
-Like the mean, s is not resistant.

Choosing measures of center and spread

1. FIVE-NUMBER SUMMARY or Median and IQR
-The five-number summary gives a quick summary of both the center and spread of your data. It contains the minimum observation, Q1, the median, Q3, and the maximum observation.
-Some people also consider giving the IQR (Q3 - Q1) with the median to be a sufficient measure of center and spread.
-Use when the distribution is skewed or has strong outliers.
-Used to create another graphical display of quantitative data - the BOXPLOT.

2. The Mean and Standard Deviation
-Use for reasonably symmetric distributions that are free of outliers.
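Both kinds of summary can be computed by hand in a few lines of Python. This sketch makes two assumptions that the notes do not spell out: the sample standard deviation uses the usual n - 1 divisor, and the quartiles are the medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd), which is the common textbook convention. Other software may use slightly different quartile rules. The function names and data are made up.

```python
from math import sqrt
from statistics import median

def sample_sd(values):
    """Sample standard deviation s, using the n - 1 divisor."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

def five_number_summary(values):
    """(min, Q1, median, Q3, max) with quartiles as medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, the overall median is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

data = [2, 4, 4, 5, 7, 8, 12]  # hypothetical observations

print(five_number_summary(data))  # (2, 4, 5, 8, 12), so IQR = 8 - 4 = 4
print(sample_sd(data))            # square root of 11, about 3.317
print(sample_sd([5, 5, 5]))       # 0.0: no spread when all values are equal
```

The last line illustrates the point that s = 0 only when every observation is the same.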

Boxplot

A graph of the five-number summary.
-A central box spans the quartiles, Q1 and Q3, with a line marking the median, M.
-Lines extend from the edges of the box (Q1 and Q3) out to the minimum and maximum values, respectively.
-IF THERE ARE OUTLIERS: DO NOT extend the lines to outliers. Only extend to the minimum and maximum values that are NOT outliers. Mark outliers with an asterisk.

How to use the calculator for numerical summaries and boxplots:
1. Enter the data into L1. (Press the STAT button, highlight EDIT, select choice #1, and press ENTER.)

For Numerical Summaries:
2. Press the STAT button and arrow over to CALC.
3. Select 1-Var Stats.
4. You will get a list of values on your main screen. Arrow through to find all necessary values:
-mean
-standard deviation
-minimum observation
-Q1
-median
-Q3
-maximum observation

For Boxplot:
2. Turn on the stat plot. (Press 2nd and the Y= button to select STAT PLOT, highlight choice #1 and press ENTER, select ON and press ENTER.)
3. Select the FIRST boxplot option under "TYPE" - this one graphs outliers.
4. Adjust your window. (ZOOM, select #9 ZoomStat)
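The notes say to stop the whiskers short of outliers but do not state how an outlier is identified. The Python sketch below assumes the common 1.5 × IQR rule (a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3) and the median-of-halves quartile convention. The function name boxplot_parts and the data are hypothetical.

```python
from statistics import median

def boxplot_parts(values):
    """Pieces needed to draw a boxplot that marks outliers separately.
    Assumes the 1.5 * IQR rule for flagging outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],    # smallest value that is NOT an outlier
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_high": inside[-1],  # largest value that is NOT an outlier
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

data = [2, 4, 4, 5, 7, 8, 30]  # hypothetical observations
print(boxplot_parts(data))
```

For this data Q1 = 4 and Q3 = 8, so the fences are at -2 and 14. The value 30 is flagged as an outlier, and the upper whisker stops at 8 instead of reaching 30.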