Multiple Box-and-Whisker Plot




Summary

The Multiple Box-and-Whisker Plot procedure creates a plot designed to illustrate important features of a numeric data column when grouped according to the value of a second variable. The box-and-whisker plot was first described by John Tukey (1977) in his book Exploratory Data Analysis. Box-and-whisker plots summarize data samples through 5 statistics:

1. minimum
2. lower quartile
3. median
4. upper quartile
5. maximum

They can also indicate the presence of outliers.

Sample StatFolio: boxplot.sgp

Sample Data

The file 93cars.sf3 contains information on 26 variables for n = 93 makes and models of automobiles, taken from Lock (1993). The table below shows a partial list of 4 columns from that file:

Make        Model        MPG Highway   Domestic
Acura       Integra      31            0
Acura       Legend       25            0
Audi        90           26            0
Audi        100          26            0
BMW         535i         30            0
Buick       Century      31            1
Buick       LeSabre      28            1
Buick       Roadmaster   25            1
Buick       Riviera      27            1
Cadillac    DeVille      25            1
Cadillac    Seville      25            1
Chevrolet   Cavalier     36            1

The Domestic column contains a 1 for a car manufactured in the United States and a 0 otherwise.

© 2005 by StatPoint, Inc.
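As an illustration, the five statistics can be computed for the twelve rows listed in the table, grouped by the Domestic column. This is a minimal Python sketch, not Statgraphics output. The function names are hypothetical, and the quartile rule (linear interpolation through `statistics.quantiles` with `method="inclusive"`) is an assumption, so the quartiles may differ slightly from those Statgraphics reports.

```python
from collections import defaultdict
from statistics import median, quantiles

# The twelve rows shown in the partial listing: (make, model, MPG Highway, Domestic).
ROWS = [
    ("Acura", "Integra", 31, 0),
    ("Acura", "Legend", 25, 0),
    ("Audi", "90", 26, 0),
    ("Audi", "100", 26, 0),
    ("BMW", "535i", 30, 0),
    ("Buick", "Century", 31, 1),
    ("Buick", "LeSabre", 28, 1),
    ("Buick", "Roadmaster", 25, 1),
    ("Buick", "Riviera", 27, 1),
    ("Cadillac", "DeVille", 25, 1),
    ("Cadillac", "Seville", 25, 1),
    ("Chevrolet", "Cavalier", 36, 1),
]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Assumption: interpolated quartiles; other packages may use another rule.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median(data), q3, data[-1])


def summarize_by_level(rows):
    """Group MPG Highway by the Domestic level code and summarize each group."""
    groups = defaultdict(list)
    for _make, _model, mpg, domestic in rows:
        groups[domestic].append(mpg)
    return {level: five_number_summary(vals) for level, vals in sorted(groups.items())}


if __name__ == "__main__":
    for level, summary in summarize_by_level(ROWS).items():
        print(f"Domestic = {level}: {summary}")
```

Because only twelve of the 93 rows are listed, these summaries describe the excerpt, not the full file analyzed in the rest of this document.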

Data Input

The data to be analyzed consist of a numeric column containing n = 2 or more observations and a second column (numeric or character) containing identifiers by which to group the data.

Data: numeric column containing the data to be summarized.
Level codes: numeric or non-numeric column containing group identifiers.
Select: subset selection.

Analysis Summary

The Analysis Summary shows the number of observations in the data column and the number of levels (groups) into which the data have been divided.

Multiple Box-and-Whisker Plot
Dependent variable: MPG Highway
Factor: Domestic
Number of observations: 93
Number of levels: 2

Box-and-Whisker Plot

This pane displays a box-and-whisker plot for the data corresponding to each level code.

[Figure: box-and-whisker plots of MPG Highway (horizontal axis, 20 to 50), one for each level of Domestic (0 and 1)]

For each level code, a plot is constructed in the following manner:

1. A box is drawn extending from the lower quartile of the sample to the upper quartile. This is the interval covered by the middle 50% of the data values when sorted from smallest to largest.
2. A vertical line is drawn at the median (the middle value).
3. If requested, a plus sign is placed at the location of the sample mean.
4. Whiskers are drawn from the edges of the box to the largest and smallest data values, unless there are values unusually far away from the box (which Tukey calls outside points). Outside points, which are points more than 1.5 times the interquartile range (box width) above or below the box, are indicated by point symbols. Any points more than 3 times the interquartile range above or below the box are called far outside points, and are indicated by point symbols with plus signs superimposed on top of them. If outside points are present, the whiskers are drawn to the largest and smallest data values which are not outside points.

The above plot shows highway miles per gallon grouped by whether the automobile is domestic (manufactured in the United States). The upper plot displays information for non-domestic automobiles, while the lower plot displays information for domestic automobiles. The greater width of the upper box indicates that there is more variability amongst cars made outside the U.S. The difference between the median lines also shows that non-domestic automobiles tend to have higher highway miles per gallon than domestic automobiles. Four outside points can also be observed.

Pane Options

Direction: the orientation of the plot, corresponding to the direction of the whiskers.
Median Notch: if selected, a notch will be added to each plot to allow a comparison between the sample medians at the default system confidence level (set on the General tab of the Preferences dialog box on the Edit menu). The notches are constructed such that if the notches of 2 groups do not overlap, there is a statistically significant difference between the medians of those 2 groups.
Outlier Symbols: if selected, indicates the location of outside points.
Mean Marker: if selected, shows the location of the sample means as well as the medians.

Example: Notched Box-and-Whisker Plot

The following plot shows the addition of a median notch at the 95% confidence level.

[Figure: notched box-and-whisker plots of MPG Highway (horizontal axis, 20 to 50), one for each level of Domestic (0 and 1)]

The notch covers the interval

$$\tilde{x} \pm z_{\alpha/2} \, \frac{1.25\,\mathrm{IQR}}{1.35\sqrt{n}} \cdot \frac{1}{2}\left(1 + \frac{1}{\sqrt{2}}\right) \qquad (1)$$

where $\tilde{x}$ is the median of the data in the group, IQR is the sample interquartile range, n is the sample size, and $z_{\alpha/2}$ is the upper α/2 critical value of the standard normal distribution (1.96 for a 95% confidence level). In the above plot, the 2 notches overlap horizontally, indicating that the medians of the two groups of data are not significantly different at the 5% significance level.
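The whisker rule and the notch interval described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Statgraphics implementation: the function names are hypothetical, the quartile rule (`statistics.quantiles` with `method="inclusive"`) is an assumption, and the notch half-width follows equation (1) as reconstructed from the surrounding definitions.

```python
from math import sqrt
from statistics import NormalDist, median, quantiles


def _quartiles(data):
    # Assumption: interpolated quartiles; Statgraphics may use a different rule.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q1, q3


def box_plot_elements(values):
    """Return the box edges, whisker ends, outside and far outside points."""
    data = sorted(values)
    q1, q3 = _quartiles(data)
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outside beyond these
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # far outside beyond these
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    far_outside = [x for x in data if x < outer_lo or x > outer_hi]
    # Listed separately here: outside points that are not also far outside.
    outside = [x for x in data
               if (x < inner_lo or x > inner_hi) and outer_lo <= x <= outer_hi]
    return {
        "box": (q1, q3),
        "whiskers": (inside[0], inside[-1]),  # extreme values that are not outside
        "outside": outside,
        "far_outside": far_outside,
    }


def median_notch(values, confidence=0.95):
    """Return the (lower, upper) ends of the median notch from equation (1)."""
    data = sorted(values)
    q1, q3 = _quartiles(data)
    iqr = q3 - q1
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 when confidence=0.95
    half_width = (z * (1.25 * iqr) / (1.35 * sqrt(len(data)))
                  * (1.0 + 1.0 / sqrt(2.0)) / 2.0)
    center = median(data)
    return center - half_width, center + half_width


def notches_overlap(a, b):
    """True if two notch intervals overlap (medians not significantly different)."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 16, 30]
    print(box_plot_elements(sample))
    print(median_notch(sample))
```

Comparing two groups then amounts to calling `median_notch` on each group and passing both intervals to `notches_overlap`, which mirrors the visual check of whether the notches overlap in the plot.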