Summarizing and Displaying Categorical Data




Categorical data can be summarized in a frequency distribution, which counts the number of cases (the frequency) that fall into each category, or in a relative frequency distribution, which measures the proportion (or percentage) of the data set within each category.

Categorical data can be visualized in a bar graph. Bars, labelled by category, have heights determined by the frequency (or relative frequency) of data in that category. Bars should be separated by gaps across the display. [Excel: Data > PivotTable; Insert > Charts > Column > Clustered Column]

A pie chart represents categories with labeled sectors in a circle; the proportion of data in a category equals the proportion of the circle's area assigned to its sector. It's best not to use a pie chart when the number of categories is large. [Excel: Data > PivotTable; Insert > Charts > Pie > Pie]
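The frequency and relative frequency distributions described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides (which use Excel): the function name frequency_table and the flavor data are made up for the example.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for categorical data."""
    counts = Counter(values)
    n = len(values)
    return {category: (freq, freq / n) for category, freq in counts.items()}


# Hypothetical data, for illustration only.
flavors = ["Cherry", "Apple", "Cherry", "Blueberry", "Apple", "Cherry"]
table = frequency_table(flavors)

for category, (freq, rel) in sorted(table.items()):
    # Crude text bar graph: one '#' per case, one bar per line.
    print(f"{category:<10} {freq:>2} {rel:6.1%} {'#' * freq}")
```

The relative frequencies always sum to 1, which is a quick check that no case was dropped from the tally.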

Summarizing Categorical Data

The most common categorical data involves a variable with only two possible values: either the individual being measured possesses some characteristic of interest, or it doesn't. The resulting categories can be referred to as Success and Failure.

proportion of successes (p): statistic that summarizes the data set by recording the proportion of data values which are Successes:

$$p = \frac{\text{number of Successes}}{n},$$

where n represents the number of values in the data set

proportion of failures (q): statistic that summarizes the data set by recording the proportion of data values which are Failures:

$$q = \frac{\text{number of Failures}}{n};$$

since there are only two categories, we always have $q = 1 - p$.
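As a small worked example of the two proportions, here is a sketch in Python. The function name success_proportions and the yes/no responses are hypothetical; only the formulas for p and q come from the text.

```python
def success_proportions(values, success):
    """Return (p, q): the proportions of Successes and Failures in values."""
    n = len(values)
    if n == 0:
        raise ValueError("the data set must contain at least one value")
    successes = sum(1 for v in values if v == success)
    p = successes / n          # p = number of Successes / n
    q = (n - successes) / n    # q = number of Failures / n
    return p, q


# Hypothetical survey responses; "yes" is treated as a Success.
responses = ["yes", "no", "yes", "yes", "no"]
p, q = success_proportions(responses, "yes")
print(p, q)
```

With three Successes among five values, p is 3/5 and q is 2/5, so the two proportions sum to 1 as expected.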

Displaying Quantitative Data

Numerical data can be visualized with a histogram. Data are separated into (usually equal) intervals along a numerical scale, called classes, then the frequency of data in each class is tallied. Bars are built over each interval with heights, measured along a vertical scale, given by the frequency (or relative frequency) of data within each class. [Excel: Data > PivotTable; PivotTableTools > Options > Group > Group Field; Insert > Charts > Column > Clustered Column; Format Data Series > Series Option > Gap Width > No Gap]

A polygon display is obtained by replacing the bars of a histogram with a broken line joining points plotted at the midpoints of the tops of the bars for each class interval. [Excel: build histogram, then... Change Series Chart Type > Line > Line with Markers]
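The tallying step behind a histogram, and the points a polygon display joins, might be sketched as follows. The function name class_frequencies, the choice of half-open classes [lower, upper), and the score data are assumptions made for this illustration.

```python
def class_frequencies(data, low, width, num_classes):
    """Tally data into equal-width classes [low + k*width, low + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = int((x - low) // width)  # index of the class containing x
        if not 0 <= k < num_classes:
            raise ValueError(f"value {x} falls outside the classes")
        counts[k] += 1
    return counts


# Hypothetical exam scores, grouped into classes 50-60, 60-70, ..., 90-100.
scores = [52, 67, 71, 75, 78, 80, 83, 85, 88, 94]
low, width, num_classes = 50, 10, 5
counts = class_frequencies(scores, low, width, num_classes)

# Bar heights of the histogram, one text bar per class.
for k, count in enumerate(counts):
    lower = low + k * width
    print(f"{lower:>3}-{lower + width:<3} {'#' * count}")

# Points joined by the polygon display: (class midpoint, frequency).
midpoints = [low + (k + 0.5) * width for k in range(num_classes)]
polygon_points = list(zip(midpoints, counts))
```

Dividing each count by len(scores) would give the relative frequency version of the same display.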

A cumulative frequency distribution records the number of observations that fall at or below the upper limits of each class; a cumulative relative frequency distribution records the proportion of observations that fall at or below the upper limits of the classes. The histogram-like display of the cumulative (relative) frequency distribution formed by erecting bars over each class is called an ogive. [Excel: build polygon, then... PivotTable Field List > Values > Value Field Settings > Show Values As > % Running Total In]

A quick way to display numerical data by hand is with a stem-and-leaf display. All but the rightmost digit (or digits) of the measurement become stems; stems head rows in which the remaining digit(s), the leaves, are listed, lined up vertically in columns. (List all intermediate stems, even if they contain no leaves!)
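Both ideas above can be sketched in Python. The class counts and the data are hypothetical, and the stem-and-leaf function assumes non-negative integers with a single-digit leaf; the helper names are made up for this example.

```python
from itertools import accumulate


def cumulative(counts):
    """Running totals of class frequencies (cumulative frequency distribution)."""
    return list(accumulate(counts))


def stem_and_leaf(data):
    """Rows of a stem-and-leaf display for non-negative integers.

    The stem is all but the last digit; the leaf is the last digit.
    Every intermediate stem is listed, even if it has no leaves.
    """
    data = sorted(data)
    rows = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for x in data:
        rows[x // 10].append(x % 10)
    return [f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in rows.items()]


# Hypothetical class frequencies for five consecutive classes.
counts = [1, 1, 3, 4, 1]
cum_counts = cumulative(counts)
cum_relative = [c / cum_counts[-1] for c in cum_counts]

# Hypothetical measurements; note that no value has stem 2.
lines = stem_and_leaf([12, 15, 38, 39, 40])
print("\n".join(lines))
```

The last cumulative frequency equals the total number of observations, so the last cumulative relative frequency is always 1.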

Describing Quantitative Data: Features of Interest

The shape of a histogram or stem-and-leaf display describes the distribution of the data: where data is concentrated and how it spreads out across the entire range of values.

Where is the center of the distribution located?

How much spread is there in the distribution? How tightly is the data clustered about the center?

Is there more than one cluster, or mode? Is the data unimodal, bimodal, multimodal? Note: The location of modes can change with the scaling unit of a display (width of a bar).

Is the distribution uniform (has a flat contour), indicating that every value is (roughly) equally represented?

Is it roughly symmetric, with equally frequent values on either side of the center (the distribution to the right of the center is the mirror image of what appears to the left)? Or is it skewed (heavier on one side of the center than the other) to the left or right, in the direction of the tail (region of most extreme values)?

Are there any outliers (values located very far from the center)? Can we explain why they appear?

Displaying Paired Numerical Data

Paired numerical data sets are quite common in statistical practice. This occurs when two Whats are measured for the same set of Whos. Often the goal is to determine whether values of one of the variables are affected by changes in values of the other variable.

response (dependent) variable: measures a characteristic of interest in a study; the aim is to determine how this variable is affected by variation in some other quantity, namely, an...

explanatory (independent or predictor) variable: a variable which may turn out to influence the outcome of the response variable

scatterplot: display of paired data as points (x, y) in a coordinate plane; here, x represents the explanatory variable, y the response variable [Excel: Insert > Charts > Scatter > Scatter with only Markers]

To investigate the possible relationship between the variables, look for overall patterns in the plot and be on the watch for outliers (points located far from the region where most data are clustered) or deviations from the overall patterns.

association: tendency for change in one variable to be accompanied by change in the other

direction: variables display a positive association if larger values of one tend to be paired with larger values of the other, and a negative association if larger values of one tend to be paired with smaller values of the other

form: shape of the plot, including clusters of data points; linear relationships are most important

strength: how closely the points conform to the overall shape of the plot
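The slides describe direction and strength only visually. One common numerical companion, when the form is roughly linear, is the Pearson correlation coefficient; it is not defined in the text above, so the sketch below is an added assumption. Its sign matches the direction of association, and values near -1 or 1 suggest the points lie close to a line. The data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data: x is the explanatory variable, y the response.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 79]
r = pearson_r(hours, scores)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f} ({direction} association)")
```

A number like this should be read alongside the scatterplot, not in place of it: outliers and non-linear forms can make the coefficient misleading.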