Descriptive Statistics and Exploratory Data Analysis



Similar documents
Lecture 1: Review and Exploratory Data Analysis (EDA)

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Exploratory data analysis (Chapter 2) Fall 2011

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Describing, Exploring, and Comparing Data

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Fairfield Public Schools

How To Write A Data Analysis

Description. Textbook. Grading. Objective

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

II. DISTRIBUTIONS distribution normal distribution. standard scores

Descriptive Statistics

An introduction to using Microsoft Excel for quantitative data analysis

Organizing Your Approach to a Data Analysis

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Variables. Exploratory Data Analysis

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007)

Exploratory Data Analysis. Psychology 3256

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

INTRODUCING THE NORMAL DISTRIBUTION IN A DATA ANALYSIS COURSE: SPECIFIC MEANING CONTRIBUTED BY THE USE OF COMPUTERS

Summarizing and Displaying Categorical Data

1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Module 2: Introduction to Quantitative Data Analysis

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

MTH 140 Statistics Videos

Exercise 1.12 (Pg )

Statistics. Measurement. Scales of Measurement 7/18/2012

EXPLORING SPATIAL PATTERNS IN YOUR DATA

How To Test For Significance On A Data Set

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide

TRAINING PROGRAM INFORMATICS

The Visual Statistics System. ViSta

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Analyzing and interpreting data Evaluation resources from Wilder Research

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Module 4: Data Exploration

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Lecture 2. Summarizing the Sample

GETTING YOUR DATA INTO SPSS

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Data exploration with Microsoft Excel: analysing more than one variable

Introduction to Exploratory Data Analysis

Exploratory Data Analysis

Diagrams and Graphs of Statistical Data

THE UNIVERSITY OF TEXAS AT TYLER COLLEGE OF NURSING COURSE SYLLABUS NURS 5317 STATISTICS FOR HEALTH PROVIDERS. Fall 2013

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

Hands-On Data Analysis

Geostatistics Exploratory Analysis

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Descriptive Analysis

Now we begin our discussion of exploratory data analysis.

AP Statistics: Syllabus 1

CHAPTER TWELVE TABLES, CHARTS, AND GRAPHS

Practice#1(chapter1,2) Name

Correlation and Regression

Quantitative Methods for Finance

Statistics 151 Practice Midterm 1 Mike Kowalski

A Picture Really Is Worth a Thousand Words

The importance of graphing the data: Anscombe s regression examples

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics

Week 1. Exploratory Data Analysis

Statistics Revision Sheet Question 6 of Paper 2

Introduction to Statistics and Quantitative Research Methods

Descriptive statistics parameters: Measures of centrality

Chapter 1: Exploring Data

Data exploration with Microsoft Excel: univariate analysis

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Northumberland Knowledge

DESCRIPTIVE STATISTICS & DATA PRESENTATION*

Analyzing Experimental Data

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

DATA ANALYSIS. QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University

MBA 611 STATISTICS AND QUANTITATIVE METHODS

LESSON 4 - ADVANCED GRAPHING

Cleveland State University NAL/PAD/PDD/UST 504 Section 51 Levin College of Urban Affairs Fall, 2009 W 6 to 9:50 pm UR 108

Cell Phone Impairment?

Information Technology Services will be updating the mark sense test scoring hardware and software on Monday, May 18, We will continue to score

Copyright PEOPLECERT Int. Ltd and IASSC

5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.

Simple Predictive Analytics Curtis Seare

Introduction Course in SPSS - Evening 1

2 Describing, Exploring, and

INTRODUCING DATA ANALYSIS IN A STATISTICS COURSE IN ENVIRONMENTAL SCIENCE STUDIES

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS

Foundation of Quantitative Data Analysis

MATH 103/GRACEY PRACTICE EXAM/CHAPTERS 2-3. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

5 Correlation and Data Exploration

Transcription:

Descriptive Statistics and Exploratory Data Analysis Dean s s Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux, PhD Department of Mathematics University of Tennessee at Chattanooga

What is descriptive statistics? Descriptive statistics describes your data. Visual and Numerical Inferential statistics draws inferences about a larger population. Estimation and hypothesis testing

The Big Picture Statistics Descriptive Inferential Visual Numerical Estimation Hypothesis Testing

Why descriptive statistics? To summarize our data To help us get to know our data To help us describe our data to an audience To help us explore our data.

What is Exploratory Data Analysis? Exploratory data analysis is detective work numerical detective work or counting detective work or graphical detective work - John Wilder Tukey, Exploratory Data Analysis,, page 1

Exploring our data Gives us an overall view Helps us consider basic assumptions Helps us spot oddball values Helps us avoid embarrassing oversights May help us decide on the next step

Visual Descriptions (Tools for exploring your data visually) Charts and Graphs Histogram Dotplot Stem and leaf plot Boxplot Scatterplot And many more

A simple example Grades on the first exam 84 75 83 48 70 31 39 51 57 68 55 84 89 45 53 55 69 93 54 65 75 78 88 90 91 95 88 55 55 41 47 78 20 30 40 50 60 70 80 90 100

Numerical Descriptions (Univariate,, interval data) We want to describe. The central tendency of the data What is a center point for the data? What is a typical score? The variation of the data? How much spread is there to the data? How far apart are the data values from each other?

Measures of Central Tendency The mean is the arithmetic average. Easy to calculate, easy to understand The balance point of the data The median is the score in the middle. Resistant to extreme scores

Measures of Dispersion The range. Easy to calculate and quick Range = high score low score Limited only considers two scores The standard deviation. More complicated, but Indicates a typical deviation from the mean

Childhood Respiratory Disease (playing playing with the data) Data available from OzDASL, StatSci.org FEV (forced expiratory volume) is an index of pulmonary function that measures the volume of air expelled after one second of constant effort. The data: determinations of FEV on 654 children ages 6-226 who were seen in the Childhood Respiratory Desease Study in 1980 in East Boston, Massachusetts. The data are part of a larger study to follow the change in pulmonary function over time in children. Source: Tager,, I. B., Weiss, S. T., Rosner,, B., and Speizer,, F. E. (1979). Effect of parental cigarette smoking on pulmonary function in children. American Journal of Epidemiology, 110,, 15-26. Rosner,, B. (1990). Fundamentals of Biostatistics, 3rd Edition.. PWS-Kent, Boston, Massachusetts.

Some of the Data ID Age FEV Height Sex Smoker 46951 12 3.082 63.5 Female Non 47051 13 3.297 65 Female Current 47052 11 3.258 63 Female Non 72901 12 2.935 65.5 Male Non 73041 16 4.27 67 Male Current 73042 15 3.727 68 Male Current 73751 18 2.853 60 Female Non 75852 16 2.795 63 Female Current 77151 15 3.211 66.5 Female Non

Descriptive Statistics Age FEV Height Mean 9.93 2.64 61.14 Median 10.00 2.55 61.50 Mode 9 3.08 63 Standard Deviation 2.95 0.87 5.70 Range 16 5.00 28 Minimum 3 0.79 46 Maximum 19 5.79 74

Pictures may say more

The ages look like this

And again

One variable, then two A univariate exploration Explore each data column individually A multivariate exploration Explore the relationships between two data columns

Consider natural subgroups

Raising more questions?

It starts to make sense

Something else to study?

Differentiating Subgroups

Preparing for an Audience Some Do s Pick and choose your graphs Include appropriate numbers for your type of data Include narrative Does the histogram indicate asymmetry? Are there unexpected values in the data set? Are there special problems you had to deal with to describe the data?

Preparing for an Audience (2) Some Don ts Don t t include everything that just confuses us. Don t t be redundant some graphs say the same thing. Don t t include descriptors you don t understand (kurtosis?) ask the chauffeur

Points to Remember (in no particular order) Don t t skip the simple stuff! Spend time playing with your data. Pictures say a lot. Describe the spread as well as the center. Consider the natural subgroups in your data.

Next Time Confidence Intervals, 2 x 2 tables Hypothesis Tests, and Statistical Significance Monday, February 11