Basics and Beyond: Displaying Your Data. Mario Davidson, PhD Vanderbilt University School of Medicine Department of Biostatistics Instructor




Objectives
1. Understand the types of data and levels of measurement.
2. Understand how a Table 1 typically looks.
3. Be able to interpret all of the basic graphs.
4. Know which displays may be used, depending on the type of data and level of measurement.
5. Be introduced to less familiar displays of the data.

Types of Data (Obj1). Qualitative data: consist of attributes, labels, or non-numerical entries. If you can't perform mathematical operations on the data or order it, it's qualitative. Ex: colors in a box of crayons; names; county. Quantitative data: consist of numerical measurements or counts. Ordering is a dead giveaway. Ex: BMI; age; numerical grade.

Levels of Measurement (Obj1). Nominal: qualitative; categorized using names, qualities, or labels. Ex: top 5 movies, jersey numbers, type of drug. Ordinal: quantitative or qualitative; can be ordered, but differences between data values are not meaningful. Ex: letter grade; a Likert scale such as very dissatisfied to very satisfied.

Levels of Measurement (Obj1). Interval: quantitative; can be ordered; meaningful differences can be calculated; no value means nothing/none. A zero entry merely represents a position on a scale (i.e. no inherent zero). Ex: time of day, temperature. Ratio: quantitative; can be ordered; meaningful differences can be calculated; there's a value that means nothing/none (an inherent zero). Ex: age, weight, test score.
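The four levels above can be summarized as a small lookup of which operations are meaningful at each level. This is only a sketch: the names `LEVELS` and `is_meaningful` are hypothetical, and the "ratio" column (statements such as "twice as much") follows the usual textbook convention that ratios need an inherent zero.

```python
# Hypothetical lookup: which operations are meaningful at each level of measurement.
LEVELS = {
    "nominal":  {"order": False, "difference": False, "ratio": False},
    "ordinal":  {"order": True,  "difference": False, "ratio": False},
    "interval": {"order": True,  "difference": True,  "ratio": False},
    "ratio":    {"order": True,  "difference": True,  "ratio": True},
}


def is_meaningful(level, operation):
    """Return True if the operation is meaningful at the given level."""
    return LEVELS[level][operation]


# Temperature in degrees (interval): differences are meaningful, ratios are not.
print(is_meaningful("interval", "difference"))
print(is_meaningful("interval", "ratio"))
```

For example, letter grades (ordinal) can be ranked, but the gap between an A and a B is not a meaningful quantity.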

Popular Displays

Description of Table 1 (Obj2). Typically summarizes baseline characteristics of the data. Compares statistics between groups. May provide means, medians, confidence intervals, percentiles, percentages, p-values, standard deviations, etc. Summaries of all types of data (e.g. continuous, categorical, nominal, ordinal, interval, ratio) may be used. Likert scale: a scale indicating degree of agreement (e.g. Rate the following statement: "I have had a difficult time focusing on my studies this semester": SD, D, N, A, SA).

Example of a Table 1 (Obj2)
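As a rough illustration of how the per-group summaries in a Table 1 might be computed, here is a minimal sketch using Python's standard `statistics` module. The data, the group names, and the `summarize` helper are all hypothetical; a real Table 1 would usually also report percentages for categorical variables and p-values.

```python
from statistics import mean, median, stdev

# Hypothetical baseline data: ages (years) by study group.
ages = {
    "control": [30, 40, 50],
    "treatment": [20, 30, 40, 50],
}


def summarize(values):
    """Return the count, mean, sample standard deviation, and median."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
    }


table1 = {group: summarize(values) for group, values in ages.items()}

for group, row in table1.items():
    print(f"{group:<10} n={row['n']}  mean (SD)={row['mean']:.1f} ({row['sd']:.1f})  "
          f"median={row['median']}")
```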

Test Your Knowledge Interpret the following graphs.

Test Your Knowledge: Interpret the following graphs. Cherry or apple pies sold the most in January; other pies sold the least. Nearly 15 subjects chose Saturday as their favorite day; Sunday was the least chosen.

Pie Charts (Obj3). Features (Obj4): nominal or ordinal data; compares levels of one characteristic. Advantages: easily interpreted (larger area, greater proportion); easy to create. Disadvantages: difficult to judge areas; wastes ink.

Bar Plots (Obj3). Features (Obj4): nominal and ordinal data; compares levels. Advantages: same as pie chart. Disadvantages: similar to pie chart; there is no such thing as an "Analyte 2.5" (the categories are discrete); ordering can change perception.

Test Your Knowledge Interpret the following graphs

Test Your Knowledge. The most frequent BMI seems to be approximately 24-26. There were 8 subjects weighing approximately 0 grams. There was only one weighing 10 grams.

Histograms (Obj3). Features (Obj4): shows distribution; continuous data; one characteristic. Advantages: easy to interpret; easy to produce. Disadvantages: size of bins can change perception; cannot read exact values.
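The point that bin size can change perception can be seen by counting the same values with two different bin widths. The sketch below uses made-up BMI values and a hypothetical `bin_counts` helper; it is not taken from the slides' data.

```python
from collections import Counter


def bin_counts(values, width):
    """Count values per bin [start, start + width), keyed by the bin's start."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical BMI values, for illustration only.
bmi = [19.5, 21.0, 22.4, 23.1, 24.2, 24.8, 25.1, 25.9, 27.3, 31.0]

for width in (2, 5):
    print(f"bin width {width}:")
    for start, count in bin_counts(bmi, width).items():
        print(f"  {start:>2}-{start + width:<2} {'#' * count}")
```

With a width of 2 the peak appears at 24-26, while with a width of 5 the tallest bar covers 20-25, so the same data suggest a slightly different "most frequent" range.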

Dot Plot (Obj3). Features (Obj4): one characteristic; ordinal data. Advantages: good for small and moderate data; easily interpreted. Disadvantages: may not be the best option with large data; not produced in all packages.

Stem and Leaf Plot (Obj3). Features (Obj4): one characteristic; ordinal data. Advantages: useful with small data and may be used with large data; can be produced by hand; easily interpreted; useful with numeric data. Disadvantages: may be difficult to measure center; not appealing. Example: the most frequent USMLE1 scores in our data were in the 220s, 230s, and 260s. The lowest and highest scores were 190 and 278, respectively.
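Because a stem and leaf plot can be produced by hand, it is also easy to sketch in code: split each value into a stem (the leading digits) and a leaf (the last digit). The scores below are invented for illustration and are not the USMLE1 data from the slide; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integer values by stem (tens and above), keeping sorted leaves (units)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return {stem: stems[stem] for stem in sorted(stems)}


# Hypothetical exam scores, for illustration only.
scores = [231, 190, 224, 278, 225, 262, 236, 212, 228, 239, 247, 265]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem, and the length of the row shows how many scores fall in that range of ten.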

Test Your Knowledge. Why is this graph difficult to interpret? What is the trend? What is the trend? An outlier is a data point that lies at a numerical distance from the rest of the data. Can you find one?

Test Your Knowledge. There is no y-label. R is a statistical software package. From Jan-Dec, there is an upward trend. There seems to be a slight positive trend: as age increases, so does POMS. The arrows suggest 2 possible outliers.

Line Graph (Obj3). Features (Obj4): one characteristic; used with ordinal and continuous data; displays associations, trends, and range. Advantages: produced in most packages.

Line Graph with Rugplot

Scatterplot (Obj3). Features (Obj4): continuous and ordinal data; shows associations; shows trend. Advantages: shows all of the data; exact values shown (by the points, not the line); produced in most packages; easily interpreted. Disadvantage: may not be the best way to display large data.
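The "Features (Obj4)" entries for the popular displays above can be collected into a simple lookup from data type to candidate display. This is a sketch that only restates the slides; the names `DISPLAYS` and `displays_for` are hypothetical.

```python
# Data types each popular display is listed with on the slides (Obj4).
DISPLAYS = {
    "pie chart": {"nominal", "ordinal"},
    "bar plot": {"nominal", "ordinal"},
    "histogram": {"continuous"},
    "dot plot": {"ordinal"},
    "stem and leaf plot": {"ordinal"},
    "line graph": {"ordinal", "continuous"},
    "scatterplot": {"continuous", "ordinal"},
}


def displays_for(data_type):
    """Return the displays listed for the given data type, in alphabetical order."""
    return sorted(name for name, types in DISPLAYS.items() if data_type in types)


print(displays_for("nominal"))
print(displays_for("continuous"))
```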

Less Familiar Graphs

Boxplot (Obj3 and Obj5). Features (Obj4): continuous data by nominal or ordinal groups. Advantages: may compare groups; good summary: min, Q1, Q2 (median), Q3, max. Disadvantages: does not display all the data; not as appealing; cannot be created in all packages; may not be as recognized by some.

Boxplot. The median tooth length for orange juice at 1 dose of Vitamin C was roughly 25 units. The first quartile of length for 1 dose of ascorbic acid was approximately 15. As Vitamin C dose increases, tooth length increases. Overall, it appears that those using orange juice had greater length at the same dose, with the possible exception of a Vitamin C dose of two. There was an outlier for ascorbic acid at dose 1.
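The five numbers a boxplot summarizes, and the points it flags as outliers, can be computed directly. The sketch below uses invented data and Python's `statistics.quantiles` with the "inclusive" method; statistical packages differ in how they define quartiles, and the 1.5 x IQR fence used here is the common convention for boxplot outliers, assumed rather than taken from the slides.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def boxplot_outliers(values):
    """Return values beyond 1.5 * IQR from the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements, for illustration only.
lengths = [5, 7, 8, 9, 10, 11, 12, 13, 30]

print(five_number_summary(lengths))
print(boxplot_outliers(lengths))
```

Here the value 30 lies above the upper fence (Q3 + 1.5 x IQR = 18), so a boxplot would draw it as a separate point.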

Boxplot Overlaid with Stripchart (Obj5). Features: same as boxplot. Advantages: same as boxplot; can see all of the data. Disadvantage: many programs cannot create it.

Dot Chart (Obj5). Features (Obj4): nominal or ordinal characteristics with a continuous outcome; can compare levels and groups. Advantages: easily interpreted; size of data irrelevant. Disadvantage: not as recognized as bar graphs and pie charts.

Kaplan-Meier Curve (Obj5). Demonstrates the probability of survival. The plot suggests that males have a more favorable rate of survival over the years. Can be created in most programs. Figure label: Number at Risk.
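To show where the survival probabilities and the "number at risk" in such a plot come from, here is a minimal sketch of the standard product-limit (Kaplan-Meier) estimate. The follow-up times and event flags are invented, and `kaplan_meier` is a hypothetical helper; in practice a statistical package would be used.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True if the event (e.g. death) was observed at times[i],
    and False if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and event indicators.
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]

for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects still count toward the number at risk up to the time they leave the study, which is why the curve only steps down at observed events.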

Probably Even Less Familiar Graphs

Spaghetti Plot (Obj5). Figure labels: Alzheimer's disease; verbal IQ; words that could not be sounded out (e.g. "depot").

Spaghetti Plot. Features (Obj4): continuous, longitudinal data; two characteristics; shows trend. Advantages: shows all of the data. Disadvantages: not available in all packages; may be difficult to interpret. Example (axes: earnings by age in years): the overall trend suggests that as age increases, so do earnings.

Dendrogram: Cluster (Obj5). Useful for determining clustering. May help to remove variables (data reduction). Figure annotations: PGY clustered; Clinical Year.

Scatter Plot with Marginal Histograms (Obj5). Continuous data. Visually appealing. Shows trends, associations, and the distributions of the data. Cannot be created in many programs.

Large Data Sets

Sunflower Plot (Obj5). For large data sets. The more ink used, the denser the data. Ordinal data. Example: more fresh embryos were transferred to the uterus on day 3.

Heat Map (Obj5). Example: encephalitis. Legend: red = proportion of presence; green = proportion of absence; white = missing; light/dark = intensity of presence of attribute.

Heat Map. Similar to the hexagon plot. Lightness or darkness indicates intensity. May not be created in some programs.

Nomogram (Obj5). May provide risk, probability, etc. Useful in providing predictive scores. Sum the points for each category, find the total points, then look at the corresponding risk of death. Ex: a 40-year-old male with a cholesterol of 200 and a BP of 170 has approximately a 48% risk of death.

Multidimensional Plot (Obj5) http://data.vanderbilt.edu/rapache/bbplot/


Conclusion. Always try to think of the best way to display your story (data). Consider your target audience. When publishing, color may cost.

References
Hamid, et al. BMC Infectious Diseases 2010, 10:364. http://www.biomedcentral.com/1471-2334/10/364
Grober, E., Hall, C. B., Lipton, R. B., Zonderman, A. B., Resnick, S. M., and Kawas, C. (2009). Memory impairment, executive dysfunction, and intellectual decline in preclinical Alzheimer's disease. Journal of the International Neuropsychological Society, 14(2), 266-278.
http://data.vanderbilt.edu/rapache/bbplot/