F. Farrokhyar, MPhil, PhD, PDoc




Descriptive Statistics

Learning objectives
- To recognize different types of variables
- To learn how to appropriately explore your data
- How to display data using graphs
- How to display data with numbers and tables
- To learn about measures of central tendency
- To learn about the measures of variation

Descriptive and inferential statistics
- Descriptive statistics help us with the presentation, organization, and summarization of data.
- Inferential statistics allow us to make inferences from a sample of individuals to a larger population.

What is data?
Data is a set of information or observations about a group of individuals or subjects. This information is organized in the form of variables. A variable is any characteristic of a person or a subject that can be measured or categorized. Its value varies from individual to individual.

Types of variables
- Qualitative or attribute variable: nonnumeric, e.g. gender (male, female), type of injury (blunt, fall, burn, etc.)
- Quantitative variable: numeric
  - Discrete variable: can assume only whole numbers, e.g. no. of accidents, no. of injuries, no. of positive nodes
  - Continuous variable: may take any value within a defined range, e.g. weight, age, blood pressure, level of cholesterol

Level of measurement
There are four levels of measurement: nominal, ordinal, interval, and ratio.

Level of measurement (cont'd)
- Nominal variable: consists of named categories with no order among the categories.
  - Binomial (two categories): gender, mortality
  - Multinomial (more than two categories): type of injury, blood type
- Ordinal variable: consists of ordered categories, where the differences between categories cannot be considered equal.
  - Tumour stage: 1, 2, 3, 4
  - Likert scale: excellent, very good, good, fair, poor
- Interval variable: has equal distances between values, with no meaningful zero value.
  - IQ test
  - Temperature (0 °C does not represent the absence of temperature)
- Ratio variable: has equal intervals between values and a meaningful zero point, so the ratio between two values makes sense.
  - Height, weight, laboratory test values

Level of measurement: summary

Variable type   Assumptions
Nominal         Named categories
Ordinal         Same as nominal, plus ordered categories
Interval        Same as ordinal, plus equal intervals
Ratio           Same as interval, plus a meaningful zero

Type of variables
- Dependent variable: the outcome of interest, which changes in response to some intervention or exposure.
  - Mortality, survival, post-op pain, quality of life
- Independent variable, in one of two roles:
  - The explanatory variable that explains the changes in the dependent variable: demographics (age, gender, height), risk factors (diabetes, BP)
  - The intervention or exposure variable that causes the changes in the dependent variable: drug, surgery, radiation, smoking

Example
- Independent (explanatory) variables: age, sex, pre-op pain severity
- Independent (comparison) variable: the intervention being compared
- Dependent (outcome) variables: changes in pain, complication

Describing categorical data: graphs
- Bar charts
- Pie charts

Bar charts
- Used to display nominal or ordinal data.
- A bar chart is a series of separated bars.
- Bars represent the frequency (count) or relative frequency (percent or proportion) of each category.
- Can be used to display data for more than one group.

[Figure: bar charts]

Pie charts
- Used for nominal and ordinal data.
- Used to display a relative frequency distribution.
- The circle is divided proportionally using the relative frequency of each category.
- A pie chart is useful for showing data for one group, but it is not useful for illustrating two or more groups.

[Figure: pie charts]

Describing categorical data: numerically
- Frequencies (counts)
- Relative frequencies (%)

Cross-tabulation of categorical data

                Type of surgery
                Open        Laparoscopic   Total
Severity
  mild          4 (27%)     3 (20%)        7 (23%)
  moderate      6 (40%)     7 (47%)        13 (43%)
  severe        5 (33%)     5 (33%)        10 (33%)
Sex
  male          7 (47%)     4 (27%)        11 (37%)
  female        8 (53%)     11 (73%)       19 (63%)

Percentages are column percentages: each surgery group has 15 patients, 30 in total.

Describing quantitative data: graphs
- Histograms
- The five-number summary
- Box plot

Histograms
- Used for interval and ratio data.
- A histogram is a graph in which each bar covers a range of values on the horizontal axis, called the interval width.
- The vertical axis represents the frequency of each interval.
- There are no spaces between bars.
- The frequencies are represented by the height and area of each bar.
- A histogram is useful for graphic illustration of one group.

[Figure: histogram]

Box plots
- Used for interval and ratio data.
- Uses the five-number summary measures: minimum, Q1, median (Q2), Q3, and maximum.
- Useful for detecting outliers.
- Useful for illustrating the distribution of more than one group.

[Figure: box plot of the five-number summary, labelled from top to bottom: maximum (100th), Q3, median (Q2), Q1, minimum (1st)]
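The relative frequencies in the severity-by-surgery cross-tabulation can be recomputed from the counts alone. The following is a minimal sketch, assuming the percentages are column percentages (each cell divided by its surgery-group total); the function name `column_percentages` and the dictionary layout are illustrative choices, not part of the slides.

```python
# Counts copied from the cross-tabulation of severity by type of surgery.
counts = {
    "mild":     {"open": 4, "laparoscopic": 3},
    "moderate": {"open": 6, "laparoscopic": 7},
    "severe":   {"open": 5, "laparoscopic": 5},
}


def column_percentages(table):
    """Return each cell as a whole-number percentage of its column total."""
    columns = list(next(iter(table.values())))
    totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        label: {col: round(100 * row[col] / totals[col]) for col in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    print(label, row)
```

Each surgery column sums to 15 patients, so for example 4 mild cases in the open group correspond to 4/15, about 27%.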

[Figure: box plot of change in pain score]

Scatter plot
- Used to display the relationship between two continuous variables.

Describing quantitative data: numbers
- Measures of central tendency: mode, median, mean
- Measures of spread: range, interquartile range, variance, standard deviation

Measures of central tendency: mode
- The mode is the most frequent value (the highest peak).
- Used for nominal, ordinal, interval, and ratio data.
- There can be more than one mode.
- Example: pain scores 1, 4, 6, 8, 5, 6, 3, 2, 15; sorted: 1, 2, 3, 4, 5, 6, 6, 8, 15. The mode is 6.

Measures of central tendency: median
- The median is the midpoint of the values after arranging the observations in order of size, from smallest to largest.
- There is a unique median for each dataset.
- Used for interval and ratio data.
- It is not necessarily equal to one of the sample values.
- Properties:
  - It is resistant (insensitive) to extreme values.
  - It is useful for summarising skewed data.

Measures of central tendency: mean
- The mean is the sum of the sample values divided by the number of sample values, n.
- It is useful for interval and ratio data.

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

- Example: 1, 2, 3, 4, 5, 6, 6, 8, 15

$\bar{X} = \frac{1 + 2 + 3 + 4 + 5 + 6 + 6 + 8 + 15}{9} = \frac{50}{9} \approx 5.56$
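The three measures of central tendency can be checked on the nine pain scores from the example. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# The nine pain scores from the example, in their original order.
pain_scores = [1, 4, 6, 8, 5, 6, 3, 2, 15]

modes = statistics.multimode(pain_scores)  # list of all most-frequent values
median = statistics.median(pain_scores)    # middle value after sorting
mean = statistics.mean(pain_scores)        # sum of the values divided by n

print("mode(s):", modes)
print("median:", median)
print("mean:", round(mean, 2))
```

`multimode` returns a list because a dataset can have more than one mode; here only the value 6 occurs twice.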

Properties of the mean
- There is a unique mean for each dataset.
- All values are included in the computation.
- It is the only measure of central tendency for which the sum of the deviations of each value from the mean is always zero:

$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$

- The mean is sensitive to extreme values.

[Figure: a normal curve and a skewed curve, each marked with the mean, median, and mode]

Measures of spread
- Range
- Interquartile range
- Variance
- Standard deviation

Range
- Used mainly for interval or ratio data.
- The range is the difference between the largest and smallest values in a dataset.
- Properties:
  - It uses only two values in its calculation.
  - It is affected by extreme values.
  - It is easy to understand.
- Example: 1, 2, 3, 4, 5, 6, 6, 8, 15; range = 15 - 1 = 14

Interquartile range
- Used mainly for interval and ratio data.
- It is the distance between the third quartile ($Q_3$) and the first quartile ($Q_1$): interquartile range = $Q_3 - Q_1$
- It is resistant (insensitive) to extreme values.
- It is useful for summarising skewed interval and ratio data.
- To compute it, arrange the observations from smallest to largest and divide them into 4 equal parts.
- Example: 1, 2, 3, 4, 5, 6, 6, 8, 15
  - 1st quartile ($Q_1$) = (2 + 3) / 2 = 2.5
  - Median ($Q_2$) = 5
  - 3rd quartile ($Q_3$) = (6 + 8) / 2 = 7
  - Interquartile range = 7 - 2.5 = 4.5
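The range and interquartile range worked out for the nine example values can be reproduced in a few lines, along with the zero-sum property of deviations from the mean. This is a sketch under one assumption: quartiles are taken as the medians of the lower and upper halves, leaving out the middle value when n is odd, which is the convention the example's arithmetic follows. Other quartile conventions can give slightly different values.

```python
import statistics

data = sorted([1, 4, 6, 8, 5, 6, 3, 2, 15])
n = len(data)

# Range: largest value minus smallest value.
data_range = data[-1] - data[0]

# Quartiles as medians of the lower and upper halves; with an odd n the
# middle observation is excluded from both halves (assumed convention).
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2 :]
q1 = statistics.median(lower_half)
q2 = statistics.median(data)
q3 = statistics.median(upper_half)
iqr = q3 - q1

# Deviations from the mean sum to zero, up to floating-point rounding.
mean = statistics.fmean(data)
deviation_sum = sum(x - mean for x in data)

print(data_range, q1, q2, q3, iqr)
```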

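Interquartile range and outliers
- The interquartile range is used to locate outliers.
- What are outliers? Outliers are extreme data values that fall outside the distribution of the data set.

1.5 × IQR criterion for outliers
- The interquartile range (IQR) is the distance between the first and third quartiles: IQR = $Q_3 - Q_1$
- From the data: $Q_1$ = 59 yrs, $Q_3$ = 70 yrs, IQR = 70 - 59 = 11
- 1.5 × IQR = 1.5 × 11 = 16.5
- Lower limit: $Q_1$ - 1.5 × IQR = 59 - 16.5 = 42.5
- Upper limit: $Q_3$ + 1.5 × IQR = 70 + 16.5 = 86.5
- From the data: min = 44 and max = 82
- Outliers are values < 42.5 or > 86.5. Both the minimum and the maximum lie inside these limits, so this data set has no outliers.

[Figure: box plot of the five-number summary for these data, labelled from top to bottom: maximum 82, Q3, median (Q2), Q1, minimum 44]

Variance
- Used for interval or ratio data.
- It is the average of the squared deviations from the mean.
- Population variance, where $\mu$ is the population mean and $N$ the population size:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

- Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

- Degrees of freedom measure the amount of information available in the data that can be used to estimate $\sigma^2$. Here, the df is n - 1 rather than n because we lose 1 df by estimating the sample mean.

Variance: properties
- All values are used in the calculation.
- The units are not the same as those of the data; they are the square of the original units.

Standard deviation
- The standard deviation is the square root of the variance:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

- It is, roughly, the average deviation from the mean, in the same units as the data.
- Example: 1, 2, 3, 4, 5, 6, 6, 8, 15, with $\bar{x} \approx 5.56$

$s^2 = \frac{(1 - 5.56)^2 + (2 - 5.56)^2 + (3 - 5.56)^2 + \dots + (15 - 5.56)^2}{9 - 1} \approx 17.28$

$s = \sqrt{17.28} \approx 4.16$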
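Both calculations on this page, the 1.5 × IQR limits and the sample variance and standard deviation, can be verified with the standard library. This is a minimal sketch: the helper `outlier_limits` is a hypothetical name, and the maximum age of 82 is an assumption read from the box plot labels, since the extracted text is unclear at that point.

```python
import statistics


def outlier_limits(q1, q3, k=1.5):
    """Return the (lower, upper) limits of the k x IQR outlier criterion."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Age example: Q1 = 59 yrs, Q3 = 70 yrs.
lower, upper = outlier_limits(59, 70)
observed_min, observed_max = 44, 82  # the maximum of 82 is an assumption
has_outliers = observed_min < lower or observed_max > upper

# Pain-score example.
scores = [1, 2, 3, 4, 5, 6, 6, 8, 15]
sample_variance = statistics.variance(scores)       # divides by n - 1
sample_sd = statistics.stdev(scores)                # square root of the above
population_variance = statistics.pvariance(scores)  # divides by N

print(lower, upper, has_outliers)
print(round(sample_variance, 2), round(sample_sd, 2))
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population formula applied to the same nine values.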

Uses of standard deviation
- It is used for the Empirical Rule. For a symmetrical, bell-shaped (approximately normal) distribution:
  - About 68% of the observations will lie within 1 s.d. of the mean.
  - About 95% of the observations will lie within 2 s.d. of the mean.
  - About 99.7% of the observations will lie within 3 s.d. of the mean.

[Figure: standard normal curve]

Summary of what we have learned
We report:
- Mean with standard deviation
- Median with first and third quartiles
- Median with minimum and maximum

Data type            Graph                              Numerically
Ratio and interval   Histogram, box plot, scatter plot  Mean with standard deviation; median with IQR, range; mode
Ordinal              Bar chart, pie chart               Count and %; median with IQR, range; mode
Nominal              Bar chart, pie chart               Count and %; mode
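The Empirical Rule percentages for the standard deviation can be checked for an exactly normal distribution, where the probability of falling within k standard deviations of the mean equals erf(k / √2). This is a sketch under that normality assumption; real data that are only roughly bell-shaped will match these figures only approximately. The function name `normal_coverage` is illustrative.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k s.d. of its mean."""
    return math.erf(k / math.sqrt(2))


# Coverage in percent for 1, 2 and 3 standard deviations.
coverage = {k: round(100 * normal_coverage(k), 1) for k in (1, 2, 3)}
print(coverage)
```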