
DESCRIPTIVE STATISTICS
Dr Najib Majdi bin Yaacob, MD, MPH, DrPH (Epidemiology)
Unit of Biostatistics & Research Methodology, School of Medical Sciences, Universiti Sains Malaysia (USM)

Content
1. Introduction to statistics
2. Descriptive vs. inferential statistics
3. Variables
4. Types of variables
5. Organizing and displaying data for categorical variables
6. Organizing and displaying data for numerical variables

INTRODUCTION TO STATISTICS
Data vs. statistic vs. statistics

Data: a collection of items of information.

Statistic: a summary value of some attribute of a sample, usually but not necessarily used as an estimator of some population parameter. It is calculated by applying a function to the values of the items in the sample (Porta, M. (2014). A Dictionary of Epidemiology. Oxford University Press, USA).

Statistics:
1. The science of collecting, summarizing, and analyzing data, which may or may not be subject to random variation; the term also covers the data themselves and summaries of the data (Porta, M. (2008). A Dictionary of Epidemiology. Oxford University Press, USA).
2. A branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters (http://www.thefreedictionary.com).

Example: data
ID  Gender  Height (m)
1   Male    1.67
2   Male    1.73
3   Female  1.61
4   Male    1.63
5   Female  1.57
6   Female  1.62
7   Female  1.53

Example: statistic
Gender: 4 (57.1%) female, 3 (42.9%) male
Mean height = 1.62 m
Standard deviation of height = 0.06 m

Example: statistics
The process of calculating the statistic: how to calculate the frequency and percentage for gender, and how to calculate the mean and standard deviation for height.

Why use statistics?
- Statistics is part of literacy in modern society, alongside reading and writing.
- Statistics is used to draw the strongest possible conclusions from a limited amount of data.
- A more thorough understanding of the research literature leads to improved patient care.

BRANCHES OF STATISTICS: DESCRIPTIVE VS. INFERENTIAL

Descriptive statistics
- Describe and summarize a dataset
- Involve the collection, organization, analysis, interpretation and presentation of sample data
- Can be presented in tables, graphs or narrative format

Purpose of descriptive statistics ("How to describe this population?")
1. Describe the characteristics of study participants
2. Understand the data
3. Answer the research questions in a descriptive study
4. Detect outliers or extreme values
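As a minimal sketch of "the process of calculating the statistic", the Python below recomputes the example statistics from the seven records in the data example. It assumes the sample standard deviation (denominator n - 1, as used by `statistics.stdev`), which reproduces the 0.06 m quoted; the helper name `gender_summary` is illustrative only.

```python
from collections import Counter
from statistics import mean, stdev

# The seven records from the example data: (ID, gender, height in metres)
records = [
    (1, "Male", 1.67), (2, "Male", 1.73), (3, "Female", 1.61),
    (4, "Male", 1.63), (5, "Female", 1.57), (6, "Female", 1.62),
    (7, "Female", 1.53),
]

def gender_summary(rows):
    """Return {gender: (count, percentage rounded to 1 d.p.)}."""
    counts = Counter(gender for _, gender, _ in rows)
    n = len(rows)
    return {g: (c, round(100 * c / n, 1)) for g, c in counts.items()}

heights = [height for _, _, height in records]

print(gender_summary(records))   # {'Male': (3, 42.9), 'Female': (4, 57.1)}
print(round(mean(heights), 2))   # 1.62  (mean height, m)
print(round(stdev(heights), 2))  # 0.06  (sample SD with n - 1 denominator, m)
```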

[Figure: samples are drawn from a population. Descriptive statistics describe the samples ("How to describe this population?"); inferential statistics infer findings from the samples to the population ("How to make conclusions about this population?").]

Descriptive statistics
1. Frequency distribution
2. Measures of central tendency
3. Measures of dispersion
4. Measures of position
5. Exploratory data analysis
6. Measures of shape of distribution: graphs, skewness, kurtosis

Inferential statistics
1. Estimation
2. Hypothesis testing (to reach a decision)
3. Parametric statistics
4. Non-parametric statistics (distribution-free statistics)
5. Modelling and predicting

VARIABLES

Variables: any quantity that has different values across individuals or other study units (Porta, M. (2014). A Dictionary of Epidemiology. Oxford University Press, USA).

Variables can be independent or dependent.

Independent variable: a variable that is hypothesized to influence an event or state (the dependent variable). The independent variable is not influenced by the event, but may cause (or contribute to the occurrence of) the event, or contribute to a change in (psychological, environmental, socioeconomic) status.

Dependent variable: a variable whose value depends on the effect of other variable(s), the independent variable(s), in the relationship under study. It is a manifestation or outcome whose variation we seek to explain or account for by the influence of the independent variables.

Example: effect of sunlight on plant growth
[Figure: plant growth plotted against sunlight. X axis: independent variable (sunlight); Y axis: dependent variable (plant growth).]

Controlled variable(s): everything you want to remain constant and unchanged during the study period.
Example: investigating the effect of sunlight exposure duration (hours/day) on plant growth
- Independent variable: duration of sunlight exposure
- Dependent variable: plant height
- Controlled variables: type of plant, size of pot, amount of water, type of soil, etc.

TYPES OF VARIABLES: MEASUREMENT SCALE
Measurement scale: a classification of data.
- Different types of scale are measured differently.
- Knowing the measurement scale of the data helps in deciding how to organize, analyse and present the data.

Four fundamental scales, from least to most information:
- Categorical (qualitative) data: nominal, ordinal
- Numerical (quantitative) data: interval, ratio

Categorical data: nominal scales
- Names or categories that are mutually exclusive
- Do not imply any ordering of responses
- Examples: sex (male, female); race (Malay, Chinese, Indian, others)
- Lowest and least informative level of measurement

Categorical data: ordinal scales
- Names or categories that are mutually exclusive, and whose order is meaningful
- Examples: severity (mild, moderate, severe); socioeconomic status (low, middle, high)
- Limitation: you cannot assume that the differences between adjacent scale values are equal, even if the labels are numbers
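One way to respect the limitation of ordinal scales in code is to keep the categories in their stated order and work on ranks rather than on arbitrary numeric codes. The Python sketch below returns the median category; the `SEVERITY` ordering follows the severity example, while the patient list and function name are hypothetical. `statistics.median_low` is used so that, for an even count, the result is the lower of the two middle categories rather than an "average" of two labels, which would be meaningless.

```python
from statistics import median_low

# Ordinal scale from the severity example: order matters, spacing does not
SEVERITY = ["mild", "moderate", "severe"]  # lowest to highest

def ordinal_median(labels, scale=SEVERITY):
    """Median category of ordinal data, computed on ranks (positions in the scale)."""
    ranks = [scale.index(label) for label in labels]
    # median_low always returns an observed rank, so the result is a real category
    return scale[median_low(ranks)]

patients = ["mild", "severe", "moderate", "mild", "moderate"]  # hypothetical data
print(ordinal_median(patients))  # moderate
```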

Numerical data: interval scales
- Names or categories whose order is meaningful and whose intervals are equal
- Examples: Fahrenheit temperature scale, Celsius temperature scale
- Problem: no true zero point (the zero point is arbitrary); zero does not mean a complete absence of temperature

Numerical data: ratio scales
- Highest and most informative scale
- Has the qualities of the nominal, ordinal and interval scales, with the addition of an absolute zero point
- Examples: amount of money, age, blood pressure
- Values can be multiplied or divided
- Zero on the Kelvin scale is the absolute absence of thermal energy, so the Kelvin scale is considered a ratio scale

Numerical data
- Interval and ratio variables are sometimes indistinguishable, and are handled the same way in data analysis.
- Both can be converted to categorical data, but converting numerical to categorical data causes a loss of information.

Summary of data types and measurement scales (what each scale provides; each scale also provides everything listed for the scales above it)
- Nominal: counts/frequency distribution; mode
- Ordinal: median; the order of values is known
- Interval: the difference between values can be quantified; values can be added or subtracted
- Ratio: values can be multiplied and divided; has a true zero
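The difference between interval and ratio scales, and the information lost by categorizing, can be checked with a little arithmetic. In this Python sketch the function names and the 170 cm cut-off are hypothetical: a ratio of Celsius values is not physically meaningful, whereas the same temperatures on the Kelvin scale give a meaningful ratio, and two different heights collapse into one category.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius value to the ratio-scale Kelvin value."""
    return celsius + 273.15

# Interval scale: 20 °C / 10 °C = 2, but 20 °C is not "twice as hot" as 10 °C
print(20 / 10)  # 2.0
# Ratio scale: the same two temperatures in kelvin differ by only about 3.5 %
print(round(celsius_to_kelvin(20) / celsius_to_kelvin(10), 3))  # 1.035

def height_category(height_cm, cutoff=170):  # hypothetical cut-off
    """Collapse a numerical height into a two-level categorical variable."""
    return "tall" if height_cm >= cutoff else "short"

# 171 cm and 176 cm become the same category: the 5 cm difference is lost
print(height_category(171), height_category(176))  # tall tall
```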

ORGANIZING & DISPLAYING DATA FOR CATEGORICAL VARIABLES

Table: frequency table
- Frequency
- Relative frequency (percentage)
- Cumulative frequency (cumulative percentage)

Graphical: bar chart, pie chart

[Figure: SPSS output of a frequency table.]

Bar chart characteristics
1. The Y axis represents frequency.
2. The X axis represents the categories of the variable.
3. Bars have equal width.
4. Bars are separated by equal gaps.
5. Bar height represents frequency or percentage.

Pie chart characteristics
1. The size of each slice represents frequency or percentage.
2. Each slice represents one category.
3. All slices together must add up to 100%.

Excellent graphical presentation of data
- Accuracy: proper data entry; not misleading, distorted or susceptible to misinterpretation
- Clarity: the ideas and concepts conveyed are clearly understood
- Simplicity: straightforward; avoid gridlines or odd lettering
- Appearance: should be appealing
- Well-designed structure: patterns highlighted, lettering horizontal
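A frequency table with relative and cumulative frequencies can be built in a few lines. This Python sketch uses the gender column from the earlier data example; the function name `frequency_table` is illustrative, and categories are listed alphabetically unless an explicit `order` is given (for an ordinal variable, pass the scale order so that the cumulative column is meaningful).

```python
from collections import Counter

def frequency_table(values, order=None):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    order = order if order is not None else sorted(counts)
    n = len(values)
    rows, cumulative = [], 0
    for category in order:
        freq = counts[category]
        cumulative += freq
        rows.append((category, freq,
                     round(100 * freq / n, 1),         # relative frequency (%)
                     round(100 * cumulative / n, 1)))  # cumulative percentage
    return rows

gender = ["Male", "Male", "Female", "Male", "Female", "Female", "Female"]
for row in frequency_table(gender):
    print(row)
# ('Female', 4, 57.1, 57.1)
# ('Male', 3, 42.9, 100.0)
```

The cumulative percentage is computed from the running count rather than by adding rounded percentages, so the last row is always exactly 100.0.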

ORGANIZING & DISPLAYING DATA FOR NUMERICAL DATA
- Central tendency
- Dispersion
- Exploratory data analysis: 1. stem-and-leaf displays; 2. box-and-whisker plots
- Frequency: 1. histogram; 2. frequency polygon; 3. cumulative frequency
- Shape of distribution

Measures of central tendency
1. Mean
2. Median
3. Mode

1. Mean
- Sample average: sum all values, then divide by the number of values
- Sensitive to extreme values

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Example: what is the mean height of these 9 students?
id           1    2    3    4    5    6    7    8    9
height (cm)  167  176  174  172  170  162  171  171  169

2. Median
- Middle value
- Not sensitive to extreme values
- Used to summarize skewed data
- When n is odd, median = [(n+1)/2]th value
- When n is even, median = average of the (n/2)th and [(n/2)+1]th values

Example: what is the median height of these 9 students? (Same data as above.)
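The mean formula and the odd/even median rule translate directly into code. A minimal Python sketch, with illustrative function names (the standard `statistics.mean` and `statistics.median` give the same results):

```python
def sample_mean(xs):
    """Sum all values and divide by the number of values."""
    return sum(xs) / len(xs)

def median(xs):
    """Median by the odd/even rule (positions are 1-based in the rule, 0-based here)."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the [(n+1)/2]th value
    return (s[n // 2 - 1] + s[n // 2]) / 2  # mean of the (n/2)th and (n/2 + 1)th values

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]
print(round(sample_mean(heights), 2))  # 170.22
print(median(heights))                 # 171
```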

Median example (continued): n = 9 is odd, so the median is the [(9+1)/2]th = 5th value of the sorted data.
sorted position  1    2    3    4    5    6    7    8    9
height (cm)      162  167  169  170  171  171  172  174  176
Median = 171 cm.

3. Mode
- The observation that occurs most frequently
- Less useful in describing data

Measures of dispersion
1. Range
2. Variance
3. Standard deviation
4. Coefficient of variation
5. Interquartile range

1. Range
- Largest value minus smallest value (max - min)
- Sensitive to extreme values

2. Variance
- Measures the amount of spread, or variability, of observations around the mean
- Sample variance (s²) = the average of the squared deviations about the sample mean, using n - 1 as the denominator (population variance = σ²)
- Not used in descriptive statistics because a squared unit of data is difficult to interpret

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

3. Standard deviation
- Square root of the variance
- The most widely used and a better measure of variability
- The smaller the value, the closer the observations are to the mean
- Sensitive to extreme values

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
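The mode, range, sample variance and standard deviation of the nine heights can be computed as below. This is a plain-Python sketch of the formulas above with illustrative helper names; `modes` returns every value tied for the highest count, since a dataset can have more than one mode.

```python
from collections import Counter
from math import sqrt

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]

def modes(xs):
    """All values that share the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

print(modes(heights))                            # [171]
print(max(heights) - min(heights))               # 14 (range)
print(round(sample_variance(heights), 2))        # 16.44
print(round(sqrt(sample_variance(heights)), 2))  # 4.06 (standard deviation)
```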

4. Coefficient of variation
- Ratio of the standard deviation to the mean, expressed as a percentage
- Also known as the relative standard deviation
- Shows the extent of variability in relation to the mean

$$\text{CoV} = \frac{s}{\bar{X}} \times 100\%$$

Hands-on: calculate the range, variance, standard deviation and coefficient of variation for the numerical variables in the given data file (5 minutes).
id           1    2    3    4    5    6    7    8    9
height (cm)  167  176  174  172  170  162  171  171  169

5. Interquartile range
- Data can be divided into quarters, i.e. four equal parts: Q1 = 25th percentile, Q2 = 50th percentile, Q3 = 75th percentile
- The IQR is the distance from Q1 to Q3 (IQR = Q3 - Q1)
- The most common inter-percentile measure
- Not sensitive to extreme values (outliers)
- Usually reported together with the median for skewed distributions

[Figure: ordered observations from Min to Max divided into quarters by Q1, Q2 and Q3; SPSS output screenshots.]
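For the hands-on exercise, the coefficient of variation and the quartiles can be obtained with Python's standard `statistics` module, as a sketch. Quartile definitions vary between software: `statistics.quantiles` defaults to `method="exclusive"`, which places the pth quantile at position (n + 1)p, so other packages (or `method="inclusive"`) may report slightly different Q1 and Q3.

```python
from statistics import mean, quantiles, stdev

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]

# Coefficient of variation: SD relative to the mean, as a percentage
cov = stdev(heights) / mean(heights) * 100

# Quartiles with the default method="exclusive"; other software may differ
q1, q2, q3 = quantiles(heights, n=4)
iqr = q3 - q1

print(round(cov, 2))    # 2.38
print(q1, q2, q3, iqr)  # 168.0 171.0 173.0 5.0
```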

GRAPHICAL VISUALIZATION/PRESENTATION FOR NUMERICAL DATA

Exploratory data analysis
1. Stem-and-leaf displays
2. Box-and-whisker plots

Stem-and-leaf displays
- Allow easier identification of individual values in the sample

Example: heights of the 9 students
id           1    2    3    4    5    6    7    8    9
height (cm)  167  176  174  172  170  162  171  171  169

height Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     1.00 Extremes    (=<162)
     2.00       16 .  79
     5.00       17 .  01124
     1.00       17 .  6

 Stem width:   10
 Each leaf:    1 case(s)

Box-and-whisker plots
- Graphical display of percentiles
- Also known as a five-number summary plot (min, Q1, Q2, Q3, max)
- Provide information on the central tendency and variability of the middle 50% of the distribution
- The box represents the 25th to 75th percentiles
- An observation more than 1.5 times the IQR away from the edge of the box is an outlier
- An observation more than 3 times the IQR away is an extreme outlier
- The whiskers extend to the smallest and largest values that are not outliers
- Continuous data in multiple groups can be displayed side by side

[Figure: example box-and-whisker plots.]
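The 1.5 × IQR and 3 × IQR rules can be sketched in Python as follows. The sketch assumes Tukey's hinges (the medians of the lower and upper halves of the sorted data) as the box edges; other quartile definitions move the fences, so the points flagged can differ between packages. Function names are illustrative.

```python
def tukey_hinges(xs):
    """Lower and upper hinges: medians of the lower and upper halves of the sorted data.
    For odd n, the overall median is included in both halves."""
    s = sorted(xs)
    half = (len(s) + 1) // 2

    def mid(v):
        m = len(v)
        return v[m // 2] if m % 2 else (v[m // 2 - 1] + v[m // 2]) / 2

    return mid(s[:half]), mid(s[-half:])

def box_plot_summary(xs, k_outlier=1.5, k_extreme=3.0):
    """Classify points by their distance beyond the box edges, in multiples of the IQR."""
    lower, upper = tukey_hinges(xs)
    iqr = upper - lower
    inside, outliers, extremes = [], [], []
    for x in sorted(xs):
        distance = max(lower - x, x - upper, 0)  # 0 if x lies within the box
        if distance > k_extreme * iqr:
            extremes.append(x)
        elif distance > k_outlier * iqr:
            outliers.append(x)
        else:
            inside.append(x)
    # Whiskers end at the smallest and largest values that are not outliers
    return {"box": (lower, upper), "whiskers": (inside[0], inside[-1]),
            "outliers": outliers, "extremes": extremes}

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]
print(box_plot_summary(heights))
# {'box': (169, 172), 'whiskers': (167, 176), 'outliers': [162], 'extremes': []}
```

With these hinges (169 and 172 cm, IQR 3 cm), 162 cm lies 7 cm below the box, more than 1.5 × IQR but not more than 3 × IQR, consistent with it being listed separately in the stem-and-leaf output. With (n + 1)p quartiles of 168 and 173 cm instead, the lower fence would be 160.5 cm and 162 cm would not be flagged.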

[Figure: example box-and-whisker plots (continued).]

Measures of frequency distribution: graphs
1. Histogram
2. Frequency polygon
3. Cumulative frequency

Histogram
- Graphical representation of the frequency distribution of a variable
- Bar height represents frequency or percentage
- Bar width represents the class interval
- No gaps between class intervals
- Gives an idea of the distribution: normal or skewed

[Figure: example histogram.]

Frequency polygon
- A graph that displays the data using lines to connect points plotted for the frequencies, at the midpoint of each class interval
- The frequencies correspond to the heights of the bars in the histogram

[Figure: example frequency polygon.]
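To draw a histogram or frequency polygon, the values are first grouped into class intervals. The Python sketch below counts values in half-open intervals of equal width; the 5 cm width and the 160 cm starting point are assumptions chosen for the nine heights (the start must not exceed the smallest value), and empty classes are kept with frequency 0 so that the bars have no gaps. The midpoint of each class is where a frequency-polygon point would be plotted.

```python
from collections import Counter

def class_frequencies(xs, start, width):
    """Frequencies in equal-width half-open classes [start + k*width, start + (k+1)*width).
    Assumes start <= min(xs); empty classes are kept with frequency 0."""
    counts = Counter(int((x - start) // width) for x in xs)
    return [((start + k * width, start + (k + 1) * width), counts.get(k, 0))
            for k in range(max(counts) + 1)]

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]
for (lower, upper), freq in class_frequencies(heights, start=160, width=5):
    midpoint = (lower + upper) / 2  # x-position of the frequency-polygon point
    print(f"[{lower}, {upper}): {freq}   midpoint {midpoint}")
```

With these choices the class frequencies are 1, 2, 5 and 1, the same grouping as the split stems in the stem-and-leaf output.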

Cumulative frequency
- Used to determine the number of observations that lie below or above a particular value
- Calculated using a frequency distribution table
- Can be constructed from stem-and-leaf plots or directly from the data

[Figure: example cumulative frequency graph.]

Measures of shape of distribution
1. Skewness
2. Kurtosis

Skewness: a measure of the asymmetry of a distribution around its mean.
- Examined graphically by plotting a normal curve on a histogram
- Negative skewness: the left tail is more pronounced than the right tail
- Positive skewness: the right tail is more pronounced than the left tail

[Figure: negatively and positively skewed distributions.]

Kurtosis: the relative peakedness or flatness of a distribution compared with the normal distribution.
- Visualised by plotting a normal curve on a histogram
- Types:
  - Distribution with a high peak: leptokurtic
  - Distribution with a flat-topped curve: platykurtic
  - Normal distribution: mesokurtic
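Skewness and kurtosis are usually reported by software rather than computed by hand. As a sketch, the functions below implement the adjusted Fisher-Pearson skewness (G1) and the sample excess kurtosis (G2), the bias-adjusted versions reported by many packages (for example, Excel's SKEW and KURT functions); other definitions exist, so values may differ slightly between programs. Excess kurtosis is about 0 for a normal (mesokurtic) distribution, positive for leptokurtic and negative for platykurtic distributions. Function names are illustrative.

```python
from math import sqrt

def _central_moment(xs, k):
    """k-th central moment, using n as the denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** k for x in xs) / n

def skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (requires n >= 3).
    Negative: longer left tail; positive: longer right tail."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

def excess_kurtosis(xs):
    """Sample excess kurtosis G2 (requires n >= 4); about 0 for a normal distribution.
    Positive: leptokurtic; negative: platykurtic."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]
print(skewness(heights) < 0)                       # True: longer left tail (162 cm)
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 2))  # -1.2: flatter than normal
```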

[Figure: leptokurtic, mesokurtic and platykurtic distributions.]

HOW TO PRESENT

General rule
- Data can be presented in graphical, table or text format
- Categorical variable: n (%)
- Numerical variable:
  - Symmetric data: mean (standard deviation)
  - Skewed data: median (IQR)

How to decide whether data are symmetric or skewed?
Statistical:
- Mean ≈ median ≈ mode for symmetric data
- Skewness
- Kurtosis
- Kolmogorov-Smirnov test (p > 0.05: no significant departure from normality)
- Shapiro-Wilk test (p > 0.05: no significant departure from normality)
Graphical:
- Histogram
- Stem-and-leaf plot
- Box-and-whisker plot

Table presentation
Table 1: Characteristics of study participants (n = 30)

Variable          Mean (SD)   n (%)
Age (yrs)
Sex
  Female
  Male
Race
  Malay
  Chinese
  Indian
Education
  Primary
  Secondary
  Tertiary
BMI (kg/m²)
DBP (mmHg)
SBP (mmHg)
* median (IQR)
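The general rule for presenting results can be captured in two small formatting helpers. In this Python sketch (function names are illustrative), the caller decides whether a numerical variable is skewed, using the statistical and graphical checks listed above; the IQR is computed with `statistics.quantiles` and its default `exclusive` method, so it may differ slightly from other software.

```python
from statistics import mean, median, quantiles, stdev

def describe_numeric(xs, skewed, decimals=1):
    """'mean (SD)' for symmetric data, 'median (IQR)' for skewed data."""
    if skewed:
        q1, _, q3 = quantiles(xs, n=4)  # default method="exclusive"
        return f"{median(xs):.{decimals}f} ({q3 - q1:.{decimals}f})"
    return f"{mean(xs):.{decimals}f} ({stdev(xs):.{decimals}f})"

def describe_categorical(values, category):
    """'n (%)' for one category of a categorical variable."""
    count = sum(1 for v in values if v == category)
    return f"{count} ({100 * count / len(values):.1f})"

heights = [167, 176, 174, 172, 170, 162, 171, 171, 169]
gender = ["Male", "Male", "Female", "Male", "Female", "Female", "Female"]
print(describe_numeric(heights, skewed=False))  # 170.2 (4.1)
print(describe_numeric(heights, skewed=True))   # 171.0 (5.0)
print(describe_categorical(gender, "Female"))   # 4 (57.1)
```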

THANK YOU.