Descriptive statistical methods and comparison measures

1 Department of Epidemiology and Public Health, Unit of Biostatistics and Computational Sciences
Descriptive statistical methods and comparison measures
PD Dr. C. Schindler, Swiss Tropical and Public Health Institute, University of Basel
Annual meeting of the Swiss Societies of Neurophysiology, Neurology and Stroke, Lucerne, May 19th

2 Contents
- Tabular representations
- Graphical representations
- Comparison measures for quantitative variables (difference in means, geometric mean ratio)
- Comparison measures for binary variables (risk difference, relative risk, odds ratio)
- Comparison measures for count data (incidence rate ratio)
- Non-parametric comparison measures (AUC)

3 General rules for tabular and graphical representations
Tables (T) and figures (F) should be self-explanatory.
- T + F: title
- T + F: caption
- F: clear axis titles with indication of units
- F: explanation of different graphical elements (colors, symbols, line types, etc.)

4 Tabular representations

5 Table 1 (longitudinal study report)
Comparison of the different groups with respect to baseline characteristics (sex, age, etc., incl. baseline of the outcome variable)
- Qualitative variables: relative frequencies in % + absolute frequencies
- Quantitative variables: mean (standard deviation) [1], or median (minimum, maximum) [2] or median (lower quartile, upper quartile) [2]
[1] if the QQ-plot does not deviate systematically from a straight line
[2] if the QQ-plot shows clear curvature or a wave pattern
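As a rough illustration of the two summary styles for a quantitative variable, the following sketch computes both from a small sample. The function name `summarize` and the age values are hypothetical, and the quartiles use the "inclusive" interpolation rule of Python's `statistics` module; other software may use slightly different quartile definitions.

```python
import statistics

def summarize(values):
    """Return both Table 1 summary styles for one quantitative variable."""
    # Quartile convention is an assumption; packages differ in how they interpolate.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "min": min(values),
        "max": max(values),
        "lower_quartile": q1,
        "upper_quartile": q3,
    }

ages = [34, 41, 45, 47, 52, 58, 63]  # hypothetical baseline ages
s = summarize(ages)
print(f"mean (SD): {s['mean']:.1f} ({s['sd']:.1f})")
print(f"median (quartiles): {s['median']} ({s['lower_quartile']}, {s['upper_quartile']})")
```

Which of the two lines goes into the table would then depend on the QQ-plot, as described in the footnotes.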

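6 Statistical properties of the normal distribution
[Figure: normal density curves with the horizontal axis marked at µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ; µ = mean, σ = standard deviation]
- About 2/3 of all values (in fact: 68%) lie between µ - σ and µ + σ.
- About 95% of all values (in fact: 95.4%) lie between µ - 2σ and µ + 2σ, leaving about 2.5% in each tail.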
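These coverage figures can be checked numerically. The sketch below uses the error function from Python's standard library; the helper name `normal_coverage` is chosen here for illustration only.

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

print(round(normal_coverage(1), 4))     # 0.6827
print(round(normal_coverage(2), 4))     # 0.9545
print(round(normal_coverage(1.96), 4))  # 0.95
```

The last line shows why 1.96 rather than 2 is the multiplier usually used for exact 95% limits.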

7 Huang HY et al., The Effects of Vitamin C Supplementation on Serum Concentrations of Uric Acid - Results of a Randomized Controlled Trial, ARTHRITIS & RHEUMATISM Vol. 52, No. 6, June 2005, pp DOI /art

8 Table 1 (cross-sectional study report)
Description of the sample studied and comparison with persons not included in the sample (with respect to demographic characteristics and health-relevant variables).
Same rules as for Table 1 of a longitudinal study report.

9 Alkerwi et al., Comparison of participants and non-participants to the ORISCAV-LUX population-based study on cardiovascular risk factors in Luxembourg, BMC Medical Research Methodology 2010, om/content/pdf/ pdf

10 Graphical representations

11 Boxplot (box plot)
Graphical representation of the distribution of a quantitative variable based on a few important measures (minimum, lower quartile, median, upper quartile, maximum). Outlying values are represented as individual points.

12 BMI in adults aged 30 to 70 years in Basel (SAPALDIA-study)
[Figure: boxplots of body mass index by sex (men, women), with upper fence*, 3rd quartile (75th percentile), median, 1st quartile (25th percentile) and lower fence* labelled]
* lower (upper) fence: smallest (largest) observation which is still within 1.5 box lengths of the lower (upper) end of the box.
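The fence definition in the footnote can be sketched in a few lines. The BMI values below are hypothetical, the function name `boxplot_stats` is an assumption, and the quartile rule ("inclusive" interpolation) is only one of several conventions in use.

```python
import statistics

def boxplot_stats(values):
    """Quartiles, fences and outliers following the fence definition on the slide."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    lower_limit = q1 - 1.5 * box_length
    upper_limit = q3 + 1.5 * box_length
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Fences are actual observations: the most extreme ones still within the limits.
        "lower_fence": min(inside),
        "upper_fence": max(inside),
        "outliers": [v for v in data if v < lower_limit or v > upper_limit],
    }

bmi = [19.5, 21.0, 22.4, 23.1, 24.0, 24.8, 25.5, 26.9, 28.3, 41.0]  # hypothetical
stats = boxplot_stats(bmi)
print(stats["lower_fence"], stats["upper_fence"], stats["outliers"])  # 19.5 28.3 [41.0]
```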

13 Number of discharges as percentage of total number of patients, by day of week Wong HJ et al., Real-time operational feedback: daily discharge rate as a novel hospital efficiency metric, Qual Saf Health Care 2010;19:1-5 doi: /qshc

14 Bar charts
1. Representation of the distribution of a qualitative variable or of a quantitative variable with few values (e.g. parity of a woman). Each value of the variable is assigned a bar, whose height equals the absolute or relative frequency of the value.
2. Representation of group statistics (e.g., group means of the outcome variable) or of statistics of complex observational units (e.g., regions, hospitals, etc.)

15 Bar charts representing the distribution of a categorical variable
[Figure: two bar charts of relative frequency (%) by category (A, B, C, D) for Group 1 and Group 2]
- Bars represent different categories (or levels) of the respective categorical variable.
- Heights of bars are proportional to the relative frequencies of the associated categories.

16 Representation of group means by bar charts
Here, bars represent group means and error intervals are mean ± 1 standard error (68%-confidence interval). 95%-confidence intervals would be better (mean ± 2 standard errors).
Smith HAB et al., Nitric oxide precursors and congenital heart surgery: A randomized controlled trial of oral citrulline, J Thorac Cardiovasc Surg 2006; 132:
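A minimal sketch of the two error intervals, assuming the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The function name `mean_with_interval` and the data are hypothetical; the multiplier 2 follows the slide's approximation (1.96 gives the usual large-sample 95% limits).

```python
import math
import statistics

def mean_with_interval(values, multiplier=2.0):
    """Return the mean, its standard error and mean ± multiplier * SE."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, se, (m - multiplier * se, m + multiplier * se)

values = [4, 6, 8, 10, 12]  # hypothetical measurements in one group
m, se, ci95 = mean_with_interval(values)                 # approx. 95% interval
_, _, ci68 = mean_with_interval(values, multiplier=1.0)  # approx. 68% interval
print(m, round(se, 3), ci68, ci95)
```

For small groups a t-quantile would replace the multiplier 2; that refinement is not shown here.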

17 Scatter plots
[Figure: scatter plot with axes "z-score of lower extremity latency" and "z-score of upper extremity latency"]
Scatter plots serve to visualize the association between two numerical variables (here z-scores of upper and lower extremity latencies in RRMS- and SPMS-patients).

18 Comparison measures
a) for quantitative data
b) for binary data
c) for count data

19 Comparison measures for quantitative variables

20 Differences in means
Application: Comparison of different groups with respect to
a) outcome of interest at follow-up and / or
b) change in outcome variable during follow-up.
Example: Effect of vitamin C on serum uric acid level.
Comparison measure: Difference between the mean change in serum uric acid level in the treatment group (vitamin C supplementation) and the mean change in serum uric acid level in the placebo group.

21 Huang HY et al., The Effects of Vitamin C Supplementation on Serum Concentrations of Uric Acid: results of a randomized controlled trial, Arthritis Rheum. 2005; 52:

22 Remarks
- The difference in the mean of an outcome variable between two independent samples is generally assessed using the t-test (validity condition: approximate normality and similar variability of the data in both groups, or sufficiently large sample sizes).
- If data have a skewed distribution (e.g., lab measurements), approximate normality may often be achieved by a logarithmic transformation of the data (cf. next topic).
- But a data transformation is not always appropriate, e.g., if mean costs are to be compared. In this case, bootstrap methods or permutation tests may help to achieve valid statistical comparisons.
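To make the last remark concrete, here is a minimal sketch of a two-sided permutation test for a difference in means. It is one simple variant among several, not a prescribed procedure; the function name, the cost figures, the number of permutations and the fixed seed are all assumptions made for illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for the difference in means between two groups."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)

costs_a = [120, 135, 150, 900, 160]  # hypothetical skewed cost data
costs_b = [100, 110, 105, 115, 95]
observed, p_value = permutation_test(costs_a, costs_b)
print(observed, p_value)
```

The comparison stays on the original scale, so the result still refers to mean costs rather than to log-transformed costs.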

23 Geometric mean ratios
In many cases, the original outcome has a skewed distribution but becomes approximately normal on a logarithmic scale. In this case, the data should first be log-transformed. Then the group means of the log-transformed data should be compared.
Example: Neurofilament heavy chain protein (NfH) in cerebrospinal fluid across healthy controls and different groups of MS-patients

24 NfH-protein concentration by group
[Figure: NfH-protein concentration and ln(NfH-protein concentration) for controls, CIS, PPMS, SPMS and RRMS]

Group      Median   Mean of ln(NfH)   Geometric mean
Controls   27.1     3.30              exp(3.30) = 27.1
CIS        32.9     3.48              exp(3.48) = 32.5
PPMS       47.8     3.97              exp(3.97) = 53.0
SPMS       51.2     3.83              exp(3.83) = 46.1
RRMS       43.4     3.84              exp(3.84) = 46.5

25 QQ-plots (of ln(NfH))
[Figure: QQ-plots of ln(NfH) against the inverse normal for HC, CIS, PPMS, RRMS and SPMS]
If points are close to a straight line, the distribution can be considered as approximately normal.

26 Geometric mean: mathematical definition
Let mean(ln(X)) denote the sample mean of a log-transformed variable ln(X). Then, after back-exponentiation, this mean turns into the so-called geometric mean of X:
geometric mean of X = $e^{\mathrm{mean}(\ln X)}$ (*)
If the distribution of ln(X) is approximately symmetrical, then the geometric mean of X is a good approximation of the median of X.
(*) $e^{u} = \exp(u)$ = Euler's exponential function ($e \approx 2.718$ = Euler's number)
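The definition translates directly into code. The sketch below is illustrative only; the function name and the sample values are hypothetical, and all values must be strictly positive for the logarithm to exist.

```python
import math

def geometric_mean(values):
    """Back-transformed mean of the natural logarithms (values must be > 0)."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

print(round(geometric_mean([1, 10, 100]), 6))  # 10.0
print(round(math.exp(3.30), 1))                # 27.1, the control-group value from the example
```

For the sample 1, 10, 100 the geometric mean (10) coincides with the median, whereas the arithmetic mean (37) is pulled upward by the largest value.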

27 Geometric mean ratios
Let $\mathrm{mean}_1(\ln X)$ = mean of ln(X) in sample 1 and $\mathrm{mean}_2(\ln X)$ = mean of ln(X) in sample 2.
Then, after back-exponentiation, the difference
$\Delta\mathrm{mean} = \mathrm{mean}_2(\ln X) - \mathrm{mean}_1(\ln X)$
turns into the so-called geometric mean ratio between the two samples:
$e^{\Delta\mathrm{mean}} = e^{\mathrm{mean}_2(\ln X) - \mathrm{mean}_1(\ln X)} = \frac{e^{\mathrm{mean}_2(\ln X)}}{e^{\mathrm{mean}_1(\ln X)}} = \frac{GM_2(X)}{GM_1(X)}$
In many cases, this ratio is close to the ratio of medians.

28 Geometric mean ratios

Group      Mean (log scale)   Geometric mean     Mean difference (log scale)   Geometric mean ratio
Controls   3.30               exp(3.30) = 27.1
CIS        3.48               exp(3.48) = 32.5   0.18                          32.5 / 27.1 = 1.20
PPMS       3.97               exp(3.97) = 53.0   0.67                          53.0 / 27.1 = 1.96
SPMS       3.83               exp(3.83) = 46.1   0.53                          46.1 / 27.1 = 1.70
RRMS       3.84               exp(3.84) = 46.5   0.54                          46.5 / 27.1 = 1.72

The geometric mean ratio is obtained from the mean difference on the log scale by applying exp(x).
Digression:
- 95%-confidence limits of geometric means: exp[ mean on log scale ± 1.96 SE(mean on log scale) ]
- 95%-confidence limits of geometric mean ratios: exp[ mean difference on log scale ± 1.96 SE(mean difference on log scale) ]
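The following sketch recomputes the geometric mean ratios from the log-scale means of the example and shows the form of the confidence limits. The log-scale means are taken from the slides; the function names and the standard error passed to `gmr_ci` are hypothetical, since no standard errors are given here.

```python
import math

# Means of ln(NfH) per group, as reported in the example.
log_means = {"Controls": 3.30, "CIS": 3.48, "PPMS": 3.97, "SPMS": 3.83, "RRMS": 3.84}

def geometric_mean_ratio(log_mean, log_mean_ref):
    """exp of the mean difference on the log scale."""
    return math.exp(log_mean - log_mean_ref)

def gmr_ci(log_diff, se_log_diff, z=1.96):
    """95% limits: exp(difference ± z * SE), both on the log scale."""
    return math.exp(log_diff - z * se_log_diff), math.exp(log_diff + z * se_log_diff)

ratios = {g: geometric_mean_ratio(m, log_means["Controls"]) for g, m in log_means.items()}
for group, ratio in ratios.items():
    print(group, round(ratio, 2))

# se_log_diff = 0.15 is a made-up value, used only to show the call.
lower, upper = gmr_ci(log_means["CIS"] - log_means["Controls"], se_log_diff=0.15)
```

Because the log-scale means are rounded to two decimals, the ratios can differ from the tabulated ones in the last digit.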

29 Comparison measures for binary variables

30 Binary outcome variables
- X1 = "Treatment was effective in patient P": X1 = 1 if P was successfully treated, X1 = 0 if the result of the treatment in patient P did not meet expectations
- X2 = "Subject P developed cancer during follow-up": X2 = 1 if this happened with P, X2 = 0 if P did not develop cancer during follow-up
- X3 = "Patient P was satisfied with treatment": X3 = 1 if P expressed satisfaction, X3 = 0 if P was not satisfied

31 Comparison measures for binary outcome variables
A) Frequency or risk difference (RD): difference in risks (relative frequencies) between the two groups
B) Relative risk (RR): ratio of risks (relative frequencies) between the two groups
C) Odds ratio (OR): ratio of odds* between the two groups
* Odds = risk : (1 - risk)

32 Risk and odds (examples)

Risk        Odds
0.1 (10%)   10 / 90 = 0.11
0.2 (20%)   20 / 80 = 0.25
0.5 (50%)   50 / 50 = 1.0
0.6 (60%)   60 / 40 = 1.5
0.8 (80%)   80 / 20 = 4.0

For risks < 10%, odds and risks are essentially the same.
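The conversion between risk and odds is a one-liner in each direction. The helper names below are chosen for illustration.

```python
def odds_from_risk(risk):
    """Odds = risk : (1 - risk), for 0 <= risk < 1."""
    return risk / (1 - risk)

def risk_from_odds(odds):
    """Inverse conversion: risk = odds / (1 + odds)."""
    return odds / (1 + odds)

for risk in (0.01, 0.05, 0.1, 0.2, 0.5, 0.6, 0.8):
    print(risk, round(odds_from_risk(risk), 3))
```

The first two lines of output illustrate the closing remark: at a risk of 1% or 5% the odds differ from the risk only in the third decimal.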

33 These comparison measures can be computed directly from the underlying 2 by 2 table

                  exposed*    unexposed
with outcome      64 (80%)    72 (60%)
without outcome   16 (20%)    48 (40%)

RD = 64/80 - 72/120 = (96 - 72)/120 = 0.2
OR = 64/16 : 72/48 = (64 × 48) / (16 × 72) = 2.67
RR = 64/80 : 72/120 = (64 × 120) / (72 × 80) = 1.33
* "exposed" can also stand for a specific treatment, in which case subjects with the control treatment are said to be unexposed.
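The same arithmetic as a small function. The cell labels a, b, c, d and the function name are conventions assumed here, not taken from the slides.

```python
def two_by_two_measures(a, b, c, d):
    """a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "RD": risk_exposed - risk_unexposed,
        "RR": risk_exposed / risk_unexposed,
        "OR": (a * d) / (b * c),
    }

measures = two_by_two_measures(a=64, b=16, c=72, d=48)
print({name: round(value, 2) for name, value in measures.items()})
# {'RD': 0.2, 'RR': 1.33, 'OR': 2.67}
```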

34 Intervention group (n = 80) vs. control group (n = 120); all estimates with 95%-confidence interval

Risk difference
Outcome                Intervention group   Control group     Risk difference    p-value
Successful treatment   80% (71%, 89%)       60% (49%, 71%)    20% (8%, 32%)
Satisfied patients     90% (83%, 97%)       80% (71%, 89%)    10% (<0%, 20%)     0.06

Relative risk
Outcome                Intervention group   Control group     Relative risk        p-value
Successful treatment   80% (71%, 89%)       60% (49%, 71%)    1.33 (1.11, 1.60)
Satisfied patients     90% (83%, 97%)       80% (71%, 89%)    1.13 (<1.00, 1.26)   0.06

Odds ratio
Outcome                Intervention group   Control group     Odds ratio           p-value
Successful treatment   80% (71%, 89%)       60% (49%, 71%)    2.67 (1.38, 5.15)
Satisfied patients     90% (83%, 97%)       80% (71%, 89%)    2.25 (0.96, 5.30)
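The slides do not state how these intervals were computed. As a sketch under the assumption of standard large-sample (Wald-type) intervals, with the RR and OR limits formed on the log scale, the intervals for "successful treatment" can be reproduced from the counts 64/80 and 72/120. Function names are hypothetical.

```python
import math

def rd_ci(a, b, c, d, z=1.96):
    """Wald interval for the risk difference (a, b: group 1 with/without outcome; c, d: group 2)."""
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return p1 - p0 - z * se, p1 - p0 + z * se

def rr_ci(a, b, c, d, z=1.96):
    """Interval for the relative risk, computed on the log scale."""
    n1, n0 = a + b, c + d
    log_rr = math.log((a / n1) / (c / n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)

def or_ci(a, b, c, d, z=1.96):
    """Interval for the odds ratio, computed on the log scale."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

cells = (64, 16, 72, 48)  # successful treatment: intervention vs. control
print([round(x, 2) for x in rd_ci(*cells)])  # [0.08, 0.32]
print([round(x, 2) for x in rr_ci(*cells)])  # [1.11, 1.6]
print([round(x, 2) for x in or_ci(*cells)])  # [1.38, 5.15]
```

These three intervals agree with the tabulated ones, which suggests, but does not prove, that this is the method behind the table.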

35 Why odds ratios?
Odds ratios are commonly used to describe associations between binary outcomes and predictor variables because:
a) Unlike the relative risk, the odds ratio is a meaningful measure not only in cohort but also in case-control studies.
b) Logistic regression models provide effect estimates in the form of odds ratios.

36 How to interpret odds ratios?
There are 3 possibilities:
a) 1 < RR < OR
b) OR < RR < 1
c) RR = 1 = OR
Odds ratios are always farther away from 1 than the corresponding relative risks.
With low risks (i.e., risks < 10%), odds ratios may be interpreted as relative risks.
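Both statements follow from writing the odds ratio in terms of the relative risk. In the short derivation below, the symbols $p_1$ and $p_0$ (risks in the exposed and unexposed group) are notation introduced here for illustration.

```latex
\[
\mathrm{OR}
  = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
  = \frac{p_1}{p_0}\cdot\frac{1-p_0}{1-p_1}
  = \mathrm{RR}\cdot\frac{1-p_0}{1-p_1}
\]
```

If RR > 1, then $p_1 > p_0$, so the factor $(1-p_0)/(1-p_1)$ exceeds 1 and OR > RR > 1; if RR < 1, the factor is below 1 and OR < RR < 1. When both risks are small, the factor is close to 1 and OR ≈ RR. With the risks 0.8 and 0.6 from the 2 by 2 example, the factor is 0.4 / 0.2 = 2, giving OR = 1.33 × 2 = 2.67.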

37 Comparison measures for count data

38 Count variables
Examples:
- Number of doctor's visits of a patient during a certain time period.
- Number of deaths within a specific region during a certain time period.
- Number of children with epilepsy manifesting in the first 5 years of life in Denmark

39 Incidence rate
- If observational units are individual persons: IR = number of events / length of the observation period
- If observational units are populations: IR = number of events / person-time observed
Example: IR of epilepsy in first 5 years of life in Denmark:
- low birth weight: 361 / person years = 179 / 10^5 pyrs
- normal birth weight: 1342 / person years = 89 / 10^5 pyrs
Sun et al., Gestational Age, Birth Weight, Intrauterine Growth and Risk of Epilepsy, Am J Epidemiol 2007; 167:

40 Incidence rate
If the event is unique (e.g., death), then the period of observation of a person with this event equals the time between the beginning of the observation period and the event.
[Figure: timeline of the observation period showing a person with an event, an incomplete observation without event and a complete observation without event]

41 Incidence rate ratio
IRR = IR in group 2 / IR in group 1 ( = 179 / 89 = 2.01 )
95%-confidence interval (approximate)*:
$\mathrm{IRR} \cdot e^{\pm 1.96 \sqrt{1/n_1 + 1/n_2}}$ ( = (1.71, 2.37) in the example )
$n_1$ = number of events in group 1
$n_2$ = number of events in group 2
* holds if $n_1$ and $n_2$ have a Poisson distribution
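A sketch of the approximate interval, assuming the usual large-sample form in which the variance of ln(IRR) is estimated by 1/n1 + 1/n2 for Poisson-distributed event counts. The function name is hypothetical. The event counts are those of the epilepsy example; since the exact variance estimate behind the quoted interval is not reproduced here, the limits obtained this way need not coincide with it.

```python
import math

def irr_ci(irr, n1, n2, z=1.96):
    """Approximate 95% limits for an incidence rate ratio from the two event counts."""
    se_log_irr = math.sqrt(1 / n1 + 1 / n2)  # assumes Poisson-distributed counts
    factor = math.exp(z * se_log_irr)
    return irr / factor, irr * factor

irr = 179 / 89  # rates per 10^5 person-years from the example
lower, upper = irr_ci(irr, n1=1342, n2=361)
print(round(irr, 2), round(lower, 2), round(upper, 2))
```

The interval is symmetric around the IRR on the log scale, so the product of the two limits equals the squared IRR.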

42 Adjusted and unadjusted comparison measures
In observational studies, but also in randomised trials with a remaining imbalance of certain factors, differences between groups may be confounded.
E.g., the difference in mean blood pressure between normal and overweight persons is confounded by age (since both weight and blood pressure tend to increase with age). Without adjustment for the influence of age, the effect of overweight on blood pressure is therefore overestimated.
There exist different statistical methods by which comparison measures can be rid of such confounding influences:
-> stratification, standardization, regression models

43 Non-parametric comparison measures

44 Receiver Operating Characteristic-curve
[Figure: ROC curve plotting sensitivity (true positive rate) against 1 - specificity (false positive rate); AUC = 0.83]
Outcome: Worsening of EDSS-score by > 0.5 units over 14 years
Predictor: score involving z-values of latencies from eyes and upper extremities at baseline
AUC = area under the curve

45 Area under the ROC-curve
The ROC-curve of X as a predictor of membership in population 2 (as opposed to population 1) has the property
AUC = (proportion of pairs (x1, x2) with x1 from group 1 and x2 from group 2 satisfying x2 > x1) + 0.5 × (proportion of pairs (x1, x2) with x1 from group 1 and x2 from group 2 satisfying x2 = x1)
This is an estimate of the probability that a randomly selected member of population 2 will have a higher value of X than a randomly selected member of population 1.
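The pairwise definition can be sketched directly. The function name and the scores are hypothetical, and the double loop is meant for clarity rather than efficiency.

```python
def auc_from_pairs(group1, group2):
    """Proportion of pairs with x2 > x1, counting ties (x2 == x1) as one half."""
    wins = ties = 0
    for x1 in group1:
        for x2 in group2:
            if x2 > x1:
                wins += 1
            elif x2 == x1:
                ties += 1
    return (wins + 0.5 * ties) / (len(group1) * len(group2))

print(auc_from_pairs([1, 2, 3], [4, 5, 6]))            # 1.0, perfect separation
print(auc_from_pairs([1, 2], [1, 2]))                  # 0.5, no discrimination
print(round(auc_from_pairs([1, 2, 3], [2, 3, 4]), 3))  # 0.778
```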

46 Interpreting the AUC
- AUC > 0.5: values of X are higher in group 2 than in group 1
- AUC = 0.5: X does not discriminate between the two groups
- AUC < 0.5: values of X are lower in group 2 than in group 1
AUC can also be applied with ordinal variables and provides a natural way of comparing such variables.
Moreover, AUC has a direct link to the Wilcoxon rank sum test: a significant result of the Wilcoxon rank sum test is equivalent to a significant difference between AUC and 0.5.
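The link to the Wilcoxon rank sum test can be made explicit: with R2 the sum of the (mid)ranks of group 2 in the pooled sample, U = R2 - n2(n2 + 1)/2 and AUC = U / (n1 × n2). The sketch below only illustrates this identity; the function names and the ordinal scores are hypothetical, and no test statistic or p-value is computed.

```python
def midranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def auc_from_rank_sum(group1, group2):
    """AUC obtained from the rank sum of group 2 in the pooled sample."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    rank_sum_2 = sum(ranks[n1:])
    u2 = rank_sum_2 - n2 * (n2 + 1) / 2
    return u2 / (n1 * n2)

# Hypothetical ordinal scores (e.g. 1 = none ... 4 = severe) in two groups.
print(round(auc_from_rank_sum([1, 2, 3], [2, 3, 4]), 3))  # 0.778
```

The value equals the one obtained by counting pairs with x2 > x1 plus half the tied pairs, which is why the two approaches lead to the same conclusion.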

47 Summary: Tabular and graphical representations of distributions
Basic rule: all such representations should be self-explanatory
Tables:
- categorical variables: relative (%) and absolute frequencies (n)
- numerical variables: mean ± SD (if normally distributed); median + quartiles or min / max (otherwise)
Figures:
- Boxplots for numerical variables
- Bar charts for categorical variables
- Scatter plots to display association between two numerical variables
- (Normal probability plot for visual assessment of degree of normality of data distribution)

48 Summary: comparison measures
Numerical variables:
- Difference in means (data normally distributed or no other measure wanted)
- Geometric mean ratio (data have log-normal distribution)
Binary variables:
- Risk difference (or frequency difference)
- Relative risk
- Odds ratio
Count data:
- Incidence rate ratio
Numerical and ordinal data:
- Area under the ROC-curve
All comparison measures always with 95%-confidence intervals!

49 Thank you for your attention!


More information

Main Section. Overall Aim & Objectives

Main Section. Overall Aim & Objectives Main Section Overall Aim & Objectives The goals for this initiative are as follows: 1) Develop a partnership between two existing successful initiatives: the Million Hearts Initiative at the MedStar Health

More information

Week 1. Exploratory Data Analysis

Week 1. Exploratory Data Analysis Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study. Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Statistics for Sports Medicine

Statistics for Sports Medicine Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

Randomized trials versus observational studies

Randomized trials versus observational studies Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Methods for Meta-analysis in Medical Research

Methods for Meta-analysis in Medical Research Methods for Meta-analysis in Medical Research Alex J. Sutton University of Leicester, UK Keith R. Abrams University of Leicester, UK David R. Jones University of Leicester, UK Trevor A. Sheldon University

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1. General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1

DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1 DESCRIPTIVE STATISTICS - CHAPTERS 1 & 2 1 OVERVIEW STATISTICS PANIK...THE THEORY AND METHODS OF COLLECTING, ORGANIZING, PRESENTING, ANALYZING, AND INTERPRETING DATA SETS SO AS TO DETERMINE THEIR ESSENTIAL

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Scatter Plots with Error Bars

Scatter Plots with Error Bars Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA

EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA A CASE STUDY EXAMINING RISK FACTORS AND COSTS OF UNCONTROLLED HYPERTENSION ISPOR 2013 WORKSHOP

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Week 4: Standard Error and Confidence Intervals

Week 4: Standard Error and Confidence Intervals Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

More information

Mind on Statistics. Chapter 2

Mind on Statistics. Chapter 2 Mind on Statistics Chapter 2 Sections 2.1 2.3 1. Tallies and cross-tabulations are used to summarize which of these variable types? A. Quantitative B. Mathematical C. Continuous D. Categorical 2. The table

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

More information

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online

More information

a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams?

a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams? 1. Phone surveys are sometimes used to rate TV shows. Such a survey records several variables listed below. Which ones of them are categorical and which are quantitative? - the number of people watching

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

z-scores AND THE NORMAL CURVE MODEL

z-scores AND THE NORMAL CURVE MODEL z-scores AND THE NORMAL CURVE MODEL 1 Understanding z-scores 2 z-scores A z-score is a location on the distribution. A z- score also automatically communicates the raw score s distance from the mean A

More information

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data A Few Sources for Data Examples Used Introduction to Environmental Statistics Professor Jessica Utts University of California, Irvine jutts@uci.edu 1. Statistical Methods in Water Resources by D.R. Helsel

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA Csilla Csendes University of Miskolc, Hungary Department of Applied Mathematics ICAM 2010 Probability density functions A random variable X has density

More information

Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

More information

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

Sponsor. Novartis Generic Drug Name. Vildagliptin. Therapeutic Area of Trial. Type 2 diabetes. Approved Indication. Investigational.

Sponsor. Novartis Generic Drug Name. Vildagliptin. Therapeutic Area of Trial. Type 2 diabetes. Approved Indication. Investigational. Clinical Trial Results Database Page 1 Sponsor Novartis Generic Drug Name Vildagliptin Therapeutic Area of Trial Type 2 diabetes Approved Indication Investigational Study Number CLAF237A2386 Title A single-center,

More information

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

More information