# Biostatistics: Types of Data Analysis


Theresa A Scott, MS
Vanderbilt University, Department of Biostatistics
theresa.scott@vanderbilt.edu

## Goals of data analysis

1. To describe (summarize) the population of interest by describing what was observed in the (study) sample. This employs descriptive statistics, which involves:
   - Summarizing continuous variables using the mean, standard deviation, range, and percentiles (including the median).
   - Summarizing categorical variables using raw and relative frequencies.
2. To use patterns in the (study) sample data to draw inferences about the population represented. This employs inferential statistics, which involves:
   - Confidence intervals.
   - Hypothesis tests and p-values.
   - Correlation and determining associations.
   - Determining relationships, estimating effects, and making predictions using regression analysis.

## Recall some key terms

- Population of interest and (study) sample.
  - The key to being able to generalize your findings to the population is how representative your study sample is of the population.
- Roles of collected variables:
  - Outcome
  - Predictor
  - Confounder
  - Additional descriptor
- Types of collected variables:
  - Continuous, which includes discrete numeric.
  - Categorical, which includes binary and ordinal.
- Definitions are given in the Biostatistics and Research lecture.

## Revisiting specific aim(s)/objective(s)

It is helpful if the wording of the specific aim(s)/objective(s) conveys the statistical analysis that is or will be used. Some examples:

- To describe the distributions of risk factors among a cohort of women with breast cancer.
- To compare the presentation, evaluation, diagnosis, treatment, and follow-up of...
- To estimate the incidence of skin cancer among elderly smokers and non-smokers.
- To determine whether a significant association exists between cigarette smoking and pancreatic cancer.
- To determine the effect of X on Y once adjusted for Z.
- To predict the probability of surviving one-year post-surgery...

## Descriptive Statistics

## Summarizing individual continuous variables

- Mean (average) ± standard deviation (SD).
  - SD = measure of variability (dispersion) around the mean.
  - Empirical rule: if the distribution of a variable approximates a bell-shaped curve (ie, is normally distributed), approximately 95% of the variable's values lie within 2 SDs of the mean.
  - Both are influenced by outliers, so they are bad descriptors if the distribution is not bell-shaped.
- Range (minimum and maximum values).
  - Also influenced by outliers.
- Percentiles: values that divide an ordered continuous variable into 100 groups with at most 1% of the values in each group.
  - The p-th percentile is the value that p% of the data are less than or equal to (ie, p% of the data lie below it).
  - It follows that (100-p)% of the data lie above it.
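The empirical rule and the outlier sensitivity of the mean and SD can be checked numerically. The following is a minimal sketch using only Python's standard library; the simulated values and the small five-number data set are hypothetical and serve only as illustration.

```python
import random
import statistics

# Hypothetical bell-shaped data: 10,000 simulated values (illustration only).
random.seed(1)
values = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD (n - 1 denominator)

# Empirical rule: share of values lying within 2 SDs of the mean.
within_2sd = sum(abs(v - mean) <= 2 * sd for v in values) / len(values)
print(f"mean={mean:.1f}, SD={sd:.1f}, share within 2 SDs={within_2sd:.3f}")

# Outlier sensitivity: replacing one value with an extreme one moves the
# mean, SD, and range, but leaves the median where it was.
small = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]
for label, data in (("original", small), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", statistics.mean(data),
        "SD:", round(statistics.stdev(data), 2),
        "median:", statistics.median(data),
        "range:", (min(data), max(data)),
    )
```

In the second part, the mean moves from 5.4 to 18 while the median stays at 5, which is why percentile-based summaries are preferred when the distribution is not bell-shaped.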

## Continuous variables, cont'd

- Percentiles, cont'd:
  - Example: if the 85th percentile of household income is \$60,000, then 85% of households have incomes of ≤ \$60,000 and the top 15% of households have incomes of ≥ \$60,000.
  - Not influenced by outliers, so they are great descriptors no matter the shape of the distribution.
  - Good 3-number summary: lower quartile (25th percentile), median (50th percentile), and upper quartile (75th percentile), which describe:
    - Central tendency = median (ie, the value in the middle when the data are arranged in order).
    - Spread = difference between the upper and lower quartiles (ie, the inter-quartile range, IQR).
    - Symmetry (ie, skewness): compare the difference between the upper quartile and the median with the difference between the median and the lower quartile.

## Summarizing individual categorical variables

- Raw and relative frequencies.
  - Raw: counts; the number of times a particular value is obtained in the sample.
  - Relative: proportions or percentages; the frequency of a particular value divided by the total number of observations.
- Often presented in a frequency table.
- Example: Distribution of blood types in a sample of 25 people.

| Blood Type | % (N) |
|---|---|
| A | 20% (5/25) |
| B | 32% (8/25) |
| O | 32% (8/25) |
| AB | 16% (4/25) |
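The 3-number summary and the frequency table can be sketched in a few lines of standard-library Python. The continuous values below are hypothetical; the blood-type counts are the ones from the slide. Note that software packages use different quantile conventions, so quartiles computed elsewhere may differ slightly from the `method="inclusive"` choice assumed here.

```python
import statistics
from collections import Counter

# Hypothetical right-skewed values with one outlier (illustration only).
data = [2, 3, 4, 5, 6, 8, 11, 15, 30]

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print("Q1, median, Q3:", q1, median, q3, "IQR:", iqr)

# Symmetry check: a larger upper gap than lower gap suggests right skew.
print("upper gap:", q3 - median, "lower gap:", median - q1)

# Frequency table for the blood-type example (counts taken from the slide).
blood = ["A"] * 5 + ["B"] * 8 + ["O"] * 8 + ["AB"] * 4
counts = Counter(blood)
total = sum(counts.values())
for blood_type, n in counts.items():
    print(f"{blood_type:>2}  {100 * n / total:.0f}% ({n}/{total})")
```

For these values the quartiles are 4, 6, and 11, so the upper gap (5) exceeds the lower gap (2), consistent with a right-skewed distribution.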

## Summarizing combinations of variables

- Calculate the median, quartiles, mean, SD, etc of a continuous variable among the subsets defined by each value of a categorical variable.
- Cross-tabulate two (or more) categorical variables using contingency tables (these allow you to report marginal frequencies).
- Example: Weight and race of a sample of N = 2735 subjects receiving either a new drug or placebo.¹

| | N | Drug (N = 2165) | Placebo (N = 570) |
|---|---|---|---|
| Weight (lbs) | | ±50 ( ) | 188±48 ( ) |
| Race | 2696 | | |
| Afr American | | 41% (868/2134) | 38% (215/562) |
| Caucasian | | 47% (996/2134) | 50% (283/562) |
| Other | | 13% (270/2134) | 11% (64/562) |

¹ The number of missing values is inferred via the N column and the denominator frequencies.

## Inferential Statistics

## Confidence intervals

- We wish to determine a characteristic of, or a fact about, a population: a population parameter.
- Because it is impossible to collect data from the entire population, we must use the data collected in the sample to estimate the population parameter: a point estimate.
- It is unlikely that the value of the point estimate will equal the value of the population parameter, because only one sample has been collected from the population.
- Therefore, the value of the point estimate is used to construct an interval estimate for the population parameter.
- We will be able to state, with some confidence, that the population parameter lies within this interval; thus, a confidence interval.

## Confidence intervals, cont'd

- Definition: a y% confidence interval (CI) for an unknown population parameter Y is an interval calculated from sample values by a procedure such that, if a large number of independent samples is taken, y% of the intervals obtained will contain Y.
- 95% confidence intervals are reported most often.
- Interpretation via an example:
  - We are 95% confident that mean total cholesterol on this new statin will be 10 to 20 mg/dl lower than on the old formulation.
  - CANNOT STATE: There's a 95% probability that mean total cholesterol on this new statin will be 10 to 20 mg/dl lower than on the old formulation.
- A narrow 95% CI indicates high precision, whereas a wide 95% CI indicates low precision, often because of an inadequate sample size.
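The repeated-sampling definition of a confidence interval can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the slides: a hypothetical normal population with a known SD, so that a z interval applies, and 2000 repeated samples of size 30.

```python
import random
from statistics import NormalDist, mean

random.seed(2024)

# Hypothetical population and study design (assumptions for illustration).
TRUE_MEAN, SIGMA, N, REPEATS = 50.0, 10.0, 30, 2000

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5  # known-SD z interval, for simplicity

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    point_estimate = mean(sample)
    lower, upper = point_estimate - half_width, point_estimate + half_width
    # Count the intervals that contain the true population parameter.
    covered += lower <= TRUE_MEAN <= upper

coverage = covered / REPEATS
print(f"{coverage:.3f} of the {REPEATS} intervals contain the true mean")
```

The printed proportion should be close to 0.95. The statement is about the long-run behavior of the procedure, not about any single computed interval, which is why the probability wording is disallowed in the interpretation.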

## Hypothesis testing

- We wish to test a hypothesis about the value of a population parameter, eg, that it equals a specific value.
- To do so, we sample the population and compare our observations with theory.
  - If the observations disagree with the theory, the hypothesis is rejected.
  - If not, we conclude either that the theory is true or that the sample did not detect the difference between the real and hypothesized values of the population parameter.
- IMPORTANT: Hypothesis testing involves proof by contradiction.
  - Support for our hypothesis is obtained by showing that the converse is false.

## Hypothesis testing, cont'd

Elements of a hypothesis test:

1. Null hypothesis (H₀): the hypothesis under test; referred to as the "straw man", something set up solely to be knocked down.
2. Alternative hypothesis (Hₐ): usually the hypothesis we seek to support on the basis of the information contained in the sample.
3. Test statistic: a function of the sample measurements upon which the statistical decision will be based.
4. Rejection region: the values of the test statistic for which the null hypothesis is rejected.

Decision: If, for a particular sample, the computed value of the test statistic falls in the rejection region, we reject the null hypothesis and accept the alternative hypothesis.

## Hypothesis testing, cont'd

- Question: How do we choose the values that mark the boundaries of the rejection region?
- Answer: They are determined by the choice of α, the probability of rejecting the null hypothesis when the null hypothesis is true (ie, the probability of a Type I error).
  - The value of α is also called the significance level of the test.
  - Although small values of α are recommended, the actual size of α to use in an analysis is chosen somewhat arbitrarily.
  - Two commonly used values are α = 0.05 and α = 0.01.
- NOTE: A Type II error can also be made: failing to reject the null hypothesis when it is false.
  - β (probability of a Type II error) and power (1-β; probability of correctly rejecting the null hypothesis) are discussed in the Sample Size lecture.

## Hypothesis testing, cont'd

- We still have a problem: one person may choose to implement a hypothesis test with α = 0.05, whereas another person might prefer α = 0.01.
  - In turn, it is possible for these 2 people to analyze the same data and reach opposite conclusions: one concluding that the null hypothesis should be rejected at α = 0.05, the other deciding that the null hypothesis should not be rejected with α = 0.01.
- Solution: Calculate the p-value, the smallest level of significance α for which the observed data indicate that the null hypothesis should be rejected (ie, the attained significance level).
  - The smaller a p-value becomes, the more compelling the evidence that the null hypothesis should be rejected.
  - NOTE: a smaller p-value does not indicate greater significance.
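How α sets the boundaries of the rejection region, and how two analysts can disagree on the same data, can be sketched for a two-sided z test. The test statistic value below is hypothetical, and the choice of a z test and of 0.01 as the second significance level are assumptions made for this illustration.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Boundary c of the rejection region |z| >= c for a two-sided z test."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def two_sided_p_value(z: float) -> float:
    """Smallest alpha at which the observed z would lead to rejection."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_observed = 2.2  # hypothetical test statistic

for alpha in (0.05, 0.01):
    c = two_sided_critical_value(alpha)
    decision = "reject H0" if abs(z_observed) >= c else "fail to reject H0"
    print(f"alpha={alpha}: rejection region |z| >= {c:.3f} -> {decision}")

p_value = two_sided_p_value(z_observed)
print(f"p-value = {p_value:.4f}")
```

The same statistic is rejected at α = 0.05 (boundary about 1.960) but not at α = 0.01 (boundary about 2.576). Reporting the p-value, roughly 0.028 here, lets each reader apply their own α.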

## Hypothesis testing, cont'd

- Assuming a specific value of α, the p-value can be used to implement an α-level hypothesis test:
  - If the p-value ≤ α, then you reject the null hypothesis.
  - If the p-value > α, then you fail to reject the null hypothesis; the null hypothesis is never accepted.
- REMEMBER: P-values provide evidence against a hypothesis, never evidence in favor of it.
- Quickly, back to hypotheses:
  - The null hypothesis is usually stated as the absence of a difference or an effect.
  - The alternative hypothesis can be one- or two-sided.
    - Two-sided states only that a difference/effect exists (ie, the effect ≠ 0).
    - One-sided specifies the direction of the difference/effect (ie, the difference between group A and group B is > 0).

## Hypothesis testing, cont'd

Example:

- H₀: There is no difference between the true mean reaction times (to a stimulus) for men and women: x̄_men = x̄_women, ie, x̄_men − x̄_women = 0.
- Hₐ: There is a difference (ie, x̄_men − x̄_women ≠ 0).
- Data: Independent random samples of 50 men and 50 women; x̄ ± SD is 3.6±0.18 seconds for the men and 3.8±0.14 seconds for the women.
- Based on the observed data, p-value =
- Thus, if α = 0.05, we reject the null hypothesis and conclude that there is a significant difference between the true mean reaction times for men and women.
- However, if α = 0.01, we fail to reject the null hypothesis and conclude that we failed to find a significant difference.
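The p-value for the reaction-time example did not survive in this transcription, so the following is only a sketch of how such a p-value could be computed from summary statistics, assuming a large-sample two-sided z test (the slides do not say which test was used). It evaluates two readings of the dispersion figures. Read as SDs, as printed, the statistic is about -6.2 and the p-value is far below 0.01, which would not match the stated conclusion at α = 0.01. Read as sample variances (an assumption), z = -2.5 and the p-value is about 0.012, which does fit the conclusions at both α levels. Which reading was intended cannot be confirmed from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Large-sample two-sided z test for a difference in two means.

    Takes sample variances; returns the test statistic and p-value.
    """
    standard_error = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Reading 1: 0.18 and 0.14 are SDs, as printed (variance = SD squared).
z_sd, p_sd = two_sample_z(3.6, 0.18 ** 2, 50, 3.8, 0.14 ** 2, 50)

# Reading 2 (assumption): 0.18 and 0.14 are sample variances.
z_var, p_var = two_sample_z(3.6, 0.18, 50, 3.8, 0.14, 50)

print(f"as SDs:       z = {z_sd:.2f}, p = {p_sd:.2e}")
print(f"as variances: z = {z_var:.2f}, p = {p_var:.4f}")

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_var <= alpha else "fail to reject H0"
    print(f"variance reading, alpha = {alpha}: {decision}")
```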

## Hypothesis testing, cont'd

- Problems with p-values:
  - Statistical significance does not indicate clinical significance.
  - A small p-value by itself only tells half the story: it gives you no information about magnitude (and, depending on the test, direction).
  - You can't make any conclusion from a large p-value (only perhaps that your sample size was too small).
  - Absence of evidence is not evidence of absence.
- Alternatives:
  - Report estimated confidence intervals (CIs) in addition to p-values; clinical significance can be gleaned from them.
  - Perform hypothesis tests with CIs: look to see whether the CI contains the null value.
    - Example: With a CI for a difference, does it contain 0?
    - Example: With a CI for an odds ratio, does it contain 1?

## Correlation

- A quantitative measure of association between 2 continuous variables (ie, the degree to which they change together).
- Pearson correlation describes the direction and relative strength of a linear relationship.
  - Always between -1 and 1.
  - The closer it is to ±1, the closer to a perfect linear relationship.
  - A hypothesis test can be used to determine if it is significant (ie, ≠ 0).
- Spearman's rank correlation does not assume a linear relationship, only a monotonic one: when X increases, Y always increases or stays flat, or Y always decreases or stays flat.
  - A hypothesis test can also be used to determine if it is significant.
- IMPORTANT:
  - Neither is an estimate of the slope.
  - Correlation ≠ causation.
  - Correlation ≠ agreement.
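The difference between the Pearson and Spearman correlations, and the point that neither estimates a slope, can be sketched as follows. The data are hypothetical, and the rank helper assumes no tied values to keep the example short.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1 for the smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = list(range(1, 11))
y_cubic = [v ** 3 for v in x]    # monotonic but not linear
y_scaled = [10 * v for v in x]   # perfectly linear with slope 10

r_pearson = pearson(x, y_cubic)
r_spearman = spearman(x, y_cubic)
r_scaled = pearson(x, y_scaled)

print(f"Pearson  (x, x^3): {r_pearson:.3f}")
print(f"Spearman (x, x^3): {r_spearman:.3f}")
print(f"Pearson  (x, 10x): {r_scaled:.3f}  (the slope is 10, not 1)")
```

For the cubic relationship the Spearman correlation is 1 because the relationship is perfectly monotonic, while the Pearson correlation is about 0.93 because it is not linear. The last line shows a correlation of 1 for a line whose slope is 10.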

## Determining if an association exists

- Use tests of association to determine if two variables (one continuous and one categorical, or both categorical) are independent.
  - Two variables are associated if one variable affects the value/distribution of the values of the other.
  - Example: Association between race (categorical predictor) and presence of disease (categorical outcome).
- Testing for a difference in the distribution of a variable between groups is equivalent to testing for an association between a group (predictor) variable and an outcome variable.
  - Example: Difference in the distribution of cholesterol (continuous outcome) between genders (categorical predictor).
  - This also includes paired data (eg, difference in the distribution of test scores before and after a didactics class).

## Determining if an association exists, cont'd

- Incorporates hypothesis testing:
  - The null hypothesis proposes that the variables are independent.
    - Example: There is no difference in the frequency of drinking well water between subjects who develop peptic ulcer disease and those who do not.
  - The alternative hypothesis proposes that they are associated.
    - It can be either one-sided (eg, drinking well water is more common among subjects who develop peptic ulcers) or two-sided (eg, the frequency of drinking well water is different in subjects who develop peptic ulcers).
- If the test's p-value ≤ α (eg, 0.05), conclude that the two variables are significantly associated.
- A p-value > α does not mean that there is no association in the population; it only means that you failed to find one in your study sample.
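A test of association between two binary variables can be sketched with a Pearson chi-square statistic computed by hand. The counts below are hypothetical and are not from the slides; they are arranged to mirror the well water and peptic ulcer example. No continuity correction is applied, and the p-value shortcut relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts, for illustration only.
#         well water, no well water
table = [
    [30, 20],  # developed peptic ulcer disease
    [20, 30],  # did not develop peptic ulcer disease
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where the
# expected count assumes the two variables are independent (H0).
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

# Valid for a 2x2 table only (1 degree of freedom).
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))

alpha = 0.05
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
print("significantly associated" if p_value <= alpha else "failed to find an association")
```

With these counts every expected cell is 25, the statistic is 4.0, and the p-value is about 0.046, so the null hypothesis of independence would be rejected at α = 0.05 but not at a stricter level such as 0.01.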

## Determining if an association exists, cont'd

- When testing for a difference in a continuous outcome, commonly used tests of association (eg, the t-test) assume the continuous outcome is normally distributed: these are parametric tests.
- Non-parametric tests don't assume normality; the test is performed on the raw values converted to ranks.
  - If normality holds, a non-parametric test is 95% as efficient as its parametric equivalent.
  - If normality does not hold, a non-parametric test can be arbitrarily more efficient and powerful than its parametric equivalent.
  - The result of a parametric test (ie, the p-value) can be highly influenced by outliers; non-parametric tests are not (because they are based on ranks).
- Sometimes others use tests/graphics to assess normality and run a parametric or non-parametric test based on the result.
  - This is not a good approach: a test of normality may not have adequate power to detect non-normality.

## Determining if an association exists, cont'd

Available tests of association:

| Purpose | Type of outcome | Test to use |
|---|---|---|
| Compare paired responses | Continuous | Paired t-test [Wilcoxon signed-rank test] |
| Compare 2 (independent) groups | Continuous | Student's t-test [Wilcoxon rank-sum/Mann-Whitney U test] |
| Compare >2 (independent) groups | Continuous | 1-way ANOVA [Kruskal-Wallis test] |
| Compare ≥2 (independent) groups | Categorical (≥2 levels) | Chi-square test (Fisher's exact test when cell counts <5) |

- Groups are defined by a categorical variable (≥2 levels).
- The non-parametric equivalent test is given in [ ] brackets.
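Why rank-based tests resist outliers can be seen by computing a Student's t statistic and a Wilcoxon rank-sum statistic on the same two groups, with and without one extreme value. This is a minimal sketch on hypothetical data; it computes only the test statistics, not p-values, and the rank helper assumes no tied values.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Student's two-sample t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def rank_sum(a, b):
    """Wilcoxon rank-sum statistic: sum of the ranks of `a` in the pooled data.

    Assumes no ties, so each value has a unique position when sorted.
    """
    pooled = sorted(a + b)
    return sum(pooled.index(value) + 1 for value in a)


# Hypothetical measurements for two independent groups.
group_a = [4.1, 4.8, 5.0, 5.3, 5.9]
group_b = [5.5, 6.1, 6.4, 6.8, 7.2]
group_b_outlier = [5.5, 6.1, 6.4, 6.8, 72.0]  # largest value mistyped/extreme

t_clean = student_t(group_a, group_b)
t_outlier = student_t(group_a, group_b_outlier)
w_clean = rank_sum(group_a, group_b)
w_outlier = rank_sum(group_a, group_b_outlier)

print(f"t statistic: {t_clean:.2f} -> {t_outlier:.2f} with the outlier")
print(f"rank sum:    {w_clean} -> {w_outlier} with the outlier")
```

The outlier inflates the variance of the second group, shrinking the t statistic from about -3.3 to about -1.1 even though the groups are just as separated. The rank sum stays at 16 because the ordering of the values is unchanged.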

## Determining if an association exists, cont'd

- IMPORTANT: Association ≠ causation.
- Limitations to just performing tests of association:
  - They blur the distinction between statistical and clinical significance.
    - It is possible for a difference of little clinical importance to achieve a high degree of statistical significance.
    - You cannot conclude clinical relevance from a small p-value.
  - Very often, not only would you like to determine if an association or difference exists, but you would also like to estimate the magnitude and direction/shape of the effect.
  - You can only look at one pair of variables at a time (a single predictor variable and the primary outcome variable); you cannot incorporate confounders. (True of correlation too!)
  - You cannot incorporate other possible complexities of your data (eg, repeated measurements within subjects and cluster sampling).

## Regression analysis

Determining relationships and making predictions is an extension of testing for associations:

- Allows you to estimate the significance, direction/shape, and magnitude of the effect of ≥1 predictor variables on the outcome variable, ie, determine if the outcome is significantly affected by ≥1 of the predictors.
- Allows you to incorporate confounders by adjusting the predictor-outcome association for the predictor-confounder and confounder-outcome relationships.
- Results may be used to predict the outcome of subjects that were not sampled but are from the same population.
- The specific regression analysis used is based on the type of outcome: Multiple Linear (continuous), Logistic (binary), Proportional Odds (ordinal), or Cox Proportional Hazards (time to event).
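The simplest case of regression, one continuous predictor and one continuous outcome, can be sketched with ordinary least squares computed by hand. The data and the `predict` helper are hypothetical. With a single predictor there is no adjustment for confounders; that requires a multiple regression model fitted with statistical software.

```python
# Hypothetical data: x is the predictor, y is the continuous outcome.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares estimates for the line y = intercept + slope * x.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(new_x: float) -> float:
    """Predicted outcome for a subject with predictor value new_x."""
    return intercept + slope * new_x


print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"predicted outcome at x = 7: {predict(7):.2f}")
```

The slope, about 1.98 here, gives both the direction (positive) and the magnitude of the effect: the estimated change in the outcome per 1-unit increase in the predictor. A prediction such as the one at x = 7 is only meaningful for subjects from the same population and, ideally, within the range of observed predictor values.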

## Regression analysis, cont'd

- IMPORTANT: All regression analyses make assumptions (eg, the observations are independent; the variance of the error is constant).
  - You must assess whether the assumptions are violated.
- All regression analyses (by default) assume a linear relationship between each continuous predictor and the outcome.
  - Most can be extended to fit non-linear relationships (eg, by using restricted cubic splines).
- Interpretation of effect estimates depends on the type of regression:
  - Linear example: Holding all other predictors constant, the outcome increases/decreases Y units per 1-unit increase of X.
  - Logistic example: The odds of the outcome occurring are Y times higher/lower for group X1 compared to group X2, holding all other predictors constant.
- 95% CIs should be reported for all effect estimates.

## Pitfalls to avoid

- Pitfalls in regression modeling:
  - Casewise deletion of missing data.
  - Categorizing continuous variables.
  - Not using clinical knowledge to specify the model.
  - Inappropriate linearity assumptions.
  - Using stepwise variable selection (ie, deciding based on p-values).
  - Fitting a more complex model than the data allow (ie, overfitting).
  - Lack of model validation (if appropriate).
- Pitfalls in reporting and analysis in general:
  - Reporting only favorable results.
  - Deleting outliers based on observed response values.
  - Non-reproducible analyses/results.


### Sample Size Planning, Calculation, and Justification

Sample Size Planning, Calculation, and Justification Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Data Analysis, Research Study Design and the IRB

Minding the p-values p and Quartiles: Data Analysis, Research Study Design and the IRB Don Allensworth-Davies, MSc Research Manager, Data Coordinating Center Boston University School of Public Health IRB

### UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

### Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

### Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### DATA ANALYSIS. QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University

DATA ANALYSIS QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University Quantitative Research What is Statistics? Statistics (as a subject) is the science

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

### Statistics for Sports Medicine

Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

### The Statistics Tutor s Quick Guide to

statstutor community project encouraging academics to share statistics support resources All stcp resources are released under a Creative Commons licence The Statistics Tutor s Quick Guide to Stcp-marshallowen-7

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### 1 Nonparametric Statistics

1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

### Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Study Design and Statistical Analysis

Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

There are three kinds of people in the world those who are good at math and those who are not. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1 Positive Views The record of a month

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### The Wilcoxon Rank-Sum Test

1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### How To Check For Differences In The One Way Anova

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### How To Test For Significance On A Data Set

Non-Parametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A non-parametric equivalent of the 1 SAMPLE T-TEST. ASSUMPTIONS: Data is non-normally distributed, even after log transforming.

### Research Methods & Experimental Design

Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

### Statistics in Medicine Research Lecture Series CSMC Fall 2014

Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### Guide to Biostatistics

MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading

### Come scegliere un test statistico

Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

### Descriptive Analysis

Research Methods William G. Zikmund Basic Data Analysis: Descriptive Statistics Descriptive Analysis The transformation of raw data into a form that will make them easy to understand and interpret; rearranging,

### NCSS Statistical Software

Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

Table of Contents: Overview of Model; Dispersion; Parameterization; Sigma-Restricted Model; Overparameterized Model; Reference Coding; Model Summary (Summary Tab); Summary

### DATA COLLECTION AND ANALYSIS

DATA COLLECTION AND ANALYSIS Quality Education for Minorities (QEM) Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. August 23, 2013 Objectives of the Discussion 2 Discuss

### Principles of Hypothesis Testing for Public Health

Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Interpretation of Somers' D under four simple models

Interpretation of Somers' D under four simple models Roger B. Newson 03 September, 04 Introduction Somers' D is an ordinal measure of association introduced by Somers (1962)[9]. It can be defined in terms

### Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Mathematics within the Psychology Curriculum

Mathematics within the Psychology Curriculum Statistical Theory and Data Handling Statistical theory and data handling as studied on the GCSE Mathematics syllabus You may have learnt about statistics and

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample; Connection with independent random variables; Problems with small populations II. Why Random Sampling is Important; A myth,

### Nonparametric tests: these test hypotheses that are not statements about population parameters (e.g.,

CHAPTER 13 Nonparametric and Distribution-Free Statistics Nonparametric tests: these test hypotheses that are not statements about population parameters (e.g., $\chi^2$ tests for goodness of fit and independence).

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Global Leadership MBA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### MEASURES OF LOCATION AND SPREAD

Paper TU04 An Overview of Non-parametric Tests in SAS: When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 2014 Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement

Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### How To Write A Data Analysis

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### SCHOOL OF HEALTH AND HUMAN SCIENCES DON'T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON'T FORGET TO RECODE YOUR

### Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

### STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

### UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction 2 Testing distributional assumptions

TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Chi-square test Fisher's Exact test

Lesson 1 Chi-square test Fisher's Exact test McNemar's Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let's look at a similar situation. Take an SRS of size n. Normal population: $N(\mu, \sigma)$. Both $\mu$ and $\sigma$ are unknown parameters. Unlike what we used

### Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Guidelines for Data Collection & Data Entry

Guidelines for Data Collection & Data Entry Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott,

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents' attitudes to a particular question or statement. One must recall

### Over the past decade, the use of evidence-based. Interpretation and Use of Statistics in Nursing Research ABSTRACT

Volume 19, Number 2, pp. 211-222, 2008, AACN Interpretation and Use of Statistics in Nursing Research Karen K. Giuliano, PhD, RN, FAAN Michelle Polanowicz, MSN,

### Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

### A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

Glossary Brase: Understandable Statistics, 10e A | B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

### Difference tests (2): nonparametric

NST 1B Experimental Psychology Statistics practical 3 Difference tests (2): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 2005; Department of Experimental Psychology University of Cambridge

### Section 12 Part 2. Chi-square test

Section 12 Part 2 Chi-square test McNemar's Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

### COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data: numbers in