Analyzing Research Data Using Excel




Fraser Health Authority, 2012

The Fraser Health Authority ("FH") authorizes the use, reproduction and/or modification of this publication for purposes other than commercial redistribution. In consideration for this authorization, the user agrees that any unmodified reproduction of this publication shall retain all copyright and proprietary notices. If the user modifies the content of this publication, all FH copyright notices shall be removed, however FH shall be acknowledged as the author of the source publication. Reproduction or storage of this publication in any form by any means for the purpose of commercial redistribution is strictly prohibited. This publication is intended to provide general information only, and should not be relied on as providing specific healthcare, legal or other professional advice. The Fraser Health Authority, and every person involved in the creation of this publication, disclaims any warranty, express or implied, as to its accuracy, completeness or currency, and disclaims all liability in respect of any actions, including the results of any actions, taken or not taken in reliance on the information contained herein.

http://research.fraserhealth.ca/

Objectives
- To review key concepts and elements of quantitative research
- To explore the application of Excel in conducting a research project:
  - Creating data files
  - Creating a data dictionary
  - Linking the research question to the appropriate analysis
- To apply statistics to analytical data

Workshop Outline
09:00-09:15 Review of Quantitative Research
09:15-09:30 Measurement
09:30-10:30 Excel 101: Excel as a Database
10:30-10:45 Break
10:45-11:15 Using Excel to Clean/Explore Data
11:15-12:00 Using Excel to Analyze Data

Excel Pop Quiz
- How many columns? Excel 2003: 256; Excel 2007 & 2010: 16,384
- How many rows? Excel 2003: 65,536; Excel 2007 & 2010: 1,048,576
- True or false: you can conduct statistical analyses in Excel?

Quick Review of Quantitative Research

Framework for Quantitative Research
1. Conduct a literature review
   - Develop a rationale: Why do you want to do this research? What do others say? What are the knowledge gaps?
2. Formulate the research question (PICO method)
   - P = population / patient
   - I = intervention
   - C = comparison
   - O = outcome
3. Generate objective(s) and/or hypothesis
   - Hypothesis: (usually) a statement of anticipated results
   - Objective: an action statement
4. Apply methods and conduct the study
   - Measurement
   - Study design
   - Analysis

Research Question

Measurement: Thinking in Numbers
From this:
ID | Gender | Age | Disease Outcome
1 | Male | 59 | Y
2 | Female | 52 | Y
3 | Male | 53 | N
4 | Female | 60 | N

To this:
ID | Gender | Age | Disease Outcome
1 | 1 | 59 | 1
2 | 2 | 52 | 1
3 | 1 | 53 | 0
4 | 2 | 60 | 0
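The recoding on this slide is a lookup from each text label to a number. The minimal Python sketch below assumes the coding shown on the slide (Male = 1, Female = 2; Y = 1, N = 0); the names `GENDER_CODES`, `OUTCOME_CODES`, `code_record` and the column name `Disease_Outcome` are hypothetical.

```python
# Hypothetical coding scheme, copied from the slide: Male = 1, Female = 2; Y = 1, N = 0.
GENDER_CODES = {"Male": 1, "Female": 2}
OUTCOME_CODES = {"Y": 1, "N": 0}


def code_record(record):
    """Return a copy of one row with text categories replaced by numeric codes."""
    coded = dict(record)  # copy, so the raw row is left untouched
    coded["Gender"] = GENDER_CODES[record["Gender"]]
    coded["Disease_Outcome"] = OUTCOME_CODES[record["Disease_Outcome"]]
    return coded


raw_rows = [
    {"ID": 1, "Gender": "Male", "Age": 59, "Disease_Outcome": "Y"},
    {"ID": 2, "Gender": "Female", "Age": 52, "Disease_Outcome": "Y"},
    {"ID": 3, "Gender": "Male", "Age": 53, "Disease_Outcome": "N"},
    {"ID": 4, "Gender": "Female", "Age": 60, "Disease_Outcome": "N"},
]

coded_rows = [code_record(row) for row in raw_rows]
for row in coded_rows:
    print(row["ID"], row["Gender"], row["Age"], row["Disease_Outcome"])
# prints "1 1 59 1", "2 2 52 1", "3 1 53 0", "4 2 60 0", one per line
```

Because an unexpected label (e.g. "M" or "male") raises a `KeyError`, the lookup doubles as a simple entry check.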

Types of Variables
Independent variable (IV)
- Influences your outcome measure
- Active (intervention) or attribute (characteristic)
Dependent variable (DV)
- Influenced by the IV(s)
- Usually represents the outcome studied
Confounders
- Alternative explanation for an association between an exposure (IV) and an outcome (DV)
- Not a focus of the study
- Independently associated with the outcome
- Associated with the exposure under study

Level of Measurement
Nominal (example: gender)
- Data categorized into mutually exclusive and unordered groups
- Number codes can be assigned (male = 1; female = 2), but calculations on them would be meaningless
Ordinal (example: SES level: low, middle and high income)
- Data classified/categorized with an implied order
- Distance between categories is not always equal
- You can't measure the magnitude of, or quantify, the difference between categories: how much lower is middle income than high income?

Level of Measurement
Interval: attributes measured on an interval scale
- Equal distance between each interval
- Distance between scale numbers has meaning
- Arbitrary zero point (e.g., temperature in °C or °F)
Ratio: similar to an interval scale, but
- Has a true zero point, with a clear definition of 0: there is none of the variable
- Examples: weight, salary ($0 = no salary)
- Can make statements about the ratio of two measurements: 6 grams is twice as much as 3 grams
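A short worked check of the interval/ratio distinction, using temperature as the interval example and grams as the ratio example from the slide. This is an illustrative Python sketch; the helper name `celsius_to_fahrenheit` is ours.

```python
def celsius_to_fahrenheit(celsius):
    """Convert between two interval scales that have different arbitrary zero points."""
    return celsius * 9 / 5 + 32


# Interval scale: a ratio changes when the (arbitrary) zero point changes.
print(20 / 10)                                                # 2.0
print(celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))  # 1.36 (68 °F / 50 °F)

# ...but equal intervals stay equal: both 10-degree Celsius steps are 18 °F apart.
print(celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))  # 18.0
print(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))  # 18.0

# Ratio scale: a true zero means the ratio survives a change of unit (grams -> milligrams).
print(6 / 3, (6 * 1000) / (3 * 1000))                         # 2.0 2.0
```

So "20 °C is twice as warm as 10 °C" is not a meaningful statement, while "6 g is twice 3 g" holds in any unit.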

Level of Measurement & Acceptable Statistical Operations
Operation | Nominal | Ordinal | Interval | Ratio
Frequency distribution | Yes | Yes | Yes | Yes
Mode | Yes | Yes | Yes | Yes
Median & percentile | No | Yes | Yes | Yes
Mean & standard deviation | No | No | Yes | Yes

Excel 101

Objectives
- How to organize data in an MS Excel spreadsheet
- How to define variables
- How to code data in preparation for analysis

Terminology
- Data: information that you collect
- Dataset: a collection of data, usually presented in tabular form
  - Columns represent variables
  - Rows represent members of the dataset
- Spreadsheet: a computer application that facilitates the use of datasets (data entry, analyses, sharing)
  - MS Excel is a spreadsheet program

Using Excel for a Research Study
- To capture data: facilitate data collection, minimize entry errors
- To clean/explore/describe data: the starting point for analyses is cleaning the raw data; preliminary descriptive statistics
- To analyze data: using program add-ins

Stages in Preparing Data for Analysis
Collect data → Create data file: enter & clean data → Explore data → Analyze data → Interpret results

Stages in Preparing Data for Analysis: Good Practice
- Design your spreadsheet with your statistical analyses in mind
- Use logic checks to clean data
- Create a data dictionary
- Consult with an analyst

Creating a Data File Using a Spreadsheet
- Each variable (e.g., ID #) is represented by a column
- Each participant is represented by a row
- All the information for a single case is entered across one row only
- The data in each column summarize information on a particular variable

Creating Your Spreadsheet: Defining Variables
- Use descriptive, unique names for variables
- Use an underscore (_) instead of a space
- Be consistent in naming, especially with array variables (variables that capture a pattern)
  Example: measuring blood pressure at regular intervals (e.g., every 30 min for 12 hours):
  BP_30min, BP_1hr, BP_final or BP_time1, BP_time2, BP_time3
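The naming rules on this slide can be checked mechanically before data entry starts. The Python sketch below is a minimal illustration that assumes a rule of "start with a letter, then only letters, digits or underscores", and treats names differing only in case as duplicates; `VALID_NAME` and `check_variable_names` are hypothetical names.

```python
import re

# Assumed rule based on the slide: start with a letter, then only letters, digits or underscores.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_variable_names(names):
    """Return a list of problems: invalid characters (e.g. spaces) or duplicate names."""
    problems = []
    seen = set()
    for name in names:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"{name!r}: use letters, digits and underscores only")
        if name.lower() in seen:  # names differing only in case count as duplicates
            problems.append(f"{name!r}: duplicate name")
        seen.add(name.lower())
    return problems


# Array variables generated from one consistent pattern.
bp_names = [f"BP_time{i}" for i in range(1, 4)]
print(bp_names)  # ['BP_time1', 'BP_time2', 'BP_time3']

print(check_variable_names(["ID_no", "BP time2", "bp_time1"] + bp_names))
```

Generating array names from a single pattern, as `bp_names` does, is one way to keep them consistent.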

Creating Your Spreadsheet
Each column captures only a single piece of information.

Instead of this:
ID | Intervention_Used
1 | Yes surgery
2 | Yes medication
3 | No

Use this:
ID | Intervention_Used | Intervention_Type
1 | Yes | Surgery
2 | Yes | Medication
3 | No | Not applicable

Creating Your Spreadsheet
Use tools to facilitate entry: add notes for data entry (Insert Comments)

Creating Your Spreadsheet
Use tools to facilitate entry: colour-code columns (Fill Color icon)

Creating Your Spreadsheet
- For numeric variables, use Format to force entry into a specific form (e.g., date, number of decimals)
- Highlight the entire column to ensure a consistent format is applied

Data Dictionaries
- AKA code books
- A centralized repository of information about data, such as meaning, relationships to other data, origin, usage, and format
- Develop the data dictionary before collecting data
- Components: variable name; variable label; type (e.g., nominal or interval); values/coding for each variable
- Always assign values for missing or non-applicable cases

Sample Data Dictionary
Data Dictionary for Research Consults Database
Item | Variable | Label | Type | Coding/entry instruction
1 | ID_no | ID number | Numeric | Enter unique number for each record/patient (1.100)
2 | Gender | Gender of participant | String | Enter corresponding number: 0 = Female, 1 = Male, 999 = missing
3 | Date_T1 | Date at baseline | Date | Specify format: mm/dd/yy
4 | Education | Highest level of education | Numeric | Enter corresponding number: 1 = Grade 10, 2 = Grade 12, 3 = College Diploma, 4 = University
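Once a data dictionary exists, every record can be checked against it. The Python sketch below is a minimal illustration that transcribes only the Gender, Education and Date_T1 rules from the sample dictionary; `ALLOWED_CODES`, `is_valid_date` and `validate_record` are hypothetical names. Within Excel itself, the Data Validation feature can enforce similar rules at entry time.

```python
from datetime import datetime

# Allowed codes transcribed from the sample data dictionary (999 = missing for Gender).
ALLOWED_CODES = {
    "Gender": {0, 1, 999},
    "Education": {1, 2, 3, 4},
}
DATE_FORMAT = "%m/%d/%y"  # mm/dd/yy, as specified for Date_T1


def is_valid_date(value, fmt=DATE_FORMAT):
    """True if value is a string that parses with the given date format."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def validate_record(record):
    """Return a list of messages for values that break the data dictionary."""
    errors = []
    for variable, allowed in ALLOWED_CODES.items():
        value = record.get(variable)
        if value not in allowed:
            errors.append(f"{variable}={value!r} is not one of {sorted(allowed)}")
    if not is_valid_date(record.get("Date_T1")):
        errors.append(f"Date_T1={record.get('Date_T1')!r} is not in mm/dd/yy format")
    return errors


print(validate_record({"ID_no": 1, "Gender": 1, "Date_T1": "04/28/14", "Education": 3}))
# prints []
print(validate_record({"ID_no": 2, "Gender": 2, "Date_T1": "2014-04-28", "Education": 5}))
```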

Data Structures
- Re-emphasize the importance of keeping analyses in mind when designing the dataset
- Statistical packages require that data be entered in a specific way in order to run analytical steps

Using Excel to Clean / Explore Data

Data Cleaning & Data Exploration
Data at this stage are the raw dataset. You need to:
- Clean: any entry errors and duplicate entries; convert text variables into numeric variables
- Explore: any outliers
Excel tools to facilitate cleaning and exploring: Filter, Sort
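Duplicate entries come in two kinds: the same record typed twice, and two different records sharing one ID. The Python sketch below separates them; the function names and the `Gout_Dx` 0/1 coding are hypothetical, and the ID check is the same idea as flagging rows where `COUNTIF` over the ID column exceeds 1 in Excel.

```python
from collections import Counter


def find_duplicate_ids(rows, id_field="ID"):
    """IDs that occur more than once (same idea as COUNTIF(ID column, ID) > 1 in Excel)."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


def find_exact_duplicates(rows):
    """Indexes of rows that repeat an earlier row exactly (e.g. a record entered twice)."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            duplicates.append(index)
        seen.add(key)
    return duplicates


# Made-up rows; Gout_Dx coded 0 = no gout, 1 = gout (hypothetical coding).
rows = [
    {"ID": 1, "Age": 59, "Gout_Dx": 1},
    {"ID": 2, "Age": 52, "Gout_Dx": 0},
    {"ID": 2, "Age": 52, "Gout_Dx": 0},  # entered twice
    {"ID": 3, "Age": 53, "Gout_Dx": 0},
    {"ID": 3, "Age": 35, "Gout_Dx": 0},  # same ID, conflicting values
]
print(find_duplicate_ids(rows))     # [2, 3]
print(find_exact_duplicates(rows))  # [2]
```

Exact duplicates can usually be deleted, whereas ID 3 needs a check against the source documents to decide which value is right.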

Data Cleaning & Data Exploration
Sources of data errors:
- Missing values: never leave a cell blank
  - Assign a meaningful number for missing values
  - Consider coding for cases with non-applicable responses
- Typing errors on data entry (e.g., age = 121)
- Measurement errors (e.g., height)
Identify errors using descriptive statistics for each variable:
- Minimum and maximum values
- Means, medians and SD
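The "minimum and maximum" check is what exposes a typo such as age = 121, and the missing-value code has to be excluded first or it distorts every statistic. A minimal Python sketch, assuming 999 as the missing code (as in the sample data dictionary) and a hypothetical plausible age range of 18 to 110:

```python
import statistics

MISSING = 999  # assumed missing-value code, as in the sample data dictionary


def summarize(values, missing=MISSING):
    """Min/max/mean/median/SD of the non-missing values, plus a count of missing codes."""
    observed = [v for v in values if v != missing]
    return {
        "n": len(observed),
        "n_missing": len(values) - len(observed),
        "min": min(observed),
        "max": max(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "sd": statistics.stdev(observed),  # sample SD; needs at least 2 values
    }


def flag_out_of_range(values, low, high, missing=MISSING):
    """(row index, value) pairs outside a plausible range, ignoring the missing code."""
    return [(i, v) for i, v in enumerate(values) if v != missing and not low <= v <= high]


ages = [59, 52, 53, 60, 121, 999, 67]  # made-up ages with one typo and one missing code
print(summarize(ages))                   # a maximum of 121 stands out
print(flag_out_of_range(ages, 18, 110))  # [(4, 121)]
```

Had the 999 been left in, the mean would have been pulled far upward, which is why the missing code is filtered out before summarizing.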

Our Dataset for the Remainder of the Workshop

Gout and AMI Study

Gout and AMI 2 Study
Research question: What clinical factors are associated with the development of AMI among elderly women with gout?
Objectives:
- To compare women with and without gout on clinical factors, including age, BMI and uric acid level
- To evaluate the correlation between clinical factors among elderly women with and without gout

Gout and AMI 2 Dataset

Gout and AMI 2 Study
- Data dictionary (handout)
- 13 variables: What are the continuous variables? What are the categorical variables?
- N = 200 subjects

Excel Tool: Filter
Place the cursor in cell A1, at the top-left of the data block, then:
- Excel 2003: Data > Filter > AutoFilter
- Excel 2007: Data > Sort & Filter > Filter

Exercise 1
You are the analyst for the Gout and AMI 2 study. A data dictionary was not implemented (GASP!) and there are entry errors. Clean the dataset by locating and correcting the entry errors using tools in Excel.
File: Sample_worksheet_Exercise1.xls

Summary
Important data considerations while conducting the study:
- Designing the dataset
- Collecting the data
- Entering the data
- Cleaning/exploring raw data
- Applying statistics to analytical data

Using Excel to Analyze Data

Using Excel to Analyze Data
The Analysis ToolPak is an add-in that must be installed in Excel: a supplemental program that adds custom commands for
- Descriptive statistics
- Analytical statistics: t-tests, correlation

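Getting Started
1. Click Tools > Add-Ins
2. Select Analysis ToolPak
(Excel 2007: Office Button > Excel Options > Add-Ins; Excel 2010: File > Options > Add-Ins. In both, choose Manage: Excel Add-ins > Go, then select Analysis ToolPak.)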

Descriptive Statistics
Statistics used to describe characteristics of the study population/sample; they are not used to infer the properties of the population from which the sample was drawn.
For continuous variables:
- Measures of central tendency: mean, median, mode
- Measures of variability: standard deviation
- Shape of the distribution: skewness, kurtosis


Descriptive Statistics
1. Click Tools > Data Analysis (Excel 2007/2010: Data tab > Data Analysis)
2. Select Descriptive Statistics

Example: Age
- Input range: select columns $A:$E
- Grouped by: Columns
- Click Labels in first row
- Output options: New worksheet; Summary statistics

Output for Descriptive Statistics: Age
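To see what each line of the Descriptive Statistics output means, the Python sketch below recomputes a table with the same row labels. It is an illustration, not the ToolPak's own code: skewness and kurtosis use the sample-adjusted formulas documented for Excel's SKEW and KURT functions, `descriptive_statistics` is a hypothetical name, and the ages are made up.

```python
import math
import statistics


def descriptive_statistics(values):
    """Summary table similar in spirit to the Analysis ToolPak Descriptive Statistics output.

    Skewness and kurtosis follow the sample-adjusted formulas of Excel's SKEW and KURT.
    """
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in values]
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    modes = statistics.multimode(values)
    # Excel reports #N/A for the mode when no value repeats; None stands in for that here.
    mode = modes[0] if values.count(modes[0]) > 1 else None
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": mode,
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


ages = [70, 72, 72, 72, 73, 73, 75, 77]  # made-up values
for label, value in descriptive_statistics(ages).items():
    print(f"{label:<20}{value}")
```

A positive skewness here reflects the longer right tail (75 and 77 above a mean of 73).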

Exercise 2
Data file: sample_worksheet_masterfile.xls
Using Descriptive Statistics in the Analysis ToolPak, obtain descriptive statistics for:
- Uric acid
- BMI

Descriptive Statistics by Group
Use PivotTables, which are useful for obtaining descriptive statistics by group: for example, the average and standard deviation of BMI for men and for women.

PivotTables
Click Data > PivotTable and PivotChart Report

Example: Age by Gout Diagnosis, Step 1
Where is the data? What kind of report?

Example: Age by Gout Diagnosis, Step 2
Where is the data you want to use? Drag the pointer over the entire dataset.

Example: Age by Gout Diagnosis, Step 3
Where do you want to put the PivotTable?
Layout: this is where you tell Excel which groups to output your results by, and for which variables.

Example: Age by Gout Diagnosis, Step 3: Layout
- Row: your grouping variable (Gout_Dx)
- Data: the variable you want summarized by group (Age)
- You may need to drag the variable into the layout several times, once for each parameter needed (e.g., average and standard deviation)

Example: Age by Gout Diagnosis
[PivotTable output: No gout vs. Gout]
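What the PivotTable computes here is a group-by summary: split the rows by the grouping variable, then take the count, average and standard deviation within each group. In a PivotTable, the StdDev summary function is the sample standard deviation (StdDevp is the population version). The Python sketch below mirrors that; `describe_by_group`, the 0/1 coding of `Gout_Dx`, and the ages are all hypothetical.

```python
import statistics
from collections import defaultdict


def describe_by_group(rows, group_field, value_field):
    """Count, average and sample SD of one variable for each group, like a PivotTable
    with the grouping variable in the row area and Average/StdDev as summaries."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {
        group: {
            "Count": len(values),
            "Average": statistics.mean(values),
            "StdDev": statistics.stdev(values),  # sample SD; needs >= 2 values per group
        }
        for group, values in sorted(groups.items())
    }


# Made-up rows; Gout_Dx coded 0 = no gout, 1 = gout (hypothetical coding).
rows = [
    {"Gout_Dx": 0, "Age": 70}, {"Gout_Dx": 0, "Age": 74}, {"Gout_Dx": 0, "Age": 78},
    {"Gout_Dx": 1, "Age": 72}, {"Gout_Dx": 1, "Age": 80},
    {"Gout_Dx": 1, "Age": 82}, {"Gout_Dx": 1, "Age": 86},
]
for group, stats in describe_by_group(rows, "Gout_Dx", "Age").items():
    print(group, stats)
```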

Exercise 3
Using PivotTables and PivotCharts, obtain the mean and standard deviation for uric acid according to gout diagnosis.

Analytic Statistics
Statistical procedures used to draw conclusions about a population from sample data:
- Compare groups: t-tests
- Evaluate correlation: correlation coefficients
- Evaluate association: regression models

Analytic Statistics: Considerations
- Research question: describe, compare or predict?
- Level of measurement: nominal, ordinal or interval?
- Are you comparing the same or different subjects?
- Number of experimental groups?
- Are your data normally distributed?
- What are the assumptions of the statistical test you would like to use?

Check Data Assumptions
What are the assumptions of the statistical test you would like to use? Some common assumptions are:
- The DV and IV need to be measured at a certain level (e.g., continuous)
- The population is normally distributed (not skewed)

Selecting the Appropriate Statistical Test
Statistical decision tree (handout):
1. Research goal
2. Identify the IV and DV
3. Describe the level of the data
4. Identify the number of groups and whether they are paired
5. Check data assumptions

Goal | Continuous (normal) | Ordinal or non-normal | Categorical
Describe one group | Mean, SD | Median, interquartile range | Proportion
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples)
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict value from another measured variable | Simple linear regression or nonlinear regression | Nonparametric regression | Simple logistic regression
Predict value from several measured or binomial variables | Multiple linear regression or multiple nonlinear regression | - | Multiple logistic regression
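The decision tree on this slide is effectively a two-way lookup: research goal by type of dependent-variable data. The Python sketch below encodes it as a dictionary for illustration only; the key strings, `DATA_TYPES`, `DECISION_TABLE` and `choose_test` are hypothetical, and the lookup does not replace checking each test's assumptions (step 5).

```python
# Columns of the decision tree: type of dependent-variable data.
DATA_TYPES = ("continuous_normal", "ordinal_or_non_normal", "categorical")

# Rows: goal -> (continuous normal, ordinal/non-normal, categorical); None marks the empty cell.
DECISION_TABLE = {
    "describe one group": ("Mean, SD", "Median, interquartile range", "Proportion"),
    "compare one group to a hypothetical value": (
        "One-sample t test", "Wilcoxon test", "Chi-square"),
    "compare two unpaired groups": (
        "Unpaired t test", "Mann-Whitney test", "Fisher's test (chi-square for large samples)"),
    "compare two paired groups": ("Paired t test", "Wilcoxon test", "McNemar's test"),
    "compare three or more unmatched groups": (
        "One-way ANOVA", "Kruskal-Wallis test", "Chi-square test"),
    "compare three or more matched groups": (
        "Repeated-measures ANOVA", "Friedman test", "Cochran's Q"),
    "quantify association between two variables": (
        "Pearson correlation", "Spearman correlation", "Contingency coefficients"),
    "predict value from another measured variable": (
        "Simple linear regression or nonlinear regression", "Nonparametric regression",
        "Simple logistic regression"),
    "predict value from several measured or binomial variables": (
        "Multiple linear regression or multiple nonlinear regression", None,
        "Multiple logistic regression"),
}


def choose_test(goal, data_type):
    """Look up the suggested analysis for a research goal and type of dependent variable."""
    return DECISION_TABLE[goal.lower()][DATA_TYPES.index(data_type)]


# Example: age (continuous, assumed normal) compared between two independent groups.
print(choose_test("Compare two unpaired groups", "continuous_normal"))  # Unpaired t test
```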

Exercise 4 (handout)
A pilot experiment designed to test the effectiveness of a new therapy for pain management in patients with chronic pain, conducted over a one-year period.
- What is the goal?
- What is the IV?
- What is the DV?
- How many groups?
- Paired/matched or independent?
- What is the level of measurement?

Comparing Groups: Mean Differences
Independent samples t-test
- Comparison of the means of 2 non-paired groups
- e.g., differences in pain levels between 2 groups (standard care and new intervention)
Paired samples t-test
- Comparison of the means of 2 paired measures
- e.g., differences in pain levels within groups: pre and post measurement, repeated measurement under different conditions

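Comparing Groups
- Sort the dataset by the grouping variable, so that each group's values sit in one contiguous block
- Click Tools > Data Analysis
- There are three options for t-tests: t-Test: Paired Two Sample for Means; t-Test: Two-Sample Assuming Equal Variances; t-Test: Two-Sample Assuming Unequal Variances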

Is There a Difference in Age Between Gout and Non-Gout Patients?
- Highlight the age data for the first group (no gout) as the Variable 1 Range
- Highlight the age data for the second group (gout) as the Variable 2 Range
- Output to a new worksheet
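The two "Two-Sample" options differ only in how the standard error and degrees of freedom are computed: a pooled variance when equal variances are assumed, and Welch's formula otherwise. The Python sketch below shows both; `two_sample_t` is a hypothetical name and the ages are made up. It stops at t and df, because the standard library has no t distribution. In Excel, `TDIST(ABS(t), df, 2)` gives the two-tailed p-value, and `TTEST(array1, array2, 2, 2)` (equal variances) or type 3 (unequal) can serve as a cross-check.

```python
import math
import statistics


def two_sample_t(group1, group2, equal_variances=True):
    """Return (t, degrees of freedom) for the mean of group1 minus the mean of group2."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 values")
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    if equal_variances:
        # Pooled variance (Student's t-test).
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom.
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return mean_diff / se, df


no_gout_ages = [70, 74, 78]   # made-up values
gout_ages = [72, 80, 82, 86]  # made-up values

t, df = two_sample_t(no_gout_ages, gout_ages)
print(round(t, 3), df)  # -1.506 5

t_w, df_w = two_sample_t(no_gout_ages, gout_ages, equal_variances=False)
print(round(t_w, 3), round(df_w, 2))  # -1.604 4.99
```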

Output
[t-test output: No gout vs. Gout]

Paired Samples T-Test Procedure
- Tools > Data Analysis > t-Test: Paired Two Sample for Means
- Variable 1 Range: DV at time 1
- Variable 2 Range: DV at time 2
- Output options
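A paired t-test is a one-sample t-test on the within-subject differences, which is why the two ranges must line up subject by subject. The Python sketch below illustrates this with differences taken as time 1 minus time 2; `paired_t` is a hypothetical name and the pain scores are made up.

```python
import math
import statistics


def paired_t(time1, time2):
    """Return (t, degrees of freedom) for the mean of the paired differences time1 - time2."""
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    differences = [a - b for a, b in zip(time1, time2)]  # one difference per subject
    n = len(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / se, n - 1


pain_before = [7, 6, 8, 5, 7]  # made-up pain scores at time 1
pain_after = [4, 5, 6, 5, 3]   # same subjects at time 2
t, df = paired_t(pain_before, pain_after)
print(round(t, 3), df)  # 2.828 4
```

Treating these same numbers as two independent groups would ignore the pairing and generally give a different (often larger) standard error.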

Exercise 5
Using the t-test tools in the Analysis ToolPak, compare patients with no gout versus gout according to:
- Uric acid levels
- BMI

Exercise 6
Using the paired t-test in the Analysis ToolPak, answer the following question: Is there a difference in mean pain before and after surgery?

Associate: Correlation
- Allows an examination of the association between variables
- Range: -1 to +1
- Provides information about the strength of the association and its direction (positive or negative)
- A correlation coefficient of 0 = no (linear) relationship
- A correlation coefficient of +1 = perfect positive relationship
- A correlation coefficient of -1 = perfect negative relationship
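The Pearson coefficient is the covariance of the two variables scaled by both standard deviations, which is what bounds it between -1 and +1. The Python sketch below is a minimal illustration (`pearson_r` is a hypothetical name); Excel's CORREL and PEARSON functions, and the ToolPak Correlation tool, report the same coefficient.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient; always between -1 and +1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

On Python 3.10 and later, `statistics.correlation(x, y)` computes the same value.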

Evaluating Correlation
- For continuous variables
- Place the columns side by side
- Click Tools > Data Analysis > Correlation

What Is the Correlation Between Age and Uric Acid Level?
- Highlight the 2 columns
- Grouped by: Columns
- Labels in first row
- Output to a new worksheet

Output

Exercise 7
Using Correlation in the Analysis ToolPak, evaluate the correlation between uric acid and BMI.

Thank You!