# Analyzing Research Data Using Excel

3 Objectives

- To review key concepts and elements of quantitative research
- To explore the application of Excel in conducting a research project:
  - Creating data files
  - Creating a data dictionary
  - Linking the research question to the appropriate analysis
- To apply statistics to analytical data

4 Workshop Outline

- 09:00-09:15 Review of Quantitative Research
- 09:15-09:30 Measurement
- 09:30-10:30 Excel 101: Excel as database
- 10:30-10:45 Break
- 10:45-11:15 Using Excel to Clean/Explore Data
- 11:15-12:00 Using Excel to Analyze Data

5 Excel Pop Quiz

- How many columns? 2003: 256; 2007 & 2010: 16,384
- How many rows? 2003: 65,536; 2007 & 2010: 1,048,576
- True or False: You can conduct statistical analyses in Excel?

6 Quick Review of Quantitative Research

7 Framework for Quantitative Research

- Conduct literature review
- Develop rationale
  - Why do you want to do this research?
  - What do others say?
  - What are the knowledge gaps?
- Formulate research question
- Generate objective(s) and/or hypothesis
- PICO method
  - P = population / patient
  - I = intervention
  - C = comparison
  - O = outcome
- Hypothesis: (usually) a statement of anticipated results
- Objective: an action statement
- Apply methods and conduct the study
  - Measurement
  - Study design
  - Analysis

8 Research Question

9 Measurement: Thinking in Numbers From this To this ID Gender Age Disease Outcome ID Gender Age Disease Outcome 1 Male 59 Y Female 52 Y Male 53 N Female 60 N

10 Types of Variables

- Independent variable (IV)
  - Influences your outcome measure
  - Active (intervention) or attribute (characteristic)
- Dependent variable (DV)
  - Influenced by the IV(s)
  - Usually represents the outcome studied
- Confounders
  - Alternative explanation for an association between an exposure (IV) and an outcome (DV)
  - Not a focus of the study
  - Independently associated with the outcome
  - Associated with the exposure under study

11 Level of Measurement

- Nominal
  - Example: gender
  - Data categorized into mutually exclusive and unordered groups
  - Can assign number codes (male=1; female=2), but calculations on them would be meaningless
- Ordinal
  - Example: SES level: low, middle and high income
  - Data classified/categorized with an implied order
  - Distance between categories is not always equal
  - Can't measure the magnitude of the difference between categories: how much lower is middle income than high income?

12 Level of Measurement

- Interval: attributes measured on interval scales
  - Equal distance between each interval
  - Distance between scale numbers has meaning
  - Arbitrary zero point (e.g., temperature in °C)
- Ratio: similar to interval scale
  - Has a true zero point
  - Clear definition of 0: there is none of the variable
  - Example: weight, salary (\$0=\$0)
  - Can make assumptions about the ratio of two measurements: 6 grams is twice as much as 3 grams

13 Level of Measurement & Acceptable Statistical Operations

| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency distribution | Yes | Yes | Yes | Yes |
| Mode | Yes | Yes | Yes | Yes |
| Median & percentile | No | Yes | Yes | Yes |
| Mean & standard deviation | No | No | Yes | Yes |
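The rule on this slide can be written as a small lookup. The following is only a sketch: the names `LEVELS`, `MINIMUM_LEVEL` and `is_acceptable` are made up for illustration and are not part of Excel or the workshop files.

```python
# Levels of measurement, ordered from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each operation is acceptable, per the slide.
MINIMUM_LEVEL = {
    "frequency distribution": "nominal",
    "mode": "nominal",
    "median": "ordinal",
    "percentile": "ordinal",
    "mean": "interval",
    "standard deviation": "interval",
}


def is_acceptable(operation, level):
    """Return True if the operation is acceptable for data at this level."""
    return LEVELS.index(level) >= LEVELS.index(MINIMUM_LEVEL[operation])


print(is_acceptable("median", "ordinal"))  # True
print(is_acceptable("mean", "ordinal"))    # False
```

Checking a variable's level before choosing a summary statistic stops you from, for example, averaging number codes that only label categories.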

14 Excel

15 Objectives

- How to organize data in an MS Excel spreadsheet
- How to define variables
- How to code in preparation for analysis

16 Terminologies

- Data: information that you collect
- Dataset
  - Collection of data, usually presented in tabular form
  - Columns represent variables
  - Rows represent members of the dataset
- Spreadsheet
  - Computer application that facilitates use of datasets (entering data, analysis, sharing)
  - MS Excel is a spreadsheet program

17 Using Excel for a Research Study

- To capture data
  - Facilitate data collection, minimize entry errors
- To clean/explore/describe data
  - Starting point for analyses is cleaning raw data
  - Preliminary descriptive statistics
- To analyze data
  - Using program add-ins

18 Stages in preparing data for analysis

1. Collect data
2. Create data file: enter & clean data
3. Explore data
4. Analyze data
5. Interpret results

19 Stages in preparing data for analysis

Good practice:

- Design your spreadsheet keeping your statistical analyses in mind
- Use logic checks to clean data
- Create a data dictionary
- Consult with an analyst

20 Creating data file using spreadsheet

- Each variable (e.g., ID #) is represented by a column
- Each participant is represented by a row
- All the information for a single case is entered across one row only
- The data in each column summarize information on a particular variable

21 Creating Your Spreadsheet: Defining variables

- Use descriptive, unique names for variables
- Use underscore (_) instead of space
- Be consistent in naming, especially with array variables
  - Array variables: variables that capture a pattern
  - Example: measuring blood pressure at regular intervals (e.g., Q30 min for 12 hours)
  - BP_30min, BP_1hr, BP_final
  - BP_time1, BP_time2, BP_time3
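Consistent array-variable names are easy to generate rather than type by hand. This is a minimal sketch; the helper name `array_variable_names` is hypothetical, and the count of 24 assumes one reading every 30 minutes for 12 hours.

```python
def array_variable_names(prefix, count):
    """Build consistently numbered column names such as BP_time1, BP_time2, ..."""
    return [f"{prefix}_time{i}" for i in range(1, count + 1)]


# One reading every 30 minutes for 12 hours gives 24 columns.
names = array_variable_names("BP", 24)
print(names[:3])   # ['BP_time1', 'BP_time2', 'BP_time3']
print(names[-1])   # BP_time24
```

The generated names can be pasted into the first row of the spreadsheet as column headers.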

22 Creating Your Spreadsheet

Each column captures only a single piece of information.

Instead of:

| ID | Intervention_Used |
|---|---|
| | Yes surgery |
| | Yes medication |
| | No |

Use:

| ID | Intervention_Used | Intervention_Type |
|---|---|---|
| | Yes | Surgery |
| | Yes | Medication |
| | No | Not applicable |
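If data were already entered with two pieces of information in one cell, the split can be done programmatically. This is only a sketch under assumptions: the function name `split_intervention` is hypothetical, and it assumes each cell looks like "Yes surgery", "Yes medication" or "No".

```python
def split_intervention(raw):
    """Split a combined cell into (Intervention_Used, Intervention_Type)."""
    used, _, kind = raw.strip().partition(" ")
    # Cases with no intervention get an explicit code instead of a blank.
    intervention_type = kind.capitalize() if kind else "Not applicable"
    return used.capitalize(), intervention_type


for cell in ["Yes surgery", "Yes medication", "No"]:
    print(split_intervention(cell))
```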

24 Creating Your Spreadsheet

Use tools to facilitate entry:

- Colour-code columns (Fill Color icon)

25 Creating Your Spreadsheet

For numeric variables, use Format to force entry into a specific form:

- Date
- Number of decimals

Highlight the entire column to ensure a consistent format is applied.

26 Data Dictionaries

- AKA code books
- A centralized repository of information about data, such as meaning, relationships to other data, origin, usage, and format
- Develop the data dictionary before collecting data
- Components:
  - Variable name
  - Variable label
  - Type: nominal or interval
  - Values/coding for each variable
- Always assign values for missing or non-applicable cases

27 Sample Data Dictionary

Data Dictionary for Research Consults Database

| Item | Variable | Label | Type | Coding / entry instruction |
|---|---|---|---|---|
| 1 | ID_no | ID number | Numeric | Enter unique number for each record/patient (1.100) |
| 2 | Gender | Gender of participant | String | Enter corresponding number: 0 = Female, 1 = Male, 999 = missing |
| 3 | Date_T1 | Date at baseline | Date | Specify format: mm/dd/yy |
| 4 | Education | Highest level of education | Numeric | Enter corresponding number: 1 = Grade 10, 2 = Grade 12, 3 = College Diploma, 4 = University |
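A data dictionary can also be used as a logic check on entered records. The sketch below encodes only the two coded variables from the sample dictionary. The structure `DATA_DICTIONARY`, the function `invalid_fields` and the example record are hypothetical illustrations, not part of the workshop files.

```python
# Coded variables from the sample data dictionary: allowed code -> meaning.
DATA_DICTIONARY = {
    "Gender": {0: "Female", 1: "Male", 999: "Missing"},
    "Education": {1: "Grade 10", 2: "Grade 12", 3: "College Diploma", 4: "University"},
}


def invalid_fields(record):
    """Return the coded variables whose value is absent or not a listed code."""
    return [
        name
        for name, codes in DATA_DICTIONARY.items()
        if record.get(name) not in codes
    ]


# Hypothetical entry: Education = 5 is not a defined code.
record = {"ID_no": 1, "Gender": 1, "Education": 5}
print(invalid_fields(record))  # ['Education']
```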

28 Data Structures

- Re-emphasize: keep your analyses in mind when designing the dataset
- Statistical packages require data to be entered in a specific way in order to run analytical steps

29 Using Excel to Clean / Explore Data

30 Data Cleaning & Data Exploration

- Data at this stage is a raw dataset
- Need to:
  - Clean: any entry errors, duplicate entries; convert text variables into numeric variables
  - Explore: any outliers
- Excel tools to facilitate cleaning and exploring:
  - Filter
  - Sort

31 Data Cleaning & Data Exploration

- Sources of data errors:
  - Missing values: never leave cells blank
    - Assign a meaningful number for missing values
    - Consider coding for cases with non-applicable responses
  - Typing errors on data entry (e.g., age = 121)
  - Measurement errors (e.g., height)
- Identify errors using descriptive statistics for each variable:
  - Minimum and maximum values
  - Means, medians and SD
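The same error check can be sketched outside Excel. The ages below are made-up values, not the workshop dataset. The missing-value code of 999 and the plausible range of 0 to 110 years are assumptions chosen for illustration.

```python
import statistics

MISSING = 999  # assumed missing-value code for this sketch
ages = [59, 52, 53, 60, 121, 999, 58]  # made-up values with one typing error

# Exclude the missing-value code before summarizing.
observed = [a for a in ages if a != MISSING]

summary = {
    "min": min(observed),
    "max": max(observed),
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "sd": statistics.stdev(observed),
}

# Flag values outside an assumed plausible range for age.
suspect = [a for a in observed if not 0 <= a <= 110]

print(summary)
print("Check these entries:", suspect)
```

A maximum of 121 stands out straight away in the summary, which is the point of looking at the minimum and maximum before any analysis.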

32 Our Dataset for the Remainder of the Workshop

33 Gout and AMI Study

34 Gout and AMI 2 Study

- Research question: What clinical factors are associated with the development of AMI among elderly women with gout?
- Objectives:
  - To compare women with and without gout on clinical factors including age, BMI, and uric acid level
  - To evaluate the correlation between clinical factors among elderly women with and without gout

35 Gout and AMI 2 Dataset

36 Gout and AMI 2 Study

- Data dictionary (handout)
- 13 variables
  - What are the continuous variables?
  - What are the categorical variables?
- N = 200 subjects

37 Excel Tool: Filter

- Place cursor over the block by A1
- Excel 2003: Data > Filter > AutoFilter
- Excel 2007: Data > Sort & Filter > Filter

38 Exercise 1

You are the analyst for the Gout and AMI 2 study. A data dictionary was not implemented (GASP!) and there are entry errors. Clean the dataset by locating and correcting entry errors using tools in Excel.

Data file = Sample_worksheet_Exercise1.xls

39 Summary

Important data considerations while conducting the study:

- Designing the dataset
- Collecting the data
- Entering the data
- Cleaning/exploring raw data
- Applying statistics to analytical data

40 Using Excel to Analyze Data

41 Using Excel to Analyze Data

- Analysis ToolPak: add-in to be installed in Excel
  - Supplemental program that adds custom commands
- Descriptive statistics
- Analytical statistics
  - T-tests
  - Correlation

42 Getting Started

1. Click Tools > Add-Ins
2. Select Analysis ToolPak

43 Descriptive Statistics

- Statistics used to describe characteristics of the study population/sample
- Not used to infer the properties of the population from which the sample was drawn
- For continuous variables:
  - Measures of central tendency: mean, median, mode
  - Measures of variability (standard deviation)
  - Shape of distribution (skewness, kurtosis)
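For readers who want to see what these measures compute, here is a minimal sketch using Python's standard `statistics` module. The BMI values are made up for illustration. The helper `sample_skewness` is hypothetical and uses the adjusted sample-skewness formula; that it agrees with the figure Excel reports is an assumption, not something confirmed here.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


bmi = [22.0, 24.5, 24.5, 27.0, 31.0]  # made-up values

print("mean:", statistics.mean(bmi))
print("median:", statistics.median(bmi))
print("mode:", statistics.mode(bmi))
print("SD:", statistics.stdev(bmi))
print("skewness:", sample_skewness(bmi))
```

A positive skewness means the right tail is longer, as with the single high value of 31.0 here. A perfectly symmetric sample gives a skewness of zero.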

44 Level of Measurement & Acceptable Statistical Operations

| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency distribution | Yes | Yes | Yes | Yes |
| Mode | Yes | Yes | Yes | Yes |
| Median & percentile | No | Yes | Yes | Yes |
| Mean & standard deviation | No | No | Yes | Yes |

45 Descriptive Statistics

1. Click Tools > Data Analysis
2. Select Descriptive Statistics

46 Example: Age

- Input range: select columns \$A:\$E
- Group by columns
- Click on Labels in 1st row
- Output options: New worksheet
- Summary statistics

47 Output for Descriptive Statistics: Age

48 Exercise 2

Data file = sample_worksheet_masterfile.xls

Using Descriptive Statistics in Analysis ToolPak, obtain descriptive statistics for:

- Uric Acid
- BMI

49 Descriptive Statistics by Group

- Use Pivot Tables
- Useful to obtain descriptive statistics by group
- For example, if you wanted to know the average and standard deviation of BMI for men and women

50 Pivot Tables

Click Data > Pivot Table and Pivot Chart

51 Example: Age by Gout Diagnosis

Step 1

- Where is the data?
- What kind of report?

52 Example: Age by Gout Diagnosis

Step 2

- Where is the data you want to use?
- Drag the pointer over the entire dataset

53 Example: Age by Gout Diagnosis

Step 3

- Where do you want to put the Pivot Table?
- Layout: this is where you tell Excel which groups you want to output your results by and for what variables

54 Example: Age by Gout Diagnosis

Step 3: Layout

- Row: your grouping variable (Gout_Dx)
- Column: the variable you want to output according to groups
- May need to drag several times for the parameters needed

55 Example: Age by Gout Diagnosis

Output: No gout vs. Gout
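What the pivot table does here, grouping rows by one variable and summarizing another, can be sketched in a few lines. The (Gout_Dx, Age) pairs below are hypothetical and are not taken from the workshop dataset.

```python
import statistics
from collections import defaultdict

# Hypothetical (Gout_Dx, Age) pairs.
rows = [
    ("No gout", 70), ("No gout", 74), ("No gout", 78),
    ("Gout", 72), ("Gout", 80), ("Gout", 82),
]

# Collect ages under each value of the grouping variable.
ages_by_group = defaultdict(list)
for gout_dx, age in rows:
    ages_by_group[gout_dx].append(age)

# Mean and sample standard deviation per group.
group_summary = {
    group: (statistics.mean(ages), statistics.stdev(ages))
    for group, ages in ages_by_group.items()
}

for group, (mean, sd) in group_summary.items():
    print(f"{group}: mean = {mean:.1f}, SD = {sd:.2f}")
```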

56 Exercise 3

Using Pivot Tables and Charts, obtain the mean and standard deviation for uric acid according to gout diagnosis.

57 Analytic Statistics

Statistical procedures used to draw conclusions about a population from sample data:

- Compare groups: t-tests
- Evaluate correlation: correlation coefficients
- Evaluate association: regression models

58 Analytic Statistics: Considerations

- Research question: describe, compare or predict?
- Levels of data measurement: nominal, ordinal or interval?
- Are you comparing the same or different subjects?
- Number of experimental groups?
- Is your data normally distributed?
- What are the assumptions of the statistical test you would like to use?

59 Check Data Assumptions

What are the assumptions of the statistical test you would like to use? Some common assumptions are:

- The DV and IV need to be measured at a certain level (e.g., continuous)
- The population is normally distributed (not skewed)

60 Selecting Appropriate Statistical Test

Statistical decision tree (handout):

1. Research goal
2. Identify IV and DV
3. Describe level of the data
4. Identify the number of groups and whether they are paired
5. Check data assumptions

Type of dependent variable data:

| Goal | Continuous, normal | Ordinal or non-normal | Categorical |
|---|---|---|---|
| Describe one group | Mean, SD | Median, interquartile range | Proportion |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test |
| Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q |
| Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients |
| Predict value from another measured variable | Simple linear regression or nonlinear regression | Nonparametric regression | Simple logistic regression |
| Predict value from several measured or binomial variables | Multiple linear regression or multiple nonlinear regression | | Multiple logistic regression |
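The decision tree is essentially a lookup from (goal, type of DV data) to a test. The sketch below encodes only a handful of cells from the slide. The dictionary `TEST_FOR`, the function `suggest_test` and the shortened key strings are hypothetical, and the handout remains the full reference.

```python
# Partial encoding of the slide: (goal, type of DV data) -> suggested test.
TEST_FOR = {
    ("compare two unpaired groups", "continuous normal"): "Unpaired t test",
    ("compare two unpaired groups", "ordinal or non-normal"): "Mann-Whitney test",
    ("compare two paired groups", "continuous normal"): "Paired t test",
    ("compare two paired groups", "ordinal or non-normal"): "Wilcoxon test",
    ("quantify association", "continuous normal"): "Pearson correlation",
    ("quantify association", "ordinal or non-normal"): "Spearman correlation",
}


def suggest_test(goal, data_type):
    """Look up a test; combinations outside this sketch are reported as such."""
    return TEST_FOR.get((goal, data_type), "not covered in this sketch")


print(suggest_test("compare two paired groups", "continuous normal"))
print(suggest_test("quantify association", "ordinal or non-normal"))
```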

61 Exercise 4 (handout)

A pilot experiment designed to test the effectiveness of a new therapy for pain management in patients with chronic pain, conducted over a one-year period.

- What is the goal?
- What is the IV?
- What is the DV?
- How many groups?
- Paired/matched or independent?
- What is the level of measurement?

62 Comparing groups: Mean differences

- Independent Samples T-Test
  - Comparison of the means of 2 non-paired groups
  - Example: differences in pain levels between 2 groups (standard care and new intervention)
- Paired Samples T-Test
  - Comparison of means of 2 paired measures
  - Example: differences in pain levels within groups
  - Pre and post measurement, repeated measurement under different conditions
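To show what distinguishes the two tests, the sketch below computes each t statistic by hand. It is illustrative only: the pain scores are made up, the independent version assumes equal variances in the two groups, and no p-value is computed. The function names `independent_t` and `paired_t` are hypothetical.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """t statistic for two independent groups, assuming equal variances."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def paired_t(first, second):
    """t statistic for paired measures, based on within-subject differences."""
    diffs = [x - y for x, y in zip(first, second)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Made-up pain scores.
standard_care = [6, 7, 8]
new_intervention = [3, 4, 5]
pre = [6, 7, 8, 7]
post = [4, 6, 5, 5]

print("independent t:", independent_t(standard_care, new_intervention))
print("paired t:", paired_t(pre, post))
```

The independent test compares two group means against pooled between-subject variability. The paired test works on each subject's own difference score, which is why it suits pre and post measurements.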

63 Comparing Groups

- Sort the dataset by the grouping variable (the groups you are comparing)
- Click Tools > Data Analysis
- Three options for t-tests

64 Is there a difference in age between gout and non-gout patients?

- Highlight age data for the first group (no gout)
- Highlight age data for the second group (gout)
- Output to a new worksheet

65 Output

No gout vs. Gout

66 Paired Samples T-Test Procedure

- Tools > Data Analysis > T-test: Paired Two Sample for Means
- Input 1 Range: DV at time 1
- Input 2 Range: DV at time 2
- Output Options

67 Exercise 5

Using the t-test tools in Analysis ToolPak, compare patients with no gout versus gout according to:

- Uric Acid levels
- BMI

68 Exercise 6

Use the paired t-test in Analysis ToolPak to answer the following question: Is there a difference in mean pain before and after surgery?

69 Associate - Correlation

- Allows an examination of the association between variables
- Range: -1 to +1
  - Information about the strength of the association
  - Information about the direction of the association (positive or negative)
- A correlation coefficient of 0 = no linear relationship
- A correlation coefficient of +1 = perfect positive relationship
- A correlation coefficient of -1 = perfect negative relationship
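The three reference values on this slide can be reproduced with a hand-written Pearson correlation. This is a sketch with made-up numbers, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))          # perfect positive relationship
print(pearson_r([1, 2, 3], [6, 4, 2]))          # perfect negative relationship
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear relationship
```

The last pair has a clear U-shaped pattern yet a coefficient of zero: the coefficient only measures linear association.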

70 Evaluating Correlation

- Continuous variables
- Columns: side-by-side
- Click Tools > Data Analysis
- Click Correlation

71 What is the Correlation Between Age and Uric Acid Level?

- Highlight the 2 columns
- Group by columns
- Labels in first row
- Output to a new worksheet

72 Output

73 Exercise 7

Using the Correlation tool in Analysis ToolPak, evaluate the correlation between Uric Acid and BMI.

74 Thank You!

