Analyzing Quantitative Assessment Data with Excel
October 2, 2014
Jeremy Penn, Ph.D., Director

When to use Excel
- You want to quickly summarize or analyze your assessment data
- You want to create basic visual displays of your results (tables or graphs)
- You don't have the time or interest to learn a different analysis program

When NOT to use Excel
- You have a large number of missing data points
- You need to run many different types of analyses
- You need to run more complex analyses
- You need to match data sets from various sources

Resources
- Excel 2010 for Educational and Psychological Statistics, by Thomas Quirk
- The web: http://www.dummies.com/howto/content/statistical-analysis-with-excel-fordummies-cheat-.html
- Many other websites, depending on what you want to do

Topics Covered
- Setting up your Excel file
- Tables
- Graphs and charts
- Descriptive statistics
- Correlation
- Regression
- Two-group t-test for independent groups

Assumptions
- You understand the statistics you will be calculating and using
- If not, you will ask someone for help!
- A clock with a dead battery is right twice a day.

Setting up your Excel File
- Each row is a separate student or case
- Each column is a distinct variable
- Check for errors and the accuracy of your data file
- Create a codebook for each variable
- Look for missing values or unusual or impossible scores
- Excel does not handle missing values very well; you usually have to drop the student from the data file (sometimes Excel will do this automatically if the cell is blank)

Student  Sex  HSGPA  ACT  Visits to tutoring center  NDSU GPA
1        M    2      16   0                          2.8
2        M    2.1    18   0                          2.75
3        M    3.2    24   0                          1.6
4        M    2.25   20   1                          3.2
5        M    2.3    21   2                          2.5
6        M    4      29   3                          3.9
7        M    3.75   32   3                          3.8
8        M    3.5    30   5                          3.75
9        M    3.5    26   8                          4
10       M    3.9    27   10                         3.2
11       F    4      34   0                          3.9
12       F    2.5    26   0                          2.1
13       F    2.13   21   0                          2.2
14       F    3.8    19   0                          3.4
15       F    3.9    27   4                          3.8
16       F    4      26   5                          4
17       F    2.75   19   5                          3.2
18       F    3.1    18   6                          3.4
19       F    3.25   25   12                         3.8
20       F    3.85   27   15                         3.95

Tables
- Good way to summarize data
- Easier to build in Excel than in Word or PowerPoint
- Easy to control borders, headings, style, etc.
- Big benefit: Excel can run the calculations for you very easily

Common Excel Formulas for Building Tables
- Perform a calculation: =average(c2:c21)
- Count the number of times a text element is listed: =countif(b2:b21,"M")
- Find the largest or smallest value in a range of numbers: =max(c2:c21) or =min(c2:c21)
- Calculate standard deviation: =stdev.s(c2:c21)
- Sum a set of numbers: =sum(c2:c21)
- Perform an action only if something is true: =if(b2="M",c2,"F") (you can combine IF statements with other formulas to summarize data for a table)
- Refer to a cell on another worksheet: =Sheet1!a2

Graphs and Charts
- Graphs and charts in Excel work best from summary tables (not from the raw data)
- It is a good idea to start by creating a summary table

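As a cross-check outside Excel, the table formulas above and the kind of summary table a chart starts from can be sketched in Python. This is a minimal sketch, not part of the workshop materials: it assumes the Sex codes (column B) and HSGPA values (column C) from the sample data have been typed into lists by hand, and the names `sex`, `hsgpa`, and `summary` are illustrative only.

```python
from statistics import mean, stdev

# Assumption: values copied by hand from the sample data (rows 2-21).
sex = ["M"] * 10 + ["F"] * 10                                    # column B
hsgpa = [2.0, 2.1, 3.2, 2.25, 2.3, 4.0, 3.75, 3.5, 3.5, 3.9,
         4.0, 2.5, 2.13, 3.8, 3.9, 4.0, 2.75, 3.1, 3.25, 3.85]   # column C

summary = {
    "average": mean(hsgpa),      # like =average(c2:c21)
    "count_m": sex.count("M"),   # like =countif(b2:b21,"M")
    "count_f": sex.count("F"),   # like =countif(b2:b21,"F")
    "max": max(hsgpa),           # like =max(c2:c21)
    "min": min(hsgpa),           # like =min(c2:c21)
    "stdev_s": stdev(hsgpa),     # sample standard deviation, like =stdev.s(c2:c21)
    "sum": sum(hsgpa),           # like =sum(c2:c21)
}

for name, value in summary.items():
    print(f"{name:8} {value}")
```

If the numbers printed here disagree with the worksheet, the usual culprit is a cell range that includes the header row or stops short of row 21.
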
Summary table formulas:

Sex    Male                  Female
Count  =countif(b2:b21,"M")  =countif(b2:b21,"F")

Resulting table:

Sex     Count
Male    10
Female  10

- Select your table, then either choose Recommended Charts or select the pie chart directly from the Insert menu
- You can double-click on the pie chart to adjust colors, labels, shadow, etc.
- Then you can export (copy / paste) to your report
- Note that if you adjust your table, your chart will automatically change

Can you duplicate this graph?

Create a Scatterplot for Two Groups
- Use it to show the relationship between two variables for multiple groups
- Deceptively difficult in Excel (but very easy in some other programs)
1. Insert a scatterplot for one group as usual.
2. Right-click on the chart and choose Select Data.
3. Change the name of the first series to the name of your first group (males, in our case).
4. Click Add under Legend Entries (Series).
5. Enter the name of the second group (females) in Series name.
6. Use the little buttons to select the X and Y values (HSGPA and ACT) for the second group.
7. Hit OK. Then you can add titles, color, etc.

Descriptive Statistics
- Summarize your data
- For continuous variables, usually includes:
  - Mean
  - Standard deviation
  - Count
- For categorical variables:
  - Number of respondents in each category
  - Percentages
- You can do it by hand, using the formulas as above
- OR you can be faster and use the data analysis tool: http://office.microsoft.com/en-us/excelhelp/use-the-analysis-toolpak-to-performcomplex-data-analysis-ha102748996.aspx

Descriptive Statistics
- Once you've installed the Analysis ToolPak, select Data Analysis under the Data tab
- Then choose Descriptive Statistics
- Note that Descriptive Statistics ONLY works for numeric data
- Choose your input range (use the little button)
- Select Columns, Labels in first row, New Worksheet, and Summary statistics

                    HSGPA         ACT           Visits to tutoring center  NDSU GPA
Mean                3.189         24.25         3.95                       3.2625
Standard Error      0.16523492    1.128284679   0.988020349                0.160831515
Median              3.375         25.5          3                          3.4
Mode                4             26            0                          3.2
Standard Deviation  0.738953029   5.045842478   4.418561328                0.719260402
Sample Variance     0.546051579   25.46052632   19.52368421                0.517335526
Kurtosis            -1.459468723  -0.842417637  0.731454482                -0.125923022
Skewness            -0.439135411  0.12922007    1.155567122                -0.913227106
Range               2             18            15                         2.4
Minimum             2             16            0                          1.6
Maximum             4             34            15                         4
Sum                 63.78         485           79                         65.25
Count               20            20            20                         20

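Most rows of the summary table above can be reproduced independently, which is a handy way to confirm the input range was selected correctly. The following is a sketch only, assuming the four numeric columns of the sample data are typed in by hand; `columns`, `describe`, and `report` are illustrative names, and mode, kurtosis, and skewness are left out to keep it short.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

# Assumption: the four numeric columns of the sample data, typed in by hand.
columns = {
    "HSGPA": [2.0, 2.1, 3.2, 2.25, 2.3, 4.0, 3.75, 3.5, 3.5, 3.9,
              4.0, 2.5, 2.13, 3.8, 3.9, 4.0, 2.75, 3.1, 3.25, 3.85],
    "ACT": [16, 18, 24, 20, 21, 29, 32, 30, 26, 27,
            34, 26, 21, 19, 27, 26, 19, 18, 25, 27],
    "Visits": [0, 0, 0, 1, 2, 3, 3, 5, 8, 10,
               0, 0, 0, 0, 4, 5, 5, 6, 12, 15],
    "NDSU GPA": [2.8, 2.75, 1.6, 3.2, 2.5, 3.9, 3.8, 3.75, 4.0, 3.2,
                 3.9, 2.1, 2.2, 3.4, 3.8, 4.0, 3.2, 3.4, 3.8, 3.95],
}


def describe(values):
    """Return a subset of the summary statistics for one numeric column."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {
        "Mean": mean(values),
        "Standard Error": sd / sqrt(n),
        "Median": median(values),
        "Standard Deviation": sd,
        "Sample Variance": variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


report = {name: describe(values) for name, values in columns.items()}
for name, stats in report.items():
    print(name)
    for label, value in stats.items():
        print(f"  {label:20} {value:.4f}")
```
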
Correlation
- Excel works for finding the correlation between two continuous variables
- Not so good for two categorical variables, a categorical and a continuous variable, an ordinal and a continuous variable, etc. Use caution!
- Follow the same steps as for descriptive statistics, except select Correlation in the data analysis tool
- Hit OK

       HSGPA     ACT
HSGPA  1
ACT    0.720102  1

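The HSGPA and ACT correlation reported above (0.720102) is a Pearson correlation, and it can be recomputed from its definition. This is a minimal sketch that assumes the two columns are typed in by hand; `pearson_r` is an illustrative helper, not an Excel or workshop function.

```python
from math import sqrt

# Assumption: HSGPA (column C) and ACT (column D) from the sample data.
hsgpa = [2.0, 2.1, 3.2, 2.25, 2.3, 4.0, 3.75, 3.5, 3.5, 3.9,
         4.0, 2.5, 2.13, 3.8, 3.9, 4.0, 2.75, 3.1, 3.25, 3.85]
act = [16, 18, 24, 20, 21, 29, 32, 30, 26, 27,
       34, 26, 21, 19, 27, 26, 19, 18, 25, 27]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(hsgpa, act)
print(f"r = {r:.6f}")
```
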
Correlation
- Alternatively, you can use the correlation formula: =correl(c2:c21,d2:d21)

Regression
- Regressing an outcome variable on a single predictor is pretty easy; multiple regression in Excel (using several predictors) is not a great idea
- Use it to predict a value based on a known value (e.g., predict NDSU GPA based on ACT score)

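The prediction described above (NDSU GPA from ACT score) comes down to fitting a straight line by least squares. The sketch below assumes the ACT and NDSU GPA columns of the sample data are typed in by hand; `fit_line` and `predict_gpa` are hypothetical helper names, and the ACT score of 25 is just an example input.

```python
# Assumption: ACT (column D) and NDSU GPA (column F) from the sample data.
act = [16, 18, 24, 20, 21, 29, 32, 30, 26, 27,
       34, 26, 21, 19, 27, 26, 19, 18, 25, 27]
ndsu_gpa = [2.8, 2.75, 1.6, 3.2, 2.5, 3.9, 3.8, 3.75, 4.0, 3.2,
            3.9, 2.1, 2.2, 3.4, 3.8, 4.0, 3.2, 3.4, 3.8, 3.95]


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(act, ndsu_gpa)


def predict_gpa(act_score):
    """Predicted NDSU GPA for a given ACT score, using the fitted line."""
    return intercept + slope * act_score


print(f"NDSU GPA = {intercept:.4f} + {slope:.4f} * ACT")
print(f"Predicted NDSU GPA for ACT 25: {predict_gpa(25):.2f}")
```

The fitted slope and intercept should agree with the Coefficients column of Excel's regression output for the same two columns.
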
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.474757758
R Square           0.225394929
Adjusted R Square  0.182361314
Standard Error     0.650379535
Observations       20

ANOVA
            df  SS           MS           F            Significance F
Regression  1   2.215491279  2.215491279  5.237648024  0.03441
Residual    18  7.613883721  0.42299354
Total       19  9.829375

           Coefficients  Standard Error  t Stat       P-value      Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  1.621395349   0.731679311     2.215991792  0.039817559  0.084194   3.15859    0.08419      3.15859
ACT        0.067674419   0.029570344     2.288590838  0.034413013  0.00554    0.12979    0.00554      0.1297

Okay, your turn!
- See if you can generate this regression graph (there are only one or two new things here)

Two-group t-test
- Two completely different groups of people
- Both groups are sampled from a normal population
- Variances of the two populations are approximately equal
- Not appropriate for categorical variables, for more than 2 groups, or for repeated measures within the same group

Use the data analysis tool

t-Test: Two-Sample Assuming Equal Variances

                              Variable 1    Variable 2
Mean                          3.05          3.328
Variance                      0.638888889   0.470951111
Observations                  10            10
Pooled Variance               0.55492
Hypothesized Mean Difference  0
df                            18
t Stat                        -0.834477458
P(T<=t) one-tail              0.207475097
t Critical one-tail           1.734063607
P(T<=t) two-tail              0.414950194
t Critical two-tail           2.10092204

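The pooled variance, degrees of freedom, and t statistic in the output above follow directly from the two group means and variances. As a hedged sketch, assuming Variable 1 is the male HSGPA values and Variable 2 the female HSGPA values from the sample data, the calculation looks like this (`pooled_t` is an illustrative helper; the critical value is copied from the Excel output rather than computed):

```python
from math import sqrt
from statistics import mean, variance

# Assumption: HSGPA for students 1-10 (males) and 11-20 (females).
male_hsgpa = [2.0, 2.1, 3.2, 2.25, 2.3, 4.0, 3.75, 3.5, 3.5, 3.9]
female_hsgpa = [4.0, 2.5, 2.13, 3.8, 3.9, 4.0, 2.75, 3.1, 3.25, 3.85]


def pooled_t(group1, group2):
    """Equal-variance two-sample t statistic, pooled variance, and df."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t_stat = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, pooled_var, df


t_stat, pooled_var, df = pooled_t(male_hsgpa, female_hsgpa)

# Two-tailed critical value for df = 18 at alpha = 0.05, taken from the Excel output.
T_CRITICAL_TWO_TAIL = 2.10092204
significant = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(f"pooled variance = {pooled_var:.5f}, df = {df}, t = {t_stat:.4f}")
print("difference is significant" if significant else "no significant difference")
```
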
Interpretation
- No evidence to suggest that HSGPA was different for males versus females
- A 1-tailed test is more powerful, but you must have a reason for choosing a 1-tailed (directional) test
- In our example, we lack sufficient power (very small number of people in each group, and the HSGPA difference is likely to be very small)

T-Test
- Note: =T.TEST(C2:C11,C12:C21,2,2) is equivalent, but only returns the p-value (the third argument, 2, requests a two-tailed test; the fourth, 2, assumes equal variances)

Getting Fancy
- Pivot tables are a handy way to analyze more complex data sets
- Under Insert, select PivotTable
- Extremely flexible way to explore and analyze your data

Let's say you were interested in the relationship between number of visits to the tutoring center and ACT and NDSU GPA. You can build this in a few clicks:

Row Labels   Average of ACT  Average of NDSU GPA  Count of Sex
0            22.57           2.68                 7
1-3          25.50           3.35                 4
4-8          24.33           3.69                 6
10-15        26.33           3.65                 3
Grand Total  24.25           3.2625               20

PivotTable field areas:
- Field list: shows your variables (your columns)
- Filters: show results based on a variable (e.g., males only)
- Columns: creates columns for results (e.g., males and females)
- Rows: creates rows for results (e.g., freshman, sophomore, junior, senior)
- Values: the variables you want displayed

- Select the variables you want to work with (NDSU GPA, Visits, and ACT)
- Excel guesses what you want to do with these variables (fields). Incorrectly, in this case:

Sum of Visits to tutoring center  Sum of NDSU GPA  Sum of ACT
79                                65.25            485

- Click and drag Visits to the Rows section. This is a little better:

Row Labels   Sum of NDSU GPA  Sum of ACT
0            18.75            158
1            3.2              20
2            2.5              21
3            7.7              61
4            3.8              27
5            10.95            75
6            3.4              18
8            4                26
10           3.2              27
12           3.8              25
15           3.95             27
Grand Total  65.25            485

- Sum is not particularly meaningful for these variables
- Click on the arrow next to ACT and NDSU GPA and select Value Field Settings
- Select Average

Row Labels   Average of NDSU GPA  Average of ACT
0            2.678571429          22.57142857
1            3.2                  20
2            2.5                  21
3            3.85                 30.5
4            3.8                  27
5            3.65                 25
6            3.4                  18
8            4                    26
10           3.2                  27
12           3.8                  25
15           3.95                 27
Grand Total  3.2625               24.25

- Now we can see the average ACT score and NDSU GPA by number of visits
- However, with such a small data set, we can't make much of this pattern, so we want to combine the visit counts into groups that contain more students
- Drag Student (or Sex) to the Values section, and change the Value Field Setting to Count

Row Labels   Average of NDSU GPA  Average of ACT  Count of Student
0            2.678571429          22.57142857     7
1            3.2                  20              1
2            2.5                  21              1
3            3.85                 30.5            2
4            3.8                  27              1
5            3.65                 25              3
6            3.4                  18              1
8            4                    26              1
10           3.2                  27              1
12           3.8                  25              1
15           3.95                 27              1
Grand Total  3.2625               24.25           20

- Let's try to get them into approximately equal-size groups: leave group 0 as is, then combine 1-4, 5-6, and 8-15
- Select 1-4 (click and drag), then select Group Selection under the Group tools (Analyze tab)
- Then you can rename the groups
- Use the - button to collapse across the groups
- You can also use Excel formatting to make it look nice

Now you can easily see:
- Students with 0 visits had the lowest NDSU GPA and ACT scores
- Students with 8-15 visits had the highest NDSU GPA but also had the highest ACT

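The grouping steps above amount to assigning each student to a visit band and averaging within the band. This is a sketch under the assumption that the Visits, ACT, and NDSU GPA columns of the sample data are typed in by hand; `visit_band` and `pivot` are illustrative names, and the band boundaries are the ones chosen in the slides.

```python
from collections import defaultdict
from statistics import mean

# Assumption: (visits, ACT, NDSU GPA) for students 1-20 of the sample data.
students = [
    (0, 16, 2.8), (0, 18, 2.75), (0, 24, 1.6), (1, 20, 3.2), (2, 21, 2.5),
    (3, 29, 3.9), (3, 32, 3.8), (5, 30, 3.75), (8, 26, 4.0), (10, 27, 3.2),
    (0, 34, 3.9), (0, 26, 2.1), (0, 21, 2.2), (0, 19, 3.4), (4, 27, 3.8),
    (5, 26, 4.0), (5, 19, 3.2), (6, 18, 3.4), (12, 25, 3.8), (15, 27, 3.95),
]


def visit_band(visits):
    """Map a visit count to one of the four bands used in the slides."""
    if visits == 0:
        return "0 visits"
    if visits <= 4:
        return "1-4 visits"
    if visits <= 6:
        return "5-6 visits"
    # The sample has no student with exactly 7 visits; anything above 6 lands here.
    return "8-15 visits"


groups = defaultdict(list)
for visits, act, gpa in students:
    groups[visit_band(visits)].append((act, gpa))

pivot = {
    band: {
        "avg_gpa": mean(gpa for _, gpa in rows),
        "avg_act": mean(act for act, _ in rows),
        "count": len(rows),
    }
    for band, rows in groups.items()
}

for band, row in pivot.items():
    print(f"{band:12} GPA {row['avg_gpa']:.2f}  ACT {row['avg_act']:.2f}  n={row['count']}")
```

With only 20 students, each band holds 4 to 7 cases, so the averages are descriptive only.
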
Sure, you say, I can do that without using pivot tables. But what if you wanted to see the breakdown by males and females? With PivotTables, you can just drag Sex to Columns, and with little effort you can see this:

             Average NDSU GPA  Average ACT    Number of Students
Row Labels   F      M          F      M       F    M
0 visits     2.90   2.38       25.00  19.33   4    3
1-4 visits   3.80   3.35       27.00  25.50   1    4
5-6 visits   3.53   3.75       21.00  30.00   3    1
8-15 visits  3.88   3.60       26.00  26.50   2    2
Grand Total  3.38   3.15       24.20  24.30   10   10

Easily turn your PivotTable into a chart
- Select PivotChart under the Analyze tab

[PivotChart: visit groups (0 visits, 1-4 visits, 5-6 visits, 8-15 visits) on the horizontal axis; series Average NDSU GPA - F, Average NDSU GPA - M, Average ACT - F, Average ACT - M, Number of Students - F, Number of Students - M; vertical axis from 0.00 to 35.00]

- If you make changes to your PivotTable, the PivotChart changes automatically! Pretty handy, eh?

Use practice data file
See if you can answer these questions:
1. Were more males or females retained to S12?
2. Is there a relationship between HSGPA and F11GPA (NDSU)? Is it similar for males and females?
3. A high school student has a HSGPA of 2.75. What would you predict her F11GPA to be?
4. Were students who used more services (ACE, STUDENTEMPLOYEE, WELLNESS) more or less likely to be retained?
5. Make your own question!

Remaining Workshops This Fall
- Analyzing Qualitative Data for Assessment: October 15, 2:00 in Hidatsa
- Will provide an overview of coding, creating and interpreting themes, and the different qualitative traditions

Going Forward
- Would you be interested in a session on using SPSS for more advanced analyses?