Bill Burton
Albert Einstein College of Medicine
April 28, 2014
EERS: Managing the Tension Between Rigor and Resources

- Calculate counts, means, and standard deviations
- Produce frequency distributions and display the results graphically in column charts and histograms
- Produce crosstabulations of categorical data
- Produce scatterplots of numeric data
- Conduct chi-square tests
- Conduct t-tests
- Conduct a one-way analysis of variance (ANOVA)
- Conduct correlations and linear regressions

Click the Insert Function button, choose Statistical as the category, then select one of the following functions from the drop-down list and specify the cell range:
- COUNT: gives you the total number of cells in the selected range that contain numbers
- AVERAGE: gives you the mean of a range of numbers (Note: any text found in the range will be ignored)
- STDEV.P/STDEV.S: gives you the standard deviation of the numbers in the selected range (STDEV.P treats the data as an entire population; STDEV.S treats them as a sample)

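For readers who want to check these functions outside Excel, here is a minimal Python sketch of the same calculations. The scores and the variable names are made up for illustration; the stray text entry mimics a label inside the cell range, which COUNT and AVERAGE skip.

```python
import statistics

# Hypothetical cell range; "n/a" stands in for a text cell.
cells = [82, 90, "n/a", 75, 88, 95]

# Like COUNT and AVERAGE, keep only the numeric cells.
numbers = [c for c in cells if isinstance(c, (int, float))]

count = len(numbers)                          # COUNT
mean = statistics.mean(numbers)               # AVERAGE
sd_population = statistics.pstdev(numbers)    # STDEV.P (divides by n)
sd_sample = statistics.stdev(numbers)         # STDEV.S (divides by n - 1)

print(count, mean, round(sd_population, 3), round(sd_sample, 3))
```

The sample version is always a little larger than the population version because it divides by n - 1 rather than n.
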
Frequency distributions and crosstabulations can be easily created using pivot tables:
- Click a cell in your data set, then click the Insert tab, and then select Pivot Table (click OK in the dialog box)
- Drag the categorical field you would like frequencies of into both the Row Labels and Values drop boxes
- For a crosstabulation, drop a second categorical field into the Column Labels drop box
- You can choose to display the data as numeric frequencies, column/row percentages, or both

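The counting that the pivot table does can be sketched in a few lines of Python. The records below (gender and specialty choice) are invented for illustration, and the variable names are assumptions rather than anything Excel exposes.

```python
from collections import Counter

# Hypothetical records: (gender, specialty) for eight respondents.
records = [
    ("F", "Pediatrics"), ("M", "Surgery"), ("F", "Surgery"), ("F", "Pediatrics"),
    ("M", "Pediatrics"), ("M", "Surgery"), ("F", "Pediatrics"), ("M", "Surgery"),
]

# Frequency distribution of one categorical field.
frequencies = Counter(specialty for _, specialty in records)

# Crosstabulation: one count per (row label, column label) pair.
crosstab = Counter(records)

# Column percentages: each cell divided by the total of its column.
column_pct = {
    (gender, specialty): 100 * n / frequencies[specialty]
    for (gender, specialty), n in crosstab.items()
}

print(frequencies)
print(crosstab[("F", "Pediatrics")], column_pct[("F", "Pediatrics")])
```
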
After you've created a frequency distribution with a pivot table, a couple more clicks will give you a column chart that graphically displays the frequency data:
- Click anywhere in the pivot table and then click the Pivot Chart button (under the Options tab in the Tools group)
- Select the type of column chart you want from the Insert Chart dialog box and then click OK
- Make formatting changes to the chart using Pivot Chart Tools (most of these will involve the Layout tab)

To use the Data Analysis add-in in Excel, you first need to load it:
- Click File and then select Options
- Click Add-Ins, select Excel Add-ins from the list, and then click Go
- In the dialog box that appears, place a check in the box beside Analysis ToolPak, and then click OK
- A Data Analysis button is now available in the Analysis group at the far right side of the Data ribbon

Unfortunately, the Data Analysis add-in for Excel is not available for Mac users. As an alternative, you can download StatPlus:
- You get a free trial period of one month
- A single license costs $200, but discounts are available if you are a student ($90) or an academic ($130)
- Unlike the Data Analysis add-in for PC users, StatPlus has to be opened side by side with Excel

- Click the Data Analysis button under the Data tab
- Select Descriptive Statistics from the list and click OK
- In the Descriptive Statistics dialog box, specify the cell range of the field or fields you want to summarize
- Check the Labels in First Row box
- Specify where you want the results displayed in Output options, and then select Summary statistics
- A table appears on a new worksheet that displays the mean, S.E., median, mode, S.D., variance, kurtosis, skewness, range, minimum, maximum, sum, and count

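As a rough cross-check on the summary table, most of its entries can be reproduced with Python's standard library. This is only a sketch on invented data; kurtosis and skewness are left out because they need formulas the standard library does not provide.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 4]  # hypothetical data for one field

n = len(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1)
summary = {
    "mean": statistics.mean(values),
    "standard error": sd / math.sqrt(n),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "standard deviation": sd,
    "sample variance": statistics.variance(values),
    "range": max(values) - min(values),
    "minimum": min(values),
    "maximum": max(values),
    "sum": sum(values),
    "count": n,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```
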
- Open up a new pivot table and drag the field of interest into the Row Labels area
- Click the field of interest again and drag it into the Values area
- Make sure the field is set to display Count information
- Left-click on any cell in the pivot table that contains a row label, and then choose Group Selection (which is located under the Options tab)
- Set the Starting at, Ending at, and By values (this allows you to define intervals of your choosing)

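The grouping step amounts to assigning each value to an interval and counting. A small Python sketch of that idea follows, using invented scores; the names start, end, and width stand in for the Starting at, Ending at, and By boxes, and treating the end value as exclusive is an assumption made here for simplicity.

```python
from collections import Counter

scores = [52, 58, 61, 67, 70, 73, 75, 79, 84, 88, 91, 99]  # hypothetical
start, end, width = 50, 100, 10  # "Starting at", "Ending at", "By"

def interval_label(value):
    """Return the label of the interval that contains value, e.g. '70-79'."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Count how many scores fall in each interval (end is exclusive here).
histogram = Counter(interval_label(s) for s in scores if start <= s < end)

# Text version of the column chart, one row per interval.
for label in sorted(histogram):
    print(f"{label}: {'#' * histogram[label]}")
```
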
You can create a pivot chart from the pivot table you created on the previous slide:
- Click anywhere in the pivot table and then click the Pivot Chart button (under the Options tab)
- Select the type of column chart you want from the Insert Chart dialog box and then click OK
- Make formatting changes to the chart (e.g., removing the gap between columns) using Pivot Chart Tools, most of which are available under the Layout tab

- Select the range of data you want to display in the scatterplot (include the column labels)
- Click the Insert tab, select Scatter from the Charts group, and then select Scatter with only Markers (the top left choice in the gallery) as the chart type
- Click on the scatterplot that now appears on your worksheet, click the Layout tab, select Trendline in the Analysis group, and then click More Trendline Options
- Set the trendline type to Linear, then click the bottom two check boxes to display the equation and R-squared value on the chart, and then click Close

FIRST VARIABLE   SECOND VARIABLE                                   APPROPRIATE TEST
Categorical      Categorical                                       Chi-square
Numeric          Categorical-Dichotomous (independent groups)      Student's t-test
Numeric          Categorical-Dichotomous (paired/matched groups)   Paired t-test
Numeric          Categorical                                       Analysis of Variance (ANOVA)
Numeric          Numeric                                           Correlation/Linear Regression

The chi-square test assesses the independence of two categorical variables by comparing the actual counts in each cell of the contingency table with the expected counts.
- Set up two tables: one showing the observed (actual) counts; the other showing the expected counts
- For example: =CHISQ.TEST(B2:C5,B11:C14) (these are rectangular arrays of observed counts and expected counts)
- The result is the probability (i.e., p-value) of getting observed frequencies that differ from the expected frequencies by at least this much if the two variables are independent

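The expected-counts table is the part that has to be built by hand, so a short Python sketch of that arithmetic may help. It assumes the usual rule (row total times column total, divided by the grand total) and uses an invented 2x2 table. The p-value line relies on a shortcut that holds only when there is 1 degree of freedom; larger tables need the full chi-square distribution, which CHISQ.TEST handles for you.

```python
import math

def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[30, 10], [20, 20]]  # hypothetical 2x2 table of counts
expected = expected_counts(observed)
chi_square = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut that is valid only when df == 1 (a 2x2 table).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(expected, round(chi_square, 3), df, round(p_value, 4))
```
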
Enter the T.TEST function into an empty cell and provide the following pieces of information in parentheses, all separated by commas:
- the cell range associated with the data for group 1
- the cell range corresponding to the data for group 2
- whether you are doing a 1- or 2-tailed test (usually 2)
- the type of test (2 = equal variances; 3 = unequal)
For example: =T.TEST(B2:B31,C2:C31,2,2) (cell range 1, cell range 2, two-tailed test, equal variances)

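To show what the two test types change, here is a hedged Python sketch of the t statistic behind the test. T.TEST itself returns only the p-value; this sketch stops at the t statistic and degrees of freedom because the standard library has no t distribution. The function name and the data are invented for illustration.

```python
import math
import statistics

def t_statistic(group1, group2, equal_variances=True):
    """Return (t, degrees of freedom) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    if equal_variances:
        # Type 2: pool the two sample variances.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Type 3: Welch's version, with adjusted degrees of freedom.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, df

group1 = [1, 2, 3, 4, 5]  # hypothetical scores
group2 = [3, 4, 5, 6, 7]

print(t_statistic(group1, group2, equal_variances=True))
print(t_statistic(group1, group2, equal_variances=False))
```

With equal group sizes and equal variances, as in this toy data, both versions give the same answer; they diverge when the variances differ.
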
The function F.TEST compares the variances of your two samples/groups. From the Statistical Functions menu, select F.TEST and provide the following information in the parentheses, all separated by commas:
- the cell range of the group with greater variance
- the cell range of the group with lesser variance
If the resulting p-value for this comparison is > .05, specify type = 2 when conducting the t-test; if the p-value is < .05, specify type = 3.

- Commonly used when there is a natural pairing of observations in two samples, such as when a group is tested twice, both before and after an intervention
- Tests the hypothesis that the difference between pre- and post-responses measured on the same person or thing has a mean value of zero
- Follow the procedures given on the slide titled T-Test for Independent Groups, but set the test type as 1
For example: =T.TEST(B2:B31,C2:C31,2,1) (the final 1 is the test type)

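The paired test works on the per-person differences, which a short Python sketch makes concrete. The pre and post scores are invented, and as with the independent-groups case this stops at the t statistic rather than the p-value that T.TEST returns.

```python
import math
import statistics

pre = [70, 65, 80, 75, 60]   # hypothetical scores before the intervention
post = [74, 70, 83, 77, 66]  # the same five people afterwards

# One difference per person; the test asks whether their mean is zero.
differences = [after - before for before, after in zip(pre, post)]

n = len(differences)
mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(n)

t = mean_diff / se_diff
df = n - 1

print(differences, mean_diff, round(t, 3), df)
```
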
- Using the same data as in the previous slides, click the Data Analysis button on the Data ribbon, select the appropriate type of t-test, and then click OK
- Indicate the cell range for variable 1 and the cell range for variable 2, modify the alpha level (if needed), click the button labeled New Worksheet Ply, and then click OK
- The t-test results appear on a new worksheet

- Click the Data Analysis button under the Data tab
- Select ANOVA: Single Factor from the list and click OK
- In the dialog box, specify the cell range (which should include as many columns as you have groups)
- Specify that you want to group by columns, check Labels in First Row, set the alpha level (the default is 0.05), click New Worksheet Ply in Output options, and then click OK
- The output appears on a new worksheet and displays the means for all groups, along with the p-value

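For those curious about what sits behind the ANOVA output, the F ratio can be sketched in Python as the between-groups mean square divided by the within-groups mean square. The three groups below are invented, the function name is an assumption, and the p-value is left to Excel because it requires the F distribution.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df between, df within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each value around its own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical, one list per column
f_stat, df_between, df_within = one_way_anova(groups)
print(f_stat, df_between, df_within)
```
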
- The correlation coefficient, or r, signifies the strength and direction of the relationship between two numeric variables
- The correlation coefficient can range from +1 (meaning a perfect positive relationship) to -1 (meaning a perfect negative relationship)
- CORREL(array1, array2): this statistical function returns the correlation coefficient between X and Y
- The data have to be pairs of observations, e.g., the MCAT score (X) and Step 1 score (Y) for each student
- It makes no difference which variable you identify first

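A minimal Python sketch of the Pearson correlation shows why the order of the two variables does not matter: the formula is symmetric in X and Y. The scores are invented and the function name is an assumption.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical paired scores, one pair per student.
mcat = [500, 505, 510, 515, 520]
step1 = [220, 240, 250, 240, 250]

r_xy = pearson_r(mcat, step1)
r_yx = pearson_r(step1, mcat)
print(round(r_xy, 4), round(r_yx, 4))
```
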
Data Analysis Tool: Correlation allows you to calculate multiple correlations at one time.
- Click the Data Analysis button on the Data ribbon, select Correlation, and then click OK
- In the Input Range box, enter the cell range containing the data (this should include the column headers)
- In the section labeled Grouped By, select Columns
- Click the Labels in First Row check box
- Click the radio button labeled New Worksheet Ply, and then click OK
- The correlation results appear on a new worksheet

- Click the Data Analysis button under the Data tab, select Regression, and then click OK
- Specify the cell range for the Y field and the cell range for the X field (include the column labels)
- Make sure the Labels box is checked
- In the Output options, select New Worksheet Ply
- In the Residuals area, select the output you want to see
- After clicking OK, a new page opens with the results, which include r, R², SE, N, the p-value, the slope and intercept of the regression line, and predicted scores

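The slope, intercept, R², and predicted scores in that output come from ordinary least squares, which can be sketched in Python as follows. The data and the function name are invented for illustration, and the sketch omits the standard error and p-value that Excel also reports.

```python
import statistics

def simple_linear_regression(x, y):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R squared = 1 - (residual variation / total variation in y).
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_residual / ss_total

x = [1, 2, 3, 4, 5]  # hypothetical X field
y = [2, 4, 5, 4, 5]  # hypothetical Y field

slope, intercept, r_squared = simple_linear_regression(x, y)
predicted = [intercept + slope * xi for xi in x]
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```
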
