Using Excel

Jeffrey L. Rummel
Emory University
Goizueta Business School
BBA Seminar

- Excel Calculations of Descriptive Statistics
- Single Variable Graphs
- Relationships Between Multiple Variables
Data from other sources
- Many applications now produce Excel-ready files
- Can copy-and-paste from other applications
- Sometimes the arrangement of the data is fine
- When columns are not correct, use the Text to Columns command in the Data tab

Stock price data
Here is some data from AAPL from a text file that is comma delimited:

Date,Open,High,Low,Close,Volume,Adj Close
,639.59,642.06,630.00,632.64, ,
,648.87,652.79,644.00,644.61, ,
,635.37,650.30,631.00,649.79, ,
,632.35,635.13,623.85,634.76, ,
,629.56,635.38,625.30,629.71, ,
,646.50,647.20,628.10,628.10, ,
,639.74,644.98,637.00,640.91, ,
,638.65,640.49,623.55,635.85, ,
,646.88,647.56,636.11,638.17, ,
,665.20,666.00,651.28,652.59, ,
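For readers who want to see what Text to Columns is doing, here is a minimal Python sketch of the same split. It is an illustration rather than part of the Excel workflow. The first three rows are taken from the listing above, and the Date, Volume and Adj Close fields are left empty because their values are blank in this copy of the data. The helper name `split_columns` is made up for this example.

```python
import csv
import io

# First three rows of the listing; Date, Volume and Adj Close are blank in the source.
RAW = """Date,Open,High,Low,Close,Volume,Adj Close
,639.59,642.06,630.00,632.64,,
,648.87,652.79,644.00,644.61,,
,635.37,650.30,631.00,649.79,,
"""


def split_columns(text):
    """Split comma-delimited text into one dictionary per row, keyed by column name."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in record.items():
            value = (value or "").strip()
            if not value:
                row[name] = None          # blank cell
                continue
            try:
                row[name] = float(value)  # numeric cell
            except ValueError:
                row[name] = value         # keep text (for example a date) as-is
        rows.append(row)
    return rows


rows = split_columns(RAW)
closes = [row["Close"] for row in rows]
print(closes)
```

Each comma becomes a column boundary, which is the choice made in the Text to Columns dialog when "Delimited" and "Comma" are selected.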
Political action committee
You are working for a political candidate and you get the following:

From: "The Congressman"
To:
Date sent: Mon, 24 Aug 15:07:
Subject: Campaign Statistics
Priority: urgent

Here is the data that showed that the effective income tax rate had increased from 14.1% to 15.2% of adjusted gross income during the period 1974 to 1978. We can kill two birds with one stone. First, we can choose the category that looks the worst for our opponents. Second, the people in that category will know that we're talking about them, not just some average or someone a lot richer or poorer than they are. See what you can work up for me.

Political action committee
Here is the data from the

Adjusted Gross Income   1974 Income   1974 Tax      1978 Income    1978 Tax
Under $5,000            41,651,643    2,244,467     19,879,        ,318
$5,000-$9,              ,400,740      13,646,       ,853,315       8,819,461
$10,000-$14,            ,688,922      21,449,       ,858,024       17,155,758
$15,000-$99,            ,010,790      75,038,       ,037,          ,860,951
$100,000 or more        29,427,152    11,311,672    62,806,159     24,051,698
Total                   880,179,      ,690,314      1,242,434,     ,577,186
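The quantity the e-mail refers to, the effective tax rate, is tax divided by adjusted gross income. The sketch below computes it in Python for the table entries that are complete in this copy. The other rows are missing digits, so they are left out rather than guessed. The dictionary layout and the function name `effective_rate` are choices made for this illustration only.

```python
# (income, tax) pairs copied from the complete cells of the table above.
figures = {
    "Under $5,000 (1974)": (41_651_643, 2_244_467),
    "$100,000 or more (1974)": (29_427_152, 11_311_672),
    "$100,000 or more (1978)": (62_806_159, 24_051_698),
}


def effective_rate(income, tax):
    """Effective tax rate: tax paid as a fraction of adjusted gross income."""
    return tax / income


rates = {label: effective_rate(income, tax) for label, (income, tax) in figures.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")
```

In Excel the same calculation is a single cell formula dividing the tax cell by the income cell, copied down each category.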
House Price Database
- Use database collected from the web in
- The data is fictitious, so do not use it
- Collection of houses
- Size, features
- Price

Filter to explore data
- Highlight the entire database, or just one cell inside
- Click on Filter
- Each column has commands to manipulate the data
Filter column menu
- Each column has choices to sort and filter data
- Sort either way
- Filter allows check boxes for each value
- Number filters allow ranges to be selected if there are too many check boxes

Excel Calculations of Descriptive Statistics
Summary of Excel formulas
- =AVERAGE(range)
- =MEDIAN(range)
- =MODE(range)
- =STDEV(range)
- =PERCENTILE(range,percent)
- =MIN(range)
- =MAX(range)
- =STANDARDIZE(X,mean,stdev)

Tool
- Many different statistical routines
- Choose
- This tool is not available in all versions of Excel
Descriptive Statistics
- Highlight the input range
- Check the "Labels in first row" box
- Check the "Summary statistics" box
- "New Worksheet Ply" is a good default; the new sheet can be renamed here

[Figure: Descriptive Statistics output]
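The summary formulas in this section have close counterparts in Python's standard `statistics` module, which can help when checking a spreadsheet result. The sketch below is an illustration on hypothetical prices, not the Eastville data. The helpers `percentile_inc` and `standardize` are written for this example. The interpolation rule in `percentile_inc` (position k times n minus 1 in the sorted data, with k between 0 and 1) is my understanding of how =PERCENTILE behaves and should be treated as an assumption.

```python
import statistics


def percentile_inc(values, k):
    """Percentile by linear interpolation between sorted values, with k in [0, 1]."""
    ordered = sorted(values)
    position = k * (len(ordered) - 1)
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(ordered):
        return ordered[lower]
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


def standardize(x, mean, stdev):
    """z-score: how many standard deviations x lies from the mean."""
    return (x - mean) / stdev


# Hypothetical prices in thousands of dollars (not the Eastville data).
prices = [210, 250, 250, 300, 340, 410, 520]

mean = statistics.mean(prices)        # =AVERAGE(range)
median = statistics.median(prices)    # =MEDIAN(range)
mode = statistics.mode(prices)        # =MODE(range)
stdev = statistics.stdev(prices)      # =STDEV(range), the sample (n - 1) version
q1 = percentile_inc(prices, 0.25)     # =PERCENTILE(range, 0.25)
lowest, highest = min(prices), max(prices)   # =MIN(range), =MAX(range)
z_top = standardize(highest, mean, stdev)    # =STANDARDIZE(X, mean, stdev)

print(mean, median, mode, stdev, q1, lowest, highest, z_top)
```

Note that the second argument of =PERCENTILE is a fraction such as 0.25, not a whole-number percent such as 25.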
Single Variable Graphs

Graphing data with points
- Select data to be graphed (we will use Price column)
- In the Insert tab, select scatter
- With only one column, graph depends on the column sorting
- Y axis scaling issue: usually start at zero for ratio inference
- X axis can be categorical (1st, 2nd) in this kind of graph (will see (X, Y) graphs later)
Numeric plot of Eastville Houses
- What can we read from the graph?
- Median? Quartiles?
- Can we compute the mean?
- Can we compute the standard deviation?
- What happens when the data is sorted?
- Saving versions of graphs

Graphing data with columns
- Select data to be graphed (we will use Personnel column)
- In the Insert tab, select column
- Graph depends on the sort of the column
- Columns show the magnitude of each data point
Frequency Graphs: Histograms
- Instead of graphing each point, graph the count of different ranges
- Each range is called a bin
- The frequency in the range is calculated
- Excel command =FREQUENCY()
- Can also use the Histogram tool in the tool
Histogram choices
- Bin size too big: few tall bars
- Bin size too small: many short bars
- Right number of bars is about √n, the square root of the number of data points

Frequency Approach
- Requires a little work, but can be used with any version of Excel
- Consider the price column
- What is the range? Looking at the descriptive statistics, the range is $197,300 to $675,030
- There are 108 houses, but how many bars in the graph?
- One rule is to have the number of bars equal the square root of the number of data points
- Will want to round these values
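The arithmetic behind the square-root rule can be sketched in a few lines of Python, using the 108 houses and the $197,300 to $675,030 range quoted above. Rounding the bin width up to the next $10,000 and starting the first bin on a multiple of the width are assumptions made for this illustration. The slides only say that the values should be rounded.

```python
import math

n_houses = 108
lowest, highest = 197_300, 675_030

# Square-root rule: about sqrt(108), which rounds to 10 bars.
bars = round(math.sqrt(n_houses))

# Unrounded bin width, then rounded up to a tidy value.
raw_width = (highest - lowest) / bars
step = 10_000                      # assumed rounding granularity
width = math.ceil(raw_width / step) * step

# Start on a multiple of the width at or below the smallest price.
start = math.floor(lowest / width) * width

# Upper edge of each bin, continuing until the largest price is covered.
bins = []
edge = start + width
while edge < highest:
    bins.append(edge)
    edge += width
bins.append(edge)

print(bars, raw_width, width, start, bins)
```

Rounding the width and the starting point changes the bar count slightly from what the rule suggests, which is usually an acceptable trade for bin edges that are easy to read.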
Frequency Construction
- The =FREQUENCY() function takes two parameters
- The first is the range
- The second is the bin: the upper value of the range
- The command will count how many values in the range are less than or equal to the bin value
- We can create a table with the appropriate bin values

Frequency Graph
- We will graph both columns: the cumulative total and the frequency of each range
- Select columns B and C and insert a column chart
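The counting rule described above, together with the cumulative column used in the graph, can be sketched in Python as follows. This is an illustration of the logic, not Excel's implementation. The five prices and three bin values are hypothetical, and the extra final count for values above the largest bin reflects my understanding of how =FREQUENCY() reports overflow.

```python
from bisect import bisect_left
from itertools import accumulate


def frequency(data, bins):
    """Count values per bin, where each bin value is the upper limit of its range.

    A value v is counted in the first bin whose limit is >= v. The result has one
    more entry than bins: the last entry counts values above the largest bin.
    bins must be sorted in ascending order.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical prices and bin limits, for illustration only.
prices = [197_300, 240_000, 250_000, 310_000, 675_030]
bins = [200_000, 250_000, 300_000]

counts = frequency(prices, bins)        # frequency of each range
cumulative = list(accumulate(counts))   # running (cumulative) total

print(counts)
print(cumulative)
```

A price exactly equal to a bin value, such as 250,000 here, is counted in that bin rather than the next one, which matches the "less than or equal to" rule.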
Frequency Construction
- Right-click the graph and edit each series to change the names
- Also edit the horizontal axis labels to use the bins in column A
- Click on one of the cumulative graph bars to highlight that series
- Right-click to change the series type to a line graph instead of a bar graph
- Then right-click and select Format Data Series, and plot the cumulative data on a secondary axis

Reading a Histogram
- What range occurs the most frequently?
- Can you find the median? The quartiles?
- Can you determine the mean?
- Is there skewness in the data?
Default Histogram
Along with descriptive statistics, there is an option in the tool to create a histogram:

Histogram choices
- We can also control the way the histogram is constructed
- There are often more logical start/stop points for bins
- There are also more logical bin sizes than those computed by the tool
- Create range and fill in the bin range box in the Histogram dialog box
Histogram summary
- Using the FREQUENCY() formulas requires some preparation but results in a graph that can be adjusted more easily
- If the data changes, the charts created this way will also change
- The Histogram Tool automates some of the work required to create histograms, and is therefore useful when starting to look at a data set
- But the graphs, once created, are separate from the actual data and will not change
- Adjusting the bins to create a good graph can sometimes be tedious

Relationships Between Multiple Variables
Showing Relationships with Scatterplots
- Looking for relationships between data
- Graph (X, Y) pairs to test relationship
- Highlighting two columns sometimes works
- Highlight the Y column and create a scatterplot
- Edit the data to add the X variable

Scatterplots and hypothesis creation
- Looking for relationships between data
- What relationship could be imagined
- Looking for the variability of the relationship
- Also looking for points that violate the general relationship
- Graph will not check the relationship: that is done with more analysis
Explore subsets of the data
- Often we have data with multiple measures per observation
- For example, we have lots of information about the houses
- We will be able to use this in model building, but sometimes a useful first step is to compare descriptive statistics across subsets of the data
Explore subsets of the data
- One way to do this would be to use the filter commands
- Isolate the data of interest
- Copy just these rows and use the descriptive statistics tools
- An easier way to accomplish this is to use the Pivot Table tool

Pivot table creation
- Notice there are some natural subgroups in the data
- Is there a difference between school districts?
- Houses with fireplaces? Basements? Heat?
Pivot table creation
- First, click somewhere in the data range for all the houses, or highlight the entire database
- On the Insert menu, click on the pivot table button
- This creates a special page that is used for constructing the table

Pivot table starting point
Here is what the new tab looks like to start:
Pivot table components
- There are two major sections for controlling the pivot table
- In the sheet itself, there are four areas that indicate you can drop fields there (page, column, row and data)
- On the right side, there is the pivot table field list at the top
- At the bottom are four boxes
- These correspond to the four drop areas, although they are labeled differently

Pivot table components
- What we will do is select different fields for use in the report
- If we drop a field in the data area, or in the lower right box, the pivot table will compute a statistic with that field
- This is not very exciting, but if we click and drag the other fields to the Column and Row labels, the power of the pivot table becomes clear
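To make the row, column and data areas concrete, here is a small Python sketch of what a pivot table computes: the records are grouped by the row field and the column field, and a statistic is calculated for the data field within each group. The five houses, the field names and the function `pivot` are all hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (not the house database used in the seminar).
houses = [
    {"district": "A", "fireplace": "yes", "price": 300_000},
    {"district": "A", "fireplace": "no", "price": 200_000},
    {"district": "A", "fireplace": "yes", "price": 340_000},
    {"district": "B", "fireplace": "no", "price": 260_000},
    {"district": "B", "fireplace": "yes", "price": 400_000},
]


def pivot(records, row_field, column_field, value_field, statistic=mean):
    """Group records by (row, column) and apply a statistic to the value field."""
    cells = defaultdict(list)
    for record in records:
        key = (record[row_field], record[column_field])
        cells[key].append(record[value_field])
    return {key: statistic(values) for key, values in cells.items()}


average_price = pivot(houses, "district", "fireplace", "price")
house_count = pivot(houses, "district", "fireplace", "price", statistic=len)

for key in sorted(average_price):
    print(key, average_price[key], house_count[key])
```

Swapping `statistic` between `mean` and `len` corresponds to changing the calculation for a value field, for example from Average to Count.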
More complex tables
- We can add more fields to the columns or rows, and the table will continue to drill down
- If there is more than one field for the column or row, the order in which the drilling down takes place can be modified by reordering the fields (click and drag them)
- We can also calculate more than one value for each subset by dragging more data down
- Clicking on a value field will bring up a dialog box that allows for formatting the number, but also changing the statistical calculation

More complex tables
- Notice that there is a button called Sum Values, and it can be placed in the row or column box
- This allows the multiple statistics to be shown stacked vertically or horizontally
- The final control for the table is the Report Filter
- This will allow us to filter the entire table to show a subset of the data
- This is done with a selection drop-down box on the spreadsheet that works just like filtering a column in the database
What is a model?
- Often knowing the general relationship is not enough
- We need a more precise specification of the relationship between the data
What is a model?
- The model is written as
  $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m$
- If we are dealing with a data set, then there are some number of observations, which we label as i, so that the prediction for the i-th row of the data set would be
  $\hat{Y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_m X_{mi}$

Eastville Case example
- If we believe that size and price are related, we can easily create a scatterplot
- If you want to see the regression line for the scatterplot, right-click the data points and select Add Trendline
- In the dialog box, use the Linear option, and check the box to display the equation on the chart
- Here is the resulting graph
[Figure: Eastville Case example]

software
- In addition to graphing data and calculating a regression line by hand, there is a tool in that will compute a regression equation
- This is important when there are more independent variables and we cannot draw a graph
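As a rough illustration of what calculating a regression line by hand involves, the Python sketch below fits a line with one independent variable by ordinary least squares, which is the same kind of fit a linear trendline shows. The sizes and prices are hypothetical and were constructed to lie exactly on a line so that the result is easy to check. The function name `fit_line` is made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on price = 50,000 + 100 * size.
sizes = [1000, 1500, 2000, 2500]
prices = [150_000, 200_000, 250_000, 300_000]

b0, b1 = fit_line(sizes, prices)
predicted = b0 + b1 * 1800   # prediction for a hypothetical 1,800 square foot house
print(b0, b1, predicted)
```

With real data the points scatter around the line, and the same two formulas give the intercept and slope that minimise the sum of squared vertical distances.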
software
- Under Tools select and then
- The dialog box looks like this once it is filled in

software
- The Y range is for the dependent variable
- This needs to be just a single column of numbers
- The X range is for the independent variable(s)
- If there is more than one variable, you can highlight a rectangular block of adjacent columns (but you cannot highlight noncontiguous columns)
- There are labels above the columns, so that box is checked
- Often useful to check Residuals, Standardized Residuals and Residual Plots
output
If you click OK, then the following report is created by the tool (I have done some formatting adjustments)

output
tool output is divided into four major sections:
- The regression statistics
- The ANOVA (which is short for analysis of variance) section
- The regression equation and statistics
- The residuals
More than one independent variable
- We found a relationship between price and square feet
- What about the other variables?
- Instead of highlighting one column, we can highlight all the columns left of price in the regression tool

output
Here is the new regression report
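For readers curious about what a regression with several independent variables computes, the sketch below solves the least-squares normal equations in plain Python. It illustrates the idea under stated assumptions and is not a description of how Excel performs the calculation. The data are hypothetical (size in square feet and number of bedrooms) and were built to satisfy price = 20,000 + 100 * size + 15,000 * bedrooms exactly. The function names `solve` and `fit` are made up for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bm] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]     # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical houses: (size in square feet, bedrooms) and price.
rows = [(1000, 2), (1500, 3), (2000, 3), (2500, 4), (1800, 2)]
ys = [150_000, 215_000, 265_000, 330_000, 230_000]

b0, b1, b2 = fit(rows, ys)
print(b0, b1, b2)
```

Each coefficient is read as the change in predicted price for a one-unit change in that variable while the other variables are held fixed.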
