Assignment objectives: Regression, Pivot table

Exercise #1: Simple Linear Regression

Often the relationship between two variables, Y and X, can be adequately represented by a simple linear equation of the form

$Y = b_0 + b_1 X + e$

In regression terminology, Y is the response or dependent variable and X is the regressor or independent variable. The error term, e, indicates that the relationship is not a perfect one. To understand the relationship between Y and X, to summarize it, or to predict Y for a given X, it is necessary to estimate the coefficients (parameters) $b_0$ and $b_1$. One way to estimate the coefficients is to collect data where Y and X are observed directly and let the data indicate the appropriate values for the coefficients.

For example, suppose observations of diameter at breast height (DBH) and stump diameter (at 1 foot) inside bark (DS) were made on 243 Virginia cove-grown white oak trees (fig. 1). From inspection of the graph, a linear equation would seem appropriate for describing the relationship between DBH and DS for cove-grown white oak in Virginia. Hand sketching a line through the data and recalling some basic algebra leads to $b_0$ (intercept) = 0.0 and $b_1$ (slope) = 0.75, so that, approximately, for these data

DBH = 0.75 DS

Simple linear regression analysis can be easily completed in Excel using the Data Analysis window. The Analysis ToolPak is a Microsoft Office Excel add-in (a supplemental program that adds custom commands or custom features to Microsoft Office) that is available when you install Microsoft Office or Excel. To use it in Excel, however, you need to load it first. The procedure depends on the MS Office version. For Office 2007:

April 26, 2012 Page 1
1. Click the Microsoft Office Button, and then click Excel Options.
2. Click Add-Ins, and then in the Manage box, select Excel Add-ins. Click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
4. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

In Office 2003 or XP, the Data Analysis... option is located in the Tools menu. If that option is not available in the Excel you are running, you can add it by selecting the Add-Ins... option of the Tools menu.

The example used is from pages 27-30 of Avery and Burkhart, 5th Edition. An Excel file with the data for the example, called Regression.xls, is on Moodle under the Excel#4 folder. Download it to your computer and rename it Excel4_yourlastname.xlsx. There are 62 observations of basal area growth and crown volume. We wish to see if basal area growth can be predicted from crown volume based on a simple linear relationship. A portion of the data as it appears in an Excel worksheet appears below.

To plot "BA_GRO" versus "CRWN_VOL", select the data (including labels, cells A1:B63) and choose Chart... under the Insert menu to start the Chart Wizard. Choose the XY (Scatter) type. Select various options in the Chart Wizard steps to format the graph so it looks like:

You can fit a linear regression to the data by choosing Data Analysis... under the Tools menu (in Office 2007, Data Analysis on the Data tab) and then selecting the Regression analysis tool. You will be presented with the following dialog:
Specify the BA_GRO data and label for "Input Y Range:" (B1:B63) and the CRWN_VOL data and label for "Input X Range:" (A1:A63). Check the "Labels" box (since you included data labels in your input ranges), and provide a new worksheet name under "Output options" (use the name Results). You should obtain the following results (subset shown):

R Square is called the coefficient of determination in your text. Intercept is the $b_0$ term and CRWN_VOL is the slope or $b_1$ term. The Analysis of Variance table reports the Regression sum-of-squares and the Residual sum-of-squares. The Residual sum-of-squares is the numerator of the residual mean square, whose square root is the standard error of estimate (Standard Error in the Regression Statistics table). Note that Excel uses scientific notation by default, so when it says 3.45E-20 it means $3.45 \times 10^{-20}$ (i.e. 0.0000000000000000000345). The relationship between the variables is significant if the Significance F (the regression's p-value) is less than a preset value, usually 0.05. In this example the significance is considerably less than 0.05.

Regression through the origin. It is possible that an intercept-less equation might be sensible for the "BA_GRO" on "CRWN_VOL" regression. To check this, run a new regression with the "Constant is Zero" box checked in the Regression dialog box (Excel, in an unfortunate choice of words, calls the intercept parameter "Constant") and reanalyze the resulting residuals. If an intercept is critical, there will be a linear (non-horizontal) trend in the residuals.

To explain your results, add a textbox indicating whether or not there is a relationship between the two variables and whether the slope and intercept are needed. Once finished, save the file as Excel4_yourlastname.xlsx.
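For readers who want to see what the Regression tool is computing, the following is a minimal Python sketch of the same least-squares calculations: the intercept and slope, R Square, the standard error of estimate, and the intercept-less ("Constant is Zero") fit. It is not part of the assignment. The function names fit_line and r_squared are hypothetical, and the five data pairs are made up for illustration; they are not the Avery and Burkhart data.

```python
import math


def fit_line(x, y, through_origin=False):
    """Least-squares fit of y = b0 + b1*x, or y = b1*x when through_origin is True."""
    n = len(x)
    if through_origin:
        # Intercept forced to zero (Excel's "Constant is Zero" option).
        b0 = 0.0
        b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        df_resid = n - 1  # one estimated parameter
    else:
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        b1 = sxy / sxx
        b0 = y_bar - b1 * x_bar
        df_resid = n - 2  # two estimated parameters
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_resid = sum(r * r for r in residuals)
    # Standard error of estimate: square root of the residual mean square.
    std_error = math.sqrt(ss_resid / df_resid)
    return {"b0": b0, "b1": b1, "residuals": residuals,
            "ss_resid": ss_resid, "std_error": std_error}


def r_squared(y, ss_resid):
    """Coefficient of determination for a model that includes an intercept."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_resid / ss_total


# Made-up illustration data (NOT the CRWN_VOL / BA_GRO values from Regression.xls).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit_full = fit_line(x, y)
fit_origin = fit_line(x, y, through_origin=True)
r2 = r_squared(y, fit_full["ss_resid"])

print("with intercept:", fit_full["b0"], fit_full["b1"], fit_full["std_error"])
print("R Square:", r2)
print("through origin:", fit_origin["b1"], fit_origin["std_error"])
print("residuals (through origin):", fit_origin["residuals"])
```

Plotting or simply listing the through-origin residuals against x is the check described above: a clear upward or downward trend suggests the intercept is needed. R Square is computed here only for the model with an intercept, since its definition for an intercept-less model differs between programs.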
Exercise #2: Pivot Table

A Pivot Table is a way to present information in a report format. The idea is that you can click drop down lists and change the data that is being displayed. For example, you can choose just one student from a drop down list and view only his or her scores. Pivot tables are a lot easier to grasp when you see them in action. Here's the one we're going to create in this section:

Look at Row 4. This shows that the student is Elisa. If we click Elisa's drop down arrow, we'll see this:

Now we have another student to select (we'll only use two students for this tutorial). We could untick Elisa and tick Mary instead. Then her scores would display. The Subject and Month cells also have drop down lists. So we could view only January's scores, and just for Art and English, for example. So this is a Pivot Table: a report that we can manipulate by selecting items from drop down lists. Let's make a start.

DATA. The first thing you need for a Pivot Table is some data, which can be downloaded from the Excel #4 folder on Moodle. The Excel file used for this exercise is called PivotData.xls. Copy the values from the spreadsheet named PivotData into the Excel file that contains the regression exercise (previous exercise). The assignment should be one file.

Highlight the data that will be going into your Pivot Table (cells A1 to D37). On the Excel 2007 menu bar, click Insert. From the Insert menu, locate the Tables panel:
On the Tables panel click Pivot Table. The Create Pivot Table dialogue box appears:

In the dialogue box above, the data that we highlighted is in the Table/Range textbox. You can select different cells by clicking the icon to the right of the Table/Range textbox. You can also specify an external data source, such as a text file, for the data in your Pivot Table. Select New Worksheet as the place where the Pivot Table will be placed. Click OK.

When you click OK, Excel 2007 presents you with a rather complex layout. The area on the right should look something like the one below:
It helps to have a look again at what we're trying to create. Here's the completed Pivot Table again:

Now take a look at the Pivot Table Field List image again, the one above the completed pivot table. It has tick boxes for Month, Subject, Student, and Score. These are column headings from the original spreadsheet data. We've put the Month in cell A7 on our Pivot Table, Subject is in cell B6, Student is in cell B4, and Score is the Average scores in cells C8 to G10. You'll see how it works, though.

The idea is that you tick a box in the Pivot Table Field List, and then drag the field to one of the four areas below. Excel 2007 will take care of the rest. So, tick all four boxes in the field list:

Excel will create a basic (and messy) Pivot Table for you. But we're going to put our 4 fields into the 4 areas below. Here are the four areas that fields can be dragged to:
For the Report Filter, we want the name of a Student. For the Column Labels, we want the Subject, and for the Row Labels, we'll just have the Month. The Values will be the Average scores. If you look at the Field areas after you have ticked all four boxes, however, you may see something like this:

Month, Subject and Student have all been grouped under Row Labels. You can drag and drop these, though. So click on Student in the Row Labels box. Hold down your left mouse button, and then drag it into the Report Filter box. If you don't fancy dragging and dropping, simply click the Student item with your left button. From the menu that appears, select Move to Report Filter:

Your Field areas will then look like this:
Move Subject from Row Labels to the Column Labels area:

Your Field areas will then look like this:

The Pivot Table on your spreadsheet will look a lot different, too. It should look like this:

Our Pivot Table is coming along, but the scores are all wrong, and it needs tidying up a bit. The scores in our Pivot Table look strange because Excel 2007 is using the wrong formula. It's using a Sum total when we want it to use an Average.
The numbers have all been added up. But we want averages instead. To change the formula, click on Sum of Score under the Values field area:

You'll see the following menu:

Select Field Settings to see the following dialogue box:
Change the Formula from Sum to Average, and then click OK. Your averages won't be formatted to a fixed number of decimal places, so highlight your data. On the Home menu in Excel 2007, locate the Number panel. Format your averages so that they have no decimal places. Your Pivot Table will then look like this:

Look at cells A3, B3 and A4 above. These all have the not very descriptive names of Average of Score, Column Labels, and Row Labels. You can click inside these cells and type your own headings, in exactly the same way as you would enter text in a normal cell. In the new version of the Pivot Table below, these cells were renamed and the data were centered.

Only one thing left to do: spruce up the table by adding a bit of color. Click anywhere on your Pivot Table to highlight it. Now look at the menu bar at the top of Excel 2007. You'll notice a Design menu. Click on this to see the various design options. The Pivot Table Style Options panel is interesting. Select Banded Rows and see what happens. Now click Banded Columns. Next to this panel, there are lots of Pivot Table Styles to choose from. Select one that catches your eye.

If you have not saved your work yet, save the file as Excel4_yourlastname.xlsx.
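To make the difference between Sum and Average concrete, here is a minimal Python sketch of what the finished Pivot Table does: filter on one Student (Report Filter), group the scores by Month (Row Labels) and Subject (Column Labels), and then summarize each group. It is not part of the assignment. The function names pivot and average are hypothetical, and the records are made up; they only mimic the four-column layout (Month, Subject, Student, Score) and are not the contents of PivotData.xls.

```python
from collections import defaultdict

# Made-up records in (Month, Subject, Student, Score) order, for illustration only.
records = [
    ("January", "Art", "Elisa", 70),
    ("January", "Art", "Elisa", 80),
    ("January", "English", "Elisa", 60),
    ("February", "Art", "Elisa", 90),
    ("January", "Art", "Mary", 50),
]


def average(scores):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


def pivot(rows, student, aggregate):
    """Summarize one student's scores for each (month, subject) cell."""
    groups = defaultdict(list)
    for month, subject, name, score in rows:
        if name == student:                          # Report Filter: Student
            groups[(month, subject)].append(score)   # Row: Month, Column: Subject
    return {cell: aggregate(scores) for cell, scores in groups.items()}


sums = pivot(records, "Elisa", sum)          # what "Sum of Score" shows
averages = pivot(records, "Elisa", average)  # what "Average of Score" shows

for cell in sorted(sums):
    print(cell, "sum =", sums[cell], "average =", round(averages[cell]))
```

Wherever a student has more than one score in the same month and subject, the two summaries differ, which is why the Sum version of the table looks wrong. Calling pivot(records, "Mary", average) corresponds to picking Mary from the Student drop down list.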