Assignment #1: Spreadsheets and Basic Data Visualization Sample Solution Part 1: Spreadsheet Data Analysis Problem 1. Football data: Find the average difference between game predictions and actual outcomes, across all games. The HomeScore and AwayScore columns are the actual scores, while the Prediction column is an estimate of how much the home team will win or lose by, i.e., it s an estimate of HomeScore minus AwayScore. For example, if the Home team beats the Away team by 15 points and the Prediction is 10, then the difference between Prediction and outcome is 5; if the Home team loses to the Away team by 8 points and the Prediction is 2, then the difference between Prediction and outcome is 6. In arithmetic terms, Outcome = (HomeScore AwayScore), and Difference = (Prediction Outcome). Give a single positive or negative number for the average difference, then explain how you manipulated the spreadsheet to arrive at the number. 0.96875 Explanation: Let column H contain the Difference using formula =(G2 (D2 F2)), expanded appropriately through all the other rows. The answer is given by =AVERAGE(H2:H). Note the minus sign in the answer. Problem 2. Football data: Which weeks have the smallest and largest point spreads on average across all games played in that week across all three years? The point spread is the difference between the higher and lower actual scores, i.e., HomeScore AwayScore, always a positive number. (Hint: You might want to use the =abs() function.) Give the smallest and largest point spread weeks, along with the average point spread across all games played in that week in all years, then explain how you manipulated the spreadsheet to arrive at the results. Week 5 has the smallest average point spread with value 9.325 Week 17 has the largest average point spread with value 13.93333 Explanation: Let column H contain the Absolute Difference using formula =ABS(D2 F2), expanded appropriately through all the other rows. Then create a pivot table (referencing the whole table, including column H) where the Rows = Week, Values = Column H AVERAGE. Then, sort by AVERAGE of column H. Problem 3. Football data: Find the teams that scored the least and the most total points. Make sure to include both home and away games when computing total points. Give the two teams and their total points, then explain how you manipulated the spreadsheet to arrive at the answer. Team with least points: Cleveland, 378 points Team with most points: Minnesota, 1352 points
Explanation: In column H use formula =SUMIF(C$2:C,C2,D$2:D) to obtain the sum of HomeScores grouped by matching the Home column with itself. In column I use formula =SUMIF(E$2:E,C2,F$2:F) to obtain the sum of AwayScores grouped by matching the Home column with Away column. In column J use formula =(H2+I2). Sort by column J to find the Home team with the highest and lowest values in column J. Problem 4. Schoolkids data: What is the overall ranking of Grades, Sports, Looks, and Money in perceived popularity impact across all of the students? Note that every row represents one student s ranking from 1 (most important) to 4 (least important) on how important the four factors (Grades, Sports, Looks and Money) are in impacting popularity. State the four factors in order of overall importance (the factor with the highest importance should be first), and explain how you manipulated the spreadsheet to arrive at the answer. In descending order of importance: Sports, Looks, Grades, Money Explanation: Let =AVERAGE(G2:G), =AVERAGE(H2:H), =AVERAGE(I2:I), =AVERAGE(J2:J) be the average rating of Grades, Sports, Looks, and Money. Simply order the four columns by the average rating from lowest to highest. Note that AVERAGE may be replaced with SUM to use the sum of ratings in this problem, since the population size is the same for all Grades, Sports, Looks and Money. Problem 5. Schoolkids data: Which area type puts the least emphasis overall on Looks: Rural, Suburban, or Urban? Give your answer and explain how you manipulated the spreadsheet to arrive at your answer. Suburban Explanation: Create a pivot table where Rows = Type, Values = Looks AVERAGE. Pick the type with the highest average. Note that you may not use SUM for this problem, since the population size differs for each type. For example, say there are 10 students in Urban schools, all of whom gave Looks ratings of 1, and there s only 1 student in Suburban schools, who gave Looks a rating of 4. Using SUM would give Urban a total rating of 10 (less important) and Suburban a total rating of 4 (more important), while obviously it s the other way around. Problem 6. Schoolkids data: Find the school with the highest girl to boy ratio. Provide the school name and the ratio, and explain how you manipulated the spreadsheet to arrive at your answer. Elm, with ratio 3.2 Explanation: Create a pivot table where Rows = School, Columns = Gender, Values = Gender COUNTA. In the pivot table sheet, add a column D=(C2/B2) and manually pick the school with the highest girl to boy ratio. Problem 7. Schoolkids data: Considering only boys who are older than 10, what is their most common goal? Give the goal and the number of boys older than 10 who have that goal, then explain how you manipulated the spreadsheet to arrive at the answer.
Grades, with 60 students. Explanation: Use the filter view to filter Gender=Boy and Age>10. Copy the resulting table to a new sheet, add Sports, Grades, Popular to three rows in Column K, then in Column L use formula =COUNTIF(F$2:F, K2). Alternatively, in column K use formula =UNIQUE(F2:F) expanded to other rows, and in column L use formula =COUNTIFS(A$2:A, "boy", C$2:C, ">10", F$2:F, L2). Part 2: Data Visualization Problem 8. Football bar graph: For Arizona, Atlanta, Baltimore, Buffalo, and Carolina, create a bar graph showing their average score as a home team and average score as an away team. Problem 9. Schoolkids bar graph: Create a stacked bar graph that shows, for each of the three school Types (Rural, Suburban, Urban), the total number of students divided by how many have the three goals (Sports, Popular, Grades). Your graph should have three bars, with each bar divided into three colors.
Problem 10. Schoolkids pie chart: Create a pie chart showing the relative percentage of students attending schools that are Rural, Suburban, and Urban. Problem 11. Schoolkids pie chart: Create a pie chart that shows the relative percentage of students who rank as most important Grades, Sports, Looks, and Money.
Problem 12. Football pie chart: Every game is either a home win (HomeScore > AwayScore), an away win (HomeScore < AwayScore), or a tie (HomeScore = AwayScore). Create a pie chart that shows the relative percentage of home wins, away wins, and ties across all games played. (Hint: you may find the =if() function helpful.) Problem 13. Football scatterplot: Create a scatterplot where the x axis is the Prediction and the y axis is the actual outcome (HomeScore minus AwayScore). There should be one dot in the scatterplot for every game played. What do you conclude from the scatterplot? Explanation: Predictions are somewhat better than random, since we can see that very generally as the outcome goes up so does the prediction. But there s a wide range, i.e., there are still a lot of bad predictions!