the Median-Median Line

Students use movie sales data to estimate and draw lines of best fit, bridging technology and mathematical understanding.

David C. Wilson

Graphing bivariate data in a scatter plot and drawing an approximate line of best fit for the data have become commonly recommended activities (NCTM 2000) for middle school and high school students. In some states they are even a standard (e.g., New York State Education Department 2005).

The graphing calculator lets students both approximate a best-fit line (e.g., using the Transform application on the TI-84) and calculate one using a built-in option (e.g., LinReg or Med-Med on the TI-84). Computer software such as Fathom (2007) offers similar options for exploring data. Frequently, these explorations focus on the slope as a rate of change, the meaning of the y-intercept, or interpolating and extrapolating within the given context. These goals all broaden student understanding, yet the initial task of drawing the best-fit line remains a bit of a mystery to most students.

Consider the two graphs in figure 1. They represent two possible student responses when asked to draw a best-fit line for a scatter plot of data relating the weight and length of a rubber band. What criteria did each student use to draw his or her line?

Students commonly hold two misperceptions when drawing the best-fit line:
- the line should hit as many points as possible;
- the line should divide the data set so that equal numbers of points lie above and below it.

It is difficult to counter these beliefs with mathematical reasoning when the process behind finding a line of best fit seems beyond the students' current knowledge.
To develop insight into this process, we will use the median-median line. It uses basic principles of coordinate geometry and linear equations to calculate the equation of a line of best fit. The process gives students a way to think about the appropriate criteria for drawing best-fit lines. It also demystifies what happens inside their calculator or computer when best-fit lines are generated. In addition, students will be able to compare and contrast the median-median line and the least-squares regression line and decide which may be more appropriate for a given set of data.

Mathematics Teacher, Vol. 104, No. 4, November 2010. Copyright 2010 The National Council of Teachers of Mathematics, Inc. www.nctm.org. All rights reserved. This material may not be copied or distributed electronically or in any other format without written permission from NCTM.


Fig. 1 (a, b). Which student's best-fit line for this data set is more appropriate?

THE BEST-FIT LINE FOR THREE DATA POINTS

Determining a best-fit line for a data set consisting of two points is trivial. Adding a third point complicates the task and calls for some analysis.

Suppose that we begin by drawing a line through the outermost points, as shown in figure 2, with points A(2, 5), B(6, 4), and C(8, 2). If we take this line as our initial attempt at a best-fit line, the task becomes one of accounting for point B in some way. One possibility is to shift the line up a bit toward point B. The question is, How far up do we slide the line so that it takes point B into account in a reasonable way? For example, if we shift the line up 1 unit, it passes through B, but that does not seem to be a reasonable best-fit line.

Fig. 2. Using the left and right data points to form a line of best fit is a good starting point.

To resolve this dilemma, we need the concept of a residual. The residual (R) of a data point is the signed vertical distance between the data point and the line of best fit. It is calculated by subtracting the y-coordinate of the point on the line from the y-coordinate of the data point:

R = y − yʹ

Here yʹ is the y-coordinate of the point on the line directly above or below the data point. This way of measuring the distance from a point to a line differs from the usual Euclidean (perpendicular) distance, and it is easier to calculate.

Figure 3 shows the residual for point B. Residuals calculated this way are positive for points above the line and negative for points below it. The line through A and C passes through (6, 3), directly below B, so subtracting the y-coordinates yields R = 4 − 3 = 1. The value of the residual lets us think quantitatively about how far up the line should be shifted to account for point B.
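The residual calculation for point B can be checked with a short Python sketch. The helper names `line_through` and `residual` are illustrative and do not come from the article. `Fraction` keeps the arithmetic exact, which assumes integer coordinates like those of A, B, and C.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q.

    Assumes integer coordinates and x1 != x2, so the slope is an exact Fraction.
    """
    (x1, y1), (x2, y2) = p, q
    m = Fraction(y2 - y1, x2 - x1)
    b = y1 - m * x1
    return m, b


def residual(point, m, b):
    """Signed vertical residual R = y - y', where y' = m*x + b on the line."""
    x, y = point
    return y - (m * x + b)


A, B, C = (2, 5), (6, 4), (8, 2)
m, b = line_through(A, C)
print(m, b)               # -1/2 6
print(residual(B, m, b))  # 1 (B lies 1 unit above the line)
```

A and C themselves have residual 0 with respect to this line, since it passes through both.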
(Take a moment to ponder how far the line should be shifted before reading on.) Consider using one-third of the residual as the shift amount. Figure 4 shows the best-fit line for these three points under this approach.

To see why 1/3 is the desired amount, look at the residuals of the three points. Point B now has a residual of 2/3, while points A and C each have a residual of −1/3. The residuals sum to zero, which prompts a new name for the line: the zero-residual line.

Finding the equation of this line is readily accessible to most eighth- and ninth-grade students, which makes it very attractive to use. First, students find the equation of the line through the outermost points A and C, in this case y = −(1/2)x + 6. Because this line is shifted by 1/3 of the residual toward B, its slope does not change; only the y-intercept changes. That is, adding 1/3 of the residual of B to the y-intercept of the line through A and C gives the zero-residual line. In this case, since 6 + 1/3 = 19/3, the result is y = −(1/2)x + 19/3.

Fig. 3. The vertical red segment is the residual for data point B.
Fig. 4. Shifting the line in figure 3 up by 1/3 produces a line whose residuals sum to zero.

This line is the median-median line for the three points. It can also be displayed on the TI-84:
1. Enter the coordinates of the points into lists.
2. Select Med-Med under the STAT CALC menu to generate the line of best fit.

Figure 5 shows screen shots of this process. The last screen shot (fig. 5c) displays the residuals of points A, B, and C. The calculator generates these automatically each time it finds the Med-Med line. To view them, type RESID as the name of a list and press ENTER.

Fig. 5 (a-c). Using three points helps students understand the process.

THE MEDIAN-MEDIAN LINE

At this point, two questions may need to be addressed. First, why is this best-fit line called the median-median line? Second, what do we do when the data set has more than three points? The answer to the second question yields the answer to the first.

Consider the data in table 1, provided by the National Association of Theater Owners (http://www.natoonline.org/statistics.htm). It gives the average cost of admission to a movie (in U.S. dollars) and the total annual movie attendance in the United States and Canada from 2002 through 2009.

Table 1. Movie Attendance Data, 2002-2009

Year    Cost (U.S. dollars)    Attendance (billions)
2002    5.80                   1.570
2003    6.03                   1.521
2004    6.21                   1.484
2005    6.41                   1.376
2006    6.55                   1.401
2007    6.88                   1.400
2008    7.18                   1.341
2009    7.50                   1.414

Finding the median-median line begins by reducing the data set to three summary points, which makes it possible to use the three-point process described above.
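The summary points are fed into the three-point construction from the previous section, so it may help to have that step in executable form. The following Python sketch uses a hypothetical function name, `zero_residual_line`, and assumes integer coordinates so that `Fraction` arithmetic stays exact. It keeps the slope of the line through the outer points and shifts the intercept by one-third of the middle point's residual. Applied to A, B, and C, it reproduces y = −(1/2)x + 19/3 and residuals that sum to zero.

```python
from fractions import Fraction


def zero_residual_line(left, middle, right):
    """Return (slope, intercept) of the zero-residual line for three points.

    Points are (x, y) pairs ordered by x. The slope comes from the outer
    points; the intercept is shifted by 1/3 of the middle point's residual.
    """
    (x1, y1), (x2, y2), (x3, y3) = left, middle, right
    m = Fraction(y3 - y1) / (x3 - x1)  # slope through the outer points
    b_outer = y1 - m * x1               # intercept of that line
    r_mid = y2 - (m * x2 + b_outer)     # residual of the middle point
    return m, b_outer + r_mid / 3       # same slope, shifted intercept


A, B, C = (2, 5), (6, 4), (8, 2)
m, b = zero_residual_line(A, B, C)
print(m, b)  # -1/2 19/3

residuals = [y - (m * x + b) for x, y in (A, B, C)]
print(*residuals)      # -1/3 2/3 -1/3
print(sum(residuals))  # 0
```

The shifted intercept can also be computed as the average of the three intercepts y − m·x taken at each point. The two outer points share the intercept of the original line, so the average equals that intercept plus one-third of the middle residual.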
The first task is to order the data by x-value (as we do when determining the median of a set of data) and then split the data into three groups. If the number of points is not divisible by 3, the data are split so that the two outer groups contain equal numbers of points. For the movie data, this yields two outer groups of three points and a middle group of two points. Table 2 displays the three groups. Students can use the table or the graph (draw three vertical lines) to identify the groups.

Table 2. Grouped Movie Data

Group     Cost (U.S. dollars)    Attendance (billions)
Left      5.80                   1.570
Left      6.03                   1.521
Left      6.21                   1.484
Middle    6.41                   1.376
Middle    6.55                   1.401
Right     6.88                   1.400
Right     7.18                   1.341
Right     7.50                   1.414

The summary point for each group is found by calculating the median x-value and the median y-value separately, hence the name median-median. The summary point may or may not be part of the original data set. The summary points for the movie data are (6.03, 1.521), (6.48, 1.389), and (7.18, 1.400).

The third summary point shows that, even in a group of three points, the median x-value and median y-value are not always the coordinates of the middle point. The middle point of the right group is (7.18, 1.341), but the median attendance in that group is 1.400.

Once the summary points have been determined, the line of best fit follows from the three-point process described previously:
1. The line through the outer summary points is y = −0.105x + 2.155.
2. The residual of the middle summary point is −0.086.
3. Adding 1/3 of this residual to the y-intercept gives y = −0.105x + 2.126, the median-median line for the movie data.

Again, the TI-84 can display this line (see fig. 6). Differences in the values reflect rounding decisions made during the calculations. Note that the residuals the calculator displays belong to the individual points in the data set, not to the summary points.

Fig. 6. The median-median line for the movie data can be superimposed on a scatter plot of the data.
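The whole procedure (sort by x, split into three groups, take separate medians, then apply the three-point step) can be sketched in Python using only the standard library. The function names are hypothetical. The article specifies only that the outer groups must be equal in size. The sketch assumes a k, k + 1, k split when n = 3k + 1 and a k + 1, k, k + 1 split when n = 3k + 2, which matches the 3-2-3 split of the movie data. Because it carries full precision instead of rounding at each step, its intercept is about 2.127 rather than 2.126, the kind of rounding difference noted above.

```python
from statistics import median


def split_into_thirds(points):
    """Sort by x and split into three groups with equal-sized outer groups.

    Assumed convention: n = 3k -> (k, k, k); n = 3k + 1 -> (k, k + 1, k);
    n = 3k + 2 -> (k + 1, k, k + 1).
    """
    pts = sorted(points)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    k, rem = divmod(n, 3)
    outer = k + 1 if rem == 2 else k
    return pts[:outer], pts[outer:n - outer], pts[n - outer:]


def summary_point(group):
    """Median x-value and median y-value, taken separately."""
    return median(x for x, _ in group), median(y for _, y in group)


def median_median_line(points):
    """Return (slope, intercept) of the median-median line."""
    left, middle, right = (summary_point(g) for g in split_into_thirds(points))
    (x1, y1), (x2, y2), (x3, y3) = left, middle, right
    m = (y3 - y1) / (x3 - x1)        # slope through the outer summary points
    b_outer = y1 - m * x1
    r_mid = y2 - (m * x2 + b_outer)  # residual of the middle summary point
    return m, b_outer + r_mid / 3


# Table 1: (cost in U.S. dollars, attendance in billions), 2002-2009
movies = [(5.80, 1.570), (6.03, 1.521), (6.21, 1.484), (6.41, 1.376),
          (6.55, 1.401), (6.88, 1.400), (7.18, 1.341), (7.50, 1.414)]

for group in split_into_thirds(movies):
    print(summary_point(group))
m, b = median_median_line(movies)
print(f"y = {m:.3f}x + {b:.3f}")  # y = -0.105x + 2.127
```

Because `summary_point` takes the two medians independently, the middle summary point (6.48, 1.3885) is not one of the original data points. This illustrates the article's remark that a summary point may or may not belong to the data set.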
Finding the median-median line uses procedures in which students may be expected to be competent, but it can still be challenging. Real data seldom have integer values, and the multistep process may initially appear daunting. Students should work through several three-point examples to become familiar with the process and understand the zero-residual line before working with larger data sets. They should also understand what they are doing when they take 1/3 of the residual of the middle point and how that leads to the median-median line; otherwise, the procedure will become rote.

CONNECTIONS TO RELATED TOPICS

The median-median line connects in many ways to traditional content in eighth- and ninth-grade mathematics curricula. Finding it involves slope, intercept, parallel lines, vertical shifts of graphs, and median values, as well as fundamental algebraic skills. Perhaps most significantly, students gain a way of estimating where to draw the best-fit line for a given set of data, and they better understand what the calculator or software is doing when they use those tools.

The task of finding a best-fit line also lets students give meaning to the slope and y-intercept in the context of the data. The movie cost and attendance data are particularly well suited to this. It is also helpful for students to step outside the mathematical analysis and consider what the values reveal about the data. Interpreting the slope should lead students to statements such as "an increase of one dollar in the cost of a ticket is associated with a decrease in attendance of approximately 0.105 billion people."
Put into more meaningful terms (another important task!), a one-dollar increase in ticket cost is associated with a decrease in attendance of 105 million people. Equivalently, a 10-cent increase is associated with a decrease of 10.5 million people. Similarly, interpreting the y-intercept should prompt discussion of attendance if the cost were zero, how meaningful that value is, and the need for caution when extrapolating from data.

The movie data also lead students to conclude, reasonably but incorrectly, that ticket cost causes changes in attendance. It is worth discussing what was happening in home entertainment and in the economy during the years under examination. To students, it may seem counterintuitive that the data do not provide evidence of causality.

To appreciate the distinction between association and causality more fully, students can explore other data sets, such as the number of fire trucks responding to a fire vs. the cost of the fire damage (in dollars). Another engaging Fathom activity develops reasoning about association versus causality using shoe size and reading ability (Center for Technology and Teacher Education 2001).

After discussing the median-median line, students might naturally ask how the least-squares regression line (LinReg on the TI-84) differs. A benefit of working with the median-median line is that students now have the knowledge needed to understand how the least-squares process produces a different line. Two differences stand out:
- Least squares starts from the mean rather than the median. The mean x-value and mean y-value of the data set are calculated, and the least-squares line always passes through that point.
- Least squares is not constructed by forcing the residuals to sum to zero. Instead, each residual is squared (giving nonnegative values), and the line is chosen so that the sum of these squares is as small as possible, hence the name least squares. Its residuals do in fact sum to zero, but as a consequence of the minimization rather than as the construction criterion.
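For comparison, here is a minimal Python sketch of ordinary least squares, the kind of fit LinReg reports. It uses the standard closed-form formulas (slope = Sxy/Sxx, intercept = ȳ − slope·x̄). These formulas make visible the point made above: the line passes through the mean point (x̄, ȳ). The function names are hypothetical, and this is an illustration rather than the calculator's or Fathom's actual code.

```python
def least_squares_line(points):
    """Ordinary least-squares line y = m*x + b for a list of (x, y) pairs.

    Closed form: m = Sxy / Sxx and b = ybar - m * xbar, so the fitted line
    always passes through the mean point (xbar, ybar).
    """
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x-values are equal; slope is undefined")
    m = sxy / sxx
    return m, ybar - m * xbar


def sum_of_squared_residuals(points, m, b):
    """The quantity that least squares minimizes."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


# Table 1: (cost in U.S. dollars, attendance in billions), 2002-2009
movies = [(5.80, 1.570), (6.03, 1.521), (6.21, 1.484), (6.41, 1.376),
          (6.55, 1.401), (6.88, 1.400), (7.18, 1.341), (7.50, 1.414)]

m, b = least_squares_line(movies)
print(f"least squares: y = {m:.4f}x + {b:.4f}")
print(f"sum of squared residuals: {sum_of_squared_residuals(movies, m, b):.5f}")
```

On the movie data, this gives a slope of roughly −0.10 billion people per dollar, close to the median-median slope. Any other line, including the median-median line, produces a sum of squared residuals at least as large.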
Figure 7 shows the movie data graphed in Fathom with the least-squares regression line, its equation, and the sum of the squares of the residuals.

Fig. 7. Fathom can display the data and the least-squares line, as well as images of the squares of the residuals.

The different processes behind the two lines of best fit can also spur discussion of when one might be more appropriate than the other. As in selecting an appropriate measure of center, outliers play a determining role. Because the median-median line is built from median-based summary points, it is more resistant to extreme points than the least-squares line is.

CONCLUSIONS

Technology can serve as a tool to generate discussion of, and interest in, underlying procedures. The mathematics behind the median-median line is within the reach of most high school students and involves tasks they may be expected to perform. This activity gives all students a richer understanding of the processes behind finding lines of best fit, along with tools for thinking about where best-fit lines should be drawn.

REFERENCES

Center for Technology and Teacher Education. The Correlation between Shoe Size and Reading Level. 2001. www.teacherlink.org/content/math/activities/ft-shoe/guide.html.

Key Curriculum Press. Fathom (version 2.1). Berkeley, CA: Key Curriculum Press, 2007.

National Council of Teachers of Mathematics (NCTM). Principles and Standards for School Mathematics. Reston, VA: NCTM, 2000.

New York State Education Department (NYSED). New York State Mathematics Core Curriculum. NYSED, 2005. www.emsc.nysed.gov/ciai/mst/mathcorepage.html.

DAVID C. WILSON, wilsondc@buffalostate.edu, a former high school teacher, is an associate professor of mathematics education at Buffalo State, SUNY. His work focuses on creating learning environments that foster inquiry and build connections and understanding.