Transforming Bivariate Data

Similar documents
Area Formulas TEACHER NOTES MATH NSPIRED. Math Objectives. Vocabulary. About the Lesson. TI-Nspire Navigator System

The River of Life TEACHER NOTES SCIENCE NSPIRED. Science Objectives. Math Objectives. Materials Needed. Vocabulary.

Application of Function Composition

Triangle Trigonometry and Circles

Graphic Designing with Transformed Functions

Transformations: Rotations

Introduction to the TI-Nspire CX

TEACHER NOTES MATH NSPIRED

The Circumference Function

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Graphic Designing with Transformed Functions

Linear Regression. use waist

Building Concepts: Dividing a Fraction by a Whole Number

Calculator Notes for the TI-Nspire and TI-Nspire CAS

Straightening Data in a Scatterplot Selecting a Good Re-Expression Model

AP Statistics. Chapter 4 Review

Lesson Using Lines to Make Predictions

Fairfield Public Schools

Homework 8 Solutions

The KaleidaGraph Guide to Curve Fitting

Grade level: secondary Subject: mathematics Time required: 45 to 90 minutes

Chapter 23. Inferences for Regression

Exponential Growth and Modeling

TImath.com Algebra 1. Absolutely!

Scientific Graphing in Excel 2010

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Simple linear regression

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Session 7 Bivariate Data and Analysis

Curve Fitting, Loglog Plots, and Semilog Plots 1

Data analysis and regression in Stata

FREE FALL. Introduction. Reference Young and Freedman, University Physics, 12 th Edition: Chapter 2, section 2.5

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

MTH 140 Statistics Videos

Spreadsheet software for linear regression analysis

Scatter Plot, Correlation, and Regression on the TI-83/84

Problem of the Month Through the Grapevine

Correlation and Regression

Section 3 Part 1. Relationships between two numerical variables

Prentice Hall Mathematics: Algebra Correlated to: Utah Core Curriculum for Math, Intermediate Algebra (Secondary)

Regression III: Advanced Methods

Title ID Number Sequence and Duration Age Level Essential Question Learning Objectives. Lead In

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Addition by Division TEACHER NOTES SCIENCE NSPIRED

Rotational Motion and Torque

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Polynomials and Polynomial Functions

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Exercise 1.12 (Pg )

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)

Getting Correct Results from PROC REG

Pre-Algebra Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

TIphysics.com. Physics. Bell Ringer: Mechanical Advantage of a Single Fixed Pulley ID: 13507

Module 3: Correlation and Covariance

Describing Relationships between Two Variables

Teacher Management. Estimated Time for Completion 2 class periods

Review of Fundamental Mathematics

AP Physics 1 and 2 Lab Investigations

Concepts Sectors, arcs, circumference, areas of circles and properties of cones.

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Pennsylvania System of School Assessment

Chapter 7: Simple linear regression Learning Objectives

What Does the Normal Distribution Sound Like?

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Relationships Between Two Variables: Scatterplots and Correlation

SPSS Explore procedure

Scatter Plots with Error Bars

Tutorial for the TI-89 Titanium Calculator

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

High School Algebra Reasoning with Equations and Inequalities Solve systems of equations.

Tennessee Department of Education

Math 132. Population Growth: the World

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Module 5: Multiple Regression Analysis

Simple Predictive Analytics Curtis Seare

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

To learn the proper method for conducting and analyzing a laboratory experiment. To determine the value of pi.

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Using Microsoft Excel to Plot and Analyze Kinetic Data

(Least Squares Investigation)

Quickstart for Desktop Version

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Clovis Community College Core Competencies Assessment Area II: Mathematics Algebra

Summary of important mathematical operations and formulas (from first tutorial):

McDougal Littell California:

Algebra II End of Course Exam Answer Key Segment I. Scientific Calculator Only

Module 1, Lesson 3 Temperature vs. resistance characteristics of a thermistor. Teacher. 45 minutes

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Describing, Exploring, and Comparing Data

Overview. Essential Questions. Precalculus, Quarter 4, Unit 4.5 Build Arithmetic and Geometric Sequences and Series

Linear Models in STATA and ANOVA

Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure

Activity 5. Two Hot, Two Cold. Introduction. Equipment Required. Collecting the Data

Transcription:

Math Objectives Students will recognize that bivariate data can be transformed to reduce the curvature in the graph of a relationship between two variables. Students will use scatterplots, residual plots, and correlation coefficients of different transformations of bivariate data to determine which transformation is more effective at eliminating the curvature. Students will make predictions about the response variable using a transformed data model. Students will construct viable arguments (CCSS Mathematical Practices). Students will model with mathematics (CCSS Mathematical Practices). Vocabulary bivariate data power function correlation coefficient quadratic function exponential function residual plot least-squares regression line log transformation log-log transformation About the Lesson scatterplot square root transformation This lesson involves square root, semi-log, and log-log transformations of curved bivariate data using given data sets As a result, students will: Observe scatterplots, residual plots, and correlation coefficients of bivariate data. Transform data using square root, semi-log, and log-log transformations. Determine which transformation is more effective at reducing the curvature in the graph of two variables. Use the least-squares regression line based on the transformed data to make predictions. TI-Nspire Navigator System Transfer a File. Use Quick Poll to assess students understanding. TI-Nspire Technology Skills: Download a TI-Nspire document Open a document Move between pages Grab and drag a point Tech Tips: Make sure the font size on your TI-Nspire handhelds is set to Medium. You can hide the function entry line by pressing / G. Lesson Files: Student Activity Transforming_Bivariate_Data_St udent.pdf Transforming_Bivariate_Data_St udent.doc TI-Nspire document Transforming_Bivariate_Data.tn s Visit www.mathnspired.com for lesson updates and tech tip videos. 2011 Texas Instruments Incorporated 1 education.ti.com

Discussion Points and Possible Answers Analyzing the relationship between two variables, usually an explanatory variable and a response variable, is an important statistical task, and statisticians have developed tools to do this task. The theory and assumptions necessary to use these tools, however, are based on the use of linear relationships between the variables. Unfortunately many relationships are not linear. This activity explores techniques for transforming non-linear relationships into linear relationships. Move to page 1.3. The data on Page 1.3 are the type, the circumference in centimeters, and the mass in grams of 14 approximately spherical pieces of fruit. 1. Use the bar on the right to scroll through the measurements. What do you think a scatterplot of these bivariate data will look like? Sample Answer: Looking at just the first few pairs of values, it does not appear that the scatterplot will be linear there will likely be some kind of curve to these data because the change in the number of grams of mass for a given change in the circumference does not seem to be constant. Move to page 1.4. Page 1.4 displays a scatterplot of the fruit data with the least-squares regression line and the correlation coefficient. Beneath the scatterplot is the residual plot. 2. a. Describe the shape of the distribution of fruit mass versus fruit circumference with respect to shape. Does the distribution surprise you given what you saw in the spreadsheet? Explain your thinking. Sample Answer: The distribution of fruit mass versus fruit circumference is fairly strongly curved possibly a quadratic or an exponential curve. This is not surprising as the trend in the data did not seem to follow a linear pattern. 2011 Texas Instruments Incorporated 2 education.ti.com

c. What is the correlation coefficient? What does that suggest about the linear relationship between fruit circumference and fruit mass? Sample Answer: An r-value of 0.9325 is fairly high. This might suggest that there is a linear relationship between the variables. Teacher Tip: This would be a good time to remind the class that the shapes of the scatterplot and residual plot are more important than the r-value. d. Would it be appropriate to use this least-squares regression line to make a prediction about the mass of a fruit based on its circumference? Explain? Sample Answer: Both the scatterplot and the residual plot show a strong curved pattern. So regardless of the relatively high correlation coefficient, linear regression is not appropriate here, and predictions should not be made from the least-squares regression line. Because of the curvature in the data, predictions made at either end (very small or very large circumferences) might be too small, and predictions made for circumferences in the middle of the data might be too large. When there is a predictable error in the predictions, the model is not a good fit. Teacher Tip: A log-log transformation is a transformation that takes the log of both sides of the equation. If that transformation gives a linear result, then the original relationship, "untransformed" is the independent variable raised to a constant power, like y = A x p. The arrow will allow you to explore several transformations of the fruit data. For each transformation, you will see the scatterplot and residual plot of the transformed data, the equation of the least-squares regression line, and the correlation coefficient. The goal is to find a transformation that changes numbers in ways that makes the graph more linear. Using the arrow will display square root, semi-log, and log-log transformations. 3. Click on the arrow in the upper panel to take the square root of all of the fruit masses. a. How do the scatterplot and residual plot change with respect to shape? What is the new correlation coefficient? Sample Answer: The scatterplot looks less curved than the scatterplot of the original data, but the residual plot still shows a curve. The new correlation coefficient is 0.9849. 2011 Texas Instruments Incorporated 3 education.ti.com

b. How would you write the equation for the least-squares regression line for the transformed data? (Hint: Look at the variable names on each axis to see where you need to include "sqrt".) Answer: predicted sqrt(mass) = -7.55 + 0.955 (circumference) c. Do the transformed data have a linear relationship? Explain why or why not. Sample Answer: No, even though the correlation coefficient is close to 1, the residual plot still shows a curved pattern, indicating there is not a linear relationship between fruit circumference and squareroot(fruitmass). 4. Another way to change the shape of the distribution is to take the common (base 10) or natural (base e) logarithm. Click on the right arrow again to take the common logarithm of the fruit mass values. a. How do the scatterplot and residual plot change with respect to shape? What is the new correlation coefficient? Sample Answer: The scatterplot looks less curved than the scatterplot of the original data, but the residual plot still shows a curve. The new correlation coefficient is 0.9757, larger than that of the original data, but slightly smaller than the r for the square root transformation. Teacher Tip: You might ask students what difference they would expect to see if they used the natural logarithm, base e, rather than using the common logarithm base 10 for the transformation. It might be worthwhile to have them actually graph both transformations in a new file and compare the results, noticing that the linearity is not affected, just the coefficients. Teacher Tip: When data are plotted on along one axis using a linear scale, and a logarithmic scale is used on the other axis, it is sometimes called a semi-logarithmic transformation. (If both scales are logarithmic, you have a log-log transformation). b. Did this logarithmic transformation fix the problem of a non-linear relationship between fruit circumference and fruit mass? Explain your reasoning. Sample Answer: No, even though the correlation coefficient is larger than that of the original data, the residual plot still shows a curved pattern (this time curved down), indicating there is not a linear relationship between fruit circumference and log(fruit mass). 2011 Texas Instruments Incorporated 4 education.ti.com

5. a. Click on the right arrow again, and describe the new transformation. Sample Answer: The transformation is taking the log(fruit circumference). b. How do the scatterplot and residual plot change with respect to shape? What is the new correlation coefficient? Sample Answer: Both the scatterplot and residual plot show strong curves, indicating a nonlinear relationship between log(fruit circumference) and fruit mass. The correlation coefficient is also smaller, at r = 0.8109. c. Do the transformed data have a linear relationship? Explain. Sample Answer: No, the residual plot still shows a curved pattern, and the correlation coefficient is smaller than the one for the untransformed data, indicating that there is not a linear relationship between log(fruit circumference) and fruit mass. 6. Click on the right arrow again. a. This is a log-log transformation used to check for a power relationship in the original data. Explain why the name makes sense. Sample Answer: It makes sense to call the transformation log-log because you take the log of both sides of the equation. b. How do the scatterplot and residual plot change with respect to shape? What is the new correlation coefficient? Did the log-log transformation fix the curve in the original data so that it would be possible to use this model for prediction that is, do the transformed data have a linear relationship? Sample Answer: The scatterplot looks much less curved than the scatterplot of the original data, and the residual plot has a random scatter. The new correlation coefficient is 0.9952. The log-log transformation transformed the values so that a linear model would be appropriate for prediction. c. How would you write the equation for the least-squares regression line for the transformed data? Answer: predicted log(mass) = -1.41 + 2.713 (log(circumference)) 2011 Texas Instruments Incorporated 5 education.ti.com

d. Use your equation to predict the mass of a spherical watermelon with a circumference of 44 cm and one with a circumference of 68 cm. Which prediction do you feel is more reliable, and why? Sample Answer: using predicted log(mass) = -1.41 + 2.713 (log(circumference)), predicted log(mass) = -1.41 + 2.713 (log(44)) = 3.04869 mass = 1119 grams predicted log(mass) = -1.41 + 2.713 (log(68)) = 3.5616 mass = 3644 grams The prediction for the watermelon with a circumference of 44 cm. seems more reliable because 44 cm. is within the range of circumferences in the data set. A circumference of 68 cm. is much larger than any of the values in the data set, and the regression model might not hold for fruit that large. Move to page 2.1. 7. The data set on this page shows the lengths in inches and the weights in pounds for 25 alligators. What do you think a scatterplot of these bivariate data will look like? Sample Answer: I do not think that the scatterplot of alligator data will be linear. I think there might be some kind of curvature. Move to page 2.2. Page 2.2 shows a scatterplot of the alligator weight vs. length with the least-squares regression line. The scatterplot on the lower screen is the residual plot. 8. Describe the distribution of alligator weight versus alligator length shown in the scatterplot with respect to shape. Does the distribution surprise you given what you saw in the spreadsheet? Sample Answer: The distribution of alligator weight versus length is fairly strongly curved possibly a quadratic or an exponential curve. The data values did not seem to follow a linear pattern, so this is not surprising. 2011 Texas Instruments Incorporated 6 education.ti.com

9. a. Use the arrow in the upper panel to check four different transformations of the alligator data. Which transformation made the scatterplot of the data the most linear and the residual plot the most random? Explain your reasoning. Sample Answer: Taking the logarithm of only the alligator weights produced the most linear scatterplot and random residual plot (i.e, (length, logweight). The scatterplot for the square root transformation looked linear, but its residual plot showed a strong curved pattern. Taking the logarithm of only the lengths produced a strongly curved pattern in both the scatterplot and residual plot. The log-log transformation (taking the logartithm of both lengths and weights) produced a scatterplot that looked linear, but the residual plot again showed some curvature. b. How would you write the equation for the least-squares line for the transformation that seems best at eliminating the curvature in the alligator data? Answer: predicted alligator (log(weight)) = 0.58 + 0.015 alligator length c. Predict the weight of an alligator whose length is 140 cm. Answer: Using the equation above, the weight of the alligator would be 7.98 pounds. TI-Nspire Navigator Opportunity: Quick Poll See Note 1 at the end of this lesson. Wrap Up Upon completion of the lesson, the teacher should ensure that students are able to understand: Bivariate data can be transformed in order to apply regression modeling procedures. Scatterplots and residual plots of different transformations of bivariate data can be used to determine which transformation is more effective at eliminating the curve. How to use models of transformed data to make predictions. 2011 Texas Instruments Incorporated 7 education.ti.com

Assessment Label each of the following as sometimes, always, or never. Be prepared to defend your answers. 1. The least-squares regression line can be used to make a reasonable prediction for a given explanatory variable. Answer: Sometimes true, if the relationship between the explanatory variable and either the response or the transformed response variable is approximately linear. 2. When statisticians analyze the relationship between two variables, it is important to have a linear relationship between the variables or transformed values of those variables. Answer: Always true because the statistical tools for analyzing the relationship are based on the assumption of linearity. 3. When you are transforming data to analyze the relationship, if you take the logarithm of the explanatory variable, you have to take the logarithm of the response variable. Sample Answer: False because you can take the logarithm of either (log transformations) or both variables (log-log transformations) as was done in the activity. 4. Square root transformations will reduce the curvature in bivariate data. Sample Answer: Sometimes true depending on the nature of the original relationship between the variables. TI-Nspire Navigator Note 1 Name of Feature: Quick Poll Use Quick Poll to determine how well students understood the lesson. You might ask them to send responses to each of the questions in the Assessment above as well. Data used in this activity: Concept of using fruit (name of fruit, circumference in cm, mass in grams)--thanks to Tim Erickson of http://www.eeps.com/zoo/index.html Alligator data (length in inches and weight in pounds)--thanks to Richard Scheaffer 2011 Texas Instruments Incorporated 8 education.ti.com