Straightening Data in a Scatterplot: Selecting a Good Re-Expression Model




What Is All This Stuff?

Here's what is included:

Page 3: Graphs of the three main patterns of data points that the student is likely to encounter in scatterplots, along with suggestions on which re-expression models to try.
Page 4: Graphs of residual plots that the student is likely to encounter, along with suggestions on which re-expression models to try.
Pages 5 to 10: Scatterplots developed using various model types, their associated regression lines, re-expressions used to make the models linear, and some ideas on when to use the re-expressions.

Explanation

In an effort to assist the student in selecting a model to use in straightening data in a scatterplot, I developed the series of models described on pages 5 to 10. Using the patterns in the scatterplots and residual plots generated by these models, I created the summaries on pages 3 and 4.

It is not possible to identify and chart every pattern that may arise in a scatterplot. However, I am hopeful that reviewing the information in this package will help the student identify patterns and quickly determine which re-expressions make the most sense when data are not nearly linear.

If the student runs into a pattern different from those included in this packet, they should try to identify a pattern in the summaries that has similar characteristics. Sometimes it may be necessary for the student to draw on their knowledge of Algebra and functions to select an appropriate model. In any case, after re-expressing data, remember to check the residual plot for randomness and lack of a pattern to determine if the model you selected is a good one.

Why Is It Difficult to Pick a Model?

Rarely will it be obvious that a single model is much better than every other model. There will often be more than one model that makes a decent choice in straightening data. Why is this? The graphs on the next page may help.

AP Statistics Page 1 of 10 October 4, 2013

Figure 1, below, summarizes a number of models that work with scatterplots where the data are simultaneously rising and flattening. Notice that all of these models have the same general pattern. And there are many more models that have similar patterns.

[Figure 1] [Figure 2]

Figure 2, above, summarizes a number of models that work with scatterplots where the data are simultaneously rising and concave up. Notice that all of these models have the same general pattern. Again, there are many more models that have similar patterns.

Finding the right model is difficult because there are so many possibilities. Additionally, the best model is not always the model with the best fit (i.e., the highest $R^2$). Selecting the right model often involves understanding the nature of the data, including what causes it to have its general shape. In the hard sciences (e.g., chemistry, physics, biology, astronomy), fitting data with the proper model depends on the underlying scientific principles involved. In the social sciences (i.e., sciences relating to society and human behavior), however, it is often more of a challenge to select the right model for a given set of data; and, even if you find a good model, it may need to change over time.

The Logs: A Safe Haven

It is noteworthy that three models involving logarithms (i.e., the logarithmic model, the power model and the exponential model) are very common. Notice in the summary on page 3 that at least one of these models fits each of the patterns shown. The student should become proficient at using these models and knowing when each should be used. When in doubt, try the Power model; it is very, uh, Power-ful.

Best wishes!
Earl

Re-Expression Models to Consider Based on the Pattern of Points in a Scatterplot

[Figure: three scatterplot panels, each labeled "Original" and paired with a "Models to consider" list. Models listed: Logarithmic, Exponential, Power, Square Root, Quadratic (for counts), and Reciprocal.]

Summary of When to Use Specific Models

Logarithmic Model: $\hat{y} = a + b \log x$
Use when values flatten out on the right.

Exponential Model: $\log \hat{y} = a + bx$
Use when values increase or decrease at a constant percentage rate.

Power Model: $\log \hat{y} = a + b \log x$
Similar to the Square Root and Quadratic Models, but allows powers other than ½ or 2. A very nice feature is that the regression determines the appropriate power (i.e., the value of b) to be used in the model.

Square Root Model: $\hat{y}^2 = a + bx$
For values that look like a square root function.

Quadratic Model: $\sqrt{\hat{y}} = a + bx$
Start here when counts are involved.

Reciprocal Model: $1/\hat{y} = a + bx$
Use when values are ratios, like miles per hour. Alternatively, invert the ratio and try a linear model.
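The six re-expressions in the summary can also be tried side by side. The following is a minimal Python sketch and not part of the original packet: the names `r_squared`, `RE_EXPRESSIONS`, and `compare_models`, the sample data, and the choice of base-10 logarithms are all illustrative assumptions. It re-expresses one data set each way and reports $R^2$ for a straight-line fit to the re-expressed values.

```python
import math

def r_squared(us, vs):
    """R^2 of the least-squares line through the points (u, v)."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return suv ** 2 / (suu * svv)

# Each entry maps (x, y) to the re-expressed pair that gets a straight-line fit.
# Assumption: every x and y is positive, so logs, roots and reciprocals exist.
RE_EXPRESSIONS = {
    "Logarithmic": lambda x, y: (math.log10(x), y),
    "Exponential": lambda x, y: (x, math.log10(y)),
    "Power":       lambda x, y: (math.log10(x), math.log10(y)),
    "Square Root": lambda x, y: (x, y ** 2),
    "Quadratic":   lambda x, y: (x, math.sqrt(y)),
    "Reciprocal":  lambda x, y: (x, 1 / y),
}

def compare_models(xs, ys):
    """Return {model name: R^2 of the line fitted to the re-expressed data}."""
    results = {}
    for name, transform in RE_EXPRESSIONS.items():
        pairs = [transform(x, y) for x, y in zip(xs, ys)]
        results[name] = r_squared([p[0] for p in pairs], [p[1] for p in pairs])
    return results

# Hypothetical data, generated from a power function for illustration only.
xs = list(range(1, 16))
ys = [2.63 * x ** 1.5 for x in xs]

results = compare_models(xs, ys)
best = max(results, key=results.get)
for name, r2 in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name:12s} R^2 = {r2:.4f}")
```

Because the illustrative data come from a power function, the Power row should come out at essentially 1. With real data, treat the ranking only as a starting point: $R^2$ is not the whole story, and the residual plot of the re-expressed data still needs to be checked for randomness.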

Re-Expression Models to Consider Based on a Graph of the Residuals

[Figure: two rows of three residual-plot panels, each paired with a "Models to consider" list. The panels in the second row are titled Bottom Tilts Left, Bottom Centered, and Bottom Tilts Right.]

Models listed for the first row of panels:
Logarithmic: $\hat{y} = a + b \log x$
Square Root: $\hat{y}^2 = a + bx$
Power: $\log \hat{y} = a + b \log x$

Models listed for the second row of panels:
Power: $\log \hat{y} = a + b \log x$
Quadratic: $\sqrt{\hat{y}} = a + bx$
Exponential: $\log \hat{y} = a + bx$
Reciprocal: $1/\hat{y} = a + bx$

Straightening Data in a Scatterplot
Description of Individual Models

The following pages provide the background material used to create the summaries on pages 3 and 4. It is not necessary to study the detail for each of these models, but some familiarity with situations in which they should be used (green boxes) is recommended. I attempted to choose models that the student may encounter on the AP Exam. Other models are certainly possible, so this list should not be considered exhaustive.

Two models are shown on each page. Here is an explanation of what is shown for each model.

Top Line: Name of the model and the function I used to generate the data points. The specific function I used is not of major importance to the student; rather, the pattern of the points in the scatterplot and in the residual plot should be noted.

Top Two Graphs: These are the scatterplot and residual plot generated by the function chosen. These are very important and represent the kind of patterns the student should look for when choosing a re-expression model. Note that, for consistency, I used x-values from 1 to 15 in every model. The regression equation associated with the scatterplot is also shown. Note that the residual plots shown in this packet are based on the definition of residual plots in your textbook. A point is plotted for each point in the original scatterplot. The abscissa (i.e., horizontal coordinate) of each point is the predicted value, $\hat{y}$, associated with the corresponding point in the original scatterplot, and the ordinate (i.e., vertical coordinate) of each point is the residual, $y - \hat{y}$, associated with that point. So, you can think of each ordered pair in the residual plot as having the coordinates $(\hat{y},\ y - \hat{y})$.

Action Line: A description of the action to be taken by the student to produce a data re-expression that will attempt to straighten the data.

Bottom Graph: A scatterplot of the re-expressed data. Note that the re-expressed data for each model in this packet all lie on straight lines; this results directly from the manner in which I created the original scatterplots. In your work, you will want the re-expression to produce 1) a scatterplot with points that are close to a straight line, and 2) a residual plot with randomly distributed points that have no apparent pattern. I show the general description of the re-expression model being used and the regression equation for the re-expressed data to the right of the graph. Note that this regression equation is equivalent to the sample function shown in the top line; this also results directly from the manner in which I created the original scatterplots.

Green Box: Comments on when the model should be used and anything else I found particularly interesting in developing this packet.
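The residual-plot definition above, with predicted value on the horizontal axis and residual on the vertical axis, can be sketched in a few lines of Python. This is an illustration rather than the method used to build the packet; the helper names `fit_line` and `residual_plot_points` and the curved sample data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residual_plot_points(xs, ys):
    """One (predicted value, residual) pair per original data point."""
    a, b = fit_line(xs, ys)
    points = []
    for x, y in zip(xs, ys):
        y_hat = a + b * x                 # abscissa: predicted value
        points.append((y_hat, y - y_hat)) # ordinate: residual
    return points

# Hypothetical curved data: x-values 1 to 15, y = x squared.
xs = list(range(1, 16))
ys = [x ** 2 for x in xs]

points = residual_plot_points(xs, ys)
for y_hat, residual in points:
    print(f"predicted {y_hat:8.2f}   residual {residual:8.2f}")
```

For this curved sample the residuals are positive at both ends and negative in the middle, which is the kind of clear pattern that signals a re-expression is worth trying.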

Logarithmic Model, sample function: $y = 3.1 \log(x) + 1.37$

Original: $\hat{y} = 2.10 + 0.22x$
Action: Change x-axis values from x to log(x).
Re-expression model: $\hat{y} = a + b \log x$
Regression equation: $\hat{y} = 1.37 + 3.10 \log x$
Green Box: Use when values flatten out on the right.

Logarithmic Model, sample function: $y = -2.3 \log(x) + 4.62$

Original: $\hat{y} = 4.07 - 0.16x$
Action: Change x-axis values from x to log(x).
Re-expression model: $\hat{y} = a + b \log x$
Regression equation: $\hat{y} = 4.62 - 2.30 \log x$
Green Box: Use when values flatten out on the right.
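As a check on the two logarithmic examples, the sketch below regenerates the data for x from 1 to 15 and fits a line before and after re-expressing x as log(x). It is an illustration only. It assumes base-10 logarithms and a negative slope in the second sample function; under that reading the fitted coefficients agree with the equations printed for these models. The helper `fit_line` is a hypothetical name.

```python
import math

def fit_line(us, vs):
    """Least-squares intercept a and slope b for the line v-hat = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = suv / suu
    return mean_v - b * mean_u, b

xs = list(range(1, 16))
log_xs = [math.log10(x) for x in xs]

# (slope, intercept) pairs for y = slope * log10(x) + intercept.
# Assumption: base-10 logs, and a negative slope in the second function.
samples = [(3.1, 1.37), (-2.3, 4.62)]

fits = []
for slope, intercept in samples:
    ys = [slope * lx + intercept for lx in log_xs]
    original = fit_line(xs, ys)          # line fitted to (x, y)
    re_expressed = fit_line(log_xs, ys)  # line fitted to (log x, y)
    fits.append((original, re_expressed))
    print(f"original:     y-hat = {original[0]:.2f} {original[1]:+.2f} x")
    print(f"re-expressed: y-hat = {re_expressed[0]:.2f} {re_expressed[1]:+.2f} log x")
```

Rounded to two decimals, the original fits come out as 2.10 + 0.22x and 4.07 - 0.16x, and the re-expressed fits recover the sample functions themselves.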

Exponential Model, sample function: $y = 0.8\,(1.3)^x$

Original: $\hat{y} = -8.11 + 2.46x$
Action: Change y-axis values from y to log(y).
Re-expression model: $\log \hat{y} = a + bx$
Regression equation: $\log \hat{y} = -0.1 + 0.11x$
Green Box: Use when values increase or decrease at a constant percentage rate.

Exponential Model, sample function: $y = 4.9\,(0.7)^x$

Original: $\hat{y} = 2.30 - 0.19x$
Action: Change y-axis values from y to log(y).
Re-expression model: $\log \hat{y} = a + bx$
Regression equation: $\log \hat{y} = 0.69 - 0.15x$
Green Box: Use when values increase or decrease at a constant percentage rate.
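The "constant percentage rate" idea can be made concrete by undoing the logarithm after the fit. The sketch below is illustrative only; it assumes base-10 logarithms, uses the first exponential sample function, and the names `fit_line`, `initial_value`, `growth_factor`, and `percent_rate` are hypothetical.

```python
import math

def fit_line(us, vs):
    """Least-squares intercept a and slope b for the line v-hat = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = suv / suu
    return mean_v - b * mean_u, b

xs = list(range(1, 16))
ys = [0.8 * 1.3 ** x for x in xs]

# Re-express y as log10(y), then fit log(y-hat) = a + b*x.
a, b = fit_line(xs, [math.log10(y) for y in ys])

# Undo the log: y-hat = (10**a) * (10**b)**x.
initial_value = 10 ** a
growth_factor = 10 ** b
percent_rate = (growth_factor - 1) * 100

print(f"log y-hat = {a:.2f} {b:+.2f} x")
print(f"y-hat = {initial_value:.2f} * {growth_factor:.2f}^x")
print(f"change per unit of x: {percent_rate:.1f}%")
```

The slope b is the logarithm of the growth factor, so a positive b means constant percentage growth and a negative b means constant percentage decay.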

Power Model, sample function: $y = 2.63\,x^{1.5}$

Original: $\hat{y} = -20.72 + 10.88x$
Action: Change x-axis values from x to log(x) AND y-axis values from y to log(y).
Re-expression model: $\log \hat{y} = a + b \log x$
Regression equation: $\log \hat{y} = 0.42 + 1.50 \log x$
Green Box: Similar to the Square Root and Quadratic Models, but allows powers other than ½ or 2. A very nice feature is that the regression determines the appropriate power (i.e., value of b) to be used in the model.

Power Model, sample function: $y = 2.63\,x^{0.2}$

Original: $\hat{y} = 2.89 + 0.12x$
Action: Change x-axis values from x to log(x) AND y-axis values from y to log(y).
Re-expression model: $\log \hat{y} = a + b \log x$
Regression equation: $\log \hat{y} = 0.42 + 0.2 \log x$
Green Box: Similar to the Square Root and Quadratic Models, but allows powers other than ½ or 2. A very nice feature is that the regression determines the appropriate power (i.e., value of b) to be used in the model.
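The remark that the regression determines the power can be illustrated with a short sketch. It is an illustration under assumptions, not the packet's own procedure: logarithms are taken base 10, and `fit_line` and `fit_power` are hypothetical helper names.

```python
import math

def fit_line(us, vs):
    """Least-squares intercept a and slope b for the line v-hat = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = suv / suu
    return mean_v - b * mean_u, b

def fit_power(xs, ys):
    """Fit log(y-hat) = a + b*log(x); return (coefficient, power) of y = c * x**p."""
    a, b = fit_line([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    return 10 ** a, b  # the slope b is the power; 10**a is the coefficient

xs = list(range(1, 16))

coef_steep, power_steep = fit_power(xs, [2.63 * x ** 1.5 for x in xs])
coef_flat, power_flat = fit_power(xs, [2.63 * x ** 0.2 for x in xs])

print(f"y-hat = {coef_steep:.2f} * x^{power_steep:.2f}")
print(f"y-hat = {coef_flat:.2f} * x^{power_flat:.2f}")
```

The same two-log re-expression handles both a power above 1 (concave up) and a power between 0 and 1 (rising and flattening), which is why the Power model covers so many shapes.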

Square Root Model, sample function: $y = \sqrt{1.35x + 2.91}$

Original: $\hat{y} = 2.08 + 0.19x$
Action: Change y-axis values from y to $y^2$.
Re-expression model: $\hat{y}^2 = a + bx$
Regression equation: $\hat{y}^2 = 2.91 + 1.35x$
Green Box: For values that look like a square root function.

Quadratic Model, sample function: $y = (1.62x + 3.44)^2$

Original: $\hat{y} = -107.14 + 53.14x$
Action: Change y-axis values from y to $\sqrt{y}$.
Re-expression model: $\sqrt{\hat{y}} = a + bx$
Regression equation: $\sqrt{\hat{y}} = 3.44 + 1.62x$
Green Box: Start here when counts are involved.
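The Square Root and Quadratic re-expressions are inverses of each other: one squares y, the other takes its square root. The sketch below is an illustration with a hypothetical helper `fit_line`; it regenerates both sample data sets for x from 1 to 15 and fits a line to the re-expressed values.

```python
import math

def fit_line(us, vs):
    """Least-squares intercept a and slope b for the line v-hat = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = suv / suu
    return mean_v - b * mean_u, b

xs = list(range(1, 16))

# Square Root model: data follow y = sqrt(1.35x + 2.91); re-express y as y**2.
ys_root = [math.sqrt(1.35 * x + 2.91) for x in xs]
root_a, root_b = fit_line(xs, [y ** 2 for y in ys_root])

# Quadratic model: data follow y = (1.62x + 3.44)**2; re-express y as sqrt(y).
ys_quad = [(1.62 * x + 3.44) ** 2 for x in xs]
quad_a0, quad_b0 = fit_line(xs, ys_quad)                       # original fit
quad_a, quad_b = fit_line(xs, [math.sqrt(y) for y in ys_quad]) # re-expressed fit

print(f"square root: y-hat^2 = {root_a:.2f} {root_b:+.2f} x")
print(f"quadratic, original: y-hat = {quad_a0:.2f} {quad_b0:+.2f} x")
print(f"quadratic: sqrt(y-hat) = {quad_a:.2f} {quad_b:+.2f} x")
```

For the quadratic data, the line fitted to the original values rounds to -107.14 + 53.14x, a negative intercept even though every y-value is positive. That is one more sign that a straight line is the wrong shape for the original data.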

Reciprocal Model, sample function: $y = \dfrac{50}{3x + 1.2}$

Original: $\hat{y} = 7.31 - 0.52x$
Action: Change y-axis values from y to $1/y$.
Re-expression model: $1/\hat{y} = a + bx$
Regression equation: $1/\hat{y} = 0.024 + 0.06x$
Green Box: Use when values are ratios, like miles per hour. Alternatively, invert the ratio and try a linear model.

Negative Reciprocal Model, sample function: $y = \dfrac{50}{3x + 1.2}$

Original: $\hat{y} = 7.31 - 0.52x$
Action: Change y-axis values from y to $-1/y$.
Re-expression model: $-1/\hat{y} = a + bx$
Regression equation: $-1/\hat{y} = -0.024 - 0.06x$
Green Box: Like the Reciprocal Model, but preserves the direction of the original curve.
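The difference between the two reciprocal re-expressions is only the direction of the straightened data. A minimal sketch, with a hypothetical helper `fit_line` and the sample function read as y = 50 / (3x + 1.2), shows the slope changing sign:

```python
def fit_line(us, vs):
    """Least-squares intercept a and slope b for the line v-hat = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = suv / suu
    return mean_v - b * mean_u, b

xs = list(range(1, 16))
ys = [50 / (3 * x + 1.2) for x in xs]  # falls as x increases

orig_a, orig_b = fit_line(xs, ys)                      # original data
recip_a, recip_b = fit_line(xs, [1 / y for y in ys])   # y -> 1/y
neg_a, neg_b = fit_line(xs, [-1 / y for y in ys])      # y -> -1/y

print(f"original:            slope {orig_b:+.2f}")
print(f"reciprocal:          1/y-hat  = {recip_a:.3f} {recip_b:+.2f} x")
print(f"negative reciprocal: -1/y-hat = {neg_a:.3f} {neg_b:+.2f} x")
```

The original data fall as x increases. Taking 1/y straightens them but turns the trend upward, while taking -1/y straightens them and keeps the downward trend.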