# A Short Tour of the Predictive Modeling Process

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Chapter 2 A Short Tour of the Predictive Modeling Process Before diving in to the formal components of model building, we present a simple example that illustrates the broad concepts of model building. Specifically, the following example demonstrates the concepts of data spending, building candidate models, and selecting the optimal model. 2.1 Case Study: Predicting Fuel Economy The fueleconomy.gov web site, run by the U.S. Department of Energy s Office of Energy Efficiency and Renewable Energy and the U.S. Environmental Protection Agency, lists different estimates of fuel economy for passenger cars and trucks. For each vehicle, various characteristics are recorded such as the engine displacement or number of cylinders. Along with these values, laboratory measurements are made for the city and highway miles per gallon (MPG) of the car. In practice, we would build a model on as many vehicle characteristics as possible in order to find the most predictive model. However, this introductory illustration will focus high-level concepts of model building by using a single predictor, engine displacement (the volume inside the engine cylinders), and a single response, unadjusted highway MPG for model year cars. The first step in any model building process is to understand the data, which can most easily be done through a graph. Since we have just one predictor and one response, these data can be visualized with a scatter plot (Fig. 2.1). This figure shows the relationship between engine displacement and fuel economy. The 10 model year panel contains all the 10 data while the other panel shows the data only for new 11 vehicles. Clearly, as engine displacement increases, the fuel efficiency drops regardless of year. The relationship is somewhat linear but does exhibit some curvature towards the extreme ends of the displacement axis. M. Kuhn and K. Johnson, Applied Predictive Modeling, DOI / , Springer Science+Business Media New York 13 19

2 2 A Short Tour of the Predictive Modeling Process 10 Model Year 11 Model Year Fuel Efficiency (MPG) Fig. 2.1: The relationship between engine displacement and fuel efficiency of all 10 model year vehicles and new 11 car lines If we had more than one predictor, we would need to further understand characteristics of the predictors and the relationships among the predictors. These characteristics may suggest important and necessary pre-processing steps that must be taken prior to building a model (Chap. 3). After first understanding the data, the next step is to build and evaluate a model on the data. A standard approach is to take a random sample of the data for model building and use the rest to understand model performance. However, suppose we want to predict the MPG for a new car line. In this situation, models can be created using the 10 data (containing 1,107 vehicles) and tested on the 245 new 11 cars. The common terminology would be that the 10 data are used as the model training set and the 11 values are the test or validation set. Now that we have defined the data used for model building and evaluation, we should decide how to measure performance of the model. For regression problems where we try to predict a numeric value, the residuals are important sources of information. Residuals are computed as the observed value minus the predicted value (i.e., y i ŷ i ). When predicting numeric values, the root mean squared error (RMSE) is commonly used to evaluate models. Described in more detail in Chap. 7, RMSE is interpreted as how far, on average, the residuals are from zero. At this point, the modeler will try various techniques to mathematically define the relationship between the predictor and outcome. To do this, the training set is used to estimate the various values needed by the model equations. The test set will be used only when a few strong candidate models have been finalized (repeatedly using the test set in the model build process negates its utility as a final arbitrator of the models). Suppose a linear regression model was created where the predicted MPG is a basic slope and intercept model. Using the training data, we estimate the

3 2.1 Case Study: Predicting Fuel Economy 21 Fuel Economy Predicted 10 Observed Fig. 2.2: Quality of fit diagnostics for the linear regression model. The training set data and its associated predictions are used to understand how well the model works intercept to be.6 and the slope to be 4.5 MPG/liters using the method of least squares (Sect. 6.2). The model fit is shown in Fig. 2.2 for the training set data. 1 The left-hand panel shows the training set data with a linear model fit defined by the estimated slope and intercept. The right-hand panel plots the observed and predicted MPG. These plots demonstrate that this model misses some of the patterns in the data, such as under-predicting fuel efficiency when the displacement is less than 2 L or above 6 L. When working with the training set, one must be careful not to simply evaluate model performance using the same data used to build the model. If we simply re-predict the training set data, there is the potential to produce overly optimistic estimates of how well the model works, especially if the model is highly adaptable. An alternative approach for quantifying how well the model operates is to use resampling, where different subversions of the training data set are used to fit the model. Resampling techniques are discussed in Chap. 4. For these data, we used a form of resampling called 10-fold cross-validation to estimate the model RMSE to be 4.6 MPG. Looking at Fig. 2.2, it is conceivable that the problem might be solved by introducing some nonlinearity in the model. There are many ways to do this. The most basic approach is to supplement the previous linear regression model with additional complexity. Adding a squared term for engine displacement would mean estimating an additional slope parameter associated with the square of the predictor. In doing this, the model equation changes to 1 One of our graduate professors once said the only way to be comfortable with your data is to never look at it.

4 22 2 A Short Tour of the Predictive Modeling Process Fuel Economy Predicted Observed Fig. 2.3: Quality of fit diagnostics for the quadratic regression model (using the training set) efficiency = displacement displacement 2 This is referred to as a quadratic model since it includes a squared term; the model fit is shown in Fig Unquestionably, the addition of the quadratic term improves the model fit. The RMSE is now estimated to be 4.2 MPG using cross-validation. One issue with quadratic models is that they can perform poorly on the extremes of the predictor. In Fig. 2.3, there may be a hint of this for the vehicles with very high displacement values. The model appears to be bending upwards unrealistically. Predicting new vehicles with large displacement values may produce significantly inaccurate results. Chapters 6 8 discuss many other techniques for creating sophisticated relationships between the predictors and outcome. One such approach is the multivariate adaptive regression spline (MARS) model (Friedman 1991). When used with a single predictor, MARS can fit separate linear regression lines for different ranges of engine displacement. The slopes and intercepts are estimated for this model, as well as the number and size of the separate regions for the linear models. Unlike the linear regression models, this technique has a tuning parameter which cannot be directly estimated from the data. There is no analytical equation that can be used to determine how many segments should be used to model the data. While the MARS model has internal algorithms for making this determination, the user can try different values and use resampling to determine the appropriate value. Once the value is found, a final MARS model would be fit using all the training set data and used for prediction.

5 2.1 Case Study: Predicting Fuel Economy 23 RMSE (Cross Validation) #Terms Fig. 2.4: The cross-validation profile for the MARS tuning parameter Fuel Economy Predicted Observed Fig. 2.5: Quality of fit diagnostics for the MARS model (using the training set). The MARS model creates several linear regression fits with change points at 2.3, 3.5, and 4.3 L For a single predictor, MARS can allow for up to five model terms (similar to the previous slopes and intercepts). Using cross-validation, we evaluated four candidate values for this tuning parameter to create the resampling profile which is shown in Fig The lowest RMSE value is associated with four terms, although the scale of change in the RMSE values indicates that there is some insensitivity to this tuning parameter. The RMSE associated with the optimal model was 4.2 MPG. After fitting the final MARS model with four terms, the training set fit is shown in Fig. 2.5 where several linear segments were predicted.

6 24 2 A Short Tour of the Predictive Modeling Process MARS Quadratic Regression Fuel Economy Fig. 2.6: The test set data and with model fits for two models Based on these three models, the quadratic regression and MARS models were evaluated on the test set. Figure 2.6 shows these results. Both models fit very similarly. The test set RMSE values for the quadratic model was 4.72 MPG and the MARS model was 4.69 MPG. Based on this, either model would be appropriate for the prediction of new car lines. 2.2 Themes There are several aspects of the model building process that are worth discussing further, especially for those who are new to predictive modeling. Data Splitting Although discussed in the next chapter, how we allocate data to certain tasks (e.g., model building, evaluating performance) is an important aspect of modeling. For this example, the primary interest is to predict the fuel economy of new vehicles, which is not the same population as the data used to build the model. This means that, to some degree, we are testing how well the model extrapolates to a different population. If we were interested in predicting from the same population of vehicles (i.e., interpolation), taking a simple random sample of the data would be more appropriate. How the training and test sets are determined should reflect how the model will be applied.

7 2.2 Themes 25 How much data should be allocated to the training and test sets? It generally depends on the situation. If the pool of data is small, the data splitting decisions can be critical. A small test would have limited utility as a judge of performance. In this case, a sole reliance on resampling techniques (i.e., no test set) might be more effective. Large data sets reduce the criticality of these decisions. Predictor Data This example has revolved around one of many predictors: the engine displacement. The original data contain many other factors, such as the number of cylinders, the type of transmission, and the manufacturer. An earnest attempt to predict the fuel economy would examine as many predictors as possible to improve performance. Using more predictors, it is likely that the RMSE for the new model cars can be driven down further. Some investigation into the data can also help. For example, none of the models were effective at predicting fuel economy when the engine displacement was small. Inclusion of predictors that target these types of vehicles would help improve performance. An aspect of modeling that was not discussed here was feature selection: the process of determining the minimum set of relevant predictors needed by the model. This common task is discussed in Chap. 19. Estimating Performance Before using the test set, two techniques were employed to determine the effectiveness of the model. First, quantitative assessments of statistics (i.e., the RMSE) using resampling help the user understand how each technique would perform on new data. The other tool was to create simple visualizations of a model, such as plotting the observed and predicted values, to discover areas of the data where the model does particularly good or bad. This type of qualitative information is critical for improving models and is lost when the model is gauged only on summary statistics. Evaluating Several Models For these data, three different models were evaluated. It is our experience that some modeling practitioners have a favorite model that is relied on indiscriminately. The No Free Lunch Theorem (Wolpert 1996) argues that,

8 26 2 A Short Tour of the Predictive Modeling Process without having substantive information about the modeling problem, there is no single model that will always do better than any other model. Because of this, a strong case can be made to try a wide variety of techniques, then determine which model to focus on. In our example, a simple plot of the data shows that there is a nonlinear relationship between the outcome and the predictor. Given this knowledge, we might exclude linear models from consideration, but there is still a wide variety of techniques to evaluate. One might say that model X is always the best performing model but, for these data, a simple quadratic model is extremely competitive. Model Selection At some point in the process, a specific model must be chosen. This example demonstrated two types of model selection. First, we chose some models over others: the linear regression model did not fit well and was dropped. In this case, we chose between models. There was also a second type of model selection shown. For MARS, the tuning parameter was chosen using cross-validation. This was also model selection where we decided on the type of MARS model to use. In this case, we did the selection within different MARS models. In either case, we relied on cross-validation and the test set to produce quantitative assessments of the models to help us make the choice. Because we focused on a single predictor, which will not often be the case, we also made visualizations of the model fit to help inform us. At the end of the process, the MARS and quadratic models appear to give equivalent performance. However, knowing that the quadratic model might not do well for vehicles with very large displacements, our intuition might tell us to favor the MARS model. One goal of this book is to help the user gain intuition regarding the strengths and weakness of different models to make informed decisions. 2.3 Summary At face value, model building appears straightforward: pick a modeling technique, plug in data, and generate a prediction. While this approach will generate a predictive model, it will most likely not generate a reliable, trustworthy model for predicting new samples. To get this type of model, we must first understand the data and the objective of the modeling. Upon understanding the data and objectives, we then pre-process and split the data. Only after these steps do we finally proceed to building, evaluating, and selecting models.

9

### Interaction between quantitative predictors

Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors

### 4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

### MATH 185 CHAPTER 2 REVIEW

NAME MATH 18 CHAPTER REVIEW Use the slope and -intercept to graph the linear function. 1. F() = 4 - - Objective: (.1) Graph a Linear Function Determine whether the given function is linear or nonlinear..

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### JetBlue Airways Stock Price Analysis and Prediction

JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue

### the points are called control points approximating curve

Chapter 4 Spline Curves A spline curve is a mathematical representation for which it is easy to build an interface that will allow a user to design and control the shape of complex curves and surfaces.

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### Module 6: Introduction to Time Series Forecasting

Using Statistical Data to Make Decisions Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento, University of Delaware, College of Agriculture and Natural Resources, Food and

### Linear Approximations ACADEMIC RESOURCE CENTER

Linear Approximations ACADEMIC RESOURCE CENTER Table of Contents Linear Function Linear Function or Not Real World Uses for Linear Equations Why Do We Use Linear Equations? Estimation with Linear Approximations

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### 3.3. Solving Polynomial Equations. Introduction. Prerequisites. Learning Outcomes

Solving Polynomial Equations 3.3 Introduction Linear and quadratic equations, dealt within Sections 3.1 and 3.2, are members of a class of equations, called polynomial equations. These have the general

### Algebra II Semester Exam Review Sheet

Name: Class: Date: ID: A Algebra II Semester Exam Review Sheet 1. Translate the point (2, 3) left 2 units and up 3 units. Give the coordinates of the translated point. 2. Use a table to translate the graph

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### \$2 4 40 + ( \$1) = 40

THE EXPECTED VALUE FOR THE SUM OF THE DRAWS In the game of Keno there are 80 balls, numbered 1 through 80. On each play, the casino chooses 20 balls at random without replacement. Suppose you bet on the

### TECHNICAL APPENDIX ESTIMATING THE OPTIMUM MILEAGE FOR VEHICLE RETIREMENT

TECHNICAL APPENDIX ESTIMATING THE OPTIMUM MILEAGE FOR VEHICLE RETIREMENT Regression analysis calculus provide convenient tools for estimating the optimum point for retiring vehicles from a large fleet.

### 8. Time Series and Prediction

8. Time Series and Prediction Definition: A time series is given by a sequence of the values of a variable observed at sequential points in time. e.g. daily maximum temperature, end of day share prices,

### Algebra 1 Course Title

Algebra 1 Course Title Course- wide 1. What patterns and methods are being used? Course- wide 1. Students will be adept at solving and graphing linear and quadratic equations 2. Students will be adept

### Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

### GRADES 7, 8, AND 9 BIG IDEAS

Table 1: Strand A: BIG IDEAS: MATH: NUMBER Introduce perfect squares, square roots, and all applications Introduce rational numbers (positive and negative) Introduce the meaning of negative exponents for

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### 2.1 Using Lines to Model Data

2.1 Using Lines to Model Data 2.1 Using Lines to Model Data (Page 1 of 20) Scattergram and Linear Models The graph of plotted data pairs is called a scattergram. A linear model is a straight line or a

### The Method of Least Squares

The Method of Least Squares Steven J. Miller Mathematics Department Brown University Providence, RI 0292 Abstract The Method of Least Squares is a procedure to determine the best fit line to data; the

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Teaching Multivariate Analysis to Business-Major Students

Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis

### Sect The Slope-Intercept Form

Concepts # and # Sect. - The Slope-Intercept Form Slope-Intercept Form of a line Recall the following definition from the beginning of the chapter: Let a, b, and c be real numbers where a and b are not

### Motion Graphs. It is said that a picture is worth a thousand words. The same can be said for a graph.

Motion Graphs It is said that a picture is worth a thousand words. The same can be said for a graph. Once you learn to read the graphs of the motion of objects, you can tell at a glance if the object in

### Solving Quadratic Equations by Completing the Square

9. Solving Quadratic Equations by Completing the Square 9. OBJECTIVES 1. Solve a quadratic equation by the square root method. Solve a quadratic equation by completing the square. Solve a geometric application

### Logs Transformation in a Regression Equation

Fall, 2001 1 Logs as the Predictor Logs Transformation in a Regression Equation The interpretation of the slope and intercept in a regression change when the predictor (X) is put on a log scale. In this

### The Big Picture. Correlation. Scatter Plots. Data

The Big Picture Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, We have just completed a length series of lectures on ANOVA where we considered

### 3.2. Solving quadratic equations. Introduction. Prerequisites. Learning Outcomes. Learning Style

Solving quadratic equations 3.2 Introduction A quadratic equation is one which can be written in the form ax 2 + bx + c = 0 where a, b and c are numbers and x is the unknown whose value(s) we wish to find.

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Indiana State Core Curriculum Standards updated 2009 Algebra I

Indiana State Core Curriculum Standards updated 2009 Algebra I Strand Description Boardworks High School Algebra presentations Operations With Real Numbers Linear Equations and A1.1 Students simplify and

### Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I

Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting

### A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

Factoring Quadratic Trinomials Student Probe Factor Answer: Lesson Description This lesson uses the area model of multiplication to factor quadratic trinomials Part 1 of the lesson consists of circle puzzles

Ronald Christensen Advanced Linear Modeling Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization Second Edition Springer Preface to the Second Edition

### Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Physics 182 - Fall 2014 - Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring

### Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

### For each learner you will need: mini-whiteboard. For each small group of learners you will need: Card set A Factors; Card set B True/false.

Level A11 of challenge: D A11 Mathematical goals Starting points Materials required Time needed Factorising cubics To enable learners to: associate x-intercepts with finding values of x such that f (x)

### Part Three. Cost Behavior Analysis

Part Three Cost Behavior Analysis Cost Behavior Cost behavior is the manner in which a cost changes as some related activity changes An understanding of cost behavior is necessary to plan and control costs

### Using Excel for Handling, Graphing, and Analyzing Scientific Data:

Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students Scott A. Sinex Barbara A. Gage Department of Physical Sciences and Engineering Prince

### Credit Risk Analysis Using Logistic Regression Modeling

Credit Risk Analysis Using Logistic Regression Modeling Introduction A loan officer at a bank wants to be able to identify characteristics that are indicative of people who are likely to default on loans,

### High School Functions Interpreting Functions Understand the concept of a function and use function notation.

Performance Assessment Task Printing Tickets Grade 9 The task challenges a student to demonstrate understanding of the concepts representing and analyzing mathematical situations and structures using algebra.

### Prediction of Car Prices of Federal Auctions

Prediction of Car Prices of Federal Auctions BUDT733- Final Project Report Tetsuya Morito Karen Pereira Jung-Fu Su Mahsa Saedirad 1 Executive Summary The goal of this project is to provide buyers who attend

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### ALGEBRA I / ALGEBRA I SUPPORT

Suggested Sequence: CONCEPT MAP ALGEBRA I / ALGEBRA I SUPPORT August 2011 1. Foundations for Algebra 2. Solving Equations 3. Solving Inequalities 4. An Introduction to Functions 5. Linear Functions 6.

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) All but one of these statements contain a mistake. Which could be true? A) There is a correlation

### AP Statistics Section :12.2 Transforming to Achieve Linearity

AP Statistics Section :12.2 Transforming to Achieve Linearity In Chapter 3, we learned how to analyze relationships between two quantitative variables that showed a linear pattern. When two-variable data

### Chapter 10 - Practice Problems 1

Chapter 10 - Practice Problems 1 1. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the

### Straightening Data in a Scatterplot Selecting a Good Re-Expression Model

Straightening Data in a Scatterplot Selecting a Good Re-Expression What Is All This Stuff? Here s what is included: Page 3: Graphs of the three main patterns of data points that the student is likely to

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

### Model Validation Techniques

Model Validation Techniques Kevin Mahoney, FCAS kmahoney@ travelers.com CAS RPM Seminar March 17, 2010 Uses of Statistical Models in P/C Insurance Examples of Applications Determine expected loss cost

### 2.5 Transformations of Functions

2.5 Transformations of Functions Section 2.5 Notes Page 1 We will first look at the major graphs you should know how to sketch: Square Root Function Absolute Value Function Identity Function Domain: [

### with functions, expressions and equations which follow in units 3 and 4.

Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

### RELEVANT TO ACCA QUALIFICATION PAPER P3. Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam

RELEVANT TO ACCA QUALIFICATION PAPER P3 Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam Business forecasting and strategic planning Quantitative data has always been supplied

### Time Series Analysis. 1) smoothing/trend assessment

Time Series Analysis This (not surprisingly) concerns the analysis of data collected over time... weekly values, monthly values, quarterly values, yearly values, etc. Usually the intent is to discern whether

### the Median-Medi Graphing bivariate data in a scatter plot

the Median-Medi Students use movie sales data to estimate and draw lines of best fit, bridging technology and mathematical understanding. david c. Wilson Graphing bivariate data in a scatter plot and drawing

### Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

### INVESTIGATIONS AND FUNCTIONS 1.1.1 1.1.4. Example 1

Chapter 1 INVESTIGATIONS AND FUNCTIONS 1.1.1 1.1.4 This opening section introduces the students to man of the big ideas of Algebra 2, as well as different was of thinking and various problem solving strategies.

### Big Ideas in Mathematics

Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards

### Physics Kinematics Model

Physics Kinematics Model I. Overview Active Physics introduces the concept of average velocity and average acceleration. This unit supplements Active Physics by addressing the concept of instantaneous

### Statewide Fuel Consumption Forecast Models

Statewide Fuel Consumption Forecast Models Washington State Department of Transportation Economic Analysis Work Group Participants: Washington State Department of Transportation, Washington State Office

### Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

### 1.1 Practice Worksheet

Math 1 MPS Instructor: Cheryl Jaeger Balm 1 1.1 Practice Worksheet 1. Write each English phrase as a mathematical expression. (a) Three less than twice a number (b) Four more than half of a number (c)

### CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

Curriculum Map/Pacing Guide page 1 of 14 Quarter I start (CP & HN) 170 96 Unit 1: Number Sense and Operations 24 11 Totals Always Include 2 blocks for Review & Test Operating with Real Numbers: How are

College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

### Using JMP with a Specific

1 Using JMP with a Specific Example of Regression Ying Liu 10/21/ 2009 Objectives 2 Exploratory data analysis Simple liner regression Polynomial regression How to fit a multiple regression model How to

### What is the purpose of this document? What is in the document? How do I send Feedback?

This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Statistics

### OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

### Prediction and Confidence Intervals in Regression

Fall Semester, 2001 Statistics 621 Lecture 3 Robert Stine 1 Prediction and Confidence Intervals in Regression Preliminaries Teaching assistants See them in Room 3009 SH-DH. Hours are detailed in the syllabus.

### JUST THE MATHS UNIT NUMBER 1.8. ALGEBRA 8 (Polynomials) A.J.Hobson

JUST THE MATHS UNIT NUMBER 1.8 ALGEBRA 8 (Polynomials) by A.J.Hobson 1.8.1 The factor theorem 1.8.2 Application to quadratic and cubic expressions 1.8.3 Cubic equations 1.8.4 Long division of polynomials

### Lesson 3.2.1 Using Lines to Make Predictions

STATWAY INSTRUCTOR NOTES i INSTRUCTOR SPECIFIC MATERIAL IS INDENTED AND APPEARS IN GREY ESTIMATED TIME 50 minutes MATERIALS REQUIRED Overhead or electronic display of scatterplots in lesson BRIEF DESCRIPTION

### HOW MUCH WILL I SPEND ON GAS?

HOW MUCH WILL I SPEND ON GAS? Outcome (lesson objective) The students will use the current and future price of gasoline to construct T-charts, write algebraic equations, and plot the equations on a graph.

### A Learning Algorithm For Neural Network Ensembles

A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República

### 0 Introduction to Data Analysis Using an Excel Spreadsheet

Experiment 0 Introduction to Data Analysis Using an Excel Spreadsheet I. Purpose The purpose of this introductory lab is to teach you a few basic things about how to use an EXCEL 2010 spreadsheet to do

### Foundations for Functions

Activity: TEKS: Overview: Materials: Grouping: Time: Crime Scene Investigation (A.2) Foundations for functions. The student uses the properties and attributes of functions. The student is expected to:

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

Factoring Quadratic Trinomials Student Probe Factor x x 3 10. Answer: x 5 x Lesson Description This lesson uses the area model of multiplication to factor quadratic trinomials. Part 1 of the lesson consists

### Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS

DUSP 11.203 Frank Levy Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS These notes have three purposes: 1) To explain why some simple calculus formulae are useful in understanding

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Objectives. Day Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 Day 7. Inventory 2,782 2,525 2,303 2,109 1,955 1,788 1,570

Activity 4 Objectives Use the CellSheet App to predict future trends based on average daily use Use a linear regression model to predict future trends Introduction Have you ever gone to buy a new CD or

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### A Correlation of. to the. South Carolina Data Analysis and Probability Standards

A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards

### FORECASTING. Operations Management

2013 FORECASTING Brad Fink CIT 492 Operations Management Executive Summary Woodlawn hospital needs to forecast type A blood so there is no shortage for the week of 12 October, to correctly forecast, a

### Algebra I Sample Questions. 1 Which ordered pair is not in the solution set of (1) (5,3) (2) (4,3) (3) (3,4) (4) (4,4)

1 Which ordered pair is not in the solution set of (1) (5,3) (2) (4,3) (3) (3,4) (4) (4,4) y 1 > x + 5 and y 3x 2? 2 5 2 If the quadratic formula is used to find the roots of the equation x 2 6x 19 = 0,

### Revised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m)

Chapter 23 Squares Modulo p Revised Version of Chapter 23 We learned long ago how to solve linear congruences ax c (mod m) (see Chapter 8). It s now time to take the plunge and move on to quadratic equations.

### Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of: