# Data Mining Part 5. Prediction

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini

2 Outline Introduction Linear Regression Other Regression Models References

3 Introduction

4 Introduction Numerical prediction is similar to classification construct a model use model to predict continuous or ordered value for a given input Numeric prediction vs. classification Classification refers to predict categorical class label Numeric prediction models continuous-valued functions

5 Introduction Regression analysis is the major method for numeric prediction Regression analysis model the relationship between one or more independent or predictor variables and a dependent or response variable Regression analysis is a good choice when all of the predictor variables are continuous valued as well.

6 Introduction In the context of data mining The predictor variables are the attributes of interest describing the instance that are known. The response variable is what we want to predict Some classification techniques can be adapted for prediction, e.g. Backpropagation k-nearest-neighbor classifiers Support vector machines

7 Introduction Regression analysis methods: Linear regression Straight-line linear regression Multiple linear regression Non-linear regression Generalized linear model Poisson regression Logistic regression Log-linear models Regression trees and Model trees

8 Linear Regression

9 Linear Regression Straight-line linear regression: involves a response variable y and a single predictor variable x y = w 0 + w 1 x w 0 : y-intercept w 1 : slope w 0 & w 1 are regression coefficients

10 Linear regression Method of least squares: estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line. D: a training set x: values of predictor variable y: values of response variable D : data points of the form(x1, y1), (x2, y2),, (x D, y D ). : the mean value of x1, x2,, x D : the mean value of y1, y2,, y D

11 Example: Salary problem The table shows a set of paired data where x is the number of years of work experience of a college graduate and y is the corresponding salary of the graduate.

12 Linear Regression The 2-D data can be graphed on a scatter plot. The plot suggests a linear relationship between the two variables, x and y.

13 Example: Salary data Given the above data, we compute we get The equation of the least squares line is estimated by

14 Example: Salary data Linear Regression: Y=3.5*X Salary Years

15 Multiple linear regression Multiple linear regression involves more than one predictor variable Training data is of the form (X 1, y 1 ), (X 2, y 2 ),, (X D, y D ) where the X i are the n-dimensional training data with associated class labels, y i An example of a multiple linear regression model based on two predictor attributes:

16 Example: CPU performance data

17 Multiple Linear Regression Various statistical measures exist for determining how well the proposed model can predict y (described later). Obviously, the greater the number of predictor attributes is, the slower the performance is. Before applying regression analysis, it is common to perform attribute subset selection to eliminate attributes that are unlikely to be good predictors for y. In general, regression analysis is accurate for numeric prediction, except when the data contain outliers.

18 Other Regression Models

19 Nonlinear Regression If data that does not show a linear dependence we can get a more accurate model using a nonlinear regression model For example, y = w 0 + w 1 x + w 2 x 2 + w 3 x 3

20 Generalized linear models Generalized linear model is foundation on which linear regression can be applied to modeling categorical response variables Common types of generalized linear models include Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables. Poisson regression: models the data that exhibit a Poisson distribution

21 Log-linear models In the log-linear method, all attributes must be categorical Continuous-valued attributes must first be discretized.

22 Regression Trees and Model Trees Trees to predict continuous values rather than class labels Regression and model trees tend to be more accurate than linear regression when the data are not represented well by a simple linear model

23 Regression trees Regression tree: a decision tree where each leaf predicts a numeric quantity Proposed in CART system (Breiman et al. 1984) CART: Classification And Regression Trees Predicted value is average value of training instances that reach the leaf

24 Example: CPU performance problem Regression tree for the CPU data

25 Example: CPU performance problem We calculate the average of the absolute values of the errors between the predicted and the actual CPU performance measures It turns out to be significantly less for the tree than for the regression equation.

26 Model tree Model tree: Each leaf holds a regression model A multivariate linear equation for the predicted attribute Proposed by Quinlan (1992) A more general case than regression tree

27 Example: CPU performance problem Model tree for the CPU data

28 References

29 References J. Han, M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc. (2006). (Chapter 6) I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., (Chapter 6)

30 The end

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

### Relationship of two variables

Relationship of two variables A correlation exists between two variables when the values of one are somehow associated with the values of the other in some way. Scatter Plot (or Scatter Diagram) A plot

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### Tutorial for the TI-89 Titanium Calculator

SI Physics Tutorial for the TI-89 Titanium Calculator Using Scientific Notation on a TI-89 Titanium calculator From Home, press the Mode button, then scroll down to Exponential Format. Select Scientific.

### Population Prediction Project Demonstration Writeup

Dylan Zwick Math 2280 Spring, 2008 University of Utah Population Prediction Project Demonstration Writeup In this project we will investigate how well a logistic population model predicts actual human

### SCATTER PLOTS AND TREND LINES

Name SCATTER PLOTS AND TREND LINES VERSION 2 Lessons 1 3 1. You have collected the following data while researching the Winter Olympics. You are trying to determine if there is a relationship between the

### Data Mining III: Numeric Estimation

Data Mining III: Numeric Estimation Computer Science 105 Boston University David G. Sullivan, Ph.D. Review: Numeric Estimation Numeric estimation is like classification learning. it involves learning a

### City University of Hong Kong. Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015

City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Fundamentals of Data Science Course Code:

### Chapter 4 Describing the Relation between Two Variables

Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation The response variable is the variable whose value can be explained by the value of the explanatory or predictor

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

### Predictive Dynamix Inc

Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

### COMPARING LINEAR AND NONLINEAR FUNCTIONS

1 COMPARING LINEAR AND NONLINEAR FUNCTIONS LEARNING MAP INFORMATION STANDARDS 8.F.2 Compare two s, each in a way (algebraically, graphically, numerically in tables, or by verbal descriptions). For example,

### Name: Class: Date: Does the equation represent a direct variation? If so, find the constant of variation. c. yes; k = 5 3. c.

Name: Class: Date: Chapter 5 Test Multiple Choice Identify the choice that best completes the statement or answers the question. What is the slope of the line that passes through the pair of points? 1.

### Practice Test - Chapter 4. y = 2x 3. The slope-intercept form of a line is y = mx + b, where m is the slope, and b is the y-intercept.

y = 2x 3. The slope-intercept form of a line is y = mx + b, where m is the slope, and b is the y-intercept. Plot the y-intercept (0, 3). The slope is. From (0, 3), move up 2 units and right 1 unit. Plot

### Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

### A correlation exists between two variables when one of them is related to the other in some way.

Lecture #10 Chapter 10 Correlation and Regression The main focus of this chapter is to form inferences based on sample data that come in pairs. Given such paired sample data, we want to determine whether

### The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

### Smart Grid Data Analytics for Decision Support

1 Smart Grid Data Analytics for Decision Support Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA Prakash.Ranganathan@engr.und.edu, 701-777-4431

### Lecture 5: Correlation and Linear Regression

Lecture 5: Correlation and Linear Regression 3.5. (Pearson) correlation coefficient The correlation coefficient measures the strength of the linear relationship between two variables. The correlation is

### AP * Statistics Review. Linear Regression

AP * Statistics Review Linear Regression Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board was not involved in the production

### Slope-Intercept Equation. Example

1.4 Equations of Lines and Modeling Find the slope and the y intercept of a line given the equation y = mx + b, or f(x) = mx + b. Graph a linear equation using the slope and the y-intercept. Determine

### 1/2/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

### General Notes About 2006 AP Physics Scoring Guidelines

AP PHYSICS C MECHANICS 006 SCORING GUIDELINES General Notes About 006 AP Physics Scoring Guidelines 1. The solutions contain the most common method of solving the free-response questions and the allocation

### 5.1 Writing Linear Equations in Slope-Intercept Form. 1. Use slope-intercept form to write an equation of a line.

5.1 Writing Linear Equations in Slope-Intercept Form Objectives 1. Use slope-intercept form to write an equation of a line. 2. Model a real-life situation with a linear function. Key Terms Slope-Intercept

### Regents Exam Questions A2.S.8: Correlation Coefficient

A2.S.8: Correlation Coefficient: Interpret within the linear regression model the value of the correlation coefficient as a measure of the strength of the relationship 1 Which statement regarding correlation

### Data Mining 5. Cluster Analysis

Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Regression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology

Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of

### Naive Bayesian Classifier

POLYTECHNIC UNIVERSITY Department of Computer Science / Finance and Risk Engineering Naive Bayesian Classifier K. Ming Leung Abstract: A statistical classifier called Naive Bayesian classifier is discussed.

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### Algebra. Indiana Standards 1 ST 6 WEEKS

Chapter 1 Lessons Indiana Standards - 1-1 Variables and Expressions - 1-2 Order of Operations and Evaluating Expressions - 1-3 Real Numbers and the Number Line - 1-4 Properties of Real Numbers - 1-5 Adding

### SPSS Multivariable Linear Models and Logistic Regression

1 SPSS Multivariable Linear Models and Logistic Regression Multivariable Models Single continuous outcome (dependent variable), one main exposure (independent) variable, and one or more potential confounders

### Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

### Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Classification k-nearest neighbors Data Mining Dr. Engin YILDIZTEPE Reference Books Han, J., Kamber, M., Pei, J., (2011). Data Mining: Concepts and Techniques. Third edition. San Francisco: Morgan Kaufmann

### Tutorial on Using Excel Solver to Analyze Spin-Lattice Relaxation Time Data

Tutorial on Using Excel Solver to Analyze Spin-Lattice Relaxation Time Data In the measurement of the Spin-Lattice Relaxation time T 1, a 180 o pulse is followed after a delay time of t with a 90 o pulse,

### GRAPHING LINEAR EQUATIONS IN TWO VARIABLES

GRAPHING LINEAR EQUATIONS IN TWO VARIABLES The graphs of linear equations in two variables are straight lines. Linear equations may be written in several forms: Slope-Intercept Form: y = mx+ b In an equation

### Studying Auto Insurance Data

Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

### Advanced High School Statistics. Preliminary Edition

Chapter 2 Summarizing Data After collecting data, the next stage in the investigative process is to summarize the data. Graphical displays allow us to visualize and better understand the important features

### Worksheet A5: Slope Intercept Form

Name Date Worksheet A5: Slope Intercept Form Find the Slope of each line below 1 3 Y - - - - - - - - - - Graph the lines containing the point below, then find their slopes from counting on the graph!.

### Residuals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i - y i

A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M23-1 M23-2 Example 1: continued Case

### 4.4 scatterplots and lines of fit2 ink.notebook. November 29, Standards. Scatter Plot. negative correlation. positive correlation

. scatterplots and lines of fit ink.notebook Page 7 Page 7. Scatter Plots and Line of Fit Page 7 Page 7 Page 7 Page 76 . scatterplots and lines of fit ink.notebook Lesson Objectives Standards Lesson Notes

### Algebra I: Strand 1. Foundations of Functions; Topic 3. Changing Perimeter; Task 1.3.1

1 TASK 1.3.1: CHANGING PERIMETER TEACHER ACTIVITY Solutions On a sheet of grid paper create one set of axes. Label the axes for this situation perimeter vs. multiplier. Graph your prediction data for changing

### Classification and Prediction

Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

### Topic: Solving Linear Equations

Unit 1 Topic: Solving Linear Equations N-Q.1. Reason quantitatively and use units to solve problems. Use units as a way to understand problems and to guide the solution of multi-step problems; choose and

### Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

### 8 th Grade Math Curriculum/7 th Grade Advanced Course Information: Course 3 of Prentice Hall Common Core

8 th Grade Math Curriculum/7 th Grade Advanced Course Information: Course: Length: Course 3 of Prentice Hall Common Core 46 minutes/day Description: Mathematics at the 8 th grade level will cover a variety

### High School Algebra 1 Common Core Standards & Learning Targets

High School Algebra 1 Common Core Standards & Learning Targets Unit 1: Relationships between Quantities and Reasoning with Equations CCS Standards: Quantities N-Q.1. Use units as a way to understand problems

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Georgia Online Formative Assessment Resource (GOFAR) 8th Grade Data Analysis and Probability

8th Grade Data Analysis and Probability Name: Date: Copyright 2014 by Georgia Department of Education. Items shall not be used in a third party system or displayed publicly. Page: (1 of 7 ) 1. Plainview

### Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

### Non-Linear Regression Analysis

Non-Linear Regression Analysis By Chanaka Kaluarachchi Presentation outline Linear regression Checking linear Assumptions Linear vs non-linear Non linear regression analysis Linear regression (reminder)

### INTERMEDIATE STATISTICS and LEAST SQUARES METHOD

INTERMEDIATE STATISTICS and LEAST SQUARES METHOD Educational Objective: To introduce several intermediate concepts in statistics for scientific measurements and data analysis, propagation of errors and

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Binary Logistic Regression

Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### The Sugar Content of Soft Drinks

Suggested reading: Chang text pages 15-27 Purpose To determine the sugar content of some of the beverages you drink. Overview Sugar contributes more than any other solute to the density of soft drinks.

### Chapter 3: Describing Relationships

Chapter 3: Describing Relationships The Practice of Statistics, 4 th edition For AP* STARNES, YATES, MOORE Chapter 3 2 Describing Relationships 3.1 Scatterplots and Correlation 3.2 Learning Targets After

### Domain: Statistics and Probability (SP) Cluster: Investigate patterns of association in bivariate data.

Domain: Statistics and Probability (SP) Standard: 8.SP.1. Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns

### Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### In this section, we ll review plotting points, slope of a line and different forms of an equation of a line.

Math 1313 Section 1.2: Straight Lines In this section, we ll review plotting points, slope of a line and different forms of an equation of a line. Graphing Points and Regions Here s the coordinate plane:

### Aim: How do we find the slope of a line? Warm Up: Go over test. A. Slope -

Aim: How do we find the slope of a line? Warm Up: Go over test A. Slope - Plot the points and draw a line through the given points. Find the slope of the line.. A(-5,4) and B(4,-3) 2. A(4,3) and B(4,-6)

### Section 3.4 The Slope Intercept Form: y = mx + b

Slope-Intercept Form: y = mx + b, where m is the slope and b is the y-intercept Reminding! m = y x = y 2 y 1 x 2 x 1 Slope of a horizontal line is 0 Slope of a vertical line is Undefined Graph a linear

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

### Pennsylvania System of School Assessment

Pennsylvania System of School Assessment The Assessment Anchors, as defined by the Eligible Content, are organized into cohesive blueprints, each structured with a common labeling system that can be read

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### Activity 10 Regression Lines

Activity 10 Regression Lines Topic Area: Data Analysis and Probability NCTM Standard: Select and use appropriate statistical methods to analyze data. Objective: The student will be able to utilize the

### Scatter Diagrams. Correlation Classifications. Correlation Classifications. Correlation Classifications

Scatter Diagrams Correlation Classifications Scatter diagrams are used to demonstrate correlation between two quantitative variables. Often, this correlation is linear. This means that a straight line

### Scope and Sequence. AM CC Algebra 1 Library

The Accelerated Math scope and sequence is based on extensive mathematics and standards research and consultation from mathematics educators, mathematicians, and university researchers. You're sure to

### Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Study Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables

Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Module 7 Test Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. You are given information about a straight line. Use two points to graph the equation.

### Predicting Student Persistence Using Data Mining and Statistical Analysis Methods

Predicting Student Persistence Using Data Mining and Statistical Analysis Methods Koji Fujiwara Office of Institutional Research and Effectiveness Bemidji State University & Northwest Technical College

### Mathematics Common Core Cluster. Mathematics Common Core Standard. Domain

Mathematics Common Core Domain Mathematics Common Core Cluster Mathematics Common Core Standard Number System Know that there are numbers that are not rational, and approximate them by rational numbers.

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

### YEAR 8 SCHEME OF WORK - SECURE

YEAR 8 SCHEME OF WORK - SECURE Autumn Term 1 Number Spring Term 1 Real-life graphs Summer Term 1 Calculating with fractions Area and volume Decimals and ratio Straight-line graphs Half Term: Assessment

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email paul@esru.strath.ac.uk

Eighth International IBPSA Conference Eindhoven, Netherlands August -4, 2003 APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION Christoph Morbitzer, Paul Strachan 2 and

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Chapter 2. Looking at Data: Relationships. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 2 Looking at Data: Relationships Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 2 Looking at Data: Relationships 2.1 Scatterplots

### Linear Approximations ACADEMIC RESOURCE CENTER

Linear Approximations ACADEMIC RESOURCE CENTER Table of Contents Linear Function Linear Function or Not Real World Uses for Linear Equations Why Do We Use Linear Equations? Estimation with Linear Approximations

### Honors Math 7 Credit by Exam Information Plano ISD

Honors Math 7 Credit by Exam Information Plano ISD This exam is for students who have completed 6 th grade math (or honors 6 th grade math) and wish to accelerate into Honors Algebra I for the following

### AMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015

AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This

### Vocabulary. S.ID.B.6b: Use Residuals to Assess Fit of a Function

S.ID.B.6b: Use Residuals to Assess Fit of a Function GRAPHS AND STATISTICS S.ID.B.6b: Use Residuals to Assess Fit of a Function B. Summarize, represent, and interpret data on two categorical and quantitative

### Appendix E: Graphing Data

You will often make scatter diagrams and line graphs to illustrate the data that you collect. Scatter diagrams are often used to show the relationship between two variables. For example, in an absorbance

### Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association

Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects

### Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Chapter 10 - Practice Problems 1

Chapter 10 - Practice Problems 1 1. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the