Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler


Overview

- Regression problem: definition and parameters ϴ
- Prediction using ϴ as parameters
- Measuring the error
- Finding good parameters ϴ (direct minimization)
- Non-linear regression

Example of Regression: vehicle price estimation

Features x: fuel type, number of doors, engine size (61-326). Target y: price of the vehicle.

Training data (used to fit the model):

  Fuel type | Doors | Engine size | Price
  1         | 4     | 70          | 11000
  1         | 2     | 120         | 17000
  2         | 2     | 310         | 41000

Testing data (used to evaluate the model):

  Fuel type | Doors | Engine size | Price
  2         | 4     | 150         | 21000

Example of Regression: vehicle price estimation (continued)

With parameters ϴ = [-300, -200, 700, 130] (bias, fuel type, doors, engine size):

  Ex #1: -300 + (-200)*1 + 700*4 + 130*70  = 11400
  Ex #2: -300 + (-200)*1 + 700*2 + 130*120 = 16500
  Ex #3: -300 + (-200)*2 + 700*2 + 130*310 = 41000

Mean absolute training error = (400 + 500 + 0) / 3 = 300

Test example: -300 + (-200)*2 + 700*4 + 130*150 = 21600
Mean absolute testing error = 600 / 1 = 600
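The arithmetic on this slide can be reproduced in a short script. This is a Python sketch of the slide's example; the `predict` helper is our own naming, not from the slides.

```python
# Parameters from the slide: [bias, fuel type, doors, engine size]
theta = [-300, -200, 700, 130]

def predict(features):
    # features = [fuel type, number of doors, engine size]
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))

train_X = [[1, 4, 70], [1, 2, 120], [2, 2, 310]]
train_y = [11000, 17000, 41000]
test_X, test_y = [[2, 4, 150]], [21000]

train_pred = [predict(x) for x in train_X]   # [11400, 16500, 41000]
train_mae = sum(abs(y - p) for y, p in zip(train_y, train_pred)) / len(train_y)  # 300.0
test_pred = [predict(x) for x in test_X]     # [21600]
test_mae = sum(abs(y - p) for y, p in zip(test_y, test_pred)) / len(test_y)      # 600.0
```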

Supervised learning: notation

- Features x (input variables)
- Targets y (output variables)
- Predictions ŷ
- Parameters ϴ

The program (the "learner") is characterized by the parameters ϴ and a procedure (using ϴ) that outputs a prediction ŷ from the training features. The error is the distance between y and ŷ; using the target values as feedback, the learning algorithm changes ϴ to improve performance, and the model is evaluated by measuring the error.

Overview

- Regression problem: definition and parameters ϴ
- Prediction using ϴ as parameters
- Measuring the error
- Finding good parameters ϴ (direct minimization)
- Non-linear regression

Linear regression

Predictor: evaluate the line ŷ = ϴ0 + ϴ1 * X1 and return ŷ, the predicted target value (the black line in the plot). For example, for the line Y = 5 + 1.5*X1, a new instance with X1 = 8 gets the predicted value 5 + 1.5*8 = 17.

We define the form of the function f(x) explicitly, then find a good f(x) within that family.

[Figure: training points and the fitted line over feature X1.]

More dimensions?

[Figure: with two features x1 and x2, the predictions ŷ form a plane over the (x1, x2) feature space.]

Notation

ŷ = ϴ0 + ϴ1*x1 + ... + ϴn*xn defines a (hyper)plane in (n+1)-dimensional space, where n is the number of features in the dataset. Defining a constant feature x0 = 1 lets us write the prediction compactly as ŷ = ϴ0*x0 + ϴ1*x1 + ... + ϴn*xn = ϴ · x.

Overview

- Regression problem: definition and parameters ϴ
- Prediction using ϴ as parameters
- Measuring the error
- Finding good parameters ϴ (direct minimization)
- Non-linear regression

Supervised learning: notation

- Features x (input variables)
- Targets y (output variables)
- Predictions ŷ
- Parameters ϴ

The program (the "learner") is characterized by the parameters ϴ and a procedure (using ϴ) that outputs a prediction ŷ from the training features. The error is the distance between y and ŷ; using the target values as feedback, the learning algorithm changes ϴ to improve performance, and the model is evaluated by measuring the error.

Measuring error

- Red points: real target values y
- Black line: predictions ŷ = ϴ0 + ϴ1 * X
- Blue lines: the error, or residual, of each observation (the difference between the real value y and the predicted value ŷ)

[Figure: data points, fitted line, and vertical residual segments.]

Mean Squared Error

How can we quantify the error? With m the number of instances, y the real target values in the dataset, and ŷ = ϴ·x the values predicted by the model:

  J(ϴ) = (1/m) * Σ_{i=1..m} ( y(i) - ŷ(i) )²

Training error: the same formula with m the number of training instances. Testing error: the same formula evaluated on a held-out partition of the data (m the number of testing instances), used to check the predicted values on data the model has not seen.

MSE cost function

Rewriting in matrix form, with X the m x (n+1) matrix of input variables, y the output variables, m the number of instances, and n the number of features:

  J(ϴ) = (1/m) * (y - ϴXᵀ)(y - ϴXᵀ)ᵀ

(MATLAB) >> e = y - th*X'; J = e*e'/m;
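For readers working outside MATLAB, the same vectorized cost can be sketched in NumPy. This assumes the layout described above: X is m x (n+1) with the constant feature x0 = 1 in the first column; the data are illustrative, not from the slides.

```python
import numpy as np

# m = 3 examples, one feature plus the constant x0 = 1
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]])
y = np.array([9.0, 14.0, 19.0])
th = np.array([4.0, 2.5])        # here yhat = 4 + 2.5*x1 fits y exactly

e = y - X @ th                   # residuals, one per example
J = (e @ e) / len(y)             # mean squared error (0.0 for this th)
```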

Visualizing the error function

J(ϴ) is the error function: the surface plotted is the value of J, not a plane fitted to the output values. The dimensions are ϴ0 and ϴ1 instead of X1 and X2, and the output is J instead of the target value y.

[Figure: the J(ϴ) surface, and its representation as contours in the (ϴ0, ϴ1) plane; inner red contours have lower values of J, outer red contours higher values.]

Overview

- Regression problem: definition and parameters ϴ
- Prediction using ϴ as parameters
- Measuring the error
- Finding good parameters ϴ (direct minimization)
- Non-linear regression

Supervised learning: notation

- Features x (input variables)
- Targets y (output variables)
- Predictions ŷ
- Parameters ϴ

The program (the "learner") is characterized by the parameters ϴ and a procedure (using ϴ) that outputs a prediction ŷ from the training features. The error is the distance between y and ŷ; using the target values as feedback, the learning algorithm changes ϴ to improve performance, and the model is evaluated by measuring the error.

Finding good parameters

We want to find the parameters ϴ which minimize our error. Think of the error as a cost surface: the height at each point is the error residual for that ϴ.

MSE Minimum (m <= n+1)

Consider a simple problem: one feature, two data points. Then there are two unknowns and two equations (n+1 = 1+1 = 2 unknowns, m = 2 instances):

  ϴ0 + ϴ1*x(1) = y(1)
  ϴ0 + ϴ1*x(2) = y(2)

We can solve this system directly: the resulting ϴ gives a line (or plane) that exactly fits all the target values.

SSE Minimum (m > n+1)

Most of the time m > n+1 (e.g. n+1 = 2 unknowns but m = 3 instances), and there may be no linear function that hits all the data exactly. At the minimum of the cost function the gradient equals zero (the tangent plane is horizontal):

  ∇_ϴ J(ϴ) = 0

Reordering, we get the normal equations, which (in the row-vector convention of the previous slide) give

  ϴ = y X (XᵀX)⁻¹

so we just need to be able to compute this expression to find the parameters.
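The direct solution can be sketched with NumPy's linear solver, here in the equivalent column-vector form of the normal equations, (XᵀX)ϴ = Xᵀy. The three data points are illustrative, not from the slides.

```python
import numpy as np

# m = 3 examples, n+1 = 2 unknowns: no line passes through all three points
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Solve the normal equations (X^T X) theta = X^T y for the best-fit line
theta = np.linalg.solve(X.T @ X, X.T @ y)   # [theta0, theta1] = [2/3, 1/2]
residuals = y - X @ theta                    # nonzero: the fit is not exact
```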

Effects of the Mean Squared Error choice

Outlier data: an outlier is an observation that lies an abnormal distance from the other values. Squaring imposes a heavy penalty for large errors, e.g. a 16² cost for a single outlying datum, which pulls the fitted line away from the other points.

[Figure: a fitted line dragged toward one outlier.]
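A small numeric sketch (illustrative residuals, not the slide's data) shows how one large residual dominates the squared-error sum but not the absolute-error sum:

```python
# Residuals of four points; the last one is an outlier with residual 16
residuals = [1, -1, 2, 16]

sq_costs = [r ** 2 for r in residuals]      # [1, 1, 4, 256]
abs_costs = [abs(r) for r in residuals]     # [1, 1, 2, 16]

# Share of the total cost contributed by the single outlier
outlier_share_mse = sq_costs[-1] / sum(sq_costs)    # about 0.98
outlier_share_mae = abs_costs[-1] / sum(abs_costs)  # 0.8
```

Under the squared cost the outlier accounts for nearly all of the total, which is why the minimizing line moves toward it.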

Absolute error

[Figure: fitted lines compared on the same axes: MSE on the original data, absolute error on the original data, and absolute error on the outlier data.]

Error functions for regression

- Mean Squared Error: J(ϴ) = (1/m) Σ ( y(i) - ŷ(i) )²
- Mean Absolute Error: J(ϴ) = (1/m) Σ | y(i) - ŷ(i) |
- Something else entirely?

Arbitrary error functions can't be solved in closed form, so as an alternative we use gradient descent.
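A minimal gradient-descent sketch for the MSE cost; the step size, iteration count, and synthetic data are illustrative choices, not prescribed by the slides.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # x0 = 1 plus one feature
y = np.array([3.0, 5.0, 7.0])                        # generated by y = 1 + 2*x1
theta = np.zeros(2)
alpha = 0.05                                          # learning rate

for _ in range(5000):
    grad = (2.0 / len(y)) * X.T @ (X @ theta - y)    # gradient of the MSE
    theta -= alpha * grad                             # step downhill
# theta converges toward [1, 2]
```

For the MSE this reaches the same minimum as the normal equations; its advantage is that the same loop works for error functions with no closed-form solution.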

Overview

- Regression problem: definition and parameters ϴ
- Prediction using ϴ as parameters
- Measuring the error
- Finding good parameters ϴ (direct minimization)
- Non-linear regression

Nonlinear functions

Start with a single feature x and predict the target y. Now add features such as x², x³, ...: this is still linear regression, just in the new features. It is sometimes useful to think of this as a feature transform: convert a non-linear function into a linear one, and then solve it.
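The feature-transform idea can be sketched in a few lines; the quadratic dataset is hypothetical, and `numpy.linalg.lstsq` performs the ordinary linear solve in the transformed features.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 0 * x + 3 * x ** 2          # data from the quadratic y = 2 + 3x^2

# Transform: each x becomes the feature vector [1, x, x^2]
Phi = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary linear least squares in the new features
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
# theta is close to [2, 0, 3]: linear in theta, nonlinear in x
```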

Higher-order polynomials

  Y = ϴ0                                     (0th order)
  Y = ϴ0 + ϴ1*X                              (1st order)
  Y = ϴ0 + ϴ1*X + ϴ2*X² + ϴ3*X³              (3rd order)

Are more features better? The hypotheses are nested: a 2nd-order model is more general than a 1st-order one, a 3rd-order more general than a 2nd, and so on, so higher orders fit the observed data better.

[Figure: 0th-, 1st- and 3rd-order fits to the same data.]

Test data

After training the model, we go out and get more data from the world: new observations (x, y). How well does our model perform on them?

[Figure: training data and new test data plotted against the fitted model.]

Training versus test error

Plot the MSE as a function of model complexity (polynomial order). The training error decreases: a more complex function fits the training data better. What about new data?

- 0th to 2nd order: both errors decrease (underfitting region)
- Higher order: test error increases (overfitting region)

[Figure: training and test MSE versus polynomial order 0-10; training error keeps falling while test error rises past the best order.]
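The training-error side of this picture can be sketched with `numpy.polyfit` (the synthetic data and chosen orders are ours): because the polynomial models are nested, training MSE can only decrease as the order grows, even when the extra flexibility will hurt on test data.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = 1 + 2 * x_train + rng.normal(0, 0.1, x_train.size)  # noisy line

def train_mse(order):
    # Fit a polynomial of the given order and measure its training MSE
    coef = np.polyfit(x_train, y_train, order)
    pred = np.polyval(coef, x_train)
    return np.mean((y_train - pred) ** 2)

errs = [train_mse(k) for k in (1, 3, 5)]
# errs is (weakly) decreasing: higher order always fits training data at least as well
```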

Summary

- Regression problem definition (vehicle price estimation example)
- Prediction using ϴ
- Measuring the error: the difference between y and ŷ, e.g. absolute error or MSE
- Direct minimization: two cases, m <= n+1 and m > n+1
- Non-linear regression: finding the best nth-order polynomial for each problem (avoiding both overfitting and underfitting)