Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler


Myra Hall
9 years ago

1 Machine Learning and Data Mining: Regression Problem (adapted from Prof. Alexander Ihler)

2 Overview: regression problem definition and parameters θ; prediction using θ as parameters; measuring the error; finding good parameters θ (direct minimization problem); non-linear regression problems.

3 Example of Regression: vehicle price estimation problem. Features x: fuel type, number of doors, engine size (range 61 to 326). Target y: price of the vehicle. The training data (used to fit the model) and the testing data (used to evaluate the model) are both tables with the columns Fuel type, Number of doors, Engine size (61 to 326), and Price.

4 Example of Regression: vehicle price estimation with columns fuel type, number of doors, engine size and price. With θ = [-300, -200, 700, 130], each example's prediction is $\hat{y} = -300 - 200\,x_1 + 700\,x_2 + 130\,x_3$, where x1 is the fuel type, x2 the number of doors and x3 the engine size; this is evaluated for training examples Ex #1, Ex #2 and Ex #3 and for one test example. Mean absolute training error = (1/3)·(sum of the three absolute errors) = 300. Mean absolute testing error = (1/1)·(600) = 600.
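The arithmetic on this slide can be sketched as below. This is a minimal illustration, assuming the feature order [x0 = 1, fuel type, number of doors, engine size] to match θ = [-300, -200, 700, 130]. The feature rows and prices are hypothetical values invented here, not the slide's table, so the resulting errors differ from the slide's 300 and 600.

```python
def predict(theta, x):
    """Linear prediction theta . x, where x[0] = 1 is the constant (bias) feature."""
    return sum(t * xi for t, xi in zip(theta, x))

def mean_absolute_error(theta, X, y):
    """(1/m) * sum of |y - y_hat| over the m instances in X."""
    return sum(abs(yi - predict(theta, xi)) for xi, yi in zip(X, y)) / len(y)

theta = [-300, -200, 700, 130]

# HYPOTHETICAL rows [1, fuel type, number of doors, engine size] and prices,
# made up for illustration; they are not the slide's actual data.
X_train = [[1, 1, 4, 100], [1, 1, 2, 120], [1, 2, 2, 150]]
y_train = [15600, 16200, 20200]
X_test = [[1, 2, 4, 110]]
y_test = [16000]

print(mean_absolute_error(theta, X_train, y_train))  # 200.0
print(mean_absolute_error(theta, X_test, y_test))    # 400.0
```

The training error is averaged over m = 3 instances and the testing error over m = 1, exactly as in the slide's 1/3 and 1/1 factors.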

5 Supervised learning. Notation: features x (input variables), targets y (output variables), predictions ŷ, parameters θ. The program (learner) is characterized by some parameters θ and a procedure that uses θ to output a prediction ŷ from the features of the training data (examples). The model is evaluated by measuring the error, the distance between y and ŷ; using this feedback from the target values, the learning algorithm changes θ to improve performance.

6 Overview: regression problem definition and parameters θ; prediction using θ as parameters; measuring the error; finding good parameters θ (direct minimization problem); non-linear regression problems.

7 Linear regression. Define the form of the function f(x) explicitly, then find a good f(x) within that family. Predictor: evaluate the line $\hat{y} = \theta_0 + \theta_1 x_1$ and return ŷ, the predicted target value (the black line in the plot of target y against feature x1). For a new instance with x1 = 8, the predicted value is 17.

8 More dimensions? [Figure: target y plotted as a function of two features x1 and x2]

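9 Notation. Define the constant feature x0 = 1, and let n be the number of features in the dataset. Then $\hat{y} = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta x^\top$, with $\theta = [\theta_0, \dots, \theta_n]$ and $x = [x_0, x_1, \dots, x_n]$, is a hyperplane in (n+1)-dimensional space.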
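The role of the constant feature x0 = 1 can be sketched as follows: prepending it lets the intercept θ0 be handled by the same dot product as every other parameter. The parameter values below are hypothetical, chosen only for illustration.

```python
def add_bias(features):
    """Prepend the constant feature x0 = 1 to the n feature values."""
    return [1.0] + list(features)

def predict(theta, features):
    """y_hat = theta0*x0 + theta1*x1 + ... + thetan*xn, with x0 = 1."""
    x = add_bias(features)
    if len(theta) != len(x):
        raise ValueError("theta must have n + 1 entries")
    return sum(t * xi for t, xi in zip(theta, x))

theta = [0.5, 2.0, -1.0]           # hypothetical parameters for n = 2 features
print(predict(theta, [3.0, 4.0]))  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```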

10 Overview: regression problem definition and parameters θ; prediction using θ as parameters; measuring the error; finding good parameters θ (direct minimization problem); non-linear regression problems.

11 Supervised learning. Notation: features x (input variables), targets y (output variables), predictions ŷ, parameters θ. The program (learner) is characterized by some parameters θ and a procedure that uses θ to output a prediction ŷ from the features of the training data (examples). The model is evaluated by measuring the error, the distance between y and ŷ; using this feedback from the target values, the learning algorithm changes θ to improve performance.

12 Measuring error. In the plot, the red points are the real target values (observations), the black line is the prediction $\hat{y} = \theta_0 + \theta_1 x$, and the blue lines are the errors, or residuals: the differences between each real value y and its predicted value ŷ.

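13 Mean Squared Error. How can we quantify the error? $J(\theta) = \frac{1}{m}\sum_{j=1}^{m}\left(y^{(j)} - \hat{y}^{(j)}\right)^2$, where m is the number of data instances, y is the real target value in the dataset and $\hat{y}^{(j)} = \theta x^{(j)\top}$ is the target value predicted by θ. Training error: this formula with m = the number of training instances. Testing error: the same formula evaluated on a separate partition of the data that was not used for training, with m = the number of testing instances.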
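A minimal sketch of the same MSE formula applied once to training data and once to held-out test data. The parameters and data points are hypothetical; each row of X already includes x0 = 1.

```python
def mse(theta, X, y):
    """Mean squared error (1/m) * sum_j (y_j - y_hat_j)^2; rows of X include x0 = 1."""
    total = 0.0
    for x, target in zip(X, y):
        y_hat = sum(t * xi for t, xi in zip(theta, x))
        total += (target - y_hat) ** 2
    return total / len(y)  # divide by m, the number of instances in this set

theta = [1.0, 2.0]  # hypothetical parameters: y_hat = 1 + 2x

# Hypothetical split: the test rows are never used when choosing theta.
X_train = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y_train = [1.0, 4.0, 5.0]
X_test = [[1.0, 3.0]]
y_test = [9.0]

train_error = mse(theta, X_train, y_train)  # m = 3 training instances
test_error = mse(theta, X_test, y_test)     # m = 1 testing instance
print(train_error, test_error)
```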

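14 MSE cost function. Rewritten in matrix form: $J(\theta) = \frac{1}{m}\left(y^\top - \theta X^\top\right)\left(y^\top - \theta X^\top\right)^\top$, where X is the m × (n+1) matrix of input variables in the dataset (one row per instance, including x0 = 1), y is the m × 1 vector of output values, θ is the 1 × (n+1) parameter vector, m is the number of data instances and n is the number of features. (Matlab) >> e = y' - th*X'; J = e*e'/m;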

15 Visualizing the error function. J(θ) is the error function. The surface plotted is the value of J, not the plane fitted to the output values: its dimensions are θ0 and θ1 instead of x1 and x2, and its height is J instead of the target value y. In the 2D representation of J (axes θ0 and θ1), the inner red contours have lower values of J and the outer red contours have higher values of J.

16 Overview: regression problem definition and parameters θ; prediction using θ as parameters; measuring the error; finding good parameters θ (direct minimization problem); non-linear regression problems.

17 Supervised learning. Notation: features x, targets y, predictions ŷ, parameters θ. The program (learner) is characterized by some parameters θ and a procedure that uses θ to output a prediction from the features of the training data (examples). The model is evaluated by measuring the error; using this feedback from the target values, the learning algorithm changes θ to improve performance.

18 Finding good parameters. We want to find the parameters θ that minimize our error. Think of a cost surface: for each θ, its height is the error (residual) of that θ.

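19 MSE Minimum (m ≤ n+1). Consider a simple problem with one feature and two data points: n + 1 = 1 + 1 = 2 unknowns and m = 2 equations, where m is the number of data instances and n is the number of features. The system $y^\top = \theta X^\top$ can be solved directly: $\theta = y^\top \left(X^\top\right)^{-1}$. The resulting θ gives a line (or, with more features, a plane) that fits all target values exactly. If m < n + 1, there are infinitely many exact fits.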
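For the two-point case the system is 2 × 2, so it can be solved directly. The sketch below uses Cramer's rule on $X = [[1, x_1], [1, x_2]]$, which is equivalent to applying the inverse for this size; the two points are hypothetical.

```python
def solve_exact_line(p1, p2):
    """Solve theta0 + theta1*x = y for two points (m = n + 1 = 2) by Cramer's rule."""
    (x1, y1), (x2, y2) = p1, p2
    det = x2 - x1  # determinant of X = [[1, x1], [1, x2]]
    if det == 0:
        raise ValueError("the two x values must differ for a unique solution")
    theta0 = (y1 * x2 - x1 * y2) / det
    theta1 = (y2 - y1) / det
    return theta0, theta1

points = [(1.0, 3.0), (3.0, 7.0)]  # hypothetical data points (x, y)
theta0, theta1 = solve_exact_line(*points)
print(theta0, theta1)  # 1.0 2.0

# The line passes through every point, so every residual is zero.
residuals = [y - (theta0 + theta1 * x) for x, y in points]
print(residuals)       # [0.0, 0.0]
```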

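20 SSE Minimum (m > n+1). Most of the time m > n + 1, and there may be no linear function that hits all the data exactly. At a minimum of a differentiable function the gradient equals zero (the tangent is horizontal). Setting $\nabla_\theta J = -\frac{2}{m}\left(y^\top - \theta X^\top\right)X = 0$ and reordering, we have $\theta = y^\top X \left(X^\top X\right)^{-1}$ (assuming $X^\top X$ is invertible). Example: n + 1 = 1 + 1 = 2 unknowns with m = 3 data points. Computing the parameters then reduces to evaluating this matrix expression.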
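With a single feature, $X^\top X$ is only 2 × 2, so the normal equations can be written out by hand. The sketch below builds $X^\top X$ and $X^\top y$ from sums and solves them with Cramer's rule (a substitute for a general matrix inverse); the three data points are hypothetical, matching the slide's m = 3, n = 1 shape.

```python
def fit_normal_equations_1d(xs, ys):
    """Least-squares line for one feature via the 2x2 normal equations.

    X^T X = [[m, sum x], [sum x, sum x^2]] and X^T y = [sum y, sum x*y];
    the 2x2 system is solved with Cramer's rule.
    """
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (all x values are equal)")
    theta0 = (sy * sxx - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1

xs = [0.0, 1.0, 2.0]  # hypothetical data: m = 3 points, n = 1 feature
ys = [1.0, 2.0, 4.0]
theta0, theta1 = fit_normal_equations_1d(xs, ys)
print(theta0, theta1)  # about 0.8333 and 1.5

# Zero gradient at the minimum: residuals sum to 0 and are orthogonal to x.
residuals = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
print(sum(residuals), sum(r * x for r, x in zip(residuals, xs)))
```

No line passes through all three points here, yet the residuals satisfy exactly the zero-gradient conditions used to derive the normal equations.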

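21 Effects of the Mean Squared Error choice on outlier data. An outlier is an observation that lies an abnormal distance from other values. Because the error is squared, the cost for this one datum is very large: MSE places a heavy penalty on large errors, so a single outlier can pull the fitted line away from the other points.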
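One way to see the outlier effect is the simplest possible model, a constant prediction ŷ = θ0. For that model the value minimizing MSE is the mean of the targets, while the value minimizing mean absolute error is a median. The sketch below uses hypothetical targets and replaces one value by an outlier.

```python
from statistics import mean, median

def mse(c, ys):
    """MSE of the constant prediction y_hat = c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

def mae(c, ys):
    """Mean absolute error of the constant prediction y_hat = c."""
    return sum(abs(y - c) for y in ys) / len(ys)

clean = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical targets
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]  # same, with 5.0 replaced by an outlier

for name, ys in [("clean", clean), ("outlier", with_outlier)]:
    print(name, "MSE fit (mean):", mean(ys), "MAE fit (median):", median(ys))

# The outlier drags the MSE fit from 3.0 to 12.0; the MAE fit stays at 3.0.
```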

22 Absolute error. [Figure panels: MSE fit, original data; absolute-error fit, original data; absolute-error fit, outlier data]

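23 Error functions for regression: $J(\theta) = \frac{1}{m}\sum_{j=1}^{m}\left(y^{(j)} - \hat{y}^{(j)}\right)^2$ (mean squared error); $J(\theta) = \frac{1}{m}\sum_{j=1}^{m}\left|y^{(j)} - \hat{y}^{(j)}\right|$ (mean absolute error); or something else entirely (???). Arbitrary error functions can't be minimized in closed form, so as an alternative we use gradient descent.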
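A minimal gradient-descent sketch. It is applied here to MSE only so that the result can be checked against the closed-form solution; any differentiable J could be plugged in through `grad`. The data, step size `alpha` and number of `steps` are hypothetical choices. (For mean absolute error, the absolute value is not differentiable at zero, so a subgradient would be needed.)

```python
def mse_gradient(theta, xs, ys):
    """Gradient of J = (1/m) sum (y - theta0 - theta1*x)^2 with respect to (theta0, theta1)."""
    m = len(xs)
    errors = [y - (theta[0] + theta[1] * x) for x, y in zip(xs, ys)]
    g0 = -2.0 / m * sum(errors)
    g1 = -2.0 / m * sum(e * x for e, x in zip(errors, xs))
    return [g0, g1]

def gradient_descent(grad, theta, alpha=0.1, steps=2000):
    """Repeat theta <- theta - alpha * grad(theta) for a fixed number of steps."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta)
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta

xs = [0.0, 1.0, 2.0]  # hypothetical data
ys = [1.0, 2.0, 4.0]
theta = gradient_descent(lambda th: mse_gradient(th, xs, ys), [0.0, 0.0])
print(theta)  # close to the closed-form MSE solution [5/6, 1.5]
```

If `alpha` is too large the updates overshoot and diverge; too small and convergence is slow, which is why the step size is usually tuned per problem.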

24 Overview: regression problem definition and parameters θ; prediction using θ as parameters; measuring the error; finding good parameters θ (direct minimization problem); non-linear regression problems.

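25 Nonlinear functions. Single feature x, predict target y with $\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$. Add features $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$: this is linear regression in the new features. It is sometimes useful to think of this as a feature transform $\Phi(x) = [1, x, x^2, x^3]$, which converts a function that is non-linear in x into one that is linear in Φ(x), and then to solve it as before.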
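The feature transform can be sketched in a few lines: Φ(x) expands one input into powers of x, and the model remains an ordinary dot product with θ, so it is linear in the parameters even though it is cubic in x. The coefficients below are hypothetical.

```python
def poly_features(x, degree=3):
    """Feature transform Phi(x) = [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def predict(theta, phi):
    """Linear model in the transformed features: theta . Phi(x)."""
    return sum(t * f for t, f in zip(theta, phi))

theta = [1.0, 0.0, -2.0, 0.5]  # hypothetical cubic: 1 - 2x^2 + 0.5x^3
for x in [0.0, 1.0, 2.0]:
    print(x, predict(theta, poly_features(x)))
```

Because the model is still linear in θ, the direct minimization from the earlier slides applies unchanged once each row of X is replaced by Φ(x).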

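26 Higher-order polynomials. Are more features better? The hypotheses are nested: a 2nd-order polynomial is more general than a 1st-order one, a 3rd-order more general than a 2nd-order one, and so on, so a higher-order model fits the observed data at least as well. [Figure panels: 0th order, $\hat{y} = \theta_0$; 1st order, $\hat{y} = \theta_0 + \theta_1 x$; 3rd order, $\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$; 18th order]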

27 Test data. After training the model, go out and get more data from the world: new observations (x, y). How well does our model perform on them? [Figure: training data and new test data]

28 Training versus test error. Plot the MSE as a function of model complexity (polynomial order). Training error decreases: a more complex function fits the training data better. What about new data? From 0th to 2nd order, the test error decreases (underfitting); at higher orders, the test error increases (overfitting). [Figure: mean squared error versus polynomial order for the training data and the new test data, with the underfitting and overfitting regions marked]
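The training-versus-test experiment can be sketched end to end as below: polynomials of increasing order are fitted by least squares (normal equations solved with Gaussian elimination) and scored with MSE on both sets. The data are synthetic and hypothetical (y = sin(πx) plus Gaussian noise, x uniform in [-1, 1]), not the slide's data. Because the hypotheses are nested, the training MSE cannot increase with order; the exact test-error curve depends on the random sample.

```python
import math
import random

def design(xs, degree):
    """Feature transform: one row [1, x, x^2, ..., x^degree] per input x."""
    return [[x ** k for k in range(degree + 1)] for x in xs]

def solve(A, b):
    """Solve the square system A t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        t[r] = (M[r][n] - sum(M[r][c] * t[c] for c in range(r + 1, n))) / M[r][r]
    return t

def fit(xs, ys, degree):
    """Least squares via the normal equations (Phi^T Phi) theta = Phi^T y."""
    P = design(xs, degree)
    k = degree + 1
    A = [[sum(row[i] * row[j] for row in P) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(P, ys)) for i in range(k)]
    return solve(A, b)

def mse(theta, xs, ys):
    """Mean squared error of the polynomial whose coefficients are theta."""
    P = design(xs, len(theta) - 1)
    return sum((y - sum(t * f for t, f in zip(theta, row))) ** 2
               for row, y in zip(P, ys)) / len(ys)

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample(m):
    """Hypothetical data: y = sin(pi x) + Gaussian noise, x uniform in [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    ys = [math.sin(math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys

x_train, y_train = sample(12)
x_test, y_test = sample(200)

train_err, test_err = [], []
for degree in range(6):
    theta = fit(x_train, y_train, degree)
    train_err.append(mse(theta, x_train, y_train))
    test_err.append(mse(theta, x_test, y_test))
    print(f"order {degree}: train MSE {train_err[-1]:.4f}, test MSE {test_err[-1]:.4f}")
```

Solving the normal equations directly is fine at these low orders with x scaled to [-1, 1]; at high orders such as the 18th-order fit, $X^\top X$ becomes badly conditioned and a QR- or SVD-based least-squares solver is the safer choice.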

29 Summary. Regression problem definition: vehicle price estimation. Prediction using θ: $\hat{y} = \theta x^\top$. Measuring the error: the difference between y and ŷ, e.g. absolute error or MSE. Direct minimization: two cases, m ≤ n+1 and m > n+1. Non-linear regression: find the best nth-order polynomial for each problem, neither overfitting nor underfitting.
