Data Mining
Prediction
Jingpeng Li

What is Prediction?
- Predicting the identity of one thing based purely on the description of another, related thing
- Not necessarily future events, just unknowns
- Based on the relationship between a thing that you can know and a thing you need to predict

Terms
- Predictor => Predicted
- When building a predictive model, you have data covering both
- When using one, you have data describing the predictor and you want it to tell you the predicted value

How Does it Differ From Classification?
A classification problem could be seen as a predictor of classes, but:
- Predicted values are usually continuous, whereas classifications are discrete
- Predictions are often (but not always) about the future, whereas classifications are about the present
- Classification is more concerned with the input than the output

Usual Examples
- Predicting levels of sales that will result from a price change or advert
- Predicting whether or not it will rain based on current humidity
- Predicting the colour of a pottery glaze based on a mixture of base pigments
- Predicting how far up the charts a single will go
- Predicting how much revenue a book will bring

Techniques
Most prediction techniques are based on mathematical models:
- Simple statistical models such as linear regression
- Nonlinear statistics such as power series
- Neural networks, RBFs, etc.
All are based on fitting a curve through the data, that is, finding a relationship from the predictors to the predicted
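
As a minimal sketch of the simplest technique on the Techniques slide, the code below fits a straight line through a handful of points by least squares. The price and sales figures are hypothetical values invented for illustration, and the names `fit_line` and `predict_sales` are not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: price (predictor) and sales (predicted).
prices = [20.0, 25.0, 30.0, 35.0, 40.0]
sales = [900.0, 800.0, 700.0, 600.0, 500.0]

slope, intercept = fit_line(prices, sales)


def predict_sales(price):
    """Use the fitted line to predict sales at a price not in the data."""
    return slope * price + intercept
```

Building the model uses both columns; using it needs only the predictor, for example `predict_sales(32.0)`.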

Simple Worked Example
Predicting sales levels for a national newspaper
Predictors:
- Price
- Front cover story
- Competitions
- Advertising spend
Predicted:
- Sales in Units

The Data
[Table with columns Price, Cover, Competition, Advert spend, Sales. Only the Cover and Competition columns are legible, plus a single stray value, 22.]

Cover      Competition
Political  No
Political  No
Sport      No
Sport      No
Royal      No
Royal      No
Crime      No
Royal      No
Royal      High Val
Royal      Low Val
Sport      No
Sport      High Val
Royal      No
Crime      Low Val
Sport      Low Val
Sport      Low Val
Crime      No
Sport      No
Sport      High Val
Royal      No
Political  High Val
Sport      No
Political  No
Royal      High Val
Sport      No
Royal      No
Sport      No

Sales increase as price decreases, but other factors play a part too
[Figure: Sales by Price]

Mathematical Model
Learns the relationship between all predictors at once and the predicted outcome:
Sales = f(Price, Cover, Adverts, Competition)
Sales are a function of several variables. The job of a data mining algorithm is to find the function f

Neural Network Example
- A certain type of neural network, called a multilayer perceptron (MLP), can learn a function between our inputs (qualities of a newspaper) and the outcome (Sales)
- It works by building the function out of many small, simple functions, joined by weighted connections
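
A function such as f needs numeric inputs, while Cover and Competition are categories. One common way to bridge that gap is one-hot encoding, sketched below. The category names come from the data table; the encoding scheme, the function names and the numeric values in the example call are assumptions made for illustration, not something the slides prescribe.

```python
# Category values as they appear in the newspaper data table.
COVERS = ["Political", "Sport", "Royal", "Crime"]
COMPETITIONS = ["No", "Low Val", "High Val"]


def one_hot(value, categories):
    """Return a list with 1.0 in the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]


def encode(price, cover, competition, advert_spend):
    """Turn one newspaper edition into a flat numeric input vector."""
    return (
        [price]
        + one_hot(cover, COVERS)
        + one_hot(competition, COMPETITIONS)
        + [advert_spend]
    )


# Hypothetical edition: the price and advert spend are made-up numbers.
x = encode(22.0, "Royal", "High Val", 5.0)
```

In practice the numeric inputs would usually also be rescaled to a similar range before training, which this sketch leaves out.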

MLP Structure
[Figure: network diagram with Input Layer, Hidden Layer and Output Layer]
Every unit does the same thing:
$$O_j = f\left(\sum_i w_{ij} O_i\right)$$
$$f(a) = \frac{1}{1 + e^{-a}}$$

Neural Network Example
- A neural network uses the data to modify the weighted connections between all of its functions until it is able to predict the data accurately
- This process is referred to as training the neural network
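
The unit equation on the MLP Structure slide can be sketched as a forward pass through one hidden layer. This is a minimal illustration: the layer sizes and weight values are hypothetical, and there are no bias terms because the slide's formula does not include any.

```python
import math


def sigmoid(a):
    """f(a) = 1 / (1 + e^(-a))"""
    return 1.0 / (1.0 + math.exp(-a))


def layer_output(inputs, weights):
    """Compute O_j = f(sum_i w_ij * O_i) for every unit j in the next layer.

    weights[i][j] is the weight from unit i in this layer to unit j in the next.
    """
    n_units = len(weights[0])
    return [
        sigmoid(sum(weights[i][j] * inputs[i] for i in range(len(inputs))))
        for j in range(n_units)
    ]


def forward(inputs, w_input_hidden, w_hidden_output):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer_output(inputs, w_input_hidden)
    return layer_output(hidden, w_hidden_output)


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output unit.
w_ih = [[0.5, -0.5], [0.25, 0.75]]
w_ho = [[1.0], [-1.0]]
y = forward([1.0, 0.0], w_ih, w_ho)
```

Because the output unit is also a sigmoid, its value always lies between 0 and 1, so a target such as sales would need rescaling into that range.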

Neural Network Training
1. Prepare the data so that a file contains the predictors and the predicted variables, with an example per row
2. Split the data into a test set and a training set
3. Read each row in turn into the neural network, presenting the predictors as inputs and the predicted value as the target output
4. Make a prediction and compare the value given by the neural network to the target value
5. Update the weights (see next slide)
6. Present the next example in the file
7. Repeat until the error no longer reduces; ideally, stop when the test error is at its lowest

How are the Weights Changed?
- Training data has inputs and outputs; in this example, newspaper details and sales figures
- The MLP starts with random weights
- Each example in the training data is used as an input and the network generates an output
- The difference between that output and the value in the training data is known as the error
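
The seven training steps can be sketched as a loop. To keep the example short it trains a single linear unit with a simple error-driven update rather than a full MLP, which is a deliberate simplification. The data, the learning rate, the `patience` rule for deciding that the test error has stopped improving, and all function names are assumptions made for illustration.

```python
import random


def predict(weights, x):
    """A single linear unit: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def mean_squared_error(weights, rows):
    return sum((target - predict(weights, x)) ** 2 for x, target in rows) / len(rows)


def split(rows, test_fraction=0.25, seed=0):
    """Step 2: shuffle a copy of the rows and hold some back as a test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def train(rows, rate=0.2, max_epochs=200, patience=10):
    train_rows, test_rows = split(rows)
    weights = [0.0] * len(rows[0][0])
    best_weights, best_error, since_best = weights[:], float("inf"), 0
    for _ in range(max_epochs):
        for x, target in train_rows:                # steps 3 and 6
            error = target - predict(weights, x)    # step 4
            weights = [w + rate * error * xi        # step 5
                       for w, xi in zip(weights, x)]
        test_error = mean_squared_error(weights, test_rows)
        if test_error < best_error:                 # step 7: track the lowest test error
            best_weights, best_error, since_best = weights[:], test_error, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_weights, best_error


# Step 1: hypothetical rows of (predictors, predicted); the first input is a constant 1.0.
rows = [([1.0, i / 10], 1.0 + 2.0 * i / 10) for i in range(11)]
weights, error = train(rows)
```

The weights that are returned are those from the pass with the lowest test error, not necessarily the final ones.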

Error Back Propagation
An algorithm known as error back propagation uses this error value to change the weights
The weight change from the input layer unit i to hidden layer unit j is:
$$\Delta w_{ij} = \eta \, \delta_j \, x_i \quad \text{where} \quad \delta_j = o_j (1 - o_j) \sum_k \delta_k w_{jk}$$
The weight change from the hidden layer unit j to the output layer unit k is:
$$\Delta w_{jk} = \eta \, \delta_k \, o_j \quad \text{where} \quad \delta_k = (\mathrm{error}_k) \, y_k (1 - y_k)$$

Qualities of a Predictor
Whichever technique you use, it should have the following qualities:
- Ability to make correct predictions on data that is not in the original training data
- Ability to provide a certainty measure with its predictions
How well a solution performs depends on both the data and the person who built it
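
The two weight-change rules on the Error Back Propagation slide can be sketched as a single update step for a network with one hidden layer of sigmoid units. The sign convention error_k = target_k - y_k is an assumption (the slides only call it the difference between output and target), and the starting weights, learning rate and training example are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_ij, w_jk):
    """Return hidden outputs o_j and network outputs y_k."""
    o = [sigmoid(sum(w_ij[i][j] * x[i] for i in range(len(x))))
         for j in range(len(w_jk))]
    y = [sigmoid(sum(w_jk[j][k] * o[j] for j in range(len(o))))
         for k in range(len(w_jk[0]))]
    return o, y


def backprop_step(x, target, w_ij, w_jk, eta=0.5):
    """Apply one weight update in place; return the squared error before it."""
    o, y = forward(x, w_ij, w_jk)
    # delta_k = error_k * y_k * (1 - y_k), assuming error_k = target_k - y_k.
    delta_k = [(target[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    # delta_j = o_j * (1 - o_j) * sum_k delta_k * w_jk, using the weights before the update.
    delta_j = [o[j] * (1.0 - o[j]) * sum(delta_k[k] * w_jk[j][k] for k in range(len(y)))
               for j in range(len(o))]
    for j in range(len(o)):
        for k in range(len(y)):
            w_jk[j][k] += eta * delta_k[k] * o[j]   # hidden -> output rule
    for i in range(len(x)):
        for j in range(len(o)):
            w_ij[i][j] += eta * delta_j[j] * x[i]   # input -> hidden rule
    return sum((target[k] - y[k]) ** 2 for k in range(len(y)))


# Hypothetical starting weights: 3 inputs, 2 hidden units, 1 output.
w_ij = [[0.1, -0.2], [0.3, 0.4], [0.0, 0.1]]
w_jk = [[0.2], [-0.3]]
errors = [backprop_step([1.0, 0.5, 1.0], [0.8], w_ij, w_jk) for _ in range(200)]
```

Repeating the step on the same example should shrink the squared error, which can be checked by comparing `errors[0]` with `errors[-1]`.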

Important Concepts
Overfitting
- A data mining predictor can capture the structure of the data so well that irrelevant details are picked up and used when they are not generally true
Data Quantity and Quality
- Insufficient data, or data that does not capture the relationship between predictors and predicted, can produce a very poor solution

Important Concepts
Multiple solutions
- It is possible (easy, in fact) to build more than one correct (or equally accurate) predictor from the same data set
- Several such predictors should be built and compared
- A winner might be chosen, or several could be used as a panel of experts
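
One simple reading of the "panel of experts" idea is to average the predictions of several predictors and look at how far they disagree. The sketch below assumes that reading; the three predictors are hypothetical stand-ins for models built from the same data, and the spread is only a crude signal of disagreement, not a formal certainty measure.

```python
def panel_predict(predictors, x):
    """Return the mean prediction of the panel and the spread between its members."""
    predictions = [predictor(x) for predictor in predictors]
    mean = sum(predictions) / len(predictions)
    spread = max(predictions) - min(predictions)
    return mean, spread


# Three hypothetical predictors of sales from price.
def expert_a(price):
    return 1300.0 - 20.0 * price


def expert_b(price):
    return 1250.0 - 18.0 * price


def expert_c(price):
    return 1350.0 - 22.0 * price


mean, spread = panel_predict([expert_a, expert_b, expert_c], 30.0)
```

Choosing a single winner instead would mean comparing each predictor's error on held-out test data and keeping the best one.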

Nonlinear?
Curvy! Or, to be more specific:
If x predicts y, then they have a nonlinear relationship if the effect on y of a small change in x depends on the current value of x.

Nonlinear?
- Wherever you are along the line on the linear plot above, moving one unit to the right will move you up 5 units. The 1:5 ratio is constant, so the relationship is linear
- Here, moving a unit to the right on the line above will carry you up a different amount, depending on where you are: nonlinear
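
The definition above can be checked numerically: measure the effect of a one-unit step in x at several starting points. The linear function uses the slope of 5 from the slide; the curved function y = x squared is a hypothetical example chosen for illustration.

```python
def step_effect(f, x, step=1.0):
    """How much f changes when x moves one step to the right."""
    return f(x + step) - f(x)


def linear(x):
    return 5.0 * x


def curved(x):
    return x ** 2


starts = (0.0, 1.0, 2.0)
linear_effects = [step_effect(linear, x) for x in starts]
curved_effects = [step_effect(curved, x) for x in starts]
```

For the linear function every entry of `linear_effects` is the same, while the entries of `curved_effects` change with the starting point.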

NonLinear
- Note that if you have more than one predictor, nonlinearity can occur as two or more predictors combine
- E.g. putting the price up 1p will cause you to sell 1000 fewer newspapers when there is a political story on the front cover, but only 500 fewer with sport on the cover

Advantages of Neural Networks
- Very powerful predictors: almost always better than any rule-based system a human expert could design
- Can cope with nonlinear relationships and with multiple numeric and discrete variables
- Able to generalise to data that they have not seen before
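
The price and cover example from the NonLinear slide can be written out as a small sketch in which the effect of a price rise depends on the second predictor. The figures of 1000 and 500 come from the slide; the function name and the assumption that the loss scales with each extra penny are illustrative only.

```python
# Newspapers lost per 1p price rise, by front cover story (figures from the slide).
SALES_LOST_PER_PENNY = {"Political": 1000, "Sport": 500}


def sales_change(price_rise_pence, cover):
    """Change in sales for a price rise; the size depends on the cover story."""
    return -SALES_LOST_PER_PENNY[cover] * price_rise_pence


political_change = sales_change(1, "Political")
sport_change = sales_change(1, "Sport")
```

The same change in one predictor (price) has a different effect depending on the value of another (cover), which a model that treats each predictor separately cannot represent.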

Disadvantages
- How predictions are gained can be hard for a human user to understand
- Not easy to ask why an answer was given (though some help is possible)
- No rules to look at
- Can make big errors if not trained properly
- Requires a certain degree of faith!
More information