Data Mining CS 341, Spring 2007
Lecture 6: Classification issues, regression, Bayesian classification
Review: Decision Trees, Neural Networks
Prentice Hall

Data Mining Core Techniques
- Classification
- Clustering
- Association Rules

Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms
- Classification Problem Overview
- Classification Techniques: Regression, Bayesian classification, Distance, Decision Trees, Rules, Neural Networks

Classification Problem
Given a database D = {t1, t2, ..., tn} and a set of classes C = {C1, ..., Cm}, the Classification Problem is to define a mapping f: D -> C where each ti is assigned to one class. The mapping actually divides D into equivalence classes.
Prediction is similar, but may be viewed as having an infinite number of classes.
Classification Examples
- Teachers classify students' grades as A, B, C, D, or F.
- Identify mushrooms as poisonous or edible.
- Predict when a river will flood.
- Identify individuals who are credit risks.
- Speech recognition
- Pattern recognition

Classification Ex: Grading
If x >= 90 then grade = A.
If 80 <= x < 90 then grade = B.
If 70 <= x < 80 then grade = C.
If 60 <= x < 70 then grade = D.
If x < 60 then grade = F.
(Figure: a decision tree on x with the thresholds above and leaves A, B, C, D, F.)

Classification Ex: Letter Recognition
View letters as constructed from 5 components.
(Figure: letters A through F built from the components.)

Classification Techniques
Approach:
1. Create a specific model by evaluating training data (or using domain experts' knowledge).
2. Apply the model developed to new data.
Classes must be predefined.
Most common techniques use decision trees, neural networks, or are based on distances or statistical methods.

Issues in Classification
- Defining Classes: partitioning based, distance based
- Missing Data: ignore, or replace with an assumed value
- Overfitting: use a large set of training data; filter out erroneous or noisy data
- Measuring Performance: classification accuracy on test data, confusion matrix, OC curve
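The grading rules above are a simple threshold classifier. A minimal sketch of the same decision tree in code (the function name and test scores are illustrative, not from the slides):

```python
def grade(x):
    """Classify a numeric score into a letter grade using the slide's thresholds."""
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"

print([grade(s) for s in [95, 85, 75, 65, 50]])  # -> ['A', 'B', 'C', 'D', 'F']
```

Checking the thresholds in descending order means each `if` implicitly carries the upper bound of the previous rule, mirroring the decision-tree figure.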
Classification Performance
- True positive (TP): ti predicted to be in Cj and is actually in it.
- False positive (FP): ti predicted to be in Cj but is not actually in it.
- True negative (TN): ti not predicted to be in Cj and is not actually in it.
- False negative (FN): ti not predicted to be in Cj but is actually in it.
(Figure: 2x2 grid of true positive, false positive, false negative, true negative.)

Confusion Matrix
An m x m matrix. Entry Ci,j indicates the number of tuples assigned to Cj, but where the correct class is Ci. The best solution will have nonzero values only on the diagonal.

Height Example Data
Name       Gender  Height  Output1  Output2
Kristina   F       1.6m    Short    Medium
Jim        M       2m      Tall     Medium
Maggie     F       1.9m    Medium   Tall
Martha     F       1.88m   Medium   Tall
Stephanie  F       1.7m    Short    Medium
Bob        M       1.85m   Medium   Medium
Kathy      F       1.6m    Short    Medium
Dave       M       1.7m    Short    Medium
Worth      M       2.2m    Tall     Tall
Steve      M       2.1m    Tall     Tall
Debbie     F       1.8m    Medium   Medium
Todd       M       1.95m   Medium   Medium
Kim        F       1.9m    Medium   Tall
Amy        F       1.8m    Medium   Medium
Wynette    F       1.75m   Medium   Medium

Confusion Matrix Example
Using the height data example with Output1 (correct) and Output2 (actual) assignment:

Actual       Assignment
Membership   Short  Medium  Tall
Short        0      4       0
Medium       0      5       3
Tall         0      1       2

Operating Characteristic Curve
(Figure: OC curve.)
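The confusion matrix above can be reproduced directly from the height table. A minimal sketch (variable names are illustrative; the (Output1, Output2) pairs are taken from the table):

```python
# Build the 3x3 confusion matrix for the height example, with Output1 as the
# correct class and Output2 as the classifier's assignment.
classes = ["Short", "Medium", "Tall"]

# (correct, assigned) pairs, one per tuple in the height example table.
pairs = [
    ("Short", "Medium"), ("Tall", "Medium"), ("Medium", "Tall"),
    ("Medium", "Tall"), ("Short", "Medium"), ("Medium", "Medium"),
    ("Short", "Medium"), ("Short", "Medium"), ("Tall", "Tall"),
    ("Tall", "Tall"), ("Medium", "Medium"), ("Medium", "Medium"),
    ("Medium", "Tall"), ("Medium", "Medium"), ("Medium", "Medium"),
]

# matrix[i][j] = number of tuples whose correct class is classes[i]
# but which were assigned to classes[j].
matrix = [[0] * 3 for _ in range(3)]
for correct, assigned in pairs:
    matrix[classes.index(correct)][classes.index(assigned)] += 1

for label, row in zip(classes, matrix):
    print(label, row)
```

The printed rows match the slide's matrix: only the Medium/Medium (5) and Tall/Tall (2) entries lie on the diagonal, so this classifier misassigns 8 of the 15 tuples.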
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms
- Classification Problem Overview
- Classification Techniques: Regression, Distance, Decision Trees, Rules, Neural Networks

Regression
- Assume the data fits a predefined function.
- Determine the best values for the parameters in the model.
- Estimate an output value based on input values.
- Can be used for classification and prediction.

Linear Regression
Assume the relation of the output variable to the input variables is a linear function of some parameters. Determine the best values for the regression coefficients c0, c1, ..., cn. Assume an error term:
y = c0 + c1 x1 + ... + cn xn + e
Estimate the error using the mean squared error over the training set.

Example 4.3
y = c0 + e
Find the value for c0 that best partitions the height values into two classes: short and medium.
The training data yi is {1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75}.
How?

Example 4.4
y = c0 + c1 x1 + e
Find the values for c0 and c1 that best predict the class. Assume 0 for the short class and 1 for the medium class.
The training data (xi, yi) is {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}.
How?

Linear Regression Poor Fit
(Figure: a straight line fitting the data poorly.)
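One way to answer Example 4.4's "How?" is the closed-form least-squares solution for a single input. A minimal sketch, assuming short = 0, medium = 1 and an assumed classification threshold of 0.5 (the threshold is not stated on the slides):

```python
# Least-squares fit of y = c0 + c1*x for Example 4.4 (short = 0, medium = 1).
xs = [1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75]
ys = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept from the means.
c1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
c0 = mean_y - c1 * mean_x

def classify(height):
    # Predicted y closer to 1 -> medium, closer to 0 -> short (0.5 cutoff assumed).
    return "medium" if c0 + c1 * height >= 0.5 else "short"

print(round(c0, 3), round(c1, 3))
print(classify(1.6), classify(1.9))
```

The fitted line is only a crude class-membership estimate; its output is not bounded to [0, 1], which is exactly the weakness the logistic regression slides address next.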
Classification Using Regression
- Division: use the regression function to divide the area into regions.
- Prediction: use the regression function to predict a class membership function.
(Figures: division and prediction illustrations.)

Logistic Regression
A generalized linear model, extensively used in the medical and social sciences. It has the following form:
ln(p / (1 - p)) = c0 + c1 x1 + ... + ck xk
where p is the probability of being in the class and 1 - p is the probability of not being in it. The parameters c0, c1, ..., ck are usually estimated by maximum likelihood (maximize the probability of observing the given values).

Why Logistic Regression?
- p is in the range [0, 1], and a good model should produce p values close to 0 or 1; a linear function is not suitable for modeling p directly.
- Consider the odds p / (1 - p): as p increases, the odds increase. The odds lie in the range [0, +infinity), which is asymmetric.
- The log odds lie in the range (-infinity, +infinity), which is symmetric.

Linear Regression vs. Logistic Regression
(Figure: comparison of the linear and logistic fits.)
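The log-odds link and its inverse (the logistic function) can be sketched directly. The coefficients below are illustrative assumptions chosen for the height example, not fitted or taken from the slides:

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to the whole real line: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def logistic(c0, c1, x):
    """Invert the link: p = 1 / (1 + exp(-(c0 + c1*x)))."""
    return 1 / (1 + math.exp(-(c0 + c1 * x)))

print(log_odds(0.5))  # -> 0.0 (even odds)
# With assumed coefficients c0 = -10, c1 = 6, the predicted probability
# of "medium" rises with height and always stays inside (0, 1):
print(round(logistic(-10.0, 6.0, 1.6), 3))
print(round(logistic(-10.0, 6.0, 1.9), 3))
```

Note the symmetry the slide points out: `log_odds(p)` and `log_odds(1 - p)` differ only in sign, whereas the raw odds do not have that property.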
Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms
- Classification Problem Overview
- Classification Techniques: Regression, Bayesian classification

Bayes Theorem
Posterior probability: P(h1 | xi). Prior probability: P(h1).
Bayes theorem: P(h1 | xi) = P(xi | h1) P(h1) / P(xi)
Assign probabilities of hypotheses given a data value.

Naive Bayes Classification
Assume that the contributions by all attributes are independent and that each contributes equally to the classification problem. If ti has m independent attributes {xi1, ..., xim}, then
P(ti | Cj) = product over k = 1..m of P(xik | Cj).

Example: using Output1 as the classification results (the height example data shown earlier).

Example 4.5
Step 1: Calculate the prior probabilities.
P(short) = 4/15 = 0.267
P(medium) = 8/15 = 0.533
P(tall) = 3/15 = 0.2
Step 2: Calculate the conditional probabilities.
P(Genderi | Cj), where Genderi = F or M and Cj = short, medium, or tall.
P(Heighti | Cj), where Heighti falls in one of the bins (0,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], (>2.0).
Example 4.5 (cont'd)
Counts and conditional probabilities p(xi | Cj):

                     Count                  Probability
Attribute            short  medium  tall    short  medium  tall
Gender  M            1      2       3       1/4    2/8     3/3
        F            3      6       0       3/4    6/8     0/3
Height  (0,1.6]      2      0       0       2/4    0       0
        (1.6,1.7]    2      0       0       2/4    0       0
        (1.7,1.8]    0      3       0       0      3/8     0
        (1.8,1.9]    0      4       0       0      4/8     0
        (1.9,2.0]    0      1       1       0      1/8     1/3
        (>2.0)       0      0       2       0      0       2/3

Example 4.5 (cont'd)
Given a tuple t = {Adam, M, 1.95m}.
Step 3: Calculate P(t | Cj).
P(t | short) = 1/4 x 0 = 0
P(t | medium) = 2/8 x 1/8 = 0.031
P(t | tall) = 3/3 x 1/3 = 0.333
Step 4: Calculate P(t).
P(t) = P(t | short)P(short) + P(t | medium)P(medium) + P(t | tall)P(tall) = 0.0826

Example 4.5 (cont'd)
Step 5: Calculate P(Cj | t) using Bayes rule.
P(short | t) = P(t | short)P(short) / P(t) = 0
P(medium | t) = 0.2
P(tall | t) = 0.799
Last step: classify the new tuple as tall.

A Summary
Step 1: Calculate the prior probability of each class, P(Cj).
Step 2: Calculate the conditional probability for each attribute value, P(xik | Cj).
Step 3: Calculate the conditional probability P(t | Cj).
Step 4: Calculate the prior probability of the tuple, P(t).
Step 5: Calculate the posterior probability of each class given the tuple, P(Cj | t), using Bayes rule.
Step 6: Classify the tuple: it belongs to the class with the highest posterior probability.
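The six summary steps can be sketched end to end for the Adam tuple, using exact fractions from the tables above so no rounding accumulates:

```python
# Naive Bayes computation for Example 4.5: classify t = {Adam, M, 1.95m}.
from fractions import Fraction as F

# Step 1: priors from the Output1 column (4 short, 8 medium, 3 tall of 15).
priors = {"short": F(4, 15), "medium": F(8, 15), "tall": F(3, 15)}

# Step 2: conditionals for Adam's attribute values from the count table:
# gender M, and height in the bin (1.9, 2.0].
p_gender_M = {"short": F(1, 4), "medium": F(2, 8), "tall": F(3, 3)}
p_height_bin = {"short": F(0, 4), "medium": F(1, 8), "tall": F(1, 3)}

# Step 3: P(t | Cj) under the independence assumption.
likelihood = {c: p_gender_M[c] * p_height_bin[c] for c in priors}

# Step 4: P(t) = sum over classes of P(t | Cj) P(Cj).
p_t = sum(likelihood[c] * priors[c] for c in priors)

# Step 5: posterior P(Cj | t) by Bayes rule.
posterior = {c: likelihood[c] * priors[c] / p_t for c in priors}
for c in priors:
    print(c, float(posterior[c]))

# Step 6: assign t to the class with the highest posterior.
print(max(posterior, key=posterior.get))  # -> tall
```

With exact arithmetic P(t) = 1/12 and the posteriors are 0, 1/5, and 4/5; the slide's 0.0826 and 0.799 reflect the intermediate rounding of 0.031 and 0.333.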
Next Lecture
Classification: distance-based algorithms, decision tree-based algorithms.
HW2 will be announced!