Review: Classification Outline




Data Mining, CS 341, Spring 2007
Lecture 6: Classification (issues, regression, Bayesian classification)
Review: Decision Trees, Neural Networks
© Prentice Hall

Data Mining Core Techniques
- Classification
- Clustering
- Association Rules

Classification Outline
Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.
- Classification Problem Overview
- Classification Techniques
  - Regression
  - Bayesian classification
  - Distance
  - Decision Trees
  - Rules
  - Neural Networks

Classification Problem
Given a database D = {t1, t2, ..., tn} and a set of classes C = {C1, ..., Cm}, the Classification Problem is to define a mapping f: D -> C where each ti is assigned to one class. The mapping actually divides D into equivalence classes. Prediction is similar, but may be viewed as having an infinite number of classes.

Classification Examples
- Teachers classify students' grades as A, B, C, D, or F.
- Identify mushrooms as poisonous or edible.
- Predict when a river will flood.
- Identify individuals with credit risks.
- Speech recognition
- Pattern recognition

Classification Ex: Grading
If x >= 90 then grade = A.
If 80 <= x < 90 then grade = B.
If 70 <= x < 80 then grade = C.
If 60 <= x < 70 then grade = D.
If x < 60 then grade = F.
(Figure: the same rule drawn as a decision tree splitting on x at 90, 80, 70, and 60, with leaves A through F.)

Classification Ex: Letter Recognition
View letters as constructed from 5 components. (Figure: letters A through F built from the components.)

Classification Techniques
Approach:
1. Create a specific model by evaluating training data (or using domain experts' knowledge).
2. Apply the model developed to new data.
Classes must be predefined. Most common techniques use decision trees, neural networks, or are based on distances or statistical methods.

Issues in Classification
- Defining classes: partitioning based, distance based
- Missing data: ignore, or replace with an assumed value
- Overfitting: use a large set of training data; filter out erroneous or noisy data
- Measuring performance: classification accuracy on test data, confusion matrix, OC curve
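The grading rule above can be sketched as a small classifier function (a minimal Python sketch; the function name and sample scores are illustrative, not from the slides):

```python
# Grading rule from the slide: map a numeric score x to a letter grade.
def grade(x):
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"

# Illustrative scores, one per class.
for score in (95, 85, 75, 65, 50):
    print(score, grade(score))
```

Because each test falls through to the next threshold, the chain of `if` statements encodes exactly the interval rules on the slide.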

Classification Accuracy / Performance
- True positive (TP): ti predicted to be in Cj and actually in it.
- False positive (FP): ti predicted to be in Cj but not actually in it.
- True negative (TN): ti not predicted to be in Cj and not actually in it.
- False negative (FN): ti not predicted to be in Cj but actually in it.

Confusion Matrix
An m x m matrix whose entry Ci,j gives the number of tuples assigned to class Cj but whose correct class is Ci. The best solution has non-zero values only on the diagonal.

Height Example Data

Name       Gender  Height  Output1  Output2
Kristina   F       1.6m    Short    Medium
Jim        M       2m      Tall     Medium
Maggie     F       1.9m    Medium   Tall
Martha     F       1.88m   Medium   Tall
Stephanie  F       1.7m    Short    Medium
Bob        M       1.85m   Medium   Medium
Kathy      F       1.6m    Short    Medium
Dave       M       1.7m    Short    Medium
Worth      M       2.2m    Tall     Tall
Steven     M       2.1m    Tall     Tall
Debbie     F       1.8m    Medium   Medium
Todd       M       1.95m   Medium   Medium
Kim        F       1.9m    Medium   Tall
Amy        F       1.8m    Medium   Medium
Wynette    F       1.75m   Medium   Medium

Confusion Matrix Example
Using the height data with Output1 (correct) and Output2 (actual) assignment:

Actual               Assignment
membership    Short  Medium  Tall
Short         0      4       0
Medium        0      5       3
Tall          0      1       2

Operating Characteristic Curve
(Figure: OC curve omitted.)
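The confusion matrix on this slide can be recomputed from the height data, treating Output1 as the correct class and Output2 as the classifier's assignment (a minimal Python sketch; the variable names are illustrative):

```python
# Output1 (correct) and Output2 (assigned) columns of the height data,
# in table order from Kristina to Wynette.
correct = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
           "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium",
           "Medium"]
assigned = ["Medium", "Medium", "Tall", "Tall", "Medium", "Medium", "Medium",
            "Medium", "Tall", "Tall", "Medium", "Medium", "Tall", "Medium",
            "Medium"]

classes = ["Short", "Medium", "Tall"]
# Entry matrix[Ci][Cj]: tuples assigned to Cj whose correct class is Ci.
matrix = {c: {a: 0 for a in classes} for c in classes}
for c, a in zip(correct, assigned):
    matrix[c][a] += 1

for c in classes:
    print(c, [matrix[c][a] for a in classes])
```

Running this reproduces the slide's rows: Short [0, 4, 0], Medium [0, 5, 3], Tall [0, 1, 2].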

Regression
- Assume the data fits a predefined function.
- Determine the best values for the parameters in the model.
- Estimate an output value based on input values.
- Can be used for classification and prediction.

Linear Regression
Assume the relation of the output variable to the input variables is a linear function of some parameters. Determine the best values for the regression coefficients c0, c1, ..., cn. Assume an error term:
y = c0 + c1 x1 + ... + cn xn + ε
Estimate the error using the mean squared error over the training set.

Example 4.3
Y = c0 + ε. Find the value for c0 that best partitions the height values into the classes short and medium. The training data yi is {1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75}. How?

Example 4.4
Y = c0 + c1 x1 + ε. Find the values for c0 and c1 that best predict the class. Assume 0 for the short class and 1 for the medium class. Per the Output1 labels in the height data, the training pairs (xi, yi) are {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}. How?

Linear Regression Poor Fit
(Figure: a linear fit that separates the two classes poorly.)
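Example 4.4 can be worked with the closed-form simple least-squares fit. The sketch below assumes the Output1 labels (0 = short, 1 = medium) for the twelve short/medium tuples; the resulting coefficient values are not given on the slides:

```python
# Training pairs (height, class) with 0 = short, 1 = medium.
data = [(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0),
        (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Closed-form simple linear regression: c1 = Sxy / Sxx, c0 = ybar - c1*xbar.
c1 = (sum((x - mean_x) * (y - mean_y) for x, y in data)
      / sum((x - mean_x) ** 2 for x, _ in data))
c0 = mean_y - c1 * mean_x

print("c0 =", round(c0, 3), "c1 =", round(c1, 3))
```

Classifying by thresholding the fitted value at 0.5 then assigns taller tuples to medium, which is the behavior the regression-based classifier aims for.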

Classification Using Regression
- Division: use the regression function to divide the area into regions.
- Prediction: use the regression function to predict a class membership function.
(Figures: division of the space into regions, and prediction of a membership function.)

Logistic Regression
- A generalized linear model.
- Extensively used in the medical and social sciences.
- It has the following form:
  log_e(p / (1 - p)) = c0 + c1 x1 + ... + ck xk
  where p is the probability of being in the class and 1 - p is the probability of not being in it.
- The parameters c0, c1, ..., ck are usually estimated by maximum likelihood (maximize the probability of observing the given values).

Why Logistic Regression
- p is in the range [0, 1]; a good model would have p values close to 0 or 1, and a linear function is not suitable for p.
- Consider the odds p / (1 - p): as p increases, the odds increase. The odds lie in the range [0, +∞), which is asymmetric.
- The log odds lie in the range (-∞, +∞), which is symmetric.

Linear Regression vs. Logistic Regression
(Figure: comparison of the two fits.)
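The range argument above can be checked numerically. A minimal sketch (the probe values of p are illustrative) showing that the odds are asymmetric while the log odds are symmetric about p = 0.5:

```python
import math

# Odds p / (1 - p): in [0, +inf), asymmetric around p = 0.5.
def odds(p):
    return p / (1 - p)

# Log odds: in (-inf, +inf), symmetric around p = 0.5 (log odds = 0 there).
def log_odds(p):
    return math.log(odds(p))

for p in (0.1, 0.5, 0.9):
    print(p, round(odds(p), 3), round(log_odds(p), 3))
```

Note that odds(0.1) = 1/9 and odds(0.9) = 9 are not mirror images, but log_odds(0.1) and log_odds(0.9) are equal in magnitude with opposite signs, which is why the logistic model links the linear predictor to the log odds.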

Bayes Theorem
- Posterior probability: P(h1 | xi)
- Prior probability: P(h1)
- Bayes theorem: P(h1 | xi) = P(xi | h1) P(h1) / P(xi)
- Assigns probabilities to hypotheses given a data value.

Naïve Bayes Classification
Assume that the contributions of all attributes are independent and that each contributes equally to the classification problem. ti has m independent attributes {xi1, ..., xim}, so
P(ti | Cj) = Π_k P(xik | Cj).

Example: using Output1 as the classification results (the height example data table above).

Example 4.5
Step 1: Calculate the prior probabilities:
P(short) = 4/15 = 0.267
P(medium) = 8/15 = 0.533
P(tall) = 3/15 = 0.2
Step 2: Calculate the conditional probabilities P(Gender_i | Cj) for Gender_i = F or M and Cj = short, medium, or tall, and P(Height_i | Cj) for Height_i in (0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], (>2.0).
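Steps 1 and 2 can be computed directly from the height data. A sketch, assuming Output1 as the class label and the height bins from the slide (the helper names and the 99 upper bound for the open-ended bin are illustrative):

```python
from fractions import Fraction

# Height and Output1 columns of the table, in order Kristina .. Wynette.
heights = [1.6, 2.0, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 2.2, 2.1, 1.8,
           1.95, 1.9, 1.8, 1.75]
labels = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
          "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium",
          "Medium"]
classes = ["Short", "Medium", "Tall"]

# Step 1: class priors P(Cj): 4/15, 8/15, 3/15.
priors = {c: Fraction(labels.count(c), len(labels)) for c in classes}

# Step 2: per-bin counts for the Height attribute; 99 stands in for +inf.
bins = [(0, 1.6), (1.6, 1.7), (1.7, 1.8), (1.8, 1.9), (1.9, 2.0), (2.0, 99)]

def bin_of(h):
    for lo, hi in bins:
        if lo < h <= hi:
            return (lo, hi)

cond = {c: {b: 0 for b in bins} for c in classes}
for h, c in zip(heights, labels):
    cond[c][bin_of(h)] += 1

# e.g. P(height in (1.9, 2.0] | medium) = 1/8, as in the slide's table.
print(Fraction(cond["Medium"][(1.9, 2.0)], labels.count("Medium")))
```

Using `Fraction` keeps the probabilities exact, matching the slide's 1/4, 2/8, 3/3 style of entries.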

Example 4.5 (cont'd)

                   Count                  Probability p(xi | Cj)
Attribute          short  medium  tall    short  medium  tall
Gender: M          1      2       3       1/4    2/8     3/3
Gender: F          3      6       0       3/4    6/8     0/3
Height: (0,1.6]    2      0       0       2/4    0       0
Height: (1.6,1.7]  2      0       0       2/4    0       0
Height: (1.7,1.8]  0      3       0       0      3/8     0
Height: (1.8,1.9]  0      4       0       0      4/8     0
Height: (1.9,2.0]  0      1       1       0      1/8     1/3
Height: (>2.0)     0      0       2       0      0       2/3

Given a tuple t = {Adam, M, 1.95m}:
Step 3: Calculate P(t | Cj):
P(t | short) = 1/4 × 0 = 0
P(t | medium) = 2/8 × 1/8 = 0.031
P(t | tall) = 3/3 × 1/3 = 0.333
Step 4: Calculate P(t):
P(t) = P(t | short)P(short) + P(t | medium)P(medium) + P(t | tall)P(tall) = 0.0826
Step 5: Calculate P(Cj | t) using Bayes rule:
P(short | t) = P(t | short)P(short)/P(t) = 0
P(medium | t) = 0.2
P(tall | t) = 0.799
Last step: Classify the new tuple as tall.

A Summary
Step 1: Calculate the prior probability of each class, P(Cj).
Step 2: Calculate the conditional probability for each attribute value, e.g. P(Gender_i | Cj).
Step 3: Calculate the conditional probability of the tuple, P(t | Cj).
Step 4: Calculate the prior probability of the tuple, P(t).
Step 5: Calculate the posterior probability for each class given the tuple, P(Cj | t), using Bayes rule.
Step 6: Classify the tuple: it belongs to the class with the highest posterior probability P(Cj | t).
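Steps 3 through 5 for t = {Adam, M, 1.95m} can be reproduced in code from the table's probabilities (a minimal sketch; note the slide's 0.799 for P(tall | t) reflects intermediate rounding, while exact arithmetic gives 0.8):

```python
# Prior probabilities P(Cj) from Step 1.
priors = {"Short": 4 / 15, "Medium": 8 / 15, "Tall": 3 / 15}
# From the count table: P(M | Cj) and P(height in (1.9, 2.0] | Cj).
p_gender = {"Short": 1 / 4, "Medium": 2 / 8, "Tall": 3 / 3}
p_height = {"Short": 0 / 4, "Medium": 1 / 8, "Tall": 1 / 3}

# Step 3: P(t | Cj) = P(M | Cj) * P(1.95m | Cj) (naive independence).
likelihood = {c: p_gender[c] * p_height[c] for c in priors}

# Step 4: P(t) = sum over classes of P(t | Cj) P(Cj).
p_t = sum(likelihood[c] * priors[c] for c in priors)

# Step 5: posterior P(Cj | t) = P(t | Cj) P(Cj) / P(t) by Bayes rule.
posterior = {c: likelihood[c] * priors[c] / p_t for c in priors}
print({c: round(p, 3) for c, p in posterior.items()})

# Last step: pick the class with the highest posterior.
prediction = max(posterior, key=posterior.get)
print(prediction)  # prints "Tall"
```

Because P(1.95m | short) = 0, the short class is ruled out outright, and the tall posterior dominates the medium one by a factor of four.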

Next Lecture
- Classification: distance-based algorithms, decision tree-based algorithms
- HW2 will be announced!