Machine Learning. Goals. Does It Really Work? Notes


Introduction

All the techniques we have seen so far allow us to build intelligent systems, but these systems are limited: they can only solve the problems they are programmed for. We should only consider a system intelligent if it is also able to observe its environment and learn from it. Real intelligence resides in adaptation: being able to integrate new knowledge, to solve new problems, and to learn from mistakes.

Goals

The goal is not to model human learning. The goal is to overcome the limitations of the usual AI applications (KBS, planning, NLP, problem solving, ...):

- Their limit is the knowledge they have.
- Their capacities cannot reach beyond those limits.
- It is not possible to foresee all possible problems from the beginning.

We are looking for programs that can adapt without being reprogrammed.

Does It Really Work?

Tom Mitchell, "Does Machine Learning Really Work?", AI Magazine, 1997. Where can machine learning be applied?

- Tasks that are very difficult to program (face recognition, voice recognition, ...)
- Adaptable applications (intelligent interfaces, spam filters, recommender systems, ...)
- Data mining (intelligent data analysis)

Types of Machine Learning

- Inductive learning: models are built by generalizing from examples; we look for patterns that explain the common characteristics of the examples.
- Deductive learning: deduction is applied to obtain generalizations from a solved example and its explanation.
- Genetic learning: algorithms inspired by the theory of evolution are used to find general descriptions of groups of examples.
- Connectionist learning: generalization is performed by the adaptation mechanisms of artificial neural networks.

Inductive Learning

This is the area with the largest number of methods. Goal: to discover general rules or concepts (common patterns) from a limited set of examples. It is based on the search for characteristics shared among examples. All its methods rely on inductive reasoning.

Inductive Reasoning vs. Deductive Reasoning

Inductive reasoning:
- Obtains general knowledge from specific information.
- The knowledge obtained is new.
- It is not truth-preserving (new information can invalidate the knowledge obtained).
- It has no well-founded theory.

Deductive reasoning:
- Obtains general knowledge from general knowledge.
- The knowledge is not new (it is implicit in the initial knowledge).
- New knowledge cannot invalidate the knowledge already obtained.
- Its basis is mathematical logic.

Inductive Learning

From a formal point of view, its results are invalid: we assume that a limited number of examples represents the characteristics of the concept we want to learn, and a single counterexample invalidates the results. But most human learning is inductive!

Learning as Search (I)

The usual way to view inductive learning is as a search problem. The goal is to discover a function or representation that summarizes the characteristics of a set of examples. The search space is the set of all concepts that can be built, and there are different ways to perform the search.

Learning as Search (II)

- Search space: the language used to describe the concepts, i.e., the set of concepts that can be described by the language.
- Search operators: heuristic operators that allow us to explore the space of concepts.
- Heuristic function: a preference function that guides the search (the bias).

Types of Inductive Learning: Supervised

- Each example is labeled with the concept it belongs to.
- Learning is performed by contrasting concepts.
- A set of heuristics allows us to generate different hypotheses.
- A preference criterion (bias) allows us to choose the hypothesis most suitable for the examples.
- Result: the concept or concepts that best describe the examples.

Types of Inductive Learning: Unsupervised

- Examples are not labeled.
- We want to discover a suitable way to cluster the objects.
- Learning is based on discovering similarity and dissimilarity among examples.
- A heuristic preference criterion guides the search.
- Result: a partition of the examples and a characterization of each partition.

Decision Trees

We can learn a concept as the set of questions that allows us to distinguish it from other concepts. Using a tree as the representation formalism, we can store and organize these questions: each node of the tree is a question about an attribute. The search space is the set of all possible trees of questions. This representation is equivalent to DNF; for $n$ binary attributes the space contains $2^{2^n}$ possible concepts. A minimal sketch of a node structure for such trees follows.
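As an illustration (not part of the original slides; the names Node, Leaf, and classify are my own), a minimal Python sketch of this tree-of-questions representation could be:

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str                                     # class assigned at this leaf

@dataclass
class Node:
    attribute: str                                 # the question asked at this node
    branches: dict = field(default_factory=dict)   # attribute value -> subtree

def classify(tree, example):
    """Follow the questions from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

# A two-question tree (it anticipates the tree derived at the end of these notes):
t = Node("eyes", {"blue": Leaf("+"),
                  "brown": Leaf("-"),
                  "green": Node("height", {"tall": Leaf("+"),
                                           "medium": Leaf("-"),
                                           "small": Leaf("-")})})
print(classify(t, {"eyes": "green", "height": "tall"}))   # +
```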

Bias

To reduce the computational cost of searching this space we must choose a bias (what kind of concepts are preferred).

- Decision: prefer the tree that gives the minimal description of the goal concept given a set of examples.
- Reason: such a tree will be the best at predicting new instances (the probability that unnecessary conditions appear is reduced).
- Occam's razor: the hypothesis that introduces the fewest assumptions and postulates the fewest entities is to be preferred.

Algorithms for Building Decision Trees

One of the first algorithms for building decision trees is ID3 (Quinlan, 1986). It belongs to the family of algorithms for Top-Down Induction of Decision Trees (TDIDT). ID3 performs a hill-climbing search in the space of decision trees: at each level of the tree an attribute is chosen, and the set of examples is split using the values of that attribute; the process is repeated recursively for each partition. The attribute is selected using a heuristic function.

Information Theory

Information theory studies, among other things, the coding of messages and the cost of their transmission. If we define a set of messages $M = \{m_1, m_2, \ldots, m_n\}$, each with probability $P(m_i)$, we can define the quantity of information $I$ of $M$ as:

$$I(M) = -\sum_{i=1}^{n} P(m_i) \log_2 P(m_i)$$

This value can be interpreted as the information needed to discriminate the messages of $M$ (the number of bits necessary to code the messages). A direct code translation of this formula follows.
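As a small illustration (not from the original slides; the function name is mine), the formula above translates directly into Python:

```python
import math

def information(probabilities):
    """I(M) = -sum_i P(m_i) * log2(P(m_i)); terms with P = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two equiprobable messages need exactly one bit:
print(information([0.5, 0.5]))     # 1.0
# A near-certain message carries little information:
print(information([0.99, 0.01]))   # ~0.081
```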

Quantity of Information as a Heuristic

We can use an analogy with message coding: the classes are the messages, and the proportion of examples of each class is its probability. A decision tree can then be seen as a coding that discriminates among the classes, and we are looking for the minimal code that does so. Each attribute is evaluated to decide whether it is part of the code; an attribute is better than another if it discriminates better among the classes.

At each level of the tree we have to find the attribute that minimizes the code, i.e., minimizes the size of the tree. That is the attribute that leaves the least quantity of information to be covered by the remaining attributes: choosing an attribute should result in subsets of examples that are biased towards one class. We therefore need a measure of the quantity of information not covered by an attribute: the entropy $E$.

Information Gain

Quantity of information ($X$ = examples, $C$ = classification):

$$I(X, C) = -\sum_{c_i \in C} \frac{|c_i|}{|X|} \log_2\left(\frac{|c_i|}{|X|}\right)$$

Entropy ($A$ = attribute, $[A(x) = v_i]$ = examples with value $v_i$):

$$E(X, A, C) = \sum_{v_i \in A} \frac{|[A(x) = v_i]|}{|X|} \, I([A(x) = v_i], C)$$

Information gain:

$$G(X, A, C) = I(X, C) - E(X, A, C)$$

A sketch of these three measures in code follows.
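As a sketch (assuming examples are represented as dictionaries and class labels are kept in a parallel list; all identifiers are my own, not from the slides), the three measures can be written as:

```python
import math
from collections import Counter

def quantity_of_information(labels):
    """I(X, C): information of the class distribution of a set of examples."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def entropy(examples, labels, attribute):
    """E(X, A, C): information left uncovered after splitting on `attribute`."""
    total = len(labels)
    result = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[attribute] == value]
        result += (len(subset) / total) * quantity_of_information(subset)
    return result

def gain(examples, labels, attribute):
    """G(X, A, C) = I(X, C) - E(X, A, C)."""
    return quantity_of_information(labels) - entropy(examples, labels, attribute)
```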

Information Gain

[Figure: a set X with class information I(X, C) is split by an attribute A into subsets for A = v1, A = v2, A = v3; the weighted information of the subsets is E(X, A, C), and G(X, A, C) = I(X, C) - E(X, A, C).]

ID3 Algorithm

Algorithm: ID3 (X: examples, C: classification, A: attributes)

    if all examples are of the same class then
        return a leaf with the class name
    else
        compute the quantity of information of the examples, I(X, C)
        foreach attribute a in A do
            compute the entropy E(X, a, C) and the information gain G(X, a, C)
        pick the attribute a that maximizes G
        delete a from the list of attributes A
        generate a root node for attribute a
        foreach partition generated by the values v_i of attribute a do
            Tree_i = ID3(X(a = v_i), C(a = v_i), A - {a})
            generate a new branch with a = v_i and Tree_i
        return the root node for a

(A runnable Python sketch of this algorithm, applied to the example below, follows the example table.)

Example (1)

Consider the following set of examples:

Ex.  Eyes   Hair    Height  Class
1    Blue   Blonde  Tall    +
2    Blue   Dark    Medium  +
3    Brown  Dark    Medium  -
4    Green  Dark    Medium  -
5    Green  Dark    Tall    +
6    Brown  Dark    Small   -
7    Green  Blonde  Small   -
8    Blue   Dark    Medium  +
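Here is a self-contained Python sketch of the pseudocode above (a minimal implementation under my own naming choices, not the original course code; like the pseudocode, it does not handle the case of exhausted attributes with mixed classes):

```python
import math
from collections import Counter

def info(labels):
    """I(X, C) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attributes):
    """Return a tree: either a class label (leaf) or (attribute, {value: subtree})."""
    if len(set(labels)) == 1:            # all examples in the same class -> leaf
        return labels[0]

    def gain(a):                         # G(X, a, C) = I(X, C) - E(X, a, C)
        n = len(labels)
        e = sum((sum(ex[a] == v for ex in examples) / n) *
                info([lab for ex, lab in zip(examples, labels) if ex[a] == v])
                for v in set(ex[a] for ex in examples))
        return info(labels) - e

    best = max(attributes, key=gain)     # attribute maximizing the gain
    rest = [a for a in attributes if a != best]
    branches = {
        v: id3([ex for ex in examples if ex[best] == v],
               [lab for ex, lab in zip(examples, labels) if ex[best] == v],
               rest)
        for v in set(ex[best] for ex in examples)
    }
    return (best, branches)

# The eight examples from the table above:
data = [
    {"eyes": "blue",  "hair": "blonde", "height": "tall"},    # 1 +
    {"eyes": "blue",  "hair": "dark",   "height": "medium"},  # 2 +
    {"eyes": "brown", "hair": "dark",   "height": "medium"},  # 3 -
    {"eyes": "green", "hair": "dark",   "height": "medium"},  # 4 -
    {"eyes": "green", "hair": "dark",   "height": "tall"},    # 5 +
    {"eyes": "brown", "hair": "dark",   "height": "small"},   # 6 -
    {"eyes": "green", "hair": "blonde", "height": "small"},   # 7 -
    {"eyes": "blue",  "hair": "dark",   "height": "medium"},  # 8 +
]
classes = ["+", "+", "-", "-", "+", "-", "-", "+"]

print(id3(data, classes, ["eyes", "hair", "height"]))
# e.g. ('eyes', {'blue': '+', 'brown': '-', 'green': ('height', {...})})
```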

Example (2)

Taking $0 \log_2 0 = 0$ by convention:

$$\begin{aligned}
I(X, C) &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1\\[4pt]
E(X, \text{eyes}, C) &= \tfrac{3}{8}\,(-1\log_2 1 - 0\log_2 0) && \text{(blue)}\\
&\quad + \tfrac{2}{8}\,(-1\log_2 1 - 0\log_2 0) && \text{(brown)}\\
&\quad + \tfrac{3}{8}\,(-\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3}) && \text{(green)}\\
&= 0.344\\[4pt]
E(X, \text{hair}, C) &= \tfrac{2}{8}\,(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}) && \text{(blonde)}\\
&\quad + \tfrac{6}{8}\,(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}) && \text{(dark)}\\
&= 1\\[4pt]
E(X, \text{height}, C) &= \tfrac{2}{8}\,(-1\log_2 1 - 0\log_2 0) && \text{(tall)}\\
&\quad + \tfrac{4}{8}\,(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}) && \text{(medium)}\\
&\quad + \tfrac{2}{8}\,(-0\log_2 0 - 1\log_2 1) && \text{(small)}\\
&= 0.5
\end{aligned}$$

Example (3)

We can see that the attribute eyes is the one that maximizes the gain:

$$G(X, \text{eyes}, C) = 1 - 0.344 = 0.656 \qquad G(X, \text{hair}, C) = 1 - 1 = 0 \qquad G(X, \text{height}, C) = 1 - 0.5 = 0.5$$

Example (4)

This attribute generates the first level of the tree:

EYES
  blue  -> examples 1, 2, 8 (+)
  brown -> examples 3, 6 (-)
  green -> examples 4, 7 (-) and 5 (+)  [mixed]

These gains can also be checked mechanically; see the snippet below.
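As a quick cross-check of the hand computation (a standalone snippet, so the helper is repeated here; the representation and names are mine):

```python
import math
from collections import Counter

def info(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Attribute values for examples 1..8, with classes + + - - + - - +
classes = ["+", "+", "-", "-", "+", "-", "-", "+"]
attrs = {
    "eyes":   ["blue", "blue", "brown", "green", "green", "brown", "green", "blue"],
    "hair":   ["blonde", "dark", "dark", "dark", "dark", "dark", "blonde", "dark"],
    "height": ["tall", "medium", "medium", "medium", "tall", "small", "small", "medium"],
}

for name, values in attrs.items():
    e = sum((values.count(v) / len(classes)) *
            info([c for c, w in zip(classes, values) if w == v])
            for v in set(values))
    print(name, round(info(classes) - e, 3))
# eyes 0.656, hair 0.0, height 0.5
```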

Example (5)

Only the node corresponding to the value green contains a mixture of classes, so we repeat the process with those examples:

Ex.  Hair    Height  Class
4    Dark    Medium  -
5    Dark    Tall    +
7    Blonde  Small   -

Example (6)

$$\begin{aligned}
I(X, C) &= -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} = 0.918\\[4pt]
E(X, \text{hair}, C) &= \tfrac{1}{3}\,(-0\log_2 0 - 1\log_2 1) && \text{(blonde)}\\
&\quad + \tfrac{2}{3}\,(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}) && \text{(dark)}\\
&= 0.666\\[4pt]
E(X, \text{height}, C) &= \tfrac{1}{3}\,(-0\log_2 0 - 1\log_2 1) && \text{(tall)}\\
&\quad + \tfrac{1}{3}\,(-1\log_2 1 - 0\log_2 0) && \text{(medium)}\\
&\quad + \tfrac{1}{3}\,(-0\log_2 0 - 1\log_2 1) && \text{(small)}\\
&= 0
\end{aligned}$$

Example (7)

Now the attribute with the maximum gain is height:

$$G(X, \text{hair}, C) = 0.918 - 0.666 = 0.252 \qquad G(X, \text{height}, C) = 0.918 - 0 = 0.918$$

Example (8)

The resulting tree is fully discriminant:

EYES
  blue  -> + (examples 1, 2, 8)
  brown -> - (examples 3, 6)
  green -> HEIGHT
             tall   -> + (example 5)
             medium -> - (example 4)
             small  -> - (example 7)