Machine Learning

A completely different way to have an agent acquire the abilities needed to achieve a particular goal is via machine learning.

Machine Learning

What is machine learning? Programs that get better with experience, given a task and some performance measure. Examples:
- Learning to classify news articles
- Learning to recognize spoken words
- Learning to play board games
- Learning to navigate a virtual world

Machine learning usually involves some sort of inductive reasoning step. Read Chaps. 17 & 26 in the Alex book.

Inductive Reasoning

Deductive reasoning (rule-based reasoning) goes from the general to the specific: deduction derives specific facts from a general theory. Inductive reasoning goes from the specific to the general: induction builds a general theory from specific facts.

Example

Fact: every time you see a swan, you notice that the swan is white. Inductive step: you infer that all swans are white.

Induction takes us from "Observed swans are white" to "All swans are white". Inference is the act or process of drawing a conclusion based solely on what one already knows.

Observation

Deduction is truth preserving: if the rules employed in the deductive reasoning process are sound, then what holds in the theory will hold for the deduced facts.

Induction is NOT truth preserving; it is more of a statistical argument. The more swans you see that are white, the more probable it is that all swans are white. But this does not exclude the existence of black swans.

Observation

The set D of observed swans is only a subset of X, the universe of all swans.

Different Styles of Machine Learning

- Supervised learning: the learner needs explicit examples of the concept to be learned (e.g., white swans).
- Unsupervised learning: the learner autonomously discovers any structure in the domain that might represent an interesting concept.

Knowledge - Representing What Has Been Learned

- Symbolic learners (transparent models): if-then-else rules, decision trees, association rules
- Sub-symbolic learners (non-transparent models): neural networks, clustering (self-organizing maps, k-means), support vector machines

Why Learning?

- Scripting works well if there is a well-understood relationship between the input (senses) and the actions to be taken.
- Learning works well where no such clear relationship exists: perhaps there are too many special cases to consider, or perhaps there is a non-linear numerical relationship between input and output that is difficult to characterize.
- Learning can be adaptive: in online learning, the agent constantly evaluates its actions and adjusts its acquired knowledge. This is very difficult to achieve in scripting.

Decision Trees

Decision trees learn from labeled observations (supervised learning) and represent the knowledge learned in the form of a tree. Example: learning when to play tennis. The examples/observations are days with their observed weather characteristics and whether we played tennis or not.

Play Tennis Example

Outlook   Temperature  Humidity  Windy   PlayTennis
--------  -----------  --------  ------  ----------
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No
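For the code sketches later in this section, it helps to have this table in executable form. Below is one possible encoding as a Python list of dicts; the variable name "data" and the layout are our choices, not from the slides:

from collections import Counter

# The PlayTennis observations from the table above, one dict per day.
data = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Windy": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Windy": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Windy": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Windy": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Windy": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Windy": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Windy": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Windy": "Strong", "PlayTennis": "No"},
]

# Sanity check: 9 positive and 5 negative examples, as used below in
# the entropy calculation Entropy([9+,5-]) = .94.
print(Counter(row["PlayTennis"] for row in data))  # Counter({'Yes': 9, 'No': 5})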

Decision Tree Learning

Decision tree learning is induction: it goes from specific facts or observations to a general theory.

Interpreting a DT

DT = decision tree. A DT uses the attributes of an observation table as nodes and the attribute values as links. All attribute values of a particular attribute need to be represented as links. The target attribute is special: its values show up as leaf nodes in the DT.

Interpreting a DT

Each path from the root of the DT to a leaf can be interpreted as a decision rule, for example:

IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No
IF Outlook = Overcast THEN PlayTennis = Yes
IF Outlook = Rain AND Windy = Strong THEN PlayTennis = No

DT: Explanation & Prediction

Explanation: the DT summarizes (explains) all the observations in the training table perfectly, with 100% accuracy.

Prediction: once we have a DT (a model), we can use it to make predictions on observations that are not in the original training table. Consider a new day whose combination of Outlook, Temperature, Humidity, and Windy values does not appear in the table: PlayTennis = ?
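Both ideas are easy to see in code. The sketch below is our own encoding, not part of the original slides: it stores the PlayTennis tree as nested dicts, turns every root-to-leaf path into an IF-THEN rule, and classifies a new, unseen day by walking the tree.

# The PlayTennis tree as nested dicts: {attribute: {value: subtree}};
# a plain string is a leaf holding the predicted PlayTennis value.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Windy": {"Strong": "No", "Weak": "Yes"}},
}}

def rules(node, conditions=()):
    # Yield every root-to-leaf path as an IF ... THEN ... decision rule.
    if isinstance(node, str):
        yield "IF " + " AND ".join(conditions) + " THEN PlayTennis = " + node
        return
    (attr, branches), = node.items()
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + (attr + " = " + value,))

def classify(node, observation):
    # Walk the tree, following the branch matching each attribute value.
    while not isinstance(node, str):
        (attr, branches), = node.items()
        node = branches[observation[attr]]
    return node

for rule in rules(tree):
    print(rule)

# A day not present in the training table:
new_day = {"Outlook": "Sunny", "Temperature": "Mild",
           "Humidity": "Normal", "Windy": "Weak"}
print(classify(tree, new_day))  # Yes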

Constructing DTs

How do we choose the attributes and the order in which they appear in a DT?
- Recursive partitioning of the original data table
- Heuristic: each generated partition has to be less random (entropy reduction) than previously generated partitions

Entropy

Let S be a sample of training examples, p+ the proportion of positive examples in S, and p- the proportion of negative examples in S. Entropy measures the impurity (randomness) of S:

Entropy(S) = -p+ log2(p+) - p- log2(p-)

For our table, Entropy(S) = Entropy([9+,5-]) = .94.
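As a quick check, a few lines of Python reproduce the slide's number (the function name "entropy" is our choice):

import math

def entropy(pos, neg):
    # Entropy of a sample with pos positive and neg negative examples;
    # empty classes are skipped, treating 0 * log2(0) as 0.
    total = pos + neg
    return -sum((c / total) * math.log2(c / total)
                for c in (pos, neg) if c > 0)

print(round(entropy(9, 5), 2))  # 0.94, i.e. Entropy([9+,5-])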

Partitioning the Data Set

Splitting the table on Outlook yields three partitions:
- Outlook = Sunny: [2+,3-], E = .97
- Outlook = Overcast: [4+,0-], E = 0
- Outlook = Rain: [3+,2-], E = .97
Average entropy = .64

Partitioning in Action

Comparing the average entropy of each candidate split: Outlook E = .640, Humidity E = .789, Windy E = .892, Temperature E = .911. Outlook produces the least random partitions, so it is the best attribute for the first split.
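One plausible way to compute such averages in code is to weight each partition's entropy by its share of the rows, as in Mitchell's information-gain calculation; the slides' .64 for Outlook looks like an unweighted mean of .97, 0, and .97, so the absolute numbers below differ slightly, but Outlook still comes out lowest. This sketch reuses the data table defined after the Play Tennis example:

import math
from collections import Counter

def subset_entropy(rows, target="PlayTennis"):
    # Entropy of the target-value distribution in rows.
    counts = Counter(r[target] for r in rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def average_entropy(rows, attr, target="PlayTennis"):
    # Split rows on attr; average the partition entropies,
    # weighting each partition by its share of the rows.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r)
    n = len(rows)
    return sum(len(p) / n * subset_entropy(p, target)
               for p in partitions.values())

for attr in ("Outlook", "Humidity", "Windy", "Temperature"):
    print(attr, round(average_entropy(data, attr), 3))
# Outlook yields the lowest average entropy and becomes the first split.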

Recursive Partitioning

Partition(Examples, TargetAttribute, Attributes)

Examples are the training examples. TargetAttribute is a binary (+/-) categorical dependent variable, and Attributes is the list of independent variables which are available for testing at this point. This function returns a decision tree.

- Create a Root node for the tree.
- If all Examples are positive, then return Root as a leaf node with label = +.
- Else if all Examples are negative, then return Root as a leaf node with label = -.
- Else if Attributes is empty, then return Root as a leaf node with label = most common value of TargetAttribute in Examples.
- Otherwise:
  - A := the attribute from Attributes that reduces entropy the most on the Examples.
  - Root := A
  - For each v in values(A):
    - Add a new branch below the Root node with value A = v.
    - Let Examples_v be the subset of Examples where A = v.
    - If Examples_v is empty, then add a new leaf node to the branch with label = most common value of TargetAttribute in Examples.
    - Else add the subtree Partition(Examples_v, TargetAttribute, Attributes - {A}) to the branch.
- Return Root.

Based on material from the book: "Machine Learning", Tom M. Mitchell. McGraw-Hill, 1997.
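The pseudocode translates almost line for line into Python. Below is a minimal sketch of this ID3-style partitioning, again ours rather than from the slides; it reuses average_entropy and the data table from the earlier sketches, and iterates only over attribute values that actually occur in the examples, so the empty-partition case of the pseudocode cannot arise:

from collections import Counter

def partition(examples, target, attributes):
    # Returns a decision tree: either a leaf label (a string) or a
    # nested dict {attribute: {value: subtree}}, as in the pseudocode.
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:      # all examples positive, or all negative
        return labels[0]
    if not attributes:             # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    # A := the attribute that reduces entropy the most on these examples.
    a = min(attributes, key=lambda attr: average_entropy(examples, attr, target))
    branches = {}
    for v in set(e[a] for e in examples):
        examples_v = [e for e in examples if e[a] == v]
        branches[v] = partition(examples_v, target,
                                [x for x in attributes if x != a])
    return {a: branches}

tree = partition(data, "PlayTennis",
                 ["Outlook", "Temperature", "Humidity", "Windy"])
print(tree)  # reproduces the tree behind the rules shown earlier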