# Introduction to Learning & Decision Trees


## What is learning?

*Artificial Intelligence: Representation and Problem Solving (15-381), April 2007.*

Learning is more than just memorizing facts: it means learning the underlying structure of the problem or data. Closely related terms: regression, pattern recognition, machine learning, data mining.

A fundamental aspect of learning is **generalization**: given a few examples, can you generalize to others?

Learning is ubiquitous:

- medical diagnosis: identify new disorders from observations
- loan applications: predict risk of default
- prediction (climate, stocks, etc.): predict the future from current and past data
- speech/object recognition: from examples, generalize to new instances

## Representation

How do we model or represent the world? All learning requires some form of representation. Learning means adjusting the model parameters {θ_1, ..., θ_n} to match the world (or data).

## The complexity of learning

There is a fundamental trade-off in learning: the complexity of the model versus the amount of data required to learn its parameters. The more complex the model, the more it can describe, but the more data it requires to constrain the parameters.

Consider a hypothesis space of N models: how many bits would it take to identify which of the N models is correct? In the worst case, log2(N) bits. We therefore want simple models that explain the examples and generalize to others: Ockham's (some say Occam's) razor.
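The log2(N) argument can be made concrete: identifying one model among N equally plausible hypotheses takes at most ceil(log2 N) yes/no questions, and each of those bits must be constrained by data. A minimal sketch (the function name is ours, for illustration):

```python
import math

def bits_to_identify(n_models: int) -> int:
    """Worst-case number of yes/no questions needed to single out
    one of n_models equally plausible hypotheses."""
    return math.ceil(math.log2(n_models))

# Doubling the hypothesis space costs only one more bit, but every
# bit must ultimately be pinned down by training data.
print(bits_to_identify(8), bits_to_identify(1000))
```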

## Complex learning example: curve fitting

Observed data: t = sin(2πx) + noise. How do we model the data? (Example from Bishop (2006), *Pattern Recognition and Machine Learning*.)

Polynomial curve fitting: model the targets with an order-M polynomial,

y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j

and fit the weights w by minimizing the sum-of-squares error over the N training points:

E(w) = (1/2) Σ_{n=1}^{N} [y(x_n, w) - t_n]^2
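As a rough illustration of this setup (not Bishop's code; the sample size, noise level, and random seed are our assumptions), one can fit polynomials of increasing order M to noisy samples of sin(2πx) with NumPy and watch the training error E(w) fall:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=N)  # noisy targets

# Training error E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 for fits of
# increasing polynomial order M.
errors = {}
for M in (1, 3, 9):
    w = np.polyfit(x, t, deg=M)              # least-squares weights
    errors[M] = 0.5 * np.sum((np.polyval(w, x) - t) ** 2)
    print(f"M={M}: E(w) = {errors[M]:.6f}")
# The M=9 polynomial drives the training error to (near) zero by
# interpolating all 10 points: exactly the overfitting regime
# discussed next.
```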

## Overfitting

A high-order polynomial can fit the training points exactly yet generalize poorly; more data are needed to learn the correct model. This is overfitting. (Example from Bishop (2006), *Pattern Recognition and Machine Learning*.)

## Types of learning

- supervised: the model is given the data together with the desired outputs {y_1, ..., y_n}
- unsupervised: the model is given only the data
- reinforcement: the model receives reinforcement (reward) signals rather than desired outputs

In each case the model parameters {θ_1, ..., θ_n} are adjusted to the world (or data).

## Decision Trees

Decision trees classify examples from a set of attributes. Example: predicting credit risk from ten loan records with attributes "<2 years at current job?" and "missed payments?" and target "defaulted?" (overall: 3 bad, 7 good).

Each level of the tree splits the data according to a different attribute. The goal is to achieve perfect classification with a minimal number of decisions. This is not always possible due to noise or inconsistencies in the data.

## Observations

Any boolean function can be represented by a decision tree, but trees are not good for all functions, e.g.:

- parity function: return 1 iff an even number of inputs are 1
- majority function: return 1 if more than half of the inputs are 1

Trees work best when a small number of attributes provide a lot of information. Note: finding the optimal tree for arbitrary data is NP-hard.

## Decision trees with continuous values

In the credit-risk example, attributes such as "years at current job" and "# missed payments" can be continuous; a split then takes the form of a threshold, e.g. "# missed payments > 1.5". The tree now corresponds to the order and placement of the decision boundaries.

General case:

- arbitrary number of attributes: binary, multi-valued, or continuous
- output: binary, multi-valued (decision or axis-aligned classification trees), or continuous (regression trees)
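The parity claim can be checked numerically: for the 3-input parity function, splitting on any single attribute yields zero information gain, so a greedy tree learner gets no guidance and must effectively enumerate all inputs. A small sketch (helper names are ours):

```python
from itertools import product
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# 3-input parity: label is 1 iff an even number of inputs are 1.
data = [(bits, 1 - sum(bits) % 2) for bits in product((0, 1), repeat=3)]

n = len(data)
base = h(sum(y for _, y in data) / n)   # class entropy before any split
gains = []
for i in range(3):                       # split on attribute i alone
    cond = 0.0
    for v in (0, 1):
        group = [y for bits, y in data if bits[i] == v]
        cond += len(group) / n * h(sum(group) / len(group))
    gains.append(base - cond)
print(gains)   # every single-attribute split is completely uninformative
```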

## Examples

Loan applications, medical diagnosis, movie preferences (the Netflix contest), spam filters, security screening: decision trees appear in many real-world systems and AI successes.

In each case, we want:

- accurate classification, i.e. minimize error
- efficient decision making, i.e. the fewest number of decisions/tests

The decision sequence can be further complicated: in medical diagnosis we may want to minimize false negatives (we don't want to miss an important disorder), or we may want to minimize the cost of the test sequence.

## Decision trees as inductive learning

Decision trees are a simple example of inductive learning:

1. learn a decision tree from training examples
2. predict classes for novel testing examples

Generalization is how well we do on the testing examples. This only works if we can learn the underlying structure of the data.

## Choosing the attributes

How do we find a decision tree that agrees with the training data? We could just choose a tree that has one path to a leaf for each example, but this merely memorizes the observations (assuming the data are consistent); we want the tree to generalize to new examples.

Ideally, the best attribute would partition the data into purely positive and purely negative examples. Greedy strategy: choose the attribute that gives the best partition first. We want correct classification with the fewest number of tests.

Open problems:

- How do we decide which attribute or value to split on?
- When should we stop splitting?
- What do we do when we can't achieve perfect classification?
- What if the tree is too large? Can we approximate it with a smaller tree?

## Basic algorithm for learning decision trees

1. Start with the whole training data.
2. Select the attribute (or value along a dimension) that gives the best split.
3. Create child nodes based on the split.
4. Recurse on each child using its data, until a stopping criterion is reached: all examples have the same class, the amount of data is too small, or the tree is too large.

Central problem: how do we choose the best attribute?

## Measuring information

A convenient measure is based on information theory: how much information does an attribute give us about the class? Attributes that perfectly partition the data should give maximal information; unrelated attributes should give no information.

Information of a symbol w:

I(w) = -log2 P(w)

P(w) = 1/2 → I(w) = -log2(1/2) = 1 bit
P(w) = 1/4 → I(w) = -log2(1/4) = 2 bits
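The four steps above can be sketched as a recursive learner that greedily picks the attribute minimizing the expected entropy of the children (equivalently, maximizing information gain). This is a minimal illustration, not the lecture's code; the function names and the toy AND dataset are ours:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def cond_entropy(rows, labels, attr):
    """Expected class entropy after splitting on attr."""
    n = len(rows)
    total = 0.0
    for v in set(r[attr] for r in rows):
        sub = [y for r, y in zip(rows, labels) if r[attr] == v]
        total += len(sub) / n * entropy(sub)
    return total

def grow_tree(rows, labels, attrs, min_size=1):
    # Stop on a pure node, exhausted attributes, or too little data.
    if len(set(labels)) == 1 or not attrs or len(rows) <= min_size:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    best = min(attrs, key=lambda a: cond_entropy(rows, labels, a))
    children = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best], min_size)
    return (best, children)

def predict(tree, row):
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Toy dataset: class = a AND b.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 0, 1]
tree = grow_tree(rows, labels, ["a", "b"])
print(predict(tree, {"a": 1, "b": 1}))
```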

## Information and Entropy

With I(w) = -log2 P(w), for a random variable X with probability P(x) the entropy is the average (or expected) amount of information obtained by observing x:

H(X) = Σ_x P(x) I(x) = -Σ_x P(x) log2 P(x)

Note: H(X) depends only on the probabilities, not the values. H(X) quantifies the uncertainty in the data in terms of bits, and gives a lower bound on the cost (in bits) of coding (or describing) X.

Examples:

- P(heads) = 1/2: H = (1/2) log2 2 + (1/2) log2 2 = 1 bit
- P(heads) = 1/3: H = (1/3) log2 3 + (2/3) log2 (3/2) ≈ 0.918 bits

## Entropy of a binary random variable

For a single random variable X with X = 1 with probability p and X = 0 with probability 1 - p, the entropy H(p) is maximal at p = 0.5 and zero at p = 0 or p = 1. Note that H(p) is 1 bit when p = 1/2.
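The binary entropy values quoted above are easy to verify; a small sketch (the function name is ours):

```python
from math import log2

def binary_entropy(p):
    """H(p) in bits for a binary random variable with P(X=1) = p."""
    if p in (0.0, 1.0):
        return 0.0                      # no uncertainty at the extremes
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))    # fair coin: exactly 1 bit
print(binary_entropy(1 / 3))  # biased coin: about 0.918 bits
```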

## Example: entropy of English character strings

For strings over the symbols A-Z and space, the per-character entropy decreases as the order of the model increases: H_0 = 4.76 bits/char, H_1 = 4.03 bits/char, H_4 = 2.8 bits/char. Equivalently, the entropy increases as the data become less ordered. Higher-order models condition on more preceding characters, i.e. P(c_i | c_{i-1}, c_{i-2}, ..., c_{i-k}); they capture more specific structure, and generated samples look more like real English. This is an example of the relationship between efficient coding and the representation of signal structure. (Example from Michael S. Lewicki, *Computational Perception and Scene Analysis: Visual Coding 2*, CMU.)

## Credit risk revisited

How many bits does it take to specify the attribute "defaulted"?

- P(defaulted = Y) = 3/10
- P(defaulted = N) = 7/10

H(Y) = -Σ_{i∈{Y,N}} P(Y = y_i) log2 P(Y = y_i) = -0.3 log2 0.3 - 0.7 log2 0.7 ≈ 0.881 bits

How much can we reduce the entropy (or uncertainty) of "defaulted" by knowing the other attributes? Ideally, we could reduce it to zero, in which case we would classify perfectly.

## Conditional entropy

H(Y|X) is the remaining entropy of Y given X, i.e. the expected (or average) entropy of P(y|x):

H(Y|X) = -Σ_x P(x) Σ_y P(y|x) log2 P(y|x) = Σ_x P(x) H(Y|X = x)

H(Y|X = x) is the specific conditional entropy: the entropy of Y knowing the value of a specific attribute x.

## Back to the credit risk example

Splitting on "<2 years at current job?" (Y branch: 1 bad, 3 good; N branch: 2 bad, 4 good):

H(defaulted | <2 years = Y) = (3/4) log2 (4/3) + (1/4) log2 4 ≈ 0.811
H(defaulted | <2 years = N) = (2/3) log2 (3/2) + (1/3) log2 3 ≈ 0.918
H(defaulted | <2 years) = 0.4 × 0.811 + 0.6 × 0.918 ≈ 0.875

Splitting on "missed payments?" (N branch: 1 bad, 6 good; Y branch: 2 bad, 1 good):

H(defaulted | missed = N) = (6/7) log2 (7/6) + (1/7) log2 7 ≈ 0.592
H(defaulted | missed = Y) = (2/3) log2 (3/2) + (1/3) log2 3 ≈ 0.918
H(defaulted | missed) = 0.7 × 0.592 + 0.3 × 0.918 ≈ 0.690

Knowing "missed payments?" reduces the entropy of "defaulted" more than knowing "<2 years at current job?".
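Assuming the branch class counts read off the slide's credit-risk data (missed = N: 1 bad, 6 good; missed = Y: 2 bad, 1 good), the weighted-average conditional entropy H(defaulted | missed) can be checked numerically:

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

# (bad, good) class counts in each branch of the "missed payments?"
# split, as reconstructed from the ten credit-risk examples.
branches = {"missed=N": (1, 6), "missed=Y": (2, 1)}

n_total = sum(sum(c) for c in branches.values())
h_cond = sum(sum(c) / n_total * entropy(c) for c in branches.values())
print(f"H(defaulted | missed payments) = {h_cond:.3f} bits")
```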

## Mutual information

We now have the entropy, the minimal number of bits required to specify the target attribute:

H(Y) = -Σ_y P(y) log2 P(y)

and the conditional entropy, the remaining entropy of Y knowing X:

H(Y|X) = -Σ_x P(x) Σ_y P(y|x) log2 P(y|x)

So we can now define the reduction of the entropy after learning X. This is known as the mutual information between Y and X:

I(Y; X) = H(Y) - H(Y|X)

## Properties of mutual information

- Mutual information is symmetric: I(Y; X) = I(X; Y).
- In terms of probability distributions, it is written as I(X; Y) = Σ_{x,y} P(x, y) log2 [ P(x, y) / (P(x) P(y)) ].
- It is zero if Y provides no information about X: I(X; Y) = 0 iff X and Y are independent, i.e. P(x, y) = P(x) P(y).
- If Y = X, then I(X; X) = H(X) - H(X|X) = H(X).
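Using the reconstructed credit counts as a joint distribution over (missed payments, defaulted), the probability-distribution form of I(X; Y) can be evaluated directly; by the identity above it should equal H(Y) - H(Y|X), roughly 0.881 - 0.690 ≈ 0.19 bits:

```python
from math import log2

# Joint P(x, y) over x = missed payments, y = defaulted, from the
# reconstructed class counts (out of 10 examples).
P = {("Y", "Y"): 0.2, ("Y", "N"): 0.1,
     ("N", "Y"): 0.1, ("N", "N"): 0.6}

Px, Py = {}, {}
for (x, y), p in P.items():
    Px[x] = Px.get(x, 0.0) + p          # marginal over missed payments
    Py[y] = Py.get(y, 0.0) + p          # marginal over defaulted

I = sum(p * log2(p / (Px[x] * Py[y])) for (x, y), p in P.items())
print(f"I(defaulted; missed) = {I:.3f} bits")
```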

## First step: calculate information gains

Suppose we want to predict MPG. Compute the information gain for each attribute and look at all the gains: in this case, "cylinders" provides the most gain, because it nearly partitions the data. (Copyright Andrew W. Moore, Slide 32.)

## First decision: partition on cylinders

This gives a decision stump. Note the lopsided MPG class distribution.

## Recursion step

Recurse on the child nodes to expand the tree. Take the original dataset and partition it according to the value of the attribute we split on: records in which cylinders = 4, records in which cylinders = 5, records in which cylinders = 6, and records in which cylinders = 8. Then build a tree from each group of records: exactly the same procedure, but with smaller, conditioned datasets.
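The partition step can be sketched as a simple group-by over attribute values (the car records below are hypothetical stand-ins, not Moore's dataset):

```python
from collections import defaultdict

def partition(rows, attribute):
    """One child dataset per observed value of the split attribute."""
    children = defaultdict(list)
    for row in rows:
        children[row[attribute]].append(row)
    return dict(children)

# Hypothetical stand-ins for the car records.
records = [
    {"cylinders": 4, "maker": "asia", "mpg": "good"},
    {"cylinders": 4, "maker": "europe", "mpg": "good"},
    {"cylinders": 6, "maker": "america", "mpg": "bad"},
    {"cylinders": 8, "maker": "america", "mpg": "bad"},
]

children = partition(records, "cylinders")
for value, subset in sorted(children.items()):
    print(f"cylinders = {value}: build a subtree from {len(subset)} record(s)")
```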

## Second level of decisions

At the second level of the tree, we recursively build a tree from the seven records in which there are four cylinders and the maker was based in Asia (with similar recursions in the other cases). Why don't we expand some nodes further? Because a stopping criterion has been reached, typically that all of their examples already have the same class. Repeating this process yields the final tree.


### What is Classification? Data Mining Classification. Certainty. Usual Examples. Predictive / Definitive. Techniques

What is Classification? Data Mining Classification Kevin Swingler Assigning an object to a certain class based on its similarity to previous examples of other objects Can be done with reference to original

### Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

### 203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

### Web Document Clustering

Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

### Smart Grid Data Analytics for Decision Support

1 Smart Grid Data Analytics for Decision Support Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA Prakash.Ranganathan@engr.und.edu, 701-777-4431

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### Data, Measurements, Features

Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

### FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS Algorithms, Worked Examples, and Case Studies

FreeChapter InformationBasedLearning 2015/6/12 18:02 Page i #1 FUNDAMENTALS OF MACHINE LEARNING FOR PREDICTIVE DATA ANALYTICS Algorithms, Worked Examples, and Case Studies John D. Kelleher Brian Mac Namee

### CS383, Algorithms Notes on Lossless Data Compression and Huffman Coding

Prof. Sergio A. Alvarez http://www.cs.bc.edu/ alvarez/ Fulton Hall 410 B alvarez@cs.bc.edu Computer Science Department voice: (617) 552-4333 Boston College fax: (617) 552-6790 Chestnut Hill, MA 02467 USA

### Optimization of C4.5 Decision Tree Algorithm for Data Mining Application

Optimization of C4.5 Decision Tree Algorithm for Data Mining Application Gaurav L. Agrawal 1, Prof. Hitesh Gupta 2 1 PG Student, Department of CSE, PCST, Bhopal, India 2 Head of Department CSE, PCST, Bhopal,

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Data Mining Classification

Data Mining Classification Jingpeng Li 1 of 26 What is Classification? Assigning an object to a certain class based on its similarity to previous examples of other objects Can be done with reference to

### CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION N PROBLEM DEFINITION Opportunity New Booking - Time of Arrival Shortest Route (Distance/Time) Taxi-Passenger Demand Distribution Value Accurate

### Lecture 1: Course overview, circuits, and formulas

Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik

### Lecture 9: Bayesian Learning

Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning SS 2005 Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes