# Attribution. Modified from Stuart Russell's slides (Berkeley). Parts of the slides are inspired by Dan Klein's lecture material for CS 188 (Berkeley)

## Machine Learning

## Attribution

Modified from Stuart Russell's slides (Berkeley). Parts of the slides are inspired by Dan Klein's lecture material for CS 188 (Berkeley).

## Outline

- Inductive learning
- Decision tree learning
- Measuring learning performance
- Statistical learning
- Naive Bayes learning
- Classification
- Evaluation

## Inductive Learning

- Training set (data): $N$ example input-output pairs $(x_1, y_1), \dots, (x_N, y_N)$, where each $y_i$ was generated by an unknown function $y = f(x)$.
- Learning: discover a hypothesis function $h$ that approximates the true function $f$.
- Test set: a separate set of examples used to measure the accuracy of hypothesis $h$.
- Hypothesis $h$ *generalizes* well if it correctly predicts the value of $y$ for novel examples.
- Hypothesis space: the set of hypotheses the learner considers; the learning problem is *realizable* if the hypothesis space contains the true function.

## Kinds of Learning

Three types of feedback determine the main kinds of machine learning:

- Supervised learning: requires a collection of sample input-output pairs (problem instance, correct answer) from which it learns a function that maps inputs to outputs. In other words, it requires a teacher.
- Unsupervised learning: learns patterns in the input without explicit feedback, e.g., clustering. Requires no teacher.
- Reinforcement learning: occasional rewards reinforce or inhibit certain sequences of actions. It is harder, but requires no teacher.

Semi-supervised learning lies in between: there are too few labeled examples, plus a larger collection of unlabeled ones, and even the labels are not necessarily accurate.

## Inductive learning (a.k.a. Science)

Simplest form: learn a function from examples (*tabula rasa*, Latin for "blank slate").

- $f$ is the target function.
- An example is an input-output pair $(x, f(x))$, e.g., a board position (figure: a tic-tac-toe board with marks O, O, X, X, X) paired with the value $+1$.
- Problem: find a hypothesis $h$ such that $h \approx f$, given a training set of examples.

## Inductive learning method

Construct or adjust $h$ to agree with $f$ on the training set ($h$ is *consistent* if it agrees with $f$ on all examples).

E.g., curve fitting: [figure: training points $f(x)$ plotted against $x$, with candidate fitted curves]

Ockham's razor (William of Ockham): maximize a combination of consistency and simplicity.
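To make the consistency/simplicity trade-off concrete, here is a minimal Python sketch on hypothetical data: six samples of $f(x) = 2x + 1$ with small fixed noise, plus two noise-free test points. The data, the helper names (`interpolant`, `least_squares_line`, `mse`) and the choice of a straight line as the "simple" hypothesis are illustrative assumptions. The degree-5 polynomial through all six points is perfectly consistent with the training set; the line is only approximately consistent.

```python
"""Consistency vs. simplicity on a tiny curve-fitting problem (hypothetical data)."""

# Hypothetical training data: samples of f(x) = 2x + 1 with small fixed noise.
TRAIN_X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
NOISE = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
TRAIN_Y = [2 * x + 1 + e for x, e in zip(TRAIN_X, NOISE)]

# Hypothetical test points taken from the noise-free target.
TEST_X = [6.0, 7.0]
TEST_Y = [2 * x + 1 for x in TEST_X]


def interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1: passes through every training point."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h


def least_squares_line(xs, ys):
    """Simple hypothesis h(x) = a*x + b fitted by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def mse(h, xs, ys):
    """Mean squared error of hypothesis h on the given examples."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


complex_h = interpolant(TRAIN_X, TRAIN_Y)
simple_h = least_squares_line(TRAIN_X, TRAIN_Y)

for name, h in [("degree-5 interpolant", complex_h), ("straight line", simple_h)]:
    print(f"{name}: train MSE={mse(h, TRAIN_X, TRAIN_Y):.4f}, "
          f"test MSE={mse(h, TEST_X, TEST_Y):.4f}")
```

The interpolant has zero training error but extrapolates wildly to the test points, while the line's small training error comes with a small test error: the simpler, approximately consistent hypothesis generalizes better.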

## Learning Decision Trees

A decision tree represents a function that takes as input a vector of attribute values and returns a *decision*: a single output value.

| A | B | A xor B |
|---|---|---------|
| F | F | F |
| F | T | T |
| T | F | T |
| T | T | F |

[figure: the equivalent decision tree, testing A at the root and B on each branch, with leaves F, T, T, F]

We will now outline a supervised learning method for constructing decision trees from labeled data (input-output pairs).

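## Attribute-based representations

Examples are described by attribute values (Boolean, discrete, continuous, etc.). E.g., situations where I will/won't wait for a table (attributes Alt through Est; target WillWait):

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | WillWait |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | T | F | F | T | Some | \$\$\$ | F | T | French | 0-10 | T |
| X2 | T | F | F | T | Full | \$ | F | F | Thai | | F |
| X3 | F | T | F | F | Some | \$ | F | F | Burger | 0-10 | T |
| X4 | T | F | T | T | Full | \$ | F | F | Thai | | T |
| X5 | T | F | T | F | Full | \$\$\$ | F | T | French | >60 | F |
| X6 | F | T | F | T | Some | \$\$ | T | T | Italian | 0-10 | T |
| X7 | F | T | F | F | None | \$ | T | F | Burger | 0-10 | F |
| X8 | F | F | F | T | Some | \$\$ | T | T | Thai | 0-10 | T |
| X9 | F | T | T | F | Full | \$ | T | F | Burger | >60 | F |
| X10 | T | T | T | T | Full | \$\$\$ | F | T | Italian | | F |
| X11 | F | F | F | F | None | \$ | F | F | Thai | 0-10 | F |
| X12 | T | T | T | T | Full | \$ | F | F | Burger | | T |

The classification of each example is positive (T) or negative (F).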

## Decision trees

One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait:

[figure: the true tree; the root tests Patrons? (None → F, Some → T, Full → WaitEstimate?), and deeper nodes test WaitEstimate?, Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?]

## Expressiveness

Decision trees can express any function of the input attributes. E.g., for Boolean functions, each truth-table row corresponds to a path to a leaf (as in the A xor B table and tree above).

Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless $f$ is nondeterministic in $x$), but it probably won't generalize to new examples. We prefer to find more compact decision trees.
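The "one path per truth-table row" construction can be sketched in Python as follows. This is an illustration, not a learning algorithm: the helper names are hypothetical, the encoding (leaf = `bool`, node = `(attribute, {value: subtree})`) is an assumption, and the target function (majority of three Boolean attributes) is an arbitrary example.

```python
from itertools import product


def tree_from_table(table, attributes):
    """Build a decision tree with one root-to-leaf path per truth-table row.

    `table` maps a tuple of attribute values (in `attributes` order) to the output.
    Leaves are bools; internal nodes are (attribute, {value: subtree}) pairs.
    """
    def build(prefix):
        depth = len(prefix)
        if depth == len(attributes):
            return table[prefix]  # a complete row: its output becomes the leaf
        return (attributes[depth],
                {v: build(prefix + (v,)) for v in (False, True)})
    return build(())


def classify(tree, example):
    """Follow the path selected by the example's attribute values."""
    while not isinstance(tree, bool):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


# Hypothetical target: majority of three Boolean attributes.
ATTRS = ["A", "B", "C"]
MAJORITY = {row: sum(row) >= 2 for row in product([False, True], repeat=3)}
tree = tree_from_table(MAJORITY, ATTRS)

print(classify(tree, {"A": True, "B": False, "C": True}))  # True
```

The resulting tree is consistent with every row, but it simply memorizes the table: it has $2^n$ leaves and says nothing useful about inputs outside the training set, which is why compact trees are preferred.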

## Hypothesis spaces

How many distinct decision trees with $n$ Boolean attributes? As many as there are Boolean functions of $n$ inputs (counting trees that compute the same function only once), i.e., the number of distinct truth tables with $2^n$ rows, which is $2^{2^n}$. E.g., with 6 Boolean attributes, there are $2^{64}$ = 18,446,744,073,709,551,616 trees.

How many purely conjunctive hypotheses (e.g., $Hungry \land \neg Rain$)? Each attribute can be in (positive), in (negative), or out, so there are $3^n$ distinct conjunctive hypotheses.

A more expressive hypothesis space:

- increases the chance that the target function can be expressed;
- increases the number of hypotheses consistent with the training set, and so may give worse predictions.
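Both counts can be checked with a short Python sketch; the function names are illustrative. `enumerate_conjunctions` lists every conjunction by choosing, for each attribute, positive, negated, or left out, which is exactly where the $3^n$ comes from.

```python
from itertools import product


def num_boolean_functions(n):
    """Number of distinct truth tables with 2**n rows: 2 ** (2 ** n)."""
    return 2 ** (2 ** n)


def enumerate_conjunctions(attributes):
    """Each attribute is positive, negated, or left out of the conjunction.

    The all-"out" choice yields the empty conjunction (always true).
    """
    for choice in product(("pos", "neg", "out"), repeat=len(attributes)):
        yield [a if c == "pos" else "not " + a
               for a, c in zip(attributes, choice) if c != "out"]


print(num_boolean_functions(6))  # 18446744073709551616
conjunctions = list(enumerate_conjunctions(["Hungry", "Rain"]))
print(len(conjunctions))         # 9, i.e. 3 ** 2
```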

## Decision tree learning

Aim: find a small tree consistent with the training examples.

- Idea 1: (recursively) choose the "most significant" attribute as the root of the (sub)tree, i.e., as the attribute to branch on next.
- Idea 2: a good attribute splits the examples into subsets that are (ideally) all positive or all negative.
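Information gain (listed in the summary) is one standard way to make "most significant attribute" precise: pick the attribute whose split most reduces the entropy of the labels. Below is a minimal ID3-style Python sketch of greedy top-down induction, assuming the 12 restaurant examples from the table above. The Est column is left out because several of its values are missing in this transcription; all helper names are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["Alt", "Bar", "Fri", "Hun", "Pat", "Price", "Rain", "Res", "Type"]
# Restaurant examples X1..X12 (Est omitted); the last field is WillWait.
ROWS = [
    "T F F T Some $$$ F T French T",
    "T F F T Full $ F F Thai F",
    "F T F F Some $ F F Burger T",
    "T F T T Full $ F F Thai T",
    "T F T F Full $$$ F T French F",
    "F T F T Some $$ T T Italian T",
    "F T F F None $ T F Burger F",
    "F F F T Some $$ T T Thai T",
    "F T T F Full $ T F Burger F",
    "T T T T Full $$$ F T Italian F",
    "F F F F None $ F F Thai F",
    "T T T T Full $ F F Burger T",
]
EXAMPLES = [(dict(zip(ATTRS, r.split()[:-1])), r.split()[-1]) for r in ROWS]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(examples, attribute):
    """Entropy before the split minus the weighted entropy of the subsets."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [y for x, y in examples if x[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def plurality(examples):
    """Most common label among the examples."""
    return Counter(y for _, y in examples).most_common(1)[0][0]


def learn_tree(examples, attributes):
    """Greedy top-down induction: split on the attribute with the highest gain.

    Only values seen in `examples` get a branch, so subsets are never empty.
    """
    labels = {y for _, y in examples}
    if len(labels) == 1:
        return labels.pop()          # pure node: a leaf
    if not attributes:
        return plurality(examples)   # identical inputs with mixed labels
    best = max(attributes, key=lambda a: information_gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = learn_tree(subset, rest)
    return (best, branches)


def classify(tree, x):
    """Internal nodes are (attribute, branches) tuples; leaves are "T"/"F"."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[x[attribute]]
    return tree


for a in ATTRS:
    print(f"Gain({a}) = {information_gain(EXAMPLES, a):.3f}")
tree = learn_tree(EXAMPLES, ATTRS)
print("root:", tree[0])
```

On these examples Patrons has the largest gain (about 0.541 bits) while Type has zero gain, so Patrons? becomes the root, as in the learned tree on the next slide. Deeper splits depend on how ties are broken, so the subtrees need not match the slide exactly.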

## Example contd.

Decision tree learned from the 12 examples:

[figure: learned tree; Patrons? (None → F, Some → T, Full → Hungry?); Hungry? (No → F, Yes → Type?); Type? (French → T, Italian → F, Thai → Fri/Sat?, Burger → T); Fri/Sat? (No → F, Yes → T)]

Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by such a small amount of data.

## How do we know that $h \approx f$?

Performance measurement:

1. Use theorems of computational/statistical learning theory.
2. Try $h$ on a new test set of examples (using the same distribution over the example space as the training set).

Learning curve = % correct on the test set as a function of training set size.

[figure: learning curve; x-axis: training set size; y-axis: % correct on test set]
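A learning curve can be traced with a few lines of Python. This sketch makes several assumptions for the sake of reproducibility: a hypothetical one-dimensional target $f(x) = (x \ge 0.37)$, a simple consistent learner that places its threshold midway between the largest negative and the smallest positive training input, and evenly spaced inputs (rather than random samples) for both training and test sets.

```python
"""Learning curve for a 1-D threshold learner (hypothetical target and data)."""

TRUE_THRESHOLD = 0.37  # hypothetical target: f(x) = x >= 0.37


def f(x):
    return x >= TRUE_THRESHOLD


def grid(n):
    """n evenly spaced inputs in (0, 1), used for both training and test sets."""
    return [(i + 0.5) / n for i in range(n)]


def learn_threshold(train):
    """Consistent hypothesis: put the threshold midway between the two classes."""
    negatives = [x for x, y in train if not y]
    positives = [x for x, y in train if y]
    lo = max(negatives, default=0.0)
    hi = min(positives, default=1.0)
    t = (lo + hi) / 2
    return lambda x: x >= t


def accuracy(h, examples):
    """Fraction of (x, y) examples that h labels correctly."""
    return sum(h(x) == y for x, y in examples) / len(examples)


TEST_SET = [(x, f(x)) for x in grid(1000)]


def learning_curve(sizes):
    """(training set size, test-set accuracy) for each size."""
    curve = []
    for n in sizes:
        train = [(x, f(x)) for x in grid(n)]
        curve.append((n, accuracy(learn_threshold(train), TEST_SET)))
    return curve


for n, acc in learning_curve([2, 10, 100]):
    print(f"training size {n:4d}: {100 * acc:.1f}% correct on test set")
```

Here the target is realizable by the hypothesis class, so test accuracy climbs toward 100% as the training set grows (87%, 97%, and 100% for the three sizes).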

## Performance measurement contd.

The learning curve depends on:

- realizable (can express the target function) vs. non-realizable; non-realizability can be due to missing attributes or a restricted hypothesis class (e.g., thresholded linear functions);
- redundant expressiveness (e.g., loads of irrelevant attributes).

[figure: % correct (up to 1) vs. # of examples, with curves for the realizable, redundant, and non-realizable cases]

## Performance measurement contd. II

Still, how do we know that $h \approx f$? This is Hume's Problem of Induction. From Wikipedia: the problem of induction is the philosophical question of whether inductive reasoning leads to knowledge understood in the classic philosophical sense, since it focuses on the lack of justification for either:

1. Generalizing about the properties of a class of objects based on some number of observations of particular instances of that class (for example, the inference that "all swans we have seen are white, and therefore all swans are white," before the discovery of black swans); or
2. Presupposing that a sequence of events in the future will occur as it always has in the past (for example, that the laws of physics will hold as they have always been observed to hold). Hume called this the principle of uniformity of nature.

## Classes of Learning Problems

- Classification: the output $y$ of the true function we learn is one of a finite set of values, e.g., wait or leave in a restaurant; sunny, cloudy, or rainy.
- Regression: the output $y$ of the true function we learn is a number, e.g., tomorrow's temperature.

Sometimes the function $f$ is stochastic; strictly speaking it is then not a function of $x$, so what we learn is a conditional probability distribution $P(Y \mid x)$.

## Statistical learning

- Training set (data): evidence, i.e., instantiations of all or some of the random variables describing the domain.
- Hypotheses: probabilistic theories of how the domain works.

## Learning a Probability Model

- Training set (data): $N$ example input-output pairs $(x_1, y_1), \dots, (x_N, y_N)$, where each $y_i$ was generated by an unknown function $y = f(x)$.
- Inductive learning: discover a hypothesis function $h$ that approximates the true function $f$, e.g., decision trees.
- Statistical learning: given a fixed structure of a probability model of the domain, discover its parameters from the data (*parameter learning*). Given the observed parameters of a problem instance, the learned probability model can then be used to answer queries about that instance.
- Classification: the observed parameters of a given instance, together with the learned probability model of the domain, provide probabilistic information on the likelihood of each particular classification.

## Classification Problems

Classification is the task of predicting labels (class variables) for inputs. It is commercially and scientifically important. Examples:

- Spam filtering
- Optical character recognition (OCR)
- Medical diagnosis
- Part-of-speech tagging
- Semantic role labeling / information extraction
- Automatic essay grading
- Fraud detection

## Probabilistic Models

A naive Bayes model:

$$P(\text{Cause}, \text{Effect}_1, \dots, \text{Effect}_n) = P(\text{Cause}) \prod_i P(\text{Effect}_i \mid \text{Cause}) \tag{1}$$

[figure: naive Bayes network with Cause (e.g., Cavity) as the single parent of Effect_1, ..., Effect_n (e.g., Toothache, Catch)]

Here Cause is taken to be the class variable, which is to be predicted; the attribute (parameter) variables are the leaves, the Effects. The model is "naive": it assumes the attribute variables are conditionally independent given the class.

Model training: use the training set to estimate the conditional probability distributions $P(\text{Effect}_i \mid \text{Cause}_j)$ for each class value $\text{Cause}_j$. Once the model is trained, given the values of the attributes of a problem instance, we can use (1) to classify the instance.
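A minimal Python sketch of classification with equation (1), using the Cavity example. All probabilities below are hypothetical values chosen for illustration, not parameters learned from data; `joint` evaluates the right-hand side of (1) and `posterior` normalizes over the class values to obtain $P(\text{Cause} \mid \text{effects})$.

```python
"""Classify with a naive Bayes model via equation (1); numbers are hypothetical."""

PRIOR = {"cavity": 0.2, "no cavity": 0.8}  # hypothetical P(Cause)
# Hypothetical P(Effect_i = true | Cause) for each effect.
LIKELIHOOD = {
    "cavity":    {"Toothache": 0.6, "Catch": 0.9},
    "no cavity": {"Toothache": 0.1, "Catch": 0.2},
}


def joint(cause, observed):
    """P(Cause, Effect_1, ..., Effect_n) = P(Cause) * prod_i P(Effect_i | Cause)."""
    p = PRIOR[cause]
    for effect, value in observed.items():
        p_true = LIKELIHOOD[cause][effect]
        p *= p_true if value else 1.0 - p_true
    return p


def posterior(observed):
    """Normalize the joint over the class values to get P(Cause | effects)."""
    scores = {c: joint(c, observed) for c in PRIOR}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


post = posterior({"Toothache": True, "Catch": True})
print(max(post, key=post.get), round(post["cavity"], 3))  # cavity 0.871
```

Classification only needs the argmax, so the normalization step can be skipped when the probability itself is not required.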

## Independence as Abstraction

The model is "naive": it assumes the attribute variables are conditionally independent given the class.

- This may lead to overconfidence: e.g., ALL-CAPS text in spam is not independent of \$\$ symbols.
- Yet it is often a fine abstraction, and a computationally tractable one.

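## Optical Character Recognition Example: Training a Model

Given a labeled collection $M$ of digits in digital form (an $n \times n$ grid).

Features: $\text{Pixel}_{i,j}$ = on or off, and Adj.

A naive Bayes model:

$$P(\text{Digit}, \text{Pixel}_{1,1}, \dots, \text{Pixel}_{n,n}, \text{Adj}) = P(\text{Digit}) \prod_{i,j} P(\text{Pixel}_{i,j} \mid \text{Digit}) \, P(\text{Adj} \mid \text{Digit})$$

Model training process, for $M$:

$$P(0) = \frac{\text{count}(M, 0)}{|M|}, \; \dots, \; P(9) = \frac{\text{count}(M, 9)}{|M|}$$

$$P(\text{pixel}_{1,1} = \text{on} \mid 0) = \frac{\text{count}(M, 0, \text{on}, 1, 1)}{\text{count}(M, 0)}, \; \dots$$

$$P(\text{pixel}_{1,1} = \text{off} \mid 0) = 1 - P(\text{pixel}_{1,1} = \text{on} \mid 0), \; \dots$$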
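The counting formulas above can be sketched in Python as follows. Assumptions: a digit image is a tuple of rows of 0/1 pixels, the training data is a tiny hypothetical set of 2x2 "digits", and the Adj feature is left out because the slides do not define it. Pixel indices in the code are zero-based.

```python
from collections import Counter, defaultdict


def train(examples):
    """Relative-frequency estimates for a naive Bayes digit model.

    `examples` is a list of (grid, label); a grid is a tuple of rows of 0/1 pixels.
    Returns P(label) and P(pixel (i, j) = on | label); P(off | label) is 1 minus that.
    """
    label_counts = Counter(label for _, label in examples)
    on_counts = defaultdict(Counter)  # label -> Counter of (i, j) pixels that are on
    for grid, label in examples:
        for i, row in enumerate(grid):
            for j, pixel in enumerate(row):
                if pixel:
                    on_counts[label][(i, j)] += 1
    total = len(examples)
    prior = {d: c / total for d, c in label_counts.items()}
    n = len(examples[0][0])
    p_on = {d: {(i, j): on_counts[d][(i, j)] / label_counts[d]
                for i in range(n) for j in range(n)}
            for d in label_counts}
    return prior, p_on


# Hypothetical 2x2 "digits" (real OCR grids are much larger).
DATA = [
    (((1, 1), (1, 1)), 0),
    (((1, 1), (0, 1)), 0),
    (((0, 1), (0, 1)), 1),
    (((0, 1), (0, 1)), 1),
    (((1, 1), (0, 1)), 1),
]
prior, p_on = train(DATA)
print(prior)          # {0: 0.4, 1: 0.6}
print(p_on[1][(1, 0)])  # 0.0
```

No training "1" has pixel (1, 0) on, so its relative-frequency estimate is exactly 0; this is the overfitting problem that smoothing addresses.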

## Example: Classification in OCR

Given the parameters (attributes, features) of an unseen instance and the trained model, we can compute

$$P(0, \text{pixel}_{1,1} = \text{on}, \dots, \text{pixel}_{n,n} = \text{off}, \text{adj} = \text{true}) = x_0$$
$$\vdots$$
$$P(9, \text{pixel}_{1,1} = \text{on}, \dots, \text{pixel}_{n,n} = \text{off}, \text{adj} = \text{true}) = x_9$$

and then pick the most likely class, i.e., the class that corresponds to the maximum value among $x_0, \dots, x_9$.
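A minimal Python sketch of this argmax, assuming hand-picked hypothetical parameters for two classes on a 2x2 grid (Adj again omitted). It sums log-probabilities instead of multiplying the probabilities; that is a common implementation choice, not something the slides prescribe, and it avoids numerical underflow on realistic grids with many pixels. Because the logarithm is monotonic, the winning class is the same as for $x_0, \dots, x_9$.

```python
from math import log


def classify(grid, prior, p_on):
    """Return argmax_d P(d) * prod_{i,j} P(pixel_ij | d), computed in log space.

    Returns None only if every class has probability 0 for this grid.
    """
    best_label, best_score = None, float("-inf")
    for d, p_d in prior.items():
        score = log(p_d)
        for i, row in enumerate(grid):
            for j, pixel in enumerate(row):
                p = p_on[d][(i, j)] if pixel else 1.0 - p_on[d][(i, j)]
                score += log(p) if p > 0 else float("-inf")
        if score > best_score:
            best_label, best_score = d, score
    return best_label


# Hypothetical trained parameters for two classes on a 2x2 grid.
PRIOR = {"0": 0.5, "1": 0.5}
P_ON = {
    "0": {(0, 0): 0.9, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.9},
    "1": {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.1, (1, 1): 0.9},
}
print(classify(((1, 1), (1, 1)), PRIOR, P_ON))  # 0
print(classify(((0, 1), (0, 1)), PRIOR, P_ON))  # 1
```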

## Evaluation

Split labeled data into three categories (e.g., 80/10/10 or 60/20/20):

1. Training set
2. Held-out set
3. Test set

Decide on features (parameters, attributes): attribute-value pairs that characterize each instance.

Experimentation-evaluation cycle:

1. Learn parameters (e.g., model probabilities) on the training set.
2. Tune the set of features on the held-out set.
3. Compute accuracy on the test set: accuracy = fraction of instances predicted correctly.
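The split and the accuracy measure can be sketched in Python as below. The function names, the fixed shuffle seed, and the toy "is x even?" data are illustrative assumptions; shuffling before cutting the data is meant to keep the three sets drawn from the same distribution.

```python
import random


def three_way_split(data, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle a copy of `data` and cut it into training, held-out, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_held = round(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_held],
            items[n_train + n_held:])


def accuracy(predict, examples):
    """Fraction of (x, y) examples for which predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


# Hypothetical labeled data: is x even?
DATA = [(x, x % 2 == 0) for x in range(100)]
train, held_out, test = three_way_split(DATA)
print(len(train), len(held_out), len(test))  # 80 10 10
print(accuracy(lambda x: x % 2 == 0, test))  # 1.0
```

The test set should be touched only once, at the end of the cycle; tuning features against it would make the reported accuracy optimistic.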

## Feature Engineering

Feature engineering is crucial! Features translate into the hypothesis space:

- Too few features: cannot fit the data.
- Too many features: overfitting.

## Generalization and Overfitting

Relative-frequency parameters will overfit the training data:

- Just because the training set did not contain a 3 with pixel $(i, j)$ on does not mean that such a 3 does not exist (but note that we will assign probability 0 to such an event!).
- It is unlikely that every occurrence of "minute" is 100% spam.
- It is unlikely that every occurrence of "seriously" is 100% ham.
- Similarly, what happens to words that never occur in the training set?

Unseen events should not be assigned 0 probability. To generalize better, smoothing is essential.

## Intuitions Behind Smoothing

- We have some prior expectation about the parameters.
- Given little evidence, we should prefer the prior.
- Given a lot of evidence, the data should rule.

The maximum likelihood estimate

$$P_{ML}(x) = \frac{\text{count}(x)}{\text{total samples}}$$

does not account for the above intuitions. Consider three coin flips: Head, Head, Tail; what is $P_{ML}(x)$?
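A minimal Python sketch of the maximum likelihood estimate; `mle` is an illustrative name, and exact fractions are used only to make the result easy to read.

```python
from collections import Counter
from fractions import Fraction


def mle(samples):
    """P_ML(x) = count(x) / total samples, as exact fractions."""
    counts = Counter(samples)
    total = len(samples)
    return {x: Fraction(c, total) for x, c in counts.items()}


print(mle(["Head", "Head", "Tail"]))  # {'Head': Fraction(2, 3), 'Tail': Fraction(1, 3)}
```

After only three flips, $P_{ML}$ already commits fully to the observed frequencies, and any outcome that never appeared would get probability 0.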

## Estimation: Laplace Smoothing

Laplace's estimate:

$$P_{LAP}(x) = \frac{\text{count}(x) + 1}{\text{total samples} + |X|}$$

where $|X|$ is the number of possible values of $X$.

- Pretend that every outcome appeared once more than it actually did.
- Note how elegantly it deals with previously unseen events.

Laplace's estimate extended with a strength factor $k$:

$$P_{LAP,k}(x) = \frac{\text{count}(x) + k}{\text{total samples} + k|X|}$$

Consider three coin flips: Head, Head, Tail; what are $P_{ML}(x)$, $P_{LAP}(x)$, and $P_{LAP,k}(x)$?

There are many ways to introduce smoothing, as well as methods to account for unknown events.
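A minimal Python sketch of $P_{LAP,k}$ applied to the coin-flip question; `laplace` is an illustrative name. The set of possible outcomes has to be passed explicitly, because outcomes that never appear in the samples still need their $+k$.

```python
from collections import Counter
from fractions import Fraction


def laplace(samples, outcomes, k=1):
    """P_LAP,k(x) = (count(x) + k) / (total samples + k * |X|); k=1 gives P_LAP."""
    counts = Counter(samples)
    denom = len(samples) + k * len(outcomes)
    return {x: Fraction(counts[x] + k, denom) for x in outcomes}


flips = ["Head", "Head", "Tail"]
print(laplace(flips, ["Head", "Tail"]))         # {'Head': Fraction(3, 5), 'Tail': Fraction(2, 5)}
print(laplace(flips, ["Head", "Tail"], k=100))  # {'Head': Fraction(102, 203), 'Tail': Fraction(101, 203)}
```

With $k = 0$ the estimate reduces to $P_{ML}$ (2/3 and 1/3); as $k$ grows it is pulled toward the uniform prior of 1/2 each, matching the intuition that little evidence should defer to the prior. With no samples at all, every outcome gets $1/|X|$ instead of an undefined or zero estimate.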

## Summary

- Learning is needed for unknown environments (and lazy designers).
- The learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation.
- For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with the training examples.
- Decision tree learning using information gain.
- Learning performance = prediction accuracy measured on a test set.
- Learning models: naive Bayes nets.
- Solving classification problems by means of naive Bayes nets.
- Smoothing.
- Evaluation concepts.


### Projektgruppe. Categorization of text documents via classification

Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

### In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

### Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

### Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

### Decision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.

Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 42-268-7599 Copyright Andrew W. Moore Slide Decision Trees Decision trees

### D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### Monday Morning Data Mining

Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

### Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

### Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### Professor Anita Wasilewska. Classification Lecture Notes

Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,

### COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### Using Artificial Intelligence to Manage Big Data for Litigation

FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear

### Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

### Machine Learning for natural language processing

Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to

### Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

### Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

### Less naive Bayes spam detection

Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems

### Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

### F. Aiolli - Sistemi Informativi 2007/2008

Text Categorization Text categorization (TC - aka text classification) is the task of buiding text classifiers, i.e. sofware systems that classify documents from a domain D into a given, fixed set C =