Attribution. Modified from Stuart Russell's slides (Berkeley). Parts of the slides are inspired by Dan Klein's lecture material for CS 188 (Berkeley).


Machine Learning


Outline
- Inductive learning
- Decision tree learning
- Measuring learning performance
- Statistical learning
- Naive Bayes learning
- Classification
- Evaluation

Inductive Learning
- Training set (data): N examples of input-output pairs $(x_1, y_1), \ldots, (x_N, y_N)$, where each $y_i$ was generated by an unknown function $y = f(x)$.
- Learning: discover a hypothesis function $h$ that approximates the true function $f$.
- Test set: used to measure the accuracy of hypothesis $h$.
- Hypothesis $h$ generalizes well if it correctly predicts the value of $y$ for novel examples.
- Hypothesis space: the set of hypotheses the learner considers; the learning problem is realizable if the hypothesis space contains the true function $f$.

Kinds of Learning
Three types of feedback determine the main kinds of (machine) learning:
- Supervised learning: requires a collection of sample input-output pairs (problem instance, correct answer) from which it learns a function that maps inputs to outputs. In other words, it requires a teacher.
- Unsupervised learning: learns patterns in the input without explicit feedback, e.g., clustering. Requires no teacher.
- Reinforcement learning: occasional rewards reinforce or inhibit certain sequences of actions. It is harder, but requires no teacher.
A hybrid setting also arises in practice:
- Semi-supervised learning: too few labeled examples, whose labels are not necessarily accurate, to be combined with unlabeled data.

Inductive Learning (a.k.a. Science)
- Simplest form: learn a function from examples, starting tabula rasa ("blank slate" in Latin).
- $f$ is the target function.
- An example is an input-output pair $(x, f(x))$, e.g., [Figure: a tic-tac-toe board position with O and X marks, paired with the value +1].
- Problem: find a hypothesis $h$ such that $h \approx f$, given a training set of examples.

Inductive Learning Method
- Construct/adjust $h$ to agree with $f$ on the training set ($h$ is consistent if it agrees with $f$ on all examples).
- E.g., curve fitting: [Figure: data points plotted as $f(x)$ against $x$, with candidate hypotheses $h$ fitted to them]
- Ockham's razor (after William of Ockham): maximize a combination of consistency and simplicity.
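To make the consistency/simplicity trade-off concrete, here is a minimal Python sketch. The five data points are invented for illustration, and the helper names (`fit_line`, `lagrange`, `training_error`) are our own: a least-squares straight line is a simple hypothesis that is only approximately consistent, while the Lagrange interpolating polynomial through all five points is consistent (zero training error) but has degree 4.

```python
# Hypothetical training set of (x, f(x)) pairs, invented for illustration.
points = [(0.0, 1.0), (1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8)]

def fit_line(pts):
    """Least-squares line h(x) = a + b*x: a simple hypothesis."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def lagrange(pts):
    """Interpolating polynomial through every point: a consistent hypothesis."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def training_error(h, pts):
    """Sum of squared errors of hypothesis h on the training points."""
    return sum((h(x) - y) ** 2 for x, y in pts)

line, poly = fit_line(points), lagrange(points)
print("line SSE:", round(training_error(line, points), 4))
print("poly SSE:", round(training_error(poly, points), 4))
# Compare how the two hypotheses extrapolate outside the training range.
print("line(8):", round(line(8.0), 2), "poly(8):", round(poly(8.0), 2))
```

Both hypotheses track the data between x = 0 and x = 4, but at x = 8 the line predicts about 8.8 while the interpolating polynomial predicts about -93.8, which is the kind of behavior Ockham's razor warns against trusting without more evidence.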

Learning Decision Trees
- A decision tree represents a function that takes as input a vector of attribute values and returns a decision, a single output value. E.g., the Boolean function A xor B:

A | B | A xor B
F | F | F
F | T | T
T | F | T
T | T | F

[Figure: the equivalent decision tree tests A at the root and B on each branch; its leaves are F, T under A = F and T, F under A = T]

- We will now outline a supervised learning method for constructing decision trees from labeled data (input-output pairs).

Attribute-based Representations
- Examples are described by attribute values (Boolean, discrete, continuous, etc.).
- E.g., situations where I will/won't wait for a table:

Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | WillWait
X1 | T | F | F | T | Some | \$\$\$ | F | T | French | 0–10 | T
X2 | T | F | F | T | Full | \$ | F | F | Thai | ? | F
X3 | F | T | F | F | Some | \$ | F | F | Burger | 0–10 | T
X4 | T | F | T | T | Full | \$ | F | F | Thai | ? | T
X5 | T | F | T | F | Full | \$\$\$ | F | T | French | >60 | F
X6 | F | T | F | T | Some | \$\$ | T | T | Italian | 0–10 | T
X7 | F | T | F | F | None | \$ | T | F | Burger | 0–10 | F
X8 | F | F | F | T | Some | \$\$ | T | T | Thai | 0–10 | T
X9 | F | T | T | F | Full | \$ | T | F | Burger | >60 | F
X10 | T | T | T | T | Full | \$\$\$ | F | T | Italian | ? | F
X11 | F | F | F | F | None | \$ | F | F | Thai | 0–10 | F
X12 | T | T | T | T | Full | \$ | F | F | Burger | ? | T

- Classification of examples is positive (T) or negative (F).

Decision Trees
One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait:

Patrons?
  None: F
  Some: T
  Full: WaitEstimate?
    >60: F
    30–60: Alternate?
      No: Reservation?
        No: Bar?
          No: F
          Yes: T
        Yes: T
      Yes: Fri/Sat?
        No: F
        Yes: T
    10–30: Hungry?
      No: T
      Yes: Alternate?
        No: T
        Yes: Raining?
          No: F
          Yes: T
    0–10: T

Expressiveness
- Decision trees can express any function of the input attributes.
- E.g., for Boolean functions, each truth-table row corresponds to a path from the root to a leaf: for A xor B, the four rows map to the four leaves of the tree that tests A and then B.
- Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
- Prefer to find more compact decision trees.
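The trivial "one path per example" tree can be sketched in Python. `tree_from_examples` and `classify` are hypothetical helpers written for this illustration: the tree tests the attributes in a fixed order, so every training example ends at its own leaf and the tree is consistent by construction, yet a leaf that no training example reached has no label to offer.

```python
from itertools import product

def tree_from_examples(examples, attrs):
    """Build a tree that tests attrs in order, giving each example its own path.

    examples: list of (assignment, label) with assignment a dict attr -> bool.
    Internal nodes are dicts {"attr": name, False: subtree, True: subtree};
    leaves are labels. A leaf that no example reaches holds None (no
    prediction); if several examples reach one leaf, the first label is used.
    """
    if not attrs:
        return examples[0][1] if examples else None
    first, rest = attrs[0], attrs[1:]
    return {
        "attr": first,
        False: tree_from_examples([e for e in examples if not e[0][first]], rest),
        True: tree_from_examples([e for e in examples if e[0][first]], rest),
    }

def classify(tree, x):
    """Follow the branches selected by x's attribute values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree[x[tree["attr"]]]
    return tree

# Full truth table of A xor B: each row becomes one root-to-leaf path.
xor_rows = [({"A": a, "B": b}, a != b) for a, b in product([False, True], repeat=2)]
xor_tree = tree_from_examples(xor_rows, ["A", "B"])
print(all(classify(xor_tree, x) == y for x, y in xor_rows))  # True

# Trained on only three rows, the tree has nothing to say about the fourth.
partial = tree_from_examples(xor_rows[:3], ["A", "B"])
print(classify(partial, {"A": True, "B": True}))  # None
```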

Hypothesis Spaces
- How many distinct decision trees with n Boolean attributes?
  = number of Boolean functions
  = number of distinct truth tables with $2^n$ rows
  = $2^{2^n}$
- E.g., with 6 Boolean attributes, there are $2^{64}$ = 18,446,744,073,709,551,616 trees.
- How many purely conjunctive hypotheses (e.g., $Hungry \land \lnot Rain$)?
  Each attribute can be in (positive), in (negative), or out, so there are $3^n$ distinct conjunctive hypotheses.
- A more expressive hypothesis space:
  - increases the chance that the target function can be expressed;
  - increases the number of hypotheses consistent with the training set, so it may get worse predictions.
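The counts above are easy to check with Python's arbitrary-precision integers; this small sketch (function names are our own) tabulates both quantities for n = 1, ..., 6. Note that $3^n$ includes the empty conjunction, i.e., the hypothesis that is always true.

```python
def num_boolean_functions(n):
    """Distinct truth tables with 2**n rows, i.e. 2**(2**n) Boolean functions."""
    return 2 ** (2 ** n)

def num_conjunctions(n):
    """Each attribute is in (positive), in (negative), or out: 3**n choices."""
    return 3 ** n

for n in range(1, 7):
    # The n = 6 row ends with 18446744073709551616.
    print(n, num_conjunctions(n), num_boolean_functions(n))
```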

Decision Tree Learning
- Aim: find a small tree consistent with the training examples.
- Idea 1: (recursively) choose the most significant attribute as the root of the (sub)tree to branch on next.
- Idea 2: a good attribute splits the examples into subsets that are (ideally) all positive or all negative.
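The summary slide names information gain as the way to measure the "most significant attribute". A minimal Python sketch on the 12 restaurant examples from the attribute table (only the Pat and Type columns and the WillWait label are copied; helper names are our own), with entropy measured in bits:

```python
from collections import Counter
from math import log2

# (Pat, Type, WillWait) for examples X1..X12, copied from the attribute table.
EXAMPLES = [
    ("Some", "French", True), ("Full", "Thai", False), ("Some", "Burger", True),
    ("Full", "Thai", True), ("Full", "French", False), ("Some", "Italian", True),
    ("None", "Burger", False), ("Some", "Thai", True), ("Full", "Burger", False),
    ("Full", "Italian", False), ("None", "Thai", False), ("Full", "Burger", True),
]
COLUMNS = {"Pat": 0, "Type": 1}

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(examples, column):
    """Entropy before the split minus the size-weighted entropy of the subsets."""
    labels = [e[-1] for e in examples]
    subsets = {}
    for e in examples:
        subsets.setdefault(e[column], []).append(e[-1])
    remainder = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

for name, col in COLUMNS.items():
    print(name, round(information_gain(EXAMPLES, col), 3))
```

Patrons gains about 0.541 bits, while Type gains nothing (every Type value splits the examples 50/50), which is consistent with the learned tree testing Patrons? at its root.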

Example contd.
Decision tree learned from the 12 examples:

Patrons?
  None: F
  Some: T
  Full: Hungry?
    No: F
    Yes: Type?
      French: T
      Italian: F
      Thai: Fri/Sat?
        No: F
        Yes: T
      Burger: T

- Substantially simpler than the true tree: a more complex hypothesis isn't justified by a small amount of data.
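Because the learned tree is small, it can be written directly as nested conditionals. This sketch (function and variable names are our own) checks that it classifies all 12 training examples from the attribute table correctly, i.e., that it is consistent; only the columns the tree tests are copied.

```python
def will_wait(patrons, hungry, food_type, fri_sat):
    """The tree learned from the 12 examples, as nested conditionals."""
    if patrons == "None":
        return False
    if patrons == "Some":
        return True
    # patrons == "Full": test Hungry?
    if not hungry:
        return False
    # Hungry: test Type?
    if food_type == "French":
        return True
    if food_type == "Italian":
        return False
    if food_type == "Thai":
        return fri_sat  # test Fri/Sat?
    return True  # Burger

# (Pat, Hun, Type, Fri, WillWait) for X1..X12, copied from the attribute table.
EXAMPLES = [
    ("Some", True, "French", False, True),
    ("Full", True, "Thai", False, False),
    ("Some", False, "Burger", False, True),
    ("Full", True, "Thai", True, True),
    ("Full", False, "French", True, False),
    ("Some", True, "Italian", False, True),
    ("None", False, "Burger", False, False),
    ("Some", True, "Thai", False, True),
    ("Full", False, "Burger", True, False),
    ("Full", True, "Italian", True, False),
    ("None", False, "Thai", False, False),
    ("Full", True, "Burger", True, True),
]

mistakes = [i for i, (pat, hun, typ, fri, label) in enumerate(EXAMPLES, start=1)
            if will_wait(pat, hun, typ, fri) != label]
print("misclassified examples:", mistakes)  # misclassified examples: []
```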

How Do We Know That $h \approx f$? Performance Measurement
1) Use theorems of computational/statistical learning theory.
2) Try $h$ on a new test set of examples (drawn from the same distribution over the example space as the training set).
- Learning curve = % correct on the test set as a function of training set size.
[Figure: learning curve, % correct on test set vs. training set size]

Performance Measurement contd.
- The learning curve depends on:
  - realizable (can express the target function) vs. non-realizable; non-realizability can be due to missing attributes or a restricted hypothesis class (e.g., thresholded linear function);
  - redundant expressiveness (e.g., loads of irrelevant attributes).
[Figure: % correct (up to 1) vs. # of examples for the realizable, redundant, and nonrealizable cases]

Performance Measurement contd. II
- Still, how do we know that $h \approx f$?
- Hume's Problem of Induction. Wikipedia: "The problem of induction is the philosophical question of whether inductive reasoning leads to knowledge understood in the classic philosophical sense, since it focuses on the lack of justification for either:
  1. Generalizing about the properties of a class of objects based on some number of observations of particular instances of that class (for example, the inference that 'all swans we have seen are white, and therefore all swans are white,' before the discovery of black swans) or
  2. Presupposing that a sequence of events in the future will occur as it always has in the past (for example, that the laws of physics will hold as they have always been observed to hold). Hume called this the principle of uniformity of nature."

Classes of Learning Problems
- Classification: the output $y$ of the true function that we learn is one of a finite set of values, e.g., wait or leave in a restaurant; sunny, cloudy, or rainy.
- Regression: the output $y$ of the true function that we learn is a number, e.g., tomorrow's temperature.
- Sometimes the function $f$ is stochastic. Strictly speaking, it is then not a function of $x$, so what we learn is a conditional probability distribution $P(Y \mid x)$.

Statistical Learning
- Training set (data) = evidence: instantiations of all or some of the random variables describing the domain.
- Hypotheses are probabilistic theories of how the domain works.

Learning a Probability Model
- Training set (data): N examples of input-output pairs $(x_1, y_1), \ldots, (x_N, y_N)$, where each $y_i$ was generated by an unknown function $y = f(x)$.
- Inductive learning: discover a hypothesis function $h$ that approximates the true function $f$, e.g., decision trees.
- Statistical learning: given a fixed structure of a probability model of the domain, discover its parameters from the data (parameter learning). As a result, given the observed attribute values of a problem instance, the learned probability model can be used to answer queries about that instance.
- Classification: the observed attribute values of a given instance, together with the learned probability model of the domain, provide probabilistic information on the likelihood of each possible classification.

Classification Problems
- Classification is the task of predicting labels (class variables) for inputs.
- Commercially and scientifically important. Examples:
  - Spam filtering
  - Optical character recognition (OCR)
  - Medical diagnosis
  - Part-of-speech tagging
  - Semantic role labeling/information extraction
  - Automatic essay grading
  - Fraud detection

Probabilistic Models
- A naive Bayes model:
$$P(\text{Cause}, \text{Effect}_1, \ldots, \text{Effect}_n) = P(\text{Cause}) \prod_i P(\text{Effect}_i \mid \text{Cause}) \qquad (1)$$
[Figure: Bayes net with Cavity as the parent of Toothache and Catch; in general, Cause as the parent of $\text{Effect}_1, \ldots, \text{Effect}_n$]
- Cause is taken to be the class variable, which is to be predicted. The attribute (parameter) variables are the leaves, the Effects.
- The model is "naive": it assumes the attribute variables are conditionally independent of one another given Cause.
- Model training: use the training set to estimate the conditional probability distributions $P(\text{Effect}_i \mid \text{Cause}_j)$.
- Once the model is trained, given the attribute values of a problem instance, we can use (1) to classify the instance.
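Equation (1) becomes a classifier by evaluating it for each value of Cause and normalizing. The Python sketch below does this for the Cavity example; the probability values are assumptions chosen for illustration (the slide gives none), and the dictionary names are hypothetical.

```python
# Assumed, illustrative parameters for the Cavity example.
PRIOR = {True: 0.2, False: 0.8}                 # P(Cavity)
LIKELIHOOD = {                                  # P(Effect = true | Cavity)
    "Toothache": {True: 0.6, False: 0.1},
    "Catch": {True: 0.9, False: 0.2},
}

def joint(cause, effects):
    """Equation (1): P(Cause, Effect_1..n) = P(Cause) * prod_i P(Effect_i | Cause)."""
    p = PRIOR[cause]
    for name, value in effects.items():
        p_true = LIKELIHOOD[name][cause]
        p *= p_true if value else 1.0 - p_true
    return p

def posterior(effects):
    """Normalize the joint over the values of Cause to get P(Cause | effects)."""
    scores = {c: joint(c, effects) for c in PRIOR}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

print(posterior({"Toothache": True, "Catch": True}))
```

With these assumed numbers, observing both effects raises the probability of a cavity from the prior 0.2 to 0.108 / 0.124, roughly 0.87.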

Independence as Abstraction
- The model is "naive": it assumes the attribute variables are (conditionally) independent.
- This may lead to overconfidence: e.g., the use of ALL CAPS in spam is not independent of \$\$ symbols.
- Yet it is often a fine abstraction, and a computationally tractable one.

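Optical Character Recognition Example: Training a Model
- Given a labeled collection M of digits in digital form (an n×n grid).
- Features: $\text{Pixel}_{i,j}$ = on or off; Adj.
- A naive Bayes model:
$$P(\text{Digit}, \text{Pixel}_{1,1}, \ldots, \text{Pixel}_{n,n}, \text{Adj}) = P(\text{Digit}) \prod_{i,j} P(\text{Pixel}_{i,j} \mid \text{Digit})\, P(\text{Adj} \mid \text{Digit})$$
- Model training process, for M:
$$P(0) = \frac{\text{count}(M, 0)}{|M|}, \ldots, P(9) = \frac{\text{count}(M, 9)}{|M|}$$
$$P(\text{pixel}_{1,1} = \text{on} \mid 0) = \frac{\text{count}(M, 0, \text{on}_{1,1})}{\text{count}(M, 0)}, \ldots$$
$$P(\text{pixel}_{1,1} = \text{off} \mid 0) = 1 - P(\text{pixel}_{1,1} = \text{on} \mid 0), \ldots$$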
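The relative-frequency training process can be sketched in Python. This is a minimal illustration under stated assumptions: images are already flattened into tuples of 0/1 pixel values, the Adj feature is left out, and the four-pixel "images" in the usage lines are made up.

```python
from collections import Counter, defaultdict

def train(M):
    """Relative-frequency (maximum-likelihood) estimates for naive Bayes.

    M is a list of (pixels, digit) pairs, where pixels is a tuple of 0/1
    values (the n x n grid flattened row by row, 1 = on).
    Returns (prior, p_on): prior[d] = count(M, d) / |M| and
    p_on[d][k] = count(M, d, on_k) / count(M, d).
    P(pixel_k = off | d) is then 1 - p_on[d][k].
    """
    class_counts = Counter(digit for _, digit in M)
    on_counts = defaultdict(Counter)  # on_counts[d][k]: digits d with pixel k on
    for pixels, digit in M:
        for k, value in enumerate(pixels):
            if value:
                on_counts[digit][k] += 1
    n_pixels = len(M[0][0])
    prior = {d: c / len(M) for d, c in class_counts.items()}
    p_on = {d: [on_counts[d][k] / class_counts[d] for k in range(n_pixels)]
            for d in class_counts}
    return prior, p_on

# Made-up 2x2 "images" of two digit classes, for illustration only.
M = [((1, 0, 0, 1), 0), ((1, 1, 0, 1), 0), ((0, 1, 1, 0), 1), ((0, 1, 0, 0), 1)]
prior, p_on = train(M)
print(prior)  # {0: 0.5, 1: 0.5}
print(p_on)   # {0: [1.0, 0.5, 0.0, 1.0], 1: [0.0, 1.0, 0.5, 0.0]}
```

On such a small sample several estimates come out as exactly 0 or 1, which is the problem the overfitting and smoothing slides address.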

Example: Classification in OCR
- Given the features (attribute values) of an unseen instance and the trained model, we can compute
$$P(0, \text{pixel}_{1,1} = \text{on}, \ldots, \text{pixel}_{n,n} = \text{off}, \text{adj} = \text{true}) = x_0$$
$$\vdots$$
$$P(9, \text{pixel}_{1,1} = \text{on}, \ldots, \text{pixel}_{n,n} = \text{off}, \text{adj} = \text{true}) = x_9$$
- and then pick the most likely class, i.e., the class that corresponds to the maximum value among $x_0, \ldots, x_9$.
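Classification then reduces to an argmax over the joint probabilities. The sketch below (hypothetical names, made-up parameters for a four-pixel toy model with two digit classes, Adj omitted) compares log-joint scores rather than the products $x_0, \ldots, x_9$ themselves; taking logs does not change which class is largest, and it avoids floating-point underflow when an image has many pixels.

```python
from math import log

# Hypothetical parameters for a four-pixel toy model with digits 0 and 1
# (e.g., as produced by a smoothed training step; no value is exactly 0 or 1).
PRIOR = {0: 0.5, 1: 0.5}
P_ON = {0: [0.9, 0.5, 0.1, 0.9], 1: [0.1, 0.9, 0.5, 0.1]}

def log_joint(digit, pixels, prior, p_on):
    """log P(digit, pixel_1, ..., pixel_K) under the naive Bayes model."""
    score = log(prior[digit])
    for k, value in enumerate(pixels):
        p = p_on[digit][k]
        score += log(p if value else 1.0 - p)
    return score

def classify(pixels, prior, p_on):
    """Pick the digit whose joint probability (x_d on the slide) is largest."""
    return max(prior, key=lambda d: log_joint(d, pixels, prior, p_on))

print(classify((1, 0, 0, 1), PRIOR, P_ON))  # 0
print(classify((0, 1, 1, 0), PRIOR, P_ON))  # 1
```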

Evaluation
- Split labeled data into three categories (e.g., 80/10/10 or 60/20/20):
  1. Training set
  2. Held-out set
  3. Test set
- Decide on features (parameters, attributes): attribute-value pairs that characterize each instance.
- Experimentation-evaluation cycle:
  1. Learn parameters (e.g., model probabilities) on the training set.
  2. Tune the set of features on the held-out set.
  3. Compute accuracy on the test set: accuracy = fraction of instances predicted correctly.
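A minimal sketch of the three-way split and the accuracy measure in Python; `split` and `accuracy` are illustrative helpers, not part of any library, and the fixed seed simply makes the shuffle reproducible.

```python
import random

def split(data, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle labeled data and cut it into training, held-out and test sets."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_held = round(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_held],
            items[n_train + n_held:])

def accuracy(predict, test_set):
    """Fraction of (x, y) instances in test_set for which predict(x) == y."""
    correct = sum(1 for x, y in test_set if predict(x) == y)
    return correct / len(test_set)

data = [(i, i % 2) for i in range(100)]  # toy labeled data: (x, y) pairs
train, held_out, test = split(data)
print(len(train), len(held_out), len(test))
print(accuracy(lambda x: x % 2, test))  # a perfect predictor scores 1.0
```

The held-out set stays untouched while parameters are learned, so it can be used for tuning; the test set is consulted only for the final accuracy figure.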

Feature Engineering
- Feature engineering is crucial!
- Features translate into the hypothesis space:
  - Too few features: cannot fit the data.
  - Too many features: overfitting.

Generalization and Overfitting
- Relative-frequency parameters will overfit the training data:
  - The fact that the training set contained no 3 with $\text{pixel}_{i,j}$ on does not mean such a 3 cannot exist (yet relative frequencies assign that event probability 0!).
  - It is unlikely that every occurrence of "minute" is 100% spam.
  - It is unlikely that every occurrence of "seriously" is 100% ham.
  - Similarly, what happens to words that never occur in the training set?
- Unseen events should not be assigned probability 0.
- To generalize better, smoothing is essential.

Intuitions Behind Smoothing
Estimation: Smoothing
- We have some prior expectation about parameters:
  - given little evidence, we should prefer the prior;
  - given a lot of evidence, the data should rule.
- The maximum likelihood estimate
$$P_{ML}(x) = \frac{\text{count}(x)}{\text{total samples}}$$
does not account for these intuitions.
- Consider three coin flips: Head, Head, Tail. What is $P_{ML}(x)$?

Laplace's Estimate
Estimation: Laplace Smoothing
- Laplace's estimate, where $|X|$ is the number of possible values of $X$:
$$P_{LAP}(x) = \frac{\text{count}(x) + 1}{\text{total samples} + |X|}$$
  Pretend that every outcome appeared once more than it did.
- Note how it elegantly deals with previously unseen events.
- Laplace's estimate extended with a strength factor $k$:
$$P_{LAP,k}(x) = \frac{\text{count}(x) + k}{\text{total samples} + k|X|}$$
- Consider three coin flips: Head, Head, Tail. What are $P_{ML}(x)$, $P_{LAP}(x)$, and $P_{LAP,k}(x)$?
- There are many ways to introduce smoothing, as well as methods to account for unknown events.
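The coin-flip questions can be answered with exact rational arithmetic. In this sketch (helper names are our own), $|X| = 2$ for a coin, and $k = 100$ is an arbitrary strength chosen to show the prior taking over.

```python
from fractions import Fraction

def p_ml(x, samples):
    """Maximum-likelihood estimate: count(x) / total samples."""
    return Fraction(samples.count(x), len(samples))

def p_laplace(x, samples, outcomes, k=1):
    """Laplace estimate with strength k: (count(x) + k) / (N + k * |X|)."""
    return Fraction(samples.count(x) + k, len(samples) + k * len(outcomes))

flips = ["H", "H", "T"]   # Head, Head, Tail
outcomes = ["H", "T"]     # the possible values X of a coin flip

print(p_ml("H", flips))                        # 2/3
print(p_laplace("H", flips, outcomes))         # 3/5
print(p_laplace("H", flips, outcomes, k=100))  # 102/203

# An outcome that never occurred in the samples:
print(p_ml("T", ["H", "H", "H"]))                 # 0
print(p_laplace("T", ["H", "H", "H"], outcomes))  # 1/5
```

With k = 1 the estimate for Head moves from 2/3 to 3/5; with k = 100 it is 102/203, close to the uniform 1/2, matching the "given little evidence, prefer the prior" intuition. An outcome never seen in three flips gets probability 0 under $P_{ML}$ but 1/5 under $P_{LAP}$.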

Summary
- Learning is needed for unknown environments (and lazy designers).
- The learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation.
- For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with the training examples.
- Decision tree learning using information gain.
- Learning performance = prediction accuracy measured on a test set.
- Learning models: naive Bayes nets.
- Solving classification problems by means of naive Bayes nets.
- Smoothing.
- Evaluation concepts.


Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

Dan French Founder & CEO, Consider Solutions

Dan French Founder & CEO, Consider Solutions CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization CLIENTS CONTEXT The

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What s a concept? Classification,

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

Decision-Tree Learning

Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual

An introduction to machine learning

An introduction to machine learning Pierre Lison, Language Technology Group (LTG) Department of Informatics HiOA, October 3 2012 Outline Motivation Machine learning approaches My own research Conclusion

Projektgruppe. Categorization of text documents via classification

Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

A Gentle Introduction to Machine Learning

A Gentle Introduction to Machine Learning Second Lecture Part I Olov Andersson, AIICS Linköpings Universitet Outline of Machine Learning Lectures First we will talk about Supervised Learning Definition

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

Bayes and Naïve Bayes. cs534-machine Learning

Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

Email Spam Detection A Machine Learning Approach

Email Spam Detection A Machine Learning Approach Ge Song, Lauren Steimle ABSTRACT Machine learning is a branch of artificial intelligence concerned with the creation and study of systems that can learn

Machine Learning for natural language processing

Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery