Attribution. Modified from Stuart Russell's slides (Berkeley). Parts of the slides are inspired by Dan Klein's lecture material for CS 188 (Berkeley).


Transcription
1 Machine Learning
2 Attribution. Modified from Stuart Russell's slides (Berkeley). Parts of the slides are inspired by Dan Klein's lecture material for CS 188 (Berkeley).
3 Outline. Inductive learning; Decision tree learning; Measuring learning performance; Statistical learning; Naive Bayes; Learning; Classification; Evaluation.
4 Inductive Learning. Training set: data of N examples of input-output pairs \((x_1, y_1), \ldots, (x_N, y_N)\) such that each \(y_i\) is generated by an unknown function \(y = f(x)\). Learning: discover a hypothesis function h that approximates the true function f. The test set is used to measure the accuracy of hypothesis h. Hypothesis h generalizes well if it correctly predicts the value of y on novel examples. Related notions: the hypothesis space, and whether a hypothesis is realizable (i.e., whether the hypothesis space contains the true function).
5 Kinds of Learning. Three types of feedback determine the main kinds of (machine) learning. Supervised learning: requires a collection of sample input-output pairs (problem instance, correct answer), so that it learns a function that maps from input to output; in other words, it requires a teacher. Unsupervised learning: learns patterns from the input without specific feedback, e.g., clustering; requires no teacher. Reinforcement learning: occasional rewards reinforce or inhibit certain sequences of actions; it is harder, but requires no teacher. Semi-supervised learning: there are too few labeled examples, and the labels are not necessarily very accurate.
6 Inductive learning (a.k.a. Science). Simplest form: learn a function from examples (tabula rasa, "blank slate" in Latin). f is the target function. An example is an input-output pair \((x, f(x))\), e.g., [Figure: a board position with O and X marks], +1. Problem: find a hypothesis h such that \(h \approx f\), given a training set of examples.
7 Inductive learning method. Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting: [Figure: data points plotted with axes x and f(x)]
12 Inductive learning method. Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting: [Figure: data points plotted with axes x and f(x), with candidate curves fitted through them]. Ockham's razor (William of Ockham): maximize a combination of consistency and simplicity.
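The trade-off between consistency and simplicity can be sketched with two hypotheses fitted to the same points. This is only an illustration: the data values are made up, and the choice of a least-squares line versus a Lagrange interpolating polynomial is an assumption, not something the slides specify.

```python
# Hypothetical noisy samples of a roughly linear target; the numbers are made up.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

def fit_line(xs, ys):
    """Least-squares straight line; returns a function h(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 passing through every point."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def max_training_error(h, xs, ys):
    """Largest absolute disagreement between h and the training examples."""
    return max(abs(h(x) - y) for x, y in zip(xs, ys))

line = fit_line(xs, ys)
poly = fit_interpolant(xs, ys)
print("line max training error:", max_training_error(line, xs, ys))
print("interpolant max training error:", max_training_error(poly, xs, ys))
print("predictions at x = 6:", line(6.0), poly(6.0))
```

The interpolant is consistent (zero training error) but far more complex than the line; Ockham's razor would favor the line unless the extra complexity were justified by the data.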
13 Learning Decision Trees. A decision tree represents a function that takes as input a vector of attribute values and returns a "decision": a single output value.
A  B  A xor B
F  F  F
F  T  T
T  F  T
T  T  F
[Figure: the corresponding decision tree, testing A and then B, with leaves F, T, T, F]
We will now outline a supervised learning method for constructing decision trees given labeled data (input-output pairs).
14 Attribute-based representations. Examples are described by attribute values (Boolean, discrete, continuous, etc.). E.g., situations where I will/won't wait for a table (attributes Alt through Est; target WillWait; "?" marks a wait estimate that is missing from this transcript):
Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       T    F    F    T    Some  $$$    F     T    French   0-10   T
X2       T    F    F    T    Full  $      F     F    Thai     ?      F
X3       F    T    F    F    Some  $      F     F    Burger   0-10   T
X4       T    F    T    T    Full  $      F     F    Thai     ?      T
X5       T    F    T    F    Full  $$$    F     T    French   >60    F
X6       F    T    F    T    Some  $$     T     T    Italian  0-10   T
X7       F    T    F    F    None  $      T     F    Burger   0-10   F
X8       F    F    F    T    Some  $$     T     T    Thai     0-10   T
X9       F    T    T    F    Full  $      T     F    Burger   >60    F
X10      T    T    T    T    Full  $$$    F     T    Italian  ?      F
X11      F    F    F    F    None  $      F     F    Thai     0-10   F
X12      T    T    T    T    Full  $      F     F    Burger   ?      T
Classification of examples is positive (T) or negative (F).
15 Decision trees. One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait: [Figure: decision tree rooted at Patrons? (None, Some, Full), with further tests WaitEstimate?, Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?, and leaves T or F]
16 Expressiveness. Decision trees can express any function of the input attributes. E.g., for Boolean functions, each truth table row corresponds to a path to a leaf:
A  B  A xor B
F  F  F
F  T  T
T  F  T
T  T  F
[Figure: the corresponding decision tree, testing A and then B, with leaves F, T, T, F]
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples. Prefer to find more compact decision trees.
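To make the "truth table row = path to leaf" idea concrete, here is a minimal sketch of the XOR tree as a data structure. The nested-tuple encoding and the names `XOR_TREE` and `classify` are illustrative choices, not part of the slides.

```python
# A decision tree as nested tuples: (attribute, subtree_if_false, subtree_if_true),
# with plain booleans as leaves. This encoding is an illustrative assumption.
XOR_TREE = ("A", ("B", False, True), ("B", True, False))

def classify(tree, example):
    """Follow the single root-to-leaf path selected by the example's attribute values."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if example[attribute] else if_false
    return tree

# Each of the four truth-table rows reaches its own leaf.
for a in (False, True):
    for b in (False, True):
        print(a, b, classify(XOR_TREE, {"A": a, "B": b}))
```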
23 Hypothesis spaces. How many distinct decision trees with n Boolean attributes? = number of Boolean functions = number of distinct truth tables with \(2^n\) rows = \(2^{2^n}\). E.g., with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees. How many purely conjunctive hypotheses (e.g., \(Hungry \land \lnot Rain\))? Each attribute can be in (positive), in (negative), or out, so there are \(3^n\) distinct conjunctive hypotheses. A more expressive hypothesis space increases the chance that the target function can be expressed, but it also increases the number of hypotheses consistent with the training set, so it may give worse predictions.
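The two counts above are easy to check numerically. The function names below are hypothetical helpers introduced only for this check.

```python
def num_boolean_functions(n):
    """Number of distinct truth tables over n Boolean attributes: 2^(2^n)."""
    return 2 ** (2 ** n)

def num_conjunctive_hypotheses(n):
    """Each attribute is used positively, used negatively, or left out: 3^n."""
    return 3 ** n

for n in (1, 2, 6):
    print(n, num_boolean_functions(n), num_conjunctive_hypotheses(n))
```

For n = 6 the first count reproduces the 18,446,744,073,709,551,616 figure on the slide, while the conjunctive space has only 729 hypotheses.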
24 Decision tree learning. Aim: find a small tree consistent with the training examples. Idea 1: (recursively) choose the "most significant" attribute as the root of the (sub)tree to branch on next. Idea 2: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
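The Summary slide names information gain as the attribute-selection criterion. The sketch below assumes the standard entropy-based definition and compares two attributes, Patrons and Type, using the corresponding columns of the 12 restaurant examples; the function and constant names are illustrative.

```python
from math import log2

# (Patrons, Type, WillWait) for examples X1..X12 from the restaurant table.
EXAMPLES = [
    ("Some", "French", True), ("Full", "Thai", False),
    ("Some", "Burger", True), ("Full", "Thai", True),
    ("Full", "French", False), ("Some", "Italian", True),
    ("None", "Burger", False), ("Some", "Thai", True),
    ("Full", "Burger", False), ("Full", "Italian", False),
    ("None", "Thai", False), ("Full", "Burger", True),
]
PATRONS, TYPE = 0, 1  # column indices into each example tuple

def entropy(labels):
    """Entropy in bits of a list of Boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(examples, column):
    """Entropy of the whole set minus the weighted entropy after splitting on column."""
    labels = [ex[-1] for ex in examples]
    subsets = {}
    for ex in examples:
        subsets.setdefault(ex[column], []).append(ex[-1])
    remainder = sum(len(s) / len(examples) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

print("gain(Patrons) =", information_gain(EXAMPLES, PATRONS))
print("gain(Type)    =", information_gain(EXAMPLES, TYPE))
```

Patrons yields a clearly positive gain because its None and Some branches are pure, whereas every Type branch is an even mix of positive and negative examples, so its gain is essentially zero.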
25 Example contd. Decision tree learned from the 12 examples:
Patrons?
  None: F
  Some: T
  Full: Hungry?
    Yes: Type?
      French: T
      Italian: F
      Thai: Fri/Sat?
        No: F
        Yes: T
      Burger: T
    No: F
Substantially simpler than the "true" tree; a more complex hypothesis isn't justified by the small amount of data.
26 Performance measurement. How do we know that \(h \approx f\)? 1) Use theorems of computational/statistical learning theory. 2) Try h on a new test set of examples (drawn from the same distribution over the example space as the training set). Learning curve = % correct on the test set as a function of training set size. [Figure: learning curve, % correct on test set against training set size]
27 Performance measurement contd. The learning curve depends on whether the problem is realizable (the hypothesis class can express the target function) or non-realizable; non-realizability can be due to missing attributes or a restricted hypothesis class (e.g., thresholded linear function). It also depends on redundant expressiveness (e.g., loads of irrelevant attributes). [Figure: % correct against # of examples, with curves labeled realizable, redundant, and nonrealizable]
28 Performance measurement contd. II. Still, how do we know that \(h \approx f\)? Hume's Problem of Induction. Wikipedia: "The problem of induction is the philosophical question of whether inductive reasoning leads to knowledge understood in the classic philosophical sense, since it focuses on the lack of justification for either: 1. Generalizing about the properties of a class of objects based on some number of observations of particular instances of that class (for example, the inference that "all swans we have seen are white, and therefore all swans are white", before the discovery of black swans) or 2. Presupposing that a sequence of events in the future will occur as it always has in the past (for example, that the laws of physics will hold as they have always been observed to hold). Hume called this the principle of uniformity of nature."
29 Classes of Learning Problems. Classification: the output y of the true function that we learn takes one of a finite set of values, e.g., wait or leave in a restaurant; sunny, cloudy, or rainy. Regression: the output y of the true function that we learn is a number, e.g., tomorrow's temperature. Sometimes the function f is stochastic; strictly speaking, it is then not a function of x, so what we learn is a conditional probability distribution \(P(Y \mid x)\).
30 Statistical learning. Training set (data): evidence, i.e., instantiations of all or some of the random variables describing the domain. Hypotheses are probabilistic theories of how the domain works.
31 Learning a Probability Model. Training set: data of N examples of input-output pairs \((x_1, y_1), \ldots, (x_N, y_N)\) such that each \(y_i\) is generated by an unknown function \(y = f(x)\). Inductive learning: discover a hypothesis function h that approximates the true function f, e.g., decision trees. Statistical learning: given a fixed structure of a probability model of the domain, discover its parameters from the data (parameter learning). As a result, given the parameters of a problem instance, the learned probability model can be used to answer queries about that instance. Classification: the observed parameters of a given instance, together with the learned probability model of the domain, provide probabilistic information on the likelihood of a particular classification.
32 Classification Problems. Classification is the task of predicting labels (class variables) for inputs. Commercially and scientifically important examples: spam filtering; optical character recognition (OCR); medical diagnosis; part-of-speech tagging; semantic role labeling / information extraction; automatic essay grading; fraud detection.
33 Probabilistic Models. A naive Bayes model: \(P(Cause, Effect_1, \ldots, Effect_n) = P(Cause) \prod_i P(Effect_i \mid Cause)\) (1). [Figure: two Bayesian networks, Cavity with children Toothache and Catch, and Cause with children \(Effect_1, \ldots, Effect_n\)] Here Cause is taken to be the class variable, which is to be predicted. The attribute (parameter) variables are the leaves, the Effects. The model is "naive": it assumes the parameter variables are independent of one another given the class. Model training: use the training set to uncover the conditional probability distributions of the parameters, \(P(Effect_i \mid Cause_j)\). Once the model is trained, given the values of the parameters of a problem instance, we can use (1) to classify the instance.
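Equation (1) can be evaluated directly once the conditional probabilities are known. The sketch below uses the Cavity example from the figure, but the probability values are made-up illustrations and the names `joint` and `posterior` are hypothetical.

```python
# Illustrative numbers only; they are not taken from the slides.
P_CAUSE = {"cavity": 0.2, "no_cavity": 0.8}
# P(effect = True | cause) for each effect variable.
P_EFFECT_GIVEN_CAUSE = {
    "cavity": {"toothache": 0.6, "catch": 0.9},
    "no_cavity": {"toothache": 0.1, "catch": 0.2},
}

def joint(cause, effects):
    """P(cause, effects) = P(cause) * prod_i P(effect_i | cause), as in (1)."""
    p = P_CAUSE[cause]
    for name, value in effects.items():
        p_true = P_EFFECT_GIVEN_CAUSE[cause][name]
        p *= p_true if value else 1.0 - p_true
    return p

def posterior(effects):
    """Normalize the joint over causes to obtain P(cause | effects)."""
    scores = {cause: joint(cause, effects) for cause in P_CAUSE}
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}

observed = {"toothache": True, "catch": True}
print(posterior(observed))
```

Normalizing is optional for classification, since the class with the largest joint value also has the largest posterior.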
34 Independence as Abstraction. The model is "naive": it assumes the parameter variables to be independent. This may lead to overconfidence; indeed, all caps in spam is not independent of $$ symbols. Yet it is often a fine abstraction, and a computationally tractable one.
35 Optical Character Recognition Example: Training a Model. Given a labeled collection M of digits in digital form (an n x n grid). Features: \(Pixel_{i,j}\) = on or off, and Adj. A naive Bayes model: \(P(Digit, Pixel_{1,1}, \ldots, Pixel_{n,n}, Adj) = P(Digit) \prod_{i,j} P(Pixel_{i,j} \mid Digit) \, P(Adj)\). Model training process, for M: \(P(0) = \frac{count(M, 0)}{|M|}, \ldots, P(9) = \frac{count(M, 9)}{|M|}\); \(P(pixel_{1,1} = on \mid 0) = \frac{count(M, 0, on, 1, 1)}{count(M, 0)}, \ldots\); \(P(pixel_{1,1} = off \mid 0) = 1 - P(pixel_{1,1} = on \mid 0), \ldots\)
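The counting procedure above might be sketched as follows. This is a minimal illustration under stated assumptions: grids are lists of rows of booleans, pixel indices are 0-based rather than 1-based, the Adj feature is left out, and the toy dataset is made up.

```python
from collections import Counter

def train(labeled_digits):
    """Relative-frequency estimates from (digit, grid) pairs.

    Each grid is a list of rows of booleans (True = pixel on).
    Returns (prior, p_on) with prior[d] = P(d) and
    p_on[d][(i, j)] = P(pixel_{i,j} = on | d).
    """
    digit_counts = Counter(digit for digit, _ in labeled_digits)
    on_counts = {digit: Counter() for digit in digit_counts}
    for digit, grid in labeled_digits:
        for i, row in enumerate(grid):
            for j, is_on in enumerate(row):
                if is_on:
                    on_counts[digit][(i, j)] += 1
    total = len(labeled_digits)
    prior = {d: c / total for d, c in digit_counts.items()}
    n = len(labeled_digits[0][1])  # grid side length
    p_on = {
        d: {(i, j): on_counts[d][(i, j)] / digit_counts[d]
            for i in range(n) for j in range(n)}
        for d in digit_counts
    }
    return prior, p_on

# A made-up 2x2 "dataset" with two classes, just to exercise the counting.
TOY = [
    (0, [[True, True], [True, True]]),
    (0, [[True, False], [True, True]]),
    (1, [[False, True], [False, True]]),
    (1, [[False, True], [False, False]]),
]
prior, p_on = train(TOY)
print(prior)
print(p_on[0])
```

The probability of a pixel being off is not stored, since it is one minus the stored value. Note that class 1 never has pixel (0, 0) on in the toy data, so its estimate is exactly 0.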
36 Example: Classification in OCR. Given the parameters (attributes, features) of an unseen instance and the trained model, we can compute \(P(0, pixel_{1,1} = on, \ldots, pixel_{n,n} = off, adj = true) = x_0\), ..., \(P(9, pixel_{1,1} = on, \ldots, pixel_{n,n} = off, adj = true) = x_9\), and then pick the most likely class, i.e., the class that corresponds to the maximum value among \(x_0, \ldots, x_9\).
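Picking the most likely class amounts to an argmax over the joint values. The sketch below assumes a hypothetical, already-trained model for a 2x2 grid with two classes; the probabilities are invented for illustration and the Adj feature is omitted.

```python
# Hypothetical trained model for a 2x2 grid and two digit classes.
PRIOR = {0: 0.5, 1: 0.5}
# P_ON[d][(i, j)] stands for P(pixel_{i,j} = on | d), with 0-based indices.
P_ON = {
    0: {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.8},
    1: {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.1, (1, 1): 0.9},
}

def joint_score(digit, grid):
    """x_digit = P(digit) * prod_{i,j} P(pixel_{i,j} = observed value | digit)."""
    score = PRIOR[digit]
    for i, row in enumerate(grid):
        for j, is_on in enumerate(row):
            p = P_ON[digit][(i, j)]
            score *= p if is_on else 1.0 - p
    return score

def classify(grid):
    """Return the class with the largest joint score."""
    return max(PRIOR, key=lambda digit: joint_score(digit, grid))

unseen = [[False, True], [False, True]]
print({d: joint_score(d, unseen) for d in PRIOR}, classify(unseen))
```

For realistic grid sizes the product of many small probabilities can underflow, so implementations commonly sum log-probabilities instead; that variation is not shown here.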
37 Evaluation. Split the labeled data into three categories (e.g., 80/10/10 or 60/20/20): 1. training set; 2. held-out set; 3. test set. Decide on features (parameters, attributes): attribute-value pairs that characterize each instance. Experimentation-evaluation cycle: 1. learn parameters (e.g., model probabilities) on the training set; 2. tune the set of features on the held-out set; 3. compute accuracy on the test set, where accuracy is the fraction of instances predicted correctly.
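A three-way split and the accuracy measure can be sketched in a few lines. The helper names, the fixed random seed, and the toy data are assumptions made for illustration only.

```python
import random

def split(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and cut labeled data into training, held-out, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_held = round(fractions[1] * len(shuffled))
    # The test set takes whatever remains after the first two cuts.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_held],
            shuffled[n_train + n_held:])

def accuracy(predict, examples):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

# Made-up labeled data: the label says whether x is at least 50.
data = [(x, x >= 50) for x in range(100)]
train_set, held_out, test_set = split(data)
print(len(train_set), len(held_out), len(test_set))
print(accuracy(lambda x: x >= 50, test_set))
```

Keeping the test set untouched until the final step is what makes the reported accuracy an honest estimate; feature tuning should look only at the held-out set.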
38 Feature Engineering. Feature engineering is crucial! Features translate into the hypothesis space. Too few features: cannot fit the data. Too many features: overfitting.
39 Generalization and Overfitting. Relative-frequency parameters will overfit the training data. That the training set contained no 3 with \(pixel_{i,j}\) on does not mean that such a 3 does not exist (but note that we will assign probability 0 to such an event!). It is unlikely that every occurrence of "minute" is 100% spam. It is unlikely that every occurrence of "seriously" is 100% ham. Similarly, what happens to the words that never occur in the training set? Unseen events should not be assigned probability 0. To generalize better, smoothing is essential.
40 Estimation: Smoothing. Intuitions behind smoothing: we have some prior expectation about the parameters; given little evidence, we should prefer the prior; given a lot of evidence, the data should rule. The maximum likelihood estimate \(P_{ML}(x) = \frac{count(x)}{total\ samples}\) does not account for the above intuitions. Consider three coin flips: Head, Head, Tail; what is \(P_{ML}(x)\)?
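As a worked answer to the coin-flip question, assuming the outcome space is {Head, Tail}, the counts are count(Head) = 2 and count(Tail) = 1 out of 3 samples:

```latex
P_{ML}(\text{Head}) = \frac{count(\text{Head})}{total\ samples} = \frac{2}{3},
\qquad
P_{ML}(\text{Tail}) = \frac{count(\text{Tail})}{total\ samples} = \frac{1}{3}
```

The estimate is driven entirely by the data: had the three flips been Head, Head, Head, the same formula would give a probability of 0 for Tail, which is the behavior smoothing is meant to avoid.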
41 Estimation: Laplace Smoothing. Laplace's estimate: \(P_{LAP}(x) = \frac{count(x) + 1}{total\ samples + |X|}\). Pretend that every outcome appeared once more than it did. Note how it elegantly deals with the earlier unseen events. Laplace's estimate extended with a strength factor k: \(P_{LAP,k}(x) = \frac{count(x) + k}{total\ samples + k|X|}\). Consider three coin flips: Head, Head, Tail; what are \(P_{ML}(x)\), \(P_{LAP}(x)\), and \(P_{LAP,k}(x)\)? There are many ways to introduce smoothing, as well as methods to account for unknown events.
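The coin-flip question can be answered with a short sketch of the strength-k estimate. The function name `laplace_estimate` is hypothetical, and exact fractions are used only to keep the results readable.

```python
from collections import Counter
from fractions import Fraction

def laplace_estimate(samples, outcomes, k=1):
    """P_LAP,k(x) = (count(x) + k) / (total samples + k * |X|) for each x in X."""
    counts = Counter(samples)
    total = len(samples)
    return {x: Fraction(counts[x] + k, total + k * len(outcomes)) for x in outcomes}

flips = ["Head", "Head", "Tail"]
outcomes = ["Head", "Tail"]
for k in (0, 1, 100):
    print(k, laplace_estimate(flips, outcomes, k))
```

With k = 0 the formula reduces to the maximum likelihood estimate (2/3 and 1/3), k = 1 gives Laplace's estimate (3/5 and 2/5), and a large k pulls both values toward the uniform prior of 1/2.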
42 Summary. Learning is needed for unknown environments and lazy designers. The learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation. For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with the training examples. Decision tree learning uses information gain. Learning performance = prediction accuracy measured on the test set. Learning models: naive Bayes nets. The classification problem by means of naive Bayes nets. Smoothing. Evaluation concepts.
SemiSupervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationIMPROVING PIPELINE RISK MODELS BY USING DATA MINING TECHNIQUES
IMPROVING PIPELINE RISK MODELS BY USING DATA MINING TECHNIQUES María Fernanda D Atri 1, Darío Rodriguez 2, Ramón GarcíaMartínez 2,3 1. MetroGAS S.A. Argentina. 2. Área Ingeniería del Software. Licenciatura
More informationProjektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationClassifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical microclustering algorithm ClusteringBased SVM (CBSVM) Experimental
More informationFoundations of Artificial Intelligence. Introduction to Data Mining
Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present
More informationDecision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.
Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 422687599 Copyright Andrew W. Moore Slide Decision Trees Decision trees
More informationD A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
More informationTagging with Hidden Markov Models
Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Partofspeech (POS) tagging is perhaps the earliest, and most famous,
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 3448 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationGrammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University
Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun > roller Verb thrills VP Verb NP S NP VP NP S VP
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationMonday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline:  data mining  IceCube  Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
More informationPrinciples of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n
Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
More informationBuilding a Question Classifier for a TRECStyle Question Answering System
Building a Question Classifier for a TRECStyle Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationProfessor Anita Wasilewska. Classification Lecture Notes
Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,
More informationCOMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationUsing Artificial Intelligence to Manage Big Data for Litigation
FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear
More informationDistributed forests for MapReducebased machine learning
Distributed forests for MapReducebased machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Introduction Laura Kallmeyer HeinrichHeineUniversität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to
More informationPredictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationLess naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. Email:h.m.yang@tue.nl also CoSiNe Connectivity Systems
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationF. Aiolli  Sistemi Informativi 2007/2008
Text Categorization Text categorization (TC  aka text classification) is the task of buiding text classifiers, i.e. sofware systems that classify documents from a domain D into a given, fixed set C =
More informationMachine Learning for Data Science (CS4786) Lecture 1
Machine Learning for Data Science (CS4786) Lecture 1 TuTh 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:
More informationLecture Slides for INTRODUCTION TO. ETHEM ALPAYDIN The MIT Press, 2004. Lab Class and literature. Friday, 9.00 10.00, Harburger Schloßstr.
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2004 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml Lab Class and literature Friday, 9.00 10.00, Harburger
More information