Supervised Learning (Big Data Analytics)

Similar documents

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Data Mining - Evaluation of Classifiers

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Data Mining Practical Machine Learning Tools and Techniques

Data Mining. Nonlinear Classification

Chapter 6. The stacking ensemble approach

Knowledge Discovery and Data Mining

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Social Media Mining. Data Mining Essentials

Evaluation & Validation: Credibility: Evaluating what has been learned

Principles of Data Mining by Hand&Mannila&Smyth

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Machine Learning in Spam Filtering

Machine Learning Final Project Spam Filtering

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Statistical Machine Learning

An Introduction to Data Mining

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Machine learning for algo trading

More Data Mining with Weka

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Making Sense of the Mayhem: Machine Learning and March Madness

Question 2 Naïve Bayes (16 points)

Learning is a very general term denoting the way in which agents:

CSE 473: Artificial Intelligence Autumn 2010

Knowledge Discovery and Data Mining

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

What is Learning? CS 391L: Machine Learning Introduction. Raymond J. Mooney. Classification. Problem Solving / Planning / Control

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Classification algorithm in Data mining: An Overview

Big Data Analytics CSCI 4030

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Linear Classification. Volker Tresp Summer 2015

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Azure Machine Learning, SQL Data Mining and R

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

STA 4273H: Statistical Machine Learning

Data Mining Algorithms Part 1. Dejan Sarka

Knowledge Discovery and Data Mining

Logistic Regression for Spam Filtering

Projektgruppe. Categorization of text documents via classification

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

A semi-supervised Spam mail detector

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

T : Classification as Spam or Ham using Naive Bayes Classifier. Santosh Tirunagari :

Data Mining Techniques for Prognosis in Pancreatic Cancer

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Scalable Developments for Big Data Analytics in Remote Sensing

Simple and efficient online algorithms for real world applications

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

Data Mining as a tool to Predict the Churn Behaviour among Indian bank customers

Comparison of Data Mining Techniques used for Financial Data Analysis

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing Classifier

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

Learning from Diversity

CSCI567 Machine Learning (Fall 2014)

The Basics of Graphical Models

Stock Market Forecasting Using Machine Learning Algorithms

LCs for Binary Classification

Machine Learning and Pattern Recognition Logistic Regression

Car Insurance. Havránek, Pokorný, Tomášek

Introduction to Data Mining

Machine Learning Logistic Regression

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Big Data Analytics and Optimization

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Predict Influencers in the Social Network

A Simple Introduction to Support Vector Machines

Analytics on Big Data

HT2015: SC4 Statistical Data Mining and Machine Learning

Car Insurance. Prvák, Tomi, Havri

Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

Monday Morning Data Mining

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Mining a Corpus of Job Ads

Detecting Corporate Fraud: An Application of Machine Learning

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

Model Combination. 24 Novembre 2009

Getting Even More Out of Ensemble Selection

Data Mining for Knowledge Management. Classification

8. Machine Learning Applied Artificial Intelligence

Maschinelles Lernen mit MATLAB

Linear Threshold Units

W6.B.1. FAQs CS535 BIG DATA W6.B If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

Defending Networks with Incomplete Information: A Machine Learning Approach. Alexandre

Local classification and local likelihoods

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski

Bayesian networks - Time-series models - Apache Spark & Scala

Introduction to Online Learning Theory

NEURAL NETWORKS A Comprehensive Foundation

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

Transcription:

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice

Goal of Big Data Analytics Uncover patterns in Data. Can be used to: Gaining competitive advantage if you are a marketer Making a lot of money if you are working in the stock market Winning presidential Elections and so on. Analytics = Machine learning Practical advice on how to use machine learning the right way.

Traditional Programming Data Program Computer Output Not scalable Machine Learning (Analytics) Data Computer Output Scalable Program Not Magic: More like Gardening. Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs

Supervised Learning Given: Training examples x, f x for some unknown function f. Find: A good approximation to f. Classification problem: f(x) is an (small) integer Regression problem: f(x) is a real number Example Applications: Credit Card approval; Spam Filtering; Disease Diagnosis; Automatically tagging images with location; etc.

Example: Credit Card approval A credit card company receives thousands of applications for new cards. Each application contains information about an applicant: age Marital status annual salary outstanding debts credit rating etc. Problem: to classify applications into two categories, approved and not approved.

Classification Example: Spam Filtering Classify as Spam or Not Spam

Classification Example: Weather Prediction Classify as Rainy, Cloudy, Sunny

Regression example: Predicting Gold/Stock prices Good ML can make you rich (but there is still some risk involved). Given historical data on Gold prices, predict tomorrow s price!

Supervised Learning: Some Terminology Training Data Training Example: Example of the form x, f x Classifier: A discrete-valued function or an algorithm that outputs a discrete-valued function Classes: The number of distinct values that f x can take.

Training Data Training Example Two Classes: {Yes,No}

Steps in Supervised Learning 1. Determine the representation for x,f(x) and determine what x to use Feature Engineering 2. Gather a training set (not all data is kosher) Data Cleaning 3. Select a suitable evaluation method 4. Find a suitable learning algorithm among a plethora of available choices

Feature Engineering is the Key Most effort in ML projects is constructing features Black art: Intuition, creativity required Understand properties of the task at hand How the features interact with or limit the algorithm you are using. ML is an iterative process Try different types of features, experiment with each and then decide which feature set/algorithm combination to use

What features will you use? Examples Spam Filtering Mapping images to names Feature Combination Linear models cannot handle some dependencies between features (e.g. XOR) Feature combinations might work better. Quick growth of the number of features.

Evaluation Accuracy Fraction of the examples that are correctly classified by the learner Precision, Recall and F-score (Next slide) Squared error (Regression problems) Likelihood Posterior probability Cost / Utility Etc.

Precision, Recall and F-1 score Precision (P) = F1-score = 2 P R P+R Actual=True tp ; Recall (R)= tp tp+fp tp+fn Harmonic mean of precision and recall Actual=False Predicted=True tp (correct result) fp (unexpected result) Predicted=False fn (missing result) tn (correct absence of result)

What algorithms (Classifiers/learners) to use Naïve Bayes Logistic Regression Linear SVMs Decision Trees Neural Networks Support Vector Machines Linear classifiers Non-linear K-nearest neighbors (Non-parametric) Bagging (Meta); Boosting (Meta) Weka Software

Classifiers: Bias versus Variance High Bias, low variance Low Bias, High variance Linear Classifier Naïve Bayes Logistic Regression Perceptron Non-Linear Classifier Support vector machine (Kernels) Neural networks Decision Trees Simpler Complex

Learning = Representation + Evaluation + Optimization Thousands of learning algorithms Combinations of just three elements Representation Evaluation Optimization Instances Accuracy Greedy search Hyperplanes Precision/Recall Branch & bound Decision trees Squared error Gradient descent Sets of rules Likelihood Quasi-Newton Neural networks Posterior prob. Linear progr. Graphical models Margin Quadratic progr. Etc. Etc. Etc.

It s Generalization that Counts Divide data into training, test and hold-out or validation set Algorithm must work on test examples never seen before Training examples can just be memorized Don t tune parameters on test data Use cross-validation Training Data Hold-Out Data Test Data

K-Fold Cross Validation Test Train on (k - 1) splits k-fold Choose a suitable K (usually 10)

Data Alone Is Not Enough Classes of unseen examples are arbitrary So learner must make assumptions No free lunch theorems Luckily, real world is not random Induction is knowledge lever

Overfitting Has Many Faces Classifier A is better than B on the training set but the reverse is true on the test set!! The biggest problem in machine learning Bias and variance (Simple vs. Complex) Can learn a simpler linear function vs. can learn any function Less powerful learners can be better Solutions: Cross-validation; Regularization

Intuition Fails In High Dimensions Curse of dimensionality Sparseness worsens exponentially with number of features Irrelevant features ruin similarity In high dimensions all examples look alike 3D intuitions do not apply in high dimensions Blessing of non-uniformity

More Data Beats a Cleverer Algorithm Easiest way to improve: More data Then Data is bottleneck Now: Scalability is bottleneck ML algorithms more similar than they appear Clever algorithms require more effort but can pay off in the end Biggest bottleneck is human time

Learn Many Models, Not Just One Three stages of machine learning 1. Try variations of one algorithm, chose one 2. Try variations of many algorithms, choose one 3. Combine many algorithms, variations Ensemble techniques Bagging Boosting Stacking Etc.

Representable Does Not Imply Learnable Standard claim: My language can represent/approximate any function No excuse for ignoring others Causes of non-learnability Not enough data Not enough components Not enough search Some representations exponentially more compact than others

ADVANCED TOPICS

Supervised Learning and its Generalizations Supervised Learning Desired output is simple. (e.g., purchase an item or not; the person has the disease or not; etc.) Structured Prediction: is a Generalization Desired output is complex.

Structured Prediction: Examples Parsing: given an input sequence, build a tree whose leaves are the elements in the sequence and whose structure obeys some grammar. Collective classification: given a graph defined by a set of vertices and edges, produce a labeling of the vertices. Labeling web pages given link information

Models and Algorithms for Structured Prediction Probabilistic Graphical Models Compact representation of joint distribution Principled way of dealing with uncertainty Take advantage of conditional independence Markov logic and statistical relational models Model both relational structure and uncertainty One example related with another example Considerable machine learning expertise required here! (not yet a blackbox)