Recitation. Mladen Kolar

Similar documents
Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Supervised Learning (Big Data Analytics)

Machine learning for algo trading

Content-Based Recommendation

STA 4273H: Statistical Machine Learning

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Question 2 Naïve Bayes (16 points)

Statistical Models in Data Mining

LCs for Binary Classification

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

HT2015: SC4 Statistical Data Mining and Machine Learning

Data Mining. Nonlinear Classification

Predict Influencers in the Social Network

Data Mining Practical Machine Learning Tools and Techniques

Cross-validation for detecting and preventing overfitting

Predictive Data modeling for health care: Comparative performance study of different prediction models

Social Media Mining. Data Mining Essentials

Data Mining: An Overview. David Madigan

Making Sense of the Mayhem: Machine Learning and March Madness

MACHINE LEARNING IN HIGH ENERGY PHYSICS

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

An Introduction to Data Mining

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Big Data Analytics CSCI 4030

Decompose Error Rate into components, some of which can be measured on unlabeled data

1 Maximum likelihood estimation

Classification Problems

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Data Mining Techniques for Prognosis in Pancreatic Cancer

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Evaluation of Machine Learning Techniques for Green Energy Prediction

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

L13: cross-validation

A Content based Spam Filtering Using Optical Back Propagation Technique

Local classification and local likelihoods

Data Mining - Evaluation of Classifiers

Towards better accuracy for Spam predictions

E-commerce Transaction Anomaly Classification

Classification algorithm in Data mining: An Overview

Detecting Corporate Fraud: An Application of Machine Learning

Microsoft Azure Machine learning Algorithms

Classification of Bad Accounts in Credit Card Industry

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Principles of Data Mining by Hand&Mannila&Smyth

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Statistical Machine Learning

Azure Machine Learning, SQL Data Mining and R

Neural Networks and Support Vector Machines

Predicting Student Persistence Using Data Mining and Statistical Analysis Methods

Employer Health Insurance Premium Prediction Elliott Lui

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

MHI3000 Big Data Analytics for Health Care Final Project Report

Multi-Class and Structured Classification

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Predicting Flight Delays

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Knowledge Discovery and Data Mining

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

MS1b Statistical Data Mining

Machine Learning with MATLAB David Willingham Application Engineer

Dimension Reduction. Wei-Ta Chu 2014/10/22. Multimedia Content Analysis, CSIE, CCU

Basics of Statistical Machine Learning

Final Project Report

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Chapter 6. The stacking ensemble approach

Hong Kong Stock Index Forecasting

Neural Networks Lesson 5 - Cluster Analysis

Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

PS 271B: Quantitative Methods II. Lecture Notes

Knowledge Discovery and Data Mining

Bayes and Naïve Bayes. cs534-machine Learning

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Cell Phone based Activity Detection using Markov Logic Network

Supporting Online Material for

OUTLIER ANALYSIS. Data Mining 1

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Assessment. Presenter: Yupu Zhang, Guoliang Jin, Tuo Wang Computer Vision 2008 Fall

Linear Threshold Units

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Car Insurance. Prvák, Tomi, Havri

Machine Learning in Spam Filtering

Lecture 13: Validation

Knowledge Discovery and Data Mining

Support Vector Machines with Clustering for Training with Very Large Datasets

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Statistics, Data Mining and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data. and Alex Gray

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Course: Model, Learning, and Inference: Lecture 5

Customer Classification And Prediction Based On Data Mining Technique

Bayesian networks - Time-series models - Apache Spark & Scala

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Transcription:

10-601 Recitation Mladen Kolar

Topics covered after the midterm Learning Theory Hidden Markov Models Neural Networks Dimensionality reduction Nonparametric methods Support vector machines Boosting

Learning theory Agnostic learning Infinite number of hypothesis

Which formula to use?

Which formula to use?

Which formula to use?

Bayes Networks Which statement is true?

Bayes Networks How many independent parameters?

Hidden Markov Models Special case of Bayes Networks Representation Algorithms forward forward-backward viterbi Baum-Welch

Principal component analysis Classification in lower dimensional space. Label points so that 1-NN classifier have the following leave-one-out cross validation errors

Principal component analysis Classification in lower dimensional space. Label points so that 1-NN classifier have the following leave-one-out cross validation errors

Principal component analysis Classification in lower dimensional space. Label points so that 1-NN classifier have the following leave-one-out cross validation errors

Principal component analysis Classification in lower dimensional space. Label points so that 1-NN classifier have the following leave-one-out cross validation errors

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Gaussian Naïve Bayes and Logistic Regression

Boosting

Boosting

Boosting

Which classifier to use? Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or generative classifier? Why?

Which classifier to use? Your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?

Short questions Assume that we are using an SVM classifier with a Gaussian kernel. Draw a graph showing two curves: training error vs. kernel bandwidth and test error vs. kernel bandwidth

Short questions Assume that we are modeling a number of random variables using a Bayesian Network with n edges. Draw a graph showing two curves: Bias of the estimate of the joint probability vs. n and variance of the estimate of the joint probability vs. n.

Short questions Both PCA and linear regression can be thought of as algorithms for minimizing a sum of squared errors. Explain which error is being minimized in each algorithm.

Short questions Both PCA and linear regression can be thought of as algorithms for minimizing a sum of squared errors. Explain which error is being minimized in each algorithm. PCA reconstruction error Linear regression residual error

Short questions For kernel regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting: Kernel function (say, Gaussian vs box-shaped) Metric used (L 2 vs L 1 vs L ) Kernel width The maximum height of a kernel

Short questions Given two classifiers A and B, if A has a lower VCdimension than B then A almost certainly will perform better on a test set. False

Short questions In neural networks, what controls the trade-off between underfitting and overfitting?

Short questions Distribute the numbers 1,2,3,4 to the following four methods for classifying 2D data, such that 1 indicates highest variance and lowest bias, and 4 indicates lowest variance and highest bias. Boosted logistic regression (10 rounds of boosting) 1-nearest neighbor classifier Decision trees of depth 10 Logistic regression

Short questions Distribute the numbers 1,2,3,4 to the following four methods for classifying 2D data, such that 1 indicates highest variance and lowest bias, and 4 indicates lowest variance and highest bias. 1. 1-nearest neighbor classifier 2. Decision trees of depth 10 3. Boosted logistic regression (10 rounds of boosting) 4. Logistic regression

Short questions The ID3 algorithm is guaranteed to find the optimal decision tree. False

Short questions What is the difference between MLE and MAP estimates? MLE finds parameters to maximize the likelihood function. MAP finds parameters to maximize the posterior probability.

Is HMM appropriate model? Gene sequence dataset A database of movie reviews Stock market price dataset Daily precipitation data from the Northwest of the US

knn What is the prediction of the 3-nearest-neighbor classifier at the point (1,1)?

knn What is the prediction of the 5-nearest-neighbor classifier at the point (1,1)?

knn What is the prediction of the 7-nearest-neighbor classifier at the point (1,1)?

knn Consider the two-class classification problem. At a data point x, the true conditional probability of a class k is p k x = P C = k X = x. What is the Bayes error at a point x? min k p k (x) What is the 1-NN error at x when x is the nearest neighbor? p 0 x p 1 x + p 0 x p 1 (x)