Machine Learning for Data Science (CS4786) Lecture 1




Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM, Hollister B14. Instructors: Lillian Lee and Karthik Sridharan

ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out (for our calibration); hand in your assignment at the beginning of class on 29th Jan. We are planning roughly three assignments and (approximately) two competitions/challenges: a clustering/data-visualization challenge, and a prediction challenge with a focus on feature extraction/selection.

Let's get started...

DATA DELUGE Each time you use your credit card: who purchased what, where, and when. Netflix, Hulu, smart TVs: what different groups of people like to watch. Social networks like Facebook, Twitter, ...: who is friends with whom, and what these people post or tweet about. Millions of photos and videos, many of them tagged. Wikipedia and all the news websites: pretty much most of human knowledge.

Guess?

Social Network of Marvel Comic Characters! by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands

What can we learn from all this data?

WHAT IS MACHINE LEARNING? Use data to automatically learn to perform tasks better. Close in spirit to T. Mitchell's description.

WHERE IS IT USED? Movie Rating Prediction

WHERE IS IT USED? Pedestrian Detection

WHERE IS IT USED? Market Predictions

WHERE IS IT USED? Spam Classification

MORE APPLICATIONS Each time you use your search engine. Autocomplete: blame machine learning for bad spellings. Biometrics: the reason you shouldn't smile. Recommendation systems: what you may like to buy, based on what your friends and their friends buy. Computer vision: self-driving cars, automatically tagging photos. Topic modeling: automatically categorizing documents/emails by topic, or music by genre...

TOPICS WE HOPE TO COVER 1 Dimensionality Reduction: Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), random projections, Compressed Sensing (CS), Independent Component Analysis (ICA), Information Bottleneck, Linear Discriminant Analysis. 2 Clustering and Mixture Models: k-means clustering, Gaussian mixture models, hierarchical clustering, link-based clustering. 3 Probabilistic Modeling & Graphical Models: probabilistic modeling, MLE vs. MAP vs. Bayesian approaches, inference and learning in graphical models, Latent Dirichlet Allocation (LDA). 4 Some Supervised Learning (if time permits): linear regression, logistic regression, Lasso, ridge regression, neural networks/deep learning, ... Topics 1-3 constitute the unsupervised learning portion of the course.

UNSUPERVISED LEARNING Given (unlabeled) data, find useful information, patterns, or structure. Dimensionality reduction/compression: compress the data set by removing redundancy and retaining only useful information. Clustering: find meaningful groupings in the data. Topic modeling: discover topics/groups with which we can tag data points.

DIMENSIONALITY REDUCTION You are provided with n data points, each in R^d. Goal: compress the data into n points in R^K, where K << d. Retain as much information about the original data set as possible; retain desired properties of the original data set. E.g., PCA, compressed sensing, ...

PRINCIPAL COMPONENT ANALYSIS (PCA) Eigenfaces: Turk & Pentland '91. Write each data point as a linear combination of a small number of basis vectors. A data-specific compression scheme. One of the early successes was in face recognition: classification based on nearest neighbors in the reduced-dimension space.
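
As a rough illustration (not code from the course), the following NumPy sketch projects toy data onto the top K principal directions and reconstructs an approximation; the data, dimensions, and variable names are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))      # toy data: n = 100 points in R^d with d = 50
K = 5                               # target dimension

X_centered = X - X.mean(axis=0)     # PCA works on mean-centered data
# Rows of Vt are the principal directions (eigenvectors of the sample covariance).
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt[:K].T                        # d x K basis of the top-K principal components

Y = X_centered @ W                  # compressed representation: n points in R^K
X_hat = Y @ W.T + X.mean(axis=0)    # approximate reconstruction back in R^d
print("reconstruction error:", np.linalg.norm(X - X_hat))

The same K-dimensional representation Y is what a nearest-neighbor face recognizer would operate on in the eigenfaces setup.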

COMPRESSED SENSING From the compressive sensing camera; Candès, Tao, Donoho '04. Can we compress directly while receiving the input? We now have cameras that directly sense/record compressed information... and very fast! Time is spent only on reconstructing the signal from the compressed information. Especially useful for capturing high-resolution MRIs.
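
To make the idea concrete, here is a hedged sketch of compressed sensing using random Gaussian measurements and scikit-learn's Lasso as a stand-in for L1-based reconstruction; the signal, sensing matrix, and parameter values are illustrative assumptions, not the camera's actual pipeline.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, m, k = 200, 60, 5                 # signal length, number of measurements, sparsity
x = np.zeros(d)
x[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)   # k-sparse signal

A = rng.normal(size=(m, d)) / np.sqrt(m)   # random sensing matrix: compress while sensing
y = A @ x                                  # only m << d measurements are recorded

# All the work happens at reconstruction time, via l1-regularized least squares.
lasso = Lasso(alpha=1e-3, max_iter=50000, fit_intercept=False)
lasso.fit(A, y)
x_hat = lasso.coef_
print("recovery error:", np.linalg.norm(x - x_hat))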

INDEPENDENT COMPONENT ANALYSIS (ICA) The cocktail party problem: you are at a cocktail party and people are speaking all around you, yet you are still able to follow the conversation within your own group. Can a computer do this automatically?

INDEPENDENT COMPONENT ANALYSIS (ICA) Blind source separation: Bell & Sejnowski '95. We can do this as long as the sources are independent. Represent data points as a linear (or non-linear) combination of independent sources.
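
A small blind-source-separation sketch, assuming scikit-learn's FastICA (the lecture does not prescribe a particular implementation): two synthetic independent sources are mixed and then unmixed, up to ordering and scale.

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two independent "speakers"
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                         # unknown mixing (the room)
X = S @ A.T                                        # what the two "microphones" record

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                       # recovered sources, up to order/scale
print(S_hat.shape)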

DATA VISUALIZATION 2D projection: helps visualize data points (in relation to each other) while preserving relative distances among them (at least for nearby ones).

CLUSTERING K-means clustering: given just the data points, group them into natural clusters. Roughly speaking, points within a cluster must be close to each other, while points in different clusters must be well separated. Helps bin data points, but is generally hard to do.
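
For concreteness, a brief k-means sketch with scikit-learn on synthetic blobs (an illustrative dataset, not one from the course): the algorithm is given only the points and returns a grouping.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_             # cluster assignment for each point
centers = kmeans.cluster_centers_   # one centroid per cluster
print(centers)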

TOPIC MODELING Blei, Ng & Jordan '03. A probabilistic generative model for documents: each document has a fixed distribution over topics, and each topic has a fixed distribution over the words belonging to it. Unlike clustering, the groups are non-exclusive.
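
A small LDA sketch, assuming scikit-learn's implementation and a toy corpus of our own (the lecture only describes the model): it learns a per-document distribution over topics and a per-topic distribution over words.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the goalkeeper saved the penalty in the final",
    "the striker scored twice in the match",
    "the senate passed the budget bill",
    "the election campaign focused on the economy",
]
counts = CountVectorizer().fit_transform(docs)   # bag-of-words counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)           # each row: one document's topic mixture
print(doc_topics)                                # a document can belong to several topics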

SUPERVISED LEARNING Training data comes as input-output pairs (x, y). Based on this data, we learn a mapping from the input space to the output space. Goal: given a new input instance x, predict the outcome y accurately based on the given training data. Examples: classification, regression.
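
A minimal supervised-learning sketch (logistic regression on a standard toy dataset, chosen purely for illustration): learn a mapping from inputs x to labels y using (x, y) pairs, then predict on held-out inputs.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                        # labeled (x, y) training pairs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))   # how well we predict new inputs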

WHAT WE WON'T COVER Feature extraction is a problem/domain-specific art; we won't cover it in class. We won't cover optimization methods for machine learning. Implementation tricks and details won't be covered. There are literally thousands of methods; we will only cover a few!

WHAT YOU CAN TAKE HOME How to think about a learning problem and formulate it. Well-known methods, and how and why they work. Hopefully we can give you an intuition for choosing which methods/approaches to try out on a given problem.

DIMENSIONALITY REDUCTION Given data x_1, ..., x_n ∈ R^d, compress the data points into a low-dimensional representation y_1, ..., y_n ∈ R^K, where K << d.

WHY DIMENSIONALITY REDUCTION? For computational ease. As input to a supervised learning algorithm. Before clustering, to remove redundant information and noise. Data visualization. Data compression. Noise reduction.

DIMENSIONALITY REDUCTION Desired properties: 1 the original data can be (approximately) reconstructed; 2 distances between data points are preserved; 3 relevant information is preserved; 4 redundant information is removed; 5 the representation models our prior knowledge about the real world. Based on the choice of desired property and formalism, we get different methods.
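
One way to see property 2 in action is a Gaussian random projection, sketched below with NumPy (one possible formalism among those listed in the course, with arbitrary dimensions): pairwise distances are roughly preserved even though K << d.

import numpy as np

rng = np.random.default_rng(0)
n, d, K = 200, 1000, 50
X = rng.normal(size=(n, d))

R = rng.normal(size=(d, K)) / np.sqrt(K)   # random projection matrix
Y = X @ R                                  # compressed points in R^K

# Compare a pairwise distance before and after projection.
i, j = 3, 17
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))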

SNEAK PEEK Linear projections. Principal component analysis.