MACHINE LEARNING AN INTRODUCTION

Similar documents
Copyr i g ht 2014, SAS Ins titut e Inc. All rights res er ve d. WHAT S NEW IN SAS ANALYTICS 9.4

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

MS1b Statistical Data Mining

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Machine Learning Introduction

Machine Learning with MATLAB David Willingham Application Engineer

MACHINE LEARNING BRETT WUJEK, SAS INSTITUTE INC.

Learning outcomes. Knowledge and understanding. Competence and skills

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

An Introduction to Data Mining

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

Machine Learning: Overview

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

The Data Mining Process

Using Data Mining for Mobile Communication Clustering and Characterization

Data Mining + Business Intelligence. Integration, Design and Implementation

Principles of Data Mining by Hand&Mannila&Smyth

MA2823: Foundations of Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning

Machine learning for algo trading

Data, Measurements, Features

Leveraging Ensemble Models in SAS Enterprise Miner

Introduction to Data Mining

Machine Learning for Data Science (CS4786) Lecture 1

Predictive Modeling and Big Data

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

An Overview of Knowledge Discovery Database and Data mining Techniques

Unsupervised Data Mining (Clustering)

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Statistics for BIG data

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

ADVANCED MACHINE LEARNING. Introduction

Chapter 12 Discovering New Knowledge Data Mining

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

Machine Learning.

Machine Learning using MapReduce

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

DATA MINING TECHNIQUES AND APPLICATIONS

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Data Mining Applications in Higher Education

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Fast Analytics on Big Data with H20

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller

Social Media Mining. Data Mining Essentials

Data Mining Practical Machine Learning Tools and Techniques

Foundations of Artificial Intelligence. Introduction to Data Mining

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Data Mining Algorithms Part 1. Dejan Sarka

Evaluation of Machine Learning Techniques for Green Energy Prediction

Machine Learning What, how, why?

How To Perform An Ensemble Analysis

Data Mining: Overview. What is Data Mining?

SAS and Teradata Partnership

Programming Exercise 3: Multi-class Classification and Neural Networks

Predictive Modeling Techniques in Insurance

Dan French Founder & CEO, Consider Solutions

Big Data Analytics and Optimization

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

Learning is a very general term denoting the way in which agents:

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Classification algorithm in Data mining: An Overview

MACHINE LEARNING BASICS WITH R

Role of Social Networking in Marketing using Data Mining

Analytics on Big Data

How To Make A Credit Risk Model For A Bank Account

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Supervised and unsupervised learning - 1

Predictive Data modeling for health care: Comparative performance study of different prediction models

Information Management course

Bayesian networks - Time-series models - Apache Spark & Scala

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Comparison of K-means and Backpropagation Data Mining Algorithms

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

Azure Machine Learning, SQL Data Mining and R

Deep learning applications and challenges in big data analytics

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Statistical Models in Data Mining

Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA

Optimizing content delivery through machine learning. James Schneider Anton DeFrancesco

Introduction to Data Mining

Unsupervised and supervised dimension reduction: Algorithms and connections

Predict Influencers in the Social Network

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,...

How To Identify A Churner

Role of Neural network in data mining

Transcription:

AN INTRODUCTION JOSEFIN ROSÉN, SENIOR ANALYTICAL EXPERT, SAS INSTITUTE JOSEFIN.ROSEN@SAS.COM TWITTER: @ROSENJOSEFIN

AGENDA What is machine learning? When, where and how is machine learning used? Exemple deep learning Machine learning in SAS

WHAT IS MACHINE LEARNING? Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions with minimal human intervention.

WHAT IS MACHINE LEARNING? Complicated methods, consumable results

MULTIDISCIPLINARY NATURE OF DATA ANALYSIS Statistics Pattern Recognition Computational Neuroscience Data Science Data Mining Machine Learning AI Databases Information Retrieval

WHEN TO USE MACHINE LEARNING? When the predictive accuracy of a model is more important than the interpretability of a model. When traditional approaches are inappropriate, e.g. when you have: more variables than observations many correlated variables unstructured data fundamentally nonlinear or unusual phenomena

WHERE IS MACHINE LEARNING USED? A few examples: Recommendation applications Fraud detection Predictive maintenance Text analytics Self driving cars

UNSUPERVISED LEARNING Operated on unlabeled examples; Clustering, dimension reduction; INTRODUCTION TO MACHINE LEARNING

SUPERVISED LEARNING Trained on labeled examples; Classification, regression, prediction; INTRODUCTION TO MACHINE LEARNING

DEEP LEARNING Neural networks with more than two hidden layers Promising breakthroughs in speech, text and image recognition Useful for feature extraction

MNIST TRÄNINGSDATA 784 variables form a 28x28 digital grid 784-dimensional input vector X = (x 1,,x 784 ) Pixel intensity from 0 to 255 60,000 training pictures with label 10,000 test pictures without label

MNIST EXEMPEL Train a stacked denoising autoencoder Extract representative features from MNIST data Compare with principal component analysis, two PCs

STACKED DENOISING AUTOENCODER Uncorrupted Output Features Target Layer h5 Hidden Neurons h4 Hidden Neurons Hidden layers h3 Hidden Neurons Extractable Features h2 Hidden Neurons h1 Hidden Neurons Partially Corrupted Input Features Input Layer

Record ID Hidden Unit 1 Hidden Unit 2 1 0.98754 0.32453 2 0.76854 0.87345 3 0.87435 0.05464 h3 h2 h1 Hidden Neurons Hidden Neurons Hidden Neurons Extractable Features Partially Corrupted Input Features Input Layer Record ID Pixel 1 Pixel 2 Pixel 3 Pixel 4 Pixel 5 Pixel 6 Pixel 7 Pixel 8 Pixel 9 Pixel 10 1 0 0 0 0 0 5 8 11 6 3 2 0 0 0 0 10 20 45 46 36 24 3 0 25 37 32 40 64 107 200 67 46

FEATURE EXTRACTION DENOISING AUTOENCODER

FEATURE EXTRACTION - PCA

SAS MACHINE LEARNING ALGORITHMS Neural networks Decision trees Random forests Associations and sequence discovery Gradient boosting and bagging Support vector machines Nearest-neighbor mapping K-means clustering DBSCAN Self-organizing maps Local search optimization techniques such as genetic algorithms Expectation maximization Multivariate adaptive regression splines Bayesian networks Kernel density estimation Principal components analysis Singular value decomposition Gaussian mixture models Sequential covering rule building Model ensembles Recommendations

SAS PRODUCTS USING MACHINE LEARNING SAS Enterprise Miner SAS Text Miner SAS In-Memory Statistics for Hadoop SAS Visual Statistics SAS/STAT SAS/OR SAS Factory Miner

Algoritm SAS EM-noder SAS procedurer Regression Decision tree High Performance Regression LARS Partial Least Squares Regression Decision Tree High Performance Tree ADAPTIVEREG GAM GENMOD GLMSELECT HPGENSELECT HPLOGISTIC HPQUANTSELECT HPREG LOGISTIC QUANTREG QUANTSELECT REG ARBORETUM HPSPLIT Random forest High Performance Tree HPFOREST Gradient boosting Gradient Boosting ARBORETUM Neurala networks AutoNeural DMNeural High Performance Neural Neural Network HPNEURAL NEURAL Support vector machine High Performance Support Vector Machine HPSVM SUPERVISED LEARNING Naïve Bayes HPBNET* Neighbors Memory Based Reasoning DISCRIM *PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model

UNSUPERVISED LEARNING ALGORITMER Algoritm SAS EM-noder SAS procedurer A priori rules K-means klustring Spektral klustring Kernel density estimation Association Link Analysis Cluster High Performance Cluster FASTCLUS HPCLUS Custom lösning genom Base SAS och procedurerna DISTANCE och PRINCOMP KDE Kernel PCA Custom lösning genom Base SAS och procedurerna CORR, PRINCOMP och SCORE Singular value decomposition HPTMINE IML Self organizing maps SOM/Kohonen

SEMI-SUPERVISED LEARNING ALGORITMER Algoritm SAS EM-noder SAS procedurer Denoising autoencoders HPNEURAL NEURAL

WHY NOW? Big data Computational resources Powerful computers Cheap data storage. Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is Douglas Adams in The Hitchhiker's Guide to the Galaxy

More reading White papers http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html http://support.sas.com/resources/papers/proceedings14/sas313-2014.pdf SAS links http://www.sas.com/en_us/insights/analytics/machine-learning.html http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-weknew.html SAS Data Mining Community https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/ Big Data Matters Webinar Series: www.sas.com/bigdatamatters