Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1
Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the machine learning workflow with MATLAB 2
Motivation Do you want to create a model of a system? Understand dynamics Predict Outputs How do you create a model? Develop an equation Takes time to develop, sometimes even years Unknown if there is actually an equation at all Another option, Machine Learning 3
Used Across Many Application Areas Biology Financial Services Image & Video Processing Audio Processing Energy Tumor Detection, Drug Discovery Credit Scoring, Algorithm Trading, Bond Classification Pattern Recognition Speech Recognition Load, Price Forecasting, Trading 4
Machine Learning Characteristics and Examples Characteristics Lots of data (many variables) System too complex to know the governing equation (e.g., black-box modeling) Examples Pattern recognition (speech, images) Financial algorithms (credit scoring, algo trading) Energy forecasting (load, price) Biology (tumor detection, drug discovery) AAA 93.68% AA 2.44% A 0.14% BBB 0.03% BB 0.03% B 0.00% CCC 0.00% 5.55% 0.59% 92.60% 4.03% 4.18% 91.02% 0.23% 7.49% 0.12% 0.73% 0.00% 0.11% 0.00% 0.00% 0.18% 0.00% 0.73% 0.15% 3.90% 0.60% 87.86% 3.78% 8.27% 86.74% 0.82% 9.64% 0.37% 1.84% 0.00% 0.00% 0.00% 0.00% 0.08% 0.00% 0.39% 0.06% 3.28% 0.18% 85.37% 2.41% 6.24% 81.88% 0.00% 0.06% 0.08% 0.16% 0.64% 1.64% 9.67% D 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% AAA AA A BBB BB B CCC D 6
Challenges Machine Learning Significant technical expertise required No one size fits all solution Locked into Black Box solutions Time required to conduct the analysis 7
Overview Machine Learning Type of Learning Categories of Algorithms Unsupervised Learning Clustering Machine Learning Group and interpret data based only on input data Supervised Learning Develop predictive model based on both input and output data Classification Regression 8
Unsupervised Learning k-means, Fuzzy C-Means Hierarchical Clustering Neural Networks Gaussian Mixture Hidden Markov Model 9
Supervised Learning Regression Neural Networks Decision Trees Ensemble Methods Non-linear Reg. (GLM, Logistic) Linear Regression Classification Support Vector Machines Discriminant Analysis Naive Bayes Nearest Neighbor 10
Supervised Learning - Workflow Speed up Computations Select Model Data Train the Model Use for Prediction Import Data Explore Data Prepare Data Known data Known responses Model Model New Data Predicted Responses Measure Accuracy 11
Classification Overview What is classification? Predicting the best group for each point Learns from labeled observations Uses input features Why use classification? Accurately group data never seen before 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 Group 1 Group 2 Group 3 Group 4 Group 5 Group 6 Group 7 Group 8-0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 How is classification done? Can use several algorithms to build a predictive model Good training data is critical 12
Clustering Overview What is clustering? Segment data into groups, based on data similarity 1 0.9 0.8 0.7 0.6 Why use clustering? Identify outliers Resulting groups may be the matter of interest 0.5 0.4 0.3 0.2 0.1 How is clustering done? Can be achieved by various algorithms It is an iterative process (involving trial and error) 0-0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 15
Deploying MATLAB Applications to Excel Toolboxes 3 1 MATLAB Desktop End-User Machine 2 MATLAB Compiler MATLAB Builder EX.dll.bas 18
Deployment Highlights Database Servers.exe HADOOP Excel Desktop Applications Spreadsheets Application Servers.NET C Client Front End Applications Web Applications Java CTF Batch/Cron Jobs Royalty-free deployment Point-and-click workflow Unified process for desktop and server apps 19
MATLAB for Machine Learning Challenges Time (loss of productivity) Extract value from data Computation speed Time to deploy & integrate Technology risk MATLAB Solution Rapid analysis and application development High productivity from data preparation, interactive exploration, visualizations. Machine learning, Video, Image, and Financial Depth and breadth of algorithms in classification, clustering, and regression Fast training and computation Parallel computation, Optimized libraries Ease of deployment and leveraging enterprise Push-button deployment into production High-quality libraries and support Industry-standard algorithms in use in production Access to support, training and advisory services when needed 20
Machine Learning with MATLAB Interactive environment Visual tools for exploratory data analysis Easy to evaluate and choose best algorithm Apps available to help you get started (e.g,. neural network tool, curve fitting tool) Multiple algorithms to choose from Clustering Classification Regression 21
Learn More : Machine Learning with MATLAB http://www.mathworks.com/discovery/ machine-learning.html Data Driven Fitting with MATLAB Classification with MATLAB Regression with MATLAB Multivariate Classification in the Life Sciences Electricity Load and Price Forecasting Credit Risk Modeling with MATLAB 22
Training Services Exploit the full potential of MathWorks products Flexible delivery options: Public training available worldwide Onsite training with standard or customized courses Web-based training with live, interactive instructor-led courses Self-paced interactive online training More than 30 course offerings: Introductory and intermediate training on MATLAB, Simulink, Stateflow, code generation, and Polyspace products Specialized courses in control design, signal processing, parallel computing, code generation, communications, financial analysis, and other areas 23
Continuous Improvement Consulting Services Accelerating return on investment A global team of experts supporting every stage of tool and process integration Process and Technology Standardization Process and Technology Automation Process Assessment Advisory Services Component Deployment Full Application Deployment Jumpstart Migration Planning Research Advanced Engineering Product Engineering Teams Supplier Involvement 24
Technical Support Resources Over 100 support engineers All with MS degrees (EE, ME, CS) Local support in North America, Europe, and Asia Comprehensive, product-specific Web support resources High customer satisfaction 95% of calls answered within three minutes 70% of issues resolved within 24 hours 80% of customers surveyed rate satisfaction at 80 100% 25
MATLAB Central Community for MATLAB and Simulink users Over 1 million visits per month File Exchange Upload/download access to free files including MATLAB code, Simulink models, and documents Ability to rate files, comment, and ask questions More than 12,500 contributed files, 300 submissions per month, 50,000 downloads per month Newsgroup Blogs Web forum for technical discussions about MathWorks products More than 300 posts per day Commentary from engineers who design, build, and support MathWorks products Open conversation at blogs.mathworks.com Based on February 2011 data 26
Questions? 27