Benefits of analytics using Microsoft Azure Machine Learning (ML) Tomaž Kaštrun @tomaz_tsql
Special Thanks to SQL Saturday Bratislava organizers! Making the SQL Server community stronger, bigger and better!
Speaker info
-> BI Developer (MSSQL Server, C#, SAS, R, SAP, Py)
-> 10+ years of experience with MSSQL Server
-> 15+ years of experience in data analysis and DM; Data Scientist (NO!)
-> Working: Spar ICS Österreich, Spar Slovenija
-> MCPT, MCT SQL Server
-> tomaz.kastrun@gmail.com, @tomaz_tsql
-> http://tsqljokes.tumblr.com/ and https://tomaztsql.wordpress.com
-> Publishing articles, speaking at SQL events
-> Coffee lover, fixie bikes junkie
Microsoft and Machine Learning
2015
-> New SQL Server 2016
-> R integration in SQL Server (mid to end of 2016)
-> SQL Server 2016 CTP2 available: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016
-> April 2015: Microsoft acquires Revolution Analytics
-> What to expect (not confirmed):
   - multi-threaded R analytics within SQL Server
   - in-memory R analytics (RRO, MKL from Revolution Analytics)
   - Azure extensions
   - R language systematization
   - R libraries systematization
Intro to R and ML
R is an implementation of the S statistical programming language.
1. S was originally developed at Bell Labs (then part of AT&T) in 1976
2. First R release dates from ~1993 (more at http://r-project.org); first stable production use in 2000
3. Latest stable release: 3.2.1 (June 18th, 2015)
4. Open source; functional and imperative programming with support for OOP
5. Extremely powerful graphics capabilities
6. Cross-platform, multi-paradigm
7. CRAN is a huge R package repository (6679 packages; June 19th, 2015) (http://cran.r-project.org/web/packages/)
8. Large and growing ML/R/data science community
How + where to get R
-> R on CRAN (Comprehensive R Archive Network): http://cran.r-project.org
-> RStudio: http://www.rstudio.com
-> Revolution Analytics: http://revolutionanalytics.com (acquired by Microsoft in April 2015)
-> Microsoft has already announced integration of R in SQL Server 2016.
DEMO #1 Language R
Machine Learning (ML)
-> Machine learning builds models from past data to predict future outcomes
-> Characteristics of the past data are continually tested to improve the model
Machine Learning (ML) - Benefits
Supervised VS. Unsupervised -> Supervised learning Linear Regression
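A minimal sketch of supervised learning with linear regression, assuming a single input variable and ordinary least squares. The function names and the training data are made up for illustration; they are not from the demo.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical labeled training data: the known answers (ys) supervise the fit.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
```

The labels in `ys` are what make this supervised: the model is fitted against known outcomes and then used through `predict` on inputs it has not seen.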
Supervised VS. Unsupervised -> Unsupervised learning Cluster analysis
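A minimal sketch of unsupervised learning with cluster analysis, assuming k-means as the clustering technique (the slide does not name one). The points are made up, and the deterministic initialization (first k points as starting centroids) is a simplification chosen to keep the example reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    """Plain k-means; starts from the first k points as centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return centroids, labels


# Hypothetical unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

No labels are supplied here; the grouping in `labels` comes only from distances between the points.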
Common Machine Learning Algorithms
Azure ML
-> Fully managed and scalable cloud service
-> Focus on the ability to develop and deploy
-> For data scientists, statisticians and emerging data scientists
-> User-friendly interface for the data science workflow
-> Wide range of ML algorithms
-> R and Python integration
-> Support for R libraries
Basic ML Workflow (modules)
Azure ML Modules
-> Machine learning libraries are encapsulated in modules
-> Each module performs one task in a machine learning scenario
-> A workflow is a set of connected modules, from reading the data, through applying an ML algorithm, to generating the result
-> Categories:
   - Data format conversions
   - Data input and output
   - Data transformation
   - Machine learning modules
   - Statistical functions
   - OpenCV library, R execution, Python execution
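The idea of a workflow as a chain of connected modules can be sketched in plain Python. This only illustrates the concept; it is not the Azure ML API. Every function name and all the data below are hypothetical.

```python
def read_data(_):
    """Input module: returns hypothetical rows (stands in for a dataset)."""
    return [
        {"age": 25, "spend": 120.0},
        {"age": None, "spend": 80.0},
        {"age": 40, "spend": 200.0},
    ]


def clean_missing_data(rows):
    """Transformation module: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def select_columns(columns):
    """Configurable module: its attribute is the list of columns to keep."""
    def module(rows):
        return [{c: r[c] for c in columns} for r in rows]
    return module


def summarize(rows):
    """Statistical module: count rows and average the spend column."""
    values = [r["spend"] for r in rows]
    return {"count": len(values), "mean_spend": sum(values) / len(values)}


def run_workflow(modules, data=None):
    """Feed each module's output into the next module's input."""
    for module in modules:
        data = module(data)
    return data


result = run_workflow(
    [read_data, clean_missing_data, select_columns(["spend"]), summarize]
)
```

Each function plays the role of one module, and the list passed to `run_workflow` plays the role of the connections drawn between module ports.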
Azure ML Modules
-> Each module has additional attributes and features for fine-tuning the generated output
-> Modules have ports for establishing connections
-> Modules can also visualize, download and save the output
Azure ML Modules (Data transformation)
Azure ML Modules (Learning Models)
Selecting Classification Algorithm
-> How large is your training data? If the training set is small, use high-bias/low-variance classifiers such as Naive Bayes to avoid over-fitting.
-> Do you need to train incrementally or in batch mode? If you need to update your classifier with new data frequently (or you have a lot of data), you probably want Bayesian algorithms, which update well. Both neural nets and SVMs need to work on the training data in batch mode.
-> Is your data exclusively categorical, exclusively numeric, or a mixture of both? Bayesian classifiers work best with categorical/binomial data. Classification decision trees predict discrete classes, not numeric values.
-> Do you or your audience need to understand how the classifier works? Use Bayesian classifiers or decision trees, since these can be easily explained to most people. Neural networks and SVMs are "black boxes" in the sense that you can't really see how they classify data.
-> How fast does your classification need to be generated? SVMs are fast at classifying, since they only need to determine which side of the "line" your data is on. Decision trees can be slow, especially when they are complex (e.g. lots of branches).
-> How much complexity does the problem present or require? Neural nets and SVMs can handle complex non-linear classification.
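The point about incremental training can be sketched with a small categorical Naive Bayes classifier that only keeps counts, so adding new data needs no retraining. This is an illustrative sketch with Laplace (add-one) smoothing. The class name and the data are made up, and this is not how any Azure ML module is implemented.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical Naive Bayes that learns one example at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # (label, feature index, value) -> number of times seen
        self.feature_counts = defaultdict(int)
        # feature index -> set of distinct values seen so far
        self.feature_values = defaultdict(set)

    def update(self, features, label):
        """Incremental training: just bump the relevant counters."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(label, i, value)] += 1
            self.feature_values[i].add(value)

    def predict(self, features):
        """Return the label with the highest log posterior score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(features):
                seen = self.feature_counts.get((label, i, value), 0)
                n_values = len(self.feature_values[i]) or 1
                # Add-one smoothing keeps unseen values from zeroing the score
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical data: (outlook, windy) -> play
model = IncrementalNaiveBayes()
model.update(("sunny", "no"), "yes")
model.update(("sunny", "yes"), "yes")
model.update(("rainy", "yes"), "no")
model.update(("rainy", "no"), "no")
first_prediction = model.predict(("sunny", "no"))

# New data arrives later; the model is updated in place.
model.update(("cloudy", "no"), "no")
model.update(("cloudy", "no"), "no")
later_prediction = model.predict(("cloudy", "no"))
```

Because the model state is only a set of counters, it can also be explained by showing the counts, which fits the point above about explainable classifiers.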
Selecting Regression Algorithm
-> Bayesian Linear Regression
-> Boosted Decision Tree Regression
-> Decision Forest Regression
-> Linear Regression
-> Neural Network Regression
-> Ordinal Regression
-> Poisson Regression
Analysis Services (SSAS): Task / Problem and Algorithm
-> Predicting a discrete attribute
   Examples: Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors.
   Algorithms: Microsoft Decision Trees, Microsoft Naive Bayes, Microsoft Clustering, Microsoft Neural Network
-> Predicting a continuous attribute
   Examples: Forecast next year's sales. Predict site visitors given past historical and seasonal trends. Generate a risk score given demographics.
   Algorithms: Microsoft Decision Trees, Microsoft Time Series, Microsoft Linear Regression
-> Predicting a sequence
   Examples: Perform clickstream analysis of a company's Web site. Analyze the factors leading to server failure. Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities.
   Algorithms: Microsoft Sequence Clustering
-> Finding groups of common items in transactions
   Examples: Use market basket analysis to determine product placement. Suggest additional products to a customer for purchase. Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities.
   Algorithms: Microsoft Association, Microsoft Decision Trees
-> Finding groups of similar items
   Examples: Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics.
   Algorithms: Microsoft Clustering, Microsoft Sequence Clustering
Analysis Services vs. Azure ML
-> On-premises vs. cloud
-> Pricing
-> Administration / corporate environment
-> Algorithms and statistics
-> Data visualization (profit and lift charts for DM, classification matrix, neural networks)
-> Integration of the ML service in the scheme of Azure services vs. SQL Server edition
DEMO #2 Working with modules
Azure ML Modules R Extended
DEMO #3 R Script in Azure
Azure ML API
-> Already included as part of the Azure subscription
-> Connects the ML workflow with external applications
-> Lets users predict with, or score against, the published model
-> Supports two modes of operation:
   - Request-Response Service: a low-latency, high-scale web service for synchronous single predictions
   - Batch Execution Service: an asynchronous web service for bulk predictions
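A hedged sketch of what a client call to the request-response mode might look like. The payload layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) and the bearer-token header are assumptions here; the sample code generated on your own web service's API help page is the authority. The URL, API key and column names are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for one synchronous scoring call.

    The payload shape is an assumption; verify it against the API help page
    of your published web service.
    """
    payload = {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder endpoint, key and columns; replace with your service's values.
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["age", "spend"],
    [["25", "120.0"]],
)

# Sending it would look like this (left commented out on purpose):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

For bulk predictions the batch mode works differently (a job is submitted and polled), so this sketch applies only to the synchronous case.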
Azure ML API
-> Advantages:
   - Launch your model in minutes for real-time predictions
   - Publish to the Azure data market to sell predictions to your customers
   - Integrate your client with the cloud ML API in minutes by using ready-to-execute code
   - Make the most of your existing R and Python code by embedding it within the Execute-R or Execute-Py module
DEMO #4 Azure ML API
Azure Pricing
Machine Learning is offered in two tiers: Free and Standard.
-> Free: Experience the Machine Learning Studio for free using up to 10GB of your own data.
-> Standard: Adds the ability to work over larger data sets from a broader range of data sources and deploy machine learning algorithms into production as Web Services in the ML API Service.
Azure Pricing (valid on June 11th, 2015)
-> ML Seat Subscription: monthly fee, 7.43 / seat / month
-> ML Studio Usage: hourly, 0.74 / experiment hour
-> ML API Usage: hourly, 1.48 / production API compute hour
-> ML API Usage: transactions, 0.37 / 1,000 production API transactions
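A rough monthly cost estimate can be worked out from the four rates on this slide. The sketch below assumes that transactions are billed pro rata rather than rounded up to full thousands, and it leaves the currency unspecified because the slide does not state it. The usage figures are made up.

```python
# Rates as listed on the pricing slide (June 11th, 2015); currency not stated.
SEAT_PER_MONTH = 7.43
STUDIO_PER_EXPERIMENT_HOUR = 0.74
API_PER_COMPUTE_HOUR = 1.48
API_PER_1000_TRANSACTIONS = 0.37


def estimate_monthly_cost(seats, experiment_hours, api_compute_hours,
                          api_transactions):
    """Sum the four pricing components for one month.

    Assumption: transactions are charged proportionally per 1,000.
    """
    cost = (
        seats * SEAT_PER_MONTH
        + experiment_hours * STUDIO_PER_EXPERIMENT_HOUR
        + api_compute_hours * API_PER_COMPUTE_HOUR
        + (api_transactions / 1000) * API_PER_1000_TRANSACTIONS
    )
    return round(cost, 2)


# Hypothetical usage: 2 seats, 10 experiment hours, 5 API compute hours,
# 20,000 API transactions.
estimate = estimate_monthly_cost(
    seats=2,
    experiment_hours=10,
    api_compute_hours=5,
    api_transactions=20000,
)
```

With these made-up figures the components are 14.86 + 7.40 + 7.40 + 7.40, so `estimate` comes to 37.06 in the slide's currency.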
Azure Pricing SOURCE: http://azure.microsoft.com/en-gb/pricing/details/machine-learning/