How to Build MicroStrategy Projects on Top of Big Data Sources in the Cloud
  Jochen Demuth, Director, Partner Engineering
Use Cases for Big Data in the Cloud
  Four broad categories and their value
  Traditional sources moving online
    Sources: Company, Government, Financial sector, Business and consumer studies, Surveys, Polls
    Value: All business performance drivers, Operational efficiency, Revenue management, Strategic planning
  Digital exhaust from interactions
    Sources: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
    Value: New revenue sources, Consumer promotions, Risk management, Fraud detection
  Web 2.0 phenomenon
    Sources: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
    Value: Customer engagement, Customer service, Brand management, Viral marketing
  Internet of things
    Sources: Machine generated sensor data and machine to machine communication
    Value: Operational efficiency, Cost control, Risk avoidance
Traditional sources moving online
  How to take advantage of new technologies
  Traditional relational data sources in the cloud
    - RDBMS installed in the cloud (e.g. HP Vertica on Amazon EC2)
    - Managed RDBMS in the cloud (e.g. Amazon RDS)
  Relational database technology built for the cloud, e.g.
    - Amazon AWS (EMR, Redshift, Aurora)
    - Google BigQuery
  RDBMS vendor cloud services (e.g. Microsoft, Oracle, Teradata, HP, IBM, SAP)
  Cloud services simplify and automate many aspects of data management, but there are still application-specific aspects that need conscious control.
Some Database Features Require Conscious Design Choices
  Query time is often dominated by data access, with significant performance impact
  Data organization
    - Columnar vs. row based
  Minimize data access
    - Partitioning key selection
    - Data sorting (index selection/strategy)
    - Compression (on/off; algorithm)
    - Approximate calculation (e.g. HyperLogLog)
  Access and process data in parallel
    - Data distribution in MPP databases to minimize data movement
  Existing best practices for developing MicroStrategy applications apply
  Make sure to take advantage of database features designed for analytical workloads
  Look for best practices to take advantage of data source strengths in the MicroStrategy Community
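
To make the "approximate calculation" point concrete, here is a minimal Python sketch of the HyperLogLog idea: estimating a distinct count from a small, fixed set of registers instead of storing every value. It is an illustration only and not how any particular database implements it; the class name, the register count (p=12) and the use of SHA-1 as the hash are assumptions chosen for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using 2**p small registers (illustrative)."""

    def __init__(self, p=12):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates do not change the estimate
print(round(hll.count()))  # close to 50,000, typically within a few percent
```

The trade-off is the one the slide hints at: with 4,096 registers the memory footprint stays constant regardless of data volume, at the cost of a small relative error in the result.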
Identifying Value in Data Requires Utmost Flexibility
  Static data models get in the way of analysis at the speed of thought
  Digital exhaust from interactions
    Sources: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
    Value: New revenue sources, Consumer promotions, Risk management, Fraud detection
  Technical characteristics
    - Unknown data sources are analyzed for potential new business value
    - Analysis is necessary to support the development of new business models
    - Data models don't exist (yet)
MicroStrategy Supports All Analytic Needs
  Some people produce analytics while others consume analytics
  Axes: Analytical Complexity vs. User Scale, from Back Office to Front Line
  Data Scientists
    - Trained in modeling and coding
    - Use a variety of tools
    - Want their favorite tools
    - Look for the truth
  Business Analysts
    - Analytical amateurs
    - Power users of BI tools
    - Want to use the right tool
    - Look for the business edge
  Business Users
    - Make the daily decisions
    - Some may be power users
    - Most need simple tools
    - Look for actionable information
MicroStrategy Provides Flexible Data Modeling Options
  Choose how to access and analyze data
  Direct (Report, Dashboard, Visual Insight)
    - Flexible data access
    - Schema on read
    - Supports quick iterations
  Modeled (Report, Dashboard, Visual Insight)
    - Reusable Objects
    - Unified MicroStrategy Metadata: Reusable Data, Reusable Objects, Reusable Design
  Data sources: ID scans, Online clickstream, Application logs, Call/service records
The Web 2.0 Phenomenon Introduces Specific Challenges
  Data access, data structure, and data meshing
  Web 2.0 phenomenon
    Sources: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
    Value: Customer engagement, Customer service, Brand management, Viral marketing
  Data often requires structuring or flattening for analysis
  For optimal value, data from multiple sources needs to be put in context
  Access data where it exists
    - Web 2.0 data stored in relational data sources
    - Online services that also provide data services (e.g. Salesforce.com)
    - Online services that provide data (Social, Government, Weather)
  MicroStrategy offers three ways to access Web 2.0 data
No Data Left Behind
  Optimized connectors to your entire Big Data ecosystem
  Bring all relevant data to decision makers
  Connector categories: Big Data & NoSQL, Columnar Databases, Data Warehouse Appliances, Relational Databases, Multidimensional Databases, SaaS-Based App Data, User / Departmental Data
  Examples: Elastic Map Reduce, BigInsights, Redshift, HANA, Analysis Services, Parallel Data Warehouse
Three Ways to Query Multi-structured Data
  MicroStrategy Analytics Platform (data processing, analytics and delivery): Dashboards, Self-Service Analytics, Reports and Statements, OLAP Analysis
  1. Direct connection to source
     - Parse structure with lightweight schema-on-read functions
     - Import data or create a modeled environment
  2. Using Web Services
     - Requires data to be exposed as a Web Service
     - Data will need to be structured prior to access
  3. Offline process and store
     - Data is processed using specialty analytics (text, streaming, image processing) and stored as structured data
     - Text Analytics Module
  Data storage
    Semi-structured data: Web Logs, Social media posts, Surveys, Server Logs, Geo-spatial
    Unstructured data: E-mail, Image, Audio, Video, Sensor + Machine Data, Documents
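
The schema-on-read idea from the first option can be sketched in a few lines of Python: the raw records stay as they are, and structure is imposed only when a query asks for specific columns. This is a generic illustration, not MicroStrategy functionality; the sample events, the dotted column names and both helper functions are hypothetical.

```python
import json

# Hypothetical semi-structured click-stream events; fields vary per record.
RAW_EVENTS = [
    '{"ts": "2015-01-05T10:00:00", "user": {"id": 7, "country": "DE"}, "action": "click"}',
    '{"ts": "2015-01-05T10:00:02", "user": {"id": 9}, "action": "view", "page": "/home"}',
    'not valid json',
]


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def read_with_schema(lines, columns):
    """Yield one tuple per parseable line, projected onto the requested columns."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                      # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue                      # skip JSON values that are not objects
        flat = flatten(record)
        # Missing fields become None instead of failing the whole read.
        yield tuple(flat.get(col) for col in columns)


rows = list(read_with_schema(RAW_EVENTS, ["ts", "user.id", "user.country", "action"]))
for row in rows:
    print(row)
```

Because the column list is supplied at read time, a new question only needs a different list of columns, which is what makes quick iterations possible before any data model exists.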
MicroStrategy Offers Several Paths to Mesh Data For Analysis
  Integrating Modeled BI and Self-Service BI
  Modeled path (Corporate Data Sources)
    - Structured Data: Architect
    - Structured Join: Multi-Source Model, Multi-Source Pushdown Joins
    - Cubes from Model
    - Structured BI Content Consumption: Dashboards and MicroApps
  Self-service path (Local / Dept Data Sources)
    - Self Service Data: Data Import
    - Cubes from Import
    - Self Service Join: Document Data Blending
    - Self Service BI Content Creation: Ad Hoc / Visual Insight
  Both paths meet where datasets are joined in documents
Find Insights in Vast Amounts of Machine Generated Data
  Machine generated data often does not lend itself to traditional OLAP analysis
  Internet of things
    Sources: Machine generated sensor data and machine to machine communication
    Value: Operational efficiency, Cost control, Risk avoidance
  Apply the methods of predictive analytics and data mining to machine generated data
MicroStrategy Support for Predictive Analytics
  All of the most commonly used techniques are supported
  Chart: Which Techniques Do You Use Most (Primary Work Horses of Data Mining)
  Legend: via R, via PMML, MicroStrategy Native
  Source: 2013 Rexer Data Miner Surveys, www.rexeranalytics.com (over 1,250 data miners from 75 countries)
Predictive Analytics Are Part of MicroStrategy Function Library
  Reporting: Average, Mean, Count, Sum, Maximum, Minimum, Median, Mode, Product, Rank, Percentile, N-tile, N-tile by Step, N-tile by Value, N-tile by Step and Value, Standard Deviation, Standard Deviation of a Population, Variance, Variance of a Population
  Statistical Aggregate: Geometric Mean, Average Deviation, Kurtosis, Skew
  Math Functions: Absolute, Integer, A-cosine, Ln, Hyp A-cos, A-sine, Log10, Hyp A-sine, A-tan, Power, A-tan2, Quotient, Hyp A-tan, Radians, Ceiling, Combine, Round, Cosine, Sine, Log, Mod, Hyp Cosine, Degrees, Square Root, Exponent, Tan, Factorial, Hyp Tan, Floor, Truncate, Randbetween, Hyp Sine
  Date and Time: Add Days, Add Months, Current Date, Current Date & Time, Current Time, Day of Month, Day of Week, Day of Year, Days Between, Month Start Date, Month End Date, Months Between, Year Start Date, Year End Date
  Statistical: Beta, Beta Inverse, Binomial Probability, Chi, Chi Inverse, Confidence, Correlation Coefficient, Covariance, Critical Binomial, Chi Test (Independence), Cumulative Binomial, Exponent, F-Probability, F-Test, Fisher Transformation, Gamma, Gamma Inverse, Gamma Logarithm, Homoscedastic T-test, Heteroscedastic T-test, Hypergeometric, Intercept Point, Inverse of Lognormal Cumulative, Inverse of F Probability, Inverse of Fisher, Inverse of the Std Normal Cumulative, Inverse of the T-, Lognormal Cumulative, Mean T-Test, Negative Binomial, Normal Cumulative, Normal Inverse, Number of Permutations for a Given Object, Paired T-test, Poisson (Predict Number of Events), Pearson Product Moment Correlation Coefficient, RSQ (Square of Pearson), Slope of Linear Regression, STEYX (Standard Error of Predicted y Value), Standardize, Standard Normal Cumulative, T-, Variance Test, Weibull (Reliability Analysis)
  Financial: Accrued Interest, Accrued Interest Maturity, Amount Received at Maturity, Bond-equivalent Yield for T-BILL, Convert Dollar Price from Fraction to Decimal, Convert Dollar Price from Decimal to Fraction, Cumulative Interest Paid on Loan, Cumulative Principal Paid on Loan, Depreciation for each Accounting Period, Days In Coupon Period to Settlement Date, Days In Coupon Period with Settlement Date, Days from Settlement Date to Next Coupon, Double-Declining Balance Method, Discount Rate For a Security, Effective Annual Interest Rate, Fixed-Declining Balance Method, Future Value, Future Value of Initial Principal with Compound Interest Rates, Interest Rate, Interest Payment, Internal Rate of Return, Interest Rate per Annuity, Macaulay Duration, Modified Duration, Modified Internal Rate of Return, Next Coupon Date After Settlement Date, No of Coupons Settlement and Maturity Date, Nominal Annual Interest Rate, No of Investment Periods, Net Present Value, Odd First Period Yield, Odd Last Period, Prev Coupon Date Before Settlement Date, Price Per $100 Face Value w Odd First Period, Payment, Payment on Principal, Price, Price Discount, Price at Maturity, Present Value, Prorated Depreciation for each Period, Straight Line Depreciation, Sum-Of-Years' Digits Depreciation, T-BILL Price, T-BILL Yield, Variable Declining Balance, Yield, Yield for Discounted Security, Yield at Maturity
  Data Mining: Association Rules, Time Series, Clustering, Train Association, General Regression, Train Clustering, Mining, Train Decision Tree, Neural Network, Train Regression, Regression, Train Time Series, Rule Set, Tree Model, Support Vector Machine, Variants
  OLAP Functions: Running Total, Running Std Deviation, Running Std Deviation of Population, Running Minimum, Running Maximum, Running Count, Moving Difference, Moving Maximum, Moving Minimum, Moving Average, Moving Sum, Moving Count, Moving Std Deviation, Moving Std Deviation of Population, First or Last Value in Range, Exponential Weight Moving Avg, Exponential Weight Running Avg
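
For readers unfamiliar with the OLAP function family named in the library (running total, moving average, exponentially weighted moving average), the Python sketch below shows the textbook definitions applied to a short series of sensor readings. It is an illustration only: the function names, the window handling at the start of the series and the smoothing parameter are assumptions, and the MicroStrategy functions of similar name may treat edge cases differently.

```python
def running_total(values):
    """Cumulative sum up to and including each position."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


def moving_average(values, window):
    """Mean of the current value and up to window - 1 preceding values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def exp_weighted_moving_average(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


# Hypothetical sensor readings with one spike at the fourth position.
readings = [10, 12, 11, 30, 12]

print(running_total(readings))                     # [10, 22, 33, 63, 75]
print(moving_average(readings, 3))
print(exp_weighted_moving_average(readings, 0.5))  # [10, 11.0, 11.0, 20.5, 16.25]
```

On machine generated data, smoothing of this kind is a common first step: comparing each raw reading with its smoothed value makes spikes such as the fourth reading stand out.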
Easy Integration with Third Party Analytical Models
  - Deploy any of 5000+ open source R analytics: MicroStrategy R Integration Pack
  - Import predictive models from popular packages: PMML Model
  - Create your own custom functions: MicroStrategy Custom Function Plug-in
  As a MicroStrategy metric, use models and functions in any report or dashboard
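
To show what "importing a predictive model" via PMML amounts to, the Python sketch below reads a tiny PMML regression model and applies it to one row of data. It is a simplified illustration, not MicroStrategy's import mechanism: the model, its field names (temperature, vibration, failure_risk) and both helper functions are hypothetical, and the sketch handles only an intercept plus numeric predictors with the default exponent of 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document describing a linear regression model.
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hypothetical example model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="failure_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="vibration"/>
      <MiningField name="failure_risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="temperature" coefficient="0.25"/>
      <NumericPredictor name="vibration" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()

NS = {"p": "http://www.dmg.org/PMML-4_2"}   # PMML 4.2 XML namespace


def load_linear_model(pmml_text):
    """Extract the intercept and numeric coefficients from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, row):
    """Apply the linear model to one row given as a dict of field values."""
    intercept, coefficients = model
    return intercept + sum(coef * row[name] for name, coef in coefficients.items())


model = load_linear_model(PMML_DOC)
print(score(model, {"temperature": 80.0, "vibration": 1.5}))   # 23.5
```

The point of the format is that the model is trained in one tool and scored in another: only the XML description travels, so the scoring side needs no access to the training data or the original package.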
The Full Range of Advanced Analytics from One Place
  Analytical maturity, highest first
    - Optimization: What do we want to happen?
    - Predictions: What is likely to happen based on past history?
    - Relationship Analysis: What factors influence activity or behavior?
    - Benchmarking: How are we doing versus comparables?
    - Trend Analysis: What direction are we headed in?
    - Data Summarization: What is happening in the aggregate?
  Tools
    - World's most popular advanced analytics tool. Free, open source.
    - Industry's most powerful SQL Engine and 300+ native analytical functions
    - Specialty Tools
MicroStrategy Supports All Use Cases for Big Data in the Cloud
  Analytical platform that provides the flexibility to enable modern analysis
  Traditional sources moving online
    Sources: Company, Government, Financial sector, Business and consumer studies, Surveys, Polls
    Value: All business performance drivers, Operational efficiency, Revenue management, Strategic planning
  Digital exhaust from interactions
    Sources: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
    Value: New revenue sources, Consumer promotions, Risk management, Fraud detection
  Web 2.0 phenomenon
    Sources: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
    Value: Customer engagement, Customer service, Brand management, Viral marketing
  Internet of things
    Sources: Machine generated sensor data and machine to machine communication
    Value: Operational efficiency, Cost control, Risk avoidance