Native Connectivity to Big Data Sources in MicroStrategy 10
Presented by: Raja Ganapathy
Agenda
MicroStrategy supports several data sources, including Hadoop
Why Hadoop?
How did the MicroStrategy Analytics Platform connect to Hadoop prior to MicroStrategy Release 10?
Hadoop YARN
How can the MicroStrategy Analytics Platform connect to Hadoop with Release 10 (Big Data Engine (BDE))?
Usage patterns for MicroStrategy with Hadoop as a data source
Demo of a use case
Bring All Relevant Data to Decision Makers
Support for More Big Data Sources
Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single Database
MapReduce & NoSQL Databases: Elastic MapReduce, BigInsights, Distribution, HDFS
Columnar Databases: Redshift
Data Warehouse Appliances: HANA, Parallel Data Warehouse
Relational Databases
Multidimensional Databases: Analysis Services
SaaS-Based App Data: Google Analytics, Zendesk
Generic Web Services: SOAP, REST
Generic Web Services with OAuth
...many more...
User / Departmental Data: Clipboard, MicroStrategy Dataset
Why Hadoop?
By 2016, the majority of enterprise data is expected to be processed in Apache Hadoop. Why?
Volume: orders of magnitude larger than conventional data (petabytes, exabytes); uses commodity hardware
Variety: structured, semi-structured, and unstructured formats
Velocity: speed of ingesting incoming data streams
Why Hadoop? Hadoop can be challenging!
Connectivity: ODBC / JDBC / Pig connector
Data warehouse infrastructure: Hive, SQL on Hadoop, Pig
Data processing: MapReduce framework
Data storage: Hadoop Distributed File System (HDFS)
Traditionally, connecting to HDFS needs intermediate layers (Hive, Pig, etc.) that generate MapReduce jobs and add overhead.
MapReduce can be relatively complicated and is a harder skill to master.
MapReduce is a model for processing large data sets with a parallel, distributed algorithm on a Hadoop cluster.
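To make the MapReduce model concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It runs in plain Python rather than on a Hadoop cluster, and the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # On a real cluster, map and reduce tasks run in parallel on many nodes;
    # here they run sequentially to show only the data flow.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data on Hadoop", "Hadoop stores big data"])
```

Even this small example needs three cooperating functions for a single count, which hints at why hand-written MapReduce jobs are considered hard to master.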
How did MicroStrategy connect with Hadoop prior to MicroStrategy 10?
SQL on Hadoop (Cloudera Impala, BigSQL, Shark): translates SQL to MapReduce
Apache Hive: translates HiveQL to MapReduce
Apache Pig: Pig Latin scripts generate MapReduce
All need one or more additional layers, with their overhead (ODBC or JDBC), between MicroStrategy and HDFS.
MicroStrategy generates SQL (SQL on Hadoop), HiveQL, or Freeform Pig, which in turn creates MapReduce jobs to get data from HDFS.
Diagram labels: MapReduce; ODBC / JDBC
Hadoop 2.0 - YARN
Prior to Hadoop 2.0, to crunch data in Hadoop you wrote or generated MapReduce via Hive, Pig, or SQL on Hadoop.
Hadoop 2.0 introduced YARN (Yet Another Resource Negotiator).
YARN's execution model is more generic than the earlier MapReduce implementation.
YARN can run applications that do not follow the MapReduce model.
Hadoop YARN is an attempt to take Apache Hadoop beyond MapReduce for data processing.
The current Apache MapReduce version is built on top of Apache YARN.
Internal MicroStrategy tests show a 5x improvement in speed when moving to YARN.
External data shows 3 min for YARN vs. 3 hours for MapReduce!
The MapReduce paradigm is hard to master!
Source: "How YARN Opens Doors to Easier Programming Tools for Hadoop 2.0 Users" - John Lilley
MicroStrategy Taps into Hadoop Natively using YARN
MicroStrategy 9.4.1 and prior: MicroStrategy Analytics Platform -> Hive ODBC Connector -> Hadoop distribution (Hive -> HDFS)
MicroStrategy 10 (new): MicroStrategy Analytics Platform -> Big Data Engine / Hadoop Gateway -> Hadoop (HDFS)
Big Data Engine / Hadoop Gateway gives higher performance because it bypasses the Hive/MapReduce layer and uses YARN.
Ability to consume unstructured data natively from Hadoop!
No ODBC or other overhead.
How does MicroStrategy Big Data Engine (BDE) / Hadoop Gateway work?
Big Data Engine (BDE) / Hadoop Gateway is a native YARN application that enables direct access to HDFS.
The BDE component is installed on the Hadoop cluster.
BDE creates the metadata on the fly when files are selected and imported.
In internal testing, BDE / Hadoop Gateway is at least 5 times faster than comparable Hive-based technologies.
Diagram labels (Hadoop cluster): Name Node with Big Data Query Engine; Data Nodes, each with a Big Data Execution Engine; data partitions; Parallel Partitioned In-Memory Cube; Connect Live
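The query engine / execution engine split can be pictured as a scatter-gather aggregation: each data node aggregates its own partition, and a coordinator merges the partial results. The sketch below is a hypothetical illustration in plain Python, not MicroStrategy's implementation. Threads stand in for data nodes, and the names aggregate_partition and run_query, as well as the sample rows, are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def aggregate_partition(partition):
    """Stand-in for one data node's execution engine: partial sums per key."""
    totals = Counter()
    for region, revenue in partition:
        totals[region] += revenue
    return totals

def run_query(rows, num_partitions=2):
    """Stand-in for the query engine: scatter partitions, then merge partials."""
    # Round-robin split, mimicking data spread across several data nodes.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)

rows = [("East", 100), ("West", 50), ("East", 25), ("West", 75), ("North", 10)]
result = run_query(rows)
```

Because each partition is aggregated where it lives and only small partial results are merged, the work scales with the number of partitions rather than passing every row through a single connection.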
Usage Patterns for MicroStrategy with Hadoop as a Data Source
1. Visually explore a subject-matter extract in memory through a one-time query to Hadoop
2. Self-service parameterized queries directly to Hadoop
3. Model-driven access to Hadoop
4. Query a multi-source schema model and drill down among Intelligent Cubes, EDW, and Hive
Diagram labels: Multi-dimensional Business Model; RDBMS; ETL; Maturity of Data Access
Three Steps for Self-Service Access to Hadoop with Native Connectivity
1. Import data from HDFS directly
2. Cleanse and refine with Data Wrangler: cleanse, refine, and transform data from HDFS to make it ready for analysis; designed for business users
3. Analyze with Visual Insight: get full insights from Hadoop/HDFS data using Visual Insight
Example data: web logs, survey/feedback forms, machine-generated data
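As a rough picture of what the cleanse-and-refine step involves for web logs, the following Python sketch parses raw log lines, drops malformed ones, and normalizes types. It is not Data Wrangler itself. It assumes the logs use the Common Log Format, and the names LOG_PATTERN and cleanse, along with the sample lines, are hypothetical.

```python
import re
from datetime import datetime

# Assumed input: Common Log Format, e.g.
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def cleanse(lines):
    """Return one typed record per well-formed line; skip everything else."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # drop malformed lines instead of failing the import
        rows.append({
            "host": match["host"],
            "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
            "method": match["method"].upper(),  # normalize casing
            "path": match["path"],
            "status": int(match["status"]),
            "bytes": 0 if match["size"] == "-" else int(match["size"]),
        })
    return rows

raw = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'garbage line',
    '10.0.0.2 - - [10/Oct/2015:13:56:01 -0700] "get /cart HTTP/1.1" 404 -',
]
clean = cleanse(raw)
```

The output is a list of uniformly typed records, which is the shape an analysis tool needs before it can group, filter, or chart the data.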
Demo
Questions? Q&A