SAS FOR BIG DATA PRESENTED BY: BRAD HATHAWAY
SAS AND BIG DATA: SOME KEY TAKEAWAYS FROM THE VIDEO
Combining Big Data and Analytics
  Hadoop allows capturing unlimited amounts of diverse data; many companies are using this to create a Data Lake.
  Extracting value from the lake requires analytics, which makes SAS a natural complement to Hadoop.
One thing the video didn't mention: the longer the data stays in the Data Lake, the better your performance and overall experience will be. It is critical to have as much processing in Hadoop as possible.
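The advice to keep as much processing in Hadoop as possible can be illustrated with a toy sketch. This is not SAS or Hadoop code: the function names and the sample partitions are hypothetical, and plain Python lists stand in for data blocks held on cluster nodes. It only compares how many values have to leave the cluster under each approach.

```python
def pull_then_aggregate(partitions):
    """Move every row out of the cluster, then sum on the client."""
    moved = [row for part in partitions for row in part]
    return sum(moved), len(moved)  # (result, values transferred)


def aggregate_in_place(partitions):
    """Sum inside each partition; only one partial sum per partition moves."""
    partials = [sum(part) for part in partitions]
    return sum(partials), len(partials)  # (result, values transferred)


if __name__ == "__main__":
    # Hypothetical data: three partitions of very different sizes.
    partitions = [list(range(1000)), list(range(500)), list(range(10))]
    print(pull_then_aggregate(partitions))
    print(aggregate_in_place(partitions))
```

Both functions return the same total, but the second moves one value per partition rather than one value per row, which is the effect the slide is pointing at.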
SAS BUSINESS ANALYTICS FRAMEWORK... GIVES SAS CUSTOMERS THE POWER TO KNOW!
Each area is a market on its own!
SAS is ranked as a leader in pretty much all of them!
Our customers are now shifting their attention to how each of these areas interacts with Hadoop!
AGENDA
What is Hadoop? (a quick refresher)
Two Hadoop Approaches
  Data Platforms with Hadoop
  BI & Analytics on Hadoop
SAS on Hadoop: a taste of technology
  Data Quality Accelerator on Hadoop
  Self-Service DI on Hadoop
  SAS Visual Statistics
WHAT IS HADOOP? A QUICK REFRESHER
WHAT IS HADOOP? DICTIONARY DEFINITION
"Hadoop is one way of using a set of cheap computers to store an enormous amount of data and then to process that data in parallel."
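The "store, then process in parallel" idea can be sketched as a tiny map/reduce word count. This is a single-machine illustration under stated assumptions, not Hadoop itself: threads stand in for cluster nodes, a Python list stands in for a distributed file, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks, as a distributed file system would."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: count words in one block, independently of all other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, block_size=2, workers=4):
    """Run the map step on every block in parallel, then reduce the results."""
    blocks = split_into_blocks(lines, block_size)
    # Threads stand in for the cluster nodes that would each hold a block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


if __name__ == "__main__":
    print(word_count(["big data", "data lake", "big big lake"]))
```

Because each block is counted without reference to any other block, adding more machines (here, more workers) lets more blocks be processed at the same time.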
WHAT IS HADOOP? MAKING HADOOP EASY AND ENTERPRISE READY
WHAT IS HADOOP? AS A DATA PLATFORM, STORAGE COSTS ARE MUCH LOWER
[Chart: Total Cost ($0 to $18,000,000) versus Number of Gigabytes (1 to 1000) for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]
WHAT IS HADOOP? PROJECTS FOR THE HADOOP STACK
SUPPORTING EVIDENCE: THE TREND IS UP!
Source: SandHill Group, "Do You Hadoop? A Survey of Big Data Practitioners", October 29, 2013
TWO HADOOP APPROACHES
TWO STARTING POINTS: NOT MUTUALLY EXCLUSIVE BUT OFTEN NOT SEEN TOGETHER!
Hadoop as a Data Platform (standalone or as part of a broader ecosystem) ... to support an IT Transformation
Hadoop as a core component of the next generation of BI and Analytics ... to support innovative business usage
[Diagram: analytics lifecycle: Identify / Formulate Problem, Data Preparation, Data Exploration, Transform & Select, Build Model, Validate Model, Deploy Model, Evaluate / Monitor Results]
DATA PLATFORMS WITH HADOOP
WHERE ARE WE TODAY? SETTING THE SCENE
Operational Data Sources:
  Traditional sources include ERP, CRM and financial systems, amongst others.
  Evolving sources include unstructured data from places like Twitter and LinkedIn, and streaming data from the Internet of Things (sensors etc.).
WHERE ARE WE TODAY? SETTING THE SCENE
[Diagram labels: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
Unstructured, semi-structured and streaming data (e.g. sensor data) are often handled outside the Warehouse flow.
WHERE DOES HADOOP FIT? HADOOP AS A NEW DATA STORE
[Diagram labels: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP AS AN ADDITIONAL INPUT TO THE EDW
[Diagram labels: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A BASIS FOR BI AND ANALYTICS
[Diagram labels: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A STAGING LAYER AS PART OF A DATA LAKE
Downstream stores could be Hadoop, data appliances or an RDBMS.
[Diagram labels: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
HIGH LEVEL VIEW: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING HADOOP AS A DATA PLATFORM

Today...
Base SAS: MapReduce + Pig scripting + HDFS commands
SAS/Access to Hadoop: Hive, Hive2 + direct file access
SAS/Access to Impala (Cloudera only)
SAS Data Integration Studio (transforms) in Data Management Standard / Advanced:
  Read/write HDFS files
  Submit HiveQL code
  Execute MapReduce code
  Submit Pig Latin
  Transfer data to/from Hadoop using Hadoop utilities
  SQL transforms pushed down with the Access to Hadoop engine
SAS Federation Server: virtual and secure access to Hadoop and more traditional sources

Coming very soon
Everything we have today plus...
SAS Data Quality Accelerator for Hadoop: execute selected DQ routines in Hadoop
SAS Code Accelerator for Hadoop: execute SAS DS2 code in Hadoop
New web-based business user interface:
  Point-and-click data management routines where data stays in Hadoop
  HTML5 web-based interface
SAS Event Stream Processing Engine: to bring streaming data from sensors into Hadoop
BI & ANALYTICS ON HADOOP
WHEN IT COMES TO BI / REPORTING: TWO SIMPLE THINGS TO REMEMBER
A. Data for data visualization and reporting sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop
  SAS/Access, just like we do with an RDBMS
  More or less business as usual
B. Hadoop cluster processors used for data visualization, exploration and reporting
  In-Memory
  Transformational
WHEN IT COMES TO BI / REPORTING: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING HADOOP AS PART OF BI
A. Data for data visualization and reporting sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop
  Any SAS BI product: SAS Visual Analytics, SAS Office Analytics, SAS Enterprise Guide, SAS BI/EBI Server, SAS Stored Processes and batch programs for reporting
B. Hadoop cluster processors used for data visualization, exploration and reporting
  In-Memory Exploration, Visualization & Reporting: SAS Visual Analytics
WHEN IT COMES TO ANALYTICS: THREE SIMPLE THINGS TO REMEMBER
C. Data for Analytics sourced from Hadoop, but no Analytics running on Hadoop
  Think SAS/Access, just like we do with an RDBMS
  More or less business as usual
D. Hadoop cluster processors used for Analytical Computation
  Think In-Memory Analytics
  Transformational
E. Analytics deployed for batch execution in Hadoop
  Think In-Database, just like with an RDBMS
  Operational
WHEN IT COMES TO ANALYTICS: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING HADOOP AS PART OF ANALYTICS
C. Data for Analytics sourced from Hadoop, but no Analytics running on Hadoop
  Any SAS Analytics product: SAS Enterprise Miner, SAS Forecast Server, SAS/STAT etc.
D. Hadoop cluster processors used for Analytical Computation
  In-Memory Interactive Analytics: SAS Visual Statistics, SAS In-Memory Statistics for Hadoop
E. Analytics deployed for batch execution on Hadoop
  Operational Analytics: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
THE ANALYTICS LIFECYCLE STRATEGY: ENABLE THE ENTIRE LIFECYCLE ON HADOOP
[Diagram: analytics lifecycle annotated with SAS products]
Lifecycle stages: Identify / Formulate Problem, Data Preparation, Data Exploration, Transform & Select, Build Model, Validate Model, Deploy Model, Evaluate / Monitor Results
Product labels on the diagram: SAS Visual Analytics (shown twice), SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
Transform & Select: done using either the Data Preparation, Data Exploration or Build Model tools
Validate Model: done using the Build Model tools and other checks
Build Model: SAS High Performance Analytics offerings supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc.
SAS ON HADOOP A TASTE OF TECHNOLOGY
SAS DI STUDIO: FLOW INCLUDING HADOOP DATA
Access Hadoop; combine with other data; transform & load
[Diagram labels: Hadoop, SAP, Teradata, Oracle, SAS Federation Server, SAS, DB2]
SAS DI STUDIO: MANAGE DATA IN HADOOP STANDALONE
Access data in Hadoop
Transform data inside Hadoop using HiveQL
Create new data in Hadoop
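Transforming data inside Hadoop with HiveQL, and creating new data there, typically comes down to a CREATE TABLE ... AS SELECT statement that Hive executes on the cluster. The sketch below only assembles such a statement as text. It is an assumption-laden illustration, not the code SAS DI Studio generates: the helper `build_ctas` and the table and column names are hypothetical, and no escaping or validation of identifiers is attempted.

```python
def build_ctas(target, source, columns, where=None):
    """Assemble a HiveQL CREATE TABLE AS SELECT statement as a string.

    All identifiers are inserted verbatim, so this is only suitable for
    trusted, hand-written names (hypothetical helper for illustration).
    """
    select_list = ", ".join(columns)
    statement = f"CREATE TABLE {target} AS SELECT {select_list} FROM {source}"
    if where:
        statement += f" WHERE {where}"
    return statement


if __name__ == "__main__":
    # Hypothetical tables: keep only server errors from a web log table.
    print(build_ctas("web_logs_errors", "web_logs",
                     ["ts", "url", "status"], where="status >= 500"))
```

Because the whole statement runs in Hive, both the source rows and the new table stay in Hadoop; only the statement text travels from the client.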
DATA MANAGEMENT FOR HADOOP: HIGH PERFORMANCE IN-HADOOP DATA PROCESSING
Harness the power of the Hadoop distributed platform, big data, and SAS data management capabilities
High performance in-database processing
Native capabilities (HiveQL, Pig, MapReduce) + value-added capabilities:
  SAS Code Accelerator
  SAS Data Quality Accelerator
Embedded into Hadoop
[Diagram labels: SAS Servers; Hadoop Cluster with MapReduce, Pig, HiveQL, SAS Code Accelerator, SAS Data Quality Accelerator; HDFS / Raw Files]
SAS LEVERAGES HADOOP FOR MAXIMUM PERFORMANCE
DATA MANAGEMENT FOR HADOOP: SELF-SERVICE DATA QUERY AND TRANSFORMATION
New SAS web-based business user interface
Users are able to manage big data:
  Query, select, filter, summarize & transform data
  Use data quality
  Load data into SAS LASR
[Diagram labels: Hadoop Cluster with HiveQL, Pig, MapReduce, SAS Code Accelerator, SAS Data Quality Accelerator]
Feature preview: https://www.youtube.com/watch?v=6-9zckqjcus
SAS DATA DIRECTOR
SAS VISUAL STATISTICS 6.4 EXTENDING SAS VISUAL ANALYTICS FOR MORE ANALYTIC CONTROL AND TARGETED ACTIONS
Advanced Modeling Techniques
IN SUMMARY: SAS BUSINESS ANALYTICS FRAMEWORK... GIVES OUR CUSTOMERS THE POWER TO KNOW!
SAS does this with support for Hadoop in all core areas; this is unique to SAS!
Any business use case you can think of will need all of these!
THANK YOU!