Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe
Agenda: What's Hadoop? SAS Data Management: Traditional, In-Database, In-Memory. The Hadoop analytics lifecycle. SAS Data Loader for Hadoop. Demo.
Market trends: How much does this drive (3 TB) cost? Today: $115; 2010: $270; 2005: $3,720; 2000: $33,000; 1995: $3,360,000; 1990: $33,600,000; 1985: $315,000,000; 1980: $1,312,500,000.
Tech trends: How long does it take to read 3 TB? 1 disk: 4.17 hr; 100 disks: 2.5 min; 1000 disks: 15 sec.
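The read times on this slide follow from simple arithmetic. Below is a minimal sketch, assuming a sustained throughput of about 200 MB/s per disk (a value inferred from the 4.17 hr figure, not stated in the slides) and perfect parallel scaling across disks:

```python
# Assumption: each disk sustains about 200 MB/s; the slide does not state this.
DISK_THROUGHPUT_MB_PER_S = 200
DATA_SIZE_MB = 3_000_000  # 3 TB expressed in decimal megabytes


def read_time_seconds(num_disks: int) -> float:
    """Time to read the full data set when it is split evenly across num_disks."""
    return DATA_SIZE_MB / (DISK_THROUGHPUT_MB_PER_S * num_disks)


for disks in (1, 100, 1000):
    seconds = read_time_seconds(disks)
    print(f"{disks:>4} disks: {seconds:>6.0f} s ({seconds / 60:.1f} min)")
```

Under these assumptions one disk needs 15,000 seconds (about 4.17 hours), 100 disks need 150 seconds, and 1000 disks need 15 seconds, which matches the slide.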
What is it? Distributed processing of large data sets across clusters of computers using simple programming models. Runs on a single machine or multiple machines. A data processing framework and a distributed file system for data storage (HDFS).
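The "simple programming model" most commonly associated with Hadoop is MapReduce. The sketch below imitates the map, shuffle and reduce phases in plain Python on a single machine. It only illustrates the idea and is not Hadoop's API; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big value"]))
```

In a real cluster the map and reduce steps run in parallel on the machines that hold the data; here they run sequentially to keep the example self-contained.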
Traditional vs. In-Database vs. In-Memory (diagram comparing Traditional SAS, In-Database and In-Memory; labels: Data Store, Data, Memory, SAS). Even with In-Database processing there will still be some work performed on the SAS server. These approaches are complementary and can be combined for maximum effect.
SAS and Hadoop: (1) SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back. (2) SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel. (3) SAS processes data directly in the Hadoop cluster.
The Hadoop analytics lifecycle (diagram). Phases: Identify / Formulate Problem; Data Preparation; Data Exploration; Transform & Select; Build Model; Validate Model; Deploy Model; Evaluate / Monitor Results. Products shown around the cycle: SAS/Access to Hadoop, SAS DI & Federation Server, SAS ESP, SAS Data Loader, SAS DQ Accelerator for Hadoop, SAS Code Accelerator for Hadoop, SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Scoring Accelerator for Hadoop. Transform & Select: done using either the Data Preparation, Data Exploration or Build Model tools. Build Model: SAS High Performance Analytics offerings, supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc. Validate Model: done using the Build Model tools and other checks.
The Hadoop analytics lifecycle. Manage Data: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Event Stream Processing, SAS Data Loader for Hadoop, SAS Data Quality Accelerator for Hadoop, SAS Code Accelerator for Hadoop. Explore Data: SAS Data Loader for Hadoop, SAS Visual Analytics, SAS In-Memory Statistics for Hadoop. Develop Models: SAS High Performance Analytics products, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop. Deploy & Monitor: SAS Scoring Accelerator for Hadoop, SAS Decision Manager, SAS Visual Analytics.
SAS Data Management Platform works seamlessly across Hadoop (diagram). Labels: SAS Event Stream Processing Engine; Hadoop Accelerated Clients; BAU SAS DM clients (SAS DI Studio, all other DM clients); SAS/Access to Hadoop, SAS/Access to Impala; other clients (third-party clients + SAS BI + SAS Analytics + SAS Solutions); Base SAS, SAS Federation Server; Web-based DM interface for Hadoop; RDBMS. Capabilities: access to HDFS, Hadoop scripting (Pig, MapReduce) and Hive/Cloudera Impala through SAS coding and GUI, plus reuse of DQ and ETL/ELT processing; data virtualization and masking across Hadoop and other data stores; self-service data manipulation in Hadoop plus loading into LASR; bringing streaming data from various sources into Hadoop and/or the RDBMS, or generating events before data hits the downstream store; on-Hadoop data processing.
SAS In-Database for Hadoop. SAS Data Loader for Hadoop: Code Accelerator for Hadoop; Data Quality Accelerator for Hadoop; Data Loader, the UI. Scoring Accelerator for Hadoop: separately licensed product.
SAS Data Loader for Hadoop: A new SAS web-based business user interface. Point-and-click user menus; little or no Hadoop experience needed. Self-service UI with an HTML5 interface; enables a self-service approach to managing data in a Hadoop environment.
Web-Based Data Management interface for Hadoop. Capabilities: browser-based, point-and-click self-service approach (no knowledge of Hadoop or SAS is required); access and view data in Hadoop; query, filter, transform and summarize the data; load data into tables as well as SAS LASR; SAS Data Quality Accelerator for Hadoop. Benefits: enables the casual user; improves data quality; minimizes movement of data, because SAS Data Quality Accelerator for Hadoop and SAS Code Accelerator for Hadoop run in the Hadoop cluster.
SAS Data Loader for Hadoop: What is it? Web-based interface: easy-to-use HTML5. Executes code on the Hadoop cluster: DS2, Hive and Data Quality. Loads data into SAS LASR server. Delivered as a vApp.
SAS Data Loader for Hadoop: What is it? Aimed at the non-IT or business person. Easy to configure (small configuration list).
vApp: What is a vApp? vApp stands for virtual application: a fully functional appliance containing a specific set of SAS software. Plug-and-play environment. Some vApp examples: SAS University Edition, SAS Data Loader and Visual Analytics 6.2 (cloud only).
vApp (diagram; labels: Operating System, Applications, SAS Solution, CPU, RAM, Storage, Network).
vApp: How does it integrate with the rest of the environment? (Diagram; labels: Desktop, SAS Data Loader for Hadoop, Instructions, Instructions/queries, Data, Metadata. Registers loaded LASR tables only.)
SAS Data Loader for Hadoop: Client-side requirements. Laptop or desktop running Windows 7 (64-bit); 8 GB RAM minimum (16 GB preferred); hardware virtualization enabled in the BIOS (Intel VT-x or AMD-V); 20 GB of free disk space; capable of installing and running VMware Player 6 or later; Internet Explorer 9+, Firefox 14+, or Chrome 21+.
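The checklist on this slide can be expressed as a small validation routine. This is a sketch only: the dictionary keys and the function name are hypothetical, and the thresholds are simply the ones listed on the slide.

```python
# Thresholds taken from the slide; the key names below are hypothetical.
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 20
MIN_BROWSER_VERSION = {"Internet Explorer": 9, "Firefox": 14, "Chrome": 21}


def check_client(spec):
    """Return a list of problems; an empty list means the client qualifies."""
    problems = []
    if not spec.get("is_64_bit", False):
        problems.append("64-bit Windows is required")
    ram = spec.get("ram_gb", 0)
    if ram < MIN_RAM_GB:
        problems.append(f"{ram} GB RAM is below the {MIN_RAM_GB} GB minimum")
    if spec.get("free_disk_gb", 0) < MIN_FREE_DISK_GB:
        problems.append(f"less than {MIN_FREE_DISK_GB} GB of free disk space")
    if not spec.get("vtx_or_amdv_enabled", False):
        problems.append("VT-x or AMD-V must be enabled in the BIOS")
    name, version = spec.get("browser", ("unknown", 0))
    # Browsers not on the slide's list are treated as unsupported.
    if version < MIN_BROWSER_VERSION.get(name, float("inf")):
        problems.append(f"{name} {version} is not a supported browser version")
    return problems


if __name__ == "__main__":
    laptop = {
        "is_64_bit": True,
        "ram_gb": 8,
        "free_disk_gb": 50,
        "vtx_or_amdv_enabled": True,
        "browser": ("Chrome", 30),
    }
    print(check_client(laptop) or "client meets the listed requirements")
```

The VMware requirement is left out because it cannot be checked from a plain specification dictionary.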
SAS Data Loader for Hadoop: Installation process. Pre-requisites: VMware Player, SAS Software Depot, Hadoop cluster, SAS Embedded Process, firewall. Deploy: shared folder, VM configuration and deploy, startup, apply SAS license. Integrate: application page, Hadoop configuration inside the Data Loader, optional LASR configuration. Test: navigate in Hadoop, do a transformation, filter and query, run SAS code, load to LASR.
Key take-aways: (1) Existing SAS customers can leverage their SAS skills and existing data management assets developed with SAS when using Hadoop. (2) SAS Data Management provides the flexibility to work with Hadoop as a new data store alongside traditional data stores using a single platform. (3) SAS Data Management graphical user interfaces accelerate the adoption of Hadoop.
SAS & Hadoop, getting the value out of Big Data. Big Data + Hadoop = Big Data collection for the technical user. Big Data + Hadoop + SAS = accessibility for everybody in the organization: business users consume the big Hadoop data; business analysts explore and visualize; data scientists develop and deploy analytical models; decisions are built on fact-based analytical insights into all of the data. New workshop: SAS & Hadoop, getting the value out of Big Data, 18 Nov. 2014. All details on www.sas.com/belux/training
SAS Forum Twitter Contest: Tweet to win prizes! Question 5: Which are the 2 core components of every Hadoop installation? A. HDFS & Hive; B. Cloudera & HDFS; C. HDFS & MapReduce. Tweet your answer in the form: start of your tweet, question number, your answer. Example: @spicyanalytics 5X. Prizes to win: 1st prize: a ticket for Analytics 2015; 2nd prize: a book by Prof. Bart Baesens, Analytics in a Big Data World; 3rd to 30th prize: chocolates with pepper. Winners will be contacted post-forum!
SAS Data Loader for Hadoop: Transform data in Hadoop. Filtering rules; column selections; aggregation. No coding, scripting or specialized skills required.
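To make the three transformation types concrete, here is a plain-Python sketch of filtering, column selection and aggregation on a tiny in-memory table. It only illustrates what such transformations do conceptually; the product itself executes code on the Hadoop cluster, and the sample rows and column names are invented for this example.

```python
from collections import defaultdict

# Invented sample data; not taken from the presentation.
rows = [
    {"region": "North", "product": "A", "sales": 120},
    {"region": "North", "product": "B", "sales": 80},
    {"region": "South", "product": "A", "sales": 200},
]

# Filtering rule: keep rows with sales of at least 100.
filtered = [row for row in rows if row["sales"] >= 100]

# Column selection: keep only the region and sales columns.
selected = [{"region": row["region"], "sales": row["sales"]} for row in filtered]

# Aggregation: total sales per region.
totals_by_region = defaultdict(int)
for row in selected:
    totals_by_region[row["region"]] += row["sales"]
totals = dict(totals_by_region)

print(totals)
```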
SAS Data Loader for Hadoop: Query Hadoop data. Select source tables; apply query criteria; see a subset of the data in the Table Viewer. Simple drag-and-drop approach to query data inside Hadoop.
SAS Data Loader for Hadoop: Profile Hadoop data. Select source table; view reports in column display; view reports in table display. Run standard metrics on data inside Hadoop and generate reports.
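The slides do not list which metrics the profile reports contain. The sketch below assumes a few common ones (row count, missing values, distinct values, minimum and maximum) and computes them for a single column in plain Python; the function name and sample values are hypothetical.

```python
def profile_column(values):
    """Compute a few common profiling metrics for one column of data."""
    present = [value for value in values if value is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


if __name__ == "__main__":
    ages = [34, None, 27, 34, 51]  # invented sample column
    for metric, value in profile_column(ages).items():
        print(f"{metric}: {value}")
```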
View Data
SAS Data Loader for Hadoop: Copy data to a distributed SAS LASR server. Select source table; copy data to distributed SAS LASR servers; visualize data in SAS Visual Analytics (optional). Explore Hadoop data quickly and easily for faster insights.