Hadoop & SAS Data Loader for Hadoop




Turning Data into Value
Hadoop & SAS Data Loader for Hadoop
Sebastiaan Schaap, Frederik Vandenberghe

Agenda
- What's Hadoop?
- SAS data management: traditional, in-database, in-memory
- The Hadoop analytics lifecycle
- SAS Data Loader for Hadoop
- Demo

Market trends: how much does a 3 TB drive cost?

Year    Cost (USD)
Today   $115
2010    $270
2005    $3,720
2000    $33,000
1995    $3,360,000
1990    $33,600,000
1985    $315,000,000
1980    $1,312,500,000

Tech trends: how long does it take to read 3 TB?
- 1 disk: 4.17 hr
- 100 disks: 2.5 min
- 1,000 disks: 15 sec
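The figures on this slide follow from dividing the single-disk read time across many disks. A minimal sketch of that arithmetic, assuming the data is split evenly and all disks read at full speed in parallel (the 4.17-hour single-disk time is from the slide; perfectly linear scaling is our assumption and ignores coordination and network overhead):

```python
# Sketch: ideal parallel read time for 3 TB.
# Assumption: data is spread evenly over the disks and every disk reads
# at the same rate at the same time (perfect linear scaling).

SINGLE_DISK_HOURS = 4.17  # single-disk read time for 3 TB, taken from the slide


def parallel_read_seconds(disks: int, single_disk_hours: float = SINGLE_DISK_HOURS) -> float:
    """Return the ideal read time in seconds when the data is spread over `disks` disks."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return single_disk_hours * 3600 / disks


for n in (1, 100, 1000):
    print(f"{n:>5} disks: {parallel_read_seconds(n):,.1f} s")
```

Under these assumptions, 100 disks need about 150 s (2.5 min) and 1,000 disks about 15 s, which matches the slide. Spreading data and work over many disks in this way is the idea behind HDFS.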

What is Hadoop? A framework for the distributed processing of large data sets across clusters of computers using simple programming models. It runs on a single machine or on many machines, and it combines a data processing framework (MapReduce) with a distributed file system for data storage (HDFS).
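To illustrate what "simple programming models" means, the sketch below mimics the MapReduce model in plain Python on one machine: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function combines each group. It does not use Hadoop or any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a real cluster, map and reduce tasks run in parallel on the nodes
    # that hold the HDFS blocks; here everything runs sequentially.
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


print(word_count(["SAS and Hadoop", "Hadoop stores data in HDFS"]))
```

The programmer only writes the map and reduce functions; distributing them, moving intermediate data, and recovering from failures is the framework's job.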

Traditional vs. in-database vs. in-memory processing
[Figure: three panels (Traditional SAS, In-Database, In-Memory), each showing the data store, the data, and where SAS does the processing; the In-Memory panel adds a memory layer]
Even with in-database processing, some work is still performed on the SAS server. These approaches are complementary and can be combined for maximum effect.

SAS and Hadoop
- SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes the results back.
- SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel.
- SAS processes data directly in the Hadoop cluster.

The Hadoop analytics lifecycle
[Figure: lifecycle cycle with the stages Identify/Formulate Problem, Data Preparation, Data Exploration, Transform & Select, Build Model, Validate Model, Deploy Model, and Evaluate/Monitor Results]
Products shown on the diagram: SAS/ACCESS to Hadoop, SAS DI & Federation Server, SAS ESP, SAS Data Loader, SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop, SAS DQ Accelerator for Hadoop, and the SAS High-Performance Analytics offerings, supported by clients such as SAS Enterprise Miner and SAS/STAT.
- Transform & Select is done using the data preparation, data exploration (SAS Data Loader), or model-building tools.
- Validate Model is done using the model-building tools and other checks.

The Hadoop analytics lifecycle: products by stage
- Manage data: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Event Stream Processing, SAS Data Loader for Hadoop, SAS Data Quality Accelerator for Hadoop, SAS Code Accelerator for Hadoop
- Explore data: SAS Data Loader for Hadoop, SAS Visual Analytics, SAS In-Memory Statistics for Hadoop
- Develop models: SAS High-Performance Analytics products, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop
- Deploy & monitor: SAS Scoring Accelerator for Hadoop, SAS Decision Manager, SAS Visual Analytics

SAS Data Management Platform works seamlessly across Hadoop
- Access to HDFS, Hadoop scripting (Pig, MapReduce), and Hive/Cloudera Impala through SAS coding and GUIs, plus reuse of DQ and ETL/ELT processing. Clients: SAS DI Studio and all other business-as-usual SAS DM clients; SAS/ACCESS to Hadoop and SAS/ACCESS to Impala; third-party clients, SAS BI, SAS Analytics, and SAS Solutions.
- Data virtualization and masking across Hadoop and other data stores (Base SAS, SAS Federation Server).
- Self-service data manipulation in Hadoop, plus loading into LASR (web-based DM interface for Hadoop).
- Streaming data from various sources into Hadoop and/or the RDBMS, or generating events before the data hits the downstream store (SAS Event Stream Processing Engine).
- On-Hadoop data processing (Hadoop-accelerated clients).

SAS In-Database for Hadoop
- SAS Data Loader for Hadoop (the UI), with SAS Code Accelerator for Hadoop and SAS Data Quality Accelerator for Hadoop
- SAS Scoring Accelerator for Hadoop (separately licensed product)

SAS Data Loader for Hadoop
A new SAS web-based user interface for business users:
- Point-and-click user menus
- Little or no Hadoop experience needed
- Self-service HTML5 interface that enables a self-service approach to managing data in a Hadoop environment

Web-based data management interface for Hadoop
Capabilities:
- Browser-based, point-and-click self-service approach (no knowledge of Hadoop or SAS required)
- Access and view data in Hadoop
- Query, filter, transform, and summarize the data
- Load data into tables as well as into SAS LASR
- SAS Data Quality Accelerator for Hadoop
Benefits:
- Enable the casual user
- Improve data quality
- Minimize data movement: SAS Data Quality Accelerator for Hadoop and SAS Code Accelerator for Hadoop run in the Hadoop cluster

SAS Data Loader for Hadoop: what is it?
- Web-based, easy-to-use HTML5 interface
- Executes code on the Hadoop cluster: DS2, Hive, and Data Quality
- Loads data into the SAS LASR Analytic Server
- Delivered as a vApp
- Designed for non-IT and business users
- Easy to configure (short configuration list)

vApp: what is a vApp?
vApp stands for virtual application: a fully functional appliance containing a specific set of SAS software in a plug-and-play environment. Examples: SAS University Edition, SAS Data Loader, and SAS Visual Analytics 6.2 (cloud only).

[Figure: anatomy of a vApp, showing the operating system, applications, SAS solution, and vApp ledger, on top of CPU, RAM, storage, and network]

vApp: how does it integrate with the rest of the environment?
[Figure: the desktop sends instructions to SAS Data Loader for Hadoop, which sends instructions/queries to the Hadoop data and registers only the loaded LASR tables in metadata]

SAS Data Loader for Hadoop: client-side requirements
- Laptop or desktop running Windows 7 (64-bit)
- 8 GB RAM minimum (16 GB preferred)
- Hardware virtualization (Intel VT-x or AMD-V) enabled in the BIOS
- 20 GB of free disk space
- Able to install and run VMware Player 6 or later
- Internet Explorer 9+, Firefox 14+, or Chrome 21+
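A quick way to sanity-check a workstation against the client-side minimums listed above is a small script. This is a hypothetical helper, not part of SAS Data Loader: it only compares values you supply (the `Workstation` record is an assumption) against the thresholds on the slide, and it does not probe the machine itself.

```python
from dataclasses import dataclass


@dataclass
class Workstation:
    """Hypothetical description of a client machine, filled in by hand."""
    is_64bit: bool
    ram_gb: float
    free_disk_gb: float
    virtualization_enabled: bool  # Intel VT-x or AMD-V turned on in the BIOS


# Thresholds taken from the slide; the checking logic itself is illustrative.
MIN_RAM_GB = 8
PREFERRED_RAM_GB = 16
MIN_FREE_DISK_GB = 20


def check_client(ws: Workstation) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if not ws.is_64bit:
        problems.append("a 64-bit operating system is required")
    if ws.ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM required ({PREFERRED_RAM_GB} GB preferred)")
    if ws.free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"at least {MIN_FREE_DISK_GB} GB of free disk space required")
    if not ws.virtualization_enabled:
        problems.append("enable Intel VT-x or AMD-V in the BIOS")
    return problems


print(check_client(Workstation(is_64bit=True, ram_gb=16, free_disk_gb=50, virtualization_enabled=True)))
```

Browser and VMware Player versions are left out of the sketch because they are easier to check by hand.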

SAS Data Loader for Hadoop: installation process
- Prerequisites: VMware Player, SAS Software Depot, Hadoop cluster, SAS Embedded Process, firewall
- Deploy: shared folder, VM configuration and deployment, startup, apply the SAS license
- Integrate: Hadoop configuration inside the Data Loader; optional LASR configuration
- Test: open the application page, navigate in Hadoop, do a transformation, filter and query, run SAS code, load to LASR

Key take-aways
- Existing SAS customers can leverage their SAS skills and the data management assets they have built with SAS when using Hadoop.
- SAS Data Management provides the flexibility to work with Hadoop as a new data store alongside traditional data stores, using a single platform.
- SAS Data Management graphical user interfaces accelerate the adoption of Hadoop.


SAS & Hadoop: getting the value out of Big Data
- Big Data + Hadoop = big data collection for the technical user
- Big Data + Hadoop + SAS = accessibility for everybody in the organization:
  - Business users consume the big Hadoop data
  - Business analysts explore and visualize
  - Data scientists develop and deploy analytical models
  - Decisions are built on fact-based analytical insights into all of the data
NEW workshop: SAS & Hadoop, getting the value out of Big Data, 18 Nov. 2014. All details on www.sas.com/belux/training


SAS Forum Twitter Contest: tweet to win prizes!
Question 5: Which are the 2 core components of every Hadoop installation?
A. HDFS & Hive
B. Cloudera & HDFS
C. HDFS & MapReduce
Tweet your answer: start your tweet with @spicyanalytics, then the question number, then your answer (example: @spicyanalytics 5X).
Prizes to win:
- 1st prize: a ticket for Analytics 2015
- 2nd prize: a book by Prof. Bart Baesens, Analytics in a Big Data World
- 3rd to 30th prize: chocolates with pepper
Winners will be contacted after the forum!



SAS Data Loader for Hadoop: transform data in Hadoop
- Filtering rules
- Column selections
- Aggregation
No coding, scripting, or specialized skills required.
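Behind the point-and-click surface, a transformation built from these three rule types amounts to a filter, a column projection, and a group-by aggregation. The sketch below shows those semantics on in-memory rows in plain Python. It is an illustration only, not the code SAS Data Loader generates (the slides say the Loader executes DS2, Hive, and Data Quality code on the cluster), and the table and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a Hadoop table.
rows = [
    {"region": "BE", "product": "A", "amount": 120.0, "status": "paid"},
    {"region": "BE", "product": "B", "amount": 80.0, "status": "open"},
    {"region": "LU", "product": "A", "amount": 50.0, "status": "paid"},
    {"region": "BE", "product": "A", "amount": 30.0, "status": "paid"},
]


def transform(rows, keep, columns, group_by, measure):
    """Apply a filtering rule, a column selection, and a SUM aggregation.

    `group_by` and `measure` must be among the selected `columns`.
    """
    filtered = (r for r in rows if keep(r))                    # filtering rule
    selected = ({c: r[c] for c in columns} for r in filtered)  # column selection
    totals = defaultdict(float)
    for r in selected:                                          # aggregation
        totals[tuple(r[c] for c in group_by)] += r[measure]
    return dict(totals)


result = transform(
    rows,
    keep=lambda r: r["status"] == "paid",
    columns=["region", "amount"],
    group_by=["region"],
    measure="amount",
)
print(result)
```

In the product, the equivalent work runs inside the cluster so the data does not have to leave Hadoop; the sketch only shows what each rule type does to the rows.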

SAS Data Loader for Hadoop: query Hadoop data
1. Select source tables
2. Apply query criteria
3. See a subset of the data in the Table Viewer
A simple drag-and-drop approach to querying data inside Hadoop.

SAS Data Loader for Hadoop: profile Hadoop data
1. Select the source table
2. View reports in column display
3. View reports in table display
Run standard metrics on data inside Hadoop and generate reports.
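Data profiling reports of this kind typically show per-column metrics such as row count, missing values, distinct values, and minimum/maximum. The plain-Python sketch below computes such metrics on in-memory rows to show what is being measured. The metric set is our assumption, not the Loader's documented report contents, and in the product the profiling runs inside the cluster rather than in a local script.

```python
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Compute simple per-column metrics.

    A missing key or a None value counts as missing. Each column is assumed
    to hold values of one comparable, hashable type (needed for min/max/distinct).
    """
    columns = {c for r in rows for c in r}
    report = {}
    for col in sorted(columns):
        values = [r.get(col) for r in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


# Hypothetical sample data.
sample = [{"id": 1, "city": "Gent"}, {"id": 2, "city": None}, {"id": 3}]
for column, metrics in profile(sample).items():
    print(column, metrics)
```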

View Data

SAS Data Loader for Hadoop: copy data to distributed SAS LASR servers
1. Select the source table
2. Copy the data to distributed SAS LASR servers
3. Optional: visualize the data in SAS Visual Analytics
Explore Hadoop data quickly and easily for faster insights.