QUEST meeting Big Data Analytics




Peter Hughes, Business Solutions Consultant, SAS Australia/New Zealand
Copyright 2015, SAS Institute Inc. All rights reserved.

Big Data Analytics: WHERE WE ARE NOW
[Timeline figure, 2005 to 2013: BIG DATA (lots of data), HADOOP (processing power), ANALYTICS (accurate decisions)]

The era of abundance
"Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away."
- George Dyson, science historian and TED speaker

Two Eras... Will you modernize your mindset?
- Technology empowered
- Discovery-centric
- Focus on value
- Everything is permitted unless it is forbidden

WHAT IS HADOOP?
- An Apache Software Foundation project
- Open source
- Origins in the early 2000s, with contributions from Google, Yahoo! and Facebook
- Framework of tools for processing Big Data
  1. Base: Common; Distributed File System (HDFS); MapReduce & YARN
  2. Additional projects including Pig, Hive, HBase, ZooKeeper et al.
- Designed for clusters using commodity server hardware, typically Intel/Linux
  - Distributed storage
  - Distributed processing
  - Fault-tolerant topology
- Commercial Hadoop distributions based on Apache code
  - Extensions; additional tooling; support
  - Vendors: Cloudera, Hortonworks, MapR, Pivotal, IBM, Intel & others

SAS and Hadoop: COMMERCIAL HADOOP VENDORS
- Cloudera: Intel recently invested $740 million to buy 18%. Puts their value at around the $4 billion mark!
- Pivotal (Pivotal HD): GE invested $105 million in Pivotal.
- MapR: Google Capital recently invested $80 million into MapR; they gathered $110 million of investment in their last round!
- Hortonworks: HP recently invested $50 million into Hortonworks to get a place on the board. Total investment now about $300 million. Big Teradata and SAP partners!
- IBM InfoSphere BigInsights

SAS and Hadoop: INTEGRATION WITH OPEN SOURCE HADOOP
Hive, HCatalog, YARN, Pig, MapReduce, HDFS, Impala, Sqoop, Parquet, ORC, Spark, Oozie

SAS WITHIN THE HADOOP ECOSYSTEM
[Architecture diagram, by layer]
- User Interface: SAS Data Loader for Hadoop; SAS Data Integration; SAS Enterprise Miner; SAS Visual Analytics; SAS In-Memory Statistics for Hadoop (SAS User; Next-Gen SAS User)
- Metadata: SAS Metadata
- Data Access: Base SAS & SAS/ACCESS to Hadoop; In-Memory Data Access (SAS LASR Analytic Server)
- Data Processing: Pig; Hive; MapReduce/YARN; SAS Embedded Process Accelerators; SAS High-Performance Analytic Procedures (MPI based)
- File System: HDFS

DATA TO DECISION LIFECYCLE on Hadoop
[Lifecycle diagram, by phase]
- MANAGE DATA: SAS/ACCESS (Hadoop/Impala); SAS Data Management; SAS Federation Server; SAS Data Quality Accelerator for Hadoop; SAS Code Accelerator for Hadoop; SAS Data Loader for Hadoop
- EXPLORE DATA: SAS Visual Analytics; SAS In-Memory Statistics for Hadoop
- DEVELOP MODELS: SAS HPA Products; SAS Visual Statistics; SAS In-Memory Statistics for Hadoop
- DEPLOY & MONITOR: Model Manager; SAS Scoring Accelerator for Hadoop

MANAGE DATA: READ/WRITE TO HDFS

The Hadoop configuration file (C:\Sample_Data\hadoop_config.xml) is used for all PROC HADOOP PIG, MAPREDUCE and HDFS calls.

    /* Create directory on HDFS */
    filename cfg "C:\Sample_Data\hadoop_config.xml";
    proc hadoop options=cfg username="hadoop" password="hadoop";
       hdfs mkdir="/user/hadoop/testfolder";
    run;

    /* Copy file from local SAS to HDFS */
    filename cfg "C:\Sample_Data\hadoop_config.xml";
    proc hadoop options=cfg username="hadoop" password="hadoop";
       hdfs copyfromlocal="c:\sample_data\dept.txt"
            out="/user/hadoop/testfolder/";
    run;

    /* Copy file from HDFS to local SAS */
    filename cfg "C:\Sample_Data\hadoop_config.xml";
    proc hadoop options=cfg username="hadoop" password="hadoop";
       hdfs copytolocal="/user/hadoop/testfolder"
            out="c:\sample_data\";
    run;

MANAGE DATA: SAS/ACCESS

Base SAS procedures executed in-database for Hadoop: FREQ, REPORT, SORT, SUMMARY/MEANS, TABULATE

Supported Hadoop distributions & combinations*
- Cloudera CDH 5.0 running Hive/Hive2
- Hortonworks HDP 2.0 running HiveServer2
- IBM InfoSphere BigInsights 2.1 running Hive
- MapR M5 2.0.1 running Hive
- Pivotal/Greenplum HD running Hive
- Pivotal/Greenplum MR 2.0.1 running Hive

* If a provider assures upward compatibility, SAS/ACCESS supports newer combinations. For example, Cloudera assures upward compatibility within major releases, so Cloudera CDH4.2 running Hive or HiveServer2 is supported.
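As a rough sketch of what in-database execution looks like from the SAS side, the step below points two of the listed procedures at a Hive table through a SAS/ACCESS libref. The server name, port, credentials and the table cars_prc are placeholders borrowed from the other slides in this deck and are assumed to exist; whether a given procedure call is actually pushed down depends on the installation and on the options used, so the trace options are included to let you check the SAS log.

```sas
/* Assumption: a Hive libref like the one used elsewhere in this deck;
   server name, port and credentials are placeholders */
libname cdh_hdp hadoop server="sascldserv02" port=10000
        user=hadoop password=hadoop;

/* Ask SAS to generate SQL for the DBMS where it can, and write the
   SQL that is sent to Hive into the SAS log */
options sqlgeneration=dbms
        sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Frequency count by make; the table cars_prc is assumed to exist */
proc freq data=cdh_hdp.cars_prc;
   tables make;
run;

/* Summary statistics for msrp within each make */
proc means data=cdh_hdp.cars_prc mean max;
   class make;
   var msrp;
run;
```

If the log shows a generated SELECT with GROUP BY being passed to Hive, the aggregation ran in Hadoop and only the summarized rows travelled back to SAS.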

MANAGE DATA: HIVE

    libname cdh_hdp hadoop port=10000 server=sascldserv02
            user=hadoop password=hadoop;

    /* Create new table */
    proc sql;
       connect to hadoop (port=10000 server=sascldserv02
                          user=hadoop password="hadoop");
       exec( create table cars_prc
                (make string, model string, msrp double) ) by hadoop;
    quit;

    /* Copy from another table */
    proc sql;
       insert into cdh_hdp.cars_prc
          select make, model, msrp from sashelp.cars;
    quit;

    /* List contents */
    proc sql;
       select * from cdh_hdp.cars_prc;
    quit;
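The slide uses EXEC ... BY HADOOP to send a statement to Hive. For queries that return rows, explicit pass-through can be sketched the same way with SELECT ... FROM CONNECTION TO. This is an illustrative example, not part of the original deck: the connection values are the slide's placeholders, cars_prc is assumed to exist in Hive, and the column aliases are invented for the example.

```sas
proc sql;
   connect to hadoop (server="sascldserv02" port=10000
                      user=hadoop password="hadoop");

   /* The inner query is HiveQL and runs in Hive; only the
      aggregated rows are returned to SAS */
   select *
      from connection to hadoop
         ( select make,
                  count(*)  as n_models,
                  avg(msrp) as avg_msrp
             from cars_prc
            group by make );

   disconnect from hadoop;
quit;
```

Writing the aggregation in HiveQL keeps the heavy work on the cluster, at the cost of SQL that is specific to Hive rather than portable SAS SQL.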

MANAGE DATA: MAPREDUCE

    /* Invoke MapReduce Word Count program */
    filename cfg "C:\Sample_Data\hadoop_config.xml";
    proc hadoop options=cfg username="hadoop" password="hadoop" verbose;
       hdfs delete="/user/hadoop/output_mr1";
       mapreduce input="/user/hadoop/gutenberg"
                 output="/user/hadoop/output_mr1"
                 jar="c:\sample_data\hadoop-examples-2.0.0-mr1-cdh4.1.2.jar"
                 outputkey="org.apache.hadoop.io.Text"
                 outputvalue="org.apache.hadoop.io.IntWritable"
                 reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
                 combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
                 map="org.apache.hadoop.examples.WordCount$TokenizerMapper"
                 reducetasks=0;
    run;
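To make the logic of the Word Count job concrete, here is a small local sketch in plain SAS that mimics its two phases on a few lines of text. It does not call Hadoop at all; the sample lines and the data set names (lines, words) are invented for illustration. The DATA step plays the role of the mapper by emitting one row per whitespace-separated word, and PROC FREQ plays the role of the reducer by counting the rows for each word.

```sas
/* Hypothetical sample text standing in for the HDFS input files */
data lines;
   infile datalines truncover;
   input text $char80.;
   datalines;
the quick brown fox
the lazy dog
;
run;

/* "Map" phase: emit one row per word on each line */
data words(keep=word);
   set lines;
   length word $40;
   do i = 1 to countw(text, ' ');
      word = scan(text, i, ' ');
      output;
   end;
run;

/* "Reduce" phase: count the occurrences of each distinct word */
proc freq data=words;
   tables word / nocum nopercent;
run;
```

On a cluster the same split applies, except that the map step runs in parallel across the blocks of the input and the counts are combined across nodes.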

MANAGE DATA: SAS DATA INTEGRATION STUDIO
- Seamless access to Hadoop data (HDFS/Hive/Impala) by analysts and traditional SAS users
- Reading & writing to/from HDFS
- Transfer to/from Hadoop operators
- Support for Pig, Hive & MapReduce transforms

SAS IN-MEMORY ANALYTICS: SAS LASR ANALYTIC SERVER AND HADOOP
In-memory processing; use Hadoop for storage persistence and commodity computing.
[Architecture diagram]
- Data sources: ERP; SCM; CRM; Images; Audio and Video; Machine Logs; Text; Web and Social
- Storage: Hadoop
- In-memory layer: SAS LASR Analytic Server
- Applications (web clients): SAS Visual Analytics; SAS Visual Statistics; SAS In-Memory Statistics for Hadoop
* Name not finalized.

DEPLOY & MONITOR: SAS SCORING ACCELERATOR FOR HADOOP
- Publish SAS Enterprise Miner models or SAS/STAT linear models inside Hadoop
- Fully integrated with SAS Model Manager to streamline registration, validation and performance monitoring
- Reduce data movement and improve data governance by streamlining model deployment processes within Hadoop

http://www.sas.com/au/sashadoop

QUESTIONS?

Thank You!
Peter Hughes
Peter.hughes@sas.com
http://www.sas.com/au/sashadoop