Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013




Agenda
  Abstract
  Opportunity:
    HPC Adoption of Big Data Analytics on Apache Hadoop*
    Enterprise Adoption of Technical Computing on HPC Systems
  Challenge: Need for an Efficient Infrastructure for Hadoop and HPC Workloads
  Solution: Intel HPC Distribution for Apache Hadoop* Software
  Architecture: Key Differentiators
  Value Prop: Features, Functions, and Benefits
  Proof Points
  Intel HPC Distribution BETA PROGRAM

Abstract
Intel is addressing the need for a scalable, efficient infrastructure that can support Big Data analytics applications on HPC systems in the enterprise by introducing the Intel HPC Distribution product bundle and inviting early adopters to a BETA program.

Opportunity: HPC Adoption of Big Data Analytics on Apache Hadoop*
Discoveries and Decisions Driven by Big Data Analytics
  Operational Efficiency
  Consumer Behavior
  Security and Risk Management

  Traffic Optimization
  Location Aware Ad Placement
  Personalized Preventive Care
  Smart Energy Grid
  Buyer Protection Program
  Claim Fraud Reduction

Opportunity: Enterprise Adoption of Technical Computing on HPC Systems
Discoveries and Decisions Driven by Big, Fast Supercomputers
  Geosciences
    Oil and gas exploration
    Seismic modeling
    Modeling wind turbine placement
  Life Sciences
    Genomics
    Drug discovery
  Large scale manufacturing
    Crash safety for auto and aerospace
    Virtual prototype

Challenge: Need for an Efficient Infrastructure for Hadoop* and HPC Workloads
"Tackling a wide range of previously intractable problems that are important for economic competitiveness, scientific advancement, national security, and the quality of human life. These include fraud detection, antiterrorist analysis, social and biological network analysis, semantic analysis, financial and economic modeling, drug discovery and epidemiology, weather and climate modeling, oil exploration, and power grid management. The common denominator is that the problems are large and complex enough to require modeling and simulation on HPC resources."
Source: Excerpt from IDC report #231572, Exploring the Big Data Market for High-Performance Computing, 2013.

Addressing the HPC Big Data Challenge
Intel HPC Distribution for Apache Hadoop* Software
  Intel Manager for Hadoop* Software: Deployment, Configuration, Monitoring, Alerting and Security
  Intel Manager for Lustre* Software: Configure, Monitor, Troubleshoot, Manage
  Sqoop: Data Exchange
  Flume: Log Collector
  ZooKeeper: Coordination
  Oozie: Workflow
  Pig: Scripting
  Mahout: Machine Learning
  YARN (MRv2): Distributed Processing Framework
  HDFS: Hadoop Distributed File System
  R Connectors: Statistics
  Hive: SQL Query
  HBase: Columnar Storage
  Moab, SLURM, Lustre, MPI

Solution: Intel HPC Distribution for Apache Hadoop* Software
Intel is the first to offer Lustre's parallel file system integrated with Hadoop workloads.

Intel Distribution for Apache Hadoop* Software
  Authentication, authorization, auditing built into Apache Hadoop
  Transparent encryption in Hive, Pig, MapReduce, HBase, HDFS
  Up to 20x faster en/decryption with Intel AES-NI (1)
  Up to 30x faster on Intel architecture than other hardware
  Up to 2.6x faster than other open source distributions
  Enterprise-grade Hadoop cluster management console and APIs
  Automated configuration with Intel Active Tuner

Direct integration with Intel EE for Lustre allows users to use that file system in place of the Hadoop Distributed File System (HDFS).

Architecture diagram (component labels):
  Connectors: Netezza, Oracle, R, SAP, SQLServer, Teradata, DB2
  Sqoop Data Transfer; Flume Log Collector; Oozie Workflow; ZooKeeper Coordination
  Recommendation Engine; Behavior Model; Vertical Accelerators; Analytics Workbench
  Storm/Kafka Stream; Pig Scripting; Solr Search; Lucene Index; TBN Graph; Mahout Machine Learning
  MR1; MR2/YARN Distributed Processing Framework
  HDFS; Lustre; Hadoop Compatible File Systems
  HBase Explorer; Gryphon SQL; Hive Query; HBase
  Security Controls; Heat Map; Job Profiler; Resource Monitor; Upgrade; Alerts; Unified Logging; Tuning; Ladon (Disaster Recovery); Configuration; Rhino (Security); Deployment
  Legend: Open Source, Proprietary

(1) Based on internal testing

Solution: Intel Enterprise Edition for Lustre* Software
Intel HPC Distribution for Apache Hadoop software is the only distribution of Apache Hadoop* to integrate and support Lustre* out of the box.

Intel Enterprise Edition for Lustre* Software
  Full open source core
  Simple GUI for install and management with central data collection
  Direct integration with storage HW and applications
  Global tier-1 support
  Storage plug-in; deep vendor integration
  REST API extensibility
  Hadoop* Adapter for shared simplified storage for Hadoop

Architecture diagram (component labels):
  Hadoop Adapter: Lustre storage for MapReduce applications
  Intel Manager for Lustre* Software: Configure, Monitor, Troubleshoot, Manage
  REST API Extensibility; CLI; Management and Monitoring Service
  Lustre File System: Full distribution of open source Lustre software
  Storage Plug-in Integration
  Legend: Intel value-added Software, Open Source Software

Intel HPC Distribution: Open Platform for High Performance Data Analytics
Value Prop: Features, Functions, and Benefits
  Performance
    Bring compute to the data: Run MapReduce* on Lustre* without code changes
    Run MapReduce* faster: Avoid the intermediate file shuffle with shared storage
  Efficiency
    Avoid Hadoop* islands in the sea of HPC systems
    Run MapReduce jobs alongside HPC workloads with full access to the cluster resources
  Manageability
    Use the seamless integration to manage one common platform for Hadoop and HPC
    Develop with multiple programming models and deploy on shared storage
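
The performance claim on this slide, that shared storage lets MapReduce avoid the intermediate file shuffle, can be illustrated with a toy model. The Python sketch below is not Intel's Hadoop adapter and uses no Hadoop or Lustre API; a temporary directory stands in for a shared parallel file system, and the function names (`map_phase`, `reduce_phase`, `word_count`) are hypothetical. It shows the idea only: mappers write partitioned output into storage that every node can see, so reducers read their partitions directly instead of fetching them from each mapper's local disk.

```python
import os
import tempfile
from collections import Counter


def map_phase(splits, shared_dir, num_reducers):
    """Each mapper writes one partition file per reducer into shared storage."""
    for m, text in enumerate(splits):
        buckets = [[] for _ in range(num_reducers)]
        for word in text.split():
            # Deterministic partitioner: the same word always goes to the same reducer.
            buckets[sum(word.encode()) % num_reducers].append(word)
        for r, words in enumerate(buckets):
            path = os.path.join(shared_dir, f"map{m}_part{r}.txt")
            with open(path, "w") as f:
                f.write("\n".join(words))


def reduce_phase(shared_dir, num_mappers, num_reducers):
    """Each reducer reads its partitions straight from shared storage (no copy step)."""
    totals = Counter()
    for r in range(num_reducers):
        for m in range(num_mappers):
            path = os.path.join(shared_dir, f"map{m}_part{r}.txt")
            with open(path) as f:
                totals.update(f.read().split())
    return dict(totals)


def word_count(splits, num_reducers=2):
    # Assumption: a temporary directory plays the role of the shared file system.
    with tempfile.TemporaryDirectory() as shared_dir:
        map_phase(splits, shared_dir, num_reducers)
        return reduce_phase(shared_dir, len(splits), num_reducers)
```

Calling `word_count(["a b a", "b c"])` counts the words across both input splits. In a stock Hadoop deployment the step between the two phases would involve reducers pulling map output over the network from local disks; in this sketch that transfer disappears because both phases address the same directory.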

Proof Points
"In IDC's 2013 worldwide study of HPC end users, 67% of the sites said they perform HPDA on their HPC systems, often using Hadoop*, with an average of 30% of the available computing cycles devoted to this work. This formative market for Big Data problems needing HPC includes data-intensive modeling and simulation, along with newer analytics methods employed by established HPC users and first-time users from the commercial world."
Source: IDC Whitepaper, 2013.

Join the BETA program
Early adopters of the combined Intel Distribution for Apache Hadoop Software and Intel EE for Lustre Software solution will receive a free, exclusive limited-use version of the software and exchange insights with Intel experts.
To be considered for the BETA, please contact: hpdd-info@intel.com

© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.