SGI Big Data Ecosystem - Overview. Dave Raddatz, Big Data Solutions & Performance, SGI



Similar documents
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

SQL Server 2012 Parallel Data Warehouse. Solution Brief

Introducing Oracle Exalytics In-Memory Machine

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Data management challenges in todays Healthcare and Life Sciences ecosystems

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

The Hardware Dilemma. Stephanie Best, SGI Director Big Data Marketing Ray Morcos, SGI Big Data Engineering

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

SGI High Performance Computing

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

Oracle Big Data SQL Technical Update

NextGen Infrastructure for Big DATA Analytics.

Data Centric Systems (DCS)

Microsoft Analytics Platform System. Solution Brief

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Processing: Past, Present and Future

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

Investor Presentation. Second Quarter 2015

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Integrated Grid Solutions. and Greenplum

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Dell* In-Memory Appliance for Cloudera* Enterprise

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Big Data on Microsoft Platform

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform

Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise

Big Data Are You Ready? Thomas Kyte

Oracle Big Data Building A Big Data Management System

Dell In-Memory Appliance for Cloudera Enterprise

Big Data Performance Growth on the Rise

Interactive data analytics drive insights

SGI HPC Systems Help Fuel Manufacturing Rebirth

Big Data Technologies Compared June 2014

How To Use Hp Vertica Ondemand

Advanced In-Database Analytics

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

The Future of Data Management

BIG DATA-AS-A-SERVICE

Information Architecture

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

Overview: X5 Generation Database Machines

Cray: Enabling Real-Time Discovery in Big Data

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

IBM Netezza High Capacity Appliance

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Modernizing Your Data Warehouse for Hadoop

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Safe Harbor Statement

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

IBM System x reference architecture solutions for big data

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Cisco Solutions for Big Data and Analytics

<Insert Picture Here> Big Data

A HIGH-PERFORMANCE, SCALABLE BIG DATA APPLIANCE LAURA CHU-VIAL, SENIOR PRODUCT MARKETING MANAGER JOACHIM RAHMFELD, VP FIELD ALLIANCES OF SAP

Where is... How do I get to...

Big Data and Natural Language: Extracting Insight From Text

Protecting Big Data Data Protection Solutions for the Business Data Lake

ORACLE BIG DATA APPLIANCE X3-2

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Big Data Explained. An introduction to Big Data Science.

HDP Hadoop From concept to deployment.

Datasheet FUJITSU Integrated System PRIMEFLEX for Hadoop

MarkLogic Enterprise Data Layer

Please give me your feedback

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Build Your Competitive Edge in Big Data with Cisco. Rick Speyer Senior Global Marketing Manager Big Data Cisco Systems 6/25/2015

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Cloud Computing. Big Data. High Performance Computing

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

2009 Oracle Corporation 1

locuz.com Big Data Services

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Cost-Effective Business Intelligence with Red Hat and Open Source

Data Integration Checklist

Performance and Scalability Overview

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Big Data for Investment Research Management

Luncheon Webinar Series May 13, 2013

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Exploiting Data at Rest and Data in Motion with a Big Data Platform

Transcription:

SGI Big Data Ecosystem - Overview Dave Raddatz, Big Data Solutions & Performance, SGI Big Data Benchmarking Community March 21, 2013

Speed and Scale Finding Answers 0 Subatomic Atomic to Cellular Human Scale Astronomical 10 16 10 18 10 15 10 6 10 6 Light year Exascale 10 24 Femtometer Yottascale

Our Strategy and Focus Business Applications Business Computing High Performance Computing HPC Redundancy Speed & Scale 3

Our Strategy and Focus Business Applications Business Computing Big Data High Performance Computing HPC Redundancy Speed & Scale 4

Big Data: A combination of Structured, Semi-Structured & Unstructured Data Semi structured Data: Logs, Social chat, twitter, email Volume Structured Data: Traditional Databases, Transactional systems Big Data Variety Unstructured Data: Documents, Sensors, Audio, Video, Images Velocity 5

An Intelligent and Dynamic Big Data Ecosystem Data ingest Processing Analytics and Visualization Archiving The data wave arrives Organize Analyze, Decide, Predict Archive, Retrieve, Comply Emerging technologies include Data Integration, MapReduce, Data Analytics, Data Mining, Predictive Analytics and Visualization to derive intelligence out of a dynamic data ecosystem.

SGI Servers for Speed and Scale HPC Commercial Scientific Modeling & Simulation Big Data Hadoop In-Memory Analytics Archive Cloud Public Private Government SGI ICE TM Rackable SGI UV TM Rackable CloudRack C2

SGI for an Intelligent and Dynamic Big Data Ecosystem Continuously Managing and Deriving Value from Your Data Data ingest Processing Analytics and Visualization Archiving The data wave arrives Organize Analyze, Decide, Predict Archive, Retrieve, Comply SGI Rackable SGI Hadoop Clusters (Scale-out) SGI UV SGI Rackable (Scale-up and out) SGI Modular InfiniteStorage ArcFiniti

Co-processing hybrid data is a challenge Today s Challenges: To analyze structured and unstructured data types simultaneously While one leverages Hadoop for unstructured datasets, using on-hand database management tools on them become impossible, as the data grows. Difficulties include capturing, storing, searching, sharing, analyzing and visualizing data. Response times are high (mins to hrs). While one maintains traditional data warehouses for analyzing structured data, dealing with the volume, velocity and variety of the influx of unstructured information becomes a challenge. SGI Big Data Ecosystem: Implements a single software framework to access, process and analyze all structured, semi- and unstructured information moving across an organization to deliver actionable insight in real-time. Promotes Rackable and UV product line integrated with a modular, feature-rich, best-in-class software framework for extreme analytics on fused data types. 9

SGI co-processing hybrid data in a single dynamic workflow Key Points: With a flexible scale out and scale up product portfolio, SGI fuses unstructured pre processed information from Hadoop with structured data framework for dynamic post processing and extreme analytics in real time. SGI Mineset technology can provide additional capabilities for Predictive Analytics in near real time combined with Visualization for Interactive BI. Archive SGI DMF to archive and perform on demand search of information for compliance In a single workflow, SGI performs ingestion, query execution, aggregation and analytics across all information types and able to deliver actionable insight to users. With DMF, SGI can provide additional capabilities to archive massive amounts of information from Hadoop to tape devices and perform on demand search for compliance.

SGI Big Data Ecosystem: From Data to Actionable Insight Key Points: DATA --> Knowledge -->Insight Analyze Visualize, Decide, Predict, React Acquire, Organize Ext. Data Sources Business Intelligence Content Search, Machine Learning, Artificial Intelligence, Natural Language Processing, Mining, Data Modeling, Visualization Transactional Databases SGI Mineset SGI UV20 / Rackable Oracle, Greenplum, MarkLogic, SAP HANA SGI UV2000 / SGI Rackable Cluster Cloudera, Greenplum Hadoop SGI Rackable Hadoop Cluster Streaming Applications 11 CRM/ERP Systems With a scale out and scale up product portfolio, SGI fuses unstructured, semi structured and structured data. Organization of incoming data from heterogeneous sources is performed using Hadoop on scale out SGI Rackable clusters. This is followed by dynamic postprocessing and extreme analytics on traditional databases on SGI UV or Rackable cluster, depending on whether the DB architecture is SMP or MPP. SGI provides additional capability for Predictive Analytics combined with Visualization for Interactive BI. In a single workflow, SGI can perform analytics across all information types and able to deliver actionable insight to users.

SGI Mineset for Visual Analytics: US Average Gross Income (Classification) Key Points: A Demographic View of US Avg. Gross Income by age group 4D View: Age, Education and Occupation are the 3 dimensions; Vary age group to see the change in gross income in the 4 th dimension. Get avg. gross income in color (blue to red) in the 4 th dimension; One can view the change in the avg. gross income (called mining attribute) from age group 20 to 60+.

SGI UV2000 Big Brain Computer Modular Design, Configuration Flexibility Supports GPU, Intel MIC UV 20 UV 2000 42U 19 Destination Rack Up to 512 Intel Xeon cores (E5) Up to 16TB Up to 2048 (up to 4096) Intel Xeon cores (E5) Up to 64TB Up to 128 Intel Xeon cores (E5) Up to 4TB Up to 32 Intel Xeon cores (E5) Up to 1.5TB SGI NUMAlink 6 network Runs Linux SW off-the-shelf, unmodified Standard Management, Storage Interfaces SGI Infinite Storage Solutions Interface with common management software schemes 13

SGI UV 2000 Single-node to manage, use far less complex than scaleout systems with many nodes. Commodity Scale-out/Cluster InfiniBand or Gigabit Ethernet SGI UV 2000 Big Brain SGI NUMAlink 6 Interconnect Mem ~64GB system + OS mem system + OS mem system + OS mem system + OS... mem system + OS Global shared memory to 64TB System to 4096 thread + OS one instance Off-node data access requires inter-node messaging scheme All data transparently accessible to all processors 14

Globally Shared Memory System NUMAlink 6 is the glue of UV2 NUMAlink Router UV Blade UV Blade UV Blade UV Blade HUB HUB HUB HUB CPU CPU CPU CPU CPU CPU CPU CPU 256GB 256GB 256GB 256GB 256GB 256GB 256GB 256GB To 64TB Cache-Coherent Shared Memory 15

SGI UV World's Largest In-Memory System for Data-Intensive Applications Proven for complex in-memory analytics, fraud/threat detection, graph databases, integrative simulation or analysis. UCSD U of IL Denmark Tech. U Cntr for Bio Sequencing Major WW Courier 200TB space weather simulations protect power and satellite grids Political revolt is predictable with fast analysis of 100M web postings Graph databases leverage large shared memory to find relationships between people, places, orgs and events. Cancer takes a hit as cell network simulation goes from weeks to hours $Millions saved as 38X speedup makes fraud detection real-time 16

SGI Hadoop Clusters

SGI Hadoop Background SGI has been one of the leading commercial suppliers of Hadoop servers since the technology was introduced Leading technology users deploy on SGI Hadoop Clusters SGI supplies customer optimized Hadoop Clusters to key US government agencies SGI has sourced Hadoop installations as large as 40,000 nodes and individual clusters as large as 4,000 nodes Start Up and GO! Factory integrated Hadoop solutions Cloudera s Distribution (CDH) and Cloudera Enterprise Direct delivery to production

SGI Hadoop Reference Implementations (RIs): Solving Big Data Challenges Solve complexity of Hadoop deployment: With a factory-installed, optimized Hadoop integrated with the hardware, RIs allow customers to run complex analytical Applications atop Hadoop, out of the box. RIs lead to little deployment effort, low maintenance cost, maximum Rack density. Built on top of standard Apache Hadoop distributions, RIs provide a modular approach to various verticals. Solve Hadoop optimization challenges with a predictable performance: RIs are pre-built for affordable price/performance, provide linear Terasort performance, optimal performance/watt - allowing organizations to focus on application development instead of performance tuning. Solve challenges of volume, velocity, variety of data: RIs have raw capacity to support 100s of terabytes to petabytes of high velocity data in three densely-packed Rack configurations. RIs can leverage Data management solutions for ingest and archive. Solve challenges to find value in information: RIs have tested features jointly with ISVs for Import/Export, Search, Mine, Predict, creating Business 19 Models, and Visualizing data for Business Intelligence.

SGI Hadoop Reference Implementations Preconfigured half-rack, full rack, multi-rack systems 10GigE Import, Export, Search, Mine, Predict & Visualize data for Business Intelligence Multi Rack: Petabytes useable capacity Fastest time to production Purpose designed Performance optimized Factory integrated Cloudera certified

Social Network Analytics with SGI Hadoop clusters Key Points: The SNA technology extracts people, places, and organizations from text resource crawled from the web and stores into Hadoop/HDFS to find the statistical correlations between them. One can then export the filtered data from Hadoop into a MySQL database for further visualization. One can use the stored data in the MySQL database to connect to other resources, create relationship graph and also connect various resources to generate mashups. Mashup is generated from Social Network Analysis shows forty or so people, places, and organizations followed by 3D visualization of this particular mashup. 21

SGI DataRaptor with MarkLogic

SGI Data Fusion enhanced with MarkLogic XML DB Key Points: SGI Fusion is enhanced with MarkLogic DB supporting XQuery directly on ingested data in real time. Direct ingestion and indexing in the DB eliminates the need of pre processing on Hadoop and hence Hadoop infrastructure in total. 23

DataRaptor w/ MarkLogic: Value Proposition SGI brings to market a leading factory-integrated, flexible and easy-to-use Appliance for next-generation, mission-critical Big Data applications with the following features: A factory-integrated content management framework SGI system hardware integrated with ML database - a leading NoSQL database with XML data model; a Search Engine; an Application Server at Web scale (Big Data); a single system supporting both operational and analytics queries resulting in: Real-time processing capabilities Linear scalability High TXN Rate and cost advantage Availability in 2 flavors one with High Performance drives and second with High Capacity drives, optimally tailored to be configurable for custom workloads; An information management platform supporting structured, semi-structured and unstructured content; Users can run their own analytical and transactional applications atop Appliance and visualize data for Information Intelligence. Disruptive value proposition addresses substantial market opportunity Database SGI Mineset Application Server SGI High Performance 222222 System Search 24

SGI DataRaptor with MarkLogic : Use cases Federal Real time Situational Awareness by event correlation across varied sources to support a mission; Information aggregation for end user delivery Cost effective database, search and alerting platform for information sharing Media Loading Metadata, and assets as is, Asset Management and Dynamic digital content delivery Search, discovery and automatic assembly of assets Financial Services Loading and indexing data in a schema agnostic NoSQL database for analysis of trades; Transactional data integrity, Risk management and analysis Govt grade security 25

SGI Management Center (SMC)

SGI Management Center Key Features Premier system management console with remote server monitoring and control Ease of use, full system management including GUI and CLI Policy driven, fine grained power load management for green computing Advanced fault, event, and alert management for improved reliability Advanced capabilities including GPU management, BIOS management and high availability

SGI Software for accelerating results

SGI Software Environment Premier Software Environment for Technical Computing SGI Management Suite SGI Management Center Standard SGI Management Center Premium Power Option Development Tools and SGI Development Suite & Libraries Workload SGI Workload Scheduling DMF CXFS Scheduling Lustre SGI Foundation Software Linux Linux Operating and Virtualization Systems SGI Scale-up and Scale-out Computing and Storage Systems SGI Performance Suite SGI MPI SGI Accelerate SGI REACT TM SGI UPC SGI Hardware SGI Software 3 rd Party Software Complete, integrated Linux environment across all SGI systems Highest level of scalability and performance while maintaining ISV compatibility Best of Breed solutions delivered with industry leading partners

SGI Performance Suite SGI Accelerate SGI MPI SGI REACT SGI UPC Accelerate applications with optimized software libraries and tools Tune applications without recompiling Optimize performance with specialized algorithms SGI s scalable, high performance MPI environment More than just an MPI library Includes runtime MPI acceleration, profiling, checkpoint/restart and more Hard real-time performance for Linux Only hard real-time solution for standard distribution Linux No custom Linux kernel needed Top 500 level performance for standard distribution Linux SGI s optimized Unified Parallel C compiler environment Scales across SGI Numalink and InfiniBand

SGI as a member of the Big Data Benchmark Community (BDBC)

Goals To understand and learn from what others in the industry are doing with respect to Big Data benchmarking Exchange ideas on new technology trends and how to leverage those for the Big Data community. To contribute to the development and execution of standard benchmarks for Big Data Publish benchmarks on SGI Appliances for customer satisfaction. Help the committee to maintain a best-in-class suite of benchmarks that simulate real-world big data applications that customers can benefit from. 32

33