Jeff Adie Principal Systems Engineer. jeffadie@sgi.com



Similar documents
<Insert Picture Here> Extreme Performance with In-Memory Database Technology Real Life Stories

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Understanding the Value of In-Memory in the IT Landscape

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

SAP and Hortonworks Reference Architecture

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Bringing Big Data Modelling into the Hands of Domain Experts

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Intelligent Business Operations and Big Data Software AG. All rights reserved.

An Overview of SAP BW Powered by HANA. Al Weedman

Architectures for Big Data Analytics A database perspective

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Integrating a Big Data Platform into Government:

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Why DBMSs Matter More than Ever in the Big Data Era

Big Data Are You Ready? Thomas Kyte

Scaling Your Data to the Cloud

Data Centric Systems (DCS)

Where is... How do I get to...

Strategic Decisions Supported by SAP Big Data Solutions. Angélica Bedoya / Strategic Solutions GTM Mar /2014

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Big Fast Data Hadoop acceleration with Flash. June 2013

BigMemory and Hadoop: Powering the Real-time Intelligent Enterprise

Introducing Oracle Exalytics In-Memory Machine

Cray: Enabling Real-Time Discovery in Big Data

Big Data Performance Growth on the Rise

Achieving Zero Downtime and Accelerating Performance for WordPress

Big Data Functionality for Oracle 11 / 12 Using High Density Computing and Memory Centric DataBase (MCDB) Frequently Asked Questions

BIG DATA ANALYTICS For REAL TIME SYSTEM

From Spark to Ignition:

In-Memory or Live Data: Which is Better?

Integrating Apache Spark with an Enterprise Data Warehouse

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Infrastructures for big data

Advanced analytics at your hands

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

What Is In-Memory Computing and What Does It Mean to U.S. Leaders? EXECUTIVE WHITE PAPER

NEEDLE STACKS & BIG DATA: USING EVENT STREAM PROCESSING FOR RISK, SURVEILLANCE & SECURITY ANALYTICS IN CAPITAL MARKETS

Breaking News! Big Data is Solved. What Is In-Memory Computing and What Does It Mean to U.S. Leaders? EXECUTIVE WHITE PAPER

Challenges for Data Driven Systems

SQream Technologies Ltd - Confiden7al

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Big Data and Analytics: A Conceptual Overview. Mike Park Erik Hoel

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Big Data and Analytics in Government

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Oracle Big Data Building A Big Data Management System

The Big Data Paradigm Shift. Insight Through Automation

Unlocking the Intelligence in. Big Data. Ron Kasabian General Manager Big Data Solutions Intel Corporation

Safe Harbor Statement

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Fast Data in the Era of Big Data: Twitter s Real-

Big Data for Big Intel

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

ANALYTICS IN BIG DATA ERA

Oracle Database 11g Comparison Chart

Big Data Spatial Analytics An Introduction

The 3 questions to ask yourself about BIG DATA

Enabling the Use of Data

BigMemory & Hybris : Working together to improve the e-commerce customer experience

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Netezza and Business Analytics Synergy

Advanced Big Data Analytics with R and Hadoop

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Using In-Memory Computing to Simplify Big Data Analytics

Big Data Volume, Velocity, Variability

Parallel Data Warehouse

Reference Architecture, Requirements, Gaps, Roles

SGI Big Data Ecosystem - Overview. Dave Raddatz, Big Data Solutions & Performance, SGI

HPC & Big Data THE TIME HAS COME FOR A SCALABLE FRAMEWORK

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Transcription:

Jeff Adie Principal Systems Engineer jeffadie@sgi.com

Agenda Real Time analytics Analysis classes Analysis methods GPUs and Analytics GPUdb and GAIA Example cases: USPS Ebay/Paypal INSCOM Summary

SGI and Big Data Supercomputing 96 Company Confidential

Big Data Analysis classes Two fundamental methods: Object analysis Relationship analysis Object Analysis needle in a haystack Problem can be easily decomposed Ideal for map-reduce or hadoop type analysis Relationship analysis relationships between entities Not easily decomposed Interested in the links rather than the objects themselves Ideal for large-shared memory/graph analysis

Big Data Analytics Feature Extraction Statistical Processing Output Predictions Feature Extraction Conform data to input requirements Statistical Processing Analysis Output Predictions Derive insight from results

Big Data Analytics Many analytics techniques can be framed as convex An optimisation problem where the objective function is convex Exceptions include a-priori and some graph mining algorithms In data analysis, convex problems are attractive Local solutions are always globally optimal Problem definition can be decoupled from the solver Many well studied algorithms exist Generally linearly seperable

Big Data Analytics Most popular algorithms for convex problems are gradient methods Conjugate Gradient Newton Method Incremental Gradient Descent These are all ideally suited to GPU processing: Fast Gradient Gradients with Multiple GPUs Cevahir, et al, ICCS 2009 Accelerating the CG method with CUDA Matt Pennybaker, U of Arizona A GPU framework for solving systems of linear equations, Kruger et al, GPU Gems 2

Real Time Analytics Real time analysis provides: Fast insight Interactive what-if? Interactive steering Up to date information Support instant yes/no go/no go Online fraud detection Real Time has special requirements Data must be immediately available & accessable Processing time must be very fast Need a data-centric approach

Data Centric Analysis Move the compute, not the data! Surround the data with compute elements (CPUs and GPUs) Disks are too slow => In-memory databases Need lots of RAM up up up Global Shared Memory GPU GPU GPU

In memory data Accessing data on disk (milliseconds) Accessing data in flash (microseconds) Accessing data in memory (nanoseconds) Many methods: In memory databases (memsql, TimesTen, HANA) In memory tables (most DBMS support) Directly into shared memory (/dev/shm) GPUdb GPU Accelerated Database

GPUdb A scalable distributed database for many core devices SQL like query language Designed for big data 10-100 million+ rows Not to be confused with gpudb by Yuan Yuan @ code.google.com/gpudb

GAIA A comprehensive database that incorporates numerous complex and distinct sources of information, which can be quickly sorted and displayed using easily understood visualization tools. Built on top of GPUdb Created by Global Intelligence Solutions (GIS) Federal

GAIA 10 Million Flights

GAIA From LAX only

GAIA Shipping

GAIA 500 Million Tweets

GAIA SEA Tweets

Examples Images: William Eng

Real Time Fraud Detection @ USPS

S. Ryan Quick Principal Architect, Paypal

INSCOM GAIA System Partnership between SGI, GIS Federal and NVIDIA Customer : U.S. Army INtelligence and Security COMmand (INSCOM) Custom system for providing soldiers with real-time actionable insight: Quickly and clearly recognise threats Pinpoint hazards along a route in real time to allow military personnel the opportunity to hange their course of action.

INSCOM GAIA

INSCOM GAIA System 10 TB in-core database UV2000 SMP system with 16 NVIDIA K20X GPUs 2,048 CPU cores Live ingest from multiple sensors Sub second response for geospatial calculations

Summary Big Data provides both substantial challenges and opportunities GPU accelerated Data Analytics holds much potential for many typical problems Big Data feeds which require immediate filtering/pre-processing can take real advantage of GPU offload Data movement, as always, is an important issue