BIG DATA The Petabyte Age: More Isn't Just More More Is Different wired magazine Issue 16/.07



Similar documents
Sun Constellation System: The Open Petascale Computing Architecture

HPC Update: Engagement Model

2009 Oracle Corporation 1

An Oracle White Paper May Exadata Smart Flash Cache and the Oracle Exadata Database Machine

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

SUN ORACLE EXADATA STORAGE SERVER

Exadata Database Machine

SUN ORACLE DATABASE MACHINE

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Overview: X5 Generation Database Machines

ARCHITECTING COST-EFFECTIVE, SCALABLE ORACLE DATA WAREHOUSES

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

An Oracle White Paper December Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine

SCI Briefing: A Review of the New Hitachi Unified Storage and Hitachi NAS Platform 4000 Series. Silverton Consulting, Inc.

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays

Oracle Exadata Database Machine for SAP Systems - Innovation Provided by SAP and Oracle for Joint Customers

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Main Memory Data Warehouses

Infrastructure Matters: POWER8 vs. Xeon x86

Hadoop on the Gordon Data Intensive Cluster

Capacity Management for Oracle Database Machine Exadata v2

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

ORACLE EXADATA STORAGE SERVER X4-2

Virtual Compute Appliance Frequently Asked Questions

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

Hadoop: Embracing future hardware

An Oracle White Paper January Improving SAS Customer Intelligence Solution Performance with Oracle SPARC SuperCluster

SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

ioscale: The Holy Grail for Hyperscale

INCREASING EFFICIENCY WITH EASY AND COMPREHENSIVE STORAGE MANAGEMENT

ORACLE EXADATA DATABASE MACHINE X2-8

Flash Performance for Oracle RAC with PCIe Shared Storage A Revolutionary Oracle RAC Architecture

ORACLE EXADATA STORAGE SERVER X2-2

Sun in HPC. Update for IDC HPC User Forum Tucson, AZ, Sept 2008

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

An Oracle White Paper November Achieving New Levels of Datacenter Performance and Efficiency with Software-optimized Flash Storage

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

An Oracle White Paper December A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Oracle Database In-Memory The Next Big Thing

ORACLE SUPERCLUSTER T5-8

Oracle Maximum Availability Architecture with Exadata Database Machine. Morana Kobal Butković Principal Sales Consultant Oracle Hrvatska

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

SMB Direct for SQL Server and Private Cloud

Can High-Performance Interconnects Benefit Memcached and Hadoop?

SUN STORAGE F5100 FLASH ARRAY

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com

HadoopTM Analytics DDN

ORACLE OPS CENTER: PROVISIONING AND PATCH AUTOMATION PACK

Quick Reference Selling Guide for Intel Lustre Solutions Overview

Data Center Op+miza+on

An Oracle White Paper June A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server

Inge Os Sales Consulting Manager Oracle Norway

200 flash-related. petabytes of. flash shipped. patents. NetApp flash leadership. Metrics through October, 2015

Introducing Oracle Exalytics In-Memory Machine

Integrated Grid Solutions. and Greenplum

Oracle Exalytics Briefing

How To Write An Article On An Hp Appsystem For Spera Hana

AMD SEAMICRO OPENSTACK BLUEPRINTS CLOUD- IN- A- BOX OCTOBER 2013

SUN HARDWARE FROM ORACLE: PRICING FOR EDUCATION

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Parallel Programming Survey

MACHINE X2-22 ORACLE EXADATA DATABASE ORACLE DATA SHEET FEATURES AND FACTS FEATURES

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

HP SN1000E 16 Gb Fibre Channel HBA Evaluation

Accelerating I/O- Intensive Applications in IT Infrastructure with Innodisk FlexiArray Flash Appliance. Alex Ho, Product Manager Innodisk Corporation

An Oracle White Paper December Consolidating and Virtualizing Datacenter Networks with Oracle s Network Fabric

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

MACHINE X2-8 ORACLE EXADATA DATABASE ORACLE DATA SHEET FEATURES AND FACTS FEATURES

Running Oracle s PeopleSoft Human Capital Management on Oracle SuperCluster T5-8 O R A C L E W H I T E P A P E R L A S T U P D A T E D J U N E

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services

Big Fast Data Hadoop acceleration with Flash. June 2013

TUT NoSQL Seminar (Oracle) Big Data

SUN ORACLE DATABASE MACHINE

Building a Scalable Storage with InfiniBand

Enabling Technologies for Distributed Computing

Building a Flash Fabric

How To Build An Exadata Database Machine X2-8 Full Rack For A Large Database Server

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance

Transcription:

BIG DATA The Petabyte Age: More Isn't Just More More Is Different wired magazine Issue 16/.07 Martien Ouwens Datacenter Solutions Architect Sun Microsystems 1

Big Data, The Petabyte Age: Relational Next Generation The biggest challenge of the Petabyte Age won't be storing all that data, it'll be figuring out how to make sense of it wired magazine Issue 16/.07 Sun Confidential: Internal Only 2

Big Data Key Trends Relational Next Generation Sun Confidential: Internal Only 3

Hitting walls in Processor Design Clock frequency frequency increases tapering off, in new semiconductor processes high frequencies => power issues Memory latency (not instruction execution speed) dominating most application times Processor designs for high single-thread performance are becoming highly complex, therefore: expense and/or time-to-market suffer verification increasingly difficult more complexity => more circuitry => increased power... for diminishing performance returns Sun Confidential: Internal Only 4

Memory Bottleneck Relative Performance 10000 1000 100 10 CPU Frequency DRAM Speeds CPU -- 2x Every 2 Years DRAM -- 2x Every 6-8 Years Gap 1 1980 1985 2000 2005 1990 1995 Source: Sun World Wide Analyst Conference Feb. 25, 2003 Sun Confidential: Internal Only 5

How We Mask Memory Latency Today ITLB L1 I-CacheI Core 0 P$ W$ L2 Cache (Left) L1 D-Cache DTLB L2/L3 Cntl L2 Tags SIU L1 D-Cache DTLB L3 Tags SIU MCU L1 I-CacheI Core 1 P$ W$ L2 Cache (Right) ITLB Great Big Caches of many different kinds that only mask memory latency and execute no code accessed one cache line at a time but require power and cooling to all lines all the time Cache Logic Accounts for About 75% of the Chip Area Sun Confidential: Internal Only 6

Power Wall + Memory Wall + ILP Wall = Brick Wall In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years. * * The Landscape of Parallel Computing Research: A View from Berkeley, December 2006 (emphasis added) http://www.eecs.berkeley.edu/pubs/techrpts/2006/eecs-2006-183.html Sun Confidential: Internal Only 7

Single Threading Up to 85% Cycles Spent Waiting for Memory Single Threaded Performance SPARC US-IV+ 25% 75% Intel 15% 85% Hurry Up and WAIT Thread C M C M C M Compute Memory Latency Time Sun Confidential: Internal Only 8

Comparing Modern CPU Design Techniques 1 GHz 2 GHz C M C M C M C M C M C M C M C M Memory Latency Compute TLP C C M M Time Saved C M C M Time ILP Offers Limited Headroom TLP Provides Greater Performance Efficiency Sun Confidential: Internal Only 9

Chip Multi-Threading (CMT) Utilization: Up to 85%* UltraSPARC-T1 85% 15% Core-1 Core-2 Core-3 Core-4 Core-5 Core-6 Core-7 Core-8 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Thread 4 Thread 3 Thread 2 Thread 1 Chip Multithreaded (CMT) Performance * based on example of UltraSPARC T1 Compute Time Latency Sun Confidential: Internal Only 10

Enterprise Flash Relative Perf 1 sec 16,7 min 11,5 dag 31,7 jr Sun Confidential: Internal Only 11

Where to Store Data? Optimization Trade-Off Sun Confidential: Internal Only 12

Data Retention Comparison DRAM SSD HDD Sun Confidential: Internal Only 13

ZFS Turbo Charges Applications Hybrid Storage Pool Data Management on Solaris Systems, Open Storage ZFS automatically: Writes new data to very fast SSD pool (ZIL) Determines data access patterns and stores frequently accessed data in the L2ARC Bundles IO into sequential lazy writes for more efficient use of low cost mechanical disks Sun Confidential: Internal Only 14

Unique Capabilities from Sun Sun Storage 7000 Unified Storage Systems Radical simplification > Ready-to-go appliance installs in minutes > Extensive analytics and drill-down for rapid management of performance challenges Improved latency and performance > Integrated flash drives > Hybrid Storage Pools automates data placement on the best media Compelling value through standards > Industry standard hardware and open source software vs. proprietary solution Sun Confidential: Internal Only 15

Sun Oracle Database Machine Hardware by Sun, software by Oracle Exadata Version 2 Enhancements to Exadata Version 1 offering extreme performance > Faster servers, storage, and network interconnects > Flash acceleration > Speedup at Oracle level of 20 times processing IOPS > Extends Exadata to provide acceleration for OLTP Database aware storage > Smart scans > Storage indexes eliminates a good deal of I/Os Massively parallel architecture Dynamic linear scaling Offering mission critical availability and protection of data Sun Confidential: Internal Only 16

What Does Sun Oracle Database Machine Address? Current database deployments often have bottlenecks limiting the movement of data from disks to servers > Storage Array internal bottlenecks on processors and Fibre Channel Architecture > Limited Fibre Channel host bus adapters in servers > Under configured and complex SANs Pipes between disks and servers are 10x to 100x too slow for data size Sun Confidential: Internal Only 17

Sun Oracle Database Machine Solves the Bottleneck by Adding more pipes Massively parallel architecture Make the pipes wider 5X faster than conventional storage Ship less data through the pipes Process data in storage Simplify the storage offering eliminate the need of a SAN Allow for Ease of Growth Provide a Flash Cache for Rapid Response Sun Confidential: Internal Only 18

HPC Market Trends It s a growing market, outpacing growth of many others > $10B in 2008 > CAGR of 3.1% for 2007 to 2012* Clustering is now the dominant architecture HPC technologies now within reach of all organizations * Source IDC (January 2009) HPC is Underserved by Moore s Law Sun Confidential: Internal Only 19

Sun Constellation System Open Petascale Architecture Eco-Efficient Building Blocks Compute Networking Storage Software DEVELOPER TOOLS GRID ENGINE PROVISIONING Ultra-Dense Blade Platform Fastest processors: AMD Opteron, Intel Xeon Highest compute density Fastest host channel adaptor Ultra-Dense Switch Solution 648 port InfiniBand switch Unrivaled cable simplification Most economical InfiniBand cost/port Ultra-Dense Storage Solution Most economical and scalable parallel file system building block Up to 48 TB in 4RU Up to 2TB of SSD Direct cabling to IB switch Comprehensive Software Stack Integrated developer tools Integrated Grid Engine infrastructure Provisioning, monitoring, patching Simplified inventory management Sun Confidential: Internal Only 20

Sun Constellation System Open Petascale Architecture Radical Simplicity, Faster Time to Deployment Competitors Compute Clusters Racks Sun Constellation System Cluster Racks Competitors Cabling Infrastructure Leaf Switches Alternative Open Standards Fabric 300 switching elements 6912 cables 92 racks Core Switches Cabling Reduced Cabling 1 switching element 300:1 reduction 1152 cables 6:1 reduction 74 racks 20% smaller footprint Cabling Sun Confidential: Internal Only 21

Compute Power 579 TFLOPS Aggregate Peak 3,936 Sun four-socket, quad-core nodes 15,744 AMD Opteron processors Quad-core, four flops/cycle (dual pipelines) Memory 2 GB/core, 32 GB/node, 125 TB total 132 GB/s aggregate bandwidth Disk Subsystem 72 Sun x4500 Thumper I/O servers, 24TB each 1.7 Petabyte total raw storage Aggregate bandwidth ~32 GB/sec InfiniBand Interconnect Massive Low Latency Engine Sun Confidential: Internal Only 22

The HPC Data I/O Challenge Compute power is scaling rapidly > Larger compute clusters > More sockets per server > More cores per CPU Compute advances far outpace those in I/O > Storage latency 12 orders of magnitude higher than compute > Compute Side developed Parallel Approaches sooner Need a better approach to feed compute > Cluster performance is often limited by data I/O > Need a parallel I/O approach to help balance things Sun Confidential: Internal Only 23

Parallel Storage Sun Storage Cluster HA MDS Module Standard OSS Module Lustre and Open Storage providing a wide range of Scaling and Performance for a wide range of cluster sizes Breakthrough economics >Up to 50% cost savings vs. competitors Simplified deployment and management of Lustre storage HA OSS Module Sun Confidential: Internal Only 24

Unique Capabilities from Sun Sun Storage Cluster High performance and broad scaling > From a few to over 100 GB/second > From dozens to thousands of nodes Breakthrough economics, up to 50% savings > Built with standardized hardware and open source software vs. proprietary products Simplifying Lustre for the mainstream > Pre-defined modules speed deployment of parallel file system, available factory integrated > Ease of use improvements patchless clients, integrated driver stack, LNET self-test, mountbased cluster configuration, many more Sun Confidential: Internal Only 25

Lustre in Action TACC Ranger 1.73 PB storage, 35GB/s I/O throughput, 3,936 quad-core clients Sandia Red Storm 340 TB Storage, 50GB/s I/O throughput, 25,000 clients Framestore The Tale of Despereaux 200TB Lustre file system averaged 1.2GB/s in sustained reads, peaks of 3GB/s, 5TB data generated per night from cluster of 4,000 cores interfacing with Lustre file system German Climate Research Data Centre (DKRZ) 272 TB, Lustre and Storage Archive Manager, 256 nodes with 1024 cores Sun Confidential: Internal Only 26

Big Data Key Trends Relational Commoditisation (Open source, COTS) Next Generation Distributed data and processing (Dist. FS, MapReduce, Hadoop...) Master/Slave Open source, general purpose datawarehouse Object Store Semi-structured Database ScaleDB, Big Table, SimpleDB hbase Master/Master OLTP is the datawarehouse Proprietary, dedicated datawarehouse Distributed FS Unstructured Data Federated/ Shared Structured Data Sun Confidential: Internal Only 27

Big Data Martien Ouwens Datacenter Solutions Architect martien.ouwens@sun.com Sun Microsystems Nederland B.V. Sun Confidential: Internal Only 28

Typical BI/DW Components Pentaho Infobright Pentaho S o u r c e s Transactional systems Repository ODS Line of business systems L o a d Data Warehouse Analytics Reporting Sun Servers Sun Storage Sun Servers Slide 29

Column-Orientation Exemplified by InfoBright's approach Column Orientation Data Packs data stored in manageably sized, highly compressed data packs Knowledge Grid statistics and metadata describing the supercompressed data which is automatically updated as data packs are created or updated Infobright Optimizer iterates over the Knowledge Grid; only the data packs needed to resolve the query are decompressed 30 Smarter architecture Load data and go No indexes or partitions to build and maintain, no tuning Super-compact data foot- print leverages off-the-shelf hardware MySQL wrapper connects to BI and ETL tools Sun Confidential: Internal Only 30

InfoBright Query Execution Rows 1 to 65,536 `employees` records stored in Data Packs of 64k elements each 65,537 to 131,072 salary age job city All packs ignored All packs ignored SELECT COUNT(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = Shipping AND city = TORONTO ; Using the Knowledge Grid, we verify, which packs are irrelevant (no values found that fit SELECT criteria) relevant (all values fit) suspect (some values fit) 131,073 to Relevant packs Any row containing 1 or more irrelevant DP s is ignored. Only this pack will be decompressed All packs ignored Since we are looking for COUNT(*) in this case we do not need decompress relevant packs. COUNT information is kept in the Knowledge Grid in the Data Pack Nodes. The suspect packs must be decompressed for detailed evaluation. Sun Confidential: Internal Only 31