Increasing Flash Throughput for Big Data Applications (Data Management Track)

Similar documents
Advances in Virtualization In Support of In-Memory Big Data Applications

Inge Os Sales Consulting Manager Oracle Norway

Accelerating Real Time Big Data Applications. PRESENTATION TITLE GOES HERE Bob Hansen

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Realizing the next step in storage/converged architectures

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Binary search tree with SIMD bandwidth optimization using SSE

Overview: X5 Generation Database Machines

SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

The New Economics of SAP Business Suite powered by SAP HANA SAP AG. All rights reserved. 2

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

White Paper The Numascale Solution: Extreme BIG DATA Computing

Windows Server Performance Monitoring

numascale White Paper The Numascale Solution: Extreme BIG DATA Computing Hardware Accellerated Data Intensive Computing By: Einar Rustad ABSTRACT

Enabling Technologies for Distributed and Cloud Computing

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation

2009 Oracle Corporation 1

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

IOS110. Virtualization 5/27/2014 1

Hadoop: Embracing future hardware

Enabling Technologies for Distributed Computing

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

bigdata Managing Scale in Ontological Systems

Software-defined Storage Architecture for Analytics Computing

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Big Fast Data Hadoop acceleration with Flash. June 2013

Oracle Database 12c Plug In. Switch On. Get SMART.

How To Write An Article On An Hp Appsystem For Spera Hana

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

SMB Direct for SQL Server and Private Cloud

Oracle Exadata Database Machine for SAP Systems - Innovation Provided by SAP and Oracle for Joint Customers

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Internet Scale Storage Microsoft Storage Community

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Nutanix Solutions for Private Cloud. Kees Baggerman Performance and Solution Engineer

Hyperscale Use Cases for Scaling Out with Flash. David Olszewski

Pluribus Netvisor Solution Brief

Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option

White Paper The Numascale Solution: Affordable BIG DATA Computing

Microsoft Windows Server in a Flash

Architectures for Big Data Analytics A database perspective

In-Memory Databases MemSQL

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

IBM System x SAP HANA

InfiniBand vs Fibre Channel Throughput. Figure 1. InfiniBand vs 2Gb/s Fibre Channel Single Port Storage Throughput to Disk Media

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

The Advantages of Multi-Port Network Adapters in an SWsoft Virtual Environment

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

4 th Workshop on Big Data Benchmarking

RevoScaleR Speed and Scalability

Oracle Database In-Memory The Next Big Thing

Integrating Apache Spark with an Enterprise Data Warehouse

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

Infrastructure Matters: POWER8 vs. Xeon x86

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Microsoft Windows Server Hyper-V in a Flash

SCI Briefing: A Review of the New Hitachi Unified Storage and Hitachi NAS Platform 4000 Series. Silverton Consulting, Inc.

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at

WebLogic on Oracle Database Appliance: Combining High Availability and Simplicity

Maximizing SQL Server Virtualization Performance

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Understanding the Benefits of IBM SPSS Statistics Server

Turn Big Data to Small Data

High Performance MySQL Cluster Cloud Reference Architecture using 16 Gbps Fibre Channel and Solid State Storage Technology

Maximum performance, minimal risk for data warehousing

How To Handle Big Data With A Data Scientist

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

Exadata Database Machine

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Tuning Tableau Server for High Performance

A Performance Analysis of Distributed Indexing using Terrier

Lowering the Total Cost of Ownership (TCO) of Data Warehousing

Scala Storage Scale-Out Clustered Storage White Paper

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance

Big Data Challenges in Bioinformatics

Copyright 2014 Oracle and/or its affiliates. All rights reserved.

In-Memory Data Management for Enterprise Applications

Networking in the Hadoop Cluster

How To Get More Value From Your Data With Hitachi Data Systems And Sper On An Integrated Platform For Sper.Com

NextGen Infrastructure for Big DATA Analytics.

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012

In-memory computing with SAP HANA

The Flash-Transformed Financial Data Center. Jean S. Bozman Enterprise Solutions Manager, Enterprise Storage Solutions Corporation August 6, 2014

CitusDB Architecture for Real-Time Big Data

Mit Soft- & Hardware zum Erfolg. Giuseppe Paletta

PARALLELS CLOUD SERVER

SAP Business Suite powered by SAP HANA

Transcription:

Scale Simplify Optimize Evolve Increasing Flash Throughput for Big Data Applications (Data Management Track) Flash Memory 1

Industry Context Addressing the challenge A proposed solution Review of the Benefits Flash Memory Summit 2015 2

Industry Context Data sizes for data under management are monotonically increasing Who wants less data? Our appetite for analysis is monotonically increasing Do you think, or do you know? Trend toward evidence-based management Our appetite for speed is monotonically increasing Who wants questions answered more slowly? Hence the industry interest in in-memory data management systems Our overall ability to manage complexity is not increasing Flash Memory 3

Processor / Memory Context Processor speeds are limited Processor core density has been increasing at a healthy rate Memory density is increasing (but at a lower rate than core density)! Therefore, the memory/core ratio is going in the wrong direction! We haven t significantly changed the memory/storage hierarchies for decades Interconnects are getting faster as fast as memory access? memory access is slow caches are fast! Hennessy, John L.; Patterson, David A. (2011-10-07). Computer Architecture: A Flash Memory Quantitative Approach (The 4

Which is faster, and why? Flash Memory 5

Which is faster, and why? Moving data is the most time-consuming activity, reducing data movement is key. I/O is the enemy Flash Memory 6

The Memory Hierarchy in Human Terms (.3ns = 1s) Flash Memory 7

I/O Bus Choice Memory Verses Storage Flash Memory 8

Choices and Tradeoffs Hardware Architecture Software Development and Maintenance Flash Memory 9

Industry Context Addressing the challenge A proposed solution Review of the Benefits Flash Memory Summit 2015 10

The Challenge How big is Big, and what kind of computer architecture do you need to optimally solve a Big Data problem? Big Data applications may have irregular and unpredictable data access patterns Efficient partitioning of data and queries can be challenging Data can change, and structurally vary characteristics over time Big Data applications often must solve system problems Coordination of computing resources (Storage, Partitions, Networks, CPU s) People who have deep analytic skills are often required to design system resource strategies, perhaps not the greatest use of their time and expertise Flash Memory 11

Flash Memory 12

The Three V s of Big Data Flash Memory 13 Src: https://medium.com/deepend-indepth/know-your-audience-better-than-asio-4802839c3fd3

Data Locality is the real problem not, Volume, Velocity or Variety The Impact of Data Locality Volume: The further the data has to move the worse your performance will be. Velocity: The more complex the transformations the worse your performance will be. Variety: The more data components you look at the worse your performance will be. Flash Memory 14

Flash Memory An Example: Where to optimize?

A Simple Improvement: Column verses Row data architecture Flash Memory 16

The End Goal In-Memory Performance The world wants data to be in-memory, but hasn t been able to get it. Historically the industry has scaled applications to fit on available computer hardware. The industry needs to move to scale the hardware to fit the application. Linear System Scalability We need systems that enable hardware to scale organically using low cost commodity servers at linear cost, as customer needs evolve. We need to enable applications to achieve superior in-memory performance using inexpensive unmodified hardware. Reduced Software Development Costs We need to use off-the-shelf, unmodified Linux. We require no changes to applications or database software. We require systems that optimize automatically, and learn over time. Flash Memory 17

Industry Context Addressing the challenge A proposed solution Review of the Benefits Flash Memory Summit 2015 18

Choice: scale up or scale out? Both scale up and scale out can be expensive When all you have is a hammer Every problem looks like a nail Today we rely almost exclusively on scale out systems Because that s the main way we add processors and memory Shard the data, intelligently target the queries time consuming It s not easy to query partitioned databases What is the best way to do it? Moving data is time-consuming And you might have to change it What if you could build systems that Scale up and Out? Flash Memory 19

Opportunity: Enable application developers to focus time on applications, not the systems required to run them Move the coordination and management tasks to the Operating System (broadly based) Remove the requirement that managing resources be part of the Big Data application Flash Memory 20

How? 45 years ago we figured out how to virtualize memory using the locality principle* Questions: Could locality be applied ubiquitously across our computing infrastructure? How might we apply locality to all compute resource types automatically & dynamically? * P.J. Denning Flash Memory 21

Traditional view of virtualization Virtual Machines sharing a server Virtual Machine Operating System Virtual Machine Operating System Virtual Machine Operating System Virtual Machine Operating System Virtual Machine Operating System Virtual Machine Operating System Physical Machine OS Flash Memory 22

TidalScale Provides hardware aggregation Virtual Machine Operating System A Single Large Virtual Machine running on multiple servers (Inverse Virtualization) Physical Machine OS Physical Machine OS Physical Machine OS Physical Machine OS Physical Machine OS Physical Machine OS Flash Memory 23

How is this better? Completely 100% bit-for-bit unmodified Application Application Query1 Query2 Query3 Database DB DB DB Operating System OS OS OS Hyper Kernel Hyper Kernel Hyper Kernel HW HW HW HW HW HW Today s Model TidalScale Model Flash Memory 24

What enables the solution? The Memory Hierarchy in Human Terms (.3ns = 1s) Flash Memory 25

Why? Creates a unique scalable solution experience: User experience bare metal User experience TidalScale Real Screen Shots Flash Memory 26

Hardware Example System features Admin node 1 Worker nodes 25 Total Memory 3.2TB Total Cores: 150 Network 1/10GbE Storage FreeNAS, xtb Components 1x Admin node (A003) Colfax CX1260i-X6 Haswell, E5-2603V3 6C, 16GB 25x Worker nodes Colfax CX1260i-X6 Haswell, E5-2603V3 6C, 128GB 1x 1G switch 2x 10G switch (S009, S010) Mellanox 1x NAS Flash Memory 27

TidalScale - No Changes Required and many other applications Flash Memory 28

Industry Context Addressing the challenge A proposed solution Review of the Benefits Flash Memory Summit 2015 29

Price/Performance at Scale Scale up costs increase rapidly after 3Tb Flash Memory 30

Performance Scales Up TidalScale can continue to add cores as well as memory (and bandwidth) Single machine solution can only add memory Flash Memory 31

What Problems Benefit? Data Mining / Finance High Data Volumes, Large Analytics, Risk Analysis, Fraud Detection, Graph Analytics using Alternative Data Sources, Risk modeling, High Frequency trading, Complex Event Modeling Bioinformatics Next Generation DNA sequencing, Meta-genomic analysis, Finite Element Brain Modeling, Time-Series MRI Neuro-Imaging IT / Operational Systems In-House Applications, Web Controllers & Servers, Gateways, Image serving, Ad serving, OLTP, ERP, Business Intelligence Flash Memory 32

Scale: Scale, Simplify, Optimize, Evolve Aggregates compute resources for large scale in-memory analysis and decision support Scales like a cluster using commodity hardware at linear cost Allow customers to grow gradually as their needs develop Simplify: Dramatically simplifies application development No need to distribute work across servers Existing applications run as a single instance, without modification, as if on a highly flexible mainframe Optimize: Automatic dynamic hierarchical resource optimization Evolve: Applicable to modern and emerging microprocessors, memories, interconnects, persistent storage & networks Flash Memory 33

Scale Simplify Optimize Evolve Restoring DEVELOPER PRODUCTIVITY THROUGH SIMPLICTY Flash Memory