SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK




3/2/2011

Systems Group, Dept. of Computer Science, ETH Zürich, Switzerland
SwissBox, Humboldt University, Dec. 2010
Systems Group = www.systems.ethz.ch
Enterprise Computing Center = www.ecc.ethz.ch

APPLIANCES: The world is changing

ORACLE EXADATA
- Intelligent storage manager
- Massive caching
- RAC-based architecture
- Fast network interconnect

ORACLE EXADATA
- Pushing SQL operators to the storage manager

NETEZZA (IBM) TWINFIN
- No storage manager
- Distributed disks (per node)
- FPGA processing
- No indexing

NETEZZA (IBM) TWINFIN

SAP ACCELERATOR
- Main memory database
- Column store
- No indexing (automatic)

SWISSBOX

Gustavo Alonso, Donald Kossmann, Timothy Roscoe: SwissBox: A Database Appliance. CIDR 2011

ETH SWISSBOX

SwissBox main components
- Barrelfish: research operating system for multi-core machines. Designed to let the application control key system aspects
- Crescando: main memory storage manager
- E-cast: distributed protocol for routing updates and reads to (large) pools of replicated nodes running Crescando
- FPGA layer: hardware accelerators for network traffic optimization and operator off-loading from CPUs
- SharedDB: data flow architecture for shared operator processing

CRESCANDO: the storage manager of SwissBox

Philipp Unterbrunner, Georgios Giannikis, Gustavo Alonso, Dietmar Fauser, Donald Kossmann: Predictable Performance for Unpredictable Workloads. PVLDB 2(1): 706-717 (2009)

Amadeus Workload
- Passenger Booking Database
  - ~600 GB of raw data (two years of bookings)
  - single table, denormalized
  - ~50 attributes: flight no, name, date, ..., many flags
- Query Workload
  - up to 4000 queries / second
  - latency guarantees: 2 seconds
  - today: only pre-canned queries allowed
- Update Workload
  - avg. 600 updates per second (1 update per GB per sec)
  - peak of 12000 updates per second
  - data freshness guarantee: 2 seconds

Amadeus Query Examples
- Simple Queries
  - Print passenger list of Flight LH 4711
  - Give me LH hon circle from Frankfurt to Delhi
- Complex Queries
  - Give me all Heathrow passengers that need special assistance (e.g., after terror warning)
- Problems with State of the Art
  - Simple queries work only because of materialized views
  - multi-month project to implement new query / process
  - Complex queries do not work at all

Why are traditional DBMSs a pain?
[Figure: MySQL query latency in msec (50th, 90th and 99th percentile) vs. update load in updates/sec]
[Figure: query latency in msec vs. synthetic workload parameter s]
- Performance depends on workload parameters
  - changes in load (updates, columns accessed) -> huge variance
- Unpredictable performance, impossible to tune correctly

System requirements
- Predictable (= constant) Performance
  - independent of updates, query types, ...
- Meet SLAs
  - latency, data freshness
- Affordable Cost
  - ~1000 COTS machines are okay (compare to mainframe)
- Meet Consistency Requirements
  - monotonic reads (ACID not needed)
- Respect Hardware Trends
  - main memory, NUMA, large data centers

Selected Related Work
- L. Qiao et al. Main memory scan sharing for multi-core CPUs. VLDB '08
  - Cooperative main memory scans for ad hoc OLAP queries (read-only)
- P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. CIDR '05
  - Cooperative scans over vertical partitions on disk
- K. A. Ross. Selection conditions in main memory. ACM TODS, 29(1), 2004.
- S. Chandrasekaran and M. J. Franklin. Streaming queries over streaming data. VLDB '02
  - Query-data join
- G. Candea, N. Polyzotis, R. Vingralek. A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses. VLDB '09
  - An always-on join operator based on similar requirements and design principles

What is Crescando?
- A distributed (relational) table: main memory (MM) on NUMA
  - horizontally partitioned
  - distributed within and across machines
- Query / update interface
  - SELECT * FROM table WHERE <any predicate>
  - UPDATE table SET <anything> WHERE <any predicate>
  - monotonic reads / writes (SI within a single partition)
- Some nice properties
  - constant / predictable latency & data freshness
  - solves the Amadeus use case
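To make the query / update interface concrete, here is a minimal Python sketch of a horizontally partitioned in-memory table that answers arbitrary predicates by scanning every partition. It is only an illustration: the class `PartitionedTable`, its method names and the round-robin placement are assumptions made for this example, not Crescando's actual API.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


class PartitionedTable:
    """Toy horizontally partitioned table (hypothetical, for illustration)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
        self._next = 0  # next partition for round-robin placement

    def insert(self, record: Record) -> None:
        # Round-robin placement keeps the partitions balanced.
        self.partitions[self._next].append(record)
        self._next = (self._next + 1) % len(self.partitions)

    def select(self, predicate: Predicate) -> List[Record]:
        # SELECT * FROM table WHERE <any predicate>:
        # every partition is scanned in full, no index on the data is used.
        return [r for part in self.partitions for r in part if predicate(r)]

    def update(self, assign: Record, predicate: Predicate) -> int:
        # UPDATE table SET <anything> WHERE <any predicate>:
        # returns the number of records that were changed.
        changed = 0
        for part in self.partitions:
            for r in part:
                if predicate(r):
                    r.update(assign)
                    changed += 1
        return changed


table = PartitionedTable(num_partitions=4)
for i in range(10):
    table.insert({"pid": i, "flight": "LH4711" if i % 2 == 0 else "LH400"})
on_4711 = table.select(lambda r: r["flight"] == "LH4711")
rebooked = table.update({"flight": "LH401"}, lambda r: r["flight"] == "LH400")
```

Because every predicate is answered by a full scan, the cost of a query depends on the partition size rather than on which attributes the predicate touches.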

Design
- Operate MM like disk in shared-nothing architecture
  - Core ~ Spindle (many cores per machine & data center)
  - all data kept in main memory (log to disk for recovery)
  - each core scans one partition of data all the time
- Batch queries and updates: shared scans
  - do trivial MQO (at scan level on system with single table)
  - control read/update pattern -> no data contention
- Index queries, not data
  - just as in the stream processing world
  - predictable + optimizable: rebuild indexes every second
- Updates are processed before reads

Clock Scan
[Figure: Clock Scan. Labels: queries, updates, build query index for next scan, read cursor, write cursor, data in circular buffer (wide table)]
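The batching idea behind the shared scan can be sketched in a few lines of Python. This is a simplified illustration, not the Crescando implementation: the names `Query`, `Update` and `scan_cycle` are hypothetical, and the separate write and read cursors of the Clock Scan are collapsed into one loop that applies all updates to a record before any query reads it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


@dataclass
class Query:
    qid: int
    predicate: Predicate


@dataclass
class Update:
    predicate: Predicate
    assign: Record


def scan_cycle(partition: List[Record],
               updates: List[Update],
               queries: List[Query]) -> List[Tuple[Record, Set[int]]]:
    """One pass over a partition serving a whole batch of operations."""
    results = []
    for record in partition:
        # Updates are applied first, in arrival order ...
        for upd in updates:
            if upd.predicate(record):
                record.update(upd.assign)
        # ... then all queries of the batch read the updated record.
        matched = {q.qid for q in queries if q.predicate(record)}
        if matched:
            results.append((dict(record), matched))
    return results


partition = [
    {"flight": "LH4711", "name": "A", "assist": False},
    {"flight": "LH400", "name": "B", "assist": False},
]
updates = [Update(lambda r: r["name"] == "B", {"assist": True})]
queries = [
    Query(1, lambda r: r["flight"] == "LH4711"),
    Query(2, lambda r: r["assist"] is True),
]
results = scan_cycle(partition, updates, queries)
```

Each record is touched once per cycle no matter how many queries are in the batch, which is what keeps the scan time independent of the query mix in this sketch.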

Crescando on 1 Core
[Figure: Crescando on 1 core. Labels: data partition, records, active queries (predicate indexes, unindexed queries), queries + update queries, results as {record, {query ids}}]

Crescando on 1 Machine (N Cores)
[Figure: Crescando on 1 machine. Labels: Input Queue (Operations), Split, Scan Threads, Merge, Output Queue (Result Tuples)]
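The split / scan / merge flow on one machine might look roughly like the following Python sketch. The functions `scan_partition` and `run_batch` are hypothetical names, and a thread pool stands in for the pinned scan threads; Python threads do not provide real CPU parallelism, so this only illustrates the data flow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]
Queries = Dict[int, Callable[[Record], bool]]  # query id -> predicate
Hit = Tuple[Record, Set[int]]


def scan_partition(partition: List[Record], queries: Queries) -> List[Hit]:
    """One scan thread: evaluate the whole batch against one partition."""
    hits = []
    for rec in partition:
        qids = {qid for qid, pred in queries.items() if pred(rec)}
        if qids:
            hits.append((rec, qids))
    return hits


def run_batch(partitions: List[List[Record]], queries: Queries) -> List[Hit]:
    # Split: the same batch of operations goes to every scan thread.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = pool.map(lambda p: scan_partition(p, queries), partitions)
        # Merge: concatenate the per-partition results into one output.
        return [hit for hits in per_partition for hit in hits]


partitions = [
    [{"flight": "LH4711"}],
    [{"flight": "LH400"}, {"flight": "LH4711"}],
]
queries = {1: lambda r: r["flight"] == "LH4711"}
merged = run_batch(partitions, queries)
```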

Crescando in a Data Center (N Machines)

Implementation Details
- Optimization
  - decide for batch of queries which indexes to build
  - runs once every second (must be fast)
- Query + update indexes
  - different indexes for different kinds of predicates
  - e.g., hash tables, R-trees, tries, ...
  - must fit in L2 cache (better L1 cache)
- Probe indexes
  - Updates in right order, queries in any order
- Persistence & Recovery
  - Log updates / inserts to disk (not a bottleneck)
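"Index queries, not data" can be illustrated with a small hash index over equality predicates: the index maps a predicate value to the queries waiting for it, and each scanned record probes the index. This Python sketch is an assumption-laden illustration; `EqualityQueryIndex` and `match` are hypothetical names, and only the hash-table case from the list above is shown.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set

Record = Dict[str, object]


class EqualityQueryIndex:
    """Hash index over queries of the form `attribute = value`."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute
        self._by_value: Dict[Hashable, Set[int]] = defaultdict(set)

    def add(self, qid: int, value: Hashable) -> None:
        self._by_value[value].add(qid)

    def probe(self, record: Record) -> Set[int]:
        # One hash lookup per record, regardless of how many queries are indexed.
        # .get avoids creating empty entries in the defaultdict.
        return set(self._by_value.get(record.get(self.attribute), ()))


def match(record: Record,
          indexes: List[EqualityQueryIndex],
          unindexed: Dict[int, Callable[[Record], bool]]) -> Set[int]:
    """Return the ids of all active queries that the record satisfies."""
    qids: Set[int] = set()
    for idx in indexes:
        qids |= idx.probe(record)
    # Queries without a usable index are evaluated one by one.
    qids |= {qid for qid, pred in unindexed.items() if pred(record)}
    return qids


flight_idx = EqualityQueryIndex("flight")
flight_idx.add(1, "LH4711")
flight_idx.add(2, "LH4711")
flight_idx.add(3, "LH400")
unindexed = {4: lambda r: r["seats"] > 2}
hit = match({"flight": "LH4711", "seats": 3}, [flight_idx], unindexed)
miss = match({"flight": "BA100", "seats": 1}, [flight_idx], unindexed)
```

Since the index is rebuilt for every batch, it only has to hold the currently active queries, which is what makes the cache-size constraint in the list above plausible.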

Benchmark Environment
- Crescando Implementation
  - Shared library for POSIX systems
  - Heavily optimized C++ with some inline assembly
- Benchmark Machines
  - 16-core Opteron machine with 32 GB DDR2 RAM
  - 64-bit Linux SMP kernel, ver. 2.6.27, NUMA enabled
- Benchmark Database
  - The Amadeus Ticket view (one record per passenger per flight)
  - ~350 bytes per record; 47 attributes, many of them flags
  - Benchmarks use 15 GB of net data
- Query + Update Workload
  - Current: Amadeus Workload (from Amadeus traces)
  - Predicted: Synthetic workload with varying predicate selectivity

Multi-core Scale-up
[Figure: round-robin partitioning, read-only Amadeus workload, vary number of threads. Annotations: 558.5 Q/s, 10.5 Q/s, 1.9 Q/s]

Latency vs. Query Volume
[Figure: hash partitioning, read-only Amadeus workload, vary queries/sec. Annotations: thrashing, queue overflows; L1 cache; base latency of scan; L2 cache]

Latency vs. Concurrent Writes
[Figure: hash partitioning, Amadeus workload, 2000 queries/sec, vary updates]

Crescando vs. MySQL Latency
[Figure, two panels: Amadeus workload, 100 q/sec, vary updates; synthetic read-only workload, vary skew. Annotations: updates + big queries cause massive queuing; s = 1.4: 1 / 3,000 queries do not hit an index; s = 1.5: 1 / 10,000 queries do not hit an index; 16 s = time for full table scan in MySQL]

Crescando vs. MySQL Throughput
[Figure, two panels: Amadeus workload, vary updates; synthetic read-only workload, vary skew. Annotation: read-only workload!]

An interesting storage layer
- Interface is SQL (not pages or blocks)
- high concurrent query + update throughput
  - Amadeus: ~4000 queries/sec + ~1000 updates/sec
  - updates do not impact latency of queries
- predictable and guaranteed latency
  - depends on size of partition: not optimal, good enough
- cost and energy efficiency
  - depends on workload: great for hot data, heavy workloads
- consistency: write monotonicity, can build SI on top
- works great on NUMA!
  - controls read + write pattern
  - linear scale-up with number of cores

Status & Outlook
- Status
  - Fully operational system
  - Extensive experiments at Amadeus
  - Production: Summer 2011 (planned)
- Outlook
  - Column store variant of Crescando
  - Compression
  - E-cast: flexible partitioning & replication
  - Additional operators (group by)

SWISSBOX: Additional components

ETH SWISSBOX

SharedDB = processing layer
- If we can share the scans (Crescando), then maybe we can share other operators (join, sort)
- SharedDB is built on top of Crescando and implements shared operators capable of providing scalable, predictable performance for high volumes of concurrent queries.

Shared join
- Crescando runs selections and projections in one set of cores
- SharedDB runs joins on the streams from Crescando, thousands of queries at a time
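One way to picture a shared join is a single hash join over tuples that each carry the set of query ids they are relevant to, with an output tuple kept only for the queries that want both inputs. The Python sketch below follows that idea; `shared_hash_join` is a hypothetical function, and tagging results with the intersection of the two query-id sets is an assumption made for illustration rather than something the slides spell out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]
Tagged = Tuple[Record, Set[int]]  # (record, ids of queries interested in it)


def shared_hash_join(left: Iterable[Tagged],
                     right: Iterable[Tagged],
                     left_key: str,
                     right_key: str) -> List[Tagged]:
    """Run one equi-join on behalf of many queries at once."""
    # Build phase: hash the left input once for all queries.
    table = defaultdict(list)
    for rec, qids in left:
        table[rec[left_key]].append((rec, qids))

    # Probe phase: a joined tuple belongs to the queries that
    # selected both of its input tuples.
    out: List[Tagged] = []
    for rrec, rqids in right:
        for lrec, lqids in table.get(rrec[right_key], ()):
            common = lqids & rqids
            if common:
                out.append(({**lrec, **rrec}, common))
    return out


passengers = [
    ({"pid": 1, "name": "A"}, {1, 2}),
    ({"pid": 2, "name": "B"}, {2}),
]
bookings = [
    ({"pid": 1, "flight": "LH4711"}, {1}),
    ({"pid": 2, "flight": "LH400"}, {1}),
]
joined = shared_hash_join(passengers, bookings, "pid", "pid")
```

In this example the pair with `pid` 2 is dropped because no single query selected both of its inputs, so the join does work only for combinations some query actually needs.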

Predictability at scale
- SharedDB can run complex joins (and sorts) in predictable time with large update loads
- Linear scalability with number of processing units (cores)

SWISSBOX: A research platform

Key ideas around SwissBox
- A new way to process queries
  - Massively parallel, simple, predictable
  - Not always optimal, but always good enough
- Ideal for operational BI
  - High query throughput
  - Concurrent updates with freshness guarantees
- Great opportunity for research
  - Rethink the database and storage system architecture
  - Explore new possibilities