Driving MySQL to Big Data Scale. Thomas Hazel, Founder, Chief Scientist, thomas@deepis.com



Millions to Billions to Trillions

Agenda
Driving MySQL to Big Data Scale
Market Trends
Hardware Trends
Current Computer Science
Limitations with current Science
Rethinking the Science of Databases
Introducing CASSI for MySQL Scaling
Benchmarking million, billion, trillion

Market Trends
Where are things heading?
Constant data acquisition
  o Streaming data feeds like IoT
More data being collected
  o Larger table sizes required
Desire for in-place analytics
  o More indexing to support complex queries

Hardware Trends
Resource/Capability
Storage Type and Size/Performance
  o HDD RPM (5.9K, 7.2K, 15K, etc.)
  o SSD Enterprise/Client grade
Memory Type and Size/Performance
  o DRAM (FPM, EDO, etc.)
  o SDRAM (DDR2, DDR3, etc.)
Processor Type and Count/Performance
  o Intel (i3, i5, i7, etc.)
  o AMD (X2, X3, FX, etc.)
[Chart labels: Speed / Cost, Yearly Trend, Size / Count]

Current Computer Science
Structures/Algorithms
Phase 1 - Log File (WAL)
  o Error Recovery
  o Merge/Optimization
Phase 2 - B-Tree/B+Tree
  o In-memory and on-disk via MMAP
  o Rows and Index/Key based representation
Phase 2 - Log Structured Merge (LSM) Tree
  o In-memory Write-Back Cache of Rows
  o On-disk Immutable Maps of Sorted Keys and Values
[Diagram labels: Log File, B-Tree, LSM-Tree]
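The LSM-Tree bullets above (an in-memory write-back cache of rows plus on-disk immutable maps of sorted keys and values) can be sketched in a few lines. This is a toy illustration, not any real engine's code: the class name TinyLSM, the memtable_limit threshold, and the use of in-memory tuples in place of on-disk files are all assumptions made for the example.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable in-memory cache plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # in-memory write-back cache of rows
        self.runs = []                        # oldest first; each run is a sorted tuple of (key, value)

    def put(self, key, value):
        # A write never reads existing data first.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into an immutable run of sorted keys and values.
        if self.memtable:
            self.runs.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Deferred merge: collapse all runs into one; the newest value wins.
        merged = {}
        for run in self.runs:  # oldest first, so later runs overwrite earlier ones
            merged.update(run)
        self.runs = [tuple(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [(3, "c"), (1, "a"), (2, "b"), (1, "A")]:
    db.put(k, v)
newest_value = db.get(1)   # found in the newest run
older_value = db.get(3)    # found only after falling back to the older run
db.compact()
```

The lookup for key 3 has to visit two runs before it succeeds, which is the slower-read cost that the deferred merge in compact() is meant to reduce.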

Limitations with current Science
Fixed Structures/Algorithms
B-Tree (Read Optimized)
  o Read before Write
  o Fixed block orientation
  o Inline/sawtooth rebalancing
  o Scan vs. Write/Point-Read performance vs. Size
LSM-Tree (Write Optimized)
  o Write w/o Read, Slower Read
  o Fixed block append orientation
  o Background/deferred rebalance/merge
  o Write Performance vs. Point-Read vs. Scan vs. Size
[Diagram labels: Read Optimized, Write Optimized]

Rethinking the Science of Databases
Maintenance-free Performance with Scale
How to maximize Writes without sacrificing Reads?
How to dynamically resize/redefine structures at run-time?
How to remove mathematical limits of memory and storage?
How to replace offline with online reconfiguration/optimization?
How to support all the classic/powerful database features at scale?

CASSI: Adaptive Structure/Algorithm
Continuous Adaptive Sequential Summarization of Information
Separate algorithm behavior from data structure
Split memory and storage into independent structures
Introduce kernel scheduling techniques to utilize hardware
Introduce layer to observe and adapt to workloads/resources
Machine learning to define structure and schedule resources
Dynamic and continuous online calibration (reorder, compress)
Metadata embedded in data (cardinality, counts, cost, etc.)

CASSI: Adaptive Structure/Algorithm
Fundamentals: Constructs
Infinite File Logging
  o Storing both rows and indexes (e.g. rowdata.vrt, indexdata.irt)
  o Continuous merge/optimization (inline memory, background storage)
Variable size Segments
  o Define/Size ranges of blocks based on data values, workload, resources
  o Allow Segments to be represented as all or part of the actual dataset
Memory/Storage Structure (Segments, Segments of Segments)
  o Memory: tree-oriented summation with physical/logical constructs
  o Storage: append-only, protocol based with physical/logical constructs

CASSI: Adaptive Structure/Algorithm
Fundamentals: Behavior
Scheduling of Work
  o Task-based indexing, defragment, compression, memory/disk access
  o Orchestrate tasks based on hardware, workload, information modeling
Dynamic Structure/Algorithm
  o Model-based splitting, merging, purging, summation, etc. of segments
  o Range space independence, one segment does not affect another
Orchestrate/Optimize the three tenets of CASSI
  o Always append data to file (i.e. don't seek, use current position, support upsert)
  o Read data sequentially (i.e. don't seek, use current position)
  o Continually re-write and reorder such that previous two principles are met
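The three principles listed above (always append, read sequentially, continually rewrite and reorder) can be illustrated with a very small append-only store. This is a sketch under stated assumptions, not CASSI's implementation: the AppendOnlyLog class, the JSON-lines record format, and the bytearray standing in for a file are all hypothetical choices for the example.

```python
import json


class AppendOnlyLog:
    """Toy append-only store: upserts are appended, never written in place."""

    def __init__(self):
        self.data = bytearray()  # stands in for a file on disk (assumption)

    def upsert(self, key, value):
        # Principle 1: always append at the current end; no seek, no read.
        self.data += json.dumps([key, value]).encode() + b"\n"

    def records(self):
        # Principle 2: read sequentially from the first record to the last.
        for line in self.data.splitlines():
            yield json.loads(line)

    def latest(self):
        # Later records supersede earlier ones for the same key.
        state = {}
        for key, value in self.records():
            state[key] = value
        return state

    def rewrite(self):
        # Principle 3: rewrite in key order and drop superseded records,
        # so that appends and sequential reads stay cheap afterwards.
        state = self.latest()
        fresh = bytearray()
        for key in sorted(state):
            fresh += json.dumps([key, state[key]]).encode() + b"\n"
        self.data = fresh


log = AppendOnlyLog()
log.upsert("b", 1)
log.upsert("a", 2)
log.upsert("b", 3)          # upsert: supersedes the first record for "b"
before = list(log.records())
log.rewrite()
after = list(log.records())
```

Before the rewrite the log holds three records in arrival order; afterwards it holds two, sorted by key, so a range scan can proceed front to back without skipping stale entries.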

CASSI: Adaptive Structure/Algorithm
Diagram: Write Flow
[Diagram labels: Workload, Cache, CASSI Kernel, Order, Reorder, Compress, Value Log File, Index Log File]

CASSI: Adaptive Structure/Algorithm
Diagram: Read Flow
[Diagram labels: Cache (P1), Cache (S2), Cache (S3), Key-only Scan, Value Log File, Indexes (1, 2, 3) Log; a timeline from t0 to tn with entries marked I and U]

CASSI: Adaptive Structure/Algorithm
Diagram: Summarization
[Diagram labels: Finalized Segments, Summarized Segment, Summation Range]
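One way to read the summarization diagram is that each finalized segment carries a small summary (key range and row count), and adjacent summaries roll up into a summation range. The sketch below assumes exactly that and nothing more: the Summary fields, the function names, and the sample keys are hypothetical, and the deck does not state how CASSI actually encodes its metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    min_key: int
    max_key: int
    count: int


def summarize_segment(keys):
    """Summary for one finalized segment (a non-empty sorted list of keys)."""
    return Summary(keys[0], keys[-1], len(keys))


def summarize_range(summaries):
    """Roll several adjacent summaries up into one summation range."""
    return Summary(
        min(s.min_key for s in summaries),
        max(s.max_key for s in summaries),
        sum(s.count for s in summaries),
    )


# Three finalized segments with made-up keys.
segments = [[1, 4, 9], [12, 15], [20, 21, 30, 42]]
leaf_summaries = [summarize_segment(seg) for seg in segments]
root = summarize_range(leaf_summaries)

# A row count can be answered from the root summary without reading any segment.
total_rows = root.count


def find_segment(key):
    # A point lookup uses the summaries to pick the only segment that can hold the key.
    for seg, s in zip(segments, leaf_summaries):
        if s.min_key <= key <= s.max_key:
            return seg
    return None
```

Under this reading, a count over the whole table touches only the top-level summary, and a point query descends through summaries before reading a single segment.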

CASSI: Adaptive Structure/Algorithm
Diagram: Concurrency
[Diagram labels: Reader 1 i, Reader 1 j, Reader 1 k, Writer 1, View 0, View 1, View 2]
Active Lockless Access to Segments
Temp. User Space Lock to Storyline
Lockless Isolated Access to Views
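The idea of readers holding isolated views while a writer keeps appending can be illustrated with copy-on-write snapshots. This is a swapped-in, generic technique chosen for clarity, not a description of CASSI's mechanism; the ViewStore class and its methods are hypothetical, and the lock-free read relies on reference assignment being atomic in CPython.

```python
import threading


class ViewStore:
    """Toy store where each published view is an immutable tuple of rows."""

    def __init__(self):
        self._view = ()                      # currently published view
        self._write_lock = threading.Lock()  # held briefly by writers only

    def view(self):
        # Readers take a reference to the current view without any lock.
        # The tuple is immutable, so later writes cannot change what they see.
        return self._view

    def append(self, row):
        with self._write_lock:
            # Build a new view, then publish it with one reference assignment.
            self._view = self._view + (row,)


store = ViewStore()
view0 = store.view()      # reader i sees the empty store
store.append("row-1")
view1 = store.view()      # reader j sees one row
store.append("row-2")
view2 = store.view()      # reader k sees two rows
```

Each reader keeps the view it started with, so a long scan is unaffected by rows that arrive while it runs.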

Benchmarking Configuration
CASSI vs. B-Tree: Random Keys (Small, Medium, Large, Extreme)
Schema/Specifications
  o 1 Primary Index
  o 3 Indexes containing 4 Columns
  o 4 Hosts at increasing Scale/Capability
Performance/Scale Testing
  o Small: 2 runs, 50 million rows
  o Medium: 2 runs, 100 million rows
  o Large: 2 runs, 1 Billion rows
  o Extreme: 2 runs, 1 Trillion rows (CASSI only, simple schema)

Id   Cust.  Prod.  Price  Time   Data
001  0001   0001   1.00   10/10  aaaa

[ AUTOINC PRIMARY KEY (`id`),
  KEY `index1` (`cust`,`prod`,`price`,`time`),
  KEY `index2` (`prod`,`price`,`time`,`cust`),
  KEY `index3` (`price`,`time`,`cust`,`prod`) ]

50 million with complex indexing
8 CPU, 2G Cache on 7200 RPM HDD, 2x1G Log (InnoDB)
Ingestion Time, 10 clients
  o CASSI: 520 seconds, 8.5G Size
  o B-Tree: 9,285 seconds, 12G Size
  o Difference: Insert Speed ~18x, Size Efficiency 1.4x
Cold start Index-only Query
  o CASSI: 4,965 rows in set (0.42 sec)
  o B-Tree: 4,953 rows in set (0.83 sec)
  o Difference: ~2x improvement
Cold start Index + Point Query
  o CASSI: 4,965 rows in set (33 sec), 0.02G Cache
  o B-Tree: 4,953 rows in set (151 sec), 1G Cache
  o Difference: Query Speed ~4.5x, Cache Efficiency 50x
[Chart: B-Tree vs. CASSI for Insert Time, Disk Size, Index Query, Point Query, Cache Eff.]

100 million with complex indexing
8 CPU, 10G Cache on Client Grade SSD, 2x1G Log (InnoDB)
Ingestion Time, 15 clients
  o CASSI: 901 seconds, 17G Size
  o B-Tree: 5,895 seconds, 27G Size
  o Difference: Insert Speed ~6.5x, Size Efficiency 1.5x
Cold start Index-only Query
  o CASSI: 10,414 rows in set (0.08 sec)
  o B-Tree: 10,435 rows in set (0.17 sec)
  o Difference: ~2x improvement
Cold start Index + Point Query
  o CASSI: 10,414 rows in set (61 sec), 0.05G Cache
  o B-Tree: 10,414 rows in set (147 sec), 1.2G Cache
  o Difference: Query Speed ~2.4x, Cache Efficiency ~25x
[Chart: B-Tree vs. CASSI for Insert Time, Disk Size, Index Query, Point Query, Cache Eff.]

1 billion with complex indexing
16 CPU, 64G Cache on General Purpose SSD (3000 IOPS), 2x1G Log (InnoDB)
Ingestion Time, 15 clients
  o CASSI: 10,800 seconds (3 hours), 185G Size
  o B-Tree: 57,600 seconds (16 hours), 234G Size
  o Difference: Insert Speed ~5.3x, Size Efficiency 1.2x
Cold start Index-only Query
  o CASSI: 1,049,702 rows in set (151 sec)
  o B-Tree: 1,052,021 rows in set (159 sec)
  o Difference: ~1.05x improvement
Cold start Index + Point Query
  o CASSI: 1,049,702 rows in set (318 sec), 0.06G Cache
  o B-Tree: 1,052,021 rows in set (787 sec), 20G Cache
  o Difference: Query Speed ~2.1x, Cache Efficiency ~333x
[Chart: B-Tree vs. CASSI for Insert Time, Disk Size, Index Query, Point Query, Cache Eff.]
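The "Difference" figures on the three benchmark slides are ratios of the B-Tree result to the CASSI result. The short script below recomputes them from the timings and sizes quoted on the slides; the dictionary layout and the ratio helper are assumptions for the example, and a recomputed value can differ slightly from the rounded figure shown on a slide.

```python
# (CASSI, B-Tree) pairs copied from the benchmark slides.
benchmarks = {
    "50 million": {
        "insert_s": (520, 9285),
        "size_gb": (8.5, 12),
        "point_query_s": (33, 151),
    },
    "100 million": {
        "insert_s": (901, 5895),
        "size_gb": (17, 27),
        "point_query_s": (61, 147),
    },
    "1 billion": {
        "insert_s": (10800, 57600),
        "size_gb": (185, 234),
        "point_query_s": (318, 787),
    },
}


def ratio(cassi, btree):
    """How many times larger the B-Tree figure is than the CASSI figure."""
    return btree / cassi


for name, metrics in benchmarks.items():
    parts = [f"{metric} {ratio(*pair):.1f}x" for metric, pair in metrics.items()]
    print(f"{name}: " + ", ".join(parts))
```

The insert-speed advantage shrinks as the data set grows and the storage moves from HDD to SSD, which matches the trend the slides report.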

1 trillion: CASSI Theory in Practice
Two machines tested, Virtual/Physical (primary with non-indexed column)
  o Amazon: 16 CPU, General Purpose SSD with 64G cache
  o Bare metal: 48 CPU, HDD 5900 RPM with 128G cache
Time to complete test, 15 local clients
  o Amazon: 2 weeks, 50 million inserts per minute, 24TB
  o Bare metal: 2.5 weeks, 37 million inserts per minute, 24TB
Cold start Schema/Query Performance
  o Select count(*) from table: 0 seconds (0 seeks)
  o Any point query across 1 trillion rows: ~1 second (3 seeks)
What we Learned from testing 1 trillion rows and 24TB
  o Disk Errors and Power failure (incremental backup and verify)
  o Verify algorithms and protocols at extreme scale (at testable time)
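The durations and insert rates quoted above can be checked against the one-trillion-row target with simple arithmetic. The sketch assumes the rates were sustained for the whole run and that 24TB means decimal terabytes; both are assumptions, since the slide gives only rounded figures.

```python
MINUTES_PER_WEEK = 7 * 24 * 60


def rows_inserted(weeks, inserts_per_minute):
    """Rows written if the quoted rate is sustained for the whole run."""
    return weeks * MINUTES_PER_WEEK * inserts_per_minute


amazon_rows = rows_inserted(2, 50_000_000)
bare_metal_rows = rows_inserted(2.5, 37_000_000)

# Average on-disk footprint per row, assuming 24TB = 24e12 bytes.
bytes_per_row = 24e12 / 1e12

print(f"Amazon:     {amazon_rows:,.0f} rows")
print(f"Bare metal: {bare_metal_rows:,.0f} rows")
print(f"Average:    {bytes_per_row:.0f} bytes per row")
```

Both runs land close to one trillion rows, so the quoted rates and durations are mutually consistent at the level of rounding used on the slide.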

1 trillion

What's Next for Deep Engine
Foundational Technology
Structured - MySQL/Percona/MariaDB
Semi-Structured - MongoDB
Unstructured - Hadoop/HDFS
Deep Native?

Thank You!

Thomas Hazel, Founder, Chief Scientist
thomas@deepis.com
www.deepis.com
Follow us: @DeepInfoSci