Indexing Big Data. Michael A. Bender. ingest data ?????? ??? oy vey

Similar documents
Data Structures and Algorithms for Big Databases

Data storage Tree indexes

Big Data and Scripting. Part 4: Memory Hierarchies

Benchmarking Cassandra on Violin

Hypertable Architecture Overview

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Seeking Fast, Durable Data Management: A Database System and Persistent Storage Benchmark

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

In-Memory Databases MemSQL

Graph Database Proof of Concept Report

MySQL performance in a cloud. Mark Callaghan

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production

Physical Data Organization

University of Dublin Trinity College. Storage Hardware.

Speeding Up Cloud/Server Applications Using Flash Memory

Evaluation Report: Accelerating SQL Server Database Performance with the Lenovo Storage S3200 SAN Array

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Yahoo! Cloud Serving Benchmark

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

External Memory Geometric Data Structures

Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

An Overview of Flash Storage for Databases

HPC data becomes Big Data. Peter Braam

Algorithms and Data Structures

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Unit Storage Structures 1. Storage Structures. Unit 4.3

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

Parallel Replication for MySQL in 5 Minutes or Less

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May Copyright 2014 Permabit Technology Corporation

Memory-Centric Database Acceleration

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

From Last Time: Remove (Delete) Operation

Lecture 1: Data Storage & Index

Raima Database Manager Version 14.0 In-memory Database Engine

Parallel & Distributed Data Management

Introduction to new high performance storage engines in MongoDB

Performance Counters. Microsoft SQL. Technical Data Sheet. Overview:

Optimizing Ext4 for Low Memory Environments

Recovery Principles in MySQL Cluster 5.1

Rackspace Cloud Databases and Container-based Virtualization

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

High Frequency Trading and NoSQL. Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture

File Management. Chapter 12

these three NoSQL databases because I wanted to see a the two different sides of the CAP

Topics in Computer System Performance and Reliability: Storage Systems!

Cloud Computing at Google. Architecture

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

ProTrack: A Simple Provenance-tracking Filesystem

Configuring Apache Derby for Performance and Durability Olav Sandstå

Tushar Joshi Turtle Networks Ltd

DATABASE DESIGN - 1DL400

MySQL: Cloud vs Bare Metal, Performance and Reliability

Enhancing SQL Server Performance

Binary search tree with SIMD bandwidth optimization using SSE

Performance And Scalability In Oracle9i And SQL Server 2000

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

Data Structures and Algorithm Analysis (CSC317) Intro/Review of Data Structures Focus on dynamic sets

Bigtable is a proven design Underpins 100+ Google services:

Big Data Functionality for Oracle 11 / 12 Using High Density Computing and Memory Centric DataBase (MCDB) Frequently Asked Questions

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Kafka & Redis for Big Data Solutions

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Data Structures for Big Data: Bloom Filter. Vinicius Vielmo Cogo Smalltalks, DI, FC/UL. October 16, 2014.

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Benchmarking Hadoop & HBase on Violin

Cache Oblivious Search Trees via Binary Trees of Small Height

Virtuoso and Database Scalability

Databases and Information Systems 1 Part 3: Storage Structures and Indices

How To Scale Out Of A Nosql Database

Binary Heap Algorithms

NoSQL Databases. Polyglot Persistence

Throwing Hardware at SQL Server Performance problems?

Deploying Affordable, High Performance Hybrid Flash Storage for Clustered SQL Server

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

Oracle Database In-Memory The Next Big Thing

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

There are numerous ways to access monitors:

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Hardware Recommendations

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

Amazon EC2 XenApp Scalability Analysis

Chapter 12 File Management

Transcription:

Indexing Big Data 30,000 Foot View Big data of Databases problem Michael A. Bender ingest data oy vey 365 42????????? organize data on disks query your data

Indexing Big Data 30,000 Foot View Big data of Databases problem Michael A. Bender ingest data oy vey 365 42????????? organize data on disks query your data Goal: Index big data so that it can be queried quickly.

Setting of Talk-- Type II streaming Type II streaming: Collect and store streams of data to answer queries. Data is stored on disks. Dealing with disk latency has similarities to dealing with network latency. We ll discuss indexing. We want to store data so that it can be queried efficiently. To discuss: relationship with graphs.

Setting of Talk-- Type II streaming Sorry it didn t render. Type II streaming: Collect and store streams of data to answer queries. Data is stored on disks. Dealing with disk latency has similarities to dealing with network latency. We ll discuss indexing. We want to store data so that it can be queried efficiently. To discuss: relationship with graphs.

For on-disk data, one traditionally sees funny tradeoffs in the speeds of data ingestion, query speed, and freshness of data. ingest data oy vey 365 42????????? organize data on disks query your data

Funny tradeoff in ingestion, querying, freshness Select queries were slow until I added an index onto the timestamp field... Adding the index really helped our reporting, BUT now the inserts are taking forever. Comment on mysqlperformanceblog.com I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it. MySQL bug #9544 They indexed their tables, and indexed them well, And lo, did the queries run quick! But that wasn t the last of their troubles, to tell Their insertions, like treacle, ran thick. Not from Alice in Wonderland by Lewis Carroll queries + answers data ingestion 42??? data indexing query processor Don t Thrash: How to Cache Your Hash in Flash

Funny tradeoff in ingestion, querying, freshness Select queries were slow until I added an index onto the timestamp field... Adding the index really helped our reporting, BUT now the inserts are taking forever. Comment on mysqlperformanceblog.com I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it. MySQL bug #9544 They indexed their tables, and indexed them well, And lo, did the queries run quick! But that wasn t the last of their troubles, to tell Their insertions, like treacle, ran thick. Not from Alice in Wonderland by Lewis Carroll queries + answers data ingestion 42??? data indexing query processor Don t Thrash: How to Cache Your Hash in Flash

Funny tradeoff in ingestion, querying, freshness Select queries were slow until I added an index onto the timestamp field... Adding the index really helped our reporting, BUT now the inserts are taking forever. Comment on mysqlperformanceblog.com I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it. MySQL bug #9544 They indexed their tables, and indexed them well, And lo, did the queries run quick! But that wasn t the last of their troubles, to tell Their insertions, like treacle, ran thick. Not from Alice in Wonderland by Lewis Carroll queries + answers data ingestion 42??? data indexing query processor Don t Thrash: How to Cache Your Hash in Flash

Tradeoffs come from different ways to organize data on disk Like a librarian?

Tradeoffs come from different ways to organize data on disk Like a librarian? Fast to find stuff. Slow to add stuff. Indexing

Tradeoffs come from different ways to organize data on disk Like a librarian? Like a teenager? Fast to find stuff. Slow to add stuff. Indexing

Tradeoffs come from different ways to organize data on disk Like a librarian? Like a teenager? Fast to find stuff. Slow to add stuff. Indexing Fast to add stuff. Slow to find stuff. Logging

Fractal-tree index B ɛ -tree This talk: we don t need tradeoffs LSM tree Write-optimized data structures: Faster indexing (10x-100x) Faster queries Fresh data These structures efficiently scale to very big data sizes. 8

Our algorithmic work appears in two commercial products Tokutek s high-performance MySQL and MongoDB. TokuDB TokuMX Application MySQL Database -- SQL processing, -- query optimization libft File System Application Standard MongoDB -- drivers, -- query language, and -- data model libft File System Disk/SSD

Our algorithmic work appears in two commercial products Tokutek s high-performance MySQL and MongoDB. TokuDB Application MySQL Database -- SQL processing, -- query optimization libft File System TokuMX Application Standard MongoDB -- drivers, -- query language, and -- data model libft File System The Fractal Tree engine implements the persistent structures for storing data on disk. Disk/SSD

Write-Optimized Data Structures 10

Write-Optimized Data Structures Would you like them with an algorithmic performance model? 10

An algorithmic performance model How computation works: Data is transferred in blocks between RAM and disk. The number of block transfers dominates the running time. Goal: Minimize # of block transfers Performance bounds are parameterized by block size B, memory size M, data size N. B RAM Disk M B 11 [Aggarwal+Vitter 88] Don t Thrash: How to Cache Your Hash in Flash

Memory and disk access times Disks: ~6 milliseconds per access. RAM: ~60 nanoseconds per access

Memory and disk access times Disks: ~6 milliseconds per access. RAM: ~60 nanoseconds per access Analogy: disk = elevation of Sandia peak RAM = height of a watermellon

Memory and disk access times Disks: ~6 milliseconds per access. RAM: ~60 nanoseconds per access Analogy: disk = elevation of Sandia peak RAM = height of a watermellon Analogy: disk = walking speed of the giant tortoise (0.3mph) RAM = escape velocity from earth (25,000 mph)

The traditional data structure for disks is the B-tree O(logBN)

The traditional data structure for disks is the B-tree Adding a new datum to an N-element B-tree uses O(logBN) block transfers in the worst case. (Even paying one block transfer is too expensive.) O(logBN)

Write-optimized data structures perform better Data structures: [O'Neil,Cheng, Gawlick, O'Neil 96], [Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook 00], [Argel 03], [Graefe 03], [Brodal, Fagerberg 03], [Bender, Farach,Fineman,Fogel, Kuszmaul, Nelson 07], [Brodal, Demaine, Fineman, Iacono, Langerman, Munro 10], [Spillane, Shetty, Zadok, Archak, Dixit 11]. Systems: BigTable, Cassandra, H-Base, LevelDB, TokuDB. B-tree Some write-optimized structures logn Insert/delete logn O(logBN)=O( ) O( ) logb If B=1024, then insert speedup is B/logB 100. Hardware trends mean bigger B, bigger speedup. Less than 1 I/O per insert. B Don t Thrash: How to Cache Your Hash in Flash

Optimal Search-Insert Tradeoff [Brodal, Fagerberg 03] insert point query Optimal tradeoff (function of ɛ=0...1) O log1+b " N B 1 " O log 1+B " N 10x-100x faster inserts B-tree (ɛ=1) ɛ=1/2 ɛ=0 O (log B N) logb N O p B log N O B O (log B N) O (log B N) O (log N) Don t Thrash: How to Cache Your Hash in Flash

Illustration of Optimal Tradeoff [Brodal, Fagerberg 03] Point Queries Slow Fast B-tree Optimal Curve Logging Slow Inserts Fast Don t Thrash: How to Cache Your Hash in Flash

Illustration of Optimal Tradeoff [Brodal, Fagerberg 03] Target of opportunity Point Queries Slow Fast B-tree Insertions improve by 10x-100x with almost no loss of pointquery performance Optimal Curve Logging Slow Inserts Fast Don t Thrash: How to Cache Your Hash in Flash

Performance of write-optimized data structures Write performance on large data >100x faster MongoDB MySQL 16x faster Don t Thrash: How to Cache Your Hash in Flash

Performance of write-optimized data structures Write performance on large data >100x faster MongoDB MySQL 16x faster Later: why fast indexing leads to faster queries. Don t Thrash: How to Cache Your Hash in Flash

How to Build Write- Optimized Structures

How to Build Write- Optimized Structures write-optimized

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 20 Don t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 20 Don t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 20 Don t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 20 Don t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 20 Don t Thrash: How to Cache Your Hash in Flash

A simple write-optimized structure O(log N) queries and O((log N)/B) inserts: A balanced binary tree with buffers of size B Inserts + deletes: Send insert/delete messages down from the root and store them in buffers. When a buffer fills up, flush. 21 Don t Thrash: How to Cache Your Hash in Flash

Analysis of writes An insert/delete costs amortized O((log N)/B) per insert or delete A buffer flush costs O(1) & sends B elements down one level It costs O(1/B) to send element down one level of the tree. There are O(log N) levels in a tree. 22 Don t Thrash: How to Cache Your Hash in Flash

Difficulty of Key Accesses

Difficulty of Key Accesses

Analysis of point queries To search: examine each buffer along a single root-to-leaf path. This costs O(log N). 24 Don t Thrash: How to Cache Your Hash in Flash

Obtaining optimal point queries + very fast inserts B B fanout: B... Point queries cost O(log B N)= O(logB N) This is the tree height. Inserts cost O((logBN)/ B) Each flush cost O(1) I/Os and flushes B elements. 25 Don t Thrash: How to Cache Your Hash in Flash

Powerful and Explainable Write-optimized data structures are very powerful. They are also not hard to teach in a standard algorithms course. 26 Don t Thrash: How to Cache Your Hash in Flash

What the world looks like Insert/point query asymmetry Inserts can be fast: >50K high-entropy writes/sec/disk. Point queries are necessarily slow: <200 high-entropy reads/ sec/disk. We are used to reads and writes having about the same cost, but writing is easier than reading. Writing is easier. Reading is hard. 27 Don t Thrash: How to Cache Your Hash in Flash

The right read optimization is write optimization The right index makes queries run fast. Write-optimized structures maintain indexes efficiently. queries data ingestion 42 answers??? 28 data indexing query processor Don t Thrash: How to Cache Your Hash in Flash

The right read optimization is write optimization The right index makes queries run fast. Write-optimized structures maintain indexes efficiently. Fast writing is a currency we use to accelerate queries. Better indexing means faster queries. queries data ingestion 42 answers??? 28 data indexing query processor Don t Thrash: How to Cache Your Hash in Flash

The right read optimization is write optimization Adding more indexes leads to faster queries. Index maintenance has been notoriously slow. TokuMX MongoDB query I/O load on a production server indexing rate If we can insert faster, we can afford to maintain more indexes (i.e., organize the data in more ways.) Don t Thrash: How to Cache Your Hash in Flash

Summery Slide

Summery Slide The right read optimization is write optimization.

Summery Slide The right read optimization is write optimization. We don t need to trade off ingestion speed, query speed, and data freshness.

Summery Slide The right read optimization is write optimization. How do write-optimized data structures help us with graph queries? We don t need to trade off ingestion speed, query speed, and data freshness.

Summery Slide The right read optimization is write optimization. How do write-optimized data structures help us with graph queries? We don t need to trade off ingestion speed, query speed, and data freshness. Can we use write-optimization on large Sandia machines? There are similar challenges with network and I/O latency.