In-Memory Performance for Big Data



Similar documents
In-Memory Performance for Big Data

Storage in Database Systems. CMPSCI 445 Fall 2010

IN-MEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: In-Memory DBMS - SoSe

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation

Enhancing Recovery Using an SSD Buffer Pool Extension

The Oracle Universal Server Buffer Manager

Virtuoso and Database Scalability

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Performance Baseline of Hitachi Data Systems HUS VM All Flash Array for Oracle

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

NVRAM-aware Logging in Transaction Systems. Jian Huang

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE

Maximizing SQL Server Virtualization Performance

Performance Tuning and Optimizing SQL Databases 2016

Object Oriented Database Management System for Decision Support System.

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

MySQL performance in a cloud. Mark Callaghan

Anti-Caching: A New Approach to Database Management System Architecture

In-Memory Databases MemSQL

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK

Oracle TimesTen: An In-Memory Database for Enterprise Applications

In-Memory Data Management for Enterprise Applications

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Shore-MT: A Scalable Storage Manager for the Multicore Era

TECHNICAL OVERVIEW HIGH PERFORMANCE, SCALE-OUT RDBMS FOR FAST DATA APPS RE- QUIRING REAL-TIME ANALYTICS WITH TRANSACTIONS.

Application-Tier In-Memory Analytics Best Practices and Use Cases

Performance Baseline of Oracle Exadata X2-2 HR HC. Part II: Server Performance. Benchware Performance Suite Release 8.4 (Build ) September 2013

Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud

A Comparison of Oracle Performance on Physical and VMware Servers

FPGA-based Multithreading for In-Memory Hash Joins

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

<Insert Picture Here> Oracle In-Memory Database Cache Overview

Agility Database Scalability Testing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture

A Deduplication File System & Course Review

QuickDB Yet YetAnother Database Management System?

The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft

Distributed Architecture of Oracle Database In-memory

SQL Server 2014 Optimization with Intel SSDs

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008

System Architecture. In-Memory Database

The Classical Architecture. Storage 1 / 36

Solving Performance Problems In SQL Server by Michal Tinthofer

Birds of a Feather Session: Best Practices for TimesTen on Exalytics

Storage Class Memory Aware Data Management

SAP HANA In-Memory Database Sizing Guideline

A Comparison of Oracle Performance on Physical and VMware Servers

In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki

COS 318: Operating Systems

VMware vcenter 4.0 Database Performance for Microsoft SQL Server 2008

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Performance and Tuning Guide. SAP Sybase IQ 16.0

Enhancing SQL Server Performance

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

Accelerating Server Storage Performance on Lenovo ThinkServer

Boost SQL Server Performance Buffer Pool Extensions & Delayed Durability

SQL Server Transaction Log from A to Z

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

Scaling from Datacenter to Client

Flash-Based Extended Cache for Higher Throughput and Faster Recovery

Optimizing the Performance of Your Longview Application

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

SCALABLE DATA SERVICES

Safe Harbor Statement

Databases Acceleration with Non Volatile Memory File System (NVMFS) PRESENTATION TITLE GOES HERE Saeed Raja SanDisk Inc.

Physical Data Organization

DataBlitz Main Memory DataBase System

SQL Server 2012 Optimization, Performance Tuning and Troubleshooting

Parallel Processing and Software Performance. Lukáš Marek

Solid State Storage in Massive Data Environments Erik Eyberg

A client side persistent block cache for the data center. Vault Boston Luis Pabón - Red Hat

Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays

Accelerate SQL Server 2014 AlwaysOn Availability Groups with Seagate. Nytro Flash Accelerator Cards

Write Amplification: An Analysis of In-Memory Database Durability Techniques

Best Practices for Optimizing SQL Server Database Performance with the LSI WarpDrive Acceleration Card

System Requirements Table of contents

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Database Server Configuration Best Practices for Aras Innovator 10

Parallel Replication for MySQL in 5 Minutes or Less

Memory-Centric Database Acceleration

Benchmarking Hadoop & HBase on Violin

arxiv: v1 [cs.db] 12 Sep 2014

Speeding Up Cloud/Server Applications Using Flash Memory

Oracle9i Release 2 Database Architecture on Windows. An Oracle Technical White Paper April 2003

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Tempesta FW. Alexander Krizhanovsky NatSys Lab.

Fusion iomemory iodrive PCIe Application Accelerator Performance Testing

Enabling Efficient OS Paging for Main-Memory OLTP Databases

Top 10 reasons your ecommerce site will fail during peak periods

Energy-aware Memory Management through Database Buffer Control

Data Management in the Cloud

PUBLIC Performance Optimization Guide

SQL Server Performance Tuning and Optimization

Oracle Rdb Performance Management Guide

DATABASE ENGINES ON MULTICORES SCALE: A PRACTICAL APPROACH. João Soares and Nuno Preguiça

Transcription:

In-Memory Performance for Big Data Goetz Graefe, Haris Volos, Hideaki Kimura, Harumi Kuno, Joseph Tucek, Mark Lillibridge, Alistair Veitch VLDB 2014, presented by Nick R. Katsipoulakis

A Preliminary Experiment B-Tree nodes 10GB of Memory Buffer pool Disk pages In-Memory Direct pointers between nodes

Related Work In-memory databases Workload fits e.g. Oracle TimesTen, SQL Server Hekaton, MonetDB, SAP Hana, VoltDB etc. Workload does not fit OS VM layer Poor eviction decisions Data integrity issues Compression (frozen data) Identify hot and cold data Stoica and Ailamaki work on VoltDB Decrease statistic cost Anti-Caching

Motivation Combine best of both worlds Near in-memory performance (workload fits) Buffer-pool performance (workload does not fit) Buffer Pool Benefits large working sets support for write-ahead logging Insulation from cache-coherence issues Drawbacks: Level of indirection

But first, the System Model Transactional Storage Manager ACID guarantees Modern hardware (multi-core architecture) Data Storage B-Tree (one node to one disk page) Leaf nodes maintain data Buffer pool copies of pages Latches and Locks Write-ahead Logging

A flashback at Harizopoulos et al. observation Dataset in-memory Observations Buffer manager takes up ~30% of both instructions and cycles total Idea Faster buffer pool Correctness guarantees

A closer look the source of all evil Hash table 3 - Key 1 maps to p_id_1 1 - lookup(key 1, root) 5 - pin(p_id_1, &buffer) the disk 2 - H(root) 6 H(p_id_1) 4 fetch(d_mem_1) p_id_1 mapping structure long pid_to_mem(long pid) { return mem_addr; } long mem_to_pid(long mem_addr) { return page_id; }

Their proposal for improving the buffer pool Decrease buffer pool overhead Remove the accesses to the common mapping structure Pointer swizzling lazy not all page-ids are swizzled Contribution Buffer pool re-design. Support pointer (un-)swizzling Eviction policy

But, wait. What about Virtual Memory? Correctness requirements might be violated write too early e.g. write a page before the log has concluded write too late e.g. miss a checkpoint because a dirty page have not been written to the backing store recycle non-persistent logs e.g. log page is recycled by the OS VM manager, but, changes have not yet been persisted to actual storage msync() & mlock() do not support: asynchronous read-ahead concurrent multiple writes

A look at traditional B-Tree Nodes and the buffer pool

Flow-charts for locating pages Traditional buffer pool: In-memory:

Proposed buffer-pool design with pointer swizzling Buffer Pool Flow-chart

Proposed design with swizzling Pointers are swizzled one at a time Not all pointers are swizzled Pool eviction Generalized clock scheme Sweep B-Tree using depth-first search Pages with no recent usage are un-swizzled unless they contain swizzled parent-to-child pointers Child-to-parent pointers Expedite un-swizzling Include parent-frame in metadata

Experimental Evaluation Shore-MT pointer-swizzling buffer pool traditional buffer pool in-memory Testbed: Intel Xeon (4 socket, 24 cores), 256 GB Ram, RAID-10 with 10K rpm drives 10GB Buffer pool with O_DIRECT enabled 100GB database size Key size 20 bytes Value size 20 bytes

Buffer pool performance Query performance

Buffer pool performance Insert performance 24 threads 50 million records initially 10 million records Randomly chosen keys

Buffer pool performance - Drifting working set

TPC-C Benchmark

Conclusion - Thoughts A way to combine the best-of-both worlds In-memory performance (workload fits) Buffer pool performance (workload does not fit) Questions Was it really the mapping-data-structure the bottleneck? If a NVM database was used, is pointer-swizzling the answer? Do we still need a buffer manager, or do we need a general memory manager? Thank you!