05. Alternative Storage Models. Architecture of Database Systems I




Introduction
LAST LECTURE: ROW-BASED RECORD MANAGEMENT
- classic N-ary storage model (NSM), also called a row store; NSM is also read as "Normalized Storage Model"
ADVANTAGES
- an entire record can be read with a single page access
- individual attribute values are easy to modify
DISADVANTAGE
- even if only a few attribute values are needed, all attribute values must still be read -> unnecessary I/O
ALTERNATIVES: COLUMN-ORIENTED STORAGE MODELS
- decomposition of an n-ary relation into a set of projections (e.g., binary relations)
- identification (and reconstruction) via a key column or position
[Figure: NSM page organization]

Example
ROW-BASED RECORD MANAGEMENT VERSUS COLUMN-BASED RECORD MANAGEMENT
- row-based: the record is the unit of storage
- column-based: a set of vertical projections
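The following Python sketch makes the I/O argument concrete. It stores a small relation once in row-major (NSM) order and once as one array per attribute, then sums a single attribute in each layout. The relation, the 4-byte integer encoding, and the function names are hypothetical. The "bytes touched" count is only a simplified stand-in for page I/O.

```python
import struct

# Hypothetical relation (id, price, qty); every value is a 4-byte little-endian int.
RECORDS = [(1, 100, 5), (2, 250, 1), (3, 80, 12), (4, 40, 3)]
ROW_FMT = "<iii"

def nsm_layout(records):
    """Row store: the attribute values of each record are stored contiguously."""
    return b"".join(struct.pack(ROW_FMT, *r) for r in records)

def column_layout(records):
    """Column store: one contiguous array per attribute (vertical projections)."""
    return [struct.pack(f"<{len(col)}i", *col) for col in zip(*records)]

def sum_price_nsm(buf):
    """Sum 'price' in the row layout; the scan walks over every record's bytes."""
    total = sum(price for _id, price, _qty in struct.iter_unpack(ROW_FMT, buf))
    return total, len(buf)  # (result, bytes touched)

def sum_price_columns(cols):
    """Sum 'price' in the column layout; only the price array is touched."""
    price_col = cols[1]
    total = sum(v for (v,) in struct.iter_unpack("<i", price_col))
    return total, len(price_col)

print(sum_price_nsm(nsm_layout(RECORDS)))         # (470, 48)
print(sum_price_columns(column_layout(RECORDS)))  # (470, 16)
```

Both layouts return the same result. The row layout touches all three attributes, though, while the column layout reads only a third of the bytes.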

Decomposition Storage Model - DSM
DESCRIPTION
- all values of a column (attribute) are stored consecutively
- addressing via position or a logical ID (surrogate)
- page layout (record consisting of 2 attributes): [figure]
G.P. Copeland, S.F. Khoshafian: A Decomposition Storage Model. In: SIGMOD 1985, pages 268-279
1985: DSM (DECOMPOSITION STORAGE MODEL)
- proposed as an alternative to NSM (Normalized Storage Model)
- decomposes relations vertically
- 2 indexes: clustered on ID, non-clustered on value
- speeds up queries projecting few columns
- disadvantages: storage overhead for storing tuple IDs, expensive tuple reconstruction
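Here is a minimal DSM sketch in Python. It splits a relation into binary (surrogate, value) relations and uses plain dicts in place of the two indexes named on the slide. The relation and all helper names are hypothetical, and the surrogate is simply the row position.

```python
from collections import defaultdict

# Hypothetical relation Person(name, age); the surrogate is the row position.
ROWS = [("Alice", 30), ("Bob", 25), ("Carol", 30)]

def decompose(rows):
    """DSM: split an n-ary relation into n binary relations of (surrogate, value)."""
    arity = len(rows[0])
    return [[(sid, row[a]) for sid, row in enumerate(rows)] for a in range(arity)]

def id_index(binary_rel):
    """Stand-in for the clustered index on the ID: surrogate -> value."""
    return dict(binary_rel)

def value_index(binary_rel):
    """Stand-in for the non-clustered index on the value: value -> surrogates."""
    idx = defaultdict(list)
    for sid, value in binary_rel:
        idx[value].append(sid)
    return dict(idx)

def reconstruct(id_indexes, sid):
    """Tuple reconstruction: one lookup per attribute, joined on the surrogate."""
    return tuple(idx[sid] for idx in id_indexes)

cols = decompose(ROWS)
ids = [id_index(c) for c in cols]
# "Names of all persons aged 30": probe the age value index, then fetch names by ID.
print([ids[0][sid] for sid in value_index(cols[1])[30]])  # ['Alice', 'Carol']
print(reconstruct(ids, 1))                                # ('Bob', 25)
```

The sketch also shows both disadvantages listed on the slide. Every stored value carries its surrogate, and rebuilding a full tuple costs one lookup per attribute.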

Decomposition Storage Model/2
PROPERTIES
- compression is easy (e.g., run-length encoding)
- more efficient scan operations (array operations, better cache usage)
- however: update operations are more complex, and reading all columns of a record is more expensive
- used in read-optimized databases
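Run-length encoding fits DSM well because values of the same attribute sit next to each other. Below is a small Python sketch using `itertools.groupby`. The column contents and function names are illustrative assumptions.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encoding: consecutive equal values become (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the plain column."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

def rle_sum(runs):
    """Aggregate directly on the compressed form: one multiplication per run."""
    return sum(value * length for value, length in runs)

column = [3, 3, 3, 3, 7, 7, 1, 1, 1]
runs = rle_encode(column)
print(runs)           # [(3, 4), (7, 2), (1, 3)]
print(rle_sum(runs))  # 29
```

`groupby` only merges adjacent equal values. The encoding therefore pays off most on sorted or low-cardinality columns.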

Comparison
Row store (NSM):
+ easy to add/modify a record
- might read unnecessary data
Column store (DSM):
+ only need to read in relevant data
- tuple writes require multiple accesses
-> suitable for read-mostly, read-intensive, large data repositories

Comparison/2
Characteristic                   NSM   DSM
Inter-record spatial locality    no    yes
Low record reconstruction cost   yes   no

Partition Attributes Across - PAX
GOALS
- maximize inter-record spatial locality
- incur a minimal record reconstruction cost
APPROACH
- compromise between NSM and DSM
- keep the attribute values of each record on the same page
- use a cache-friendly algorithm for placing attribute values inside the page:
  - vertically partition the records within each page
  - store the values of each attribute together in minipages

PAX-Design
STORAGE DESIGN
Each page is partitioned into n minipages (n = number of attributes in the relation).
Page header:
- pointers to the beginning of each minipage
- free space information
- number of records
- attribute sizes (fixed or variable length)
Minipages:
- F-minipage -> fixed-length attribute values; presence bits indicate whether a record has a value for the attribute (if null, the attribute value is not present)
- V-minipage -> variable-length attribute values, slotted with pointers to each value; null values are denoted by null pointers
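The following is a toy Python model of the PAX page layout described above. It has F-minipages with presence bits, V-minipages with offset pointers (None acting as the null pointer), and a header that records the attribute kinds and record count. All class and method names are hypothetical, and a real page would add a fixed page size and free-space tracking, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FMinipage:
    """F-minipage: fixed-length values plus one presence bit per record."""
    values: List[int] = field(default_factory=list)
    presence: List[bool] = field(default_factory=list)

    def append(self, v: Optional[int]) -> None:
        self.presence.append(v is not None)
        # A null still occupies its fixed-size slot; only the presence bit is cleared.
        self.values.append(0 if v is None else v)

    def get(self, slot: int) -> Optional[int]:
        return self.values[slot] if self.presence[slot] else None


@dataclass
class VMinipage:
    """V-minipage: variable-length values, slotted with (offset, length) pointers."""
    data: bytearray = field(default_factory=bytearray)
    slots: List[Optional[Tuple[int, int]]] = field(default_factory=list)

    def append(self, v: Optional[str]) -> None:
        if v is None:
            self.slots.append(None)  # null value -> null pointer
            return
        raw = v.encode("utf-8")
        self.slots.append((len(self.data), len(raw)))
        self.data.extend(raw)

    def get(self, slot: int) -> Optional[str]:
        ptr = self.slots[slot]
        if ptr is None:
            return None
        offset, length = ptr
        return self.data[offset:offset + length].decode("utf-8")


@dataclass
class PaxPage:
    """A page that keeps all attributes of its records, one minipage per attribute."""
    attr_kinds: List[str]                             # header: 'F' or 'V' per attribute
    minipages: list = field(init=False)
    num_records: int = field(default=0, init=False)   # header: number of records

    def __post_init__(self) -> None:
        self.minipages = [FMinipage() if k == "F" else VMinipage() for k in self.attr_kinds]

    def insert(self, record: tuple) -> None:
        for minipage, value in zip(self.minipages, record):
            minipage.append(value)
        self.num_records += 1

    def scan(self, attr: int) -> list:
        """Scan one attribute: only its minipage is read (DSM-like locality)."""
        mp = self.minipages[attr]
        return [mp.get(i) for i in range(self.num_records)]

    def record(self, slot: int) -> tuple:
        """Reconstruct a record from minipages of the same page (NSM-like cost)."""
        return tuple(mp.get(slot) for mp in self.minipages)


page = PaxPage(["F", "V"])  # e.g. (age INT, name VARCHAR)
for rec in [(42, "Alice"), (None, "Bob"), (7, None)]:
    page.insert(rec)
print(page.scan(0))    # [42, None, 7]
print(page.record(1))  # (None, 'Bob')
```

Reconstruction never leaves the page, which is where PAX keeps NSM's low reconstruction cost.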

Evaluation
[Figures: QUERY PERFORMANCE (READ); BULK-LOADING]

Cache Behavior

History & Development

From DSM to Column-Stores
1985: DECOMPOSITION STORAGE MODEL
LATE 90S TO 2000S: FOCUS ON MAIN MEMORY
SOME COLUMN-STORE SYSTEMS
- MonetDB, C-Store, Sybase IQ, SAP Business Warehouse Accelerator, Infobright, Exasol, X100/VectorWise
PERFORMANCE
- MonetDB
- PAX: Partition Attributes Across
  - retains the NSM I/O pattern
  - optimizes cache-to-RAM communication
2005: THE (RE)BIRTH OF COLUMN-STORES
- new hardware and application realities:
  - faster CPUs, larger memories, disk bandwidth
  - multi-terabyte data warehouses
- new approach: combine several techniques:
  - read-optimized, fast multi-column access, disk/CPU efficiency, lightweight compression
- used in read-oriented environments (OLAP)

Application Characteristics
OLTP (ON-LINE TRANSACTION PROCESSING)
- mix of read-only and update queries
- minor analysis tasks
- used for data preservation and lookup
- typically reads only a few records at a time
- high performance by storing contiguous records in disk pages
OLAP (ON-LINE ANALYTICAL PROCESSING)
- query-intensive DBMS applications
- infrequent, batch-oriented updates
- complex analysis on large data volumes
- typically reads only a few attributes of large amounts of historical data in order to partition them and compute aggregates
- high performance by storing contiguous values of a single attribute
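The two access patterns can be contrasted in a short Python sketch. It runs an OLTP-style point lookup of a whole record against a row-oriented copy, and an OLAP-style "partition and aggregate" query that reads only two columns. The sales table and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales table: (id, region, year, amount).
ROWS = [(1, "EU", 2012, 100), (2, "US", 2012, 250), (3, "EU", 2013, 80), (4, "US", 2013, 40)]
ATTRS = ("id", "region", "year", "amount")

# Row-oriented copy keyed by id (suits OLTP point access to whole records).
ROW_STORE = {row[0]: row for row in ROWS}
# Column-oriented copy: one list per attribute (suits OLAP scans).
COLUMN_STORE = {name: list(values) for name, values in zip(ATTRS, zip(*ROWS))}

def oltp_lookup(key):
    """OLTP style: fetch one complete record."""
    return ROW_STORE[key]

def olap_sum_by(group_attr, measure_attr):
    """OLAP style: partition by one attribute and aggregate another;
    only these two columns are read."""
    totals = defaultdict(int)
    for group, value in zip(COLUMN_STORE[group_attr], COLUMN_STORE[measure_attr]):
        totals[group] += value
    return dict(totals)

print(oltp_lookup(3))                   # (3, 'EU', 2013, 80)
print(olap_sum_by("region", "amount"))  # {'EU': 180, 'US': 290}
```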

Hardware Development - Memory Wall
HARDWARE IMPROVEMENTS ARE NOT EQUALLY DISTRIBUTED
- advances in CPU speed have outpaced advances in RAM latency
CACHE MEMORIES CAN REDUCE MEMORY LATENCY WHEN THE REQUESTED DATA IS FOUND IN THE CACHE
- vertically fragmented data structures optimize cache usage
- main-memory access has become a performance bottleneck for many computer applications:
  - bandwidth
  - latency
  - address translation (TLB)
-> Memory Wall

Speed in Relation...

Memory Performance Comparison

The Role of Caches
CACHES: THE SUNNY SIDE
- memory is physically accessed at cache-line granularity, e.g., 64 bytes
- sequential memory access: [figure]

The Role of Caches
CACHES: THE BAD SIDE
- memory is physically accessed at cache-line granularity, e.g., 64 bytes
- random memory access: cache miss
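Since memory moves in whole cache lines, the cost of an access pattern can be estimated by counting the distinct lines it touches. The Python sketch below does this for 1000 sequential versus 1000 scattered 4-byte reads. The element size and the one-line stride used as a deterministic stand-in for random access are assumptions, and hardware prefetching is not modeled.

```python
LINE_SIZE = 64  # bytes per cache line (the slide's example value)
ELEM_SIZE = 4   # assumption: 4-byte integers

def lines_touched(indices, elem_size=ELEM_SIZE, line_size=LINE_SIZE):
    """Distinct cache lines covered by accessing the given element indices."""
    return len({(i * elem_size) // line_size for i in indices})

def useful_fraction(indices, elem_size=ELEM_SIZE, line_size=LINE_SIZE):
    """Share of transferred bytes that the program actually uses."""
    indices = list(indices)
    transferred = lines_touched(indices, elem_size, line_size) * line_size
    return len(indices) * elem_size / transferred

sequential = range(1000)              # 1000 neighbouring ints
scattered = range(0, 16 * 1000, 16)   # 1000 ints, each on its own cache line
print(lines_touched(sequential), lines_touched(scattered))  # 63 1000
print(useful_fraction(scattered))                           # 0.0625
```

In the scattered case, only 4 of every 64 transferred bytes are used.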

The Role of Caches
CACHES: THE UGLY
- memory is physically accessed at cache-line granularity, e.g., 64 bytes
- writes effectively turn into read-modify-write:
  - many memory addresses map into the same cache line(s)
  - a dirty cache line needs to be evicted before a new one loads
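The "ugly" effects can be reproduced with a toy set-associative cache in Python. Writing one value still loads a full line, addresses at a fixed stride collide in the same set, and evicting a dirty line forces a write-back. The parameters (32 KiB, 8-way, 64-byte lines, hence 64 sets), the write-back/write-allocate policy, LRU replacement, and the class name are all assumptions for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64
WAYS = 8
NUM_SETS = 64  # assumption: 32 KiB / (64 B * 8 ways) = 64 sets

class SetAssociativeCache:
    """Toy write-back, write-allocate cache with LRU replacement per set."""

    def __init__(self, num_sets=NUM_SETS, ways=WAYS, line_size=LINE_SIZE):
        self.sets = [OrderedDict() for _ in range(num_sets)]  # tag -> dirty flag
        self.ways = ways
        self.line_size = line_size
        self.line_loads = 0   # lines read from memory
        self.writebacks = 0   # dirty lines written back on eviction

    def access(self, addr, write=False):
        line_no = addr // self.line_size
        cache_set = self.sets[line_no % len(self.sets)]  # many addresses share a set
        tag = line_no // len(self.sets)
        if tag in cache_set:
            cache_set.move_to_end(tag)  # hit: mark as most recently used
        else:
            if len(cache_set) == self.ways:
                _, dirty = cache_set.popitem(last=False)  # evict the LRU line
                if dirty:
                    self.writebacks += 1  # dirty line must go back to memory first
            # Miss: even a write first loads the whole line (read-modify-write).
            cache_set[tag] = False
            self.line_loads += 1
        if write:
            cache_set[tag] = True

cache = SetAssociativeCache()
stride = NUM_SETS * LINE_SIZE  # 4096: every address below maps to set 0
for k in range(WAYS + 1):
    cache.access(k * stride, write=True)  # write a single value per line
print(cache.line_loads, cache.writebacks)  # 9 1
```

Nine writes to nine distinct lines cost nine line loads. The ninth write also forces one dirty eviction, because all of these addresses map into the same 8-way set.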


Is Memory the New Disk?
IS MEMORY THE NEW DISK IN TERMS OF BEHAVIOR? -> NOT QUITE
- some characteristics are very similar, e.g., random vs. sequential access
- the memory architecture complicates things!

Architecture of Commercial Products

Vertica
VERTICA ANALYTIC DATABASE
- DBMS optimized for next-generation data warehousing (OLAP)
- hybrid store consisting of two distinct storage structures:
  - WOS (Write-Optimized Store): fits into main memory and is designed to efficiently support insert and update operations; the WOS is unsorted and uncompressed
  - ROS (Read-Optimized Store): holds the bulk of the data; sorted and compressed, which makes it efficient to read and query
TUPLE MOVER
- moves data out of the WOS and into the ROS
STRUCTURE
- WOS and ROS are organized as DSM
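Below is a toy Python model of the hybrid WOS/ROS design. Inserts append to an unsorted, uncompressed column-wise buffer. A tuple-mover step sorts that buffer, run-length encodes each column, and appends the result as a new read-optimized container. The class, its methods, and the choice of RLE are hypothetical simplifications, not Vertica's API or on-disk format.

```python
from itertools import groupby

class HybridStore:
    """Toy WOS/ROS store; names and behavior are illustrative, not Vertica's API."""

    def __init__(self, num_cols, sort_col=0):
        self.wos = [[] for _ in range(num_cols)]  # column-wise, unsorted, uncompressed
        self.ros = []          # ROS containers: one list of RLE runs per column
        self.sort_col = sort_col

    def insert(self, row):
        """Inserts only append to the WOS, which keeps writes cheap."""
        for column, value in zip(self.wos, row):
            column.append(value)

    def tuple_mover(self):
        """Move WOS contents into a new sorted, RLE-compressed ROS container."""
        if not self.wos[0]:
            return
        rows = sorted(zip(*self.wos), key=lambda r: r[self.sort_col])
        container = [[(v, len(list(run))) for v, run in groupby(col)] for col in zip(*rows)]
        self.ros.append(container)
        self.wos = [[] for _ in self.wos]

    def scan_column(self, i):
        """Queries must read both stores: decoded ROS runs plus the WOS tail."""
        out = []
        for container in self.ros:
            for value, length in container[i]:
                out.extend([value] * length)
        out.extend(self.wos[i])
        return out

store = HybridStore(num_cols=2)
for row in [("US", 250), ("EU", 100), ("EU", 80)]:
    store.insert(row)
store.tuple_mover()
store.insert(("US", 40))
print(store.ros[0][0])       # [('EU', 2), ('US', 1)]
print(store.scan_column(1))  # [100, 80, 250, 40]
```

Sorting before compression is what makes the runs long. The price is that every query has to merge the compressed ROS with the still-unsorted WOS.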

SAP HANA
ARCHITECTURE (2012)

SAP HANA Column Store
MAIN AND DELTA STORE
- main store: main part of the data; compressed
- delta store: all data changes are written here; basic compression, optimized for write access
MERGE PROCESS
- moves data from the delta store to the main store

SAP HANA Column Store/2
THE DELTA MERGE OPERATION
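The slides state only that the main store is compressed and the delta store is write-optimized. The Python sketch below assumes dictionary encoding for both: a sorted dictionary in the main store and an append-only, unsorted dictionary in the delta store, as is commonly described for in-memory column stores. Under that assumption it shows a merge that rebuilds the main store. All names are hypothetical, and re-sorting the whole column is a simplification of a real merge.

```python
from bisect import bisect_left

class MainStore:
    """Read-optimized: sorted dictionary plus one integer value ID per row."""

    def __init__(self, values=()):
        values = list(values)
        self.dictionary = sorted(set(values))
        self.value_ids = [bisect_left(self.dictionary, v) for v in values]

    def values(self):
        return [self.dictionary[vid] for vid in self.value_ids]


class DeltaStore:
    """Write-optimized: unsorted, append-only dictionary, so inserts never re-encode."""

    def __init__(self):
        self.dictionary = []
        self.lookup = {}      # value -> value ID, for cheap inserts
        self.value_ids = []

    def insert(self, value):
        vid = self.lookup.get(value)
        if vid is None:
            vid = len(self.dictionary)
            self.dictionary.append(value)
            self.lookup[value] = vid
        self.value_ids.append(vid)

    def values(self):
        return [self.dictionary[vid] for vid in self.value_ids]


def read_column(main, delta):
    """Until the merge runs, every read must combine main and delta."""
    return main.values() + delta.values()


def delta_merge(main, delta):
    """Rebuild a main store over main + delta and start with an empty delta."""
    return MainStore(read_column(main, delta)), DeltaStore()


main, delta = MainStore(["b", "a", "b"]), DeltaStore()
delta.insert("c")
delta.insert("a")
main, delta = delta_merge(main, delta)
print(main.dictionary, main.value_ids)  # ['a', 'b', 'c'] [1, 0, 1, 2, 0]
```

A sorted dictionary keeps the main store compact and range-searchable, but a new value could shift every existing value ID. That is why inserts go to the delta store and the expensive re-encoding is batched into the merge.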