Technical Deep-Dive in a Column-Oriented In-Memory Database



Similar documents
In-Memory Data Management for Enterprise Applications

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

IN-MEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: In-Memory DBMS - SoSe

A NEW ARCHITECTURE FOR ENTERPRISE APPLICATION SOFTWARE BASED ON IN-MEMORY DATABASES

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

Lecture Data Warehouse Systems

The Impact of Columnar In-Memory Databases on Enterprise Systems

Data Management in SAP Environments

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Semplicità ed Innovazione a portata di mano

Chapter 2 Why Are Enterprise Applications So Diverse?

In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki

Oracle Database In-Memory The Next Big Thing

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

BBM467 Data Intensive ApplicaAons

Inge Os Sales Consulting Manager Oracle Norway

Apache Kylin Introduction Dec 8,

Hypertable Architecture Overview

The SAP HANA Database An Architecture Overview

2009 Oracle Corporation 1

SAP HANA Reinventing Real-Time Businesses through Innovation, Value & Simplicity. Eduardo Rodrigues October 2013

DKDA 2012 and the Impact of In-Memory Database Algorithms

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

SanssouciDB: An In-Memory Database for Processing Enterprise Workloads

A Deduplication File System & Course Review

Crack Open Your Operational Database. Jamie Martin September 24th, 2013

Physical Data Organization

Oracle MulBtenant Customer Success Stories

Lecture Data Warehouse Systems

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

DATA WAREHOUSING II. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23

Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley

Innovative technology for big data analytics

The Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing

Welcome to an introduction to SAP Business One.

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

Performance Verbesserung von SAP BW mit SQL Server Columnstore

6. Storage and File Structures

Benchmarking Cassandra on Violin

DATA WAREHOUSING AND OLAP TECHNOLOGY

Rethinking SIMD Vectorization for In-Memory Databases

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

Driving MySQL to Big Data Scale. Thomas Hazel Founder, Chief

The Arts & Science of Tuning HANA models for Performance. Abani Pattanayak, SAP HANA CoE Nov 12, 2015

Columnstore in SQL Server 2016

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Data Warehousing With DB2 for z/os... Again!

BW-EML SAP Standard Application Benchmark

In-memory databases and innovations in Business Intelligence

Oracle InMemory Database

Coldbase - A Column-Oriented In-Memory Database

Parquet. Columnar storage for the people

Integrating Apache Spark with an Enterprise Data Warehouse

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

In-Memory Databases MemSQL

Toronto 26 th SAP BI. Leap Forward with SAP

SQL Server 2012 Performance White Paper

Aggregates Caching in Columnar In-Memory Databases

Performance and Scalability Overview

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Big Data Analytics Using SAP HANA Dynamic Tiering Balaji Krishna SAP Labs SESSION CODE: BI474

Main Memory Data Warehouses

SAP BW Columnstore Optimized Flat Cube on Microsoft SQL Server

The New Economics of SAP Business Suite powered by SAP HANA SAP AG. All rights reserved. 2

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Exploring the Synergistic Relationships Between BPC, BW and HANA

Performance Tuning for the Teradata Database

Oracle Big Data SQL Technical Update

Binary search tree with SIMD bandwidth optimization using SSE

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Cache Conscious Column Organization in In-Memory Column Stores

Oracle Database 12c Plug In. Switch On. Get SMART.

SAP HANA SPS 09 - What s New? SAP HANA Scalability

Cloud Computing at Google. Architecture

In-memory computing with SAP HANA

Next Generation Data Warehouse and In-Memory Analytics

Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems

Efficient Compression Techniques for an In Memory Database System

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services

Infrastructure Matters: POWER8 vs. Xeon x86

How To Handle Big Data With A Data Scientist

ORACLE DATABASE 12C IN-MEMORY OPTION

A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database

The Data Warehouse ETL Toolkit

Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Introducing Oracle Exalytics In-Memory Machine

Big data big talk or big results?

Storage and File Structure

MS SQL Performance (Tuning) Best Practices:

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

Raima Database Manager Version 14.0 In-memory Database Engine

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

SAP HANA Live for SAP Business Suite. David Richert Presales Expert BI & EIM May 29, 2013

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

Real-Time Analytical Processing with SQL Server

Transcription:

Technical Deep-Dive in a Column-Oriented In-Memory Database Martin Faust Martin.Faust@hpi.uni-potsdam.de Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam

Enterprise Platform and Integration Concepts Group - Topics Research Group of Prof. Dr. h.c. Hasso Plattner Research focuses on the technical aspects of enterprise software and design of complex applications In-Memory Data Management for Enterprise Applications Human-Centered Software Design and Engineering Programming Model for Enterprise Applications Maintenance and Evolution of Service-Oriented Enterprise Software RFID Technology in Enterprise Platforms Teaching activities: Lectures, Exercises, Seminars Bachelor/Master projects Master thesis 2

The Design Thinking Methodology Needs & Opportunities Desirability Human Factors Viability Business Factors Solutions Feasibility Technical Factors 3

Setup Enables Leading Research in the Field of In- Memory Data Management End-Users Software Hardware Supply Chain Innovation Database Technology Massachusetts Institute of Technology Demand Prediction Hasso Plattner Institute, Enterprise Systems & Applications Chair UC Berkeley Cloud Computing Stanford University In-Memory Data Management Design Research

Learning Map of our Online Lecture @ openhpi.de The Future of Enterprise Compu'ng In- Memory Database Operators Founda'ons of Database Storage Techniques Advanced Database Storage Tech- niques Founda'ons for a New Enterprise Applica'on Development Era

Goals Deep technical understanding of a column- oriented, dicaonary- encoded in- memory database and its applicaaon in enterprise compuang Chapters The future of enterprise compuang FoundaAons of database storage techniques In- memory database operators Advanced database storage techniques ImplicaAons on ApplicaAon Development 6

Chapter 1: The Future of Enterprise Computing

New Requirements for Enterprise Computing Sensors Events Structured + unstructured data Social networks... 8

Enterprise Application Characteristics

OLTP vs. OLAP Online TransacAon Processing Online AnalyAcal Processing Modern enterprise resource planning (ERP) systems are challenged by mixed workloads, including OLAP- style queries. For example: OLTP- style: create sales order, invoice, accounang documents, display customer master data or sales order OLAP- style: dunning, available- to- promise, cross selling, operaaonal reporang (list open sales orders) But: Today s data management systems are opamized either for daily transac'onal or analy'cal workloads storing their data along rows or columns 10

Drawbacks of the Separation OLAP systems do not have the latest data OLAP systems only have predefined subset of the data Cost- intensive ETL processes have to sync both systems SeparaAon introduces data redundancy Different data schemas introduce complexity for applicaaons combining sources 11

Enterprise Workloads are Read Dominated Workload in enterprise applicaaons consatutes of Mainly read queries (OLTP 83%, OLAP 94%) Many queries access large sets of data Workload 100% 90 % 80 % 70 % 60 % 50 % 40 % 30 % 20 % Write: Delete Modification Insert Read: Range Select Table Scan Lookup Workload 100% 90 % 80 % 70 % 60 % 50 % 40 % 30 % 20 % Write: Delete Modification Insert Read: Select 10 % 10 % 12 0 % OLTP OLAP 0 % TPC-C

Few Updates in OLTP Percentage of rows updated 13

Vision Combine OLTP and OLAP data using modern hardware and database systems to create a single source of truth, enable real- 'me analy'cs and simplify applicaaons and database structures. AddiAonally, ExtracAon, transformaaon, and loading (ETL) processes Pre- computed aggregates and materialized views become (almost) obsolete. 14

Enterprise Data Characteristics Many columns are not used even once Many columns have a low cardinality of values NULL values/default values are dominant Sparse distribuaon facilitates high compression Standard enterprise so`ware data is sparse and wide. 15

Many Columns are not Used Even Once 55% unused columns per company in average 40% unused columns across all 12 analyzed companies % of Columns 80% 70% 60% 50% 40% 30% 64% 78% Inventory Management Financial Accounting 20% 24% 10% 0% 12% 9 % 13% 1-32 33-1023 1024-100000000 Number of Distinct Values 16

Changes in Hardware

Changes in Hardware give an opportunity to re- think the assump'ons of yesterday because of what is possible today. MulA- Core Architecture (128 cores per server) Large main memories: 2TB / blade One blade ~$50.000 = 1 enterprise class server Parallel scaling across blades A 64- bit address space 12TB in current servers Cost- performance raao rapidly declining Memory hierarchies Main Memory becomes cheaper and larger 18

In the Meantime Research has come up with several advances in sorware for processing data Column- oriented data organizaaon (the column store) Sequen'al scans allow best bandwidth ualizaaon between CPU cores and memory Independence of tuples allows easy paraaoning and therefore parallel processing + Lightweight Compression Reducing data amount Increasing processing speed through late materializaaon And more, e.g., parallel scan/join/aggregaaon 19

A Blueprint of SanssouciDB

Merge SanssouciDB: An In-Memory Database for Enterprise Applications In- Memory Database (IMDB) Data resides permanently in main memory Main Memory is the primary persistence SAll: logging and recovery to/from flash Main memory access is the new botleneck Cache- conscious algorithms/ data structures are crucial (locality is king) Interface Services and Session Management Query Execution Metadata TA Manager Active Data Main Store Column Column Data aging Combined Column Time travel Passive Data (History) Differential Store Column Column Combined Column Logging Snapshots Indexes Inverted Object Data Guide Recovery Log Distribution Layer at Blade Server i i Main Memory at Blade Server i i Non-Volatile Memory

Chapter 2: Foundations of Database Storage Techniques

Learning Map of our Online Lecture @ openhpi.de The Future of Enterprise Compu'ng In- Memory Database Operators Founda'ons of Database Storage Techniques Advanced Database Storage Tech- niques Founda'ons for a New Enterprise Applica'on Development Era

Data Layout in Main Memory

Memory Basics (1) Memory in today s computers has a linear address layout: addresses start at 0x0 and go to 0xFFFFFFFFFFFFFFFF for 64bit Each process has its own virtual address space Virtual memory allocated by the program can distribute over mulaple physical memory locaaons Address translaaon is done in hardware by the CPU 25

Memory Basics (2)... 0x0 0xFFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF Memory layout is only linear Every higher- dimensional access (like two- dimensional database tables) is mapped to this linear band 26

Memory Hierarchy Nehalem Quadcore Core 0 Core 1 Core 2 Core 3 TLB Nehalem Quadcore Core 1 Core 2 Core 3 Core 0 TLB L1 Cacheline L1 L1 L2 Cacheline L2 L2 L3 Cacheline L3 Cache QPI QPI L3 Cache Memory Page Main Memory Main Memory 27

Physical Data Representation Row store: Rows are stored consecuavely OpAmal for row- wise access (e.g. SELECT *) Column store: Columns are stored consecuavely OpAmal for akribute focused access (e.g. SUM, GROUP BY) Note: concept is independent from storage type + Row-Store Column-store Row 1 Doc Num Doc Date Sold- To Value Sales Status Org Row 2 Row 3 Row 4 28

Row Data Layout Data is stored tuple- wise Leverage co- locaaon of akributes for a single tuple Low cost for tuple reconstrucaon, but higher cost for sequenaal scan of a single akribute Column Operation A B C A B C A B C A B C Row Row Row Row Operation A B C A B C A B C A B C 29 Row Row Row Row

Columnar Data Layout Data is stored akribute- wise Leverage sequenaal scan- speed in main memory for predicate evaluaaon Tuple reconstrucaon is more expensive Column Operation A A A A B B B B C C C C Column Column Column Row Operation A A A A B B B B C C C C 30 Column Column Column

Row-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 31 A4 B4 C4

Row-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 32 A4 B4 C4

Row-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 33 A4 B4 C4

Row-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 34 A4 B4 C4

Row-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 35

Column-oriented storage A1 B1 C1 A2 B2 C2 A3 B3 C3 36 A4 B4 C4

Column-oriented storage A1 A2 A3 A4 B1 B2 B3 C1 C2 C3 37 B4 C4

Column-oriented storage A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 38 C4

Column-oriented storage A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 39

Dictionary Encoding

Motivation Main memory access is the new bokleneck Idea: Trade CPU Ame to compress and decompress data Compression reduces number of memory accesses Leads to less cache misses due to more informaaon on a cache line OperaAon directly on compressed data possible Offseong with bit- encoded fixed- length data types Based on limited value domain 41

Dictionary Encoding Example 8 billion humans Akributes first name last name gender country city birthday à 200 byte Each akribute is stored dicaonary encoded 42

Sample Data rec ID fname lname gender city country birthday 39 John Smith m Chicago USA 12.03.1964 40 Mary Brown f London UK 12.05.1964 41 Jane Doe f Palo Alto USA 23.04.1976 42 John Doe m Palo Alto USA 17.06.1952 43 Peter Schmidt m Potsdam GER 11.11.1975 43

Dictionary Encoding a Column A column is split into a dicaonary and an akribute vector DicAonary stores all disanct values with implicit value ID Akribute vector stores value IDs for all entries in the column PosiAon is implicit, not stored explicitly Enables offseong with fixed- length data types Rec ID fname Dic'onary for fname ATribute Vector for fname Value ID Value posi'on Value ID 39 John 40 Mary 23 John 39 23 41 Jane 24 Mary 40 24 42 John 25 Jane 41 25 43 Peter 26 Peter 42 23 43 26 44

Sorted Dictionary DicAonary entries are sorted either by their numeric value or lexicographically DicAonary lookup complexity: O(log(n)) instead of O(n) DicAonary entries can be compressed to reduce the amount of required storage SelecAon criteria with ranges are less expensive (order- preserving dicaonary) 45

Data Size Examples Column Cardi- nality Bits Needed Item Size Plain Size Size with Dic'onary (Dic'onary + Column) Compression Factor First name Last name 5 million 23 bit 50 Byte 373 GB 238.4 MB + 21.4 GB 17 8 million 23 bit 50 Byte 373 GB 381.5 MB + 21.4 GB 17 Gender 2 1 bit 1 Byte 7 GB 2 bit + 953.7 MB 8 City 1 million 20 bit 50 Byte 373 GB 47.7 MB + 18.6 GB 20 Country 200 8 bit 47 Byte 350 GB 9.2 KB + 7.5 GB 47 Birthday 40,000 16 bit 2 Byte 15 GB 78.1 KB + 14.9 GB 1 Totals 200 Byte 1.6 TB 92 GB 17 46

Chapter 3: In-Memory Database Operators

Learning Map of our Online Lecture @ openhpi.de The Future of Enterprise Compu'ng In- Memory Database Operators Founda'ons of Database Storage Techniques Advanced Database Storage Tech- niques Founda'ons for a New Enterprise Applica'on Development Era

Scan Performance (1) 8 billion humans Akributes First Name Last Name Gender Country City Birthday è 200 byte QuesAon: How many men/women? Assumed scan speed: 2 MB/ms/core 49

Scan Performance (2) Row Store Layout Table: humans Row 1 Row 2 Row 3... First Name Last Name Country Gender City Birthday Table size = 8 billion tuples x 200 bytes per tuple à ~1.6 TB Scan through all rows with 2 MB/ms/core à ~800 seconds with 1 core Row 8 x 10 9 50

Scan Performance (3) Row Store Full Table Scan Table: humans Row 1 Row 2 Row 3... Last Name Country First Name Gender City Birthday Table size = 8 billion tuples x 200 bytes per tuple à ~1.6 TB Scan through all rows with 2 MB/ms/core à ~800 seconds with 1 core Row 8 x 10 9 51 Data loaded and used Data loaded but not used

Scan Performance (4) Row Store Stride Access Gender Table: humans Row 1 Row 2 Row 3 Last Name Country First Name Gender City Birthday 8 billion cache accesses à 64 byte à ~512 GB Read with 2 MB/ms/core à ~256 seconds with 1 core... Row 8 x 10 9 52 Data not loaded Data loaded and used Data loaded but not used

Scan Performance (5) Column Store Layout Table: humans Last Name First Name Gender Country City Birthday Table size Attribute vectors: ~91 GB Dictionaries: ~700 MB à Total: ~92 GB Compression factor: ~17 53

Scan Performance (6) Column Store Full Column Scan on Gender Table: humans Last Name First Name Gender Country City Birthday Size of attribute vector gender = 8 billion tuples x 1 bit per tuple à ~1 GB Scan through attribute vector with 2 MB/ms/core à ~0.5 seconds with 1 core 54 Data not loaded Data loaded and used Data loaded but not used

Scan Performance (7) Column Store Full Column Scan on Birthday Table: humans Last Name First Name Gender Country City Birthday Size of attribute vector birthday = 8 billion tuples x 2 Byte per tuple à ~16 GB Scan through column with 2 MB/ms/core à ~8 seconds with 1 core 55 Data not loaded Data loaded and used Data loaded but not used

Scan Performance Summary How many women, how many men? Time in seconds Column Store Full table scan Row Store Stride access 0.5 800 256 1,600x slower 512x slower 56

Tuple Reconstruction

Tuple Reconstruction (1) Accessing a record in a row store Table: world_populaaon All akributes are stored consecuavely 200 byte à 4 cache accesses à 64 byte à 256 byte Read with 2 MB/ms/core à ~0.128 μs with 1 core 58

Tuple Reconstruction (2) Virtual record IDs Table: world_populaaon All akributes are stored in separate columns Implicit record IDs are used to reconstruct rows 59

Tuple Reconstruction (3) Virtual record IDs Table: world_populaaon 1 cache access for each akribute 6 cache accesses à 64 byte à 384 byte Read with 2 MB/ms/core à ~0.192 μs with 1 core 60

Select

SELECT Example SELECT fname, lname FROM world_population WHERE country="italy" and gender="m" fname lname country gender Gianluigi Buffon Italy m Lena Gercke Germany f Mario Balotelli Italy m Manuel Neuer Germany m Lukas Podolski Germany m Klaas- Jan Huntelaar Netherlands m 62

Query Plan MulAple plans exist to execute query Query OpAmizer decides which is executed Based on cost model, staasacs and other parameters AlternaAves Scan country and gender, posiaonal AND Scan over country and probe into gender Indices might be used Decision depends on data and query parameters like e.g. selecavity SELECT fname, lname FROM world_population WHERE country="italy" and gender="m" 63

Query Plan (i) country = "Italy" gender = "m" position list position list positional AND π fname, lname PosiAonal AND: Predicates are evaluated and generate posiaon lists Intermediate posiaon lists are logically combined Final posiaon list is used for materializaaon 64

Query Execution (i) Value ID Dic'onary for country 0 Algeria 1 France 2 Germany 3 Italy 4 Netherlands fname lname country gender Gianluigi Buffon 3 1 Lena Gercke 2 0 Mario Balotelli 3 1 Manuel Neuer 2 1 Lukas Podolski 2 1 Klaas- Jan Huntelaar 4 1 country = 3 ("Italy") Posi'on 0 2 AND Value ID 0 f 1 m gender = 1 ("m") Posi'on 0 2 3 4 5 Dic'onary for gender 65 Posi'on 0 2

Query Plan (ii) country = "Italy" position list Positional Filter gender = "m" Probe Gender π fname, lname Based on posiaon list produced by first selecaon, gender column is probed. 66

Insert

Insert Insert is the dominant modificaaon operaaon Delete/Update can be modeled as Inserts as well (Insert- only approach) InserAng into a compressed in- memory persistence can be expensive UpdaAng sorted sequences (e.g. dicaonaries) is a challenge InserAng into columnar storages is generally more expensive than inserang into row storages 68

Insert Example world_populaaon rowid fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977..................... INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) 69

INSERT (1) w/o new Dictionary entry INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 1 2 3 3 2 4 3 D 0 Albrecht 1 Berg 2 Meyer 3 Schulze fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977..................... Akribute Vector (AV) DicAonary (D) 70

INSERT (1) w/o new Dictionary entry INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 1 2 3 3 2 4 3 D 0 Albrecht 1 Berg 2 Meyer 3 Schulze fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977..................... 1. Look- up on D à entry found Akribute Vector (AV) DicAonary (D) 71

INSERT (1) w/o new Dictionary entry INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 1 2 3 3 2 4 3 5 3 D 0 Albrecht 1 Berg 2 Meyer 3 Schulze fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze 1. Look- up on D à entry found 2. Append ValueID to AV..................... Akribute Vector (AV) DicAonary (D) 72

INSERT (2) with new Dictionary Entry I/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 0 2 1 3 2 4 3 D 0 Berlin 1 Hamburg 2 Innsbruck 3 Potsdam fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze..................... Akribute Vector (AV) DicAonary (D) 73

INSERT (2) with new Dictionary Entry I/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 0 2 1 3 2 4 3 D 0 Berlin 1 Hamburg 2 Innsbruck 3 Potsdam fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze 1. Look- up on D à no entry found..................... Akribute Vector (AV) DicAonary (D) 74

INSERT (2) with new Dictionary Entry I/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 0 2 1 3 2 4 3 D 0 Berlin 1 Hamburg 2 Innsbruck 3 Potsdam 4 Rostock fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze 1. Look- up on D à no entry found 2. Append new value to D (no re- sorang necessary)..................... Akribute Vector (AV) DicAonary (D) 75

INSERT (2) with new Dictionary Entry I/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 0 1 0 2 1 3 2 4 3 5 4 D 0 Berlin 1 Hamburg 2 Innsbruck 3 Potsdam 4 Rostock fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze Rostock 1. Look- up on D à no entry found 2. Append new value to D (no re- sorang necessary) 3. Append ValueID to AV..................... Akribute Vector (AV) DicAonary (D) 76

INSERT (2) with new Dictionary Entry I/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 2 1 3 2 1 3 0 4 4 D 0 Anton 1 Hanna 2 MarAn 3 Michael 4 Sophie fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze Rostock..................... Akribute Vector (AV) DicAonary (D) 77

INSERT (2) with new Dictionary Entry II/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 2 1 3 2 1 3 0 4 4 D 0 Anton 1 Hanna 2 MarAn 3 Michael 4 Sophie fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze Rostock 1. Look- up on D à no entry found..................... Akribute Vector (AV) DicAonary (D) 78

INSERT (2) with new Dictionary Entry II/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV 0 2 1 3 2 1 3 0 4 4 D 0 Anton 1 Hanna 2 Karen 3 MarAn 4 Michael 5 Sophie fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Schulze Rostock 1. Look- up on D à no entry found 2. Insert new value to D..................... Akribute Vector (AV) DicAonary (D) 79

INSERT (2) with new Dictionary Entry II/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV D (old) D (new) 0 2 1 3 2 1 3 0 4 4 0 Anton 1 Hanna 2 MarAn 3 Michael 4 Sophie 0 Anton 1 Hanna 2 Karen 3 MarAn 4 Michael fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Sophie 5 Schulze Rostock 1. Look- up on D à no entry found 2. Insert new value to D..................... Akribute Vector (AV) DicAonary (D) 80

INSERT (2) with new Dictionary Entry II/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) AV (old) AV (new) D (new) 0 2 1 3 2 1 3 0 4 4 0 3 1 4 2 1 3 0 4 5 0 Anton 1 Hanna 2 Karen 3 MarAn 4 Michael fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Sophie 5 Schulze Rostock 1. Look- up on D à no entry found 2. Insert new value to D 3. Change ValueIDs in AV..................... Akribute Vector (AV) DicAonary (D) 81

INSERT (2) with new Dictionary Entry II/II INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) Changed Value IDs AV 0 3 1 4 2 1 3 0 4 5 5 2 D 0 Anton 1 Hanna 2 Karen 3 MarAn 4 Michael 5 Sophie fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Sophie Schulze f GER Potsdam 09-03- 1977 5 Karen Schulze Rostock 1. Look- up on D à no entry found 2. Insert new value to D 3. Change ValueIDs in AV 4. Append new ValueID to AV..................... Akribute Vector (AV) DicAonary (D) 82

Result world_populaaon rowid fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Ulrike Schulze f GER Potsdam 09-03- 1977 5 Karen Schulze f GER Rostock 11-15- 2012 INSERT INTO world_populaaon VALUES (Karen, Schulze, f, GER, Rostock, 11-15- 2012) 83

Chapter 4: Advanced Database Storage Techniques

Learning Map of our Online Lecture @ openhpi.de The Future of Enterprise Compu'ng In- Memory Database Operators Founda'ons of Database Storage Techniques Advanced Database Storage Tech- niques Founda'ons for a New Enterprise Applica'on Development Era

Differential Buffer

Motivation InserAng new tuples directly into a compressed structure can be expensive Especially when using sorted structures New values can require reorganizing the dicaonary Number of bits required to encode all dicaonary values can change, akribute vector has to be reorganized 87

Differential Buffer New values are wriken to a dedicated differenaal buffer (Delta) Cache SensiAve B+ Tree (CSB+) used for fast search on Delta Read Write world_population fname Table fname (compressed) Akribute Vector Data Modifying Operations 0 0 1 1 2 1 3 3 4 2 5 1 DicAonary 0 Anton 1 Hanna 2 Michael 3 Sophie Differential Buffer Main Store Akribute Vector 0 0 1 1 Table 2 1 3 2 0 Angela 1 Klaus 2 Andre Read Operations DicAonary CSB+ 88 8 Billion entries Main Store up to 50,000 entries DifferenAal Buffer/ Delta

Differential Buffer Inserts of new values are fast, because dicaonary and akribute vector do not need to be resorted Range selects on differenaal buffer are expensive Unsorted dicaonary allows no direct comparison of value IDs Scans with range selecaon need to lookup values in dicaonary for comparisons DifferenAal Buffer requires more memory: Akribute vector not bit- compressed AddiAonal CSB+ Tree for dicaonary 89

Tuple Lifetime Michael moves from Berlin to Potsdam Main Table: world_populaaon recid fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Ulrike Schulze f GER Potsdam 09-03- 1977 5 Sophie Schulze f GER Rostock 06-20- 2012..................... 8 * 10 9 Zacharias Perdopolus m GRE Athen 03-12- 1979 Main Store DifferenAal Buffer UPDATE "world_populaaon" SET city="potsdam" WHERE fname="michael" AND lname="berg" 90

Tuple Lifetime Michael moves from Berlin to Potsdam Main Table: world_populaaon recid fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Ulrike Schulze f GER Potsdam 09-03- 1977 5 Sophie Schulze f GER Rostock 06-20- 2012..................... 8 * 10 9 Zacharias Perdopolus m GRE Athen 03-12- 1979 Main Store DifferenAal Buffer UPDATE "world_populaaon" SET city="potsdam" WHERE fname="michael" AND lname="berg" 91

Tuple Lifetime Michael moves from Berlin to Potsdam Main Table: world_populaaon recid fname lname gender country city birthday 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 Michael Berg m GER Berlin 03-05- 1970 2 Hanna Schulze f GER Hamburg 04-04- 1968 3 Anton Meyer m AUT Innsbruck 10-20- 1992 4 Ulrike Schulze f GER Potsdam 09-03- 1977 5 Sophie Schulze f GER Rostock 06-20- 2012..................... 8 * 10 9 Zacharias Perdopolus m GRE Athen 03-12- 1979 0 Michael Berg m GER Potsdam 03-05- 1970 Main Store DifferenAal Buffer UPDATE "world_populaaon" SET city="potsdam" WHERE fname="michael" AND lname="berg" 92

Tuple Lifetime Tuples are now available in Main Store and DifferenAal Buffer Tuples of a table are marked by a validity vector to reduce the required amount of reorganizaaon steps AddiAonal akribute vector for validity 1 bit required per database tuple Invalidated tuples stay in the database table, unal the next reorganizaaon takes place Query results Main and delta have to be queried Results are filtered using the validity vector 93

Tuple Lifetime Michael moves from Berlin to Potsdam Main Table: world_populaaon recid fname lname gender country city birthday valid 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 1 Michael Berg m GER Berlin 03-05- 1970 0 Main Store 2 Hanna Schulze f GER Hamburg 04-04- 1968 1 3 Anton Meyer m AUT Innsbruck 10-20- 1992 1 4 Ulrike Schulze f GER Potsdam 09-03- 1977 1 5 Sophie Schulze f GER Rostock 06-20- 2012 1..................... 8 * 10 9 Zacharias Perdopolus m GRE Athen 03-12- 1979 1 0 Michael Berg m GER Potsdam 03-05- 1970 1 DifferenAal Buffer UPDATE "world_populaaon" SET city="potsdam" WHERE fname="michael" AND lname="berg" 94

Tuple Lifetime Michael moves from Berlin to Potsdam Main Table: world_populaaon recid fname lname gender country city birthday valid 0 MarAn Albrecht m GER Berlin 08-05- 1955 1 1 Michael Berg m GER Berlin 03-05- 1970 0 Main Store 2 Hanna Schulze f GER Hamburg 04-04- 1968 1 3 Anton Meyer m AUT Innsbruck 10-20- 1992 1 4 Ulrike Schulze f GER Potsdam 09-03- 1977 1 5 Sophie Schulze f GER Rostock 06-20- 2012 1..................... 8 * 10 9 Zacharias Perdopolus m GRE Athen 03-12- 1979 1 0 Michael Berg m GER Potsdam 03-05- 1970 1 DifferenAal Buffer UPDATE "world_populaaon" SET city="potsdam" WHERE fname="michael" AND lname="berg" 95

Merge

Handling Write Operations All Write operaaons (INSERT, UPDATE) are stored within a differenaal buffer (delta) first Read- operaaons on differenaal buffer are more expensive than on main store DifferenAal buffer is merged periodically with the main store To avoid performance degradaaon based on large delta Merge is performed asynchronously 97

Merge Overview I/II The merge process is triggered for single tables Is triggered by: Amount of tuples in buffer Cost model to Schedule Take query cost into account Manually 98

Merge Overview II/II Working on data copies allows asynchronous merge Very limited interrupaon due to short lock At least twice the memory of the table needed! Before Data Modifying Operations During the Merge Process Merge Operation Data Modifying Operations After Data Modifying Operations Table Table Table Main Store Differential Buffer Main Store Differential Buffer Main Store (new) Differential Buffer (new) Main Store (new) Differential Buffer (new) Read Operations Read Operations Read Operations 99 Prepare Attribute Merge Commit

Attribute Merge Step 1: Dic'onary Merge Merge main and delta dicaonary OpAonally remove unused values Merge of two sorted arrays Create mapping if valueids changed Step 2: Update ATribute Vector Create new merged main paraaon Update valueids reflecang changed dicaonary 100

Example Main Store Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja recid fname city 0 2 0 1 1 1 2 0 0 recid valid 0 1 1 1 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city recid fname city recid valid 101

Main Store Example Michael moves from London to Berlin Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja recid fname city 0 2 0 1 1 1 2 0 0 recid valid 0 1 1 0 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Michael Berlin recid fname city 0 0 0 recid valid 0 1 102

Main Store Example Nadja moves from Berlin to Potsdam Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja recid fname city 0 2 0 1 1 1 2 0 0 recid valid 0 0 1 0 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Michael Berlin 1 Nadja Potsdam recid fname city 0 0 0 1 1 1 recid valid 0 1 1 1 103

Main Store Example Michael moves from Berlin to Potsdam Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja recid fname city 0 2 0 1 1 1 2 0 0 recid valid 0 0 1 0 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Michael Berlin 1 Nadja Potsdam recid fname city 0 0 0 1 1 1 2 0 1 recid valid 0 0 1 1 2 1 104

First Step: Validity Detection Main Store Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja recid fname city 0 2 0 1 1 1 2 0 0 recid valid 0 0 1 0 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Michael Berlin 1 Nadja Potsdam recid fname city 0 0 0 1 1 1 2 0 1 recid valid 0 0 1 1 2 1 Will be deleted / moved into history paraaon 105

First Step: Dictionary Merge Main Store Dic'onaries valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja New Combined Main Store Dic'onaries valueid fname city 0 Albert Berlin 1 Michael Potsdam 2 Nadja Delta Dic'onaries valueid fname city 0 Michael Berlin 1 Nadja Potsdam Mappings fname: old to new Main valueid (old) 0 1 2 valueid (new) 0 1 2 city: old to new Main valueid (old) 0 1 valueid (new) 0 NA fname: Delta to Main valueid (Delta) 0 1 valueid (Main) 1 2 city: Delta to Main valueid (Delta) 0 1 valueid (Main) 0 1

Second Step: New Main Column Main Store Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael London 2 Nadja Delta recid fname city 0 0 0 recid valid 0 1 Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Michael Berlin 1 Nadja Potsdam recid fname city 0 1 1 1 0 1 recid valid 0 1 1 1 Mappings fname: old to new Main fname: Delta to Main city: old to new Main city: Delta to Main valueid (old) 0 1 2 valueid (Delta) 0 1 valueid (old) 0 1 valueid (Delta) 0 1 valueid (new) 0 1 2 valueid (Main) 1 2 valueid (new) 0 NA valueid (Main) 0 1

Second Step: New Main Column Main Store Dic'onaries ATribute Vectors Validity Vector valueid fname city 0 Albert Berlin 1 Michael Potsdam 2 Nadja recid fname city 0 0 0 1 2 1 2 1 1 recid valid 0 1 1 1 2 1 Delta Dic'onaries ATribute Vectors Validity Vector valueid fname city recid fname city recid valid 108

Chapter 5: Implications on Application Development

Learning Map of our Online Lecture @ openhpi.de The Future of Enterprise Compu'ng In- Memory Database Operators Founda'ons of Database Storage Techniques Advanced Database Storage Tech- niques Founda'ons for a New Enterprise Applica'on Development Era

How does it all come together? 1. Mixed Workload combining OLTP and analytic-style queries Column-Stores are best suited for analytic-style queries In-memory databases enable fast tuple re-construction In-memory column store allows aggregation on-the-fly 2. Sparse enterprise data Lightweight compression schemes are optimal Increases query execution Improves feasibility of in-memory databases Changed Hardware Advances in data processing (software) 111 3. Mostly read workload Read-optimized stores provide best throughput i.e. compressed in-memory column-store Write-optimized store as delta partition to handle data changes is sufficient Our focus Complex Enterprise Applications

Merge An In-Memory Database for Enterprise Applications In- Memory Database (IMDB) Data resides permanently in main memory Main Memory is the primary persistence SAll: logging and recovery from/to flash Main memory access is the new botleneck Cache- conscious algorithms/ data structures are crucial (locality is king) Interface Services and Session Management Query Execution Metadata TA Manager Active Data Main Store Column Column Data aging Combined Column Time travel Differential Store Column Column Combined Column Logging Indexes Inverted Object Data Guide Recovery Distribution Layer at Blade i Main Memory at Blade i Log Non-Volatile Memory 112 Passive Data (History) Snapshots

Simplified Application Development ApplicaAon cache Database cache Prebuilt aggregates TradiAonal Column- oriented Fewer caches necessary No redundant data (OLAP/OLTP, LiveCache) No maintenance of materialized views or aggregates Minimal index maintenance Raw data 113

Examples for Implications on Enterprise Applications

SAP ERP Financials on In-Memory Technology In-memory column database for an ERP system Combined workload (parallel OLTP/OLAP queries) Leverage in-memory capabilities to Reduce amount of data Aggregate on-the-fly Run analytic-style queries (to replace materialized views) Execute stored procedures Use Case: SAP ERP Financials solution Post and change documents Display open items Run dunning job 115 Analytical queries, such as balance sheet

116 Current Financials Solutions

The Target Financials Solution Only base tables, algorithms, and some indices 117

Feasibility of Financials on In- Memory Technology in 2009 Modifications on SAP Financials Removed secondary indices, sum tables and pre-calculated and materialized tables Reduce code complexity and simplify locks Insert Only to enable history (change document replacement) Added stored procedures with business functionality European division of a retailer 118 ERP 2005 ECC 6.0 EhP3 5.5 TB system database size Financials: 23 million headers / 1.5 GB in main memory 252 million items / 50 GB in main memory (including inverted indices for join attributes and insert only extension)

In-Memory Financials on SAP ERP accounting documents dunning data sum tables BKPF BSEG secondary indices BSAD MHNK MHND GLT0 BSAK change documents BSAS BSID CDHDR CDPOS LFC1 BSIK BSIS KNC1 119

In-Memory Financials on BKPF BSEG SAP ERP accounting documents 120

Reduction by a Factor 10 Classic Row-Store (w/o compr.) IMDB BKPF 8.7 GB 1.5 GB BSEG 255 GB 50 GB 263.7 GB 51.5 GB Secondary Indices 255 GB - Sum Tables 0.55 GB - Complete 519.25 GB 51.5 GB 121

Booking an accounting document Insert into BKPF and BSEG only Lack of updates reduces locks 122

Wrap Up (I) The future of enterprise compuang Big data challenges Changes in Hardware OLTP and OLAP in one single system FoundaAons of database storage techniques Data layout opamized for memory hierarchies Light- weight compression techniques In- memory database operators Operators on dicaonary compressed data Query execuaon: Scan, Insert, Tuple ReconstrucAon 123

Wrap Up (II) Advanced database storage techniques DifferenAal buffer accumulates changes Merge combines changes periodically with main storage ImplicaAons on ApplicaAon Development Move data intensive operaaons closer to the data New analyacal applicaaons on transacaonal data possible Less data redundancy, more on the fly calculaaon Reduced code complexity 124

References A Course in In- Memory Data Management, H. Plakner hkp://epic.hpi.uni- potsdam.de/home/inmemorybook PublicaAons of our Research Group: Papers about the inner- workings of in- memory databases hkp://epic.hpi.uni- potsdam.de/home/publicaaons 125

Thank You! Technical Deep-Dive in a Column-Oriented In-Memory Database Martin Faust Martin.Faust@hpi.uni-potsdam.de Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam

Dunning Run Dunning run determines all open and due invoices Customer defined queries on 250M records Current system: 20 min New logic: 1.5 sec In- memory column store Parallelized stored procedures Simplified Financials 127

Bring Application Logic Closer to the Storage Layer Select accounts to be dunned, for each: Select open account items from BSID, for each: Calculate due date Select dunning procedure, level and area Create MHNK entries Create and write dunning item tables 128

Bring Application Logic Closer to the Storage Layer Select accounts to be dunned, for each: Select open account items from BSID, for each: Calculate due date Select dunning procedure, level and area Create MHNK entries Create and write dunning item tables 1 SELECT 10000 SELECTs 10000 SELECTs 31000 Entries 129

Bring Application Logic Closer to the Storage Layer Select accounts to be dunned, for each: Select open account items from BSID, for each: Calculate due date Select dunning procedure, level and area Create MHNK entries Create and write dunning item tables One single 1 SELECT stored 10000 procedure SELECTs executed 10000 within SELECTs IMDB 31000 Entries 130

Bring Application Logic Closer to the Storage Layer Select accounts to be dunned, for each: Select open account items from BSID, for each: Calculate due date Select dunning procedure, level and area Create MHNK entries Create and write dunning item tables One single stored procedure executed within IMDB Calculated onthe-fly 131

132 Dunning Application

133 Dunning Application

Available-to-Promise Check Can I get enough quanaaes of a requested product on a desired delivery date? Goal: Analyze and validate the potenaal of in- memory and highly parallel data processing for Available- to- Promise (ATP) Challenges Dynamic aggregaaon Instant rescheduling in minutes vs. nightly batch runs Real- Ame and historical analyacs Outcome Real- Ame ATP checks without materialized views Ad- hoc rescheduling No materialized aggregates 134

In-Memory Available-to- Promise 135

Demand Planning Flexible analysis of demand planning data Zooming to choose granularity Filter by certain products or customers Browse through Ame spans CombinaAon of locaaon- based geo data with planning data in an in- memory database External factors such as the temperature, or the level of cloudiness can be overlaid to incorporate them in planning decisions 136

GORFID HANA for Streaming Data Processing Use Case: In- Memory RFID Data Management EvaluaAon of SAP OER Prototypical implementaaon of: RFID Read Event Repository on HANA Discovery Service on HANA (10 billion data records with ~3 seconds response Ame) Front ends for iphone & ipad Key Findings: HANA is suited for streaming data (using bulk inserts) AnalyAcs on streaming data is now possible 137

GORFID: Near Real-Time as a Concept Read Read Read Event Event Repositories Verification Verification Services Services Bulk load every 2-3 seconds: > 50,000 inserts/s up to 8,000 read up up to to 8.000 8.000 read read event event event per per notifications second second per second Discovery Discovery Service Service Discovery Service up to 2,000 up up to to 2.000 2.000 requests per per requests second per second SAP SAP HANA HANA P A A P 138

139 POS Explorer I

140 POS Explorer II

141 POS Explorer III