In-Memory Data Management for Enterprise Applications



Similar documents
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

IN-MEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: In-Memory DBMS - SoSe

Lecture Data Warehouse Systems

Innovative technology for big data analytics

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

The Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing

Chapter 2 Why Are Enterprise Applications So Diverse?

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki

Actian Vector in Hadoop

SQL Server 2012 Performance White Paper

DKDA 2012 and the Impact of In-Memory Database Algorithms

Oracle Database In-Memory The Next Big Thing

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

A NEW ARCHITECTURE FOR ENTERPRISE APPLICATION SOFTWARE BASED ON IN-MEMORY DATABASES

Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley

Data Management in SAP Environments

The SAP HANA Database An Architecture Overview

Performance Verbesserung von SAP BW mit SQL Server Columnstore

In-memory databases and innovations in Business Intelligence

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Main Memory Data Warehouses

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

SanssouciDB: An In-Memory Database for Processing Enterprise Workloads

05. Alternative Speichermodelle. Architektur von Datenbanksystemen I

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Technical Deep-Dive in a Column-Oriented In-Memory Database

The Impact of Columnar In-Memory Databases on Enterprise Systems

2009 Oracle Corporation 1

Inge Os Sales Consulting Manager Oracle Norway

NextGen Infrastructure for Big DATA Analytics.

CS54100: Database Systems

Cache Conscious Column Organization in In-Memory Column Stores

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008

An Overview of SAP BW Powered by HANA. Al Weedman

Oracle Database 12c Built for Data Warehousing O R A C L E W H I T E P A P E R F E B R U A R Y

Oracle Big Data SQL Technical Update

Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option

Integrating Apache Spark with an Enterprise Data Warehouse

How to Build a High-Performance Data Warehouse By David J. DeWitt, Ph.D.; Samuel Madden, Ph.D.; and Michael Stonebraker, Ph.D.

Parallel Data Warehouse

A Deduplication File System & Course Review

System Architecture. In-Memory Database

Performance and Scalability Overview

SAP BW Columnstore Optimized Flat Cube on Microsoft SQL Server

Key Attributes for Analytics in an IBM i environment

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich

Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

The New Economics of SAP Business Suite powered by SAP HANA SAP AG. All rights reserved. 2

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation

SAP HANA Reinventing Real-Time Businesses through Innovation, Value & Simplicity. Eduardo Rodrigues October 2013

SAP HANA In-Memory Database Sizing Guideline

Column-Oriented Databases to Gain High Performance for Data Warehouse System

Distributed Architecture of Oracle Database In-memory

When to Use Oracle Database In-Memory

A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Apache Kylin Introduction Dec 8,

Rethinking SIMD Vectorization for In-Memory Databases

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Oracle Database 12c Plug In. Switch On. Get SMART.

Data Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team

Infrastructure Matters: POWER8 vs. Xeon x86

SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK

Architectures for Big Data Analytics A database perspective

Toronto 26 th SAP BI. Leap Forward with SAP

Emerging Innovations In Analytical Databases TDWI India Chapter Meeting, July 23, 2011, Hyderabad

2010 Ingres Corporation. Interactive BI for Large Data Volumes Silicon India BI Conference, 2011, Mumbai Vivek Bhatnagar, Ingres Corporation

Aggregates Caching in Columnar In-Memory Databases

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

MS SQL Performance (Tuning) Best Practices:

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

Performance and Scalability Overview

Safe Harbor Statement

Microsoft Analytics Platform System. Solution Brief

IS IN-MEMORY COMPUTING MAKING THE MOVE TO PRIME TIME?

IBM DB2 specific SAP NetWeaver Business Warehouse Near-Line Storage Solution

A SCALABLE DEDUPLICATION AND GARBAGE COLLECTION ENGINE FOR INCREMENTAL BACKUP

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

The Methodology Behind the Dell SQL Server Advisor Tool

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

Increasing Flash Throughput for Big Data Applications (Data Management Track)

Oracle MulBtenant Customer Success Stories

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

SQL Server 2014 In-Memory Tables (Extreme Transaction Processing)

Next Generation Data Warehouse and In-Memory Analytics

CitusDB Architecture for Real-Time Big Data

Coldbase - A Column-Oriented In-Memory Database

Columnstore in SQL Server 2016

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

Transcription:

In-Memory Data Management for Enterprise Applications Jens Krueger Senior Researcher and Chair Representative Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam

Agenda 2 1. Changed Hardware 2. Advances in Data Processing 3. Todays Enterprise Applications 4. The In-Memory Data Management for Enterprise Applications 5. Impact on Enterprise Applications

All Areas have to taken into account 3 Changed Hardware Advances in data processing (software) Complex Enterprise Applications Our focus

Why a New Data Management?! 4 DBMS architecture has not changed over decades Redesign needed to handle the changes in: Hardware trends (CPU/cache/memory) Changed workloads Data characteristics Data amount Query engine Buffer pool Some academic prototypes: MonetDB, C-store, HyPer, HYRISE Several database vendors picked up the idea and have new databases in place (e.g., SAP, Vertica, Greenplum, Oracle) Traditional DBMS Architecture

Changes in Hardware 5 give an opportunity to re-think the assumptions of yesterday because of what is possible today. Multi-Core Architecture (96 cores per server) One blade ~$50.000 = 1 Enterprise Class Server Parallel scaling across blades A 64 bit address space 2TB in current servers 25GB/s per core Cost-performance ratio rapidly declining Memory hierarchies Main Memory becomes cheaper and larger

In the Meantime Research as come up with 6 several advance in software for processing data Column-oriented data organization (the column-store) Sequential scans allow best bandwidth utilization between CPU cores and memory Independence of tuples within columns allows easy partitioning and therefore parallel processing + Lightweight Compression Reducing data amount, while.. Increasing processing speed through late materialization And more, e.g., parallel scan/join/aggregation

Two Different Principles of Physical Data Storage: Row- vs. Column-Store 7 Row-store: Rows are stored consecutively Optimal for row-wise access (e.g. *) Column-store: Columns are stored consecutively Optimal for attribute focused access (e.g. SUM, GROUP BY) Note: concept is independent from storage type But only in-memory implementation allows fast tuple reconstruction in case of a column store + Row-Store Column-store Row 1 Doc Num Doc Date Sold- To Value Sales Status Org Row 2 Row 3 Row 4

OLTP- and OLAP-style Queries Favor Different Storage Patterns 8 Row Store Column Store Row 1 Doc Num Doc Date Sold- To Value Sales Status Org SELECT * FROM Sales Orders WHERE Document Number = 95779216 Row 2 Row 3 Row 4 Row 1 Doc Num Doc Date Sold- To Value Sales Status Org SELECT SUM(Order Value) FROM Sales Orders WHERE Document Date > 2009-01-20 Row 2 Row 3 Row 4

Motivation for Compression in Databases 9 Main memory access is the bottleneck Idea: Trade CPU time to compress and decompress data Lightweight Compression Lossless Reduces I/O operations to main memory Leads to less cache misses due to more information on a cache line Enables operations directly on compressed data Allows to offset by the use of fixed-length data types

Lightweight Dictionary Encoding for Compression and Late Materialization 10 Store distinct values once in separate mapping table (the dictionary) Associate unique mapping key (valueid) for each distinct value Store valueid instead of value in attribute vector Enables offsetting with bit-encoded fixed-length data types RecId 1 RecId 2 RecId 3 RecId 4 RecId 5 RecId 6 RecId 7 RecId 8 Table JAN FEB MAR APR MAY JUN JUL AUG INTEL ABB HP INTEL IBM IBM SIEMENS INTEL 1 2 2 4 5 5 4 3 Attribute: Company Name Attribute Vector RecId ValueId 1 4 2 1 3 2 4 4 5 3 6 3 7 5 8 4 Dictionary ValueId Value 1 ABB 2 HP 3 IBM 4 Intel 5 SIEMENS Inverted Index ValueId RecIdList 1 2 2 3 3 5,6 Typical compression factor for enterprise software 10 In financial applications up to 50 4 1,4,8 5 7

Data Modifications in a Compressed Store 11 Differential Store: two separate in-memory partitions Read-optimized main partition (ROS) Write-optimized delta partition (WOS) Both represent the current state of the data WOS/Delta as an intermediate storage for several modifications Re-compression costs are shared among all recent modifications (merge process) Insert/Update Select (union) WOS Merge Process (asynchronously) ROS

Todays Enterprise Applications 12 Enterprise applications have evolved: not just OLTP vs. OLAP Demand for real-time analytics on transactional data High throughput analytics è completely in memory Examples Available-To-Promise Check Perform real-time ATP check directly on transactional data during order entry, without materialized aggregates of available stocks. Dunning Search for open invoices interactively instead of scheduled batch runs. Operational Analytics Instant customer sales analytics with always up-to-date data. Data integration as big challenge (e.g. POS data)

Enterprise Workloads are Read-Mostly 13 Customer analysis shows a widening read -gap between transactional and analytical queries It is a myth that OLTP is write-oriented, and OLAP is readoriented Real world is more complicated than single tuple access, lots of range queries Workload 100% 90 % 80 % 70 % 60 % 50 % 40 % 30 % 20 % Write: Delete Modification Insert Read: Range Select Table Scan Lookup Workload 100% 90 % 80 % 70 % 60 % 50 % 40 % 30 % 20 % Write: Delete Modification Insert Read: Select 10 % 10 % 0 % OLTP OLAP 0 % TPC-C

Enterprise Data is Typically Sparse 14 Enterprise data is wide and sparse Most columns are empty or have a low cardinality of distinct values Sparse distribution facilitates high compression % of Columns 80% 70% 60% 50% 40% 30% 20% 10% 0% 64% 78% 12% 9 % Inventory Management Financial Accounting 24% 13% 1-32 33-1023 1024-100000000 Number of Distinct Values

Challenge 1 for Enterprises: Dealing with all Sorts of Data 15 Transactional Data Entry Sources: Machines, Transactional Apps, User Interaction, etc. Real-time Analytics, Structured Data Sources: Reporting, Classical Analytics, planning, simulation CPUs (multi-core + Cache + Memory) Event Processing, Stream Data Sources: machines, sensors, high volume systems Data Management Text Analytics, Unstructured Data Sources: web, social, logs, support systems, etc.

Challenge 2 for Enterprises: Current application architectures 16 create different application-specific silos with redundant data that reduce real-time behavior & increase complexity. ANALYTICAL CUBES Transactional ANALYTICAL Text Processing ETL RDB on Disk (Tuples) RDB on Disk (Star Schemas) Accelerator Column Store In-Memory (Fully Cached Result Sets) Blobs and Text Columns In-Memory

Drawbacks of this Separation 17 Historically, OLTP and OLAP system are separated because of resource contention and hardware limitations. But, this separation has several disadvantages: OLAP system does not have the latest data OLAP system does only have predefined subset of the data Cost-intensive ETL process has to keep both systems in synch There is a lot of redundancy Different data schemas introduce complexity for applications combining sources

18 Approach Change overall data management system assumption In-Memory only Vertically partitioned (column store) CPU-cache optimized Only one optimization objective main memory access IN-Memory Column + Row OLTP + OLAP + Text Column Rethink how enterprise application persistence is build Single data management system No redundant data, no materialized views, cubes Computational application logic closer to the database (i.e. complex queries, stored procedures) ROW Backup TEXT

Intermezzo 19 Hardware advances More computing power through multi-core CPU s Larger and cheaper main memory Algorithms need to be aware of the memory wall Software advances Columns stores superior for analytic style queries Light-weight compression schemes utilize modern hardware Enterprise applications Need to execute complex queries in real-time One single source of truth is needed

How does it all come together? 20 1. Mixed Workload combining OLTP and analytic-style queries Column-Stores are best suited for analytic-style queries In-memory database enables fast tuple re-construction In-memory column store allows aggregation on the fly 2. Sparse enterprise data Lightweight compression schemes are optimal Increases query execution Improves feasibility of in-memory database Changed Hardware Advances in data processing (software) 3. Mostly read workload Read-optimized stores provide best throughput i.e. compressed in-memory column-store Write-optimized store as delta partition to handle data changes is sufficient Our focus Complex Enterprise Applications

Merge SanssouciDB: An In-Memory Database for Enterprise Applications 21 In-Memory Database (IMDB) Data resides permanently in main memory Main Memory is the primary persistence Still: logging to disk/recovery from disk Main memory access is the new bottleneck Cache-conscious algorithms/ data structures are crucial (locality is king) Interface Services and Session Management Query Execution Metadata TA Manager Active Data Main Store Column Column Data aging Combined Column Time travel Differential Store Column Column Combined Column Logging Indexes Inverted Object Data Guide Recovery Log Distribution Layer at Blade i Main Memory at Blade i Non-Volatile Memory Passive Data (History) Snapshots

Impact on Application Development 22 Traditional In-Memory Column-Store Application cache Materialized views Prebuilt aggregates Raw data Less caches needed No redundant objects No maintenance of materialized views or aggregates Minimized index maintenance Data movements are minimized

23 Impact on Enterprise Applications: Financials as of Today

Impact on Enterprise Applications: Simplified Financials on In-Memory DB 24 Only base tables, algorithms, and some indexes Reduces complexity Lowers TCO While adding more flexibility, integration, and functionality

Conclusion 25 In-memory column stores are better suited as database management system (DBMS) for enterprise applications than conventional DBMS In-memory column stores utilizes modern hardware optimally Several data processing techniques leverage in-memory only data processing Enterprise applications show specific characteristics: Sparsely filled data tables Complex read-mostly workload Real-world experiences have proven the feasibility of the in-memory column-store

Thanks! 26 Questions? Jens Krueger Hasso Plattner Institute jens.krueger@hpi.uni-potsdam.de