The MonetDB Architecture. Martin Kersten CWI Amsterdam. M.Kersten 2008 1



Similar documents

Chapter 13: Query Optimization

Evaluation of Expressions

Chapter 14: Query Optimization

Relational Databases

SQL Query Evaluation. Winter Lecture 23

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Chapter 13: Query Processing. Basic Steps in Query Processing

Inside the PostgreSQL Query Optimizer

Topics in basic DBMS course

MS SQL Performance (Tuning) Best Practices:

In-Memory Data Management for Enterprise Applications

IT2305 Database Systems I (Compulsory)

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Advanced Oracle SQL Tuning

SQL Query Performance Tuning: Tips and Best Practices

Physical Database Design and Tuning

1. Physical Database Design in Relational Databases (1)

Rackspace Cloud Databases and Container-based Virtualization

Chapter 5: Overview of Query Processing


Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Toad for Oracle 8.6 SQL Tuning

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Arrays in database systems, the next frontier?

Lecture 6: Query optimization, query tuning. Rasmus Pagh

Relational Algebra. Basic Operations Algebra of Bags

IN-MEMORY DATABASE SYSTEMS. Prof. Dr. Uta Störl Big Data Technologies: In-Memory DBMS - SoSe

Partitioning under the hood in MySQL 5.5

CIS 631 Database Management Systems Sample Final Exam

DISTRIBUTED AND PARALLELL DATABASE

Architecture Sensitive Database Design: Examples from the Columbia Group

Driving force. What future software needs. Potential research topics

Navigating Big Data with High-Throughput, Energy-Efficient Data Partitioning

IT2304: Database Systems 1 (DBS 1)

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Drupal Performance Tuning

SQL/XML-IMDBg GPU boosted In-Memory Database for ultra fast data management Harald Frick CEO QuiLogic In-Memory DB Technology

Physical Data Organization

Integrating Apache Spark with an Enterprise Data Warehouse

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

DB2 for i5/os: Tuning for Performance

Unit Storage Structures 1. Storage Structures. Unit 4.3

MapReduce and the New Software Stack

DATABASE DESIGN - 1DL400

DBMS / Business Intelligence, SQL Server

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc.

Cloud Computing at Google. Architecture

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

NUMA obliviousness through memory mapping

RevoScaleR Speed and Scalability

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Computer Organization

6. Storage and File Structures

ORACLE DATABASE 10G ENTERPRISE EDITION

Multi-dimensional index structures Part I: motivation

Big Data Technology CS , Technion, Spring 2013

Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : Hertford College Supervisor: Dr.

Physical DB design and tuning: outline

Rethinking SIMD Vectorization for In-Memory Databases

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

DATA STRUCTURES USING C

Availability Digest. Raima s High-Availability Embedded Database December 2011

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

In-Memory Databases MemSQL

CPU Organisation and Operation

FAST 11. Yongseok Oh University of Seoul. Mobile Embedded System Laboratory

µz An Efficient Engine for Fixed points with Constraints

1 Structured Query Language: Again. 2 Joining Tables

In-memory Tables Technology overview and solutions

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Virtuoso and Database Scalability

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Database Scalability and Oracle 12c

Performance Tuning for the Teradata Database

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Part 5: More Data Structures for Relations

1 File Processing Systems

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

1. INTRODUCTION TO RDBMS

news from Tom Bacon about Monday's lecture

Raima Database Manager Version 14.0 In-memory Database Engine

Object Oriented Database Management System for Decision Support System.

3. Relational Model and Relational Algebra

Transcription:

The MonetDB Architecture Martin Kersten CWI Amsterdam M.Kersten 2008 1

Try to keep things simple Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 2

End-user application MonetDB quickstep SQL XQuery JDBC ODBC Python Perl PHP RoR C-mapi lib MAPI protocol MonetDB kernel M.Kersten Sep 2008

The MonetDB Software Stack SQL/XML SOAP XQuery SQL 03 Open-GIS Optimizers X100 MonetDB 4 MonetDB 5 MonetDB kernel Compile? An advanced column-oriented DBMS M.Kersten Sep 2008

Try to keep things simple MonetDB storage Database Structures N-ary stores PAX stores Column stores M.Kersten 2008 5

Try to keep things simple Early 80s: tuple storage structures for PCs were simple OK OK John 32 Houston Mary 31 Houston Easy to access at the cost of wasted space M.Kersten 2008 6

Try to keep things simple Slotted pages Logical pages equated physical pages 32 John Houston 31 Mary Houston M.Kersten 2008 7

Try to keep things simple Slotted pages Logical pages equated multiple physical pages 32 John Houston 31 Mary Houston M.Kersten 2008 8

Avoid things you don t always need Not all attributes are equally important M.Kersten 2008 9

Avoid moving too much around A column orientation is as simple and acts like an array Attributes of a tuple are correlated by offset M.Kersten 2008 10

Try to keep things simple MonetDB Binary Association Tables M.Kersten 2008 11

Physical data organization Binary Association Tables Try to avoid doing things twice Dense sequence Memory mapped files Bat Unit fixed size M.Kersten 2008 12

Try to avoid doing things twice Binary Association Tables accelerators Hash-based access Column properties: key-ness non-null dense ordered M.Kersten 2008 13

Try to avoid doing things twice Binary Association Tables storage control 100 Type remappings are used to squeeze space A VID datatype can be used to represent dense enumerations A BAT can be used as an encoding table M.Kersten 2008 14

Mantra: Try to keep things simple Column orientation benefits datawarehousing Brings a much tighter packaging and improves transport through the memory hierarchy Each column can be more easily optimized for storage using compression schemes Each column can be replicated for read-only access M.Kersten 2008 15

Try to maximize performance Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 16

Try to maximize performance Volcano model Execution Paradigm Materialize All Model Vectorized model M.Kersten 2008 17

Volcano Engines Query SELECT name, salary*.19 AS tax FROM employee WHERE age > 25 M.Kersten 2008 18

Volcano Engines Operators Iterator interface - open() - next(): tuple - close() M.Kersten 2008 19

Volcano Engines Primitives Provide computational functionality All arithmetic allowed in expressions, e.g. multiplication mult(int,int) int M.Kersten 2008 20

Try to maximize performance Volcano paradigm The Volcano model is based on a simple pullbased iterator model for programming relational operators. The Volcano model minimizes the amount of intermediate store The Volcano model is CPU intensive and can be inefficient M.Kersten 2008 21

Try to use simple a software pattern MonetDB paradigm The MonetDB kernel is a programmable relational algebra machine Relational operators operate on array -like structures M.Kersten 2008 22

MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X20 := aggr.count(x1); sql.exportvalue(1,"sys.","count_","int",32,0,6,x20,""); end s3_1; Kernel execution paradigms Tuple-at-a-time pipelined Operator-at-a-time SQL Tactical Optimizers MonetDB Kernel MonetDB Server MAL MAL

Try to use simple a software pattern Operator implementation All algebraic operators materialize their result GOOD: small code footprints GOOD: potential for re-use BAD : extra storage for intermediates BAD: cpu cost for retaining it Local optimization decisions Sortedness, uniqueness, hash index Sampling to determine sizes Parallelism options Properties that affect the algorithms M.Kersten 2008 24

Try to use simple a software pattern Operator implementation All algebraic operators materialize their result Local optimization decisions Heavy use of code expansion to reduce cost 55 selection routines 149 unary operations 335 join/group operations 134 multi-join operations 72 aggregate operations M.Kersten 2008 25

Try to avoid the search space trap Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 26

MonetDB quickstep Strategic optimizer: Exploit the lanuage the language Rely on heuristics Tactical MAL optimizer: Modular optimizer framework Focused on coarse grain resource optimization SQL MAL optimizers MAL MAL Operational optimizer: Exploit everything you know at runtime Re-organize if necessary MonetDB Kernel MonetDB Server M.Kersten 2008 27

MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(x1,x6); X11 := algebra.kdifference(x8,x9); X12 := algebra.kunion(x11,x9); X14 := bat.reverse(x13); 0 base table X15 := algebra.kdifference(x12,x14); X18 := algebra.markt(x15,0@0); X19 := bat.reverse(x18); X20 := aggr.count(x19); sql.exportvalue(1,"sys.","count_","int",32,0,6,x20,""); end s3_1; delete 2 update 1 insert SQL Tactical Optimizers MonetDB Kernel MonetDB Server MAL MAL

MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X20 := aggr.count(x1); sql.exportvalue(1,"sys.","count_","int",32,0,6,x20,""); end s3_1; Optimizer pipelines. sql> select optimizer; inline,remap,evaluate,costmodel,coercions,emptyset,aliases,mergetable,deadcode,c onstants,commonterms,joinpath,deadcode,reduce,garbagecollector,dataflow,history,r eplication,multiplex SQL Tactical Optimizers MonetDB Kernel MonetDB Server MAL MAL

MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X20 := aggr.count(x1); sql.exportvalue(1,"sys.","count_","int",32,0,6,x20,""); end s3_1; Kernel execution paradigms Tuple-at-a-time pipelined Operator-at-a-time SQL Tactical Optimizers MonetDB Kernel MonetDB Server MAL MAL

Query optimization Alternative ways of evaluating a given query Equivalent expressions Different algorithms for each operation (Chapter 13) Cost difference between a good and a bad way of evaluating a query can be enormous Example: performing a r X s followed by a selection r.a = s.b is much slower than performing a join on the same condition Need to estimate the cost of operations Depends critically on statistical information about relations which the database must maintain Need to estimate statistics for intermediate results to compute cost of complex expressions

Introduction (Cont.) Relations generated by two equivalent expressions have the same set of attributes and contain the same set of tuples, although their attributes may be ordered differently.

Introduction (Cont.) Generation of query-evaluation plans for an expression involves several steps: 1. Generating logically equivalent expressions Use equivalence rules to transform an expression into an equivalent one. 2. Annotating resultant expressions to get alternative query plans 3. Choosing the cheapest plan based on estimated cost The overall process is called cost based optimization.

Equivalence Rules 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections. 2. 2. Selection operations are commutative. 3. Only the last in a sequence of projection operations is needed, the others can be omitted. 4. Selections can be combined with Cartesian products and theta joins. a. σ θ (E 1 X E 2 ) = E 1 θ E 2 b. σ θ1 (E 1 θ2 E 2 ) = E 1 θ1 θ2 E 2

Equivalence Rules (Cont.) 5. Theta-join operations (and natural joins) are commutative. E 1 θ E 2 = E 2 θ E 1 6. (a) Natural join operations are associative: (E 1 E 2 ) E 3 = E 1 (E 2 E 3 ) (b) Theta joins are associative in the following manner: (E 1 θ1 E 2 ) θ2 θ 3 E 3 = E 1 θ2 θ3 (E 2 θ2 E 3 ) where θ 2 involves attributes from only E 2 and E 3.

Pictorial Depiction of Equivalence Rules

Equivalence Rules (Cont.) 7. The selection operation distributes over the theta join operation under the following two conditions: (a) When all the attributes in θ 0 involve only the attributes of one of the expressions (E 1 ) being joined. σ θ0 (E 1 θ E 2 ) = (σ θ0 (E 1 )) θ E 2 (b) When θ 1 involves only the attributes of E 1 and θ 2 involves only the attributes of E 2. σ θ1 θ2 (E 1 θ E 2 ) = (σ θ1 (E 1 )) θ (σ θ2 (E 2 ))

Equivalence Rules (Cont.) 8. The projections operation distributes over the theta join operation as follows: (a) if it involves only attributes from L 1 L 2 : (b) Consider a join E 1 θ E 2. Let L 1 and L 2 be sets of attributes from E 1 and E 2, respectively. Let L 3 be attributes of E 1 that are involved in join condition θ, but are not in L 1 L 2, and let L 4 be attributes of E 2 that are involved in join condition θ, but are not in L 1 L 2.

Equivalence Rules (Cont.) 9. The set operations union and intersection are commutative E 1 E 2 = E 2 E 1 E 1 E 2 = E 2 E 1 (set difference is not commutative). 11. Set union and intersection are associative. (E 1 E 2 ) E 3 = E 1 (E 2 E 3 ) (E 1 E 2 ) E 3 = E 1 (E 2 E 3 ) 11. The selection operation distributes over, and. σ θ (E 1 E 2 ) = σ θ (E 1 ) σ θ (E 2 ) and similarly for and in place of Also: σ θ (E 1 E 2 ) = σ θ (E 1 ) E 2 and similarly for in place of, but not for 12. The projection operation distributes over union Π L (E 1 E 2 ) = (Π L (E 1 )) (Π L (E 2 ))

Multiple Transformations (Cont.)

Heuristic Optimizer strategies Apply the transformation rules in a specific order such that the cost converges to a minimum Cost based Simulated annealing Randomized generation of candidate QEP Problem, how to guarantee randomness

Memoization Techniques How to generate alternative Query Evaluation Plans? Early generation systems centred around a tree representation of the plan Hardwired tree rewriting rules are deployed to enumerate part of the space of possible QEP For each alternative the total cost is determined The best (alternatives) are retained for execution Problems: very large space to explore, duplicate plans, local maxima, expensive query cost evaluation. SQL Server optimizer contains about 300 rules to be deployed.

Memoization Techniques How to generate alternative Query Evaluation Plans? Keep a memo of partial QEPs and their cost. Use the heuristic rules to generate alternatives to built more complex QEPs r 1 r 2 r 3 r 4 r 4 Level n plans r 3 r 3 Level 2 plans r 2 r 1 x r 1 r 2 r 2 r 3 r 3 r 4 r 1 r 4 Level 1 plans

Ditching the optimizers Applications have different characteristics Platforms have different characteristics The actual state of computation is crucial A generic all-encompassing optimizer costmodel does not work M.Kersten 2008 44

Try to disambiguate decisions Code Inliner. Constant Expression Evaluator. Accumulator Evaluations. Strength Reduction. Common Term Optimizer. Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups. Code Parallizer. Replication Manager. Result Recycler. MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution. Alias Removal. Dead Code Removal. Garbage Collector. M.Kersten 2008 45

No data from persistent store to the memory trash Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 46

No data from persistent store to the memory trash Execution paradigms The MonetDB kernel is set up to accommodate different execution engines The MonetDB assembler program is Interpreted in the order presented Interpreted in a dataflow driven manner Compiled into a C program Vectorised processing X100 project M.Kersten 2008 47

MonetDB/x100 X100 query engine Combine Volcano model with vector processing. CPU cache RAM ColumnBM (buffer manager) All vectors together should fit the CPU cache Vectors are compressed Optimizer should tune this, given the query characteristics. networked ColumnBM-s M.Kersten 2008 48

No data from persistent store to the memory trash Varying the vector size on TPC-H query 1 mysql, oracle, db2 low IPC, overhead MonetDB X100 RAM bandwidth bound M.Kersten 2008 49

Query evaluation strategy Pipe-line query evaluation strategy Called Volcano query processing model Standard in commercial systems and MySQL Basic algorithm: Demand-driven evaluation of query tree. Operators exchange data in units such as records Each operator supports the following interfaces: open, next, close open() at top of tree results in cascade of opens down the tree. An operator getting a next() call may recursively make next() calls from within to produce its next answer. close() at top of tree results in cascade of close down the tree

Query evaluation strategy Pipe-line query evaluation strategy Evaluation: Oriented towards OLTP applications Granule size of data interchange Items produced one at a time No temporary files Choice of intermediate buffer size allocations Query executed as one process Generic interface, sufficient to add the iterator primitives for the new containers. CPU intensive Amenable to parallelization

Query evaluation strategy Materialized evaluation strategy Used in MonetDB Basic algorithm: for each relational operator produce the complete intermediate result using materialized operands Evaluation: Oriented towards decision support queries Limited internal administration and dependencies Basis for multi-query optimization strategy Memory intensive Amendable for distributed/parallel processing

TPC-H TPC-H 60K rows line_item table Comfortably fit in memory Performance in milliseconds ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

TPC-H Scale-factor 1 6M row line-item table Out of the box performance Queries produce empty or erroneous results ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory TPC-H

ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory TPC-H