Fabian.Groffen@cwi.nl Ying.Zhang@cwi.nl




A Glance over MonetDB
November 22nd, 2011, at Nopsar
Fabian.Groffen@cwi.nl, Ying.Zhang@cwi.nl
Centrum Wiskunde & Informatica

Background

History of MonetDB
- 1979-1992: first relational kernel
- 1993-1995: BAT-based kernel
- 1996-2003: spin-off Data Distilleries
- 2003-2007: Open Source (v4)
- 2008-20??: MonetDB v5

MonetDB Principles
- Full vertical fragmentation: always! (BATs)
- RISC approach to databases
- Optimised for in-memory processing
- Operator-at-a-time bulk processing
- CPU and memory cache optimised

Traditional DBMSs
- Row-based
- Buffer managers, pages, ...
- (Magnetic) disk I/O conscious
- Tuple-at-a-time volcano-style processing
- Mostly disk (index) bound (for speed)

Research Nature
- MonetDB is different by design
- Educated guesses have proven useful
- Open Architecture (pluggable/extensible)
- Experimental research additions

The Vision
- A column-store long before column-stores became known: a pioneer in the field
- "We can't solve problems by using the same kind of thinking we used when we created them."

Problems Seen
- From OLTP to OLAP, BI, Data Mining
- DBMSs on modern processors: 60-90% idle
- Waiting for memory to arrive at CPU
- Non-utilised caches and CPU features

CPU niceness

CPU usage

Why are we waiting?
CPU is 60%-90% idle, waiting for memory:
- L1 data stalls
- L1 instruction stalls
- L2 data stalls
- TLB stalls
- Branch mispredictions
- Resource stalls

Memory Wall
Trip to memory = 1000s of instructions!

Memory Hierarchy

Processing

SELECT id, name, (age-30)*50 AS bonus FROM people WHERE age > 30

Simple hardcoded semantics:

    /* Subtract val from every element of col, writing the results to res. */
    void batcalc_minus_int(int *res, const int *col, int val, int n)
    {
        for (int i = 0; i < n; i++)
            res[i] = col[i] - val;
    }

CPU: give it nice code!
- few dependencies (control, data)
- CPU gets out-of-order execution
- compiler can e.g. generate SIMD

One loop for an entire column:
- no per-tuple interpretation
- arrays: no record navigation
- better instruction cache locality
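The operator-at-a-time idea behind the query above can be sketched as a chain of tight loops, each consuming and producing whole arrays. This is a minimal illustration, not MonetDB's actual kernel code: the function names (select_gt_int, fetch_int, minus_int, mul_int, bonus_query) and the use of a plain position array as the selection result are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Selection: store the positions of all rows with col[i] > val in cand.
 * Returns the number of qualifying rows; cand needs room for n entries. */
size_t select_gt_int(size_t *cand, const int *col, int val, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        cand[k] = i;           /* always store the position ...          */
        k += (col[i] > val);   /* ... but only keep it on a match        */
    }
    return k;
}

/* Fetch: res[j] = col[cand[j]] for every candidate position. */
void fetch_int(int *res, const int *col, const size_t *cand, size_t k)
{
    for (size_t j = 0; j < k; j++)
        res[j] = col[cand[j]];
}

/* Arithmetic operators: one loop per operator over a whole column. */
void minus_int(int *res, const int *col, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] - val;
}

void mul_int(int *res, const int *col, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] * val;
}

/* (age-30)*50 for all rows with age > 30 (hypothetical driver).
 * Each step fully materialises its result before the next one runs. */
size_t bonus_query(int *bonus, size_t *cand, const int *age, size_t n)
{
    assert(bonus != NULL && cand != NULL && age != NULL);
    size_t k = select_gt_int(cand, age, 30, n);
    fetch_int(bonus, age, cand, k);
    minus_int(bonus, bonus, 30, k);   /* in place */
    mul_int(bonus, bonus, 50, k);     /* in place */
    return k;
}
```

Calling bonus_query with an age column of {32, 31, 25, 40} selects positions 0, 1 and 3. No loop contains per-tuple interpretation; the only data-dependent work is the comparison inside the selection.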

Internals

Software Stack
- GDK (BAT Kernel)
- MonetDB 5
- MAL interpreter
- Optimiser stack
- Execution/scheduler
- SQL to MAL translator
- MonetDB daemon

A Row-store
Early 80s: tuple storage structures were simple

OK  John  32  Houston
OK  Mary  31  Houston

Disk Pages

32  John  Houston
31  Mary  Houston

A Column-store
- A column orientation is simple and acts like an array
- Attributes of a tuple are correlated by offset
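The difference between the two layouts can be sketched in C as an array of structs versus one array per attribute. The type and function names below (person_row, people_cols, sum_age_rows, sum_age_cols) are made up for this illustration and do not appear in MonetDB.

```c
#include <assert.h>
#include <stddef.h>

/* Row-store layout: one record per tuple (array of structs). */
struct person_row {
    char name[16];
    int  age;
    char city[16];
};

/* Column-store layout: one array per attribute.
 * Entry i of every array belongs to the same tuple (correlated by offset). */
struct people_cols {
    const char *const *name;
    const int         *age;
    const char *const *city;
    size_t             count;
};

/* Summing ages in the row layout strides over names and cities as well. */
long sum_age_rows(const struct person_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].age;
    return sum;
}

/* Summing ages in the column layout touches one dense int array only. */
long sum_age_cols(const struct people_cols *p)
{
    assert(p != NULL);
    long sum = 0;
    for (size_t i = 0; i < p->count; i++)
        sum += p->age[i];
    return sum;
}
```

Both functions compute the same result; the column version reads only the bytes it needs, which is the property the cache-related slides rely on.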

Binary Association Tables
(figure: row-store, column-store)

Data Organisation

head  tail
100   10
101   11
102   12
103   14
104   18

(head: dense sequence)
- head and tail stored as separate files
- memory mapped
- head and tail columns, being fixed width, are in fact C-like arrays
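Because the head is a dense sequence, a lookup by head value reduces to array indexing on the tail. The following is a simplified sketch; struct dense_bat, its seqbase field and bat_find are hypothetical names chosen for this example, not the GDK data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical view of a BAT with a dense head:
 * the head is not stored, only its first value. */
struct dense_bat {
    unsigned long seqbase;  /* first head value, e.g. 100 */
    size_t        count;    /* number of tuples */
    const int    *tail;     /* fixed-width tail values as a C array */
};

/* Look up the tail value for a head value.
 * Returns 1 and stores the value in *out if present, 0 otherwise. */
int bat_find(const struct dense_bat *b, unsigned long oid, int *out)
{
    assert(b != NULL && out != NULL);
    if (oid < b->seqbase || oid - b->seqbase >= b->count)
        return 0;
    *out = b->tail[oid - b->seqbase];  /* position = head value - seqbase */
    return 1;
}
```

With the table from the slide (seqbase 100, tail {10, 11, 12, 14, 18}), head value 103 maps to position 3 and therefore to tail value 14.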

Tail Heaps

head  tail
100   0x01
101   0x04
102   0x08
103   0x04
104   0x25

heap: John, Mary
- best effort duplicate elimination
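A variable-width column can be sketched as a fixed-width array of offsets into a separate string heap, where identical strings share one heap entry. This sketch uses a plain linear scan to find duplicates, which is a stand-in chosen for brevity and not the mechanism MonetDB uses; str_heap, heap_put, heap_get and HEAP_CAP are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_CAP 256  /* arbitrary capacity for this sketch */

/* String heap: NUL-terminated strings stored back to back. */
struct str_heap {
    char   data[HEAP_CAP];
    size_t used;
};

/* Return the heap offset of s, appending it only if it is not stored yet.
 * Returns (size_t)-1 when the heap is full. */
size_t heap_put(struct str_heap *h, const char *s)
{
    assert(h != NULL && s != NULL);
    size_t len = strlen(s) + 1;

    /* Duplicate elimination: walk the existing entries. */
    for (size_t off = 0; off < h->used; off += strlen(h->data + off) + 1)
        if (strcmp(h->data + off, s) == 0)
            return off;

    if (len > HEAP_CAP - h->used)
        return (size_t)-1;
    memcpy(h->data + h->used, s, len);
    size_t off = h->used;
    h->used += len;
    return off;
}

/* Resolve a tail offset back to its string. */
const char *heap_get(const struct str_heap *h, size_t off)
{
    assert(h != NULL && off < h->used);
    return h->data + off;
}
```

The tail column then holds only the returned offsets, so it stays a fixed-width C array even though the values themselves vary in length.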

Accelerators
- hash-based access

head  tail
100   10
101   11
102   12
103   14
104   18

Column properties:
- key-ness
- non-null
- dense
- ordered

GDK processing model
- Bulk processing (full materialisation)
- Binary algebra core: select, join, semijoin, outerjoin, union, intersection, diff, group, count, max, min, sum, avg, reverse, mirror, mark
- Runtime operational optimisation

GDK algorithms
- Heavy use of code expansion
- Fast, branchless code paths
- ~1500 selection routines
- Runtime selection of best algorithm for current situation
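Runtime selection of an algorithm from column properties can be sketched as follows: when a column is known to be ordered, an equality count uses binary search, otherwise it falls back to a branch-free scan. The struct int_column, its sorted flag and the function names are assumptions for this illustration and far simpler than the real selection routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column descriptor carrying one property flag. */
struct int_column {
    const int *val;
    size_t     count;
    int        sorted;  /* property: values are in ascending order */
};

/* Position of the first element >= key in an ascending array. */
static size_t lower_bound(const int *v, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count the values equal to key, picking the algorithm at runtime. */
size_t count_eq(const struct int_column *c, int key)
{
    assert(c != NULL);
    size_t n = 0;

    if (c->sorted) {
        /* Ordered column: binary search, then walk the run of matches. */
        size_t first = lower_bound(c->val, c->count, key);
        while (first + n < c->count && c->val[first + n] == key)
            n++;
        return n;
    }

    /* No useful property: full scan without a branch in the loop body. */
    for (size_t i = 0; i < c->count; i++)
        n += (c->val[i] == key);
    return n;
}
```

The caller does not choose the code path; the property stored with the column does, which is the idea behind picking the best algorithm for the current situation.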

Maintenance

Knoblessness
- MonetDB is host-oriented
- We follow the "no knobs" principle
- MonetDB aims to maintain its own databases
- TODO: we need vacuum for deletes
- Upgrades on new releases are in-place

Backups
- dbfarm can be copied verbatim
  - as long as the server won't change it, which means it is either stopped or suspended
  - only works on the same architecture (rarely a problem these days, most is x86_64)
- Data can be dumped to SQL

Deletes
- A DELETE does not remove for real! :(
- On frequent DELETE scenarios, tables need to be reloaded to free up space:
  - dump/restore
  - CREATE TABLE LIKE SELECT ... WITH DATA
  - delete by dropping entire tables

Code generation

Inspection from SQL
Prefix a query by:
- PLAN to get the relational plan, independent of data
- EXPLAIN to get the MAL plan, which is what will really be executed
- TRACE to see each instruction of the MAL plan, prefixed by its time in microseconds

Optimisers

Strategic optimizer:
- Exploit the semantics of the language
- Rely on heuristics

Tactical MAL optimizer:
- No changes in front-ends and no direct human guidance
- Minimal changes in the engine

Operational optimizer:
- Exploit everything you know at runtime
- Re-organize if necessary

(diagram labels: SQL, MAL, Tactical Optimizer, MAL, MonetDB Kernel, MonetDB Server)

Optimisers: MAL plan fragments shown on the slide

    x1:bat[:oid,:dbl] := sql.bind("sys","photoobjall","ra",0);
    x14 := algebra.uselect(x1,a0,a1);

    y1:bat[:oid,:dbl] := bpm.take("sys_photoobjall_ra");
    y2 := bpm.new(:oid,:oid);
    barrier rs := bpm.newiterator(y1,a0,a1);
    t1 := algebra.uselect(rs,a0,a1);
    bpm.addsegment(y2,t1);
    redo rs := bpm.hasmoreelements(y1,a0,a1);
    exit rs;

Examples
- Code Inliner
- Constant Expression Evaluator
- Accumulator Evaluations
- Strength Reduction
- Common Term Optimizer
- Join Path Optimiser
- Ranges Propagation
- Operator Cost Reduction
- Foreign Key handling
- Aggregate Groups
- Code Paralleliser
- Replication Manager
- Result Recycler
- Dynamic Query Scheduler
- Alias Removal
- Dead Code Removal
- Garbage Collector

Usage

Research Usage
- Amsterdam
  - Core DBMS Research
  - TIJAH: Multi-Media IR
  - Data Mining, GIS, Astronomy, RDF/SPARQL, Streams, ...
- Universität Tübingen (with UTwente & )
  - Pathfinder: XQuery compiler
- Knowledge Discovery Lab, UMass, Amherst
  - Proximity: OpenSource relational knowledge

Commercial Usage
- Data Distilleries (spin-off, now part of SPSS -> IBM), Amsterdam
  - Commercial Data-Mining & CRM Software
  - Many banks & insurance companies in NL
- Pentaho
  - MonetDB as supported analytic database platform
  - Coupling to Infobright

Extendability

MonetDB Sources
- Using Mercurial, a distributed VCS, at http://dev.monetdb.org/hg/monetdb/
- Release Managers create release branches (Aug2011, Dec2011, ...) and tags (SP-1, ...)
- Commits are propagated from/to stable, candidate and development branches
- Nightly regression testing of branches

Release Cycle
- Deliver releases at regular intervals
- Predictable for both devs and users
- Keeps the gap between devs and users small

Classic School Theory
(timeline: unlock tree, slowdown, branch, test & fix, release, fixes)


OpenBSD Release Cycle
(timeline: unlock tree, normal development (big commits now), slowdown (API/ABI locks), lock tree, branch, everyone tests, release, unlock tree (next cycle))


OpenBSD?
- OS with strong focus on security and stability
- their release cycle might be too rigid for us (e.g. treelocks, single version development)
- good things:
  - features are committed at the start
  - the final release process should be minor

MonetDB Cycle
(timeline: normal development (big commits now), initial release, SP1, SP2 (fixes); release N, release N+1)

Branching
- One of the strong points of DVCSs
- Hg allows easy merging
- Each clone is a branch itself
- Extremely simple to keep changes local

Mercurial
- Each clone contains full history/data
- Pulling changes as well as pushing
- Local in-house master clone pulling in changes from e.g. dev.monetdb.org
- Staged/selected pushes/pull requests of fixes back to monetdb.org (or sent through hg email my-bugfix-rev)

Considerations
- MonetDB devs stop bug-fixes on a branch when the follow-up one becomes a release
- Release branches maintain API/ABI compatibility
- Database format upgrade path is only supported from the previous release
- It is best to stay with current release branches

MonetDB Team Spirit
- Bring all fixes back to monetdb.org codebase
- Help develop solutions, have primitives available on monetdb.org codebase
- Share our minds to help find good (code) solutions, or migrations