A Glance over MonetDB
November 22nd, 2011, at Nopsar
Fabian.Groffen@cwi.nl, Ying.Zhang@cwi.nl
Centrum Wiskunde & Informatica
Background
History of MonetDB
- 1979-1992: first relational kernel
- 1993-1995: BAT-based kernel
- 1996-2003: spin-off Data Distilleries
- 2003-2007: Open Source (v4)
- 2008-20??: MonetDB v5
MonetDB Principles
- Full vertical fragmentation: always! (BATs)
- RISC approach to databases
- Optimised for in-memory processing
- Operator-at-a-time bulk processing
- CPU and memory cache optimised
Traditional DBMSs
- Row-based
- Buffer managers, pages, ...
- (Magnetic) disk I/O conscious
- Tuple-at-a-time volcano-style processing
- Mostly disk (index) bound (for speed)
Research Nature
- MonetDB is different by design
- Educated guesses have proven useful
- Open Architecture (pluggable/extensible)
- Experimental research additions
The Vision
- A column-store long before column-stores became known: a pioneer in the field
- "We can't solve problems by using the same kind of thinking we used when we created them."
Problems Seen
- From OLTP to OLAP, BI, Data Mining
- DBMSs on modern processors: 60-90% idle
- Waiting for memory to arrive at CPU
- Non-utilised caches and CPU features
CPU niceness
CPU usage
Why are we waiting?
CPU is 60%-90% idle, waiting for memory:
- L1 data stalls
- L1 instruction stalls
- L2 data stalls
- TLB stalls
- Branch mispredictions
- Resource stalls
Memory Wall
Trip to memory = 1000s of instructions!
Memory Hierarchy
Processing
SELECT id, name, (age-30)*50 AS bonus FROM people WHERE age > 30

Simple hardcoded semantics:
    /* Subtract the constant val from every value of an int column. */
    void batcalc_minus_int(int *res, int *col, int val, int n)
    {
        for (int i = 0; i < n; i++)
            res[i] = col[i] - val;
    }

CPU: give it nice code!
- few dependencies (control, data)
- CPU gets out-of-order execution
- compiler can e.g. generate SIMD
One loop for an entire column:
- no per-tuple interpretation
- arrays: no record navigation
- better instruction cache locality
Internals
Software Stack
- GDK (BAT Kernel)
- MonetDB 5
- MAL interpreter
- Optimiser stack
- Execution/scheduler
- SQL to MAL translator
- MonetDB daemon
A Row-store
Early 80s: tuple storage structures were simple
    OK   John   32   Houston
    OK   Mary   31   Houston
Disk Pages
    32   John   Houston
    31   Mary   Houston
A Column-store
- A column orientation is simple and acts like an array
- Attributes of a tuple are correlated by offset
Binary Association Tables
[Figure: row-store versus column-store layout]
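To make the contrast between the two layouts concrete, the following C sketch stores the example tuples from the row-store slide both ways. The names `person_row`, `people_columns`, `sum_age_rows` and `sum_age_column` are illustrative assumptions and not MonetDB identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Row-store: all attributes of one tuple are stored together. */
struct person_row {
    const char *name;
    int age;
    const char *city;
};

/* Column-store: one array per attribute; position i in every array
 * belongs to the same tuple (attributes are correlated by offset). */
struct people_columns {
    const char **name;
    const int *age;
    const char **city;
    size_t count;
};

/* Summing ages over rows strides across whole records. */
long sum_age_rows(const struct person_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].age;
    return sum;
}

/* Summing ages over a column reads one contiguous int array. */
long sum_age_column(const struct people_columns *t)
{
    assert(t != NULL);
    long sum = 0;
    for (size_t i = 0; i < t->count; i++)
        sum += t->age[i];
    return sum;
}
```

Both functions compute the same result; the column variant touches only the bytes of the attribute it needs.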
Data Organisation
    head (dense sequence)   tail
    100                     10
    101                     11
    102                     12
    103                     14
    104                     18
- head and tail stored as separate files, memory mapped
- head and tail columns are in fact fixed-width, C-like arrays
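A dense head means that a head value maps to an array position by plain offset arithmetic. The sketch below illustrates that idea under simplifying assumptions: the struct `dense_bat`, its field `seqbase` and the function `dense_bat_find` are hypothetical names chosen for this example, not the GDK data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified BAT with a dense head and an int tail. */
struct dense_bat {
    unsigned long seqbase; /* head value of the first entry, e.g. 100 */
    const int *tail;       /* fixed-width tail stored as a plain C array */
    size_t count;          /* number of entries */
};

/* Returns 1 and stores the tail value in *out if head is present, else 0. */
int dense_bat_find(const struct dense_bat *b, unsigned long head, int *out)
{
    assert(b != NULL && out != NULL);
    if (head < b->seqbase || head - b->seqbase >= b->count)
        return 0;
    *out = b->tail[head - b->seqbase]; /* offset arithmetic, no search */
    return 1;
}
```

With the values from the slide (sequence base 100, tail 10, 11, 12, 14, 18), looking up head 103 reads array position 3.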
Tail Heaps
    head   tail
    100    0x01
    101    0x04
    102    0x08
    103    0x04
    104    0x25
- heap: John, Mary, ... (the tail holds offsets into the heap)
- best effort duplicate elimination
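The idea of a tail that stores offsets into a separate heap of variable-width values can be sketched in C as follows. This is a minimal illustration, not the GDK heap: `str_heap`, `str_heap_put`, `str_heap_get` and `HEAP_CAPACITY` are hypothetical names, and the sketch finds every duplicate with a linear scan, whereas "best effort" suggests the real implementation may not always do so.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_CAPACITY 256

/* Hypothetical string heap: NUL-terminated values packed back to back. */
struct str_heap {
    char data[HEAP_CAPACITY];
    size_t used;
};

/* Stores s in the heap and returns its byte offset, or (size_t)-1 if the
 * heap is full. An equal value that is already present is reused. */
size_t str_heap_put(struct str_heap *h, const char *s)
{
    assert(h != NULL && s != NULL);
    size_t len = strlen(s) + 1;
    /* Linear scan for an existing copy (kept simple for the sketch). */
    for (size_t off = 0; off < h->used; off += strlen(h->data + off) + 1)
        if (strcmp(h->data + off, s) == 0)
            return off;
    if (len > HEAP_CAPACITY - h->used)
        return (size_t)-1;
    memcpy(h->data + h->used, s, len);
    h->used += len;
    return h->used - len;
}

/* Resolves a tail offset back to the string it refers to. */
const char *str_heap_get(const struct str_heap *h, size_t off)
{
    assert(h != NULL && off < h->used);
    return h->data + off;
}
```

Two tuples with the same string then carry the same offset in the tail, as heads 101 and 103 do in the slide.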
Accelerators
- hash-based access
    head   tail
    100    10
    101    11
    102    12
    103    14
    104    18
- column properties: key-ness, non-null, dense, ordered
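Column properties matter because they let the kernel pick a cheaper access path. As a hedged illustration, the sketch below uses an "ordered" flag to choose between binary search and a linear scan; `int_column` and `column_find` are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column descriptor carrying an "ordered" property flag. */
struct int_column {
    const int *vals;
    size_t count;
    int ordered; /* non-zero if vals is sorted in ascending order */
};

/* Returns the position of the first value equal to key, or -1 if absent. */
long column_find(const struct int_column *c, int key)
{
    assert(c != NULL);
    if (c->ordered) {
        /* Ordered column: lower-bound binary search. */
        size_t lo = 0, hi = c->count;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (c->vals[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo < c->count && c->vals[lo] == key) ? (long)lo : -1;
    }
    /* No useful property known: fall back to a linear scan. */
    for (size_t i = 0; i < c->count; i++)
        if (c->vals[i] == key)
            return (long)i;
    return -1;
}
```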
GDK processing model
- Bulk processing (full materialisation)
- Binary algebra core: select, join, semijoin, outerjoin, union, intersection, diff, group, count, max, min, sum, avg, reverse, mirror, mark
- Runtime operational optimisation
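Bulk processing with full materialisation can be sketched as a chain of small operators, each consuming whole arrays and producing a complete intermediate result. The example below evaluates `(age-30)*50` for `age > 30` in that style; the function names are assumptions made for this sketch and do not correspond to actual GDK routines.

```c
#include <assert.h>
#include <stddef.h>

/* select: write the positions of all values > bound into cand and
 * return how many were found. cand needs room for n entries. */
size_t select_gt_int(size_t *cand, const int *col, int bound, size_t n)
{
    assert(cand != NULL && col != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] > bound)
            cand[k++] = i;
    return k;
}

/* fetch: materialise the col values at the candidate positions. */
void fetch_int(int *res, const int *col, const size_t *cand, size_t k)
{
    for (size_t i = 0; i < k; i++)
        res[i] = col[cand[i]];
}

/* Element-wise subtraction of a constant, one loop per operator. */
void minus_int_const(int *res, const int *col, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] - val;
}

/* Element-wise multiplication by a constant. */
void mul_int_const(int *res, const int *col, int val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = col[i] * val;
}
```

A caller runs `select_gt_int`, then `fetch_int`, `minus_int_const` and `mul_int_const` in sequence, passing each operator's output array as the next operator's input. Every step is a tight loop over arrays, at the cost of storing each intermediate result in full.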
GDK algorithms
- Heavy use of code expansion
- Fast, branchless code paths
- ~1500 selection routines
- Runtime selection of best algorithm for current situation
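Code expansion and branchless code paths can be illustrated with a C macro that stamps out one selection routine per type and comparison operator. This is only a sketch of the general technique: the macro `DEFINE_SELECT` and the generated names are hypothetical, and whether the compiler really emits branch-free machine code depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Expands to one selection routine per (name, type, operator).
 * The loop body has no if: the position is always written and the
 * output cursor advances by the 0/1 result of the comparison.
 * cand needs room for n entries. */
#define DEFINE_SELECT(NAME, TYPE, OP)                                  \
    size_t NAME(size_t *cand, const TYPE *col, TYPE bound, size_t n)   \
    {                                                                  \
        size_t k = 0;                                                  \
        assert(cand != NULL && col != NULL);                           \
        for (size_t i = 0; i < n; i++) {                               \
            cand[k] = i;            /* always write the position */    \
            k += (col[i] OP bound); /* keep it only on a match */      \
        }                                                              \
        return k;                                                      \
    }

DEFINE_SELECT(sel_gt_int, int, >)
DEFINE_SELECT(sel_lt_int, int, <)
DEFINE_SELECT(sel_gt_dbl, double, >)
```

Multiplying the number of types by the number of operators and column properties in this way is one plausible route to a very large set of specialised routines.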
Maintenance
Knoblessness
- MonetDB is host-oriented
- We follow the "no knobs" principle
- MonetDB aims to maintain its own databases
- TODO: we need vacuum for deletes
- Upgrades on new releases are in-place
Backups
- dbfarm can be copied verbatim
  - as long as the server won't change it, which means it is either stopped or suspended
  - only works on the same architecture (rarely a problem these days, most is x86_64)
- Data can be dumped to SQL
Deletes
- A DELETE does not remove for real! :(
- On frequent DELETE scenarios, tables need to be reloaded to free up space:
  - dump/restore
  - CREATE TABLE ... AS SELECT ... WITH DATA
- Alternatively, delete by dropping entire tables
Code generation
Inspection from SQL
Prefix a query by:
- PLAN to get the relational plan, independent of data
- EXPLAIN to get the MAL plan, i.e. what will really be executed
- TRACE to see each instruction of the MAL plan, prefixed by its execution time in microseconds
Optimisers
- Strategic optimizer: exploit the semantics of the language; rely on heuristics
- Tactical MAL optimizer: no changes in front-ends and no direct human guidance; minimal changes in the engine
- Operational optimizer: exploit everything you know at runtime; re-organize if necessary
Pipeline: SQL -> MAL -> Tactical Optimizer -> MAL -> MonetDB Kernel (inside the MonetDB Server)
Optimisers: tactical MAL optimizer example
MAL before the tactical optimizer:
    x1:bat[:oid,:dbl] := sql.bind("sys","photoobjall","ra",0);
    x14 := algebra.uselect(x1,a0,a1);
MAL after the tactical optimizer:
    y1:bat[:oid,:dbl] := bpm.take("sys_photoobjall_ra");
    y2 := bpm.new(:oid,:oid);
    barrier rs := bpm.newiterator(y1,a0,a1);
    t1 := algebra.uselect(rs,a0,a1);
    bpm.addsegment(y2,t1);
    redo rs := bpm.hasmoreelements(y1,a0,a1);
    exit rs;
Examples
- Code Inliner
- Constant Expression Evaluator
- Accumulator Evaluations
- Strength Reduction
- Common Term Optimizer
- Join Path Optimiser
- Ranges Propagation
- Operator Cost Reduction
- Foreign Key handling
- Aggregate Groups
- Code Parallelizer
- Replication Manager
- Result Recycler
- Dynamic Query Scheduler
- Alias Removal
- Dead Code Removal
- Garbage Collector
Usage
Research Usage
- Amsterdam
  - Core DBMS Research
  - TIJAH: Multi-Media IR
  - Data Mining, GIS, Astronomy, RDF/SPARQL, Streams, ...
- Universität Tübingen (with UTwente & ...)
  - Pathfinder: XQuery compiler
- Knowledge Discovery Lab, UMass, Amherst
  - Proximity: OpenSource relational knowledge
Commercial Usage
- Data Distilleries (spin-off, now part of SPSS -> IBM), Amsterdam
  - Commercial Data-Mining & CRM Software
  - Many banks & insurance companies in NL
- Pentaho
  - MonetDB as supported analytic database platform
- Coupling to Infobright
Extendability
MonetDB Sources
- Using Mercurial, a distributed VCS, at http://dev.monetdb.org/hg/monetdb/
- Release Managers create release branches (Aug2011, Dec2011, ...) and tags (SP-1, ...)
- Commits are propagated from/to stable, candidate and development branches
- Nightly regression testing of branches
Release Cycle
- Deliver releases at regular intervals
- Predictable for both devs and users
- Keeps gap between devs and users low
Classic School Theory
unlock tree -> slowdown -> branch -> test & fix -> release -> fixes
OpenBSD Release Cycle
unlock tree -> normal development (big commits now) -> slowdown (API/ABI locks) -> lock tree -> branch -> everyone tests -> release -> unlock tree (next cycle)
OpenBSD?
- OS with strong focus on security and stability
- their release cycle might be too rigid for us (e.g. tree locks, single version development)
- good things:
  - features are committed at the start
  - the final release process should be minor
MonetDB Cycle
- normal development: big commits now
- release N: initial release -> SP1 -> SP2 (fixes)
- release N+1
Branching
- One of the strong points of DVCSs
- Hg allows easy merging
- Each clone is a branch itself
- Extremely simple to keep changes local
Mercurial
- Each clone contains full history/data
- Pulling changes as well as pushing
- Local in-house master clone pulling in changes from e.g. dev.monetdb.org
- Staged/selected pushes or pull requests of fixes back to monetdb.org (or sent through hg email my-bugfix-rev)
Considerations
- MonetDB devs stop bug-fixes on a branch when the follow-up one becomes a release
- Release branches maintain API/ABI compatibility
- Database format upgrade path is only supported from the previous release
- It is best to stay with current release branches
MonetDB Team Spirit
- Bring all fixes back to the monetdb.org codebase
- Help develop solutions, have primitives available in the monetdb.org codebase
- Share our minds to help find good (code) solutions, or migrations