PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007 2013) under grant agreem ent num be 318633
AXLE Project Analytics on Xtremely Large European data Secure Big Fast Hardware optimised Visual Analytics axleproject.eu
Topics Business Intelligence & Architecture BI Performance Feature Effectiveness Benchmark Analysis & Opportunities New Features in Progress
Business Intelligence ETL Reporting Ad-hoc queries Data Mining Many query types Counting Summarisation Strategic Analysis Analytics
BI Architecture SQL was invented for Business Intelligence Classic DW DB2 v Teradata Specialist Databases OLTP DW
Specialist OLTP Problems M ongodb Joins don't scale! V oltdb No concurrency All SQL must run in same duration Partial SQL implementation
Specialist DW Problems Second specialist system required ETL middleware also often required for loadin Data delayed on route to second system Frequently highly compressed, so read only o difficult to maintain
Get Real Big Data 99% of databases are <100GB Real Time results Business Intelligence required 24x7 Closed loop processing requires fast response SQL is much easier to use than alternatives More expressive and easier to use Already the de facto standard for BI
Minimal Approach Emphasise that additional BI technology will not reduce costs and may not offer solutions Keep Business Intelligence on PostgreSQL Use Hot Standby to expand capacity and isolate Business Intelligence workloads Minimise ETL whenever possible Gain benefits of SQL and concurrency Immediate access to data
Things To Learn Query performance is important Custom/special data structures are important in increasing performance Stale answers are acceptable for many situations
BI Feature Effectiveness Problem 1: Get the work into the database Problem 2: Speed up the work in the database +++++ Work Avoidance ++ Algorithmic Improvement + Brute Force
Orange Data Mining Orange 3.0 generates SQL for all data flows Directly utilises the power of databases Integrates with PostgreSQL
BI Tuning Opportunities COPY batch optimisations defeated Btree insert bottlenecks on large data loads Aggregate Optimisation Use sum()/count() not avg() Join Estimate/Actual Mismatch Use enable_nestloops = off Plan Pushdown Manual SQL rewrite
Speed Up: Work Avoidance Cacheing Result Cache Materialized Views Approximation Partition Elimination Improved Optimisation
Speed Up: Algorithms Compression Column Orientation Vectors Hardware approaches
Speed Ups: Brute Force Parallel Query is a brute force approach Gains in performance come from additional utilisation of resources, not from being smarter Reduces overall concurrency Still requires extensive optimizer changes The industry thinks we need it Some queries do require it PostgreSQL should do this, 2ndQuadrant can, will and has already helped
9.4 BI Features In-Progress Min Max Indexes Parallel Sort & Parallel Query infrastructure Materialized Views++ Multi-core scalability gains (lwlocks) (DDL Locking impact reductions) (Row Level Security)
Min Max Indexes (9.4) Automatic Partitioning Store min and max tuples for each page range Use theorem proving to avoid sections of scan Covers all columns, not just defined partition key Can be added easily to existing applications
Min Max Index results 2 GB table MinMax B-Tree Index build time 11s 96s Index size 24kB 1.1GB Load time w index 1 x2-3 Index SEL (1 row) x2-3 1 Index SEL (many) same same
MinMax Indexes Does not require complex DDL Generate almost no index inserts Fits in RAM even for Petabytes of data Generate almost no additional WAL Works well with Hot Standby data warehousing Only works with some data distributions Additional indexing may be needed
PostgreSQL BI Roadmap 9.4 10.0 10.1 Advanced Business Intelligence High Security Online Change Very Large Database
2ndQuadrant Consulting, Migration Open Source Development Training Support, RemoteDBA
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007 2013) under grant agreem ent num be 318633