Scalability And Performance Improvements In PostgreSQL 9.5


Amit Kapila, 2015.06.19
2013 EDB. All rights reserved.

Contents
- Read Scalability
- Further Improvements In Read Operation
- Other Performance Work In 9.5
- Page Writes
- Write Scalability

Read Scalability
What is read scalability? SELECT operations should scale as the number of sessions increases, given enough CPUs. In practice they don't, because of locking. We are mostly concerned with workloads where all data is in memory.

Read Scalability
Good boost in scalability:
- when data fits in shared_buffers
- when data fits in RAM

Read Scalability: data fits in shared_buffers
pgbench -S -M prepared, PG9.5dev as of commit 62f5e4; median of 3 5-minute runs; scale_factor = 300, max_connections = 300, shared_buffers = 8GB.
[Chart: TPS vs. client count (1 to 256) for 9.4 and HEAD]
Machine used: IBM POWER-8 with 24 cores, 192 hardware threads, 492GB RAM.

Read Scalability
In 9.4 throughput peaks at 32 clients; it now peaks at 64 clients, with up to ~98% improvement, and it is better at all higher client counts starting from 32 clients. The main work leading to this improvement is commit ab5194e6 (Improve LWLock scalability).

Read Scalability
The previous implementation had a bottleneck around the spinlock acquired for LWLock acquisition and release; 9.5 changes the LWLock implementation to manipulate the lock state with atomic operations.

Read Scalability: data fits in RAM
pgbench -S -M prepared, PG9.5dev as of commit 62f5e4; median of 3 5-minute runs; scale_factor = 1000, max_connections = 300, shared_buffers = 8GB.
[Chart: TPS vs. client count (1 to 256) for 9.4 and HEAD]

Read Scalability
Performance improvement: 25% at 32 clients, 96% at 64 clients. Commits leading to this improvement:
- 5d7962c6 (Change locking regimen around buffer replacement)
- 3acc10c9 (Increase the number of buffer mapping partitions to 128)

Read Scalability
Two main bottlenecks:
- the BufFreeList LWLock was acquired to find a free buffer for a page
- to change a buffer's association in the buffer mapping hash table, an LWLock is acquired on the hash partition the buffer belongs to; with just 16 partitions, there was huge contention when multiple clients operated on the same partition

Read Scalability
To reduce the first bottleneck, a spinlock is now used, held just long enough to pop the freelist or advance the clock-sweep hand. To reduce the second, the number of buffer partitions was increased to 128. The crux of this improvement is that both bottlenecks had to be resolved together to see a major gain in scalability.


Further Improvements In Read Operation
- Dynahash tables: is the current number of partitions sufficient? The bottleneck is the spinlock that protects any addition to or deletion from the hash table (in particular nentries and the freelist).
- Snapshot acquisition: contends with transaction end.


Sorting Improvements
Use abbreviated keys for faster sorting of text, numeric, and datum values. This can be much faster than the old way of sorting when the first few bytes of the string are usually sufficient to resolve the comparison.

Sorting Improvements
As an example:

create table stuff as
  select random()::text as a,
         'filler filler filler'::text as b,
         g as c
  from generate_series(1, 1000000) g;
SELECT 1000000
create index on stuff (a);
CREATE INDEX

On a PPC64 machine, this operation used to take 6.3 seconds before the feature; after it, just 1.9 seconds, a 3x improvement. Hooray!

PL/pgSQL Improvements
Impressive speed gains for PL/pgSQL functions that do element-by-element access or update of large arrays. Reduced I/O casting: binary casting is now used for assignments among non-identical variable types wherever possible.

BRIN (Block Range Index)
- Stores only bounds per block range (default: 128 blocks)
- Very small indexes
- Scans all blocks of the ranges that may match
- Useful for scanning large tables

Parallel vacuumdb
vacuumdb can use concurrent connections: add -j <n> to the command line. This option reduces processing time but also increases the load on the database server, so use it cautiously.

WAL Compression
Optional compression of full-page images (FPIs) in WAL; wal_compression = off by default.
- Smaller WAL
- Faster writes, faster replication
- Costs CPU
- Compresses only FPIs
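To try it, the parameter can be enabled in postgresql.conf (a minimal fragment; in 9.5 it is off by default):

```
wal_compression = on
```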

Reduce Lock Level
Lock levels are reduced to ShareRowExclusive for the following SQL:
- CREATE TRIGGER (but not DROP or ALTER)
- ALTER TABLE ENABLE TRIGGER
- ALTER TABLE DISABLE TRIGGER
- ALTER TABLE ADD CONSTRAINT FOREIGN KEY

Miscellaneous Performance Improvements
- Improved performance for index scans on ">" conditions: improvements of 5 to 30 percent.
- Improved speed of CRC calculation, which reduces WAL record formation time.
- Reduced memory allocations at transaction start: a small but measurable improvement for simple transactions.


Page Writes
Page writes for dirty buffers are done by:
- the checkpointer, when a checkpoint is triggered
- the bgwriter, when it is triggered
- a backend, when it needs to evict a dirty buffer, or for certain DDL such as ALTER TABLE SET TABLESPACE
Both the bgwriter and backends flush the page to the kernel; the real write is done by the kernel.

Page Writes
Tests showing write frequency. All tests were done on the POWER-8 machine.
Common non-default settings:
shared_buffers = 8GB; min_wal_size = 15GB; max_wal_size = 20GB;
checkpoint_timeout = 35min; maintenance_work_mem = 1GB;
checkpoint_completion_target = 0.9; autovacuum = off;
synchronous_commit = off; scale_factor = 3000
Test used to collect data:
./pgbench -c 64 -j 64 -T 1800 -M prepared postgres

Page Writes
Default:   bgwriter_delay=200ms; bgwriter_lru_maxpages=100;  bgwriter_lru_multiplier=2.0
non_def_1: bgwriter_delay=10ms;  bgwriter_lru_maxpages=800;  bgwriter_lru_multiplier=4.0
non_def_2: bgwriter_delay=10ms;  bgwriter_lru_maxpages=1000; bgwriter_lru_multiplier=10.0

Columns                 Default     non_def_1   non_def_2
checkpoints_timed       0           0           0
checkpoints_req         14          14          14
checkpoint_write_time   517261      523436      487685
checkpoint_sync_time    572158      672476      671569
buffers_checkpoint      4336630     4262901     4167658
buffers_clean           849710      10383550    10427094
maxwritten_clean        8328        618         1775
buffers_backend         9607706     104214      103963
buffers_backend_fsync   0           0           0
buffers_alloc           22417504    21907092    21838094

Page Writes: Observations
- Backend writes (buffers_backend) are reduced significantly by changing the bgwriter-specific settings.
- Even at the most aggressive settings (non_def_2), backend writes do not drop to zero.
- The reduced backend writes improve performance by only 3~4%: writing to the kernel is not that costly.


Write Scalability
What is write scalability? Write operations (INSERT/UPDATE/DELETE) should scale as the number of sessions increases, given enough CPUs. In practice they don't, because of locking done during the commit operation. We are mostly concerned with workloads where data fits in memory.

Write Scalability: Performance Data
Data is taken for two modes:
- synchronous_commit = on
- synchronous_commit = off
Two scale factors are used:
- all data fits in shared_buffers (scale_factor = 300)
- data doesn't fit in shared_buffers but fits in RAM (scale_factor = 3000)

Write Scalability
Non-default parameters: min_wal_size=15GB; max_wal_size=20GB; checkpoint_timeout=35min; maintenance_work_mem=1GB; checkpoint_completion_target=0.9; autovacuum=off

Write Scalability
Data fits in shared_buffers (scale_factor = 300): performance increases up to 64 clients, with TPS at 64 clients approximately 75 percent higher than at 8 clients.
Data doesn't fit in shared_buffers but fits in RAM (scale_factor = 3000): performance increases up to 32 clients, with TPS 64 percent higher than at 8 clients, and falls off from there.

Write Scalability
Non-default parameters: min_wal_size=15GB; max_wal_size=20GB; checkpoint_timeout=35min; maintenance_work_mem=1GB; checkpoint_completion_target=0.9; autovacuum=off

Write Scalability
Data fits in shared_buffers (scale_factor = 300): performance increases up to 64 clients, with TPS at 64 clients approximately 189 percent higher than at 8 clients, which sounds good.
Data doesn't fit in shared_buffers but fits in RAM (scale_factor = 3000): a fairly flat graph, with some improvement up to 16 clients (TPS approximately 22 percent higher than at 8 clients) and flat thereafter.
When the data fits in shared_buffers (scale_factor = 300), TPS at higher client counts (64) in synchronous_commit = on mode becomes equivalent to TPS in synchronous_commit = off mode, which suggests that either there is more contention around CLogControlLock in async mode, or there is no major contention due to WAL writing in such workloads.

Write Scalability: General Observations
For both asynchronous and synchronous commit, when the data doesn't fit in shared_buffers (scale_factor = 3000) the TPS is quite low; one reason is that backends may be performing writes themselves.

Write Scalability: Concurrency Bottlenecks
To my knowledge, the locks that can lead to contention for this workload are:
a. ProcArrayLock (taken for snapshots and at transaction commit)
b. WALWriteLock (taken to perform WAL writes)
c. CLogControlLock (taken to read and write transaction status)
d. WALInsertLocks (taken to write data into WAL buffers)

Questions?

Thanks!