Gleb Arshinov, Alexander Dymo PGCon 2010 PostgreSQL as a secret weapon for high-performance Ruby on Rails applications




About
- Gleb Arshinov, CEO, gleb@pluron.com
- Alexander Dymo, Director of Engineering, adymo@pluron.com
- Acunote (www.acunote.com): online project management and Scrum software
- ~7000 customers
- Hosted on own servers and on customers' servers
- nginx + mongrel
- PostgreSQL 8.4

Web Development Pains
- Performance
- Data integrity
- Developer productivity

Types of Web Apps
- Web apps range from Digg to a custom accounting system
- Your app is somewhere in between

Outline
- Rails as ORM
- Optimizing Rails With PostgreSQL
- PostgreSQL Limitations
- PostgreSQL Approaches
- Optimizing Database

Rails as ORM > Myths and Realities
Myth: Rails is not close to SQL.
Reality: it is close to SQL:
    author = Author.find(:first)
    # SQL: select * from authors limit 1;
    articles = author.articles
    # SQL: select * from articles where author_id = 1
    author_name = "Orwell"
    author = Author.find(:all, :conditions => ["name = ?", author_name])
    # SQL: select * from authors where name = 'Orwell'
Drop down to SQL easily:
    author = Author.find_by_sql("select * from authors where authors.birthday > now()")

Rails as ORM > Attribute Preloading
The conventional Rails way is not so bad:
    Tasks:      id serial, name varchar
    Tags:       id serial, name varchar
    Tasks_Tags: tag_id integer, task_id integer
    tasks = Task.find(:all, :include => :tags)
    # SQL: select * from tasks
    # SQL: select * from tags inner join tasks_tags on tags.id = tasks_tags.tag_id
    #      where tasks_tags.task_id in (1,2,3,..)

Rails as ORM > Multiplying Queries
But the classical ORM problem exists:
    create table Task ( id serial not null, parent_id integer )
    class Task < ActiveRecord::Base
      acts_as_tree
    end
It uses N+1 database queries to load N nodes from the tree:
    (root)
      - 1
          - 11
              - 111
              - 112
          - 12
      - 2
          - 21
    select * from tasks where parent_id is null
    select * from tasks where parent_id = 1
    select * from tasks where parent_id = 11
    select * from tasks where parent_id = 111
    select * from tasks where parent_id = 112
    select * from tasks where parent_id = 12
    select * from tasks where parent_id = 2
    select * from tasks where parent_id = 21

Rails as ORM > Multiplying Queries
    create table Task ( id serial not null, parent_id integer )
    class Task < ActiveRecord::Base
      acts_as_tree
    end
Should rather be:
    select * from tasks
      left outer join
        (select id as parents_id, parent_id as parents_parent_id from tasks) as parents
        on (tasks.parent_id = parents_id)
      left outer join
        (select id as parents_parents_id from tasks) as parents_parents
        on (parents_parent_id = parents_parents_id)
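Whichever single query is used, the rows come back flat and the tree still has to be rebuilt in Ruby. The following is a minimal sketch of that step, assuming the rows arrive as hashes with `:id` and `:parent_id`. The sample data mirrors the example tree, and `build_tree` and `walk` are hypothetical helpers, not part of acts_as_tree.

```ruby
# Rows as they might come back from one "select id, parent_id from tasks" query.
# The data is illustrative; it mirrors the example tree (1, 2, 11, 12, 21, 111, 112).
ROWS = [
  { :id => 1,   :parent_id => nil },
  { :id => 2,   :parent_id => nil },
  { :id => 11,  :parent_id => 1 },
  { :id => 12,  :parent_id => 1 },
  { :id => 21,  :parent_id => 2 },
  { :id => 111, :parent_id => 11 },
  { :id => 112, :parent_id => 11 }
]

# Hypothetical helper: groups row ids by parent_id in one pass, so child
# lookups become hash reads instead of one database query per node.
def build_tree(rows)
  children = Hash.new { |hash, key| hash[key] = [] }
  rows.each { |row| children[row[:parent_id]] << row[:id] }
  children
end

# Depth-first walk that returns node ids in tree order.
def walk(children, parent_id = nil)
  children[parent_id].flat_map { |id| [id] + walk(children, id) }
end

TREE = build_tree(ROWS)
```

Calling `walk(TREE)` visits the nodes in the same order as the eight queries listed for acts_as_tree, but without touching the database again.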

Rails as ORM > Generating Queries
[Figure with callouts 1-4 marking the parts of the page listed below]
1. Task tree (3 levels) (+2 joins & 1 subselect)
2. Task tags (+2 subselects)
3. Task property counters (+4 subselects)
4. Last "timecell" values (+4 joins to get group-wise maximum)
etc... - 12 joins and subselects

Rails as ORM > Generating Queries
All this is done in 1 query in under 60 ms, even on an EeePC!
Equivalent Ruby code took up to 8 sec!
133x!

Rails as ORM > Query Tests
rev 834: Show past and future sprints in the list
    --- application_helper.rb
    +++ application_helper.rb
    @@ -456,8 +456,8 @@
     sprints = []
     sprints.concat current_project.sprints(:present)
    +sprints.concat current_project.sprints(:past)
    +sprints.concat current_project.sprints(:future)
Benchmark "Sprint 20 x (1+5) (C)":
    Before  0.87 ± 0.01
    After   0.88 ± 0.01
The benchmark barely moves, but the query test shows two extra "Sprint Load" queries:
    --- empty_controller_test.rb
    +++ empty_controller_test.rb
    @@ -79,11 +79,12 @@
     "Sprint Load",
    +"Sprint Load",
    +"Sprint Load",
     "common/_nav_dialog",
     "Project Load",

Rails as ORM > Query Tests
Query tests make sure we don't fall into the multiplying queries trap:
    def test_queries
      queries = track_queries do
        get :index
      end
      # expected value first, actual value second
      assert_equal [ "Task Load", "Tag Load", "Event Create", "SQL" ], queries
    end

Rails as ORM > Query Tests
    module ActiveSupport
      class BufferedLogger
        attr_reader :tracked_queries
        # Turning tracking on (or off) also resets the list of tracked queries
        def tracking=(val)
          @tracked_queries = []
          @tracking = val
        end
        # Record the label of every query logged while tracking is on
        def add_with_tracking(severity, message = nil, progname = nil, &block)
          @tracked_queries << $1 if @tracking && message =~ /3[56]\;1m(.* (Load|Create|Update|Destroy)) \(/
          @tracked_queries << $1 if @tracking && message =~ /3[56]\;1m(SQL) \(/
          add_without_tracking(severity, message, progname, &block)
        end
        alias_method_chain :add, :tracking
      end
    end
    class ActiveSupport::TestCase
      def track_queries(&block)
        RAILS_DEFAULT_LOGGER.tracking = true
        yield
        result = RAILS_DEFAULT_LOGGER.tracked_queries
        RAILS_DEFAULT_LOGGER.tracking = false
        result
      end
    end
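The two patterns in the logger patch can be tried outside Rails. The sketch below uses the same regular expressions, with the alternation written as `Load|Create|Update|Destroy`, and runs them over a few sample lines. The sample lines are an assumption about what colorized Rails 2.x log output looks like, and `query_label` is a hypothetical helper, not part of Rails.

```ruby
# Same patterns as in the logger patch, with the alternation spelled out.
NAMED_QUERY = /3[56]\;1m(.* (Load|Create|Update|Destroy)) \(/
RAW_SQL     = /3[56]\;1m(SQL) \(/

# Hypothetical helper: returns the tracked query label for a log line, or nil.
def query_label(message)
  return $1 if message =~ NAMED_QUERY
  return $1 if message =~ RAW_SQL
  nil
end

# Assumed sample lines in the colorized Rails 2.x log style.
SAMPLE_LOG = [
  "  \e[4;36;1mTask Load (0.5ms)\e[0m   \e[0;1mSELECT * FROM tasks\e[0m",
  "  \e[4;35;1mTag Load (0.3ms)\e[0m   \e[0mSELECT * FROM tags\e[0m",
  "  \e[4;36;1mSQL (0.1ms)\e[0m   \e[0;1mBEGIN\e[0m",
  "Rendering tasks/index"
]

# Lines that are not query log entries yield nil and are dropped.
LABELS = SAMPLE_LOG.map { |line| query_label(line) }.compact
```

Because the patterns key on the ANSI color codes, this approach presumably only works while log colorizing is enabled.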

Rails as ORM > Migrations
Use SQL DDL, not the Rails DSL (unless targeting multiple RDBMS)
Schema in SQL vs Rails parlance
Migration in SQL:
    execute "
      create table Foo (
        id serial not null,
        name varchar(20),
        bar_id integer,
        primary key (id),
        foreign key (bar_id) references Bar (id)
      );
    "
Migration in Rails parlance:
    create_table :foo do |t|
      t.string :name, :limit => 20
      t.references :bar
    end
    execute "alter table foo add foreign key (bar_id) references Bar (id)"

Rails as ORM > Data Integrity
- Myth: Rails does not support constraints
- Actually, it is not possible to assure data integrity in Rails
- Use constraints, rules, triggers and other database magic to protect data integrity, not to implement business logic
- FK constraints: everything should be RESTRICT; ON X SET NULL and CASCADE is a problem

Outline
- Rails as ORM
- Optimizing Rails with PostgreSQL
- PostgreSQL Limitations
- PostgreSQL Approaches
- Optimizing Database

Rails Performance > Ruby
- Good language, bad implementation
- Slow
- Unreliable
- Deal with it!

Rails Performance > Ruby
Compare to the database:
PostgreSQL:
    explain analyze select sin(2+2) as hard_stuff;
    QUERY PLAN
    ------------------------------------------------------------------
    Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
    Total runtime: 0.012 ms
Ruby:
    Benchmark.realtime{ Math.sin(2+2) }*1000
    > 0.027 ms
13x! (0.027 ms in Ruby vs 0.002 ms actual time in PostgreSQL)

Rails Performance > Rails
- Has a reputation of being slow
- Actually even slower
- Most of the time is spent in GC
- CPU bound
- Doesn't parallelize

Rails Performance > Performance Tests
Keep a set of benchmarks for the most frequent user requests. For example:
    Benchmark Burndown 120              0.70 ± 0.00
    Benchmark Inc. Burndown 120         0.92 ± 0.01
    Benchmark Sprint 20 x (1+5) (C)     0.45 ± 0.00
    Benchmark Issues 100 (C)            0.34 ± 0.00
    Benchmark Prediction 120            0.56 ± 0.00
    Benchmark Progress 120              0.23 ± 0.00
    Benchmark Sprint 20 x (1+5)         0.93 ± 0.00
    Benchmark Timeline 5x100            0.11 ± 0.00
    Benchmark Signup                    0.77 ± 0.00
    Benchmark Export                    0.20 ± 0.00
    Benchmark Move Here 20/120          0.89 ± 0.00
    Benchmark Order By User             0.98 ± 0.00
    Benchmark Set Field (EP)            0.21 ± 0.00
    Benchmark Task Create + Tag         0.23 ± 0.00
    ... 30 more ...

Rails Performance > Performance Tests
Benchmarks as a special kind of tests:
    class RenderingTest < ActionController::IntegrationTest
      def test_sprint_rendering
        login_with users(:user), "user"
        benchmark :title => "Sprint 20 x (1+5) (C)",
                  :route => "projects/1/sprints/3/show",
                  :assert_template => "tasks/index"
      end
    end
    Benchmark Sprint 20 x (1+5) (C)     0.45 ± 0.00

Rails Performance > Performance Tests
Benchmarks as a special kind of tests:
    def benchmark(options = {})
      (0..100).each do |i|
        GC.start
        # Run every request in a forked child so that runs don't affect each other
        pid = fork do
          begin
            out = File.open("values", "a")
            ActiveRecord::Base.transaction do
              elapsed_time = Benchmark::realtime do
                request_method = options[:post] ? :post : :get
                send(request_method, options[:route])
              end
              out.puts elapsed_time if i > 0   # the first run is not recorded
              out.close
              raise CustomTransactionError     # roll back whatever the request changed
            end
          rescue CustomTransactionError
            exit
          end
        end
        Process::waitpid pid
        ActiveRecord::Base.connection.reconnect!
      end
      # mean and sigma are helpers defined elsewhere; timings are stored one per line
      values = File.readlines("values").map { |line| line.to_f }
      printf "%.2f ± %.2f\n", mean(values), sigma(values)
    end
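The slide does not show `mean` and `sigma`. A minimal sketch of how they could look is given below, assuming that timings are stored one per line and that sigma is the population standard deviation (the talk does not say which variant was used). `parse_timings` and `report` are hypothetical names.

```ruby
# Hypothetical helpers for the benchmark report: arithmetic mean and
# population standard deviation of the recorded timings.
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

def sigma(values)
  m = mean(values)
  Math.sqrt(values.inject(0.0) { |sum, v| sum + (v - m)**2 } / values.size)
end

# Timings are assumed to be stored one per line, as written with puts.
def parse_timings(text)
  text.split("\n").map { |line| line.to_f }
end

# Formats the result the way the benchmark table shows it.
def report(values)
  format("%.2f ± %.2f", mean(values), sigma(values))
end
```

For example, `report([2.0, 4.0])` returns `"3.00 ± 1.00"`.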

Rails Performance > Solution
Scalability is not a substitute for performance

Rails Performance > Solution
Delegate as much work as possible to... the database!

Rails Performance > Attribute Preloading
The conventional Rails way:
    Tasks:      id serial, name varchar
    Tags:       id serial, name varchar
    Tasks_Tags: tag_id integer, task_id integer
    tasks = Task.find(:all, :include => :tags)
    > 0.058 sec
2 SQL queries:
    select * from tasks
    select * from tags inner join tasks_tags on tags.id = tasks_tags.tag_id
      where tasks_tags.task_id in (1,2,3,..)
Rails creates an object for each tag; that is not fast and takes memory

Rails Performance > Attribute Preloading
Faster with Postgres arrays:
    Tasks:      id serial, name varchar
    Tags:       id serial, name varchar
    Tasks_Tags: tag_id integer, task_id integer
    tasks = Task.find(:all, :select => "*,
      array(select tags.name from tags
            inner join tasks_tags on (tags.id = tasks_tags.tag_id)
            where tasks_tags.task_id = tasks.id) as tag_names
    ")
    > 0.018 sec
- 1 SQL query
- Rails doesn't have to create objects
- >3x faster (was 0.058 sec, now 0.018 sec)
The array comes back as a string:
    puts tasks.first.tag_names
    > "{Foo,Bar,Zee}"
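Since `tag_names` arrives as the string `"{Foo,Bar,Zee}"`, it still has to be split before use. The following is a minimal sketch of a parser for one-dimensional PostgreSQL array output. `parse_pg_array` is a hypothetical helper, and the sketch assumes no nested arrays.

```ruby
# Hypothetical helper: parses one-dimensional PostgreSQL array output such as
# {Foo,Bar,Zee} or {"a,b",c} into a Ruby array. Nested arrays are not handled.
def parse_pg_array(literal)
  body = literal[1..-2]                      # strip the surrounding braces
  return [] if body.nil? || body.empty?
  # An element is either a double-quoted string (with backslash escapes)
  # or a bare run of characters up to the next comma.
  body.scan(/"((?:[^"\\]|\\.)*)"|([^,]+)/).map do |quoted, bare|
    if quoted
      quoted.gsub(/\\(.)/) { $1 }            # unescape \" and \\
    elsif bare == "NULL"
      nil                                    # unquoted NULL is the SQL null
    else
      bare
    end
  end
end
```

With this helper, `parse_pg_array(tasks.first.tag_names)` would give `["Foo", "Bar", "Zee"]` for the example above, without instantiating any Tag objects.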

Rails Performance > Access Control
Simplified model for user privilege management:
    Users:       id serial, name varchar
    Role:        id serial, name varchar, privilege1 boolean, privilege2 boolean, ...
    Roles_Users: user_id integer, role_id integer
    user = User.find(:first, :include => :roles)
    can_do_1 = user.roles.any? { |role| role.privilege1? }
Where is the problem?
- 2 SQL queries
- Rails has to create objects for each role
- Ruby iterates over the roles array

Rails Performance > Access Control
Same in SQL:
    Users:       id serial, name varchar
    Role:        id serial, name varchar, privilege1 boolean, privilege2 boolean
    Roles_Users: user_id integer, role_id integer
    user = User.find(:first, :select => "*", :joins => "
      inner join
        (select user_id, bool_or(privilege1) as privilege1
         from roles_users
         inner join roles on (roles.id = roles_users.role_id)
         group by user_id) as roles_users
      on (users.id = roles_users.user_id)
    ")
    can_do_1 = ActiveRecord::ConnectionAdapters::Column.value_to_boolean(user.privilege1)
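For readers unfamiliar with `bool_or`, the aggregate is true for a group if any input value in that group is true. The sketch below reproduces that behaviour over in-memory rows, purely to illustrate what the subquery computes. The row data is made up and `privilege_by_user` is a hypothetical helper.

```ruby
# Illustrative rows of roles_users joined with roles: [user_id, privilege1].
JOINED_ROWS = [
  [1, false],
  [1, true],
  [2, false],
  [2, false]
]

# Hypothetical helper mirroring "bool_or(privilege1) ... group by user_id":
# a user has the privilege if any of the user's roles grants it.
def privilege_by_user(rows)
  rows.each_with_object(Hash.new(false)) do |(user_id, privilege), result|
    result[user_id] ||= privilege
  end
end
```

In the SQL version this grouping happens inside PostgreSQL, so Rails receives one boolean column per user instead of a collection of role objects.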

Rails Performance > Access Control
Optimization effect:
    can_do_1 = user.roles.any? { |role| role.privilege1? }
    > 2.1 sec
    can_do_1 = ActiveRecord::ConnectionAdapters::Column.value_to_boolean(user.privilege1)
    > 64 msec!!!

Rails Performance > Aggregations
Perform calculations and aggregations on large datasets in SQL.
Real-life example: 600 000 data rows, 3-dimensional OLAP cube, slicing and aggregation:
- Ruby: ~1 GB RAM, ~90 sec
- SQL: up to 5 sec

Outline
- Rails as ORM
- Rails Performance and PostgreSQL
- PostgreSQL Experience
- PostgreSQL Approaches
- Optimizing Database

Postgres Experience > Good Things
Good things about Postgres:
- SQL standard compliance (and useful non-standard addons)
- good documentation
- sustainable development
- good optimizer and EXPLAIN ANALYZE
- a lot of things can be expressed in pure SQL
- constraints
- referential integrity
- deadlock detection

Postgres Experience > Good Things
Good things that were introduced recently:
- replication (warm and hot standby, streaming replication)
- windowing functions
- recursive queries
- ordering for aggregates

Postgres Experience > Limitations
And now... limitations

Postgres Experience > Limit/Offset
Pagination vs subselects:
    select *, (select count(*) from attachments where issue_id = issues.id) as num_attachments
    from issues limit 100 offset 0;
    Limit (cost=0.00..831.22 rows=100 width=143) (actual time=0.050..1.242 rows=100 loops=1)
      -> Seq Scan on issues (cost=0.00..2509172.92 rows=301866 width=143) (actual time=0.049..1.119 rows=100 loops=1)
           SubPlan
             -> Aggregate (cost=8.27..8.28 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=100)
                  -> Index Scan using attachments_issue_id_idx on attachments (cost=0.00..8.27 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=100)
                       Index Cond: (issue_id = $0)
    Total runtime: 1.383 ms

Postgres Experience > Limit/Offset
Pagination vs subselects:
    select *, (select count(*) from attachments where issue_id = issues.id) as num_attachments
    from issues limit 100 offset 100;
    Limit (cost=831.22..1662.44 rows=100 width=143) (actual time=1.070..7.927 rows=100 loops=1)
      -> Seq Scan on issues (cost=0.00..2509172.92 rows=301866 width=143) (actual time=0.039..7.763 rows=200 loops=1)
           SubPlan
             -> Aggregate (cost=8.27..8.28 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=200)
                  -> Index Scan using attachments_issue_id_idx on attachments (cost=0.00..8.27 rows=1 width=0) (actual time=0.032..0.032 rows=0 loops=200)
                       Index Cond: (issue_id = $0)
    Total runtime: 8.065 ms

Postgres Experience > Limit/Offset
- Be careful with subselects: they are executed limit + offset times!
- Use joins to overcome the limitation
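The effect can be pictured with a toy model in Ruby. This is not how the PostgreSQL executor works internally; it only counts how often the per-row work runs, matching the loops=100 and loops=200 figures in the plans. The method names and data are made up. The second method stands for any rewrite, such as a join against an already limited set of issues, in which the per-issue work happens only for the rows that are returned.

```ruby
# Toy model: the subselect is evaluated for every row the scan produces,
# including the `offset` rows that are discarded afterwards.
def page_with_subselect(issue_ids, limit, offset)
  calls = 0
  scanned = issue_ids.first(offset + limit).map do |id|
    calls += 1            # stands in for "select count(*) from attachments ..."
    [id, 0]
  end
  [scanned.drop(offset), calls]
end

# Toy model of the alternative: pick the page first, then compute the
# per-issue value only for the rows that are actually returned.
def page_then_count(issue_ids, limit, offset)
  calls = 0
  page = issue_ids.drop(offset).first(limit).map do |id|
    calls += 1
    [id, 0]
  end
  [page, calls]
end

ISSUE_IDS = (1..1000).to_a   # illustrative data
```

Both methods return the same page, but for limit 100 and offset 100 the first one does the per-row work 200 times and the second one 100 times. The gap grows with the offset.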

Postgres Experience > in() and Joins
Use any(array()) instead of in() to force a subselect and avoid a join:
    explain analyze select * from issues where id in (select issue_id from tags_issues);
    QUERY PLAN
    ------------------------------------------------------------------
    Merge IN Join (actual time=0.096..576.704 rows=55363 loops=1)
      Merge Cond: (issues.id = tags_issues.issue_id)
      -> Index Scan using issues_pkey on issues (actual time=0.027..270.557 rows=229991 loops=1)
      -> Index Scan using tags_issues_issue_id_key on tags_issues (actual time=0.051..73.903 rows=70052 loops=1)
    Total runtime: 605.274 ms
    explain analyze select * from issues where id = any( array( (select issue_id from tags_issues) ) );
    QUERY PLAN
    ------------------------------------------------------------------
    Bitmap Heap Scan on issues (actual time=247.358..297.932 rows=55363 loops=1)
      Recheck Cond: (id = ANY ($0))
      InitPlan
        -> Seq Scan on tags_issues (actual time=0.017..51.291 rows=70052 loops=1)
      -> Bitmap Index Scan on issues_pkey (actual time=246.589..246.589 rows=70052 loops=1)
           Index Cond: (id = ANY ($0))
    Total runtime: 325.205 ms
2x!

Postgres Experience > generate_series()
    select * from
      (select *, (select min(split_date) from tasks where tasks.issue_id = issues.id) as split_date
       from issues where org_id = 2) as issues,
      (select generate_series(0,10) + date '2010-01-01' as date) as dates
    QUERY PLAN
    ------------------------------------------------------------------
    Nested Loop (actual time=2.581..2525.798 rows=149666 loops=1)
      -> Result (actual time=0.007..0.063 rows=11 loops=1)
      -> Bitmap Heap Scan on issues (actual time=2.697..47.756 rows=13606 loops=11)
           Recheck Cond: (public.issues.org_id = 2)
           -> Bitmap Index Scan on issues_org_id_idx (actual time=1.859..1.859 rows=13607 loops=11)
                Index Cond: (public.issues.org_id = 2)
           SubPlan 1
             -> Aggregate (actual time=0.010..0.010 rows=1 loops=149666)
                  -> Index Scan using tasks_issue_id_key on tasks (actual time=0.006..0.008 rows=1 loops=149666)
                       Index Cond: (issue_id = $0)
    Total runtime: 2608.891 ms

Postgres Experience > generate_series()
    select * from
      (select * from issues
       left outer join (
         select issue_id, min(split_date) as split_date
         from tasks where org_id = 2 group by issue_id
       ) tasks on (tasks.issue_id = issues.id)
       where org_id = 2) as issues,
      (select generate_series(0,10) + date '2010-01-01' as date) as dates
    QUERY PLAN
    ------------------------------------------------------------------
    Nested Loop (actual time=174.706..831.263 rows=149666 loops=1)
      -> Result (actual time=0.006..0.055 rows=11 loops=1)
      -> Merge Left Join (actual time=15.885..60.496 rows=13606 loops=11)
           Merge Cond: (public.issues.id = public.tasks.issue_id)
           -> Sort (actual time=8.048..18.068 rows=13606 loops=11)
                -> Bitmap Heap Scan on issues (actual time=2.7..55 rows=13606 loops=1)
                     Recheck Cond: (org_id = 2)
                     -> Bitmap Index Scan on issues_org_id_idx (actual time=1.912..1.>>
                          Index Cond: (org_id = 2)
           -> Sort (actual time=7.834..15.519 rows=13202 loops=11)
                -> HashAggregate (actual time=62.150..71.767 rows=13202 loops=1)
                     -> Bitmap Heap Scan on tasks (actual time=3.177..41.700 rows=18>>
                          Recheck Cond: (org_id = 2)
                          -> Bitmap Index Scan on tasks_org_id_idx (actual time=2.50>>
                               Index Cond: (org_id = 2)
    Total runtime: 906.146 ms

Outline
- Rails as ORM
- Rails Performance and PostgreSQL
- PostgreSQL Limitations
- PostgreSQL Approaches
- Optimizing Database

Postgres Approaches
- Benchmarking/performance
- Distrust vendors
- Sane appreciation of commodity hardware
- Culture of operations
- Release management

Outline
- Rails as ORM
- Rails Performance and PostgreSQL
- PostgreSQL Experience
- PostgreSQL Approaches
- Optimizing Database

Optimize Database > Basics
How to optimize PostgreSQL:
    explain analyze
    explain analyze
    explain analyze
    ...

Optimize Database > Cold DB Tip
EXPLAIN ANALYZE explains everything, but... run it also for the "cold" database state!
Example: a complex query which works on 230 000 rows and does 9 subselects / joins:
    cold state: 28 sec, hot state: 2.42 sec
A database server restart doesn't help.
You need to clear the disk cache (Linux):
    sudo echo 3 | sudo tee /proc/sys/vm/drop_caches

Optimize Database > Shared Database
You're competing for memory cache on a shared server:
1. two databases with equal load share the cache
2. one of the databases gets more load and wins the cache

Optimize Database > Shared Database
As a result, your database can always be in a "cold" state, and you read data from disk, not from memory!
A complex query which works on 230 000 rows and does 9 subselects / joins:
    from disk: 28 sec, from memory: 2.42 sec
Solutions:
- optimize for IO/cold state (clear the cache with: sudo echo 3 | sudo tee /proc/sys/vm/drop_caches)
- push down SQL conditions

Optimize Database > Postgres Config
    # How much memory we have to cache the database, RAM_FOR_DATABASE * 3/4
    effective_cache_size = <%= ram_for_database.to_i * 3/4 %>MB
    # Shared memory to hold data in RAM, RAM_FOR_DATABASE/4
    shared_buffers = <%= ram_for_database.to_i / 4 %>MB
    # Work memory for queries (RAM_FOR_DATABASE/max_connections) ROUND DOWN 2^x
    work_mem = <%= 2**(Math.log(ram_for_database.to_i / expected_max_active_connections.to_i)/Math.log(2)).floor %>MB
    # Memory for vacuum, autovacuum, index creation, RAM/16 ROUND DOWN 2^x
    maintenance_work_mem = <%= 2**(Math.log(ram_for_database.to_i / 16)/Math.log(2)).floor %>MB
    # To ensure that we don't lose data, always fsync after commit
    synchronous_commit = on
    # Size of WAL on disk, recommended setting: 16
    checkpoint_segments = 16
    # WAL memory buffer
    wal_buffers = 8MB
    # Ensure autovacuum is always turned on
    autovacuum = on
    # Set the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously.
    effective_io_concurrency = 4
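The memory arithmetic in the template can be checked in plain Ruby. The sketch below is an illustration, not the authors' code: `round_down_pow2` and `memory_settings` are hypothetical helpers, `shared_buffers` follows the RAM/4 rule stated in the template's comment, and the power-of-two rounding uses `Integer#bit_length` in place of the `Math.log` arithmetic so that it stays in integers.

```ruby
# Largest power of two that is <= n (n must be >= 1). Uses Integer#bit_length
# instead of the Math.log arithmetic from the template to stay in integers.
def round_down_pow2(n)
  1 << (n.bit_length - 1)
end

# Hypothetical helper: derives the memory settings (in MB) from the RAM
# reserved for the database and the expected number of active connections.
def memory_settings(ram_mb, max_active_connections)
  {
    :effective_cache_size => ram_mb * 3 / 4,
    :shared_buffers       => ram_mb / 4,
    :work_mem             => round_down_pow2(ram_mb / max_active_connections),
    :maintenance_work_mem => round_down_pow2(ram_mb / 16)
  }
end
```

For 4096 MB of database RAM and 100 expected active connections this gives effective_cache_size = 3072, shared_buffers = 1024, work_mem = 32 and maintenance_work_mem = 256 (all in MB).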

Optimize Database > Postgres Config
Effect from better configuration:
    Query                                     Default Settings   Custom Settings   Effect
    Aggregation on a large dataset            8205 ms            7685 ms           6%
    Query with complex joins and subselects   229 ms             143 ms            38%

Thanks!
Rails performance articles and more: http://blog.pluron.com
Gleb Arshinov, CEO, gleb@pluron.com
Alexander Dymo, Director of Engineering, adymo@pluron.com