Explaining the Postgres Query Optimizer

BRUCE MOMJIAN

The optimizer is the "brain" of the database, interpreting SQL queries and determining the fastest method of execution. This talk uses the EXPLAIN command to show how the optimizer interprets queries and determines optimal execution.

Creative Commons Attribution License
http://momjian.us/presentations
Last updated: April, 2016

Postgres Query Execution

[Diagram: User Terminal, Application Code, Libpq, and the PostgreSQL Database Server; Queries flow to the server and Results flow back.]

Postgres Query Execution

[Diagram: Main -> Postmaster -> Postgres backends, reached through Libpq. Inside a backend: Parse Statement -> Traffic Cop, which routes each statement either to Utility Command (e.g. CREATE TABLE, COPY) or to Query (SELECT, INSERT, UPDATE, DELETE) -> Rewrite Query -> Generate Paths -> Optimal Path -> Generate Plan -> Plan -> Execute Plan. Supporting modules: Utilities, Catalog, Storage Managers, Access Methods, Nodes / Lists.]

Postgres Query Execution

[Diagram, query path only: Parse Statement -> Traffic Cop, which routes each statement either to Utility Command (e.g. CREATE TABLE, COPY) or to Query (SELECT, INSERT, UPDATE, DELETE) -> Rewrite Query -> Generate Paths -> Optimal Path -> Generate Plan -> Plan -> Execute Plan.]

The Optimizer Is the Brain

http://www.wsmanaging.com/

What Decisions Does the Optimizer Have to Make?

    Scan Method
    Join Method
    Join Order

Which Scan Method?

    Sequential Scan
    Bitmap Index Scan
    Index Scan

Simple Example Using pg_class.relname

    SELECT relname
    FROM pg_class
    ORDER BY 1
    LIMIT 8;

                  relname
    -----------------------------------
     _pg_foreign_data_wrappers
     _pg_foreign_servers
     _pg_user_mappings
     administrable_role_authorizations
     applicable_roles
     attributes
     check_constraint_routine_usage
     check_constraints
    (8 rows)

Let's Use Just the First Letter of pg_class.relname

    SELECT substring(relname, 1, 1)
    FROM pg_class
    ORDER BY 1
    LIMIT 8;

     substring
    -----------
     _
     _
     _
     a
     a
     a
     c
     c
    (8 rows)

Create a Temporary Table with an Index

    CREATE TEMPORARY TABLE sample (letter, junk) AS
        SELECT substring(relname, 1, 1), repeat('x', 250)
        FROM pg_class
        ORDER BY random();  -- add rows in random order
    SELECT 253

    CREATE INDEX i_sample on sample (letter);
    CREATE INDEX

All the queries used in this presentation are available at http://momjian.us/main/writings/pgsql/optimizer.sql.

Create an EXPLAIN Function

    CREATE OR REPLACE FUNCTION lookup_letter(text) RETURNS SETOF text AS $$
    BEGIN
        RETURN QUERY EXECUTE '
            EXPLAIN SELECT letter
            FROM sample
            WHERE letter = ''' || $1 || '''';
    END
    $$ LANGUAGE plpgsql;
    CREATE FUNCTION

What is the Distribution of the sample Table?

    WITH letters (letter, count) AS (
        SELECT letter, COUNT(*)
        FROM sample
        GROUP BY 1
    )
    SELECT letter, count,
           (count * 100.0 / (SUM(count) OVER ()))::numeric(4,1) AS "%"
    FROM letters
    ORDER BY 2 DESC;

What is the Distribution of the sample Table?

     letter | count |  %
    --------+-------+------
     p      |   199 | 78.7
     s      |     9 |  3.6
     c      |     8 |  3.2
     r      |     7 |  2.8
     t      |     5 |  2.0
     v      |     4 |  1.6
     f      |     4 |  1.6
     d      |     4 |  1.6
     u      |     3 |  1.2
     a      |     3 |  1.2
     _      |     3 |  1.2
     e      |     2 |  0.8
     i      |     1 |  0.4
     k      |     1 |  0.4
    (14 rows)
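As a quick cross-check of the percentage column, the same arithmetic can be sketched outside the database. This is only an illustration: the counts are copied from the output above, and the `percentages` helper is a hypothetical name, not part of the talk.

```python
# Letter counts copied from the slide's query output.
counts = {
    "p": 199, "s": 9, "c": 8, "r": 7, "t": 5, "v": 4, "f": 4,
    "d": 4, "u": 3, "a": 3, "_": 3, "e": 2, "i": 1, "k": 1,
}


def percentages(counts):
    """Mirror count * 100.0 / SUM(count) OVER (), rounded to one decimal."""
    total = sum(counts.values())
    return {letter: round(n * 100.0 / total, 1) for letter, n in counts.items()}


shares = percentages(counts)
print(sum(counts.values()), shares["p"], shares["k"])
```

The total matches the `SELECT 253` reported when the table was created, and one letter accounts for more than three quarters of the rows.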

Is the Distribution Important?

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'p';

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=32)
       Index Cond: (letter = 'p'::text)
    (2 rows)

Is the Distribution Important?

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'd';

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=32)
       Index Cond: (letter = 'd'::text)
    (2 rows)

Is the Distribution Important?

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'k';

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=32)
       Index Cond: (letter = 'k'::text)
    (2 rows)

Running ANALYZE Causes a Sequential Scan for a Common Value

    ANALYZE sample;
    ANALYZE

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'p';

                           QUERY PLAN
    ---------------------------------------------------------
     Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
       Filter: (letter = 'p'::text)
    (2 rows)

Autovacuum cannot ANALYZE (or VACUUM) temporary tables because these tables are only visible to the creating session.

Sequential Scan

[Diagram: the heap drawn as 8K pages, each a row of cells labeled D and T.]

Less Common Value Causes a Bitmap Index Scan

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'd';

                                  QUERY PLAN
    -----------------------------------------------------------------------
     Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
       Recheck Cond: (letter = 'd'::text)
       ->  Bitmap Index Scan on i_sample  (cost=0.00..4.28 rows=4 width=0)
             Index Cond: (letter = 'd'::text)
    (4 rows)

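Bitmap Index Scan

[Diagram: Index 1 (col1 = 'A') and Index 2 (col2 = 'NS') each yield a bitmap of 0s and 1s; the two bitmaps are combined with AND ('A' AND 'NS') into a Combined Index bitmap, which points into the Table.]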
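The bitmap-combining idea in this slide can be sketched in a few lines. This is a simplified illustration, not how PostgreSQL stores its bitmaps (which track heap pages and tuple positions); the four sample rows, the values 'B', 'ON', and 'QC', and the helper names are all made up for the example.

```python
def build_bitmap(rows, column, value):
    """Return an int whose bit n is set when rows[n][column] == value."""
    bitmap = 0
    for n, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << n
    return bitmap


def rows_from_bitmap(rows, bitmap):
    """Fetch only the rows whose bit is set, in table order."""
    return [row for n, row in enumerate(rows) if (bitmap >> n) & 1]


# Hypothetical table contents; only the column names come from the slide.
rows = [
    {"col1": "B", "col2": "NS"},
    {"col1": "A", "col2": "NS"},
    {"col1": "B", "col2": "ON"},
    {"col1": "A", "col2": "QC"},
]

bitmap1 = build_bitmap(rows, "col1", "A")   # one bitmap per index condition
bitmap2 = build_bitmap(rows, "col2", "NS")
combined = bitmap1 & bitmap2                # AND the bitmaps before touching the table
matches = rows_from_bitmap(rows, combined)
```

Combining the bitmaps first means the table is visited only for rows that satisfy both conditions, and in physical order.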

An Even Rarer Value Causes an Index Scan

    EXPLAIN SELECT letter
    FROM sample
    WHERE letter = 'k';

                                  QUERY PLAN
    -----------------------------------------------------------------------
     Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
       Index Cond: (letter = 'k'::text)
    (2 rows)

Index Scan

[Diagram: an Index whose nodes compare against a Key (<, >, =), with entries pointing to rows in the Heap, drawn as cells labeled D and T.]

Let's Look at All Values and their Effects

    WITH letter (letter, count) AS (
        SELECT letter, COUNT(*)
        FROM sample
        GROUP BY 1
    )
    SELECT letter AS l, count, lookup_letter(letter)
    FROM letter
    ORDER BY 2 DESC;

     l | count |                      lookup_letter
    ---+-------+-----------------------------------------------------------------------
     p |   199 | Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
     p |   199 |   Filter: (letter = 'p'::text)
     s |     9 | Seq Scan on sample  (cost=0.00..13.16 rows=9 width=2)
     s |     9 |   Filter: (letter = 's'::text)
     c |     8 | Seq Scan on sample  (cost=0.00..13.16 rows=8 width=2)
     c |     8 |   Filter: (letter = 'c'::text)
     r |     7 | Seq Scan on sample  (cost=0.00..13.16 rows=7 width=2)
     r |     7 |   Filter: (letter = 'r'::text)

OK, Just the First Lines

    WITH letter (letter, count) AS (
        SELECT letter, COUNT(*)
        FROM sample
        GROUP BY 1
    )
    SELECT letter AS l, count,
           (SELECT *
            FROM lookup_letter(letter) AS l2
            LIMIT 1) AS lookup_letter
    FROM letter
    ORDER BY 2 DESC;

Just the First EXPLAIN Lines

     l | count |                      lookup_letter
    ---+-------+-----------------------------------------------------------------------
     p |   199 | Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
     s |     9 | Seq Scan on sample  (cost=0.00..13.16 rows=9 width=2)
     c |     8 | Seq Scan on sample  (cost=0.00..13.16 rows=8 width=2)
     r |     7 | Seq Scan on sample  (cost=0.00..13.16 rows=7 width=2)
     t |     5 | Bitmap Heap Scan on sample  (cost=4.29..12.76 rows=5 width=2)
     f |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     v |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     d |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     a |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     _ |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     u |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
    (14 rows)

We Can Force an Index Scan

    SET enable_seqscan = false;
    SET enable_bitmapscan = false;

    WITH letter (letter, count) AS (
        SELECT letter, COUNT(*)
        FROM sample
        GROUP BY 1
    )
    SELECT letter AS l, count,
           (SELECT *
            FROM lookup_letter(letter) AS l2
            LIMIT 1) AS lookup_letter
    FROM letter
    ORDER BY 2 DESC;

Notice the High Cost for Common Values

     l | count |                      lookup_letter
    ---+-------+-----------------------------------------------------------------------
     p |   199 | Index Scan using i_sample on sample  (cost=0.00..39.33 rows=199 width=
     s |     9 | Index Scan using i_sample on sample  (cost=0.00..22.14 rows=9 width=2)
     c |     8 | Index Scan using i_sample on sample  (cost=0.00..19.84 rows=8 width=2)
     r |     7 | Index Scan using i_sample on sample  (cost=0.00..19.82 rows=7 width=2)
     t |     5 | Index Scan using i_sample on sample  (cost=0.00..15.21 rows=5 width=2)
     d |     4 | Index Scan using i_sample on sample  (cost=0.00..15.19 rows=4 width=2)
     v |     4 | Index Scan using i_sample on sample  (cost=0.00..15.19 rows=4 width=2)
     f |     4 | Index Scan using i_sample on sample  (cost=0.00..15.19 rows=4 width=2)
     _ |     3 | Index Scan using i_sample on sample  (cost=0.00..12.88 rows=3 width=2)
     a |     3 | Index Scan using i_sample on sample  (cost=0.00..12.88 rows=3 width=2)
     u |     3 | Index Scan using i_sample on sample  (cost=0.00..12.88 rows=3 width=2)
     e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
    (14 rows)

    RESET ALL;
    RESET

This Was the Optimizer's Preference

     l | count |                      lookup_letter
    ---+-------+-----------------------------------------------------------------------
     p |   199 | Seq Scan on sample  (cost=0.00..13.16 rows=199 width=2)
     s |     9 | Seq Scan on sample  (cost=0.00..13.16 rows=9 width=2)
     c |     8 | Seq Scan on sample  (cost=0.00..13.16 rows=8 width=2)
     r |     7 | Seq Scan on sample  (cost=0.00..13.16 rows=7 width=2)
     t |     5 | Bitmap Heap Scan on sample  (cost=4.29..12.76 rows=5 width=2)
     f |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     v |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     d |     4 | Bitmap Heap Scan on sample  (cost=4.28..12.74 rows=4 width=2)
     a |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     _ |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     u |     3 | Bitmap Heap Scan on sample  (cost=4.27..11.38 rows=3 width=2)
     e |     2 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     i |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
     k |     1 | Index Scan using i_sample on sample  (cost=0.00..8.27 rows=1 width=2)
    (14 rows)
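The preference shown here is consistent with simply picking the lowest estimated total cost among the candidate scans. The sketch below replays that comparison for four letters, using cost figures copied from the EXPLAIN output in this talk. Two things are assumptions: that the 13.16 sequential scan cost shown for the common letters applies to every letter, and the `cheapest` helper itself, which is an illustration and not the planner's code. Costs the slides do not show (for example, a bitmap scan for 'p') are left out.

```python
# Estimated total costs taken from the slides' EXPLAIN output.
# Assumption: the Seq Scan cost (13.16) is the same for every letter.
costs = {
    "p": {"Seq Scan": 13.16, "Index Scan": 39.33},
    "t": {"Seq Scan": 13.16, "Bitmap Heap Scan": 12.76, "Index Scan": 15.21},
    "a": {"Seq Scan": 13.16, "Bitmap Heap Scan": 11.38, "Index Scan": 12.88},
    "k": {"Seq Scan": 13.16, "Index Scan": 8.27},
}


def cheapest(plans):
    """Return the name of the scan method with the lowest estimated cost."""
    return min(plans, key=plans.get)


for letter, plans in costs.items():
    print(letter, cheapest(plans))
```

Forcing an index scan for a common value such as 'p' roughly triples the estimated cost, which is why the optimizer avoids it when left alone.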

Which Join Method?

    Nested Loop
        With Inner Sequential Scan
        With Inner Index Scan
    Hash Join
    Merge Join

What Is in pg_proc.oid?

    SELECT oid
    FROM pg_proc
    ORDER BY 1
    LIMIT 8;

     oid
    -----
      31
      33
      34
      35
      38
      39
      40
      41
    (8 rows)

Create Temporary Tables from pg_proc and pg_class

    CREATE TEMPORARY TABLE sample1 (id, junk) AS
        SELECT oid, repeat('x', 250)
        FROM pg_proc
        ORDER BY random();  -- add rows in random order
    SELECT 2256

    CREATE TEMPORARY TABLE sample2 (id, junk) AS
        SELECT oid, repeat('x', 250)
        FROM pg_class
        ORDER BY random();  -- add rows in random order
    SELECT 260

These tables have no indexes and no optimizer statistics.

Join the Two Tables with a Tight Restriction

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    WHERE sample1.id = 33;

                                 QUERY PLAN
    ---------------------------------------------------------------------
     Nested Loop  (cost=0.00..234.68 rows=300 width=32)
       ->  Seq Scan on sample1  (cost=0.00..205.54 rows=50 width=4)
             Filter: (id = 33::oid)
       ->  Materialize  (cost=0.00..25.41 rows=6 width=36)
             ->  Seq Scan on sample2  (cost=0.00..25.38 rows=6 width=36)
                   Filter: (id = 33::oid)
    (6 rows)

Nested Loop Join with Inner Sequential Scan

    Outer: aag, aay, aar, aai
    Inner: aai, aag, aas, aar, aay, aaa, aag

    No Setup Required
    Used For Small Tables

Pseudocode for Nested Loop Join with Inner Sequential Scan

    for (i = 0; i < length(outer); i++)
        for (j = 0; j < length(inner); j++)
            if (outer[i] == inner[j])
                output(outer[i], inner[j]);
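The pseudocode translates almost line for line into runnable form. The sketch below is a minimal illustration over in-memory lists, with the sample values taken from the nested loop diagram in this talk; `nested_loop_join` is a hypothetical name and the function returns the pairs instead of calling `output`.

```python
def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row: O(len(outer) * len(inner))."""
    result = []
    for outer_row in outer:
        for inner_row in inner:          # full rescan of inner for each outer row
            if outer_row == inner_row:
                result.append((outer_row, inner_row))
    return result


outer = ["aag", "aay", "aar", "aai"]
inner = ["aai", "aag", "aas", "aar", "aay", "aaa", "aag"]
pairs = nested_loop_join(outer, inner)
print(pairs)
```

No setup is needed before the first row comes out, but the work grows with the product of the two input sizes, which is why this form suits small tables.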

Join the Two Tables with a Looser Restriction

    EXPLAIN SELECT sample1.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    WHERE sample2.id > 33;

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Hash Join  (cost=30.50..950.88 rows=20424 width=32)
       Hash Cond: (sample1.id = sample2.id)
       ->  Seq Scan on sample1  (cost=0.00..180.63 rows=9963 width=36)
       ->  Hash  (cost=25.38..25.38 rows=410 width=4)
             ->  Seq Scan on sample2  (cost=0.00..25.38 rows=410 width=4)
                   Filter: (id > 33::oid)
    (6 rows)

Hash Join

[Diagram: Outer rows are looked up in a Hashed copy of the Inner rows. Values shown across the two columns: aay, aak, aas, aag, aak, aam, aay, aar, aar, aao, aaw.]

    Must fit in Main Memory

Pseudocode for Hash Join

    /* build phase: hash every inner row */
    for (j = 0; j < length(inner); j++)
    {
        hash_key = hash(inner[j]);
        append(hash_store[hash_key], inner[j]);
    }

    /* probe phase: look up each outer row in its hash bucket */
    for (i = 0; i < length(outer); i++)
    {
        hash_key = hash(outer[i]);
        for (j = 0; j < length(hash_store[hash_key]); j++)
            if (outer[i] == hash_store[hash_key][j])
                output(outer[i], hash_store[hash_key][j]);
    }
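A runnable sketch of the same two phases, build then probe, is shown below. It is an illustration over in-memory lists rather than PostgreSQL's implementation; `hash_join` is a hypothetical name, Python's built-in `hash` stands in for the hash function, and the sample values are made up.

```python
from collections import defaultdict


def hash_join(outer, inner):
    """Build a hash table on inner, then probe it once per outer row."""
    hash_store = defaultdict(list)
    for inner_row in inner:                       # build phase
        hash_store[hash(inner_row)].append(inner_row)

    result = []
    for outer_row in outer:                       # probe phase
        # .get() avoids creating empty buckets for keys that never match
        for candidate in hash_store.get(hash(outer_row), []):
            if outer_row == candidate:            # recheck: buckets may hold collisions
                result.append((outer_row, candidate))
    return result


pairs = hash_join(
    ["aag", "aay", "aar", "aai"],
    ["aai", "aag", "aas", "aar", "aay", "aaa", "aag"],
)
print(pairs)
```

Each input is read only once, at the price of holding the hashed side in memory.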

Join the Two Tables with No Restriction

    EXPLAIN SELECT sample1.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id);

                                   QUERY PLAN
    -------------------------------------------------------------------------
     Merge Join  (cost=927.72..1852.95 rows=61272 width=32)
       Merge Cond: (sample2.id = sample1.id)
       ->  Sort  (cost=85.43..88.50 rows=1230 width=4)
             Sort Key: sample2.id
             ->  Seq Scan on sample2  (cost=0.00..22.30 rows=1230 width=4)
       ->  Sort  (cost=842.29..867.20 rows=9963 width=36)
             Sort Key: sample1.id
             ->  Seq Scan on sample1  (cost=0.00..180.63 rows=9963 width=36)
    (8 rows)

Merge Join

[Diagram: a Sorted Outer column and a Sorted Inner column are walked in step. Values shown across the two columns: aaa, aaa, aab, aac, aad, aab, aab, aac, aae, aaf, aaf.]

    Ideal for Large Tables
    An Index Can Be Used to Eliminate the Sort

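Pseudocode for Merge Join

    sort(outer);
    sort(inner);
    i = 0;
    j = 0;
    while (i < length(outer) && j < length(inner))
    {
        if (outer[i] < inner[j])
            i++;
        else if (outer[i] > inner[j])
            j++;
        else
        {
            /* output every inner row in the run that equals outer[i] */
            for (k = j; k < length(inner) && inner[k] == outer[i]; k++)
                output(outer[i], inner[k]);
            /* j stays at the start of the run so a duplicate outer value rescans it */
            i++;
        }
    }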
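For comparison, here is a runnable sketch of a merge join over two lists. It is a minimal illustration, not PostgreSQL's executor: `merge_join` is a hypothetical name, both inputs are sorted inside the function, and duplicates on either side are handled by rescanning the run of equal inner values for each matching outer row.

```python
def merge_join(outer, inner):
    """Sort both inputs, then advance two cursors in step."""
    outer = sorted(outer)
    inner = sorted(inner)
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        if outer[i] < inner[j]:
            i += 1                      # outer value has no partner yet
        elif outer[i] > inner[j]:
            j += 1                      # skip inner values that are too small
        else:
            k = j                       # j marks the start of the matching run
            while k < len(inner) and inner[k] == outer[i]:
                result.append((outer[i], inner[k]))
                k += 1
            i += 1                      # keep j so a duplicate outer value rescans the run
    return result


pairs = merge_join([3, 1, 2, 2], [2, 4, 2, 1])
print(pairs)
```

Once the inputs are sorted, each side is read essentially once, which is what makes this method attractive for large tables, and why an index that already supplies sorted order can remove the sort step.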

Order of Joined Relations Is Insignificant

    EXPLAIN SELECT sample2.junk
    FROM sample2 JOIN sample1 ON (sample2.id = sample1.id);

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Merge Join  (cost=927.72..1852.95 rows=61272 width=32)
       Merge Cond: (sample2.id = sample1.id)
       ->  Sort  (cost=85.43..88.50 rows=1230 width=36)
             Sort Key: sample2.id
             ->  Seq Scan on sample2  (cost=0.00..22.30 rows=1230 width=36)
       ->  Sort  (cost=842.29..867.20 rows=9963 width=4)
             Sort Key: sample1.id
             ->  Seq Scan on sample1  (cost=0.00..180.63 rows=9963 width=4)
    (8 rows)

The most restrictive relation, e.g. sample2, is always on the outer side of merge joins. All previous merge joins also had sample2 in outer position.

Add Optimizer Statistics

    ANALYZE sample1;
    ANALYZE sample2;

This Was a Merge Join without Optimizer Statistics

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id);

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Hash Join  (cost=15.85..130.47 rows=260 width=254)
       Hash Cond: (sample1.id = sample2.id)
       ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=4)
       ->  Hash  (cost=12.60..12.60 rows=260 width=258)
             ->  Seq Scan on sample2  (cost=0.00..12.60 rows=260 width=258)
    (5 rows)

Outer Joins Can Affect Optimizer Join Usage

    EXPLAIN SELECT sample1.junk
    FROM sample1 RIGHT OUTER JOIN sample2 ON (sample1.id = sample2.id);

                                    QUERY PLAN
    --------------------------------------------------------------------------
     Hash Left Join  (cost=131.76..148.26 rows=260 width=254)
       Hash Cond: (sample2.id = sample1.id)
       ->  Seq Scan on sample2  (cost=0.00..12.60 rows=260 width=4)
       ->  Hash  (cost=103.56..103.56 rows=2256 width=258)
             ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=258)
    (5 rows)

Cross Joins Are Nested Loop Joins without Join Restriction

    EXPLAIN SELECT sample1.junk
    FROM sample1 CROSS JOIN sample2;

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Nested Loop  (cost=0.00..7448.81 rows=586560 width=254)
       ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=254)
       ->  Materialize  (cost=0.00..13.90 rows=260 width=0)
             ->  Seq Scan on sample2  (cost=0.00..12.60 rows=260 width=0)
    (4 rows)
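The row estimate in this plan is just the product of the two input sizes. A small sketch, with `cross_join` as a hypothetical helper over in-memory sequences, makes that explicit:

```python
from itertools import product


def cross_join(outer, inner):
    """Pair every outer row with every inner row; there is no join condition."""
    return list(product(outer, inner))


# Row counts reported by the plan's two sequential scans.
sample1_rows = 2256
sample2_rows = 260
estimated_rows = sample1_rows * sample2_rows

small = cross_join(range(3), range(4))
print(estimated_rows, len(small))
```

With nothing to hash or merge on, pairing each outer row with every inner row is the only option, which is the nested loop shape shown in the plan.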

Create Indexes

    CREATE INDEX i_sample1 on sample1 (id);
    CREATE INDEX i_sample2 on sample2 (id);

Nested Loop with Inner Index Scan Now Possible

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    WHERE sample1.id = 33;

                                       QUERY PLAN
    ---------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..16.55 rows=1 width=254)
       ->  Index Scan using i_sample1 on sample1  (cost=0.00..8.27 rows=1 width=4)
             Index Cond: (id = 33::oid)
       ->  Index Scan using i_sample2 on sample2  (cost=0.00..8.27 rows=1 width=258)
             Index Cond: (sample2.id = 33::oid)
    (5 rows)

Nested Loop Join with Inner Index Scan

    Outer: aag, aay, aar, aai
    Inner (reached by Index Lookup): aai, aag, aas, aar, aay, aaa, aag

    No Setup Required
    Index Must Already Exist

Pseudocode for Nested Loop Join with Inner Index Scan

    for (i = 0; i < length(outer); i++)
    {
        index_entry = get_first_match(outer[i]);
        while (index_entry)
        {
            output(outer[i], inner[index_entry]);
            index_entry = get_next_match(index_entry);
        }
    }
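A runnable sketch of the same idea follows. It assumes a plain Python dict mapping each key to the positions of matching inner rows as a stand-in for a real index (PostgreSQL would normally use a B-tree); `build_index` and `nested_loop_index_join` are hypothetical names, and the sample values are taken from the diagram for this join type.

```python
def build_index(inner):
    """Map each key to the list of positions where it occurs in inner."""
    index = {}
    for position, key in enumerate(inner):
        index.setdefault(key, []).append(position)
    return index


def nested_loop_index_join(outer, inner, index):
    """For each outer row, fetch only the matching inner rows via the index."""
    result = []
    for outer_row in outer:
        for position in index.get(outer_row, []):   # no full rescan of inner
            result.append((outer_row, inner[position]))
    return result


outer = ["aag", "aay", "aar", "aai"]
inner = ["aai", "aag", "aas", "aar", "aay", "aaa", "aag"]
index = build_index(inner)          # in the database this index must already exist
pairs = nested_loop_index_join(outer, inner, index)
print(pairs)
```

Compared with the sequential-scan variant, the inner loop touches only matching rows, so the cost scales with the number of outer rows and matches, not with the product of the table sizes.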

Query Restrictions Affect Join Usage

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    WHERE sample2.junk ~ '^aaa';

                                      QUERY PLAN
    -------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..21.53 rows=1 width=254)
       ->  Seq Scan on sample2  (cost=0.00..13.25 rows=1 width=258)
             Filter: (junk ~ '^aaa'::text)
       ->  Index Scan using i_sample1 on sample1  (cost=0.00..8.27 rows=1 width=4)
             Index Cond: (sample1.id = sample2.id)
    (5 rows)

No junk rows begin with 'aaa'.

All junk Columns Begin with 'xxx'

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    WHERE sample2.junk ~ '^xxx';

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Hash Join  (cost=16.50..131.12 rows=260 width=254)
       Hash Cond: (sample1.id = sample2.id)
       ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=4)
       ->  Hash  (cost=13.25..13.25 rows=260 width=258)
             ->  Seq Scan on sample2  (cost=0.00..13.25 rows=260 width=258)
                   Filter: (junk ~ '^xxx'::text)
    (6 rows)

Hash join was chosen because many more rows are expected. The smaller table, e.g. sample2, is always hashed.

Without LIMIT, Hash Is Used for this Unrestricted Join

    EXPLAIN SELECT sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id);

                                   QUERY PLAN
    ------------------------------------------------------------------------
     Hash Join  (cost=15.85..130.47 rows=260 width=254)
       Hash Cond: (sample1.id = sample2.id)
       ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=4)
       ->  Hash  (cost=12.60..12.60 rows=260 width=258)
             ->  Seq Scan on sample2  (cost=0.00..12.60 rows=260 width=258)
    (5 rows)

LIMIT Can Affect Join Usage

    EXPLAIN SELECT sample2.id, sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    ORDER BY 1
    LIMIT 1;

                                            QUERY PLAN
    ------------------------------------------------------------------------------------------
     Limit  (cost=0.00..1.83 rows=1 width=258)
       ->  Nested Loop  (cost=0.00..477.02 rows=260 width=258)
             ->  Index Scan using i_sample2 on sample2  (cost=0.00..52.15 rows=260 width=258)
             ->  Index Scan using i_sample1 on sample1  (cost=0.00..1.62 rows=1 width=4)
                   Index Cond: (sample1.id = sample2.id)
    (5 rows)

LIMIT 10

    EXPLAIN SELECT sample2.id, sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    ORDER BY 1
    LIMIT 10;

                                            QUERY PLAN
    ------------------------------------------------------------------------------------------
     Limit  (cost=0.00..18.35 rows=10 width=258)
       ->  Nested Loop  (cost=0.00..477.02 rows=260 width=258)
             ->  Index Scan using i_sample2 on sample2  (cost=0.00..52.15 rows=260 width=258)
             ->  Index Scan using i_sample1 on sample1  (cost=0.00..1.62 rows=1 width=4)
                   Index Cond: (sample1.id = sample2.id)
    (5 rows)

LIMIT 100 Switches to Hash Join

    EXPLAIN SELECT sample2.id, sample2.junk
    FROM sample1 JOIN sample2 ON (sample1.id = sample2.id)
    ORDER BY 1
    LIMIT 100;

                                         QUERY PLAN
    ------------------------------------------------------------------------------------
     Limit  (cost=140.41..140.66 rows=100 width=258)
       ->  Sort  (cost=140.41..141.06 rows=260 width=258)
             Sort Key: sample2.id
             ->  Hash Join  (cost=15.85..130.47 rows=260 width=258)
                   Hash Cond: (sample1.id = sample2.id)
                   ->  Seq Scan on sample1  (cost=0.00..103.56 rows=2256 width=4)
                   ->  Hash  (cost=12.60..12.60 rows=260 width=258)
                         ->  Seq Scan on sample2  (cost=0.00..12.60 rows=260 width=258)
    (8 rows)
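The Limit costs in the LIMIT 1, LIMIT 10, and LIMIT 100 plans can be reproduced with simple arithmetic, which also suggests why the plan changes at LIMIT 100. The sketch below assumes the Limit node's total cost is the child's startup cost plus the fraction limit/rows of the child's remaining cost. That assumption reproduces the figures printed on the slides, but `limit_cost` is an illustrative helper, not the planner's code.

```python
def limit_cost(startup_cost, total_cost, plan_rows, limit):
    """Interpolate the cost of fetching only the first `limit` rows of a plan."""
    fraction = min(limit, plan_rows) / plan_rows
    return startup_cost + (total_cost - startup_cost) * fraction


# Child plan figures copied from the slides.
NESTED_LOOP = (0.00, 477.02, 260)       # ordered output, no startup cost
SORTED_HASH = (140.41, 141.06, 260)     # Sort over Hash Join: pays almost everything up front

for limit in (1, 10, 100):
    nested = round(limit_cost(*NESTED_LOOP, limit), 2)
    sorted_hash = round(limit_cost(*SORTED_HASH, limit), 2)
    print(limit, nested, sorted_hash)
```

For small limits the nested loop wins because it can stop early. By 100 rows its interpolated cost exceeds that of sorting the complete hash join result, so the cheaper plan is the hash join.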

Conclusion

http://momjian.us/presentations
http://www.vivapixel.com/photo/14252