Runtime Query Optimization


Runtime Query Optimization
Robust Optimization for Graphs
2006-2014 All Rights Reserved

RDF Join Order Optimization
- Typical approach
  - Assign estimated cardinality to each triple pattern (Bigdata uses the fast range counts).
  - Start with the most selective triple pattern.
  - For each remaining join:
    - Propagate variable bindings.
    - Re-estimate cardinality of remaining joins.
- Cardinality estimation error
  - Grows exponentially with join path length.
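A minimal Python sketch of the typical greedy approach, using the LUBM Q9 triple patterns and range counts that appear later in this deck. The function name `static_join_order`, the `estimate` hook, and the tie-breaking rule are assumptions made for illustration; this is not Bigdata's implementation, and the default re-estimation step is only a placeholder that returns the raw range count.

```python
# Triple patterns of LUBM Q9, keyed by vertex id, with the variables they use.
Q9 = {0: {"x"}, 1: {"y"}, 2: {"z"}, 3: {"x", "y"}, 4: {"y", "z"}, 5: {"x", "z"}}
# Range counts per vertex, as shown on the "Annotate with Range Counts" slide.
COUNTS = {0: 6463, 1: 540, 2: 1627, 3: 3101, 4: 1627, 5: 21489}


def static_join_order(patterns, range_counts, estimate=None):
    """Greedy static ordering: most selective pattern first, then repeatedly
    pick the cheapest remaining pattern that shares a bound variable."""
    if estimate is None:
        # Placeholder (assumption): no re-estimation, just the range count.
        estimate = lambda pid, bound: range_counts[pid]
    remaining = set(patterns)
    first = min(remaining, key=lambda p: (range_counts[p], p))
    order = [first]
    bound = set(patterns[first])
    remaining.remove(first)
    while remaining:
        # Prefer patterns connected to already-bound variables to avoid
        # cross products; fall back to any pattern if none is connected.
        connected = [p for p in remaining if patterns[p] & bound]
        candidates = connected or list(remaining)
        nxt = min(candidates, key=lambda p: (estimate(p, bound), p))
        order.append(nxt)
        bound |= patterns[nxt]
        remaining.remove(nxt)
    return order


print(static_join_order(Q9, COUNTS))
```

Because every later choice depends on an estimate made before any data is read, an error in an early estimate carries through the rest of the ordering, which is the weakness the runtime optimizer targets.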

Runtime Query Optimizer (RTO)
- Inspired by ROX (Runtime Optimization for XQuery).
- Online, incremental and adaptive dynamic programming.
- Breadth-first estimation of actual cost of join paths.
- Cost function is cumulative intermediate cardinality.
- Extensions for SPARQL semantics.
- Query plans optimized in the data for each query.
  - Never produces a bad query plan.
  - Does not rely on statistical summaries.
  - Recognizes correlations in the data for the specific query.
- Can improve the running time for some queries by 10x - 1000x.
  - Maximum payoff for high volume analytic queries.

Join Graphs
- Model set of joins as vertices
  - Connected by join variables.
  - Implicit connections if variables shared through filters.
- Requires an index for each join
  - Easily satisfied for RDF.
- Starting vertices
  - Determined by their range counts.

Join Graph for LUBM Q9

[Figure: join graph with vertices v0-v5, one per triple pattern in the query.]

    SELECT ?x ?y ?z
    WHERE {
      ?x a ub:Student .          # v0
      ?y a ub:Faculty .          # v1
      ?z a ub:Course .           # v2
      ?x ub:advisor ?y .         # v3
      ?y ub:teacherOf ?z .       # v4
      ?x ub:takesCourse ?z .     # v5
    }

- 6 vertices.
- 3 variables, each in 3 vertices.
- No filters.
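To make the join graph concrete, here is a small Python sketch that derives the edges for LUBM Q9 from shared variables. The representation (a dict of vertex id to variable set) and the name `join_graph_edges` are illustrative assumptions; filters, which can add implicit connections, are not modeled.

```python
from itertools import combinations

# Vertex id -> variables used by that triple pattern (LUBM Q9).
Q9_VERTICES = {
    0: {"x"},        # ?x a ub:Student
    1: {"y"},        # ?y a ub:Faculty
    2: {"z"},        # ?z a ub:Course
    3: {"x", "y"},   # ?x ub:advisor ?y
    4: {"y", "z"},   # ?y ub:teacherOf ?z
    5: {"x", "z"},   # ?x ub:takesCourse ?z
}


def join_graph_edges(vertices):
    """Return {(a, b): shared_variables} for every pair of vertices that
    shares at least one variable (a < b)."""
    edges = {}
    for a, b in combinations(sorted(vertices), 2):
        shared = vertices[a] & vertices[b]
        if shared:
            edges[(a, b)] = shared
    return edges


edges = join_graph_edges(Q9_VERTICES)
for (a, b), shared in edges.items():
    print(f"v{a} -- v{b} on {sorted(shared)}")
```

Each of the three variables appears in three vertices, so each contributes three edges, for nine in total.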

Annotate with Range Counts

    vertex  triple pattern           range count
    v0      ?x a Student             6463
    v1      ?y a ub:Faculty          540
    v2      ?z a ub:Course           1627
    v3      ?x ub:advisor ?y         3101
    v4      ?y ub:teacherOf ?z       1627
    v5      ?x ub:takesCourse ?z     21489

- Range count each vertex.
- Minimum cardinality for {1, 4, 2}.
- Will begin with those vertices.

Incremental Join Path Estimation
- Evaluation proceeds in rounds.
  - Increase sample size in each round.
  - Resample to reduce sampling bias.
  - Estimation error reduced as sample size increases.
- Estimate join cardinality using cutoff joins.
  - Push random sample of N tuples into each vertex.
  - Join cut off when N tuples are output (handles overflow).
  - Output cardinality estimate can underflow (raise N in next round).
- Join paths pruned when dominated by another path for the same vertices.
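The cutoff join idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `probe` stands in for an index lookup on the vertex being joined, the names `cutoff_join` and `estimate_cardinality` are hypothetical, and the integer truncation in the estimate is a choice made here, not something the slides specify.

```python
def cutoff_join(sample, probe, limit):
    """Push sampled input solutions through a join, stopping as soon as
    `limit` output solutions have been produced.

    Returns (n_in, n_out): inputs consumed and outputs produced."""
    n_in = n_out = 0
    for solution in sample:
        n_in += 1
        for _ in probe(solution):
            n_out += 1
            if n_out >= limit:
                return n_in, n_out  # cut off: enough output to estimate
    return n_in, n_out


def estimate_cardinality(src_card, n_in, n_out):
    """Scale the observed hit ratio (n_out / n_in) up to the full input."""
    if n_in == 0:
        return 0
    return src_card * n_out // n_in


# Hypothetical index: join key -> matching rows.
index = {"a": [1, 2, 3], "b": [4], "c": []}
probe = lambda key: index.get(key, [])

print(cutoff_join(["a", "b", "c"], probe, limit=1000))  # sample exhausted
print(cutoff_join(["a", "b", "c"], probe, limit=2))     # cut off early
```

If a sample produces no output at all, the estimate underflows to zero, which is why the sample size is raised in the next round.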

Round 0

[Figure: join graph annotated with range counts.]

    Path     sumestcard
    [1 4]    1400
    [4 2]    1627

Round 1

[Figure: join graph annotated with range counts.]

    Path       sumestcard
    [1 4 2]    1500
    [1 4 3]    1500
    [1 4 5]    1500
    [4 2 1]    3254
    [4 2 3]    10421
    [4 2 5]    21964

Round 2

[Figure: join graph annotated with range counts.]

    Path         sumestcard
    [1 4 3 0]    8373
    [1 4 3 5]    1686
    [1 4 5 0]    44166
    [4 2 1 3]    12685
    [4 2 1 5]    30370
    [4 2 3 0]    15639
    [4 2 3 5]    10578
    [4 2 5 0]    49080

Round 3

[Figure: join graph annotated with range counts.]

    Path           sumestcard
    [1 4 3 0 2]    14440
    [1 4 3 5 0]    1791
    [1 4 3 5 2]    1857
    [4 2 1 5 0]    71045
    [4 2 3 5 0]    10704

Round 4

[Figure: join graph annotated with range counts.]

    Path             sumestcard
    [1 4 3 5 0 2]    1896
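The pruning rule (a path is dropped when another path over the same set of vertices has a lower cumulative estimate) can be sketched in Python as below. The path names and costs are hypothetical and are not taken from the rounds shown in the deck; `prune_dominated` is an illustrative name, and the real optimizer may apply further conditions, for example around underflowing estimates.

```python
def prune_dominated(paths):
    """Keep, for each distinct set of vertices, only the cheapest join path.

    `paths` maps a tuple of vertex ids (the join order) to its cumulative
    estimated cardinality."""
    best = {}
    for path, cost in paths.items():
        key = frozenset(path)  # same vertices in any order
        if key not in best or cost < best[key][1]:
            best[key] = (path, cost)
    return dict(best.values())


# Hypothetical candidates: the first two cover the same vertices.
candidates = {
    ("a", "b", "c"): 120,
    ("b", "c", "a"): 300,
    ("a", "b", "d"): 90,
}
print(prune_dominated(candidates))
```

Pruning per vertex set is what keeps the search a dynamic program rather than an enumeration of all join orders.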

Selected join path

    vertex  srccard * ratio ( in   limit  out  ) = sumestcard
    1               *       (      800    540  ) = 540
    4       540E    * 2.97  ( 337  1000   1000 ) = 2142
    3       1602    * 6.62  ( 151  1000   1000 ) = 12751
    5       10609   * 0.02  ( 700  1000   11   ) = 12917
    0       166     * 0.64  ( 11   1000   7    ) = 13022
    2       105     * 1     ( 7    1000   7    ) = 13127

75% faster than the join path chosen by the static join order optimizer.

- vertex : The vertices in the join path in the selected evaluation order.
- srccard : The estimated input cardinality to each join (E means Exact).
- ratio : The estimated join hit ratio for each join.
- in : The #of input solutions for each cutoff join.
- limit : The sample size for each cutoff join.
- out : The #of output solutions for each cutoff join.
- sumestcard : The cumulative estimated cardinality of the join path.
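The table can be replayed arithmetically. The Python sketch below assumes that the hit ratio is out / in, that each join's estimated output is srccard * out / in truncated to an integer, and that sumestcard is the running total; these rules are inferred from the printed numbers rather than stated in the slides. The first row (vertex 1) is incomplete in the source, so only its sumestcard of 540 is used as the starting total.

```python
# (vertex, srccard, in, limit, out, sumestcard) as printed in the table.
ROWS = [
    (4, 540, 337, 1000, 1000, 2142),
    (3, 1602, 151, 1000, 1000, 12751),
    (5, 10609, 700, 1000, 11, 12917),
    (0, 166, 11, 1000, 7, 13022),
    (2, 105, 7, 1000, 7, 13127),
]
FIRST_SUM = 540  # sumestcard after the starting vertex (v1)


def replay(rows, first_sum):
    """Recompute (vertex, ratio, estimated output, running total) per join."""
    total = first_sum
    result = []
    for vertex, srccard, n_in, _limit, n_out, _printed in rows:
        est = srccard * n_out // n_in      # assumed: truncating division
        total += est
        result.append((vertex, round(n_out / n_in, 2), est, total))
    return result


for vertex, ratio, est, total in replay(ROWS, FIRST_SUM):
    print(f"v{vertex}: ratio={ratio} est={est} sumestcard={total}")
```

Under these assumptions the recomputed ratios and running totals agree with the printed columns, and each join's estimated output equals the srccard of the next row.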

Sub-Groups, Sub-Select, etc.
- Vectored left-to-right evaluation
  - Solutions flowing into a group depend on join order in the parent.

    SELECT ?x ?y ?z
    WHERE {
      ?y a ub:Faculty .            # v1
      ?z a ub:Course .             # v2
      ?x ub:advisor ?y .           # v3
      ?y ub:teacherOf ?z .         # v4
      {
        ?x a ub:Student .          # v0
        ?x ub:takesCourse ?z .     # v5
      }
    }

- Flatten into join graph
  - Path extensions MUST respect SPARQL group boundaries.
  - Sample join path with groups: 1 => 4 => 3 { => 5 => 0 } => 2

Bigdata Vectored Query Engine

Bigdata Query Engine
- Vectored query engine
  - High concurrency and vectored operator evaluation.
- Physical query plan (BOPs)
  - Supports pipelined, blocked, and at-once operators.
  - Chunks queue up for each operator.
  - Chunks transparently mapped across a cluster.
- Query engine instance runs on:
  - Each data service; plus
  - Each node providing a SPARQL endpoint.

Query Hints
- Virtual triples
  - Scope to query, graph pattern group, or join.
- Allow fine grained control over
  - Analytic query mode
  - Evaluation order
  - Vector size
  - Etc.

    SELECT ?x ?y ?z
    WHERE {
      hint:query hint:vectorsize 10000 .
      ?x a ub:GraduateStudent .
      ?y a ub:University .
      ?z a ub:Department .
      ?x ub:memberOf ?z .
      ?z ub:subOrganizationOf ?y .
      ?x ub:undergraduateDegreeFrom ?y .
    }

New Bigdata Join Operators
- Pipeline joins (fast, incremental)
  - Map binding sets over the shards, executing joins close to the data.
  - Faster for single machine and much faster for distributed query.
  - First results appear with very low latency.
- Hash joins against an access path
  - Scan the access path, probing the hash table for joins.
  - Hash joins on a cluster can saturate the disk transfer rate!
- Solution set hash joins
  - Combine sets of solutions on shared join variables.
  - Used for Sub-Select and Group Graph Patterns (aka join groups).
- Merge joins
  - Used when a series of group graph patterns share a common join variable.
  - Think OPTIONALs.
  - Java : Must sort the solution sets first, then a single pass to do the join.
  - HTree : Linear in the data (it does not need to sort the solutions).

Query Evaluation

[Figure: query evaluation pipeline.]
- Logical Query Model: SPARQL Parser -> AST -> AST Optimizers -> AST
- Physical Query Model: Query Plan Generator -> BOPs -> Vectored Query Engine -> Solutions
- Database services: Resolve Terms, Fast Range Counts, Access Paths

Data Flow Evaluation
- Left-to-right evaluation
- Physical query plan is operator sequence (not tree): op1 -> op2 -> op3
- Parallelism
  - Within operator
  - Across operators
  - Across queries
- Operator annotations used extensively.
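As a rough illustration of a plan that is an operator sequence fed by chunks of solutions, the Python sketch below chains operators with generators. It is single-threaded and so shows only the data flow, not the parallelism listed above; `chunked`, `apply_operator`, and `run_pipeline` are hypothetical names, not Bigdata APIs.

```python
def chunked(solutions, size):
    """Group an iterable of solutions into lists (chunks) of at most `size`."""
    chunk = []
    for solution in solutions:
        chunk.append(solution)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def apply_operator(op, chunks):
    """Run one operator over a stream of chunks, dropping empty outputs."""
    for chunk in chunks:
        out = op(chunk)
        if out:
            yield out


def run_pipeline(source_chunks, operators):
    """Evaluate op1 -> op2 -> ... left to right; each operator maps a chunk
    of solutions to a chunk of solutions."""
    stream = source_chunks
    for op in operators:
        stream = apply_operator(op, stream)
    for chunk in stream:
        yield from chunk


# Two toy operators working on whole chunks (vectors) at a time.
keep_even = lambda chunk: [s for s in chunk if s % 2 == 0]
times_ten = lambda chunk: [s * 10 for s in chunk]

print(list(run_pipeline(chunked(range(10), 4), [keep_even, times_ten])))
```

Because the generators are lazy, the first solutions leave the last operator before the source is exhausted, which mirrors the low latency of pipelined operators.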

Pipeline Joins
- Fast, vectored, indexed join.
- Very low latency to first solutions.
- Mapped across shards on a cluster.
  - Joins execute close to the data.
- Faster for single machine.
- Much faster for distributed query.

Preparing a query

Original query:

    SELECT ?x WHERE {
      ?x a ub:GraduateStudent ;
         ub:takesCourse <http://www.department0.university0.edu/GraduateCourse0> .
    }

Translated query:

    query :- (x 8 256) ^ (x 400 3048)

Query execution plan (access paths selected, joins reordered):

    query :- pos(x 400 3048) ^ spo(x 8 256)
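One common way to select an access path in a store with SPO, POS, and OSP indices is to pick the index whose key order has the longest prefix of bound positions. The Python sketch below illustrates that rule; it is a general convention assumed here for illustration, and the slides do not confirm that Bigdata selects indices exactly this way. `choose_index` is a hypothetical name.

```python
INDICES = ("SPO", "POS", "OSP")


def choose_index(bound):
    """Pick the index whose key order starts with the most bound positions.

    `bound` is a set drawn from {"s", "p", "o"}. Ties go to the first index
    in INDICES, because max() returns the first maximal element."""
    def bound_prefix_length(index):
        n = 0
        for position in index:
            if position.lower() not in bound:
                break
            n += 1
        return n

    return max(INDICES, key=bound_prefix_length)


# (x 400 3048): predicate and object bound, subject is a variable.
print(choose_index({"p", "o"}))
# (x 8 256) once ?x is bound by the previous join: all positions bound.
print(choose_index({"s", "p", "o"}))
```

Under this rule the first pattern evaluated in the plan reads POS, and the second, probed with ?x already bound, reads SPO, which matches the index names in the execution plan shown on the slide.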

Pipeline Join Execution

TEE
- Solutions flow to both the default and alt sinks.
- UNION(A,B)

[Figure: TEE operator with a default sink and an alt. sink, feeding A and B.]

- Generalizes to n-ary UNION.

Conditional Flow
- Data flows to either the default sink or the alt sink, depending on the condition.

[Figure: "cond?" branch with yes/no edges leading to the default sink and the alt. sink.]

- Branching pattern depends on the query logic.

Sub-Group, Sub-Select, etc.
- Build a hash index from the incoming solutions (at-once).
- Run the sub-select, sub-group, etc.
  - Solutions from the hash index are vectored into the sub-plan.
  - Sub-Select must hide variables which are not projected.
- Join against the hash index (at-once).

[Figure: HashIndexOp (build index) -> group -> SolutionSetHashJoinOp, both attached to the hash index. Join modes: Normal, Optional, Exists, Not Exists.]
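A generic hash join over solution sets, with the four join modes named in the figure, might look like the Python sketch below. It is a simplification under stated assumptions: solutions are plain dicts, every solution binds all join variables, and the probe direction (left solutions probing an index built over the right side) is chosen for brevity rather than taken from HashIndexOp or SolutionSetHashJoinOp.

```python
from collections import defaultdict


def build_hash_index(solutions, join_vars):
    """Index solutions by the tuple of their join-variable bindings."""
    index = defaultdict(list)
    for solution in solutions:
        index[tuple(solution[v] for v in join_vars)].append(solution)
    return index


def hash_join(left, index, join_vars, mode="normal"):
    """Probe `index` with each left solution.

    mode: "normal"     -> merged solutions for every match
          "optional"   -> as normal, but keep unmatched left solutions
          "exists"     -> left solutions that have at least one match
          "not_exists" -> left solutions that have no match"""
    out = []
    for solution in left:
        matches = index.get(tuple(solution[v] for v in join_vars), [])
        if mode in ("normal", "optional"):
            out.extend({**solution, **m} for m in matches)
            if mode == "optional" and not matches:
                out.append(solution)
        elif mode == "exists":
            if matches:
                out.append(solution)
        elif mode == "not_exists":
            if not matches:
                out.append(solution)
        else:
            raise ValueError(f"unknown join mode: {mode}")
    return out


left = [{"x": 1}, {"x": 2}]
right = [{"x": 1, "y": "a"}, {"x": 1, "y": "b"}]
index = build_hash_index(right, ["x"])
for mode in ("normal", "optional", "exists", "not_exists"):
    print(mode, hash_join(left, index, ["x"], mode))
```

The index is built once and probed once per solution, which fits operators that run "at-once" over a complete solution set.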

Named Solution Sets
- Support solution set reuse
- Evaluation
  - Run before main WHERE clause.
  - Exogenous solutions fed into named subquery.
  - Results written on hash index.
  - Hash index INCLUDEd into query.
- Example is BSBM BI Q5

    Select ?country ?product ?nrOfReviews ?avgPrice
    WITH {
      Select ?country ?product (count(?review) As ?nrOfReviews)
      {
        ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/producttype4> .
        ?review bsbm:reviewFor ?product .
        ?review rev:reviewer ?reviewer .
        ?reviewer bsbm:country ?country .
      }
      Group By ?country ?product
    } AS %namedset1
    WHERE {
      { Select ?country (max(?nrOfReviews) As ?maxReviews)
        { INCLUDE %namedset1 }
        Group By ?country
      }
      { Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
        {
          ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/producttype4> .
          ?offer bsbm:product ?product .
          ?offer bsbm:price ?price .
        }
        Group By ?product
      }
      INCLUDE %namedset1 .
      FILTER(?nrOfReviews=?maxReviews)
    }
    Order By desc(?nrOfReviews) ?country ?product

Named Subquery Evaluation
- Run the sub-query on the query engine.
- Output written onto hash index or stream.
  - Hash index is a good choice if the join variables can be identified in advance, but that is often not possible since the positions of the INCLUDE(s) for that solution set in the rest of the query are not yet decided.

[Figure: Sub-Query -> Sub-Plan -> Query Engine -> SOLUTIONS, written to a Hash Index or a Stream and consumed by INCLUDE. Labels: Hash Index, Stream, Any.]

- INCLUDE join operators shown:
  - SolutionSetHashJoinOp
  - HashJoinOp (hash join against AP scan)
  - Nested loop scan (joining the named solution set against exogenous solutions with low cardinality)

Bottom Up Evaluation
- SPARQL semantics are specified as if evaluation were bottom up.
  - For many queries, left-to-right evaluation produces the same results.
  - Badly designed left joins must be lifted out and run first.

    SELECT * {
      :x1 :p ?v .
      OPTIONAL {
        :x3 :q ?w .
        OPTIONAL { :x2 :p ?v }
      }
    }

rewrite

    SELECT * {
      WITH {
        SELECT ?w ?v {
          :x3 :q ?w .
          OPTIONAL { :x2 :p ?v }
        }
      } AS %namedset1
      :x1 :p ?v .
      OPTIONAL { INCLUDE %namedset1 }
    }

Merge Join Pattern
- High performance n-way join
  - Linear in the data (on HTree).
- Series of group graph patterns must share a common join variable.
- Common pattern in SPARQL
  - Select something and then fill in optional information about it through optional join groups.

    SELECT (SAMPLE(?var9) AS ?var1) ?var2 ?var3
    {
      {
        SELECT DISTINCT ?var3
        WHERE {
          ?var3 rdf:type pol:politician .
          ?var3 pol:hasrole ?var6 .
          ?var6 pol:party "Democrat" .
        }
      }
      OPTIONAL { ?var3 p1:name ?var9 }
      OPTIONAL {
        ?var10 p2:votedby ?var3 .
        ?var10 rdfs:label ?var2 .
      }
      ...
    }
    GROUP BY ?var2 ?var3

Merge Joins (cont.)

    SELECT (SAMPLE(?var9) AS ?var1) ?var2 ?var3
    {
      {
        SELECT DISTINCT ?var3
        WHERE {
          ?var3 rdf:type pol:politician .
          ?var3 pol:hasrole ?var6 .
          ?var6 pol:party "Democrat" .
        }
      }
      OPTIONAL { ?var3 p1:name ?var9 }
      OPTIONAL {
        ?var10 p2:votedby ?var3 .
        ?var10 rdfs:label ?var2 .
      }
      ...
    }
    GROUP BY ?var2 ?var3

rewrite

    SELECT
    WITH { ... } as %set1
    WITH { ... } as %set2
    WITH { ... } as %set3
    WHERE {
      INCLUDE %set1 .
      OPTIONAL { INCLUDE %set2 . }
      OPTIONAL { INCLUDE %set3 . }
    }

Handled as one n-way merge join.
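An n-way merge join with OPTIONAL semantics over one common variable can be sketched in Python as follows. This follows the "sort first, then a single pass" variant; it assumes the optional sets share only the common join variable with the primary set and with each other, and the name `merge_join_optional` and the sample data are hypothetical.

```python
from itertools import groupby, product


def merge_join_optional(primary, optionals, var):
    """Join `primary` with each set in `optionals` on `var`, keeping primary
    solutions that have no match in an optional set (OPTIONAL semantics)."""
    key = lambda solution: solution[var]
    # Sort each optional set and group it by join key: [(key, [solutions])].
    grouped = [
        [(k, list(g)) for k, g in groupby(sorted(o, key=key), key=key)]
        for o in optionals
    ]
    cursors = [0] * len(optionals)
    out = []
    for solution in sorted(primary, key=key):
        k = solution[var]
        per_set = []
        for i, groups in enumerate(grouped):
            # Advance this set's cursor past keys smaller than k.
            while cursors[i] < len(groups) and groups[cursors[i]][0] < k:
                cursors[i] += 1
            if cursors[i] < len(groups) and groups[cursors[i]][0] == k:
                per_set.append(groups[cursors[i]][1])
            else:
                per_set.append([{}])  # no match: contribute no bindings
        for combo in product(*per_set):
            merged = dict(solution)
            for match in combo:
                merged.update(match)
            out.append(merged)
    return out


politicians = [{"p": 2}, {"p": 1}]
names = [{"p": 1, "name": "A"}]
votes = [{"p": 1, "bill": "b1"}, {"p": 1, "bill": "b2"}, {"p": 3, "bill": "b3"}]
print(merge_join_optional(politicians, [names, votes], "p"))
```

After sorting, every cursor only moves forward, so the join itself is a single pass over each input.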

Parallel Graph Query Plans
- Partly ordered scans
  - Parallel AP reads against multiple shards
- Hash partitioned
  - Distinct
  - Order by
  - Aggregation (when not pipelined)
- Default Graph Queries
  - Global index view provides RDF Merge semantics.
  - DISTINCT TRIPLES across the set of contexts in the default graph

Parallel Hash Joins for Unselective APs
- AP must read a lot of data with reasonable locality
  - Shard view: journal + index segment(s).
  - One IO per leaf on segments.
  - 90%+ data in key-order on disk
  - Narrow stride (e.g., OSP or POS)
- Crossover once sequential IO dominates vectored nested index join
  - Key range scans turn into sequential IO in the cluster
  - Goal is to saturate the disk read bandwidth
- Strategy A
  - Map intermediate solutions against target shards
  - Build hash index on target nodes
  - Parallel local AP scans, probing hash index for joins.
- Strategy B
  - Hash partition intermediate solutions on join variables.
  - Parallel local AP scans against spanned shard(s)
  - Vectored hash partition of solutions read from APs.
  - Vectored probe in distributed hash index partitions for joins.
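The hash-partitioning step of Strategy B can be illustrated with a short Python sketch. It only shows how solutions with equal join-variable bindings end up in the same partition; the use of CRC32 as the hash, the partition count, and the function names are assumptions for illustration and say nothing about how Bigdata routes solutions between nodes.

```python
import zlib


def partition_of(solution, join_vars, n_partitions):
    """Map a solution to a partition from its join-variable bindings.

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings."""
    key = "|".join(str(solution[v]) for v in join_vars).encode("utf-8")
    return zlib.crc32(key) % n_partitions


def hash_partition(solutions, join_vars, n_partitions):
    """Split solutions into `n_partitions` lists by join key."""
    partitions = [[] for _ in range(n_partitions)]
    for solution in solutions:
        partitions[partition_of(solution, join_vars, n_partitions)].append(solution)
    return partitions


intermediate = [{"x": "s1", "y": "f1"}, {"x": "s2", "y": "f1"}, {"x": "s3", "y": "f2"}]
from_access_path = [{"y": "f1", "z": "c1"}, {"y": "f2", "z": "c2"}]

left_parts = hash_partition(intermediate, ["y"], 4)
right_parts = hash_partition(from_access_path, ["y"], 4)
for i, (l, r) in enumerate(zip(left_parts, right_parts)):
    print(i, l, r)
```

Since both inputs are partitioned with the same function on the same variables, matching solutions always meet in the same partition and each partition can be joined independently.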

Examining a Query Plan

NSS Explain

    SPARQL Query                   As given
    SPARQL Parse Tree              As parsed
    Original AST                   Generated abstract operator model
    Optimized AST                  Optimized abstract operator model
    Query Plan                     Physical query plan
    Query Evaluation Statistics    As executed

[Figure: query evaluation pipeline (SPARQL Parser, AST Optimizers, Query Plan Generator, Vectored Query Engine) with the database services Resolve Terms, Fast Range Counts, and Access Paths.]

Query Evaluation Statistics
- As-run or live
  - Explain provides as-run view.
  - Live view available by monitoring long running queries.
- Summary statistics
  - solutions=128, chunks=52, subqueries=0, elapsed=2467ms.

    solutions     #of generated solutions.
    chunks        #of chunks in the output solutions.
    subqueries    #of subqueries evaluated (each in their own table)
    elapsed       total elapsed clock time for this query.

- Followed by a detailed table of per-operator statistics.
  - For the top-level query
  - For each sub-query

Detailed Operator Statistics
(Many columns are for the cluster)

    queryid           query or subquery identifier
    evalorder         total then one per operator in order.
    bopid, predid     Index of operator or predicate in plan.
    bopsummary        Short summary for each operator
    predsummary       Short summary for access path (AP).
    fastrangecount    Estimated cardinality for AP.

Units & Chunks In & Out

    unitsin       #of solutions flowing into an operator
    chunksin      #of chunks of solutions flowing into an op.
    unitsout      #of solutions flowing out of an operator.
    chunksout     #of chunks of solutions flowing out of an op.
    typeerrors    #of runtime SPARQL type errors (solution dropped)
    joinratio     Ratio of solutions out to solutions in. Indicator of selectivity and cardinality explosion.
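The join ratio is simple to compute from the two unit counters, as the Python sketch below shows. The helper names and the threshold used to flag a cardinality explosion are hypothetical choices for illustration and are not part of the Explain output.

```python
def join_ratio(units_in, units_out):
    """Ratio of solutions out to solutions in; None if nothing flowed in."""
    if units_in == 0:
        return None
    return units_out / units_in


def describe(ratio, explosion_threshold=10.0):
    """Rough reading of a join ratio (threshold is an arbitrary assumption)."""
    if ratio is None:
        return "no input"
    if ratio < 1.0:
        return "selective"
    if ratio >= explosion_threshold:
        return "cardinality explosion"
    return "expanding"


for units_in, units_out in [(700, 11), (151, 1000), (1000, 50000), (0, 0)]:
    ratio = join_ratio(units_in, units_out)
    print(units_in, units_out, ratio, describe(ratio))
```

A ratio well below 1 marks a selective join, while a ratio far above 1 points at the operator where intermediate results blow up.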

Query Engine Metrics
NSS / Ganglia
- Work Queues (one per operator):
  - blockedworkqueuecount
  - blockedworkqueuerunningtotal
- Operators (many per query plan):
  - operatoractivecount
  - operatorhaltcount
  - operatorstartcount
  - operatortasksperquery
- Queries:
  - queriespersecond
  - querydonecount
  - queryerrorcount
  - querystartcount