Datenbanksysteme II: Implementation of Database Systems Implementing Joins
|
|
|
- Christiana Lamb
- 10 years ago
- Views:
Transcription
1 Datenbanksysteme II: Implementation of Database Systems Implementing Joins Material von Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina
2 Content of this Lecture Nested loop and blocked nested loop Sort-merge join Hash-based join strategies Index join (Star Join later) Ulf Leser: Datenbanksysteme II, Sommersemester
3 Join Operator JOIN: Most important relational operator Potentially very expensive Required in all practical queries and applications Often appears in groups of joins Many variations with different characteristics, suited for different situations Example: Relations R (A, B) and S (B, C) SELECT * FROM R, S WHERE R.B = S.B A B C R A B S B C A2 1 C1 A1 0 1 C1 A2 1 C3 A2 1 2 C2 A2 1 C5 A3 2 1 C3 R S A3 2 C2 A4 1 3 C4 A4 1 C1 1 C5 A4 1 C3 A4 1 C5 Ulf Leser: Datenbanksysteme II, Sommersemester
4 Unary versus Binary Operations Relational operators working on one table Selection, projection On two tables Product, Join, Intersection, Union, Minus Binary operators are usually more expensive Unary: Look at table (scanning, index, hash, ) Binary: Look at each tuple of first table for each tuple of second table Potentially quadratic complexity Ulf Leser: Datenbanksysteme II, Sommersemester
5 Nested-loop Join Super-naïve FOR EACH r IN R DO FOR EACH s IN S DO IF ( r.b=s.b) THEN OUTPUT (r s) Slight improvement FOR EACH block x IN R DO FOR EACH block y IN S DO FOR EACH r in x DO FOR EACH s in y DO IF ( r.b=s.b) THEN OUTPUT (r s) S R Cost estimations b(r), b(s) number of blocks in R and in S Each block of outer relation is read once Inner relation is read once for each block of outer relation Inner two loops are free (only main memory operations) Altogether: b(r)+b(r)*b(s) IO Ulf Leser: Datenbanksysteme II, Sommersemester
6 Example Assume b(r)=10.000, b(s)=2.000 R as outer relation IO = *2.000 = S as outer relation IO = * = Use smaller relation as outer relation For large relation, choice doesn t really matter Can t we do better?? Ulf Leser: Datenbanksysteme II, Sommersemester
7 ... There is no m in the formula m: Size of main memory in blocks This should make you suspicious We are not using our available main memory Ulf Leser: Datenbanksysteme II, Sommersemester
8 Blocked nested-loop join Rule of thumb: Use all memory you can get Use all memory the buffer manager allocates to your process This might be a difficult decision even for a single query which operations get how much memory? Blocked-nested-loop FOR i=1 TO b(r)/(m-1) DO READ NEXT m-1 blocks of R into M FOR EACH block y IN S DO FOR EACH r in R-chunk DO FOR EACH s in y do IF ( r.b=s.b) THEN OUTPUT (r s) Cost estimation Outer relation is read once Inner relation is read once for every chunk of R There are ~b(r)/m chunks IO = b(r) + b(r)*b(s)/m Further advantage: Outer relation is read in chunks sequential IO Ulf Leser: Datenbanksysteme II, Sommersemester
9 Example Example Assume b(r)=10.000, b(s)=2.000, m=500 R as outer relation IO = *2.000/500 = S as outer relation IO = *10.000/500 = Compare to one-block NL: IO Use smaller relation as outer relation Again, difference irrelevant as tables get larger But sizes of relations do matter If one relation fits into memory (b<m) Total cost: b(r) + b(s) One pass blocked-nested-loop We can do a little better with blocked-nested loop?? Ulf Leser: Datenbanksysteme II, Sommersemester
10 Zig-Zag Join When finishing a chunk of outer relation, hold last block of inner relation in memory Load next chunk of outer relation and compare with the still available last block of inner relation For each chunk, we need to read one block less Thus: Saves b(r)/m IO If R is outer relation Ulf Leser: Datenbanksysteme II, Sommersemester
11 Sort-Merge Join How does it work?? What does it cost?? Does it matter which is outer/inner relation?? When is it better then blocked-nested loop?? Ulf Leser: Datenbanksysteme II, Sommersemester
12 Sort-Merge Join How does it work? Sort both relations on join attribute(s) Merge both sorted relations Caution if duplicates exist The result size still is R * S in worst case If there are r/s tuples with value x in the join attribute in R / S, we need to output r*s tuples for x So what is the worst case?? Example R A B A1 0 A2 1 A4 1 A3 2 S B C 1 C1 1 C3 1 C5 2 C2 3 C4 Ulf Leser: Datenbanksysteme II, Sommersemester
13 Example Continued R A B S B C A1 0 A2 1 A4 1 A3 2 1 C1 1 C3 1 C5 2 C2 3 C4 Partial join {<A1,0>,<A2,1>} S A B C A2 1 C1 A2 1 C3 A2 1 C5 R A B S B C A B C A1 0 A2 1 1 C1 1 C3 Partial join A2 1 C1 A2 1 C3 A4 1 A3 2 1 C5 2 C2 3 C4 {<A1,0>,<A2,1>, <A4,1>} S A2 1 C5 A4 1 C1 A4 1 C3 new A4 1 C5 Ulf Leser: Datenbanksysteme II, Sommersemester
14 Merge Phase r := first (R); s := first (S); WHILE NOT EOR (R) and NOT EOR (S) DO IF r[b] < s[b] THEN r := next (R) ELSEIF r[b] > s[b] THEN s := next (S) ELSE /* r[b] = s[b]*/ b := r[b]; B := ; WHILE NOT EOR(S) and s[b] = b DO B := B {s}; s = next (S); END DO; WHILE NOT EOR(R) and r[b] = b DO FOR EACH e in B DO OUTPUT (r,e); r := next (R); END DO; END DO; Can we improve pipeline behavior?? Ulf Leser: Datenbanksysteme II, Sommersemester
15 Cost estimation Sorting R costs 2*b(R) * ceil(log m (b(r))) Sorting S costs 2*b(S) * ceil(log m (b(s))) Merge phase reads each relation once Total IO b(r) + b(s) + 2*b(R)*ceil(log m (b(r))) + 2*b(S)*ceil(log m (b(s))) Improvement While sorting, do not perform last read/write phase Open all sorted runs in parallel for merging Saves 2*b(R) + 2*b(S) IO Sometimes, we are able to save the sorting phase If sort is performed somewhere down in the tree Needs to be a sort on the same attribute(s) Merge-sort join: No inner/outer relation Ulf Leser: Datenbanksysteme II, Sommersemester
16 Better than Blocked-Nested-Loop? Assume b(r)=10.000, b(s)=2.000, m=500 BNL costs With S as outer relation SM: * *2.000 = Improved SM: Assume b(r)= , b(s)=1.000, m=500 BNL costs *1000/500 = SM: * *1.000 = Improved SM: When is SM better than BNL?? Consider improved version with 2*b(R)*ceil(log m (b(r))) + 2*b(S)*ceil(log m (b(s))) b(r) - b(s) ~ 2*b(R)*(log m (b)+1) + 2*b(S)*(log m (S)+1) b(r) b(s) ~ 2*b(R)*log m (b) + 2*b(S)*log m (S) b(r) b(s) ~ b(r)*(2*log m (b)-1) + b(s)*(2*log m (S)-1) In most cases, this means 3*(b(S)+b(R)) Ulf Leser: Datenbanksysteme II, Sommersemester
17 Comparison Assume relations of equal size b SM: 2*b*(2*log m (b)-1) BNL: b+b 2 /m BNL > SM b+b 2 /m > 2*b*(2*log m (b)-1) 1+b/m > 4*log m (b) - 2 b > 4m*log m (b) 3m Example b=10.000, m=100 ( > 500) BNL: , SM: 6* = b=10.000, m=5000 ( < ) BNL: , SM: 6* = Ulf Leser: Datenbanksysteme II, Sommersemester
18 Comparison 2 b(r)= , b(s)=2.000, m between 100 and BNL Join SM+Join BNL very good is one relation is much smaller than other and sufficient memory available SM can better cope with limited memory SM profit from more memory has jumps (= number of runs) Ulf Leser: Datenbanksysteme II, Sommersemester
19 Comparison 3 b(r)= , b(s)=50.000, m between 500 and BNL+Join SM-Join BNL very sensible to small memory sizes Ulf Leser: Datenbanksysteme II, Sommersemester
20 Merge-Join and Main Memory We have no m in the formula of the merge phase Implicitly, it is in the number of runs required More memory doesn t decrease number of blocks to read, but can be used for sequential reads Always fill memory with m/2 blocks from R and m/2 blocks from S Use asynchronous IO 1. Schedule request for m/4 blocks from R and m/4 blocks from S 2. Wait until loaded 3. Schedule request for next m/4 blocks from R and next m/4 blocks from S 4. Do not wait perform merge on first 2 chunks of m/4 blocks 5. Wait until previous request finished 1. We used this waiting time very well 6. Jump to 3, using m/4 chunks of M in turn Ulf Leser: Datenbanksysteme II, Sommersemester
21 Hash Join As always, we may save sorting if good hash function available Assume a very good hash function Distributes hash values almost uniformly over hash table If we have good histograms (later), a simple intervalbased hash function will usually work How can we apply hashing to joins?? Ulf Leser: Datenbanksysteme II, Sommersemester
22 Hash Join Idea Use join attributes as hash keys in both R and S Choose hash function for hash table of size m Each bucket has size b(r)/m, b(s)/m Hash phase Scan R, compute hash table, writing full blocks to disk immediately Scan S, compute hash table, writing full blocks to disk immediately Notice: Probably better to use some n<b(r)/m to allow for sequential writes Merge phase Iteratively, load same bucket of R and of S in memory Compute join Total cost Hash phase costs 2*b(R)+2*b(S) Merge phase costs b(r) + b(s) Total: 3*(b(R)+b(S)) Under what assumption?? Ulf Leser: Datenbanksysteme II, Sommersemester
23 Hash Join with Large Tables Merge phase assumes that 2 buckets can be hold in memory Thus, we roughly assume that 2*b(R)/m<m (if b(r)~b(s)) Note: Merge phase of sorting only requires 2 blocks (or more for more runs), hashing requires 2 buckets to be loaded What if b(r)>m 2 /2?? We need to create smaller buckets Partition R/S such that each partition is smaller than m 2 /2 Compute buckets for all partitions in both relations Merge in cross-product manner P R,1 with P S,1, P S,2,, P S,n P R,2 with P S,1, P S,2,, P S,n P R,m with P S,1, P S,2,, P S,n Actually, it suffices if either b(r) or b(s) is small enough Chose the smaller relation as driver (outer relation) Load one bucket into main memory Load same bucket in other relation block by block and filter tuples Ulf Leser: Datenbanksysteme II, Sommersemester
24 Hybrid Hash Join Assume that min(b(r),b(s)) < m 2 /2 Notice: During merge phase, we used only (b(r)+b(s))/m (size of two buckets) memory blocks Although there are much more Improvement Chose smaller relation (assume S) Chose a number k of buckets to build (with k<m) Again, assuming perfect hash functions, each bucket has size b(s)/k When hashing S, keep first x buckets completely in memory, but only one block for each of the (k-x) other buckets These x buckets are never written to disk When hashing R If hash value maps into buckets 1..x, perform join immediately Otherwise, map to the k-x other buckets/blocks and write buckets to disk After first round, we have performed the join on x buckets and have k-x buckets of both relations on disk Perform normal merge phase on k-x buckets Ulf Leser: Datenbanksysteme II, Sommersemester
25 Hybrid Hash Join - Complexity Total saving (compared to normal hash join) We save 2 IO (writing) for every block that is never written to disk We keep x buckets in memory, each having b(s)/k and b(r)/k blocks, respectively Together, we save 2*x/k*(b(S)+b(R)) IO operations Question: How should we choose k and x? Optimal solution x=1 and k as small as possible Build as large partitions as possible, such that still one entire partition and one block for all other partitions fits into memory Thus, we use as much memory as possible for savings Optimum reached at approximately k=b(s)/m k must be smaller, so that M can accommodate 1 block for each other bucket Together, we save 2m/b(S)*(b(S)+b(R)) Total cost: (3-2m/b(S))*(b(S)+b(R)) Ulf Leser: Datenbanksysteme II, Sommersemester
26 Comparing Hash Join and Sort-Merge Join If enough memory provided, both require approximately the same number of IOs 3*(b(R)+b(S)) Hybrid-hash join improves slightly SM generates sorted results sort phase of other joins in query plan can be dropped Advantage propagates up the tree HJ does not need to perform O(n*log(n)) sorting in main memory HJ requires that only one relation is small enough, SM needs two small relations Because both are sorted independently HJ depends on roughly equally sized buckets Otherwise, performance might degrade due to unexpected paging To prevent, estimate k more conservative and do not fill m completely Some memory remains unused Both can be tuned to generate mostly sequential IO Ulf Leser: Datenbanksysteme II, Sommersemester
27 Comparing Join Methods Ulf Leser: Datenbanksysteme II, Sommersemester
28 Index Join Assume we have an index B_Index on one join attribute Choose indexed relation as inner relation Index join FOR EACH r IN R DO X = { SEARCH (S.B_Index, <r.b>) } FOR EACH TID i in X DO s = READ (S, i) ; output (r s). R A B A1 0 A2 1 A3 2 A S.B_Index auf S S B C 1 C1 2 C2 1 C3 3 C4 1 C5 Actually, this is a one block-nested loop with index access Using BNL possible (and better) Ulf Leser: Datenbanksysteme II, Sommersemester
29 Index Join Cost Assumptions R.B is foreign key referencing S.B Every tuple from R has one or more join tuples in S Let v(x,b) be the number of different values of attribute B in X Each value in S.B appears v~b(s)/v(s,b) times For each r R, we read all tuples with given value in S Total cost: b(r) + R *(log k (S/b) + v/k + v) Outer relation read once Find value in B*, read all matching TIDs, access S for each TID Other way round: Assume that S.B is foreign key for R.B Some tuples of R will have no join partner in S Assume only r R tuples have partner Total cost: b(r) + r*(log k (S/b) + v/k + v) No real change Ulf Leser: Datenbanksysteme II, Sommersemester
30 Index Join Cost Comparison to sort-merge join Neglect log k (S/b) + v/k First term is only ~2, second ~1 in many cases SM > IJ roughly requires Example Assuming that 2 passes suffice for sorting 3*(b(R)+b(S)) > b(r)+ R *b(s)/v(s,b) b(r)=10.000, b(s)=2.000, m=500, v(s,b)=10, k=50 SM: IJ: *50*2.000/10 ~ When is an index join a good idea?? Ulf Leser: Datenbanksysteme II, Sommersemester
31 Index Join: Advantageous Situations When r is very small If join is combined with selection on R Most tuples are filtered, only very few accesses to S Index pays off When R is very small, R.B is foreign key, S.B is primary key Similar to previous case If S is primary key, then v(s,b)= S, and hence v=1 R can be read fast and probes into S We get total cost of ~b(r)+ R (plus index access etc.) Ulf Leser: Datenbanksysteme II, Sommersemester
32 Index Join with Sorting Problem with last approach: Blocks of S are read many times Caching will reduce the overhead difficult to predict Alternative First compute all necessary TID s from S Sort and read in sorted order Advantage: blocks of S will mostly be in cache when accessed Requires enough memory for keeping TID list and tuples of R Further improvement Perform Sort-Merge join on B values only B values from S are already sorted (if B*-index) is used Hence: Sort R on B, merge lists, probe into S for required values Can be combined with previous improvement Ulf Leser: Datenbanksysteme II, Sommersemester
33 Index Join with 2 Indexes Assume we have an index on both join attributes What are we doing?? Ulf Leser: Datenbanksysteme II, Sommersemester
34 Index Join with 2 Indexes TID list join Read both indexes sequentially (order does not matter) Join (value,tid) lists Probe into R and S Large advantage, if intersection is small Otherwise, we need sorted tables (index-organized) But then sort-merge is probably faster Ulf Leser: Datenbanksysteme II, Sommersemester
35 Semi Join Consider queries such as SELECT DISTINCT R.* FROM S,R WHERE R.B=S.B SELECT R.* FROM R WHERE R.B IN (SELECT S.B FROM S) SELECT R.* FROM R WHERE R.B IN ( ) What s special? No values from S are requested in result S (or inner query) acts as filter on R Semi-Join R S Very important for distributed databases Accessing data becomes even more expensive Idea: First ship only join attribute values, compute Semi-Join, then retrieve rest of matching tuples Technique also can be used for very large tuples and small result sizes First project on attribute values Intersect lists, probe into tables and load data Question: How do we know the sizes of intermediate result? Ulf Leser: Datenbanksysteme II, Sommersemester
36 Implementing Semi-Join Using blocked-nested-loop join Chose filter relation as outer relation Perform BNL Whenever partner for R.B is found, exit inner loop Using sort-merge join Sort R Sort join attribute values from S, remove duplicates on the way Perform merge phase as usual Very effective if v(s,b) is small Ulf Leser: Datenbanksysteme II, Sommersemester
37 Union, Difference, Intersection Other binary operations use methods similar as those for joins Sorting, hashing, index,... See Garcia-Molina et al. for detailed discussion Ulf Leser: Datenbanksysteme II, Sommersemester
Query Processing C H A P T E R12. Practice Exercises
C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass
Chapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
Data Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data
Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)
Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster
SQL Query Evaluation. Winter 2006-2007 Lecture 23
SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo
Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo Material von Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina
Oracle Database 11g: SQL Tuning Workshop Release 2
Oracle University Contact Us: 1 800 005 453 Oracle Database 11g: SQL Tuning Workshop Release 2 Duration: 3 Days What you will learn This course assists database developers, DBAs, and SQL developers to
Oracle Database 11g: SQL Tuning Workshop
Oracle University Contact Us: + 38516306373 Oracle Database 11g: SQL Tuning Workshop Duration: 3 Days What you will learn This Oracle Database 11g: SQL Tuning Workshop Release 2 training assists database
Efficient Processing of Joins on Set-valued Attributes
Efficient Processing of Joins on Set-valued Attributes Nikos Mamoulis Department of Computer Science and Information Systems University of Hong Kong Pokfulam Road Hong Kong [email protected] Abstract Object-oriented
Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html
Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the
1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle
1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We
16.1 MAPREDUCE. For personal use only, not for distribution. 333
For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several
EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES
ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada [email protected]
Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.
Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design
Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division
Answer Key UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS186 Fall 2003 Eben Haber Midterm Midterm Exam: Introduction to Database Systems This exam has
Topics. Distributed Databases. Desirable Properties. Introduction. Distributed DBMS Architectures. Types of Distributed Databases
Topics Distributed Databases Chapter 21, Part B Distributed DBMS architectures Data storage in a distributed DBMS Distributed catalog management Distributed query processing Updates in a distributed DBMS
External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao
University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao CMPSCI 445 Midterm Practice Questions NAME: LOGIN: Write all of your answers directly on this paper. Be sure to clearly
Advanced Oracle SQL Tuning
Advanced Oracle SQL Tuning Seminar content technical details 1) Understanding Execution Plans In this part you will learn how exactly Oracle executes SQL execution plans. Instead of describing on PowerPoint
Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?
Why Query Optimization? Access Path Selection in a Relational Database Management System P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price Peyman Talebifard Queries must be executed and execution
Chapter 13: Query Optimization
Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Transformation of Relational Expressions Catalog
External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13
External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing
SQL Server Query Tuning
SQL Server Query Tuning Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author Pro SQL Server
Lecture 1: Data Storage & Index
Lecture 1: Data Storage & Index R&G Chapter 8-11 Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager
Rethinking SIMD Vectorization for In-Memory Databases
SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest
Chapter 14: Query Optimization
Chapter 14: Query Optimization Database System Concepts 5 th Ed. See www.db-book.com for conditions on re-use Chapter 14: Query Optimization Introduction Transformation of Relational Expressions Catalog
Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework
Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework Fariha Atta MSc Informatics School of Informatics University of Edinburgh 2010 T Abstract he Map/Reduce
SQL Query Performance Tuning: Tips and Best Practices
SQL Query Performance Tuning: Tips and Best Practices Pravasini Priyanka, Principal Test Engineer, Progress Software INTRODUCTION: In present day world, where dozens of complex queries are run on databases
Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM
Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM 1 Acknowledgements and Disclaimers Availability. References in this presentation to IBM products, programs, or services do not imply
Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner
Understanding SQL Server Execution Plans Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author
Performance Tuning for the Teradata Database
Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document
Chapter 13. Disk Storage, Basic File Structures, and Hashing
Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing
D B M G Data Base and Data Mining Group of Politecnico di Torino
Database Management Data Base and Data Mining Group of [email protected] A.A. 2014-2015 Optimizer objective A SQL statement can be executed in many different ways The query optimizer determines
Best Practices for DB2 on z/os Performance
Best Practices for DB2 on z/os Performance A Guideline to Achieving Best Performance with DB2 Susan Lawson and Dan Luksetich www.db2expert.com and BMC Software September 2008 www.bmc.com Contacting BMC
COS 318: Operating Systems
COS 318: Operating Systems File Performance and Reliability Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Topics File buffer cache
Bloom Filters. Christian Antognini Trivadis AG Zürich, Switzerland
Bloom Filters Christian Antognini Trivadis AG Zürich, Switzerland Oracle Database uses bloom filters in various situations. Unfortunately, no information about their usage is available in Oracle documentation.
Query Optimization in a Memory-Resident Domain Relational Calculus Database System
Query Optimization in a Memory-Resident Domain Relational Calculus Database System KYU-YOUNG WHANG and RAVI KRISHNAMURTHY IBM Thomas J. Watson Research Center We present techniques for optimizing queries
Optimizing Performance. Training Division New Delhi
Optimizing Performance Training Division New Delhi Performance tuning : Goals Minimize the response time for each query Maximize the throughput of the entire database server by minimizing network traffic,
Databases and Information Systems 1 Part 3: Storage Structures and Indices
bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -
Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University
Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications
The Classical Architecture. Storage 1 / 36
1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage
Overview of Storage and Indexing
Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan
Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr.
Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Dan Olteanu Submitted as part of Master of Computer Science Computing Laboratory
DATABASE DESIGN - 1DL400
DATABASE DESIGN - 1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information
Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ...
Outline Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Hardware: Disks Access Times Example -
NetApp FAS Hybrid Array Flash Efficiency. Silverton Consulting, Inc. StorInt Briefing
NetApp FAS Hybrid Array Flash Efficiency Silverton Consulting, Inc. StorInt Briefing PAGE 2 OF 7 Introduction Hybrid storage arrays (storage systems with both disk and flash capacity) have become commonplace
Q & A From Hitachi Data Systems WebTech Presentation:
Q & A From Hitachi Data Systems WebTech Presentation: RAID Concepts 1. Is the chunk size the same for all Hitachi Data Systems storage systems, i.e., Adaptable Modular Systems, Network Storage Controller,
Oracle Database 11 g Performance Tuning. Recipes. Sam R. Alapati Darl Kuhn Bill Padfield. Apress*
Oracle Database 11 g Performance Tuning Recipes Sam R. Alapati Darl Kuhn Bill Padfield Apress* Contents About the Authors About the Technical Reviewer Acknowledgments xvi xvii xviii Chapter 1: Optimizing
Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8
Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan
Fallacies of the Cost Based Optimizer
Fallacies of the Cost Based Optimizer Wolfgang Breitling [email protected] Who am I Independent consultant since 1996 specializing in Oracle and Peoplesoft setup, administration, and performance tuning
An Oracle White Paper May 2010. Guide for Developing High-Performance Database Applications
An Oracle White Paper May 2010 Guide for Developing High-Performance Database Applications Introduction The Oracle database has been engineered to provide very high performance and scale to thousands
Semi-Streamed Index Join for Near-Real Time Execution of ETL Transformations
Semi-Streamed Index Join for Near-Real Time Execution of ETL Transformations Mihaela A. Bornea #1, Antonios Deligiannakis 2, Yannis Kotidis #1,Vasilis Vassalos #1 # Athens U. of Econ and Business, Technical
Record Storage and Primary File Organization
Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records
Chapter 5: Overview of Query Processing
Chapter 5: Overview of Query Processing Query Processing Overview Query Optimization Distributed Query Processing Steps Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of
Big Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server
Understanding Query Processing and Query Plans in SQL Server Craig Freedman Software Design Engineer Microsoft SQL Server Outline SQL Server engine architecture Query execution overview Showplan Common
Partitioning and Divide and Conquer Strategies
and Divide and Conquer Strategies Lecture 4 and Strategies Strategies Data partitioning aka domain decomposition Functional decomposition Lecture 4 and Strategies Quiz 4.1 For nuclear reactor simulation,
Optimizing Your Data Warehouse Design for Superior Performance
Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex
Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3
Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM
In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki 30.11.2012
In-Memory Columnar Databases HyPer Arto Kärki University of Helsinki 30.11.2012 1 Introduction Columnar Databases Design Choices Data Clustering and Compression Conclusion 2 Introduction The relational
SQL Server Query Tuning
SQL Server Query Tuning A 12-Step Program By Thomas LaRock, Technical Evangelist and Head Geek Confio Software 4772 Walnut Street, Suite 100 Boulder, CO 80301 www.confio.com Introduction Query tuning is
In-memory Tables Technology overview and solutions
In-memory Tables Technology overview and solutions My mainframe is my business. My business relies on MIPS. Verna Bartlett Head of Marketing Gary Weinhold Systems Analyst Agenda Introduction to in-memory
Efficient Data Access and Data Integration Using Information Objects Mica J. Block
Efficient Data Access and Data Integration Using Information Objects Mica J. Block Director, ACES Actuate Corporation [email protected] Agenda Information Objects Overview Best practices Modeling Security
MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
Database 2 Lecture I. Alessandro Artale
Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 [email protected] http://www.inf.unibz.it/
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time
Physical DB design and tuning: outline
Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning
Inside the PostgreSQL Query Optimizer
Inside the PostgreSQL Query Optimizer Neil Conway [email protected] Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of
Data storage Tree indexes
Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating
ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001
ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel
Multi-dimensional index structures Part I: motivation
Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for
Efficiently Identifying Inclusion Dependencies in RDBMS
Efficiently Identifying Inclusion Dependencies in RDBMS Jana Bauckmann Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany [email protected]
Analyze Database Optimization Techniques
IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.8, August 2010 275 Analyze Database Optimization Techniques Syedur Rahman 1, A. M. Ahsan Feroz 2, Md. Kamruzzaman 3 and
CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113
CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113 Instructor: Boris Glavic, Stuart Building 226 C, Phone: 312 567 5205, Email: [email protected] Office Hours:
Oracle Rdb Performance Management Guide
Oracle Rdb Performance Management Guide Solving the Five Most Common Problems with Rdb Application Performance and Availability White Paper ALI Database Consultants 803-648-5931 www.aliconsultants.com
Run$me Query Op$miza$on
Run$me Query Op$miza$on Robust Op$miza$on for Graphs 2006-2014 All Rights Reserved 1 RDF Join Order Op$miza$on Typical approach Assign es$mated cardinality to each triple pabern. Bigdata uses the fast
Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015
Execution Plans: The Secret to Query Tuning Success MagicPASS January 2015 Jes Schultz Borland plan? The following steps are being taken Parsing Compiling Optimizing In the optimizing phase An execution
CIS 631 Database Management Systems Sample Final Exam
CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files
Evaluation of Expressions
Query Optimization Evaluation of Expressions Materialization: one operation at a time, materialize intermediate results for subsequent use Good for all situations Sum of costs of individual operations
Power BI Performance. Tips and Techniques WWW.PRAGMATICWORKS.COM
Power BI Performance Tips and Techniques Rachael Martino Principal Consultant [email protected] @RMartinoBoston About Me SQL Server and Oracle developer and IT Manager since SQL Server 2000 Focused
Chapter 8: Structures for Files. Truong Quynh Chi [email protected]. Spring- 2013
Chapter 8: Data Storage, Indexing Structures for Files Truong Quynh Chi [email protected] Spring- 2013 Overview of Database Design Process 2 Outline Data Storage Disk Storage Devices Files of Records
Symbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
FAST 11. Yongseok Oh <[email protected]> University of Seoul. Mobile Embedded System Laboratory
CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory
In and Out of the PostgreSQL Shared Buffer Cache
In and Out of the PostgreSQL Shared Buffer Cache 2ndQuadrant US 05/21/2010 About this presentation The master source for these slides is http://projects.2ndquadrant.com You can also find a machine-usable
