CS 525: Advanced Database Organization. Execution Plan. Query Execution. How to estimate costs. 10: Query Execution. Estimating IOs: Boris Glavic

Size: px
Start display at page:

Download "CS 525: Advanced Database Organization. Execution Plan. Query Execution. How to estimate costs. 10: Query Execution. Estimating IOs: Boris Glavic"

Transcription

1 SQL query CS 55: Advanced Database Organization 0: Query Execution Boris Glavic Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab CS 55 Notes 0 - Query Execution parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes consider physical plans {P,P,..} statistics CS 55 Notes 0 - Query Execution answer execute Pi pick best {(P,C),(P,C)...} estimate costs Query Execution Here only: how to implement operators what are the costs of implementations how to implement queries Data flow between operators Next part: How to choose good plan Execution Plan A tree (DAG) of physical operators that implement a query May use indices May create temporary relations May create indices on the fly May use auxiliary operations such as sorting CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution How to estimate costs If everything fits into memory Standard computational complexity If not Assume fixed memory available for buffering pages Count I/O operations eal systems combine this with CPU estimations Estimating IOs: Count # of disk blocks that must be read (or written) to execute query plan CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6

2 To estimate costs, we may have additional parameters: B() = # of blocks containing tuples f() = max # of tuples of per block M = # memory blocks available To estimate costs, we may have additional parameters: B() = # of blocks containing tuples f() = max # of tuples of per block M = # memory blocks available HT(i) = # levels in index i LB(i) = # of leaf blocks in index i CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 Clustered index Index that allows tuples to be read in an order that corresponds to physical order A A index CS 55 Notes 0 - Query Execution 9 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination CS 55 Notes 0 - Query Execution 0 Operator Profiles Algorithm In-memory complexity: e.g., O(n ) Memory requirements untime based on available memory #I/O if operation needs to go to disk Disk space needed Prerequisites Conditions under which the operator can be applied CS 55 Notes 0 - Query Execution Execution Strategies Compiled Translate into C/C++/Assembler code Compile, link, and execute code Interpreted Generic operator implementations Generic executor Interprets query plan CS 55 Notes 0 - Query Execution

3 Virtual Machine Approach Implement virtual machine of low-level DBMS operations Compile query into machine-code for that machine Iterator Model Need to be able to combine operators in different ways E.g., join inputs may be scans, or outputs of other joins, -> define generic interface for operators be able to arbitrarily compose complex plans from a small set of operators CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Iterator Model - Interface Open Prepare operator to read inputs Close Close operator and clean up Next eturn next result tuple Query Execution Iterator Model Iterator open close next Iterator open close next Iterator open close next Iterator open close next CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Query Execution Iterator Model Key Call Iterator open close next Query Execution Iterator Key Model eturn Tuple Call Iterator open close next Iterator open close next Iterator open close next Iterator open close next Iterator open close next Iterator open close next Iterator open close next CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8

4 Parallelism Iterator Model Pull-based query execution Potential types of parallelism Inter-query (every multiuser system) Intra-operator Inter-operator Intra-Operator Parallelism Execute portions of an operator in parallel Merge-Sort Assign a processor to each merge phase Scan Partition tables Each process scans one partition CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 Inter-Operator Parallelism Each process executes one or more operators Pipelining Push-based query execution Chain operators to directly produce results Pipeline-breakers Operators that need to consume the whole input (or large parts) before producing outputs Pipelining Communication Queues Operators push their results to queues Operators read their inputs from queues Direct call Operator calls its parent in the tree with results Within one process CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Key Append to queue Dequeue Direct Call Pipelines Pipeline-breakers Sorting All operators that apply sorting Aggregation Set Difference Some implementations of Join Union CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution

5 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination Sorting Why do we want/need to sort Query requires sorting (ODE BY) Operators require sorted input Merge-join Aggregation by sorting Duplicate removal using sorting CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 In-memory sorting Algorithms from data structures 0 Quick sort Merge sort Heap sort Intro sort External sorting Problem: Sort N pages of data with M pages of memory Solutions? CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 First Idea Split data into runs of size M Sort each run in memory and write back to disk N/M sorted runs of size M Now what? M M M Merging uns Need to create bigger sorted runs out of sorted smaller runs Divide and Conquer Merge Sort? How to merge two runs that are bigger than M? CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 5

6 Merging uns using pages Merging uns Merging sorted runs and Need pages One page to buffer pages from One page to buffer pages from One page to buffer the result Whenever this buffer is full, write it to disk read read merge write CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution -Way External Mergesort epeat process until we have one sorted run Each iteration (pass) reads and writes the whole table once: B() I/Os Each pass doubles the run size + log (B() / M) runs B() * ( + log (B() / M) ) I/Os Input Pass 0 Pass Pass Pass CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution N-Way External Mergesort Merging uns How to utilize M buffer during merging? Each pass merges M- runs at once One memory page as buffer for each run #I/Os read read merge write + log M- (B() / M) runs B() *( + log M- (B() / M) ) I/Os M- read CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 6

7 How many passes do we need? N M= M=9 M=5 M=5 M=05 00,000 0,000 00,000 5,000, ,000, ,000,000,000,000, To put into perspective Scenario Page size KB TB of data (50,000,000) 0MB of buffer for sorting (50) Passes passes CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 Merge In practice would want larger I/O buffer for each run Trade-off between number of runs and efficiency of I/O Improving in-memory merging Merging M runs To choose next element to output Have to compare M elements -> complexity linear in M: O(M) How to improve that? Use priority queue to store current element from each run -> O(log (M)) CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 Priority Queue Queue for accessing elements in some given order pop-smallest = return and remove smallest element in set Insert(e) = insert element into queue Min-Heap Implementation of priority queue Store elements in a binary tree All levels are full (except leaf level) Heap property Parent is smaller than child Example: {,,, 0 } 0 CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution

8 Min-Heap Insertion insert(e)!. Add element at next free leaf node This may invalidate heap property. If node smaller than parent then Switch node with parent. epeat until ) cannot be applied anymore Min-Heap Dequeue pop-smallest!. eturn oot and use right-most leaf as new root This may invalidate heap property. If node smaller than child then Switch node with smaller child. epeat until ) cannot be applied anymore CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Insertion Dequeue Insert Insert at first free position estore heap property CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Dequeue Min/Max-Heap Complexity Heap is a complete tree Height is O(log (n)) Insertion Maximal height of the tree switches -> O(log (n)) Dequeue Maximal height of the tree switches -> O(log (n)) CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 8

9 Min-Heap Implementation Full tree Use array to implement tree Compute positions Parent(n) = (n- ) / Children(n) = n +, n Merging with Priority Queue CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 50 Merging with Priority Queue Merging with Priority Queue CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 5 Using a heap to generate runs Using a heap to generate runs ead inputs into heap Until available memory is full eplace elements emove smallest element from heap If larger then last element written of current run then write to current run Else create a new run Add new element from input to heap CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 5 9

10 Using a heap to generate runs Using a heap to generate runs CS 55 Notes 0 - Query Execution 55 CS 55 Notes 0 - Query Execution 56 Using a heap to generate runs Using a heap to generate runs CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 58 Using a heap to generate runs Using a heap to generate runs CS 55 Notes 0 - Query Execution 59 CS 55 Notes 0 - Query Execution 60 0

11 Using a heap to generate runs Increases the run-length On average by a factor of (see Knuth) Use clustered B+-tree Keys in the B+-tree I are in sort order If B+-tree is clustered traversing the leaf nodes is sequential I/O! K = #keys/leaf node Approach Traverse from root to first leaf: HT(I) Follow sibling pointers: / K ead data blocks: B() CS 55 Notes 0 - Query Execution 6 CS 55 Notes 0 - Query Execution 6 I/O Operations HT(I) + / K + B() I/Os Less than B() = pass external mergesort ->Better than external merge-sort! CS 55 Notes 0 - Query Execution 6 Unclustered B+-tree? Each entry in a leaf node may point to different page of relation For each leaf page we may read up to K pages from relation andom I/O In worst-case we have K * B() K = * B() = 50 merge passes CS 55 Notes 0 - Query Execution 6 Sorting Comparison B() = number of block of M = number of available memory blocks #B = records per page HT = height of B+-tree (logarithmic) K = number of keys per leaf node Property Ext. Mergesort B+ (clustered) B+ (unclustered) untime O (N log M- (N)) O(N) O(N) #I/O (random) B() * ( + log M- (B() / M) ) HT + / K + B() Variants ) Merge with heap ) un generation with heap ) Larger Buffer CS 55 Notes 0 - Query Execution 65 HT + / K + K * #B Memory M (better HT + X) (better HT + X) Disk Space B() 0 0 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination CS 55 Notes 0 - Query Execution 66

12 Scan Implements access to a table Combined with selection Probably projection too Variants Sequential Scan through all tuples of relation Index Use index to find tuples that match selection CS 55 Notes 0 - Query Execution 6 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination CS 55 Notes 0 - Query Execution 68 Options Transformations: c, c Joint algorithms: Nested loop Merge join Join with index Hash join Outer join algorithms Nested Loop Join (conceptually) for each r do for each s do if (r,s) C then output (r,s) Applicable to: Any join condition C Cross-product CS 55 Notes 0 - Query Execution 69 CS 55 Notes 0 - Query Execution 0 Merge Join (conceptually) () if and not sorted, sort them () i ; j ; While (i T( )) (j T( )) do if { i }.C = { j }.C then outputtuples else if { i }.C > { j }.C then j j+ else if { i }.C < { j }.C then i i+ Applicable to: C is conjunction of equalities or </>: A = B AND AND A n = B n Procedure Output-Tuples While ( { i }.C = { j }.C) (i T( )) do [jj j; while ( { i }.C = { jj }.C) (jj T( )) do [output pair { i }, { jj }; jj jj+ ] i i+ ] CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution

13 Example i {i}.c {j}.c j Index nested loop (Conceptually) For each r do Assume.C index [ X index (, C, r.c) for each s X do output (r,s) pair] Note: X index(rel, attr, value) then X = set of rel tuples with attr = value CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Hash join (conceptual) Hash function h, range 0 k Buckets for : G 0, G,... G k Buckets for : H 0, H,... H k Applicable to: C is conjunction of equalities A = B AND AND A n = B n Hash join (conceptual) Hash function h, range 0 k Buckets for : G 0, G,... G k Buckets for : H 0, H,... H k Algorithm () Hash tuples into G buckets () Hash tuples into H buckets () For i = 0 to k do match tuples in G i, H i buckets CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Simple example hash: even/odd Factors that affect performance Buckets 5 Even: Odd: CS 55 Notes 0 - Query Execution () Tuples of relation stored physically together? () elations sorted by join attribute? () Indexes exist? CS 55 Notes 0 - Query Execution 8

14 Example (a) NL Join elations not contiguous ecall T( ) = 0,000 T( ) = 5,000 S( ) = S( ) =/0 block MEM=0 blocks Example (a) Nested Loop Join elations not contiguous ecall T( ) = 0,000 T( ) = 5,000 S( ) = S( ) =/0 block MEM=0 blocks Cost: for each tuple: [ead tuple + ead ] Total =0,000 [+500]=5,00,000 IOs CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 80 Can we do better? Can we do better? Use our memory () ead 00 blocks of () ead all of (using block) + join () epeat until done CS 55 Notes 0 - Query Execution 8 CS 55 Notes 0 - Query Execution 8 Cost: for each chunk: ead chunk: 00 IOs ead : 500 IOs 600 Cost: for each chunk: ead chunk: 00 IOs ead : 500 IOs 600 Total =,000 x 600 = 6,000 IOs 00 CS 55 Notes 0 - Query Execution 8 CS 55 Notes 0 - Query Execution 8

15 Can we do better? Can we do better? E everse join order: Total = 500 x (00 +,000) = 00 5 x,00 = 5,500 IOs CS 55 Notes 0 - Query Execution 85 CS 55 Notes 0 - Query Execution 86 Cost of Block Nested Loop E everse join order: Total = B() x (min(b(), M-) + B()) M- Block-Nested Loop Join (conceptual) for each M- blocks of do read M- blocks of into buffer for each block of do read next block of for each tuple r in block for each tuple s in block if (r,s) C then output (r,s) CS 55 Notes 0 - Query Execution 8 CS 55 Notes 0 - Query Execution 88 Note How much memory for buffering inner and for outer chunks? for inner would minimize I/O But, larger buffer better for I/O M - k M - k M - k k k k k k k CS 55 Notes 0 - Query Execution 89 CS 55 Notes 0 - Query Execution 90 5

16 Example (b) Merge Join Both, ordered by C; relations contiguous Memory Example (b) Merge Join Both, ordered by C; relations contiguous Memory Total cost: ead cost + read cost = =,500 IOs CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 9 A a b a c d e 5 Merge Join Example B=C S B Output: (a,,,x) S C D Z Z S x y e 6 q d A Merge Join Example B=C S B a b Z a c d e 5 Output: (b,,,x) S C D Z S x y e 6 q d CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 9 Merge Join Example B=C S.B > S.C: advance Z S S A B C D a Z S x b y a Z e c 6 q d d e 5 A Merge Join Example B=C S B Output: (a,,,y) S a b Z S a Z c d e 5 C D x y e 6 q d CS 55 Notes 0 - Query Execution 95 CS 55 Notes 0 - Query Execution 96 6

17 A Merge Join Example B=C S B Output: (a,,,e) S a b a Z Z S c d e 5 C D x y e 6 q d Merge Join Example B=C S.B > S.C: advance Z S S A B C D a x b y a Z S e c Z 6 q d d e 5 CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 98 Merge Join Example B=C S.B < S.C: advance Z S A B C D a x b y a e c Z Z S 6 q d d e 5 Merge Join Example B=C S.B < S.C: advance Z S A B C D a x b y a e c Z S 6 q d Z d e 5 CS 55 Notes 0 - Query Execution 99 CS 55 Notes 0 - Query Execution 00 Merge Join Example Example (c) Merge Join A a b a c d e 5 B=C S B.B < S.C: DONE Z Z S S C D x y e 6 q d, not ordered, but contiguous --> Need to sort, first CS 55 Notes 0 - Query Execution 0 CS 55 Notes 0 - Query Execution 0

18 One way to sort: Merge Sort (i) For each 00 blk chunk of : - ead chunk - Sort in memory - Write to disk Memory... sorted chunks (ii) ead all chunks + merge + write out Sorted file Memory Sorted Chunks CS 55 Notes 0 - Query Execution 0 CS 55 Notes 0 - Query Execution 0 Cost: Sort Each tuple is read,written, read, written so... Sort cost : x,000 =,000 Sort cost : x 500 =,000 Example (d) Merge Join (continued), contiguous, but unordered Total cost = sort cost + join cost = 6,000 +,500 =,500 IOs CS 55 Notes 0 - Query Execution 05 CS 55 Notes 0 - Query Execution 06 Example (c) Merge Join (continued), contiguous, but unordered Total cost = sort cost + join cost = 6,000 +,500 =,500 IOs But: Iteration cost = 5,500 so merge joint does not pay off! But say = 0,000 blocks contiguous = 5,000 blocks not ordered Iterate: 5000 x (00+0,000) = 50 x 0,00 00 = 505,000 IOs Merge join: 5(0,000+5,000) = 5,000 IOs Merge Join (with sort) WINS! CS 55 Notes 0 - Query Execution 0 CS 55 Notes 0 - Query Execution 08 8

19 How much memory do we need for merge sort? E.g: Say I have 0 memory blocks chunks to merge, need 00 blocks! In general: Say k blocks in memory x blocks for relation sort # chunks = (x/k) size of chunk = k CS 55 Notes 0 - Query Execution 09 CS 55 Notes 0 - Query Execution 0 In general: Say k blocks in memory x blocks for relation sort # chunks = (x/k) size of chunk = k # chunks < buffers available for merge In general: Say k blocks in memory x blocks for relation sort # chunks = (x/k) size of chunk = k # chunks < buffers available for merge so... (x/k) k or k x or k x CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution In our example is 000 blocks, k.6 is 500 blocks, k.6 Can we improve on merge join? Hint: do we really need the fully sorted files? Need at least buffers Join? Again: in practice we would not want to use only one buffer per run! sorted runs CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 9

20 Cost of improved merge join: C = ead + write into runs + read + write into runs + join =,000 +,000 +,500 =,500 --> Memory requirement? Example (d) Index Join Assume.C index exists; levels Assume contiguous, unordered Assume.C index fits in memory CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Cost: eads: 500 IOs for each tuple: - probe index - free - if match, read tuple: IO What is expected # of matching tuples? (a) say.c is key,.c is foreign key then expect = (b) say V(,C) = 5000, T( ) = 0,000 with uniform assumption expect = 0,000/5,000 = CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 What is expected # of matching tuples? (c) Say DOM(, C)=,000,000 T( ) = 0,000 with alternate assumption Expect = 0,000 =,000, Total cost with index join (a) Total cost = () = 5,500 (b) Total cost = () = 0,500 (c) Total cost = (/00)=550 CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 0

21 What if index does not fit in memory? Example: say.c index is 0 blocks Keep root + 99 leaf nodes in memory Expected cost of each probe is E = (0)99 + () Total cost (including probes) = [Probe + get records] = [0.5+] uniform assumption = 500+,500 =,000 (case b) CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Total cost (including probes) = [Probe + get records] = [0.5+] uniform assumption = 500+,500 =,000 (case b) So far Nested Loop 5500 Merge join 500 Sort+Merge Join C Index C Index For case (c): = [0.5 + (/00) ] = = 050 IOs CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Example (e) Partition Hash Join, contiguous (un-ordered) Use 00 buckets ead, hash, + write buckets -> Same for -> ead one bucket; build memory hash table -using different hash function h -> ead corresponding bucket + hash probe memory 0 blocks Then repeat for all buckets CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6

22 Cost: Cost: Bucketize: ead + write Bucketize: ead + write ead + write ead + write Join: ead, Join: ead, Total cost = x [ ] = 500 Total cost = x [ ] = 500 Note: this is an approximation since buckets will vary in size and we have to round up to blocks CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 Why is Hash Join good? Minimum memory requirements: Size of bucket = (x/k) k = number of memory buffers x = number of blocks So... (x/k) < k S S k > x need: k+ total memory buffers CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 Can we use Hash-join when buckets do not fit into memory?: Treat buckets as relations and apply Hash-join recursively join Duality Hashing-Sorting Both partition inputs Until input fits into memory Logarithmic number of phases in memory size CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution

23 Trick: keep some buckets in memory E.g., k = in memory G0 G buckets = blocks keep in memory... -= called hybrid hash-join Trick: keep some buckets in memory E.g., k = in memory G0 G buckets = blocks keep in memory... Memory use: G 0 buffers G buffers Output - buffers input Total 9 buffers 6 buffers to spare!! -= called hybrid hash-join CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Next: Bucketize buckets =500/= 6 blocks Two of the buckets joined immediately with G 0,G Finally: Join remaining buckets for each bucket pair: read one of the buckets into memory join with second bucket in memory G0 G buckets 6 buckets ans out memory one full bucket Gi buckets 6 buckets... -=... -= one buffer... -=... -= CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Cost Bucketize = 000+ =96 To bucketize, only write buckets: so, cost = =996 To compare join ( buckets already done) read + 6=5 How many buckets in memory? in memory memory G0 G O... in G0? Total cost = = See textbook for answer... CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8

24 Another hash join trick: Only write into buckets <val,ptr> pairs When we get a match in join phase, must fetch tuples To illustrate cost computation, assume: 00 <val,ptr> pairs/block expected number of result tuples is 00 CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 0 To illustrate cost computation, assume: 00 <val,ptr> pairs/block expected number of result tuples is 00 Build hash table for in memory 5000 tuples 5000/00 = 50 blocks ead and match ead ~ 00 tuples To illustrate cost computation, assume: 00 <val,ptr> pairs/block expected number of result tuples is 00 Build hash table for in memory 5000 tuples 5000/00 = 50 blocks ead and match ead ~ 00 tuples Total cost = ead : 500 ead : 000 Get tuples: CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution So far: Iterate 5500 Merge join 500 Sort+merge joint 500.C index C index Build.C index Build.C index Hash join 500+ with trick, first with trick, first Hash join, pointers 600 CS 55 Notes 0 - Query Execution Yet another hash join trick: Combine the ideas of block nested-loop with hash join Use memory to build hash-table for one chunk of relation Find join partners in O() instead of O(M) Trade-off Space-overhead of hash-table Time savings from look-up CS 55 Notes 0 - Query Execution

25 Summary Nested Loop ok for small relations (relative to memory size) Need for complex join condition For equi-join, where relations not sorted and no indexes exist, hash join usually best Sort + merge join good for non-equi-join (e.g.,.c >.C) If relations already sorted, use merge join If index exists, it could be useful (depends on expected result size) CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Join Comparison N i = number of tuples in i B( i ) = number of blocks of i #P = number of partition steps for hash join P ij = average number of join partners Algorithm #I/O Memory Disk Space Nested Loop (block) B( ) / (M-) *! [min(b(),m-) + B( )] 0 Index Nested Loop B( ) + N * P B(Index) + 0 Merge (sorted) B( ) + B( ) Max tuples = 0 Merge (unsorted) B( ) + B( )+ (sort pass) Hash (#P + ) (B( ) + B( )) sort B( ) + B( ) root(max(b( ), B( )), #P + ) ~B( ) + B( ) Why do we need nested loop? emember not all join implementations work for all types of join conditions Algorithm Type of Condition Example Nested Loop any a LIKE %hello% Index Nested Loop Supported by index: Equi-join (hash) Equi or range (B-tree) a = b a < b Merge Equalities and ranges a < b, a = b AND c = d Hash Equi-join a = b CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 Outer Joins Merge Left Outer Join How to implement (left) outer joins? Nested Loop and Merge Use a flag that is set to true if we find a match for an outer tuple If flag is false fill with NULL Hash If no matching tuple fill with NULL A a d e 5 B=C S B Output: (a,,,x) S C D Z Z S x y e 6 q d CS 55 Notes 0 - Query Execution 9 CS 55 Notes 0 - Query Execution 50 5

26 Merge Left Outer Join Merge Left Outer Join B=C S No match for (d,) Output: (d,,null,null) S B=C S No match for (e,5) Output: (e,5,null,null) S A B a d e 5 Z Z S C D x y e 6 q d A B a d e 5 Z Z S C D x y e 6 q d CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 5 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination Aggregation Have to compute aggregation functions for each group of tuples from input Groups Determined by equality of group-by attributes CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 5 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! a b sum(a) b 6 6 Aggregation Function Interface init()! Initialize state update(tuple)! Update state with information from tuple close()! eturn result and clean-up CS 55 Notes 0 - Query Execution 55 CS 55 Notes 0 - Query Execution 56 6

27 Implementation SUM(A) init()! sum := 0! update(tuple)! sum += tuple.a! close()! return sum! CS 55 Notes 0 - Query Execution 5 Aggregation Implementations Sorting Sort input on group-by attributes On group boundaries output tuple Hashing Store current aggregated values for each group in hash table Update with newly arriving tuples Output result after processing all inputs CS 55 Notes 0 - Query Execution 58 Grouping by sorting Similar to Merge join Sort on group-by attribute Scan through sorted input If group-by values change Output using close() and call init() Otherwise Call update() a b Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! sort a b init() 0 CS 55 Notes 0 - Query Execution 59 CS 55 Notes 0 - Query Execution 60 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! a b update(,) a b update(,) 6 CS 55 Notes 0 - Query Execution 6 CS 55 Notes 0 - Query Execution 6

28 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! a b Group by changed! close(), init(), update(,) 6 0 output Grouping by Hashing Create in-memory hash-table For each input tuple probe hash table with group by values If no entry exists then call init(), update(), and add entry Otherwise call update() for entry Loop through all entries in hash-table and ouput calling close() CS 55 Notes 0 - Query Execution 6 CS 55 Notes 0 - Query Execution 6 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! Init() and update(,) a b a b CS 55 Notes 0 - Query Execution 65 CS 55 Notes 0 - Query Execution 66 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! Init() and update(,) Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! update(,) a b a b 6 CS 55 Notes 0 - Query Execution 6 CS 55 Notes 0 - Query Execution 68 8

29 Aggregation Example SELECT sum(a),b! FOM! GOUP BY b! a b Loop through hash table entries Output tuples 6 6 Aggregation Summary Hashing No sorting -> no extra I/O Hash table has to fit into memory No outputs before all inputs have been processed Sorting No memory required Output one group at a time CS 55 Notes 0 - Query Execution 69 CS 55 Notes 0 - Query Execution 0 Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination Duplicate Elimination Equivalent to group-by on all attributes -> Can use aggregation implementations Optimization Hash Directly output tuple and use hash table only to avoid outputting duplicates CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution Operators Overview (External) Sorting Joins (Nested Loop, Merge, Hash, ) Aggregation (Sorting, Hash) Selection, Projection (Index, Scan) Union, Set Difference Intersection Duplicate Elimination Set Operations Can be modeled as join with different output requirements As aggregation/group by on all columns with different output requirements CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 9

30 Union Bag union Append the two inputs E.g., using three buffers Set union Apply duplicate removal to result Intersection Set version Equivalent to join + project + duplicate removal -state aggregate function (found left, found right, found both) Bag version Join + project + min(i,j) Aggegate min(count(i),count(j)) CS 55 Notes 0 - Query Execution 5 CS 55 Notes 0 - Query Execution 6 Set Difference Using join methods Find matching tuples If no match found, then output Using aggregation count(i) count(j) (bag) true(i) AND false(j) (set) Summary Operator implementations Joins! Other operators Cost estimations I/O memory Query processing architectures CS 55 Notes 0 - Query Execution CS 55 Notes 0 - Query Execution 8 Next Query Optimization Physical -> How to efficiently choose an efficient plan CS 55 Notes 0 - Query Execution 9 0

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

Datenbanksysteme II: Implementation of Database Systems Implementing Joins Datenbanksysteme II: Implementation of Database Systems Implementing Joins Material von Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Lecture 1: Data Storage & Index

Lecture 1: Data Storage & Index Lecture 1: Data Storage & Index R&G Chapter 8-11 Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

DATABASE DESIGN - 1DL400

DATABASE DESIGN - 1DL400 DATABASE DESIGN - 1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information

More information

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * * Binary Heaps A binary heap is another data structure. It implements a priority queue. Priority Queue has the following operations: isempty add (with priority) remove (highest priority) peek (at highest

More information

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao CMPSCI 445 Midterm Practice Questions NAME: LOGIN: Write all of your answers directly on this paper. Be sure to clearly

More information

Big Data and Scripting. Part 4: Memory Hierarchies

Big Data and Scripting. Part 4: Memory Hierarchies 1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)

More information

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3 Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM

More information

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2) Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

More information

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

Binary Heap Algorithms

Binary Heap Algorithms CS Data Structures and Algorithms Lecture Slides Wednesday, April 5, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005 2009 Glenn G. Chappell

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

Oracle Database 11g: SQL Tuning Workshop

Oracle Database 11g: SQL Tuning Workshop Oracle University Contact Us: + 38516306373 Oracle Database 11g: SQL Tuning Workshop Duration: 3 Days What you will learn This Oracle Database 11g: SQL Tuning Workshop Release 2 training assists database

More information

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ Answer the following 1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ 2) Which data structure is needed to convert infix notations to postfix notations? Stack 3) The

More information

Inside the PostgreSQL Query Optimizer

Inside the PostgreSQL Query Optimizer Inside the PostgreSQL Query Optimizer Neil Conway neilc@samurai.com Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of

More information

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Data storage Tree indexes

Data storage Tree indexes Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating

More information

CIS 631 Database Management Systems Sample Final Exam

CIS 631 Database Management Systems Sample Final Exam CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files

More information

Sample Questions Csci 1112 A. Bellaachia

Sample Questions Csci 1112 A. Bellaachia Sample Questions Csci 1112 A. Bellaachia Important Series : o S( N) 1 2 N N i N(1 N) / 2 i 1 o Sum of squares: N 2 N( N 1)(2N 1) N i for large N i 1 6 o Sum of exponents: N k 1 k N i for large N and k

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate

More information

Oracle Database 11g: SQL Tuning Workshop Release 2

Oracle Database 11g: SQL Tuning Workshop Release 2 Oracle University Contact Us: 1 800 005 453 Oracle Database 11g: SQL Tuning Workshop Release 2 Duration: 3 Days What you will learn This course assists database developers, DBAs, and SQL developers to

More information

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Questions 1 through 25 are worth 2 points each. Choose one best answer for each. Questions 1 through 25 are worth 2 points each. Choose one best answer for each. 1. For the singly linked list implementation of the queue, where are the enqueues and dequeues performed? c a. Enqueue in

More information

In-Memory Databases MemSQL

In-Memory Databases MemSQL IT4BI - Université Libre de Bruxelles In-Memory Databases MemSQL Gabby Nikolova Thao Ha Contents I. In-memory Databases...4 1. Concept:...4 2. Indexing:...4 a. b. c. d. AVL Tree:...4 B-Tree and B+ Tree:...5

More information

Data Warehousing und Data Mining

Data Warehousing und Data Mining Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

Physical Data Organization

Physical Data Organization Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor

More information

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner Understanding SQL Server Execution Plans Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author

More information

Raima Database Manager Version 14.0 In-memory Database Engine

Raima Database Manager Version 14.0 In-memory Database Engine + Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

More information

Evaluation of Expressions

Evaluation of Expressions Query Optimization Evaluation of Expressions Materialization: one operation at a time, materialize intermediate results for subsequent use Good for all situations Sum of costs of individual operations

More information

External Memory Geometric Data Structures

External Memory Geometric Data Structures External Memory Geometric Data Structures Lars Arge Department of Computer Science University of Aarhus and Duke University Augues 24, 2005 1 Introduction Many modern applications store and process datasets

More information

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs. Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design

More information

A Comparison of Dictionary Implementations

A Comparison of Dictionary Implementations A Comparison of Dictionary Implementations Mark P Neyer April 10, 2009 1 Introduction A common problem in computer science is the representation of a mapping between two sets. A mapping f : A B is a function

More information

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the

More information

Execution Strategies for SQL Subqueries

Execution Strategies for SQL Subqueries Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp With additional slides from material in paper, added by S. Sudarshan 1 Motivation

More information

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server Understanding Query Processing and Query Plans in SQL Server Craig Freedman Software Design Engineer Microsoft SQL Server Outline SQL Server engine architecture Query execution overview Showplan Common

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We

More information

Lecture 6: Query optimization, query tuning. Rasmus Pagh

Lecture 6: Query optimization, query tuning. Rasmus Pagh Lecture 6: Query optimization, query tuning Rasmus Pagh 1 Today s lecture Only one session (10-13) Query optimization: Overview of query evaluation Estimating sizes of intermediate results A typical query

More information

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team Lecture Summary In this lecture, we learned about the ADT Priority Queue. A

More information

SQL Server Query Tuning

SQL Server Query Tuning SQL Server Query Tuning Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author Pro SQL Server

More information

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113 CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113 Instructor: Boris Glavic, Stuart Building 226 C, Phone: 312 567 5205, Email: bglavic@iit.edu Office Hours:

More information

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D. 1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D. base address 2. The memory address of fifth element of an array can be calculated

More information

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Hash Tables Tables so far set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Worst O(lg n) O(lg n) O(lg n) Table naïve array implementation

More information

Converting a Number from Decimal to Binary

Converting a Number from Decimal to Binary Converting a Number from Decimal to Binary Convert nonnegative integer in decimal format (base 10) into equivalent binary number (base 2) Rightmost bit of x Remainder of x after division by two Recursive

More information

Partitioning and Divide and Conquer Strategies

Partitioning and Divide and Conquer Strategies and Divide and Conquer Strategies Lecture 4 and Strategies Strategies Data partitioning aka domain decomposition Functional decomposition Lecture 4 and Strategies Quiz 4.1 For nuclear reactor simulation,

More information

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division Answer Key UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS186 Fall 2003 Eben Haber Midterm Midterm Exam: Introduction to Database Systems This exam has

More information

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property CmSc 250 Intro to Algorithms Chapter 6. Transform and Conquer Binary Heaps 1. Definition A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called

More information

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited Hash Tables Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited We ve considered several data structures that allow us to store and search for data

More information

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing

More information

Heaps & Priority Queues in the C++ STL 2-3 Trees

Heaps & Priority Queues in the C++ STL 2-3 Trees Heaps & Priority Queues in the C++ STL 2-3 Trees CS 3 Data Structures and Algorithms Lecture Slides Friday, April 7, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks

More information

Chapter 8: Structures for Files. Truong Quynh Chi tqchi@cse.hcmut.edu.vn. Spring- 2013

Chapter 8: Structures for Files. Truong Quynh Chi tqchi@cse.hcmut.edu.vn. Spring- 2013 Chapter 8: Data Storage, Indexing Structures for Files Truong Quynh Chi tqchi@cse.hcmut.edu.vn Spring- 2013 Overview of Database Design Process 2 Outline Data Storage Disk Storage Devices Files of Records

More information

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015 Execution Plans: The Secret to Query Tuning Success MagicPASS January 2015 Jes Schultz Borland plan? The following steps are being taken Parsing Compiling Optimizing In the optimizing phase An execution

More information

Cpt S 223. School of EECS, WSU

Cpt S 223. School of EECS, WSU Priority Queues (Heaps) 1 Motivation Queues are a standard mechanism for ordering tasks on a first-come, first-served basis However, some tasks may be more important or timely than others (higher priority)

More information

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques Database Systems Session 8 Main Theme Physical Database Design, Query Execution Concepts and Database Programming Techniques Dr. Jean-Claude Franchitti New York University Computer Science Department Courant

More information

Record Storage and Primary File Organization

Record Storage and Primary File Organization Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

More information

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -

More information

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011 SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,

More information

Algorithms. Margaret M. Fleck. 18 October 2010

Algorithms. Margaret M. Fleck. 18 October 2010 Algorithms Margaret M. Fleck 18 October 2010 These notes cover how to analyze the running time of algorithms (sections 3.1, 3.3, 4.4, and 7.1 of Rosen). 1 Introduction The main reason for studying big-o

More information

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1. Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik Demaine and Shafi Goldwasser Quiz 1 Quiz 1 Do not open this quiz booklet until you are directed

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

DATA STRUCTURES USING C

DATA STRUCTURES USING C DATA STRUCTURES USING C QUESTION BANK UNIT I 1. Define data. 2. Define Entity. 3. Define information. 4. Define Array. 5. Define data structure. 6. Give any two applications of data structures. 7. Give

More information

In-Memory Database: Query Optimisation. S S Kausik (110050003) Aamod Kore (110050004) Mehul Goyal (110050017) Nisheeth Lahoti (110050027)

In-Memory Database: Query Optimisation. S S Kausik (110050003) Aamod Kore (110050004) Mehul Goyal (110050017) Nisheeth Lahoti (110050027) In-Memory Database: Query Optimisation S S Kausik (110050003) Aamod Kore (110050004) Mehul Goyal (110050017) Nisheeth Lahoti (110050027) Introduction Basic Idea Database Design Data Types Indexing Query

More information

File Management. Chapter 12

File Management. Chapter 12 Chapter 12 File Management File is the basic element of most of the applications, since the input to an application, as well as its output, is usually a file. They also typically outlive the execution

More information

DATABASDESIGN FÖR INGENJÖRER - 1DL124

DATABASDESIGN FÖR INGENJÖRER - 1DL124 1 DATABASDESIGN FÖR INGENJÖRER - 1DL124 Sommar 2005 En introduktionskurs i databassystem http://user.it.uu.se/~udbl/dbt-sommar05/ alt. http://www.it.uu.se/edu/course/homepage/dbdesign/st05/ Kjell Orsborn

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1 Slide 13-1 Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible

More information

Data Structures Fibonacci Heaps, Amortized Analysis

Data Structures Fibonacci Heaps, Amortized Analysis Chapter 4 Data Structures Fibonacci Heaps, Amortized Analysis Algorithm Theory WS 2012/13 Fabian Kuhn Fibonacci Heaps Lacy merge variant of binomial heaps: Do not merge trees as long as possible Structure:

More information

ProTrack: A Simple Provenance-tracking Filesystem

ProTrack: A Simple Provenance-tracking Filesystem ProTrack: A Simple Provenance-tracking Filesystem Somak Das Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology das@mit.edu Abstract Provenance describes a file

More information

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes. 1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The

More information

Storage in Database Systems. CMPSCI 445 Fall 2010

Storage in Database Systems. CMPSCI 445 Fall 2010 Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query

More information

6. Standard Algorithms

6. Standard Algorithms 6. Standard Algorithms The algorithms we will examine perform Searching and Sorting. 6.1 Searching Algorithms Two algorithms will be studied. These are: 6.1.1. inear Search The inear Search The Binary

More information

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able

More information

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications

More information

PES Institute of Technology-BSC QUESTION BANK

PES Institute of Technology-BSC QUESTION BANK PES Institute of Technology-BSC Faculty: Mrs. R.Bharathi CS35: Data Structures Using C QUESTION BANK UNIT I -BASIC CONCEPTS 1. What is an ADT? Briefly explain the categories that classify the functions

More information

Name: 1. CS372H: Spring 2009 Final Exam

Name: 1. CS372H: Spring 2009 Final Exam Name: 1 Instructions CS372H: Spring 2009 Final Exam This exam is closed book and notes with one exception: you may bring and refer to a 1-sided 8.5x11- inch piece of paper printed with a 10-point or larger

More information

Query Processing. Q Query Plan. Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C. Himanshu Gupta CSE 532 Query Proc. 1

Query Processing. Q Query Plan. Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C. Himanshu Gupta CSE 532 Query Proc. 1 Q Query Plan Query Processing Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C Himanshu Gupta CSE 532 Query Proc. 1 How do we execute query? One idea - Do Cartesian product - Select tuples -

More information

Efficient Processing of Joins on Set-valued Attributes

Efficient Processing of Joins on Set-valued Attributes Efficient Processing of Joins on Set-valued Attributes Nikos Mamoulis Department of Computer Science and Information Systems University of Hong Kong Pokfulam Road Hong Kong nikos@csis.hku.hk Abstract Object-oriented

More information

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ...

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ... Outline Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Hardware: Disks Access Times Example -

More information

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS 464 Spring 2003 Topic 23 Database

More information

Vector storage and access; algorithms in GIS. This is lecture 6

Vector storage and access; algorithms in GIS. This is lecture 6 Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

6 March 2007 1. Array Implementation of Binary Trees

6 March 2007 1. Array Implementation of Binary Trees Heaps CSE 0 Winter 00 March 00 1 Array Implementation of Binary Trees Each node v is stored at index i defined as follows: If v is the root, i = 1 The left child of v is in position i The right child of

More information

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions

More information

Distributed Data Management Summer Semester 2015 TU Kaiserslautern

Distributed Data Management Summer Semester 2015 TU Kaiserslautern Distributed Data Management Summer Semester 2015 TU Kaiserslautern Prof. Dr.-Ing. Sebastian Michel Databases and Information Systems Group (AG DBIS) http://dbis.informatik.uni-kl.de/ Distributed Data Management,

More information

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search Chapter Objectives Chapter 9 Search Algorithms Data Structures Using C++ 1 Learn the various search algorithms Explore how to implement the sequential and binary search algorithms Discover how the sequential

More information

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. "Executive Summary": here.

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. Executive Summary: here. Topic for today Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) How to represent data on disk

More information

Big Data and Scripting map/reduce in Hadoop

Big Data and Scripting map/reduce in Hadoop Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb

More information

Impala: A Modern, Open-Source SQL Engine for Hadoop. Marcel Kornacker Cloudera, Inc.

Impala: A Modern, Open-Source SQL Engine for Hadoop. Marcel Kornacker Cloudera, Inc. Impala: A Modern, Open-Source SQL Engine for Hadoop Marcel Kornacker Cloudera, Inc. Agenda Goals; user view of Impala Impala performance Impala internals Comparing Impala to other systems Impala Overview:

More information

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files

More information

Chapter 13: Query Optimization

Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

From Last Time: Remove (Delete) Operation

From Last Time: Remove (Delete) Operation CSE 32 Lecture : More on Search Trees Today s Topics: Lazy Operations Run Time Analysis of Binary Search Tree Operations Balanced Search Trees AVL Trees and Rotations Covered in Chapter of the text From

More information