
Query Optimization
Elisa Bertino
CS Department and CERIAS, Purdue University

Introduction
So far we have seen how to organize the data in a DB. Usually the decisions about the auxiliary data structures to allocate are made during the physical design of the database, and modifications to such structures can be expensive. Therefore, when a query is presented to the system, the system needs to determine the most efficient strategy for executing the query by using the available structures.

Example
In query processing a query is thus transformed into a query plan. Example:
SELECT B, D
FROM R, S
WHERE R.A = 'c' AND S.E = 2 AND R.C = S.C

Example
The query involves the relations R(A, B, C) and S(C, D, E), which in the running example contain (among other tuples):
R: (a, 1, 10), (b, 1, 20), (c, 2, 10), (d, 2, 35), (e, ..., ...), ...
S: (10, x, 2), (20, y, 2), (30, z, 2), (40, x, 1), (..., y, 3)

Example
Reply: B = 2, D = x.
How can we execute such a query? A possible strategy is:
- perform the Cartesian product R X S
- then select the tuples based on the search condition
- finally perform the projection
The product R X S has the columns R.A, R.B, R.C, S.C, S.D, S.E, and the condition is checked on each of its tuples.

Execution plans
The execution plan can be described by using the Relational Algebra. The plan for the example would be
Plan I:   Π B,D [ σ R.A='c' AND S.E=2 AND R.C=S.C (R X S) ]

Execution strategies
Another possible strategy:
Plan II:  Π B,D [ σ R.A='c' (R)  natural join  σ S.E=2 (S) ]
that is, the two selections are pushed down to R and S, and the reduced relations are then joined on C.

Execution strategies
With Plan II, σ R.A='c' (R) contains only the tuple (c, 2, 10) and σ S.E=2 (S) contains the tuples (10, x, 2), (20, y, 2), (30, z, 2); joining them on C and projecting on B and D yields the reply (2, x).

Execution strategies
Plan III
Suppose that there are indexes on attributes R.A and S.C; we can use such indexes as follows:
- We use the index on R.A to retrieve the tuples of R with R.A = 'c'
- For each value retrieved for R.C, we use the index on S.C to retrieve the matching tuples from S
- We eliminate the tuples of S such that S.E ≠ 2
- We concatenate the resulting tuples of R and S and then perform a projection on B and D

Example
Using the index I1 on R.A with A = 'c' we retrieve the tuple <c,2,10>; using the index I2 on S.C with C = 10 we retrieve <10,x,2>; we check whether E = 2 and output <2,x>. We then move to the next tuple returned by I1 (e.g. <c,7,15>) and repeat.

Execution strategies
For complex queries there are several possible execution strategies. The cost for determining the optimal strategy can be very high. The advantage in terms of execution time is however such that it can be convenient to perform the optimization.

Execution strategies
Example
Consider the relations
Students(S#, Name, Addr, OtherInfo)
Exams(Course, S#, Grade, Date)
and a query that retrieves the names of the students and the date of the exam for the students that have passed the DB exam with a grade of 30:
SELECT Name, Date
FROM Students NATURAL JOIN Exams
WHERE Course = 'BD' AND Grade = 30

Execution strategies
Consider a DB storing students and exams, of which 500 exams are for the BD course and only 50 of those have 30 as grade (assume that the strategy used for accessing each relation is the sequential scan).
If we compute the Cartesian product of the two relations, we obtain a temporary relation with as many tuples as there are (student, exam) pairs, from which we then extract the 50 tuples that are part of the result: the cost is proportional to the number of accesses needed to build and scan that product.
If we instead first retrieve the 50 exams of DB with grade equal to 30 and then execute the join of such temporary relation with the Students relation, we have a cost proportional to a much smaller number of accesses (a rough numeric comparison is sketched below).

Steps in query processing
Parsing: the syntactical correctness of the query is checked and an internal query representation (parse tree) is generated.
Algebraic transformations: the query is transformed into a query which is equivalent and more efficient to execute (based on the properties of RA). Examples: executing the selection operations as soon as possible (push selection down); avoiding joins that are Cartesian products.

Steps in query processing
Selection of the execution strategy: the plan for executing the query is devised (the plan says, for example, which indexes will be used in the query execution); the strategy is selected mainly based on the I/O costs.
Execution of the selected strategy.
It is possible to execute some of the above steps at program compile time (DB2 and System R) or execute all steps at run time.
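To make the comparison concrete, here is a rough sketch (in Python) of the two cost estimates. The 500 BD exams and the 50 exams with grade 30 come from the text; the cardinalities of Students and Exams are not given, so n_students and n_exams below are purely hypothetical, and the cost model (tuple accesses under nested-loop evaluation) is a simplification.

# Rough comparison of the two strategies described above.
# n_students and n_exams are hypothetical; only the 50 qualifying exams
# come from the text.
n_students = 10_000    # assumption
n_exams    = 100_000   # assumption
bd_grade30 = 50        # from the text: BD exams with grade 30

# Strategy 1: Cartesian product first, then select the matching tuples.
cost_cartesian = n_students * n_exams              # tuples of the temporary relation

# Strategy 2: select the 50 qualifying exams first, then join with Students
# (scan Exams once, then scan Students once per selected exam).
cost_push_selection = n_exams + bd_grade30 * n_students

print(f"Cartesian product first : ~{cost_cartesian:,} tuple accesses")
print(f"Selection pushed first  : ~{cost_push_selection:,} tuple accesses")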

Steps in query processing
Overall flow: SQL query -> parse -> parse tree -> convert -> logical query plan -> apply laws -> improved l.q.p. -> estimate result sizes (using statistics) -> l.q.p. + sizes -> consider physical plans -> {P1, P2, ...} -> estimate costs -> {(P1,C1), (P2,C2), ...} -> pick best -> Pi -> execute -> answer.

Example: SQL query
SELECT title
FROM StarsIn
WHERE starname IN (SELECT name FROM MovieStar WHERE birthdate LIKE '%1960');
(Find all the movies on which actors born in 1960 have worked.)
Schema: StarsIn(title, year, starName), MovieStar(name, address, gender, birthdate)

Steps in query processing - parse tree
[Parse tree of the query: the outer <Query> expands into SELECT <SelList> FROM <FromList> WHERE <Condition>, with selection list title, relation StarsIn, and condition "<Tuple> IN <Query>"; the tuple is the attribute starname, and the nested <Query> expands into the inner SELECT on MovieStar with the condition birthdate LIKE '%1960'.]

Steps in query processing - algebraic representation
Π title ( σ <condition> (StarsIn) ), where <condition> is: the tuple <starname> IN Π name ( σ birthdate LIKE '%1960' (MovieStar) )

Steps in query processing - logical query plan
Π title ( σ starname=name ( StarsIn X Π name ( σ birthdate LIKE '%1960' (MovieStar) ) ) )

Steps in query processing - improved logical query plan
Π title ( StarsIn  join starname=name  Π name ( σ birthdate LIKE '%1960' (MovieStar) ) )
(the selection over a Cartesian product is replaced by a join)

Steps in query processing - estimation of the result cardinality
For each node of the improved plan (the selection on MovieStar, the projection on name, the join with StarsIn) we need to estimate the size of its result.

Steps in query processing - physical plan
A possible physical plan: a hash join at the top (parameters: join execution order, memory size, attributes on which to project, ...), fed by a SEQ scan of StarsIn and an index scan of MovieStar (parameters: selection conditions, ...).

Steps in query processing - cost estimation
Given the logical query plan (L.Q.P.), the candidate physical plans P1, P2, ..., Pn are generated and their costs C1, C2, ..., Cn are estimated; the best one is selected.

Selection of the best execution plan
A query execution plan thus consists of:
- an execution order for the various relational operations
- the strategy for the execution of each operation
In what follows we will see:
- how to order the data on secondary storage (external sorting)
- possible strategies for the execution of the various operations
- the main optimization steps: logical optimization (re-writing rules), generation of the execution plans and their cost estimates

External sorting
When processing queries it is very often necessary to sort the data; motivations:
- the query contains an ORDER BY clause
- if the data are sorted, the execution of certain operations (duplicate elimination, joins) becomes more efficient
The classical sorting algorithms cannot be used because the data are too large to fit in MM. We will see two approaches: two-phase multiway merge sort and the use of B+ trees.

External sorting - Two phase multiway merge sort
Main idea: we separately sort portions of the data that fit in main memory and then we merge them.
Phase 1:
- We fill all available buffer blocks with blocks from the relation to be ordered
- We order the records in main memory (using one of the classical ordering algorithms, such as quicksort)
- The sorted records are written onto new secondary storage blocks, forming a sorted run
At the end of phase 1 all relation records have been read once and each is part of a sorted run that has a size equal to the available MM and has been written onto secondary storage.

External sorting - Two phase multiway merge sort
[Figure: the relation R is read into main memory one buffer-load at a time; each load is sorted and written back as an ordered run.]

Example:
- Consider a relation R containing 10,000,000 tuples, each 100 bytes long
- Suppose that 50 MB are available as buffer blocks (out of 64 MB of main memory)
- Suppose that the blocks have a size of 4096 bytes; thus we can store about 12,200 blocks in main memory, and a block can store 40 tuples
- The relation thus contains 250,000 blocks, so we will fill the main memory about 20 times
- We will make 250,000 * 2 = 500,000 I/O operations (a much higher cost compared to the cost of ordering in main memory); we should actually take into account that the blocks are likely to be randomly distributed on secondary storage, and thus the actual cost will be even higher.

External sorting - Two phase multiway merge sort
Phase 2: we perform the merge of the sorted runs.
We could perform the merge by using the classic two-way merge sort algorithm, but this would require about 2 log2(n) I/O operations per block, where n is the number of sorted runs.
A better approach: we read into memory the first block of each sorted run; all buffer blocks can be used except for one, which is used for the output.

Phase 2 (continued): we perform a cycle until all runs have been analyzed; steps:
- the smallest value among the first elements of the runs is determined
- such value is moved into the output block
- if the output block is full, it is written to disk and reinitialized
- if the block from which the element has been extracted becomes empty, the next block from the same run is read
- if the run does not have any other block to analyze, the corresponding buffer block is left empty
In the example, 250,000 * 2 = 500,000 additional I/O operations are executed.
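A minimal in-memory sketch of the two phases, assuming the records fit in a Python list and that memory_records records fit in main memory; real systems operate on disk blocks through the buffer manager, which the sketch ignores.

import heapq

def two_phase_merge_sort(records, memory_records):
    # Phase 1: sort memory-sized chunks, producing sorted runs.
    runs = []
    for start in range(0, len(records), memory_records):
        run = sorted(records[start:start + memory_records])  # e.g. quicksort in MM
        runs.append(run)

    # Phase 2: repeatedly pick the smallest value among the heads of all runs.
    # heapq.merge does exactly that, reading each run element only once.
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    import random
    data = [random.randint(0, 999) for _ in range(10_000)]
    assert two_phase_merge_sort(data, 1_000) == sorted(data)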

External sorting - Two phase multiway merge sort
[Figure: in phase 2 the first block of each ordered run is kept in main memory and the runs are merged into the ordered relation.]

In the example we had 20 sorted runs and about 12,200 buffer blocks, and thus it is possible to merge the 20 runs in only one step.
Assume we have:
- blocks of size equal to B bytes
- M bytes of available main memory
- records of size equal to R bytes
The number of buffer blocks is thus M/B, so we can execute a single merge step if there are at most M/B - 1 runs. Each time we fill the main memory we sort M/R records. Therefore with this approach we can sort (M/R) * ((M/B) - 1) records, that is, about M^2/(RB) records (a numeric check with the example's figures is sketched below).

External sorting - Two phase multiway merge sort
In our example about 6.1 billion records could be ordered, equivalent to about 0.6 terabytes.
If we had larger relations, we can add a third phase:
- we use the described approach to order groups of M^2/(RB) records, which are written as sorted runs
- we apply again the step 2 to such runs, merging (M/B) - 1 of them at a time into a single sorted relation
We can thus order M^3/(RB^2) records, that is, about M^3/B^2 bytes of data.

External sorting
The described approach can be improved with:
- blocked I/O (groups of pages are read/written together)
- double buffering (in order not to waste CPU time while performing I/O)
- parallel processing
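A quick numeric check of the capacity formula with the example's parameters (50 MB of buffers, 4096-byte blocks, 100-byte records):

M = 50_000_000          # bytes of main memory available for buffering
B = 4_096               # block size in bytes
R = 100                 # record size in bytes

records_per_fill = M // R            # records sorted per run in phase 1
runs_mergeable   = M // B - 1        # one buffer block is reserved for output
capacity = records_per_fill * runs_mergeable   # ~ M^2 / (R*B)

print(f"about {capacity / 1e9:.1f} billion records "
      f"(~{capacity * R / 1e12:.1f} TB)")      # ~6.1 billion, ~0.6 TB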

External sorting - Use of B+ trees for the ordering
If the relation to be ordered has a B+ tree index allocated on the attribute with respect to which the ordering is executed, the relation can be ordered by traversing the leaf nodes of the index.
If the index is clustered, this is a good strategy; if the index is not clustered, this may be a bad strategy.
Clustered B+ tree cost: we go from the root to the leftmost leaf node and then scan all the leaf nodes; each data page is accessed only one time.
[Figure: clustered index - the data entries in the leaves point to data records stored in the same order.]

External sorting - Use of B+ trees for the ordering
Unclustered B+ tree cost: the same data page may be accessed several times; in the worst case, the number of I/O operations is equal to the number of records!
[Figure: unclustered index - the data entries in the leaves point to data records scattered over the pages.]
Later on we will see a more precise analysis of the cost for accessing a relation by using an unclustered index.

We will focus on the operations of the relational algebra, with some important differences:
- for the set-oriented operations (union, intersection, difference) there is a set version and a bag version, depending on whether the duplicates are eliminated or not
- the projection also has two variations, one of which is followed by a duplicate elimination operation
- an ordering operation will be considered
- an aggregate operation will be considered

The techniques used by the algorithms implementing the various operations are essentially three:
- iteration: the tuples of the input relation are sequentially analyzed (sequential scan)
- indexes: if a selection or a join condition is specified, an index can be used to determine the tuples that verify the condition
- partitioning: by partitioning the tuples based on a key, the operation can be decomposed into a set of less expensive operations on the various partitions (examples: ordering and hashing)

An access path describes a possible strategy for retrieving the tuples of a relation. An access path is (1) a sequential scan, or (2) an index and a corresponding selection condition (called search condition), such that the index can reduce the cost of verifying the condition (for example, if the condition is C+D = 1000 OR D ≠ 20, with C and D attributes on which indexes are allocated, the indexes do not help in reducing the number of tuples to analyze).
The sequential scan is an access path that is always possible. If the operation is a selection or a join, and there are corresponding indexes, we can have different access paths.

The cost of an access path is given by the number of accessed pages (both the index pages and the data pages) when this access path is used to retrieve the searched tuples. When several access paths are available, the one with the minimum cost must be selected.
Cost = number of I/O operations (we do not consider the CPU cost).

In determining the costs for the various operations, we do not for now consider the cost of writing the output. This is because such cost does not depend on the strategy used, but only on the output size, which is the same for all strategies; in several cases, the output is not actually written to disk.

Example relations
customer (c-name, street, c-city)
deposit (b-name, account-number, c-name, balance)
T(deposit) = 10,000; T(customer) = 200
The relation deposit is clustered, with 20 tuples per block: B(deposit) = 500
The relation customer is clustered, with 20 tuples per block: B(customer) = 10
Notation:
T(R) = number of tuples in relation R
B(R) = number of blocks/pages of relation R
Important observation: we have talked of clustered index/relation and of clustering; it is important to pay attention to these various meanings.

Selection
We first consider selections with simple conditions of the form σ attr op value (R). As an example consider the query
SELECT * FROM deposit WHERE balance > 10,000
Sequential scan: the entire relation is scanned; the condition is checked on each tuple; if a tuple verifies the condition, the tuple is added to the result. The cost is equal to B(R) I/O operations, 500 in our example.

Selection
If there are no indexes on the relation and the file is not ordered, the sequential scan is the only possible strategy.
If instead there is an index allocated on the attribute attr (the selection attribute), we can try to use the index to determine the data entries that verify the condition, and from these entries we can access only the pages that contain such data.
If the index is of type hash, it can be used only if op is the = operator.

Selection
The cost depends on the number of tuples that verify the selection condition (we will see later how such a number can be estimated) and on whether the index is clustered or not.
In both cases, the cost is given by the cost of determining the data entries (leaf nodes of the index) that verify the condition, plus the cost of accessing the corresponding data blocks (typically much greater).
As for sorting, whereas with a clustered index each data block is accessed only once, with an unclustered index the same block may have to be accessed several times.

Selection
It is however possible (and very useful) to avoid accessing the same block several times for the same value of the search key:
- the leaf nodes of the index verifying the condition are retrieved
- the rids of the data records to be accessed are sorted, so that the rids of records in the same block are near to each other
- the corresponding records are accessed according to the order determined at the previous step
Each data block is thus accessed only once for each value of the search key (even though it can be accessed multiple times, for different values of the search key).

Selection
We now consider more general conditions, e.g.
(b-name = 'Chester' AND balance > 10,000) OR account-number = ... OR c-name = 'Smith'
Such conditions are first of all re-written in conjunctive normal form (CNF), e.g.
(b-name = 'Chester' OR account-number = ... OR c-name = 'Smith') AND (balance > 10,000 OR account-number = ... OR c-name = 'Smith')

Selection
We consider the conjuncts that do not contain OR (called Boolean factors): if such a conjunct is false, the entire condition is false. It is thus possible to access only the tuples of the relation that verify such conjuncts, without having to analyze the other tuples.
If there are several applicable indexes, two approaches are possible. Example:
b-name = 'Chester' AND c-name = 'Smith' AND balance > 10,000 AND account-number = 18894, with indexes on b-name, c-name and account-number.

Selection
First approach: a single index is used.
- The access cost is determined for each index (it depends on the predicate selectivity and on whether the index is clustered or not)
- The index with the minimum access cost is selected and the relation tuples are accessed by using such index
- The other predicates in the condition are verified directly on the retrieved tuples
In the example, if the index on account-number is used to retrieve the tuples, then for each retrieved tuple we need to check the predicate b-name = 'Chester' AND balance > 10,000 AND c-name = 'Smith'.

Selection
Second approach: all available indexes are used.
- The rids of the records verifying the conditions are retrieved from the leaf nodes of each index, and these sets of rids are intersected
- The records in the intersection are retrieved from the data pages and the rest of the condition is verified on these records
In our example, we use all the indexes to retrieve the rids, we intersect the three sets of rids, we retrieve the corresponding tuples, and on each retrieved tuple we check whether balance > 10,000 (a sketch of this approach is given below).

Selection
If instead a conjunct is a disjunction (thus it is not a Boolean factor), it is not possible to use the indexes on the attributes that appear in the conjunct as access paths.
Note: a conjunct of the form c-name = 'Smith' OR c-name = 'Johnson', in which the disjuncts are on the same attribute, is equivalent to a conjunct of the form c-name IN ('Smith', 'Johnson'), for which an index can be used.
Actually, if each term in the disjunction has a matching index, one could retrieve all the candidate tuples using the indexes and then take the union of the results; however, the majority of existing systems do not manage conditions containing disjunctions efficiently.

Projection
We consider the algebraic projection, that is, the projection corresponding to a SELECT DISTINCT query. To implement the projection we need to:
- remove the attributes not appearing in the projection
- eliminate the duplicates
The second step is the most difficult/expensive. There are two possible approaches, one based on ordering and one based on hashing.
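A minimal sketch of the second approach (intersecting the rids returned by all usable indexes and checking the residual predicate on the fetched records). The dictionary-based index structure and the example values are purely illustrative.

def rids_from_index(index, key):
    # Leaf-level lookup: return the rids whose records satisfy attr = key.
    return index.get(key, set())

def multi_index_selection(indexes_and_keys, fetch_record, residual_predicate):
    """indexes_and_keys: list of (index, key) pairs, one per usable index."""
    rid_sets = [rids_from_index(idx, key) for idx, key in indexes_and_keys]
    candidates = set.intersection(*rid_sets) if rid_sets else set()
    # Fetch only the candidate records (in rid order) and check the
    # remaining predicates on them.
    return [r for r in (fetch_record(rid) for rid in sorted(candidates))
            if residual_predicate(r)]

# Example: indexes on b-name and c-name; balance > 10,000 checked afterwards.
bname_idx = {"Chester": {1, 3, 7}, "Perry": {2}}
cname_idx = {"Smith": {3, 7, 9}}
table = {1: ("Chester", "Jones", 5000), 3: ("Chester", "Smith", 12000),
         7: ("Chester", "Smith", 900),  9: ("Round Hill", "Smith", 20000)}
print(multi_index_selection(
    [(bname_idx, "Chester"), (cname_idx, "Smith")],
    table.get, lambda rec: rec[2] > 10_000))   # -> [('Chester', 'Smith', 12000)]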

Projection
Approach based on ordering:
- The relation R is sequentially scanned to obtain a set of tuples that only contain the attributes to be retrieved
- Such tuple set is ordered by using one of the ordering algorithms we have seen, with all attributes used as ordering key
- The result is scanned, eliminating the duplicate tuples (which are adjacent)
The total cost is O(B(R) log B(R)).

Projection
Such approach can actually be improved by integrating the duplicate elimination into the ordering algorithm (external merge sort):
- The first phase is modified so as to eliminate the attributes not required by the projection (the size of the tuples to be ordered thus decreases)
- The second phase is modified so as to eliminate the duplicates
Cost: in the first phase, B(R) read operations; the same number of tuples is written, but the tuples have a smaller size. In the second phase, the same number of tuples is read and a smaller number of tuples is written (the difference depends on the number of duplicate tuples).

Projection
Approach based on hashing: the main memory is organized into a number B of buffers.
Partitioning phase: R is read by using one input buffer; for each tuple, the attributes not required by the projection are removed and a hash function h1 is applied to the remaining attributes in order to select one of the B - 1 output buffers. At the end we have B - 1 partitions (with tuples containing only the required attributes); two tuples in different partitions are certainly distinct.
Duplicate elimination phase: each partition is read and a hash table is built in main memory using a function h2 (different from h1!) applied to all the attributes. If the function applied to a tuple generates the same bucket address as an existing tuple, we check whether the tuple is a duplicate and, if so, the tuple is eliminated. The goal of using h2 is to distribute the tuples of a partition into different buckets, in order to minimize the collisions. If a partition does not fit in memory, we can recursively apply the algorithm to the partition.
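A minimal in-memory sketch of the hash-based projection. The partitions are Python lists and a Python set stands in for the in-memory hash table built with h2; a real implementation would use a second hash function different from the partitioning one, and would spill partitions to disk.

def hash_projection(relation, attrs, n_partitions=7):
    # Partitioning phase: keep only the projected attributes and route each
    # reduced tuple to one of the n_partitions output buffers.
    partitions = [[] for _ in range(n_partitions)]
    for row in relation:
        reduced = tuple(row[a] for a in attrs)
        partitions[hash(reduced) % n_partitions].append(reduced)

    # Duplicate-elimination phase: tuples in different partitions are certainly
    # distinct, so each partition can be deduplicated independently.
    result = []
    for part in partitions:
        seen = set()                 # stands in for the table built with h2
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

deposit = [{"b-name": "Perry", "c-name": "Smith", "balance": 500},
           {"b-name": "Perry", "c-name": "Smith", "balance": 900},
           {"b-name": "Mianus", "c-name": "Jones", "balance": 700}]
print(hash_projection(deposit, ("b-name", "c-name")))
# [('Perry', 'Smith'), ('Mianus', 'Jones')]  (order may vary with hashing)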

Projection
Cost of the hashing approach: in the first phase, B(R) read operations; the same number of tuples is written, but the tuples have a smaller size. In the second phase, the same number of tuples is read and a smaller number of tuples is written (the difference depends on the number of duplicate tuples).
The approach based on ordering works better than the hashing approach if there are many duplicates or if the hash function is not uniform; in addition, it has the (good) side effect of ordering the relation, which can be useful for subsequent operations. It is the most widely used.
If the projection has to be executed on a single attribute on which a dense index is allocated, we can access only the leaf nodes of the tree instead of accessing the data file (index-only scan).

Consider the natural join between two relations having only one common attribute. Example relations:
customer (c-name, street, c-city)
deposit (b-name, account-number, c-name, balance)
T(deposit) = 10,000; T(customer) = 200
The deposit relation is clustered, with 20 tuples per block: B(deposit) = 500
The customer relation is clustered, with 20 tuples per block: B(customer) = 10
SELECT * FROM customer NATURAL JOIN deposit

Simple nested loop: a tuple of deposit (outer relation) is accessed and it is compared with each tuple of customer (inner relation):
for each tuple d in deposit do begin
  for each tuple c in customer do begin
    test pair (d,c) to see if a tuple should be added to the result
  end
end

Each tuple of the relation deposit is read only one time; in the worst case, reading all the tuples of deposit may require 10,000 I/O (T(R)). Because the tuples are clustered, the cost greatly decreases: the total number of accesses for the deposit relation is B(R), that is, 500.
The tuples of the customer relation are instead accessed once for each tuple of deposit, so each tuple of customer is accessed 10,000 times. Because the tuples of customer are 200, the total cost would be T(R) * T(S) = 2,000,000. Because the tuples of customer are clustered, the total access cost decreases to T(R) * B(S) = 100,000.
Therefore the total cost, when both relations are clustered, is B(R) + (T(R) * B(S)) = 500 + 100,000 = 100,500.

Block nested loop
It is possible to improve the previous strategy if the relations are processed by blocks and not by tuples. This strategy is mainly useful when the tuples of the same relation are clustered.

for each block Bd of deposit do begin
  for each block Bc of customer do begin
    for each tuple d in Bd do begin
      for each tuple c in Bc do begin
        test pair (d,c) to see if a tuple should be added to the result
      end
    end
  end
end

This strategy executes the join by processing an entire block of the deposit relation at a time. The access cost for deposit is the same as for the simple nested loop (that is, 500). We still have to scan the customer relation several times (with a cost of 10 I/O per scan); however, with respect to the simple iteration strategy, we scan the customer relation a number of times equal to the number of blocks of deposit (B(R)) instead of the number of tuples of deposit (T(R)).
The number of scans of the customer relation is thus 500, and the total access cost for customer is 500 * 10 = 5,000 accesses. The total cost of this strategy is thus
B(R) + (B(R) * B(S)) = 500 + 5,000 = 5,500

The selection of deposit as outer relation and customer as inner relation is arbitrary: if we had used customer as outer relation and deposit as inner, the total cost would have been 10 + 10 * 500 = 5,010.
One advantage of using the smaller relation as inner relation is that, if the inner relation is small enough, it can be kept in MM. If for example customer were small enough to stay in MM, the execution strategy would require only 500 accesses to read deposit and 10 to read customer, for a total of 510 accesses (B(R) + B(S)).
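A minimal in-memory sketch of the block nested loop join on the deposit/customer example, where a "block" is simulated as a slice of block_size tuples.

def blocks(relation, block_size):
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def block_nested_loop_join(deposit, customer, block_size=20):
    # deposit tuples: (b-name, account-number, c-name, balance)
    # customer tuples: (c-name, street, c-city); join on c-name
    result = []
    for bd in blocks(deposit, block_size):        # outer relation, one block at a time
        for bc in blocks(customer, block_size):   # inner relation rescanned once per outer block
            for d in bd:
                for c in bc:
                    if d[2] == c[0]:              # join condition: equal c-name
                        result.append(d + c)
    return result

deposit  = [("Perry", 102, "Smith", 500), ("Mianus", 215, "Jones", 700)]
customer = [("Smith", "North St", "Rye"), ("Jones", "Main St", "Harrison")]
print(block_nested_loop_join(deposit, customer, block_size=1))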

Index nested loop
If one of the two relations has an index on the join attribute, it is better to use this relation as the inner relation and to use the index.
Assume that an index is allocated on the attribute c-name of the customer relation, and consider the simple nested loop as execution strategy: given a tuple d of deposit, we no longer need to scan the entire customer relation; we only need to execute a search on the index by using as search key the value d[c-name].

for each tuple d of deposit do begin
  for each tuple c of customer s.t. c[c-name] = d[c-name] do begin
    add pair (d,c) to the result
  end
end

We have to execute 500 accesses to read the deposit relation. If each leaf node of the index contains 20 entries, because T(customer) = 200 the index search has a cost of 2 I/O. The cost for accessing the data depends on whether the index is clustered or not: in the first case, only one access is required; in the second case, there could be an access for each matching tuple.
With respect to the simple nested loop, we need to perform 3 accesses for each tuple of deposit, instead of 200; the total cost of the strategy is, in the worst case, 500 + 10,000 * 3 = 30,500 I/O.

Merge join
If neither of the two relations is small enough to fit in MM, it is possible to efficiently execute the join if both relations are ordered with respect to the join attribute.
Assume that the customer and deposit relations are ordered with respect to the c-name attribute. The merge-join strategy associates a pointer with each relation; such pointers initially point to the first tuple of each relation. Because the tuples are ordered with respect to the join attribute, each tuple is read exactly one time.
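Before the detailed pseudocode on the next slides, here is a compact sketch of the merge-join idea, assuming both inputs are already sorted on the join attribute; it uses the same small D/C example that the following slides walk through.

def merge_join(outer, inner, key_outer, key_inner):
    result, i, j = [], 0, 0
    while i < len(outer) and j < len(inner):
        ko, ki = key_outer(outer[i]), key_inner(inner[j])
        if ko < ki:
            i += 1
        elif ko > ki:
            j += 1
        else:
            # Collect the group Sc of inner tuples sharing this key value.
            group = []
            while j < len(inner) and key_inner(inner[j]) == ki:
                group.append(inner[j]); j += 1
            # Combine every outer tuple with this key with the whole group.
            while i < len(outer) and key_outer(outer[i]) == ki:
                # Note: the join attribute appears twice in the concatenation;
                # a natural join would keep it only once.
                result.extend(outer[i] + t for t in group)
                i += 1
    return result

D = [("a1", "b1"), ("a1", "b2"), ("a2", "b4"), ("a2", "b5"), ("a5", "b7")]
C = [("a1", "c1"), ("a1", "c3"), ("a3", "c5"), ("a4", "c7"), ("a5", "c9")]
print(merge_join(D, C, key_outer=lambda t: t[0], key_inner=lambda t: t[0]))
# produces the a1 combinations and the a5 match, as in the walkthrough below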

The total cost is B(R) + B(S); in the case of the customer and deposit relations, the total cost would be 510 I/O.
The algorithm does not require that the relations fit entirely in MM; it is sufficient that all tuples with the same join attribute value fit in MM, which is in most cases possible also when the relations are large.
The main disadvantage of this method is that it requires the relations to be ordered; however, because it is very efficient, it may be convenient to order the relations before executing the join.

The algorithm:
pd := address of first tuple of deposit;
pc := address of first tuple of customer;
while (pc != null) do begin
  tc := tuple to which pc points;
  Sc := {tc};
  set pc to point to next tuple of customer;
  done := false;
  while (not done) do begin
    tc' := tuple to which pc points;
    if tc'[c-name] = tc[c-name]
      then begin
        Sc := Sc U {tc'};
        set pc to point to next tuple of customer;
      end
      else done := true;
  end
  td := tuple to which pd points;
  while (td[c-name] < tc[c-name]) do begin
    set pd to point to next tuple of deposit;
    td := tuple to which pd points;
  end
  while (td[c-name] = tc[c-name]) do begin
    for each t in Sc do begin
      compute t x td and add this to the result;
    end
    set pd to point to next tuple of deposit;
    td := tuple to which pd points;
  end
end

Example: consider two relations D(N, B) = { i1:<a1,b1>, i2:<a1,b2>, i3:<a2,b4>, i4:<a2,b5>, i5:<a5,b7> } and C(N, D) = { i6:<a1,c1>, i7:<a1,c3>, i8:<a3,c5>, i9:<a4,c7>, i10:<a5,c9> }, joined on N.
Initially pd points to <a1,b1> and pc points to <a1,c1>; tc = <a1,c1> and Sc = {<a1,c1>}.

Merge join example (continued):
- pc advances to <a1,c3>; because tc'[N] = tc[N], Sc = {<a1,c1>, <a1,c3>}.
- pc advances to <a3,c5>; tc'[N] ≠ tc[N], so the scan of customer stops. td = <a1,b1>: td[N] < tc[N]? NO; td[N] = tc[N]? YES, so td is joined with all the tuples in Sc: result = {<a1,b1,c1>, <a1,b1,c3>}.
- pd advances to <a1,b2>: td[N] = tc[N]? YES, so result = {<a1,b1,c1>, <a1,b1,c3>, <a1,b2,c1>, <a1,b2,c3>}.
- pd advances to <a2,b4>: td[N] = tc[N]? NO; the outer loop restarts with tc = <a3,c5> and Sc = {<a3,c5>}.

Merge join example (continued):
- pc advances to <a4,c7>: tc'[N] ≠ tc[N], so Sc remains {<a3,c5>}.
- td = <a2,b4>: td[N] < tc[N]? YES (a2 < a3), so pd advances; td = <a2,b5>: td[N] < tc[N]? YES, so pd advances again.
- td = <a5,b7>: td[N] < tc[N]? NO (a5 > a3) and td[N] = tc[N]? NO, so no tuple with N = a3 is produced.
- The outer loop restarts: tc = <a4,c7>, Sc = {<a4,c7>}, pc advances to <a5,c9>. Again td[N] < tc[N]? NO (a5 > a4) and td[N] = tc[N]? NO; the outer loop restarts with tc = <a5,c9>.

Merge join example (end):
- tc = <a5,c9>, Sc = {<a5,c9>}, td = <a5,b7>: td[N] = tc[N]? YES, so <a5,b7,c9> is added to the result.

If each of the Sc sets is small enough to fit in MM, each block of the relations is accessed only once; the cost is thus B(R) + B(S).
If the relations are not already ordered, the merge phase (phase 2) of the external merge sort algorithm can be combined with the merge required by the join: the sorted runs for both relations are produced; then the first block of each run of each relation is loaded in MM, and the merging of the runs and the verification of the join predicate are executed simultaneously.

Hash join
In a partitioning phase, both R and S are partitioned using a hash function h, so that the tuples of R in partition i can only match the tuples of S in partition i.
In a matching phase, each partition of R is read and an in-memory hash table is built on it using a hash function h2 (different from h); the corresponding partition of S is then scanned in order to determine the matching tuples of S.
The total cost (if there are no overflows from partitions) is 3 * (B(R) + B(S)).

/* partitioning phase - h is a hash function distributing to 1..k */
for each tuple d of deposit do
  read d and add it to buffer page h(d)
for each tuple c of customer do
  read c and add it to buffer page h(c)
/* matching phase */
for l = 1, ..., k do begin
  for each tuple d of partition l of deposit do
    read d and insert it into a hash table using h2(d)
  for each tuple c of partition l of customer do begin
    read c and probe the hash table using h2(c)
    for each matching deposit tuple d, add (c,d) to the result
  end
end
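A minimal in-memory sketch of the hash join, with Python lists playing the role of the k partitions and a dictionary playing the role of the hash table built with h2; the join is on c-name as in the deposit/customer example.

def hash_join(deposit, customer, key_d, key_c, k=8):
    # Partitioning phase: route each tuple to one of k partitions using h.
    h = lambda key: hash(key) % k
    d_parts = [[] for _ in range(k)]
    c_parts = [[] for _ in range(k)]
    for d in deposit:
        d_parts[h(key_d(d))].append(d)
    for c in customer:
        c_parts[h(key_c(c))].append(c)

    # Matching phase: for each partition, build a hash table on the deposit
    # tuples and probe it with the customer tuples of the same partition.
    result = []
    for l in range(k):
        table = {}
        for d in d_parts[l]:
            table.setdefault(key_d(d), []).append(d)
        for c in c_parts[l]:
            for d in table.get(key_c(c), []):
                result.append(d + c)
    return result

deposit  = [("Perry", 102, "Smith", 500), ("Mianus", 215, "Jones", 700)]
customer = [("Smith", "North St", "Rye"), ("Curry", "Park Ave", "Rye")]
print(hash_join(deposit, customer, key_d=lambda d: d[2], key_c=lambda c: c[0]))
# [('Perry', 102, 'Smith', 500, 'Smith', 'North St', 'Rye')]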

More general join conditions:
- Equality predicates on several attributes: any index on any of such attributes can be used; in merge and hash join, the tuples must be ordered/partitioned on all the join attributes.
- Inequality predicates: the index nested loop strategy can be used if the index is a B+ tree; merge and hash join cannot be applied. In such a case, the block nested loop is the most efficient method.

Set-oriented operations
The intersection and the Cartesian product are special cases of the join. The union (without duplicates) and the difference are handled similarly to each other; let us consider the union.
Approach based on ordering:
- both relations are ordered (with respect to all attributes)
- the ordered relations are scanned and merged (eliminating the duplicates)
An alternative approach: apply phase 2 of the external merge sort to the sorted runs obtained by applying phase 1 to both relations.

Set-oriented operations
Approach based on hashing:
- R and S are partitioned by using a hash function h
- for each partition of S, a hash table is built in MM using a function h2
- the corresponding partition of R is scanned and its tuples are added only if they are not duplicates

Aggregate operations
Without GROUP BY: in general, an aggregate operation requires scanning the entire relation. It is necessary to maintain some running information in order to compute the aggregate functions:
SUM: total of the examined values
AVG: total and number of the examined values
COUNT: number of examined values
MIN: smallest examined value
MAX: largest examined value
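A minimal sketch of computing the aggregates without GROUP BY in a single scan, maintaining only the running information listed above.

def scan_aggregates(values):
    total = count = 0
    minimum = maximum = None
    for v in values:                 # one sequential scan of the relation
        total += v
        count += 1
        minimum = v if minimum is None or v < minimum else minimum
        maximum = v if maximum is None or v > maximum else maximum
    avg = total / count if count else None
    return {"SUM": total, "COUNT": count, "AVG": avg,
            "MIN": minimum, "MAX": maximum}

balances = [500, 700, 350, 900, 700]
print(scan_aggregates(balances))
# {'SUM': 3150, 'COUNT': 5, 'AVG': 630.0, 'MIN': 350, 'MAX': 900}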

Aggregate operations
With GROUP BY: the relation is ordered with respect to the grouping attributes; then the relation is scanned in order to compute the aggregate functions for each group. Such strategy can be improved by combining the ordering phase and the aggregate computation phase. A similar approach is based on hashing on the grouping attributes.

[The overall query-processing flow seen earlier: SQL query -> parse -> parse tree -> convert -> logical query plan -> apply laws -> improved l.q.p. -> estimate result sizes (using statistics) -> consider physical plans -> estimate costs -> pick best -> execute -> answer.]

Logical query plan: an algebraic expression (in extended algebra) for the query, represented as a tree.
Physical query plan: a tree in which, in addition to the order of execution for the various operations, the following information is specified:
- the algorithm for the execution of each operation
- the access path for each accessed relation
- additional ordering steps
- how the results of the intermediate operations are passed to the subsequent operations (materialized vs pipelined)

Pipelining
When a query is composed of several operations, the result of an operation can be pipelined to the subsequent operation, without creating a temporary relation storing the intermediate result. If instead the output of an operation is saved in a temporary relation, we say that the result is materialized.
Pipelining the result of an operation to the subsequent operation avoids the cost of writing the intermediate result and then reading it again. Because such cost can be significant, pipelining is to be preferred to materialization, if the operation allows it.

Pipelining
If for example the natural join of three relations (A >< B) >< C has to be executed, one can pipeline the result of the first join to the second join: as soon as a tuple of the join A >< B is obtained, such tuple is used to perform the join with C, with a technique like nested loop (with A >< B as outer relation) or with the use of an index.
Such an approach has the major advantage that it does not require writing the result of A >< B in a temporary file, because the tuples of such relation are produced and directly consumed a page at a time.

Iterator interface
The execution plan of a query is a tree of relational operations; the plan is executed by invoking the operations according to such order (possibly by also interleaving them). Each operation has one or more inputs and one output, which are also nodes of the tree.
To simplify the code coordinating the execution of a plan, the operations in the tree provide a uniform iterator interface, which hides the internal implementation details of each operation.

Iterator interface
Such interface supports the following operations:
- open: initializes the state of the iterator by allocating the input and output buffers; it is also used to transmit arguments (for example, selection conditions) that may modify the behavior of the operation
- get_next: executed on each input element, it executes the code specific for the operation; such code processes the input tuples and writes the output tuples in the output buffer
- close: de-allocates the state information

Iterator interface
Such interface directly supports pipelining of the result: the decision to pipeline or materialize the input tuples is encapsulated in the code of the operation processing the input tuples. If the algorithm for the operation supports pipelining, the tuples are not materialized and are processed in a pipelined fashion; otherwise, if the algorithm must examine the input tuples several times, they are materialized. Such decision, as many other implementation details, is hidden by the iterator interface of the operator.
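A minimal sketch of the iterator interface for two operators (a sequential scan and a selection), showing how tuples are pipelined one at a time; the class and method names are illustrative, not those of any specific system.

class SeqScan:
    def __init__(self, relation):
        self.relation = relation
    def open(self):
        self.pos = 0                       # initialize the iterator state
    def get_next(self):
        if self.pos >= len(self.relation):
            return None                    # no more input tuples
        t = self.relation[self.pos]
        self.pos += 1
        return t
    def close(self):
        del self.pos                       # de-allocate the state information

class Selection:
    def __init__(self, child, predicate):  # the selection condition is an argument
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def get_next(self):
        while (t := self.child.get_next()) is not None:
            if self.predicate(t):
                return t                   # tuples flow one at a time: pipelining
        return None
    def close(self):
        self.child.close()

plan = Selection(SeqScan([("Smith", 500), ("Jones", 12000)]),
                 predicate=lambda t: t[1] > 10_000)
plan.open()
while (tup := plan.get_next()) is not None:
    print(tup)                             # ('Jones', 12000)
plan.close()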

Iterator interface
Such interface also allows one to encapsulate access methods such as indexes (B+ trees or hash): externally such methods can simply be seen as operations that generate a sequence of output tuples. In such a case the open method can be used to transmit the selection condition that corresponds to the access path.

System R
The query optimizers of current commercial relational DBMS have been strongly influenced by the design decisions concerning the query optimizer of the IBM System R. The main novel ideas introduced by System R include:
- the use of statistics on the database contents in order to estimate the cost of execution plans
- the heuristic of considering only plans with binary joins in which the inner relation is a base relation (this heuristic reduces the number of alternative plans that have to be considered)
- the decision of considering SQL queries without subqueries and of handling subqueries with an ad-hoc approach

System R
- the decision of not eliminating the duplicates during the projection, but only as an additional step when required by the DISTINCT clause
- a cost model that takes into account the CPU costs, in addition to the I/O costs
In what we will discuss we will follow such choices, except for the cost model, for which we consider only the I/O costs.

Query format
For the moment, we consider queries without subqueries, that is, of the form
SELECT AttributeList
FROM RelationList
WHERE Condition in CNF
[GROUP BY AttributeList]
[HAVING Condition]
We will see how to handle queries with subqueries later.

Algebraic equivalences
An SQL query block can be seen as an algebraic expression that consists of the Cartesian product of the relations in the FROM clause, the selections in the WHERE clause and the projections in the SELECT clause.
The algebraic equivalences allow us to convert Cartesian products into joins, to select a different order for executing the joins, and to push down selections and projections. An algebraic expression can thus be transformed into an equivalent expression whose evaluation is more efficient.

Algebraic equivalences
Two RA expressions are equivalent if, for each possible input relation, they return the same output.
Selection:
σ P1 (σ P2 (e)) ≡ σ P2 (σ P1 (e)) ≡ σ P1 AND P2 (e)
This allows one to manage a cascade of selections and establishes the commutative property of the selection.

Algebraic equivalences
Projection:
Π A1,...,An (Π B1,...,Bm (e)) ≡ Π A1,...,An (e)
This allows one to manage a cascade of projections. Note that {A1,...,An} ⊆ {B1,...,Bm} is required for the projection cascade to be correct.

Algebraic equivalences
Cartesian product and join.
Commutative property:
e1 >< F e2 ≡ e2 >< F e1
e1 >< e2 ≡ e2 >< e1
e1 X e2 ≡ e2 X e1
Associative property:
(e1 >< F1 e2) >< F2 e3 ≡ e1 >< F1 (e2 >< F2 e3)
(e1 >< e2) >< e3 ≡ e1 >< (e2 >< e3)
(e1 X e2) X e3 ≡ e1 X (e2 X e3)

Algebraic equivalences
Selection, projection and join.
Commutation of selection and projection: if a selection with predicate P involves only the attributes A1,...,An, then
Π A1,...,An (σ P (e)) ≡ σ P (Π A1,...,An (e))
More in general, if predicate P also involves some attributes B1,...,Bm that are not among the attributes A1,...,An, then
Π A1,...,An (σ P (e)) ≡ Π A1,...,An (σ P (Π A1,...,An,B1,...,Bm (e)))

Algebraic equivalences
Commutation of selection and Cartesian product: if a selection with predicate P involves only the attributes of e1, then
σ P (e1 X e2) ≡ σ P (e1) X e2
As a consequence, if P = P1 AND P2 where P1 involves only the attributes of e1 and P2 only those of e2,
σ P (e1 X e2) ≡ σ P1 (e1) X σ P2 (e2)
In addition, if P1 involves only attributes of e1 whereas P2 involves attributes of both e1 and e2,
σ P (e1 X e2) ≡ σ P2 (σ P1 (e1) X e2)

Algebraic equivalences
It is also possible to transform a selection and a Cartesian product into a join, based on the definition of join:
σ P (e1 X e2) ≡ e1 >< P e2
Commutation of projection and Cartesian product: let A1,...,An be a list of attributes of which B1,...,Bm are attributes of e1 and the remaining C1,...,Ck are attributes of e2; then
Π A1,...,An (e1 X e2) ≡ Π B1,...,Bm (e1) X Π C1,...,Ck (e2)

Algebraic equivalences
Additional equivalences (union and difference).
Commutative and associative properties of the union:
(e1 ∪ e2) ∪ e3 ≡ e1 ∪ (e2 ∪ e3)
e1 ∪ e2 ≡ e2 ∪ e1
Commutation of selection and union:
σ P (e1 ∪ e2) ≡ σ P (e1) ∪ σ P (e2)
Commutation of selection and difference:
σ P (e1 - e2) ≡ σ P (e1) - σ P (e2) ≡ σ P (e1) - e2
Commutation of projection and union:
Π A1,...,An (e1 ∪ e2) ≡ Π A1,...,An (e1) ∪ Π A1,...,An (e2)

Algebraic equivalences - heuristics
Once the logical query plan is generated, the equivalence rules are applied in order to transform the plan into an improved logical query plan which is hopefully more efficient to execute.
An alternative possibility would be to consider several equivalent logical plans and to estimate the cost of the various physical plans associated with each logical plan considered; such an alternative is usually not applied, in order to limit the number of plans to evaluate.

Algebraic equivalences - heuristics
The heuristics are based on the idea of anticipating as much as possible the execution of the operations that reduce the cardinality of the intermediate results: selection and projection.
The order according to which joins are executed is instead determined during the next phase, based on information concerning the relation sizes and the evaluation of the costs of the various execution orderings.

Algebraic equivalences - heuristics
Heuristics: selection.
Execute the selection operations (σ) as soon as possible: transform expressions of the form σ P1 AND P2 (e) into expressions of the form σ P1 (σ P2 (e)), where P1 and P2 are predicates and e is an algebraic expression.
Sometimes it is however convenient to first move a selection to the outer level and then again to the inner level. Example: in (σ A = v (R)) >< S, if S also has the attribute A, the most efficient equivalent expression is (σ A = v (R)) >< (σ A = v (S)).

Algebraic equivalences - heuristics
Heuristics: projection.
Execute the projection operations (π) as soon as possible. It is in addition possible to introduce additional projections in the expression; the only attributes that must not be eliminated are the ones that
- appear in the query result
- are needed for the subsequent operations
However, it is not always convenient to introduce these additional projections.

Algebraic equivalences - heuristics
Example: π A (σ B = v (R)) could be transformed into π A (σ B = v (π A,B (R))). However, if R has an index on attribute B, applying the projection to the whole relation wastes time with respect to retrieving through the index the tuples that verify the condition on B and then applying the projection only to these tuples.
Conventional wisdom must be used, keeping in mind that no transformation is always good.

Estimation of the execution cost
At this point we have chosen a logical query plan, which must be transformed into a physical plan. Such transformation is in general executed by considering various physical plans that implement the selected logical plan, estimating the cost of each such physical plan, and selecting the physical plan with the lowest cost (cost-based enumeration).
To estimate the cost of an execution plan, for each node of the tree:
- the cost of performing the corresponding operation is estimated
- the cardinality of the result is estimated (it is the input to the subsequent operations), together with whether it is ordered

Estimation of the execution cost
Given a logical plan, the corresponding physical plans are obtained by fixing, for each physical plan:
- an execution order for the associative and commutative operations (join, union, intersection)
- an algorithm for each operator in the logical plan
- additional operations (e.g. scan, ordering) that are not specified in the logical plan
- the strategy according to which the intermediate results are passed from one operator to the subsequent one (e.g. materialization or pipelining)

Statistics
In order to determine the costs of the various operations, the DBMS maintains in the system catalogs some statistical information on the data stored in the relations. For each relation R:
T(R): number of tuples in relation R
B(R): number of blocks in relation R
S(R): size of a tuple of relation R in bytes (for fixed-length tuples; otherwise average values are used)
S(A,R): size of attribute A in relation R
V(A,R): number of distinct values of attribute A in relation R
Max(A,R) and Min(A,R): minimum and maximum values of attribute A in relation R

Statistics
For each index I:
K(I): number of entries (key values) of index I
L(I): number of pages of index I (leaf node pages in the case of a B+ tree)
H(I): height of index I
Such statistics are updated when a relation is loaded or an index is created; afterwards they are updated only periodically. Updating them after each data update would be too expensive; the cost estimates are in any case approximate, and thus it is acceptable that the statistics are not always up to date. Several DBMS provide an UPDATE STATISTICS command to explicitly require the statistics update.

Estimation of the result size
Projection: the size of π A1,...,An (R) can be estimated as
T(R) * (S(A1,R) + ... + S(An,R))
[note: this projection does not automatically eliminate the duplicates]
If we want to take into account the duplicate elimination, we can estimate the size of π A (R) as V(A,R) * S(A,R).

Estimation of the result size
Selection: the number of tuples returned by a selection σ P (R) depends on the number of tuples of R that verify the predicate P.
The ratio between the tuples of R that verify P and all the tuples of R (that is, the probability that a tuple of R verifies P) is called the selectivity factor of predicate P and is denoted by F(P). We can then estimate that σ P (R) selects a number of tuples equal to T(R) * F(P).
It is possible to easily estimate the selectivity factor by assuming a uniform distribution of the values of each attribute, that is, under the hypothesis that each value appears with the same probability.

Estimation of the result size
Selection - estimation of selectivity factors:
F(A = v) = 1/V(A,R)   [if for some reason V(A,R) is not known, F = 1/10 is taken]
F(A > v) = (Max(A,R) - v) / (Max(A,R) - Min(A,R))
F(A < v) = (v - Min(A,R)) / (Max(A,R) - Min(A,R))
   [if the max and min values of the attribute are not known, or the attribute is not numeric, the estimate is F = 1/3]
F(A IN (v1, v2, ..., vN)) = N * F(A = v)
F(A BETWEEN (v1, v2)) = (v2 - v1) / (Max(A,R) - Min(A,R))
   [if the max and min values of the attribute are not known, or the attribute is not numeric, the estimate is F = 1/4]

Estimation of the result size
Selection - estimation of selectivity factors:
F(A1 = A2) = 1 / MAX(V(A1,R), V(A2,R))
   [such estimate assumes that for each value of the attribute with fewer distinct values, a corresponding value exists in the attribute with more distinct values; if the numbers of distinct values are not known, 1/10 is assumed]
F(C1 AND C2) = F(C1) * F(C2)
F(C1 OR C2) = F(C1) + F(C2) - F(C1) * F(C2)
F(NOT C) = 1 - F(C)
Based on its definition, it is clear that the smaller the selectivity factor, the more selective the corresponding predicate is (a small worked example applying these formulas is given below).

Estimation of the result size
Selection - Histograms
The estimates we have seen so far are based on the assumption that all values have the same probability to appear, and that there are no correlations among the values of different attributes. In some cases such assumption is not realistic: for example, consider the relation deposit and the attribute b-name; we can reasonably expect that the major branches have more deposits and therefore certain branch names are more frequent than other names.

Estimation of the result size
Selection - Histograms
In the employees example it is likely that there is a correlation between the job of an employee and his/her salary. However, both assumptions work well in a lot of cases. Recently, more sophisticated techniques have been developed, based on maintaining more detailed statistics (histograms of the attribute values), which are being introduced in commercial systems.

Estimation of the result size
Selection - Histograms
To better approximate the distribution of the values of an attribute A, we may maintain other information, in addition to the number of values and the min and max values.
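A small sketch applying the selectivity formulas above to estimate the cardinality of a conjunctive selection; the statistics values (number of distinct branch names, min/max balance) are made up for illustration.

def sel_eq(V):                  # F(A = v)
    return 1 / V if V else 1 / 10
def sel_gt(v, lo, hi):          # F(A > v)
    return (hi - v) / (hi - lo) if hi > lo else 1 / 3
def sel_and(f1, f2):            # F(C1 AND C2)
    return f1 * f2
def sel_or(f1, f2):             # F(C1 OR C2)
    return f1 + f2 - f1 * f2

# Example: deposit, T = 10,000; predicate  b-name = 'Perry' AND balance > 800
T = 10_000
V_bname = 50                    # assumed number of distinct branch names
lo_bal, hi_bal = 0, 2_000       # assumed Min/Max of balance
F = sel_and(sel_eq(V_bname), sel_gt(800, lo_bal, hi_bal))
print(f"selectivity = {F:.4f}, estimated tuples = {T * F:.0f}")
# selectivity = 0.0120, estimated tuples = 120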

Estimation of the result cardinality
Selection - Histograms
Under the uniform distribution assumption, the distribution of the values of A is approximated by a flat histogram.
A histogram is a data structure maintained by the DBMS to approximate the distribution of the data values: the range of the values taken by attribute A is divided into subranges called buckets; for each bucket, the number of tuples of R having for attribute A a value in that bucket is recorded.
There are two possible strategies for determining the buckets: equiwidth (the subranges all contain the same number of values of A) and equidepth (the subranges all contain the same number of tuples of R).
In addition, sometimes the DBMS maintains the most frequent values and the number of their occurrences (e.g. 7: 8 and 14: 9).

Estimation of the result cardinality
Selection - Histograms
[Figures: an equiwidth histogram and an equidepth histogram over the same attribute values.]
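A minimal sketch of building the two kinds of histogram for the values of one attribute; the sample values are made up for illustration.

def equiwidth_histogram(values, n_buckets):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1
    counts = [0] * n_buckets
    for v in values:
        i = min(int((v - lo) / width), n_buckets - 1)
        counts[i] += 1                     # tuples whose value falls in bucket i
    bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(n_buckets)]
    return list(zip(bounds, counts))

def equidepth_histogram(values, n_buckets):
    ordered = sorted(values)
    depth = len(ordered) // n_buckets      # tuples per bucket (last one may differ)
    buckets = [ordered[i * depth:(i + 1) * depth] for i in range(n_buckets - 1)]
    buckets.append(ordered[(n_buckets - 1) * depth:])
    return [((b[0], b[-1]), len(b)) for b in buckets]

ages = [22, 23, 23, 24, 25, 30, 31, 45, 50, 61, 62, 64]
print(equiwidth_histogram(ages, 3))   # equal-width subranges, uneven counts
print(equidepth_histogram(ages, 3))   # uneven subranges, ~equal counts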


Physical Data Organization Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino Database Management Data Base and Data Mining Group of tania.cerquitelli@polito.it A.A. 2014-2015 Optimizer objective A SQL statement can be executed in many different ways The query optimizer determines

More information

10CS35: Data Structures Using C

10CS35: Data Structures Using C CS35: Data Structures Using C QUESTION BANK REVIEW OF STRUCTURES AND POINTERS, INTRODUCTION TO SPECIAL FEATURES OF C OBJECTIVE: Learn : Usage of structures, unions - a conventional tool for handling a

More information

Oracle Database 11g: SQL Tuning Workshop Release 2

Oracle Database 11g: SQL Tuning Workshop Release 2 Oracle University Contact Us: 1 800 005 453 Oracle Database 11g: SQL Tuning Workshop Release 2 Duration: 3 Days What you will learn This course assists database developers, DBAs, and SQL developers to

More information

Oracle Database 11g: SQL Tuning Workshop

Oracle Database 11g: SQL Tuning Workshop Oracle University Contact Us: + 38516306373 Oracle Database 11g: SQL Tuning Workshop Duration: 3 Days What you will learn This Oracle Database 11g: SQL Tuning Workshop Release 2 training assists database

More information

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011 MapReduce examples CSE 344 section 8 worksheet May 19, 2011 In today s section, we will be covering some more examples of using MapReduce to implement relational queries. Recall how MapReduce works from

More information

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr.

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Dan Olteanu Submitted as part of Master of Computer Science Computing Laboratory

More information

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

Datenbanksysteme II: Implementation of Database Systems Implementing Joins Datenbanksysteme II: Implementation of Database Systems Implementing Joins Material von Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina

More information

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS 464 Spring 2003 Topic 23 Database

More information

MOC 20461C: Querying Microsoft SQL Server. Course Overview

MOC 20461C: Querying Microsoft SQL Server. Course Overview MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server

More information

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium. Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically

More information

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the

More information

SQL Server Query Tuning

SQL Server Query Tuning SQL Server Query Tuning Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author Pro SQL Server

More information

Data Warehousing und Data Mining

Data Warehousing und Data Mining Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

More information

DBMS / Business Intelligence, SQL Server

DBMS / Business Intelligence, SQL Server DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.

More information

Best Practices for DB2 on z/os Performance

Best Practices for DB2 on z/os Performance Best Practices for DB2 on z/os Performance A Guideline to Achieving Best Performance with DB2 Susan Lawson and Dan Luksetich www.db2expert.com and BMC Software September 2008 www.bmc.com Contacting BMC

More information

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods Overview 1 Determining Data Relationships 1 Understanding the Methods for Combining SAS Data Sets 3

More information

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We

More information

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

Guide to Performance and Tuning: Query Performance and Sampled Selectivity Guide to Performance and Tuning: Query Performance and Sampled Selectivity A feature of Oracle Rdb By Claude Proteau Oracle Rdb Relational Technology Group Oracle Corporation 1 Oracle Rdb Journal Sampled

More information

Relational Algebra. Basic Operations Algebra of Bags

Relational Algebra. Basic Operations Algebra of Bags Relational Algebra Basic Operations Algebra of Bags 1 What is an Algebra Mathematical system consisting of: Operands --- variables or values from which new values can be constructed. Operators --- symbols

More information

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

File Management. Chapter 12

File Management. Chapter 12 Chapter 12 File Management File is the basic element of most of the applications, since the input to an application, as well as its output, is usually a file. They also typically outlive the execution

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8 Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

Record Storage and Primary File Organization

Record Storage and Primary File Organization Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

More information

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan? Why Query Optimization? Access Path Selection in a Relational Database Management System P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price Peyman Talebifard Queries must be executed and execution

More information

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -

More information

Data Structures Fibonacci Heaps, Amortized Analysis

Data Structures Fibonacci Heaps, Amortized Analysis Chapter 4 Data Structures Fibonacci Heaps, Amortized Analysis Algorithm Theory WS 2012/13 Fabian Kuhn Fibonacci Heaps Lacy merge variant of binomial heaps: Do not merge trees as long as possible Structure:

More information

Introduction to Parallel Programming and MapReduce

Introduction to Parallel Programming and MapReduce Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant

More information

Execution Strategies for SQL Subqueries

Execution Strategies for SQL Subqueries Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp With additional slides from material in paper, added by S. Sudarshan 1 Motivation

More information

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1. Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik Demaine and Shafi Goldwasser Quiz 1 Quiz 1 Do not open this quiz booklet until you are directed

More information

Storage and File Structure

Storage and File Structure Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files

More information

Physical DB design and tuning: outline

Physical DB design and tuning: outline Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning

More information

Introduction. The Quine-McCluskey Method Handout 5 January 21, 2016. CSEE E6861y Prof. Steven Nowick

Introduction. The Quine-McCluskey Method Handout 5 January 21, 2016. CSEE E6861y Prof. Steven Nowick CSEE E6861y Prof. Steven Nowick The Quine-McCluskey Method Handout 5 January 21, 2016 Introduction The Quine-McCluskey method is an exact algorithm which finds a minimum-cost sum-of-products implementation

More information

Performance Evaluation of Natural and Surrogate Key Database Architectures

Performance Evaluation of Natural and Surrogate Key Database Architectures Performance Evaluation of Natural and Surrogate Key Database Architectures Sebastian Link 1, Ivan Luković 2, Pavle ogin *)1 1 Victoria University of Wellington, Wellington, P.O. Box 600, New Zealand sebastian.link@vuw.ac.nz

More information

IT2305 Database Systems I (Compulsory)

IT2305 Database Systems I (Compulsory) Database Systems I (Compulsory) INTRODUCTION This is one of the 4 modules designed for Semester 2 of Bachelor of Information Technology Degree program. CREDITS: 04 LEARNING OUTCOMES On completion of this

More information

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions

More information

TREE BASIC TERMINOLOGIES

TREE BASIC TERMINOLOGIES TREE Trees are very flexible, versatile and powerful non-liner data structure that can be used to represent data items possessing hierarchical relationship between the grand father and his children and

More information

Introduction to database design

Introduction to database design Introduction to database design KBL chapter 5 (pages 127-187) Rasmus Pagh Some figures are borrowed from the ppt slides from the book used in the course, Database systems by Kiefer, Bernstein, Lewis Copyright

More information

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks] EXAMINATIONS 2005 MID-YEAR COMP 302 Database Systems Time allowed: Instructions: 3 Hours Answer all questions. Make sure that your answers are clear and to the point. Write your answers in the spaces provided.

More information

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

LOBs were introduced back with DB2 V6, some 13 years ago. (V6 GA 25 June 1999) Prior to the introduction of LOBs, the max row size was 32K and the

LOBs were introduced back with DB2 V6, some 13 years ago. (V6 GA 25 June 1999) Prior to the introduction of LOBs, the max row size was 32K and the First of all thanks to Frank Rhodes and Sandi Smith for providing the material, research and test case results. You have been working with LOBS for a while now, but V10 has added some new functionality.

More information

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing

More information

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ...

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed ... ... Outline Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Hardware: Disks Access Times Example -

More information

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D. 1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D. base address 2. The memory address of fifth element of an array can be calculated

More information

3. The Junction Tree Algorithms

3. The Junction Tree Algorithms A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

More information

2) What is the structure of an organization? Explain how IT support at different organizational levels.

2) What is the structure of an organization? Explain how IT support at different organizational levels. (PGDIT 01) Paper - I : BASICS OF INFORMATION TECHNOLOGY 1) What is an information technology? Why you need to know about IT. 2) What is the structure of an organization? Explain how IT support at different

More information

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division Answer Key UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS186 Fall 2003 Eben Haber Midterm Midterm Exam: Introduction to Database Systems This exam has

More information

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3 Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM

More information

Efficient Data Structures for Decision Diagrams

Efficient Data Structures for Decision Diagrams Artificial Intelligence Laboratory Efficient Data Structures for Decision Diagrams Master Thesis Nacereddine Ouaret Professor: Supervisors: Boi Faltings Thomas Léauté Radoslaw Szymanek Contents Introduction...

More information

www.gr8ambitionz.com

www.gr8ambitionz.com Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1

More information

Relational Database: Additional Operations on Relations; SQL

Relational Database: Additional Operations on Relations; SQL Relational Database: Additional Operations on Relations; SQL Greg Plaxton Theory in Programming Practice, Fall 2005 Department of Computer Science University of Texas at Austin Overview The course packet

More information

Efficient Processing of Joins on Set-valued Attributes

Efficient Processing of Joins on Set-valued Attributes Efficient Processing of Joins on Set-valued Attributes Nikos Mamoulis Department of Computer Science and Information Systems University of Hong Kong Pokfulam Road Hong Kong nikos@csis.hku.hk Abstract Object-oriented

More information

ICAB4136B Use structured query language to create database structures and manipulate data

ICAB4136B Use structured query language to create database structures and manipulate data ICAB4136B Use structured query language to create database structures and manipulate data Release: 1 ICAB4136B Use structured query language to create database structures and manipulate data Modification

More information

Access Path Selection in a Relational Database Management System

Access Path Selection in a Relational Database Management System Access Path Selection in a Relational Database Management System P. Griffiths Selinger M. M. Astrahan D. D. Chamberlin R. A. Lorie T. G. Price IBM Research Division, San Jose, California 95193 ABSTRACT:

More information

PES Institute of Technology-BSC QUESTION BANK

PES Institute of Technology-BSC QUESTION BANK PES Institute of Technology-BSC Faculty: Mrs. R.Bharathi CS35: Data Structures Using C QUESTION BANK UNIT I -BASIC CONCEPTS 1. What is an ADT? Briefly explain the categories that classify the functions

More information

MapReduce and the New Software Stack

MapReduce and the New Software Stack 20 Chapter 2 MapReduce and the New Software Stack Modern data-mining applications, often called big-data analysis, require us to manage immense amounts of data quickly. In many of these applications, the

More information

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration

More information

16.1 MAPREDUCE. For personal use only, not for distribution. 333

16.1 MAPREDUCE. For personal use only, not for distribution. 333 For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several

More information

COMP 5138 Relational Database Management Systems. Week 5 : Basic SQL. Today s Agenda. Overview. Basic SQL Queries. Joins Queries

COMP 5138 Relational Database Management Systems. Week 5 : Basic SQL. Today s Agenda. Overview. Basic SQL Queries. Joins Queries COMP 5138 Relational Database Management Systems Week 5 : Basic COMP5138 "Relational Database Managment Systems" J. Davis 2006 5-1 Today s Agenda Overview Basic Queries Joins Queries Aggregate Functions

More information

Converting a Number from Decimal to Binary

Converting a Number from Decimal to Binary Converting a Number from Decimal to Binary Convert nonnegative integer in decimal format (base 10) into equivalent binary number (base 2) Rightmost bit of x Remainder of x after division by two Recursive

More information

[Refer Slide Time: 05:10]

[Refer Slide Time: 05:10] Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture

More information

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig Contents Acknowledgements... 1 Introduction to Hive and Pig... 2 Setup... 2 Exercise 1 Load Avro data into HDFS... 2 Exercise 2 Define an

More information

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design Physical Database Design Process Physical Database Design Process The last stage of the database design process. A process of mapping the logical database structure developed in previous stages into internal

More information

COWLEY COLLEGE & Area Vocational Technical School

COWLEY COLLEGE & Area Vocational Technical School COWLEY COLLEGE & Area Vocational Technical School COURSE PROCEDURE FOR COBOL PROGRAMMING CIS1866 3 Credit Hours Student Level: This course is open to students on the college level in either Freshman or

More information

3. Relational Model and Relational Algebra

3. Relational Model and Relational Algebra ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra

More information