Query Optimization. Coming to Introduction to Database Systems by C. J. Date, he discussed the automatic optimization.


Introduction:

Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good plan is identified. The chosen plan may or may not be the absolute best, because there are many ways of building a plan and there is a trade-off between the time spent searching for the best plan and the time spent running it. Different database management systems balance these two in different ways.

Cost-based query optimizers estimate the resource footprint of various query plans and use it as the basis for plan selection. The resources typically costed are CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism. The set of query plans examined is formed from the possible access paths (e.g., primary index access, secondary index access, full file scan) and the various relational join techniques (e.g., merge join, hash join, product join). The search space can become quite large, depending on the complexity of the SQL query.

There are two types of optimization: logical optimization, which generates a sequence of relational algebra operations to solve the query, and physical optimization, which determines how each operation is carried out.

In An Introduction to Database Systems, C. J. Date discusses automatic optimization and gives several reasons why an optimizer might actually do better than a human. A good optimizer has a wealth of information that a human user typically does not have, such as certain statistical information:
o Number of distinct values of each type.
o Number of tuples currently appearing in each base relvar.
o Number of distinct values currently appearing in each attribute in each base relvar.
and so on.
As a result, the optimizer can assess the efficiency of any given strategy for implementing a particular request more accurately, and is therefore more likely to choose the most efficient implementation.

If the database statistics change over time, a different strategy might become preferable, so re-optimization might be required. In a relational system re-optimization is trivial: the system optimizer simply reprocesses the original query. In a non-relational system, re-optimization involves rewriting the program.

The optimizer is a program, so it is much more patient than a human. It can consider several hundred implementation strategies for a given request, whereas a human user is unlikely to consider more than three or four.

The optimizer embodies the skills and services of the best human programmers, and so makes that scarce set of skills available to everybody in an efficient and cost-effective manner.

These reasons are evidence that relational requests are optimizable, which is in fact a strength of relational systems.

Motivating Example:

Consider the following query on the suppliers-and-shipments database:

((SP JOIN S) WHERE P# = P# ('P2')) {SNAME}

Suppose the database contains 100 suppliers and 10,000 shipments, of which 50 are for part P2. If the query is executed without optimization, the following sequence occurs:
1. Join SP and S (over S#): This step involves reading 10,000 shipments; reading each of the 100 suppliers 10,000 times; constructing 10,000 joined tuples; and writing them back to disk.
2. Restrict the result of step 1 to just the tuples for part P2: This step involves reading the 10,000 joined tuples and produces a result consisting of 50 tuples.
3. Project the result of step 2 over SNAME: This step produces the desired result.

If the query is executed with optimization, the following sequence occurs:

1. Restrict SP to just the tuples for part P2: This step involves reading the 10,000 shipments once and produces 50 tuples.
2. Join the result of step 1 to S (over S#): This step involves reading the 100 suppliers only once and produces 50 joined tuples.
3. Project the result of step 2 over SNAME: This step produces the desired result.

The execution without optimization involves a total of 1,030,000 tuple I/Os (10,000 + 1,000,000 + 10,000 + 10,000), whereas the execution with optimization involves 10,100 (10,000 + 100). If the number of tuple I/Os is our measure, the second procedure is a little over 100 times better than the first. A simple change in the execution algorithm (restricting and then joining, instead of joining and then restricting) has produced a dramatic improvement in performance.

Performance would improve even more dramatically with hashing or indexing. With an index or hash on P#, the number of shipments read in step 1 would drop from 10,000 to 50, and with an index or hash on S#, the number of suppliers read in step 2 would drop from 100 to at most 50. The total of about 100 tuple I/Os is roughly 10,000 times better than the original execution. In other words, if the unoptimized query took 3 hours to execute, the optimized query using hashing or indexing would take just over 1 second.

An overview of Query Optimization:

We can identify four broad stages in query processing:
1. Cast the query into internal form.
2. Convert to canonical form.
3. Choose candidate low-level procedures.
4. Generate query plans and choose the cheapest.
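The tuple I/O figures quoted for the motivating example can be reproduced with a few lines of arithmetic. The sketch below is only a check of those numbers; the function names are made up for illustration, and the indexed case assumes an index or hash on both P# and S#, so that each step touches at most 50 tuples.

```python
# Tuple I/O counts for the suppliers-and-shipments example.
# All names here are illustrative; the figures come from the text.
SUPPLIERS = 100
SHIPMENTS = 10_000
P2_SHIPMENTS = 50


def unoptimized_io() -> int:
    """Join first, then restrict."""
    read_shipments = SHIPMENTS
    read_suppliers = SUPPLIERS * SHIPMENTS  # every supplier read once per shipment
    write_joined = SHIPMENTS                # joined tuples written back to disk
    reread_joined = SHIPMENTS               # joined tuples read again by the restriction
    return read_shipments + read_suppliers + write_joined + reread_joined


def optimized_io() -> int:
    """Restrict first, then join; the 50 restricted tuples stay in memory."""
    return SHIPMENTS + SUPPLIERS


def indexed_io() -> int:
    """Assumes an index or hash on P# and on S#."""
    return P2_SHIPMENTS + P2_SHIPMENTS


if __name__ == "__main__":
    print(unoptimized_io(), optimized_io(), indexed_io())
    speedup = unoptimized_io() / indexed_io()
    print(f"3 hours shrinks to about {3 * 3600 / speedup:.2f} seconds")
```

Counting tuple I/Os ignores CPU cost and page-level buffering, so the ratios are indicative rather than exact.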

[Figure: Query processing overview]

Cast the query into internal form:

The original query is converted into an internal representation that is more suitable for machine manipulation, thus eliminating purely external considerations (such as syntax) and paving the way for the subsequent stages of the overall process. View processing is also done during this stage.

What formalism should the internal representation be based on? Whatever formalism is chosen, it must be rich enough to represent all the queries in the external query language. It should also be neutral, in the sense that it should not prejudice subsequent choices. The internal form typically chosen is some kind of abstract syntax tree or query tree.

[Figure: Query tree for "Get names of suppliers who supply part P2"]

For the internal representation, however, it is more convenient to use one of the formalisms we are already familiar with, namely relational algebra or relational calculus. The algebraic expression for the above tree is

((SP JOIN S) WHERE P# = P# ('P2')) {SNAME}

Convert to canonical form:

In this stage, the optimizer performs a number of optimizations that are guaranteed to be good, regardless of the actual data values and physical access paths. The point is that relational languages allow all but the simplest queries to be expressed in a variety of ways that differ at least superficially, and not merely by writing B = A in place of A = B. The performance of a query should not depend on the particular way the user chose to write it. The next step in processing is therefore to convert the internal representation into an equivalent canonical form, with the objective of eliminating such superficial distinctions.

Given a set Q of objects and a notion of equivalence among those objects, a subset C of Q is said to be a canonical set for Q if every object q in Q is equivalent to exactly one object c in C.

To transform the output of stage 1 into an equivalent but more efficient form, the optimizer makes use of certain transformation rules. For example, the expression

(A JOIN B) WHERE restriction on A

can be transformed into the equivalent but more efficient expression

(A WHERE restriction on A) JOIN B

Choose candidate low-level procedures:

After converting the internal representation into a more desirable form, the optimizer must decide how to execute the transformed query. At this stage, considerations such as actual data values and physical access paths come into play.

The basic strategy is to treat the query as a sequence of low-level operations with interdependencies among them. For example, the code that performs an operation may require its input tuples to be sorted in some order, in which case the preceding operation must produce its output tuples in that order.

For each possible low-level operation, the optimizer has a set of predefined implementation procedures. For example, the restriction operation has a set of implementation procedures: one for the case where the restriction is an equality comparison, one where the restriction attribute is indexed, and one where the restriction attribute is hashed. Next, using catalog information about the current state of the database, the optimizer chooses one or more candidate procedures for each operation. This process is sometimes referred to as access path selection.

Generate query plans and choose the cheapest:

The final stage of the optimization process involves constructing a set of candidate query plans and then choosing the best (cheapest) of those plans. Each query plan is built by combining candidate procedures, one such procedure for each low-level operation in the query. It is usually not a good idea to analyze all possible plans, so it is better to use some heuristic technique to keep the set within bounds. This is referred to as reducing the search space.

Choosing the cheapest plan obviously requires a method for assigning a cost to a plan. The cost of a given plan is the sum of the costs of the individual procedures that make it up. The problem is that the cost of a procedure depends on the size of the relations it has to process. Since intermediate results are generated during execution, the optimizer has to estimate the size of these intermediate results, and those sizes depend on actual data values. Accurate cost estimation is therefore a difficult problem.

Expression Transformation:

In this section we describe some of the transformation rules that might be useful in stage 2 of the optimization process, and explain with examples why they are useful.

Given a particular expression to transform, the application of one rule might generate an expression that can be transformed further in accordance with another rule. Starting from one expression, the optimizer applies its transformation rules repeatedly until it finally arrives at an expression that it judges, based on some set of heuristics, to be good enough.
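The repeated application of transformation rules can be sketched as a small rewrite loop. The code below is a minimal illustration under stated assumptions, not a description of any real optimizer: the classes `Rel`, `Join`, and `Restrict` and the functions `push_down` and `optimize` are hypothetical names, the only rule implemented is the restriction pushdown shown earlier, and each restriction is assumed to mention a single attribute.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Rel:
    name: str
    attrs: FrozenSet[str]


@dataclass(frozen=True)
class Join:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Restrict:
    child: "Expr"
    attr: str  # the single attribute the condition refers to (assumption)
    cond: str  # condition text, treated as opaque


Expr = Union[Rel, Join, Restrict]


def attrs(e: Expr) -> FrozenSet[str]:
    """Attributes available in the result of an expression."""
    if isinstance(e, Rel):
        return e.attrs
    if isinstance(e, Join):
        return attrs(e.left) | attrs(e.right)
    return attrs(e.child)


def push_down(e: Expr) -> Expr:
    """One pass of the rule (A JOIN B) WHERE c  ->  (A WHERE c) JOIN B."""
    if isinstance(e, Restrict):
        child = push_down(e.child)
        if isinstance(child, Join):
            if e.attr in attrs(child.left):
                return Join(push_down(Restrict(child.left, e.attr, e.cond)), child.right)
            if e.attr in attrs(child.right):
                return Join(child.left, push_down(Restrict(child.right, e.attr, e.cond)))
        return Restrict(child, e.attr, e.cond)
    if isinstance(e, Join):
        return Join(push_down(e.left), push_down(e.right))
    return e


def optimize(e: Expr) -> Expr:
    """Apply the rule repeatedly until the expression stops changing."""
    while True:
        rewritten = push_down(e)
        if rewritten == e:
            return e
        e = rewritten


def show(e: Expr) -> str:
    if isinstance(e, Rel):
        return e.name
    if isinstance(e, Join):
        return f"({show(e.left)} JOIN {show(e.right)})"
    return f"({show(e.child)} WHERE {e.cond})"


if __name__ == "__main__":
    sp = Rel("SP", frozenset({"S#", "P#", "QTY"}))
    s = Rel("S", frozenset({"S#", "SNAME"}))
    query = Restrict(Join(sp, s), "P#", "P# = 'P2'")
    print(show(query))
    print(show(optimize(query)))
```

A real optimizer would carry many such rules and would need a stopping heuristic, since rule sets that include commutativity never reach a fixed point on their own.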

Restrictions & Projections:

It is generally better to do a restriction before a projection, because the restriction reduces the size of the input to the projection and hence the amount of data that might need to be sorted for duplicate elimination.

Distributivity:

The transformation rule used in the earlier example (transforming a join followed by a restriction into a restriction followed by a join) is actually a special case of the distributive law. In general, the monadic operator f is said to be distributive over the dyadic operator o if and only if

f (A o B) = f (A) o f (B)

for all A and B. In ordinary arithmetic, for example, SQRT (over non-negative numbers) is distributive over multiplication, because

SQRT (A * B) = SQRT (A) * SQRT (B)

So an arithmetic expression optimizer can replace either of these expressions by the other when transforming arithmetic expressions. As a counterexample, SQRT is not distributive over addition, because SQRT (A + B) is in general not equal to SQRT (A) + SQRT (B).

In relational algebra, restriction is distributive over union, intersection, and difference. It also distributes over join, if and only if the restriction condition consists, at its most complex, of two simple restriction conditions ANDed together, one for each of the two join operands. In the suppliers example this requirement was indeed satisfied (in fact the condition was a simple restriction condition on just one of the operands), and so we could use the distributive law to replace the overall expression by a more efficient equivalent. The net effect was that we were able to do the restriction early. Doing the restriction early is a good idea, because it reduces the number of tuples to be scanned in the next operation in sequence, and probably reduces the number of tuples in the output from that operation too.

Here are a couple more specific cases of the distributive law, this time involving projection. First, projection distributes over union, where A and B must of course be of the same type. It does not in general distribute over intersection or difference, because two distinct tuples can agree on the projected attributes. Second, projection also distributes over join as long as the projection retains all of the join attributes, thus:

(A JOIN B) {acl} = (A {acl1}) JOIN (B {acl2})

Here acl1 is the union of the join attributes and those attributes of acl that appear in A only, and acl2 is the union of the join attributes and those attributes of acl that appear in B only. These laws can be used to do projections early, which again is usually a good idea, for reasons similar to those given previously for restrictions.

Idempotence and Absorption:

Commutativity and Associativity:

Computational Expressions:

It is not just relational expressions that are subject to transformation laws. For instance, we have already indicated that certain transformations are valid for arithmetic expressions. Here is a specific example. The expression

A * B + A * C

can be transformed into

A * (B + C)

by virtue of the fact that * distributes over +. A relational optimizer needs to know about such transformations because it will encounter such expressions in the context of the extend and summarize operators.

Note, incidentally, that this example illustrates a slightly more general form of distributivity. Earlier, we defined distributivity in terms of a monadic operator distributing over a dyadic operator; in the case at hand, however, * and + are both dyadic operators. In general, the dyadic operator δ is said to be distributive over the dyadic operator Ο if and only if

A δ (B Ο C) = (A δ B) Ο (A δ C)

for all A, B, C (in the arithmetic example, take δ as * and Ο as +).

Boolean Expressions:

We turn now to Boolean expressions. Suppose A and B are attributes of two distinct relations. Then the Boolean expression

A > B and B > 3

is clearly equivalent to the following:

A > B and B > 3 and A > 3

The equivalence is based on the fact that the comparison operator ">" is transitive. Note that this transformation is certainly worth making, because it enables the system to perform an additional restriction (on A) before doing the greater-than join implied by the comparison "A > B". To repeat a point made earlier, doing restrictions early is generally a good idea; having the system infer additional "early" restrictions, as here, is also a good idea.

Note: This technique is implemented in several commercial products, including, for example, DB2 (where it is called "predicate transitive closure") and Ingres.

Here is another example. The expression

A > B or (C = D and E < F)

can be transformed into

(A > B or C = D) and (A > B or E < F)

by virtue of the fact that OR distributes over AND. This example illustrates another general law, viz.: any Boolean expression can be transformed into an equivalent expression in what is called conjunctive normal form (CNF). A CNF expression is an expression of the form

C1 and C2 and ... and Cn

where each of C1, C2, ..., Cn is, in turn, a Boolean expression (called a conjunct) that involves no ANDs. The advantage of CNF is that a CNF expression is true only if every conjunct is true; equivalently, it is false if any conjunct is false. Since AND is commutative, the optimizer can evaluate the individual conjuncts in any order it likes; in particular, it can do them in order of increasing difficulty. As soon as it finds one that is false, the whole process can stop.

Furthermore, in a parallel-processing system, it might even be possible to evaluate all of the conjuncts in parallel. Again, as soon as one is found that is false, the whole process can stop.
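The conversion to conjunctive normal form described above can be sketched by distributing OR over AND from the bottom of the expression tree upward. This is a minimal illustration that assumes negation-free expressions; the classes `Atom`, `And`, and `Or` and the helper names are hypothetical, and comparisons are kept as opaque text rather than evaluated.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    text: str  # a simple comparison such as "A > B", kept opaque


@dataclass(frozen=True)
class And:
    left: "Bool"
    right: "Bool"


@dataclass(frozen=True)
class Or:
    left: "Bool"
    right: "Bool"


Bool = Union[Atom, And, Or]


def to_cnf(e: Bool) -> Bool:
    """Rewrite a negation-free expression so that no AND appears below an OR."""
    if isinstance(e, Atom):
        return e
    left, right = to_cnf(e.left), to_cnf(e.right)
    if isinstance(e, And):
        return And(left, right)
    # e is an Or: distribute it over any AND found in either operand.
    if isinstance(left, And):
        return And(to_cnf(Or(left.left, right)), to_cnf(Or(left.right, right)))
    if isinstance(right, And):
        return And(to_cnf(Or(left, right.left)), to_cnf(Or(left, right.right)))
    return Or(left, right)


def conjuncts(e: Bool) -> List[Bool]:
    """Flatten a CNF expression into its list of conjuncts."""
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def show(e: Bool) -> str:
    if isinstance(e, Atom):
        return e.text
    op = " and " if isinstance(e, And) else " or "
    return "(" + show(e.left) + op + show(e.right) + ")"


if __name__ == "__main__":
    expr = Or(Atom("A > B"), And(Atom("C = D"), Atom("E < F")))
    for c in conjuncts(to_cnf(expr)):
        print(show(c))
```

Once the conjuncts are available as a list, an evaluator is free to order them by estimated cost and stop at the first one that is false.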

It follows from this subsection and its predecessor that the optimizer needs to know how general properties such as distributivity apply not only to relational operators such as join, but also to comparison operators such as >, Boolean operators such as AND and OR, arithmetic operators such as +, and so on.

Choice of Evaluation Plans:

Generation of expressions is only part of the query-optimization process, since each operation in the expression can be implemented with different algorithms. An evaluation plan is therefore needed to define exactly what algorithm should be used for each operation, and how the execution of the operations should be coordinated.

As we have seen, several different algorithms can be used for each relational operation, giving rise to alternative evaluation plans. Further, decisions about pipelining have to be made. In the figure, the edges from the selection operations to the merge join operation are marked as pipelined; pipelining is feasible if the selection operations generate their output sorted on the join attributes. They would do so if the indices on branch and account store records with equal values for the index attributes sorted by branch_name.

Interaction of Evaluation Techniques:

One way to choose an evaluation plan for a query expression is simply to choose, for each operation, the cheapest algorithm for evaluating it. We can choose any ordering of the operations that ensures that operations lower in the tree are executed before operations higher in the tree.

However, choosing the cheapest algorithm for each operation independently is not necessarily a good idea. Although a merge join at a given level may be costlier

than a hash join, it may provide a sorted output that makes evaluating a later operation (such as duplicate elimination, intersection, or another merge join) cheaper. Similarly, a nested-loop join with indexing may provide opportunities for pipelining the results to the next operation, and thus may be useful even if it is not the cheapest way of performing the join. To choose the best overall algorithm, we must consider even non-optimal algorithms for individual operations.

Thus, in addition to considering alternative expressions for a query, we must also consider alternative algorithms for each operation in an expression. We can use rules much like the equivalence rules to define what algorithms can be used for each operation, and whether its result can be pipelined or must be materialized. We can use these rules to generate all the query-evaluation plans for a given expression. Depending upon the indices available, certain selection operations can be evaluated using only an index, without accessing the relation itself.

That still leaves the problem of choosing the best evaluation plan for a query. There are two broad approaches: the first searches all the plans and chooses the best plan in a cost-based fashion; the second uses heuristics to choose a plan. Practical query optimizers incorporate elements of both approaches.

Cost-Based Optimization:

A cost-based optimizer generates a range of query-evaluation plans from the given query by using the equivalence rules, and chooses the one with the least cost. For a complex query, the number of different query plans that are equivalent to a given plan can be large. As an illustration, consider the expression

$r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$

where the joins are expressed without any ordering. With $n = 3$, there are 12 different join orderings:

$r_1 \bowtie (r_2 \bowtie r_3)$   $r_1 \bowtie (r_3 \bowtie r_2)$   $(r_2 \bowtie r_3) \bowtie r_1$   $(r_3 \bowtie r_2) \bowtie r_1$
$r_2 \bowtie (r_1 \bowtie r_3)$   $r_2 \bowtie (r_3 \bowtie r_1)$   $(r_1 \bowtie r_3) \bowtie r_2$   $(r_3 \bowtie r_1) \bowtie r_2$
$r_3 \bowtie (r_1 \bowtie r_2)$   $r_3 \bowtie (r_2 \bowtie r_1)$   $(r_1 \bowtie r_2) \bowtie r_3$   $(r_2 \bowtie r_1) \bowtie r_3$

In general, with $n$ relations, there are $(2(n-1))! / (n-1)!$ different join orders. For joins involving small numbers of relations, this number is acceptable; for example, with $n = 5$, the number is 1680. However, as $n$ increases, this number rises quickly. With $n = 7$, the number is 665,280; with $n = 10$, the number is greater than 17.6 billion!

Luckily, it is not necessary to generate all the expressions equivalent to a given expression. For example, suppose we want to find the best join order of the form

$(r_1 \bowtie r_2 \bowtie r_3) \bowtie r_4 \bowtie r_5$

which represents all join orders where $r_1$, $r_2$, and $r_3$ are joined first (in some order), and the result is joined (in some order) with $r_4$ and $r_5$. There are 12 different join orders for computing $r_1 \bowtie r_2 \bowtie r_3$, and 12 orders for computing the join of this result

with $r_4$ and $r_5$. Thus, there appear to be 144 join orders to examine. However, once we have found the best join order for the subset of relations $\{r_1, r_2, r_3\}$, we can use that order for further joins with $r_4$ and $r_5$, and can ignore all costlier join orders of $r_1 \bowtie r_2 \bowtie r_3$. Thus, instead of 144 choices to examine, we need to examine only 12 + 12 choices.

Using this idea, we can develop a dynamic-programming algorithm for finding optimal join orders. Dynamic-programming algorithms store results of computations and reuse them, a procedure that can reduce execution time greatly.

The procedure stores the evaluation plans it computes in an associative array bestplan, which is indexed by sets of relations. Each element of the associative array contains two components: the cost of the best plan for S, and the plan itself. The value of bestplan[S].cost is assumed to be initialized to ∞ if bestplan[S] has not yet been computed.

Dynamic-programming algorithm for join order optimization:

procedure FindBestPlan (S)
    if (bestplan[S].cost ≠ ∞) /* bestplan[S] already computed */
        return bestplan[S]
    if (S contains only 1 relation)
        set bestplan[S].plan and bestplan[S].cost based on best way of accessing S
    else for each non-empty subset S1 of S such that S1 ≠ S
        P1 = FindBestPlan (S1)
        P2 = FindBestPlan (S - S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost
            bestplan[S].cost = cost
            bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
    return bestplan[S]

The procedure first checks whether the best plan for computing the join of the given set of relations S has been computed already (and stored in the associative array bestplan); if so, it returns the already computed plan.

If S contains only one relation, the best way of accessing S (taking selections on S, if any, into account) is recorded in bestplan. This may involve using an index to identify tuples and then fetching the tuples (often referred to as an index scan), or scanning the entire relation (often referred to as a relation scan).

Otherwise, the procedure tries every way of dividing S into two disjoint subsets. For each division, the procedure recursively finds the best plans for each of

the two subsets, and then computes the cost of the overall plan by using that division. The procedure picks the cheapest plan from among all the alternatives for dividing S into two sets. The cheapest plan and its cost are stored in the array bestplan and returned by the procedure. The time complexity of the procedure can be shown to be $O(3^n)$.

In fact, the order in which tuples are generated by the join of a set of relations is also important for finding the best overall join order, since it can affect the cost of further joins (for instance, if merge join is used). A particular sort order of the tuples is said to be an interesting sort order if it could be useful for a later operation. For instance, generating the result of $r_1 \bowtie r_2 \bowtie r_3$ sorted on the attributes common with $r_4$ or $r_5$ may be useful, but generating it sorted on the attributes common to only $r_1$ and $r_2$ is not useful. Using merge join for computing $r_1 \bowtie r_2 \bowtie r_3$ may be costlier than using some other join technique, but may provide an output sorted in an interesting sort order.

Hence, it is not sufficient to find the best join order for each subset of the set of $n$ given relations. Instead, we have to find the best join order for each subset, for each interesting sort order of the join result for that subset. The number of subsets of $n$ relations is $2^n$. The number of interesting sort orders is generally not large. Thus, about $2^n$ join expressions need to be stored. The dynamic-programming algorithm for finding the best join order can be easily extended to handle sort orders. The cost of the extended algorithm depends on the number of interesting orders for each subset of relations; since this number has been found to be small in practice, the cost remains at $O(3^n)$. With $n = 10$, this number is around 59,000, which is much better than the 17.6 billion different join orders. More important, the storage required is much less than before, since we need to store only one join order for each interesting sort order of each of the 1024 subsets of $r_1, \ldots, r_{10}$. Although both numbers still increase rapidly with $n$, commonly occurring joins usually have fewer than 10 relations and can be handled easily.

We can use several techniques to reduce further the cost of searching through a large number of plans. For instance, when examining the plans for an expression, we can terminate after examining only a part of the expression, if we determine that the cheapest plan for that part is already costlier than the cheapest evaluation plan for a full expression examined earlier. Similarly, suppose we determine that the cheapest way of evaluating a sub-expression is costlier than the cheapest evaluation plan for a full expression examined earlier. Then no full expression involving that sub-expression needs to be examined. We can further reduce the number of evaluation plans that need to be considered fully by first making a heuristic guess of a good plan and estimating that plan's cost. Then only a few competing plans will require a full analysis of cost. These optimizations can reduce the overhead of query optimization significantly.

The intricacies of SQL introduce a good deal of complexity into query optimizers. The approach to optimization described above concentrates on join-order

optimization. In contrast, the optimizers used in some other systems, notably Microsoft SQL Server, are based on equivalence rules. The benefit of using equivalence rules is that it is easy to extend the optimizer with new rules. For example, nested queries can be represented using extended relational-algebra constructs, and transformations of nested queries can be expressed as equivalence rules. Making the approach work efficiently requires efficient techniques for detecting duplicate derivations, and a form of dynamic programming to avoid reoptimizing the same sub-expressions. This approach was pioneered by the Volcano research project.

Advanced Types of Optimization:

In this section, we attempt to provide a brief glimpse of advanced types of optimization that researchers have proposed over the past few years. The descriptions are based on examples only; further details may be found in the references provided. Furthermore, there are several issues that are not discussed at all due to lack of space, although much interesting work has been done on them, e.g., nested query optimization, rule-based query optimization, query optimizer generators, object-oriented query optimization, optimization with materialized views, heterogeneous query optimization, recursive query optimization, aggregate query optimization, optimization with expensive selection predicates, and query optimizer validation.

1. Semantic Query Optimization:

Semantic query optimization is a form of optimization mostly related to the Rewriter module. The basic idea lies in using integrity constraints defined in the database to rewrite a given query into semantically equivalent ones [Kin81]. These can then be optimized by the Planner as regular queries, and the most efficient plan among all of them can be used to answer the original query. As a simple example, using a hypothetical SQL-like syntax, consider the following integrity constraint:

ASSERT sal-constraint ON emp: sal > 100K WHERE job = "Sr. Programmer".

Also consider the following query:

SELECT name, floor FROM emp, dept WHERE emp.dno = dept.dno AND job = "Sr. Programmer".

Using the above integrity constraint, the query can be rewritten into a semantically equivalent one to include a selection on sal:

SELECT name, floor FROM emp, dept

WHERE emp.dno = dept.dno AND job = "Sr. Programmer" AND sal > 100K.

Having the extra selection could help tremendously in finding a fast plan to answer the query if the only index in the database is a B+-tree on emp.sal. On the other hand, it would certainly be a waste if no such index exists. For such reasons, all proposals for semantic query optimization present various heuristics or rules on which rewritings have the potential of being beneficial and should be applied, and which should not.

2. Global Query Optimization:

So far, we have focused our attention on optimizing individual queries. Quite often, however, multiple queries become available for optimization at the same time, e.g., queries with unions, queries from multiple concurrent users, queries embedded in a single program, or queries in a deductive system. Instead of optimizing each query separately, one may be able to obtain a global plan that, although possibly suboptimal for each individual query, is optimal for the execution of all of them as a group. Several techniques have been proposed for global query optimization [Sel88].

As a simple example of the problem of global optimization, consider the following two queries:

SELECT name, floor FROM emp, dept WHERE emp.dno = dept.dno AND job = "Sr. Programmer",
SELECT name FROM emp, dept WHERE emp.dno = dept.dno AND budget > 1M.

Depending on the sizes of the emp and dept relations and the selectivity of the selections, it may well be that computing the entire join once and then applying the two selections separately to obtain the results of the two queries is more efficient than doing the join twice, each time taking into account the corresponding selection. Developing Planner modules that would examine all the available global plans and identify the optimal one is the goal of global/multiple query optimizers.

3. Parametric/Dynamic Query Optimization:

As mentioned earlier, embedded queries are typically optimized once at compile time and are executed multiple times at run time. Because of this temporal separation between optimization and execution, the values of various parameters that are used during optimization may be very different during execution. This may make the chosen plan invalid (e.g., if indices used in the plan are no longer available) or simply not optimal (e.g., if the number of available buffer pages or operator selectivities have changed, or if new indices have become available). To address this issue, several techniques have been proposed that use various search strategies (e.g., randomized algorithms or the strategy of Volcano) to optimize queries as much as possible at compile time, taking into account all possible values that interesting parameters may have at run time. These techniques use the actual parameter values

at run time, and simply pick the plan that was found optimal for them, with little or no overhead. Of a drastically different flavor is the technique of Rdb/VMS [Ant93], where, by dynamically monitoring how the probability distribution of plan costs changes, plan switching may actually occur during query execution.

Estimation of Query-Processing Cost:

1. To choose a strategy based on reliable information, the database system may store statistics for each relation $r$:
o $n_r$: the number of tuples in $r$.
o $s_r$: the size in bytes of a tuple of $r$ (for fixed-length records).
o $V(A, r)$: the number of distinct values that appear in relation $r$ for attribute $A$.
2. The first two quantities allow us to estimate accurately the size of a Cartesian product.
o The Cartesian product $r \times s$ contains $n_r n_s$ tuples.
o Each tuple of $r \times s$ occupies $s_r + s_s$ bytes.
o The third statistic is used to estimate how many tuples satisfy a selection predicate of the form <attribute-name> = <value>.
o We need to know how often each value appears in a column.
o If we assume each value appears with equal probability, then $\sigma_{A=a}(r)$ is estimated to have $n_r / V(A, r)$ tuples.
o This may not be the case, but it is a good approximation of reality in many relations.
o We assume such a uniform distribution for the rest of this chapter.
o Estimation of the size of a natural join is more difficult.
o Let $r_1(R_1)$ and $r_2(R_2)$ be relations on schemes $R_1$ and $R_2$.
o If $R_1 \cap R_2 = \emptyset$ (no common attributes), then $r_1 \bowtie r_2$ is the same as $r_1 \times r_2$, and we can estimate the size of this accurately.
o If $R_1 \cap R_2$ is a key for $R_1$, then we know that a tuple of $r_2$ will join with at most one tuple of $r_1$.
o Thus the number of tuples in $r_1 \bowtie r_2$ will be no greater than $n_{r_2}$.
o If $R_1 \cap R_2$ is not a key for $R_1$ or $R_2$, things are more difficult.

o We use the third statistic and the assumption of uniform distribution.
o Assume $R_1 \cap R_2 = \{A\}$.
o We assume there are $n_{r_2} / V(A, r_2)$ tuples in $r_2$ with an $A$ value of $t[A]$ for tuple $t$ in $r_1$.
o So tuple $t$ of $r_1$ produces $n_{r_2} / V(A, r_2)$ tuples in $r_1 \bowtie r_2$.
3. Considering all the tuples in $r_1$, we estimate that there are $n_{r_1} n_{r_2} / V(A, r_2)$ tuples in total in $r_1 \bowtie r_2$.
4. If we reverse the roles of $r_1$ and $r_2$ in this equation, we get a different estimate, $n_{r_1} n_{r_2} / V(A, r_1)$, if $V(A, r_1) \neq V(A, r_2)$.
o If this occurs, there are likely to be some dangling tuples that do not participate in the join.
o Thus the lower estimate is probably the better one.
o This estimate may still be high if the $V(A, r_1)$ values in $r_1$ have few values in common with the $V(A, r_2)$ values in $r_2$.
o However, it is unlikely that the estimate is far off, as dangling tuples are likely to be a small fraction of the tuples in a real-world relation.
5. To maintain accurate statistics, it is necessary to update the statistics whenever a relation is modified. The overhead of doing so can be substantial, so most systems do this updating during periods of light load on the system.

Guidelines: For any production database, SQL query performance becomes an issue sooner or later. Long-running queries not only consume system resources, which makes the server and application run slowly, but may also lead to table locking and data corruption issues. So query optimization becomes an important task. First, we offer some guiding principles for query optimization:
1. Understand how your database is executing your query. Nowadays most databases have their own query optimizer and offer a way for users to understand how a query is executed. For example, which index from which table is being used to execute the query? The first step in query optimization is understanding what the database is doing. Different databases have different commands for this. For example, in MySQL, one can use "EXPLAIN [SQL Query]" to see the query plan. In Oracle, one can use "EXPLAIN PLAN FOR [SQL Query]" to see the query plan.
2. Retrieve as little data as possible. The more data returned from the query, the more resources the database needs to expend to process and store the data. So, for example, if you only need to retrieve one column from a table, do not use 'SELECT *'.
3. Store intermediate results. Sometimes the logic for a query can be quite complex. Often, it is possible to achieve the desired result through the use of subqueries, inline views, and UNION-type statements. In those cases, the intermediate results are not stored in the database but are used immediately within the query. This can lead to performance issues, especially when the intermediate results have a large number of rows. The way to increase query performance in those cases is to store the intermediate results in a temporary table and break up the initial SQL statement into several SQL statements. In many cases, you can even build an index on the temporary table to speed up the query even more. Granted, this adds a little complexity in query management (i.e., the need to manage temporary tables), but the speedup in query performance is often worth the trouble.

Below are several specific query optimization strategies.
Use Index: Using an index is the first strategy one should use to speed up a query. In fact, this strategy is so important that index optimization is also discussed.
Aggregate Table: Pre-populate tables at higher levels of aggregation so that less data needs to be parsed.
Vertical Partitioning: Partition the table by columns. This strategy decreases the amount of data a SQL query needs to process.
Horizontal Partitioning: Partition the table by data value, most often time. This strategy decreases the amount of data a SQL query needs to process.
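The first guideline and the "Use Index" strategy above can be tried out in a small sketch. The text names the MySQL and Oracle commands; the example below instead uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module, purely because it runs without a server. The table emp, the index idx_emp_dept, and the helper query_plan are hypothetical names chosen for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# In-memory database with a small, made-up employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("emp%d" % i, "dept%d" % (i % 10)) for i in range(1000)],
)

# Select only the column that is needed rather than SELECT *.
sql = "SELECT name FROM emp WHERE dept = ?"

# Plan before the index exists: a scan of the whole table.
plan_before = query_plan(conn, sql, ("dept3",))

# Plan after adding an index on the filtered column.
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")
plan_after = query_plan(conn, sql, ("dept3",))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the first plan should report a scan of emp and the second a search that uses idx_emp_dept.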

De-normalization: The process of de-normalization combines multiple tables into a single table. This speeds up query performance because fewer table joins are needed.
Server Tuning: Each server has its own parameters, and tuning them so that the server takes full advantage of the hardware resources can often significantly speed up query performance.

References:
An Introduction to Database Systems, Eighth Edition - C. J. Date
Database System Concepts, Fifth Edition - Silberschatz, Korth, Sudarshan

Evaluation of Expressions

Evaluation of Expressions Query Optimization Evaluation of Expressions Materialization: one operation at a time, materialize intermediate results for subsequent use Good for all situations Sum of costs of individual operations

More information

Chapter 14: Query Optimization

Chapter 14: Query Optimization Chapter 14: Query Optimization Database System Concepts 5 th Ed. See www.db-book.com for conditions on re-use Chapter 14: Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

8. Query Processing. Query Processing & Optimization

8. Query Processing. Query Processing & Optimization ECS-165A WQ 11 136 8. Query Processing Goals: Understand the basic concepts underlying the steps in query processing and optimization and estimating query processing cost; apply query optimization techniques;

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 4: Query Optimization Chapter 4 Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for

More information

Chapter 13: Query Optimization

Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

ICS 434 Advanced Database Systems

ICS 434 Advanced Database Systems ICS 434 Advanced Database Systems Dr. Abdallah Al-Sukairi sukairi@kfupm.edu.sa Second Semester 2003-2004 (032) King Fahd University of Petroleum & Minerals Information & Computer Science Department Outline

More information

Optimization Overview. Overview of Query Optimization

Optimization Overview. Overview of Query Optimization Optimization Overview Instructor: Sharma Chakravarthy sharma@cse.uta.edu The University of Texas @ Arlington Database Management Systems, S. Chakravarthy 1 Overview of Query Optimization Input: Sql query

More information

Inside the PostgreSQL Query Optimizer

Inside the PostgreSQL Query Optimizer Inside the PostgreSQL Query Optimizer Neil Conway neilc@samurai.com Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of

More information

Steps in Query Processing Query Query Parser Parsed query. Query Evaluation and Optimization. An Overview. Plan Generator

Steps in Query Processing Query Query Parser Parsed query. Query Evaluation and Optimization. An Overview. Plan Generator Query Evaluation and Optimization An Overview Web Forms Application FEs SQL Interface Plan Executor Operator Evaluator Parser Optimizer Query Evaluation Engine Concurrency Control Transaction Manager Lock

More information

OVERVIEW OF QUERY EVALUATION

OVERVIEW OF QUERY EVALUATION 12 OVERVIEW OF QUERY EVALUATION Exercise 12.1 Briefly answer the following questions: 1. Describe three techniques commonly used when developing algorithms for relational operators. Explain how these techniques

More information

Overview of Query Evaluation

Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Comp 521 Files and Databases Fall 2010 1 Overview of Query Evaluation Query: SELECT sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid = 100 AND S.rating

More information

Plan for the Query Optimization topic

Plan for the Query Optimization topic VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui COMP302 Database Systems Plan for the Query Optimization topic COMP302 Database Systems Query Optimisation_04 1 What is Query

More information

! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions

! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions Basic Steps in Query

More information

Part 8. Implementation

Part 8. Implementation Part 8 Implementation Performance Efficiency Depends On: Physical data storage Use of indices Query optimization Compiled vs. interpreted execution Ability to predict database usage, communicate that prediction

More information

Part 19. Implementation

Part 19. Implementation Part 19 Implementation Performance Efficiency Depends On: Physical data storage Use of indices Query optimization Compiled vs. interpreted execution Ability to predict database usage, communicate that

More information

Database System Concepts

Database System Concepts Chapter 14: Optimization Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2009/2010 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth and Sudarshan.

More information

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation. Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

More information

Topics in basic DBMS course

Topics in basic DBMS course Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch

More information

Query Processing, optimization, and indexing techniques

Query Processing, optimization, and indexing techniques Query Processing, optimization, and indexing techniques What s s this tutorial about? From here: SELECT C.name AS Course, count(s.students) AS Cnt FROM courses C, subscription S WHERE C.lecturer = Calders

More information

Query processing. Tore Risch Information Technology Uppsala University

Query processing. Tore Risch Information Technology Uppsala University Query processing Tore Risch Information Technology Uppsala University 2011-03-08 What is query processing? A given SQL query is translated by the query processor into a low level program called an execution

More information

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the

More information

Relational Query Optimization. Chapter 15

Relational Query Optimization. Chapter 15 Relational Query Optimization Chapter 15 Highlights of System R Optimizer Impact: Most widely used currently; works well for < 10 joins. Cost estimation: Approximate art at best. Statistics, maintained

More information

Classical query optimization

Classical query optimization Chapter 2 Classical query optimization This chapter presents a preliminary study of the general aspects that query evaluation and more specifically, query optimization involves, as well as the existent

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

CS 220 Relational Algebra 2/3/2015

CS 220 Relational Algebra 2/3/2015 CS 220 Relational Algebra 2/3/2015 Relational Query Languages Query = retrieval program Language examples: Theoretical: 1. Relational Algebra 2. Relational Calculus a. tuple relational calculus (TRC) b.

More information

Query Optimization! Chapter 13: Query Optimization!

Query Optimization! Chapter 13: Query Optimization! Query Optimization 13.1 Chapter 13: Query Optimization Introduction Execution plans Transformation of Relational Expressions Catalog Information for Cost Estimation Statistical Information for Cost Estimation

More information

L12: Query Optimization

L12: Query Optimization Note: Slides whose titles are put in () are for your reference only. Details will be covered in COMP9315. L12: heuristicbased q.o. parse convert SQL query parse tree logical query plan apply laws improved

More information

SQL Query Performance Tuning: Tips and Best Practices

SQL Query Performance Tuning: Tips and Best Practices SQL Query Performance Tuning: Tips and Best Practices Pravasini Priyanka, Principal Test Engineer, Progress Software INTRODUCTION: In present day world, where dozens of complex queries are run on databases

More information

Analysis of Query Optimization Techniques in Databases

Analysis of Query Optimization Techniques in Databases Analysis of Query Optimization Techniques in Databases Jyoti Mor M. Tech Student, CSE Dept. Indu Kashyap Assistant Professor, CSE Dept. R. K. Rathy, PhD. Professor, CSE Department ABSTRACT Query optimization

More information

Query Processing + Optimization: Outline

Query Processing + Optimization: Outline Query Processing + Optimization: Outline Operator Evaluation Strategies Query processing in general Selection Join Query Optimization Heuristic query optimization Cost-based query optimization Query Tuning

More information

CS2255 DATABASE MANAGEMENT SYSTEM QUESTION BANK

CS2255 DATABASE MANAGEMENT SYSTEM QUESTION BANK SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF INFORMATION TECHNOLOGY CS2255 DATABASE MANAGEMENT SYSTEM QUESTION BANK

More information

Q4. What are data model? Explain the different data model with examples. Q8. Differentiate physical and logical data independence data models.

Q4. What are data model? Explain the different data model with examples. Q8. Differentiate physical and logical data independence data models. FAQs Introduction to Database Systems and Design Module 1: Introduction Data, Database, DBMS, DBA Q2. What is a catalogue? Explain the use of it in DBMS. Q3. Differentiate File System approach and Database

More information

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc.

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc. Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc. Table of Contents Overview...................................................................................

More information

Query Optimization Over Web Services Using A Mixed Approach

Query Optimization Over Web Services Using A Mixed Approach Query Optimization Over Web Services Using A Mixed Approach Debajyoti Mukhopadhyay 1, Dhaval Chandarana 1, Rutvi Dave 1, Sharyu Page 1, Shikha Gupta 1 1 Maharashtra Institute of Technology, Pune 411038

More information

Can you design an algorithm that searches for the maximum value in the list?

Can you design an algorithm that searches for the maximum value in the list? The Science of Computing I Lesson 2: Searching and Sorting Living with Cyber Pillar: Algorithms Searching One of the most common tasks that is undertaken in Computer Science is the task of searching. We

More information

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

DBMS LAB MANUAL PREPARED BY JAGRUTI SAVE

DBMS LAB MANUAL PREPARED BY JAGRUTI SAVE DBMS LAB MANUAL PREPARED BY JAGRUTI SAVE 1 EXPERIMENT NO: 1 AIM: Preparing an ER diagram for given database and Conversion from ER diagram to tables THEORY: Database : A Database is a collection of interrelated

More information

Physical Database Design and Tuning

Physical Database Design and Tuning Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

Relational Query Optimization 2

Relational Query Optimization 2 Relational Query Optimization 2 R&G - Chapter 14 For ease and speed in doing a thing do not give the work lasting solidity or exactness of beauty. Plutarch, Life of Pericles Query Optimization Query can

More information

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing

More information

The Relational Algebra

The Relational Algebra The Relational Algebra The relational algebra is very important for several reasons: 1. it provides a formal foundation for relational model operations. 2. and perhaps more important, it is used as a basis

More information

1. Physical Database Design in Relational Databases (1)

1. Physical Database Design in Relational Databases (1) Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer. DBMS Architecture INSTRUCTION OPTIMIZER Database Management Systems MANAGEMENT OF ACCESS METHODS BUFFER MANAGER CONCURRENCY CONTROL RELIABILITY MANAGEMENT Index Files Data Files System Catalog BASE It

More information

Electronic Design Automation Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Electronic Design Automation Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Electronic Design Automation Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 10 Synthesis: Part 3 I have talked about two-level

More information

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel

More information

MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

More information

Reading Assignment 5 An Overview of Query Optimization in Relational Systems

Reading Assignment 5 An Overview of Query Optimization in Relational Systems Reading Assignment 5 An Overview of Query Optimization in Relational Systems José Filipe Barbosa de Carvalho (jose.carvalho@fe.up.pt) 5th December 2007 Advanced Database Systems Technische Universität

More information

Boolean Operations on Intervals and Axis-Aligned Rectangles

Boolean Operations on Intervals and Axis-Aligned Rectangles Boolean Operations on Intervals and Axis-Aligned Rectangles David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: July 28, 2008 Contents

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement. COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden Robert C. Nickerson ISYS 464 Spring 2003 Topic 23 Database

More information

Performance Basics; Computer Architectures

Performance Basics; Computer Architectures 8 Performance Basics; Computer Architectures 8.1 Speed and limiting factors of computations Basic floating-point operations, such as addition and multiplication, are carried out directly on the central

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

Introduction to DBMS

Introduction to DBMS CHAPTER 1 Introduction to DBMS In this chapter, you will learn 1.0 Introduction 1.1 History of Database Management System 1.2 Database Architecture 1.3 Database Management System Users 1.4 Role of DBMS

More information

3. Relational Model and Relational Algebra

3. Relational Model and Relational Algebra ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra

More information

SQL QUERY EVALUATION. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 12

SQL QUERY EVALUATION. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 12 SQL QUERY EVALUATION CS121: Introduction to Relational Database Systems Fall 2015 Lecture 12 Query Evaluation 2 Last time: Began looking at database implementation details How data is stored and accessed

More information

The Import & Export of Data from a Database

The Import & Export of Data from a Database The Import & Export of Data from a Database Introduction The aim of these notes is to investigate a conceptually simple model for importing and exporting data into and out of an object-relational database,

More information

Relational Algebra The Relational Algebra and Relational Calculus. Relational Query Languages

Relational Algebra The Relational Algebra and Relational Calculus. Relational Query Languages The Relational Algebra and Relational Calculus Relational Algebra Slide 6-2 Relational Query Languages Query languages Allow manipulation and retrieval of data Not like programming languages Not intend

More information

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 02 Lecture No. # 05 Run-time Environments-Part 3 and Local Optimizations

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute

More information

[Refer Slide Time: 05:10]

[Refer Slide Time: 05:10] Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture

More information

Record Storage and Primary File Organization

Record Storage and Primary File Organization Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

More information

CSE 444 Practice Problems. Query Optimization

CSE 444 Practice Problems. Query Optimization CSE 444 Practice Problems Query Optimization 1. Query Optimization Given the following SQL query: Student (sid, name, age, address) Book(bid, title, author) Checkout(sid, bid, date) SELECT S.name FROM

More information

Distributed Data Management

Distributed Data Management Introduction Distributed Data Management Involves the distribution of data and work among more than one machine in the network. Distributed computing is more broad than canonical client/server, in that

More information

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium. Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically

More information

Capacity Planning Process Estimating the load Initial configuration

Capacity Planning Process Estimating the load Initial configuration Capacity Planning Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino Database Management Data Base and Data Mining Group of tania.cerquitelli@polito.it A.A. 2014-2015 Optimizer objective A SQL statement can be executed in many different ways The query optimizer determines

More information

Oracle Database 11g: SQL Tuning Workshop

Oracle Database 11g: SQL Tuning Workshop Oracle University Contact Us: + 38516306373 Oracle Database 11g: SQL Tuning Workshop Duration: 3 Days What you will learn This Oracle Database 11g: SQL Tuning Workshop Release 2 training assists database

More information

The MonetDB Architecture. Martin Kersten CWI Amsterdam. M.Kersten 2008 1

The MonetDB Architecture. Martin Kersten CWI Amsterdam. M.Kersten 2008 1 The MonetDB Architecture Martin Kersten CWI Amsterdam M.Kersten 2008 1 Try to keep things simple Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 2 End-user application

More information

Optimization Techniques in C. Team Emertxe

Optimization Techniques in C. Team Emertxe Optimization Techniques in C Team Emertxe Optimization Techniques Basic Concepts Programming Algorithm and Techniques Optimization Techniques Basic Concepts What is Optimization Methods Space and Time

More information

Distributed Database Design and Distributed Query Execution. Designing with distribution in mind: top-down

Distributed Database Design and Distributed Query Execution. Designing with distribution in mind: top-down Distributed Database Design and Distributed Query Execution Designing with distribution in mind: top-down 1 Data Fragmentation and Placement Fragmentation: How to split up the data into smaller fragments?

More information

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner Understanding SQL Server Execution Plans Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author

More information

1 11/11/2002. σ P.Dnum=D.Dnum and D.Dmgrssn=E.SSN and P.Plocation= Stafford 3 11/11/ /11/2002

1 11/11/2002. σ P.Dnum=D.Dnum and D.Dmgrssn=E.SSN and P.Plocation= Stafford 3 11/11/ /11/2002 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 17 Query optimization algorithms Query tree A representation of a select statement Canonical form of

More information
