Query Optimization


Introduction:

Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good query plan is identified. The chosen plan may or may not be the absolute best strategy, because there are many possible plans and there is a trade-off between the time spent searching for the best plan and the time spent running it. Different database management systems balance these two in different ways.

Cost-based query optimizers evaluate the resource footprint of various query plans and use it as the basis for plan selection. The resources typically costed are CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism. The set of query plans examined is formed by examining the possible access paths (e.g., primary index access, secondary index access, full file scan) and the various relational table join techniques (e.g., merge join, hash join, product join). The search space can become quite large, depending on the complexity of the SQL query.

There are two types of optimization. Logical optimization generates a sequence of relational algebra operations to solve the query. Physical optimization determines the means of carrying out each operation.

In An Introduction to Database Systems, C. J. Date discusses automatic optimization and gives several reasons why an optimizer might actually do better than a human. A good optimizer has a wealth of information that a human user typically does not have, such as certain statistical information:
o Number of distinct values of each type.
o Number of tuples currently appearing in each base relvar.
o Number of distinct values currently appearing in each attribute in each base relvar.
and so on.
As a result, the optimizer is able to make a more accurate assessment of the efficiency of any given strategy for implementing a particular request, and so is more likely to choose the most efficient implementation.

If the database statistics change over time, a different strategy might become preferable, so re-optimization might be required. In a relational system re-optimization is trivial: the system optimizer simply reprocesses the original query. In a non-relational system, re-optimization involves rewriting the program.

The optimizer is a program, so it is much more patient than a human. It can consider several hundred implementation strategies for a given request, whereas a human user would rarely consider more than three or four.

The optimizer embodies the skills and services of the best human programmers, so it makes those scarce skills available to everybody in an efficient and cost-effective manner.

These reasons are evidence that relational requests are optimizable, which is in fact a strength of relational systems.

Motivating Example:

Consider the following query on the suppliers-and-shipments database:

((SP JOIN S) WHERE P# = P#('P2')) {SNAME}

Suppose the database contains 100 suppliers and 10,000 shipments, of which 50 are for part P2. If the query is executed without optimization, the following sequence occurs:
1. Join SP and S (over S#). This step involves reading the 10,000 shipments, reading each of the 100 suppliers 10,000 times, constructing 10,000 joined tuples, and writing them back to disk.
2. Restrict the result of step 1 to just the tuples for part P2. This step involves reading the 10,000 joined tuples and produces a result consisting of 50 tuples.
3. Project the result of step 2 over SNAME. This step produces the desired result.

If the same query is executed with optimization, the following sequence occurs:

1. Restrict SP to just the tuples for part P2. This step reads the 10,000 shipments once and produces 50 tuples.
2. Join the result of step 1 to S (over S#). This step reads the 100 suppliers only once and produces 50 joined tuples.
3. Project the result of step 2 over SNAME. This step produces the desired result.

Execution without optimization involves a total of 1,030,000 tuple I/Os, whereas execution with optimization involves only 10,100. If the number of tuple I/Os is our measure, the second procedure is about 100 times better than the first. A simple change in the execution algorithm (restricting and then joining, instead of joining and then restricting) has produced a dramatic improvement in performance.

Performance improves even more dramatically if hashing or indexing on P# is available. The number of shipments read in step 1 drops from 10,000 to 50, and the number of suppliers read in step 2 drops from 100 to 50, which makes the procedure more than 10,000 times better than the original execution. In other words, if the unoptimized query took 3 hours to execute, the optimized query using hashing or indexing would take just over 1 second.

An Overview of Query Optimization:

We can identify four broad stages in query processing:
1. Cast the query into internal form.
2. Convert to canonical form.
3. Choose candidate low-level procedures.
4. Generate query plans and choose the cheapest.
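The tuple I/O figures quoted for the supplier and shipment example can be reproduced with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the function names are made up for illustration, and it assumes, as the example does, that the 50 restricted tuples stay in memory during the optimized join.

```python
# Tuple I/O counts for the supplier/shipment example (figures taken from the text).
SUPPLIERS = 100        # tuples in S
SHIPMENTS = 10_000     # tuples in SP
P2_SHIPMENTS = 50      # shipments for part P2


def unoptimized_io() -> int:
    """Join first, then restrict."""
    read_shipments = SHIPMENTS
    read_suppliers = SUPPLIERS * SHIPMENTS   # every supplier re-read per shipment
    write_joined = SHIPMENTS                 # joined tuples written back to disk
    reread_joined = SHIPMENTS                # restriction step reads them again
    return read_shipments + read_suppliers + write_joined + reread_joined


def optimized_io() -> int:
    """Restrict first, then join (assumes the 50 P2 tuples fit in memory)."""
    return SHIPMENTS + SUPPLIERS


def indexed_io() -> int:
    """Restrict via an index or hash, then fetch only the matching suppliers."""
    return P2_SHIPMENTS + P2_SHIPMENTS


if __name__ == "__main__":
    base = unoptimized_io()
    print("unoptimized:", base)
    print("optimized:  ", optimized_io(), f"(~{base / optimized_io():.0f}x better)")
    print("indexed:    ", indexed_io(), f"(~{base / indexed_io():.0f}x better)")
    # Scale a 3-hour unoptimized run by the indexed improvement factor.
    print("3 hours becomes about", round(3 * 3600 * indexed_io() / base, 2), "seconds")
```

The ratios come out at roughly 102 and 10,300, which is where the "100 times" and "10,000 times" figures come from.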

[Figure: Query processing overview]

Cast the query into internal form:

The original query is converted into an internal representation that is more suitable for machine manipulation, eliminating purely external considerations (such as syntax) and paving the way for the subsequent stages of the overall process. View processing is also done during this stage.

What formalism should the internal representation be based on? Whatever formalism is chosen, it must be rich enough to represent all queries in the external query language. It should also be neutral, in the sense that it should not prejudice subsequent choices. The internal form typically chosen is some kind of abstract syntax tree or query tree.

[Figure: Query tree for "Get names of suppliers who supply part P2"]

For the internal representation, however, it is convenient to choose a formalism we are already familiar with, namely relational algebra or relational calculus. The algebraic expression for the above tree is

((SP JOIN S) WHERE P# = P#('P2')) {SNAME}

Convert to canonical form:

In this stage, the optimizer performs a number of optimizations that are guaranteed to be good regardless of the actual data values and physical access paths. The point is that relational languages allow all but the simplest of queries to be expressed in a variety of ways that are at least superficially distinct, and not merely by replacing A = B with B = A. The performance of a query should not depend on the particular way the user chose to write it. The next step in processing is therefore to convert the internal representation into a canonical form, with the objective of eliminating such superficial distinctions.

Given a set Q of objects and a notion of equivalence among those objects, a subset C of Q is said to be a set of canonical forms for Q if every object q in Q is equivalent to exactly one object c in C.

To transform the output of stage 1 into an equivalent but more efficient form, the optimizer makes use of certain transformation rules. For example, the expression

(A JOIN B) WHERE restriction on A

can be transformed into the equivalent but more efficient expression

(A WHERE restriction on A) JOIN B

Choose candidate low-level procedures:

After converting the internal representation into a more desirable form, the optimizer must decide how to execute the transformed query. At this stage, considerations such as actual data values and physical access paths come into play.

The basic strategy is to consider the query as a sequence of low-level operations. The code for an operation may require its input tuples to be sorted in some order, in which case the preceding operation must produce its output tuples in that order.

For each possible low-level operation, the optimizer has a set of predefined implementation procedures. For example, the restriction operation has a set of implementation procedures: one for the case where the restriction is an equality comparison, one where the restriction attribute is indexed, and one where the restriction attribute is hashed.

Next, using catalog information about the current state of the database, the optimizer chooses one or more candidate procedures for each operation. This process is sometimes referred to as access path selection.

Generate query plans and choose the cheapest:

The final stage of the optimization process involves constructing a set of candidate query plans and then choosing the best of those plans. Each query plan is built by combining candidate procedures, one for each low-level operation in the query. It is not a good idea to analyze all possible plans, so it is better to use some heuristic technique to bound the set of plans considered. This is referred to as reducing the search space.

Choosing the cheapest plan obviously requires a method for assigning a cost to a plan. The cost of a given plan is the sum of the costs of the individual procedures that make it up. The problem is that the cost of a procedure depends on the size of the relations it processes. Since intermediate results are generated during execution, the optimizer has to estimate the size of those intermediate results, and those sizes depend on actual data values. Accurate cost estimation is therefore a difficult problem.

Expression Transformation:

In this section we describe some of the transformation rules that might be useful in stage 2 of the optimization process, and explain with examples why they are useful.
Given a particular expression to transform, the application of one rule might generate an expression that can then be transformed in accordance with some other rule. Starting from the original expression, the optimizer applies its transformation rules repeatedly until it arrives at an expression that it judges, according to some set of heuristics, to be optimal for the query.
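As a rough illustration of repeated rule application, the sketch below implements a single rule, the transformation of (A JOIN B) WHERE c into (A WHERE c) JOIN B, and applies it until the expression stops changing. The classes Rel, Join and Restrict and the functions push_down and optimize are hypothetical names invented for this example. A real optimizer would carry many more rules and a richer representation of conditions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Rel:
    name: str
    attrs: FrozenSet[str]


@dataclass(frozen=True)
class Join:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Restrict:
    child: "Expr"
    attr: str   # the single attribute the condition refers to (a simplification)
    cond: str   # condition text, treated as opaque


Expr = Union[Rel, Join, Restrict]


def attrs(e: Expr) -> FrozenSet[str]:
    """Attributes available in the result of expression e."""
    if isinstance(e, Rel):
        return e.attrs
    if isinstance(e, Join):
        return attrs(e.left) | attrs(e.right)
    return attrs(e.child)


def push_down(e: Expr) -> Expr:
    """One bottom-up pass of: (A JOIN B) WHERE c  ->  (A WHERE c) JOIN B."""
    if isinstance(e, Rel):
        return e
    if isinstance(e, Join):
        return Join(push_down(e.left), push_down(e.right))
    child = push_down(e.child)
    if isinstance(child, Join):
        in_left = e.attr in attrs(child.left)
        in_right = e.attr in attrs(child.right)
        if in_left and not in_right:
            return Join(push_down(Restrict(child.left, e.attr, e.cond)), child.right)
        if in_right and not in_left:
            return Join(child.left, push_down(Restrict(child.right, e.attr, e.cond)))
    return Restrict(child, e.attr, e.cond)


def optimize(e: Expr) -> Expr:
    """Apply the rule repeatedly until a fixed point is reached."""
    while True:
        new = push_down(e)
        if new == e:
            return e
        e = new


if __name__ == "__main__":
    s = Rel("S", frozenset({"S#", "SNAME", "STATUS", "CITY"}))
    sp = Rel("SP", frozenset({"S#", "P#", "QTY"}))
    query = Restrict(Join(sp, s), "P#", "P# = 'P2'")
    print(optimize(query))
```

Restrictions on a join attribute (one that appears on both sides) are deliberately left in place here. Pushing them to both operands would also be valid, but it is outside the single rule being illustrated.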

Restrictions & Projections:

It is generally better to do a restriction before a projection, because the restriction reduces the size of the input to the projection and hence the amount of data that might need to be sorted for duplicate elimination.

Distributivity:

The transformation rule used in the previous example (transforming a join followed by a restriction into a restriction followed by a join) is actually a special case of the distributive law. In general, the monadic operator f is said to be distributive over the dyadic operator o if and only if

f (A o B) = f (A) o f (B)

for all A and B. In ordinary arithmetic, for example, SQRT (for non-negative arguments) is distributive over multiplication, because

SQRT (A * B) = SQRT (A) * SQRT (B)

So an arithmetic expression optimizer can replace either of these expressions by the other when doing arithmetic expression transformation. As a counterexample, SQRT is not distributive over addition, because SQRT (A + B) is not in general equal to SQRT (A) + SQRT (B).

In relational algebra, restriction is distributive over union, intersection, and difference. It also distributes over join, provided that the restriction condition consists, at its most complex, of two simple restriction conditions ANDed together, one for each of the two join operands. In the suppliers example this requirement was indeed satisfied (in fact the condition was a simple restriction condition on just one of the operands), and so we could use the distributive law to replace the overall expression by a more efficient equivalent. The net effect was that we were able to do the restriction early. Doing restrictions early is a good idea because it reduces the number of tuples to be scanned in the next operation in sequence, and probably reduces the number of tuples in the output from that operation too.

Here are a couple more specific cases of the distributive law, this time involving projection. First, projection distributes over union, but not in general over intersection or difference (A and B must be of the same type, of course):

(A UNION B) {acl} = A {acl} UNION B {acl}

Second, projection also distributes over join, as long as the projection retains all of the join attributes, thus:

(A JOIN B) {acl} = (A {acl1}) JOIN (B {acl2})

Here acl1 is the union of the join attributes and those attributes of acl that appear in A only, and acl2 is the union of the join attributes and those attributes of acl that appear in B only.

These laws can be used to do projections early, which again is usually a good idea, for reasons similar to those given previously for restrictions.

Idempotent and Absorption:

Commutativity and Associativity:

Computational Expressions:

It is not just relational expressions that are subject to transformation laws. For instance, we have already indicated that certain transformations are valid for arithmetic expressions. Here is a specific example. The expression

A * B + A * C

can be transformed into

A * (B + C)

by virtue of the fact that * distributes over +. A relational optimizer needs to know about such transformations because it will encounter such expressions in the context of the extend and summarize operators.

Note, incidentally, that this example illustrates a slightly more general form of distributivity. Earlier, we defined distributivity in terms of a monadic operator distributing over a dyadic operator; in the case at hand, however, * and + are both

dyadic operators. In general, the dyadic operator δ is said to be distributive over the dyadic operator Ο if and only if

A δ (B Ο C) = (A δ B) Ο (A δ C)

for all A, B, C (in the arithmetic example, take δ as * and Ο as +).

Boolean Expressions:

We turn now to Boolean expressions. Suppose A and B are attributes of two distinct relations. Then the Boolean expression

A > B AND B > 3

is clearly equivalent to the following:

A > B AND B > 3 AND A > 3

The equivalence is based on the fact that the comparison operator ">" is transitive. This transformation is certainly worth making, because it enables the system to perform an additional restriction (on A) before doing the greater-than join implied by the comparison "A > B". To repeat a point made earlier, doing restrictions early is generally a good idea; having the system infer additional "early" restrictions, as here, is also a good idea.

Note: This technique is implemented in several commercial products, including, for example, DB2 (where it is called "predicate transitive closure") and Ingres.

Here is another example. The expression

A > B OR (C = D AND E < F)

can be transformed into

(A > B OR C = D) AND (A > B OR E < F)

by virtue of the fact that OR distributes over AND. This example illustrates another general law, namely that any Boolean expression can be transformed into an equivalent expression in what is called conjunctive normal form (CNF). A CNF expression is an expression of the form

C1 AND C2 AND ... AND Cn

where each of C1, C2, ..., Cn is, in turn, a Boolean expression (called a conjunct) that involves no ANDs. The advantage of CNF is that a CNF expression is true only if every conjunct is true; equivalently, it is false if any conjunct is false. Since AND is commutative, the optimizer can evaluate the individual conjuncts in any order it likes; in particular, it can do them in order of increasing difficulty. As soon as it finds one that is false, the whole process can stop.
Furthermore, in a parallel-processing system, it might even be possible to evaluate all of the conjuncts in parallel. Again, as soon as one is found that is false, the whole process can stop.
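The early-exit evaluation of a CNF expression can be sketched as follows. This is a minimal illustration, not how any particular product does it: the function eval_cnf, the CONJUNCTS list and the cost figures attached to each conjunct are all assumptions made up for the example. The conjuncts are the ones from the transitive-closure example, including the inferred restriction A > 3.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
# A conjunct is (estimated cost, predicate). The costs are invented for illustration.
Conjunct = Tuple[int, Callable[[Row], bool]]


def eval_cnf(conjuncts: List[Conjunct], row: Row) -> Tuple[bool, int]:
    """Evaluate C1 AND C2 AND ... AND Cn, cheapest conjunct first.

    Returns (result, number of conjuncts actually evaluated).
    """
    evaluated = 0
    for _cost, pred in sorted(conjuncts, key=lambda c: c[0]):
        evaluated += 1
        if not pred(row):
            return False, evaluated   # one false conjunct settles the answer
    return True, evaluated


# A > B AND B > 3 AND A > 3, with the comparison across relations assumed costlier.
CONJUNCTS: List[Conjunct] = [
    (5, lambda r: r["A"] > r["B"]),
    (1, lambda r: r["B"] > 3),
    (1, lambda r: r["A"] > 3),
]

if __name__ == "__main__":
    print(eval_cnf(CONJUNCTS, {"A": 2, "B": 1}))    # stops after the first cheap test
    print(eval_cnf(CONJUNCTS, {"A": 10, "B": 5}))   # all three conjuncts are needed
```

Sorting by estimated cost is one way of realizing "in order of increasing difficulty". A system could equally order conjuncts by selectivity so that the one most likely to be false is tried first.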

It follows from this subsection and its predecessor that the optimizer needs to know how general properties such as distributivity apply not only to relational operators such as join, but also to comparison operators such as ">", Boolean operators such as AND and OR, arithmetic operators such as "+", and so on.

Choice of Evaluation Plans:

Generation of expressions is only part of the query-optimization process, since each operation in the expression can be implemented with different algorithms. An evaluation plan is therefore needed to define exactly what algorithm should be used for each operation, and how the execution of the operations should be coordinated.

As we have seen, several different algorithms can be used for each relational operation, giving rise to alternative evaluation plans. Further, decisions about pipelining have to be made. In the figure, the edges from the selection operations to the merge join operation are marked as pipelined; pipelining is feasible if the selection operations generate their output sorted on the join attributes. They would do so if the indices on branch and account store records with equal values for the index attributes sorted by branch_name.

Interaction of Evaluation Techniques:

One way to choose an evaluation plan for a query expression is simply to choose for each operation the cheapest algorithm for evaluating it. We can choose any ordering of the operations that ensures that operations lower in the tree are executed before operations higher in the tree.

However, choosing the cheapest algorithm for each operation independently is not necessarily a good idea. Although a merge join at a given level may be costlier

than a hash join, it may provide a sorted output that makes evaluating a later operation (such as duplicate elimination, intersection, or another merge join) cheaper. Similarly, a nested-loop join with indexing may provide opportunities for pipelining the results to the next operation, and thus may be useful even if it is not the cheapest way of performing the join. To choose the best overall algorithm, we must consider even non-optimal algorithms for individual operations.

Thus, in addition to considering alternative expressions for a query, we must also consider alternative algorithms for each operation in an expression. We can use rules much like the equivalence rules to define what algorithms can be used for each operation, and whether its result can be pipelined or must be materialized. We can use these rules to generate all the query-evaluation plans for a given expression. Depending upon the indices available, certain selection operations can be evaluated using only an index, without accessing the relation itself.

That still leaves the problem of choosing the best evaluation plan for a query. There are two broad approaches: the first searches all the plans and chooses the best plan in a cost-based fashion; the second uses heuristics to choose a plan. Practical query optimizers incorporate elements of both approaches.

Cost-Based Optimization:

A cost-based optimizer generates a range of query-evaluation plans from the given query by using the equivalence rules, and chooses the one with the least cost. For a complex query, the number of different query plans that are equivalent to a given plan can be large. As an illustration, consider the expression

r1 ⋈ r2 ⋈ ... ⋈ rn

where the joins are expressed without any ordering. With n = 3, there are 12 different join orderings:

r1 ⋈ (r2 ⋈ r3)   r1 ⋈ (r3 ⋈ r2)   (r2 ⋈ r3) ⋈ r1   (r3 ⋈ r2) ⋈ r1
r2 ⋈ (r1 ⋈ r3)   r2 ⋈ (r3 ⋈ r1)   (r1 ⋈ r3) ⋈ r2   (r3 ⋈ r1) ⋈ r2
r3 ⋈ (r1 ⋈ r2)   r3 ⋈ (r2 ⋈ r1)   (r1 ⋈ r2) ⋈ r3   (r2 ⋈ r1) ⋈ r3

In general, with n relations, there are (2(n - 1))! / (n - 1)!
different join orders. For joins involving small numbers of relations, this number is acceptable; for example, with n = 5, the number is 1680. However, as n increases, this number rises quickly. With n = 7, the number is 665,280; with n = 10, the number is greater than 17.6 billion!

Luckily, it is not necessary to generate all the expressions equivalent to a given expression. For example, suppose we want to find the best join order of the form

(r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5

which represents all join orders where r1, r2, and r3 are joined first (in some order), and the result is joined (in some order) with r4 and r5. There are 12 different join orders for computing r1 ⋈ r2 ⋈ r3, and 12 orders for computing the join of this result

with r4 and r5. Thus, there appear to be 144 join orders to examine. However, once we have found the best join order for the subset of relations {r1, r2, r3}, we can use that order for further joins with r4 and r5, and can ignore all costlier join orders of r1 ⋈ r2 ⋈ r3. Thus, instead of 144 choices to examine, we need to examine only 12 + 12 choices.

Using this idea, we can develop a dynamic-programming algorithm for finding optimal join orders. Dynamic-programming algorithms store results of computations and reuse them, a procedure that can reduce execution time greatly. The procedure stores the evaluation plans it computes in an associative array bestplan, which is indexed by sets of relations. Each element of the associative array contains two components: the cost of the best plan for S, and the plan itself. The value of bestplan[S].cost is assumed to be initialized to ∞ if bestplan[S] has not yet been computed.

Dynamic-programming algorithm for join order optimization:

procedure FindBestPlan(S)
    if (bestplan[S].cost ≠ ∞) /* bestplan[S] already computed */
        return bestplan[S]
    if (S contains only 1 relation)
        set bestplan[S].plan and bestplan[S].cost based on best way of accessing S
    else for each non-empty subset S1 of S such that S1 ≠ S
        P1 = FindBestPlan(S1)
        P2 = FindBestPlan(S - S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost
            bestplan[S].cost = cost
            bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
    return bestplan[S]

The procedure first checks if the best plan for computing the join of the given set of relations S has been computed already (and stored in the associative array bestplan); if so, it returns the already computed plan. If S contains only one relation, the best way of accessing S (taking selections on S, if any, into account) is recorded in bestplan.
This may involve using an index to identify tuples, and then fetching the tuples (often referred to as an index scan), or scanning the entire relation (often referred to as a relation scan). Otherwise, the procedure tries every way of dividing S into two disjoint subsets. For each division, the procedure recursively finds the best plans for each of

the two subsets, and then computes the cost of the overall plan by using that division. The procedure picks the cheapest plan from among all the alternatives for dividing S into two sets. The cheapest plan and its cost are stored in the array bestplan, and returned by the procedure. The time complexity of the procedure can be shown to be O(3^n).

Actually, the order in which tuples are generated by the join of a set of relations is also important for finding the best overall join order, since it can affect the cost of further joins (for instance, if merge join is used). A particular sort order of the tuples is said to be an interesting sort order if it could be useful for a later operation. For instance, generating the result of r1 ⋈ r2 ⋈ r3 sorted on the attributes common with r4 and r5 may be useful, but generating it sorted on the attributes common to only r1 and r2 is not useful. Using merge join for computing r1 ⋈ r2 ⋈ r3 may be costlier than using some other join technique, but may provide an output sorted in an interesting sort order.

Hence, it is not sufficient to find the best join order for each subset of the set of n given relations. Instead, we have to find the best join order for each subset, for each interesting sort order of the join result for that subset. The number of subsets of n relations is 2^n. The number of interesting sort orders is generally not large. Thus, about 2^n join expressions need to be stored. The dynamic-programming algorithm for finding the best join order can be easily extended to handle sort orders. The cost of the extended algorithm depends on the number of interesting orders for each subset of relations; since this number has been found to be small in practice, the cost remains at O(3^n). With n = 10, this number is around 59,000, which is much better than the 17.6 billion different join orders.
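The FindBestPlan procedure and the join-order counts quoted in this section can be sketched in Python. This is a toy under stated assumptions. The cardinalities in CARD, the uniform SELECTIVITY, and the cost model (a join costs the sum of its input sizes) are all invented for illustration. Memoization through lru_cache stands in for the bestplan array, and interesting sort orders are not modeled.

```python
from functools import lru_cache
from itertools import combinations
from math import factorial
from typing import Dict, FrozenSet, Tuple

# Hypothetical cardinalities; a real optimizer would read these from the catalog.
CARD: Dict[str, int] = {"r1": 1000, "r2": 100, "r3": 10, "r4": 500}
SELECTIVITY = 0.01  # assumed uniform join selectivity, for illustration only


def count_join_orders(n: int) -> int:
    """Number of join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


def result_size(s: FrozenSet[str]) -> float:
    """Estimated tuple count of the join of all relations in s (toy model)."""
    size = 1.0
    for r in s:
        size *= CARD[r]
    return size * SELECTIVITY ** (len(s) - 1)


@lru_cache(maxsize=None)
def find_best_plan(s: FrozenSet[str]) -> Tuple[float, str]:
    """Return (cost, plan) for joining the relations in s.

    The lru_cache plays the role of the bestplan associative array.
    """
    if len(s) == 1:
        (r,) = s
        return float(CARD[r]), r                 # cost of a full relation scan
    best: Tuple[float, str] = (float("inf"), "")
    members = sorted(s)
    for k in range(1, len(members)):             # every non-empty proper subset S1
        for left in combinations(members, k):
            s1 = frozenset(left)
            c1, p1 = find_best_plan(s1)
            c2, p2 = find_best_plan(s - s1)
            # Toy join cost: read both inputs once.
            cost = c1 + c2 + result_size(s1) + result_size(s - s1)
            if cost < best[0]:
                best = (cost, f"({p1} JOIN {p2})")
    return best


if __name__ == "__main__":
    for n in (3, 5, 7, 10):
        print(n, "relations:", count_join_orders(n), "join orders;", 3 ** n, "= 3^n")
    print(find_best_plan(frozenset(CARD)))
```

As in the pseudocode, each division of S is visited twice (once as S1 and once as S - S1), which keeps the sketch close to the procedure at the price of some redundant work.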
More important, the storage required is much less than before, since we need to store only one join order for each interesting sort order of each of the 1024 subsets of r1, ..., r10. Although both numbers still increase rapidly with n, commonly occurring joins usually have fewer than 10 relations, and can be handled easily.

We can use several techniques to reduce further the cost of searching through a large number of plans. For instance, when examining the plans for an expression, we can terminate after we examine only a part of the expression, if we determine that the cheapest plan for that part is already costlier than the cheapest evaluation plan for a full expression examined earlier. Similarly, suppose that we determine that the cheapest way of evaluating a sub-expression is costlier than the cheapest evaluation plan for a full expression examined earlier. Then, no full expression involving that sub-expression needs to be examined. We can further reduce the number of evaluation plans that need to be considered fully by first making a heuristic guess of a good plan, and estimating that plan's cost. Then, only a few competing plans will require a full analysis of cost. These optimizations can reduce the overhead of query optimization significantly.

The intricacies of SQL introduce a good deal of complexity into query optimizers. The approach to optimization described above concentrates on join-order

optimization. In contrast, the optimizers used in some other systems, notably Microsoft SQL Server, are based on equivalence rules. The benefit of using equivalence rules is that it is easy to extend the optimizer with new rules. For example, nested queries can be represented using extended relational-algebra constructs, and transformations of nested queries can be expressed as equivalence rules. To make the approach work efficiently requires efficient techniques for detecting duplicate derivations, and a form of dynamic programming to avoid reoptimizing the same sub-expressions. This approach was pioneered by the Volcano research project.

Advanced Types of Optimization:

In this section, we attempt to provide a brief glimpse of advanced types of optimization that researchers have proposed over the past few years. The descriptions are based on examples only; further details may be found in the references provided. Furthermore, there are several issues that are not discussed at all due to lack of space, although much interesting work has been done on them, e.g., nested query optimization, rule-based query optimization, query optimizer generators, object-oriented query optimization, optimization with materialized views, heterogeneous query optimization, recursive query optimization, aggregate query optimization, optimization with expensive selection predicates, and query optimizer validation.

1. Semantic Query Optimization:

Semantic query optimization is a form of optimization mostly related to the Rewriter module. The basic idea lies in using integrity constraints defined in the database to rewrite a given query into semantically equivalent ones [Kin81]. These can then be optimized by the Planner as regular queries, and the most efficient plan among all can be used to answer the original query. As a simple example, using a hypothetical SQL-like syntax, consider the following integrity constraint:

ASSERT sal-constraint ON emp: sal > 100K WHERE job = "Sr. Programmer".
Also consider the following query:

SELECT name, floor FROM emp, dept WHERE emp.dno = dept.dno AND job = "Sr. Programmer".

Using the above integrity constraint, the query can be rewritten into a semantically equivalent one to include a selection on sal:

SELECT name, floor FROM emp, dept WHERE emp.dno = dept.dno AND job = "Sr. Programmer" AND sal > 100K.

Having the extra selection could help tremendously in finding a fast plan to answer the query if the only index in the database is a B+-tree on emp.sal. On the other hand, it would certainly be a waste if no such index exists. For such reasons, all proposals for semantic query optimization present various heuristics or rules on which rewritings have the potential of being beneficial and should be applied, and which should not.

2. Global Query Optimization:

So far, we have focused our attention on optimizing individual queries. Quite often, however, multiple queries become available for optimization at the same time, e.g., queries with unions, queries from multiple concurrent users, queries embedded in a single program, or queries in a deductive system. Instead of optimizing each query separately, one may be able to obtain a global plan that, although possibly suboptimal for each individual query, is optimal for the execution of all of them as a group. Several techniques have been proposed for global query optimization [Sel88]. As a simple example of the problem of global optimization, consider the following two queries:

SELECT name, floor FROM emp, dept WHERE emp.dno = dept.dno AND job = "Sr. Programmer",

SELECT name FROM emp, dept WHERE emp.dno = dept.dno AND budget > 1M.

Depending on the sizes of the emp and dept relations and the selectivity of the selections, it may well be that computing the entire join once and then applying the two selections separately to obtain the results of the two queries is more efficient than doing the join twice, each time taking into account the corresponding selection. Developing Planner modules that would examine all the available global plans and identify the optimal one is the goal of global/multiple query optimizers.

3. Parametric/Dynamic Query Optimization:

As mentioned earlier, embedded queries are typically optimized once at compile time and are executed multiple times at run time. Because of this temporal separation between optimization and execution, the values of various parameters that are used during optimization may be very different during execution. This may make the chosen plan invalid (e.g., if indices used in the plan are no longer available) or simply not optimal (e.g., if the number of available buffer pages or operator selectivities have changed, or if new indices have become available). To address this issue, several techniques have been proposed that use various search strategies (e.g., randomized algorithms or the strategy of Volcano) to optimize queries as much as possible at compile time, taking into account all possible values that interesting parameters may have at run time. These techniques use the actual parameter values

17 at run time, and simply pick the plan that was found optimal for them with little or no overhead. Of a drastically different flavor is the technique of Rdb/VMS [Ant93], where by dynamically monitoring how the probability distribution of plan costs changes, plan switching may actually occur during query execution. Estimation of Query-Processing Cost: 1. To choose a strategy based on reliable information, the database system may store statistics for each relation r: o nr - The number of tuples in r. o sr - The size in bytes of a tuple of r (for fixed-length records). o V (A, r) - the number of distinct values that appear in relation r for attribute A. 2. The first two quantities allow us to estimate accurately the size of a Cartesian product. o The Cartesian product r s contains nr ns tuples. o Each tuple of r s occupies sr + ss bytes. o The third statistic is used to estimate how many tuples satisfy a selection predicate of the form o <attribute-name> = <value> o We need to know how often each value appears in a column. o If we assume each value appears with equal probability, then σa = a (r) is estimated to have tuples. o This may not be the case, but it is a good approximation of reality in many relations. o We assume such a uniform distribution for the rest of this chapter. o Estimation of the size of a natural join is more difficult. o Let r1 (R1) and r1 (R1) be relations on schemes R1 and R2. o If R1 R2 = Φ (no common attributes), then r1 can estimate the size of this accurately. r2 is the same as r s and we o If R1 R2 is a key for R1, then we know that a tuple of r2 will join with exactly one tuple of r1. o Thus the number of tuples in r1 r2 will be no greater than nr2. o If R1 R2 is not a key for R1 or R2, things are more difficult.

o We use the third statistic and the assumption of uniform distribution.
o Assume $R_1 \cap R_2 = \{A\}$.
o We assume there are $n_{r_2} / V(A, r_2)$ tuples in $r_2$ with an $A$ value of $t[A]$ for tuple $t$ in $r_1$.
o So tuple $t$ of $r_1$ produces $n_{r_2} / V(A, r_2)$ tuples in $r_1 \bowtie r_2$.
3. Considering all the tuples in $r_1$, we estimate that there are $n_{r_1} \cdot n_{r_2} / V(A, r_2)$ tuples in total in $r_1 \bowtie r_2$.
4. If we reverse the roles of $r_1$ and $r_2$ in this equation, we get a different estimate, $n_{r_1} \cdot n_{r_2} / V(A, r_1)$, if $V(A, r_1) \neq V(A, r_2)$.
o If this occurs, there are likely to be some dangling tuples that do not participate in the join.
o Thus the lower estimate is probably the better one.
o This estimate may still be high if the $A$ values in $r_1$ have few values in common with the $A$ values in $r_2$.
o However, it is unlikely that the estimate is far off, as dangling tuples are likely to be a small fraction of the tuples in a real-world relation.
5. To maintain accurate statistics, it is necessary to update the statistics whenever a relation is modified. This overhead can be substantial, so most systems do this updating during periods of light load on the system.

Guidelines: For any production database, SQL query performance becomes an issue sooner or later. Long-running queries not only consume system resources, which makes the server and application run slowly, but may also lead to table locking and data corruption issues. So query optimization becomes an important task. First, we offer some guiding principles for query optimization:

1. Understand how your database is executing your query
Nowadays all databases have their own query optimizer and offer a way for users to understand how a query is executed. For example, which index from which table is being used to execute the query? The first step in query optimization is understanding what the database is doing. Different databases have different commands for this. For example, in MySQL, one can use the "EXPLAIN [SQL Query]" statement to see the query plan. In Oracle, one can use "EXPLAIN PLAN FOR [SQL Query]" to see the query plan.

2. Retrieve as little data as possible
The more data returned from the query, the more resources the database needs to expend to process and store that data. So, for example, if you only need to retrieve one column from a table, do not use 'SELECT *'.

3. Store intermediate results
Sometimes the logic for a query can be quite complex. Often, it is possible to achieve the desired result through the use of subqueries, inline views, and UNION-type statements. In those cases, the intermediate results are not stored in the database but are immediately used within the query. This can lead to performance issues, especially when the intermediate results have a large number of rows. The way to increase query performance in those cases is to store the intermediate results in a temporary table and break up the initial SQL statement into several SQL statements. In many cases, you can even build an index on the temporary table to speed up the query even more. Granted, this adds a little complexity in query management (i.e., the need to manage temporary tables), but the speedup in query performance is often worth the trouble.

Below are several specific query optimization strategies.

Use Index: Using an index is the first strategy one should use to speed up a query. In fact, this strategy is so important that index optimization is also discussed.

Aggregate Table: Pre-populate tables at higher levels of aggregation so that less data needs to be parsed.

Vertical Partitioning: Partition the table by columns. This strategy decreases the amount of data a SQL query needs to process.

Horizontal Partitioning: Partition the table by data value, most often time. This strategy decreases the amount of data a SQL query needs to process.

De-normalization: The process of de-normalization combines multiple tables into a single table. This speeds up query performance because fewer table joins are needed.

Server Tuning: Each server has its own parameters, and tuning them so that the server can take full advantage of the hardware resources can often speed up query performance significantly.

References:
An Introduction to Database Systems, Eighth Edition - C. J. Date
Database System Concepts, Fifth Edition - Silberschatz, Korth, Sudarshan
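As a worked illustration of the section on estimation of query-processing cost, the following Python sketch computes the size estimates for a Cartesian product, an equality selection, and a natural join on a single common attribute. It assumes the uniform-distribution model described in that section; the class name RelationStats, the function names, and all numbers in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RelationStats:
    """Catalog statistics for one relation r (hypothetical container)."""
    n: int                    # n_r: number of tuples in r
    tuple_size: int           # s_r: size of one tuple in bytes
    distinct: Dict[str, int]  # V(A, r): distinct values per attribute A


def cartesian_size(r: RelationStats, s: RelationStats) -> Tuple[int, int]:
    """Return (tuple count, bytes per tuple) of r x s: n_r * n_s and s_r + s_s."""
    return r.n * s.n, r.tuple_size + s.tuple_size


def selection_size(r: RelationStats, attr: str) -> float:
    """Estimate |sigma_{attr = value}(r)| as n_r / V(attr, r)."""
    return r.n / r.distinct[attr]


def join_size(r1: RelationStats, r2: RelationStats, attr: str) -> float:
    """Estimate the size of the natural join of r1 and r2 on one attribute.

    Both n_r1 * n_r2 / V(attr, r2) and n_r1 * n_r2 / V(attr, r1) are
    computed, and the lower of the two is returned.
    """
    estimate_via_r2 = r1.n * r2.n / r2.distinct[attr]
    estimate_via_r1 = r1.n * r2.n / r1.distinct[attr]
    return min(estimate_via_r1, estimate_via_r2)


# Hypothetical statistics, not taken from any real database.
r1 = RelationStats(n=1000, tuple_size=40, distinct={"A": 50})
r2 = RelationStats(n=200, tuple_size=24, distinct={"A": 100})

print(cartesian_size(r1, r2))   # tuple count and bytes per tuple of r1 x r2
print(selection_size(r1, "A"))  # expected matches for A = <value> in r1
print(join_size(r1, r2, "A"))   # lower of the two join estimates
```

With these made-up statistics the two join estimates differ (2000 versus 4000 tuples), and the sketch returns the lower one, following the reasoning about dangling tuples given in the text.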

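Guideline 1 (understanding how the database executes a query) can be tried out without a MySQL or Oracle server. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement as a stand-in for the EXPLAIN commands named in the guidelines; it is not the MySQL or Oracle syntax. The table orders, its columns, and the index name idx_orders_customer are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable plan steps SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the textual description of the step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)

# Guideline 2: name only the column that is needed instead of SELECT *.
query = "SELECT amount FROM orders WHERE customer_id = 42"

# Without an index on customer_id, the plan is a scan of the whole table.
plan_before = query_plan(conn, query)

# "Use Index": after the index exists, the plan should refer to it.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, query)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only prints it; the point is that the same query is reported as a table scan before the index is created and as an index search afterwards.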

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR 1 2 2 3 In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR The uniqueness of the primary key ensures that

More information

Toad for Oracle 8.6 SQL Tuning

Toad for Oracle 8.6 SQL Tuning Quick User Guide for Toad for Oracle 8.6 SQL Tuning SQL Tuning Version 6.1.1 SQL Tuning definitively solves SQL bottlenecks through a unique methodology that scans code, without executing programs, to

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the

More information

8 Divisibility and prime numbers

8 Divisibility and prime numbers 8 Divisibility and prime numbers 8.1 Divisibility In this short section we extend the concept of a multiple from the natural numbers to the integers. We also summarize several other terms that express

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Fragmentation and Data Allocation in the Distributed Environments

Fragmentation and Data Allocation in the Distributed Environments Annals of the University of Craiova, Mathematics and Computer Science Series Volume 38(3), 2011, Pages 76 83 ISSN: 1223-6934, Online 2246-9958 Fragmentation and Data Allocation in the Distributed Environments

More information

Programming Languages

Programming Languages Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately

More information

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: +966 12 739 894 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training is designed to

More information

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Querying RDF data stored in DBMS: SPARQL to SQL Conversion Yale University technical report #1409 ABSTRACT This paper discusses the design and implementation

More information

Efficient Data Structures for Decision Diagrams

Efficient Data Structures for Decision Diagrams Artificial Intelligence Laboratory Efficient Data Structures for Decision Diagrams Master Thesis Nacereddine Ouaret Professor: Supervisors: Boi Faltings Thomas Léauté Radoslaw Szymanek Contents Introduction...

More information

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods Overview 1 Determining Data Relationships 1 Understanding the Methods for Combining SAS Data Sets 3

More information

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML? CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 Introduction to Databases CS2 Spring 2005 (LN5) 1 Why databases? Why not use XML? What is missing from XML: Consistency

More information