Database Management Systems

DBMS Architecture

[Figure: DBMS architecture. Components: instruction optimizer, management of access methods, buffer manager, concurrency control, reliability management. Database: index files, data files, system catalog.]

Optimizer

- It selects an efficient strategy for query execution
- It is a fundamental building block of a relational DBMS
- It guarantees the data independence property
    - The form in which the query is written does not affect the way in which it is implemented
    - A physical reorganization of data does not require rewriting queries
- It automatically generates a query execution plan
    - The plan was formerly hard-coded by a programmer
- The automatically generated execution plan is usually more efficient
    - It evaluates many different alternatives
    - It exploits statistics on data, stored in the system catalog, to make decisions
    - It exploits the best known strategies
    - It dynamically adapts to changes in the data distribution

Lexical, syntactic and semantic analysis

- Analysis of a statement to detect
    - Lexical errors, e.g., misspelled keywords
    - Syntactic errors: errors in the grammar of the language
    - Semantic errors: references to objects which do not actually exist in the database (e.g., attributes or tables)
        - information in the data dictionary is needed
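The semantic check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DBMS implements it: `CATALOG` and `semantic_errors` are hypothetical names, and the toy catalog lists only the EMP and DEPT attributes that are legible in the example used later in these slides.

```python
# Hypothetical data dictionary: table name -> set of attribute names.
CATALOG = {
    "EMP": {"Emp#", "Dept#", "Salary"},
    "DEPT": {"Dept#", "DName"},
}


def semantic_errors(tables, attributes, catalog=CATALOG):
    """Return the semantic errors found in a statement.

    tables: table names referenced by the statement.
    attributes: (table, attribute) pairs referenced by the statement.
    """
    errors = []
    for table in tables:
        if table not in catalog:
            errors.append(f"unknown table: {table}")
    for table, attribute in attributes:
        # Attributes of unknown tables are already covered by the table error.
        if table in catalog and attribute not in catalog[table]:
            errors.append(f"unknown attribute: {table}.{attribute}")
    return errors
```

A statement that references only existing objects yields an empty list, while a reference to a table or attribute missing from the catalog is reported without touching any data.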
- Output of the analysis
    - Internal representation in (extended) relational algebra
- Why relational algebra?
    - It explicitly represents the order in which operators are applied
    - It is procedural (different from SQL)
    - There is a corpus of theorems and properties exploited to modify the initial query tree

Algebraic optimization

- Execution of algebraic transformations considered to be always beneficial
    - Example: anticipation of selection with respect to join
- It should eliminate the difference among different formulations of the same query
- This step is usually independent of the data distribution
- Output
    - Query tree in canonical form

Cost based optimization

- Selection of the best execution plan by evaluating execution cost
    - Selection of
        - the best access method for each table
        - the best algorithm for each relational operator among available alternatives
- Based on a cost model for access methods and algorithms
- Generation of the code implementing the best strategy
- Output
    - Access program in executable format
        - It exploits the internal structures of the DBMS
    - Set of dependencies
        - conditions on which the validity of the query plan depends
        - e.g., the existence of an index
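To make the cost based step concrete, here is a minimal Python sketch that compares two access methods for one table and records the dependency of the chosen plan. The cost formulas are assumptions made for illustration and do not come from the slides: a full scan is charged one read per page, and an index lookup is charged the index height plus one page read per matching tuple. The function name and its parameters are hypothetical.

```python
def choose_access_method(n_pages, n_matching, index_height=None):
    """Pick the cheapest access method under a toy cost model (page reads).

    n_pages: pages occupied by the table.
    n_matching: tuples expected to satisfy the predicate (from statistics).
    index_height: height of an index on the predicate attribute, or None
                  if no such index exists.
    """
    costs = {"full scan": n_pages}
    if index_height is not None:
        # Assumed worst case: every matching tuple lies on a different page.
        costs["index lookup"] = index_height + n_matching
    method = min(costs, key=costs.get)
    # The plan is valid only while the objects it relies on exist.
    dependencies = ["index exists"] if method == "index lookup" else []
    return method, costs[method], dependencies
```

With 1,000 pages, 50 matching tuples and an index of height 3, the sketch selects the index lookup at cost 53 and reports the index as a dependency. With 5,000 matching tuples the full scan at cost 1,000 wins and the plan has no dependencies.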
Execution modes

- Compile and go
    - Compilation and immediate execution of the statement
    - No storage of the query plan
    - Dependencies are not needed
- Compile and store
    - The access plan is stored in the database together with its dependencies
    - It is executed on demand
    - It should be recompiled when the data structure changes

[Figure: optimizer pipeline. Canonical tree, cost based optimization with profiles (statistics on data), access program, set of dependencies.]

Algebraic optimization

- It is based on equivalence transformations
    - Two relational expressions are equivalent if they both produce the same query result for any arbitrary database instance
- Interesting transformations
    - reduce the size of the intermediate result to be stored in memory
    - prepare an expression for the application of a transformation which reduces the size of the intermediate result
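Returning to the compile and store mode described in this section, the following Python sketch shows one way a stored plan and its dependencies could interact. It is a toy model under stated assumptions: `PlanStore`, `toy_compile` and the index name `idx_emp_salary` are hypothetical, and dependencies are reduced to a set of index names.

```python
class PlanStore:
    """Toy 'compile and store' cache: plans are kept with their dependencies."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # (query, indexes) -> (plan, dependencies)
        self.plans = {}               # query text -> (plan, frozenset of deps)
        self.compilations = 0

    def get_plan(self, query, existing_indexes):
        entry = self.plans.get(query)
        # Recompile if nothing is stored or a dependency no longer holds.
        if entry is None or not entry[1] <= existing_indexes:
            plan, deps = self.compile_fn(query, existing_indexes)
            self.plans[query] = (plan, frozenset(deps))
            self.compilations += 1
        return self.plans[query][0]


def toy_compile(query, existing_indexes):
    # Hypothetical compiler: use an index on EMP.Salary when it exists.
    if "idx_emp_salary" in existing_indexes:
        return "index scan on idx_emp_salary", {"idx_emp_salary"}
    return "full scan on EMP", set()
```

Calling `get_plan` twice while the index exists compiles only once. After the index is dropped, the stored dependency fails and the next call recompiles to a full scan. Note that this sketch recompiles only when a dependency is violated, so creating a new index later does not trigger a better plan.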
1. Atomization of selection
   $\sigma_{F_1 \land F_2}(E) \equiv \sigma_{F_2}(\sigma_{F_1}(E)) \equiv \sigma_{F_1}(\sigma_{F_2}(E))$

2. Cascading projections
   $\pi_X(E) \equiv \pi_X(\pi_{X,Y}(E))$

3. Anticipation of selection with respect to join (pushing selection down)
   $\sigma_F(E_1 \bowtie E_2) \equiv E_1 \bowtie (\sigma_F(E_2))$
   - $F$ is a predicate on attributes in $E_2$ only

4. Anticipation of projection with respect to join
   $\pi_L(E_1 \bowtie_p E_2) \equiv \pi_L((\pi_{L_1,J}(E_1)) \bowtie_p (\pi_{L_2,J}(E_2)))$
   - $L_1 = L - \mathrm{Schema}(E_2)$
   - $L_2 = L - \mathrm{Schema}(E_1)$
   - $J$ = set of attributes needed to evaluate join predicate $p$

5. Join derivation from Cartesian product
   $\sigma_F(E_1 \times E_2) \equiv E_1 \bowtie_F E_2$
   - predicate $F$ only relates attributes in $E_1$ and $E_2$

6. Distribution of selection with respect to union
   $\sigma_F(E_1 \cup E_2) \equiv (\sigma_F(E_1)) \cup (\sigma_F(E_2))$
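Properties 1 and 3 can be checked on a small instance. The Python sketch below represents relations as lists of dictionaries. The helper names `select` and `natural_join` and the sample tuples are assumptions made for illustration, and a check on one instance is of course not a proof of equivalence.

```python
def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def natural_join(r, s, attr):
    """Natural join of r and s on the single shared attribute attr."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


e1 = [{"Dept#": 10, "DName": "Sales"}, {"Dept#": 20, "DName": "R&D"}]
e2 = [
    {"Emp#": 1, "Dept#": 10, "Salary": 900},
    {"Emp#": 2, "Dept#": 10, "Salary": 1500},
    {"Emp#": 3, "Dept#": 20, "Salary": 2000},
]
f1 = lambda t: t["Salary"] > 1000  # predicate on attributes of e2 only
f2 = lambda t: t["Dept#"] == 10

# Property 1: a conjunction equals a cascade of selections, in either order.
both = select(e2, lambda t: f1(t) and f2(t))
cascade_12 = select(select(e2, f1), f2)
cascade_21 = select(select(e2, f2), f1)

# Property 3: selecting after the join equals joining after the selection.
joined = natural_join(e1, e2, "Dept#")
late = select(joined, f1)
filtered = select(e2, f1)
early = natural_join(e1, filtered, "Dept#")

print(both == cascade_12 == cascade_21)  # True
print(late == early)                     # True
print(len(joined), len(filtered))        # 3 2
```

The last line shows why the transformation is interesting: the early selection feeds 2 tuples into the join instead of building a 3-tuple intermediate result first.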
7. Distribution of selection with respect to difference
   $\sigma_F(E_1 - E_2) \equiv (\sigma_F(E_1)) - (\sigma_F(E_2)) \equiv (\sigma_F(E_1)) - E_2$

8. Distribution of projection with respect to union
   $\pi_X(E_1 \cup E_2) \equiv (\pi_X(E_1)) \cup (\pi_X(E_2))$
   - Can projection be distributed with respect to difference?
     $\pi_X(E_1 - E_2) \not\equiv (\pi_X(E_1)) - (\pi_X(E_2))$
   - This equivalence only holds if $X$ includes the primary key or a set of attributes with the same properties (unique and not null)

9. Other properties
   $\sigma_{F_1 \lor F_2}(E) \equiv (\sigma_{F_1}(E)) \cup (\sigma_{F_2}(E))$
   $\sigma_{F_1 \land F_2}(E) \equiv (\sigma_{F_1}(E)) \cap (\sigma_{F_2}(E))$

10. Distribution of join with respect to union
   $E \bowtie (E_1 \cup E_2) \equiv (E \bowtie E_1) \cup (E \bowtie E_2)$

All binary operators are commutative and associative except for difference.
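A small counterexample makes the remark on projection and difference in property 8 tangible. In this Python sketch relations are sets of tuples, and the first position is assumed to act as the key. The `project` helper and the sample data are made up for illustration.

```python
def project(rel, positions):
    """Projection with duplicate elimination on the given tuple positions."""
    return {tuple(t[i] for i in positions) for t in rel}


# Tuples are (key, value); the key is unique within each relation.
e1 = {(1, "a"), (2, "a")}
e2 = {(1, "a")}

# Projecting on the non-key attribute: the two sides differ.
lhs_value = project(e1 - e2, [1])
rhs_value = project(e1, [1]) - project(e2, [1])
print(lhs_value, rhs_value)  # {('a',)} set()

# Projecting on the key attribute: the two sides agree on this instance.
lhs_key = project(e1 - e2, [0])
rhs_key = project(e1, [0]) - project(e2, [0])
print(lhs_key == rhs_key)  # True
```

Without the key, the tuple (2, "a") survives the difference but its projection "a" is removed on the right-hand side, because "a" also appears in the projection of `e2`.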
Example

- Tables
    EMP (Emp#, ..., Dept#, Salary)
    DEPT (Dept#, DName, ...)
- SQL query
    SELECT DISTINCT DName
    FROM EMP, DEPT
    WHERE EMP.Dept# = DEPT.Dept# AND Salary > 1000;
- Initial relational algebra expression
    $\pi_{\text{DName}}(\sigma_{\text{EMP.Dept\#} = \text{DEPT.Dept\#} \,\land\, \text{Salary} > 1000}(\text{EMP} \times \text{DEPT}))$

Transformation steps

- Prop #1
    $\pi_{\text{DName}}(\sigma_{\text{Salary} > 1000}(\sigma_{\text{EMP.Dept\#} = \text{DEPT.Dept\#}}(\text{EMP} \times \text{DEPT})))$
- Prop #5
    $\pi_{\text{DName}}(\sigma_{\text{Salary} > 1000}(\text{EMP} \bowtie \text{DEPT}))$
- Prop #3
    $\pi_{\text{DName}}((\sigma_{\text{Salary} > 1000}(\text{EMP})) \bowtie \text{DEPT})$
- Prop #2 and #4
    $\pi_{\text{DName}}((\pi_{\text{Dept\#}}(\sigma_{\text{Salary} > 1000}(\text{EMP}))) \bowtie (\pi_{\text{Dept\#},\text{DName}}(\text{DEPT})))$
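The example can be tried with Python's built-in `sqlite3` module. The sketch below runs the original query and a hand-written formulation that mirrors the final expression (selection and projection pushed below the join), then compares the results. The sample rows and the function name are made up, only the attributes legible in the example are declared, and `Dept#` and `Emp#` are double-quoted because `#` is not allowed in unquoted SQLite identifiers.

```python
import sqlite3


def run_both_formulations():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE EMP ("Emp#" INTEGER, "Dept#" INTEGER, Salary INTEGER)')
    conn.execute('CREATE TABLE DEPT ("Dept#" INTEGER, DName TEXT)')
    conn.executemany(
        "INSERT INTO EMP VALUES (?, ?, ?)",
        [(1, 10, 900), (2, 10, 1500), (3, 20, 2000), (4, 30, 800)],
    )
    conn.executemany(
        "INSERT INTO DEPT VALUES (?, ?)",
        [(10, "Sales"), (20, "R&D"), (30, "Admin")],
    )
    original = '''
        SELECT DISTINCT DName
        FROM EMP, DEPT
        WHERE EMP."Dept#" = DEPT."Dept#" AND Salary > 1000
    '''
    # Mirrors the final expression: select and project before joining.
    rewritten = '''
        SELECT DISTINCT D.DName
        FROM (SELECT "Dept#" FROM EMP WHERE Salary > 1000) AS E
        JOIN (SELECT "Dept#", DName FROM DEPT) AS D
          ON E."Dept#" = D."Dept#"
    '''
    first = {row[0] for row in conn.execute(original)}
    second = {row[0] for row in conn.execute(rewritten)}
    conn.close()
    return first, second
```

Both formulations return the same set of department names on this instance. A real optimizer would normally choose its own plan for either formulation, so the rewritten SQL only illustrates the shape of the final expression and does not force that execution strategy.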
Example: Query tree

Final query tree

              π DName
                 |
                 ⋈
               /   \
        π Dept#     π Dept#,DName
           |              |
    σ Salary>1000        DEPT
           |
          EMP

Example: Cardinalities

- Cardinality (EMP): 10,000
- Cardinality (DEPT): 100
- Cardinality (EMP where Salary > 1000): 50
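The cardinalities above show why the final tree is preferable to the initial expression. The short derivation below compares the sizes of the intermediate results. The last bound relies on an assumption not stated in the slides, namely that Dept# is a key of DEPT, so each selected employee joins with at most one department.

```latex
\begin{align*}
|\text{EMP} \times \text{DEPT}| &= 10{,}000 \cdot 100 = 1{,}000{,}000
  && \text{(initial expression: Cartesian product first)} \\
|\sigma_{\text{Salary} > 1000}(\text{EMP})| &= 50
  && \text{(final tree: selection first)} \\
|\sigma_{\text{Salary} > 1000}(\text{EMP}) \bowtie \text{DEPT}| &\le 50
  && \text{(assuming Dept\# is a key of DEPT)}
\end{align*}
```

Pushing the selection below the join therefore replaces an intermediate result of one million tuples with one of at most 50 tuples.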