Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Similar documents
Physical DB design and tuning: outline

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Physical Database Design and Tuning

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

1. Physical Database Design in Relational Databases (1)

CIS 631 Database Management Systems Sample Final Exam


Advanced Oracle SQL Tuning

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

Oracle Database 11g: SQL Tuning Workshop Release 2

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

SQL Query Evaluation. Winter Lecture 23

There are five fields or columns, with names and types as shown above.

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

B2.2-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Overview of Storage and Indexing

Oracle Database 11g: SQL Tuning Workshop

RDBMS Using Oracle. Lecture Week 7 Introduction to Oracle 9i SQL Last Lecture. kamran.munir@gmail.com. Joining Tables

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Unit Storage Structures 1. Storage Structures. Unit 4.3

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems.

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino

Query Processing C H A P T E R12. Practice Exercises

DBMS / Business Intelligence, SQL Server

Data warehouse design

Lecture 1: Data Storage & Index

CS54100: Database Systems

Oracle Database 12c: Introduction to SQL Ed 1.1

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Database Design. Marta Jakubowska-Sobczak IT/ADC based on slides prepared by Paula Figueiredo, IT/DB

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

SQL Optimization & Access Paths: What s Old & New Part 1

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Storage and File Structure

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

BCA. Database Management System

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Part A: Data Definition Language (DDL) Schema and Catalog CREAT TABLE. Referential Triggered Actions. CSC 742 Database Management Systems

Instant SQL Programming

Index Selection Techniques in Data Warehouse Systems

Benefits of Normalisation in a Data Base - Part 1

White Paper. Optimizing the Performance Of MySQL Cluster

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Oracle DBA Course Contents

MS SQL Performance (Tuning) Best Practices:

SQL Server Query Tuning

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

The process of database development. Logical model: relational DBMS. Relation

Chapter 10 Functional Dependencies and Normalization for Relational Databases

DB2 Developers Guide to Optimum SQL Performance

Chapter 10. Functional Dependencies and Normalization for Relational Databases

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Databases and BigData

Chapter 13: Query Optimization

Fallacies of the Cost Based Optimizer

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

Chapter 8. SQL-99: SchemaDefinition, Constraints, and Queries and Views

University of Aarhus. Databases IBM Corporation

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Scheme G. Sample Test Paper-I

Practical Database Design and Tuning

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Relational Algebra. Query Languages Review. Operators. Select (σ), Project (π), Union ( ), Difference (-), Join: Natural (*) and Theta ( )

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

Developing Microsoft SQL Server Databases 20464C; 5 Days

Performance Tuning for the Teradata Database

Inside the PostgreSQL Query Optimizer

How To Improve Performance In A Database

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

Theory of Relational Database Design and Normalization

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

Database Design Methodology

Oracle Database: SQL and PL/SQL Fundamentals NEW

Access Path Selection in a Relational Database Management System

DBMS Questions. 3.) For which two constraints are indexes created when the constraint is added?

Theory of Relational Database Design and Normalization

20464C: Developing Microsoft SQL Server Databases

IT2305 Database Systems I (Compulsory)

Overview. Physical Database Design. Modern Database Management McFadden/Hoffer Chapter 7. Database Management Systems Ramakrishnan Chapter 16

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

Exercise 1: Relational Model

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Oracle 10g PL/SQL Training

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

Data warehousing in Oracle. SQL extensions for data warehouse analysis. Available OLAP functions. Physical aggregation example

6. Storage and File Structures

LOGICAL DATABASE DESIGN

Developing Microsoft SQL Server Databases

Course 20464: Developing Microsoft SQL Server Databases

Developing Microsoft SQL Server Databases (20464) H8N64S

Contents at a Glance. Contents... v About the Authors... xiii About the Technical Reviewer... xiv Acknowledgments... xv

Practical Cassandra. Vitalii

Transcription:

Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design Physical schema 1 2 : Inputs Goal Providing good performance for database applications Application software is not affected by physical design choices Data independence It requires the selection of the DBMS product Different DBMS products provide different storage structures access techniques Logical schema of the database Features of the selected DBMS product e.g., index types, page clustering Workload Important queries with their estimated frequency Update operations with their estimated frequency Required performance for relevant queries and updates 3 4 : Outputs : Selection dimensions Physical schema of the database table organization, indices Set up parameters for database storage and DBMS configuration e.g., initial file size, extensions, initial free space, buffer size, page size Default values are provided Physical file organization unordered (heap) ordered (clustered) hashing on a hash-key clustering of several relations Tuples belonging to different tables may be interleaved 5 6 Pag. 1 1

: Selection dimensions Characterization of the workload Indices different structures e.g., B + -Tree, hash clustered (or primary) Only one index of this type can be defined unclustered (or secondary) Many different indices can be defined For each query accessed tables visualized attributes attributes involved in selections and joins selectivity of selections For each update attributes and tables involved in selections selectivity of selections update type (Insert / Delete / Update) and updated attributes (if any) 7 8 Selection of data structures Selection of data structures Selection of physical storage of tables indices For each table file structure heap or clustered attributes to be indexed hash or B + -Tree clustered or unclustered Changes in the logical schema alternatives which preserve BCNF (Boyce Code Normal Form) alternatives not preserving BCNF e.g., in data warehouses partitioning on different disks 9 10 strategies General criteria No general design methodology is available trial and error design process general criteria common sense heuristics may be improved after deployment database tuning The primary key is usually exploited for selections and joins index on the primary key clustered or unclustered, depending on other design constraints 11 12 Pag. 2 2

General criteria Add more indices for the most common query predicates Select a frequent query Consider its current evaluation plan Define a new index and consider the new evaluation plan if the cost improves, add the index Verify the effect of the new index on modification workload available disk space Never index small tables loading the entire table requires few disk reads Never index attributes with low cardinality domains low selectivity e.g., gender attribute not true in data warehouses different workloads and exploitation of bitmap indices 13 14 For attributes involved in simple predicates of a where clause Equality predicate Hash is preferred B + -Tree Range predicate B + -Tree Evaluate if using a clustered index improves in case of slow queries For where clauses involving many simple predicates Multi attribute (composite) index Select the appropriate key order the order of attributes affects the usability of the index Evaluate maintenance cost 15 16 To improve joins Nested loop Index on the inner table join attribute Merge scan B + -Tree on the join attribute (if possible, clustered) For group by Index on the grouping attributes Hash index or B + -tree Consider group by push down anticipation of group by with respect to joins not available in all systems observe the execution plan 17 18 Pag. 3 3

Example: Group by push down Example: Initial query tree Tables PRODUCT (Prod#, PName, PType, PCategory) SHOP (Shop#, City, Province, Region, State) SALES (Prod#, Shop#, Date, Qty) SELECT PType, Province, SUM (Qty) FROM Sales S, Shop SH, Product P WHERE S.Shop# = SH.Shop# AND S.Prod# = P.Prod# AND Region = Piemonte GROUP BY PType, Province; GROUP BY Prod#, Province GROUP BY PType, Province s Region = Piemonte SHOP SALES PRODUCT 19 20 Example: Rewritten query tree (1) Example: Rewritten query tree (2) GROUP BY PType, Province GROUP BY PType, Province GROUP BY Prod#, Province PRODUCT GROUP BY Prod#, Province PRODUCT s Region = Piemonte GROUP BY Prod#, Shop# s Region = Piemonte GROUP BY Prod#, Shop# SHOP SALES SHOP SALES 21 22 If it does not work? Query execution is not as fast as you expect or you are not satisfied yet Remember to update database statistics! Database tuning Add and remove indices May be performed also after deployment Techniques to affect the optimizer decision Should almost never be used called hints in oracle Data independence is lost Database Management Systems Physical Design Examples 23 24 Pag. 4 4

Example tables Tables EMP (Emp#, EName, Dept#, Salary, Age, Hobby) DEPT (Dept#, DName, Mgr) In EMP Dept# FOREIGN KEY REFERENCES DEPT.Dept# In DEPT Mgr FOREIGN KEY REFERENCES EMP.Emp# SELECT * WHERE Salary/12 = 1500; Index on the salary attribute (B + -Tree) The index may be disregarded because of the arithmetic expression Example 1 25 26 SELECT * WHERE Salary = 18000; Example 2 The index is used but it does not provide any benefit Consider Salary data distribution The value is very frequent and index access is not appropriate 27 Example 3 Suppose that table EMP has block factor (number of tuples per block) equal to 30 a) Card(DEPT)= 50 b) Card(DEPT) = 2000 For accessing Dept# in the EMP table, would you define a secondary index on Emp.Dept#? 28 Example 3 Example 4 Case A: Card (DEPT) = 50 Indexing is not appropriate Each page on average contains almost all departments sequential scan is better Case B: Card(DEPT) = 2000 Indexing is appropriate Each page contains tuples belonging to few departments 29 SELECT EName, Mgr E, DEPT D WHERE E.Dept# = D.Dept# AND DName = Toys ; Index definition Hash Index on DName for the selection condition Hash Index on Emp. Dept# for a nested loop with Emp as inner table 30 Pag. 5 5

6 Database Management Systems Example 5 Example 6 SELECT EName, Mgr E, DEPT D WHERE E.Dept# = D.Dept# AND DName = Toys AND Age=25; SELECT EName, Mgr E, DEPT D WHERE E.Dept# = D.Dept# AND Salary BETWEEN 10000 AND 12000 AND Hobby= Tennis ; Index definition An index on Age may be considered it depends on the selectivity of the condition 31 32 Example 6: selection Example 6: join Alternatives for the selection on EMP hash index on Hobby B + -Tree on Salary Usually equality predicates are more selective One index is always considered by the optimizer Two indices may be exploited by smart optimizers compute the intersection of RIDs before reading tuples Alternatives for join Hash join Nested loop EMP outer because of selection predicates DEPT inner plus index on DEPT.Dept# not appropriate if DEPT table is very small 33 34 Example 7 Example 7 SELECT Dept#, Count(*) WHERE Age>20 GROUP BY Dept# If the selection condition on Age is not very selective no B + -Tree on Age For group by Clustered index on Dept# Avoids sorting and reads blocks ready for group by Good! Secondary index on Dept# May cause too many reads Consider if appropriate 35 36 Pag. 6

SELECT Dept#, COUNT(*) GROUP BY Dept# Example 8 Example 8 Unclustered (secondary) index on Dept# It avoids reading table EMP It is a covering index it answers the query without requiring access to table data 37 38 Example 9 Example 10 SELECT Mgr FROM DEPT, EMP WHERE DEPT.Dept#=EMP.Dept# Unclustered index on EMP.Dept# It avoids reading table EMP SELECT AVG(Salary) WHERE Age = 25 AND Salary BETWEEN 3000 AND 5000 39 40 Example 10 Composite index on <Age,Salary> Fastest solution This order is the best if the condition on Age is more selective Issues in composite indices Order of the fields in a composite index is important Update overhead grows 41 Pag. 7 7