Clustering. Oracle Server Concepts Manual. Database Systems Concepts Silberschatz/ Korth Sec. 10.7


References:
    Oracle Server Concepts Manual
    Database Systems Concepts, Silberschatz/Korth, Sec. 10.7
    Fundamentals of Database Systems, Elmasri/Navathe, Sec. 5.10

Stephen Mc Kearney, 2001.

Overview

Questions addressed:
    What types of clustering exist?
    How does clustering work?
    Compare clustered and unclustered?
    Advantages & Disadvantages?
    When is it used?
    How is it implemented?
    How is it implemented in Oracle?
    How do you decide to cluster data?
    How does clustering compare to B+-Trees?

Topics:
    Definition
    Intra-file Clustering
    Inter-file Clustering
    Data in Pages
    Unclustered Relations
    Clustered Relations
    Advantages
    Disadvantages
    Applications
    Clustering Index
    Clustering in Oracle
    Criteria for Clustering
    Comparison

Definition

"Clustering means that records related to each other are stored physically beside each other." (Frank)

Clustering is a method of storing data on a disc. A cluster is used to store tuples from one or more relations physically close to other tuples in the database. The purpose of clustering is to speed up certain types of queries: tuples that are physically close to each other are retrieved more quickly than tuples that are not.

Because clustering affects how the data is actually stored on the disc, the decision to use clustering is part of the physical database design process. Clustering does not affect the applications that access the clustered relations. Clustered and unclustered relations appear the same to users of the system.

Intra-file Clustering

Data items in a single file are stored together.

    Supplier 1 | Supplier 2 | Supplier 3 | ... | Supplier n

Suppliers are stored in the order in which they are most often retrieved.

In intra-file clustering, records in a single file are stored close to related records in the same file. For example, if suppliers are normally ordered by their supplier number, then each supplier would be stored next to the supplier with the next highest supplier number.

Inter-file Clustering

Data items in two or more files are stored together.

    Supplier 1
        Shipment A
        Shipment B
    Supplier 2
        Shipment C
        Shipment D
        Shipment E
    Supplier 3
        Shipment F
        Shipment G

Shipments from one file are stored beside suppliers in another file.

In inter-file clustering, records from one file are stored close to records from another file. For example, a shipment from a shipments file would be stored close to the supplier of the shipment.

Data in Pages

Data that is stored close together will be quicker to retrieve.

Figure: a disc with two groups of pages. Pages that lie far apart are slower to retrieve because the disc must rotate further to read each page. Pages that lie close together are quicker to retrieve because the disc must rotate less to read each page.

Clustering affects the physical position of data on the disc. There are three cases:

Same page. When two data items are stored on the same page on the disc, they can be read with one page read operation. Because the computer reads one page at a time, data items stored on the same page are read at the same time.

Neighbouring pages. When two data items are stored on pages that are close to each other on the disc, they can be read with two page read operations. Because the pages occur one after another, there is no disc head movement between reads (no seek time).

Separate locations. When two data items are stored in separate locations on the disc, they can be read with two page read operations and a seek operation. Because the pages occur at separate locations, the disc head must move to a new position to read the second page.
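
The three cases above can be expressed as a small cost model. The following Python sketch is an illustration only, not part of the original slides: the function name access_cost and the page numbers are hypothetical, consecutive page numbers are assumed to be physically adjacent, and the initial positioning of the disc head is ignored.

```python
def access_cost(pages):
    """Return (page_reads, seeks) for visiting the given page numbers in order.

    Simplifying assumptions: page n+1 lies directly after page n on the disc,
    and the seek needed to reach the very first page is not counted.
    """
    reads = 0
    seeks = 0
    previous = None
    for page in pages:
        if page == previous:
            continue  # item is on the page just read: no further read needed
        reads += 1
        if previous is not None and page != previous + 1:
            seeks += 1  # page is elsewhere on the disc: the head must move
        previous = page
    return reads, seeks


same_page = access_cost([7, 7])     # two items on one page
neighbours = access_cost([7, 8])    # two items on adjacent pages
scattered = access_cost([7, 300])   # two items far apart
print(same_page, neighbours, scattered)
```

Under these assumptions the cost grows from one read, to two reads, to two reads plus a seek, which is the ordering the notes describe.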

Unclustered Relations

(Figure adapted from the Oracle7 Server Concepts Manual.)

Unclustered relations are stored in their own pages on the disc. That is, each page contains tuples from one relation only. The pages may be positioned anywhere on the disc. Therefore, to join two relations, at least two pages must be read from the disc: one page for each relation. In the figure, the emp relation (table) is stored at one location on the disc and the dept relation (table) is stored at another location.

Clustered Relations

(Figure adapted from the Oracle7 Server Concepts Manual.)

Clustered relations are stored using a cluster key. Each relation belonging to the cluster has an attribute corresponding to the cluster key, and each block stores tuples with a particular cluster key value. In the figure, the cluster key is deptno, and all the departments and employees with deptno=10 are stored together. This type of cluster will improve the performance of queries that join the emp and the dept relations.

Note that the cluster key value is stored only once for each distinct value. For example, the value deptno=10 is stored once and all tuples with deptno=10 are stored together.
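
To make the difference between the two layouts concrete, the following Python sketch counts how many pages hold tuples for one department under each layout. It is a toy model and not how Oracle lays out blocks: the page size, the sample tuples, and the helper names (heap_pages, cluster_pages, pages_touched) are all assumptions made for illustration, and each cluster key group is assumed to fit on a single page.

```python
PAGE_SIZE = 4  # tuples per page (hypothetical)


def heap_pages(tuples, page_size=PAGE_SIZE):
    """Unclustered layout: one relation's tuples fill pages in arrival order."""
    return [tuples[i:i + page_size] for i in range(0, len(tuples), page_size)]


def cluster_pages(tuples):
    """Clustered layout: one page per cluster key value (assumes each group fits)."""
    groups = {}
    for relation, deptno in tuples:
        groups.setdefault(deptno, []).append((relation, deptno))
    return [groups[key] for key in sorted(groups)]


def pages_touched(pages, deptno):
    """Count the pages holding at least one tuple with the given deptno."""
    return sum(1 for page in pages if any(key == deptno for _, key in page))


# Each tuple is (relation name, deptno); other attributes are omitted.
dept = [("dept", 10), ("dept", 20), ("dept", 30)]
emp = [("emp", 10), ("emp", 20), ("emp", 30),
       ("emp", 10), ("emp", 20), ("emp", 30)]

unclustered = heap_pages(dept) + heap_pages(emp)  # each relation in its own pages
clustered = cluster_pages(dept + emp)             # both relations share pages

unclustered_reads = pages_touched(unclustered, 20)
clustered_reads = pages_touched(clustered, 20)
print(unclustered_reads, clustered_reads)
```

With this sample data, joining department 20 with its employees touches three pages in the unclustered layout and one page in the clustered layout.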

Advantages

    Speeds up some queries
    Uses less space

    Supplier 1
        Shipment A
        Shipment B
    Supplier 2
        Shipment C
        Shipment D
        Shipment E
    Supplier 3
        Shipment F
        Shipment G

Shipments A and B are for supplier 1. A query for all shipments of supplier 1 will be quick because all the shipments for supplier 1 follow immediately after supplier 1.

Clustering will speed up some database queries. For example, a cluster consisting of suppliers and shipments will speed up queries that request all the shipments for a particular supplier. The cluster improves the supplier/shipment query because the data for each shipment is stored on the same page as the corresponding supplier. Hence, when the supplier record is read, the set of shipments is also read.

The cluster key value that is used to cluster relations is stored only once in each page. This may save disc space.

Disadvantages

    Slows down some queries
    Slows down writes

    Supplier 1
        Shipment A
        Shipment B
    Supplier 2
        Shipment C
        Shipment D
        Shipment E
    Supplier 3
        Shipment F
        Shipment G

To read all the shipment records, the supplier records must also be read. A query for all shipments will be slow because the shipments are not stored together on the disc.

Clustering will slow down certain types of queries. For example, the cluster on suppliers and shipments will slow down queries that ask for all shipments, because the shipments are stored with each supplier. To read all the shipments, the DBMS must also read the supplier data.

Inserting new records into a cluster may also be slow. For example, adding a new shipment for supplier 1 will involve making space after shipment B.
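
Both costs can be seen in a toy model of the supplier/shipment cluster. The Python sketch below treats the cluster as one flat sequence of records with no free space, which is a deliberate simplification (a real DBMS leaves room in each page); the function names scan_all_shipments and records_shifted_by_insert are hypothetical.

```python
# The cluster from the slide, as (record kind, identifier) in storage order.
cluster = [
    ("supplier", "1"), ("shipment", "A"), ("shipment", "B"),
    ("supplier", "2"), ("shipment", "C"), ("shipment", "D"), ("shipment", "E"),
    ("supplier", "3"), ("shipment", "F"), ("shipment", "G"),
]


def scan_all_shipments(records):
    """Return (shipment identifiers, number of records read) for a full scan."""
    read = 0
    shipments = []
    for kind, ident in records:
        read += 1  # supplier records are read too, even though they are not wanted
        if kind == "shipment":
            shipments.append(ident)
    return shipments, read


def records_shifted_by_insert(records, supplier):
    """Count the records stored after the supplier's group.

    In this no-free-space model, all of them must move to make room
    for one new shipment of that supplier.
    """
    last_in_group = None
    in_group = False
    for i, (kind, ident) in enumerate(records):
        if kind == "supplier":
            in_group = (ident == supplier)
        if in_group:
            last_in_group = i
    if last_in_group is None:
        raise KeyError(supplier)
    return len(records) - (last_in_group + 1)


shipments, records_read = scan_all_shipments(cluster)
shifted = records_shifted_by_insert(cluster, "1")
print(len(shipments), records_read, shifted)
```

Here the all-shipments query reads ten records to return seven shipments, and a new shipment for supplier 1 displaces the seven records stored after shipment B.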

Applications 1 - Hierarchies

Figure: an ER diagram (Customer, Order, Order Line), an ER instance, and the corresponding cluster, in which each customer is followed by its orders and each order by its order lines. Caption: a hierarchy of customer to orders to order lines.

Clustering is used when the data has a hierarchical structure. In this example, the cluster would be used when the most common queries retrieve all the orders and order lines for a customer. A cluster storing this structure would cluster all the order lines with their corresponding orders, and the orders and order lines would then be stored with their corresponding customer.

Applications 2 - Lists

Figure: a list of products (Product 1, Product 2, Product 3) and the corresponding cluster, in which the products are stored one after another.

A cluster may be used when queries will retrieve lists of data items. In this example, the cluster of products will improve queries requesting all the products.

Applications 3 - SQL Joins

Equi-joins:

    SELECT name, address, deptname
    FROM emp, dept
    WHERE emp.deptno = dept.deptno

The emp and dept relations may be clustered on the deptno attribute.

A cluster may be used to cluster relations that are frequently joined together. In this example, the relations emp and dept may be clustered on the deptno attribute. The value of each deptno will be stored once, together with all the corresponding emp and dept tuples.

Clustering Index

    Index on deptno    Page    Records
    10                 P1      Dept and Employee records with deptno=10
    20                 P2      Dept and Employee records with deptno=20
    30                 P3      Dept and Employee records with deptno=30

The DBMS uses a clustering index when it implements a cluster. The clustering index is used to index the cluster key. This allows the DBMS to access the data in the cluster efficiently. The cluster index contains an entry for each cluster key value. The index may be a B+-Tree.

Ref: Elmasri, Sec. 6.1.2
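
The lookup that a clustering index performs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DBMS's implementation: a sorted list searched with the standard bisect module stands in for the B+-Tree, and the class name ClusterIndex and the page labels are hypothetical.

```python
import bisect


class ClusterIndex:
    """Maps each cluster key value to the page holding that key's records.

    A sorted list with binary search is used here as a stand-in for a B+-Tree.
    """

    def __init__(self, entries):
        pairs = sorted(entries)  # one (key, page) entry per cluster key value
        self._keys = [key for key, _ in pairs]
        self._pages = [page for _, page in pairs]

    def lookup(self, key):
        """Return the page for the key, or None if the key is not in the cluster."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pages[i]
        return None


index = ClusterIndex([(10, "P1"), (20, "P2"), (30, "P3")])
print(index.lookup(20))
```

One lookup yields the single page that holds both the dept record and the employee records for that deptno, so the join for one department needs no further searching.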

Clustering in Oracle

Create a cluster:

    CREATE CLUSTER emp_dept (deptno NUMBER(3));

Create a cluster index:

    CREATE INDEX emp_dept_index ON CLUSTER emp_dept;

Create tables (constraints belong inside the column list, before the CLUSTER clause; further columns are elided):

    CREATE TABLE dept (
        deptno NUMBER(3) PRIMARY KEY,
        ...
    ) CLUSTER emp_dept (deptno);

    CREATE TABLE emp (
        empno  NUMBER(5),
        deptno NUMBER(3) REFERENCES dept,
        ...
    ) CLUSTER emp_dept (deptno);

There are three steps required to create a cluster in Oracle:

1. Create the cluster. The space for the cluster is allocated on the disc.

2. Create the cluster index. Oracle requires a cluster index to be able to access the cluster. Therefore, the cluster index must exist before data can be added to the cluster.

3. Create the tables. When the tables are created, a parameter is added to the CREATE TABLE command indicating the cluster to which the table will belong.

Once the cluster has been created, the normal data manipulation commands (INSERT, DELETE, UPDATE, SELECT) may be used. Therefore, using a cluster to improve the performance of a database does not affect the application programs that access the data.

Criteria for Clustering

    Query requirements: joins, lists, hierarchies
    Space requirements: clustering may save space
    Update requirements: clustering may slow updates

Deciding to cluster a set of relations depends on three factors:

Query requirements. Clustering improves joins between relations because it stores related tuples together in the same page. When the most common queries involve joining two relations, a cluster may improve performance.

Space requirements. Because each cluster key value is stored only once, storing relations in a cluster can use less storage space than storing the same relations separately. If storage space is restricted, clustering the data may save space.

Update requirements. Clusters are difficult to update because space must be left to allow for additional clustered tuples. If space is not available, it may be necessary to move tuples between pages.

Comparison with Other Techniques

    B+-Tree                               Cluster
    Fast access to individual tuples      Fast access across relations
    Does not affect the order of data     Changes the order of the data
    Can be ignored if not useful          Must be searched to access data
    Easy to create and delete             Difficult to create and delete

A B+-Tree is designed to provide fast access to individual tuples in a relation. A cluster is designed to improve the performance of queries that join two or more relations together.

A B+-Tree does not affect the order of the actual data. Although the index may be ordered, the actual data remains unordered. A cluster orders the actual data.

A B+-Tree does not have to be used to answer a query. It is possible to access the data directly if using the B+-Tree is too inefficient. As a cluster affects the physical ordering of the data, the cluster must be accessed to retrieve the data. Hence, a cluster will slow down certain queries.

A B+-Tree index is easy to create and delete because it is separate from the data. A cluster is difficult to create or change because it must be created before the data is added to the database. Deleting a cluster will destroy the data.

Partitioned Table

    CREATE TABLE sales (
        acct_no        NUMBER(5),
        acct_name      CHAR(30),
        amount_of_sale NUMBER(6),
        week_no        INTEGER
    )
    PARTITION BY RANGE (week_no) (
        PARTITION sales1  VALUES LESS THAN (4)  TABLESPACE ts0,
        PARTITION sales2  VALUES LESS THAN (8)  TABLESPACE ts1,
        ...
        PARTITION sales13 VALUES LESS THAN (52) TABLESPACE ts12
    );

(Source: Oracle Concepts Manual)
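
The routing rule implied by VALUES LESS THAN can be sketched in Python: each partition holds the keys below its own bound and at or above the previous partition's bound. The bounds and partition names below are hypothetical and do not reproduce the partitions elided from the statement above. Rejecting a key at or above the highest bound is an assumption of this model.

```python
import bisect

# Hypothetical range partitions: (exclusive upper bound, partition name).
PARTITIONS = [(4, "p1"), (8, "p2"), (12, "p3")]
BOUNDS = [bound for bound, _ in PARTITIONS]


def partition_for(week_no):
    """Return the first partition whose upper bound is greater than week_no."""
    i = bisect.bisect_right(BOUNDS, week_no)
    if i == len(PARTITIONS):
        raise ValueError(f"week_no {week_no} does not map to any partition")
    return PARTITIONS[i][1]


print(partition_for(3), partition_for(4), partition_for(11))
```

Because the bounds are exclusive, week 3 falls in the first partition while week 4 already belongs to the second.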

Partitioned Index 1 (figure from the Oracle Concepts Manual)

Partitioned Index 2 (figure from the Oracle Concepts Manual)

Partitioned Index 3 (figure from the Oracle Concepts Manual)

Equipartitioned Tables (figure from the Oracle Concepts Manual): better availability and reliability.

Disc Striping (figure from the Oracle Concepts Manual)