Benefits of Normalisation in a Data Base - Part 1




Denormalisation (But not hacking it)

Rodgers, Denormalisation: Why, What, and How?
Corrigan/Gurry, Oracle Performance Tuning, Ch. 5, p69

Stephen Mc Kearney, 2001.

Overview
- Purpose of normalisation
- Methods of improving database performance
- Denormalisation
  - Definition
  - Part of database design
  - What can be denormalised
  - Examples
- Applying denormalisation safely
- Materialised views

Purpose of Normalisation
Two views:
- Design process
  - Steps: 1NF, 2NF, 3NF, BCNF
  - Complex to apply
  - This is how we teach it
- Analysis process
  - Understanding dependencies
  - Correctness check
  - Levels of correctness, e.g. 1NF, 2NF, 3NF, BCNF

Normalisation is designed to limit the amount of redundant data stored in a database. It is based on identifying the functional dependencies in the database and using those dependencies to remove potential redundancy. Redundant data introduces potential update anomalies when information in the database is changed. Normalisation also helps to clarify the designer's understanding of the data. Therefore, an unnormalised set of data will potentially contain redundant data, encourage anomalies and make integrity checking difficult. It is never acceptable to design a database that is unnormalised.

Why is normalisation good?
- Better analysis
  - Understand the dependencies
  - Clarify integrity constraints
  - Scientific, mathematical, etc.
- Provides future flexibility
- Reduce data redundancy
- Avoid anomalies during updates, deletes and inserts
- Robust designs

Problems with Normalisation
- Can lead to many relations
  - Each relation is atomic
  - Requires many join queries
  - Affects performance
- Unnecessary relations, e.g. area(area_code, area_description)
- Frequently, the designer has misunderstood normalisation

What to do when the database is slow?
- Better design
- Index the database
- Use clustering, partitioning, etc.
- Accept poorer performance
- If all else fails, denormalise.

When a normalised set of relations is created in a DBMS it is important to ensure that the resulting database answers queries efficiently. A relational DBMS provides a number of options for improving the efficiency of a slow database:

1. Improve the design by looking for mistakes and errors in the current design. If the design does not correspond to the actual data usage the database will be slow and inefficient.
2. Create indexes on the most frequently queried attributes. Ensure that existing indexes are correct.
3. Use clustering to improve the performance of joins between relations. Clustering is attractive because it is hidden from the user.
4. Accept the poorer performance. If poor performance does not interfere with the user's requirements then accept the current performance.

If the performance of the database is still poor, then it may be necessary to denormalise the database.
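Option 2 (indexing) can be checked before any denormalisation is considered. The following is a minimal sketch, assuming SQLite (via Python's standard sqlite3 module) as a stand-in DBMS; the purchases table, its columns, its data and the index name are hypothetical and not taken from the slides. It compares the query plan before and after creating an index on the queried attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute(
    "CREATE TABLE purchases ("
    "purchase_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO purchases (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT SUM(amount) FROM purchases WHERE customer_id = 5"


def plan(sql):
    """Return the descriptive (last) column of SQLite's query plan rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no usable index yet, so the table is scanned
conn.execute("CREATE INDEX idx_purchases_customer ON purchases (customer_id)")
plan_after = plan(QUERY)   # the plan now names the new index

print(plan_before)
print(plan_after)
```

The exact plan wording differs between SQLite versions and between DBMSs, so the useful signal is only whether the index name appears in the plan.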

Definition
Denormalisation is the design process of taking normalised data and producing a physical design in which normalised data is rearranged so that optimal access and manipulation of data can be achieved. (Inmon)
Also called consolidation.

A set of relations may be:
- Unnormalised: no systematic analysis of the data has been carried out and there may be hidden redundancy in the data.
- Normalised: a systematic analysis of the data has been carried out and the set of data has been correctly normalised to remove data redundancy.
- Denormalised: a normalised set of data has been systematically analysed and known redundancy has been introduced into the database to improve the database's efficiency.

Poor performance is normally a result of joining many relations or performing complex calculations. Poor performance affects all types of database. The problems that occur in relational databases have been more extensively studied than the problems that occur in other databases. Relational DBMSs are optimised to perform three-way joins. When more complex joins are required, the database structure may not be efficient. The normalisation and denormalisation processes provide a systematic method of analysing the efficiency of a database.

The Database Design Process
- Conceptual Model: Entity-Relationship Model
- Logical Model: Relational Model
- Physical Model
Denormalisation is part of the physical database design.

The denormalisation process is performed during the physical design stage. Therefore, denormalisation can only be performed after the data has been normalised. Rodgers describes the main purpose of denormalisation as being to reduce the number of tables that need to be joined for specific access needs. It is important to understand which entities are accessed by the application programs and how these entities relate to other entities. Rodgers suggests using entity-relationship diagrams, data flow diagrams and function/entity cross-reference matrices to identify database usage.

What can be denormalised?
- One-to-one relationships
- Many-to-many relationships
- Splitting tables
- Report tables
- Reference data
- Low-level detail data
- Derived data

One-to-One Relationships
- A Lift has a Contract; a Contract is for a Lift
- Merge two entities related by a one-to-one relationship
- Lift Contract: data about both the lift and the contract are stored in Lift Contract.

When two entities are related by a one-to-one relationship, there is always a one-to-one correspondence between the entities. It is possible to merge the entities into a single entity and implement them as a single relation in the database. Implementing the entities as a single relation avoids a join between two relations. However, if the two entities are not always created and deleted at the same time, then null values will have to be used to represent the missing entity values.

Example

Lift
LiftNo  LiftLocation    MainDate    Checked
100     Poole House     01/03/2001  SMcK
200     Dorset House    01/04/2001  JC
300     Studland House  01/05/2001  JC

Contract
ContractNo  LiftNo  SignDate    Auth
123         100     01/03/2000  PB
345         200     01/04/2000  OD
567         300     01/05/2000  PB

LiftContract
LiftNo  LiftLocation    MainDate    Checked  ContractNo  LiftNo  SignDate    Auth
100     Poole House     01/03/2001  SMcK     123         100     01/03/2000  PB
200     Dorset House    01/04/2001  JC       345         200     01/04/2000  OD
300     Studland House  01/05/2001  JC       567         300     01/05/2000  PB
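The merge of Lift and Contract can be sketched as follows, assuming SQLite through Python's sqlite3 module. The snake_case names are illustrative renderings of the slide's columns, the duplicated LiftNo column is kept only once, and lift 400 is a hypothetical row added to show the null values that appear when a lift exists without a contract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lift (
    lift_no INTEGER PRIMARY KEY, lift_location TEXT,
    main_date TEXT, checked TEXT);
CREATE TABLE contract (
    contract_no INTEGER PRIMARY KEY,
    lift_no INTEGER UNIQUE REFERENCES lift(lift_no),
    sign_date TEXT, auth TEXT);

INSERT INTO lift VALUES
    (100, 'Poole House',    '01/03/2001', 'SMcK'),
    (200, 'Dorset House',   '01/04/2001', 'JC'),
    (300, 'Studland House', '01/05/2001', 'JC'),
    (400, 'Hypothetical House', '01/06/2001', 'JC');  -- not in the slides
INSERT INTO contract VALUES
    (123, 100, '01/03/2000', 'PB'),
    (345, 200, '01/04/2000', 'OD'),
    (567, 300, '01/05/2000', 'PB');

-- Denormalised relation: one row per lift, contract columns alongside.
CREATE TABLE lift_contract AS
    SELECT l.lift_no, l.lift_location, l.main_date, l.checked,
           c.contract_no, c.sign_date, c.auth
    FROM lift AS l
    LEFT JOIN contract AS c ON c.lift_no = l.lift_no;
""")

# Reading a lift together with its contract no longer needs a join.
merged = {row[0]: row for row in conn.execute("SELECT * FROM lift_contract")}
print(merged[100])
print(merged[400])  # contract columns are NULL (None) for this lift
```

The LEFT JOIN is a deliberate choice here: an inner join would silently drop lifts that have no contract yet.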

Many-to-Many Relationships
- Many-to-many: Employee, Project
- Resolved many-to-many: Employee, Works on, Project
- Storing as a one-to-many relationship: Employee, Project (Project must be duplicated)

In the relational data model many-to-many relationships are replaced with a new entity and two one-to-many relationships. This means that all queries which involve many-to-many relationships require a join between three relations. It is possible to store a many-to-many relationship in two relations if one of the relations contains duplicate data. In the example above, the many-to-many relationship between the employee and project entities is stored as a one-to-many relationship. This is achieved by duplicating the project data. For example, employee Smith will work on project P1 (first copy) and employee Jones will work on project P1 (second copy).

Update anomalies will occur when it is possible for the entities to exist without taking part in the relationship. In the example above, the project must contain a foreign key to the employee entity, which will be null if projects can exist without employees working on them. This method is useful when one of the entities contains very little data.

Example

Employee
EmpNo  Ename    Dept  Sal
100    Stephen  DEC   RW
200    Jim      DEC   RW
300    Peter    BS    KL

EmployeeOnProject
Empno  Pcode
100    A
100    C
200    B
200    D
200    E
200    C

Project
Pcode  Pname  Budget
A      DBS1   10
B      DBS2   20
C      ADB    15
D      Prog   13
E      SSADM  12

EmpProject
EmpNo  Pcode  Pname  Budget
100    A      DBS1   10
200    B      DBS2   20
100    C      ADB    15
200    C      ADB    15
200    D      Prog   13
200    E      SSADM  12
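A minimal sketch of how EmpProject could be derived from EmployeeOnProject and Project, assuming SQLite through Python's sqlite3 module. The data is the slide's; the snake_case names and the new budget value 18 are illustrative assumptions. It also shows the cost of the duplication: changing one project's budget has to touch every copy of that project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (pcode TEXT PRIMARY KEY, pname TEXT, budget INTEGER);
CREATE TABLE employee_on_project (
    empno INTEGER, pcode TEXT REFERENCES project(pcode),
    PRIMARY KEY (empno, pcode));

INSERT INTO project VALUES
    ('A', 'DBS1', 10), ('B', 'DBS2', 20), ('C', 'ADB', 15),
    ('D', 'Prog', 13), ('E', 'SSADM', 12);
INSERT INTO employee_on_project VALUES
    (100, 'A'), (100, 'C'), (200, 'B'), (200, 'D'), (200, 'E'), (200, 'C');

-- Denormalised relation: project data is repeated once per employee.
CREATE TABLE emp_project AS
    SELECT ep.empno, p.pcode, p.pname, p.budget
    FROM employee_on_project AS ep
    JOIN project AS p ON p.pcode = ep.pcode;
""")

total_rows = conn.execute("SELECT COUNT(*) FROM emp_project").fetchone()[0]
copies_of_c = conn.execute(
    "SELECT COUNT(*) FROM emp_project WHERE pcode = 'C'"
).fetchone()[0]

# Hypothetical budget change: every duplicated copy must be updated.
cursor = conn.execute("UPDATE emp_project SET budget = 18 WHERE pcode = 'C'")
rows_touched = cursor.rowcount

print(total_rows, copies_of_c, rows_touched)
```

In the normalised design the same change would update exactly one row in project, which is the update anomaly the notes warn about.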

Splitting Tables
- Horizontal split: Purchases is divided into Purchases2004 and Purchases2000-2003
- Vertical split: Purchases is divided into Purchases Product and Purchases Customer, which share the primary key
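A horizontal split can be sketched as below, assuming SQLite through Python's sqlite3 module. The purchases columns and rows are hypothetical; only the table names follow the slide. A view over the two partitions is one possible way to keep the original Purchases relation visible to queries that need all years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical columns and data; the slide only names the tables.
CREATE TABLE purchases (
    purchase_id INTEGER PRIMARY KEY, year INTEGER, amount INTEGER);
INSERT INTO purchases VALUES
    (1, 2000, 50), (2, 2002, 75), (3, 2003, 20), (4, 2004, 90), (5, 2004, 10);

-- Horizontal split: same columns, rows divided by year.
CREATE TABLE purchases_2004 AS
    SELECT * FROM purchases WHERE year = 2004;
CREATE TABLE purchases_2000_2003 AS
    SELECT * FROM purchases WHERE year BETWEEN 2000 AND 2003;

-- View that reassembles the full relation when all years are needed.
CREATE VIEW all_purchases AS
    SELECT * FROM purchases_2004
    UNION ALL
    SELECT * FROM purchases_2000_2003;
""")

current_rows = conn.execute("SELECT COUNT(*) FROM purchases_2004").fetchone()[0]
original_total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
view_total = conn.execute("SELECT SUM(amount) FROM all_purchases").fetchone()[0]

print(current_rows, original_total, view_total)
```

Queries about the current year read only the small purchases_2004 table, while the view returns the same totals as the unsplit table.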

Reporting Tables
- Reporting tools cannot manipulate complex relational models.
- Therefore, simplify the structure by combining tables into simple tables.

Summary Tables
- Summarise large data sets, e.g.
  - Purchases by product, brand, shop and area
  - Purchases by shop and area

Reference Data
- Reference data consists of descriptions and codes, e.g. telephone numbers and people:
  - 595015, Computer Centre
  - 595205, S Mc Kearney
- Reference data is stored in lookup tables.
- Store descriptions in the entities they describe.
- Justified because codes are artificial shorthands for descriptions.

A lot of information in a database is stored using codes that correspond to descriptions. To access the description corresponding to a code it is necessary to perform a join between an entity and the table of codes. The join can be avoided if the descriptions are actually stored in the entity. This process reintroduces the transitive dependencies that were removed to produce third normal form.

Example

Project
Pcode  Pname  Budget
A      DBS1   10
B      DBS2   20
C      ADB    15
D      Prog   13
E      SSADM  12

EmployeeOnProject (normalised)
Empno  Pcode
100    A
100    C
200    B
200    D
200    E
200    C

EmployeeOnProject (with the project name stored in the entity)
Empno  Pcode  Pname
100    A      DBS1
100    C      ADB
200    B      DBS2
200    D      Prog
200    E      SSADM
200    C      ADB

Detail Data
- Entities: Project, Job, Time-Booking
- One row per booking: time-booking(projno, jobno, month, booking)
- Merge job and time-booking: job(projno, jobno, m1, m2, m3, ..., m12)

In the example above, the time-booking entity contains one row for every month booked on every job in every project. The data stored in the attributes projno, jobno and month will be duplicated often. For example:

10, 100, Jan, 345
10, 100, Feb, 232
10, 100, Mar, 342
10, 101, Jan, 876

This version of time-booking is easy to process with SQL but requires a lot of storage space. When the job and the time-booking entities are merged, each project and job has one tuple containing all the bookings for the year:

10, 100, 345, 232, 342, ...
10, 101, 876, ...

Merging job and time-booking saves space but is difficult to process with SQL. For example, aggregate functions such as MAX and AVG operate over rows, so they cannot be applied directly across the month columns. The merged relation also restricts the number of booking months to 12. Hence, it is only suitable when the amount of detail data is fixed.
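The merge of job and time-booking can be sketched as a pivot, assuming SQLite through Python's sqlite3 module. The four rows are the ones listed in the notes; the table name time_booking, the three-letter month values beyond those shown, and the way the m1 to m12 columns are generated are assumptions made for illustration.

```python
import sqlite3

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_booking ("
    "projno INTEGER, jobno INTEGER, month TEXT, booking INTEGER)"
)
conn.executemany(
    "INSERT INTO time_booking VALUES (?, ?, ?, ?)",
    [(10, 100, "Jan", 345), (10, 100, "Feb", 232),
     (10, 100, "Mar", 342), (10, 101, "Jan", 876)],
)

# Pivot: one column per month (m1..m12), one row per project and job.
month_columns = ", ".join(
    f"MAX(CASE WHEN month = '{name}' THEN booking END) AS m{number}"
    for number, name in enumerate(MONTHS, start=1)
)
conn.execute(
    f"CREATE TABLE job AS SELECT projno, jobno, {month_columns} "
    "FROM time_booking GROUP BY projno, jobno"
)

# Normalised form: the largest booking is a plain row aggregate.
max_from_rows = conn.execute(
    "SELECT MAX(booking) FROM time_booking WHERE projno = 10 AND jobno = 100"
).fetchone()[0]

# Merged form: the months are columns, so the comparison has to be done
# across the columns of a single row (here in Python, skipping NULLs).
job_row = conn.execute(
    "SELECT * FROM job WHERE projno = 10 AND jobno = 100"
).fetchone()
max_from_columns = max(v for v in job_row[2:] if v is not None)

print(job_row)
print(max_from_rows, max_from_columns)
```

Both forms give the same answer, but the merged form needs every month column to be named or iterated explicitly, which is the processing difficulty the notes describe.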

Derived Data
- Derived data is normally not stored in the database.
- It is calculated from the contents of the database.
- Calculations can take a long time.
- Therefore, store derived data in the database.

Derived data that is complex to calculate can be pre-calculated and stored in the database. For example, total sales figures may be stored in the database rather than requiring them to be calculated from the individual sales figures. This approach is widely used in information warehouses that store large numbers of pre-calculated summary tables. The summary tables are normally updated as a batch process. The main problem with storing derived data is that it must be re-calculated when the underlying data is changed.
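The trade-off between stored totals and batch refresh can be sketched as follows, assuming SQLite through Python's sqlite3 module. The tables sales and product_totals, their columns and all values are hypothetical. The stored total stays stale after new detail rows arrive until the batch job recalculates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail table and pre-calculated totals table.
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY, product_no INTEGER, qty INTEGER);
CREATE TABLE product_totals (
    product_no INTEGER PRIMARY KEY, total_qty INTEGER);
INSERT INTO sales (product_no, qty) VALUES (1, 5), (1, 7), (2, 3);
""")


def refresh_totals():
    """Batch job: rebuild the stored totals from the detail rows."""
    with conn:  # one transaction, so readers never see a half-built table
        conn.execute("DELETE FROM product_totals")
        conn.execute(
            "INSERT INTO product_totals "
            "SELECT product_no, SUM(qty) FROM sales GROUP BY product_no"
        )


def stored_total(product_no):
    """Read the pre-calculated total without touching the detail rows."""
    row = conn.execute(
        "SELECT total_qty FROM product_totals WHERE product_no = ?",
        (product_no,),
    ).fetchone()
    return row[0]


refresh_totals()
total_initial = stored_total(1)

conn.execute("INSERT INTO sales (product_no, qty) VALUES (1, 10)")
total_stale = stored_total(1)   # unchanged: the batch has not run yet

refresh_totals()
total_fresh = stored_total(1)   # now includes the new sale

print(total_initial, total_stale, total_fresh)
```

How much staleness is acceptable between batch runs is a requirements question rather than a database one.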

Applying Denormalisation Safely
DBMS features:
- Integrity checks
- Triggers
  - Automatic duplicated updates
- Views
  - Hide the denormalised relations
  - Maintain the normalised model
- Materialised views
  - Pre-executed queries, e.g. summaries, joins, etc.
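A trigger that keeps duplicated data in step, together with a view that hides the extra column, can be sketched as below. This assumes SQLite through Python's sqlite3 module; trigger syntax differs between DBMSs. The trigger name, the view name and the new project name 'ADB2' are hypothetical, and the tables are a cut-down version of the Project and EmployeeOnProject example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (pcode TEXT PRIMARY KEY, pname TEXT, budget INTEGER);
-- Denormalised: pname is duplicated next to each pcode.
CREATE TABLE employee_on_project (
    empno INTEGER, pcode TEXT, pname TEXT, PRIMARY KEY (empno, pcode));

INSERT INTO project VALUES ('A', 'DBS1', 10), ('C', 'ADB', 15);
INSERT INTO employee_on_project VALUES
    (100, 'A', 'DBS1'), (100, 'C', 'ADB'), (200, 'C', 'ADB');

-- Trigger: a renamed project is propagated to every duplicated copy.
CREATE TRIGGER sync_pname AFTER UPDATE OF pname ON project
BEGIN
    UPDATE employee_on_project
    SET pname = NEW.pname
    WHERE pcode = NEW.pcode;
END;

-- View: exposes only the normalised columns to applications.
CREATE VIEW employee_on_project_normalised AS
    SELECT empno, pcode FROM employee_on_project;
""")

# Hypothetical rename of project C; the trigger updates the copies.
conn.execute("UPDATE project SET pname = 'ADB2' WHERE pcode = 'C'")

copies = [
    row[0]
    for row in conn.execute(
        "SELECT pname FROM employee_on_project "
        "WHERE pcode = 'C' ORDER BY empno"
    )
]
view_columns = [
    column[0]
    for column in conn.execute(
        "SELECT * FROM employee_on_project_normalised"
    ).description
]

print(copies, view_columns)
```

The application updates only the project table; the redundancy is maintained by the DBMS rather than by application code.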

Materialised Views
- Snapshots of a query
- Unlike views, materialised views are executed and the results stored in the database
- Example (simplified!):

    create materialized view sum_sales as
      select product_no, sum(qty) sum_qty
      from sales
      group by product_no;

  Result columns: product_no, sum_qty

Materialised Views: Query Rewriting
- Optimiser can identify queries that can be answered more efficiently using the MV
- Example query:

    select sum(qty) from sales

- Rewritten and executed as:

    select sum(sum_qty) from sum_sales

- MV sum_sales is a much smaller table
  - Queried more quickly
  - Potentially significant increase in performance
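The equivalence that query rewriting relies on can be checked by hand. The sketch below assumes SQLite through Python's sqlite3 module. SQLite has no materialised views and no automatic query rewriting, so an ordinary table named sum_sales stands in for the MV and the "rewritten" query is written manually; the sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail rows.
CREATE TABLE sales (product_no INTEGER, qty INTEGER);
INSERT INTO sales VALUES (1, 5), (1, 7), (2, 3), (3, 4), (3, 6);

-- Stand-in for the materialised view: the query result stored as a table.
CREATE TABLE sum_sales AS
    SELECT product_no, SUM(qty) AS sum_qty
    FROM sales
    GROUP BY product_no;
""")

# Query as the user writes it, against the normalised table.
direct = conn.execute("SELECT SUM(qty) FROM sales").fetchone()[0]
# Query as an optimiser could rewrite it, against the summary.
rewritten = conn.execute("SELECT SUM(sum_qty) FROM sum_sales").fetchone()[0]

detail_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM sum_sales").fetchone()[0]

print(direct, rewritten, detail_rows, summary_rows)
```

The rewrite is valid because a sum of per-product sums equals the overall sum; the same does not hold for every aggregate (an average of averages, for instance, is generally not the overall average).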

Materialised Views: Advantages
- Pre-calculated queries are much faster
- Similar to clustering and partitioning
  - Unlike clustering, can have many MVs
- Oracle can rewrite queries to use an MV automatically
  - Users query a normalised data model (logical model)
  - Optimiser uses the denormalised model (physical model)
- MVs are updated automatically
  - Changing the normalised tables changes the MVs

Materialised Views: Disadvantages
- Complex
  - Difficult to set up correctly
- Performance penalty
  - Updating can be difficult and slow
- Not all DBMSs support MVs
  - Oracle
  - SQL Server, where they are called indexed views