Database Design and the Reality of Normalisation




Proceedings of the NACCQ 2000, Wellington NZ, www.naccq.ac.nz

Dave Kennedy
Institute of Technology Christchurch Polytechnic Te Whare Runanga O Otautahi
kennedyd@chchpoly.ac.nz

ABSTRACT

What is normalisation all about? Why do we teach it? How do we teach it? How can we explain normalisation to our students so that they will understand it? This paper presents a method of teaching normalisation that, experience has shown, students can understand. The paper also considers the broader questions of: Why is normalisation important? Where does it fit in the process of database design? How important is it in the real world?

Database design can be done using an entity relationship diagram (ERD), a top-down approach, or by normalisation of sets of data, a bottom-up approach. The question is: What do real database designers do? What methodologies do they use? How important is normalisation? What normalisation rules do they use, i.e. how far do they take it? How important is denormalisation? This paper presents a summary of findings, from interviews with database designers, that should help us in our teaching of database design.

Keywords: Normalisation, database design, dependency diagram

1. INTRODUCTION

"I use a mostly ERD approach to database design but I don't do it unaware of normalisation" - Database designer

What is normalisation all about? Why do we teach it? How do we teach it? How can we explain normalisation to our students so that they will understand it? This paper presents a method of teaching normalisation that, experience has shown, students can understand. The paper also considers the broader questions of: Why is normalisation important? Where does it fit in the process of database design? How important is it in the real world?

The teaching method I use is based on Rob & Coronel (1997). The reality check was made by interviewing three industry people involved in relational database design. If we follow the ERD approach and then check the resulting tables against the normalisation rules, we usually only need to go to 3NF. If we blindly normalise sets of data, we will often need to go to 4NF.

2. NORMALISATION

2.1 What is Normalisation

Normalisation is a set of rules which can be used to modify the way data is stored in tables (Rob & Coronel; Lecture #5; Date; Date & Fagin). McFadden & Hoffer define normalisation as the process of converting complex data structures into simple, stable data structures. There are rules for 1NF, 2NF, 3NF, BCNF, 4NF, 5NF and domain-key NF (DKNF). Most textbooks mention 5NF and DKNF only in passing and note that they are not particularly applicable to the design process (Rob & Coronel, p. 303; Pratt & Adamski, p. 161; Howe, p. 87).

Normalisation is really about the formalisation of simple ideas (Date & Fagin). All too often the simplicity is lost in esoteric terminology, and papers are often excessively concerned with the formalism and provide very little practical insight (Date & Fagin).

2.2 Why Normalisation

Normalisation is about designing a good database, i.e. a set of related tables with a minimum of redundant data and no update, delete or insert anomalies.

Normalisation is a bottom-up approach to database design. The designer interviews users and collects documents such as reports. The data on a report can be listed and then normalised to produce the required tables and attributes.

Normalisation is also used to repair a bad database design: given a set of tables that exhibit update, delete and insert anomalies, the normalisation process can be used to change this set of tables into a set that does not have these problems.

Another approach to database design is to use Entity-Relationship Diagrams (ERD). This is a top-down approach. An entity is a thing about which we wish to store data. An ERD models the entities, their attributes and the relationships between them. The ERD rules are:

1. Each entity has its own table.
2. M-M relationships are resolved by creating a composite (bridge) entity which has, at least, the primary keys of its parent entities.
3. The aim is to minimise data redundancy.

[Figure 1. Steps in Normalisation]
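ERD rule 2 can be illustrated with a small sketch. The following uses Python's built-in sqlite3 module; the table and column names (student, class, enrolment and their columns) are illustrative assumptions, not taken from the paper. It shows a many-to-many relationship resolved by a bridge table whose primary key is composed of the primary keys of its two parents.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT NOT NULL);
    CREATE TABLE class (class_code TEXT PRIMARY KEY);
    -- Bridge (composite) entity: carries at least the primary keys of both parents.
    CREATE TABLE enrolment (
        class_code TEXT NOT NULL REFERENCES class(class_code),
        stud_id INTEGER NOT NULL REFERENCES student(stud_id),
        PRIMARY KEY (class_code, stud_id)
    );
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1000, "Peter"), (1010, "Raewyn")])
conn.executemany("INSERT INTO class VALUES (?)", [("PR203A",), ("DB100B",)])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [("PR203A", 1000), ("PR203A", 1010), ("DB100B", 1000)])


def classes_for(stud_id):
    """Return the class codes a student is enrolled in, in alphabetical order."""
    rows = conn.execute(
        "SELECT class_code FROM enrolment WHERE stud_id = ? ORDER BY class_code",
        (stud_id,),
    )
    return [row[0] for row in rows]


print(classes_for(1000))
```

Because the composite primary key covers both parent keys, inserting the same student into the same class twice is rejected with an integrity error, so the bridge table stores each pairing once.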

When teaching normalisation I often find myself thinking: "There are 2 or 3 entities involved here and there are various M-M relationships between them. If I sorted that out first then I wouldn't have to go through this complicated normalisation stuff." Rob & Coronel (p. 226) and Harrington suggest that the best approach is a combined, iterative methodology. Database design is not just about normalisation, although normalisation is a useful aid in the process of database design (Date).

2.3 Normalisation - The How

There are many textbooks and websites that attempt to explain the process and the rules of normalisation. There are many different ways to explain normalisation, and some are easier to understand than others. I have found the following (based on Rob & Coronel and McFadden & Hoffer) to be the most understandable. See Figure 1, Steps in Normalisation.

2.4 Definitions

1NF    First Normal Form        Atomic values; a primary key; no repeating groups
2NF    Second Normal Form       There are no partial dependencies
3NF    Third Normal Form        There are no transitive dependencies
BCNF   Boyce-Codd Normal Form   Every determinant is a candidate key
4NF    Fourth Normal Form       There are no multi-valued dependencies

I find these to be the most easily understood definitions of 1NF to 4NF. It is worth noting the Date & Fagin conditions that guarantee 4NF and 5NF, viz.:

4NF - the relation is in BCNF and some key is simple.
5NF - the relation is in 3NF and every key is simple.

A simple key is a single-attribute key.

2.5 Normalisation - a Teaching Method

1. Create a table and insert representative data, with as much redundancy as possible
2. Identify the Primary Key
3. Draw a dependency diagram
4. Remove partial dependencies
5. Remove transitive dependencies
6. Check for BCNF, i.e. remove any other dependencies whose determinants are not candidate keys
7. Remove multi-valued dependencies

Example

Normalise the following relation to BCNF.

Classes (Staff#, StaffName, {ClassCode {StudID, StudName, Grade, ClassPos}})

A staff member can teach more than one class. A student can be in more than one class. A class is taught by only one staff member and can have many students. Grade is the student's final grade in a specific class. ClassPos is the student's final position in a specific class, and is unique for that class.

Staff#  StaffName  ClassCode  StudID  StudName  Grade  ClassPos
S101    Smith      PR203A     1000    Peter     B      5
S101    Smith      PR203A     1010    Raewyn    A      2
S101    Smith      DB100B     1000    Peter     B      7
S101    Smith      DB100B     1020    Sue       A      5
S102    Jones      SF100C     1010    Raewyn    A      3
S102    Jones      SF100C     1020    Sue       B      8
etc

Table 1

The Primary Key is ClassCode + StudID.
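Steps 2 to 5 of the teaching method can be sketched in code against the sample rows of Table 1. This is a minimal illustration, not the paper's own procedure: the helper functions (holds, project, natural_join) and the names of the resulting tables (enrolment, classes, staff, students) are assumptions introduced here, and the decomposition is inferred from the business rules stated in the example. Note that sample data can only refute a dependency; whether a dependency really holds comes from the business rules, not from six rows.

```python
# Table 1 from the example, one tuple per row.
COLUMNS = ("Staff#", "StaffName", "ClassCode", "StudID", "StudName", "Grade", "ClassPos")
TABLE_1 = [
    ("S101", "Smith", "PR203A", 1000, "Peter", "B", 5),
    ("S101", "Smith", "PR203A", 1010, "Raewyn", "A", 2),
    ("S101", "Smith", "DB100B", 1000, "Peter", "B", 7),
    ("S101", "Smith", "DB100B", 1020, "Sue", "A", 5),
    ("S102", "Jones", "SF100C", 1010, "Raewyn", "A", 3),
    ("S102", "Jones", "SF100C", 1020, "Sue", "B", 8),
]
ROWS = [dict(zip(COLUMNS, values)) for values in TABLE_1]


def holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs attribute but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate rows."""
    result = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in result:
            result.append(item)
    return result


def natural_join(left, right):
    """Join two lists of dicts on the attributes they have in common."""
    joined = []
    for a_row in left:
        for b_row in right:
            shared = set(a_row) & set(b_row)
            if all(a_row[k] == b_row[k] for k in shared):
                joined.append({**a_row, **b_row})
    return joined


# Step 2: ClassCode + StudID determines every attribute in the sample.
key_ok = holds(ROWS, ["ClassCode", "StudID"], COLUMNS)

# Step 3: dependencies that would be drawn on a dependency diagram.
partial_1 = holds(ROWS, ["ClassCode"], ["Staff#", "StaffName"])  # partial dependency
partial_2 = holds(ROWS, ["StudID"], ["StudName"])                # partial dependency
transitive = holds(ROWS, ["Staff#"], ["StaffName"])              # transitive dependency
candidate = holds(ROWS, ["ClassCode", "ClassPos"], COLUMNS)      # second candidate key
not_fd = holds(ROWS, ["StudID"], ["Grade"])                      # refuted by student 1020

# Steps 4 and 5: split the relation along those dependencies.
enrolment = project(ROWS, ["ClassCode", "StudID", "Grade", "ClassPos"])
classes = project(ROWS, ["ClassCode", "Staff#"])
staff = project(ROWS, ["Staff#", "StaffName"])
students = project(ROWS, ["StudID", "StudName"])

# Joining the four tables back together should reproduce Table 1 exactly.
rebuilt = natural_join(natural_join(natural_join(enrolment, classes), staff), students)
lossless = sorted(tuple(r[c] for c in COLUMNS) for r in rebuilt) == sorted(TABLE_1)

print(key_ok, partial_1, partial_2, transitive, candidate, not_fd, lossless)
```

The redundancy in Table 1 shows up in the row counts: the six original rows reduce to two staff rows, three class rows and three student rows, with only the enrolment table keeping one row per student per class.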

[Figure 2. Dependency Diagram]

2NF: Remove the partial dependencies.

[Figure 3. 2NF Dependency Diagram]

3NF & BCNF: Remove the transitive dependency.

[Figure 4. 3NF & BCNF Dependency Diagram]

This relation satisfies BCNF because ClassCode + ClassPos is a candidate key.

2.6 Normalisation - The Reality

Both Pratt and Date note that in reality we usually eliminate multiple repeating groups at the 1NF stage. This, in effect, is what the ERD composite entity creation does too. 5NF and DKNF are of little practical significance (Howe; Rob & Coronel), and if the ERD and normalisation are done in conjunction then usually 3NF is sufficient to produce a good database design (Harrington; Rob & Coronel).

The question is: What do real database designers do? What methodologies do they use? How important is normalisation? What normalisation rules do they use, i.e. how far do they take it? How important is denormalisation?

2.6.1 Method

Three database designers, from three different companies, were interviewed. They were asked the following questions:

1. What process/methodologies do you use in designing a relational database?
2. How far do you take the normalisation process?
3. Is normalisation important?
4. Is de-normalisation a consideration? When?

2.6.2 Summary of Responses

Question 1: What process/methodologies do you use in designing a relational database?

- Identify input sources and output requirements
- Identify data catchment points
- Design input screens
- Build the ERD and resolve M-M relationships
- Use auto generators for Primary Keys most of the time, especially for tables with composite keys (except when the table is only a bridge)

- Talk to users
- Identify requirements
- Build an ERD and resolve M-M relationships
- "I'm very much an entity person"
- Repeat the above steps - take it back to people and ask about scenarios
- "I hate writing one line of code until the data structures are settled"
- Use composite keys in bridging tables
- Use composite keys in other tables with caution

- Interview users - what do they want - specifics
- Define outcomes
- From outcomes work back to define entities and build an ERD
- Use this to talk to users again
- Build a prototype - and use this to talk to users
- Iterative prototyping - the prototype becomes the documentation - used to promote communication and understanding
- Resolve M-M relationships
- Auto generated numeric Primary Keys - except for bridge entities

Question 2: How far do you take the normalisation process?

- The method forces 4NF because you have already resolved M-M relationships
- Auto generated PKs force 2NF
- Each entity has its own table - forces 3NF
- BCNF doesn't occur
- ERD method resolves 4NF issues
- I only go to 3NF
- Payback not there to go beyond 3NF
- The nature of the data is such that it is not often updated

Question 3: Is normalisation important?

- Often a major issue when you take over someone else's work - often the problems are normalisation problems
- I use an ERD approach but I don't do it unaware of normalisation
- Important that the design is at least 3NF or you will have problems with changes
- I use normalisation when looking at other people's tables
- Useful in a trouble-shooting role

Question 4: Is de-normalisation a consideration? When?

- Important to get the tables right before you denormalise - only then do you know you are doing it safely
- Usually for performance - especially when response times are an issue - i.e. tables involved in the client, customer interface
- De-normalisation is not done often - maybe 2 or 3 tables in a database of 120 tables
- Needs to be controlled - code required for this
- Yes - but not often. Usually for performance - often money totals are added to a table
- Use triggers to keep database in sync
- I don't de-normalise when using Access because it doesn't have triggers
- I don't de-normalise

3. CONCLUSIONS

Normalisation is an important process that database designers need to know. The database designers that I interviewed use an ERD approach to database design but they are aware of normalisation problems. Their approach usually results in tables normalised to 4NF. Normalisation is most useful when analysing other people's database designs or when there are problems with a database.

If we are to teach normalisation so that students can understand it then we need to keep it simple. I have found that if they add rows of data to a table and then draw a dependency diagram it improves their understanding of the process. In the light of the interviews conducted for this research I would suggest allowing students to draw ER Diagrams in conjunction with the normalisation process. The normalisation process should also eliminate multiple repeating groups at the 1NF stage, thus avoiding the need to check for 4NF. It is important to note that in reality database designers add auto generated numeric Primary Keys to most tables, which ensures at least 2NF.

4. REFERENCES

Date, C. J. (1986) An Introduction to Database Systems. V 1, Fourth edition, Addison-Wesley.
Date, C. J. & Fagin, R. (1992) Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases. ACM TODS, V 17, 3, Sept, pp 465-476.
Harrington, J. L. (1998) Relational Database Design Clearly Explained. AP Professional.
Howe, D. R. (1983) Data Analysis for Data Base Design. Edward Arnold.
Lecture #5 (2000) Data Normalisation. Accessed April 17, 2000. <http://phoenix.ucr.edu/mis/mgt230/lecture5/index.html>
McFadden, F. R. & Hoffer, J. A. (1994) Modern Database Management. Fourth Edition, The Benjamin/Cummings Publishing Company.
Pratt, P. J. & Adamski, J. J. (1987) Database Systems: Management and Design. Boyd & Fraser.
Rob, P. & Coronel, C. (1997) Database Systems: Design, Implementation and Management. Course Technology.