Chapter 9: Normalization




Part 1: A Simple Example
Part 2: Another Example & The Formal Stuff

A Problem: Keeping Track of Invoices
Suppose we have some invoices that we may or may not want to refer to later.

A Problem: Keeping Track of Invoices (cont'd)
Fig. 9.1
Could store the invoices in an Excel file but, as seen, might have problems with complex questions about the data:
1. How many 4 bolts did Frankenstein Parts order in 2002?
2. What items were sold on a certain date?
Solution: a normalized database.

Solution: NF1
First Normal Form (NF1): No Repeating Elements or Groups of Elements
In Fig. 9.1, rows 2, 3 and 4 represent invoice 125, which in DB terms is a single tuple.
In NF1 we want to get rid of repeating elements: columns H2 to H4, J2 to J4, K2 to K4, etc. These contain lists of values, which NF1 forbids.
NF1 wants atomicity: each attribute is simple & indivisible.
The repeating data for invoice 125 is in cells H2-M2, H3-M3 and H4-M4.
Can satisfy this part of NF1 simply by separating each item in these lists into its own row (see Fig. 9.2).
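The row-splitting step described above can be sketched in Python. This is only an illustration under stated assumptions: the record layout, the item_qty column name and all values are hypothetical stand-ins for the spreadsheet in Fig. 9.1, whose contents are not reproduced in this transcript.

```python
# Hypothetical invoice in the spreadsheet style: one record whose item
# columns hold lists of values (the repeating groups NF1 forbids).
invoice = {
    "order_id": 125,
    "customer_id": 2,             # assumed value
    "item_id": [563, 851, 652],   # assumed values
    "item_qty": [4, 32, 5],       # assumed column name and values
}

def to_first_normal_form(record, repeating):
    """Return one row per list position, copying the scalar columns."""
    scalars = {k: v for k, v in record.items() if k not in repeating}
    rows = []
    for values in zip(*(record[col] for col in repeating)):
        row = dict(scalars)
        row.update(zip(repeating, values))
        rows.append(row)
    return rows

rows = to_first_normal_form(invoice, ["item_id", "item_qty"])
for row in rows:
    print(row)
```

Each output row now holds a single value per column, at the cost of repeating the order-level data, which is the extra data the next slide complains about.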

Solution: NF1 (cont'd)
Fig. 9.2
But we were trying to reduce & simplify, and now have introduced more data! No matter, this will be addressed later (with NF3).

Solution: NF1 (cont'd)
Have only done half of NF1. NF1 addresses:
1. A row of data can't have repeating groups of similar data (atomicity).
2. Each row of data must have a unique identifier (or Primary Key).
To look at 2., have to convert Fig. 9.2 into an RDBMS table (see the table in MS Access, Fig. 9.3).
Fig. 9.3
As can be seen, no single column identifies each row, so have to use two together: order_id & item_id.
Together, the concatenated primary key identifies each row.
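A concatenated primary key can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not the MS Access table from Fig. 9.3: the table name orders, the item_qty column and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (            -- table name is an assumption
        order_id INTEGER NOT NULL,
        item_id  INTEGER NOT NULL,
        item_qty INTEGER,            -- assumed column name
        PRIMARY KEY (order_id, item_id)
    )
""")
conn.execute("INSERT INTO orders VALUES (125, 563, 4)")
conn.execute("INSERT INTO orders VALUES (125, 851, 32)")  # same order, other item: allowed
conn.execute("INSERT INTO orders VALUES (126, 563, 1)")   # same item, other order: allowed

try:
    # Same (order_id, item_id) pair as the first row: violates the key.
    conn.execute("INSERT INTO orders VALUES (125, 563, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither column is unique on its own, but the pair is, which is what the concatenated key expresses.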

Solution: NF1 (cont'd)
The underlying structure of the table can be represented as Fig. 9.4.
Fig. 9.4 (table structure; columns legible in this transcript: order_date, customer_id, customer_address, customer_state, item_total_price, order_total_price)
Identify the columns that make up the primary key with the PK notation.
Fig. 9.4 begins the Entity Relationship Diagram (or ERD).
The DB schema now satisfies the 2 requirements of NF1: atomicity & uniqueness. Thus it meets the most basic criteria of a relational DB.

Solution: NF2
Second Normal Form (NF2): No Partial Dependencies on a Concatenated Key
Next, have to test each table for partial dependencies on a concatenated key.
For a table with a concatenated primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence.
If a column depends upon only one part of the concatenated key, then the entire table has failed NF2 & must create another table to fix it.
For each column, ask the question: can this column exist without one or the other part of the concatenated primary key?
If the answer is yes, even once, the table fails NF2.

Solution: NF2 (cont'd)
Refer to Fig. 9.4 again to recall the table structure.
Fig. 9.4 (repeated)
Recall the meaning of the two columns in the primary key:
order_id identifies the invoice this item comes from.
item_id is the inventory item's unique identifier. Can think of it as a part number.
Don't analyze these columns (since they are part of the primary key). Instead consider the remaining columns...

Solution: NF2 (cont'd)
order_date is the date on which the order was made.
  It relies on order_id: an order date has to have an order, otherwise it is only a date.
  Can an order date exist without an item_id? Yes: order_date relies on order_id, not item_id (a specific order doesn't have to have a specific item).
  So order_date fails NF2.
customer_id is the ID of the customer who placed the order.
  Does it rely on order_id? No: a customer can exist without placing any order.
  Does it rely on item_id? No (same reason).
  customer_id does not rely on either member of the PK.
  What to do? NF3 will come to the rescue here, hence the "?" against it and against all the rest of the customer_* columns.

Solution: NF2 (cont'd)
item_description is the next column that is not itself part of the PK. It is the plain-language description of the inventory item.
  It relies on item_id, but can it exist without an order_id? Yes! An inventory item (& its "description") could sit on a shelf and never be purchased. It can exist independent of an order.
  item_description fails the test.
The quantity column is the number of items purchased on a particular invoice.
  Can it exist without an item_id? No: can't have an "amount of nothing".
  Can it exist without an order_id? No: a quantity purchased with an invoice is meaningless without an invoice.
  So this column does not violate NF2: it depends on both parts of our concatenated PK.

Solution: NF2 (cont'd)
The price column is similar to item_description. It depends on item_id but not on order_id, so it does violate NF2.
item_total_price is tricky:
  It seems to depend on both order_id & item_id, so it passes NF2.
  But it is a derived value: it is the quantity times the price.
  So, in fact, it doesn't belong in the DB at all. It can easily be reconstructed outside the DB; to include it would be redundant (and could quite possibly introduce corruption).
  Therefore can discard it.
order_total_price, the sum of all the item_total_price fields for a particular order, is another derived value.
  Can discard this field too, for the same reason as item_total_price.
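To show why the two derived columns can be dropped, here is a minimal Python sketch that recomputes them on demand. The tuple layout, the use of cents for prices and all values are assumptions made for the illustration.

```python
# Hypothetical order lines: (order_id, item_id, quantity, unit price in cents).
order_lines = [
    (125, 563, 4, 15),
    (125, 851, 32, 500),
    (126, 563, 1, 15),
]

def item_total_price(quantity, unit_price):
    """Derived value: quantity times price, never stored."""
    return quantity * unit_price

def order_total_price(lines, order_id):
    """Derived value: sum of the item totals for one order."""
    return sum(
        item_total_price(qty, price)
        for oid, _item_id, qty, price in lines
        if oid == order_id
    )

total_125 = order_total_price(order_lines, 125)
total_126 = order_total_price(order_lines, 126)
print(total_125, total_126)
```

Because the totals are computed from the stored quantity and price, they cannot drift out of step with them, which is the corruption risk the slide mentions.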

Solution: NF2 (cont'd)
Fig. 9.4 (columns legible in this transcript: order_date, customer_id, customer_address, customer_state, item_total_price, order_total_price)
Fig. 9.4 (New) (the same structure annotated with the results of the NF2 test; the customer_* columns carry a "?")

Solution: NF2 (cont'd)
What to do with a table that fails NF2, as this one has?
First, take out the second half of the concatenated PK (item_id) & put it in its own table.
All columns that depend on item_id, whether in whole or in part, follow it into the new table, order_items (see Fig. 9.5).
The other fields, those that rely on just the first half of the PK (order_id) and those we aren't sure about, stay where they are.
Fig. 9.5 (two tables: the original table, keeping order_date, customer_id, customer_address and customer_state, and the new order_items table)

Solution: NF2 (cont'd)
Things to notice about Fig. 9.5:
1. Have brought a copy of order_id to the order_items table, to allow each order_item to "remember" which order it is a part of.
2. The orders table has fewer rows than before & no longer has a concatenated PK. Its PK consists of a single column, order_id.
3. The order_items table does have a concatenated primary key.
The crow's feet in Fig. 9.5 mean: each order can be associated with any number of order-items, but at least one; each order-item is associated with one order, and only one.
Fig. 9.5 (repeated)

Solution: NF2 Phase II
Remember, NF2 only applies to tables with a concatenated PK.
Now that the orders table has a single-column PK, it has passed NF2.
order_items, however, still has a concatenated PK, so have to pass it through the NF2 analysis again to see if it passes.
Ask the same question as before: can this column exist without one or the other part of the concatenated PK?
Fig. 9.6 shows the order_items table structure.
Fig. 9.6 (order_items)
item_description relies on item_id, but not order_id, so this again fails NF2.
The quantity column relies on both parts of the PK, so it does not violate NF2.
The price column relies on item_id but not on order_id, so it does violate NF2.

Solution: NF2 Phase II (cont'd)
Fig. 9.6 (order_items, still containing item_description)
Fig. 9.6 (New) (order_items)
On the first pass through the NF2 test, removed all fields relying on item_id & put them into a new table. This time, only taking the fields that fail the test: i.e. the quantity column stays.
What's different this time?
First pass: removed the item_id key from the orders table altogether, because of the 1:M relationship between orders & order-items. Therefore the quantity field had to follow item_id into the new table.
Second pass: item_id wasn't taken from the order-items table, because of the M:1 relationship between order-items & items. Therefore, since the quantity column does not violate NF2 this time, it is permitted to stay in the table with the two PK parts that it relies on.

Solution: NF2 Phase II (cont'd)
The crow's feet in Fig. 9.7 mean: each item can be associated with any number of lines on any number of invoices, including zero; each order-item is associated with one item, and only one.
These two lines are examples of 1:M relationships.
This 3-table structure is how to express an M:N relationship: each order can have many items; each item can belong to many orders.
Notes:
Didn't bring a copy of the order_id column into the new table, because individual items needn't know the orders they are part of; order_items remembers this relationship via the order_id & item_id columns.
Taken together these columns comprise the PK of order_items, but taken separately they are FKs to rows in other tables.
The new table does not have a concatenated PK, so it passes NF2.
Fig. 9.7 (three tables: orders with order_date, customer_id, customer_address, customer_state; order_items; items)
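The three-table structure can be sketched with Python's sqlite3 module. This is an illustration, not the schema from Fig. 9.7: the column names item_qty and item_price, the omission of the customer columns, and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT                  -- customer columns omitted here
    );
    CREATE TABLE items (
        item_id          INTEGER PRIMARY KEY,
        item_description TEXT,
        item_price       INTEGER         -- assumed column name
    );
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        item_id  INTEGER NOT NULL REFERENCES items(item_id),
        item_qty INTEGER,                -- assumed column name
        PRIMARY KEY (order_id, item_id)
    );
    INSERT INTO orders VALUES (125, '2002-09-13'), (126, '2002-09-14');
    INSERT INTO items VALUES (563, 'bolt', 15), (851, 'spanner', 500),
                             (900, 'washer', 2);
    INSERT INTO order_items VALUES (125, 563, 4), (125, 851, 32), (126, 563, 1);
""")

# One order, many items.
items_in_125 = [r[0] for r in conn.execute(
    "SELECT item_id FROM order_items WHERE order_id = 125 ORDER BY item_id")]
# One item, many orders.
orders_with_563 = [r[0] for r in conn.execute(
    "SELECT order_id FROM order_items WHERE item_id = 563 ORDER BY order_id")]
# An item may appear on zero invoices.
never_ordered = [r[0] for r in conn.execute(
    "SELECT item_id FROM items WHERE item_id NOT IN "
    "(SELECT item_id FROM order_items)")]
print(items_in_125, orders_with_563, never_ordered)
```

The two 1:M relationships meet in order_items, which is how the M:N relationship between orders and items is stored.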

Solution: NF3
Third Normal Form (NF3): No Dependencies on Non-Key Attributes
Can now return to the problem of repeating customer information.
As the DB stands, if a customer places more than one order, have to input the customer's contact info again, because there are columns in the orders table that rely on "non-key attributes".
To understand this, consider order_date. Can it exist independent of order_id? No: an "order date" is meaningless without an order. order_date depends on a key attribute (order_id is a "key attribute" because it is the table's PK).
What about the customer name column: can it exist on its own, outside of the orders table? Yes. It is meaningful to talk about a customer name without referring to an order or invoice.

Solution: NF3 (cont'd)
The same goes for the other customer columns, including customer_address and customer_state.
These 4 columns actually rely on customer_id, which is not a key in this table (it is a non-key attribute).
These fields belong in their own table, customers, with customer_id as PK (see Fig. 9.8).
However, notice in Fig. 9.8 that the relationship has been severed between the orders table and the customer data that used to inhabit it.
Fig. 9.8 (orders with customer_id (FK), order_date; order_items; items; customers with customer_id (PK), customer_address, customer_state)

Solution: NF3 (cont'd)
Restore the relationship by creating a foreign key (indicated by (FK)) in the orders table. As we know, an FK is a column that points to the PK in another table.
Fig. 9.9 describes this relationship, and shows our completed ERD.
The relationship between orders & customers may be expressed this way: each order is made by one, and only one, customer; each customer can make any number of orders, including zero.
Fig. 9.9 (orders with customer_id (FK), order_date; order_items; items; customers with customer_id (PK), customer_address, customer_state)

Solution: NF3 (cont'd)
Last point to note: the order_id and item_id columns in order_items serve a dual purpose. Not only do they function as the (concatenated) PK for order_items, they also individually serve as FKs to the orders table and the items table respectively.
This is shown in Fig. 9.10.
Fig. 9.10 (orders with customer_id (FK), order_date; order_items with order_id (FK) and item_id (FK) together forming the PK; items; customers with customer_id (PK), customer_address, customer_state)
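With the four tables in place, the first question from the start of the chapter becomes a single query. The following sqlite3 sketch assumes the column names customer_name and item_qty, stores dates as ISO text, and uses invented rows; none of this data comes from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT                   -- assumed column name
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT
    );
    CREATE TABLE items (
        item_id          INTEGER PRIMARY KEY,
        item_description TEXT
    );
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        item_id  INTEGER NOT NULL REFERENCES items(item_id),
        item_qty INTEGER,                    -- assumed column name
        PRIMARY KEY (order_id, item_id)
    );
    INSERT INTO customers VALUES (1, 'Frankenstein Parts'), (2, 'Other Customer');
    INSERT INTO items VALUES (10, 'bolt'), (11, 'nut');
    INSERT INTO orders VALUES (125, 1, '2002-03-01'), (126, 1, '2002-11-20'),
                              (127, 1, '2003-01-05'), (128, 2, '2002-06-30');
    INSERT INTO order_items VALUES (125, 10, 4), (125, 11, 7), (126, 10, 6),
                                   (127, 10, 3), (128, 10, 50);
""")

# How many bolts did Frankenstein Parts order in 2002?
bolts_2002 = conn.execute("""
    SELECT SUM(oi.item_qty)
    FROM order_items AS oi
    JOIN orders    AS o ON o.order_id = oi.order_id
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN items     AS i ON i.item_id = oi.item_id
    WHERE c.customer_name = 'Frankenstein Parts'
      AND i.item_description = 'bolt'
      AND o.order_date BETWEEN '2002-01-01' AND '2002-12-31'
""").fetchone()[0]

# The FK in orders refuses an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (129, 99, '2002-01-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(bolts_2002, orphan_rejected)
```

Customer details are stored once in customers, and every order reaches them through the customer_id foreign key.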

Normalisation (cont'd)

Introduction to Database Design
As we have seen, an important part of database design is deciding on a suitable logical structure, or schema, to implement... this is called database design.
Considering the supplier-parts example (S, P, SP), there is a feeling of correctness.
Normalisation theory is a formalism of simple ideas with a practical application in logical database schema design.
Normalisation theory should allow us to recognise relations with undesirable properties, tell us what is "wrong" & how to "correct" it.

S
S#  SName  Status  City
S1  Smith  20      Paris
S2  Jones  10      Paris
S3  Blake  30      Rome

P
P#  PName  Colour  Weight  City
P1  Nut    Red     12      London
P2  Bolt   Green   17      Paris
P3  Screw  Blue    27      Rome
P4  Screw  Red     14      London

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200

Introduction to Database Design (cont'd)
Normalisation theory is built around normal forms; each normal form has a set of satisfiable criteria.
Normal forms exist in a hierarchy: 1NF -> 2NF -> 3NF -> BCNF -> 4NF -> PJ/NF (5NF)
Codd defined 1NF, 2NF and 3NF in 1972. 3NF had inadequacies, so it was revised in 1974 by Boyce and Codd (BCNF).
In 1977 Fagin defined 4NF, and in 1979 he defined 5NF.
6NF, 7NF? Dependency theory suggests there may be higher NFs, but they are not practicable in a database environment.
DB designers should aim for higher NFs, but this is not law, just a recommendation: normalisation simply provides guidelines for database design. There are often good reasons for not using normalisation theory.

Introduction to Database Design (cont'd)
In order to describe the various normal forms we must first introduce some definitions.
Functional Dependency
Given relation R, attribute Y of R is functionally dependent on X of R (written R.X -> R.Y, or "R.X functionally determines R.Y") iff each R.X value has associated with it precisely one R.Y value, where X and/or Y may be composite.
R.X is called the determinant, R.Y the dependent.
S.SNAME, S.STATUS and S.CITY are each functionally dependent on S.S#.
If R.X is a candidate key or the primary key, then all R.Y must be functionally dependent on R.X.
In SP we have a composite primary key, so SP.(S#,P#) -> SP.QTY
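The definition can be tested mechanically against the S and SP sample data from the supplier-parts example. The helper name holds is hypothetical. Note that a check on sample rows can only refute a dependency; whether it holds in general is a statement about the meaning of the data, not about one instance.

```python
# Sample rows from the supplier-parts example.
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "Paris"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Rome"},
]
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S1", "P#": "P3", "QTY": 400},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400},
    {"S#": "S3", "P#": "P2", "QTY": 200},
]

def holds(rows, X, Y):
    """True if every X value in rows is paired with exactly one Y value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False  # same X value, two different Y values
    return True

print(holds(S, ["S#"], ["CITY"]))         # S.S# -> S.CITY
print(holds(S, ["CITY"], ["S#"]))         # Paris maps to both S1 and S2
print(holds(SP, ["S#", "P#"], ["QTY"]))   # SP.(S#,P#) -> SP.QTY
print(holds(SP, ["S#"], ["QTY"]))         # S1 has several quantities
```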

Introduction to Database Design (cont'd)
There is no requirement in the definition of functional dependence that R.X be a candidate key, thus:
R.X -> R.Y iff whenever 2 tuples of R have the same X value, they also have the same Y value.
R.Y is fully functionally dependent on R.X iff it is functionally dependent on R.X & not functionally dependent on any proper subset of R.X.
Example: S.(S#,STATUS) -> S.CITY is true, but it is not a full functional dependence, as S.S# -> S.CITY.
If R.X -> R.Y but not fully, then R.X must be composite.

Normalisation: Example 2
Given the report in Fig. 9.11, need to put it in a tidy DB.
Problems with the current form:
PROJ_NUM is supposed to be the PK, or part of the PK, but contains nulls. Maybe PROJ_NUM + EMP_NUM will define each row.
The table entries contain inconsistencies (e.g. JOB_CLASS "Elect. Engineer" could be entered as "EE", "E. Eng" or others).
Fig. 9.11

Normalisation: Example 2 (cont'd)
Further problems with the current form:
The table has data redundancies, leading to the following anomalies:
1. Update Anomalies: modifying (e.g.) JOB_CLASS for employee 105 requires lots of alterations (one for each row containing employee 105).
2. Insertion Anomalies: to complete a row definition, a new employee must be given a project; if not yet assigned, a project must be assumed to complete the employee tuple.
3. Deletion Anomalies: if employee 103 quits, every row with EMP_NUM=103 must be deleted, with the potential loss of other data.
Inefficiency: if a large number of new employees are hired, a lot of redundant/unassigned data must be assumed and input.
Integrity: possible data integrity problems may arise out of the above.

Example 2: Conversion to NF1
So, problems with Fig. 9.11:
Data cannot stay as shown in Fig. 9.11, because have to be able to identify all tuples with a PK.
PROJ_NUM cannot be the PK in Fig. 9.11 because of the nulls.
Cannot have the repeating groups shown in Fig. 9.11, so have to alter the table to remove them.
Step 1. Eliminate the repeating groups
Eliminate the null values. Now have Fig. 9.12.
Fig. 9.12

Example 2: Conversion to NF1 (cont'd)
Step 2. Identify the Primary Key
The layout in Fig. 9.12 is only a cosmetic change: need a PK to uniquely identify all tuples. This may be seen to be PROJ_NUM + EMP_NUM.
Step 3. Identify all dependencies
The identification of the PK means already have the following:
PROJ_NUM, EMP_NUM -> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
Fig. 9.12

Example 2: Conversion to NF1 (cont'd)
Step 3 (cont'd)
But there are additional dependencies:
1. The project number determines the project name:
   PROJ_NUM -> PROJ_NAME
2. If know the employee number, also know their name, job classification and charge per hour:
   EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR
3. Knowing the job classification also means knowing the charge per hour:
   JOB_CLASS -> CHG_HOUR
These dependencies are shown in the Dependency Diagram in Fig. 9.13.
Dependency Diagrams are useful for getting an overall view of the relationships among attributes.
Fig. 9.13 (dependency diagram over PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS; legend: Normal, Partial, Transitive)

Example 2: Conversion to NF1 (cont'd)
Looking at Fig. 9.13, can see that:
1. PK attributes are bold, underlined and a different colour.
2. Arrows above the diagram (blue) denote desirable FDs (those based on the PK).
3. Arrows below the diagram (red and green) are less desirable:
   a) Partial Dependencies: dependencies based on part of a composite PK.
      Need only know PROJ_NUM to know PROJ_NAME, so PROJ_NAME is only dependent on part of the PK.
      Need only know EMP_NUM to find EMP_NAME, JOB_CLASS and CHG_HOUR.
   b) Transitive Dependencies: dependency of one non-prime attribute on another.
      From Fig. 9.13, can see that CHG_HOUR is dependent on JOB_CLASS.
      Neither of these is part of the PK (i.e. neither is a Prime Attribute).
Fig. 9.13 (repeated)

Example 2: Conversion to NF1 (cont'd)
Properties of NF1: a table in NF1 must have:
1. All key attributes defined.
2. No repeating groups in the table (i.e. each row/column entry must have only one value).
The problem with Fig. 9.13 is the partial dependencies. These can be eliminated with NF2.
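The three kinds of arrow can be told apart by comparing each determinant with the primary key. The Python sketch below is a simplification that assumes a single composite key and only the dependencies listed for Fig. 9.13; the function name classify is hypothetical.

```python
PRIMARY_KEY = frozenset({"PROJ_NUM", "EMP_NUM"})

# Dependencies listed for Fig. 9.13, as (determinant, dependents).
DEPENDENCIES = [
    ({"PROJ_NUM", "EMP_NUM"}, {"HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]

def classify(determinant, key=PRIMARY_KEY):
    """Label a dependency by how its determinant relates to the key."""
    det = frozenset(determinant)
    if det == key:
        return "normal"        # based on the whole PK
    if det < key:
        return "partial"       # based on only part of the PK
    if not det & key:
        return "transitive"    # determinant is itself non-prime
    return "other"             # mixed case, not present in this example

labels = [classify(det) for det, _ in DEPENDENCIES]
for (det, deps), label in zip(DEPENDENCIES, labels):
    print(sorted(det), "->", sorted(deps), ":", label)
```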

Example 2: Conversion to NF2
Step 1. Identify all key components:
PROJ_NUM
EMP_NUM
PROJ_NUM, EMP_NUM
Each component becomes the key of a new table. Three new tables: project, employee, assign.
Step 2. Identify the dependent attributes
Use the arrows in the dependency diagram (Fig. 9.13) to determine which attributes are dependent on which others:
project(PROJ_NUM, PROJ_NAME), PK: PROJ_NUM
employee(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR), PK: EMP_NUM
assign(PROJ_NUM, EMP_NUM, ASSIGN_HOURS), PK: PROJ_NUM + EMP_NUM
Results are shown in Fig. 9.14.

Example 2: Conversion to NF2 (cont'd)
At this point, most anomalies discussed above have been eliminated, e.g. if want to add/change/delete a project record, only need to alter one row of project.
So a table is in NF2 iff:
1. It is in NF1, and
2. It has no partial dependencies (it can still have transitive dependencies).
Fig. 9.14 still has a transitive dependency, which can generate anomalies: e.g. if the charge per hour changes for a job classification held by many employees, that change must be made for all of them (leading to possible update anomalies).
Resolve transitive dependencies in NF3.
Fig. 9.14 (project: PROJ_NUM, PROJ_NAME; employee: EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR; assign: PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
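The split into project, employee and assign amounts to projecting the NF1 table onto three column sets and dropping duplicate rows. The sketch below uses invented rows, since the contents of Fig. 9.12 are not reproduced in this transcript, and the helper name project_onto is hypothetical.

```python
COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# Hypothetical NF1 rows: one row per (project, employee) pair.
flat = [dict(zip(COLUMNS, values)) for values in [
    (1, "Alpha", 103, "Ann", "Elect. Engineer", 80, 10),
    (1, "Alpha", 105, "Bob", "Programmer", 40, 20),
    (2, "Beta", 105, "Bob", "Programmer", 40, 5),
]]

def project_onto(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

project = project_onto(flat, ("PROJ_NUM", "PROJ_NAME"))
employee = project_onto(flat, ("EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"))
assign = project_onto(flat, ("PROJ_NUM", "EMP_NUM", "HOURS"))  # HOURS becomes ASSIGN_HOURS

print(project)
print(employee)
print(assign)
```

Project "Alpha" and employee 105 each appear once in their own table, so renaming a project or re-grading an employee touches a single row.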

Example 2: Conversion to NF3
Step 1. Identify each new determinant
For each transitive dependency, write its determinant as a PK for a new table (recall: a determinant is any attribute whose value determines other values within a row).
If have 3 transitive dependencies, have 3 different determinants. Here only have one: JOB_CLASS.
Step 2. Identify the dependent attributes
Identify the attributes dependent on each determinant identified in Step 1. Here, have JOB_CLASS -> CHG_HOUR.
Name the table to reflect its contents & function; here, job is ok.
Step 3. Remove the dependent attributes from transitive dependencies
Remove the dependent attributes of each transitive dependency from every table that contains that dependency.
JOB_CLASS remains in the employee table as an FK.

Example 2: Conversion to NF3 (cont'd)
The final dependency diagram is shown in Fig. 9.15.
Fig. 9.15 (project: PROJ_NUM, PROJ_NAME; employee: EMP_NUM, EMP_NAME, JOB_CLASS; job: JOB_CLASS, CHG_HOUR; assign: PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
Or, as 4 tables:
project(PROJ_NUM, PROJ_NAME)
assign(EMP_NUM, PROJ_NUM, ASSIGN_HOURS)
employee(EMP_NUM, EMP_NAME, JOB_CLASS)
job(JOB_CLASS, CHG_HOUR)
A table is in NF3 iff:
It is in NF2, and
It contains no transitive dependencies.
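Moving CHG_HOUR into the job table removes the update anomaly noted for NF2. The sqlite3 sketch below covers only the employee and job tables and uses invented names and rates; it is an illustration, not the slides' own implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (
        JOB_CLASS TEXT PRIMARY KEY,
        CHG_HOUR  REAL
    );
    CREATE TABLE employee (
        EMP_NUM   INTEGER PRIMARY KEY,
        EMP_NAME  TEXT,
        JOB_CLASS TEXT REFERENCES job(JOB_CLASS)   -- stays behind as the FK
    );
    -- Hypothetical sample rows.
    INSERT INTO job VALUES ('Programmer', 40.0), ('Elect. Engineer', 80.0);
    INSERT INTO employee VALUES (103, 'Ann', 'Elect. Engineer'),
                                (105, 'Bob', 'Programmer'),
                                (106, 'Cy',  'Programmer');
""")

# The charge per hour for a job classification changes: one row is updated,
# however many employees hold that classification.
cur = conn.execute("UPDATE job SET CHG_HOUR = 45.0 WHERE JOB_CLASS = 'Programmer'")
rows_changed = cur.rowcount

rates = dict(conn.execute("""
    SELECT e.EMP_NUM, j.CHG_HOUR
    FROM employee AS e
    JOIN job AS j ON j.JOB_CLASS = e.JOB_CLASS
"""))
print(rows_changed, rates)
```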