Part 6. Normalization





Normal Form Overview

Universe of All Data Relations (normalized / unnormalized)
  1st Normal Form
    2nd Normal Form
      3rd Normal Form
        Boyce-Codd Normal Form (BCNF)
          4th Normal Form
            5th Normal Form (PJ/NF)
              Domain/Key Normal Form (DK/NF)

Copyright 1971-2002 Thomas P. Sturm

Universe of Relations

  - Any sequential file is a relation
  - Not all relations are well formed
  - Normalization provides a set of criteria to evaluate the well-formedness of a relation
  - Normal form is only one criterion for determining a good model
  - In general, a sequential file may have repeating groups

Example 1 - suppliers:

  part    suppliers
  diode   (GE, TRW, Mot)
  bulb    (GE, Syl)

Implemented as:

  part    supplier1  supplier2  supplier3
  diode   GE         TRW        Mot
  bulb    GE         Syl

Problem with a Relation Not in First Normal Form

  - Retrieval: which supplier of a part should be retrieved? In which order should suppliers be retrieved?
  - Storage: how many suppliers do you allow for? Which spaces are kept blank, and how?
  - Insert: to add a supplier, need to retrieve all suppliers, add the new supplier to an empty slot, and replace the record
  - Delete: to delete a supplier, need to adjust the vector (read, move around, erase, re-write)
  - Update: to update a supplier name, need to retrieve all suppliers, find the one to alter, and rewrite the entire set of suppliers

Solution to Repeating Group Problem

Eliminate repeating groups by repeating the key

Example 1 - suppliers:

  part    supplier
  diode   GE
  diode   TRW
  diode   Mot
  bulb    GE
  bulb    Syl

This new table has a different key from the old one: the key is now part plus supplier.
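The step of "repeating the key" can be sketched in a few lines of Python. This is only an illustration, not part of the original slides; the names unnormalized, to_first_normal_form, and part_supplier are assumptions introduced here.

```python
# Unnormalized records from Example 1: one record per part,
# each carrying a repeating group (a list) of suppliers.
unnormalized = [
    ("diode", ["GE", "TRW", "Mot"]),
    ("bulb", ["GE", "Syl"]),
]


def to_first_normal_form(records):
    """Emit one (part, supplier) row per supplier, repeating the part key."""
    return [
        (part, supplier)
        for part, suppliers in records
        for supplier in suppliers
    ]


part_supplier = to_first_normal_form(unnormalized)
for row in part_supplier:
    print(row)
```

Each output row holds only atomic values, and a row is identified by the pair (part, supplier) rather than by part alone.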

First Normal Form

All underlying domains contain atomic values only (no vectors / repeating groups)

Example 2 - inventory:

  part#  warehouse#  wh_address  quantity
  100    05          Mpls        200
  100    08          StPaul      300
  200    05          Mpls        250
  200    10          Madison     400
  300    08          StPaul      350

Update Anomalies:

  - UPDATE: the address of a warehouse is stored in many rows; if the address changes, all of those rows must change
  - DELETE: if the last row for a warehouse is deleted, the address is lost
  - INSERT: to insert a new row, the warehouse address must be known

The problem occurs because this table is not focused on one primary key. It is about two things: warehouses, and parts in warehouses.

Solution to Multiple Focus Problems

  - A relation that is in 1NF but not in a higher normal form has a composite key (more than one attribute in the key)
  - Establish 2 relations via projection

Example 2 - inventory:

One table about warehouses:

  warehouse#  wh_address
  05          Mpls
  08          StPaul
  10          Madison

One table about inventory, with a composite key:

  part#  warehouse#  quantity
  100    05          200
  100    08          300
  200    05          250
  200    10          400
  300    08          350

The original table in 1NF can be reconstructed by a join
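As a rough check of the claim that the original table can be rebuilt by a join, the sketch below projects the Example 2 data into the two relations and joins them back on warehouse#. The variable names (inventory_1nf, warehouses, stock, rejoined) are illustrative assumptions, and Python sets stand in for relations.

```python
# Example 2 in 1NF: (part#, warehouse#, wh_address, quantity)
inventory_1nf = [
    (100, "05", "Mpls", 200),
    (100, "08", "StPaul", 300),
    (200, "05", "Mpls", 250),
    (200, "10", "Madison", 400),
    (300, "08", "StPaul", 350),
]

# Projection onto (warehouse#, wh_address); the set removes duplicate rows.
warehouses = {(wh, addr) for _, wh, addr, _ in inventory_1nf}

# Projection onto (part#, warehouse#, quantity).
stock = {(part, wh, qty) for part, wh, _, qty in inventory_1nf}

# Natural join on warehouse#: look up each stock row's warehouse address.
address_of = dict(warehouses)
rejoined = {(part, wh, address_of[wh], qty) for part, wh, qty in stock}

print(rejoined == set(inventory_1nf))  # True: nothing lost, nothing added
```

After the split, each warehouse address is stored once, so changing it touches a single row.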

Second Normal Form

1NF + every non-key attribute is fully functionally dependent on the primary key

Example 3 - departments:

  name    dept  dept_loc
  smith   402   100
  jones   401   200
  king    402   100
  turner  400   200
  olson   401   200

Problem: Functional dependency is transitive

  - The primary key is name
  - dept is functionally dependent on name
  - dept_loc is also functionally dependent on name, but the dependency is transitive because dept functionally determines dept_loc
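One way to see these dependencies in the sample data is a small checker that tests whether a set of rows is consistent with a dependency lhs -> rhs. This is a sketch only: the function name holds and the dictionary layout are assumptions made here, and a sample can refute a dependency but never prove that it holds for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if the sample rows are consistent with lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a later,
        # different value for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"name": "smith", "dept": 402, "dept_loc": 100},
    {"name": "jones", "dept": 401, "dept_loc": 200},
    {"name": "king", "dept": 402, "dept_loc": 100},
    {"name": "turner", "dept": 400, "dept_loc": 200},
    {"name": "olson", "dept": 401, "dept_loc": 200},
]

print(holds(employees, ["name"], ["dept"]))      # True
print(holds(employees, ["dept"], ["dept_loc"]))  # True, so name -> dept_loc is transitive
print(holds(employees, ["dept_loc"], ["dept"]))  # False: location 200 hosts depts 400 and 401
```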

Problems with 2NF Relations

Update Anomalies:

  - UPDATE: location appears many times; if the location of a department changes, must fetch and change all rows containing that location
  - DELETE: if the last row for a department is deleted, the department location information is lost
  - INSERT: to insert a new row, the department location must be known

Solution: Establish 2 relations via projection

Example 3 - departments:

  name    dept
  smith   402
  jones   401
  king    402
  turner  400
  olson   401

and

  dept  dept_loc
  400   200
  401   200
  402   100

Third Normal Form

2NF + every non-key attribute is non-transitively functionally dependent on the primary key

OR

Every non-key attribute is
  - mutually independent (none is functionally dependent on any of the others)
  - fully functionally dependent on the primary key

OR (Kent)

Each attribute in the relation is functionally dependent on the key, the whole key, and nothing but the key

A relation that is 2NF but not 3NF
  - can be split into a collection of 3NF relations by projection
  - can be reconstructed by join

3NF Examples

Example 4 - locations:

  dept#  dept_name    dept_loc
  400    programming  200
  401    financial    200
  402    academic     100
  403    support      300

  - dept# and dept_name are candidate keys
  - dept_loc is the only non-key attribute, and is, by default, non-transitively functionally dependent on the primary key
  - This table is fine - it is only about departments

Example 5 - stock:

  s#  sname  p#   qty
  10  GE     102  1000
  10  GE     103  625
  10  GE     104  2000
  20  TRW    102  500
  20  TRW    105  1200
  30  Syl    103  1300

  - Technically in 3NF
  - qty is the only non-key attribute (like example 1)
  - Candidate keys are (s#, p#) and (sname, p#)
  - The 3NF definition did not require components of an alternate key to be fully functionally dependent on the primary key

Problems with 3NF Relations

  - The problems associated with alternate key components were not recognized in the early formulations of the relational model
  - Such relations have the same update anomalies as second normal form

Solution: Establish 2 relations via projection

Example 5 - stock:

  s#  sname
  10  GE
  20  TRW
  30  Syl

and

  s#  p#   qty
  10  102  1000
  10  103  625
  10  104  2000
  20  102  500
  20  105  1200
  30  103  1300

or [s#, sname] and [sname, p#, qty]

Because of this problem, 3NF (as we have described it) is sometimes referred to as early 3rd Normal Form

BCNF

Boyce-Codd Normal Form: 3NF + every determinant is a candidate key

(Determinant: any attribute on which some other attribute is fully functionally dependent)

  - In example 4, dept# determined dept_name; in example 5, s# determined sname
  - In example 4, dept# was a candidate key
  - In example 5, s# (by itself) was not a candidate key

A relation that is 3NF but not BCNF
  - can be split into a collection of BCNF relations by projection
  - can be reconstructed by join

BCNF is sometimes referred to as late 3rd Normal Form, or even just as 3rd Normal Form
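The BCNF test for Example 5 can be sketched against the sample rows: find a determinant, then ask whether it identifies rows uniquely. The helper names project, determines, and is_superkey are assumptions introduced for this illustration, and the checks only reflect the six sample rows, not the intended rules of the database.

```python
# Example 5 stock rows, in column order (s#, sname, p#, qty).
COLS = ("s#", "sname", "p#", "qty")
stock = [
    (10, "GE", 102, 1000),
    (10, "GE", 103, 625),
    (10, "GE", 104, 2000),
    (20, "TRW", 102, 500),
    (20, "TRW", 105, 1200),
    (30, "Syl", 103, 1300),
]


def project(rows, attrs):
    """Keep only the named columns of each row (duplicates are kept)."""
    return [tuple(row[COLS.index(a)] for a in attrs) for row in rows]


def determines(rows, lhs, rhs):
    """True if every lhs value appears with exactly one rhs value."""
    pairs = set(zip(project(rows, lhs), project(rows, rhs)))
    return len(pairs) == len({left for left, _ in pairs})


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    keys = project(rows, attrs)
    return len(set(keys)) == len(keys)


print(determines(stock, ["s#"], ["sname"]))  # True: s# is a determinant
print(is_superkey(stock, ["s#"]))            # False: s# alone is not a key
print(is_superkey(stock, ["s#", "p#"]))      # True
print(is_superkey(stock, ["sname", "p#"]))   # True
```

A determinant that is not a candidate key is exactly the situation BCNF rules out, which is why the stock table is split.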

3NF to BCNF Example

Example 6 - enrollment

Rules:
  1. For each subject, each student is taught by 1 teacher
  2. Each teacher teaches only 1 subject (don't I wish)
  3. Each subject is taught by several teachers

  Student  Subject  Teacher
  Smith    Math     Dr. White
  Smith    English  Dr. Brown
  Jones    Math     Dr. White
  Jones    English  Dr. Brown
  Doe      Math     Dr. Green

  a. Teacher dependent on Student + Subject
  b. Subject dependent on Teacher
  c. Teacher not dependent on Subject
  d. (Student, Subject) is a candidate key
  e. (Student, Teacher) is also a candidate key

Update anomalies, e.g., Dr. White changes name

Relation in 3NF, but not in BCNF: Teacher is a determinant (b.), but not a candidate key (d. and e.)

Solution to Example 6

Solution: Form two relations:

  Student  Teacher
  Smith    Dr. White
  Smith    Dr. Brown
  Jones    Dr. White
  Jones    Dr. Brown
  Doe      Dr. Green

and

  Teacher    Subject
  Dr. White  Math
  Dr. Brown  English
  Dr. Green  Math

Question: How did we know to break it up this way?

Answer: The rules help us make this decision. In this case, rule 2 gives us the crucial information: once you know the teacher, you know the subject. Therefore, we need two tables to enforce the rule. The [Teacher, Subject] table tells us which one subject each teacher teaches.

Students, in general, need both a subject and a teacher
  - If we specify only the subject, we don't know the teacher
  - If we specify the teacher, however, we do know the subject, because of the rule and the [Teacher, Subject] table
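A minimal sketch of why the two-table form enforces rule 2: if the [Teacher, Subject] relation is held as a mapping keyed by teacher, a teacher cannot be given a second subject, and the subject for any (student, teacher) pair follows by lookup. The names teacher_subject, student_teacher, and enrollment are illustrative assumptions, and a Python dict is used here in place of a keyed table.

```python
# [Teacher, Subject]: keyed by teacher, so each teacher has exactly one subject.
teacher_subject = {
    "Dr. White": "Math",
    "Dr. Brown": "English",
    "Dr. Green": "Math",
}

# [Student, Teacher]: which teachers each student has.
student_teacher = [
    ("Smith", "Dr. White"),
    ("Smith", "Dr. Brown"),
    ("Jones", "Dr. White"),
    ("Jones", "Dr. Brown"),
    ("Doe", "Dr. Green"),
]

# Join on Teacher to rebuild the original (Student, Subject, Teacher) rows.
enrollment = {
    (student, teacher_subject[teacher], teacher)
    for student, teacher in student_teacher
}

for row in sorted(enrollment):
    print(row)
```

In this form a teacher's name is stored once in the [Teacher, Subject] relation, although it still appears in every [Student, Teacher] row for that teacher.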

Fourth Normal Form

4NF and 5NF are relevant only when all attributes in the relation are parts of the key
  - if a relation is in BCNF and has a non-key attribute, it is also in 5NF

Example 7 - skills:

Suppose we wish to store employee job skills and language skills. (An employee may have many of each.)

  employee  skill       language
  Jones     electrical  French
  Jones     electrical  German
  Jones     mechanical  French
  Jones     mechanical  German
  Smith     plumbing    Spanish

In general: if the rows (Jones, x, A) and (Jones, y, B) exist, then the rows (Jones, x, B) and (Jones, y, A) must also exist

The relation is in BCNF, because it is all key... but there is redundancy

Converting to 4NF

Ask the following questions:
  - Could the relation have non-key attributes?
  - Could any combination be missing?

If either answer is NO, need to break up the relation to achieve 4NF

Example 7 - skills:

  employee  skill  language

should be broken up into two relations:

  employee  skill
  Jones     electrical
  Jones     mechanical
  Smith     plumbing

and

  employee  language
  Jones     French
  Jones     German
  Smith     Spanish

if job skill and language are independent

Problems without 4NF

The problem occurs when dealing with multiple, independent facts. How do we represent them in a single relation?

Disjoint:

  employee  skill       language
  Jones     electrical
  Jones     mechanical
  Jones                 French
  Jones                 German
  Smith     plumbing
  Smith                 Spanish

Random mix:

  employee  skill       language
  Jones     electrical  French
  Jones     mechanical  German
  Smith     plumbing    Spanish

  (what to do with extras: repeat, blank, anything?)

Cross product:

  employee  skill       language
  Jones     electrical  French
  Jones     mechanical  French
  Jones     electrical  German
  Jones     mechanical  German
  Smith     plumbing    Spanish

Check for independence!
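The independence check can be approximated on sample data by projecting onto (employee, skill) and (employee, language), joining the projections back on employee, and comparing the result with the original rows. This is a sketch under the assumption that the sample is representative; the function name splits_losslessly is hypothetical.

```python
def splits_losslessly(rows):
    """True if rows equal the join of their (employee, skill) and
    (employee, language) projections."""
    emp_skill = {(e, s) for e, s, _ in rows}
    emp_lang = {(e, l) for e, _, l in rows}
    rejoined = {
        (e, s, l)
        for e, s in emp_skill
        for e2, l in emp_lang
        if e == e2
    }
    return rejoined == set(rows)


cross_product = [
    ("Jones", "electrical", "French"),
    ("Jones", "mechanical", "French"),
    ("Jones", "electrical", "German"),
    ("Jones", "mechanical", "German"),
    ("Smith", "plumbing", "Spanish"),
]

random_mix = [
    ("Jones", "electrical", "French"),
    ("Jones", "mechanical", "German"),
    ("Smith", "plumbing", "Spanish"),
]

print(splits_losslessly(cross_product))  # True: safe to split into two relations
print(splits_losslessly(random_mix))     # False: the join adds rows the table never held
```

For the random mix, the join produces (Jones, electrical, German) and (Jones, mechanical, French). If skill and language really are independent, those rows were simply missing; if they are not independent, the split would be wrong.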

Fifth Normal Form

PJ/NF or Projection-Join Normal Form

(Kent) - Deals with cases where information can be reconstructed from smaller pieces of information which can be maintained with less redundancy

Example 8 - dealerships:
  1. Agents represent Companies
  2. Companies make Products
  3. Agents sell Products

Which Agent sells which Product for which Company?

  Agent  Company  Product
  smith  ford     car
  smith  gm       truck
  jones  ford     car

This form is necessary in the general case, BUT suppose we put a rule into effect that reads:
  4. If an agent sells a product, and the agent represents a company, then the agent must sell the product made by the company

So, to obey the rule, we must add

  smith  ford     truck
  smith  gm       car

NOW, with the rule and the new rows, we have REDUNDANCY

Converting to 5NF

This time, we must break the relation into three parts (will not break in two)

Example 8 - dealerships:

  Agent  Company  Product
  smith  ford     car
  smith  gm       truck
  jones  ford     car
  smith  ford     truck
  smith  gm       car

BREAK INTO 3

  Agent  Company      Agent  Product      Company  Product
  smith  ford         smith  car          ford     car
  smith  gm           smith  truck        ford     truck
  jones  ford         jones  car          gm       car
                                          gm       truck

  - A relation is already in 5NF if its information content cannot be reconstructed from several smaller record types (having different keys)
  - Only have 5NF problems if there are symmetry constraints (a pair of rows requires the existence of one or more additional rows)
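The reconstruction from three pieces can be sketched as a three-way join over the projections. The code below only confirms that, for the five sample rows, joining the three projections gives back exactly the original relation; it says nothing about other data. The names contracts, agent_company, agent_product, and company_product are assumptions made for the example.

```python
# Example 8 after adding the rows required by rule 4.
contracts = {
    ("smith", "ford", "car"),
    ("smith", "gm", "truck"),
    ("jones", "ford", "car"),
    ("smith", "ford", "truck"),
    ("smith", "gm", "car"),
}

# The three projections.
agent_company = {(a, c) for a, c, _ in contracts}
agent_product = {(a, p) for a, _, p in contracts}
company_product = {(c, p) for _, c, p in contracts}

# Three-way join: keep (agent, company, product) only when all three
# pairwise facts are present.
rejoined = {
    (a, c, p)
    for a, c in agent_company
    for a2, p in agent_product
    if a2 == a and (c, p) in company_product
}

print(rejoined == contracts)  # True for this sample
```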

Domain/Key Normal Form

  - No insertion/deletion anomalies
  - Impossible to make an insertion/deletion that violates a constraint
  - Constraint types:
      domain constraints
      key constraints

Example 9 - customers:

  cust#  branch
  1234   west
  1325   south
  1421   east
  1511   south

where valid branches are west, east, and south

Enforcing Domain Integrity in DK/NF

Example 9 - customers:

  cust#  branch
  1234   west
  1325   south
  1421   east
  1511   south
  1600   north

If this update is possible, the relation is not in DK/NF

One possibility for prohibiting this update is to maintain a table of legal branches and write code to prohibit the entry of a branch not in the table

  legal branch
  west
  south
  east

Problem: What's to stop someone from placing north in the legal branch table?

Possible partial solution: Restrict access to the legal branch table
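As one possible illustration of a domain constraint enforced by the database itself, the sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module. This is a different technique from the legal-branch table described in the slide, and the table and column names (customers, cust, branch) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The domain of branch is declared in the schema, so an insert outside
# the domain is rejected by the database rather than by application code.
conn.execute(
    """
    CREATE TABLE customers (
        cust   INTEGER PRIMARY KEY,
        branch TEXT NOT NULL CHECK (branch IN ('west', 'east', 'south'))
    )
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1234, "west"), (1325, "south"), (1421, "east"), (1511, "south")],
)

try:
    conn.execute("INSERT INTO customers VALUES (?, ?)", (1600, "north"))
    rejected = False
except sqlite3.IntegrityError:
    # The CHECK constraint failed, so the row was not stored.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rejected, row_count)
```

With the list of branches fixed in the schema, adding north requires a schema change rather than a row insert, which narrows (but does not remove) the access problem raised above.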

Normalization Example: AutoCAD Database

Manufacturing plant electrical wiring specifications

Blueprints contain:
  - parts at locations
  - wired connections
  - attributes for each wire and location

AutoCAD transmits variable-length records
  - only transmits data for smart parts
  - one record per part
  - all data must be related to one or more parts

Objectives:
  - number of wires from any source to any destination
  - sub-classified by voltage, shielding, and intrinsic safety characteristics
  - obtain conduit count from wire count (by hand)

Wiring Problem Statement

[Diagram: Panel (3) and Panel (4), with parts AS 303, AS 404, and AS 405 and labeled wires including Wire 52 and Wire 53]

Problem: Count the wires going from panel (3) to panel (4)

Wiring Database Normalization

0th Normal Form

  Part#  Loc  Loc_Desc  Wire1  Volt  IS  Wire2  Volt  IS  Wire3...
  AS303  (3)  Panel 3   52     240       53     24    IS  54
  AS404  (4)  Panel 4   52     240       55     120
  AS405  (4)  Panel 4   53     24    IS

1st Normal Form (no repeating groups), but the 2-part key creates partial dependencies

  Part#  Loc  Loc_Desc  Wire#  Volt  IS
  AS303  (3)  Panel 3   52     240
  AS303  (3)  Panel 3   53     24    IS
  AS303  (3)  Panel 3   54     240
  AS404  (4)  Panel 4   52     240
  AS404  (4)  Panel 4   55     120
  AS405  (4)  Panel 4   53     24    IS

2nd Normal Form (no partial dependencies)

  Part#  Loc  Loc_Desc  Wire#
  AS303  (3)  Panel 3   52
  AS303  (3)  Panel 3   53
  AS303  (3)  Panel 3   54
  AS404  (4)  Panel 4   52
  AS404  (4)  Panel 4   55
  AS405  (4)  Panel 4   53

  Wire#  Volt  IS
  52     240
  53     24    IS
  54     240
  55     120

Wiring Database Normalization (Continued)

2nd Normal Form (apply rule again)

With the step-by-step approach, you only eliminate one partial dependency at a time

  Part#  Wire#
  AS303  52
  AS303  53
  AS303  54
  AS404  52
  AS404  55
  AS405  53

  Wire#  Volt  IS
  52     240
  53     24    IS
  54     240
  55     120

  Part#  Loc  Loc_Desc
  AS303  (3)  Panel 3
  AS404  (4)  Panel 4
  AS405  (4)  Panel 4

Wiring Database Normalization (Concluded)

3rd Normal Form (no transitive dependencies)

  Part#  Wire#
  AS303  52
  AS303  53
  AS303  54
  AS404  52
  AS404  55
  AS405  53

  Wire#  Volt  IS
  52     240
  53     24    IS
  54     240
  55     120

  Part#  Loc
  AS303  (3)
  AS404  (4)
  AS405  (4)

  Loc  Loc_Desc
  (3)  Panel 3
  (4)  Panel 4
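With the tables in 3NF, the original question (how many wires go from panel (3) to panel (4)) becomes a short computation. The sketch below assumes that a wire attached to parts in both locations runs between those locations; that interpretation, and the names part_wire, part_loc, wire_attrs, and wires_between, are assumptions made here rather than details given in the slides.

```python
from collections import defaultdict

# 3NF tables from the wiring example.
part_wire = [
    ("AS303", 52), ("AS303", 53), ("AS303", 54),
    ("AS404", 52), ("AS404", 55),
    ("AS405", 53),
]
part_loc = {"AS303": "(3)", "AS404": "(4)", "AS405": "(4)"}
wire_attrs = {52: (240, ""), 53: (24, "IS"), 54: (240, ""), 55: (120, "")}


def wires_between(loc_a, loc_b):
    """Return the wires attached to parts in both locations."""
    locations = defaultdict(set)
    for part, wire in part_wire:
        locations[wire].add(part_loc[part])
    return sorted(
        wire for wire, locs in locations.items() if {loc_a, loc_b} <= locs
    )


connecting = wires_between("(3)", "(4)")
by_voltage = {wire: wire_attrs[wire][0] for wire in connecting}

print(len(connecting), connecting)  # 2 [52, 53]
print(by_voltage)                   # {52: 240, 53: 24}
```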

A Case Study (March)

Population:
  - MBA students
  - MIS majors
  - last quarter in the program

Language: Nomad 2 4GL

Case:
  - Customer, with attributes
  - Dealer, with attributes
  - Manufacturer, with attributes
  - Contracts - customer, dealer, manufacturer, with symmetry constraints

Task:
  - Given case description fully analyzed
  - Use existing Nomad database
  - Perform 8 queries
  - Perform 11 updates

Relations Given in Case Study (March)

Design:
  - 3 groups of 14 students
  - same case, same queries, same updates
  - different schemas (1NF, 3NF, and 5NF)

1NF Schema:
  (d#, m#, c#, mfgr_attr, cust_attr, dealer_attr)

3NF Schema:
  (c#, cust_attr)
  (d#, dealer_attr)
  (m#, mfgr_attr)
  (d#, c#, m#)

5NF Schema:
  (c#, cust_attr)
  (d#, dealer_attr)
  (m#, mfgr_attr)
  (d#, c#)
  (d#, m#)
  (c#, m#)

Preliminary Case Results (March)

Tasks Correctly Performed

  Normal Form  Queries (8)  Updates (11)
  First        7.21 (90%)   5.07 (46%)
  Third        4.50 (56%)   3.64 (33%)
  Fifth        4.42 (55%)   3.21 (29%)

Database Design Issues Revisited

  - Ease of query formulation
  - Ease of enforcing referential integrity constraints
  - Ease of avoiding update anomalies

Normalization focuses only on avoiding update anomalies. Being normal is not enough.

Possible solutions:
  1. Don't normalize
  2. Don't normalize beyond BCNF
  3. Normalize to 5NF, but back off

Problems with 1-3: update anomalies, bad data, knowledge of database storage needed

  4. Don't let users at base tables
  5. Create views that are in low normal forms
  6. Pre-define joins that give users the data they need

Solutions 4-6 are more work, but generally worth the effort
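Solutions 5 and 6 might look like the following sketch, which keeps the normalized warehouse and inventory tables from Example 2 as base tables and exposes a pre-defined join as a view in a lower normal form. SQLite through Python's sqlite3 module is used purely for illustration; the table, column, and view names (warehouse, inventory, inventory_flat) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized base tables (users would not query these directly).
    CREATE TABLE warehouse (
        warehouse  TEXT PRIMARY KEY,
        wh_address TEXT NOT NULL
    );
    CREATE TABLE inventory (
        part      INTEGER,
        warehouse TEXT REFERENCES warehouse (warehouse),
        quantity  INTEGER,
        PRIMARY KEY (part, warehouse)
    );
    INSERT INTO warehouse VALUES ('05', 'Mpls'), ('08', 'StPaul'), ('10', 'Madison');
    INSERT INTO inventory VALUES
        (100, '05', 200), (100, '08', 300), (200, '05', 250),
        (200, '10', 400), (300, '08', 350);

    -- A pre-defined join that presents the data in its original 1NF shape.
    CREATE VIEW inventory_flat AS
        SELECT i.part, i.warehouse, w.wh_address, i.quantity
        FROM inventory AS i
        JOIN warehouse AS w ON w.warehouse = i.warehouse;
    """
)

flat = conn.execute(
    "SELECT * FROM inventory_flat ORDER BY part, warehouse"
).fetchall()
for row in flat:
    print(row)
```

Queries against inventory_flat need no knowledge of the join, while each warehouse address is still stored only once in the base table.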
