Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31



Similar documents
Database Design and Normalization

Relational Database Design

Schema Refinement and Normalization

Lecture Notes on Database Normalization

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Database Design and Normalization

Database Design and Normal Forms

Relational Normalization Theory (supplemental material)

Relational Database Design Theory

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12

Design of Relational Database Schemas

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.)

Chapter 7: Relational Database Design

Database Management Systems. Redundancy and Other Problems. Redundancy

Week 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy

Schema Refinement, Functional Dependencies, Normalization

Functional Dependencies and Normalization

Functional Dependencies and Finding a Minimal Cover

Quiz 3: Database Systems I Instructor: Hassan Khosravi Spring 2012 CMPT 354

normalisation Goals: Suppose we have a db scheme: is it good? define precise notions of the qualities of a relational database scheme

Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

CS143 Notes: Normalization Theory

Theory of Relational Database Design and Normalization

Chapter 10. Functional Dependencies and Normalization for Relational Databases

Objectives of Database Design Functional Dependencies 1st Normal Form Decomposition Boyce-Codd Normal Form 3rd Normal Form Multivalue Dependencies

Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example

Relational Database Design: FD s & BCNF

Introduction Decomposition Simple Synthesis Bernstein Synthesis and Beyond. 6. Normalization. Stéphane Bressan. January 28, 2015

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 5: Normalization II; Database design case studies. September 26, 2005

Functional Dependency and Normalization for Relational Databases

Introduction to Database Systems. Normalization

Chapter 7: Relational Database Design

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Normalization in Database Design

Advanced Relational Database Design

Design Theory for Relational Databases: Functional Dependencies and Normalization

RELATIONAL DATABASE DESIGN

Chapter 8. Database Design II: Relational Normalization Theory

Boyce-Codd Normal Form

Limitations of DB Design Processes

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

Database Management System

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

CSCI-GA Database Systems Lecture 7: Schema Refinement and Normalization

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations

How To Find Out What A Key Is In A Database Engine

DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF)

Relational Normalization: Contents. Relational Database Design: Rationale. Relational Database Design. Motivation

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms.

Chapter 10 Functional Dependencies and Normalization for Relational Databases

DATABASE SYSTEMS. Chapter 7 Normalisation

Normalization. CIS 331: Introduction to Database Systems

An Algorithmic Approach to Database Normalization

DATABASE NORMALIZATION

Unit 3.1. Normalisation 1 - V Normalisation 1. Dr Gordon Russell, Napier University

A Web-Based Environment for Learning Normalization of Relational Database Schemata

Fundamentals of Database System

Database Constraints and Design

Introduction to normalization. Introduction to normalization

Module 5: Normalization of database tables

Normalization in OODB Design

Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms

Unique column combinations

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

MCQs~Databases~Relational Model and Normalization

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

Normalisation 1. Chapter 4.1 V4.0. Napier University

Normalization of database model. Pazmany Peter Catholic University 2005 Zoltan Fodroczi

Lecture 2 Normalization

Chapter 6. Database Tables & Normalization. The Need for Normalization. Database Tables & Normalization

LOGICAL DATABASE DESIGN

Theory of Relational Database Design and Normalization

Database Systems Concepts, Languages and Architectures

SQL DDL. DBS Database Systems Designing Relational Databases. Inclusion Constraints. Key Constraints

BCA. Database Management System

Functional Dependencies

Normalization of Database

Database Design. Marta Jakubowska-Sobczak IT/ADC based on slides prepared by Paula Figueiredo, IT/DB

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3)

Teaching Database Modeling and Design: Areas of Confusion and Helpful Hints

Normal forms and normalization

Database Sample Examination

Normalization. Reduces the liklihood of anomolies

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

Normalisation 6 TABLE OF CONTENTS LEARNING OUTCOMES

B2.2-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Conceptual Design Using the Entity-Relationship (ER) Model

DATABASE MANAGEMENT SYSTEMS. Question Bank:

Topic 5.1: Database Tables and Normalization

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:


3. Database Design Functional Dependency Introduction Value in design Initial state Aims

DBMS. Normalization. Module Title?

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

User Manual. for the. Database Normalizer. application. Elmar Jürgens

Transcription:

Databases -Normalization III (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

This lecture This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31

Normal Forms BCNF - recap The BCNF decomposition of a relation is derived by a recursive algorithm. A lossless-join decomposition is derived which may not be dependency preserving. The decomposition is too restrictive. To make the decomposition in the previous example dependency preserving we can cover the FD JP C by adding its attributes as a relation R 1 = CSJDQV R 2 = SDP R 3 = JPC We have added the required FD involving key attributes that were prohibited by BCNF. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 3 / 31

Normal Forms 3NF A relational schema is in 3NF if for every functional dependency X A (where X is a subset of the attributes and A is a single attribute) either A X (that is, X A is a trivial FD), or X is a superkey, or A is part of some key for R. Thus the definition of 3NF permits a few additional FDs involving key attributes that are prohibited by BCNF. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 4 / 31

Normal Forms The contracts example The contracts example was not in BCNF because of the FD SD P However as P is part of a key (JP), this FD does not violate the conditions for 3NF. Therefore neither of the two given FDs violate the conditions for 3NF. Checking all the other (implied) FDs shows that contracts is already in 3NF and thus need not be decomposed further. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 5 / 31

Normal Forms Properties of 3NF The most important property of 3NF is the result that there is always a lossless-join and dependency-preserving decomposition of any relational schema into 3NF. While not conceptually difficult, the algorithm to decompose a relation schema into 3NF is quite fiddly and much more complicated than the simple algorithm for decomposition into BCNF. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 6 / 31

Normal Forms 3NF Synthesis The BCNF algorithm is from the perspective of deriving a lossless-join decomposition, and dependency preservation was addressed by adding extra relation schemas. An alternative approach involves synthesis that takes the attributes of the original relation R and the minimal cover F of the FDs and adds a relation schema XA for each FD X A in F. The resulting collection of relations is 3NF and preserves all FDs. If it is not lossless-join, it is made so by adding a relation that contains those attributes that appear in some key. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 7 / 31

Normal Forms The two approaches The decomposition algorithm derives the relations while maintaining the keys, thus emphasizing the lossless-join property, and deals with dependency preservation by adding the necessary relations. The synthesis algorithm build the relations from the set of FDs, emphasizing the dependencies, and then deals with lossless-join issues after the fact by adding the necessary relation. The latter is an algorithm that has polynomial complexity. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 8 / 31

3NF synthesis The Synthesis Algorithm Given a relation R and a set of functional dependencies X Y where X and Y are subsets of the attributes of R. 1 Compute the closures of the FDs. 2 Use ➀ to find all the candidate keys of the relation R. 3 Find the minimal cover of the relation R. 4 Use ➂ to produce R i decompositions in 3NF. 5 If no R i is a super key of R, add a relation containing the key. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 9 / 31

3NF synthesis FD closures 1 Compute the closures of the FDs. For each X Y, derive (X) + the set of attributes of R covered by X Y. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 10 / 31

3NF synthesis Candidate keys 2 Use ➀ to find all the candidate keys of the relation R. If any (X) + covers all the attributes of R then X is a key. It is possibly a super key with redundant attributes contained within X (these will be obvious). Otherwise find the (X) + that maximally covers R and add to X attributes of R missing from the closure. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 11 / 31

3NF synthesis Minimal cover 3 Find the minimal cover of the relation R. ALGORITHM: Step 1 Decompose all the FDs to the form X A, where A is a single attribute. Step 2 Eliminate redundant attributes from the LHS. If XB A, where B is a single attribute and X A is entailed within the set, then B was unnecessary. If LHS has more than one attribute, compute closure of each set that leaves out one attribute. If it includes the right side, remove the extra attribute in the functional dependency. Step 3 Delete the redundant FDs. For each dependency, compute the closure of the determinant attributes against all the other dependencies except the one you re testing. If you find the attribute on the right side in the closure, you can leave that dependency out. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 12 / 31

3NF synthesis 3NF relations 4 Use ➂ to produce R i decompositions in 3NF. ALGORITHM: Step 1 Combine all FDs with the same LHS For each X such that X A, X B... X AB... Step 2 Form a relation R i of all attributes in each FD. For each X ABC, form {X, A, B, C}. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 13 / 31

3NF synthesis 3NF relations 5 If no R i is a super key of R, add a relation containing the key. If UVW is a key for R, and no relation R i formed in the 3NF decomposition contains a super key of R, then add {U, V, W} as a relation. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 14 / 31

3NF synthesis Exercise Consider R = {A, B, C, D, E, G} and the following FDs 1 AB C 2 C A 3 BC D 4 ACD B 5 D EG 6 BE C 7 CG BD 8 CE AG (AB) + = ABCDEG (C) + = AC (BC) + = ABCDEG (ACD) + = ABCDEG (D) + = DEG (BE) + = ABCDEG (CG) + = ABCDEG (CE) + = ABCDEG AB is a key BC is a key ACD is a super key BE is a key CG is a key CE is a key Applying Armstrong s Axioms gives BD and CD are also keys. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 15 / 31

3NF synthesis Generate minimal cover - Working Original FDs 1 AB C 2 C A 3 BC D 4 ACD B 5 D EG 6 BE C 7 CG BD 8 CE AG single RHS 1 AB C 2 C A 3 BC D 4 ACD B 5 D E 6 D G 7 BE C 8 CG B 9 CG D 10 CE A 11 CE G redundant LHS 1 AB C 2 C A 3 BC D 4 ACD B 5 D E 6 D G 7 BE C 8 CG B 9 CG D 10 CE A 11 CE G minimal FDs 1 2 3 4 5 6 7 8 9 10 11 (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 16 / 31

3NF synthesis Generate minimal cover single RHS 1 AB C 2 C A 3 BC D 4 ACD B 5 D E 6 D G 7 BE C 8 CG B 9 CG D 10 CE A 11 CE G redundant LHS 1 AB C 2 C A 3 BC D 4 ACD B 5 D E 6 D G 7 BE C 8 CG B 9 CG D 10 CE A 11 CE G duplications delete ➇ = ➈ + ➃ delete ➉ = ➁ minimal FDs 1 AB C 2 C A 3 BC D 4 CD B 5 D E 6 D G 7 BE C 8 9 CG D 10 11 CE G (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 17 / 31

3NF Synthesis The synthesis process Combine same LHS 1 AB C 2 C A 3 BC D 4 CD B 5 D E, D G 6 BE C 7 CG D 8 CE G Form relations 1 {A, B, C} 2 {C, A} 3 {B, C, D} 4 {C, D, B} 5 {D, E, G} 6 {B, E, C} 7 {C, G, D} 8 {C, E, G} The 3NF decomposition 1 {A, B, C} 2 ➁ is a proper subset of ➀ 3 {B, C, D} 4 ➃ is a subset of ➂ 5 {D, E, G} 6 {B, E, C} 7 {C, G, D} 8 {C, E, G} (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 18 / 31

3NF Synthesis Final 3NF decomposition {A, B, C}{B, C, D}{D, E, G}{B, E, C}{C, G, D}{C, E, G}. Super key is contained in at least one relation. We have looked at how to confirm the decomposition is a lossless-join when an original relation R is decomposed into two R 1 and R 2. Simply check the shared attribute(s) of R 1 and R 2 to see if it is a superkey for either R 1 or R 2. However when decomposed into more than two relations, we will need to use the algorithm described in the next three slides to check for lossless decomposition. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 19 / 31

3NF Synthesis Definition - Lossless Join Property Definition: A decomposition D = {R 1, R 2,..., R m } of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state R i of R that satisfies F, the natural join of all the relations in D: R 1 R 2... R m = R Note: The word loss in lossless refers to loss of information, not to loss of tuples. In fact, for loss of information a better term is addition of spurious information. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 20 / 31

3NF Synthesis Algorithm for Testing Lossless Join Property Step 1 Create an initial matrix with one row for each decomposed relation R i, and one column for each attribute A j in the original relation R. Step 2 For the ith row, if the jth attribute is in R i, cell (i, j) should contain value a j, otherwise, b ij. A B C D E G R 1 {A, B, C} a 1 a 2 a 3 b 14 b 15 b 16 R 2 {B, C, D} b 21 a 2 a 3 a 4 b 25 b 26 R 3 {D, E, G} b 31 b 32 b 33 a 4 a 5 a 6 R 4 {B, E, C} b 41 a 2 a 3 b 44 a 5 b 46 R 5 {C, G, D} b 51 b 52 a 3 a 4 b 55 a 6 R 6 {C, E, G} b 61 b 62 a 3 b 64 a 5 a 6 (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 21 / 31

3NF Synthesis Algorithm for Testing Lossless Join Property (contd.) Step 3 Apply functional dependencies to replace the b ij s with a j s. Step 4 If there is a row contains all a j s, the decomposition is lossless. A B C D E G R 1 {A, B, C} a 1 a 2 a 3 R 2 {B, C, D} a 2 a 3 a 4 R 3 {D, E, G} a 4 a 5 a 6 R 4 {B, E, C} a 2 a 3 a 5 R 5 {C, G, D} a 3 a 4 a 6 R 6 {C, E, G} a 3 a 5 a 6 1 AB C 2 C A 3 BC D 4 CD B 5 D E 6 D G 7 BE C 8 CG D 9 CE G (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 22 / 31

3NF Synthesis Manual normalization Data typically employed in a business is usually presented as below. However to be useful in a database the data needs to be normalized. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 23 / 31

3NF Synthesis First normal form 1NF First normal form (1NF) requires attribute values be atomic that is, individual values rather than lists or sets and that a primary key is defined. This table is in 1NF. The primary key is (Order ID, Product ID). (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 24 / 31

3NF Synthesis Second normal form 2NF A relation in first normal form (1NF) has the above functional dependencies. A relation is in Second normal for (2NF) if it is in 1NF and contains no partial dependencies. A partial functional dependency exists when a nonkey attribute is functionally dependent on part of, but not all of, the primary key. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 25 / 31

3NF Synthesis Second normal form 2NF 2NF is achieved by creating three (3) relations with the following attributes and keys. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 26 / 31

3NF Synthesis Third normal form 3NF A relation in second normal form (2NF) still has the above functional dependencies. A relation is in Third normal form (3NF) if it is in 2NF and contains no transitive dependencies. A transitive functional dependency in a relation is a functional dependency between two or more nonkey attributes. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 27 / 31

3NF Synthesis Third normal form 3NF 3NF is achieved by creating an additional relation with the following attributes and keys. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 28 / 31

3NF Synthesis Practical Use of Normalization In practice, the DBA should be guided by the theory of normalization, but not slavishly follow it. When the ER diagram is converted into tables, any functional dependencies should be identified and a decision made as to whether the tables should be normalized or not. Occasionally it will even be appropriate to de-normalize data if the most frequently occurring queries involve expensive joins of the normalized tables. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 29 / 31

Aside Multi-valued Attributes A Single-valued attribute of an entity instance has just one value. This has been the typical scenario we have discussed so far and know how this is handled in a schema definition. A Multi-valued attribute is an attribute that can take on more than one value for a given entity instance. Diagrammatically this is indicated in an ERD by a double edged ellipse. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 30 / 31

Aside Multi-valued Attributes The mapping of an entity with a multi-valued attribute is achieved by the following. The first relation contains all the attributes of the entity type except the multi-valued attribute. The second relation contains the attributes that form the primary key of the second relation. The first of these attributes is the primary key of the first relation, and hence it is a foreign key. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 31 / 31