Design of Relational Database Schemas




Design of Relational Database Schemas T. M. Murali October 27, November 1, 2010

Plan Till Thanksgiving
What are the typical problems or anomalies in relational designs?
Introduce the idea of decomposing a relation schema into two smaller schemas.
Introduce Boyce-Codd normal form (BCNF), a condition on relational schemas that eliminates anomalies. BCNF is stated using the concept of functional dependencies (FDs).
Use decomposition of schemas to bring them to BCNF.
Define another type of constraint called Multivalued Dependencies (MDs).
Define normal forms that eliminate MDs.

Closures of FDs
Given a relation R and a set F of FDs that hold in R, the closure {F}+ is the set of all FDs that follow from F.
Recall: An FD S follows from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies S.
Example: S = A → C follows from T = {A → B, B → C}.

Computing Closures of FDs
To compute the closure of a set of FDs, repeatedly apply Armstrong's Axioms until you cannot find any new FDs:
Reflexivity: If Y ⊆ X, then X → Y.
Augmentation: If X → Y, then XZ → YZ for any set of attributes Z.
Transitivity: If X → Y and Y → Z, then X → Z.

Examples of Computing Closures of FDs
Let us include only completely non-trivial FDs in these examples, with a single attribute on the right. Assume that there are no attributes other than those mentioned in the FDs.
F = {A → B, B → C}. {F}+ is {A → B, B → C, A → C, AC → B, AB → C}.
F = {AB → C, BC → A, AC → B}. {F}+ is {AB → C, BC → A, AC → B}.
F = {A → B, B → C, C → D}. {F}+ is {A → B, B → C, C → D, A → C, A → D, B → D, ...}.
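The closures in these examples can be checked mechanically. The sketch below does not apply Armstrong's Axioms literally; it uses the equivalent test that X → B follows from F exactly when B is in the attribute closure X+. The function names `closure` and `fd_closure` and the encoding of FDs as pairs of frozensets are assumptions made for illustration, and attribute names are assumed to be single characters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd_closure(all_attrs, fds):
    """All completely non-trivial FDs X -> B (B not in X) that follow from fds.

    Each FD is returned as a pair (left side as a string, right attribute).
    """
    found = set()
    attrs = sorted(all_attrs)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            for b in closure(set(lhs), fds) - set(lhs):
                found.add(("".join(lhs), b))
    return found


# First example from the slide: F = {A -> B, B -> C}.
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
for lhs, rhs in sorted(fd_closure("ABC", F)):
    print(f"{lhs} -> {rhs}")
```

The loop over all subsets of the attributes is what makes the output potentially exponential in size, so this is only practical for small schemas.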

Closures of FDs vs. Closures of Attributes
Both algorithms take as input a relation R and a set of FDs F.
Closure of FDs: Computes {F}+, the set of all FDs that follow from F. Output is a set of FDs. Output may contain an exponential number of FDs.
Closure of attributes: In addition, takes a set {A1, A2, ..., An} of attributes as input. Computes {A1, A2, ..., An}+, the set of all attributes B such that A1 A2 ... An → B follows from F. Output is a set of attributes. Output contains at most as many attributes as R has.
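A minimal sketch of the attribute-closure computation, assuming FDs are given as (left side, right side) pairs of Python sets. The name `attribute_closure` is illustrative, not taken from the slides or the book.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure,
            # the right side must be in it too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
print(sorted(attribute_closure({"A"}, F)))  # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure({"C"}, F)))  # ['C', 'D']
```

The result can never grow beyond the attributes of R, which is why this computation stays cheap while the closure of FDs can blow up.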

Projecting Sets of FDs
Suppose we have a relation R and a set of FDs F. Let S be a relation obtained by projecting R onto a subset of the attributes of R, i.e., S = π_Attributes(R).
The projection F_S of F is the set of FDs that follow from F and hold in S (involve only attributes of S).
Algorithm for computing F_S:
1. Compute {F}+.
2. F_S is the set of all FDs in {F}+ that involve only the attributes in S.
The book describes a different algorithm on page 82 (Chapter 3.2.8). The book's algorithm also shows how to compute a minimal basis of F_S.
Example: R(A, B, C, D), F = {A → B, B → C, C → D}. Which FDs hold in S(A, C, D)?
{F}+ is {A → B, B → C, C → D, A → C, A → D, B → D} (listing only FDs with a single attribute on each side).
F_S is {C → D, A → C, A → D}.
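The projection in this example can be reproduced with a short sketch. It is an illustration under assumptions, not the book's algorithm: it tests each subset X of S's attributes via the attribute closure X+, then keeps only FDs whose left side is minimal so that the output matches the three FDs listed above. The names `closure`, `project_fds` and `minimal_only` are hypothetical, and attribute names are assumed to be single characters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(sub_attrs, fds):
    """Non-trivial FDs X -> B, with X and B inside sub_attrs, that follow from fds."""
    projected = set()
    attrs = sorted(sub_attrs)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            for b in (closure(set(lhs), fds) & set(attrs)) - set(lhs):
                projected.add(("".join(lhs), b))
    return projected


def minimal_only(found):
    """Drop X -> B when some proper subset of X already determines B."""
    return {(lhs, b) for lhs, b in found
            if not any(set(other) < set(lhs) and ob == b for other, ob in found)}


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
for lhs, rhs in sorted(minimal_only(project_fds("ACD", F))):
    print(f"{lhs} -> {rhs}")
```

Without the `minimal_only` filter the result also contains AC → D and AD → C, which follow from the three minimal FDs by augmentation.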

Design of Relational Database Schemas
Careless design of relational schemas can cause problems.
Example: Combining the relation for a many-many relationship with the relation for one of its entity sets causes redundancy.
Suppose we combine the schemas Courses(Number, DepartmentName, CourseName, Classroom, Enrollment) and Take(StudentName, Address, Number, DepartmentName) into one relation Courses(Number, DepartmentName, CourseName, Classroom, Enrollment, StudentName, Address).

Anomalies in Relational Schemas
Courses(Number, DepartmentName, CourseName, Classroom, Enrollment, StudentName, Address)
Redundancy: Information is repeated unnecessarily in several tuples.
Update anomalies: We change information in one tuple but leave the old information in another tuple.
Insertion anomalies: It is not possible to store some information unless some other, unrelated information is stored as well.
Deletion anomalies: If a set of values becomes empty, we may lose other information as a side effect.

Decomposing Relations
The accepted way to eliminate anomalies is to decompose relations.
Given a relation R(A1, A2, ..., An), two relations S(B1, B2, ..., Bm) and T(C1, C2, ..., Ck) form a decomposition of R if
1. the attributes of S and T together make up the attributes of R, i.e., {A1, A2, ..., An} = {B1, B2, ..., Bm} ∪ {C1, C2, ..., Ck},
2. the tuples in S are the projections onto {B1, B2, ..., Bm} of the tuples in R, i.e., S = π_{B1, B2, ..., Bm}(R), and
3. the tuples in T are the projections onto {C1, C2, ..., Ck} of the tuples in R, i.e., T = π_{C1, C2, ..., Ck}(R).

Example of Decomposition
Decompose Courses into Courses1(Number, DepartmentName, CourseName, Classroom, Enrollment) and Courses2(Number, DepartmentName, StudentName, Address).
Are the anomalies removed? Consider each in turn: redundancy, update, insertion, deletion.

Boyce-Codd Normal Form
A condition on the FDs in a relation that guarantees that anomalies do not exist.
A relation R is in Boyce-Codd Normal Form (BCNF) if and only if for every non-trivial FD A1 A2 ... An → B for R, {A1, A2, ..., An} is a superkey for R.
Informally, the left side of every non-trivial FD must be a superkey.
A relation R violates BCNF if it has a non-trivial FD whose left side is not a superkey. In other words, the left side of that FD does not contain all the attributes of any key of R.

Checking for BCNF Violations
1. List all FDs.
2. Ensure that the left-hand side of each FD is a superkey.
We have to first find all the keys!
Is Courses(Number, DepartmentName, CourseName, Classroom, Enrollment, StudentName, Address) in BCNF?
FDs are
Number DepartmentName → CourseName
Number DepartmentName → Classroom
Number DepartmentName → Enrollment
What is {Number, DepartmentName}+? {Number, DepartmentName, CourseName, Classroom, Enrollment}.
Therefore, the key is {Number, DepartmentName, StudentName, Address}.
The left side {Number, DepartmentName} is not a superkey, so the relation is not in BCNF.
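The check above can be sketched in a few lines. The helper names `closure`, `candidate_keys` and `bcnf_violations` are illustrative assumptions. The sketch tests only the FDs that are listed, which is enough to decide whether the whole relation is in BCNF; checking a decomposed sub-relation would need the projected FDs instead.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    attrs = sorted(all_attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def bcnf_violations(all_attrs, fds):
    """Listed non-trivial FDs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)]


COURSES = {"Number", "DepartmentName", "CourseName", "Classroom",
           "Enrollment", "StudentName", "Address"}
FDS = [({"Number", "DepartmentName"}, {"CourseName", "Classroom", "Enrollment"})]

print(candidate_keys(COURSES, FDS))
print(bcnf_violations(COURSES, FDS))
```

For the Courses schema this reports the single key {Number, DepartmentName, StudentName, Address} and flags the one listed FD as a violation.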

Decomposition into BCNF
Suppose R is a relation schema that violates BCNF. We can decompose R into a set S of new relations such that
1. each relation in S is in BCNF and
2. we can recover R from the relations in S, i.e., we can reconstruct R exactly from the relations in S.

BCNF Normalisation Algorithm
Let A be the set of all attributes of R. Let F be the set of all FDs of R. Suppose the FD X1 X2 ... Xm → Y violates BCNF.
1. Compute {X1, X2, ..., Xm}+.
2. Decompose R into two relations R1 and R2 with schemas
   R1: all the attributes in {X1, X2, ..., Xm}+
   R2: all the attributes on the left side of the violating FD and all the attributes of R not in {X1, X2, ..., Xm}+, i.e., (A - {X1, X2, ..., Xm}+) ∪ {X1, X2, ..., Xm}.
3. Find the FDs in R1 and R2 and decompose them if they are not in BCNF.
4. That is, for i = 1, 2: compute F_Ri, the projection of the FDs in F onto Ri. If any of the FDs in F_Ri violates BCNF, decompose Ri recursively.
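The steps above can be sketched as a short recursive function. This is a sketch under stated assumptions, not the course's reference implementation: the names `closure`, `find_violation` and `bcnf_decompose` are hypothetical, and instead of materialising the projected FDs F_Ri it uses the equivalent test that X → B holds in a sub-schema exactly when B lies in X+ (computed under the original F) and in the sub-schema. The violation it picks first depends on attribute sort order, so intermediate steps may differ from a hand derivation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(attrs, fds):
    """Return a left side X inside `attrs` that violates BCNF there, or None."""
    names = sorted(attrs)
    for size in range(1, len(names)):
        for combo in combinations(names, size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            # X determines something new but not everything: not a superkey.
            if x_plus != x and x_plus != attrs:
                return x
    return None


def bcnf_decompose(attrs, fds):
    """Split `attrs` recursively until no sub-schema has a BCNF violation."""
    attrs = set(attrs)
    x = find_violation(attrs, fds)
    if x is None:
        return [attrs]
    x_plus = closure(x, fds) & attrs
    r1 = x_plus                  # step 2: attributes in X+
    r2 = (attrs - x_plus) | x    # step 2: left side plus everything outside X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


STUDENTS = {"ID", "Name", "AdvisorId", "AdvisorName", "FavouriteAdvisorId"}
FDS = [({"ID"}, {"Name", "FavouriteAdvisorId"}), ({"AdvisorId"}, {"AdvisorName"})]
for schema in bcnf_decompose(STUDENTS, FDS):
    print(sorted(schema))
```

Both R1 and R2 are strictly smaller than the schema they came from, so the recursion always terminates.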

Decomposing Courses
Schema is Courses(Number, DepartmentName, CourseName, Classroom, Enrollment, StudentName, Address).
BCNF-violating FD is Number DepartmentName → CourseName Classroom Enrollment.
What is {Number, DepartmentName}+? {Number, DepartmentName, CourseName, Classroom, Enrollment}.
Decompose Courses into Courses1(Number, DepartmentName, CourseName, Classroom, Enrollment) and Courses2(Number, DepartmentName, StudentName, Address).
Are there any BCNF violations in the two new relations?

Another Example of Decomposition (1)
Schema is Students(ID, Name, AdvisorId, AdvisorName, FavouriteAdvisorId).
What are the FDs?
ID → Name FavouriteAdvisorId
AdvisorId → AdvisorName
What is the key? {ID, AdvisorId}.
Is there a BCNF violation? Yes.
Use ID → Name FavouriteAdvisorId to decompose.
{ID}+ is {ID, Name, FavouriteAdvisorId}.
Schemas for new relations are
Students1(ID, Name, FavouriteAdvisorId)
Students2(ID, AdvisorId, AdvisorName)

Another Example of Decomposition (2)
What are the FDs in Students1(ID, Name, FavouriteAdvisorId)? There are none that violate BCNF.
What are the FDs in Students2(ID, AdvisorId, AdvisorName)? AdvisorId → AdvisorName.
Does it violate BCNF? Yes. Repeat the decomposition process.
Use AdvisorId → AdvisorName to decompose.
{AdvisorId}+ is {AdvisorId, AdvisorName}.
Schemas for new relations are
Students2(ID, AdvisorId)
Students3(AdvisorId, AdvisorName)

Examples
(Problem 1 of Handout 3) Apply the BCNF normalisation algorithm to the Inventory relation.
(Problem 2 of Handout 3) Apply the BCNF normalisation algorithm to the Concerts relation.
For both problems, try all possible choices of FDs to start the normalisation with. Compare the advantages and disadvantages of the choices.

BCNFs and Two-Attribute Relations
True or False: Every two-attribute relation R(A, B) is in BCNF.
The statement is true. Why? Consider four possible cases:
1. There are no non-trivial FDs.
2. A → B is the only non-trivial FD.
3. B → A is the only non-trivial FD.
4. Both A → B and B → A hold in R.

Decomposition into BCNF
Suppose R is a relation schema that violates BCNF. We can decompose R into a set S of two or more new relations such that
1. each relation in S is in BCNF and
2. we can recover R from the relations in S, i.e., we can reconstruct R from the relations in S.
How does the normalisation algorithm guarantee the second condition?
In general, what properties does the decomposition satisfy?

Desirable Properties of a Decomposition
1. Eliminate anomalies.
2. Recover the original relation exactly from the relations it is decomposed into.
3. When we reconstruct the original relation, the result will satisfy the original FDs.
BCNF decomposition algorithm: gives us properties 1 and 2 but not 3.
3NF decomposition algorithm: gives us properties 2 and 3 but not 1. (Discuss in the next class.)

Candidate Normalisation Algorithm
Every two-attribute relation is in BCNF.
Can we bring any relation R into BCNF by arbitrarily decomposing it into two-attribute relations?
No, since we may not be able to recover R correctly from the decomposition.

Joining Relations

R
A   B
a1  b1
a2  b1
a2  b2

S
B   C
b1  c1
b2  c2
b2  c3

T = R ⋈ S
A   B   C
a1  b1  c1
a2  b1  c1
a2  b2  c2
a2  b2  c3

Let R and S be two relations with one common attribute B. Relation T is the natural join of R and S, denoted R ⋈ S, if and only if
the attributes of T are the union of the attributes of R and S,
every tuple t ∈ T is the join of two tuples r ∈ R and s ∈ S that agree on the attribute B, i.e., t agrees with r on all the attributes in R and with s on all attributes in S,
T contains all tuples formed in this manner.
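The definition can be tried out directly on the example tables. This is a minimal sketch that assumes each relation is a list of dictionaries mapping attribute names to values; `natural_join` is an illustrative name, not a library function.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            # Keep the pair only if the tuples agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}
                if row not in joined:  # relations are sets: no duplicate tuples
                    joined.append(row)
    return joined


R = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}, {"A": "a2", "B": "b2"}]
S = [{"B": "b1", "C": "c1"}, {"B": "b2", "C": "c2"}, {"B": "b2", "C": "c3"}]

T = natural_join(R, S)
for row in T:
    print(row["A"], row["B"], row["C"])
```

Running it prints the four tuples of T shown in the table above.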

Recovering Information from a Decomposition

Suppose R is a relation schema that violates BCNF. The BCNF decomposition algorithm decomposes R into a set $\{S_1, S_2, \ldots, S_k\}$ of new relations such that
1. each relation $S_i$, $1 \le i \le k$, is in BCNF, and
2. the decomposition of R into $\{S_1, S_2, \ldots, S_k\}$ is a lossless-join decomposition, i.e., $R = S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$.
   2.1 Every tuple in R is a tuple in $S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$.
   2.2 Every tuple in $S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$ is in R.

Example of Lossless-Join Decomposition

Relation schema is R(A, B, C). FD is $B \to C$. Relations in BCNF are S(A, B) and T(B, C). Prove that $R = S \bowtie T$:
1. Every tuple in R is in $S \bowtie T$.
2. Every tuple in $S \bowtie T$ is in R.

What if the FD were $A \to C$ and we decomposed R into S and T as above? $S \bowtie T$ contains tuples not in R!

In general, if R's attributes are $X \cup Y \cup Z$ and $Y \to Z$ holds in R, then $R = \pi_{X \cup Y}(R) \bowtie \pi_{Y \cup Z}(R)$.
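The contrast between the two FDs can be seen on small instances. The sketch below is an assumption-laden illustration rather than a proof: it projects one concrete instance of R(A, B, C) onto (A, B) and (B, C), rejoins the projections, and compares the result with the original. The helper names (`project`, `natural_join`, `as_set`, `rejoin`) and the sample instances `R1` and `R2` are made up for this example. Checking one instance can expose a lossy decomposition, but it cannot establish losslessness for every instance of the schema.

```python
def project(rel, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicates."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    out = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                joined = {**tr, **ts}
                if joined not in out:
                    out.append(joined)
    return out


def as_set(rel):
    """Order-independent representation of a relation."""
    return {tuple(sorted(t.items())) for t in rel}


def rejoin(rel):
    """Project onto (A, B) and (B, C), then join the projections."""
    return natural_join(project(rel, ["A", "B"]), project(rel, ["B", "C"]))


# Instance satisfying B -> C: the rejoin gives back exactly R1.
R1 = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c2"},
]

# Instance satisfying A -> C but not B -> C: the rejoin adds tuples.
R2 = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
]

spurious = as_set(rejoin(R2)) - as_set(R2)
print(as_set(rejoin(R1)) == as_set(R1))
print(len(spurious))
```

For `R2`, the rejoin pairs every A value with every C value that shares $b_1$, which is where the extra tuples come from.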

The Chase Test for Lossless Join

Suppose we have a relation R, a set F of FDs that hold in R, and a decomposition of R into relations $S_1, S_2, \ldots, S_k$. We have forgotten how we decomposed R. Is there a way to check that R equals the natural join of $S_1, S_2, \ldots, S_k$?
1. Every tuple in R is a tuple in $S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$.
2. Every tuple in $S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$ is in R.

[Figure: R(A, B, C, D) is projected onto $S_1$, $S_2$, $S_3$, and the projections are joined back into a relation over A, B, C, D.]

The Chase Test for Lossless Join

1. Natural join is associative and commutative. For each tuple t in $S_1 \bowtie S_2 \bowtie \cdots \bowtie S_k$, the projection of t onto $S_i$ is a tuple in $\pi_{S_i}(R)$, for every $1 \le i \le k$.
2. Every tuple u in R is surely in $\pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie \cdots \bowtie \pi_{S_k}(R)$.
3. How can we show that if the FDs in F hold in R, then every tuple in $\pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie \cdots \bowtie \pi_{S_k}(R)$ is also a tuple in R? Use the chase test.

Steps in Chase Test

If a tuple t is in $\pi_{S_1}(R) \bowtie \pi_{S_2}(R) \bowtie \cdots \bowtie \pi_{S_k}(R)$, then there must be tuples $t_1, t_2, \ldots, t_k$ in R such that t is the join of the projections of $t_i$ onto $S_i$, for every $1 \le i \le k$. Each $t_i$ agrees with t on the attributes in $S_i$ but has unknown values for the attributes not in $S_i$. Using the FDs in F, we want to prove that t must be equal to some $t_i$.

1. Draw a tableau to indicate which attributes we know the values of in the tuples $t_1, t_2, \ldots, t_k$.
2. Use FDs to equate unknown attributes of these tuples.
3. When no FD can be applied, check if t is one of the tuples in the tableau.
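The three steps above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not a reference implementation: the name `chase_is_lossless`, the encoding of symbols as `(attribute, index)` pairs (index 0 standing for the known value from t, index i for the unknown value in row $t_i$), and the representation of FDs as `(lhs, rhs)` pairs of attribute sets are all choices made for this example.

```python
def chase_is_lossless(attrs, decomposition, fds):
    """Chase test: True if the tableau ends up containing the tuple t.

    attrs         -- list of attribute names of R
    decomposition -- list of attribute sets S_1, ..., S_k
    fds           -- list of (lhs, rhs) pairs of attribute sets
    """
    # Step 1: build the tableau. Row i uses the known symbol (a, 0) for
    # attributes in S_i and a row-specific unknown (a, i) elsewhere.
    rows = [
        {a: (a, 0) if a in s_i else (a, i) for a in attrs}
        for i, s_i in enumerate(decomposition, start=1)
    ]

    # Step 2: apply FDs until nothing changes. Each change merges two
    # symbols, so the loop terminates.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for b in rhs:
                        if r1[b] == r2[b]:
                            continue
                        old, new = r1[b], r2[b]
                        if old == (b, 0):  # keep the known symbol
                            old, new = new, old
                        for row in rows:
                            if row[b] == old:
                                row[b] = new
                        changed = True

    # Step 3: check whether some row consists only of known symbols.
    target = {a: (a, 0) for a in attrs}
    return any(row == target for row in rows)


# R(A, B, C) decomposed into S(A, B) and T(B, C), as in the earlier example.
abc = ["A", "B", "C"]
parts = [{"A", "B"}, {"B", "C"}]
print(chase_is_lossless(abc, parts, [({"B"}, {"C"})]))
print(chase_is_lossless(abc, parts, [({"A"}, {"C"})]))
```

With the FD $B \to C$ the two rows agree on B, so the unknown C value in the first row is equated with the known one and that row becomes t. With $A \to C$ the rows never agree on A, so no FD applies and no row equals t.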

Example of Chase Test

Work out the following two cases:
1. Decomposition of R(A, B, C, D) into $S_1(A, D)$, $S_2(A, C)$ and $S_3(B, C, D)$ with FDs $A \to B$, $B \to C$, and $CD \to A$.
2. Same decomposition with FD $B \to AD$.

Work out examples in Handout 3:
1. (Problem 1, part 6) Apply the chase test to the decomposition of Inventory into Inventory1 and Inventory2.
2. (Problem 1, part 7) Modify one of the attributes in either Inventory1 or Inventory2 to obtain a lossless-join decomposition. Verify using the chase test.
3. (Problem 2, part 8(ii)) Apply the chase test to the decomposition of Concerts into Concerts1 and Concerts2.
4. Apply the chase test to the decomposition of Concerts into Concerts1, Concerts3(City, Song, Album) and Concerts4(City, Year).
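As a check on the first of the two cases (decomposition into (A, D), (A, C), (B, C, D) with FDs $A \to B$, $B \to C$, $CD \to A$), one possible working of the chase is sketched below. The subscripted symbols are the conventional placeholders for unknown values; the order in which the FDs are applied is a choice made here, and other orders should reach the same conclusion.

```latex
% Initial tableau: unsubscripted symbols are the known components of t.
\begin{array}{c|cccc}
            & A   & B   & C   & D   \\ \hline
t_1\ (S_1)  & a   & b_1 & c_1 & d   \\
t_2\ (S_2)  & a   & b_2 & c   & d_2 \\
t_3\ (S_3)  & a_3 & b   & c   & d
\end{array}

% A \to B:  t_1 and t_2 agree on A,        so b_1 = b_2.
% B \to C:  t_1 and t_2 now agree on B,    so c_1 = c.
% CD \to A: t_1 and t_3 now agree on C, D, so a_3 = a.

% Final tableau: row t_3 equals t = (a, b, c, d).
\begin{array}{c|cccc}
            & A   & B   & C   & D   \\ \hline
t_1         & a   & b_1 & c   & d   \\
t_2         & a   & b_1 & c   & d_2 \\
t_3         & a   & b   & c   & d
\end{array}
```

Since the last row contains only known symbols, this working suggests the decomposition is lossless under these FDs. In the second case, with only $B \to AD$, no two rows of the initial tableau agree on B, so no FD can be applied and no row becomes t.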