Comp 5311 Database Management Systems. 5. Functional Dependencies

Similar documents
Database Design and Normalization

Relational Database Design

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

Chapter 7: Relational Database Design

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example

Functional Dependencies and Finding a Minimal Cover

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.)

Schema Refinement and Normalization

Relational Database Design: FD s & BCNF

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12

Database Design and Normal Forms

Design of Relational Database Schemas

Schema Refinement, Functional Dependencies, Normalization

Database Management Systems. Redundancy and Other Problems. Redundancy

Database Design and Normalization

Chapter 10 Functional Dependencies and Normalization for Relational Databases

Week 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy

Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Overview - detailed. Goal. Faloutsos CMU SCS

Functional Dependencies

CS143 Notes: Normalization Theory

normalisation Goals: Suppose we have a db scheme: is it good? define precise notions of the qualities of a relational database scheme

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations

Functional Dependencies and Normalization

Theory of Relational Database Design and Normalization

Chapter 10. Functional Dependencies and Normalization for Relational Databases

Limitations of DB Design Processes

Lecture Notes on Database Normalization

Theory I: Database Foundations

Theory of Relational Database Design and Normalization

Chapter 8. Database Design II: Relational Normalization Theory

Advanced Relational Database Design

Database Management System

How To Find Out What A Key Is In A Database Engine

An Algorithmic Approach to Database Normalization

Database Constraints and Design

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

Relational Database Design Theory

Normalization in Database Design

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

RELATIONAL DATABASE DESIGN

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Chapter 7: Relational Database Design

SQL DDL. DBS Database Systems Designing Relational Databases. Inclusion Constraints. Key Constraints

Introduction Decomposition Simple Synthesis Bernstein Synthesis and Beyond. 6. Normalization. Stéphane Bressan. January 28, 2015

Database Systems Concepts, Languages and Architectures

Design Theory for Relational Databases: Functional Dependencies and Normalization

Introduction to Database Systems. Normalization

Determination of the normalization level of database schemas through equivalence classes of attributes

Boolean Algebra Part 1

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Objectives of Database Design Functional Dependencies 1st Normal Form Decomposition Boyce-Codd Normal Form 3rd Normal Form Multivalue Dependencies

Relational Normalization: Contents. Relational Database Design: Rationale. Relational Database Design. Motivation

CSCI-GA Database Systems Lecture 7: Schema Refinement and Normalization

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3)

Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

Functional Dependency and Normalization for Relational Databases

Database Sample Examination

The University of British Columbia

MCQs~Databases~Relational Model and Normalization

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Relational Normalization Theory (supplemental material)

DATABASE NORMALIZATION

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 5: Normalization II; Database design case studies. September 26, 2005

Lecture 2 Normalization

Normalisation 1. Chapter 4.1 V4.0. Napier University

The Relational Data Model and Relational Database Constraints

Fundamentals of Database System

Normalization of database model. Pazmany Peter Catholic University 2005 Zoltan Fodroczi

Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms

Unit 3.1. Normalisation 1 - V Normalisation 1. Dr Gordon Russell, Napier University

Visa Smart Debit/Credit Certificate Authority Public Keys

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases

INCIDENCE-BETWEENNESS GEOMETRY

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Normalization in OODB Design

DATABASE MANAGEMENT SYSTEMS. Question Bank:

Lecture 6. SQL, Logical DB Design

The Relational Data Model: Structure

A. TRUE-FALSE: GROUP 2 PRACTICE EXAMPLES FOR THE REVIEW QUIZ:

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms.

Fragmentation and Data Allocation in the Distributed Environments

6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

Normalisation in the Presence of Lists

CSE140: Midterm 1 Solution and Rubric

Database Design Methodology

Normalisation and Data Storage Devices

Boyce-Codd Normal Form

Databases What the Specification Says

2. Basic Relational Data Model

Reasoning about Fuzzy Functional Dependencies

Derivation of Database Keys Operations. Abstract. Introduction

Boolean Algebra (cont d) UNIT 3 BOOLEAN ALGEBRA (CONT D) Guidelines for Multiplying Out and Factoring. Objectives. Iris Hui-Ru Jiang Spring 2010

Database Design Methodologies

Transcription:

Comp 5311 Database Management Systems 5. Functional Dependencies 1

Functional Dependencies (FD) - Definition Let R be a relation scheme and X, Y be sets of attributes in R. A functional dependency from X to Y exists if and only if: For every instance of R of R, if two tuples in R agree on the values of the attributes in X, then they agree on the values of the attributes in Y We write X Y and say that X determines Y Example on PGStudent (sid, name, supervisor_id, specialization): {supervisor_id} {specialization} means If two student records have the same supervisor (e.g., Dimitris), then their specialization (e.g., Databases) must be the same On the other hand, if the supervisors of 2 students are different, we do not care about their specializations (they may be the same or different). Sometimes, we omit the brackets for simplicity: supervisor_id specialization 2

Trivial FDs A functional dependency X Y is trivial if Y is a subset of X {name, supervisor_id} {name} If two records have the same values on both the name and supervisor_id attributes, then they obviously have the same supervisor_id. Trivial dependencies hold for all relation instances A functional dependency X Y is non-trivial if Y X = {supervisor_id} {specialization} Non-trivial FDs are given in the form of constraints when designing a database. For instance, the specialization of a students must be the same as that of the supervisor. They constrain the set of legal relation instances. For instance, if I try to insert two students under the same supervisor with different specializations, the insertion will be rejected by the DBMS Some FDs are neither trivial nor non-trivial. 3

Functional Dependencies and Keys A FD is a generalization of the notion of a key. For PGStudent (sid, name, supervisor_id, specialization), we write: {sid} {name, supervisor_id, specialization} The sid determines all attributes (i.e., the entire record) If two tuples in the relation student have the same sid, then they must have the same values on all attributes. In other words they must be the same tuple (since the relational model does not allow duplicate records) 4

Superkeys and Candidate Keys using FD A set of attributes that determines the entire tuple is a superkey {sid, name} is a superkey for the PGstudent table. Also {sid, name, supervisor_id} etc. A minimal set of attributes that determines the entire tuple is a candidate key {sid, name} is not a candidate key because I can remove the name. sid is a candidate key so is HKID (provided that it is stored in the table). If there are multiple candidate keys, the DB designer chooses designates one as the primary key. 5

Closure of a Set of Functional Dependencies Given a set of functional dependencies F, there are certain other functional dependencies that are logically implied by F. The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F +. We can find all of F + by applying Armstrong s Axioms: if Y X, then X Y (reflexivity) if X Y, then ZX ZY (augmentation) if X Y and Y Z, then X Z (transitivity) these rules are sound and complete. 6

Examples of Armstrong s Axioms if Y X, then X Y (reflexivity generates trivial FDs) name name name, supervisor_id name name, supervisor_id supervisor_id if X Y, then ZX ZY (augmentation) sid name (given) supervisor_id, sid supervisor_id, name if X Y and Y Z, then X Z (transitivity) sid supervisor_id (given) supervisor_id specialization (given) sid specialization 7

Additional Rules We can further simplify computation of F + by using the following additional rules. If X Y holds and X Z holds, then X YZ holds (union) If X YZ holds, then X Y holds and X Z holds (decomposition) If X Y holds and ZY W holds, then ZX W holds (pseudotransitivity) The above rules can be inferred from Armstrong s axioms. E.g., pseudotransitivity X Y, ZY W (given) ZX ZY (by augmentation) ZX W (by transitivity) 8

Example of FDs in the closure F + R = (A, B, C, G, H, I) F = {A B A C CG H CG I B H} some members of F + A H AG I CG HI A B; B H A C; AG CG; CG I 9

Closure of Attribute Sets The closure of X under F (denoted by X + ) is the set of attributes that are functionally determined by X under F: X Y is in F + Y X + X is a set of attributes Given sid If sid name then name is part of sid + i.e., sid + = {sid, name, } If sid supervisor_id then supervisor_id is part of sid + i.e., sid + = {sid, name, supervisor_id, } If sid specialization then continue. Else stop 10

Algorithm for Computing Attribute Closure Input: R a relation scheme F a set of functional dependencies X R (the set of attributes for which we want to compute the closure) Output: X + the closure of X w.r.t. F X (0) := X Repeat X (i+1) := X (i) Z, where Z is the set of attributes such that there exists Y Z in F, and Y X (i) Until X (i+1) := X (i) Return X (i+1) 11

Closure of a Set of Attributes: Example R = {A,B,C,D,E,G} F = { {A,B} {C}, {C} {A}, {B,C} {D}, {A,C,D} {B}, {D} {E,G}, {B,E} {C}, {C,G} {B,D}, {C,E} {A,G}} X = {B,D} X (0) = {B,D} {D} {E,G}, X (1) = {B,D,E,G}, {B,E} {C} X (2) = {B,C,D,E,G}, {C} {A} X (3) = {A,B,C,D,E,G} X (4) = X (3) 12

Uses of Attribute Closure Testing for superkey To test if X is a superkey, we compute X +, and check if X + contains all attributes of R. Testing functional dependencies To check if a functional dependency X Y holds (or, in other words, X Y is in F + ), just check if Y X +. Computing the closure of F For each subset X R, we find the closure X +, and for each Y X +, we output a functional dependency X Y. Computing if two sets of functional dependencies F and G are equivalent, i.e., F+ = G+ For each functional dependency Y Z in F Compute Y+ with respect to G If Z Y+ then Y Z is in G+ And vice versa 13

Redundancy of FDs Sets of functional dependencies may have redundant dependencies that can be inferred from the others {A} {C} is redundant in: {{A} {B}, {B} {C},{A} {C}} Parts of a functional dependency may be redundant Example of extraneous/redundant attribute on RHS: {{A} {B}, {B} {C}, {A} {C,D}} can be simplified to {{A} {B}, {B} {C}, {A} {D}} (because {A} {C} is inferred from {A} {B}, {B} {C}) Example of extraneous/redundant attribute on LHS: {{A} {B}, {B} {C}, {A,C} {D}} can be simplified to {{A} {B}, {B} {C}, {A} {D}} (because of {A} {C}) 14

Canonical Cover A canonical cover for F is a set of dependencies F c such that F and F c are equivalent F c contains no redundancy Each left side of functional dependency in F c is unique. For instance, if we have two FD X Y, X Z, we convert them to X YZ. Algorithm for canonical cover of F: repeat Use the union rule to replace any dependencies in F X 1 Y 1 and X 1 Y 2 with X 1 Y 1 Y 2 Find a functional dependency X Y with an extraneous attribute either in X or in Y If an extraneous attribute is found, delete it from X Y until F does not change Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied 15

Example of Computing a Canonical Cover R = (A, B, C) F = {A BC B C A B AB C} Combine A BC and A B into A BC Set is now {A BC, B C, AB C} A is extraneous in AB C because of B C. Set is now {A BC, B C} C is extraneous in A BC because of A B and B C. The canonical cover is: A B B C 16

Pitfalls in Relational Database Design Relational database design requires that we find a good collection of relation schemas. Functional dependencies can be used to refine ER diagrams or independently (i.e., by performing repetitive decompositions on a "universal" relation that contains all attributes). A bad design may lead to several problems. 17

Problems of Bad Design T1 Assume the position determines the salary: position salary first_name last_name address department position salary Dewi Srijaya 12a Jln Lempeng Toys clerk 2000 Izabel Leong 10 Outram Park Sports trainee 1200 John Smith 107 Clementi Rd Toys clerk 2000 Axel Bayer 55 Cuscaden Rd Sports trainee 1200 Winny Lee 10 West Coast Rd Sports manager 2500 Sylvia Tok 22 East Coast Lane Toys manager 2600 Eric Wei 100 Jurong drive Toys assistant manager 2200 Redundant storage Update anomaly???? security guard 1500 key Potential deletion anomaly 18 Insertion anomaly

T2 Decomposition Example first_name last_name address department position Dewi Srijaya 12a Jln lempeng Toys clerk Izabel Leong 10 Outram Park Sports trainee John Smith 107 Clementi Rd Toys clerk Axel Bayer 55 Cuscaden Rd Sports trainee Winny Lee 10 West Coast Rd Sports manager Sylvia Tok 22 East Coast Lane Toys manager Eric Wei 100 Jurong drive Toys assistant manager T3 position salary clerk 2000 trainee 1200 manager 2500 assistant manager 2200 security guard 1500 No Redundant storage No Update anomaly No Deletion anomaly No Insertion anomaly 19

Normalization Normalization is the process of decomposing a relation schema R into fragments (i.e., smaller tables) R 1, R 2,.., R n. Our goals are: Lossless decomposition: The fragments should contain the same information as the original table. Otherwise decomposition results in information loss. Dependency preservation: Dependencies should be preserved within each R i, i.e., otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. Good form: The fragments R i should not involve redundancy. Roughly speaking, a table has redundancy if there is a FD where the LHS is not a key (more on this later). 20

Lossless Join Decomposition A decomposition is lossless (aka lossless join) if we can recover the initial table In general a decomposition of R into R 1 and R 2 is lossless if and only if at least one of the following dependencies is in F + : R 1 R 2 R 1 R 1 R 2 R 2 In other words, the common attribute of R 1 and R 2 must be a candidate key for R 1 or R 2. Is the previous decomposition example (T2, T3) lossless? Yes because the common attribute of T2, T3 is position and it determines the salary; therefore it is a key for T3. 21

Example of a Lossy Decomposition Decompose R = (A,B,C) into R 1 = (A,B) and R 2 = (B,C) r A B C a 1 m a 2 n b 1 p,b (r) A B a 1 a 2 b 1 B,C (r) B C 1 m 2 n 1 p,b (r) B,C (r) A B C a 1 m a 1 p a 2 n b 1 m b 1 p It is a lossy decomposition: two extraneous tuples. You get more, not less!! B is not a key of either small table 22

Dependency Preserving Decomposition The decomposition of a relation scheme R with FDs F is a set of tables (fragments) R i with FDs F i F i is the subset of dependencies in F + (the closure of F) that include only attributes in R i. The decomposition is dependency preserving if and only if ( i F i ) + = F + 23

Non-Dependency Preserving Decomposition Example R = (A, B, C), F = {{A} {B}, {B} {C}, {A} {C}}. Key: A There is a dependency {B} {C}, where the LHS is not the key, meaning that there can be considerable redundancy in R. Solution: Break it in two tables R1(A,B), R2(A,C) (normalization) A B C 1 2 3 2 2 3 3 2 3 4 2 4 A B 1 2 2 2 3 2 4 2 A C 1 3 2 3 3 3 4 4 The decomposition is lossless because the common attribute A is a key for R1 (and R2) The decomposition is not dependency preserving because F1={{A} {B}}, F2={{A} {C}} and (F1 F2) + F +. We lost the FD {B} {C}. In practical terms, each FD is implemented as an assertion, which it is checked when there are updates. In the above example, in order to find violations, we have to join R1 and R2. Can be very expensive. 24

Dependency Preserving Decomposition Example R = (A, B, C), F = {{A} {B}, {B} {C}, {A} {C}}. Key: A Break R in two tables R1(A,B), R2(B,C) A B C 1 2 3 2 2 3 3 2 3 4 2 4 A B 1 2 2 2 3 2 4 2 B C 2 3 2 4 The decomposition is lossless because the common attribute B is a key for R2 The decomposition is dependency preserving because F1={{A} {B}}, F2={{B} {C}} and (F1 F2)+=F+ Violations can be found by inspecting the individual tables, without performing a join. 25