Relational Database Design

Similar documents

Database Design and Normal Forms

Database Design and Normalization

Schema Refinement, Functional Dependencies, Normalization

Design of Relational Database Schemas

Theory of Relational Database Design and Normalization

Chapter 10. Functional Dependencies and Normalization for Relational Databases

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12

Chapter 7: Relational Database Design

Week 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy

Database Management Systems. Redundancy and Other Problems. Redundancy

Schema Refinement and Normalization

Chapter 7: Relational Database Design

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Functional Dependencies and Normalization

Functional Dependencies and Finding a Minimal Cover

Relational Database Design Theory

normalisation Goals: Suppose we have a db scheme: is it good? define precise notions of the qualities of a relational database scheme

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.)

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example

Advanced Relational Database Design

Relational Database Design: FD s & BCNF

CS143 Notes: Normalization Theory

Database Management System

Theory of Relational Database Design and Normalization

Lecture Notes on Database Normalization

Chapter 10 Functional Dependencies and Normalization for Relational Databases

Chapter 8. Database Design II: Relational Normalization Theory

Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Database Design and Normalization

Functional Dependency and Normalization for Relational Databases

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

Relational Normalization: Contents. Relational Database Design: Rationale. Relational Database Design. Motivation

Relational Normalization Theory (supplemental material)

RELATIONAL DATABASE DESIGN

Introduction to Database Systems. Normalization

Normalization in Database Design

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

Limitations of DB Design Processes

Objectives of Database Design Functional Dependencies 1st Normal Form Decomposition Boyce-Codd Normal Form 3rd Normal Form Multivalue Dependencies

Database Systems Concepts, Languages and Architectures

Introduction Decomposition Simple Synthesis Bernstein Synthesis and Beyond. 6. Normalization. Stéphane Bressan. January 28, 2015

Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010

How To Find Out What A Key Is In A Database Engine

Quiz 3: Database Systems I Instructor: Hassan Khosravi Spring 2012 CMPT 354

Design Theory for Relational Databases: Functional Dependencies and Normalization

Determination of the normalization level of database schemas through equivalence classes of attributes

Functional Dependencies

CSCI-GA Database Systems Lecture 7: Schema Refinement and Normalization

Database Constraints and Design

SQL DDL. DBS Database Systems Designing Relational Databases. Inclusion Constraints. Key Constraints

An Algorithmic Approach to Database Normalization

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Normalization. CIS 331: Introduction to Database Systems

Normalisation in the Presence of Lists

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 5: Normalization II; Database design case studies. September 26, 2005

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

DATABASE NORMALIZATION

Boyce-Codd Normal Form

Database Sample Examination

Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms

Lecture 2 Normalization

Normalisation and Data Storage Devices

3. Database Design Functional Dependency Introduction Value in design Initial state Aims

Normalization for Relational DBs

Normalization of database model. Pazmany Peter Catholic University 2005 Zoltan Fodroczi

MCQs~Databases~Relational Model and Normalization

C# Cname Ccity.. P1# Date1 Qnt1 P2# Date2 P9# Date9 1 Codd London Martin Paris Deen London

LOGICAL DATABASE DESIGN

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3)

A Web-Based Environment for Learning Normalization of Relational Database Schemata

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms.

Normalization Theory for XML

Boolean Algebra Part 1

Normalization in OODB Design

BCA. Database Management System

Normal forms and normalization

Normalization of Database

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Unit 3 Boolean Algebra (Continued)

CSE-3421: Exercises. with answers

CH3 Boolean Algebra (cont d)

TYPICAL QUESTIONS & ANSWERS

Boolean Algebra (cont d) UNIT 3 BOOLEAN ALGEBRA (CONT D) Guidelines for Multiplying Out and Factoring. Objectives. Iris Hui-Ru Jiang Spring 2010

DATABASE MANAGEMENT SYSTEMS. Question Bank:

10CS54: DATABASE MANAGEMENT SYSTEM

Schema Mappings and Data Exchange

COVERS FOR FUNCTIONAL DEPENDENCIES

Normalization. Normalization. Normalization. Data Redundancy

Transcription:

Relational Database Design To generate a set of relation schemas that allows - to store information without unnecessary redundancy - to retrieve desired information easily Approach - design schema in appropriate normal form How to determine whether a schema is in normal form? - functional dependency: a collection of constraints - a key notion in relational database design Undesirable properties of bad design 1) repetition of information, resulting in wasted space and complicated updates 2) inability to represent certain information: introduction of null values 3) loss of information FD-1

Database Design To avoid undesirable features - properties of information repetition and null values suggest decomposition of relation schema - property of information loss implies lossy-join decomposition Major concern in DB design - how to specify constraints on the database and how to obtain lossless-join decomposition that avoids the undesirable properties above FD-2

Loss of Information Borrow = (B-name, Loan#, C-name, Amount) Amt = Amount,C name (Borrow) Loan = B name,loan#,amount (Borrow) Amt Loan contains more tuples that the original relation B name ( C name =Jones (Amt may introduce a wrong answer Loan)) More tuples in Amt Loan implies less information lossy-join decomposition Reason for such anomaly - Amount does not uniquely relate B-name and C-name - customers may have loans in the same amount, but not necessarily at the same branch - uniqueness is critical: notion of key Functional dependency is a generalization of the notion of key (uniqueness) FD-3

Functional Dependency For a relation scheme R(A 1,..., A n ), let X and Y be subsets of attributes A 1,..., A n. X functionally determines Y (X Y) if relation r(r) represents the current instance of the schema R, and it is not possible to have two tuples in r that agree in components of all attributes X and disagree on some attributes in Y. Properties - if X is a key, then X Y for any possible set of attributes of R - FD allows to express constraints that cannot be expressed just using keys <ex> Lending (B-name, Assets, B-city, Loan#, C-name, Amount) B-name is not a superkey, since a branch may have many loans to many customers, but we can express the dependency B-name B-city FD-4

Functional Dependency Assign Pilot Flight Date Time ------------------------------------------- A 112 3/11 1325 B 301 3/10 0600 B 105 3/11 1325 - Flight functionally determines Time Flight Time - For any given Pilot, Date, and Time, there is only one flight Pilot, Date, Time Flight Formal definition A set of attributes X functionally determines the set Y if t 1 (X) = t 2 (X) implies t 1 (Y) = t 2 (Y) or, equivalently t 1 (Y) t 2 (Y) implies t 1 (X) t 2 (X) or, equivalently for all X, Y R, Y( X =x (r)) 1 FD-5

Notes on Functional Dependency FD is a statement about the universe as we understand it. - FD is a semantic integrity constraints, which must hold for all the tuples in the relation - all relations must satisfy all FDs; otherwise they are not correct It is not true to say "Since X functionally determines Y, if we know X, we know Y." Or, "If X Y, then X identifies Y." Let R be a schema, then X R iff X is a superkey of R Some FDs are trivial A A X Y if Y X FD-6

Use of Functional Dependency Legality test - check whether the relations are legal under a given set of FDs - if r is legal under a given set of FDs F, we say r satisfies F. Constraints specification - express constraints on the set of legal relations - rules to be used in database design <ex> r A B C D --------------------- a1 b1 c1 d1 a1 b2 c1 d2 A C a2 b2 c2 d2 AB D a2 b3 c2 d3 C A a3 b3 c2 d4 FD-7

Logical Implications of Functional Dependency Not enough to consider only the given set of FDs - need to consider all FDs that hold - for a given set of FDs F, certain other FDs also hold: they are logically implied by F <ex> R(A, B, C) FD: {A B, B C} A B B C A C <Proof> If t 1 (A)=t 2 (A), then t 1 (B)=t 2 (B) by A B. Since B C, t 1 (C)=t 2 (C). Therefore if t 1 (A)=t 2 (A), then t 1 (C)=t 2 (C). Hence A C. Closure of F (F + ) - set of all FDs logically implied by F - if F = F +, F is called a full family of dependencies - how to compute F +? use inference rules FD-8

Inference Rules A means of inferring the existence of FD from the given set Completeness and soundness - completeness: given F, the rules allow us to determine all dependencies in F + - soundness: we cannot generate any FD not in F + <ex> R=(A, B, C, D) F={A B, B C} F + = {A B, B C, A C} A D? If we get it by the rules, they are not sound. If we cannot get A C by the rules, they are not complete. Are there complete and sound inference rules to compute F +? Armstrong s axioms FD-9

Armstrong s Axioms (1) reflexivity rule if Y X, then X Y holds (trivial dependency) (2) augmentation rule if X Y, then XZ YZ (3) transitivity rule if X Y and Y Z, then X Z Although these three rules are complete, there are additional three rules to compute F + directly (4) additivity (union) rule if X Y and X Z, then X YZ (5) projectivity (decomposition) rule if X YZ, then X Y and X Z (6) pseudo-transitivity rule if X Y and WY Z, then XW Z FD-10

Additional Rules Additional rules can be derived from the original rules union rule: 1. X Y given 2. X XY augment X 3. X Z given 4. XY YZ augment Y 5. X YZ transitivity 2 & 4 decomposition: 1. X YZ given 2. YZ Z reflexivity 3. X Z transitivity 1 & 2 pseudo-transitivity: 1. X Y given 2. XW YW augment W 3. WY Z given 4. XW Z transitivity 2 & 3 Theorem: Armstrong s axioms are sound and complete FD-11

Computing Closure Let X be a set of attributes, then X + is the set of all attributes functionally determined by X under a set of FDs F. <Algorithm> Input: F and X Output: X + (the closure of X with respect to F) Compute a sequence of sets of attributes X (0), X (1),... by the following rule: (1) X (0) = X (2) X (i+1) = X (i) Z if Y Z F and Y X (i) (3) stop when X (i+1) = X (i) <ex> {A C, B C, C D, DE C, CE A} 1. X = AD 2. X = BC X (0) = {AD} X (0) = {BC} X (1) = {AD} {C} = {ACD} X (1) = {BC} {CD} = {BCD} X (2) = {ACD} = X (1) X (2) = {BCD} = X (1) stop stop FD-12

Relation Decomposition One of the properties of bad design suggests to decompose a relation into smaller relations - must achieve lossless-join decomposition (non-additive join) <ex> R = (A, B, C) r A B C F = {A B} --------------- a1 b1 c1 a2 b1 c2 Decomposition 1: r1 A B r2 B C r1 r2 A B C --------- --------- --------------- a1 b1 b1 c1 a1 b1 c1 a2 b1 b1 c2 a1 b1 c2 a2 b1 c1 a2 b1 c2 Decomposition 2: r1 A B r2 A C r1 r2 A B C --------- --------- --------------- a1 b1 a1 c1 a1 b1 c1 a2 b1 a2 c2 a2 b1 c2 FD-13

Lossless Join If R is a relation schema decomposed into R 1,..., R K, and F is a set of FDs, the decomposition is called a lossless join with respect to F, if for every relation r(r) satisfying F r = R1 (r) R K (r) r is the natural join of its projection onto R i Recoverability - lossless join property is necessary if the decomposed relation is to be recovered from its decomposition Testing lossless join Let R be a schema and F be a set of FDs on R, and =(R 1, R 2 ) be a decomposition of R. Then has a lossless join w.r.t. F iff either R 1 R 2 R 1 (or R 1 R 2 ) or R 1 R 2 R 2 (or R 2 R 1 ) where such FD F + FD-14

Lossless Join Decomposition From the previous example: R=(ABC) F={A B} (1) R 1 = (AB) R 2 = (AC) R 1 R 2 = A R 1 R 2 = B A B in F? Yes. (1) R 1 = (AB) R 2 = (BC) R 1 R 2 = B R 1 R 2 R 2 = A R 1 = C B A in F? No. F +? No. B C in F? No. F +? No Therefore lossy join <ex> R=(City, Street, Zip) F={CS Z, Z C} R 1 = (CZ) R 2 = (SZ) R 1 R 2 = Z R 1 R 2 = C Z C in F? Yes. FD-15

Dependency Preservation Decomposition Another desirable property of decomposition - each FD specified in F either appears directly in one of the relations in the decomposition, or be inferred from FDs that appear in some relation Why desirable? - when updating the DB, the system must check all the FDs are satisfied - for efficiency, violation detection can be done without performing join operation FDs need to be tested by checking one relation A decomposition preserves a set of FDs F, if the union of all FDs in R 1 (F) logically implies all FDs in F F i = R1 (F + ) F = F i check if (F ) + = F + FD-16

Testing Dependency Preservation <ex> R=(City, Street, Zip) F={CS Z, Z C} R 1 =(S,Z) R 2 =(C,Z) (1) lossless join? R 1 R 2 = Z, R 1 R 2 = C, Z C in F? Yes (2) dependency preserving? R 1 : only trivial FD R 2 : Z C and trivial FD ( R 1 (F) R 2 (F)) + F + They do not imply CS Z. Hence the decomposition does not preserve dependency. Algorithm for testing dependency preservation - given in the text book, but is not very practical since it requires computing F + that takes exponential time FD-17

Minimal Redundancy Another desirable property of decomposition - decomposition should contain as little redundant information as possible - degrees to which we can achieve the lack of redundancy is represented by several normal forms Normalization process - first introduced by Codd in 1972 - a series of tests to certify whether or not a relation schema belongs to a certain normal form - Codd proposed three normal forms: 1NF, 2NF, and 3NF - stronger 3NF was proposed by Boyce and Codd: BCNF - 1NF, 2NF, 3NF, and BCNF are all based on FDs - 4NF is based on multivalue dependency - 5NF is based on join dependency - domain-key normal form represents an ultimate normal form FD-18