Normalisation Why normalise? To improve (simplify) database design in order to Avoid update problems Avoid redundancy Simplify update operations 1
Example ( the practical difference between a first normal form relation and an unnormalised table) EMPLOYEE-PROJECT EMP* EMP-NAME PROJECT E1 Smith CS 101 CS 203 EE 121 E2 Jones CS 202 CS 101 E3 Lee EE 410 2
Consider 2 transactions on EMPLOYEE-PROJECT T1: Insert the fact that Hanks works on project EE 202. T2: Insert the fact that Smith works on project EE 202. 3
No difference between the two transactions with EMPLOYEE-PROJECT* EMPLOYEE-PROJECT* EMP# EMP-NAME PROJECT E1 Smith CS 101 E1 Smith CS 203 E1 Smith EE 121 E2 Jones CS 202 E2 Jones CS 101 E3 Lee EE 410 4
Normalisation theory allows us to detect such cases and shows how relations can be converted to more suitable forms. 5
Numerous normal forms have been defined: 1NF: first normal form 2NF: second normal form 3NF: third normal form BCNF: Boyce-Codd normal form o 4NF: fourth normal form o 5NF: fifth normal form 6
Each succeeding normal form improves on the previous one by specifying further constraints on the relations. 7
Definition: 1NF 1NF: A relation is in first normal form if and only if it contains atomic values only. 1NF relations can still have undesirable features. 8
Recall the S-P-SP Database Suppose we design the S-P-SP database differently: Instead of S(S#, SNAME, STATUS,CITY) SP(S#, P#, QTY) we have a single relation SSP (S#, SNAME, STATUS,CITY, P#, QTY). (leaving the P relation as it is). 9
S S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens 10
SP S# P# QTY S# P# QTY S1 P1 300 S4 P2 200 S1 P2 200 S4 P4 300 S1 P3 400 S4 P5 400 S1 P4 200 S1 P5 100 S1 P6 100 S2 P1 300 S2 P2 400 S3 P2 200 11
An instance of the SSP S#SNAME STATUS CITY P# QTY S1Smith 20 London P1 300 S1Smith 20 London P2 200 S1Smith 20 London P3 400 S1Smith 20 London P4 200 S1Smith 20 London P5 100 S1Smith 20 London P6 100 S2Jones 10 Paris P1 300 S2Jones 10 Paris P2 400 S3Blake 30 Paris P2 200 S4Clark 20 London P2 200 S4Clark 20 London P4 300 S4Clark 20 London P5 400 12
Do you see any problems with SSP? Redundancies e.g. every tuple for supplier S1 shows SNAME to be Smith, and CITY to be London. Update problems Suppose supplier S1 moves from London to Paris. 13
Compare With The Original Table S S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens 14
There are other problems associated with the design of the SSP relation which we will discuss later. To be able to identify all these problems and solutions in general, we have to know about functional dependencies. 15
Functional Dependencies Given a relation R, and X, Y subsets of the set of attributes of R, Y is functionally dependent on X if and only if each X-value in R has associated with it at most one Y-value in R. In other words, whenever two tuples of R agree on their X-value, they must also agree on their Y-value. The functional dependency of Y on X is expressed by X Y (read as "X functionally determines Y".) 16
Examples S.S# S.SNAME S.S# S.STATUS S.S# S.CITY or more succinctly S.S# S.(SNAME, STATUS, CITY) or S# SNAME, STATUS, CITY if the context of relation S is understood. 17
In relation SP: S#, P# QTY Note: Dependencies are a matter of the semantics of the data, not merely a matter of the data values that happen to appear in a relation at some particular time. 18
Exercise: Functional Dependencies Find the functional dependencies amongst the following attributes: Snumber Name TutorName TutorRoom# Degree LectureCourse (LC) LectureCourseGrade(LCG) LectureRoom# (LR) LectureTime (LT) LectureRoomCapacity (LRC) 19
Which of these hold? Snumber Name, TutorName, Degree TutorName TutorRoom# Snumber TutorRoom# Snumber LectureCourse LectureCourse LectureRoom# LectureRoom# LectureRoomCapacity LectureCourse LectureTime 20
Which of these hold? Snumber,Name TutorName LectureCourse,LectureTime LectureRoom# TutorName LectureCourse 21
Inference Axioms for functional dependencies Given a set of functional dependencies we can derive others using the following inference axioms. In these axioms: X, Y, Z, W denote sets of attributes, XY is a shorthand for X Y. 22
LHS = RHS If you know the items given on the LHS then you can infer the items on the RHS. 23
Axioms for functional dependencies A1:Reflexivity = X X A2:Augmentation X Y = XZ Y X Y = XZ YZ A3:Transitivity (X Y) (Y Z) = X Z The axioms A1-A3 are called Armstrong's axioms. 24
Example of Augmentation: Snumber TutorName Snumber, SName TutorName 25
Axioms cntd. A4: Additivity (X Y) (X Z) = X YZ A5: Projectivity X YZ = X Y X YZ = X Z A6:Pseudotransitivity (X Y) (YZ W) = XZ W A4-A6 can be derived from A1 - A3. 26
Exercise Derive A4 from A1 - A3. 27
X Y X Z??? X YZ given given 28
Some Useful Definitions A functional dependency of the form X Y is trivial iff Y X. (E.g AB A) Let F be a set of fds. The closure of F, denoted F +, is the set of all fds logically implied by F. Let X be a set of attributes. The set of all attributes functionally determined by X under a set F of fds is called the closure of X under F, and is denoted X + F. 29
Example: F: B C C D A E CE F Then B + F = {B, C, D} AB + F =??? 30
An Algorithm for computing result : = X; X + F while (changes to result) do for each fd B C in F do begin end; if B result then result:= result C 31
Example: Find AG + F where F is: F: A B CG I C D H C B H result = {A,G} result = {A,G,B,H} result = {A, G,B,H,C,I,D} result = {A,G,B,H,C,I,D} AG + F = {A,B,C,D,G,H,I} 32
The speed of the algorithm is dependent on the size of F. Also these functional dependencies are part of the integrity constraints of the data stored in the database. These constraints have to be checked when updating the database and maintained. So it can pay to reduce the size of F without changing its closure. 33
Integrity Constraints Name Degree Length of Degree Smith A. BEng 3 Smith B. MEng 4 Fran C. MSc 1 Jones D. MSc 1 Name Degree Degree Length of Degree Update: Change Length of Degree of Fran C. to 2. 34
Definition : equivalent sets of FDs Two sets of functional dependencies S1 and S2 are equivalent if and only if S1 + =S2 +, i.e. S1 implies all the fds in S2, and vice versa. Informally: S1 and S2 are equivalent if they contain exactly the same information. 35
Definition: Irreducible/Canonical Cover An irreducible cover for a set F of fds, denoted Fc, is a set of fds that satisfies the following four conditions: 1. F and Fc are equivalent. 2.The right-hand side of every dependency in Fc involves just one attribute. 36
3.The left-hand side of every dependency in Fc is irreducible, i.e. no attribute in any lefthand side can be discarded without changing the closure. 4. No fd in Fc can be discarded without changing the closure. 37
For each set of functional dependencies there exists at least one irreducible cover, but each set does not necessarily have a unique irreducible cover. 38
Example: Irreducible Cover F: A BC B C A B AB C AC D 39
A BC B C A B AB C AC D A B A C 40
First rewrite F as: A B A C B C A B AB C AC D This set is equivalent to F by additivity and projectivity. 41
A B A C B C A B A B AB C AC D 42
Delete the duplication of A B: A B A C B C AB C AC D 43
AB C can be deleted without changing the closure, because it is implied from A C by augmentation. A B A C B C AB C AC D 44
A B A C B C AC D 45
A B A C B C A C D 46
C can be deleted from AC D, because A D is implied from A C and AC D by augmentation and transitivity: A B A C B C A D 47
A B A C B C A D 48
Finally A C can be deleted because it is implied by A B and B C by transitivity: A B B C A D This final set is an irreducible cover for F. 49
Definitions: Candidate, Primary and Alternate Keys A set of attributes K of a relation R is a candidate key of R iff K satisfies the following two conditions: 1) K R This is a shorthand for saying that K functionally determines all attributes of R. 2) K' ( K' K and K' R) i.e. there is no proper subset of K that functionally determines all attributes of R. 50
Every relation has at least one candidate key. Some relations may have exactly one, but it is possible that some may have two or more. Historically, in the relational model, for any given relation, one of the candidate keys is chosen as the primary key, and then the remainder (if any) are called alternate keys. 51
Example Suppose in relation S, S# and SNAME both uniquely identify each supplier. Then S would have two candidate keys: S# and SNAME. We may choose S# as the primary key. Then SNAME becomes an alternate key. 52
The Entity Integrity Rule In the relational model, the primary key is constrained by the following integrity rule. Entity Integrity Rule: No attribute participating in the primary key of a relation is allowed to accept NULL values. ( NULL values represent unknown or non-existent values.) 53
Name Postcode Telephone Smith SW7 2BZ 02075948822 Jones SW5 5AT NULL Pitt NULL 02086662222 NULL = DO not know / Does not exist 54
Phew.. After all this work on FDs we will now use them to analyse the design of relations and improving the design through normalisation. 55