Normalization of database model
Pazmany Peter Catholic University, 2005
Zoltan Fodroczi
Closure of an attribute set
Given a set of attributes α, the closure of α under F (denoted α+) is the set of all attributes that are functionally determined by α under F.
Example: R(A,B,C,G,H,I), F={A->B, A->C, CG->H, CG->I, B->H}
Computing (AG)+ step by step:
AG (trivial)
ABCG (A->B, A->C)
ABCGH (CG->H)
ABCGHI (CG->I)
Is AG a superkey, i.e. does AG->R hold? Is AG a candidate key?
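The closure computation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `closure` and the representation of F as a list of (left side, right side) set pairs are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = set("ABCGHI")
F = [
    (set("A"), set("B")),
    (set("A"), set("C")),
    (set("CG"), set("H")),
    (set("CG"), set("I")),
    (set("B"), set("H")),
]

ag_closure = closure("AG", F)
is_superkey = ag_closure == R
# AG is a candidate key if it is a superkey and no proper subset is one.
is_candidate_key = is_superkey and all(
    closure(set("AG") - {a}, F) != R for a in "AG"
)
print(sorted(ag_closure), is_superkey, is_candidate_key)
```

Under these assumptions the sketch reports that (AG)+ covers all of R, so AG is a superkey. Neither A alone (closure ABCH) nor G alone (closure G) reaches R, so AG is also a candidate key.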
Why normalize?
Teacher(T-Name,T-No,U-Name,U-No)
Students(S-Name,S-No,U-No)
Location(U-No,Room,Time)
If all teachers who teach a particular unit leave, the information about the unit (U-Name) is lost.
If a teacher teaches many units, the information about the teacher is replicated unnecessarily.
Similarly, information about students attending many units is duplicated unnecessarily.
To update a U-Name, one may have to update many Teacher records.
Normalization removes such problems.
First Normal Form
A relation is in 1NF if it contains no multivalued fields or nested relations, i.e. all of its fields are atomic.
E.g.:
Not in 1NF (nested relation Units): Teacher(T-Name,T-No,Units(U-No,U-Name))
In 1NF: Teacher(T-No,T-Name,U-No,U-Name)
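To make the step from the nested form to the flat form concrete, here is a small Python sketch. The sample rows (teacher names, unit numbers, unit names) and the helper `flatten` are hypothetical, invented only for illustration.

```python
# Nested (non-1NF) form: each teacher row carries a list of units.
# All sample values below are made up for the example.
teachers_nested = [
    {"T-No": 1, "T-Name": "Smith", "Units": [(101, "Databases"), (102, "Logic")]},
    {"T-No": 2, "T-Name": "Jones", "Units": [(101, "Databases")]},
]


def flatten(rows):
    """Return one flat (T-No, T-Name, U-No, U-Name) tuple per teacher/unit pair."""
    return [
        (row["T-No"], row["T-Name"], u_no, u_name)
        for row in rows
        for u_no, u_name in row["Units"]
    ]


teachers_flat = flatten(teachers_nested)
for t in teachers_flat:
    print(t)
```

Every field of the flat tuples is atomic, but the teacher name now repeats once per unit. This is the redundancy that the higher normal forms address.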
Second Normal Form
A relation R is in 2NF if it is in 1NF and all of its non-prime attributes are fully functionally dependent on every candidate key of R.
Full functional dependency: if X,Y->Z holds but neither X->Z nor Y->Z holds, then Z is fully functionally dependent on X,Y.
An attribute β is prime if and only if it is part of some candidate key.
E.g.: Is Teacher(T-No,T-Name,U-No,U-Name) with F={T-No->T-Name, U-No->U-Name} in 2NF?
The candidate key is (T-No,U-No). U-Name and T-Name are non-prime attributes, but they do not fully functionally depend on the candidate key: by U-No->U-Name and T-No->T-Name, each depends on only part of it. The relation is therefore not in 2NF.
Checking the 2nd normal form property
Algorithm:
- Determine all keys
- Find all non-prime attributes
- Check the condition of 2NF
Consider the following schema:
Source(Supp-No,Part-No,Supp-Details,Supp-Name,Price)
F={Supp-No->Supp-Details, Supp-No,Part-No->Price, Supp-No->Supp-Name}
Is it in 2NF?
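2nd Normal Form - example
Candidate key: (Supp-No, Part-No)
Non-prime attributes: Supp-Details, Supp-Name, Price
Violating dependencies: Supp-No->Supp-Name and Supp-No->Supp-Details (each depends on only part of the key)
Price depends on the whole key, so it does not violate 2NF.
Result: not in 2NF.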
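The three steps of the 2NF check can be sketched in Python as follows. This is one possible brute-force formulation, not the slides' own code. The helper names `closure`, `candidate_keys` and `second_nf_violations` are assumptions, and the key search enumerates attribute subsets, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            # A superset of an already found key cannot be minimal.
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def second_nf_violations(schema, fds):
    """Return (key_part, attribute) pairs where a non-prime attribute
    depends on a proper subset of some candidate key."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    non_prime = schema - prime
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) & non_prime):
                    violations.append((set(part), attr))
    return violations


schema = {"Supp-No", "Part-No", "Supp-Details", "Supp-Name", "Price"}
F = [
    ({"Supp-No"}, {"Supp-Details"}),
    ({"Supp-No", "Part-No"}, {"Price"}),
    ({"Supp-No"}, {"Supp-Name"}),
]

print(candidate_keys(schema, F))
for part, attr in second_nf_violations(schema, F):
    print(sorted(part), "->", attr)
```

For the Source schema the sketch finds the single candidate key (Part-No, Supp-No) and reports Supp-Details and Supp-Name as depending on Supp-No alone. Price is not reported because it needs the whole key.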
Third Normal Form
A relation R is in 3NF if and only if it is in 1NF and, for every nontrivial functional dependency α->A in F+, either α is a superkey or A is a prime attribute.
Theorem: A relational schema (R,F) is in 3NF if and only if there is no key α and non-prime attribute A such that A transitively depends on α.
Recall transitive dependency: A->B, B->C => A->C; R(A,B,C) with these dependencies would not be in 3NF.
Example: Employee(E-No,E-Name,Dept-No,Salary,Location)
F={E-No->E-Name, E-No->Dept-No, E-No->Salary, E-No->Location, Dept-No->Location}
Location is a non-prime attribute that transitively depends on E-No through Dept-No, so Employee is not in 3NF.
Checking the 3rd NF property
Algorithm:
- Determine all keys
- Find all non-prime attributes
- Check the 3NF condition for every dependency α->β:
  - Is the dependency trivial?
  - If not, is α a superkey?
  - If not, does β contain only prime attributes?
If the answer to all three questions is no for some dependency, the schema is not in 3NF.
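The checking procedure above can be sketched in Python, applied here to the Employee schema from the Third Normal Form slide. The helper names are assumptions, and the sketch only tests the dependencies listed in F rather than enumerating all of F+.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # superset of a known key, hence not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def third_nf_violations(schema, fds):
    """Return the dependencies in `fds` that break the 3NF condition."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        extra = rhs - lhs                    # empty means the FD is trivial
        if not extra:
            continue
        if closure(lhs, fds) == schema:      # left side is a superkey
            continue
        if extra <= prime:                   # right side adds only prime attributes
            continue
        violations.append((lhs, rhs))
    return violations


employee = {"E-No", "E-Name", "Dept-No", "Salary", "Location"}
F = [
    ({"E-No"}, {"E-Name"}),
    ({"E-No"}, {"Dept-No"}),
    ({"E-No"}, {"Salary"}),
    ({"E-No"}, {"Location"}),
    ({"Dept-No"}, {"Location"}),
]

for lhs, rhs in third_nf_violations(employee, F):
    print(sorted(lhs), "->", sorted(rhs))
```

With these inputs only Dept-No->Location is reported: Dept-No is not a superkey and Location is non-prime. All dependencies with E-No on the left pass because E-No is the key.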
Example I
Timetable(S-No,U-No,Time,S-Name,U-Name,Room-No)
F={S-No->S-Name, U-No->U-Name, S-No,Time->Room-No}
Is it in 3NF?
The candidate key is (S-No,U-No,Time), so the non-prime attributes are S-Name, U-Name and Room-No.
S-No->S-Name: nontrivial, S-No is not a superkey, S-Name is non-prime => not in 3NF.
Example II
Stock(Bin-No,Part-No,Bin-Quantity,Re-Order-Level)
F={Bin-No->Part-No, Bin-No->Bin-Quantity, Part-No->Re-Order-Level}
Is it in 3NF?
Candidate key: (Bin-No)
Non-prime attributes: Part-No, Bin-Quantity, Re-Order-Level
Bin-No->Part-No: nontrivial, but Bin-No is a superkey => no violation
Bin-No->Bin-Quantity: nontrivial, but Bin-No is a superkey => no violation
Part-No->Re-Order-Level: nontrivial, Part-No is not a superkey, Re-Order-Level is non-prime => violates 3NF
Result: not in 3NF.
Boyce-Codd Normal Form
Relation R is in BCNF if it is in 1NF and, for every nontrivial functional dependency α->β in F+, the attribute set α is a superkey.
E.g.: R(A,B,C,D,E), F={A->D, B->E, DE->C}. Is it in BCNF?
Decomposition algorithm:
- Find a violating dependency α->β in F+, i.e. a nontrivial one where α is not a superkey. If there is none, the relation is in BCNF; stop.
- Decompose R into R1=(α,β) and R2=R-(β-α).
- Repeat from the first step for R1 using proj R1 (F+) and for R2 using proj R2 (F+).
Example:
A->D violates BCNF (the only candidate key of R is AB), so R1:=AD, R2:=ABCE.
R1 is in BCNF: its dependencies are A->D (A is a superkey of R1) and trivial ones.
R2 is not in BCNF: proj R2 (F+) contains AB->ABCE, B->E and AE->C plus trivial dependencies, so the candidate key of R2 is AB and B->E violates BCNF.
R2 is decomposed into R21:=BE and R22:=ABC.
Decomposition: R1, R21, R22; all three relations are in BCNF.
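The decomposition algorithm can be sketched in Python as below. This is an illustrative formulation under stated assumptions, not the slides' own code. The names `find_violation` and `bcnf_decompose` are hypothetical, and instead of materializing proj Ri (F+) the sketch computes closures under the original F and intersects them with the attributes of Ri, which yields the same projected dependencies. Subsets are enumerated by brute force, and a different search order could produce a different but equally valid decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (alpha, beta) for a nontrivial alpha -> beta inside `rel`
    where alpha is not a superkey of `rel`, or None if `rel` is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            alpha = set(combo)
            determined = closure(alpha, fds) & rel   # projection onto rel
            beta = determined - alpha
            if beta and determined != rel:
                return alpha, beta
    return None


def bcnf_decompose(rel, fds):
    """Recursively split `rel` until every part is in BCNF."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    alpha, beta = violation
    r1 = alpha | beta      # R1 = (alpha, beta)
    r2 = rel - beta        # R2 = R - (beta - alpha); beta excludes alpha here
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


R = set("ABCDE")
F = [(set("A"), set("D")), (set("B"), set("E")), (set("DE"), set("C"))]

parts = ["".join(sorted(r)) for r in bcnf_decompose(R, F)]
print(parts)  # ['AD', 'BE', 'ABC']
```

With this search order the sketch splits on A->D first and then on B->E, reproducing the relations R1=AD, R21=BE and R22=ABC from the example.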