Normalization

- Normalization Purpose
- Redundancy and Data Anomalies
- 1NF: First Normal Form
- 2NF: Second Normal Form
- 3NF: Third Normal Form
- Examples

Normalization

Normalization is the process of efficiently organizing data in a database, with two goals in mind:
- First goal: to eliminate redundant data. For example, do not store the same data in more than one table.
- Second goal: to ensure that data dependencies make sense. For example, store only related data in a table.
Normalization Purpose

- To avoid redundancy by storing each fact within the database only once.
- To put data into a form that conforms to relational principles (e.g., single-valued attributes, each relation represents one entity).
- To put the data into a form that can accommodate change more accurately.
- To facilitate the enforcement of data constraints.
- To avoid anomalies.

Summary: all attributes in a table must be atomic and must depend solely on the full primary key of that table.

The Solution: Normal Forms

Bad database designs have:
- redundancy: inefficient storage;
- anomalies: data inconsistency, difficulties in maintenance.

1NF, 2NF, 3NF and BCNF are some of the early normal forms that address these problems.
Redundancy and Data Anomalies

Redundant data is data where the same information is stored more than once, i.e., the redundant data could be removed without loss of information.

Example: the following relation contains staff and department details:

staffno  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking

Such redundancy could lead to the following anomalies:
- Insert anomaly: we cannot insert a department without also inserting a member of staff who works in that department.
- Update anomaly: we could change the name of the department that SA51 works in without simultaneously changing the name of the department that DS40 works in, even though it is the same department.
- Deletion anomaly: by removing employee SL10 we remove all information pertaining to the Sales department.

2010/2011 Topic #2: Normalization (Relational Database Design)

First Normal Form

An atomic attribute cannot be decomposed into meaningful components. Examples:
- Gender (atomic).
- Price: $11.00. Currency (dollar) plus number (may or may not be atomic).
- Name: John Smith. First name plus family name (may or may not be atomic).

A schema satisfies the first normal form (1NF) if all attribute values are atomic: no repeating groups, no composite attributes.

Do not try "intelligent" solutions in which a string has to be parsed: B427 -> building B, floor 4, room 27.
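The update and deletion anomalies in the staff and department relation shown earlier can be reproduced with a few lines of Python. This is only a sketch: the rows are copied from the example table, while the helper `dept_names` and the new department name "Finance" are hypothetical and introduced purely for illustration.

```python
# Rows copied from the staff/department example relation.
staff = [
    {"staffno": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffno": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffno": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffno": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]


def dept_names(rows):
    """Map each department number to the set of names recorded for it."""
    names = {}
    for row in rows:
        names.setdefault(row["dept"], set()).add(row["dname"])
    return names


# Update anomaly: rename department 20 on SA51's row only.
updated = [dict(row) for row in staff]
for row in updated:
    if row["staffno"] == "SA51":
        row["dname"] = "Finance"  # hypothetical new name, not from the slides

# Departments that now carry more than one name are inconsistent.
inconsistent = {dept for dept, names in dept_names(updated).items() if len(names) > 1}

# Deletion anomaly: removing SL10 also removes every trace of department 10.
remaining = [row for row in staff if row["staffno"] != "SL10"]
depts_after_delete = set(dept_names(remaining))

print(inconsistent)        # {20}
print(depts_after_delete)  # {20, 30}
```

Storing department details in their own relation, keyed by `dept`, would make both problems impossible: the name would exist in one place, and it would survive the removal of any member of staff.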
Second Normal Form

Second normal form (2NF) addresses the concept of removing duplicate data.

R satisfies 2NF if:
- R is in 1NF, and
- all non-prime attributes are fully dependent on the primary key.

Example:
registration(student_id, student_name, course_id, course_name)
- It satisfies 1NF (if names are atomic).
- student_name depends on student_id but not on course_id; likewise, course_name depends on course_id but not on student_id.

Solution: split the relation into three:
student(student_id, student_name)
course(course_id, course_name)
registration(student_id, course_id)

Third Normal Form

Remove columns that are not directly dependent upon the primary key.

Definition of transitive dependence: an attribute depends transitively on the primary key if it depends on another attribute that is not part of the primary key.

Example:
order(order_id, date, client_id, client_name)
- It satisfies 1NF and 2NF with primary key order_id.
- But client_name depends on client_id: it changes if client_id changes.

Solution: create two relations:
order(order_id, date, client_id)
client(client_id, client_name)
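The 2NF split of the registration relation can be sketched in Python as three projections, followed by a join that checks nothing was lost. The sample rows are made up for illustration, and the name `registration_2` is a hypothetical label for the reduced registration(student_id, course_id) relation.

```python
# Made-up sample rows for
# registration(student_id, student_name, course_id, course_name).
registration = [
    (1, "Ana", "DB", "Databases"),
    (1, "Ana", "OS", "Operating Systems"),
    (2, "Luis", "DB", "Databases"),
]

# Project onto each of the three new relations; sets drop duplicate rows.
student = {(sid, sname) for sid, sname, _, _ in registration}
course = {(cid, cname) for _, _, cid, cname in registration}
registration_2 = {(sid, cid) for sid, _, cid, _ in registration}

# Joining the three relations back together rebuilds the original rows,
# so the decomposition loses no information.
student_name = dict(student)
course_name = dict(course)
rebuilt = {
    (sid, student_name[sid], cid, course_name[cid])
    for sid, cid in registration_2
}

print(rebuilt == set(registration))                     # True
print(len(student), len(course), len(registration_2))  # 2 2 3
```

Each student name and each course name is now stored once, however many registrations refer to it.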
Summary

Steps in the normalization process (up to 3NF):
1. The relation must have a fixed number of atomic attributes.
2. Identify the primary key.
3. Check that all attributes (except primary key members) depend on the WHOLE primary key (not a part of it).
4. If there is a partial dependency ((3) is not true), break up the relation.
5. Check that the attributes do not depend on another attribute that is not the primary key.
6. If there is a transitive dependency ((5) is not true), break up the relation.

ORDER: Unnormalised Form (UNF)

Customer No: 001964        Order Number: 00012345
Name: Mark Campbell        Order Date: 14-Feb-2002
Address: 1 The House Leytonstone E11 9ZZ

Product Number  Product Description  Unit Price  Order Quantity  Line Total
T5060           Hook                  5.00        5              25.00
PT42            Bolt                  2.50       10              20.50
QZE48           Spanner              20.00        1              20.00
                                                  Order Total:   65.50

ORDER (order-no, order-date, cust-no, cust-name, cust-add, (prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)
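The dependency checks in steps 3 and 5 of the summary can be explored on sample data. The sketch below assumes a hypothetical helper, `fd_holds`, which tests whether a functional dependency is violated by a set of rows; the rows are the three order lines from the order form, with hyphens in attribute names replaced by underscores. Sample data can only refute a dependency, never prove it, so a `True` result means "not contradicted by these rows".

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns
        # the stored value; a mismatch is a counterexample to lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The three lines of order 00012345, flattened to one row per product.
order_lines = [
    {"order_no": "00012345", "prod_no": "T5060", "prod_desc": "Hook",
     "unit_price": "5.00", "ord_qty": 5},
    {"order_no": "00012345", "prod_no": "PT42", "prod_desc": "Bolt",
     "unit_price": "2.50", "ord_qty": 10},
    {"order_no": "00012345", "prod_no": "QZE48", "prod_desc": "Spanner",
     "unit_price": "20.00", "ord_qty": 1},
]

# prod_no alone determines the product facts (a partial dependency when the
# key is order_no + prod_no) ...
print(fd_holds(order_lines, ["prod_no"], ["prod_desc", "unit_price"]))  # True
# ... whereas order_no alone does not determine the product description.
print(fd_holds(order_lines, ["order_no"], ["prod_desc"]))               # False
```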
First Normal Form (1NF)

Definition: a relation is in 1NF if, and only if, all its underlying attributes contain atomic values only.

Remove repeating groups into a new relation. A repeating group is shown by a pair of brackets within the relational schema:

ORDER (order-no, order-date, cust-no, cust-name, cust-add, (prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)

Steps from UNF to 1NF:
1. Remove the outermost repeating group (and any nested repeating groups it may contain) and create a new relation to contain it.
2. Add to this relation a copy of the PK of the relation immediately enclosing it.
3. Name the new entity (appending the number 1 to indicate 1NF).
4. Determine the PK of the new entity.
5. Repeat the steps until no repeating groups remain.

Example - UNF to 1NF

ORDER (order-no, order-date, cust-no, cust-name, cust-add, (prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)

1. Remove the outermost repeating group (and any nested repeating groups it may contain) and create a new relation to contain it (rename the original to indicate 1NF).
   ORDER-1 (order-no, order-date, cust-no, cust-name, cust-add, order-total)
   (prod-no, prod-desc, unit-price, ord-qty, line-total)

2. Add to this relation a copy of the PK of the relation immediately enclosing it.
   ORDER-1 (order-no, order-date, cust-no, cust-name, cust-add, order-total)
   (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

3. Name the new entity (appending the number 1 to indicate 1NF).
   ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

4. Determine the PK of the new entity.
   ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)
   PK: (order-no, prod-no)
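The UNF to 1NF steps can be sketched in Python by modelling the repeating group as a nested list. This is an illustrative sketch only: the function name `to_1nf`, the key name `lines`, and the underscore spellings of the attribute names are assumptions, and the values are copied from the order form (kept as strings to avoid rounding issues).

```python
# The unnormalised order: the repeating group is the nested "lines" list.
unf_order = {
    "order_no": "00012345",
    "order_date": "14-Feb-2002",
    "cust_no": "001964",
    "cust_name": "Mark Campbell",
    "cust_add": "1 The House Leytonstone E11 9ZZ",
    "lines": [
        {"prod_no": "T5060", "prod_desc": "Hook", "unit_price": "5.00",
         "ord_qty": 5, "line_total": "25.00"},
        {"prod_no": "PT42", "prod_desc": "Bolt", "unit_price": "2.50",
         "ord_qty": 10, "line_total": "20.50"},
        {"prod_no": "QZE48", "prod_desc": "Spanner", "unit_price": "20.00",
         "ord_qty": 1, "line_total": "20.00"},
    ],
    "order_total": "65.50",
}


def to_1nf(order):
    """Split one UNF order into an ORDER-1 row and its ORDER-LINE-1 rows."""
    # Step 1: remove the repeating group from the enclosing relation.
    order_1 = {k: v for k, v in order.items() if k != "lines"}
    # Step 2: copy the enclosing PK (order_no) into every extracted row.
    order_line_1 = [{"order_no": order["order_no"], **line}
                    for line in order["lines"]]
    return order_1, order_line_1


order_1, order_line_1 = to_1nf(unf_order)

# Step 4: (order_no, prod_no) identifies each ORDER-LINE-1 row uniquely.
keys = [(row["order_no"], row["prod_no"]) for row in order_line_1]
print(len(keys) == len(set(keys)))  # True
```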
Second Normal Form (2NF)

Definition: a relation is in 2NF if, and only if, it is in 1NF and every non-key attribute is fully dependent on the primary key.

Remove partial functional dependencies into a new relation.

Steps from 1NF to 2NF:
1. Remove the offending attributes that are only partially functionally dependent on the composite key, and place them in a new relation.
2. Add to this relation a copy of the attribute(s) which are the determinants of these offending attributes. These will automatically become the primary key of this new relation.
3. Name the new entity (appending the number 2 to indicate 2NF).
4. Rename the original entity (ending with a 2 to indicate 2NF).

Example - 1NF to 2NF

ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

1. Remove the offending attributes that are only partially functionally dependent on the composite key, and place them in a new relation.
   ORDER-LINE-1 (order-no, prod-no, ord-qty, line-total)
   (prod-desc, unit-price)

2. Add to this relation a copy of the attribute(s) which determine these offending attributes. These will automatically become the primary key of this new relation.
   ORDER-LINE-1 (order-no, prod-no, ord-qty, line-total)
   (prod-no, prod-desc, unit-price)

3. Name the new entity (appending the number 2 to indicate 2NF).
   PRODUCT-2 (prod-no, prod-desc, unit-price)

4. Rename the original entity (ending with a 2 to indicate 2NF).
   ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)
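A minimal Python sketch of the 1NF to 2NF step follows. It moves the attributes that depend only on prod-no into a product table and keeps the rest in the order-line table. The second order (00012346) is made up so that the same product appears twice; the variable names are underscore versions of the relation names and are assumptions of this sketch.

```python
# ORDER-LINE-1 rows:
# (order_no, prod_no, prod_desc, unit_price, ord_qty, line_total)
order_line_1 = [
    ("00012345", "T5060", "Hook", "5.00", 5, "25.00"),
    ("00012345", "QZE48", "Spanner", "20.00", 1, "20.00"),
    ("00012346", "T5060", "Hook", "5.00", 2, "10.00"),  # made-up second order
]

product_2 = {}     # prod_no -> (prod_desc, unit_price)
order_line_2 = []  # (order_no, prod_no, ord_qty, line_total)

for order_no, prod_no, prod_desc, unit_price, ord_qty, line_total in order_line_1:
    facts = (prod_desc, unit_price)
    # Store the product facts once per prod_no. If a later row disagrees,
    # prod_no does not really determine these attributes, so stop.
    if product_2.setdefault(prod_no, facts) != facts:
        raise ValueError(f"conflicting facts for product {prod_no}")
    order_line_2.append((order_no, prod_no, ord_qty, line_total))

print(len(product_2))     # 2
print(len(order_line_2))  # 3
```

The description and price of the hook are now held once in `product_2`, even though the hook appears on two order lines.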
Third Normal Form (3NF)

Definition: a relation is in 3NF if, and only if, it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Remove transitive dependencies into a new relation.

Steps from 2NF to 3NF:
1. Remove the offending attributes that are transitively dependent on non-key attribute(s), and place them in a new relation.
2. Add to this relation a copy of the attribute(s) which are the determinants of these offending attributes. These will automatically become the primary key of this new relation.
3. Name the new entity (appending the number 3 to indicate 3NF).
4. Rename the original entity (ending with a 3 to indicate 3NF).

Example - 2NF to 3NF

ORDER-2 (order-no, order-date, cust-no, cust-name, cust-add, order-total)

1. Remove the offending attributes that are transitively dependent on non-key attributes, and place them in a new relation.
   ORDER-2 (order-no, order-date, cust-no, order-total)
   (cust-name, cust-add)

2. Add to this relation a copy of the attribute(s) which determine these offending attributes. These will automatically become the primary key of this new relation.
   ORDER-2 (order-no, order-date, cust-no, order-total)
   (cust-no, cust-name, cust-add)

3. Name the new entity (appending the number 3 to indicate 3NF).
   CUSTOMER-3 (cust-no, cust-name, cust-add)

4. Rename the original entity (ending with a 3 to indicate 3NF).
   ORDER-3 (order-no, order-date, cust-no, order-total)
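One way to see the benefit of the 3NF split is to declare the two relations in an actual database. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table names `customer_3` and `order_3`, the underscore column names, the TEXT column types, the second order and the changed customer name are all assumptions made for illustration; the slides do not prescribe any SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE customer_3 (
        cust_no   TEXT PRIMARY KEY,
        cust_name TEXT NOT NULL,
        cust_add  TEXT NOT NULL
    );
    CREATE TABLE order_3 (
        order_no    TEXT PRIMARY KEY,
        order_date  TEXT NOT NULL,
        cust_no     TEXT NOT NULL REFERENCES customer_3(cust_no),
        order_total TEXT NOT NULL
    );
""")

conn.execute(
    "INSERT INTO customer_3 VALUES (?, ?, ?)",
    ("001964", "Mark Campbell", "1 The House Leytonstone E11 9ZZ"),
)
conn.executemany(
    "INSERT INTO order_3 VALUES (?, ?, ?, ?)",
    [
        ("00012345", "14-Feb-2002", "001964", "65.50"),
        ("00012346", "15-Feb-2002", "001964", "10.00"),  # made-up second order
    ],
)

# The customer's name is stored once, so a single UPDATE is enough.
conn.execute(
    "UPDATE customer_3 SET cust_name = ? WHERE cust_no = ?",
    ("M. Campbell", "001964"),  # hypothetical new name
)

rows = conn.execute("""
    SELECT o.order_no, c.cust_name
    FROM order_3 AS o
    JOIN customer_3 AS c ON c.cust_no = o.cust_no
    ORDER BY o.order_no
""").fetchall()
print(rows)  # [('00012345', 'M. Campbell'), ('00012346', 'M. Campbell')]
```

Because the name lives only in `customer_3`, both orders see the change through the join, and no update anomaly can arise.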
Example - Relations in 3NF

ORDER-3 (order-no, order-date, cust-no, order-total)
CUSTOMER-3 (cust-no, cust-name, cust-add)
PRODUCT-2 (prod-no, prod-desc, unit-price)
ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)

[Diagram: entities ORDER (order-no), PRODUCT (prod-no), CUSTOMER (cust-no) and ORDER-LINE (order-no, prod-no), linked by relationships labelled places, placed by, contains, shows, belongs to, part of.]

Normalize: (1)

holiday(place_id, place_name, client_id, client_name, date)

- Atomic attributes? Yes, so it is in 1NF.
- 2NF: do all attributes (except those belonging to the primary key) depend on the whole primary key?
  - place_name depends on place_id only. Create: place_2(place_id, place_name)
  - client_name depends on client_id only. Create: client_2(client_id, client_name)
  - and holiday_2(place_id, client_id, date)
- Transitive dependencies? There are none, so the result is in 3NF.
Normalize: (2)

book(room_id, date, client_id, client_name)

- Atomic attributes? Yes, so it is in 1NF.
- 2NF: do all attributes (except those belonging to the primary key) depend on the whole primary key?
  - client_name depends on client_id only. Create: client_2(client_id, client_name)
  - and book_2(room_id, date, client_id)
- Transitive dependencies? There are none, so the result is in 3NF.
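The 2NF question asked in both exercises ("do all attributes depend on the whole primary key?") can be tried mechanically on sample rows. The sketch below is an illustration under several assumptions. The helpers `fd_holds` and `partial_dependencies` are hypothetical, the bookings are made up, and the primary key is assumed to be (room_id, date, client_id), which is what treating client_id -> client_name as a partial dependency implies. As with any check on sample data, a reported dependency is only "not contradicted by these rows".

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (part_of_key, attribute) pairs where a proper subset of the key
    already determines a non-key attribute. Only minimal subsets are kept."""
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for part in combinations(key, size):
            for attr in non_key:
                # Skip supersets of a determinant already reported for attr.
                covered = any(a == attr and set(p) <= set(part)
                              for p, a in found)
                if not covered and fd_holds(rows, part, [attr]):
                    found.append((part, attr))
    return found


# Made-up bookings for book(room_id, date, client_id, client_name).
book = [
    {"room_id": 101, "date": "2011-03-01", "client_id": 7, "client_name": "Ana"},
    {"room_id": 101, "date": "2011-03-01", "client_id": 8, "client_name": "Luis"},
    {"room_id": 102, "date": "2011-03-01", "client_id": 8, "client_name": "Luis"},
    {"room_id": 101, "date": "2011-03-02", "client_id": 8, "client_name": "Luis"},
    {"room_id": 102, "date": "2011-03-02", "client_id": 7, "client_name": "Ana"},
]

violations = partial_dependencies(
    book, key=("room_id", "date", "client_id"), non_key=["client_name"]
)
print(violations)  # [(('client_id',), 'client_name')]
```

The single violation found matches the exercise: client_name depends on client_id alone, so it moves to client_2(client_id, client_name).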