Introduction to normalization
Lecture 4
Instructor: Anna Sidorova

Agenda
- Presentation
- Review of relational models, in-class exercise
- Introduction to normalization
- In-class exercises
- Discussion of HW2
Next class
- Review for the midterm
- HW 2 is due March 7 midterm exam

Based on Hoffer, Prescott and Topi, Modern Database Management, (c) Prentice Hall 2009

HW 2
- Chapter 4, Problem 6: develop a relational schema
- Convert the ERD for Ch. 2, Problem 20 (a part of your HW1) into a relational schema (must be based on the correct solution)
- Chapter 4, Problems 7 and 8 (we will discuss the relevant material next class)
- Handout: normalization exercises
Review of relational data models

Figure 2-7 Three-schema architecture
- Different people have different views of the database; these are the external schema
- The internal schema is the underlying design and implementation
Relation
Definition: A relation is a named, two-dimensional table of data.
- A table consists of rows (records) and columns (attributes or fields)
Requirements for a table to qualify as a relation:
- It must have a unique name
- Every attribute value must be atomic (not multivalued, not composite)
- Every row must be unique (can't have two rows with exactly the same values for all their fields)
- Attributes (columns) in tables must have unique names
- The order of the columns must be irrelevant
- The order of the rows must be irrelevant
NOTE: all relations are in 1st Normal Form

Translating ERD into relational schema
- Map each entity into a relation
- Map each weak entity into a relation (include the identifier of the strong entity as a part of the primary key)
- Map each multivalued attribute into a relation (include the identifier of the entity as a part of the primary key)
- Map many-to-many relationships and associative entities into a relation
- Represent one-to-one and one-to-many relationships using foreign keys
Normalization

Learning Objectives
- Define normalization
- Define 1st, 2nd and 3rd Normal Forms
- Discuss the normalization process
Normalization: Definitions
- Normalization is a method used to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
- The process of decomposing relations with anomalies to produce smaller, well-structured relations

Well-Structured Relations
- A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
- Goal is to avoid anomalies
  - Insertion Anomaly: adding new rows forces the user to create duplicate data
  - Deletion Anomaly: deleting rows may cause a loss of data that would be needed for other future rows
  - Modification Anomaly: changing data in a row forces changes to other rows because of duplication
Example: Figure 5-2b
- Question: Is this a relation? Answer: Yes. It has unique rows and no multivalued attributes.
- Question: What's the primary key? Answer: Composite: Emp_ID, Course_Title

Anomalies in this Table
- Insertion: can't enter a new employee without having the employee take a class
- Deletion: if we remove employee 140, we lose information about the existence of a Tax Acc class
- Modification: giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist? Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities.
Normalization Process
The goal is to bring each relation into the Third Normal Form. The process of bringing a relation into the 3rd Normal Form goes through stages:
- 1st Normal Form
- 2nd Normal Form
- 3rd Normal Form

Functional Dependencies
- Functional Dependency: a particular relationship between two attributes. For a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B.
- Instances (or sample data) in a relation do not prove the existence of a functional dependency
- Knowledge of the problem domain is the most reliable method for identifying functional dependencies
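A minimal sketch of the definition above, assuming rows are stored as Python dictionaries. The helper name `fd_is_refuted` is hypothetical, and the sample rows are the STUDENT rows used elsewhere in this lecture. The check can only refute a dependency: when it finds no counterexample, the dependency is still unproven, which is why knowledge of the problem domain is needed.

```python
def fd_is_refuted(rows, determinant, dependent):
    """Return True if two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if key in seen and seen[key] != value:
            return True  # same A value, different B value: A -> B cannot hold
        seen.setdefault(key, value)
    return False


# Sample rows from the lecture's STUDENT relation.
student_rows = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.00},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.00},
    {"Stud_ID": 125, "Name": "Jonson", "Course_ID": "MSI 331", "Units": 3.00},
]

# Stud_ID -> Name is consistent with the sample (not refuted).
print(fd_is_refuted(student_rows, ["Stud_ID"], ["Name"]))       # False
# Stud_ID -> Course_ID is refuted: student 101 appears with two courses.
print(fd_is_refuted(student_rows, ["Stud_ID"], ["Course_ID"]))  # True
# Course_ID -> Name is also not refuted here, although a course does not
# determine a student's name in the real world.
print(fd_is_refuted(student_rows, ["Course_ID"], ["Name"]))     # False
```

The last call illustrates the limitation: three rows are simply too few to expose the missing dependency.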
Functional Dependencies: Notations in Problems
- A → B: attribute B is functionally dependent on attribute A (A determines B)
- A, B → C: attributes A and B together determine attribute C
- A → B, C: both attributes B and C are determined by (functionally dependent on) attribute A

Functional Dependencies
We can draw functional dependencies between attributes of a relation as follows:
[Diagram: dependency arrows drawn over the STUDENT relation]

STUDENT
Stud_ID  F_Name  L_Name  E-mail
111      Mary    Jones   mary@hotmail.com
122      Sara    Smith   smith@hotmail.com
Important Definitions
Multivalued Attributes (repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the Primary Key or a part of it; that is, they are not functionally dependent on it.

STUDENT
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00

Important Definitions
A relation is unnormalized (not in the 1st Normal Form) if it has multivalued attributes or repeating groups.
In the STUDENT relation above, the repeating group is Course_ID, Units.
Important Definitions
A relation is in the 1st Normal Form if it has no multivalued attributes or repeating groups.

STUDENT
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00

Important Definitions
Partial Dependency: when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER (partial dependency: Cust_ID → Name)
Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250
Important Definitions
A relation is NOT in the 2nd Normal Form if it has partial dependencies.

CUSTOMER (partial dependency: Cust_ID → Name)
Cust_ID  Name   Order_ID
101      AT&T   1234
101      AT&T   156
125      Cisco  1250

Important Definitions
A relation is in the 2nd Normal Form if it is in the 1st Normal Form AND has no partial dependencies.

EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sara    Smith   2        Mktg
Important Definitions
Transitive Dependency: when a non-key attribute determines another non-key attribute.

EMPLOYEE (transitive dependency: Dept_ID → Dept_Name)
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sara    Smith   2        Mktg

Important Definitions
A relation is NOT in the 3rd Normal Form if it has transitive dependencies, as the EMPLOYEE relation above does.
Important Definitions
A relation is in the 3rd Normal Form if it is in the 2nd Normal Form and has no transitive dependencies.

EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID
111     Mary    Jones   1
122     Sara    Smith   2

Normal Forms: Review
- Unnormalized: there are multivalued attributes or repeating groups
- 1NF: no multivalued attributes or repeating groups
- 2NF: 1NF plus no partial dependencies
- 3NF: 2NF plus no transitive dependencies
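The review above can be turned into a small classifier. This is only a sketch under the lecture's simplified definitions: it assumes a single declared primary key (it does not search for other candidate keys) and that the functional dependencies are supplied by someone who knows the problem domain. The function names `closure` and `normal_form` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined, directly or indirectly, by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def normal_form(attributes, primary_key, fds):
    """Classify a relation as 'unnormalized', '1NF', '2NF' or '3NF'."""
    attributes, key = set(attributes), set(primary_key)
    non_key = attributes - key
    # 1NF (lecture's test): every attribute is determined by the primary key.
    if not attributes <= closure(key, fds):
        return "unnormalized"
    # 2NF: no non-key attribute is determined by a proper part of the key.
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            if closure(part, fds) & non_key:
                return "1NF"
    # 3NF: no non-key attribute determines another non-key attribute.
    for lhs, rhs in fds:
        if set(lhs) <= non_key and (set(rhs) - set(lhs)) & non_key:
            return "2NF"
    return "3NF"


customer = normal_form(
    ["Cust_ID", "Name", "Order_ID"],
    ["Cust_ID", "Order_ID"],
    [(["Cust_ID"], ["Name"])],
)
employee = normal_form(
    ["Emp_ID", "F_Name", "L_Name", "Dept_ID", "Dept_Name"],
    ["Emp_ID"],
    [(["Emp_ID"], ["F_Name", "L_Name", "Dept_ID"]), (["Dept_ID"], ["Dept_Name"])],
)
employee_fixed = normal_form(
    ["Emp_ID", "F_Name", "L_Name", "Dept_ID"],
    ["Emp_ID"],
    [(["Emp_ID"], ["F_Name", "L_Name", "Dept_ID"])],
)
print(customer, employee, employee_fixed)  # 1NF 2NF 3NF
```

The checks run in the same order as the definitions, so the function stops at the first normal form the relation fails to reach.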
Example 1: Determine NF
Functional dependencies: ISBN → Title; ISBN → Publisher; Publisher → Address
Relation: BOOK (ISBN, Title, Publisher, Address)
All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.

Example 1: Determine NF
The relation is at least in 1NF. There is no COMPOSITE primary key, therefore there can't be partial dependencies. Therefore, the relation is at least in 2NF.
Example 1: Determine NF
Functional dependencies: ISBN → Title; ISBN → Publisher; Publisher → Address
Relation: BOOK (ISBN, Title, Publisher, Address)
Publisher is a non-key attribute, and it determines Address, another non-key attribute. Therefore, there is a transitive dependency, which means that the relation is NOT in 3NF.

Example 1: Determine NF
We know that the relation is at least in 2NF, and it is not in 3NF. Therefore, we conclude that the relation is in 2NF.
Example 1: Determine NF
Functional dependencies: ISBN → Title; ISBN → Publisher; Publisher → Address
Relation: BOOK (ISBN, Title, Publisher, Address)
In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) No partial dependencies, therefore at least 2NF
3) There is a transitive dependency (Publisher → Address), therefore not 3NF
Conclusion: The relation is in 2NF

Example 2: Determine NF
Functional dependencies: Product_ID → Description
Relation: ORDER (Order_No, Product_ID, Description)
All attributes are directly or indirectly determined by the primary key; therefore, the relation is at least in 1NF.
Example 2: Determine NF
Functional dependencies: Product_ID → Description
Relation: ORDER (Order_No, Product_ID, Description)
The relation is at least in 1NF. There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of the PK, determines Description; hence, there is a partial dependency. Therefore, the relation is not in 2NF. There is no point in checking for transitive dependencies.

Example 2: Determine NF
We know that the relation is at least in 1NF, and it is not in 2NF. Therefore, we conclude that the relation is in 1NF.
Example 2: Determine NF
Functional dependencies: Product_ID → Description
Relation: ORDER (Order_No, Product_ID, Description)
In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF
2) There is a partial dependency (Product_ID → Description), therefore not in 2NF
Conclusion: The relation is in 1NF

Example 3: Determine NF
Functional dependencies: Part_ID → Description; Part_ID → Price; Part_ID, Comp_ID → No
Relation: PART (Part_ID, Descr, Price, Comp_ID, No)
Comp_ID and No are not determined by the primary key; therefore, the relation is NOT in 1NF. There is no point in looking at partial or transitive dependencies.
Example 3: Determine NF
Functional dependencies: Part_ID → Description; Part_ID → Price; Part_ID, Comp_ID → No
Relation: PART (Part_ID, Descr, Price, Comp_ID, No)
In your solution you will write the following justification:
1) There are M/V attributes; therefore, not 1NF
Conclusion: The relation is unnormalized.

Bringing a Relation to 1NF

STUDENT
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00
Bringing a Relation to 1NF
Option 1: Make the determinant of the repeating group (or of the multivalued attribute) a part of the primary key.

STUDENT (composite primary key: Stud_ID, Course_ID)
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00

Bringing a Relation to 1NF
Option 2: Remove the entire repeating group from the relation. Create another relation that contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together form the primary key.
Bringing a Relation to 1NF

STUDENT
Stud_ID  Name
101      Lennon
125      Jonson

STUDENT_COURSE
Stud_ID  Course_ID  Units
101      MSI 250    3
101      MSI 415    3
125      MSI 331    3

Bringing a Relation to 2NF

STUDENT (composite primary key: Stud_ID, Course_ID)
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00
Bringing a Relation to 2NF
Goal: Remove partial dependencies

STUDENT (composite primary key: Stud_ID, Course_ID; partial dependencies: Stud_ID → Name, Course_ID → Units)
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00

Bringing a Relation to 2NF
- Remove from the original relation the attributes that depend on a part, but not the whole, of the primary key.
- For each partial dependency, create a new relation whose primary key is the corresponding part of the original primary key.
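The procedure above can be sketched in Python as a projection that drops duplicate rows. The helper name `project` is hypothetical, and the two partial dependencies (Stud_ID → Name, Course_ID → Units) are assumed from the example rather than discovered by the code.

```python
def project(rows, columns):
    """Project rows onto `columns`, dropping duplicates and keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(zip(columns, projected)))
    return result


# Original STUDENT relation with composite primary key (Stud_ID, Course_ID).
student_rows = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.00},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.00},
    {"Stud_ID": 125, "Name": "Jonson", "Course_ID": "MSI 331", "Units": 3.00},
]

# One new relation per partial dependency, keyed by the relevant part of the key.
student = project(student_rows, ["Stud_ID", "Name"])      # Stud_ID -> Name
course = project(student_rows, ["Course_ID", "Units"])    # Course_ID -> Units
# What remains of the original relation: only the composite key.
student_course = project(student_rows, ["Stud_ID", "Course_ID"])

print(len(student), len(course), len(student_course))  # 2 3 3
```

Dropping duplicates matters here: student 101 appears twice in the original rows but must appear once in a relation keyed by Stud_ID alone.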
Bringing a Relation to 2NF

Original relation: STUDENT
Stud_ID  Name    Course_ID  Units
101      Lennon  MSI 250    3.00
101      Lennon  MSI 415    3.00
125      Jonson  MSI 331    3.00

After decomposition:

STUDENT_COURSE
Stud_ID  Course_ID
101      MSI 250
101      MSI 415
125      MSI 331

STUDENT
Stud_ID  Name
101      Lennon
125      Jonson

COURSE
Course_ID  Units
MSI 250    3.00
MSI 415    3.00
MSI 331    3.00

Bringing a Relation to 3NF
Goal: Get rid of transitive dependencies.

EMPLOYEE (transitive dependency: Dept_ID → Dept_Name)
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sara    Smith   2        Mktg
Bringing a Relation to 3NF
- Remove from the original relation the attributes that depend on a non-key attribute.
- For each transitive dependency, create a new relation with the non-key attribute that is the determinant in the transitive dependency as the primary key, and the dependent non-key attribute as a dependent.

Bringing a Relation to 3NF

Original relation: EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sara    Smith   2        Mktg

After decomposition:

EMPLOYEE
Emp_ID  F_Name  L_Name  Dept_ID
111     Mary    Jones   1
122     Sara    Smith   2

DEPARTMENT
Dept_ID  Dept_Name
1        Acct
2        Mktg
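A minimal Python sketch of this 3NF step, using the EMPLOYEE rows from the slides. The function names are hypothetical, and the transitive dependency Dept_ID → Dept_Name is supplied by hand. The final lines rejoin the two relations on Dept_ID to show that no information was lost.

```python
def unique_rows(rows, columns):
    """Project rows onto `columns`, dropping duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(zip(columns, projected)))
    return result


def remove_transitive_dependency(rows, determinant, dependents):
    """Split rows into (reduced original, new relation keyed by `determinant`)."""
    kept = [c for c in rows[0] if c not in dependents]
    return unique_rows(rows, kept), unique_rows(rows, determinant + dependents)


employee_rows = [
    {"Emp_ID": 111, "F_Name": "Mary", "L_Name": "Jones", "Dept_ID": 1, "Dept_Name": "Acct"},
    {"Emp_ID": 122, "F_Name": "Sara", "L_Name": "Smith", "Dept_ID": 2, "Dept_Name": "Mktg"},
]

employee, department = remove_transitive_dependency(
    employee_rows, ["Dept_ID"], ["Dept_Name"]
)

# Rejoin on the foreign key Dept_ID to confirm the original rows are recoverable.
dept_by_id = {d["Dept_ID"]: d for d in department}
rejoined = [{**e, **dept_by_id[e["Dept_ID"]]} for e in employee]
print(rejoined == employee_rows)  # True
```

Dept_ID stays in EMPLOYEE as a foreign key, which is what makes the rejoin possible.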
Other Normal Forms (from Appendix B)
- Boyce-Codd NF (BCNF)
  - All determinants are candidate keys: there is no determinant that is not a unique identifier
  - Usually, if a relation is in 3NF it is also in BCNF, except when a part of the primary key is determined by a non-key attribute
- 4th NF and 5th NF: used primarily for theoretical purposes

Merging Relations
- View Integration: combining entities from multiple ER models into common relations
- Issues to watch out for when merging entities from different ER models:
  - Synonyms: two or more attributes with different names but the same meaning
  - Homonyms: attributes with the same name but different meanings
  - Transitive dependencies: even if relations are in 3NF prior to merging, they may not be after merging
  - Supertype/subtype relationships: may be hidden prior to merging
Enterprise Keys (advice from some experts)
- Primary keys that are unique in the whole database, not just within a single relation
- Corresponds with the concept of an object ID in object-oriented systems
In-class exercise
See handout