Normalization
Database Normalization
Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity (accuracy and consistency), and scalability (the ability to accommodate change). In the relational model, there are formal criteria for classifying how well a database design avoids redundancy. These classifications are called normal forms (NF), and there are algorithms for converting a given database from one normal form to another. Normalization generally involves splitting existing tables into several smaller ones, which must then be joined again whenever a query needs data from more than one of them.
History
Edgar F. Codd first proposed the process of normalization, and what came to be known as the first normal form, in his 1970 paper "A Relational Model of Data for Large Shared Data Banks". Codd stated: "There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by domains whose elements are atomic (nondecomposable) values."
Normal Forms
Edgar F. Codd originally defined three normal forms: 1NF, 2NF and 3NF. Others are now generally accepted, but 3NF is widely considered sufficient for most applications. Most tables that reach 3NF are also in BCNF (Boyce-Codd Normal Form).
Update Anomalies
Relations that contain redundant data may suffer from problems called update anomalies, which are classified as:
1. Insertion anomalies
2. Deletion anomalies
3. Modification anomalies
Unnormalized Table
Insertion anomalies
To insert details of a new branch that currently has no members of staff into the StaffBranch table, it is necessary to enter nulls into the staff-related columns, such as staffno. However, because staffno is the primary key of the StaffBranch table, entering a null for staffno violates entity integrity and is not allowed.
Deletion anomalies
If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. For example, if we delete the record for staff member Art Peters (S0415) from the StaffBranch table, the details relating to branch B003 are lost from the database.
Modification anomalies
If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number of branch B001, we must update the records of all staff located at that branch. If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent.
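The three anomalies can be reproduced with a small in-memory table. This is a minimal sketch using Python's standard sqlite3 module. Apart from staffno, branches B001 and B003 and staff member S0415 (Art Peters), every column name and value below (sname, branchno, btelno, the other staff rows, branch B007, all telephone numbers) is a hypothetical placeholder, since the StaffBranch table itself is not reproduced here.

```python
import sqlite3


def build_staff_branch():
    """Create an in-memory StaffBranch table holding hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    # NOT NULL is spelled out: SQLite otherwise accepts NULL in a
    # non-INTEGER PRIMARY KEY column.
    conn.execute(
        """CREATE TABLE StaffBranch (
               staffno  TEXT NOT NULL PRIMARY KEY,
               sname    TEXT,
               branchno TEXT,
               btelno   TEXT)"""
    )
    conn.executemany(
        "INSERT INTO StaffBranch VALUES (?, ?, ?, ?)",
        [
            ("S0001", "Staff One", "B001", "0100-000001"),   # hypothetical row
            ("S0002", "Staff Two", "B001", "0100-000001"),   # hypothetical row
            ("S0415", "Art Peters", "B003", "0100-000003"),  # phone is hypothetical
        ],
    )
    return conn


def insertion_anomaly(conn):
    """Try to record branch B007 (hypothetical) before it has any staff."""
    try:
        conn.execute(
            "INSERT INTO StaffBranch VALUES (NULL, NULL, 'B007', '0100-000007')"
        )
    except sqlite3.IntegrityError:
        return True  # rejected: the key column staffno cannot be NULL
    return False


def deletion_anomaly(conn):
    """Delete Art Peters (S0415); report whether branch B003 is still recorded."""
    conn.execute("DELETE FROM StaffBranch WHERE staffno = 'S0415'")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM StaffBranch WHERE branchno = 'B003'"
    ).fetchone()
    return count > 0


def modification_anomaly(conn):
    """Change B001's phone number in one row only; count the numbers B001 now has."""
    conn.execute(
        "UPDATE StaffBranch SET btelno = '0100-999999' WHERE staffno = 'S0001'"
    )
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT btelno) FROM StaffBranch WHERE branchno = 'B001'"
    ).fetchone()
    return count


conn = build_staff_branch()
print(insertion_anomaly(conn))     # True: the new branch cannot be stored
print(deletion_anomaly(conn))      # False: branch B003 has disappeared
print(modification_anomaly(conn))  # 2: B001 now has two phone numbers
```

Each function corresponds to one of the anomalies above; the normalized design removes all three by storing branch details once, in a table of their own.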
The Process of Normalization
Normalization is often carried out as a series of steps. Each step corresponds to a specific normal form with known properties. As normalization proceeds, the relations become progressively more restricted in format and less vulnerable to update anomalies. For the relational data model, it is important to recognize that only first normal form (1NF) is critical in creating relations; all subsequent normal forms are optional.
First Normal Form (1NF)
Unnormalized form (UNF): a table that contains one or more repeating groups.

ClientNo | cname         | propertyno | paddress               | rentstart | rentfinish | rent | ownerno | oname
CR76     | John Kay      | PG4        | 6 Lawrence St, Glasgow | 1-Jul-00  | 31-Aug-01  | 350  | CO40    | Tina Murphy
         |               | PG16       | 5 Novar Dr, Glasgow    | 1-Sep-02  | 1-Sep-02   | 450  | CO93    | Tony Shaw
CR56     | Aline Stewart | PG4        | 6 Lawrence St, Glasgow | 1-Sep-99  | 10-Jun-00  | 350  | CO40    | Tina Murphy
         |               | PG36       | 2 Manor Rd, Glasgow    | 10-Oct-00 | 1-Dec-01   | 370  | CO93    | Tony Shaw
         |               | PG16       | 5 Novar Dr, Glasgow    | 1-Nov-02  | 1-Aug-03   | 450  | CO93    | Tony Shaw

Figure 3: ClientRental unnormalized table
Definition of 1NF
First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value. There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is then identified for the new relation.
First Normal Form
Steps to Remove Repeating Groups
1. Remove the repeating columns from the original unnormalized table.
2. Create a new table with the primary key of the base table and the repeating columns.
3. Add another appropriate column to the primary key of the new table to ensure uniqueness.
4. Create a foreign key in the new table to link back to the original unnormalized table.
1NF ClientRental relation with the first approach
The ClientRental relation is defined as follows, with primary key (clientno, propertyno):
ClientRental (clientno, propertyno, cname, paddress, rentstart, rentfinish, rent, ownerno, oname)

ClientNo | propertyno | cname         | paddress               | rentstart | rentfinish | rent | ownerno | oname
CR76     | PG4        | John Kay      | 6 Lawrence St, Glasgow | 1-Jul-00  | 31-Aug-01  | 350  | CO40    | Tina Murphy
CR76     | PG16       | John Kay      | 5 Novar Dr, Glasgow    | 1-Sep-02  | 1-Sep-02   | 450  | CO93    | Tony Shaw
CR56     | PG4        | Aline Stewart | 6 Lawrence St, Glasgow | 1-Sep-99  | 10-Jun-00  | 350  | CO40    | Tina Murphy
CR56     | PG36       | Aline Stewart | 2 Manor Rd, Glasgow    | 10-Oct-00 | 1-Dec-01   | 370  | CO93    | Tony Shaw
CR56     | PG16       | Aline Stewart | 5 Novar Dr, Glasgow    | 1-Nov-02  | 1-Aug-03   | 450  | CO93    | Tony Shaw

Figure 4: 1NF ClientRental relation with the first approach
1NF ClientRental relation with the second approach
Client (clientno, cname)
PropertyRentalOwner (clientno, propertyno, paddress, rentstart, rentfinish, rent, ownerno, oname)

Client
ClientNo | cname
CR76     | John Kay
CR56     | Aline Stewart

PropertyRentalOwner
ClientNo | propertyno | paddress               | rentstart | rentfinish | rent | ownerno | oname
CR76     | PG4        | 6 Lawrence St, Glasgow | 1-Jul-00  | 31-Aug-01  | 350  | CO40    | Tina Murphy
CR76     | PG16       | 5 Novar Dr, Glasgow    | 1-Sep-02  | 1-Sep-02   | 450  | CO93    | Tony Shaw
CR56     | PG4        | 6 Lawrence St, Glasgow | 1-Sep-99  | 10-Jun-00  | 350  | CO40    | Tina Murphy
CR56     | PG36       | 2 Manor Rd, Glasgow    | 10-Oct-00 | 1-Dec-01   | 370  | CO93    | Tony Shaw
CR56     | PG16       | 5 Novar Dr, Glasgow    | 1-Nov-02  | 1-Aug-03   | 450  | CO93    | Tony Shaw

Figure 5: 1NF ClientRental relation with the second approach
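Both approaches can be sketched in Python, assuming the unnormalized data of Figure 3 is held in a hypothetical nested structure where each client carries its rentals as a repeating group. `first_approach` builds the rows of Figure 4, `second_approach` builds the two relations of Figure 5, and the helper `is_unique` (also hypothetical) mirrors the step of adding a column to the new table's key until it identifies every row.

```python
# Figure 3 as a hypothetical nested structure: each client carries its
# rentals as a repeating group of
# (propertyno, paddress, rentstart, rentfinish, rent, ownerno, oname).
UNF = [
    ("CR76", "John Kay", [
        ("PG4", "6 Lawrence St, Glasgow", "1-Jul-00", "31-Aug-01", 350, "CO40", "Tina Murphy"),
        ("PG16", "5 Novar Dr, Glasgow", "1-Sep-02", "1-Sep-02", 450, "CO93", "Tony Shaw"),
    ]),
    ("CR56", "Aline Stewart", [
        ("PG4", "6 Lawrence St, Glasgow", "1-Sep-99", "10-Jun-00", 350, "CO40", "Tina Murphy"),
        ("PG36", "2 Manor Rd, Glasgow", "10-Oct-00", "1-Dec-01", 370, "CO93", "Tony Shaw"),
        ("PG16", "5 Novar Dr, Glasgow", "1-Nov-02", "1-Aug-03", 450, "CO93", "Tony Shaw"),
    ]),
]


def first_approach(unf):
    """One ClientRental relation: client columns are copied onto every rental row."""
    return [
        (clientno, rental[0], cname) + rental[1:]
        for clientno, cname, rentals in unf
        for rental in rentals
    ]


def second_approach(unf):
    """Client relation, plus PropertyRentalOwner carrying a copy of clientno."""
    client = [(clientno, cname) for clientno, cname, _ in unf]
    property_rental_owner = [
        (clientno,) + rental
        for clientno, _, rentals in unf
        for rental in rentals
    ]
    return client, property_rental_owner


def is_unique(rows, key_positions):
    """True if the columns at key_positions identify every row uniquely."""
    keys = [tuple(row[i] for i in key_positions) for row in rows]
    return len(keys) == len(set(keys))


client_rental = first_approach(UNF)
client, property_rental_owner = second_approach(UNF)
print(len(client_rental), len(client), len(property_rental_owner))  # 5 2 5
print(is_unique(property_rental_owner, (0,)))    # False: clientno repeats
print(is_unique(property_rental_owner, (0, 1)))  # True: (clientno, propertyno)
```

In PropertyRentalOwner the copied clientno alone repeats, so propertyno is added to form the key (clientno, propertyno).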
Second Normal Form
A table is in second normal form (2NF) if it is in first normal form and every non-primary-key column is functionally dependent on the entire primary key; no non-primary-key column may be functionally dependent on only part of the primary key. Column B is functionally dependent on column A (written A → B) if each value of A is associated with exactly one value of B. If A and B are columns (or sets of columns), B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
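Full functional dependency can be tested against sample rows with a short Python sketch. The functions `holds` and `fully_dependent` are hypothetical names, and the rows are a few columns taken from Figure 4. Note that sample data can only refute a dependency: whether A → B really holds is a matter of the business rules, not of any particular set of rows.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if the sample rows satisfy lhs -> rhs (lhs, rhs are tuples of columns)."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """lhs -> rhs holds, and no proper subset of lhs (not even the empty set) determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    return not any(
        holds(rows, subset, rhs)
        for size in range(len(lhs))
        for subset in combinations(lhs, size)
    )


# A few columns of the 1NF ClientRental relation (Figure 4).
COLS = ("clientno", "propertyno", "cname", "rentstart", "rent")
ROWS = [dict(zip(COLS, r)) for r in [
    ("CR76", "PG4", "John Kay", "1-Jul-00", 350),
    ("CR76", "PG16", "John Kay", "1-Sep-02", 450),
    ("CR56", "PG4", "Aline Stewart", "1-Sep-99", 350),
    ("CR56", "PG36", "Aline Stewart", "10-Oct-00", 370),
    ("CR56", "PG16", "Aline Stewart", "1-Nov-02", 450),
]]
KEY = ("clientno", "propertyno")

print(fully_dependent(ROWS, KEY, ("rentstart",)))  # True
print(fully_dependent(ROWS, KEY, ("cname",)))      # False: clientno -> cname
print(fully_dependent(ROWS, KEY, ("rent",)))       # False: propertyno -> rent
```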
Second Normal Form
A table in first normal form is also in second normal form if any one of the following applies:
1. The primary key is composed of only one column.
2. No non-keyed columns exist in the table.
3. Every non-keyed column is dependent on all of the columns contained in the primary key.
Second Normal Form
Steps to Remove Partial Dependencies
1. Determine which non-key columns are not dependent on the table's entire primary key.
2. Remove those columns from the base table.
3. Create a second table with those non-key columns and a copy of the primary-key columns they depend on.
4. Create a foreign key from the original base table to the new table, linking to the new primary key.
2NF ClientRental relation
The ClientRental relation, whose primary key is (clientno, propertyno), has the following partial dependencies:
clientno → cname (partial dependency)
propertyno → paddress, rent, ownerno, oname (partial dependency)
2NF ClientRental relation
Client (clientno, cname)
Rental (clientno, propertyno, rentstart, rentfinish)
PropertyOwner (propertyno, paddress, rent, ownerno, oname)

Client
ClientNo | cname
CR76     | John Kay
CR56     | Aline Stewart

Rental
ClientNo | propertyno | rentstart | rentfinish
CR76     | PG4        | 1-Jul-00  | 31-Aug-01
CR76     | PG16       | 1-Sep-02  | 1-Sep-02
CR56     | PG4        | 1-Sep-99  | 10-Jun-00
CR56     | PG36       | 10-Oct-00 | 1-Dec-01
CR56     | PG16       | 1-Nov-02  | 1-Aug-03

PropertyOwner
propertyno | paddress               | rent | ownerno | oname
PG4        | 6 Lawrence St, Glasgow | 350  | CO40    | Tina Murphy
PG16       | 5 Novar Dr, Glasgow    | 450  | CO93    | Tony Shaw
PG36       | 2 Manor Rd, Glasgow    | 370  | CO93    | Tony Shaw

Figure 6: 2NF ClientRental relation
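The 2NF steps can be sketched in Python on the rows of Figure 4: find the non-key columns determined by part of the key, project each group out together with that part of the key, and check that natural-joining the pieces gives back the original relation (a lossless decomposition). All function names are hypothetical. Dependencies are inferred from the five sample rows, which happen to exhibit exactly the partial dependencies listed above; a real design would take them from the business rules instead.

```python
from itertools import combinations

COLS = ("clientno", "propertyno", "cname", "paddress",
        "rentstart", "rentfinish", "rent", "ownerno", "oname")
ROWS = [dict(zip(COLS, r)) for r in [
    ("CR76", "PG4", "John Kay", "6 Lawrence St, Glasgow", "1-Jul-00", "31-Aug-01", 350, "CO40", "Tina Murphy"),
    ("CR76", "PG16", "John Kay", "5 Novar Dr, Glasgow", "1-Sep-02", "1-Sep-02", 450, "CO93", "Tony Shaw"),
    ("CR56", "PG4", "Aline Stewart", "6 Lawrence St, Glasgow", "1-Sep-99", "10-Jun-00", 350, "CO40", "Tina Murphy"),
    ("CR56", "PG36", "Aline Stewart", "2 Manor Rd, Glasgow", "10-Oct-00", "1-Dec-01", 370, "CO93", "Tony Shaw"),
    ("CR56", "PG16", "Aline Stewart", "5 Novar Dr, Glasgow", "1-Nov-02", "1-Aug-03", 450, "CO93", "Tony Shaw"),
]]
KEY = ("clientno", "propertyno")


def holds(rows, lhs, col):
    """True if the sample rows satisfy the dependency lhs -> col."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        if seen.setdefault(left, row[col]) != row[col]:
            return False
    return True


def partial_dependencies(rows, key):
    """Step 1: map each non-key column to the smallest proper part of key that determines it."""
    deps = {}
    for col in rows[0]:
        if col in key:
            continue
        for size in range(1, len(key)):
            part = next((p for p in combinations(key, size) if holds(rows, p, col)), None)
            if part is not None:
                deps[col] = part
                break
    return deps


def project(rows, cols):
    """Relational projection onto cols, dropping duplicate rows but keeping order."""
    out = []
    for row in rows:
        t = {c: row[c] for c in cols}
        if t not in out:
            out.append(t)
    return out


def decompose_2nf(rows, key):
    """Steps 2-4: move partially dependent columns, with their key part, to new relations."""
    groups = {}
    for col, part in partial_dependencies(rows, key).items():
        groups.setdefault(part, []).append(col)
    moved = {c for cols in groups.values() for c in cols}
    base = project(rows, [c for c in rows[0] if c not in moved])
    return base, {part: project(rows, list(part) + cols) for part, cols in groups.items()}


def natural_join(left, right):
    """Join two relations on every column they share."""
    shared = [c for c in left[0] if c in right[0]]
    return [{**a, **b} for a in left for b in right
            if all(a[c] == b[c] for c in shared)]


def as_set(rel):
    """Rows as a set of tuples in COLS order, for order-insensitive comparison."""
    return {tuple(row[c] for c in COLS) for row in rel}


rental, others = decompose_2nf(ROWS, KEY)
client, property_owner = others[("clientno",)], others[("propertyno",)]
print(len(rental), len(client), len(property_owner))  # 5 2 3

rejoined = natural_join(natural_join(rental, client), property_owner)
print(as_set(rejoined) == as_set(ROWS))  # True: the decomposition is lossless
```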
Third Normal Form
A table is in third normal form (3NF) if every non-keyed column is directly dependent on the primary key and not dependent on another non-keyed column. A column C is transitively dependent on the primary key A if A → B and B → C for some non-keyed column B that does not itself determine A. If the table is in second normal form and all transitive dependencies are removed, then every non-keyed column is said to be dependent upon the key, the whole key, and nothing but the key.
Third Normal Form
Steps to Remove Transitive Dependencies
1. Determine which columns are dependent on another non-keyed column.
2. Remove those columns from the base table.
3. Create a second table with those columns and the non-keyed columns they depend on; the latter form the new table's primary key.
4. Create a foreign key in the original table linking to the primary key of the new table.
3NF ClientRental relation
The PropertyOwner relation has the following transitive dependency (propertyno → ownerno → oname):
ownerno → oname (transitive dependency)
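The transitive dependency can be checked and removed with a similar Python sketch over the PropertyOwner rows of Figure 6. `is_transitive` tests the definition (key → B, B → C, and B does not determine the key), and `split_transitive` follows the steps for removing it; both names are hypothetical, and a three-row sample can only suggest dependencies, not prove them.

```python
# PropertyOwner in 2NF (Figure 6); its primary key is propertyno.
COLS = ("propertyno", "paddress", "rent", "ownerno", "oname")
PROPERTY_OWNER = [dict(zip(COLS, r)) for r in [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]]


def holds(rows, lhs, rhs):
    """True if the sample rows satisfy the dependency lhs -> rhs (single columns)."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def is_transitive(rows, key, via, dep):
    """key -> via and via -> dep hold, while via does not determine key."""
    return (holds(rows, key, via) and holds(rows, via, dep)
            and not holds(rows, via, key))


def split_transitive(rows, via, dep):
    """Move dep into a new relation keyed by via; via stays behind as a foreign key."""
    base = [{c: v for c, v in row.items() if c != dep} for row in rows]
    new = []
    for row in rows:
        pair = {via: row[via], dep: row[dep]}
        if pair not in new:  # projection removes duplicate rows
            new.append(pair)
    return base, new


print(is_transitive(PROPERTY_OWNER, "propertyno", "ownerno", "oname"))  # True
print(is_transitive(PROPERTY_OWNER, "propertyno", "ownerno", "rent"))   # False
property_owner, owner = split_transitive(PROPERTY_OWNER, "ownerno", "oname")
print(owner)
```

The second check is False because rent varies between properties owned by CO93, so rent depends on the property rather than on the owner.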
3NF ClientRental relation
The resulting 3NF relations have the forms:
Client (clientno, cname)
Rental (clientno, propertyno, rentstart, rentfinish)
PropertyOwner (propertyno, paddress, rent, ownerno)
Owner (ownerno, oname)
3NF ClientRental relation

Client
ClientNo | cname
CR76     | John Kay
CR56     | Aline Stewart

Rental
ClientNo | propertyno | rentstart | rentfinish
CR76     | PG4        | 1-Jul-00  | 31-Aug-01
CR76     | PG16       | 1-Sep-02  | 1-Sep-02
CR56     | PG4        | 1-Sep-99  | 10-Jun-00
CR56     | PG36       | 10-Oct-00 | 1-Dec-01
CR56     | PG16       | 1-Nov-02  | 1-Aug-03

PropertyOwner
propertyno | paddress               | rent | ownerno
PG4        | 6 Lawrence St, Glasgow | 350  | CO40
PG16       | 5 Novar Dr, Glasgow    | 450  | CO93
PG36       | 2 Manor Rd, Glasgow    | 370  | CO93

Owner
ownerno | oname
CO40    | Tina Murphy
CO93    | Tony Shaw

Figure 7: 3NF ClientRental relation
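As an illustration, the 3NF relations can be declared in SQLite with the foreign keys described in the normalization steps and then re-joined to recover the ClientRental rows. This is a sketch using Python's standard sqlite3 module; the column types and the client number CR99, used only to trigger a foreign-key error, are assumptions. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Client (clientno TEXT NOT NULL PRIMARY KEY, cname TEXT);
CREATE TABLE Owner (ownerno TEXT NOT NULL PRIMARY KEY, oname TEXT);
CREATE TABLE PropertyOwner (
    propertyno TEXT NOT NULL PRIMARY KEY,
    paddress   TEXT,
    rent       INTEGER,
    ownerno    TEXT REFERENCES Owner (ownerno));
CREATE TABLE Rental (
    clientno   TEXT NOT NULL REFERENCES Client (clientno),
    propertyno TEXT NOT NULL REFERENCES PropertyOwner (propertyno),
    rentstart  TEXT,
    rentfinish TEXT,
    PRIMARY KEY (clientno, propertyno));
"""

# Re-join the four 3NF relations to recover the 1NF ClientRental rows.
CLIENT_RENTAL = """
SELECT r.clientno, r.propertyno, c.cname, p.paddress, r.rentstart,
       r.rentfinish, p.rent, p.ownerno, o.oname
FROM Rental AS r
JOIN Client AS c        ON c.clientno = r.clientno
JOIN PropertyOwner AS p ON p.propertyno = r.propertyno
JOIN Owner AS o         ON o.ownerno = p.ownerno
ORDER BY r.clientno, r.propertyno
"""


def build():
    """Create the 3NF schema in memory and load the rows of Figure 7."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Client VALUES (?, ?)",
                     [("CR76", "John Kay"), ("CR56", "Aline Stewart")])
    conn.executemany("INSERT INTO Owner VALUES (?, ?)",
                     [("CO40", "Tina Murphy"), ("CO93", "Tony Shaw")])
    conn.executemany("INSERT INTO PropertyOwner VALUES (?, ?, ?, ?)", [
        ("PG4", "6 Lawrence St, Glasgow", 350, "CO40"),
        ("PG16", "5 Novar Dr, Glasgow", 450, "CO93"),
        ("PG36", "2 Manor Rd, Glasgow", 370, "CO93"),
    ])
    conn.executemany("INSERT INTO Rental VALUES (?, ?, ?, ?)", [
        ("CR76", "PG4", "1-Jul-00", "31-Aug-01"),
        ("CR76", "PG16", "1-Sep-02", "1-Sep-02"),
        ("CR56", "PG4", "1-Sep-99", "10-Jun-00"),
        ("CR56", "PG36", "10-Oct-00", "1-Dec-01"),
        ("CR56", "PG16", "1-Nov-02", "1-Aug-03"),
    ])
    return conn


conn = build()
for row in conn.execute(CLIENT_RENTAL):
    print(row)

try:
    # CR99 is a hypothetical client number that does not exist in Client.
    conn.execute("INSERT INTO Rental VALUES ('CR99', 'PG4', NULL, NULL)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Each owner's name is now stored once in Owner, so changing it is a single-row update, while the join still yields the five rows of the 1NF relation.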