Normalization. CIS 3730 Designing and Managing Data. J.G. Zheng Fall 2010



Similar documents
Introduction to normalization. Introduction to normalization

Chapter 5: Logical Database Design and the Relational Model Part 2: Normalization. Introduction to Normalization. Normal Forms.

Fundamentals of Database System

Relational Database Basics Review

Normalization. Normalization. First goal: to eliminate redundant data. for example, don t storing the same data in more than one table

DATABASE SYSTEMS. Chapter 7 Normalisation

Database Design. Marta Jakubowska-Sobczak IT/ADC based on slides prepared by Paula Figueiredo, IT/DB

Normalization in Database Design

DATABASE NORMALIZATION

Functional Dependency and Normalization for Relational Databases

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Normalization of Database

Chapter 9: Normalization

DBMS. Normalization. Module Title?

Database Normalization. Mohua Sarkar, Ph.D Software Engineer California Pacific Medical Center

If it's in the 2nd NF and there are no non-key fields that depend on attributes in the table other than the Primary Key.

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

A. TRUE-FALSE: GROUP 2 PRACTICE EXAMPLES FOR THE REVIEW QUIZ:

Normalisation 6 TABLE OF CONTENTS LEARNING OUTCOMES

Chapter 6. Database Tables & Normalization. The Need for Normalization. Database Tables & Normalization

Normalization. Functional Dependence. Normalization. Normalization. GIS Applications. Spring 2011

DATABASE DESIGN: Normalization Exercises & Answers


CST221, Dr. Zhen Jiang Normalization & design (see Appendix pages 42-55)

MCQs~Databases~Relational Model and Normalization

The process of database development. Logical model: relational DBMS. Relation

Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases

Part 6. Normalization

Normalization. Reduces the liklihood of anomolies

COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

Normalization. CIS 331: Introduction to Database Systems

Database Design. Adrienne Watt. Port Moody

Normalization. Normalization. Normalization. Data Redundancy

Module 5: Normalization of database tables

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Optimum Database Design: Using Normal Forms and Ensuring Data Integrity. by Patrick Crever, Relational Database Programmer, Synergex

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3)

Normalization in OODB Design

Topic 5.1: Database Tables and Normalization

Lecture 2 Normalization

RELATIONAL DATABASE DESIGN

LOGICAL DATABASE DESIGN

Normalisation 1. Chapter 4.1 V4.0. Napier University

DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF)

Teaching Database Modeling and Design: Areas of Confusion and Helpful Hints

IT2305 Database Systems I (Compulsory)

Unit 3.1. Normalisation 1 - V Normalisation 1. Dr Gordon Russell, Napier University

Fundamentals of Database Design

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

Introduction to Computing. Lectured by: Dr. Pham Tran Vu

1. A table design is shown which violates 1NF. Give a design which will store the same information but which is in 1NF.

Normalization. Purpose of normalization Data redundancy Update anomalies Functional dependency Process of normalization

IT2304: Database Systems 1 (DBS 1)

CPS352 Database Systems: Design Project

Database Design and the Reality of Normalisation

SQL Server. 1. What is RDBMS?

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Database Management System

Designing Databases. Introduction

Database Design and Normalization

Database Design and Normal Forms

BCA. Database Management System

DATABASE MANAGEMENT SYSTEMS. Question Bank:

How To Write A Diagram

Tutorial on Relational Database Design

- Eliminating redundant data - Ensuring data dependencies makes sense. ie:- data is stored logically

SQL, PL/SQL FALL Semester 2013

Normalization NORMALIZATION. Primary keys (contd.) Primary keys

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

RELATIONAL DATABASE DESIGN. Basic Concepts

The Relational Data Model and Relational Database Constraints

The Relational Database Model

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

MODULE 8 LOGICAL DATABASE DESIGN. Contents. 2. LEARNING UNIT 1 Entity-relationship(E-R) modelling of data elements of an application.

The 3 Normal Forms: Copyright Fred Coulson 2007 (last revised February 1, 2009)

Data Dictionary and Normalization

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen

RELATIONAL DATABASE DESIGN Good Database Design Principles

SAMPLE FINAL EXAMINATION SPRING SESSION 2015

VBA and Databases (see Chapter 14 )

Chapter 10. Functional Dependencies and Normalization for Relational Databases

7. Databases and Database Management Systems

Databases and BigData

Week 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy

Physical Database Design and Tuning

3. Relational Model and Relational Algebra

Database Query 1: SQL Basics

1. Physical Database Design in Relational Databases (1)

SQL DATA DEFINITION: KEY CONSTRAINTS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 7

DATABASE INTRODUCTION

C# Cname Ccity.. P1# Date1 Qnt1 P2# Date2 P9# Date9 1 Codd London Martin Paris Deen London

Database Normalization as a By-product of Minimum Message Length Inference

1.204 Lecture 2. Keys

Higher National Unit specification: general information. Relational Database Management Systems

Database Design Basics

Transcription:

Normalization CIS 3730 Designing and Managing Data J.G. Zheng Fall 2010 1

Overview What is normalization? What are the normal forms? How to normalize relations? 2

Two Basic Ways To Design Tables Bottom-up: Normalization: designing tables by splitting a big relation into multiple related tables to avoid anomalies Top-Down Three-level data modeling approach: conceptual, logical, and physical design. 3

Table Structure Representation Text style [Table Name]([Primary Key], attribute, [Foreign Key], attribute, ) FK: [Foreign Key] [Reference Table].[Primary Key] FK: (if more than one foreign key) Example Primary Key(s): underscored Department (DeptID, DeptName, Location) Employee (EmpID, EmpName, Department) FK: Department Department.DeptID Foreign Key: italicized Foreign Key: definition 4

Introduction How many, and what, relations (tables) should be used to store my data? Is this relation free of problems? Stud_ID Stud_Name Course_ID Course_Name Instructor Office Room Credit 224 Waters CIS20 Intro CIS Greene CBA001 205G 5 224 Waters CIS40 Database Mgt Hong CBA908 311S 5 224 Waters CIS50 Sys.Analysis Purao CBA700 139S 5 351 Byron CIS30 COBOL Hong CBA908 629G 3 351 Byron CIS50 Sys.Analysis Purao CBA700 139S 5 421 Smith CIS20 Intro CIS Greene CBA001 205G 5 421 Smith CIS30 COBOL Hong CBA908 629G 3 421 Smith CIS50 Sys.Analysis Purao CBA700 139S 5 5

Normalization Normalization is a process of producing a set of related relations (tables) with desirable attributes, given the data requirements of a domain The goal is to remove redundancy and data modification problems: insertion anomaly, update anomaly, and deletion anomaly Usually dividing a table into 2 or more tables Using Normal Forms as a formal guide 6

Anomaly Example 7

Normalized Tables 8

Normal Forms Normal forms are formal guidelines (steps) for the normalization process DK/NF 5th Normal Form 4th Normal Form (4NF) Boyce Codd NF (BC/NF) 3rd Normal Form (3NF) 2nd Normal Form (2NF) 1st Normal Form (1NF) Unnormalized Form (UNF) 9

Normalization 1NF A table is in 1NF if 1. it satisfies the definition of a relation Review: what are the features of a relation? 2. no repeating groups (columns) 10

Repeating Groups Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 555-776-4100 789 Maria Fernandez 555-808-9633 Customer ID First Name Surname Tel. No. 1 Tel. No. 2 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 555-776-4100 789 Maria Fernandez 555-808-9633 Tel. No. 3 Lots of Null values 11

Avoid Repeating Groups Transforming to additional rows, rather than additional columns Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 456 Jane Wright 555-776-4100 789 Maria Fernandez 555-808-9633 12

Transforming to 1NF: Example Another example UNF 1NF 13

Problems in 1NF Basically it may have the same problem as spreadsheet tables Redundancy, and anomalies What s the problem in this table? Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 456 Jane Wright 555-776-4100 789 Maria Fernandez 555-808-9633 14

Higher Normal Forms Normal forms higher than 1NF deal with functional dependency Identifying the normal form level by analyzing the functional dependency between attributes (fields) 15

Functional Dependency If each value of attribute A is associated with only one value of attribute B, we say A determines B Or, B is dependent on A Denoted as: A B Functional dependence describes relationships between attributes (not relations) Composite determinant: A (and B) can be a set of fields If A consists of column a and b, and a and b together determines c, then: (a, b) c 16

Functional Dependency Examples Dependency diagram: A solid line indicates determinacy Dependency example For each Customer ID, there is only one corresponding first name (or last name), so: Customer ID determines First Name, or Customer ID First Name Composite determinant: (First Name, Last Name) Customer ID Non-dependency example An Customer ID can have multiple phone numbers, so: Customer ID does not determine Phone Number Dependency diagram: an example of composite determinant Customer ID First Name Last Name Phone Number 456 Jane Wright 555-403-1659 456 Jane Wright 555-776-4100 789 Jane Fernandez 555-808-9633 Dependency diagram: a broken line indicates non-determinacy 17

Functional Dependency and Keys By definition, a unique key functionally determines all other attributes Primary key Candidate key Surrogate key Composite primary key Example ISBN BookTitle PubDate ListPrice 18

Normalization 2NF A relation is in 2NF, if It is in 1NF, and All non-key attributes (attributes that are not part of any primary key or candidate key) must be functionally dependent on the whole primary (candidate) key Or, NO partial dependency Partial dependency A non-key attribute is dependent on part of a composite primary key Implication A relation with only single-attribute primary key and candidate key does not have partial dependency problem; therefore, such a relation is in 2NF. (A, B) is a composite PK and a composite determinant A B C D Partial dependency: B is also a determinant and is part of the PK (A, B) 19

A Relation in 1NF but Not in 2NF Composite primary key determines other columns Course ID Section Title Classroom Time CIS 2010 1 Intro to CIS ALC 201 TTH 3:00-4:15PM CIS 2010 2 Intro to CIS ALC 310 TTH 9:30-10:45AM CIS 2010 3 Intro to CIS Online W 7:00PM-9:40PM CIS 3730 1 Database CS 200 W 7:00PM-9:40PM Partial dependency 20

Transforming to 2NF Steps Identify the primary key (PK). If PK consists of only one field, then it is in 2NF. If PK is a composite key, then look for partial dependency. If there is partial dependency, move the partial dependency involved attributes to another relation. A B C D A B C B D 21

Transforming to 2NF: Example Course ID Section Title Classroom Time CIS 2010 1 Intro to CIS ALC 201 TTH 3:00-4:15PM CIS 2010 2 Intro to CIS ALC 310 TTH 9:30-10:45AM CIS 2010 3 Intro to CIS Online W 7:00PM-9:40PM CIS 3730 1 Database CS 200 W 7:00PM-9:40PM Redundancy is avoided Course ID CIS 2010 CIS 3730 Title Intro to CIS Database Course ID Section Classroom Time CIS 2010 1 ALC 201 TTH 3:00-4:15PM CIS 2010 2 ALC 310 TTH 9:30-10:45AM CIS 2010 3 Online W 7:00PM-9:40PM CIS 3730 1 CS 200 W 7:00PM-9:40PM Course (CourseID, Title) Schedule (CouseID, Section, Classroom, Time) FK: CourseID Course.CourseID 22

Problems in 2NF Again, there could be redundancy and potential inconsistency Order_ID Order_Date Cust_ID Cust_Name Cust_Address Value 1006 10/24/2004 2 Furniture Plano, TX 1007 10/25/2004 6 Furniture Gallery Boulder, CO 1008 11/1/2004 2 Value Furniture Plano, TX 23

Normalization 3NF A relation is in 3NF, if It is in 2NF, and All (non-key) attributes must, and only, be functionally dependent on the primary key Or, NO transitive dependency Transitive dependency A B and B C, then A C A B C D Transitive dependency: C is a determinant but not PK 24

A Relation in 2NF but Not in 3NF Identify primary key (PK) and look for transitive dependency Order_ID Order_Date CustID Name Address 1006 10/24/2004 2 Value Furniture Plano, TX 1007 10/25/2004 6 Furniture Gallery Boulder, CO 1008 11/1/2004 2 Value Furniture Plano, TX Transitive dependency 25

Transforming to 3NF Move the attributes involved in a transitive dependency to another relation Order Order_ID Order_Date CustID Name Address 1006 10/24/2004 2 Value Furniture Plano, TX 1007 10/25/2004 6 Furniture Gallery Boulder, CO 1008 11/1/2004 2 Value Furniture Plano, TX Order Order_ID Order_Date Customer 1006 10/24/2004 2 1007 10/25/2004 6 1008 11/1/2004 2 Customer CustID Name Address 2 Value Furniture Plano, TX 6 Furniture Gallery Boulder, CO Customer (CustID, Name, Address) Order (Corder_ID, Order_Date, Customer) FK: Customer Customer.CustID 26

BC/NF BC/NF is a stricter form of 2NF and 3NF A B C (A, B) is a candidate key (A, C) is also a candidate key A, B, C are all key attributes; there are no non-key attributes Can be viewed as special case of transitive dependency; or can be transformed into a similar pattern as partial dependency. A B C A C C B 27

BC/NF Example Physician Patient Bill Number A 1 101 B 1 101 A 2 102 B 2 103 Physician Bill Number A 101 B 101 A 102 B 103 Bill Number Patient 101 1 102 2 103 2 28

4NF Brief Multi-value dependency Employee Skill (determines multiple skills) Employee Degree (determines multiple degrees) Employee Skill Degree Jack SQL, Teaching BA, MS, PhD Michael SQL, C#, Java, Network BA, MBA Employee Skill Degree Jack SQL BA Jack Teaching MS Jack PhD Michael SQL BA 29

Practical Tips To identify the normalization level, determine the primary key and candidate keys first; then look for partial dependency (check if there is a composite PK or candidate key) and transitive dependency Design a relation that is easy to explain its meaning If there are attributes of different things in one table, there are usually problems; for example, students and courses are in one table, or customer and products are in one table; etc. Attributes that potentially generate many Null values might be moved into another table Generally relations in the 3NF are considered to be well formed; going higher may introduce unnecessary structural complexity which is inefficient for data queries Very often tables can go for lower normal forms (de-normalization) depending on design requirements 30

Normal Forms Summary BCNF: every attribute is dependent on the key, the whole key, and nothing but the key 3NF: every non-key attribute is dependent on the key, the whole key, and nothing but the key Eliminate transitive dependencies 2NF: every non-key attribute is dependent on the key, and the whole key. Eliminate partial dependencies 1NF: If the tables are relations and no repeating groups UNF Split repeating groups in separate rows 31

Summary Key concepts Anomalies Normalization and de-normalization Normal forms: 1NF to 3NF Functional dependency Partial dependency Transitive dependency Key skills: identify and normalize tables from 1NF to 3NF Be able to identify the normal form of a given relation Be able to identify functional dependency among attributes Be able to apply normalization principles to normalize a relation up to the 3 rd normalization form 32