Normalization. GIS Applications. Spring 2011




Normalization

Normalization is a foundation for relational database design: a systematic approach to organizing data in a database efficiently.

Objectives
- Minimize unnecessary redundancy.
- Support general-purpose query processing.
- Minimize unwanted side effects of database updates, such as inadvertent deletions or insertion errors.

Normal Forms

Normalization theory is based on the concept of normal forms. A relational table is in a particular normal form if it satisfies a certain set of constraints.
- The first three normal forms, defined by E. F. Codd, are typically required of all tables in a relational database.
- The fourth and fifth normal forms are relatively rare.
- Normal forms are guidelines, not strict rules. A design can deviate from them to meet practical requirements.
- Normal forms are based on the analysis of functional dependencies among attributes.

Functional Dependence

The concept of functional dependence is the basis for the first four normal forms.

Given a relation R, a column Y of R is said to be functionally dependent on a column X of R if and only if each value of X in R is associated with precisely one value of Y at any given time.

Saying that column Y is functionally dependent on X is the same as saying that the values of column X identify the values of column Y. Such a functional dependency is denoted X → Y.
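The definition of X → Y can be checked mechanically against sample rows. The following is a minimal sketch, not part of the original slides; the function name `holds_fd` and the column names are illustrative assumptions, and the rows are a small student/major sample. Note that a sample can only refute a functional dependency: whether it truly holds is a rule about the data, not a property of one snapshot.

```python
def holds_fd(rows, x_cols, y_cols):
    """Return True if no two rows agree on x_cols but differ on y_cols."""
    seen = {}
    for row in rows:
        x_val = tuple(row[c] for c in x_cols)
        y_val = tuple(row[c] for c in y_cols)
        # The first y value seen for this x value is remembered;
        # any later, different y value violates X -> Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Illustrative rows; column names are assumptions made for this sketch.
rows = [
    {"student_no": 38214, "student": "Bright", "major": "IS"},
    {"student_no": 38214, "student": "Bright", "major": "EE"},
    {"student_no": 69173, "student": "Smith", "major": "PHY"},
]

fd_student = holds_fd(rows, ["student_no"], ["student"])  # student_no -> student
fd_major = holds_fd(rows, ["student_no"], ["major"])      # student_no -> major
print(fd_student, fd_major)
```

Here `student_no → student` is consistent with the sample, while `student_no → major` is refuted because student 38214 appears with two majors.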

The Issue of Keys

The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key, and nothing but the key.

Candidate and Primary Keys
- Superkey: a set of one or more attributes that uniquely identifies a specific instance of an entity.
- Candidate key: any subset of the attributes of a superkey that is itself a superkey and cannot be reduced to a smaller superkey.
- Primary key: one candidate key selected from the set of candidate keys and used to index the relation.

Examples:
- Dog registry: (dog_, owner_, address, phone)
- Field test: (field#, year, crop)
- Catalog order: (catalog_item #, customer #, billing_address, shipping_address)

Primary Keys

Every relation (entity) must have a primary key. To qualify as a primary key, an attribute must have the following properties:
- It must have a non-null value for each instance of the entity.
- Its value must be unique for each instance of the entity.
- Its value must not change or become null during the life of each entity instance.

Composite Keys

Often more than one attribute is required to uniquely identify an entity. A primary key made up of more than one attribute is known as a composite key.

Student #   Student   Major
38214       Bright    IS
38214       Bright    EE
69173       Smith     PHY

In this table Student # alone is not unique, so the key is the combination (Student #, Major).
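The superkey and candidate-key definitions above can be illustrated with a brute-force search over attribute combinations. This is a sketch under stated assumptions: the function names are hypothetical, and the fourth row is an invented addition (not in the slides) so that neither the student name nor the major happens to be unique by accident. Keys found this way are only keys of the sample; real keys come from the rules of the domain.

```python
from itertools import combinations


def is_superkey(rows, cols):
    """True if no two rows share the same values on cols."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, attributes):
    """Return the minimal superkeys of the sample, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for cols in combinations(attributes, size):
            # Skip any attribute set that contains an already-found key:
            # it is a superkey but not irreducible.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_superkey(rows, cols):
                keys.append(cols)
    return keys


rows = [
    {"student_no": 38214, "student": "Bright", "major": "IS"},
    {"student_no": 38214, "student": "Bright", "major": "EE"},
    {"student_no": 69173, "student": "Smith", "major": "PHY"},
    # Hypothetical extra row (assumption): a second Smith majoring in PHY.
    {"student_no": 70001, "student": "Smith", "major": "PHY"},
]

keys = candidate_keys(rows, ["student_no", "student", "major"])
print(keys)
```

With these rows the only candidate key found is the composite `("student_no", "major")`; the full attribute set is a superkey but is rejected because it is reducible.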

Full Functional Dependence

Full functional dependence applies to tables with composite keys. Column Y in relational table R is fully functionally dependent on X of R if it is functionally dependent on X and not functionally dependent on any proper subset of X.

Full functional dependence means that when a primary key is composite, the other columns must be identified by the entire key and not just by some of the columns that make up the key.

Foreign Keys

A foreign key identifies a column or set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table.
- The referenced columns must be the primary key or another candidate key of the referenced table.
- A foreign key completes a relationship by identifying the parent entity.
- Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity.
- Every relationship in the model must be supported by a foreign key.

Steps in Normalization
1. Assemble data items from user views.
2. Convert to un-normalized relations.
3. Convert to first normal form (1NF).
4. Convert to second normal form (2NF).
5. Convert to third normal form (3NF).

The result should be simple relations that correspond to entities or to associations between entity classes. Normalized tables, when recombined (joined), should convey exactly the same information as the original table.

Un-Normalized Relations

Un-normalized relations contain one or more repeating groups: multiple values at the intersection of a row and a column.

Student #  Student  Major  Course #  Course title     Instructor  Location  Grade
38214      Bright   IS     IS 350    Databases        Codd        B104      A
                           IS 465    System Analysis  Kemp        B213      C
69173      Jones    PM     IS 465    System Analysis  Kemp        B213      A
                           PM 300    Prod Mang        Lewis       D317      B
                           QM 440    Op Res           Kemp        B213      C

The relation contains redundant information; for example, IS 465 appears in more than one row.
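Referential integrity, as described under Foreign Keys, can be seen in action with Python's standard `sqlite3` module. This is a minimal sketch; the table and column names (`course`, `enrollment`, `course_no`, `student_no`) are assumptions chosen for illustration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Referenced (parent) table: course_no is its primary key.
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY, title TEXT)")

# Referencing (child) table: course_no must match an existing course.
conn.execute(
    """
    CREATE TABLE enrollment (
        student_no INTEGER,
        course_no  TEXT REFERENCES course(course_no),
        PRIMARY KEY (student_no, course_no)
    )
    """
)

conn.execute("INSERT INTO course VALUES ('IS 350', 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (38214, 'IS 350')")  # parent exists

try:
    # 'XX 999' is a made-up course number with no parent row.
    conn.execute("INSERT INTO enrollment VALUES (38214, 'XX 999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan row rejected:", rejected)
```

The second insert into `enrollment` is refused because no parent course exists, which is the behavior the foreign key is meant to guarantee.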

Un-Normalized Relations

[Diagram: relationships among Student #, Student, Major, and Course # in the un-normalized relation, labeled as one-to-one and one-to-many.]

The one-to-many relationships need to be addressed.

Normalized Relations: First Normal Form

A relation is in first normal form if the underlying domains contain only atomic values.
- There are no repeating groups within a tuple.
- Most relational systems require a database to be in 1NF.

First Normal Form

Remove the repeating groups and form two new relations. Migrate the primary key into the new relation, and make sure the new relation has a valid primary key.

Student #   Student   Major
38214       Bright    IS
69173       Smith     PM

Identification of Primary Key

Student #   Student   Major
38214       Bright    IS
69173       Smith     PM
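The 1NF conversion described above (remove the repeating group, migrate the primary key) can be sketched as a small flattening step. This is an illustration only; the function name `to_first_normal_form` and the field names are assumptions, and only the course numbers of the repeating group are carried along to keep the example short.

```python
# Un-normalized records: "courses" is a repeating group (a list in one field).
unnormalized = [
    {"student_no": 38214, "student": "Bright", "major": "IS",
     "courses": ["IS 350", "IS 465"]},
    {"student_no": 69173, "student": "Smith", "major": "PM",
     "courses": ["IS 465", "PM 300", "QM 440"]},
]


def to_first_normal_form(records):
    """Split records into a student relation and a student-course relation."""
    students = []
    student_courses = []
    for rec in records:
        # One atomic row per student.
        students.append((rec["student_no"], rec["student"], rec["major"]))
        # One atomic row per (student, course); student_no is the migrated key.
        for course_no in rec["courses"]:
            student_courses.append((rec["student_no"], course_no))
    return students, student_courses


students, student_courses = to_first_normal_form(unnormalized)
print(students)
print(student_courses)
```

The second relation has the composite primary key (student_no, course_no), since neither column is unique on its own.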

Insert anomaly
A new course cannot be inserted until a student has registered for it, because Student # is part of the composite key.

Update anomaly
Changing a course title or course number requires searching all tuples to find every occurrence of that course number or title.

Deletion anomaly
Dropping the only student registered for a course removes the last tuple for that course, so the associated course and instructor information is lost.

Functional Dependencies

Course # → Course title
Course # → Instructor
Course # → Location
Student #, Course # → Grade
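The dependencies that cause these anomalies can be found by testing each non-key column against every proper subset of the composite key. The sketch below is illustrative; the helper names and the column names `title`, `instructor`, `location`, and `grade` are assumptions, with row values taken from the student-course example.

```python
from itertools import combinations


def determines(rows, x_cols, y_col):
    """True if rows that agree on x_cols always agree on y_col."""
    seen = {}
    for row in rows:
        x_val = tuple(row[c] for c in x_cols)
        if seen.setdefault(x_val, row[y_col]) != row[y_col]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """Map each non-key column to the proper key subsets that determine it."""
    found = {}
    for attr in non_key:
        for size in range(1, len(key)):  # proper subsets only
            for subset in combinations(key, size):
                if determines(rows, subset, attr):
                    found.setdefault(attr, []).append(subset)
    return found


cols = ["student_no", "course_no", "title", "instructor", "location", "grade"]
data = [
    (38214, "IS 350", "Databases", "Codd", "B104", "A"),
    (38214, "IS 465", "System Analysis", "Kemp", "B213", "C"),
    (69173, "IS 465", "System Analysis", "Kemp", "B213", "A"),
    (69173, "PM 300", "Prod Mang", "Lewis", "D317", "B"),
    (69173, "QM 440", "Op Res", "Kemp", "B213", "C"),
]
rows = [dict(zip(cols, values)) for values in data]

partial = partial_dependencies(
    rows, ["student_no", "course_no"], ["title", "instructor", "location", "grade"]
)
print(partial)
```

In this sample, title, instructor, and location are determined by `course_no` alone, while grade needs the whole key.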

Second Normal Form

A relation is in second normal form if it is in 1NF and every non-key attribute is fully dependent on the primary key. In the example, Course title, Instructor, and Location are only partially dependent on the primary key: they depend on Course # alone.

To convert from first to second normal form, remove the partial dependencies. Create two new relations: one with the attributes that are fully dependent on the primary key, the other with the attributes that were only partially dependent.

Student #  Course #  Grade
38214      IS 350    A
38214      IS 465    C
69173      IS 465    B
69173      PM 300    C

Course #  Course title  Instructor  Location
IS 350    Database      Codd        B104
IS 465    Sys Anal      Kemp        B213
PM 300    Prod man      Lewis       D317
QM 440    Op Res        Kemp        B213

Courses are now independent of Student #, so they can be inserted or deleted independently, and only a single tuple needs to be updated in the course relation.

Transitive Dependencies

Course #  Course title  Instructor  Location
IS 350    Database      Codd        B104
IS 465    Sys Anal      Kemp        B213
PM 300    Prod man      Lewis       D317
QM 440    Op Res        Kemp        B213

A transitive dependency exists when a non-key attribute is dependent on one or more other non-key attributes.

Insertion anomaly
Because instructor is dependent on Course # as the primary key, no information about an instructor can be added until the instructor has been assigned to a course.

Delete anomaly
Deleting the data for a course also deletes the instructor information.
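The 2NF split above can be sketched as two projections followed by a join that confirms nothing was lost. This is an illustrative sketch, not the slides' own procedure; `project`, `natural_join`, and the column names (including `location` and `grade`) are assumptions.

```python
def project(rows, cols):
    """Project rows onto cols, dropping duplicate tuples."""
    unique = {tuple(row[c] for c in cols) for row in rows}
    return [dict(zip(cols, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join two relations on all columns they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


cols = ["student_no", "course_no", "title", "instructor", "location", "grade"]
data = [
    (38214, "IS 350", "Databases", "Codd", "B104", "A"),
    (38214, "IS 465", "System Analysis", "Kemp", "B213", "C"),
    (69173, "IS 465", "System Analysis", "Kemp", "B213", "A"),
    (69173, "PM 300", "Prod Mang", "Lewis", "D317", "B"),
    (69173, "QM 440", "Op Res", "Kemp", "B213", "C"),
]
first_nf = [dict(zip(cols, values)) for values in data]

# Attributes fully dependent on (student_no, course_no).
enrollment = project(first_nf, ["student_no", "course_no", "grade"])
# Attributes that depended only on course_no.
course = project(first_nf, ["course_no", "title", "instructor", "location"])

lossless = as_set(natural_join(enrollment, course)) == as_set(first_nf)
print(len(enrollment), len(course), lossless)
```

The course relation stores each course once, and joining the two projections on `course_no` reproduces the original rows exactly.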

Update anomaly

Course #  Course title  Instructor  Location
IS 350    Database      Codd        B104
IS 465    Sys Anal      Kemp        B213
PM 300    Prod man      Lewis       D317
QM 440    Op Res        Kemp        B213

To update instructor information, the entire relation must be searched, because instructor information occurs more than once.

Third Normal Form

A relation is in third normal form if it is in 2NF and contains no transitive dependencies. Every non-key attribute is fully dependent on the primary key, and there are no transitive dependencies.

The non-key attributes that participate in the transitive dependency form a new relation:

Instructor  Location
Codd        B104
Kemp        B213
Lewis       D317

Course #  Course title  Instructor
IS 350    Database      Codd
IS 465    Sys Anal      Kemp
PM 300    Prod Mang     Lewis
QM 440    Op Res        Kemp

Foreign key: a non-key attribute in one relation that serves as a primary key in another relation.

Boyce-Codd Normal Form

Boyce-Codd normal form matters in the case of overlapping candidate keys. A relation is in Boyce-Codd normal form if it is in 3NF and no attribute of a candidate key depends on part of another candidate key; equivalently, every determinant is a candidate key.

Example rules:
- Each student can major in several subjects.
- For each major, a student has one advisor.
- Each major has several advisors.
- Each advisor advises only one major.

Student #  Major    Advisor
123        Physics  Einstein
123        Music    Mozart
456        Biol     Darwin
789        Physics  Bohr
999        Physics  Einstein

There are two possible candidate keys, (Student #, Major) and (Student #, Advisor), and they overlap. Major, which is part of one candidate key, depends on Advisor, which is part of the other.

The relation needs to be projected into two new relations:

Student #  Advisor
123        Einstein
123        Mozart
456        Darwin
789        Bohr
999        Einstein

Advisor   Major
Einstein  Physics
Mozart    Music
Darwin    Biol
Bohr      Physics
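The Boyce-Codd example can be checked with a few lines of code. This sketch, with illustrative variable names, confirms on the sample rows that Advisor determines Major even though Advisor is not a key, and that the two projections join back to the original relation.

```python
# (student_no, major, advisor) rows from the example relation.
rows = [
    (123, "Physics", "Einstein"),
    (123, "Music", "Mozart"),
    (456, "Biol", "Darwin"),
    (789, "Physics", "Bohr"),
    (999, "Physics", "Einstein"),
]

# Does advisor -> major hold in the sample?
major_of = {}
advisor_determines_major = True
for _, major, advisor in rows:
    if major_of.setdefault(advisor, major) != major:
        advisor_determines_major = False

# Is advisor alone a superkey? Only if no advisor value repeats.
advisors = [advisor for _, _, advisor in rows]
advisor_is_superkey = len(set(advisors)) == len(advisors)

# A determinant that is not a superkey means BCNF is violated.
violates_bcnf = advisor_determines_major and not advisor_is_superkey

# Project into the two new relations.
student_advisor = sorted({(s, a) for s, _, a in rows})
advisor_major = sorted({(a, m) for _, m, a in rows})

# Rejoin on advisor and compare with the original rows.
rejoined = {
    (s, m, a)
    for s, a in student_advisor
    for a2, m in advisor_major
    if a == a2
}
lossless = rejoined == set(rows)
print(violates_bcnf, lossless)
```

Because every advisor maps to exactly one major, the advisor-major relation stores that fact once, and the join on advisor recovers each student's major.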

Fourth Normal Form

Fourth normal form removes multi-valued dependencies. A multi-valued dependency exists when three attributes (A, B, C) occur in a relation and, for each value of A, there is a well-defined set of values for B and a well-defined set of values for C, yet B and C are independent of each other.

Car       Color   Doors
Outback   Gray    2
Outback   Gray    4
Outback   Navy    4
Outback   Navy    2
Forester  Silver  2
Forester  Silver  4

- Several redundancies exist in the relation.
- These can generate deletion and update anomalies.
- A record type should not contain two or more independent multi-valued facts about an entity.

Project to two new relations:

Car       Color
Outback   Gray
Outback   Navy
Forester  Silver

Car       Doors
Outback   2
Outback   4
Forester  2
Forester  4

Normalization Summary

Normalization leads to applications that are simpler to implement and to more maintainable systems. It is based on a set of rules that define normal forms, of which the first three are the most important:
- First normal form: all column values are atomic.
- Second normal form: all column values depend on the whole primary key; there are no partial dependencies.
- Third normal form: no column value depends on the value of any other column except the primary key; there are no transitive dependencies.

Limits of Normalization

Normalization rules are guidelines. In certain circumstances 3NF or higher may not be desirable.

Customer (, Street, City, State, Zipcode) does not meet 3NF, because City and State depend on Zipcode. The 3NF design is:

Customer(, Street, Zipcode)
(Zipcode, City, State)

The 3NF design may not be efficient for regular queries, which would then need a join to retrieve a full address. Designers need to apply judgment and common sense.
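The multi-valued dependency in the Car, Color, Doors example can be tested directly: for each car, the rows should be exactly every color combined with every door count. The sketch below uses illustrative names and checks that condition, then confirms that the two projections rejoin to the original relation.

```python
from itertools import product

# (car, color, doors) rows from the example relation.
rows = {
    ("Outback", "Gray", 2),
    ("Outback", "Gray", 4),
    ("Outback", "Navy", 4),
    ("Outback", "Navy", 2),
    ("Forester", "Silver", 2),
    ("Forester", "Silver", 4),
}

# Project to the two new relations.
car_color = {(car, color) for car, color, _ in rows}
car_doors = {(car, doors) for car, _, doors in rows}


def independent_per_car(rows):
    """True if, for every car, colors and door counts pair up freely."""
    for car in {r[0] for r in rows}:
        colors = {c for k, c, _ in rows if k == car}
        doors = {d for k, _, d in rows if k == car}
        actual = {(c, d) for k, c, d in rows if k == car}
        if actual != set(product(colors, doors)):
            return False
    return True


# Rejoin the projections on car.
rejoined = {
    (car, color, doors)
    for car, color in car_color
    for car2, doors in car_doors
    if car == car2
}

mvd_holds = independent_per_car(rows)
lossless = rejoined == rows
print(len(car_color), len(car_doors), mvd_holds, lossless)
```

The six original rows are carried by three car-color rows and four car-doors rows, and the join on car regenerates all six.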

Limits of Normalization

There can be a number of cases where there is a compelling need for non-first-normal-form structures. Spatial data objects are one of them. The object-relational model supports non-first-normal-form structures such as:
- arrays
- nested tables

References

1. E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.
2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial on the relational model and normalization.
3. E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems (Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first formal treatment of second and third normal forms.
4. C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981.