Normalization. Reduces the likelihood of anomalies

Normalization

Normalization
Tables are important, but designing them properly matters even more, so that the DBMS can do its job.
Normalization is the process of evaluating and correcting table structures to minimize data redundancies.
It reduces the likelihood of anomalies.
It proceeds by levels, refining the design each time.
Although we deal with tables, the refinement is mostly at the relationship level.

Just do it
So what's the point?
Each level reduces more redundancy.
Each level also tends to make DB operations slower, since more tables mean more joins.
There are many levels of normalization, but the typical stopping point is the 3rd (3NF).

Case Study
Consider a construction company that handles several building projects.
It needs to manage projects, clients, billing, and employees.

A First Try
Good? Bad?

Observations: The Good
"Good":
An employee is listed in a project only once, so PROJ_NUM together with EMP_NUM gives you the job classification.
Hourly wage and hours worked are included.
The total charge for an employee, and for a project as a whole, can be calculated.
Neutral:
An employee can work on more than one project.

The Bad
Nulls! Especially in PROJ_NUM, the expected implied PK.
Data inconsistencies and anomalies occur easily, especially when relying on human entry (e.g., abbreviations).
Data redundancies yield inconsistencies.
Updating a particular employee can require many updates.
Inserting a new employee who has no project is trouble.
Project and employee data are mixed, so if an employee leaves, project data gets deleted along with them.

The Ugly
Reports on this table can yield different results depending on the anomaly.
Think of the job abbreviation problem again: if you want a report of all hours worked by job class, how do you produce it when the entries are inconsistent?
Data entry is rough, even if you audit for the errors already mentioned.
A lot of data is repeated, and people are poor at repeating data consistently.
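The reporting problem can be seen in a few lines of Python. This is only a sketch: the rows, the column layout, and the function name `hours_by_job_class` are made up for illustration and are not taken from the case study table.

```python
from collections import defaultdict

# Hypothetical rows from the flat table: (emp_num, job_class, hours).
# "Elect. Engineer" and "Electrical Engineer" are meant to be the same job.
rows = [
    (101, "Elect. Engineer", 23.5),
    (102, "Electrical Engineer", 12.0),
    (103, "Elect. Engineer", 8.0),
    (104, "Programmer", 30.0),
]

def hours_by_job_class(rows):
    """Total the hours for each distinct job_class string."""
    totals = defaultdict(float)
    for _emp_num, job_class, hours in rows:
        totals[job_class] += hours
    return dict(totals)

report = hours_by_job_class(rows)
# The engineers are split across two report lines because the
# job class was typed two different ways.
for job_class, total in sorted(report.items()):
    print(job_class, total)
```

The report has three job classes instead of two, so the total for electrical engineers is wrong unless someone cleans up the spelling first.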

Goals of Normalization
Each table represents only a single object and its attributes.
Each piece of data should be kept in one table where possible (controlled redundancy), to eliminate update anomalies.
All data in a row must depend on the primary key and nothing else, so that the PK uniquely identifies the row.
There are no update, insertion, or deletion anomalies, so integrity and consistency are ensured.

Functional Dependency
Ahhh!
Functional dependency: attribute B is functionally dependent on attribute A if each value of A determines one and only one value of B.
This can also be thought of as "A determines B".
License # is fully functionally dependent on SS#; it is not functionally dependent on Name.
Full functional dependency applies to a composite key (the confusing one): B is functionally dependent on A but not on any proper subset of A.
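The two definitions can be checked mechanically against sample rows. The sketch below is an illustration only: the function names `determines` and `fully_determines` and all of the sample data are assumptions. Passing the check on sample data only shows that the rows do not contradict the dependency; whether it really holds is a business rule.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if, in these rows, each value of lhs maps to exactly one value of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def fully_determines(rows, lhs, rhs):
    """True if lhs determines rhs but no proper subset of lhs does."""
    if not determines(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(rows, subset, rhs):
                return False
    return True

# Hypothetical people: two of them share a name.
people = [
    {"ssn": "111", "name": "Pat Lee", "license": "A1"},
    {"ssn": "222", "name": "Pat Lee", "license": "B2"},
    {"ssn": "333", "name": "Sam Roy", "license": "C3"},
]
ssn_to_license = determines(people, ["ssn"], ["license"])    # SS# determines License #
name_to_license = determines(people, ["name"], ["license"])  # Name does not

# Hypothetical assignments with the composite key (proj_num, emp_num).
assignments = [
    {"proj_num": 15, "emp_num": 101, "emp_name": "Ada", "hours": 10.0},
    {"proj_num": 15, "emp_num": 102, "emp_name": "Bo", "hours": 5.0},
    {"proj_num": 18, "emp_num": 101, "emp_name": "Ada", "hours": 7.0},
    {"proj_num": 18, "emp_num": 102, "emp_name": "Bo", "hours": 9.0},
]
key = ["proj_num", "emp_num"]
hours_full = fully_determines(assignments, key, ["hours"])    # needs the whole key
name_full = fully_determines(assignments, key, ["emp_name"])  # emp_num alone is enough
```

In the second data set, hours needs the whole composite key, while the employee name depends on only part of it.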

1NF Step 1: Eliminate Repeating Groups
A repeating group exists when multiple entries exist for the same key. It can take three forms:
Repeating within columns: a column cannot contain more than one data item.
Repeating across columns: there cannot be more than one column for the same data.
Repeating across rows: a value cannot be assumed to span multiple rows (i.e., cascade down into nulls).

1NF Step 1 (continued)
Row repetition is eliminated by filling in the null values and making sure no two rows contain exactly the same data.
Column repetition is eliminated by creating a second table whose PK is composed of the original table's PK and the column in question (this has to be done after Step 2...).
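Filling in the nulls for row repetition might look like the following sketch. The helper names `fill_down` and `has_duplicate_rows`, the column names, and the data are all hypothetical; the sketch assumes a null means "same as the row above".

```python
def fill_down(rows, columns):
    """Return copies of rows where nulls in the given columns take the last value seen above."""
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        for col in columns:
            if new_row[col] is None:
                new_row[col] = last.get(col)  # stays None if nothing was seen yet
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled

def has_duplicate_rows(rows):
    """True if any two rows contain exactly the same data."""
    as_tuples = [tuple(sorted(r.items())) for r in rows]
    return len(set(as_tuples)) != len(as_tuples)

# Hypothetical flat rows: the project is only written on its first line.
raw = [
    {"proj_num": 15, "proj_name": "Alpha", "emp_num": 101, "hours": 10.0},
    {"proj_num": None, "proj_name": None, "emp_num": 102, "hours": 5.0},
    {"proj_num": 18, "proj_name": "Beta", "emp_num": 101, "hours": 7.0},
    {"proj_num": None, "proj_name": None, "emp_num": 104, "hours": 3.0},
]
filled = fill_down(raw, ["proj_num", "proj_name"])
duplicates = has_duplicate_rows(filled)
```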

1NF Step 2
Identify the minimal PK that uniquely identifies each row.
All other attributes should then depend on the PK columns.
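One way to look for a minimal key is to try column combinations from smallest to largest and keep those that are unique across the rows. This is a sketch with a made-up function name (`candidate_keys`) and made-up data. Uniqueness in a sample only suggests a key; the business rules decide which combination is really the PK.

```python
from itertools import combinations

def candidate_keys(rows, columns):
    """Return the minimal column combinations whose values are unique across rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Skip combinations that contain a smaller key: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            values = [tuple(r[c] for c in combo) for r in rows]
            if len(set(values)) == len(values):
                keys.append(combo)
    return keys

# Hypothetical rows after the nulls have been filled in.
columns = ["proj_num", "emp_num", "job_class", "hours"]
data = [
    (15, 101, "Engineer", 10.0),
    (15, 102, "Programmer", 5.0),
    (18, 101, "Engineer", 10.0),
    (18, 102, "Programmer", 10.0),
    (15, 103, "Engineer", 10.0),
]
rows = [dict(zip(columns, values)) for values in data]
keys = candidate_keys(rows, columns)
print(keys)
```

For this sample the only minimal key is the pair of project number and employee number.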

1NF Step 3
Identify all dependencies between columns.
All attributes should be dependent on the key.
Some attributes may be dependent on only part of the key (partial dependency).
Some attributes may be dependent on other non-key attributes (transitive dependency).
The best way to do this is probably a dependency diagram (pg 160).

1NF Requirements
A unique PK is identified and all non-PK attributes depend on it. This implies no duplicate rows.
Neither rows nor columns have an ordering.
Each row-column intersection contains one and only one value from the domain.
No nulls! (Some do not feel this is a requirement.)
These last two combine to mean no repeating groups.

2NF
If the PK is a single attribute, a 1NF table is already in 2NF; the process below applies only to composite primary keys.
Write each component of the key on a separate line, with the entire key on the last line.
Figure out which attributes are dependent on which parts.
Create a new table from each of these lines.

2NF Requirements
Already in 1NF.
No partial dependencies.
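The 2NF process can be sketched with Python's built-in `sqlite3` module. The table names, most column names (`proj_name`, `emp_name`, `job_class`, `chg_hour`, `hours`), and the data are assumptions made for illustration; only PROJ_NUM and EMP_NUM appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 1NF table with the composite key (proj_num, emp_num).
CREATE TABLE flat (
    proj_num INTEGER, proj_name TEXT,
    emp_num INTEGER, emp_name TEXT, job_class TEXT, chg_hour REAL,
    hours REAL,
    PRIMARY KEY (proj_num, emp_num)
);
INSERT INTO flat VALUES
    (15, 'Alpha', 101, 'Ada', 'Engineer',   80.0, 10.0),
    (15, 'Alpha', 102, 'Bo',  'Programmer', 40.0,  5.0),
    (18, 'Beta',  101, 'Ada', 'Engineer',   80.0,  7.0);

-- Line 1: attributes that depend on proj_num alone.
CREATE TABLE project (proj_num INTEGER PRIMARY KEY, proj_name TEXT);
INSERT INTO project SELECT DISTINCT proj_num, proj_name FROM flat;

-- Line 2: attributes that depend on emp_num alone.
CREATE TABLE employee (
    emp_num INTEGER PRIMARY KEY, emp_name TEXT, job_class TEXT, chg_hour REAL
);
INSERT INTO employee
    SELECT DISTINCT emp_num, emp_name, job_class, chg_hour FROM flat;

-- Line 3: attributes that depend on the entire key.
CREATE TABLE assignment (
    proj_num INTEGER REFERENCES project(proj_num),
    emp_num INTEGER REFERENCES employee(emp_num),
    hours REAL,
    PRIMARY KEY (proj_num, emp_num)
);
INSERT INTO assignment SELECT proj_num, emp_num, hours FROM flat;
""")

def count(table):
    """Number of rows in one of the tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

counts = {t: count(t) for t in ("project", "employee", "assignment")}
```

Each project and each employee is now stored once, and only the hours stay with the composite key.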

3NF
2NF eliminates partial dependencies but not transitive ones, which can still leave anomalies; 3NF eliminates these.
Identify the determinants. A determinant is an attribute whose value determines another; there will be one per transitive dependency.
Identify the dependent(s) of the determinant(s). Ex.: student status -> charge per credit hour.
Remove the dependent attributes and place them in a new table with the determinant as the PK.
Leave the determinant as an FK in its original table.

3NF Requirements
Already in 2NF.
No transitive dependencies.
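Using the student status example, removing a transitive dependency could look like this `sqlite3` sketch. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 2NF table: stu_status determines charge_per_credit,
-- so charge_per_credit depends on the PK only transitively.
CREATE TABLE student_2nf (
    stu_id INTEGER PRIMARY KEY, stu_name TEXT,
    stu_status TEXT, charge_per_credit REAL
);
INSERT INTO student_2nf VALUES
    (1, 'Ana', 'Resident',    300.0),
    (2, 'Ben', 'Nonresident', 750.0),
    (3, 'Cal', 'Resident',    300.0);

-- New table: the determinant becomes the PK.
CREATE TABLE status_charge (
    stu_status TEXT PRIMARY KEY, charge_per_credit REAL
);
INSERT INTO status_charge
    SELECT DISTINCT stu_status, charge_per_credit FROM student_2nf;

-- Original table: the determinant stays behind as an FK.
CREATE TABLE student (
    stu_id INTEGER PRIMARY KEY, stu_name TEXT,
    stu_status TEXT REFERENCES status_charge(stu_status)
);
INSERT INTO student SELECT stu_id, stu_name, stu_status FROM student_2nf;
""")

# A rate change is now a single-row update instead of one per student.
conn.execute(
    "UPDATE status_charge SET charge_per_credit = 320.0 "
    "WHERE stu_status = 'Resident'"
)
charges = conn.execute(
    "SELECT s.stu_id, c.charge_per_credit "
    "FROM student s JOIN status_charge c ON c.stu_status = s.stu_status "
    "ORDER BY s.stu_id"
).fetchall()
```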

More NF?
The book mentions BCNF and 4NF, but these are rarely used in practice except in very specific cases.
For BCNF, most 3NF tables will conform to its requirements anyway.

Design Decisions
Normalization refines relationships and eliminates redundancies.
It does not, by itself, create good designs.

Primary Keys
Sometimes natural keys are hard to deal with.
If the key is JOB_NAME, for example, it is easy for someone to type it incorrectly in other tables where it is an FK.
In this case (where the PK is also used as an FK) it may be better to use a surrogate key.
What's the problem with introducing a surrogate key to a system that is already in 3NF?

Naming Conventions
Naming conventions will probably fall into one of two situations: the business rules specify them, or you get to choose.
If you are in a position to pick them, make sure they are consistent and informative.

Are all attributes accounted for?
After designing and normalizing, make sure all the information the database needs is present or can be calculated.
If it isn't, you must add new columns and possibly run through the normalization process again.

All relationships?
The same check applies as with attributes.

Atomicity
Atomicity is whether an attribute can be broken down into more than one sensible attribute.
Address "123 Fake St, Springfield, IL" is not atomic.
Address "123 Fake St", City "Springfield", State "IL" is atomic.
Atomicity is a bit fuzzy and is often considered a "means nothing" sort of term in its unadjusted definition (the above is my adjusted version).
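The address example can be sketched as a small Python function. The function name `split_address` is made up, and the sketch assumes every address has exactly three comma-separated parts, which real addresses often do not.

```python
def split_address(address):
    """Split 'street, city, state' into three separate attributes.

    Assumes exactly three comma-separated parts; anything else raises ValueError.
    """
    street, city, state = (part.strip() for part in address.split(","))
    return {"street": street, "city": city, "state": state}

parts = split_address("123 Fake St, Springfield, IL")
print(parts)
```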

Granularity Refinement
Granularity is the level of detail of the data.
The most granular you can get is atomic.
Typically you are concerned with PKs here, but not always.
For a student, are credit hours specified as a total or only for the current semester?
What about an employee and how many hours they work on a project?

Historical Records
For the previous slide, if you choose the "current time period" option, you need some way to maintain a history of the data for lookups of past periods.
For example, what if a person's consulting rate changes over time?
A properly designed table can handle this.
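One possible design for the consulting rate example is a history table keyed by employee and effective date. This is a sketch, not the design from the book: the table `rate_history`, its columns, the helper `rate_on`, and the data are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rate_history (
    emp_num INTEGER,
    effective_from TEXT,  -- ISO date, so text order matches date order
    rate REAL,
    PRIMARY KEY (emp_num, effective_from)
);
INSERT INTO rate_history VALUES
    (101, '2023-01-01', 80.0),
    (101, '2024-01-01', 90.0);
""")

def rate_on(conn, emp_num, day):
    """Return the rate in effect for emp_num on the given ISO date, or None."""
    row = conn.execute(
        "SELECT rate FROM rate_history "
        "WHERE emp_num = ? AND effective_from <= ? "
        "ORDER BY effective_from DESC LIMIT 1",
        (emp_num, day),
    ).fetchone()
    return row[0] if row else None

old_rate = rate_on(conn, 101, "2023-06-15")
new_rate = rate_on(conn, 101, "2024-03-01")
no_rate = rate_on(conn, 101, "2022-01-01")
```

A rate change adds a row instead of overwriting the old value, so past periods can still be looked up.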

Derived Attributes
Consider whether it is better to store these or to calculate them on the fly.
How will end users be using the data?
What kind of performance is needed, and what is the hardware capable of?
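Calculating on the fly can be sketched with a view, so the total charge is never stored and cannot go stale. The table, the view `project_charge`, the columns, and the data are hypothetical; storing the total instead would trade that safety for faster reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical assignment table: rate charged and hours worked per project.
CREATE TABLE assignment (
    proj_num INTEGER, emp_num INTEGER, chg_hour REAL, hours REAL,
    PRIMARY KEY (proj_num, emp_num)
);
INSERT INTO assignment VALUES
    (15, 101, 80.0, 10.0),
    (15, 102, 40.0,  5.0),
    (18, 101, 80.0,  7.0);

-- The derived attribute is computed each time the view is queried.
CREATE VIEW project_charge AS
    SELECT proj_num, SUM(chg_hour * hours) AS total_charge
    FROM assignment
    GROUP BY proj_num;
""")

totals = dict(conn.execute("SELECT proj_num, total_charge FROM project_charge"))
print(totals)
```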

Surrogate Keys
As we discussed in the last lecture, sometimes a composite key is too hard to maintain.
When it is used as an FK, it is probably not worth using.
In these cases we create a surrogate key to serve as the PK: a single column, numeric, automatically incremented by the database.

Be careful...
Surrogate keys can lead to similar or duplicate row entries, which are undesirable.
You must enforce "unique" constraints on other attributes to ensure this doesn't happen.
This may be hard, as in the case of names.
An externally defined attribute would be better in this case.
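A surrogate key combined with a unique constraint might look like this in SQLite. The table `job` and the column `job_code` are assumptions for illustration; JOB_NAME is the natural key mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE job (
    job_code INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    job_name TEXT NOT NULL UNIQUE                -- natural key, kept unique
)
""")
conn.execute("INSERT INTO job (job_name) VALUES ('Programmer')")

# Without the UNIQUE constraint this second insert would succeed and
# create a duplicate job under a new surrogate key value.
try:
    conn.execute("INSERT INTO job (job_name) VALUES ('Programmer')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

job_count = conn.execute("SELECT COUNT(*) FROM job").fetchone()[0]
```

The constraint only catches exact duplicates; a misspelled job name would still get through, which is why names are the hard case.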

Design & Normalization
Normalization is part of the design process.
As you are ER modeling, you should also be normalizing.
Normalization is an intense analysis of a specific entity and its relationships. It reduces redundancy and thus anomalies.
Recommendation: get a rough sketch of the ER model complete, and then use normalization to refine parts of it individually.
Section 5.7 gives a walkthrough of a design process.

Denormalization
Normalization is not a one-size-fits-all process.
Occasionally design decisions are made that break normalization, either for ease of use or for speed.
Is storing both zip codes and cities redundant?
The "Evaluate PK Assignments" section talks about a situation where having a transitive dependency is acceptable.
See Table 5.6.

Homework
Review Questions: none, but again good review.
Problems: 1, 2, 3, 4, 8, 9, 10.