Normalisation. Royal Museum for Central Africa 5 June 2013. Edward Vanden Berghe

What is Normalisation?
- Theoretical: satisfying the requirements of the different Normal Forms, as spelled out (mainly) by E.F. Codd
- Practical: make sure data is in your database once and only once
  - Repeated data go to a separate table
  - Relationships between the tables are part of the model of the database

Earlier example

Species          # legs  # eyes  Place      Country  Date
Asterias rubens  5       0       Oostende   Belgium  12/3/2004
Asterias rubens  5       0       Zeebrugge  Belgium  13/3/2005
Asterias rubens  5       0       Zeebrugge  Belgium  14/3/2005
Cancer pagurus   10      2       De Panne   Belgium  12/3/2004
Cancer pagurus   10      2       Oostende   Belgium  12/3/2004
Cancer pagurus   10      2       Zeebrugge  Belgium  14/3/2004
Asterias rubens  5       0       Wimereux   France   13/3/2005
Asterias rubens  5       0       Wimereux   France   14/3/2005
Cancer pagurus   10      2       Wimereux   France   12/3/2004

Why normalise
- Save space on disk by avoiding repetition
  - But today's large disks make this less important
  - Compression (zipping) would replace repeated strings with a code anyway
- Avoid modification anomalies
- Make the model intuitive and informative
- Make the database unbiased with respect to patterns of querying
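The point about zipping can be checked with Python's standard `zlib` module, which implements DEFLATE, the same compression used in zip files. This is a minimal sketch: the comma-separated serialisation of the example rows is an assumption made for illustration, not a format used by the slides.

```python
import zlib

# The nine rows of the flat example table: (species, legs, eyes, place, country, date).
ROWS = [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "De Panne", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Oostende", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Zeebrugge", "Belgium", "14/3/2004"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "Wimereux", "France", "12/3/2004"),
]

# Serialise as simple comma-separated text (assumed storage format).
raw = "\n".join(",".join(str(value) for value in row) for row in ROWS).encode("utf-8")

# DEFLATE replaces repeated byte strings with short back-references to earlier copies.
compressed = zlib.compress(raw, 9)

print(f"{len(raw)} bytes as text, {len(compressed)} bytes compressed")
```

The space saving is real, but compression does nothing about the modification anomalies: the repeated values can still be edited into disagreement.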

Modification anomalies
- Update anomalies: potential source of conflicting data
- Insertion anomalies: some relevant data can't be stored
- Deletion anomalies: some relevant data are lost while deleting other data

Update anomalies
If data is present more than once, it's possible to create conflicting information by updating one copy of the data and not the other.

Species          # legs  # eyes  Place      Country  Date
Asterias rubens  6       0       Oostende   Belgium  12/3/2004
Asterias rubens  5       0       Zeebrugge  Belgium  13/3/2005
Asterias rubens  5       1       Zeebrugge  France   14/3/2005

Here the number of legs, the number of eyes and the country of Zeebrugge each differ between rows, and the table itself cannot tell which value is correct.
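Conflicts like these can be found mechanically. The sketch below, built around a hypothetical helper `find_conflicts`, groups rows by the field that ought to determine a value (species for legs and eyes, place for country) and reports every group that holds more than one value. It assumes the rows are plain Python tuples in the column order of the table above.

```python
from collections import defaultdict

COLUMNS = ("species", "legs", "eyes", "place", "country", "date")

# Rows from the update-anomaly example.
ROWS = [
    ("Asterias rubens", 6, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 1, "Zeebrugge", "France", "14/3/2005"),
]


def find_conflicts(rows, determinant, dependent):
    """Return {determinant value: set of dependent values} for every
    determinant value that occurs with more than one dependent value."""
    i = COLUMNS.index(determinant)
    j = COLUMNS.index(dependent)
    seen = defaultdict(set)
    for row in rows:
        seen[row[i]].add(row[j])
    return {key: values for key, values in seen.items() if len(values) > 1}


legs_conflicts = find_conflicts(ROWS, "species", "legs")
eyes_conflicts = find_conflicts(ROWS, "species", "eyes")
country_conflicts = find_conflicts(ROWS, "place", "country")

print(legs_conflicts)
print(eyes_conflicts)
print(country_conflicts)
```

In a normalised design each of these facts is stored once, so there is no second copy to disagree with.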

Insertion anomalies
If two concepts are mixed in one table, we can't store information on new items of one type unless we have, at the same time, information on the other.

Species           # legs  # eyes  Place      Country  Date
Asterias rubens   5       0       Oostende   Belgium  12/3/2004
Asterias rubens   5       0       Zeebrugge  Belgium  13/3/2005
Asterias arenata  5       0       <null>     <null>   <null>

Asterias arenata can only be entered by leaving place, country and date empty.
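In a real database the empty fields are often not even allowed. A minimal sketch with Python's built-in `sqlite3` module, assuming a single flat table whose key is (species, place, date) as identified later in this talk: the database refuses a species without a distribution record. The explicit `NOT NULL` is needed because SQLite, for historical reasons, accepts NULLs in composite primary key columns of ordinary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One flat table that mixes species facts with distribution facts.
conn.execute("""
    CREATE TABLE drs (
        species TEXT NOT NULL,
        legs    INTEGER,
        eyes    INTEGER,
        place   TEXT NOT NULL,
        country TEXT,
        date    TEXT NOT NULL,
        PRIMARY KEY (species, place, date)
    )
""")
conn.execute(
    "INSERT INTO drs VALUES ('Asterias rubens', 5, 0, 'Oostende', 'Belgium', '12/3/2004')"
)

# Try to record a new species for which no distribution record exists yet.
try:
    conn.execute("INSERT INTO drs (species, legs, eyes) VALUES ('Asterias arenata', 5, 0)")
    insert_failed = False
except sqlite3.IntegrityError as exc:
    insert_failed = True
    print("Rejected:", exc)
```

With a separate species table, the same insert would simply be a new row there.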

Deletion anomalies
If two concepts are mixed in one table, we lose information on one concept when the last instance of the other concept is deleted.

Species           # legs  # eyes  Place      Country  Date
Asterias rubens   5       0       Oostende   Belgium  12/3/2004
Asterias rubens   5       0       Zeebrugge  Belgium  13/3/2005
Asterias arenata  5       0       Zeebrugge  Belgium  13/3/2005

Deleting the only distribution record of Asterias arenata also deletes what we know about its legs and eyes.
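The same effect in a minimal `sqlite3` sketch, assuming the flat table above is stored as a single table named `drs`: once the last row for Asterias arenata is deleted, a query for its traits returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drs (species TEXT, legs INTEGER, eyes INTEGER, "
    "place TEXT, country TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO drs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
        ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
        ("Asterias arenata", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ],
)

# Remove the (only) distribution record of Asterias arenata ...
conn.execute("DELETE FROM drs WHERE species = 'Asterias arenata'")

# ... and the number of legs and eyes of that species disappears with it.
arenata_traits = conn.execute(
    "SELECT legs, eyes FROM drs WHERE species = 'Asterias arenata'"
).fetchall()
print(arenata_traits)  # []
```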

Making the model more intuitive
A good model should reflect the reality it tries to mirror, including the relationships between the entities. Entities that are separate in real life (they can be abstract) should be modelled separately.

Species          # legs  # eyes  Place      Country  Date
Asterias rubens  5       0       Oostende   Belgium  12/3/2004
Asterias rubens  5       0       Zeebrugge  Belgium  13/3/2005

Column groups marked on the slide: shared, biological, biogeographical.

...and robust
Entries in a database should be atomic. They should:
- not be a combination of several smaller entities, such as "Oostende, Belgium"
- contain no qualifiers (such as "Asterias cfr rubens" or "Asterias ?rubens")
- not be dependent on the value of another field
- not contain repeated values (e.g. several authors for a multi-author publication)
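Two of these rules can be enforced when data are loaded. A minimal sketch with two hypothetical helpers: `split_location` breaks a combined "place, country" string into two atomic fields, and `has_qualifier` flags species names carrying an open-nomenclature qualifier. The list of qualifiers ("cfr", "cf.", "aff.", "?") is an assumption and not exhaustive.

```python
import re

# Qualifiers to flag in species names (assumed, non-exhaustive list).
QUALIFIER_PATTERN = re.compile(r"\b(cfr|cf\.|aff\.)(?=\s)|\?")


def split_location(value):
    """Split a combined 'place, country' string into two atomic fields.
    Raises ValueError if there is no comma to split on."""
    place, country = value.split(",", 1)
    return place.strip(), country.strip()


def has_qualifier(name):
    """True if a species name carries a qualifier such as 'cfr' or '?'."""
    return QUALIFIER_PATTERN.search(name) is not None


print(split_location("Oostende, Belgium"))  # ('Oostende', 'Belgium')
for name in ("Asterias rubens", "Asterias cfr rubens", "Asterias ?rubens"):
    print(name, "->", "qualified" if has_qualifier(name) else "clean")
```

Flagged names would typically go to a separate qualifier field rather than being silently cleaned.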

Avoid bias

Asterias rubens
  Oostende, Belgium, 12/3
  Zeebrugge, Belgium, 13/3
  Wimereux, France, 13/3
Asterias arenata
  Den Osse, Netherlands, 17/3
Cancer pagurus
  Oostende, Belgium, 12/3
  De Panne, Belgium, 12/3
  Den Osse, Netherlands, 14/5
Abra alba
  Oostende, Belgium, 14/5

A nested list is easier to query on the grouping factor of the list than on anything else. It is easy to find in which countries Asterias rubens occurs; to find out which species occur in, say, France, we must read our complete database.
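The asymmetry can be shown with a Python dictionary standing in for the nested list (an assumed representation). Asking about one species is a single lookup; asking about one country means visiting every species. Once the list is flattened into one record per row, both questions become the same kind of filter.

```python
# The nested list from the slide: species -> [(place, country, day/month), ...]
NESTED = {
    "Asterias rubens": [("Oostende", "Belgium", "12/3"),
                        ("Zeebrugge", "Belgium", "13/3"),
                        ("Wimereux", "France", "13/3")],
    "Asterias arenata": [("Den Osse", "Netherlands", "17/3")],
    "Cancer pagurus": [("Oostende", "Belgium", "12/3"),
                       ("De Panne", "Belgium", "12/3"),
                       ("Den Osse", "Netherlands", "14/5")],
    "Abra alba": [("Oostende", "Belgium", "14/5")],
}

# Query on the grouping factor: one lookup.
countries_of_rubens = {country for _, country, _ in NESTED["Asterias rubens"]}

# Query on anything else: every group has to be read.
species_in_france = sorted(
    species
    for species, records in NESTED.items()
    if any(country == "France" for _, country, _ in records)
)

# Flattened to (species, place, country, date) rows, neither question is privileged.
FLAT = [(species, *record) for species, records in NESTED.items() for record in records]
species_in_netherlands = sorted({row[0] for row in FLAT if row[2] == "Netherlands"})

print(countries_of_rubens, species_in_france, species_in_netherlands)
```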

The formal process
The key,
the whole key,
and nothing but the key,
so help me (E.F.) Codd.

N1NF (non-first normal form)

Asterias rubens
  Oostende, Belgium, 12/3
  Zeebrugge, Belgium, 13/3
  Wimereux, France, 13/3
Asterias arenata
  Den Osse, Netherlands, 17/3
Cancer pagurus
  Oostende, Belgium, 12/3
  De Panne, Belgium, 12/3
  Den Osse, Netherlands, 14/5
Abra alba
  Oostende, Belgium, 14/5

N1NF
Structure of the table:
drs (species, legs, eyes, place1, country1, date1, place2, country2, date2, place3, country3, date3)
- Entries are not atomic, so the table is difficult to query
- What if we have a fourth distribution record?
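Going from this wide structure to 1NF means turning every filled (place, country, date) slot into a row of its own. A minimal sketch, assuming the N1NF rows are Python tuples laid out exactly as the drs structure above, with empty slots holding `None`; `to_1nf` is a hypothetical helper name. In the result, a fourth distribution record is just another row, with no change to the structure.

```python
# N1NF rows: species data followed by three (place, country, date) slots.
N1NF_ROWS = [
    ("Asterias rubens", 5, 0,
     "Oostende", "Belgium", "12/3", "Zeebrugge", "Belgium", "13/3",
     "Wimereux", "France", "13/3"),
    ("Asterias arenata", 5, 0,
     "Den Osse", "Netherlands", "17/3", None, None, None, None, None, None),
    ("Cancer pagurus", 10, 2,
     "Oostende", "Belgium", "12/3", "De Panne", "Belgium", "12/3",
     "Den Osse", "Netherlands", "14/5"),
]

SLOT_WIDTH = 3  # place, country, date


def to_1nf(rows):
    """Turn every filled (place, country, date) slot into a row of its own."""
    result = []
    for species, legs, eyes, *slots in rows:
        for start in range(0, len(slots), SLOT_WIDTH):
            place, country, date = slots[start:start + SLOT_WIDTH]
            if place is not None:  # skip unused slots
                result.append((species, legs, eyes, place, country, date))
    return result


ROWS_1NF = to_1nf(N1NF_ROWS)
for row in ROWS_1NF:
    print(row)
```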

1NF

Species          # legs  # eyes  Place      Country  Date
Asterias rubens  5       0       Oostende   Belgium  12/3/2004
Asterias rubens  5       0       Zeebrugge  Belgium  13/3/2005
Asterias rubens  5       0       Zeebrugge  Belgium  14/3/2005
Cancer pagurus   10      2       De Panne   Belgium  12/3/2004
Cancer pagurus   10      2       Oostende   Belgium  12/3/2004
Cancer pagurus   10      2       Zeebrugge  Belgium  14/3/2004
Asterias rubens  5       0       Wimereux   France   13/3/2005
Asterias rubens  5       0       Wimereux   France   14/3/2005
Cancer pagurus   10      2       Wimereux   France   12/3/2004

1NF: the key
A distribution record (a line in our table) is unique when taking into account species, place and date:
drs (species, place, date, legs, eyes, country)
- Table names are usually plural, field (column) names singular
- In this type of analysis keys are underlined; here the key is (species, place, date)
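Sample data can be used to test a proposed key: if any combination of key values occurs twice, the key is wrong. A minimal sketch with a hypothetical helper `is_unique`, run on the nine 1NF rows. Note that data can only disprove a key; whether (species, place, date) stays unique for all future records is a modelling decision.

```python
from collections import Counter

ROWS = [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "De Panne", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Oostende", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Zeebrugge", "Belgium", "14/3/2004"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "Wimereux", "France", "12/3/2004"),
]
SPECIES, LEGS, EYES, PLACE, COUNTRY, DATE = range(6)  # column positions


def is_unique(rows, columns):
    """True if no two rows share the same values in the given column positions."""
    counts = Counter(tuple(row[i] for i in columns) for row in rows)
    return all(n == 1 for n in counts.values())


# All three fields are needed: dropping any one of them produces duplicates.
print(is_unique(ROWS, (SPECIES, PLACE, DATE)))  # True
print(is_unique(ROWS, (SPECIES, PLACE)))        # False
```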

2NF: the whole key
Move repeating groups to separate entities, and look for a key for each new entity: attributes that depend on only part of the compound key are moved out of the original table. Here legs and eyes depend on species alone, and country on place alone.
Distribution records (species, place, date)
Species (species, legs, eyes)
Places (place, country)
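The split can be done by projecting the 1NF rows onto the three new structures and removing duplicates. A minimal sketch using plain Python sets; joining the pieces back on species and place reproduces the original table, which confirms nothing was lost. This only works because each species has a single legs/eyes value and each place a single country, i.e. because those dependencies really hold.

```python
ROWS = [
    ("Asterias rubens", 5, 0, "Oostende", "Belgium", "12/3/2004"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Zeebrugge", "Belgium", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "De Panne", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Oostende", "Belgium", "12/3/2004"),
    ("Cancer pagurus", 10, 2, "Zeebrugge", "Belgium", "14/3/2004"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "13/3/2005"),
    ("Asterias rubens", 5, 0, "Wimereux", "France", "14/3/2005"),
    ("Cancer pagurus", 10, 2, "Wimereux", "France", "12/3/2004"),
]

# Project onto the three 2NF structures; the sets remove repeated entries.
drs_table = sorted({(s, p, d) for s, _, _, p, _, d in ROWS})
species_table = sorted({(s, legs, eyes) for s, legs, eyes, _, _, _ in ROWS})
places_table = sorted({(p, c) for _, _, _, p, c, _ in ROWS})

# Join back on species and place.
legs_eyes = {s: (legs, eyes) for s, legs, eyes in species_table}
country_of = dict(places_table)
rejoined = sorted((s, *legs_eyes[s], p, country_of[p], d) for s, p, d in drs_table)

print(species_table)
print(places_table)
print(rejoined == sorted(ROWS))  # True: the split is lossless
```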

2NF: foreign keys
The one original table was split in three: distribution records (drs), species and places.
- Tables drs and species share a field, species, that allows us to find related records: field species is a foreign key in table drs
- The same holds for drs and places
- Species and places can be populated from reference tables (CoL, the Catalogue of Life; a gazetteer)
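In SQL the shared fields are declared as foreign keys, so the database itself refuses a distribution record that points to an unknown species or place. A minimal sketch with Python's `sqlite3`; note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection. The table layout follows the 2NF structure above; the join at the end reassembles the flat view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE species (species TEXT PRIMARY KEY, legs INTEGER, eyes INTEGER);
    CREATE TABLE places  (place TEXT PRIMARY KEY, country TEXT);
    CREATE TABLE drs (
        species TEXT NOT NULL REFERENCES species (species),
        place   TEXT NOT NULL REFERENCES places (place),
        date    TEXT NOT NULL,
        PRIMARY KEY (species, place, date)
    );
""")
conn.execute("INSERT INTO species VALUES ('Asterias rubens', 5, 0)")
conn.execute("INSERT INTO places VALUES ('Oostende', 'Belgium')")
conn.execute("INSERT INTO drs VALUES ('Asterias rubens', 'Oostende', '12/3/2004')")

# A record for a species missing from the species table is refused.
try:
    conn.execute("INSERT INTO drs VALUES ('Cancer pagurus', 'Oostende', '12/3/2004')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Following the foreign keys rebuilds the original flat view.
joined = conn.execute("""
    SELECT d.species, s.legs, s.eyes, d.place, p.country, d.date
    FROM drs AS d
    JOIN species AS s ON s.species = d.species
    JOIN places  AS p ON p.place   = d.place
""").fetchall()
print(orphan_rejected, joined)
```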

3NF: nothing but the key
Move attributes that are functionally dependent on a non-key attribute into a separate table.
Possible structure (in this case the same as 2NF):
Distribution records (species, place, date)
Places (place, country)
Species (species, legs, eyes)

Elaborating further: IDs
- The key of drs is compound, composed of three fields; it is better to replace it with a synthetic key (an id from an autonumber or sequence)
- The keys of places and species are names with real meaning; anything with meaning in real life can change, so these are also better replaced with an artificial key
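A minimal `sqlite3` sketch of the same three tables with synthetic keys. SQLite's `INTEGER PRIMARY KEY` plays the role of the autonumber or sequence. Keeping the old compound key as a `UNIQUE` constraint is a design choice added here, not something the slide prescribes. The rename from "Oostende" to "Ostend" (the English name of the same town) is a hypothetical change, used to show that it touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE species (
        id      INTEGER PRIMARY KEY,   -- synthetic key, numbered by SQLite
        species TEXT NOT NULL UNIQUE   -- the name is now ordinary data
    );
    CREATE TABLE places (
        id      INTEGER PRIMARY KEY,
        place   TEXT NOT NULL,
        country TEXT NOT NULL
    );
    CREATE TABLE drs (
        id         INTEGER PRIMARY KEY,
        species_id INTEGER NOT NULL REFERENCES species (id),
        place_id   INTEGER NOT NULL REFERENCES places (id),
        date       TEXT NOT NULL,
        UNIQUE (species_id, place_id, date)  -- old compound key kept as a constraint
    );
""")

species_id = conn.execute(
    "INSERT INTO species (species) VALUES (?)", ("Asterias rubens",)
).lastrowid
place_id = conn.execute(
    "INSERT INTO places (place, country) VALUES (?, ?)", ("Oostende", "Belgium")
).lastrowid
conn.execute(
    "INSERT INTO drs (species_id, place_id, date) VALUES (?, ?, ?)",
    (species_id, place_id, "12/3/2004"),
)

# Renaming the place changes one row; drs only stores the numeric id.
conn.execute("UPDATE places SET place = ? WHERE id = ?", ("Ostend", place_id))

records = conn.execute("""
    SELECT s.species, p.place, p.country, d.date
    FROM drs AS d
    JOIN species AS s ON s.id = d.species_id
    JOIN places  AS p ON p.id = d.place_id
""").fetchall()
print(records)
```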

Elaborating further: traits
Our database now has information on the number of legs and the number of eyes. What if we want to start storing colour? That requires a rewrite of the database.
Alternative: split out the data on biological traits into a table with property/value pairs:
Species (id, species, author, parent_id)
Traits (species_id, trait, value)
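A minimal `sqlite3` sketch of the property/value design. The column names follow the slide; storing every value as text and allowing one value per trait per species are assumptions, as is reading `parent_id` as a pointer to the parent taxon. The colour value is illustrative only. Adding a new trait is an `INSERT`, not an `ALTER TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE species (
        id        INTEGER PRIMARY KEY,
        species   TEXT NOT NULL,
        author    TEXT,
        parent_id INTEGER REFERENCES species (id)  -- assumed: parent taxon, e.g. genus
    );
    CREATE TABLE traits (
        species_id INTEGER NOT NULL REFERENCES species (id),
        trait      TEXT NOT NULL,
        value      TEXT NOT NULL,                  -- text, so any trait fits
        PRIMARY KEY (species_id, trait)            -- assumed: one value per trait
    );
""")

rubens_id = conn.execute(
    "INSERT INTO species (species) VALUES (?)", ("Asterias rubens",)
).lastrowid
conn.executemany(
    "INSERT INTO traits (species_id, trait, value) VALUES (?, ?, ?)",
    [(rubens_id, "legs", "5"), (rubens_id, "eyes", "0")],
)

# Storing colour later is one more row, not a new column (illustrative value).
conn.execute(
    "INSERT INTO traits (species_id, trait, value) VALUES (?, ?, ?)",
    (rubens_id, "colour", "orange"),
)

traits_of_rubens = dict(
    conn.execute("SELECT trait, value FROM traits WHERE species_id = ?", (rubens_id,))
)
print(traits_of_rubens)
```

The price of this flexibility is that values lose their data types and queries that compare several traits need extra joins or a pivot.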

Model

Remarks
- Sometimes it is better not to normalise completely
  - Surname and first name as one attribute instead of two
  - Calculated fields to speed up queries
- Sometimes it is better to denormalise completely
  - Exchange formats such as Darwin Core
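Denormalising for exchange is essentially the join run in reverse of the normalisation steps, written out as one flat file. A minimal sketch using Darwin Core term names (`scientificName`, `locality`, `country`, `eventDate`) as CSV headers; a real Darwin Core archive involves more terms and files than this, and only the flattening step is shown. The slides' dates are day/month/year, so they are converted to ISO 8601 on the way out; `to_iso` is a hypothetical helper.

```python
import csv
import io
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (place TEXT PRIMARY KEY, country TEXT);
    CREATE TABLE drs (species TEXT, place TEXT, date TEXT);
    INSERT INTO places VALUES ('Oostende', 'Belgium'), ('Wimereux', 'France');
    INSERT INTO drs VALUES ('Asterias rubens', 'Oostende', '12/3/2004'),
                           ('Cancer pagurus', 'Wimereux', '12/3/2004');
""")


def to_iso(day_month_year):
    """Convert a d/m/yyyy date such as '12/3/2004' to ISO 8601 ('2004-03-12')."""
    return datetime.strptime(day_month_year, "%d/%m/%Y").date().isoformat()


header = ["scientificName", "locality", "country", "eventDate"]
rows = conn.execute("""
    SELECT d.species, d.place, p.country, d.date
    FROM drs AS d JOIN places AS p ON p.place = d.place
    ORDER BY d.species, d.place
""")

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
for species, place, country, date in rows:
    writer.writerow([species, place, country, to_iso(date)])

dwc_csv = buffer.getvalue()
print(dwc_csv)
```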

Final remarks
- Normalisation is a means, not a goal
- Intelligent denormalising is as much an art as normalising!