Normalisation in the Presence of Lists



Similar documents
COSC344 Database Theory and Applications. Lecture 9 Normalisation. COSC344 Lecture 9 1

Database Design and Normalization

Relational Database Design

Database Design and Normal Forms

Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases

Schema Refinement, Functional Dependencies, Normalization

Functional Dependencies and Normalization

Determination of the normalization level of database schemas through equivalence classes of attributes

Design of Relational Database Schemas

Relational Database Design Theory

Theory of Relational Database Design and Normalization

Chapter 10. Functional Dependencies and Normalization for Relational Databases

CS 377 Database Systems. Database Design Theory and Normalization. Li Xiong Department of Mathematics and Computer Science Emory University

Database Management System

Database Management Systems. Redundancy and Other Problems. Redundancy

Week 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy

Schema Refinement and Normalization

Schema Design and Normal Forms Sid Name Level Rating Wage Hours

Relational Database Design: FD s & BCNF

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 5: Normalization II; Database design case studies. September 26, 2005

normalisation Goals: Suppose we have a db scheme: is it good? define precise notions of the qualities of a relational database scheme

Chapter 8. Database Design II: Relational Normalization Theory

Relational Databases

Chapter 7: Relational Database Design

SQL DDL. DBS Database Systems Designing Relational Databases. Inclusion Constraints. Key Constraints

Normalisation. Why normalise? To improve (simplify) database design in order to. Avoid update problems Avoid redundancy Simplify update operations

CS143 Notes: Normalization Theory

Functional Dependency and Normalization for Relational Databases

Functional Dependencies and Finding a Minimal Cover

Normalization Theory for XML

Normalisation to 3NF. Database Systems Lecture 11 Natasha Alechina

Normalization in Database Design

Limitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example

Relational Normalization: Contents. Relational Database Design: Rationale. Relational Database Design. Motivation

Advanced Relational Database Design

Chapter 10. Functional Dependencies and Normalization for Relational Databases. Copyright 2007 Ramez Elmasri and Shamkant B.

Testing LTL Formula Translation into Büchi Automata

Normalization. CIS 331: Introduction to Database Systems

Database Design and Normalization

Boyce-Codd Normal Form

Rigorous Software Development CSCI-GA

Conceptual Design Using the Entity-Relationship (ER) Model

Chapter 5: FUNCTIONAL DEPENDENCIES AND NORMALIZATION FOR RELATIONAL DATABASES

Introduction to Database Systems. Normalization

Normalization of Database

Database Systems Concepts, Languages and Architectures

Introduction Decomposition Simple Synthesis Bernstein Synthesis and Beyond. 6. Normalization. Stéphane Bressan. January 28, 2015

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

A Propositional Dynamic Logic for CCS Programs

Normalization. Functional Dependence. Normalization. Normalization. GIS Applications. Spring 2011

Chapter 7: Relational Database Design

Theory of Relational Database Design and Normalization

Normalization in OODB Design

How To Improve Performance In A Database

Databases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

COMPUTER SCIENCE TRIPOS

Chapter 10 Functional Dependencies and Normalization for Relational Databases

An Algorithmic Approach to Database Normalization

DATABASE NORMALIZATION

DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS

MCQs~Databases~Relational Model and Normalization

Degrees of Truth: the formal logic of classical and quantum probabilities as well as fuzzy sets.

Normalization. Normalization. Normalization. Data Redundancy

Lecture 2 Normalization

Theory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12

Why Is This Important? Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) Example (Contd.)

2. Basic Relational Data Model

BCA. Database Management System

Database Constraints and Design

Lecture Notes on Database Normalization

EQUATIONAL LOGIC AND ABSTRACT ALGEBRA * ABSTRACT

Chapter 6: Episode discovery process

Completing Description Logic Knowledge Bases using Formal Concept Analysis

Functional Dependencies

Non-deterministic Semantics and the Undecidability of Boolean BI

A Web-Based Environment for Learning Normalization of Relational Database Schemata

A Note on Context Logic

SOLUTIONS TO ASSIGNMENT 1 MATH 576

RELATIONAL DATABASE DESIGN

Linear Algebra. A vector space (over R) is an ordered quadruple. such that V is a set; 0 V ; and the following eight axioms hold:

CSE 135: Introduction to Theory of Computation Decidability and Recognizability

Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms

Semantics and Verification of Software

Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011

6.2 Permutations continued

Scanner. tokens scanner parser IR. source code. errors

SPARQL: Un Lenguaje de Consulta para la Web

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

Lambda Calculus between Algebra and Topology. Antonino Salibra

Aikaterini Marazopoulou

FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer

Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010

Transcription:

1 Normalisation in the Presence of Lists Sven Hartmann, Sebastian Link Information Science Research Centre, Massey University, Palmerston North, New Zealand 1. Motivation & Revision of the RDM 2. The Brouwerian Algebra of Nested Attributes 3. Axiomatisation of FDs 4. Redundancies and the Nested List Normal Form 5. Update Anomalies 6. Future Work

ADC 2004, 20.01., Dunedin, New Zealand 2 1.1 A first Example consider Factor(Integer,Prime[Number],Exp[Number]) a typical snapshot r consists of the following tuples (12,[2,3],[2,1]), (35,[5,7],[1,1]), (37,[37],[1]), (936,[2,3,13],[3,2,1]) FDs on Factor(Integer,Prime[Number],Exp[Number]) are Factor(Integer), Factor(Prime[Number],Exp[Number]) minimal keys Factor(Prime[λ]) Factor(Exp[λ]) Factor(Exp[λ]) Factor(Prime[λ]) Is this a well-designed database? Why? Why not?

ADC 2004, 20.01., Dunedin, New Zealand 3 1.2 Boyce-Codd Normal Form in the RDM FDs with X, Y R introduced by E.F. Codd in 1972 = r X Y iff t 1 [Y ] = t 2 [Y ], if t 1 [X] = t 2 [X] for any t 1, t 2 r Armstrong Axioms X Y Y X X Y X X Y X Y, Y Z X Z form minimal, sound and complete set of Inference Rules (R, Σ) in BCNF iff every X Y Σ trivial or X is R-superkey justification: (R, Σ) in BCNF iff every X Y Σ is trivial or X superkey for R iff every R-relation r with = r Σ key satisfies = r Σ iff R non-redundant wrt. Σ iff R does not have any insertion anomalies iff R does not have any replacement anomalies (type 1 or 2) only if R does not have any type 3 replacement anomalies

ADC 2004, 20.01., Dunedin, New Zealand 4 1.3 Challenges with Advanced Data Models find unifying framework, extend achievements to complex object types classify data models according to the types they support Dependencies Join Problems Inclusion MVDs FDs Synthesis/Decomposition Justification of Normal Forms Normal Forms Implication Axiomatisation records sets multisets lists unions references Data Models

ADC 2004, 20.01., Dunedin, New Zealand 5 1.4 The Need for Lists here: values, records of values and lists of values ordered relations time-series data, meteorological and astronomical data streams, runs of experimental data, multidimensional arrays, textual information, voices, sound, images, video, etc. subject to studies in deductive and temporal database community occur naturally in object-oriented databases (XML) bioinformatics: lists occur naturally in genomic sequence databases

ADC 2004, 20.01., Dunedin, New Zealand 6 2.1 Syntax: Nested Attributes capture characteristics of objects in target database by attributes finite set U of flat attributes and dom(a) for all A U use set L of labels with U L = and λ / U L nested attributes N A(U, L): flat attributes U N A, null attribute λ N A, record-valued attributes L(N 1,..., N k ) N A, if L L and N 1,..., N k N A with k 1 list-valued attributes L[N] N A, if L L and N N A Factor(Integer,Prime[Number],Exp[Number]) is nested attribute

ADC 2004, 20.01., Dunedin, New Zealand 7 2.2 Semantics: Domain Assignment extend mapping dom from flat attributes to nested attributes by: dom(λ) = {ok}, dom(l(n 1,..., N k )) = {(v 1,..., v k ) v i dom(n i )}, dom(l[n]) = {[v 1,..., v n ] v i dom(n)} empty list denoted by [ ] RDM: record-valued attributes only Dom(Factor(Integer,Prime[Number],Exp[Number])): 3-tuples where first component is some integer value and second and third components are both a finite list of integer values

ADC 2004, 20.01., Dunedin, New Zealand 8 define N A N A by: 2.3 Subattributes N N for all nested attributes N N A, λ A for all flat attributes A U, λ N for all list-valued attributes N N A, L(N 1,..., N k ) L(M 1,..., M k ), if N i M i for all i = 1,..., k, L[N] L[M], if N M Factor(λ, Prime[Number], Exp[λ]) is subattribute of Factor(Integer, Prime[Number], Exp[Number]) subattribute relation on nested attributes is partial order

ADC 2004, 20.01., Dunedin, New Zealand 9 2.4 Semantics on Subattributes: Projection Function for M N define π N M : Dom(N) Dom(M) by: π N N π N λ : v v, : v ok, π L(N 1,...,N k ) L(M 1,...,M k ) : (v 1,..., v k ) (π N 1 M 1 (v 1 ),..., π N k M k (v k )), π L[N ] L[M ] : [v 1,..., v n ] [π N M (v 1 ),..., π N M (v n )] [ ] mapped to itself, except when projected on λ projection of (12, [2, 3], [2, 1]) on Factor(λ,Prime[Number],Exp[λ]) is (ok, [2, 3], [ok, ok])

ADC 2004, 20.01., Dunedin, New Zealand 10 2.5 Operations on Subattributes Sub(N) = {X N A X N}: λ N, Y N Z, Y N Z, Y. N Z: λ N = L(λ N1,..., λ Nk ), if N = L(N 1,..., N k ), and λ N = λ else, X Y : X N Y = Y, X N Y = X and X. N Y = λ N, X. N λ N = X, N = L(N 1,..., N k ), X = L(X 1,..., X k ), Y = L(Y 1,..., Y k ): X N Y = L(X 1 N1 Y 1,..., X k Nk Y k ) for {,, }. N = L[M], X = L[X ], Y = L[Y ], {, }: X N Y = L[X M Y ], X Y : X. N Y = L[X. M Y ]

ADC 2004, 20.01., Dunedin, New Zealand 11 2.6 The Brouwerian Algebra of Subattributes (Sub(N),, N, N,. N, N) is a Brouwerian Algebra (Sub(N),, N, N ) is a lattice N is top element Pseudo-Difference Z. Y of Z and Y in Sub(N) satisfies for all X Sub(N) Z. Y X if and only if Z Y X Brouwerian Complement: Y C N = N. N Y satisfies Y C X if and only if X Y = N bottom element λ N = N. N (Sub(N),, N, N, ( ) C N, λ N, N) is not a Boolean Algebra every Brouwerian Algebra is distributive

ADC 2004, 20.01., Dunedin, New Zealand 12 2.7 Algebra of Factor(Integer,Prime[Number],Exp[Number])

ADC 2004, 20.01., Dunedin, New Zealand 13 3.1 Axiomatisation of Functional Dependencies functional dependency on nested attribute N is X Y with X, Y Sub(N) finite r Dom(N) satisfies X Y on N ( = r X Y) iff π N X (t 1) = π N X (t 2) implies π N Y (t 1) = π N Y (t 2) The generalized Armstrong Axioms for FDs on N X Y Y X, X Y X X N Y, X Y, Y Z X Z form a minimal, sound and complete set of inference rules for the implication of FDs.

ADC 2004, 20.01., Dunedin, New Zealand 14 3.2. Basis Attributes and Keys subattribute basis of N: smallest subset SubB(N) Sub(N) st every X Sub(N) is X = Z for some Z SubB(N) MaxB(N) are maximal elements, NMaxB(N) the non-maximal subattribute basis of Factor(Integer,Prime[Number],Exp[Number]): Factor(Integer,λ,λ), Factor(λ,Prime[Number],λ), Factor(λ,λ,Exp[Number]), Factor(λ,Prime[λ],λ), Factor(λ,λ,Exp[λ]) X Sub(N) superkey for N wrt Σ iff Σ = X N X Sub(N) minimal key for N wrt Σ iff X superkey and there is no proper subattribute of X also being superkey for N Factor(Integer,λ,λ) and Factor(λ,Prime[Number],Exp[Number]) are minimal keys

ADC 2004, 20.01., Dunedin, New Zealand 15 3.3 Trivial FDs and BCNF for Lists = r X Y for all r Dom(N) iff Y X trivial FD on Factor(Integer,Prime[Number],Exp[Number]): Factor(Prime[Number]) Factor(Prime[λ]) (N, Σ) in BCNF iff every X Y Σ trivial or X superkey Factor(Integer,Prime[Number],Exp[Number]) not in BCNF wrt Σ as Factor(Prime[λ]) Factor(Exp[λ]) is not trivial and Factor(Prime[λ]) is not a superkey

ADC 2004, 20.01., Dunedin, New Zealand 16 4.1 Redundancy and Inevitable FDs (N, Σ) redundant iff r Dom(N). = r Σ and t 1, t 2 r.t 1 t 2 and π N X Y (t 1) = π N X Y (t 2) for some non-trivial X Y Σ Factor(Prime[λ]) Factor(Exp[λ]) causes redundancies, i.e., defined in terms of non-maximal basis attribute non-maximal attributes cannot be separated from superattributes calls for better notion of redundancy Σ inev = {X Y Σ + Y X or Y NMaxB(N)} Σ inev + set of inevitable FDs on N wrt Σ (N, Σ) redundant iff r Dom(N). = r Σ and t 1, t 2 r.t 1 t 2 and πx Y N (t 1) = πx Y N (t 2) for some X Y Σ not inevitable Theorem: (N, Σ) redundant iff (N, Σ ) redundant

ADC 2004, 20.01., Dunedin, New Zealand 17 4.2 The Nested List Normal Form (N, Σ) in NLNF iff every X Y Σ is inevitable or X superkey Factor(Integer,Prime[Number],Exp[Number]) is in NLNF wrt Σ BCNF implies NLNF, but not vice versa Theorem: (N, Σ) in NLNF iff (N, Σ) non-redundant Theorem: (N, Σ) in NLNF iff every X Y Σ is inevitable or X superkey Theorem: (N, Σ) in NLNF iff every r Dom(N) with = r Σ key Σ + inev satisfies = r Σ

ADC 2004, 20.01., Dunedin, New Zealand 18 5.1 Update Anomalies I (N, Σ) has strong insertion anomaly iff r Dom(N) with = r Σ and t / r with = r {t} Σ key Σ + inev, but = r {t} Σ Theorem: (N, Σ) in NLNF iff (N, Σ) does not have any strong insertion anomaly (N, Σ) has strong type-1 replacement anomaly iff r Dom(N) with = r Σ and t r, t Dom(N) with π N K (t) = π N K (t ) for minimal key K on N and = r {t} {t } Σ key Σ + inev and = r {t} {t } Σ Theorem: (N, Σ) in NLNF iff (N, Σ) does not have any strong type- 1 replacement anomaly

ADC 2004, 20.01., Dunedin, New Zealand 19 5.2 Update Anomalies II (N, Σ) has strong type-2 replacement anomaly iff r Dom(N) with = r Σ and t r, t Dom(N) with π N K (t) = πn K (t ) for distinguished minimal key K on N and = r {t} {t } Σ key Σ + inev and = r {t} {t } Σ Theorem: (N, Σ) in NLNF iff (N, Σ) does not have any strong type- 2 replacement anomaly (N, Σ) has strong type-3 replacement anomaly iff r Dom(N) with = r Σ and t r, t Dom(N) with π N K (t) = πn K (t ) for all minimal keys K on N and = r {t} {t } Σ key Σ + inev and = r {t} {t } Σ Theorem: (N, Σ) in NLNF does not have any strong type-3 replacement anomaly

ADC 2004, 20.01., Dunedin, New Zealand 20 Record and List Type: 6.1 Extensions: More Results FD implication decidable in linear time MVDs: minimal axiomatisation and finite implication problem FDs/MVDs: minimality and finite implication problem Record, List, Set and Multiset Type: sophisticated axiomatisation for FDs FD implication decidable in O(n 3 s min{s, n}) CVNF proposed and justified XML: several classes of FDs, Axiomatisations

6.2 Extensions: Future Research decomposition/synthesis for NLNF, CVNF unions, references interactions different classes of dependencies: inclusion and join dependencies XML: FIP and Normalisation 21