CS 377 Database Systems
Database Design Theory and Normalization
Li Xiong
Department of Mathematics and Computer Science
Emory University
Relational database design
So far
- Conceptual database design: ER model
- Logical database design: relational model
  - Mapping from ER to relational model
  - Relational algebra
  - SQL
- Relational database design: relational model
  - Goodness measures of relational schemas
Some Bad Designs
Anomalies
- Insert anomaly: an insert operation that inserts ONE item of information must insert multiple tuples into some relation or must use NULL values.
- Delete anomaly: a delete operation that deletes ONE item of information must delete multiple tuples from some relation or causes additional (unintended) information loss.
- Update anomaly: an update operation that updates ONE item of information must update multiple tuples and may result in logical inconsistencies.
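The update and delete anomalies can be illustrated with a small sketch. The relation `emp_dept` and its values are made up for illustration, and `rename_department` is a hypothetical helper, not part of any library.

```python
# Hypothetical denormalized relation: employee facts and department facts
# are stored together, so each department name is repeated per employee.
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration"},
]

def rename_department(rows, dnumber, new_name):
    """Change ONE fact (a department name); return how many tuples were touched."""
    touched = 0
    for row in rows:
        if row["Dnumber"] == dnumber:
            row["Dname"] = new_name
            touched += 1
    return touched

# Update anomaly: one logical change has to be applied to several tuples.
touched = rename_department(emp_dept, 5, "R&D")

# Delete anomaly: removing the only employee of department 4
# also removes the only record of that department's name.
remaining = [row for row in emp_dept if row["Ssn"] != "333"]
dept_names = {row["Dnumber"]: row["Dname"] for row in remaining}
```

Here `touched` counts the tuples rewritten for a single change, and `dept_names` no longer contains department 4 after the delete.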
Generation of Spurious Tuples
- Figure 15.5(a): relation schemas EMP_LOCS and EMP_PROJ1
- The NATURAL JOIN of EMP_LOCS and EMP_PROJ1 produces many more tuples than the original set of tuples in EMP_PROJ
- The extra tuples are called spurious tuples: they represent spurious information that is not valid
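A minimal sketch of how spurious tuples arise. It assumes a reduced attribute set for EMP_PROJ and a made-up two-tuple state (not the data in Figure 15.5); `natural_join` and `project` are hypothetical helpers written for this example.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

def project(r, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen, out = set(), []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

# Hypothetical state of EMP_PROJ (illustrative values only).
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Ename": "Smith", "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 2, "Ename": "Wong", "Plocation": "Houston"},
]

# Decompose so that the only shared attribute is Plocation,
# which is neither a key nor a foreign key.
emp_locs = project(emp_proj, ["Ename", "Plocation"])
emp_proj1 = project(emp_proj, ["Ssn", "Pnumber", "Plocation"])

joined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in joined if t not in emp_proj]
```

The join pairs every Houston project tuple with every Houston employee name, so `joined` contains tuples that were never in `emp_proj`.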
Problematic Designs
- Anomalies cause redundant work to be done
- Waste of storage space due to NULLs
- Difficulty of performing operations and joins due to NULL values
- Generation of invalid and spurious data during joins
Informal Design Guidelines for Good Relation Schemas
- Clear schema and attribute semantics
- No insertion, deletion, or update anomalies
- Reduced redundant information in tuples
- Reduced NULL values in tuples
- Relations can be joined with equality conditions on related attributes, with a guarantee that no spurious tuples are generated
Database Design Theory
Normal forms
- Each normal form defines a set of properties that relations must satisfy
- Relations that possess these properties exhibit fewer anomalies
- Successively higher degrees of stringency
Database normalization
- Certify whether a database design satisfies a certain normal form
- Correct a database design to achieve a certain normal form
Additional properties
- Nonadditive join property
- Dependency preservation property
History
- Relational database model: 1970, Codd
- 1NF, 2NF, and 3NF (first, second, and third normal form): 1972, Codd
  - Based on the concept of functional dependency
- BCNF (Boyce-Codd normal form): 1974, Boyce & Codd
  - A new and stronger 3NF
- 4NF: 1977, Fagin
  - Multi-valued dependencies
- 5NF (projection-join normal form): 1979, Fagin
First Normal Form
- Part of the formal definition of a relation in the basic (flat) relational model
- The only attribute values permitted are single atomic (indivisible) values
Techniques to achieve first normal form
- Remove the attribute that violates 1NF and place it in a separate relation
- Expand the key
- Use several atomic attributes if the maximum number of values is known
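The first technique (moving the offending attribute into a separate relation) might look like the sketch below. The `department` state and the helper `split_multivalued` are assumptions made for illustration.

```python
# Hypothetical non-1NF state: Dlocations holds a set of values per tuple.
department = [
    {"Dnumber": 5, "Dname": "Research",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]

def split_multivalued(rows, key, attr, new_attr):
    """Move a multivalued attribute into its own relation keyed by (key, new_attr)."""
    # Base relation: every attribute except the multivalued one.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # Detail relation: one tuple per (key value, single atomic value).
    detail = [{key: row[key], new_attr: value}
              for row in rows for value in row[attr]]
    return base, detail

dept, dept_locations = split_multivalued(
    department, "Dnumber", "Dlocations", "Dlocation")
```

After the split, every value in `dept` and `dept_locations` is atomic, and the two relations can be rejoined on `Dnumber`.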
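Functional Dependency
- A constraint between two sets of attributes, written X → Y: whenever two tuples agree on X, they must also agree on Y
  - X functionally determines Y
  - Y is functionally dependent on X
Notes
- If X is a candidate key of R, then X → R
- If X → Y, it is not necessarily true that Y → X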
Example FDs
Functional Dependency
- An FD is a property of the semantics (meaning) of the attributes
- An FD is a property of the relation schema, not of a particular legal relation state
- An FD must be defined based on the semantics of the attributes
- An FD cannot be inferred automatically from a given populated relation
  - A populated relation can only suggest that an FD may exist
  - It can show that an FD does not hold, if it contains tuples that violate the FD
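A short sketch of the last point: a relation state can refute an FD but cannot prove one. The `teach` state and the helper `fd_violations` are assumptions made for this example.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of tuples that agree on lhs but differ on rhs."""
    first_seen = {}
    bad = []
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in first_seen and first_seen[x][1] != y:
            bad.append((first_seen[x][0], row))
        first_seen.setdefault(x, (row, y))
    return bad

# Hypothetical relation state (illustrative values only).
teach = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffman"},
    {"Teacher": "Brown", "Course": "Data Structures", "Text": "Horowitz"},
]

# Teacher -> Course is refuted: Smith appears with two different courses.
refuted = fd_violations(teach, ["Teacher"], ["Course"])

# Text -> Course has no violations in this state, so it MAY hold,
# but only the attribute semantics can confirm it.
unrefuted = fd_violations(teach, ["Text"], ["Course"])
```

An empty result means only that this state is consistent with the FD; a later insert could still violate it.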
Definitions of Keys and Attributes Participating in Keys
- Definition of superkey and key
- Candidate key: if a relation schema has more than one key, each is a candidate key
  - One is the primary key
  - The others are secondary keys
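As a sketch, candidate keys can be enumerated from a set of FDs by brute force. This uses attribute closure (a set X is a superkey when its closure covers all attributes), which is a standard technique but is not introduced on these slides. The schema CAR and its FDs are assumed for illustration, and `closure` and `candidate_keys` are hypothetical helpers.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate minimal superkeys, smallest first (exponential; small schemas only)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= attributes:
                keys.append(frozenset(candidate))
    return keys

# Assumed schema CAR(State, Reg_no, Serial_no, Make) with assumed FDs.
attributes = frozenset({"State", "Reg_no", "Serial_no", "Make"})
fds = [
    (frozenset({"State", "Reg_no"}), frozenset({"Serial_no", "Make"})),
    (frozenset({"Serial_no"}), frozenset({"State", "Reg_no", "Make"})),
]

keys = candidate_keys(attributes, fds)
```

Under these assumed FDs the relation has two candidate keys; a designer would pick one as the primary key and treat the other as a secondary key.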
Second Normal Form
Full functional dependency vs. partial functional dependency
- X → Y is a full functional dependency if, for any attribute A in X, (X - {A}) does not functionally determine Y
- X → Y is a partial functional dependency if, for some attribute A in X, (X - {A}) functionally determines Y
Second normal form (2NF)
- Problematic FD: the left-hand side is part of the primary key
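The partial-dependency test can be sketched as follows, assuming the usual reading of 2NF (no nonprime attribute is partially dependent on the primary key). The FDs for EMP_PROJ are assumed for illustration, and `closure` and `partial_dependencies` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds, nonprime):
    """Find (X - {A}, B) pairs where dropping A from the key still determines nonprime B."""
    found = []
    for a in sorted(key):
        reduced = set(key) - {a}
        for b in sorted(closure(reduced, fds) & nonprime):
            found.append((frozenset(reduced), b))
    return found

# Assumed FDs for EMP_PROJ with primary key {Ssn, Pnumber}.
fds = [
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"Ssn"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
]
key = {"Ssn", "Pnumber"}
nonprime = {"Hours", "Ename", "Pname", "Plocation"}

partial = partial_dependencies(key, fds, nonprime)
partially_dependent = {b for _, b in partial}
```

Under these assumptions Hours is fully dependent on the key, while Ename, Pname, and Plocation each depend on only part of it, so EMP_PROJ is not in 2NF.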
Third Normal Form
Transitive dependency
- X → Y is a transitive dependency if, for some Z that is not a prime attribute, both X → Z and Z → Y hold
Third normal form (3NF)
- Problematic FDs:
  - The left-hand side is part of the primary key
  - The left-hand side is a nonkey attribute
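A rough check for 3NF violations, assuming the general formulation (an FD X → A is acceptable when X is a superkey or A is prime). The sketch inspects only the FDs that are listed, not every FD they imply. The EMP_DEPT attributes and FDs are assumed for illustration, and `closure` and `third_nf_violations` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attributes, fds, prime):
    """Return (lhs, A) pairs where lhs is not a superkey and A is nonprime."""
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs is a superkey, so the FD is fine
        for a in sorted(rhs - lhs - prime):
            bad.append((lhs, a))
    return bad

# Assumed schema EMP_DEPT with primary key Ssn.
attributes = frozenset({"Ssn", "Ename", "Dnumber", "Dname", "Dmgr_ssn"})
fds = [
    (frozenset({"Ssn"}), frozenset({"Ename", "Dnumber"})),
    (frozenset({"Dnumber"}), frozenset({"Dname", "Dmgr_ssn"})),
]
prime = frozenset({"Ssn"})

bad = third_nf_violations(attributes, fds, prime)
```

Here Dname and Dmgr_ssn depend on Ssn only through the nonkey attribute Dnumber, which is the transitive pattern that 3NF rules out.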
Summary
BCNF: Boyce-Codd Normal Form
- Difference from 3NF: for an FD X → A where X is not a superkey, 3NF allows A to be prime; BCNF does not
- Every relation in BCNF is also in 3NF
- Most relation schemas that are in 3NF are also in BCNF, but not all
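A sketch of a schema that is in 3NF but not in BCNF. The relation TEACH(Student, Course, Instructor) and its FDs are assumed for illustration, and only the listed FDs are checked. `closure` and `nf_violations` are hypothetical helpers; the flag `allow_prime_rhs` switches between the 3NF test and the BCNF test.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nf_violations(attributes, fds, prime, allow_prime_rhs):
    """Violations of 3NF (allow_prime_rhs=True) or BCNF (allow_prime_rhs=False)."""
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs is a superkey: fine in both normal forms
        for a in sorted(rhs - lhs):
            if allow_prime_rhs and a in prime:
                continue  # 3NF tolerates a prime attribute on the right-hand side
            bad.append((lhs, a))
    return bad

# Assumed schema and FDs. The candidate keys are {Student, Course} and
# {Student, Instructor}, so every attribute is prime.
attributes = frozenset({"Student", "Course", "Instructor"})
fds = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]
prime = attributes

violations_3nf = nf_violations(attributes, fds, prime, allow_prime_rhs=True)
violations_bcnf = nf_violations(attributes, fds, prime, allow_prime_rhs=False)
```

Instructor → Course passes the 3NF test only because Course is prime; BCNF rejects it because Instructor is not a superkey.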
Summary
- Informal guidelines for good design
- Functional dependency
- Normal forms: 1NF, 2NF, 3NF, BCNF