Chapter 8. Database Design II: Relational Normalization Theory


The ER approach is a good way to start dealing with the complexity of modeling a real-world enterprise. However, it is only a set of guidelines that requires considerable expertise and intuition, and it can lead to several alternative designs for the same enterprise. Unfortunately, the ER approach does not include criteria or tools to help evaluate alternative designs and suggest improvements. In this chapter, we present relational normalization theory, a set of concepts and algorithms that can help with the evaluation and refinement of the designs obtained through the ER approach.

The main tool used in normalization theory is the notion of functional dependency (and, to a lesser degree, join dependency). Functional dependency is a generalization of the key dependencies in the ER approach, whereas join dependencies have no ER counterpart. Both types of dependency are used by designers to spot situations in which the ER design unnaturally places attributes of two distinct entity types into the same relation schema. These situations are characterized in terms of normal forms, whence comes the term normalization theory. Normalization theory forces relations into an appropriate normal form using decompositions, which break up schemas involving unhappy unions of attributes of unrelated entity types. Because of the central role that decompositions play in relational design, the techniques that we are about to discuss are sometimes also called relational decomposition theory.

8.1 THE PROBLEM OF REDUNDANCY

An example is the best way to understand the potential problems with relational designs based on the ER approach. Consider the CREATE TABLE Person statement (5.1) on page 93. Recall that this relation schema was obtained by direct translation from the ER diagram in Figure 5.1.
The first indication of something wrong with this translation was the realization that SSN is not a key of the resulting Person relation. Instead, the key is the combination (SSN, Hobby). In other words, the attribute SSN does not uniquely identify the tuples in the Person relation even though it
does uniquely identify the entities in the Person entity set. Not only is this counterintuitive, but it also has a number of undesirable effects on the instances of the Person relation schema. To see this, we take a closer look at the relation instance shown in Figure 5.2, page 93. Notice that John Doe and Mary Doe are both represented by multiple tuples and that their addresses, names, and Ids occur multiple times as well. Redundant storage of the same information is apparent here. However, wasted space is the least of the problems. The real issue is that when database updates occur, we must keep all the redundant copies of the same data consistent with each other, and we must do it efficiently. Specifically, we can identify the following problems:

Update anomaly. If John Doe moves to 1 Hill Top Dr., updating the relation in Figure 5.2 requires changing the address in both tuples that describe the John Doe entity.

Insertion anomaly. Suppose that we decide to add Homer Simpson to the Person relation, but Homer's information sheet does not specify any hobbies. One way around this problem might be to add the tuple ⟨…, Homer Simpson, Fox 5 TV, NULL⟩, that is, to fill in the missing field with NULL. However, Hobby is part of the primary key, and most DBMSs do not allow null values in primary keys. Why? For one thing, DBMSs generally maintain an index on the primary key, and it is not clear how the index should refer to the null value. Assuming that this problem can be solved, suppose that a request is made to insert ⟨…, Homer Simpson, Fox 5 TV, acting⟩. Should this new tuple just be added, or should it replace the existing tuple ⟨…, Homer Simpson, Fox 5 TV, NULL⟩? A human will most likely choose to replace it, because humans do not normally think of hobbies as a defining characteristic of a person. However, how does a computer know that the tuples with primary key values ⟨…, NULL⟩ and ⟨…, acting⟩ refer to the same entity?
(Recall that the information about which tuple came from which entity is lost in the translation!) Redundancy is at the root of this ambiguity: if Homer were described by at most one tuple, only one course of action would be possible.

Deletion anomaly. Suppose that Homer Simpson is no longer interested in acting. How are we to delete this hobby? We can, of course, delete the tuple that talks about Homer's acting hobby. However, since there is only one tuple that refers to Homer (see Figure 5.2), this throws out perfectly good information about Homer's Id and address. To avoid this loss of information, we can try to replace acting with NULL. Unfortunately, this again raises the issue of nulls in primary key attributes. Once again, redundancy is the culprit: if only one tuple could possibly describe Homer, the attribute Hobby would not be part of the key.

For convenience, we sometimes use the term update anomalies to refer to all of the above anomaly types.
Figure 8.1 Decomposition of the Person relation shown in Figure 5.2: (a) the Person1 relation, with one tuple per person (John Doe, 123 Main St; Mary Doe, 7 Lake Dr; Bart Simpson, Fox 5 TV); (b) the Hobby relation, pairing each SSN with a hobby (stamps, hiking, coins, hiking, skating, acting).

8.2 DECOMPOSITIONS

The problems caused by redundancy (wasted storage and anomalies) can be easily fixed using the following simple technique. Instead of having one relation describe all that is known about persons, we can use two separate relation schemas:

    Person1(SSN, Name, Address)
    Hobby(SSN, Hobby)                                                  (8.1)

Projecting the relation in Figure 5.2 on each of these schemas yields the result shown in Figure 8.1. The new design has the following important properties:

1. The original relation of Figure 5.2 is exactly the natural join of the two relations in Figure 8.1. In fact, one can prove that this property is not an artifact of our example; that is, under certain natural assumptions,1 any relation, r, over the schema of Figure 5.2 equals the natural join of the projections of r on the two relational schemas in (8.1). This property, called losslessness, is discussed later in this chapter. It means that our decomposition preserves the original information represented by the Person relation.

2. The redundancy present in the original relation of Figure 5.2 is gone, and so are the update anomalies. The only items that are stored more than once are SSNs, which are identifiers of entities of type Person. Thus, changes to addresses, names, or hobbies now affect only a single tuple. Likewise, the removal of Bart Simpson's hobbies from the database does not delete the information about his address and does not require us to rely on null values. The insertion anomaly is also gone because we can now add people and hobbies independently.

Observe that the new design still has a certain amount of redundancy and that we might still need to use null values in certain cases.
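The lossless-join property of decomposition (8.1) can be demonstrated concretely. The following sketch uses plain Python sets of tuples; the SSN values are hypothetical (the real values do not appear in Figure 5.2's transcription and are immaterial to the argument):

```python
# Hypothetical instance of the Person relation (SSN values made up;
# names, addresses, and hobbies follow Figures 5.2 and 8.1).
person = {
    ("111-11-1111", "John Doe", "123 Main St", "stamps"),
    ("111-11-1111", "John Doe", "123 Main St", "hiking"),
    ("222-22-2222", "Mary Doe", "7 Lake Dr", "coins"),
    ("222-22-2222", "Mary Doe", "7 Lake Dr", "hiking"),
    ("222-22-2222", "Mary Doe", "7 Lake Dr", "skating"),
    ("333-33-3333", "Bart Simpson", "Fox 5 TV", "acting"),
}

# Project onto the two schemas of (8.1).
person1 = {(ssn, name, addr) for (ssn, name, addr, _) in person}
hobby = {(ssn, h) for (ssn, _, _, h) in person}

# The natural join on SSN reconstructs the original relation exactly,
# because SSN determines Name and Address in the instance.
rejoined = {(ssn, name, addr, h)
            for (ssn, name, addr) in person1
            for (s2, h) in hobby if s2 == ssn}
assert rejoined == person
```

Note that updating John Doe's address now means changing a single tuple of person1; the join reflects the change in every reconstructed row.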
First, since we use SSNs as tuple identifiers, each SSN can occur multiple times, and all of these occurrences must be kept consistent across the database. So consistency maintenance has not been eliminated completely. However, if the identifiers are not dynamic (for instance, SSNs, which do not change frequently), consistency maintenance is considerably simplified. Second, imagine a situation in which we add a person to our Person1 relation and the address is not known. Clearly, even with the new design, we have to insert NULL in the Address field for the corresponding tuple. However, Address is not part of a primary key, so the use of NULL here is not that bad (although we will still have difficulties joining Person1 on the Address attribute).

1 Namely, that SSN uniquely determines the name and address of a person.

It is important to understand that not all decompositions are created equal. In fact, most of them do not make any sense even though they might be doing a good job at eliminating redundancy. The decomposition

    Ssn(SSN)
    Name(Name)
    Address(Address)                                                   (8.2)
    Hobby(Hobby)

is the ultimate redundancy eliminator. Projecting the relation of Figure 5.2 on these schemas yields a database where each value appears exactly once, as shown in Figure 8.2.

Figure 8.2 The ultimate decomposition: one single-attribute relation each for SSN, Name (John Doe, Mary Doe, Bart Simpson), Address (123 Main St., 7 Lake Dr., Fox 5 TV), and Hobby (stamps, hiking, coins, skating, acting).

Unfortunately, this new database is completely devoid of any useful information; for instance, it is no longer possible to tell where John Doe lives or who collects stamps as a hobby. This situation is in sharp contrast with the decomposition of Figure 8.1, where we were able to completely restore the information represented by the original relation using a natural join.

The need for schema refinement. The translation of the Person entity type into the relational model indicates that one cannot rely solely on the ER approach for designing database schemas.
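By contrast, the single-attribute decomposition (8.2) is lossy: with no attributes shared between the projections, the only way to recombine them is a Cartesian product, which manufactures spurious tuples. A small sketch, again with hypothetical SSN values:

```python
person = {
    ("111-11-1111", "John Doe", "123 Main St", "stamps"),
    ("111-11-1111", "John Doe", "123 Main St", "hiking"),
    ("222-22-2222", "Mary Doe", "7 Lake Dr", "coins"),
}

# The "ultimate" projections of (8.2): one relation per attribute.
ssns = {t[0] for t in person}
names = {t[1] for t in person}
addresses = {t[2] for t in person}
hobbies = {t[3] for t in person}

# Joining relations that share no attributes degenerates into a
# Cartesian product: the original tuples are buried among fabrications.
recombined = {(s, n, a, h) for s in ssns for n in names
              for a in addresses for h in hobbies}
assert person < recombined                 # a strict superset of the original
assert len(recombined) == 2 * 2 * 2 * 3    # 24 combinations, only 3 genuine
```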
Furthermore, the problems exhibited by the Person example are by no means rare or unique. Consider the relationship HasAccount discussed earlier. A typical translation of HasAccount into the relational model might be

    CREATE TABLE HasAccount (
        AccountNumber INTEGER NOT NULL,
        ClientId CHAR(20),                                             (8.3)
        OfficeId INTEGER,
        PRIMARY KEY (ClientId, OfficeId),
        FOREIGN KEY (OfficeId) REFERENCES Office )

Recall that a client can have at most one account in an office and hence (ClientId, OfficeId) is a key. Also, an account must be assigned to exactly one office. Careful analysis shows that this requirement leads to some of the same problems that we saw in the Person example. For example, a tuple that records the fact that a particular account is managed by a particular office cannot be added without also recording client information (since ClientId is part of the primary key), which is an insertion anomaly. This (and the dual deletion anomaly) is perhaps not a serious problem, because of the specifics of this particular application, but the update anomaly could present maintenance issues: moving an account from one office to another involves changing OfficeId in every tuple corresponding to that account. If the account has multiple clients, this might be a problem. We return to this example later in this chapter because HasAccount exhibits certain interesting properties not found in the Person example. For instance, even though a decomposition of HasAccount might still be desirable, it incurs additional maintenance overhead that the decomposition of Person does not.

The above discussion brings out two key points: (1) decomposition of relation schemas can serve as a useful tool that complements the ER approach by eliminating redundancy problems; (2) the criteria for choosing the right decomposition are not immediately obvious, especially when we have to deal with schemas that contain many attributes. For these reasons, the purpose of Sections 8.3 through 8.6 is to develop techniques and criteria for identifying relation schemas that are in need of decomposition, as well as to understand what it means for a decomposition not to lose information.
The central tool in developing much of decomposition theory is functional dependency, which is a generalization of the idea of key constraints. Functional dependencies are used to define normal forms, a set of requirements on relational schemas that are desirable in update-intensive transaction systems. This is why the theory of decompositions is often also called normalization theory. Sections 8.5 through 8.9 develop algorithms for carrying out the normalization process.

8.3 FUNCTIONAL DEPENDENCIES

For the remainder of this chapter, we use a special notation for representing attributes, which is common in relational normalization theory. Capital letters from the beginning of the alphabet (A, B, C, D) represent individual attributes; capital letters from the middle to the end of the alphabet (P, V, W, X, Y, Z) represent sets of attributes. Also, strings of letters, such as ABCD, denote sets of the respective attributes ({A, B, C, D}), and juxtapositions of set names, such as XYZ, stand for unions of these sets (X ∪ Y ∪ Z). Although this notation requires some getting used to, it is very convenient and provides a succinct language, which we use in examples and definitions.
A functional dependency (FD) on a relation schema, R, is a constraint of the form X → Y, where X and Y are sets of attributes used in R. If r is a relation instance of R, it is said to satisfy this functional dependency if

    For every pair of tuples, t and s, in r, if t and s agree on all attributes in X, then t and s agree on all attributes in Y.

Put another way, there must not be a pair of tuples in r such that they have the same values for every attribute in X but different values for some attribute in Y.

Note that the key constraints introduced in Chapter 4 are a special kind of FD. Suppose that key(K) is a key constraint on the relational schema R and that r is a relational instance over R. By definition, r satisfies key(K) if and only if there is no pair of distinct tuples, t, s ∈ r, such that t and s agree on every attribute in key(K). Therefore, this key constraint is equivalent to the FD K → R, where K is the set of attributes in the key constraint and R denotes the set of all attributes in the schema R.

Keep in mind that functional dependencies are associated with relation schemas, but when we consider whether or not a functional dependency is satisfied, we must consider relation instances over those schemas. This is because FDs are integrity constraints on the schema (much like key constraints), which restrict the set of allowable relation instances to those that satisfy the given FDs. Thus, given a schema, R = (R; Constraints), where R is a set of attributes and Constraints is a set of FDs, we are looking for the set of all relation instances over R that satisfy every FD in Constraints. Such relational instances are called legal instances of R.

Functional dependencies and update anomalies. Certain functional dependencies that exist in a relational schema can lead to redundancy in the corresponding relation instances.
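The satisfaction test in the definition can be transcribed directly into code: examine every pair of tuples and check that agreement on X forces agreement on Y. A sketch, with hypothetical attribute names and data:

```python
from itertools import combinations

def satisfies(r, attrs, lhs, rhs):
    """True iff relation instance r (a set of tuples over the attribute
    list attrs) satisfies the FD lhs -> rhs."""
    pos = {a: i for i, a in enumerate(attrs)}
    agree = lambda t, s, A: all(t[pos[a]] == s[pos[a]] for a in A)
    return all(agree(t, s, rhs)
               for t, s in combinations(r, 2) if agree(t, s, lhs))

# Two tuples for the same person with different hobbies (hypothetical SSN):
attrs = ["SSN", "Name", "Address", "Hobby"]
r = {("111-11-1111", "John Doe", "123 Main St", "stamps"),
     ("111-11-1111", "John Doe", "123 Main St", "hiking")}

assert satisfies(r, attrs, ["SSN"], ["Name", "Address"])  # SSN -> Name, Address
assert not satisfies(r, attrs, ["SSN"], ["Hobby"])        # SSN alone is not a key
assert satisfies(r, attrs, ["SSN", "Hobby"], attrs)       # the key determines all
```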
Consider the two examples discussed in Sections 8.1 and 8.2: the schemas Person and HasAccount. Each has a primary key, as illustrated by the corresponding CREATE TABLE commands (5.1), page 93, and (8.3), page 214. Correspondingly, there are the following functional dependencies:

    Person:      SSN, Hobby → SSN, Name, Address, Hobby
    HasAccount:  ClientId, OfficeId → AccountNumber, ClientId, OfficeId    (8.4)

These are not the only FDs implied by the original specifications, however. For instance, both Name and Address are defined as single-valued attributes in the ER diagram of Figure 5.1. This clearly implies that one Person entity (identified by its attribute SSN) can have at most one name and one address. Similarly, the business rules of PSSC (the brokerage firm discussed in Section 5.5) require that every account be assigned to exactly one office, which means that the following FDs must also hold for the corresponding relation schemas:

    Person:      SSN → Name, Address
    HasAccount:  AccountNumber → OfficeId                                  (8.5)
It is easy to see that the syntactic structure of the dependencies in (8.4) closely corresponds to the update anomalies that we identified for the corresponding relations. For instance, the problem with Person was that for any given SSN we could not change the values of the attributes Name and Address independently of whether the corresponding person had hobbies: if the person had multiple hobbies, the change had to occur in multiple rows. Likewise, with HasAccount we cannot change the value of OfficeId (i.e., transfer an account to a different office) without having to look for all clients associated with this account. Since a number of clients might share the same account, there can be multiple rows in HasAccount that refer to the same account and hence multiple rows in which OfficeId must be changed. Note that in both cases, the attributes involved in the update anomalies appear on the left-hand sides of an FD.

We can see that update anomalies are associated with certain kinds of functional dependencies. Which dependencies are bad? At the risk of giving away the secret, we draw your attention to one major difference between the dependencies in (8.4) and (8.5): the former specify key constraints for their corresponding relations, whereas the latter do not. However, simply knowing which dependencies cause the anomalies is not enough; we must do something about them. We cannot just abolish the offending FDs, because they are part of the semantics of the enterprise being modeled by the database. They are implicitly or explicitly part of the Requirements Document and cannot be changed without an agreement with the customer. On the other hand, we saw that schema decomposition can be a useful tool. Even though a decomposition cannot abolish a functional dependency, it can make it behave.
For instance, the decomposition shown in (8.1) yields schemas in which the offending FD, SSN → Name, Address, becomes a well-behaved key constraint.

8.4 PROPERTIES OF FUNCTIONAL DEPENDENCIES

Before going any further, we need to learn some mathematical properties of functional dependencies and develop algorithms to test them. Since these properties and algorithms rely heavily on the notational conventions introduced at the beginning of Section 8.3, it might be a good idea to revisit those conventions.

The properties of FDs that we are going to study are based on entailment. Consider a set of attributes R, a set, F, of FDs over R, and another FD, f, over R. We say that F entails f if every relation r over the set of attributes R has the following property: if r satisfies every FD in F, then r satisfies the FD f.

Given a set of FDs, F, the closure of F, denoted F⁺, is the set of all FDs entailed by F. Clearly, F⁺ contains F as a subset.2

2 If f ∈ F, then every relation that satisfies every FD in F obviously satisfies f. Therefore, by the definition of entailment, f is entailed by F.
If F and G are sets of FDs, we say that F entails G if F entails every individual FD in G. F and G are said to be equivalent if F entails G and G entails F. We now present several simple but important properties of entailment.

Reflexivity. Some FDs are satisfied by every relation, no matter what. These dependencies all have the form X → Y, where Y ⊆ X, and are called trivial FDs. The reflexivity property states that if Y ⊆ X, then X → Y. To see why trivial FDs are always satisfied, consider a relation, r, whose set of attributes includes all of the attributes mentioned in X → Y. Suppose that t, s ∈ r are tuples that agree on X. Since Y ⊆ X, this means that t and s agree on Y as well. Thus, r satisfies X → Y. We can now relate trivial FDs and entailment: because a trivial FD is satisfied by every relation, it is entailed by every set of FDs! In particular, F⁺ contains every trivial FD.

Augmentation. Consider an FD, X → Y, and another set of attributes, Z. Let R contain X ∪ Y ∪ Z. Then X → Y entails XZ → YZ. In other words, every relation r over R that satisfies X → Y must also satisfy the FD XZ → YZ. The augmentation property states that if X → Y, then XZ → YZ. To see why this is true, suppose that tuples t, s ∈ r agree on every attribute of XZ; then in particular they agree on X. Since r satisfies X → Y, t and s must also agree on Y. They also agree on Z, since we have assumed that they agree on the bigger set of attributes XZ. Thus, if s, t agree on every attribute in XZ, they must agree on YZ. As this is an arbitrarily chosen pair of tuples in r, it follows that r satisfies XZ → YZ.

Transitivity. The set of FDs {X → Y, Y → Z} entails the FD X → Z. The transitivity property states that if X → Y and Y → Z, then X → Z. This property can be established similarly to the previous two (see the exercises).
These three properties of FDs are known as Armstrong's axioms,3 and their main use is in proofs of correctness of various database design algorithms. However, they are also a powerful tool used by (real, breathing human) database designers because they can help them quickly spot problematic FDs in relational schemas. We now show how Armstrong's axioms are used to derive new FDs.

3 Strictly speaking, these are inference rules, because they derive new rules out of old ones. Only the reflexivity rule can be viewed as an axiom.

Union of FDs. Any relation, r, that satisfies X → Y and X → Z must also satisfy X → YZ. To show this, we can derive X → YZ from X → Y and X → Z using simple syntactic manipulations defined by Armstrong's axioms. Such manipulations can
be easily programmed on a computer, unlike the tuple-based considerations we used to establish the axioms themselves. Here is how it is done:

    (a) X → Y       Given
    (b) X → Z       Given
    (c) X → XY      Adding X to both sides of (a): Armstrong's augmentation rule
    (d) XY → YZ     Adding Y to both sides of (b): Armstrong's augmentation rule
    (e) X → YZ      Armstrong's transitivity rule applied to (c) and (d)

Decomposition of FDs. In a similar way, we can prove the following rule: every relation that satisfies X → YZ must also satisfy the FDs X → Y and X → Z. This is accomplished by the following simple steps:

    (a) X → YZ      Given
    (b) YZ → Y      By Armstrong's reflexivity rule, since Y ⊆ YZ
    (c) X → Y       By transitivity from (a) and (b)

The derivation of X → Z is similar.

Armstrong's axioms are obviously sound. By sound we mean that any expression of the form X → Y derived using the axioms is actually a functional dependency. Soundness follows from the fact that we have proved that these inference rules are valid for every relation. It is much less obvious, however, that they are also complete: that is, if a set of FDs, F, entails another FD, f, then f can be derived from F by a sequence of steps, similar to the ones above, that rely solely on Armstrong's axioms! We do not prove this fact here, but a proof can be found in [Ullman 1988]. Soundness and completeness of Armstrong's axioms is not just a theoretical curiosity; this result has considerable practical value because it guarantees that entailment of FDs (i.e., the question of whether f ∈ F⁺) can be verified by a computer program. We are now going to develop one such algorithm.

An obvious way to verify entailment of an FD, f, by a set of FDs, F, is to instruct the computer to apply Armstrong's axioms to F in all possible ways. Since the number of attributes mentioned in F and f is finite, this derivation process cannot go on forever.
When we are satisfied that all possible derivations have been made, we can simply check whether f is among the FDs derived by this process. Completeness of Armstrong's axioms guarantees that f ∈ F⁺ if and only if f is one of the FDs thus derived.

To see how this process works, consider the following sets of FDs: F = {AC → B, A → C, D → A} and G = {A → B, A → C, D → A, D → B}. We can use Armstrong's axioms to prove that these two sets are equivalent, i.e., every FD in G is entailed by F, and vice versa. For instance, to prove that A → B is implied by F, we can apply Armstrong's axioms in all possible ways. Most of these attempts will not lead anywhere, but a few will. For instance, the following derivation establishes the desired entailment:
    (a) A → C       An FD in F
    (b) A → AC      From (a) and Armstrong's augmentation axiom
    (c) A → B       From (b), AC → B ∈ F, and Armstrong's transitivity axiom

The FDs A → C and D → A belong to both F and G, so their derivation is trivial. For D → B in G, the computer can try to apply Armstrong's axioms until this FD is derived. After a while, it will stumble upon this valid derivation:

    (a) D → A       An FD in F
    (b) A → B       Derived previously
    (c) D → B       From (a), (b), and Armstrong's transitivity axiom

The proof that every FD in F is entailed by G is similar.

Although the simplicity of checking entailment by blindly applying Armstrong's axioms is attractive, it is not very efficient. In fact, the size of F⁺ can be exponential in the size of F, so for large database schemas it can take a very long time before the designer ever sees the result. We are therefore going to develop a more efficient algorithm, which is also based on Armstrong's axioms but which applies them much more judiciously.

Checking entailment of FDs. The idea of the new algorithm for verifying entailment is based on attribute closure. Given a set of FDs, F, and a set of attributes, X, we define the attribute closure of X with respect to F, denoted X⁺_F, as follows:

    X⁺_F = {A | X → A ∈ F⁺}

In other words, X⁺_F is the set of all those attributes, A, such that X → A is entailed by F. Note that X ⊆ X⁺_F because, if A ∈ X, then, by Armstrong's reflexivity axiom, X → A is a trivial FD that is entailed by every set of FDs, including F. It is important to understand that the closure of F (i.e., F⁺) and the closure of X (i.e., X⁺_F) are related but different notions: F⁺ is a set of functional dependencies, whereas X⁺_F is a set of attributes.

The more efficient algorithm for checking entailment of FDs now works as follows: given a set of FDs, F, and an FD, X → Y, check whether Y ⊆ X⁺_F. If so, then F entails X → Y. Otherwise, if Y ⊄ X⁺_F, then F does not entail X → Y.
The correctness of this algorithm follows from Armstrong's axioms. If Y ⊆ X⁺_F, then X → A ∈ F⁺ for every A ∈ Y (by the definition of X⁺_F). By the union rule for FDs, F thus entails X → Y. Conversely, if Y ⊄ X⁺_F, then there is some B ∈ Y such that B ∉ X⁺_F. Hence, X → B is not entailed by F. But then F cannot entail X → Y; if it did, it would have to entail X → B as well, by the decomposition rule for FDs.

The heart of the above algorithm is a check of whether a set of attributes is contained in X⁺_F. Therefore, we are not done yet: we need an algorithm for computing the
closure of X, which we present in Figure 8.3.

Figure 8.3 Computation of the attribute closure X⁺_F.

    closure := X
    repeat
        old := closure
        if there is an FD Z → V ∈ F such that Z ⊆ closure
            then closure := closure ∪ V
    until old = closure
    return closure

The idea behind the algorithm is to enlarge the set of attributes known to belong to X⁺_F by applying the FDs in F. The closure is initialized to X, since we know that X is always a subset of X⁺_F. The soundness of the algorithm can be proved by induction. Initially, closure is X, so X → closure is in F⁺. Then, assuming that X → closure ∈ F⁺ at some intermediate step in the repeat loop of Figure 8.3, and given an FD Z → V ∈ F such that Z ⊆ closure, we can use the generalized transitivity rule (see exercise 8.7) to infer that F entails X → closure ∪ V. Thus, if A ∈ closure at the end of the computation, then A ∈ X⁺_F. The converse is also true: if A ∈ X⁺_F, then at the end of the computation A ∈ closure (see exercise 8.8).

Unlike the simple-minded algorithm that uses Armstrong's axioms indiscriminately, the algorithm in Figure 8.3 has run-time complexity quadratic in the size of F. In fact, an algorithm for computing X⁺_F that is linear in the size of F is given in [Beeri and Bernstein 1979]. That algorithm is better suited for a computer program, but its inner workings are more complex.

Example (Checking Entailment). Consider a relational schema, R = (R; F), where R = ABCDEFGHIJ and F contains the following FDs: AB → C, D → E, AE → G, GD → H, ID → J. We wish to check whether F entails ABD → GH and ABD → HJ.

First, let us compute ABD⁺_F. We begin with closure = ABD. Two FDs can be used in the first iteration of the loop in Figure 8.3. For definiteness, let us use AB → C, which makes closure = ABDC. In the second iteration, we can use D → E, which makes closure = ABDCE. Now it becomes possible to use the FD AE → G in the third iteration, yielding closure = ABDCEG.
This in turn allows GD → H to be applied in the fourth iteration, which results in closure = ABDCEGH. In the fifth iteration, we cannot apply any new FDs, so closure does not change and the loop terminates. Thus, ABD⁺_F = ABDCEGH. Since GH ⊆ ABDCEGH, we conclude that F entails ABD → GH. On the other hand, HJ ⊄ ABDCEGH, so we conclude that ABD → HJ is not entailed by F. Note, however, that F does entail ABD → H.
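The algorithm of Figure 8.3 and the closure-based entailment test are straightforward to implement. The following sketch reruns the worked example, representing each FD as a pair of frozensets of attribute names:

```python
def closure(X, F):
    """Attribute closure X+_F, computed as in Figure 8.3."""
    clo, old = set(X), None
    while clo != old:              # repeat until a fixpoint is reached
        old = set(clo)
        for lhs, rhs in F:
            if lhs <= clo:         # the FD's left-hand side is covered,
                clo |= rhs         # so its right-hand side joins the closure
    return clo

def entails(F, lhs, rhs):
    """F entails lhs -> rhs iff rhs is a subset of the closure of lhs."""
    return set(rhs) <= closure(lhs, F)

fd = lambda l, r: (frozenset(l), frozenset(r))
F = {fd("AB", "C"), fd("D", "E"), fd("AE", "G"), fd("GD", "H"), fd("ID", "J")}

assert closure("ABD", F) == set("ABCDEGH")   # ABD+_F from the example
assert entails(F, "ABD", "GH")
assert not entails(F, "ABD", "HJ")           # J is never reachable from ABD
assert entails(F, "ABD", "H")
```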
Figure 8.4 Testing equivalence of sets of FDs.

    Input: FD sets F and G
    Output: true, if F is equivalent to G; false otherwise
    for each f ∈ F do
        if G does not entail f then return false
    for each g ∈ G do
        if F does not entail g then return false
    return true

The above algorithm for testing entailment leads to a simple test for equivalence between a pair of sets of FDs. Let F and G be such sets. To check that they are equivalent, we must check that every FD in G is entailed by F, and vice versa. The algorithm is depicted in Figure 8.4.

8.5 NORMAL FORMS

To eliminate redundancy and potential update anomalies, database theory identifies several normal forms for schemas such that, if a schema is in one of the normal forms, it has certain predictable properties. Originally, [Codd 1970] proposed three normal forms, each eliminating more anomalies than the previous one. The first normal form (1NF), as introduced by Codd, is equivalent to the definition of the relational data model. The second normal form (2NF) was an attempt to eliminate some potential anomalies, but it has turned out to be of no practical use, so we do not discuss it. The third normal form (3NF) was initially thought to be the ultimate normal form. However, Boyce and Codd soon realized that 3NF can still harbor undesirable combinations of functional dependencies, so they introduced the so-called Boyce-Codd normal form (BCNF). Unfortunately, the sobering reality of computational sciences is that there is no free lunch: even though BCNF is more desirable, it is not always achievable without paying a price elsewhere.

In this section, we define both BCNF and 3NF. Subsequent sections develop algorithms for automatically converting relational schemas that possess various bad properties into sets of schemas in 3NF and BCNF. We also study the trade-offs associated with such conversions.
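The equivalence test of Figure 8.4 reduces to the entailment check, which in turn reduces to attribute closure. A self-contained sketch, rechecking the sets F and G from the example of Section 8.4:

```python
def closure(X, F):
    # Attribute closure X+_F (the quadratic algorithm of Figure 8.3).
    clo, old = set(X), None
    while clo != old:
        old = set(clo)
        for lhs, rhs in F:
            if lhs <= clo:
                clo |= rhs
    return clo

def entails(F, f):
    lhs, rhs = f
    return rhs <= closure(lhs, F)

def equivalent(F, G):
    # Figure 8.4: F and G are equivalent iff each entails every FD of the other.
    return all(entails(G, f) for f in F) and all(entails(F, g) for g in G)

fd = lambda l, r: (frozenset(l), frozenset(r))
F = {fd("AC", "B"), fd("A", "C"), fd("D", "A")}
G = {fd("A", "B"), fd("A", "C"), fd("D", "A"), fd("D", "B")}

assert equivalent(F, G)                    # the example of Section 8.4
assert not equivalent(F, {fd("D", "A")})   # dropping FDs loses information
```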
Toward the end of this chapter, we show that certain types of redundancy are caused not by FDs but by other dependencies. To deal with this problem, we introduce the fourth normal form (4NF), which further extends BCNF.

The Boyce-Codd Normal Form. A relational schema, R = (R; F), where R is the set of attributes of R and F is the set of functional dependencies associated with R, is in Boyce-Codd normal form if, for every FD X → Y ∈ F, either of the following is true:

    Y ⊆ X (i.e., the FD is trivial).
    X is a superkey of R.
In other words, the only nontrivial FDs are those in which a key functionally determines one or more attributes. It is easy to see that Person1 and Hobby, the relational schemas given in (8.1), are in BCNF, because the only nontrivial FD is SSN → Name, Address, and it applies to Person1, which has SSN as a key. On the other hand, consider the schema Person defined by the CREATE TABLE statement (5.1), page 93, and the schema HasAccount defined by the SQL statement (8.3), page 214. As discussed earlier, these statements fail to capture some important relationships, which are represented by the FDs in (8.5), page 216. Each of these FDs violates the requirements of BCNF: they are not trivial, and their left-hand sides, SSN and AccountNumber, are not keys of their respective schemas.

Note that a BCNF schema can have more than one key. For instance, R = (ABCD; F), where F = {AB → CD, AC → BD}, has two keys, AB and AC. And yet it is in BCNF because the left-hand side of each of the two FDs in F is a key.

An important property of BCNF schemas is that their instances do not contain redundant information. Since we have been illustrating redundancy problems only through concrete examples, the above statement might seem vague. Exactly what is redundant information? For instance, consider an instance of the above-mentioned BCNF schema R that contains two tuples agreeing on the attributes B, C, and D but differing on A. Does it store redundant information? Superficially it might seem so, because the two tuples agree on all but one attribute. However, having identical values in some attributes of different tuples does not necessarily imply that the tuples are storing redundant information. Redundancy arises when the values of some set of attributes, X, necessarily imply the value that must exist in another attribute, A; that is, a functional dependency: if two tuples have the same values in X, they must have the same value of A.
Redundancy is eliminated if we store the association between X and A only once (in a separate relation) instead of repeating it in all tuples of an instance of the schema R that agree on X. Since R has no FD whose attributes all come from BCD, no redundant information is stored here: the fact that the tuples in the relation coincide over BCD is coincidental. For instance, the value of attribute D in the first tuple can be changed from 4 to 5 without regard for the second tuple. A DBMS automatically eliminates one type of redundancy: two tuples with the same values in the key fields are prohibited in any instance of a schema. This is a special case: the key identifies an entity and so determines the values of all attributes describing that entity. Because the definition of BCNF precludes nontrivial FDs whose left-hand sides are not superkeys, relations over BCNF schemas do not store redundant information. As a result, deletion and update anomalies do not arise in BCNF relations.
Relations with more than one key can still have insertion anomalies. To see this, suppose that an association over ABD and an association over ACD are added to our relation as shown:

A  B     C     D
3  2     NULL  5
3  NULL  2     5

Because the value of attribute C in the first association and of attribute B in the second is unknown, we fill in the missing information with NULL. However, now we cannot tell whether the two newly added tuples describe the same fact; it all depends on the real values behind the nulls. A practical solution to this problem, adopted by the SQL standard, is to designate one key as primary and to prohibit null values in its attributes.

The Third Normal Form. A relational schema, R = (R; F), where R is the set of attributes of R and F is the set of functional dependencies associated with R, is in third normal form (3NF) if, for every FD X → A ∈ F, any one of the following is true:
1. A ∈ X (i.e., this is a trivial FD).
2. X is a superkey of R.
3. A ∈ K for some key K of R.

Observe that the first two conditions in the definition of 3NF are identical to the conditions that define BCNF. Thus, 3NF is a relaxation of BCNF's requirements: every schema in BCNF must also be in 3NF, but the converse is not true in general. For instance, the relation HasAccount (8.3) on page 214 is in 3NF because the only FD that is not based on a key constraint is AccountNumber → OfficeId, and OfficeId is part of the key {ClientId, OfficeId}. However, this relation is not in BCNF, as shown previously.4

If you are wondering about the intrinsic merit of the third condition in the definition of 3NF, the answer is that there is none. In a way, 3NF was discovered by mistake, in the search for what we now call BCNF! The reason for its remarkable survival is that it was later found to have some desirable algorithmic properties that BCNF does not have. We discuss these issues in subsequent sections.

Recall from Section 8.2 that relation instances over HasAccount might store redundant information.
Now we can see that this redundancy arises because of the functional dependency AccountNumber → OfficeId, which is not implied by the key constraints.

4 In fact, HasAccount is the smallest possible example of a 3NF relation that is not in BCNF (see Exercise 8.5).
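Because the third 3NF condition refers to the keys of R, a mechanical 3NF test must first find the candidate keys. For textbook-sized schemas this can be done by brute force: enumerate attribute subsets and keep the minimal ones whose closure covers R. The sketch below does exactly that; the helper names are ours, and the brute-force key search is exponential in general.

```python
from itertools import combinations

def closure(attrs, fds):
    # X+ : repeatedly add the right-hand side of any FD whose
    # left-hand side is already contained in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    # Brute-force search for minimal superkeys, smallest subsets first.
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if closure(combo, fds) >= set(attributes):
                if not any(k <= set(combo) for k in keys):
                    keys.append(set(combo))
    return keys

def is_3nf(attributes, fds):
    # Prime attributes: those that belong to some candidate key.
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue                      # lhs is a superkey
        for a in set(rhs) - set(lhs):     # nontrivial part of the FD
            if a not in prime:
                return False              # a belongs to no key
    return True

hasaccount = {'AccountNumber', 'ClientId', 'OfficeId'}
fds = [({'ClientId', 'OfficeId'}, {'AccountNumber'}),
       ({'AccountNumber'}, {'OfficeId'})]
# Two keys: {AccountNumber, ClientId} and {ClientId, OfficeId},
# so every attribute is prime and the schema is in 3NF.
print(candidate_keys(hasaccount, fds))
print(is_3nf(hasaccount, fds))  # True
```

Running the same test on a simplified Person(SSN, Name, Address, Hobby) with SSN → Name, Address returns False, since Name is not part of the single key {SSN, Hobby}.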
For another example, consider the schema Person discussed earlier. This schema violates the 3NF requirements because, for example, the FD SSN → Name is not based on a key constraint (SSN is not a superkey) and Name does not belong to any key of Person. However, the decomposition of this schema into Person1 and Hobby in (8.1), page 213, yields a pair of schemas that are in both 3NF and BCNF.

8.6 PROPERTIES OF DECOMPOSITIONS

Since there is no redundancy in BCNF schemas and redundancy in 3NF is limited, we are interested in decomposing a given schema into a collection of schemas, each of which is in one of these normal forms. The main thrust of the discussion in the previous section was that 3NF does not completely solve the redundancy problem. Therefore, at first glance, there appears to be no justification for considering 3NF a goal for database design. It turns out, however, that the maintenance problems associated with redundancy do not show the whole picture. As we will see, maintenance costs are also associated with integrity constraints,5 and 3NF decompositions sometimes have better properties in this regard than do BCNF decompositions. Our first step is to define these properties.

Recall from Section 8.2 that not all decompositions are created equal. For instance, the decomposition of Person shown in (8.1) is considered good, while the one in (8.2) makes no sense. Is there an objective way to tell which decompositions make sense and which do not, and can this objective way be explained to a computer? The answer to both questions is yes. The decompositions that make sense are called lossless. Before tackling this notion, we need to be more precise about what we mean by a decomposition.
A decomposition of a schema, R = (R; F), where R is the set of attributes of the schema and F is its set of functional dependencies, is a collection of schemas R1 = (R1; F1), R2 = (R2; F2), ..., Rn = (Rn; Fn) such that the following conditions hold:
1. R = R1 ∪ R2 ∪ ... ∪ Rn.
2. F entails Fi for every i = 1, ..., n.

The first part of the definition is clear: a decomposition should not introduce new attributes, and it should not drop attributes found in the original schema. The second part says that a decomposition should not introduce new functional dependencies (but may drop some). We discuss the second requirement in more detail later.

The decomposition of a schema naturally leads to a decomposition of the relations over it. A decomposition of a relation, r, over schema R is the set of relations

r1 = π_R1(r), r2 = π_R2(r), ..., rn = π_Rn(r)

5 For example, a particular integrity constraint in the original table might be checkable in the decomposed tables only by taking the join of these tables, which results in significant overhead at run time.
where π is the projection operator. It can be shown (see Exercise 8.9) that if r is a valid instance of R, then each ri satisfies all FDs in Fi, and thus each ri is a valid relation instance over the schema Ri. The purpose of a decomposition is to replace the original relation, r, with the set of relations r1, ..., rn over the schemas that constitute the decomposed schema. In view of the above definitions, it is important to realize that schema decomposition and relation-instance decomposition are two different but related notions.

Applying the above definitions to our running example, we see that splitting Person into Person1 and Hobby (see (8.1) and Figure 8.1) yields a decomposition. Splitting Person as shown in (8.2) and Figure 8.2 is also a decomposition in the above sense. It clearly satisfies the first requirement for being a decomposition. It also satisfies the second, since only trivial FDs hold in (8.2), and these are entailed by every set of dependencies. This last example shows that the above definition of a decomposition does not capture all of the desirable properties of a decomposition because, as you may recall from Section 8.2, the decomposition in (8.2) makes no sense. In the following, we introduce additional desirable properties of decompositions.

8.6.1 Lossless and Lossy Decompositions

Consider a relation, r, and its decomposition, r1, ..., rn, as defined above. Since after the decomposition the database no longer stores the relation r and instead maintains its projections r1, ..., rn, the database must be able to reconstruct the original relation r from these projections. Not being able to reconstruct r means that the decomposition does not represent the same information as the original database (imagine a bank losing the information about who owns which account or, worse, giving those accounts to the wrong owners!).
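The projections that make up an instance decomposition are easy to compute mechanically. Note that projection eliminates duplicate tuples, which is precisely what lets a good decomposition store each fact only once. A small sketch, with sample data and helper names of our own choosing, on a simplified Person(SSN, Name, Hobby):

```python
def project(rows, attrs):
    # pi_attrs(r): keep only the listed columns and drop duplicate
    # tuples, since relations are sets of tuples.
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

# One person with two hobbies; the (SSN, Name) pair repeats.
person = [{'SSN': 111, 'Name': 'Joe', 'Hobby': 'stamps'},
          {'SSN': 111, 'Name': 'Joe', 'Hobby': 'coins'}]

person1 = project(person, ['SSN', 'Name'])   # the pair (111, Joe) stored once
hobby = project(person, ['SSN', 'Hobby'])    # two (SSN, Hobby) tuples
print(person1)     # [{'SSN': 111, 'Name': 'Joe'}]
print(len(hobby))  # 2
```

The duplicate elimination in person1 is the decomposition removing the redundancy that the FD SSN → Name caused in the original instance.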
In principle, one can use any computational method that guarantees reconstruction of r from its projections. However, the natural and, in most cases, practical method is the natural join. We thus assume that r is reconstructible if and only if

r = r1 ⋈ r2 ⋈ ... ⋈ rn

Reconstructibility must be a property of the schema decomposition and not of a particular instance over this schema. At the database design stage, the designer manipulates schemas, not relations, and any transformation performed on a schema must guarantee that reconstructibility holds for all of its valid relation instances. This discussion leads to the following notion.

A decomposition of schema R = (R; F) into a collection of schemas R1 = (R1; F1), R2 = (R2; F2), ..., Rn = (Rn; Fn) is lossless if, for every valid instance r of schema R,

r = r1 ⋈ r2 ⋈ ... ⋈ rn

where

r1 = π_R1(r), r2 = π_R2(r), ..., rn = π_Rn(r)
A decomposition is lossy otherwise. In plain terms, a lossless schema decomposition is one that guarantees that any valid instance of the original schema can be reconstructed from its projections on the individual schemas of the decomposition. Note that r ⊆ r1 ⋈ r2 ⋈ ... ⋈ rn holds for any decomposition (Exercise 8.10), so losslessness really just states the opposite inclusion.

The fact that r ⊆ r1 ⋈ r2 ⋈ ... ⋈ rn holds no matter what may seem confusing at first. If we can get more tuples by joining the projections of r, why is such a decomposition called lossy? After all, we gained more tuples, not fewer! To clarify this issue, observe that what we might lose here are not tuples but rather the information about which tuples are the right ones. Consider, for instance, the decomposition (8.2) of schema Person. Figure 8.2 presents the corresponding decomposition of a valid relation instance over Person shown in Figure 5.2. However, if we now compute the natural join of the relations in the decomposition (which becomes a Cartesian product, since these relations share no attributes), we will not be able to tell who lives where and who has what hobbies. The relationship between names and SSNs is also lost. In other words, when reconstructing the original relation, getting more tuples is as bad as getting fewer: we must get exactly the set of tuples in the original relation.

Now that we are convinced of the importance of losslessness, we need an algorithm that a computer can use to verify this property. The definition of losslessness does not provide an effective test, since it tells us to try every possible valid relation instance, which is neither feasible nor efficient. A general test of whether a decomposition into n schemas is lossless exists, but it is somewhat complex; it can be found in [Beeri et al. 1981]. However, there is a much simpler test that works for binary decompositions, that is, decompositions into a pair of schemas.
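The effect described above can be seen concretely: when the components share no attributes, joining the projections degenerates into a Cartesian product that contains every original tuple plus spurious ones. A sketch with made-up data (the helper functions are our own encoding of π and ⋈):

```python
def project(rows, attrs):
    # pi_attrs(r): keep the listed columns, dropping duplicates.
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def natural_join(r1, r2):
    # Combine every pair of tuples that agree on the shared attributes.
    common = set(r1[0]) & set(r2[0]) if r1 and r2 else set()
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out

# Splitting {SSN} from {Hobby} shares no attributes, so the join
# of the projections is a Cartesian product.
r = [{'SSN': 111, 'Hobby': 'stamps'},
     {'SSN': 222, 'Hobby': 'coins'}]
joined = natural_join(project(r, ['SSN']), project(r, ['Hobby']))
print(len(joined))  # 4: the 2 original tuples plus 2 spurious ones
# Every original tuple is in the join, but we can no longer tell
# which SSN really goes with which hobby.
print(all(t in joined for t in r))  # True
```

The two extra tuples are exactly the lost information: the join can no longer distinguish the right combinations from the wrong ones.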
This test can also establish the losslessness of a decomposition into more than two schemas, provided that the decomposition was obtained by a series of binary decompositions. Since most decompositions are obtained in this way, the simple binary test introduced below is sufficient for most practical purposes.

Testing the losslessness of a binary decomposition. Let R = (R; F) be a schema and let R1 = (R1; F1), R2 = (R2; F2) be a binary decomposition of R. This decomposition is lossless if and only if either of the following is true:
1. (R1 ∩ R2) → R1 ∈ F+.
2. (R1 ∩ R2) → R2 ∈ F+.
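Because membership of an FD in F+ can be decided with an attribute-closure computation, the binary test is easy to mechanize: compute the closure of R1 ∩ R2 and check whether it covers R1 or R2. A sketch (function names are ours):

```python
def closure(attrs, fds):
    # X+ : repeatedly apply FDs whose left-hand side is covered.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def binary_lossless(r1_attrs, r2_attrs, fds):
    # Lossless iff (R1 & R2) -> R1 or (R1 & R2) -> R2 is in F+,
    # i.e. the closure of the shared attributes covers one side.
    c = closure(set(r1_attrs) & set(r2_attrs), fds)
    return c >= set(r1_attrs) or c >= set(r2_attrs)

fds = [({'SSN'}, {'Name', 'Address'})]
# Person1(SSN, Name, Address) and Hobby(SSN, Hobby): lossless,
# because the shared attribute SSN is a key of Person1.
print(binary_lossless({'SSN', 'Name', 'Address'}, {'SSN', 'Hobby'}, fds))  # True
# A split whose components share no attributes cannot be lossless.
print(binary_lossless({'SSN', 'Name'}, {'Address', 'Hobby'}, fds))         # False
```

The second call models decompositions like (8.2): the shared attribute set is empty, its closure is empty, and neither condition of the test can hold.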
Figure 8.5 Tuple structure in a lossless binary decomposition: a row of r1 combines with exactly one row of r2.

To see why this is so, suppose that (R1 ∩ R2) → R2 ∈ F+. Then R1 is a superkey of R, since the values of a subset of the attributes of R1 determine the values of all attributes of R2 (and, obviously, R1 functionally determines R1). Let r be a valid relation instance of R. Since R1 is a superkey of R, every tuple in r1 = π_R1(r) extends to exactly one tuple in r. Thus, as depicted in Figure 8.5, the cardinality (the number of tuples) of r1 equals the cardinality of r, and every tuple in r1 joins with exactly one tuple in r2 = π_R2(r) (if more than one tuple in r2 joined with a tuple in r1, (R1 ∩ R2) → R2 would not be an FD). Therefore, the cardinality of r1 ⋈ r2 equals the cardinality of r1, which in turn equals the cardinality of r. Since r must be a subset of r1 ⋈ r2, it follows that r = r1 ⋈ r2. Conversely, if neither of the above FDs holds, it is easy to construct a relation r such that r ≠ r1 ⋈ r2. The details of this construction are left as an exercise.

The above test can now be used to substantiate our intuition that the decomposition (8.1) of Person into Person1 and Hobby is a good one. The intersection of the attributes of Hobby and Person1 is {SSN}, and SSN is a key of Person1. Thus, this decomposition is lossless.

8.6.2 Dependency-Preserving Decompositions

Consider the schema HasAccount once again. Recall that it has the attributes AccountNumber, ClientId, and OfficeId, and that its FDs are

ClientId, OfficeId → AccountNumber    (8.6)
AccountNumber → OfficeId              (8.7)

According to the losslessness test, the following decomposition is lossless:
AcctOffice = (AccountNumber, OfficeId; {AccountNumber → OfficeId})    (8.8)
AcctClient = (AccountNumber, ClientId; { })

because AccountNumber, the intersection of the two attribute sets, is a key of the first schema, AcctOffice.

Even though decomposition (8.8) is lossless, something seems to have fallen through the cracks. AcctOffice hosts the FD (8.7), but AcctClient's associated set of FDs is empty. This leaves the FD (8.6), which exists in the original schema, homeless: neither of the two schemas in the decomposition has all of the attributes needed to house this FD; furthermore, the FD cannot be derived from the FDs that belong to the schemas AcctOffice and AcctClient. In practical terms, this means that even though decomposing relations over the schema HasAccount into relations over AcctOffice and AcctClient does not lead to information loss, it might incur a cost for maintaining the integrity constraint that corresponds to the lost FD. Unlike the FD (8.7), which can be checked locally, in the relation AcctOffice, verification of the FD (8.6) requires computing the join of AcctOffice and AcctClient before the checking of the FD can even begin. In such cases, we say that the decomposition does not preserve the dependencies of the original schema. We now define what this means.

Consider a schema, R = (R; F), and suppose that R1 = (R1; F1), R2 = (R2; F2), ..., Rn = (Rn; Fn) is a decomposition. F entails each Fi by definition, so F entails F1 ∪ ... ∪ Fn. However, this definition does not require that the two sets of dependencies be equivalent, that is, that F1 ∪ ... ∪ Fn must also entail F. This reverse entailment is exactly what is missing in the above example: the FD set of HasAccount, which consists of the dependencies (8.6) and (8.7), is not entailed by the union of the dependencies in decomposition (8.8), which consists of the single dependency AccountNumber → OfficeId. Because of this, (8.8) is not a dependency-preserving decomposition.
Formally, R1 = (R1; F1), R2 = (R2; F2), ..., Rn = (Rn; Fn) is said to be a dependency-preserving decomposition of R = (R; F) if and only if it is a decomposition and the sets of FDs F and F1 ∪ ... ∪ Fn are equivalent.

The fact that an FD f ∈ F is not in any Fi does not by itself mean that the decomposition is not dependency preserving, since f might be entailed by F1 ∪ ... ∪ Fn. In that case, maintaining f as a functional dependency requires no extra effort: if the FDs in F1 ∪ ... ∪ Fn are maintained, f will be maintained as well. It is only when f is not entailed by F1 ∪ ... ∪ Fn that the decomposition is not dependency preserving, and then the maintenance of f requires a join.

In view of this definition, the above decomposition of HasAccount is not dependency preserving, and this is precisely what is wrong with it. The dependencies that exist in the original schema but are lost in the decomposition become interrelational constraints that cannot be maintained locally. Each time a relation in
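The definition suggests a direct test: the decomposition preserves dependencies if and only if the union of the component FD sets entails every FD of the original schema, which again reduces to attribute-closure computations. The sketch below checks decomposition (8.8); note that in general each Fi should be the projection of F+ onto Ri, which here we simply take as given.

```python
def closure(attrs, fds):
    # X+ under a set of FDs, by repeated application.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(F, component_fd_sets):
    # Dependency preserving iff the union of the components' FDs
    # entails every FD X -> Y in F, i.e. Y is in the closure of X.
    union = [fd for fds in component_fd_sets for fd in fds]
    return all(closure(lhs, union) >= set(rhs) for lhs, rhs in F)

F = [({'ClientId', 'OfficeId'}, {'AccountNumber'}),   # (8.6)
     ({'AccountNumber'}, {'OfficeId'})]               # (8.7)
acct_office = [({'AccountNumber'}, {'OfficeId'})]     # FDs of AcctOffice
acct_client = []                                      # FDs of AcctClient
print(preserves_dependencies(F, [acct_office, acct_client]))  # False: (8.6) is lost

# The Person split into Person1 and Hobby, by contrast, keeps its
# only FD locally in Person1:
print(preserves_dependencies([({'SSN'}, {'Name', 'Address'})],
                             [[({'SSN'}, {'Name', 'Address'})], []]))  # True
```

The False result is the formal counterpart of the "homeless" FD (8.6): no amount of local checking in AcctOffice and AcctClient can enforce it.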
the decomposition is changed, satisfaction of the interrelational constraints can be checked only after the reconstruction of the original relation. To illustrate, consider the decomposition of HasAccount in Figure 8.6. If we now add the tuple ⟨B567, SB01⟩ to the relation AcctOffice and the tuple ⟨B567, 1111⟩ to AcctClient, the two relations will still satisfy their local FDs (in fact, we see from (8.8) that only AcctOffice has a dependency to satisfy). In contrast, the interrelational FD (8.6) is not satisfied after these updates, but this is not immediately apparent. To verify it, we must join the two relations, as depicted in Figure 8.7. We now see that constraint (8.6) is violated by the first two tuples of the updated HasAccount relation.

Figure 8.6 Decomposition of the HasAccount relation.

HasAccount:
AccountNumber  ClientId  OfficeId
B123           1111      SB01
A908           2222      MN08

AcctOffice:                    AcctClient:
AccountNumber  OfficeId        AccountNumber  ClientId
B123           SB01            B123           1111
A908           MN08            A908           2222

Figure 8.7 HasAccount and its decomposition after the insertion of new tuples.

HasAccount:
AccountNumber  ClientId  OfficeId
B123           1111      SB01
B567           1111      SB01
A908           2222      MN08

AcctOffice:                    AcctClient:
AccountNumber  OfficeId        AccountNumber  ClientId
B123           SB01            B123           1111
B567           SB01            B567           1111
A908           MN08            A908           2222
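The run-time cost described above can be made concrete: to check (8.6) after the inserts, one must first materialize the join and only then test the FD. A sketch using tuples in the spirit of Figure 8.7 (the client id values are illustrative, and the helper names are ours):

```python
def natural_join(r1, r2):
    # Combine tuples that agree on the shared attributes.
    common = set(r1[0]) & set(r2[0]) if r1 and r2 else set()
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out

def satisfies_fd(rows, lhs, rhs):
    # X -> Y holds iff no two tuples agree on X but disagree on Y.
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

acct_office = [{'AccountNumber': 'B123', 'OfficeId': 'SB01'},
               {'AccountNumber': 'B567', 'OfficeId': 'SB01'},
               {'AccountNumber': 'A908', 'OfficeId': 'MN08'}]
acct_client = [{'AccountNumber': 'B123', 'ClientId': 1111},
               {'AccountNumber': 'B567', 'ClientId': 1111},
               {'AccountNumber': 'A908', 'ClientId': 2222}]

# The local FD (8.7) holds in AcctOffice alone...
print(satisfies_fd(acct_office, ['AccountNumber'], ['OfficeId']))         # True
# ...but only the join reveals that (8.6) is violated: two accounts
# for the same client at the same office.
joined = natural_join(acct_office, acct_client)
print(satisfies_fd(joined, ['ClientId', 'OfficeId'], ['AccountNumber']))  # False
```

Every update to either relation would require recomputing this join to re-check (8.6), which is exactly the maintenance overhead that dependency-preserving decompositions avoid.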
More informationCarnegie Mellon Univ. Dept. of Computer Science 15415  Database Applications. Overview  detailed. Goal. Faloutsos CMU SCS 15415
Faloutsos 15415 Carnegie Mellon Univ. Dept. of Computer Science 15415  Database Applications Lecture #17: Schema Refinement & Normalization  Normal Forms (R&G, ch. 19) Overview  detailed DB design
More informationBCA. Database Management System
BCA IV Sem Database Management System Multiple choice questions 1. A Database Management System (DBMS) is A. Collection of interrelated data B. Collection of programs to access data C. Collection of data
More informationAn Algorithmic Approach to Database Normalization
An Algorithmic Approach to Database Normalization M. Demba College of Computer Science and Information Aljouf University, Kingdom of Saudi Arabia bah.demba@ju.edu.sa ABSTRACT When an attempt is made to
More informationNormalisation in the Presence of Lists
1 Normalisation in the Presence of Lists Sven Hartmann, Sebastian Link Information Science Research Centre, Massey University, Palmerston North, New Zealand 1. Motivation & Revision of the RDM 2. The Brouwerian
More informationReading 13 : Finite State Automata and Regular Expressions
CS/Math 24: Introduction to Discrete Mathematics Fall 25 Reading 3 : Finite State Automata and Regular Expressions Instructors: Beck Hasti, Gautam Prakriya In this reading we study a mathematical model
More informationTesting LTL Formula Translation into Büchi Automata
Testing LTL Formula Translation into Büchi Automata Heikki Tauriainen and Keijo Heljanko Helsinki University of Technology, Laboratory for Theoretical Computer Science, P. O. Box 5400, FIN02015 HUT, Finland
More informationObjectives of Database Design Functional Dependencies 1st Normal Form Decomposition BoyceCodd Normal Form 3rd Normal Form Multivalue Dependencies
Objectives of Database Design Functional Dependencies 1st Normal Form Decomposition BoyceCodd Normal Form 3rd Normal Form Multivalue Dependencies 4th Normal Form General view over the design process 1
More informationQuestion 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]
EXAMINATIONS 2005 MIDYEAR COMP 302 Database Systems Time allowed: Instructions: 3 Hours Answer all questions. Make sure that your answers are clear and to the point. Write your answers in the spaces provided.
More informationJordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010
Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010 Student Name: ID: Part 1: MultipleChoice Questions (17 questions, 1
More informationOPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
More informationRELATIONAL DATABASE DESIGN
RELATIONAL DATABASE DESIGN g Good database design  Avoiding anomalies g Functional Dependencies g Normalization & Decomposition Using Functional Dependencies g 1NF  Atomic Domains and First Normal Form
More informationSUBGROUPS OF CYCLIC GROUPS. 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by
SUBGROUPS OF CYCLIC GROUPS KEITH CONRAD 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by g = {g k : k Z}. If G = g, then G itself is cyclic, with g as a generator. Examples
More informationINCIDENCEBETWEENNESS GEOMETRY
INCIDENCEBETWEENNESS GEOMETRY MATH 410, CSUSM. SPRING 2008. PROFESSOR AITKEN This document covers the geometry that can be developed with just the axioms related to incidence and betweenness. The full
More informationMCQs~Databases~Relational Model and Normalization http://en.wikipedia.org/wiki/database_normalization
http://en.wikipedia.org/wiki/database_normalization Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy. Normalization usually involves
More informationC H A P T E R Regular Expressions regular expression
7 CHAPTER Regular Expressions Most programmers and other powerusers of computer systems have used tools that match text patterns. You may have used a Web search engine with a pattern like travel cancun
More informationInformation, Entropy, and Coding
Chapter 8 Information, Entropy, and Coding 8. The Need for Data Compression To motivate the material in this chapter, we first consider various data sources and some estimates for the amount of data associated
More informationCOMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the EntityRelationship Model
COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the EntityRelationship Model The entityrelationship (ER) model is a a data model in which information stored
More informationAutomata and Formal Languages
Automata and Formal Languages Winter 20092010 Yacov HelOr 1 What this course is all about This course is about mathematical models of computation We ll study different machine models (finite automata,
More informationGraham Kemp (telephone 772 54 11, room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.
CHALMERS UNIVERSITY OF TECHNOLOGY Department of Computer Science and Engineering Examination in Databases, TDA357/DIT620 Tuesday 17 December 2013, 14:0018:00 Examiner: Results: Exam review: Grades: Graham
More informationModule 5: Normalization of database tables
Module 5: Normalization of database tables Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The
More informationAnswer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division
Answer Key UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS186 Fall 2003 Eben Haber Midterm Midterm Exam: Introduction to Database Systems This exam has
More informationYou know from calculus that functions play a fundamental role in mathematics.
CHPTER 12 Functions You know from calculus that functions play a fundamental role in mathematics. You likely view a function as a kind of formula that describes a relationship between two (or more) quantities.
More informationWHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT?
WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? introduction Many students seem to have trouble with the notion of a mathematical proof. People that come to a course like Math 216, who certainly
More informationLiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen 20070524
LiTH, Tekniska högskolan vid Linköpings universitet 1(7) IDA, Institutionen för datavetenskap Juha Takkinen 20070524 1. A database schema is a. the state of the db b. a description of the db using a
More information1 Uncertainty and Preferences
In this chapter, we present the theory of consumer preferences on risky outcomes. The theory is then applied to study the demand for insurance. Consider the following story. John wants to mail a package
More informationNormalization. CIS 331: Introduction to Database Systems
Normalization CIS 331: Introduction to Database Systems Normalization: Reminder Why do we need to normalize? To avoid redundancy (less storage space needed, and data is consistent) To avoid update/delete
More informationTOPOLOGY: THE JOURNEY INTO SEPARATION AXIOMS
TOPOLOGY: THE JOURNEY INTO SEPARATION AXIOMS VIPUL NAIK Abstract. In this journey, we are going to explore the so called separation axioms in greater detail. We shall try to understand how these axioms
More informationQuiz 3: Database Systems I Instructor: Hassan Khosravi Spring 2012 CMPT 354
Quiz 3: Database Systems I Instructor: Hassan Khosravi Spring 2012 CMPT 354 1. [10] Show that each of the following are not valid rules about FD s by giving a small example relations that satisfy the given
More informationDesign Theory for Relational Databases: Functional Dependencies and Normalization
Design Theory for Relational Databases: Functional Dependencies and Normalization Juliana Freire Some slides adapted from L. Delcambre, R. Ramakrishnan, G. Lindstrom, J. Ullman and Silberschatz, Korth
More informationTheory I: Database Foundations
Theory I: Database Foundations 19. 19. Theory I: Database Foundations 07.2012 1 Theory I: Database Foundations 20. Formal Design 20. 20: Formal Design We want to distinguish good from bad database design.
More informationLecture 6. SQL, Logical DB Design
Lecture 6 SQL, Logical DB Design Relational Query Languages A major strength of the relational model: supports simple, powerful querying of data. Queries can be written intuitively, and the DBMS is responsible
More informationBoyceCodd Normal Form
4NF BoyceCodd Normal Form A relation schema R is in BCNF if for all functional dependencies in F + of the form α β at least one of the following holds α β is trivial (i.e., β α) α is a superkey for R
More informationChapter 7: Relational Database Design
Chapter 7: Relational Database Design Pitfalls in Relational Database Design Decomposition Normalization Using Functional Dependencies Normalization Using Multivalued Dependencies Normalization Using Join
More information11 Ideals. 11.1 Revisiting Z
11 Ideals The presentation here is somewhat different than the text. In particular, the sections do not match up. We have seen issues with the failure of unique factorization already, e.g., Z[ 5] = O Q(
More informationChapter 23. Database Security. Security Issues. Database Security
Chapter 23 Database Security Security Issues Legal and ethical issues Policy issues Systemrelated issues The need to identify multiple security levels 2 Database Security A DBMS typically includes a database
More informationNotes on Complexity Theory Last updated: August, 2011. Lecture 1
Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look
More informationThe process of database development. Logical model: relational DBMS. Relation
The process of database development Reality (Universe of Discourse) Relational Databases and SQL Basic Concepts The 3rd normal form Structured Query Language (SQL) Conceptual model (e.g. EntityRelationship
More informationRelational Normalization: Contents. Relational Database Design: Rationale. Relational Database Design. Motivation
Relational Normalization: Contents Motivation Functional Dependencies First Normal Form Second Normal Form Third Normal Form BoyceCodd Normal Form Decomposition Algorithms Multivalued Dependencies and
More informationa 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.
Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given
More information[Refer Slide Time: 05:10]
Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture
More informationLecture 2 Normalization
MIT 533 ระบบฐานข อม ล 2 Lecture 2 Normalization Walailuk University Lecture 2: Normalization 1 Objectives The purpose of normalization The identification of various types of update anomalies The concept
More informationIntroduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms
Introduction to Database Systems Winter term 2013/2014 Melanie Herschel melanie.herschel@lri.fr Université Paris Sud, LRI 1 Chapter 4 Normal Forms in the Relational Model After completing this chapter,
More information