4NF
CMPT 354: Database I -- 4NF

Boyce-Codd Normal Form
A relation schema R is in BCNF if, for all functional dependencies in F+ of the form α → β, at least one of the following holds:
- α → β is trivial (i.e., β ⊆ α)
- α is a superkey for R
Example: bor_loan = (customer_id, loan_number, amount) is not in BCNF, because loan_number → amount holds on bor_loan but loan_number is not a superkey.
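The BCNF test on bor_loan can be sketched in a few lines of Python. This is a minimal illustration, not a full normalizer: it checks only the FDs that are given rather than all of F+, and the helper names attribute_closure and bcnf_violations are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Given FDs that are nontrivial and whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = schema <= attribute_closure(lhs, fds)
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


bor_loan = {"customer_id", "loan_number", "amount"}
fds = [({"loan_number"}, {"amount"})]

# loan_number -> amount is reported because loan_number+ does not cover bor_loan.
violations = bcnf_violations(bor_loan, fds)
print(violations)
```

An empty list would mean that no given dependency violates BCNF on the schema.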

Redundancy May Still Exist in BCNF
Database: classes(course, teacher, book)
(c, t, b) ∈ classes means that t is qualified to teach c, and b is a required textbook for c.
[Table: sample instance of classes(course, teacher, book); teachers include Hank, Sudarshan, and Jim; books are DB Concepts, Ullman, OS Concepts, and Shaw; each listed teacher appears in two rows.]

Insertion Anomalies
Relation classes(course, teacher, book) is in BCNF, yet insertion anomalies remain.
If Sara is a new teacher who can teach a course c whose required books are DB Concepts and Ullman, two tuples need to be inserted:
(c, Sara, DB Concepts)
(c, Sara, Ullman)

Decomposition to BCNF
classes(course, teacher, book) is decomposed into two relations:
- teaches(course, teacher), with teachers Hank, Sudarshan, and Jim
- text(course, book), with books DB Concepts, Ullman, OS Concepts, and Shaw
[Tables: the instance of classes and its projections teaches and text.]

Multivalued Dependency
Let R be a relation schema and let α ⊆ R and β ⊆ R.
The multivalued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R - β] = t2[R - β]
t4[β] = t2[β]
t4[R - β] = t1[R - β]
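The definition translates almost directly into a brute-force check on a concrete relation instance. The sketch below assumes a relation is a set of tuples over an ordered schema. The function names and the course value "c1" are hypothetical; the teacher and book values are taken from the classes example.

```python
from itertools import product


def project(t, schema, attrs):
    """Values of tuple t on attrs, in schema order."""
    return tuple(v for a, v in zip(schema, t) if a in attrs)


def satisfies_mvd(rel, schema, alpha, beta):
    """Check alpha ->> beta on rel, a set of tuples over schema."""
    rest = set(schema) - set(beta)  # R - beta
    for t1, t2 in product(rel, repeat=2):
        if project(t1, schema, alpha) != project(t2, schema, alpha):
            continue
        # t3 agrees with t1 on beta and with t2 on R - beta.
        # t4 is covered when the same pair is visited as (t2, t1).
        found = any(
            project(t3, schema, beta) == project(t1, schema, beta)
            and project(t3, schema, rest) == project(t2, schema, rest)
            for t3 in rel
        )
        if not found:
            return False
    return True


schema = ("course", "teacher", "book")
full = {
    ("c1", "Hank", "DB Concepts"),
    ("c1", "Hank", "Ullman"),
    ("c1", "Sudarshan", "DB Concepts"),
    ("c1", "Sudarshan", "Ullman"),
}
partial = full - {("c1", "Sudarshan", "Ullman")}

print(satisfies_mvd(full, schema, {"course"}, {"teacher"}))     # True
print(satisfies_mvd(partial, schema, {"course"}, {"teacher"}))  # False
```

A check on one instance can only refute a dependency; whether the MVD holds on the schema is a statement about all legal instances.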

Example
In relation classes(course, teacher, book): course →→ teacher and course →→ book.
Given a particular course, there is associated with it a set of teachers and a set of books, and these two sets are in some sense independent of each other.
If Y → Z then Y →→ Z.
For a relation r that does not satisfy a given multivalued dependency, a relation r′ that does satisfy the MVD can be constructed by adding tuples to r. How?
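One way to answer the question is to keep adding the tuples t3 that the definition demands until none is missing. The following is a sketch under the assumption that a relation is a set of tuples over an ordered schema; the name mvd_closure and the course value "c1" are hypothetical.

```python
def mvd_closure(rel, schema, alpha, beta):
    """Extend rel with the tuples required by alpha ->> beta until it is satisfied."""
    result = set(rel)
    changed = True
    while changed:
        changed = False
        for t1 in list(result):
            for t2 in list(result):
                # Only pairs that agree on alpha matter.
                if any(t1[i] != t2[i] for i, a in enumerate(schema) if a in alpha):
                    continue
                # t3 takes beta from t1 and everything else from t2.
                t3 = tuple(
                    t1[i] if a in beta else t2[i] for i, a in enumerate(schema)
                )
                if t3 not in result:
                    result.add(t3)
                    changed = True
    return result


schema = ("course", "teacher", "book")
r = {("c1", "Hank", "DB Concepts"), ("c1", "Sudarshan", "Ullman")}
r_prime = mvd_closure(r, schema, {"course"}, {"teacher"})

for t in sorted(r_prime):
    print(t)
```

Starting from two tuples, the result pairs every teacher of the course with every book of the course, which is exactly the redundancy the classes example illustrates.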

Fourth Normal Form (4NF)
A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
- α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
- α is a superkey for schema R
If a relation is in 4NF, it is in BCNF.

Restriction of MV Dependencies
D is a set of functional and multivalued dependencies. The restriction of D to Ri is the set Di consisting of:
- all functional dependencies in D+ that include only attributes of Ri
- all multivalued dependencies of the form α →→ (β ∩ Ri), where α ⊆ Ri and α →→ β is in D+

4NF Decomposition Algorithm
result := {R};
done := false;
compute D+;
let Di denote the restriction of D+ to Ri
while (not done)
    if (there is a schema Ri in result that is not in 4NF) then
        begin
            let α →→ β be a nontrivial multivalued dependency that holds on Ri
                such that α → Ri is not in Di, and α ∩ β = ∅;
            result := (result - Ri) ∪ (Ri - β) ∪ (α, β);
        end
    else done := true;
Note: each Ri is in 4NF, and the decomposition is lossless-join.

Example
R = (A, B, C, G, H, I), F = {A →→ B, B →→ HI, CG →→ H}
R is not in 4NF: A →→ B holds and A is not a superkey for R.
Decomposition:
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF)
   Since A →→ B and B →→ HI, we have A →→ HI, and hence A →→ I on R4
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
Result: (A, B), (C, G, H), (A, I), (A, C, G)
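The decomposition can be reproduced with a simplified sketch of the 4NF algorithm. Two assumptions keep it short and should be read as limitations. It does not compute D+, so the derived dependency A →→ HI is supplied by hand. It also tests superkeys with the closure under the given functional dependencies only (none in this example). All function names are hypothetical.

```python
def attr_closure(attrs, fds):
    """Closure of attrs under the functional dependencies fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def find_violation(schema, fds, mvds):
    """First listed MVD that, restricted to schema, violates 4NF (or None)."""
    for alpha, beta in mvds:
        if not alpha <= schema:
            continue
        b = (beta & schema) - alpha  # restrict to schema, keep alpha and b disjoint
        if not b or alpha | b == schema:
            continue  # trivial on this schema
        if schema <= attr_closure(alpha, fds):
            continue  # alpha is a superkey
        return alpha, b
    return None


def decompose_4nf(R, fds, mvds):
    """Split schemas on violating MVDs until none is found."""
    result = [frozenset(R)]
    done = False
    while not done:
        done = True
        for schema in result:
            violation = find_violation(schema, fds, mvds)
            if violation:
                alpha, b = violation
                result.remove(schema)
                result += [frozenset(alpha | b), frozenset(schema - b)]
                done = False
                break
    return result


mvds = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("HI")),
    (frozenset("CG"), frozenset("H")),
    (frozenset("A"), frozenset("HI")),  # derived by transitivity, added by hand
]
result = decompose_4nf("ABCGHI", [], mvds)
print(["".join(sorted(s)) for s in result])
```

With the MVDs listed in this order, the splits follow steps a) to f) above and end with the schemas (A, B), (C, G, H), (A, I), (A, C, G). A different order can produce a different, equally valid decomposition.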

How to Come up with a Design?
- R could have been generated when converting an E-R diagram to a set of tables
- R could have been a single relation containing all attributes that are of interest (the universal relation); normalization breaks R into smaller relations
- R could have been the result of some ad hoc design of relations, which we then test and convert to normal form

ER Model and Normalization
- When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization
- In a real (imperfect) design, there can be FDs from non-key attributes of an entity to other attributes of the entity
  Example: an employee entity with attributes department-number and department-address, and an FD department-number → department-address; a good design would have made department an entity
- FDs from non-key attributes of a relationship set are possible, but rare, since most relationships are binary

The Universal Relation Approach
Dangling tuples: tuples that disappear in a join.
Let r1(R1), ..., rn(Rn) be a set of relations.
A tuple t of the relation ri is a dangling tuple if t is not in the relation Π_Ri(r1 ⋈ r2 ⋈ ... ⋈ rn).
The relation r1 ⋈ r2 ⋈ ... ⋈ rn is called a universal relation; it involves all the attributes in the universe defined by R1 ∪ R2 ∪ ... ∪ Rn.
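The dangling-tuple definition can be checked directly on small instances. The sketch below assumes relations are lists of dicts keyed by attribute name. The helper names and the sample rows for account and depositor are hypothetical; only the two schemas come from the banking schema.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}
                if merged not in out:
                    out.append(merged)
    return out


def dangling_tuples(relations, i):
    """Tuples of relations[i] missing from the projection of the full join."""
    joined = relations[0]
    for rel in relations[1:]:
        joined = natural_join(joined, rel)
    attrs = relations[i][0].keys() if relations[i] else []
    projected = [{a: t[a] for a in attrs} for t in joined]
    return [t for t in relations[i] if t not in projected]


# Hypothetical sample data over account and depositor.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 700},
]
depositor = [{"customer_id": "C-1", "account_number": "A-101"}]

# Account A-102 has no depositor, so it disappears in the join.
print(dangling_tuples([account, depositor], 0))
```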

Denormalization for Performance
May use a non-normalized schema for performance. For example, displaying customer-name along with account-number and balance requires a join of account with depositor.
Alternative 1: use a denormalized relation containing the attributes of account as well as depositor
- faster lookup
- extra space and extra execution time for updates
- extra coding work for the programmer and possibility of error in the extra code
Alternative 2: use a materialized view defined as account ⋈ depositor
- benefits and drawbacks same as above, except that there is no extra coding work for the programmer, which avoids possible errors

Other Design Issues
Some aspects of design are not caught by normalization.
Example of a design to avoid: instead of earnings(company_id, year, amount), use company_year(company_id, earnings_2000, earnings_2001, earnings_2002)
- in BCNF, but makes querying across years difficult
- requires a new attribute each year
- an example of a crosstab, where values for one attribute become column names
- crosstabs are used in spreadsheets and in data analysis tools
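The difficulty with the crosstab layout shows up as soon as a query spans years: the year is hidden in column names and must be recovered first. A small sketch, with a hypothetical unpivot helper and made-up amounts:

```python
def unpivot(rows, prefix="earnings_"):
    """Turn company_year rows into earnings(company_id, year, amount) rows."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(prefix):
                year = int(col[len(prefix):])  # the year is part of the column name
                out.append((row["company_id"], year, value))
    return out


# Hypothetical company_year instance.
company_year = [
    {"company_id": "c1", "earnings_2000": 10, "earnings_2001": 12, "earnings_2002": 15},
]

earnings = unpivot(company_year)
print(earnings)

# On earnings(company_id, year, amount) a cross-year query is a plain filter and sum.
total_since_2001 = sum(amount for _, year, amount in earnings if year >= 2001)
print(total_since_2001)
```

On the crosstab itself, the same query would have to name each year's column explicitly and be edited whenever a new year is added.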

Modeling Temporal Data
- Temporal data have an associated time interval during which the data are valid
- A snapshot is the value of the data at a particular point in time
- Adding a temporal component causes functional dependencies like customer_id → customer_street, customer_city not to hold, because the address varies over time
- A temporal functional dependency holds on schema R if the corresponding functional dependency holds on all snapshots for all legal instances r(R)
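The difference between an ordinary and a temporal functional dependency can be illustrated on one instance. This is a sketch only: it checks a single history rather than all legal instances, and the function names, time points, and address values are hypothetical.

```python
def holds_fd(rel, lhs, rhs):
    """True if lhs -> rhs holds on rel, a list of dicts."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def holds_temporal_fd(history, lhs, rhs):
    """history maps a time point to the snapshot valid at that time."""
    return all(holds_fd(snapshot, lhs, rhs) for snapshot in history.values())


# Hypothetical history: customer C-1 moves between the two snapshots.
history = {
    2000: [{"customer_id": "C-1", "customer_street": "Main", "customer_city": "Burnaby"}],
    2001: [{"customer_id": "C-1", "customer_street": "North", "customer_city": "Surrey"}],
}
lhs, rhs = ["customer_id"], ["customer_street", "customer_city"]

all_rows = [t for snapshot in history.values() for t in snapshot]
print(holds_fd(all_rows, lhs, rhs))          # False: the address varies over time
print(holds_temporal_fd(history, lhs, rhs))  # True: it holds on every snapshot
```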

Summary and To-Do List
- Multivalued dependencies and 4NF
- ER model and normalization
- Normalization and denormalization
- To do: why is a relation in 4NF also in BCNF?

The Banking Schema
branch = (branch_name, branch_city, assets)
customer = (customer_id, customer_name, customer_street, customer_city)
loan = (loan_number, amount)
account = (account_number, balance)
employee = (employee_id, employee_name, telephone_number, start_date)
dependent_name = (employee_id, dname)
account_branch = (account_number, branch_name)
loan_branch = (loan_number, branch_name)
borrower = (customer_id, loan_number)
depositor = (customer_id, account_number)
cust_banker = (customer_id, employee_id, type)
works_for = (worker_employee_id, manager_employee_id)
payment = (loan_number, payment_number, payment_date, payment_amount)
savings_account = (account_number, interest_rate)
checking_account = (account_number, overdraft_amount)