Department of Computer and Information Science

Examination paper for TDT4145 Data Modelling, Databases and Database Management Systems

Academic contact during examination:
Svein Erik Bratsberg: 995 39 963
Roger Midtstraum: 995 72 420

Examination date: 1st of June 2015
Examination time (from-to): 09:00-13:00
Permitted examination support material: D. No written and handwritten examination support materials are permitted. A specified, simple calculator is permitted.

Other information:
Language: English
Number of pages: 6
Number of pages of attachments: 0

Checked by: Svein-Olaf Hvasshovd (sign.)

Problem 1 Data Models (20 %)

a) (8 %) The ER diagram below gives an example of an ER diagram with a specialization. Show how you would translate this ER model into a schema for a relational database (show the necessary tables with attributes, primary keys and foreign keys). Give a brief overview of the alternatives you have considered and explain why you consider your proposal to be the best choice.

b) (12 %) The following ER diagram provides a data model for a simple vehicle database application. Your task is to expand this ER model to accommodate data about accidents in which one or more vehicles are involved. For an accident we would like to store a descriptive text, GPS position, date, time and accident type (material damage, personal injury, lethal). It should be possible to register all vehicles and persons involved in an accident. For a vehicle involved in an accident, we must be able to store the driver, the passengers in the vehicle, if any, and the damage inflicted on the vehicle (none, minor, major, wrecked). For a person, we will register the level of injury (none, minor, major, life threatening, fatal) suffered in the accident. For a driver it must be possible to register whether the person had a valid driver's license or not, and whether the person was impaired by alcohol or other drugs (unknown, sober, intoxicated). You may use all ER modelling concepts from the curriculum. State any assumptions you find necessary.

Problem 2 Relational Algebra and SQL (20 %)

Use the following relational database (primary keys are underlined), which is designed to store data for a simple bird observations application:

BirdGroups(BGID, GroupName)

BirdSpecies(BSID, SpeciesName, PrevalenceStatus, Prevalence, BirdGroupID)
BirdGroupID is a foreign key referencing the BirdGroups table. BirdGroupID is not allowed to have the NULL value.

Observation(ObsNo, ObsDay, ObsMonth, ObsYear, SpeciesID, LocationNo, BirdCount)
SpeciesID is a foreign key referencing the BirdSpecies table. LocationNo is a foreign key referencing the Location table. SpeciesID and LocationNo are not allowed to have the NULL value.

Location(LNo, LocName, Description, MunicipalityNo)
MunicipalityNo is a foreign key referencing the Municipality table. MunicipalityNo is not allowed to have the NULL value.

Municipality(Mno, MunName)

Relational algebra can be stated as text or as graphs. If you master both notations, we encourage you to state your answers in graphical form. However, a correct query stated as text will get full score.

a) (3 %) Write a query in relational algebra that finds the species name of all bird species that are migratory (PrevalenceStatus has the value "migratory").

b) (4 %) Write a query in relational algebra that finds the species name of all bird species that have been observed in Trondheim municipality in 2015.

c) (4 %) Write a query in SQL that finds the species name of all bird species in the "Thrushes" bird group. The result should be sorted in alphabetical order.

d) (4 %) Write a query in SQL that finds the species name, the number of observations of that species in 2015, and the total number of observed birds (bird count) of that species in 2015, for all bird species observed in 2015. The result should be sorted on the number of birds in descending order.

e) (5 %) Write a query in SQL that finds the species name of all bird species that have been observed in Trondheim municipality, but have not been observed there in 2015.
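
For trying out answers to Problem 2, the schema above can be loaded into an in-memory SQLite database. The following is only a sketch under stated assumptions: the column types are guesses (the exam does not give them), the first attribute of each relation is assumed to be the underlined primary key, and the helper name `make_db` is hypothetical.

```python
import sqlite3

# Column types and the choice of the first attribute as primary key are
# assumptions; the exam only lists attribute names and foreign keys.
SCHEMA = """
CREATE TABLE BirdGroups (
    BGID INTEGER PRIMARY KEY,
    GroupName TEXT
);
CREATE TABLE BirdSpecies (
    BSID INTEGER PRIMARY KEY,
    SpeciesName TEXT,
    PrevalenceStatus TEXT,
    Prevalence TEXT,
    BirdGroupID INTEGER NOT NULL REFERENCES BirdGroups(BGID)
);
CREATE TABLE Municipality (
    Mno INTEGER PRIMARY KEY,
    MunName TEXT
);
CREATE TABLE Location (
    LNo INTEGER PRIMARY KEY,
    LocName TEXT,
    Description TEXT,
    MunicipalityNo INTEGER NOT NULL REFERENCES Municipality(Mno)
);
CREATE TABLE Observation (
    ObsNo INTEGER PRIMARY KEY,
    ObsDay INTEGER,
    ObsMonth INTEGER,
    ObsYear INTEGER,
    SpeciesID INTEGER NOT NULL REFERENCES BirdSpecies(BSID),
    LocationNo INTEGER NOT NULL REFERENCES Location(LNo),
    BirdCount INTEGER
);
"""


def make_db():
    """Return an in-memory SQLite connection with the exam schema loaded."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With a connection from `make_db()`, a few made-up rows can be inserted and candidate SQL queries run with `conn.execute(...)`. The NOT NULL and foreign key constraints stated in the problem text are then checked by SQLite.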

Problem 3 Theory (20 %)

a) (3 %) Consider the table Birds(SpeciesID, SpeciesName, BirdGroup, Status), which contains the following data:

Find the set of functional dependencies that you think should hold for this table. State any assumptions you find necessary. Do not include trivial functional dependencies, or functional dependencies that can be deduced from other functional dependencies in your answer.

b) (3 %) Consider R = {A, B, C, D, E} and F = {A -> B, B -> C, CD -> E}. Find all candidate keys in R. You have to explain your answer.

c) (4 %) Consider R = {A, B, C, D, E} and F = {A -> B, B -> C, CD -> E}. Assume that R fulfils the requirements for first normal form. Decide which normal form is the highest normal form satisfied by R. You have to explain your answer.

d) (6 %) Consider R = {A, B, C, D, E} and F = {A -> B, B -> C, CD -> E}. A possible decomposition of R is R1(A, B), R2(B, C) and R3(C, D, E). Is this a good decomposition? You have to explain your answer.

e) (4 %) Consider the table Teaching(Lecturer, Subject, Campus). Assume that the multivalued dependencies Lecturer ->> Subject and Lecturer ->> Campus are valid, and that the following tuples (rows) are stored in the table: (svein, db, gløs), (rune, db, gjøvik), (tore, db, kalvskinnet) and (mads, os, gløs). Draw the table with columns and rows and show all the data (rows) that necessarily have to be stored in the table.
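
Parts b) to d) all rest on computing attribute closures under F. As an illustration only, not an official solution, the standard closure algorithm and a brute-force candidate key search can be sketched as follows. The function names `closure` and `candidate_keys` are hypothetical, and each functional dependency is encoded as a pair of attribute strings.

```python
from itertools import combinations

# R and F as given in Problem 3 b) to d); "CD" means the attribute set {C, D}.
ATTRS = "ABCDE"
FDS = [("A", "B"), ("B", "C"), ("CD", "E")]


def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of keys already found, since they are not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) == set(attrs):
                keys.append(combo)
    return keys


print(sorted(closure("A", FDS)))   # ['A', 'B', 'C']
print(candidate_keys(ATTRS, FDS))  # [('A', 'D')]
```

The brute-force search is exponential in the number of attributes, which is acceptable for five attributes. The written answer still needs the argument behind the result, for example that A and D appear on no right-hand side and must therefore be part of every key.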

Problem 4 Storage and scaling (5 %)

You work in a company that is launching a web shop where the number of customers and orders is uncertain. The database consists of the tables Customer, Order, Product and OrderLine. Customers are identified by email address, and almost all accesses to the tables are done directly by primary key, i.e., email address, customer number, order number and product number. Which storage/indexing structure(s) do you think it is wise to use when you are unsure whether there will be 1000 or 1000000 customers? Explain your answer.

Problem 5 B+-trees and query processing (15 %)

Assume the following table:

    CREATE TABLE Employee (empno INT PRIMARY KEY, lastname CHAR(30), firstname CHAR(30), email CHAR(30), startyear INT, salary INT);

Assume the table is stored in a clustered B+-tree with 1500 blocks on the leaf level. Empno is the search key in the B+-tree. We assume the B+-tree has height 3.

a) (10 %) Estimate how many blocks are accessed (read and written) by each of the following SQL statements:

    i. INSERT INTO Employee VALUES (123123, 'Hansen', 'Hans', 'hans@email.org', 2015, 100000);
    ii. SELECT lastname, firstname, email, startyear, salary FROM Employee WHERE empno = 123123;
    iii. SELECT * FROM Employee;
    iv. SELECT COUNT(*) FROM Employee WHERE empno > 100000;

Assume that the first employee has empno = 10000 and that all numbers up to 123123 are used. Give an explanation for each answer.

b) (5 %) If the query SELECT * FROM Employee WHERE startyear = 2015 is run frequently, would you change the storage/indexing? Why/why not?

Problem 6 Transactions and recoverability (10 %)

Assume the following schedules:

    S1: r1(a); w1(a); r2(b); c1; r2(a); w2(a); c2;
    S2: r1(a); w1(a); w2(a); c2; c1;
    S3: r1(a); r2(c); r1(c); r3(a); r3(b); w1(a); w3(b); r2(b); w2(c); w2(b); c1; c2; c3;

Decide the recoverability properties of the schedules (unrecoverable, recoverable, ACA, strict). Give an explanation for each case.
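
The four recoverability classes in Problem 6 can be checked mechanically from their definitions. The sketch below is an illustration, not an official solution. It assumes schedules written in the notation above, contains no handling of aborts (none occur in S1 to S3), and uses the hypothetical function name `classify`. It returns the strongest class a schedule satisfies.

```python
import re


def classify(schedule):
    """Return 'strict', 'ACA', 'recoverable' or 'unrecoverable'."""
    # Each operation is r/w/c, a transaction number and an optional item.
    ops = re.findall(r"([rwc])(\d+)(?:\((\w+)\))?", schedule)
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()  # transactions that have committed so far
    dirty_reads = {}   # reader -> writers it read uncommitted data from
    recoverable = aca = strict = True

    for kind, txn, item in ops:
        if kind == "c":
            # Recoverable: every writer we read from must commit before us.
            if any(w not in committed for w in dirty_reads.get(txn, ())):
                recoverable = False
            committed.add(txn)
            continue
        writer = last_writer.get(item)
        uncommitted_other = (
            writer is not None and writer != txn and writer not in committed
        )
        if kind == "r" and uncommitted_other:
            # Reading uncommitted data breaks both ACA and strictness.
            aca = strict = False
            dirty_reads.setdefault(txn, set()).add(writer)
        elif kind == "w":
            # Overwriting uncommitted data breaks strictness only.
            if uncommitted_other:
                strict = False
            last_writer[item] = txn

    if strict:
        return "strict"
    if aca:
        return "ACA"
    return "recoverable" if recoverable else "unrecoverable"


print(classify("r1(a); w1(a); r2(b); c1; r2(a); w2(a); c2;"))  # strict
print(classify("r1(a); w1(a); w2(a); c2; c1;"))                # ACA
```

The explanation asked for in the problem corresponds to the point where a flag flips: a read of an item whose last writer has not committed, a write over such an item, or a commit that precedes the commit of a transaction that was read from.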

Problem 7 Transactions - recovery (10 %)

Assume the use of ARIES recovery and let A, B, and C be data elements. After a crash the following log was found. There is one log record in each row of the table:

    LSN  Last_lsn  Transaction  OpType      Page_id  Other_info
    101  0         T1           Update      C
    102  0         T2           Update      B
    103  101       T1           Commit
    104                         Begin_ckpt
    105                         End_ckpt
    106  0         T3           Update      A
    107  102       T2           Update      C
    108  107       T2           Commit

a) Assume that T1, T2 and T3 are the transactions that exist and that the transaction table in log record 105 is this:

    Transaction  Last_lsn  Status
    T1           103       Commit
    T2           102       In progress

The Dirty Page Table that exists in log record 105 is this one:

    Page_id  Rec_lsn
    C        101
    B        102

What are the contents of the transaction table and the Dirty Page Table after the analysis phase of the recovery?

b) Which log records will be added to the log during the undo phase of recovery?
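
The bookkeeping of the analysis phase in part a) can be traced with a short simulation. This is a simplified sketch, not an official solution. It assumes that analysis starts from the tables stored in the End_ckpt record, that a committed transaction stays in the transaction table with status Commit (the log contains no end records), and that a page enters the Dirty Page Table with the LSN of the first update that dirtied it. Textbook variants differ on some of these details, and the name `analysis` is hypothetical.

```python
# Log records after the checkpoint, copied from the log in Problem 7:
# (LSN, Last_lsn, Transaction, OpType, Page_id)
LOG_AFTER_CHECKPOINT = [
    (106, 0, "T3", "Update", "A"),
    (107, 102, "T2", "Update", "C"),
    (108, 107, "T2", "Commit", None),
]

# Tables stored in the End_ckpt record (LSN 105), as given in part a).
CKPT_TXN_TABLE = {
    "T1": {"last_lsn": 103, "status": "Commit"},
    "T2": {"last_lsn": 102, "status": "In progress"},
}
CKPT_DIRTY_PAGES = {"C": 101, "B": 102}


def analysis(txn_table, dirty_pages, log):
    """Scan the log forward and return the updated (txn_table, dirty_pages)."""
    txn_table = {txn: dict(entry) for txn, entry in txn_table.items()}
    dirty_pages = dict(dirty_pages)
    for lsn, _last_lsn, txn, op, page in log:
        # A transaction first seen after the checkpoint is added as active.
        entry = txn_table.setdefault(
            txn, {"last_lsn": 0, "status": "In progress"}
        )
        entry["last_lsn"] = lsn
        if op == "Commit":
            entry["status"] = "Commit"
        elif op == "Update":
            # Keep the existing Rec_lsn if the page is already dirty.
            dirty_pages.setdefault(page, lsn)
    return txn_table, dirty_pages


txn_table, dirty_pages = analysis(
    CKPT_TXN_TABLE, CKPT_DIRTY_PAGES, LOG_AFTER_CHECKPOINT
)
for txn, entry in sorted(txn_table.items()):
    print(txn, entry["last_lsn"], entry["status"])
print(dirty_pages)
```

Under these assumptions, the transactions still marked In progress at the end of the scan are the ones the undo phase in part b) has to roll back.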