ITM661 Database Systems
Lecture 3/1 The Relational Model: Relational Algebra and Relational Calculus
T. Connolly and C. Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 5th edition, Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).
T. Connolly and C. Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 4th edition, Addison-Wesley, 2004. ISBN: 0-321-21025-5.
R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 5th edition, Pearson, 2007. ISBN: 0-321-41506-X.
Objectives
Overview of the relational model, e.g., how tables represent data.
The connection between mathematical relations and relations in the relational model.
Properties of database relations.
How to identify candidate, primary, and foreign keys.
The meaning of entity integrity and referential integrity.
How to form queries in relational algebra.
How relational calculus queries are expressed.
The purpose and advantages of views in relational systems.
ITS322 - DBMSs The Relation Model
An Example (Branch and Staff Relations)
A relation is a table with columns and rows. This view applies only to the logical structure of the database, not the physical structure.
An attribute is a named column of a relation.
A domain is the set of allowable values for one or more attributes.
A tuple is a row of a relation.
The degree of a relation is the number of attributes it contains.
The cardinality of a relation is the number of tuples it contains.
A relational database is a collection of normalized relations.
Terminology for Relational Model and Attribute Domain (tables not reproduced)
Mathematical Relations (I)
Mathematical definition of a relation.
Consider two sets D1 = {2, 4} and D2 = {1, 3, 5}.
The Cartesian product D1 × D2 is the set of all ordered pairs whose first element is a member of D1 and whose second element is a member of D2.
Equivalently, form all combinations of elements with the first from D1 and the second from D2:
D1 × D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}
Another example: D3 = {Mr. X, Ms. Y} and D4 = {M, F} (M = Male, F = Female).
D3 × D4 = {(Mr. X, M), (Mr. X, F), (Ms. Y, M), (Ms. Y, F)}
Mathematical Relations (II)
Any subset of the Cartesian product is a relation, for example:
R = {(2, 1), (4, 1)}
We may specify which pairs are in the relation by giving a selection condition. For example, the second element is 1:
R = {(x, y) | x ∈ D1, y ∈ D2, and y = 1}
Using the same sets, form another relation S in which the first element is always twice the second:
S = {(x, y) | x ∈ D1, y ∈ D2, and x = 2y}
Only one ordered pair in the Cartesian product satisfies this condition:
S = {(2, 1)}
Mathematical Relations (III)
Consider three sets D1, D2, and D3 with Cartesian product D1 × D2 × D3.
For example, D1 = {1, 3}, D2 = {2, 4}, D3 = {5, 6}:
D1 × D2 × D3 = {(1, 2, 5), (1, 2, 6), (1, 4, 5), (1, 4, 6), (3, 2, 5), (3, 2, 6), (3, 4, 5), (3, 4, 6)}
Any subset of these ordered triples is a relation.
T = {(x, y, z) | x ∈ D1, y ∈ D2, z ∈ D3, and y = 2x and z = 3y}
T = {(1, 2, 6)}
Mathematical Relations (IV)
To define a general relation on n domains, let D1, D2, ..., Dn be n sets with Cartesian product defined as
D1 × D2 × ... × Dn = {(d1, d2, ..., dn) | d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn}
usually written as ×_{i=1}^{n} D_i.
Any set of n-tuples drawn from this Cartesian product is a relation on the n sets.
In defining relations we specify the sets, or domains, from which we choose values.
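The definitions on the Mathematical Relations slides can be checked with a short Python sketch. It uses the same sets as the slides; the names `E1`, `E2`, `E3` are introduced here only to avoid reusing `D1` and `D2` for the three-set example.

```python
from itertools import product

D1 = {2, 4}
D2 = {1, 3, 5}

# Cartesian product D1 × D2: every ordered pair (x, y) with x in D1 and y in D2.
cartesian = set(product(D1, D2))

# A relation is any subset of the Cartesian product, here chosen by a condition.
R = {(x, y) for (x, y) in cartesian if y == 1}
S = {(x, y) for (x, y) in cartesian if x == 2 * y}

# Three-set example: E1, E2, E3 stand for the slide's D1 = {1, 3}, D2 = {2, 4}, D3 = {5, 6}.
E1, E2, E3 = {1, 3}, {2, 4}, {5, 6}
T = {(x, y, z) for (x, y, z) in product(E1, E2, E3) if y == 2 * x and z == 3 * y}

print(sorted(cartesian))
print(sorted(R), sorted(S), sorted(T))
```

The set comprehensions mirror the set-builder notation directly: the part after `if` plays the role of the condition to the right of the bar.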
Database Relations and Their Properties
Relation schema: a named relation defined by a set of attribute and domain name pairs.
Relational database schema: a set of relation schemas, each with a distinct name.
Properties of relations:
The relation name is distinct from all other relation names in the relational schema.
Each cell of a relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
Values of an attribute are all from the same domain.
The order of attributes has no significance.
Each tuple is distinct; there are no duplicate tuples.
The order of tuples has no significance, theoretically.
Relational Keys (I)
Superkey: an attribute or a set of attributes that uniquely identifies a tuple within a relation.
Candidate key: a superkey K such that no proper subset of K is a superkey within the relation. A candidate key K for a relation R has two properties:
Uniqueness: in each tuple of R, the values of K uniquely identify that tuple.
Irreducibility: no proper subset of K has the uniqueness property.
Relational Keys (II)
Primary key: the candidate key selected to identify tuples uniquely within a relation.
Alternate keys: candidate keys that are not selected to be the primary key.
Foreign key: an attribute or set of attributes within one relation that matches the candidate key of some (possibly the same) relation.
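The uniqueness and irreducibility properties can be illustrated with a small Python sketch. The helper names `is_superkey` and `candidate_keys` and the sample rows are hypothetical. Note the important caveat: a key is a property of the relation's meaning, not of one instance, so a check against sample data can only rule keys out, never prove them.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (uniqueness)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Superkeys of this instance with no proper subset that is a superkey (irreducibility)."""
    found = []
    # Smaller attribute sets are tried first, so any reducible superkey
    # already has one of its minimal subsets in `found`.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_superkey(rows, attrs) and not any(set(k) < set(attrs) for k in found):
                found.append(attrs)
    return found

# Hypothetical Staff instance; the values are illustrative only.
staff = [
    {"staffNo": "S1", "fName": "Ann", "branchNo": "B1"},
    {"staffNo": "S2", "fName": "Ann", "branchNo": "B2"},
    {"staffNo": "S3", "fName": "Bob", "branchNo": "B1"},
]

keys = candidate_keys(staff, ["staffNo", "fName", "branchNo"])
print(keys)
```

In this tiny instance (fName, branchNo) also happens to be unique, which shows why the semantics of the data, not a sample, must decide the real candidate keys.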
Relational Integrity (I)
Null: represents a value for an attribute that is currently unknown or is not applicable for this tuple.
A null deals with incomplete or exceptional data.
A null represents the absence of a value and is not the same as zero or spaces, which are values.
Relational Integrity (II)
Entity integrity: in a base relation, no attribute of a primary key can be null.
Referential integrity: if a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation, or the foreign key value must be wholly null.
Enterprise constraints: additional rules specified by users or database administrators.
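The two integrity rules can be expressed as simple checks. The following Python sketch is only illustrative: null is modelled as `None`, and the function names and sample rows are assumptions, not part of any DBMS.

```python
def violates_entity_integrity(rows, primary_key):
    """Rows in which some attribute of the primary key is null (None)."""
    return [r for r in rows if any(r[a] is None for a in primary_key)]

def violates_referential_integrity(child_rows, fk, parent_rows, key):
    """Child rows whose foreign key is neither wholly null nor equal to a parent key value."""
    parent_keys = {tuple(p[a] for a in key) for p in parent_rows}
    bad = []
    for r in child_rows:
        value = tuple(r[a] for a in fk)
        if all(v is None for v in value):
            continue  # a wholly null foreign key is allowed
        if value not in parent_keys:
            bad.append(r)
    return bad

# Hypothetical Branch and Staff instances; the values are illustrative only.
branch = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]
staff = [
    {"staffNo": "S1", "branchNo": "B1"},
    {"staffNo": "S2", "branchNo": None},  # not yet assigned to a branch: allowed
    {"staffNo": "S3", "branchNo": "B9"},  # no such branch: violation
]

entity_problems = violates_entity_integrity(staff, ["staffNo"])
reference_problems = violates_referential_integrity(staff, ["branchNo"], branch, ["branchNo"])
print(entity_problems, reference_problems)
```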
Relational Algebra and Calculus
Relational Algebra
Unary Relational Operations
Relational Algebra Operations From Set Theory
Binary Relational Operations
Additional Relational Operations
Examples of Queries in Relational Algebra
Relational Calculus
Tuple Relational Calculus
Domain Relational Calculus
Relational Algebra and Calculus
Relational algebra and relational calculus are formal languages associated with the relational model.
Informally, relational algebra is a (high-level) procedural language and relational calculus is a non-procedural language.
Formally, however, the two are equivalent to one another.
A language that can produce any relation derivable using relational calculus is said to be relationally complete.
Relational Algebra
Relational algebra operations work on one or more relations to define another relation without changing the original relations.
Thus, both operands and results are relations, so the output from one operation can become the input to another operation.
This allows expressions to be nested, just as in arithmetic. This property is called closure.
Relational Algebra (Overview)
Relational algebra consists of several groups of operations.
Unary Relational Operations: SELECT (symbol: σ (sigma)), PROJECT (symbol: π (pi)), RENAME (symbol: ρ (rho))
Relational Algebra Operations From Set Theory: UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, −), CARTESIAN PRODUCT (×)
Binary Relational Operations: JOIN (several variations of JOIN exist), DIVISION
Additional Relational Operations: OUTER JOINS, OUTER UNION, AGGREGATE FUNCTIONS
Aggregate functions compute summary information: for example, SUM, COUNT, AVG, MIN, MAX.
Relational Algebra
Five basic operations in relational algebra perform most of the data retrieval operations needed:
Selection, Projection, Cartesian Product, Union, Set Difference.
Other operations can be expressed in terms of the five basic operations:
Join, Intersection, Division.
Relational Algebra Operations (figure not reproduced)
An Example (Home Rental Database) (figure not reproduced)
Selection (or Restriction)
σ_{predicate}(R)
The Selection operation works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).
Ex.: List all staff with a salary greater than 10,000.
σ_{salary > 10000}(Staff)
Projection
π_{col1, ..., coln}(R)
The Projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
Ex.: Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary details.
π_{staffNo, fName, lName, salary}(Staff)
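Selection and Projection can be sketched in a few lines of Python, modelling a relation as a list of dictionaries. This is a minimal illustration, not how a DBMS implements the operations; the function names and the Staff values are hypothetical.

```python
# Hypothetical Staff tuples; the values are illustrative only.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee", "position": "Manager", "salary": 30000},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray", "position": "Assistant", "salary": 9000},
    {"staffNo": "S3", "fName": "Cy", "lName": "Fox", "position": "Assistant", "salary": 9000},
]

def select(relation, predicate):
    """σ_predicate(R): keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """π_attrs(R): keep only the named attributes and eliminate duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

high_paid = select(staff, lambda row: row["salary"] > 10000)
positions = project(staff, ["position", "salary"])

# Closure: the result of one operation is a relation, so it can feed another.
names_of_high_paid = project(select(staff, lambda row: row["salary"] > 10000), ["fName", "lName"])
print(high_paid, positions, names_of_high_paid)
```

Projecting onto (position, salary) returns two tuples rather than three, because the two identical Assistant rows collapse into one.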
Cartesian Product
R × S
The Cartesian product operation defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.
Ex.: List the names and comments of all renters who have viewed a property.
(π_{clientNo, fName, lName}(Client)) × (π_{clientNo, propertyNo, comment}(Viewing))
Example - Cartesian Product and Selection
Use the Selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.
σ_{Client.clientNo = Viewing.clientNo}((π_{clientNo, fName, lName}(Client)) × (π_{clientNo, propertyNo, comment}(Viewing)))
Note: a Cartesian product followed by a Selection can be reduced to a single operation called a join.
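The combination of Cartesian product and Selection can be sketched in Python as follows. Attribute names are prefixed with the relation name so that the two clientNo columns stay distinct, as in the expression above. The helper `cartesian_product` and the sample tuples are hypothetical.

```python
# Hypothetical Client and Viewing tuples; the values are illustrative only.
client = [
    {"clientNo": "C1", "fName": "Ann"},
    {"clientNo": "C2", "fName": "Bob"},
]
viewing = [
    {"clientNo": "C1", "propertyNo": "P1", "comment": "too small"},
    {"clientNo": "C1", "propertyNo": "P2", "comment": "no garden"},
]

def cartesian_product(r, s, r_name, s_name):
    """R × S: concatenate every tuple of R with every tuple of S, qualifying attribute names."""
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a in r
        for b in s
    ]

product_rows = cartesian_product(client, viewing, "Client", "Viewing")

# Selection over the product keeps only the pairs that belong together.
matched = [row for row in product_rows if row["Client.clientNo"] == row["Viewing.clientNo"]]
print(len(product_rows), len(matched))
```

The product has 2 × 2 = 4 tuples, of which only the 2 with equal client numbers are meaningful; that filtering step is what a join performs in one operation.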
Union
R ∪ S
The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.
R and S must be union-compatible.
If R and S have I and J tuples, respectively, the union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.
Ex.: List all cities where there is either a branch office or a property for rent.
π_{city}(Branch) ∪ π_{city}(PropertyForRent)
Set Difference
R − S
Defines a relation consisting of the tuples that are in relation R, but not in S.
R and S must be union-compatible.
Ex.: List all cities where there is a branch office but no properties for rent.
π_{city}(Branch) − π_{city}(PropertyForRent)
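Because union-compatible relations are sets of tuples over the same attributes, Union and Set Difference map directly onto Python's built-in set operators. The city values below are hypothetical sample data.

```python
# π_city(Branch) and π_city(PropertyForRent) as sets of 1-tuples (illustrative values).
branch_cities = {("London",), ("Aberdeen",), ("Glasgow",), ("Bristol",)}
property_cities = {("Aberdeen",), ("London",), ("Glasgow",)}

# Union: tuples in either relation, duplicates eliminated.
either = branch_cities | property_cities

# Set difference: tuples in the first relation but not in the second.
branch_only = branch_cities - property_cities

# The union never has more than I + J tuples.
assert len(either) <= len(branch_cities) + len(property_cities)
print(sorted(either), sorted(branch_only))
```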
Join Operations
Join is a derivative of Cartesian product.
It is equivalent to performing a Selection, using the join predicate as the selection formula, over the Cartesian product of the two operand relations.
It is one of the most difficult operations to implement efficiently in a relational DBMS and one of the reasons why RDBMSs have intrinsic performance problems.
Join Operations
There are various forms of join operation:
Theta-join
Equi-join (a particular type of theta-join)
Natural join
Outer join
Semi-join
Theta-join (θ-join)
R ⋈_{F} S
Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.
The predicate F is of the form R.a_i θ S.b_i, where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).
Theta-join (θ-join)
We can rewrite the theta-join in terms of the basic Selection and Cartesian product operations:
R ⋈_{F} S = σ_{F}(R × S)
The degree of a theta-join is the sum of the degrees of the operand relations R and S.
If the predicate F contains only equality (=), the term equi-join is used.
Example - Equi-join
List the names and comments of all clients who have viewed a property for rent.
(π_{clientNo, fName, lName}(Client)) ⋈_{Client.clientNo = Viewing.clientNo} (π_{clientNo, propertyNo, comment}(Viewing))
Natural Join
R ⋈ S
The natural join is an equi-join of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
Ex.: List the names and comments of all clients who have viewed a property for rent.
(π_{clientNo, fName, lName}(Client)) ⋈ (π_{clientNo, propertyNo, comment}(Viewing))
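The difference between an equi-join and a natural join shows up in the degree of the result. The Python sketch below assumes relations are lists of dictionaries. The functions `theta_join` and `natural_join` are hypothetical helpers, and for the theta-join the Viewing column is renamed to `vClientNo` so that the two operands share no attribute names.

```python
def theta_join(r, s, predicate):
    """R ⋈_F S = σ_F(R × S); assumes R and S have no attribute names in common."""
    return [{**a, **b} for a in r for b in s if predicate(a, b)]

def natural_join(r, s):
    """Equi-join over all common attributes, keeping one copy of each common attribute."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

# Hypothetical Client and Viewing tuples; the values are illustrative only.
client = [
    {"clientNo": "C1", "fName": "Ann"},
    {"clientNo": "C2", "fName": "Bob"},
]
viewing = [
    {"clientNo": "C1", "propertyNo": "P1", "comment": "too small"},
    {"clientNo": "C1", "propertyNo": "P2", "comment": "no garden"},
]

# Equi-join: rename Viewing.clientNo first so both copies survive in the result.
viewing_renamed = [
    {"vClientNo": v["clientNo"], "propertyNo": v["propertyNo"], "comment": v["comment"]}
    for v in viewing
]
equi = theta_join(client, viewing_renamed, lambda a, b: a["clientNo"] == b["vClientNo"])

natural = natural_join(client, viewing)
print(len(equi[0]), len(natural[0]))
```

The equi-join result has degree 2 + 3 = 5, while the natural join has degree 4 because the duplicate clientNo column is dropped.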
Outer Join
Often in joining two relations, a tuple in one relation has no matching tuple in the other; that is, there is no matching value in the join columns.
To display rows in the result that do not have matching values in the join column, we use the outer join.
R ⟕ S
The (left) outer join is a join in which tuples from R that do not have matching values in the common columns of S are also included in the result relation. Missing values from S are set to null.
Example - Left Outer Join
Produce a status report on property viewings.
π_{propertyNo, street, city}(PropertyForRent) ⟕ Viewing
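A left outer join can be sketched in Python by padding unmatched left-hand tuples with nulls (`None`). The function `left_outer_join` and the sample tuples are hypothetical; the attribute lists are passed explicitly so that the padding works even when the right-hand relation is empty.

```python
def left_outer_join(r, s, r_attrs, s_attrs):
    """R ⟕ S over the common attributes; unmatched tuples of R are padded with None."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    result = []
    for a in r:
        matches = [b for b in s if all(a[c] == b[c] for c in common)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No viewing for this property: keep it, with nulls for the Viewing attributes.
            result.append({**a, **{x: None for x in s_only}})
    return result

# Hypothetical PropertyForRent and Viewing tuples; the values are illustrative only.
property_for_rent = [
    {"propertyNo": "P1", "city": "Glasgow"},
    {"propertyNo": "P2", "city": "Aberdeen"},
]
viewing = [
    {"propertyNo": "P1", "clientNo": "C1", "comment": "too small"},
]

report = left_outer_join(
    property_for_rent, viewing,
    ["propertyNo", "city"], ["propertyNo", "clientNo", "comment"],
)
print(report)
```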
Semi-join
R ⋉_{F} S
The semi-join operation defines a relation that contains the tuples of R that participate in the join of R with S.
We can rewrite the semi-join using Projection and Join:
R ⋉_{F} S = π_{A}(R ⋈_{F} S), where A is the set of all attributes of R.
Ex.: List complete details of all staff who work at the branch in Glasgow.
Staff ⋉_{Staff.branchNo = Branch.branchNo ∧ Branch.city = 'Glasgow'} Branch
Intersection
R ∩ S
The intersection operation consists of the set of all tuples that are in both R and S.
R and S must be union-compatible.
Expressed using basic operations:
R ∩ S = R − (R − S)
Ex.: List all cities where there is both a branch office and at least one property for rent.
π_{city}(Branch) ∩ π_{city}(PropertyForRent)
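The identity R ∩ S = R − (R − S) is easy to confirm on sample data. The following Python lines use hypothetical city values and only the set difference operator to build the intersection.

```python
# Illustrative values for π_city(Branch) and π_city(PropertyForRent).
R = {("London",), ("Aberdeen",), ("Glasgow",), ("Bristol",)}
S = {("Aberdeen",), ("London",), ("Glasgow",)}

# R − S removes what R shares with S; removing that remainder from R leaves the shared tuples.
via_difference = R - (R - S)

print(sorted(via_difference))
print(via_difference == R & S)
```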
Division
R ÷ S
Assume R is defined over the attribute set A and S over the attribute set B, with B ⊆ A, and let C = A − B (the attributes of R that are not attributes of S).
The division operation consists of the set of tuples from R defined over the attributes C that match the combination of every tuple in S.
Expressed using basic operations:
T1 = π_{C}(R)
T2 = π_{C}((S × T1) − R)
T = T1 − T2
Example - Division
Identify all clients who have viewed all properties with three rooms.
(π_{clientNo, propertyNo}(Viewing)) ÷ (π_{propertyNo}(σ_{rooms = 3}(PropertyForRent)))
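The three-step definition of division (T1, T2, T) can be followed literally in Python. This sketch assumes R is a set of (clientNo, propertyNo) pairs and S a set of propertyNo values; the function name `divide` and the data are hypothetical.

```python
def divide(r, s):
    """R ÷ S for R given as (c, d) pairs and S given as d values."""
    t1 = {c for (c, d) in r}                        # T1 = π_C(R)
    missing = {(c, d) for c in t1 for d in s} - r   # (S × T1) − R: required pairs that are absent
    t2 = {c for (c, d) in missing}                  # T2 = π_C((S × T1) − R)
    return t1 - t2                                  # T = T1 − T2

# Hypothetical data: who viewed which property, and which properties have three rooms.
viewing = {("C1", "P1"), ("C1", "P2"), ("C2", "P1"), ("C3", "P2")}
three_room_properties = {"P1", "P2"}

clients = divide(viewing, three_room_properties)
print(clients)
```

T2 collects every client for whom at least one required pairing is missing, so subtracting it from T1 leaves only the clients paired with every property in S.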
Relational Algebra - Aggregate Function
Aggregate Function Operation (ℱ)
ℱ_{MAX Salary}(EMPLOYEE) retrieves the maximum Salary value from the EMPLOYEE relation.
ℱ_{MIN Salary}(EMPLOYEE) retrieves the minimum Salary value from the EMPLOYEE relation.
ℱ_{SUM Salary}(EMPLOYEE) retrieves the sum of the Salary values from the EMPLOYEE relation.
ℱ_{COUNT SSN, AVERAGE Salary}(EMPLOYEE) computes the count (number) of employees and their average salary.
Note: COUNT just counts the number of rows, without removing duplicates.
Relational Algebra - Aggregate Function: Group by Dno (figure not reproduced)
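The aggregate functions, with and without grouping by Dno, can be sketched in Python. The EMPLOYEE tuples below are hypothetical and only the attributes needed for the example are included.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE tuples; the values are illustrative only.
employee = [
    {"ssn": "1", "salary": 30000, "dno": 5},
    {"ssn": "2", "salary": 40000, "dno": 5},
    {"ssn": "3", "salary": 25000, "dno": 4},
    {"ssn": "4", "salary": 25000, "dno": 4},
]

salaries = [e["salary"] for e in employee]

# Aggregates over the whole relation.
max_salary = max(salaries)
min_salary = min(salaries)
sum_salary = sum(salaries)
count_ssn = len(employee)            # counts rows; duplicates are not removed
average_salary = sum_salary / count_ssn

# Grouping by dno: apply COUNT and AVERAGE separately within each department.
groups = defaultdict(list)
for e in employee:
    groups[e["dno"]].append(e["salary"])
per_department = {dno: (len(s), sum(s) / len(s)) for dno, s in groups.items()}

print(max_salary, min_salary, sum_salary, count_ssn, average_salary)
print(per_department)
```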
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it. There is no description of how to evaluate a query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
When applied to databases, relational calculus takes two forms: tuple-oriented and domain-oriented.
Relational Calculus
If a predicate contains a variable, as in 'x is a member of staff', there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other values, it may be false.
If P is a predicate, then we write the set of all x such that P is true for x as
{x | P(x)}
Predicates can be connected using ∧ (AND), ∨ (OR), and ~ (NOT).
Tuple-oriented Relational Calculus
We are interested in finding tuples for which a predicate is true. The calculus is based on the use of tuple variables.
A tuple variable is a variable that ranges over a named relation, i.e., a variable whose only permitted values are tuples of the relation.
To specify the range of a tuple variable S as the Staff relation, write:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple-oriented Relational Calculus (Examples and quantifiers)
Examples of tuple-oriented relational calculus:
To find details of all staff earning more than 10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
We can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
Existential quantifier (Tuple-oriented Relational Calculus)
The existential quantifier is used in formulae that must be true for at least one instance, such as:
{S | Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')}
This means 'There exists a Branch tuple that has the same branchNo as the branchNo of the current Staff tuple, S, and is located in London.'
Universal quantifier (Tuple-oriented Relational Calculus)
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris.'
We can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an address in Paris.'
Tuple-oriented Relational Calculus
Formulae should be unambiguous and make sense.
General form:
{S1.a1, S2.a2, ..., Sn.an | F(S1, S2, ..., Sn)}
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2, where θ is one of the comparison operators (<, >, and so on)
Si.a1 θ c, where c is a constant
An atom is a formula. We can recursively build up formulae from atoms:
If F1 and F2 are formulae, so are their conjunction, F1 ∧ F2; disjunction, F1 ∨ F2; and negation, ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Tuple-oriented Relational Calculus (An example I)
List the names of all managers who earn more than 25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple-oriented Relational Calculus (An example II)
List the names of staff who currently do not manage any properties.
{S.fName, S.lName | Staff(S) ∧ (~(∃P)(PropertyForRent(P) ∧ (S.staffNo = P.staffNo)))}
Or, equivalently, using the universal quantifier:
{S.fName, S.lName | Staff(S) ∧ ((∀P)(~PropertyForRent(P) ∨ ~(S.staffNo = P.staffNo)))}
Tuple-oriented Relational Calculus (An example III)
List the names of clients who have viewed a property for rent in Glasgow.
{C.fName, C.lName | Client(C) ∧ ((∃V)(∃P)(Viewing(V) ∧ PropertyForRent(P) ∧ (C.clientNo = V.clientNo) ∧ (V.propertyNo = P.propertyNo) ∧ P.city = 'Glasgow'))}
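Tuple calculus expressions read much like Python comprehensions: the tuple variable becomes the loop variable, ∃ corresponds to `any(...)`, and ∀ corresponds to `all(...)`. The sketch below uses hypothetical Staff and PropertyForRent tuples and evaluates the 'manages a property in Glasgow' and 'manages no properties' queries, the latter in both its ~∃ and ∀ forms.

```python
# Hypothetical Staff and PropertyForRent tuples; the values are illustrative only.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Lee"},
    {"staffNo": "S2", "fName": "Bob", "lName": "Ray"},
    {"staffNo": "S3", "fName": "Cy", "lName": "Fox"},
]
property_for_rent = [
    {"propertyNo": "P1", "city": "Glasgow", "staffNo": "S1"},
    {"propertyNo": "P2", "city": "Aberdeen", "staffNo": "S2"},
]

# {S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ P.staffNo = S.staffNo ∧ P.city = 'Glasgow')}
glasgow_managers = [
    s for s in staff
    if any(p["staffNo"] == s["staffNo"] and p["city"] == "Glasgow" for p in property_for_rent)
]

# {S.fName, S.lName | Staff(S) ∧ ~(∃P)(PropertyForRent(P) ∧ S.staffNo = P.staffNo)}
no_properties_exists = [
    (s["fName"], s["lName"]) for s in staff
    if not any(p["staffNo"] == s["staffNo"] for p in property_for_rent)
]

# Same query with the universal quantifier; P already ranges over PropertyForRent here.
no_properties_forall = [
    (s["fName"], s["lName"]) for s in staff
    if all(p["staffNo"] != s["staffNo"] for p in property_for_rent)
]

print(glasgow_managers, no_properties_exists, no_properties_forall)
```

Both forms of the second query return the same result, reflecting the equivalence ~(∃P)(F) ≡ (∀P)(~F).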
Tuple-oriented Relational Calculus (An example IV)
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
This asks for every tuple that is not in the Staff relation. To avoid this, add the restriction that all values in the result must be values in the domain of the expression.
Domain Relational Calculus
Domain-oriented Relational Calculus
Uses variables that take values from domains instead of tuples of relations.
If F(d1, d2, ..., dn) stands for a formula composed of atoms and d1, d2, ..., dn represent domain variables, then
{d1, d2, ..., dn | F(d1, d2, ..., dn)}
is a general domain relational calculus expression.
Domain-oriented Relational Calculus (An example I)
Find the names of all managers who earn more than 25,000.
{fn, ln | (∃sn, posn, sex, DOB, sal, bn)(Staff(sn, fn, ln, posn, sex, DOB, sal, bn) ∧ posn = 'Manager' ∧ sal > 25000)}
List the staff who manage properties for rent in Glasgow.
{sn, fn, ln, posn, sex, DOB, sal, bn | (∃sn1, cty)(Staff(sn, fn, ln, posn, sex, DOB, sal, bn) ∧ PropertyForRent(pn, st, cty, pc, typ, rms, rnt, on, sn1, bn1) ∧ (sn = sn1) ∧ cty = 'Glasgow')}
Domain-oriented Relational Calculus (An example II)
List the names of staff who currently do not manage any properties for rent.
{fn, ln | (∃sn)(Staff(sn, fn, ln, posn, sex, DOB, sal, bn) ∧ (~(∃sn1)(PropertyForRent(pn, st, cty, pc, typ, rms, rnt, on, sn1, bn1) ∧ (sn = sn1))))}
List the names of clients who have viewed a property for rent in Glasgow.
{fn, ln | (∃cn, cn1, pn, pn1, cty)(Client(cn, fn, ln, tel, pt, mr) ∧ Viewing(cn1, pn1, dt, cmt) ∧ PropertyForRent(pn, st, cty, pc, typ, rms, rnt, on, sn, bn) ∧ (cn = cn1) ∧ (pn = pn1) ∧ cty = 'Glasgow')}
Domain-oriented Relational Calculus
When domain relational calculus is restricted to safe expressions, it is equivalent to tuple relational calculus restricted to safe expressions, which in turn is equivalent to relational algebra.
This means every relational algebra expression has an equivalent relational calculus expression, and vice versa.
Other Languages
Transform-oriented languages are non-procedural languages that use relations to transform input data into outputs (e.g., SQL).
Graphical languages provide the user with a picture or illustration of the structure of the relation. The user fills in an example of what is wanted and the system returns the required data in that format (e.g., QBE).
Fourth-generation languages (4GLs) can create a complete customized application using a limited set of commands in a user-friendly, often menu-driven environment.
Some systems accept a form of natural language, sometimes called a fifth-generation language (5GL). This development is still in its infancy.
Exercise
The following tables form part of a database held in a relational DBMS:
Hotel: (hotelno, hotelname, hoteladdress)
Room: (roomno, hotelno, type, price)
Booking: (hotelno, guestno, datefrom, dateto, roomno)
Guest: (guestno, guestname, guestaddress)
where the primary keys are underlined.
Generate the relational algebra, tuple-oriented calculus, and domain-oriented calculus expressions for the following queries:
1. List all hotels.
2. List all single rooms with a price below 20 per night.
3. List the names and addresses of all guests.
4. List the price and type of all rooms at the Grosvenor Hotel.
5. List all guests currently staying at the Grosvenor Hotel.
6. List the details of all rooms at the Grosvenor Hotel, including the name of the guest staying in the room, if the room is occupied.
7. List the guest details (guestno, guestname, and guestaddress) of all guests staying at the Grosvenor Hotel.