The Relational Algebra and Relational Calculus

Relational Algebra
Slide 6-2

Relational Query Languages
- Query languages allow manipulation and retrieval of data.
- They are not like general-purpose programming languages:
  - They are not intended to be used for complex calculations.
  - They support easy, efficient access to large data sets.
Slide 6-3

Formal Relational Query Languages
Two mathematical query languages:
- Relational algebra: more operational; useful for representing execution plans.
- Relational calculus: describes what users want, rather than how to compute it (non-operational, declarative).
Slide 6-4

Preliminaries
- A query is applied to relation instances, and the result is also a relation instance.
  - The schemas of the input relations are fixed.
  - The schema of the output is also fixed (determined by the definition of the query language constructs).
- Positional vs. named-column notation:
  - Positional notation is easier for formal definitions; named-column notation is more readable.
  - Both are used in SQL.
Slide 6-5

Unary Relational Operations: SELECT and PROJECT
- SELECT operation (σ): select a subset of rows from a relation.
- PROJECT operation (π): delete unwanted columns from a relation.
- RENAME operation (ρ): define new names for relations and/or attributes.
Slide 6-6

Selection Operation
σ <selection condition> (R)
- Selects the rows that satisfy the selection condition.
- The selection condition is a Boolean expression.
- No duplicates in the result.
- The schema of the result is identical to the schema of the input relation.
- The result relation can be the input to another relational algebra operation.
- The number of tuples in the result is no more than in the input relation.
- Commutative: σ <cond1> (σ <cond2> (R)) = σ <cond2> (σ <cond1> (R))
Slide 6-7

Projection Operation
π <attribute list> (R)
- Deletes the attributes that are not in the attribute list.
- No duplicates in the result (real systems typically do not eliminate duplicates unless asked to).
- The schema of the result contains exactly the columns in the attribute list, with the same names they had in the input relation.
- The result relation can be the input to another relational algebra operation.
- The number of tuples in the result is no more than in the input relation.
- π <list1> (π <list2> (R)) = π <list1> (R), provided every attribute in <list1> also appears in <list2>.
Slide 6-8

Rename Operation
ρ S(B1, ..., Bn) (R) or ρ S (R) or ρ (B1, ..., Bn) (R)
- Defines new names for the relation and/or its attributes, as S(B1, ..., Bn).
- SELECT without renaming: the attribute names in the result relation are the same as those in the original relation, in the same order.
- PROJECT without renaming: the result relation has the attribute names given in the projection list, in the order in which they appear in the list.
Slide 6-9
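
The three unary operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (attribute names, set of rows) representation, the function names, and the sample EMPLOYEE data are all made up for this sketch and do not come from the slides.

```python
# Hypothetical representation (not from the slides): a relation is a pair
# (attribute names, set of rows). A set of tuples cannot hold duplicates.

def select(relation, predicate):
    """sigma: keep the rows for which predicate(row-as-dict) is true."""
    attrs, rows = relation
    return attrs, {r for r in rows if predicate(dict(zip(attrs, r)))}

def project(relation, keep):
    """pi: keep only the attributes in `keep`, in that order."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}

def rename(relation, new_attrs):
    """rho: same rows, new attribute names (one per existing attribute)."""
    attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("rename needs exactly one new name per attribute")
    return tuple(new_attrs), rows

# Made-up sample data.
EMPLOYEE = (
    ("FNAME", "DNO", "SALARY"),
    {("John", 5, 30000), ("Joyce", 5, 25000),
     ("Ahmad", 4, 25000), ("Ramesh", 5, 38000)},
)

def in_dept5(t):
    return t["DNO"] == 5

def well_paid(t):
    return t["SALARY"] > 26000

# Selection is commutative: the order of the two filters does not matter.
one_way = select(select(EMPLOYEE, in_dept5), well_paid)
other_way = select(select(EMPLOYEE, well_paid), in_dept5)

# Projection can shrink the row count: two employees earn 25000.
salaries = project(EMPLOYEE, ["SALARY"])

# The output of one operation can feed the next.
names_in_dept5 = rename(project(select(EMPLOYEE, in_dept5), ["FNAME"]), ["NAME"])
```

Storing rows in a Python set gives the set semantics of the formal algebra; a real system working on multisets would need an explicit duplicate-elimination step after projection.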

Sequences of Operations
- Apply several relational algebra operations one after the other.
- Create intermediate result relations.
- E.g. π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))
Slide 6-10

Relational Algebra Operations from Set Theory
- Union operation: ∪
- Intersection operation: ∩
- Minus operation: −
- Cartesian product (or cross product) operation: ×
Slide 6-11

Relational Algebra Operations from Set Theory (Cont.)
- Union, intersection, and minus take two input relations, which must be union-compatible:
  - Same number of columns.
  - Corresponding columns have the same type.
- Cartesian product also takes two input relations, but they need not be union-compatible.
- Commutative operations:
  - R ∪ S = S ∪ R
  - R ∩ S = S ∩ R
- Associative operations:
  - R ∪ (S ∪ T) = (R ∪ S) ∪ T
  - (R ∩ S) ∩ T = R ∩ (S ∩ T)
Slide 6-12
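
As a rough illustration of union compatibility and the three set operations that require it, here is a minimal Python sketch. The (schema, rows) representation, the helper names, and the sample relations are assumptions made for this example; taking the result's column names from the first operand is a common convention, assumed here rather than stated on the slides.

```python
# Hypothetical representation: (schema, rows), where schema is a tuple of
# (column name, column type) pairs and rows is a set of tuples.

def union_compatible(r, s):
    """Same number of columns, and corresponding columns have the same type."""
    r_types = [col_type for _, col_type in r[0]]
    s_types = [col_type for _, col_type in s[0]]
    return r_types == s_types  # list equality checks length and each position

def _set_op(r, s, op):
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")
    # Assumed convention: the result takes its column names from r.
    return r[0], op(r[1], s[1])

def union(r, s):
    return _set_op(r, s, lambda a, b: a | b)

def intersection(r, s):
    return _set_op(r, s, lambda a, b: a & b)

def minus(r, s):
    return _set_op(r, s, lambda a, b: a - b)

# Made-up sample relations.
STUDENT = ((("FN", str), ("LN", str)),
           {("Susan", "Yao"), ("Amy", "Ford")})
INSTRUCTOR = ((("FNAME", str), ("LNAME", str)),
              {("Susan", "Yao"), ("John", "Smith")})
PROJECT = ((("PNUMBER", int),), {(1,), (2,)})

everyone = union(STUDENT, INSTRUCTOR)
both = intersection(STUDENT, INSTRUCTOR)
only_students = minus(STUDENT, INSTRUCTOR)
only_instructors = minus(INSTRUCTOR, STUDENT)
```

Column names play no part in the compatibility check, so STUDENT and INSTRUCTOR are compatible even though their columns are named differently. Minus, unlike union and intersection, is not commutative, as the last two results show.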

Cartesian Product
S × R
- Each row of S is paired with each row of R.
- The result schema has one column per column of S and R, with column names inherited if possible.
- Conflict: both S and R have a column with the same name.
  - Rename: ρ(result(position1 → name1, position2 → name2), S × R)
- The result relation has |S| × |R| tuples.
Slide 6-13

Binary Relational Operations: JOIN and DIVISION
- Join operation
- Equijoin operation
- Natural join operation
- Division operation
Slide 6-14

Condition Join Operation
R ⋈ <join condition> S = σ <join condition> (R × S)
- The result schema is the same as that of the cross-product.
- The result usually has fewer tuples than the cross-product (never more), so it might be possible to compute it more efficiently.
- Called a theta-join.
Slide 6-15
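
The definition of the condition join as a selection over a cross-product can be written down almost literally. The sketch below assumes a made-up (attribute names, set of rows) representation and sample relations R and S. Instead of renaming automatically, it simply rejects operands whose column names clash.

```python
from itertools import product

# Hypothetical representation: (attribute names, set of rows).

def cross(r, s):
    """Cartesian product: every row of r paired with every row of s."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    clash = set(r_attrs) & set(s_attrs)
    if clash:
        # Name conflict: the caller is expected to rename first.
        raise ValueError(f"rename conflicting columns first: {sorted(clash)}")
    return r_attrs + s_attrs, {x + y for x, y in product(r_rows, s_rows)}

def theta_join(r, s, condition):
    """Condition join, written as a selection applied to the cross-product."""
    attrs, rows = cross(r, s)
    return attrs, {row for row in rows if condition(dict(zip(attrs, row)))}

# Made-up sample relations.
R = (("a", "b"), {(1, "x"), (2, "y"), (3, "z")})
S = (("c", "d"), {(2, "p"), (3, "q")})

pairs = cross(R, S)                                   # 3 * 2 = 6 rows
joined = theta_join(R, S, lambda t: t["a"] < t["c"])  # keeps 3 of the 6
```

This version materializes the whole cross-product before filtering, which is the definition rather than an efficient plan; a real system would typically avoid building the full product.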

Equijoin and Natural Join Operations
R * <join attribute> S
- Equijoin: a special case of condition join in which the join condition contains only equalities.
- Natural join: an equijoin in which equalities are specified on all fields that have the same name in R and S.
- Inner joins: condition join, equijoin, and natural join.
Slide 6-16

Division Operation
R ÷ S
- Not supported as a primitive operator, but useful in some cases.
- Given two relations R(x, y) and S(y): R ÷ S contains all x tuples such that for every y tuple in S, there is an <x, y> tuple in R.
- In general, x and y can be any lists of columns; y is the list of columns of S, and x ∪ y is the list of columns of R.
Slide 6-17

Additional Relational Operations
- Aggregate functions and grouping
- Outer join operation
- Outer union operation
Slide 6-18
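
Natural join and division are easier to follow with a small worked example. The following Python sketch assumes a made-up (attribute names, set of rows) representation; the relation names and data are hypothetical and chosen only to exercise the two definitions above.

```python
# Hypothetical representation: (attribute names, set of rows).

def natural_join(r, s):
    """Equijoin on all same-named attributes; shared columns appear once."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out = set()
    for x in r_rows:
        xd = dict(zip(r_attrs, x))
        for y in s_rows:
            yd = dict(zip(s_attrs, y))
            if all(xd[a] == yd[a] for a in common):
                out.add(x + tuple(yd[a] for a in s_only))
    return r_attrs + tuple(s_only), out

def divide(r, s):
    """R / S: the x values that are paired in R with every row of S."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    x_attrs = tuple(a for a in r_attrs if a not in s_attrs)
    x_idx = [r_attrs.index(a) for a in x_attrs]
    y_idx = [r_attrs.index(a) for a in s_attrs]
    pairs = {(tuple(row[i] for i in x_idx), tuple(row[i] for i in y_idx))
             for row in r_rows}
    candidates = {x for x, _ in pairs}
    return x_attrs, {x for x in candidates
                     if all((x, y) in pairs for y in s_rows)}

# Made-up sample relations.
EMP = (("name", "dno"), {("John", 5), ("Ahmad", 4), ("Eve", 7)})
DEPT = (("dno", "dname"), {(5, "Research"), (4, "Admin")})
WORKS_ON = (("essn", "pno"),
            {("e1", "p1"), ("e1", "p2"), ("e2", "p1"),
             ("e3", "p1"), ("e3", "p2")})
WANTED = (("pno",), {("p1",), ("p2",)})

emp_dept = natural_join(EMP, DEPT)       # Eve has no matching department
on_all_wanted = divide(WORKS_ON, WANTED)  # employees on both p1 and p2
```

Division answers "for all" questions: here, which employees work on every project listed in WANTED. Employee e2 works only on p1 and is therefore excluded.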

Aggregate Functions and Grouping
<grouping attributes> ℱ <function list> (R)
- Group the tuples in a relation by the value of some of their attributes and apply an aggregate function independently to each group.
- Aggregate functions: SUM, AVERAGE, MAXIMUM, MINIMUM, and COUNT.
- Duplicates are not eliminated when an aggregate function is applied.
- The result of an aggregation is a relation.
Slide 6-19

Outer Join Operation
R ⟕ S (left), R ⟗ S (full), R ⟖ S (right)
- Keep all the tuples in R (left), all those in both relations (full), or all those in S (right) in the result relation, regardless of whether or not they have matching tuples in the other relation.
- Tuples without a match are padded with null values for the attributes of the other relation.
Slide 6-20

Outer Union Operation
- Takes the union of tuples from two relations that are not union-compatible.
- R(x, y) and S(x, z) are partially compatible:
  - x is represented only once in the result.
  - y and z are also kept in the result relation T(x, y, z).
Slide 6-21
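
Grouping with aggregation and the left outer join can be sketched as follows. The (attribute names, set of rows) representation, the (output name, function, attribute) triples used to describe the function list, and the sample data are all assumptions made for this illustration. Python's None stands in for a null value.

```python
from collections import defaultdict

# Hypothetical representation: (attribute names, set of rows).

def aggregate(relation, group_by, functions):
    """Group rows by `group_by`, then apply each (name, fn, attr) per group."""
    attrs, rows = relation
    groups = defaultdict(list)
    for row in rows:
        t = dict(zip(attrs, row))
        groups[tuple(t[a] for a in group_by)].append(t)
    out_attrs = tuple(group_by) + tuple(name for name, _, _ in functions)
    out_rows = {
        # The value list keeps duplicates, e.g. two equal salaries.
        key + tuple(fn([t[a] for t in members]) for _, fn, a in functions)
        for key, members in groups.items()
    }
    return out_attrs, out_rows

def left_outer_join(r, s, condition):
    """Keep every row of r; pad with None where no row of s matches."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    attrs = r_attrs + s_attrs  # assumes the attribute names do not clash
    out = set()
    for x in r_rows:
        matches = {x + y for y in s_rows
                   if condition(dict(zip(attrs, x + y)))}
        out |= matches or {x + (None,) * len(s_attrs)}
    return attrs, out

def average(values):
    return sum(values) / len(values)

# Made-up sample relations.
EMPLOYEE = (("SSN", "DNO", "SALARY"),
            {(1, 5, 30000), (2, 5, 30000), (3, 5, 60000), (4, 4, 25000)})
WORKER = (("name", "dno"), {("John", 5), ("Eve", 7)})
DEPT = (("dnumber", "dname"), {(5, "Research")})

per_dept = aggregate(EMPLOYEE, ["DNO"],
                     [("COUNT_SSN", len, "SSN"),
                      ("AVG_SALARY", average, "SALARY")])
with_dept = left_outer_join(WORKER, DEPT,
                            lambda t: t["dno"] == t["dnumber"])
```

In department 5 the salary 30000 occurs twice and both occurrences enter the average, which is what "duplicates are not eliminated" means in practice. A right outer join is the same function with the operands swapped and the columns reordered.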

Relational Calculus
Slide 6-22

Relational Calculus
- Comes in two flavors: Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC).
- The calculus has variables, constants, comparison operators, logical connectives, and quantifiers.
  - TRC: variables range over tuples.
  - DRC: variables range over domains.
- Expressions in the calculus are called formulas.
- An answer tuple is essentially an assignment of constants to variables that makes the formula evaluate to true.
Slide 6-23

Tuple Relational Calculus
{t | COND(t)}
Need to specify:
- The tuple variable t.
- The range relation R of t, written R(t).
- The condition(s) used to select t.
- The requested attributes to be retrieved.
Slide 6-24
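
A Python set comprehension has roughly the same shape as a TRC query, which makes it a convenient way to read the notation. The sketch below is only an analogy under stated assumptions: the Employee tuple type and its data are made up, and a comprehension is executed by iterating over the relation, whereas the calculus itself says nothing about how the answer is computed.

```python
from collections import namedtuple

# Made-up relation; each element plays the role of one tuple t.
Employee = namedtuple("Employee", "FNAME LNAME SALARY DNO")
EMPLOYEE = {
    Employee("John", "Smith", 30000, 5),
    Employee("Alicia", "Zelaya", 25000, 4),
    Employee("James", "Borg", 55000, 1),
    Employee("Ramesh", "Narayan", 38000, 5),
}

# {t | EMPLOYEE(t) AND t.SALARY > 50000}
# tuple variable: t; range relation: EMPLOYEE; condition: SALARY > 50000
high_earners = {t for t in EMPLOYEE if t.SALARY > 50000}

# {t.FNAME, t.LNAME | EMPLOYEE(t) AND t.DNO = 5}
# same pieces, plus a list of requested attributes
dept5_names = {(t.FNAME, t.LNAME) for t in EMPLOYEE if t.DNO == 5}
```

The four items the slide asks for map onto the comprehension directly: the variable after `for`, the relation after `in`, the condition after `if`, and the requested attributes in the leading expression.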

Syntax of TRC
- An atomic formula:
  - t ∈ R, where R is a relation
  - t.a op s.b, t.a op constant, or constant op t.a, where op ∈ {<, >, ≤, ≥, =, ≠}
- A formula is recursively defined as:
  - Any atomic formula
  - NOT p, p AND q, p OR q, where p and q are formulas
  - ∃t (p(t)), where t is a tuple variable
  - ∀t (p(t)), where t is a tuple variable
Slide 6-25

Existential and Universal Quantifiers
- Existential quantifier: ∃
- Universal quantifier: ∀
- A tuple variable t is bound if it is quantified (appears in (∃t) or (∀t)); otherwise, it is free.
Slide 6-26

Transforming Universal and Existential Quantifiers
- (∀x) (P(x)) ≡ NOT (∃x) (NOT (P(x)))
- (∃x) (P(x)) ≡ NOT (∀x) (NOT (P(x)))
- (∀x) (P(x) AND Q(x)) ≡ NOT (∃x) (NOT (P(x)) OR NOT (Q(x)))
- (∀x) (P(x) OR Q(x)) ≡ NOT (∃x) (NOT (P(x)) AND NOT (Q(x)))
- (∃x) (P(x) OR Q(x)) ≡ NOT (∀x) (NOT (P(x)) AND NOT (Q(x)))
- (∃x) (P(x) AND Q(x)) ≡ NOT (∀x) (NOT (P(x)) OR NOT (Q(x)))
- (∀x) (P(x)) ⇒ (∃x) (P(x))
- NOT (∃x) (P(x)) ⇒ NOT (∀x) (P(x))
Slide 6-27
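
The quantifier transformations can be sanity-checked by brute force on a small finite domain, where ∃ corresponds to Python's `any` and ∀ to `all`. This is a check on one made-up three-element domain, not a proof; the helper names are hypothetical. The two implications at the end of the list rely on the domain being non-empty, which holds here.

```python
from itertools import product

DOMAIN = ("a", "b", "c")  # small, non-empty, made-up domain

def exists(p):
    """Existential quantifier over DOMAIN."""
    return any(p(x) for x in DOMAIN)

def forall(p):
    """Universal quantifier over DOMAIN."""
    return all(p(x) for x in DOMAIN)

def all_predicates():
    """Yield every possible predicate on DOMAIN (2**3 = 8 truth tables)."""
    for truth in product([False, True], repeat=len(DOMAIN)):
        table = dict(zip(DOMAIN, truth))
        yield table.__getitem__

def check_equivalences():
    """Assert each transformation for every predicate pair; return True."""
    for P in all_predicates():
        assert forall(P) == (not exists(lambda x: not P(x)))
        assert exists(P) == (not forall(lambda x: not P(x)))
        # forall P implies exists P (needs a non-empty domain)
        assert (not forall(P)) or exists(P)
        # NOT exists P implies NOT forall P
        assert exists(P) or (not forall(P))
        for Q in all_predicates():
            assert forall(lambda x: P(x) and Q(x)) == (
                not exists(lambda x: (not P(x)) or (not Q(x))))
            assert forall(lambda x: P(x) or Q(x)) == (
                not exists(lambda x: (not P(x)) and (not Q(x))))
            assert exists(lambda x: P(x) or Q(x)) == (
                not forall(lambda x: (not P(x)) and (not Q(x))))
            assert exists(lambda x: P(x) and Q(x)) == (
                not forall(lambda x: (not P(x)) or (not Q(x))))
    return True

ok = check_equivalences()
```

All eight rules are instances of De Morgan's laws lifted to quantifiers: pushing NOT through a quantifier flips ∀ to ∃ (and back), and pushing it through AND or OR flips the connective.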

Semantics of TRC Queries
The answer to a TRC query {t | p(t)} is the set of all tuples t for which the formula p(t) evaluates to true, where p(t) is built from:
- t ∈ R, where R is a relation
- t.a op s.b, t.a op constant, or constant op t.a
- NOT p, p AND q, p OR q, where p and q are formulas
- ∃t (p(t)), where t is a tuple variable
- ∀t (p(t)), where t is a tuple variable
Slide 6-28

Domain Relational Calculus
- A query has the form: {<x1, ..., xn> | p(<x1, ..., xn>)}
- The answer includes all tuples <x1, ..., xn> that make the formula p(<x1, ..., xn>) true.
- A formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or comparing values) and building larger formulas using the logical connectives.
Slide 6-29

DRC Formulas
- An atomic formula:
  - <x1, ..., xn> ∈ R, where R is a relation with n attributes and each xi is either a variable or a constant
  - X op Y, X op constant, or constant op X, where X and Y are domain variables
- A formula is recursively defined as:
  - Any atomic formula
  - NOT p, p AND q, p OR q, where p and q are formulas
  - ∃X (p(X)), where X is a domain variable
  - ∀X (p(X)), where X is a domain variable
Slide 6-30
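
To contrast DRC with TRC, the sketch below uses one Python variable per column instead of one variable per tuple. The relation and its data are made up, and the comprehension is again only an analogy: unpacking a row into (f, l, s, d) plays the role of the atomic formula <f, l, s, d> ∈ EMPLOYEE, and variables that are unpacked but not returned correspond to existentially quantified domain variables.

```python
# Made-up relation with columns (first name, last name, salary, department).
EMPLOYEE = {
    ("John", "Smith", 30000, 5),
    ("Alicia", "Zelaya", 25000, 4),
    ("James", "Borg", 55000, 1),
}

# {<f, l> | (exists s)(exists d)(<f, l, s, d> in EMPLOYEE AND s > 26000)}
# f and l are free (they appear in the answer); s and d are bound.
well_paid = {(f, l) for (f, l, s, d) in EMPLOYEE if s > 26000}

# {<f> | (exists l)(exists s)(<f, l, s, 5> in EMPLOYEE)}
# A constant (5) in the last position of the atomic formula.
in_dept5 = {f for (f, l, s, d) in EMPLOYEE if d == 5}
```

The same queries in TRC would use a single variable t and refer to t.salary or t.dno; the two styles differ in what the variables range over, not in what can be expressed.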

Unsafe Queries, Expressive Power
- It is possible to write syntactically correct calculus queries that have an infinite number of answers. Such queries are called unsafe.
- It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true.
- Relational completeness: a query language (e.g., SQL) is relationally complete if it can express every query that is expressible in relational algebra/calculus.
Slide 6-31

Summary
- Relational calculus is non-operational: users define queries in terms of what they want, not in terms of how to compute it (declarative).
- Algebra and safe calculus have the same expressive power, leading to the notion of relational completeness.
Slide 6-32
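
A standard example of an unsafe query is {x | NOT R(x)}: everything that is not in R. The sketch below illustrates the problem under a simplifying assumption, namely that the variable is restricted to a finite, made-up domain so the query can be evaluated at all. The answer to the unsafe query then changes with the chosen domain, while the answer to a safe query does not.

```python
R = {1, 2, 3}  # made-up unary relation

def not_in_r(domain):
    """Evaluate {x | NOT R(x)} with x restricted to a finite domain."""
    return {x for x in domain if x not in R}

def in_r_and_even(domain):
    """Evaluate {x | R(x) AND x is even} over the same domain."""
    return {x for x in domain if x in R and x % 2 == 0}

small, large = range(10), range(1000)

unsafe_small = not_in_r(small)   # 7 answers
unsafe_large = not_in_r(large)   # 997 answers: grows with the domain

safe_small = in_r_and_even(small)
safe_large = in_r_and_even(large)  # same answer for either domain
```

Over an infinite domain such as the integers, the first query would have infinitely many answers. The second query is bounded by R itself, so its answer stays finite and depends only on the stored data.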