Relational Algebra
R & G, Chapter 4

"By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race."
-- Alfred North Whitehead (1861-1947)

Relational Query Languages
- Query languages: allow manipulation and retrieval of data from a database.
- Relational model supports simple, powerful QLs:
  - Strong formal foundation based on logic.
  - Allows for much optimization.
- Query languages != programming languages!
  - QLs not expected to be "Turing complete".
  - QLs not intended to be used for complex calculations.
  - QLs support easy, efficient access to large data sets.

Formal Relational Query Languages
Two mathematical query languages form the basis for "real" languages (e.g., SQL), and for implementation:
- Relational Algebra: more operational, very useful for representing execution plans.
- Relational Calculus: lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)
* Understanding algebra and calculus is key to understanding SQL and query processing!

Preliminaries
- A query is applied to relation instances, and the result of a query is also a relation instance.
  - Schemas of input relations for a query are fixed (but the query will run over any legal instance).
  - The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs.
- Positional vs. named-field notation:
  - Positional notation is easier for formal definitions; named-field notation is more readable.
  - Both are used in SQL, though positional notation is not encouraged.

Relational Algebra: 5 Basic Operations
- Selection ($\sigma$): selects a subset of rows from a relation (horizontal).
- Projection ($\pi$): retains only wanted columns from a relation (vertical).
- Cross-product ($\times$): allows us to combine two relations.
- Set-difference ($-$): tuples in r1, but not in r2.
- Union ($\cup$): tuples in r1 and/or in r2.
Since each operation returns a relation, operations can be composed! (Algebra is "closed".)

Example Instances

Boats:
bid   bname      color
101   Interlake  blue
102   Interlake  red
103   Clipper    green
104   Marine     red

Projection
- Examples: $\pi_{age}(S2)$; $\pi_{sname, rating}(S2)$
- Retains only attributes that are in the "projection list".
- Schema of result: exactly the fields in the projection list, with the same names that they had in the input relation.
- Projection operator has to eliminate duplicates. (How do they arise? Why remove them?)
  - Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it. (Why not?)

$\pi_{sname, rating}(S2)$:
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

$\pi_{age}(S2)$:
age
35.0
55.5

Selection ($\sigma$)
- Selects rows that satisfy a selection condition.
- Result is a relation. Schema of the result is the same as that of the input relation.
- Do we need to do duplicate elimination?
- Examples: $\sigma_{rating > 8}(S2)$; $\pi_{sname, rating}(\sigma_{rating > 8}(S2))$

$\pi_{sname, rating}(\sigma_{rating > 8}(S2))$:
sname   rating
yuppy   9
rusty   10

Union and Set-Difference
- Both of these operations take two input relations, which must be union-compatible:
  - Same number of fields.
  - "Corresponding" fields have the same type.
- For which, if any, is duplicate elimination required?
- Operators: union ($\cup$), set difference ($-$).
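
The operators above can be sketched in a few lines of Python. This is only an illustration: a relation is modeled as a schema (a tuple of field names) plus a set of row tuples, so duplicate elimination happens automatically. The helper names `project`, `select`, `union`, and `difference` are made up for this sketch, and the sid values and per-row ages in `S2_ROWS` are assumptions, since only the projected columns appear in the examples.

```python
# A relation is a schema (tuple of field names) plus a set of row tuples.
S2_SCHEMA = ("sid", "sname", "rating", "age")
S2_ROWS = {
    (28, "yuppy", 9, 35.0),    # sid and age values are assumed
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}


def project(schema, rows, fields):
    """pi: keep only `fields`; building a set removes duplicate rows."""
    idx = [schema.index(f) for f in fields]
    return tuple(fields), {tuple(row[i] for i in idx) for row in rows}


def select(schema, rows, predicate):
    """sigma: keep the rows for which predicate(row_as_dict) is true."""
    return schema, {row for row in rows if predicate(dict(zip(schema, row)))}


def _check_union_compatible(schema1, schema2):
    # Only the field count is checked here; a fuller version would also
    # compare the types of corresponding fields.
    if len(schema1) != len(schema2):
        raise ValueError("relations are not union-compatible")


def union(schema1, rows1, schema2, rows2):
    _check_union_compatible(schema1, schema2)
    return schema1, rows1 | rows2


def difference(schema1, rows1, schema2, rows2):
    _check_union_compatible(schema1, schema2)
    return schema1, rows1 - rows2


# pi_age(S2): four input rows collapse to two distinct ages.
_, ages = project(S2_SCHEMA, S2_ROWS, ["age"])

# pi_{sname,rating}(sigma_{rating>8}(S2)): operators compose.
sel_schema, high = select(S2_SCHEMA, S2_ROWS, lambda t: t["rating"] > 8)
_, names = project(sel_schema, high, ["sname", "rating"])

# Union and difference of two selections over the same schema.
_, old = select(S2_SCHEMA, S2_ROWS, lambda t: t["age"] > 40)
_, either = union(S2_SCHEMA, high, S2_SCHEMA, old)
_, high_not_old = difference(S2_SCHEMA, high, S2_SCHEMA, old)
```

Using a Python `set` for the rows mirrors the set semantics of the algebra; a real system working on bags (multisets) would need an explicit duplicate-elimination step instead.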

Cross-Product
- $S1 \times R1$: each row of S1 paired with each row of R1.
- Q: How many rows in the result?
- Result schema has one field per field of S1 and R1, with field names "inherited" if possible.
  - May have a naming conflict: both S1 and R1 have a field with the same name.
  - In this case, can use the renaming operator: $\rho(C(1 \to sid1,\ 5 \to sid2),\ S1 \times R1)$

Cross Product Example
$S1 \times R1$ has schema: (sid), sname, rating, age, (sid), bid, day

Compound Operator: Intersection
- In addition to the 5 basic operators, there are several additional "compound operators".
  - These add no computational power to the language, but are useful shorthands.
  - Can be expressed solely with the basic ops.
- Intersection ($\cap$) takes two input relations, which must be union-compatible.
- Q: How to express it using basic operators? $R \cap S = R - (R - S)$

Compound Operator: Join
- Joins are compound operators involving cross product, selection, and (sometimes) projection.
- Most common type of join is a "natural join" (often just called "join"). $R \bowtie S$ conceptually is:
  - Compute $R \times S$.
  - Select rows where attributes that appear in both relations have equal values.
  - Project all unique attributes and one copy of each of the common ones.
- Note: usually done much more efficiently than this.
- Useful for putting "normalized" relations back together.

Natural Join Example
$R1 \bowtie S1$, bid and day columns:
bid   day
101   10/10/96
103   11/12/96
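
The conceptual definition of natural join (cross product, then selection, then projection) and the identity $R \cap S = R - (R - S)$ can be sketched in Python as follows. This is a literal, inefficient transcription of the three steps, not how a real system evaluates joins. The function names are made up for this sketch; the bid and day values come from the example, while the sid values and the rows of `S1_ROWS` are assumptions.

```python
from itertools import product

R1_SCHEMA = ("sid", "bid", "day")
R1_ROWS = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}   # sids assumed

S1_SCHEMA = ("sid", "sname", "rating", "age")
S1_ROWS = {                                                # assumed instance
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}


def cross(rows_r, rows_s):
    """R x S: every row of R concatenated with every row of S."""
    return {r + s for r, s in product(rows_r, rows_s)}


def intersection(rows_r, rows_s):
    """R intersect S written with set difference only: R - (R - S)."""
    return rows_r - (rows_r - rows_s)


def natural_join(schema_r, rows_r, schema_s, rows_s):
    common = [f for f in schema_r if f in schema_s]
    n = len(schema_r)
    combined = schema_r + schema_s

    # Step 1: compute the cross product.
    pairs = cross(rows_r, rows_s)

    # Step 2: select rows that agree on every common attribute.
    matching = {
        row for row in pairs
        if all(row[schema_r.index(f)] == row[n + schema_s.index(f)]
               for f in common)
    }

    # Step 3: project all fields of R plus the non-common fields of S,
    # which keeps exactly one copy of each common attribute.
    keep = list(range(n)) + [
        n + i for i, f in enumerate(schema_s) if f not in common
    ]
    schema = tuple(combined[i] for i in keep)
    return schema, {tuple(row[i] for i in keep) for row in matching}


join_schema, join_rows = natural_join(R1_SCHEMA, R1_ROWS, S1_SCHEMA, S1_ROWS)
```

Answering the question on the cross-product slide: `cross(S1_ROWS, R1_ROWS)` has one row per pair, so its size is the product of the two input sizes.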

Other Types of Joins
- Condition join (or "theta-join"): $R \bowtie_c S = \sigma_c(R \times S)$
- Example: $S1 \bowtie_{S1.sid < R1.sid} R1$, with schema (sid), sname, rating, age, (sid), bid, day
- Result schema same as that of cross-product.
- May have fewer tuples than cross-product.
- Equi-join: special case where condition c contains only a conjunction of equalities.

Compound Operator: Division
- Useful for expressing "for all" queries like: Find sids of sailors who have reserved all boats.
- For A/B, the attributes of B are a subset of the attributes of A.
  - May need to "project" to make this happen.
- E.g., let A have 2 fields, x and y; B have only field y:
  $A/B = \{ \langle x \rangle \mid \forall \langle y \rangle \in B\ (\exists \langle x, y \rangle \in A) \}$
- A/B contains all tuples (x) such that for every y tuple in B, there is an xy tuple in A.

Examples of Division A/B

A:
sno   pno
s1    p1
s1    p2
s1    p3
s1    p4
s2    p1
s2    p2
s3    p2
s4    p2
s4    p4

B1 (pno): p2
B2 (pno): p2, p4
B3 (pno): p1, p2, p4

A/B1 (sno): s1, s2, s3, s4
A/B2 (sno): s1, s4
A/B3 (sno): s1

Expressing A/B Using Basic Operators
- Division is not an essential op; just a useful shorthand. (Also true of joins, but joins are so common that systems implement joins specially.)
- Idea: for A/B, compute all x values that are not "disqualified" by some y value in B.
  - An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.
- Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$
- A/B: $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$

Examples
Relations used: Reserves, Sailors, Boats

Boats:
bid   bname      color
101   Interlake  blue
102   Interlake  red
103   Clipper    green
104   Marine     red

Find names of sailors who've reserved boat #103
- Solution 1: $\pi_{sname}((\sigma_{bid=103} Reserves) \bowtie Sailors)$
- Solution 2: $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
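
The "disqualified x values" construction for division can be checked with a short Python sketch. The function name `divide` is illustrative, and the supplier and part labels (s1..s4, p1..p4) are assumed to match the division example. For brevity, A is a set of (x, y) pairs and B is a plain set of y values rather than a set of 1-tuples.

```python
from itertools import product

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}


def divide(a, b):
    """A/B using only projection, cross product, and set difference."""
    xs = {x for x, _ in a}                       # pi_x(A)
    all_pairs = set(product(xs, b))              # pi_x(A) x B
    disqualified = {x for x, _ in all_pairs - a} # pi_x((pi_x(A) x B) - A)
    return xs - disqualified                     # pi_x(A) - disqualified


result_b1 = divide(A, B1)
result_b2 = divide(A, B2)
result_b3 = divide(A, B3)
```

An x value survives only if none of its (x, y) pairs with y in B is missing from A, which is exactly the "for every y in B there is an xy tuple in A" condition.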

Find names of sailors who've reserved a red boat
- Information about boat color is only available in Boats, so we need an extra join:
  $\pi_{sname}((\sigma_{color='red'} Boats) \bowtie Reserves \bowtie Sailors)$
- A more efficient solution:
  $\pi_{sname}(\pi_{sid}((\pi_{bid}\, \sigma_{color='red'} Boats) \bowtie Reserves) \bowtie Sailors)$
* A query optimizer can find this given the first solution!

Find sailors who've reserved a red or a green boat
- Can identify all red or green boats, then find sailors who've reserved one of these boats:
  $\rho(Tempboats,\ (\sigma_{color='red' \lor color='green'} Boats))$
  $\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$

Find sailors who've reserved a red and a green boat
- Previous approach won't work! Must identify sailors who've reserved red boats, sailors who've reserved green boats, then find the intersection (note that sid is a key for Sailors):
  $\rho(Tempred,\ \pi_{sid}((\sigma_{color='red'} Boats) \bowtie Reserves))$
  $\rho(Tempgreen,\ \pi_{sid}((\sigma_{color='green'} Boats) \bowtie Reserves))$
  $\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$

Find the names of sailors who've reserved all boats
- Uses division; schemas of the input relations to / must be carefully chosen:
  $\rho(Tempsids,\ (\pi_{sid,bid} Reserves) / (\pi_{bid} Boats))$
  $\pi_{sname}(Tempsids \bowtie Sailors)$
- To find sailors who've reserved all "Interlake" boats:
  $\ldots / \pi_{bid}(\sigma_{bname='Interlake'} Boats)$

Summary
- Relational Algebra: a small set of operators mapping relations to relations.
  - Operational, in the sense that you specify the explicit order of operations.
  - A closed set of operators! Can mix and match.
- Basic ops include: $\sigma$, $\pi$, $\times$, $\cup$, $-$
- Important compound ops: $\cap$, $\bowtie$, $/$
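
To see why the "red or green" approach fails for "red and green", the example queries can be traced over a small instance in Python. This is a sketch under assumptions: the Boats rows come from the example table, but the `RESERVES` and `SAILORS` rows and the helper names are hypothetical, chosen only to make the difference visible.

```python
BOATS = {                       # (bid, bname, color)
    (101, "Interlake", "blue"),
    (102, "Interlake", "red"),
    (103, "Clipper", "green"),
    (104, "Marine", "red"),
}
RESERVES = {                    # hypothetical (sid, bid) pairs
    (22, 101), (22, 102), (22, 103), (22, 104),
    (31, 102),
    (58, 103),
}
SAILORS = {(22, "dustin"), (31, "lubber"), (58, "rusty")}  # hypothetical


def sids_reserving(color):
    """pi_sid((sigma_{color=...} Boats) join Reserves)."""
    bids = {bid for bid, _, c in BOATS if c == color}
    return {sid for sid, bid in RESERVES if bid in bids}


def names_of(sids):
    """pi_sname(sids join Sailors)."""
    return {sname for sid, sname in SAILORS if sid in sids}


# Selecting boats that are red AND green yields nothing: no single boat
# row has both colors, so the "previous approach" cannot be reused.
both_colors = {b for b in BOATS if b[2] == "red" and b[2] == "green"}

red_or_green = names_of(sids_reserving("red") | sids_reserving("green"))
red_and_green = names_of(sids_reserving("red") & sids_reserving("green"))

# "Reserved all boats": keep sids whose reserved bids cover every bid.
all_bids = {bid for bid, _, _ in BOATS}
reserved_all = names_of({
    sid for sid, _ in RESERVES
    if all_bids <= {b for s, b in RESERVES if s == sid}
})
```

Intersecting sid sets rather than sname sets matters here: sid is a key for Sailors, while two different sailors could share a name.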