Databases 2011 Christian S. Jensen Computer Science, Aarhus University
What is a Database?
  Dictionary entry: da·ta·base (\ˈdā-tə-ˌbās, ˈda- also ˈdä-\), noun, circa 1962: a usually large collection of data organized especially for rapid search and retrieval (as by a computer). Also used as a transitive verb.
  Queries are much more general than searching.
  Database Management System (DBMS): efficient, convenient, and safe storage of, and multi-user access to, massive amounts of persistent data.
  Examples: bank accounts, blog archives, Google.com, Amazon.com, the human genome, student records.
Data Model
  A (mathematical) representation of data: tables/relations; sets, multisets, lists; trees, graphs
  Operations on data: insert, delete, update, query
  Constraints on data: data types, uniqueness, dependencies
The Relational Data Model
  Data is stored in tables (relations):

    name     age  city
    Joe      22   London
    Jacques  27   Paris
    Jose     34   Madrid

  Simple but flexible, and supports many real-world applications.
  Terminology:
    schema: the header of the table (name, age, city)
    row (tuple): one line of the table, e.g. (Joe, 22, London)
    column: all values under one header, e.g. 22, 27, 34
    attribute: the name of a column, e.g. city
    attribute value: a single entry, e.g. Madrid
The Relational Data Model: Abstract Tables
  Tables are invariant under permutation of rows and columns: no information is stored in the order.
  These two tables represent the same relation:

    name     age  city          city    name     age
    Joe      22   London        Madrid  Jose     34
    Jacques  27   Paris         London  Joe      22
    Jose     34   Madrid        Paris   Jacques  27

  Duplicate rows may or may not be allowed.
NULL Values
  An attribute value may be NULL, meaning one of:
    it is unknown
    no value exists
    it is unknown or does not exist

    animal             color   zoo
    lion               yellow  Copenhagen
    crocodile          green   London
    Tyrannosaurus Rex  NULL    NULL
    polar bear         white   Berlin

  NULL values are treated specially.
Advantages of the Relational Model
  A simple, intuitive model
  Often convenient for real-life data, but richer models (e.g., XML) are useful in some settings
  An elegant mathematical foundation: set and multiset theory, relational algebra and calculi
  Allows efficient algorithms
  Industrial-strength implementations are available
Schemas
  Relation schema: name of the relation, names of the attributes, types of the attributes, constraints
  Database schema: the collection of all relation schemas
Running Example
  The database behind a tiny calendar system, with five tables: Rooms, People, Meetings, Participants, Equipment
Rooms
    room        capacity
    Turing-216  6
    Ada-333     26
    Store-Aud   286

  room: the name of a room
  capacity: the number of people that it will hold
People
    userid    name                 group  office
    csj       Christian S. Jensen  vip    Turing-216
    doina     Doina Bucur          phd    NULL
    bnielsen  Kai Birger Nielsen   tap    Hopper-017

  userid: unique user name
  name: ordinary name
  group: vip, tap, or phd
  office: a room or NULL
Meetings
    meetid  date        slot  owner    what
    34716   2010-08-22  14    csj      ddb
    34717   2010-08-22  15    csj      ddb
    42835   2010-08-17  10    ceikute  TA meeting

  meetid: a unique id
  date: the date of the meeting
  slot: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
  owner: the userid of the owner
  what: a textual description
Participants
    meetid  pid        status
    34716   Store-Aud  a
    34716   csj        a
    42835   radaelli   d

  meetid: the id of the meeting
  pid: a userid or a room
  status: u(nknown), a(ccept), d(ecline)
Equipment
    room        type
    Store-Aud   projector
    Store-Aud   whiteboard
    Hopper-017  mini-fridge

  room: the name of a room
  type: the type of equipment
SQL
  Structured Query Language
  Invented by IBM in the 1970s (many versions)
  High-level and declarative: no low-level manipulations
  Algebraic foundations
  Covers representations, operations, and constraints
  Query optimization
  Implementations include MySQL, DB2, Oracle, and SQL Server
Declaring Tables

    CREATE TABLE Rooms (
      room     VARCHAR(15),
      capacity INT
    );

    CREATE TABLE People (
      name    VARCHAR(40),
      office  VARCHAR(15),
      userid  VARCHAR(15),
      `group` CHAR(3)
    );

    CREATE TABLE Meetings (
      meetid INT,
      date   DATE,
      slot   INT,
      owner  VARCHAR(15),
      what   VARCHAR(40)
    );

    CREATE TABLE Participants (
      meetid INT,
      pid    VARCHAR(15),
      status CHAR(1)
    );

    CREATE TABLE Equipment (
      room VARCHAR(15),
      type VARCHAR(20)
    );
SQL Data Types
    type        example values
    INT         217
    CHAR(2)     'aa', 'ab', '12', '++'
    VARCHAR(5)  '', '12345', 'foo', 'x''y'
    FLOAT       3.14, 42, 0.0018
    DATE        '2008-08-25'
    TIME        '14:15:00'
    TEXT        a text file
    BLOB        a movie
Refinements
  NOT NULL: the value cannot be NULL
  DEFAULT value: a default value is specified
  UNIQUE: the value is unique in the table unless it is NULL
  PRIMARY KEY: the value is unique in the table and is never NULL; there is special syntax for multi-attribute primary keys
Refined Tables

    CREATE TABLE Rooms (
      room     VARCHAR(15) PRIMARY KEY,
      capacity INT NOT NULL
    );

    CREATE TABLE People (
      name    VARCHAR(40) NOT NULL,
      office  VARCHAR(15),
      userid  VARCHAR(15) PRIMARY KEY,
      `group` CHAR(3)
    );

    CREATE TABLE Meetings (
      meetid INT PRIMARY KEY,
      date   DATE,
      slot   INT,
      owner  VARCHAR(15) NOT NULL,
      what   VARCHAR(40)
    );

    CREATE TABLE Participants (
      meetid INT NOT NULL,
      pid    VARCHAR(15) NOT NULL,
      status CHAR(1) DEFAULT 'u'
    );

    CREATE TABLE Equipment (
      room VARCHAR(15) NOT NULL,
      type VARCHAR(20) NOT NULL,
      PRIMARY KEY (room, type)
    );
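The effect of these refinements can be tried out directly. The following is a minimal sketch that runs two of the refined declarations in SQLite through Python's standard sqlite3 module (an assumption; the slides target MySQL-style systems) and checks that a duplicate key and a NULL capacity are rejected while a missing status gets its default. The helper name rejected and the inserted rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rooms (
        room     VARCHAR(15) PRIMARY KEY,
        capacity INT NOT NULL
    );
    CREATE TABLE Participants (
        meetid INT NOT NULL,
        pid    VARCHAR(15) NOT NULL,
        status CHAR(1) DEFAULT 'u'
    );
""")
conn.execute("INSERT INTO Rooms VALUES ('Turing-216', 6)")


def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same primary key twice: violates PRIMARY KEY.
duplicate_key = rejected("INSERT INTO Rooms VALUES ('Turing-216', 8)")
# NULL capacity: violates NOT NULL.
missing_capacity = rejected("INSERT INTO Rooms VALUES ('Ada-333', NULL)")

# Omitting status lets the DEFAULT clause fill it in.
conn.execute("INSERT INTO Participants (meetid, pid) VALUES (34716, 'csj')")
default_status = conn.execute("SELECT status FROM Participants").fetchone()[0]

print(duplicate_key, missing_capacity, default_status)
```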
SELECT-FROM-WHERE
  The basic form of an SQL query:

    SELECT desired attributes
    FROM   one or more tables
    WHERE  condition about the involved rows
Simple Example
  Which meetings (what) has csj arranged?

    meetid  date        slot  owner    what
    34716   2010-08-22  14    csj      ddb
    34717   2010-08-22  15    csj      ddb
    42835   2010-08-17  10    ceikute  TA meeting

    SELECT what
    FROM Meetings
    WHERE owner = 'csj';

  Result:
    what
    ddb
    ddb
Loop Semantics for Single Table
  Loop through all rows in the table
  Check if the condition is true
  Project the rows onto the desired attributes
  Note that duplicates are kept
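The loop semantics can be sketched in a few lines of Python. This is only an illustration of the meaning of a single-table query, not of how a DBMS evaluates it; the function name select and the dictionary representation of rows are assumptions made for the sketch.

```python
meetings = [
    {"meetid": 34716, "date": "2010-08-22", "slot": 14, "owner": "csj", "what": "ddb"},
    {"meetid": 34717, "date": "2010-08-22", "slot": 15, "owner": "csj", "what": "ddb"},
    {"meetid": 42835, "date": "2010-08-17", "slot": 10, "owner": "ceikute", "what": "TA meeting"},
]


def select(rows, attributes, condition):
    """SELECT attributes FROM rows WHERE condition, by explicit looping."""
    result = []
    for row in rows:                    # loop through all rows
        if condition(row):              # check if the condition is true
            # project the row onto the desired attributes
            result.append(tuple(row[a] for a in attributes))
    return result                       # duplicates are kept


# SELECT what FROM Meetings WHERE owner = 'csj';
answer = select(meetings, ["what"], lambda r: r["owner"] == "csj")
print(answer)  # [('ddb',), ('ddb',)]
```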
Renaming in SELECT
  The selected attributes can be given new names:

    SELECT name, `group` AS category
    FROM People
    WHERE office = 'Ada-230';

  Result:
    name                 category
    Vaida Ceikute        phd
    Rasmus Ibsen-Jensen  phd
Expressions in SELECT
  The attributes may have computed values:

    SELECT owner, date, slot*60 AS minute
    FROM Meetings
    WHERE owner = 'csj';

  Result:
    owner  date        minute
    csj    2010-08-22  840
    csj    2010-08-22  900
Conditions in WHERE
  AND, OR, NOT, =, <>, <, >, <=, >=, LIKE, ...

    SELECT owner, what
    FROM Meetings
    WHERE slot >= 12 AND slot < 16 AND what LIKE '%beer%';

  Result:
    owner     what
    anderssk  Afternoon beer
    anderssk  Belgian beer testing
    anderssk  Return empty beer bottles
3-Valued Logic
  Arithmetic operations on NULL yield NULL
  Any comparison with NULL yields unknown
  This gives 3 truth values: true (tt), false (ff), unknown (u)
  Boolean connectives are defined appropriately:

    AND | tt  ff  u      OR  | tt  ff  u      NOT |
    ----+-----------     ----+-----------     ----+----
    tt  | tt  ff  u      tt  | tt  tt  tt     tt  | ff
    ff  | ff  ff  ff     ff  | tt  ff  u      ff  | tt
    u   | u   ff  u      u   | tt  u   u      u   | u

  The WHERE clause accepts a row only if the result is true
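The three connectives can be written out in Python, with None standing in for unknown. This is a sketch of the logic only; the function names and the use of None are assumptions made for illustration.

```python
def and3(a, b):
    if a is False or b is False:
        return False              # false dominates AND
    if a is None or b is None:
        return None               # otherwise unknown is contagious
    return True


def or3(a, b):
    if a is True or b is True:
        return True               # true dominates OR
    if a is None or b is None:
        return None
    return False


def not3(a):
    return None if a is None else not a


def eq3(x, y):
    """SQL-style '=': any comparison with NULL (None) yields unknown."""
    return None if x is None or y is None else x == y


def accepts(truth_value):
    """WHERE keeps a row only when the condition is true."""
    return truth_value is True


office = None  # a NULL office
# office = 'Turing-216' OR office <> 'Turing-216'
condition = or3(eq3(office, "Turing-216"), not3(eq3(office, "Turing-216")))
print(condition, accepts(condition))  # None False
```

A condition that looks like a tautology is therefore unknown for a NULL value, and the row is not accepted.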
A Surprise?
  People:
    userid    name                 group  office
    csj       Christian S. Jensen  vip    Turing-216
    doina     Doina Bucur          phd    NULL
    bnielsen  Kai Birger Nielsen   tap    Hopper-017

    SELECT userid
    FROM People
    WHERE office = 'Turing-216' OR office <> 'Turing-216';

  Result:
    userid
    csj
    bnielsen

  doina is missing: her office is NULL, so both comparisons yield unknown, and unknown OR unknown is unknown.
Testing for NULL
  People:
    userid    name                 group  office
    csj       Christian S. Jensen  vip    Turing-216
    doina     Doina Bucur          phd    NULL
    bnielsen  Kai Birger Nielsen   tap    Hopper-017

    SELECT userid
    FROM People
    WHERE office IS NULL;

  Result:
    userid
    doina
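Both NULL queries can be reproduced with the slide data. The sketch below uses SQLite through Python's standard sqlite3 module (an assumption; any SQL system should behave the same way here) and keeps only the two columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (userid VARCHAR(15) PRIMARY KEY, office VARCHAR(15))")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("csj", "Turing-216"), ("doina", None), ("bnielsen", "Hopper-017")],
)

# Looks like a tautology, but is unknown when office is NULL.
either = [row[0] for row in conn.execute(
    "SELECT userid FROM People "
    "WHERE office = 'Turing-216' OR office <> 'Turing-216' "
    "ORDER BY userid"
)]

# IS NULL is the way to find the missing row.
missing = [row[0] for row in conn.execute(
    "SELECT userid FROM People WHERE office IS NULL"
)]

print(either)   # ['bnielsen', 'csj']
print(missing)  # ['doina']
```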
Multiple Relations
  Who has booked meetings on August 22, 2010?

    SELECT name
    FROM People, Meetings
    WHERE date = '2010-08-22' AND owner = userid;

  The relations are joined
Multiple Relations Example
  People:
    userid    name                 group  office
    csj       Christian S. Jensen  vip    Turing-216
    doina     Doina Bucur          phd    NULL
    bnielsen  Kai Birger Nielsen   tap    Hopper-017

  Meetings:
    meetid  date        slot  owner    what
    34716   2010-08-22  14    csj      ddb
    34717   2010-08-22  15    csj      ddb
    42835   2010-08-17  10    ceikute  TA meeting
General Loop Semantics
  Loop through all combinations of rows from all tables
  For each combination:
    check if the condition is true
    project the rows onto the desired attributes
  Note that duplicates are still kept
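For two tables, the general loop semantics amounts to a nested loop over all pairs of rows. The following Python sketch evaluates the August 22 query that way, using only the columns the query mentions; it illustrates the meaning of the query and is not a claim about how a DBMS executes a join.

```python
from itertools import product

# (userid, name)
people = [
    ("csj", "Christian S. Jensen"),
    ("doina", "Doina Bucur"),
    ("bnielsen", "Kai Birger Nielsen"),
]
# (meetid, date, owner)
meetings = [
    (34716, "2010-08-22", "csj"),
    (34717, "2010-08-22", "csj"),
    (42835, "2010-08-17", "ceikute"),
]

# SELECT name FROM People, Meetings
# WHERE date = '2010-08-22' AND owner = userid;
names = []
for (userid, name), (meetid, date, owner) in product(people, meetings):
    if date == "2010-08-22" and owner == userid:   # check the condition
        names.append(name)                         # project onto name

print(names)  # ['Christian S. Jensen', 'Christian S. Jensen']
```

The name appears twice because csj owns two meetings on that date and duplicates are kept.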
Prefixing Attribute Variables
  Avoid possible name clashes:

    SELECT People.name
    FROM People, Meetings
    WHERE Meetings.date = '2010-08-22'
      AND Meetings.owner = People.userid;
Multiple Relations
  Who shares a room?

    userid  name                 group  office
    csj     Christian S. Jensen  vip    Turing-216
    vaida   Vaida Ceikute        phd    Turing-216
    ira     Ira Assent           vip    Turing-217

  Result:
    roomie1              roomie2
    Christian S. Jensen  Vaida Ceikute
Naming Row Variables
  Enables self-joins:

    SELECT p1.name AS roomie1, p2.name AS roomie2
    FROM People p1, People p2
    WHERE p1.office = p2.office
      AND p1.userid <> p2.userid;

  A table of all roommates, in which every pair appears twice (once in each order)
Avoiding Symmetric Pairs

    SELECT p1.name AS roomie1, p2.name AS roomie2
    FROM People p1, People p2
    WHERE p1.office = p2.office
      AND p1.userid < p2.userid;
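The difference between <> and < in the self-join can be seen on the three-person example. This sketch runs both queries in SQLite via Python's standard sqlite3 module (an assumption) with only the columns the queries use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People (userid VARCHAR(15) PRIMARY KEY, "
    "name VARCHAR(40) NOT NULL, office VARCHAR(15))"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [
        ("csj", "Christian S. Jensen", "Turing-216"),
        ("vaida", "Vaida Ceikute", "Turing-216"),
        ("ira", "Ira Assent", "Turing-217"),
    ],
)

query = """
    SELECT p1.name AS roomie1, p2.name AS roomie2
    FROM People p1, People p2
    WHERE p1.office = p2.office AND p1.userid {op} p2.userid
    ORDER BY p1.userid
"""
all_pairs = conn.execute(query.format(op="<>")).fetchall()
one_per_pair = conn.execute(query.format(op="<")).fetchall()

print(all_pairs)     # both (csj, vaida) and (vaida, csj)
print(one_per_pair)  # only the pair with the smaller userid first
```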
Aggregation
  The SELECT clause may involve aggregate functions: SUM, AVG, COUNT, MIN, MAX
  NULLs are ignored in these computations, except that COUNT(*) counts all rows
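The treatment of NULLs in aggregates can be checked with a small experiment. The sketch below uses SQLite via Python's standard sqlite3 module (an assumption) and the unrefined Rooms table; the Hopper-017 row with a NULL capacity is hypothetical and added only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rooms (room VARCHAR(15), capacity INT)")
conn.executemany(
    "INSERT INTO Rooms VALUES (?, ?)",
    [
        ("Turing-216", 6),
        ("Ada-333", 26),
        ("Store-Aud", 286),
        ("Hopper-017", None),   # hypothetical row with unknown capacity
    ],
)

stats = conn.execute(
    "SELECT COUNT(*), COUNT(capacity), AVG(capacity), MAX(capacity) FROM Rooms"
).fetchone()

# COUNT(*) sees 4 rows; the other aggregates only see the 3 non-NULL values.
print(stats)  # (4, 3, 106.0, 286)
```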
Requirements
  Aggregation of a column computes $a_1 \oplus a_2 \oplus a_3 \oplus \cdots \oplus a_n$ for some operator $\oplus$
  Since the rows may be permuted, this is only well-formed if $\oplus$ is
    commutative: $a \oplus b = b \oplus a$
    associative: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$
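The requirement can be tested by folding the same values in every possible row order. In this Python sketch (the helper name results is an assumption), addition and max give one answer regardless of order, while subtraction, which is neither commutative nor associative, does not.

```python
from functools import reduce
from itertools import permutations
import operator

capacities = [6, 26, 286]


def results(op):
    """Fold the values with op in every row order; collect the distinct answers."""
    return {reduce(op, order) for order in permutations(capacities)}


print(results(operator.add))  # {318}: SUM is well-defined
print(results(max))           # {286}: MAX is well-defined
print(results(operator.sub))  # several answers: not a valid aggregate
```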
Simple Example
  What is the average capacity of a room?

    SELECT AVG(capacity) AS average
    FROM Rooms;

  Result:
    average
    106
Avoiding Duplicates
  SELECT DISTINCT removes duplicates
  This is expensive, but sometimes necessary
  What kinds of equipment do we have?

    SELECT DISTINCT type
    FROM Equipment;
Avoiding Duplicates in Aggregation
  How many kinds of equipment do we have?

    SELECT COUNT(DISTINCT type) AS number
    FROM Equipment;

  Result:
    number
    4
Scalar Functions
  Lots of useful functions are available: integer and float functions, string functions, calendar functions, ...

    SELECT CHARACTER_LENGTH(name), UPPER(`group`)
    FROM People;
Subqueries
  Any query in parentheses can be used in FROM clauses and in WHERE clauses
  A query may be used as a value if it returns only one row and one column; otherwise, a run-time error occurs
Simple Example
  Who shares an office with Ira?

    SELECT name
    FROM People
    WHERE office = (SELECT office
                    FROM People
                    WHERE userid = 'ira');
Membership Tests
  IN and NOT IN test membership in tables
  Who has csj arranged to meet?

    SELECT pid
    FROM Participants
    WHERE meetid IN (SELECT meetid
                     FROM Meetings
                     WHERE owner = 'csj')
      AND pid NOT IN (SELECT room FROM Rooms);
Membership Tests
  Participants:
    meetid  pid        status
    34716   Store-Aud  a
    34716   csj        a
    42835   sigurd     d

  Meetings:
    meetid  date        slot  owner    what
    34716   2010-08-22  14    csj      ddb
    34717   2010-08-22  15    csj      ddb
    42835   2010-08-17  10    ceikute  TA meeting
Correlated Subqueries
  Which meetings exceed the capacity of a room?

    SELECT meetid
    FROM Meetings
    WHERE (SELECT COUNT(DISTINCT pid)
           FROM Participants
           WHERE meetid = Meetings.meetid
             AND status <> 'd'
             AND pid NOT IN (SELECT room FROM Rooms))
          >
          (SELECT capacity
           FROM Rooms, Participants
           WHERE room = pid
             AND meetid = Meetings.meetid);

  Static nested scope rules apply: inside a subquery, an unqualified meetid refers to the innermost table (Participants), while Meetings.meetid refers to the current row of the outer query.
EXISTS and NOT EXISTS
  Check for emptiness or non-emptiness of a table
  Who is alone in an office?

    SELECT name
    FROM People p1
    WHERE NOT EXISTS (SELECT *
                      FROM People
                      WHERE office = p1.office
                        AND userid <> p1.userid);
ANY and ALL
  Allow comparisons against any row in a subquery (ANY) or all rows in a subquery (ALL)
  Which are the latest meetings that are planned?

    SELECT what
    FROM Meetings
    WHERE date >= ALL (SELECT date FROM Meetings);
UNION, INTERSECT, and EXCEPT
  Treat tables with the same schema as sets
  Remove duplicates (unless ALL is added)
  Compute ∪, ∩, and \ (set union, intersection, and difference)
  Who does not participate in a meeting they have arranged themselves?

    (SELECT owner AS userid, meetid FROM Meetings)
    EXCEPT
    (SELECT pid AS userid, meetid FROM Participants);
INTERSECT and MySQL
  Who participates in a meeting they have arranged themselves?
  INTERSECT is not supported in MySQL before version 8.0.31
  Use, e.g., EXISTS instead:

    SELECT owner, meetid
    FROM Meetings m
    WHERE EXISTS (SELECT pid, meetid
                  FROM Participants p
                  WHERE p.pid = m.owner
                    AND p.meetid = m.meetid);
EXCEPT and MySQL
  EXCEPT is not supported in MySQL before version 8.0.31
  Use NOT EXISTS instead:

    SELECT owner, meetid
    FROM Meetings m
    WHERE NOT EXISTS (SELECT pid, meetid
                      FROM Participants p
                      WHERE p.pid = m.owner
                        AND p.meetid = m.meetid);
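That the NOT EXISTS rewrite gives the same answer as EXCEPT can be checked on the sample data. The sketch below uses SQLite via Python's standard sqlite3 module (an assumption); SQLite does not accept parentheses around the operands of EXCEPT, so they are omitted. The two forms agree here because meetid is unique in Meetings; in general EXCEPT also removes duplicates, which NOT EXISTS does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Meetings (meetid INT PRIMARY KEY, owner VARCHAR(15) NOT NULL);
    CREATE TABLE Participants (meetid INT NOT NULL, pid VARCHAR(15) NOT NULL);
    INSERT INTO Meetings VALUES (34716, 'csj'), (34717, 'csj'), (42835, 'ceikute');
    INSERT INTO Participants VALUES
        (34716, 'Store-Aud'), (34716, 'csj'), (42835, 'sigurd');
""")

with_except = conn.execute("""
    SELECT owner AS userid, meetid FROM Meetings
    EXCEPT
    SELECT pid AS userid, meetid FROM Participants
    ORDER BY meetid
""").fetchall()

with_not_exists = conn.execute("""
    SELECT owner, meetid
    FROM Meetings m
    WHERE NOT EXISTS (SELECT pid, meetid
                      FROM Participants p
                      WHERE p.pid = m.owner AND p.meetid = m.meetid)
    ORDER BY meetid
""").fetchall()

print(with_except)      # [('csj', 34717), ('ceikute', 42835)]
print(with_not_exists)  # the same rows
```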
The JOIN Operator
  T1 JOIN T2 ON condition is syntactic sugar for:

    SELECT *
    FROM T1, T2
    WHERE condition
Dangling Rows and FULL JOIN
  T1 JOIN T2 ON condition
  A row in T1 or T2 that does not match a row in the other table is dangling
  An ordinary JOIN throws away dangling rows
  A FULL OUTER JOIN preserves dangling rows by padding them with NULL values
  A LEFT or RIGHT JOIN preserves dangling rows from one argument only
Simple Example
  In which offices are meetings planned?
  All offices, with meetings or NULL:

    SELECT office, meetid
    FROM People LEFT JOIN Participants ON pid = office;

  Only those offices with meetings:

    SELECT office, meetid
    FROM People JOIN Participants ON pid = office;
People and Participants
  People:
    userid    name                 group  office
    csj       Christian S. Jensen  vip    Turing-216
    doina     Doina Bucur          phd    NULL
    bnielsen  Kai Birger Nielsen   tap    Hopper-017

  Participants:
    meetid  pid        status
    34716   Store-Aud  a
    34716   csj        a
    42835   sigurd     d
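With the rows shown, no office appears as a pid, so the ordinary JOIN is empty. To make the contrast visible, the sketch below adds one hypothetical Participants row that books the office Turing-216; everything else is the slide data. It runs in SQLite via Python's standard sqlite3 module (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (userid VARCHAR(15) PRIMARY KEY, office VARCHAR(15));
    CREATE TABLE Participants (meetid INT NOT NULL, pid VARCHAR(15) NOT NULL);
    INSERT INTO People VALUES
        ('csj', 'Turing-216'), ('doina', NULL), ('bnielsen', 'Hopper-017');
    INSERT INTO Participants VALUES
        (34716, 'Store-Aud'), (34716, 'csj'), (42835, 'sigurd'),
        (34716, 'Turing-216');  -- hypothetical: an office booked for a meeting
""")

left = conn.execute("""
    SELECT office, meetid
    FROM People LEFT JOIN Participants ON pid = office
    ORDER BY userid
""").fetchall()

inner = conn.execute("""
    SELECT office, meetid
    FROM People JOIN Participants ON pid = office
    ORDER BY userid
""").fetchall()

# Dangling People rows survive the LEFT JOIN, padded with NULL (None).
print(left)   # [('Hopper-017', None), ('Turing-216', 34716), (None, None)]
print(inner)  # [('Turing-216', 34716)]
```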
Grouping
  SELECT-FROM-WHERE-GROUP BY
  Rows are grouped by a set of attributes
  Aggregations in SELECT are done for each group
  The attributes in SELECT must be either aggregates or mentioned in the GROUP BY clause
Simple Example
  How many meetings has each person arranged?

    SELECT owner, COUNT(meetid) AS number
    FROM Meetings
    GROUP BY owner;

  Result:
    owner     number
    amoeller  4
    kjensen   1
    csj       3
Advanced Example
  What is the average number of invitations for the meetings that each person has arranged?

    SELECT owner, AVG(pidno) AS average
    FROM (SELECT owner, m.meetid, COUNT(pid) AS pidno
          FROM Meetings m, Participants p
          WHERE m.meetid = p.meetid
          GROUP BY owner, m.meetid) AS ownavg
    GROUP BY owner;
HAVING
  A HAVING clause may eliminate some groups
  Which offices have more than one occupant?

    SELECT office
    FROM People
    GROUP BY office
    HAVING COUNT(*) > 1;

  Attributes in HAVING must be aggregates or mentioned in GROUP BY
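Grouping and HAVING can be seen side by side on the three-person example used for the roommate queries. This sketch runs in SQLite via Python's standard sqlite3 module (an assumption) and keeps only the columns the queries need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (userid VARCHAR(15) PRIMARY KEY, office VARCHAR(15))")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("csj", "Turing-216"), ("vaida", "Turing-216"), ("ira", "Turing-217")],
)

# One result row per group, with the aggregate computed per group.
occupants = conn.execute(
    "SELECT office, COUNT(*) FROM People GROUP BY office ORDER BY office"
).fetchall()

# HAVING filters whole groups after the aggregation.
shared = conn.execute(
    "SELECT office FROM People GROUP BY office HAVING COUNT(*) > 1"
).fetchall()

print(occupants)  # [('Turing-216', 2), ('Turing-217', 1)]
print(shared)     # [('Turing-216',)]
```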
Modifications
  SQL commands may modify the database
  Three kinds of modifications:
    insert one or more rows
    delete one or more rows
    update existing rows or columns
  Modifications do not return a result
Inserting a Single Row

    INSERT INTO table VALUES (list of values);

    INSERT INTO Participants VALUES (42432, 'mis', 'a');

  Optionally specify attribute names:

    INSERT INTO Participants(pid, status, meetid)
    VALUES ('mis', 'a', 42432);

  Missing values are NULL or defaults
Inserting a Subquery
  Invite everyone Anders meets with to his Belgian beer tasting:

    INSERT INTO Participants (
      SELECT 46432 AS meetid, pid, 'u' AS status
      FROM Meetings, Participants
      WHERE Meetings.meetid = Participants.meetid
        AND owner = 'anderssk'
        AND pid <> 'anderssk'
        AND pid NOT IN (SELECT room FROM Rooms));
Deleting Some Rows

    DELETE FROM table WHERE condition;

  Delete Christian's office:

    DELETE FROM Rooms
    WHERE room = 'Turing-216';

  Delete all offices:

    DELETE FROM Rooms;
Deleting a Subquery
  Delete all people with a roommate:

    DELETE FROM People p
    WHERE EXISTS (SELECT *
                  FROM People
                  WHERE office = p.office
                    AND userid <> p.userid);
Meaning of Deletion
  First the condition is computed for all rows
  Then the deletions are performed
  Otherwise the last person in a multi-person office would not be deleted!
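The difference between the two orders of evaluation can be simulated in plain Python. The sketch below models the roommate deletion on a list of (userid, office) pairs; the function names are assumptions made for illustration, and delete_eagerly shows the behavior that SQL's semantics is designed to avoid.

```python
people = [("csj", "Turing-216"), ("vaida", "Turing-216"), ("ira", "Turing-217")]


def has_roommate(person, table):
    """The EXISTS condition: someone else in the table has the same office."""
    userid, office = person
    return any(o == office and u != userid for u, o in table)


def delete_two_phase(table):
    """SQL semantics: evaluate the condition for all rows, then delete."""
    doomed = [p for p in table if has_roommate(p, table)]
    return [p for p in table if p not in doomed]


def delete_eagerly(table):
    """Not SQL semantics: delete each row as soon as its condition holds."""
    table = list(table)
    for p in list(table):
        if has_roommate(p, table):
            table.remove(p)
    return table


print(delete_two_phase(people))  # [('ira', 'Turing-217')]
print(delete_eagerly(people))    # [('vaida', 'Turing-216'), ('ira', 'Turing-217')]
```

In the eager version, vaida survives because csj has already been removed by the time her condition is checked.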
Update

    UPDATE table
    SET attribute assignments
    WHERE condition;

  Move Anders to a smaller office:

    UPDATE People
    SET office = 'Turing-213'
    WHERE userid = 'anderssk';
SQL is Everywhere