Part 4: Database Language - SQL
Junping Sun, Database Systems

Database Languages and Implementation Data Model

Data Model = Data Schema + Database Operations + Constraints

Database languages such as SQL and QUEL can be viewed as tools to implement the database schema and data operations at the logical or implementation level.

Database Language = Data Definition Language (DDL) + Data Manipulation Language (DML)
  - DDL implements the database schema.
  - DML implements the database operations.

The separation of DDL and DML is the major distinction between application systems developed with database languages and those developed with programming languages.

SQL - Structured Query Language

SQL is the most widely accepted and implemented interface language for relational database systems ("intergalactic dataspeak").

History of relational database languages:
  - SEQUEL (1974-1975): the Application Programming Interface (API) to System R. It was revised to SEQUEL/2 after several years, and SEQUEL/2 was later renamed SQL.
  - SQL/DS (1981)
  - DB2 (1983)
  - SQL (ANSI-86): the first standardized version of SQL, called SQL1
  - SQL (ANSI-89)
  - SQL (ANSI-92), called SQL2
  - SQL3: supports recursive operations and the object-oriented paradigm
  - SQL-99 Standard

Data Definition

Schema definition at the three levels of a database:
  - View level: view schema (table) definition. A view table can be defined on top of one or more base tables.
  - Base level: base table schema definition. A base table corresponds to one physical data file in the storage system.
  - Physical level: each base table can be stored in a different type of storage schema or data organization structure, such as a sequential file, hash index, ISAM, VSAM, B-Tree, B+-Tree, B*-Tree, K-D Tree, KDB Tree, R-Tree, R+-Tree, or R*-Tree.

Defined together with the schema:
  - Integrity constraints on the schema
  - Authorization and security mechanisms on user-defined database operations such as query, update, and insert/delete operations

Data Definition Statements

Create statements:
  - create table statement (to define a base table)
  - create index statement (to define an index at the internal level)
  - create view statement (to define a view at the user level)
  - create schema statement (to treat a database as a whole unit in SQL89 and SQL2)

Drop statements:
  - drop table statement (to delete the definition and all instances of the table)
  - drop index statement (to remove an existing index)
  - drop view statement (to delete the view)
  - drop schema statement (to delete the schema)

Schema and Catalog in the ANSI-SQL Standard

SQL schema: it is identified by a schema name and includes an authorization identifier that indicates the user or account who owns the schema.

Example:
    CREATE SCHEMA COMPANY AUTHORIZATION JSMITH;

This creates a schema called COMPANY, owned by the user with authorization identifier JSMITH.

Syntax:
    schema ::= CREATE SCHEMA schema-name AUTHORIZATION user [ schema-element-list ]

CREATE TABLE EMPLOYEE Statement

    CREATE TABLE EMPLOYEE
        (NAME      VARCHAR2(19) NOT NULL,
         SSN       CHAR(9),
         BDATE     DATE,
         ADDRESS   VARCHAR(30),
         SEX       CHAR,
         SALARY    NUMBER(10,2),
         SUPERSSN  CHAR(9),
         DNO       VARCHAR(8) NOT NULL,
         CONSTRAINT EMPPK PRIMARY KEY (SSN),
         CONSTRAINT EMPSUPERFRK FOREIGN KEY (SUPERSSN)
             REFERENCES EMPLOYEE (SSN) DISABLE,
         CONSTRAINT EMPDUMFRK FOREIGN KEY (DNO)
             REFERENCES DEPARTMENT (DNUMBER) DISABLE);

A disabled constraint can be enabled with the ALTER TABLE statement after the data is loaded into the table:

    ALTER TABLE EMPLOYEE ENABLE CONSTRAINT EMPSUPERFRK;

Specifying Referential Triggered Actions

    CREATE TABLE EMPLOYEE
        (NAME      VARCHAR2(19) NOT NULL,
         SSN       CHAR(9),
         BDATE     DATE,
         ADDRESS   VARCHAR(30),
         SEX       CHAR,
         SALARY    NUMBER(10,2) CHECK (SALARY BETWEEN 10000 AND 99000),
         SUPERSSN  CHAR(9),
         DNO       VARCHAR(9) DEFAULT '1' NOT NULL,
         CONSTRAINT EMPPK PRIMARY KEY (SSN),
         CONSTRAINT EMPSUPERFK FOREIGN KEY (SUPERSSN)
             REFERENCES EMPLOYEE (SSN) ON DELETE CASCADE DISABLE);

ORACLE supports ON DELETE CASCADE.
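
The effect of the CHECK constraint and of ON DELETE CASCADE can be tried out in a small sketch. It assumes SQLite through Python's standard sqlite3 module rather than ORACLE, so the ORACLE-specific DISABLE/ENABLE clauses and VARCHAR2/NUMBER types are left out, and the two rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite is used here; it enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE employee (
        name     VARCHAR(19) NOT NULL,
        ssn      CHAR(9),
        salary   NUMERIC(10,2) CHECK (salary BETWEEN 10000 AND 99000),
        superssn CHAR(9),
        CONSTRAINT emppk PRIMARY KEY (ssn),
        CONSTRAINT empsuperfk FOREIGN KEY (superssn)
            REFERENCES employee (ssn) ON DELETE CASCADE
    )
""")

# Hypothetical rows: one supervisor and one supervisee.
conn.execute("INSERT INTO employee VALUES ('Boss', '888665555', 55000, NULL)")
conn.execute("INSERT INTO employee VALUES ('Worker', '123456789', 30000, '888665555')")

# The CHECK constraint rejects a salary outside 10000..99000.
try:
    conn.execute("INSERT INTO employee VALUES ('Rich', '111111111', 500000, NULL)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

# Deleting the supervisor also deletes the row that references it.
conn.execute("DELETE FROM employee WHERE ssn = '888665555'")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(check_rejected, remaining)
```

With a self-referencing foreign key, the cascade removes the supervisee row together with the supervisor row, which is why such an action should be chosen deliberately.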

Specifying Referential Triggered Actions (DEPARTMENT)

    CREATE TABLE DEPARTMENT
        (DNAME    VARCHAR2(15) NOT NULL,
         DNUMBER  VARCHAR(8),
         MGRSSN   CHAR(9) DEFAULT '888665555' NOT NULL,
         CONSTRAINT DEPTPK PRIMARY KEY (DNUMBER),
         CONSTRAINT DEPTSK UNIQUE (DNAME),
         CONSTRAINT DEPTMGRFRK FOREIGN KEY (MGRSSN)
             REFERENCES EMPLOYEE (SSN) ON DELETE CASCADE DISABLE);

    ALTER TABLE EMPLOYEE
        ADD (CONSTRAINT EMPDNOFRK FOREIGN KEY (DNO)
             REFERENCES DEPARTMENT (DNUMBER));

Data Types

SQL data types (ANSI-SQL):
    CHARACTER(n), CHARACTER VARYING(n)
    NUMERIC(p,s), DECIMAL(p,s)
    INTEGER, INT, SMALLINT
    FLOAT(b), DOUBLE PRECISION, REAL
    DATE

SQL data types (ORACLE):
    CHAR(n), VARCHAR(n), VARCHAR2(n)
    NUMBER(p,s), NUMBER(38), NUMBER
    DATE
    RAW, LONG, LONG RAW
    ROWID

Data Manipulation in SQL

Data manipulation at the base table level:
  - Query the database via the select statement.
  - Modify data (tuples) in a table of the database via the update statement.
  - Remove data (tuples) from a table of the database via the delete statement.
  - Append data (tuples) to a table in the database via the insert statement.

Data manipulation at the view (virtual table) level:
  - Query part of the database via a select statement on a view.
  - Update or modify the partial data defined at the view level by mapping the view update to the underlying base tables:
      - single-table update
      - multiple-table update, which is still an unsolved problem

Query Database in SQL

Querying a database in SQL is done via the select statement.

General format of the select statement:

    select <attribute list>
    from   <table list>
    where  <condition>

  - <attribute list> is a list of attribute names whose values are to be retrieved by the query.
  - <table list> is a list of the relation names required to process the query. Multiple tables listed in the <table list> imply that a join operation is involved.
  - <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query. It specifies the selection and join operations, and it can include another select statement as a subquery, or nested query.

SELECT-PROJECT QUERY

Q0: Retrieve the birth date and address of the employee whose name is John B. Smith.

SQL script for Q0:

    select bdate, address
    from   employee
    where  fname = 'John' and minit = 'B' and lname = 'Smith';

Relational algebra expression for Q0:

    π <bdate, address> (σ <fname = 'John' and minit = 'B' and lname = 'Smith'> (employee))

    Target Attributes: bdate, address
    Constraint:        fname = 'John' and minit = 'B' and lname = 'Smith'
    Target Relation:   employee

SELECT-PROJECT-JOIN QUERY

Q1: Retrieve the first and last names and addresses of all employees who work for the 'Research' department.

    select fname, lname, address
    from   employee, department
    where  dname = 'Research' and dnumber = dno;

    Target Attributes: fname, lname, address
    Constraints:
      Select Condition: dname = 'Research'
      Join Condition:   dnumber = dno
    Target Relations:  employee, department

This query involves one selection on the department relation and a join of the relations employee and department.
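
Q0 and Q1 can be run against a tiny instance of the two tables. The sketch below assumes SQLite through Python's sqlite3 module; the column types and the two sample rows are invented for illustration and are not the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dname TEXT, dnumber INTEGER PRIMARY KEY);
    CREATE TABLE employee (fname TEXT, minit TEXT, lname TEXT, ssn TEXT PRIMARY KEY,
                           bdate TEXT, address TEXT, dno INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO department VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO employee VALUES
        ('John', 'B', 'Smith', '123456789', '1965-01-09', 'Houston, TX', 5),
        ('Alicia', 'J', 'Zelaya', '999887777', '1968-01-19', 'Spring, TX', 4);
""")

# Q0: selection on employee followed by projection on bdate, address.
q0 = conn.execute("""
    SELECT bdate, address
    FROM   employee
    WHERE  fname = 'John' AND minit = 'B' AND lname = 'Smith'
""").fetchall()

# Q1: selection on department plus the join condition dnumber = dno.
q1 = conn.execute("""
    SELECT fname, lname, address
    FROM   employee, department
    WHERE  dname = 'Research' AND dnumber = dno
""").fetchall()

print(q0)
print(q1)
```

Dropping the join condition dnumber = dno from Q1 would pair every employee with the 'Research' row, which is the Cartesian product behaviour of a multi-table from clause.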

Q2: For every project located in 'Stafford', list the project number, the controlling department number, and the department manager's last name, address, and birth date.

    select pnumber, dnum, lname, address, bdate
    from   project, department, employee
    where  plocation = 'Stafford' and dnum = dnumber and mgrssn = ssn;

    Target Attributes: pnumber, dnum, lname, address, bdate
    Constraints:
      Select Condition: plocation = 'Stafford'
      Join Condition:   dnum = dnumber, mgrssn = ssn
    Target Relations:  project, department, employee

  - A selection operation on the project relation selects the project tuples located in 'Stafford'.
  - A join of the project and department relations finds the controlling department.
  - A join of the department and employee relations finds the manager's information in the employee relation.
  - The two join operations implement two relationships in the ER schema of the database, MANAGES and CONTROLS.

Dealing with Ambiguous Attribute Names and Aliasing

Q1A:

    select fname, lname, address
    from   employee, department
    where  department.dname = 'Research' and
           department.dnumber = employee.dnumber;

If the attribute names for the department number are the same in both the employee and department tables, a qualifier is necessary in the query to avoid ambiguity.

Q8: For each employee, retrieve the employee's first and last name and the first and last name of his or her immediate supervisor.

    select e.fname, e.lname, s.fname, s.lname
    from   employee e, employee s
    where  e.superssn = s.ssn;

Discussion on Aliasing

  - Ambiguity arises in queries that refer to the same relation name twice.
  - The query statement for Q8 declares the alternative relation names e and s for the employee relation.
  - e and s can be imagined as two different copies of the employee relation:
      - e represents employees in the role of supervisees;
      - s represents employees in the role of supervisors.
  - Join and selection operations are involved. The join attributes are superssn and ssn.
  - The join condition e.superssn = s.ssn links each employee to the corresponding information about his or her supervisor, such as fname and lname.
  - The join condition implements the recursive relationship SUPERVISION in the original ER schema.
  - This is an example of one-level recursion. A general recursive query, with an unknown number of levels, cannot be specified.

Query Examples

Query with PROJECT:

Q9: List the social security numbers of all employees.

    select ssn
    from   employee;

Query with SELECT:

Q1C: Retrieve all employee tuples from department 5.

    select *
    from   employee
    where  dno = 5;
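
The one-level recursion behind Q8 can be seen on a three-person chain of command. This is a sketch assuming SQLite through Python's sqlite3 module; the three rows and the ORDER BY clause are additions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY, superssn TEXT);
    -- Hypothetical chain: Smith reports to Wong, Wong reports to Borg.
    INSERT INTO employee VALUES
        ('James', 'Borg', '888665555', NULL),
        ('Franklin', 'Wong', '333445555', '888665555'),
        ('John', 'Smith', '123456789', '333445555');
""")

# e plays the supervisee role, s plays the supervisor role.
rows = conn.execute("""
    SELECT e.fname, e.lname, s.fname, s.lname
    FROM   employee e, employee s
    WHERE  e.superssn = s.ssn
    ORDER  BY e.ssn
""").fetchall()

for row in rows:
    print(row)
```

Each result row pairs an employee with the immediate supervisor only. Smith is not paired with Borg, and Borg, who has no supervisor, does not appear as a supervisee at all.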

Query Examples (continued)

Query with CARTESIAN PRODUCT:

Q10: List all combinations of EMPLOYEE SSN and DEPARTMENT DNAME.

    select ssn, dname
    from   employee, department;

Query retrieving distinct attribute values:

Q11: Retrieve the salary of every employee.

    select all salary
    from   employee;

Q11A: Retrieve all distinct salary values.

    select distinct salary
    from   employee;

Query Involving Union

Q4: Make a list of all project numbers for projects that involve an employee whose last name is 'Smith' as a worker or as a manager of the department that controls the project.

    (select distinct pnumber
     from   project, employee, department
     where  lname = 'Smith' and dnum = dnumber and mgrssn = ssn)
    union
    (select distinct pnumber
     from   project, employee, works_on
     where  lname = 'Smith' and pnumber = pno and essn = ssn);

  - The first query retrieves the projects that involve a 'Smith' as manager of the department that controls the project.
  - The second query retrieves the projects that involve a 'Smith' as a worker on the project.
  - If several employees have the last name 'Smith', the project numbers involving any of them are retrieved.
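
The union in Q4 can be checked on a small instance. The sketch assumes SQLite through Python's sqlite3 module; SQLite does not accept parentheses around the operands of union, so they are omitted here, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (lname TEXT, ssn TEXT PRIMARY KEY);
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, mgrssn TEXT);
    CREATE TABLE project    (pnumber INTEGER PRIMARY KEY, dnum INTEGER);
    CREATE TABLE works_on   (essn TEXT, pno INTEGER);
    -- Hypothetical data: Smith manages department 4 and works on projects 1 and 10.
    INSERT INTO employee   VALUES ('Smith', '111'), ('Wong', '222');
    INSERT INTO department VALUES (4, '111'), (5, '222');
    INSERT INTO project    VALUES (1, 5), (2, 5), (10, 4);
    INSERT INTO works_on   VALUES ('111', 1), ('111', 10), ('222', 2);
""")

rows = conn.execute("""
    SELECT DISTINCT pnumber
    FROM   project, employee, department
    WHERE  lname = 'Smith' AND dnum = dnumber AND mgrssn = ssn
    UNION
    SELECT DISTINCT pnumber
    FROM   project, employee, works_on
    WHERE  lname = 'Smith' AND pnumber = pno AND essn = ssn
    ORDER  BY pnumber
""").fetchall()

print(rows)
```

Project 10 qualifies through both branches but appears once, because union eliminates duplicate rows.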

Discussion

The first part of the union:
    Target Attributes: pnumber
    Constraints:
      Select Condition: lname = 'Smith'
      Join Condition:   dnum = dnumber (implements the relationship CONTROLS)
                        mgrssn = ssn (implements the relationship MANAGES)
    Target Relations:  project, employee, department

The second part of the union:
    Target Attributes: pnumber
    Constraints:
      Select Condition: lname = 'Smith'
      Join Condition:   pnumber = pno and essn = ssn (implements the M:N relationship WORKS_ON)
    Target Relations:  project, employee, works_on

Predicate IN

The IN predicate selects those rows for which a specified value appears in a list of constant values enclosed in parentheses, or in the results from a subquery.

Q13: Retrieve the social security numbers of all employees who work on any one of the projects with project number 1, 2, or 3.

    select distinct essn
    from   works_on
    where  pno in (1, 2, 3);

Result of the query:

    essn
    ---------
    123456789
    666884444
    453453453
    333445555

Works_on Table

[Figure: works_on table]

Predicate NOT IN

The NOT IN predicate is true if the expression preceding the keyword does not match any value in the list.

Q13b: Retrieve the social security numbers of all employees who work on a project other than projects 1, 2, and 3.

    select essn
    from   works_on
    where  pno not in (1, 2, 3);

Result of the query:

    essn
    ---------
    333445555
    888665555
    987654321
    987987987
    999887777

Quantifier ANY/SOME

Predicate ANY/SOME: the ANY/SOME predicate selects those rows for which a specified value appears in the results from a subquery.

Query: Retrieve the social security numbers of employees who work on some project controlled by department 5.

    select distinct essn
    from   works_on
    where  pno = any (select pnumber
                      from   project
                      where  dnum = 5);

  - The = any predicate is the same as the IN predicate.
  - ANSI-SQL supports both the ANY and SOME predicates, even though they are equivalent. ORACLE accepts both keywords as well.
  - The difference between the IN and = ANY (= SOME) predicates in ANSI-SQL is that IN can be followed by either a list of values or a subquery, whereas ANY (SOME) can be followed only by a subquery.

Quantifiers SOME and ANY

Both SOME and ANY are designed to link a simple relational operator with a subquery that returns a multi-row result. The sequence preceding the subquery has the format {expression relational-operator quantifier} and, together with the subquery, is called a quantifier predicate:

    Expression   Comparison operator   Quantifier   Subquery
    quantity     >                     ANY          (select ...)

  - The comparison is applied to each row of the subquery result in turn.
  - The logical expression is true if and only if one or more rows in the subquery result satisfy the comparison.
  - It is false if and only if none of the subquery result rows satisfy the comparison.

Quantifier ALL

The ALL predicate evaluates to true if and only if the comparison between a single value and the set of values retrieved by the subquery is true for all values retrieved by the subquery.

Query: List the names of employees whose salary is greater than the salary of all the employees in department 5.

    select lname, fname
    from   employee
    where  salary > all (select salary
                         from   employee
                         where  dno = 5);

The predicates ANY, SOME, and ALL can be prefixed with any of the comparison operators {=, <, >, <=, >=, ≠}. The operator ≠ is written as <> or != in an SQL condition expression.

Discussion of the Predicates IN and NOT IN

The predicate a IN (x, y, z) is equivalent to a = x OR a = y OR a = z:

    select essn
    from   works_on
    where  pno = 1 or pno = 2 or pno = 3;

The predicate a NOT IN (x, y, z) is equivalent to a <> x AND a <> y AND a <> z, and also to a <> ALL (x, y, z):

    select essn
    from   works_on
    where  pno <> 1 and pno <> 2 and pno <> 3;

The predicate a <> ANY/SOME (x, y, z) is equivalent to (a <> x) OR (a <> y) OR (a <> z).
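
The equivalences between IN / NOT IN and chains of OR / AND comparisons can be verified directly. The sketch assumes SQLite through Python's sqlite3 module and a handful of hypothetical works_on rows; SQLite does not implement the ANY, SOME, and ALL quantifiers, so only the IN forms are run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works_on (essn TEXT, pno INTEGER);
    -- Hypothetical rows; employee 333445555 works on project 3 and on project 10.
    INSERT INTO works_on VALUES
        ('123456789', 1), ('123456789', 2),
        ('333445555', 3), ('333445555', 10),
        ('987987987', 10), ('999887777', 30);
""")

def essns(condition):
    """Return the sorted distinct essn values of rows satisfying the condition."""
    sql = f"SELECT DISTINCT essn FROM works_on WHERE {condition} ORDER BY essn"
    return [row[0] for row in conn.execute(sql)]

in_rows     = essns("pno IN (1, 2, 3)")
or_rows     = essns("pno = 1 OR pno = 2 OR pno = 3")
not_in_rows = essns("pno NOT IN (1, 2, 3)")
and_rows    = essns("pno <> 1 AND pno <> 2 AND pno <> 3")

print(in_rows, not_in_rows)
```

Employee 333445555 appears in both results because the predicates are evaluated row by row: one of that employee's works_on rows satisfies IN and another satisfies NOT IN. The two result sets are therefore not complements of each other.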

Nested Query (Type-N)

Q4A: Make a list of all project names for projects that involve an employee whose last name is 'Smith' as a worker, or as a manager of the department that controls the project.

    select distinct pname
    from   project
    where  pnumber in (select pnumber
                       from   project, department, employee
                       where  lname = 'Smith' and dnum = dnumber and mgrssn = ssn)
       or  pnumber in (select pno
                       from   works_on, employee
                       where  lname = 'Smith' and essn = ssn);

The comparison operator IN compares a value v (here v is pnumber) with a set (or multiset) of values V and evaluates to TRUE if v is one of the elements in V.

Decomposition of Nested Query

Subquery 1:

    temp1: select pnumber
           from   project, department, employee
           where  dnum = dnumber and mgrssn = ssn and lname = 'Smith'

Subquery 2:

    temp2: select pno
           from   works_on, employee
           where  essn = ssn and lname = 'Smith'

Subquery 3:

    select distinct pname
    from   project
    where  pnumber = temp1.pnumber or pnumber = temp2.pno

Comparison of Nested and Flattened Queries

Query: Retrieve the social security numbers of employees who work on some project controlled by department 5.

    select distinct essn
    from   works_on
    where  pno = any (select pnumber
                      from   project
                      where  dnum = 5);

Equivalent query:

    select distinct essn
    from   works_on, project
    where  dnum = 5 and pno = pnumber;

  - The first implementation, using a subquery, can avoid the join operation.
  - The second implementation has to use a join operation, where pno = pnumber is the join condition, or join path.

Correlated Nested Query (Type-J)

Q12: Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee.

    select e.fname, e.lname
    from   employee e
    where  e.ssn in (select essn
                     from   dependent
                     where  essn = e.ssn and sex = e.sex and
                            e.fname = dependent_name);

  - The where clause of the inner query block contains join predicates that reference the table of an outer query block (a table that is not included in the from clause of the inner query block).
  - essn = e.ssn correlates the current dependent tuple with the employee the dependent belongs to.
  - sex = e.sex and e.fname = dependent_name check the equality of the sex and first-name values between the employee and dependent tuples.

Rules for Subqueries and Nested Queries

1. The subquery should be enclosed within parentheses.
2. Subqueries may contain nested subqueries. When subqueries are nested, SQL evaluates them from the inside out:
   a. The innermost query is processed first.
   b. Then the result of that query is passed to the next outer query.
3. In general, there may be several levels of nested queries. Ambiguity among attribute names is possible if attributes of the same name exist, one in a relation in the from clause of the outer query and the other in a relation in the from clause of the nested (inner) query. The rule is that a reference to an unqualified attribute refers to the relation declared in the innermost nested query.
4. Column names in a subquery are implicitly qualified by the table name in the FROM clause of the subquery (that is, the FROM clause at the same level).
5. A subquery may refer only to column names from tables that are named in outer queries or in the subquery's own FROM clause. A subquery may not access tables that are used only by a child query.
6. When a subquery is one of the two operands involved in a comparison, the subquery must be written as the second operand.

Query with Exists Function

Q12B: Retrieve the name of each employee who has a dependent with the same first name and same sex as the employee.

    select e.fname, e.lname
    from   employee e
    where  exists (select *
                   from   dependent
                   where  essn = e.ssn and sex = e.sex and
                          e.fname = dependent_name);

The Exists Function in SQL

  - exists and not exists in SQL are used to check whether the result of a correlated query is empty.
  - exists and not exists are usually used in conjunction with a correlated nested query.
  - In example Q12B, the nested query within the exists function references the ssn, fname, and sex attributes of the employee relation from the outer query.
  - For each employee tuple, evaluate the nested query, which retrieves all dependent tuples with the same social security number, sex, and name as the employee tuple. If at least one tuple exists in the result of the nested query, then select that employee tuple.
  - In general, exists(Q) returns TRUE if there is at least one tuple in the result of query Q and returns FALSE otherwise.
  - not exists(Q) returns TRUE if there are no tuples in the result of query Q and returns FALSE otherwise.

Query with Not Exists Function

Q6: Retrieve the names of employees who have no dependents.

    select fname, lname
    from   employee
    where  not exists (select *
                       from   dependent
                       where  ssn = essn);

  - The correlated nested query retrieves all dependent tuples related to an employee tuple; if none exist, the employee tuple is selected.
  - For each employee tuple, the nested query selects all dependent tuples whose essn value matches the employee ssn. If the result of the nested query is empty, no dependents are related to the employee, so that employee tuple is selected and its fname and lname are retrieved.
  - This is an implementation of the difference operation.
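
Q12B and Q6 can be compared side by side on one small instance. This is a sketch assuming SQLite through Python's sqlite3 module; the employees and dependents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee  (fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY, sex TEXT);
    CREATE TABLE dependent (essn TEXT, dependent_name TEXT, sex TEXT);
    -- Hypothetical data: only John Smith has a dependent sharing his name and sex;
    -- Joyce English has no dependents at all.
    INSERT INTO employee VALUES
        ('John', 'Smith', '123456789', 'M'),
        ('Franklin', 'Wong', '333445555', 'M'),
        ('Joyce', 'English', '453453453', 'F');
    INSERT INTO dependent VALUES
        ('123456789', 'John', 'M'),
        ('123456789', 'Alice', 'F'),
        ('333445555', 'Joy', 'F');
""")

# Q12B: the inner query is re-evaluated for each employee tuple e.
q12b = conn.execute("""
    SELECT e.fname, e.lname
    FROM   employee e
    WHERE  EXISTS (SELECT *
                   FROM   dependent
                   WHERE  essn = e.ssn AND sex = e.sex AND e.fname = dependent_name)
""").fetchall()

# Q6: an employee is kept only when the correlated result is empty.
q6 = conn.execute("""
    SELECT fname, lname
    FROM   employee
    WHERE  NOT EXISTS (SELECT * FROM dependent WHERE ssn = essn)
""").fetchall()

print(q12b, q6)
```

In Q12B the unqualified sex inside the subquery refers to dependent.sex, the relation of the innermost query, which matches rule 3 of the rules for nested queries.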

Nested Query with Two Exists Functions

Q7: List the names of managers who have at least one dependent.

    select fname, lname
    from   employee
    where  exists (select *
                   from   dependent
                   where  ssn = essn)
      and  exists (select *
                   from   department
                   where  ssn = mgrssn);

  - The first nested query selects all dependent tuples related to an employee.
  - The second nested query selects all department tuples managed by the employee.
  - If at least one tuple of the first kind and at least one of the second kind exist with the same ssn, the employee tuple is selected and the fname and lname are retrieved.
  - This is an implementation of the intersection operation.

Query with Division (using contains)

Q3: Retrieve the name of each employee who works on all the projects controlled by department 5.

    select fname, lname
    from   employee
    where  ((select pno
             from   works_on
             where  ssn = essn)
            contains
            (select pnumber
             from   project
             where  dnum = 5));

  - The second nested query, which is not correlated to the outer query, retrieves the project numbers of all projects controlled by department 5.
  - For each employee tuple, the first nested query, which is correlated, retrieves the project numbers on which the employee works. If these contain all projects controlled by department 5, the employee tuple is selected and the name from that tuple is retrieved.
  - ANSI-SQL and most SQL engines do not support the contains operator.

Query with Division (using minus)

Q3: Retrieve the name of each employee who works on all the projects controlled by department 5.

    select fname, lname
    from   employee e
    where  not exists ((select pnumber
                        from   project
                        where  dnum = 5)
                       minus
                       (select pno
                        from   works_on w
                        where  e.ssn = w.essn));

Query with Division (using nested not exists)

Q3: Retrieve the name of each employee who works on all the projects controlled by department 5.

    select fname, lname
    from   employee
    where  not exists
           (select *
            from   works_on b
            where  (b.pno in (select pnumber
                              from   project
                              where  dnum = 5))
              and  not exists (select *
                               from   works_on c
                               where  c.essn = ssn and c.pno = b.pno));
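
Both formulations of the division query can be run and compared. The sketch assumes SQLite through Python's sqlite3 module, where the set difference operator is spelled EXCEPT rather than ORACLE's minus and the operands are written without parentheses; the two employees and three projects are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, dnum INTEGER);
    CREATE TABLE works_on (essn TEXT, pno INTEGER);
    -- Hypothetical data: department 5 controls projects 1 and 2.
    -- Ann works on both of them (and on project 3); Bob works only on project 1.
    INSERT INTO employee VALUES ('Ann', 'Lee', '1'), ('Bob', 'Ray', '2');
    INSERT INTO project  VALUES (1, 5), (2, 5), (3, 4);
    INSERT INTO works_on VALUES ('1', 1), ('1', 2), ('1', 3), ('2', 1);
""")

# Set-difference form: no department-5 project is missing from the employee's projects.
by_except = conn.execute("""
    SELECT fname, lname
    FROM   employee e
    WHERE  NOT EXISTS (SELECT pnumber FROM project WHERE dnum = 5
                       EXCEPT
                       SELECT pno FROM works_on w WHERE w.essn = e.ssn)
""").fetchall()

# Doubly nested NOT EXISTS form, as on the slide.
by_not_exists = conn.execute("""
    SELECT fname, lname
    FROM   employee
    WHERE  NOT EXISTS
           (SELECT *
            FROM   works_on b
            WHERE  b.pno IN (SELECT pnumber FROM project WHERE dnum = 5)
              AND  NOT EXISTS (SELECT *
                               FROM   works_on c
                               WHERE  c.essn = ssn AND c.pno = b.pno))
""").fetchall()

print(by_except, by_not_exists)
```

The two forms agree on this data. Note that the nested form ranges over works_on tuples, so a department-5 project on which nobody works would be ignored by it but not by the set-difference form.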

Discussion

  - The outer nested query selects any works_on (b) tuple whose pno is that of a project controlled by department 5 and for which there is no works_on (c) tuple with the same pno and the same ssn as that of the employee tuple under consideration in the outer query.
  - If no such tuple exists, we select the employee tuple and retrieve the fname and lname of that employee tuple.
  - The equivalent interpretation of the query script is: there does not exist a project controlled by department 5 that the employee does not work on.
  - Equivalently, the query retrieves each employee who works on all the projects controlled by department 5.

Renaming Attributes and Join Tables

Q8a: Retrieve the last name of each employee and his or her supervisor, while renaming the resulting attribute names as employee_name and supervisor_name.

    select e.lname as employee_name, s.lname as supervisor_name
    from   employee as e, employee as s
    where  e.superssn = s.ssn;

Q1a: Retrieve the names and addresses of the employees who work for the 'Research' department.

    select fname, lname, address
    from   (employee join department on dno = dnumber)
    where  dname = 'Research';

The concept of a joined table is supported only from ANSI SQL-92 onward.

Natural Join, Outer Join, and Nested Join

Q1b:

    select fname, lname, address
    from   (employee natural join
            (department as dept (dname, dno, mssn, msdate)))
    where  dname = 'Research';

Q8b: Retrieve the last names of all employees, together with the last name of the supervisor for those employees who have one.

    select e.lname as employee_name, s.lname as supervisor_name
    from   (employee e left outer join employee s
            on e.superssn = s.ssn);

Q2A:

    select pnumber, dnum, lname, address, bdate
    from   ((project join department on dnum = dnumber)
            join employee on mgrssn = ssn)
    where  plocation = 'Stafford';

Outer Join in ORACLE

Q8b: Retrieve the last names of all employees, together with the last name of the supervisor for those employees who have one.

    select e.lname as employee_name, s.lname as supervisor_name
    from   employee e, employee s
    where  e.superssn = s.ssn (+);

This is equivalent to the employee table in the role of employee being left outer joined with the employee table in the role of supervisor.

Q8c: Retrieve the last names of all employees, together with the last names of their supervisees for those employees who have any.

    select s.lname as employee_name, e.lname as supervisee_name
    from   employee s, employee e
    where  s.ssn = e.superssn (+);

This is equivalent to the employee table in the role of supervisor being left outer joined with the employee table in the role of supervisee.
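
The difference between the inner join of Q8a and the left outer join of Q8b shows up on the employee who has no supervisor. The sketch assumes SQLite through Python's sqlite3 module, which supports the ANSI left outer join syntax but not ORACLE's (+) operator; the rows and the ORDER BY clause are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (lname TEXT, ssn TEXT PRIMARY KEY, superssn TEXT);
    -- Hypothetical chain: Smith reports to Wong, Wong reports to Borg, Borg to nobody.
    INSERT INTO employee VALUES
        ('Borg', '888665555', NULL),
        ('Wong', '333445555', '888665555'),
        ('Smith', '123456789', '333445555');
""")

inner = conn.execute("""
    SELECT e.lname AS employee_name, s.lname AS supervisor_name
    FROM   employee e JOIN employee s ON e.superssn = s.ssn
    ORDER  BY e.ssn
""").fetchall()

outer = conn.execute("""
    SELECT e.lname AS employee_name, s.lname AS supervisor_name
    FROM   employee e LEFT OUTER JOIN employee s ON e.superssn = s.ssn
    ORDER  BY e.ssn
""").fetchall()

print(inner)
print(outer)
```

The outer join keeps Borg and pads the supervisor column with NULL (None in Python), whereas the inner join drops that employee.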

Aggregation Functions

An aggregate function takes an entire column as its argument and computes a single value based on the contents of the column. The function result is an aggregate of the individual data values in the rows of the column.

Q15: Find the total number of employees in the company, the sum of the salaries of all employees, and the maximum, minimum, and average salary.

    select count(*), sum(salary), max(salary), min(salary), avg(salary)
    from   employee;

  - count(*) counts the total number of employee tuples.
  - The sum(), max(), min(), and avg() functions are applied to the salary column values of the tuples in the employee table.

Q16: Find the total number of employees of the 'Research' department, as well as the sum of the salaries, the maximum salary, the minimum salary, and the average salary in this department.

    select count(*), sum(salary), max(salary), min(salary), avg(salary)
    from   employee, department
    where  dno = dnumber and dname = 'Research';

  - All the aggregation functions, count(), sum(), max(), min(), and avg(), are applied to the employee tuples from the 'Research' department.
  - The constraints dno = dnumber and dname = 'Research' in the where clause are evaluated before the aggregate functions are evaluated.

Q19: Count the number of distinct salary values in the database.

    select count(distinct salary)
    from   employee;

Q5: Retrieve the names of all employees who have two or more dependents.

Incorrect one:

    select lname, fname
    from   employee
    where  (select count(*)
            from   dependent
            where  ssn = essn) >= 2;

When a subquery is one of the two operands involved in a comparison, the subquery must be written as the second operand.

Correct one:

    select lname, fname
    from   employee
    where  2 <= (select count(*)
                 from   dependent
                 where  ssn = essn);

Group By Clause

In many cases, we want to apply aggregate functions to subgroups of tuples in a relation, based on some attribute values.

Examples:
  - Find the average salary of employees in each department.
  - Find the number of employees who work on each project.

In these cases, we want to group the tuples that have the same value of some attribute(s), called the grouping attribute(s), and apply the function to each such group independently.

SQL has a group by clause for this purpose. The group by clause specifies the grouping attributes, which must also appear in the select clause, so that the value resulting from applying each function to a group of tuples appears along with the value of the grouping attribute(s).

Group By Clause (examples)

Q20: For each department, retrieve the department number, the number of employees in the department, and their average salary.

    select   dno, count(*), avg(salary)
    from     employee
    group by dno;

Q21: For each project, retrieve the project number, the project name, and the number of employees who work on that project.

    select   pnumber, pname, count(*)
    from     project, works_on
    where    pnumber = pno
    group by pnumber, pname;

The grouping and aggregate functions are applied after the joining of the two relations.

Having Clause

Q22: For each project on which more than two employees work, retrieve the project number, project name, and number of employees who work on that project.

    select   pnumber, pname, count(*)
    from     project, works_on
    where    pnumber = pno
    group by pnumber, pname
    having   count(*) > 2;

  - SQL provides a having clause, which can appear only in conjunction with a group by clause.
  - having provides a condition on the group of tuples associated with each value of the grouping attributes, and only the groups that satisfy the condition are retrieved in the result of the query.
  - The selection condition in the where clause limits the tuples to which the group functions are applied.
  - The having clause limits whole groups.
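
Q20 and Q22 can be run on a few rows to see how groups are formed and then filtered. The sketch assumes SQLite through Python's sqlite3 module; the rows and the ORDER BY clause in Q20 are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER, dno INTEGER);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (essn TEXT, pno INTEGER);
    -- Hypothetical data: two employees in department 5, one in department 4;
    -- three employees work on ProductX, one on ProductY.
    INSERT INTO employee VALUES ('1', 30000, 5), ('2', 40000, 5), ('3', 25000, 4);
    INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on VALUES ('1', 1), ('2', 1), ('3', 1), ('1', 2);
""")

# Q20: one result row per department number.
q20 = conn.execute("""
    SELECT   dno, COUNT(*), AVG(salary)
    FROM     employee
    GROUP BY dno
    ORDER BY dno
""").fetchall()

# Q22: groups are formed after the join; HAVING then drops the small groups.
q22 = conn.execute("""
    SELECT   pnumber, pname, COUNT(*)
    FROM     project, works_on
    WHERE    pnumber = pno
    GROUP BY pnumber, pname
    HAVING   COUNT(*) > 2
""").fetchall()

print(q20)
print(q22)
```

ProductY forms a group with a single works_on tuple, so the having condition removes the whole group from the result.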

Q23: For each project, retrieve the project number, project name, and number of employees from department 5 who work on that project.

    select   pnumber, pname, count(*)
    from     project, works_on, employee
    where    pnumber = pno and ssn = essn and dno = 5
    group by pnumber, pname;

Q5: Retrieve the names of all employees who have two or more dependents.

    select lname, fname
    from   employee
    where  ssn in (select   essn
                   from     dependent
                   where    ssn = essn
                   group by essn
                   having   count(essn) >= 2);

Where Condition before Having

Q24: Count the total number of employees with salaries greater than $40,000 who work in each department, but only for those departments with more than five employees.

    select   dname, count(*)
    from     department, employee
    where    dnumber = dno and salary > 40000
    group by dname
    having   count(*) > 5;

  - This is not the correct query statement.
  - The selection condition (salary > 40000) has eliminated the employee tuples whose salary <= 40000 before the group by and having clauses are applied.
  - It will only select departments that have more than five employees who each earn more than $40,000.
  - The rule is that the where clause is executed first, on individual tuples; the having clause is applied later, on individual groups of tuples.
  - The tuples are already restricted to employees earning more than $40,000 before the function in the having clause is applied.

The correct one:

    select   dname, count(*)
    from     department, employee
    where    dnumber = dno and salary > 40000 and
             dno in (select   dno
                     from     employee
                     group by dno
                     having   count(*) > 5)
    group by dname;

  - The constraints dnumber = dno and salary > 40000 in the where clause join the department tuples with the employee tuples whose salary is greater than 40000.
  - The subquery, which includes the group by and having clauses, selects the departments in which more than five employees work.

Having Clause

  - The HAVING clause is designed for use in conjunction with GROUP BY when it is desired to restrict the groups that appear in the final result.
  - HAVING conditions often involve aggregation functions, permitting the filtering of groups based on summary calculations.
  - Aggregation functions may not be used within a WHERE clause.
  - The WHERE clause filters individual rows going into the final result or an intermediate result. HAVING filters groups going into the final result.
  - WHERE and HAVING may be used together: WHERE is applied first to filter single rows, then groups are formed from the rows that remain, and finally the HAVING clause is applied to filter the groups.
  - Generally, the HAVING clause immediately follows the GROUP BY clause.
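
The difference between the incorrect and the correct formulation of Q24 becomes visible on a department that has more than five employees but only two high earners. The sketch assumes SQLite through Python's sqlite3 module; all salaries and identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dname TEXT, dnumber INTEGER PRIMARY KEY);
    CREATE TABLE employee   (ssn TEXT PRIMARY KEY, salary INTEGER, dno INTEGER);
    INSERT INTO department VALUES ('Research', 5), ('Administration', 4);
""")

# Hypothetical data: Research has six employees, two of them above 40000;
# Administration has two employees, both above 40000.
research = [30000, 35000, 38000, 39000, 45000, 50000]
rows = [(f"5{i}", salary, 5) for i, salary in enumerate(research)]
rows += [("40", 43000, 4), ("41", 60000, 4)]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Incorrect: HAVING counts only the tuples that survived salary > 40000.
incorrect = conn.execute("""
    SELECT   dname, COUNT(*)
    FROM     department, employee
    WHERE    dnumber = dno AND salary > 40000
    GROUP BY dname
    HAVING   COUNT(*) > 5
""").fetchall()

# Correct: the department size is tested in a subquery over all employees.
correct = conn.execute("""
    SELECT   dname, COUNT(*)
    FROM     department, employee
    WHERE    dnumber = dno AND salary > 40000
      AND    dno IN (SELECT dno FROM employee GROUP BY dno HAVING COUNT(*) > 5)
    GROUP BY dname
""").fetchall()

print(incorrect, correct)
```

The incorrect query returns nothing because no department has more than five employees earning above $40,000, while the correct query reports the two high earners of the six-person Research department.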

Summary of GROUP BY/HAVING Clauses

1. Attribute names or column names not listed in the GROUP BY clause may not appear in the HAVING condition, except as arguments of aggregation functions, in ANSI-1989 and ANSI-1992 SQL.
2. Aggregation functions may always be used in the HAVING clause, even if they do not appear in the SELECT attribute list.
3. The HAVING condition can involve compound conditions formed by combining simple logical expressions with the logical operators AND, OR, and NOT.
4. HAVING and WHERE can work together. The HAVING condition is always applied to the groups formed by the GROUP BY clause. The WHERE condition is always applied to the attributes involved in a selection or join.
5. Non-aggregation expressions may be used in the HAVING clause, provided the expressions involve only columns that are named in the GROUP BY clause.

Syntax Structure of SELECT Statements

    SELECT    <attribute list>
    FROM      <table list>
    [WHERE    <condition>]
    [GROUP BY <grouping attribute(s)>]
    [HAVING   <grouping condition>]
    [ORDER BY <attribute list>]

  - The SELECT clause lists the attributes or functions to be retrieved.
  - The FROM clause specifies all relations needed in the query, but not those needed only in nested queries.
  - The WHERE clause specifies the conditions for the selection of tuples from these relations.
  - GROUP BY specifies the grouping attribute(s), whereas the HAVING clause specifies a condition on the groups being selected rather than on the individual tuples.
  - The built-in aggregation functions COUNT, SUM, MIN, MAX, and AVG are used in conjunction with grouping.
  - ORDER BY specifies the order in which the result is displayed.
Sequence

1. FROM: The FROM clause is processed first. It specifies the table(s) or views which serve as the source of all data for the final result. If multiple tables are involved, a join operation is necessary.
2. WHERE: The WHERE clause is processed second. It eliminates those rows of the tables named in the FROM clause which do not satisfy the search condition.
3. GROUP BY: The GROUP BY clause groups the remaining rows on the basis of shared values in the GROUP BY column(s). The partial result now has the form of a set of groups.
4. HAVING: The HAVING clause is now applied to eliminate those groups which do not satisfy the HAVING condition.
5. SELECT: The SELECT list is used to remove unwanted columns or attributes from the partial result. Only elements which appear in the SELECT list remain.
6. ORDER BY: The final result is sorted in the order given by the ORDER BY list.

Insert Statement in SQL

Insert a new tuple into the employee table:

    insert into employee
    values ('Richard', 'K', 'Marini', '653298653', '30-DEC-52',
            '98 Oak Forest, Katy, TX', 'M', 37000, '987654321', 4);

    insert into employee (fname, lname, ssn)
    values ('Richard', 'Marini', '653298653');

Attributes that are not specified in the insert statement are set to their DEFAULT value if one is defined, and otherwise to NULL.
The insert operation is rejected if an unspecified attribute is declared NOT NULL and has no DEFAULT value.
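The rule for unspecified attributes can be checked with a short sketch. It assumes SQLite (through Python's sqlite3 module) as the DBMS and a cut-down employee table; the DEFAULT of 30000 on salary, the NOT NULL on lname, and the second employee 'Ann' are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the employee schema.
conn.execute("""
    create table employee (
        fname  text,
        lname  text not null,
        ssn    text primary key,
        salary integer default 30000,
        dno    integer
    )
""")

# salary and dno are not listed: salary takes its DEFAULT, dno becomes NULL.
conn.execute(
    "insert into employee (fname, lname, ssn) "
    "values ('Richard', 'Marini', '653298653')"
)
row = conn.execute(
    "select salary, dno from employee where ssn = '653298653'"
).fetchone()

# lname is NOT NULL and has no default, so omitting it is rejected.
try:
    conn.execute("insert into employee (fname, ssn) values ('Ann', '111111111')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row, rejected)
```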
Insert a set of tuples into a table: create a relation and load it with the result of a query.

    create table depts_info
        (deptname   varchar(15),
         noofemps   integer,
         totalsal   integer);

    insert into depts_info (deptname, noofemps, totalsal)
    select   dname, count(*), sum(salary)
    from     department, employee
    where    dnumber = dno
    group by dname;

Delete Statement in SQL

Delete a tuple: to delete the employee tuple with lname Brown

    delete from employee
    where  lname = 'Brown';

Delete a set of tuples: to delete the employee tuples from the Research department

    delete from employee
    where  dno in (select dnumber
                   from   department
                   where  dname = 'Research');

To delete all the tuples in the employee table:

    delete from employee;

(This gives an empty table; the table definition itself remains.)
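The two forms of delete, with a subquery and without a WHERE clause, can be tried out as follows. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for the DBMS, and the three employee rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dname text, dnumber integer primary key);
    create table employee (lname text, dno integer);
    insert into department values ('Research', 5), ('Administration', 4);
    insert into employee values ('Smith', 5), ('Wong', 5), ('Zelaya', 4);
""")

# Delete a set of tuples: every employee of the Research department.
cur = conn.execute("""
    delete from employee
    where  dno in (select dnumber from department where dname = 'Research')
""")
deleted = cur.rowcount
remaining = [r[0] for r in conn.execute("select lname from employee order by lname")]

# Delete all tuples: the table becomes empty but still exists.
conn.execute("delete from employee")
count_after = conn.execute("select count(*) from employee").fetchone()[0]

print(deleted, remaining, count_after)
```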
Update Statement in SQL

Update a single tuple: to change the location and controlling department number of project number 10 to Bellaire and 5.

    update project
    set    plocation = 'Bellaire', dnum = 5
    where  pnumber = 10;

Update a set of tuples in a table: to raise the salary of the employees in the Research department by 10%.

    update employee
    set    salary = salary * 1.1
    where  dno in (select dnumber
                   from   department
                   where  dname = 'Research');

Views in SQL

View:
A view is a single table that is derived from other tables; these other tables can be base tables or previously defined views.
A view does not necessarily exist in physical form. It is considered a virtual table, in contrast to base tables, whose tuples are actually stored in the database.

Advantages and Disadvantages of Views:
The advantage is that a frequently used query involving join operations can be represented as a view. Users do not have to specify the joins every time; they simply query the view.
The disadvantage is that the update operations that can be applied to views are limited.
Specification of Views in SQL

Create a view on fname, lname, pname, hours

V1: create view works_on1
    as select fname, lname, pname, hours
       from   employee, project, works_on
       where  ssn = essn and pno = pnumber;

works_on1:   fname   lname   pname   hours

V2: create view dept_info (dept_name, no_of_emps, total_sal)
    as select   dname, count(*), sum(salary)
       from     department, employee
       where    dnumber = dno
       group by dname;

dept_info:   dept_name   no_of_emps   total_sal

Querying on View

QV1: To retrieve the last name and first name of all employees who work on ProductX

    select pname, fname, lname
    from   works_on1
    where  pname = 'ProductX';

A view is always up to date: if we modify the tuples in the base tables on which the view is defined, the view automatically reflects these changes.
The view is not realized at the time of view definition but rather at the time we specify a query on the view.
It is the responsibility of the DBMS, and not the user, to make sure that the view is up to date.
If the view is no longer useful, it can be disposed of with the drop command.

V1d: drop view works_on1;
V2d: drop view dept_info;
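The statement that a view is always up to date can be observed directly. The sketch below assumes SQLite via Python's sqlite3 as the DBMS and uses reduced versions of the employee, project and works_on tables with made-up rows; only the view definition V1 is taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (fname text, lname text, ssn text primary key);
    create table project (pname text, pnumber integer primary key);
    create table works_on (essn text, pno integer, hours real);
    insert into employee values ('John', 'Smith', '123456789');
    insert into project values ('ProductX', 1), ('ProductY', 2);
    insert into works_on values ('123456789', 1, 32.5);

    -- V1 from the slides
    create view works_on1 as
        select fname, lname, pname, hours
        from   employee, project, works_on
        where  ssn = essn and pno = pnumber;
""")

query = "select pname, hours from works_on1 order by pname"
before = conn.execute(query).fetchall()

# Change a base table only; the view is evaluated again when it is queried.
conn.execute("insert into works_on values ('123456789', 2, 7.5)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

The new works_on tuple shows up in works_on1 without any action on the view itself, because the view is computed at query time.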
Updating in Views

Single Table View Update:
An update on a view defined on a single table can be mapped to an update on the underlying base table.

Multi Table View Update:
For a view involving joins, an update operation may be mapped to update operations on the underlying base relations in multiple ways.

Suppose there is a view update that changes the PNAME attribute of John Smith from ProductX to ProductY.

UV1: update works_on1
     set    pname = 'ProductY'
     where  lname = 'Smith' and fname = 'John' and pname = 'ProductX';

This update can be mapped into several updates on the base relations that give the desired update on the view.

There are two possible updates, (a) and (b), on the base relations corresponding to UV1.

(a) update works_on
    set    pno = (select pnumber
                  from   project
                  where  pname = 'ProductY')
    where  essn = (select ssn
                   from   employee
                   where  lname = 'Smith' and fname = 'John')
      and  pno = (select pnumber
                  from   project
                  where  pname = 'ProductX');

(b) update project
    set    pname = 'ProductY'
    where  pname = 'ProductX';
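The difference between mappings (a) and (b) becomes visible once a second employee also works on ProductX. The following sketch assumes SQLite via Python's sqlite3 as the DBMS; the helper build_db and the second employee are hypothetical. SQLite does not accept UV1 on the view itself, so the two base-table updates are run directly on separate copies of the data.

```python
import sqlite3

def build_db():
    """Hypothetical helper: fresh database with two employees on ProductX."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table employee (fname text, lname text, ssn text primary key);
        create table project (pname text, pnumber integer primary key);
        create table works_on (essn text, pno integer);
        insert into employee values ('John', 'Smith', '1'), ('Joyce', 'English', '2');
        insert into project values ('ProductX', 1), ('ProductY', 2);
        insert into works_on values ('1', 1), ('2', 1);
        create view works_on1 as
            select fname, lname, pname
            from   employee, project, works_on
            where  ssn = essn and pno = pnumber;
    """)
    return conn

VIEW = "select fname, pname from works_on1 order by fname"

# Mapping (a): reassign John Smith's works_on tuple to the ProductY project.
db_a = build_db()
db_a.execute("""
    update works_on
    set    pno = (select pnumber from project where pname = 'ProductY')
    where  essn = (select ssn from employee
                   where lname = 'Smith' and fname = 'John')
      and  pno = (select pnumber from project where pname = 'ProductX')
""")
view_a = db_a.execute(VIEW).fetchall()

# Mapping (b): rename the ProductX project itself.
db_b = build_db()
db_b.execute("update project set pname = 'ProductY' where pname = 'ProductX'")
view_b = db_b.execute(VIEW).fetchall()

print(view_a)
print(view_b)
```

Both mappings make John Smith's view tuple show ProductY, but (b) also changes the view tuple of every other employee on that project, which is a side effect the view update did not ask for.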
Discussion

Update (a) relates John Smith to the ProductY project tuple in place of the ProductX tuple, and is the most likely desired update. The original update changes the project name pname in the works_on1 view, but it is unlikely that the user wants to change the PNAME value itself; the intended semantics is to change the project that John Smith works on. So update (a) sets the project number in the works_on base table to the number of the project with PNAME = ProductY.

Update (b) would also give the desired effect on the view, but it accomplishes this by changing the name of the ProductX tuple in the project relation to ProductY. It is quite unlikely that the user who specified the view update UV1 wants the update to be interpreted as in update (b).

Observation

A view with a single defining table is updatable if the view attributes contain the primary key or some other candidate key of the base relation, because this maps each (virtual) view tuple to a single base tuple.
Views defined on multiple tables using joins are generally not updatable.
Views defined using grouping and aggregate functions are not updatable.

Example:

UV2: update dept_info
     set    total_sal = 100000
     where  dept_name = 'Research';

A view update is feasible when only one possible update on the base relations can accomplish the desired update effect on the view.
Whenever an update on the view can be mapped to more than one update on the underlying base relations, we must have a certain procedure to choose the desired update.
Some researchers have developed methods for choosing the most likely update, while other researchers prefer to have the user choose the desired update mapping during view definition.
Specifying Additional Constraints as Assertions

To specify the constraint "The salary of an employee must not be greater than the salary of the manager of the department that the employee works for":

    create assertion salary_constraint
    check (not exists (select *
                       from   employee e, employee m, department d
                       where  e.salary > m.salary and
                              e.dno = d.dnumber and
                              d.mgrssn = m.ssn));

If tuples in the database cause the condition of an assertion statement to evaluate to FALSE, the constraint is violated.

Specifying Index in SQL

Specifying an index on a single attribute:
I1: create index lname_index on employee (lname);

Specifying an index on multiple attributes:
I2: create index names_index on employee (lname asc, fname desc, minit);

Specifying an index on an attribute with unique values:
I3: create unique index ssn_index on employee (ssn);

Specifying a cluster index:
I4: create index dno_index on employee (dno) cluster;
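The condition inside the salary_constraint assertion can be evaluated as an ordinary query, which shows when the constraint holds and when it is violated. This sketch assumes SQLite via Python's sqlite3; SQLite has no create assertion statement, so the check is run by hand rather than enforced by the DBMS, and the two employee rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (lname text, ssn text primary key,
                           salary integer, dno integer);
    create table department (dname text, dnumber integer primary key,
                             mgrssn text);
    insert into department values ('Research', 5, '333445555');
    insert into employee values ('Wong', '333445555', 40000, 5),
                                ('Smith', '123456789', 30000, 5);
""")

# The assertion's condition, evaluated manually (1 = holds, 0 = violated).
CHECK = """
    select not exists (select *
                       from   employee e, employee m, department d
                       where  e.salary > m.salary and
                              e.dno = d.dnumber and
                              d.mgrssn = m.ssn)
"""
holds_before = bool(conn.execute(CHECK).fetchone()[0])

# Raise Smith above the manager's salary: the condition becomes FALSE.
conn.execute("update employee set salary = 45000 where lname = 'Smith'")
holds_after = bool(conn.execute(CHECK).fetchone()[0])

print(holds_before, holds_after)
```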
Cluster in ORACLE

    create cluster deptandemp
        (deptemp varchar(9));

    create table department
        (dname    varchar(19),
         dnumber  varchar(9),
         ...)
    cluster deptandemp (dnumber);

    create table employee
        (name  varchar(19),
         ...
         dno   varchar(9))
    cluster deptandemp (dno);

Discussion on Index

The reason and motivation for an index is to support efficient search and maintenance.

Advantages:
Indices support binary search.
Indices support dynamic maintenance.

Disadvantages:
Indices cost extra memory space.
Algorithms to support indices are more complex.

The keyword UNIQUE can be used to enforce the key constraint.
The reason behind linking the definition of a key constraint with specifying an index is that it is much more efficient to enforce uniqueness of key values on a file if an index is defined on the key attribute, since searching the index is much more efficient than searching the file.

A clustering and unique index is similar to a primary index.
A clustering and non-unique index is similar to a cluster index.
A nonclustering index is similar to a secondary index.
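How a unique index enforces the key constraint, and how the same index serves searches on the key, can be sketched as follows. The example assumes SQLite via Python's sqlite3; the two-column employee table and its rows are hypothetical, and the query plan text is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (lname text, ssn text)")
# I3 from the slides: the unique index doubles as a key constraint.
conn.execute("create unique index ssn_index on employee (ssn)")
conn.execute("insert into employee values ('Smith', '123456789')")

# A second tuple with the same ssn violates uniqueness and is rejected.
try:
    conn.execute("insert into employee values ('Wong', '123456789')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask SQLite how it would evaluate a lookup on ssn; the last column of
# each plan row is a textual description that names the index used.
plan = conn.execute(
    "explain query plan select lname from employee where ssn = '123456789'"
).fetchall()
uses_index = any("ssn_index" in row[-1] for row in plan)

print(duplicate_rejected, uses_index)
```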