Extended Operators in SQL and Relational Algebra

Similar documents
Oracle SQL. Course Summary. Duration. Objectives

Quiz 2: Database Systems I Instructor: Hassan Khosravi Spring 2012 CMPT 354

COMP 5138 Relational Database Management Systems. Week 5 : Basic SQL. Today s Agenda. Overview. Basic SQL Queries. Joins Queries

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals NEW

More on SQL. Juliana Freire. Some slides adapted from J. Ullman, L. Delcambre, R. Ramakrishnan, G. Lindstrom and Silberschatz, Korth and Sudarshan

Oracle Database: SQL and PL/SQL Fundamentals

Relational Database: Additional Operations on Relations; SQL

Instant SQL Programming

Relational Databases

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

MOC 20461C: Querying Microsoft SQL Server. Course Overview

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

SUBQUERIES AND VIEWS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 6

SQL SELECT Query: Intermediate

Introduction to Microsoft Jet SQL

Database Administration with MySQL

Introduction to database design

Introduction to SQL and SQL in R. LISA Short Courses Xinran Hu

1. SELECT DISTINCT f.fname FROM Faculty f, Class c WHERE f.fid = c.fid AND c.room = LWSN1106

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

Oracle Database 12c: Introduction to SQL Ed 1.1

SQL: Queries, Programming, Triggers

Structured Query Language (SQL)

Example Instances. SQL: Queries, Programming, Triggers. Conceptual Evaluation Strategy. Basic SQL Query. A Note on Range Variables

Performing Queries Using PROC SQL (1)

DBMS / Business Intelligence, SQL Server

Relational Algebra. Basic Operations Algebra of Bags

Comp 5311 Database Management Systems. 3. Structured Query Language 1

Databases 2011 The Relational Model and SQL

Chapter 13: Query Optimization

SQL (structured query language)

Using Temporary Tables to Improve Performance for SQL Data Services

SQL: Queries, Programming, Triggers

Advance DBMS. Structured Query Language (SQL)

Financial Data Access with SQL, Excel & VBA

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database 10g: Introduction to SQL

Relational Algebra. Query Languages Review. Operators. Select (σ), Project (π), Union ( ), Difference (-), Join: Natural (*) and Theta ( )

1 Structured Query Language: Again. 2 Joining Tables

Chapter 8. SQL-99: SchemaDefinition, Constraints, and Queries and Views

Boats bid bname color 101 Interlake blue 102 Interlake red 103 Clipper green 104 Marine red. Figure 1: Instances of Sailors, Boats and Reserves

Exercise 1: Relational Model

Relational Algebra and SQL

Chapter 13: Query Processing. Basic Steps in Query Processing

Week 4 & 5: SQL. SQL as a Query Language

3.GETTING STARTED WITH ORACLE8i

IT2304: Database Systems 1 (DBS 1)

CS 338 Join, Aggregate and Group SQL Queries

SQL Server 2008 Core Skills. Gary Young 2011

IT2305 Database Systems I (Compulsory)

Part A: Data Definition Language (DDL) Schema and Catalog CREAT TABLE. Referential Triggered Actions. CSC 742 Database Management Systems

Chapter 1 Overview of the SQL Procedure

UNIT 6. Structured Query Language (SQL) Text: Chapter 5

Programming with SQL

Object-Oriented Query Languages: Object Query Language (OQL)

Handling Missing Values in the SQL Procedure

Advanced Query for Query Developers

How To Create A Table In Sql (Ahem)

Chapter 5. SQL: Queries, Constraints, Triggers

Introducing Microsoft SQL Server 2012 Getting Started with SQL Server Management Studio

Querying Microsoft SQL Server

types, but key declarations and constraints Similar CREATE X commands for other schema ëdrop X name" deletes the created element of beer VARCHARè20è,

Lecture 4: More SQL and Relational Algebra

Netezza SQL Class Outline

Querying Microsoft SQL Server 2012

SQL Server for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach

Relational Division and SQL

Oracle Database 11g SQL

Information Systems SQL. Nikolaj Popov

Querying Microsoft SQL Server 2012

ICAB4136B Use structured query language to create database structures and manipulate data

Course 10774A: Querying Microsoft SQL Server 2012

Course 20461C: Querying Microsoft SQL Server Duration: 35 hours

Course ID#: W 35 Hrs. Course Content

Course 10774A: Querying Microsoft SQL Server 2012 Length: 5 Days Published: May 25, 2012 Language(s): English Audience(s): IT Professionals

Introduction to Querying & Reporting with SQL Server

Querying Microsoft SQL Server 20461C; 5 days

2874CD1EssentialSQL.qxd 6/25/01 3:06 PM Page 1 Essential SQL Copyright 2001 SYBEX, Inc., Alameda, CA

The SQL Query Language. Creating Relations in SQL. Referential Integrity in SQL. Basic SQL Query. Primary and Candidate Keys in SQL

SQL Basics. Introduction to Standard Query Language

Data Structure: Relational Model. Programming Interface: JDBC/ODBC. SQL Queries: The Basic From

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Translating SQL into the Relational Algebra

3. Relational Model and Relational Algebra

Database Applications Microsoft Access

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

David M. Kroenke and David J. Auer Database Processing: Fundamentals, Design and Implementation

Introduction to SQL C H A P T E R3. Exercises

A Brief Introduction to MySQL

Querying Microsoft SQL Server Course M Day(s) 30:00 Hours

Discovering SQL. Wiley Publishing, Inc. A HANDS-ON GUIDE FOR BEGINNERS. Alex Kriegel WILEY

Chapter 14: Query Optimization

Intermediate SQL C H A P T E R4. Practice Exercises. 4.1 Write the following queries in SQL:

4. SQL. Contents. Example Database. CUSTOMERS(FName, LName, CAddress, Account) PRODUCTS(Prodname, Category) SUPPLIERS(SName, SAddress, Chain)

RDBMS Using Oracle. Lecture Week 7 Introduction to Oracle 9i SQL Last Lecture. kamran.munir@gmail.com. Joining Tables

Ad Hoc Advanced Table of Contents

MOC QUERYING MICROSOFT SQL SERVER

Introduction to SQL: Data Retrieving

Transcription:

Extended Operators in SQL and Relational Algebra T. M. Murali September 15, 2010

Bags or Sets? So far, we have said that relational algebra and SQL operate on relations that are sets of tuples. Real RDBMSs treat relations as bags of tuples. A tuple can appear multiple times in a relation.

Bags or Sets? So far, we have said that relational algebra and SQL operate on relations that are sets of tuples. Real RDBMSs treat relations as bags of tuples. A tuple can appear multiple times in a relation. Performance is one of the main reasons;

Bags or Sets? So far, we have said that relational algebra and SQL operate on relations that are sets of tuples. Real RDBMSs treat relations as bags of tuples. A tuple can appear multiple times in a relation. Performance is one of the main reasons; duplicate elimination is expensive since it requires sorting. If we use bag semantics, we may have to redefine the meaning of each relation algebra operator.

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times.

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times. R A B C 3 4 4 4

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times. R A B C 3 4 4 4 π A,B (R)

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times. R A B C 3 4 4 4 π A,B (R) A B

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times. R A B C 3 4 4 4 π A,B (R) A B σ C 3 (R)

Bag Semantics: Projection and Selection Projection (π()): process each tuple independently; a tuple may appear in the resulting relation multiple times. Selection (σ()): process each tuple independently; a tuple may appear in the resulting relation multiple times. R A B C 3 4 4 4 π A,B (R) A B σ C 3 (R) A B C 3 4 4 4

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times.

Bag Semantics: Union, Intersection, and Difference R S R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R A B S A B 2 4

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R A B S A B 2 4 R S A B 2 4

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R S: if tuple t appears k times in R and l times in S, t appears in R S min{k, l} times. R A B S A B 2 4 R S A B 2 4 R S

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R S: if tuple t appears k times in R and l times in S, t appears in R S min{k, l} times. R A B S A B 2 4 R S A B 2 4 R S A B

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R S: if tuple t appears k times in R and l times in S, t appears in R S min{k, l} times. R S: if tuple t appears k times in R and l times in S, t appears in R S max{0, k l} times. R A B R S A B S A B 2 4 R S R S A B 2 4

Bag Semantics: Union, Intersection, and Difference R S: if tuple t appears k times in R and l times in S, t appears in R S k + l times. R S: if tuple t appears k times in R and l times in S, t appears in R S min{k, l} times. R S: if tuple t appears k times in R and l times in S, t appears in R S max{0, k l} times. R A B R S A B S A B 2 4 R S A B R S A B 2 4

Bag Semantics: Products and Joins Product ( ): If a tuple r appears k times in a relation R and tuple s appears l times in a relation S, then the tuple rs appears kl times in R S. Theta-join and Natural join ( ): Since both can be expressed as applying a selection followed by a projection to a product, use the semantics of selection, projection, and the product.

Extended Operators Powerful operators based on basic relational operators and bag semantics.

Extended Operators Powerful operators based on basic relational operators and bag semantics. Sorting: convert a relation into a list of tuples. Duplicate elimination: turn a bag into a set by eliminating duplicate tuples.

Extended Operators Powerful operators based on basic relational operators and bag semantics. Sorting: convert a relation into a list of tuples. Duplicate elimination: turn a bag into a set by eliminating duplicate tuples. Grouping: partition the tuples of a relation into groups, based on their values among specified attributes.

Extended Operators Powerful operators based on basic relational operators and bag semantics. Sorting: convert a relation into a list of tuples. Duplicate elimination: turn a bag into a set by eliminating duplicate tuples. Grouping: partition the tuples of a relation into groups, based on their values among specified attributes. Aggregation: used by the grouping operator and to manipulate/combine attributes.

Extended Operators Powerful operators based on basic relational operators and bag semantics. Sorting: convert a relation into a list of tuples. Duplicate elimination: turn a bag into a set by eliminating duplicate tuples. Grouping: partition the tuples of a relation into groups, based on their values among specified attributes. Aggregation: used by the grouping operator and to manipulate/combine attributes. Extended projections: projection on steroids. Outerjoin: extension of joins that make sure every tuple is in the output.

Sorting RA τ A1,A 2,...(R). SQL SELECT... FROM... WHERE... ORDER BY A 1, A 2,....

Sorting RA τ A1,A 2,...(R). SQL SELECT... FROM... WHERE... ORDER BY A 1, A 2,.... The result is a list of tuples in R but with the tuples sorted by their values in attributes A 1, A 2,... In SQL, use DESC after an attribute to specify sorting in descending order; ASC is the default. If you use the result in another query, sorted order is lost.

Duplicate Elimination RA δ(r) is the relation containing exactly one copy of each tuple in R. SQL SELECT DISTINCT...

Duplicate Elimination RA δ(r) is the relation containing exactly one copy of each tuple in R. SQL SELECT DISTINCT... Duplicate elimination is expensive, since tuples must be sorted or partitioned.

Duplicate Elimination RA δ(r) is the relation containing exactly one copy of each tuple in R. SQL SELECT DISTINCT... Duplicate elimination is expensive, since tuples must be sorted or partitioned. Set operations in SQL (UNION, INTERSECT, and EXCEPT) operate on sets of tuples, i.e., they first eliminate duplicates. To make these operators treat relations as bags, follow the operation with the keyword ALL.

Aggregation Operators that summarise or aggregate the values in a single attribute of a relation. Operators are the same in relational algebra and SQL. All operators treat a relation as a bag of tuples. SUM: computes the sum of a column with numerical values.

Aggregation Operators that summarise or aggregate the values in a single attribute of a relation. Operators are the same in relational algebra and SQL. All operators treat a relation as a bag of tuples. SUM: computes the sum of a column with numerical values. AVG: computes the average of a column with numerical values.

Aggregation Operators that summarise or aggregate the values in a single attribute of a relation. Operators are the same in relational algebra and SQL. All operators treat a relation as a bag of tuples. SUM: computes the sum of a column with numerical values. AVG: computes the average of a column with numerical values. MIN and MAX: for a column with numerical values, computes the smallest or largest value, respectively. for a column with string or character values, computes the lexicographically smallest or largest values, respectively.

Aggregation Operators that summarise or aggregate the values in a single attribute of a relation. Operators are the same in relational algebra and SQL. All operators treat a relation as a bag of tuples. SUM: computes the sum of a column with numerical values. AVG: computes the average of a column with numerical values. MIN and MAX: for a column with numerical values, computes the smallest or largest value, respectively. for a column with string or character values, computes the lexicographically smallest or largest values, respectively. COUNT: computes the number of non-null tuples in a column. In SQL, can use COUNT (*) to count the number of tuples in a relation.

Grouping How do we answer the query Count the number of classes and the total enrollment of the classes each department teaches?

Grouping How do we answer the query Count the number of classes and the total enrollment of the classes each department teaches? Can we answer the query using the operators discussed so far?

Grouping How do we answer the query Count the number of classes and the total enrollment of the classes each department teaches? Can we answer the query using the operators discussed so far? We need to group the tuples of Teach by DeptName and then aggregate within each group.

Grouping How do we answer the query Count the number of classes and the total enrollment of the classes each department teaches? Can we answer the query using the operators discussed so far? We need to group the tuples of Teach by DeptName and then aggregate within each group. Use the grouping operator.

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches?

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName.

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName. 2. For each group, create a new attribute that stores the number of classes taught by the department.

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName. 2. For each group, create a new attribute that stores the number of classes taught by the department. 3. For each group, create a new attribute that stores the total enrollment of the classes taught by the department.

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName. 2. For each group, create a new attribute that stores the number of classes taught by the department. 3. For each group, create a new attribute that stores the total enrollment of the classes taught by the department. γ L (Courses), where L is a list containing three elements: 1. DeptName: the grouping attribute

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName. 2. For each group, create a new attribute that stores the number of classes taught by the department. 3. For each group, create a new attribute that stores the total enrollment of the classes taught by the department. γ L (Courses), where L is a list containing three elements: 1. DeptName: the grouping attribute, 2. COUNT(Number) NumCourses: an aggregated attribute computing the count of the Number attribute in each group and naming the new attribute NumCourses, and

Example of Grouping in Relational Algebra How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? 1. Group Courses by DeptName. 2. For each group, create a new attribute that stores the number of classes taught by the department. 3. For each group, create a new attribute that stores the total enrollment of the classes taught by the department. γ L (Courses), where L is a list containing three elements: 1. DeptName: the grouping attribute, 2. COUNT(Number) NumCourses: an aggregated attribute computing the count of the Number attribute in each group and naming the new attribute NumCourses, and 3. SUM(Enrollment) TotalEnrollment: an aggregated attribute computing the total of the Enrollment attribute and naming the new attribute TotalEnrollment.

Example of Grouping Continued How do we answer the query Count the number of classes and total enrollment of the classes each department teaches?

Example of Grouping Continued How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? The complete operator is γ DeptName,COUNT(Number) NumCourses,SUM(Enrollment) TotalEnrollment (Courses) The schema of the new relation is (DeptName, NumCourses, TotalEnrollment).

Example of Grouping Continued How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? The complete operator is γ DeptName,COUNT(Number) NumCourses,SUM(Enrollment) TotalEnrollment (Courses) The schema of the new relation is (DeptName, NumCourses, TotalEnrollment). We can group by multiple attributes. We can create as many new attributes as necessary. We can apply other operators to the result, since the grouping operator produces a relation.

Grouping in SQL Syntax is much simpler than relational algebra. Use the GROUP BY clause after the WHERE clause or after the FROM, if there is no WHERE clause. List grouping attributes after GROUP BY. Use SELECT clause to aggregate attributes.

Grouping in SQL Syntax is much simpler than relational algebra. Use the GROUP BY clause after the WHERE clause or after the FROM, if there is no WHERE clause. List grouping attributes after GROUP BY. Use SELECT clause to aggregate attributes. How do we answer the query Count the number of classes and total enrollment of the classes each department teaches?

Grouping in SQL Syntax is much simpler than relational algebra. Use the GROUP BY clause after the WHERE clause or after the FROM, if there is no WHERE clause. List grouping attributes after GROUP BY. Use SELECT clause to aggregate attributes. How do we answer the query Count the number of classes and total enrollment of the classes each department teaches? SELECT DeptName, COUNT(Number) AS NumCourses, SUM(Enrollment) AS TotalEnrollment FROM COURSES GROUP BY DeptName;

Grouping in SQL SELECT DeptName, COUNT(Number) AS NumCourses, SUM(Enrollment) AS TotalEnrollment FROM COURSES GROUP BY DeptName; Aggregated attributes are evaluated on a per-group basis. Only attributes mentioned in the GROUP BY clause may appear unaggregated in the SELECT clause, e.g., Number must have an aggregation operator applied to it. There need not be any aggregated attribute in the SELECT clause. Read Chapter 6.4.6 of the textbook about affect of NULL values on grouping and aggregation.

Restricting Grouping in SQL How do we answer the query Count the number of classes each department teaches, restricted to departments that have total enrollment at least 500 in their classes (the classes taught by that department)?

Restricting Grouping in SQL How do we answer the query Count the number of classes each department teaches, restricted to departments that have total enrollment at least 500 in their classes (the classes taught by that department)? Need to introduce the HAVING clause SELECT DeptName, COUNT(Number) AS NumCourses FROM COURSES GROUP BY DeptName HAVING SUM(Enrollment) >= 500;

Rules for HAVING Clauses An aggregation in a HAVING clause applies only to the group being tested. If an attribute appears unaggregated in a HAVING clause, it must appear in the GROUP BY line.

Complete SELECT Statement SELECT Attribute list FROM Relation list WHERE Condition or Subquery GROUP BY Attribute list HAVING Condition or Subquery ORDER BY Attribute list;

Complete SELECT Statement SELECT Attribute list FROM Relation list WHERE Condition or Subquery GROUP BY Attribute list HAVING Condition or Subquery ORDER BY Attribute list; WHERE is evaluated before GROUP BY and HAVING.

Joins in Relational Algebra and SQL Cross product: RA R S SQL R CROSS JOIN S; Theta join: RA R C S SQL R JOIN S ON C; Natural join: RA R S SQL R NATURAL JOIN S;

Outer Joins A dangling tuple is one that fails to pair with any tuple in the other relation in a join operation. Outer joins allow dangling tuples to be included in the result of join operations, by padding them with NULL values.

Outer Joins A dangling tuple is one that fails to pair with any tuple in the other relation in a join operation. Outer joins allow dangling tuples to be included in the result of join operations, by padding them with NULL values. RA R S SQL R NATURAL FULL OUTER JOIN S; Contains all tuples in R S. Also includes every tuple in R that is not joined with a tuple in S, after padding a special null symbol (NULL in case of SQL). Same condition applied to S.

Outer Joins A dangling tuple is one that fails to pair with any tuple in the other relation in a join operation. Outer joins allow dangling tuples to be included in the result of join operations, by padding them with NULL values. RA R S SQL R NATURAL FULL OUTER JOIN S; Contains all tuples in R S. Also includes every tuple in R that is not joined with a tuple in S, after padding a special null symbol (NULL in case of SQL). Same condition applied to S. Left outer join: RA R L S SQL R NATURAL LEFT OUTER JOIN S; Like R S but ignores dangling tuples in S.

Outer Joins A dangling tuple is one that fails to pair with any tuple in the other relation in a join operation. Outer joins allow dangling tuples to be included in the result of join operations, by padding them with NULL values. RA R S SQL R NATURAL FULL OUTER JOIN S; Contains all tuples in R S. Also includes every tuple in R that is not joined with a tuple in S, after padding a special null symbol (NULL in case of SQL). Same condition applied to S. Left outer join: RA R L S SQL R NATURAL LEFT OUTER JOIN S; Like R S but ignores dangling tuples in S. Right outer join is analogous to left outer join. All outerjoin operators have theta-join analogues.