Unit 3. Retrieving Data from Multiple Tables

What This Unit Is About
How to retrieve columns from more than one table or view.

What You Should Be Able to Do
Retrieve data from more than one table or view.
Specify JOIN predicates.
Use correlation names in queries.

How You Will Check Your Progress
Checkpoint questions
Machine labs

Copyright IBM Corp. 1999, 2000 Unit 3. Retrieving Data from Multiple Tables 3-1
3-1. Unit Objectives (CF123010) 3-2 DB2 SQL Workshop Copyright IBM Corp. 1999, 2000
3.1 Accessing Data Stored in Multiple Tables
3-2. Retrieving Data from Multiple Tables (Principle) (CF123020)
For each department number in table PROJECT, one matching row exists in table DEPARTMENT. The data in the two tables shows how a row in one table relates to a row in the other: the rows are linked by the values in their DEPTNO columns.
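The relationship described above can be illustrated in plain Python before any SQL is involved. This is a minimal sketch. The rows are hypothetical. Every column except DEPTNO (PROJNO, PROJNAME, DEPTNAME) is an assumed name. The sketch only shows how a shared DEPTNO value pairs a PROJECT row with its DEPARTMENT row. It does not show how DB2 UDB actually executes a join.

```python
# Hypothetical sample rows, not taken from the course figures.
# DEPARTMENT rows: (DEPTNO, DEPTNAME); PROJECT rows: (PROJNO, PROJNAME, DEPTNO).
department = [
    ("A00", "ADMINISTRATION"),
    ("B01", "PLANNING"),
    ("D11", "MANUFACTURING"),
]
project = [
    ("P1", "PAYROLL", "A00"),
    ("P2", "FORECAST", "B01"),
    ("P3", "ASSEMBLY", "D11"),
    ("P4", "TOOLING", "D11"),
]

# Look up DEPARTMENT rows by DEPTNO; each department number occurs once.
dept_by_no = {deptno: deptname for deptno, deptname in department}

# Relate each PROJECT row to its DEPARTMENT row through the shared DEPTNO value.
related = [
    (projname, deptno, dept_by_no[deptno])
    for _projno, projname, deptno in project
]

for row in related:
    print(row)
```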
3-3. Retrieving Data from Multiple Tables (JOIN) (CF123030)
You can use the SELECT statement to produce reports that contain information from two or more tables. This is commonly referred to as a join. To join two tables, list the columns you want displayed in the SELECT clause, the table names in the FROM clause, and the join predicate in the WHERE clause. DEPTNO is a column of both DEPARTMENT and PROJECT, so every reference to it must be qualified with a table name (for example, PROJECT.DEPTNO). It is also good practice to qualify every column name in a SELECT that references more than one table, which avoids ambiguity errors. When the join predicate is omitted, each qualifying row of the first table is combined with every qualifying row of the second table. This is called a Cartesian product and is usually an unwanted result.
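The join described above can be tried out with the following minimal sketch. It assumes SQLite, through Python's standard `sqlite3` module, as a stand-in for DB2 UDB, so behavior beyond basic join syntax may differ. The table contents and the columns PROJNO, PROJNAME and DEPTNAME are hypothetical. The sketch runs the join with a qualified predicate and shows that an unqualified DEPTNO is rejected. It also shows that dropping the join predicate yields the Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; only DEPTNO is named in the text above.
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT);
    CREATE TABLE PROJECT (PROJNO TEXT PRIMARY KEY, PROJNAME TEXT, DEPTNO TEXT);
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION'), ('B01', 'PLANNING'),
                                  ('D11', 'MANUFACTURING');
    INSERT INTO PROJECT VALUES ('P1', 'PAYROLL', 'A00'), ('P2', 'FORECAST', 'B01'),
                               ('P3', 'ASSEMBLY', 'D11'), ('P4', 'TOOLING', 'D11');
""")

# Columns in SELECT, tables in FROM, join predicate in WHERE.
# DEPTNO exists in both tables, so every reference to it is qualified.
joined = conn.execute("""
    SELECT PROJECT.PROJNO, PROJECT.PROJNAME, DEPARTMENT.DEPTNAME
    FROM PROJECT, DEPARTMENT
    WHERE PROJECT.DEPTNO = DEPARTMENT.DEPTNO
    ORDER BY PROJECT.PROJNO
""").fetchall()

# An unqualified DEPTNO is ambiguous, and the statement is rejected.
try:
    conn.execute("SELECT DEPTNO FROM PROJECT, DEPARTMENT")
    ambiguous_rejected = False
except sqlite3.OperationalError:
    ambiguous_rejected = True

# Join predicate omitted: every PROJECT row is paired with every DEPARTMENT row.
cartesian = conn.execute(
    "SELECT PROJECT.PROJNO, DEPARTMENT.DEPTNO FROM PROJECT, DEPARTMENT"
).fetchall()

print(len(joined), len(cartesian))  # 4 12
```

With 4 PROJECT rows and 3 DEPARTMENT rows, the join returns one row per project (4), while the predicate-free query returns 4 x 3 = 12 rows.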
3-4. JOIN Syntax 1 (CF123040)
The tables EMPLOYEE and DEPARTMENT share values in columns WORKDEPT and DEPTNO: WORKDEPT contains the number of the department an employee belongs to. The join predicate tells DB2 UDB to combine the row for each employee with the row for that employee's department. When joining two tables, we normally provide at least one join condition. For three tables, we provide at least two. The general rule of thumb: for n tables, n-1 join predicates is USUALLY the minimum the query needs so that no table is left unlinked. Further conditions are allowed and can be added with AND or OR.
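The EMPLOYEE/DEPARTMENT join described above can be sketched in SQLite, used here through Python's `sqlite3` as a stand-in for DB2 UDB. The rows are hypothetical, as are the column names EMPNO, LASTNAME and DEPTNAME. The sketch uses one join predicate for two tables (n-1 = 1) and adds a further condition with AND. A final query uses OR inside parentheses: AND binds more tightly than OR, so unparenthesized OR conditions would detach from the join predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; EMPNO, LASTNAME and DEPTNAME are assumed names.
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION'), ('B01', 'PLANNING'),
                                  ('D11', 'MANUFACTURING');
    INSERT INTO EMPLOYEE VALUES ('000010', 'ADAMS', 'A00'), ('000020', 'BAKER', 'B01'),
                                ('000060', 'CHEN', 'D11'), ('000150', 'DIAZ', 'D11');
""")

# Two tables, so one join predicate (n - 1 = 1) links them.
all_rows = conn.execute("""
    SELECT EMPLOYEE.EMPNO, EMPLOYEE.LASTNAME, DEPARTMENT.DEPTNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

# A further condition added with AND.
d11_rows = conn.execute("""
    SELECT EMPLOYEE.EMPNO, EMPLOYEE.LASTNAME, DEPARTMENT.DEPTNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
      AND DEPARTMENT.DEPTNO = 'D11'
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

# OR conditions are parenthesized so they stay attached to the join predicate.
b01_or_d11 = conn.execute("""
    SELECT EMPLOYEE.EMPNO
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
      AND (DEPARTMENT.DEPTNO = 'B01' OR DEPARTMENT.DEPTNO = 'D11')
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

print(all_rows)
print(d11_rows)
print(b01_or_d11)
```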
3-5. JOIN Syntax 2 (JOIN Keyword) (CF123050)
When the JOIN keyword is used in the FROM clause, the join predicates must be specified in an ON clause. Row conditions (local predicates) are written in a WHERE clause, which must follow the ON clause.
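The JOIN-keyword form described above is shown in the following minimal sketch. It assumes SQLite (via Python's `sqlite3`) as a stand-in for DB2 UDB and uses hypothetical rows. The join predicate goes in the ON clause and the local predicate in the WHERE clause that follows it. The sketch checks that this returns the same rows as the equivalent query with both predicates in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; EMPNO, LASTNAME and DEPTNAME are assumed names.
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION'), ('B01', 'PLANNING'),
                                  ('D11', 'MANUFACTURING');
    INSERT INTO EMPLOYEE VALUES ('000010', 'ADAMS', 'A00'), ('000020', 'BAKER', 'B01'),
                                ('000060', 'CHEN', 'D11'), ('000150', 'DIAZ', 'D11');
""")

# JOIN keyword: join predicate in ON, local predicate in the WHERE clause after it.
with_join = conn.execute("""
    SELECT EMPLOYEE.EMPNO, EMPLOYEE.LASTNAME, DEPARTMENT.DEPTNAME
    FROM EMPLOYEE INNER JOIN DEPARTMENT
      ON EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
    WHERE DEPARTMENT.DEPTNO = 'D11'
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

# Equivalent query with the join predicate in the WHERE clause.
with_where = conn.execute("""
    SELECT EMPLOYEE.EMPNO, EMPLOYEE.LASTNAME, DEPARTMENT.DEPTNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
      AND DEPARTMENT.DEPTNO = 'D11'
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

print(with_join == with_where)  # True
```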
3-6. JOIN with Three Tables (CF123060)
Suppose we want the name of the department a project is assigned to and the last name of that department's manager. DB2 UDB can first read the PROJECT table to obtain the project's department number. Next, it can read the corresponding DEPARTMENT row to obtain the department name and the employee number of the responsible manager. Finally, it can read the corresponding EMPLOYEE row to obtain the manager's last name. The order of these steps is not fixed: the DB2 UDB optimizer determines the sequence.
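The three-table access path described above can be written as a single query, sketched here under stated assumptions. SQLite (via Python's `sqlite3`) stands in for DB2 UDB. The rows are hypothetical, as are the column names PROJNAME, DEPTNAME, MGRNO, EMPNO and LASTNAME. Three tables need two join predicates (n-1 = 2). As with DB2 UDB, the database's own optimizer, not the FROM order, decides which table is read first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; column names other than DEPTNO are assumptions.
conn.executescript("""
    CREATE TABLE PROJECT (PROJNO TEXT PRIMARY KEY, PROJNAME TEXT, DEPTNO TEXT);
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT, MGRNO TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
    INSERT INTO PROJECT VALUES ('P1', 'PAYROLL', 'A00'), ('P2', 'FORECAST', 'B01'),
                               ('P3', 'ASSEMBLY', 'D11'), ('P4', 'TOOLING', 'D11');
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION', '000010'),
                                  ('B01', 'PLANNING', '000020'),
                                  ('D11', 'MANUFACTURING', '000060');
    INSERT INTO EMPLOYEE VALUES ('000010', 'ADAMS', 'A00'), ('000020', 'BAKER', 'B01'),
                                ('000060', 'CHEN', 'D11'), ('000150', 'DIAZ', 'D11');
""")

# Three tables, so two join predicates (n - 1 = 2).
rows = conn.execute("""
    SELECT PROJECT.PROJNO, PROJECT.PROJNAME, DEPARTMENT.DEPTNAME, EMPLOYEE.LASTNAME
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE PROJECT.DEPTNO = DEPARTMENT.DEPTNO   -- project to its department
      AND DEPARTMENT.MGRNO = EMPLOYEE.EMPNO    -- department to its manager
    ORDER BY PROJECT.PROJNO
""").fetchall()

for row in rows:
    print(row)
```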
3-7. JOIN with Three Tables (Cont) (CF123070)
If multiple tables are specified but neither a join predicate nor any other search condition is given, the intermediate result table consists of all possible combinations of the rows of the specified tables (the Cartesian product). The number of rows in the result is the product of the row counts of all the specified tables. Each row of the result is a row of the first table concatenated with a row of the second table, concatenated with a row of the third table, and so on.
-- JP stands for Join Predicate
-- LP stands for Local Predicate
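The row-count rule above can be checked with a minimal sketch. It assumes SQLite (via Python's `sqlite3`) in place of DB2 UDB, with hypothetical tables of 4, 3 and 4 rows. Without predicates the result has 4 x 3 x 4 = 48 rows. The second query marks each predicate with the JP/LP labels used in the figure. Two join predicates link the three tables, and one local predicate restricts the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; column names other than DEPTNO are assumptions.
conn.executescript("""
    CREATE TABLE PROJECT (PROJNO TEXT PRIMARY KEY, PROJNAME TEXT, DEPTNO TEXT);
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT, MGRNO TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
    INSERT INTO PROJECT VALUES ('P1', 'PAYROLL', 'A00'), ('P2', 'FORECAST', 'B01'),
                               ('P3', 'ASSEMBLY', 'D11'), ('P4', 'TOOLING', 'D11');
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION', '000010'),
                                  ('B01', 'PLANNING', '000020'),
                                  ('D11', 'MANUFACTURING', '000060');
    INSERT INTO EMPLOYEE VALUES ('000010', 'ADAMS', 'A00'), ('000020', 'BAKER', 'B01'),
                                ('000060', 'CHEN', 'D11'), ('000150', 'DIAZ', 'D11');
""")

# Row count of each table, then the size of the predicate-free (Cartesian) result.
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("PROJECT", "DEPARTMENT", "EMPLOYEE")
}
cartesian = conn.execute(
    "SELECT COUNT(*) FROM PROJECT, DEPARTMENT, EMPLOYEE"
).fetchone()[0]
expected = counts["PROJECT"] * counts["DEPARTMENT"] * counts["EMPLOYEE"]

# Join predicates (JP) link the tables; the local predicate (LP) filters rows.
filtered = conn.execute("""
    SELECT PROJECT.PROJNO, EMPLOYEE.LASTNAME
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE PROJECT.DEPTNO = DEPARTMENT.DEPTNO   -- JP
      AND DEPARTMENT.MGRNO = EMPLOYEE.EMPNO    -- JP
      AND DEPARTMENT.DEPTNO = 'D11'            -- LP
    ORDER BY PROJECT.PROJNO
""").fetchall()

print(cartesian, expected)  # 48 48
print(filtered)
```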
3-8. Correlation Name (CF123080)
Correlation names are defined in the FROM clause of a query. In this example, the correlation names serve merely as short synonyms for the table names. A correlation name can also be used to qualify column names and so avoid column name ambiguity.
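The following minimal sketch shows correlation names used as synonyms, as described above. It assumes SQLite (via Python's `sqlite3`) standing in for DB2 UDB, hypothetical rows, and the arbitrarily chosen correlation names E and D. It compares a fully qualified query with the same query written using correlation names. Both return identical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables; EMPNO, LASTNAME and DEPTNAME are assumed names.
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, DEPTNAME TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT, WORKDEPT TEXT);
    INSERT INTO DEPARTMENT VALUES ('A00', 'ADMINISTRATION'), ('B01', 'PLANNING'),
                                  ('D11', 'MANUFACTURING');
    INSERT INTO EMPLOYEE VALUES ('000010', 'ADAMS', 'A00'), ('000020', 'BAKER', 'B01'),
                                ('000060', 'CHEN', 'D11'), ('000150', 'DIAZ', 'D11');
""")

# Columns qualified with the full table names.
long_form = conn.execute("""
    SELECT EMPLOYEE.EMPNO, EMPLOYEE.LASTNAME, DEPARTMENT.DEPTNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.WORKDEPT = DEPARTMENT.DEPTNO
    ORDER BY EMPLOYEE.EMPNO
""").fetchall()

# Same query with correlation names E and D defined in the FROM clause.
short_form = conn.execute("""
    SELECT E.EMPNO, E.LASTNAME, D.DEPTNAME
    FROM EMPLOYEE E, DEPARTMENT D
    WHERE E.WORKDEPT = D.DEPTNO
    ORDER BY E.EMPNO
""").fetchall()

print(long_form == short_form)  # True
```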
3-9. Joining a Table with Itself (CF123090)
To get the required data, the EMPLOYEE table must be accessed twice. First, for each employee, we read the employee's department number from EMPLOYEE. Next, from the DEPARTMENT table we find the employee number of that department's manager. Using the manager's employee number, we access EMPLOYEE a second time to retrieve the manager's date of birth. Finally, the two dates of birth are compared, and each employee who is older than his or her manager is added to the result table.
3-10. Joining a Table with Itself (Cont) (CF123100)
Table qualifiers (correlation names) are required because the same table is referenced twice in the FROM clause of the query. Without them, a column name could not be attributed to one reference or the other.
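The self-join described above can be sketched as follows. SQLite (via Python's `sqlite3`) stands in for DB2 UDB. The rows and the column names EMPNO, LASTNAME, MGRNO and BIRTHDATE are hypothetical. Birth dates are stored as ISO-8601 text ('YYYY-MM-DD'), an assumption made so that string comparison is chronological; DB2 UDB would normally use a DATE column. EMPLOYEE appears twice, as E (the employee) and M (the manager), so the correlation names are what tell the two references apart. "Older" means an earlier birth date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; BIRTHDATE is ISO-8601 text so '<' compares chronologically.
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPTNO TEXT PRIMARY KEY, MGRNO TEXT);
    CREATE TABLE EMPLOYEE (EMPNO TEXT PRIMARY KEY, LASTNAME TEXT,
                           WORKDEPT TEXT, BIRTHDATE TEXT);
    INSERT INTO DEPARTMENT VALUES ('A00', '000010'), ('B01', '000020'), ('D11', '000060');
    INSERT INTO EMPLOYEE VALUES
        ('000010', 'ADAMS', 'A00', '1950-05-01'),
        ('000020', 'BAKER', 'B01', '1960-03-15'),
        ('000060', 'CHEN',  'D11', '1965-07-20'),
        ('000110', 'FOX',   'A00', '1945-09-30'),
        ('000150', 'DIAZ',  'D11', '1958-11-02'),
        ('000160', 'EVANS', 'D11', '1970-01-10');
""")

# E = the employee, D = the employee's department, M = that department's manager.
older_than_manager = conn.execute("""
    SELECT E.EMPNO, E.LASTNAME, M.LASTNAME
    FROM EMPLOYEE E, DEPARTMENT D, EMPLOYEE M
    WHERE E.WORKDEPT = D.DEPTNO        -- employee to department
      AND D.MGRNO = M.EMPNO            -- department to manager (second EMPLOYEE)
      AND E.BIRTHDATE < M.BIRTHDATE    -- born earlier, i.e. older
    ORDER BY E.EMPNO
""").fetchall()

for row in older_than_manager:
    print(row)
```

Managers are joined to themselves as well. They drop out of the result because no one's birth date is earlier than their own.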
Checkpoint Exercise

Unit 3 Checkpoint

1. True or False: If you reference multiple tables in the FROM clause, you should use JOIN conditions to obtain the desired result.
2. Which of the following situations applies if you forget the JOIN conditions in a SELECT statement using multiple tables?
   a. You receive an error and the statement is not executed.
   b. The statement is executed and the result is the Cartesian product of the tables.
3. Why do we use correlation names in a SELECT?
3-11. Unit Summary (CF123110)