Useful Business Analytics SQL operators and more
Ajaykumar Gupte, IBM
Acknowledgements and Disclaimers Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. Copyright IBM Corporation 2015. All rights reserved. U.S. Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM, the IBM logo, ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. 
If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml Other company, product, or service names may be trademarks or service marks of others.
AGENDA
- Set Operators
  - Functionality, Basic Rules
  - Null Friendly
  - Intersect and Minus Usage
  - Execution Plans
  - Scenarios
- ANSI JOIN Query improvements
4/9/15
SET Operators
- UNION Operator - Combines the rows from two or more result sets into a single result set.
- INTERSECT Operator - Computes a result set that contains the common rows from two result sets.
- MINUS/EXCEPT Operator - Evaluates two result sets and returns all rows from the first set that are not also contained in the second set.
Set Operations Result Sets
MINUS and EXCEPT are synonyms.
Functionality of INTERSECT and MINUS
- Extension to the existing UNION / UNION ALL set operations
- Results are always distinct rows (duplicate rows are eliminated)
- The same rules as for UNION apply, for example:
  - Both query blocks must have exactly the same number of columns
  - Corresponding columns in the Projection clauses must have comparable data types
  - The Projection clause cannot contain BYTE or TEXT columns
Functionality of INTERSECT and MINUS
  - ORDER BY must come at the end of the query
  - Operators are evaluated from left to right unless they are grouped with parentheses
  - Existing restrictions for UNION also apply to these operators
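The left-to-right rule changes results when operators are mixed. The sketch below is an illustration only: it uses Python's bundled SQLite, not Informix, on the assumption that SQLite also groups compound SELECTs from left to right. SQLite does not take parenthesized operands here, so the grouping is expressed with a derived table; the column name c is made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for the database server in this sketch.
conn = sqlite3.connect(":memory:")

def fetch(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# No grouping: evaluated left to right, i.e. ({1} UNION {2}) INTERSECT {2}.
left_to_right = fetch("select 1 as c union select 2 intersect select 2")

# INTERSECT grouped first: {1} UNION ({2} INTERSECT {2}).
grouped = fetch(
    "select 1 as c union select c from (select 2 as c intersect select 2)"
)

print(left_to_right)  # [2]
print(grouped)        # [1, 2]
```

The same three query blocks return different rows depending on grouping, so parentheses are worth writing out whenever operators are mixed.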
NULL Friendly SET Operators
Both INTERSECT and MINUS are NULL friendly: when NULL is compared to NULL, the two values are considered equal.
Examples
    create table t1 (col1 int);
    insert into t1 values (1);
    insert into t1 values (2);
    insert into t1 values (2);
    insert into t1 values (2);
    insert into t1 values (3);
    insert into t1 values (4);
    insert into t1 values (4);
    insert into t1 values (NULL);
    insert into t1 values (NULL);
    insert into t1 values (NULL);

    create table t2 (col1 int);
    insert into t2 values (1);
    insert into t2 values (3);
    insert into t2 values (4);
    insert into t2 values (4);
    insert into t2 values (NULL);
Examples
    select col1 from t1 intersect select col1 from t2;

    col1
    1
    3
    4
    NULL

    4 row(s) retrieved.

    select col1 from t1 minus select col1 from t2;

    col1
    2

    1 row(s) retrieved.

NULL appears in both tables, so it is returned once by INTERSECT and removed by MINUS.
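To see what "NULL friendly" means in practice, the example can be contrasted with an ordinary equality join on the same data. This is a sketch under assumptions: it runs on Python's bundled SQLite rather than Informix, and it uses EXCEPT because SQLite does not accept the MINUS keyword. The table contents are the ones from the slide.

```python
import sqlite3

# Assumption: SQLite stands in for the database server in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (col1 int);
    create table t2 (col1 int);
    insert into t1 values (1),(2),(2),(2),(3),(4),(4),(NULL),(NULL),(NULL);
    insert into t2 values (1),(3),(4),(4),(NULL);
""")

def fetch(sql):
    """Run a query and return its single column as a set (None is NULL)."""
    return {row[0] for row in conn.execute(sql)}

intersect_rows = fetch("select col1 from t1 intersect select col1 from t2")
except_rows = fetch("select col1 from t1 except select col1 from t2")

# An equality join never matches NULL to NULL, so the NULL row is lost.
join_rows = fetch(
    "select distinct t1.col1 from t1 join t2 on t1.col1 = t2.col1"
)

print(intersect_rows == {1, 3, 4, None})  # True
print(except_rows == {2})                 # True
print(join_rows == {1, 3, 4})             # True
```

The set operator keeps the NULL row that the join drops, which is the difference to watch for when rewriting a join as INTERSECT or the other way around.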
Usage Inside VIEW definitions
    create view v1(c1,c2) as
      select * from tabp
      intersect
      select * from tabr;

    create view v55(c1,c2) as
      select * from tabp
      minus
      (select * from tabr minus select * from v1)
      union
      (select * from tabp minus select * from tabr);
Usage Inside the Derived Table
    select * from
      (select tab1.* from tab1 LEFT OUTER JOIN tab2
         ON tab1.intcol = tab2.intcol2
       intersect
       select tab2.* from tab3 FULL OUTER JOIN tab2
         ON tab2.charcol2 = tab3.charcol3);
Usage Inside the Subquery
    select c1,c2,c3,c4,c5 from mtab1
    where exists
      (select c1,c2,c3,c4,c5 from stab1
         group by c2,c3,c4,c5,c1
       intersect
       select c1,c2,c3,c4,c5 from stab2
         group by c2,c3,c4,c5,c1
         having count(*) < 3)
    and c1 = 1;
Usage Inside the Procedure
    create procedure p1_1() returning int;
      define ret_val int;
      define row_val int;
      let ret_val = 0;
      foreach select intcol into row_val from tab1
              intersect
              select intcol2 from tab2
        let ret_val = ret_val + 1;
      end foreach
      return ret_val;
    end procedure;
Usage Cross database and Cross server
    select intcol2, charcol2 from tab2
    minus
    (select intcol3, charcol3 from db2:tab3
     intersect
     select intcol, charcol from db3@serv3:tab1);
Set Operators Optimization
- INTERSECT: rows common to both arms
  - internally transformed into an EXISTS subquery with special NULL handling
- MINUS or EXCEPT: rows in the first arm that are not in the second arm
  - internally transformed into a NOT EXISTS subquery with special NULL handling
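The reason the rewrite needs special NULL handling can be shown with a small experiment. The sketch below is not the server's actual transformation, which the slides do not spell out; it is an assumption-laden illustration on Python's bundled SQLite, where the IS operator acts as a NULL-safe equality. Tables a and b and column c are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the database server in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (c int);
    create table b (c int);
    insert into a values (1), (2), (NULL);
    insert into b values (1), (3), (NULL);
""")

def fetch(sql):
    """Run a query and return its single column as a set (None is NULL)."""
    return {row[0] for row in conn.execute(sql)}

intersect = fetch("select c from a intersect select c from b")
minus = fetch("select c from a except select c from b")

# EXISTS / NOT EXISTS with a NULL-safe comparison (SQLite's IS operator).
exists_safe = fetch(
    "select distinct c from a where exists (select 1 from b where b.c is a.c)")
not_exists_safe = fetch(
    "select distinct c from a where not exists (select 1 from b where b.c is a.c)")

# The same rewrites with plain '=' mishandle the NULL row.
exists_plain = fetch(
    "select distinct c from a where exists (select 1 from b where b.c = a.c)")
not_exists_plain = fetch(
    "select distinct c from a where not exists (select 1 from b where b.c = a.c)")

print(exists_safe == intersect, not_exists_safe == minus)  # True True
print(exists_plain, not_exists_plain == {2, None})         # {1} True
```

With plain equality the EXISTS form loses the NULL row and the NOT EXISTS form wrongly keeps it, so a correct rewrite has to treat NULL as equal to NULL.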
Nested Loop Semi Join
- Executes the subquery as a variation of a nested-loop join
- Semi Join: for each row in the outer table, the server reads the inner table only until it finds a match, so the inner table contributes at most one row
- Anti Semi Join: returns the outer-table rows that have no matching row in the inner table
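The two join variants can be sketched in a few lines of Python. This is a toy model under assumptions, not the server's implementation: tables are plain lists, None plays the role of NULL, and Python's == (where None == None is true) stands in for the NULL-friendly comparison. The function names are made up for the illustration.

```python
def semi_join(outer, inner):
    """Yield each outer row that has at least one match in inner."""
    for o in outer:
        for i in inner:
            if o == i:      # None == None is True, like NULL-friendly matching
                yield o
                break       # first match is enough: stop scanning inner

def anti_semi_join(outer, inner):
    """Yield each outer row that has no match anywhere in inner."""
    for o in outer:
        if not any(o == i for i in inner):
            yield o

tab1 = [1, 2, None]
tab2 = [1, None, 3]

print(list(semi_join(tab1, tab2)))       # [1, None]
print(list(anti_semi_join(tab1, tab2)))  # [2]
```

The sketch does not remove duplicates from the outer rows; INTERSECT and MINUS return distinct rows, so a real plan needs that step as well.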
Set Operations in explain
QUERY:
------
select intcol from tab1 intersect select intcol2 from tab2

Estimated Cost: 4
Estimated # of Rows Returned: 1

  1) informix.tab1: SEQUENTIAL SCAN

  2) informix.tab2: SEQUENTIAL SCAN  (First Row)

        Filters: informix.tab1.intcol == informix.tab2.intcol2
NESTED LOOP JOIN (Semi Join)
Set Operations in explain
QUERY:
------
select intcol, charcol from tab1
intersect
select intcol2, charcol2 from tab2
minus
select intcol3, charcol3 from tab3

Estimated Cost: 6
Estimated # of Rows Returned: 1

  1) informix.tab1: SEQUENTIAL SCAN

  2) informix.tab2: SEQUENTIAL SCAN  (First Row)

        Filters: (informix.tab1.intcol == informix.tab2.intcol2 AND
                  informix.tab1.charcol == informix.tab2.charcol2 )
NESTED LOOP JOIN (Semi Join)

  3) informix.tab3: SEQUENTIAL SCAN  (First Row)

        Filters: (informix.tab1.charcol == informix.tab3.charcol3 AND
                  informix.tab1.intcol == informix.tab3.intcol3 )
NESTED LOOP JOIN (Anti Semi Join)
Scenarios
This INTERSECT query example finds suppliers who have placed an order.
    select supplier_id from suppliers
    INTERSECT
    select supplier_id from orders;

This MINUS query example finds suppliers who have not placed any order.
    select supplier_id from suppliers
    MINUS
    select supplier_id from orders;
ANSI Join improvements
- Join directives supported in ANSI queries
  - ORDERED directive not allowed
- Hash join support
  - Support for bushy tree and right deep tree execution
  - Optimizer changes to allow comparison between nested-loop and hash joins
Hash Join Support in ANSI JOIN
- Without hash join support, the only way to execute joins on large tables that have no index is to build a dynamic index and then run a nested-loop join.
- A hash join can be faster for large joins.
- Optimizer costing is adjusted for situations where the build or probe side of a hash join is composite.
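Why a hash join helps on large unindexed inputs is easiest to see side by side with a nested loop. The following is a generic textbook sketch in Python, not the server's implementation: rows are tuples whose first field is the join key, the function names are made up, and only an inner join on non-NULL keys is modeled.

```python
from collections import defaultdict

def nested_loop_join(left, right):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]

def hash_join(build, probe):
    """Build a hash table on one input, then probe it once per row of the other."""
    table = defaultdict(list)
    for b in build:                      # build phase: one pass over build side
        table[b[0]].append(b)
    return [(b, p)                       # probe phase: one lookup per probe row
            for p in probe
            for b in table.get(p[0], [])]

t1 = [(1, "a"), (2, "b"), (2, "c")]
t2 = [(2, "x"), (3, "y")]

print(sorted(nested_loop_join(t1, t2)) == sorted(hash_join(t1, t2)))  # True
```

Both functions return the same pairs, but the hash version reads each input once instead of rescanning the right input for every left row. Either input of hash_join can itself be the output of another join, which is the composite build/probe case the costing change refers to.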
Hash Join for ANSI JOIN in sqexplain
QUERY:
------
select * from (t1 left join t2 on t1.a = t2.a )
  left join (t3 inner join t4 on t3.a = t4.a) on t4.a = t1.a

  1) informix.t1: SEQUENTIAL SCAN

  2) informix.t2: INDEX PATH

    (1) Index Name: informix.ind2
        Index Keys: a   (Serial, fragments: ALL)
        Lower Index Filter: informix.t1.a = informix.t2.a

ON-Filters:informix.t1.a = informix.t2.a
NESTED LOOP JOIN(LEFT OUTER JOIN)

  3) informix.t3: SEQUENTIAL SCAN

  4) informix.t4: INDEX PATH

    (1) Index Name: informix.ind4
        Index Keys: a   (Serial, fragments: ALL)
        Lower Index Filter: informix.t3.a = informix.t4.a

ON-Filters:informix.t3.a = informix.t4.a
NESTED LOOP JOIN

ON-Filters:informix.t4.a = informix.t1.a
DYNAMIC HASH JOIN (LEFT OUTER JOIN)
    Dynamic Hash Filters: informix.t4.a = informix.t1.a
Questions?