Efficient Interval Management in Microsoft SQL Server




Itzik Ben-Gan, SolidQ
Last modified: August, 0

Agenda
- Inefficiencies of classic interval handling
- Solution: RI-tree by Kriegel, Pötke and Seidl of University of Munich
- Optimized solution: Static RI-tree by Laurent Martin
- Potential for integration in SQL Server
- References

Intervals
- Examples of temporal intervals: sessions, contracts, appointments, shifts
- Allen's interval algebra: base relations
  http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html
- Common relation: X intersects Y
  Example: return contracts that were active during an input period
- Classic interval representation:
  Table: Intervals(id, lower, upper)
  Indexes:
    idx_lower ON Intervals(lower) INCLUDE(upper)
    idx_upper ON Intervals(upper) INCLUDE(lower)

Inefficiencies of Classic Interval Handling
- Return intervals [lower, upper] that intersect with [@l, @u]
- Classic predicate: WHERE lower <= @u AND upper >= @l
- Problem: two range predicates
  - An Index Seek can use only one range predicate as a Seek Predicate
  - The other range predicate is evaluated as a [residual] Predicate
  - Sensitive to parameter sniffing
- logical reads: , CPU time: ms
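The classic representation and predicate can be tried with a minimal sketch such as the one below. The sample rows, the constraint names and the parameter values are assumptions made for illustration only; they do not come from the deck.

```sql
-- Hypothetical demo table following the classic (id, lower, upper) layout.
CREATE TABLE dbo.Intervals
(
  id    INT NOT NULL CONSTRAINT PK_Intervals PRIMARY KEY,
  lower INT NOT NULL,
  upper INT NOT NULL,
  CONSTRAINT CHK_Intervals_upper_gteq_lower CHECK(upper >= lower)
);

-- The two classic indexes: each one supports only one of the two range predicates.
CREATE INDEX idx_lower ON dbo.Intervals(lower) INCLUDE(upper);
CREATE INDEX idx_upper ON dbo.Intervals(upper) INCLUDE(lower);

-- Made-up sample intervals.
INSERT INTO dbo.Intervals(id, lower, upper)
  VALUES (1, 10, 20), (2, 15, 40), (3, 50, 60);

-- Input interval [@l, @u].
DECLARE @l AS INT = 18, @u AS INT = 45;

-- Classic intersection predicate: two range predicates on different columns.
SELECT id, lower, upper
FROM dbo.Intervals
WHERE lower <= @u
  AND upper >= @l;
-- Rows 1 and 2 qualify; row 3 starts after @u.
```

Whichever index the optimizer picks, only one of the two conditions can drive the seek, so the scanned range grows with the table no matter how few rows qualify.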

Work Used as Foundation for Solution
- Relational Interval tree (RI-tree) model
  - Paper [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl of University of Munich)
    http://www.dbs.ifi.lmu.de/publikationen/papers/vldb000.pdf
- Static RI-tree and SQL Server implementation
  - Article [SRI]: A Static Relational Interval Tree (Laurent Martin)
    http://www.solidq.com/sqj/pages/0-september-issue/a-static-relational-interval-tree.aspx
  - Article [SRI]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
    http://www.solidq.com/sqj/pages/relational/advanced-interval-queries-with-the-static-relational-interval-Tree.aspx
  - Article [SRI]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
    http://www.solidq.com/sqj/pages/relational/using-the-static-relational-interval-tree-with-timeintervals.aspx
- [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
  http://sqlmag.com/t-sql/sql-server-interval-queries
* The above sources are referred to as [RI], [SRI] (the three articles by Laurent Martin) and [RIBG]

Relational Interval Tree (RI-tree) [RI]
- Virtual backbone binary tree; height h = number of bits, range: 1 to 2^h - 1, root: 2^(h-1)
- Fork node: first node within the interval when descending the tree with bisection
- Table: Intervals(id, node, lower, upper)
- Indexes: idx_lower ON Intervals(node, lower), idx_upper ON Intervals(node, upper)

Fork Node

  CREATE FUNCTION dbo.forknode(@lower AS INT, @upper AS INT) RETURNS INT
    WITH SCHEMABINDING
  AS
  BEGIN
    DECLARE @node AS INT = 1073741824; -- root = 2^(h-1) for height h = 31
    DECLARE @step AS INT = @node / 2;

    -- descend by bisection until the node falls within [@lower, @upper]
    WHILE @step >= 1
    BEGIN
      IF @upper < @node
        SET @node -= @step;
      ELSE IF @lower > @node
        SET @node += @step;
      ELSE
        BREAK;
      SET @step /= 2;
    END;

    RETURN @node;
  END;

- Cons: iterative T-SQL; slows down insertions

Optimized Fork Node [SRI]
- Compute the fork node from lower and upper: it is their lowest common ancestor
- Observations:
  1. For any node N, let L = the leading bits before the trailing 0s
  2. For any X starting with L, the nodes in X's left subtree start with (L - 1), and the nodes in its right subtree start with L
  3. For a nonleaf node X, X and X - 1 have the same ancestors, so X can be replaced with X - 1
  4. A leaf node Z and Z - 1 differ only in the last bit

Optimized Fork Node
- Fork node = matching prefix of (lower - 1) and upper, followed by 1 and trailing 0s
  Let A = (lower - 1) ^ upper               -- mark the bits that differ
  Let B = POWER(2, FLOOR(LOG(A, 2)))        -- first differing bit, set to 1 as in upper
  Let C = upper % B                         -- trailing bits of upper after the set bit in B
  Let D = upper - C                         -- trailing bits set to 0s: this is the fork node
- Formula: upper - upper % POWER(2, CAST(LOG((lower - 1) ^ upper, 2) AS INT))
- Not clear? Details in [RIBG]

Optimized Fork Node
- Implemented as a computed column

  CREATE TABLE dbo.intervalsrit
  (
    id    INT NOT NULL,
    node  AS upper - upper % POWER(2, FLOOR(LOG((lower - 1) ^ upper, 2))) PERSISTED NOT NULL,
    lower INT NOT NULL,
    upper INT NOT NULL,
    CONSTRAINT PK_IntervalsRIT PRIMARY KEY(id),
    CONSTRAINT CHK_IntervalsRIT_upper_gteq_lower CHECK(upper >= lower)
  );

  CREATE INDEX idx_lower ON dbo.intervalsrit(node, lower);
  CREATE INDEX idx_upper ON dbo.intervalsrit(node, upper);

* The above are basic index definitions to support intersection queries; additional columns may be needed (filters, included columns)
- Pros:
  - Much faster inserts
  - Supports seamless multi-row modifications
- Cons: Complex (even more so with date and time types, which require mapping to integers)
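A worked instance of steps A through D may make the formula easier to follow. The sketch below uses a hypothetical interval [11, 13], chosen only for illustration, and assumes SQL Server 2012 or later for the two-argument form of LOG.

```sql
-- Hypothetical input interval [11, 13]; binary 1011 and 1101.
DECLARE @lower AS INT = 11, @upper AS INT = 13;

-- A: XOR marks the bits where (lower - 1) and upper differ.
--    10 ^ 13 = 1010 ^ 1101 = 0111 = 7
DECLARE @A AS INT = (@lower - 1) ^ @upper;

-- B: the highest differing bit. FLOOR(LOG(7, 2)) = 2, so B = 4.
DECLARE @B AS INT = POWER(2, FLOOR(LOG(@A, 2)));

-- C: the bits of upper below that position. 13 % 4 = 1
DECLARE @C AS INT = @upper % @B;

-- D: upper with those trailing bits cleared. 13 - 1 = 12 (binary 1100)
DECLARE @D AS INT = @upper - @C;

SELECT @A AS A, @B AS B, @C AS C, @D AS forknode;
-- A = 7, B = 4, C = 1, forknode = 12
```

Bisection gives the same answer: descending from the root through the powers of two reaches 8, then 12, which is the first node that lies within [11, 13].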

Fork node for DATETIME data type [SRI]

  node AS
    DATEADD(ns, ((((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0
      - (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + )
      % POWER(CAST( AS BIGINT), FLOOR(LOG(
          (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), lower) AS BIGINT) * 0 + DATEPART(mi, lower)) * 0 + DATEPART(s, lower)) * 000000 + DATEPART(ns, lower) / 0 )
          ^ (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + ))
        / LOG()))) % 000000) * 0,
    DATEADD(s, ((((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0
      - (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + )
      % POWER(CAST( AS BIGINT), FLOOR(LOG(
          (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), lower) AS BIGINT) * 0 + DATEPART(mi, lower)) * 0 + DATEPART(s, lower)) * 000000 + DATEPART(ns, lower) / 0 )
          ^ (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + ))
        / LOG()))) / 000000) % 00,
    DATEADD(d, (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0
      - (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + )
      % POWER(CAST( AS BIGINT), FLOOR(LOG(
          (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), lower) AS BIGINT) * 0 + DATEPART(mi, lower)) * 0 + DATEPART(s, lower)) * 000000 + DATEPART(ns, lower) / 0 )
          ^ (((CAST(DATEDIFF(hh, CONVERT(DATETIME, '000', ), upper) AS BIGINT) * 0 + DATEPART(ns, upper) / 0 + ))
        / LOG())) ) / CAST(000000000 AS BIGINT),
    CONVERT(DATETIME, '000', ))))

- Could be much shorter if BIGDATEADD and BIGDATEDIFF were implemented:
  https://connect.microsoft.com/sqlserver/feedback/details/

Querying [RI]
- Left nodes
- Middle nodes
- Right nodes
- All together
- leftnodes and rightnodes functions

Querying, Left Nodes
- Left nodes: W = {w on the path leading to lower, and w < lower}

  CREATE FUNCTION dbo.leftnodes(@lower AS INT, @upper AS INT)
    RETURNS @T TABLE
    (
      node INT NOT NULL PRIMARY KEY
    )
  AS
  BEGIN
    DECLARE @node AS INT = 1073741824; -- root = 2^(h-1) for height h = 31
    DECLARE @step AS INT = @node / 2;

    -- descend from root node to lower
    WHILE @step >= 1
    BEGIN
      -- right node
      IF @lower < @node
        SET @node -= @step;
      -- left node
      ELSE IF @lower > @node
      BEGIN
        INSERT INTO @T(node) VALUES(@node);
        SET @node += @step;
      END
      -- lower
      ELSE
        BREAK;
      SET @step /= 2;
    END;

    RETURN;
  END;

Querying, Left Nodes
- Intervals [lower, upper] that intersect with input [@l, @u]: all intervals registered at a left node w, where upper >= @l

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.leftnodes(@l, @u) AS L
      ON I.node = L.node
      AND I.upper >= @l

Querying, Right Nodes
- Symmetric to left nodes (with the rightnodes function)
- Intervals that intersect with input [@l, @u]: all intervals registered at a right node w, where lower <= @u

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.rightnodes(@l, @u) AS R
      ON I.node = R.node
      AND I.lower <= @u
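The deck describes rightnodes only as symmetric to leftnodes. A sketch of what that symmetric function could look like follows; it is an assumption modeled on the leftnodes logic, not code taken from the slides. The root value 1073741824 (2^30) assumes a 31-level tree over positive INT values.

```sql
-- Sketch only: collects the nodes on the path to @upper that are greater than @upper.
CREATE FUNCTION dbo.rightnodes(@lower AS INT, @upper AS INT)
  RETURNS @T TABLE
  (
    node INT NOT NULL PRIMARY KEY
  )
AS
BEGIN
  DECLARE @node AS INT = 1073741824; -- assumed root, 2^30
  DECLARE @step AS INT = @node / 2;

  -- descend from root node to upper
  WHILE @step >= 1
  BEGIN
    -- node is to the left of upper: not a right node, move right
    IF @upper > @node
      SET @node += @step;
    -- node is to the right of upper: record it, move left
    ELSE IF @upper < @node
    BEGIN
      INSERT INTO @T(node) VALUES(@node);
      SET @node -= @step;
    END
    -- reached upper itself
    ELSE
      BREAK;
    SET @step /= 2;
  END;

  RETURN;
END;
GO

-- Hypothetical call: right nodes for an input interval ending at 13.
SELECT node FROM dbo.rightnodes(11, 13) ORDER BY node;
-- The smallest values returned are 14 and 16, followed by the larger powers of two up to the root.
```

The @lower parameter is unused here; it is kept only so that the signature matches the way the deck calls the function.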

Querying, Middle Nodes
- Intervals that intersect with input [@l, @u]: all intervals registered at a node w, where w is between @l and @u

  SELECT id
  FROM dbo.intervalsrit
  WHERE node BETWEEN @l AND @u

Querying, All Together

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.leftnodes(@l, @u) AS L
      ON I.node = L.node
      AND I.upper >= @l

  UNION ALL

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.rightnodes(@l, @u) AS R
      ON I.node = R.node
      AND I.lower <= @u

  UNION ALL

  SELECT id
  FROM dbo.intervalsrit
  WHERE node BETWEEN @l AND @u;

- logical reads: (IntervalsRIT) + (@T - leftnodes) + (@T - rightnodes), CPU time: ms

leftnodes and rightnodes Functions
- Cons:
  - Use iterative T-SQL and a table variable
  - In a join, called per row
- Can be implemented more efficiently

Optimized Ancestors Function [RIBG]
- Ancestors [SRI]: each step clears the rightmost set bit and sets the bit to its left to 1
- Not clear? Details in [RIBG]
- Identify the rightmost set bit of n:
  - -n keeps the bits from the right up to the first set bit and reverses the rest of the bits
  - n & -n = n's rightmost set bit
- BitMasks: lookup table with one pair of masks (b1, b3) per bit position n
- Left nodes: ancestors where node < @node
    SELECT node FROM dbo.ancestors(@node) AS A WHERE node < @node;
- Right nodes: ancestors where node > @node
    SELECT node FROM dbo.ancestors(@node) AS A WHERE node > @node;

  CREATE FUNCTION dbo.ancestors(@node AS INT) RETURNS TABLE
  AS
  RETURN
    SELECT @node & b1 | b3 AS node  -- compute ancestor
    FROM dbo.bitmasks
    WHERE b3 > @node & -@node;      -- b3 > rightmost set bit
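The contents of the BitMasks table did not survive in this transcript. One way to populate such a table is sketched below. The column names b1 and b3, their definitions (-2^n and 2^n) and the range n = 1 to 30 are assumptions that fit the ancestors expression shown above; the authoritative version is in [RIBG].

```sql
-- Assumed layout: for each bit position n, b1 = -2^n clears the n lowest bits
-- when ANDed with a node, and b3 = 2^n sets bit n when ORed in.
CREATE TABLE dbo.bitmasks
(
  b1 INT NOT NULL,
  b3 INT NOT NULL
);

INSERT INTO dbo.bitmasks(b1, b3)
  SELECT -POWER(2, n), POWER(2, n)
  FROM (VALUES ( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),(10),
               (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),
               (21),(22),(23),(24),(25),(26),(27),(28),(29),(30)) AS Nums(n);

-- Same expression as in the ancestors function, inlined for a hypothetical node 13 (binary 1101).
DECLARE @node AS INT = 13;

SELECT (@node & b1) | b3 AS ancestor
FROM dbo.bitmasks
WHERE b3 > (@node & -@node)   -- only bit positions above the rightmost set bit
ORDER BY ancestor;
-- Smallest values returned: 8, 12, 14, 16, then the larger powers of two up to 1073741824.
```

For node 13 the left nodes are therefore 8 and 12, and the right nodes are 14, 16 and every larger power of two. This matches a bisection descent from the root, which passes through the powers of two down to 16, then 8, 12 and 14 before reaching 13.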

Optimized Ancestors Function

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.ancestors(@l) AS L
      ON L.node < @l
      AND I.node = L.node
      AND I.upper >= @l

  UNION ALL

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.ancestors(@u) AS R
      ON R.node > @u
      AND I.node = R.node
      AND I.lower <= @u

  UNION ALL

  SELECT id
  FROM dbo.intervalsrit
  WHERE node BETWEEN @l AND @u;

- logical reads: (IntervalsRIT) + (BitMasks), CPU time: 0 ms

Further Optimization [SRI]

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.ancestors(@l) AS L
      ON L.node < @l
      AND L.node >= (SELECT MIN(node) FROM dbo.intervalsrit)
      AND I.node = L.node
      AND I.upper >= @l

  UNION ALL

  SELECT I.id
  FROM dbo.intervalsrit AS I
    JOIN dbo.ancestors(@u) AS R
      ON R.node > @u
      AND R.node <= (SELECT MAX(node) FROM dbo.intervalsrit)
      AND I.node = R.node
      AND I.lower <= @u

  UNION ALL

  SELECT id
  FROM dbo.intervalsrit
  WHERE node BETWEEN @l AND @u;

- Filters out ancestors outside of the range covered by the table
- Eliminates unnecessary index seeks
- logical reads: (IntervalsRIT) + (BitMasks), CPU time: 0 ms

Optimized Ancestors Function
- Pros:
  - More efficient than iterative T-SQL functions
  - Doesn't require a table variable
- Cons: A bit complex

Potential for Integration in SQL Server [RIBG]
- Model is complex; the engine could add support for the model and its optimizations
- Indexing:

  CREATE INDEX myindex
    ON dbo.intervals[(fcol1, fcol2, ...)]  -- leading equality-based filters
    INTERVAL(lower, upper)                 -- interval columns
    [INCLUDE(icol1, icol2, ...)]           -- included columns
    [WITH (INTERSECTS_ONLY = ON)];         -- determines keylist

- Internally compute the fork node and create two B-tree indexes:
    key([fcol1, fcol2, ...,] node, lower[, upper]) [include(icol1, icol2, ...)]
    key([fcol1, fcol2, ...,] node, upper[, lower]) [include(icol1, icol2, ...)]
- Querying:
  - Optimizer support for detecting interval queries with classic predicates
  - Adding declarative SQL with RI-tree engine support
- More efficient native functions (for advanced users who wish to roll their own): forknode, leftnodes, rightnodes, ancestors
- Support integer, as well as date and time types

References
- [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl of University of Munich)
  http://www.dbs.ifi.lmu.de/publikationen/papers/vldb000.pdf
- [SRI]: A Static Relational Interval Tree (Laurent Martin)
  http://www.solidq.com/sqj/pages/0-september-issue/a-static-relational-interval-tree.aspx
- [SRI]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
  https://www.solidq.com/sqj/pages/relational/advanced-interval-queries-with-the-static-Relational-Interval-Tree.aspx
- [SRI]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
  http://www.solidq.com/sqj/pages/relational/using-the-static-relational-interval-tree-with-timeintervals.aspx
- [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
  http://sqlmag.com/t-sql/sql-server-interval-queries
- Feature Enhancement Request for Microsoft Connect item:
  https://connect.microsoft.com/sqlserver/feedback/details/0