Efficient Interval Management in Microsoft SQL Server

Itzik Ben-Gan, SolidQ
Last modified: August, 0

Agenda
- Inefficiencies of classic interval handling
- Solution: RI-tree by Kriegel, Pötke and Seidl of University of Munich
- Optimized solution: Static RI-tree by Laurent Martin
- Potential for integration in SQL Server
- References

Intervals
- Examples for temporal intervals: sessions, contracts, appointments, shifts
- Allen's interval algebra: base relations
  http://www.ics.uci.edu/~alspaugh/cls/shr/allen.html
- Common relation: X intersects Y
  Example: return contracts that were active during an input period
- Classic interval representation:
    Table: Intervals(id, lower, upper)
    Indexes:
      idx_lower ON Intervals(lower) INCLUDE(upper)
      idx_upper ON Intervals(upper) INCLUDE(lower)

Inefficiencies of Classic Interval Handling
- Return intervals [lower, upper] that intersect with [@l, @u]
- Classic predicate: WHERE lower <= @u AND upper >= @l
- Problem: two range predicates
  - An Index Seek can use only one range predicate as a Seek Predicate
  - The other range predicate is evaluated as a residual Predicate
  - Sensitive to parameter sniffing
- Cost is measured in logical reads and CPU time (ms)
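The two-range-predicate problem can be illustrated outside the database. The following is a small Python model, not T-SQL; the row data and the helper name classic_intersect are made up for illustration. It imitates a seek on idx_lower: the seek can stop at lower <= @u, but every row in that range still has to be tested against upper >= @l.

```python
from bisect import bisect_right

def classic_intersect(rows, l, u):
    """rows: list of (id, lower, upper) sorted by lower, like idx_lower.
    Returns (matching ids, number of index rows touched)."""
    lowers = [r[1] for r in rows]
    end = bisect_right(lowers, u)                    # seek predicate: lower <= u
    touched = rows[:end]
    matches = [r[0] for r in touched if r[2] >= l]   # residual predicate: upper >= l
    return matches, len(touched)

# Hypothetical data: 1000 short intervals [i, i + 1]
rows = [(i, i, i + 1) for i in range(1, 1001)]
matches, touched = classic_intersect(rows, 990, 995)
print(len(matches), touched)  # 7 995
```

In this toy data set the query touches 995 index rows to return 7, which is the kind of imbalance the RI-tree model is meant to avoid.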

Work used as Foundation for Solution
- Relational Interval tree (RI-tree) model
  - Paper [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl of University of Munich)
    http://www.dbs.ifi.lmu.de/publikationen/papers/vldb000.pdf
- Static RI-tree and SQL Server implementation
  - Article [SRI]: A Static Relational Interval Tree (Laurent Martin)
    http://www.solidq.com/sqj/pages/0-september-issue/a-static-relational-interval-tree.aspx
  - Article [SRI]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
    http://www.solidq.com/sqj/pages/relational/advanced-interval-queries-with-the-static-relational-interval-Tree.aspx
  - Article [SRI]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
    http://www.solidq.com/sqj/pages/relational/using-the-static-relational-interval-tree-with-timeintervals.aspx
  - [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
    http://sqlmag.com/t-sql/sql-server-interval-queries
* The sources above are referred to as [RI], [SRI] (the three Laurent Martin articles) and [RIBG]

Relational Interval tree (RI-tree) [RI]
- Virtual backbone binary tree: height h = number of bits, range: 1 to 2^h - 1, root: 2^(h-1)
- Fork node: first node within the interval when descending the tree with bisection
- Table: Intervals(id, node, lower, upper)
- Indexes: idx_lower ON Intervals(node, lower), idx_upper ON Intervals(node, upper)
[Figure: virtual backbone binary tree with node values shown in binary]

Fork Node

    CREATE FUNCTION dbo.forknode(@lower AS INT, @upper AS INT) RETURNS INT
      WITH SCHEMABINDING
    AS
    BEGIN
      DECLARE @node AS INT = 1073741824; -- root = 2^(h-1), height h = 31
      DECLARE @step AS INT = @node / 2;

      -- descend from the root, halving the step at each level
      WHILE @step >= 1
      BEGIN
        IF @upper < @node
          SET @node -= @step;      -- interval is entirely to the left
        ELSE IF @lower > @node
          SET @node += @step;      -- interval is entirely to the right
        ELSE
          BREAK;                   -- node is within the interval: fork node found
        SET @step /= 2;
      END;

      RETURN @node;
    END;

- Cons: iterative T-SQL; slows down insertions

Optimized Fork Node [SRI]
- Compute fork node from lower and upper: lowest common ancestor
- Observations:
  1. For any node N, L = leading bits before trailing 0s
  2. For any node X starting with L, the nodes in X's left subtree start with (L - 1), and the nodes in its right subtree start with L
  3. For a nonleaf node X, X and X - 1 have the same ancestors, so X can be replaced with X - 1
  4. A leaf node Z and Z - 1 differ only in the last bit: 1 for Z and 0 for Z - 1
- Hence the fork node can be derived from the bits of (lower - 1) and upper
[Figure: backbone binary tree annotated with left-node and right-node examples in binary]
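The bisection descent used by dbo.forknode can be traced by hand on a small tree. Below is a Python model of the same loop, offered as a sketch rather than as the deck's implementation; the height parameter is an addition for illustration, and its default of 31 assumes the positive INT range.

```python
def fork_node_iterative(lower, upper, height=31):
    """Descend from the root by bisection until the node falls inside [lower, upper]."""
    node = 1 << (height - 1)   # root = 2^(height - 1)
    step = node // 2
    while step >= 1:
        if upper < node:       # interval lies entirely to the left
            node -= step
        elif lower > node:     # interval lies entirely to the right
            node += step
        else:                  # lower <= node <= upper: fork node found
            break
        step //= 2
    return node

# Toy tree of height 5 (nodes 1..31): 16 -> 8 -> 12
print(fork_node_iterative(11, 13, height=5))  # 12
```

The number of iterations is bounded by the tree height, but in T-SQL the loop still runs once per inserted row, which is the cost the slide refers to.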

Optimized Fork Node
- Fork node = matching prefix of (lower - 1) and upper, followed by 1 and trailing 0s
- Let A = (lower - 1) ^ upper             -- mark different bits
- Let B = POWER(2, FLOOR(LOG(A, 2)))      -- first different bit set to 1, like in upper
- Let C = upper % B                       -- keep trailing bits from upper after set bit in B
- Let D = upper - C                       -- set trailing bits to 0s; voilà, fork node
- Formula: upper - upper % POWER(2, CAST(LOG((lower - 1) ^ upper, 2) AS INT))
- Not clear? Details in [RIBG]

Optimized Fork Node
- Implemented as a computed column

    CREATE TABLE dbo.intervalsrit
    (
      id    INT NOT NULL,
      node  AS upper - upper % POWER(2, FLOOR(LOG((lower - 1) ^ upper, 2)))
              PERSISTED NOT NULL,
      lower INT NOT NULL,
      upper INT NOT NULL,
      CONSTRAINT PK_IntervalsRIT PRIMARY KEY(id),
      CONSTRAINT CHK_IntervalsRIT_upper_gteq_lower CHECK(upper >= lower)
    );

    CREATE INDEX idx_lower ON dbo.intervalsrit(node, lower);
    CREATE INDEX idx_upper ON dbo.intervalsrit(node, upper);

* The above are basic index definitions to support intersection queries; additional columns may be needed (filters, included columns)
- Pros:
  - Much faster inserts
  - Supports seamless multi-row modifications
- Cons:
  - Complex (even more so with date and time, which requires mapping to integers)
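The A, B, C, D steps can be checked against the bisection descent on a small tree. The sketch below is a Python model under two assumptions: a toy tree of height 5 (nodes 1..31), and int.bit_length() standing in for FLOOR(LOG(A, 2)) so that the check stays in integer arithmetic. The function names are illustrative only.

```python
def fork_node_formula(lower, upper):
    a = (lower - 1) ^ upper          # A: mark the bits that differ
    b = 1 << (a.bit_length() - 1)    # B: highest differing bit, i.e. 2^floor(log2(A))
    c = upper % b                    # C: trailing bits of upper below that bit
    return upper - c                 # D: clear the trailing bits

def fork_node_iterative(lower, upper, height):
    node = 1 << (height - 1)
    step = node // 2
    while step >= 1:
        if upper < node:
            node -= step
        elif lower > node:
            node += step
        else:
            break
        step //= 2
    return node

# Worked example: (11 - 1) ^ 13 = 0b01010 ^ 0b01101 = 0b00111, B = 4, C = 1, D = 12
print(fork_node_formula(11, 13))  # 12

# Compare both methods for every interval in the toy tree
HEIGHT = 5
mismatches = [(lo, up)
              for lo in range(1, 2 ** HEIGHT)
              for up in range(lo, 2 ** HEIGHT)
              if fork_node_formula(lo, up) != fork_node_iterative(lo, up, HEIGHT)]
print(len(mismatches))  # 0
```

Because upper >= lower, (lower - 1) ^ upper is never 0, so the logarithm in the T-SQL formula is always defined.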

Fork node for DATETIME data type [SRI]
- The node column is a computed column that applies the same fork node formula to DATETIME values
- lower and upper are each mapped to a BIGINT by combining DATEDIFF(hh, <base date>, value) with DATEPART(mi, value), DATEPART(s, value) and DATEPART(ns, value)
- The bitwise formula (XOR, LOG, POWER with a BIGINT base, modulo) is evaluated on the mapped BIGINT values
- The result is mapped back to DATETIME with nested DATEADD calls (ns, s and d parts) applied to the base date
- The full expression spans four slides
- Could be much shorter if BIGDATEADD and BIGDATEDIFF were implemented:
  https://connect.microsoft.com/sqlserver/feedback/details/
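The idea behind the DATETIME variant is to map each value to an integer, compute the fork node there, and map the result back. The Python sketch below shows that round trip under stated assumptions: the base date of 1970-01-01 and the microsecond resolution are illustrative choices, not values taken from the slides, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

BASE = datetime(1970, 1, 1)  # assumed base date, for illustration only

def to_ticks(dt):
    """Map a datetime to whole microseconds since BASE."""
    delta = dt - BASE
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds

def from_ticks(ticks):
    """Map microseconds since BASE back to a datetime."""
    return BASE + timedelta(microseconds=ticks)

def fork_node_datetime(lower, upper):
    lo, up = to_ticks(lower), to_ticks(upper)   # both must be >= 1
    a = (lo - 1) ^ up
    b = 1 << (a.bit_length() - 1)
    return from_ticks(up - up % b)

lower = datetime(2013, 8, 1, 9, 30)
upper = datetime(2013, 8, 31, 17, 0)
node = fork_node_datetime(lower, upper)
print(lower <= node <= upper)  # True
```

The integer mapping is the easy part in Python; in T-SQL it is what makes the computed column long, because DATEDIFF and DATEADD work with INT and the expression has to assemble and split the BIGINT in pieces.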

Querying [RI]
- Left nodes
- Middle nodes
- Right nodes
- All Together
- leftnodes and rightnodes functions

Querying, Left Nodes
- Left nodes: W = {w on path leading to lower; and w < lower}
[Figure: backbone tree with the left nodes on the path to lower highlighted]

    CREATE FUNCTION dbo.leftnodes(@lower AS INT, @upper AS INT)
      RETURNS @T TABLE
      (
        node INT NOT NULL PRIMARY KEY
      )
    AS
    BEGIN
      DECLARE @node AS INT = 1073741824; -- root = 2^(h-1), height h = 31
      DECLARE @step AS INT = @node / 2;

      -- descend from root node to lower
      WHILE @step >= 1
      BEGIN
        -- right node
        IF @lower < @node
          SET @node -= @step;
        -- left node
        ELSE IF @lower > @node
        BEGIN
          INSERT INTO @T(node) VALUES(@node);
          SET @node += @step;
        END
        -- lower
        ELSE
          BREAK;
        SET @step /= 2;
      END;

      RETURN;
    END;
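The descent in dbo.leftnodes, and its mirror image for right nodes, can be modeled in a few lines. This is a Python sketch, not the deck's code: each function takes only the bound it actually uses, the height parameter is added for illustration, and right_nodes is written by symmetry with the left-node logic.

```python
def left_nodes(lower, height=31):
    """Nodes on the root-to-lower path that are smaller than lower."""
    node = 1 << (height - 1)
    step = node // 2
    found = []
    while step >= 1:
        if lower < node:
            node -= step
        elif lower > node:
            found.append(node)   # node < lower: a left node
            node += step
        else:
            break
        step //= 2
    return found

def right_nodes(upper, height=31):
    """Nodes on the root-to-upper path that are larger than upper."""
    node = 1 << (height - 1)
    step = node // 2
    found = []
    while step >= 1:
        if upper > node:
            node += step
        elif upper < node:
            found.append(node)   # node > upper: a right node
            node -= step
        else:
            break
        step //= 2
    return found

# Toy tree of height 5 (nodes 1..31), query interval [11, 13]
print(left_nodes(11, height=5))   # [8, 10]
print(right_nodes(13, height=5))  # [16, 14]
```

Each list holds at most height - 1 nodes, so the number of index seeks per query is bounded by the tree height regardless of table size.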

Querying, Left Nodes
- Intervals [lower, upper] that intersect with input [@l, @u]:
  all intervals registered at a left node w, where upper >= @l

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.leftnodes(@l, @u) AS L
        ON I.node = L.node
        AND I.upper >= @l

[Figure: input interval [@l, @u] and a stored interval registered at left node w]

Querying, Right Nodes
- Symmetric to Left Nodes (with rightnodes function)
- Intervals that intersect with input [@l, @u]:
  all intervals registered at a right node w, where lower <= @u

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.rightnodes(@l, @u) AS R
        ON I.node = R.node
        AND I.lower <= @u

[Figure: input interval [@l, @u] and a stored interval registered at right node w]

Querying, Middle Nodes
- Intervals that intersect with input [@l, @u]:
  all intervals registered at a node w, where node is between @l and @u

    SELECT id
    FROM dbo.intervalsrit
    WHERE node BETWEEN @l AND @u

[Figure: input interval [@l, @u] covering several middle nodes w]

Querying, All Together

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.leftnodes(@l, @u) AS L
        ON I.node = L.node
        AND I.upper >= @l

    UNION ALL

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.rightnodes(@l, @u) AS R
        ON I.node = R.node
        AND I.lower <= @u

    UNION ALL

    SELECT id
    FROM dbo.intervalsrit
    WHERE node BETWEEN @l AND @u;

- Cost: logical reads against IntervalsRIT, plus reads against the @T table variables of leftnodes and rightnodes; CPU time in ms
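One way to gain confidence in the three-branch query is to compare it with the classic predicate on a small, exhaustive data set. The following Python model is a sketch under assumptions: a toy tree of height 5, an in-memory list standing in for the table, and helper names (fork_node, path_nodes, ri_intersect) that do not appear in the deck.

```python
H = 5  # toy tree height; nodes 1..31

def fork_node(lower, upper):
    a = (lower - 1) ^ upper
    return upper - upper % (1 << (a.bit_length() - 1))

def path_nodes(target, keep_smaller):
    """Nodes on the root-to-target path that are smaller (left nodes)
    or larger (right nodes) than target."""
    node, step, found = 1 << (H - 1), 1 << (H - 2), []
    while step >= 1 and node != target:
        if (node < target) == keep_smaller:
            found.append(node)
        node += step if node < target else -step
        step //= 2
    return found

def ri_intersect(table, l, u):
    left = set(path_nodes(l, True))
    right = set(path_nodes(u, False))
    ids = [i for (i, node, lo, up) in table if node in left and up >= l]      # left nodes
    ids += [i for (i, node, lo, up) in table if node in right and lo <= u]    # right nodes
    ids += [i for (i, node, lo, up) in table if l <= node <= u]               # middle nodes
    return sorted(ids)

# Every interval [lo, up] within 1..31, with its fork node
TABLE = []
for lo in range(1, 2 ** H):
    for up in range(lo, 2 ** H):
        TABLE.append((len(TABLE) + 1, fork_node(lo, up), lo, up))

classic = sorted(i for (i, node, lo, up) in TABLE if lo <= 13 and up >= 11)
print(ri_intersect(TABLE, 11, 13) == classic)  # True
```

The three branches select disjoint node ranges (below @l, above @u, and between them), which is why UNION ALL is safe and no duplicate elimination is needed.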

leftnodes and rightnodes Functions
- Cons:
  - Use iterative T-SQL and a table variable
  - In a join, called per row
- Can be implemented more efficiently

Optimized Ancestors Function [RIBG]
- Ancestors [SRI]: each step clears the rightmost set bit and sets the bit to its left to 1
- Not clear? Details in [RIBG]
- Identify the rightmost set bit in n:
  - -n (two's complement) keeps the bits from the right up to and including the first set bit, and reverses the rest of the bits
  - n & -n = n's rightmost set bit
- BitMasks table: one row per bit position n (n < num bits - 1), with b3 = the bit at position n and b1 = the mask that clears all bits below it
- Left nodes: ancestors where node < @node

    SELECT node FROM dbo.ancestors(@l) AS A WHERE node < @l;

- Right nodes: ancestors where node > @node

    SELECT node FROM dbo.ancestors(@u) AS A WHERE node > @u;

    CREATE FUNCTION dbo.ancestors(@node AS INT) RETURNS TABLE
    AS
    RETURN
      SELECT @node & b1 | b3 AS node   -- compute ancestor
      FROM dbo.bitmasks
      WHERE b3 > @node & -@node;       -- b3 > rightmost set bit
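The bit manipulation behind the ancestors function is easy to verify on a toy tree. In the Python sketch below, the BITMASKS list stands in for the bitmasks table; its layout (a clearing mask paired with a single bit) and the names b1 and b3 are assumptions made for this illustration, and the tree height is 5.

```python
H = 5  # toy tree height; nodes 1..31

# One (mask, bit) pair per bit position 1..H-1:
# mask clears every bit below the position, bit is the position itself.
BITMASKS = [(-(1 << n), 1 << n) for n in range(1, H)]

def ancestors(node):
    lowest = node & -node    # rightmost set bit of node
    return [node & b1 | b3 for (b1, b3) in BITMASKS if b3 > lowest]

print(ancestors(5))    # [6, 4, 8, 16]
print(ancestors(12))   # [8, 16]

# Left nodes of 11 and right nodes of 13, as in a query for [11, 13]
print([a for a in ancestors(11) if a < 11])   # [10, 8]
print([a for a in ancestors(13) if a > 13])   # [14, 16]
```

The results match what the iterative descent produces for the same inputs, but here each ancestor comes from one mask-and-set operation per row of a small lookup table, with no loop and no table variable.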

Optimized Ancestors Function

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.ancestors(@l) AS L
        ON L.node < @l
        AND I.node = L.node
        AND I.upper >= @l

    UNION ALL

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.ancestors(@u) AS R
        ON R.node > @u
        AND I.node = R.node
        AND I.lower <= @u

    UNION ALL

    SELECT id
    FROM dbo.intervalsrit
    WHERE node BETWEEN @l AND @u;

- Cost: logical reads against IntervalsRIT and BitMasks; CPU time in ms

Further Optimization [SRI]
- Filter out ancestors outside of the range covered by the table
- Eliminates unnecessary index seeks

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.ancestors(@l) AS L
        ON L.node < @l
        AND L.node >= (SELECT MIN(node) FROM dbo.intervalsrit)
        AND I.node = L.node
        AND I.upper >= @l

    UNION ALL

    SELECT I.id
    FROM dbo.intervalsrit AS I
      JOIN dbo.ancestors(@u) AS R
        ON R.node > @u
        AND R.node <= (SELECT MAX(node) FROM dbo.intervalsrit)
        AND I.node = R.node
        AND I.lower <= @u

    UNION ALL

    SELECT id
    FROM dbo.intervalsrit
    WHERE node BETWEEN @l AND @u;

- Cost: logical reads against IntervalsRIT and BitMasks; CPU time in ms
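The effect of the MIN(node) and MAX(node) filters can be seen by counting how many ancestor nodes survive them. The Python model below is a sketch under assumptions: a toy tree of height 5, an in-memory table that only holds intervals inside [9, 15], and illustrative helper names; it assumes the table is not empty.

```python
H = 5  # toy tree height; nodes 1..31

def fork_node(lower, upper):
    a = (lower - 1) ^ upper
    return upper - upper % (1 << (a.bit_length() - 1))

def ancestors(node):
    lowest = node & -node
    return [node & -(1 << n) | (1 << n) for n in range(1, H) if (1 << n) > lowest]

def intersect(table, l, u):
    """Returns (matching ids, number of left/right nodes actually probed)."""
    nodes = [row[1] for row in table]
    min_node, max_node = min(nodes), max(nodes)
    left = {a for a in ancestors(l) if min_node <= a < l}    # skip nodes below the table's range
    right = {a for a in ancestors(u) if u < a <= max_node}   # skip nodes above the table's range
    ids = [i for (i, node, lo, up) in table if node in left and up >= l]
    ids += [i for (i, node, lo, up) in table if node in right and lo <= u]
    ids += [i for (i, node, lo, up) in table if l <= node <= u]
    return sorted(ids), len(left) + len(right)

# Hypothetical table: every interval that lies within [9, 15]
TABLE = []
for lo in range(9, 16):
    for up in range(lo, 16):
        TABLE.append((len(TABLE) + 1, fork_node(lo, up), lo, up))

ids, probes = intersect(TABLE, 11, 13)
classic = sorted(i for (i, node, lo, up) in TABLE if lo <= 13 and up >= 11)
print(ids == classic, probes)  # True 2
```

Without the range filters the same query would probe four ancestor nodes (8, 10, 14 and 16); nodes 8 and 16 fall outside the range of fork nodes stored in the table, so seeking them cannot return rows.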

Optimized Ancestors Function
- Pros:
  - More efficient than iterative T-SQL functions
  - Doesn't require a table variable
- Cons:
  - A bit complex

Potential for Integration in SQL Server [RIBG]
- Model is complex; can add engine support for model and optimizations
- Indexing:

    CREATE INDEX myindex
      ON dbo.intervals[(fcol1, fcol2, ...)]   -- leading equality-based filters
      INTERVAL(lower, upper)                  -- interval columns
      [INCLUDE(icol1, icol2, ...)]            -- included columns
      [WITH (INTERSECTS_ONLY = ON)];          -- determines keylist

- Internally compute fork node and create two B-tree indexes:
    key([fcol1, fcol2, ...,] node, lower[, upper]) [include(icol1, icol2, ...)]
    key([fcol1, fcol2, ...,] node, upper[, lower]) [include(icol1, icol2, ...)]
- Querying:
  - Optimizer support for detecting interval queries with classic predicates
  - Adding declarative SQL with RI-tree engine support
  - More efficient native functions (for advanced users who wish to roll their own): forknode, leftnodes, rightnodes, ancestors
- Support integer, as well as date and time types

References
- [RI]: Managing Intervals Efficiently in Object-Relational Databases (Kriegel, Pötke and Seidl of University of Munich)
  http://www.dbs.ifi.lmu.de/publikationen/papers/vldb000.pdf
- [SRI]: A Static Relational Interval Tree (Laurent Martin)
  http://www.solidq.com/sqj/pages/0-september-issue/a-static-relational-interval-tree.aspx
- [SRI]: Advanced interval queries with the Static Relational Interval Tree (Laurent Martin)
  https://www.solidq.com/sqj/pages/relational/advanced-interval-queries-with-the-static-Relational-Interval-Tree.aspx
- [SRI]: Using the Static Relational Interval Tree with time intervals (Laurent Martin)
  http://www.solidq.com/sqj/pages/relational/using-the-static-relational-interval-tree-with-timeintervals.aspx
- [RIBG]: Interval Queries in SQL Server (Itzik Ben-Gan)
  http://sqlmag.com/t-sql/sql-server-interval-queries
- Feature Enhancement Request for Microsoft Connect item:
  https://connect.microsoft.com/sqlserver/feedback/details/0