Caching Dynamic Skyline Queries

Similar documents

Bitmap Index as Effective Indexing for Low Cardinality Column in Data Warehouse

Data Warehousing und Data Mining

Efficient Processing of Joins on Set-valued Attributes

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

Multi-dimensional index structures Part I: motivation

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

Similarity Search in a Very Large Scale Using Hadoop and HBase

2 Associating Facts with Time

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Indexing Techniques for Data Warehouses Queries. Abstract

International Journal of Advance Research in Computer Science and Management Studies

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

FlexPref: A Framework for Extensible Preference Evaluation in Database Systems

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Survey On: Nearest Neighbour Search With Keywords In Spatial Databases

Data Structures for Moving Objects

SQL Query Evaluation. Winter Lecture 23

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

The Classical Architecture. Storage 1 / 36

Big Data Paradigms in Python

Chapter 5: Overview of Query Processing

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Decision Trees from large Databases: SLIQ

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Analysis Services Step by Step

Graph Database Proof of Concept Report

Clustering and scheduling maintenance tasks over time

Fast Sequential Summation Algorithms Using Augmented Data Structures

Operating Systems. Virtual Memory

Data warehousing with PostgreSQL

Chapter 3 Data Warehouse - technological growth

Cluster Description Formats, Problems and Algorithms

New Approach of Computing Data Cubes in Data Warehousing

Cloud Service Placement via Subgraph Matching

Invited Applications Paper

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

Vector storage and access; algorithms in GIS. This is lecture 6

Database Marketing, Business Intelligence and Knowledge Discovery

Physical DB design and tuning: outline

Best Practices for DB2 on z/os Performance

Principles of Data Mining by Hand&Mannila&Smyth

On Saying Enough Already! in MapReduce

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

UVA. Failure and Recovery. Failure and inconsistency. - transaction failures - system failures - media failures. Principle of recovery

Procedia Computer Science 00 (2012) Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.

Clustering & Visualization

5.4 Closest Pair of Points

Fast Contextual Preference Scoring of Database Tuples

Learning in Abstract Memory Schemes for Dynamic Optimization

Graphical Representation of Multivariate Data

RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE

Base Table. (a) Conventional Relational Representation Basic DataIndex Representation. Basic DataIndex on RetFlag

Common Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Efficient and Scalable Method for Processing Top-k Spatial Boolean Queries

Unsupervised Data Mining (Clustering)

Partitioning under the hood in MySQL 5.5

Bounded Treewidth in Knowledge Representation and Reasoning 1

GameTime: A Toolkit for Timing Analysis of Software

The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images

Arrangements And Duality

Big Fast Data Hadoop acceleration with Flash. June 2013

Comparison of different image compression formats. ECE 533 Project Report Paula Aguilera

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Multi-layered Domain-specific Language for Stencil Computations

In-Memory BigData. Summer 2012, Technology Overview

Indexing and Processing Big Data

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Indexing Spatio-Temporal archive As a Preprocessing Alsuccession

Concept of Cache in web proxies

Image Compression through DCT and Huffman Coding Technique

Building Views and Charts in Requests Introduction to Answers views and charts Creating and editing charts Performing common view tasks

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Enterprise Applications

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Transcription:

Caching Dynamic Skyline Queries D. Sacharidis 1, P. Bouros 1, T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management of Information Systems R.C. Athena

Outline Introduction Skyline (SL) and dynamic skyline queries (DSL) Related work Evaluating dynamic skyline queries Computing orthant skylines (OSL) Computing dynamic skyline via caching LRU, LFU, LPP cache replacement policies Experimental evaluation Conclusions and Future work

Skyline queries (SL) Given a dataset of d- dimensional points SL contains points not dominated by others x dominates y iff x as good as y in all dimensions and strictly better in at least one

Skyline queries (SL) Given a dataset of d- dimensional points SL contains points not dominated by others x dominates y iff x as good as y in all dimensions and strictly better in at least one Example Dataset of hotels Prefer cheap hotels close to the sea

Skyline queries (SL) Given a dataset of d- dimensional points SL contains points not dominated by others x dominates y iff x as good as y in all dimensions and strictly better in at least one Example Skyline points Dataset of hotels Prefer cheap hotels close to the sea

Skyline queries (SL) Given a dataset of d- dimensional points SL contains points not dominated by others x dominates y iff x as good as y in all dimensions and strictly better in at least one Example Skyline points Dataset of hotels Prefer cheap hotels close to the sea p 1

Skyline queries (SL) Given a dataset of d- dimensional points SL contains points not dominated by others x dominates y iff x as good as y in all dimensions and strictly better in at least one Example p 2 Skyline points Dataset of hotels Prefer cheap hotels close to the sea p 1

Dynamic skyline queries (DSL) Extension of skyline queries Given a query point q DSL contains points not dynamically dominated by others w.r.t q x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one Can be treated as static SL Transform points w.r.t. q

Dynamic skyline queries (DSL) Extension of skyline queries Given a query point q DSL contains points not dynamically dominated by others w.r.t q x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one Can be treated as static SL Transform points w.r.t. q Example Query point q User defines ideal hotel q

Dynamic skyline queries (DSL) Extension of skyline queries Given a query point q DSL contains points not dynamically dominated by others w.r.t q x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one Can be treated as static SL Transform points w.r.t. q q Example Dynamic Skyline points User defines ideal hotel q

Dynamic skyline queries (DSL) Extension of skyline queries Given a query point q DSL contains points not dynamically dominated by others w.r.t q x dynamically dominates y iff x as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one Can be treated as static SL Transform points w.r.t. q p 4 p 5 q Example Dynamic Skyline points User defines ideal hotel q

Intuition (1) Traditional SL algorithms need to run anew for each DSL query Our idea Exploit results from past queries to reduce processing cost for future DSL queries Cache past queries Decide which queries in cache are useful

Intuition (2)

Intuition (2) 2 past DSL queries q a, q b Each query partitions space in 4 quadrants q a q b

Intuition (3) A new query q arrives Consider DSL for q a p 1 is contained DSL(q a ) p 1 dominates p 2, p 3, p 4 p 1 lies in upper right quadrant w.r.t. q a q a lies in upper right quadrant w.r.t. q p 1 dominates also p 2, p 3, p 4 w.r.t. to q Exclude p 2, p 3, p 4 from dominance test for DSL(q) q b q q a p 1 p 2 p 3 p 4 Shaded area denotes points dominated by p 1

Contribution in brief Caching past DSL queries cannot reduce processing cost for future ones We need more information about dominance relationships Introduce orthant skylines (OSL) and examine their relationship with DSL Extend Bitmap algorithm to compute OSL in parallel with DSL Cache OSL to enhance DSL queries evaluation Present 3 cache replacement policies LRU, LFU, LPP Experimental evaluation of caching mechanism

Related work Non-indexed methods Block-Nested Loops (BnL) Bitmap Multidimensional Divide and Conquer (DnC) Sort First Scan (SFS) Index-based methods B-tree sort points according to the lowest valued coordinate R-tree Nearest neighbor based (NN) Branch and bound (BBS)

Related work Non-indexed methods Block-Nested Loops (BnL) Bitmap Multidimensional Divide and Conquer (DnC) Sort First Scan (SFS) Index-based methods B-tree sort points according to the lowest valued coordinate R-tree Nearest neighbor based (NN) Branch and bound (BBS)

Bitmap BnL variant Suitable for domains with low cardinality and discrete In brief Computes a bitmap representation of the points in the dataset Examines each point separately (dominance test) Checks whether it is contained in the skyline or not Exploits fast bitwise operations OR/AND

Bitmap Dominance test For each point p Define A = A 1 & A 2 & & A d Denotes the points as good as p in all dimensions Define B = B 1 B 2 B d Denotes the points strictly better than p in at least one dimension Dominance test: If C = A & B has all bits set to 0 then p is in SL

Orthant skyline (OSL) OSL provides more information about dominance relationships than DSL Useful for pruning Given a dataset of d- dimensional points and a query point q Space partitioned in 2 d orthants o-th orthant skyline (OSL) of q contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q

Orthant skyline (OSL) OSL provides more information about dominance relationships than DSL Useful for pruning Given a dataset of d- dimensional points and a query point q Space partitioned in 2 d orthants o-th orthant skyline (OSL) of q contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q Quadrant 1 Quadrant 0 Query point q Quadrant 3 Quadrant 2

Orthant skyline (OSL) OSL provides more information about dominance relationships than DSL Useful for pruning Given a dataset of d- dimensional points and a query point q Space partitioned in 2 d orthants o-th orthant skyline (OSL) of q contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q Quadrant 1 Quadrant 0 Query point q Quadrant 3 Quadrant 2

Orthant skyline (OSL) OSL provides more information about dominance relationships than DSL Useful for pruning Given a dataset of d- dimensional points and a query point q Space partitioned in 2 d orthants o-th orthant skyline (OSL) of q contains points of the o-th orthant not dynamically dominated by others inside orthant o w.r.t q Quadrant 1 Quadrant 3 Quadrant 0 Query point q Quadrant 2 skyline points Quadrant 2

OSL and DSL relationship Quadrant 1 Quadrant 0 q Quadrant 3 Quadrant 2

OSL and DSL relationship Quadrant 1 Quadrant 0 q Quadrant 3 Quadrant 2

OSL and DSL relationship Map points from quadrants 1,2,3 to points inside quadrant 0 Quadrant 1 q Quadrant 0 Quadrant 3 Quadrant 2

OSL and DSL relationship Map points from quadrants 1,2,3 to points inside quadrant 0 Compute DSL w.r.t. q Quadrant 1 q Quadrant 0 Quadrant 3 Quadrant 2

OSL and DSL relationship Map points from quadrants 1,2,3 to points inside quadrant 0 Compute DSL w.r.t. q Union of all OSLs is superset of DSL w.r.t. to q Quadrant 1 q Quadrant 3 Quadrant 0 Quadrant 2

OSL and DSL relationship Map points from quadrants 1,2,3 to points inside quadrant 0 Compute DSL w.r.t. q Union of all OSLs is superset of DSL w.r.t. to q Quadrant 1 Quadrant 3 p 1 p 2 q Quadrant 0 Quadrant 2

OSL and DSL relationship Map points from quadrants 1,2,3 to points inside quadrant 0 Compute DSL w.r.t. q Union of all OSLs is superset of DSL w.r.t. to q Quadrant 1 Quadrant 3 p 2 q p 3 Quadrant 0 Quadrant 2

Computing orthant skylines Algorithm DBM Extends Bitmap to compute DSL and OSLs at the same time Method: Compute bitmap representation Transform each point coordinates w.r.t. to query q Dominance test, point p, orthant o p not in OSL o and not in DSL p not in DSL, but in OSL o p in DSL and in OSL o

Dynamic skylines Via Caching Cache OSLs instead of DSLs Query cache contains (query point q j, OSLs) OSLs encode by bitmaps Algorithm cdbm OSL contains information about dominance test inside orthant Discard points inside orthants from dominance tests Method: Compute bitmap representation For each point p consider its position (orthant) w.r.t. to cache queries q j If p in the same orthant o w.r.t q j as q j w.r.t. q and p not in OSL o (q j ) then exclude it from OSL o (q), DSL(q)

Cache Replacement Policies General idea Limited cache space Identify least useful query point in cache Replace it with new one

Usage-based policies Only a few queries in cache are useful Log cache query usage Given a new query q Consider as input the query point cache Q Only query points in OSL of Q w.r.t. q are useful Update cache - remove: Least Recently Used (LRU) query point Least Frequently Used (LFU) query point

Usage-based policies Only a few queries in cache are useful Log cache query usage Given a new query q Consider as input the query point cache Q Only query points in OSL of Q w.r.t. q are useful Update cache - remove: Least Recently Used (LRU) query point Least Frequently Used (LFU) query point q q b q d q a q c

Usage-based policies Only a few queries in cache are useful Log cache query usage Given a new query q Consider as input the query point cache Q Only query points in OSL of Q w.r.t. q are useful Update cache - remove: Least Recently Used (LRU) query point Least Frequently Used (LFU) query point q q b q d q a Redundant queries q c

Usage-based policies Only a few queries in cache are useful Log cache query usage Given a new query q Consider as input the query points in cache Q Only query points in OSL of Q w.r.t. q are useful Update cache - remove: Least Recently Used (LRU) query point Least Frequently Used (LFU) query point q q b q d q a Redundant queries q c

Usage-based policies Only a few queries in cache are useful Log cache query usage Given a new query q Consider as input the query points in cache Q Only query points in OSL of Q w.r.t. q are useful Update cache - remove: Least Recently Used (LRU) query point Least Frequently Used (LFU) query point q q b q d q a Redundant queries q c

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) q a q

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) 2:2 4 3:2 4 q a 5:2 4 74:2 4 q

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) 2:3 4 3:4 4 q a 5:7 4 74:88 4 q

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) 2:3 176 3:4 4 q a 5:7 4 74:88 4 q

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) 2:3 176 3:4 21 q a 5:7 20 74:88 222 q

Pruning power-based policy Usage-based policies do not indicate usefulness Useful cached query Great pruning power Probability that a query can prune points of dataset from DSL computation Depends on Points dominated by query in an orthant j Points contained in the antisymetric orthant of j Update cache remove Query point with less pruning power (LPP) 2:3 176 3:4 21 q a 5:7 20 74:88 222 q

Experimental Evaluation Synthetic datasets Distribution types Independent, correlated, anti-correlated Number of points N 10k, 20k, 50k, 100k, Dimensionality d = {2,3,4,5,6} Domain size for dimension D = {10,20,50} Compare Bitmap (NO-CACHE) cdbm with LFU,LRU,LPP cache replacement policies Query cache Q = {10,20,30,40,50} past query points Cache size is Q *N bits uncompressed

Varying query cache size Independent Anti-correlated Dataset: N = 50k points, with d = 4 dimensions of D = 20 domain size LFU,LRU cache queries not representative for future ones LPP caches queries with great pruning power

Effect of distribution parameters Correlated vary N Correlated vary d Relative improvement in running time over NO-CACHE Vary number of points N d = 4 dimensions of D = 20 domain size Vary number of dimensions d N = 50k, D = 20

Conclusions and Future work Conclusions Introduced orthant skylines (OSLs) and discussed its relationship with DSL Extended Bitmap to compute OSLs and DSL at the same time (DBM algorithm) Proposed caching mechanism of OSLs to reduce cost for future DSL queries LRU, LFU, LPP cache replacement policies Experimentally verified the efficiency of caching mechanism Future work Apply caching mechanism to index-based methods Further increase pruning power of cached queries

Questions?