Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm

1 Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm
Md. Anisuzzaman Siddique and Yasuhiko Morimoto, Hiroshima University
DNIS 2014

2 Overview
1. Top-k Query
2. Skyline Query
3. Related work
4. Top-k and Skyline Query
5. Contributions
6. k-objects Selection using MapReduce
7. Performance Evaluation
8. Conclusions

3 1. Top-k Query: Top-k query processing
Find k objects that minimize a scoring function.
[Table: Hotels Dataset, columns Price and Distance, objects O1 to O9]
[Figure: geometric interpretation, objects O1 to O9 plotted on Price and Distance axes]

4 1. Top-k Query: Top-k query processing
Find the 2 hotels that minimize (price + distance).
[Table: Hotels Dataset, columns Price and Distance, objects O1 to O9]
[Figure: scoring function 1 on Price and Distance axes, marking the top-2 hotels by (price + distance)]

5 1. Top-k Query: Top-k query processing
Find the 2 hotels that minimize (price) only.
[Table: Hotels Dataset, columns Price and Distance, objects O1 to O9]
[Figure: scoring function 2 on Price and Distance axes, marking the top-2 hotels by (price)]
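The two top-k slides differ only in the scoring function, which a small sketch can make concrete. The version below is a minimal illustration, not the authors' implementation: `HOTELS` holds only the six objects whose (price, distance) values appear later in the slides, and `top_k` is a hypothetical helper name. Ties are broken by object name, which is an assumption made here to keep the output deterministic.

```python
import heapq

# (price, distance) pairs for the objects whose values appear in later slides.
# O2, O4 and O8 are left out, so this is an illustrative subset only.
HOTELS = {
    "O1": (3, 17), "O3": (9, 6), "O5": (13, 3),
    "O6": (10, 6), "O7": (8, 12), "O9": (3, 20),
}

def top_k(hotels, k, score):
    """Return the k object names with the smallest score (ties broken by name)."""
    return heapq.nsmallest(
        k, hotels, key=lambda name: (score(*hotels[name]), name)
    )

# Scoring function 1: price + distance.
by_sum = top_k(HOTELS, 2, lambda price, dist: price + dist)
# Scoring function 2: price only.
by_price = top_k(HOTELS, 2, lambda price, dist: price)

print(by_sum)
print(by_price)
```

Changing the scoring function changes the answer, which is the dependence on a concrete scoring function that the later slides list as the disadvantage of top-k queries.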

6 2. Skyline Query: Skyline query processing
[Table: Hotels Dataset, columns Price and Distance, objects O1 to O9]
[Figure: dominance regions of the objects on Price and Distance axes]

7 2. Skyline Query: Skyline query processing
[Table: Hotels Dataset, columns Price and Distance, objects O1 to O9]
[Figure: skyline of the Hotels Dataset on Price and Distance axes]
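The dominance test behind the skyline can be sketched in a few lines. This is a naive quadratic scan written for illustration; it is not SFS, LESS, NN or BBS, and it is not claimed to be the method used in the talk. It assumes smaller values are better in both attributes, and it uses only the six objects whose values appear later in the slides (`HOTELS`, `dominates` and `skyline` are names chosen here).

```python
# Illustrative subset: only objects whose values appear in later slides.
HOTELS = {
    "O1": (3, 17), "O3": (9, 6), "O5": (13, 3),
    "O6": (10, 6), "O7": (8, 12), "O9": (3, 20),
}

def dominates(a, b):
    """a dominates b if a is no worse in every attribute and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better_somewhere = any(x < y for x, y in zip(a, b))
    return no_worse and better_somewhere

def skyline(objects):
    """Return the sorted names of all objects that no other object dominates."""
    return sorted(
        name
        for name, values in objects.items()
        if not any(
            dominates(other, values)
            for other_name, other in objects.items()
            if other_name != name
        )
    )

print(skyline(HOTELS))
```

On this subset O6 is dominated by O3 and O9 by O1, leaving O1, O3, O5 and O7, which agrees with the skyline dataset used in the selection slides.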

8 3. Related Work
Top-k
  No Random Access (NRA), Fagin et al., PODS '01
  Threshold Algorithm (TA), Fagin et al., PODS '01
  PREFER, Hristidis et al., SIGMOD '01
  etc.
Skyline
  Sort Filter Skyline (SFS), Chomicki et al., ICDE '03
  Linear Elimination Sort for Skyline (LESS), Godfrey et al., VLDB '05
  Nearest Neighbor (NN), Kossmann et al., VLDB '02
  Branch and Bound Skyline (BBS), Papadias et al., SIGMOD '03
  etc.

9 4. Top-k and Skyline Query: Problems
Top-k
  Advantage: Always selects k objects according to the scoring function.
  Disadvantage: It requires a concrete scoring function.
Skyline
  Advantage: It does not require any scoring function.
  Disadvantage: No control over the retrieved objects; it is difficult to identify interesting objects in a large result.

10 5. Contributions: Our contributions
We combine the strengths of skyline and top-k queries and propose an algorithm that selects various k objects without a scoring function.
We consider an efficient algorithm that selects the k objects with the MapReduce framework.

11 Problem Definition
A user is looking for k hotels that are cheap and close to the beach (keyword = Aizu hotel, k = 2).
[Table: Aizu Hotels Dataset, columns Price and Distance, 9 objects]
How can we choose 2 hotels without a scoring function?

12 k-objects Selection: k-objects selection procedure
Skyline computation
[Table: Aizu Hotels Dataset, columns Price and Distance, 9 objects]
[Table: Skyline result, columns Price and Distance, 4 objects]

13 k-objects Selection: k-objects selection procedure
Case 1: No. of skyline objects >= k (say, k = 2)
Skyline Dataset (rank within each column in parentheses):
  Object  Price    Distance  Pri.+Dist.
  O1      3 (1)    17 (4)    20 (3)
  O3      9 (3)    6 (2)     15 (1)
  O5      13 (4)   3 (1)     16 (2)
  O7      8 (2)    12 (3)    20 (3)
Combined rank (sum of the three ranks):
  Object  Comb. Rank
  O1      8
  O3      6
  O5      7
  O7      8
Top-2 Result:
  Object  Comb. Rank
  O3      6
  O5      7
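The rank arithmetic of Case 1 can be checked with a short sketch. It assumes what the table suggests: each column (price, distance, and their sum) is ranked separately, tied values share the smaller rank, and the combined rank is the sum of the three ranks. The function names are chosen here for illustration, and breaking combined-rank ties by object name is an added assumption.

```python
# Skyline dataset from the slide: object -> (price, distance).
SKYLINE = {"O1": (3, 17), "O3": (9, 6), "O5": (13, 3), "O7": (8, 12)}

def competition_rank(column):
    """Rank = 1 + number of strictly smaller values, so ties share a rank."""
    return {
        name: 1 + sum(1 for other in column.values() if other < value)
        for name, value in column.items()
    }

def combined_rank(objects):
    """Sum the per-column ranks over price, distance and price + distance."""
    columns = [
        {name: price for name, (price, _) in objects.items()},
        {name: dist for name, (_, dist) in objects.items()},
        {name: price + dist for name, (price, dist) in objects.items()},
    ]
    ranks = [competition_rank(column) for column in columns]
    return {name: sum(rank[name] for rank in ranks) for name in objects}

def select_k(objects, k):
    """Return the k objects with the smallest combined rank (ties by name)."""
    comb = combined_rank(objects)
    return sorted(comb, key=lambda name: (comb[name], name))[:k]

print(combined_rank(SKYLINE))
print(select_k(SKYLINE, 2))
```

The combined ranks come out as 8, 6, 7 and 8 for O1, O3, O5 and O7, so k = 2 selects O3 and O5 as in the slide.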

14 k-objects Selection: k-objects selection procedure
Case 2: k > No. of skyline objects (say, k = 5)
[Table: Skyline result, columns Price and Distance, 4 objects]
No solution!

15 k-objects Selection
We compute the extended skyline objects and select k objects from them.
[Table: Aizu Hotels Dataset, columns Price and Distance, 9 objects]
[Figure: extended skyline (Ext-Skyline) of the Hotels Dataset on Price and Distance axes]

16 k-objects Selection: k-objects selection procedure
Case 2: k > No. of skyline objects (say, k = 5)
Ext. Skyline result (rank within each column in parentheses):
  Object  Price    Distance  Pri.+Dist.
  O1      3 (1)    17 (5)    20 (4)
  O3      9 (4)    6 (2)     15 (1)
  O5      13 (6)   3 (1)     16 (2)
  O6      10 (5)   6 (2)     16 (2)
  O7      8 (3)    12 (4)    20 (4)
  O9      3 (1)    20 (6)    23 (6)
Combined rank:
  Object  Comb. Rank
  O1      10
  O3      7
  O5      9
  O6      9
  O7      11
  O9      13
Top-5 result:
  Object  Comb. Rank
  O1      10
  O3      7
  O5      9
  O6      9
  O7      11

17 k-objects Selection: k-objects selection procedure
Case 3: neither case 1 nor case 2 applies (k > No. of ext. skyline objects)
We need to consider another procedure. A set skyline can be a solution; we discussed the skyline sets query at DNIS 2010.
How to select k objects from the set skyline is future work.
[Figure: set skyline (Set-Skyline) over objects O1 to O9 on Price and Distance axes]
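The three cases amount to a fallback chain over candidate sets, sketched below. The slides do not define the extended skyline, so this sketch assumes the usual definition: an object is kept unless some other object is strictly better in every attribute. That assumption agrees with the extended skyline table, where O6 and O9 are included. `HOTELS` again contains only the six objects with known values, and `candidates` is a hypothetical helper name.

```python
# Illustrative subset: O2, O4 and O8 are omitted because their values are unknown.
HOTELS = {
    "O1": (3, 17), "O3": (9, 6), "O5": (13, 3),
    "O6": (10, 6), "O7": (8, 12), "O9": (3, 20),
}

def dominates(a, b):
    """a is no worse than b everywhere and differs from b (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def strictly_dominates(a, b):
    """a is strictly better than b in every attribute (assumed ext-skyline test)."""
    return all(x < y for x, y in zip(a, b))

def undominated(objects, dom):
    """Sorted names of the objects that no other object dominates under `dom`."""
    return sorted(
        name
        for name, values in objects.items()
        if not any(dom(other, values)
                   for other_name, other in objects.items()
                   if other_name != name)
    )

def candidates(objects, k):
    """Return (case number, candidate names); names is None in case 3."""
    sky = undominated(objects, dominates)
    if len(sky) >= k:
        return 1, sky
    ext = undominated(objects, strictly_dominates)
    if len(ext) >= k:
        return 2, ext
    return 3, None

for k in (2, 5, 7):
    print(k, candidates(HOTELS, k))
```

With k = 2 the four skyline objects suffice, with k = 5 the six extended skyline objects are used, and with k = 7 neither set is large enough, which corresponds to the case left as future work.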

18 6. k-objects Selection using MapReduce
It has three phases:
  Data splitting phase
  Map and ranking phase
  Reduce and selection phase
[Table: Skyline result, columns Price and Distance, 4 objects]

19 Skyline and Splitting phase
Phase 1: Split the skyline dataset vertically
[Table: Skyline Dataset, columns Price and Distance, 4 objects]
Resulting splits:
  Price:           O1 3,  O3 9,  O5 13, O7 8
  Distance:        O1 17, O3 6,  O5 3,  O7 12
  Price+Distance:  O1 20, O3 15, O5 16, O7 20
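Vertical splitting turns the row-oriented skyline dataset into one small key/value list per column. A minimal sketch, assuming the derived Price+Distance column is computed during the split (the slide shows it as a third split but does not say where it is produced); `split_vertically` is a name chosen here:

```python
# Skyline dataset from the slide: object -> (price, distance).
SKYLINE = {"O1": (3, 17), "O3": (9, 6), "O5": (13, 3), "O7": (8, 12)}

def split_vertically(objects):
    """Return one object -> value mapping per column, plus the summed column."""
    return {
        "price": {name: price for name, (price, _) in objects.items()},
        "distance": {name: dist for name, (_, dist) in objects.items()},
        "price+distance": {
            name: price + dist for name, (price, dist) in objects.items()
        },
    }

splits = split_vertically(SKYLINE)
for column, values in splits.items():
    print(column, values)
```

Each split is independent of the others, so each can be ranked by a separate mapper.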

20 Map and Ranking Phase
Phase 2(a): Map all data objects
  Input (Price):           O1 3,  O3 9,  O5 13, O7 8    ->  Rank: O1 1, O3 3, O5 4, O7 2
  Input (Price+Distance):  O1 20, O3 15, O5 16, O7 20   ->  Rank: O1 3, O3 1, O5 2, O7 3
  Input (Distance):        O1 17, O3 6,  O5 3,  O7 12   ->  Rank: O1 4, O3 2, O5 1, O7 3

21 Map and Ranking Phase
Phase 2(b): Send (object, Rank) pairs to the combiner
  Map output (one rank list per split):
    Price:           O1 1, O3 3, O5 4, O7 2
    Price+Distance:  O1 3, O3 1, O5 2, O7 3
    Distance:        O1 4, O3 2, O5 1, O7 3
  Combiner (pairs grouped by object):
    O1 1, O1 3, O1 4
    O3 3, O3 1, O3 2
    O5 4, O5 2, O5 1
    O7 2, O7 3, O7 3

22 Reduce and Selection Phase
Phase 3(a): Reducer produces (object, list(rank)) pairs
  Reducer input:
    O1 1, O1 3, O1 4
    O3 3, O3 1, O3 2
    O5 4, O5 2, O5 1
    O7 2, O7 3, O7 3
  Reducer output (Comb. Rank):
    O1 8, O3 6, O5 7, O7 8

23 Reduce and Selection Phase
Phase 3(b): Select k objects
  Reducer output (Comb. Rank): O1 8, O3 6, O5 7, O7 8
  Result for k = 2 (Comb. Rank): O3 6, O5 7
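The map, group and reduce steps can be traced end to end in a single process. The sketch below only imitates the data flow with plain Python functions; it does not use Hadoop or any MapReduce library, and the function names are chosen here. It assumes tied values share the smaller rank, as in the rank tables, and breaks combined-rank ties by object name.

```python
from collections import defaultdict

# Vertical splits produced by the splitting phase (values from the slides).
SPLITS = {
    "price": {"O1": 3, "O3": 9, "O5": 13, "O7": 8},
    "price+distance": {"O1": 20, "O3": 15, "O5": 16, "O7": 20},
    "distance": {"O1": 17, "O3": 6, "O5": 3, "O7": 12},
}

def map_rank(split):
    """Mapper: emit one (object, rank) pair per object of a single split."""
    for name, value in split.items():
        rank = 1 + sum(1 for other in split.values() if other < value)
        yield name, rank

def group_by_object(pairs):
    """Stand-in for the combiner/shuffle: collect the ranks of each object."""
    grouped = defaultdict(list)
    for name, rank in pairs:
        grouped[name].append(rank)
    return dict(grouped)

def reduce_sum(name, ranks):
    """Reducer: turn (object, list(rank)) into (object, combined rank)."""
    return name, sum(ranks)

def select_k(splits, k):
    """Run all phases and return the k pairs with the smallest combined rank."""
    pairs = [pair for split in splits.values() for pair in map_rank(split)]
    grouped = group_by_object(pairs)
    combined = [reduce_sum(name, ranks) for name, ranks in grouped.items()]
    return sorted(combined, key=lambda item: (item[1], item[0]))[:k]

print(group_by_object(
    [pair for split in SPLITS.values() for pair in map_rank(split)]
))
print(select_k(SPLITS, 2))
```

The grouped ranks match the reducer input on the slides, and the final selection for k = 2 is O3 with combined rank 6 and O5 with combined rank 7.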

24 7. Performance Evaluation
Experimental datasets: synthetic datasets (Independent, Anti-Correlated)
Experimental setup:
  4 PCs with Intel Core i7 3.4 GHz CPU, 4 GB RAM, Windows 8.0 OS
  100 Mbit/s LAN connection

25 7. Performance Evaluation: Varying Dimensionality
No. of records = 100k, Dimension = 2-8, Query Number = 500
[Figure: Runtime [s] vs. Dimensionality for Proposed and Proposed Ext; panels a) Anti-Correlated, b) Independent]

26 7. Performance Evaluation: Varying Cardinality
No. of records = varied (axis ticks: 100k, 200k, 300k, 400k), Dimension = 6, Query Number = 500
[Figure: Runtime [s] vs. Cardinality for Proposed and Proposed Ext; panels a) Anti-Correlated, b) Independent]

27 7. Performance Evaluation: Varying Query Number
No. of records = 100k, Dimension = 7, Query Number = varied (axis ticks: 0.5k, 1k, 1.5k, 2k)
[Figure: Runtime [s] vs. Query Number for Proposed and Proposed Ext.; panels a) Anti-Correlated, b) Independent]

28 8. Conclusions
This work addresses the k-objects selection problem and gives a guideline for how to select k objects.
We applied the idea of skyline queries to select the k objects and proposed an efficient algorithm.
Experimental results show the effectiveness of the proposed algorithm.

29 THANK YOU

30 Questions?
Data Management Laboratory, Hiroshima University
