Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm
|
|
- Hollie Anthony
- 7 years ago
- Views:
Transcription
1 DNIS 2014 Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm Md. Anisuzzaman Siddique and Yasuhiko Morimoto Hiroshima University 1
2 Overview DNIS Top-k Query 2. Skyline Query 3. Related work 4. Top-k and Skyline Query 5. Contributions 6. k-objects Selection using MapReduce 7. Performance Evaluation 8. Conclusions 2
3 Distance 1. Top-k Query Top-k Query processing DNIS 2014 Find k objects that minimize a scoring function. Price Distance O O O O O O O O O : : : Hotels Dataset O 9 O 1 O 7 O 3 O 6 O 5 O 8 O 4 O 2 Price Geometric Interpretation 3
4 Distance 1. Top-k Query Top-k Query processing Find the 2 hotels that minimizes (price+distance). DNIS 2014 Price Distance O O O O O O O O O : : : Hotels Dataset O 9 O 1 Scoring function 1 O 7 O 3 O 6 O 5 O 8 O 4 O 2 Price Top-2 Hotel (price+distance) 4
5 Distance 1. Top-k Query Top-k Query processing Find the 2 hotels that minimizes (price) only. DNIS 2014 Price Distance O O O O O O O O O : : : Hotels Dataset O 9 O 1 O 7 Scoring function 2 O 3 O 6 O 5 O 8 O 4 Top-2 Hotel (price) O 2 Price 5
6 2. Skyline Query Distance DNIS 2014 Skyline Query processing Price Distance O O O O O O O O O : : : Hotels Dataset O 9 O 1 O 7 O 3 O 6 O 2 O 8 O 4 O 5 Price Dominance regions 6
7 2. Skyline Query Distance DNIS 2014 Skyline Query processing Price Distance O O O O O O O O O : : : Hotels Dataset O 9 O 1 O 7 O 3 O 6 Skyline O 4 O 5 O 2 O 8 Price Skyline of Hotels Dataset 7
8 3. Related Work DNIS 2014 Top-k Related work No Random Access (NRA), Fagin et al., PODS 01 Threshold Algorithm (TA), Fagin et al., PODS 01 PREFER, Hristidis et al., SIGMOD 01 etc. Skyline Sort Filter Skyline (SFS), Chomicki et al., ICDE 03 Linear Elimination Sort for Skyline (LESS), Parke et al., VLDB 05 Nearest Neighbor (NN), Kossmann et al., VLDB 02 Branch and Bound Skyline (BBS), Papadias et al., SIGMOD 03 etc. 8
9 4. Top-k and Skyline Query Problems DNIS 2014 Top-k Top-k and Skyline Query Advantage: Always select k-objects according to scoring function. Disadvantage: It require a concrete scoring function. Skyline Advantage: It does not require any scoring function. Disadvantage: No control on retrieved objects. It is difficult to identify interesting objects from large result. 9
10 5. Contributions DNIS 2014 Our Contributions We combine the strength of both skyline and top-k query and propose an algorithm to select various k objects without scoring function. We consider an efficient algorithm to select the k objects with MapReduce framework. 10
11 Problem Definition DNIS 2014 A user looking for k hotels that are cheap and close to beach. (keyword = Aizu hotel and set k=2). Price Distance O O O O O O O O O Aizu Hotels Dataset How to choose 2 hotels without scoring function? 11
12 k-objects Selection DNIS 2014 k-objects Selection Procedure Skyline computation Price Distance O O O O O O O O O Aizu Hotels Dataset Price Distance O O O O Skyline result 12
13 k-objects Selection DNIS 2014 k-objects Selection Procedure Case 1: No. of skyline objects >= k (say, k = 2) Price Distance O O O O Skyline Dataset Price Distance Pri.+Dist. O 1 3 (1) 17 (4) 20 (3) O 3 9 (3) 6 (2) 15 (1) O 5 13 (4) 3 (1) 16 (2) O 7 8 (2) 12 (3) 20 (3) Skyline Dataset Comb. Rank O 1 8 O 3 6 O 5 7 O 7 8 Skyline Dataset Comb. Rank O 3 6 O 5 7 Top-2 Result 13
14 k-objects Selection DNIS 2014 k-objects Selection Procedure Case 2: k > No. of skyline objects (say, k = 5) Price Distance O O O O Skyline result No solution! 14
15 k-objects Selection Distance DNIS 2014 We compute extended skyline objects and select k objects. Price Distance O O O O O O O O O Aizu Hotels Dataset O 9 O 1 O 7 O 3 O 6 Ext-Skyline O 4 O 5 O 2 O 8 Price Ext-Skyline of Hotels Dataset
16 k-objects Selection DNIS 2014 k-objects Selection Procedure Case 2: k > No. of skyline objects (say, k = 5) Price Distance Price Distance Pri.+Dist. O O 1 3 (1) 17 (5) 20 (4) O O 3 9 (4) 6 (2) 15 (1) O O 5 13 (6) 3 (1) 16 (2) O O 6 10 (5) 6 (2) 16 (2) O O 7 8 (3) 12 (4) 20 (4) O O 9 3 (1) 20 (6) 23 (6) Ext. Skyline result Ext. Skyline result Comb. Rank Comb. Rank O 1 10 O 1 10 O 3 7 O 3 7 O 5 9 O 5 9 O 6 9 O 6 9 O 7 11 O 7 11 O 9 13 Skyline Dataset Top-5 result 16
17 Distance k-objects Selection DNIS 2014 k-objects Selection Procedure Case 3: does not follow case 1 & 2 (k > ext. skyline objects) We need to consider other procedure. Set skyline can be a solution O 9 O 1 O 8 O 2 We discussed on skyline sets query in DNIS 2010 O 7 O 3 O 6 O 4 O 5 Set-Skyline Price How to select k objects from set skyline. (Future work)
18 6. k-objects Selection using MapReduce DNIS 2014 k-objects Selection using MapReduce It has three phases: Data splitting phase Map and ranking phase Reduce and selection phase Price Distance O O O O Skyline result 18
19 Skyline and Splitting phase DNIS 2014 Skyline and Splitting phase Phase 1: Split skyline dataset vertically Price Distance O O O O Skyline Dataset Price O 1 3 O 3 9 O 5 13 O 7 8 Distance O 1 17 O 3 6 O 5 3 O 7 12 Price+Distance O 1 20 O 3 15 O 5 16 O
20 Map and Ranking Phase DNIS 2014 Map and Ranking Phase Phase 2(a): Map all data objects Price O 1 3 O 3 9 O 5 13 O 7 8 Price+ Distance O 1 20 O 3 15 O 5 16 O 7 20 Distance O 1 17 O 3 6 O 5 3 O 7 12 Input Map Rank O 1 1 O 3 3 O 5 4 O 7 2 Rank O 1 3 O 3 1 O 5 2 O 7 3 Rank O 1 4 O 3 2 O 5 1 O 7 3
21 Map and Ranking Phase DNIS 2014 Map and Ranking Phase Phase 2(b): Send (, Rank) pairs to the combiner Rank Rank O 1 1 O 1 1 O 3 3 O O O 7 2 O 1 4 Rank O 3 3 O 1 3 O 3 1 O 3 1 O 3 2 O 5 2 O 5 4 O 7 3 O 5 2 Rank O 5 1 O 1 4 O 7 2 O 3 2 O 7 3 O 5 1 O 7 3 O 7 3 Map Output Combiner
22 Reduce and Selection Phase DNIS 2014 Reduce and Selection Phase Phase 3(a): Reducer produce (, list(rank)) pairs Rank O 1 1 O 1 3 O 1 4 O 3 3 O 3 1 O 3 2 O 5 4 O 5 2 O 5 1 O 7 2 O 7 3 O 7 3 Reducer Input Comb. Rank O 1 8 O 3 6 O 5 7 O 7 8 Reducer Output
23 Reduce and Selection Phase DNIS 2014 Phase 3(b): Select k objects Reduce and Selection Phase Comb. Rank O 1 8 O 3 6 O 5 7 O 7 8 Reducer Output Comb. Rank O 3 6 O 5 7 Result for k = 2
24 7. Performance Evaluation DNIS 2014 Synthetic Datasets Independent Anti-Correlated Experimental Datasets Experimental Setup 4 PCs with Intel Core i7 3.4GHz CPU 4GB RAM Windows 8.0 OS. 100 Mbits/S LAN connection 24
25 7. Performance Evaluation Runtime [s] Runtime [s] DNIS 2014 No of records = 100k Dimension = 2-8 Query Number = 500 Varying Dimensionality Proposed Proposed Ext Proposed Proposed Ext Dimensionality Dimensionality a) Anti-Correlated b) Independent 25
26 7. Performance Evaluation Runtime [s] Runtime [s] DNIS 2014 No of records = k Dimension = 6 Query Number = 500 Varying Cardinality Proposed Proposed Ext Proposed Proposed Ext k 200k 300k 400k Cardinality 0 100k 200k 300k 400k Cardinality a) Anti-Correlated b) Independent 26
27 7. Performance Evaluation Runtime [s] Runtime [s] DNIS 2014 No of records = 100k Dimension = 7 Query Number = Varying Query Number Proposed Proposed Ext. 0.5k 1k 1.5k 2k Query Number Proposed Proposed Ext. 0.5k 1k 1.5k 2k Query Number a) Anti-Correlated b) Independent 27
28 8. Conclusions DNIS 2014 Conclusions This work addresses k-objects selection problem and give guideline how to selects k objects. We applied the idea of skyline queries to select the k objects and proposed an efficient algorithm. Experimental results show the effectiveness of the proposed algorithm. 28
29 DNIS 2014 THANK YOU 29
30 Data Management Laboratory, Hiroshima University DNIS 2014 Questions? 30
Caching Dynamic Skyline Queries
Caching Dynamic Skyline Queries D. Sacharidis 1, P. Bouros 1, T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management of Information Systems R.C. Athena Outline Introduction
More informationKEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
More informationSimilarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel Yahoo! Research New York, NY 10018 goel@yahoo-inc.com John Langford Yahoo! Research New York, NY 10018 jl@yahoo-inc.com Alex Strehl Yahoo! Research New York,
More informationSIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs
SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large
More informationFast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
More informationOn Saying Enough Already! in MapReduce
On Saying Enough Already! in MapReduce Christos Doulkeridis and Kjetil Nørvåg Department of Computer and Information Science (IDI) Norwegian University of Science and Technology (NTNU) Trondheim, Norway
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationPerformance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 11, November 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationBenchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside
Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment Sanjay Kulhari, Jian Wen UC Riverside Team Sanjay Kulhari M.S. student, CS U C Riverside Jian Wen Ph.D. student, CS U
More informationRECOMMENDATION SYSTEM USING BLOOM FILTER IN MAPREDUCE
RECOMMENDATION SYSTEM USING BLOOM FILTER IN MAPREDUCE Reena Pagare and Anita Shinde Department of Computer Engineering, Pune University M. I. T. College Of Engineering Pune India ABSTRACT Many clients
More informationBenchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
More informationFeature Selection vs. Extraction
Feature Selection In many applications, we often encounter a very large number of potential features that can be used Which subset of features should be used for the best classification? Need for a small
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationSurvey On: Nearest Neighbour Search With Keywords In Spatial Databases
Survey On: Nearest Neighbour Search With Keywords In Spatial Databases SayaliBorse 1, Prof. P. M. Chawan 2, Prof. VishwanathChikaraddi 3, Prof. Manish Jansari 4 P.G. Student, Dept. of Computer Engineering&
More informationDistributed Image Processing using Hadoop MapReduce framework. Binoy A Fernandez (200950006) Sameer Kumar (200950031)
using Hadoop MapReduce framework Binoy A Fernandez (200950006) Sameer Kumar (200950031) Objective To demonstrate how the hadoop mapreduce framework can be extended to work with image data for distributed
More informationTime series experiments
Time series experiments Time series experiments Why is this a separate lecture: The price of microarrays are decreasing more time series experiments are coming Often a more complex experimental design
More informationPerformance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
More informationA Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationR-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants
R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationBUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com Outline Predictive modeling methodology k-nearest Neighbor
More informationProviding Diversity in K-Nearest Neighbor Query Results
Providing Diversity in K-Nearest Neighbor Query Results Anoop Jain, Parag Sarda, and Jayant R. Haritsa Database Systems Lab, SERC/CSA Indian Institute of Science, Bangalore 560012, INDIA. Abstract. Given
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationExpert Finding Using Social Networking
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 1-1-2009 Expert Finding Using Social Networking Parin Shah San Jose State University Follow this and
More informationFP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data
FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez To cite this version: Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez. FP-Hadoop:
More informationDeveloping MapReduce Programs
Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2015/16 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes
More informationEM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationStatic Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
More informationTHE concept of Big Data refers to systems conveying
EDIC RESEARCH PROPOSAL 1 High Dimensional Nearest Neighbors Techniques for Data Cleaning Anca-Elena Alexandrescu I&C, EPFL Abstract Organisations from all domains have been searching for increasingly more
More informationKEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
More informationThis exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.
Big Data Processing 2013-2014 Q2 April 7, 2014 (Resit) Lecturer: Claudia Hauff Time Limit: 180 Minutes Name: Answer the questions in the spaces provided on this exam. If you run out of room for an answer,
More informationMEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK
MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK 1 K. LALITHA, 2 M. KEERTHANA, 3 G. KALPANA, 4 S.T. SHWETHA, 5 M. GEETHA 1 Assistant Professor, Information Technology, Panimalar Engineering College,
More informationCharacterizing Task Usage Shapes in Google s Compute Clusters
Characterizing Task Usage Shapes in Google s Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key
More informationVoronoi Treemaps in D3
Voronoi Treemaps in D3 Peter Henry University of Washington phenry@gmail.com Paul Vines University of Washington paul.l.vines@gmail.com ABSTRACT Voronoi treemaps are an alternative to traditional rectangular
More informationBiDAl: Big Data Analyzer for Cluster Traces
BiDAl: Big Data Analyzer for Cluster Traces Alkida Balliu, Dennis Olivetti, Ozalp Babaoglu, Moreno Marzolla, Alina Sirbu Department of Computer Science and Engineering University of Bologna, Italy BigSys
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationQuery Selectivity Estimation for Uncertain Data
Query Selectivity Estimation for Uncertain Data Sarvjeet Singh *, Chris Mayfield *, Rahul Shah #, Sunil Prabhakar * and Susanne Hambrusch * * Department of Computer Science, Purdue University # Department
More informationA Survey of Top-k Query Processing Techniques in Relational Database Systems
11 A Survey of Top-k Query Processing Techniques in Relational Database Systems IHAB F. ILYAS, GEORGE BESKALES, and MOHAMED A. SOLIMAN University of Waterloo Efficient processing of top-k queries is a
More informationCSE 326: Data Structures B-Trees and B+ Trees
Announcements (4//08) CSE 26: Data Structures B-Trees and B+ Trees Brian Curless Spring 2008 Midterm on Friday Special office hour: 4:-5: Thursday in Jaech Gallery (6 th floor of CSE building) This is
More informationI.T. System Requirements 2015
I.T. System Requirements 2015 1 Contents: page Contents 3. 4. 5. 6. Examples of incorrectly configured systems Simple server specification Standard server specification Complex server specification *DISCLAIMER*
More informationPLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split
More informationDuplicate Detection Algorithm In Hierarchical Data Using Efficient And Effective Network Pruning Algorithm: Survey
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 12 December 2014, Page No. 9766-9773 Duplicate Detection Algorithm In Hierarchical Data Using Efficient
More informationFeature Selection with Monte-Carlo Tree Search
Feature Selection with Monte-Carlo Tree Search Robert Pinsler 20.01.2015 20.01.2015 Fachbereich Informatik DKE: Seminar zu maschinellem Lernen Robert Pinsler 1 Agenda 1 Feature Selection 2 Feature Selection
More informationIntroduc)on to Hadoop
Introduc)on to Hadoop Slides compiled from: Introduc)on to MapReduce and Hadoop Shivnath Babu Experiences with Hadoop and MapReduce Jian Wen Word Count over a Given Set of Web Pages see bob throw see spot
More informationImage Search by MapReduce
Image Search by MapReduce COEN 241 Cloud Computing Term Project Final Report Team #5 Submitted by: Lu Yu Zhe Xu Chengcheng Huang Submitted to: Prof. Ming Hwa Wang 09/01/2015 Preface Currently, there s
More informationChapter 7. Feature Selection. 7.1 Introduction
Chapter 7 Feature Selection Feature selection is not used in the system classification experiments, which will be discussed in Chapter 8 and 9. However, as an autonomous system, OMEGA includes feature
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationNearest Neighbor Join with Groups and Predicates
Nearest Neighbor Join with Groups and Predicates F. Cafagna, M. Boehlen, A. Bracher Department of Computer Science Agroscope University of Zürich Switzerland DOLAP, Melbourne 23.0.20 Francesco Cafagna
More informationRedundant Data Removal Technique for Efficient Big Data Search Processing
Redundant Data Removal Technique for Efficient Big Data Search Processing Seungwoo Jeon 1, Bonghee Hong 1, Joonho Kwon 2, Yoon-sik Kwak 3 and Seok-il Song 3 1 Dept. of Computer Engineering, Pusan National
More informationA Survey of Top-k Query Processing Techniques in Relational Database Systems
A Survey of Top-k Query Processing Techniques in Relational Database Systems IHAB F. ILYAS, GEORGE BESKALES and MOHAMED A. SOLIMAN David R. Cheriton School of Computer Science University of Waterloo Efficient
More informationEchidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
More informationLLamasoft K2 Enterprise 8.1 System Requirements
Overview... 3 RAM... 3 Cores and CPU Speed... 3 Local System for Operating Supply Chain Guru... 4 Applying Supply Chain Guru Hardware in K2 Enterprise... 5 Example... 6 Determining the Correct Number of
More informationKeyword Search in Graphs: Finding r-cliques
Keyword Search in Graphs: Finding r-cliques Mehdi Kargar and Aijun An Department of Computer Science and Engineering York University, Toronto, Canada {kargar,aan}@cse.yorku.ca ABSTRACT Keyword search over
More informationDistributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
More information! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II
! E6893 Big Data Analytics Lecture 5:! Big Data Analytics Algorithms -- II Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
More informationRANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING
= + RANDOM PROJECTIONS FOR SEARCH AND MACHINE LEARNING Stefan Savev Berlin Buzzwords June 2015 KEYWORD-BASED SEARCH Document Data 300 unique words per document 300 000 words in vocabulary Data sparsity:
More informationHadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010
Hadoop s Entry into the Traditional Analytical DBMS Market Daniel Abadi Yale University August 3 rd, 2010 Data, Data, Everywhere Data explosion Web 2.0 more user data More devices that sense data More
More informationBig Data and Scripting map/reduce in Hadoop
Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb
More informationClustering on Large Numeric Data Sets Using Hierarchical Approach Birch
Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationK-means Clustering Technique on Search Engine Dataset using Data Mining Tool
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means
More informationQuery Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and
More informationFAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION
FAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION Marius Muja, David G. Lowe Computer Science Department, University of British Columbia, Vancouver, B.C., Canada mariusm@cs.ubc.ca,
More informationOptimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
More informationResearch on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
More informationIndex Selection Techniques in Data Warehouse Systems
Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES
More informationHow To Write A Bigbench Benchmark For A Retailer
BigBench Overview Towards a Comprehensive End-to-End Benchmark for Big Data - bankmark UG (haftungsbeschränkt) 02/04/2015 @ SPEC RG Big Data The BigBench Proposal End to end benchmark Application level
More informationContinuous Monitoring of Top-k Queries over Sliding Windows
Continuous Monitoring of Top-k Queries over Sliding Windows Kyriakos Mouratidis 1 Spiridon Bakiras 2 Dimitris Papadias 2 1 School of Information Systems Singapore Management University 8 Stamford Road,
More informationFast Contextual Preference Scoring of Database Tuples
Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation
More informationChanging the Face of Database Cloud Services with Personalized Service Level Agreements
Changing the Face of Database Cloud Services with Personalized Service Level Agreements Jennifer Ortiz, Victor Teixeira de Almeida,, Magdalena Balazinska Computer Science and Engineering Department, University
More informationOverview. Introduction. Recommender Systems & Slope One Recommender. Distributed Slope One on Mahout and Hadoop. Experimental Setup and Analyses
Slope One Recommender on Hadoop YONG ZHENG Center for Web Intelligence DePaul University Nov 15, 2012 Overview Introduction Recommender Systems & Slope One Recommender Distributed Slope One on Mahout and
More informationAn Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining
An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining 1 B.Sahaya Emelda and 2 Mrs. P. Maria Jesi M.E.,Ph.D., 1 PG Student and 2 Associate Professor, Department of Computer
More informationScenario: Optimization of Conference Schedule.
MINI PROJECT 1 Scenario: Optimization of Conference Schedule. A conference has n papers accepted. Our job is to organize them in a best possible schedule. The schedule has p parallel sessions at a given
More informationDistributed Kd-Trees for Retrieval from Very Large Image Collections
ALY et al.: DISTRIBUTED KD-TREES FOR RETRIEVAL FROM LARGE IMAGE COLLECTIONS1 Distributed Kd-Trees for Retrieval from Very Large Image Collections Mohamed Aly 1 malaa@vision.caltech.edu Mario Munich 2 mario@evolution.com
More informationMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
More informationXML Data Integration Based on Content and Structure Similarity Using Keys
XML Data Integration Based on Content and Structure Similarity Using Keys Waraporn Viyanon 1, Sanjay K. Madria 1, and Sourav S. Bhowmick 2 1 Department of Computer Science, Missouri University of Science
More informationEvaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
More informationAn Efficiency Keyword Search Scheme to improve user experience for Encrypted Data in Cloud
, pp.246-252 http://dx.doi.org/10.14257/astl.2014.49.45 An Efficiency Keyword Search Scheme to improve user experience for Encrypted Data in Cloud Jiangang Shu ab Xingming Sun ab Lu Zhou ab Jin Wang ab
More informationClustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015
Clustering Big Data Efficient Data Mining Technologies J Singh and Teresa Brooks June 4, 2015 Hello Bulgaria (http://hello.bg/) A website with thousands of pages... Some pages identical to other pages
More informationComputational Geometry. Lecture 1: Introduction and Convex Hulls
Lecture 1: Introduction and convex hulls 1 Geometry: points, lines,... Plane (two-dimensional), R 2 Space (three-dimensional), R 3 Space (higher-dimensional), R d A point in the plane, 3-dimensional space,
More informationSmart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets
Smart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets Dudu Lazarov, Gil David, Amir Averbuch School of Computer Science, Tel-Aviv University Tel-Aviv 69978, Israel Abstract
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationHow To Cluster Of Complex Systems
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationEfficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
More informationAn Extensible Architecture for Run-time Monitoring of Conversational Web Services
An Extensible Architecture for Run-time Monitoring of Conversational Web Services Konstantinos Bratanis, Dimitris Dranidis, Anthony J.H. Simons South East European Research Centre (SEERC) Research Centre
More informationData Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationFast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
More informationDynamic Querying In NoSQL System
Dynamic Querying In NoSQL System Reshma Suku 1, Kasim K. 2 Student, Dept. of Computer Science and Engineering, M. E. A Engineering College, Perinthalmanna, Kerala, India 1 Assistant Professor, Dept. of
More informationGraphalytics: A Big Data Benchmark for Graph-Processing Platforms
Graphalytics: A Big Data Benchmark for Graph-Processing Platforms Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau Prat-Pérez, Orri Erling, Peter Boncz Delft University of Technology Universitat Politècnica
More informationFast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
More informationCan we Analyze all the Big Data we Collect?
DBKDA/WEB Panel 2015, Rome 28.05.2015 DBKDA Panel 2015, Rome, 27.05.2015 Reutlingen University Can we Analyze all the Big Data we Collect? Moderation: Fritz Laux, Reutlingen University, Germany Panelists:
More information