Similarity Search for Numerous Patterns in Multiple High-Speed Time-Series Streams

1 Similarity Search for Numerous Patterns in Multiple High-Speed Time-Series Streams
Bui Cong Giao, Duong Tuan Anh
Presenter: Bui Cong Giao

2 Contents
1. Introduction
2. Preliminaries
3. The Proposed Method
4. Experimental Evaluation
5. Conclusions

3 Introduction
Pattern discovery by similarity search in a streaming context, where new values are continuously appended as time progresses.
Retrieval of newly arriving subsequences of streaming time series that approximately match static time-series patterns under the Euclidean distance (ED).
An important scenario: the incoming data come from many concurrent time-series streams at high speed, and there are numerous patterns.

4 Main contributions
A novel multi-scale representation of time-series data for similarity search in a streaming context.
Range search over streaming time series for numerous patterns, in which every pattern has its own search radius.

5 Preliminaries
Two ways to search for patterns in time-series sequences under ED:
Whole matching: the sequences to be compared have the same length, e.g., UCR-ED (UCR Euclidean Distance).
Subsequence matching: the sequence is partitioned into many segments, and the search proceeds from the first segment to the last, e.g., SS-NOS (Similar Search using Non-Overlapped Segmentation).

6 UCR-ED
Introduced by Rakthanmanon et al. in 2012.
Conducts similarity search for patterns in static time-series sequences.
Reads the time-series sequence in many large sections. UCR-ED then applies z-normalization incrementally while a window slides over each section to find matching pairs.
UCR-ED is modified to support multi-threading; the multi-threaded version is referred to as TUCR-ED.
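The incremental z-normalization mentioned on this slide can be illustrated with a minimal Python sketch. It is not the UCR-ED code: the class name SlidingZNorm and its methods are hypothetical, and it only shows the general idea of maintaining a running sum and a running sum of squares so that the mean and standard deviation of the current window are available without rescanning it.

```python
import math
from collections import deque


class SlidingZNorm:
    """Hypothetical helper: running statistics over a fixed-size sliding window."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.total = 0.0      # running sum of the values in the window
        self.total_sq = 0.0   # running sum of their squares

    def push(self, x):
        # Add the new value, then evict the oldest one if the window overflows.
        self.buf.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.buf) > self.window:
            old = self.buf.popleft()
            self.total -= old
            self.total_sq -= old * old

    def ready(self):
        return len(self.buf) == self.window

    def normalized(self):
        # Mean and standard deviation come from the running sums in O(1);
        # only the final rescaling touches every value in the window.
        n = len(self.buf)
        mean = self.total / n
        var = max(self.total_sq / n - mean * mean, 0.0)  # guard against rounding
        std = math.sqrt(var)
        if std == 0.0:
            return [0.0] * n
        return [(x - mean) / std for x in self.buf]


znorm = SlidingZNorm(window=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    znorm.push(value)
window_values = znorm.normalized()  # z-normalized form of [2.0, 3.0, 4.0]
```

On very long streams the running sums can accumulate floating-point error, so a practical implementation would probably recompute them from scratch from time to time.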

7 SS-NOS
Similar Search using Non-Overlapped Segmentation.
Introduced by us in 2014.
Similarity search for patterns over streaming time series using non-overlapped segmentation.
Fig. 1 The non-overlapped segmentation of a time-series pattern

8 SS-NOS (cont.)
Phase 1: Retrieve the coefficient vectors of the z-normalized non-overlapped segments of the patterns by DFT, Haar DWT, or PAA. Store the coefficient vectors in an array of R-trees, which serves as a multi-resolution index structure.
Phase 2: Equipped with multi-threading, SS-NOS carries out similarity search in streaming time series using the array of R-trees.
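Two of the reductions named on this slide, PAA and the Haar DWT, can be sketched in a few lines of Python. This is an illustrative sketch under assumptions: the function names paa and haar_dwt are hypothetical, the input is assumed to be an already z-normalized segment, and the Haar version shown is the orthonormal variant, which requires a power-of-two length. How SS-NOS chooses segment lengths and how many coefficients it keeps is not stated in the slides and is not modelled here.

```python
import math


def paa(xs, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length frames."""
    n = len(xs)
    if w <= 0 or n % w != 0:
        raise ValueError("length must be a positive multiple of w")
    size = n // w
    return [sum(xs[i * size:(i + 1) * size]) / size for i in range(w)]


def haar_dwt(xs):
    """Orthonormal Haar wavelet transform of a power-of-two-length sequence."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(xs)
    length = n
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        # Scaled pairwise sums (approximation) and differences (detail).
        approx = [(out[2 * i] + out[2 * i + 1]) / root2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / root2 for i in range(half)]
        out[:length] = approx + detail
        length = half
    return out


segment = [1.0, 2.0, 3.0, 4.0]
paa_coeffs = paa(segment, 2)     # two frame means
haar_coeffs = haar_dwt(segment)  # coarsest coefficient first
```

Because the orthonormal Haar transform preserves the Euclidean norm, the distance between two truncated coefficient vectors never exceeds the distance between the original segments, which is what makes such coefficients usable for filtering.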

9 Restrictions of SS-NOS
If the remainder of a time-series pattern is long, the filtering process is likely to be inefficient for that pattern, since it can fail to discard unpromising patterns.
SS-NOS performs range search with a single search radius for all time-series patterns, which is inflexible and rather impractical.

10 The Proposed Method
Similar Search using Overlapped Segmentation (SS-OS): similarity search for patterns over streaming time series using overlapped segmentation.
Fig. 2 The overlapped segmentation of a time-series pattern

11 The Proposed Method (cont.)
SS-OS is basically similar to SS-NOS in how it conducts similarity search.
Fig. 3 SS-OS conducts similarity search for patterns in a time-series stream.

12 The Proposed Method (cont.)
Algorithm RangeSearch(S)
When a new data point T_n of S arrives:
    // Phase 2
    postcheckset ← ∅   // the set of patterns for post-checking
    pset ← P   // the set of potential patterns
    for i = 1 to maxlevel
        Incrementally normalize s_i
    for i = 1 to maxlevel
        Retrieve v_i
        pset ← SearchInRtree(R-tree[i], pset, v_i)
        if pset = ∅ then
            break   // go to Phase 3
        foreach (p in pset)
            if i is the maximum filter level of p then
                postcheckset ← postcheckset ∪ {p}
                Remove p from pset
    // Phase 3
    foreach (p in postcheckset)
        Normalize c
        Compute the ED between np and the z-normalized c to check whether the distance is within p.r
The core subroutine, SearchInRtree, searches for patterns whose i-th coefficient vector is similar to v_i within their own search radius. The range search takes place in the R-tree of the i-th filter level.
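The control flow of RangeSearch can be sketched in Python as follows. This is a simplified sketch, not the authors' implementation: a linear scan (search_level) stands in for SearchInRtree and the R-trees, the stream's coefficient vectors v_i are assumed to be computed already, and the names Pattern, search_level, range_search, and candidate_for are hypothetical. Levels are 0-based here. What the sketch keeps from the slide is the level-by-level filtering with each pattern's own radius, the early exit when no candidates remain, and the final ED post-check.

```python
import math
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    normalized: list  # z-normalized pattern values (np on the slide)
    coeffs: list      # coeffs[i] is the coefficient vector at filter level i
    radius: float     # the pattern's own search radius (p.r on the slide)


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_level(level, candidates, v):
    # Linear-scan stand-in for SearchInRtree: keep the candidates whose
    # level-th coefficient vector lies within their own radius of v.
    return [p for p in candidates if euclid(p.coeffs[level], v) <= p.radius]


def range_search(patterns, stream_vectors, candidate_for):
    """stream_vectors[i] plays the role of v_i; candidate_for(p) returns the
    z-normalized stream subsequence to be compared with pattern p."""
    postcheck = []
    pset = list(patterns)
    for level, v in enumerate(stream_vectors):        # filtering phase
        pset = search_level(level, pset, v)
        if not pset:
            break                                     # nothing left to filter
        at_max = [p for p in pset if level == len(p.coeffs) - 1]
        postcheck.extend(at_max)                      # ready for post-checking
        pset = [p for p in pset if level != len(p.coeffs) - 1]
    matches = []
    for p in postcheck:                               # post-checking phase
        if euclid(p.normalized, candidate_for(p)) <= p.radius:
            matches.append(p.name)
    return matches


patterns = [
    Pattern("A", [1.0, -1.0], [[0.05]], 0.1),
    Pattern("B", [1.0, -1.0], [[0.0], [0.5, 0.0]], 0.1),
    Pattern("C", [1.0, -1.0], [[0.0], [0.0, 0.05]], 0.1),
]
subsequences = {"A": [1.0, -1.0], "B": [1.0, -1.0], "C": [-1.0, 1.0]}
result = range_search(patterns, [[0.0], [0.0, 0.0]], lambda p: subsequences[p.name])
```

In this toy run, pattern B is discarded by the second filter level and pattern C survives filtering but fails the post-check, so only A is reported. Whether filtering is free of false dismissals depends on the coefficient distance lower-bounding the ED, which the sketch does not establish.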

13 Experimental Evaluation
Platform: Intel Dual Core i3 M GHz, 4GB RAM PC; C#
Parameters: The circular buffers of the time-series streams have a size of 1,024. The minimum node occupancy of the R-trees is 4 and the maximum node occupancy is

14 Three query sets were created from the time-series dataset. The number of queries in each query set is The length of the query sequences varies from 8 to

15 Experimental Evaluation
Implement range search by UCR-ED, TUCR-ED, SS-NOS, and SS-OS on the three pattern sets with the same search radius (0.01).
Use Haar DWT in SS-NOS and SS-OS.
Compare the search methods in terms of their precision, the number of distance function calls in post-processing, and wall-clock time.

16 Experimental Results
SS-OS has the same precision as UCR-ED and SS-NOS.
The numbers of distance function calls of UCR-ED and TUCR-ED are very large, while SS-OS and SS-NOS use multi-scale filtering, so their numbers are very small.
The pruning power of SS-OS is over 99.92%, whereas that of SS-NOS is only over 99.89%.

17 Experimental Results
Fig. 4 The number of distance function calls in the post-processing phase

18 Experimental Results
On average, the wall-clock times of SS-OS and SS-NOS are short, varying from 16 seconds to 19 seconds.
The wall-clock times of UCR-ED for the three pattern sets are roughly 10 minutes, 13 minutes, and 11 minutes, respectively.
The wall-clock times of TUCR-ED for the three pattern sets are roughly 2 minutes.

19 Experimental Results
SearchInRtree in Algorithm RangeSearch performs range search in R-trees precisely.
The average CPU time that RangeSearch needs to process a newly arriving data point is small in all cases, varying from 2,000 ticks (*) to 2,600 ticks (0.20 ms to 0.26 ms).
PAA has the best run-time performance.
The method is able to perform similarity search for numerous patterns over multiple high-speed time-series streams.
(*) 1 millisecond = 10,000 ticks

20 Conclusions
Propose an efficient multi-scale representation of time-series data, the overlapped segmentation, for similarity search.
Perform range search for time-series patterns in which each pattern has its own search radius.
Work precisely and respond quickly while dealing with multiple streaming time series at high speed.

21 References
[1] B. C. Giao and D. T. Anh, "Efficient similarity search for static queries in streaming time series," in Proceedings of International Conference on Green and Human Information Technology (ICGHIT) 2014, HoChiMinh City, 2014, pp
[2] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria and E. Keogh, "Searching and mining trillions of time series subsequences under Dynamic Time Warping," in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, August 12-16, 2012, pp
[3] R. Agrawal, C. Faloutsos, and A. Swami, "Efficient similarity search in sequence databases," in Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO '93), Chicago, Illinois, USA, October 13-15, 1993, pp
[4] K.-p. Chan and A. W.-c. Fu, "Efficient time series matching by wavelets," in Proceedings of the 15th IEEE International Conference on Data Engineering, March 23-26, 1999, pp
[5] A. Guttman, "R-tree: A dynamic index structure for spatial searching," in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 1984, pp
[6] E. Keogh, K. Chakrabarti, S. Mehrotra, and M. Pazzani, "Locally adaptive dimensionality reduction for indexing large time series databases," in Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, May 2001, pp
[7] E. Keogh. The UCR time series classification/clustering page. [Online].

22 Thanks for listening Questions & Answers

Similarity Search in Multiple High-Speed Time-Series Streams under DTW

Similarity Search in Multiple High-Speed Time-Series Streams under DTW Similarity Search in Multiple High-Speed Time-Series Streams under DTW Bui Cong Giao, Duong Tuan Anh Presenter: Bui Cong Giao Contents 1. Introduction 2. Preliminaries 3. The Proposed Method 4. Experimental

More information

Efficient k-nn Search for Static Queries over High Speed Time-Series Streams

Efficient k-nn Search for Static Queries over High Speed Time-Series Streams 2014 Efficient k-nn Search for Static Queries over High Speed Time-Series Streams Bui Cong Giao, Duong Tuan Anh Presenter: Bui Cong Giao Contents 1. Introduction 2. Supporting Techniques 3. Proposed Method

More information

Improving Sort-Tile-Recusive Algorithm for R-tree Packing in Indexing Time Series

Improving Sort-Tile-Recusive Algorithm for R-tree Packing in Indexing Time Series Improving Sort-Tile-Recusive Algorithm for R-tree Packing in Indexing Time Series Bui Cong Giao, Duong Tuan Anh Presenter: Bui Cong Giao Contents 1. Introduction 2. Preliminaries 3. Strategies for improving

More information

Handwriting and Gestures in the Air, Recognizing on the Fly

Handwriting and Gestures in the Air, Recognizing on the Fly Handwriting and Gestures in the Air, Recognizing on the Fly Sharad Vikram Computer Science Division University of California, Berkeley sharad.vikram@berkeley.edu Lei Li Computer Science Division University

More information

Time series databases. Indexing Time Series. Time series data. Time series are ubiquitous

Time series databases. Indexing Time Series. Time series data. Time series are ubiquitous Time series databases Indexing Time Series A time series is a sequence of real numbers, representing the measurements of a real variable at equal time intervals Stock prices Volume of sales over time Daily

More information

Energy Characterization and Optimization of Embedded Data Mining Algorithms: A Case Study

Energy Characterization and Optimization of Embedded Data Mining Algorithms: A Case Study Energy Characterization and Optimization of Embedded Data Mining Algorithms: A Case Study Hanqing Zhou, Lu Pu, Yu Hu, Xiaowei Xu School of Optical Electronic Information Huazhong University of Science

More information

Similarity Search on Time Series Data. Presented by Zhe Wang

Similarity Search on Time Series Data. Presented by Zhe Wang Similarity Search on Time Series Data Presented by Zhe Wang Motivations Fast searching for time-series of real numbers. ( data mining ) Scientific database: weather, geological, astrophysics, etc. find

More information

Impact of the Sakoe-Chiba Band on the DTW Time-Series Distance Measure for knn Classification

Impact of the Sakoe-Chiba Band on the DTW Time-Series Distance Measure for knn Classification Impact of the Sakoe-Chiba Band on the DTW Time-Series Distance Measure for knn Classification Zoltan Geler 1, Vladimir Kurbalija 2, Miloš Radovanović 2, Mirjana Ivanović 2 1 Faculty of Philosophy, University

More information

The Influence of Global Constraints on DTW and LCS Similarity Measures for Time-Series Databases

The Influence of Global Constraints on DTW and LCS Similarity Measures for Time-Series Databases The Influence of Global Constraints on DTW and LCS Similarity Measures for Time-Series Databases Vladimir Kurbalija 1, Miloš Radovanović 1, Zoltan Geler 2, and Mirjana Ivanović 1 1 Department of Mathematics

More information

Iterative Incremental Clustering of Time Series

Iterative Incremental Clustering of Time Series Iterative Incremental Clustering of Time Series Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos Computer Science & Engineering Department University of California, Riverside Riverside,

More information

Subsequence Matching on Structured Time Series Data

Subsequence Matching on Structured Time Series Data Subsequence Matching on Structured Time Series Data Huanmei Wu Northeastern University maggiewu@ccs.neu.edu Steve B Jiang Harvard Medical School Jiang.steve@mgh.harvard.edu Betty Salzberg Northeastern

More information

A likelihood ratio distance measure for the similarity between the fourier transform of time series

A likelihood ratio distance measure for the similarity between the fourier transform of time series A likelihood ratio distance measure for the similarity between the fourier transform of time series A. J. Bagnall, G. J. Janacek and M. Powell School of Computing Sciences University of East Anglia Norwich,

More information

Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm

Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm DNIS 2014 Efficient Selection of Various k-objects for a keyword Query based on MapReduce Skyline Algorithm Md. Anisuzzaman Siddique and Yasuhiko Morimoto Hiroshima University 1 Overview DNIS 2014 1. Top-k

More information

Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. T. Rakthanmanon, et. Al KDD 12

Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. T. Rakthanmanon, et. Al KDD 12 Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping T. Rakthanmanon, et. Al KDD 12 Introduction Time Series Data Mining T = [(p 1, t 1 ), (p 2, t 2 ), (p 3, t 3 ) (p i,

More information

Mining Frequent Itemset Using Parallel Computing Apriori Algorithm

Mining Frequent Itemset Using Parallel Computing Apriori Algorithm Mining Frequent Itemset Using Parallel Computing Apriori Algorithm Prof. Kamani Gautam J. 1, Dr. Y. R. Ghodasara 2, Dr. Vaishali S Parsania 3 Assistant Professor, College of Agricultural Information Technology,

More information

Mining Sequential Patterns Using I-PrefixSpan

Mining Sequential Patterns Using I-PrefixSpan Mining Sequential Patterns Using I-PrefixSpan Dhany Saputra, Dayang R. A. Rambli, Oi Mean Foong Abstract In this paper, we propose an improvement of pattern growth-based PrefixSpan algorithm, called I-PrefixSpan.

More information

Fast Window Correlations Over Uncooperative Time Series

Fast Window Correlations Over Uncooperative Time Series Fast Window Correlations Over Uncooperative Time Series Richard Cole Dennis Shasha Xiaojian Zhao Department of Computer Science Courant Institute of Mathematical Sciences New York University {cole,shasha,xiaojian}@cs.nyu.edu

More information

DRSP : DIMENSION REDUCTION FOR SIMILARITY MATCHING AND PRUNING OF TIME SERIES DATA STREAMS

DRSP : DIMENSION REDUCTION FOR SIMILARITY MATCHING AND PRUNING OF TIME SERIES DATA STREAMS DRSP : DIMENSION REDUCTION FOR SIMILARITY MATCHING AND PRUNING OF TIME SERIES DATA STREAMS Vishwanath R H 1, Samartha T V 1, Srikantaiah K C 1, Venugopal K R 1, L M Patnaik 2 1 Department of Computer Science

More information

Incremental Mining for Regular Frequent Patterns in Vertical Format

Incremental Mining for Regular Frequent Patterns in Vertical Format Incremental Mining for Regular Frequent Patterns in Vertical Format Vijay Kumar G. #1, Valli Kumari V.* 2 # School of Computing, K L University Guntur 522502, India 1 gvijay_73@yahoo.co.in * Department

More information

The String Similarity Query Processing in Cloud Computing System

The String Similarity Query Processing in Cloud Computing System , pp.25-36 http://dx.doi.org/10.14257/ijgdc.2015.8.2.04 The String Similarity Query Processing in Cloud Computing System LiaoYuanLai Heyuan Polytechnic HeYuan 517000, China zsblyl@163.com Abstract The

More information

Chapter 5: Stream Processing. Big Data Management and Analytics 193

Chapter 5: Stream Processing. Big Data Management and Analytics 193 Chapter 5: Big Data Management and Analytics 193 Today s Lesson Data Streams & Data Stream Management System Data Stream Models Insert-Only Insert-Delete Additive Streaming Methods Sliding Windows & Ageing

More information

Real-Time Adaptive Algorithm for Resource Monitoring

Real-Time Adaptive Algorithm for Resource Monitoring Real-Time Adaptive Algorithm for Resource Monitoring Mauro Andreolini, Michele Colajanni, Marcello Pietri, Stefania Tosi University of Modena and Reggio Emilia {mauro.andreolini,michele.colajanni,marcello.pietri,stefania.tosi}@unimore.it

More information

Introduction to Data Mining. Chris Clifton Mining of Time Series Data

Introduction to Data Mining. Chris Clifton Mining of Time Series Data Introduction to Data Mining Chris Clifton Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing with time Data is recorded

More information

Cosine Similarity Measure and Genetic Algorithm for extracting main content from web documents

Cosine Similarity Measure and Genetic Algorithm for extracting main content from web documents Cosine Similarity Measure and Genetic Algorithm for extracting main content from web documents 1 Digvijay B. Gautam, 2 Pradnya V. Kulkarni 1,2 Department of Computer Engineering, Maharashtra Institute

More information

Stock Price Forecasting by Hybrid Machine Learning Techniques

Stock Price Forecasting by Hybrid Machine Learning Techniques Stock Price Forecasting by Hybrid Machine Learning Techniques Tsai, C.-F. and Wang, S.-P. Abstract Stock investment has become an important investment activity in Taiwan. However, investors usually get

More information

Fast Algorithm for Modularity-based Graph Clustering

Fast Algorithm for Modularity-based Graph Clustering Fast Algorithm for Modularity-based Graph Clustering Hiroaki Shiokawa NTT Software Innovation Center, NTT Corporation, July 23 rd, 2013 BACKGROUND & MOTIVATION 2 Large Graphs Large-scale graphs become

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

MODULE 15 Clustering Large Datasets LESSON 34

MODULE 15 Clustering Large Datasets LESSON 34 MODULE 15 Clustering Large Datasets LESSON 34 Incremental Clustering Keywords: Single Database Scan, Leader, BIRCH, Tree 1 Clustering Large Datasets Pattern matrix It is convenient to view the input data

More information

Effective Clustering of Time-Series Data Using FCM

Effective Clustering of Time-Series Data Using FCM Effective Clustering of Time-Series Data Using FCM Saeed Aghabozorgi and Teh Ying Wah Abstract Today, wide important advances in clustering time series have been obtained in the field of data mining. A

More information

Where can SAAM apply? Software Architecture Analysis Method (SAAM)

Where can SAAM apply? Software Architecture Analysis Method (SAAM) Where can SAAM apply? Software Architecture Analysis Method (SAAM) Lecture 7A U0882 Peter Lo 200 Software Architecture Analysis Method (SAAM) can be applied to two different analysis and evaluation tasks:

More information

Must-read Material : Multimedia Databases and Data Mining. Outline. Association rules - outline. C. Faloutsos

Must-read Material : Multimedia Databases and Data Mining. Outline. Association rules - outline. C. Faloutsos Must-read Material 15-826: Multimedia Databases and Data Mining Rakesh Agrawal, Tomasz Imielinski and Arun Swami Mining Association Rules Between Sets of Items in Large Databases Proc. ACM SIGMOD, Washington,

More information

Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data

Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data Jessica Lin David Etter David DeBarr jessica@ise.gmu.edu detter@gmu.edu Dave.DeBarr@microsoft.com George Mason University Microsoft

More information

Distributed Algorithm for Text Documents Clustering Based on k-means Approach

Distributed Algorithm for Text Documents Clustering Based on k-means Approach Distributed Algorithm for Text Documents Clustering Based on k-means Approach Martin Sarnovsky, Noema Carnoka Department of cybernetics and artificial intelligence, Faculty of electrotechnics and informatics,

More information

Database support for concurrent digital mock up

Database support for concurrent digital mock up Proceedings of the Tenth International IFIP TC5 WG-5.2; WG-5.3 Conference PROLAMAT 1998 Database support for concurrent digital mock up S. Berchtold, H. P. Kriegel, M. Pötke Institute for Computer Science,

More information

On Multiple Query Optimization in Data Mining

On Multiple Query Optimization in Data Mining On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl

More information

Preference Mining and Data Stream Mining. Sandra de Amo IT4BI Data Mining Advanced Topics

Preference Mining and Data Stream Mining. Sandra de Amo IT4BI Data Mining Advanced Topics Preference Mining and Data Stream Mining Sandra de Amo IT4BI Data Mining Advanced Topics Mining Contextual Object Preferences Mining Data Streams 5/14/13 MASTER IT4BI - UNIV-TOURS 2013 2 Our Agenda Seminar

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

Finding All Frequent Patterns Starting from the Closure

Finding All Frequent Patterns Starting from the Closure Finding All Frequent Patterns Starting from the Closure Mohammad El-Hajj and Osmar R. Zaïane Department of Computing Science, University of Alberta, Edmonton AB, Canada {mohammad, zaiane}@cs.ualberta.ca

More information

Finding Spatio-temporal Patterns in Multidimensional Data Streams

Finding Spatio-temporal Patterns in Multidimensional Data Streams Finding Spatio-temporal Patterns in Multidimensional Data Streams Santiago A. Nunes 1, Luciana A. S. Romani 2, Ana M. H. Avila 3, Priscila P. Coltri 3, Agma J. M. Traina 1, Elaine P. M. Sousa 1 1 University

More information

Efficient Time Series Matching by Wavelets

Efficient Time Series Matching by Wavelets Efficient Time Series Matching by Wavelets Kin-pong Chan and Ada Wai-chee Fu Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, Hong Kong kpchan, adafu csecuhkeduhk

More information

THE concept of Big Data refers to systems conveying

THE concept of Big Data refers to systems conveying EDIC RESEARCH PROPOSAL 1 High Dimensional Nearest Neighbors Techniques for Data Cleaning Anca-Elena Alexandrescu I&C, EPFL Abstract Organisations from all domains have been searching for increasingly more

More information

A Novel Method For Fast Collision Detection on the PR2. William Marshall, SUNFEST (CS), Lehigh University Advisor: Dr. Camillo J. Taylor.

A Novel Method For Fast Collision Detection on the PR2. William Marshall, SUNFEST (CS), Lehigh University Advisor: Dr. Camillo J. Taylor. A Novel Method For Fast Collision Detection on the PR2 William Marshall, SUNFEST (CS), Lehigh University Advisor: Dr. Camillo J. Taylor Abstract In the current robot motion planning pipeline for Willow

More information

MAMView: A Framework for Visualization of Metric Trees

MAMView: A Framework for Visualization of Metric Trees MAMView: A Framework for Visualization of Metric Trees Marcos R. Vieira 1, Fabio J. T. Chino 2, Caetano Traina Jr. 2, Agma J. M. Traina 2 1 University of California, Riverside, CA USA 2 University of São

More information

Review: DBMS Components

Review: DBMS Components Review: DBMS Components Database Management System Components CMPT 454: Database Systems II Advanced Queries (1) 1 / 17 Research Topics in Databases System Oriented How to implement a DBMS? How to manage

More information

Performance of KDB-Trees with Query-Based Splitting*

Performance of KDB-Trees with Query-Based Splitting* Performance of KDB-Trees with Query-Based Splitting* Yves Lépouchard Ratko Orlandic John L. Pfaltz Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science University of Virginia Illinois

More information

Temporal Data Mining for Small and Big Data. Theophano Mitsa, Ph.D. Independent Data Mining/Analytics Consultant

Temporal Data Mining for Small and Big Data. Theophano Mitsa, Ph.D. Independent Data Mining/Analytics Consultant Temporal Data Mining for Small and Big Data Theophano Mitsa, Ph.D. Independent Data Mining/Analytics Consultant What is Temporal Data Mining? Knowledge discovery in data that contain temporal information.

More information

Relaxed Queries over Data Streams. Relaxed Queries over Data Streams

Relaxed Queries over Data Streams. Relaxed Queries over Data Streams Relaxed Queries over Data Streams e Relaxed Queries over Data Streams Barbara Catania, Giovanna Guerrini, Maria Teresa Pinto, and Paola Podestà Barbara Catania, Giovanna Guerrini, Maria Teresa Pinto, and

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Lecture 4: Principles of Parallel Algorithm Design (part 4)

Lecture 4: Principles of Parallel Algorithm Design (part 4) Lecture 4: Principles of Parallel Algorithm Design (part 4) 1 Mapping Technique for Load Balancing Sources of overheads: Inter-process interaction Idling Goals to achieve: To reduce interaction time To

More information

Stream Sequential Pattern Mining with Precise Error Bounds

Stream Sequential Pattern Mining with Precise Error Bounds Stream Sequential Pattern Mining with Precise Error Bounds Luiz F. Mendes,2 Bolin Ding Jiawei Han University of Illinois at Urbana-Champaign 2 Google Inc. lmendes@google.com {bding3, hanj}@uiuc.edu Abstract

More information

Index Terms Data mining, frequent itemset, closed itemset, maximal itemset

Index Terms Data mining, frequent itemset, closed itemset, maximal itemset Volume, Issue 9, September ISSN: 77 X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Efficient Data Mining

More information

Efficient Mining of Temporal High Utility Itemsets from Data streams

Efficient Mining of Temporal High Utility Itemsets from Data streams Efficient Mining of Temporal High Utility Itemsets from Data streams Vincent S. Tseng Dept. Computer Science and Information Engineering National Cheng Kung University, Taiwan, ROC tsengsm@mail.ncku.edu.tw

More information

CHAPTER FIVE RESULT ANALYSIS

CHAPTER FIVE RESULT ANALYSIS CHAPTER FIVE RESULT ANALYSIS 5.1 Chapter Introduction 5.2 Discussion of Results 5.3 Performance Comparisons 5.4 Chapter Summary 61 5.1 Chapter Introduction This chapter outlines the results obtained from

More information

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies T-61.6020 Popular Algorithms in Data Mining and Machine Learning BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies Sami Virpioja Adaptive Informatics Research Centre Helsinki University

More information

BPP: Large Graph Storage for Efficient Disk-Based Processing

BPP: Large Graph Storage for Efficient Disk-Based Processing , pp.117-121 http://dx.doi.org/10.14257/astl.2013 BPP: Large Graph Storage for Efficient Disk-Based Processing Kamran Najeebullah, Kifayat Ullah Khan, Muhammad Waqas Nawaz, Young-Koo Lee Department of

More information

Warping the Time on Data Streams

Warping the Time on Data Streams Warping the Time on Data Streams Paolo Capitani, Paolo Ciaccia DEIS - IEIIT-BO/CNR, University of Bologna, Italy {pcapitani,pciaccia}@deis.unibo.it Abstract. Continuously monitoring through time the correlation/distance

More information

Improving frequent subgraph mining in the presence of symmetry

Improving frequent subgraph mining in the presence of symmetry Improving frequent subgraph mining in the presence of symmetry Christian Desrosiers Philippe Galinier Pierre Hansen Alain Hertz Introduction The difficulty of the frequent subgraph mining problem arises

More information

Fast Fibonacci Encoding Algorithm. Fast Fibonacci Encoding Algorithm

Fast Fibonacci Encoding Algorithm. Fast Fibonacci Encoding Algorithm Fast Fibonacci Encoding Algorithm Fast Fibonacci Encoding Algorithm Jiří Walder, Michal Krátký, and Jan Platoš Jiří Walder, Michal Krátký, and Jan Platoš Department of Computer Science VŠB Technical Department

More information

Spatial, textual and multimedia databases

Spatial, textual and multimedia databases Spatial, textual and multimedia databases Erik Zeitler erik.zeitler@it.uu.se Abstract This paper presents an overview of indexes for spatial and multimedia databases, whose indexes are often of the same

More information

MINING TIME SERIES DATA

MINING TIME SERIES DATA Chapter 1 MINING TIME SERIES DATA Chotirat Ann Ratanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn Keogh University of California, Riverside Michail Vlachos IBM T.J. Watson Research Center Gautam

More information

198:671 Processing Massive Data Sets. S. Muthukrishnan

198:671 Processing Massive Data Sets. S. Muthukrishnan 198:671 Processing Massive Data Sets S. Muthukrishnan Details Meeting: Core B, Thursday 6 8 PM. Muthu: x7212, Core 319, Office: Monday 3 4. Graham: x4580, Core 413, Office: We meet [1] 01/30 [4] 02/06

More information

DWMiner : A tool for mining frequent item sets efficiently in data warehouses

DWMiner : A tool for mining frequent item sets efficiently in data warehouses DWMiner : A tool for mining frequent item sets efficiently in data warehouses Bruno Kinder Almentero, Alexandre Gonçalves Evsukoff and Marta Mattoso COPPE/Federal University of Rio de Janeiro, P.O.Box

More information

ASSOCIATION RULE MINING BASED

ASSOCIATION RULE MINING BASED ASSOCIATION RULE MINING BASED ON TRADE LIST Ms. Sanober Shaikh 1 Ms. Madhuri Rao 2 1 Department of Information Technology, TSEC, Bandra (w), Mumbai s.sanober1@gmail.com 2 Department of Information Technology,

More information

SWAT: Hierarchical Stream Summarization in Large Networks

SWAT: Hierarchical Stream Summarization in Large Networks SWAT: Hierarchical Stream Summarization in Large Networks Ahmet Bulut Ambuj K. Singh Department of Computer Science, University of California, Santa Barbara, CA, 9316 bulut,ambuj @cs.ucsb.edu October 3,

More information

3. Name some network architectures prevalent in machines supporting the message passing paradigm. Ans: Ethernet, Infiniband, Tree

3. Name some network architectures prevalent in machines supporting the message passing paradigm. Ans: Ethernet, Infiniband, Tree Frequently asked questions Parallel Computing by Prof. Subodh Kumar, Department of Computer Science and Engineering, IIT Delhi, Frequently asked questions: 1. What is shared-memory architecture? Ans: A

More information

Visualization Techniques in Data Mining

Visualization Techniques in Data Mining Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano

More information

A Comparison of Hardware and Software in Sequence Rule Evolution

A Comparison of Hardware and Software in Sequence Rule Evolution Magnus Lie HETLAND 1 and Pål SÆTROM 2 1 Norwegian University of Science and Technology, Dept. of Computer and Information Science, Sem Sælands

Sequence Clustering II

Sequence Clustering II COMP 790-90 Research Seminar Spring 2010 CLUSEQ Sequence Cluster: a set of sequences S is a sequence cluster if, for each sequence in S, the similarity SIM S ( ) between and

Identifying Performance Bottlenecks in Hive: Use of Processor Counters

Identifying Performance Bottlenecks in Hive: Use of Processor Counters Alexander C Shulyak, Lizy K John Presented By: Shuang Song Problem Businesses and online services increasingly rely on insights derived

Temporal and Spatial Data

Temporal and Spatial Data Transaction systems Relational DB OO DB OR DB Decision Support OLAP Data cube Special indexing structures Data Mining Temporal and spatial databases 23.1 1 Overview Temporal Data

Optimizing In-Order Execution of Continuous Queries over Streamed Sensor Data

Optimizing In-Order Execution of Continuous Queries over Streamed Sensor Data Moustafa A. Hammad University of Calgary Calgary, Alberta, Canada T2N 1N4 hammad@cpsc.ucalgary.ca Walid G. Aref Ahmed K. Elmagarmid

SPE MS Data Mining with Shapelets for Predicting Valve Failures in Gas Compressors Abstract 1. Introduction

SPE-180452-MS Data Mining with Shapelets for Predicting Valve Failures in Gas Compressors Om P. Patri, Arash S. Tehrani, Viktor K. Prasanna, Rajgopal Kannan, University of Southern California; Anand Panangadan,

Storage Structures for Efficient Query Processing in a Stock Recommendation System

Storage Structures for Efficient Query Processing in a Stock Recommendation System You-Min Ha Department of Computer Science Yonsei University, Korea ymha@cs.yonsei.ac.kr Sanghyun Park Department of Computer

Computing Partial Data Cubes for Parallel Data Warehousing Applications

Computing Partial Data Cubes for Parallel Data Warehousing Applications Frank Dehne 1, Todd Eavis 2, and Andrew Rau-Chaplin 3 1 School of Computer Science Carleton University, Ottawa, Canada K1S 5B6 frank@dehne.net,

Query Processing on Cubes Mapped from Ontologies to Dimension Hierarchies

Query Processing on Cubes Mapped from Ontologies to Dimension Hierarchies Carlos Garcia-Alvarado Greenplum EMC USA Carlos Ordonez University of Houston USA Scenario Dimension Measurements Explore Digital

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

Email Spam Detection Using Customized SimHash Function

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

A Novel Optimum Depth Decision Tree Method for Accurate Classification

I J C T A, 9(7), 2016, pp. 3359-3365 International Science Press ISSN: 0974-5572 A Novel Optimum Depth Decision Tree Method for Accurate Classification Pullela S. V. V. S. R. Kumar, N. V. Nagamani Satyavani,

Primary Data Deduplication Large Scale Study and System Design

Primary Data Deduplication Large Scale Study and System Design A. El-Shimi, R. Kalach, A. Kumar, J. Li, A. Oltean, S. Sengupta Microsoft Corporation, Redmond (USA) Primary Data Deduplication for File-based

Generalizing the Optimality of Multi-Step k-nearest Neighbor Query Processing

In Proc. 1th International Symposium on Spatial and Temporal Databases (SSTD'7), Boston, U.S.A., 27. Generalizing the Optimality of Multi-Step k-nearest Neighbor Query Processing Hans-Peter Kriegel, Peer

CS 377: Operating Systems. Outline. A review of what you've learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what

Multiscale Analysis Of Data: Clusters, Outliers and Noise - Preliminary Results

Multiscale Analysis Of Data: Clusters, Outliers and Noise - Preliminary Results Chetan Gupta Dept Of Mathematics Statistics and Computer Science, University of Illinois, Chicago Robert Grossman National

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

Session 2 Fundamentals of Computing II

15.561 Information Technology Essentials Session 2 Fundamentals of Computing II Copyright 2003 Thomas Malone, Chris Dellarocas Acknowledgments:. Adapted from slides by Chris Dellarocas, U. Md.. Outline:

Incremental Cosine Computations for Search and Exploration of Tag Spaces

Incremental Cosine Computations for Search and Exploration of Tag Spaces Raymond Vermaas, Damir Vandic, and Flavius Frasincar Erasmus University Rotterdam PO Box 1738, NL-3000 DR, Rotterdam, the Netherlands

Content-Based Image Retrieval ---Challenges & Opportunities

Content-Based Image Retrieval ---Challenges & Opportunities Jianping Fan Department of Computer Science University of North Carolina at Charlotte Charlotte, NC 28223 Networks How can I access image/video

A Time-weighted Average-based PAA Representation for Time Series Symbolization

Int. J. Advance Soft Compu. Appl, Vol. 7, No. 3, November 2015 ISSN 2074-8523 A Time-weighted Average-based PAA Representation for Time Series Symbolization Yahyia Benyahmed 1, Azuraliza Abu Bakar 1, Abdul

A Comparative Study on Content-Based Music Genre Classification

A Comparative Study on Content-Based Music Genre Classification Tao Li, Mitsunori Ogihara, and Qi Li, Proceedings of the 26th Annual International ACM Conference on Research and Development in Information

FP-Tree Based Algorithms Analysis: FP-Growth, COFI-Tree and CT-PRO

FP-Tree Based Algorithms Analysis: FP-Growth, COFI-Tree and CT-PRO Bharat Gupta Student, Department of Computer Science Thapar University Patiala, India bharatgupta35@gmail.com Dr. Deepak Garg IEEE Senior

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism Jianqiang Dong, Fei Wang and Bo Yuan Intelligent Computing Lab, Division of Informatics Graduate School at Shenzhen,

An Augmented Visual Query Mechanism for Finding Patterns in Time Series Data

An Augmented Visual Query Mechanism for Finding Patterns in Time Series Data 1 Eamonn Keogh, 2 Harry Hochheiser, 2 Ben Shneiderman* 1 Computer Science & Engineering Department University of California

Accelerating K-Means using GPU and Multi-core

Accelerating K-Means using GPU and Multi-core -Topics in Parallel Processing, Monsoon 2011, IIIT-H K-means Clustering k-means clustering (also referred to as Lloyd's Algorithm) is a method of cluster analysis

ivms-5200 ANPR V1.0.0 Software Requirements & Hardware Performance

ivms-5200 ANPR V1.0.0 Software Requirements & Hardware 1 Contents 1. Software Requirements... 2 2. Client... 4 3. Server... 6 1 1. Software Requirements Microsoft Windows 7 (64-bit) Microsoft Windows 8

An Ontology-enhanced Cloud Service Discovery System

An Ontology-enhanced Cloud Service Discovery System Taekgyeong Han and Kwang Mong Sim* Abstract This paper presents a Cloud service discovery system (CSDS) that aims to support the Cloud users in finding

Learning-Based Super-Resolution System Using Single Facial Image and Multi-Resolution Wavelet Synthesis

Learning-Based Super-Resolution System Using Single Facial Image and Multi-Resolution Wavelet Synthesis Shu-Fan Lui, Jin-Yi Wu, Hsi-Shu Mao, and Jenn-Jier James Lien Robotics Laboratory, Dept. of Computer

ALAR 2005 Conference

ALAR 2005 Conference on Applied Research in Information Technology Parallelization of Data Mining Algorithms on Computing Grids Jarret M. Warren 1 and Wing Ning Li 2 1 IT Department, Data Tronics, Fort Smith, AR 72903 2 Department

FEATURE ANALYSIS OF EEG SIGNALS USING SOM

FEATURE ANALYSIS OF EEG SIGNALS USING SOM L. Gráfová, O. Vyšata, and A. Procházka Institute of Chemical Technology Department of Computing and Control Engineering Abstract The electroencephalogram (EEG)

Efficient Parallel Set-Similarity Joins Using MapReduce

Efficient Parallel Set-Similarity Joins Using MapReduce Rares Vernica Michael J. Carey Chen Li Department of Computer Science University of California, Irvine Special Interest Group on Management of Data,

Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses

Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses Moonjung Cho State University of New York at Buffalo, U.S.A. mcho@cse.buffalo.edu Jian Pei Simon Fraser University, Canada jpei@cs.sfu.ca David