Temporal Data Mining for Small and Big Data. Theophano Mitsa, Ph.D. Independent Data Mining/Analytics Consultant


1 Temporal Data Mining for Small and Big Data Theophano Mitsa, Ph.D. Independent Data Mining/Analytics Consultant

2 What is Temporal Data Mining?
Knowledge discovery in data that contain temporal information.
Two types of time data:
- event data (e.g., time of purchase)
- time series data (e.g., EKG data)
Talk Outline
A. General Concepts
B. Temporal Data Mining Applications: medicine, bioinformatics, spatiotemporal data
C. Temporal Data Mining and Big Data: business processes, web data

3 A. General Concepts

4 A.1 Time Data Representation and Temporality in Databases

5 Time Data Representation and Temporality in Databases
- Time series: Real-valued measurements at regular temporal intervals.
- Temporal sequences: Time stamped at regular or irregular time intervals. Example: the sequence of purchases of a customer on an online store.
- Transaction time: The time that information is entered in the database. For example, the time of a purchase.
- Valid time: The time an entity is valid in the real world. For example, the time the subscription of a customer starts.
- Bi-temporal time stamping: Records have both a transaction time and a valid time.

6 Types of databases
- Snapshot databases: Keep the most recent version of data.
- Rollback databases: Support only a transaction time.
- Historical databases: Support only valid time.
- Temporal databases: Support both valid and transaction time.
Allen's interval algebra
Allen's interval algebra offers the most widely accepted way to express temporal relations and perform temporal reasoning [1]. Allen defines 13 temporal relations: before, after, meets, overlaps, etc.
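As an illustration of the interval relations mentioned above, the following is a minimal sketch that names the Allen relation holding between two intervals. The function name `allen_relation`, the tuple representation, and the hyphenated labels for the inverse relations are assumptions made for this example, not part of the talk.

```python
def allen_relation(a, b):
    """Return the name of the Allen relation of interval a to interval b.

    Intervals are (start, end) tuples with start < end (an assumed encoding).
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation((1, 4), (4, 6)))  # meets
print(allen_relation((1, 5), (3, 8)))  # overlaps
```

The checks are ordered so that each of the 13 relations is returned for exactly one configuration of the four endpoints.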

7 Time Series Representation
Requirements: Reduce the dimensionality of the similarity search problem; the distance in the feature space should be less than or equal to the distance in the original space.
Schemes:
- Fourier transform
- Wavelet transform
- Piecewise Aggregate Approximation and Piecewise Linear Approximation
- Shape Definition Language
- Model-based, such as Hidden Markov Model
- Perceptually Important Points (PIPs)
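To make the two requirements concrete, here is a small sketch of Piecewise Aggregate Approximation (PAA), one of the schemes listed above. The function names `paa` and `paa_distance` are hypothetical. The lower-bounding property is assumed to hold in the form shown only when the series length is a multiple of the number of segments.

```python
import math


def paa(series, n_segments):
    """Reduce a series to n_segments values by averaging each segment."""
    n = len(series)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = series[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced


def paa_distance(x, y, n_segments):
    """Distance in the reduced space, scaled so it does not exceed the
    Euclidean distance between x and y (equal-length segments assumed)."""
    px, py = paa(x, n_segments), paa(y, n_segments)
    squared = sum((a - b) ** 2 for a, b in zip(px, py))
    return math.sqrt(len(x) / n_segments) * math.sqrt(squared)


print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [1.5, 3.5, 5.5, 7.5]
```

Because the reduced distance never overestimates the true distance, a similarity search that prunes candidates in the reduced space does not discard true matches.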

8 EKG PIP points

9 A.2. Temporal Data Mining Tasks
- Similarity computation
- Classification/Clustering
- Pattern recognition
- Prediction

10 A.2.1 Time Series Similarity Computation

11 Similarity Computation in time series
- Distance-based.
- Dynamic Time Warping (DTW): Applied when the time series are not aligned.
- Longest Common Subsequence (LCSS): Assumes the same scale and baseline. It is tolerant to gaps and is more resistant to noise and outliers than DTW.
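The alignment idea behind Dynamic Time Warping can be sketched with the textbook dynamic program below. The name `dtw_distance` and the use of absolute difference as the point cost are assumptions for this example. Practical implementations usually add a warping-window constraint, which is omitted here.

```python
def dtw_distance(x, y):
    """Cost of the cheapest monotone alignment between series x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best alignment cost of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y point is reused
                cost[i][j - 1],      # y advances, x point is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The second series repeats one sample, so a point-by-point comparison is
# not even defined, but the warped alignment is exact.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```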

12 A.2.2 Classification/Clustering

13 Classification/Clustering of Time Series Data Non-model-based (traditional): Example: NNs, SVMs, decision trees, k-means. These can be applied to (a) features extracted from the series, such as PIPs, FT coefficients, trend, seasonality, mean, or (b) the raw time series data. Model-based. They use some model information about the time series, which comes from the fact that time series data values are usually correlated. Example: HMM, ARMA, AR, Markov chain.

14 A.2.3 Pattern Discovery in Temporal Data

15 Pattern Discovery
Pattern discovery in event sequences:
1. Sequence mining (multiple sequences): Apriori, GSP.
2. Association rule discovery (single sequence).
3. Frequent episode discovery (single sequence). An episode is a sequence of events appearing within a specific time window in a specific order, e.g., interest rates increase (event 1) and the stock market drops (event 2).
Pattern discovery in time series:
1. Motif and anomaly discovery (e.g., bioinformatics and computer network monitoring).
2. Streaming data pattern discovery (e.g., financial data analysis or sensor data).
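The notion of an episode can be illustrated with a simplified occurrence count: for each occurrence of the first event type, check whether the remaining event types follow in order within the time window. This is only a sketch under that assumption and not a full frequent-episode miner. The function name, the event labels, and the `(timestamp, event_type)` encoding are hypothetical.

```python
def count_serial_episode(events, episode, window):
    """Count start events from which `episode` occurs in order within `window`.

    events  -- list of (timestamp, event_type), sorted by timestamp
    episode -- tuple of event types in the required order
    window  -- maximum time span measured from the first event
    """
    count = 0
    for i, (t0, kind) in enumerate(events):
        if kind != episode[0]:
            continue
        matched = 1  # the first event type is matched by events[i]
        for t, k in events[i + 1:]:
            if t - t0 > window:
                break
            if matched < len(episode) and k == episode[matched]:
                matched += 1
        if matched == len(episode):
            count += 1
    return count


events = [
    (1, "rate_up"), (2, "market_drop"),
    (5, "rate_up"), (9, "market_drop"),
    (10, "rate_up"), (11, "other"), (12, "market_drop"),
]
print(count_serial_episode(events, ("rate_up", "market_drop"), window=3))  # 2
```

The occurrence starting at time 5 is not counted because the market drop at time 9 falls outside the window of 3 time units.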

16 A.2.4 Prediction

17 Prediction
- Event prediction: rare event prediction
- Event duration prediction: regression
- Time series forecasting: moving average, autoregression, ARMA models
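Two of the forecasting approaches named above can be sketched in a few lines: a moving-average forecast and a first-order autoregression fitted by least squares. The function names are hypothetical, and the AR(1) fit assumes the series is not constant (otherwise the variance term is zero).

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` values."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def ar1_forecast(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


print(moving_average_forecast([1, 2, 3, 4, 5], window=3))  # 4.0
# This series follows x[t] = 1 + 0.5 * x[t-1] exactly, so the fit recovers it.
print(ar1_forecast([10, 6, 4, 3, 2.5]))
```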

18 B. Applications B.1 Applications in Medicine

19 Chronus II
Chronus II [3] is a temporal database mediator that allows temporal abstractions. It extends the SQL language to allow general temporal queries on clinical databases for decision-support systems. At its basic level, it uses Allen's interval algebra to define temporal relationships. A later ontological version [4] of the mediator utilizes OWL and SWRL.

20 The TEMPADIS System
This is a system for the discovery of patterns in course-of-disease data [5]. It was applied to a database of HIV patients. 18 variables were used, such as white blood cell count and drug types. Classification was performed in order to determine the health status of a patient, using a decision tree approach. There were five health status categories, ranging from asymptomatic to severe illness. Finally, the GSP algorithm was used for pattern detection in sequences of events across patients in the database.

21 Analysis and Classification of EEG Time Series In [6], the fractal dimension was used to analyze EEG signals and detect patterns. The fractal dimension was chosen because of the chaotic nature of the signals. In [7], 3 methods to classify EEG time series were compared. 1. Linear Discriminant Analysis. 2. Neural Networks. 3. Support Vector Machines. SVMs gave the best results.

22 B.2 Applications in Bioinformatics

23 General Concepts
Microarray technology has enabled us to study thousands of genes simultaneously. This is done using gene expression profiles, which measure a gene's activity. Gene expression profiles can be obtained either at specific time points or at successive time intervals. In the second case, they are known as gene expression time series.

24 Clustering of Gene Expression Time Series
Difficult problem because:
- Possible presence of noise and intersecting clusters.
- Time series are very short (even as short as four samples).
- Time series can be unevenly sampled.
- The time series could have different scaling and shifting.
- The similarity measure should be shape-based, i.e., it should be based on the changes in the intensity and not the intensity itself.
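One possible way to obtain a shape-based measure of this kind is to correlate the point-to-point changes of two series rather than their raw intensities. The sketch below uses the Pearson correlation of first differences, which is unaffected by shifting and by positive scaling. This particular choice and the name `shape_similarity` are assumptions for illustration, not a method prescribed in the talk.

```python
import math


def shape_similarity(x, y):
    """Pearson correlation of the first differences of two equal-length series.

    Returns a value in [-1, 1]; 0.0 is returned if either series has
    constant changes, where the correlation is undefined.
    """
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    mean_dx = sum(dx) / len(dx)
    mean_dy = sum(dy) / len(dy)
    num = sum((p - mean_dx) * (q - mean_dy) for p, q in zip(dx, dy))
    den = math.sqrt(sum((p - mean_dx) ** 2 for p in dx)) * math.sqrt(
        sum((q - mean_dy) ** 2 for q in dy)
    )
    if den == 0:
        return 0.0
    return num / den


profile = [1.0, 3.0, 2.0, 5.0]
scaled_and_shifted = [2 * v + 7 for v in profile]
# Same shape at a different scale and baseline gives a similarity close to 1.
print(shape_similarity(profile, scaled_and_shifted))
```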

25 Clustering of Gene Expression Time Series
- Spline-based methods: Can be used in time series with missing points.
- Model-based methods: For example, autoregressive equations or Hidden Markov Models can be used to model the series.
- Fuzzy-clustering-based methods.
- Template-based methods: A template is used, after DTW is employed for alignment.

26 B.3 Spatiotemporal Applications

27 Analysis of moving point objects (MPOs)
Two types of analysis:
- Descriptive modeling: Describe the entire lifeline of the moving object.
- Retrieval by content: Find a specific motion pattern.
Descriptive Modeling
Goal: Find clusters that describe the lifelines (movements). For example, the motion of a group of objects can be described using the motion azimuth:
1: the objects move in the same direction.
0: the objects move in perpendicular directions.
-1: the objects move in opposite directions.

28 MPO analysis: Retrieval by content
Problem: Detect relative motion patterns, i.e., detect how the attributes of different object movements (speed, change of speed, etc.) are related over space and time.
Main idea: Fit to the data a motion template with specific motion attributes.
Example patterns:
- Flocking: Objects within a circular area of a given radius moving in the same direction.
- Leadership: Objects moving in the same direction, with one object being ahead of all other objects.

29 Trajectory Data Mining
Problem: Find similar trajectories. This is an important problem (e.g., object identification in video).
The similarity measure must be able to handle: different sampling rates, similar motions in different space regions, noise, and data with different lengths.
Approaches: LCSS [8], Minimum Bounding Rectangles [9], FT combined with SOM [10].
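A rough sketch of an LCSS-style similarity for 2-D trajectories follows. Two points are treated as matching when they are within `eps` in each coordinate and at most `delta` steps apart in time, and the match count is normalized by the shorter length. The function name, parameter names, and the normalization are assumptions for this example; see [8] for the actual formulation.

```python
def lcss_similarity(a, b, eps, delta):
    """LCSS-based similarity in [0, 1] between two non-empty trajectories.

    a, b  -- lists of (x, y) points
    eps   -- spatial matching threshold applied to each coordinate
    delta -- maximum allowed difference between the matched indices
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[:i], b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = (
                abs(a[i - 1][0] - b[j - 1][0]) <= eps
                and abs(a[i - 1][1] - b[j - 1][1]) <= eps
            )
            if close and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m] / min(n, m)


track = [(0, 0), (1, 1), (2, 2), (3, 3)]
noisy = [(0, 0.1), (1, 1.1), (50, 50), (3, 3.1)]  # one outlier point
print(lcss_similarity(track, noisy, eps=0.5, delta=1))  # 0.75
```

The outlier is simply skipped instead of dominating the score, which is the tolerance to noise that motivates LCSS for trajectory data.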

30 Open GeoDa Adds a Temporal Feature
GeoDa [11] is a very popular open source tool for spatial analysis and modeling. In Sept. 12, it was announced that its new version will include space-time analysis maps that will allow the user to track changes in spatial patterns over time, such as following the change in the vegetation of an area.

31 C. Temporal Data Mining and Big Data

32 The 3 Vs of Big Data
- Variety
- Volume
- Velocity -> Real-time/agile analytics

33 Agile Analytics In agile analytics, collective intelligence from the entire organization is used to develop continuously evolving prediction models as to how to enhance customer satisfaction and improve strategic business decisions.

34 C.1 Big Data and Business Processes

35 Value Chain Temporal Optimization
Embedding of real-time, fine-granularity data in the business decision process:
- Real-time inventory management and efficient response to high-demand times.
Acquisition of real-time sensor data from the manufacturing process:
- Manufacturing process efficiency: bottleneck identification, yield maximization, defect reduction.
- X-raying [12] of business processes to ensure conformance with process design.

36 The Hospital and Agile Analytics Electronic medical records enable agile analytics. Possible uses: 1. Disease outbreak detection, with minimum latency. 2. Pharmacovigilance: Identification of drug adverse effects on a scale that is not possible in clinical trials.

37 C.2 Big Data from Web Usage Mining

38 Web Data Analysis for Behavioral Targeting
Goals:
- Build behavior profiles for web users.
- In real time, compute a relevance score for an ad that determines whether or not the ad appears.
Data mining operations regarding users:
- Classification: Classify groups of users based on their profiles.
- Clustering: Used when the user categories are not known.

39 Mining the Web Usage Data
- Statistical analysis: For example, most frequently accessed web page, number of accessed web pages, maximum viewing time of a page, average length of a path to a site, etc.
- Path analysis: This yields the most frequently visited paths.
- Association rule discovery: Discover the pages that are accessed together in a user session and whose support exceeds a certain threshold.
- Sequential pattern discovery: Discover patterns that appear in a sequence of site visits by a user.
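The support computation behind association rule discovery can be sketched for page pairs as follows. Sessions are assumed to be lists of page names, support is taken as the fraction of sessions containing both pages, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_page_pairs(sessions, min_support):
    """Return {(page_a, page_b): support} for pairs whose support exceeds
    min_support. Each pair is stored in alphabetical order."""
    counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        counts.update(combinations(pages, 2))
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n > min_support}


sessions = [
    ["home", "cart", "pay"],
    ["home", "cart"],
    ["home", "blog"],
    ["cart", "pay"],
]
print(frequent_page_pairs(sessions, min_support=0.4))
# {('cart', 'home'): 0.5, ('cart', 'pay'): 0.5}
```

Only the pair-counting step is shown; turning frequent pairs into rules would additionally require a confidence measure.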

40 C.3 Big Data from Data Streams

41 Stream Pattern Discovery Algorithms Streaming data are of growing importance in many areas including monitoring for security purposes, financial forecasting, and analysis of location data. Challenges: 1. Huge amounts of data that arrive at high rates. 2. Often users need to respond immediately. Insight: The stream values are often correlated and a few hidden variables are enough to characterize the data.

42 Stream Pattern Discovery
1. SPIRIT [13]: An algorithm that finds trends and hidden variables in a family of incoming streams. Main idea: Use Principal Component Analysis. Advantages: adaptive, automatically detects changes in the incoming streams, scales linearly with the number of streams.
2. SpADe [14]: For the problem of matching an incoming stream against a predefined pattern. It is a warping distance that can handle shifting and scaling in both the amplitude and temporal dimensions. It can be incorporated in stream pattern discovery (in similarity search).

43 The AWSOM algorithm Purpose: For streaming data coming from sensors operating in hostile and remote environments. It allows sensors to detect patterns and trends [15]. Requirements of an algorithm that processes sensor stream data: Ability to detect simple or periodic patterns. Ability to filter out noise. Low memory usage. Be online and one pass. Ability to detect outliers. Should not require supervision by humans.

44 The AWSOM algorithm (continued)
Main idea: The AWSOM algorithm utilizes wavelets, primarily for the following reasons: (a) easy periodicity detection, (b) need to store just a few coefficients, (c) operates without supervision, (d) requires only one pass.
Experimental results showed that the algorithm can detect periodicities and bursts.

45 Conclusion Knowledge discovery in temporal data has applications in many areas. Since Big Data are temporal in nature, temporal data mining and especially real-time analytics and Agile Analytics are of increasing importance in order to understand the evolution of processes/customers in time and reduce the latency between data collection and using the data in decision making.

46 References
1. Allen, J. F., Maintaining Knowledge about Temporal Intervals, Communications of the ACM, vol. 26, no. 11.
2. Weiss, G.M. and H. Hirsch, Learning to Predict Rare Events in Event Sequences, Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press.
3. O'Connor, M.J., S.W. Tu, M.A. Musen, The Chronus II Temporal Database Mediator, Proceedings of the AMIA Annual Symposium, San Antonio, TX.
4. O'Connor, M.J., R.D. Shankar, A.K. Das, An Ontology-Driven Mediator for Querying Time-Oriented Biomedical Data, 19th IEEE International Symposium on Computer-Based Medical Systems, Salt Lake City, Utah.
5. Ramirez, J.C.G. et al., Temporal Pattern Discovery in Course-of-Disease Data, IEEE Engineering in Medicine and Biology, vol. 19, no. 4, 2000.

47 References
6. Paramanathan, P. and R. Uthayakumar, Detecting Patterns in Irregular Time Series with Fractal Dimension, Proceedings of the International Conference on Computational Intelligence and Multimedia Applications.
7. Garrett, D. et al., Comparison of Linear, Non-Linear, and Feature Selection Methods for EEG Signal Classification, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, June 2003.
8. Vlachos, M., G. Kollios, D. Gunopoulos, Discovering Similar Multidimensional Trajectories, Proceedings of the International Conference on Data Engineering (ICDE).
9. Vlachos, M., M. Hadjieleftheriou, D. Gunopoulos, E. Keogh, Indexing Multi-Dimensional Time Series with Support for Multiple Distance Measures, Proceedings of the ACM SIGKDD Conference, Washington DC (USA), August.
10. Khalid, S. and A. Naftel, Classifying Spatiotemporal Object Trajectories Using Unsupervised Learning of Basis Functions Coefficients, Proceedings of the 3rd ACM International Workshop on Video Surveillance and Sensor Networks, 2005.

48 References
11. https://geodacenter.asu.edu/ogeoda
12. Van der Aalst, W., Process Mining, Communications of the ACM.
13. Papadimitriou, S., J. Sun, C. Faloutsos, Streaming Pattern Discovery in Multiple Time Series, Proceedings of the 31st VLDB Conference.
14. Chen, Y. et al., SpADe: On Shape-Based Pattern Detection in Streaming Time Series, Proceedings of the IEEE 23rd International Conference on Data Engineering.
15. Papadimitriou, S., A. Brockwell, C. Faloutsos, Adaptive, Unsupervised Stream Mining, The VLDB Journal, vol. 13, 2004.

CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Introduction. Jun Du The University of Western Ontario

Introduction. Jun Du The University of Western Ontario Introduction Jun Du The University of Western Ontario jdu43@uwo.ca Outline Why Data Mining? What Is Data Mining? A Multi-Dimensional View of Data Mining What Kind of Data Can Be Mined? What Kinds of Patterns

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT?

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? Data mining is mainly used for decision making in business. The abundance of data, coupled with the need for powerful data analysis tools, has been described

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Web mining and knowledge discovery of usage patterns - A survey. CS748 Yan Wang

Web mining and knowledge discovery of usage patterns - A survey. CS748 Yan Wang Web mining and knowledge discovery of usage patterns - A survey CS748 Yan Wang Introduction Web data mining Usage mining on the Web WebSIFT: a usage mining system Personalization vs. User navigation pattern

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

SPE MS Data Mining with Shapelets for Predicting Valve Failures in Gas Compressors Abstract 1. Introduction

SPE MS Data Mining with Shapelets for Predicting Valve Failures in Gas Compressors Abstract 1. Introduction SPE-180452-MS Data Mining with Shapelets for Predicting Valve Failures in Gas Compressors Om P. Patri, Arash S. Tehrani, Viktor K. Prasanna, Rajgopal Kannan, University of Southern California; Anand Panangadan,

More information

Introducing Machine Learning

Introducing Machine Learning Introducing Machine Learning What is Machine Learning? Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational

More information

DATA MINING REVIEW BASED ON. MANAGEMENT SCIENCE The Art of Modeling with Spreadsheets. Using Analytic Solver Platform

DATA MINING REVIEW BASED ON. MANAGEMENT SCIENCE The Art of Modeling with Spreadsheets. Using Analytic Solver Platform DATA MINING REVIEW BASED ON MANAGEMENT SCIENCE The Art of Modeling with Spreadsheets Using Analytic Solver Platform What We ll Cover Today Introduction Session II beta training program goals Brief overview

More information

A Performance Comparison of Pattern Discovery Methods on Web Log Data

A Performance Comparison of Pattern Discovery Methods on Web Log Data A Performance Comparison of Pattern Discovery Methods on Web Log Data Murat Ali Bayır, Ismail H. Toroslu, Ahmet Coşar Department of Computer Engineering Middle East Technical University E-Mail: {ali.bayir,

More information

Data Warehousing. Technological Education Institution of Larisa in collaboration with Staffordshire University Larisa Dr.

Data Warehousing. Technological Education Institution of Larisa in collaboration with Staffordshire University Larisa Dr. Data Warehousing Technological Education Institution of Larisa in collaboration with Staffordshire University Larisa 2005-2006 Dr. Theodoros Mitakos AGENDA DATA WAREHOUSES DATA MINING INTRODUCTION There

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Stock Price Forecasting by Hybrid Machine Learning Techniques

Stock Price Forecasting by Hybrid Machine Learning Techniques Stock Price Forecasting by Hybrid Machine Learning Techniques Tsai, C.-F. and Wang, S.-P. Abstract Stock investment has become an important investment activity in Taiwan. However, investors usually get

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.

01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours. (International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models

More information

Data Warehousing & Data Mining IT434

Data Warehousing & Data Mining IT434 Data Warehousing & Data Mining IT434 Lab Instructors Ms. Wejdan Alkaldi Ms. Sumayah Al-Rabiaah Ms. Weam AlRashed Note: when you email me, please insert [IT434]

More information

Introduction to Machine Learning. What is Machine Learning?

Introduction to Machine Learning. What is Machine Learning? Introduction to Machine Learning CS195-5-2003 Thomas Hofmann 2002,2003 Thomas Hofmann CS195-5-2003-01-1 What is Machine Learning? Machine learning deals with the design of computer programs and systems

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Knowledge Discovery in Databases

Knowledge Discovery in Databases Knowledge Discovery in Databases Javier Béjar cbea CS - MIA AMLT - 2016/2017 Javier Béjar cbea (CS - MIA) Knowledge Discovery in Databases AMLT - 2016/2017 1 / 32 Outline 1 Knowledge Discovery in Databases

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis The First International Symposium on Optimization and Systems Biology (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 45 51 Integrated Data Mining Strategy for Effective Metabolomic

More information

Introduction to Data Mining. Chris Clifton Mining of Time Series Data

Introduction to Data Mining. Chris Clifton Mining of Time Series Data Introduction to Data Mining Chris Clifton Mining of Time Series Data Time-series database Mining Time-Series and Sequence Data Consists of sequences of values or events changing with time Data is recorded

More information

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

More information

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA POLITECNICO DI MILANO GRADUATE SCHOOL OF BUSINESS BABD INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA Courses Description A JOINT PROGRAM WITH POLITECNICO DI MILANO SCHOOL OF MANAGEMENT PRE-COURSES

More information

Process Mining. ^J Springer. Discovery, Conformance and Enhancement of Business Processes. Wil M.R van der Aalst Q UNIVERS1TAT.

Process Mining. ^J Springer. Discovery, Conformance and Enhancement of Business Processes. Wil M.R van der Aalst Q UNIVERS1TAT. Wil M.R van der Aalst Process Mining Discovery, Conformance and Enhancement of Business Processes Q UNIVERS1TAT m LIECHTENSTEIN Bibliothek ^J Springer Contents 1 Introduction I 1.1 Data Explosion I 1.2

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Investigating Clinical Care Pathways Correlated with Outcomes

Investigating Clinical Care Pathways Correlated with Outcomes Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges

More information

Topics in basic DBMS course

Topics in basic DBMS course Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate

More information
