Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle



Similar documents
Veracity in Big Data Reliability of Routes

Keywords: Mobility Prediction, Location Prediction, Data Mining etc

Statistics Graduate Courses

Advanced Methods for Pedestrian and Bicyclist Sensing

Preventing Denial-of-request Inference Attacks in Location-sharing Services

Learning outcomes. Knowledge and understanding. Competence and skills

What Is Probability?

MOBILITY DATA MODELING AND REPRESENTATION

Uncertain Data Management for Sensor Networks

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Real Time Bus Monitoring System by Sharing the Location Using Google Cloud Server Messaging

Bootstrapping Big Data

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Bayesian networks - Time-series models - Apache Spark & Scala

Statistical Models in Data Mining

Imputing Values to Missing Data

How To Calculate A Multiiperiod Probability Of Default

Stock Investing Using HUGIN Software

A Learning Based Method for Super-Resolution of Low Resolution Images

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

CHAPTER 2 Estimating Probabilities

Big Data a threat or a chance?

Model-based Synthesis. Tony O Hagan

Exploring Big Data in Social Networks

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Smarter Planet evolution

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Bayesian Statistics: Indian Buffet Process

Pontifical Catholic University of Parana Mechanical Engineering Graduate Program

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

How To Create A Data Science System

Christian Bettstetter. Mobility Modeling, Connectivity, and Adaptive Clustering in Ad Hoc Networks

Query Selectivity Estimation for Uncertain Data

Big Data Management and Analytics

Principles of Data Mining by Hand&Mannila&Smyth

Data Mining Practical Machine Learning Tools and Techniques

Tutorial on Markov Chain Monte Carlo

Integration of GPS Traces with Road Map

ANALYTICS IN BIG DATA ERA

Social Media Mining. Data Mining Essentials

11. Time series and dynamic linear models

Sample Size Designs to Assess Controls

Hyperbolic Location Fingerprinting: A Calibration-Free Solution for Handling Differences in Signal Strength

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

Globally Optimal Crowdsourcing Quality Management

Practicing Differential Privacy in Health Care: A Review

ANALYTICS IN BIG DATA ERA

IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY

The STC for Event Analysis: Scalability Issues

The primary goal of this thesis was to understand how the spatial dependence of

STA 4273H: Statistical Machine Learning

Using the HP Vertica Analytics Platform to Manage Massive Volumes of Smart Meter Data

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

3.2 Roulette and Markov Chains

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Course: Model, Learning, and Inference: Lecture 5

Bayesian Phylogeny and Measures of Branch Support

Dealing with large datasets

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Copyright. Network and Protocol Simulation. What is simulation? What is simulation? What is simulation? What is simulation?

Discovering Trajectory Outliers between Regions of Interest

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

A Perspective on Statistical Tools for Data Mining Applications

Indoor Robot Localization System Using WiFi Signal Measure and Minimizing Calibration Effort

Statistics & Probability PhD Research. 15th November 2014

Source Anonymity in Sensor Networks

A Content based Spam Filtering Using Optical Back Propagation Technique

Avigdor Gal Technion Israel Institute of Technology

BIG DATA FOR MODELLING 2.0

DATA MINING IN FINANCE

BIG DATA IN SUPPLY CHAIN MANAGEMENT: AN EXPLORATORY STUDY

GN47: Stochastic Modelling of Economic Risks in Life Insurance

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, Abstract. Review session.

Continuous Time Bayesian Networks for Inferring Users Presence and Activities with Extensions for Modeling and Evaluation

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

Global Soft Solutions JAVA IEEE PROJECT TITLES

Discovery of User Groups within Mobile Data

Big Data from a Database Theory Perspective

Fixed-Effect Versus Random-Effects Models

Using Machine Learning on Sensor Data

Protein Protein Interaction Networks

Is log ratio a good value for measuring return in stock investments

The Optimality of Naive Bayes

Traffic Prediction in Wireless Mesh Networks Using Process Mining Algorithms

Prediction of DDoS Attack Scheme

Privacy Challenges of Telco Big Data

UNCERTAINTIES OF MATHEMATICAL MODELING

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Introduction to Data Mining

Information Visualization WS 2013/14 11 Visual Analytics

Transcription:

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle

Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential New applications Innovative research Economic Boost $600 billion potential annual consumer surplus from using personal location data [1] [1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011. 2

Spatio Temporal Data (object, location, time) triples Queries: Find friends that attended the same concert last saturday Best case: Continuous function GPS log taken from a thirty minute drive through Seattle Dataset provided by: P. Newson and J. Krumm. Hidden Markov Map Matching Through Noise and Sparseness. ACMGIS 2009. 3

Sources of Uncertainty Missing Observations Missing GPS signal RFID sensors available in discrete locations only Wireless sensor nodes sending infrequently to preserve energy Infrequent check ins of users of geo social networks Dataset provided by: E. Cho, S. A. Myers and J. Leskovek. Friendship and Mobility: User Movement in Location Based Social Networks. SIGKDD 2011. 4

Sources of Uncertainty Uncertain Observations Imprecise sensor measurements (e.g. radio triangulation, Wi Fi positioning) Inconsistent information (e.g. contradictive sensor data) Human errors (e.g. in crowd sourcing applications) From database perspective, the position of a mobile object is uncertain Dataset provided by: E. Cho, S. A. Myers and J. Leskovek. Friendship and Mobility: User Movement in Location Based Social Networks. SIGKDD 2011. 5

Traditional Solutions Avoid uncertainty Store aggregated positions in the database Extrapolated positions Expected positions Most likely positions Impossible to assess the confidence of results 10:05 10:07? 10:06 6

Research Challenge Include the uncertainty, which is inherent in spatial and spatio temporal data, directly in the querying and mining process. 7

Research Challenge Include the uncertainty, which is inherent in spatial and spatio temporal data, directly in the querying and mining process. Assess the reliability of similarity search and data mining results, enhancing the underlying decision making process. 8

Research Challenge Include the uncertainty, which is inherent in spatial and spatio temporal data, directly in the querying and mining process. Assess the reliability of similarity search and data mining results, enhancing the underlying decision making process. Improve the quality of modern location based applications and of research results in the field. 9

Uncertain Spatio Temporal Data Model [1] Discretize Time and Space Model object movement as a Markov chain Weighted Random Walk Learn transition probabilities empirically Rejected possible worlds that do not match all observations Exact Probabilities can be computed for special queries [1] General Approach: Monte Carlo Sampling Draw a sufficiently high number of samples Approximate result probability = ratio of samples that satisfy the query and total number of drawn samples But how to draw samples efficiently such that they are conform with the observations? Solution: Adaption of transition matrices [1] T. Emrich, H. P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012. 10

Uncertain Spatio Temporal Data Model [1] Discretize Time and Space Model object movement as a Markov chain Weighted Random Walk Learn transition probabilities empirically Rejected possible worlds that do not match all observations Exact Probabilities can be computed for special queries [1] General Approach: Monte Carlo Sampling Draw a sufficiently high number of samples Approximate result probability = ratio of samples that satisfy the query and total number of drawn samples But how to draw samples efficiently such that they are conform with the observations? Solution: Adaption of transition matrices [1] T. Emrich, H. P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012. 11 11

Uncertain Spatio Temporal Data Model [1] Discretize Time and Space Model object movement as a Markov chain Weighted Random Walk Learn transition probabilities empirically Rejected possible worlds that do not match all observations Analytic solutions for special query types [1] General Approach: Monte Carlo Sampling Draw a sufficiently high number of samples Approximate result probability = ratio of samples that satisfy the query and total number of drawn samples But how to draw samples efficiently such that they are conform with the observations? Solution: Adaption of transition matrices [1] T. Emrich, H. P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012. 12 12

Probabilistic NN Queries Extension of Nearest Neighbor Querys on (certain) trajectories to the uncertain case Certain Case: For a query trajectory q, and a time interval T, a Nearest Neighbor Query returns all objects having the smallest distance to q during the whole interval T. For a query trajectory q, and a time interval T, a Nearest Neighbor Query returns all objects having the smallest distance to q during any time in T. Uncertain Case: For an uncertain trajectory q, a probabilistic Nearest Neighbor Query returns, for each object in the database, the probability to be a Nearest Neighbor of q. Both variants are NP hard to solve analytically. (Proofs given in the paper) 13

Adding knowledge to the model: Bayesian Inference Using Bayesian inference, additional knowledge can be added to the model, such as discrete observations of an object. Use empirically learned transition probabilities as a priori model Adapt this model to an a posteriori given information about discrete observations of an object. Model adaption using a Forward Backward approach Observations 14

Adding knowledge to the model: Bayesian Inference Expected Distance (m) The adapted a posteriori model allows to effectively interpolate positions between discrete observations. Significant improvement to existing approaches. Good results even without having a trained a priori model A posteriori Markov model A priori Markov model Time (10s per tick) A posteriori Markov model without a priori knowledge Spatio Temporal approximations (competitor approach) 15

Summary & Other Contributions Theoretical Analysis: NP hardness of NN Queries using this model. Efficient Markov model adaption, given observations by Bayesian Inference Using the empirically learned Markov chain as prior Using a forward backward approach to derive the posterior Efficient sampling approach using the posterior model Applicable for any query having a solution for the certain certain case! Index support for Nearest Neighbor Queries Based on an existing index structure Algorithms for efficient query processing provided Strong Experimental Results Probabilistic Models can vastly reduce the expected prediction error Compared to Traditional approaches predicting a single location Existing approaches for uncertain spatio temporal data 16

Thank you for your attention Questions?!