Veracity in Big Data: Reliability of Routes




Veracity in Big Data: Reliability of Routes
Dr. Tobias Emrich
Post-Doctoral Scholar
Integrated Media Systems Center (IMSC)
Viterbi School of Engineering
University of Southern California
Los Angeles, CA 90089-0781
emrich@usc.edu

OUTLINE
  Big (Uncertain) Data
  Reliability in Traffic Networks
  From Uncertainty to Reliability
  Outlook

BIG DATA
"Big data is like: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it..."
Dan Ariely, Center for Advanced Hindsight at Duke University

BIG DATA
Studies* on Microsoft and Yahoo production clusters:
  Median Hadoop job is ~13 GB
  90% of the jobs are < 100 GB
*A. Rowstron, D. Narayanan, A. Donnelly, G. O'Shea and A. Douglas: "Nobody ever got fired for using Hadoop on a cluster", Proceedings of HotCDP, April 2012.

BIG DATA
Variety, Volume, Veracity, Velocity, Value


Uncertainty in (not so big) Data

J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB), Hangzhou, China: 205-216, 2014.

P. Zhang, R. Cheng, N. Mamoulis, M. Renz, A. Züfle, Y. Tang, T. Emrich: Voronoi-based Nearest Neighbor Search for Multi-Dimensional Uncertain Databases. In Proceedings of the 29th International Conference on Data Engineering (ICDE), Brisbane, Australia: 158-169, 2013.

J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. In Proceedings of the 6th International Conference on Similarity Search and Applications (SISAP), Coruna, Spain: 43-49, 2013.

T. Emrich, H.-P. Kriegel, J. Niedermayer, M. Renz, A. Suhartha, A. Züfle: Exploration of Monte Carlo based probabilistic query processing in uncertain graphs. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), Maui, HI: 2728-2730, 2012.

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), Maui, HI: 395-404, 2012.

T. Bernecker, T. Emrich, H.-P. Kriegel, M. Renz, A. Züfle: Probabilistic Ranking in Fuzzy Object Databases. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), Maui, HI: 2647-2650, 2012.

N. Hubig, A. Züfle, T. Emrich, M. A. Nascimento, M. Renz, H.-P. Kriegel: Continuous Probabilistic Sum Queries in Wireless Sensor Networks with Ranges. In Proceedings of the 24th International Conference on Scientific and Statistical Database Management (SSDBM), Chania, Crete, Greece: 96-105, 2012.

T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying Uncertain Spatio-Temporal Data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.

T. Bernecker, T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases. In Proceedings of the 27th International Conference on Data Engineering (ICDE), Hannover, Germany: 339-350, 2011.

T. Bernecker, L. Chen, T. Emrich, H.-P. Kriegel, N. Mamoulis, A. Züfle: Managing Uncertain Spatio-Temporal Data. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Chicago, IL: 16-20, 2011.

T. Bernecker, T. Emrich, H.-P. Kriegel, M. Renz, S. Zankl, A. Züfle: Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data. In Proceedings of the 37th International Conference on Very Large Data Bases (VLDB), Seattle, WA: 669-680, 2011.

Uncertainty
Uncertainty is inherent in many datasets:
  Automated extraction of information from HTML
  Sensor readings
  Human observations
  Predictions
[Diagram: Databases -> Query / Data mining -> Result. Without uncertainty: 1) efficient, 2) no confidence. With uncertainty: 1) efficient algorithms needed, 2) confidence attached.]

Reliability in Traffic Networks
Route A: ~53 min
Route B: ~58 min
Usually, I take route A. When I have a meeting in the morning, I take route B.

Reliability in Traffic Networks
Travel time prediction, incorporating uncertainty
Route A: ~53 min
Route B: ~58 min
[Figure: probability (0 to 0.3) over travel time (35 to 80) for Route A and Route B]

Reliability in Traffic Networks
The predicted travel time of a route varies due to:
  Imprecise prediction of traffic flow
  Unpredictable accidents
  Changing weather conditions
Routes differ in the variance of their travel time.
Starting at 9am:
  Route A (~53 min) arrives before 10am with a probability of 89%
  Route B (~58 min) arrives before 10am with a probability of 99.2%
  To reach 99.2% on route A, I have to leave 10 min earlier
[Figure: probability (0 to 0.3) over travel time (35 to 80) for Route A and Route B]
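The comparison on this slide can be sketched in a few lines of Python. This is only an illustration, assuming each route's travel time is given as a discrete distribution (minutes mapped to probability). The numbers in ROUTE_A and ROUTE_B are made up and do not reproduce the 89% and 99.2% from the talk; the function names are hypothetical as well.

```python
from typing import Dict

# Discrete travel-time distribution: minutes -> probability.
# All values below are hypothetical, chosen only to illustrate the idea.
TravelTimeDist = Dict[int, float]

ROUTE_A: TravelTimeDist = {45: 0.20, 50: 0.30, 55: 0.25, 60: 0.15, 65: 0.07, 70: 0.03}
ROUTE_B: TravelTimeDist = {55: 0.30, 58: 0.50, 60: 0.19, 65: 0.01}


def on_time_probability(dist: TravelTimeDist, budget_min: int) -> float:
    """Probability that the trip takes at most budget_min minutes."""
    return sum(p for minutes, p in dist.items() if minutes <= budget_min)


def required_budget(dist: TravelTimeDist, target: float) -> int:
    """Smallest time budget (in minutes) that is met with probability >= target."""
    cumulative = 0.0
    for minutes in sorted(dist):
        cumulative += dist[minutes]
        if cumulative >= target - 1e-9:  # small tolerance for float rounding
            return minutes
    raise ValueError("target probability is not reachable")


# One hour between departure and the meeting.
p_a = on_time_probability(ROUTE_A, 60)
p_b = on_time_probability(ROUTE_B, 60)
# How much earlier to leave on route A to match a 99% target.
extra_a = required_budget(ROUTE_A, 0.99) - 60

print(f"Route A on time: {p_a:.2f}, Route B on time: {p_b:.2f}")
print(f"Extra minutes needed on route A for 99%: {extra_a}")
```

With these made-up numbers, route A is faster on average but less reliable, which is the trade-off the slide describes.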

From Uncertainty to Reliability
Current approaches, route from S (9.00pm) to D:
  Traditional: TT = 14 min

From Uncertainty to Reliability
Current approaches, route from S (9.00pm) to D:
  Traditional: TT = 14 min
  ClearPath: TT = 12 min

From Uncertainty to Reliability
Considering uncertainty of predictions, route from S (9.00pm) to D:
  How to predict
  How to model


From Uncertainty to Reliability
Considering uncertainty of predictions, route from S (9.00pm) to D:
  How to predict
  How to model
  What time to use?


From Uncertainty to Reliability
Considering uncertainty of predictions, route from S (9.00pm) to D, TT = ? min:
  How to predict
  How to model
  What time to use?
  How to add up uncertain travel times?
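One simple way to add up uncertain travel times, offered here only as a sketch and not as the method from the talk, is to convolve the per-segment distributions. This assumes the segments are statistically independent and that each segment's distribution does not depend on the time it is entered; both assumptions are questioned on the slides. The segment values and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Discrete travel-time distribution: minutes -> probability.
Dist = Dict[int, float]


def add_travel_times(first: Dist, second: Dist) -> Dist:
    """Distribution of the sum of two independent travel times (discrete convolution)."""
    total: Dict[int, float] = defaultdict(float)
    for t1, p1 in first.items():
        for t2, p2 in second.items():
            total[t1 + t2] += p1 * p2
    return dict(total)


# Hypothetical two-segment route from S to D.
segment_1: Dist = {5: 0.7, 8: 0.3}
segment_2: Dist = {6: 0.5, 10: 0.5}

route = add_travel_times(segment_1, segment_2)
for minutes in sorted(route):
    print(f"TT = {minutes} min with probability {route[minutes]:.2f}")
```

The result is again a distribution rather than a single number, so "TT = ? min" becomes a question about which quantile or probability threshold to report.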

From Uncertainty to Reliability
Considering uncertainty of predictions, route from S (9.00pm) to D, TT = ? min:
  How to predict
  How to model
  What time to use?
  How to add up uncertain travel times?
  How to deal with correlations?
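Why correlations matter can be illustrated with a small sketch. It assumes, purely for illustration, that both segments share one hidden traffic state ("free" or "congested") and are independent only once that state is known. The state probabilities, segment times, and function names are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from typing import Dict, List

Dist = Dict[int, float]  # minutes -> probability


def convolve(a: Dist, b: Dist) -> Dist:
    """Sum of two independent travel times."""
    out: Dict[int, float] = defaultdict(float)
    for t1, p1 in a.items():
        for t2, p2 in b.items():
            out[t1 + t2] += p1 * p2
    return dict(out)


def route_with_shared_state(state_probs: Dict[str, float],
                            segs: Dict[str, List[Dist]]) -> Dist:
    """Route distribution when all segments depend on one shared traffic state."""
    out: Dict[int, float] = defaultdict(float)
    for state, p_state in state_probs.items():
        route: Dist = {0: 1.0}
        for seg in segs[state]:
            route = convolve(route, seg)
        for minutes, p in route.items():
            out[minutes] += p_state * p
    return dict(out)


def marginal(state_probs: Dict[str, float],
             segs: Dict[str, List[Dist]], index: int) -> Dist:
    """Distribution of one segment with the traffic state averaged out."""
    out: Dict[int, float] = defaultdict(float)
    for state, p_state in state_probs.items():
        for minutes, p in segs[state][index].items():
            out[minutes] += p_state * p
    return dict(out)


def prob_within(dist: Dist, budget: int) -> float:
    return sum(p for minutes, p in dist.items() if minutes <= budget)


# Hypothetical numbers: congestion slows down both segments at once.
STATE_PROBS = {"free": 0.8, "congested": 0.2}
SEGMENTS = {"free": [{5: 1.0}, {6: 1.0}], "congested": [{10: 1.0}, {12: 1.0}]}

correlated = route_with_shared_state(STATE_PROBS, SEGMENTS)
independent = convolve(marginal(STATE_PROBS, SEGMENTS, 0),
                       marginal(STATE_PROBS, SEGMENTS, 1))

p_corr = prob_within(correlated, 17)
p_ind = prob_within(independent, 17)
print(f"P(TT <= 17 min) with correlation: {p_corr:.2f}")
print(f"P(TT <= 17 min) assuming independence: {p_ind:.2f}")
```

In this toy setting, treating the segments as independent overstates the chance of arriving within 17 minutes, because it allows "one segment congested, the other free" outcomes that the shared state rules out.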


Outlook
  Evaluation of the quality of the result
  Efficient online prediction
  Extension to new query mechanisms: When do I have to start (and which route do I have to take) when I want to be at USC at 8.00am (or before) with a probability of 99%?
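The query in the last item can be sketched under strong simplifying assumptions: each candidate route has a known discrete travel-time distribution that does not depend on the departure time. The routes, their probabilities, and the function names below are hypothetical; a real system would need time-dependent predictions, which is exactly the open problem the outlook names.

```python
from typing import Dict, Tuple

Dist = Dict[int, float]  # minutes -> probability

# Hypothetical route distributions, not data from the talk.
ROUTES: Dict[str, Dist] = {
    "A": {45: 0.20, 50: 0.30, 55: 0.25, 60: 0.15, 65: 0.07, 70: 0.03},
    "B": {55: 0.30, 58: 0.50, 60: 0.19, 65: 0.01},
}


def required_budget(dist: Dist, target: float) -> int:
    """Smallest time budget (minutes) that is met with probability >= target."""
    cumulative = 0.0
    for minutes in sorted(dist):
        cumulative += dist[minutes]
        if cumulative >= target - 1e-9:  # tolerance for float rounding
            return minutes
    raise ValueError("target probability is not reachable")


def latest_departure(routes: Dict[str, Dist], deadline_min: int,
                     target: float) -> Tuple[str, int]:
    """Route and latest departure (minutes after midnight) meeting the deadline
    with probability >= target."""
    best_route, best_departure = "", -1
    for name, dist in routes.items():
        departure = deadline_min - required_budget(dist, target)
        if departure > best_departure:
            best_route, best_departure = name, departure
    return best_route, best_departure


# Be at the destination by 8:00am (480 minutes after midnight) with 99%.
route, departure = latest_departure(ROUTES, deadline_min=8 * 60, target=0.99)
print(f"Take route {route}, leave by {departure // 60:02d}:{departure % 60:02d}")
```

Note that the answer picks the more reliable route B here even though route A has the lower average travel time.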

Questions?
Dr. Tobias Emrich
emrich@usc.edu

Uncertainty in Databases
Uncertainty is inherent in many datasets:
  Automated extraction of information from HTML (e.g. "John works at Google" vs. "John works at Microsoft")
  Sensor readings (e.g. RFID sensors tracking the position of customers)
  Human observations (e.g. the bird that was seen was either a raven (75%) or a crow (25%))
  Predictions (e.g. tomorrow it is going to rain (10%) or not (90%))
Two approaches to solve this:
  Cleaning (i.e. get rid of the uncertainty)
  Management (i.e. handle the uncertainty)
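The difference between the two approaches can be sketched with the bird example. This is a toy illustration, assuming that observations are independent of each other; the second observation and both function names are hypothetical.

```python
from typing import Dict, List

# One uncertain observation: alternative labels -> probability.
Observation = Dict[str, float]

observations: List[Observation] = [
    {"Raven": 0.75, "Crow": 0.25},  # example from the slide
    {"Raven": 0.40, "Crow": 0.60},  # hypothetical second sighting
]


def clean(obs_list: List[Observation]) -> List[str]:
    """Cleaning: keep only the most likely label and discard the uncertainty."""
    return [max(obs, key=obs.get) for obs in obs_list]


def prob_at_least_one(obs_list: List[Observation], label: str) -> float:
    """Management: answer a query with a confidence attached.
    Assumes the observations are independent of each other."""
    p_none = 1.0
    for obs in obs_list:
        p_none *= 1.0 - obs.get(label, 0.0)
    return 1.0 - p_none


cleaned = clean(observations)
raven_seen_cleaned = "Raven" in cleaned              # plain yes/no answer
raven_seen_prob = prob_at_least_one(observations, "Raven")

print(f"Cleaned data: {cleaned}, raven seen: {raven_seen_cleaned}")
print(f"Managed uncertainty: raven seen with probability {raven_seen_prob:.2f}")
```

Cleaning yields a definite answer with no indication of how trustworthy it is, while managing the uncertainty returns the same answer together with its probability.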
