Querying Past and Future in Web Applications

Similar documents
Querying Business Processes

A Quest for Beauty and Wealth (or, Business Processes for Database Researchers)

Algebraic Recognizability of Languages

Complexity Classes P and NP

KEYWORD SEARCH IN RELATIONAL DATABASES

Bayesian networks - Time-series models - Apache Spark & Scala

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

Monitoring Business Processes with Queries

Small Maximal Independent Sets and Faster Exact Graph Coloring

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

Lecture 7: NP-Complete Problems

CHAPTER 7 GENERAL PROOF SYSTEMS

Data Integration with Uncertainty

Minesweeper as a Constraint Satisfaction Problem

Complexity of Higher-Order Queries

Analysis of an Artificial Hormone System (Extended abstract)

Fixed-Point Logics and Computation

Borne inférieure de circuit : une application des expanders

Applied Algorithm Design Lecture 5

XML Data Integration

Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics

Resources, process calculi and Godel-Dummett logics

Satisfiability Checking

BUSINESS RULES CONCEPTS... 2 BUSINESS RULE ENGINE ARCHITECTURE By using the RETE Algorithm Benefits of RETE Algorithm...

"LET S MIGRATE OUR ENTERPRISE APPLICATION TO BIG DATA TECHNOLOGY IN THE CLOUD" - WHAT DOES THAT MEAN IN PRACTICE?

Path Querying on Graph Databases

Cost effective Outbreak Detection in Networks

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Administration Challenges

Lecture Note 1 Set and Probability Theory. MIT Spring 2006 Herman Bennett

Software Active Online Monitoring Under. Anticipatory Semantics

Lecture 2: Complexity Theory Review and Interactive Proofs

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

DATA MINING IN FINANCE

Fabio Patrizi DIS Sapienza - University of Rome

OHJ-2306 Introduction to Theoretical Computer Science, Fall

Programming Tools based on Big Data and Conditional Random Fields

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Page 1. CSCE 310J Data Structures & Algorithms. CSCE 310J Data Structures & Algorithms. P, NP, and NP-Complete. Polynomial-Time Algorithms

MATHEMATICAL METHODS OF STATISTICS

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Informatique Fondamentale IMA S8

CSE 880. Advanced Database Systems. Active Database Design Issues

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li

On the Modeling and Verification of Security-Aware and Process-Aware Information Systems

Algorithmic Software Verification

1 Formulating The Low Degree Testing Problem

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

The Classes P and NP. mohamed@elwakil.net

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

The Classes P and NP

Big Data Provenance: Challenges and Implications for Benchmarking

MATHEMATICS: CONCEPTS, AND FOUNDATIONS Vol. III - Logic and Computer Science - Phokion G. Kolaitis

Chapter 7 Uncomputability

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

The Basics of Graphical Models

Tax Fraud in Increasing

Structure Preserving Model Reduction for Logistic Networks

Fairness in Routing and Load Balancing

Software Verification and System Assurance

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

GUIDELINE FOR PME 35 ONLINE BOOKING SYSTEM

Life of A Knowledge Base (KB)

Data exchange. L. Libkin 1 Data Integration and Exchange

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008

ON THE COMPLEXITY OF THE GAME OF SET.

The Model Checker SPIN

Statistics Graduate Courses

Introduction to Logic in Computer Science: Autumn 2006

Course: Model, Learning, and Inference: Lecture 5

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar

Procedia Computer Science 00 (2012) Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.

Extended Finite-State Machine Inference with Parallel Ant Colony Based Algorithms

Compact Representations and Approximations for Compuation in Games

Tetris is Hard: An Introduction to P vs NP

Formal Verification Coverage: Computing the Coverage Gap between Temporal Specifications

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems.

Runtime Verification - Monitor-oriented Programming - Monitor-based Runtime Reflection

Community Mining from Multi-relational Networks

LZ77. Example 2.10: Let T = badadadabaab and assume d max and l max are large. phrase b a d adadab aa b

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

Course Syllabus For Operations Management. Management Information Systems

Problem Set 7 Solutions

Statistical Models in Data Mining

CHAPTER 2 Estimating Probabilities

Relational Databases

Institut für Parallele und Verteilte Systeme. Abteilung Anwendersoftware. Universität Stuttgart Universitätsstraße 38 D Stuttgart

Portable Bushy Processing Trees for Join Queries

Question 2 Naïve Bayes (16 points)

1 Using a SQL Filter in Outlook 2002/2003 Views. 2 Defining the Problem The Task at Hand

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

Data Science Center Eindhoven. Big Data: Challenges and Opportunities for Mathematicians. Alessandro Di Bucchianico

Big Data Provenance - Challenges and Implications for Benchmarking. Boris Glavic. IIT DBGroup. December 17, 2012

Logical Foundations of Relational Data Exchange

Testing Automation for Distributed Applications By Isabel Drost-Fromm, Software Engineer, Elastic

. P. 4.3 Basic feasible solutions and vertices of polyhedra. x 1. x 2

Transcription:

Querying Past and Future in Web Applications Daniel Deutch Tel-Aviv University Tova Milo Customer HR System Logistics ERP Bank ecomm CRM Supplier

Outline Introduction & Motivation Querying Future [VLDB 06, ICDE 09, ICDT 09, VLDB 09] Querying Past [VLDB 07, VLDB 08, ICDT 09, VLDB 09] Related and Future work

Outline Introduction & Motivation Querying Future [VLDB 06, ICDE 09, ICDT 09, VLDB 09] Querying Past [VLDB 07, VLDB 08, ICDT 09, VLDB 09] Related and Future work

Introduction and Motivation Querying Web applications Web-based Business Processes (BPs) are very popular Querying the past past executions of a given application for improving business logic, optimization, personalized ads, past design patterns of applications to exploit when building a new application Querying (possible) future executions the above + verification

Introduction and Motivation Example Web Application

Introduction and Motivation Application Specification BP specifications (e.g. in BPEL) are compiled to running code => Queries over specification structure Modeled as nested DAGs Each DAG corresponds to a function\web-page Nodes model activities (activation and completion) Edges mark flow relation Atomic/compound activities Nesting models implementation relations \ links Guarding formulas (on external events) model choices Recursion is allowed

Introduction and Motivation Example Specification choosetravel F1 $searchtype = flights only Advertise Login Confirm Flights $airline= BA $airline= AF $choice= confirm $choice= reset F1 $airline= AA $searchtype = flights + hotels Advertise Login Confirm $searchtype = flights+hotels+cars Flights Hotels $hotel= Plaza $hotel= Marriott F2 F3 F4

Introduction and Motivation Example Execution Flow 07:00 choosetravel $searchtype= flight only 07:02 07:04 Login Login 07:05 Advertise Flights $airline = BA 07:10 07:30 Advertise... Flights 07:40 07:41 07:42 Confirm Confirm $choice = confirm... 07:43 choosetravel

Introduction and Motivation Sources of Uncertainty Partial Tracing, due to lack of storage, confidentiality, etc. External Effects, e.g. user choices, server response time The effect of external events is modeled by logical formulas, guarding implementations At run-time, formulas truth values determines the chosen implementations Thus the past is uncertain, and the future unknown

Outline Introduction & Motivation Querying Future Querying Past Related and Future work

Querying Future Queries on possible EX-flows Can a user reserve a flight without giving her credit card details? What is the probability that this would happen? How can this happen / what are the most common scenarios? What is the common behavior of users that search for an Air France flight+hotel deal but quit without making a reservation? Focus on the search-related sub-flow. What is the best way to get a cheap flight+hotel deal, or one that maximizes the FF millage/points?

Querying Future Query types Boolean (+ probability) Selection ( + top-k ranking) Projection (+ top-k ranking)

Querying Future Query types Boolean (+ probability) easy (hard) Selection ( + top-k ranking) easy (easy) Projection (+ top-k ranking) easy (hard)

Querying Future Query language Execution patterns Intuitive, similar in structure to execution flows Seek for occurrences (homomorphism) of the pattern within execution flow DAGs May contain transitive nodes and edges May contain a projection part

Querying Future Example Query choosetravel Start Start Flights Flights $Airline = BA choosetravel choosetravel Hotels Any choosetravel Hotels Any choosetravel Confirm choosetravel Confirm

Querying Future Weighting flows A weighted model cweight - choice Weight (product cost, shipping time, likelihood) Aggr (sum, multiplication) fweight - flow Weight (total cost, shipping time, ) Varying sensitivity to flow history (cweight) Varying level of monotonicity (fweight)

Querying Future Our Example Specification choosetravel choosetravel $searchtype = flights only $searchtype = flights + hotels $searchtype = flights+hotels+cars F1 Login Login Advertise Advertise Confirm Confirm $choice= reset $airline= AL Flights $airline= BA Flights $airline= AF $choice= confirm F1 Advertise Advertise Login Login Confirm Confirm Flights Flights $hotel= Crown Hotels eplaza Hotels $hotel= Marriott F2 F3 F4

Querying Future Example (Weight = Likelihood) We distinct three classes of probability distributions, according to the level of dependency of cweight. History independence (markovian): No dependencies between formulas. Bounded-history: Dependency in(at most) B last formula values. Unbounded-history

Querying Future Sample of Results (top-k selection) Complexity of query evaluation depends on cweight PTIME in the spec., exponential in history bound and query. NP hard in both. Undecidable for unbounded history (Instance) Optimality of our algo depends on fweight Strongly monotone: Optimal Semi-strongly monotone: Instance optimal No optimal exists Weakly monotone: Not (instance) optimal No instance optimal exists

Querying Future Sample of Results (Boolean) Harder: Need to sum up probability of (a possibly infinite number of) qualifying flows Computing exact probability is impossible Approximation is possible Technique: representing probabilities via set of linear equations EXPTIME in general (NP Hard) PTIME for non-recursive apps

Querying Future Sample of Results (top-k projection) Source of difficulty: need to consider a possibly infinite # of answers, and sum up probabilities of their possibly infinite # of origins Technique: Small world theorem allows to consider only a bounded number of answers, then use (Boolean queries) Oracle EXPTIME even for non recursive apps (NP Hard)

Querying Future ShopIT Shopping assistant

Querying Future ShopIT Shopping assistant

Querying Future ShopIT Shopping assistant

Outline Introduction & Motivation Querying Future Querying Past Related and Future work

Querying Past Types of Partial Traces Partial Tracing, due to lack of storage, confidentiality, Naïve tracing records all activities accurately Semi-Naïve tracing contains only partial information on the names of some activities Selective tracing may omit some activities occurrences Tracing systems (called types) are represented by a renaming function and a deletion set

Querying Past Example BP Trip Luxury Trip Luxury Search Search Search Search Hotel Credit1 Flight Credit1 LuxHotel Credit2 LuxFlight Credit2 Hotel Credit1 Flight Credit1 LuxHotel Credit2 LuxFlight Credit2 Print Print Print Print

Querying Past Naïve Traces Trip Search Trip Luxury Search Search Search Hotel Credit1 Flight Credit1 LuxHotel Credit2 LuxFlight Credit2 Hotel Credit1 Flight Credit1 LuxHotel Credit2 LuxFlight Credit2 Print Print Trip Print Trip Luxury Print

Querying Past Semi-Naïve Traces Trip Search Trip Luxury Search Search Search Hotel Credit Flight Credit Hotel Credit Flight Credit Hotel Credit Flight Credit Hotel Credit Flight Credit Print Print Trip Print Trip Luxury Print

Querying Past Selective Traces Trip Trip Search Search Search Search Hotel Credit Flight Credit Hotel Credit Flight Credit Hotel Credit Flight Credit Hotel Credit Flight Credit Print Print Print Print Trip Trip

Querying Past Lets talk about (top-k) queries Given a partial trace, what is its most likely origin? Or, more generally, given a pattern (query) of partial traces, what are the most likely origins of partial traces of this pattern? Good news: Many of the query evaluation algorithms extend to this context (even without knowing the tracing system )

Outline Introduction & Motivation Querying Future Querying Past Related and Future work

Related & Future Work (Small subset of) Related work (Probabilistic) Recursive State Machines with temporal logic as query language Probabilistic Relational DBs Probabilistic XML Graph grammars with MSO (or FO) as query language BP and Web applications mining

Related & Future Work Future work Practical applications: Web-sites design On-line advertisements Improved business logic Enriched Query Language Joins Data values Optimization Inference of specs/probability distributions

( = Thanks! )