Data Stream Management Systems and Query Languages
|
|
|
- Lewis Harmon
- 9 years ago
- Views:
Transcription
1 Data Stream Management Systems and Query Languages Advanced School on Data Exchange, Integration, and Streams (DEIS'10) Dagstuhl Information Systems - Informatik 5 University
2 New Applications New Requirements Traffic Applications Health monitoring Rapid emission of messages, e.g., hazard warnings Derive traffic information from processed data Integration of data from multiple mobile and static sources Slide 2/45 Sensors produce data at high rates Integration with further information, e.g., EHR Real-time processing to analyze health information and predict events Other applications: Stock analysis Production monitoring User behaviour (click analysis) Position monitoring (soldiers, devices,..)
3 Running Example Car2X Communication Two kinds of messages: 1. Based on events vehicles produce a message describing the event 2. Vehicles send probe data periodically Timestamp; MsgID; Lng; Lat; Speed; Accel; Slide 3/45... t + n Message... Message Message Message Message Message Message t
4 Comparison of Applications Slide 4/45 Traditional Applications Irregular transactions, batch processing Possibly very large, but finite data set Frequent analysis, multiple passes More tolerant time requirements, predictable Time may be unimportant, neglected, all information may be important Passive behaviour (pull) Data assumed to be complete up to that point in time Permanent storage required Streaming Applications Continuous flow of data Unbounded stream Continuous analysis, one pass Data is produced at high rates, realtime requirements, bursty Notion of time is important, recent information more important Active behaviour (push), trigger-oriented, monitoring Asynchronous and incomplete data arrival, inaccuracies Not all information must/can be stored permanently volatile What does that mean for a data management system?
5 Agenda 1. Introduction 2. Data Stream Management Systems 3. Query Languages 4. Query Plans & Operators 5. Quality Aspects in DSMS 6. Our work Slide 5/45
6 Requirements for a DSMS Slide 6/45 Allow continuous queries, but also ad-hoc queries, views Handle unbounded streams while dealing with limited resources Delivery of incremental results and processing of subsets Fulfilment of real-time requirements for processing and response Scalability in number of queries and data rates Support for fault tolerance: missing, out-of-order, delayed data Active system behaviour push, trigger Predictable and repeatable results fault tolerance and recovery [Stonebraker et al. 2005] High-availability [Stonebraker et al. 2005] Update of data after processing [Abadi et al. 2005] Dynamic query modification [Abadi et al. 2005] Shared processing of data by multiple queries, adaptivity to addition and removal of queries [Chandrasekaran et al. 2003] Provide support for signal processing [Girod et al. 2008], objects, lists
7 Flaws in Common DBMS Slide 7/45 Processing Streams Human-active DBMS-passive model vs. DBMS-active humanpassive model [Abadi et al. 2003] Turns common DBMS idea bottom-up data retrieval triggers queries in contrast to queries trigger data retrieval [Chandrasekaran et al. 2003] Relational algebra assumes finite sets blocking operators do not suit for streams (wait for results, no time-out, no approximate query answering) Process-after-store mechanism: triggers can be used, but do not scale [Abadi et al. 2003] high latency and overhead for handling streaming data Cannot deal with out-of-order data [Stonebraker et al. 2005] Predictable results order of storage and processing of data has to be controlled externally [Stonebraker et al. 2005]
8 General Structure of an SPE Slide 8/45 [Ahmad and Çetintemel, 2009]
9 Overview of DSMSs Research Projects Project Research Group Runtime Description Slide 9/45 Tapestry TelegraphCQ du (Fjords, PSoup.) STREAM eam/ Aurora/Borealis earch/borealis PIPES /Projects/PIPES System S/ SPC/ SPADE/ om/comm/research_projects. nsf/pages/esps.index.html StreamMill am-mill Global Sensor Networks ac/gsn/ Xerox Parc (D. Terry, D. Goldberg et al.) 1992? uses a commercial append-only database, cont. querying by SPs UC Berkeley (Hellerstein, Franklin) reuses components from DBMS PostgreSQL, dataflows composed of set of operators (e.g., Eddy, Join) connected by Fjords, Language: SQL, scripts Stanford University (A. Arasu, J. Widom, B. Babcock, S. Babu et al.) Brown Univ., Brandeis Univ., MIT (Abadi, Cherniack, Madden, Zdonik, Stonebraker et al.) Universität Marburg (Seeger, Krämer et al.) Probably the most famous one, comprehensible abstract semantics description; Language: CQL Distributed system, uses notions of arrows, boxes and connection points for operator networks ; Commercial: StreamBase; Language SQuAl Commercial: RTM Analyzer Language: PIPES, define logical and physical query algebra on multi-sets, use algebraic optimizations IBM T.J. Watson Research Distributed System, notion of operator network, Commercial: InfoSphere; Language: SPADE UCLA (H. Takkhar, C. Zaniolo) Ongoing Inductive DSMS mining implementable with SQL and UDAs, support for XML data; language: ESL EPF Lausanne, Digital Enterprise Research Insitute (DERI) (Salehi, Aberer et al.) Ongoing Wraps existing rel. DBMS with stream functionality; language: common SQL
10 Overview DSMS Commercial Products System Company Based on Description Slide 10/45 InfoSphere Streams 01.ibm.com/software/da ta/infosphere/streams/ Oracle Streams echnetwork/database/fe atures/data- integration/default html StreamInsight m/sqlserver/2008/en/us /r2-complex-event.aspx StreamBase com TruSQL Engine Esper (Open Source) rg/ IBM System S/ /SPADE/ SPC Stand-alone product, only supports Linux?, queries over structured and unstructured data sources Language: SPADE Oracle -- Integrated in Oracle 11g; Language: CQL Microsoft --- Integrated in MS SQL Server 2008 Release 2; Language:.NET, LINQ StreamBase Aurora/Borealis Stand-alone products (Server, Studio, Adapters..); Language: StreamSQL Truviso TelegraphCQ? Language: StreaQL EsperTech --- Available in.net and Java, Stand-alone product; Language: EPL
11 Example The Aurora System Router: forwards elements to storage manager or outputs Storage Manager: Maintains operator queues & manages buffer For each queue, disk storage blocks are used (circular buffer) Keeps blocks of high priority queues in main memory Scheduler: picks the next operator to be executed Shares table with SM with priority, perc. of operator queues in main memory, flag if box is running Priority is based on QoS statistics Train scheduling and superbox scheduling: minimize box calls and I/O operations by building tuple trains Slide 11/45 Box processors: execute the operators (multi-threading) QoS Monitor: monitors system performance and activates load shedder [Abadi et al. 2003] Load Shedder: based on introspection tuples are dropped using QoS information Catalog: meta information about network, inputs, outputs, statistics etc.
12 Agenda 1. Introduction 2. Data Stream Management Systems 3. Query Languages 4. Query Plans & Operators 5. Quality Aspects in DSMS 6. Our work Slide 12/45
13 Query Processing Overview Query Parsing/ Translation Initial Logical Plan Optimized logical plan Algebraic Optimization Translation/ Physical Optimization GUI for logical algebra GUI for physical algebra Slide 13/45 Optimized physical plan Execution Results [Krämer & Seeger 2009]
14 Requirements for Query Languages in DSMS Slide 14/45 Windowing: which kinds of windows are supported? Correlation: combine streams and static relations in a query Provide all standard SQL operations approved set of query operators User-defined operations/functions Language closure: operators get streams as input and output streams no conversion into finite relations in between Pattern matching: identify subsequences of tuples Expressiveness: must be expressive enough for targeted apps which operations can be formulated? Well-understood formal semantics, e.g., enables optimization
15 Query Formulation Extensions of the SQL Standard, e.g., CQL: STREAM [Arasu et al. 2006], Oracle Streams PIPES [Krämer et al. 2009] ESL [Thakkar et al. 2008]: StreamMill SELECT Istream Count(*) FROM C2XMgs[Range 1 Minute Slide 10s] WHERE Speed > 30.0 Assembling of operators, e.g., Aurora/Borealis (SQuAl) Filter (Speed > 30.0) Slide 15/45 System S/ InfoSphere (SPADE) Aggregate(CNT, Assuming O, Size 1 minute, Advance 10 second) C2X_Source Functor Aggregate TCP_Sink XPath-based languages, e.g., [Peng and Chawathe 2003]
16 Timestamps Slide 16/45 Monotonic time domain T: ordered, infinite set of discrete time instants τ T [Patroumpas and Sellis 2006] multi-set semantics Explicit or external timestamp (application time): Tuples enter system with a predefined timestamp field from the source Disadvantage: elements may not arrive in order Implicit or internal timestamp (system time): Timestamp is defined by the system, add. timestamp field Preserve timestamps enables to measure output delay (Aurora) Logical clock: Consecutive integer with distinct values On receipt (global order) or by each operator s input queue Latent timestamps (StreamMill): Only created when required, other operators use order of input queue Operator timestamps, e.g., for a join which timestamp should be used?
17 Timestamps Example ESL Slide 17/45 CREATE STREAM C2XMgs( ts timestamp, msgid char(10), lng real, lat real, speed real, accel real) ORDER BY ts; SOURCE port5678 ; CREATE STREAM C2XMgs( ts timestamp, msgid char(10), lng real, lat real, speed real, accel real, current_time timestamp) ORDER BY current_time; SOURCE port5678 ; CREATE STREAM C2XMgs( ts timestamp, msgid char(10), lng real, lat real, speed real, accel real) SOURCE port5678 ; Explicit Timestamp Implicit Timestamp Latent Timestamp
18 Semantics Data Model (Stream Elements) In general: (s,τ), tuple s with schema (A 1,...,A n ), timestamp τ Slide 18/45 Simple data types (e.g., STREAM, Aurora, GSN,..) Borealis: With key (k 1,...k n,a 1,..,A m ), used to identify tuples for revision Adds revision flag: +, -,, also QoS information can be included CESAR (event processing algebra [Demers et al. 2005]): Event-based: (A 1,...,A m, τ 0,τ 1 ) denotes start and end of an event Objects (PIPES, System S, Xstream [Girod et al. 2008]): Finite sequence of objects and a timestamp Composite type of a tuple in relational case the schema [Krämer and Seeger 2009] Can use functions and predicates for arbitrary types
19 Ordering in Streams Temporal ordering as a many-to-one mapping f O : D S T with properties [Patroumpas and Sellis 2005] Existence: s S, τ T, such that f O (s) = τ Monotonicity: s 1,s 2 S, if s 1.A τ s 2.A τ f O (s 1 ) f O (s 2 ) In general: operators assume non-decreasing order of arriving elements, e.g., in STREAM: time advances from τ-1 to τ when all inputs of τ-1 have been processed Slide 19/45 But this is not valid, especially for explicit timestamps Data from sources may be in the right order due to communication problems, delays, asynchronism Ordering, arrival in time not guaranteed poses problems when windows are used (are the right tuples included?)
20 Handling of Unordered Streams Relax assumption about ordering (Aurora): Parameter specification to relax assumptions about local ordering (slack parameter k) Ordering Constraints [Arasu et al. 2004]: Ordered arrival constraint (windows) at least k+1 tuples with A value s.a after s Clustered arrival constraint (aggregates) at most k+1 further tuples after s without value v Referential integrity constraint (joins) delay between a tuple in S 1 and tuple in S 2 at most k Slide 20/45 Dictate ordering Heartbeats & Input Manager (STREAM): sends message with timestamp τ i which indicates, that τ i has ended no further elements with timestamp τ i will arrive Implicit timestamps elements are ordered anyways, DSMS sends heartbeat Explicit timestamps sources have to generate the heartbeat or DSMS has to deduce these from environment parameters, such as time delay between sources Dropping tuples (e.g., GSN) Partition into additional out-of-order stream (StreamMill) handling is left to the user Correct stream order locally Ordering operators, such as BSort (Aurora)
21 Semantics Data Model Slide 21/45 Stream (Streams and Relations) In general: Append-only (possibly infinite) sequence of tuples with uniform schema evolving in time STREAM: Base stream and derived stream, unbounded multi-set of elements duplicates PIPES (logical and physical algebra & query plans): Raw streams (s, τ): sequence of elements from input sources Logical streams (s, τ, n): order-agnostic representation of multi-set of elements show validity of tuples at time-instant level (S 1, S 2 ) := {(s 1 s 2, τ, n 1 n 2 ) (s 1, τ, n 1 ) S 1 (s 2, τ, n 2 ) S 2 } Physical streams (s,v): v validity interval, processed in physical operators Denotational View [Maier et al. 2005]: Gives several different representation possibilities described by reconstitution functions, e.g., set(t), bag(t) Relation STREAM Mapping from each time instant in T to a finite, but unbounded multi-set of tuples with schema R notion of time Set of tuples that may vary over time instantaneous relation
22 Semantics - Operators Create Stream Stream-to-Relation Relation-to-Relation Relation-to-Stream Stream-to-Stream source stream-to-relation relation-to-relation stream-to-stream Stream Relation relation-to-stream create stream Adapted from [Arasu et al.2004] Slide 22/45 Language closure: S2S operators (Borealis, System S,..) closed under streams Allows nesting of queries Allows for better algebraic optimization No real Stream-to-Stream (Istream, Rstream, Dstream) Correlation, e.g., in STREAM, StreamMill, Aurora: Variants of joins: with or without windows
23 Stream-to-Relation - Windows Tackle problem of unbounded streams retrieve finite portion of the stream (temporary relation) General definition [Patroumpas and Sellis 2005]: A window is the set of all elements of a stream for which a conjunctive window condition E holds at a certain time instant (also window state for this time instant) Windowing attribute: determines the ordering mostly timestamps Slide 23/45 Definition of Windows: Implicit definition: integrated in other operators (e.g., Aurora) Aggregate(CNT, Assuming O, Size 1 minute, Advance 10 second) Explicit definition: operator on its own, e.g., in STREAM may violate language closure
24 Windows Measurement Unit Slide 24/45 Measurement Unit: and Edge Shift One bound must be specified to define size Logical units: Time-based windows Value-based windows: need increasing sequence of values for discriminating attribute have to know when no more values lie in this interval Physical units: Edge shift Count-based or tuple-based windows Partitioned windows: separates the stream into substreams depending on grouping attributes window is the union of the windowed substreams Fixed-bound(s) windows: at least one bound is fixed, e.g., fixing lower bound and shifting upper bound landmark windows Fixed-band windows: fixed upper and lower bounds keep state Variable-bounds windows: both bounds are flexible, size is fixed sliding windows
25 Windows Progression Step Progression step: Window progresses up on arrival of new tuples or time advancement Unit step vs. Hops: number of tuple or time instants at a time Slide 25/45 Tumbling: windows is filled until its boundaries are reached, no overlapping If it is full operator evaluates the content Sliding: Window moves forward on tuples or time advancement, overlapping possible Non-monotonic while window moves, new results are produced and old ones expire (no accumulative results) Option: Use of negative tuples to cancel expired results Punctuation-based: Punctuations are flags set into the stream Operators accumulate elements until a punctuation is reached and then evaluate the window
26 Windows - Examples SELECT Istream Count(*) FROM C2XMgs[Range 1 Minute Slide 10s] WHERE Speed > 30.0 Sliding Window (CQL) time SELECT Istream Count(*) FROM C2XMgs[Range 1000 Slide 1000] WHERE Speed > 30.0 Tumbling Window (CQL) time Slide 26/45 time SELECT Count(*) FROM C2XMgs <LANDMARK RESET AFTER 600 ROWS ADVANCE 20 ROWS> WHERE Speed > 30.0 Landmark Window (TruSQL)
27 Slide 27/45 Stream-to-Relation Windowed Operators Projection, Selection: not necessary, but often required for applications Deduplication: only returns the most recent tuple of its kind Windowed Join, Sliding Window Join: Between two windows, but extendable to multiway join When a new tuple arrives in one of the windows it is matched against tuples of the other window Commutative & associative, distributive over selection and projection Eager and lazy variants [Golab and Öszu 2003] Aggregates: Grouping of tuples in window according to attributes in group list Application of aggregate function Set operations Windowed union & intersection: not distributive over selection S 1 S 2 Adapted from [Patroumpas and Sellis 2005]
28 Relation-to-Stream Slide 28/45 Creates an unbounded stream S from finite relation R Concatenate tuples by creation timestamps as operator output too many duplicates (accumulative results) Better: just consider the differences between two time steps Explicit use of specific operators (STREAM): Istream (insert stream): whenever a new tuple is added to R between τ- 1 and τ, it is also added to S only new tuples with timestamp τ are output Rstream (relation stream): outputs all tuples of relation R at time τ Dstream (delete stream): Outputs all tuples which have been deleted from R between τ-1 and τ only deleted tuples with timestamp τ are output SELECT Dstream(MsgID) FROM C2XMgs[Range 20 Seconds] Implicitly integrated into other operators Istream mostly used (e.g., TelegraphCQ, Aurora, StreamBase)
29 Stream-to-Stream Example Aurora (Stateless) Filter (P 1,...P m )(S): defines one or more filter predicates on an input stream. If a tuple matches route it to the corresponding P i output, m+1 outputs (one for else), similar to rel. SELECT Slide 29/45 Map(B 1 =F 1,...,B m =F m )(S): constructs new stream elements for an output by defining functions over the input tuples (similar to projection) Union(S 1,...,S n ): streams with common schema are merged into one stream
30 Stream-to-Stream Example Aurora (Stateful) Have to assume some ordering to stay in finite bounds Order: O (On A, Slack n, GroupBy B 1,..,B m ) BSort(Assuming O)(S): Bubble sort on the stream over the data on attribute A Join(P, Size s, Left Assuming O 1, Right Assuming O 2 )(S 1,S 2 ): P being a join predicate, s = Size of the window, O 1 and O 2 are orderings on S 1, S 2 respectively. Slide 30/45 Resample(F, Size s, Left Assuming O 1, Right Assuming O 2 ) (S 1,S 2 ): similar to semijoin, asymmetric, F= window/aggregate interpolation function over S 2 Aggregate (F, Assuming O, Size s, Advance i,[timeout z])(s): F = window/aggregate function (e.g., AVG), s = Size of the window, i=sliding step, timeout to prevent blocking when waiting for elements Aggregate(CNT, Assuming O, Size 1 minute, Advance 10 second)
31 Agenda 1. Introduction 2. Data Stream Management Systems 3. Query Languages 4. Query Plans & Operators 5. Quality Aspects in DSMS 6. Our work Slide 31/45
32 Query Processing Overview Query Parsing/ Translation Initial Logical Plan Optimized logical plan Algebraic Optimization Translation/ Physical Optimization GUI for logical algebra GUI for physical algebra Slide 32/45 Optimized physical plan Execution Results [Krämer & Seeger 2009]
33 Physical Query Plan - Examples Slide 33/45 [Arasu et al., 2006] [Abadi et al. 2003]
34 Supporting structures Slide 34/45 Queues Connect outputs of producing operators with inputs of consuming operators Buffer the elements for the consuming op. Ordering: elements can be placed in a specific order in the queue Synopses Stores a state for an operator in a specific data structure Examples: SHJ store hash tables for each stream Window state Summary for approximate query answering different techniques, e.g., using wavelets, histograms, sketching... Synopsis sharing with stores and stubs (STREAM) Some operators may need similar or identical results Store: keeps the union of intermediate results Same interface as synopses reports status to store
35 Blocking Operators Unable to produce a tuple without knowing the entire input Operators: sorting, count, min, max, avg, join Slide 35/45 Resolutions: Punctuations: Mark in the stream when an operator should evaluate After a punctuation no tuples with matching data will come Representation, e.g., in Niagara data using the schema of the stream filled with a series of pattern, e.g. restrict timestamp field indicates no more tuples matching an interval of dates will come Disadvantage: sources have to produce these punctuations Non-blocking counter parts Example: Joins
36 Join Implementations Slide 36/45 Nested Loop Join (sliding window join) [Kang et al. 2003] Non-blocking Symmetric Hash Join: Two hash tables A, B, both in memory, a hash function h If a new tuple t 1 for stream A arrives, calculate h(t 1 ) for A and probe it with values for h(t 1 ) in hash table B, store tuple in the hash table at h(t 1 ) Disadvantage: Only equi-join possible Use trees or lists can be used for Theta-Joins XJoin [Urhan and Franklin 2000]: Similar to SHJ if memory exceeded thresholds outsource biggest bucket if one or both sources are stalled (no tuple arrives) perform join with outsourced data no interruption, all results are produced Ripple join [Haas and Hellerstein 1999] Retrieve randomly one tuple from each stream at each sampling step are joined with each other and previously seen tuples Square: sampling rate of both is equal, rectangular: one stream is sampled more often than the other Adaptive solution [Kang et al. 2003]: Depending on predicate, stream rate etc. the operator is dynamically chosen
37 Example Hash-Merge-Join [Mokbel et al. 2004] Slide 37/45
38 Agenda 1. Introduction 2. Data Stream Management Systems 3. Query Languages 4. Query Plans & Operators 5. Quality Aspects in DSMS 6. Our work Slide 38/45
39 System Perspective (QoS): Aurora [Abadi et al. 2003] Calculates QoS values for response times, tuple drops and values produced Users defines two-dimensional QoS graphs for each output and each quality dimension to describe QoS tolerable QoS boundaries Example: importance of (numerical) values can be described by a function Uses QoS for adaptation of scheduling priorities State-based: rates utility of a box output, scheduler picks the output with highest utility, whereby utility means, how much it will harm QoS if its execution is deferred Feedback-based: if latency in QoS of an output is high, priority is increased, otherwise decreased Slide 39/45 Borealis [Abadi et al. 2005] QoS is predictable at any point in the query, not only outputs Extends messages with QoS information (Vector of Metrics) which contains content-related (e.g., tuple importance) and performance-related metrics (e.g., dropped tuples up to now) Also: parameterizable Score Function, which can calculate from a VM the current impact of a message on QoS
40 Slide 40/45 Data Perspective [Klein et al. 2009] Divide the stream into non-overlapping, jumping data quality windows for each attribute Window contains the values for the attribute, timestamp and a set of attributes, which contain values for quality dimensions Dimensions: accuracy, confidence, completeness, data volume, timeliness Distinguish operator classes: data-modifying (e.g., filtering, Join), datagenerating (e.g, Interpolation), data-reducing (e.g., Projection, Sampling ), data-merging( e.g., Aggregate Define quality operator analogs to operators Data quality operators implement a function which calculates a new quality value for elements resulting from the operator Implemented adaptive window size algorithms based on interestingness finer granularity of windows at high peaks, threshold excess, fluctuations
41 Agenda 1. Introduction 2. Data Stream Management Systems 3. Query Languages 4. Query Plans & Operators 5. Quality Aspects in DSMS 6. Our work Slide 41/45
42 The Cooperative Cars Project Development of a Car2X communication infrastructure and according applications based on cellular networks (emphasizing 3G and 3G+) Application: send hazard warning messages over cellular network infrastructure, e.g., a vehicle braking very hard Poses challenges for mobile communication: latency, data privacy, reliability Poses challenges for data management & applications High data rates scalability, performance Integration of multiple data sources Information accuracy (e.g., Floating Phone Data) Data stream mining to derive new information from events Slide 42/45
43 Queue-end Detection Scenario Idea: Separate each road into sections and determine if it contains a queue-end binary classification task Use CoCar messages as data sources only Use data stream mining test which algorithm suits the task best and which parameters influence mining results Data Stream Management System GSN Slide 43/45 Position Estimated Queue-end CoCar Messages VISSIM Traffic Simulation CoCar Wrapper Integration Degradation of Positions Aggregation Map Matching Data Mining Classification Training Training Classification Topolog. Data Access Determination of Estimated Queue-end Ground Truth Queue-end Queue-End Wrapper Map Matching [Geisler et al. 2010]
44 Realization Slide 44/45
45 Slide 45/45 References (1) [Abadi et al. 2005] Abadi, D. J.; Ahmad, Y.; Balazinska, M.; Çetintemel, U.; Cherniack, M.; Hwang, J.-H.; Lindner, W.; Maskey, A.; Rasin, A.; Ryvkina, E.; Tatbul, N.; Xing, Y. & Zdonik, S. B. The Design of the Borealis Stream Processing Engine Proc. 2nd Biennal Conference on Innovative Data Systems Research (CIDR), 2005, [Abadi et al. 2003] Abadi, D. J.; Carney, D.; Çetintemel, U.; Cherniack, M.; Convey, C.; Lee, S.; Stonebraker, M.; Tatbul, N. & Zdonik, S. B. Aurora: a new model and architecture for data stream management.vldb Journal, 2003, 12, [Ahmad & Centintemel 2009] Ahmad, Y. & Cetintemel, U. Liu, L. & Özsu, M. T. (ed.) Data Stream Management Architectures and Prototypes. Encyclopedia of Database Systems, Springer, 2009, [Amini et al. 2006] Amini, L.; Andrade, H.; Bhagwan, R.; Eskesen, F.; King, R.; Selo, P.; Park, Y. & Venkatramani, C. SPC: a distributed, scalable platform for data mining. DMSSP '06: Proceedings of the 4th international workshop on Data mining standards, services and platforms, ACM, 2006, [Arasu et al. 2004] Arasu, A.; Babcock, B.; Babu, S.; Cieslewicz, J.; Datar, M.; Ito, K.; Motwani, R.; Srivastava, U. & Widom, J. STREAM: The Stanford Data Stream Management System. Stanford InfoLab, 2004 [Arasu et al. 2006] Arasu, A.; Babu, S. & Widom, J. The CQL continuous query language: semantic foundations and query execution. VLDB Journal, 2006, 15, [Biem et al. 2010] Biem, A.; Bouillet, E.; Feng, H.; Ranganathan, A.; Riabov, A.; Verscheure, O.; Koutsopoulos, H. & Moran, C. IBM InfoSphere Streams for Scalable, Real-Time, Intelligent Transportation Services. Proc. of SIGMOD'10, 2010 [Babcock et al. 2002] Babcock, B.; Babu, S.; Datar, M.; Motwani, R. & Widom, J. Models and Issues in Data Stream Systems. PODS 2002, 2002 [Chandrasekaran et al. 2003] Chandrasekaran, S.; Cooper, O.; Deshpande, A.; Franklin, M. J.; Hellerstein, J. M.; Hong, W.; Krishnamurthy, S.; Madden, S.; Raman, V.; Reiss, F. & Shah, M. A. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Proc. 1st Biennal Conference on Innovative Data Systems Research (CIDR), 2003 [Cherniack et al. 2009] Cherniack, M. & Zdonik, S. Liu, L. & Özsu, M. T. (ed.) Stream-Oriented Query Languages and Architectures. Encyclopedia of Database Systems, Springer, 2009, [Demers et al. 2005] Demers, A.; Gehrke, J.; Hong, M.; Riedewald, M. & White, W. A General Algebra and Implementation for Monitoring Event Streams.. Cornell University, 2005 [Gedik et al. 2008] Gedik, B.; Andrade, H.; Wu, K.-L.; Yu, P. S. & Doo, M. SPADE: the system s declarative stream processing engine. SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, ACM, 2008, [Geisler et al. 2010] Geisler, S.; Quix, C. & Schiffer, S. Ali, M.; Hoel, E. & Shahabi, C. (ed.) A Data Stream-based Evaluation Framework for Traffic Information Systems Proc. 1st ACM SIGSPATIAL International Workshop on GeoStreaming, 2010
46 Slide 46/45 References (2) Girod et al. 2008] Girod, L.; Mei, Y.; Newton, R.; Rost, S.; Thiagarajan, A.; Balakrishnan, H. & Madden, S. XStream: a Signal- Oriented Data Stream Management System. ICDE, 2008, [Golab and Özsu 2003] Golab, L. & Özsu, M. T.Issues in Stream Management SIGMOD Record, 2003, 32, 5-14 [Kang et al. 2003] Kang, J.; Naughton, J. & Viglas, S. Evaluating window joins over unbounded streams Data Engineering, Proceedings. 19th International Conference on, 2003, [Klein et al. 2009] Klein, A. & Lehner, W. Representing Data Quality in Sensor Data Streaming Environments ACM Journal of Data and Information Quality, 2009, 1, 1-28 [Krämer & Seeger 2009] Krämer, J. & Seeger, B.,Semantics and Implementation of Continous Sliding Window Queries over Data Streams. ACM Trans. on Database Systems, 2009, 34, 1-49 [Maier 2005] Maier, D.; Li, J.; Tucker, P.; Tufte, K. & Papadimos, V. Semantics of Data Streams and Operators.ICDT 2005, Springer, 2005, [Mokbel et al. 2004] Mokbel, M. F.; Lu, M. & Aref, W. G. Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results ICDE, 2004 [Patroumpas & Sellis 2005] Patroumpas, K. & Sellis, T. K. Window Specification over Data Streams Current Trends in Database Technology - EDBT 2006 Workshops, 2006, [Peng and Chawathe 2003] Peng, F. & Chawathe., S. S. XSQ: A Streaming XPath Engine Technical Report CS-TR-4493 (UMIACS-TR )., Computer Science Department, University of Maryland, 2003 [Stonebraker et al. 2005] Stonebraker, M.; Çetintemel, U. & Zdonik, S. B. The 8 requirements of real-time stream processing. SIGMOD Record, 2005, 34, [Terry et al. 1992] Terry, D. B.; Goldberg, D.; Nichols, D. A. & Oki, B. M. Stonebraker, M. (ed.) Continuous Queries over Append-Only Databases. Proc. ACM SIGMOD International Conference on Management of Data, ACM Press, 1992, [Thakkar et al. 2008] Thakkar, H.; Mozafari, B. & Zaniolo., C. Designing an Inductive Data Stream Management System. the Stream Mill Experience The Second International Workshop on Scalable Stream Processing Systems, 2008 [Urhan and Franklin 2000] Urhan, T. & Franklin, M. J.XJoin: A Reactively-Scheduled Pipelined Join Operator Bulletin of the IEEE Computer Society Technical Committe on Data Engineering, 2000, 23, [Viglas 2005] Viglas, S. Chaudhry, N. A.; Shaw, K. & Abdelguerfi, M. (ed.) Query Execution and Optimization.Stream Data Management, Springer, 2005, [Zdonik et al. 2004] Zdonik, S.; Sibley, P.; Rasin, A.; Sweetser, V.; Montgomery, P.; Turner, J.; Wicks, J.; Zgolinski, A.; Snyder, D.; Humphrey, M. & Williamson, C. Streaming for Dummies,
Data Stream Management System for Moving Sensor Object Data
SERBIAN JOURNAL OF ELECTRICAL ENGINEERING Vol. 12, No. 1, February 2015, 117-127 UDC: 004.422.635.5 DOI: 10.2298/SJEE1501117J Data Stream Management System for Moving Sensor Object Data Željko Jovanović
Data Stream Management Systems
Data Stream Management Systems Principles of Modern Database Systems 2007 Tore Risch Dept. of information technology Uppsala University Sweden Tore Risch Uppsala University, Sweden What is a Data Base
Flexible Data Streaming In Stream Cloud
Flexible Data Streaming In Stream Cloud J.Rethna Virgil Jeny 1, Chetan Anil Joshi 2 Associate Professor, Dept. of IT, AVCOE, Sangamner,University of Pune, Maharashtra, India 1 Student of M.E.(IT), AVCOE,
STREAM: The Stanford Data Stream Management System
STREAM: The Stanford Data Stream Management System Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Mayur Datar, Keith Ito, Rajeev Motwani, Utkarsh Srivastava, and Jennifer Widom Department
Aurora: a new model and architecture for data stream management
Aurora: a new model and architecture for data stream management Daniel J. Abadi 1, Don Carney 2, Ugur Cetintemel 2, Mitch Cherniack 1, Christian Convey 2, Sangdon Lee 2, Michael Stonebraker 3, Nesime Tatbul
RTSTREAM: Real-Time Query Processing for Data Streams
RTSTREAM: Real-Time Query Processing for Data Streams Yuan Wei Sang H Son John A Stankovic Department of Computer Science University of Virginia Charlottesville, Virginia, 2294-474 E-mail: {yw3f, son,
Preemptive Rate-based Operator Scheduling in a Data Stream Management System
Preemptive Rate-based Operator Scheduling in a Data Stream Management System Mohamed A. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis Department of Computer Science University of Pittsburgh Pittsburgh,
Data Stream Management System
Case Study of CSG712 Data Stream Management System Jian Wen Spring 2008 Northeastern University Outline Traditional DBMS v.s. Data Stream Management System First-generation: Aurora Run-time architecture
Optimizing Timestamp Management in Data Stream Management Systems
Optimizing Timestamp Management in Data Stream Management Systems Yijian Bai [email protected] Hetal Thakkar [email protected] Haixun Wang IBM T. J. Watson [email protected] Carlo Zaniolo [email protected]
Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System
Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System INF5100, Autumn 2006 Jarle Content Introduction Streaming Applications Data Stream
A Review of Window Query Processing for Data Streams
Invited Paper Journal of Computing Science and Engineering, Vol. 7, No. 4, December 2013, pp. 220-230 A Review of Window Query Processing for Data Streams Hyeon Gyu Kim* Division of Computer, Sahmyook
PLACE*: A Distributed Spatio-temporal Data Stream Management System for Moving Objects
PLACE*: A Distributed Spatio-temporal Data Stream Management System for Moving Objects Xiaopeng Xiong Hicham G. Elmongui Xiaoyong Chai Walid G. Aref Department of Computer Sciences, Purdue University,
Chapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
Scaling a Monitoring Infrastructure for the Akamai Network
Scaling a Monitoring Infrastructure for the Akamai Network Thomas Repantis Akamai Technologies [email protected] Scott Smith Formerly of Akamai Technologies [email protected] ABSTRACT We describe the
DataCell: Building a Data Stream Engine on top of a Relational Database Kernel
DataCell: Building a Data Stream Engine on top of a Relational Database Kernel Erietta Liarou (supervised by Martin Kersten) CWI Amsterdam, The Netherlands [email protected] ABSTRACT Stream applications gained
Data Mining for Knowledge Management. Mining Data Streams
Data Mining for Knowledge Management Mining Data Streams Themis Palpanas University of Trento http://dit.unitn.it/~themis Spring 2007 Data Mining for Knowledge Management 1 Motivating Examples: Production
Data Stream Management
Data Stream Management Synthesis Lectures on Data Management Editor M. Tamer Özsu, University of Waterloo Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo. The
Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams
Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams Lukasz Golab M. Tamer Özsu School of Computer Science University of Waterloo Waterloo, Canada {lgolab, tozsu}@uwaterloo.ca
Task Scheduling in Data Stream Processing. Task Scheduling in Data Stream Processing
Task Scheduling in Data Stream Processing Task Scheduling in Data Stream Processing Zbyněk Falt and Jakub Yaghob Zbyněk Falt and Jakub Yaghob Department of Software Engineering, Charles University, Department
Multilevel Secure Data Stream Processing
Multilevel Secure Data Stream Processing Raman Adaikkalavan 1, Indrakshi Ray 2, and Xing Xie 2 1 Computer and Information Science, Indiana University South Bend [email protected] 2 Computer Science, Colorado
2 Associating Facts with Time
TEMPORAL DATABASES Richard Thomas Snodgrass A temporal database (see Temporal Database) contains time-varying data. Time is an important aspect of all real-world phenomena. Events occur at specific points
Survey of Distributed Stream Processing for Large Stream Sources
Survey of Distributed Stream Processing for Large Stream Sources Supun Kamburugamuve For the PhD Qualifying Exam 12-14- 2013 Advisory Committee Prof. Geoffrey Fox Prof. David Leake Prof. Judy Qiu Table
An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams
An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams Susan D. Urban 1, Suzanne W. Dietrich 1, 2, and Yi Chen 1 Arizona
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries
The 8 Requirements of Real-Time Stream Processing
The 8 Requirements of Real-Time Stream Processing Michael Stonebraker Computer Science and Artificial Intelligence Laboratory, M.I.T., and StreamBase Systems, Inc. [email protected] Uğur Çetintemel
Middleware support for the Internet of Things
Middleware support for the Internet of Things Karl Aberer, Manfred Hauswirth, Ali Salehi School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne,
Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg
Data Stream Management and Complex Event Processing in Esper INF5100, Autumn 2010 Jarle Søberg Outline Overview of Esper DSMS and CEP concepts in Esper Examples taken from the documentation A lot of possibilities
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
Load Shedding for Aggregation Queries over Data Streams
Load Shedding for Aggregation Queries over Data Streams Brian Babcock Mayur Datar Rajeev Motwani Department of Computer Science Stanford University, Stanford, CA 94305 {babcock, datar, rajeev}@cs.stanford.edu
Alleviating Hot-Spots in Peer-to-Peer Stream Processing Environments
Alleviating Hot-Spots in Peer-to-Peer Stream Processing Environments Thomas Repantis and Vana Kalogeraki Department of Computer Science & Engineering, University of California, Riverside, CA 92521 {trep,vana}@cs.ucr.edu
DBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner
Understanding SQL Server Execution Plans Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author
SOLE: Scalable On-Line Execution of Continuous Queries on Spatio-temporal Data Streams
vldb manuscript No. (will be inserted by the editor) Mohamed F. Mokbel Walid G. Aref SOLE: Scalable On-Line Execution of Continuous Queries on Spatio-temporal Data Streams the date of receipt and acceptance
Models and Issues in Data Stream Systems
Models and Issues in Data Stream Systems Brian Babcock Shivnath Babu Mayur Datar Rajeev Motwani Jennifer Widom Department of Computer Science Stanford University Stanford, CA 94305 babcock,shivnath,datar,rajeev,widom
Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2012/13 Hadoop Ecosystem Overview of this Lecture Module Background Google MapReduce The Hadoop Ecosystem Core components: Hadoop
Monitoring Streams A New Class of Data Management Applications
Monitoring Streams A New Class of Data Management Applications Don Carney dpc@csbrownedu Uğur Çetintemel ugur@csbrownedu Mitch Cherniack Brandeis University mfc@csbrandeisedu Christian Convey cjc@csbrownedu
PART III. OPS-based wide area networks
PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity
Efficient Data Streams Processing in the Real Time Data Warehouse
Efficient Data Streams Processing in the Real Time Data Warehouse Fiaz Majeed [email protected] Muhammad Sohaib Mahmood [email protected] Mujahid Iqbal [email protected] Abstract Today many business
RUBA: Real-time Unstructured Big Data Analysis Framework
RUBA: Real-time Unstructured Big Data Analysis Framework Jaein Kim, Nacwoo Kim, Byungtak Lee IT Management Device Research Section Honam Research Center, ETRI Gwangju, Republic of Korea jaein, nwkim, [email protected]
Software Engineering
Software Engineering Lecture 06: Design an Overview Peter Thiemann University of Freiburg, Germany SS 2013 Peter Thiemann (Univ. Freiburg) Software Engineering SWT 1 / 35 The Design Phase Programming in
Relational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute
Real-Time Component Software. slide credits: H. Kopetz, P. Puschner
Real-Time Component Software slide credits: H. Kopetz, P. Puschner Overview OS services Task Structure Task Interaction Input/Output Error Detection 2 Operating System and Middleware Applica3on So5ware
The Design of the Borealis Stream Processing Engine
The Design of the Borealis Stream Processing Engine Daniel J. Abadi 1, Yanif Ahmad 2, Magdalena Balazinska 1, Uğur Çetintemel 2, Mitch Cherniack 3, Jeong-Hyon Hwang 2, Wolfgang Lindner 1, Anurag S. Maskey
Database Replication with MySQL and PostgreSQL
Database Replication with MySQL and PostgreSQL Fabian Mauchle Software and Systems University of Applied Sciences Rapperswil, Switzerland www.hsr.ch/mse Abstract Databases are used very often in business
Chapter 1: Introduction
Chapter 1: Introduction Database System Concepts, 5th Ed. See www.db book.com for conditions on re use Chapter 1: Introduction Purpose of Database Systems View of Data Database Languages Relational Databases
Data Stream Processing on Enterprise Integration Platforms
Data Stream Processing on Enterprise Integration Platforms A thesis submitted in part fulfilment of the degree of MSc Advanced Software Engineering in Computer Science with the supervision of Dr. Joseph
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture References Anatomy of a database system. J. Hellerstein and M. Stonebraker. In Red Book (4th
Inside the PostgreSQL Query Optimizer
Inside the PostgreSQL Query Optimizer Neil Conway [email protected] Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of
MOC 20461C: Querying Microsoft SQL Server. Course Overview
MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server
Scalable Distributed Stream Processing
Scalable Distributed Stream Processing Mitch, Cherniack Hari Balakrishnan, Magdalena Balazinska, Don Carney, Uğur Çetintemel, Ying Xing, and Stan Zdonik Abstract Stream processing fits a large class of
BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
www.gr8ambitionz.com
Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1
A Data Management Middleware for ITS Services in Smart Cities
Journal of Universal Computer Science, vol. 22, no. 2 (2016), 228-246 submitted: 19/5/15, accepted: 29/1/16, appeared: 1/2/16 J.UCS A Data Management Middleware for ITS Services in Smart Cities Luca Carafoli
Master s Thesis Nr. 14
Master s Thesis Nr. 14 Systems Group, Department of Computer Science, ETH Zurich Performance Benchmarking of Stream Processing Architectures with Persistence and Enrichment Support by Ilkay Ozan Kaya Supervised
Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks
Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks Benjamin Schiller and Thorsten Strufe P2P Networks - TU Darmstadt [schiller, strufe][at]cs.tu-darmstadt.de
Management of Human Resource Information Using Streaming Model
, pp.75-80 http://dx.doi.org/10.14257/astl.2014.45.15 Management of Human Resource Information Using Streaming Model Chen Wei Chongqing University of Posts and Telecommunications, Chongqing 400065, China
