Data Stream Management



Similar documents
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY. Elena Zheleva Evimaria Terzi Lise Getoor. Privacy in Social Networks

From Immunotherapy of Cancer to the Discovery of Kidney Cancer Genes

Instant Recovery with Write-Ahead Logging Page Repair, System Restart, and Media Restore

The basic data mining algorithms introduced may be enhanced in a number of ways.

Effective Data Cleaning with Continuous Evaluation

Dynamic Data in terms of Data Mining Streams

Data Mining for Knowledge Management. Mining Data Streams

Perspectives on Business Intelligence

A Sequence-Oriented Stream Warehouse Paradigm for Network Monitoring Applications

Guide to Operating SAS IT Resource Management 3.5 without a Middle Tier

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

Principles of Distributed Database Systems

Topics in basic DBMS course

Manifest for Big Data Pig, Hive & Jaql

Report on the Dagstuhl Seminar Data Quality on the Web

An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams

Information Management

Optimizing Timestamp Management in Data Stream Management Systems

Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage

How To Write A Privacy Preserving Firewall Optimization Protocol

Databases in Organizations

Building a Data Warehouse

The Data Quality Continuum*

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS

Load Distribution in Large Scale Network Monitoring Infrastructures

Agile Web Development with Rails 4

International Journal of Advanced Research in Computer Science and Software Engineering

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Understanding traffic flow

Fact Sheet In-Memory Analysis

A Survey on Data Warehouse Constructions, Processes and Architectures

KEYWORD SEARCH IN RELATIONAL DATABASES

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Data Warehousing and OLAP Technology for Knowledge Discovery

White Paper. Quantum StorageCare Guardian

Data Stream Management Systems

Data Management, Analysis Tools, and Analysis Mechanics

LDIF - Linked Data Integration Framework

Data Mining with Big Data e-health Service Using Map Reduce

A STATISTICAL DATA FUSION TECHNIQUE IN VIRTUAL DATA INTEGRATION ENVIRONMENT

Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda

Data Warehouse Design

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

Efficient and Effective Duplicate Detection Evaluating Multiple Data using Genetic Algorithm

SAS IT Resource Management 3.2

Management of Human Resource Information Using Streaming Model

SQL Server 2012 Business Intelligence Boot Camp

Data Warehousing with Oracle

High-Volume Data Warehousing in Centerprise. Product Datasheet

Data Migration. How CXAIR can be used to improve the efficiency and accuracy of data migration. A CXAIR White Paper.

The Data Analytics Group at the Qatar Computing Research Institute

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

An Algorithm to Evaluate Iceberg Queries for Improving The Query Performance

Modelling Architecture for Multimedia Data Warehouse

SYSPRO Point of Sale: Architecture

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

Introduction to Data Mining

SPATIAL DATA CLASSIFICATION AND DATA MINING

The University of Jordan

Managing Data in Motion

University Data Warehouse Design Issues: A Case Study

résumé de flux de données

A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

Contents RELATIONAL DATABASES

Data Integration and ETL Process

A UPS Framework for Providing Privacy Protection in Personalized Web Search

Library Requirements

Building Data Warehouse

SimCorp Solution Guide

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Decision Support and Business Intelligence Systems. Chapter 1: Decision Support Systems and Business Intelligence

Analytics: Pharma Analytics (Siebel 7.8) Student Guide

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

Data Integration and ETL Process

Scenario 2: Cognos SQL and Native SQL.

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

Data Mining System, Functionalities and Applications: A Radical Review

A Survey on Data Warehouse Architecture

The 4 Pillars of Technosoft s Big Data Practice

Implementing a Data Warehouse with Microsoft SQL Server 2012

Turkish Journal of Engineering, Science and Technology

A New Era Of Analytic

Transcription:

Data Stream Management

Synthesis Lectures on Data Management Editor M. Tamer Özsu, University of Waterloo Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo. The series will publish 50- to 125 page publications on topics pertaining to data management. The scope will largely follow the purview of premier information and computer science conferences, such as ACM SIGMOD, VLDB, ICDE, PODS, ICDT, and ACM KDD. Potential topics include, but not are limited to: query languages, database system architectures, transaction management, data warehousing, XML and databases, data stream systems, wide scale data distribution, multimedia data management, data mining, and related subjects. Data Stream Management Lukasz Golab and M. Tamer Özsu 2010 Access Control in Data Management Systems Elena Ferrari 2010 An Introduction to Duplicate Detection Felix Naumann and Melanie Herschel 2010 Privacy-Preserving Data Publishing: An Overview Raymond Chi-Wing Wong and Ada Wai-Chee Fu 2010 Keyword Search in Databases Jeffrey Xu Yu, Lu Qin, and Lijun Chang 2009

Copyright 2010 AT&T Labs, Inc. and M. Tamer Özsu. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means electronic, mechanical, photocopy, recording, or any other except for brief quotations in printed reviews, without the prior permission of the publisher. Data Stream Management Lukasz Golab and M. Tamer Özsu www.morganclaypool.com ISBN: 9781608452729 ISBN: 9781608452736 paperback ebook DOI 10.2200/S00284ED1V01Y201006DTM005 A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON DATA MANAGEMENT Lecture #5 Series Editor: M. Tamer Özsu, University of Waterloo Series ISSN Synthesis Lectures on Data Management Print 2153-5418 Electronic 2153-5426

Data Stream Management Lukasz Golab AT&T Labs Research, USA M. Tamer Özsu University of Waterloo, Canada SYNTHESIS LECTURES ON DATA MANAGEMENT #5 & M C Morgan & claypool publishers

ABSTRACT In this lecture many applications process high volumes of streaming data, among them Internet traffic analysis, financial tickers, and transaction log mining. In general, a data stream is an unbounded data set that is produced incrementally over time, rather than being available in full before its processing begins. In this lecture, we give an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis. We will discuss two types of systems for end-to-end stream processing: Data Stream Management Systems (DSMSs) and Streaming Data Warehouses (SDWs). A traditional database management system typically processes a stream of ad-hoc queries over relatively static data. In contrast, a DSMS evaluates static (long-running) queries on streaming data, making a single pass over the data and using limited working memory. In the first part of this lecture, we will discuss research problems in DSMSs, such as continuous query languages, non-blocking query operators that continually react to new data, and continuous query optimization. The second part covers SDWs, which combine the real-time response of a DSMS by loading new data as soon as they arrive with a data warehouse s ability to manage Terabytes of historical data on secondary storage. KEYWORDS Data stream Management Systems, Stream Processing, Continuous Queries, Streaming Data Warehouses

vii Contents 1 Introduction...1 1.1 Overview of Data Stream Management...1 1.2 Organization...6 2 Data Stream Management Systems...9 2.1 Preliminaries...9 2.1.1 Stream Models...9 2.1.2 Stream Windows...11 2.2 Continuous Query Semantics and Operators...12 2.2.1 Semantics and Algebras...12 2.2.2 Operators...13 2.2.3 Continuous Queries as Views...17 2.2.4 Semantics of Relations in Continuous Queries...18 2.3 Continuous Query Languages...19 2.3.1 Streams, Relations and Windows...19 2.3.2 User-Defined Functions...21 2.3.3 Sampling...22 2.3.4 Summary...22 2.4 Stream Query Processing...23 2.4.1 Scheduling...23 2.4.2 Heartbeats and Punctuations...24 2.4.3 Processing Queries-As-Views and Negative Tuples...26 2.5 Stream Query Optimization...29 2.5.1 Static Analysis and Query Rewriting...29 2.5.2 Operator Optimization - Join...30 2.5.3 Operator Optimization - Aggregation...31

viii CONTENTS 2.5.4 Multi-Query Optimization...33 2.5.5 Load Shedding and Approximation...34 2.5.6 Load Balancing...35 2.5.7 Adaptive Query Optimization...35 2.5.8 Distributed Query Optimization...36 3 Streaming Data Warehouses...39 3.1 Data Extraction, Transformation and Loading...39 3.2 Update Propagation...40 3.3 Data Expiration...42 3.4 Update Scheduling...43 3.5 Querying a Streaming Data Warehouse...45 4 Conclusions...47 Bibliography...49 Authors Biographies...65