Data Stream Management

Size: px
Start display at page:

Download "Data Stream Management"

Transcription

1 Data Stream Management Yanlei Diao University of Massachusetts Amherst Slides courtesy of Michael Franklin and Jianjun Chen

2 Data Stream Management Two driving forces:! A collection of applications where data streams naturally exist but DBMS doesn t help much! Advances of sensor technologies 11/6/13 2

3 Financial Applications v Financial services Data feeds: stock tickers, foreign exchange transactions Data rate: 10 s or 100 s thousands of messages per second Applications: routing trade requests, automating trade strategies, market trend analysis Stream systems: e.g., /6/13 3

4 Network and System Monitoring v Network monitoring Packet traces, network performance measurements Data rate: gigabits per second Applications: traffic analysis, performance monitoring, router configuration, intrusion detection Stream systems: e.g., Gigascope at AT&T v System/Application monitoring Data: system log, measurements Stream systems: e.g., Ganglia 11/6/13 4

5 Wireless Sensor Networks v Wireless sensor networks Sensor devices: temperature, light, pressure, acceleration, humidity, magnetic field, A set of sensor devices auto-configure themselves into a communication network Applications: environment monitoring habitat monitoring structural monitoring vehicle tracking 11/6/13 5

6 Radio Frequency Identification v RFID Technology Tags: small devices, transmitting a tag id when brought close to an RFID reader Readers: interrogate tags with radio signal; read constantly, over a range, without line-of-sight Applications: supply chain management healthcare pharmaceuticals postal services Stream systems: e.g., E.001.F EF.0A D.01 reader_id, tag_id, timestamp 11/6/13 6

7 Execution Models: A Comparison Traditional DBMS Data Stream System Results Results Query Attr1 Attr2 Attr3 Data Queries, Rules, Event Specs, Subscriptions Data at rest One-time or periodic queries Query-driven execution Results returned once or periodically Data in motion, unending Continuous, long-running queries Data-driven execution Results returned continuously in real-time 11/6/13 7

8 Latency Time-critical Decisions Value of Data to Decision-Making Preventive/ Predictive Actionable Information Half-Life In Decision-Making Traditional Batch Business Intelligence Reactive Historical Real- Time Seconds Minutes Hours Days Time Slide Courtesy of Michael Franklin Months 11/6/13 8 8

9 History of Stream Processing v Early: Active DBs, ECA rules, Triggers, Data Broadcast v Field took off around 2002 ~10 SIGMOD/VLDB/ICDE stream papers thru since then (holding steady at 40-50/yr) v Lots of Research Systems-Building Projects AT&T Gigascope Berkeley TelegraphCQ->HIFI Brandeis, Brown, MIT - Aurora->Borealis Purdue Nile Stanford STREAM Wisconsin, Portland State NiagaraCQ v Lots of Technology Transfer: e.g., 11/6/13 9

10 Main Issues of Stream Management v Query languages v Single query processing v Multi-query optimization v Adaptive query processing v Real-time processing v Approximate answers v Out of order processing ü Sliding windows ü Non blocking, windowed operations ü Sharing ü Adaptivity ü Quality of service ü Confidence, sketching, sampling, wavelets... ü Punctuation, order-agnostic 11/6/13 10

11 Data Stream Model v A New Data Model: an infinite sequence of elements An element has a system time and an application time Elements arrive in order of system time (arrival order) If an application wants data in application time, need buffering & sorting If data comes out of order or very late, more work is needed to take care the order. (out of order processing) 11/6/13 11

12 Stream Query Languages v Extensions of SQL: SEQUIN [SLR 96], CQL [ABW03], TCQ [CCD +03], GSQL [CJS+03] v Continuous Query Language (CQL) FROM can address both streams and relations PARTITION BY creates sub-streams Window: extracts a set of tuples from an infinite stream and applies relational processing ROW, physical window in number of tuples; RANGE, logical window in terms of seconds, minutes, or hours (more generally, positions in an ordered domain) SLIDE, window movement. 11/6/13 12

13 Examples of CQL Sliding window join: Select * From S1 [Rows 5], S2 [Rows 10] Where S1.a = S2.a Streaming aggregates: Select Count(*) From S [Range 1 Minute] Select Count(*) From S [Range 1 Minute Slide 1 Minute] 11/6/13 13

14 Example Windows Tumbling: Non-overlapping windows, e.g., 5 second windows <Range 5 seconds Slide 5 seconds> Data Stream Sliding: Overlapping windows, e.g., 5 second windows move every 2 seconds < Range 5 seconds Slide 2 seconds> Data Stream Landmark: Growing windows, e.g., extend window every 2 seconds <LANDMARK ADVANCE 2 sec> [Truviso Language] Data Stream Slide Courtesy of Michael Franklin 11/6/13 14

15 Example Windows Partitioned: value-based sub-streams e.g. <Partition By attr Rows 2 Slide 2 > partitioned tumbling windows Data Stream Red Stream Blue Stream Green Stream Slide Courtesy of Michael Franklin 11/6/13 15

16 Formal Semantics v At time t, there is a window R_t, run query to obtain Q(R_t), t=1,2,3 v Collapse the results at adjacent time points based on identical values v Use Istream and Dstream to ensure that the output also conforms to the stream model 11/6/13 16

17 Main Issues of Stream Management v Query languages v Single query processing v Multi-query optimization v Adaptive query processing v Real-time processing v Approximate answers v Out of order processing ü Sliding windows ü Non blocking, windowed operations ü Sharing ü Adaptivity ü Quality of service ü Confidence, sketching, sampling, wavelets... ü Punctuation, order-agnostic 11/6/13 17

18 Single Query Processing v Blocking query operator A query operator that is unable to produce the first tuple of its output until it has seen its entire input. [BBD+02] Sorting? Sort-merge join? Nested-loops join? Hash join? v Non blocking query operator The operator does not stage data (either in memory or on disk) without producing results for a long time. [UF01] produces the initial result early returns result tuples incrementally as they become available 11/6/13 18

19 (1) Join: First A Blocking Operator Hash Table A Build Probe Source A Source B Assume first that each stream is finite and fits in memory. Traditional Hash Join blocks when one input stalls. 11/6/13 19

20 Non-Blocking Join Operator Hash Hash Hash Table ATable ATable B Build Probe Source A Source B Symmetric Hash Join processes tuples as they arrive. Produces all tuples in the join and no duplicates. Blocks only if both inputs stall. 11/6/13 20

21 Non Blocking Join with Limited Memory v SHJ requires both inputs to be memory resident. Unrealistic in stream processing v XJoin extends SHJ to work for long (but finite) streams with limited memory [UF00]. Partition each input using a hash function. When allocated memory is used up, flush a partition to disk. At each point, a partition = disk-resident data (older) + memory-resident data (newer). Join processing continues on memory-resident data. Disk-resident tuples are handled in background. 11/6/13 21

22 XJoin: Handling the Partitions M E M O R Y 1 Memory-resident partitions of source A n Memory-resident partitions of source B 1 k n flush Tuple A hash(tuple A) = 1 D I S K 1... n 1... k... n Tuple B hash(tuple B) = n SOURCE-A Disk-resident partitions of source A Disk-resident partitions of source B SOURCE-B 11/6/13 22

23 Stage 1: Memory-to-Memory Joins Output Partitions of source A Partitions of source B M E M O R Y i j i j build probe probe build Tuple A hash(record A) = i Tuple B hash(record B) = j SOURCE-A SOURCE-B 11/6/13 23

24 Stage 2: Disk-to-Memory Joins Output Partitions of source A Partitions of source B M E M O R Y i i i i..... D I S K Partitions of source A Partitions of source B 11/6/13 24

25 Three Stages of XJoin v Stage 1 - Symmetric hash join (memory-tomemory) with partitions v Stage 2- Disk-to-memory Separate thread - runs when stage 1 blocks (both input stall). Stage 1 and 2 interleave until all tuples have been received. v Stage 3 - Clean up stage Stage 1 misses pairs that were not in memory concurrently. Stage 2 misses pairs when both are on disk, and may not get to run to completion. Stage 3 joins all the partitions (memory-resident and diskresident portions) of the two sources. 11/6/13 25

26 Windowed Joins v Windowed joins Use windows to obtain a finite set from an infinite stream Basis is non-blocking symmetric hash join Add support for windows Window size: in number of tuples or logical time Sliding of window: smooth movement or hopping Maintain right state (tuples residing in the window) for the join; need to prune expired tuples from hash tables! [MAF+03, GO03, KNV03, MLA04] 11/6/13 26

27 (2) Windowed Aggregates v Windowed aggregates: Use windows to obtain a finite set from an infinite stream Apply MIN, MAX, SUM, COUNT, or AVG to each window Avoid scanning all tuples in each window for time efficiency Avoid storing all tuples in each window for space efficiency What is the time/space complexity for each aggr operator? v Windowed SUM/COUNT Symmetry between expired tuples and new tuples [SLR96] v Windowed MIN/MAX/TOP-K A naïve solution may have to store all tuples in each window How can we do better? [LMT+05, KWF+06, MBP+06, ZYC+07, YSR+11] 11/6/13 27

28 Predictability Property of Windows v When a tuple arrives, can predict 1) to which future windows the tuple will belong 2) whether the tuple belongs to the partial result (e.g., top-k) of each window These insights lead to a predicted top-k result for each of the future windows v When a new tuple arrives, it belongs to its own set of current and future windows it is also used to update the previously predicted top-k result for each relevant window 11/6/13 28

29 et. top-k ecifiw, the in the needs in the pted: ously ofall using putaategy. s topance ation. caners of be rensive tream tain a could ost of indicates its F score, while the position on the X-axis indicates the time it arrived at the system. Figure 1: (Predicted) Top-k results four consecutive windows at time of W 0 (slide size = 4 objects) Yang Yanlei et al. Diao, An Optimal University Strategy of Massachusetts for Monitoring Amherst Top-K Queries in 11/6/13 Streaming Windows. EDBT Based on the objects in W 0,wenotonlycalculatethetop-k(k=3) result in W 0,butwealsopre-determinethepotentialtop-kresults

30 updated predicted top-k results of our running example (Figure 1) after the insertion of four new objects. ject 14 is and thus f For the jects to m The size f Theore jects in pr Figure 2: Updated predicted top-k results of four consecutive windows at time of W 1 (slide size = 4 objects) Yang Yanlei et al. Diao, An Optimal University Strategy of Massachusetts for Monitoring Amherst Top-K Queries in 11/6/13 Streaming Windows. EDBT For the is simply each of th significan especially summary, C nw k Conclu fered by t keeps the monitorin very large processing no longer ment over cessing an tion are re

Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine

Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine University of Oslo Department of Informatics Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine Master s Thesis 30 Credits Morten Lindeberg May

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Electric Power Consumption Data

Electric Power Consumption Data Using Data Stream Management Systems to analyze Electric Power Consumption Data Talel Abdessalem, Raja Chiky, Georges Hébrail and Jean-Louis Vitti Ecole Nationale Supérieure des Télécommunications, Electricité

More information

See the wood for the trees

See the wood for the trees See the wood for the trees Dr. Harald Schöning Head of Research The world is becoming digital socienty government economy Digital Society Digital Government Digital Enterprise 2 Data is Getting Bigger

More information

Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman

Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman 2 In a DBMS, input is under the control of the programming staff. SQL INSERT commands or bulk loaders. Stream management is important

More information

Data Mining for Knowledge Management. Mining Data Streams

Data Mining for Knowledge Management. Mining Data Streams Data Mining for Knowledge Management Mining Data Streams Themis Palpanas University of Trento http://dit.unitn.it/~themis Spring 2007 Data Mining for Knowledge Management 1 Motivating Examples: Production

More information

Models and Issues in Data Stream Systems

Models and Issues in Data Stream Systems Models and Issues in Data Stream Systems Brian Babcock Shivnath Babu Mayur Datar Rajeev Motwani Jennifer Widom Department of Computer Science Stanford University Stanford, CA 94305 babcock,shivnath,datar,rajeev,widom

More information

Supporting Views in Data Stream Management Systems

Supporting Views in Data Stream Management Systems 1 Supporting Views in Data Stream Management Systems THANAA M. GHANEM University of St. Thomas AHMED K. ELMAGARMID Purdue University PER-ÅKE LARSON Microsoft Research and WALID G. AREF Purdue University

More information

Data Stream Management Systems

Data Stream Management Systems Data Stream Management Systems Principles of Modern Database Systems 2007 Tore Risch Dept. of information technology Uppsala University Sweden Tore Risch Uppsala University, Sweden What is a Data Base

More information

Data Stream Management

Data Stream Management Data Stream Management Synthesis Lectures on Data Management Editor M. Tamer Özsu, University of Waterloo Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo. The

More information

ucosminexus Stream Data Platform - Application Framework Description 3020-3-V01(E)

ucosminexus Stream Data Platform - Application Framework Description 3020-3-V01(E) ucosminexus Stream Data Platform - Application Framework Description 3020-3-V01(E) Relevant program products P-2464-9B17 ucosminexus Stream Data Platform - Application Framework 01-00 (for Windows Server

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

SASE: Complex Event Processing over Streams

SASE: Complex Event Processing over Streams SASE: Complex Processing over Streams Daniel Gyllstrom 1 Eugene Wu 2 Hee-Jin Chae 1 Yanlei Diao 1 Patrick Stahlberg 1 Gordon Anderson 1 1 Department of Computer Science University of Massachusetts, Amherst

More information

Web Traffic Capture. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com

Web Traffic Capture. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com Web Traffic Capture Capture your web traffic, filtered and transformed, ready for your applications without web logs or page tags and keep all your data inside your firewall. 5401 Butler Street, Suite

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

Active RFID Solutions for Asset Tracking and Inventory Management

Active RFID Solutions for Asset Tracking and Inventory Management Active RFID Solutions for Asset Tracking and Inventory Management Introduction RFID (Radio Frequency Identification) technology is fast replacing ScanCode technology for asset tracking and inventory management.

More information

Complex Event Processing (CEP) Why and How. Richard Hallgren BUGS 2013-05-30

Complex Event Processing (CEP) Why and How. Richard Hallgren BUGS 2013-05-30 Complex Event Processing (CEP) Why and How Richard Hallgren BUGS 2013-05-30 Objectives Understand why and how CEP is important for modern business processes Concepts within a CEP solution Overview of StreamInsight

More information

Aurora: a new model and architecture for data stream management

Aurora: a new model and architecture for data stream management Aurora: a new model and architecture for data stream management Daniel J. Abadi 1, Don Carney 2, Ugur Cetintemel 2, Mitch Cherniack 1, Christian Convey 2, Sangdon Lee 2, Michael Stonebraker 3, Nesime Tatbul

More information

Middleware support for the Internet of Things

Middleware support for the Internet of Things Middleware support for the Internet of Things Karl Aberer, Manfred Hauswirth, Ali Salehi School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne,

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

Big Data and Advanced Analytics Technologies for the Smart Grid

Big Data and Advanced Analytics Technologies for the Smart Grid 1 Big Data and Advanced Analytics Technologies for the Smart Grid Arnie de Castro, PhD SAS Institute IEEE PES 2014 General Meeting July 27-31, 2014 Panel Session: Using Smart Grid Data to Improve Planning,

More information

Data Mining on Streams

Data Mining on Streams Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs

More information

Solving big data problems in real-time with CEP and Dashboards - patterns and tips

Solving big data problems in real-time with CEP and Dashboards - patterns and tips September 10-13, 2012 Orlando, Florida Solving big data problems in real-time with CEP and Dashboards - patterns and tips Kevin Wilson Karl Kwong Learning Points Big data is a reality and organizations

More information

Architectural Considerations for Distributed RFID Tracking and Monitoring. Department of Computer Science Univeristy of Massachusetts Amherst

Architectural Considerations for Distributed RFID Tracking and Monitoring. Department of Computer Science Univeristy of Massachusetts Amherst Architectural Considerations for Distributed RFID Tracking and Monitoring Zhao Cao, Yanlei Diao, and Prashant Shenoy Department of Computer Science Univeristy of Massachusetts Amherst Abstract In this

More information

Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System

Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System INF5100, Autumn 2006 Jarle Content Introduction Streaming Applications Data Stream

More information

RFID BASED VEHICLE TRACKING SYSTEM

RFID BASED VEHICLE TRACKING SYSTEM RFID BASED VEHICLE TRACKING SYSTEM Operating a managed, busy parking lot can pose significant challenges, especially to a government organization that also owns some of the vehicles in the lot. The parking

More information

WAMLocal. Wireless Asset Monitoring - Local Food Safety Software. Software Installation and User Guide BA/WAM-L-F

WAMLocal. Wireless Asset Monitoring - Local Food Safety Software. Software Installation and User Guide BA/WAM-L-F Wireless Asset Monitoring - Local Food Safety Software BA/WAM-L-F Software Installation and User Guide System Overview The BAPI Wireless Asset Monitoring Local (WAM Local) Software receives temperature

More information

Data Management in Sensor Networks

Data Management in Sensor Networks Data Management in Sensor Networks Ellen Munthe-Kaas Jarle Søberg Hans Vatne Hansen INF5100 Autumn 2011 1 Outline Sensor networks Characteristics TinyOS TinyDB Motes Application domains Data management

More information

High-Volume Data Warehousing in Centerprise. Product Datasheet

High-Volume Data Warehousing in Centerprise. Product Datasheet High-Volume Data Warehousing in Centerprise Product Datasheet Table of Contents Overview 3 Data Complexity 3 Data Quality 3 Speed and Scalability 3 Centerprise Data Warehouse Features 4 ETL in a Unified

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

Big Data and Scripting map/reduce in Hadoop

Big Data and Scripting map/reduce in Hadoop Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb

More information

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries

More information

Availability Digest. www.availabilitydigest.com. Raima s High-Availability Embedded Database December 2011

Availability Digest. www.availabilitydigest.com. Raima s High-Availability Embedded Database December 2011 the Availability Digest Raima s High-Availability Embedded Database December 2011 Embedded processing systems are everywhere. You probably cannot go a day without interacting with dozens of these powerful

More information

What is LOG Storm and what is it useful for?

What is LOG Storm and what is it useful for? What is LOG Storm and what is it useful for? LOG Storm is a high-speed digital data logger used for recording and analyzing the activity from embedded electronic systems digital bus and data lines. It

More information

Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007

Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007 Combining Sequence Databases and Data Stream Management Systems Technical Report Philipp Bichsel ETH Zurich, 2-12-2007 Abstract This technical report explains the differences and similarities between the

More information

Java DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860

Java DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860 Java DB Performance Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860 AGENDA > Java DB introduction > Configuring Java DB for performance > Programming tips > Understanding Java DB performance

More information

16.1 MAPREDUCE. For personal use only, not for distribution. 333

16.1 MAPREDUCE. For personal use only, not for distribution. 333 For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several

More information

Security Analytics Topology

Security Analytics Topology Security Analytics Topology CEP = Stream Analytics Hadoop = Batch Analytics Months to years LOGS PKTS Correlation with Live in Real Time Meta, logs, select payload Decoder Long-term, intensive analysis

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

Partitioning under the hood in MySQL 5.5

Partitioning under the hood in MySQL 5.5 Partitioning under the hood in MySQL 5.5 Mattias Jonsson, Partitioning developer Mikael Ronström, Partitioning author Who are we? Mikael is a founder of the technology behind NDB

More information

Network congestion control using NetFlow

Network congestion control using NetFlow Network congestion control using NetFlow Maxim A. Kolosovskiy Elena N. Kryuchkova Altai State Technical University, Russia Abstract The goal of congestion control is to avoid congestion in network elements.

More information

Data management in Wireless Sensor Networks (WSN)

Data management in Wireless Sensor Networks (WSN) Data management in Wireless Sensor Networks (WSN) Giuseppe Amato ISTI-CNR giuseppe.amato@isti.cnr.it Outline Data management in WSN Query processing in WSN State of the art Future research directions Outline

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

An Intelligent Middleware Platform and Framework for RFID Reverse Logistics

An Intelligent Middleware Platform and Framework for RFID Reverse Logistics International Journal of Future Generation Communication and Networking 75 An Intelligent Middleware Platform and Framework for RFID Reverse Logistics Jihyun Yoo, and Yongjin Park Department of Electronics

More information

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data : High-throughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Real-time analytical processing (RTAP) of vast amounts of time-series data from sensors, server logs,

More information

A Platform for Scalable One-Pass Analytics using MapReduce

A Platform for Scalable One-Pass Analytics using MapReduce A Platform for Scalable One-Pass Analytics using MapReduce Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant Shenoy Department of Computer Science University of Massachusetts, Amherst, Massachusetts,

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

RFID 101: Using RFID to Manage School Assets and Achieve Huge Savings

RFID 101: Using RFID to Manage School Assets and Achieve Huge Savings RFID 101: Using RFID to Manage School Assets and Achieve Huge Savings Are You Missing Out On Huge Savings through Better Asset Management? Many schools around the country have implemented wireless networking

More information

Performance Counters. Microsoft SQL. Technical Data Sheet. Overview:

Performance Counters. Microsoft SQL. Technical Data Sheet. Overview: Performance Counters Technical Data Sheet Microsoft SQL Overview: Key Features and Benefits: Key Definitions: Performance counters are used by the Operations Management Architecture (OMA) to collect data

More information

Limitations of Packet Measurement

Limitations of Packet Measurement Limitations of Packet Measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing

More information

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able

More information

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

Implementation of a Hardware Architecture to Support High-speed Database Insertion on the Internet

Implementation of a Hardware Architecture to Support High-speed Database Insertion on the Internet Implementation of a Hardware Architecture to Support High-speed Database Insertion on the Internet Yusuke Nishida 1 and Hiroaki Nishi 1 1 A Department of Science and Technology, Keio University, Yokohama,

More information

Configuring Apache Derby for Performance and Durability Olav Sandstå

Configuring Apache Derby for Performance and Durability Olav Sandstå Configuring Apache Derby for Performance and Durability Olav Sandstå Sun Microsystems Trondheim, Norway Agenda Apache Derby introduction Performance and durability Performance tips Open source database

More information

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

Relational Database: Additional Operations on Relations; SQL

Relational Database: Additional Operations on Relations; SQL Relational Database: Additional Operations on Relations; SQL Greg Plaxton Theory in Programming Practice, Fall 2005 Department of Computer Science University of Texas at Austin Overview The course packet

More information

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework

More information

Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg

Data Stream Management and Complex Event Processing in Esper. INF5100, Autumn 2010 Jarle Søberg Data Stream Management and Complex Event Processing in Esper INF5100, Autumn 2010 Jarle Søberg Outline Overview of Esper DSMS and CEP concepts in Esper Examples taken from the documentation A lot of possibilities

More information

Accelerating Development and Troubleshooting of Data Center Bridging (DCB) Protocols Using Xgig

Accelerating Development and Troubleshooting of Data Center Bridging (DCB) Protocols Using Xgig White Paper Accelerating Development and Troubleshooting of The new Data Center Bridging (DCB) protocols provide important mechanisms for enabling priority and managing bandwidth allocations between different

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems. Active database systems Database Management Systems Traditional DBMS operation is passive Queries and updates are explicitly requested by users The knowledge of processes operating on data is typically

More information

Container tracking solution. HUG DIJON 2006 Jean-Christophe Lecosse

Container tracking solution. HUG DIJON 2006 Jean-Christophe Lecosse 1 Container tracking solution HUG DIJON 2006 Jean-Christophe Lecosse Contents Geodis Global logistic provider Expertise Healthcare Our RFID approach Optitr@ck: container tracking solution CHU Dijon project

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC MyOra 3.0 SQL Tool for Oracle User Guide Jayam Systems, LLC Contents Features... 4 Connecting to the Database... 5 Login... 5 Login History... 6 Connection Indicator... 6 Closing the Connection... 7 SQL

More information

Introduction to Engineering Using Robotics Experiments Lecture 17 Big Data

Introduction to Engineering Using Robotics Experiments Lecture 17 Big Data Introduction to Engineering Using Robotics Experiments Lecture 17 Big Data Yinong Chen 2 Big Data Big Data Technologies Cloud Computing Service and Web-Based Computing Applications Industry Control Systems

More information

Chuck Cranor, Ted Johnson, Oliver Spatscheck

Chuck Cranor, Ted Johnson, Oliver Spatscheck Gigascope: How to monitor network traffic 5Gbit/sec at a time. Chuck Cranor, Ted Johnson, Oliver Spatscheck June, 2003 1 Outline Motivation Illustrative applications Gigascope features Gigascope technical

More information

Radio Frequency Identification (RFID) An Overview

Radio Frequency Identification (RFID) An Overview Radio Frequency Identification (RFID) An Overview How RFID Is Changing the Business Environment Today Radio frequency identification (RFID) technology has been in use for several decades to track and identify

More information

Warehouse Management System

Warehouse Management System Warehouse Management System Solvo's WMS is an application suite designed to optimize warehouse operations. Solvo's solutions manage the entire warehouse operation cycle in a real time mode. The System

More information

FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data

FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez To cite this version: Miguel Liroz-Gistau, Reza Akbarinia, Patrick Valduriez. FP-Hadoop:

More information

Adaptive Join Operators for Result Rate Optimization on Streaming Inputs

Adaptive Join Operators for Result Rate Optimization on Streaming Inputs 1 Adaptive Join Operators for Result Rate Optimization on Streaming Inputs Mihaela A. Bornea, Vasilis Vassalos, Yannis Kotidis #1, Antonios Deligiannakis 2 # Department of Informatics, Department of Electronic

More information

Software Requirements Specification. Schlumberger Scheduling Assistant. for. Version 0.2. Prepared by Design Team A. Rice University COMP410/539

Software Requirements Specification. Schlumberger Scheduling Assistant. for. Version 0.2. Prepared by Design Team A. Rice University COMP410/539 Software Requirements Specification for Schlumberger Scheduling Assistant Page 1 Software Requirements Specification for Schlumberger Scheduling Assistant Version 0.2 Prepared by Design Team A Rice University

More information

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011 Real-time Analytics at Facebook: Data Freeway and Puma Zheng Shao 12/2/2011 Agenda 1 Analytics and Real-time 2 Data Freeway 3 Puma 4 Future Works Analytics and Real-time what and why Facebook Insights

More information

Using SQL Server 2014 In-Memory Optimized Columnstore with SAP BW

Using SQL Server 2014 In-Memory Optimized Columnstore with SAP BW Using SQL Server 2014 In-Memory Optimized Columnstore with SAP BW Applies to: SAP Business Warehouse 7.0 and higher running on Microsoft SQL Server 2014 and higher Summary SQL Server 2014 In-Memory Optimized

More information

Analysis of QoS parameters of VOIP calls over Wireless Local Area Networks

Analysis of QoS parameters of VOIP calls over Wireless Local Area Networks Analysis of QoS parameters of VOIP calls over Wireless Local Area Networks Ayman Wazwaz, Computer Engineering Department, Palestine Polytechnic University, Hebron, Palestine, aymanw@ppu.edu Duaa sweity

More information

Network Simulation Traffic, Paths and Impairment

Network Simulation Traffic, Paths and Impairment Network Simulation Traffic, Paths and Impairment Summary Network simulation software and hardware appliances can emulate networks and network hardware. Wide Area Network (WAN) emulation, by simulating

More information

DSMS Part 1 & 2. Data Management: Comparison - DBS versus DSMS. Handle Data Streams in DBS?

DSMS Part 1 & 2. Data Management: Comparison - DBS versus DSMS. Handle Data Streams in DBS? DSMS Part 1 & 2 Data Stream Management Systems - Applications, Concepts, and Systems Vera Goebel Department of Informatics University of Oslo 1 Introduction: What are DSMS? (terms) Why do we need DSMS?

More information

Detecting Network Anomalies. Anant Shah

Detecting Network Anomalies. Anant Shah Detecting Network Anomalies using Traffic Modeling Anant Shah Anomaly Detection Anomalies are deviations from established behavior In most cases anomalies are indications of problems The science of extracting

More information

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015 Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours

More information

There are numerous ways to access monitors:

There are numerous ways to access monitors: Remote Monitors REMOTE MONITORS... 1 Overview... 1 Accessing Monitors... 1 Creating Monitors... 2 Monitor Wizard Options... 11 Editing the Monitor Configuration... 14 Status... 15 Location... 17 Alerting...

More information

Inge Os Sales Consulting Manager Oracle Norway

Inge Os Sales Consulting Manager Oracle Norway Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database

More information

Data Integrator Performance Optimization Guide

Data Integrator Performance Optimization Guide Data Integrator Performance Optimization Guide Data Integrator 11.7.2 for Windows and UNIX Patents Trademarks Copyright Third-party contributors Business Objects owns the following

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb, Consulting MTS The following is intended to outline our general product direction. It is intended for information

More information

WHITE PAPER. ABCs of RFID

WHITE PAPER. ABCs of RFID WHITE PAPER ABCs of RFID Understanding and using Radio Frequency Identification Basics - Part 1 B.Muthukumaran Chief Consultant Innovation & Leadership Gemini Communication Ltd #1, Dr.Ranga Road, 2nd Street,

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com NetFlow Tracker Overview Mike McGrath x ccie CTO mike@crannog-software.com 2006 Copyright Crannog Software www.crannog-software.com 1 Copyright Crannog Software www.crannog-software.com 2 LEVELS OF NETWORK

More information

System Services. Engagent System Services 2.06

System Services. Engagent System Services 2.06 System Services Engagent System Services 2.06 Overview Engagent System Services constitutes the central module in Engagent Software s product strategy. It is the glue both on an application level and on

More information

ImagineWorldClient Client Management Software. User s Manual. (Revision-2)

ImagineWorldClient Client Management Software. User s Manual. (Revision-2) ImagineWorldClient Client Management Software User s Manual (Revision-2) (888) 379-2666 US Toll Free (905) 336-9665 Phone (905) 336-9662 Fax www.videotransmitters.com 1 Contents 1. CMS SOFTWARE FEATURES...4

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla avic@v-secure.com Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

CIS 631 Database Management Systems Sample Final Exam

CIS 631 Database Management Systems Sample Final Exam CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska (magda) Fall 2007 Lecture 1 - Class Introduction

CSE 544 Principles of Database Management Systems. Magdalena Balazinska (magda) Fall 2007 Lecture 1 - Class Introduction CSE 544 Principles of Database Management Systems Magdalena Balazinska (magda) Fall 2007 Lecture 1 - Class Introduction Outline Introductions Class overview What is the point of a db management system

More information

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich

Technical Challenges for Big Health Care Data. Donald Kossmann Systems Group Department of Computer Science ETH Zurich Technical Challenges for Big Health Care Data Donald Kossmann Systems Group Department of Computer Science ETH Zurich What is Big Data? technologies to automate experience Purpose answer difficult questions

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design Physical Database Design Process Physical Database Design Process The last stage of the database design process. A process of mapping the logical database structure developed in previous stages into internal

More information

Optimizing Performance. Training Division New Delhi

Optimizing Performance. Training Division New Delhi Optimizing Performance Training Division New Delhi Performance tuning : Goals Minimize the response time for each query Maximize the throughput of the entire database server by minimizing network traffic,

More information

ASSET TRACKING USING RFID SRAVANI.P(07241A12A7) DEEPTHI.B(07241A1262) SRUTHI.B(07241A12A3)

ASSET TRACKING USING RFID SRAVANI.P(07241A12A7) DEEPTHI.B(07241A1262) SRUTHI.B(07241A12A3) ASSET TRACKING USING RFID BY SRAVANI.P(07241A12A7) DEEPTHI.B(07241A1262) SRUTHI.B(07241A12A3) OBJECTIVE Our main objective is to acquire an asset tracking system. This keeps track of all the assets you

More information

Technical Bulletin. Enabling Arista Advanced Monitoring. Overview

Technical Bulletin. Enabling Arista Advanced Monitoring. Overview Technical Bulletin Enabling Arista Advanced Monitoring Overview Highlights: Independent observation networks are costly and can t keep pace with the production network speed increase EOS eapi allows programmatic

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

Tertiary Storage and Data Mining queries

Tertiary Storage and Data Mining queries An Architecture for Using Tertiary Storage in a Data Warehouse Theodore Johnson Database Research Dept. AT&T Labs - Research johnsont@research.att.com Motivation AT&T has huge data warehouses. Data from

More information