Efficient Data Streams Processing in the Real Time Data Warehouse


Fiaz Majeed, Muhammad Sohaib Mahmood, Mujahid Iqbal

Abstract

Today many business applications generate fast, multiple, continuous and time-varying data streams [2] that cannot be managed with available ETL (Extraction, Transformation and Loading) technology. This work presents an architecture that extends the real-time data warehouse to handle data streams efficiently while keeping its traditional functionalities. The paper focuses on the data stream processing part of the architecture. The data stream processor, constructed from a combination of mature techniques, performs data stream processing reliably. It keeps data stream processing synchronized with the pace of the data streams arriving from source applications.

Keywords: Data warehouse, real-time, active databases, data streams

I. INTRODUCTION

The applications generating data streams include click-stream-based web applications and network monitoring, security, sensor, telecommunication and manufacturing applications [4]. In contrast to transactional applications such as order processing, where a limited number of transactions is carried out discontinuously, data stream applications generate data continuously and at a non-uniform rate [2]. Sometimes data arrives as a trickle feed [5]; at other times it arrives in bursts. This form of data is therefore very difficult to handle with traditional methods. Data stream applications are sources for most data warehouse systems.

Data warehouse systems [14] are used in enterprises to bring dispersed data from heterogeneous source systems into a central container. A data warehouse provides executives with a single view of the enterprise for strategic decision making. They can analyze the current state of their business as well as make predictions using the historical data in the data warehouse.
Traditionally, data warehouses were updated with source data weekly, or daily in nightly batches. At that time they were built only for strategic analysis; operational systems were used for current analysis. Business now demands tactical analysis as well as strategic decision making from the data warehouse. For this purpose, the data warehouse must be updated in real time whenever a transaction occurs in any of the source systems. A real-time data warehouse fulfills this requirement with closed-loop functionality.

Enterprises are using real-time data warehouses with traditional functionalities. They can add new functionality to the existing system rather than deploy a whole new system. In the proposed architecture, data stream handling is presented as an independent module that does not overload the ETL tool. Some proposed works on data streams [1] design systems from scratch that depart from the traditional functionalities.

The rest of the paper is organized as follows. Section II details the related work on real-time data warehouses and data streams. In Section III, the RTDW architecture with the inclusion of data streams is formulated. Data stream processing is discussed in Section IV. Finally, Section V concludes the work and provides future directions.

II. RELATED WORK

Data streams have been extensively studied in different domains. A key issue in data stream handling is memory management: it is very difficult to store unbounded data streams in limited memory. Grid technology [1] was used to cope with the storage requirement of data streams. In this solution, data streams are captured and stored in distributed grid nodes, which provide high storage capacity for large quantities of stream data. Data stream processing systems normally have fixed storage and computing power. In this case, the challenge is to retain all the valuable content of the data streams within limited storage.
Therefore, approximation techniques [2, 3, 4] were developed to generate summary streams that cope with small storage. Many techniques for creating summaries (or synopses) have been proposed in the literature. These include sampling, histograms and wavelets [11], which provide approximate results close to those obtained from the exact input data. Congressional samples [6] compute approximate results for group-by queries. This technique discusses the problem of uniform random sampling and introduces biased sampling to capture valuable content from the data.

An architecture for continuous queries over data streams is presented in [4]; it provides a way to hold as much stream data as possible in limited memory. It divides storage into four containers named stream, store, scratch and throw. Stream holds the elements being processed continuously; store saves streams that will be required after a short period. Scratch contains streams for use in future analysis. Data that is no longer useful is disposed of through the throw container. Another data stream solution extracts streams using queue networks [8], in which streams are stored in queues before processing. ETL performance is then evaluated using queueing theory.

III. ARCHITECTURE OF THE REAL TIME DATA WAREHOUSE

The focus of the underlying research is to extract data streams efficiently, process them and load them into the data warehouse in a suitable format. The data warehouse architecture extended with data stream handling is depicted in Fig. 1.

Figure 1. RTDW architecture extended with Data Streams processing

Real-time data is extracted from the source applications using an event-driven approach and loaded into the Operational Data Store (ODS). When a transaction occurs in any of the source systems, it is detected and sent to the ODS. The novel part of the architecture is data stream handling, which is the main focus of this research work. It is divided into three areas: data stream extraction, processing and loading. According to the architecture, the data streams and the other source systems are integrated in the ODS. Operational users then generate real-time analyses and reports from the ODS. Load images are sent from the ODS to the data warehouse in batches, as is traditionally done in enterprises.

This work defines a general architecture that addresses all types of data streams. Click streams are used to elaborate the components of the architecture. According to a survey conducted on WAN application performance [13], web traffic accounts for more than 25% of overall WAN traffic, and it is increasing rapidly with the growing number of users all over the world. The following sections discuss in depth the data stream processing of the real-time data warehouse.

IV. DATA STREAM PROCESSING

The click streams are pushed [10] from the web application to the stream processor, which loads them into the data warehouse. As depicted in Fig. 2, the stream processor takes stream items as input, processes them and converts them into a format suitable for the ODS.

Figure 2. Data Streams processing before Data Warehouse insertion

The stream processing part of the architecture ensures memory management, synchronization and accurate processing. The challenge is to manage fixed memory in the presence of heavy, bursty and time-varying data streams.

A. Continuous Queries

As already discussed, data streams are unbounded in size. Handling these heavy data streams would require storage and computing resources that are unaffordable. The volume of the data streams can be reduced by filtering out irrelevant stream items. The data stream processor filters them using a continuous query [4]. Stream items that pass the filter criteria are allowed to enter the stream processor; the remaining items are discarded. Queries are registered with the system before execution. Continuous queries run continuously and evaluate the arriving data streams. The predicates defined in the query are used as the filter criteria.
In this way, memory can hold the maximum number of relevant data streams, which increases the efficiency of the system. Filtration through a continuous query is shown in Fig. 3.

Figure 3. Filtration through Continuous Queries
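The filtration step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ContinuousQuery` and `StreamFilter`, the dictionary-shaped stream items and the rule that an item is admitted when at least one registered query accepts it are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical representation of one click-stream item.
Item = Dict[str, str]


@dataclass
class ContinuousQuery:
    """A registered query whose predicate acts as the filter criterion."""
    name: str
    predicate: Callable[[Item], bool]


class StreamFilter:
    """Evaluates registered queries against every arriving item."""

    def __init__(self) -> None:
        self._queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        # Queries are registered before execution starts.
        self._queries.append(query)

    def run(self, stream: Iterable[Item]) -> Iterator[Item]:
        # Assumption: an item is admitted if any registered query accepts it;
        # every other item is discarded and never occupies memory.
        for item in stream:
            if any(q.predicate(item) for q in self._queries):
                yield item


if __name__ == "__main__":
    flt = StreamFilter()
    flt.register(ContinuousQuery(
        "pages_only", lambda it: not it["page"].startswith("/images/")))
    clicks = [{"page": "/index.html"}, {"page": "/images/tagline.gif"}]
    for admitted in flt.run(clicks):
        print(admitted)
```

Because `run` is a generator, items are evaluated one at a time as they arrive, so the filter itself holds no stream data.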

B. Data Streams Approximation

Approximation techniques have been broadly explored in the query processing context. Databases in enterprises are mostly huge reservoirs of data. If a query is evaluated over the whole database, it takes a long time to produce results, yet the user composing the query expects a fast response from the database management system (DBMS). The DBMS evaluates the query on stored datasets and returns results to the user within a very short time. The query processor of the DBMS makes this possible through summarization techniques. It uses algorithms that compute a summary of the detailed data in one pass and provide approximate results on the basis of those summaries. The results are not completely accurate but are close to the exact answers. This matters especially in a data warehouse environment, which stores decades of historical data and where query processing is highly complicated. If a query is evaluated on the detailed data in the data warehouse, it can take days to compute the results, which users cannot afford.

In this work, sampling [6] is used to produce approximate results. In the sampling technique, the dataset is divided into equal parts and small samples are picked, where each sample represents the essential characteristics of one part of the data. Approximation starts when the size of the data streams rises above a specified threshold level set on memory, and it continues until the size of the data streams falls below the threshold.

There are two cases in which data streams cross the memory threshold. First, when data streams arrive from the application, their unpredictable nature means they might not fit in memory. Second, if the rate at which data streams are processed is lower than their arrival rate, not all data streams can be processed because of the lack of synchronization, and the extra data streams would be discarded. The data stream processor generates summaries in both situations.
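As a rough illustration of the threshold-triggered sampling described above, the following Python sketch keeps a buffer in full detail while it is within the threshold and otherwise replaces it with a sample drawn from equal parts. The function names `sample_by_parts` and `admit`, the parameters `parts` and `per_part`, and the use of uniform random picks inside each part are assumptions; the paper does not specify these details.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_by_parts(items: Sequence[T], parts: int, per_part: int,
                    rng: random.Random) -> List[T]:
    """Split items into near-equal contiguous parts and sample each part."""
    if parts <= 0 or per_part <= 0:
        raise ValueError("parts and per_part must be positive")
    n = len(items)
    summary: List[T] = []
    for p in range(parts):
        lo, hi = p * n // parts, (p + 1) * n // parts
        part = items[lo:hi]
        k = min(per_part, len(part))
        # Sample positions and sort them so arrival order is preserved.
        for i in sorted(rng.sample(range(len(part)), k)):
            summary.append(part[i])
    return summary


def admit(buffer: List[T], threshold: int, parts: int = 4, per_part: int = 2,
          rng: Optional[random.Random] = None) -> List[T]:
    """Keep full detail under the threshold; otherwise return a summary."""
    if len(buffer) <= threshold:
        return list(buffer)
    return sample_by_parts(buffer, parts, per_part, rng or random.Random())


if __name__ == "__main__":
    burst = list(range(100))           # hypothetical stream items
    print(admit(burst, threshold=200))  # below threshold: unchanged
    print(admit(burst, threshold=50))   # above threshold: 8 sampled items
```

Choosing `parts * per_part` no larger than the threshold keeps the summarized buffer below the threshold after a single pass, which is one simple way to satisfy the stopping condition stated in the text.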
1) Approximations Production Alert

Data streams are transformed in full detail while memory is available to hold them for processing. As mentioned earlier, all valuable data can be loaded into the ODS, and then into the Real Time Data Warehouse (RTDW), only if the quantity of data streams remains below the threshold. Otherwise, extra data would be discarded for lack of memory. To eliminate this risk, data should be summarized as soon as this condition occurs. An alert should monitor the data flow rate and inform the data stream processor to start summarization when the flow becomes imbalanced. In the data stream processor, this alert is implemented as a continuous query. Fig. 4 depicts data stream processing, which runs normally until the threshold level is exceeded.

Figure 4. Data Stream approximations production alert

The alert system raises an exception when the memory threshold level is exceeded.

C. Regulate Flow of Data Streams

The data stream processor cannot process more than a fixed quantity of data streams per unit time. Irregular arrivals and bursts of data streams disrupt processing and demand high computing power from the data stream processor. The underlying real-time data warehouse architecture uses the token bucket technique [12] to regulate the data streams arriving at the data stream processor.

1) Token Bucket Technique

The token bucket holds tokens, which are generated at every clock tick. A fixed number of stream items can be transmitted for each token obtained. The technique allows some burstiness in the output when the input rate increases. When the bucket fills up, it discards tokens but keeps the stream items. With this technique, the data stream processor receives streams in a regulated flow. It can process data streams at a steady rate, which increases the efficiency of the system.
The valuable content of the data streams is therefore stored reliably in the real-time data warehouse. Another advantage of this technique is that it maintains synchronization between the arrival and the processing of data streams. Fig. 5 shows the use of the token bucket technique in the data stream processing part of the proposed real-time data warehouse architecture.
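The token bucket behaviour described above can be sketched in a few lines of Python. This is an illustrative model under assumptions, not the paper's implementation: the class name `TokenBucket`, the `capacity` and `items_per_token` parameters, and the explicit `tick` call standing in for the clock are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, List


class TokenBucket:
    """Regulates how many waiting stream items are released per clock tick."""

    def __init__(self, capacity: int, items_per_token: int = 1) -> None:
        self.capacity = capacity                # maximum tokens the bucket holds
        self.items_per_token = items_per_token  # items released per token
        self.tokens = 0
        self.pending: Deque[Any] = deque()      # items waiting, never dropped

    def arrive(self, item: Any) -> None:
        # Arriving items are queued; the bucket does not discard stream items.
        self.pending.append(item)

    def tick(self) -> List[Any]:
        # One token is generated per tick; it is discarded if the bucket is full.
        if self.tokens < self.capacity:
            self.tokens += 1
        released: List[Any] = []
        # Spend tokens while items are waiting. Saved tokens permit a burst.
        while self.tokens > 0 and self.pending:
            self.tokens -= 1
            for _ in range(self.items_per_token):
                if not self.pending:
                    break
                released.append(self.pending.popleft())
        return released


if __name__ == "__main__":
    bucket = TokenBucket(capacity=3)
    for _ in range(5):
        bucket.tick()                  # idle period: tokens accumulate up to 3
    for name in "abcde":
        bucket.arrive(name)            # a burst of five items
    print(bucket.tick())               # burst of up to 3 items
    print(bucket.tick())               # then one item per tick
```

In this model an idle period lets tokens accumulate up to `capacity`, so a later burst can be released at once, after which output settles to `items_per_token` items per tick.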

Figure 5. Regulate Streams using Token Bucket

D. Format Conversion

The click-stream-producing application generates items in web-oriented formats. Currently, the ODS structure is implemented in relational databases, which store data in a two-dimensional format. Table I shows the relational structure.

TABLE I. RELATIONAL FORMAT

Field                             Row 1                                 Row 2
Source (Host)                     Pm…dialip.mich.net                    Pm…dialip.mich.net
Date and Time of Request          [24/May/2009:19:13:…]                 [24/May/2009:19:13:…]
Page Requested (HTTP protocol)    "GET /images/tagline.gif HTTP/1.0"    "GET /images/bkgrnd.jpg HTTP/1.0"
Bytes of Request
Referring Page                    "…ately.com/"                         "…ately.com/"
Browser, Platform                 "Mozilla/4.51 [en] (Win98; I)"        "Mozilla/4.51 [en] (Win98; I)"
PID                               P131                                  P131
OID                                                                     O142
CID                               C                                     C

In the current advanced technological environment, web applications use relational databases for data storage. If this is the case for the source web applications, it is easy to transfer data streams to the ODS with initial conversions. Most web applications running on legacy platforms, however, store data in files. It is then necessary to restructure the data streams into relational format before forwarding them to the ODS. Fig. 6 shows the mapping between streams and relations.

Figure 6. Adapted from [3]: Mapping among Streams and Relations

The conversion from streams to relations is performed by the data stream processor, which guarantees accurate conversion. It can also restructure relations back into streams. In addition, the processor creates initial extract files to integrate them with other source systems' data in the ODS.

E. Time Stamping

The time dimension is of great importance in the data warehouse for strategic analysis. Analyses in the data warehouse, such as forecasting and comparisons, are performed against the time dimension. It is therefore necessary to store each tuple with a time stamp. In particular, a time stamp is required for each incoming stream item so that items are stored in arrival order.
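The conversion of a web-oriented stream item into a relational tuple, stamped on arrival, might look like the Python sketch below. It assumes the items arrive as lines in the common "combined" web server log layout; the `ClickRow` fields, the `to_row` function and the sample line in the usage section are hypothetical and are not taken from the paper's Table I.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Assumed input layout: host, two ignored fields, [time], "request",
# status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<requested_at>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class ClickRow(NamedTuple):
    """One relational tuple destined for the ODS (hypothetical schema)."""
    host: str
    requested_at: str
    request: str
    bytes_sent: int
    referrer: str
    user_agent: str
    arrival_ts: datetime   # time stamp assigned by the processor on arrival


def to_row(line: str, arrival_ts: Optional[datetime] = None) -> Optional[ClickRow]:
    """Convert one log line to a row; return None if it does not parse."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    size = m["bytes"]
    return ClickRow(
        host=m["host"],
        requested_at=m["requested_at"],
        request=m["request"],
        bytes_sent=0 if size == "-" else int(size),
        referrer=m["referrer"],
        user_agent=m["agent"],
        arrival_ts=arrival_ts or datetime.now(timezone.utc),
    )


if __name__ == "__main__":
    sample = ('host.example.net - - [01/Jan/2020:10:00:00 +0000] '
              '"GET /images/logo.gif HTTP/1.0" 200 1024 '
              '"http://www.example.com/" "ExampleBrowser/1.0"')
    print(to_row(sample))
```

Keeping the arrival time stamp in its own column, separate from the request time carried inside the item, lets rows be ordered by arrival even when the source time is missing or unreliable.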
Two types of time stamps are defined in [2]: implicit and explicit. Implicit time stamps are appended by the system as a field; this type is used when stream items do not already carry a time element. Explicit time stamps are carried in an attribute of the stream item that holds the exact time information. A data model for dealing with time delays in the data warehouse is presented in [9]. It defines a time dimension that includes three time stamps: valid time, revelation time and load time. In our stream processing architecture, time stamps are assigned to the streams by the data stream processor.

V. CONCLUSIONS

A new class of applications has emerged that generates fast, multiple, continuous and time-varying data streams [2, 3]. Most of the time, these applications generate data streams in heavy bursts [4], which cannot be handled by existing ETL technology. Real-time ETL and EAI (Enterprise Application Integration) tools were built for source systems that generate data discontinuously, with a small number of transactions per unit time, and they are unable to handle data streams. The existing solutions [1, 7] depart from the traditional functionality of the data warehouse, which enterprises require.

This work presents a real-time data warehouse architecture that includes data stream management. Data stream management is divided into three parts: extraction, processing and loading. This paper discusses the data stream processing part in detail. In each step, established techniques are used to make data stream management efficient. The data stream processor takes the valuable content from the data stream elements to achieve maximum accuracy. Future work should address data stream extraction and loading according to the requirements of the data stream processor, as well as a solution for integrating data from both operational data sources and data stream applications.

REFERENCES

[1] N. M. Tho, A. M. Tjoa, Zero-latency Data Warehousing (ZLDWH): the state-of-the-art and experimental implementation approaches, In Proceedings of the 4th IEEE Intl. Conference on Computer Science Research.
[2] Babcock et al., Models and issues in data stream systems, In Proceedings of the 2002 ACM Symp. on Principles of Database Systems, June.
[3] Widom et al., Query processing, approximation, and resource management in a data stream management system, In Proceedings of the CIDR Conference.
[4] S. Babu, J. Widom, Continuous queries over data streams, SIGMOD Record, 30(3), Sep.
[5] R. Basu, Challenges of Real-time Data Warehousing, DMReview article.
[6] S. Acharya, B. Gibbons, V. Poosala, Congressional samples for approximate answering of group by queries, In Proceedings of the Special Interest Group on Management of Data.
[7] N. M. Tho, A. M. Tjoa, Zero latency data warehousing for heterogeneous data sources and continuous data streams, In Proceedings of the 5th Intl. Conference on Information Integration, Web Applications and Services, Jakarta, Indonesia.
[8] P. Karakasidis, Vassiliadis, E. Pitoura, ETL queues for active data warehousing, In Proceedings of IQIS, pages 28-39.
[9] R. Bruckner, A. M. Tjoa, Managing time consistency for active data warehouse environments, In Proceedings of the Intl. Conference on Data Warehousing and Knowledge Discovery.
[10] E. J. Kendall, E. K. Kendall, Information delivery systems: An exploration of web pull and push technologies, Tutorial, Volume 1, Paper 14, April.
[11] S. Guha, N. Koudas, Approximating a data stream for querying and estimation: Algorithms and performance evaluation, In Proceedings of the Data Engineering.
[12] J. S. Turner, New directions in communications (or which way to the information age), IEEE Commun. Magazine, vol. 24, pp. 8-15, Oct.
[13] Blue Coat, WAN Application Performance, White Paper.
[14] W. H. Inmon, Building the data warehouse, New York: Wiley.

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Chapter 5. Learning Objectives. DW Development and ETL

Chapter 5. Learning Objectives. DW Development and ETL Chapter 5 DW Development and ETL Learning Objectives Explain data integration and the extraction, transformation, and load (ETL) processes Basic DW development methodologies Describe real-time (active)

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

A Service-oriented Dual-bus BAM System Model

A Service-oriented Dual-bus BAM System Model I.J. Engineering and Manufacturing, 2012,2, 1-7 Published Online April 2012 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijem.2012.02.01 Available online at http://www.mecs-press.net/ijem A Service-oriented

More information

Middleware support for the Internet of Things

Middleware support for the Internet of Things Middleware support for the Internet of Things Karl Aberer, Manfred Hauswirth, Ali Salehi School of Computer and Communication Sciences Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne,

More information

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil

More information

Reconciliation and best practices in a configuration management system. White paper

Reconciliation and best practices in a configuration management system. White paper Reconciliation and best practices in a configuration management system White paper Table of contents Introduction... 3 A reconciliation analogy: automobile manufacturing assembly... 3 Conflict resolution...

More information

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS?

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS? 18-345: Introduction to Telecommunication Networks Lectures 20: Quality of Service Peter Steenkiste Spring 2015 www.cs.cmu.edu/~prs/nets-ece Overview What is QoS? Queuing discipline and scheduling Traffic

More information

DSEC: A Data Stream Engine Based Clinical Information System *

DSEC: A Data Stream Engine Based Clinical Information System * DSEC: A Data Stream Engine Based Clinical Information System * Yu Fan, Hongyan Li **, Zijing Hu, Jianlong Gao, Haibin Liu, Shiwei Tang, and Xinbiao Zhou National Laboratory on Machine Perception, School

More information

Mario Guarracino. Data warehousing

Mario Guarracino. Data warehousing Data warehousing Introduction Since the mid-nineties, it became clear that the databases for analysis and business intelligence need to be separate from operational. In this lecture we will review the

More information

Visionet IT Modernization Empowering Change

Visionet IT Modernization Empowering Change Visionet IT Modernization A Visionet Systems White Paper September 2009 Visionet Systems Inc. 3 Cedar Brook Dr. Cranbury, NJ 08512 Tel: 609 360-0501 Table of Contents 1 Executive Summary... 4 2 Introduction...

More information

An Ants Algorithm to Improve Energy Efficient Based on Secure Autonomous Routing in WSN

An Ants Algorithm to Improve Energy Efficient Based on Secure Autonomous Routing in WSN An Ants Algorithm to Improve Energy Efficient Based on Secure Autonomous Routing in WSN *M.A.Preethy, PG SCHOLAR DEPT OF CSE #M.Meena,M.E AP/CSE King College Of Technology, Namakkal Abstract Due to the

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Understanding traffic flow

Understanding traffic flow White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow

More information

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database

More information

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the

More information

Fuzzy Active Queue Management for Assured Forwarding Traffic in Differentiated Services Network

Fuzzy Active Queue Management for Assured Forwarding Traffic in Differentiated Services Network Fuzzy Active Management for Assured Forwarding Traffic in Differentiated Services Network E.S. Ng, K.K. Phang, T.C. Ling, L.Y. Por Department of Computer Systems & Technology Faculty of Computer Science

More information

A Survey Study on Monitoring Service for Grid

A Survey Study on Monitoring Service for Grid A Survey Study on Monitoring Service for Grid Erkang You erkyou@indiana.edu ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide

More information

Congestion Control Overview

Congestion Control Overview Congestion Control Overview Problem: When too many packets are transmitted through a network, congestion occurs t very high traffic, performance collapses completely, and almost no packets are delivered

More information

The Role of Precise Timing in High-Speed, Low-Latency Trading

The Role of Precise Timing in High-Speed, Low-Latency Trading The Role of Precise Timing in High-Speed, Low-Latency Trading The race to zero nanoseconds Whether measuring network latency or comparing real-time trading data from different computers on the planet,

More information

1) A complete SCM solution includes customers, service providers and partners. Answer: TRUE Diff: 2 Page Ref: 304

1) A complete SCM solution includes customers, service providers and partners. Answer: TRUE Diff: 2 Page Ref: 304 Enterprise Systems for Management, 2e (Motiwalla/Thompson) Chapter 11 Supply Chain Management 1) A complete SCM solution includes customers, service providers and partners. Diff: 2 Page Ref: 304 2) SCM

More information

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence

More information

Near Real-time Data Warehousing with Multi-stage Trickle & Flip

Near Real-time Data Warehousing with Multi-stage Trickle & Flip Near Real-time Data Warehousing with Multi-stage Trickle & Flip Janis Zuters University of Latvia, 19 Raina blvd., LV-1586 Riga, Latvia janis.zuters@lu.lv Abstract. A data warehouse typically is a collection

More information

ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION

ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION Enzo Unified Solves Real-Time Data Integration Challenges that Increase Business Agility and Reduce Operational Complexities CHALLENGES

More information

A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks

A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks A Comparison Study of Qos Using Different Routing Algorithms In Mobile Ad Hoc Networks T.Chandrasekhar 1, J.S.Chakravarthi 2, K.Sravya 3 Professor, Dept. of Electronics and Communication Engg., GIET Engg.

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Processing Flows of Information: From Data Stream to Complex Event Processing

Processing Flows of Information: From Data Stream to Complex Event Processing Processing Flows of Information: From Data Stream to Complex Event Processing GIANPAOLO CUGOLA and ALESSANDRO MARGARA, Politecnico di Milano A large number of distributed applications requires continuous

More information

Technical Bulletin. Arista LANZ Overview. Overview

Technical Bulletin. Arista LANZ Overview. Overview Technical Bulletin Arista LANZ Overview Overview Highlights: LANZ provides unparalleled visibility into congestion hotspots LANZ time stamping provides for precision historical trending for congestion

More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

Innovate and Grow: SAP and Teradata
