Efficient Data Streams Processing in the Real Time Data Warehouse
Fiaz Majeed, Muhammad Sohaib Mahmood, Mujahid Iqbal

Abstract

Many business applications today generate fast, multiple, continuous and time-varying data streams [2] that available ETL (Extraction, Transformation and Loading) technology cannot manage. This work presents an architecture that extends the real-time data warehouse to handle data streams efficiently while keeping its traditional functionality. The paper focuses on the data stream processing part of the architecture. The data stream processor, built from a combination of mature techniques, processes data streams reliably and keeps processing synchronized with the pace of the streams arriving from source applications.

Keywords: Data warehouse, real-time, active databases, data streams

I. INTRODUCTION

Applications that generate data streams include click-stream-based web applications, network monitoring, security, sensor, telecommunication and manufacturing applications [4]. In contrast to transactional applications such as order processing, where a limited number of transactions is carried out intermittently, data stream applications generate data continuously and non-uniformly [2]. Sometimes data arrives as a trickle feed [5]; at other times it arrives in bursts. This form of data is therefore very difficult to handle with traditional methods. Data stream applications are sources for most data warehouse systems.

Data warehouse systems [14] are used in enterprises to bring dispersed data from heterogeneous source systems into a central repository. The warehouse gives executives a single view of the enterprise for strategic decision making. They can analyze the current state of the business and use the historical data in the warehouse to predict the future.
Traditionally, data warehouses were updated with source data weekly or daily in nightly batches. At that time they were built only for strategic analysis; operational systems were used for current analysis. Business now demands tactical analysis as well as strategic decision making from the data warehouse. For this purpose, the data warehouse must be updated in real time whenever a transaction happens in any of the source systems. The real-time data warehouse meets this need with closed-loop functionality. Enterprises already use real-time data warehouses with traditional functionality, so they can add new functionality to the existing system rather than deploy a whole new one. In the proposed architecture, data stream handling is an independent module that does not overload the ETL tool. Some proposed work on data streams [1] designs systems from scratch that depart from the traditional functionality.

The rest of the paper is organized as follows. Section II details related work on real-time data warehouses and data streams. Section III formulates the RTDW architecture with the inclusion of data streams. Data stream processing is discussed in Section IV. Finally, Section V concludes the work and gives future directions.

II. RELATED WORK

Data streams have been studied extensively in different domains. A major issue in data stream handling is memory management: it is very difficult to store unbounded data streams in limited memory. Grid technology [1] has been used to cope with the storage requirement of data streams. In this solution, data streams are captured and stored in distributed grid nodes, which provide high storage capacity for huge quantities of data streams. Data stream processing systems normally have fixed storage and computing power. In this case, the challenge is to retain all the valuable content of the data streams despite the limited storage.
Hence, approximation techniques [2, 3, 4] were developed to generate summary streams that address the problem of small storage. Many techniques for creating summaries (or synopses) have been proposed in the literature. These include sampling, histograms and wavelets [11], which provide approximate results close to those obtained from the exact input data. Congressional samples [6] compute approximate results for group-by queries. That work discusses the problem of uniform random sampling and introduces biased sampling to retain the valuable content of the data. An architecture for continuous queries over data streams is presented in [4]; it provides a way to hold as many data streams as possible in limited memory. It divides storage into four containers named stream, store, scratch and throw. Stream holds the elements being continuously processed; store saves streams that will be needed after a short period. Scratch contains streams for use in future analysis. Data that is no longer useful is disposed of through the throw container. Another data stream solution extracts streams using queue networks [8], in which streams are stored in queues before processing. ETL performance is then evaluated using queueing theory.

III. ARCHITECTURE OF THE REAL TIME DATA WAREHOUSE

The focus of the underlying research is to extract data streams efficiently, process them and load them into the data warehouse in a suitable format. The data warehouse architecture, extended with data stream handling, is depicted in Fig. 1.

Figure 1. RTDW architecture extended with Data Streams processing

Real-time data is extracted from the source applications using an event-driven approach and loaded into the Operational Data Store (ODS). When a transaction occurs in any source system, it is detected and sent to the ODS. The novel part of the architecture is data stream handling, which is the main focus of this research. It is divided into three areas: data stream extraction, processing and loading. According to the architecture, the data streams and the other source systems are integrated in the ODS. Operational users then generate real-time analyses and reports from the ODS. The load images are sent from the ODS to the data warehouse in batches, as is done traditionally in enterprises.

This work defines a general architecture that addresses all types of data streams. Click streams are used to elaborate the components of the architecture. According to a survey conducted on WAN application performance [13], web traffic accounts for more than 25% of overall WAN traffic, and it is increasing rapidly with the growing number of users all over the world. The following sections discuss the data stream processing of the real-time data warehouse in depth.

IV. DATA STREAM PROCESSING

The click streams are pushed [10] from the web application to the stream processor, which loads them into the data warehouse. As depicted in Fig. 2, the stream processor takes stream items as input, processes them and converts them into a format suitable for the ODS.

Figure 2. Data Streams processing before Data Warehouse insertion

The stream processing part of the architecture ensures memory management, synchronization and accurate processing. The challenge is to manage fixed memory in the presence of heavy, bursty and time-varying data streams.

A. Continuous Queries

As already discussed, data streams arrive in unbounded size. Handling such heavy streams requires huge storage and computing resources, which are unaffordable. The size of the data streams can be reduced by filtering out irrelevant ones. The data stream processor filters them using continuous queries [4]. Streams that pass the filtration criteria are allowed to enter the stream processor; the remaining streams, which do not meet the criteria, are discarded. The queries are registered with the system before execution. Continuous queries run continuously and evaluate the arriving data streams. The predicates defined in the query are used as the filtering criteria.
In this way the memory can hold the maximum number of relevant data streams, which increases the efficiency of the system. The filtration through continuous queries is shown in Fig. 3.

Figure 3. Filtration through Continuous Queries
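The filtering step described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Click` record, its field names and the `ContinuousQueryFilter` class are hypothetical, and treating the registered predicates as a conjunction (an item must satisfy all of them) is a choice made here, not something the paper specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Click:
    """One click-stream item; the field names are illustrative assumptions."""
    host: str
    page: str
    status: int


# A registered continuous query is modelled here as a named predicate.
Predicate = Callable[[Click], bool]


class ContinuousQueryFilter:
    """Hypothetical filter: queries are registered first, then run over the stream."""

    def __init__(self) -> None:
        self._queries: Dict[str, Predicate] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        # Queries are registered with the system before execution.
        self._queries[name] = predicate

    def run(self, stream: Iterable[Click]) -> Iterator[Click]:
        # Evaluate every arriving item lazily; items failing any predicate
        # are discarded and never occupy processor memory.
        for item in stream:
            if all(pred(item) for pred in self._queries.values()):
                yield item


if __name__ == "__main__":
    cq = ContinuousQueryFilter()
    cq.register("html_only", lambda c: c.page.endswith(".html"))
    cq.register("ok_status", lambda c: c.status == 200)
    clicks = [
        Click("a", "/index.html", 200),
        Click("b", "/logo.gif", 200),
        Click("c", "/cart.html", 404),
    ]
    for kept in cq.run(clicks):
        print(kept)
```

Because `run` is a generator, it evaluates items one at a time as they arrive, which matches the idea of a query that keeps running over an unbounded input.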
B. Data Streams Approximation

Approximation techniques have been broadly explored in the query processing context. Databases are mostly huge reservoirs of data in enterprises. If a query is evaluated on the whole database, it takes a long time to produce results, yet the user composing the query expects a fast response from the database management system (DBMS). The DBMS evaluates the query on stored datasets and returns results to the user very quickly. The query processor of the DBMS makes this possible by using summarization techniques. It uses algorithms that compute a summary of the detailed data in one pass and provide approximate results based on those summaries. The results are not completely accurate but are close to the exact answers. This matters especially in the data warehouse environment, which stores decades of historical data and where query processing is highly complicated. If a query is evaluated on the detailed data in the data warehouse, it can take days to compute the results, which users cannot afford.

In this work, sampling [6] is used to produce approximate results. In the sampling technique, the dataset is divided into equal parts and small samples are picked such that each sample represents the essential characteristics of one part of the data. Approximation starts when the size of the data streams rises above a specified threshold level set on memory, and continues until the size falls below the threshold again. There are two cases in which data streams cross the memory threshold. First, when data streams arrive from the application: because of their unpredictable nature, they might not fit in memory. Second, if the processing rate is not equal to the arrival rate, not all data streams can be processed because of the lack of synchronization; in that case, the extra data streams are discarded. The data stream processor generates summaries in both situations.
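A minimal sketch of threshold-triggered summarization follows, assuming a simple in-memory buffer. The names `ThresholdBuffer` and `sample_parts` and the `part_size` parameter are hypothetical. The sampling shown is plain per-part random selection (one item kept from each equal-sized part), used here as a stand-in; it is not the congressional sampling of [6].

```python
import random
from typing import Any, List, Optional, Sequence


def sample_parts(items: Sequence[Any], part_size: int, rng: random.Random) -> List[Any]:
    """Split items into consecutive parts of part_size and keep one random item per part."""
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    return [
        rng.choice(items[start:start + part_size])
        for start in range(0, len(items), part_size)
    ]


class ThresholdBuffer:
    """Hypothetical stream buffer that summarizes itself when it exceeds a threshold."""

    def __init__(self, threshold: int, part_size: int = 4, seed: Optional[int] = None) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        if part_size < 2:
            raise ValueError("part_size must be at least 2 so that sampling shrinks the buffer")
        self.threshold = threshold
        self.part_size = part_size
        self.items: List[Any] = []
        self._rng = random.Random(seed)

    def add(self, item: Any) -> bool:
        """Store one item; return True if summarization was triggered."""
        self.items.append(item)
        triggered = len(self.items) > self.threshold
        # Keep summarizing until the buffer is back at or below the threshold.
        while len(self.items) > self.threshold:
            self.items = sample_parts(self.items, self.part_size, self._rng)
        return triggered


if __name__ == "__main__":
    buf = ThresholdBuffer(threshold=8, part_size=4, seed=0)
    for i in range(20):
        if buf.add(i):
            print(f"item {i}: threshold exceeded, buffer summarized to {buf.items}")
```

Items are kept in full detail while the buffer is under the threshold. A production system would also need to record how many original items each sample stands for so that aggregates can be scaled back up; that bookkeeping is omitted here.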
1) Approximations Production Alert

The data streams are transformed in full detail while memory is available to hold them for processing. As mentioned earlier, all valuable data can be loaded into the ODS and then into the Real Time Data Warehouse (RTDW) only if the quantity of data streams remains below the threshold. Otherwise the extra data would be discarded for lack of memory. To eliminate this risk, data should be summarized immediately when this condition occurs. An alert is therefore needed that monitors the data flow rate and tells the data stream processor to start summarization when the flow becomes imbalanced. This alert is implemented in the data stream processor as a continuous query. Fig. 4 depicts data stream processing, which runs normally until the threshold level is exceeded.

Figure 4. Data Stream approximations production alert

The alert system raises an exception when the memory threshold level is exceeded.

C. Regulate Flow of Data Streams

The data stream processor cannot process more than a fixed quantity of data streams per unit time. Irregular rushes of data streams disrupt processing and demand high computing power from the data stream processor. The underlying real-time data warehouse architecture uses the token bucket technique [12] to regulate the data streams arriving at the data stream processor.

1) Token Bucket Technique

The token bucket holds tokens, which are generated at every clock tick. A fixed number of streams can be transmitted for each token obtained. The technique allows some burstiness in the output when the rate of input streams increases. When the bucket fills up, it discards tokens but keeps the streams. With this technique, the data stream processor receives streams in a regular flow. It can process data streams at a constant rate, which increases the efficiency of the system.
The valuable contents of the data streams are thereby guaranteed to be stored reliably in the real-time data warehouse. Another advantage of this technique is that it maintains synchronization between the arrival and the processing of data streams. Fig. 5 shows the use of the token bucket technique in the data stream processing part of the proposed real-time data warehouse architecture.
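The token bucket behaviour described above can be sketched as follows. The class and method names (`TokenBucket`, `arrive`, `tick`) and the `items_per_token` parameter are hypothetical, and the clock is driven by explicit `tick()` calls rather than a real timer, purely to keep the example deterministic.

```python
from collections import deque
from typing import Any, Deque, List


class TokenBucket:
    """Hypothetical token bucket regulating stream items entering the processor."""

    def __init__(self, capacity: int, items_per_token: int = 1) -> None:
        if capacity < 1 or items_per_token < 1:
            raise ValueError("capacity and items_per_token must be positive")
        self.capacity = capacity
        self.items_per_token = items_per_token
        self.tokens = 0
        self.waiting: Deque[Any] = deque()

    def arrive(self, item: Any) -> None:
        # Arriving items are never dropped by the bucket; they wait for a token.
        self.waiting.append(item)

    def tick(self) -> List[Any]:
        """Advance the clock by one tick and return the items released to the processor."""
        # One token is generated per tick; it is discarded if the bucket is full.
        if self.tokens < self.capacity:
            self.tokens += 1
        released: List[Any] = []
        # Each token spent releases up to items_per_token waiting items.
        while self.tokens > 0 and self.waiting:
            for _ in range(min(self.items_per_token, len(self.waiting))):
                released.append(self.waiting.popleft())
            self.tokens -= 1
        return released


if __name__ == "__main__":
    bucket = TokenBucket(capacity=3)
    for _ in range(5):          # idle period: tokens accumulate up to capacity
        bucket.tick()
    for i in range(5):          # a burst of five items arrives
        bucket.arrive(i)
    for t in range(4):
        print(f"tick {t}: released {bucket.tick()}")
```

After an idle period the saved tokens let part of a burst through at once (bounded by `capacity`), and the rest drains at one token per tick. That is the limited burstiness with a steady long-run rate that the text describes.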
Figure 5. Regulate Streams using Token Bucket

D. Format Conversion

The click-stream-producing application generates items in web-oriented formats. Currently, the ODS structure is implemented in relational databases. Relational structures store data in a two-dimensional format. Table I shows the relational structure.

TABLE I. RELATIONAL FORMAT
Columns: Source (Host) | Bytes of Request | Referring Page | Date and Time of Request | Browser Platform | Page Requested (HTTP protocol) | PID | OID | CID
Row 1: Pm [...] dialip.mich.net | [...] | "[...] ately.com/" | [24/May/2009:19:13: [...] ] | "Mozilla/4.51 [en] (Win98; I)" | "GET /images/tagline.gif HTTP/1.0" | P131 | [...] | C [...]
Row 2: Pm [...] dialip.mich.net | [...] | "[...] ately.com/" | [24/May/2009:19:13: [...] ] | "Mozilla/4.51 [en] (Win98; I)" | "GET /images/bkgrnd.jpg HTTP/1.0" | P131 | O142 | C [...]

In the current technological environment, web applications use relational databases for data storage. If this is the case for the source web applications, it is easy to transfer data streams to the ODS with initial conversions. Most web applications running on legacy platforms, however, store data in files. It is necessary to restructure the data streams into relational format before forwarding them to the ODS. Fig. 6 shows the mapping among streams and relations.

Figure 6. Adapted from [3]: Mapping among Streams and Relations

The conversion from streams to relations is performed by the data stream processor, which guarantees accurate conversion. It can also restructure relations into streams in the reverse direction. In addition, the processor creates initial extract files to integrate them with other source systems' data in the ODS.

E. Time Stamping

The time dimension is of great importance in the data warehouse for strategic analysis. Analyses in the data warehouse are performed against the time dimension for forecasting, comparisons and similar purposes. It is therefore necessary to store each tuple with a time stamp. In particular, a time stamp is required for each incoming stream item so that items are stored in arrival order.
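The stream-to-relation conversion of Section D, together with the per-item arrival time stamp just mentioned, might look like the sketch below. The log line layout, the regular expression, the `ClickRow` fields and the sample values are all assumptions made for illustration; the paper does not give the exact input format.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Assumed layout of one click-stream line (hypothetical):
#   host [request time] "request" bytes "referring page" "browser"
LINE = re.compile(
    r'(?P<host>\S+) \[(?P<requested_at>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<bytes>\d+) "(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)


class ClickRow(NamedTuple):
    """One relational tuple destined for the ODS; column names are illustrative."""
    host: str
    requested_at: str
    request: str
    bytes_sent: int
    referrer: str
    browser: str
    arrived_at: datetime  # time stamp assigned by the stream processor


def to_row(line: str, arrived_at: Optional[datetime] = None) -> Optional[ClickRow]:
    """Convert one raw stream item into a tuple, or return None if it does not parse."""
    match = LINE.match(line)
    if match is None:
        return None
    return ClickRow(
        host=match["host"],
        requested_at=match["requested_at"],
        request=match["request"],
        bytes_sent=int(match["bytes"]),
        referrer=match["referrer"],
        browser=match["browser"],
        # Stamp the item on arrival unless the caller supplies a time.
        arrived_at=arrived_at or datetime.now(timezone.utc),
    )


if __name__ == "__main__":
    # Made-up sample line in the assumed layout.
    sample = (
        'example.host [24/May/2009:19:13:00 +0000] '
        '"GET /images/tagline.gif HTTP/1.0" 512 '
        '"http://example.com/" "Mozilla/4.51 [en] (Win98; I)"'
    )
    print(to_row(sample))
```

Returning `None` for unparseable lines keeps the decision about malformed items (discard, log or quarantine) with the caller, since the paper does not say how such items are treated.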
Two types of time stamps are defined in [2]: implicit and explicit. Implicit time stamps are appended by the system as a field; this type is used when the streams do not already contain a time element. Explicit time stamps are carried as an attribute holding the exact time information. A data model for dealing with time delays in the data warehouse is presented in [9]. It defines a time dimension that includes three time stamps: valid time, revelation time and load time. In our stream processing architecture, time stamps are assigned to the streams by the data stream processor.

V. CONCLUSIONS

A new class of applications has emerged that generates fast, multiple, continuous and time-varying data streams [2, 3]. Most of the time, these applications generate data streams in heavy bursts [4] that existing ETL technology cannot handle. Real-time ETL and EAI (Enterprise Application Integration) tools were built for source systems that generate data in non-continuous form with few transactions per unit time, so they are incapable of handling data streams. The existing solutions [1, 7] depart from the traditional functionality of the data warehouse, which enterprises require. This work presents a real-time data warehouse architecture that includes data stream management. Data stream management is divided into three parts: extraction, processing and loading. This paper discusses the data stream processing part in detail. In each step, established techniques are used to make data stream management efficient. The data stream processor takes the valuable content from the data stream elements to achieve maximum accuracy. Future work needs to address data stream extraction and loading according to the requirements of the data stream processor, as well as a solution for integrating data from both operational data sources and data stream applications.

REFERENCES

[1] N. M. Tho, A. M. Tjoa, Zero-latency Data Warehousing (ZLDWH): the state-of-the-art and experimental implementation approaches, In proceedings of 4th IEEE Intl. conference on computer science research.
[2] Babcock, Models and issues in data stream systems, In proceedings of the 2002 ACM Symp on Principles of Database Systems, June.
[3] Widom et al., Query Processing, approximation, and resource management in a data stream management system, In proceedings of the CIDR Conference.
[4] S. Babu, J. Widom, Continuous queries over data streams, SIGMOD Record, 30(3), Sep.
[5] R. Basu, Challenges of Real-time Data Warehousing, DMReview article.
[6] S. Acharya, B. Gibbons, V. Poosala, Congressional samples for approximate answering of group by queries, In proceedings of the special interest group on management of data.
[7] N. M. Tho, A. M. Tjoa, Zero latency data warehousing for heterogeneous data sources and continuous data streams, In proceedings of the 5th intl. conference on information integration, web applications and services, Jakarta, Indonesia.
[8] P. Karakasidis, Vassiliadis, E. Pitoura, ETL queues for active data warehousing, In proceedings of IQIS, pages 28-39.
[9] R. Bruckner, A. M. Tjoa, Managing time consistency for active data warehouse environments, In proceedings of the intl. conference on data warehousing and knowledge discovery.
[10] E. J. Kendall, E. K. Kendall, Information delivery systems: An exploration of web pull and push technologies, Tutorial, Volume 1, Paper 14, April.
[11] S. Guha, N. Koudas, Approximating a data stream for querying and estimation: Algorithms and performance evaluation, In proceedings of the data engineering.
[12] J. S. Turner, New directions in communications (or which way to the information age), IEEE Commun. Magazine, vol. 24, pp. 8-15, Oct.
[13] Blue Coat, WAN Application Performance, White Paper.
[14] W. H. Inmon, Building the data warehouse, New York: Wiley.
More informationPart 22. Data Warehousing
Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem
More informationIST722 Data Warehousing
IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Recall: Inmon s CIF The CIF is a reference architecture Understanding the Diagram The CIF is a reference architecture CIF
More informationA comparative study of data mining (DM) and massive data mining (MDM)
A comparative study of data mining (DM) and massive data mining (MDM) Prof. Dr. P K Srimani Former Chairman, Dept. of Computer Science and Maths, Bangalore University, Director, R & D, B.U., Bangalore,
More informationIndex Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
More informationBPM, EDA and SOA: How the Combination of these Technologies Facilitates Change. Dr. Neil Thomson, Head of Group Development, Microgen plc
BPM, EDA and SOA: How the Combination of these Technologies Facilitates Change Dr. Neil Thomson, Head of Group Development, Microgen plc What are we trying to do? The aim is survival everything else is
More informationData Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
More informationAn apparatus for P2P classification in Netflow traces
An apparatus for P2P classification in Netflow traces Andrew M Gossett, Ioannis Papapanagiotou and Michael Devetsikiotis Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
More informationQuery Selectivity Estimation for Uncertain Data
Query Selectivity Estimation for Uncertain Data Sarvjeet Singh *, Chris Mayfield *, Rahul Shah #, Sunil Prabhakar * and Susanne Hambrusch * * Department of Computer Science, Purdue University # Department
More informationMethods and tools for data and software integration Enterprise Service Bus
Methods and tools for data and software integration Enterprise Service Bus Roman Hauptvogl Cleverlance Enterprise Solutions a.s Czech Republic hauptvogl@gmail.com Abstract Enterprise Service Bus (ESB)
More informationA Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
More informationThe Network Layer Functions: Congestion Control
The Network Layer Functions: Congestion Control Network Congestion: Characterized by presence of a large number of packets (load) being routed in all or portions of the subnet that exceeds its link and
More informationLoad Shedding for Aggregation Queries over Data Streams
Load Shedding for Aggregation Queries over Data Streams Brian Babcock Mayur Datar Rajeev Motwani Department of Computer Science Stanford University, Stanford, CA 94305 {babcock, datar, rajeev}@cs.stanford.edu
More informationMulti-service Load Balancing in a Heterogeneous Network with Vertical Handover
1 Multi-service Load Balancing in a Heterogeneous Network with Vertical Handover Jie Xu, Member, IEEE, Yuming Jiang, Member, IEEE, and Andrew Perkis, Member, IEEE Abstract In this paper we investigate
More informationINDEXING BIOMEDICAL STREAMS IN DATA MANAGEMENT SYSTEM 1. INTRODUCTION
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 9/2005, ISSN 1642-6037 Michał WIDERA *, Janusz WRÓBEL *, Adam MATONIA *, Michał JEŻEWSKI **,Krzysztof HOROBA *, Tomasz KUPKA * centralized monitoring,
More informationLoad Distribution in Large Scale Network Monitoring Infrastructures
Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu
More informationEnabling Cloud Architecture for Globally Distributed Applications
The increasingly on demand nature of enterprise and consumer services is driving more companies to execute business processes in real-time and give users information in a more realtime, self-service manner.
More informationCleaning Encrypted Traffic
Optenet Documentation Cleaning Encrypted Traffic Troubleshooting Guide iii Version History Doc Version Product Date Summary of Changes V6 OST-6.4.300 01/02/2015 English editing Optenet Documentation
More informationData Mining Governance for Service Oriented Architecture
Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY alibek@tr.ibm.com Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,
More informationBig Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades
More informationBig Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com
Database Systems Journal vol. IV, no. 3/2013 31 Big Data Challenges Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com The amount of data that is traveling across
More informationEnterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd Page 1 of 8 TU1UT TUENTERPRISE TU2UT TUREFERENCESUT TABLE
More informationProcessing Flows of Information: From Data Stream to Complex Event Processing
Processing Flows of Information: From Data Stream to Complex Event Processing GIANPAOLO CUGOLA and ALESSANDRO MARGARA Dip. di Elettronica e Informazione Politecnico di Milano, Italy A large number of distributed
More informationZero-Latency Data Warehousing (ZLDWH): the State-of-the-art and experimental implementation approaches
Zero-Latency Data Warehousing (ZLDWH): the State-of-the-art and experimental implementation approaches Tho Manh Nguyen, and A Min Tjoa Abstract Increased data volumes and accelerating update speeds are
More informationMonitoring Traffic manager
Monitoring Traffic manager eg Enterprise v6 Restricted Rights Legend The information contained in this document is confidential and subject to change without notice. No part of this document may be reproduced
More informationSampling Methods In Approximate Query Answering Systems
Sampling Methods In Approximate Query Answering Systems Gautam Das Department of Computer Science and Engineering The University of Texas at Arlington Box 19015 416 Yates St. Room 300, Nedderman Hall Arlington,
More informationVirtual Operational Data Store (VODS) A Syncordant White Paper
Virtual Operational Data Store (VODS) A Syncordant White Paper Table of Contents Executive Summary... 3 What is an Operational Data Store?... 5 Differences between Operational Data Stores and Data Warehouses...
More informationResearch on Video Traffic Control Technology Based on SDN. Ziyan Lin
Joint International Mechanical, Electronic and Information Technology Conference (JIMET 2015) Research on Video Traffic Control Technology Based on SDN Ziyan Lin Communication University of China, Beijing
More informationHighly Available Mobile Services Infrastructure Using Oracle Berkeley DB
Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB Executive Summary Oracle Berkeley DB is used in a wide variety of carrier-grade mobile infrastructure systems. Berkeley DB provides
More informationA Survey of Real-Time Data Warehouse and ETL
Fahd Sabry Esmail Ali A Survey of Real-Time Data Warehouse and ETL Article Info: Received 09 July 2014 Accepted 24 August 2014 UDC 004.6 Recommended citation: Esmail Ali, F.S. (2014). A Survey of Real-
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationOptimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. Sivaganesh07@gmail.com P Srinivasu
More informationCHAPTER 7 SUMMARY AND CONCLUSION
179 CHAPTER 7 SUMMARY AND CONCLUSION This chapter summarizes our research achievements and conclude this thesis with discussions and interesting avenues for future exploration. The thesis describes a novel
More informationGRIDS IN DATA WAREHOUSING
GRIDS IN DATA WAREHOUSING By Madhu Zode Oct 2008 Page 1 of 6 ABSTRACT The main characteristic of any data warehouse is its ability to hold huge volume of data while still offering the good query performance.
More informationBoarding to Big data
Database Systems Journal vol. VI, no. 4/2015 11 Boarding to Big data Oana Claudia BRATOSIN University of Economic Studies, Bucharest, Romania oc.bratosin@gmail.com Today Big data is an emerging topic,
More informationDistributed Sampling Storage for Statistical Analysis of Massive Sensor Data
Distributed Sampling Storage for Statistical Analysis of Massive Sensor Data Hiroshi Sato 1, Hisashi Kurasawa 1, Takeru Inoue 1, Motonori Nakamura 1, Hajime Matsumura 1, and Keiichi Koyanagi 2 1 NTT Network
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationProper study of Data Warehousing and Data Mining Intelligence Application in Education Domain
Journal of The International Association of Advanced Technology and Science Proper study of Data Warehousing and Data Mining Intelligence Application in Education Domain AMAN KADYAAN JITIN Abstract Data-driven
More informationLoad Balancing in Structured Peer to Peer Systems
Load Balancing in Structured Peer to Peer Systems DR.K.P.KALIYAMURTHIE 1, D.PARAMESWARI 2 Professor and Head, Dept. of IT, Bharath University, Chennai-600 073 1 Asst. Prof. (SG), Dept. of Computer Applications,
More informationEfficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration
Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration 1 Harish H G, 2 Dr. R Girisha 1 PG Student, 2 Professor, Department of CSE, PESCE Mandya (An Autonomous Institution under
More informationPrediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
More informationLoad Balancing in Structured Peer to Peer Systems
Load Balancing in Structured Peer to Peer Systems Dr.K.P.Kaliyamurthie 1, D.Parameswari 2 1.Professor and Head, Dept. of IT, Bharath University, Chennai-600 073. 2.Asst. Prof.(SG), Dept. of Computer Applications,
More informationMasters Project Proxy SG
Masters Project Proxy SG Group Members Chris Candilora Cortland Clater Eric Garner Justin Jones Blue Coat Products Proxy SG Series Blue Coat Proxy SG appliances offer a comprehensive foundation for the
More informationWide Area Monitoring, Control, and Protection
Wide Area Monitoring, Control, and Protection Course Map Acronyms Wide Area Monitoring Systems (WAMS) Wide Area Monitoring Control Systems (WAMCS) Wide Area Monitoring Protection and Control Systems (WAMPACS)
More informationThe OSI model has seven layers. The principles that were applied to arrive at the seven layers can be briefly summarized as follows:
1.4 Reference Models Now that we have discussed layered networks in the abstract, it is time to look at some examples. In the next two sections we will discuss two important network architectures, the
More informationReal-Time Data Warehouse Loading Methodology
Real-Time Data Warehouse Loading Methodology Ricardo Jorge Santos CISUC Centre of Informatics and Systems DEI FCT University of Coimbra Coimbra, Portugal lionsoftware.ricardo@gmail.com Jorge Bernardino
More information