Near Real-time Data Warehousing with Multi-stage Trickle & Flip
|
|
|
- Joan Henry
- 10 years ago
- Views:
Transcription
1 Near Real-time Data Warehousing with Multi-stage Trickle & Flip Janis Zuters University of Latvia, 19 Raina blvd., LV-1586 Riga, Latvia Abstract. A data warehouse typically is a collection of historical data designed for decision support, so it is updated from the sources periodically, mostly on a daily basis. Today s business however asks for fresher data. Real-time warehousing is one of the trends to accomplish this, but there are a number of challenges to move towards true real-time. This paper proposes Multi-stage Trickle & flip methodology for data warehouse refreshment. It is based on the Trickle & flip principle and extended in order to further insulate loading and querying activities, thus enabling both of them to be more efficient. Keywords: Real-time data warehousing, data refreshment, data loading 1 Introduction A data warehouse (DW) is a collection of technologies aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions. Traditional online transaction processing (OLTP) systems are inappropriate for decision support. Data warehousing has become an important strategy to integrate heterogeneous data sources and to enable online analytic processing (OLAP) [1]. Traditionally a data warehouse is refreshed periodically (e.g. weekly, daily) and can be considered as a window on the past. Until recently, using periodically updated data was not a crucial issue. However, with enterprises such as e-business, stock brokering, online telecommunications, and health systems, for instance, relevant information needs to be delivered as fast as possible to knowledge workers or decision systems who rely on it to react in a near real-time manner, according to the new and most recent data captured by an organization s information system [2]. This makes supporting near real-time data warehousing a critical issue for such applications. Today s business demands are trying to gradually eliminate the inscription for history. Technically, one of the main roles a data warehouse is to separate the analytical part of the system from the operational one, in order to satisfy heavy performance demands in both of them. Providing a data warehouse with ever fresher data makes this task more and more difficult a frequent transportation of data from sources to data warehouse increases the collision risk between data loading and querying activities.
2 Real-time warehousing is an evitable stage of evolution of data warehouses. Unfortunately this involves a lot of challenges and trade-offs, especially those with regard to state-of-the-art technologies; the most typical ones are faced when data refreshment can no longer be postponed to off-peak hours. On the one hand, end-users of data warehouses need ever fresher data; on the other hand this involves additional requirements to hardware and possibly changes to data analysis behaviour due to frequent loadings. Because of more frequent and more complex operations related to data loading, OLAP performance could degrade dramatically. Real-time data warehouses aim for decreasing the time it takes to make decisions and try to attain zero latency between cause and effect for that decision, closing the gap between intelligent reactive systems and systems processes. [3] Moving towards real-time involves a number of issues/requirements: When loading data (near) continuously in real-time, there can t be any (transactional) system downtime [4]; By loading data continuously (more frequently than daily), Query/OLAP downtimes and other inconveniences should be taken into account; The system becomes more complicated. This makes the triple trade-off for the whole system for the sake of real-time: Data extraction should be as lightweight as possible; OLAP activities shouldn t be much affected due to more frequent loading; Data warehouse designers/users shouldn t suffer a lot from additional complicacy of the system. To solve this, a good deal of real-time approaches, methodologies and technologies have been proposed in each of the stages, int. al. ETL (extract-transform-load), data modelling, and data analysis. The proposed approach of Multi-stage Trickle & flip lies beyond actual data transportation issues between the source systems and the data warehouse (e.g. change data capture, CDC) and assumes conventional ETL adapted to get near real-time warehousing put into effect. 2 Related Work In general, the principle of real-time is approached in several ways from technologies of data transportation to DW architectures and OLAP query issues. One of the focuses is that of true real-time data transportation between the operational systems and the data warehouse. Such technologies, approaches and protocols, concerning data transport and integration, lie beyond the scope of this paper. Here well matured ETL systems are assumed, and the described approaches address the aspects of DW affecting structures and manipulations over the data after they have been loaded from the sources. Traditionally an ETL system (and thus DW refreshment, in general) works in a batch mode, but continuous ETL technologies are also being evolved.
3 Near real-time data warehousing addresses the challenge of need for fresh data by simply shortening the data warehouse refreshment intervals and hence, delivering source data to the data warehouse with lower latency [5]. This approach is referred to as near real-time data warehousing or microbatch ETL [6]. In contrast to true realtime solutions this approach builds on the mature and proven ETL system and does not require the re-implementation of the transformation logic. A workable solution to it is near real-time ETL, where the loading frequencies are simply increased without any other changes on the system (Fig. 1). This is the cheapest and the easiest way to solve the problem and is a good choice for relatively small data warehouses until off-peak hour loadings become an issue. As refreshment schedule is changed, additional side effects appear. One is that of refreshment anomalies during the ETL process, addressed in [5]. If the refreshment rate exceeds microbatch basis and loading is going to become continuous, this will require new approaches and technologies lying beyond the scope of this paper. The Trickle and Flip approach helps avert the scalability issues associated with querying tables of a data warehouse that are being simultaneously updated. [4] Here, instead of continuous (or very frequent microbatch) loading of data directly into warehouse tables a staging area is used. To make things easier, the staging tables are exactly of the same format as the target tables. Applying the principle of Trickle & Flip, on a periodic basis the staging tables are duplicated and the copy is swapped with the data warehouse tables. For smaller data warehouses the staging tables can contain a complete copy of all data while for bigger ones a typical case would be the data for the current day being put into a separate partition. [3] proposes an integrated data warehouses loading methodology that covers as many as four different areas of operation: (a) data warehouse schema adaptation, (b) ETL loading procedures, (c) OLAP query adaptation, and (d) DW database packing and reoptimization. As previously, here also the principle is used to have the same format for temporary tables as of the DW. 3 Trickle & Flip This Section goes deeper into the Trickle & Flip approach, as further in this paper this particular principle is being evolved. If using Trickle & Flip, only very small data warehouses can do without a realtime partition (Fig. 2). Applying the principle of Trickle & Flip with real-time partitions, on a periodic basis the staging tables are duplicated and the copy is swapped with the real-time tables (Figs. 3 and 4). When used, real-time partitions imply additional requirements to the system. A real-time partition must [7]: Contain all the activity occurred since the last update of the static data; Link as seamlessly as possible to the static data tables; Be so lightly indexed that incoming data can be continuously fed in; Support high performance querying.
4 Microbatch ETL OLAP Source database Data warehouse Fig. 1. Near real-time warehousing with near real-time ETL ETL Source database Staging tables Copy Copy of staging tables Flip OLAP Data warehouse Fig. 2. Data loading using Trickle & Flip with complete copy of data in the staging tables. Typical cycle times range from hourly to every minute or even faster In particular, the latter two requirements are competing ones and thus potentially leading to awkward trade-offs or workarounds. If the integration of real-time partition is implemented through views, the swap operation would only consist of changing the view definition. Even though flipping operation is a lightweight one, most likely it might be advisable to temporally pause the OLAP server. In general, integrating real-time partitions into a system is very a complex task in terms of engineering. The success is largely up to ability of the query tool to encapsulate this complexity and to hide it from end-users. A separate real-time data partition is a valuable technique to reduce loading/querying conflicts, yet at each flip (and this is to occur many times a day) it is advisable to restrict querying. If the warehouse is advancing towards real-time, the cycle times can be as frequent as of several minutes, and this would put a heavy burden to the end-users working on real-time data. In this Section and further, it is assumed that the real-time data is loaded into the static data warehouse on a daily basis (in peak-off hours), so this particular loading is not covered by the described refreshment methodologies.
5 ETL OLAP Source database D1. Staging tables Copy Static data D2. Copy of staging tables Flip D3. Realtime data Data warehouse Fig. 3. Data loading using Trickle & Flip with a separate real-time partition. Real-time data will typically contain the data of the current day only (A letter D in D1, D2, and D3 stands for day ), thus all the three partitions (D1, D2, and D3) contain full data from the beginning of the current day with potentially different latencies ALGORITHM trickle_and_flip_refresh (R) D3 real-time partition of the DW (Fig. 3) D1, D2 staging partitions with the same data format as D3 (Fig. 3) R refreshment rate (e.g., 1 hour) D1 is being continuously (or in a microbatch mode) fed from the source BEGIN Do Every R % e.g., every hour Copy D1 to D2 % i.e., do backup of D1 into D2 Flip D2 and D3 % D3 should not be locked by querying Fig. 4. DW refreshment algorithm using Trickle & Flip (related to Fig. 3). In off-peak time, real-time data is added to the static DW and the staging tables are emptied. Summary of the Trickle & Flip approach (assuming the refreshment rate to be one hour): All the three partitions contain approximately the data from the beginning of the day up to now; Normally, the data of the real-time partition is not older than 1 hour; Each hour the process of duplicating the data of the current day is being performed; Each hour it is advisable to restrict querying (on real-time data) The issues of the Trickle & Flip approach: For very large DWs, copying the full day data every hour could matter; Real-time querying being impeded every hour; By reducing the refreshment rate, the issues get heavier.
6 The Multi-stage Trickle & Flip approach, proposed in the next Section, applies additional intermediate real-time areas in order to aver the listed drawbacks of the pure Trickle & Flip. The main focus is put on reducing the refreshment rate to get closer to real-time. D1 is being fed from the source and has the most fresh data D1 D2 D3 Copy D1 to D2 D1 D2 D3 Flip D2 and D3 Now the real-time partition D3 has fresh data; normally, the latency is not greater than of the actual refreshment rate D1 D2 D3 Fig. 5. An example. One iteration of a Trickle & Flip loading (according to Fig. 4). D1 and D2 are staging areas, D3 is the real-time part of the DW 3 Multi-stage Trickle & Flip 3.1 The Main Idea For the pure Trickle & Flip, if the cycle times are as frequent as of several minutes, this could put a heavy burden to querying, so to the end-users working on real-time data. To overcome this, we propose Multi-stage Trickle & Flip approach (Fig. 6) by adding additional intermediate stages to the system, thus mitigating the competition between loading and querying processes. In particular, real-time data is divided into two (or potentially more) subpartitions, each one with a different latency (and thus, also amount of stored data). The main objective of the new approach: Collisions between data loading and querying activities should be reduced.
7 3.2. Setup and Operation Evolving the approach of Trickle & Feed is based on the following assumptions: Adding data to a smaller table (i.e., with less data) is faster; Updating last changes to a table is faster than making full copy of the last version. OLAP Source database H`. Staging tables 1 D`. Staging tables 2 Data warehouse ETL R1 R2 M. Staging tables 0 H. real-time data 1 D. real-time data 2 W. Static data Fig. 6. Multi-stage Trickle & Flip with one intermediate level. In partition names, a letter M stands for minute, H stands for hour, D stands for day. Data warehouse consists of partitions H, D, and W, while M, H`, and D` are staging partitions. R1 the first stage refreshment rate, e.g., 5 minutes, R2 the second stage refreshment rate, e.g., 1 hour Construction of Multi-stage Trickle & Flip infrastructure (Fig. 6): DW is represented by static data (W) and two (or potentially more) real time partitions (D and H). Each of real time partitions is designed to contain a different degree of amount of data (e.g., of the last day or the last hour); The staging area M is continuously fed with the source data; There are two additional staging areas (for each of the real-time partitions) D` and H` which are not affected by querying, thus always available for loading activities; For both stages, refreshment with two different refreshment rates is performed (See the algorithm in Fig. 7 for the first stage refreshment: the same algorithm suits second stage refreshment with parameters R2, H, D`, D). For simplicity of description, it is assumed that refreshment rates R1 and R2 are 5 minutes and 1 hour respectively; There are 3 possible levels of querying: 1) W, 2) W+D, 3) W+D+H (i.e. of latencies of 1 day, 1 hour, or 5 minutes respectively). An example of refreshment with this methodology is given in Fig. 8.
8 3.3. Benefits and Issues The Multi-stage Trickle & Flip approach reduces amount of data to be copied and possible collisions between loading and querying activities. A brief comparison between the two approaches is given in Table 1. ALGORITHM multiple_trickle_and_flip_refresh (R1, M, H`, H) H real-time partition for the current hour M, H` staging partitions with the same data format as W (Fig. 6) R1 first stage refreshment rate (e.g., 5 minutes) M is being continuously (or in a microbatch mode) fed from the source BEGIN Do Every R1 % e.g., every 5 minutes Add M to H` Empty M If H is available % not locked by querying Add H` to H Empty H` Fig. 7. The first stage DW refreshment algorithm using Multi-stage Trickle & Flip (related to Fig. 6). H will typically contain the data of the current hour (with latency of 5 minutes). H` will typically be empty unless H is not available for some time in this case H` would contain one or several portions of 5 minutes data. The algorithm is visualized in Fig. 8. Exactly the same algorithm suits the second stage, but with parameters R2, H, D`, D respectively. The summary obtained features of the new approach: Total amount of data copying is reduced; Collisions between data loading and querying activities have been reduced; By advancing the querying system (e.g., balancing needs for queries with a certain latency, separate caching for each stage) these collisions would have been more reduced; The approach is open to additional stages. Having appropriate querying tools available this could significantly help to approach true real-time refreshment. The main technical issue of the methodology is vulnerability menace for query integrity if data is temporarily stored in the staging area waiting for the appropriate partition (see step #4 in Fig. 8); for the pure Trickle & Flip this will simply cause higher latency. The main challenge of the approach is that of a very complicated multi-stage querying system. However the case is not over-complex, moreover, multi-stage querying is nothing new to have come with this approach: also the pure Trickle & Flip needs such in order to combine static data with real-time data. Such querying tools and methodologies are already available. [8] describes a similar principle in generating reports on a data warehouse of multiple versions (i.e., stages, in terms of this paper).
9 5 Conclusion The data warehouse refreshing approach of Multi-stage Trickle & Flip is proposed to mitigate the competition between the loading and querying processes when trying to approach real-time operation. It is designed as an extension to the well known principle of Trickle & Flip. Although the methodology doesn t directly cover all the areas of ETL and querying, still it requires additional engineering efforts to obtain the maximum benefits, in particular, those of querying mechanism: Separate caching for each stage; A mechanism to balance needs for higher latency and the loading process. M is being fed from the source and has the freshest data of the last 5 minutes; normally, H` is empty unless H is unavailable for some time due to querying. H contains the data of the last hour: M H` H Copy M to H` Empty M M H` H If H is available then copying is performed further to H Now the real-time partition H has fresh data; normally, the latency is not greater than of 5 minutes: M H` H If H is unavailable (due to querying activities), H` is likely to contain more then one portion of M (while waiting for H) M H` H Fig. 8. An example. One iteration of a Multi-stage Trickle & Flip loading; the first stage (according to the schema in Fig. 6 and the algorithm in Fig. 7). M and H` are staging areas, H is the first real-time part of the DW. The same way, the loading is performed in the second stage. Acknowledgements. This work has been supported by ESF project No. 2009/0216/1DP/ /09/APIA/VIAA/044.
10 References 1. Jarke, M., Lenzerini, M., Vassiliou, Y.: Panos Vassiliadis. Fundamentals of Data Warehouses. Springer Verlag, 2nd, rev. and extended ed., XIV (2003) 2. Inmon, W. H., Terdeman, R. H., Norris-Montanari, J., and Meers, D.: Data Warehousing for E-Business, J. Wiley & Sons (2001) 3. Santos, R.J., Bernardino, J.: Real-time Warehouse Loading Methodology. Proceedings of the 2008 international symposium on Database engineering & applications (IDEAS '08), ACM, New York, NY, USA (2008) 4. Langseth, J.: Real-time data warehousing: Challenges and solutions, DSSResources.COM, (2004) 5. Jorg, T., Dessloch, S.: Near Real-time data warehousing using state-of-the-art ETL tools. Enabling Real-Time Business Intelligence, Lecture Notes in Business Information Processing, Volume 41. ISBN Springer-Verlag Heidelberg (2010) 6. Kimball, R., Caserta, J.: The data warehouse ETL toolkit: Practical techniques for extracting, cleaning, conforming, and delivering data. John Wiley & Sons (2004) 7. Kimball, R.; Ross, M.; Thornthwaite, W.; Mundy, J.; Becker, B.: The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence. John Wiley & Sons (2010) 8. Solodovnikova, D.: Building Queries on Multiple Versions of Data Warehouse. Proceedings of the Eighth International Baltic Conference (DB&IS 2008), Tallinn, Estonia, pp (2008)
Towards Real-Time Data Integration and Analysis for Embedded Devices
Towards Real-Time Data Integration and Analysis for Embedded Devices Michael Soffner, Norbert Siegmund, Mario Pukall, Veit Köppen Otto-von-Guericke University of Magdeburg {soffner, nsiegmun, pukall, vkoeppen}@ovgu.de
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
Dimensional Modeling for Data Warehouse
Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or
BUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
DATA MINING AND WAREHOUSING CONCEPTS
CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation
Data Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of
An Introduction to Data Warehousing An organization manages information in two dominant forms: operational systems of record and data warehouses. Operational systems are designed to support online transaction
An Architecture for Integrated Operational Business Intelligence
An Architecture for Integrated Operational Business Intelligence Dr. Ulrich Christ SAP AG Dietmar-Hopp-Allee 16 69190 Walldorf [email protected] Abstract: In recent years, Operational Business Intelligence
Data Warehousing and Data Mining
Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong [email protected] Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge
Reducing ETL Load Times by a New Data Integration Approach for Real-time Business Intelligence
Reducing ETL Load Times by a New Data Integration Approach for Real-time Business Intelligence Darshan M. Tank Department of Information Technology, L.E.College, Morbi-363642, India [email protected] Abstract
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING Ramesh Babu Palepu 1, Dr K V Sambasiva Rao 2 Dept of IT, Amrita Sai Institute of Science & Technology 1 MVR College of Engineering 2 [email protected]
An Oracle White Paper March 2014. Best Practices for Real-Time Data Warehousing
An Oracle White Paper March 2014 Best Practices for Real-Time Data Warehousing Executive Overview Today s integration project teams face the daunting challenge that, while data volumes are exponentially
Integrating Netezza into your existing IT landscape
Marco Lehmann Technical Sales Professional Integrating Netezza into your existing IT landscape 2011 IBM Corporation Agenda How to integrate your existing data into Netezza appliance? 4 Steps for creating
Real-Time Data Warehouse Loading Methodology
Real-Time Data Warehouse Loading Methodology Ricardo Jorge Santos CISUC Centre of Informatics and Systems DEI FCT University of Coimbra Coimbra, Portugal [email protected] Jorge Bernardino
Performance Measurement Framework with Indicator Life-Cycle Support
Performance Measurement Framework with Life-Cycle Support Aivars NIEDRITIS, Janis ZUTERS, and Laila NIEDRITE University of Latvia Abstract. The performance measurement method introduced in this paper is
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd Page 1 of 8 TU1UT TUENTERPRISE TU2UT TUREFERENCESUT TABLE
An Integration Adaptation for Real-Time Datawarehousing
, pp. 115-128 http://dx.doi.org/10.14257/ijseia.2014.8.11.10 An Integration Adaptation for Real-Time Datawarehousing Imane Lebdaoui 1, Ghizlane Orhanou 2 and Said Elhajji 3 Laboratory of Mathematics, Computing
Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda
Data warehouses 1/36 Agenda Why do I need a data warehouse? ETL systems Real-Time Data Warehousing Open problems 2/36 1 Why do I need a data warehouse? Why do I need a data warehouse? Maybe you do not
Data Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28
Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus I. Contact Information Professor: Joseph Morabito, Ph.D. Office: Babbio 419 Office Hours: By Appt. Phone: 201-216-5304 Email: [email protected]
THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days
Three Days Prerequisites Students should have at least some experience with any relational database management system. Who Should Attend This course is targeted at technical staff, team leaders and project
Microsoft Data Warehouse in Depth
Microsoft Data Warehouse in Depth 1 P a g e Duration What s new Why attend Who should attend Course format and prerequisites 4 days The course materials have been refreshed to align with the second edition
The Data Warehouse ETL Toolkit
2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. The Data Warehouse ETL Toolkit Practical Techniques for Extracting,
Deriving Business Intelligence from Unstructured Data
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 9 (2013), pp. 971-976 International Research Publications House http://www. irphouse.com /ijict.htm Deriving
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
ENABLING OPERATIONAL BI
ENABLING OPERATIONAL BI WITH SAP DATA Satisfy the need for speed with real-time data replication Author: Eric Kavanagh, The Bloor Group Co-Founder WHITE PAPER Table of Contents The Data Challenge to Make
Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.
Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc. Introduction Abstract warehousing has been around for over a decade. Therefore, when you read the articles
Next Generation Data Warehousing Appliances 23.10.2014
Next Generation Data Warehousing Appliances 23.10.2014 Presentert av: Espen Jorde, Executive Advisor Bjørn Runar Nes, CTO/Chief Architect Bjørn Runar Nes Espen Jorde 2 3.12.2014 Agenda Affecto s new Data
A Survey on Data Warehouse Architecture
A Survey on Data Warehouse Architecture Rajiv Senapati 1, D.Anil Kumar 2 1 Assistant Professor, Department of IT, G.I.E.T, Gunupur, India 2 Associate Professor, Department of CSE, G.I.E.T, Gunupur, India
Agile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using
Integrating SAP and non-sap data for comprehensive Business Intelligence
WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst
Data warehouse development with EPC
Proceedings of the 5th WSEAS Int. Conf. on ATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 1 ata warehouse development with EPC MARIS KLIMAVICIUS aculty of Computer Science
A Design and implementation of a data warehouse for research administration universities
A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology) Software
ETL-EXTRACT, TRANSFORM & LOAD TESTING
ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA [email protected] ABSTRACT Data is most important part in any organization. Data
Data warehouse Architectures and processes
Database and data mining group, Data warehouse Architectures and processes DATA WAREHOUSE: ARCHITECTURES AND PROCESSES - 1 Database and data mining group, Data warehouse architectures Separation between
Striving towards Near Real-Time Data Integration for Data Warehouses
Striving towards Near Real-Time Data Integration for Data Warehouses Robert M. Bruckner 1, Beate List 1, and Josef Schiefer 2 1 Institute of Software Technology Vienna University of Technology Favoritenstr.
A Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
Master Data Management and Data Warehousing. Zahra Mansoori
Master Data Management and Data Warehousing Zahra Mansoori 1 1. Preference 2 IT landscape growth IT landscapes have grown into complex arrays of different systems, applications, and technologies over the
Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya
Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
An Instructional Design for Data Warehousing: Using Design Science Research and Project-based Learning
An Instructional Design for Data Warehousing: Using Design Science Research and Project-based Learning Roelien Goede North-West University, South Africa Abstract The business intelligence industry is supported
Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006
Practical Considerations for Real-Time Business Intelligence Donovan Schneider Yahoo! September 11, 2006 Outline Business Intelligence (BI) Background Real-Time Business Intelligence Examples Two Requirements
Gradient An EII Solution From Infosys
Gradient An EII Solution From Infosys Keywords: Grid, Enterprise Integration, EII Introduction New arrays of business are emerging that require cross-functional data in near real-time. Examples of such
An Enterprise Data Hub, the Next Gen Operational Data Store
An Enterprise Data Hub, the Next Gen Operational Data Store Version: 101 Table of Contents Summary 3 The ODS in Practice 4 Drawbacks of the ODS Today 5 The Case for ODS on an EDH 5 Conclusion 6 About the
Enable Better and Timelier Decision-Making Using Real-Time Business Intelligence System
I.J. Information Engineering and Electronic Business, 2015, 1, 43-48 Published Online January 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijieeb.2015.01.06 Enable Better and Timelier Decision-Making
BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining
BUSINESS INTELLIGENCE Bogdan Mohor Dumitrita 1 Abstract A Business Intelligence (BI)-driven approach can be very effective in implementing business transformation programs within an enterprise framework.
Data Warehouse for Interactive Decision Support for Addis Ababa City Administration
Data Warehouse for Interactive Decision Support for Addis Ababa City Administration Yilma Desta Ethiopian Airlines, Addis Ababa, Ethiopia [email protected] Mesfin Kifle Department of Computer Science,
A collaborative approach of Business Intelligence systems
A collaborative approach of Business Intelligence systems Gheorghe MATEI, PhD Romanian Commercial Bank, Bucharest, Romania [email protected] Abstract: To succeed in the context of a global and dynamic
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
Data Warehousing & Data Mining
Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 8. Real-Time DW 8. Real-Time Data Warehouses
BUILDING A WEB-ENABLED DATA WAREHOUSE FOR DECISION SUPPORT IN CONSTRUCTION EQUIPMENT MANAGEMENT
BUILDING A WEB-ENABLED DATA WAREHOUSE FOR DECISION SUPPORT IN CONSTRUCTION EQUIPMENT MANAGEMENT Hongqin Fan ([email protected]) Graduate Research Assistant, University of Alberta, AB, T6G 2E1, Canada Hyoungkwan
Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence
Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence Appliances and DW Architectures John O Brien President and Executive Architect Zukeran Technologies 1 TDWI 1 Agenda What
Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov
Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by
A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing
A Framework for Developing the Web-based Integration Tool for Web-Oriented Warehousing PATRAVADEE VONGSUMEDH School of Science and Technology Bangkok University Rama IV road, Klong-Toey, BKK, 10110, THAILAND
Reverse Engineering in Data Integration Software
Database Systems Journal vol. IV, no. 1/2013 11 Reverse Engineering in Data Integration Software Vlad DIACONITA The Bucharest Academy of Economic Studies [email protected] Integrated applications
TopBraid Insight for Life Sciences
TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.
Data Warehouses & OLAP
Riadh Ben Messaoud 1. The Big Picture 2. Data Warehouse Philosophy 3. Data Warehouse Concepts 4. Warehousing Applications 5. Warehouse Schema Design 6. Business Intelligence Reporting 7. On-Line Analytical
Using In-Memory Computing to Simplify Big Data Analytics
SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed
A Service-oriented Architecture for Business Intelligence
A Service-oriented Architecture for Business Intelligence Liya Wu 1, Gilad Barash 1, Claudio Bartolini 2 1 HP Software 2 HP Laboratories {[email protected]} Abstract Business intelligence is a business
THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE
THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE Carmen Răduţ 1 Summary: Data quality is an important concept for the economic applications used in the process of analysis. Databases were revolutionized
Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA
Applied Business Intelligence Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA Agenda Business Drivers and Perspectives Technology & Analytical Applications Trends Challenges
An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies
An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies Ashish Gahlot, Manoj Yadav Dronacharya college of engineering Farrukhnagar, Gurgaon,Haryana Abstract- Data warehousing, Data Mining,
IBM Analytics. Just the facts: Four critical concepts for planning the logical data warehouse
IBM Analytics Just the facts: Four critical concepts for planning the logical data warehouse 1 2 3 4 5 6 Introduction Complexity Speed is businessfriendly Cost reduction is crucial Analytics: The key to
Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short
Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short Vijay Anand, Director, Product Marketing Agenda 1. Managed self-service» The need of managed self-service»
Integrating a web application with Siebel CRM system
Integrating a web application with Siebel CRM system Mika Salminen, Antti Seppälä Helsinki University of Technology, course Business Process Integration: Special Course in Information Systems Integration,
Understanding Neo4j Scalability
Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the
Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers
60 Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative
Deductive Data Warehouses and Aggregate (Derived) Tables
Deductive Data Warehouses and Aggregate (Derived) Tables Kornelije Rabuzin, Mirko Malekovic, Mirko Cubrilo Faculty of Organization and Informatics University of Zagreb Varazdin, Croatia {kornelije.rabuzin,
CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY
51 CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY Web application operations are a crucial aspect of most organizational operations. Among them business continuity is one of the main concerns. Companies
8 Ways that Business Intelligence Projects are Different
8 Ways that Business Intelligence Projects are Different And How to Manage BI Projects to Ensure Success Business Intelligence and Data Warehousing projects have developed a reputation as being difficult,
SimCorp Solution Guide
SimCorp Solution Guide Data Warehouse Manager For all your reporting and analytics tasks, you need a central data repository regardless of source. SimCorp s Data Warehouse Manager gives you a comprehensive,
Data Warehouse (DW) Maturity Assessment Questionnaire
Data Warehouse (DW) Maturity Assessment Questionnaire Catalina Sacu - [email protected] Marco Spruit [email protected] Frank Habers [email protected] September, 2010 Technical Report UU-CS-2010-021
LEARNING SOLUTIONS website milner.com/learning email [email protected] phone 800 875 5042
Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300
ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION
ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION Enzo Unified Solves Real-Time Data Integration Challenges that Increase Business Agility and Reduce Operational Complexities CHALLENGES
HOW INTERSYSTEMS TECHNOLOGY ENABLES BUSINESS INTELLIGENCE SOLUTIONS
HOW INTERSYSTEMS TECHNOLOGY ENABLES BUSINESS INTELLIGENCE SOLUTIONS A white paper by: Dr. Mark Massias Senior Sales Engineer InterSystems Corporation HOW INTERSYSTEMS TECHNOLOGY ENABLES BUSINESS INTELLIGENCE
MDM and Data Warehousing Complement Each Other
Master Management MDM and Warehousing Complement Each Other Greater business value from both 2011 IBM Corporation Executive Summary Master Management (MDM) and Warehousing (DW) complement each other There
IST722 Data Warehousing
IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Recall: Inmon s CIF The CIF is a reference architecture Understanding the Diagram The CIF is a reference architecture CIF
Chapter 3 - Data Replication and Materialized Integration
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 [email protected] Chapter 3 - Data Replication and Materialized Integration Motivation Replication:
DATA QUALITY IN BUSINESS INTELLIGENCE APPLICATIONS
DATA QUALITY IN BUSINESS INTELLIGENCE APPLICATIONS Gorgan Vasile Academy of Economic Studies Bucharest, Faculty of Accounting and Management Information Systems, Academia de Studii Economice, Catedra de
Datawarehousing and Analytics. Data-Warehouse-, Data-Mining- und OLAP-Technologien. Advanced Information Management
Anwendersoftware a Datawarehousing and Analytics Data-Warehouse-, Data-Mining- und OLAP-Technologien Advanced Information Management Bernhard Mitschang, Holger Schwarz Universität Stuttgart Winter Term
