An Integration Adaptation for Real-Time Datawarehousing
Imane Lebdaoui 1, Ghizlane Orhanou 2 and Said Elhajji 3
Laboratory of Mathematics, Computing and Applications, Department of Mathematical and Computer Sciences, Faculty of Sciences, University of Mohammed V-Rabat, BP.1014 RP, Rabat, Morocco.
1 [email protected], 2 [email protected], 3 [email protected]

Abstract
Managing and storing big data makes use of powerful systems mainly based on the concept of data warehousing. Data changes usually occur in operational data sources that are mostly heterogeneous and remote from each other. To gather them in a single location called the data warehouse, they may undergo many treatments and processes according to the organization's policies and rules. Once in the data warehouse, they become ready for decision-making and analytical tools. To make fast and sound decisions based on fresh data, it is necessary that the data warehouse reflects the real operational data changes and provides the freshest data to the analytical systems. This paper addresses the problem of integrating big data into the data warehouse in a short time and proposes a new model called DJ-DI. Based on the division of data changes and the adaptation of table joins, this model increases the data integration rate, and thus data reach the data warehouse in a shorter time. We have conducted different simulations of the DJ-DI model on our experimental platform. The obtained results show that the DJ-DI model offers a remarkable improvement of the data integration rate.

Keywords: Big data, data changes, division, data warehouse, real-time, data integration, join

1. Introduction
The era of big data induces organizations to utilize robust systems to manage data efficiently in order to turn them into useful and valuable information. Data are first generated and handled by operational data sources (ODS), which are usually heterogeneous and far from each other. In order to make them ready for decision-making processes and analytical tools, data need to be extracted from the operational sources, transformed, cleansed and integrated into one repository called the data warehouse, which remains the main and best-known instrument for managing big data. Actually, a Data Warehouse (DW) gathers both the new and the historical data that have existed, or still exist, in some form in the operational data sources. Data integration is the most important stage in data warehousing since it constitutes 80% of the work in warehousing projects [7]. The following schema (Figure 1) summarizes the process of data warehousing and data integration and shows a comparison between the ODS and the DW.
Figure 1. Comparison between ODS and DW

Since real time (RT) has become a strong requirement (for instance in e-business, stock broking and online telecommunications) and large amounts of data are a fact, it is necessary to manage data that come from different operational sources and deliver them in a timely manner to the data warehouse. In fact, real-time business data lose value as they become older [5]. Real-time data warehousing (RTDW) is one potential trend that proposes solutions to manage large amounts of data in RT fashion. Indeed, to fulfill real-time requirements [8], systems use real-time based technologies such as real-time or near real-time ETL, real-time data integration, real-time change data capture and real-time partitions [8, 9]. However, such technologies impose additional requirements on the system [8] and further costs that can be heavy for small organizations. To get around this situation and save time, data sometimes do not undergo all the treatments they should receive. Some organizations mistakenly skip treatments that are essential; essential because they should be respected in the same way they were first established. The sequencing of these treatments provides the level of data quality the organization aims at. Some newcomers to real-time data warehousing propose bypassing the whole data integration process by moving data directly from the operational sources to the business reports [4] (Figure 2). Such a process certainly ensures quick access to data for reports, but it is risky since it jeopardizes data quality and therefore data integrity. Actually, integrity may be hampered because the report would need specific privileges to access the data in the operational sources, and the existing security rules in the operational systems would have to be reviewed accordingly.

Figure 2. Both Possible Ways of Feeding Business Intelligence (BI) Reports
In our paper, we present a new methodology for improving data integration while preserving data integrity by respecting the existing security rules. Our approach, called Divide-Join Data Integration (DJ-DI), behaves according to the size of the data changes. We have structured the rest of the paper as follows: Section 2 addresses related works on data integration improvement and its issues. In Section 3, we introduce our methodology of data integration based on division and join adaptation, then Section 4 presents the experimental results. Finally, Section 5 concludes with the paper's findings and discussion.

2. Related Works
To allow decision systems to react efficiently to the real world's requirements, real-world data have to feed and update them correctly and in a timely manner. A data warehouse, which is a component of decision systems, needs to be refreshed with the operational data changes to include the most recent operational records. This mechanism is commonly called data integration (DI). The conventional approach to DI highlights two types of systems: source systems and the target system. Under this approach, all data are first extracted from the source systems and then integrated into the target system through an incremental strategy [5]. For any kind of data warehousing (traditional, near real-time or real-time), the goal of DI is to gather data from different operational data sources that are likely heterogeneous [6], in order to combine them and then present them in such a way that they appear to be a unified whole of all the available information [6]. In business terms, data integration is what allows the 360-degree view of the business [6]. Other meanings of DI include the ability to access, parse, normalize, standardize, integrate, cleanse, extract, match, classify, mask, and deliver data [7]. DI is important and crucial at the same time. It is important since it has been strongly demanded by organizations since 2008 [2], which have started to use different variants of DI. Furthermore, it constitutes 80% of the work in data warehousing and, when it is done correctly, it assures that data comply with policies for data access, use, security, privacy, and data standards [4]. It becomes crucial when DI delivers inconsistent, incomplete and inaccurate data to the business. In that case, serious problems may occur, especially those related to data quality [7]. Consequently, these issues involve extra delays and poor customer service, and the most critical drawback is jeopardizing decision-making processes, which may compromise the organization's future as well as the customers' loyalty and trust [7]. Thus, many works [1, 7] have presented mechanisms to accelerate data integration in order to save time and make data available for decision systems in real-time or at least near real-time fashion. The authors of [1] present a model of refreshing the DW with the necessary changes in a near real-time manner, while managing three indicators that help in deciding when and what to update. In [7], the author describes a real-time loading methodology that duplicates the schema of the DW to store the temporarily updated information without defining any type of index, primary key, or constraints. This approach may enable the DW to obtain continuous data integration with minimum On-Line Analytical Processing (OLAP) response time; however, data integrity becomes a serious issue that cannot be neglected, as we mentioned in our previous work [3].
3. Data Integration by Division and Join Adaptation Mechanism
To illustrate our methodology, we consider N operational data sources (ODS 1, ODS 2, ..., ODS N) whose data should be integrated into a fact table in the DW, in real time. In other words,
data integration should be performed, in real time, as the operational data changes occur. We assume that a map's job that performs transformations, joining and checking ensures the minimum DI routines. Figure 3 schematizes our simplified configuration.

Figure 3. A Simplified Schema of Data Integration

In the above schema (Figure 3), data flow into a map's job where they undergo transformation, cleansing and checking, and may need some table joins. Afterwards, they are integrated into the target fact table, where they become available for decisional reports and analytical tools.

Assumptions. Given:
- N operational data sources ODS 1, ODS 2, ..., ODS N, with N ≥ 1;
- the operational Tables T i,j, where T i,j is the j-th Table of the i-th operational data source, with i, j ≥ 1;
- at the instant t when a data change is observed, we denote by DC i,j,k,l (t) the data changes that occur in T i,j and in T k,l, which are joined (T i,j ⋈ T k,l), and by S i,j (t) and S k,l (t) their respective sizes.

The key idea of our model starts from the values of the change sizes S i,j (t) and S k,l (t), which drive its behavior. Our methodology is based on three (3) main axes:
- instantaneous data change measurement;
- integration adaptation;
- OLAP query adaptation.

We appeal to the divide-and-conquer principle, in a restricted way: we assume that dividing indefinitely may threaten the system's performance, which is the opposite of our model's goal.

3.1 Instantaneous Data Change Measurement
When a change occurs in the operational Tables, our model launches a trigger that records the size of the data changes. At a specific instant t, for each data change that is recorded into a separate Table, we have created the following function, which instantaneously records the size of the change.
Algorithm n 1: Data Change Measurement

FUNCTION getsize (t_table_name VARCHAR2) RETURN NUMBER IS
  l_size NUMBER;
BEGIN
  l_size := 0;
  -- xxx denotes the segment dictionary view (with segment_name and bytes columns)
  -- queried to obtain the Table's size in MB
  SELECT SUM(bytes) / (1024 * 1024)
    INTO l_size
    FROM xxx
   WHERE segment_name = t_table_name;
  RETURN l_size;
END getsize;

According to the recorded value, one of the two modes of the map's job is launched. One way of dividing the data changes would be to measure the change in each Table that has received a new change and to divide all of the new changes in all Tables. This solution implies, each time, the creation of many Tables, each of which receives one division of the data change. Unfortunately, experimenting with this solution showed its complexity, while it does not bring a remarkable improvement of the data integration time. Thus, we have based our model on the method shown in Figure 4.

Figure 4. DJ-DI Model Switches between the Original Configuration and the Adapted One

The method consists of the following points:
- We assume that new changes may concern many Tables of the ODS, at least two Tables (t 1 and t 2). We concentrate our interest, and so the model's effort, on the Table that has received the smaller change volume. To simplify the explanation of our methodology, we call this Table t 2 and the other one t 1.
- Thus, we divide the data change volume of t 2 into two portions, as follows:
Given i an integer, we insert the i-th division of the changes (the one that occurred in Table t 2) into t 2i. The data changes of the other Tables do not receive any division or partitioning.
- We create two Tables t 21 and t 22 that have the same structure as t 2 and the same integrity constraints, and we insert each data change division into t 2i, where 1 ≤ i ≤ 2.
- We rename t 2 to another name, and we rename t 2i to t 2. In this way, at the instant t when a data change is captured, t 2 contains just one division of the data change.
- Then, the model launches the map's job on t 2, which has joins with the other Table t 1.
- Thus, only one portion of the data change is integrated each time. When the map's job is over, t 2 is renamed back to t 2i and t 2(i+1) is renamed to t 2.
- The other portions are integrated thereafter in the same way.
- At the end, the original Table gets its name back and the t 2i are emptied.

The following simplified algorithm shows the main routines of the DJ-DI methodology.

Algorithm n 2: DJDI

PROCEDURE DJDI IS
  h NUMBER;
  i NUMBER;
BEGIN
  h := getsize(data_change());
  IF h <= t_h THEN
    execute map's job();
  ELSE
    divide data_change(t_2) into t_21 and t_22;
    i := 1;
    WHILE i <> 3 LOOP
      EXECUTE IMMEDIATE 'alter table t_2' || i || ' rename to t_2';
      COMMIT;
      execute map's job();
      EXECUTE IMMEDIATE 'alter table t_2 rename to t_2' || i;
      COMMIT;
      i := i + 1;
    END LOOP;
  END IF;
  COMMIT;
END;
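Section 3.1 mentions such a change-capture trigger without listing it; the following minimal PL/SQL sketch illustrates one possible form, where the log table dc_log, the trigger name and the monitored Table t_2 are illustrative assumptions and getsize is the function of Algorithm n 1.

CREATE TABLE dc_log (
  table_name  VARCHAR2(30),
  change_time DATE,
  size_mb     NUMBER
);

CREATE OR REPLACE TRIGGER trg_t2_change
AFTER INSERT OR UPDATE OR DELETE ON t_2
BEGIN
  -- record the instantaneous size (in MB) of the changed Table;
  -- segment names are stored in upper case in the Oracle dictionary
  INSERT INTO dc_log (table_name, change_time, size_mb)
  VALUES ('T_2', SYSDATE, getsize('T_2'));
END;
/

The DJDI procedure (Algorithm n 2) can then read the latest recorded size_mb value and compare it to the threshold t_h in order to choose which mode of the map's job to launch.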
3.2 Integration Adaptation
Relying on the divide-and-conquer principle, our methodology proposes performing data integration while respecting the existing organization's rules. Actually, the principle that governs our methodology in the integration phase rests on two pillars:
- Pillar n 1: it is faster to insert new rows into a Table that has little or no content than into a large Table [8].
- Pillar n 2: it is faster to integrate small amounts of data than to integrate one big amount [8].

Compared to the work presented in [8], our methodology proposes to preserve data integrity by maintaining the same existing integrity constraints. For this axis, our model appeals to renaming the original fact Table and creating a duplicate of it that bears the original name; this duplicated fact Table receives the newest data. We avoid creating and dropping Tables each time in order not to disturb the system's performance. The same key idea governs this axis and consists of switching automatically between direct integration, when the data changes are small, and fact Table renaming, when the data changes are important. We have defined a threshold t h that determines whether the data changes are important or not. Thus, our model integrates the new changes in one of the following ways:

Way 1: when the data changes are not important. If the data change is less than the threshold value t h, it is integrated normally according to the original configuration.

Way 2: when the data changes are important, i.e., greater than t h, the model activates a mechanism that renames the original fact Table, duplicates it and gives the duplicate the real name of the fact Table. In this case, the map's job is executed, but towards an empty fact Table, which means faster data insertion (according to the aforementioned pillars). The following schema shows the change we have brought to the original configuration in case of important data changes.

Figure 5. Fact Table Renaming and Data Change Division

In this latter case, OLAP query adaptation is needed. The following algorithm shows the main actions of this step.
Algorithm n 3: Fact_Table_Duplication

PROCEDURE FctDuplication IS
  h NUMBER;
BEGIN
  h := getsize(data_change());
  IF h <= t_h THEN
    execute map's job();
  ELSE
    rename Fct 0 to a temporary name and rename Fct 1 to Fct 0;
    execute map's job();
  END IF;
  COMMIT;
END;

3.3 OLAP Query Adaptation
We resort to OLAP query adaptation when the data changes are important. Under these circumstances, our model duplicates the fact Table. Thus, data are inserted into the fact Tables (the primary fact Table and the secondary fact Table) and they can be made available to OLAP systems by adjusting the queries as follows:

Select (field 1, ..., field n) from Fct0 where condition1, condition2

becomes

Select (field 1, ..., field n) from Fct0 where condition1, condition2
join
Select (field 1, ..., field n) from Fct1 where condition1, condition2

Consequently, the new data changes are available on the DW side more quickly and can be retrieved by querying two Tables instead of one. The original Table contains the historical data while the second one contains the most recent data changes.
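Concretely, assuming (as stated above) that Fct0 and Fct1 share the same structure, one possible way to combine the two result sets in standard SQL is a UNION ALL; the field and condition names below are the placeholders used in the queries above.

SELECT field1, field2, fieldn
  FROM Fct0                    -- historical data
 WHERE condition1 AND condition2
UNION ALL
SELECT field1, field2, fieldn
  FROM Fct1                    -- most recent data changes
 WHERE condition1 AND condition2;

A view defined over this UNION ALL and bearing the original fact Table's name would let existing reports keep their queries unchanged; this is a possible refinement rather than a requirement of the model.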
4. Performance Evaluation of the DJ-DI Model
In this section, we report the performance evaluation of our method. The algorithms are implemented in PL/SQL. All the experiments were conducted on an HP PC with an Intel i CPU at 2.50 GHz and 2.89 GB of memory, running Oracle 11g [9] under Windows 7. We implement the following map's job under Oracle Warehouse Builder (OWB) 11g [9]:

Figure 6. The Initial Data Integration Schema

The ROLAP approach is used to store data within the DW (the fact Table) because it relies on conventional SQL techniques. We generate datasets automatically and run our model multiple times while varying the number of datasets and thus the size of the data changes. We have run the new model for many months. We presume that the data changes concern many fields of two Tables (Table pers and Table sit). When the data change size is important, i.e., greater than t h, we execute the map's job twice, as previously explained in Section 3. We have run three (3) scenarios that we compare thereafter:
- Normal mapping (1);
- Fact Table duplication only (2);
- Data change division only (3).

In order to follow the performance of the model, we have defined two metrics:
- the elapsed time for each scenario, expressed in seconds (s);
- the rate of data integration, expressed in megabytes per second (MB/s).

Furthermore, we have also defined an indicator to compare the performance between the scenarios. This indicator is expressed as follows: Improvement (1)/(3) = 1 - Intgr_rt(1) / Intgr_rt(3), where Intgr_rt(i) denotes the integration rate of scenario i.

Table 1 shows the percentage of improvement of the integration rate between scenarios (1) and (3). We observe that the integration rate achieved with the newly adapted configuration (change division) is significantly greater than the rate obtained under the normal configuration. This difference becomes more important when the size of the data changes increases, and therefore the time saved by the new configuration is evident. We deduce from Table 1 that when the data change size reaches 64 MB, it becomes useful to launch scenario n 3. Thus, we set the threshold t h of our model to 64 MB.
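For illustration, applying this indicator to the 64 MB row reported in Table 1 below gives Improvement (1)/(3) = 1 - 14,49 / 27,05 ≈ 0,46, i.e., an improvement of about 46%.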
According to the value of t h, the system behaves differently, as explained in the previous section. For important data changes, we execute the map twice on small quantities of data instead of executing it once on a big quantity of data (we split the changes into two pieces). To test our model, we conduct experiments according to two situations:
- Situation n 1: inserting data into an empty fact Table;
- Situation n 2: inserting data into a non-empty fact Table.

Table 1. Integration Rate Comparison (Scenarios 1 and 3), with t h = 64 MB

Change size (MB)   Direct mapping (MB/s)   Change division (MB/s)   Improvement (%)
16                 16,00                   16,00                     0%
40                 31,11                   40,00                    22%
64                 14,49                   27,05                    46%
72                 18,00                   36,00                    50%
80                 20,67                   30,00                    31%
...                ...,68                  30,22                    61%
...                ...,20                  58,67                    67%
...                ...,35                  41,60                    58%
...                ...,04                  43,91                    59%
...                ...,22                  30,55                    37%
...                ...,06                  34,30                    44%
...                ...,86                  28,80                    31%
...                ...,23                  33,53                    37%
...                ...,05                  29,15                    31%
...                ...,31                  29,41                    24%
...                ...,28                  20,82                    22%

Situation n 1: inserting data into an empty fact Table
We observe the speed of execution of the map's job and the rate of data integration while running the three scenarios:
- Normal mapping (1);
- Fact Table duplication only (2);
- Data change division only (3).
We handle 500 MB of operational data changes, which generate about 1600 MB in the target Tables (the data warehouse).

Figure 7. Elapsed Times (in seconds) when Integrating 500 MB of Data Changes

As shown in the abovementioned chart, our model improves the data integration time by 12.5% (between scenarios (1) and (3)). Indeed, the same data changes need 102 s to be integrated into the DW using the normal configuration (1), while they need just 87 s using our model (3). Scenario (2), which consists of duplicating the fact Table, does not bring a real improvement since the original fact Table is already empty. Another representation of the results is shown in Figure 8, which displays a comparison between the rates of data integration obtained with the three scenarios, each of them being run four times:

Figure 8. Rates of Data Integration when Integrating 500 MB of Data Changes - Empty Fact Table
When doubling the volume of the data changes, i.e., by handling 1 GB of data changes that generate around 6.3 GB of data in the data warehouse, we obtain the following chart:

Figure 9. Elapsed Times (in seconds) when Integrating 1 GB of Data Changes

Situation n 2: inserting data into a non-empty fact Table
We consider that the fact Table contains 1600 MB of data. We run each of the abovementioned scenarios four times. In addition to the three abovementioned scenarios, we have added a fourth scenario (4), which is a combination of scenarios (2) and (3); we call it the mixed scenario. This scenario summarizes our full model. We show the results of comparing the rates of data integration in Figure 10. Clearly, we observe that our model, based on data change division and join adaptation, allows faster data integration. It offers up to 67% improvement compared to the normal configuration. Furthermore, when combining data change division with the fact Table duplication option, we observe that, together, they do not lead to much better results than the data change division scenario alone. The scenario of dividing the data changes remains the fastest.

Figure 10. Rates of Data Integration when Integrating 1600 MB of Data Changes - Non-empty Fact Table
5. Conclusion and Future Work
In this paper, we have presented the DJ-DI model for data integration, based on division and join adaptation. This model, built on dividing the volume of the data changes and on fact Table duplication, shows a real improvement of the data integration rate compared to the normal configuration. The model behaves according to the size of the data changes: the map's job is launched once or multiple times in order to integrate data into the destination in a shorter time while preserving data integrity. We consider that our methodology makes data available in the DW in a short time while respecting the existing organization's rules. It respects all the steps of data warehousing and especially their sequencing. Future work may include experimenting with this model while managing the different levels of data categories that we presented in our previous work [3].

References
[1] L. Chen, J. W. Rahayu and D. Taniar, "Towards Near Real-Time Data Warehousing", Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications, (2010).
[2] Oracle White Paper, "Real-Time Data Integration for Data Warehousing and Operational Business Intelligence", (2010).
[3] I. Lebdaoui, G. Orhanou and S. El Hajji, "Data Integrity in Real-time Data Warehousing", Proceedings of the World Congress on Engineering (WCE 2013), vol. III, (2013).
[4] P. Russom, "Data Integration for Real-Time Data Warehousing and Data Virtualization", TDWI Checklist Report, (2010).
[5] Oracle White Paper, "Best Practices for Real-Time Data Warehousing", (March 2014).
[6] R. Kimball and J. Caserta, "The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data", Wiley Publishing, Canada, (2004).
[7] A. Reeve, "Managing Data in Motion: Data Integration Best Practice Techniques and Technologies", Kindle Edition, (2013).
[8] R. J. Santos and J. Bernardino, "Real-Time Data Warehouse Loading Methodology", Proceedings of the International Symposium on Database Engineering & Applications (IDEAS '08), (2008).
[9] Oracle Corporation, (2014).

Authors

Imane Lebdaoui received in 2005 the State Engineer Diploma in Information Systems from the Hassania School of Public Works (EHTP, Morocco). She is a PhD student in the Laboratory of Mathematics, Computing and Applications, Department of Mathematical and Computer Sciences, Faculty of Sciences, University of Mohammed V-Rabat, Morocco. Her research interests include database management, database security, data warehouses, real-time and big data.

Ghizlane Orhanou is an Associate Professor in the Computing Sciences Department at the University Mohammed V Agdal, Morocco. She received her Ph.D. degree in computer sciences from the University Mohammed V Agdal, Morocco, and, in 2001, a Telecommunication Engineer diploma from the Telecommunication Engineering Institute (INPT, Morocco). Her main research interests include networked and information systems security.

Said El Hajji has been a Professor in the Mathematics Department at the University Mohammed V Agdal, Morocco, since 1991. He is responsible for the Mathematics, Computing and Applications Laboratory. He received his PhD degree from Laval University, Canada. His main research interests include modeling and numerical simulations, and security in networked and information systems.