REAL TIME ON-LINE ANALYTICAL PROCESSING FOR BUSINESS INTELLIGENCE



Similar documents
A collaborative approach of Business Intelligence systems

Data Integration Models for Operational Data Warehousing

ENABLING OPERATIONAL BI

Original Research Articles

Innovative Analysis of a CRM Database using Online Analytical Processing (OLAP) Technique in Value Chain Management Approach

Real time Business Intelligence

MDM and Data Warehousing Complement Each Other

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

BENEFITS OF AUTOMATING DATA WAREHOUSING

Oracle Data Integrator 12c (ODI12c) - Powering Big Data and Real-Time Business Analytics. An Oracle White Paper October 2013

DATA MINING AND WAREHOUSING CONCEPTS

LUCRĂRI ŞTIINŢIFICE, SERIA I, VOL. XII (2) BUSINESS INTELLIGENCE TOOLS AND THE CONCEPTUAL ARCHITECTURE

Data warehouse and Business Intelligence Collateral

Data Mining Solutions for the Business Environment

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

Data Warehousing Systems: Foundations and Architectures

The IBM Cognos Platform

The Role of the BI Competency Center in Maximizing Organizational Performance

Real-Time Business Intelligence for the Utilities Industry

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

SQL Server 2012 Business Intelligence Boot Camp

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

STRATEGIC AND FINANCIAL PERFORMANCE USING BUSINESS INTELLIGENCE SOLUTIONS

Hybrid Support Systems: a Business Intelligence Approach

LEARNING SOLUTIONS website milner.com/learning phone

An Oracle White Paper March Best Practices for Real-Time Data Warehousing

By Makesh Kannaiyan 8/27/2011 1

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Business Intelligence Systems

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Data Warehousing and Data Mining in Business Applications

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

SQL Server 2012 End-to-End Business Intelligence Workshop

Microsoft Data Warehouse in Depth

BUILDING OLAP TOOLS OVER LARGE DATABASES

The difference between. BI and CPM. A white paper prepared by Prophix Software

Master Data Management. Zahra Mansoori

Establish and maintain Center of Excellence (CoE) around Data Architecture

Data Warehousing and Data Mining

SAP BusinessObjects SOLUTIONS FOR ORACLE ENVIRONMENTS

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Course Outline. Module 1: Introduction to Data Warehousing

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

ORACLE BUSINESS INTELLIGENCE SUITE ENTERPRISE EDITION PLUS

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Structure of the presentation

How to Enhance Traditional BI Architecture to Leverage Big Data

Dimensional Modeling for Data Warehouse

Data Warehousing Concepts

Data Warehousing and OLAP Technology for Knowledge Discovery

A Knowledge Management Framework Using Business Intelligence Solutions

Integrazione Dati: Un fattore chiave per l innovazione delle imprese.

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

BENEFITS AND ADVANTAGES OF BUSINESS INTELLIGENCE IN CORPORATE MANAGEMENT

Five Technology Trends for Improved Business Intelligence Performance

In-memory databases and innovations in Business Intelligence

SQL Server Introduction to SQL Server SQL Server 2005 basic tools. SQL Server Configuration Manager. SQL Server services management

Implementing a Data Warehouse with Microsoft SQL Server

Integrating SAP and non-sap data for comprehensive Business Intelligence

Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage

When to consider OLAP?

Data Warehouse: Introduction

Budgeting and Planning with Microsoft Excel and Oracle OLAP

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Winning with an Intuitive Business Intelligence Solution for Midsize Companies

Master Data Management and Data Warehousing. Zahra Mansoori

Business Intelligence, Analytics & Reporting: Glossary of Terms

DATA QUALITY IN BUSINESS INTELLIGENCE APPLICATIONS

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Design of Electricity & Energy Review Dashboard Using Business Intelligence and Data Warehouse

CDCR EA Data Warehouse / Strategy Overview. February 12, 2010

Advantages of Implementing a Data Warehouse During an ERP Upgrade

Developing Business Intelligence and Data Visualization Applications with Web Maps

Understanding Data Warehousing. [by Alex Kriegel]

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Data Warehouse Overview. Srini Rengarajan

For Sales Kathy Hall

Oracle Business Intelligence 11g Business Dashboard Management

Business Intelligence for Everyone

The Evolution of ETL

W H I T E P A P E R B u s i n e s s I n t e l l i g e n c e S o lutions from the Microsoft and Teradata Partnership

Transcription:

U.P.B. Sci. Bull., Series C, Vol. 71, Iss. 3, 2009 ISSN 1454-234x REAL TIME ON-LINE ANALYTICAL PROCESSING FOR BUSINESS INTELLIGENCE Guran MARIUS 1, Mehanna AREF 2, Hussein BILAL 3 Cerinţele în conducerea afacerilor presupun cunoaşterea desfăşurării activităţilor şi proceselor la scara de timp a desfăşurării lor, pentru a determina şi influenţa ceea ce urmează să se întâmple în firma/organizaţie. Obiectivul în sistemele de afaceri inteligente (BI) este legat de acumularea datelor, informaţiilor şi cunoaşterii într-un timp cât mai scurt pentru a răspunde cerinţelor decizionale în desfăşurarea activităţilor şi proceselor. Satifacerea unui asemenea obiectiv nu poate evita utilizarea prelucrarea on-line şi în timp real a datelor (RT- OLAP). Utilizarea RT-OLAP în BI presupune depăşirea dimensiunii de timp, prin faptul că o întreprindere competitivă este presată de urgenţe care nu pot fi satisfăcute fără utilizarea OLAP în condiţii de timp real. The business needs to know what is happening right now, faster, in order to determine and influence what should happen next time. The goal of business intelligence (BI) systems is to capture (data, information, knowledge) and respond to business events and needs better, more informed, and faster, as decisions. To support those decisions faster, any discussion must be in the context of Real Time On-Line Analytical Processing (RT-OLAP). RT-OLAP for BI implies much more than time; it conveys a rich sense of urgency for competitive enterprises, because a real-time enterprise without RT- OLAP is a real dumb company, moving not really fast. Keywords: business intelligence, data warehouse, on-line analytical processing, real-time, business value, time latency, time value 1. Introduction Traditionally, most data warehouses (DW) use a staging approach to loading data into the warehouse fact and dimension tables. Data that needs to be loaded into the warehouse is first extracted from source tables, files, and transactional systems, and is then loaded in batches into interface tables within the staging area. Then, the data within these interface tables is transformed, cleansed, and checked for errors, often with temporary staging tables being created on the way to hold the versions of the source data as it is being processed. This cleansed 1 Professor, University POLITEHNICA of Bucharest, ROMANIA, mguran@mix.mmi.pub.ro 2 PhD student, University POLITEHNICA of Bucharest, ROMANIA 3 PhD student, Université de Valenciennes-LAMIH, FRANCE

80 Guran Marius, Mehanna Aref, Hussein Bilal and transformed data is then loaded into the presentation area (Fig. 1.1) of the DW. Once this loading has taken place, aggregates and summaries are either recreated or refreshed, and in some cases, separate OLAP databases must then be updated with data from the DW. Fig. 1.1 Loading DW for OLAP use The process of loading DW, usually referred to as the Extraction, Transformation, and Load (ETL) process, can take anywhere from a few hours to several days to complete, and can require many gigabytes of storage to hold the many versions of staging and interface tables. Because of the time taken by this traditional form of ETL, it s usually the case that the data in warehouses and data marts is at least a day or two out of date; in fact, usually it is between a week and a month behind their source systems. This latency as the situation is known, has in the past however not usually been seen as too much of an issue, as DWs were not considered operational, business-critical applications, but rather more as a way for a small group of headoffice analysts to do trend analysis. 2. Lifecycle data latencies Many times a combination of up-to-date and historical information is necessary. For example, while it may be useful for the store manager to know the number of sales of a particular item so far this morning, it is more useful if he can put that information in the context of the average number sales of that same item on the same day of the week over the last year. Webster dictionary defines "Real-time" as the simultaneous recording of an event with the actual occurrence of that event. In engineering disciplines, the term is extended to describe a system that records and responds to an event within milliseconds. To clarify the life cycle data latencies, [1, 7] we start with a business event taking place and data acquisition technologies deliver the event record to the DW. Analytical processing helps turn the data into information, and a business decision leads to a corresponding action. To approach real time, the duration between the event and its consequent action needs to be minimized. Typically, it is the data acquisition process that introduces majority of the latency (Fig. 2.1).

Real time on-line analytical processing for business intelligence 81 3. Time and business value Fig. 2.1 Life cycle data latencies Because RT-OLAP in BI must be understood in terms of real value of the business, we consider at an initial time, a business event requiring a response, occurs within some business process. At a later time [2], action is taken to respond to that event (Fig. 3.1). For example, customer requests information on a specific product, and then the company provides the information within a few seconds, minutes, hours, days, weeks or months. Fig. 3.1 Action time delay The assumption behind this decay curve is that the longer the delay of the response, the less business value accrues to the company. In this example, the value is the probability that the customer will purchase the product.

82 Guran Marius, Mehanna Aref, Hussein Bilal The action time or action distance is the duration between the event and the action, while the net value is the business value lost or gained over this duration. In the spirit of microeconomic analysis, a second curve can be charted showing the cost-time relationships, (Fig. 3.2). This curve represents the cost required to design, build and operate the system that responds to the event. At some point, the two curves may intersect, implying that the system cost equals the business value. To the right of this point, the company benefits because value exceeds cost; to the left, the company loses because cost exceeds value. The good news is that the intersection point is moving steadily to the left, allowing companies to create systems that increasingly benefit the business. Fig. 3.2 Cost-time relationships Fig. 3.3 Action time three components Within the context of DW, there are three components to this action time, (Fig. 3.3). The data latency is the time required to capture, transform/cleanse and store this data in the DW, ready for analysis. The analysis latency is the time required to analyze and disseminate the results to the appropriate persons. The decision latency is the time required for a person to understand the situation,

Real time on-line analytical processing for business intelligence 83 decide on a course of action and initiate it. For automated procedures, a person may not be directly involved; hence, decision latency would be very small. Figs. 3.1 through 3.3 assume that value decays rapidly after the business event. However, this may not be true in real business situations. Let's consider a few counter cases. First, Fig. 3.4 illustrates a situation in which value stays constant and then rapidly drops. For example, a customer orders a nice meal at a fine restaurant and has different expectations of service than at a fast-food restaurant. In fact, if the food is delivered too quickly, the customer may question the quality. Fig. 3.4 Rapid drops value Second, Fig. 3.5 illustrates a situation in which value increases with time rather than decreases. The value "matures" through time. For example, a farmer does not harvest the crop a day after planting the seeds. The crop requires time to mature. Fig. 3.5 Value matures through time Finally, Fig. 3.6 illustrates a situation that combines the previous two, a slow increase followed by a sudden decline. This curve is often associated with perishable goods or services. For example, bananas sell slowly when green, but sell more rapidly as they turn yellow and then get discarded as soon as the brown spots appear.

84 Guran Marius, Mehanna Aref, Hussein Bilal Fig. 3.6 Value declines after a fast increasing 4. Barriers against RT-OLAP in BI Many challenges oppose obtaining RT-OLAP data in BI (Fig. 4.1), some of which are not addressable by technology. Barriers [3] [12] include the following: 1. It is necessary to provide an historical view. Often, the original source systems do not provide historical data at all; therefore, it is necessary to maintain a separate copy of the data, with full history, in a DW. 2. It is necessary to coordinate with business processes and across systems. It might be meaningless to perform a particular analysis until certain business processes are complete. For example, in our simple retail chain example, store-to-store comparisons might not be meaningful until all stores, in all time zones, have closed. Similarly, the mechanisms by which some legacy systems are updated might set a lower limit on the latency that is achievable. 3. Frequently the data must go through a transformation process to enforce data quality. Depending on the complexity of those transformations and the volume of data, it might simply not be cost effective to reduce the load time below a certain level. 4. It is often necessary to integrate data from multiple data sources, which, even in the absence of significant transformations, might commonly be handled by transformation processes, thereby creating a separate store. 5. The goal of providing satisfactory user performance is often at odds with true Real-time results. Reporting directly from the

Real time on-line analytical processing for business intelligence 85 source system certainly provides Real-time information, but generally the common aggregate-level queries do not perform well against the systems that are orientated for frequent update. Nor could we often tolerate such a query load being placed on our transactional systems. Thus, often a separate store must be maintained, with special attention paid to providing pre computed aggregates. Maintaining those aggregates on each update to the source data takes time. 6. Continually reporting on data from a system that is constantly being updated requires that we consider the issue of data consistency. 7. Even if the data in a report is sufficiently up-to-date, how do we ensure that the report is immediately delivered to the necessary decision makers when they need it? Fig. 4.1 Time latency barriers against OLAP users 5. RT-OLAP approach If you want to achieve RT-OLAP updates to your DW, you ll need to do three things [4]: 1. Reduce or eliminate the time taken to get new and changed data out of your source systems. 2. Eliminate, or reduce as much as possible, the time required to cleanse, transform and load your data. 3. Reduce as much as possible the time required to update your aggregates.

86 Guran Marius, Mehanna Aref, Hussein Bilal Moving from nightly batch loads of data to a real-time data approach requires not just business justification, but technology solution availability and effectiveness. Fig. 5.1 Simplified RT-OLAP architectures for BI These technologies can be used in isolation or in tandem. In many cases they will make it easier to provide RT-OLAP for BI. An example of a simplified architecture (Fig. 5.1), where the Analysis cube is built directly over the source databases (OLTP). Build the cube over a database that is a replicated copy of the actual source OLTP database. The cost of maintaining such a replica is outweighed by the advantage of relieving the transaction system from extra overhead. The essential characteristic of RT-OLAP system is that holds all the data in RAM. Principal benefits of RT-OLAP applications [1] are showing (Table 5.2). Benefits using new DW workload for RT-OLAP Table 5.2 OLD DW Workload for OLAP Batch loading 100s of standard reports Limited numbers of ad-hoc users NEW DW Workload for RT-OLAP Continuous loading 1000s to 10000s of standard reports 1000s of ad-hoc users pervasive BI Warehouse interacting with all systems Embedded BI and analytics in OLTP applications We can summarize the following advantages [5]:

Real time on-line analytical processing for business intelligence 87 1. The size of a cube in a RT-OLAP system is smaller than that of an OLAP product. 2. RT-OLAP perform calculation just in time by only calculating values when they are needed space can be saved. 3. With RT-OLAP when a change is made, everyone sees the results. And we can count two disadvantages [5]: 1. Because RT-OLAP stores the entire cube in RAM, it doesn t scale to the data volume larger than the RAM size. 2. Performance of queries can be slower since the values need to be calculated on the fly instead of being accessed from the precalculated storage. 6. Conclusion The time-value of data started to gain importance; rather than monthly and weekly updates, data warehouses started receiving daily and hourly updates. In addition to custom scripts and ETL, technologies such as EAI (Enterprise Application Integration) started to augment data integration processes to ensure timely acquisition of data. Data quality, aggregation, profiling, and metadata management became integral to data warehousing. BI technologies entered the mainstream and business users became increasingly proactive with the aid of their budding enterprise analytics frameworks. In final, the DW goes active. As RT-OLAP data feeds the DW and matches pre-defined business patterns, business actions are automatically triggered. The active DW auto-initiates actions to systems based on rules and context to support business processes R E F E R E N C E S [1] GoldenGate, Transactional Data Management Platform, Technology and Solution Overview, http://www.goldengate.com/technology/tdm.html [2] Real-Time to Real-Value, the BI Watch, Richard Hackathorn, DM Review Magazine, January 2004 [3] SQL Server 2005, Real-time Business Intelligence using Analysis Services, April 1, 2005, By Paul Sanders [4] Real-time data warehousing, By Philip Howard, Bloor Research [5] Wikipedia / the free encyclopedia / http://en.wiki.pedia.org / [6] HP Business Intelligence Solutions, HP Reference Configurations for Oracle 10g Data Warehousing, Release 2 [7] Data Warehouse, Lifecycle Management Concepts and Principles, Cliff Longman, CTO, Kalido, April 2004 [8] SQL Server 2005 / Microsoft / [9] Oracle 10g Data Warehousing Original, OLAP Option Benefits, ORACLE Corporation

88 Guran Marius, Mehanna Aref, Hussein Bilal [10] Business Intelligence for the Enterprise, By Mike Biere, Publisher: Prentice Hall PTR, Pub Date: June 04, 2003, ISBN: 0-13-141303-1, 240 pages [11] Designing a Datawarehouse: Supporting Customer Relationship Management, By Chris Todman, Publisher: Prentice Hall PTR, First Edition Dec 01, 2000, ISBN: 0-13-089712-4, 360 pages [12] OLAP Solutions, Building Multidimensional Information Systems, Second Edition, Wiley Computer Publishing, John Wiley & Sons, Inc. [13] High-Dimensional OLAP: A Minimal Cubing Approache, Xiaolei Li, Jiawei Han, Hector Gonzalez, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA [14] OLAP in distributed DWs, Jens Albrecht, Wolfgang Lehner, University of Erlangen- Nuremberg, Germany, {jalbrecht, lehner}@informatik.uni-erlangen.de