GRADIENT: An EII Solution from Infosys




Keywords: Grid, Enterprise Data Integration, EII

Introduction

New arrays of business applications are emerging that require cross-functional data in near real time. Examples abound: customer support service, risk management, multi-channel integration, loyalty management, regulatory compliance and business performance management, among others. This has created a demand for newer solutions for accessing and integrating data. Moreover, business units need to act swiftly on rapidly changing information in near real time in order to generate value. The need for near real-time response, combined with the sheer number and complexity of the data sources, creates a new set of data access and integration challenges across the enterprise.

Enterprise data assets are typified by different technologies in the underlying data sources and by different ownership issues with line-of-business data. Data within most organizations exists in multiple, often heterogeneous, databases under the administrative authority of different business units. The databases may also be spatially dispersed because of the geographical separation of the business units. There is often a need to integrate such geographically dispersed and heterogeneous databases on a real-time basis for strategic decision making. As already stated, this need is manifested in different applications, such as loyalty management.

In the past, data integration challenges have been addressed by Extract, Transform, Load (ETL) frameworks. However, ETL frameworks have their own limitations, as they fail to address the need for integrating data on a real-time basis. Enterprise Information Integration (EII) has been proposed as a new technological innovation complementary to the existing ETL-based approaches. The prime focus of EII-based solutions lies in their ability to integrate and access data on a real-time basis.

Enterprise Data Integration

Business Requirements

Information within an enterprise is maintained in various formats and is often scattered across different geographical locations. A number of reasons can account for the heterogeneity and geographical dispersion of data within an organization. A post-merger-and-acquisition (M&A) scenario is one example, where the different information systems prevalent within the enterprises need to be integrated. Typically, the information required for reporting and Decision Support Systems (DSS) differs across business units and is under the administrative authority of different business units. This makes consolidated access to the information difficult. Even though most companies resort to data marts and data warehouses, a certain amount of latency is involved in the data movement. Present-day integration technologies do not address the issue of accessing and integrating up-to-date information in real time, because it is not easy to cull this information in real time given the spread and heterogeneity of the data sources to be queried.

Key Challenges

Burgeoning volumes of data, along with the need to take decisions on a real-time basis, make data integration a complex task. The challenge is compounded by the diversity of data sources and by data freshness requirements. Companies are seeking more flexible ways of making integrated data available to various applications. Applications that need to analyze data on a real-time basis must address three key challenges:

1. Information access: The right information needs to be delivered to the right application in a timely and consistent manner, irrespective of its source. Many applications require real-time data retrieval abilities.
2. Data integration: Data, which can belong to different heterogeneous types, needs to be transformed, integrated and aggregated. This requires addressing semantic heterogeneity issues.
3. Data stewardship and provisioning: Data architects and administrators must protect the integrity and security of data sources while making them available for consumption by applications across the enterprise. Added to these concerns is a new host of regulatory compliance requirements that demand more careful auditing of enterprise information usage.

Traditional Integration Approaches

A number of integration methodologies address the need to integrate such multiple heterogeneous databases. Two popular frameworks, Extraction, Transformation and Loading (ETL) and Enterprise Application Integration (EAI), address a plethora of issues associated with such integration.

The ETL framework enables companies to extract data from disparate data sources using standard JDBC/ODBC calls, transform the extracted data using specific business logic, and load the transformed data into target repositories called data warehouses and data marts. The source data often needs reformatting and cleansing. The amount of data movement involved in the ETL framework is immense, and a large amount of latency exists in it. The transformation is often done using a proprietary piece of code. The loaded data is stored in a common repository in relational format and can then be accessed for strategic decision making.

However, the ETL approach inherently suffers from certain limitations:

- A significant amount of time is required to load the central repository, so loading has to be done during specific off-peak business hours.
- Because of the time lag involved in the process, there is a high chance of making decisions using stale information.
- It involves writing proprietary code for transforming disparate information.
- The amount of data movement involved in the ETL framework is immense.

Infosys White Paper
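The extract, transform and load steps described above, and the staleness limitation that follows from them, can be illustrated with a minimal sketch. It assumes two hypothetical in-memory SQLite databases standing in for operational sources (`crm`, `erp`) and a third standing in for the warehouse; the table and column names are invented for illustration and do not come from any particular ETL product.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one operational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def extract(conn):
    """Extract: pull raw rows out of a source."""
    return conn.execute("SELECT id, name, country FROM customers").fetchall()

def transform(rows, source_name):
    """Transform: cleanse and reformat rows into the warehouse layout."""
    cleaned = []
    for cid, name, country in rows:
        if name is None or not name.strip():
            continue  # cleansing: drop rows without a usable name
        cleaned.append(
            (source_name, cid, name.strip().title(), (country or "").strip().upper())
        )
    return cleaned

def load(warehouse, rows):
    """Load: write the transformed rows into the central repository."""
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()

def run_etl(sources, warehouse):
    """One full batch run: rebuild the warehouse table from all sources."""
    warehouse.execute("DELETE FROM dim_customer")
    for name, conn in sources.items():
        load(warehouse, transform(extract(conn), name))

# Hypothetical sources and warehouse (names are assumptions for this sketch).
crm = make_source([(1, "  alice smith ", "in"), (2, "", "us")])
erp = make_source([(7, "BOB JONES", "Us")])
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_customer (source TEXT, source_id INTEGER, name TEXT, country TEXT)"
)

run_etl({"crm": crm, "erp": erp}, warehouse)
loaded = warehouse.execute("SELECT * FROM dim_customer ORDER BY source").fetchall()

# A change made in a source after the batch run is not visible in the
# warehouse until the next run: this is the stale-data limitation.
crm.execute("INSERT INTO customers VALUES (3, 'carol wu', 'sg')")
crm.commit()
stale_count = warehouse.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```

After the batch run the warehouse holds two cleansed rows; the third customer added to the source afterwards stays invisible to warehouse queries until `run_etl` is executed again.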

Because of the latency involved in cleansing, transforming and moving the data in ETL solutions, the databases cannot be integrated on a real-time basis. Enterprise Information Integration (EII) differs from such conventional ETL-based approaches in that it focuses on accessing the data rather than moving it. EII-based solutions provide a powerful way to integrate data on a real-time basis for strategic decision making.

[Figure: A typical ETL framework involving Extract, Transform and Load. Labels: Legacy Systems, Custom applications, ERP/CRM, Extract, Cleanse, Transform, Load, Warehouse, Marts, BI/Reporting tools.]

Enterprise Information Integration

Enterprise Information Integration, often abbreviated as EII, uses the concept of data virtualization to present a consolidated view of the disparate data sources that underlie such tools. Because they offer a virtualized view of heterogeneous data sources, EII tools act essentially as data federators rather than as data marts. The core engine of an EII tool is a query processing and optimization module that draws data from multiple data sources in a manner transparent to the user or client. Virtual data integration using EII tools provides on-demand, real-time data access.

[Figure: A typical EII framework providing a virtualized view of the data sources. Labels: Legacy Systems, Custom applications, ERP/CRM, Federator, Virtualized view, BI/Reporting tools.]

Using EII tools, the client is presented with a virtualized view of the disparate data sources. The client queries this virtualized data source using the standard and popular SQL interface. The federated query system splits the user query into multiple sub-queries, sends each sub-query to the appropriate database, and then federates the results returned from the databases before returning the federated result set to the user. In this way, EII involves less expensive data transformation and focuses more on how to combine the diverse definitions of data elements belonging to different databases into a single information element for query federation and for displaying the results. EII technology, though more recent, draws heavily on the literature on distributed database query processing and optimization, which is quite mature.
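The split, execute and federate cycle described above can be sketched in a few lines. This is an illustrative sketch only, not how any particular EII product works: it assumes two hypothetical in-memory SQLite databases (`crm`, `erp`) and a hand-written mediator function in place of a real query optimizer.

```python
import sqlite3

# Two independent databases standing in for heterogeneous sources
# (names and schemas are assumptions made for this sketch).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.5)]
)

def federated_order_totals(min_total):
    """Answer 'customer name and order total' across both sources.

    The logical query is split into one sub-query per source, and the
    partial results are joined (federated) in the mediator.
    """
    # Sub-query 1: sent to the CRM source only.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    # Sub-query 2: sent to the ERP source only; the aggregation is pushed
    # down so that less data travels back to the mediator.
    totals = erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    # Federation step: join the partial results on the customer key.
    return sorted(
        (names[cid], total)
        for cid, total in totals
        if cid in names and total >= min_total
    )

big_spenders = federated_order_totals(100)
everyone = federated_order_totals(0)

# No data was copied into a central repository, so a new order in the
# source is visible to the very next federated query.
erp.execute("INSERT INTO orders VALUES (2, 50.0)")
after_insert = federated_order_totals(100)
```

In contrast to a batch load, the order inserted at the end is reflected immediately, because each federated query reads the sources directly.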

Strong query optimization and performance are key to the success of EII tools. Information virtualization through EII allows the end user to use existing applications to fetch information from anywhere within the enterprise without worrying about the where and how of the information retrieval process.

[Figure: Query federation using Enterprise Information Integration. Labels: Client, BI/Reporting tools, Virtualized view, Federated Query System, Legacy System, Custom applications, ERP/CRM.]

Features and Benefits

- Real or near real-time data integration and delivery across heterogeneous data sources
- Wider, faster end-user access to key business data
- Ability to leverage proprietary information more effectively for competitive advantage
- A single, comprehensive view of the enterprise's information assets
- Integration of enterprise data for administrative cost savings
- Improved developer productivity in key data-intensive application development, reducing time-to-market (studies suggest a 30-40% time improvement in enterprise application development)
- Increased flexibility to respond to environmental changes on demand
- Enhanced business intelligence for greater competitive advantage
- Extended data and application integration for speedier, more cost-effective business processes
- No need to write proprietary code for transformation

Other business benefits include:

- Ability to analyze real-time business data to accelerate business decision making.
- Greater re-use of data: EII infrastructure built for one data integration project can be re-used for other data integration projects involving the same data sources, since data is only federated, not actually moved.
- Reduction of data integration infrastructure costs: There is no need to build custom data warehouses for data integration and analysis tasks.
- Extension of the range of data warehouse querying for better decisions and BI: Data warehouse queries on historic data can be combined with queries on real-time business data to make better decisions.
- Speedup of decision support, portal and application development: The infrastructure is flexible and extensible enough to support a wide variety of application end points.

Data Grids and EII

Grid computing has emerged as a new and powerful paradigm in distributed computing. It provides a stack of software services that interconnects a large number of heterogeneous computing systems distributed across large geographic areas and presents them to the user as a uniform, virtualized single source of computing. A data grid seeks to present a set of virtualized data services that remove the complexity of data federation and reduce data-related latencies. Data grids, also called information grids or storage grids, are concerned with the following aspects:

- Scalability
- Reliability
- Availability
- Interoperability

Grids seek to harness idle computational nodes for computational load sharing, which allows the computational load to be shared across different machines. Since EII solutions deal with huge volumes of data during data integration, it is useful to bring in data grids for queries with high computational requirements in order to achieve superior information integration benefits.

GRADIENT: An Infosys EII Solution

GRADIENT (GRid Access of Distributed Information in the ENTerprise) is a grid-based Enterprise Information Integration tool from Infosys for accessing distributed information across an enterprise. GRADIENT is a service-oriented data grid solution that overcomes the limitations of ETL- and EAI-based data integration technologies and enables real-time data integration using data virtualization. GRADIENT allows the end user to seamlessly query disparate information sources using a standard SQL/OQL query facility. GRADIENT achieves greater scalability and performance using a mediator-based Distributed Query Processing engine.

GRADIENT is developed using robust open source technologies such as OGSA-DAI (Open Grid Services Architecture Data Access and Integration). OGSA-DAI exposes various databases, XML databases and indexed flat files as Web services and allows querying relations stored across these diverse data sources. GRADIENT is built on top of OGSA-DAI as a mediator and extends OGSA-DAI to support more features than OGSA-DAI currently provides.

GRADIENT provides a virtualized view of enterprise data by using grid services. This allows GRADIENT to expose geographically dispersed and heterogeneous data sources in a way that gives the end user a virtualized view of them. End users can issue a single query involving join operations that span multiple data sources, without knowing the location of the underlying data sources or their formats.

[Figure: GRADIENT architecture. Labels: Presentation Layer, Access, Gradient Service, Metadata Manager, Cache Manager, Security, Integration, Query Optimization Engine, Query Execution Engine, DQP, Registry, Provisioning, Grid Service, OGSA-DAI, Data Sources.]
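To make the idea of location transparency concrete, the following sketch shows how a mediator might use a metadata registry to decide which source each requested column lives in and to build one sub-query per source. The registry contents, source names and the `plan_subqueries` function are hypothetical and are not taken from GRADIENT or OGSA-DAI.

```python
from collections import defaultdict

# Hypothetical metadata registry: logical column -> (data source, table).
# In a real mediator this would be populated from the sources' schemas.
METADATA = {
    "customer_id": ("crm_oracle", "customers"),
    "customer_name": ("crm_oracle", "customers"),
    "order_amount": ("erp_db2", "orders"),
    "ticket_status": ("support_xml", "tickets"),
}

def plan_subqueries(columns):
    """Group the requested columns by source and emit one sub-query each.

    The caller names only logical columns; the registry supplies the
    location, so the caller never needs to know where the data lives.
    """
    by_location = defaultdict(list)
    for col in columns:
        if col not in METADATA:
            raise KeyError(f"unknown column: {col}")
        by_location[METADATA[col]].append(col)
    return [
        (source, f"SELECT {', '.join(cols)} FROM {table}")
        for (source, table), cols in by_location.items()
    ]

plan = plan_subqueries(["customer_name", "order_amount", "customer_id"])
for source, sql in plan:
    print(f"{source}: {sql}")
```

A query that mentions three columns is thus resolved into two sub-queries, one per source; joining their results is the mediator's job.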

Components/Benefits

At the core of GRADIENT lie three layers:

1. The Provisioning Layer: provisions the data sources that need to be exposed to GRADIENT as services. This layer hides the complexities involved in posting a query to different types of data sources.
2. The Integration Layer: integrates the metadata of the different data sources, which is used to locate the data. Its basic functionality includes parallelizing the query using the Distributed Query Processor engine and integrating the result sets.
3. The Access Layer: the point of contact for end-user applications to access the information exposed by GRADIENT using standard SQL queries.

At the provisioning layer, GRADIENT uses OGSA-DAI to expose disparate data sources as data services, called Grid Data Services (GDS). A Grid Data Service accepts XML-based perform documents that describe user queries and database metadata. It parses and validates the query in the perform document against the metadata in the perform document, executes the query and constructs a response document. A response document is an XML document containing the query results.

At the integration layer, GRADIENT uses a powerful Distributed Query Processor engine that splits a query into multiple sub-queries, which are executed in parallel across different nodes on the grid to ensure better scalability.

At the access layer, GRADIENT employs a number of caching techniques for improved performance and quick response times. GRADIENT uses a cache to store the result set of a query for better response times. It uses a Query Cache to store the data sources required to satisfy a query, in order to save time spent on metadata resolution. Additionally, it has a Metadata Cache that stores column names and their associated data source references for quicker metadata resolution. Each cache has its own expiration (stale) time, which can be configured as required. GRADIENT is also being enhanced to support adaptive distributed caching at the GDS level to cache the heavily used data from each of the data sources. This adaptive distributed cache can also refresh the result set dynamically at specified intervals.

Unique Value Proposition

The business values provided by GRADIENT include:

- Real or near real-time data integration and delivery across heterogeneous data sources
- Wider, faster end-user access to key business data
- Ability to leverage proprietary information more effectively for competitive advantage
- A single, comprehensive view of the enterprise's information assets
- Integration of enterprise data for administrative cost savings
- Improved developer productivity in key data-intensive application development, reducing time-to-market
- Rapid reaction to environmental changes on demand
- Enhanced business intelligence for greater competitive advantage
- Extended data and application integration for speedier, more cost-effective business processes
- A completely SOA-compliant architecture

Conclusion

The effort needed to overcome the challenges posed by data integration is significant. Enterprise Information Integration (EII) solutions have been proposed as an alternative to the existing Extraction, Transformation and Loading (ETL) based solutions. EII solutions help to shorten the time required to carry out data integration and lower data management and maintenance costs over time. They enable developers to concentrate on the actual application logic rather than worry about data availability, which greatly enhances productivity. EII solutions allow an enterprise to leverage its data in a number of ways not possible with traditional data integration techniques.

GRADIENT is an Infosys EII solution focusing on grid technology. It is built upon OGSA-DAI technologies, which are based on widely accepted open standards in grid computing and on powerful, emerging XML standards. GRADIENT provides a true SOA platform for data integration within an enterprise.
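Two mechanisms described under Components/Benefits, parallel execution of sub-queries and a result-set cache with a configurable expiration time, can be sketched together as follows. This is a minimal illustration under stated assumptions, not GRADIENT's implementation: the `TTLCache` class, the `SOURCES` stubs and the `federated_query` function are hypothetical, and a thread pool on one machine stands in for grid nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class TTLCache:
    """A cache whose entries go stale after a configurable number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so that expiry can be demonstrated
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # stale entry: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

# Hypothetical stand-ins for remote data services; each returns rows.
SOURCES = {
    "crm": lambda: [("Alice", "IN")],
    "erp": lambda: [("Bob", "US")],
}
calls = []  # records which sources were actually contacted

def run_subquery(source):
    calls.append(source)
    return SOURCES[source]()

def federated_query(key, sources, cache):
    """Return cached rows if fresh; otherwise fan out to all sources in parallel."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        parts = list(pool.map(run_subquery, sources))
    result = sorted(row for part in parts for row in part)
    cache.put(key, result)
    return result

now = [0.0]  # a fake clock, advanced by hand below
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])

first = federated_query("all_customers", ["crm", "erp"], cache)   # contacts both sources
second = federated_query("all_customers", ["crm", "erp"], cache)  # served from the cache
calls_before_expiry = len(calls)

now[0] = 31.0  # the entry is now older than its expiration time
third = federated_query("all_customers", ["crm", "erp"], cache)   # contacts the sources again
```

The expiration time is the knob that trades response time against freshness: a longer value means fewer round trips to the sources but a greater chance of returning stale results.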

Glossary

ETL: Abbreviation for Extraction, Transformation and Loading. A popular framework for integrating data from multiple heterogeneous databases.

EII: Abbreviation for Enterprise Information Integration. It represents an alternative to the existing ETL frameworks.

Database: A collection of records stored in a computer system in a systematic manner.

DBMS: Abbreviation for database management system. It is a piece of software designed to manage a database.

SQL: Abbreviation for Structured Query Language. It is a declarative query language for relational databases.

Query Optimization: A set of procedures used to transform a query written in SQL into an execution plan (for example, generated C++/Java code) that can be executed efficiently against the databases.