Infosys GRADIENT: Enabling Enterprise Data Virtualization

Keywords: Grid, Enterprise Data Integration, EII




Introduction

A new generation of business applications is emerging to support customer service, risk management, multi-channel integration, loyalty management, regulatory compliance, marketing, business performance management, and other critical functions. These applications require cross-functional data in near real-time, which has created new demand for data access and integration solutions. The burgeoning volume of data, the quickened pace of data consumption and decision making, and the pressure on integration cost and timelines make data integration a complex challenge. The diversity of data sources and data freshness requirements adds further complexity. Companies are seeking more flexible ways of making integrated data available to applications: ways that control costs by keeping custom application code and redundant data to a minimum.

April 2005

Enterprise Data Integration Challenges

In today's fast-moving world, business decision makers, customers and trading partners need to act on rapidly changing information in near real-time in order to generate value. The need for near real-time response, combined with the sheer number and complexity of the data sources, creates a new set of data access and integration challenges across the enterprise. Enterprise data assets are typified by different technologies in the underlying data sources and by different ownership of line-of-business data. Applications that need to analyze near real-time data from all over the enterprise face challenges in three areas:

1. Information access. The right information needs to be delivered to the right application in a timely and consistent manner, irrespective of its source. Many applications require real-time data retrieval.

2. Data integration. Data of different, heterogeneous types needs to be transformed, integrated and aggregated to create intelligent information for the application. Sometimes this requires creating integrated views of data from different data sources.

3. Data stewardship and provisioning. Data architects and administrators must protect the integrity and security of data sources while making them available for consumption by applications across the enterprise. Added to these concerns is a host of new regulatory compliance requirements that demand more careful auditing of enterprise information usage.

Business Requirement

For various reasons, information in an enterprise is scattered geographically and is maintained in various formats. The primary reasons for such distributed and heterogeneous information are mergers and acquisitions made to grow in different geographic areas, and supplier and partner integration, though there can be many more.
Typically the information required for Reporting and Decision Support Systems (DSS) is present in heterogeneous formats and under the administrative authority of different business units, which makes consolidated access to this information difficult. Many companies today have a variety of data marts and warehouses in place for data consolidation and for operational and business reporting and monitoring. Organizations also need real-time information from the underlying data in repositories such as databases, data marts, CRM tools and data warehouses; for example, last-minute shipment tracking for a particular customer. It is not easy to retrieve this information in real time because of the spread and heterogeneity of the data sources to be queried.

Traditional data integration approaches

The ETL methodology provides one possible solution to the problem of disparate information access. ETL is a physical data consolidation concept: data is Extracted from various disparate data sources, Transformed by proprietary code to fit a common relational format, and finally Loaded into a central repository. The data can then be accessed for decision support and reporting. This approach has some limitations:

1. A significant amount of time is required to load the central repository, so loading has to be done during off-peak business hours. This introduces a time lag and the chance of stale information; access is not real-time.

2. Proprietary code has to be written to transform the disparate information.

3. Setting up data marts for the new information is often time consuming when a new data source is added to the warehouse.

4. The amount of data movement involved in an ETL framework is immense.

For these reasons, ETL may not meet the requirement of real-time data access for decision support.
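The ETL flow and its staleness limitation described above can be illustrated with a minimal sketch. It uses in-memory SQLite databases to stand in for two source systems and a central repository; the table names, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

def extract(conn, query):
    """Extract: read raw rows from one source system."""
    return conn.execute(query).fetchall()

def transform(rows, source_name):
    """Transform: map source-specific rows to a common (customer, amount, source) shape."""
    return [(name.strip().upper(), float(amount), source_name) for name, amount in rows]

def load(warehouse, rows):
    """Load: write the transformed rows into the central repository."""
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    warehouse.commit()

def run_batch(sources, warehouse):
    """One scheduled batch run over every registered source."""
    for source_name, (conn, query) in sources.items():
        load(warehouse, transform(extract(conn, query), source_name))

# Two hypothetical source systems with different schemas and formats.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE orders (customer TEXT, total TEXT)")
crm.executemany("INSERT INTO orders VALUES (?, ?)", [(" acme ", "10.50"), ("Globex", "7")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE invoices (client TEXT, amount REAL)")
erp.execute("INSERT INTO invoices VALUES ('acme', 4.25)")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL, source TEXT)")

sources = {
    "crm": (crm, "SELECT customer, total FROM orders"),
    "erp": (erp, "SELECT client, amount FROM invoices"),
}
run_batch(sources, warehouse)

# A change made in a source after the batch stays invisible until the next load.
crm.execute("INSERT INTO orders VALUES ('Initech', '99')")
totals = dict(warehouse.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer"))
print(totals)
```

The late `Initech` order does not appear in `totals`, which is the time lag described in limitation 1; every row was also physically copied, which is the data movement described in limitation 4.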
Moreover, many companies hold varied information in multiple data warehouses and CRM tools, and getting collated information from these repositories is not an easy task. What is needed is a solution that can fetch real-time data from such repositories through a single query, with minimum storage and infrastructure cost.

Infosys View Point

Enterprise Information Integration

EII (Enterprise Information Integration) is a new class of software which addresses the enterprise data access and integration challenges. An EII solution enables applications to access both raw and integrated data from multiple, heterogeneous, distributed data sources, while hiding from those applications the complexity of the many disparate data sources involved. Instead of moving data or creating new stores of integrated data, an EII solution creates a loose federation of existing data sources and provides a single view through which applications can access data. In other words, EII creates a data service or data veneer that allows applications and end users to treat a broad variety of data sources as if they were one large single source of data.

[Figure 1: Enterprise Information Integration. A client (BI/reporting tools) queries a virtualized view; a federated query system sits over the legacy systems, custom applications and ERP/CRM applications.]

Using EII tools, the client is presented with a virtualized view of the disparate data sources. The client queries this virtualized data source using the standard and popular SQL interface. The federated query system splits the user query into multiple sub-queries, sends each sub-query to the appropriate database, and then federates the results returned from the databases before returning the federated result set to the user. In this way, Enterprise Information Integration involves less expensive data transformation and focuses more on how to combine the diverse definitions of data elements belonging to different databases into a single information element for query federation and for displaying the results. EII technology, though more recent, draws heavily on the distributed database query processing and optimization literature, which is quite mature. Strong query optimization and performance are key to the success of EII tools.
Information virtualization through EII allows end users to use their existing applications to fetch information from anywhere within the enterprise without worrying about the where and how of the information retrieval process.
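The split, execute and federate steps described above can be sketched as follows. This is only an illustration under stated assumptions, not how any particular EII product works: the two in-memory SQLite databases, the `SUB_QUERIES` catalog and the `customer_shipments` function are all hypothetical, and the join is done naively in application code.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_source(ddl, insert_sql, rows):
    """Create one in-memory database standing in for an independent data source."""
    # check_same_thread=False lets a worker thread use this connection later.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

crm = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
logistics = make_source(
    "CREATE TABLE shipments (customer_id INTEGER, status TEXT)",
    "INSERT INTO shipments VALUES (?, ?)",
    [(1, "in transit"), (2, "delivered"), (1, "delivered")],
)

# Hypothetical catalog: which source answers which part of the virtual view.
SUB_QUERIES = {
    "customers": (crm, "SELECT id, name FROM customers"),
    "shipments": (logistics, "SELECT customer_id, status FROM shipments"),
}

def run_sub_query(item):
    """Send one sub-query to the source that owns the data."""
    name, (conn, sql) = item
    return name, conn.execute(sql).fetchall()

def customer_shipments(customer_name):
    """Answer one question over the virtual view spanning both sources."""
    # Each sub-query runs in its own worker thread against its own source.
    with ThreadPoolExecutor() as pool:
        partial = dict(pool.map(run_sub_query, SUB_QUERIES.items()))
    # Federate: join the partial result sets on the customer identifier.
    ids = {cid for cid, name in partial["customers"] if name == customer_name}
    return sorted(status for cid, status in partial["shipments"] if cid in ids)

print(customer_shipments("Acme"))
```

The caller names only the customer, never the source systems. The sketch fetches whole tables and filters afterwards; a real federated engine would normally push filters down to each source to reduce data movement, which is where the query optimization mentioned above matters.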

EII Business values and benefits

The business values realized by an EII solution include:

- Real or near real-time data integration and delivery across heterogeneous data sources
- Wider, faster end-user access to key business data
- Ability to leverage proprietary information more effectively for competitive advantage
- A single, comprehensive view of the enterprise's information assets
- Integrating enterprise data for administrative cost savings
- Improving developer productivity in key data-intensive application development, thereby reducing time-to-market (studies suggest a 30-40% time improvement in enterprise application development)
- Rapid reaction to environmental changes on demand
- Enhancing business intelligence for more competitive advantage
- Extending data and application integration for speedier, more cost-effective business processes

Proven benefits from existing EII implementations include:

- Ability to analyse real-time business data to accelerate business decision making. Support for near real-time enterprise data makes this possible.
- Greater re-use of data. EII infrastructure built for one data integration project can be re-used for other data integration projects involving the same data sources, since data is only federated, not actually moved.
- Reduction of data integration infrastructure costs. There is no need to build custom data warehouses for data integration and analysis tasks.
- Extension of the range of data warehouse querying for better decisions and BI. Data warehouse queries on historic data can be combined with queries on real-time business data to make better decisions.
- Speedup of decision support, portal and application development. The infrastructure is flexible and extensible enough to support a wide variety of application end points.

EII Vendors and Solutions

A number of EII solutions are available, both proprietary and open source. Some of these technologies are either based on data grid technology or derived from earlier research work on data grids.
Here we take a look at a few of these solutions, which come under the umbrella of data grid technologies and EII solutions.

1. Avaki. Avaki is an EII solution which streamlines integration of data from many distributed data sources while providing standardized access to integrated data views through a single data layer. Avaki is a commercial derivation of Legion, an Internet-scale, enterprise-class grid middleware platform developed at the University of Virginia.

2. GemFire. GemFire Enterprise is a high-performance data services software solution from Gemstone that makes data available on demand to applications regardless of the underlying data sources or formats. GemFire Enterprise is based on most well-known technology standards (JDBC, JMS, JCache, SOAP/HTTP, Web Services, etc.) and connects to databases, application stores, analytical tools, messaging systems and mainframes, enabling the deployment of high-performance architectures at significantly lower costs.

3. SRB. The SDSC Storage Resource Broker (SRB) is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets. SRB, in conjunction with the Metadata Catalog (MCAT), is widely used in scientific and engineering projects to implement large-scale, heterogeneous data grids. SRB is, however, not suited for enterprise information integration because of its very limited support for relational data.

There are a number of other commercial vendors in this space, such as Metamatrix, Ascential, CenterBoard and Ipedo.

EII and Data Grids

Grid computing has emerged as a new and powerful paradigm in distributed computing. It provides a stack of software services that interconnects a large number of heterogeneous computing systems distributed across large geographic areas to provide a uniform, virtualized single source of computing to the user. A Data Grid seeks to present a set of virtualized data services that remove the complexity of data federation and reduce data-related latencies. Data Grids, also called information grids or storage grids, are concerned with the following aspects of a virtualized grid environment:

- Scalability
- Reliability
- Availability
- Interoperability

Canonically, a data grid can be defined as a network of distributed, heterogeneous storage resources that are linked using a logical name space to create global, persistent identifiers which provide a uniform, virtualized view of multiple data repositories. Data grids provide a means of virtualizing widely distributed heterogeneous data by implementing services which give the user a single logical view of the complete data space. EII aims to achieve the same by using virtual data federation techniques. Hence data grids are ideally suited for tasks such as EII in the enterprise as well as in widely distributed, collaborating scientific communities.

Objectives of Infosys GRADIENT

The following objectives were set for Infosys GRADIENT based on the study of EII technology and the vendor landscape:

1. GRADIENT should provide real or near real-time data integration and delivery across heterogeneous data sources, including data sources of different formats, both relational and non-relational (spreadsheets and XML).

2. The GRADIENT parser should comply with the ANSI SQL standard and support most of the vital features for query support using SQL.

3. GRADIENT should provide a fully functional graphical user interface for accessing, monitoring and administering the GRADIENT repository and infrastructure.

4. GRADIENT should support data retrieval without requiring the user to know the names and types of the data sources involved.

5. GRADIENT should support virtual views for easier access to information.

6. GRADIENT should support easier integration with existing applications, preferably by complying with SOA standards using web services.

7. GRADIENT should provide enhanced scalability and improved performance using distributed processing and parallel query evaluation techniques.

Infosys GRADIENT: Enabling Enterprise Data Virtualization

Infosys GRADIENT (Grid Access of Distributed Information in the Enterprise) is a data virtualization solution that provides a virtualized view of enterprise data by using grid services. At the core of GRADIENT lie three layers:

1. The Provisioning Layer. The provisioning layer helps provision the data sources that need to be exposed to GRADIENT as services. It hides the complexities involved in posting a query to different types of data sources.

2. The Integration Layer. The integration layer integrates the metadata of the different data sources, which is used to locate the data. Its basic functionality includes parallelizing the query using the Distributed Query Processor engine and integrating the result sets.

3. The Access Layer. The access layer is the point of contact through which end-user applications access the information exposed by GRADIENT using standard SQL queries.

[Figure: GRADIENT Architecture. Labels: Presentation Layer, Data Access, Gradient Service, Metadata Manager, Cache Manager, Security, Data Integration, Query Optimization Engine, Query Execution Engine, DQ Registry, Data Provisioning, Grid Data Service, OGSA- Data.]

At the provisioning layer, GRADIENT uses OGSA-DAI to expose disparate data sources as data services, called Grid Data Services. Grid Data Services accept XML-based perform documents that describe user queries and database metadata. The Grid Data Service parses and validates the query in the perform document against the metadata in the perform document, executes the query and constructs a response document. A response document is an XML document containing the query results.

At the integration layer, GRADIENT uses a powerful Distributed Query Processor engine that splits a query into multiple sub-queries, which are executed in parallel across different nodes on the grid to ensure better scalability.

At the access layer, GRADIENT employs a number of caching techniques for improved performance and quick response times. It uses a Data cache to store the result set of a query for better response times. It uses a Query cache to store the data sources required to satisfy a query, in order to save time spent on metadata resolution. Additionally, it has a Metadata cache that stores column names and their associated data source references for quicker metadata resolution. Each cache has its own expiration (stale) time and can be configured as required. GRADIENT is also being enhanced to support adaptive distributed caching at the GDS level, to cache the heavily used data from each of the data sources. This adaptive distributed cache can also refresh the result set dynamically at specified intervals.
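The interplay of the three caches described above can be sketched as follows. This is a hedged illustration, not GRADIENT's implementation: the `ExpiringCache` class, the `answer` function, the expiry times (30, 300 and 3600 seconds), the `catalog` mapping and the fake clock are all assumptions introduced for the example.

```python
import time

class ExpiringCache:
    """A tiny cache whose entries go stale after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # entry is stale: drop it
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

def answer(query, columns, caches, catalog, execute):
    """Serve a query, consulting the data, query and metadata caches in turn."""
    rows = caches["data"].get(query)
    if rows is not None:
        return rows  # data cache hit: no source is contacted
    sources = caches["query"].get(query)
    if sources is None:
        sources = set()
        for column in columns:
            source = caches["metadata"].get(column)
            if source is None:
                source = catalog[column]  # full metadata resolution
                caches["metadata"].put(column, source)
            sources.add(source)
        caches["query"].put(query, sources)
    rows = execute(query, sources)
    caches["data"].put(query, rows)
    return rows

# Demo with a controllable clock and hypothetical expiry times.
now = [0.0]
clock = lambda: now[0]
caches = {
    "data": ExpiringCache(30, clock),
    "query": ExpiringCache(300, clock),
    "metadata": ExpiringCache(3600, clock),
}
catalog = {"name": "crm", "status": "logistics"}  # column -> data source
calls = []

def execute(query, sources):
    """Stand-in for running the query against the resolved sources."""
    calls.append(sorted(sources))
    return [("Acme", "in transit")]

q = "SELECT name, status FROM shipment_view"
first = answer(q, ["name", "status"], caches, catalog, execute)
second = answer(q, ["name", "status"], caches, catalog, execute)  # data cache hit
now[0] = 60.0  # data cache entry is stale, query cache entry is still valid
third = answer(q, ["name", "status"], caches, catalog, execute)
print(len(calls))
```

The second call never reaches the sources. The third call re-executes the query because the result set has expired, but it skips metadata resolution because the query cache still knows which sources are needed. Giving each cache its own expiry time is what makes that trade-off between freshness and response time adjustable.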
This architecture enables GRADIENT to present data sources of different formats, spread across different geographic locations, as a uniform, virtualized view to the end user. End users can issue a single query, possibly involving expensive join operations, to fetch information residing in disparate data sources. Such queries are optimised for performance and scalability by parallel execution on a grid infrastructure. The end user issues these queries without needing to know the location or format of the underlying data sources.

Benefits of GRADIENT

GRADIENT offers the following benefits to end-user applications compared with traditional methods of data access and integration:

- Real or near real-time data integration and delivery across heterogeneous data sources
- Wider, faster end-user access to key business data
- Ability to leverage proprietary information more effectively for competitive advantage

- A single, comprehensive view of the enterprise's information assets
- Integrating enterprise data for administrative cost savings
- Improving developer productivity in key data-intensive application development, thereby reducing time-to-market
- Rapid reaction to environmental changes on demand
- Enhancing business intelligence for more competitive advantage
- Extending data and application integration for speedier, more cost-effective business processes
- Completely SOA-compliant architecture

Conclusion

Data integration can be a significant effort, whether you are building new data-intensive applications, adapting a packaged application to a new context or trying to create a single point of access for your enterprise data. More companies are turning to EII solutions, which can shorten data integration projects and lower data management and maintenance costs over time. EII solutions simplify data provisioning, access and integration, shortening the data integration time frame and allowing developers to concentrate on actual application logic, thereby enhancing developer productivity. The Infosys data virtualization solution gives you a single view of the truth, allowing you to leverage your enterprise data in a number of ways not possible with traditional data integration techniques.

Infosys GRADIENT is a data virtualization solution that uses data grid principles for scalable real-time data integration. It is built on top of the open source OGSA-DAI framework and is based on widely accepted open standards in Grid computing. Infosys GRADIENT is optimized for scalability and performance using grid infrastructure, and provides a true SOA platform for information integration in your enterprise.

References

1. An introduction to Enterprise Information Integration (EII) software, Avaki Corporation, Dec 2004, http://www.avaki.com
2. Avaki Solution Brief, Retail and Commercial Banking: Customer profiling and analysis, Avaki Corporation, http://www.avaki.com
3. Infostructure Associates, Leveraging information for Organizational success, http://www.dmreview.com/whitepaper/wid1016595.pdf
4. OGSA-DAI, http://www.ogsadai.org
5. SRB, http://www.sdsc.edu/srb
6. Gemstone Gemfire, http://www.gemstone.com/products/gemfire/enterprise.php
7. Avaki: A data layer for real-time data access and integration, Avaki Corporation, January 2005, http://www.avaki.com/file/pdf/protected/avaki_product_whitepaper.html
8. How to identify, specify and realize services for your SOA, Ali Arsanjani, Chief Architect, IBM SOA and Web service COE, http://www.webservices.org/index.php/ws/content/view/full/55047
9. Survey of Grid File Systems, Grid File System Working Group (GFS WG), http://phase.hpcc.jp/ggf/gfs-rg/
10. Storage Resource Broker - Managing Distributed Data in a Grid, Arcot Rajasekar, Michael Wan, Reagan Moore, Wayne Schroeder, George Kremenek, Arun Jagatheesan, Charles Cowart, Bing Zhu, Sheau-Yen Chen, Roman Olschanowsky, Computer Society of India Journal, Special Issue on SAN, Vol. 33, No. 4, pp. 42-54, Oct 2003.
11. Virtualization Services within Data Grids, Reagan W. Moore, Business Briefing: Data Management and Storage Technology, Business Briefings Ltd., 2003.
12. An Overview of the Federated MCAT Design, SRB presentation by Mike Wan, UK e-Science All Hands Meeting, Nottingham, UK, September 2003.
13. Digital Libraries, Data Grids, and Persistent Archives, presentation at NARA, Dec 17, 2001 (SRB developers).