DATA VIRTUALIZATION Whitepaper Data Virtualization and how to leverage a SOA implementation www.denodo.com
Executive Summary

Widely recognized as the foundation of Service-Oriented Architectures (SOA), an ESB serves as an integration and communication layer between different enterprise systems: the infrastructure that connects business applications and data sources, applies data transformations in the workflow process and orchestrates results. It may therefore appear that this technology overlaps with Data Virtualization. In this paper, however, we explore how the two solve different challenges, how they can work together to provide a more robust and easier-to-develop end-to-end solution, and how Data Virtualization can greatly reduce the costs of managing and maintaining such an ESB deployment.

Data Virtualization provides a Data Services Layer in a SOA Environment

Data Virtualization technology decouples the heterogeneities of the underlying data sources from the upper-layer client applications, providing seamless access to data of all types to serve any business need, be it operational or informational (BI and/or reporting). It exposes a Virtual Data Layer comprised of canonical business entities, built by combining data from the underlying information sources, and exposes them as Data Services that can be consumed by enterprise applications, ESB tools, reporting solutions, portals, mobile applications, users, etc. (see fig. 1). It includes key capabilities for real-time query optimization supported by intelligent caching and scheduled data orchestration, unified data governance, and the ability to deliver data services in multiple formats with managed security and service levels.
Figure 1: Data Virtualization provides a Data Services layer comprised of canonical business entities exposed for easy consumption. Data consumers (enterprise applications, ESB, reporting, BI, portals, mobile, web, users) access canonical entities (Customer, Support Incidents, Orders, Products) built over sources such as CRM, ERP, warehouse, incidents web app, document database and web services.
When deployed in a SOA environment, Data Virtualization can provide data services to any application through the ESB, which then acts as a consumer of those services. ESB technology focuses on orchestrating complex business processes: it is process-centric, whereas Data Virtualization is data-centric.

Data Virtualization. ESB. The differences.

Data Virtualization platforms deliver advanced capabilities to connect, combine and transform data from heterogeneous sources, offering a single point of entry for data access and metadata management. ESB technology, on the other hand, has traditionally served as the integration and communication layer between different enterprise systems, providing the infrastructure for connecting applications and data sources and enabling some data transformations as part of a workflow process.

Designed to orchestrate services, an ESB can also be used as an integration hub. However, when data combinations and transformations are critical components of a business solution, implementing them within ESB frameworks can be frustrating, since these frameworks lack the advanced data management capabilities of a Data Virtualization middleware platform. Deployed without Data Virtualization, ESB implementations are far more cumbersome, costly and hard to maintain; what's worse, the performance of the deployed solution will likely suffer as well.

That said, ESB solutions do bring to the table vital functionality in the areas of business process automation and services orchestration, things that no Data Virtualization solution attempts to do. As a general guideline, the decision criteria for choosing one technology stack over the other can be described as follows:

- Prefer a Data Virtualization solution whenever requirements include accessing a variety of different source types, for instance when creating business entity data (customer, product, site, etc.) dispersed across physical silos in the enterprise. In this scenario Data Virtualization outperforms ESB approaches in productivity, performance and governance.
- Go for an ESB solution when requirements are heavily focused on complex business logic orchestrations and transactional coherence in updating affected sets of systems or services.

There will of course be many times when requirements involve complex workflow orchestrations that access different types of data sources and then combine and transform the results in complex ways. In such cases, both technologies can be deployed alongside one another to deliver a comprehensive solution.
Top 3 in anyone's wish-list of things to improve: PRODUCTIVITY

When requirements involve business processes that integrate data from several data sources, Data Virtualization offers several clear and straightforward advantages:

- It exposes a virtual data model comprised of views over disparate data sources (e.g. a join across a CRM system and a Billing system that combines the relevant customer and billing information for a Contact Center agent). Though virtual, this data model can be queried from the application layer using standard SQL, like any physical database. Using just a workflow tool, the user never sees a unified data model as such, but rather a number of disparate data sources that have to be accessed independently and differently; in addition, all data combination has to be programmed in the workflow engine. Consider something as widely used as a cross-source join. Without Data Virtualization, it has to be programmed manually by the user, after all the data has been collected from all participating data sources. Such functionality is provided out-of-the-box in a Data Virtualization platform, reducing development cycles and increasing the quality, speed and robustness of the final solution.
- Single point of access to information: an application only needs to query the Data Virtualization platform, rather than having to launch separate threads to query each individual data source, as would be the case for a BPMS or ESB.
- Data Virtualization provides advanced functions for normalizing content and performing complex transformations. Multiple join strategies, textual similarity operators, mapping functions and many others reduce development time still further.
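To make the contrast concrete, here is a minimal sketch of the same cross-source join written twice: once by hand, as it would have to be coded in a workflow step after collecting all rows, and once declaratively in SQL. It is an illustration only. The sample rows, the table names (crm_customer, billing_invoice) and both function names are assumptions, and an in-memory SQLite database merely stands in for a Data Virtualization layer.

```python
import sqlite3

# Hypothetical rows standing in for two independent systems (CRM and Billing).
CRM_CUSTOMERS = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
BILLING_INVOICES = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


def manual_join(customers, invoices):
    """Procedural style: collect all rows first, then combine them by hand."""
    # Build a lookup table on the join key (a hand-written hash join).
    name_by_id = {c["customer_id"]: c["name"] for c in customers}
    return sorted(
        (name_by_id[i["customer_id"]], i["amount"])
        for i in invoices
        if i["customer_id"] in name_by_id
    )


def declarative_join(customers, invoices):
    """Declarative style: state the join in SQL and let the engine execute it."""
    conn = sqlite3.connect(":memory:")  # stand-in for the virtual data layer
    conn.execute("CREATE TABLE crm_customer (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE billing_invoice (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm_customer VALUES (:customer_id, :name)", customers
    )
    conn.executemany(
        "INSERT INTO billing_invoice VALUES (:customer_id, :amount)", invoices
    )
    rows = conn.execute(
        "SELECT c.name, b.amount "
        "FROM crm_customer c "
        "JOIN billing_invoice b ON b.customer_id = c.customer_id "
        "ORDER BY c.name, b.amount"
    ).fetchall()
    conn.close()
    return rows
```

Both functions return the same rows. The difference is where the join logic lives: in the first it is application code that must be written and maintained for every new combination, while in the second it is a query the engine is free to plan.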
"We increased our productivity while, at the same time, decreasing our time to benefit from weeks to days." (Manuel Doval, CIO, R-Cable)

The declarative approach taken by Data Virtualization platforms contrasts with the procedural approach of flow tools like ESB frameworks: with a Data Virtualization platform, all data sources, combinations and normalizations are managed using plain metadata and SQL queries, with no complex programming needed. This allows applications to execute ad-hoc queries and data combinations without requiring new middleware artifacts for each discrete application need.

To illustrate this point, consider the following example. Suppose you have created a view in a Data Virtualization tool which integrates customer profile information from a CRM with each customer's highest-priority, currently open support requests (obtained from a Customer Services database). Suppose you also have another view accessing customer order information from the ERP. Two application developers submit requests for new reports:

1. a summary of the clients with the greatest number of currently open tickets, and
2. the client orders related to service incidents over the past two quarters, in order to determine which issues have had the greatest impact on revenue.

Using a declarative approach, both requests can take advantage of simple relational techniques (aggregations and joins in this case); the developers only need to issue SQL statements directly to the DV layer. By contrast, in the case of an ESB, producing these reports would have required the development of two entirely new processes.
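The two report requests above can be sketched as plain SQL against two existing views. This is a simplified illustration under stated assumptions, not Denodo syntax: an in-memory SQLite database stands in for the DV layer, every table, view, column and function name is hypothetical, and the ticket view keeps a status column so that a single view can serve both reports.

```python
import sqlite3


def build_virtual_layer():
    """Create hypothetical source tables and the two views described in the text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crm_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE support_ticket (ticket_id INTEGER PRIMARY KEY,
            customer_id INTEGER, status TEXT, opened_on TEXT);
        CREATE TABLE erp_order (order_id INTEGER PRIMARY KEY,
            customer_id INTEGER, amount REAL);

        INSERT INTO crm_customer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
        INSERT INTO support_ticket VALUES
            (1, 1, 'open',   '2024-05-01'),
            (2, 1, 'open',   '2024-05-03'),
            (3, 2, 'open',   '2024-05-02'),
            (4, 2, 'closed', '2024-02-10'),
            (5, 3, 'closed', '2023-06-01');
        INSERT INTO erp_order VALUES
            (1, 1, 100.0), (2, 2, 250.0), (3, 2, 50.0), (4, 3, 75.0);

        -- View 1: customer profile combined with support tickets.
        CREATE VIEW customer_tickets AS
            SELECT c.customer_id, c.name, t.ticket_id, t.status, t.opened_on
            FROM crm_customer c
            JOIN support_ticket t ON t.customer_id = c.customer_id;

        -- View 2: customer orders from the ERP.
        CREATE VIEW customer_orders AS
            SELECT customer_id, order_id, amount FROM erp_order;
    """)
    return conn


def report_open_tickets(conn):
    """Report 1: clients ranked by number of currently open tickets."""
    return conn.execute(
        "SELECT name, COUNT(*) AS open_tickets "
        "FROM customer_tickets WHERE status = 'open' "
        "GROUP BY customer_id, name "
        "ORDER BY open_tickets DESC, name"
    ).fetchall()


def report_orders_after_incidents(conn, since):
    """Report 2: order revenue for clients with closed incidents opened since a date."""
    return conn.execute(
        "SELECT t.name, SUM(o.amount) AS revenue "
        "FROM customer_orders o "
        "JOIN (SELECT DISTINCT customer_id, name FROM customer_tickets "
        "      WHERE status = 'closed' AND opened_on >= ?) t "
        "  ON t.customer_id = o.customer_id "
        "GROUP BY t.customer_id, t.name "
        "ORDER BY revenue DESC",
        (since,),
    ).fetchall()
```

Neither report needed a new view or a new process: each is a single aggregation or join over views that already existed, which is the point the example makes about the declarative approach.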
Top 3 in anyone's wish-list of things to improve: PERFORMANCE

Data Virtualization is specifically designed to deal with data sets and their combinations using relational operations. It offers capabilities for achieving optimal performance in complex scenarios that are simply beyond the scope of a traditional ESB, by automatically applying sophisticated query optimization techniques such as:

- auto-delegation of queries to high-powered data sources, minimizing network traffic and the subsequent middleware workload,
- asynchronous result delivery,
- parallel execution of sub-queries,
- nested, nested parallel, hash, merge, and other join strategies, and
- intelligent caching, which decreases response times and protects operational data stores from additional workload.

By contrast, in an ESB such optimizations have to be implemented by hand in the workflow editor (for instance, properly combining information from different sources, selecting the best possible join strategy, etc.). All this support is provided out-of-the-box in a Data Virtualization platform. Furthermore, effective Data Services layers are equipped to handle far larger volumes of data: once a given memory threshold is reached in the middleware, they automatically swap partially constructed datasets to secondary storage.

Top 3 in anyone's wish-list of things to improve: GOVERNANCE

An effective Data Virtualization platform also provides tools for managing and enhancing Virtual Data Models. Features such as data source schema refresh/resynch, automatic change impact analysis and graphical views of metadata lineage are not available with typical BPMS platforms, yet they are critical to the timely and correct evolution of virtualized business solutions when the data sources they connect to are modified or upgraded. For example, a slight change in just one data source may require a complete review of all affected business processes in an ESB scenario, whereas the Data Virtualization tool set includes graphical management tools for determining which changes affect which client applications, and propagates those changes only to the affected business views with a single confirming click.

Data Virtualization also offers a refined, granular security infrastructure, analogous to that of the relational databases it typically connects to.
Authorization privileges on virtual data service elements can be configured for enterprise users and roles, whether at database, view, row or column level. Advanced Data Virtualization also supports pass-through credentials, enabling simple reuse of data source policies already in place. ESB architectures do not typically offer such a granular security subsystem and are usually limited to painstakingly configuring authorizations on a workflow-process-by-workflow-process basis.

Integrating Data Virtualization with an ESB

There will be, as noted above, times when ESB and Data Virtualization technologies enhance each other. The complex combinations and data transformations provided by Data Virtualization often sit within complex business workflows, with an ESB implementing the business logic and fetching clean data from the Data Virtualization layer quickly and easily, as needed. From an architectural perspective, the ESB would work on top of the Data Virtualization platform, which would publish Data Services into the ESB (see fig. 2).

In these cases, tight integration between the two technology platforms is vital to achieving the business goal quickly and affordably. To this end, high-powered Data Virtualization platforms offer a vast array of access methods which external clients can use to consume data. Denodo, for example, provides connectivity in numerous formats and standards that can be easily invoked from any BPMS or ESB:

- relational access via SQL queries using several alternative drivers: JDBC, ODBC and ADO.NET,
- a JMS queue used as the request channel for issuing SQL queries to a Denodo-supplied JMS listener,
- web services using SOAP, via HTTP or JMS,
- a REST interface providing responses formatted in HTML, JSON or XML.
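The idea of one shared view being consumed through several access methods can be sketched in a few lines. This is a toy illustration, not the Denodo API: an in-memory SQLite view stands in for a published data service, and the view name customer_view, the sample rows and both function names are assumptions. One function returns rows as a SQL-based reporting tool would receive them; the other renders the same result as JSON, as a REST-style consumer might expect.

```python
import json
import sqlite3

# One query text shared by both access paths (hypothetical view name).
VIEW_QUERY = (
    "SELECT customer_id, name, city FROM customer_view "
    "WHERE city = ? ORDER BY customer_id"
)


def build_layer():
    """Create a hypothetical source table and publish a single shared view."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows expose column names as keys
    conn.executescript("""
        CREATE TABLE crm_customer (customer_id INTEGER PRIMARY KEY,
                                   name TEXT, city TEXT);
        INSERT INTO crm_customer VALUES
            (1, 'Acme', 'Seattle'), (2, 'Globex', 'Madrid');
        CREATE VIEW customer_view AS
            SELECT customer_id, name, city FROM crm_customer;
    """)
    return conn


def query_rows(conn, city):
    """SQL-style access: plain tuples, as a reporting tool would consume them."""
    return [tuple(row) for row in conn.execute(VIEW_QUERY, (city,))]


def query_json(conn, city):
    """Same view and query, serialized as JSON for a REST-style consumer."""
    return json.dumps([dict(row) for row in conn.execute(VIEW_QUERY, (city,))])
```

Because both paths read the same view, a change to the view definition reaches every consumer at once, which is what makes shared views attractive as common data services.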
As shown in the diagram below, one or many of these access methods can be used simultaneously, enabling many applications to take advantage of shared, common data views directly from the Data Virtualization layer. For example, a particular view could be plugged into the ESB using a SOAP web service as part of a specific business process, while at the same time being queried over JDBC from a BI reporting tool.

Figure 2: Data Virtualization - ESB ecosystem. Service consumers (apps, UI, reports, portal) sit on top of business process orchestration and data services, provided by the Enterprise Service Bus and Data Virtualization together with security, governance and management; service providers include CRM, ERP, SRM, RDBMS, web/cloud, files and email, covering both SOA-enabled and non-SOA-enabled applications.

Customer success story

R-Cable is a telecommunications company based in Spain offering quadruple-play services: broadband internet access, television, telephone and mobile phone. Founded in 1999, R-Cable currently leads the market in its territory after only twelve years in operation, with a 58% share and 70,000 customers (3% of which are corporate, 77% residential), over,000 employees and an annual revenue well exceeding $300 million.

R-Cable adopted the Denodo Platform as its Data Virtualization solution back in 2007, for a self-service customer portal project. Denodo's ability to offer a 360-degree view of R-Cable's customers made this project successful, with reduced development times, lower maintenance costs and increased end-user satisfaction. Since deployment, R-Cable's use of Denodo has expanded into several areas, and it currently provides an enterprise-wide Data Services platform used in all areas of the company, including: a self-service customer service portal, a unified call center desktop application, sales reports, TV listings, data retention, single view of customer information, end-user profile information for set-top boxes and many more.

Within R-Cable, Denodo lives in an ecosystem where ESB and ETL tools are also present.
The three technologies, although they overlap in some areas, cover different use cases and niches within R-Cable's integration needs. It is instructive to look at the operational costs each of them entails:

ESB: 50% of integration processes, 70% of total operational cost
Data Virtualization (Denodo): 35% of integration processes, 8% of total operational cost!
ETL (Oracle): 15% of integration processes, % of total operational cost

Figure 3: R-Cable technology stack. Business areas (Sales & Provisioning, Finance & Controlling, Customer Services, General Admin, User Services) use the stack for business process automation (e.g. provisioning), application integration, data retention, business intelligence (MicroStrategy) and partner integration. A services layer (Business Objects, App Services, Data Services) is built on ESB, Data Virtualization and ETL middleware over data sources such as the provisioning system, CRM, billing (single view), trouble ticketing (Remedy), content (product catalog, TV listings), network inventory, the DW and several web sources (logistics, web control).

Conclusions

One of the most immediate consequences of the increase in productivity and the improvement in both performance and governance that Data Virtualization brings to an organization is a drastic reduction in ESB operational costs. Data Virtualization can be used in conjunction with an ESB in scenarios that require the orchestration of complex business workflows, helping to minimize the development costs that the latter technology entails, while in simpler scenarios the Data Virtualization platform can fit the bill on its own.

"With a technology TCO of 1 million euros, we achieved a yearly cost reduction across all business units of 9 million and ROI in less than three months!" (David García, Business Change Manager, Phonehouse)
DATA VIRTUALIZATION · DATA SERVICES · WEB INTEGRATION

Denodo Technologies is a leader in Data Virtualization, the only platform that delivers Information-as-a-Service across disparate structured, Web and unstructured sources. Innovative leaders in every industry use Denodo to dramatically increase flexibility and lower costs of data integration for agile BI and reporting, call center unified desktops, customer portals, Web and SaaS data integration, and enterprise data services. Founded in 1999, Denodo is privately held.

Visit www.denodo.com
Email: info@denodo.com
twitter.com/denodo

Denodo Technologies, 530 Lytton Avenue, Suite 301, Palo Alto, CA 9301, USA
Phone (+1) 650 566 8833
Email: info.us@denodo.com & info.apac@denodo.com

Denodo Technologies, 17th Floor, Portland House, Bressenden Place, London, SW1E 5RS, UK
Phone (+) (0) 0 7869 8053
Email: info.emea@denodo.com

Denodo Technologies, C/ Montalbán, 5 801 Madrid, Spain
Tel. (+3) 91 77 58 55
Email: info.iberia@denodo.com & info.la@denodo.com