M6A. TDWI Data Virtualization: Solving Complex Data Integration Challenges. Mark Peco

M6A
European TDWI Conference, June 22-24, 2015, MOC Munich / Germany
TDWI Data Virtualization: Solving Complex Data Integration Challenges
Mark Peco
TDWI. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. DO NOT COPY

TDWI Data Virtualization

COURSE OBJECTIVES
You will learn:
- How models are used to define and frame analytic needs
- Data virtualization definitions and terminology
- The business case and technical rationale for data virtualization
- Key concepts and foundational principles of virtualization: views, services, etc.
- The data virtualization life cycle, capabilities, and processes
- How to extend the data warehouse with virtualization
- How virtualization enables federation and enterprise data integration
- How virtualization is applied to big data and cloud data challenges
- How companies use virtualization to solve business problems and drive business agility

The Data Warehousing Institute takes pride in the educational soundness and technical accuracy of all of our courses. Please send us your comments; we'd like to hear from you. Address your feedback to: [email protected]

Publication Date: August 2013
Copyright 2013 by The Data Warehousing Institute. All rights reserved. No part of this document may be reproduced in any form, or by any means, without written permission from The Data Warehousing Institute.

TABLE OF CONTENTS
Module 1: Data Virtualization Concepts and Principles
Module 2: Data Integration Architecture
Module 3: Data Virtualization in Integration Architecture
Module 4: Data Virtualization Platforms
Module 5: Implementing Data Virtualization
Module 6: Getting Started with Data Virtualization
Appendix A: Data Virtualization Case Studies
Appendix B: Bibliography and References

Module 1: Data Virtualization Concepts and Principles
Topics:
- Data Virtualization Basics
- Why Data Virtualization?
- The Data Virtualization Foundation
- Virtualize or Materialize?

Data Virtualization Basics / Data Virtualization Defined

WHAT IT MEANS TO BE VIRTUAL
The Oxford Dictionary defines virtual as "not physically existing as such but made by software to appear to do so." Virtual data, then, is a data structure that appears to exist but does not exist as a physically stored set of data. Data virtualization (DV) includes the processes and technologies that are used to create virtual data.

Wikipedia describes data virtualization as "the presentation of data as an abstract layer, independent of underlying database systems, structures, and storage." This definition captures two key elements of data virtualization:
- abstraction
- decoupling (removal of dependencies)

FROM THE EXPERTS
The facing page shows two definitions from recognized experts in the subject of data virtualization. Key concepts in Rick van der Lans's definition include:
- virtualization as a process
- data consumers
- hidden technology

Judith Davis and Robert Eve define virtualization from a purposeful perspective, with the purpose encompassing:
- integration of disparate data
- reach across internal and external data sources
- complete information
- high-quality information
- actionable information

Data Virtualization Basics / Virtualization vs. Materialization

BUSINESS AND TECHNICAL PERSPECTIVES
ABSTRACT vs. PHYSICAL
The physical form and location in which data is stored are generally of little interest to data consumers, and they can be a real barrier to finding needed data. When data is found, its physical structure is typically optimized for database and application performance. Those are important technology considerations, but they make data harder to understand and access.

The technology view of data is necessarily physical, working with data locations and database structures. The business view of data is more abstract, working with views that vary depending on the processes and circumstances in which data is applied. Both needs are readily met when data is materialized (managed physically) for technology purposes and virtualized (managed abstractly) for business purposes.

Data structures provide the means to map from material to abstract. Physical models describe materialized data structures. Logical and conceptual models describe virtualized data structures.

Data Virtualization Basics / Virtualization vs. Materialization (continued)

VIRTUAL vs. MATERIAL DATA INTEGRATION
The facing page illustrates the distinction between materialization and virtualization. Its examples integrate the same disparate data with similar but different goals. These examples are typical of the data integration challenges for a data warehouse, operational data store, or master data hub.

The goal of materialization is a single source of rationalized data, where "source" implies a physical database. The goal of virtualization is a single view of rationalized data, where "view" implies a logical, but not physically instantiated, data structure.
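The following minimal sketch makes the source-versus-view distinction concrete using Python's built-in sqlite3 module. It assumes two hypothetical source tables, crm_customer and erp_client, that hold customers under different conventions. The same rationalizing query is used twice: once to materialize a physical table and once to define a virtual view. All table and column names are illustrative.

```python
import sqlite3

# Hypothetical disparate sources: the same kind of data under different conventions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customer (cust_id TEXT, cust_name TEXT, country TEXT);
    CREATE TABLE erp_client   (client_no INTEGER, name TEXT, ctry_code TEXT);
    INSERT INTO crm_customer VALUES ('C-1', 'Acme GmbH', 'Germany');
    INSERT INTO erp_client   VALUES (2, 'Globex AG', 'DE');
""")

# One rationalizing query: common column names, key formats, and country values.
rationalize = """
    SELECT cust_id AS customer_id, cust_name AS customer_name, country
      FROM crm_customer
    UNION ALL
    SELECT 'E-' || client_no, name,
           CASE ctry_code WHEN 'DE' THEN 'Germany' ELSE ctry_code END
      FROM erp_client
"""

# Materialization: a single physical source; the rationalized rows are copied.
conn.execute("CREATE TABLE customer_store AS " + rationalize)

# Virtualization: a single view; only the definition is stored, and rows are
# produced from the sources each time the view is queried.
conn.execute("CREATE VIEW customer_view AS " + rationalize)

# A later change in one source system...
conn.execute("INSERT INTO erp_client VALUES (3, 'Initech SA', 'FR')")

# ...is visible through the view at once, but not in the materialized copy
# until that copy is refreshed (for example, by the next ETL run).
print(conn.execute("SELECT COUNT(*) FROM customer_store").fetchone()[0])  # 2
print(conn.execute("SELECT COUNT(*) FROM customer_view").fetchone()[0])   # 3
```

The view always reflects the current state of the sources, but it pays for the integration work at query time. The materialized table answers quickly but is only as fresh as its last refresh.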

Data Virtualization Basics / Virtualization vs. Synchronization

MATCHING MULTIPLE DATABASES
Synchronization is a somewhat different data integration requirement than the consolidation work performed for data warehouses. The purpose of synchronization is to keep multiple databases aligned in time and state. Multiple copies of the same data are maintained, and the data values are consistent across all copies at all times.

Common use cases for synchronization include:
- MDM synchronization of master reference data
- geographically distributed data
- a local copy with a cloud-hosted master
- application integration, such as CRM-to-ERP alignment

Synchronization rules may be defined in a variety of forms:
- Master-slave priority, where one database is always the master copy and changes are pushed to all other copies.
- Most-recent-transaction priority, where every transaction that occurs in any database is propagated to all other copies in chronological sequence.
- Rule-based data selection, where a complex business rule determines from which database a value is pushed to the other copies of a data item.

Synchronization is not a form of virtualization, because each distinct copy of the data is materialized. It is sometimes practical, however, to replace or supplement synchronization with virtual data services.
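The three rule styles can be contrasted in a small in-memory sketch. It is an illustration under simplifying assumptions, not a synchronization engine:
- Each "database" is a Python dict.
- Every change is assumed to be captured with a logical timestamp.
- The names (Change, push, prefer_crm, and the crm/erp copies) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One update to a data item, captured in one of the synchronized copies."""
    source: str     # database where the change occurred (hypothetical names)
    key: str        # identifier of the data item
    value: str      # new value
    timestamp: int  # logical time of the transaction


def push(copies, key, value):
    """Write one value into every copy of the data item."""
    for db in copies.values():
        db[key] = value


def master_slave(copies, changes, master):
    """Master-slave priority: only changes made in the master are pushed to all copies."""
    for ch in changes:
        if ch.source == master:
            push(copies, ch.key, ch.value)


def most_recent(copies, changes):
    """Most-recent-transaction priority: replay every change, from any copy,
    in chronological order, so the latest transaction ends up everywhere."""
    for ch in sorted(changes, key=lambda c: c.timestamp):
        push(copies, ch.key, ch.value)


def rule_based(copies, changes, rule):
    """Rule-based selection: a business rule picks the winning change per data item."""
    candidates = {}
    for ch in changes:
        candidates.setdefault(ch.key, []).append(ch)
    for key, options in candidates.items():
        push(copies, key, rule(options).value)


def prefer_crm(options):
    """Hypothetical rule: contact data is trusted from CRM when CRM has a value."""
    return next((c for c in options if c.source == "crm"), options[0])


# The same phone number was edited in CRM (earlier) and in ERP (later).
changes = [
    Change("crm", "C-1/phone", "555-0199", timestamp=1),
    Change("erp", "C-1/phone", "555-0100", timestamp=2),
]

ms_copies = {"crm": {}, "erp": {}}
master_slave(ms_copies, changes, master="crm")  # CRM is master: 555-0199 everywhere

mr_copies = {"crm": {}, "erp": {}}
most_recent(mr_copies, changes)                 # ERP change is newest: 555-0100 everywhere

rb_copies = {"crm": {}, "erp": {}}
rule_based(rb_copies, changes, prefer_crm)      # rule picks CRM: 555-0199 everywhere
```

Every copy still ends up holding its own stored value. That is why the text does not count synchronization as virtualization.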

Data Virtualization Basics / Virtualization vs. Federation

INTEGRATED YET AUTONOMOUS
FEDERATION AS PART OF DATA INTEGRATION
According to Rick van der Lans, the term federation refers to "combining autonomously operating objects," and data federation is "combining autonomous data stores to form one large data store."[1] On that basis, van der Lans provides the definition shown on the facing page.

Data integration, as we've discussed thus far, encompasses two techniques:
- materialization, which creates physical sources of integrated data
- virtualization, which creates non-material views of integrated data

Federation is a subset of virtualization: a specific use of virtualization to create views that rationalize multiple, disparate, and autonomous data stores. As van der Lans says, "Not all forms of data virtualization imply federation, but federation always results in virtualization."[2]

PRINCIPLES OF FEDERATION
Four principles capture the essence of data federation:
- Virtualization: Data federation is a form of data virtualization.
- Heterogeneity: Data federation works across multiple data types, data structures, data storage technologies, and data access methods.
- Autonomy: Each data store integrated through data federation is also able to operate independently and be applied for uses outside the scope of federation.
- On-Demand: Integration is triggered by a consumer request. Data access and integration occur only when the consumer asks for data.

[1] Clearly Defining Data Virtualization, Data Federation, and Data Integration, van der Lans,
[2] Clearly Defining Data Virtualization, Data Federation, and Data Integration, van der Lans,
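Here is a minimal sketch of the autonomy and on-demand principles. It assumes two hypothetical autonomous stores, a CRM database and a billing database, simulated as separate in-memory SQLite connections. No integrated copy exists. Each call to the federated "view" function queries both stores and combines the results at request time. A real federation engine would reach heterogeneous stores over a network. The function, table, and column names are illustrative only.

```python
import sqlite3

# Two autonomous stores (hypothetical): separate databases with their own schemas,
# each still able to serve its own applications outside the federation.
crm = sqlite3.connect(":memory:")
crm.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer VALUES (1, 'Acme GmbH'), (2, 'Globex AG');
""")

billing = sqlite3.connect(":memory:")
billing.executescript("""
    CREATE TABLE invoice (cust_ref INTEGER, amount_cents INTEGER);
    INSERT INTO invoice VALUES (1, 12000), (1, 3000), (2, 4500);
""")


def customer_revenue_view(min_cents=0):
    """Federated view: nothing is prepared in advance. Each call reads both
    stores through their own access paths and joins the results on demand."""
    totals = dict(billing.execute(
        "SELECT cust_ref, SUM(amount_cents) FROM invoice "
        "GROUP BY cust_ref HAVING SUM(amount_cents) >= ?", (min_cents,)))
    names = dict(crm.execute("SELECT id, name FROM customer"))
    return sorted((names[k], cents / 100) for k, cents in totals.items() if k in names)


print(customer_revenue_view())                 # [('Acme GmbH', 150.0), ('Globex AG', 45.0)]
print(customer_revenue_view(min_cents=10000))  # [('Acme GmbH', 150.0)]
```

Both stores remain ordinary, independently usable databases. A new invoice written to billing appears in the next call without any load step.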

Data Virtualization Basics / History and Evolution

A TIMELINE VIEW OF DATA INTEGRATION
Prior to the late 1980s, data integration was typically a handcrafted point solution to a specific problem or need. In 1988, IBM researchers Barry Devlin and Paul Murphy coined the term "information warehouse." Bill Inmon popularized warehousing and moved it from experimental to mainstream in the early 1990s with his book Building the Data Warehouse.

Early data warehouses (and most data warehouses today) were clearly physical. The concept of a virtual data warehouse surfaced frequently but struggled to gain wide acceptance. Still, the issues of "lift and shift" data duplication were cause for concern, and the concept of virtualization persisted. Acceptance of data virtualization grew as two things happened:
- EII (enterprise information integration) tools matured into data virtualization tools.
- New kinds of data and new expectations pushed the limits of batch ETL.

Change is driving adoption of data virtualization today: change in data types, and change in business expectations about information velocity and business agility. Data virtualization doesn't replace ETL, but it is an essential part of the integration toolbox.

Today ETL is familiar and comfortable for most data integrators. They look to data virtualization only when ETL can't get the job done: when batch ETL is too slow, the data sources are difficult to access, or the data types are challenging.

Change will continue to drive the evolution. Look at the timeline on the facing page. Note the sparseness on the left and the progressively increasing density as you move from left to right. The pattern indicates accelerating change and accelerating challenges for data integrators and data providers.

In time, data virtualization will become familiar and comfortable. Expect that in the future virtualization will take center stage. We will choose data virtualization first and turn to ETL and materialization when data virtualization doesn't meet the needs. Examples are highly complex transformations, or the need to persist history beyond its lifespan in source systems.

The reality is that ETL and data virtualization are not competing technologies; they are complementary. Data virtualization adds a new tool to the data integrator's toolbox. Technology decisions should be based on requirements, choosing the best tool for the job.

Why Data Virtualization? / Business Agility

SHAPING YOUR BUSINESS FUTURE
Business agility is the popular term that describes the capability of a business to respond quickly to changing conditions. In a complex, competitive, and continuously changing business environment, it is easy to understand why agility is important. But knowing that it matters doesn't make it easy to achieve. Judith Davis and Robert Eve state the case clearly:

"While the importance of business agility is well understood, achieving it is a difficult and ongoing challenge. The key to success is information. Armed with the right information, business decision makers can better evaluate their environment and decide how to adapt it for future success."[1]

There are two key messages in this quote:
- Business agility depends on information.
- Business agility is about shaping the future of your business.

DIMENSIONS OF AGILITY
Davis and Eve go on to describe three aspects of agility, all of which must be satisfied to achieve true business agility:
- Decision agility describes the speed at which informed decisions can be made.
- Time-to-solution agility describes the cycle time from recognizing a business need to delivering the information services needed to respond to it.
- Resource agility describes the ability of information services organizations to adjust people, projects, and priorities to respond quickly to business pressures.

[1] Data Virtualization, Davis and Eve, 2011 (

Why Data Virtualization? / The Data Virtualization Business Case

VIRTUALIZATION ENABLES AGILITY
The business case for technology initiatives is typically based on financials: ROI, TCO, payback time, etc. The data virtualization business case is less finance-oriented than outcome-oriented. Perhaps that is because the decision to pursue data virtualization isn't truly a technology initiative. It is a business initiative to improve the speed and quality of actionable information.

Data virtualization enables business agility, actionability, information speed, and information quality with:
- Rapid data integration, which results in quicker time-to-solution for business information needs.
- More information opportunities, with reach into the new types and greater volumes of data that are available today.
- More robust business analysis through more types of data and more extensive data integration.
- More complete information through reach to new data types and greater data volumes.
- Better-quality information that is translated to business syntax and context instead of being delivered in systems and data storage context.
- Simplified data governance, by reducing the number of replicated and redundant data stores that must be reconciled.
- A clear connection between information, its value, and the time and resources used to obtain it.
- Less costly information infrastructure, by reducing costly lift-and-shift processes and databases.

Why Data Virtualization? / The Data Virtualization Technical Case

BUSINESS ALIGNED TECHNOLOGY
While data virtualization is motivated by business agility, it has substantial technical implications. It enables IT organizations to become more responsive to continuous and quickly changing needs for business information. Data virtualization enables fast and effective delivery of business information by:
- Making data integration easier to achieve, both in scope and in timeliness of information.
- Providing a platform for rapid, iterative development where information requirements can be discovered and change is not a barrier to quick delivery.
- Reducing development cycles (time to solution) by eliminating the need to design and develop redundant data stores and processes to lift and shift data.
- Making developers more productive with a development environment that focuses on business-perspective information delivery instead of the detailed mechanics of data manipulation.
- Supporting the discovery-driven requirements and test-driven development needs of agile development projects.
- Breaking down the barriers to integrating structured and unstructured data into a single consumer view of information.
- Providing fast, easy access to cloud-hosted databases of all types.
- Meeting performance expectations and SLAs through query performance optimization.
- Reducing the maintenance and management overhead of data integration systems.
- Working together with ETL-based integration in a way that allows each technology to do what it does best.
- Extending the data integration toolbox with a new tool that doesn't demand radical change and readily supports systematic migration.

The Data Virtualization Foundation / Views

WINDOWS INTO COMPLEX DATA
MULTIPLE VIEWS
The central component of a data virtualization platform is a view. Views are purposeful means of seeing complex data in a simplified and specific context that is matched to the viewer's perspective. Depending on the specific virtualization platform being used and the data types involved, views may be SQL-based, XML-based, services-based, etc.

The earlier statement that views are "purposeful means" captures an important consideration: much of the value of views is that they are not one-size-fits-all data structures. Data integrators work with three distinct kinds of views, corresponding to three purposes for working with data:
- Connection views serve the purpose of accessing data sources, corresponding to extraction in ETL processing. These are the windows through which we see the content of disparate data sources. Connection views may include a degree of normalization and rationalization.
- Integration views are used to combine and connect data from disparate sources, corresponding to transformation in ETL processing. Integration views show data relationships, resolve inconsistencies, rationalize data formats and values, and improve data quality.
- Consumer views are the business-oriented windows into data, with some correspondence to the load functions of ETL processing. A significant difference from ETL is that load simply places data into a different collection of database tables; consumer views are more similar to publishing business information.
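The three kinds of views can be layered on top of one another. This sketch assumes SQL-based views and uses SQLite through Python's sqlite3 module. The two raw tables stand in for disparate sources, and all table, view, and column names are hypothetical.

```python
import sqlite3

dv = sqlite3.connect(":memory:")
dv.executescript("""
    -- Raw tables standing in for two disparate sources (hypothetical names).
    CREATE TABLE crm_customer (cust_id TEXT, cust_name TEXT, country TEXT);
    CREATE TABLE web_orders   (order_id INTEGER, cust TEXT, order_total TEXT);
    INSERT INTO crm_customer VALUES ('C-1', 'Acme GmbH', 'Germany'),
                                    ('C-2', 'Globex AG', 'Germany');
    INSERT INTO web_orders VALUES (10, 'c-1', '120.00'), (11, 'c-1', '30.00'),
                                  (12, 'c-2', '45.50');

    -- Connection views: windows onto each source, with light normalization.
    CREATE VIEW cv_customer AS
        SELECT cust_id AS customer_id, cust_name AS customer_name, country
          FROM crm_customer;
    CREATE VIEW cv_order AS
        SELECT order_id, UPPER(cust) AS customer_id,
               CAST(order_total AS REAL) AS amount
          FROM web_orders;

    -- Integration view: relates the sources on a rationalized key.
    CREATE VIEW iv_customer_order AS
        SELECT c.customer_id, c.customer_name, c.country, o.order_id, o.amount
          FROM cv_customer c JOIN cv_order o ON o.customer_id = c.customer_id;

    -- Consumer view: business-oriented and published for a specific use.
    CREATE VIEW customer_revenue AS
        SELECT customer_name, SUM(amount) AS revenue
          FROM iv_customer_order
         GROUP BY customer_name;
""")

for row in dv.execute("SELECT * FROM customer_revenue ORDER BY customer_name"):
    print(row)
# ('Acme GmbH', 150.0)
# ('Globex AG', 45.5)
```

Each layer is only a definition. A change in how a source stores keys (here, lowercase customer codes) is absorbed in one connection view, and the consumer view keeps its contract.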

The Data Virtualization Foundation / Query Optimization

SPEED OF INFORMATION
Optimization is an essential element of any data virtualization platform. Unlike in an ETL-based warehouse, the data is not stored in ready-to-access data marts. It resides only at the source until a consumer's query requests it.
- The good news: you haven't done a lot of work to integrate lots of data that is never accessed.
- The bad news: when data is requested, all of the work, from access through integration to information delivery, must be performed in real time.

Thus, optimization is a must: not only must the work be performed, it must be done fast. To enable business and IT agility, a data virtualization tool must:
- Recognize that the network is a bottleneck, and optimize effectively to minimize network traffic without information loss.
- Recognize that each data source has its own unique, technology-specific access methods. For example, avoid using a generic access language such as ANSI SQL; instead, translate queries into the SQL dialect native to each relational data source.
- Push as much work as practical to the data source. Use source-specific features when practical to minimize the work done in virtualization views and to reduce the volume of data moving across the network.
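The benefit of pushing work to the source can be shown by counting the rows that cross a simulated network. This minimal sketch assumes a hypothetical orders table in an in-memory SQLite database standing in for a remote source. A real platform's optimizer makes this choice automatically and also translates to each source's native dialect, which the sketch does not attempt.

```python
import sqlite3

# Hypothetical remote source, simulated with an in-memory SQLite database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", i) for i in range(1, 1001)],
)


def naive_total(region):
    """No pushdown: ship every row to the virtualization layer, then filter and sum there.
    Returns (total, rows_transferred)."""
    rows = source.execute("SELECT region, amount FROM orders").fetchall()
    total = sum(amount for r, amount in rows if r == region)
    return total, len(rows)


def pushdown_total(region):
    """Pushdown: the source filters and aggregates, so a single row comes back.
    Returns (total, rows_transferred)."""
    rows = source.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0], len(rows)


print(naive_total("EU"))     # (125500, 1000)
print(pushdown_total("EU"))  # (125500, 1)
```

Both strategies return the same answer without information loss, but the pushed-down query moves one row instead of a thousand.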

The Data Virtualization Foundation / Data Services

DATA SERVICES AND INFORMATION SERVICES
Extending from data virtualization to data and information services is a logical and natural progression. Adding a services layer to virtualized data has several distinct advantages:
- Right-time integrated data is easily available to your newer SOA-based applications.
- SOA-based, no-hub master data management (MDM) becomes possible. Source data continues to reside at the source, with fast, real-time, multidirectional data integration.
- Data services maximize the opportunity to create reusable data objects that encapsulate both business-rule-based and integration-based behaviors.
- Information services achieve an exceptional level of consumer friendliness for information access.
- Data and information services enable data and information mashups, enhancing the self-service capabilities of consumers to meet their own needs.
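Here is a minimal sketch of a data service as a thin, reusable operation over a virtual view. It assumes a hypothetical get_customers function that returns JSON. In practice such a service would be exposed over a protocol such as SOAP or HTTP, which is omitted here. The table, view, and function names are illustrative.

```python
import json
import sqlite3

# Hypothetical source table and the virtual view the service is built on.
service_db = sqlite3.connect(":memory:")
service_db.executescript("""
    CREATE TABLE erp_client (client_no INTEGER, name TEXT, ctry_code TEXT);
    INSERT INTO erp_client VALUES (1, 'Acme GmbH', 'DE'), (2, 'Initech SA', 'FR');
    CREATE VIEW customer AS
        SELECT client_no AS customer_id, name AS customer_name, ctry_code AS country
          FROM erp_client;
""")


def get_customers(country):
    """Reusable data service: one business-level operation with a stable,
    consumer-friendly JSON contract, independent of how the view is built."""
    cur = service_db.execute(
        "SELECT customer_id, customer_name FROM customer WHERE country = ?", (country,))
    cols = [d[0] for d in cur.description]
    return json.dumps([dict(zip(cols, row)) for row in cur.fetchall()])


print(get_customers("DE"))  # [{"customer_id": 1, "customer_name": "Acme GmbH"}]
```

Any consumer, from an SOA application to a mashup, calls the same operation. The view behind it can be re-pointed to other sources without changing the service contract.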

The Data Virtualization Foundation / A Bird's-eye View

FROM DATA TO INFORMATION AT HIGH SPEED
The role of data virtualization is to combine data from disparate sources of many types and move that data at high speed. The data is then delivered as high-quality, integrated, consistent, relevant, and timely business information. As with all complex problems, complexity becomes manageable when divided into logical parts. The logical parts of data virtualization include:
- A source layer implementing connection views. The source layer connects with many different data structures and types using a variety of access languages. The source layer is the point at which decoupling (separating data use from data storage) is achieved. When the source layer understands both the read and write functions of each source, multidirectional integration becomes possible.
- An integration layer implementing integration views. Here the data perspective shifts from data-storage syntax to business syntax. This layer delivers the transformation and federation capabilities to represent data relationships and data combinations not apparent in the individual data sources. Transformation capabilities are also applied for consistent representation of data and for data quality improvement.
- A business layer implementing consumer views. This layer shifts the perspective from business syntax to data usage. The views at this layer present data in accessible and understandable forms that make it readily available for business user consumption. The same views support rapid application development activities such as prototyping and agile projects.
- An application layer implementing data services, consumer views, or a combination of the two. Both single-point-of-interface and usage-specific information capabilities are supported here. The application layer spans data access capabilities ranging from SOA applications to virtual data marts.

Virtualize or Materialize? / Decision Factors

COMPLEX DECISIONS
Deciding whether to integrate data materially, virtually, or as a hybrid is a complex process. It involves many decision variables and is made on a case-by-case basis:
- Time to solution is the speed at which a data integration solution is needed. Greater urgency indicates virtualization.
- Cost sensitivity is a budget-driven variable. An exceptionally limited budget indicates virtualization.
- Requirements stability is concerned with the clarity and constancy of data integration requirements. Clear and stable requirements are suited to materialization; uncertain and volatile requirements fit virtualization.
- Replication constraints consider privacy and policy limits on creating multiple copies of data. Use virtualization when constraints are strong.
- Organizational personality describes a cultural continuum that ranges from cautious and risk-averse to adventurous. Highly cautious organizations are better suited to tried-and-true methods such as ETL.
- Source system availability is essential to virtualization. Limited availability makes on-demand integration difficult to achieve.
- Source system load considers the processing capacity of source systems to take on additional query demand. For source systems with little headroom, the demands of virtualization may exceed capacity.
- Data cleansing needs may inhibit use of virtualization. Messy data that requires complex cleansing algorithms is a poor fit for virtualization.
- Transformation complexity considers the structures, dependencies, and quantities of business and data rules that must be applied to integrate data. Highly complex transformations are better suited to materialization than to virtualization.
- Application focus ranges from operational and real-time decision support to time-series analysis and data mining. The real consideration here is the amount of history that is needed in integrated data. When history needs exceed what is available in source systems at any point in time, materialization is necessary.
- Data format influences the choice: multidimensional and other non-SQL target data structures are better suited to materialization than to virtualization.
- Target data freshness has a similar influence. Real-time and very-low-latency data are virtualization-friendly and difficult to achieve with ETL and materialization.
- Data volume per query must be considered. Processing large amounts of data with each query is not ideal for data virtualization.
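One way to structure this case-by-case discussion is a simple checklist. The sketch below is hypothetical and is not a TDWI method. Each factor is answered with the direction it points, and the answers are tallied. Two factors that the text words as requirements are allowed to override the tally. The factor names and the override rule are assumptions; real decisions weigh factors unequally and often end in a hybrid.

```python
from enum import Enum


class Lean(Enum):
    VIRTUALIZE = "virtualize"
    MATERIALIZE = "materialize"
    NEUTRAL = "neutral"


# Assumption: two factors are treated as requirements rather than preferences
# (history beyond the sources makes materialization "necessary"; source
# availability "is essential" to virtualization). If either points to
# materialization, it overrides the tally.
HARD_FACTORS = {"application_focus", "source_system_availability"}


def recommend(answers: dict[str, Lean]) -> tuple[str, int, int]:
    """Tally the answers; return (recommendation, virtualize_votes, materialize_votes)."""
    v = sum(a is Lean.VIRTUALIZE for a in answers.values())
    m = sum(a is Lean.MATERIALIZE for a in answers.values())
    if any(answers.get(f) is Lean.MATERIALIZE for f in HARD_FACTORS):
        return "materialize", v, m
    if v > m:
        return "virtualize", v, m
    if m > v:
        return "materialize", v, m
    return "hybrid", v, m  # evenly balanced: discuss a hybrid design


# Hypothetical project: urgent, volatile requirements and strict limits on
# copies, but some complex transformations.
project = {
    "time_to_solution": Lean.VIRTUALIZE,
    "requirements_stability": Lean.VIRTUALIZE,
    "replication_constraints": Lean.VIRTUALIZE,
    "transformation_complexity": Lean.MATERIALIZE,
    "data_volume_per_query": Lean.NEUTRAL,
    "application_focus": Lean.NEUTRAL,
    "source_system_availability": Lean.NEUTRAL,
}

print(recommend(project))  # ('virtualize', 3, 1)
```

Suppose the same project later needs years of history that the sources do not retain. Setting "application_focus" to Lean.MATERIALIZE then flips the result to materialization, however the other factors lean.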

Virtualize or Materialize? / Business Considerations Discussion

Think about your organization and your BI systems and projects. Where do you fit for each of the five factors shown here? Is the answer the same for all integration needs, or do answers change depending on data subjects, data sources, or data integration projects?

Module 2: Data Integration Architecture
Topics:
- Integration Architecture Concepts
- Reference Architectures
- Integration Architecture Examples
- Virtualize or Materialize?

Integration Architecture Concepts / Integration Architecture Defined

ARCHITECTURE
Architecture defines the roles, structure, relationships, and rules by which a collection of components constitutes a cohesive whole: the glue that bonds individual parts into a system. Architecture is an early-stage design activity that precedes detailed design, specification, and construction. Effective architecture ensures that the things we build:
- Are suited to the purposes for which they are intended
- Comply with regulations and standards
- Fit gracefully into their environment
- Are sustainable through their expected lifespan
- Are aesthetically pleasing

These principles hold true for the architecture of many things: buildings, bridges, information systems, and more.

DATA INTEGRATION ARCHITECTURE
Data integration architecture defines the roles, structure, relationships, and rules for aggregating a collection of data integration components into a data integration system. The facing page illustrates a generic data integration architecture comprising these components:
- Disparate data sources: The non-integrated data that is the target of data integration activity. The scope of data types ranges from highly structured relational data to unstructured, web, cloud, and big data sources.
- Data access methods: The means by which integration technologies connect to data sources. These methods encompass all of the common data access protocols.
- Data integration technologies: The classes of tools that are available to automate and execute data integration tasks. They include data replication, data virtualization, extract-transform-load (ETL), and enterprise application integration (EAI).
- Data integration techniques: The methods, processes, and products that are used to combine, connect, and rationalize disparate data as a unified data resource. They include propagation, transformation, consolidation, and federation.
- Integrated data applications: The business and information systems that access and use integrated data.
- Integration management: The essential components for integration system internals: quality, metadata, and systems management.

[Figure: Data Sources, Middleware, and Data Consumers]

Integration Architecture Concepts
Data Sources, Middleware, and Data Consumers

MORE THAN TECHNOLOGY
The previous view of data integration architecture is a technology-focused perspective that begins with databases and ends with systems. A more holistic view extends the architecture to include the very important elements of business activities and business people. With these elements included, a three-layer view of data integration is a useful perspective:
- Data sources include the technical components described earlier: the databases containing structured and unstructured data. But the real sources are the business activities where data is created (planning, management, and day-to-day business functions) and the people (planners, managers, and staff) who perform those activities.
- Data consumers include the applications described earlier, ranging from domain-specific systems to analytics. But the ultimate consumers are the business activities that are informed by data (strategic, tactical, and operational) and the executives, managers, and staff who perform those activities.
- Middleware is the technology that bridges from data sources to data consumers. Middleware includes all of the technological components for data access, integration technologies and techniques, and integration management. ETL, EAI, and data virtualization platforms are all types of data integration middleware.

[Figure: You Have It (Whether Defined or Not)]

Integration Architecture Concepts
You Have It (Whether Defined or Not)

LEGACY ARCHITECTURE
You probably already have a data integration architecture, even if you don't recognize it as such. Anyone with a data warehouse, an operational data store, or even application interfaces has integration components with rules, roles, relationships, and a purpose: to combine data from multiple sources and present unified views of it. The architecture may not be elegant and it may not be documented, but it does exist.
It is important to know what you have and to begin there. Then ask what you need and how to extend, expand, and evolve the existing architecture to meet the new challenges of data growth and business agility.

[Figure: Forrester's Data Architecture Reference Model]

Reference Architectures
Forrester's Data Architecture Reference Model

WHY REFERENCE ARCHITECTURE?
A reference architecture captures the core concepts of components and relationships for a particular type of system or collection of systems. Its purpose is to provide guidance for developing a specific architecture for a targeted organizational and technical environment. A data integration reference architecture, then, captures the essential components and relationships of data integration systems, providing a framework and guidance to define and develop specific data integration architectures.

FORRESTER DATA MANAGEMENT ARCHITECTURE
The facing page illustrates Forrester's Data Management Reference Architecture.¹ Note the substantial presence of data virtualization as architectural components. As indicated by the check marks on the diagram, data virtualization supports virtualized data access, derived data stores, and data integration (though the older term EII, enterprise information integration, is used to refer to virtualization as a data rationalization component).
Although the terminology and visualization are somewhat different, this reference architecture is quite similar to the earlier illustration of generic data integration architecture. It is also interesting that while the title refers to data management, the core of the architecture is focused on data integration, perhaps suggesting that integration is the predominant challenge in data management today.

¹ Forrester's Data Management Reference Architecture, Yuhanna, Leganza, Karel, Evelson, Kobelius, & Owens.

[Figure: Forrester's IaaS Architecture]

Reference Architectures
Forrester's IaaS Architecture

FORRESTER INFORMATION SERVICES ARCHITECTURE
In addition to its data management architecture, Forrester offers a reference architecture for Information as a Service (IaaS).¹ Distributed data access, integration middleware, and SOA-based delivery are the core elements of this architecture, and all three depend upon data virtualization technology to enable them. This reference architecture can provide especially useful guidance for extending and evolving data integration architecture in organizations pursuing service-oriented master data management (MDM) and 360° views of enterprise data.

¹ The Forrester Wave: Information-As-A-Service, Q1 2010, Yuhanna & Gilpin, quickscan/-/e-res55204

[Figure: Gartner's Data Services Layer Architecture]

Reference Architectures
Gartner's Data Services Layer Architecture

GARTNER'S SERVICES LAYERS
Gartner also offers a service-oriented reference architecture that is geared to Data as a Service (DaaS). The Gartner model consists of eight layers that progress from data sources to business processes.
- Business processes, services, and applications constitute the business components.
- The data services layers access and transform data and map it into semantic and business context. These layers are enabled by data virtualization technology.
- Data sources are the bottom layer of the stack, representing the wide variety of disparate data types that make up today's integration challenge.
- Supporting structure includes optimization and data governance processes.

[Figure: IBM's BI Reference Architecture]

Reference Architectures
IBM's BI Reference Architecture

IBM'S VIEW OF BUSINESS INTELLIGENCE
IBM's Business Intelligence Reference Architecture puts data integration into a BI context. Note the focus on connecting data consumers and data sources. But in the IBM model the bridge is a combination of data warehousing and business analytics, a common BI perspective.
In this architecture, data virtualization isn't highly visible. It would fit well in the integration processes column (ETL, data quality, data integration), with the ability to bypass the data stores column immediately to the left of the integration processes.

[Figure: Example 1, Ministry Social Services Logical Architecture]

Integration Architecture Examples
Example 1: Ministry Social Services Logical Architecture

THREE-LAYER ARCHITECTURE
The example on the facing page is drawn from a case study of Compassion International described in the book Data Virtualization.¹ The data virtualization system is designed to integrate data from multiple, complex sources, including ERP, EDW, application databases, and cloud-hosted databases. Progression from source views of data to consumer views depends on a multi-layer architecture, on models that describe the data in source and business contexts, and on mappings and rules that drive data transformations.
The data virtualization layer encompasses three sub-layers (source transform views, canonical objects, and consumer views), each considered to be a collection of building blocks. The characteristics of the building blocks, as described by Davis and Eve, are:
- They are actual views that can be queried, not just logical objects.
- They look more like source system views at the bottom of the diagram and become increasingly business-oriented as you move upward.
- They encapsulate standard and reusable business logic.
- They can be cached for performance optimization.
- Each is documented in a wiki accessible to end users and developers.

¹ Data Virtualization, pp. , Davis and Eve, 2011
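The three sub-layers can be sketched with Python's built-in sqlite3 module, using one in-memory database as a stand-in for a virtualization platform. This is a minimal illustration under stated assumptions, not how the Compassion International system was built: the table and view names (src_crm_contact, src_transform_contact, canonical_person, consumer_people_by_country) are hypothetical, and a real platform would draw the bottom layer from remote systems rather than a local table. Because each layer is an ordinary SQL view, every building block can be queried, not just the top one.

```python
import sqlite3

# One in-memory database stands in for the virtualization platform.
# All table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source data, shaped the way an application database stores it.
CREATE TABLE src_crm_contact (cid INTEGER, fname TEXT, lname TEXT, ctry_cd TEXT);
INSERT INTO src_crm_contact VALUES
  (1, 'Ana', 'Silva', 'BR'),
  (2, 'John', 'Smith', 'US'),
  (3, 'Mary', 'Jones', 'US');

-- Sub-layer 1: source transform view (clean names, still source-shaped).
CREATE VIEW src_transform_contact AS
  SELECT cid AS contact_id, fname AS first_name, lname AS last_name,
         ctry_cd AS country_code
  FROM src_crm_contact;

-- Sub-layer 2: canonical object (shared, reusable business logic).
CREATE VIEW canonical_person AS
  SELECT contact_id AS person_id,
         first_name || ' ' || last_name AS full_name,
         country_code
  FROM src_transform_contact;

-- Sub-layer 3: consumer view (business-oriented shape for one audience).
CREATE VIEW consumer_people_by_country AS
  SELECT country_code, COUNT(*) AS people
  FROM canonical_person
  GROUP BY country_code;
""")

# Every layer is a real view that can be queried directly.
print(conn.execute("SELECT full_name FROM canonical_person ORDER BY person_id").fetchall())
print(conn.execute("SELECT * FROM consumer_people_by_country ORDER BY country_code").fetchall())
```

Caching, the fourth characteristic, is not shown; a platform would add it underneath these definitions without changing how consumers query them.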

[Figure: Example 2, Energy Industry Logical Architecture]

Integration Architecture Examples
Example 2: Energy Industry Logical Architecture

FOUR-LAYER ARCHITECTURE
This example, also drawn from a case study in Data Virtualization¹ of a Global 50 energy company, views the data virtualization layer as four distinct sub-layers:
- Source connections access the disparate data sources and provide data to the conforming layer.
- The conforming layer transforms data to conform to a common data model.
- The common semantic layer is a collection of views into the common data model.
- The business demand layer publishes views and services that data consumers use to access data.
This data virtualization layer also includes a data storage component that improves performance by staging data for fast retrieval.
The energy company's IT executive expresses a key concept of this architecture: "The [consuming] application does not go directly to the system of record [the data source] but rather to the record of reference, which is the data virtualization layer."²

¹ Data Virtualization, pp. , Davis and Eve, 2011
² Data Virtualization, p. 118, Davis and Eve, 2011
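A minimal sketch of the four sub-layers plus the data storage component, written as plain Python functions. Everything here is hypothetical: the source records, field names, and function names are invented, and functools.lru_cache merely stands in for the staging store (a real deployment would use a dedicated data store). The point is the shape: each layer consumes only the layer beneath it, and the staged result spares consumers repeated trips to the systems of record.

```python
from functools import lru_cache

# Hypothetical records from two systems of record with different field conventions.
SOURCE_A = [{"WELL_ID": "W1", "BBL": "120"}, {"WELL_ID": "W2", "BBL": "80"}]
SOURCE_B = [{"well": "W3", "barrels": 95}]

source_reads = {"count": 0}  # counts trips to the sources, for illustration

def source_connections():
    """Source connections: pull raw rows from each disparate source."""
    source_reads["count"] += 1
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]

def conforming_layer(rows):
    """Conforming layer: map each source's fields onto the common data model."""
    conformed = []
    for origin, r in rows:
        if origin == "A":
            conformed.append({"well_id": r["WELL_ID"], "barrels": int(r["BBL"])})
        else:
            conformed.append({"well_id": r["well"], "barrels": int(r["barrels"])})
    return conformed

def common_semantic_view():
    """Common semantic layer: a view into the common data model."""
    return conforming_layer(source_connections())

@lru_cache(maxsize=1)
def staged_production():
    """Data storage stand-in: stage the semantic view for fast retrieval."""
    return tuple(sorted((r["well_id"], r["barrels"]) for r in common_semantic_view()))

def total_barrels_service():
    """Business demand layer: a published, consumer-facing result."""
    return sum(barrels for _, barrels in staged_production())

print(total_barrels_service())  # 295
print(total_barrels_service())  # 295, served from the staged data
print(source_reads["count"])    # 1
```

The consumer only ever calls the business demand layer, which mirrors the quoted idea of going to the record of reference rather than the system of record.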

[Figure: Example 3, Energy Industry Technical Architecture]

Integration Architecture Examples
Example 3: Energy Industry Technical Architecture

TECHNOLOGY ENABLED SERVICES
Extending from logical to technical architecture shows the technologies involved in data virtualization and the service role of each. This extends the view of architecture from the what to the how of virtualizing data, and shifts the perspective from logical layers to technology services.
- The variety of data sources includes Oracle, SQL Server, SAP, web services, and local interfaces for inbound data.
- Embarcadero Studio is used to manage and maintain the common data model.
- IBM Netezza Data Warehouse Appliance implements the data storage component.
- Microsoft technology is used for data mapping.
- Cisco Data Virtualization implements the virtualization services.
- A variety of technologies are used by BI and data integration consumers of virtualized data.

[Figure: Example 4, Financial Services Logical Architecture]

Integration Architecture Examples
Example 4: Financial Services Logical Architecture

VIRTUALIZATION SERVICES ARCHITECTURE
This example illustrates a logical services perspective of data virtualization architecture. Drawn from a Denodo case study¹ of a debt collection company, the architecture connects corporate systems and business processes with disparate web data sources through the interaction of data acquisition, data transformation, and data virtualization services.
The nature of the data sources (LinkedIn, Facebook, Twitter, etc.) is of particular interest here. Working with external and primarily unstructured data is different from working with internal and structured data, especially for data acquisition and transformation.
- Acquisition uses a combination of web services and web extraction techniques.
- Transformation must filter, prioritize, and normalize data to be passed to virtualization services.
- The virtualization services publish the views, services, and interfaces through which consumers access the data.

¹ Case Study: Reintegra
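The transformation and virtualization-service roles can be sketched as a small Python pipeline. This is an illustrative assumption, not the Reintegra implementation: the records below are invented stand-ins for what web services or web extraction might return, the PRIORITY ranking is arbitrary, and the order of steps (normalize, then filter, then prioritize) is one reasonable choice. The service at the end publishes only the fields consumers need.

```python
# Hypothetical raw items, shaped differently by each external source.
RAW = [
    {"source": "twitter", "user": "@jdoe", "text": "New job at Acme!", "ts": 3},
    {"source": "linkedin", "profile": {"name": "J. Doe", "employer": "Acme"}, "ts": 5},
    {"source": "facebook", "user": "jdoe", "text": "   ", "ts": 1},
]

PRIORITY = {"linkedin": 0, "twitter": 1, "facebook": 2}  # assumed ranking, lower first

def normalize(item):
    """Map one source-specific item onto a common layout."""
    if item["source"] == "linkedin":
        p = item["profile"]
        text = f"{p['name']} works at {p['employer']}"
    else:
        text = item["text"]
    return {"source": item["source"], "text": text.strip(), "ts": item["ts"]}

def transform(items):
    """Normalize, filter out empty items, then prioritize by source and recency."""
    normalized = [normalize(i) for i in items]
    kept = [r for r in normalized if r["text"]]
    return sorted(kept, key=lambda r: (PRIORITY[r["source"]], -r["ts"]))

def virtualization_service(items):
    """Publish a consumer view that exposes only the fields consumers need."""
    return [{"source": r["source"], "text": r["text"]} for r in transform(items)]

for row in virtualization_service(RAW):
    print(row)
```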

[Figure: Virtualize or Materialize? Data Source Considerations]

Virtualize or Materialize?
Data Source Considerations Discussion

Think about your organization and your BI systems and projects. Where do you fit for each of the four factors shown on the facing page? Is the answer the same for all integration needs, or do answers change depending on data subjects, data sources, or data integration projects?


Module 3: Data Virtualization in Integration Architecture

Topic                                          Page
Virtualization in Data Integration Projects    3-2
Data Warehousing Use Cases                     3-4
Data Federation Use Cases                      3-16
MDM and EIM Use Cases                          3-28
More Data Virtualization Applications          3-36
Virtualize or Materialize?                     3-40

[Figure: Data Virtualization Use Cases]

Virtualization in Data Integration Projects
Data Virtualization Use Cases

VIRTUALIZATION OPPORTUNITIES
The opportunities for data virtualization in projects are many, and they don't necessarily imply long-term virtualization in production applications. It is common to virtualize in development and materialize for production.
The facing page illustrates many types of integration systems, from data warehousing to cloud data integration, and many uses of virtualization in projects, from prototyping to production. Specific use cases are described on the following pages.

[Figure: Data Warehouse Augmentation]

Data Warehousing Use Cases
Data Warehouse Augmentation

EXTENDING THE EXISTING DATA WAREHOUSE

CHALLENGES
Traditional data warehousing systems are designed to integrate structured data through extract, transform, and load (ETL) processes. The output of ETL processing is integrated data that is physically stored in a relational database and made available for downstream reporting and access. Depending on the data architecture, additional data stores such as data marts may exist to optimize information delivery. The batch nature of ETL processing necessarily introduces some latency into warehouse data.
Long-term success and sustainability of a data warehouse depend on its ability to adapt and evolve to meet continuously changing information needs. The time and effort required to bring in additional source data is a significant challenge for existing data warehouses. The challenge and the complexities increase when the new requirements include unstructured data. Real-time data requirements bring additional challenges to ETL-based data warehousing processes.
The abundance of unstructured data and the impact of big data technologies bring both opportunities and challenges. The emergence of big data in a modern business context, especially social media data, creates an opportunity to analyze and better understand customer perceptions and behaviors. But with the opportunity comes complexity: unstructured social data is not a quick and easy fit into a traditional data warehouse.

OPPORTUNITY ENABLED BY VIRTUALIZATION
Data virtualization can be applied to complement and augment an existing data warehouse with virtual views that meet new information requirements. Unstructured data, cloud data, and real-time data integration can be implemented without extensive and disruptive changes to the core data model and ETL processing. Virtualization accelerates both speed of delivery and speed of data. Leveraging new and existing data sources more rapidly advances business agility: unstructured data is integrated with structured data, and new reporting applications are quickly implemented.
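A minimal sketch of augmentation, using SQLite as a local stand-in: ATTACH lets one connection see a second database, and a TEMP view joins warehouse facts with a new source without touching the warehouse's own tables. The table names and the sentiment data are hypothetical, and both databases are in memory here only to keep the example self-contained; a virtualization platform would reach the new source (cloud, social, or real-time) through its own connectors.

```python
import sqlite3

# The existing warehouse (stand-in); its core model is not altered below.
conn = sqlite3.connect(":memory:")
# Attach first: SQLite does not allow ATTACH inside an open transaction.
conn.execute("ATTACH DATABASE ':memory:' AS social")

conn.execute("CREATE TABLE sales_fact (product_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 500.0), (2, 250.0)])

# A new source, e.g. sentiment scores derived from social media (hypothetical data).
conn.execute("CREATE TABLE social.product_sentiment (product_id INTEGER, score REAL)")
conn.executemany("INSERT INTO social.product_sentiment VALUES (?, ?)", [(1, 0.8)])

# A TEMP view lives in the session's temp schema, not in the warehouse schema.
conn.execute("""
    CREATE TEMP VIEW sales_with_sentiment AS
    SELECT f.product_id, f.revenue, s.score AS sentiment
    FROM main.sales_fact AS f
    LEFT JOIN social.product_sentiment AS s ON s.product_id = f.product_id
""")

print(conn.execute("SELECT * FROM sales_with_sentiment ORDER BY product_id").fetchall())
# [(1, 500.0, 0.8), (2, 250.0, None)]

# The warehouse schema still contains only its original table.
print(conn.execute("SELECT name FROM main.sqlite_master").fetchall())  # [('sales_fact',)]
```

The LEFT JOIN keeps every warehouse row even where the new source has no data yet, so existing reports built on the view do not lose rows.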

[Figure: Data Warehouse Federation (diagram labels: Data Virtualization, Federated Views, Data Warehouse, ETL Processing)]

Data Warehousing Use Cases
Data Warehouse Federation

MULTIPLE DATA WAREHOUSES

CHALLENGES
Many organizations have multiple data warehouses for a variety of reasons. Mergers and acquisitions, independent departmental initiatives for data integration, and purchased or hosted applications with data warehouse components are among the most common causes. Whatever the causes may be, the result is typically new silos of data, without full enterprise integration having been achieved.
Enterprise reporting and robust analytics require data integration across the enterprise, but physical integration of multiple data warehouses is time-consuming and costly. It is especially challenging in dynamic and volatile environments, where the rate of data and systems change may exceed the capacity for continuous data warehouse alignment. The real challenge is to deliver an integrated view from many different data warehouses to support enterprise-wide information needs quickly, efficiently, and cost-effectively.

OPPORTUNITY ENABLED BY VIRTUALIZATION
Data virtualization provides the means to meet this challenge quickly, efficiently, and cost-effectively. Each individual data warehouse continues to operate independently, serving the users and purposes for which it was designed. Simultaneously, the warehouses can be federated through views that support new uses and provide an enterprise-wide perspective. Data virtualization is the means to achieve federation. Recall the earlier quote from Rick van der Lans (page 1-10): federation always results in virtualization.
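A minimal sketch of warehouse federation, again using SQLite as a local stand-in: two attached in-memory databases play the two independent warehouses, each with its own schema, and a TEMP view unions them into one enterprise-wide view, tagging each row with its origin. The schemas and figures are hypothetical; real federation typically spans separate servers and often different DBMS products.

```python
import sqlite3

# Session of the virtualization layer; two attached in-memory databases stand in
# for two independently run warehouses (hypothetical schemas and data).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS dw_east")
conn.execute("ATTACH DATABASE ':memory:' AS dw_west")

# Each warehouse keeps its own structure and naming.
conn.execute("CREATE TABLE dw_east.revenue (region TEXT, amount REAL)")
conn.execute("CREATE TABLE dw_west.sales (territory TEXT, total_usd REAL)")
conn.executemany("INSERT INTO dw_east.revenue VALUES (?, ?)", [("NY", 100.0), ("MA", 50.0)])
conn.executemany("INSERT INTO dw_west.sales VALUES (?, ?)", [("CA", 200.0)])

# Federated view: aligns the two schemas and records which warehouse each row came from.
conn.execute("""
    CREATE TEMP VIEW enterprise_revenue AS
    SELECT 'east' AS source_dw, region, amount FROM dw_east.revenue
    UNION ALL
    SELECT 'west', territory, total_usd FROM dw_west.sales
""")

print(conn.execute(
    "SELECT source_dw, SUM(amount) FROM enterprise_revenue "
    "GROUP BY source_dw ORDER BY source_dw"
).fetchall())  # [('east', 150.0), ('west', 200.0)]
```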

[Figure: Hub and Virtual Spoke]

Data Warehousing Use Cases
Hub and Virtual Spoke

SUSTAINABLE AND SCALABLE DATA MARTS
Hub and spoke data warehouse architecture uses a central data warehouse (the hub) as the point of data integration, then produces multiple data marts (the spokes) to support the information needs of various workgroups.

CHALLENGES
Though the strength of integration is high, the workloads for development and for operation are also high because each data mart has unique ETL processing. As demand for information accelerates, the demand for new data marts grows. Data mart proliferation brings redundancies and inconsistencies that weaken integration and degrade data quality. Parallel growth of workload and loss of quality are clearly not a sustainable approach to data warehousing.

OPPORTUNITY ENABLED BY VIRTUALIZATION
Data virtualization offers the opportunity to create virtual data marts. These virtual spokes can be deployed quickly, without the added workload of additional ETL processing, and with substantially reduced risk of data quality issues. Many new data mart requirements, and many changes to existing data marts, can be met without increased development, processing, and administrative workload. When changes to an existing data mart are implemented by virtualizing the mart, overall workload may actually be reduced.
Virtualization enables the concept of disposable data marts: new data marts can be created easily, without the need to build new physical data stores. This is a particularly powerful technique in highly volatile business and systems environments.
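Virtual spokes can be sketched as views over the hub. In this illustration (SQLite, with hypothetical table, view, and helper names), creating a mart is a single CREATE VIEW, new hub data appears in the mart immediately without any mart-specific ETL, and disposing of the mart drops only its definition. The helpers build SQL from strings, so they assume trusted, developer-supplied names and filters.

```python
import sqlite3

# The hub: a central warehouse table (hypothetical schema and data).
hub = sqlite3.connect(":memory:")
hub.executescript("""
CREATE TABLE order_fact (order_id INTEGER, dept TEXT, region TEXT, amount REAL);
INSERT INTO order_fact VALUES
  (1, 'marketing', 'EU', 120.0),
  (2, 'finance',   'EU',  80.0),
  (3, 'marketing', 'US', 200.0);
""")

def create_virtual_mart(conn, name, where_clause):
    """A virtual spoke: a view over the hub, with no copy and no mart-specific ETL."""
    # name and where_clause are trusted, developer-supplied strings in this sketch.
    conn.execute(f"CREATE VIEW {name} AS SELECT * FROM order_fact WHERE {where_clause}")

def drop_virtual_mart(conn, name):
    """Dispose of a mart: only the view definition goes away, never hub data."""
    conn.execute(f"DROP VIEW IF EXISTS {name}")

create_virtual_mart(hub, "mart_marketing", "dept = 'marketing'")
print(hub.execute("SELECT SUM(amount) FROM mart_marketing").fetchone())  # (320.0,)

# New hub data reaches the mart immediately; there is no mart load to run.
hub.execute("INSERT INTO order_fact VALUES (4, 'marketing', 'US', 30.0)")
print(hub.execute("SELECT SUM(amount) FROM mart_marketing").fetchone())  # (350.0,)

drop_virtual_mart(hub, "mart_marketing")
print(hub.execute("SELECT COUNT(*) FROM order_fact").fetchone())  # (4,)
```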

[Figure: Complement ETL]

Data Warehousing Use Cases
Complement ETL

RESOLVING ETL INCOMPATIBILITY
Most older data warehouses acquire, process, and load data through ETL processing. Their data sources are typically structured data that is readily suited to relational database management systems.

CHALLENGES
As technology has grown and evolved, many new data sources have emerged that are not well suited to traditional ETL processing. Efforts to add new data sources to existing ETL processes are challenged when the ETL technology lacks interfaces and access methods for the data that is needed. Common examples of data source and ETL incompatibility include ERP-embedded databases and web services data. Modifying existing ETL processes to access these sources is complex and likely to introduce bugs and performance problems into previously stable processes.

OPPORTUNITY ENABLED BY VIRTUALIZATION
Data virtualization is an effective way to remove or reduce incompatibilities between data sources and ETL. Using a virtualization tool, you can pre-process problem data sources, creating views that are readily accessible by your ETL technology. Changes to existing ETL processes, and the risks inherent in those changes, are substantially reduced. The complexities and incompatibilities are managed externally while the integrity of the ETL process is maintained.
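One way to picture the pre-processing is a "view" that presents a nested web-service payload as flat rows an existing ETL job can consume unchanged. This Python sketch is an illustration under assumptions: the payload, field names, COLUMNS, and order_lines_view are hypothetical, and a virtualization tool would typically publish such a view through SQL interfaces such as ODBC or JDBC rather than a Python generator.

```python
import json

# Hypothetical payload from a web service the ETL tool cannot read directly.
PAYLOAD = json.dumps({
    "orders": [
        {"id": 7, "customer": {"id": "C1", "name": "Acme"},
         "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
        {"id": 8, "customer": {"id": "C2", "name": "Globex"},
         "lines": [{"sku": "A", "qty": 5}]},
    ]
})

COLUMNS = ("order_id", "customer_id", "customer_name", "sku", "qty")

def order_lines_view(payload):
    """Present nested JSON as flat, relational-style rows (one per order line)."""
    for order in json.loads(payload)["orders"]:
        cust = order["customer"]
        for line in order["lines"]:
            yield (order["id"], cust["id"], cust["name"], line["sku"], line["qty"])

# The unchanged ETL process now just iterates over tabular rows.
rows = list(order_lines_view(PAYLOAD))
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

Because the generator re-reads the payload each time it is iterated, the flattening logic lives outside the ETL job, which is the separation the paragraph above describes.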

[Figure: Data Warehouse Prototyping]

Data Warehousing Use Cases
Data Warehouse Prototyping

REQUIREMENTS DISCOVERY
Collecting data warehouse requirements is an iterative process that takes refinement. Requirements analysis for warehousing is a continuous process of discovery about business information needs and about hidden characteristics of source data. The warehouse requirements analyst quickly discovers that:
- Users can't always tell you what they need.
- Data models and documentation are rarely complete, current, and accurate.
Recent interest in agile methods for data warehouse development raises the stakes for warehouse prototyping.

CHALLENGES
The key to effective prototyping is speed: fast response to discovery and change. Rapid prototyping is good prototyping, but physical warehouse development has many barriers to fast change. When you prototype a data warehouse, each cycle of discovery brings changes to the database schema, to data transformation logic, and potentially to the choice of data sources. These are programming changes: labor-intensive, and the antithesis of rapid.

OPPORTUNITY ENABLED BY VIRTUALIZATION
Data virtualization can substantially reduce the challenges of change when prototyping a data warehouse. Without a physical database schema, and with much of the integration and transformation logic embedded in canonical models, prototyping cycles are executed much faster. New requirements and new data sources can be quickly integrated into existing virtual structures. Ultimately, when discovery diminishes and requirements stabilize, you can migrate from a virtual to a physical data warehouse for runtime efficiency and historical data retention.
The prototyping opportunity, of course, applies to extending existing data warehouses as well as to building new ones. Prototyping can be performed at any level, from the EDW to data marts.
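The fast prototyping cycle, and the eventual move from virtual to physical, can be sketched in SQLite. The table names, the define_prototype helper, and the requirement changes are hypothetical. Each cycle only redefines a view, leaving stored data alone; once requirements stabilize, CREATE TABLE ... AS SELECT materializes the last virtual design as a physical table.

```python
import sqlite3

# Hypothetical source table feeding the prototype.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (id INTEGER, name TEXT, segment TEXT, signup_year INTEGER);
INSERT INTO src_customer VALUES (1, 'Acme', 'SMB', 2011), (2, 'Globex', 'ENT', 2012);
""")

def define_prototype(conn, select_sql):
    """Swap the prototype's definition: a view change, not a schema migration."""
    # select_sql is a trusted, developer-supplied string in this sketch.
    conn.execute("DROP VIEW IF EXISTS dim_customer")
    conn.execute(f"CREATE VIEW dim_customer AS {select_sql}")

# Cycle 1: users ask for names only.
define_prototype(conn, "SELECT id, name FROM src_customer")
# Cycle 2: discovery adds segment; no tables are altered and no data is reloaded.
define_prototype(conn, "SELECT id, name, segment FROM src_customer")
print(conn.execute("SELECT * FROM dim_customer ORDER BY id").fetchall())

# Requirements have stabilized: materialize the virtual design as a physical table.
conn.execute("CREATE TABLE dim_customer_physical AS SELECT * FROM dim_customer")
print(conn.execute("SELECT COUNT(*) FROM dim_customer_physical").fetchone())
```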

[Figure: Data Warehouse Migration]

Data Warehousing Use Cases
Data Warehouse Migration

MOVING THE DATA WAREHOUSE
Sometimes you simply need to move a data warehouse from one platform to another. Cost savings, performance gains, increased accessibility, enhanced mobility, and other motivators drive warehouse migrations. The migration may be from one DBMS to another, from database server to appliance, from row-based to columnar storage, from locally hosted to cloud, and many other variations.

CHALLENGES
The big challenge in moving a data warehouse is the demand for continued operation and availability. Migration is not an event but a process, one that involves rebuilding both databases and reporting processes. It occurs over a period of days and weeks, perhaps even months. Yet business information needs and reporting requirements continue on a day-to-day basis. It is not practical to shut down to migrate.

OPPORTUNITIES ENABLED BY VIRTUALIZATION
Data virtualization remediates the challenges of warehouse migration through a virtual reporting layer that decouples reporting from the physical data structure:
- First, build and test the reporting layer mapped to the original data warehouse.
- Next, build the data warehouse on the new platform.
- Finally, map the virtual reporting layer to the new data warehouse.
The result is a step-by-step process with a smooth transition that insulates warehouse users from the impacts of migration.
The physical data warehouse moves from one platform to another, but the virtual reporting layer is more than a temporary solution to a migration problem. Keeping the virtual layer in place increases the flexibility and agility with which new reporting requirements can be met.
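The three migration steps can be sketched with SQLite, where two attached in-memory databases stand in for the old and new platforms. The schemas, the rpt_sales view, and the map_reporting_layer helper are hypothetical. The report query is written once against the virtual layer; remapping the view is the only change made when the warehouse moves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS old_dw")  # stand-in: current platform
conn.execute("ATTACH DATABASE ':memory:' AS new_dw")  # stand-in: target platform

# Given: the original warehouse, with its own physical naming.
conn.execute("CREATE TABLE old_dw.SALES_T (CUST TEXT, AMT REAL)")
conn.executemany("INSERT INTO old_dw.SALES_T VALUES (?, ?)", [("Acme", 10.0), ("Globex", 5.0)])

def map_reporting_layer(select_sql):
    """(Re)point the virtual reporting layer; report queries never change."""
    conn.execute("DROP VIEW IF EXISTS temp.rpt_sales")
    conn.execute(f"CREATE TEMP VIEW rpt_sales AS {select_sql}")

REPORT = "SELECT customer, amount FROM rpt_sales ORDER BY customer"

# Step 1: build and test the reporting layer against the original warehouse.
map_reporting_layer("SELECT CUST AS customer, AMT AS amount FROM old_dw.SALES_T")
before = conn.execute(REPORT).fetchall()

# Step 2: build the warehouse on the new platform (a different physical design).
conn.execute("CREATE TABLE new_dw.sales (customer TEXT, amount REAL)")
conn.execute("INSERT INTO new_dw.sales SELECT CUST, AMT FROM old_dw.SALES_T")

# Step 3: remap the virtual layer to the new warehouse.
map_reporting_layer("SELECT customer, amount FROM new_dw.sales")
after = conn.execute(REPORT).fetchall()

print(before == after, after)  # True [('Acme', 10.0), ('Globex', 5.0)]
```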

[Figure: Federated Views]

Data Federation Use Cases
Federated Views

VIEWS ACROSS DATABASES
Database views are a powerful tool when working with relational data. They can be used to hide data structure complexities, to apply business-friendly naming, to join data from multiple tables, and to get maximum advantage from query optimizers.

CHALLENGES
A SQL view, however, is generally confined to the boundaries of a single database. As the volume and variety of data increase, every organization experiences a corresponding increase in the number of databases that it manages simultaneously: some data in packaged applications, some in ERP systems, some in legacy databases, some in warehouses, some in the cloud, and so on. The segmentation of databases is driven by technology and by the history and evolution of applications. But information needs often cut horizontally across vertically segmented databases. Database views could readily satisfy many of these information needs if only they worked across multiple databases.

VIRTUALIZATION AND VIEWS
Data virtualization eliminates the single-database boundary constraint of views by enabling federated views. Virtualization can combine data from multiple relational databases, Excel spreadsheets, XML, and other formats into a single view that is readily consumed by applications that are unaware of the multi-database data sourcing. The advantages of views now extend across multiple databases.
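A federated view across two different formats can be sketched in Python with only the standard library: rows from a SQLite table (standing in for a relational source) are joined with an XML document at query time. The schema, XML layout, and customer_credit_view function are hypothetical, and a virtualization platform would expose the result as a SQL view rather than a Python generator. The point is that both sources are read when the view is queried and nothing is copied into a new store.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 1: a relational database (stand-in for an ERP or application database).
erp = sqlite3.connect(":memory:")
erp.executescript("""
CREATE TABLE customer (id TEXT, name TEXT);
INSERT INTO customer VALUES ('C1', 'Acme'), ('C2', 'Globex');
""")

# Source 2: an XML document (stand-in for a file or feed from another system).
CREDIT_XML = """
<limits>
  <limit customer="C1" amount="5000"/>
  <limit customer="C2" amount="1200"/>
</limits>
"""

def customer_credit_view():
    """Federated view: joins both sources at query time; nothing is copied."""
    limits = {el.get("customer"): float(el.get("amount"))
              for el in ET.fromstring(CREDIT_XML.strip()).iter("limit")}
    for cid, name in erp.execute("SELECT id, name FROM customer ORDER BY id"):
        yield {"customer_id": cid, "name": name, "credit_limit": limits.get(cid)}

for row in customer_credit_view():
    print(row)
```

The consuming loop at the bottom has no knowledge that two sources and two formats are involved, which is the property the paragraph above attributes to federated views.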

Data Federation Use Cases: Data Services

SOA-BASED INTEGRATION
Sometimes views aren't enough. The new world of service-oriented architectures and systems needs new approaches to data integration. Wrapping data sources with a services layer is a common way to get started with service-oriented integration. But wrappers are labor-intensive to build and require maintenance with every database change.

CHALLENGES
Making integrated data available to service-oriented architecture (SOA)-based systems is uniquely challenging. Most data integration systems are predicated on relational technologies and optimized for SQL access. But SQL views don't do the job for web services applications, service-oriented master data management, and similar systems. When SOAP, REST, WSDL, JMS, or similar protocols and interfaces are the right fit, SQL isn't a satisfactory substitute.

VIRTUALIZATION AND SERVICES
Some data virtualization tools include data services capabilities that make it practical to combine all of the core data integration functions (multiple sources, abstraction, transformation, etc.) with popular SOA-based data delivery formats.
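
At its simplest, a data service puts an integrated view behind a service interface. The sketch below, a minimal illustration only, exposes a view as a REST-style JSON resource using Python's standard-library WSGI conventions. The `/customers` path, the `customer_view` function, and the data are hypothetical; commercial virtualization tools typically generate such services from view definitions instead of hand-written code.

```python
import json
from wsgiref.util import setup_testing_defaults


def customer_view():
    """Stand-in for a virtual view that integrates several sources."""
    return [{"id": 1, "name": "Acme", "order_total": 100.0},
            {"id": 2, "name": "Globex", "order_total": 15.0}]


def data_service(environ, start_response):
    """Minimal WSGI app: GET /customers returns the view as JSON."""
    if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/customers":
        body = json.dumps(customer_view()).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Exercise the service in-process, without starting a server.
environ = {"PATH_INFO": "/customers"}
setup_testing_defaults(environ)  # fills in REQUEST_METHOD=GET and other defaults
captured = {}
response = data_service(environ, lambda status, headers: captured.update(status=status))
print(captured["status"], json.loads(b"".join(response)))
```

For a local experiment over HTTP, the same function can be served with `wsgiref.simple_server.make_server("", 8000, data_service).serve_forever()`.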

Data Federation Use Cases: Data Mashups

NEW WAYS TO PRESENT DATA
The most common form of mashup is a web application that combines existing components from many sources for visual presentation in a new context. A data mashup combines and aggregates data from a variety of different sources. Data mashups are an effective way to meet new needs for information when the data already exists and the effort required is to present that data in new combinations and new visual formats.

CHALLENGES
Web mashups are enabled by the published APIs of web applications, which make their data and functions easy to access and integrate. This is the key to fast mashup development: assembling from existing components instead of building from scratch. The challenge of data mashups is that most corporate data sources do not have readily accessible APIs to support the mashup process.

VIRTUALIZATION AND MASHUP
The data virtualization tools that enable SOA approaches also enable data mashups. The same protocols, services, and data delivery formats used for virtualized services fill the role of data APIs for quick and easy access to data.
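
Once sources are exposed as data APIs, a mashup is mostly assembly. The sketch below assumes two JSON payloads, one from a hypothetical virtualized customer service and one from a hypothetical store-location API, and combines them into a new presentation (customers grouped by city) without building a new data store. The payloads are hard-coded strings so the example runs offline.

```python
import json
from collections import defaultdict

# Payloads as they might arrive from two data APIs (hypothetical data).
customers_json = (
    '[{"id": 1, "name": "Acme", "store_id": "S1"},'
    ' {"id": 2, "name": "Globex", "store_id": "S2"},'
    ' {"id": 3, "name": "Initech", "store_id": "S1"}]'
)
stores_json = '{"S1": {"city": "Munich"}, "S2": {"city": "Berlin"}}'


def customers_by_city(customers_payload, stores_payload):
    """Mash up two payloads: group customer names by the city of their store."""
    customers = json.loads(customers_payload)
    stores = json.loads(stores_payload)
    grouped = defaultdict(list)
    for customer in customers:
        city = stores[customer["store_id"]]["city"]
        grouped[city].append(customer["name"])
    return dict(sorted(grouped.items()))


# A new presentation of existing data: one line per city.
for city, names in customers_by_city(customers_json, stores_json).items():
    print(f"{city}: {', '.join(names)}")
```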

Data Federation Use Cases: Caches

FREQUENTLY ACCESSED DATA
While data virtualization is an effective technique for many data integration needs, it may have a visible impact on the performance of source systems and databases. One advantage of physical integration in a warehouse is that the source system is isolated from access by analysis and reporting applications.

CHALLENGES
A virtual data integration system increases the number of ways that operational data is accessible and useful. When the potential becomes reality (a shift from accessible and useful to accessed and used), the operational databases may experience performance problems. If query optimization isn't enough, you may need to duplicate the data, creating a copy of frequently accessed data as a way to isolate the source database.

VIRTUALIZATION AND DUPLICATION
When you need to create a copy of frequently accessed data, two options are possible: replication and caching. A virtualization tool with cache capability does the job with lower overhead and greater flexibility than full database replication. Database replication simply creates copies of tables; integration follows replication. Caching can store copies of virtual views and services; integration is retained in the copy. A database replica is static until updates are pushed to the copy. Caches can be refreshed automatically and periodically to synchronize with the source.
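
The idea of caching an integrated view, rather than replicating raw tables, can be sketched as a view-level cache with a time-to-live (TTL) refresh. This is a simplified illustration, assuming a hypothetical `CachedView` wrapper and an injectable clock so the refresh can be demonstrated; real tools add invalidation rules, scheduled refresh, and persistent cache storage.

```python
import time


class CachedView:
    """Hypothetical view-level cache: stores the *integrated* result of a
    virtual view and refreshes it after a time-to-live (TTL) expires."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute    # function that queries and integrates sources
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = None
        self._loaded_at = None
        self.source_hits = 0       # how many times the sources were queried

    def rows(self):
        now = self._clock()
        if self._rows is None or now - self._loaded_at >= self._ttl:
            self._rows = self._compute()
            self._loaded_at = now
            self.source_hits += 1
        return self._rows


# Simulated operational source and a controllable clock.
source = {"open_orders": 3}
fake_now = [0.0]
view = CachedView(lambda: dict(source), ttl_seconds=60, clock=lambda: fake_now[0])

view.rows()
view.rows()                           # served from cache; source not queried
source["open_orders"] = 4             # the source changes...
stale = view.rows()["open_orders"]    # ...but the cache is still within its TTL
fake_now[0] = 61.0
fresh = view.rows()["open_orders"]    # TTL expired: refreshed from the source
print(stale, fresh, view.source_hits)
```

The TTL is the trade-off knob: a longer TTL shields the source more but serves staler data.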

Data Federation Use Cases: Virtual Data Marts

FEDERATION AND DATA MARTS
We previously discussed the hub-and-virtual-spoke approach to virtual data marts. In that use case, spokes are implemented as virtual data marts. A similar virtual data mart approach can be applied when the warehouse architecture is something other than hub-and-spoke. Some data marts may be needed urgently and have short life spans. Other marts may be less urgent and have longer life spans. The underlying reason for this range of urgency and life spans is found in the nature of the analytics and measurements being supported. Some measurements support short-term business needs, while others support longer-term strategic management. Whether short- or long-term, urgent or less urgent, every data mart requirement adds to the collection of data marts to be built, managed, and supported.

CHALLENGES
Data mart proliferation is the core of the problem here. Demand for data marts often outpaces the capacity of a data warehousing group to deliver them. Balancing speed of delivery with sustainability is a key challenge too. Sometimes the need is urgent but the expected lifespan of a data mart is relatively short. Others have less urgency but a long life expectancy. And some are both urgent and long term. Balancing requirements, meeting expectations, and keeping pace with demand: those are the challenges of data mart developers.

VIRTUAL DATA MARTS
Virtual data marts help developers meet these challenges. Virtualization doesn't demand the schema and ETL development effort of physical consolidation, so development is accelerated. Virtual data marts make sense when the need is urgent and when the expected lifespan is short. And they fit gracefully into hub-and-spoke, bus, and hybrid data warehousing architectures.
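
In its simplest form, a virtual data mart is a set of views over warehouse tables instead of a separately loaded schema. The sketch below uses SQLite, with hypothetical table and view names, to define a small sales mart as a view; because no mart tables or ETL jobs are built, the mart can be created, changed, or retired quickly.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, sale_month TEXT, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO fact_sales VALUES
        (1, '2015-05', 500.0), (2, '2015-05', 80.0), (1, '2015-06', 300.0);

    -- The "data mart": a view over warehouse tables, not a loaded copy.
    CREATE VIEW mart_monthly_category_sales AS
        SELECT p.category, s.sale_month, SUM(s.amount) AS total_amount
        FROM fact_sales AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        GROUP BY p.category, s.sale_month;
""")

rows = dw.execute(
    "SELECT * FROM mart_monthly_category_sales ORDER BY category, sale_month"
).fetchall()
print(rows)

# Retiring a short-lived mart is just dropping the view.
dw.execute("DROP VIEW mart_monthly_category_sales")
```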

Data Federation Use Cases: Virtual Operational Data Store (ODS)

DESCRIPTION
The operational data store (ODS) is sometimes described as an expensive solution to legacy problems. The implication is that many legacy systems and databases are silos that inhibit transactional integration, enterprise reporting, and cross-functional operational analysis. The ODS integrates operational data to meet these needs, but with high development, operations, and maintenance costs.

CHALLENGES
One of the real challenges with a physical ODS is the expectation of real-time or near real-time data. High latency and transactional integration don't work well together, and neither do batch ETL and real-time data. Another challenge is volatile source systems and databases, where every change has a ripple effect into ETL processing and the ODS database schema.

VIRTUALIZING OPERATIONAL DATA
Using an approach similar to virtual data marts, ODS challenges may be minimized with virtualization. Some components (and sometimes many components) of the ODS can be virtualized to reduce data latency, to minimize the impact of source changes, or both. An ODS in which some tables are realized as physical schema and others as virtual views is a highly practical way to balance the competing challenges of operational data integration.
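
The latency trade-off behind a hybrid ODS can be demonstrated with SQLite: one ODS component is materialized by a batch load, while another is defined as a view over the source. Table and view names are hypothetical, and a single database stands in for what would normally be separate source and ODS systems, to keep the sketch short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Operational source tables (normally in separate systems).
    CREATE TABLE src_customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE src_order (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO src_customer VALUES (1, 'Acme');
    INSERT INTO src_order VALUES (10, 1, 'OPEN');

    -- Physical ODS component: slowly changing data, loaded by batch ETL.
    CREATE TABLE ods_customer AS SELECT id, name FROM src_customer;

    -- Virtual ODS component: volatile data, read live from the source.
    CREATE VIEW ods_order AS SELECT id, customer_id, status FROM src_order;
""")

# Changes occur in the operational systems...
db.execute("UPDATE src_customer SET name = 'Acme Corp' WHERE id = 1")
db.execute("UPDATE src_order SET status = 'SHIPPED' WHERE id = 10")

# ...and are visible immediately through the view, but not in the physical
# table until the next ETL run reloads it.
print(db.execute("SELECT status FROM ods_order WHERE id = 10").fetchone())
print(db.execute("SELECT name FROM ods_customer WHERE id = 1").fetchone())
```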

MDM and EIM Use Cases: Master Data Hub Extension

A 360° VIEW OF MASTER ENTITIES
Master data is the reference data that is shared across many business functions: data about customers, products, employees, accounts, etc. The Master Data Management (MDM) vision is often described as providing a 360° view of these entities, a consistent and complete view from all perspectives. The view includes past, present, and future information about identity, relationships, activity, value, and expectations.

MASTER DATA HUB
A common architecture for MDM is integration of master data into a shared database called a hub. This is similar to the hub of a hub-and-spoke data warehouse architecture: a single point of integration for the data in scope.

CHALLENGES
Choosing which data elements to include in a hub is always difficult. Too much data makes synchronization and consolidation exceptionally difficult. Too little data has very limited impact. A master data hub typically contains current identity and shared descriptive information about master data entities. The core function of MDM is, in fact, identity management. The hub may also contain some relationship information, again limited to the current state. A customer, for example, may be represented in the hub with customer number, customer name, mailing address, email address, and relationships with customer loyalty programs.

Compare this example with the description of a 360° view above. The hub contains current identity and some relationships. It lacks past and future information, and it is missing data about activity, value, and expectations. It is impractical, however, to include transaction detail (activity and value), transaction history (past), lifetime value calculations (value and future), retention forecasts (future), and other details in the hub. The vision of a 360° view is not possible with an MDM hub alone.

FEDERATING MASTER DATA
Virtualization makes the 360° view possible by federating master hub data with detailed data from various data sources. The hub serves consolidation and integration needs for current identity and relationship data. Virtualized views extend the hub to complete the 360° view with transaction detail, past and future perspectives, etc.
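
The hub-plus-federation pattern can be sketched as a function that starts from the hub's current identity record and federates in detail held elsewhere. Everything here is hypothetical: `hub` stands in for the MDM hub, `transactions_db` for a transactional system, and `retention_scores` for an analytic forecast source. `total_spend` is a simple sum of past transactions, not a true lifetime value calculation.

```python
import sqlite3

# MDM hub: current identity and relationships only (hypothetical record).
hub = {
    "C001": {"name": "Jane Doe", "mailing_address": "1 Main St",
             "loyalty_programs": ["Gold"]},
}

# Transactional system: activity and history (the past).
transactions_db = sqlite3.connect(":memory:")
transactions_db.execute(
    "CREATE TABLE txn (customer_id TEXT, txn_date TEXT, amount REAL)")
transactions_db.executemany("INSERT INTO txn VALUES (?, ?, ?)", [
    ("C001", "2015-01-10", 120.0),
    ("C001", "2015-03-02", 80.0),
])

# Analytic source: forecasts (the future).
retention_scores = {"C001": 0.92}


def customer_360(customer_id):
    """Extend the hub record with detail federated from other sources."""
    view = dict(hub[customer_id])  # copy, so the hub itself is not modified
    view["transactions"] = transactions_db.execute(
        "SELECT txn_date, amount FROM txn WHERE customer_id = ? ORDER BY txn_date",
        (customer_id,),
    ).fetchall()
    view["total_spend"] = sum(amount for _, amount in view["transactions"])
    view["retention_forecast"] = retention_scores.get(customer_id)
    return view


print(customer_360("C001"))
```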

MDM and EIM Use Cases: Master Data Services

MDM SERVICES
Advanced MDM architectures go beyond simple consolidation in a shared master data hub. They include a services layer as the interface with applications that create and consume master data. The services layer may sit between applications and a repository (hub), or it may work directly with system-of-record sources of master data. In both models data integration is bi-directional. Applications that create and modify master data do more than read; they both read and write.

CHALLENGES
The challenges of MDM services are similar to those described earlier for SOA-based applications and data mashups. Views alone don't do the job. Wrappers are costly to build and to maintain. SQL access isn't adequate to build APIs for bi-directional integration of master data, nor does it support the access needs of more recent SOA-based applications where SOAP, REST, WSDL, JMS, and similar protocols and interfaces are the right fit.

SOA-BASED MDM
The data virtualization tools that enable SOA approaches can readily be applied to master data services. The same protocols, services, and data delivery formats used for virtualized services fill the role of master data APIs. In the repository-plus-hub model, virtualization may provide both the services layer and the hub extension previously described. In the full SOA-based model, all of the data (past, present, future, activity, value, etc.) is easily integrated whenever a system of record is designated.
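
Bi-directional integration is the key difference from read-only views. The sketch below shows a hypothetical master data service that routes both reads and writes for each attribute to its designated system of record. The class name, attribute names, and the dictionaries standing in for source systems are assumptions for illustration, not a specific product's API.

```python
class MasterDataService:
    """Hypothetical service layer: each master data attribute has a
    designated system of record, and both reads and writes go there."""

    def __init__(self, systems, system_of_record):
        self._systems = systems          # system name -> {key: {attr: value}}
        self._sor = system_of_record     # attribute -> system name

    def get(self, key):
        """Assemble a master record by reading each attribute from its system of record."""
        return {attr: self._systems[system].get(key, {}).get(attr)
                for attr, system in self._sor.items()}

    def update(self, key, **changes):
        """Write each changed attribute back to its system of record."""
        for attr, value in changes.items():
            system = self._sor[attr]     # raises KeyError for unknown attributes
            self._systems[system].setdefault(key, {})[attr] = value


systems = {
    "crm": {"C001": {"name": "Jane Doe", "email": "jane@example.com"}},
    "billing": {"C001": {"billing_address": "1 Main St"}},
}
service = MasterDataService(systems, {
    "name": "crm", "email": "crm", "billing_address": "billing"})

service.update("C001", billing_address="9 Elm St")   # written to billing only
print(service.get("C001"))
print(systems["billing"]["C001"])
```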

MDM and EIM Use Cases: Virtual Data Layer

DATA ABSTRACTION
As the volume and variety of data continue to grow, it becomes increasingly complex to understand, use, and reuse data. Much of our transactional data uses trailing-edge technologies: proprietary formats from ERP systems, legacy mainframe file formats, non-integrated relational databases, spreadsheets, and more. Analytic data is often found in more state-of-practice technologies such as columnar databases. And the leading edge includes cloud-based data.

CHALLENGES
As the gap between the leading edge and the trailing edge continues to widen, it is increasingly difficult to find, understand, use, and reuse data. Metadata is typically inadequate, and most of the data is defined and structured in a very specific application and technology context. Business context is much needed but elusive.

ENTERPRISE VIEW AND BUSINESS CONTEXT
Data virtualization can be used to create a virtual data layer consisting of abstract views that use business language and position data in business context. Business syntax makes it easier to find and understand the data. Views and virtualization decouple data sources from consuming applications, which enhances the ability to use and reuse the data.
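
Putting data in business context often starts with a view that renames and decodes cryptic source structures. The sketch below uses SQLite with a hypothetical legacy-style table (`CUSTMST`) and exposes it through a business-named view; it assumes, for illustration, that the source stores the credit limit in cents. A real virtual data layer applies the same idea across many sources and formats.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Source structure as defined by a legacy application.
    CREATE TABLE CUSTMST (CUST_NO TEXT, CUST_NM TEXT, STS_CD TEXT, CR_LMT_AMT INTEGER);
    INSERT INTO CUSTMST VALUES ('00017', 'ACME CORP', 'A', 500000);
    INSERT INTO CUSTMST VALUES ('00018', 'GLOBEX', 'I', 0);

    -- Business-context view: business names, decoded codes, consistent units.
    CREATE VIEW customer AS
        SELECT CAST(CUST_NO AS INTEGER) AS customer_number,
               CUST_NM                  AS customer_name,
               CASE STS_CD WHEN 'A' THEN 'Active'
                           WHEN 'I' THEN 'Inactive'
                           ELSE 'Unknown' END AS customer_status,
               CR_LMT_AMT / 100.0       AS credit_limit
        FROM CUSTMST;
""")

for row in db.execute("SELECT * FROM customer WHERE customer_status = 'Active'"):
    print(row)
```

Consumers query `customer` with business terms; if the legacy table changes, only the view definition needs to change.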

MDM and EIM Use Cases: Enterprise Data Services

COMBINING DATA ABSTRACTION AND DATA SERVICES
Data abstraction and business syntax help to achieve an enterprise view of data. But data is only part of the enterprise challenge. Getting the data right depends on getting business rules right. And the rules are varied, interrelated, and often compound. Packaging business rules as services is an effective way to build reusable rules and to achieve consistency across business processes.

CHALLENGES
Enterprise data management sometimes seems like too many balls in the air at one time. Data, information systems, business systems, and business rules all undergo continuous change. As the rate of change accelerates, the need for business agility becomes even more pressing.

BUSINESS AGILITY
Data virtualization brings two capabilities that are essential to achieve agility in data-dependent businesses. Data abstraction (the virtual data layer) decouples data consumption from data collection and storage. Consumers of data can find the data that they need and understand the data that they find. Data services add function to the virtual data layer, abstracting business rules and decoupling them from specific business processes.

Consider two common business activities as examples: (1) transfer an employee, and (2) reward a customer. In the first example, the virtual data layer contains the knowledge about finding employee and job data. The data services layer contains the rules knowledge needed to apply the transfer activity to employees and jobs. In the second example, the virtual data layer contains the knowledge about finding customer and account data. The data services layer contains the rules knowledge needed to apply the reward activity to customers and their accounts.

Business agility is supported by fast and relatively easy response to change. Data changes are made by changing the corresponding mappings in the virtual data layer. Rules changes are made by adjusting the corresponding services in the data services layer.
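
The "reward a customer" example can be sketched to show the separation of concerns: the data layer knows where customer and account data live, and the rules service knows how a reward is applied. Everything here (function names, the tier-based reward rates, the in-memory sources) is a hypothetical illustration, not a prescribed design.

```python
# --- Virtual data layer: knows *where* data lives (hypothetical sources) ---
_crm = {"C001": {"name": "Jane Doe", "tier": "Gold"}}
_accounts = {"C001": {"points": 1_000, "annual_spend": 25_000.0}}


def find_customer(customer_id):
    return _crm[customer_id]


def find_account(customer_id):
    return _accounts[customer_id]


# --- Data services layer: knows *how* the business rule works ---
REWARD_RATES = {"Gold": 0.02, "Standard": 0.01}   # hypothetical rule parameters


def reward_customer(customer_id):
    """Apply the reward rule; return the number of points granted."""
    customer = find_customer(customer_id)
    account = find_account(customer_id)
    rate = REWARD_RATES.get(customer["tier"], REWARD_RATES["Standard"])
    points = int(account["annual_spend"] * rate)
    account["points"] += points
    return points


print(reward_customer("C001"), find_account("C001")["points"])
```

If the reward rule changes, only `REWARD_RATES` or `reward_customer` changes; if customer data moves to another system, only the data layer functions change. This mirrors the two kinds of change described above.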

More Data Virtualization Applications: Virtualization and Big Data

DESCRIPTION
Big data is a term that has become popular to describe rapid growth in the volume, variety, and velocity of data now available to business: unstructured data, semi-structured data, social media data, location data, radio frequency data, and more. These types of data tend to yield data sets that are too large, complex, or unwieldy to work with using traditional data management and analytics technologies.

TYPES OF BIG DATA
Structured big data can be readily stored in tabular form. It is typically used for analysis and is stored and manipulated using columnar databases and high-speed analytic and data warehouse appliances based on massively parallel processing (MPP) hardware architectures.

Unstructured and semi-structured big data includes formats such as text, social media content, multimedia content, and web logs. The technologies to work with unstructured big data are emerging and evolving. The most frequently discussed technologies today include Hadoop, MapReduce, and NoSQL. Hadoop is a framework for distributed storage (the Hadoop Distributed File System, HDFS) and processing of large data collections that combine structured data with more complex types of data. MapReduce is a programming framework for writing applications that process Hadoop datasets. NoSQL describes a class of database management systems whose common characteristics are that they are not based on a relational model and do not use SQL as their primary query language. NoSQL databases are optimized for distributed storage and retrieval of very large volumes of data, either structured or unstructured. Many NoSQL stores are optimized for append and read workloads; joins are typically not supported, and update and delete semantics vary by product.

SOLUTIONS ENABLED BY DATA VIRTUALIZATION
Data virtualization makes it practical to access a variety of big data sources, to abstract structured big data as views, and to integrate big data with other kinds of enterprise data. The data virtualization layer can generate optimized SQL queries to access Hadoop/MapReduce data. Data stored in columnar databases, NoSQL databases, and MPP appliances can be accessed via views and services.
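
Abstracting semi-structured data as a view typically means projecting nested records onto a fixed set of columns at read time. The sketch below assumes web-log events stored as JSON lines (a common format in Hadoop and NoSQL environments) and exposes them through a generator that acts as a virtual view; the field names and the `web_event_view` function are illustrative assumptions.

```python
import json
from collections import Counter

# Semi-structured web log events, one JSON document per line (hypothetical).
raw_events = """\
{"ts": "2015-06-22T10:00:00", "user": {"id": "C001"}, "page": "/bikes", "tags": ["promo"]}
{"ts": "2015-06-22T10:05:00", "user": {"id": "C002"}, "page": "/helmets"}
{"ts": "2015-06-22T10:07:00", "user": {"id": "C001"}, "page": "/checkout", "tags": []}
"""


def flatten(line):
    """Project one nested JSON event onto fixed columns; missing fields -> None."""
    event = json.loads(line)
    return (
        event.get("ts"),
        event.get("user", {}).get("id"),
        event.get("page"),
        ",".join(event.get("tags", [])) or None,
    )


def web_event_view(source_text):
    """Virtual view: rows are derived on demand from the raw source each time
    the view is read; nothing is copied or stored."""
    for line in source_text.splitlines():
        if line.strip():
            yield flatten(line)


# Consumers work with tabular rows (ts, customer_id, page, tags).
visits = Counter(customer_id for _, customer_id, _, _ in web_event_view(raw_events))
print(sorted(visits.items()))
```

Because the rows carry a `customer_id` column, they can then be joined with structured enterprise data such as a customer view.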

More Data Virtualization Applications: Virtualization and Cloud Data

SOFTWARE AS A SERVICE (SaaS)
Software as a service (SaaS) is a class of cloud-based applications where software and data are hosted in the cloud. The software, sometimes called on-demand software, is accessed using a web browser. Common applications such as accounting, CRM, and sales management are typical SaaS applications. Most SaaS applications are implemented as public cloud services, though some providers do offer private cloud options. Whether public or private, the data is cloud hosted, adding complexity to data access outside the application and to integration with other enterprise data.

PLATFORM AS A SERVICE (PaaS)
It is also possible to deploy custom applications in the cloud using Platform as a Service (PaaS) or Infrastructure as a Service (IaaS) providers. Unlike SaaS applications, custom applications are unique to the business and organization for which they are built, and are typically hosted in a private cloud. As with SaaS, data as well as function is cloud based, adding complexity to data access and integration.

DATA AS A SERVICE (DaaS)
Data as a Service (DaaS) occurs in two forms: using cloud services to host your organization's data, and subscribing to cloud-hosted external data. Internal data is typically hosted with private cloud DaaS services. Many types of database management systems (relational, multidimensional, columnar, and NoSQL) are practical for DaaS hosting. While these implementations are data only, with no application or software hosting, the access and integration challenges are similar to those of SaaS and PaaS.

ACCESSING CLOUD HOSTED DATA
Data virtualization technology supports access to SaaS, PaaS, and DaaS variations of cloud-hosted databases. Cloud data is accessed via the data virtualization layer and is readily accessible to consuming applications. All of the core virtualization functions (integration, abstraction, and optimization) can be applied to cloud data as effectively as to locally hosted data. Cloud-hosted data may also be delivered via data services.

Virtualize or Materialize? Data Consumer Considerations

DISCUSSION
Think about your organization and your BI systems and projects. Where do you fit for each of the four factors shown here? Is the answer the same for all integration needs, or do answers change depending on data subjects, data sources, or data integration projects?

Module 4
Data Virtualization Platforms

Topics
- Platform Requirements
- Platform Capabilities
- Platform Variations

Platform Requirements: Data and Information Services

VIRTUALIZATION PLATFORMS
A data virtualization platform encompasses all of the tools, technology, and practices needed to connect data consumers with disparate data sources using methods where:
- Data is accessed in business context and using business language.
- Data is integrated, and source disparity is minimized or eliminated.
- Data of different types (structured, semi-structured, multi-structured, and unstructured) can be combined in a single view.
- The data does not need to be replicated or redundantly stored.

A data virtualization platform can be described as a collection of requirements and capabilities. It must provide capabilities for delivery, development, and management of data and information services.

DELIVERY: THE TOP-LEVEL REQUIREMENT
The first and most fundamental requirement of a data virtualization platform is to provide data and information services. These services provide the data access methods that data consumers need in order to reach the data. Typical data access methods include SQL access to relational views and web services for both structured and unstructured data.

Platform Requirements: Development Environment

DEVELOPMENT: BUILDING THE SERVICES
To meet the basic requirement of services, you need a development environment in which services are built. The development environment should support discovery, design, modeling, construction, and testing. Ease of use, an intuitive interface, code generation, and source control functions are key features. Speed and ease of development are must-have features if data virtualization is to enable business agility. Specific terminology, interfaces, and steps vary from one platform to another, but all should meet these core requirements.

Platform Requirements: Management Functions

MANAGEMENT: ADMINISTRATION, CONFIGURATION, AND MONITORING
Services development and services delivery are accomplished in a managed environment, so the third aspect of data virtualization platform requirements is management. The management functions encompass all of the capabilities to configure, administer, and monitor the virtualization environment. These functions span installation, performance, security, and run-time operations. Management functions vary among platforms. Common functions include:
- Domain Management
- Group Management
- User Management
- Resource Management
- Security Management
- Server Configuration
- Data Access and Connections Configuration
- Performance Monitoring
- Access Monitoring
- Cache Management
- Event Monitoring and Logging
- Network Load Monitoring
- Server and Storage Monitoring

Platform Capabilities: Access

FULFILLING THE REQUIREMENTS
Capabilities are the things that a data virtualization platform must be able to do in order to meet the platform requirements of service delivery, development, and management. A rich set of capabilities makes a robust data virtualization platform.

GETTING THE DATA
The most basic of data virtualization capabilities is data access, the essential first step that precedes transformation, integration, federation, delivery, etc. The range of data sources in a typical enterprise encompasses structured, semi-structured, and unstructured data. Any data source may be internal or external, locally hosted, cloud hosted, web hosted, etc. A robust data virtualization platform will connect to and extract data efficiently from all of them. The platform should support a broad range of data source formats and access protocols, including those for:
- Relational databases
- Multi-dimensional databases
- Web services
- File systems
- The evolving set of technologies for big data
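
Access across heterogeneous sources usually reduces to connectors that present every source in a common row shape. The sketch below defines a hypothetical connector interface with three implementations (relational, delimited file, and JSON web-service payload). The class names are assumptions, and the "web service" is simulated with a function returning a string so the example runs offline.

```python
import csv
import io
import json
import sqlite3
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical connector contract: every source yields dict rows."""

    @abstractmethod
    def rows(self):
        ...


class RelationalConnector(Connector):
    def __init__(self, connection, sql):
        self._connection, self._sql = connection, sql

    def rows(self):
        cursor = self._connection.execute(self._sql)
        columns = [c[0] for c in cursor.description]
        return [dict(zip(columns, values)) for values in cursor]


class CsvFileConnector(Connector):
    def __init__(self, file_obj):
        self._file_obj = file_obj

    def rows(self):
        return list(csv.DictReader(self._file_obj))


class JsonServiceConnector(Connector):
    def __init__(self, fetch):
        self._fetch = fetch  # callable returning a JSON document (e.g. an HTTP GET)

    def rows(self):
        return json.loads(self._fetch())


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (sku TEXT, name TEXT)")
db.execute("INSERT INTO product VALUES ('P1', 'Bike')")

sources = [
    RelationalConnector(db, "SELECT sku, name FROM product"),
    CsvFileConnector(io.StringIO("sku,name\nP2,Helmet\n")),
    JsonServiceConnector(lambda: '[{"sku": "P3", "name": "Lock"}]'),
]
all_products = [row for source in sources for row in source.rows()]
print(all_products)
```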

Platform Capabilities: Delivery

CONNECTING DATA WITH SERVICES
Delivery encompasses the processes that move data to data and information services. The services are the data access interfaces; delivery is the movement of data. Delivery methods must be standards-based and will ideally encompass all of the access methods that consumers may require. Services should fit the consumer, not the consumer the services: they must support delivery of shared data to different consumers using different methods. The same data might, for example, be delivered to one consumer using XML and SOAP, and to another consumer as a relational view via ODBC.
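
Delivering the same shared data through different methods can be sketched with the standard library: one rendering as XML (the kind of payload a SOAP-style service would carry) and one as a relational view. The element and view names are hypothetical, and an in-memory SQLite database stands in for an ODBC-accessible relational endpoint, so the sketch loads the rows into a backing table purely for demonstration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One shared result set, produced once by the virtualization layer.
shared_rows = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def as_xml(rows):
    """Delivery method 1: an XML document for a service consumer."""
    root = ET.Element("customers")
    for row in rows:
        customer = ET.SubElement(root, "customer", id=str(row["id"]))
        customer.text = row["name"]
    return ET.tostring(root, encoding="unicode")


def as_relational_view(rows):
    """Delivery method 2: a relational view for a SQL consumer
    (SQLite stands in for an ODBC endpoint in this sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE _customer_data (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO _customer_data VALUES (:id, :name)", rows)
    db.execute("CREATE VIEW customer AS SELECT id, name FROM _customer_data")
    return db


print(as_xml(shared_rows))
print(as_relational_view(shared_rows).execute("SELECT * FROM customer").fetchall())
```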

Platform Capabilities: Transformation

TRANSFORMATION FUNCTIONS
Data transformation prepares data for delivery. Integration, aggregation, cleansing, and conformity are all common reasons to transform data. Various virtualization tools handle data transformation in different ways, but the core functions are similar. Those functions include all of the common transformation types found in a data warehousing environment: data selection, filtering, formatting, calculation, summarization, table lookup, etc.
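
The common transformation types listed above compose naturally into a pipeline. The sketch below applies filtering, table lookup, calculation, summarization, selection, and formatting to a few hypothetical order records in plain Python; virtualization tools usually express the same steps declaratively, in SQL or a visual designer, rather than in code like this.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "country": "DE", "qty": 2, "unit_price": 10.0, "status": "OK"},
    {"order_id": 2, "country": "FR", "qty": 1, "unit_price": 99.5, "status": "OK"},
    {"order_id": 3, "country": "DE", "qty": 5, "unit_price": 4.0, "status": "CANCELLED"},
    {"order_id": 4, "country": "DE", "qty": 1, "unit_price": 15.0, "status": "OK"},
]
COUNTRY_NAMES = {"DE": "Germany", "FR": "France"}   # lookup table (hypothetical)


def transform(rows):
    totals = defaultdict(float)
    for row in rows:
        if row["status"] != "OK":                                 # filtering
            continue
        country = COUNTRY_NAMES.get(row["country"], "Unknown")    # table lookup
        revenue = row["qty"] * row["unit_price"]                  # calculation
        totals[country] += revenue                                # summarization
    # Selection and formatting: only the columns the consumer needs, as text.
    return [{"country": c, "revenue": f"{v:,.2f}"} for c, v in sorted(totals.items())]


print(transform(orders))
```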

Platform Capabilities: Abstraction

ABSTRACTION OBJECTIVES
Abstraction is the means by which consumer views of data are decoupled from the physical implementation of data. Data abstraction is a commonly used concept in data modeling, and it aligns with the layered data architecture concepts discussed previously. The table below illustrates the correspondence between data modeling abstraction levels and two of the architecture examples described in Module Two.

Compassion International   Global 50 Energy        Data Modeling
Logical Architecture       Logical Architecture    Abstraction Levels
-------------------------  ----------------------  ----------------------
Consumer Views             Business Demand Layer   Application Data Model
Canonical Layer            Common Semantic Layer   Business Data Model
(Canonical Layer, cont.)   Conforming Layer        Logical Data Model
Source Transform Views     Source Connections      Physical Data Model

ABSTRACTION IN PRACTICE
The correspondence shown above is intended to illustrate, not to define specific abstraction layers or data model types. There is no single, correct way to structure levels of data abstraction. A data virtualization platform must support the progression of data abstraction. Starting at the bottom and working upward, that progression is:
- Highly specific, based on the way that data is stored.
- Somewhat generalized, to identify connections and overlaps among disparate data sources.
- Highly generalized, to represent business concepts using business lexicon and taxonomy.
- Specific to a particular consumer need for data.

Mapping data across the layers is, of course, an important part of the capabilities provided by a virtualization platform.
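
The bottom-to-top progression can be sketched as stacked SQLite views, one per level. The layer and table names below loosely follow the four-step progression just described (physical, conforming, business, consumer) and are assumptions for illustration; they are not the exact layers of either case-study architecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Level 1, physical: stored exactly as two source systems define it.
    CREATE TABLE crm_cust (cid INTEGER, fname TEXT, lname TEXT, ctry TEXT);
    CREATE TABLE erp_kunde (kdnr INTEGER, name TEXT, land TEXT);
    INSERT INTO crm_cust VALUES (1, 'Ada', 'Lovelace', 'UK');
    INSERT INTO erp_kunde VALUES (2, 'Carl Benz', 'DE');

    -- Level 2, conforming: common column names and formats across sources.
    CREATE VIEW conf_customer AS
        SELECT 'CRM' AS source, cid AS source_id,
               fname || ' ' || lname AS full_name, ctry AS country_code
        FROM crm_cust
        UNION ALL
        SELECT 'ERP', kdnr, name, land FROM erp_kunde;

    -- Level 3, business: business vocabulary, independent of any source.
    CREATE VIEW Customer AS
        SELECT source || '-' || source_id AS customer_key,
               full_name AS customer_name,
               country_code
        FROM conf_customer;

    -- Level 4, consumer: shaped for one specific need.
    CREATE VIEW german_customers_for_marketing AS
        SELECT customer_name FROM Customer WHERE country_code = 'DE';
""")

print(db.execute("SELECT * FROM Customer ORDER BY customer_key").fetchall())
print(db.execute("SELECT * FROM german_customers_for_marketing").fetchall())
```

Each layer maps only to the layer directly below it, so a change in a source table is absorbed in the conforming layer without touching business or consumer views.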

Platform Capabilities: Federation

HETEROGENEOUS FEDERATION

Recall the data federation definition from Rick van der Lans that is cited in Module One:

"Data federation is a form of data virtualization where the data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration. Not all forms of data virtualization imply data federation... but data federation always results in data virtualization." [1]

Clearly, federation is one of the primary reasons for virtualization, and a fundamental capability that every data virtualization platform must provide. Although van der Lans's definition specifically refers to heterogeneous data stores, in practice federation occurs in several forms:

- Heterogeneous federation integrates data from autonomous data stores that are substantially different in content and structure. Integrating structured data, such as a CRM database, with sentiment data (unstructured data from social networks) is an example of heterogeneous federation.
- Homogeneous federation integrates data from data stores that are highly similar in content and structure yet operate autonomously. Integrating structured customer data from a CRM system with structured customer data from an ERP system is an example of homogeneous federation.
- Real-time federation occurs when point-in-time data is integrated in a virtual form with little or no latency. Streaming integrated data from order processing and inventory management systems to a real-time operational dashboard is an example.
- Historical federation occurs when latent and time-variant data is integrated in a virtual form. Time variance brings unique challenges to data federation. Integrating data from multiple data warehouses in a virtual form is a frequent use case of historical federation.

[1] van der Lans, "Clearly Defining Data Virtualization, Data Federation, and Data Integration."
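The homogeneous federation example above (CRM plus ERP customer data) can be sketched as on-demand integration: each autonomous source is queried at request time and the results are merged by key, with nothing stored in between. The fetch functions below are hypothetical stand-ins for real source connections, and the merge rule (keep the first non-null value per attribute) is just one possible conflict policy, chosen for illustration.

```python
def fetch_crm():
    # Hypothetical stand-in for a query against the CRM system.
    return [{"id": 1, "name": "Acme", "email": "info@acme.example"},
            {"id": 2, "name": "Globex", "email": None}]


def fetch_erp():
    # Hypothetical stand-in for a query against the ERP system.
    return [{"id": 2, "name": "Globex", "credit_limit": 5000},
            {"id": 3, "name": "Initech", "credit_limit": 1200}]


def federated_customers(sources=(fetch_crm, fetch_erp)):
    """Integrate customer records from autonomous sources at request time."""
    merged = {}
    for source in sources:
        for row in source():
            record = merged.setdefault(row["id"], {})
            for key, value in row.items():
                # Conflict policy (an assumption): keep the first non-null value seen.
                if record.get(key) is None:
                    record[key] = value
    return [merged[k] for k in sorted(merged)]


for customer in federated_customers():
    print(customer)
```

Customer 2 appears in both sources, so its federated record combines the CRM name with the ERP credit limit; customers 1 and 3 come from a single source each.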

Platform Capabilities: Query Optimization

VIRTUAL QUERY PERFORMANCE

Robust data virtualization involves a long path from the consumer to the data sources and back to the consumer:

- The consumer requests data and information services
- Access the data sources
- Transform the data
- Map data through multiple levels of abstraction
- Federate autonomous data sources
- Deliver data to the consumer via data and information services

Data consumers, both human and automated, expect quick responses to queries, so every data virtualization platform must provide query optimization capabilities. Specific optimization methods and details vary among platforms, but you should expect a data virtualization platform to include:

- Query optimization algorithms
- Performance tuning features, functions, and guidelines
- Performance-driven query construction methods
- Data caching for query optimization
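One widely used optimization technique is predicate pushdown: instead of fetching every row and filtering in the virtualization layer, the filter is sent to the source so that only matching rows travel back. The sketch below uses a hypothetical Source class that counts shipped rows to show the difference; real platforms typically push filters down by rewriting them into the source's own query language (for example, a SQL WHERE clause), and how far they can do so varies by platform and source.

```python
class Source:
    """Hypothetical data source that counts how many rows it ships to the caller."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_shipped = 0

    def scan(self, predicate=None):
        # When a predicate is pushed down, the source filters before shipping rows.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def naive_query(source, region):
    # Fetch everything, then filter in the virtualization layer.
    return [r for r in source.scan() if r["region"] == region]


def pushdown_query(source, region):
    # Push the filter to the source so only matching rows are transferred.
    return source.scan(lambda r: r["region"] == region)


rows = [{"id": i, "region": "EU" if i % 10 == 0 else "NA"} for i in range(1000)]
naive_src, push_src = Source(rows), Source(rows)
assert naive_query(naive_src, "EU") == pushdown_query(push_src, "EU")  # same answer
print(naive_src.rows_shipped, push_src.rows_shipped)  # 1000 100
```

Both queries return identical results; only the volume of data moved differs, which is exactly where network-bound virtual queries lose time.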

Platform Capabilities: Caching

CENTRAL vs. DISTRIBUTED

A data cache is an effective way to improve performance: it keeps a copy of frequently accessed data for which some of the work of access, transformation, mapping, and federation has already been done. Performance gains come from eliminating the need to do the same work multiple times.

One consideration with caching is the degree to which performance is degraded by network bottlenecks. If network constraints are an issue, consider distributed caching; if not, consider a central cache.

Caching, of course, brings some degree of data latency: the cache is not real time. Balancing the frequency of cache refresh against the level of performance gain becomes a performance tuning consideration. Further optimization is possible when the virtualization platform allows incremental as well as full refresh of the cache. Also consider the complexity of distributed cache refresh vs. central cache refresh.
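The difference between full and incremental cache refresh can be sketched with a watermark: a full refresh reloads everything, while an incremental refresh fetches only rows changed since the latest timestamp already cached. The ViewCache class and its two loader callables are hypothetical; the sketch assumes every row carries an "id" and an "updated" value, and it does not handle deletions, which a real incremental refresh would also need to propagate.

```python
class ViewCache:
    """Minimal cache for a virtual view with full and incremental refresh.

    load_all and load_changed_since are hypothetical callables supplied by the
    caller; rows are dicts with an "id" key and an "updated" timestamp.
    """

    def __init__(self, load_all, load_changed_since):
        self.load_all = load_all
        self.load_changed_since = load_changed_since
        self.rows = {}
        self.watermark = None  # latest "updated" value seen so far

    def full_refresh(self):
        # Rebuild the whole cache from the sources.
        self.rows = {r["id"]: r for r in self.load_all()}
        self.watermark = max((r["updated"] for r in self.rows.values()), default=None)

    def incremental_refresh(self):
        # Without a watermark there is nothing to compare against, so reload fully.
        if self.watermark is None:
            self.full_refresh()
            return
        # Fetch only rows changed since the watermark and upsert them.
        for r in self.load_changed_since(self.watermark):
            self.rows[r["id"]] = r
            self.watermark = max(self.watermark, r["updated"])


source = [{"id": 1, "val": "a", "updated": 1}, {"id": 2, "val": "b", "updated": 2}]
cache = ViewCache(lambda: list(source), lambda wm: [r for r in source if r["updated"] > wm])
cache.full_refresh()
source.append({"id": 3, "val": "c", "updated": 3})   # new row at the source
source[0] = {"id": 1, "val": "a2", "updated": 4}     # changed row at the source
cache.incremental_refresh()
print(sorted((k, v["val"]) for k, v in cache.rows.items()))  # [(1, 'a2'), (2, 'b'), (3, 'c')]
```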

Platform Capabilities: Security

SECURITY METHODS

Data virtualization does not eliminate the need to secure data. Intrusion, corruption, and privacy are issues for virtual data just as they are for materialized data. Every data virtualization platform should include security functions and features for:

- Administration and management of users and groups
- Administration and management of rights and privileges
- User authentication
- Access authorization

Of course, your technical infrastructure probably already implements all of these capabilities for other security needs. The data virtualization platform should not require duplicate security management. It should be compatible with, connect to, and use existing standards-based user and rights management technologies. At a minimum, a data virtualization platform must be LDAP compatible. Ideally, a platform should support, through native, pass-through, or plug-in functions, a wide range of commonly used internet, network, and database security protocols.
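The relationship between users, groups, privileges, and access authorization can be sketched as a group-based check on virtual views. This is a hypothetical in-memory model for illustration only. Authentication is out of scope here, and in line with the guidance above, users and group memberships would normally come from an existing directory (for example via LDAP) rather than being redefined inside the virtualization layer.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityModel:
    """Hypothetical model of users, groups, and privileges on virtual views.

    In practice, user and group data would be read from an existing directory
    (e.g., over LDAP); it is held in memory here only for illustration.
    """

    user_groups: dict = field(default_factory=dict)  # user -> set of groups
    grants: dict = field(default_factory=dict)       # (group, view) -> set of privileges

    def add_user(self, user, *groups):
        self.user_groups.setdefault(user, set()).update(groups)

    def grant(self, group, view, *privileges):
        self.grants.setdefault((group, view), set()).update(privileges)

    def is_authorized(self, user, view, privilege):
        # Access authorization: allowed if any of the user's groups holds the privilege.
        return any(
            privilege in self.grants.get((g, view), set())
            for g in self.user_groups.get(user, set())
        )


sec = SecurityModel()
sec.add_user("alice", "finance")
sec.add_user("bob", "marketing")
sec.grant("finance", "revenue_by_region", "select")
print(sec.is_authorized("alice", "revenue_by_region", "select"))  # True
print(sec.is_authorized("bob", "revenue_by_region", "select"))    # False
```

Granting privileges to groups rather than individual users keeps administration manageable as the number of consumers grows.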

Platform Capabilities: Quality

VIRTUALIZATION & DATA QUALITY MANAGEMENT

When evaluating data virtualization platforms, look carefully at the data quality capabilities that are offered. Data quality management is complex, and quality must be managed at multiple points in the flow of data: at data sources, in a data warehouse, in virtualized data views and services, and in front-end and consumer applications. Ask how the platform will enable and advance your data quality goals and strategies.

Every platform supports some level of data cleansing through transformation capabilities. But cleansing is only a small part of data quality. Quality metadata, quality monitoring, quality reporting, and defect prevention are all important parts of data quality management.
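Quality monitoring and reporting go beyond cleansing: they measure data as it flows through a view rather than changing it. The sketch below computes two simple metrics, completeness (share of non-empty values) and validity (share of values matching a pattern), for a hypothetical set of rows. The rules and patterns are assumptions chosen for illustration; a platform or data quality tool would let you define and schedule such checks and report the results as quality metadata.

```python
import re


def quality_report(rows, required, patterns):
    """Compute simple data quality metrics for the rows of a virtual view.

    required: columns that must be non-empty (completeness)
    patterns: column -> regular expression each non-empty value must fully match (validity)
    """
    total = len(rows)
    report = {}
    for col in required:
        filled = sum(1 for r in rows if r.get(col) not in (None, ""))
        report[f"{col}.completeness"] = filled / total if total else 1.0
    for col, pattern in patterns.items():
        values = [r[col] for r in rows if r.get(col) not in (None, "")]
        valid = sum(1 for v in values if re.fullmatch(pattern, str(v)))
        report[f"{col}.validity"] = valid / len(values) if values else 1.0
    return report


rows = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": "", "country": "Germany"},
    {"id": 3, "email": "not-an-email", "country": "FR"},
    {"id": 4, "email": "d@example.org", "country": "US"},
]
report = quality_report(
    rows,
    required=["email"],
    patterns={"email": r"[^@\s]+@[^@\s]+\.\w+", "country": r"[A-Z]{2}"},
)
print(report)
```

Tracking these numbers over time, and alerting when they fall below agreed thresholds, is one way to move from reactive cleansing toward defect prevention.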

Platform Capabilities: Governance

USER SECURITY

As with data quality, take a close look at the data governance capabilities offered by data virtualization platforms. How does virtualization ease data governance complexities? How does it compound them? How will the platform enable and advance data governance goals? How might it inhibit them? At a minimum, expect a data virtualization platform to support data governance with functions for:

- User security
- Tracing of data lineage
- Tracking and logging of data access and use
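Lineage tracing and access logging can be illustrated with a small dependency graph of views and a simple audit log. The graph, view names, and log format below are hypothetical; platforms typically derive lineage from their own view definitions and write access events to their own audit stores. The traversal is a standard depth-first walk that collects every upstream object a view depends on.

```python
from datetime import datetime, timezone

# Hypothetical lineage graph: each view lists the views or sources it is built from.
LINEAGE = {
    "revenue_dashboard": ["revenue_by_region"],
    "revenue_by_region": ["orders_source", "region_lookup"],
    "orders_source": [],
    "region_lookup": [],
}

ACCESS_LOG = []


def trace_lineage(view, graph=LINEAGE):
    """Return every upstream object the view depends on (depth-first)."""
    seen, stack = [], list(graph.get(view, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph.get(node, []))
    return seen


def log_access(user, view):
    # Record who read which view and when, for later auditing.
    ACCESS_LOG.append(
        {"user": user, "view": view, "at": datetime.now(timezone.utc).isoformat()}
    )


log_access("alice", "revenue_dashboard")
print(trace_lineage("revenue_dashboard"))  # ['revenue_by_region', 'region_lookup', 'orders_source']
```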

Platform Variations: Stand-Alone Data Virtualization

VIRTUALIZATION ONLY

A stand-alone platform meets the requirements and provides the capabilities needed to implement and operate data virtualization. The stand-alone platform is compatible with, but operates independently of, data warehousing and business analytics platforms.

Platform Variations: Extension of BI or Data Warehousing Platform

VIRTUALIZATION FOR DATA WAREHOUSING

In addition to stand-alone platforms, some data warehousing platform vendors have extended their platforms to include data virtualization capabilities. This configuration uses ETL and virtualization processes side by side, each complementing the other, to implement and operate a data warehouse in which some integrated data is physically stored and some is virtualized.
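The mix of physically stored and virtualized data can be mimicked inside a single SQLite database: a table loaded by a (simulated) batch ETL process holds history, and a view unions it with current operational data at query time. This only imitates the pattern; a real data virtualization platform would span separate systems. Table and view names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Materialized part: history loaded by a batch ETL process (simulated).
    CREATE TABLE sales_history (region TEXT, amount REAL);
    -- Stands in for a live operational source.
    CREATE TABLE current_orders (region TEXT, amount REAL);
    INSERT INTO sales_history VALUES ('EU', 100.0), ('NA', 80.0);
    INSERT INTO current_orders VALUES ('EU', 5.0);

    -- Virtual part: a view evaluated at query time; nothing is stored for it.
    CREATE VIEW sales_all AS
        SELECT region, amount FROM sales_history
        UNION ALL
        SELECT region, amount FROM current_orders;
""")

con.execute("INSERT INTO current_orders VALUES ('NA', 7.5)")  # new operational data arrives
totals = con.execute(
    "SELECT region, SUM(amount) FROM sales_all GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('EU', 105.0), ('NA', 87.5)]
```

The newly inserted operational row appears in the totals immediately, without reloading the historical table, which is the complementary behavior the hybrid configuration aims for.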

Platform Variations: Embedded and Appliances

HARDWARE, SOFTWARE, AND SERVICES

Appliances combine hardware, software, processes, databases, and data storage in ready-to-run configurations for high-performance and high-volume data warehousing and/or business analytics. Data virtualization capabilities may be embedded in a data warehousing or analytic appliance, providing integrated data views and services for data that is not physically stored in the warehouse. Appliances often combine in-memory and columnar databases with data virtualization to deliver high-performance business analytics.

Platform Variations: Some Vendors

FORRESTER'S ANALYSIS

The facing page illustrates Forrester's analysis of data virtualization vendors as published in their report The Forrester Wave™: Data Virtualization, Q by Noel Yuhanna and Mike Gilpin.

Module 5: Implementing Data Virtualization

Topics:
- Analysis
- Design and Modeling
- Development
- Deployment
- Operation
- Virtualize or Materialize?

Analysis: Goals and Purpose

OVERVIEW

Implementing data virtualization is a process that progresses through the activities of analysis, design and modeling, development, deployment, and operation. Execute this sequence within a plan of phased adoption, i.e., incremental implementation and repetition of the steps in the sequence, for managed growth of the goals, scope, and maturity of data virtualization.

Begin analysis with goal setting. It is important to know why you are pursuing data virtualization before defining what and how.

BUSINESS GOALS

Business goals for data virtualization certainly include business agility, but more specific goals provide the foundation for scoping, data analysis, design, and modeling. Think about the business processes where you want to have impact, and the kind of impact for each process. Consider impacts such as decision speed, completeness and quality of decision-making information, opportunity recognition and realization, reduction of uncertainty, mitigation of risk, process effectiveness, and process efficiency.

TECHNICAL GOALS

Technical goals for data virtualization usually focus on increasing the speed of information, integrating unstructured data with structured data, reducing data latency, accelerating development cycles, and incorporating big data into the information resource. Consider each, and be specific about the goals: speed of which information? Integration of which unstructured data? Latency of which data?

APPLICATION GOALS

Application goals look at the purpose of data virtualization from a data consumer perspective. Which systems (data warehousing, ERP, MDM, enterprise reporting, business analytics, etc.) must data virtualization serve to achieve the stated business and technical goals? Which technical goals apply to each system? How will meeting the technical goals for an application help to meet the business goals?

Analysis: Scoping

RANGE & REACH

With goals established, the next step is to define the scope of data virtualization work. What is the range of people, processes, applications, and systems to be affected? How far into processes, organizations, and systems should virtualization reach? What is in bounds? What is outside the boundaries? Consider these scoping questions for each of the following:

- Scope of impact: Which programs, processes, and systems (BI, data warehousing, MDM, ODS, ERP, etc.) will feel the effects of data virtualization?
- Scope of data: Which data requirements will be met with data virtualization? Which data services will be provided? What data will be delivered? Which data sources will be accessed?
- Scope of user base: Which data consumers (people, systems, and organizations) will realize the benefits of data virtualization?
- Scope of access: Which kinds of data will you access, and using what access methods? What volumes do you expect?
- Scope of controls: What data governance policies affect data virtualization? What security and access controls do you need? What data quality controls are needed?

Analysis: Data Source Discovery

FINDING DATA SOURCES

Depending on goals and scope, you may sometimes know all of the data sources, and at other times need to discover them. If, for example, your situation is similar to the Automated Web Data Extraction case study described in Appendix A, some exploration and discovery is likely needed to find the web data sources that best meet business goals. Skip this activity when you're confident that you know all of the data sources. Perform it whenever you are uncertain about the best data sources for web data, external data, legacy systems, ERP databases, etc.

Analysis: Source Data Analysis

UNDERSTANDING THE DATA

It is not practical to move to design activities, especially data connection and data integration, without a good understanding of the data. One data virtualization vendor uses the term introspection (looking deeply into a subject) to describe this important step. The purpose is to gain deep knowledge of the content, meaning, and relationships within each data source, and to discover and analyze the relationships among multiple data sources.

PROFILING THE DATA

Data profiling is a process of examining stored data to collect information and statistics about that data. The statistics are real and reliable metadata: they describe the data accurately because they are derived directly from data content. The complete process of data profiling goes beyond collecting statistics to include analysis of those statistics and decisions about data management. Descriptive statistics tell us what characteristics are inherent in the data. Inference draws conclusions about why those characteristics exist.

Structured data profiling is commonly practiced in data warehousing, data quality, and data governance work. Profiling tools produce statistics that describe columns, column dependencies, and table dependencies. Analysis of these statistics yields abundant information about the content, meaning, and relationships of the data.

Unstructured data profiling (less familiar than structured data profiling to most data warehousing and business intelligence professionals) helps to understand the content, meaning, and relationships of data that is less rigorously organized than structured data. Profiling yields information about hidden structure that may exist in semi-structured data. Email, for example, does have some structure: sender address, recipient addresses and types (to, cc, bcc), subject line, dates, body text, attachments, etc. Body text has less structure than the other parts of an email message. And body text, or any text for that matter, can be further profiled. Text profiling finds frequently used words, phrases, acronyms, and patterns that can be analyzed to help build the business lexicon and taxonomy needed for text integration and text analytics.
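Both kinds of profiling described above can be sketched in a few lines: descriptive statistics for a structured column (count, nulls, distinct values, range, most frequent value) and term frequencies for free text such as email body text. The functions, stopword list, and sample data are hypothetical; dedicated profiling tools add column dependency and cross-table analysis, which this sketch does not attempt.

```python
import re
from collections import Counter


def profile_column(values):
    """Descriptive statistics for one structured column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": Counter(non_null).most_common(1),
    }


def profile_text(texts, top=3, stopwords=frozenset({"the", "a", "to", "of", "and", "is"})):
    """Most frequent terms across free text, e.g. email body text."""
    words = (w for t in texts for w in re.findall(r"[a-z0-9']+", t.lower()))
    return Counter(w for w in words if w not in stopwords).most_common(top)


print(profile_column([3, 7, None, 7, 12]))
print(profile_text([
    "The SLA for the EMEA region is late",
    "EMEA SLA review",
    "Late shipment to EMEA",
]))
```

Frequent terms such as acronyms that recur across messages are candidates for the business lexicon mentioned above.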

Design and Modeling: Data Source Layer

MOVING TO DESIGN

Using the information gathered from analysis activities, proceed to design and modeling. The design process corresponds with the layered architecture discussed earlier in the course. Here the process is illustrated using three layers: data source, data integration, and publishing and access. If your architecture has more layers, as we've seen in some of the examples, adapt the process to match. Design and model one layer at a time, beginning closest to the data sources and working toward the data consumers.

DESIGN FOR DATA SOURCES

Design at this layer describes the features, functions, and data structures that are needed to connect to data sources and deliver data to the integration layer. Plan and specify the methods of data connection (wrappers, APIs, protocols, etc.). Determine how data will be obtained (views, extracts, or other methods). Model the views or extracts and, for structured data, determine the desired level of normalization. Clearly define each data item in every view and extract. Remember that every data acquisition process also needs to be a metadata collection process. Design the methods by which you will collect metadata to support development, monitoring, management, and data lineage tracing.
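The principle that every data acquisition process is also a metadata collection process can be built into the connector design itself. The sketch below defines a hypothetical wrapper interface whose fetch method always returns rows together with metadata (source name, fetch time, row count, columns), plus one example wrapper over CSV text standing in for a file or API source. This is an illustrative design, not the wrapper API of any particular platform.

```python
import csv
import io
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class SourceWrapper(ABC):
    """Hypothetical connector interface: every fetch returns data and metadata."""

    name = "unnamed"

    @abstractmethod
    def _read(self):
        """Return source rows as a list of dicts (source-specific logic)."""

    def fetch(self):
        rows = self._read()
        metadata = {
            "source": self.name,
            "fetched_at": datetime.now(timezone.utc).isoformat(),
            "row_count": len(rows),
            "columns": sorted({c for r in rows for c in r}),
        }
        return rows, metadata


class CsvTextWrapper(SourceWrapper):
    """Example wrapper over CSV text, standing in for a file or API source."""

    name = "csv_products"

    def __init__(self, text):
        self.text = text

    def _read(self):
        return list(csv.DictReader(io.StringIO(self.text)))


rows, meta = CsvTextWrapper("sku,price\nA1,9.99\nB2,4.50\n").fetch()
print(meta["row_count"], meta["columns"])  # 2 ['price', 'sku']
```

Putting metadata capture in the base class means new wrappers inherit it automatically, so lineage and monitoring data cannot be forgotten source by source.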

Design and Modeling: Data Integration Layer

BUSINESS VIEWS

Design of the data integration layer is the point at which data views are abstracted from source context to business context. Canonical and/or semantic models are developed to describe the data as business views. This design also includes mapping of source views to business views and specification of the data transformations that are needed to integrate, conform, and quality-assure the data.
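A source-to-business mapping can be captured as a declarative specification: for each business attribute, the source field it comes from and the transformation that conforms it. The mapping below is hypothetical (field names, country-code rule, and the "revenue in thousands" convention are all assumed for illustration), but it shows how such a specification doubles as design documentation and as an executable definition.

```python
# Hypothetical mapping specification: business attribute -> (source field, transformation).
CUSTOMER_MAPPING = {
    "customer_id": ("CUST_NO", lambda v: v.strip()),
    "customer_name": ("CUST_NAME", lambda v: v.strip().title()),
    # Conform free-form country names to two-letter codes (assumed rule).
    "country_code": ("CTRY", lambda v: {"GERMANY": "DE", "FRANCE": "FR"}.get(v.upper(), v.upper())),
    # The source is assumed to store revenue in thousands.
    "annual_revenue": ("REV_K", lambda v: float(v) * 1000),
}


def to_business_view(source_rows, mapping):
    """Apply a declarative source-to-business mapping to each source row."""
    return [
        {attr: transform(row[field]) for attr, (field, transform) in mapping.items()}
        for row in source_rows
    ]


src = [{"CUST_NO": " 0042 ", "CUST_NAME": "ACME GMBH", "CTRY": "Germany", "REV_K": "12.5"}]
print(to_business_view(src, CUSTOMER_MAPPING))
# [{'customer_id': '0042', 'customer_name': 'Acme Gmbh', 'country_code': 'DE', 'annual_revenue': 12500.0}]
```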

Design and Modeling: Publish and Access Layer

USE-CASE SPECIFIC VIEWS

Finally, design the publishing and access layer. Define and model the application- and consumer-specific views that are needed for data access. Also design any data services provided at this level. Work from the consumer perspective to design services that support the multiple protocols that may be needed.
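Serving one consumer-specific view to consumers that expect different formats can be sketched as a single publish function with per-format renderers. JSON and CSV here are stand-ins for the protocols a platform might expose (for example, REST/JSON services or file exports); the function name and formats are assumptions for illustration, and real platforms also publish views over SQL interfaces such as ODBC/JDBC.

```python
import csv
import io
import json


def publish(rows, fmt):
    """Render one consumer-specific view in the format a given consumer expects."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


view = [{"region": "Europe", "revenue": 44.5}]
print(publish(view, "json"))  # [{"region": "Europe", "revenue": 44.5}]
print(publish(view, "csv"))
```

Keeping the view definition separate from its renderers means a new consumer protocol can be added without touching the integration layer.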

Development: Connect to Data Sources

FROM DESIGN TO DEVELOPMENT

With analysis and design complete (or, in the case of prototyping or agile development, complete enough), proceed to development activities: the work of creating data virtualization functions. Development encompasses three main categories of activity that correspond with the three layers of architecture: connecting to data sources, building views and services, and publishing to consumers.

REACHING DATA SOURCES

Begin development by connecting to data sources. The specific steps of data connection vary depending on the data virtualization platform and tools. The development environment of each platform includes screens and processes to build data connections. The screenshot examples shown here illustrate data connection with two widely used data virtualization platforms: Cisco Data Virtualization, which connects primarily using drivers, and Denodo, whose primary connection method is wrappers.

Development: Build the Views

TRANSFORMATION AND INTEGRATION

Once connected to data sources, you can build the views and services to deliver virtualized data. This work implements the mappings from source to business data views and all of the transformations to integrate, conform, and quality-assure the data. Specifics and details vary with each data virtualization platform. Again, Cisco Data Virtualization and Denodo screenshots are used for the illustrations on the facing page.

Development: Test and Validate

FINDING AND REMOVING BUGS

As with any software development activity, testing of data virtualization processes is an essential step. Key elements to test include:

- Correct and reliable connection to data sources
- Data access functions that return the right data
- Data retrieved in the format and with the structure that match the data source layer design
- Data transformations that change the data correctly and that conform to the integration layer design
- Data delivery that performs at an acceptable level, that makes the data accessible, and that conforms to the publish and access layer design
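The test elements listed above map naturally onto automated tests. The sketch below uses Python's standard unittest module against hypothetical stand-ins (connect, source_view, total_view) for a real connection and real views; the 0.5-second delivery threshold is an assumed service level, not a recommendation. In practice these tests would call the platform's published views or services.

```python
import time
import unittest


# Hypothetical pieces under test; a real project would call the platform's views.
def connect():
    return [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "4.25"}]  # stand-in source


def source_view(rows):
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def total_view(rows):
    return {"total": round(sum(r["amount"] for r in rows), 2)}


class VirtualViewTests(unittest.TestCase):
    def test_connection_returns_rows(self):
        self.assertTrue(connect())

    def test_source_view_structure(self):
        # Structure must match the data source layer design.
        for row in source_view(connect()):
            self.assertEqual(set(row), {"id", "amount"})
            self.assertIsInstance(row["amount"], float)

    def test_transformation_is_correct(self):
        self.assertEqual(total_view(source_view(connect())), {"total": 14.75})

    def test_delivery_time(self):
        start = time.perf_counter()
        total_view(source_view(connect()))
        self.assertLess(time.perf_counter() - start, 0.5)  # assumed service-level threshold


if __name__ == "__main__":
    unittest.main(argv=["virtual-view-tests"], exit=False)
```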

Development: Publish and Connect Applications

COMPLETING DEVELOPMENT

With testing complete, the final step of development is to publish the views and services, exposing them to data consumers. Once again, details and specific steps vary by platform, but every platform provides functions and screens for publishing. The screenshots shown here are again taken from the Cisco Data Virtualization and Denodo data virtualization products.

Deployment: Acceptance Testing and Production

MOVING TO PRODUCTION

As is typical with development projects, the final phase is deployment to production. Data virtualization systems are business-critical systems that need the same production-strength environment and practices that apply to core business systems:

- Acceptance testing, including attention to:
  o A test environment that will endure beyond deployment
  o Testing criteria, especially pre-defined acceptance criteria
  o A test plan and test cases
  o Execution of test cases and review of test results
- Promotion to production
- A change control baseline, with change control processes and procedures in place

Operation: Runtime Operations

DAY-TO-DAY DATA VIRTUALIZATION

Finally, your data virtualization system is operating routinely to serve data to those who need it. To maintain service levels and meet data consumer expectations, you'll need effective runtime operations processes, including:

- Availability monitoring
- Performance monitoring
- Network monitoring
- Active management of events and error conditions
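Availability and performance monitoring can both be driven by periodic probe queries against published views. The sketch below times a probe, reports DOWN if it raises an error, and SLOW if it exceeds a target. The function name, status labels, and thresholds are hypothetical; production monitoring would schedule probes, keep history, and route alerts into the event management process.

```python
import time


def check_view(name, query, max_seconds):
    """Run a probe query against a virtual view and classify the outcome."""
    start = time.perf_counter()
    try:
        query()
    except Exception as exc:  # availability: the view could not be served
        return {"view": name, "status": "DOWN", "error": str(exc)}
    elapsed = time.perf_counter() - start
    status = "OK" if elapsed <= max_seconds else "SLOW"  # performance against a target
    return {"view": name, "status": status, "seconds": round(elapsed, 3)}


def failing_query():
    # Hypothetical probe that simulates an unreachable source.
    raise ConnectionError("source unreachable")


print(check_view("customers", lambda: [1, 2, 3], max_seconds=0.5))
print(check_view("orders", failing_query, max_seconds=0.5))
```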

Operation: Management and Governance

USER AND GROUP MANAGEMENT

Reliable and sustainable data virtualization depends on management and governance processes that connect data consumers with data and information services. Management and governance processes include:

- User and group management
- Security administration
- Access control
- Metadata management
- Data lineage tracing
- Auditability of data security and data access

Virtualize or Materialize? A Decision Tool

DEMONSTRATION

The instructor will demonstrate use of a data integration decision tool that consolidates all of the factors (business, source, and consumer considerations) into a single evaluation process.
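The instructor's tool itself is not reproduced in these notes. As a rough illustration of how business, source, and consumer factors can be consolidated into one evaluation, here is a hypothetical weighted-scoring sketch: each factor is scored from 1 (favors materializing) to 5 (favors virtualizing), weighted, and compared with a threshold. The factor names, weights, and threshold are all assumptions for illustration and should be replaced with ones that reflect your own goals.

```python
# Hypothetical factors and weights (summing to 1.0); scores run from
# 1 (favors materializing) to 5 (favors virtualizing).
WEIGHTS = {
    "need_for_current_data": 0.30,      # business consideration
    "source_availability": 0.20,        # source consideration
    "source_load_tolerance": 0.20,      # source consideration
    "transformation_simplicity": 0.15,  # source consideration
    "query_volume_low": 0.15,           # consumer consideration
}


def recommend(scores, weights=WEIGHTS, threshold=3.0):
    """Combine factor scores into a weighted total and a recommendation."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    total = sum(weights[f] * scores[f] for f in weights)
    return ("virtualize" if total >= threshold else "materialize"), round(total, 2)


print(recommend({
    "need_for_current_data": 5,
    "source_availability": 4,
    "source_load_tolerance": 3,
    "transformation_simplicity": 4,
    "query_volume_low": 2,
}))  # ('virtualize', 3.8)
```

A weighted sum keeps the reasoning transparent: when a recommendation is disputed, the individual scores and weights show exactly which consideration drove it.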

Module 6: Getting Started with Data Virtualization

Topics:
- Skills and Competencies
- Human Factors
- Goals and Expectations
- Best Practices

Skills and Competencies: Capabilities and Expertise

RANGE OF SKILLS

As you've seen throughout the course, data virtualization has many different aspects, ranging from business to technology, from architecture to operations, and from analysis to deployment. Building and operating data virtualization systems demands a similarly broad range of skills and competencies, including:

- Technical skills and data virtualization platform knowledge
- Business subject expertise
- Data and systems subject expertise
- Project skills, from planning to execution
- Development skills and capabilities
- Management and administrative skills applied to data, security, projects, and technology


Human Factors: People and Data Virtualization

CHANGING JOBS AND ROLES
To maximize value and minimize risk with data virtualization, pay attention to the human factors. Be aware of the impacts on people, and plan to manage change. Everyone who has a stake in data virtualization will experience some change in job roles and responsibilities. For business people, the volume and speed of decision-making data change. For data integrators, the data types, techniques, and tools change. Security, network, systems, and database administrators all experience change, as do analysts, designers, developers, and project managers.

TEAMS AND ORGANIZATIONS
Data virtualization is a team effort. One individual or a few superstars cannot do it alone. Proactively planning the structure and makeup of a data virtualization organization is good practice. As the scope of your data virtualization systems grows, evolve the organization into a data virtualization competency center or center of excellence.


Goals and Expectations: DV Readiness

GETTING STARTED
If you're just getting started with data virtualization, begin with realistic and achievable expectations. List your data virtualization goals, and then assess your readiness to achieve them. The facing page lists 15 common kinds of goals. Achieving all of them at once is a tall order, even for seasoned data virtualization teams. Create early successes, then build on them to grow your capabilities, advance your readiness, and evolve your data virtualization organization.


Goals and Expectations: Choosing a First DV Project

FIRST PROJECT CRITERIA
The facing page lists 12 factors that serve as good guidelines when choosing a first data virtualization project. Consider each of them. To mitigate risk and position yourself for success, select a project that rates well on many of these factors. Defer or avoid projects that rate poorly on six or more factors.
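The screening rule above can be expressed as a small check. This minimal sketch assumes each of the 12 factors is rated on a hypothetical good/fair/poor scale. The factor names themselves appear on the facing page and are not reproduced here. Any project rated poor on six or more factors is deferred. The remaining candidates are ranked by how many factors they rate well on.

```python
from typing import Literal

Rating = Literal["good", "fair", "poor"]  # hypothetical three-level scale
NUM_CRITERIA = 12      # the facing page lists 12 factors
DEFER_THRESHOLD = 6    # defer projects that rate poorly on six or more factors


def assess(ratings: list[Rating]) -> tuple[str, int, int]:
    """Return (decision, number rated good, number rated poor) for one project."""
    if len(ratings) != NUM_CRITERIA:
        raise ValueError(f"expected {NUM_CRITERIA} ratings, got {len(ratings)}")
    good = ratings.count("good")
    poor = ratings.count("poor")
    decision = "defer" if poor >= DEFER_THRESHOLD else "candidate"
    return decision, good, poor


def rank_candidates(projects: dict[str, list[Rating]]) -> list[str]:
    """Drop deferred projects and order the rest: most 'good' ratings first,
    then fewest 'poor' ratings, then by name for a stable result."""
    keyed = []
    for name, ratings in projects.items():
        decision, good, poor = assess(ratings)
        if decision == "candidate":
            keyed.append((-good, poor, name))
    return [name for _, _, name in sorted(keyed)]
```

Ranking by the count of good ratings follows the guidance to pick a project that "rates well on many of these factors". Weighting individual factors differently would be a reasonable refinement.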


Goals and Expectations: Planning a DV Roadmap

EVOLVING DATA VIRTUALIZATION
The ideal way to build a robust data virtualization environment is to deliver short-term impact within the framework of a long-term plan. Even before the first project, define the business case and conduct a systematic technical evaluation. The technical evaluation may deliver one or more proofs of concept and help you choose the best-fit platform for your environment. Follow technology evaluation and selection with a first project. You may choose to use that project to advance from proof of concept to proof of value. With the experience and learning from the first project, establish your initial data virtualization organization. This may be a team that will evolve into a competency center, or a new function within an existing BI, analytics, or other competency center. Adopt data virtualization through a series of projects that bring new data sources and new consumers into scope in planned and managed phases. Continuously evolve toward enterprise-scale data virtualization.


Best Practices: What Works in DV

TIPS FOR SUCCESS
The facing page lists several best practices for data virtualization. They are collected from many different sources, including case studies, product vendor recommendations, and industry analysts. Treat these practices as guidelines for maximizing the success and value of your data virtualization activities.


Best Practices: Mistakes to Avoid

LESSONS LEARNED
The facing page lists nine mistakes to avoid in data virtualization, excerpted from an article by Robert Eve. These are followed by four mistakes to avoid in data federation, drawn from a TDWI report. Together, these 13 mistakes provide good guidance for minimizing the risk of a struggling or failed data virtualization effort.


Appendix A: Data Virtualization Case Studies

Case studies:
Investment Risk Management Portal
Merchandising Portal
Agile Sales Reporting
Subscriber Analytics
Customer and Competitor Intelligence
Insurance Claims Research and Analytics
Automated Web Data Extraction
Business Transformation
Real-Time Operational Intelligence


Investment Risk Management Portal (Wealth Management Industry)

CASE STUDY BACKGROUND
This case study describes a global wealth management firm that implemented the data virtualization platform from Cisco Data Virtualization. The firm provides retail and institutional investment products to customers in several countries. It employs investment managers and product development specialists to create different types of investment funds and products.

The firm serves its customers through well-managed securities portfolios that deliver returns consistent with defined risk levels. Fund managers must have access to the current risk profile of the products they manage in order to comply with advertised and expected target risk levels.

Figure 1: The Original Situation (Liabilities Management, Investment Systems, Fund Accounting, Securities Transaction, Securities Data Warehouse, and Short Duration Fund Investment Characteristics; data extracts; manual data consolidation and integration; manually generated risk management reports)

THE BUSINESS OPPORTUNITY
There was a major opportunity for business improvement in risk management. Before the new data virtualization platform was implemented, the daily risk management reporting capability was unacceptable to both the firm's general management team and the individual investment managers. The Securities Risk Management Improvement project was sponsored to enable integrated, timely, and granular risk measurement and monitoring. It was also recognized that the solution had to be implemented as soon as possible to address the risk management issue the firm was facing. See Figure 1 for a description of the original situation.

THE TECHNICAL SOLUTION
The solution was based on a virtual data layer implemented between the application systems and the risk monitoring and management functions. The virtual layer provided connectivity to all relevant application systems across different platforms. It accessed securities transaction systems in DB2, investment systems in VSAM, and an existing data warehouse in Oracle. Data from these disparate systems could then be acquired, integrated, and provisioned to a unified portal, enabling a cohesive risk management function within the firm. See Figure 2 for a view of the virtualized solution.

Figure 2: The Virtualized Environment (the same source systems plus multiple securities lending systems; intraday securities risk management virtual layer; data virtualization layer; securities lending portal)

OBSERVED BENEFITS
A variety of business and technical benefits were observed after the virtual risk management layer was implemented:
- The firm's revenue increased as satisfaction among its investment customers improved.
- Client retention increased by 3%, and client risk management processes improved.
- Staff productivity in gathering and reporting risk metrics improved by a factor of five.
- The overall risk reporting and analysis process became faster and was extended to intraday activities.
- Because the solution operated in a virtual environment, the firm avoided the infrastructure cost of building additional databases to accomplish the same result.
- The manual effort of creating spreadsheets for risk monitoring was avoided, and data quality improved.
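To make the idea of a virtual layer concrete, here is a minimal Python sketch of a federated view. It does not model the case study's DB2, VSAM, and Oracle sources. Instead, two in-memory SQLite databases stand in for independent source systems. The table names, columns, and risk-weight calculation are all hypothetical. The sketch illustrates one property: the view queries the sources at request time and integrates the results in memory. No additional database is built, and new source rows are visible on the next read.

```python
import sqlite3

# Hypothetical stand-ins for two independent source systems.
transactions = sqlite3.connect(":memory:")
transactions.execute(
    "CREATE TABLE trades (fund_id TEXT, security TEXT, exposure REAL)")
transactions.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("F1", "BOND-A", 100.0), ("F1", "EQ-B", 50.0), ("F2", "BOND-A", 25.0)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE securities (security TEXT PRIMARY KEY, risk_weight REAL)")
warehouse.executemany(
    "INSERT INTO securities VALUES (?, ?)",
    [("BOND-A", 0.2), ("EQ-B", 1.0)])


def fund_risk_view() -> dict[str, float]:
    """Virtual view: read both sources at request time and integrate in memory.

    Nothing is written to a new persistent store; each call reflects the
    current contents of the sources."""
    # Reference data from the "warehouse" source: security -> risk weight.
    weights = dict(warehouse.execute(
        "SELECT security, risk_weight FROM securities"))
    risk: dict[str, float] = {}
    # Transactional data from the "transactions" source, joined on security.
    for fund_id, security, exposure in transactions.execute(
            "SELECT fund_id, security, exposure FROM trades"):
        # Securities missing from the warehouse contribute zero risk (an assumption).
        risk[fund_id] = risk.get(fund_id, 0.0) + exposure * weights.get(security, 0.0)
    return risk
```

Nothing is copied, so inserting a new row into the `transactions` source changes the next result of `fund_risk_view()` immediately. This mirrors, in miniature, the intraday reporting described above. A real platform would also push filters and joins down to the sources where it can, which this sketch does not attempt.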

Merchandising Portal (Print Media Distribution Industry)

CASE STUDY BACKGROUND
This case study considers a national distributor of print media, including nationally branded magazines. The solution was implemented using technology from Cisco Data Virtualization. The magazines are sold through retail outlets throughout the U.S. The distributor decided it needed to increase product profitability, raise the productivity of its field sales staff, and achieve higher sell-through at the retail outlets. To do this, it had to increase the overall sales quantities of each media brand.

Figure 3: The Original Situation (Enterprise DW, Field Force Application, Microsoft CRM, and Oracle Financials; no access for field sales staff)

THE BUSINESS OPPORTUNITY
The distributor's sales functions are supported by hundreds of field sales people located throughout the U.S. They are responsible for increasing sales of the various media brands to the retail outlets. However, they lacked access to current point-of-sale scan information because of the difficulty of bridging multiple application systems. Management decided that sales data had to be made more available to the sales people if the company was to realize its overall revenue growth and productivity targets. The company considered two things critical: delivering a solution quickly and staying within the constraints of available IT staff. Refer to Figure 3 for a representation of the original situation.

THE TECHNICAL SOLUTION
The solution to the business challenge was a virtualized application data hub enabled by Cisco Data Virtualization. The platform's data acquisition and integration capabilities allowed a merchandising portal to be built. This portal integrated data from four sources: the existing data warehouse based on Oracle, the Field Force Application based on Lotus Notes, the CRM application based on SQL Server, and the Oracle Financials ERP platform. The information was extracted from these source systems, integrated into a common business view, and provisioned to business users through a business objects reporting layer. See Figure 4 for a view of the virtualized environment.

Figure 4: The Virtualization Solution (Enterprise DW, Field Force Application, Microsoft CRM, and Oracle Financials; application data hub and data virtualization layer; merchandising portal; field sales staff)

OBSERVED BENEFITS
Several benefits were observed after the merchandising portal was implemented:
- Implementation time was 50% shorter than the company's earlier experience implementing a data warehouse or data mart.
- Field sales staff productivity increased by 10%.
- Product profitability rose because sell-through rates increased across merchandise categories.

Agile Sales Reporting (Pharmaceutical Industry)

CASE STUDY BACKGROUND
This case study describes how data virtualization technology helped a large U.S.-based pharmaceutical company implement a new agile sales reporting solution. It is based on published information from Denodo Technologies. The company sells its pharmaceuticals and related products to a global market that includes over 90 countries. The products reach the customer base through an extensive network of wholesalers and distributors who work as partners with the company.

Figure 5: Simplified View of the Original Sales Reporting Process (external wholesalers and distributors; internal functional departments; manual processing)

The senior leadership team drives business results by tracking and analyzing sales performance and market share information daily, broken down by product category and geographic area. Historically, the data collection process was very labor intensive. Sales data was collected from multiple internal source systems and from a variety of external organizations. It arrived in varied formats, including spreadsheets, PDF, and XML. After the data was collected, manual integration was required to consolidate it with the correct business context and turn the raw data into useful information. The data came from several global locations, so the units of measure for financial records and product physical properties had to be converted and standardized into a consistent, coherent data set. Final reports for the senior leadership team were created in spreadsheets and distributed to the appropriate individuals. The entire process of gathering data from multiple country locations was manual, including the necessary consolidation and conversion activities. A simplified view of the original sales data collection process is shown in Figure 5.

THE BUSINESS OPPORTUNITY
Because of the cost and effort required to support the sales reporting solution, the management team recognized that the process could not support future business requirements and that a new approach was needed. The company was aggressively pursuing revenue growth through increased sales. It planned to sell a broader portfolio of products into existing and additional market locations. This business-driven growth would place an even greater strain on the manual sales reporting solution. The team concluded that the growth initiative could not be managed effectively without a new, automated sales reporting solution. The leadership team set annual revenue growth targets of 20%. Attaining this objective required timely, accurate feedback about sales results. Efforts to meet the new targets drove further organizational changes. These eventually created demand for new and modified reports based on additional external web-based data sources. It became evident to the leadership team that an agile sales reporting solution was required. It had to connect to new and evolving data sources, keep pace with the growth in demand, improve accuracy, and enable both historical and real-time access to key data.

THE TECHNICAL SOLUTION
A data virtualization platform was evaluated and selected as the foundation for meeting the organization's sales monitoring requirements. Automated processes were implemented on the data virtualization platform to access the disparate data sources of the external wholesalers and distributors. External semi-structured and web data were converted into virtual, structured views that internal integration and reporting processes could access.
Figure 6: Conceptual View of the Data Virtualization Solution (external partner wholesalers and distributors; internal functional departments and backend systems; data virtualization layer with data analyst and data steward roles; sales managers and analysts)

Data integration processes were implemented with graphical tools. These tools defined data mappings, structures, cleansing rules, transformation rules, and data validation rules. Data lineage and impact analysis reports supported a combined automated and manual data maintenance process. The external data acquired by the data virtualization layer was then integrated with internal enterprise systems to provide a comprehensive view of sales activities and results. This integration also took place in the virtualization layer. Data delivery was implemented by supporting multiple business views of the integrated data. Some business views were physically stored in a cache, while others were implemented virtually. Data is delivered in several modes, including real-time and batch. It is also provided to downstream applications in several formats:
- a virtual database
- exports to a data mart
- delivery to common BI and reporting tools
- web services and web portals
- HTML
- spreadsheets
- mobile devices

A conceptual view of the new Agile Reporting Solution is shown in Figure 6.

OBSERVED BENEFITS
The company realized business and technical benefits by implementing the new Agile Reporting Solution. Executives received an expanded set of information for tracking and analyzing global sales activities. The data was richer in content, higher in quality, and more timely. New sources of data from partners and the public web site provided additional insights to the sales analytics process. New reports were developed 60% faster, and change requests were met within a few days. The number of reporting projects doubled. Even so, IT used 40% less business analyst time by enabling self-service reporting and reusable data services. Additional technical benefits included seamless integration with data marts, common BI tools, and the company's existing middleware. After the agile reporting solution was in place, further business reporting applications were built on the data virtualization platform:
- HR reporting
- real-time inventory tracking
- compliance reporting
- physician master data management
- data warehouse extensions
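The case study mentions that some business views were physically stored in a cache while others stayed purely virtual. The sketch below illustrates that distinction with a hypothetical `BusinessView` class. With no time-to-live, the view always queries its sources. With a time-to-live, it reuses a cached result until that result expires. The TTL-based refresh policy is an assumption chosen for brevity. Commercial platforms offer their own cache refresh options, and those are not modeled here.

```python
import time
from typing import Any, Callable, Optional


class BusinessView:
    """A named business view over a query function (hypothetical sketch).

    ttl_seconds=None -> purely virtual: every read queries the sources.
    ttl_seconds=N    -> cached: a stored result is reused until it is older than N seconds.
    """

    def __init__(self, name: str, query: Callable[[], Any],
                 ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self._query = query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._cached: Any = None
        self._cached_at: Optional[float] = None

    def read(self) -> Any:
        if self._ttl is None:
            return self._query()  # virtual: always go to the sources
        now = self._clock()
        if self._cached_at is None or now - self._cached_at > self._ttl:
            self._cached = self._query()  # (re)materialize the cached result
            self._cached_at = now
        return self._cached
```

A slowly changing summary, such as a daily market-share view, might be given a TTL. A view that must always be current, such as real-time inventory, would be left virtual.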

Subscriber Analytics (Telecommunications Industry)

CASE STUDY BACKGROUND
This case study considers a telecommunications company that implemented a data virtualization solution based on Cisco Data Virtualization's products. The company was working to improve revenue by increasing customer satisfaction and decreasing attrition. This effort required information that was not integrated. Customer service, marketing, and customer care managers needed a holistic view of customer preferences, profiles, and activities, but this information was fragmented across internal and external systems. Without it, the business could not meet its revenue growth and customer attrition targets or support its various processes and initiatives.

Figure 7: The Original Situation (on-premise and cloud-based applications; slow and inflexible data integration process; operational reports)

THE BUSINESS OPPORTUNITY
Internal business teams and external partners both needed faster access to integrated subscriber data. They also needed that data presented in a comprehensive business view so they could improve operations and customer satisfaction. The required data was spread across multiple, diverse application systems. The internal SAP system stored financial records, customer care activities, and customer billing information. The cloud-based salesforce.com service stored data supporting marketing and sales activities. Data availability and integration challenges made it very difficult to gain a deeper understanding of subscriber patterns. Refer to Figure 7 for a view of the original situation.

THE TECHNICAL SOLUTION
The technical solution consisted of a virtual data layer implemented with Cisco Data Virtualization's data virtualization platform. The virtual data layer integrated data from multiple, disparate sources across the extended enterprise, including the SAP environment and the salesforce.com environment. The result was a unified business view of all subscriber information needed to drive the desired revenue growth. To distribute the information, all business teams were given access to portals, reports, and extended search capabilities. See Figure 8 for a depiction of the virtualized solution.

Figure 8: The Virtual Data Layer (virtual data layer; timely and flexible integration; integrated reporting)

OBSERVED BENEFITS
After implementation, the effort required was approximately 50% of what traditional integration projects had previously needed. Process efficiency improved by 25%, and revenue increased measurably.

Customer and Competitor Intelligence (Cable Industry)

CASE STUDY BACKGROUND
This case study presents the situation faced by a major cable operator in Spain and how data virtualization helped it improve business performance. The description is based on material provided by Denodo Technologies. The cable operator serves a growing domestic market of over 1 million homes and businesses. Its data communications services include telephone, high-speed internet, video conferencing, digital messaging, and television access. The company has a market share of just over 50%.

Figure 9: Original Information Flows (manual processing; customer account analytics for customers and customer account managers; competitive intelligence reports for internal business teams)

THE BUSINESS OPPORTUNITY
The cable operator's business model is information intensive. Requirements were identified for giving existing customers and internal business teams access to different types of real-time data, including customer activity and account information. With this access, customers can manage their own accounts, make payments, and select different types of services. Internal business teams use the information to improve the customer support process. They also use it to run marketing campaigns targeted at customers who may purchase additional services. In addition, the internal teams need to access and monitor competitor information within their market area, including competitive pricing details, marketing promotions, and the availability of competitive service offerings. Meeting these customer and competitor information requirements means extracting and integrating data from many different sources, both internal and external to the organization. There is also a defined requirement to access and consolidate structured and unstructured data from internal and external sources. A further requirement is to integrate semi-structured data extracted from web sites to monitor competitive offerings. The initial information flows are shown in Figure 9.

THE TECHNICAL SOLUTION
The technical solution implemented to address these business requirements is based on the Enterprise Data Mashups platform from Denodo Technologies. This platform supports acquiring large volumes of data from the internet and integrating it with internal enterprise data. The platform unifies web data extraction with the integration of data from standalone internal applications. Unstructured data from the web is searched, indexed, and integrated with the internal data. Valuable unstructured data also includes customer comments held in internal applications and other documents. Competitors' web sites are scanned automatically to obtain relevant competitive data, which is then integrated with the internal data. Query structures can be defined over unstructured data, so business users can access useful customer and competitor information that originates as text and comments. A single view of customer data, both structured and unstructured, is integrated from many different sources and provided to business teams for analysis. Refer to Figure 10 to see how the information flows are managed using the virtualization solution.

Figure 10: Using the Data Virtualization Layer (integration; automated web extraction and integration from competitors; customer account analytics for customers and customer account managers; competitive intelligence reports for internal business teams)

OBSERVED BENEFITS
Several direct business benefits have been identified from this implementation. Customers can now access all of their account data in real time, enabling self-sufficient account management. This has reduced the inbound call center load, which lowers costs and leaves customers more satisfied and empowered. The internal business teams now have better customer and competitor insight, which lets them anticipate customer needs and new product offerings or features. New products now reach the market more rapidly and proactively. Computing infrastructure costs are kept at acceptable levels, and more effective cross-selling and targeted marketing campaigns are carried out online.

Insurance Claims Research and Analytics (Health Insurance Industry)

CASE STUDY BACKGROUND
This case study is based on an implementation of data virtualization technology from Cisco Data Virtualization at a company in the health insurance industry. The company provides health care insurance financing, health care services, and long-term care insurance nationwide. Claims processing is a critical activity that supports both customer service and corporate profitability.

Figure 11: The Original Situation (cloud data sources, claims data marts, internal business applications, the enterprise data warehouse, and external medical vendor applications; requests for information and delivered information; information access and delivery through custom-developed SAS queries and reports)

THE BUSINESS OPPORTUNITY
Claims must be analyzed online while the analyst is on the phone with the customer. This level of service is core to the company's strategy of providing superior customer service in a competitive marketplace. Unfortunately, as the insurance business evolves, the vendor-based claims processing system no longer holds all the information that claims analysis needs. The missing data must therefore be acquired from other sources and integrated with the existing application data. For example, Medicare legislation evolves over time. These changes can create requirements for additional data elements that the claims systems have not previously stored. To support the changing requirements, the data must be found in a third-party database and integrated with the existing data while still meeting the real-time availability requirement. See Figure 11 for a view of the original situation.

THE TECHNICAL SOLUTION
The technical solution, enabled by Cisco Data Virtualization's data virtualization platform, consists of a virtual Claims Research Data Layer. This single data layer acquires the necessary data elements from internal applications or external databases and integrates them into a unified view in real time. Claims analysts can then access the integrated claims data and carry out claims analysis and customer support processes. The layer can join, filter, and cache data from high-volume sources quickly, in real time, and on a self-service basis. This capability supports the analysis of employer groups and episode treatment groups and the assignment of risk categories to this data. See Figure 12 for a representation of the virtualization approach.

Figure 12: The Data Virtualization Solution (the same sources; virtual claims research layer; self-service reports, customer and broker reports, analytics from SAS, and reports and OLAP)

OBSERVED BENEFITS
After the solution was implemented, delivery was approximately twice as fast as with traditional approaches. The real-time telephone approval process led to higher customer satisfaction. Both the business and IT teams also became more agile in responding to changing process and data requirements.

Automated Web Data Extraction
Financial Services Industry
CASE STUDY BACKGROUND
This case study describes a debt collection company operating in Spain and how data virtualization technology from Denodo Technologies helped it streamline some of its processes. The company is a leader in debt collection management. It maintains a network of 24 offices and 200 collection agents managing more than 500,000 files, and offers an integrated set of debt collection services that help its clients accelerate late payments or collect on loans that are in arrears.
The collection agents use telephone directories to locate the individuals who are the subject of their collection activities, and the company's success depends on the computer systems used to identify and locate those people. Some debtors who do not plan to make payments attempt to hide and become difficult to locate using conventional methods. This segment of the debt collection market shows the most potential for improvement.
The financial impact on the company's clients is very large: efficient and timely collections management significantly affects an organization's cash flow. If the collection company can improve its success rate, it can offer additional value to its customers.
THE BUSINESS OPPORTUNITY
A significant portion of the debt collection effort goes into locating individuals who are hiding from their debt obligations. Obtaining valid telephone numbers for these individuals is the starting point of the process.

Before the new solution, collection agents manually navigated multiple computer screens and directories. The directories could be offline documents or online directories on websites. Manually browsing website directories and offline documents is time-consuming and laborious. Faster tracking and locating of individuals, using automation technology combined with information on the internet, would drive higher business success rates and, eventually, happier clients whose debts had been collected for them. See Figure 13 for a view of the original process.
Figure 13: The Original Process (diagram: debt collection analysts and agents carry out manual search and analysis across social media, directories, Google, blogs and the CRM system; labeled "Slow, Time Consuming")
THE TECHNICAL SOLUTION
A data virtualization solution was implemented based on automated web extraction processes. Agents know that individuals who frequently use the internet for a variety of reasons leave a trail of their activities. This footprint can be collected by automated processes enabled in the technology platform. Public directories, social media sites, blogs, search engines and government portals all provide fragments of the trail left by individuals. Software modules emulate the different navigation processes used to access key websites, and these automated processes produced better results than manual searches. The contact information, including potential phone numbers, is uploaded into the collection company's internal systems. The newly discovered contact data is then integrated with other internal systems to support the contact and collection process carried out by the company's agents. Refer to Figure 14 for a depiction of the virtualized environment.
Figure 14: The Virtualized Environment (diagram: social media, Google, blogs, the CRM system and other sources are reached through a data virtualization external data access layer that serves debt collection analysts and agents; labeled "Automated, Productive and Comprehensive")
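The consolidation step, in which phone-number fragments found on several websites are reconciled before being loaded into internal systems, might look roughly like the Python sketch below. It performs no web access; the fragments are assumed to have already been returned by the extraction modules. The fragment format, the file identifiers, the `candidate_phones` CRM field and the rule of normalizing to a nine-digit Spanish national number (dropping a leading 34 country code) are illustrative assumptions, not details from the case study.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical fragment shape: (source name, debtor file id, raw phone text).
Fragment = Tuple[str, str, str]


def normalize_phone(raw: str) -> str:
    """Keep digits only; assume Spanish numbers and drop a leading 34 country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("34"):
        digits = digits[2:]
    return digits


def consolidate(fragments: List[Fragment]) -> Dict[str, List[Tuple[str, int]]]:
    """Group candidate numbers per file, ranked by how many distinct sources report them."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for source, file_id, raw in fragments:
        phone = normalize_phone(raw)
        if len(phone) == 9:  # discard fragments that are not plausible national numbers
            seen[file_id][phone].add(source)
    return {
        file_id: sorted(
            ((phone, len(sources)) for phone, sources in phones.items()),
            key=lambda item: (-item[1], item[0]),
        )
        for file_id, phones in seen.items()
    }


def update_crm(crm: Dict[str, dict], candidates: Dict[str, List[Tuple[str, int]]]) -> None:
    """Attach ranked candidate numbers to existing CRM records (hypothetical schema)."""
    for file_id, ranked in candidates.items():
        if file_id in crm:
            crm[file_id]["candidate_phones"] = [phone for phone, _ in ranked]


if __name__ == "__main__":
    fragments = [
        ("public_directory", "F-1001", "+34 912 345 678"),
        ("social_media", "F-1001", "912-345-678"),
        ("blog", "F-1001", "611 222 333"),
        ("government_portal", "F-2002", "12 34"),
    ]
    crm = {"F-1001": {"name": "Debtor A"}}
    update_crm(crm, consolidate(fragments))
    print(crm)
```

Ranking by the number of independent sources is one simple way to favor corroborated numbers; the case study does not say how the company prioritized candidate numbers.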
OBSERVED BENEFITS
Search processes are now carried out at lower cost and deliver results more quickly. Errors are reduced, and the new contact data is integrated with corporate systems to yield a higher rate of valid telephone numbers. The productivity of agents investigating contact data for delinquent individuals has increased by a factor of 40, as measured by the change in the number of locate files managed per hour with the new automated process. The overall rate of positive debt collections has increased by 5%.

Business Transformation
Pharmaceuticals Industry
CASE STUDY BACKGROUND
This case study considers the situation faced by a large global pharmaceutical company, with data virtualization technology provided by Cisco Data Virtualization. The company's senior leadership team developed an aggressive revenue growth strategy. The annual revenue growth targets it set required a transformation of how the firm carried out its basic research, product development, sales, marketing and customer service functions.
THE BUSINESS OPPORTUNITY
The strategy developed to support the aggressive sales growth targets was defined in three phases. The first phase focused on three business factors: improved agility, increased value and increased efficiency. All three depended on information availability. The second phase focused on implementing data virtualization technologies to deliver relevant, useful data to 2,000 R&D business users. The third phase focused on how advanced visualization techniques could be applied to the virtual data layer to generate insights into the product development cycle that would drive future revenue growth.
Figure 15: Business Transformation Enabled with Data Virtualization (diagram: LOB applications and data warehouses feed a virtual data layer, supported by governance standards and SOA services, that provides multiple information delivery styles and formats; callouts: expanded portfolio information scope (valuation, finance, commercial), expanded data flows via partnership with business units and platform lines, and expansion to novel data formats and content; capabilities: agile change management, enhanced integration and access, data discovery, infrastructure consolidation and infrastructure scalability)
THE TECHNICAL SOLUTION
The technical solution enabled by Cisco Data Virtualization provided a virtual data layer that integrated data from marketing, finance, clinical trials, research activities and external internet sites.

The virtual data layer acquired the data and consolidated it into a set of business and medical views that gave researchers, financial planners and marketing professionals easy access. The data in the virtual layer was then presented using advanced visual analytics to find relevant patterns and trends, which provided the insights needed to create products that continued to drive revenue growth. Refer to Figure 15 for a conceptual view of the data virtualization solution.
OBSERVED BENEFITS
Some of the observed benefits of the data virtualization environment are listed below.
- The revenue potential of a single drug increased by hundreds of millions of dollars.
- Data integration development was five times faster than with traditional methods.
- Integration of useful data for staff could be implemented in less than a week, versus months with previous approaches.
- Business value increased by driving product development.
- Business users were empowered with more information and able to make timely decisions based on market trends and research results. This empowerment is a key enabler of agile business performance.

Real-Time Operational Intelligence
Telecommunications Industry
CASE STUDY BACKGROUND
This case study is based on one of the largest telecommunications operators in the world and how it used data virtualization products from Denodo Technologies to improve its performance. The company is a dominant player, serving 278 million customers throughout Europe and Latin America. To provide high-quality service to those customers, its business units depend on information delivered by IT systems. To fulfill its obligations to client departments, the central IT department delivers new applications, maintains existing ones and supports key product development and business improvement initiatives. IT management has adopted and implemented the ITIL framework of best practices to deliver high-quality service at lower cost to the rest of the organization.
Figure 16: The Original Situation
THE BUSINESS OPPORTUNITY
To execute processes and serve its clients within the ITIL framework, the IT management team needs performance measures delivered in a timely manner so it can address issues and maintain the desired levels of client service. Operational business intelligence (BI) was identified as the mechanism for achieving this service delivery objective. Implementing BI in an operational context requires real-time extraction, integration, transformation and presentation of data. Operational BI enables managers to monitor and analyze IT service levels in real time, consistent with the ITIL framework.

Raw measurement data must be collected from multiple IT and network-related systems, including application monitoring, network inventory, project status, incident management and service level management. Although each of these systems had its own dashboard, it generally took 20 people each month to extract and integrate the information needed to produce the ITIL performance management reports, with about one week of effort per report. IT management recognized that the lack of real-time views into overall performance hampered its ability to respond to incidents and deal with changing priorities and conditions. Managers were limited in their ability to reach target performance levels because the measurement data lagged and was typically at least a month old. See Figure 16 for a view of the original situation.
THE TECHNICAL SOLUTION
The data virtualization platform enabled a flexible, unified solution that was easy to build and maintain. Data extracted from the disparate inventory systems, version control applications, system probe logs and project management systems fed a real-time portal that gave all managers a unified view of the relevant IT processes to access and analyze. The data virtualization components were able to abstract, generalize and find patterns in the data, enabling semi-structured data to be combined with structured data. The resulting data sets were then integrated with source systems from other departments. Policies and business rules were extracted from unstructured documents to design suitable metrics and associated targets.
Figure 17: The Virtualized Solution (diagram: automated data retrieval and integration through data virtualization data services, delivered to IT managers and support staff; labeled "On-Demand and Responsive")
The solution leveraged the platform's web automation capability to access data about the external environment and resources from third-party partners and service providers.
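A minimal Python sketch of combining semi-structured data with structured data, in the spirit of the probe logs and inventory systems described above: probe log lines in an assumed `key=value` format are parsed, joined to a structured inventory table by host, and rolled up into per-service availability that is compared with a target. The log format, host and service names, and availability targets are hypothetical; in the case study, targets were derived from policies and business rules in unstructured documents.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_probe_line(line: str) -> Dict[str, str]:
    """Parse an assumed log format: a timestamp followed by key=value pairs."""
    parts = line.split()
    if not parts:
        return {}
    record = {"timestamp": parts[0]}
    for pair in parts[1:]:
        key, _, value = pair.partition("=")
        record[key] = value
    return record


def availability_by_service(
    log_lines: Iterable[str], inventory: List[Dict[str, str]]
) -> Dict[str, float]:
    """Join parsed probe records to the inventory; return percent of UP probes per service."""
    host_to_service = {row["host"]: row["service"] for row in inventory}
    up: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for line in log_lines:
        record = parse_probe_line(line)
        service = host_to_service.get(record.get("host", ""))
        if service is None:
            continue  # blank line or a host that is not in the inventory
        total[service] += 1
        if record.get("status") == "UP":
            up[service] += 1
    return {service: 100.0 * up[service] / total[service] for service in total}


def sla_report(
    availability: Dict[str, float], targets: Dict[str, float]
) -> Dict[str, Tuple[float, bool]]:
    """Pair each service's availability with whether it meets its (hypothetical) target."""
    return {s: (pct, pct >= targets.get(s, 0.0)) for s, pct in availability.items()}


if __name__ == "__main__":
    logs = [
        "2015-06-01T10:00 host=app01 status=UP latency_ms=120",
        "2015-06-01T10:05 host=app01 status=UP latency_ms=135",
        "2015-06-01T10:10 host=app01 status=DOWN",
        "2015-06-01T10:15 host=app01 status=UP latency_ms=110",
        "2015-06-01T10:00 host=crm01 status=UP latency_ms=90",
        "2015-06-01T10:00 host=lab99 status=DOWN",
    ]
    inventory = [
        {"host": "app01", "service": "Billing"},
        {"host": "crm01", "service": "CRM"},
    ]
    targets = {"Billing": 99.5, "CRM": 99.0}
    print(sla_report(availability_by_service(logs, inventory), targets))
```

The join key (host) is what lets loosely formatted log records be attributed to business services; probes for hosts missing from the inventory are simply skipped here, whereas a production solution would more likely flag them for review.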

The solution implemented the integration and service layer needed to solve the ITIL measurement and monitoring challenge in real time. It delivers real-time views of the relevant metrics, with drill-down to lower levels of detail so that management can direct its attention and respond appropriately. Figure 17 shows a representation of the virtualized environment.
OBSERVED BENEFITS
A variety of benefits have been observed since the solution was implemented. Operational cost savings have been realized. Access to real-time measurements has raised the level of service the IT department provides to the organization. The time to generate reports containing the metrics has been reduced from one week to one day. Manual data integration effort was reduced by 90%, and the personnel freed up were redirected to monitoring and problem solving, which contributed to improved service levels. The delivery of IT services has become more responsive to business needs, and new capabilities, such as virtual dashboards that summarize other dashboards, have provided a broader overall process perspective.
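The drill-down capability and the dashboards that summarize other dashboards mentioned above both come down to aggregating the same detailed metrics at different levels. The following Python sketch is hypothetical: it assumes metric rows with `department`, `service` and `open_incidents` columns, names invented purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical metric rows as a virtual layer might expose them.
Metric = Dict[str, object]


def summarize(rows: List[Metric], level: List[str]) -> Dict[Tuple[object, ...], int]:
    """Roll up open incidents to the requested level of detail (list of column names)."""
    totals: Dict[Tuple[object, ...], int] = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in level)
        totals[key] += int(row["open_incidents"])
    return dict(totals)


def drill_down(
    rows: List[Metric], filters: Dict[str, object], level: List[str]
) -> Dict[Tuple[object, ...], int]:
    """Restrict to the rows matching the filters, then summarize at a finer level."""
    subset = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return summarize(subset, level)


if __name__ == "__main__":
    rows = [
        {"department": "Network", "service": "Backbone", "open_incidents": 3},
        {"department": "Network", "service": "Wireless", "open_incidents": 2},
        {"department": "Applications", "service": "Billing", "open_incidents": 4},
        {"department": "Applications", "service": "CRM", "open_incidents": 1},
    ]
    print(summarize(rows, ["department"]))  # top-level summary view
    print(drill_down(rows, {"department": "Network"}, ["department", "service"]))
```

Because every level is computed from the same detailed rows, the summary and drill-down views stay consistent with each other; that consistency is one of the practical arguments for building such dashboards on a shared virtual layer rather than on separate extracts.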


Appendix B Bibliography and References


Data Architecture: From Zen to Reality. Tupper. Elsevier, 2011.
Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture. Giordano. IBM Press, 2011.
Data Strategy. Adelman, Moss and Abai. Addison-Wesley, 2005.
Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Davis and Eve. Westminster Promotions, 2011.
Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses. van der Lans. Morgan Kaufmann, 2012.
Managing Data in Motion: Data Integration Best Practice Techniques and Technologies. Reeve. Morgan Kaufmann, 2013.
Managing Your Business Data. Kushner and Villar. Racom Books, 2009.
Master Data Management. Loshin. Morgan Kaufmann, 2009.
Principles of Data Management: Facilitating Information Sharing. Gordon. British Computer Society, 2007.
Tapping Into Unstructured Data. Inmon and Nesavich. Prentice Hall, 2007.



More information

Trends In Data Quality And Business Process Alignment

Trends In Data Quality And Business Process Alignment A Custom Technology Adoption Profile Commissioned by Trillium Software November, 2011 Introduction Enterprise organizations indicate that they place significant importance on data quality and make a strong

More information

Attunity Integration Suite

Attunity Integration Suite Attunity Integration Suite A White Paper February 2009 1 of 17 Attunity Integration Suite Attunity Ltd. follows a policy of continuous development and reserves the right to alter, without prior notice,

More information

Suresh Chandrasekaran, SVP North America and APAC Pablo Alvarez, Sales Engineer Denodo Technologies

Suresh Chandrasekaran, SVP North America and APAC Pablo Alvarez, Sales Engineer Denodo Technologies Suresh Chandrasekaran, SVP North America and APAC Pablo Alvarez, Sales Engineer Denodo Technologies March 9, 2011 Agenda Data Virtualization What Is It? 2011: The Tipping Point Business Needs and Analyst

More information

Exploring the Synergistic Relationships Between BPC, BW and HANA

Exploring the Synergistic Relationships Between BPC, BW and HANA September 9 11, 2013 Anaheim, California Exploring the Synergistic Relationships Between, BW and HANA Sheldon Edelstein SAP Database and Solution Management Learning Points SAP Business Planning and Consolidation

More information

redesigning the data landscape to deliver true business intelligence Your business technologists. Powering progress

redesigning the data landscape to deliver true business intelligence Your business technologists. Powering progress redesigning the data landscape to deliver true business intelligence Your business technologists. Powering progress The changing face of data complexity The storage, retrieval and management of data has

More information

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10

More information

Data Integration for the Real Time Enterprise

Data Integration for the Real Time Enterprise Executive Brief Data Integration for the Real Time Enterprise Business Agility in a Constantly Changing World Overcoming the Challenges of Global Uncertainty Informatica gives Zyme the ability to maintain

More information

Lection 3-4 WAREHOUSING

Lection 3-4 WAREHOUSING Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing

More information

Framework for Data warehouse architectural components

Framework for Data warehouse architectural components Framework for Data warehouse architectural components Author: Jim Wendt Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 04/08/11 Email: [email protected] Abstract:

More information

Realizing business flexibility through integrated SOA policy management.

Realizing business flexibility through integrated SOA policy management. SOA policy management White paper April 2009 Realizing business flexibility through integrated How integrated management supports business flexibility, consistency and accountability John Falkl, distinguished

More information

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining

BUSINESS INTELLIGENCE. Keywords: business intelligence, architecture, concepts, dashboards, ETL, data mining BUSINESS INTELLIGENCE Bogdan Mohor Dumitrita 1 Abstract A Business Intelligence (BI)-driven approach can be very effective in implementing business transformation programs within an enterprise framework.

More information

Business Intelligence

Business Intelligence Transforming Information into Business Intelligence Solutions Business Intelligence Client Challenges The ability to make fast, reliable decisions based on accurate and usable information is essential

More information

Data Warehouse (DW) Maturity Assessment Questionnaire

Data Warehouse (DW) Maturity Assessment Questionnaire Data Warehouse (DW) Maturity Assessment Questionnaire Catalina Sacu - [email protected] Marco Spruit [email protected] Frank Habers [email protected] September, 2010 Technical Report UU-CS-2010-021

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES Out-of-box integration with databases, ERPs, CRMs, B2B systems, flat files, XML data, LDAP, JDBC, ODBC Knowledge

More information

Nothing in this job description restricts management's right to assign or reassign duties and responsibilities to this job at any time.

Nothing in this job description restricts management's right to assign or reassign duties and responsibilities to this job at any time. H22111, page 1 Nothing in this job description restricts management's right to assign or reassign duties and responsibilities to this job at any time. DUTIES This is a non-career term job at the Metropolitan

More information

ENTERPRISE EDITION ORACLE DATA SHEET KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR

ENTERPRISE EDITION ORACLE DATA SHEET KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES AND BENEFITS ORACLE DATA INTEGRATOR ENTERPRISE EDITION OFFERS LEADING PERFORMANCE, IMPROVED PRODUCTIVITY, FLEXIBILITY AND LOWEST TOTAL COST OF OWNERSHIP

More information

Master Data Management. Zahra Mansoori

Master Data Management. Zahra Mansoori Master Data Management Zahra Mansoori 1 1. Preference 2 A critical question arises How do you get from a thousand points of data entry to a single view of the business? We are going to answer this question

More information

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs Database Systems Journal vol. III, no. 1/2012 41 SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs 1 Silvia BOLOHAN, 2

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management

More information

Enabling Data Quality

Enabling Data Quality Enabling Data Quality Establishing Master Data Management (MDM) using Business Architecture supported by Information Architecture & Application Architecture (SOA) to enable Data Quality. 1 Background &

More information

STRATEGIES ON SOFTWARE INTEGRATION

STRATEGIES ON SOFTWARE INTEGRATION STRATEGIES ON SOFTWARE INTEGRATION Cornelia Paulina Botezatu and George Căruţaşu Faculty of Computer Science for Business Management Romanian-American University, Bucharest, Romania ABSTRACT The strategy

More information

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2 Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on

More information

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server Extending Hyperion BI with the Oracle BI Server Mark Ostroff Sr. BI Solutions Consultant Agenda Hyperion BI versus Hyperion BI with OBI Server Benefits of using Hyperion BI with the

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

What is Data Virtualization? Rick F. van der Lans, R20/Consultancy

What is Data Virtualization? Rick F. van der Lans, R20/Consultancy What is Data Virtualization? by Rick F. van der Lans, R20/Consultancy August 2011 Introduction Data virtualization is receiving more and more attention in the IT industry, especially from those interested

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated

More information

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT POINT-AND-SYNC MASTER DATA MANAGEMENT 04.2005 Hyperion s new master data management solution provides a centralized, transparent process for managing critical

More information

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)

More information

Delivering information you can trust June 2007. IBM Multiform Master Data Management: The evolution of MDM applications

Delivering information you can trust June 2007. IBM Multiform Master Data Management: The evolution of MDM applications June 2007 IBM Multiform Master Data Management: The evolution of MDM applications Page 2 Contents 2 Traditional approaches to master data management 2 The enterprise application 4 The data warehouse 5

More information

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006 Practical Considerations for Real-Time Business Intelligence Donovan Schneider Yahoo! September 11, 2006 Outline Business Intelligence (BI) Background Real-Time Business Intelligence Examples Two Requirements

More information

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers 60 Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative

More information

Harness the value of information throughout the enterprise. IBM InfoSphere Master Data Management Server. Overview

Harness the value of information throughout the enterprise. IBM InfoSphere Master Data Management Server. Overview IBM InfoSphere Master Data Management Server Overview Master data management (MDM) allows organizations to generate business value from their most important information. Managing master data, or key business

More information

BIM the way we do it. Data Virtualization. How to get your Business Intelligence answers today

BIM the way we do it. Data Virtualization. How to get your Business Intelligence answers today BIM the way we do it Data Virtualization How to get your Business Intelligence answers today 2 BIM the way we do it The challenge: building data warehouses takes time, but analytics are needed urgently

More information

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8 Enterprise Solutions Data Warehouse & Business Intelligence Chapter-8 Learning Objectives Concepts of Data Warehouse Business Intelligence, Analytics & Big Data Tools for DWH & BI Concepts of Data Warehouse

More information

INFORMATION MANAGEMENT. Transform Your Information into a Strategic Asset

INFORMATION MANAGEMENT. Transform Your Information into a Strategic Asset INFORMATION MANAGEMENT Transform Your Information into a Strategic Asset The information explosion in all organizations has created a challenge and opportunity for enterprises. When properly managed, information

More information

An Oracle White Paper March 2014. Best Practices for Real-Time Data Warehousing

An Oracle White Paper March 2014. Best Practices for Real-Time Data Warehousing An Oracle White Paper March 2014 Best Practices for Real-Time Data Warehousing Executive Overview Today s integration project teams face the daunting challenge that, while data volumes are exponentially

More information

Data Ownership and Enterprise Data Management: Implementing a Data Management Strategy (Part 3)

Data Ownership and Enterprise Data Management: Implementing a Data Management Strategy (Part 3) A DataFlux White Paper Prepared by: Mike Ferguson Data Ownership and Enterprise Data Management: Implementing a Data Management Strategy (Part 3) Leader in Data Quality and Data Integration www.flux.com

More information

Enterprise Enabler and the Microsoft Integration Stack

Enterprise Enabler and the Microsoft Integration Stack Enterprise Enabler and the Microsoft Integration Stack Creating a complete Agile Enterprise Integration Solution with Enterprise Enabler Mike Guillory Director of Technical Development Stone Bond Technologies,

More information

A business intelligence agenda for midsize organizations: Six strategies for success

A business intelligence agenda for midsize organizations: Six strategies for success IBM Software Business Analytics IBM Cognos Business Intelligence A business intelligence agenda for midsize organizations: Six strategies for success A business intelligence agenda for midsize organizations:

More information

Course 103402 MIS. Foundations of Business Intelligence

Course 103402 MIS. Foundations of Business Intelligence Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:

More information

An Overview of SAP BW Powered by HANA. Al Weedman

An Overview of SAP BW Powered by HANA. Al Weedman An Overview of SAP BW Powered by HANA Al Weedman About BICP SAP HANA, BOBJ, and BW Implementations The BICP is a focused SAP Business Intelligence consulting services organization focused specifically

More information

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide IBM Cognos Business Intelligence (BI) helps you make better and smarter business decisions faster. Advanced visualization

More information