DIGGING DEEPER: What Really Matters in Data Integration Evaluations?

It's no surprise that when customers begin the daunting task of comparing data integration products, the similarities seem to outweigh the differences. The corporate web sites and other marketing material look the same, the user interfaces look similar, the demos seem alike, and the feature/function lists use many of the same terms, leaving customers confused. Customers often believe the decision will become clearer after the vendors they invite fill out a matrix or spreadsheet. But those spreadsheets are filled out by professionals, and they usually come back making the products look even more similar than before. Vendors have become very adept at positioning their products, making it hard for customers to uncover the important differences.

Customers then shift their focus to cost, assuming it will be the deciding factor. But they typically focus disproportionately on the upfront cost of the software license and initial development, ignoring the far greater costs over the life of the project (often five to eight years) and the overall risk of choosing one vendor over another. By peeling back the covers to understand the key underlying factors, both technical and non-technical, customers can begin to see large differences that can and should shape their decision about which product to use.

Digging Deeper on Key Technical Factors

Change Management - In any large-scale data integration project, the only certainty is change: changing user requirements, changing business rules, changing data definitions, new data sources, and upgraded software versions. The ultimate predictor of success for a data integration project is the plan for managing these changes through the project lifecycle. If not planned for appropriately, the cost of handling changes to the data integration jobs over the five-to-eight-year lifecycle of a typical IT system will be the most expensive and time-consuming part of the project, far outpacing initial development and software license costs. Products differ dramatically in how they help customers manage change through a data integration project lifecycle, yet this factor is often overlooked in evaluations despite its significant impact on overall cost and success. Evaluators often believe they have covered it under the nebulous concept of ease of use. But ease of use typically focuses on initial development and fails to consider the work and rework required to deal with the inevitable change that will come during the project.

Architectural Foundation - Product architecture provides the foundation on which all data integration capabilities are delivered. Looking across the market, the market leaders are coalescing around a metadata-centric approach, while many niche products still employ more dated, compiled-code approaches. When comparing products, it is critical that customers spend the time to understand how each product's underlying architecture and overall approach to data integration will affect their ability to deal with future change. The market leaders build on a central metadata repository. This approach has a number of significant advantages, but one of the most important is that it generally requires minimal, if any, custom coding to build the necessary jobs. In a metadata-centric architecture, the vast majority of development is done through a GUI, with very limited need to go outside the product for extensions and customization. Jobs and business rules are automatically captured in the metadata, providing a central place for everyone to view standard business definitions. The other common approach is to compile code into an executable rather than store definitions centrally in a metadata repository. This approach typically requires developers with strong coding skills and can be very time-consuming and expensive to use and support, especially when dealing with updates and new releases. As discussed below, it also brings several other costly drawbacks.
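To make the contrast concrete, the sketch below shows what a metadata-centric job can look like: the job is expressed as declarative metadata and executed by a generic engine, rather than written as custom, compiled code. The source, rule, and target names, and the run_job engine itself, are hypothetical illustrations, not any particular vendor's implementation.

```python
import csv

# A minimal sketch of a metadata-driven job: the job definition is data, not code.
# All names (source, rules, target) are hypothetical examples.
job_metadata = {
    "name": "load_customer_dim",
    "source": {"type": "csv", "path": "customers.csv"},
    "rules": [
        {"rule": "trim", "column": "name"},        # rules are reusable, centrally defined
        {"rule": "uppercase", "column": "country"},
    ],
    "target": {"type": "csv", "path": "customer_dim.csv"},
}

# Central rule library; a change here propagates to every job that references the rule.
RULES = {
    "trim": lambda value: value.strip(),
    "uppercase": lambda value: value.upper(),
}

def run_job(metadata):
    """Generic engine: interprets the metadata, so changing a job is a metadata
    edit rather than a code change followed by a recompile and redeploy."""
    with open(metadata["source"]["path"], newline="") as src:
        rows = list(csv.DictReader(src))
    for step in metadata["rules"]:
        apply_rule = RULES[step["rule"]]
        for row in rows:
            row[step["column"]] = apply_rule(row[step["column"]])
    with open(metadata["target"]["path"], "w", newline="") as tgt:
        writer = csv.DictWriter(tgt, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

# run_job(job_metadata)  # executes the job described entirely by the metadata above
```

Because the job lives in one shared, inspectable structure, every rule, source, and target can be located, reviewed, and changed in place; the compiled-code approach gives up exactly that property.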

Reusability - Reusability is a central concept in managing change in a data integration project. As developers build objects, the ability to share and reuse them across the organization enhances productivity, and after the system goes into production, reuse can bring even more value: objects get propagated far and wide, and after deployment many jobs are likely to reuse the same rules. Almost all data integration products support reuse at some level, but it is critical to determine exactly how each product accomplishes reuse and what happens when something in those jobs needs to change. With some products, reusable objects are stored centrally in a metadata repository, allowing the developer to make a change in one place, save it back to the central repository, and propagate the change automatically to every data flow that uses that object. Managing change with products that store reusable object definitions centrally is easy and inexpensive. With other products, however, the approach to reusability is closer to copy-and-paste than to a centralized definition. An object and its definition can be copied and pasted into a new data flow, which helps productivity during initial development, but when it is time to change the definition there is no central place to make the change. This leads to the time-consuming and expensive process of determining which data flows use the original object; each one must then be opened, changed as appropriate, retested, and redeployed. In practice the rules are often complicated and many must be changed at the same time, making this an important cost factor. Yet when an evaluation matrix asks whether reuse is supported, both approaches, although dramatically different in cost for customers, will elicit an affirmative answer from the vendor. To compare the capabilities, customers must look under the covers to understand exactly how reuse is accomplished.

Impact Analysis - Just as important as reuse is the ability for developers in large-scale, complex data integration environments to see the full impact of a change before they make it. In complex data environments, changing one variable, business rule, or object can have unintended consequences downstream. A window into the impact of changes on the overall environment dramatically reduces the work required to manage change and eliminates the risk of a small change causing a major problem. Products with open metadata repositories can show users a graphical data map of the impact of any single change on the entire system before it is made. These products often integrate metadata from other relevant sources, including modeling tools, business intelligence tools, and any other available metadata, to provide complete impact analysis of any change across the whole system, dramatically reducing the cost of dealing with the effects of change. Products without a central metadata repository cannot provide a comparable capability, so users must perform much more extensive testing of any and all changes before they are deployed, especially in mission-critical systems where downtime is not an option.
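The value of impact analysis can be illustrated with a toy lineage graph. In the sketch below, which uses hypothetical object names, the metadata records which downstream objects consume which upstream objects, and a simple graph walk lists everything a proposed change could affect; commercial metadata repositories provide the same answer across their full catalog of jobs, rules, and reports.

```python
from collections import deque

# Hypothetical lineage metadata: each object maps to the objects that consume it.
dependencies = {
    "rule.standardize_address": ["job.load_customer_dim", "job.load_mail_list"],
    "job.load_customer_dim":    ["report.customer_360", "job.load_sales_mart"],
    "job.load_sales_mart":      ["report.quarterly_revenue"],
    "job.load_mail_list":       [],
}

def impact_of(changed_object, deps):
    """Breadth-first walk of the lineage graph: everything reachable downstream
    of the changed object is potentially affected and needs review or retest."""
    affected, queue = set(), deque([changed_object])
    while queue:
        current = queue.popleft()
        for downstream in deps.get(current, []):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return sorted(affected)

print(impact_of("rule.standardize_address", dependencies))
# ['job.load_customer_dim', 'job.load_mail_list', 'job.load_sales_mart',
#  'report.customer_360', 'report.quarterly_revenue']
```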
Digging Deeper into Other Key Technical Factors

Establishing Trust in the Data - A key aspect of any successful data integration project is ensuring that the analysts, or end users, working with the data produced are satisfied with its accuracy and aware of the original source of the information they are analyzing. If analysts regularly question the accuracy of the data, or lose faith in what was done to the raw data during the integration process, they will stop making decisions based on the results, and the project is certain to fall short of expectations or be considered a failure. Especially in law enforcement and intelligence applications, making sure analysts can see the lineage of where the original data came from, as well as how and when it was processed, is critical to establishing confidence in the system. This data lineage capability is enabled by a central metadata repository and is another area of value that a metadata-centric architecture delivers.

In addition, customers evaluating data integration products may be surprised to find that the phrase "data quality" means different things to different people; it is perhaps the most loaded phrase in the data integration space. When asked, every vendor will answer that they provide data quality capabilities, but those capabilities vary widely. Some vendors provide only basic pass/fail integrity checks and claim this as integrated data quality. The more common industry usage of the term refers to capabilities such as parsing, cleansing, enhancement, matching, and merging in order to de-duplicate and improve the accuracy of data. Customers need to dig into specifics when asking about data quality or they risk being led astray.
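The sketch below makes the distinction concrete by contrasting a bare pass/fail integrity check with the cleanse-and-match style of data quality described above. The record fields and the matching rule are hypothetical and deliberately simplistic; real data quality tools use much richer parsing, standardization, and probabilistic matching.

```python
import re

# Hypothetical input records: all of them "pass" an integrity check,
# yet the first two describe the same company.
records = [
    {"name": " Acme Corp. ", "phone": "703-555-0100"},
    {"name": "ACME CORP",    "phone": "(703) 555 0100"},
    {"name": "Globex LLC",   "phone": "202-555-0188"},
]

def integrity_check(record):
    """Pass/fail check only: required fields are present and non-empty."""
    return bool(record.get("name")) and bool(record.get("phone"))

def cleanse(record):
    """Standardize before matching: trim, normalize case and punctuation."""
    return {
        "name": re.sub(r"[^A-Z0-9 ]", "", record["name"].upper()).strip(),
        "phone": re.sub(r"\D", "", record["phone"]),
    }

def match_and_merge(recs):
    """De-duplicate on the cleansed phone number, keeping the first survivor."""
    survivors = {}
    for rec in map(cleanse, recs):
        survivors.setdefault(rec["phone"], rec)
    return list(survivors.values())

print(all(integrity_check(r) for r in records))  # True: every record passes the basic check
print(match_and_merge(records))                  # two records collapse into one survivor
```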

Enterprise Data Access - It is not uncommon for successful data integration projects to quickly find themselves dealing with more data from more systems than originally planned. This is especially true in government agencies, where new requirements often create a need to access complex legacy systems, proprietary business applications, or even unstructured content. When comparing data integration approaches, customers should consider each product's ability to connect to the more advanced data sources that could come into play in their environment. If asked, many vendors will provide a long list of data sources they can access, but deeper digging is needed to understand how that access is provided. Most products offer out-of-the-box connectivity to common sources, but that is where the similarities end. Some products invest in specialized connectors that link directly to sources such as enterprise applications, messaging services, technology standards, mainframes, and vertical industry standards. These connectors handle the underlying communication and translation with complex data sources, letting developers focus on building jobs. Other products claim they can access complex sources, but what they really mean is that customers must build and maintain their own connectors, which is a very expensive undertaking. Customers need to dig into exactly how access to enterprise data sources is provided.
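"Build and maintain your own connectors" is worth spelling out, since the effort is easy to underestimate. The hypothetical sketch below shows the kind of adapter a team must own for each complex source when no packaged connector exists: connection handling, extraction, and translation all become the customer's code to maintain as the source evolves.

```python
from abc import ABC, abstractmethod
from typing import Iterator

class SourceConnector(ABC):
    """Hypothetical adapter interface a team must implement for each complex
    source when the product does not ship a packaged connector."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def extract(self, entity: str) -> Iterator[dict]: ...

    @abstractmethod
    def close(self) -> None: ...

class MainframeFlatFileConnector(SourceConnector):
    """Toy example: 'mainframe' records read from a fixed-width flat file.
    A real connector would also handle EBCDIC conversion, COBOL copybook
    layouts, checkpoint/restart, security, and error handling."""

    def __init__(self, path: str):
        self.path = path
        self._file = None

    def connect(self) -> None:
        self._file = open(self.path, encoding="ascii")

    def extract(self, entity: str) -> Iterator[dict]:
        # Fixed-width layout (hypothetical): columns 0-7 = id, 8-37 = name.
        for line in self._file:
            yield {"entity": entity, "id": line[0:8].strip(), "name": line[8:38].strip()}

    def close(self) -> None:
        if self._file:
            self._file.close()
```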
Scalability - In any data integration project that deals with large volumes of data, scalability should be an important point of evaluation. Vendors will provide benchmarks to illustrate their scalability, but these are performed under ideal conditions in a controlled lab environment. The scalability each customer actually sees depends heavily on the environmental variables of each situation: the number and complexity of data sources, the volume of data, the load window, latency requirements, and resource optimization. The best way to test these is a head-to-head comparison on the customer's own data in the customer's own environment, but customers can get a quick sense of the relative differences by digging deeper into the scalability features each product brings to the table. Common features include grid computing, multi-threading, parallelism, partitioning, and intelligent load balancing. As a rule, the more of these features a product can apply in parallel, the more scalable it is likely to be; a product with only one or two of them will not compare well against the market-leading products. Market-leading products also scale through more advanced capabilities such as support for 64-bit architectures, change data capture (CDC), and ELT (extract, load, and transform) as opposed to ETL. Customers should dig deeper into which features allow each product to scale, but should recognize that the ultimate barometer is the on-site test.
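Partitioning and parallelism, two of the features listed above, can be sketched in a few lines: the input is split into partitions and each partition is transformed by a separate worker process. The partition count, sample data, and transformation are hypothetical placeholders; real engines add grid distribution, load balancing, and pushdown optimization on top of this basic pattern.

```python
from concurrent.futures import ProcessPoolExecutor

def transform(partition):
    """Stand-in for the per-row transformation work applied to one partition."""
    return [{"id": row["id"], "amount_usd": round(row["amount"] * 1.1, 2)}
            for row in partition]

def partition_data(rows, n_partitions):
    """Hash-partition rows so each worker receives a roughly even share."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row["id"]) % n_partitions].append(row)
    return partitions

if __name__ == "__main__":
    rows = [{"id": i, "amount": float(i)} for i in range(100_000)]
    parts = partition_data(rows, n_partitions=4)
    # Each partition is processed in parallel on its own worker process.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(transform, parts))
    total = sum(len(chunk) for chunk in results)
    print(f"transformed {total} rows across {len(parts)} partitions")
```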

Digging Deeper into Non-Technical Factors: Total Cost of Ownership and Risk

Evaluating Total Cost of Ownership - When analyzing the total cost of a data integration project, the initial focus is often on the software license cost. While that cost belongs in the equation, the costs of development, change management, administration, and even hardware over the life of the project can become major factors, and they must be considered up front to gain a complete picture of the total cost.

Cost Factors in Initial Development - The upfront focus of any new data integration project is often on initial development and its associated costs. All data integration tools have some sort of GUI that developers use to build a job. If the project requirements are very basic, the differences in initial development effort between tools will be hard to perceive. In environments with more complex transformation requirements, many different sources, intricate or inaccurate data, or security challenges, however, the differences quickly become evident. The largest cost factors for initial development are the difficulty of customizing the jobs and the difficulty of integrating additional data integration capabilities, such as data cleansing.

Cost Factors in Operations and Administration - This is the area of total project cost most often underestimated during initial planning. Typical initial development may last 6 to 12 months, but the average lifetime of an IT system ranges from five to eight years. If not planned for up front, the cost of operations and administration over the lifetime of the system can far exceed the upfront development and software costs that most customers focus on in their cost models. The largest factors in this area are the number of resources needed for change management and for administering the deployment. As discussed above, products often differ significantly in their ability to deal with the inevitable change that data integration environments undergo, especially in the reusability of objects and in assessing the impact of changes on the overall system. This should weigh heavily in any evaluation of total cost.

Cost Factors for Hardware - Two major factors to consider with hardware are, first, how much hardware is required to scale to the level needed and, second, whether existing hardware can be used or new, specific hardware must be procured. Scalability is always a factor in meeting project requirements, but when comparing costs, many customers forget to budget for the additional hardware a cheaper, less scalable solution may require. Factors such as the ability to run across heterogeneous grids can also allow customers to reuse existing hardware instead of buying new.
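A back-of-the-envelope lifetime-cost calculation shows why the license fee alone is a poor basis for comparison. All figures in the sketch below are hypothetical placeholders rather than benchmark data; the point is only that recurring change-management and administration costs, multiplied over a five-to-eight-year life, dominate the one-time items.

```python
def lifetime_cost(license_fee, initial_dev, annual_change_mgmt, annual_admin,
                  annual_maintenance_pct, years):
    """Simple total-cost-of-ownership model over the system's lifetime (illustrative only)."""
    annual = annual_change_mgmt + annual_admin + license_fee * annual_maintenance_pct
    return license_fee + initial_dev + annual * years

# Hypothetical product with a higher license fee but inexpensive change management.
product_a = lifetime_cost(license_fee=400_000, initial_dev=500_000,
                          annual_change_mgmt=150_000, annual_admin=100_000,
                          annual_maintenance_pct=0.20, years=7)

# Hypothetical cheaper product whose changes require extensive rework and retesting.
product_b = lifetime_cost(license_fee=150_000, initial_dev=450_000,
                          annual_change_mgmt=450_000, annual_admin=150_000,
                          annual_maintenance_pct=0.20, years=7)

print(f"Product A lifetime cost: ${product_a:,.0f}")  # $3,210,000
print(f"Product B lifetime cost: ${product_b:,.0f}")  # $5,010,000
```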
Availability of Qualified Practitioners - As noted above, much of the cost of a data integration project lies in the services needed to develop, maintain, change, and advance the application being built. To control costs, it is critical that customers choose a platform that minimizes the need for such services as future change occurs. But some level of outside services is often unavoidable, and it is common for customers to bring in expertise to help. It is therefore also critically important that customers consider the availability of trained, expert resources for the product they are working with. The availability of qualified practitioners varies widely among products; in some product comparisons the difference can be as high as 1,000 to 1 in qualified resources in the market. As a rule of thumb, the larger a product's market share, the more qualified, certified practitioners there will be. Still, customers often include a niche product in their evaluation because they believe it may be cheaper, without regard to the fact that very few people know how to develop in or support that product. That leaves customers in a very tough position when expert assistance or support is needed, and sooner or later it will be needed.

Corporate Focus & Product Vision - This is one of the most overlooked aspects of product evaluations. Customers often lose sight of the fact that they are buying not only today's feature set but also the feature set they will receive for years to come under their maintenance contract. It is important for customers to understand the product's roadmap and vision, and to determine where the product fits in the company's future plans. One telling factor is annual R&D spending on data integration: market-leading vendors spend over $100 million each year advancing their data integration product lines, and customers receive all of that benefit. Smaller, niche products cannot keep up with that level of investment.

Conclusion

While on the surface many of the products in the data integration market may look the same, by digging deeper into certain key areas customers can uncover the differences. The ability to quickly and easily handle the inevitable change that occurs in data integration projects is a critical differentiator. Digging deeper into areas such as establishing trust in the data, accessing complex data sources, and scalability will provide further clear differentiation. In addition, customers need to consider the costs of ongoing operations, maintenance, and administration over the system's lifecycle; if not properly planned for, they can far outweigh upfront costs. Risk factors such as the availability of resources and corporate focus create additional important differences that should not be ignored. Digging deeper into these key factors allows customers to uncover the differences that determine comparative value in a data integration evaluation. Most importantly, these differences should always be weighed in the context of their potential cost, their relative risk, and the resulting value they drive for the customer. Focusing on the key differentiating factors gives customers the opportunity to determine which product truly best fits their needs.

© 2011 Qlarion, Inc. and/or its affiliates. All rights reserved. Qlarion does not guarantee the accuracy of any information presented in this document, and there is no commitment, expressed or implied, on the part of Qlarion to update or otherwise amend this document. This publication consists of opinions and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Qlarion may include a discussion of related legal issues, Qlarion does not provide legal advice or services, and its research should not be construed or used as such.

About Qlarion

Qlarion is a professional services firm focused on helping public sector and related organizations use business intelligence (BI) to effectively manage, access, and understand information, and to make faster, more informed business decisions. Our expertise lies in developing solutions for organizational transparency, financial management, performance management, and contact center analytics. Qlarion clients include the legislative branch of the US government, the Department of Education, the Centers for Medicare and Medicaid Services, the US Army, the Department of Energy, the US Postal Service, the Internal Revenue Service, the Office of the Secretary of Defense, and Government Sponsored Enterprises (GSEs). Qlarion is a GSA schedule holder, GS-35F-0117V. For more information, visit our website at www.qlarion.com.