DIGGING DEEPER: What Really Matters in Data Integration Evaluations?
It's no surprise that when customers begin the daunting task of comparing data integration products, the similarities seem to outweigh the differences. The corporate web sites and other marketing material look alike, the user interfaces look similar, the demos seem interchangeable, and the feature/function lists use many of the same terms, leading to customer confusion. Often customers believe that decisions will become clearer after vendors fill out a requested matrix or spreadsheet. But these spreadsheets are filled out by professionals, and usually come back making the products look even more similar than before. Vendors have become very adept at positioning their products, making it hard for customers to uncover the important differences.

Customers will then shift the focus to cost, assuming that to be a deciding factor. But typically customers focus disproportionately on the upfront cost of the software license and initial development. They ignore the far greater costs over the life of the project (often five to eight years) and the overall risk exposure of choosing one vendor over another. By peeling back the covers to understand the key underlying factors, both technical and non-technical, customers can begin to see large differences that can and should affect their decisions about which products to use.

Digging Deeper on Key Technical Factors

Change Management - In any large-scale data integration project, the only thing that is certain is that there will always be change: changing user requirements, changing business rules, changing data definitions, new data sources, or upgraded software versions. The ultimate predictor of success of a data integration project is the plan for managing these changes through the project lifecycle. If not planned for appropriately, the cost of handling changes to the data integration jobs over the five to eight year lifecycle of a typical IT system will be the most expensive and time consuming part of the project.
It will far outpace initial development or software license cost. Many products differ dramatically in their approach to helping customers manage change through a data integration project lifecycle. This important factor is often overlooked in evaluations, especially considering the significant impact it can have on overall cost and success. Often evaluators think they have covered this element in the nebulous concept of ease-of-use. But ease-of-use typically focuses on initial development and fails to consider the amount of work and rework that will go into dealing with the inevitable change that will come in the project.

Architectural Foundation - Product architecture provides the foundation on which all data integration capabilities are delivered. Looking across the market, the market leaders are coalescing around a metadata-centric approach, while many niche products still employ other, more dated, compiled-code approaches. When comparing products, it is critical that customers spend the time to understand the impact that each product's underlying architecture and overall approach to data integration will have on their ability to deal with future change.

The approach the market leaders have taken is to provide a foundation of a central metadata repository. This approach has a number of significant advantages, but one of the most important is that it generally requires minimal, if any, custom coding to build the necessary jobs. In a metadata-centric architecture, the vast majority of development is done through a GUI, with very limited need to go outside the product for extensions and customization. Jobs and business rules are automatically captured in the metadata, providing a central place for all to view standard business definitions.

Another common approach is that of compiling code into an executable, rather than storing the definitions centrally in a metadata repository.
This approach typically requires developers with strong coding skills and can be very time consuming and expensive to use and support, especially when dealing with updates and new releases. As discussed below, this approach also brings with it several other costly drawbacks.
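The metadata-centric idea described above can be sketched in miniature. This is an illustrative toy, not any vendor's actual design; all names (`job_definition`, `step_library`, `run`) are hypothetical. The point it shows: when the job itself is declarative metadata interpreted by a generic engine, changing the job means editing a definition in one visible place, not rewriting and recompiling code.

```python
# Hypothetical miniature of a metadata-centric design: the job is declarative
# metadata, and a generic engine interprets it at run time.
job_definition = {
    "name":   "load_customers",
    "source": [" alice ", " BOB "],
    "steps":  ["strip", "title"],     # business rules stored as metadata
}

# A small library of step implementations the engine knows how to apply.
step_library = {
    "strip": str.strip,
    "title": str.title,
    "upper": str.upper,
}

def run(defn):
    """Generic engine: applies the declared steps, in order, to each source row."""
    rows = defn["source"]
    for step in defn["steps"]:
        rows = [step_library[step](r) for r in rows]
    return rows

print(run(job_definition))            # ['Alice', 'Bob']

# Changing the job is a metadata edit, visible in one central place --
# nothing to rewrite, recompile, or redeploy.
job_definition["steps"].append("upper")
print(run(job_definition))            # ['ALICE', 'BOB']
```

In the compiled-code alternative, the equivalent change would mean locating the source, modifying it, recompiling the executable, and redeploying it.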
Reusability

Reusability is a central concept in managing change in a data integration project. As developers build objects, their ability to share and reuse them across the organization enhances their productivity, but after the system goes into production, reuse can bring even more value. Reuse enables developers to propagate objects far and wide, and after deployment there are likely many jobs that reuse the same rules. Almost all data integration products support reuse at some level, but it is critical to determine exactly how each product accomplishes reuse and what happens when something in those jobs needs to change.

With some products, reusable objects are stored centrally in a metadata repository, allowing the developer to make a change in one place, save it back to the central repository, and then propagate the change automatically to all data flows that use that object. It is very easy and inexpensive for customers to manage change using products that store reusable object definitions centrally.

With other products, however, the approach to reusability is closer to copy/paste than to a centralized definition. With these products, an object and its definition can be copied and pasted into a new data flow, which helps with productivity in initial development. When it is time to change the definition, though, there is no central place to make the change. This leads to the time consuming and expensive process of determining which data flows use the original object. Each one must then be opened, changed as appropriate, retested, and redeployed. In practice, the rules are often complicated and many of them must be changed at the same time, making this an important cost factor. But when asked in an evaluation matrix to indicate whether they support reuse, both of these approaches, although dramatically different in cost for customers, will elicit an affirmative answer from the vendor.
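The contrast between centralized and copy/paste reuse can be illustrated with a minimal sketch. All names here are hypothetical and the rules are deliberately trivial; no vendor's product works literally this way.

```python
# --- Centralized approach: jobs reference one shared rule definition ---
rule_repository = {
    "normalize_name": lambda s: s.strip().title(),  # single shared business rule
}

def run_job(rule_name, records):
    """Each job looks the rule up at run time, so a change propagates everywhere."""
    rule = rule_repository[rule_name]
    return [rule(r) for r in records]

# Two different jobs reuse the same rule.
print(run_job("normalize_name", ["  alice smith "]))   # ['Alice Smith']
print(run_job("normalize_name", ["BOB JONES"]))        # ['Bob Jones']

# Change the definition once, in one place...
rule_repository["normalize_name"] = lambda s: s.strip().upper()
# ...and every job that references it picks up the change automatically.
print(run_job("normalize_name", ["  alice smith "]))   # ['ALICE SMITH']

# --- Copy/paste approach: each job carries its own private copy ---
job_a_rule = lambda s: s.strip().title()
job_b_rule = lambda s: s.strip().title()
# Updating this rule means finding, editing, retesting and redeploying every
# job individually -- the hidden lifecycle cost described above.
```

The two approaches look identical on a feature matrix ("supports reuse: yes"), which is exactly why the mechanics matter.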
In order to compare the capabilities, customers must look under the covers to understand exactly how reuse is accomplished.

Impact Analysis

Just as important as reuse is the ability for developers in large-scale, complex data integration environments to have a full picture of the impact of changes they are considering before they make them. Very often in complex data environments, changing one variable, business rule or object can have unintended consequences downstream. A window into the impact of changes on the overall environment dramatically reduces the amount of work required to manage change, and eliminates the risk of a small change causing a major problem.

Products with open metadata repositories can give users the ability to graphically see a data map that shows the impact of any single change to the entire system before they make it. These products will often integrate metadata from other relevant sources, including modeling tools, business intelligence tools, and any other metadata, to provide complete impact analysis of any change across the whole system. This capability dramatically reduces the cost of dealing with the impacts of changes across the system. Products without a central metadata repository cannot provide a similar capability to assess the impact of a change before it is made. Users have to perform much more extensive testing of any and all changes before they are deployed, especially in mission critical systems where downtime is not an option.

Digging Deeper into Other Key Technical Factors

Establishing Trust in the Data

A key aspect of any successful data integration project is ensuring that the analysts, or end-users, using the data produced are satisfied with its accuracy, and aware of the original source of the information they are analyzing.
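At its core, the impact analysis capability described above is a traversal of the dependency graph held in the metadata repository. A minimal sketch follows; the object names and the flat-dictionary metadata model are hypothetical, not any vendor's actual API.

```python
from collections import deque

# Hypothetical metadata: which downstream artifacts consume each object.
dependencies = {
    "source.customers":         ["rule.normalize_name"],
    "rule.normalize_name":      ["job.load_warehouse", "job.dedupe"],
    "job.load_warehouse":       ["report.quarterly_revenue"],
    "job.dedupe":               [],
    "report.quarterly_revenue": [],
}

def impact_of(changed_object):
    """Breadth-first walk of the metadata graph: everything reachable from
    the changed object is potentially affected and must be reviewed."""
    impacted, queue = set(), deque([changed_object])
    while queue:
        for consumer in dependencies.get(queue.popleft(), []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted

# A proposed change to one rule surfaces every downstream job and report
# before the change is made -- the "window into the impact" described above.
print(sorted(impact_of("rule.normalize_name")))
```

Without a central repository there is no such graph to walk, which is why the only recourse is broader testing after the fact.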
If the analysts regularly question the accuracy of the data or lose faith in what was done to the raw data during the data integration process, they will stop making decisions based on the results, and the project is certain to fall short of expectations or to be considered a failure. Especially within law enforcement and intelligence
applications, making sure the analysts can see the lineage of where the original data came from, as well as how and when the data was processed, is critical to establishing confidence in the system. This data lineage capability is enabled by a central metadata repository, and is another area of value a metadata-centric architecture provides.

In addition, customers evaluating data integration products may be surprised to find that the phrase "data quality" means different things to different people. It is perhaps the most loaded phrase in the data integration space. When asked, every vendor will answer that they provide data quality capabilities, but these capabilities vary widely among vendors. Some vendors provide only the ability to do basic pass/fail integrity checks, and will claim this as integrated data quality. But the more common industry usage of the term refers to capabilities such as parsing, cleansing, enhancement, matching and merging in order to de-duplicate and improve the accuracy of data. Customers need to dig into specifics when asking about data quality or they risk being led astray.

Enterprise Data Access - It's not uncommon for successful data integration projects to quickly find themselves dealing with more data from more systems than originally planned. This is especially true in government agencies, where new requirements often lead to a need to access complex legacy systems, proprietary business applications, or even unstructured content. When comparing data integration approaches, it is important that customers consider the ability to connect to more advanced data sources that could come into play in their environment. If asked, many vendors will provide a long list of data sources they can access. But deeper digging is needed to understand how they access these data sources.
Most will provide out-of-the-box connectivity to common sources, but that is where the similarities end. Some products invest in specialized connectors that connect directly to sources such as enterprise applications, messaging services, technology standards, mainframes and vertical industry standards. These connectors handle all of the underlying communication and translation with complex data sources, enabling developers to focus on building jobs. Other products will say they can access complex sources, but what they really mean is that customers have to build and maintain their own connectors, which is a very expensive undertaking. Customers need to dig into exactly how access to enterprise data sources is provided.

Scalability - In any data integration project that deals with large volumes of data, scalability should be an important point of evaluation. Vendors will provide benchmarks to illustrate their scalability, but these are performed under ideal conditions in a controlled lab environment. The scalability each customer will see in their application depends heavily on the specific environmental variables in each situation. Examples of important variables to consider are the number and complexity of data sources, volume of data, load window, latency requirements and resource optimization. The best way for customers to test these is through a head-to-head comparison on their specific data in their specific environment. But customers can get a quick understanding of the relative differences by digging deeper into the scalability features each product brings to the table. Common features include grid computing, multi-threading, parallelism, partitioning and intelligent load balancing. As a rule, the more of these features a product can bring to bear in parallel, the more scalable it is likely to be. A product that has just one or two of them is not going to compare well against the market leading products.
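Two of the features named above, partitioning and parallelism, combine in a simple way: split the workload into independent partitions and transform them concurrently. The sketch below is illustrative only (a stand-in transformation on a list of integers); real engines layer this with grid computing, load balancing, and much more.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split the workload into n roughly equal partitions."""
    return [rows[i::n] for i in range(n)]

def transform(partition_rows):
    """A stand-in transformation, applied independently to each partition."""
    return [row * 2 for row in partition_rows]

rows = list(range(1000))

# Run the four partitions in parallel workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(transform, partition(rows, 4)))

# Merge the partition results back together.
transformed = [r for part in results for r in part]
assert sorted(transformed) == [2 * i for i in range(1000)]
```

Because each partition is independent, the same pattern scales out to more workers (or more machines) without changing the transformation itself.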
Market leading products also scale through more advanced capabilities such as support for 64-bit architectures, change data capture (CDC), and ELT (Extract, Load, Transform) as an alternative to traditional ETL. Customers should dig deeper into which features allow each product to scale, but should realize the ultimate barometer will be the on-site test.
Digging Deeper into Non-Technical Factors: Total Cost of Ownership and Risk

Evaluating Total Cost of Ownership

When analyzing the total cost of a data integration project, the initial focus is often on the software license cost. While it is important to consider this cost in the equation, it is essential to realize that the costs of development, change management, administration and even hardware over the life of the project can become major factors, and they must be considered up front to gain a complete picture of the total cost.

Cost Factors in Initial Development

The upfront focus of any new data integration project is often on initial development and its associated costs. All data integration tools have some sort of GUI that developers use to build a job. If the project requirements are very basic, the differences in initial development effort between tools will be hard to perceive. However, in environments with more complex transformation requirements, many different sources, intricate or inaccurate data, or security challenges, the differences quickly become evident. The largest cost factors for initial development are the difficulty of customizing the jobs and the difficulty of integrating additional data integration capabilities, such as data cleansing.

Cost Factors in Operations and Administration

This is the area of total project cost that is most often underestimated during initial planning. Typical initial development may last 6-12 months, but the average lifetime of an IT system ranges from five to eight years. If not planned for upfront, the cost of operations and administration over the lifetime of the system can far exceed the upfront development and software costs that most customers focus on in their cost models. The largest factors in this area are determining how many resources are needed for change management, and administration of the deployment.
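The point can be made concrete with a back-of-the-envelope model. Every figure below is hypothetical and chosen only to show the proportions, not to represent any actual project or product:

```python
# Hypothetical, illustrative figures -- the point is the proportions:
# lifecycle costs can dwarf the upfront license fee.
license_cost          = 200_000   # one-time software license
initial_development   = 300_000   # 6-12 months of build
annual_ops_and_change = 250_000   # change management + administration per year
lifetime_years        = 7         # typical five-to-eight-year system

upfront   = license_cost + initial_development
lifecycle = annual_ops_and_change * lifetime_years
total     = upfront + lifecycle

print(f"upfront:   ${upfront:,}")    # $500,000
print(f"lifecycle: ${lifecycle:,}")  # $1,750,000
print(f"share of total spent after go-live: {lifecycle / total:.0%}")
```

Even under these modest assumptions, the majority of total cost lands after go-live, which is why a product's change-management and administration characteristics deserve more weight than the license line item.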
As discussed above, products often differ significantly in their ability to deal with the inevitable change that data integration environments undergo, especially as it relates to the reusability of objects and assessing the impact of changes on the overall system. This should be weighed heavily in any evaluation of total cost.

Cost Factors for Hardware

Two major factors to consider with hardware are, first, how much hardware is required to scale to the level needed and, second, whether existing hardware can be used or new, specific hardware must be procured. Scalability is always a factor in achieving the project requirements, but when comparing costs, many customers forget to budget for the additional hardware a cheaper, less scalable solution may require. In addition, capabilities such as the ability to run across heterogeneous grids can allow customers to reuse existing hardware instead of buying new.

Availability of Qualified Practitioners

As noted above, much of the cost associated with a data integration project lies in the services needed to develop, maintain, change and advance the application being built. To control these costs, it is critical that customers choose a platform that minimizes the need for services as future change occurs. But some level of outside services is often unavoidable, and it is common for customers to bring in expertise to help. It is therefore also critically important that customers consider the availability of trained, expert resources for the product they are working with. The availability of qualified practitioners varies widely among products; in some product comparisons the difference can be as high as 1000-to-1 in terms of qualified resources in the market. As a rule of thumb, the larger a product's market share, the more qualified, certified practitioners there will be in the market.
Still, customers will often consider a niche product in their evaluation because they believe it may be cheaper, without regard to the fact that there are likely very few people who know how to develop in or support that product. That leaves customers in a very tough position when expert assistance or support is needed, and sooner or later it will be needed.

Corporate Focus & Product Vision

This is one of the most overlooked aspects of product evaluations. Customers often do not focus on the fact that they are buying not only today's feature set, but also the feature set they will receive for years to come under their maintenance contract. It is important for customers to understand the product's roadmap and vision, as well as to determine where the product fits in the company's future plans. One telling factor is the annual R&D spend on data integration. Market leading vendors spend over $100 million each year advancing their data integration product lines, and customers receive all of this benefit. Smaller, more niche products cannot keep up with that level of investment.

Conclusion

While on the surface many of the products in the data integration market may look the same, by digging deeper into certain key areas customers can uncover the differences. The ability to quickly and easily handle the inevitable change that occurs in data integration projects is a critical differentiator. Digging deeper into areas such as establishing trust in the data, accessing complex data sources, and scalability will provide clear differentiation. In addition, customers need to consider the costs of ongoing operations, maintenance and administration over the system's lifecycle; if not properly planned for, they can far outweigh upfront costs. Risk factors such as the availability of resources and corporate focus will also surface important differences that should not be ignored. Digging deeper into these key factors allows customers to uncover the differences that determine the overall comparative value in a data integration evaluation.
Most importantly, these differences should always be weighed in the context of their potential cost, their relative risk, and the resulting value they drive for customers. Focusing on the key differentiating factors will give customers the opportunity to truly determine which product best fits their needs.
2011 Qlarion, Inc. and/or its affiliates. All rights reserved. Qlarion does not guarantee the accuracy of any information presented in this document and there is no commitment, expressed or implied, on the part of Qlarion to update or otherwise amend this document. This publication consists of opinions and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Qlarion may include a discussion of related legal issues, Qlarion does not provide legal advice or services and its research should not be construed or used as such.

About Qlarion

Qlarion is a professional services firm focused on helping public sector and related organizations use business intelligence (BI) to effectively manage, access, and understand information, and make faster, more informed business decisions. Our expertise lies in developing solutions that achieve organizational transparency, financial management, performance management and contact center analytics. Qlarion clients include the legislative branch of the US government, Department of Education, the Centers for Medicare and Medicaid Services, US Army, Department of Energy, US Postal Service, Internal Revenue Service, Office of the Secretary of Defense, and Government Sponsored Enterprises (GSEs). Qlarion is a GSA schedule holder, GS-35F-0117V. For more information, visit our website at www.qlarion.com.