white paper

Approaching SaaS Integration with Data Integration Best Practices and Technology

David S. Linthicum

Introduction

Many new and existing business processes and information continue to move outside of the firewall to on-demand platforms such as Software-as-a-Service (SaaS). For these new Web-delivered applications to provide the value required by the business, we need to create integration services between SaaS-delivered and on-premise enterprise applications in order to maintain data integrity, quality, and value. Approaching SaaS-to-enterprise integration is really a matter of making choices: choices around the integration approaches to leverage, choices around architectural patterns, choices around the location of the integration engine, and, finally, choices around the enabling technology to leverage.

"Software as a service is forecast to have a compound annual growth rate of 22.1% through 2011 for the aggregate enterprise application software markets, more than double the growth rate for total enterprise software." Gartner Group

In this paper we'll walk you through the core issues around SaaS-to-enterprise integration, including the various components of integration, the architectural approaches, and then the right integration solution for your enterprise. This paper is designed to provide you with core information, or the "what" of SaaS integration with data integration, as well as the process of getting to a well-integrated state, or the "how."

Defining the Problem

The growth of SaaS means that more information and business processes are residing off-premise on SaaS platforms. Thus, there is a clear need to integrate this information back into core enterprise systems. While this seems obvious to those who manage enterprise systems, the approaches and technologies that you can leverage here vary widely in what they do, and how they do it.
Indeed, many have a tendency to leverage more traditional application integration technology and quickly discover that it is a square peg in a round hole. Or, worse, they attempt SaaS-to-on-premise data integration through custom programming projects that lead to thousands of hours in ongoing maintenance and increase the complexity of the integration solution.
Core to the problem is that SaaS-delivered systems have their own special requirements for integration, including:

- The dynamic nature of SaaS interfaces, which constantly change.
- The dynamic nature of the metadata native to a SaaS provider, such as Salesforce.com.
- Managing assets that exist outside of the firewall, but are nonetheless mission critical.
- The massive amounts of information that need to move between SaaS and on-premise systems daily, and the need to maintain data quality and data integrity.

Defining the Value

Defining the value of SaaS-to-enterprise integration is largely dependent upon the business and the specific problem domain. However, there are general patterns to consider, including the value of data quality, and the operational value of synchronized data sets, from SaaS to on-premise (see Figure 1).

Figure 1: SaaS to on-premise data integration.

The approach to defining the value is around understanding the cost of the inefficiencies due to the lack of integration. Let's walk through a quick example. Say there is a mid-sized paper company that recently became a Salesforce.com CRM customer. They currently leverage an on-premise custom system that uses an Oracle database to track inventory and sales. The use of the Salesforce.com system provides the company with significant value in terms of customer and sales management. However, the information that persists within the Salesforce.com system is somewhat redundant with the information stored within their on-premise legacy system (e.g., customer data).
Thus, we can say that the "as is" state suffers from costly inefficiencies, including the need to enter and maintain data in two different locations, which means additional costs. The largest issue, and largest cost, is the loss of data quality, which is endemic to this kind of dual operation. This includes data integrity issues, which are a natural occurrence when data is updated using different procedures and there is no active synchronization between the SaaS and on-premise systems.

Defining the "to be" state, we can layer in data synchronization technology between the source, meaning Salesforce.com, and the target, meaning the existing legacy system that leverages Oracle. This technology is able to provide automatic mediation of the differences between the two systems, including application semantics, security, interfaces, protocols, and native data formats. The end result is that information within the SaaS-delivered system and the legacy system is now in sync, meaning that customers entered into the CRM system will also exist in the legacy system, and vice versa, along with other operational data such as inventory, items sold, etc.

The "to be" state removes data quality and data integrity issues, thus saving the company thousands of dollars a month and producing a quick ROI from the integration technology that's leveraged.
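To make the "to be" state concrete, the following is a minimal sketch of two-way record synchronization between a SaaS store and an on-premise store. The record shapes, the email key, and the last-write-wins conflict rule are illustrative assumptions, not any vendor's actual API or the only reasonable design.

```python
def synchronize(saas, on_premise):
    """Two-way sync of two record stores keyed by customer email.

    Records missing on either side are copied over; when a record
    exists in both, the one with the newer 'updated' stamp wins
    (a simple last-write-wins conflict rule).
    """
    for key in set(saas) | set(on_premise):
        if key not in on_premise:
            on_premise[key] = dict(saas[key])
        elif key not in saas:
            saas[key] = dict(on_premise[key])
        elif saas[key]["updated"] >= on_premise[key]["updated"]:
            on_premise[key] = dict(saas[key])
        else:
            saas[key] = dict(on_premise[key])

# A customer entered only in the CRM, and one only in the legacy system:
crm = {"ann@example.com": {"name": "Ann", "updated": 2}}
erp = {"bob@example.com": {"name": "Bob", "updated": 1}}
synchronize(crm, erp)
# Both stores now hold both customers, eliminating the dual-entry problem.
```

A production solution would, of course, also mediate semantics, formats, and security between the two sides rather than assume identical record layouts.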
Integration Concepts

Now that we understand the general notions and the value, let's take a much deeper dive into integration concepts, or, things you should understand conceptually before considering data integration and an integration solution. They include: connectivity, semantic mediation, data mediation, data migration, security, integrity, and governance.

Connectivity refers to the ability of the integration engine to engage with both the source and target systems using whatever native interfaces are available. This means leveraging the interface that each provides, which could vary from standards-based interfaces, such as Web services, to older, more proprietary interfaces. Connectivity subsystems are responsible for the externalization of the correct information, as needed, and the internalization of information once processed by the integration engine.

Semantic mediation refers to the ability to account for the differences in application semantics between two or more systems. One can consider semantics as the way we represent information within information systems; simply put, the meaning of data and the use of data. Within the context of integration, semantics are defined as mappings between the objects and attributes as they are represented and stored in each information system.

Data mediation converts data from a source data format into a destination data format. Coupled with semantic mediation, data mediation, or data transformation, is the process of converting data from its native format on the source system into the data format of the target system.

Data migration is the process of transferring data between storage types, formats, or systems. Data migration typically means that the data in the old system is mapped to the new system, typically leveraging data extraction and data loading technology.
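The mediation concepts above can be sketched in a few lines. Here, a field mapping handles the semantic difference (two systems naming the same attribute differently), while a value-level conversion handles the data-format difference. The field names and the phone-number format are illustrative assumptions only.

```python
# Semantic mediation: the SaaS side calls the attribute "AccountName",
# the on-premise side calls the same thing "customer_name".
FIELD_MAP = {
    "AccountName": "customer_name",
    "Phone": "phone_number",
}

def mediate(source_record):
    """Return a target-format record from a source-format record.

    Renames fields per FIELD_MAP (semantic mediation), then converts
    the phone value to the target's digits-only format (data mediation).
    Assumes the source record carries both mapped fields.
    """
    target = {FIELD_MAP[k]: v for k, v in source_record.items() if k in FIELD_MAP}
    target["phone_number"] = "".join(c for c in target["phone_number"] if c.isdigit())
    return target

record = mediate({"AccountName": "Acme Paper", "Phone": "(555) 123-4567"})
```

An integration engine performs the same two steps, only driven by metadata rather than hard-coded mappings.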
Security, in the context of integration, means the ability to ensure that information extracted from the source systems and placed within the target systems is moved securely. The integration technology must leverage the native security systems of the source and target systems, mediate the differences, and provide the ability to transport the information securely between the systems. New concepts such as identity management come into play here.

Integrity ensures data is complete (whole), consistent, and correct. Thus, integrity is the condition in which data is identically maintained during integration operations, such as synchronization of data between on-premise and SaaS-based systems.

Governance refers to the processes and technology that surround a system or systems, and that control how those systems are accessed. In the world of integration, governance is about managing changes to core information resources, including data semantics and structure, as well as interfaces.

No matter the integration solution you leverage, it needs to address each of the notions listed above.
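A simple way to check the integrity notion described above is to compare record counts and a content checksum on both sides after a synchronization run. This is a sketch under assumed record shapes; real integration products track integrity with far richer reconciliation reports.

```python
import hashlib

def checksum(records):
    """Order-independent checksum over (key, name) pairs in a record store."""
    digest = hashlib.sha256()
    for key in sorted(records):
        digest.update(f"{key}:{records[key]['name']}".encode())
    return digest.hexdigest()

def verify_integrity(source, target):
    """True only if both sides hold identically maintained data."""
    return len(source) == len(target) and checksum(source) == checksum(target)

a = {"ann@example.com": {"name": "Ann"}}
b = {"ann@example.com": {"name": "Ann"}}
ok = verify_integrity(a, b)          # identical stores pass
b["ann@example.com"]["name"] = "An"  # a silent corruption...
bad = verify_integrity(a, b)         # ...is caught by the checksum
```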
Data Integration for SaaS Best Practices

So, what are the best practices when it comes to data integration between on-premise and SaaS-based systems? It really comes down to understanding, definition, design, implementation, and testing (see Figure 2).

Figure 2: Best practices in data integration for SaaS.

Understanding your existing problem domain means defining the metadata that is native within the source system (say, Salesforce.com) and the target system (say, your on-premise inventory system). By doing this we have a complete semantic understanding of the problem domain, in this case the source and target systems. Keep in mind that there could be many systems in the problem domain, and the practice here is no different, although the resulting integration solution will be more complex.

Definition refers to the process of taking the information culled during the previous step and defining it at a high level, including what the information represents, ownership, and physical attributes. This provides us with a better understanding of the data we're dealing with, beyond the simple metadata, thus ensuring that we'll get the integration designed correctly the first time.

Design your integration solution around the movement of data from one point to another, accounting for the differences in semantics using the underlying data transformation and mediation layer, mapping the schema of the source to the schema of the target. This defines how the data is to be extracted from one system or systems, transformed so it appears to be native, and then updated in the target system or systems. Typically, this design is done within the integration technology using visual mapping technology. In addition, we need to consider both security and governance, and account for these concepts within the design of the data integration solution.
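The extract-transform-load flow produced by the design step can be sketched as follows. The mapping plays the role of the visual schema mapping described above; the field names and record shapes are illustrative assumptions.

```python
# Design artifact: a source-schema-to-target-schema mapping,
# analogous to what a visual mapping tool would generate.
CUSTOMER_MAP = {
    "AccountName": "customer_name",
    "BillingCity": "city",
}

def run_flow(source, target, field_map):
    """Extract each source record, rename its fields per the mapping so
    it appears native to the target, then load it into the target."""
    for key, record in source.items():
        target[key] = {field_map[f]: v for f, v in record.items() if f in field_map}

source = {"c-100": {"AccountName": "Acme Paper", "BillingCity": "Scranton"}}
target = {}
run_flow(source, target, CUSTOMER_MAP)
```

Keeping the mapping declarative, separate from the flow logic, is what lets governance manage schema changes without rewriting the integration itself.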
Implementation refers to actually implementing the data integration solution within the selected technology. This means connecting to the source and target systems, implementing the integration flows as designed within the previous step, and then completing the other steps required to get the data integration solution up and running.

Testing refers to assuring that the integration is properly designed and implemented, and that the data synchronizes properly between the systems. This means seeding known test data within the source system, say Salesforce.com, and monitoring how the information flows to the target system. We need to ensure that the data mediation mechanisms function correctly, as well as review overall performance, durability, security, and connectivity.

Leveraging the Integration-as-a-Service Option

Integration-as-a-service is an approach to integration where the core integration technology that performs integration functions, such as semantic mediation, data migration, connectivity, and other core integration facilities, is delivered from the Web as a service. There are core advantages to this approach.

Integration-as-a-service has a tendency to be more cost effective. Since the integration technology is not on-premise, there is no need to purchase hardware or software to support the data integration solution. This reduces the need for capital, and provides a quick return on investment (ROI) since the value, as defined above, is quickly realized.

When leveraging integration-as-a-service, the speed of getting the integration solution up and running can be measured in days, not months, since there is no need to purchase, install, and test hardware and software. This not only provides a speed advantage when getting the initial solution up and running, but an agility advantage as well, since the solution is easily expanded or contracted.

Integration-as-a-service provides better adaptability to the changing nature of a SaaS-delivered application, such as Salesforce.com.
Thus, when metadata and interfaces change over time, the integration-as-a-service solution delivered from a single multi-tenant platform is updated once, and the interface and metadata changes automatically propagate to all integration-as-a-service users. There is no need to download and configure updates.

Finally, the fact that the integration-as-a-service provider exists on the platform of the Web, along with SaaS-delivered applications such as Salesforce.com, means that the firewall and other issues that affect on-premise systems are not an issue when considering integration-as-a-service.
"Informatica is the leading provider of data integration software, with the broadest access to enterprise data sources of any integration vendor. Our partnership with Informatica represents a very strategic relationship for our customers, ensuring they can manage and share all of their enterprise data and information on demand." Marc Benioff, Chairman and CEO, Salesforce.com

Leveraging Informatica On-Demand

Informatica offers a set of innovative on-demand data integration solutions called Informatica On-Demand Services. This is a group of easy-to-use Software-as-a-Service offerings that allow you to integrate data in SaaS applications, seamlessly and securely across the Internet, with data in your on-premise applications and systems. The Informatica on-demand service is a subscription-based integration service that provides all of the features and functions listed in this paper, using an on-demand, or as-a-service, delivery model. This means the integration service is remotely hosted, and thus provides you with the benefit of not having to purchase or host software. There are a few key benefits to leveraging this technology:

- Rapid development and deployment with zero maintenance of the integration technology.
- Automatic upgrades and continuous enhancement by the vendor.
- Proven SaaS integration solutions, such as integration with Salesforce.com, meaning that the connections and the metadata understanding are provided for you.
- Proven data transfer and translation technology, meaning that core integration services such as connectivity and semantic mediation are built into the technology.
- A self-service, 30-day free trial that allows you to easily put the technology to the test and validate vendor claims.

Integration for SaaS

The use of SaaS, as well as cloud computing, will continue to expand exponentially over the next several years.
While the ability to utilize an enterprise application via a subscription just makes logical sense when considering the cost of an on-premise system, the need for integration is not always apparent until you are in an operational state and data quality becomes a core concern. Indeed, this could eliminate any value obtained through the use of SaaS. Data integration between a SaaS-delivered system and an existing enterprise system is clearly the right approach. However, how you approach it, including the best practices you follow, is just as important as what you do. Thus, you need to focus on the understanding, the design, and, most importantly, the use of the correct technology for the job.
About the Author

David Linthicum (Dave) is an internationally known Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), and cloud computing expert. In his career, Dave has formed or enhanced many of the ideas behind modern distributed computing, including EAI, B2B application integration, and SOA, approaches and technologies in wide use today. Currently, Dave is the founder of David S. Linthicum, LLC, a consulting organization dedicated to excellence in SOA product development, SOA implementation, corporate SOA strategy, and leveraging the next-generation Web (Web 2.0). Dave is the former CEO of BRIDGEWERX and former CTO of Mercator Software, and has held key technology management roles with a number of organizations, including CTO of SAGA Software, and positions with Mobil Oil, EDS, AT&T, and Ernst and Young. Dave is on the board of directors serving Bondmart.com, and provides advisory services for several venture capital organizations and key technology companies. In addition, Dave was an associate professor of computer science for eight years, and continues to lecture at major technical colleges and universities, including the University of Virginia, Arizona State University, and the University of Wisconsin. Dave keynotes at many leading technology conferences on application integration, SOA, Web 2.0, cloud computing, and enterprise architecture, and has appeared on a number of TV and radio shows as a computing expert.

2009 David S. Linthicum, LLC 6985 (06/03/2009)