An IT Briefing produced by
By David Linthicum
© 2008 TechTarget

BIO
David Linthicum is the CEO of the Linthicum Group LLC, an SOA consultancy. He is the former CEO of Bridgewerx and former CTO of Mercator Software. He has held key technology management roles with a number of organizations, including CTO of SAGA Software, Mobil Oil, EDS, AT&T, and Ernst and Young. Linthicum is a well-known expert in the field of service-oriented architecture and has served as a long-time site expert at SearchWebServices.com.

This IT Briefing is based on a DataDirect Technologies/TechTarget Webcast, Bringing Together Data Integration and SOA.

This TechTarget IT Briefing covers the following topics:
Introduction
Goals and Characteristics of a SOA
The SOA Meta Model
Data Abstraction
Data Services
Implementing Data Services
SOA and Data Services, in Practice
Service Data Objects
XQuery
Summary
Common Questions

Copyright 2008 David Linthicum. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.

About TechTarget IT Briefings
TechTarget IT Briefings provide the pertinent information that senior-level IT executives and managers need to make educated purchasing decisions. Originating from our industry-leading Vendor Connection and Expert Webcasts, TechTarget-produced IT Briefings turn Webcasts into easy-to-follow technical briefs, similar to white papers.

Design Copyright 2004-2008 TechTarget. All Rights Reserved.
For inquiries and additional information, contact: Dennis Shiao Director of Product Management, Webcasts dshiao@techtarget.com
Bringing Together Data Integration and SOA

Introduction
Data integration is an important topic for service-oriented architecture (SOA) because it is so often neglected. However, in SOA the data and information are as valuable as, if not more valuable than, the services or processes that sit on top of the information. In fact, as people implement SOAs, most of the services they are setting up are data services. Data services can be viewed as the foundation of SOA. If data services are not set up correctly, the chances of having a successful SOA are reduced.

This document addresses these topics: the goals and characteristics of SOA; the SOA Meta model, with a detailed discussion of the data abstraction and data services layers; and practical implementation advice, specifically about Service Data Objects (SDO) and XQuery.

Goals and Characteristics of SOA
Why do people implement SOA? The foremost reason is agility, and a data services layer provides that agility. A data services layer decouples the services from the physical underlying database. As SOAs are built, data services layers, abstraction layers, and data abstraction software and standards will be indicated for most problem domains. Coupling into the physical database is not desirable because that database can change, it may not be structured correctly, or security issues may exist that must be dealt with through data services.

SOA has other characteristics, including functional reusability and independent change management, which allows a focus on configuration rather than programming. These will be discussed in the context of the data services layer, because the data services layer is about configuring the physical schema away from the logical or abstract schema. Other characteristics of SOA include interoperability instead of point-to-point integration, and orchestration rather than integration.
SOA allows you to create solutions by using an orchestration layer rather than building them from the ground up through development efforts.

The SOA Meta Model
The SOA Meta model is illustrated in Figure 1. It shows that SOA's basis is information, whether it comes from data, databases, the Internet, new services, or legacy services. SOA must deal with information as it relates to a data abstraction layer, a data services layer, a true services layer, a process and orchestration layer, and a monitoring and event management layer. Information also means that subsystems are necessary to provide key features such as governance, which controls access to the services and the information, and security, so that information is not exposed to those without permission to see it.

Figure 2 illustrates the basic principles. Information flows up into a data abstraction layer that in turn goes up into a data services layer. This is a simple principle. Those who understand how databases work and how information is abstracted out of databases will not find much new here. Information resides in a given data repository. It could be a database product such as Oracle or Sybase. It could be an object-oriented database, a hierarchical database, or a relational database. It could be another information system such as SAP, PeopleSoft, or Siebel. It makes no difference. The information is abstracted into a data abstraction layer.

Data Abstraction
The data abstraction layer is key because it allows us to remap the existing physical instantiations of the information into something that is more consumable and logically grouped. Customer information in multiple databases can be grouped around a single customer entity, or order information in multiple databases can be grouped around a single order entity. The data abstraction layer is able to account for the differences between the logical groupings of information and the actual physical instantiations of the information.
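As a rough illustration of grouping customer information from multiple databases around a single customer entity, the following Python sketch assembles one logical record from two physical sources. The table and column names (CUST_MSTR, ACCT, and so on), the Customer class, and the load_customer function are all hypothetical, and two in-memory SQLite databases stand in for whatever back-end systems actually hold the data.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Logical customer entity exposed by the abstraction layer (hypothetical)."""
    customer_id: int
    name: str
    credit_limit: float


def build_sources():
    """Create two stand-in physical databases with unrelated schemas."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE CUST_MSTR (CUST_NO INTEGER, CUST_NM TEXT)")
    crm.execute("INSERT INTO CUST_MSTR VALUES (42, 'Acme Corp')")

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE ACCT (ACCT_CUST INTEGER, CR_LIM REAL)")
    billing.execute("INSERT INTO ACCT VALUES (42, 5000.0)")
    return crm, billing


def load_customer(crm, billing, customer_id):
    """Assemble the logical entity from both physical sources."""
    name_row = crm.execute(
        "SELECT CUST_NM FROM CUST_MSTR WHERE CUST_NO = ?", (customer_id,)
    ).fetchone()
    limit_row = billing.execute(
        "SELECT CR_LIM FROM ACCT WHERE ACCT_CUST = ?", (customer_id,)
    ).fetchone()
    if name_row is None or limit_row is None:
        return None  # the logical entity needs both physical halves
    return Customer(customer_id, name_row[0], limit_row[0])


crm, billing = build_sources()
customer = load_customer(crm, billing, 42)
print(customer)
```

Callers see only the Customer entity; the physical table names stay inside load_customer, which is the part a commercial abstraction product would replace with configurable mappings.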
[Figure 1]
[Figure 2]
Figure 3 illustrates the data abstraction layer in more detail. The data abstraction layer is nothing more than a virtual schema that exists in software. Although you can build a custom layer, Composite Software and other vendors have solutions that can remap many disparate systems into something that makes more logical sense. The data abstraction layer decouples the physical information schema from the data services layer. It provides the agility that is one of the goals of SOA: the ability to change the information schema as needs change without having to change the underlying databases.

Some businesses are not able to change back-end information and databases. They may not own the system, it may be poorly designed, or it may already be coupled to other mission-critical systems. Assuming that the information can be abstracted, a data abstraction layer can be created that eliminates the need to address the underlying problems with the existing database systems. In essence, it creates a virtual database in memory for use by the SOA.

Most SOAs need a data abstraction layer now or will need one in the future. Consider data abstraction a key technology, and the standards and products in this area key enablers that make SOA more viable and much more valuable. They restrict the volatility of the information to its own domain.

Data Services
Figure 2 also illustrates that the data services layer is the next layer moving up the stack. Data services are about re-exposing data as services. The data can be either the physical instances of data as stored in the back-end systems or the virtual instance of data logically grouped through the schema in the data abstraction layer. Web services are the key enabling technology in this space, but the service can also be CORBA-based, J2EE-based, or proprietary.

Within SOA, services are created at different levels of granularity.
They can be orchestrated to provide process-based agility, just as the data abstraction layer provides the database agility. Once the abstraction layer is created, those logical groupings (of data attributes, if you will) are assigned to services. Data services are typically information bound to behaviors, for example, process customer, update inventory, or check credit. These services can be seen as verbs within the system that link back to the abstracted database, which in turn is linked to the physical database.

[Figure 3]
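To make the idea of a data service as a verb bound to information more concrete, here is a minimal Python sketch of a check credit service. The CustomerView class is a hypothetical stand-in for the data abstraction layer (a plain dictionary replaces the real back ends), and the field names and the check_credit signature are assumptions chosen for illustration only.

```python
class CustomerView:
    """Hypothetical stand-in for the abstracted (virtual) customer schema."""

    def __init__(self):
        # In a real system these values would come from the physical databases.
        self._customers = {
            42: {"name": "Acme Corp", "credit_limit": 5000.0, "balance": 1200.0},
        }

    def get(self, customer_id):
        """Return the logical customer record, or None if it does not exist."""
        return self._customers.get(customer_id)


def check_credit(view, customer_id, order_total):
    """Data service: the verb 'check credit' bound to the customer entity."""
    customer = view.get(customer_id)
    if customer is None:
        raise KeyError(f"unknown customer {customer_id}")
    available = customer["credit_limit"] - customer["balance"]
    return order_total <= available


view = CustomerView()
print(check_credit(view, 42, 3000.0))
print(check_credit(view, 42, 4000.0))
```

The service never touches a table or a query language directly; it asks the abstraction layer for a customer and applies its behavior to the result.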
A data services layer supports dealing with accessed information as a service. Query languages or other ways to communicate with back-end systems no longer need to be embedded within the services. Data services can be used to create composites within the systems. Examples are creating portals using enabling technology such as AJAX, externalizing information to user interfaces, using information with another service to create composite applications, or simply embedding data services in a remote application for simple information-to-information integration scenarios.

Implementing Data Services
A critical step in creating data services is to understand the semantics and data services in each domain. This is a step that everybody skips. People hear SOA and they go directly to the ESB layer, the application server layer, and the integration layer. The reality is that, without a clear understanding of the information and how it works within the abstraction layer and within the data services layer, the SOA implementation will fail. SOA will not provide the value it has the potential to offer. SOAs are implemented because they offer a strategic advantage for the business. They can make money, but they require some sort of return on investment (ROI). It is extremely important to identify all application semantics that exist in the specific domain. Then the data can be dealt with properly.

Defining the data services layer is also a critical step. It involves three key steps: data abstraction, metadata mapping, and ultimately selecting the appropriate technology. Figure 4 illustrates how data services are defined. It shows the legacy metadata that feeds into the processes. It shows the steps to be performed: metadata analysis, data abstraction layer definition, and finally data services definition based on those systems.
Fundamentally, we must understand where the information resides, who owns it, what it is, and how it is technically structured, as well as any access, policy management, or security issues related to that information. Based on that understanding, we create the raw metadata, the information that exists within the SOA. In essence, this is the candidate information to be dealt with within the architecture. Out of that we create the data abstraction layer, which links down into the physical system, creating core data entities. Finally, we create the data services that are built on top of the data abstraction layers. The artifacts generated are important because they feed into the other layers of SOA. The data services fit into the services layer, which fits into the processes layer, which fits into the management layer. But a foundation of information is required to move through the process.

[Figure 4]

Defining data services places a layer of software or middleware between the physical distributed databases and the applications or services that access the data. Data services layers connect to back-end systems using available interfaces to the physical system, such as CLI, ODBC, JDBC, or proprietary interfaces, but for all practical purposes they create a functional, virtual database. The applications or services leverage the virtual database in the data abstraction layer to access required information. Data services software handles the collection and distribution of the data as needed into the physical instances of the data. XQuery-enabled software and middleware are critical components, and will be discussed later.

The way data services are designed depends on the requirements of the SOA. As a rule, they are collections of data around particular services that are reusable across different systems. They are actions. Process customer, check credit, replenish inventory, replenish information within the accounting system, and post a check are all examples of data services. They are typically information-rich. The way data is bound to the behavior is defined, the data is abstracted into services, and those are abstracted into orchestration layers.

Two factors are critical to success. First, avoid creating data services that are too fine-grained. For example, if a data service operates on only one data element, its scope is so small that composites of hundreds of them are required. This can cause performance problems.
If the composite is communicating with too many services, it takes up too much processor time and too much network bandwidth, and the SOA is more complex than it should be.

Similarly, avoid creating services that are too coarse-grained. This means that they contain too much functionality; they resemble small applications. We have a tendency to do this because we are used to writing applications. Such a service will not have much functional value, because it does so much that it cannot be mixed and matched with other things to create a final target solution. It is a target solution in itself. The key is to find a balance between too coarse-grained and too fine-grained, to create a data service that will have value within the SOA.

SOA and Data Services in Practice
When you focus on the data management and data services layer, it is easy to see that these components take on the role of dealing with all existing IT assets. They manage interaction with the data management layer and representation at the data services layer of data, semantics, and behavior. In essence, they ensure proper communication using the necessary interfaces and protocols. They manage data movement in and out of the source and target systems on behalf of the services, and they also provide advanced capabilities such as virtual database representation from existing physical instances and distributed query capabilities. Typically the infrastructure must be considered here as well.

When exposing data services, one critical success factor is the communication capabilities of the back-end system. Another critical factor is understanding that communicating with the abstract layer is not database-bound or processor-bound in terms of performance. The final success factor is to make sure that the elements are manageable. In other words, since an abstraction layer exists between the physical database and the data services, a configuration layer is required to enable changes.
The configuration layer also contains optimization features such as distributed query engines, caching systems, and other features that make going back and forth to the database system less taxing on the core environment.

People often miss the performance elements of SOA. They write a Web service, which is in essence a data service, that communicates with the back-end system in such a way that the database operations are very inefficient. Those who fail to employ caching, distributed query systems, and other features that provide better performance within the SOA find that performance suffers, the database is saturated very quickly, and they have to go back and fix those issues.

Service Data Objects
Enabling technologies include Service Data Objects (SDO). SDO is a technology that allows heterogeneous databases to be accessed in a uniform way. This is in essence an abstraction layer that is created in a realm of standards. SDO is like other data objects in the past, such as those from ActiveX, CORBA, and Java. SDO is much the same approach and the same technology; it simply uses a new standard. The SDO layer provides the ability to deal with multiple data sources or heterogeneous databases using a single interface that is consistent from service to service. While you can write a custom SDO layer, most will purchase technology built around SDO from a vendor. The beauty of using SDO is that it not only provides an abstraction layer but also one with a sound footing in a standard that has attained a broad level of acceptance.

SDO is a fairly new technology and some users have experienced performance problems. Some vendors have not optimized their products, or the products are implemented incorrectly, or the standards are not quite stable. As the SDO standard changes over time, backward compatibility issues may occur as well.

One caution when thinking about leveraging standards in SOA is that well over 100 SOA-related standards exist today. Some are the WS-* standards, some are proprietary standards, and the reality is that not all will survive. In many instances these standards have been created by groups of vendors who have gotten together to create a marketing effort around their product set. If one standard catches on but users have selected a different one, they could end up in the situation of someone who buys a product from a company that then goes out of business.

XQuery
XQuery is another standard to consider. XQuery provides a mechanism to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases, Microsoft Office documents, and others. XQuery uses XPath expression syntax to address specific parts of the XML document with something like a pattern-matching mechanism. The language also provides syntax allowing new XML documents to be constructed.
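The two ideas above, addressing parts of a document with path expressions and constructing new XML from the results, can be sketched in Python. This is not XQuery: the Python standard library has no XQuery engine, so the sketch uses the limited XPath subset supported by xml.etree.ElementTree and builds the output by hand. The orders document and element names are invented for illustration; a roughly comparable XQuery expression appears in the comments.

```python
import xml.etree.ElementTree as ET

# Invented sample document standing in for data that can be viewed as XML.
ORDERS_XML = """
<orders>
  <order id="1" customer="Acme"><total>250.00</total></order>
  <order id="2" customer="Globex"><total>90.00</total></order>
  <order id="3" customer="Acme"><total>75.50</total></order>
</orders>
"""

root = ET.fromstring(ORDERS_XML)

# Path expression with a predicate: select only the Acme orders.
acme_orders = root.findall("./order[@customer='Acme']")

# Construct a new XML document from the selected nodes.
# A roughly comparable XQuery expression would be:
#   <customerSummary customer="Acme">{
#     for $o in /orders/order[@customer='Acme']
#     return <orderTotal orderId="{$o/@id}">{$o/total/text()}</orderTotal>
#   }</customerSummary>
summary = ET.Element("customerSummary", {"customer": "Acme"})
for order in acme_orders:
    item = ET.SubElement(summary, "orderTotal", {"orderId": order.get("id")})
    item.text = order.findtext("total")

result = ET.tostring(summary, encoding="unicode")
print(result)
```

In XQuery the selection and the construction are a single declarative expression, whereas here they are separate imperative steps; that difference is much of what the language offers.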
XQuery is based on a tree-structured model of the information content, very much like an XML document hierarchy, containing seven kinds of nodes: document, element, attribute, text, comment, processing instruction, and namespace.

XQuery is a very valuable, very stable standard. People have been using it for longer than the other Web services standards. It seems to work well, especially when dealing with XML databases or relational databases. XQuery basically represents queries as XML hierarchies.

The benefit of XQuery is its simplicity as a standard, as well as broad adoption by product sets, such as BEA with their technology set. Those who are bound to XML in SOA will use XQuery because it deals with XML and how it is persisted, presented, and processed. It is related to XSLT, XPath, and all the traditional XML processing standards that have been in use for some time, with all their limitations. They have a tendency to be text-based. Therefore, performance can be an issue. XQuery also has a tendency toward poor performance, because it does a lot of parsing and pattern matching.

Definitely consider XQuery for use within the data services layer, but run benchmark tests to verify that it meets performance requirements. That is essential. XQuery can fail as a query mechanism in some projects, especially intense transactional projects with a lot of data coming across the line. Also, it tends to be implemented differently from product to product. XQuery queries written in one product may not work in another product.

XQuery 1.0 does not include features for updating XML documents or databases. That is done by XUpdate. XQuery 1.0 also lacks full text-search capability. These features are both under active development in subsequent versions of the language.

XQuery is a language as well as a standard.
It is a programming language that can express XML-to-XML data transformations with the following features: logical and physical data independence, a declarative nature, a high level of abstraction, freedom from side effects, and strong typing.

Its main advantage is its simplicity, but it is a programming language, not a configuration layer. In very complex systems with a lot of heterogeneous information, a great deal of programming may be required to create a common data abstraction layer. The value of agility may be lost, because every time a change is made within the SOA, a development team has to reinvent the way they leverage XQuery to abstract the information.

Summary
Data services are a necessity for SOA. Without data services, SOAs are not possible except when dealing with the most rudimentary of embedded systems, and those are probably not a good application of SOA. Information is a part of any SOA. The use of data services technologies makes managing both data and metadata possible.

A number of options exist for data services technologies. The first is custom development. This should be done only in extreme cases because of the wide variety of technology and standards available. The second is software that provides not only data abstraction but also the ability to manage that abstraction layer through a data services layer manager with features like distributed query capabilities. The third is to use standards such as SDO and XQuery. Standards tend to put the onus to make them successful on the customer rather than the product vendor. Standards also have a tendency to be poor performers if they are not implemented correctly.

Without using data services technology, the data layer becomes the biggest limitation to agility. If people get into SOA because they want an agile environment and then forget to create an agility layer (a configuration layer between the physical data and the data services), then when the data services change, they need to change the back-end systems. In many organizations, especially the Global 2000 and the government, that can be a process that takes months, not days or weeks. The physical infrastructure has to be changed, and that change could break a number of different applications coupled to the same piece of information.
When creating a data services layer, the abstraction layer is key because of its agility aspects. Using a data services layer means that most changes to the physical database will not necessarily affect the existing processes and services. Data volatility is restricted to its own domain. Data changes are implemented in the physical layer. From the perspective of the data services, only a change in the mapping system is required. If an underlying data format changes, after a corresponding change is made to the mapping system, the data is re-abstracted, so it looks the same to the services, the processes, and the orchestration that is consuming those data services. Only a small change is required, saving both money and time.

Most physical data layers lack logical order for use within SOA. Data services layers allow re-representing data and metadata for the particular requirements of the SOA without having to change the back-end databases and applications.

Data services are a key component for SOAs. If they are not designed and implemented, the SOA will not provide the value of agility and thus will not provide the ROI. Today many are not thinking about information; they are doing nothing more than re-abstracting existing APIs and existing interfaces and making them look like services. That has some value but not a tremendous amount.

SOA is, as the name says, an architecture. The value of SOA is that it is made up of different layers that are essentially configuration layers. If the business changes, or the processing needs change, SOA minimizes the impact to the core architecture. Change can be a configuration exercise instead of a programming exercise. That is the value of SOA, but without the data services that value will not be realized. For those embarking on SOA, the best pilot project is to create a data services layer, make sure that it is operating effectively, and work up from there.
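The point that a change in the underlying data format requires only a change in the mapping system can be illustrated with a small Python sketch. Everything here is hypothetical: the mapping dictionaries, the table and column names, and the fetch_customer function are invented for illustration, and an in-memory SQLite database plays the role of the physical layer. Real products typically manage such mappings through their own configuration tooling.

```python
import sqlite3


def fetch_customer(conn, mapping, customer_id):
    """Read one logical customer using a logical-to-physical mapping.

    The identifiers in `mapping` are interpolated into SQL, so they must come
    from trusted configuration, never from user input.
    """
    columns = ", ".join(
        f"{physical} AS {logical}"
        for logical, physical in mapping["fields"].items()
    )
    sql = f"SELECT {columns} FROM {mapping['table']} WHERE {mapping['key']} = ?"
    row = conn.execute(sql, (customer_id,)).fetchone()
    return None if row is None else dict(zip(mapping["fields"], row))


conn = sqlite3.connect(":memory:")

# Original physical schema and its mapping.
conn.execute("CREATE TABLE CUST (ID INTEGER, NM TEXT)")
conn.execute("INSERT INTO CUST VALUES (42, 'Acme Corp')")
mapping_v1 = {
    "table": "CUST",
    "key": "ID",
    "fields": {"customer_id": "ID", "name": "NM"},
}
before = fetch_customer(conn, mapping_v1, 42)

# The physical schema changes; only the mapping is updated to match.
conn.execute("CREATE TABLE CUSTOMER_V2 (CUSTOMER_ID INTEGER, FULL_NAME TEXT)")
conn.execute("INSERT INTO CUSTOMER_V2 VALUES (42, 'Acme Corp')")
mapping_v2 = {
    "table": "CUSTOMER_V2",
    "key": "CUSTOMER_ID",
    "fields": {"customer_id": "CUSTOMER_ID", "name": "FULL_NAME"},
}
after = fetch_customer(conn, mapping_v2, 42)

print(before)
print(after)
```

The service code (fetch_customer and anything built on it) is identical before and after the schema change; only the mapping configuration differs, which is the configuration-over-programming benefit the summary describes.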
Common Questions

Question: How frequently do you run into someone who has done anything significant on the data abstraction side before they have entered into the integration Web services creation portion of SOA?

Answer: It is very uncommon. Typically I get called in when problems arise around that. A typical call is, "We are building an SOA, we just bought an ESB, and we are not getting the value out of it that we thought we should." I ask about the issues and am told, "Every time we need to change something within the architecture, we have to go to the back-end system to make changes, and that is causing a significant delay in the way we deploy our architecture. We are getting no significant gain." Then I talk to them about constructing a data services layer. I walk them through the process of working from the bottom up, from the foundation of the SOA up to the deployment of services and processes.

Today people say, "I need an ESB" or "I need SOA," and immediately vendors are throwing orchestration layers, governance layers, and ESBs at them. These are valuable technologies, but in the end this is an architecture. People have to think strategically about how all the components work together. We are suggesting a return to the fundamentals, because ultimately paying attention to fundamentals is what will make SOA a success.

Question: It has always seemed to me that the IT world has an extreme separation of concerns. Yet with SOA, we keep hearing that you have to have an idea of the big picture. For instance, you cannot really expect performance to work with these integration projects if you have not accounted for the data. Is it a fundamental dictum for any architect considering SOA that you have to think broadly rather than just looking at SOA as a narrow application-specific realm?

Answer: You have to think holistically. At the end of the day, SOA is a systemic change in the way we design architecture. That changes the game.
How will SOA fit in your enterprise and work with your existing systems? Then you execute tactically. Once you know holistically where you need to go, you decide on the steps you need to reach that goal and select the key enabling technologies to make it happen. I do not think people are thinking about that right now.

About TechTarget
We deliver the information IT pros need to be successful. TechTarget publishes targeted media that address your need for information and resources. Our network of technology-specific Web sites gives enterprise IT professionals access to experts and peers, original content, and links to relevant information from across the Internet. Our events give you access to vendor-neutral, expert commentary and advice on the issues and challenges you face daily. Our magazines give you in-depth analysis and guidance on the critical IT decisions you face. Practical technical advice and expert insights are distributed via specialized e-newsletters, video TechTalks, podcasts, blogs, and wikis. Our Webcasts allow IT pros to ask questions of technical experts.

What makes TechTarget unique?
TechTarget is squarely focused on the enterprise IT space. Our team of editors and network of industry experts provide the richest, most relevant content to IT professionals. We leverage the immediacy of the Web, the networking and face-to-face opportunities of events, the expert interaction of Webcasts, the laser-targeting of e-newsletters, and the richness and depth of our print media to create compelling and actionable information for enterprise IT professionals.

DataDirect_07_2008_0001