Enterprise Application Integration (Middleware)

Cesare Pautasso
Computer Science Department
Swiss Federal Institute of Technology (ETHZ)
pautasso@inf.ethz.ch
http://www.iks.inf.ethz.ch/

EAI Course Administration

Lecture: Tuesdays 13:15-15:00 (HRS F5)
Discussion and Exercises: Thursdays 10:10-11:55 (HRS F5)
Web site: http://www.iks.inf.ethz.ch/education/ws04/eai

Getting in touch with us:
- Cesare Pautasso, HRS G7, pautasso@inf, 01 632 0879
- Thomas Heinis, HRS G8, heinist@inf, 01 632 4693
- Daniel Jönsson, HRS G12, jodaniel@inf, 01 632 7259

Practical exercises: designing, building and programming a composite Web service. The exercise is mandatory.
Course material: script of the lecture (download from the course website); book (recommended).
Exam: oral exam, 15 minutes.

IKS, ETH Zürich.

EAI Text Book

Available from Frau Schuemperlin, HRS G10 (01 632 4531), 50.- CHF.

Goals of the EAI Course

The course aims at introducing and discussing in depth several important topics related to distributed information systems in general and enterprise application integration in particular. In many ways, the course explores the synergy between information and communication systems and how this synergy can be best exploited for EAI and B2B integration.

The course is more practical than theoretical. The objective is to give a clear overview of the problems and their nature, how they can be solved, and how these solutions are implemented in practice. While we will spend some time understanding the theoretical underpinnings of the ideas discussed, the emphasis will be on how these ideas can be implemented in practice. An important part of the course will be devoted to how technology has evolved and the reasons why existing systems are the way they are.

You will have the opportunity to program a relatively complex integrated information system. Without taking part in the exercises you will not be allowed to take the exam.

The lectures, discussions and presentations form an integral part of the course. If you take the time to learn from them, you will get much more out of this course. Take advantage of the opportunity!

Motivation for the EAI Course

The architecture of the information systems we use is becoming increasingly complex.

- Communications: Today's systems are no longer isolated. Communications play a key role in their use. New access methods also change the nature of the problems.
- Demand: The demands on the existing systems keep growing: centralized solutions are not always feasible; cooperation among systems is a must.
- Components: System integration is the most challenging aspect of the IT world. Programming today is to combine already existing, heterogeneous systems.

The access methods, the capabilities, the goals, and the available technology are continuously changing. What can we learn that will remain valuable in the years to come?

One example: 70-90 % of the software costs are maintenance costs. Using the right abstractions helps! Databases used as services remove about 40 % of the code of commercial applications.

Another example: software reuse is truly efficient and makes economic sense at a large granularity. How can we build systems that can be tailored to the user needs and yet are applicable in a wide range of areas and environments?

[Figure: five-tier architecture. Boxes shown: web client, wap client, java client (client tier); api (access tier); business object (app tier); wrapper (integration tier); db (resource tier).]

Tiers, with the protocols used between adjacent tiers:
- CLIENT TIER: WWW and WAP browsers, specialized clients (Java, .NET), Eclipse RCP, SMS...
  (HTML, SOAP, XML)
- ACCESS TIER: WWW servers, J2EE, CGI, Java Servlets API
  (MOM, HTML, IIOP, RMI-IIOP, SOAP, XML)
- APP TIER: TP-Monitors, stored procedures, programs, scripts, beans
  (MOM, IIOP, RMI-IIOP, XML)
- INTEGRATION TIER: system federations, filters, object monitors, MOM
  (ODBC, JDBC, RPC, MOM, IIOP, RMI-IIOP)
- RESOURCE TIER: databases, multi-tier systems, backends, mainframes

Understanding the Layers

- Presentation logic: A client is any user or program that wants to perform an operation over the system. To support a client, the system needs to have a presentation layer through which the user can submit operations and obtain a result.
- Application logic: The application logic establishes what operations can be performed over the system and how they take place. It takes care of enforcing the business rules and establishing the business processes. The application logic can be expressed and implemented in many different ways: constraints, business processes, server with encoded logic...
- Resource manager: The resource manager deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. This is typically a database but it can also be a text retrieval system or any other data management system providing querying capabilities and persistence.

[Figure: the three layers annotated with time spans: clients and external interface (presentation, access channels), 1-2 years; application (system's logic), 2-5 years; data management systems (operational and strategic data), ~10 years.]

A modern e-commerce platform

[Figure: server layout of an e-commerce site. Components shown: cache server, ASP farm A and ASP farm B (with SSL), SQL product servers, ASP file servers, basket/ad/surplus, receipt/fulfillment, product categories (games/music, videos, comp/soft, books, music), monitor and cache, search servers.]

Diagram courtesy of Robert Barnes, Microsoft.

Scale-up versus Scale-out

- Scale-up is based on using a bigger computer as the load increases. This requires using parallel computers (SMP) with more and more processors.
- Scale-out is based on using more computers as the load increases instead of using a bigger computer.
- Both are usually combined! Scale-out can be applied at any level of the scale-up.

Diagrams courtesy of Jim Gray, Microsoft.

Challenges of Integration

Many of the problems to be addressed in Enterprise Application Integration stem from having to integrate standalone applications which have been developed independently, operate autonomously, and were not originally intended to be integrated with one another.

- Heterogeneous: each application implements its own data model. Concepts may be shared, but representation mismatches are to be expected. Mappings and transformations are required.
- Autonomous: applications update their state independently without coordinating with each other. The systems to be integrated are maintained independently and upgraded at different times.
- Distributed: in the worst case, every application runs in a completely separate environment, e.g., database storage is not shared among applications. Message-based communication is the only possibility to exchange information.

This part is taken from C. Bussler, B2B Integration, Springer, 2004.
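
The heterogeneity challenge above (shared concepts, mismatched representations) can be illustrated with a small mapping between two data models. This is only a sketch: the two record types, their field names, and the country lookup table are hypothetical and do not come from the course material.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CrmCustomer:
    """Hypothetical customer record as application A stores it."""
    full_name: str
    birth_date: str   # text in the form "DD.MM.YYYY"
    country: str      # country name, e.g. "Switzerland"


@dataclass
class BillingCustomer:
    """Hypothetical customer record as application B expects it."""
    first_name: str
    last_name: str
    birth_date: date
    country_code: str  # two-letter code, e.g. "CH"


# Assumed lookup table; a real mapping would be far larger.
COUNTRY_CODES = {"Switzerland": "CH", "Germany": "DE", "Italy": "IT"}


def crm_to_billing(src: CrmCustomer) -> BillingCustomer:
    """Map the shared concept 'customer' from A's representation to B's."""
    # Split at the first space: everything after it is treated as last name.
    first, _, last = src.full_name.partition(" ")
    day, month, year = (int(part) for part in src.birth_date.split("."))
    return BillingCustomer(
        first_name=first,
        last_name=last,
        birth_date=date(year, month, day),
        country_code=COUNTRY_CODES[src.country],
    )


if __name__ == "__main__":
    print(crm_to_billing(CrmCustomer("Anna Muster", "01.02.1980", "Switzerland")))
```

Even this toy mapping has to make decisions (how to split a name, what to do with an unknown country) that neither application documents, which is why such transformations tend to dominate integration work.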

Ideal integration

The purpose of integration technology is to provide the illusion of an ideal integration scenario, hiding the shortcomings of the real world.

- Secure and Reliable Messaging. Many technologies and protocols have been developed to achieve secure, exactly-once message delivery over unreliable and insecure networks. In an ideal scenario, there would be a single network connecting all partners and systems and providing such features at the network interface layer.
- Uniform Semantic Data Model. Ideally, all applications would share the same schema, providing a unique and well-defined model of the data and avoiding all misinterpretation problems. Translation, mappings, and transformations between different formats and mismatching representations would no longer be necessary.
- Homogeneous interface processes. The message-based interaction between different systems happens in the same way. The external interfaces of all systems follow the same public business processes, so that they can be seamlessly interconnected.

[Figure labels: Semantics, Message Delivery, Interaction.]

Why integration matters

Useful information systems evolve over time by growing in size and by incorporating functionality of existing standalone systems. Applications originally intended to operate separately are later required to interoperate with others. Technology change affects all layers, and legacy does not go away so easily.

The architecture of the enterprise information system depends on constraints related to the technology but also to the organization.

- In the case of B2B, each company owns its information system and will not open it up more than strictly necessary, as it is part of its competitive advantage. For example, not all business processes are going to be shared, as business processes are mostly kept secret.
- Within an enterprise, each department may have its own IT infrastructure, systems and databases which are maintained independently. Integrating them may bring additional value to the company.
- Mergers, acquisitions and spin-offs leave a long-lasting trace in the information systems of the corresponding companies.
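
The "exactly-once message delivery over unreliable networks" goal from the ideal integration slide is commonly approximated by combining sender retries (at-least-once delivery) with duplicate suppression at the receiver. The sketch below simulates that idea in memory; the class and function names are hypothetical, the lossy channel is a toy stand-in for a real network, and a production system would also need to persist the set of seen message identifiers.

```python
import random


class IdempotentReceiver:
    """Processes each message id at most once, however often it arrives."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def deliver(self, msg_id, payload):
        if msg_id in self.seen:
            return                      # duplicate caused by a retry: ignore
        self.seen.add(msg_id)
        self.processed.append(payload)


class UnreliableChannel:
    """Toy network that may lose a message or its acknowledgement."""

    def __init__(self, receiver, loss_rate, seed=0):
        self.receiver = receiver
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def send(self, msg_id, payload):
        if self.rng.random() < self.loss_rate:
            return False                # message lost on the way
        self.receiver.deliver(msg_id, payload)
        if self.rng.random() < self.loss_rate:
            return False                # delivered, but the ack was lost
        return True                     # delivered and acknowledged


def send_reliably(channel, msg_id, payload, max_attempts=1000):
    """Retry until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(msg_id, payload):
            return attempt
    raise RuntimeError("no acknowledgement after %d attempts" % max_attempts)


if __name__ == "__main__":
    receiver = IdempotentReceiver()
    channel = UnreliableChannel(receiver, loss_rate=0.5, seed=42)
    for i, item in enumerate(["order", "invoice", "payment"]):
        send_reliably(channel, i, item)
    print(receiver.processed)
```

The sender cannot tell a lost message from a lost acknowledgement, so it must resend in both cases; the receiver's duplicate check is what turns repeated delivery into single processing.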

EAI in Context

- Databases, Networking, Software Engineering, Programming Languages: how to build applications from scratch.
- Enterprise Application Integration, Middleware: how to integrate two or more existing applications.

Kinds of Integration

Given two (or more) applications, how can you integrate them? It depends on the assumptions and on whether you can change the applications. Some examples:
- Manual integration
- Manual integration with copy & paste
- File-based integration
- API extraction and publishing
- Script different command lines
- Wrap existing software (screen scraping)
- Data transformation and conversion
- Message-based integration: point to point, centralized, peer to peer

There are many different ways of doing EAI. Also, integration can be applied to many different domains.
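
One way to compare the point-to-point and centralized options listed above is to count the connections each one needs. The sketch below is not from the slides; it assumes that every pair of applications has to exchange data, that each point-to-point link needs its own adapter, and that a centralized design connects each application once to a hub. The function names are hypothetical.

```python
def point_to_point_links(n: int) -> int:
    """Links needed if each of n applications talks directly to every other."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Links needed if each of n applications connects only to a central hub."""
    return n


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(n, point_to_point_links(n), hub_links(n))
```

Under these assumptions the point-to-point count grows quadratically while the hub count grows linearly, which is one reason centralized designs become attractive as the number of applications rises. The hub itself then becomes a component that must scale and stay available.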

[Figure: supply chain example. Participants: Customer 1, Customer 2, Retailer, Manufacturer 1, Manufacturer 2, Supplier 1, Supplier 2. Requests shown: "Get products #23 and #45"; "Buy products #23, #45 and part #101"; "Build product #3, according to specs."; "Get parts #A1, #B42, #H2, #R2"; "Order parts #A1, #H2, #G7, #G11, #B42"; "Get parts #G7, #G11, #ES-01, #R2"; "Order parts #R2, #101, #ES-01, #G7, #G11".]

Astronomy

Another application of EAI

Scientific Method?

Our ability to produce data exceeds our capacity to explain how the data was produced.

[Figure: HEDC architecture (www.hedc.ethz.ch), in three layers.
- PRESENTATION LAYER: web browser, StreamCorder, thin client (HTTP), HEDC web server (Apache, HTTP), Java client, local DB.
- APPLICATION LAYER: processing logic (PL), data management (DM), server manager, directory services, front end (HTTP, RMI), archive manager, reference manager, data filters, IDL servers.
- RESOURCE MANAGEMENT LAYER: DBMS 1 (Oracle), DBMS 2 (Oracle), DB space, tmp storage spaces, network file system, images and raw data, less relevant data, tape archive.]

Planets outside the solar system

New planets or program bugs?

The Grid

Course philosophy

Addressing the increasing need for connectivity and the ever-growing demand, and facing the challenge of component-based software design, requires solving a number of data management issues. By learning to identify the problems and being aware of the state of the art and possible solutions, both theoretical and practical, a system designer will be in a much better position to deal with evolving technology.

[Figure labels: Design Problem, System Design, Technical Solutions.]

The future of distributed IS

Why distributed information systems?
- Computer environments: distributed, heterogeneous, autonomous nodes linked by a network (intranet, Internet); emphasis on communication.
- Technology advances: in computing power (powerful clients) and in networks (reliability, speed; ATM, ISDN).
- Application demands: larger and larger applications; decentralized corporations; need for autonomy.
- New environments and business models: WWW, distributed service providers, Java, CORBA, workflow management.
- Basic services: a great deal of work is being invested in producing the type of standards and reusable software needed to make this a reality (SOAP/WSDL/UDDI).

Distributed IS applications:
- Emphasis on interoperability: combine your data with that of the rest of the world.
- Emphasis on distribution: intranet and Internet are here to stay.
- Huge demand for this functionality:
  - Lotus Notes (applications built on replicated databases)
  - WWW + Java + persistence (distributed service providers)
  - TP-Monitors (OLTP, OLAP, transactional processing)
  - Queuing systems (applications on top of reliable, asynchronous communications)
  - CORBA (applications on top of a TP-Monitor-like object-oriented system)
  - Workflow
  - Web services
  - and more

The Web services stack

[Figure: the Web services stack. Layers shown: messaging, description, nonfunctional description, conversations, choreography, business processes, contracts, discovery, transactions, security. Specifications shown, with column labels including "WSDL-based" and "Semantic Web": SOAP, WSDL, WSEL, WSCL, WSCI, BPEL4WS, WSFL/XLANG, BPML, UDDI, WS-Coordination, WS-Transactions, BTP, WS-Security, SAML, S/MIME, RDF, DAML-S, ebXML MSS, ebXML CPP, ebXML CPA, ebXML BPSS, ebXML registries.]

The distributed systems dilemma

Theoretical advantages of distributed systems:
- Locality of reference: with the proper data placement, most accesses should be to local data, which improves response time and throughput.
- Scalability/Processing capacity: with better hardware available, the overall processing power should be a function of the number of nodes in the system (see parallelism). If more power is needed, add more nodes.
- Availability/Fault tolerance: a distributed system should be able to provide services even when part of the system is down (unlike centralized systems). This is important for large installations and mission-critical applications (24x7 computing).

In theory, a distributed system is faster (better response time and throughput), bigger (more capacity), and more reliable (built-in redundancy). But, in practice, this is not true.
- Centralized (mainframe based): the old-fashioned approach. Most of the valuable data is still in mainframes, although it is only 1 % of all existing data (mainframes are still a good business).
- Client/Server (a variation of the centralized version): a first approach to distribution. It made too many promises and is now suffering from its lack of success. Servers are not mainframes and quickly become a bottleneck.
- Applications move towards distribution, and find there is no support for it.

Course Organization

Schedule: October, November, December, January.
- Distributed Information Systems
- Middleware
- Remote Procedure Calls
- The role of the WWW in EAI
- SOAP
- WSDL
- UDDI
- Limitations of SOAP, WSDL, UDDI
- TP-Monitors
- Message Oriented Middleware
- Workflow Management Systems
- EAI in industry (Guest Lecture)
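
Returning to the availability argument in the distributed systems dilemma: a short calculation shows why "more nodes" does not automatically mean "more reliable". This sketch is an illustration under simplifying assumptions that are not in the slides: every node is up with the same probability and nodes fail independently. The function names are hypothetical.

```python
def all_nodes_required(node_availability: float, n: int) -> float:
    """Availability of a system that works only if all n nodes are up."""
    return node_availability ** n


def any_replica_suffices(node_availability: float, n: int) -> float:
    """Availability of a system that works if at least one of n replicas is up."""
    return 1 - (1 - node_availability) ** n


if __name__ == "__main__":
    a = 0.99
    for n in (1, 2, 5, 10):
        print(n, all_nodes_required(a, n), any_replica_suffices(a, n))
```

If an operation needs every node (for example, data partitioned without replication), availability falls as nodes are added; only when nodes are redundant replicas does it rise. Which of the two cases applies depends on how the system is designed, not merely on the fact that it is distributed.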

Concrete goals for the course

- Provide a basic understanding of the problems associated with distributed environments (many of the ideas we will discuss apply in many areas, not just typical commercial applications).
- Provide the conceptual tools required to understand commercial products (the basic idea behind a product, what its weaknesses are, how to solve them). Understanding how technology has evolved and why products are the way they are is the key to understanding what might happen in the future.
- Develop the skills and know-how necessary to participate in an enterprise application integration effort: motivation, vocabulary, systems, some programming experience.
- Gain sufficient awareness of the state of the art. Some of the problems covered in the course are very hard and many people have worked on them for years. It is very useful to know what has been done so far and how it can be used.
- And have fun in the process!