APPLICATIONS
A White Paper Series

EAI vs. ETL: Drawing Boundaries for Data Integration

EAI and ETL technology have strengths and weaknesses alike. There are clear boundaries around the types of application integration projects most appropriate for each technology.
Introduction
1. Call for EAI
2. Call for ETL
3. What is EAI?
4. What is ETL?
5. Comparing EAI and ETL
6. Distinctive Factors
7. EAI vs. ETL: A Decision-Making Guide
8. Drawing Boundaries: EAI vs. ETL
9. The Bottom Line

Introduction

Business enterprises invest millions of dollars to implement and deliver Data Warehousing and Business Intelligence (BI) initiatives that rely on consistent, accurate and reliable data. IT organizations in these enterprises must ensure that proper integration techniques are selected to address the data needs of the organization.

Positioning a common enterprise-wide integration strategy with EAI is essential to establish a clear-cut partnership between business needs and IT solutions. Data integration, a function of ETL, is a prominent need, as mediocre data at the foundation of any BI initiative fails to provide an accurate picture of the business.

Thus the vital question: EAI or ETL? In this paper, we'll explore this question, comparing ETL against the data integration element of EAI.

2007 Syntel, Inc.
1. Call for EAI

Most business activities involve multiple applications and information sources; incompatibilities between these systems can cause delays and errors that prevent an organization from achieving real-time business. The key to increasing operational efficiency and maximizing the individual value of these systems is ensuring that they can communicate and interact in real time. Some of the challenges facing modern organizations are:

- Giving the business complete, transparent access to information
- Enabling seamless movement of information from one application to another

EAI, as a discipline, aims to alleviate many of these problems as well as create new paradigms for truly lean, proactive organizations.

2. Call for ETL

ETL (Extract, Transform and Load) is the technology focused on data integration, whether in batch or in real time, for data stores and data warehouses. It synchronizes data between diverse applications and involves a lot more data manipulation than simply moving data from point A to point B. There is reconciliation, cross-matching, de-duping and cleansing: all data-intensive tasks that lay the foundation for analysis and reporting. These systems are no longer stand-alone and separate from operational processing; they are integrated with overall business processes. ETL is no longer nice to have, but is essential to success.

EAI Levels

- Data-level EAI: The data-level EAI technique implements information exchange among multiple application data stores using traditional extract, transform, and load (ETL) techniques that are commonplace in data warehouse deployments.
- Message-level EAI: Message-level EAI manages message exchange among multiple applications using reliable queuing systems.
- Process-level EAI: The process-level EAI technique goes beyond message-level EAI by overlaying a workflow management capability on top of the message delivery capability.

Figure 1. Integration Technologies Working in Concert: an example of integration technologies working together.
3. What is EAI?

Enterprise Application Integration is the process of aligning a business's strategic vision with its information technology.

Enterprise Application Integration (EAI) solutions enable the automation of end-to-end business processes by coordinating sequences of tasks and the resources (both systems and people) that perform them. EAI solutions support sophisticated exception management and the dynamic modification of processes, even while those processes are underway.

EAI involves developing a unified view of an enterprise's business and its applications, seeing how existing applications fit into the new view, and then devising ways to efficiently reuse what already exists while adding new applications and data. EAI provides packaged integration solutions to help the enterprise develop a consistent approach to integration for all applications.

Figure 2. The EAI architecture has various layers that reflect an increasing level of maturity in the integration environment with the overall enterprise application framework.

4. What is ETL?

Extract, Transform and Load (ETL) provides data consolidation for building permanent databases used for analytics or reports, data federation for creating virtual dashboards or reports, and data propagation for the transfer of data between applications. These three database functions are combined into one tool to pull data out of source databases and place it into target databases. ETL is used to migrate data from one or more databases to others; to form data repositories, data marts and data warehouses; and to convert databases from one format or type to another.

- Extract: the process of reading data from source systems. Data can be extracted in schedule-driven pull mode or event-driven push mode. Pull mode operation supports data consolidation and is typically done in batch. Push mode operation is done online by propagating data changes to target data stores.
- Transform: the process of converting the extracted data from its existing form into the format it needs to be in so that it can be placed into other systems or databases. Transformation occurs by using rules or lookup tables, or by combining the data with other data.
- Load: the process of creating and executing workflows that write data into the target systems. Data loading may cause a complete refresh of a target data store or may be done by updating the target destination. Interfaces here include de facto standards like ODBC, JDBC and JMS, or application interfaces. Loads can be parallel, synchronized or sequenced; for example, ETL tools support parallel execution, which dramatically reduces response time for data-intensive operations on data warehouses and data stores.

Figure 3. The ETL process
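The three steps described above can be illustrated with a minimal sketch. It assumes a pull-mode extract, a rule-plus-lookup-table transform, and a load that supports either a complete refresh or an update. The table names (customers, dim_customer), the COUNTRY_CODES lookup table and the sample rows are hypothetical and are not taken from any particular ETL tool; an in-memory SQLite database stands in for the source and target systems.

```python
import sqlite3

# Hypothetical lookup table used by the transform step (an assumption).
COUNTRY_CODES = {"United States": "US", "Germany": "DE"}


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Pull-mode extract: read every row from the source system."""
    return source.execute("SELECT id, name, country FROM customers").fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Apply a simple rule and a lookup table to reshape the rows."""
    out = []
    for cust_id, name, country in rows:
        clean_name = name.strip().title()        # rule-based cleanup
        code = COUNTRY_CODES.get(country, "??")  # lookup-table mapping
        out.append((cust_id, clean_name, code))
    return out


def load(target: sqlite3.Connection, rows: list[tuple], full_refresh: bool) -> None:
    """Write rows to the target as a complete refresh or as an update."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country_code TEXT)"
    )
    if full_refresh:
        target.execute("DELETE FROM dim_customer")
    # INSERT OR REPLACE updates rows whose primary key already exists.
    target.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    target.commit()


def run_etl_demo() -> list[tuple]:
    """Build a tiny source, run extract/transform/load, return the target rows."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  alice smith ", "United States"), (2, "BOB JONES", "Germany")],
    )
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(source)), full_refresh=True)
    return target.execute("SELECT * FROM dim_customer ORDER BY id").fetchall()
```

Calling run_etl_demo() runs the whole pipeline once. A commercial ETL tool would add scheduling, parallel loads and metadata management on top of this basic shape.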
The other services which form an integral part of the ETL framework are:

- Administration and operation services: these services ensure effective utilization of resources in the data synchronization environment. They ensure effective administration through job scheduling and tracking, metadata management, error recovery, etc.
- Transport services: the process of moving raw or transformed data from a source to a target system.
- Metadata services: metadata is descriptive information about data and other structures, such as objects, business rules, and processes that manipulate data.

Metadata can be grouped into two categories:

- Technical metadata supports designers, developers and administrators during development, maintenance, and management of an information technology environment. It is the technical glue that links the tools, applications, and systems that together constitute a solution. An example of technical metadata: the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate the scripts that build the data warehouse tables.
- Business metadata, on the other hand, gives end users a clearer picture of the services of the enterprise environment. Examples of business metadata include business requirements, timelines, business metrics, business process flows, and business terminology. Metadata authors enter information about the business application into the metadata repository.

5. Comparing EAI and ETL

EAI tools are clearly most appropriate for process integration, which consists of multi-step business process management and real-time interactive processing when very large numbers of transactions are involved. ETL tools do not handle these processes well: they are not designed to handle discontinuous workflows, or to scale to moving very large numbers of small transactional messages.

EAI and ETL are not competing technologies.
They each rely on the concept of a unified view and the definition of a mapping that allows data from many disparate sources to be projected onto that view. There are many situations where they can be used in conjunction with each other, with ETL acting as a service to EAI.

- One of the main objectives of EAI is to provide transparent access to the wide range of applications that exist in an organization. An EAI-to-ETL interconnection could be built using a Web service or a message queue to give an ETL product access to this application data. Such an interconnection eliminates the need for ETL to develop point-to-point adapters for synchronizing application data sources.
- Because EAI is focused on real-time processing, it can act as a real-time event source or target for an ETL application.
- ETL tools allow developers to define ETL processes as Web services. These Web services can be invoked by EAI applications. This not only provides transformational power to the EAI environment, but also supports code and metadata reuse.

In plain words, data integration (provided by ETL) is a subset of process integration (provided by EAI); the functionality common to ETL and EAI is data integration from disparate systems. It is important to note that data integration using EAI comes at a cost: software, hardware, infrastructure, skills, licenses, and a heavy footprint.

An Example of EAI/ETL Overlap

Employers send entitlement information for employees and dependents to healthcare insurance payers on a weekly basis. This information has records of all changes to entitlement information that occurred during the week.

- The incoming data from the employer is in a proprietary format and needs to be converted into the healthcare provider's backend mainframe system format.
- Summary records must be created that list the number of dependents and children for each employee.

Here, EAI is used for transmitting the records which have incremental changes, and ETL is used to perform format and content transformations in batch mode.

Figure 4. EAI/ETL overlap. Figure labels: data synchronization (ETL: batch and real-time data synchronization); interactive processing (ETL or EAI: point-to-point continuous processing, simple or no workflow); multi-step processing (EAI: BPM, multi-step process).
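The batch transformation in this example can be sketched roughly as follows. The paper does not describe the employer's proprietary format, so the pipe-delimited layout (employee ID, member name, relationship), the relationship codes SELF and CHILD, and the Summary record shape are all assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Summary:
    employee_id: str
    dependents: int  # every covered member other than the employee
    children: int    # the subset of dependents whose relationship is CHILD


def parse_record(line: str) -> dict:
    """Convert one line of the hypothetical employer format into a dict."""
    employee_id, member_name, relation = [field.strip() for field in line.split("|")]
    return {
        "employee_id": employee_id,
        "member": member_name,
        "relation": relation.upper(),  # normalize the relationship code
    }


def summarize(lines: list[str]) -> list[Summary]:
    """Create one summary record per employee, in order of first appearance."""
    dependents: Counter = Counter()
    children: Counter = Counter()
    employees: dict = {}  # used as an ordered set of employee IDs
    for line in lines:
        rec = parse_record(line)
        emp = rec["employee_id"]
        employees.setdefault(emp, None)
        if rec["relation"] == "SELF":
            continue  # the employee's own record is not a dependent
        dependents[emp] += 1
        if rec["relation"] == "CHILD":
            children[emp] += 1
    return [Summary(emp, dependents[emp], children[emp]) for emp in employees]


weekly_batch = [
    "E1|Ann Lee|SELF",
    "E1|Tom Lee|SPOUSE",
    "E1|Kim Lee|child",
    "E2|Raj Rao|SELF",
]
summaries = summarize(weekly_batch)
```

In the scenario described, the EAI layer would deliver the weekly change records, and a batch step along these lines would run on the ETL side before the data is written in the backend system's format.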
6. Distinctive Factors

| Areas | EAI | ETL |
|---|---|---|
| Definition | Technology solution that enables systems to communicate | Process designed by users to extract, transform, and load data from one or more sources to a target data repository |
| Performance Optimization | System is aimed at reducing the response time for a single user request or update | System is aimed at reducing total time to create the unified historical record |
| Integration | Applications | Data |
| Focus | Operational & strategic | Operational |
| Business Case | IT, e-business; better workflow; data entry once | Business intelligence; decision making |
| Time | Real time | Batch (moving to real time) |
| Data | Transactional, small | Historical, enormous |
| Metadata | Limited; message metadata | Rich; dimensional metadata |
| Transformations | Format oriented; code supported | Analytic; joins; aggregations |
| Volume | Single transactions; messages/second (KB) | Days or weeks of data; records per minute (GB) |
| Targets | OLTP; API; code supported | Relational structures; native connectivity; codeless |
| Extracts Data | Using APIs | Directly from database |
| System Admin Involvement | EAI requires no system administrator involvement. Once implemented, EAI is a technology solution that is transparent to end users. | ETL requires extensive system administrator involvement. |

7. EAI vs. ETL: A Decision-Making Guide

Syntel has developed a list of questions to help guide your organization toward the best decision when choosing between EAI and ETL. This toolkit can be used as an aid to evaluate whether a project is a process integration project or a data integration project.

Factors for consideration in the decision include:

- Costs of run-time processing and development.
- Proprietary nature of source or target systems. Consider a situation where the source system can only be accessed via screen scraping because the file layouts and key structures are part of a package and the source is not available. In such cases neither ETL nor EAI will work, and a solution might have to be developed on a case-by-case basis.
- The state of the data and the load-time window available to migrate data between source and target, including any need for real-time movement of data.
- Complexity and mapping of source and target systems by data elements, and data quality in each system.
- Skills of staff relative to EAI and ETL tools.

To determine if your solution should be EAI or ETL, answer the following questions:
EAI vs. ETL Decision-Making Toolkit

Answer YES or NO to each question:

1. Do you anticipate data coming from disparate target systems lying in silos that you need to integrate?
2. Is your source data straightforward and does it fit directly to your target systems (i.e., no data transformation required)?
3. Do you expect the tool to automatically analyze and execute operations on your data?
4. Is the migration a one-off event (i.e., you do not anticipate adding additional systems in the future)?
5. In the event of a system or connection failure, do you expect data rollback or data integrity checks to be executed automatically?
6. Do you have any logic involved or business decisions to be made "on-the-fly" based on your source data?
7. Do you have a large number of transactions to be completed and managed swiftly?
8. Are you finished making EAI skill set and infrastructure investments?
9. Do you need a workflow which will help streamline business processes and decision-making?
10. Do you anticipate future business growth, additional target systems, or business mergers which would require sharing this data across systems?

If you answered YES to the first four questions, the right choice for you is ETL. If the answer to the last six questions is YES, then an EAI tool is the solution for you. In that case, you should strongly consider bringing in an enterprise architect to evaluate the possibility. The enterprise architect will ensure that the pieces of the wider puzzle fit together properly.

8. Drawing Boundaries: EAI vs. ETL

| | EAI | ETL |
|---|---|---|
| Strengths | Reliability (guaranteed delivery); enables real-time business decisions; out-of-box adapters for many enterprise systems | Metadata-driven approach; GUI tools for most tasks (little coding); extremely efficient for large data volumes |
| Weaknesses | High upfront cost; relatively complex design patterns | High upfront costs; complexity of tool; batch oriented |
| Best suited for | Real-time data needs; high-volume, low-footprint data exchange; many consumers of the same data | Large volumes of data; generally used to move data between two or more databases/data repositories |

9. The Bottom Line

If data integration is the business pain point you are facing, the most effective solution will be ETL. However, if your real problem is process integration, you will be better off with an EAI implementation.
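The scoring rule of the decision-making toolkit in section 7 can be expressed as a short sketch. The paper only defines two outcomes: YES to the first four questions points to ETL, and YES to the last six points to EAI. The UNDECIDED result for any other combination is an assumption added here, as is the function name recommend.

```python
def recommend(answers: list[bool]) -> str:
    """Map ten YES/NO toolkit answers (True means YES) to a recommendation.

    The answers must be given in the order the questions appear in the toolkit.
    """
    if len(answers) != 10:
        raise ValueError("the toolkit has exactly ten questions")

    etl_indicated = all(answers[:4])  # YES to the first four questions
    eai_indicated = all(answers[4:])  # YES to the last six questions

    if etl_indicated and not eai_indicated:
        return "ETL"
    if eai_indicated and not etl_indicated:
        return "EAI"
    # Assumption: the paper does not say what a mixed result means, so this
    # sketch flags it for review instead of forcing a choice.
    return "UNDECIDED"


# Example: a project that answers YES only to the first four questions.
example = recommend([True, True, True, True, False, False, False, False, False, False])
```

A mixed result is arguably the case where the enterprise architect mentioned in the toolkit is most useful, since the project shows traits of both data integration and process integration.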
About Syntel

Syntel provides custom outsourcing solutions to Global 2000 corporations. Founded in 1980, Syntel's portfolio of services includes BPO, complex application development, management, product engineering, and enterprise application integration services, as well as e-business development and integration, wireless solutions, data warehousing, CRM, and ERP.

We maximize outsourcing investments through an onsite/offshore Global Delivery Service, increasing the efficiency of how complex projects are delivered. Syntel's global approach also makes a significant and positive impact on speed-to-market, budgets, and quality. We deploy a custom delivery model that is a seamless extension of your organization to fit your business goals and a proprietary knowledge transfer methodology to guarantee knowledge continuity.

SYNTEL
525 E. Big Beaver, Third Floor
Troy, MI 48083
Phone: 248.619.3503
info@syntelinc.com

Visit Syntel's website at www.syntelinc.com