Association for Information Systems, AIS Electronic Library (AISeL), ICIS 2003 Proceedings, International Conference on Information Systems (ICIS).

Recommended Citation: Schiefer, Josef (IBM Watson Research Center), and Bruckner, Robert (Vienna University of Technology). "Container-Managed ETL Applications for Integrating Data in Near Real-Time" (2003). ICIS 2003 Proceedings.
CONTAINER-MANAGED ETL APPLICATIONS FOR INTEGRATING DATA IN NEAR REAL-TIME

Josef Schiefer, IBM Watson Research Center, Hawthorne, NY, USA
Robert M. Bruckner, Vienna University of Technology, Vienna, Austria

Abstract

As the analytical capabilities and applications of e-business systems expand, providing real-time access to critical business performance indicators to improve the speed and effectiveness of business operations has become crucial. The monitoring of business activities requires focused, yet incremental enterprise application integration (EAI) efforts and balancing information requirements in real-time with historical perspectives. The decision-making process in traditional data warehouse environments is often delayed because data cannot be propagated from the source system to the data warehouse in a timely manner. In this paper, we present an architecture for a container-based ETL (extraction, transformation, loading) environment, which supports continual near real-time data integration with the aim of decreasing the time it takes to make business decisions and of attaining minimized latency between the cause and effect of a business decision. Instead of using vendor-proprietary ETL solutions, we use an ETL container for managing ETLets (pronounced "et-lets") for the ETL processing tasks. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near real-time ETL environment. We have fully implemented the proposed architecture. Furthermore, we compare the ETL container to alternative continuous data integration approaches.

Keywords: Real-time data integration, middleware architectures, data warehousing

Introduction

The widespread use of the Internet and related technologies in various business domains has accelerated the intensity of competition, increased the volume of data and information available, and shortened decision-making cycles considerably. Consequently, strategic managers are being exposed daily to huge inflows of data and information from the businesses they manage, and they are under pressure to make sound decisions promptly. Typically, in a large organization, many distributed, heterogeneous data sources, applications, and processes have to be integrated to ensure delivery of the best information to the decision makers. In order to support effective analysis and mining of such diverse, distributed information, a data warehouse (DWH) collects data from multiple, heterogeneous (operational) source systems and stores integrated information in a central repository.

Since market conditions can change rapidly, up-to-date information should be made available to decision makers with as little delay as possible. For a long time it was assumed that data in the DWH could lag at least a day if not a week or a month behind the actual operational data. That assumption was based on the premise that business decisions did not require up-to-date information but very rich historical data. Existing ETL (extraction, transformation, loading) tools often rely on this assumption and achieve high efficiency in loading large amounts of data periodically into the DWH system. Traditionally, there is no real-time connection between a DWH and its data sources, because the write-once read-many decision support characteristics conflict with the continuous update workload of operational systems and result in poor response time. Therefore, batch data loading is done during frequent update windows (e.g., every night). With this approach the analysis capabilities of DWHs are not affected. ETL approaches often take for granted that they are operating during a batch window and that they do not affect or disrupt active user sessions.
While this still holds true for a wide range of DWH applications, the new desire for monitoring information about business processes in near real-time is breaking the long-standing rule that data in a DWH is static except during the downtime for data loading. Continuous data integration aims to decrease the time it takes to integrate certain (time-critical) data and to make the information available to knowledge workers or software agents. This enables them to find out what is currently happening, decide on what should be done (utilizing the rich history of the DWH), and react faster to typical and abnormal data conditions (tactical decision support). Continuous and near real-time data integration for DWHs minimizes propagation delays from the operational source systems of an organization, which are responsible for capturing real-world events. This improves timeliness by minimizing the average latency from when a fact is first captured in an electronic format somewhere within an organization until it is available to the knowledge workers who need it.

Over the past couple of years, the Java 2 Platform, Enterprise Edition (J2EE) has established itself as a major technology for developing e-business solutions. This success is largely due to the fact that J2EE is not a proprietary product, but rather an industry standard, developed as the result of an industry initiative by large IT companies. Many ETL solution vendors make their platforms J2EE compliant by extending their products with Java interfaces and J2EE connectors (Sun Microsystems 2002). Although these solutions can integrate J2EE components into the ETL processing, they cannot take full advantage of a J2EE application server with its middleware services (e.g., resource pooling, caching, clustering, failover, load-balancing, etc.). In this paper, we extend an existing J2EE application server with an ETL container that enables a seamless integration with other J2EE containers and the utilization of all available middleware services of the application server. Similar to J2EE Web applications, where servlets and JSPs take the place of traditional CGI scripts, our approach uses managed ETL components (so-called ETLets) that replace traditional ETL scripts. By extending a J2EE application server with an ETL container, organizations are able to develop platform- and vendor-independent ETL applications the same way as traditional J2EE applications. Developers are able to quickly and easily build scalable, reliable ETL applications and can utilize Java middleware services and reusable Java components provided by the industry.

The paper is organized as follows. In the next section, we discuss related work. The third section provides a discussion of continuous data integration systems. In the fourth section, we describe the architecture and implementation of the ETL container. The fifth section presents and evaluates results of using the ETL container approach. Finally, we conclude and discuss future work.

Related Work

As organizations interact on a real-time basis and business processes cut across multiple departments and business lines, the need for information integration leads to the adoption of enterprise application integration (EAI) approaches in which message-oriented middleware forms the backbone of the enterprise integration (Arsanjani 2002). EAI also deals with the guaranteed delivery of information: data has to get to its destination without loss, as fast as possible. However, EAI tools are not primarily designed for taking care of data integration, aggregation, and consolidation, especially at an enterprise level. EAI is important, but it has yet to master the cross-stream integration layers that ETL already offers. EAI technology focuses on moving small bursts of information between systems using a bus architecture, and not on bulk data movement.

The presence of continuously changing data is a fundamental change to the stable data snapshot paradigm of traditional DWHs. Data modeling, report execution, and information delivery may need to cope with this new paradigm. There are two different temporal characterizations of the information appearing in a DWH: one is the classic description of the time when a given fact occurred; the other represents the instant when the information is actually intelligible to the system (Bruckner and Tjoa 2002). This distinction, implicit and usually not critical in on-line transaction processing applications, is of particular importance for (near) real-time DWHs. There it can be useful (or even vital) to determine and analyze what the situation was in the past, with only the information available at a given point in time. This goes beyond solving the problem of slowly changing dimensions (Kimball 1996). Existing models lack built-in mechanisms for handling change and time (as revealed by a survey of multidimensional data models in Pedersen et al. 2001).

Data may take the form of continuous data streams (Babcock et al. 2002) rather than finite stored datasets. Fields of application include manufacturing processes, click-streams in Web personalization, and call detail records in telecommunication (Chen et al. 2000). By nature, a stored dataset is appropriate when significant portions of the data are queried again and again, and updates are small and/or relatively infrequent. In contrast, a data stream is appropriate when the data is constantly changing (often exclusively through insertions of new elements), and it is either unnecessary or impractical to operate on large portions of the data multiple times.
Vassiliadis et al. (2001) propose an approach that uses workflow technology for the management of DWH refreshment processes. The authors propose a meta model for DWH processes that is capable of modeling complex activities, their interrelationships, and the relationship of activities with data sources from a logical, physical, and conceptual perspective. Only the physical perspective of the meta model covers the execution details of the DWH processes. The logical and conceptual perspectives allow the modeling of complex DWH processes as workflows.

Most of the previously discussed approaches focus on data refreshment processes with batch-oriented data integration. Although the ETL container proposed in this paper also supports batch-oriented ETL processing, its major addition is the definition and efficient execution of processing flows for individual event types, which is crucial for continuous data integration and difficult to achieve with a workflow management system. Moreover, the ETL container provides efficient evaluation capabilities for freshly calculated business metrics that can also be used to control the ETL process.

Figure 1. Process Information Factory with ETL Container

We use the ETL container as a solution for integrating workflow events. Figure 1 shows the ETL container as part of a process information factory for integrating event data continuously from a workflow management system (List et al. 2000). Arrows indicate a flow of data or control among these components. The highlighted components show the extensions to a conventional (passive) DWH environment. The ETL container ultimately transforms on-the-fly events from workflow management systems (WFMSs) into workflow metrics that are stored in the DWH environment.

ETL Subsystem

A DWH can only be considered (near) real-time when essential parts of the data are updated or loaded on an intraday basis, without interrupting user access to the system. However, most ETL tools, whether based on off-the-shelf or custom-coded products, operate in a batch mode. They take for granted that they have free rein to drop and reload tables and conduct other major database operations without disturbing simultaneous end-user queries. During the update process of a traditional DWH, analytical queries are not processed. This period of time is often called the update window. Due to the constantly increasing size of DWHs and the rapid rates of change, there is increasing pressure to reduce the time taken for updating the DWH. There are three basic approaches to cope with this situation:
Minimize the update window. Bulk loading tools achieve high performance for data integration and therefore shrink update windows. These tools are able to generate data blocks, which are directly written to disk (i.e., they are added to the data files of the DWH database). However, this approach does not use database transactions, for efficiency reasons. Therefore, bulk loading should not overlap with normal DWH workload processing, in order to avoid interference and unexpected results. There has been substantial work done to identify optimal strategies for updating single materialized views of DWHs (Labio et al. 1999). These optimizations reduce the update window significantly by minimizing the whole update work (e.g., by avoiding recomputation of materialized views from scratch). Other approaches attempt to compute the exact delta for incremental maintenance (Ram and Do 2000).

Data/table swapping. In order to isolate ETL tools from simultaneous end-user queries, data that changes throughout the day can be loaded into a parallel set of tables, either through a batch process that runs every few minutes or through a continuous data integration process. Once the load interval (e.g., five minutes) is up, the freshly loaded tables are simply swapped into production and the tables with the now stale data are released from production. This can be accomplished through the dynamic updating of views or by simple table renaming. The drawback of this type of n-minute cycle-loading process is that the data in the warehouse is not truly real-time. For applications where true real-time data is required, the best approach is to continuously trickle-feed the changing data from the source directly into the DWH.

Continuous data integration. Near real-time data integration manages continuous data flows from transactional systems and performs the incorporation into a DWH. The data integration process is performed in parallel with complex analytical processing in the DWH and therefore differs from traditional batch-load data integration. It has to use regular database transactions (i.e., generating inserts and updates on-the-fly), because, in general, database systems do not support block operations on tables while user queries simultaneously access these tables. In addition, the data rate of continuously integrated data is usually low, with an emphasis on minimized latency for the integration. Bulk loading processes, however, buffer datasets in order to generate larger blocks, and therefore the average latency for data integration with bulk loading increases. Data integration based on frequent bulk loads can therefore exceed the data latency requirements. In the process information factory example, we need a continuously running service, which provides always-listening sessions.

Obviously, continuous data integration cannot be as fast as bulk load integration. It is necessary to carefully identify critical data, which needs to be integrated constantly with minimized latency. In general, not all but only a relatively small amount of data represents transactions or other relevant information that must be captured continuously and live from transactional systems to be integrated in near real-time with the historical information of the DWH. We need to accept real-time feeds of transactional data, which can be a burden for the systems involved if they work in a synchronous mode using a direct connection. Using an asynchronous architecture, operational systems are able to publish (push) their transaction data to message queues without being slowed down or even disturbed if a data integration process fails. Messaging middleware provides a buffered data transport and reliable delivery. The data integration tier is able to decide when to pull new data from operational sources.

Message queues provide several characteristics that are well suited for continuous loading. First, like the water in a faucet, the contents of a queue are able to continuously flow, or be held back, ready to flow at any moment. Second, queues can receive data from many distributed platforms and systems. Third, queues are reliable; they can guarantee the data delivery. Since every event is significant, the loss of a single event can lead to a loss of synchronization in state between a source system and the DWH. Messages containing event (rather than state) information must be queued and removed after reading to ensure that every event is processed exactly once. Most message queuing systems also offer transactional support and persistence for transmitted messages.

ETL Container Architecture and Implementation

In this section we introduce a J2EE (Java 2 Platform, Enterprise Edition) architecture for a container-based ETL environment, which enables a continuous integration of data from various source systems in near real-time. J2EE environments have containers, which are standardized runtime environments that provide specific services to the components. Components can expect these services to be available on any J2EE platform from any vendor. For example, EJB (Enterprise JavaBeans) containers provide automated support for transaction and life cycle management of EJB components, as well as bean lookup and other services.
Containers also provide standardized access to enterprise information systems (e.g., access to relational data through the JDBC (Java Database Connectivity) API). In our approach, we extend an existing J2EE environment with an ETL container that provides services for the extraction, transformation, and loading of the data into a DWH. The ETL container is a robust, scalable, and high-performance data staging environment, which is able to handle a large number of data extracts or messages from various source systems in near real-time. It takes responsibility for system-level services (such as threading, resource management, transactions, security, persistence, and so on) that are important for the ETL processing. The ETL container is responsible for the monitoring of the data extracts and transformations, and ensures that resources, workload, and time constraints are optimized. ETL developers are able to specify data propagation parameters (e.g., schedule and time constraints) in a deployment descriptor, and the container will try to optimize these settings. This arrangement leaves the ETL developer with the simplified task of developing functionality for the ETL processing tasks. It also allows the implementation details of the system services to be reconfigured without changing the component code, making components useful in a wide range of contexts. Instead of developing ETL scripts, which are often hard to maintain, scale, and reuse, ETL developers are able to implement reusable components for ETL processing. We further extended this concept by adding new container services, which are useful for the development and execution of ETL applications. Examples of such container services are a flow management service, which allows a straight-through processing of complex ETL processes, or an evaluation service, which significantly reduces the effort for evaluating calculated business metrics.

Figure 2 shows the architecture of a J2EE ETL environment with an ETL container, an EJB container, and various resource adapters. An ETL container is part of a Java application server and manages the lifecycle of ETL components, and also provides services for the execution and monitoring of ETL tasks. There are three component types that are managed by the ETL container: (1) event adapters, (2) ETLets, and (3) evaluators.

Figure 2. J2EE ETL Environment

Event adapters are used to extract or receive data from the source system, and they unify the extracted data in a standard XML format. ETLets use the extracted XML data as input and perform the ETL processing tasks. ETLets also publish business metrics that can be evaluated by evaluator components. Each of these components must implement a certain interface that is used by the ETL container in order to manage the components. The ETL container automatically instantiates these components and calls the interface methods during the components' lifetime.

EJB containers enhance the scalability of ETL applications and allow the distribution of the ETL processing on multiple machines. EJB containers manage the efficient access to instances of the EJB components regardless of whether the components are used locally or remotely. ETL developers can write EJB components for typical ETL processing tasks (data cleansing, data parsing, complex transformations, assembly of data, etc.) and can reuse these components in multiple ETL applications.
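As an illustration of such a reusable EJB component, the following is a minimal sketch of a data cleansing component written as an EJB 2.x stateless session bean with local interfaces. The type and method names (DataCleansing, normalizeArticleId) are illustrative assumptions, not part of our implementation.

```java
// Hypothetical reusable cleansing component as an EJB 2.x stateless session bean.
// Each type would normally live in its own source file and be declared in ejb-jar.xml.
import javax.ejb.CreateException;
import javax.ejb.EJBLocalHome;
import javax.ejb.EJBLocalObject;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

// Local component interface called by ETLets running in the same application server.
interface DataCleansing extends EJBLocalObject {
    String normalizeArticleId(String rawId);
}

// Local home interface used to obtain bean instances.
interface DataCleansingHome extends EJBLocalHome {
    DataCleansing create() throws CreateException;
}

// Bean class managed by the EJB container.
public class DataCleansingBean implements SessionBean {
    public String normalizeArticleId(String rawId) {
        // Trivial cleansing rule for illustration: trim whitespace and normalize case.
        return rawId == null ? null : rawId.trim().toUpperCase();
    }

    // Lifecycle callbacks required by the SessionBean contract.
    public void ejbCreate() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext ctx) {}
}
```

An ETLet could look up the home interface through JNDI and delegate record-level cleansing to the bean, leaving instance pooling and distribution to the EJB container.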
J2EE environments have a multitiered architecture, which provides natural access points for the integration with existing and future source systems (Sun Microsystems 2001). The integration tier (in J2EE environments also called the enterprise information tier) is crucial for an ETL environment because it contains data and services, often implemented by non-J2EE resources, that can be utilized for the data extraction from source systems. Workflow management systems, databases, legacy systems, ERP (enterprise resource planning), EAI (enterprise application integration) systems, and other existing or purchased packages reside in the integration tier. A J2EE ETL environment provides a comprehensive set of resource adapters for these source systems. For instance, the J2EE platform includes a set of standard APIs for high-performance connectors. Many vendors of ERP or customer relationship management systems (e.g., SAP, Oracle) offer a J2EE connector interface for their systems. With our architecture, ETL developers can reuse existing high-performance J2EE connectors and connectors of EAI solutions for the data extraction without worrying about issues like physical data formats of source systems, performance, or concurrency. Moreover, the J2EE platform includes standard APIs for accessing databases (JDBC) and for messaging (JMS), which enable ETL developers to access queue-based source systems that propagate data via messages.

Extracting Data with Event Adapters

The purpose of event adapters is to extract or receive data from source systems and to unify the different data formats. Event adapters translate all raw source data into an XML event format with a defined XML schema. Event adapters can accept event data asynchronously via messaging software or synchronously via a resource adapter (e.g., JDBC or a J2EE connector). The first option is more scalable because it completely decouples the event source (e.g., a WFMS) from the event processing in the ETL container. Event adapters run on their own threads and can receive and dispatch events in parallel. Figure 3 illustrates event adapters receiving and dispatching events via an event dispatcher that assigns threads for the ETL processing with ETLets and evaluators. The components shown with round boxes are the ETL components that are managed by the container and have to be implemented by the ETL developers. The components shown with square boxes are internal container components that are used to conjoin all ETL components. Please note that the ETL developers never see or have to deal with these internal components. We show these internal components for illustration purposes only.

Figure 3. Multi-Threading in the ETL Container (adapted from Schiefer, Jeng, and Bruckner 2003)
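To make the adapter concept concrete, the following is a minimal sketch of a JMS-based event adapter. The EventAdapter interface with start()/stop() follows the lifecycle shown in Figure 4 below; the EventDispatcher type, the JNDI names, and the XML translation are illustrative assumptions rather than the container's actual API.

```java
// Minimal sketch of a JMS-based event adapter; names other than the JMS API are hypothetical.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

interface EventAdapter {
    void start() throws Exception;   // init -> running
    void stop() throws Exception;    // running -> stopped
}

// Hypothetical container-internal component that assigns worker threads to events.
interface EventDispatcher {
    void dispatch(String xmlEvent);
}

public class JmsEventAdapter implements EventAdapter, MessageListener {
    private final EventDispatcher dispatcher;
    private Connection connection;

    public JmsEventAdapter(EventDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    public void start() throws Exception {
        // Look up the queue the operational source publishes to and listen asynchronously.
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/EtlConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/WorkflowEvents");
        connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        session.createConsumer(queue).setMessageListener(this);
        connection.start();
    }

    public void onMessage(Message message) {
        try {
            // Unify the raw payload into the standard XML event format and hand it to the
            // dispatcher, which assigns a thread for the ETLet processing.
            String raw = ((TextMessage) message).getText();
            dispatcher.dispatch(toXmlEvent(raw));
        } catch (Exception e) {
            throw new RuntimeException("Event extraction failed", e);
        }
    }

    public void stop() throws Exception {
        connection.close();
    }

    private String toXmlEvent(String raw) {
        // Placeholder translation; a real adapter maps source fields to the defined XML schema.
        return "<event><payload>" + raw + "</payload></event>";
    }
}
```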
Figure 4. Lifecycle, Interface/Classes (Event Adapters)

In order to address overload situations, where not enough resources are available to instantiate ETLets for the event processing, the ETL container can block an event adapter temporarily. For instance, if there is no thread available for the processing of an incoming event within a specified timeout period, the event adapter is notified of the overload situation and can react to this situation individually. Figure 4 shows the lifecycle and interface of event adapters. All ETL components include an init, running, and destroyed state. Only the event adapters have an additional stopped state, which enables the ETL container to stop the event processing.

ETL Processing with ETLets

After the dispatching of the standardized XML events in the event adapter, the ETL container invokes ETLets which have subscribed to the event. In order to achieve a high performance level, the ETL container uses a thread pool whose size is adjustable. ETLets run in multiple threads in parallel. However, all processing steps of an ETL processing flow usually run within the same thread. For one event type (e.g., ACTIVITY_STARTED events) there can be several ETLets that have subscribed to the same event type. In this case, the ETL container will invoke the subscribed ETLets in parallel. The ETLet components must implement one of the ETLet interfaces shown in Figure 5.

Figure 5. Lifecycle, Interface/Classes (ETLets)
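As an illustration, the following minimal sketch shows an event-driven ETLet in the style of the CycleTimeETLet from Figure 5. The exact runetl() signature is not prescribed here, so the event-type and XML parameters, the MetricPublisher callback, and the string-based XML handling are simplifying assumptions.

```java
// Minimal sketch of an event-driven ETLet; interface and helper names are assumptions.
import java.util.HashMap;
import java.util.Map;

interface EventDrivenETLet {
    void runETL(String eventType, String xmlEvent) throws Exception;
}

// Hypothetical container service through which ETLets publish calculated metrics.
interface MetricPublisher {
    void publish(String metricType, double value);
}

public class CycleTimeETLet implements EventDrivenETLet {
    private final MetricPublisher metrics;
    // Remembers when each activity instance started, keyed by activity id.
    private final Map<String, Long> startTimes = new HashMap<String, Long>();

    public CycleTimeETLet(MetricPublisher metrics) {
        this.metrics = metrics;
    }

    // Subscribed (via the deployment descriptor) to ACTIVITY_STARTED,
    // ACTIVITY_COMPLETED, and ACTIVITY_CANCELED events.
    public void runETL(String eventType, String xmlEvent) {
        String activityId = extract(xmlEvent, "activityId");
        long timestamp = Long.parseLong(extract(xmlEvent, "timestamp"));
        if ("ACTIVITY_STARTED".equals(eventType)) {
            startTimes.put(activityId, timestamp);
        } else if ("ACTIVITY_COMPLETED".equals(eventType) || "ACTIVITY_CANCELED".equals(eventType)) {
            Long started = startTimes.remove(activityId);
            if (started != null) {
                // Publish the cycle time so subscribed evaluators can react to it.
                metrics.publish("ACTIVITY_CYCLE_TIME", timestamp - started);
            }
        }
    }

    // Naive element extraction; a real ETLet would use a proper XML/XPath API.
    private String extract(String xml, String tag) {
        int from = xml.indexOf("<" + tag + ">") + tag.length() + 2;
        int to = xml.indexOf("</" + tag + ">");
        return xml.substring(from, to);
    }
}
```

Scheduled and exception ETLets would implement analogous interfaces whose runetl() methods are driven by the container's scheduler or by unhandled exceptions, as described next.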
The ETL container manages three types of ETLets: event-driven ETLets, scheduled ETLets, and exception ETLets. All ETLet types have a runetl() method with a different signature. ETL developers have to implement this method with the ETL processing logic. This processing logic can include any type of data transformation, the calculation of business metrics, and storing the metrics in a database table. ETLets can also publish the business metrics to allow the container to pass these metrics to the evaluator components that have subscribed to the metric type.

Event-driven ETLets can subscribe to a number of XML events that are dispatched by the event adapters and that are relevant to the processing logic in the runetl() method. For the subscription, the ETLet has to define either the event-id or a matching XPath expression for the XML events of interest. For instance, the CycleTimeETLet subscribes to the ACTIVITY_STARTED, ACTIVITY_COMPLETED, and ACTIVITY_CANCELED events in order to calculate the cycle times of business activities.

Scheduled ETLets are triggered by the ETL container at intervals or at specific points in time. The schedule for the triggering is also configurable in the deployment descriptor of the ETLet. Scheduled ETLets can be used to perform recurring ETL tasks, for instance aggregating the daily order data after a business day.

Exception ETLets are a special kind of ETLet that are invoked when an exception is thrown within an ETLet of the ETL application and this exception cannot immediately be handled by the ETLet itself. For instance, this happens if a manual step is required in order to resolve the problem of the exception. Exception ETLets are used to store these exceptions and the triggering events that caused the exception in a file or database for a later manual correction of the problem. They can also be used for sending out notifications. For instance, an administrator can be notified via e-mail that an unhandled exception occurred in an ETL container. Exception ETLets must also be defined in the deployment descriptor.

Evaluation of Business Metrics with Evaluators

The metrics calculated and published by ETLets can be evaluated by evaluator components. An evaluation of freshly calculated business metrics can be very valuable because it can be used to create an intelligent response (e.g., sending out notifications to business people or triggering business operations) in near real-time. Evaluators have the same lifecycle as ETLets (see Figure 6). They can either be implemented by ETL developers or act as proxies by forwarding the evaluation requests to rule engines for more sophisticated evaluations. In the first case, ETL developers have to implement the evaluate() method of the evaluator interface with the evaluation logic.

Figure 6. Lifecycle, Interface/Classes (Evaluators)
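The following is a minimal sketch of such an evaluator, modeled loosely on the CycleTimeEvaluator in Figure 6. The evaluate() parameters, the integer result codes, and the threshold handling are illustrative assumptions; in practice, evaluation parameters such as thresholds would come from the deployment descriptor described below.

```java
// Minimal sketch of an evaluator component; parameter and constant names are assumptions.
interface Evaluator {
    int evaluate(String metricType, double value);
}

public class CycleTimeEvaluator implements Evaluator {
    static final int OK = 0;
    static final int THRESHOLD_EXCEEDED = 1;

    // Would normally be configured through the evaluation parameters of the ETL
    // application deployment descriptor rather than hard-coded.
    private final double maxCycleTimeMillis;

    public CycleTimeEvaluator(double maxCycleTimeMillis) {
        this.maxCycleTimeMillis = maxCycleTimeMillis;
    }

    // Invoked by the container whenever a subscribed ACTIVITY_CYCLE_TIME metric is published.
    public int evaluate(String metricType, double value) {
        if (value > maxCycleTimeMillis) {
            // An intelligent response in near real-time, e.g. notifying business people
            // or triggering a business operation, would be initiated here.
            return THRESHOLD_EXCEEDED;
        }
        return OK;
    }
}
```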
An evaluator can subscribe to a set of metric types that are defined in the deployment descriptor. For the subscription to metric types, there are two mechanisms available: (1) evaluators can subscribe to a set of metric types independently of which ETLets generated the metrics, and (2) they can also subscribe only to the metric types of a defined set of ETLets. The ETL container makes sure that an evaluator receives the correct metrics from the ETLets. Note that an evaluator is not aware of how a metric was calculated or which events triggered the metric calculation. An evaluator is automatically invoked by the container when a new metric of the subscribed metric type becomes available.

Deployment of an ETL Application

The J2EE platform allows ETL developers to create different parts of their ETL applications as reusable components. The process of assembling components into modules and modules into ETL solutions is called packaging. The process of installing and customizing the ETL components in an operational environment is called deployment. The J2EE platform provides facilities to make the packaging and deployment process simple. It uses JAR (Java archive) files as the standard package for modules and applications, and XML-based deployment descriptors for customizing components and applications. Although deployment can be performed directly by editing XML text files, specialized tools best handle the process. For the packaging and deployment of an ETL solution, two types of deployment descriptors are used for the modules (see Figure 7):

EJB deployment descriptors (ejb-jar.xml). An EJB deployment descriptor provides both the structural and application assembly information for the EJBs that are used in the ETL application. The EJB deployment descriptor is part of the J2EE platform specification (Sun Microsystems 2002).

ETL application deployment descriptor (etl-app.xml). The deployment descriptor for ETL applications lists all ETL components (event adapters, ETLets, evaluators) and specifies the configuration parameters for these components. This includes general configuration information about the ETL application and ETL components (e.g., name, description), the implementation classes for the ETL components, data propagation parameters for the event adapters (e.g., connection parameters), configuration and ETL processing parameters for ETLets (e.g., triggers, published metrics), and evaluation parameters for evaluators (e.g., evaluation thresholds). During the deployment, the deployment team has to adapt the deployment descriptor settings to an existing DWH environment.

Figure 7. ETL Solution Deployment

Experiences and Comparison

This section gives an overview of our experiments with the ETL container. Furthermore, we built another continuous data integration system using a fundamentally different design approach and compare it to the ETL container approach.

Event Detection for Operational Source Systems

There are several methods to detect changes in operational systems (Todman 2001), such as database log file analysis, snapshot comparison, materialized view maintenance, database triggers, and audit trails. In the following, we discuss two methods used in our experiments.
Modifying operational systems. If the underlying database system supporting an operational application is relational, then it is possible to capture the changes by using database triggers. Figure 8 shows the trigger definition for an inventory table. In this scenario, our inventory system (running on a Microsoft SQL Server) consists of several tables and a central table that stores the inventory levels for all articles. Every change (e.g., an item is sold) is captured by the trigger, which takes the relevant data and calls upon the stored procedure send_msmq_msg. This procedure sends the message to the specified message queuing server using DTS (data transformation services) objects. A similar solution could be an operational system based on an Oracle database using Oracle Advanced Queuing or other messaging middleware.

Analysis of audit trail. Some operational applications maintain audit trails to enable changes to be traced. For example, workflow management systems (WFMSs) provide an audit trail about state changes of workflows, but include only a limited level of detail data, which can be a constraint in practice. In our second scenario, we use an integrated utility called QTool in order to read audit trail files and write one message per audit trail dataset into a message queue. We will briefly discuss QTool next.

CREATE TRIGGER ETL_Inventory ON [dbo].[inventory]
FOR INSERT, UPDATE
AS
-- Local variable names are representative.
DECLARE @ArticleID Varchar(36)
DECLARE @Level     Varchar(21)
DECLARE @Msg       Varchar(100)
DECLARE mycursor CURSOR READ_ONLY FOR
    SELECT CONVERT(CHAR(36), ArticleID) AS ArticleID,
           CONVERT(CHAR(21), Level) AS Level
    FROM inserted
OPEN mycursor
FETCH NEXT FROM mycursor INTO @ArticleID, @Level
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @Msg = @ArticleID + ';' + @Level + ';' + CONVERT(CHAR(23), getdate(), 121) + CHAR(13) + CHAR(10)
    EXEC send_msmq_msg @Msg
    FETCH NEXT FROM mycursor INTO @ArticleID, @Level
END
CLOSE mycursor
DEALLOCATE mycursor

Figure 8. Trigger for Operational System (Transact-SQL Syntax of SQL Server)

Implementation of the Second Prototype

The ETL container is one solution to the problem of continuous data integration. It is easily adaptable due to its J2EE-based architecture. However, since off-the-shelf database systems provide specialized data load utilities, we built a second system based on the Teradata database. It makes use of a Teradata-specific data load utility (tpump) in order to achieve continuous data loading. Our integrated utility QTool (Bruckner and Braito 2003) provides the following functionality: (1) queue administration, management, and monitoring of Microsoft Message Queuing (MSMQ), (2) message queue feed functionality (reading data from files), and (3) a job scheduler that controls and coordinates parallel executions of the instances of the database load utility (i.e., tpump). The job scheduler supports the automatic hand-off of control between tpump instances and handles the end-of-job processing. The load utility tpump generates SQL statements on-the-fly and sends them to the database, which executes them as single transactions using row-level locking. QTool currently offers support for MSMQ and has been designed for, but is not limited to, using Teradata's tpump utility.

The QTool scheduler acts as a job scheduler and is responsible for starting and stopping tpump instances and for managing the post-job processing (e.g., processing error tables). The loading jobs are configured and alternately called to enable continuous data integration from a message queue into a DWH. If one tpump instance ends, or even if it fails due to an error, a second instance immediately gets access to the message queue and continues the data load without interruption. The scheduler makes sure that there is always one instance running and a second one waiting.
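For illustration, the following generic JDBC sketch shows the row-at-a-time loading style that both prototypes rely on: each queued message becomes a single-row INSERT committed as its own transaction. It is not the actual QTool/tpump or ETLet code; the table, columns, and queue representation are assumptions.

```java
// Generic trickle-feed loader sketch (illustrative, not the implemented utilities).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;

public class TrickleFeedLoader implements Runnable {
    private final BlockingQueue<String[]> queue;  // stand-in for a message queue consumer
    private final String jdbcUrl;

    public TrickleFeedLoader(BlockingQueue<String[]> queue, String jdbcUrl) {
        this.queue = queue;
        this.jdbcUrl = jdbcUrl;
    }

    public void run() {
        try {
            Connection con = DriverManager.getConnection(jdbcUrl);
            con.setAutoCommit(false);
            PreparedStatement insert = con.prepareStatement(
                "INSERT INTO inventory_fact (article_id, inventory_level, load_ts) VALUES (?, ?, ?)");
            while (true) {
                // Block until the next message arrives; the queue buffers bursts from the sources.
                String[] msg = queue.take();  // fields already split, e.g. articleId;level;timestamp
                insert.setString(1, msg[0]);
                insert.setInt(2, Integer.parseInt(msg[1]));
                insert.setString(3, msg[2]);
                insert.executeUpdate();
                // Commit per message so fresh rows are visible to analytical queries immediately.
                con.commit();
            }
        } catch (Exception e) {
            throw new RuntimeException("Continuous load failed", e);
        }
    }
}
```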
Settings of Experiments

We use two different types of source systems, as discussed above. The first source system is an inventory system running on a Pentium IV, 2 GHz, 512 MB RAM, WinXP, and SQL Server. Every change to the central inventory table fires a database trigger, which sends a message to the messaging server containing the changed inventory level. The second source system is a WFMS running on the same machine. It generates audit trail data, which are read by QTool and fed into the message queue. The messaging server is a Pentium II, 400 MHz, 256 MB RAM, Win2K server, running MSMQ Version 2.0. We have also done experiments with MSMQ 3.0 on a WinXP machine. The load server is an Athlon XP 2100+, 512 MB RAM, WinXP, alternately running QTool or the ETL container (with IBM WebSphere Application Server, Version 5). We are using typical DWH database systems in our ETL container experiments: DB2 Version 8.1, SQL Server 2000, and Teradata V2R4.1. Since QTool uses the tpump data load utility, we had to use a Teradata database for the second prototype, running on a dual-CPU SMP system (2 x 1 GHz, 2 GB RAM, RAID). The systems are LAN-connected via Ethernet (100 megabit/sec). If continuous data loading concentrates on data freshness for certain data classes with very low data rates rather than on throughput, an ADSL connection (upstream 64 kilobit/sec) between the load server and the DWH database is also sufficient. This might be applicable for highly distributed environments.

Experiences and Comparison

It is a fact that continuous data integration (generating SQL statements on-the-fly) cannot be as fast as bulk loads. Since both implemented systems still offer potential for (technical) performance optimizations, we do not provide detailed evaluations. However, the comparison of the presented ETL container and the second approach shows some interesting results. They can be summarized as follows: If you require configurability, flexibility, and complex event handling during continuous data integration, the ETL container is the best option. If you do not require platform or database independence, but need better performance for continuous data loads, the second approach (utilizing special-purpose data load utilities from database vendors) is best.

In our experiments, we were able to process up to 70 events per second. Every message contains about 100 bytes, which results in up to 500 MB of continuously integrated data per day with relatively low resource utilization at the underlying database system. Complex data transformations (based on ETLets) and metric evaluations will decrease the performance. Better hardware with multiple processors and a better network connection will increase the performance. Since the design of the ETL container is component-oriented and the system makes use of the parallelism provided by an application server, it is very scalable. Both prototype systems provide robust continuous data loads; the ETL container provides sophisticated exception handling for ETLets, while the QTool scheduler controls the execution of loading jobs and makes sure that tpump instances are running alternately (an error condition initiates an immediate take-over by the other instance).

The ETL container approach has several advantages over traditional ETL solutions:

- The J2EE platform provides interoperability with many source systems. Existing Java resource adapters (e.g., J2EE connectors, JDBC, JMS) and many Java-compliant EAI connectors can be utilized for the data extraction.
- It supports complex transformation and data processing tasks. All tasks are implemented in Java. Therefore, we can take full advantage of the J2EE platform and container services.
- The ETL container supports straight-through processing for ETL tasks. ETLets are executed in parallel and controlled by the flow management service of the container. ETLets can also use EJB components to distribute the ETL processing.
- The usage of Java threads for the ETL processing and lightweight flow management allows the processing of a large number of data extracts or messages concurrently in near real-time.
- The ETL container provides a clean separation of the extraction logic (event adapters), transformation logic (ETLets), and evaluation logic (evaluators), which makes the components pluggable and an ETL application more extendible.
- The evaluation capabilities of the ETL container can be utilized for automatic responses to source systems or for notifications.
- ETL vendors are able to develop prepackaged J2EE ETL solutions that can be extended and customized for organizations. The ETL applications are platform independent and can be deployed in various DWH environments.

However, the ETL container has several drawbacks compared to the database-oriented approach (based on a monitoring tool and scheduler, like QTool):

- The flexibility and the capabilities of the ETL container reduce performance. If we integrate data with basic data transformations (converting strings to numbers and checking data conditions), the QTool/tpump environment processed 69 events per second, the ETL container only 41. Another experiment (based on the Process Information Factory) does complex transformations of workflow events from a supply chain process. Incoming events are processed by selecting several datasets from a database, calculating complex business metrics with Web services, and inserting the data into the DWH. This experiment is possible with the ETL container (which processes five events per second on average), but was not feasible with tpump.
- Off-the-shelf data load utilities often provide a limited but easy-to-learn scripting language for programming the data transformation tasks. The development of ETL applications for the ETL container requires Java and JDBC/SQL development skills, because data transformation tasks of ETLets have to be written in Java.
- The advantages of massively parallel database systems (e.g., Teradata) can be better utilized by database-specific data load utilities than by external programs, which access the database through standard interfaces (e.g., JDBC).

Summary and Future Work

A traditional DWH focuses on strategic decision support. A well-developed strategy is vital, but its ultimate value to an organization is only as good as its execution. As a result, deployment of DWH solutions for operational and tactical decision-making is becoming an increasingly important use of information assets. In this paper we have briefly discussed traditional ETL (extract-transform-load) tools, which often provide only partial solutions to the challenge of continuous data integration with low latency and high data freshness. A DWH can only be considered (near) real-time when all or essential parts of the data are updated or loaded on an intraday basis, without interrupting user access to the system. However, most ETL tools, whether based on off-the-shelf or custom-coded products, operate in a batch mode; they often take for granted that they are operating during a batch window and that they do not affect or disrupt active user sessions.

We have described a continuous, near real-time data integration approach. The ETL container takes responsibility for system-level services (threading, transactions, resource management, etc.). This arrangement leaves the component developer with the simplified task of developing the ETL functionality. It also allows for reconfigurability of implementation details of system services without changing the component code, thus making components useful in a wide range of applications. Instead of developing traditional ETL scripts, which are hard to maintain, scale, and reuse, ETL developers are able to implement Java components for the ETL process. Therefore, we can make full use of the J2EE platform and are not limited to the typical scripting languages of traditional ETL solutions. ETLets are similar to ETL scripts, but they run in a container of a Java application server. ETLets are executed in parallel and managed by an ETL container. With ETLets, a lightweight Java thread, rather than a heavyweight operating system process, handles each request. An ETL container controls and monitors the processing by optimizing the settings specified in the deployment descriptor. Furthermore, it provides container services for the evaluation of business metrics and for the management of ETL processing flows. The overall ETL container architecture has a clean separation of extraction logic (event adapters), transformation logic (ETLets), and evaluation logic (evaluators).

At least for the near future, ETL and EAI systems will not necessarily merge functionally. Best-of-breed ETL tools will not be able to compete with best-of-breed EAI systems simply because the technological gaps are still too wide. In the near future, EAI and relational database systems will increasingly try to integrate traditional ETL functionality, whereas ETL systems will retreat (Figure 9). Nonetheless, there will be a convergence of ETL and EAI functionality. Therefore, we plan to further improve the ETL container based on our experience with the alternative approach of using off-the-shelf database load utilities, as well as by investigating and analyzing typical EAI systems.
Figure 9. ETL Evolution

Several future projects are in progress in this research. We are building a distributed and clustered environment for ETL containers that allows them to work together in a server farm. We are also developing an evaluation framework that allows rule engines to be seamlessly plugged into the ETL container for more dynamic metric calculation and evaluation. Furthermore, we want to add new container services that are useful for ETL developers, such as services for caching, concurrency control, and security.

References

Arsanjani, A. "Developing and Integrating Enterprise Components and Services," Communications of the ACM (45:10), October 2002.
Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. "Models and Issues in Data Stream Systems," in Proceedings of the Twenty-First Symposium on Principles of Database Systems, Madison, Wisconsin, May 2002.
Bruckner, R. M., and Braito, R. QTool V2.5 Documentation, Technical Report, Institute of Software Technology, Vienna University of Technology, January 2003.
Bruckner, R. M., and Tjoa, A. M. "Capturing Delays and Valid Times in Data Warehouses - Towards Timely Consistent Analyses," Journal of Intelligent Information Systems (19:2), September 2002.
Chen, Q., Hsu, M., and Dayal, U. "A Data-Warehouse/OLAP Framework for Scalable Telecommunication Tandem Traffic Analysis," in Proceedings of the Sixteenth International Conference on Data Engineering, San Diego, CA, March 2000.
Kimball, R. The Data Warehouse Toolkit, John Wiley and Sons, New York, 1996.
Labio, W. J., Yerneni, R., and Garcia-Molina, H. "Shrinking the Warehouse Update Window," ACM SIGMOD Record (28:2), June 1999.
List, B., Schiefer, J., Quirchmayr, G., and Tjoa, A. M. "Multi-dimensional Business Process Analysis with the Process Warehouse," in Knowledge Discovery for Business Information Systems, W. Abramowicz and J. Zurada (eds.), Kluwer Academic Publishers, Boston, 2000.
Pedersen, T. B., Jensen, C. S., and Dyreson, C. E. "A Foundation for Capturing and Querying Complex Multidimensional Data," Information Systems (26:5), 2001.
Ram, P., and Do, L. "Extracting Delta for Incremental Data Warehouse Maintenance," in Proceedings of the Sixteenth International Conference on Data Engineering, San Diego, CA, March 2000.
Schiefer, J., Jeng, J. J., and Bruckner, R. M. "Real-Time Workflow Audit Data Integration into Data Warehouse Systems," in Proceedings of the Eleventh European Conference on Information Systems, C. Ciborra, R. Mercurio, M. DeMarco, M. Martinez, and A. Carignani (eds.), Naples, September 2003.
Sun Microsystems. J2EE Connector Specification 1.5, 2002.
Sun Microsystems. Designing Enterprise Applications with the Java 2 Platform, Enterprise Edition, Second Edition, 2001.
Todman, C. Designing a Data Warehouse, Prentice Hall, Upper Saddle River, NJ, 2001.
Vassiliadis, P., Quix, C., Vassiliou, Y., and Jarke, M. "Data Warehouse Process Management," Information Systems (25:3), 2001.
JOURNAL OF OBJECT TECHNOLOGY
JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,
IBM WebSphere MQ File Transfer Edition, Version 7.0
Managed file transfer for SOA IBM Edition, Version 7.0 Multipurpose transport for both messages and files Audi logging of transfers at source and destination for audit purposes Visibility of transfer status
Data Integration Checklist
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media
Cache Database: Introduction to a New Generation Database
Cache Database: Introduction to a New Generation Database Amrita Bhatnagar Department of Computer Science and Engineering, Birla Institute of Technology, A 7, Sector 1, Noida 201301 UP [email protected]
JSLEE and SIP-Servlets Interoperability with Mobicents Communication Platform
JSLEE and SIP-Servlets Interoperability with Mobicents Communication Platform Jean Deruelle Jboss R&D, a division of Red Hat [email protected] Abstract JSLEE is a more complex specification than SIP
A Survey Study on Monitoring Service for Grid
A Survey Study on Monitoring Service for Grid Erkang You [email protected] ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide
An Oracle White Paper March 2014. Best Practices for Real-Time Data Warehousing
An Oracle White Paper March 2014 Best Practices for Real-Time Data Warehousing Executive Overview Today s integration project teams face the daunting challenge that, while data volumes are exponentially
An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd
Enterprise Information Integration (EII) A Technical Ally of EAI and ETL Author Bipin Chandra Joshi Integration Architect Infosys Technologies Ltd Page 1 of 8 TU1UT TUENTERPRISE TU2UT TUREFERENCESUT TABLE
An Oracle White Paper October 2013. Oracle Data Integrator 12c New Features Overview
An Oracle White Paper October 2013 Oracle Data Integrator 12c Disclaimer This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should
RS MDM. Integration Guide. Riversand
RS MDM 2009 Integration Guide This document provides the details about RS MDMCenter integration module and provides details about the overall architecture and principles of integration with the system.
Pervasive Software + NetSuite = Seamless Cloud Business Processes
Pervasive Software + NetSuite = Seamless Cloud Business Processes Successful integration solution between cloudbased ERP and on-premise applications leveraging Pervasive integration software. Prepared
Near Real-time Data Warehousing with Multi-stage Trickle & Flip
Near Real-time Data Warehousing with Multi-stage Trickle & Flip Janis Zuters University of Latvia, 19 Raina blvd., LV-1586 Riga, Latvia [email protected] Abstract. A data warehouse typically is a collection
Oracle WebLogic Server 11g Administration
Oracle WebLogic Server 11g Administration This course is designed to provide instruction and hands-on practice in installing and configuring Oracle WebLogic Server 11g. These tasks include starting and
WebSphere Application Server - Introduction, Monitoring Tools, & Administration
WebSphere Application Server - Introduction, Monitoring Tools, & Administration presented by: Michael S. Pallos, MBA Senior Solution Architect IBM Certified Systems Expert: WebSphere MQ 5.2 e-business
WebSphere Server Administration Course
WebSphere Server Administration Course Chapter 1. Java EE and WebSphere Overview Goals of Enterprise Applications What is Java? What is Java EE? The Java EE Specifications Role of Application Server What
Client-Server Architecture & J2EE Platform Technologies Overview Ahmed K. Ezzat
Client-Server Architecture & J2EE Platform Technologies Overview Ahmed K. Ezzat Page 1 of 14 Roadmap Client-Server Architecture Introduction Two-tier Architecture Three-tier Architecture The MVC Architecture
ORACLE DATA INTEGRATOR ENTERPRISE EDITION
ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition 12c delivers high-performance data movement and transformation among enterprise platforms with its open and integrated
Chapter 3 - Data Replication and Materialized Integration
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 [email protected] Chapter 3 - Data Replication and Materialized Integration Motivation Replication:
SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here
PLATFORM Top Ten Questions for Choosing In-Memory Databases Start Here PLATFORM Top Ten Questions for Choosing In-Memory Databases. Are my applications accelerated without manual intervention and tuning?.
QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE
QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture
A standards-based approach to application integration
A standards-based approach to application integration An introduction to IBM s WebSphere ESB product Jim MacNair Senior Consulting IT Specialist [email protected] Copyright IBM Corporation 2005. All rights
Performance Management Platform
Open EMS Suite by Nokia Performance Management Platform Functional Overview Version 1.4 Nokia Siemens Networks 1 (16) Performance Management Platform The information in this document is subject to change
Integrating data in the Information System An Open Source approach
WHITE PAPER Integrating data in the Information System An Open Source approach Table of Contents Most IT Deployments Require Integration... 3 Scenario 1: Data Migration... 4 Scenario 2: e-business Application
CHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS
CHAPTER 1 - JAVA EE OVERVIEW FOR ADMINISTRATORS Java EE Components Java EE Vendor Specifications Containers Java EE Blueprint Services JDBC Data Sources Java Naming and Directory Interface Java Message
A roadmap to enterprise data integration.
Information integration solutions February 2006 A roadmap to enterprise data integration. Colin White BI Research Page 1 Contents 1 Data integration in the enterprise 1 Characteristics of data integration
Chapter 4. Architecture. Table of Contents. J2EE Technology Application Servers. Application Models
Table of Contents J2EE Technology Application Servers... 1 ArchitecturalOverview...2 Server Process Interactions... 4 JDBC Support and Connection Pooling... 4 CMPSupport...5 JMSSupport...6 CORBA ORB Support...
Implementing efficient system i data integration within your SOA. The Right Time for Real-Time
Implementing efficient system i data integration within your SOA The Right Time for Real-Time Do your operations run 24 hours a day? What happens in case of a disaster? Are you under pressure to protect
White Paper: 1) Architecture Objectives: The primary objective of this architecture is to meet the. 2) Architecture Explanation
White Paper: 1) Architecture Objectives: The primary objective of this architecture is to meet the following requirements (SLAs). Scalability and High Availability Modularity and Maintainability Extensibility
Three Simple Ways to Master the Administration and Management of an MDM Hub
Three Simple Ways to Master the Administration and Management of an MDM Hub Jitendra Malhotra Lead Engineer Global Customer Support Avneet Bajwa Senior Engineer Global Customer Support Informatica 1 Breakout
<Insert Picture Here> Oracle In-Memory Database Cache Overview
Oracle In-Memory Database Cache Overview Simon Law Product Manager The following is intended to outline our general product direction. It is intended for information purposes only,
How To Create A C++ Web Service
A Guide to Creating C++ Web Services WHITE PAPER Abstract This whitepaper provides an introduction to creating C++ Web services and focuses on:» Challenges involved in integrating C++ applications with
Data Integration for the Real Time Enterprise
Executive Brief Data Integration for the Real Time Enterprise Business Agility in a Constantly Changing World Overcoming the Challenges of Global Uncertainty Informatica gives Zyme the ability to maintain
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
How to Build an E-Commerce Application using J2EE. Carol McDonald Code Camp Engineer
How to Build an E-Commerce Application using J2EE Carol McDonald Code Camp Engineer Code Camp Agenda J2EE & Blueprints Application Architecture and J2EE Blueprints E-Commerce Application Design Enterprise
COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER
Page 1 of 8 ABOUT THIS COURSE This 5 day course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL Server
Integrating Ingres in the Information System: An Open Source Approach
Integrating Ingres in the Information System: WHITE PAPER Table of Contents Ingres, a Business Open Source Database that needs Integration... 3 Scenario 1: Data Migration... 4 Scenario 2: e-business Application
Implementing a Data Warehouse with Microsoft SQL Server
Page 1 of 7 Overview This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL 2014, implement ETL
Integrating SAP and non-sap data for comprehensive Business Intelligence
WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst
Software Life-Cycle Management
Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Deploying Rule Applications
White Paper Deploying Rule Applications with ILOG JRules Deploying Rule Applications with ILOG JRules White Paper ILOG, September 2006 Do not duplicate without permission. ILOG, CPLEX and their respective
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework
Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework Many corporations and Independent Software Vendors considering cloud computing adoption face a similar challenge: how should
Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus
Introduction to WebSphere Process Server and WebSphere Enterprise Service Bus Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 4.0.3 Unit objectives
Service Oriented Architectures
8 Service Oriented Architectures Gustavo Alonso Computer Science Department Swiss Federal Institute of Technology (ETHZ) [email protected] http://www.iks.inf.ethz.ch/ The context for SOA A bit of history
Chapter 5. Learning Objectives. DW Development and ETL
Chapter 5 DW Development and ETL Learning Objectives Explain data integration and the extraction, transformation, and load (ETL) processes Basic DW development methodologies Describe real-time (active)
Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise. Colin White Founder, BI Research TDWI Webcast October 2005
Data Integration: Using ETL, EAI, and EII Tools to Create an Integrated Enterprise Colin White Founder, BI Research TDWI Webcast October 2005 TDWI Data Integration Study Copyright BI Research 2005 2 Data
BUSINESSOBJECTS DATA INTEGRATOR
PRODUCTS BUSINESSOBJECTS DATA INTEGRATOR IT Benefits Correlate and integrate data from any source Efficiently design a bulletproof data integration process Accelerate time to market Move data in real time
Oracle Database 11g Comparison Chart
Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix
Solutions for detect, diagnose and resolve performance problems in J2EE applications
IX Konferencja PLOUG Koœcielisko PaŸdziernik 2003 Solutions for detect, diagnose and resolve performance problems in J2EE applications Cristian Maties Quest Software Custom-developed J2EE applications
Unified Data Integration Across Big Data Platforms
Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using ELT... 6 Diyotta
White Paper. Unified Data Integration Across Big Data Platforms
White Paper Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using
ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING
ENZO UNIFIED SOLVES THE CHALLENGES OF OUT-OF-BAND SQL SERVER PROCESSING Enzo Unified Extends SQL Server to Simplify Application Design and Reduce ETL Processing CHALLENGES SQL Server does not scale out
Oracle WebLogic Foundation of Oracle Fusion Middleware. Lawrence Manickam Toyork Systems Inc www.toyork.com http://ca.linkedin.
Oracle WebLogic Foundation of Oracle Fusion Middleware Lawrence Manickam Toyork Systems Inc www.toyork.com http://ca.linkedin.com/in/lawrence143 History of WebLogic WebLogic Inc started in 1995 was a company
Understanding and Selecting Integration Approaches
Understanding and Selecting Integration Approaches David McGoveran Alternative Technologies 6221A Graham Hill Road, Suite 8001 Felton, California, 95018 Website: Email: [email protected] Telephone:
Part 2: The Neuron ESB
Neuron ESB: An Enterprise Service Bus for the Microsoft Platform This paper describes Neuron ESB, Neudesic s ESB architecture and framework software. We first cover the concept of an ESB in general in
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 OVERVIEW About this Course Data warehousing is a solution organizations use to centralize business data for reporting and analysis.
An Architecture for Integrated Operational Business Intelligence
An Architecture for Integrated Operational Business Intelligence Dr. Ulrich Christ SAP AG Dietmar-Hopp-Allee 16 69190 Walldorf [email protected] Abstract: In recent years, Operational Business Intelligence
Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage
Moving Large Data at a Blinding Speed for Critical Business Intelligence A competitive advantage Intelligent Data In Real Time How do you detect and stop a Money Laundering transaction just about to take
Implementing a Data Warehouse with Microsoft SQL Server
This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse 2014, implement ETL with SQL Server Integration Services, and
IBM WebSphere Server Administration
IBM WebSphere Server Administration This course teaches the administration and deployment of web applications in the IBM WebSphere Application Server. Duration 24 hours Course Objectives Upon completion
