Evaluation of two ETLs: CloverETL vs. Talend Open Studio




Following our previous article introducing ETL (Extract, Transform, Load), we present and compare two open-source ETL tools: CloverETL and TOS (Talend Open Studio).

Original: http://www.axege.com/evaluation-de-deux-etl-clover-etl-vs-talend-open-Studio.html

Presentation

Clover consists of two parts: Clover.ETL, the engine, and Clover.GUI, a graphical user interface that facilitates the creation of data flows. Both are based on Java technology, are platform independent, and are resource efficient. Clover.GUI is available under a commercial license and is supplied as an Eclipse plugin. The Clover.ETL engine, on the other hand, is provided under the LGPL (Lesser General Public License) and can be combined with any tool, including commercially licensed ones.

[Figure: Clover.GUI]

TOS is a code generator. Its interface lets the user design data flows graphically; these are then automatically transcribed into Perl or Java code. The generated jobs can also be exported and run without TOS. TOS is distributed as an installation package under the GPL, and it cannot be embedded in software that is not under that license.

Evaluation

To meet the needs of its clients, Axège carried out a comparative study of Clover.ETL and TOS. The first is widely used by the developers of Axège Santé; the second is very common in the market.

Methodology

The evaluation process follows the Business Readiness Rating (BRR) method, which has four phases:

- Make a quick evaluation to create a short list of software to be evaluated
- Identify the categories and metrics of the evaluation
- Collect the relevant data
- Rate these data from 1 (low) to 5 (high)

Study

The study was carried out from a technical point of view. Its benefit is that it analyzes the software from several angles rather than in a general overview. It was run on the following configuration:

- AMD (Advanced Micro Devices) Athlon(tm) 64 X2 Dual Core Processor 4400+
- 2 GB RAM
- Ubuntu version 8.04 (Hardy)
- TOS version 2.3.3
- Clover.GUI version 1.9.2 and Clover.ETL version 2.4.3

The categories, in descending order of importance:

- Functionality: coverage of software functionality (metadata, transformations...)
- Performance: memory consumption and execution time
- Documentation: quality of the software documentation
- Usage spread: usage of the software in the ETL market
- Professionalism: methods applied in the development process and in the organization of the project
- User-friendliness: quality of the user interface
- Community: level of activity of the user / developer community
- Architecture: modularity, portability, flexibility, scalability, ease of integration
- Packaging: number of supported platforms
- Maturity: age, stability, history and forks
- Quality: quality of the design, the code and the tests
- Services: support and service

Study category by category: comparison and results

Functionality: TOS has a larger palette of components (246 components against 57), so it provides more functionality. This does not disqualify Clover.ETL as an ETL tool, and it has some components that TOS lacks. For example, Clover.ETL has the DataIntersection component, which intersects two flows, A and B, on a specified key. The component has three outputs: the records present only in A, the records present in both A and B, and the records present only in B.

Performance: Clover.ETL has the advantage because TOS consumes a lot of memory. Although TOS has good execution times on a small number of processed records (up to 2 million), its huge memory consumption does not allow it to read more than 3 million records. Here are some of the results:

[Figure: Reading, execution time]
[Figure: Reading, memory consumption]

Documentation: The documentation of TOS is incomplete. Many components are not described in the user manual and some explanations are imprecise.

Usage spread: Since its launch, TOS has been downloaded 250,000 times, but Talend estimates that it actually has about 75,000 users. Talend invests heavily in marketing, so TOS is more widespread than Clover.ETL in the ETL world.

Professionalism: TOS is better organized than Clover.ETL with regard to modifying and extending the code. For example, TOS uses roadmaps, so new functionality and versions are better managed.

User-friendliness: TOS has a very pleasant and comfortable interface because many actions can be done by drag-and-drop. Clover.ETL is easier to use because it has fewer components and is therefore clearer.

Community: Forum activity shows that TOS has a very active and engaged community.

Architecture: The code of Clover.ETL is more understandable, so it is easier to modify.

Packaging: Both ETL tools can be used on many systems, such as Windows, Linux, Debian, Unix, Solaris, Mac OS X...

Maturity: Clover.ETL is a little older than TOS, but neither comes from a fork and both have very little risk of failing.

Quality: Both have bug trackers, but only TOS actually uses its tracker.

Services: Each organization offers support and service solutions. Their offers are organized by level of expertise and according to the size of the company.
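To make the DataIntersection behaviour described under Functionality more concrete, here is a minimal Python sketch of a three-output intersection of two flows on a key. It is an in-memory illustration only, not Clover.ETL's implementation: the function name `data_intersection`, the record layout, and the choice to emit the A-side record for matches are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def data_intersection(
    flow_a: Iterable[Record], flow_b: Iterable[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split two flows into (only in A, in A and B, only in B) by `key`.

    Hypothetical helper: for records present in both flows, the record
    from flow A is emitted (an assumption made for this sketch).
    """
    records_a = list(flow_a)
    records_b = list(flow_b)
    keys_a = {record[key] for record in records_a}
    keys_b = {record[key] for record in records_b}

    only_a = [r for r in records_a if r[key] not in keys_b]
    in_both = [r for r in records_a if r[key] in keys_b]
    only_b = [r for r in records_b if r[key] not in keys_a]
    return only_a, in_both, only_b


if __name__ == "__main__":
    # Made-up sample flows, keyed on "id".
    a = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
    b = [{"id": 2, "name": "beta"}, {"id": 3, "name": "gamma"}]
    only_a, in_both, only_b = data_intersection(a, b, "id")
    print("only in A:", only_a)
    print("in A and B:", in_both)
    print("only in B:", only_b)
```

The sketch holds both flows in memory to stay short; a production ETL component would typically stream the records instead.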

Conclusion

[Figure: ratings of both tools by category: Services, Maturity, Usage Spread, Performance, Community, Functionality, Professionalism, User friendliness, Quality, Material, Packaging, Structure]

Neither of these tools is better than the other. The table below summarizes the advantages and disadvantages of TOS and Clover.ETL.

Category          | TOS / Clover.ETL | Explanation
------------------|------------------|------------
Functionality     | x                | TOS has a bigger palette of components, but Clover.ETL is a complete ETL.
Performance       | x                | TOS needs more memory and does not finish all jobs within the limit.
Documentation     | x                | The documentation of TOS is incomplete and inaccurate.
Adoption          | x                | TOS has good marketing and a big community. Clover.ETL has no community.
Professionalism   | x                | TOS is better organized than Clover.ETL in code modification and extension. TOS has better project management.
User friendliness | x                | The interface of TOS is nicer. Clover.ETL is easier to use.
Community         | x                | TOS has an active and engaged community.
Architecture      | x                | The code of Clover.ETL is easier to modify.
Packaging         | x x              | Both can be used on many systems.
Maturity          | x                | TOS is more recent than Clover.ETL.
Quality           | x                | TOS uses a bug tracker.
Services          | x x              | Both offer various levels of support.
License           | x                | The license of TOS (GPL, General Public License) is very restrictive.

Since neither of these ETL tools is better than the other, the choice between them should be made according to the customer's needs.

Observation and comments from a CloverETL consultant

Appendix A: Scalability Observation

We believe that the performance measurement conducted by Axège, depicted by the two performance graphs above, clearly shows that TOS is not a scalable solution and cannot operate on larger data volumes. Good scalability is probably the most important characteristic of any ETL tool: it shows how the tool behaves with a growing volume of input data, and therefore whether it will be capable of processing large data volumes once deployed.

The lack of scalability of TOS is easy to see on the memory consumption graph ("Consommation memoire"). With growing data volume, not only does the processing time grow, the memory consumption also increases linearly. At 2.5 million rows of input, TOS requires 1.4 GB of memory, almost 580 bytes per input line. At 5 million rows, TOS runs out of memory and fails to process the input completely: according to the hardware specification the system has 2 GB of physical memory, while by our calculation TOS would need about 2.8 GB (5 million lines * 580 bytes per line = approximately 2.8 GB).

We would also like to give the reader an estimate of the physical data volume processed in this comparison study. An exact calculation requires knowledge of the input file format. Unfortunately this information is not part of the study, so we have to settle for an estimate. Let's consider an ASCII text file with a fixed-length record format of 32 bytes per line. An excerpt of such a file could look like this:

ID  Cust   Amount    Cur
1   42385  35665.20  2
2   34210  36134.06  2
3   18495  35907.54  32
4   21780  8505.68   6

For an input file with 5 million rows, the size is 5 million * 32 bytes = 160,000,000 bytes, or about 152 MB of input data. In our experience, 150 MB of input is a relatively small data file, and real-world scenarios operate with data volumes that are easily 100 to 1000 times larger.

This is why CloverETL is designed with fixed memory consumption and scalability in mind. Once again, this is clearly supported by the Axège graphs. The memory consumption graph ("Consommation memoire") shows that CloverETL consumes less than 50 MB of memory for any data volume. Processing time grows linearly with data volume: if processing 1 MB takes 1 second, processing 10 MB takes 10 seconds.
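The back-of-the-envelope figures in this appendix can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, memory is treated in decimal units (1 GB = 10^9 bytes), and the 32-byte record length is the estimate assumed above, not a measured value. With decimal units the per-row memory figure comes out at 560 bytes, in the same range as the figure of almost 580 quoted above.

```python
GB = 10 ** 9          # decimal gigabyte (assumption for this sketch)
MIB = 1024 ** 2       # binary megabyte, used for the input file size


def bytes_per_row(memory_bytes: int, rows: int) -> float:
    """Average memory consumed per input row."""
    return memory_bytes / rows


def projected_memory(rows: int, per_row_bytes: float) -> float:
    """Memory needed if consumption grows linearly with the row count."""
    return rows * per_row_bytes


def input_size_bytes(rows: int, record_bytes: int) -> int:
    """Size of a fixed-length-record input file."""
    return rows * record_bytes


def projected_seconds(size_mb: float, seconds_per_mb: float) -> float:
    """Processing time under linear scaling with data volume."""
    return size_mb * seconds_per_mb


if __name__ == "__main__":
    # Observed in the study: 1.4 GB of memory at 2.5 million rows.
    per_row = bytes_per_row(1_400_000_000, 2_500_000)
    needed = projected_memory(5_000_000, per_row)
    print(f"memory per row: {per_row:.0f} bytes")
    print(f"projected need at 5 million rows: {needed / GB:.1f} GB")
    print(f"exceeds 2 GB of physical memory: {needed > 2 * GB}")

    # Estimated input size, assuming 32-byte fixed-length records.
    size = input_size_bytes(5_000_000, 32)
    print(f"input size: {size} bytes, about {size / MIB:.1f} MB")

    # Linear processing time: 1 second per MB means 10 seconds for 10 MB.
    print(f"10 MB at 1 s/MB: {projected_seconds(10, 1.0):.0f} s")
```

The projection doubles the observed 1.4 GB because the row count doubles, which is how the estimate of about 2.8 GB follows from linear growth.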