Genio from Hummingbird: an extract from the Bloor Research report, Data Integration, Volume 1
Hummingbird Genio

Fast facts

Hummingbird's Integration Suite component, Genio, is a data integration tool aimed primarily at the mid-market. Traditionally, data movement products have fallen into one of two categories: either they are engine-based, black box products or they are code generators. Hummingbird Genio represents a hybrid approach in that it is engine-based and uses a 4GL-based approach for developing transformation logic. Another significant feature is that it also provides the ability to offload process execution onto the source or target system. This combination provides the performance benefits associated with engine-based products without losing the ability to define complex transformations that often fall outside the scope of traditional engine-based environments.

Key findings

In the opinion of Bloor Research the following represent the key facts of which prospective users should be aware:

Unlike other engine-based products, the use of a procedural language (4GL) for defining transformations means that you can inspect, debug and test what you have developed in a conventional manner, including any native SQL that may have been generated.

While transformation logic is defined using the 4GL, different logic and other modules can be combined using a conventional icon-based, graphical, drag-and-drop workflow environment.

Hummingbird gives you the option of performing relevant SQL operations on data targets and sources, or you can run them all within the Genio environment, depending on which approach will give you the best performance.

Hummingbird provides extensive native SQL capabilities and it also has specific metadata-based integration with SAP environments, together with support for data modelling tools such as Computer Associates AllFusion ERWin. The product does not, at present, support the Common Warehouse Metamodel (CWM) but we understand that the company is committed to support of that standard in the future.
Hummingbird provides a number of real-time and near real-time options, most notably through integration with WebSphere MQ. We think it would make sense for the company to develop JMS (Java Message Service) support as well. The product includes significant capabilities for supporting impact analysis, dependencies and data lineage, which will be important both for ongoing maintenance and corporate governance. Bloor Research 2004 Page 1
We are pleased to see that Hummingbird has partnered with Similarity, a provider of data profiling and cleansing software, of which we think highly (a separate product review of Similarity Athanor is available from Bloor Research). Hummingbird is currently investigating the opportunity of developing tighter integration between the two products.

The bottom line

We like Hummingbird's approach. If it is faster to develop and maintain conventional applications using a 4GL (and it is) than by hand coding, then the same should (and does) apply when it comes to developing transformations and mappings. Given that people who currently write code by hand are the major target market for Hummingbird with this product, we believe that the arguments in favour of the product are compelling. As we have noted, Hummingbird Genio is something of a hybrid that attempts to combine the benefits of the engine-based and code-generating approaches. While some treat these different stances almost on a religious basis, there is a large middle ground to which Hummingbird's approach should appeal.
Vendor Information

Background information

Hummingbird was founded in Canada in 1984, and initially specialised in terminal emulation and connectivity products. In particular, it launched Exceed, its PC X Server connectivity product, in 1989 and is a market leader (with a market share in excess of 70%) within its sphere. However, while these products are lucrative they nevertheless represent a very limited, niche market. So, in 1997 the company announced that it would purchase Andyne, an acquisition that it completed in January 1998. This brought Business Intelligence (BI) capability to the company for the first time, with what is now Hummingbird BI.

Since then, the company has expanded significantly through acquisition, as well as organically. As far as this report is concerned, the company's most important acquisition was a French company called Leonard's Logic, which provided an Extract, Transform and Load (ETL) product called Genio. Other notable acquisitions (in no particular order) include PeopleDoc, a provider of advanced collaborative capabilities, Valid, LegalKEY, DisPro, Kramer Lee and Associates and, in particular, the PC DOCS Group. This last is (or was), depending on how you measure it, the leading vendor of document management software. That is to say, PC DOCS has/had the largest number of installed users, though Documentum exceeded it in revenue terms. The effect of this acquisition was to almost double the size of Hummingbird and led to the release of what is now the Hummingbird Portal, in 1999.

The Hummingbird Portal was significant because, for a while, the company saw itself as primarily an enterprise portal vendor. While it has by no means turned its back on that market, the company now appreciates that this focus was limiting the opportunities of its other products, and it now has a more broad-based approach. In particular, it sees Hummingbird Genio as having major growth potential.
Hummingbird web address: www.hummingbird.com

Product availability

At the time of writing, Hummingbird Genio is in version 5.0.4, which was released in September 2003. However, version 5.1 is imminent (it is scheduled for availability in summer 2004) and we have therefore included details about a number of the new features in that release.

Hummingbird Genio runs on Windows, Sun Solaris, HP-UX and AIX platforms. For its repository it requires an Informix, SQL Server, Oracle, Sybase ASE, Sybase SQL Anywhere, or DB2 UDB database. When addressing these databases as sources or targets it uses native SQL, which also applies to Teradata. Native access is also used in the case of Sybase and Oracle and, further, there are native capabilities for the direct population of both Hyperion Essbase and Oracle Express multi-dimensional databases. As the former is OEM'd by IBM for use within DB2, Hummingbird Genio should also support that environment. We think it would make sense for Hummingbird to develop similar facilities for Microsoft SQL Server Analysis Services.

For other data sources and targets the product uses ODBC (including IMS, DB2 on AS/400 or MVS, CA-IDMS, Adabas, VSAM and C-ISAM). A range of non-relational formats is also supported, including XML (which is also used for all non-relational import and export), AS/400 flat files, COBOL and similar files, and Palm OS compatible files. There are also pre-built drivers for SAP environments (R/3, BW and IDOC). Also notable is the fact that the product provides WebSphere MQ support.

Finally, it should be noted that Hummingbird used to market a product called Genio Miner. This combined the capabilities of Genio with data mining technology licensed from Angoss. While the code for this product is still available it is not actively sold.

Financial results

Hummingbird is a public company quoted on both NASDAQ and the Toronto Stock Exchange. In its most recent quarter (Q4 2003) revenues were $50.0m compared to $44.1m in the same period last year. This resulted in a loss of $1.3m and a gain of $2.0m respectively. However, this is primarily due to increased amortization of intangibles resulting from last year's acquisitions and an increase in income tax expense. Adjusted (like GAAP but not GAAP) figures showed an increased profit of $5.3m compared to $5.1m last year. In the recently completed financial year, total revenues were $192.6m, up from $180.4m a year ago. Income improved from a loss of $2.9m to a profit of $3.7m.
Adjusted income figures show a similar but smaller improvement. The company has some 1,470 personnel spread across 40 offices located throughout North America, Latin America, Australia, Belgium, the Netherlands, France, Germany, Italy, Switzerland, Japan, Singapore, South Korea and the UK.
Product description

Introduction

What most distinguishes Hummingbird Genio from its competitors is that it uses a 4GL-like approach to developing transformations. It has a conventional icon-based, workflow-style interface on top of this, which works at the process level, but it is the procedural language provided by Hummingbird that gives the product its strength.

Most products within the data integration market fall into one of two camps: either they generate code that is compiled and run on the mainframe or other system, or they provide an execution engine. The advantage of the latter approach is that you take processing load off the source systems, that the engines run on relatively inexpensive Windows, UNIX and Linux platforms, and that those platforms can be tuned for optimal performance. However, the downside is that engines tend not to cater well for complex tasks, which means that developers have to resort to hand coding those parts of an application with which the tool cannot easily cope. Moreover, where you only use the engine for transformations then that is a completely black-box approach: you cannot inspect what is happening so you cannot tune its performance. In other words, you can tune the whole platform but not individual tasks.

By offering a 4GL-based approach, Hummingbird Genio aims to provide the best of both of these worlds. It uses an engine, so that you can tune the platform, but you can also, if you wish, push processing tasks back to source systems when it is appropriate. On the other hand, because it is actually a 4GL, there should be no issue with complexity and you can inspect the code for individual tasks and tune those individually if you want to.

At present, the one major downside to the product is that it is not easy to parallelise processes within the engine. In rival products, you can literally do this by clicking on a button whereas, with Hummingbird Genio, you have to design this yourself.
In fact, the company is going to make this rather simpler in the forthcoming 5.1 release, so that there are features that help you to do this within the development environment. Nevertheless, this remains a process that you have to go through and it is not automated in the same way as in some other products.

Hummingbird Genio is aimed primarily at the mid-market. In part this is because the company feels that this market has been underserved by other data integration vendors. In particular, most companies in this space still tend to hand code data movement solutions, with all the inherent difficulties that entails (more complex and costly maintenance, no data lineage, no impact analysis, and so on). In fact, Hummingbird estimates that users of its product should get something like four or five times better productivity compared to hand coding and twice that again for maintenance. The company normally does proof-of-concept type sales in which it will attempt to demonstrate the accuracy of these figures.
The main reason for the continued use of hand coding, Hummingbird believes, is that the major players in the market have very expensive products that require multi-project implementations before you get any payback. Hummingbird, in contrast, is aiming to give users an ROI on a single project basis (assuming a reasonable project size: say 150 days if hand coding).

Architecture

The architecture of the Hummingbird Genio product is illustrated in Figure 1.

Figure 1: Architecture of Hummingbird Genio

As can be seen, there are a number of major components within the product and we will concentrate on the Genio Designer, Engine, Repository and Scheduler in the following sections. Of the remaining modules, the Administration Manager has functions that should be self-evident, while there are also Genio MetaLinks, Genio DataLinks and Genio Met@Data. Of these, Genio DataLinks provides connectivity to the various source and target systems detailed above; Genio MetaLinks provides support for SAP (R/3, BW and IDOC) environments as well as the Sybase PowerDesigner and CA AllFusion ERWin data modelling tools; and Hummingbird Met@Data provides metadata management capabilities in conjunction with the Genio Repository (discussed later).

Genio Designer

The basic approach using Genio Designer is that you build processing modules using the product's 4GL and then you link these modules together using a standard icon-based, drag-and-drop style workflow environment.

Genio Designer adopts a 4GL-like approach to developing transformation rules, with the Genio language (which has just 15 instructions, so it should be easy enough to learn) developed specifically for the tasks that it is designed for. However, the product makes extensive use of drop-down lists and other techniques that are designed to make use of the product as simple as possible and, in particular, most development will be wizard driven. The most important of these wizards are:

1.
The Dataset wizard, which allows you to define the data you will be working with, and then automatically creates the relevant Select statements. This allows you to select equi-joins and both left and right joins simply by pointing and clicking. You can also define complex expressions that can be used, for example, if you want to concatenate fields before performing the join.

2. The Module wizard, where a module defines a field-level mapping or set of such mappings. This includes an extensible Boolean expression builder that has over 120 pre-built functions, including string, numeric, mathematical and trigonometric, and data functions, which can be executed either on
the source, on the target or within the Hummingbird environment, as required. Other facilities include a range of standard (customisable) exceptions, the ability to call the operating system, and a macro definition facility. Modules can, of course, be reused, and you can call one module from another. Finally, the Module wizard automatically generates data flow graphs, an example of which is illustrated in Figure 2. While you cannot directly create or amend these graphs, you can manipulate them, drill down from them and so forth. One of the features we particularly like is the way that you can have different icons (in both size and colour) to represent different source types.

Figure 2: Generating data flow graphs in the Module Wizard

Hummingbird Genio is component based, making extensive use of objects, of which three classes are definable. These are:

Remote Objects. These are all related to things such as data sources and targets and include connections, tables, cubes, views, stored procedures and SQL extensions. Note that, as far as SQL is concerned, there are facilities to customise generated SQL for particular platforms if these are not supported out of the box by Hummingbird.

Local Objects. These are used as a part of the logic of the ETL process. They include lookup tables, user defined variables, user exceptions, functions (which may be DLL, macro or query functions, in addition to the predefined functions discussed above) and messages.

Genio Objects. These are used to define the links between the data sources and targets and include datasets (combining tables, lookup tables and views), modules that contain exchange logic, and processes that define sequences of events.
An additional object belonging to this group is the RunningContext object. This isolates the code from the physical environment by supplying, dynamically at run-time, values for connection details, environment variables, logging and debugging levels, and so forth. This object is particularly useful in that it allows the user to write code that can be easily rolled between environments (for example, from development into test).

A graphical, icon-based, drag-and-drop environment, rather than a drop-down, point-and-click style interface, is provided to bring together the definition of data transformation processes. However, it is not used for the creation of mappings but for the definition of process tasks. For example, a complete process might involve some data profiling using Similarity Athanor, a loading process, data cleansing, some transformations, perhaps further data profiling to ensure that the transformations had not introduced any new problems, and finally a load process.

In fact, Hummingbird Genio is not limited to data flow diagrams and workflow for its graphical environment and the software also employs a number of other modelling techniques to depict the relationships that you have defined, notably
Dependency Graphs (for impact analysis) and Action Graphs. The latter, in particular, are relatively limited in their applicability and, in fact, all diagrams are used chiefly for documentation, which is generated automatically and is generally excellent.

As far as tools are concerned, the last one within Genio Designer that we need to discuss is the Execution Viewer. This collects details about each process as it is run, along with volume metrics, exception logging, and so forth. This tool also allows you to see the SQL that is generated for each application, and there are facilities provided to edit and tune the generated SQL.

Finally, for those designers who prefer to work in scripting mode rather than graphical mode, it is possible to drop down into this. In order to prevent problems arising, the 4GL will automatically detect any errors and suggest a correction (if, for example, a required statement has been accidentally deleted) once the developer reverts to the graphical development environment. Conventional programming facilities such as if-then-else and Case statements are provided. One major difference between the coding and 4GL environments is that, in the latter case, the Designer makes it impossible to create a script that is syntactically incorrect, greying out any options that are not available.

Genio Engine

The Genio Engine is responsible for the process of transferring data from one or more sources to the target(s). Actually, this is not a monolithic service, since the Engine offers three different modes of operation, depending on whether the transformation is performed by the Engine in stand-alone mode, or in conjunction with a remote object or, indeed, when remote databases take over the whole function for themselves. In the last case the Engine simply issues the appropriate SQL instructions.
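The difference between processing in the engine and handing the whole function to the database can be sketched in a few lines. The example below is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the real source and target databases, the table names are hypothetical, and the SQL is not what Genio itself generates.

```python
import sqlite3

# An in-memory SQLite database stands in for a server that hosts both
# the source and the target tables (all names here are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount_cents INTEGER);
    CREATE TABLE tgt_engine (id INTEGER, amount REAL);
    CREATE TABLE tgt_pushdown (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 1250), (2, 399), (3, 10000);
""")

# Engine-side processing: rows are fetched, transformed outside the
# database, and written back one batch at a time.
rows = conn.execute("SELECT id, amount_cents FROM src_orders").fetchall()
transformed = [(row_id, cents / 100.0) for row_id, cents in rows]
conn.executemany("INSERT INTO tgt_engine VALUES (?, ?)", transformed)

# Database-side processing: the tool only issues one SQL statement and
# the database performs the extraction, transformation and load itself.
conn.execute("""
    INSERT INTO tgt_pushdown
    SELECT id, amount_cents / 100.0 FROM src_orders
""")

engine_result = conn.execute(
    "SELECT id, amount FROM tgt_engine ORDER BY id").fetchall()
pushdown_result = conn.execute(
    "SELECT id, amount FROM tgt_pushdown ORDER BY id").fetchall()
```

Both paths produce the same target rows; what differs is where the work is done. In the second path no rows leave the database, which corresponds to the mode in which the remote database takes over the whole function.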
This is most useful when the source and target reside on the same server, and has the obvious advantage that it is rapid and does not put a load on the network. However, it will often be the case that this is not possible, in whole or in part. When this is the case, the Genio Engine may make use of source database facilities for such things as aggregation and consolidation where that is appropriate, but will otherwise take over all processing functions itself.

In practice, a single Genio Engine may not be sufficient for performance reasons, and Hummingbird supports multiple engines. These support load balancing, although this facility is not dynamic and must be configured in advance, as must the parallelisation already discussed.

In so far as the actual loading process is concerned, Hummingbird Genio supports bulk loading; single mode, which updates or refreshes a row at a time; and packet mode, which allows you to define a number of rows for loading at once. With finer granularity you can deploy higher levels of error management.

There are also a number of real-time options. One is to use a polling facility that is built into the product, but this requires you to read a whole file at a time, a second
(increasingly popular) method is to use WebSphere MQ, and a third option is to use the change data capture facilities provided by some database vendors (such as Oracle) and third party connectivity suppliers (like Attunity).

Genio Scheduler

As noted above, this is really a particular function of the Administration Console which, on a more general note, provides the sort of security and other facilities that you would expect. The Genio Scheduler allows you to define the timing and execution of processes and events, either on a time and date or periodic basis. In addition, it is also possible to define actions to be triggered on the basis of another process or event. In this case, such events may be external. Scheduling may also be triggered by the presence or absence of a particular file, or when such a file is modified. The Scheduler not only provides conventional auditing facilities but also allows administrators to investigate the progress of processes while they are running. It may be used in conjunction with third party scheduling programs.

Genio Repository

The Genio Repository, as one might expect, stores all the relevant metadata pertaining to Hummingbird Genio. This allows the various objects to be reused. However, its most important function is to monitor the effect of changes. This is provided through both impact analysis and change tracking. Impact analysis is automatically run against the Dependency Chart whenever a change is made and relevant objects will be flagged as either valid or invalid. In the latter case, the process will halt pending user intervention. Alternatively, it is possible to assign an undefined status to an object and allow processing to continue, with correction and validation to take place subsequently.
While impact analysis runs against Hummingbird Genio objects, local objects and metadata, the change tracking facility is used to monitor remote objects that are outside the direct control of Hummingbird Genio. Thus, for example, if a change is detected in the structure of a source database then the next time that data is imported from that source, the impact analysis feature will automatically be started in order to re-validate the relevant code.

In order to communicate with metadata elsewhere, Hummingbird provides what it calls Genio MetaLinks. Specifically, it provides these bridges to SAP environments and to data modelling tools (as detailed above), which can be used to reverse engineer existing database schemas that can then inform the definition of datasets. In this regard we would like to see Hummingbird support the Common Warehouse Metamodel (CWM), as this would give wider access to the Genio Repository without having to build proprietary bridges. We are pleased to note that Hummingbird plans to do this in a future release.
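The way a change ripples through dependent objects can be pictured as a walk over a dependency graph. The sketch below is a minimal illustration of that idea, not Genio's repository API: the function, the object names and the status values are all hypothetical, chosen only to mirror the valid/invalid flagging described above.

```python
from collections import defaultdict, deque


def impacted_objects(dependencies, changed):
    """Return every object that depends, directly or indirectly, on `changed`.

    `dependencies` maps an object name to the names it depends on.
    """
    # Invert the mapping so we can walk from a changed object to its dependants.
    dependants = defaultdict(set)
    for obj, needs in dependencies.items():
        for needed in needs:
            dependants[needed].add(obj)

    # Breadth-first walk collecting everything downstream of the change.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependant in dependants[current]:
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


# Hypothetical repository contents: datasets depend on remote tables,
# modules depend on datasets, and a process depends on modules.
dependencies = {
    "customer_dataset": ["crm.customers"],
    "orders_dataset": ["erp.orders"],
    "load_customers_module": ["customer_dataset"],
    "load_orders_module": ["orders_dataset", "customer_dataset"],
    "nightly_process": ["load_customers_module", "load_orders_module"],
}

# A structural change to the remote table crm.customers is detected:
# everything downstream is flagged invalid until it is re-validated.
status = {name: "valid" for name in dependencies}
for name in impacted_objects(dependencies, "crm.customers"):
    status[name] = "invalid"
```

In this example only orders_dataset keeps its valid status, because it is the one object that does not depend on the changed table.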
Summary

Unless you are in France, where Hummingbird is the leading vendor in the market thanks to its product having originated there, Hummingbird is probably not the first name that springs to your mind when you think of data integration. This is mostly because Hummingbird's marketing efforts have largely been placed elsewhere over the last few years and it has only relatively recently recognised the opportunity that Hummingbird Genio represents. This has now changed. Further, as we have noted, the product is relatively inexpensive and it uses technology that will appeal to many users. We expect Hummingbird Genio to make significant headway in the market.
Copyright & Disclaimer

This document is subject to copyright. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research. Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research's intent to claim these names or trademarks as our own. Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.
Suite 6, Challenge House, Sherwood Drive, Bletchley, Milton Keynes, MK3 6DP, United Kingdom Tel: +44 (0)1908 625100 Fax: +44 (0)1908 625124 Web: www.bloor-research.com email: info@bloor-research.com