Legacy Data Migration: DIY Might Leave You DOA




A white paper by Wayne Lashley, Chief Business Development Officer, Treehouse Software

In any application migration/renewal project, data migration is usually a relatively minor component in terms of overall project effort. Yet failure of the data migration process can, and does, cause failure of the entire project. So why do project managers fail to take essential steps to mitigate this risk?

Often, data migration is perceived as the customer's problem, or a problem to be considered later, the expectation being that someone will write a suite of extraction programs that will dump the legacy data into flat files to be loaded into the new relational database management system (RDBMS). However, any time a particular implementation is expressed in a foreign representation, there is great opportunity for mistranslation, so such a simplistic point of view opens the door to potential catastrophe.

A comprehensive legacy data migration strategy must promote a number of important tactics and features:

1. Discovery and analysis of legacy source data structures, including both logical and physical definitions;
2. Data modeling and design of the RDBMS tables, often with the goal of not only reflecting the structure and content of the legacy source, but also its behavior;
3. Specification and implementation of appropriate mappings and transformations from source to target;
4. Capture of all source and target metadata into a repository (items 3 and 4 are sketched in code below);
5. Accommodation of source data structure changes over time;
6. Native interfaces to both the source database and the target database, to maximize throughput and ensure correct interpretation and formatting;
7. An evolutionary methodology for refining the mappings and process, allowing for repeatable and comparative testing;
8. Minimized extraction impact on production systems;
9. Support for phased project implementation, including Change Data Capture (CDC) and possibly bidirectional data replication capabilities.

It would be a daunting challenge for a migration project team to design, develop, test and deploy tools to support these tactics and features. Furthermore, to reinvent the wheel for each new project is simply a waste of resources. Fortunately, this is unnecessary, since mature, proven, feature-rich and cost-effective products that contribute substantially to project success are commercially available.

As a leading provider of legacy data migration, replication and integration products and services, Treehouse Software leverages the production experience gained in the field since the mid-1990s. Treehouse data replication solutions have been used to support many data transfer and migration projects. Each customer's project has led to enhancements and refinements, such that today it is possible to handle virtually any conceivable legacy data migration requirement.
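To make items 3 and 4 concrete, here is a minimal sketch in Python of how source-to-target mappings and simple transformations might be captured as metadata and applied to an extracted record. The field names, transformation names and helper functions are invented for illustration; this is not tcvision's repository format or API.

# Hypothetical metadata describing how legacy source fields map to RDBMS columns.
# In a real tool this metadata would live in a repository database, not in code.
MAPPINGS = [
    # (source field, target column, transformation name)
    ("CUST-NO",   "CUSTOMER_ID",   "as_integer"),
    ("CUST-NAME", "CUSTOMER_NAME", "trim"),
    ("BIRTH-D",   "BIRTH_DATE",    "decode_packed_date"),  # proprietary date field
]

TRANSFORMS = {
    "as_integer": lambda v: int(v),
    "trim": lambda v: v.strip(),
    # Placeholder: a real migration would decode the vendor-specific encoding here.
    "decode_packed_date": lambda v: v,
}

def apply_mapping(source_record: dict) -> dict:
    """Produce one target row from one legacy source record using the mapping metadata."""
    row = {}
    for src_field, target_col, transform in MAPPINGS:
        row[target_col] = TRANSFORMS[transform](source_record[src_field])
    return row

# Example: one extracted legacy record becomes one relational row.
print(apply_mapping({"CUST-NO": "00042", "CUST-NAME": "ACME CORP ", "BIRTH-D": 730000}))

Keeping the mapping declarative, rather than buried in extraction programs, is what makes it possible to store, query and refine it in a repository (item 4) and to re-run it repeatably for comparative testing (item 7).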

As an illustration of the potential pitfalls of legacy data migration, consider some issues that Treehouse has successfully addressed with respect to Software AG's ADABAS DBMS.

ADABAS has no concept of transaction isolation, in that a program may read a record that another program has updated, in its updated state, even though the update has not been committed. This means that programmatically reading a live ADABAS database (one that is available to update users) will almost inevitably lead to erroneous extraction of data. Record modifications (updates, inserts and deletes) that are extracted, and subsequently backed out, will be represented incorrectly or not at all in the target. Because of this, at Treehouse we say the only safe data source is a static data source, not the live database.

Many legacy ADABAS applications make use of record typing, i.e., multiple logical tables stored in a single ADABAS file. Often, each must be extracted to a separate table in the target RDBMS. The classic example is that of the code-lookup file: most shops have a single file containing state codes, employee codes, product-type codes, etc. Records belonging to a given code table may be distinguished by the presence of a value in a particular index (a descriptor or superdescriptor, in ADABAS parlance), or by a range of specific values. Thus, the extraction process must be able to dynamically assign data content from a given record to different target tables depending on the data content itself.

ADABAS is most often used in conjunction with Software AG's NATURAL 4GL, and conveniently provides for unique datatypes ("D" and "T") that appear to be merely packed-decimal integers on the surface, but that represent date or date-time values when interpreted using Software AG's proprietary NATURAL-oriented algorithm. The most appropriate way to migrate such datatypes is to recognize them and map them to the corresponding native RDBMS datatype (e.g., Oracle DATE), in conjunction with a transformation that decodes the NATURAL value and formats it to match the target datatype.

tcvision, a comprehensive data migration and replication solution offered by Treehouse, provides high-productivity, high-efficiency automated capabilities in dealing with these and other issues (Figure 1). While the discussion below focuses only on ADABAS, tcvision in fact provides similar capabilities for a wide range of legacy sources, including CA-IDMS, CA-Datacom, IMS/DB, DL/I, DB2, SQL/DS, VSAM and even sequential files, as well as non-legacy sources such as Oracle Database, Microsoft SQL Server, DB2 LUW and ODBC data sources. Moreover, any of the aforementioned sources can also be a target, making tcvision the most comprehensive and flexible data migration and replication solution anywhere.

Figure 1: The tcvision data migration and replication solution

For discovery and analysis of legacy source data structures, tcvision includes modeling and mapping facilities to view and capture logical ADABAS structures, as documented in Software AG's PREDICT data dictionary, as well as physical structures as described in ADABAS Field Definition Tables (FDTs). Note that PREDICT is a passive data dictionary: there is neither requirement nor enforcement that the logical and physical representations agree, so it is necessary to scrutinize both to ensure that the source structures are accurately modeled.
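The record-typing issue described above can be pictured with a small Python sketch: records extracted from a single code-lookup file are routed to different target tables based on a discriminator value in the record content itself. The field names, table names and matching rules are invented for illustration and are not tcvision syntax.

# Hypothetical routing rules for a single legacy "code lookup" file that holds
# several logical tables. The discriminator field and table names are invented.
ROUTING_RULES = [
    # (predicate on the record, target table)
    (lambda rec: rec["CODE-TYPE"] == "ST",            "STATE_CODES"),
    (lambda rec: rec["CODE-TYPE"] == "EMP",           "EMPLOYEE_CODES"),
    (lambda rec: "P00" <= rec["CODE-TYPE"] <= "P99",  "PRODUCT_TYPE_CODES"),  # range match
]

def route(record: dict) -> str:
    """Pick the target RDBMS table for one extracted record based on its content."""
    for predicate, table in ROUTING_RULES:
        if predicate(record):
            return table
    raise ValueError(f"No target table for record type {record['CODE-TYPE']!r}")

extracted = [
    {"CODE-TYPE": "ST",  "CODE": "PA", "DESC": "Pennsylvania"},
    {"CODE-TYPE": "P07", "CODE": "07", "DESC": "Widgets"},
]
for rec in extracted:
    print(route(rec), rec["CODE"], rec["DESC"])

A Do-It-Yourself extraction program would need equivalent logic, plus the proprietary datatype decoding, hand-written and re-tested for every file; a mapping-driven tool keeps such rules as data.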

Figure 2: tcvision capture of source structure metadata

Productivity in the data modeling and design of the RDBMS tables is greatly enhanced with tcvision. In seconds, the product automatically generates a complete, native, fully-functional yet completely-customizable RDBMS (IBM DB2, Oracle, Microsoft SQL Server, etc.) schema based on a specified ADABAS file (Figure 3). Furthermore, tcvision generates the specification and implementation of appropriate mappings and transformations for converting ADABAS datatypes and structures to corresponding RDBMS datatypes and structures, including automatic handling of the proprietary "D" and "T" source datatypes. It is important to note that the schema, mappings and transformations that result from auto-generation can be tailored to any specific requirements after the fact. It is even possible to import an existing RDBMS schema and retrofit it, via drag-and-drop, to the source ADABAS elements.

Figure 3: tcvision target schema auto-generation and mapping

Using the tcvision GUI, the most complex transformations can be specified. Source fields can be combined into a single column, decomposed into separate columns, and be subject to calculations, database lookups, string operations and even programmatic manipulation. Furthermore, mapping rules can be implemented to specify that data content from a source ADABAS record be mapped to one or more target RDBMS tables, each with its own different structure, as desired based on the data content itself. Target tables can even be populated from more than one source file.

tcvision even generates the RDBMS Data Definition Language (DDL) to instantiate the schema (Figure 4).

Figure 4: tcvision target RDBMS DDL generation

Over the course of a migration project, it is nearly inevitable that source data structure changes will be required and must be accommodated, which almost certainly will affect the target and the data migration process. tcvision source-to-target mappings can be specified with a "Valid From" date/time and "Valid To" date/time so that the structure change can be anticipated, scheduled and implemented smoothly (Figure 5).
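To illustrate the general idea of schema and DDL auto-generation (and not as a description of tcvision's internals), here is a toy Python sketch that derives CREATE TABLE DDL from a simplified field-definition list. The legacy format codes and the type-mapping choices are illustrative assumptions, not tcvision rules.

# Toy DDL generator: a simplified field-definition list (loosely modeled on the kind
# of metadata an FDT or data dictionary provides) is turned into Oracle-style DDL.
FIELDS = [
    # (field name, legacy format code, length)
    ("CUST_NO",   "P", 7),    # packed decimal   -> NUMBER
    ("CUST_NAME", "A", 40),   # alphanumeric     -> VARCHAR2
    ("BIRTH_D",   "D", 0),    # proprietary date -> DATE
]

TYPE_MAP = {
    "P": lambda length: f"NUMBER({length})",
    "A": lambda length: f"VARCHAR2({length})",
    "D": lambda length: "DATE",
}

def generate_ddl(table: str, fields) -> str:
    """Emit a CREATE TABLE statement from the captured field definitions."""
    cols = ",\n".join(f"  {name} {TYPE_MAP[fmt](length)}" for name, fmt, length in fields)
    return f"CREATE TABLE {table} (\n{cols}\n);"

print(generate_ddl("CUSTOMER", FIELDS))

The value of auto-generation is the starting point it provides; as the paper notes, the generated schema and mappings can then be tailored, or an existing schema imported and retrofitted.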

Figure 5: tcvision source-to-target mapping

tcvision captures all source and target metadata into its own RDBMS-based repository, the schema of which is documented to facilitate customized querying, reporting or even integration with other applications. At any time, all metadata and specifications for source-to-target data migration are at the project team's fingertips, and where applicable can be exposed to or integrated with application-migration toolsets.

tcvision offers highly-efficient batch Extract-Transform-Load (ETL) processing, known as Bulk Loading in tcvision terminology. Bulk Loading is optimized by use of native interfaces to both the source database and the target database: a static native ADABAS data source (a utility unload of the pertinent database files) can be used instead of accessing the live database itself, thus avoiding contention and transaction-isolation issues, and the transformed data is natively formatted to be RDBMS-loader-ready, including automatic generation of the loader utility control instructions, with a high-throughput transformation engine in the middle.

tcvision ETL executes as a single continuous process from source to target, without the need for intermediate storage or middleware. The process can be scheduled and monitored from the tcvision Control Board (Figure 6).

Figure 6: tcvision Control Board

With these productivity aids and an optimized ETL process, an evolutionary methodology for refining the mappings and process is enabled. Treehouse advises data migration customers to "migrate early and often." The sooner that data can be populated into the prospective RDBMS, the sooner that data quality and assumption issues can be discovered, confronted and resolved, and capacity and performance planning commenced. Moreover, very early in the process a test bed can be generated for modeling and prototyping the new application. Using a static data source allows ETL processing to be repeated and compared between various mapping scenarios and data samples. Thus, the RDBMS schema and mappings can be iteratively refined over time as requirements are developed and discoveries made.
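A drastically simplified picture of what a bulk-load run produces (loader-ready data plus loader control instructions) is sketched below in Python. The unload format, file names, column names and the generic SQL*Loader-style control file are invented for the example and are not tcvision output.

import csv

# Hypothetical: each line of the static unload is "field1|field2|field3".
# A real unload would be in the DBMS's native format, not plain text.
def bulk_load(unload_path: str, table: str) -> None:
    data_file = f"{table.lower()}.csv"
    ctl_file = f"{table.lower()}.ctl"

    # Transform the unloaded records into a loader-ready delimited file.
    with open(unload_path) as src, open(data_file, "w", newline="") as out:
        writer = csv.writer(out)
        for line in src:
            cust_no, name, birth = line.rstrip("\n").split("|")
            writer.writerow([int(cust_no), name.strip(), birth])

    # Emit generic SQL*Loader-style control instructions (illustrative only).
    with open(ctl_file, "w") as ctl:
        ctl.write(
            f"LOAD DATA INFILE '{data_file}'\n"
            f"INTO TABLE {table}\n"
            "FIELDS TERMINATED BY ','\n"
            "(CUSTOMER_ID, CUSTOMER_NAME, BIRTH_DATE DATE 'YYYY-MM-DD')\n"
        )

# Create a tiny sample "unload" so the sketch runs end to end.
with open("customer.unload", "w") as f:
    f.write("00042|ACME CORP           |1999-12-31\n")

bulk_load("customer.unload", "CUSTOMER")

Because the input is a static unload rather than the live database, exactly this kind of run can be repeated against the same data while the mappings are refined, which is what makes "migrate early and often" practical.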

Customers also benefit from minimized extraction impact on the operational ADABAS database and applications. ETL can be resource-intensive and costly, particularly in scenarios where database calls are charged back to the user and batch windows are tight. When configured to use a static data source, tcvision places no workload whatsoever on ADABAS itself. Moreover, it is possible to simply transmit the unloaded source data in its raw format off the mainframe and have tcvision process it on a platform with available low-TCO capacity (Windows, Linux, UNIX), incurring no mainframe workload at all.

Due to pure logistics, most migration projects require support for phased project implementation; only the smallest projects can risk a "big bang" approach. tcvision users benefit from Replication, tcvision's term for CDC, in that it enables implementation of the new RDBMS for certain tables, to be used in a read-only manner (generally speaking) in the new system, while maintenance of the source ADABAS data (the system of record) continues in the legacy system for a period of time. tcvision's Log Processing mode enables changes made to ADABAS (reflected in the ADABAS protection log, or PLOG) to be periodically extracted and transformed into appropriate SQL statements, according to the established mapping rules (the same ones used for ETL), to update the target RDBMS tables. This can even be done in real time using tcvision's DBMS Extension mode, where changes are captured live from the operational ADABAS database. And for very complex project implementations, tcvision can be configured to capture changes from both ADABAS and the RDBMS for batch or real-time bidirectional Replication.

In projects where phased implementation may or may not be contemplated, but where the data volumes to be migrated are large, tcvision Replication enables the data migration to be staged over a period of time, as long a period as is needed to accomplish the migration within the available processing capacity or other constraints. ADABAS files can be sequenced and grouped as desired, and each file or group of files separately Bulk Loaded to the RDBMS. Thereafter, Replication can be activated for each Bulk Loaded file. This process can continue until all files have been migrated through Bulk Loading and then maintained in sync via Replication, so that all migrated data arrives at the goal line at the same time. In fact, the go-live can be scheduled whenever convenient by simply continuing Replication until the appropriate time. This capability offers enormous planning and scheduling flexibility for the migration team.
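To make the log-processing idea concrete, the Python sketch below turns a stream of captured change records into SQL statements for the target table, reusing the same column mapping as the bulk load. The change-record layout, table and column names are invented; they bear no relation to the actual PLOG format or to tcvision's generated SQL.

# Hypothetical change records as they might look after capture from a database log;
# the real ADABAS PLOG is a binary structure and is not represented here.
changes = [
    {"op": "insert", "key": 43, "values": {"CUSTOMER_NAME": "NEW CO"}},
    {"op": "update", "key": 42, "values": {"CUSTOMER_NAME": "ACME CORPORATION"}},
    {"op": "delete", "key": 17, "values": {}},
]

def to_sql(change: dict, table: str = "CUSTOMER", key_col: str = "CUSTOMER_ID") -> str:
    """Translate one captured change into a parameterized SQL statement for the target."""
    op, vals = change["op"], change["values"]
    if op == "insert":
        cols = ", ".join([key_col] + list(vals))
        params = ", ".join(["%s"] * (1 + len(vals)))
        return f"INSERT INTO {table} ({cols}) VALUES ({params})"
    if op == "update":
        assignments = ", ".join(f"{c} = %s" for c in vals)
        return f"UPDATE {table} SET {assignments} WHERE {key_col} = %s"
    return f"DELETE FROM {table} WHERE {key_col} = %s"

for ch in changes:
    print(to_sql(ch))

Applying such statements periodically (or continuously) keeps already-migrated tables in sync with the legacy system of record, which is what allows bulk loading to be staged file by file while the go-live date floats until it is convenient.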
It is impractical to discuss all the features and capabilities of tcvision within an overview discussion, so suffice it to say that we have only touched on the most critical ones here. Given the maturity, wealth of functionality and relatively low cost of tools like tcvision, as compared to the effort, complexity and risk entailed in a Do-It-Yourself solution, there is no reason why a legacy renewal project should run aground on data migration.

About the Author

J. Wayne Lashley is Chief Business Development Officer for Treehouse Software, based in Sewickley, PA. Wayne oversees Treehouse product development; sales and marketing; technical support, documentation and quality assurance; partner relations; pre- and post-sales service delivery; and corporate technology.

Prior to joining Treehouse in 2000, Wayne held IT technical and management positions in the software development, academic, financial services, government and primary industry sectors, through a career that began in the late 1970s. His positions involved responsibility for programming, analysis, testing, standards and quality assurance, documentation, configuration management, emerging technologies, business intelligence systems, database administration, development support, software procurement, product management and service delivery.

About Treehouse Software

Since 1982, Treehouse Software has been serving enterprises worldwide with industry-leading software products and outstanding technical support. Today, Treehouse is a global leader in providing data migration, replication and integration solutions for the most complex and demanding heterogeneous environments, as well as feature-rich, accelerated-ROI offerings for information delivery, business intelligence and analytics, and application modernization. Treehouse Software customers are able to:

REPLICATE Data Anywhere
INTEGRATE Data Everywhere
MODERNIZE Data from Legacy Applications
ANALYZE Data for Business Advantage

With unmatched comprehensiveness of tools and depth of experience, Treehouse Software applies proven approaches to help customers and partners mitigate risk and profit sooner from modernization benefits.

For more information, contact us:
Email: sales@treehouse.com
Phone: 1.724.759.7070
2605 Nicholson Road, Suite 1230, Sewickley, PA 15143