White Paper: A Model Driven Approach to Data Migration



This paper describes the approach used by Snowflake Software to migrate data from legacy databases to a new database infrastructure. The process has been developed by Snowflake and has been applied in a number of projects with our customers.

Why move the data?

There are a number of reasons why you may want to move data from an existing database to a new one. The simplest is that you have decided to change your database management system (DBMS). For example, you may have decided that an open source DBMS such as PostgreSQL with PostGIS represents better value for money in the long run than your existing corporate database.

More likely, you have been prompted to look at your choice of DBMS because your requirements for accessing the data have changed. Many data providers are finding that where they used to deliver data to their customers as files on disk or via FTP, they are now expected to provide instant, dynamic access to data through web services. The EU INSPIRE Directive, for example, requires many agencies to provide online access to data that was previously only accessed in-house or distributed to a small community of users. Factors which arise from these changes include:

- New external demands on the database. The database is suddenly expected to handle large numbers of queries coming from the outside world instead of a steady stream of transactions from in-house staff, and needs to be re-optimised for these new query patterns.
- Licensing constraints. The licensing structures of commercial DBMSs may mean a sudden hike in licence costs because of the potentially large number of users now hitting the system via the web.
- Cloud computing. Cloud platforms can be an appealing way to address issues like high availability and scaling to meet peak demand, but commercial DBMS licensing can be an obstacle to cloud deployment.
- Free and open source DBMSs. These can remove the licence obstacles, but require you to switch to a different DBMS for the new systems.

The creation of a publication database is a common solution to these problems. This creates a copy of the data from the legacy system in a new database. The legacy system continues to run, and the existing applications continue to connect to and update the legacy database. A new database is created, with the new applications and web services connected to it. Data is migrated from the legacy database to the new one and refreshed periodically to keep the publication database up to date.

Whatever the reason for migrating the database, the process described here can be applied and many of the benefits gained.

The data migration process

The data migration process can be summed up as: Model, Extract, Transform, Validate, Load, Update. (No, it doesn't spell out a clever acronym.)

Create a model of the data to be migrated

This step creates an abstract model, documented in UML, which represents those parts of the data that need to be migrated to the new database. The model is abstract in the sense that it captures the concepts behind the data, free of implementation-specific detail. Whilst CASE tools can be used to reverse engineer models from databases, we have found that this step is best done largely by hand, since a key aim is to re-think the data and create a model which suits the new system. This makes it the most complex part of the process: it is where all the hard thinking takes place. Once the model has been created, the rest of the process is fairly mechanical, if not entirely automated.

Our three-point plan for healthy data is: Re-normalise, Rationalise, Re-purpose.

Re-normalise

Database schemas typically contain a number of design compromises (denormalisations) which arise from technical constraints and performance optimisation. These constraints may not apply in the new system; indeed, if the software was designed a long time ago, hardware improvements since the original design may mean that the optimisations made at the time are no longer necessary. New applications connecting to the new database will also apply different query patterns. This is especially true of web services, which typically query data very differently from desktop applications. Optimisations made for the legacy applications may therefore not be appropriate for the new applications, and may even slow them down. Simply migrating the existing database schema can mean carrying these denormalisations across into the new system, where they may no longer be useful and can make the database more confusing to work with. The modelling process is a good opportunity to revisit the model and remove any obsolete features.
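To make the re-normalisation idea concrete, here is a minimal, hypothetical sketch of splitting a denormalised legacy table during migration, using SQLite purely for illustration. The parcel and owner tables and all column names are invented; they do not come from any customer schema.

    import sqlite3

    # Hypothetical legacy table: owner details are repeated on every parcel
    # row - a classic denormalisation made for read performance.
    legacy = sqlite3.connect(":memory:")
    legacy.executescript("""
        CREATE TABLE parcel (
            parcel_id  INTEGER PRIMARY KEY,
            geometry   TEXT,
            owner_name TEXT,
            owner_addr TEXT
        );
        INSERT INTO parcel VALUES (1, 'POLYGON(...)', 'A. Smith', '1 High St');
        INSERT INTO parcel VALUES (2, 'POLYGON(...)', 'A. Smith', '1 High St');
    """)

    # Re-normalised target model: owners become a table in their own right
    # and parcels reference them by key.
    target = sqlite3.connect(":memory:")
    target.executescript("""
        CREATE TABLE owner (
            owner_id INTEGER PRIMARY KEY,
            name     TEXT,
            address  TEXT,
            UNIQUE (name, address)
        );
        CREATE TABLE parcel (
            parcel_id INTEGER PRIMARY KEY,
            geometry  TEXT,
            owner_id  INTEGER REFERENCES owner (owner_id)
        );
    """)

    # Migrate: one row per distinct owner, then parcels pointing at them.
    for name, addr in legacy.execute(
            "SELECT DISTINCT owner_name, owner_addr FROM parcel"):
        target.execute("INSERT INTO owner (name, address) VALUES (?, ?)",
                       (name, addr))

    for pid, geom, name, addr in legacy.execute(
            "SELECT parcel_id, geometry, owner_name, owner_addr FROM parcel"):
        owner_id = target.execute(
            "SELECT owner_id FROM owner WHERE name = ? AND address = ?",
            (name, addr)).fetchone()[0]
        target.execute("INSERT INTO parcel VALUES (?, ?, ?)",
                       (pid, geom, owner_id))
    target.commit()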

Rationalise

Databases which have been under maintenance for a long period will have accumulated a number of changes. Because they are often changed by different people at different times, it is likely that different design principles have been applied to different parts of the model. At Snowflake we have seen databases on customer sites where models have been merged from a number of different applications developed over many years. In these cases, different parts of the model can be implemented in completely different styles. For example, in one such system we found Boolean values recorded in some tables as 0 or 1, while others used true and false, and some used y and n. At a more superficial level, it is very common to find that no consistent naming convention has been applied throughout the life of the system. Remodelling the data provides an opportunity to apply consistent policies across the model, making it easier to work with and reducing future maintenance costs.

Re-purpose

With new applications it is common to find that the concepts underlying the applications differ from the ones used to build the legacy systems. This is particularly true when creating publication databases, since the read-only use of the data by end users requires a different, and often simpler, view of the data than the editing view used by the applications that maintain it. For example, editing applications often make use of links between features to ensure that edits made to one feature are consistent with other features in the dataset. Users who are only reading the data want the data to be consistent, but because they are not editing it they do not need those links to maintain consistency. They can therefore work with a simplified view of the data without the links.

The model we create at this stage should take account of what the applications on the new system need, but we do not need to assume that all the content from the original model is required. Other, more radical model changes may also be appropriate. For example, a system used to maintain land polygons and link them to legal titles may treat the polygons as the main feature of the model, with the title being an attribute of the polygons. When that data is published to end users, the title may in fact be the main feature, with the polygons being an attribute of the title. Our modelling exercise can take account of these changes of emphasis.

Create XML and database schemas from the UML model

Now that we have an abstract model of the data, we can use it to drive the migration process. The first step is to create implementations of the model in the form of XML schemas and database schemas. The XML schema is generated using modelling tools. Many CASE tools are capable of creating XML schemas, but for spatial data the ShapeChange or FullMoon tools (which are free and open source) provide a standards-compliant method for converting UML into Geography Markup Language (GML) application schemas. When creating the database schema, Snowflake's approach is to use our GO Loader product to generate the database schema from the XML schema. This has the advantage that we get not only the database schema but also a mapping from the XML schema to the database schema, which we will need when we come to the data migration itself.
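Two short sketches illustrate the ideas above. First, the Rationalise step: the function below maps the kind of inconsistent Boolean encodings described earlier (0/1, true/false, y/n) onto a single canonical form. The value sets chosen here are illustrative, not taken from any particular customer system.

    # A minimal sketch of rationalising mixed Boolean encodings during
    # translation. The canonical form (Python bool) is an arbitrary choice.
    TRUE_VALUES = {"1", "true", "t", "y", "yes"}
    FALSE_VALUES = {"0", "false", "f", "n", "no"}

    def normalise_boolean(raw):
        """Map any of the legacy encodings onto one canonical Boolean."""
        if raw is None:
            return None                      # preserve missing values
        value = str(raw).strip().lower()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
        raise ValueError(f"Unrecognised Boolean encoding: {raw!r}")

    # Applied row by row during the export/translate step:
    assert normalise_boolean("Y") is True
    assert normalise_boolean(0) is False
    assert normalise_boolean("false") is False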

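Second, a much-simplified sketch of the schema derivation just described: generating a table definition from an XML schema while retaining the element-to-column mapping for use at load time. This is not how GO Loader works internally; the LandTitle schema fragment, the type mapping and the table name are all hypothetical.

    # Illustrative only: derive a table definition from a toy XML Schema
    # fragment and keep the element-to-column mapping for the load step.
    import xml.etree.ElementTree as ET

    XS = "{http://www.w3.org/2001/XMLSchema}"

    # A hypothetical schema for a LandTitle feature, as might be generated
    # from the UML model.
    XSD = """
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="LandTitle">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="titleNumber" type="xs:string"/>
            <xs:element name="registeredOn" type="xs:date"/>
            <xs:element name="areaHectares" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
    """

    TYPE_MAP = {"xs:string": "VARCHAR(255)", "xs:date": "DATE", "xs:decimal": "NUMERIC"}

    root = ET.fromstring(XSD)
    feature = root.find(f"{XS}element")
    table = feature.get("name").lower()

    columns, mapping = [], {}
    for el in feature.iter(f"{XS}element"):
        if el is feature:
            continue                       # skip the feature element itself
        name, xsd_type = el.get("name"), el.get("type")
        columns.append(f"{name} {TYPE_MAP[xsd_type]}")
        mapping[f"/{feature.get('name')}/{name}"] = name   # XML path -> column

    ddl = f"CREATE TABLE {table} ({', '.join(columns)});"
    print(ddl)      # CREATE TABLE landtitle (titleNumber VARCHAR(255), ...)
    print(mapping)  # retained so the loader knows where each element goes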
Export and translate the data

The next step is to map the legacy database model to the XML schema. At Snowflake we do this using our GO Publisher product, which allows us to configure the translation through a graphical user interface. We can then export the data from the legacy database into a set of XML files. Since this export translates the data from the legacy model to the XML schema, which is derived from the UML model, it effectively upgrades the data to the new model.

Validate the export

Now that the data has been exported to XML, it is ready for loading into the new database. However, since the XML standard includes the ability to validate documents against a schema, we can take advantage of this before we load the data. Validating at this stage verifies that the translation from the legacy model to the refreshed data model has produced complete and well-structured elements in the new model. Crucially, this allows us to detect any problems with the data or the translation before we start to interact with the new database.

Load the new database

The final stage of the data migration is to load the XML export of the legacy data into the new database. This is done using the mapping created when we derived the new database schema from the XML schema. Applications can now be connected to the new database.
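The validation step can be performed with any schema-aware XML validator. The sketch below uses lxml, one common choice; the file names new_model.xsd and export.xml are illustrative.

    # A minimal sketch of validating the export against the generated schema
    # before anything touches the new database.
    from lxml import etree

    schema = etree.XMLSchema(etree.parse("new_model.xsd"))
    document = etree.parse("export.xml")

    if schema.validate(document):
        print("Export conforms to the new model - safe to load.")
    else:
        # Each violation is reported with a line number, so problems in the
        # data or the translation can be fixed before loading.
        for error in schema.error_log:
            print(f"line {error.line}: {error.message}")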

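The load step itself is handled for us by GO Loader using the mapping described above; purely to illustrate the principle, here is a simplified sketch that streams a (hypothetical, non-namespaced) export file into a table using an element-to-column mapping like the one in the earlier schema example.

    # Illustration only: load the validated export using the retained
    # element-to-column mapping. File, table and element names are
    # hypothetical; namespace handling is omitted for brevity.
    import sqlite3
    import xml.etree.ElementTree as ET

    MAPPING = {                      # XML element -> table column
        "titleNumber": "titleNumber",
        "registeredOn": "registeredOn",
        "areaHectares": "areaHectares",
    }

    db = sqlite3.connect("publication.db")
    db.execute("CREATE TABLE IF NOT EXISTS landtitle "
               "(titleNumber VARCHAR(255), registeredOn DATE, areaHectares NUMERIC)")

    columns = list(MAPPING.values())
    placeholders = ", ".join("?" for _ in columns)
    insert = f"INSERT INTO landtitle ({', '.join(columns)}) VALUES ({placeholders})"

    # Stream the potentially large export rather than reading it all at once.
    for _, feature in ET.iterparse("export.xml", events=("end",)):
        if feature.tag.endswith("LandTitle"):
            row = [feature.findtext(element) for element in MAPPING]
            db.execute(insert, row)
            feature.clear()          # free memory as we go

    db.commit()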
Apply periodic updates

An additional stage can be added to the process to repeat the transfer of data from the legacy database to the new database periodically. Because the tools used can generate and apply change-only updates, we can move only the changed data from the legacy database to the new one, preserving the lifecycle of entities in the new database and reducing the volume of data traffic. The process can be automated, so updates can easily be scheduled to occur at regular intervals, e.g. weekly or nightly.

Benefits

This approach to data migration offers a number of benefits:

Complete decoupling of DBMSs

Because the data moves via a set of files, the two databases are completely decoupled. This means:

- There is no need to connect to two databases simultaneously or to maintain transactions across them.
- The query and update loads on the two databases can be managed independently.
- The process is very tolerant of low-bandwidth or intermittent network connections. By putting the files on physical media you can even migrate the data across an air gap.

Validation of data in flight

The use of XML means that you can apply an off-the-shelf validation process to the data as it is transferred between the databases. This provides a valuable quality check on the data as it is moved. Since the validation runs after the model transformation, it also provides a double check that the transformed data is consistent with the model.

Re-purpose/refresh data during migration

The need to work on your database is also a chance to refresh your data model. Because the process starts with a modelling exercise, you should end up with a clean, well thought out model which can underpin application development for many years to come.

Repeatable process

The process is highly automated and repeatable. During development you can easily revisit the UML model and create new versions of your schemas and databases. This means you have plenty of opportunity to try different options and make sure you get your new database right. Automating the extract, transform, validate and load steps also makes it easy to schedule periodic updates.

Flexibility and productivity

With the toolset we use there is no coding involved in the whole process. This means it is quick to set up, and equally quick and easy to modify the process and models if you find they are not quite as you need them.

Documentation generated

A nice side effect of this process is that, being model driven, the new data model is fully documented.
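The change-only update mechanism described under "Apply periodic updates" needs no coding with these tools; purely to illustrate the principle, the sketch below derives change-only updates by fingerprinting each feature between runs. The feature IDs, the fingerprinting scheme and the toy data are all hypothetical.

    # Illustration of change-only updates: keep a fingerprint of every
    # feature from the previous run and transfer only the differences.
    import hashlib
    import json

    def fingerprint(feature: dict) -> str:
        """Stable hash of a feature's content."""
        return hashlib.sha256(
            json.dumps(feature, sort_keys=True).encode("utf-8")
        ).hexdigest()

    def change_only_update(previous: dict, current_features: list):
        """previous maps feature id -> fingerprint from the last run."""
        current = {f["id"]: fingerprint(f) for f in current_features}
        inserts = [f for f in current_features if f["id"] not in previous]
        updates = [f for f in current_features
                   if f["id"] in previous and previous[f["id"]] != current[f["id"]]]
        deletes = [fid for fid in previous if fid not in current]
        return inserts, updates, deletes, current   # `current` seeds the next run

    # Example run with toy data:
    last_run = {"T1": fingerprint({"id": "T1", "area": 2.5})}
    latest = [{"id": "T1", "area": 3.0}, {"id": "T2", "area": 1.1}]
    ins, upd, dels, state = change_only_update(last_run, latest)
    print(len(ins), len(upd), len(dels))   # 1 1 0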

Drawbacks

We have found many benefits to this approach but only one real drawback: you need to do a lot of thinking! The process goes back to first principles with your data model, which means thinking hard about how your data should look in the future rather than how it looks now. This is a good way of making sure your new database will support your applications for many years to come, but it requires an investment of both time and effort to get the most out of the process.

Conclusion

Starting your database migration by re-modelling your data is a good way to create a database that is an asset with a long-term future. It gives you the opportunity to re-purpose your data for new applications such as publication databases and web services. The XML-based data migration process of extract, transform, validate and load provides a robust method for populating your new database with content from your legacy system. It is a repeatable and automated process that can be set up quickly with off-the-shelf tools and no programming, which leaves you free to think about the real problem of how your data needs to look, not how to move it from A to B.