Metalogic Systems Pvt Ltd J 1/1, Block EP & GP, Sector V, Salt Lake Electronic Complex, Calcutta 700091 Phones: +91 33 2357-8991 to 8994 Fax: +91 33 2357-8989
Metalogic Systems: Data Migration Services

1. Introduction

Data Migration is an extremely important but often neglected or wrongly estimated phase in any software implementation or migration activity. Moving data from the source platform to the target platform sounds easy, but in practice it can become an extremely daunting task. The objective of this document is to give a general idea of the activities involved in migrating or transforming data, and a technical overview of the processes within the data migration service provided by MLS.

2. Scope of Data Migration

One may need to transform or migrate data from one application to another for various reasons, including:

- Technical obsolescence of the old operating environment (hardware, o/s, application s/w) and a compulsion to migrate the application along with its data to newer platforms
- Implementation of new custom-built applications or packages requiring enterprise data from the legacy systems
- A requirement to provide web-enabled services that must have access to enterprise data online, or updated up to a pre-determined interval
- The need for a reliable and efficient electronic mechanism to make archived data available

The actual scope of data migration needs to be determined after analyzing the various requirements of the customer. Some basic questions must be answered before finalizing the scope; the list below is not exhaustive:

a) Is it a 1:1 data migration (i.e., the attributes of the entities remain mostly unchanged between the source and target platforms), or is data transformation needed (i.e., the source and target data models differ by design)? For application migration, the functionality offered by the original application generally remains the same, and it is only the source and target environments (i.e., hardware, o/s, database products) that change.
But in cases where the migrated data is going to be used by a new custom application or a package, there is a substantial change in the entities and relationships between the source and target databases. Naturally, in the latter case the effort is much greater, and one needs to be thorough about the business rules used to transform the data.

Metalogic Systems Confidential Page 2 of 10
b) Does the original application have to continue, or will it be retired once the target application goes into production? In the latter case the migration effort is likely to be one-time, whereas in the former case we may need to put in place a process by which both applications can exchange data in both directions at regular intervals. Often, due to modular replacement of legacy systems with new or migrated systems, the exchange of data between the two environments becomes a critical requirement.

c) In some cases the original application has already been retired and the new application is operational with current data, but there is still a requirement to transfer the older/archived data. The source environment itself may no longer be available, and in such cases one will need to devise special handling for the correct interpretation of this kind of data (e.g., computational / embedded-sign fields, data residing on machines with 6/9-bit architectures, etc.).

Apart from the issues outlined above, we need to consider various operational aspects of the specific application and environment, as listed below. Some of these may not apply, depending on the answers to the questions above.

d) Which data is static and which is transactional in nature, and what are the respective volumes? How frequently do the various source database entities change?

e) How much disk space is available in the source environment for downloading the data? If the download process needs to be split into several phases, how do we manage the changes to transactional data that happen during those phases?

f) Can the source application be shut down during the data download and upload phases? What is the estimated process run-time, and is it lower than the maximum time allowed for such a shutdown?
The answers to the above questions will naturally lead to a decision on whether the data needs to be downloaded and uploaded incrementally, and will also determine the periodicity of the data exchanges. An incremental download and upload process is far more complicated than a set of simpler extraction and loading programs run as a single one-time batch job.

Lastly, if the source and target applications have to run side by side and depend on each other, the actual requirements of any data exchange process between them must be evaluated carefully. We need to determine the mode of data transfer in such cases: should the data travel at a pre-determined interval by batch (e.g., nightly transfer by ftp or similar means), or do we need inter-process communication services between the two applications (through rpc or similar mechanisms)?
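An incremental download of the kind discussed above is often driven by a "watermark": the timestamp of the previous extraction run. The sketch below illustrates the idea under stated assumptions; the table, column, and file names are hypothetical, and this is not the actual MLS tooling.

```python
# Minimal sketch of an incremental ("delta") extract, assuming the source
# table carries a last-modified timestamp column. All names (customer,
# modified_at, last_run.txt) are invented for illustration.
import sqlite3
from datetime import datetime

WATERMARK_FILE = "last_run.txt"  # holds the timestamp of the previous extract

def read_watermark() -> str:
    try:
        with open(WATERMARK_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return "1970-01-01T00:00:00"  # first run: extract everything

def incremental_extract(conn: sqlite3.Connection):
    """Extract only the rows changed since the previous run."""
    since = read_watermark()
    now = datetime.utcnow().isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT id, name, modified_at FROM customer WHERE modified_at > ?",
        (since,),
    ).fetchall()
    with open(WATERMARK_FILE, "w") as f:
        f.write(now)  # advance the watermark for the next run
    return rows
```

A real implementation would also have to handle rows changed while the extract itself is running, which is one reason the incremental process is described above as far more complicated than a one-time batch job.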
3. Overview of the Data Migration services provided by MLS

MLS has acquired specialized expertise in carrying out data migration projects successfully across the globe. While executing data migration projects for our customers, we have implemented processes to streamline all the activities involved. These processes are built around a set of software tools developed in-house that generate some of the essential components, such as the data download and upload programs. The tool-based approach automates the migration to a great extent, thereby ensuring faster completion and reducing the chance of errors.

[Figure: Source Platform (H/W & S/W) -> Source Data -> Transformation -> Target Data -> Target Platform (H/W & S/W)]

The figure above is a top-level depiction of the most common type of data transformation activity. The source data are transformed and transported to the target platform after a series of operations are applied to them in various stages. The scope generally ends with uploading the data into the target platform. Simple though it sounds, all aspects of the hardware, application software, database design (e.g., table structures, relationships, other objects such as DB procedures) and the installation / deployment details of the source and target environments must be considered in order to complete a successful data migration. While designing the service at MLS, we have tried our best to cover all these aspects.
4. Process Overview

The process of transformation involves:

4.1 Study and analysis of the source and target data models, and extraction of the mapping rules for transforming the business entities of the source application into the target platform.

4.2 Mapping the existing data types of the source platform to equivalent data types in the target database.

4.3 Translation/re-coding and manipulation of the source data as per the requirements of the target application.

4.4 Scrapping and cleansing of data that is invalid in the new environment.

The following figure represents the major processes involved in a data migration activity and the sequence in which they are performed. Each square box represents a separate process, elaborated further in the subsequent sections of this document.

[Figure: Source Data Model Analysis and Target Data Model Analysis (fed by Application Programs / Database Definition Scripts) -> Attribute Mapping -> Mapper Creation -> Repository Population -> Data Validation Routine Generation (producing Error Reports) -> Data Download and Upload Generation -> Data Download and Upload Testing (yielding Tested Data Download and Upload Utilities) -> Live Data Migration, moving Source Data into the Target Database; tool processes support each generation step]
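The mapping and re-coding activities of steps 4.1-4.4 can be pictured as a rule table driving a generic transform routine. The sketch below is purely illustrative: the field names, the century-window date rule, and the code table are invented examples, not the actual MLS repository format.

```python
# Illustrative sketch of attribute-mapping rules (cf. steps 4.1-4.4).
# Field names, codes, and transforms are hypothetical examples.

def to_iso_date(yymmdd: str) -> str:
    """Re-code a legacy YYMMDD date into ISO YYYY-MM-DD (century window at 1950)."""
    yy, mm, dd = yymmdd[0:2], yymmdd[2:4], yymmdd[4:6]
    century = "19" if int(yy) >= 50 else "20"
    return f"{century}{yy}-{mm}-{dd}"

GENDER_CODES = {"M": "MALE", "F": "FEMALE"}  # valid-code translation rule

# Each rule: target column -> (source field, transform, validator)
MAPPING_RULES = {
    "birth_date": ("DOB", to_iso_date, lambda v: len(v) == 10),
    "gender":     ("SEX", GENDER_CODES.get, lambda v: v is not None),
}

def transform_record(source: dict):
    """Apply the mapping rules; return the target row plus any validation errors."""
    target, errors = {}, []
    for col, (field, fn, valid) in MAPPING_RULES.items():
        value = fn(source[field])
        if not valid(value):
            errors.append(f"invalid value for {col}: {source[field]!r}")
        target[col] = value
    return target, errors
```

Keeping the rules in a data structure rather than in code is what allows a tool to generate the transformation programs, as described in the following sections.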
Process Input and Output

Process: Source Data Model Analysis
  Input: Source file layouts; source database scripts (DDL/DDS/SDDL/DBD/PSB/SQL etc.) and application programs
  Output: Source Data Model, including COBOL layout finalization for flat and ISAM files; source field/record rules; source database entities; discrimination rules for each file/record/segment

Process: Creation of data mapping rules and validations
  Output: Data Mapping and Validation Rules (1)

Process: Repository Population
  Input: Source file layouts; Data Mapping and Validation Rules
  Output: Populated Repository (2)

Process: Data Validation Program Generation
  Input: Populated Repository; Data Mapping and Validation Rules
  Output: Generated Data Validation Programs; sample error reports on invalid data (3)

Process: Data Download & Upload Program Generation
  Input: Populated Repository; Data Mapping and Validation Rules
  Output: Generated Data Download & Upload programs

Process: Data Download Testing
  Input: 1. Test database in the source environment; 2. Generated Data Download programs
  Output: Sample source data in plain ASCII format

Process: Data Upload Testing
  Input: 1. Sample source data in plain ASCII format; 2. Sample error reports on invalid data
  Output: 1. Populated test database in the target environment; 2. Test results on the target platform

Process: Tested Data Download and Upload programs
  Input: Tested programs, compilation and installation scripts for both platforms
  Output: Data Download and Upload programs installed on the respective platforms
Process: Data Migration Dry Run
  Input: Sample data for all sources (related and complete)
  Output: Migrated data in the target environment corresponding to the provided sample; rough estimate of the actual time needed in the final run

Process: Data Migration Plan
  Output: Plan document

Process: Test Data Migration
  Input: Full data for all sources for the identified phase(s)
  Output: Migrated data in the staging environment; actual estimate of the time required for the final run; revised plan document

Process: Live Data Migration
  Input: Source data storage on the target platform
  Output: Transformed data migrated to the target platform; control report to ensure complete migration

(1) Validation rules may be defined for application to source fields/records during data transformation. Applying expressions or functions to source field(s) may generate target data elements.

(2) The repository is a complex set of data structures that stores all information related to the source data models. It can be used to produce a variety of reports about the source data models and to generate the download programs.

(3) This will be a repetitive task: inspection of the error reports coming out of this step will gradually refine the data mapping rules. Only after a couple of iterations will it be possible to extract all the rules prevalent in the source entities.
5. A brief look inside the Processes

5.1 Source Data Model Analysis

i. Identify all types of storage (network/hierarchical/relational databases, ISAM files, sequential files, etc.) and the respective data definition scripts (schema/sub-schema/DBD/PSB/SQL scripts etc.).
ii. Identify all data storage units (records/tables) requiring transformation.
iii. Identify all possible layouts for each data storage unit.
iv. Determine the layouts of individual data storage units, breaking data elements down to the lowest possible level.
v. Identify rules to validate records and/or fields in each storage unit. For example, is a field a date field, and if so, what is its format? Or, if a field contains a set of valid codes, what are those codes (e.g., M for Male, F for Female)?
vi. Identify records with multiple layouts and the rules to distinguish the different layouts. For example, if multiple record types are put together in a single file, which field identifies the record type?

5.2 Target Data Model Analysis

i. Determine the target data model.
ii. Determine the significance of each data element in the target data model with respect to the data migration requirement.
iii. Identify the data elements to be populated by migrated data.

5.3 Attribute Mapping

i. Identify rules to transform source data types into target data types.
ii. Correspond each target data element (column) to source data.
iii. Identify rules (expressions, functions, etc.) to be applied to source data element(s) in order to populate target data elements.
iv. Identify rules to validate transformed data elements.
v. Identify rules to transform source records/files into target records/tables (viz., merge, split, etc.).
vi. Identify discrepancies related to the data types, sizes, and formats of data elements.
vii. Resolve discrepancies related to the data types, sizes, and formats of data elements.
viii. Identify and resolve gaps between source and target data elements.
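The layout-discrimination rules of step 5.1 (vi) can be sketched as a small dispatch table: one field in each record identifies which layout applies. The record-type codes, field names, and column positions below are hypothetical, chosen only to illustrate the technique.

```python
# Sketch of layout discrimination (step 5.1 vi): one flat file holds several
# record types, and the first two characters identify the layout.
# All record types and field positions are invented for illustration.

LAYOUTS = {
    # record-type code -> (layout name, [(field, start, end)]) as 0-based slices
    "01": ("header",  [("branch", 2, 6), ("run_date", 6, 12)]),
    "02": ("detail",  [("account", 2, 10), ("amount", 10, 19)]),
    "99": ("trailer", [("rec_count", 2, 8)]),
}

def parse_record(line: str) -> dict:
    """Pick the layout from the record-type field, then slice out the fields."""
    rec_type = line[0:2]
    if rec_type not in LAYOUTS:
        raise ValueError(f"unknown record type {rec_type!r}")
    name, fields = LAYOUTS[rec_type]
    parsed = {"type": name}
    for field, start, end in fields:
        parsed[field] = line[start:end].strip()
    return parsed
```

In practice these discrimination rules are exactly what the analysis phase must extract from the legacy COBOL layouts before the download programs can be generated.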
5.4 Mapper Creation

Map Information files are created on the basis of the source and target data models and the rules identified above, to aid population of the repository. Map Information files store the data mapping rules and validations in pre-defined formats that the transformation tool recognizes.

5.5 Repository Population

The outputs of all preceding processes are used to populate the repository. The repository is a complex data structure that stores all source and target data definitions and the rules for transformation.

5.6 Data Validation Routine Generation

These programs validate the supplied mapping rules. All the rules may not be readily available to the customer on day one; they evolve over a period of time. The generated programs help validate the rules by sampling actual data from the source databases and generating error reports. Appropriate additions and modifications are then applied to the supplied set of rules to bring the error report contents within an acceptable limit and determine the actual rules.

5.7 Data Download / Upload Program Generation

5.7.1 Data Download

Data download programs are generated by the tool and run on the source platforms. There is one program for every file/record/table in the respective source databases, which dumps the contents of that data store into a flat file with all fields converted to plain ASCII text, removing all platform dependencies (e.g., embedded-sign fields, computational fields). The download programs may also generate a control file to preserve the existing set relationships of the current record with other records, ordering, and other information as required, so that no information is lost while pulling the data out of the existing environments.

5.7.2 Data Upload

Data upload programs generated or developed during this stage take as input the downloaded ASCII data extracted in the previous step.
The Data Mapping document, containing all mapping rules between the source and target databases, supplies the specifications for this task.
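One platform dependency the download step (5.7.1) must remove is the computational field, such as an IBM-style COMP-3 (packed decimal) value, in which each byte holds two decimal nibbles and the final nibble carries the sign. The decoder below is a minimal sketch of the standard packing convention, not the generated MLS download code.

```python
def unpack_comp3(raw: bytes, scale: int = 0) -> str:
    """Decode an IBM-style COMP-3 (packed decimal) field into plain ASCII digits.

    Each byte holds two decimal nibbles; the last nibble is the sign
    (0xD = negative, 0xC or 0xF = positive). `scale` inserts a decimal point
    that many places from the right, mirroring an implied-decimal PICTURE.
    """
    digits = []
    for byte in raw:
        digits.append((byte >> 4) & 0x0F)
        digits.append(byte & 0x0F)
    sign = "-" if digits.pop() == 0x0D else ""  # final nibble is the sign
    text = "".join(str(d) for d in digits)
    if scale:
        text = text[:-scale] + "." + text[-scale:]
    return sign + text
```

For example, the three bytes 0x12 0x34 0x5D pack the value -12345; dumping it through such a routine yields the platform-independent ASCII text the upload programs consume.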
Data Download Testing

The generated download programs are tested on a test database in the source database environment. Testing is an iterative process; the test results confirm the correctness of the download process.

Data Upload Testing

Data upload programs are run in the development environment after the target databases are set up there. The test results at this stage confirm the correctness of the entire migration process.

Tested Data Download and Upload programs

The tested data download and upload programs are then delivered and installed on the respective platforms, with appropriate scripts to compile and execute them.

Data Migration Dry Run

A test run on all source data units with sample data to ensure success in the live run. This stage also provides a rough estimate of the time required for the final data migration.

Data Migration Plan

A plan for the live data migration is produced. The plan takes all logistics and contingencies into account.

Test Data Migration

A test run of the data migration with the full set of operational data for the identified phase(s), carried out in the staging environment. This provides an estimate of the actual time required for the final data migration, helps determine the most suitable phases for the entire migration process, and produces a final data migration plan.

Live Data Migration

The approved migration plan is followed to undertake the live data migration. The correctness of the transformation is confirmed by comparing control reports generated for both the source and target data.
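A control-report comparison of the kind mentioned for the live migration typically reduces each table to a row count and a checksum, computed independently on both sides. The sketch below illustrates one possible scheme under stated assumptions; the order-independent XOR-of-hashes digest is an illustrative choice, not the actual MLS control-report format.

```python
# Sketch of a control-report check: reduce each side to (row count, digest)
# and compare. The digest design here is an illustrative assumption.
import hashlib

def control_report(rows):
    """Return (row count, order-independent digest) for a list of rows."""
    count = len(rows)
    digest = 0
    for row in rows:
        h = hashlib.sha256("|".join(str(v) for v in row).encode()).hexdigest()
        digest ^= int(h[:16], 16)  # XOR makes the result independent of row order
    return count, digest

def reconcile(source_rows, target_rows) -> bool:
    """Confirm the migrated data matches the source extract."""
    return control_report(source_rows) == control_report(target_rows)
```

Because the digest ignores row order, the comparison still succeeds when the target database returns rows in a different physical sequence than the source extract, which is the usual situation after an upload.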