Incremental Data Migration in Multi-database Systems Using ETL Algorithm



Similar documents
Data Migration System in Heterogeneous Database Shinde Anita Vitthal 1, Thite Vaishali Baban 2, Roshni Warade 3, Krupali Chaudhari 4

Database Migration over Network

Data Migration In Heterogeneous Databases (ETL)

Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.

USING MYWEBSQL FIGURE 1: FIRST AUTHENTICATION LAYER (ENTER YOUR REGULAR SIMMONS USERNAME AND PASSWORD)

David Dye. Extract, Transform, Load

Entity store. Microsoft Dynamics AX 2012 R3

Implementing and Maintaining Microsoft SQL Server 2008 Integration Services

Automated Data Validation Testing Tool for Data Migration Quality Assurance

Karl Lum Partner, LabKey Software Evolution of Connectivity in LabKey Server

INTRODUCTION: SQL SERVER ACCESS / LOGIN ACCOUNT INFO:

Reduce and manage operating costs and improve efficiency. Support better business decisions based on availability of real-time information

MIGRATING TO AVALANCHE 5.0 WITH MS SQL SERVER

Basics Of Replication: SQL Server 2000

Table of Contents SQL Server Option

Chapter Replication in SQL Server

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Implementing a Data Warehouse with Microsoft SQL Server

1 JiJi AD Bulk Manager User Manual. JiJi AD Bulk Manager - User Manual

Real-time Data Replication

Employer Quick User Guideline

Optimization of ETL Work Flow in Data Warehouse

MSSQL quick start guide

Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

Implementing a Data Warehouse with Microsoft SQL Server 2012

Zero Downtime Deployments with Database Migrations. Bob Feldbauer

Migration Manager v6. User Guide. Version

news from Tom Bacon about Monday's lecture

Connecting to Manage Your MS SQL Database

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Implementing a Data Warehouse with Microsoft SQL Server 2012

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

East Asia Network Sdn Bhd

Chapter 5. Learning Objectives. DW Development and ETL

Implementing a Data Warehouse with Microsoft SQL Server 2012

Implementing a Data Warehouse with Microsoft SQL Server

MicroStrategy Course Catalog

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Oracle Database 10g Express

Extraction Transformation Loading ETL Get data out of sources and load into the DW

How to Copy A SQL Database SQL Server Express (Making a History Company)

ETL tools for Data Warehousing: An empirical study of Open Source Talend Studio versus Microsoft SSIS

AWS Schema Conversion Tool. User Guide Version 1.0

A Critical Review of Data Warehouse

Moving the Web Security Log Database

Tracking Database Schema Changes. Guidelines on database versioning using Devart tools

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

MySQL for Beginners Ed 3

1 File Processing Systems

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Data Integration and ETL Process

AUTHENTICATION... 2 Step 1:Set up your LDAP server... 2 Step 2: Set up your username... 4 WRITEBACK REPORT... 8 Step 1: Table structures...

G Data TechPaper #0257

Implementing a Data Warehouse with Microsoft SQL Server

Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server

Implementing a SQL Data Warehouse 2016

Microsoft SQL Server for Oracle DBAs Course 40045; 4 Days, Instructor-led

INTRODUCING ORACLE APPLICATION EXPRESS. Keywords: database, Oracle, web application, forms, reports

How to Use PIPS Access to/from SQL Database Utility Program. By PIPSUS Support Team Dr. Chouikha

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Backup/Restore Microsoft SQL Server 7.0 / 2000 / 2005 / 2008

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Jet Data Manager 2012 User Guide

Data Warehouse. Business Objects

Querying the Data Warehouse Using Microsoft Access

Implementing a Data Warehouse with Microsoft SQL Server 2012

ETL as a Necessity for Business Architectures

Dynamic Data in terms of Data Mining Streams

MS-40074: Microsoft SQL Server 2014 for Oracle DBAs

AVALANCHE MC 5.3 AND DATABASE MANAGEMENT SYSTEMS

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

LearnFromGuru Polish your knowledge

CONVERTING FROM NETKEEPER ISAM v6.32 to NETKEEPER SQL

1) After login WinTonenet Securities Trading System, a window named WinTonenet Securities Trading will be shown.

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Course Outline. Module 1: Introduction to Data Warehousing

A Framework for Data Migration between Various Types of Relational Database Management Systems

Unit 10: Microsoft Access Queries

29200 Northwestern Hwy Suite 350 Southfield, MI WINSPC winspc.com

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

DigiVault Online Backup Manager. Microsoft SQL Server Backup/Restore Guide

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server

Integrity 10. Curriculum Guide

Solving Business Pains with SQL Server Integration Services. SQL Server 2005 / 2008

How To Load Data Into An Org Database Cloud Service - Multitenant Edition

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Connecting to your Database!... 3

International Journal of Advanced Research in Computer Science and Software Engineering

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)

CRM Migration Manager for Microsoft Dynamics CRM. User Guide

INTEGRATING MICROSOFT DYNAMICS CRM WITH SIMEGO DS3

Migrate Topaz databases from One Server to Another

SQL Server Training Course Content

Transcription:

Incremental Data Migration in Multi-database Systems Using ETL Algorithm Rajesh Yadav, Prabhakar Patil, Uday Nevase, Kalpak Trivedi, Prof. Bharati Patil Department of Computer Engineering Imperial College of Engineering and Research, Pune Abstract Nowadays Information is becoming an increasingly valuable corporate asset, the demand for a right tool to store, manage, and move this information in a cost efficient and reliable manner do arises. As part of an Information Lifecycle Management (ILM) best-practices strategy, the organizations require innovative solutions for migrating data among heterogeneous database systems and environments. To support this need, we as engineering students planned to design a powerful tool that enables highperformance and affordable data migration in a wide range of storage environments. This paper covers the unique challenges of data migration in dynamic IT environments and the key business advantages that we have designed to provide over traditional tools used for migration. Index Terms s, networks, middleware, multi-database and data migration I. INTRODUCTION Nowadays systems have become an essential part of any computer software, almost every computer system be it personal or corporate do have a database system. Low introduction and running cost being the main reason for this popularity. Having a look at the current trend when an organization starts, it usually starts as a Small- Scale industry and hence Microsoft Access is quite capable of handling its records and databases. But perceptibly as the company expands its database has to expand too. Thus they have to switch to more efficient databases. Hence the demand for a right tool to manage this data in a cost efficient and reliable manner do arises. Due to advancement in the technology newer version of secure database have been developed to transfer the data to the newer version or to an upgraded databases. In pursuance of meeting this demand Data Migration tools are developed, which are aimed at conversion of one database to another database easily and efficiently. Data Migration tools are usually developed for individuals and organizations to save time for migrating into a new database and helps to free up the human resources from tedious and complicated task. Instead of creating all the tables etc. of the already existing database, one can simply use the tool to convert it into new database, if required. Data migration can be used for server storage, equipment replacement or upgrades, website consolidation, server maintenance and for the data center relocation. Data migration tool can also be used by organizations for dealing with the migration issues regarding the complex data exports and imports. After all, exporting, importing or migrating data between different sources is very time consuming and complicated task, especially if these data sources store data in different formats.[1] This is where a database migration tool comes handy. Basically during the migration process, data is extracted from the old system and loaded to the new system. Currently various tools are available for data migration that transform a particular database system to another. Hence this tool has been developed keeping in mind conversion and migration of database to any other available database easily and efficiently without need of any technical knowledge or manual effort. Here an incremental approach has been considered for migration of data from a single database system to multi-database. II. BRIEF HISTORY Data migration techniques have always been a topic of research. Many reorganization techniques have already been proposed [2] [3] and the research on many is on the way. Some of these typical techniques proposed use the snapshot function to target data for the migration process. By using the snapshots, other queries can be processed while processing the data migration. But these techniques are not always adaptive hence other technique for data migration is necessary. One way to solve the data migration issue is by using the classical data migration technique that uses the locking, insertion and deletion function for processing data migration. In this technique, queries on the target data cannot be processed until the data migration is completed. Here, the turn-around time of other queries using this technique is more degraded than the snapshot technique wherein the turn-around time for data migration using this technique is shorter than the snapshot. Hence we consider only this classical data migration technique [4]. III. PROPOSED SYSTEM The system proposed is designed on the basis of the classical data migration scheme where locking, insertion and deletion function are used for data migration. In this scheme, target data in the source database is locked to prevent the execution of any queries on the target data until the data migration is completed. Some part of the destination database is also locked if necessary, there after transfer of data takes place where the target data is inserted to the destination database after the process of cleaning and transformation. Finally after successful migration the target data is unlocked. During the whole process of migration the target data is unaffected what so ever be the condition[5][6]. www.ijcsit.com 2121

www.ijcsit.com 2122

1. lock the target data in the source database system 2. get some locks in the destination database system if necessary 3. transfer the target data 4. insert the target data to the destination database system 5. unlock (according to 1 and 2) Figure.4 Proposed scheme The migration tool developed based upon the above scheme helps migrating database structures and data across various relational databases. The system will take an existing database in one format from the user and will convert it to a database in another format, which again will be specified by the user. The system will felicitate migrating data among MS-access, Sql server and Oracle. Access My SQL Oracle SQL server Figure.5 Relation databases IV. PROPOSED ARCHITECTURE My SQL SQL Server Oracle The system architecture designed for the data migration allows the validated user to select the source and destination database from the given options. The migration of database is carried out by migrating the tables first and then going for the actual migration. Before starting the migration process the connections for the source and destination are established and the detailed summary of the selected items is displayed to the user and he is then prompted to select the desired constraints. And finally the actual migration process is initiated Figure.6 explains the system architecture of the proposed system. It explains the overall working of the application. Step 1 - The user is provided with the login facility. The user will be prompted for a username and password, for the validating the user. Step 2 - The user is then asked to select the source database which is to be migrated. The source database consists of MS Access, My SQL, Oracle and SQL Server. After selecting the source database the user has to fill the details of the selected source database to create the connection. Step 3 - After the connection is tested the user is asked to select the destination database for migrating. After selecting the destination database the user has to fill the details of the selected destination database to create the connection. Step 4 - The user is prompted to select the tables, columns of the table, views which are to be migrated. Step 5 - The user is provided with the facility to select either the schema of the table or rows of the table along with the constraints which are applied to the tables. Step 6 - Click start migration button to start the migration. After a successful migration of source database to destination database it displays the status report of the complete migration process. This report shows whether the migration is completed or still pending. It also shows the status of the various constraints applied to the tables. Whether rows are inserted into the tables or not or the schema is migrated or not is also shown. V. ETL ALGORITHM To achieve an effective data migration process, data on the old system is mapped to the new system providing a design for data extraction and data loading. This design relates old data formats to the new system's requirements and formats. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system. The algorithmic strategy used for the purpose is that of ETL where data from a relational database is read and after applying some transformation or business rules it is loaded to another database. Here it is assumed that the source data structures are generally not similar to target data structures [7]. The Plan Basically extraction, transformation, and load require three main steps: (i)read the source data (ii)apply business and transformation rules (iii)load the data Figure.6 System Architecture Figure.7 ETL Architecture www.ijcsit.com 2123

After the process reads the data, it must transform the data by applying business and transformation rules to it. Applying business and transformation rules to target data means transforming codes, generating keys, converting data types, parsing, merging, and many other operations. Once the target data is in an appropriate business and technical format, the ETL process can load it into target tables. Extract The first part of an ETL process involves extracting the data from the source database. In most of the cases this is the most challenging aspect of ETL, since correct extraction of data sets the stage for how subsequent processes go further. In general, the goal of the extraction phase is to convert the data into a single format appropriate for transformation processing. The extract step should be designed in a way that it does not negatively affect the source system in terms of response time, performance or any kind of locking [8][9]. Transform The transform stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the destination. Few data sources may require very little or even no manipulation of data. In some cases, one or more of the following transformations may be required to meet the business and technical needs of the target database: (i)selecting only certain columns to load (or selecting null columns not to load). For example, if the source data has four columns (also called attributes), for example roll_no, name, age, and salary, then the extraction may take only salary and roll_no. Similarly, the extraction mechanism may ignore all those records where salary is not present (salary = null) or just select some particular names. (ii)translating coded values (e.g., if the source database stores 1 for male and 2 for female, but the destination database stores M for male and F for female) (iii)encoding free-form values (e.g., mapping "Male" to "M") (iv)deriving a new calculated value (e.g., sale_amount = qty * unit_price) (v)applying any form of data validation. In case the validation fails, it may result in a partial, full or no rejection of the data, and hence none, some or all the data is handed over for the next step, which basically depends on the rule design and exception handling.[10] Loading The load phase loads the data into the target database. Here if the user as chosen to extract and load the full database then the whole database would be extracted and after applying transformation it would be loaded to the destination database else the user and specify the tables, views and stored procedures that need to be migrated. VI. APPLICABILITY A. Individuals The database migration tool also becomes useful for individuals. Imagine a scenario where an individual has data divided into two different databases and he wants to combine the two, he can convert one part into the other s format and use it. Thus database migratory comes handy here. The below figure explains how database migratory becomes handy for individuals. If the data is divided into two different formats then database in format 1 is used in the same format and the database in format 2 is given to the migrator to convert the database in format 2 to the database in format 1. This converted database is then combined with the previous database. Format 2 Migrator Figure.8 Individuals B. Organizations According to the above mentioned scenario i.e. when a company starts it starts as a small-scale industry and hence MS Access database is quite capable of handling its records. But gradually with time as the company expands its database has to expand too thus the scenario of switching to a more efficient database do arises. Hence the proposed tool becomes very useful for the organizations when the size of their data expands and the necessity to migrate to another more efficient database arises. VII. CONCLUSION As the technology changes, working with the same old technology would surely bear some losses which any organization wouldn t prefer. Hence the scenario of upgrading from the existing system to the new one do arises. And to do so the user needs to understand the new technology in order to work with it. Though importing, exporting or migrating data between different databases is very complicated and time consuming process, especially if these data sources store data in different formats, that s where this tool comes into existence providing a interactive GUI to work with.. www.ijcsit.com 2124

REFERENCES [1] Lixian Xing, Yanhong Li, Design and application of data migration system in heterogeneous database in 2010 International Forum on Information Technology and applications. [2] Salzberg, B., and Dimock, A., Principles of Transaction-Based Online Reorganization, Proc. of 18th International Conf. on Very Large Data Bases, pp. 511-520, 1992.. [3] Achyutuni, K., Omiecinski, E., and Navathe, S., Two techniques foron-line index modification in shared nothing parallel database, Proc.of the 1996 ACM SIGMOD International Conf. on Management ofdata, pp. 124-136, 1996. [4] Microsoft CRM Data Migration Framework White Paper by Parul Manek, Program Manager Published: April 2003. [5] DATA MIGRATION BEST PRACTICES etapp Global Services January 2006. [6] Jaiwei Han, Michelinne Kamber, Data Minning: Concepts and Techniques. [7] en.wikipedia.org/wiki/extract_transform_load [8] Panos Vassiliadis, Alkis Simitis Conceptual Modelling for ETL process [9] DATA Quality in health Care Data warehouse Environments Robert L.Leitheiser, University of Wisconsin Whitepaper [10] www.mysql.com www.ijcsit.com 2125