Structure of the presentation

Integration of Legacy Data (SLIMS) and Laboratory Information Management System (LIMS) through Development of a Data Warehouse
Presenter: N. Chikobi

Structure of the presentation
- Background
- Preliminary work
  - Sites database
  - Legacy data (SLIMS database)
  - Weir data
  - Microsoft SharePoint
- Hardware architecture
- Investigation with Business Intelligence vendor

Structure of the presentation (Cont)
- Purpose and objective of the project
- Development of Data Warehouse (SLIMS and LIMS)
  - Business requirements
  - Creating master data
  - Creating dimension
  - Development of ETL
  - Deployment
  - Reporting
- Recommendations
- Conclusions

Background
Providing accurate and reliable water quality data is the primary goal of the Rand Water Scientific Services Division. Rand Water Scientific Services is responsible for ensuring that the water quality data it analyses is of good quality. This is achieved through constant monitoring, maintenance and securing of the data. It is also important to Rand Water Scientific Services that historic and current data are stored, secured and managed centrally.

Preliminary Work
Interviews were held to understand the decentralized data. Data sources identified during the investigations:
- Analytical Services LIMS (ASLIMS)
- Zwartkopjes LIMS (ZKLIMS)
- Vereeniging LIMS (VGLIMS)
- Zuikerbosch LIMS (ZBLIMS)
- Symmetry Laboratory Information Management (SLIMS) database
- Weir database
- Electronic data
- Hardcopy data

Preliminary Work (Cont)
- Some decentralized data sources were copied to a central server and restored.
- Investigations were conducted on all RW LIMS deployments.
- Legacy SLIMS was secured.
- Microsoft SharePoint was installed.
- Remaining challenges: hardcopy data, Excel files, etc.

SLIMS: a legacy system
- SLIMS database server outdated
- Windows NT not supported by MS
- Not on the RW network since the network upgrade
- Risk: danger of data loss

SLIMS Database
- It contains water quality data from 1950 to [end year not given].
- The SLIMS system consists of a relational database and worklist data.
- The relational data was on SQL Server version 6.5.
- SLIMS is now secured on a viable MS SQL Server environment.

SLIMS Database
To warehouse SLIMS and LIMS, the following processes were applied:
- The relational database was upgraded from MS SQL 6.5 to MS SQL 2005 (the current LIMS database version). The upgrade was done by exporting all tables from the MS Access database to the MS SQL 2005 database.
- The worklist data was converted to the MS SQL 2005 database using SQL scripts.

SLIMS Database: MS SQL Data Validation
- Relational database: the data was compared to check for any missing data or tables. The validation was done by comparing the number of records per table between MS Access and MS SQL.
- Worklist data: worklists were sampled, with a maximum of ten per folder, and compared with the results in MS SQL.
Thus the legacy data is secured.
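The record-count validation described above can be illustrated with a small sketch. It is not the script used in the project: it uses two in-memory SQLite databases as stand-ins for the MS Access source and the MS SQL 2005 target, and the table names, row counts and helper names (`table_row_counts`, `compare_row_counts`, `make_demo_db`) are hypothetical.

```python
import sqlite3


def table_row_counts(conn, tables):
    """Return {table: row count}, with None for a table that does not exist."""
    counts = {}
    for table in tables:
        try:
            # Identifiers cannot be bound as parameters, so the table names
            # must come from a trusted list, as they do here.
            row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = row[0]
        except sqlite3.OperationalError:
            counts[table] = None  # assumed to mean: table missing on this side
    return counts


def compare_row_counts(source_conn, target_conn, tables):
    """Return (table, source_count, target_count) for every mismatch."""
    src = table_row_counts(source_conn, tables)
    tgt = table_row_counts(target_conn, tables)
    return [(t, src[t], tgt[t]) for t in tables if src[t] != tgt[t]]


def make_demo_db(rows_per_table):
    """Build an in-memory database with the given number of rows per table."""
    conn = sqlite3.connect(":memory:")
    for table, n in rows_per_table.items():
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, value REAL)')
        conn.executemany(
            f'INSERT INTO "{table}" (value) VALUES (?)',
            [(float(i),) for i in range(n)],
        )
    conn.commit()
    return conn


# Hypothetical table names and row counts, for illustration only.
source = make_demo_db({"samples": 5, "results": 12, "methods": 3})
target = make_demo_db({"samples": 5, "results": 11})
mismatches = compare_row_counts(source, target, ["samples", "results", "methods"])
for table, src_count, tgt_count in mismatches:
    print(f"{table}: source={src_count}, target={tgt_count}")
```

A matching count only shows that no rows were lost; it does not prove the values are identical, which is presumably why the worklist data was additionally spot-checked against the results.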

Hardware Architecture

Investigation with Business Intelligence Vendor
- Help was requested from ASYST Intelligence to obtain a clear understanding.
- A road map document was developed by our BI partner.
- The scope of the project had to be narrowed down due to:
  - budget and time constraints
  - immediate benefit

Investigation with BI Vendor (Cont)
Two solutions were considered:
- Short-term strategy: to develop a data warehouse consisting only of the SLIMS database (historical) and the LIMS database (current).
- Long-term strategy: to develop a fully integrated central repository (Rand Water Water Quality Portal) that will house all data sources identified within our division in an environment that is easily accessible and managed centrally.

Project Purpose and Objectives
- To develop a data warehouse consisting of legacy data and current data that will be integrated, maintained, updated and backed up.
- To provide RW with Business Intelligence on its water quality data.

Development of Data Warehouse
Activities undertaken to develop the warehouse:
- Gathering business requirements by interviewing various knowledgeable RW personnel
- Documenting the sample process flow
- Creating the master data
- Creating the dimension model
- Developing the Extract, Transform and Load (ETL) process
- Deploying the solution
- Reporting

Master Data
Source systems

Dimension Model

ETL System Architecture
- Source systems: LIMS (current data) and SLIMS (legacy historic data)
- ETL 1: extract via ODBC drivers into the staging DB (data deltas)
- ETL 2: transform (clean, conform), then load into the DW DB (dimensional model)
  - Shared dimensions
  - Slowly changing dimensions
  - Populate facts
  - Populate aggregate facts
- Each step ends in success or failure; failures are recorded in the ETL metadata:
  - System exceptions
  - Record exceptions
  - Data quality exceptions
- ETL system management services
  - Process metadata: ETL operation statistics, audit results
  - Technical metadata: ETL job logic transformations; retention, backup and security
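The success and failure paths in the architecture above can be sketched as a small pipeline runner. This is an illustrative Python sketch, not the SSIS implementation: the `EtlMetadata` class, the `run_pipeline` function, the stage functions and the sample rows are all hypothetical, and the three exception types named on the slide are collapsed into a single exception list.

```python
from dataclasses import dataclass, field


@dataclass
class EtlMetadata:
    """Minimal stand-in for the ETL metadata store."""
    operations: list = field(default_factory=list)  # (stage, "success" | "failure")
    exceptions: list = field(default_factory=list)  # (stage, error type, detail)


def run_pipeline(stages, payload, metadata):
    """Run stages in order; stop at the first failure and record it."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            metadata.operations.append((name, "failure"))
            metadata.exceptions.append((name, type(exc).__name__, str(exc)))
            return None
        metadata.operations.append((name, "success"))
    return payload


warehouse = []  # stands in for the DW DB


def extract(_):
    # Hypothetical rows as they might arrive from a source system.
    return [{"site": " A1 ", "ph": "7.2"}, {"site": "B2", "ph": "6.9"}]


def extract_bad(_):
    return [{"site": "C3", "ph": "n/a"}]  # value that cannot be conformed


def transform(rows):
    # Clean (trim whitespace) and conform (numeric type).
    return [{"site": r["site"].strip(), "ph": float(r["ph"])} for r in rows]


def load(rows):
    warehouse.extend(rows)
    return rows


metadata = EtlMetadata()
result = run_pipeline(
    [("extract", extract), ("transform", transform), ("load", load)], None, metadata
)

metadata_bad = EtlMetadata()
result_bad = run_pipeline(
    [("extract", extract_bad), ("transform", transform), ("load", load)],
    None,
    metadata_bad,
)
```

In the second run the transform stage fails, so nothing reaches the warehouse and the failure is available in `metadata_bad.exceptions` for later inspection.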

Development of ETL
The ETL was designed using Microsoft SQL Server Integration Services 2005 (SSIS). Four processes were developed to extract, transform and load the data:
- Extract
- Load dimension table
- Load fact table
- Controls
Two packages were created, an initial load package and a daily load package, each of which involves the processes listed above.

Development of ETL
Initial load package
- Control: runs the packages sequentially.
- Extract: extracts all data from LIMS to the staging area (RWWQ_SG) without any transformation. SLIMS was loaded once off.
- Load dimension: transforms the data and loads it into the dimension tables on RWWQDW from RWWQ_SG.
- Load fact: transforms the data and loads it into the fact tables on RWWQDW from RWWQ_SG.
Daily load package
- Extract: extracts 5 days of data from LIMS to the staging area (RWWQ_SG) without any transformation.
- Load dimension: transforms the data and loads it into the dimension tables.
- Load fact: transforms the data and loads it into the fact tables.
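The daily load's 5-day extract can be sketched as a rolling window combined with a keyed load, so that rows picked up on consecutive days are overwritten rather than duplicated. This is a minimal Python sketch under several assumptions: the slides do not say how the window boundaries are defined or how re-extracted rows are handled, and the field names (`result_id`, `sample_date`, `value`), the functions and the sample data are hypothetical.

```python
from datetime import date, timedelta


def extract_window(source_rows, run_date, days=5):
    """Return rows sampled in the `days` calendar days ending on run_date.

    Assumption: the window covers run_date and the (days - 1) days before it.
    """
    cutoff = run_date - timedelta(days=days)
    return [r for r in source_rows if cutoff < r["sample_date"] <= run_date]


def load_facts(fact_table, rows):
    """Upsert rows by key so overlapping windows do not create duplicates."""
    for row in rows:
        fact_table[row["result_id"]] = dict(row)
    return fact_table


# Hypothetical LIMS rows.
lims = [
    {"result_id": 1, "sample_date": date(2000, 1, 1), "value": 7.0},
    {"result_id": 2, "sample_date": date(2000, 1, 7), "value": 7.4},
    {"result_id": 3, "sample_date": date(2000, 1, 10), "value": 6.8},
]

facts = {}  # stands in for a fact table keyed by result_id

# First daily run: only rows inside the window are extracted.
load_facts(facts, extract_window(lims, date(2000, 1, 10)))
first_run_ids = sorted(facts)

# Before the next run, one result is corrected and a new one arrives.
lims[1]["value"] = 7.5
lims.append({"result_id": 4, "sample_date": date(2000, 1, 11), "value": 7.1})

# Second daily run: the window overlaps the first one.
load_facts(facts, extract_window(lims, date(2000, 1, 11)))
```

One plausible benefit of a multi-day window is that results corrected after their first extraction are refreshed on a later run, as row 2 is here. Row 1 falls outside both windows and would only reach the warehouse through the initial load.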

Deployment
All packages were deployed on the file system, for the following reasons:
- There is no need to configure tasks such as the Execute Package task during deployment, which would have delayed the process.
- It allows the user to see the progress of a run, which is not possible when packages are deployed on SQL Server. Green, red and yellow are used to indicate success, failure and progress of the run, respectively.

ETL Package

Recommendations
For the organization to truly benefit from the vision of creating the Rand Water Water Quality Portal, phase two, which is to incorporate other data sources into the current data warehouse, should be initiated. The benefits are:
- An executive view of water quality data
- Viewing data from various sources on one report
- Faster response to the changing requirements of the business
- Data safeguarded and managed centrally
- Enhanced decision making

Recommendations (Cont)
Reporting layer, current:
- Crystal Reports
- Data Warehouse
Reporting layer, ideal:
- Crystal Reports
- Web Intelligence
- InfoView
- BusinessObjects universe
- Data Warehouse
- BusinessObjects Enterprise

Conclusion
The warehousing of SLIMS and LIMS was completed successfully. The success of the project was the result of teamwork.
Acknowledgements to:
- Information Management Team
- Business Intelligence Team
- Water Quality Services Team
- Process Technology Team

Thank you!