Automate Data Integration Processes for Pharmaceutical Data Warehouse



Similar documents
Practical application of SAS Clinical Data Integration Server for conversion to SDTM data

Using SAS Data Integration Studio to Convert Clinical Trials Data to the CDISC SDTM Standard Barry R. Cohen, Octagon Research Solutions, Wayne, PA

ABSTRACT INTRODUCTION THE MAPPING FILE GENERAL INFORMATION

Gregory S. Nelson ThotWave Technologies, Cary, North Carolina

USE CDISC SDTM AS A DATA MIDDLE-TIER TO STREAMLINE YOUR SAS INFRASTRUCTURE

SDTM, ADaM and define.xml with OpenCDISC Matt Becker, PharmaNet/i3, Cary, NC

WHITE PAPER. CONVERTING SDTM DATA TO ADaM DATA AND CREATING SUBMISSION READY SAFETY TABLES AND LISTINGS. SUCCESSFUL TRIALS THROUGH PROVEN SOLUTIONS

Building and Customizing a CDISC Compliance and Data Quality Application Wayne Zhong, Accretion Softworks, Chester Springs, PA

PharmaSUG Paper CD13

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

Using SAS in Clinical Research. Greg Nelson, ThotWave Technologies, LLC.

Did you know? Accenture can deliver business outcome-focused results for your life sciences research & development organization like these:

Overview of CDISC Implementation at PMDA. Yuki Ando Senior Scientist for Biostatistics Pharmaceuticals and Medical Devices Agency (PMDA)

Use of Metadata to Automate Data Flow and Reporting. Gregory Steffens Novartis PhUSE 13 June 2012

How to easily convert clinical data to CDISC SDTM

Clinical Trial Data Integration: The Strategy, Benefits, and Logistics of Integrating Across a Compound

CDISC standards and data management The essential elements for Advanced Review with Electronic Data

ADaM or SDTM? A Comparison of Pooling Strategies for Integrated Analyses in the Age of CDISC

PharmaSUG2010 HW06. Insights into ADaM. Matthew Becker, PharmaNet, Cary, NC, United States

ABSTRACT INTRODUCTION PATIENT PROFILES SESUG Paper PH-07

Pharmaceutical Applications

Sanofi-Aventis Experience Submitting SDTM & Janus Compliant Datasets* SDTM Validation Tools - Needs and Requirements

PharmaSUG 2016 Paper IB10

SAS CLINICAL TRAINING

SDTM-ETL TM. The user-friendly ODM SDTM Mapping software package. Transforming operational clinical data into SDTM datasets is not an easy process.

Data Conversion to SDTM: What Sponsors Can Do to Facilitate the Process

Implementation of SDTM in a pharma company with complete outsourcing strategy. Annamaria Muraro Helsinn Healthcare Lugano, Switzerland

Streamlining the drug development lifecycle with Adobe LiveCycle enterprise solutions

Clinical Data Management (Process and practical guide) Dr Nguyen Thi My Huong WHO/RHR/RCP/SIS

Meta-programming in SAS Clinical Data Integration

Best Practices for Managing and Monitoring SAS Data Management Solutions. Gregory S. Nelson

Dynamic Decision-Making Web Services Using SAS Stored Processes and SAS Business Rules Manager

BRIDGing CDASH to SAS: How Harmonizing Clinical Trial and Healthcare Standards May Impact SAS Users Clinton W. Brownley, Cupertino, CA

Business & Decision Life Sciences

ProjectTrackIt: Automate your project using SAS

Electronic Submission of Regulatory Information, and Creating an Electronic Platform for Enhanced Information Management

Managing Custom Data Standards in SAS Clinical Data Integration

Statistical Operations: The Other Half of Good Statistical Practice

How to Use SDTM Definition and ADaM Specifications Documents. to Facilitate SAS Programming

A simple tool to catalogue statistical outputs developed for submission by linking two in-house systems experience from a submission project

A white paper presented by: Barry Cohen Director, Clinical Data Strategies Octagon Research Solutions, Inc. Wayne, PA

The Development of the Clinical Trial Ontology to standardize dissemination of clinical trial data. Ravi Shankar

Practical Image Management for

StARScope: A Web-based SAS Prototype for Clinical Data Visualization

AN INTRODUCTION TO THE PHARSIGHT KNOWLEDGEBASE SUITE

Bringing Order to Your Clinical Data Making it Manageable and Meaningful

PharmaSUG Paper AD08

Quality Assurance: Best Practices in Clinical SAS Programming. Parag Shiralkar

Training/Internship Brochure Advanced Clinical SAS Programming Full Time 6 months Program

What's New in SAS Data Management

Managing Clinical Trials Data using SAS Software

Understanding CDISC Basics

Using the SAS XML Mapper and ODS PDF to create a PDF representation of the define.xml (that can be printed)

Copyright 2012, SAS Institute Inc. All rights reserved. VISUALIZATION OF STANDARD TLFS FOR CLINICAL TRIAL DATA ANALYSIS

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Rationale and vision for E2E data standards: the need for a MDR

Can Coding MedDRA and WHO Drug be as Easy as a Google Search? Sy Truong, Meta-Xceed, Inc., Milpitas, CA Na Li, Pharmacyclics, Sunnyvale CA

Managing and Integrating Clinical Trial Data: A Challenge for Pharma and their CRO Partners

A Macro to Create Data Definition Documents

Modernizing Your IT Systems While Preserving Your Investments & Managing Risk

Using Web Services to exchange information via XML

CDISC SDTM & Standard Reporting. One System

How To Use Sas With A Computer System Knowledge Management (Sas)

Monitoring Clinical Trials with a SAS Risk-Based Approach

PharmaSUG 2015 Paper SS10-SAS

Using SAS Enterprise Business Intelligence to Automate a Manual Process: A Case Study Erik S. Larsen, Independent Consultant, Charleston, SC

The CDISC/FDA Integrated Data Pilot: A Case. Support an Integrated Review

Bridging Statistical Analysis Plan and ADaM Datasets and Metadata for Submission

TopBraid Insight for Life Sciences

Whitepaper. Data Warehouse/BI Testing Offering YOUR SUCCESS IS OUR FOCUS. Published on: January 2009 Author: BIBA PRACTICE

Clinical Data Management (Process and practical guide) Nguyen Thi My Huong, MD. PhD WHO/RHR/SIS

ClinPlus. Report. Technology Consulting Outsourcing. Create high-quality statistical tables and listings. An industry-proven authoring tool

MOVING THE CLINICAL ANALYTICAL ENVIRONMENT INTO THE CLOUD

The ADaM Solutions to Non-endpoints Analyses

SDTM AND ADaM: HANDS-ON SOLUTIONS

ORACLE CLINICAL. Globalization. Flexibility. Efficiency. Competition ORACLE DATA SHEET OVERVIEW ROBUST CLINICAL DATA MANAGEMENT SOLUTION

Unicenter Desktop DNA r11

SDTM Validation: Methodologies and Tools

Benefits of Upgrading to Phoenix WinNonlin 6.2

«How we did it» Implementing CDISC LAB, ODM and SDTM in a Clinical Data Capture and Management System:

REx: An Automated System for Extracting Clinical Trial Data from Oracle to SAS

PhUSE Paper CD13

PharmaSUG Paper DS07

Use of standards: can we really be analysis ready?

ABSTRACT THE ISSUE AT HAND THE RECIPE FOR BUILDING THE SYSTEM THE TEAM REQUIREMENTS. Paper DM

How to build ADaM from SDTM: A real case study

The Importance of Good Clinical Data Management and Statistical Programming Practices to Reproducible Research

Essential Project Management Reports in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

AN INTRODUCTION TO THE PHARSIGHT KNOWLEDGEBASE SERVER (PKS )

Extraction Transformation Loading ETL Get data out of sources and load into the DW

Real-Time Market Monitoring using SAS BI Tools

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

Harnessing the power of advanced analytics with IBM Netezza

Japan PMDA and CDISC Standards. Yuki Ando Senior Scientist for Biostatistics Pharmaceuticals and Medical Devices Agency (PMDA)

Transcription:

Paper AD01 Automate Data Integration Processes for Pharmaceutical Data Warehouse Sandy Lei, Johnson & Johnson Pharmaceutical Research and Development, L.L.C, Titusville, NJ Kwang-Shi Shu, Johnson & Johnson Pharmaceutical Research and Development, L.L.C, Titusville, NJ ABSTRACT SAS Data Integration (DI) Studio is a JAVA GUI tool for processing inbound data into a data warehouse. For an environment that has established processes and rules for data manipulation, DI studio provides a visual mechanism to display extracted data attributes, transformation mappings, and loading processes. This is the ideal case and the preferred way to use the tool. Applying the DI studio to pharmaceutical clinical data sometimes encounters the following issues: 1) the length, label, or type of a variable differs from study to study within the same compound, 2) a study may have many datasets and each dataset may contain many variables. Applying a manual process via DI Studio is time-consuming and error-prone. Often the same process may need to be repeated for different studies. To resolve these issues, an automated Data Integration for data extraction, transformation and loading process has been developed using combination JAVA plug-ins, user-written codes and SAS macros to interface with SAS Metadata server. INTRODUCTION Clinical trials are usually conducted on a per study basis. Most of the studies require the consistency on the case report form (CRF) across all studies for the same compound, while others allow variations. Based on various needs, statistically or programmatically, different variables may be introduced into the analysis data sets for different studies. As a result, combining studies to form a uniform clinical data set for integrated summary of safety (ISS) or integrated summary of efficacy (ISE) is often a challenging task for either metadata analysis and/or e-submission in the pharmaceutical industry. This paper addresses the process of loading and creating uniform clinical data sets into a Data Warehouse. To ease the work, third-party tools, such as SAS Metadata Server and DI Studio, are used. Business Intelligence is a set of software tools and applications that enable business users and analysts to interact with their company data in an easy, efficient and effective manner. SAS Business Intelligence includes the following: 1) a set of client applications designed for a specific type of business or analyst, 2) SAS server processes designed to provide specific type of services for the client applications, 3) a centralized metadata management facility. As one of the client side tools, SAS Data Integration Studio enables a data warehouse specialist to do the following: 1) extract data from operational data stores, 2) transform data into a desired form, and 3) load data into a data warehouse. On the server side, SAS implements SAS Open Metadata Architecture, which is a general-purpose metadata management facility that provides common metadata services to SAS applications. The architecture includes an application metadata model, a repository metadata model, an application programming interface, and a metadata server. There are several implementations of Application Program Interface (API), such as JAVA, C++, SAS, and all of them use XML as the transport language.

The pharmaceutical industry today is facing ever-increasing pressure to bring new drugs to the market that are safe as well as cost effective but with shorter timelines and less resources. It has become critical that a well-designed Clinical Data Warehouse be able to help scientists and researchers to tap into for potential innovative ideas and solutions to address these new business and regulatory challenges. As the pharmaceutical arm of Johnson & Johnson, Johnson & Johnson Pharmaceutical Research and Development, L.L.C (J & J PRD) strives to manage and mitigate risks, manage intellectual assets, leverage intellectual assets to improve science and exploratory analysis, create robust submissions, increase the ability to do cross-study and cross-product analysis, and address the FDA s new proactive risk assessment guidelines. All these business requirements lead us to work on building a clinical data warehouse that has a uniform repository for loading, organizing, combining and interrelating clinical information from multiple sources for use in risk management, has search and access capabilities, can perform various analyses, report and output capabilities and meet compliance. GOAL OF THE PROCESS Clinical information is organized around compounds. Each compound may have many studies at various phases, such as PK, Phase I, II, III or IV, across many years. The label, length and even type of a variable may differ among these studies. To create a uniform data repository, one has to ask what is the uniform data set? What kind of data model is the uniform data set going to use? For studies starting from 2006, J&JPRD has rolled out the Clinical Data Interchange (CDISC) Study Data Tabulation Model (SDTM) based data model; any study starting earlier than that is based on J&J PRD s proprietary Analysis Data Model (ADM). Converting data from ADM model to SDTM model will be dealt later. The ultimate goal of the process is to create harmonized analysis data sets (HADS) across studies for the same compound. The reality is that there are many variations among studies and once a study is completed and submitted to FDA, the data cannot be changed. Therefore we propose a two-stage approach: harmonize the study data based on the data model, and then create integrated data sets by resolving the remaining inconsistencies in a systematic approach. THE AUTOMATED PROCESS The overall automated process is presented in the following process map.

The automated process starts by creating and loading compound metadata information; that includes all data domains and attributes of all variables, such as comments, length, type, formats, code list, and origin, into the Metadata server. A compound metadata can be developed as expanding a selected existing standard model such as SDTM, ADaM, or any in-house models. The following figures are snapshots of the DI studio that show the extended attributes of the compound metadata. Figure 1 contains the circled symbols for keys, notes, and the extended attributes of each variable. Figure 2 shows the comments from the data specifications via the DI studio note text field. Figure 1: Compound Metadata Figure 2: Algorithm defined in the compound metadata notes

The next step is to load study datasets into the Metadata server, extract the attributes from study datasets, and compare the attributes from study verse compound to identify any deviation from the data model. In order to capture the transformation process information, an excel workbook is created which contains various comparison reports such as missing compound-required variables, variables in study but not in compound, mismatched attributes, etc. Some of the reports, such as finding data sets containing null key variables, take a long time when running sequentially. They are ideally for parallel programming due to logical independency between data sets and can be programmed in parallel. The Excel workbook that captures these comparison reports will be used to document any necessary transformations to create harmonized data. The following are the examples of three comparison reports created in this step: missing compound-required variables, in study and not in compound, and mismatch study vs. compound. Among them, missing compound-required variables, is shown in the Figure 3 below, will be used as example for transformation and loading process. Figure 3: Comparison Reports Two methodologies were explored in the SAS DI studio for data transformation: 1) interactive approach through the DI mapping process, 2) create user-written codes as background support for the transformations. We believe that using user-written code approach is more efficient when the in-bound databases require similar or identical transformations. Figure 4 below shows that there are three variables renamed through interactive process via the DI studio. Besides renaming, Proc SQL statements can be used to perform data manipulation for the variables. The same process will be repeated for all other studies in the same compound since names of these variables were changed in the data model after studies have been completed.

Figure 4: Transformation through interactive mapping process Figure 5 and Figure 6 demonstrated the user-written code with job template approach. The actual codes are SAS data step statements with additional macro variables. It can be stored as a template and shared by studies. This approach reduces repetitive work by sharing the same codes. The job template contains information such as where the actual user-written codes located. Figure 5: Transformation via user written code with job template

Figure 6: User written code file location After applying transformation and loading, the exact same comparison reports will be created again to make sure there is no deviation from the data model. The transformation and loading study process can be iterated several times until all concerns are addressed and harmonized study data sets are created. A compound level transformation spreadsheet document can be dynamically updated by adding any new transformations when the new study transformations are created. With the reality that the clinical database usually contains large amount of variables, structures, and codelists within a compound, and the time and resource constrains, we define a harmonized individual study database as one that meet all data model specifications with the following exceptions across studies: 1) code-decode inconsistency for CRF pre-printed variables, 2) length inconsistency for the variables, 3) type inconsistency for variables that are not relevant to analysis. An automated data pooling tool will be developed to resolve the length and type inconsistencies, such as applying the largest length and convert numerical variables to character ones. Resolving code-decode inconsistency is complicated since it usually is based on the study statistical analysis plan. For example, same treatment group codes in different studies usually have different decodes. As the result, treatment group code-decode relationship has to be redefined when creating the ISS/ISE databases. In general, it is a case-by-case approach. Pooling all harmonized study databases with the data-pooling tool described above, a uniform integrated compound database and its corresponding metadata will be created. The conjunction of the integrated database and the metadata database could be sufficient to address regulatory authorities' requests and to support further clinical researches. CONCLUSION For many years, to have a uniform clinical data set across studies has been a big challenge for pharmaceutical companies. We proposed and implemented a two-stage automated approach to achieve this goal. These processes can be audited by examining the transformation workbook that is also used as the

basis for automating the transformation. In the future, we will extend the automated work by periodically checking the deviation of ongoing studies from the data model to ensure consistency across studies, which can reduce the work involved to resolve inconsistencies during the integration stage. With the single source of data, the consistent data format and the metadata integrity, searching and accessing across studies can be conducted systematically, and the time and the efforts spent will be reduced significantly. REFERENCES 1. SAS 9.1.3 ETL Studio User s Guide, SAS Institute Inc., Cary, NC USA 2004 2. SAS Using SAS ETL Studio to Integrate Your Data, SAS Institute Inc., Cary, NC USA 2005 3. SAS 9.1.3 Open Metadata Interface User s Guide, SAS Institute Inc., Cary, NC USA 2004 4. SAS 9.1.3 Open Metadata Interface Reference, SAS Institute Inc., Cary, NC USA 2005 5. Introduction to the SAS Business Intelligence Client Tools, SAS Institute Inc., Cary, NC USA 2004 6. Distributed and Parallel Processing with the SAS System, SAS Institute Inc., Cary, NC USA 2004 7. CDISC Study Data Tabulation Model: SDTM V1.1 & SDTM IG V3.1.1, 2005 8. Analysis Data Model: ADM Specifications V4.1.3, J&JPRD, 2004 ACKNOWLEDGEMENTS We would like to thank our managers for their support, encouragement as well as continuously challenging us to enhance the automated clinical data warehousing processes. TRADEMARKS SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. JAVA is a trademark of SUN Microsystems Inc. CONTACT INFORMATION Sandy Lei Johnson & Johnson PRD 1125 Trenton-Harbourton Road Titusville, NJ 08560 Work Phone: (609) 730-3320 Email: slei@prdus.jnj.com Kwang-Shi Shu Johnson & Johnson PRD 1125 Trenton-Harbourton Road Titusville, NJ 08560 Work Phone: (609) 730-4633 Email: kshu@prdus.jnj.com