Use of Metadata to Automate Data Flow and Reporting. Gregory Steffens Novartis PhUSE 13 June 2012


Stages of Metadata Evolution I
In the beginning...
- No corporate or industry-level data or reporting standards.
- Data standards were defined in each company, or often in each therapeutic area, and were inconsistently complied with.
- Data standards and study data specifications were stored in documents or unstructured Excel files. Programmers re-entered the information into SAS program files.
- Claims that scientific freedom was required in data design.
- Lots of reinvention, inefficiencies, inconsistent data that can't be easily pooled, and re-entry of information into documents and program files. Expensive in time and money.

Stages of Metadata Evolution II
- Data standards are defined for the industry, most recently by CDISC.
- Companies begin to store data standards and specifications in formats approaching metadata.
- This starts with Excel, in formats that are inconsistent, are not designed for programmatic access, and sometimes lack a clear distinction between data and metadata (e.g. why isn't suppqual a flag in metadata instead of a separate, physical data domain?).
- But metadata is not playing nearly as primary a role as it should. Data standards are not published in standard metadata (e.g. define.xml), and software tools are not yet in place to use metadata.
- No industry-standard metadata is yet used to publish standards or study specifications.

Stages of Metadata Evolution III
- Rigorously standardized metadata design.
- Implementation of corporate meta-programming: programs that need no modification as they are used in every study to implement the database attributes defined in metadata.
- Metadata and meta-programming should be data-standard neutral (no assumptions about what the data standard is), programming-language neutral and process neutral.
- The industry is not generally here yet. We are still thinking about how to automate SDTM or ADaM or SUPPQUAL instead of thinking about true meta-programming.
- We need to evolve to the implementation of industry-level meta-programming, driven by industry-standard metadata design. We are starting to get there!

Stages of Metadata Evolution IV
- The next big thing is to standardize map metadata, which defines the relationships between a source metadatabase and a target metadatabase: a standardized representation of data flow.
- Map metadata should be stored in tables separate from metadata, to allow mapping from any source to any target and to multiple targets.
- Create corporate meta-programming that automates data flow: a Data Transformation Engine (DTE).
- Implement an industry DTE with meta-programming driven by metadata and map metadata that is shared by industry, CROs and regulatory agencies.

Stages of Metadata Evolution V
- The next phase of metadata evolution is not strictly metadata but Study Information Data (SID): a standard structure to store study design, treatment arms, visit definitions, schedule of events, TFL design, etc.
- We need to continue our journey out of the world of documents and into the world of metadata.
- SID will enable meta-programming for the generation of standard tables, figures and listings (TFLs), as well as analysis results metadata that enables navigation through TFLs just as the define file enables navigation through the data sets.
- SID is starting with trial design standards in CDISC and in companies (e.g. Jeff's presentation about Rho), but there is a mix of SID in data domains, ODM and metadata.
- Documents, like the protocol and SAP, will be generated from metadata in this phase of evolution.

Metadata Constituents
- A standard list of database attributes to include in any description of a database or of a data standard.
- Stored in a standard set of data structures that can be read by programming code.
- The attributes must be highly structured in order to be usable by program code.
- In effect, a standard for defining data standards and study data specifications.
- Enables easy publication in different formats: HTML, Word, PDF, XML, etc.
- Generate documents from metadata, not metadata from documents!

Standard Database Attributes
- Data set level: short/long names, data set location, order in define.
- Variable level: short/long name, type, length, label, primary key flag, format, value list name, suppqual flag, code/decode relationship, order, aCRF location, etc.
- Valid values: value list name, start/end value, short decode, long decode, rank.
- Descriptions: source name, derivation description.
- Row-level attributes: identical to variable-level attributes, but for subsets of rows defined by a parameter variable value. They define virtual variables, i.e. variables whose attributes change in different types of rows in the table.
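As an illustration of what "highly structured" attributes can look like, here is a minimal Python sketch of a variable-level and a valid-values metadata table. It is not the author's relational design; the class names, field names and sample rows are assumptions chosen to mirror the attribute list above.

```python
from dataclasses import dataclass

@dataclass
class ColumnMeta:
    """One variable-level metadata row (hypothetical field names)."""
    table: str
    column: str
    dtype: str                  # "char" or "num"
    length: int
    label: str
    primary_key: bool = False   # a flag, not a free-text list of keys
    suppqual: bool = False
    value_list: str = ""        # name of a shared value list, if any

@dataclass
class ValueMeta:
    """One valid-value row belonging to a named value list."""
    value_list: str
    start: str
    short_decode: str

# Illustrative content only.
COLUMNS = [
    ColumnMeta("VS", "USUBJID", "char", 20, "Unique Subject Identifier", primary_key=True),
    ColumnMeta("VS", "VSTESTCD", "char", 8, "Vital Signs Test Short Name",
               primary_key=True, value_list="VSTESTCD"),
    ColumnMeta("VS", "VSORRES", "char", 20, "Result or Finding in Original Units"),
]
VALUES = [
    ValueMeta("VSTESTCD", "SYSBP", "Systolic Blood Pressure"),
    ValueMeta("VSTESTCD", "HEIGHT", "Height"),
]

def primary_keys(table):
    """Read the key of a table from the flag instead of parsing a text list."""
    return [c.column for c in COLUMNS if c.table == table and c.primary_key]

def valid_values(value_list):
    """Look up the allowed values of a named, shareable value list."""
    return [v.start for v in VALUES if v.value_list == value_list]
```

Because the key is a flag and the value list is referenced by name, a program can answer "what is the key of VS?" or "which test codes are allowed?" without parsing any free text.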

Row-Level Metadata
Necessary to fully describe tall-thin data set structures.

Short-wide structure:
USUBJID  SYSBP  BPSYSLOC  BPSYSU  HEIGHT  HEIGHTU  WEIGHT  WEIGHTU  BMI   BMIU
1        120    STANDING  mmHg    185     CM       90      KG       26.3  kg/m**2

Tall-thin structure:
USUBJID  VSTESTCD  VSLOC     VSORRES  VSORRESU
1        SYSBP     STANDING  120      mmHg
1        HEIGHT              185      CM
1        WEIGHT              90       KG
1        BMI                 26.3     kg/m**2
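The relationship between the two layouts on this slide can be sketched in a few lines of Python. This is an illustration only, not the author's Dt_wide2thin macro; the PARAMS table is a hypothetical stand-in for row-level metadata that says, for each VSTESTCD value, which wide columns supply the result, unit and location.

```python
# One short-wide record, using the values shown on the slide.
WIDE_ROW = {
    "USUBJID": "1",
    "SYSBP": "120", "BPSYSLOC": "STANDING", "BPSYSU": "mmHg",
    "HEIGHT": "185", "HEIGHTU": "CM",
    "WEIGHT": "90", "WEIGHTU": "KG",
    "BMI": "26.3", "BMIU": "kg/m**2",
}

# Hypothetical row-level metadata:
# test code -> (result column, unit column, location column or None)
PARAMS = {
    "SYSBP": ("SYSBP", "BPSYSU", "BPSYSLOC"),
    "HEIGHT": ("HEIGHT", "HEIGHTU", None),
    "WEIGHT": ("WEIGHT", "WEIGHTU", None),
    "BMI": ("BMI", "BMIU", None),
}

def wide_to_thin(row, params=PARAMS):
    """Emit one tall-thin row per parameter described in the metadata."""
    thin = []
    for testcd, (result_col, unit_col, loc_col) in params.items():
        thin.append({
            "USUBJID": row["USUBJID"],
            "VSTESTCD": testcd,
            "VSLOC": row[loc_col] if loc_col else "",
            "VSORRES": row[result_col],
            "VSORRESU": row[unit_col],
        })
    return thin

for record in wide_to_thin(WIDE_ROW):
    print(record)
```

In the tall-thin form, VSORRES holds a blood pressure in one row and a height in the next, which is why attributes such as units or valid ranges have to be described per row subset rather than per column.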

Metadata Structure
- Structured content enables programmatic access to the list of attributes.
- Storage structure is separate from publication structure: maximize programmatic access in the metadata design, and user-friendly access by people in the metadata publication formats.
- Storage structure is also separate from the data entry format.
- Maximize sharing of information within the metadata, e.g. value lists and descriptions. Normalize the metadata design.
- There are still a lot of errors and inefficiencies out there in the design and implementation of metadata.

Some Principles of Metadata Design
- Rigorously standardized for all database and standard descriptions: no metadata design change is required for different database standard types or study specifications!
- Metadata should not impose a process or a data flow, like SDTM to ADaM to IDB. Process and flow belong in map metadata.
- Maximize structured information and programmatic access, e.g. primary keys flagged instead of listed.
- Enter once; use many, e.g. descriptions and values; meta-programming.
- Complex derivation logic still goes in descriptions and subroutines, though. Data transformation automation is implemented differently than data derivation automation.

Objectives of Metadata
- It is critical to define the objectives explicitly. Many disagreements arise from an unstated difference in assumed objectives.
- Objectives allow evaluation of the success of the metadata design, e.g. retrospective description for eSubmission or prescriptive enabler of automation.
- Data standards and metadata are a means to an end, and that end includes an efficient and transparent data flow that leads to good decisions about safety and efficacy.

Objectives
- Prescriptive metadata drives meta-programming; it is no longer merely descriptive, post-facto metadata.
- Meta-programming must be able to assume a standard metadata structure in order to minimize its assumptions about data structures.
- Don't automate each domain; automate all domains and all standards with a single set of macros that read metadata that tells them what to do. This is the DTE meta-programming.
- Include enough attributes to enable the automation of every transform.
- Store data standards, standards templates and study specifications in the same metadata design.
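The idea of one routine for all domains can be sketched as follows. This Python function is an assumption-laden illustration (the metadata keys "column", "type", "length" and "order" are hypothetical names, and the function is not one of the author's macros): it contains no domain-specific logic and takes every decision from the metadata passed to it.

```python
def apply_attributes(rows, columns_meta):
    """Reorder, cast and truncate each row as the metadata prescribes."""
    ordered = sorted(columns_meta, key=lambda m: m["order"])
    result = []
    for row in rows:
        new_row = {}
        for meta in ordered:
            value = row.get(meta["column"])
            if value is None:
                new_row[meta["column"]] = None          # column absent in source
            elif meta["type"] == "num":
                new_row[meta["column"]] = float(value)  # numeric attribute
            else:
                new_row[meta["column"]] = str(value)[: meta["length"]]
        result.append(new_row)
    return result

# Two different domains, one routine; only the metadata differs.
VS_META = [
    {"column": "VSORRES", "type": "char", "length": 5, "order": 2},
    {"column": "USUBJID", "type": "char", "length": 20, "order": 1},
]
DM_META = [
    {"column": "USUBJID", "type": "char", "length": 20, "order": 1},
    {"column": "AGE", "type": "num", "length": 8, "order": 2},
]

vs = apply_attributes([{"VSORRES": "120.456789", "USUBJID": 1}], VS_META)
dm = apply_attributes([{"AGE": "34", "USUBJID": "1"}], DM_META)
print(vs)
print(dm)
```

Supporting a new domain or a new standard then means adding metadata rows, not writing or changing code.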

Industry Metadata Standard
- We need an industry metadata standard to exchange information about data standards, data specifications and the way one database is created from another (e.g. ADaM from SDTM).
- Current practice is to use metadata that is quasi-standardized at each company, or to use old-fashioned Word documents.
- This causes great inefficiencies: translating between metadata standard structures and attribute lists causes large amounts of unnecessary work.

Some of the Problems That Could Be Solved by an Industry Metadata Standard
- Excel is often used, with un-typed columns, layouts that are not 2-dimensional, and confusion between storage, entry and presentation structures.
- Inconsistent metadata structures even within a company, between different standards, specifications and versions of the same standard.
- Unstructured information, like controlled terminology concatenated in large character variables.
- Primary key variables in lists instead of flags.
- Inconsistent attribute lists and metadata structure.
- CDISC Excel workbooks have these problems too.
- Mapping information included in metadata.
- Assumptions about process, data flow and data standards.

What Could Be
- An industry metadata standard does exist: the define.xml. It has a standard list of attributes and a standard structure.
- But the standard structure is XML and is difficult to access programmatically.
- A solution is a standard relational metadata structure that contains the list of attributes in the define.xml schema, but in a programmatically accessible format. This approach was used with success in the two CDISC pilot projects, using my relational metadata design and some meta-programs.
- All data standards and specifications would be stored and published in this standard metadata structure.
- A standard GUI for entry and modification of metadata content.
- A set of standard presentations of metadata content.

What to do with Standard Metadata
- Data standards published in a standard way.
- Study data specifications exchanged between organizations and software systems using the same metadata design.
- Automation that uses metadata to inform the code about the database, instead of the code making assumptions about the database. Metadata is code.
- A metadata standard is more important than data standards!

A Process
- Submit data standards in an industry-standard metadata structure.
- Create a study data specification by subsetting the metadata-resident data standard.
- Compare the study specification to an IDB standard so that integrating the study data will be easier. Using multiple CROs for different studies is then less of a problem.
- Create the define.xml / pdf / html / rtf from metadata in minutes, including all the hyperlinks to data and aCRFs.
- Send the source data and specification to the programming team.
- The team uses meta-programs to build and validate the database.
- Validate the data by automated comparison of the data to the metadata-resident specification.
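The last step, automated comparison of data to the metadata-resident specification, might look like the following Python sketch. It is a simplified illustration under assumed metadata keys ("column", "type", "length", "primary_key", "value_list"), not the author's mdcheck macro, and it covers only a few of the checks such a tool could make.

```python
def check_data(rows, columns_meta, valid_values):
    """Return a list of discrepancies between the data and its specification."""
    problems = []
    expected = {m["column"] for m in columns_meta}
    keys = [m["column"] for m in columns_meta if m["primary_key"]]
    seen_keys = set()
    for i, row in enumerate(rows, start=1):
        for name in sorted(expected - set(row)):
            problems.append(f"row {i}: missing column {name}")
        for name in sorted(set(row) - expected):
            problems.append(f"row {i}: column {name} not in specification")
        for meta in columns_meta:
            value = row.get(meta["column"])
            if value is None:
                continue
            if meta["type"] == "char" and len(str(value)) > meta["length"]:
                problems.append(f"row {i}: {meta['column']} longer than {meta['length']}")
            allowed = valid_values.get(meta.get("value_list"))
            if allowed is not None and value not in allowed:
                problems.append(f"row {i}: {meta['column']} value {value!r} not in value list")
        key = tuple(row.get(k) for k in keys)
        if key in seen_keys:
            problems.append(f"row {i}: duplicate primary key {key}")
        seen_keys.add(key)
    return problems

# Illustrative specification and data.
META = [
    {"column": "USUBJID", "type": "char", "length": 20, "primary_key": True},
    {"column": "VSTESTCD", "type": "char", "length": 8, "primary_key": True,
     "value_list": "VSTESTCD"},
    {"column": "VSORRES", "type": "char", "length": 20, "primary_key": False},
]
VALID = {"VSTESTCD": {"SYSBP", "HEIGHT", "WEIGHT", "BMI"}}
DATA = [
    {"USUBJID": "1", "VSTESTCD": "SYSBP", "VSORRES": "120"},
    {"USUBJID": "1", "VSTESTCD": "PULSE", "VSORRES": "72"},
    {"USUBJID": "1", "VSTESTCD": "SYSBP", "VSORRES": "120"},
]

for problem in check_data(DATA, META, VALID):
    print(problem)
```

The checker knows nothing about vital signs; the same function validates any table for which a specification exists.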

Principles of the Process
- Metadata is prescriptive rather than merely descriptive. Prescriptive metadata created at the start has much more value than descriptive metadata created at the end.
- Metadata is populated at the start of the project and supports automation throughout the process, from creation to FDA submission:
    - Publish the plan
    - Check compliance to standard
    - Build the database
    - Validate the data
    - Create define file for the FDA
- Metrics measure compliance of requirements to standard and of the data to requirements.
- Enter once; use many!
- Metadata structure is identical in all applications to support sharing of content.

Other Kinds of Metadata
- After metadata comes map metadata, which supports even more complex automation of the transformation of data from source to target structures, like creating SDTM, ADaM or integrated databases to support ISS/ISE.
- A Data Transformation Engine requires metadata and map metadata, and provides huge efficiency gains and transparency in the data flow (transforms are not hidden in code or documents).
- The term metadata is often used more broadly to also mean data that describes trial design, treatment arms, tables, figures and listings, titles/footnotes, etc.
- A more general term is data-driven applications, which include metadata-driven applications.

Map Metadata
- Map metadata must be standardized.
- Map metadata connects an observation in the source metadata with an observation in the target metadata.
- Its structure is simple: one map metadata set for each metadata set. It contains the primary key variables of the metadata sets for the source and the target.
- A columns metadata set is keyed by TABLE and COLUMN, so the corresponding map metadata structure contains SOURCE_TABLE, SOURCE_COLUMN, TARGET_TABLE and TARGET_COLUMN.
- This is enough to support meta-programming of the flow of data from one structure to another.
- Map metadata describes no database attributes.
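A minimal Python sketch of column-level map metadata driving a transform is shown below. Only the four key names SOURCE_TABLE, SOURCE_COLUMN, TARGET_TABLE and TARGET_COLUMN come from the slide; the table names, the sample rows and the transform function itself are assumptions made for illustration, and a real Data Transformation Engine would do considerably more.

```python
# Column-level map metadata: one row per source-column/target-column pair.
MAP = [
    {"SOURCE_TABLE": "RAW_VITALS", "SOURCE_COLUMN": "SUBJ",
     "TARGET_TABLE": "VS", "TARGET_COLUMN": "USUBJID"},
    {"SOURCE_TABLE": "RAW_VITALS", "SOURCE_COLUMN": "TEST",
     "TARGET_TABLE": "VS", "TARGET_COLUMN": "VSTESTCD"},
    {"SOURCE_TABLE": "RAW_VITALS", "SOURCE_COLUMN": "RESULT",
     "TARGET_TABLE": "VS", "TARGET_COLUMN": "VSORRES"},
    # The same source column may feed more than one target.
    {"SOURCE_TABLE": "RAW_VITALS", "SOURCE_COLUMN": "SUBJ",
     "TARGET_TABLE": "DM", "TARGET_COLUMN": "USUBJID"},
]

def transform(source, map_rows):
    """Move data from source tables to target tables as the map prescribes.

    source: dict of table name -> list of row dicts.
    Returns a dict of target table name -> list of row dicts.
    """
    targets = {}
    for src_table, rows in source.items():
        pairs = [m for m in map_rows if m["SOURCE_TABLE"] == src_table]
        # Unique target tables, in the order they first appear in the map.
        for tgt_table in dict.fromkeys(m["TARGET_TABLE"] for m in pairs):
            columns = [(m["SOURCE_COLUMN"], m["TARGET_COLUMN"])
                       for m in pairs if m["TARGET_TABLE"] == tgt_table]
            for row in rows:
                targets.setdefault(tgt_table, []).append(
                    {tgt: row.get(src) for src, tgt in columns})
    return targets

SOURCE = {"RAW_VITALS": [{"SUBJ": "1", "TEST": "SYSBP", "RESULT": "120"}]}
print(transform(SOURCE, MAP))
```

Note that the map rows carry no types, lengths or labels. Those attributes stay in the source and target metadata, which is what lets one map design serve any source and any target.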

Meta-programming to implement data flow
%dtmap(
    source_mdlib=m,
    source_prefix=raw_,
    target_mdlib=m,
    target_prefix=target_,
    maplib=m,
    inlib=raw,
    outlib=sdtm,
    suppqual_make=yes)

Study Information Data (SID)
- Standard, structured data sets that describe the information required for TFL generation and for the creation of some of the protocol and SAP sections.
- Visits, epochs, schedule of events, baseline visits.
- Treatment arms, treatments, schedule of treatments.
- TFL titles and footnotes: meta-programming creates all the titles and footnotes, and analysis results metadata can be created automatically, just like the define file.
- TFL summary statistics for each TFL, and a style-sheet functionality to create the TFLs from that.

Examples of Macros that Implement Meta-programming
A list of some of the macros, and their functionality, which help to achieve efficiency and ensure good quality:

Macro                        Functionality
Mdprint / md2odm             Publish in html or xml format
Mdatribs                     Apply attributes defined in metadata to a data library
Ut_find_decodes              Finds decode variables and their attributes
Dt_make_decodes              Creates decode variables
Dt_copy_headers              Copies header variables from source to target data sets
Mdcompare / mdcompare_print  Compares metadatabases to each other, such as a study requirement to a standard or a study to a study
mdcheck                      Checks data and reports discrepancies with the metadata
mdbuild                      Builds metadata to describe an existing data library
mdfreqvals                   Creates the values metadata set (supplements mdbuild)
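To make the "compares metadatabases to each other" idea concrete, here is a small Python sketch that compares a study specification with a standard. It is not the Mdcompare macro; the metadata keys and the choice of compared attributes (type, length, label) are assumptions. Because a study specification is typically a subset of the standard, columns that the study simply omits are not reported.

```python
def compare_metadata(standard, study, attributes=("type", "length", "label")):
    """Report study columns that are absent from, or differ from, the standard."""
    std = {(m["table"], m["column"]): m for m in standard}
    stu = {(m["table"], m["column"]): m for m in study}
    findings = []
    for key in sorted(stu.keys() - std.keys()):
        findings.append((key, "not in standard"))
    for key in sorted(std.keys() & stu.keys()):
        for attr in attributes:
            if std[key][attr] != stu[key][attr]:
                findings.append(
                    (key, f"{attr}: standard={std[key][attr]!r}, study={stu[key][attr]!r}"))
    return findings

# Illustrative metadata; VSXYZ is a made-up, non-standard column.
STANDARD = [
    {"table": "VS", "column": "USUBJID", "type": "char", "length": 20,
     "label": "Unique Subject Identifier"},
    {"table": "VS", "column": "VSORRES", "type": "char", "length": 20,
     "label": "Result or Finding in Original Units"},
]
STUDY = [
    {"table": "VS", "column": "USUBJID", "type": "char", "length": 20,
     "label": "Unique Subject Identifier"},
    {"table": "VS", "column": "VSORRES", "type": "char", "length": 200,
     "label": "Result or Finding in Original Units"},
    {"table": "VS", "column": "VSXYZ", "type": "char", "length": 10,
     "label": "Hypothetical extra column"},
]

for finding in compare_metadata(STANDARD, STUDY):
    print(finding)
```

Counting such findings is one way to obtain a metric for the compliance of a study specification with a standard.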

Examples of Meta-Programming

Macro                   Functionality
dtmap                   Top level macro that users call to transform data from one format to another, e.g. raw to SDTM to ADaM to IDB
Dt_thin2wide            Convert tall-thin to short-wide
Dt_wide2thin            Convert short-wide to tall-thin
Tool_code_lib           Documents program code
Ut_saslogcheck          Checks SAS logs for disallowed messages
Ut_age_years            Computes age in years
Ut_truncate_long_chars  Truncates long character variable lengths to least length to hold longest value
mdport                  Creates a transport file of a metadatabase to archive versions
md2excel / excel2md     Converts metadata between SAS and excel
mdmkdsn                 Creates 0-observation data sets as defined in metadata

Examples of Meta-Programming

Macro          Functionality
Suppqual_make  Creates the suppqual data sets, by reading the suppqual flag in the metadata to identify supplementary qualifiers
Suppqual_get   Gets supplementary qualifier variables from the suppqual data sets and adds them to their proper domain
Dtmap_values   Changes the value of variables by reading value map metadata
Mdformats      Create user formats from values metadata set
Missvars       Report variables that have a missing value in all observations
Missobs        Report observations where all variables have a missing value
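The description of Suppqual_make, reading a suppqual flag in the metadata to identify supplementary qualifiers, can be illustrated with a short Python sketch. This is not the macro itself: the metadata keys and the VSNOTE variable are hypothetical, and only a subset of the SDTM supplemental qualifier variables (RDOMAIN, USUBJID, IDVAR, IDVARVAL, QNAM, QLABEL, QVAL) is produced.

```python
def make_suppqual(domain, rows, columns_meta, idvar):
    """Split flagged columns out of a domain into supplemental qualifier rows."""
    flagged = [m for m in columns_meta if m["suppqual"]]
    flagged_names = {m["column"] for m in flagged}
    parent, supp = [], []
    for row in rows:
        # The parent domain keeps every column that is not flagged.
        parent.append({k: v for k, v in row.items() if k not in flagged_names})
        for meta in flagged:
            value = row.get(meta["column"])
            if value in (None, ""):
                continue  # no qualifier row for a missing value
            supp.append({
                "RDOMAIN": domain,
                "USUBJID": row["USUBJID"],
                "IDVAR": idvar,
                "IDVARVAL": str(row[idvar]),
                "QNAM": meta["column"],
                "QLABEL": meta["label"],
                "QVAL": str(value),
            })
    return parent, supp

# Illustrative metadata: VSNOTE is a hypothetical non-standard variable.
META = [
    {"column": "USUBJID", "label": "Unique Subject Identifier", "suppqual": False},
    {"column": "VSSEQ", "label": "Sequence Number", "suppqual": False},
    {"column": "VSORRES", "label": "Result or Finding in Original Units", "suppqual": False},
    {"column": "VSNOTE", "label": "Hypothetical Note", "suppqual": True},
]
ROWS = [
    {"USUBJID": "1", "VSSEQ": 1, "VSORRES": "120", "VSNOTE": "Cuff replaced"},
    {"USUBJID": "1", "VSSEQ": 2, "VSORRES": "185", "VSNOTE": ""},
]

vs, suppvs = make_suppqual("VS", ROWS, META, idvar="VSSEQ")
print(vs)
print(suppvs)
```

Changing which variables are supplementary qualifiers is then a one-flag edit in metadata rather than a program change.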