PhUSE SDE Meeting, NY 2015
PhUSE Metadata Management Project: Study data standards, Master data, terminology and interoperability definitions
Mitra Rocca, FDA
Marcelina Hungria, DIcore Group
Table of Contents
PhUSE CSS Emerging Technology (ET) Working group
Metadata Management Project
Definitions
Implementation
Definitions and study data standards
Master data
Controlled terminology
Interoperability
Pooling, aggregation, integration
Lessons learned
PhUSE CSS Emerging Technology
The FDA/PhUSE Computational Science Symposium (CSS) is a collaborative effort between industry and the FDA to work on the implementation of data standards.
In 2013 a new working group was established, focusing on the following emerging technologies:
o Semantic technology (now in a dedicated working group)
o Metadata management
o Cloud computing
o Big data
The ET WG re-organized in 2014.
Metadata Management Project Goals
Changing landscape: need for a concept-based Metadata Repository (MDR), from protocol to data submission
Project Team Deliverables
Definitions Document: http://www.phusewiki.org/wiki/index.php?title=_management
Comments to FDA Guidances: submitted to the FDA docket (by the May 2014 deadline)
Definitions Soup
Definitions
1 METADATA MANAGEMENT
1.1 Metadata
1.2 Structural metadata
1.3 Descriptive metadata
1.4 Study Instance
1.5 Metadata repository
1.6 Metadata registry
1.7 Data element
1.8 Attribute
1.9 Class
1.10 Data type
1.11 Value level metadata
2 CONTROLLED TERMINOLOGY, CODE SYSTEMS & VALUE SETS
2.1 Controlled Terminology/controlled vocabulary
2.2 Code system
2.3 Dictionary
2.4 Concept
2.5 Code
2.6 Code list
2.7 Value set
3 MASTER DATA MANAGEMENT
3.1 Master Data
3.2 (Master) Reference Data
3.3 Master Data Management
4 INTEROPERABILITY
Categorization of Interoperability (by HL7)
4.1 Technical interoperability ("machine interoperability")
4.2 Semantic interoperability
4.3 Process Interoperability
5 DATA AGGREGATION, INTEGRATION, POOLING
5.1 Data pooling
5.2 Data integration
5.3 Data aggregation
Approach
Definitions
Lessons Learned
Approach: worked entry for "Master Data Management"

Synonym:
Reference Data Management; MDM

Definition & source:
[Gartner Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1ck9udo&ct=121019&st=sb
MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official, shared master data assets.
[Source: Master Data Management]
Master Data Management (MDM) is the collective application of governance, business processes, policies, standards and tools that facilitate consistency in data definition. The idea of Master Data focuses on providing unobstructed access to a consistent representation of shared information.
[Source: SAS White Paper on Supporting Your Information Strategy with a Phased Approach to Master Data Management]

Description:
Master Data Management (MDM) comprises a set of processes and tools that consistently define and manage the master data and master reference data of an enterprise, which are fundamental to the company's business operations. MDM has the objective of providing processes & tools for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.
There are different models for master data management. The 2 main extremes are:
o Centralized model, where all data are managed within a central data store and pushed to the different applications within an organization.
o Decentralized model (registry), where the master data are managed within each application but then reconciled through a registry system to federate them.

Example:
Specific products from vendors such as Informatica, IBM, Software AG

Recommended definition:
Set of processes and tools needed for the deployment of master data and master reference data within an organization.
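The two extremes of master data management described above (a central store that pushes data, and a registry that reconciles locally managed data) can be sketched as follows. This is a minimal illustration only: the class names, the application names "EDC" and "CTMS", and the identifiers are hypothetical and do not come from any vendor product.

```python
class CentralMasterStore:
    """Centralized model: one store owns the master record and pushes it out."""

    def __init__(self):
        self.records = {}        # master_id -> record
        self.subscribers = []    # application-side dicts that receive copies

    def upsert(self, master_id, record):
        self.records[master_id] = record
        for app_copy in self.subscribers:
            app_copy[master_id] = dict(record)   # push a copy to each application


class MasterRegistry:
    """Registry model: applications keep their own records; the registry links them."""

    def __init__(self):
        self.links = {}          # (application, local_id) -> master_id

    def link(self, application, local_id, master_id):
        self.links[(application, local_id)] = master_id

    def same_entity(self, key_a, key_b):
        """True when two application-local records map to the same master entity."""
        a, b = self.links.get(key_a), self.links.get(key_b)
        return a is not None and a == b


# Centralized: the site record is maintained once and pushed to a subscriber.
central = CentralMasterStore()
ctms_sites = {}                              # hypothetical application copy
central.subscribers.append(ctms_sites)
central.upsert("SITE-001", {"name": "General Hospital"})

# Registry: each application keeps its own identifier; the registry reconciles them.
registry = MasterRegistry()
registry.link("EDC", "site_17", "SITE-001")
registry.link("CTMS", "S-0001", "SITE-001")
print(registry.same_entity(("EDC", "site_17"), ("CTMS", "S-0001")))
```

In the centralized sketch the applications never edit the master record themselves; in the registry sketch they do, and the registry only answers the question "are these the same thing?".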
[Figure: metadata at three levels]
Organization/Enterprise level
Drug level: drug structural, descriptive, semantic and process metadata; subset of the IDMP standard + CDISC (CDASH, SDTM for a compound)
Study level: study structural, descriptive, semantic and process metadata; subset of the CDISC CDASH, SDTM standard (based on company best practice)
Master Data
How Controlled Vocabularies are described and used
(inspired by Julie James, BlueWave Informatics)
Codes: C16576 for F
Concept identifiers and designations: C16576 has the designations Female, F (Primary), female
Concepts: C16576 + F
Concept representation: ISO 21090 Datatypes, the CD (Concept Descriptor)
Controlled terminology in define.xml (machine processable):
o Code System (CodeList Context): nciextcodeid (not directly processable; URI instead)
o Value Set (CodeList): CUI for SEX: C66731
o Code: CUI for designation F (Female): C16576
Code Systems: Code System Versioning
Value Sets: Value Set Definition, Value Set Versioning; C66731 for SEX
Codelist, Value Set & Code shown with a CDISC example
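The relationship between a code, its designations and a value set can be made concrete with a small data structure. The sketch below uses the SEX example from the slide (value set C66731, code C16576 with designations F, Female, female); the class names, the `lookup` helper, the code system label and the version label are assumptions made for illustration, not part of define.xml or ISO 21090.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """A coded concept: an identifier plus its designations."""
    concept_id: str          # concept identifier, e.g. C16576
    primary: str             # primary designation, e.g. F
    synonyms: tuple = ()     # other designations for the same concept


@dataclass
class ValueSet:
    """A named, versioned selection of codes drawn from one code system."""
    concept_id: str
    name: str
    code_system: str         # assumed label for the source code system
    version: str             # hypothetical version label
    codes: list = field(default_factory=list)

    def lookup(self, designation):
        """Return the Code whose designations include the text, else None."""
        wanted = designation.strip().lower()
        for code in self.codes:
            names = (code.primary, *code.synonyms)
            if wanted in (n.lower() for n in names):
                return code
        return None


sex = ValueSet(
    concept_id="C66731", name="SEX", code_system="NCI Thesaurus",
    version="illustrative-v1",
    codes=[Code("C16576", "F", ("Female", "female"))],
)

match = sex.lookup("female")
print(match.concept_id if match else "no match")
```

Keeping the version on the value set rather than on each code mirrors the slide's distinction between code system versioning and value set versioning; a fuller model would carry both.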
Interoperability
Data Pooling, Integration, Aggregation
[Figure: Dataset 1, Dataset 2, Dataset 3]
POOLING: Storing data together without changing the datasets
INTEGRATION: Transformation, mapping or harmonization of data (ETL process)
AGGREGATION: Additional grouping or derivation of data
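The three terms are easier to keep apart when applied to the same toy data. The sketch below is illustrative only: the two source datasets, their variable names and the `SEX_MAP` lookup are invented for the example.

```python
from collections import Counter

# Two hypothetical source datasets with different variable names and codings.
study_a = [{"SUBJID": "A-01", "SEX": "F"}, {"SUBJID": "A-02", "SEX": "M"}]
study_b = [{"subject": "B-01", "gender": "female"},
           {"subject": "B-02", "gender": "female"}]

# POOLING: store the datasets together without changing them.
pool = {"study_a": study_a, "study_b": study_b}

# INTEGRATION: transform each source onto one common structure and coding.
SEX_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def integrate(pooled):
    rows = []
    for rec in pooled["study_a"]:
        rows.append({"STUDYID": "A", "USUBJID": rec["SUBJID"],
                     "SEX": SEX_MAP[rec["SEX"].lower()]})
    for rec in pooled["study_b"]:
        rows.append({"STUDYID": "B", "USUBJID": rec["subject"],
                     "SEX": SEX_MAP[rec["gender"].lower()]})
    return rows


integrated = integrate(pool)

# AGGREGATION: additional grouping or derivation on top of the integrated data.
sex_counts = Counter(row["SEX"] for row in integrated)
print(dict(sex_counts))
```

Note that the pooled datasets are left untouched; only the integrated rows share variable names and codes, and only then does a cross-study count become meaningful.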
Lessons learned (compiled from different team members)
Efficient data integration and compliance with regulatory standards do not start after pooling (retroactive approach); they start with the protocol (proactive approach).
A proactive approach is based on two components:
o Definition of Master Data (Drug Products, Studies, Sites, Investigators, ...) and associated descriptive metadata
o Definition of study structural metadata, a.k.a. study-specific data standards, as a subset of the enterprise-wide variables and value sets contained in a Metadata repository (MDR)
To be manageable, variables in an MDR need to be grouped into semantically meaningful "clinical research concepts" (CRC).
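The idea that a study-specific standard is a subset of the enterprise-wide variables and value sets can be checked mechanically at study setup. The following sketch assumes a very simple MDR representation (a dictionary from variable name to allowed values); the contents of `ENTERPRISE_MDR` and the function name are hypothetical.

```python
# Hypothetical enterprise MDR content: variable -> allowed value set.
# None means the variable has no controlled value set.
ENTERPRISE_MDR = {
    "SEX": {"F", "M", "U"},
    "AGE": None,
    "AGEU": {"YEARS", "MONTHS"},
    "RACE": None,
}


def check_study_standard(study_standard, mdr=ENTERPRISE_MDR):
    """Return a list of problems; an empty list means the study is a subset."""
    problems = []
    for variable, values in study_standard.items():
        if variable not in mdr:
            problems.append(f"{variable}: not defined in the MDR")
            continue
        allowed = mdr[variable]
        if allowed is not None and values is not None and not values <= allowed:
            extra = sorted(values - allowed)
            problems.append(
                f"{variable}: values outside the enterprise value set: {extra}")
    return problems


# A study standard that reuses SEX correctly but deviates on AGEU and HEIGHT.
study_standard = {"SEX": {"F", "M"}, "AGEU": {"YEARS", "DECADES"}, "HEIGHT": None}
for problem in check_study_standard(study_standard):
    print(problem)
```

Running such a check when the study is defined, rather than when define.xml is assembled for submission, is the proactive approach in miniature.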
CHANGING LANDSCAPE: Enforcing data standards from protocol onwards
Retroactive approach, from paper protocol:
o Different interpretations of the same protocol
o Limited standards
o Time to build integrated SDTM data sets
Proactive approach, with structural metadata:
o One single interpretation of the protocol
o Increased efficiency, consistency & quality through standards
o Reduced time for integration and secondary data use
Courtesy of Isabelle de Zegher, 2014 PAREXEL International Corp.
CONCEPT BASED MDR: Protocol is not about variables but about concepts
[Figure: annotated eCRF for Patient Demography next to the SDTM data set (SAS), which has different variable names and different structures than the eCRF]
Courtesy of Isabelle de Zegher
CONCEPT BASED MDR: CDASH/SDTM can be organized by CRC
Courtesy of Isabelle de Zegher

Concept: Subject

CDASH Question (eCRF content description)    CDASH Variable                        SDTM Variable (SDTM mapping)
What is the sex of the subject?              SEX                                   SEX
What is the subject's date of birth?         BRTHDAT or BRTHYR, BRTHMO, BRTHDY     BRTHDTC
What is the ethnicity of the subject?        ETHNIC                                ETHNIC
What is the race of the subject?             RACE                                  RACE
What is the subject's age?                   AGE                                   AGE
What are the age units used?                 AGEU                                  AGEU
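A concept that groups collection and submission variables can drive the mapping itself. The sketch below encodes the demography rows as data and derives BRTHDTC from the separate year, month and day fields as an ISO 8601 date. It covers only the component form (BRTHYR, BRTHMO, BRTHDY), not BRTHDAT, and the names `DEMOGRAPHY_CONCEPT`, `to_iso8601_date` and `map_record` are invented for illustration.

```python
# One clinical research concept groups the CDASH and SDTM variables.
DEMOGRAPHY_CONCEPT = [
    # (CDASH source variables, SDTM target variable)
    (("SEX",), "SEX"),
    (("BRTHYR", "BRTHMO", "BRTHDY"), "BRTHDTC"),
    (("ETHNIC",), "ETHNIC"),
    (("RACE",), "RACE"),
    (("AGE",), "AGE"),
    (("AGEU",), "AGEU"),
]


def to_iso8601_date(year, month=None, day=None):
    """Build an ISO 8601 date, keeping only the leading known components."""
    parts = [f"{int(year):04d}"]
    if month is not None:
        parts.append(f"{int(month):02d}")
        if day is not None:
            parts.append(f"{int(day):02d}")
    return "-".join(parts)


def map_record(cdash):
    """Map one collected demography record to its SDTM variables."""
    sdtm = {}
    for sources, target in DEMOGRAPHY_CONCEPT:
        if target == "BRTHDTC":
            if cdash.get("BRTHYR") is not None:
                sdtm[target] = to_iso8601_date(
                    cdash["BRTHYR"], cdash.get("BRTHMO"), cdash.get("BRTHDY"))
        elif sources[0] in cdash:
            sdtm[target] = cdash[sources[0]]   # one-to-one carry-over
    return sdtm


# Hypothetical collected record with an unknown day of birth.
record = {"SEX": "F", "BRTHYR": 1980, "BRTHMO": 5, "AGE": 34, "AGEU": "YEARS"}
print(map_record(record))
```

Because the mapping lives in one table per concept, adding a variable or a new standards version means editing data rather than code, which is the point of organizing the MDR by concept.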
Conclusions
Let us speak the same language.
We need to change the way we consider compliance with data standards and data integration:
o From a retroactive approach (building define.xml at submission)
o To a proactive approach (study data standards defined at study setup)
We need new tools to manage metadata: a concept-based MDR
o Grouping variables into semantically meaningful concepts (following industry-wide patterns)
o Linking data sources (e.g., CDASH-based collection) to data submission (SDTM) variables
o Linking with controlled terminology
o With capabilities to handle standards versioning
Definition Project Participants
Isabelle de Zegher (co-chair), Parexel
Mitra Rocca (co-chair), FDA
Marcelina Hungria (co-chair), DIcore Group
Julie James, BlueWave Informatics
Tim Church, Torch
Yun Oldshue, Takeda
Praveen Garg, ICON
Kenneth Stoltzfus, Accenture
Gregory Steffens, Novartis
John Leveille, d-Wise
Aimee Basile, Celgene
Sam Hume, CDISC
Mitra Rocca
Senior Medical Informatician
Office of Translational Sciences, CDER, FDA
Mitra.rocca@fda.hhs.gov

Marcelina Hungria
Clinical Data Standards & Integration Consultant / Owner
DIcore Group, LLC
mhungria@dicoregroup.com