Implementation of a data warehouse based on CDISC SDTM for centralized analytics at the study and DataFax server level
Kevin Newell, NIH/NIAID/Office of Cyber Infrastructure and Computational Biology
Background
- NIH/NIAID/OCICB supports clinical research domestically and in many countries around the globe
- Approximately 90 research studies use the DataFax server
- Multiple data streams are supported within our Program, including:
  - clinical data
  - clinical laboratory data
  - research laboratory data
  - freezer/specimen management system
  - imaging data
  - clinical environmental monitoring system
- Multiple CDM software packages are supported within our Program (but DataFax is the best)
- DataFax server and cross-study metrics are other important data streams for the Program
CDISC: FDA submission requirements looming
HISTORICAL APPROACH
- Study-level programming to produce reports
- Study-specific transformations to produce data submissions in required format
- Few standards used to define common data modules at level of CRF and database design (distributed operational model)
- Limited resources
NOVEL REQUIREMENTS
- Consolidated, centralized services needed
- Conceptualize and implement efficiencies in all phases of the data management life cycle
- Design and adopt common data standards at CRF design, study database setup, and protocol reporting levels
Time to reinvent ourselves: being a better version of ourselves
A Data Warehouse is born: Harmony Data Warehouse
Data Warehouse
- A single, complete, and consistent store of data obtained from a variety of different sources, made available to end users in an understandable and easy-to-use business context
- All source data destined for the warehouse is mapped into a standard data model through extract, transform, load (ETL)
- Provides an efficient, central platform for reporting, analysis, and data sharing
- Data input streams can be multiple study databases from the same (or different) CDMS
- Data input streams can be from various other systems
Data Warehouse Workflow
Source Systems -> Data Staging Area -> Database -> End User Utilities
Using CDISC/SDTM standard
Data Warehouse: Benefits
- CDISC compliant FDA submissions made more efficient
- Centralized reporting platform
  - Prospective study visuals: graphical protocol status reports (demographics, enrollment and accrual, subject disposition, data quality, adverse events and reactogenicity)
  - Data & Safety reporting to DSMB and safety review committees
- Centralized analytic capacity
  - Merge and link clinical data in useful ways
  - Linkage to other data streams in the warehouse, e.g. research laboratory outcomes (immunology, virology, microbiology, and genomics), imaging data, etc.
  - Linkage to specimen repository data in order to identify target samples for testing based on specific clinical criteria
  - Capacity for cross-protocol analyses
- Program-level administrative efficiencies for summarizing operational data and program metrics
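The specimen-repository linkage listed among the benefits could look roughly like the sketch below: subjects are selected by a clinical criterion, then their stored specimens are looked up through a shared subject key. This is only an illustration, not the Harmony implementation. The `Specimen` fields, the sample records, and the function `find_target_specimens` are all hypothetical; only the variable names `USUBJID`, `AETERM`, and `AESEV` come from the SDTM adverse events domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdverseEvent:
    usubjid: str  # SDTM unique subject identifier
    aeterm: str   # reported term for the adverse event
    aesev: str    # severity: "MILD", "MODERATE", or "SEVERE"


@dataclass(frozen=True)
class Specimen:
    specimen_id: str       # hypothetical repository identifier
    usubjid: str           # shared key that makes the linkage possible
    specimen_type: str     # hypothetical, e.g. "SERUM" or "PBMC"
    freezer_location: str  # hypothetical location code


def find_target_specimens(events, specimens, severity, specimen_type):
    """Return specimens of one type from subjects with an AE of the given severity."""
    subjects = {ae.usubjid for ae in events if ae.aesev == severity}
    return [
        s for s in specimens
        if s.usubjid in subjects and s.specimen_type == specimen_type
    ]


# Made-up example records, for illustration only.
events = [
    AdverseEvent("STUDY01-0001", "FEVER", "SEVERE"),
    AdverseEvent("STUDY01-0002", "HEADACHE", "MILD"),
]
specimens = [
    Specimen("SP-100", "STUDY01-0001", "SERUM", "F1-R2-B3"),
    Specimen("SP-101", "STUDY01-0001", "PBMC", "F1-R2-B4"),
    Specimen("SP-102", "STUDY01-0002", "SERUM", "F2-R1-B1"),
]

targets = find_target_specimens(events, specimens, "SEVERE", "SERUM")
print([s.specimen_id for s in targets])  # prints ['SP-100']
```

The point of the sketch is that once clinical and specimen data share one subject identifier in the warehouse, a query like this no longer needs study-specific programming.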
Data Model Specifications
Sample Domain: Demographics
- PostgreSQL database
- Customized SDTM-hybrid data model
  - Supports FDA submissions
  - Caters for non-CDISC data types that are important to our researchers
  - Affords future extension
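A hybrid Demographics table of this kind might be laid out as in the sketch below. It is an assumption-laden illustration, not the actual Harmony schema: Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server, the list of permitted `sex` values is a simplified subset, and the column `site_cohort` is a hypothetical non-CDISC extension. The other column names are standard SDTM DM variables.

```python
import sqlite3

# Standard SDTM DM variables plus one hypothetical non-CDISC extension column.
DM_DDL = """
CREATE TABLE dm (
    studyid     TEXT NOT NULL,
    domain      TEXT NOT NULL DEFAULT 'DM',
    usubjid     TEXT NOT NULL PRIMARY KEY,
    subjid      TEXT NOT NULL,
    siteid      TEXT,
    age         INTEGER,
    ageu        TEXT,
    sex         TEXT CHECK (sex IN ('M', 'F', 'U')),  -- simplified term list
    country     TEXT,
    site_cohort TEXT  -- hypothetical extension, not part of SDTM
)
"""


def create_dm_table(conn):
    """Create the illustrative DM table on the given connection."""
    conn.execute(DM_DDL)


conn = sqlite3.connect(":memory:")
create_dm_table(conn)
conn.execute(
    "INSERT INTO dm (studyid, usubjid, subjid, siteid, age, ageu, sex, country, site_cohort) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("STUDY01", "STUDY01-0001", "0001", "01", 34, "YEARS", "F", "USA", "A"),
)
row = conn.execute("SELECT domain, usubjid, sex FROM dm").fetchone()
print(row)  # prints ('DM', 'STUDY01-0001', 'F')
```

Keeping the extension in its own column leaves the standard variables untouched, so a submission extract can simply omit it.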
Study ETL Specifications
Extract, Transform, Load
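The three ETL steps can be sketched as a small specification-driven pipeline. Everything study-specific here is assumed for illustration: the source column names (`ptid`, `gender`, `age_yrs`), the coded values in `SEX_MAP`, the study identifier, and the rule that builds `usubjid` are hypothetical, and `sqlite3` again stands in for the PostgreSQL warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source coding for sex; a real study defines its own codes.
SEX_MAP = {"1": "M", "2": "F"}

# Hypothetical mapping specification:
# source column -> (target SDTM-style column, transform function)
MAPPING_SPEC = {
    "ptid": ("subjid", str.strip),
    "gender": ("sex", lambda v: SEX_MAP.get(v.strip(), "U")),
    "age_yrs": ("age", lambda v: int(v) if v.strip() else None),
}


def extract(source_text):
    """Extract: read exported study data (CSV text here) into dict rows."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows, studyid, spec):
    """Transform: apply the mapping specification to each source row."""
    records = []
    for row in rows:
        rec = {"studyid": studyid, "domain": "DM"}
        for source_col, (target_col, convert) in spec.items():
            rec[target_col] = convert(row[source_col])
        # Assumed convention: study id plus subject id forms the unique id.
        rec["usubjid"] = f"{studyid}-{rec['subjid']}"
        records.append(rec)
    return records


def load(conn, records):
    """Load: insert the records and return the row count now in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dm (studyid TEXT, domain TEXT, "
        "usubjid TEXT PRIMARY KEY, subjid TEXT, sex TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dm VALUES (:studyid, :domain, :usubjid, :subjid, :sex, :age)",
        records,
    )
    return conn.execute("SELECT COUNT(*) FROM dm").fetchone()[0]


SOURCE_CSV = "ptid,gender,age_yrs\n0001,2,34\n0002,1,\n"

records = transform(extract(SOURCE_CSV), "STUDY01", MAPPING_SPEC)
loaded = load(sqlite3.connect(":memory:"), records)
print(loaded)  # prints 2
```

Keeping the mapping in a data structure rather than in code means a new protocol can be onboarded by writing a new specification, which is the efficiency a study ETL specification is meant to deliver.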
Analytic Outputs
- Study visuals dashboard reports
- DSMB/SRCP standardized reports
- Server-level administrative reports
- Program-level (across DataFax studies) reports
Validation Process
- Risk-based strategy
- Structural testing: to ensure database integrity
- CDISC compliance: in order to meet FDA submission requirements
- ETL specifications: review of mapping
- ETL programming: data validity, accuracy, and completeness
- Downstream analytic business lines: to ensure outputs are accurate
- Validation as an ongoing, prospective process in the pipeline
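Some of the checks listed above lend themselves to automation. The sketch below shows what a few of them might look like for Demographics records: completeness against the source row count, required variables, uniqueness of `USUBJID`, and a controlled-terminology check on `SEX`. The function name `validate_dm`, the choice of required variables, and the simplified term list are assumptions for illustration, not the Program's validation suite.

```python
# Assumed subset of required DM variables and a simplified SEX term list.
REQUIRED = ("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SEX")
SEX_TERMS = {"M", "F", "U"}


def validate_dm(records, source_row_count):
    """Return a list of human-readable findings; an empty list means no issues."""
    findings = []

    # Completeness: every source row should arrive in the warehouse.
    if len(records) != source_row_count:
        findings.append(
            f"completeness: expected {source_row_count} rows, loaded {len(records)}"
        )

    seen = set()
    for i, rec in enumerate(records):
        # Structural check: required variables must be present and non-empty.
        for var in REQUIRED:
            if not rec.get(var):
                findings.append(f"row {i}: missing required variable {var}")

        # Integrity check: USUBJID must be unique within the domain.
        usubjid = rec.get("USUBJID")
        if usubjid:
            if usubjid in seen:
                findings.append(f"row {i}: duplicate USUBJID {usubjid}")
            seen.add(usubjid)

        # Compliance check: SEX must come from the permitted term list.
        sex = rec.get("SEX")
        if sex and sex not in SEX_TERMS:
            findings.append(f"row {i}: SEX value {sex!r} not in controlled terms")

    return findings


good = {"STUDYID": "STUDY01", "DOMAIN": "DM", "USUBJID": "STUDY01-0001",
        "SUBJID": "0001", "SEX": "F"}
bad = dict(good, SEX="X")  # same USUBJID as `good`, invalid SEX

findings = validate_dm([good, bad], source_row_count=3)
for finding in findings:
    print(finding)
```

Running checks like these after every load is one way to treat validation as an ongoing, prospective step in the pipeline rather than a one-time exercise.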
It's a Work in Progress
- Continue to increase capacity of the data warehouse by introducing new domains into the data model
- Continue to develop new data input streams into the Data Warehouse
- Continue to map new protocols into the Data Warehouse
- More efficient approach to forms and database development, including adopting a CDASH/SDTM approach to defining styles/modules/fields
- Continue to improve efficiencies in reporting outputs, including SAS reporting (SRCP, DSMB) and study visuals, to cut down on the amount of customization required
- Continue to produce new standardized reports as required
- Evolution in user management/access control (currently managed using DataFax)
Acknowledgements
- Jaskiran Singh, Paschaline Gumne, Harish Kandaswamy, Jennifer Xiao, Francis Appiah - NET ESOLUTIONS CORPORATION
- Michael Duvenhage, Michael Holdsworth, Christopher Whalen - Research Data and Communication Technologies, Inc.
- Alexander Rosenthal, Michael Tartakovsky - Office of Cyber Infrastructure and Computational Biology
- Greg Marlow, Aziza Ahmad - Tableau visualizations resources
- Bruce Burgess - NIAID OEB
- LINUX Team members - server administration support
- Lisa Hoopengardner, Neelam Gulati - Leidos Biomed - CRF development support
Questions