The Importance of Good Clinical Data Management and Statistical Programming Practices to Reproducible Research Eileen C King, PhD Research Associate Professor, Biostatistics Acting Director, Data Management Center
Reproducible Research The term was first proposed by Jon Claerbout at Stanford University. It refers to the idea that the ultimate product of research is the paper together with the full computational environment used to produce its results (the code, data, etc.), so that others can reproduce the results and build upon the research.
Why? Scholarship can be recreated, better understood and verified. Others can start from the current state of the art Simplifies task of comparing a new method to existing methods Create earlier results again in a later stage of the research
Reproducibility of Clinical Research is Cornerstone of Drug Development Process Phase III trials - Requirement for two trials that produce similar results (reproducibility) Summary of Clinical Efficacy Document Displays the reproducibility of the efficacy results
Drug Development Submissions Required to provide all data along with complete documentation May be required to provide statistical analysis code along with complete documentation
Why? Regulators (e.g. FDA) will recreate the results in your submission Regulators will use the data to evaluate efficacy and safety issues across a drug class, for example: cardiac effects from NSAIDs; effectiveness of antihistamines in OTC cold products
Academic Research Centers should steal shamelessly from the Drug Development Process in order to facilitate Reproducible Research
Why? NIH is requiring: Data Sharing Plan in large grants Data and Documentation Statistical Programming Code and Documentation Posting of clinical study results to clinicaltrials.gov
Facilitating Reproducible Research
Develop Standard Operating Procedures (SOPs) SOPs should be written to cover all key elements of the conduct of a study Provide enough detail to ensure steps are consistently carried out Don't provide so much detail that normal variations in working end up as violations Written at a high level to outline: Required tasks Sign-offs Checks performed
Develop Work Instructions Work instructions can be more specific to a particular division or study Can document in more specific detail how to do things Generally not formally audited by regulatory agencies on work instructions
Assemble Multi-Functional Teams Assemble as soon as study planning is underway Team should provide input into the protocol regarding their functional area
Study Team (functions represented): Statistics, QA, Biomedical Informatics, Statistical Programming, Clinical Operations, Regulatory, Pharmacovigilance, Investigator, Data Management, Coordinator
Cannot Work in Functional Silos Expertise is needed in each of these areas from protocol development through manuscript submission Amount of effort varies depending on stage of study
Documents Required for Reproducible Research Study Protocol Manual of Procedures (MOP) Data Management Plan (DMP) Statistical Analysis Plan (SAP)
Importance of Data Management for Reproducible Research
Susanne Prokscha (2007): Practical Guide to Clinical Data Management As its importance has grown, clinical data management has changed from an essentially clerical task in the late 1970s and early 1980s to the highly computerized specialty it is today
Society for Clinical Data Management (SCDM) Founded to advance the discipline of clinical data management (CDM) Organized exclusively for educational and scientific purposes Mission: Promoting clinical data management excellence including promotion of standards of good practice within clinical data management Provides certification: The CCDM program establishes eligibility criteria and standards of knowledge as measured by a rigorous examination to qualified applicants.
Good Clinical Data Management Practices (GCDMP) Charter: The review and approval of new pharmaceuticals by federal regulatory agencies is contingent upon a trust that the clinical trials data presented are of sufficient integrity to ensure confidence in the results and conclusions presented by the sponsor company. Important to obtaining that trust is adherence to quality standards and practices. To this same goal, companies must assure that all staff involved in the clinical development program are trained and qualified to perform those tasks for which they are responsible From SCDM GCDMP document
Certified Clinical Data Manager (CCDM) SCDM certification program was designed to meet the following goals: Establish and promote professional practice standards throughout CDM Identify qualified professionals within the profession Ensure recognition of expertise Enhance the credibility and image of the profession
Data Management Plan (DMP) Details how data will be: Collected Stored Managed Archived Describes and defines all data management activities for a study What Who When How
Timing of DMP Development begins after the protocol and Case Report Forms are drafted Should be completed before the study begins Must be kept current to reflect important changes to the data management process and computer systems that took place during the study
General DMP Contents Protocol Summary Study Personnel/Roles CRF Design/Tracking Database Development Data Entry and Processing Data Cleaning Reports Audit Plans External Data Transfers Managing Lab Data SAE Handling Coding Training Study Closeout
Benefits of DMP Forces planning Process and tasks become more visible to the project team Expected documents are listed at the start of the study so they can be produced during the study
Benefits (Cont) Provides continuity of process and a history of a project Useful for long-term studies Useful for personnel turnover and growing DM groups Regulatory Requirement Auditors will ask for it
DMP Planning upfront saves time at end and improves data quality First DMP is time-consuming to do Use templates and previous DMPs to ease upfront burden
Development of the Data Management Plan is so important that the DMC is recommending tracking of the percent of studies for which a DMP was written prior to first patient enrolled.
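The tracking metric described above is simple arithmetic, and a minimal sketch of it follows. The study identifiers, dates, and the function name percent_with_timely_dmp are hypothetical, and "written prior to first patient enrolled" is assumed here to mean a DMP sign-off date strictly earlier than the enrollment date.

```python
from datetime import date

# Hypothetical study records:
# (study id, DMP sign-off date or None, first-patient-enrolled date)
studies = [
    ("STUDY-A", date(2012, 1, 10), date(2012, 2, 1)),
    ("STUDY-B", date(2012, 3, 15), date(2012, 3, 1)),  # DMP finished after enrollment began
    ("STUDY-C", None, date(2012, 4, 1)),               # no DMP written
    ("STUDY-D", date(2012, 5, 1), date(2012, 6, 20)),
]


def percent_with_timely_dmp(records):
    """Percent of studies whose DMP was signed off before the first patient was enrolled."""
    if not records:
        return 0.0
    timely = sum(
        1 for _study, dmp_date, enrolled in records
        if dmp_date is not None and dmp_date < enrolled
    )
    return 100.0 * timely / len(records)


print(f"{percent_with_timely_dmp(studies):.1f}% of studies had a DMP before first patient enrolled")
```

Studies with no DMP at all count against the metric rather than being dropped from the denominator, which seems closest to the intent of the recommendation.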
Selection of Database
21 CFR part 11 In March of 1997, FDA issued final part 11 regulations that provide criteria for acceptance by FDA, under certain circumstances, of electronic records, electronic signatures, and handwritten signatures executed to electronic records as equivalent to paper records and handwritten signatures executed on paper. These regulations, which apply to all FDA program areas, were intended to permit the widest possible use of electronic technology, compatible with FDA's responsibility to protect the public health.
Implications If data are being captured in an electronic system with no paper CRFs, then should (must) use a compliant system System can be compliant, but users must follow standard procedures to assure the total process is compliant At minimum, use a system with audit trails
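To make the idea of an audit trail concrete, here is a minimal sketch of the concept: every change to a data value is recorded with who made it, when, the old and new values, and a reason. The class and field names are hypothetical, and this is only an illustration of the principle. It is not a 21 CFR part 11 compliant system, which also involves access controls, validation, and procedures.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the audit trail (hypothetical layout)."""
    timestamp: str
    user: str
    record_id: str
    field_name: str
    old_value: object
    new_value: object
    reason: str


class AuditedStore:
    """Keeps current values plus an append-only history of every change."""

    def __init__(self):
        self._data = {}    # (record_id, field_name) -> current value
        self._trail = []   # entries are only ever appended, never edited

    def set_value(self, user, record_id, field_name, new_value, reason):
        if not reason:
            raise ValueError("a reason for change is required")
        key = (record_id, field_name)
        old_value = self._data.get(key)  # None on first entry
        self._trail.append(AuditEntry(
            datetime.now(timezone.utc).isoformat(),
            user, record_id, field_name, old_value, new_value, reason,
        ))
        self._data[key] = new_value

    def get_value(self, record_id, field_name):
        return self._data[(record_id, field_name)]

    def trail(self):
        # Return a tuple so callers cannot modify the history in place.
        return tuple(self._trail)


store = AuditedStore()
store.set_value("coordinator1", "SUBJ-001", "SBP", 120, "initial entry")
store.set_value("datamanager1", "SUBJ-001", "SBP", 130, "corrected per query")
for entry in store.trail():
    print(entry.user, entry.field_name, entry.old_value, "->", entry.new_value, "|", entry.reason)
```

The point of the sketch is that the original value is never overwritten silently: the corrected value and the value it replaced can both be recovered later.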
Current Options REDCap: Not yet Request for Proposals to purchase 21 CFR part 11 compliant system: Oracle RDC, Medidata Rave, Study Manager, Omnicomm Trialmaster, TargetHealth Estimated timing for purchase: Summer 2012 Estimated timing for installation: End of 2012
NOT Options EXCEL spreadsheets Microsoft Access
Database Design Investigators, Coordinators, Data Managers, Biomedical Informatics, and Statisticians must work together Poor database design affects timings for: Data Entry Data cleaning Extraction Statistical Analysis
Things to do to improve quality of data in the database Enter data as soon as possible after it is received Run cleaning procedures throughout the study so that queries go out early Identify missing CRF pages and lab data by knowing what is expected. Use tracking systems Code AEs and meds frequently
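Identifying missing CRF pages "by knowing what is expected" can be sketched as a set difference between the pages the visit schedule says should exist and the pages actually received. The subject numbers, visit names, and form names below are hypothetical, and the sketch assumes every form is expected at every visit, which a real schedule of assessments would refine.

```python
def missing_pages(subjects, visits, forms, received):
    """Return expected (subject, visit, form) pages that have not been received."""
    expected = {
        (subject, visit, form)
        for subject in subjects
        for visit in visits
        for form in forms
    }
    return sorted(expected - set(received))


# Hypothetical study layout
subjects = ["001", "002"]
visits = ["SCREEN", "WEEK4"]
forms = ["DEMOG", "VITALS"]

# Pages logged as received so far (hypothetical tracking data)
received = [
    ("001", "SCREEN", "DEMOG"), ("001", "SCREEN", "VITALS"),
    ("001", "WEEK4", "DEMOG"), ("001", "WEEK4", "VITALS"),
    ("002", "SCREEN", "DEMOG"), ("002", "SCREEN", "VITALS"),
    ("002", "WEEK4", "DEMOG"),
]

for page in missing_pages(subjects, visits, forms, received):
    print("Missing:", page)
```

Run regularly during the study, a report like this lets queries for missing pages go out early instead of at database lock.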
Improving Quality (cont) Reconcile SAEs periodically throughout the study Get listings from safety system early so you know what to expect Begin to audit data against the CRF early to detect systematic problems Continue to audit as the study proceeds to monitor quality. Open the study documentation at the start of the study and make an effort to keep it updated as the study progresses.
Data Standards Facilitate Reproducible Research
Value of standard modules in CRF Design Completion instructions are the same Database design is the same Edit checks are the same Associated statistical analysis programs are the same Easier to compare and combine data across studies
Developing Standards Standards can include: Case Report Forms / Instructions Variable names Codelists Must be a commitment to following the standards that are developed
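As an illustration of what committing to standard variable names and codelists can look like in practice, the sketch below checks a data record against a small standard. The variable names and codes are hypothetical and illustrative only; they are not taken from any published codelist.

```python
# Hypothetical standard: variable name -> allowed codes.
# None means the variable is standard but takes free values.
STANDARD_VARIABLES = {
    "SUBJID": None,
    "SEX": {"M", "F", "U"},
    "SMOKER": {"Y", "N"},
}


def check_record(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, value in record.items():
        if name not in STANDARD_VARIABLES:
            problems.append(f"{name}: not a standard variable name")
            continue
        allowed = STANDARD_VARIABLES[name]
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: value {value!r} not in codelist {sorted(allowed)}")
    return problems


print(check_record({"SUBJID": "001", "SEX": "M", "SMOKER": "N"}))  # conforms
print(check_record({"SUBJID": "002", "Gender": "male"}))           # non-standard name
print(check_record({"SUBJID": "003", "SEX": "X"}))                 # code not in codelist
```

When every study uses the same names and codes, the same check (and the same downstream analysis programs) can be reused across studies.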
Resource for Standards Clinical Data Interchange Standards Consortium (CDISC) Global, open, multidisciplinary, non-profit organization that has established standards to support the acquisition, exchange, submission and archive of clinical research data and metadata.
CDISC Mission To develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare Standards are freely available via the CDISC website.
Leading the development of standards for data collection is a goal for the Data Management Center Will need a standards committee to develop and maintain them, involving: Clinical teams Data Management Statistics Pharmacovigilance Biomedical Informatics
Importance of Good Statistical Programming Practices to Reproducible Research
Statistical Programmers Work closely with Data Management to: Specify format of database that will be received by statistics Assist with data cleaning by running analysis programs on live database Benefit: Statisticians don't spend time cleaning data
Statistical Analysis Plan Includes: Hypotheses to be tested Identification of primary and secondary endpoints Statistical analysis approach for each hypothesis Assumption testing Alternative approaches Data handling decisions Draft tables and figures for reports/manuscripts
Good Statistical Programming Practices Clear documentation contained within programming code Creation of Analysis Datasets Ensures all programs are using a consistent set of derived data. Validation of Statistical Programs How: Second programming or code review What: All analysis datasets and tables and figures
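Validation by second programming can be sketched as follows: two programmers derive the same statistics independently from the specification, and a comparison step reports any statistic on which they disagree. The function names, the data, and the tolerance are assumptions made for illustration; in practice the two derivations would live in separate programs written by different people.

```python
import math
import statistics


def primary_summary(values):
    """Primary programmer's derivation: n, mean, and sample SD."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def qc_summary(values):
    """Second, independent derivation written from the spec, not from the primary code."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return {"n": n, "mean": mean, "sd": math.sqrt(variance)}


def compare(primary, qc, tol=1e-9):
    """Return the names of statistics that are missing on one side or differ beyond tol."""
    discrepancies = []
    for key in sorted(set(primary) | set(qc)):
        if key not in primary or key not in qc:
            discrepancies.append(key)
        elif not math.isclose(primary[key], qc[key], rel_tol=tol, abs_tol=tol):
            discrepancies.append(key)
    return discrepancies


# Hypothetical analysis variable
change_from_baseline = [1.2, -0.4, 3.1, 0.8, 2.5, -1.0]
issues = compare(primary_summary(change_from_baseline), qc_summary(change_from_baseline))
print("Validated" if not issues else f"Discrepancies in: {issues}")
```

Comparing with a small tolerance rather than exact equality avoids false alarms from floating-point rounding while still catching real differences, such as one side dividing by n instead of n - 1.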
Tips to Facilitate Reproducibility Produce Report-Ready Output RTF files cut and pasted into manuscript/report Authors need not edit the numerical output Authors can change titles and footnotes Conduct quality assurance audits of manuscripts/reports
Reproducibility (cont) Programs for Tables and Graphs Archive SAS Programs, list and log files All tables should contain a footnote with the location and name of the SAS program that created it When report is complete, change permissions to read-only for: All SAS programs, list and log files; Analysis datasets
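The two archiving steps above (a program footnote on every table, and read-only permissions once the report is complete) can be sketched in a few lines. The sketch is in Python rather than SAS, the path and file names are hypothetical, and it works on file permission bits only; a shared file server may need its own access controls as well.

```python
import os
import stat
import tempfile


def add_program_footnote(table_text, program_path):
    """Append a footnote naming the program that created the table."""
    return f"{table_text}\nProgram: {program_path}"


def make_read_only(path):
    """Remove write permission for owner, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def archive_read_only(directory):
    """Make every file under directory read-only and return the paths that were locked."""
    locked = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            make_read_only(path)
            locked.append(path)
    return sorted(locked)


# Hypothetical program location used for the footnote
print(add_program_footnote("Table 1. Demographics", "/studies/abc/programs/t_demog.sas"))

# Demonstration on a temporary directory standing in for the study archive
with tempfile.TemporaryDirectory() as archive:
    for name in ("t_demog.sas", "t_demog.log", "t_demog.lst"):
        with open(os.path.join(archive, name), "w") as handle:
            handle.write("placeholder\n")
    for path in archive_read_only(archive):
        print("read-only:", os.path.basename(path))
```

With the footnote in place, anyone reading a table in the final report can trace it back to the exact archived program that produced it.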
Where Can I Get Help? http://cctst.uc.edu/ Go to Research Central on Centerlink CCTST: Center for Clinical and Translational Science and Training CCTST can identify resources for your study
Resources Available Clinical Trial Office Biomedical Informatics Data Management Center Biostatistical Consulting Units Drug Poison Information Center
Thank you!