SDTM Validation: Methodologies and Tools
Bay Area CDISC Implementation Network Meeting
Friday, April 30th, 2010
Dan Shiu

Disclaimer
The ideas and examples presented here do NOT imply:
- They have been or will be implemented at Amgen
- They have not been or will not be implemented at Amgen
- Amgen agrees or disagrees with them
The ideas and examples presented here DO represent:
- My personal views
- My sweat and blood

Regulations, Guidance, and Expectations on SDTM Validation
- FDA 21 CFR Part 11: applies to computer systems (e.g. Base SAS) but not to the use/output of those systems (e.g. SAS programs/datasets)
- FDA Guidance for Industry, Study Data Specifications: for electronic submission, data tabulation datasets should follow SDTMIG
- FDA website, SDTM Validation Specifications: validation checks from FDA software tools (Janus)
- Data submitted to a regulatory agency is expected to be complete and accurate, regardless of the regulatory requirement

SDTM Validation Categories
SDTM Mapping Validation
- Raw Data -> Mapping Specifications/aCRF -> Programming -> SDTM Data
- Verify raw data is CORRECTLY and TRUTHFULLY converted to SDTM data
SDTM Compliance Checks
- Rules have been developed to ensure the software used by FDA (WebSDM by Phase Forward) can check and load the submitted SDTM data into their data warehouse (Janus)
- Each rule carries a degree of severity for non-compliance, which in the worst case may result in refusal to file

SDTM Mapping Validation vs. Compliance Checks
Categories: SDTM Mapping Validation, SDTM Compliance Checks
Example rules:
- The QS domain is not intended for use in submitting diaries capturing routine study data
- Measurement, Test, or Examination values must have a consistent standard unit value (--STRESU) across all records in EG, LB, QS, VS
- Start Date/Time of Observation (--STDTC) must be less than or equal to End Date/Time of Observation (--ENDTC)

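Two of the example rules lend themselves to a small programmatic check. The Python sketch below is illustrative only and is not taken from WebSDM or Janus. The function names and sample records are hypothetical, dates are compared only when both values have the same ISO 8601 precision, and unit consistency is assumed to be judged per test code (--TESTCD).

```python
from collections import defaultdict


def start_after_end(records, prefix):
    """Return 1-based record numbers where --STDTC is later than --ENDTC."""
    flagged = []
    for number, rec in enumerate(records, start=1):
        start = rec.get(f"{prefix}STDTC")
        end = rec.get(f"{prefix}ENDTC")
        # Assumption: ISO 8601 strings of equal length (same precision)
        # sort chronologically, so a plain string comparison is enough.
        if start and end and len(start) == len(end) and start > end:
            flagged.append(number)
    return flagged


def inconsistent_units(records, prefix):
    """Return test codes that carry more than one --STRESU value."""
    units = defaultdict(set)
    for rec in records:
        unit = rec.get(f"{prefix}STRESU")
        if unit:
            units[rec[f"{prefix}TESTCD"]].add(unit)
    return {test: sorted(found) for test, found in units.items() if len(found) > 1}


# Hypothetical sample records
ae = [
    {"AESTDTC": "2010-01-05", "AEENDTC": "2010-01-09"},
    {"AESTDTC": "2010-02-10", "AEENDTC": "2010-02-01"},  # start after end
    {"AESTDTC": "2010-03", "AEENDTC": "2010-03-15"},     # partial date, skipped
]
vs = [
    {"VSTESTCD": "WEIGHT", "VSSTRESU": "kg"},
    {"VSTESTCD": "WEIGHT", "VSSTRESU": "LB"},
    {"VSTESTCD": "HEIGHT", "VSSTRESU": "cm"},
]

print(start_after_end(ae, "AE"))
print(inconsistent_units(vs, "VS"))
```

Partial dates are skipped rather than guessed at, so they would still need review by other means.
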
SDTM Validation Methodologies
SDTM Mapping Validation
- Full Independent-programming
- Risk-based QC Process
- Characteristics-based QC Process
SDTM Compliance Checks
- WebSDM (v1.5/v2.6/v3.0)
- Janus (v1.0 Draft)
- Other SDTMIG custom checks

Full Independent-Programming
- Create SDTM mapping specifications/aCRFs
- Programmer creates production SDTM datasets based on the mapping specifications/aCRFs
- QC role creates QC SDTM datasets based on the same mapping specifications/aCRFs
- PROC COMPARE production vs. QC SDTM datasets
- Resolve discrepancies until production SDTM matches QC SDTM

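The comparison step can be pictured with a small sketch. This is not PROC COMPARE, only a minimal Python analogue of the idea: match production and QC records on key variables and list every value that differs. The function name, key variables, and sample records are hypothetical.

```python
def compare_datasets(production, qc, keys):
    """List discrepancies between two datasets matched on key variables."""
    def index(rows):
        return {tuple(row[k] for k in keys): row for row in rows}

    prod_by_key, qc_by_key = index(production), index(qc)
    issues = []
    for key in sorted(set(prod_by_key) | set(qc_by_key), key=str):
        if key not in qc_by_key:
            issues.append(f"{key}: only in production")
        elif key not in prod_by_key:
            issues.append(f"{key}: only in QC")
        else:
            p, q = prod_by_key[key], qc_by_key[key]
            # Compare every variable present on either side.
            for var in sorted(set(p) | set(q)):
                if p.get(var) != q.get(var):
                    issues.append(
                        f"{key}: {var} production={p.get(var)!r} qc={q.get(var)!r}"
                    )
    return issues


# Hypothetical production and QC versions of an AE dataset
production = [
    {"USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"USUBJID": "S1-002", "AESEQ": 1, "AETERM": "NAUSEA"},
]
qc = [
    {"USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"USUBJID": "S1-002", "AESEQ": 1, "AETERM": "Nausea"},
]

issues = compare_datasets(production, qc, ["USUBJID", "AESEQ"])
for issue in issues:
    print(issue)
```

An empty list corresponds to the point where production matches QC.
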
Issues with Full Independent-Programming
- Result is still dependent and biased
- Inconsistent QC process across products/studies/milestones
- QC not based on risk -> more time spent on less important, lower-risk issues
- Double resources -> programmers, code, datasets, documentation
- Inefficiency -> delayed deliverables

Risk-based QC
- Not all uses of SDTM data are equally important
- Not all programming steps are equally error-prone
- Align QC efforts with the intended use of SDTM as well as the programming steps used to produce the data
- Spend most of your QC resources on data with the greatest business/quality risk!

Risk-based QC Concept
Risk Assessment Examples: Complexity
Programming Complexity
Low
- No pooling or merging of data
- No calculations or derivations
- Basic data steps and sorting
Medium
- Simple data merges
- Simple pre-processing of data: sub-setting, where/if clauses, retains, arrays, transposing
- Steps involving validated/standard macros
High
- Complex merging of data across various source data
- Complex derivation and calculation of data

Risk Assessment Examples: Intended Use
Intended Use of SDTM Data
Low
- Internal use only
- Not to be used for major business decisions
Medium
- Data/safety review
- Non-endpoint data
High
- Regulatory submission
- Primary analysis/final CSR
- Endpoint safety and efficacy data

Risk-based QC Method Examples
Method | Responsibility | Time Needed
Log Review (use an automated log checking utility to detect potential errors) | Programmer, QC Role | Short
Code Review (line-by-line review of code and log) | QC Role, designated group | Medium
Requirements/Specifications Review (comparison of SDTM data with specifications/aCRF) | Programmer, QC Role, Statistician | Medium
Spot Check Review (ad hoc programming/visual checks on SDTM/raw data) | QC Role, Statistician | Medium
Independent Programming (programming to produce matching datasets) | QC Role | Long

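An automated log checking utility, the first method listed, can be as simple as a pattern scan. The Python sketch below is a hypothetical illustration and does not describe any particular in-house tool. It flags ERROR and WARNING lines plus a few NOTE messages that often signal trouble in SAS logs. The pattern list and sample log are assumptions and far from exhaustive.

```python
import re

# Assumed patterns; a production utility would carry a longer, reviewed list.
LOG_PATTERNS = [
    ("ERROR", re.compile(r"ERROR(\s+\d+-\d+)?:")),
    ("WARNING", re.compile(r"WARNING(\s+\d+-\d+)?:")),
    ("NOTE", re.compile(r"NOTE:.*(uninitialized|repeats of BY values|converted to)",
                        re.IGNORECASE)),
]


def review_log(log_text):
    """Return (line number, label, text) for each suspicious log line."""
    findings = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        text = line.strip()
        for label, pattern in LOG_PATTERNS:
            # match() anchors at the start of the stripped line
            if pattern.match(text):
                findings.append((lineno, label, text))
                break
    return findings


# Hypothetical log excerpt
sample_log = "\n".join([
    "NOTE: There were 120 observations read from the data set RAW.DEMOG.",
    "NOTE: Variable AGEU is uninitialized.",
    "WARNING: Multiple lengths were specified for the variable USUBJID by input data set(s).",
    "NOTE: MERGE statement has more than one data set with repeats of BY values.",
    "ERROR 180-322: Statement is not valid or it is used out of proper order.",
])

findings = review_log(sample_log)
for lineno, label, text in findings:
    print(lineno, label, text)
```
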
Risk Matrix Examples
Rows: Complexity of Program. Columns: Intended Use (Business Risk/Impact of Error).
High complexity
- Low intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Code Review
- Medium intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Spot Check Review, 4. Code Review
- High intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Independent Programming
Medium complexity
- Low intended use: 1. Log Review, 2. Requirements/Specifications Review
- Medium intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Spot Check Review
- High intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Spot Check Review, 4. Code Review
Low complexity
- Low intended use: 1. Log Review, 2. Requirements/Specifications Review
- Medium intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Spot Check Review
- High intended use: 1. Log Review, 2. Requirements/Specifications Review, 3. Spot Check Review

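A risk matrix like this one can be stored as a lookup table so that the QC plan for a program follows directly from its two ratings. The Python sketch below is one possible encoding. The cell contents are a reading of the matrix above and should be treated as an assumption, and the function and constant names are hypothetical.

```python
LOG = "Log Review"
SPEC = "Requirements/Specifications Review"
SPOT = "Spot Check Review"
CODE = "Code Review"
INDEP = "Independent Programming"

# Keys are (program complexity, intended use); values are ordered QC methods.
# Assumption: cell assignments follow one reading of the slide's matrix.
RISK_MATRIX = {
    ("high", "low"): [LOG, SPEC, CODE],
    ("high", "medium"): [LOG, SPEC, SPOT, CODE],
    ("high", "high"): [LOG, SPEC, INDEP],
    ("medium", "low"): [LOG, SPEC],
    ("medium", "medium"): [LOG, SPEC, SPOT],
    ("medium", "high"): [LOG, SPEC, SPOT, CODE],
    ("low", "low"): [LOG, SPEC],
    ("low", "medium"): [LOG, SPEC, SPOT],
    ("low", "high"): [LOG, SPEC, SPOT],
}


def qc_methods(complexity, intended_use):
    """Look up the QC methods for a program's complexity and intended use."""
    return RISK_MATRIX[(complexity.lower(), intended_use.lower())]


print(qc_methods("High", "High"))
print(qc_methods("Low", "Medium"))
```
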
Characteristics-based QC
SDTM Mapping Validation: Raw Data -> Mapping Specifications/aCRF/Programming -> SDTM Data
- Full Independent-programming
- Risk-based QC
Are these the best ways?

Characteristics-based QC Concept
"Grandma, what big eyes you have! Grandma, what big ears you have! Grandma, what big teeth you have!"
- Each data element has characteristics
- Characteristics describe a data element as a whole
- If all characteristics match, data elements match
- If all data elements match, raw data is CORRECTLY and TRUTHFULLY converted to SDTM

Data Element Examples
Data Element: a group of data, regardless of datasets, variables, records, or attributes, that together represent a precise meaning or semantics
CDISC SHARE Project: The vision for CDISC SHARE is to build a global, accessible electronic library, which through advanced technology, enables precise and standardized data element definitions that can be used in applications and studies to improve biomedical research and its link with healthcare.
Examples:
- Age Element: USUBJID, AGE
- Race Element: USUBJID, RACE, SUPPDM.QNAM=RACEOTH, QVAL
- AE Term Element: USUBJID, AETERM, AEDECOD
- SF36 Score Element: USUBJID, QSCAT=SF36, QSORRES, QSSTRESC, QSSTRESN, QSSTAT, QSREASND

Data Element Characteristics
Numeric Characteristics
- Descriptive statistics: can be generated from PROC SUMMARY, PROC MEANS, PROC UNIVARIATE
  - N, NMISS, MIN, MAX, MEAN, MODE
  - SUM, RANGE, VAR, STD, STDMEAN
  - Coefficient of Variation, Skewness, Kurtosis
Character Characteristics
- FREQ, NOBS, min/max length
- Checksum: e.g. odd parity bit, a simplified algorithm
  - Pain = 01010000 01100001 01101001 01101110
  - Count the number of 1s -> 14
  - To keep odd parity, add 1 to 14 -> checksum = 1
- If all checksums match -> all character values match
- If statistics of all checksums match -> all character values match

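The parity example can be reproduced in a few lines. The Python sketch below assumes ASCII encoding and uses hypothetical function names. It counts the 1 bits in a character value and returns the bit needed to make the total odd.

```python
def bit_string(text, encoding="ascii"):
    """Return the value as a string of 0s and 1s, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


def count_ones(text, encoding="ascii"):
    """Count the 1 bits across all bytes of the value."""
    return bit_string(text, encoding).count("1")


def odd_parity_bit(text, encoding="ascii"):
    """Return the bit that makes the total number of 1s odd."""
    return 0 if count_ones(text, encoding) % 2 == 1 else 1


print(bit_string("Pain"))      # 01010000011000010110100101101110
print(count_ones("Pain"))      # 14
print(odd_parity_bit("Pain"))  # 1
```

A single parity bit is, as the slide says, a simplified algorithm: values that differ by an even number of bits share the same parity, so a stronger checksum may be preferable in practice.
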
Characteristics-based QC Examples
QC on Age Element
- From raw data: demog.age_raw
- From SDTM: DM.AGE
- Compare: N, MIN, MAX, MEAN, MODE, SUM, STD
QC on AE Term Element
- From raw data: adverse.subjectid, adverse.aevt, adverse.aept
- From SDTM: AE.USUBJID, AE.AETERM, AE.AEDECOD
- Compare: FREQ, NOBS, min/max length, checksum
QC on SF36 Score Element
- From raw data: sf36.subjectid, sf36.score_raw, sf36.cmt
- From SDTM: QS.USUBJID, QS.QSCAT=SF36, QS.QSORRES, QS.QSSTRESC, QS.QSSTRESN, QS.QSSTAT, QS.QSREASND
- Compare numeric: N, NMISS, MIN, MAX, MEAN, MODE, SUM, STD, RANGE
- Compare character: FREQ, NOBS, min/max length

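The Age Element comparison might look like the following. This is a minimal Python sketch rather than the SAS procedures named in the slides. The sample values are invented, the function names are hypothetical, and only a subset of the listed statistics is computed.

```python
import math
import statistics


def numeric_characteristics(values):
    """Summarize a numeric data element; None stands for a missing value."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "NMISS": len(values) - len(present),
        "MIN": min(present),
        "MAX": max(present),
        "MEAN": statistics.mean(present),
        "MODE": statistics.mode(present),
        "SUM": sum(present),
        "STD": statistics.stdev(present),
    }


def mismatched_characteristics(raw_values, sdtm_values):
    """Return the names of characteristics that differ between raw and SDTM."""
    raw = numeric_characteristics(raw_values)
    sdtm = numeric_characteristics(sdtm_values)
    return sorted(name for name in raw if not math.isclose(raw[name], sdtm[name]))


# Hypothetical values standing in for demog.age_raw and DM.AGE
age_raw = [34, 51, 47, 51, None]
dm_age_good = [34, 51, 47, 51, None]
dm_age_bad = [34, 51, 74, 51, None]  # 47 mistyped as 74

print(mismatched_characteristics(age_raw, dm_age_good))
print(mismatched_characteristics(age_raw, dm_age_bad))
```

The transposed digits leave N, NMISS, MIN, and MODE unchanged, which is why several characteristics are compared together rather than one.
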
Characteristics-based QC Benefits
- Data element characteristics exist as soon as data is created/refreshed
- Characteristics-based QC is an extension of risk-based QC in a more consistent way
- Characteristics-based QC can be applied to all end-to-end data conversion processes (e.g. raw to SDTM, SDTM to ADaM)
- Characteristics-based QC can be automated!

SDTM Compliance Checks
[Diagram labels: Raw Data, SDTM, Mapping Validation]

SDTM Validation and Loading at FDA
[Flow diagram labels]
- Sponsor, Electronic Submission: SDTM, Define.xml, eCTD
- Communication
- FDA Electronic Document Room
- Data Validation and Loading: WebSDM Checks (Pass), JANUS Checks (Pass)
- JANUS Data Repository
- FDA Review Tools: JMP, J-Review, WebSDM, etc.
- Review
- Communication / Refuse to File

WebSDM v3.0 Checks
- 154 rules based on SDTMIG 3.1.2
- Checks apply to data (classes, domains, variables, values) and metadata (define.xml, SDTM Terminology.xls)
- Severity (Low, Medium, High) is only an indicator of potential problems or anomalies in the data. There is no direct correlation between a severity value and an FDA decision about whether the data is acceptable for review.

Janus v1.0 (Draft) Checks
- 109 rules based on SDTMIG 3.1.1
- Overlap with WebSDM rules, but with different definitions of the severity levels
Severity | Description
High | The error is serious and will prevent the study data from being loaded successfully into the Janus repository. The SDTM study will not be loaded into the Janus repository.
Medium | The error may impact the reviewability of the submission, but will not have an impact on loading the study data into the Janus repository. The SDTM study will be loaded into the Janus repository.
Low | The error may or may not impact the reviewability or the integrity of the submission but will not have an impact on loading the study data into the Janus repository. The SDTM study will be loaded into the Janus repository.

WebSDM vs. Janus Severity
- WebSDM and Janus may assign different severity levels for the same rule

Custom SDTMIG Compliance Checks
WebSDM/Janus checks cannot cover all of the explicit/implicit rules in SDTMIG:
- 8/40/200 character limitation check
- USUBJID value must be unique for each subject across all trials in the submission
- IDVAR (variable), IDVARVAL (record) reference check against parent domain for CO
- ISO 8601 format check on Duration, Elapsed Time, and Interval values
- And many more

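Two of these custom checks are sketched below in Python as an illustration, not as a replacement for any validator. The 8/40/200 limits are read here as variable name, variable label, and character value lengths. The duration pattern is a simplified assumption that covers the common PnYnMnDTnHnMnS and PnW forms only. All function names and sample data are hypothetical.

```python
import re

# Simplified ISO 8601 duration pattern (assumption): PnYnMnDTnHnMnS or PnW,
# with an optional decimal fraction on seconds only.
ISO8601_DURATION = re.compile(
    r"^P(?=.)(?:\d+W|(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?)$"
)


def is_iso8601_duration(value):
    """Return True if the value matches the simplified duration pattern."""
    return bool(ISO8601_DURATION.match(value))


def check_lengths(variables):
    """Flag names over 8, labels over 40, and character values over 200."""
    issues = []
    for var in variables:
        if len(var["name"]) > 8:
            issues.append(f"{var['name']}: name longer than 8 characters")
        if len(var["label"]) > 40:
            issues.append(f"{var['name']}: label longer than 40 characters")
        for value in var["values"]:
            if isinstance(value, str) and len(value) > 200:
                issues.append(f"{var['name']}: value longer than 200 characters")
                break  # one finding per variable is enough for this sketch
    return issues


# Hypothetical variable metadata and values
variables = [
    {"name": "AETERM", "label": "Reported Term for the Adverse Event",
     "values": ["HEADACHE", "x" * 201]},
    {"name": "AESTARTDATE", "label": "Start Date", "values": ["2010-01-05"]},
]

print(check_lengths(variables))
print([d for d in ["P2Y", "P1DT12H", "PT30M", "P2W", "2 days", "PT"]
       if not is_iso8601_duration(d)])
```
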
Tools for SDTM Compliance Checks
Proprietary Software:
- WebSDM from Phase Forward, etc.
Free Software:
- OpenCDISC Validator
  - Direct download and installation on PC
  - Graphical user interface
  - Reporting in Excel, CSV, and HTML
- SAS Clinical Standards Toolkit
  - PC/UNIX installation support from IT
  - Interactive/Batch SAS programming interface
  - Reporting functions not provided but can be custom-built

SAS CST is a framework including:
- Directory structure
- Metadata: datasets, format catalog, XML, Excel
- Data: datasets, format catalog, XML, Excel
- Source code: SAS programs/macros

Tools Comparison
Installation
- OpenCDISC Validator: user direct download; PC/USB flash drive, tweak on UNIX
- SAS Clinical Standards Toolkit: IT/SAS administrator support; PC (9.1.3/9.2) and UNIX (9.2)
Interface
- OpenCDISC Validator: graphical user interface
- SAS Clinical Standards Toolkit: interactive/batch SAS programming interface
Supported Standards / Features
- OpenCDISC Validator: validate SDTMIG 3.1.1/3.1.2 based on WebSDM v3/Janus v1 draft; additional custom checks; CDISC-NCI Terminology; generate/validate define.xml based on CRTDDS v1
- SAS Clinical Standards Toolkit: validate SDTMIG 3.1.1 based on WebSDM v2.6/Janus v1 draft; additional custom checks; CDISC-NCI Terminology; generate/validate define.xml based on CRTDDS v1
Reporting
- OpenCDISC Validator: Excel/CSV/HTML reports; can only limit the number of occurrences per rule; WebSDM/Janus rule ID on website but not on reports; severity levels follow Janus
- SAS Clinical Standards Toolkit: results in SAS datasets; can limit the number of occurrences per rule/dataset/actual value; WebSDM/Janus ID in results; severity levels follow WebSDM/Janus

Tools Comparison (Cont'd)
Processing
- OpenCDISC Validator: real memory; checks on SAS transport XPT or other delimited text files
- SAS Clinical Standards Toolkit: disk and real memory; redundant processing steps; checks on SAS datasets
Performance
- OpenCDISC Validator: fair (hours) for small studies but potential memory crash for large studies
- SAS Clinical Standards Toolkit: to be improved (1+ day)
Maintenance
- OpenCDISC Validator: open XML code for configuration; open Java code on website; standard/custom metadata in XML/Excel
- SAS Clinical Standards Toolkit: open source SAS code/configuration; standard/custom metadata in SAS datasets
Flexibility
- OpenCDISC Validator: need XML/Java expertise for any customization/enhancement
- SAS Clinical Standards Toolkit: select/deselect rules to check in SAS code; build custom checks with SAS code; build graphical user interface in SAS/Excel
Documentation
- OpenCDISC Validator: website instructions
- SAS Clinical Standards Toolkit: installation instructions; IQ/OQ document; examples/exercises; User's Guide
Technical Support
- OpenCDISC Validator: website forum
- SAS Clinical Standards Toolkit: SAS technical support by phone/email/website

References and Contact
- FDA Guidance for Industry, Part 11, Electronic Records; Electronic Signatures - Scope and Application: http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm072322.pdf
- FDA Guidance for Industry, Study Data Specifications (v1.5.1): http://www.fda.gov/downloads/drugs/developmentapprovalprocess/formssubmissionrequirements/electronicsubmissions/ucm199759.pdf
- WebSDM Checks: http://www.phaseforward.com/products/cdisc
- Janus Checks: http://www.fda.gov/forindustry/datastandards/studydatastandards/ucm155327.htm
- OpenCDISC Validator: http://www.opencdisc.org
- SAS Clinical Standards Toolkit: http://ftp.sas.com/techsup/download/hotfix/12clintlkt.html
Contact Information: dan.shiu@amgen.com