Supplementary Web Material Development of an open metadata schema for Prospective Clinical Research (openpcr) in China A. Methods We used Singapore Framework for Dublin Core Application Profiles (DCAP), a generic construct for designing metadata records, [1] to develop openpcr. The framework holds the design and documentation components of specific metadata applications which can be grouped into four processes for metadata schema development: defining functional requirements, developing a domain model, defining metadata terms and designing a metadata record, and finally developing implementation syntax. We followed the four steps in modeling openpcr. A.1 Step 1: Define the Core Functional Requirements and Deduce the Core Metadata Items Defining specific goals of openpcr was an essential first step [1]. We used business process modeling method to define the top level objective. The general business process cycle of conducting a CR includes defining the problem and reviewing the relevant literature, preparing and approving a protocol, conducting research and collecting data, analyzing data and completing the final report, and producing new hypothesis [2, 3]. Designing openpcr is in the process of preparing and approving a protocol. Before conducting CR, researchers will develop a protocol, register to clinical registry and design case report form (CRF). The Declaration of Helsinki states Every clinical trial must be registered in a publicly accessible
database before recruitment of the first subject [4]. CRF, defined as a printed, optical, or electronic document designed to record all of the protocol required information to be reported to the sponsor on each trial subject, is the major clinical data collection tool [5]. The protocol is one of the most important documents in CR. Its general information is useful for registration, and trial design information is of the most importance in developing data collection tools [5, 6]. We defined the registration information, trial design information and CRF information as the basic information in developing openpcr, so the top level objective of openpcr can be specifically described as developing a metadata schema for research registration, trial design and CRF (Figure 1). Figure 1: Clinical research process and openpcr objective We then used both business process modeling method and literary warrant method to create the core functional requirements. We systematically reviewed hundreds of sources in standards, regulation, best practice guidelines, and studies on trial registration and development of CR data schema. Based on the top level objective, we divided the core functional requirements into three parts: requirements for research registration, requirements for trial design, and requirements for CRF. We further examined the specific business process
of registration, trial design and CRF, created specific requirements from selected sources, and deduced specific metadata items. The selected sources included two standard guidelines, five clinical trial registry documents, and two standards for clinical data submission and one standard for clinical data collection issued by the Clinical Data Interchange Standards Consortium (CDISC). Two standard guidelines are ICH GCP (International Conference on Harmonisation of technical requirements for registration of pharmaceuticals for human use - Good Clinical Practice) [5] and DCTQCS (Drug Clinical Trial and the Quality Control Standard of China) [7]. ICH GCP is an international ethical and scientific quality standard controlling the whole process cycle of CR. Its main purpose is to protect not only research subject but also the quality and credibility of data. DCTQCS of China has come into force since 2003. Based on the Pharmaceutical Administration Law of China and referenced by ICH GCP, it basically conforms to international standards and national circumstances. The trial design parts of ICH GCP and DCTQCS are important sources for trial design requirements. Five clinical trial registry documents are Ottawa Statement on Trial Registration [8], the WHO Trial Registration Data Set [9], and data sets in ClinicalTrial.gov (USA) [10], Chinese Clinical Trial Registry (China) [11] and ISRCTN Register (Britain) [12]. In creating the core functional requirements of registration, we reviewed Ottawa Statement, a guide to implement global trial registration. WHO Trial Registration Data Set (TRDS) was also included which provides the minimum amount of trial information that must appear in a register in order for a given trial to be considered fully registered. We also elicited those mandatory elements from ClinicalTrial.gov [13], ChiCTR and ISRCTN [11, 14] that are not embraced by TRDS,
because they were often registered databases for Chinese researchers [15]. Two standards for clinical data submission and one standard for clinical data collection issued by CDISC are Study Data Tabulation Model (SDTM) [16], SDTM Implementation Guide (SDTMIG) [17], and Clinical Data Acquisition Standards Harmonization (CDASH) [18]. STDM, STDMIG and CDASH are standards focusing on creating trial design and data collection requirements. STDM defines standard datasets and variables collected and submitted to a regulatory authority, and STDMIG provides specific details and examples for preparing standard datasets based on STDM. The Trial Design Model in the STDM represents standard datasets describing the plan of a clinical trial and helping to determine what data to be collected. CDASH describes basic standard data fields for the collection of clinical trial data. STDM and CDASH are closely related and harmonized, and both of them are intended for the collection of clinical data. If identical data are met between the two standards, the CDASH data collection fields would adopt the SDTM variable names. Based on the selected sources, we created the framework of core functional requirements, developed the requirements specification and deduced core metadata items. A.2 Step 2: Develop Archetype Models The next step was to develop archetype models based on the core metadata items. An archetype model is a kind of domain model containing openpcr metadata structure and relationships, and is the basic blueprint for the construction of the archetype records. During this process, metadata items were standardized, reorganized and structured to form archetype models. Structures of archetype models were visualized in a mind map. Firstly, metadata items of the same semantic meaning and content were combined
according to the core functional requirements result. Then, we standardized the Chinese term of each metadata item since most of them are from sources in other countries and organizations. As to metadata items deduced from the requirements of registration, we used the Chinese names of data fields in Chinese ChiCTR registry as the authority names which correspond to item names in TRDS, ClinicalTrial.gov and ISRCTN. Metadata items from requirements of trial design and CRF are clinical related data. We used National Health Data Dictionary (NHDD) [19] and EHR Architecture and Data Standard of China (Chinese EHR Standard) [20] as the standard name source. If there was a corresponding Chinese authority name, we adopted the name. If there was no corresponding Chinese name, we gave the name to the metadata item deliberately and consistently. Then we grouped archetypes into four categories according to Data Structure in Chinese EHR Standard. Data Structure in Chinese EHR Standard includes four hierarchical levels: document, section, data group, and data element. A document is the top level container, and can be organized into sections or data groups. Sections provide logical categories, and can be divided into a number of data groups. A section can be defined by its contained data groups. A data group is a collection of data elements or smaller data groups for holding related items of information. The data element is an indivisible data unit represented by data attributes such as identifier, definition, representation, and value domain. Data Structure in Chinese EHR Standard is analogous to EHR Information Model in openehr. We used Data Structure in Chinese EHR Standard as the information model of openpcr for future consideration of EHR and CR integration. Based on the four hierarchical levels of Data Structure in Chinese EHR Standard, we developed four classes of archetypes: document, section, data group, and data
element. Document archetypes were the top level archetypes that include public document, internal document, and clinical document which correspond to the top level requirements of registration, trial design, and CRF. Both data group archetypes and data element archetypes could be reused and combined into other archetypes. In public document archetype, data elements for registration are general information including title, identification, summary, and etc., and most of them are in compliance with elements in Dublin Core (DC) Metadata Element Set [21]. Since DC Metadata Element Set is a widely accepted metadata standard, which provides a means for categorizing and describing resources, and makes public registration information more easily searchable and deliverable on the web, we used it to structure and represent data in public document. Internal document archetype includes trial design information. Internal document can not only help to model the structured protocol [6], but also provide value domains for clinical metadata. The trial design information is mainly from ICH GCP, DCTQCS and SDTM. ICH GCP and DCTQCS provide required description information of trial design, while SDTM provides five datasets: Trial Arms, Trial Elements, Trial Visits, Trial Inclusion/Exclusion, Trial Summary, and study identification information [14]. The Trial Summary dataset contains information about the planned trial characteristics, mainly represented in public document and we did not include Trial Summary dataset in internal document. Therefore, internal document contained data group about design description, study identification, arms, elements, visits, and inclusion/exclusion. Unlike study-level metadata containers as public document and internal document, clinical document includes subject-level metadata and is broken up into a header section and a body
section. The header section is composed of subject identifier, arm, specific study visit, and etc. The body section includes clinical data for CRF corresponding to each study visit. Most of CDASH Domain Tables have corresponding domains in STDM. We made each CDASH Domain Table the top level data group in the body section, and converted variables of Category and Subcategory in corresponding domain in SDTM into smaller hierarchical data groups. When the core data elements under the top level data group can be reused by the smaller data groups, we created a generic data group containing the core data elements. Smaller data groups can reference the generic data group as a slot. If a CDASH domain table has a corresponding data group in Chinese EHR Standard, we analyzed the data group and determined if the structure of the data group could be used to categorize the CDASH domain table into lower data groups. For example, the domain table of physical examination in CDASH has a corresponding data group of Physical Examination in Chinese EHR Standard which is divided into ten categories: General Appearance, Skin, Lymph Node, Head, Neck, Chest, Abdomen, Genital Anal Rectum, Spine and Limbs, and Function (Disability). Then in the body section of clinical document, data group: Physical Examination includes smaller data groups of General Appearance, Skin, Lymph Node, Head, Neck, Chest, Abdomen, Genital Anal Rectum, Spine and Limbs, and Function (Disability). All the smaller data groups have three core data elements: Body System Examined, Examination Result, and Abnormal Findings. We formed a generic data group of physical examination containing three data elements: Body System Examined, Examination Result, and Abnormal Findings, which can be reused by data groups under physical examination as a slot.
A.3 Step 3: Define Metadata Terms and Develop Archetype Records As a kind of domain model, the archetype model represents concepts and relationships among them, and is always described using notations for modeling. An archetype record is based on archetype models, and is a technical specification holding metadata and metadata attributes within archetypes. Archetypes are grouped into four categories according to Data Structure in Chinese EHR Standard: document, section, data group, and data element. Document, section and data group are container metadata. The data element is the smallest named metadata that can be assigned with a value. All the metadata can be described by metadata attributes. Basic attributes facilitate the common understanding of the meaning and representation of the metadata [22] and constrain specifying technical details such as the obligation or occurrence of elements or restrictions on allowable values. In order to keep the definition consistency, avoid definition duplication, and increase metadata interoperability across China, we defined the openpcr metadata in accordance with ISO/IEC 11179-3:2003(E) Information technology - Metadata registries (MDR) - Part 3: Registry metamodel and basic attributes (ISO 11179-3) [22] and metadata attributes in NHDD and Chinese EHR Standard. ISO/IEC 11179-3 specifies 45 basic metadata attributes grouped to seven categorizes including: Common attributes, Attributes specific to Data Element Concepts, Attributes specific to Data Elements, Attributes specific to Conceptual Domains, Attributes specific to Value domains, Attributes specific to Permissible Values, Attributes specific to Value Meanings. Common attributes are further categorized as: Identifying, Definitional, Administrative, and Relational. Metadata attributes in NHDD and Chinese EHR Standard are developed based on ISO/IEC
11179-3. NHDD specifies five attribute categories including: Identifying and definition, Collection and usage guides, Source and reference, Relation, and Administration with 24 metadata attributes. Chinese EHR Standard contains six classes of attributes: Identification, Definition, Relation, Representation, Administration, and Extension with 17 mandatory metadata attributes. Extension class includes attributes for usage guides. In openpcr, we grouped metadata attributes into six categories according to NHDD and Chinese EHR Standard: Identification, Definition, Relation, Representation, Administration, and Usage guides. Container metadata such as document, section and data group have no attribute class of Representation. If attributes in the six classes already existed in NHDD and Chinese EHR Standard, we adopted them directly. If there were no corresponding ones, we created them according to Rules for Data Element Standardization of Health Information of China [23]. We developed 24 basic attributes for specifying metadata in openpcr (Table 1). Based on archetype models and the basic metadata attributes, we created the archetype records of openpcr. Table 1: Metadata attributes for openpcr Attribute categories Identification Attribute names MetadataIdentifier InternalName ChineseName EnglishName SynonymousNames Attribute categories Relation Attribute names Mapping Hierarchy Relevance ConditionOfUse MetadataType Usage guides Obligation Definition Definition Source Occurrence Representation (No need for Container metadata) ValueDomainName ValueDomainIdentifier DataType Administration Example RegistrationAuthority RegistrationStatus
Attribute categories Attribute names DataFormat Attribute categories Attribute names PermittedValue UnitOfMeasure SubmissionContact A.4 Step 4: Define Implementation Syntax The archetype models and records are semantic part of openpcr. We also developed the implementation syntax by XML, which is a simple and flexible text format used to publish and exchange a wide variety of electronic data. As a vendor neutral and platform independent markup language [24], XML is now widely used in the field of medicine. In the healthcare environment, HL7 Clinical Document Architecture (CDA) is a document markup standard for the purpose of exchange of clinical documents in XML. OpenEHR also provides XML schema to represent Archetype Object Model (AOM). In the CR environment, CDISC has developed Operational Data Model (ODM) based on XML schema to exchange and archive clinical trials data. CDISC has also released Study Design Model in XML (SDM-XML) standard which allows organizations to provide rigorous, machine-readable, interchangeable descriptions of the designs of their clinical studies. It is an extension to core ODM [25]. OpenPCR in XML will benefit for data exchange among other EHR standards or CR data standards. OpenPCR information model is composed of documents, sections, data groups, and data elements. Accordingly, information model in XML includes <document>, <section>, <data group>, and <data element>. <data group> can be nested by lower data groups using the same node tag-<data group>. We used UML to represent openpcr archetypes and to develop XML schema. Fragments
of the XML instances representing clinical document archetype were given to express the syntactical part of a piece of archetype records.
Table 2: Core functional requirements of registration and deduced metadata items in openpcr Requirement framework Extract Deduced metadata items Registration Categories Requirement description Agent 1 registration should include sponsors and collaborators information. (Ottawa 1 primary sponsor 2 secondary sponsor(s) Statement, TRDS, ClinicalTrials.gov, 3 contact for public queries ISRCTN, ChiCTR) 4 contact for scientific queries 5 source(s) of monetary or material Business 2 registration should include 6 primary registry process identification information assigned to the trial single international source. (Ottawa 7 primary trial identifying number 8 secondary identifying numbers Statement, TRDS, ClinicalTrials.gov, 9 date of registration in primary registry ISRCTN, ChiCTR) 10 date of first enrollment 3 registration should include key dates related to the process of the trial. (Ottawa Statement, TRDS, ClinicalTrials.gov, ISRCTN, ChiCTR) Record content 4 registration should include minimum protocol items to enable critical appraisal of trial methodology and statistical analyses. (Ottawa Statement, TRDS, ClinicalTrials.gov, ISRCTN, ChiCTR) 11 public title 12 scientific title 13 health condition(s) 14 intervention(s) 15 key inclusion and exclusion criteria 16 study type 17 target sample size 18 primary outcome(s) 19 key secondary outcomes 20 countries of recruitment 21 brief summary (or lay summary) Mandates 5 registration should include the consent forms approved by the IRB/IEC. (Ottawa Statement, ClinicalTrials.gov, ISRCTN) 22 board approval (or ethical approval)
Table 3: Core functional requirements of trial design and deduced metadata items in openpcr Requirement framework Extract Deduced metadata items Trial design Categories Requirement description Study identification 1 trial design should include study identification. (SDTMIG) 1 study identifier 2 site identifier Study 2 trial design should include primary 3 study type methods and secondary endpoints. (ICH GCP 6.4.1) 3 trial design should include study type. (ICH GCP 6.4.2) 4 method of allocation 5 blinding 6 intervention 7 dosage 4 trial design should include the measures taken to minimize/avoid bias. (ICH GCP 6.4.3, 6.4.8) 5 trial design should include trial treatment(s). (ICH GCP 6.4.4) 6 trial design should include the investigational product(s) information. (ICH GCP 6.4.4, 6.4.7) enrollment 7 trial design should include inclusion criteria. 8 trial design should include exclusion criteria. 9 trial design should include planned path of treatment. 10 trial design should include planned period of the trial and subjects. (ICH GCP 6.4.5) 8 inclusion short name 9 inclusion criterion 10 exclusion short name 11 exclusion criterion 12 arm code 13 arm description 14 order of element within arm 15 element code 16 element description 17 rules for start of element 18 epoch Visits 11 Trial design should include planned visits. 19 visit number 20 visit start rule Outcomes 12 Trial design should include the primary endpoints and the secondary endpoints. 21 primary endpoints 22 secondary endpoints
Table 4: Core functional requirements of CRF and deduced metadata items in openpcr Requirement framework Extract Deduced metadata items CRF Categories Requirement description Subject 1 CRF should include subject 1 Subject identifier identification identifier. (SDTMIG) Visit: screen 2 CRF should include subject s meet with inclusion and exclusion criteria. 2 Met all eligibility criteria 3 Criterion not met 3 CRF should include subject 4 Date of birth Visit: baseline demographics and additional 5 Year of birth Visit: treatment information of demographics data. 4 CRF should include subject medical history. 5 CRF should include substance the subject used. 6 Sex 7 subject characteristic question 8 subject characteristic result 9 reported medical conditions 10 type of substance used 6 CRF should include exposure 11 substance use occurrence information to study treatment. 12 start date of treatment Visit: follow up 7 CRF should include physical 13 name of the body system examined examination. 14 assessment of examined body system 8 CRF should include prior and 15 abnormal findings of physical concomitant medications. 9 CRF should include vital signs. 10 CRF should include laboratory tests. 11 CRF should include ECG test. 12 CRF should include adverse events. examination 16 medical or therapy name 17 start date of the medical or therapy 18 end date of the medical or therapy 19 vital sign name 20 vital sign result 13 CRF should include subject 21 laboratory test status disposition. 22 date of sample collection 23 laboratory test name 24 laboratory test result 25 ECG performed or not 26 date of ECG 27 name of ECG measurement 28 result of ECG measurement 29 adverse event name 30 start date of adverse event 31 end date of adverse event 32 severity of adverse event 33 serious adverse event 34 relationship of adverse event to study 35 action taken to adverse event with study 36 adverse event outcome
37 subject status
Table 5: Metadata attributes of Date of collection in laboratory test results Identification Metadata identifier Internal identifier Name Synonyms Definition Definition Source Representation Value domain name Value domain identifier Data type Data format Permitted value Units Relation Mapping hierarchy Relevance Usage guides Condition of use Obligation Occurrence openpcr-clinical.laboratory.date.v1 LBDAT Date of collection Collection date Date for sample collection. CDASHV1.0 ISO 8601 Data elements and interchange formats Information interchange Representation of dates and times ISO8601:2000 Date yyyy-mm-dd Chinese EHR Standard S.06.003-HR51.97.001.06 CDASH SDTM Clinical document Laboratory test results Date of collection Mandatory No Example 2011-09-14 Administration (omitted) For submission and statistics For interoperability and future EHR/CR system of China LBDAT LBDTC versioned 3-dimensional space for easy understanding and date finding
<?xml version="1.0"?> <document name= clinical document > <section> <header> </header> <body> <datagroup> <physicalexamination> <datagroup> <skin> <bodysystemexamined> skin </bodysystemexamined> <examinationresult>abnormal</examinationresult> <abnormalfindings>pitting edema</abnormalfindings> </skin> </datagroup> </physicalexamination> </datagroup> </body> </section> </document> Figure 2: An example structure and XML instance of data group: skin
References 1. Karen Coyle, Thomas Baker. Guidelines for Dublin Core Application Profiles. c1995-2011. [updated May 18, 2009; cited Sept. 20, 2009]. Available from http://dublincore.org/documents/2009/05/18/profile-guidelines/ 2. Gary D. Friedman. Primer of epidemiology (3 rd edition). McGraw-Hill Book Company. 1987: 215-231 3. Jin Pihuan, Su Binghua. Clinical Trial Design and Statistics. Shanghai Scientific & Technological Literature Publishing House. 1997: 5-9 4. International Clinical Trials Registry Platform (ICTRP). Why is Trial Registration Important? c2013. [cited Aug. 12, 2013]. Available from http://www.who.int/ictrp/trial_reg/en/index.html 5. ICH Harmonised Tripartite Guideline: Guideline for Good Clinical Practice E6(R1). International Conference on Harmonisation of technical requirements for registration of pharmaceuticals for human use. [cited Sept. 20, 2009]. Available from http://www.ich.org/fileadmin/public_web_site/ich_products/guidelines/efficacy/e6_r 1/Step4/E6_R1 Guideline.pdf 6. Kush R. Can A Clinical Trial Protocol be Standardized? c2011. [cited May 5, 2011]. Available from http:// www.cdisc.org/stuff/contentmgr/files/0/4ae64c2a8be8638308db771bcb710893/misc/cdis k_ed.pdf 7. Drug Clinical Trial and the Quality Control Standard of China. [cited Nov. 20, 2012]. Available from http://www.sfda.gov.cn/ws01/cl0053/24473.html
8. Ottawa Statement on Trial Registration. c2005-2013. [April 12, 2013]. Available from http://ottawagroup.ohri.ca/index.html 9. International Clinical Trials Registry Platform (ICTRP). WHO Trial Registration Data Set (Version 1.2.1) c2011. [cited Aug. 5, 2011]. Available from http://www.who.int/ictrp/network/trds/en/index.html 10. A service of the U.S. National Institutes of Health. c2011. [cited Aug. 5, 2011]. Available from http://www.clinicaltrials.gov/ct2/results/map 11. Chinese clinical trial registry. c2011. [cited Aug. 5, 2012]. Available from http://www.chictr.org/cn/proj/search.aspx 12. International Standard Randomised Controlled Trial Number Register. c2011. [cited Aug 5, 2012]. Available from http://www.controlled-trials.com/isrctn/ isrctn_dataset 13. ClinicalTrials.gov Protocol Data Element Definitions (DRAFT). [cited April 12, 2013]. Available from http://prsinfo.clinicaltrials.gov/definitions.html 14. ISRCTN Required Data Items and WHO ICTRP Fields. [cited April 12, 2013]. Available from http://www.controlled-trials.com/isrctn/applications/isrctn_data_items.htm 15. Liu X,Li Y,Wu T, Liu G, Li J. A Survey of the Status of Funding of Registered Chinese Clinical Trials Survey of registration of funded clinical trial in China. Chinese Journal of Evidence-Based Medicine. 2008;8(5):305-311 16. CDISC Submission Data Standards Team. Study Data Tabulation Model. Texas: CDISC, Inc. 2008 17. CDISC Submission Data Standards Team. SDTM Implementation Guide: Human Clinical Trials (SDTMIG). Texas: CDISC, Inc. 2008
18. CDASH_STD-1.0. Clinical Data Acquisition Standards Harmonization (CDASH). Texas: CDISC, Inc. 2008 19. Statistics Information Center of China s ministry of health. China national health data dictionary and metadatamanagement system (DRAFT). [cited April 20, 2013]. Available from http://chisc.net/doc/view/6923.html 20. Electronic Health Record (EHR) Architecture and Data Standard (trial version). c2011. [cited April 20, 2013]. Available from http://www.moh.gov.cn/zwgkzt/ppxxhjs1/200912/45414.shtml 21. W. Davenport Robertson, Ellen M. Leadem, Jed Dube. Design and Implementation of the National Institute of Environmental: Health Sciences Dublin Core Metadata Schema. Proc. Int l. Conf. on Dublin Core and Metadata Applications 2001:193-199 22. ISO/IEC 11179-3 Information technology Metadata registries (MDR) Part 3:Registry metamodel and basic attributes. [S] Geneva: ISO copyright office, 2003 23. WS/T S03 2009 Rules for data element standardization of health information of China. [S] China, China's Ministry of Health, 2009 24. Extensible Markup Language (XML) 1.0 (Fifth Edition). c2008. [cited April 12, 2013]. Available from http://www.w3.org/tr/2008/rec-xml-20081126/ 25. CDISC Study Design Model in XML (SDM-XML). c2008-2011. [cited April 12, 2013]. Available from http://www.cdisc.org/stuff/contentmgr/files/0/8c85b168e80d6834ded59339b55fdbc7/mis c/cdisc_sdm_xml_1.0.pdf