Submitting Data to ISCA and NCBI
created by Tim Hefferon
last updated August 28, 2012

Dear ISCA Submitter,

This brief guide is intended to make the submission of your copy number variation and clinical data to ISCA and NCBI easy and straightforward. If you have any questions after reviewing this guide, please send an email to iscahelp@ncbi.nlm.nih.gov.

The submission process involves the following steps:

1. Register your lab for an ISCA-dbGaP submission account (see below).
2. Review the attached ISCA submission template.
3. Transfer your data to the submission spreadsheet. There are three tabs to complete (indicated in yellow): SAMPLES, EXPERIMENTS, and VARIANT CALLS. Details and instructions for completing these can be found in this document, as well as in the blue INSTRUCTIONS tabs in the spreadsheet.

Figure 1: The submission spreadsheet. Please complete the yellow tabs; blue tabs contain instructions.

4. Log in to your dbGaP account (see step 1) and upload your completed submission file.
5. dbGaP will process your submission, store clinical information and submitted identifiers behind controlled access, and then sever links between samples and calls so aggregate data can be sent to dbVar.
6. dbVar will further process your data and assign permanent accession IDs to your variants. A list of IDs will be returned to dbGaP, who will pass them on to you; you can then link these back to your original data using the sample IDs you submitted.
7. dbVar releases ISCA data updates on a quarterly basis. Your data will be combined with variants reported by other ISCA labs during the same quarter. dbVar receives all ISCA data in aggregate and cannot determine which variants come from which labs. In addition, no personally identifying information is stored at dbVar; such information will always remain behind controlled access at dbGaP.

Notes on linking variants in dbGaP and dbVar

1. WARNING: Do not submit any data files containing individual-level data to dbGaP. NCBI takes protection of human subjects very seriously, and it is important that you assign anonymous patient IDs prior to submitting your data.
2. dbGaP retains all data you submit, including sensitive information like patient clinical phenotypes and connections between observed phenotypes and reported variants. This information is kept behind strict controlled access at dbGaP; in order to see it, users must go through a secure NIH approval system (see https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login).
3. Before forwarding your data to dbVar, dbGaP will sever all informational links between patients and calls to preserve patient privacy and anonymity. There is one exception: when multiple variant calls are reported in the same patient with clinical assertion values of Pathogenic, Uncertain significance: likely pathogenic, or Uncertain significance, their co-occurrence may be an important aspect of their effect on clinical expression. Their relationship must therefore be retained and displayed as an integral part of the data. To achieve this, dbGaP creates fake sample IDs linking the variants; dbVar can then display these specific sample:variant relationships without risking patient confidentiality.

REGISTERING FOR A SUBMISSION ACCOUNT

You must register your study with dbGaP before you can submit any data.
Once your name is in the system, you can acquire a secure submission account for the person who will be uploading the files to dbGaP (if different from you). Please follow these steps:

1. Register for an NIH eRA account (if you don't already have one). Usernames and passwords for the dbGaP submission systems are managed as part of the NIH eRA Commons system. Your lab or organization likely already has an eRA account as part of your dealings with NIH grants, in which case you should already be able to use your credentials to log in to dbGaP:
https://dbgap.ncbi.nlm.nih.gov/ss/dbgapss.cgi?login

If you cannot log in (or if you are sure you do not already have an eRA account), please register for one here:
https://public.era.nih.gov/commons/public/registration/registrationinstructions.jsp

If you have questions regarding the eRA account application process, please contact the eRA help desk: https://public.era.nih.gov/commons/public/contacts.jsp. If you still encounter problems getting an eRA account, contact us at iscahelp@ncbi.nlm.nih.gov.

2. Log in to dbGaP using your eRA credentials from step 1:
https://dbgap.ncbi.nlm.nih.gov/ss/dbgapss.cgi?login

3. Request an ISCA-dbGaP Submission Account. After you have successfully logged in, send us an email (isca-help@ncbi.nlm.nih.gov) requesting an ISCA-dbGaP Submission Account. We will then create your account and ask you to confirm that you are able to log in. You will then be able to upload your ISCA submission file.

COMPLETING THE SUBMISSION SPREADSHEET

SAMPLES

Figure 2: The SAMPLES tab

Each sample in your study should be entered in a separate row of the SAMPLES worksheet. If you want to enter more than one sample for a given subject, each
sample should have a unique sample_id (column A) but the same subject_id (column G). Enter no-call samples just as you would any other sample; however, you must identify them by indicating Yes in the is_no_call field (column P).

REQUIRED FIELDS
sample_id, subject_id, subject_phenotype, is_no_call, consent, may_recontact

OPTIONAL FIELDS
sample_resource, sample_cell_type, sample_cancer, sample_attribute, sample_karyotype, subject_collection, subject_population, subject_karyotype, subject_sex, subject_age, subject_maternal_id, subject_paternal_id, family_history

Instructions for completing each field are included in the blue SAMPLES INSTRUCTIONS tab (to the right of the yellow tabs).

Phenotypes

Phenotype information on subjects must be entered using established terminology from the Human Phenotype Ontology (HPO). Terms should be supplied as a comma-delimited list of vocabulary:term_id pairs (e.g., HP:0007018, HP:0001249). Please see Appendix A to this document for phenotype terms commonly used by ISCA. If you are unable to find suitable HPO terms, ISCA has access to a text-mining algorithm that can help determine the most suitable term IDs for your phenotypes.

The only exception to the rule requiring HPO vocabulary terms is that you may instead use a very general text designation developed specifically for ISCA: Developmental Delay and additional significant developmental and morphological phenotypes referred for genetic testing. This phrase may be supplied instead of vocabulary:term_id pairs if you do not wish to indicate more specific phenotypes.
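
As an illustration of the phenotype format described above, the following Python sketch checks a single phenotype cell before it is pasted into the spreadsheet. It is not part of the ISCA or dbGaP tooling. The function name is hypothetical, and the pattern assumes that every HPO ID has the shape HP: followed by seven digits, as in the examples in this guide.

```python
import re

# The ISCA-specific general designation quoted in this guide.
ISCA_GENERAL_PHENOTYPE = (
    "Developmental Delay and additional significant developmental "
    "and morphological phenotypes referred for genetic testing"
)

# Assumption: an HPO ID is "HP:" plus seven digits, with no embedded space.
HPO_ID = re.compile(r"HP:\d{7}")


def invalid_phenotype_terms(value):
    """Return the entries of a phenotype cell that are not well-formed HPO IDs.

    An empty list means the cell is either the ISCA general designation
    or a comma-delimited list of HP:nnnnnnn identifiers.
    """
    text = value.strip()
    if text == ISCA_GENERAL_PHENOTYPE:
        return []
    terms = [term.strip() for term in text.split(",")]
    return [term for term in terms if not HPO_ID.fullmatch(term)]


if __name__ == "__main__":
    print(invalid_phenotype_terms("HP:0007018, HP:0001249"))
    print(invalid_phenotype_terms("HP: 0001263, seizures"))
```

The check only covers the shape of each entry; it does not confirm that an ID exists in HPO or that it matches the intended phenotype.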
EXPERIMENTS

Figure 3: The EXPERIMENTS tab

The EXPERIMENTS tab is used to record the specific methods, analyses, and platforms you used to generate and validate your data. Note that experiments 1 through 5 (experiment_id 1 to 5) have already been completed with commonly used parameters matching ISCA-related studies. If the pre-filled information adequately describes your experiments, you need only complete reference_type and reference_value for experiment 1 (you can include additional details in other columns if desired). You do not have to use the pre-completed experiments; if you wish to enter your own experiments, replace the gray sample text with the desired information.

REQUIRED FIELDS
experiment_id, method_type, analysis_type, reference_type, reference_value

OPTIONAL FIELDS
experiment_resolution, method_platform, method_description, analysis_description, detection_method, detection_description, external_links, site

Instructions for completing each field are included in the blue EXPERIMENTS INSTRUCTIONS tab (to the right of the yellow tabs).
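
A quick way to catch blank required fields before upload is sketched below in Python. It assumes each EXPERIMENTS row has already been read into a dictionary keyed by the column names listed above; the function names and the example values are hypothetical and are not taken from the template.

```python
# Required columns of the EXPERIMENTS tab, as listed in this guide.
REQUIRED_EXPERIMENT_FIELDS = (
    "experiment_id",
    "method_type",
    "analysis_type",
    "reference_type",
    "reference_value",
)


def is_blank(value):
    """True for a missing cell or a cell holding only whitespace."""
    return value is None or str(value).strip() == ""


def missing_required_fields(row, required=REQUIRED_EXPERIMENT_FIELDS):
    """Return the required field names that are absent or blank in one row."""
    return [name for name in required if is_blank(row.get(name))]


if __name__ == "__main__":
    # Placeholder values only; see the EXPERIMENTS INSTRUCTIONS tab for real ones.
    row = {
        "experiment_id": 1,
        "method_type": "example method",
        "analysis_type": "example analysis",
        "reference_type": "",
        "reference_value": None,
    }
    print(missing_required_fields(row))
```

Passing a different tuple as `required` lets the same helper check rows from the SAMPLES or VARIANT CALLS tabs. A numeric zero is treated as a value, not as a blank.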
VARIANT CALLS

Figure 4: The VARIANT CALLS tab

The VARIANT CALLS tab is used to record the details of your variant calls, and represents the core of your data. It includes clinical assertions you have made, phenotypes included in those assertions, copy number data, and the genomic locations of your variant calls.

REQUIRED FIELDS
variant_call_id, variant_call_type, experiment_id, sample_id, clinical_significance, phenotype, copy_number, assembly, chr, inner_start, inner_stop

OPTIONAL FIELDS
validation, description, origin, is_parent_of_origin_affected, mode_of_inheritance, zygosity, external_links, outer_start, outer_stop

Clinical Assertions and Phenotypes

Clinical assertions are one of the most important aspects of your ISCA data. As with the subject_phenotype field in the SAMPLES tab, phenotype information must be provided here as vocabulary:term_id pairs. Again, the only exception is the generic text designation developed for ISCA: Developmental Delay and additional significant developmental and morphological phenotypes referred for genetic testing.
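
Because each variant call names a sample_id and an experiment_id, it can help to confirm that those IDs appear in the other two tabs before uploading. The Python sketch below is one possible check, not an official validator. It assumes that the IDs in VARIANT CALLS are meant to match rows in SAMPLES and EXPERIMENTS, that rows are available as dictionaries keyed by column name, and that a sample marked Yes in is_no_call should have no variant calls. The function name is hypothetical.

```python
def check_variant_call_links(samples, experiments, variant_calls):
    """Return a list of messages describing ID mismatches across the three tabs."""
    sample_ids = {str(row["sample_id"]).strip() for row in samples}
    no_call_ids = {
        str(row["sample_id"]).strip()
        for row in samples
        if str(row.get("is_no_call", "")).strip().lower() == "yes"
    }
    experiment_ids = {str(row["experiment_id"]).strip() for row in experiments}

    problems = []
    for call in variant_calls:
        call_id = call.get("variant_call_id")
        sample_id = str(call.get("sample_id", "")).strip()
        experiment_id = str(call.get("experiment_id", "")).strip()
        if sample_id not in sample_ids:
            problems.append(f"{call_id}: sample_id {sample_id!r} is not in SAMPLES")
        elif sample_id in no_call_ids:
            problems.append(f"{call_id}: sample {sample_id!r} is marked is_no_call")
        if experiment_id not in experiment_ids:
            problems.append(
                f"{call_id}: experiment_id {experiment_id!r} is not in EXPERIMENTS"
            )
    return problems


if __name__ == "__main__":
    samples = [
        {"sample_id": "S1", "is_no_call": "No"},
        {"sample_id": "S2", "is_no_call": "Yes"},
    ]
    experiments = [{"experiment_id": 1}]
    calls = [{"variant_call_id": "V1", "sample_id": "S2", "experiment_id": 1}]
    for message in check_variant_call_links(samples, experiments, calls):
        print(message)
```

IDs are compared as trimmed strings, so a spreadsheet reader that returns 1.0 for experiment 1 would need its values converted to integers first.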
Genomic Location of Variants

Figure 5: Genomic coordinates section of VARIANT CALLS tab

Required fields are indicated in yellow. We strongly recommend you also include outer_start and outer_stop coordinates whenever possible.

Reporting Coordinates in the Pseudoautosomal Region (PAR)

If you are reporting variants that fall within the pseudoautosomal regions shared by chromosomes X and Y (PAR1 and PAR2), report only the X chromosome coordinates. Please do not use X/Y.
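
The PAR rule above can be screened for mechanically. The Python sketch below flags a chr value written as a combined X/Y label and also checks that inner_start does not exceed inner_stop. The set of combined spellings is a guess at what might appear in a spreadsheet, the coordinate ordering check is an assumption that is not stated in this guide, and the function name is hypothetical. Coordinates are assumed to have been read as integers.

```python
# Hypothetical spellings of a combined X/Y label that a submitter might enter.
COMBINED_XY_LABELS = {"X/Y", "Y/X", "XY", "X,Y"}


def location_problems(row):
    """Return messages for location issues in one VARIANT CALLS row."""
    problems = []

    chrom = str(row.get("chr", "")).strip().upper()
    if chrom in COMBINED_XY_LABELS:
        problems.append(
            "chr uses a combined X/Y label; report X chromosome coordinates with chr X"
        )

    start, stop = row.get("inner_start"), row.get("inner_stop")
    # Assumption: inner_start should not exceed inner_stop.
    if isinstance(start, int) and isinstance(stop, int) and start > stop:
        problems.append("inner_start is greater than inner_stop")

    return problems


if __name__ == "__main__":
    row = {"chr": "X/Y", "inner_start": 300, "inner_stop": 200}
    for message in location_problems(row):
        print(message)
```

The sketch only flags a combined label. It does not rewrite the value to X, because the coordinates in the row may have been taken from the Y chromosome and would then need to be replaced as well.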
FREQUENTLY ASKED QUESTIONS

1. How do I submit no-call samples (samples in which I am not reporting any variants)?
Enter sample information for no-call samples in the SAMPLES tab, just as you would for other samples. Simply indicate Yes in the required is_no_call field (column P).

2. How do I register my lab?
Please see the section Registering for a Submission Account in this document. Additional registration with dbVar is not necessary.

3. How do I report variants in the pseudoautosomal region of chromosomes X and Y?
Please report only X chromosome coordinates for PAR variants. Do not indicate X/Y in the chr field.
APPENDIX A: HPO PHENOTYPE TERMS AND IDs

Instructions: The accurate interpretation and reporting of genetic test results is contingent upon the reason for referral, clinical information provided, and family history. To help provide the best possible service, please check the applicable clinical information below.

Patient Identification
Patient Name: (Last) (First)
Gender: [ ] Male [ ] Female
Date of Birth: (mm/dd/yyyy)

Clinical Information
Check all that apply. Use additional space at the bottom of the form if needed.

Perinatal History
[ ] Prematurity (HP:0001622)
[ ] Intrauterine growth restriction (HP:0001511)
[ ] Oligohydramnios (HP:0001562)
[ ] Polyhydramnios (HP:0001561)
[ ] Non-immune hydrops fetalis (HP:0001790)

Growth
[ ] Failure to thrive (HP:0001508)
[ ] Overgrowth (HP:0001548)
[ ] Short stature (HP:0004322)

Cognitive/Developmental
[ ] Developmental delay (HP:0001263)
[ ] Gross motor delay (HP:0002194)
[ ] Fine motor delay (HP:0010862)
[ ] Speech delay (HP:0000750)
[ ] Intellectual disability/MR (HP:0001249)

Behavioral/Psychiatric
[ ] Autism (HP:0000717)
[ ] Autism spectrum disorder (HP:0000729) (includes pervasive developmental delay and Asperger syndrome)
[ ] Attention deficit hyperactivity disorder (HP:0007018)
[ ] Anxiety (HP:0000739)
[ ] Behavioral/psychiatric abnormality (HP:0000708)

Cutaneous
[ ] Hyperpigmentation (HP:0000953)
[ ] Hypopigmentation (HP:0001010)

Neurological
[ ] Seizures (HP:0001250)
[ ] Hypotonia (HP:0001252)
[ ] Hypertonia (HP:0001276)
[ ] Cerebral palsy (HP:0100021)
[ ] Encephalopathy (HP:0001298)
[ ] Structural brain anomaly (HP:0002011)

Cardiac
[ ] Atrial septal defect (HP:0001631)
[ ] Ventricular septal defect (HP:0001629)
[ ] Coarctation of the aorta (HP:0001680)
[ ] Tetralogy of Fallot (HP:0001636)
[ ] Other structural heart defect (HP:0002564)
[ ] Other cardiac abnormality (HP:0001627)

Craniofacial
[ ] Dysmorphic facial features (HP:0002260)
[ ] Ear malformation (HP:0000598)
[ ] Cleft lip (HP:0000204)
[ ] Cleft palate (HP:0000175)
[ ] Macrocephaly (HP:0000256)
[ ] Microcephaly (HP:0000252)

Hearing/Vision
[ ] Hearing loss (HP:0000365)
[ ] Abnormality of vision (HP:0000504) Specify:
[ ] Abnormality of eye movement (HP:0000496)

Musculoskeletal
[ ] Contractures (HP:0001371)
[ ] Club foot (HP:0001762)
[ ] Diaphragmatic hernia (HP:0000776)
[ ] Limb anomaly (HP:0002813)
[ ] Polydactyly (HP:0010442)
[ ] Syndactyly (HP:0001159)
[ ] Vertebral anomaly (HP:0003468) Specify:

Gastrointestinal
[ ] Gastroschisis (HP:0001543)
[ ] Omphalocele (HP:0001539)
[ ] Anal atresia (HP:0002023)
[ ] Tracheoesophageal fistula (HP:0002575)
[ ] Pyloric stenosis (HP:0002021)

Genitourinary
[ ] Ambiguous genitalia (HP:0000062)
[ ] Hydronephrosis (HP:0000126)
[ ] Kidney malformation (HP:0000792) Specify:
[ ] Cryptorchidism (HP:0000028)
[ ] Hypospadias (HP:0000047)

Family History
[ ] Parents with 2 miscarriages
[ ] Other relatives with similar clinical history
Explain:

As a participant in the ISCA (International Standards for Cytogenomic Arrays) Consortium, this clinical cytogenetics laboratory contributes submitted clinical information and test results to a HIPAA-compliant, de-identified public database as part of the NIH's effort to improve understanding of the relationship between genetic changes and clinical symptoms. Confidentiality is maintained. Patients may request to opt out of this scientific effort by 1) checking the box below, or 2) calling the laboratory at XXX-XXX-XXXX and asking to speak with a laboratory genetic counselor. Please call with any questions.

[ ] Indicate refusal for inclusion in these efforts by checking this box. If the box is not marked, data will be anonymized and used.
APPENDIX B: ISCA SUBMISSION TEMPLATE

Please see the Excel spreadsheet (ISCA-NCBI Submission Template.xlsx) included in the zipped archive which contained this submission guide.