Exome sequencing and targeted sequencing in diagnostics and applied research: pros and cons. P. Gasparini, University of Trieste, IRCCS-Burlo Garofolo Children Hospital. SIGU, Sorrento, 22/11/2012
Our NGS facility. The NGS core facility is located at: 1) CBM scrl, Basovizza, TS, Italy 2) IRCCS-Burlo Garofolo, TS, Italy. NGS systems available: 1) HiScanSQ (Illumina) 2) SOLiD 4 System (Life Technologies) 3) Ion Torrent (Life Technologies)
Our experience 274 whole genome sequencing data from isolated populations 110 whole exome sequencing data from families with inherited diseases Several targeted sequencing data for diagnostic purposes
The pipeline Data production Mapping Quality control Filtering Final data analysis Variants prioritization Sanger confirmation
Whole genome sequencing in isolated populations: Italy, Qatar, Silk Road. Genetic, cultural, social and environmental factors are homogeneous. Geographic and/or linguistic barriers. Founder effect or bottleneck. Very high endogamy rate
Protocol: whole genome genotyping + whole genome sequencing + imputation + phenotype, to identify rare variants underlying qualitative and quantitative traits
What we have learned: Useful approach to accurately impute rare variants Outsourcing for data production is a good option!! Need for fully dedicated people for data analysis
Whole exome sequencing (WES). Exome = protein-coding sequences: 220,000 exons, 50 megabases. Includes canonical splice sites, miRNAs, and 5' and 3' untranslated regions
WES example 1 mental retardation
Project: dissecting the molecular bases of mental retardation (without other signs and symptoms). 1. SNP array in MR 2. Patient selection 3. Exome sequencing 4. Variant selection 5. Validations 6. Future plan. 250 MR cases analysed by array: 34 positive for causative CNVs, 216 negative for causative CNVs. After applying the inclusion criteria: 10 probands with parents
Study criteria. Inclusion criteria: moderate to severe mental retardation; healthy parents; no familial aggregation; negative SNP array data; negative test for fragile X syndrome. Exclusion criteria: pre/perinatal infections; malformations (brain, abdominal, cardiac); hearing loss; micro/macrocephaly; skeletal abnormalities; genital abnormalities; epilepsy
Material and methods. Sample preparation: Illumina TruSeq DNA Sample Preparation (LT) Kit. Enrichment: Illumina TruSeq kit; approximately 62 Mb of the genome was targeted. Exome sequencing: Illumina HiScanSQ platform. Production time: enrichment + library 4 days; Illumina sequencing 8 days. Filtering of data: 1. Coverage at least 20x in probands and 4x in parents 2. dbSNP 3. Internal database 4. Selection of de novo variants 5. Type of mutation
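The filtering steps listed on this slide can be sketched in a few lines. This is only an illustration, not the facility's actual pipeline: the `Call` record and all of its field names are hypothetical, and step 5 (type of mutation) is left out because the slide does not say which types were kept. Only the 20x and 4x thresholds come from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Call:
    """One candidate variant in a proband/father/mother trio (hypothetical record)."""
    variant_id: str
    proband_depth: int
    father_depth: int
    mother_depth: int
    in_dbsnp: bool
    in_internal_db: bool
    proband_has_alt: bool
    father_has_alt: bool
    mother_has_alt: bool


PROBAND_MIN_DEPTH = 20  # threshold stated on the slide
PARENT_MIN_DEPTH = 4    # threshold stated on the slide


def passes_filters(call: Call) -> bool:
    """Apply steps 1-4: coverage, dbSNP, internal database, de novo selection."""
    covered = (
        call.proband_depth >= PROBAND_MIN_DEPTH
        and call.father_depth >= PARENT_MIN_DEPTH
        and call.mother_depth >= PARENT_MIN_DEPTH
    )
    novel = not call.in_dbsnp and not call.in_internal_db
    de_novo = (
        call.proband_has_alt
        and not call.father_has_alt
        and not call.mother_has_alt
    )
    return covered and novel and de_novo


def filter_calls(calls: List[Call]) -> List[Call]:
    """Keep only the calls that survive every filter."""
    return [call for call in calls if passes_filters(call)]
```

Note that a parental threshold as low as 4x makes it easy to miss a heterozygous allele in a parent, which is one plausible reason why many filtered candidates later turn out to be inherited.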
Prioritization of the variants according to: a) Literature b) Gene function and pathways (if known) c) Gene expression d) In silico evaluation of the variant using different software/algorithms e) Possible clinical relationships
Filtering (pattern of inheritance): 1 de novo, 2 X-linked, 3 recessive
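The three inheritance filters can be expressed as simple rules on trio genotypes. The sketch below is an assumption about how such rules might look, not the method used in the study: genotypes are coded as alternate-allele counts (0, 1 or 2), the X-linked rule covers only a male proband with a carrier mother, and the function name and labels are hypothetical.

```python
from typing import Optional


def classify(chrom: str, proband: int, father: int, mother: int,
             proband_is_male: bool) -> Optional[str]:
    """Label a variant by inheritance pattern, or return None if no rule fits.

    Genotypes are alternate-allele counts: 0 = absent, 1 = one copy, 2 = two copies.
    """
    if proband == 0:
        return None  # variant not present in the proband
    if father == 0 and mother == 0:
        return "de_novo"  # present in the proband, absent in both parents
    if chrom == "X" and proband_is_male and mother == 1 and father == 0:
        return "x_linked"  # male proband, carrier mother
    autosomal = chrom not in ("X", "Y")
    if autosomal and proband == 2 and father == 1 and mother == 1:
        return "recessive_homozygous"  # both parents carriers
    return None
```

For example, `classify("7", 1, 0, 0, True)` gives `"de_novo"` and `classify("2", 2, 1, 1, False)` gives `"recessive_homozygous"`. Compound heterozygous recessive variants are not handled here because they need gene-level grouping.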
Summary of results (de novo)
Family   De novo   Selected   Sanger confirmed   True de novo
1        195       18         1                  0
3        119       7          6                  1
4        69        15         0                  0
5        136       5          2                  1
6        97        8          6                  1
7        303       30         0                  0
8        84        14         5                  0
9        154       17         3                  0
10       137       16         6                  0
TOTAL              130        29                 3
29/130 selected variants confirmed by Sanger (the others are false positives); 3/29 confirmed variants are true de novo (the others reflect false negatives)
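The totals and the two ratios quoted on this slide can be re-derived from the per-family counts. The column meaning assumed here is (family, de novo candidates, selected, Sanger confirmed, true de novo); that reading is an interpretation of the flattened slide table, although it does reproduce the stated totals of 130, 29 and 3.

```python
# (family, de novo candidates, selected, Sanger confirmed, true de novo)
ROWS = [
    (1, 195, 18, 1, 0),
    (3, 119, 7, 6, 1),
    (4, 69, 15, 0, 0),
    (5, 136, 5, 2, 1),
    (6, 97, 8, 6, 1),
    (7, 303, 30, 0, 0),
    (8, 84, 14, 5, 0),
    (9, 154, 17, 3, 0),
    (10, 137, 16, 6, 0),
]


def column_total(rows, index):
    """Sum one column of the table."""
    return sum(row[index] for row in rows)


def summarize(rows):
    """Return the column totals and the two ratios quoted on the slide."""
    selected = column_total(rows, 2)
    confirmed = column_total(rows, 3)
    true_de_novo = column_total(rows, 4)
    return {
        "selected": selected,
        "confirmed": confirmed,
        "true_de_novo": true_de_novo,
        "confirmed_rate": confirmed / selected,
        "true_de_novo_rate": true_de_novo / confirmed,
    }


if __name__ == "__main__":
    s = summarize(ROWS)
    print(f"Sanger confirmed: {s['confirmed']}/{s['selected']} = {s['confirmed_rate']:.1%}")
    print(f"True de novo: {s['true_de_novo']}/{s['confirmed']} = {s['true_de_novo_rate']:.1%}")
```

Under this reading about 22.3% of the selected variants survive Sanger sequencing, and about 10.3% of those are truly de novo.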
Filtering - de novo. JARID1B/KDM5B: histone demethylase, interacts with FOXG1; JARID1C already found to be involved in mental retardation (Am J Hum Genet. 2005 Feb;76(2):227-36). CIC: granular cell development in CNS; a recent NGS study showed a de novo mutation in an MR case (Nat Genet. 2010 Dec;42(12):1109-12). KIAA2022: highly expressed in fetal and adult brain; known to be associated with MR (Gene Expr Patterns. 2009 Sep;9(6):423-9)
Filtering X-linked Secreted semaphorins control spine distribution and morphogenesis in the postnatal CNS Nature. 2009 Dec 24;462(7276):1065-9. Epub 2009 Dec 13
Filtering recessive (homozygosity)
What we have learned: A high number of false positive and false negative cases. A lot of Sanger sequencing to be carried out. Some MR cases clearly explained through this approach. Combining filters for different patterns of inheritance is useful
WES example 2 Hearing loss
Hearing loss: 70% genetic causes. Hereditary hearing loss (HHL) and complex disease (age-related hearing loss). HHL loci: dominant 54 (29 genes lacking); recessive 71 (29 genes lacking); X-linked 5 (3 genes lacking). HHL genes: ion homeostasis, e.g. GJB2, ion channels (KCNQ4, SLC26A4); hair bundle morphogenesis, e.g. motor proteins (MYO15A), adhesion proteins (CDH23); extracellular matrix proteins (COCH, TECTA); poorly understood function
HHL: the Italian and Qatari picture
GJB2 accounts for approx. 1/3 of all cases, and 35delG represents approx. 70% of all GJB2 alleles
GJB2 mutations are present in only a minority (<10%) of Qatari patients affected with non-syndromic hearing loss
The mitochondrial A1555G mutation is present in 1/1000 of the general population
Many families have been collected over several decades; they are not large enough to be included in linkage studies but can be used for NGS studies
The GJB6 deletion (D13S1830) was never detected
The mitochondrial mutation MT-RNR1 (1555G>A) was never detected
There is a strong need to search for the causative Qatari gene/s
What we have learned: Two new genes have been identified. A high number of false positive and false negative cases. A lot of Sanger sequencing to be carried out. Bioinformatics analysis and database searches (e.g. expression, mutation, prediction) play a key role
WES example 3 Alagille syndrome
Alagille syndrome: autosomal dominant, multi-system disorder with high variability. Population prevalence = 1:70,000. Two genes have been described: JAG1 (99% of cases) and NOTCH2 (1% of cases). Our family: III-1 -> neonatal jaundice; retinal hemorrhage; elevated hepatic enzyme levels and thoracic aorta hypoplasia. ALGS was suspected > liver biopsy > paucity of intrahepatic bile ducts. II-1 -> coarctation of the aorta; normal liver function. II-3 -> no clinical symptoms. I-2 -> unspecified renal failure. III-1 -> coding regions of JAG1 and NOTCH2 analyzed by DHPLC and MLPA: no mutation!
Exome sequencing was performed on 3 family members (III-1, II-1 and II-3): SureSelect Human All Exon 38Mb Target Enrichment System (Agilent Technologies); SOLiD 4 (Life Technologies). A new missense mutation (c.1308C>G, p.C436W, NM_00124) was identified in the JAG1 gene, confirmed by Sanger sequencing and segregating with the disease. PolyPhen-2, Mutation Taster and Condel predict the mutation as pathogenic.
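As a quick consistency check on the variant name, the coding position and the protein position should agree. The sketch below maps a 1-based coding position to its codon and then applies the substitution. The reference codon TGC is not given on the slide: it is inferred from the standard genetic code, where TGC is the only cysteine codon with a C in the third position. The helper names are hypothetical.

```python
from typing import Tuple

# Subset of the standard genetic code: the two cysteine codons and tryptophan.
CODON_TO_AA = {"TGT": "C", "TGC": "C", "TGG": "W"}


def codon_of(cds_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within the codon 1..3) for a 1-based coding position."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon: str, offset: int, alt: str) -> str:
    """Replace the base at a 1-based offset within a codon."""
    return codon[:offset - 1] + alt + codon[offset:]


codon_number, offset = codon_of(1308)       # c.1308 -> codon 436, third base
mutant = substitute("TGC", offset, "G")     # inferred reference codon TGC -> TGG
print(codon_number, offset, CODON_TO_AA["TGC"], CODON_TO_AA[mutant])
```

Position 1308 falls on the third base of codon 436, and TGC to TGG changes cysteine to tryptophan, which matches p.C436W.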
What we have learned: There is a need to revise all the molecular diagnostic work done so far; this should hopefully lead to remarkable improvements in counseling and in diagnostic algorithms
WES example 4 Usher syndrome
Whole Exome Sequencing and focused analysis: Usher syndrome Retinitis + hearing loss Autosomal recessive inheritance 3 Clinical types (I, II and III) 13 loci mapped 10 genes identified Digenic inheritance demonstrated (CDH23/PCDH15)
Whole exome sequencing: 9 cases negative for mutations on the Asper chip
What we have learned: In the best scenario we can explain 7 cases with known genes, but not the remaining two (homozygosity might help). In most cases we have no idea about the pathogenetic role of the variants detected! How can we prepare a diagnostic report? The enrichment step dramatically affects the coverage of each gene (probes are not equally distributed and some exons/genes are lacking). For this reason we have decided, for both HHL and Usher, to design custom targeted resequencing protocols based on the Ion Torrent platform. There is a huge need to functionally validate variants
Targeted resequencing Diagnostic
Usher genes: long PCR. NGS of long PCR products to detect unknown mutations. Samples: 1) 2 samples, each with a single, different, identified causal heterozygous mutation 2) 1 sample from a clinically diagnosed patient without molecular characterization 3) 1 healthy sample. Long PCR on DNA, whole genomic region: 2.5 Mbp. Optimization of: focus on coding regions; average amplicon length; GC content. After optimization: 0.7 Mbp. Platforms: GAII, single sample per lane, 2x75 bp reads; GAII, 4-sample multiplexing, 2x100 bp reads; 454 FLX, 4-region plate, 4-sample multiplexing, Titanium reads
[Figure: on-exon coverage, long PCR vs. Agilent SureSelect; value labels -4.3%, -4.5%, -10%, -17.4%] Long PCR covers 95-100% of exons in the selected genes, even at high (>40X) coverage. Agilent SureSelect shows a lower percentage of covered exons and a further drop at high (>40X) coverage.
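The "percentage of covered exons" compared on this slide can be computed from per-exon depth values. The sketch below assumes one mean depth per exon and counts an exon as covered when its depth is strictly above the threshold; the slide does not say how coverage was defined, so both choices are assumptions, and the depths in the example are made up for illustration.

```python
from typing import Dict


def percent_exons_covered(mean_depth_by_exon: Dict[str, float],
                          min_depth: float) -> float:
    """Percentage of exons whose mean depth is strictly above min_depth."""
    if not mean_depth_by_exon:
        raise ValueError("no exons supplied")
    covered = sum(1 for depth in mean_depth_by_exon.values() if depth > min_depth)
    return 100.0 * covered / len(mean_depth_by_exon)


# Illustrative depths only, not data from the study.
example = {"exon_1": 120.0, "exon_2": 45.0, "exon_3": 12.0, "exon_4": 0.0}
print(percent_exons_covered(example, 0))    # exons with any coverage
print(percent_exons_covered(example, 40))   # exons above 40X
```

Running the same function at two thresholds for each enrichment method gives the kind of comparison the slide describes: the drop between the two values shows how much of the target is only shallowly covered.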
Step 1. Targeted resequencing by multiplex long PCR. Samples: a positive control, a heterozygous sample, one patient with unknown alleles, and a negative control. All patients (3) are explained by known genes; some differences have been detected among platforms.
Additional examples Ion Torrent technology 1)Small size: CFTR gene analysis 2)Medium size: Chronic Lymphocytic leukemia (9 genes) and Usher syndrome (10 genes) 3)Large size: hearing loss chip (100 genes)
An example of RDB/Sanger sequencing vs NGS (Ion Torrent): the CFTR gene (AmpliSeq design, CDS + UTR). Besides the disease-causing mutations, we were able to detect 3 known polymorphisms in all samples
What we have learned: Targeted resequencing is reliable, accurate and affordable. In a few words: it works!
Acknowledgements. Medical Genetics, DSM, University of Trieste, IRCCS-Burlo Garofolo, Trieste, Italy: Danilo Licastro, Diego Vozzi, Flavio Faletra, Giorgia Girotto, Emmanouil Athanasakis, Laura Esposito, Angela D'Eustacchio, Savina Dipresa, Marcello Morgutti. SUN and Tigem, Naples, Italy: Sandro Banfi, Vincenzo Nigro. Funds from Telethon Foundation (Italy), QNRF (Qatar)