RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School of Medicine September 11, 2014
Outline > RNA sequencing The biology of RNA: types of RNA, intron- exon structure, alterna)ve splicing Microarrays to quan)fy RNAs RNAseq protocols Quan)fica)on of RNA abundance Quality control in RNAseq data Microarrays vs RNAseq SoRware and their basic principles RNA edi)ng Chipseq / methylseq The biology of chroma)n histone modifica)ons, CpG methyla)on, nucleosome Quan)fica)on of abundance of chroma)n signature SoRware and their basic principles Combina)on of chroma)n marks Variant detec=on Germ line and soma)c muta)ons Quality control issues SoRware and their basic principles Genomics and Personalized Medicine
RNA sequencing
RNA sequencing > The biology of RNA: types of RNA, intron- exon structure, alterna)ve splicing Transcrip;on start site Intron Types of RNA: (a) mrnas: codes for protein/pep)de sequences (b) trnas: small RNA, necessary component of protein transla)on (c) micrornas: small RNA that plays a regulatory role in major biological processes; typically suppress expression of target genes. (d) Ribosomal RNA: small RNAs that are key components of ribosome (e) Long non- coding RNAs: Large mul)- exonic RNAs. Emerging reports link them to major biological processes (f) Other non- coding RNAs: such as circular RNA Kapranov, BMC Biology, 2010 Exon
RNA sequencing > Microarrays to quan)fy RNAs 1. Quan)le normaliza)on 2. RMA normaliza)on 3. GCRMA 4. Mas5 5. Lim et al. Bioinforma)cs. 2007
RNA sequencing > RNAseq protocol Removal of Ribosomal and other types of RNAs. PolyA selec)on or Ribo- minus. Sequencing using: Pyrosequencing (454 Technologies) Solexa sequencing (Illumina HiSeq) Sequencing by liga)on (SOLiD) Ion Torrent seminonductor sequencing Nanopore sequencing Single molecule real )me sequencing (PacBio) Read length: 50bp 10,000bp Sequencing volume: ~ million reads/sec Sequencing accuracy: 90% - 99.99%
RNA sequencing > Quan)fica)on of RNA abundance RNAseq has excellent reproducibility RPKM (Reads per kilo base per million) is a measure of expression level of a genomic en)ty. Mortazavi et al. Nature Methods. 2008 Paired end read: FPKM (fragments per kilobase of exon per million fragments mapped) RNAseq has excellent dynamic range
RNA sequencing > Quan)fica)on of RNA abundance Mortazavi et al. Nature Methods. 2008
RNA sequencing > Quality control in RNAseq data; Microarrays vs RNAseq RNAseq has excellent reproducibility i.e. low technical varia)on Mortazavi et al. Nature Methods. 2008 The technical varia)on typically follows Poisson distribu)on Marioni et al. Genome Res. 2008 There is considerable amount of biological varia)on Choy et al. PLoS Gene)cs. 2008 There is robust concordance between microarray and RNAseq data for the same sample. Mortazavi et al. Nature Methods. 2008
RNA sequencing > Quality control in RNAseq data GC content bias the RNA expression level The GC- bias is not always straight forward. Hansen et al. Biosta)s)cs. 2011 A C G T Hexamar priming bias RNA expression level Hansen et al. Biosta)s)cs. 2011 Fragment bias also affect paired end RNAseq data Roberts et al. Genome Biol. 2011
RNA sequencing > SoRware
RNA sequencing > SoRware Integra)ve Genome Viewer / Broad Ins)tute Differen)al expression Allelic expression Variant detec)on Alterna)ve splicing iden)fica)on Rapaport et al. Genome Biol. 2013
RNA sequencing > RNA edi)ng Li et al. Science. 2011 Pachter. Nature Biotech. 2012
RNA sequencing > Brain- storming S1: Untreated sample: 6 million reads S2: Treated sample: 4 million reads Depth of coverage plot How do you know if the RNAseq data looks ok? Differen)al expression MA plot
Chipseq / methylseq
Chipseq / methylseq > Chroma)n biology CpG methyla)on, histone modifica)ons, nucleosome Cremer & Cremer. Nature Reviews Genet. 2001
Chipseq / methylseq > Quan)fica)on of abundance of chroma)n signature Gu et al. Nature Protocols. 2011
Chipseq / methylseq > Quan)fica)on of abundance of chroma)n signature Mardis. Nature Methods. 2007 Zhao. Nature Immunology. 2011 Shah. Nature Methods. 2009
Chipseq / methylseq > SoRware and their basic principles Zhao. Nature Immunology. 2011
Chipseq / methylseq > SoRware and their basic principles These programs produce very different peaks in terms of peak size, number, and posi)on rela)ve to genes. Malone et al. PLoS One. 2011
Chipseq / methylseq > Combina)on of chroma)n marks Chroma)n marks are associated with one another Chrom- HMM Earnst and Kellis. Nature Methods. 2012
Chipseq / methylseq > Brainstorming How do you compare the peaks?
Variant detec=on (recap from previous lectures...)
Variant detec=on > Germ line and soma)c muta)ons Biesecker and Spinner. Nature Reviews Gene)cs. 2013 Haemophilia Type- II diabetes Early onset obesity Neurofibromatosis type 1 (NF1) Cancer Soma)c muta)ons are common in healthy human )ssues De. Trends Genet. 2011
Variant detec=on > Quality control issues An example: GATK pipeline (Probably covered in previous lecture...) McKenna et al. Genome Res. 2011
Variant detec=on > SoRware and their basic principles There is reasonable overlap among the muta)on calling sorware O Rawe et al. Genome Medicine. 2013
Variant detec=on > SoRware and their basic principles Detec)on of variants from RNAseq / Chipseq / methylseq data RNAseq:: GATK RNAseq variant calling workflow for calling muta)on from RNAseq data Methylseq:: BisSNP for detec)ng SNP from methylseq ChIPseq:: Simultaneous SNP iden)fica)on and assessment of allele- specific bias from ChIP- seq data (Ni et al. BMC Genomics. 2012) Most of the normal sorware with suitable improvisa)on. Detec)on of low frequency muta)ons in heterogeneous samples Mutect Mutsig JointSNVmix Many more...
Genomics and personalized Medicine
Genomics and Personalized Medicine > Systema)c discovery from clinical samples 100-800 samples from cancer pa)ents RNAseq, muta)on detec)on, methyla)on New cancer genes New drug- development New way to stra)fy pa)ents TCGA, Nature, 2012 Applica)on of genome sequencing to detect disease progression Relapsed Acute Myeloid Leukemia 8 pa)ents Primary and relapse samples Novel cancer gene muta)ons Ding et al, Nature, 2011 Clonal evolu)on as a result of chemotherapy Genome sequencing shaping diagnosis and treatment Welch et al, JAMA, 2011 Acute promyelocy)c leukemia One pa)ent 7 weeks for genome sequencing and analysis Ac)onable gene fusion detected Changed the treatment plan for the pa)ent
Contact details > Subhajyo= De, PhD Assistant Professor Department of Medicine, University of Colorado School of Medicine Anschutz Medical Campus Email: subhajyo).de@ucdenver.edu Phone: +1 303 724 6461 Webpage: hup://www.sjdlab.org/