Analysis of NGS Data Introduction and Basics Folie: 1
Overview of Analysis Workflow: Images, Basecalling, Sequences. De novo sequencing: Assembly, Annotation. Resequencing: Alignments, Comparison to reference (variation, read distribution, read frequencies). Folie: 3
Overview of Analysis Workflow: Images, Basecalling [Primary Analysis: on the sequencer, using the vendor's software], Sequences. De novo sequencing: Assembly, Annotation. Resequencing: Alignments, Comparison to reference (variation, read distribution, read frequencies). Folie: 4
Overview of Analysis Workflow: Images, Basecalling, Sequences. De novo sequencing: Assembly, Annotation. Resequencing: Alignments [Secondary Analysis: on downstream computers, using open-source tools or the vendor's software], Comparison to reference (variation, read distribution, read frequencies). Folie: 5
Overview of Analysis Workflow: Images, Basecalling, Sequences. De novo sequencing: Assembly, Annotation. Resequencing: Alignments, Comparison to reference (variation, read distribution, read frequencies) [Tertiary Analysis: on downstream computers, using open-source tools or the vendor's software]. Folie: 6
Overview of Data Amounts and Formats: Images (several TB), Basecalling, Sequences (hundreds of GB; FASTQ). De novo sequencing: Assembly (FASTA), Annotation (TXT formats). Resequencing: Alignments (SAM, BAM, TXT; hundreds of GB), Comparison to reference (variation, read distribution, read frequencies; few GB or less). Folie: 7
Resequencing: the genome of the sequenced organism is already known. WGS: sequencing of the complete DNA. Target-Enriched: enrichment of specific target regions prior to sequencing. Folie: 11
Alignment: identify the origin of each read in the original genome. Difficulties: short reads (36 bp to 400 bp), many reads (several millions), sequencing errors, genomic variation. Folie: 12
SAM / BAM Format SAM Format Specification (v1.4-r962) Alignment start position (1-based) Folie: 21
SAM / BAM Format SAM Format Specification (v1.4-r962) Alignment quality: $-10 \log_{10} P(\text{alignment position is wrong})$, in [0,255]; 255 for not available Folie: 22
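The Phred scaling of the alignment quality can be illustrated with a minimal Python sketch. The function names and the cap at 254 are assumptions made for illustration; they are not taken from the SAM specification or from any particular aligner, which typically apply their own caps.

```python
import math
from typing import Optional

def mapq_from_error_prob(p_wrong: float) -> int:
    """Phred-scale P(alignment position is wrong) into an integer quality."""
    if p_wrong <= 0.0:
        return 254  # assumed cap: 255 is reserved for "not available"
    q = -10.0 * math.log10(p_wrong)
    return max(0, min(254, round(q)))

def error_prob_from_mapq(mapq: int) -> Optional[float]:
    """Invert the scaling; 255 means the quality is not available."""
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)
```

Under this scaling a quality of 20 corresponds to a 1 in 100 chance that the reported position is wrong, and 30 to 1 in 1000.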
SAM / BAM Format SAM Format Specification (v1.4-r962) CIGAR String Folie: 23
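To show how a CIGAR string encodes an alignment, the following sketch splits it into (length, operation) pairs and derives how many reference and read bases it covers. The function names are hypothetical; the operation letters (M, I, D, N, S, H, P, =, X) follow the SAM specification.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance along the reference
CONSUMES_READ = set("MIS=X")       # operations that advance along the read

def parse_cigar(cigar):
    """Return a list of (length, operation) pairs; '*' means not available."""
    if cigar == "*":
        return []
    ops = CIGAR_RE.findall(cigar)
    # Reject strings with characters the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]

def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)

def read_length(cigar):
    """Number of read bases implied by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)
```

For example, `3S30M2I10M1D5M` describes a 50 bp read (3 soft-clipped bases, 2 inserted bases) that covers 46 reference bases, including one deleted base.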
SAM / BAM Format SAM Format Specification (v1.4-r962) Reference name of next partner read; "=": same reference; "*": not available Folie: 24
SAM / BAM Format SAM Format Specification (v1.4-r962) Alignment position of next partner read Folie: 25
SAM / BAM Format SAM Format Specification (v1.4-r962) Fragment size; 0: single read or not available Folie: 26
SAM / BAM Format SAM Format Specification (v1.4-r962) Sequence Folie: 27
SAM / BAM Format SAM Format Specification (v1.4-r962) Basecalling qualities for each base of the read; "*": not available Folie: 28
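In SAM, the per-base qualities are stored as one ASCII character per base, with the Phred quality encoded as the character code minus 33. A minimal sketch of decoding that string follows; the function names are hypothetical.

```python
def decode_qualities(qual):
    """Decode a SAM QUAL string into Phred scores; '*' means not available."""
    if qual == "*":
        return None
    return [ord(ch) - 33 for ch in qual]

def error_probabilities(qual):
    """Convert each Phred score Q into an error probability 10^(-Q/10)."""
    scores = decode_qualities(qual)
    if scores is None:
        return None
    return [10 ** (-q / 10) for q in scores]
```

The character `I` (ASCII 73) therefore stands for quality 40, and `!` (ASCII 33) for quality 0.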
SAM / BAM Format SAM Format Specification (v1.4-r962) Optional fields following the TAG:TYPE:VALUE format (here: edit-distance) Folie: 29
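Putting the fields of the preceding slides together, a SAM alignment line consists of eleven tab-separated mandatory fields followed by optional TAG:TYPE:VALUE fields. The sketch below parses one such line; the class and function names are hypothetical, and only the integer (`i`) and float (`f`) tag types are converted, all others are kept as strings.

```python
from dataclasses import dataclass, field

@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference name
    pos: int     # 1-based alignment start position
    mapq: int    # alignment quality, 255 = not available
    cigar: str   # CIGAR string
    rnext: str   # reference name of next partner read ("=" or "*")
    pnext: int   # alignment position of next partner read
    tlen: int    # fragment size, 0 = single read or not available
    seq: str     # sequence
    qual: str    # basecalling qualities ("*" = not available)
    tags: dict = field(default_factory=dict)

def parse_tag(text):
    """Split one optional field of the form TAG:TYPE:VALUE."""
    tag, typ, value = text.split(":", 2)  # the value itself may contain ':'
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    return tag, value

def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10],
        tags=dict(parse_tag(t) for t in cols[11:]),
    )
```

For a made-up record whose last column is `NM:i:1`, `parse_sam_line(...).tags["NM"]` gives the edit distance as the integer 1.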
Software. Seed-and-Extend: MAQ, Eland (Illumina), BFAST, Mosaik, Stampy, NovoAlign. BWT-Based: BWA, Bowtie, SOAP2. The tools differ in: ability to do gapped alignment, read length requirements, ability to do PE-alignment, speed and memory footprint. Folie: 33
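The seed-and-extend idea can be sketched in a few lines of Python: index all k-mers of the reference, look up short seeds from the read, and extend each seed hit to a full-length comparison. This is a toy, ungapped illustration under assumed names and parameters (`k`, `max_mismatches`); it is not how any of the listed tools is implemented, and it ignores reverse-complement reads and indels.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align_read(read, reference, index, k, max_mismatches=2):
    """Return (0-based start, mismatches) of the best ungapped hit, or None."""
    best = None
    tried = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset  # where the whole read would start
            if start < 0 or start + len(read) > len(reference) or start in tried:
                continue
            tried.add(start)
            # Extend: compare the full read against the reference window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best
```

A read that matches nowhere within the mismatch limit yields `None`; real aligners additionally report an alignment quality reflecting how unique the best hit is.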
Variant calling (SNPs/Indels): Bayesian approach. $P(g \mid D) = \frac{P(D \mid g)\, P(g)}{P(D)}$ Folie: 34
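A minimal sketch of the Bayesian approach at a single site, reading g as the genotype and D as the observed bases with their qualities (an interpretation; the slide does not define the symbols). The error model (a miscalled base is equally likely to be any of the other three), the diploid genotype set and the default priors are all assumptions chosen for illustration, not the model of any particular caller.

```python
import math

def base_likelihood(base, allele, err):
    """P(observed base | true allele) under a uniform miscall model (assumed)."""
    return 1.0 - err if base == allele else err / 3.0

def genotype_posteriors(bases, quals, ref, alt, priors=None):
    """Return P(g | D) for the genotypes ref/ref, ref/alt and alt/alt."""
    if priors is None:
        # Hypothetical priors P(g); real callers derive these differently.
        priors = {(ref, ref): 0.9985, (ref, alt): 0.001, (alt, alt): 0.0005}
    log_joint = {}
    for (a1, a2), prior in priors.items():
        total = math.log(prior)
        for base, q in zip(bases, quals):
            # Cap the error at 0.75 (a random base) so the logarithm stays finite.
            err = min(10 ** (-q / 10), 0.75)
            # Each read samples one of the two alleles with probability 1/2.
            p = 0.5 * base_likelihood(base, a1, err) + 0.5 * base_likelihood(base, a2, err)
            total += math.log(p)
        log_joint[(a1, a2)] = total  # log of P(D | g) P(g)
    # Dividing by P(D) = sum over g of P(D | g) P(g), done in log space.
    peak = max(log_joint.values())
    norm = sum(math.exp(v - peak) for v in log_joint.values())
    return {g: math.exp(v - peak) / norm for g, v in log_joint.items()}
```

With five reads showing the reference base and five showing the alternative base at good quality, the heterozygous genotype receives almost all of the posterior probability despite its small prior.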
Variant calling: SNPs. Pileup of reads against the reference sequence. Filter: base quality, alignment quality, frequency of variant, variant present in both forward and reverse reads. Folie: 35
Variant calling: Indels. Pileup of reads against the reference sequence; generation of candidate haplotypes; realignment of reads against the candidate haplotypes; probability of each candidate haplotype. Folie: 36
Variant calling: Indels. Dindel: Accurate indel calls from short-read data. C.A. Albers et al., Genome Research 2011. Folie: 37
Variant calling: SVs. Anomalous PE-alignment: deletion in sample. Folie: 38
Variant calling: SVs. Anomalous PE-alignment: insertion in sample. Folie: 39
Variant calling: SVs. Anomalous PE-alignment: inversion in sample. Folie: 40
Variant calling: SVs. Anomalous PE-alignment: inter-chromosomal rearrangement (Chr A, Chr B). Folie: 41
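The four anomalous paired-end signatures can be summarised as a simple decision rule on a single read pair: partners on different chromosomes, partners on the same strand, or a mapped distance that deviates from the expected fragment size. The sketch below is a deliberately simplified illustration; the function name, the threshold of three standard deviations and the label strings are assumptions, and real SV callers cluster many supporting pairs rather than judging one pair in isolation.

```python
def classify_pair(chrom1, chrom2, reverse1, reverse2,
                  span, mean_insert, sd_insert, n_sd=3.0):
    """Label one read pair by the structural variant its alignment suggests.

    span is the distance spanned by the pair on the reference; mean_insert and
    sd_insert describe the expected fragment size distribution of the library.
    """
    if chrom1 != chrom2:
        return "inter-chromosomal rearrangement"
    if reverse1 == reverse2:
        # Both reads on the same strand instead of facing each other.
        return "inversion"
    if span > mean_insert + n_sd * sd_insert:
        # Reads map further apart than the fragment: sequence missing in sample.
        return "deletion"
    if span < mean_insert - n_sd * sd_insert:
        # Reads map closer together: extra sequence present in sample.
        return "insertion"
    return "concordant"
```

For a library with a 300 bp mean fragment size and a 30 bp standard deviation, a pair spanning 2000 bp on the reference would be labelled as supporting a deletion in the sample.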
Variant calling: SVs. Partially aligning reads. Figure labels: partially aligned reads, completely aligned reads, deletion. Folie: 42
Software. SNPs / Indels: Samtools (Sanger), GATK (Broad), SOAP (BGI; SNPs only), vendor's software. Indels: Dindel (Sanger). SVs: BreakDancer, CREST. Folie: 43