Analysis of NGS Data


Introduction and Basics

Overview of Analysis Workflow
Images → Basecalling → Sequences, followed by one of two branches:
- De novo sequencing: Assembly → Annotation
- Resequencing: Alignments → Comparison to reference (variation, read distribution, read frequencies)
Analysis stages:
- Primary analysis (on the sequencer, using the vendor's software): images and basecalling
- Secondary analysis (on downstream computers, using open-source tools or the vendor's software): assembly or alignment of the sequences
- Tertiary analysis (on downstream computers, using open-source tools or the vendor's software): comparison to the reference

Overview of Data Amounts and Formats
- Images: several TB
- Sequences (after basecalling): hundreds of GB; FASTQ
- De novo sequencing: assembly in FASTA, annotation in text formats
- Resequencing alignments: hundreds of GB; SAM, BAM, text
- Comparison to reference (variation, read distribution, read frequencies): a few GB or less
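The FASTQ records produced by basecalling store each read's sequence together with one quality character per base. A minimal parsing sketch, assuming unwrapped four-line records and the Phred+33 quality encoding used by Sanger and Illumina 1.8+ data; `parse_fastq`, `FastqRecord` and `error_probability` are hypothetical names chosen for illustration:

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred quality scores as integers


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from unwrapped 4-line FASTQ text (Phred+33 assumed by default)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not plus.startswith("+"):
            raise ValueError("expected '+' separator line")
        if len(qual) != len(seq):
            raise ValueError("sequence and quality strings differ in length")
        # Each quality character encodes Q = ord(char) - offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


def error_probability(q: int) -> float:
    """Phred scale: Q = -10 * log10(P(base call is wrong))."""
    return 10 ** (-q / 10)
```

For the record `@read1` / `ACGT` / `+` / `II5#`, the qualities decode to 40, 40, 20 and 2, i.e. error probabilities from 0.0001 up to about 0.63.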

Resequencing
The genome of the sequenced organism is already known.
- WGS (whole-genome sequencing): sequencing of the complete DNA
- Target-enriched: enrichment of specific target regions prior to sequencing

Alignment
Goal: identify the origin of each read in the original genome.
Difficulties:
- short reads (36 bp to 400 bp)
- many reads (several million)
- sequencing errors
- genomic variation
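One way to find a read's origin despite sequencing errors is the seed-and-extend strategy: look up short exact k-mers (seeds) in an index of the reference, then compare the whole read at each candidate position. A toy sketch, assuming ungapped alignment on the forward strand only and a hypothetical mismatch limit; real aligners also handle reverse complements, gaps and base qualities. The function names are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to its 0-based start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: Dict[str, List[int]], k: int,
               max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Return (0-based start, mismatches) of the best ungapped hit, or None."""
    candidates = set()
    # Seed: exact lookup of non-overlapping read k-mers in the reference index.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Extend: compare the full read at each candidate start and count mismatches.
    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best
```

With `reference = "TTTTGATTACAGGCTTTT"` and k = 4, the read `GATTACTGGC` (one sequencing error) is placed at position 4 with one mismatch, because its first seed `GATT` still matches exactly.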

SAM / BAM Format (SAM Format Specification v1.4-r962)
POS: alignment start position (1-based)

MAPQ: alignment quality, $-10 \log_{10} P(\text{alignment position is wrong})$, in [0, 255]; 255 if not available
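Because MAPQ is Phred-scaled, converting between the score and the probability that the reported position is wrong is a simple inversion. A small sketch; capping computed scores at 254 (so that 255 keeps its "not available" meaning) is an assumption of this example, and the function names are hypothetical:

```python
import math
from typing import Optional


def mapq_to_error_prob(mapq: int) -> Optional[float]:
    """Invert MAPQ = -10 * log10(P(wrong)); 255 means 'not available'."""
    if not 0 <= mapq <= 255:
        raise ValueError("MAPQ must lie in [0, 255]")
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float) -> int:
    """Phred-scale an error probability, capped at 254 because 255 is reserved."""
    if p_wrong <= 0:
        return 254
    q = -10 * math.log10(p_wrong)
    return max(0, min(254, round(q)))
```

So MAPQ 30 corresponds to a 1-in-1000 chance that the read is placed at the wrong position.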

CIGAR: CIGAR string describing the alignment (matches/mismatches, insertions, deletions, clipping)
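The CIGAR string encodes the alignment as run-length operations (M match/mismatch, I insertion, D deletion, N skipped region, S/H soft/hard clipping, P padding, =/X sequence match/mismatch). A sketch that parses it and derives how many reference and read bases the alignment covers; the helper names are hypothetical:

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")      # operations that advance along the read
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs; '*' means unavailable."""
    if cigar == "*":
        return []
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(ops: List[Tuple[int, str]]) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in ops if op in CONSUMES_REFERENCE)


def query_length(ops: List[Tuple[int, str]]) -> int:
    """Number of read bases (SEQ length) implied by the CIGAR string."""
    return sum(n for n, op in ops if op in CONSUMES_QUERY)
```

For `5S20M2I10M3D15M` this gives a reference span of 48 and a read length of 52, so an alignment with POS = 100 ends at reference position 147 (1-based, inclusive).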

RNEXT: reference name of the next partner read ("=": same reference, "*": not available)

PNEXT: alignment position of the next partner read

TLEN: fragment size (0: single read or not available)

SEQ: read sequence

QUAL: basecalling quality for each base of the read ("*": not available)

Optional fields in TAG:TYPE:VALUE format (e.g. edit distance)
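Putting the fields together, one SAM alignment line consists of eleven tab-separated mandatory columns followed by optional TAG:TYPE:VALUE fields. A parsing sketch; the column names are those of the SAM specification, while `parse_sam_line` is hypothetical and value types other than `i` and `f` are simply kept as strings here. Binary BAM files need a dedicated reader (for example samtools or pysam) instead:

```python
from typing import Any, Dict

SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_COLUMNS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Parse one SAM alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have 11 mandatory fields")
    record: Dict[str, Any] = {
        name: int(value) if name in INTEGER_COLUMNS else value
        for name, value in zip(SAM_COLUMNS, fields[:11])
    }
    tags: Dict[str, Any] = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)  # a Z value may itself contain ':'
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value  # A, Z, H and B values left unparsed in this sketch
    record["TAGS"] = tags
    return record
```

For example, `NM:i:1` becomes `{"NM": 1}`; NM is the tag the specification defines for the edit distance to the reference.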


Software for Alignment
- Seed-and-extend: MAQ, Eland (Illumina), BFAST, Mosaik, Stampy, NovoAlign
- BWT-based: BWA, Bowtie, SOAP2
The tools differ in:
- ability to do gapped alignment
- read length requirements
- ability to do paired-end (PE) alignment
- speed and memory footprint
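BWT-based aligners find exact matches by backward search over the Burrows-Wheeler transform of the reference. A teaching sketch of that search; it builds the suffix array naively and stores full occurrence tables, whereas production tools use compressed, sampled structures and add backtracking to tolerate mismatches. Function names are hypothetical:

```python
from typing import Dict, List, Tuple


def bwt_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Build suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel, sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    # C[c]: number of characters in text that are lexicographically smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in C}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ


def backward_search(pattern: str, sa: List[int], C: Dict[str, int],
                    occ: Dict[str, List[int]]) -> List[int]:
    """Return sorted 0-based positions where pattern occurs exactly."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Indexing `"ACAGAC"` and searching for `AC` returns `[0, 4]`; each step narrows the suffix-array interval using only the C and occurrence tables.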

Variant Calling (SNPs/Indels)
Bayesian approach, for genotype g and observed read data D:
$P(g \mid D) = \frac{P(D \mid g)\,P(g)}{P(D)}$
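A minimal sketch of this Bayes rule for one biallelic site in a diploid sample: the likelihood P(D | g) multiplies per-read probabilities derived from base qualities (a base matches an allele with probability 1 - e, otherwise e/3), and P(D) is obtained by normalizing over the three genotypes. The error model and the prior values are assumptions for illustration, not those of any specific caller:

```python
import math
from typing import Dict, Sequence


def genotype_posteriors(bases: Sequence[str], quals: Sequence[int], ref: str, alt: str,
                        priors: Dict[str, float]) -> Dict[str, float]:
    """P(g | D) for genotypes RR, RA, AA via Bayes' rule (simplified error model)."""
    genotypes = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    log_joint = {}
    for g, (a1, a2) in genotypes.items():
        log_lik = 0.0  # log P(D | g), reads assumed independent
        for b, q in zip(bases, quals):
            e = 10 ** (-q / 10)  # Phred base quality -> error probability
            p1 = 1 - e if b == a1 else e / 3
            p2 = 1 - e if b == a2 else e / 3
            log_lik += math.log(0.5 * p1 + 0.5 * p2)  # read drawn from either allele
        log_joint[g] = log_lik + math.log(priors[g])  # log P(D | g) P(g)
    # Divide by P(D) = sum_g P(D | g) P(g), computed stably in log space.
    m = max(log_joint.values())
    unnorm = {g: math.exp(v - m) for g, v in log_joint.items()}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}
```

With hypothetical priors `{"RR": 0.998, "RA": 0.001, "AA": 0.001}`, five reference and five alternative bases at Q30 still yield RA as the most probable genotype: the likelihood ratio easily outweighs the low heterozygous prior.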

Variant Calling: SNPs
Pileup of reads against the reference sequence
Filters:
- base quality
- alignment quality
- frequency of the variant
- variant present in both forward and reverse reads
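The filters listed on this slide can be sketched as a function over one pileup column. All thresholds below are hypothetical defaults chosen for illustration, and the decision rule (report the most frequent non-reference base if it passes) is a simplification:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PileupBase:
    base: str        # base observed in the read at this reference position
    base_qual: int   # Phred base quality
    map_qual: int    # MAPQ of the read's alignment
    reverse: bool    # True if the read aligned to the reverse strand


def call_snv(pileup: List[PileupBase], ref_base: str, min_base_qual: int = 20,
             min_map_qual: int = 20,
             min_alt_fraction: float = 0.2) -> Optional[Tuple[str, float]]:
    """Return (alt_base, alt_fraction) if a SNV passes all filters, else None."""
    # Filters 1 and 2: base quality and alignment quality.
    kept = [p for p in pileup
            if p.base_qual >= min_base_qual and p.map_qual >= min_map_qual]
    alt_counts = {}
    for p in kept:
        if p.base != ref_base:
            alt_counts[p.base] = alt_counts.get(p.base, 0) + 1
    if not alt_counts:
        return None
    alt = max(alt_counts, key=alt_counts.get)
    alt_reads = [p for p in kept if p.base == alt]
    # Filter 3: frequency of the variant among the reads that passed.
    fraction = len(alt_reads) / len(kept)
    if fraction < min_alt_fraction:
        return None
    # Filter 4: the variant must be seen on both forward and reverse reads.
    if not (any(p.reverse for p in alt_reads) and any(not p.reverse for p in alt_reads)):
        return None
    return alt, fraction
```

The strand filter is what rejects a variant supported only by forward-strand reads, a pattern that often points to a sequencing or alignment artifact rather than a true SNP.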

Variant Calling: Indels
- Pileup of reads against the reference sequence
- Generation of candidate haplotypes
- Realignment of reads against the candidate haplotypes
- Probability of each candidate haplotype
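The last two steps can be illustrated with a strongly simplified model: each candidate haplotype already contains the candidate indel, every read is scored against it over all ungapped placements, and the haplotype scores are normalized into probabilities under a uniform prior. This is an assumption-laden sketch, not Dindel's actual model; the function names and the fixed error rate are hypothetical:

```python
import math
from typing import List


def read_likelihood(read: str, haplotype: str, error: float = 0.01) -> float:
    """P(read | haplotype), averaging over all ungapped placements of the read."""
    placements = len(haplotype) - len(read) + 1
    if placements <= 0:
        return 0.0
    total = 0.0
    for start in range(placements):
        p = 1.0
        for b, h in zip(read, haplotype[start:start + len(read)]):
            p *= (1 - error) if b == h else error / 3
        total += p
    return total / placements


def haplotype_posteriors(reads: List[str], haplotypes: List[str],
                         error: float = 0.01) -> List[float]:
    """Probability of each candidate haplotype given all reads (uniform prior)."""
    log_scores = []
    for h in haplotypes:
        liks = [read_likelihood(r, h, error) for r in reads]
        log_scores.append(sum(math.log(x) for x in liks) if all(liks) else -math.inf)
    m = max(log_scores)
    if m == -math.inf:
        raise ValueError("no candidate haplotype can explain all reads")
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]
```

Reads `CGTGC` and `GTGCA` match the 1-bp deletion haplotype `ACGTGCA` exactly but need mismatches against the reference haplotype `ACGTTGCA`, so nearly all probability goes to the deletion.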

Dindel: Accurate indel calls from short-read data. C.A. Albers et al., Genome Research 2011.

Variant Calling: SVs
Anomalous paired-end (PE) alignment: deletion in the sample

Anomalous paired-end (PE) alignment: insertion in the sample

Anomalous paired-end (PE) alignment: inversion in the sample

Anomalous paired-end (PE) alignment: inter-chromosomal rearrangement (mates map to chromosome A and chromosome B)
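The four anomalous patterns can be told apart from a read pair's chromosomes, strands and apparent fragment size. A minimal sketch, assuming a forward-reverse (inward-facing) library whose fragment-size mean and standard deviation are known; the 3-SD cutoff, the use of the distance between mate positions as a fragment-size proxy, and all names are illustrative assumptions. Tools such as BreakDancer cluster many such pairs before calling an SV, which this sketch does not do:

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int        # leftmost mapped position of mate 1
    reverse1: bool   # True if mate 1 aligned to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(pair: ReadPair, mean_size: float, sd_size: float,
                  n_sd: float = 3.0) -> str:
    """Classify a read pair by the anomalous-alignment patterns on the slides."""
    if pair.chrom1 != pair.chrom2:
        return "inter-chromosomal rearrangement"
    if pair.reverse1 == pair.reverse2:
        return "inversion"  # both mates on the same strand
    # Order mates by position; a normal pair has the left mate on the forward strand.
    left_rev = pair.reverse1 if pair.pos1 <= pair.pos2 else pair.reverse2
    if left_rev:
        return "unclassified"  # outward-facing pair, outside the four patterns
    size = abs(pair.pos2 - pair.pos1)  # crude fragment-size proxy
    if size > mean_size + n_sd * sd_size:
        return "deletion"   # mates map farther apart than expected
    if size < mean_size - n_sd * sd_size:
        return "insertion"  # mates map closer together than expected
    return "concordant"
```

The direction of the size test follows from the sample: sequence missing from the sample makes the mates land farther apart on the reference, extra sequence makes them land closer together.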

Partially aligning reads: at a deletion, reads spanning the breakpoint align only partially, while reads away from it align completely
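A toy sketch of the split-read idea: try every way of cutting the read into a prefix and a suffix, and report a deletion when both parts align exactly with a gap between them on the reference. Exact matching, the minimum anchor length and the function name are assumptions; in repetitive sequence the breakpoint is ambiguous and this sketch simply returns the first consistent split:

```python
from typing import Optional, Tuple


def deletion_from_split_read(read: str, reference: str,
                             min_anchor: int = 5) -> Optional[Tuple[int, int]]:
    """Return (0-based deletion start, deletion length) or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = reference.find(prefix)
        while p != -1:
            # The suffix must start at least one base after the prefix ends.
            s = reference.find(suffix, p + split + 1)
            if s != -1:
                return p + split, s - (p + split)
            p = reference.find(prefix, p + 1)
    return None
```

For the reference `GATTACAGCTGGGGGCCATGAGTCA` and the read `GATTACAGCTCCATGAGTCA`, the function reports a 5-bp deletion starting at 0-based position 10 (the `GGGGG` run).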

Software for Variant Calling
SNPs / Indels:
- Samtools (Sanger)
- GATK (Broad)
- SOAP (BGI) (SNPs only)
- Vendor's software
Indels:
- Dindel (Sanger)
SVs:
- BreakDancer
- CREST