Copy Number Variation: available tools




Jeroen F. J. Laros
Leiden Genome Technology Center
Department of Human Genetics
Center for Human and Clinical Genetics

Introduction

A literature review of available tools.
Contents:
  Coverage based.
  Breakpoint based.
  Discordant pairs based.
  De novo insertions.
  Existing pipelines.

CNV workgroup discussion, Friday, 6 May 2011

Coverage based methods

MoGUL: Lee et al. (seunghak@cs.cmu.edu)
Detects common insertions and deletions in a population.
Clustering / Bayesian network.
Estimates the Minor Allele Frequency (MAF) for each variant.
Especially suitable for MAF > 0.06 and indels > 20 bp.
Assigns confidence to variant calls.
Proposes mrfast as initial aligner; others may work too?

FREEC: Boeva et al.
http://bioinfo-out.curie.fr/projects/freec
Window size automatically optimised.
Works with either a control dataset or, without one, uses GC content.
In Boeva et al., there is a good comparison with:
  CNV-seq.
  SegSeq.
  RDXplorer.
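To make the coverage-based idea concrete, here is a minimal sketch of counting reads in fixed windows and correcting the counts for GC content. It is not FREEC's implementation: the function names, the fixed window size, and the median-per-GC-bin correction are all assumptions chosen for brevity.

```python
from collections import defaultdict
from statistics import median


def window_counts(read_starts, genome_length, window):
    """Count read start positions (0-based) in fixed, non-overlapping windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def gc_fraction(sequence):
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def gc_normalise(counts, gc_values, bin_width=0.05):
    """Divide each window count by the median count of windows with similar GC.

    Assumption: a simple binned median stands in for whatever model a real
    tool fits to the GC-versus-coverage relationship.
    """
    bins = defaultdict(list)
    for count, gc in zip(counts, gc_values):
        bins[int(gc / bin_width)].append(count)
    medians = {b: median(values) for b, values in bins.items()}
    ratios = []
    for count, gc in zip(counts, gc_values):
        expected = medians[int(gc / bin_width)]
        ratios.append(count / expected if expected > 0 else float("nan"))
    return ratios
```

A ratio well above 1 in a run of adjacent windows would hint at a gain, and a ratio well below 1 at a loss; segmenting those ratios into regions is a separate step not shown here.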

Table 1: Tools for CNA/CNV prediction. (Boeva et al. 2010)

Name                                          FREEC                    CNV-seq                SegSeq                 RDXplorer
Refs.                                                                  (Xie and Tammi, 2009)  (Chiang et al., 2009)  (Yoon et al., 2009)
Compatibility
  Paired-ends/Mate-pairs                      +                        -                      -                      +
  Input format                                SAM, bowtie, eland,      BLAT psl, SOLiD        arachne                BAM
                                              BED, arachne, psl, SOAP
  Genomes                                     Any                      Any                    Any                    Human
Strategy
  Fixed length moving window + statistical    +                        +                      -                      +
    approach
  Automatic evaluation of window size         +                        +                      +*                     -
  GC-content normalization                    +                        -                      -                      +
  Normalization with a control dataset        +                        +                      +                      -
  Refinement of breakpoint boundaries         + for one tool, - for the other three (column positions lost in the slide text)

Table 1 (continued)

Name                                          FREEC                    CNV-seq                SegSeq                 RDXplorer
Compatibility
  Minimal number of reads                     Any                      Any                    Any                    High (>1.2x)
  Segmentation of profiles                    +                        -                      +                      +
  Automatic annotation of gain/loss regions   +                        -                      -                      +
  Copy Number prediction**                    +                        -                      -                      +***
Output
  CNA coordinates in one file                 +                        -                      -                      +
  Graphical output                            +****                    +                      -                      -
  Programming language                        C/C++                    Perl/R                 Matlab                 Java/R
Computational efficiency
  Memory usage                                Low                      Low                    Low                    >2GB of RAM
  Running time*****                           45s w/control sample,    422s                   250s                   N/A
                                              105s w/o control

* Window size is selected separately for each region according to the number of reads in this region in the control sample.
** Assignment of copy number changes to gain or loss.
*** Only for diploid genomes.
**** It is possible to visualize resulted copy number profiles and CNAs with a script R (makegraph.r) included in the package.
***** Preprocessed HCC1143 data from (Chiang et al., 2009) on a 8-core 64-bit Linux machine.

Breakpoint based methods

GMAP / GSNAP: Wu et al.
http://research-pub.gene.com/gmap
Index based (global search).
Mapping RNA or ESTs.
Can use a database, can do without.
SNP-tolerant alignment.
Potentially suitable for detection of all breakpoints.

BWA-SW: Li et al.
http://bio-bwa.sourceforge.net
BWA for long-read alignment.
Chimeric reads.
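One common way to turn partially aligned (chimeric) reads into breakpoint candidates is to look for long clipped segments in the CIGAR string of an alignment. The sketch below illustrates that idea only; it is not how GSNAP or BWA-SW report breakpoints, and the function name and the `min_clip` threshold are assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_breakpoints(pos, cigar, min_clip=10):
    """Return reference positions where a long clip suggests a breakpoint.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reference bases consumed by the aligned part of the read.
    ref_span = sum(length for length, op in ops if op in "MDN=X")
    candidates = []
    if ops and ops[0][1] in "SH" and ops[0][0] >= min_clip:
        # Clip on the left: the break lies just before the first aligned base.
        candidates.append(pos)
    if len(ops) > 1 and ops[-1][1] in "SH" and ops[-1][0] >= min_clip:
        # Clip on the right: first reference base after the aligned part.
        candidates.append(pos + ref_span)
    return candidates
```

For example, `clipped_breakpoints(100, "70M30S")` flags position 170, while a fully aligned read (`"100M"`) yields no candidates. In practice, candidates from many reads would be pooled and only positions supported by several reads kept.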

Discordant pairs methods

Breakway: Clark et al.
http://breakway.sourceforge.net
Clustering of discordantly mapped paired-end reads.
BAM input.
Designed for use in pipelines.

GASV: Sindi et al.
http://code.google.com/p/gasv
Deletions, inversions, translocations.
No insertions.
BAM input.
Poor documentation.

HYDRA SV: Quinlan et al.
http://code.google.com/p/hydra-sv
Clustering of discordant pairs.
No classification, but this can be done with BEDTools.
Future plans:
  De novo assembly of insertions.
  Split reads.
  Automatic annotation.
Toolkit includes useful tools like bamtofastq.
  Possibly useful when we want to realign unmapped reads.
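The tools in this group share one core step: select read pairs whose mapping distance disagrees with the library's insert size, then cluster pairs that point at the same event. The following is a generic sketch of that step, not the algorithm of Breakway, GASV or HYDRA; the thresholds, the tuple layout `(chrom, left_pos, right_pos)` and the greedy clustering rule are assumptions.

```python
def is_discordant(insert_size, mean, sd, n_sd=3.0):
    """A pair is discordant when its insert size is far from the library mean."""
    return abs(insert_size - mean) > n_sd * sd


def cluster_pairs(pairs, max_gap=500, min_support=2):
    """Greedily cluster discordant pairs given as (chrom, left_pos, right_pos).

    Pairs are sorted by chromosome and left position; a pair joins the current
    cluster when both of its ends lie within max_gap of the previous pair.
    Clusters with fewer than min_support pairs are dropped.
    """
    clusters = []
    current = []
    for pair in sorted(pairs):
        if current:
            prev = current[-1]
            close = (
                pair[0] == prev[0]
                and pair[1] - prev[1] <= max_gap
                and abs(pair[2] - prev[2]) <= max_gap
            )
            if not close:
                clusters.append(current)
                current = []
        current.append(pair)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_support]
```

Requiring a minimum number of supporting pairs per cluster is the usual guard against chimeric library artefacts and mismapped reads; the right value depends on coverage.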

De novo insertions

TIGRA SV: Chen et al.
http://tigrasv.sourceforge.net
Targeted local assembly.
Iterative graph routing assembly.
Optional breakpoint library.
BAM input.

Next-generation VariationHunter: Hormozdiari et al.
http://compbio.cs.sfu.ca/strvar.htm
Transposon insertion discovery.
Discordant pairs / conflict resolution.
> 85% discovery.
> 90% accuracy.
DIVET input format.

NovelSeq: Hajirasouliha et al.
Long novel sequence insertions.
In essence a pipeline:
  1. Alignment: mrfast.
  2. Assembler: EULER-SR / ABySS.
  3. Contamination filter: BLAST.
  4. Clustering / anchoring: mrcar.
  5. Local assembly: mrsaab on the output of 3 and 4.
SAM / DIVET input.
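The NovelSeq steps form a small dependency graph rather than a strict chain, since the final local assembly consumes the output of two earlier steps. A sketch of that structure is given below. Only the dependency of step 5 on steps 3 and 4 comes from the slide; the other edges and all stage names are assumptions, and no actual tool is invoked.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each stage maps to the set of stages whose output it needs.
STAGES = {
    "alignment": set(),                                       # 1. mrfast
    "assembly": {"alignment"},                                # 2. assumed to follow 1
    "contamination_filter": {"assembly"},                     # 3. assumed to follow 2
    "clustering": {"alignment"},                              # 4. assumed to follow 1
    "local_assembly": {"contamination_filter", "clustering"}, # 5. from the slide
}


def run_order(stages):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(stages).static_order())
```

Under these assumptions, steps 2 to 3 and step 4 are independent of each other, so a workflow engine could run them in parallel once the alignment is done.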

Miscellaneous

Dindel: Albers et al.
http://sites.google.com/site/keesalbers
Localized reassembly/realignment.
Multiple BAM files.
Estimate haplotype frequencies.

Mosaik: Lee & Strömberg.
https://github.com/wanpinglee/mosaik
Smith-Waterman gapped alignment.
Discordantly mapped paired-end reads.
Reference-guided assemblies.
Toolkit includes:
  MosaikBuild.
  MosaikAligner.
  MosaikSort.
  MosaikAssembler.
Input is raw sequence data.

Existing pipelines

SVMerge: Wong et al.
http://svmerge.sourceforge.net
BreakDancer, Pindel, SECluster, RDXplorer, RetroSeq,...
Expandable.
BAM input.
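A pipeline that combines several callers needs some rule for deciding when two calls describe the same event. The slides do not say how SVMerge does this, so the sketch below uses a generic criterion, reciprocal overlap, purely as an illustration; the tool labels, the 50% threshold and the data layout are assumptions.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    a and b are (chrom, start, end) tuples with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_calls(calls_by_tool, min_overlap=0.5):
    """Group calls from different tools that overlap reciprocally.

    calls_by_tool maps a tool label to a list of (chrom, start, end) calls.
    Each merged entry keeps the first region seen and the set of tools
    that support it.
    """
    merged = []
    for tool, calls in calls_by_tool.items():
        for call in calls:
            for entry in merged:
                if reciprocal_overlap(entry["region"], call) >= min_overlap:
                    entry["tools"].add(tool)
                    break
            else:
                merged.append({"region": call, "tools": {tool}})
    return merged
```

Calls supported by more than one tool can then be ranked above single-tool calls, which is one simple way to trade sensitivity for precision.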

Figure 1: SVMerge pipeline.

Questions?

Acknowledgements:
Martijn Vermaat
Maarten van Iterson