Next Generation Sequencing for Invertebrate Virus Discovery

Similar documents
Next Generation Sequencing

Next generation DNA sequencing technologies. theory & prac-ce

NGS data analysis. Bernardo J. Clavijo

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Introduction to NGS data analysis

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Introduction to next-generation sequencing data

Computational Genomics. Next generation sequencing (NGS)

Next Generation Sequencing: Technology, Mapping, and Analysis

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

NGS Technologies for Genomics and Transcriptomics

Introduction to Bioinformatics 3. DNA editing and contig assembly

July 7th 2009 DNA sequencing

A Primer of Genome Science THIRD

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

How Sequencing Experiments Fail

Introduction. Overview of Bioconductor packages for short read analysis

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

PreciseTM Whitepaper

Next Generation Sequencing for DUMMIES

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

RNAseq / ChipSeq / Methylseq and personalized genomics

Description: Molecular Biology Services and DNA Sequencing

Microbial Oceanomics using High-Throughput DNA Sequencing

Bioinformatics Resources at a Glance

Bacterial Next Generation Sequencing - nur mehr Daten oder auch mehr Wissen? Dag Harmsen Univ. Münster, Germany dharmsen@uni-muenster.

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Next Generation Sequencing; Technologies, applications and data analysis

G E N OM I C S S E RV I C ES

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

Introduction Bioo Scientific

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT

Services. Updated 05/31/2016

Overview of Next Generation Sequencing platform technologies

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

Concepts and methods in sequencing and genome assembly

Module 1. Sequence Formats and Retrieval. Charles Steward

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage

Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory

Next Generation Sequencing

Data Analysis for Ion Torrent Sequencing

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v SMRT Analysis v2.2.0 Overview. Notes:

Version 5.0 Release Notes

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium

SEQUENCING. From Sample to Sequence-Ready

Biotechnology: DNA Technology & Genomics

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

History of DNA Sequencing & Current Applications

OriGene Technologies, Inc. MicroRNA analysis: Detection, Perturbation, and Target Validation

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools.

Techniques in Molecular Biology (to study the function of genes)

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Keeping up with DNA technologies

TGC AT YOUR SERVICE. Taking your research to the next generation

FOR REFERENCE PURPOSES

A Tutorial in Genetic Sequence Classification Tools and Techniques

Deep Sequencing Data Analysis

The Steps. 1. Transcription. 2. Transferal. 3. Translation

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg

Welcome to the Plant Breeding and Genomics Webinar Series

TruSeq Custom Amplicon v1.5

The world of non-coding RNA. Espen Enerly

PAGANTEC: OPENMP PARALLEL ERROR CORRECTION FOR NEXT-GENERATION SEQUENCING DATA

DNA Sequencing & The Human Genome Project

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College

Vector NTI Advance 11 Quick Start Guide

Addressing the Black Box Phenomenon of Genome Sequencing and Assembly

MiSeq: Imaging and Base Calling

restriction enzymes 350 Home R. Ward: Spring 2001

Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison

DNA Sequencing and Personalised Medicine

Searching Nucleotide Databases

Bioinformatics Unit Department of Biological Services. Get to know us

Geospiza s Finch-Server: A Complete Data Management System for DNA Sequencing

An Overview of DNA Sequencing

Overview sequence projects

Intro to Bioinformatics

Core Facility Genomics

Bioanalyzer Applications for

Next Generation Sequencing; Technologies, applications and data analysis

BIG DATA BIG DATA 8/1/12. Cool Informa+cs Tools and Services for Biomedical Research. David Ruau, PhD. August 1 st, 2012

Next generation sequencing (NGS)

An Introduction to Next-Generation Sequencing Technology

Gene Expression Assays

IIID 14. Biotechnology in Fish Disease Diagnostics: Application of the Polymerase Chain Reaction (PCR)

3. About R2oDNA Designer

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

ONLINE SUPPLEMENTAL MATERIAL. Allele-Specific Expression of Angiotensinogen in Human Subcutaneous Adipose Tissue

CCR Biology - Chapter 9 Practice Test - Summer 2012

Targeted. sequencing solutions. Accurate, scalable, fast TARGETED

Frequently Asked Questions Next Generation Sequencing

DNA, RNA, Protein synthesis, and Mutations. Chapters

Transcription:

Next Generation Sequencing for Invertebrate Virus Discovery -a practical approach Sijun Liu & Bryony C. Bonning Iowa State University, USA 8-14-2013 SIP Pittsburgh

Outline Introduction: Why use NGS? Traditional approach for virus discovery Next Generation Sequencing (NGS) Advantages of NGS for virus discovery How it s done Sample selection Sequencing library preparation Sequencing method Assembly of sequencing reads Identification of viral sequence Assembly of viral genome

Insect viruses detected/discovered by use of NGS Liu S, Vijayendran D, Bonning BC. 2011. 3(10):1849-69.

Traditional Approach for Virus Discovery Collect samples that show disease symptoms Isolate viruses Observe virus particles Identify viral genomes Clone genomic DNA/RNA - sequence (Sanger sequencing) - assemble viral genome

Advantages of NGS for Virus Discovery Many viruses are latent or asymptomatic NGS can identify viral sequences without background information on viruses Viral genomes are assembled de novo without reference sequences NGS has revolutionized virus discovery

Aphis glycines virus (AGV) -assembled from transcriptome VPg Similar to tetraviruses P 10 prdrp (P145) CP UAG * P28 RTD (?) P35 Structurally resemble luteoviruses (plant virus) 4795 nt CP RTD sgrna (?) A new insect virus with tetravirus-like RdRp, and plant virus-like capsid protein

Outline Introduction: Why use NGS? Traditional approach for virus discovery Next Generation Sequencing (NGS) Advantages of NGS for virus discovery How it s done Sample selection Sequencing library preparation Sequencing method Assembly of sequencing reads Identification of viral sequence Assembly of viral genome

Sample Selection Small sample size (10 ug or less RNA adequate) -but the more the better Tissue vs. whole organism -sequencing depth Virus purification -helps to identify full-length sequence -better approach for DNA viruses

Sequencing Technologies o Short reads (35-250 nt) 1. Genome Analyzer IIx (GAIIx), HiSeq2000, HiSeq2500, MiSeq Illumina (Hiseq2000: capable of up to 600Gb per run) 1. SOLiD 5500xl System Applied Biosystems 2. HeliScope Single Molecule Sequencer - Helicos o Long reads (400-20,000 nt) 1. Genome Sequencer FLX System (454) Roche 2. PacBio RS - Pacific Bioscience 3. Personal Genome Machine, Ion Proton - Ion Torrent 4. GridION Oxford Nanopore

Preparation of Sequencing Library Library type Viral genomes Sequence recovery mrna* DNA/RNA +++, possible full-length Small RNA DNA/RNA +/++ DNA DNA +++, possible full-length DNA or RNA isolated from viruses DNA/RNA +++++, full-length *mrna purification may result in loss of sequences for viruses that lack polya tails

AGV assembled from different sequencing samples RNA isolated from gut RNA isolated from whole aphid with 2 rounds polya purification RNA isolated from whole aphid with 1 round polya purification Green: + strand; Red: - strand

NGS for Virus Discovery RNA/DNA/small RNA Reads Assembly DNA/RNA contigs Nucleotide database By BlastN Protein database By BlastX Host Genome Known viruses New viruses in known genera New viruses in new genera PCR RT-PCR Modified from Ding & Lu 2011 Curr. Opin. Virol. 1:533-544 Complete viral genomes

Assembly of Sequencing Reads -pre-processing of sequence data Remove potential adaptor / index sequences Check sequencing quality Quality score; GC content Read length distribution Overrepresented sequences etc. If necessary trim bases with low quality

Trimming of Bases with Low QS

Trimming of bases with low quality scores may result in loss of viral sequences NOTE: The near full-length genome of AGV was assembled from an untrimmed data set with poor quality scores. The genome could not be assembled from the data set following standard trimming.

Software for Checking Sequence Quality-FastQC

Overrepresented Sequences Sequence Count Percentage Possible Source AGATCGGAAGAG CACACGTCTGAAC TCCAGTCACCTTG TAATCTCGTATG 1968861 2.20 TruSeq Adapter, Index 12 (100% over 49bp) Sequence Count Percentage Possible Source CAGATTTCGGGCTAAAGGGAATACGGTTAAAATC CCGTGACCTGCCCTGT TCAGATTTCGGGCTAAAGGGAATACGGTTAAAATC CCGTGACCTGCCCTG 51018488 40.90 No Hit 24264170 19.45 No Hit The seqeunces were derived from Penaeus vannamei 18S ribosomal RNA -cotaminated in srna

Software for manipulating sequencing data

CLC Genomics Workbench (US$5,000 per copy, >US$1,000/per year for update)

Assembly of Sequencing Reads de novo assembly or mapping (alignment) -de novo assembly: searching for new viruses, no reference is needed -mapping: re-sequencing, SNP, isolate, need reference sequences (MARA, GATK and other toolkits) de novo assembly may provide extra information about known viral sequences Shrimp virus: Infectious myonecrosis virus (IMNV, a dsrna virus) - documented seq: 7560 bp (Poulos et al., JGV, 2006 87: 987-996) - de novo assembled from RNA-seq: 8233 bp RT-PCR proved IMNV should have at least 8233 bp Thursday 9:45 am 168 Virus 4 ; Duan Loy

Trinity for Assembly

Oases/Velvet for Assembly

Running the Assembly Program Two most important parameters for assembly K-mers (word length): length of sequence fragments used for joining C - coverage cut-off Different combinations of K and C will result in assembly of different contigs Multiple K and C should be tested for best results (Liu et al. PLoS One. 2012;7(9):e45161. doi: 10.1371/journal.pone)

Multiple K Test for Assembly of AGV using Oases/Velvet (Here) read = contig Green: + strand; Red: - strand

Data Analysis How do we find viral sequences? Annotation of contigs -search for viral genes using BLASTx or BLASTn BLAST against NCBI database BLAST using your own databases Blast2GO platform -annotation of contigs -motif search -analysis of annotation data

Data Analysis Analyzing virus-derived contigs Extract BLAST data (sequences with virus as top hit) Organize contigs that hit the same or similar viruses Join contigs into viral genome Design primers for PCR/RT-PCR to fill sequence gaps Sequence to confirm in silico cloning result 5 and 3 RACE to identify end sequences

Working with Viral Contigs viral gene == virus

Trinity Assembly of APV2 (>9800 nt) Assembled using srna isolated from pea aphid + strand 7815 8536 8523 8730 1 244 2295 2679 4486 2656 4084 5079 6839 7221 8874 9193 9476 9737 340 522 501 2315 - strand 4062 4508 5056 7310 6337 7539 6377 6759 7538 7840 9307 9495 7539 9737 7538 7310 7221 6839 6759 6377 6337 340 244 APV2-Acyrthosiphon pisum virus 2 (dicistrovrius)

Summary No single rule can be used to find a virus by NGS Knowledge of virology can greatly help for analyzing NGS data Manual alignment of virus derived sequences may be needed Biological evidence is required for verifying true nature of viral sequences discovered by NGS

John K. VanDyk Lyric Bartholomay Duan Loy Acknowledgements