An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome

Similar documents
Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

TruSeq Custom Amplicon v1.5

Data Analysis for Ion Torrent Sequencing

Recombinant DNA & Genetic Engineering. Tools for Genetic Manipulation

SEQUENCING. From Sample to Sequence-Ready

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Welcome to Pacific Biosciences' Introduction to SMRTbell Template Preparation.

Vector NTI Advance 11 Quick Start Guide

Real-time PCR: Understanding C t

Automated Nucleic Acid Extraction WorkStation Faster, cleaner,

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

Introduction to NGS data analysis

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

FOR REFERENCE PURPOSES

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Gene Expression Assays

360 Master Mix. , and a supplementary 360 GC Enhancer.

Next Generation Sequencing

2.500 Threshold e Threshold. Exponential phase. Cycle Number

Bioanalyzer Applications for

How To Use An Enzymatics Spark Dna Sample Prep Kit For Ion Torrent

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

How is genome sequencing done?

Whole genome Bisulfite Sequencing for Methylation Analysis Preparing Samples for the Illumina Sequencing Platform

Illumina Sequencing Technology

PreciseTM Whitepaper

DNA Integrity Number (DIN) For the Assessment of Genomic DNA Samples in Real-Time Quantitative PCR (qpcr) Experiments

Introduction To Real Time Quantitative PCR (qpcr)

PATHOGEN DETECTION SYSTEMS BY REAL TIME PCR. Results Interpretation Guide

DNA Technology Mapping a plasmid digesting How do restriction enzymes work?

Preparing Samples for Sequencing Genomic DNA

Objectives: Vocabulary:

Automated Lab Management for Illumina SeqLab

PicoMaxx High Fidelity PCR System

Paired-End Sample Preparation Guide

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Methylation Analysis Using Methylation-Sensitive HRM and DNA Sequencing

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

July 7th 2009 DNA sequencing

Rapid Aneuploidy and CNV Detection in Single Cells using the MiSeq System

Non-invasive prenatal detection of chromosome aneuploidies using next generation sequencing: First steps towards clinical application

Use of the Agilent 2100 Bioanalyzer and the DNA 500 LabChip in the Analysis of PCR Amplified Mitochondrial DNA Application

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools.

Terra PCR Direct Polymerase Mix User Manual

Content Sheet 7-1: Overview of Quality Control for Quantitative Tests

EZ Load Molecular Rulers. Catalog Numbers bp bp bp PCR bp kb Precision Mass

Forensic DNA Testing Terminology

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial

50 g 650 L. *Average yields will vary depending upon a number of factors including type of phage, growth conditions used and developmental stage.

G E N OM I C S S E RV I C ES

Application Guide... 2

CCR Biology - Chapter 9 Practice Test - Summer 2012

BacReady TM Multiplex PCR System

restriction enzymes 350 Home R. Ward: Spring 2001

Amazing DNA facts. Hands-on DNA: A Question of Taste Amazing facts and quiz questions

The RNAi Consortium (TRC) Broad Institute

JIANGSU CARTMAY INDUSTRIAL CO.,LTD mail:

E. coli plasmid and gene profiling using Next Generation Sequencing

Fast. Integrated Genome Browser & DAS. Easy. Flexible. Free. bioviz.org/igb

Introduction to next-generation sequencing data

GRS Plasmid Purification Kit Transfection Grade GK (2 MaxiPreps)

HENIPAVIRUS ANTIBODY ESCAPE SEQUENCING REPORT

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

AP Physics 1 and 2 Lab Investigations

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

Interaktionen von Nukleinsäuren und Proteinen

Welcome to this presentation on Thermal Characteristics of LEDs, part of OSRAM Opto Semiconductors LED Fundamentals series.

An Overview of Cells and Cell Research

ABSTRACT. Promega Corporation, Updated September Campbell-Staton, S.

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Software Getting Started Guide

TIANquick Mini Purification Kit

Expression Quantification (I)

HighPure Maxi Plasmid Kit

Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company

Mitochondrial DNA Analysis

DNA Sequencing Overview

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

Improved methods for site-directed mutagenesis using Gibson Assembly TM Master Mix

Descriptive Statistics

Reduced Representation Bisulfite Sequencing for Methylation Analysis Preparing Samples for the Illumina Sequencing Platform

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

Next Generation Sequencing: Technology, Mapping, and Analysis

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

FOR TEACHERS ONLY. The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION LIVING ENVIRONMENT

Real-Time PCR Vs. Traditional PCR

DNA and Forensic Science

CHROMOSOMES Dr. Fern Tsien, Dept. of Genetics, LSUHSC, NO, LA

For information regarding shipping specifications, please refer to the following link:

DNA Electrophoresis Lesson Plan

MLX BCG Buccal Cell Genomic DNA Extraction Kit. Performance Characteristics

Formalin fixation at low temperature better preserves nucleic acid integrity. Gianni Bussolati. University of Turin

99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, cm

CAMI Education linked to CAPS: Mathematics

Transcription:

APPLICATION Genomics CATEGORY Nucleic Acid Shearing ORGANIZATION Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada CHALLENGE Overall performance of DNA nebulization did not meet expectations and included DNA smears The lab s shift to sonication techniques solved some problems, but resulted in others, including a similar smear Failure to achieve adequate shearing and coverage through the use of sonication became prohibitively expensive for the laboratory SOLUTION After acquiring a Covaris S2 System with Adaptive Focused Acoustics (AFA) for DNA shearing, immediate improvements in range of fragment size and precision of output fragment size were observed Gel smears of large to small fragments of DNA from nebulization and sonication now became tight bands of DNA Sample bias experienced using the previous methods was eliminated, enabling an additional 20% of the Arabidopsis genome (that had been previously missed) to be readily sequenced. Covaris www.covarisinc.com INTRODUCTION Professor David Guttman is from the Centre for the Analysis of Genome Evolution and Function (CAGEF) at the University of Toronto. His laboratory uses whole genome sequencing and a range of genomic approaches to study the evolution of host specificity and virulence in plant pathogenic bacteria. FRAGMENTATION BIAS The failure to get effective shearing and coverage through the use of sonication was becoming prohibitively expensive. An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome During the early stages of Next Generation Sequencing, nebulization was the standard and accepted approach for DNA shearing. According to Pauline Wang (CAGEF Laboratory Facility Manager), their group initially employed nebulization to fragment the Arabidopsis thaliana plant genome for sequencing. Over time, they began to suspect there was bias in genome coverage using this approach, largely because of their inability to reproducibly shear samples. The overall performance of nebulization did not meet expectations. 1, 2 For example, CAGEF observed a smear of fragment size ranges when they ran output DNA gels. As a result, a purification step needed to be added to the DNA preparation process to achieve the desired narrow fragment size range. This additional step, combined with the large sample losses typically experienced with nebulization, resulted in alarmingly low yields. Very little of the desired DNA fragment size remained, following this sample prep process. Low DNA yields (as well as added processing times resulting from this problem) have been observed and reported by other labs performing Next-Gen or whole genome sequencing 1,2. Due to their concerns about the potential for bias and low yields, the CAGEF lab switched An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome Case Study 104 PN400104 1

FIGURE 1: Histogram showing the number of reads mapped at each position across the chloroplast genome of Arabidopsis thaliana summed over 1 Kb intervals. High variability in the read depth or coverage can be seen for the sample prepared using sonication (blue bars) while relatively uniform and consistent coverage is seen for the sample prepared using Covaris (orange line). Approximately 14 million mapped reads from each method were obtained from paired end, 38 base sequencing performed on an Illumina GAIIx. Total number of reads or read depth summed in 1Kb intervals is shown on the Y-axis. Position in base pairs along the chloroplast genome is shown on the X-axis. The Arabidopsis sequence was obtained from the TAIR9 release and mapped using the BWA aligner. Results were processed with SAMtools and plotted using R. to sonication using the Bioruptor (Diagenode, Inc.). The Bioruptor sonicator was initially viewed as an affordable upgrade from nebulization, with the promise of improved shearing performance. Its technology is similar to other ordinary laboratory bath sonicators, in that it uses relatively long wavelengths and unfocused acoustic energy. THE SONICATION EXPERIENCE CAGEF s shift to sonication brought its own set of challenges. Test shears needed to be performed to try to determine the best settings to achieve the desired DNA, and often the resulting output contained a smear with a range of large to small fragments. While some size selectivity could be achieved by changing the shearing conditions, gaps in coverage (due to shearing biases) were observed. It remained difficult for CAGEF to attain fragments of the desired size. At a cost of approximately $3,000 per Arabidopsis genome analyzed, any failure to achieve adequate and reproducible shearing and coverage through the use of conventional sonication became prohibitively expensive and prompted a study of the Covaris AFA system. FIGURE 1 ANALYSIS There are several factors which can affect sequence depth and rate of coverage such as PCR amplification, GC/AT percentage, copy number, and shearing efficiency. PCR Amplification: PCR amplification effectiveness can be impacted by GC content and repeats which tend not to amplify as well as other parts of the genome. While the GAIIx An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome Case Study 104 PN400104 2

FIGURE 2: Histogram showing the overall distribution of read depths for each position of the chloroplast genome of Arabidopsis thaliana. Sonication shows an exponential distribution skewed towards low coverage. The amount of mapped reads is shown above the graph. Reads were obtained from paired-end, 38 base sequencing performed on an Illumina GAIIx. The frequency of read depth is shown on the Y-axis. The read depth or total number of reads contributing to base calls for each position of the chloroplast is shown on the X-axis. The Arabidopsis sequence was obtained from the TAIR9 release and mapped using the BWA aligner. Results were processed with SAMtools and plotted using R. library prep does involve amplification, the methodology has been optimized to not allow amplification to reach the exponential phase in order to avoid just such biases from occurring, making this an unlikely cause of the problem. GC/AT Percentage: Higher GC content requires a higher melting temperature. Prevalence of GC rich regions of the genome could result in gaps and lower coverage. However, when CAGEF plotted content against the gaps in coverage, no relationship was seen. Copy Number: The Chloroplast genome has a bacterial origin and is highly conserved, so it will lack copy number variations seen in higher organisms. Therefore, Copy Number Variation (CNV) is the least likely explanation for the problem. Shearing Efficiency: Of all potential factors, lack of shearing efficiency is the most likely cause of the sequencing coverage problems experienced by CAGEF. This was demonstrated when identical sequencing experiments (performed using DNA fragmented with Covaris AFA technology) were compared to samples processed with the sonicator. As seen in Figure 2 below, under identical conditions the Covaris DNA shows little or no skewing in this experiment. FIGURE 2 COVARIS An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome Case Study 104 PN400104 3

FIGURE 3: Histograms showing the overall distribution of read depths for each position along chromosome 4 of the Arabidopsis thaliana genome (the same procedure that was applied to FIGURE 2 was applied here). FIGURE 3 The sample prepared using sonication shows an exponential distribution skewed towards low coverage (top graph) while the sample prepared using Covaris shows a normal distribution of read depths (lower graph). The same trends hold as were seen in Figure 2, but for a much larger genomic region. The Arabidopsis sequence was obtained from the TAIR9 release and mapped using the BWA aligner. Results were processed with SAMtools and plotted using R. Approximately 16 million reads were mapped for each method. Reads were obtained from paired-end, 38 base sequencing performed on an Illumina GAIIx. The frequency of each read depth is shown on the Y-axis. The read depth or total number of reads contributing to base calls for each position across chromosome 4 is shown on the X-axis. Note the almost perfect Gaussian distribution of read depth/frequency for the Covaris treated samples. COVARIS, THE ULTIMATE DNA SHEARING SOLUTION In June of 2010, the CAGEF laboratory acquired a Covaris S2 System, which uses Adaptive Focused Acoustics (AFA) for DNA shearing. With its highly controlled shearing conditions, immediate improvements were observed in both the fragment size range and the precision of output fragment size achieved. What was previously seen on a gel as a smear of large to small DNA fragments became a tight band of DNA. Most significantly, the bias that had been observed with previous methods had been eradicated with AFA. SUMMARY Considering all factors, CAGEF concluded that Covaris with AFA technology made the difference in coverage. Using the Covaris S2, CAGEF estimates that an additional 20% of the Arabidopsis genome that had been previously missed was now being readily and routinely sequenced (described in Figure 4 next page). An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome Case Study 104 PN400104 4

FIGURE 4: Histograms showing the number of positions across chromosome 4 of Arabidopsis thaliana that have no reads mapped to them. FIGURE 4 Each bin corresponds to the number of bases with no reads mapped across a 62Kb stretch, indicated by the frequency shown on the Y-axis. The basepair position along chromosome 4 is shown on the X-axis. The total number of basepairs with no reads mapped (Total gaps) is shown for each method. In this experiment there are many more gaps present in the sequence data obtained from the sample using sonicator (top graph) versus Covaris (lower graph). The Arabidopsis sequence was obtained from the TAIR9 release and mapped using the BWA aligner. Results were processed with SAMtools and plotted using R. Approximately 16 million reads were mapped for each method. Reads were obtained from paired-end, 38 base sequencing performed on an Illumina GAIIx. The Ultimate DNA Shearing Solution: Because Sample Prep Matters How it Works: The Covaris Adaptive Focused Acoustics (AFA) technology is used for a variety of sample prep processes and purposes, from DNA Shearing to Tissue Homogenization. For DNA Shearing, the AFA technology is the Gold Standard due to its technological advantages over other sample prep methods, such as sonication and nebulization. Key advantages of AFA technology include reproducibility, versatility (DNA output across a wide size range), uniformity in fragment size distribution and isothermal, non-contact methodology. Highly controlled AFA energy from Covaris delivers unsurpassed shearing consistency and reproducibility. Covaris, Inc. 14 Gill Street, Unit H Woburn, Massachusetts 01801-1721 USA Web: www.covarisinc.com Tel: +1 781-932-3959 Fax: +1 781-932-8705 info@covarisinc.com ACKNOWLEDGEMENT Our thanks for data submitted to us by: Wang, PW, Austin, RS & DS Guttman, University of Toronto Centre for the Analysis of Genome Evolution & Function, unpublished data 1. Quail, M, et al, A Large Genome Center s Improvements to the Illumina Sequencing System, Nat. Meth., Vol 5, #12, Dec 2008 2. Fisher et al. Genome Biology 2011, 12:R1, A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries Covaris, Adaptive Focused Acoustics, and AFA are registered trademarks of Covaris, Inc. Bioruptor is a registered trademark of Diagenode, Inc. Illumina and GAIIx are registered trademarks of Illumina, Inc. An Effect of a Shearing Process on the Re-Sequencing of the Arabidopsis thaliana Genome Case Study 104 PN400104 5