SMRT Analysis v2.2.0 Overview 100 338 400 01
1. SMRT Analysis v2.2.0
1.1 SMRT Analysis v2.2.0 Overview Welcome to Pacific Biosciences' SMRT Analysis v2.2.0 Overview.
1.2 Contents This module will introduce the new key features and benefits in SMRT Analysis v2.2.0. We will discuss SMRT Portal Enhancements and then describe upgrade requirements. Finally, we will show additional resources available for your reference.
1.3 Key Features & Benefits SMRT Analysis v2.2.0 Key Features and Benefits
1.4 New & Improved Application Analysis SMRT Analysis v2.2.0 software enables PacBio users to perform a wider range of scientific analyses for their research questions. The software now has improved functionality for the study of transcriptomes, haplotypes and phasing with long amplicons, and minor variants in genomic samples. In addition, we have incorporated significant speed improvements to PacBio's Hierarchical Genome Assembly Process (HGAP). Finally, we have made workflow enhancements to SMRT Portal, our graphical user interface.
1.5 Feature Enhancements SMRT Analysis v2.2.0 features a number of incremental advancements over previous releases:
- For HGAP, notable improvements include a 10x improvement in assembly time for microbial genomes, as well as inclusion of the diploid version of Quiver (the consensus-calling algorithm).
- The Long-Amplicon Analysis module now includes support for filtering chimeric sequences.
- The Minor-Variant algorithm has been updated to include a more sophisticated variant caller that can detect minor variants at frequencies as low as 0.5%.
- The transcriptome-analysis module (Iso-Seq application) now includes full-length transcript QC and clustering steps, in addition to mapping to a reference sequence.
1.6 Iso Seq Module SMRT Analysis v2.2.0 Iso-Seq Module
1.7 Iso Seq Method Full Length Transcripts The workflow for the Iso-Seq protocol is shown in this slide: full-length cDNA synthesis from poly(A) RNA using the Clontech SMARTer PCR cDNA Synthesis Kit, followed by size selection and conversion to SMRTbell libraries. Full-length cDNA libraries can be produced from as little as 1 ng of poly(A) RNA, or 2 ng total RNA. Following PacBio sequencing, the Iso-Seq software protocol in SMRT Analysis determines which transcripts are sequenced to full length. A full-length assignment is made on the basis of detecting the 5'-end primer, 5' UTR, ORF, 3' UTR, poly(A) tail, and 3' primer in a single read. Similar reads can be clustered together to generate a consensus, non-redundant isoform transcript to be used for evidence-based gene models.
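The full-length rule described above can be illustrated with a toy check. This is a minimal sketch only, not the Iso-Seq classifier: the function name, the placeholder primer sequences, and the minimum poly(A) length are all assumptions made for illustration. It uses exact string matching on one strand and checks only the two primers and the poly(A) tail, whereas real software must tolerate sequencing errors and handle both orientations.

```python
# Hypothetical placeholder primers; these are NOT the primers of any real kit.
FIVE_PRIME = "ACGTACGT"
THREE_PRIME = "TGCATGCA"


def is_full_length(read, five_prime, three_prime, min_poly_a=8):
    """Toy full-length test for a single read.

    A read passes when it starts with the 5' primer, ends with the
    3' primer, and has a run of at least `min_poly_a` A bases
    immediately before the 3' primer.
    """
    if not read.startswith(five_prime) or not read.endswith(three_prime):
        return False
    # Strip both primers and look for the poly(A) tail at the end of the insert.
    insert = read[len(five_prime):len(read) - len(three_prime)]
    return insert.endswith("A" * min_poly_a)


# A made-up transcript body with and without a poly(A) tail.
with_tail = FIVE_PRIME + "GGGCCCTTT" + "A" * 10 + THREE_PRIME
without_tail = FIVE_PRIME + "GGGCCCTTT" + THREE_PRIME

print(is_full_length(with_tail, FIVE_PRIME, THREE_PRIME))     # True
print(is_full_length(without_tail, FIVE_PRIME, THREE_PRIME))  # False
```

Reads that fail such a check would fall into the non-full-length category rather than being discarded outright.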
1.8 QC Visualization of Full Length Transcripts The Iso-Seq module provides QC metrics that might be of interest to the user such as the number of reads of insert, mean read length, quality, and number of passes. The classify output feature displays a histogram of the distribution of read length as well as a breakdown of full-length reads, non-full-length reads, and filtered short reads. Also available is a cluster output that provides a summary of the length distribution of the non-redundant consensus sequence for isoform reads.
1.9 Long Amplicon Analysis SMRT Analysis v2.2.0 Long-Amplicon Analysis
1.10 Process for Long Amplicon Analysis Long-amplicon analysis has been updated with the implementation of chimera filtering. The workflow involves the following steps:
1. De-multiplexing of reads by barcode
2. Finding overlaps between reads. Reads belonging to the same gene will cluster together.
3. Fine-tuning those clusters at the SNP level
4. Recursively partitioning to determine different phases
5. Applying Quiver to determine haplotypes
6. Post-processing filters such as chimera filtering
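Step 1 of the workflow above, de-multiplexing by barcode, can be sketched as follows. This is a simplified illustration under stated assumptions, not the Long-Amplicon Analysis implementation: the function name, barcode names, and barcode sequences are hypothetical, and barcodes are matched exactly at the start of each read.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    `barcodes` maps a barcode name to its sequence. The barcode is
    trimmed from each assigned read; reads matching no barcode are
    collected under the key "unassigned".
    """
    groups = defaultdict(list)
    for read in reads:
        for name, seq in barcodes.items():
            if read.startswith(seq):
                groups[name].append(read[len(seq):])
                break
        else:
            # No barcode matched this read.
            groups["unassigned"].append(read)
    return dict(groups)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"bc1": "AAAA", "bc2": "CCCC"}
reads = ["AAAAGGT", "CCCCTTG", "GGGGAAA"]
print(demultiplex(reads, barcodes))
# {'bc1': ['GGT'], 'bc2': ['TTG'], 'unassigned': ['GGGGAAA']}
```

Each per-barcode group would then be passed on to the overlap and clustering steps.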
1.11 Minor Variant Analysis SMRT Analysis 2.2.0 Minor-Variant Analysis
1.12 Detection of Minor Variants The minor variant module calls minor variants in a heterogeneous dataset that is mapped against a user-provided reference sequence. A major feature of the current version of the algorithm is that it learns an error model for the user's data. The algorithm works by aligning Reads of Insert sequences to the reference sequence. It then exhaustively scores alignments of query sequences to the reference using a fixed window. The output consists of variant calls in VCF and CSV formats. The algorithm is tuned for sensitivity, with >99% sensitivity and >98% specificity. It can detect variants down to 0.5% abundance, data permitting.
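For reference, the sensitivity and specificity figures quoted above follow the standard definitions, which can be written out as a short sketch. The counts in the example are invented purely to show the arithmetic and are not PacBio benchmark data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-variant positions correctly left uncalled: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
print(sensitivity(true_pos=995, false_neg=5))     # 0.995
print(specificity(true_neg=9850, false_pos=150))  # 0.985
```

A caller tuned for sensitivity accepts somewhat more false positives in exchange for missing fewer true variants.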
1.13 Minor Variant Protocol Shown here is an example output from the Minor Variant protocol. The protocol is launched through SMRT Portal. A set of diagnostic plots is shown, and a VCF file with the frequency of each detected mutation can be exported from the "Analysis" sub-tab. Subsequent releases will support phasing analysis.
1.14 HGAP SMRT Analysis v2.2.0 HGAP
1.15 Updated HGAP
- HGAP 2 has transitioned out of beta and contains bug fixes and algorithm improvements over HGAP 1. Microbial genome assembly now runs in less than one day.
- HGAP 3 (beta) is optimized for speed, with up to 10-times shorter wall-clock time for microbial genomes.
- Command-line HGAP using smrtpipe.py supports assembly of larger genomes.
1.16 HGAP: Marked Speed Improvements While Maintaining Comparable Assembly Quality As illustrated in the graph above, HGAP 3 shows significant improvement in speed compared to previous releases. The correction phase of the software was significantly improved from HGAP 1. The beta version, HGAP 3, incorporates further improvements in the draft phase to drive down assembly time.
1.17 Summary of Protocol Changes Shown here is a summary of protocol changes in SMRT Analysis v2.2.0. For the pre-assembly step, the pbdagcon algorithm drives much of the speed improvement in HGAP. Nomenclature changes regarding the Reads of Insert protocol now reflect a more sophisticated error model (Quiver) that works on subreads for consensus calling. Barcoding functionality has been added to the RS filter module, and Bridgemapper has been transitioned out of beta status.
1.18 SMRT Portal Improvements SMRT Analysis 2.2.0 SMRT Portal Improvements
1.19 Dialog Boxes We have added new functionality to SMRT Portal that groups protocols by application area, thereby simplifying navigation. The user is presented with this dialog on the Design-Job page and can opt out of seeing it during future logins.
1.20 N50 Metric We are also implementing the N50 metric, which is a more informative measure of the read-length distribution. The N50 read length is the length at which 50% of the sequenced bases are in reads of that length or longer.
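The N50 definition above can be made concrete with a short calculation. This is a minimal sketch of the standard N50 computation, not SMRT Portal's code; the function name and the example read lengths are illustrative assumptions.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest while their lengths are
    summed. N50 is the length of the read at which the running sum
    first reaches half of the total number of bases.
    """
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Five hypothetical reads totalling 20 bases. Reads of length >= 5
# contribute 6 + 5 = 11 bases, which is at least half of the total.
print(n50([2, 3, 4, 5, 6]))  # 5
```

In this example the mean read length is 4, while the N50 is 5: longer reads carry more bases and therefore weigh more heavily in the N50.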
1.21 Updated Quiver A new Quiver consensus-calling training is implemented in SMRT Analysis v2.2.0. With this improved training, consensus accuracy with P5-C3 improves by 5%. However, this new training does not affect P4-C2 chemistry. This new training was tested on three 20 kb libraries (E. coli, S. aureus, and R. palustris).
1.22 Upgrade Requirements SMRT Analysis v2.2.0 Software Availability and Upgrade Requirements
1.23 Software Availability SMRT Analysis v2.2.0 will be available from pacbiodevnet.com. This is an upgrade of secondary-analysis software only. The user will not experience instrument downtime during the upgrade. Install documentation and release notes are available, as well as FAQs.
1.24 Upgrade Requirements
- Upgrade is supported from SMRT Analysis v2.1.1 only. Users cannot skip upgrade levels.
- Supported operating systems: Ubuntu 12.04, 10.04, 8.04; CentOS 6.3, 5.6, 5.3
- Bash, Linux Standard Base (LSB)
- No MySQL or Perl dependencies
- SMRT Analysis v2.2.0 is backwards compatible with previous versions.
- Links to documentation are on PacBio DevNet: pacbiodevnet.com
- Bundled MySQL server; no external MySQL needed
- Single OS installation tarball
- Patch tarball concept introduced: a very small self-extracting tarball that can be used to work around bugs found in the field (beta or production)
  - Replaces manual fixes and edits by the customer
  - Can be run as part of the installer or upgrader command line (i.e., it can patch the installer/updater itself)
  - Can also be run after an install or upgrade has completed (to fix an existing install)
  - Similar to the patch tarball used for SMRT Analysis v2.1.1
1.25 Additional Resources Additional Resources
1.26 Resources Here are some additional resources, grouped by topic area.
1.27 Resources Some more resources for your interest.
1.28 Thank You Thank you for your participation. For more information, please contact your local PacBio Field Applications Scientist or PacBio Account Representative. www.pacificbiosciences.com