4.2.1. What is a contig? 4.2.2. What are the contig assembly programs?




Table of Contents

4.1. DNA Sequencing
    4.1.1. Trace Viewer in GCG SeqLab
        Table. Box.
        Select the editor mode in the SeqLab main window. Import sequencer trace files from the File menu.
        Select the trace files in ABI or SCF format.
        Open the traces window to display the traces.
        The trace viewer window and the editor window.
        IUB standard nucleotide codes and their implementation in GCG/Staden.
        Online resources
4.2. Contig Assembly
    4.2.1. What is a contig?
    4.2.2. What are the contig assembly programs?
    4.2.3. Workflow of contig assembly in GCG
    4.2.4. Strategy for correct contig assembly
        Box. Workflow of contig assembly in GCG
        Online resources
References

Simon Lin, M.D.
Duke Bioinformatics
919-681-9646
http://www.canctr.mc.duke.edu/bioinformatics
Version: 10-18-1999

4.1 Introduction of DNA Sequencing

The ABI sequencer (PE-Applied Biosystems, Foster City, CA) is one of the automatic DNA sequencers routinely used in molecular biology labs. Automated sequencing of this kind relies on the Sanger method of DNA replication with dideoxy chain termination. The ABI sequencer uses a scanning laser to detect the fluorescence-labeled products as they electrophorese through a denaturing polyacrylamide gel. The signals collected are plotted as a series of colored peaks representing the nucleotide sequence; this plot is called a chromatogram or electropherogram.

The process of transforming the chromatogram into a sequence is referred to as base calling. Usually, base calling is done at the sequencing facility. However, under some special circumstances, such as SNP detection, you might want to use specialized programs to process the chromatogram yourself. Phred (a base caller) and PolyPhred (which detects SNPs from the traces) are such programs.

Usually, the results from the sequencing facility contain two kinds of files: the sequences and the chromatograms. The sequence file is in text format and can easily be viewed and analyzed in any bioinformatics program, whereas a trace viewer is needed to view the chromatogram in ABI format.

An example of the data produced by an automated sequencer. The peaks, in a different color for each base, are read directly from left to right to determine the sequence.

4.1.1. Trace Viewer in GCG SeqLab

To use the trace viewer in SeqLab, you need an X-windows emulator. A free X-windows emulator can be obtained from the Duke OIT software library.

Select the editor mode in the SeqLab main window. Import sequencer trace files from the File menu.

Select the trace files in ABI or SCF format. In this example, 1012721 and 1012722 are trace files from an ABI sequencer, whereas 1012721.seq and 1012722.seq are the corresponding nucleotide sequence files. Click OK to import the sequence trace into the editor. Repeat this process until all the trace files you want are imported, then click Cancel to exit the file selection box.

Open the traces window to display the traces. From the Windows menu, choose Traces. Then the trace viewer window should appear.

The trace viewer window and the editor window.

IUB/GCG   Meaning             Complement   Staden/Sanger
A         A                   T            A
C         C                   G            C
G         G                   C            G
T or U    T                   A            T
M         A or C              K            M
R         A or G              Y            R
W         A or T              W            W
S         C or G              S            S
Y         C or T              R            Y
K         G or T              M            K
V         A or C or G         B            V
H         A or C or T         D            H
D         A or G or T         H            D
B         C or G or T         V            B
X/N       G or A or T or C    X/N          N
. or ~    gap character       ./~          -

Table. IUB standard nucleotide codes and their implementation in GCG / Staden.
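The Complement column of the table is what a program needs in order to reverse-complement a fragment that contains ambiguity codes. The following is a minimal Python sketch of that lookup, not code from GCG or the Staden package. The names IUB_COMPLEMENT and reverse_complement are hypothetical, and mapping U to A and preserving lowercase letters are assumptions of the sketch.

```python
# Complement of each IUB/GCG code, taken from the Complement column of the
# table above. Treating U like T is an assumption of this sketch.
IUB_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "N", ".": ".", "~": "~",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written in IUB codes."""
    out = []
    for base in reversed(seq):
        # An unknown symbol raises KeyError rather than being passed through.
        comp = IUB_COMPLEMENT[base.upper()]
        # Keep the case of the input letter; gap characters have no case.
        if base.isalpha() and base.islower():
            comp = comp.lower()
        out.append(comp)
    return "".join(out)


if __name__ == "__main__":
    print(reverse_complement("AAMRG"))
```

Because W and S are their own complements, and M/K, R/Y, V/B and H/D swap in pairs, applying the function twice returns the original sequence.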

Online Reference

Chapter 3. Working with the Trace Viewer. GCG SeqLab Guide
http://www.canctr.mc.duke.edu/bioinformatics/gcg_documents/seqlab/03-working_with_trace_viewer.pdf

X-Windows emulator from the Duke OIT software library.
http://www.oit.duke.edu/site/html/body_micro_x-win32.html

4.2. Introduction of Contig Assembly

Contig assembly is a critical step in genome sequencing projects: it puts the jigsaw pieces of fragmented sequences together. As the shotgun sequencing strategy has been adopted in many sequencing projects, contig assembly has become an area of more active research. Although you might not work on a large sequencing project, your knowledge of bioinformatics would not be complete without the basic concepts of sequence assembly.

4.2.1. What is a contig?

Contig stands for contiguous sequence. The term was first used by Staden (1980). Dr. Roger Staden is a pioneer in the study of fragment assembly, and his work remains the basis of most sequence assembly programs. A contig is a collection of overlapping fragments; it includes an assembled consensus sequence for the entire group and the information for each individual sequence fragment. A contig assembly program detects the overlaps among many small sequence fragments and forms a longer, contiguous consensus sequence.

4.2.2. Contig Assembly Programs

TIGR Assembler
Has been used in a number of megabase microbial genome projects at TIGR (Sutton G., White, O., Adams, M., and Kerlavage, A. (1995) TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science & Technology 1:9-19).
http://www.tigr.org/softlab/

Chapter V. DNA Sequencing and Contig Assembly

Phred/Phrap/Consed
Includes a base caller, an assembler and an X-windows graphical interface, from Phil Green's laboratory at the University of Washington.
http://bozeman.genome.washington.edu/index.html

Staden Package
Complete package for sequencing, mutation detection, and sequence management, with an X-windows interface.
http://www.mrc-lmb.cam.ac.uk/pubseq/

(If you need any of the software above, please contact Simon Lin at 919-681-9646.)

GCG Package
Fragment Assembly System (FAS) in GCG.
http://www.canctr.mc.duke.edu/bioinformatics/gcg_documents/gcg10_help_unix/contents/gelintroduction.html

4.2.3. Workflow of contig assembly in GCG

Box. Workflow of contig assembly in GCG. Components shown: GelStart, GelEnter, FAS Database (/archive, /working, /consensus, /relation), GelMerge, GelAssemble, GelDisassemble, GelView.
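The core idea behind any automatic assembler, as described in section 4.2.1, is to detect overlaps between fragments and join them into a longer contiguous sequence. The following is a minimal Python sketch of that idea using a greedy strategy. It is an illustration under strong assumptions, not the algorithm used by GelMerge, Phrap or TIGR Assembler: it requires exact suffix-to-prefix matches, ignores sequencing errors, reverse-complemented fragments and quality values, and the function names and the min_overlap parameter are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap: int = 3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to join
        # Join the two sequences, writing the overlapping bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Fragments that share no sufficiently long overlap are returned as separate contigs, which mirrors the situation in a real project where several contigs remain until more fragments are entered.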

After gelstart creates the fragment assembly project, gelenter enters the fragments into the FAS database. Gelmerge is the automatic contig assembler. Gelassemble is the post-assembly editing tool, used to resolve by manual inspection the ambiguities and conflicts left by the automatic process. Gelview generates an overall view of the current status of the assembly project.

Use common sense when you edit the contigs. Three kinds of errors can be corrected by manual editing: uncertainties, substitution errors, and frame shift errors. A frame shift error is the most serious, since it completely changes the deduced protein sequence from the position of the error onward. This kind of error can usually be corrected by inspecting the chromatograms of the aligned fragments.

To get the consensus sequence of the contig, go to the command mode of gelassemble. The command prettyout writes an aligned output of the fragments and the consensus, similar to that of the GCG program pretty. If only the consensus sequence is needed, the command seqout can be used to generate the output.

The program geldisassemble breaks up all assembled contigs in the current project and rebuilds a database consisting of the unjoined fragments.

Although the details might vary, this workflow in GCG is generally applicable to all contig assembly software.

4.2.4. Hints and Common Mistakes

You must run gelstart once to create or delete the sequence assembly project. You must also run gelstart once every time you wish to work on the project.

GCG uses a fragment assembly system (FAS) database to handle each assembly project. All files and directories under the project directory are in FAS database format. Do not add or delete files in these directories yourself; doing so will corrupt the FAS database. Use gelenter to add more fragments, and gelmerge, gelassemble, or geldisassemble to modify the file contents. Remember, do not manipulate any file in the database with a UNIX text editor!

You can manually resolve the discrepancies and correct the assembly errors by using gelassemble. You can also correct base-calling errors in the fragments if you have a graphical printout of the ABI traces in hand.

References

Staden, R. 1980. A new computer method for the storage and manipulation of DNA gel reading data. Nucleic Acids Res. 8: 3673-3694.

Sutton, G., White, O., Adams, M., and Kerlavage, A. 1995. TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science & Technology 1: 9-19.