BIO440 Genetics Laboratory

DNA sequencing

DNA sequencing is the process of determining the precise order of the nucleotide bases in a particular DNA molecule. In 1977, two methods of DNA sequencing were independently published. Maxam and Gilbert used a chemical cleavage protocol, while Fred Sanger designed a procedure similar to DNA replication. Gilbert and Sanger shared in the 1980 Nobel Prize in Chemistry, but Sanger's method became the standard because of its practicality. This was Sanger's second Nobel Prize; his first was for figuring out how to determine the sequence of amino acids in proteins.

The Sanger method involves creating DNA fragments terminated with dideoxynucleotides (ddNTPs). These ddNTPs lack a 3' OH on the deoxyribose, which prevents DNA polymerase from adding more nucleotides (this method is also called "chain termination sequencing"). Traditionally, this involved performing four separate reactions (one for each of the 4 bases). The DNA fragments generated from each dideoxy reaction were separated by gel electrophoresis under conditions that allow DNA molecules that differ in length by only one nucleotide to be resolved. To visualize the DNA, it was typically labeled with 32P or 35S. This method is depicted on the next page.

Today, several modifications of the Sanger method allow us to sequence DNA much faster. Instead of using radioactively labeled nucleotides, we now use fluorescent labels. Thus, we can now run all 4 ddNTP reactions in one lane of a gel instead of 4 separate lanes. Each fragment ending in a dideoxyA (ddA) is labeled with a red fluor, those ending in ddT are labeled with a yellow fluor, etc. Additionally, this lets us do all four dideoxy reactions in one tube simultaneously, because the different fluorescent dyes are attached only to the ddNTPs. Every fragment that gets terminated with a ddA is thus labeled with the red dye that is attached to that ddNTP.
This method is called fluorescent dye-terminator cycle sequencing, and it uses PCR to incorporate ddNTPs in a primer extension sequencing reaction. The PCR reaction consists of DNA template, primer, a special DNA polymerase, unlabeled dNTPs, fluorescently labeled ddNTPs, and buffer. When the PCR is complete, the reaction mix contains a population of fragments of different lengths, each terminating in a ddNTP that carries a fluorescent dye. Each of the four ddNTPs carries a different dye that emits at a characteristic wavelength, so the identity of the dye corresponds to the final base on that fragment. The entire reaction is run in a single lane on a polyacrylamide gel, so that the fragments separate according to size. The fragments run past a laser detector at the bottom of the gel, and the emission wavelength of each fragment is recorded. This is depicted below.
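The chain-termination logic described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template and primer are made-up examples (not the ones used in this lab), every possible termination product is assumed to be present, and the function names are hypothetical.

```python
# Minimal sketch of chain-termination sequencing (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_fragments(template_3to5, primer_5to3):
    """Return (length, terminal_base) for every chain-terminated fragment.

    The template is written 3'->5', so the new strand (written 5'->3')
    is simply its base-by-base complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    if not new_strand.startswith(primer_5to3):
        raise ValueError("primer does not anneal at the 3' end of the template")
    fragments = []
    # Each extension step past the primer can end in a ddNTP.
    for length in range(len(primer_5to3) + 1, len(new_strand) + 1):
        fragments.append((length, new_strand[length - 1]))
    return fragments

def read_sequence(fragments):
    """Read the gel from shortest to longest fragment (5' to 3')."""
    return "".join(base for _, base in sorted(fragments))

# Hypothetical example: template 3' TACGGATCCA 5', primer 5' ATGC 3'
frags = terminated_fragments("TACGGATCCA", "ATGC")
print(read_sequence(frags))  # prints CTAGGT
```

Sorting by fragment length plays the role of the gel: the shortest fragment passes the detector first, so the bases are read 5' to 3' along the newly synthesized strand.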

The sequence data is usually converted to a chromatogram form, and various software programs allow the rapid analysis of the .scf (standard chromatogram format) files of these chromatograms. A sample chromatogram is depicted below. As you can see, each of the bases emits at a different wavelength, and the chromatogram can be read from left to right (early to late, hence 5' to 3' relative to the newly synthesized strand).

Sequencing reaction for the LiCor DNA sequencer

The automated sequencer that we have lets us sequence a plasmid insert in two directions. We need to set up 4 PCR reactions, each of which contains a different ddNTP. In our PCR reactions, the forward primer and the reverse primer are each labeled with a different fluorescent dye. Thus, we can read the sequence of one strand in one direction and the sequence of the other strand in the other direction. This is called Simultaneous Bidirectional Sequencing.

Each PCR reaction is 6 µl in volume. The molar amount of template used is based on the size of the insert between the priming sites. This equals the size of the cloned insert + 100 bp. The table below gives guidelines:

Insert size (bp)    Amount of template desired
300-600             50-100 fmol
600-1200            125-225 fmol
1200-1800           250-300 fmol
>1800               300-500 fmol

The mass corresponding to fmol amounts of 500 bp and 1000 bp inserts is shown in the table below:

Template required   500 bp insert   1 kb insert
50 fmol             17 ng           33 ng
100 fmol            33 ng           66 ng
150 fmol            50 ng           100 ng
200 fmol            67 ng           135 ng
250 fmol            82 ng           165 ng
300 fmol            100 ng          200 ng
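The mass values in the second table can be reproduced, to within rounding, by a short calculation. The sketch below assumes an average mass of about 660 g/mol per base pair of double-stranded DNA; that constant and the function name are assumptions for illustration and are not stated in the handout.

```python
# Sketch: convert a molar amount of dsDNA (fmol) to mass (ng).
# ASSUMPTION: average mass of one base pair is ~660 g/mol.
AVG_BP_MASS = 660.0  # g/mol per base pair (assumed)

def fmol_to_ng(fmol, length_bp, bp_mass=AVG_BP_MASS):
    """Mass in ng of `fmol` femtomoles of dsDNA that is `length_bp` long."""
    grams = fmol * 1e-15 * length_bp * bp_mass  # mol x g/mol
    return grams * 1e9                          # g -> ng

for amount in (50, 100, 150, 200, 250, 300):
    print(amount, "fmol:",
          round(fmol_to_ng(amount, 500), 1), "ng (500 bp),",
          round(fmol_to_ng(amount, 1000), 1), "ng (1 kb)")
```

For inserts of other sizes, pass the appropriate length instead of interpolating between the two table columns.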

1. Add the following components to a 0.2 ml tube to prepare the template/primer mix for each template:

   dsDNA (your plasmid)                          ___ µl
   700 nm-emitting forward primer (1 pmol/µl)    1.5 µl
   800 nm-emitting reverse primer (1 pmol/µl)    1.5 µl
   Sterile distilled water                       ___ µl
   Total volume                                  13 µl

2. Label a set of 4 tubes A, C, G, and T. Add 3 µl of the A reagent to tube A, 3 µl of the T reagent to tube T, etc. This has been done for you.

3. Mix your plasmid template/primer mixture by gently pipetting up and down. Add 3 µl to each of the 4 tubes, using a new tip for each addition. After addition, mix your template/primer/reagent mixture by gently pipetting up and down twice.

4. Cap your set of 8 tubes (make sure caps are all the way down on each tube), and move to a thermal cycler at 4 °C.

5. Begin the PCR reaction. At the end of the PCR reaction, add 3 µl of loading dye/formamide stop solution to each reaction. Denature at 92 °C for 2 minutes, then chill on ice. Load the sequencing gel.

Observations and Analyses - DNA sequencing
Due 10/18/07
(Note: there is a second part of this observation and analysis that will be completed using software in class.)

Name:
Plasmid number:
A260 of 1:20 dilution of plasmid:
Concentration of undiluted plasmid DNA (ng/µl):
Desired molar amount of template (from table):
Volume of plasmid that gives desired mass: ___ µl
Volume of water to use in sequencing reaction (10 µl - vol. plasmid): ___ µl

Additional questions:

1. Examine the chromatograms of your sequenced plasmid. Describe the chromatograms. How does the quality of the two chromatograms change as you go from the beginning of the sequence to the end? What do these changes represent physically, i.e. why does the quality change?

2. What is the function of the primer in a set of 4 sequencing reactions?

3. What role does the gel play in the sequencing process?

4. The structure of AZT, which is used in treating HIV infections, is shown below. Based on what you know about DNA sequencing, how do you think AZT works to stop the spread of HIV?

5. Draw a gel below, with the bands that you would expect to see if you sequenced the following template:

   template  3' G A C T G A A G C T G A 5'
   primer    5' C T G A 3'
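The worksheet quantities (plasmid concentration, plasmid volume, and water volume) can be checked with a small calculation. The sketch below assumes the common rule of thumb that an A260 of 1.0 corresponds to about 50 ng/µl of double-stranded DNA; that factor, the function names, and the example numbers are assumptions for illustration only.

```python
# Sketch of the worksheet arithmetic (illustrative values only).
# ASSUMPTION: A260 of 1.0 ~ 50 ng/ul of double-stranded DNA.

def dsdna_conc_ng_per_ul(a260, dilution_factor=20, ng_per_ul_per_a260=50.0):
    """Concentration of the undiluted plasmid from the A260 of a dilution."""
    return a260 * dilution_factor * ng_per_ul_per_a260

def template_volumes(desired_ng, conc_ng_per_ul, available_ul=10.0):
    """Return (plasmid volume, water volume) in ul for the template/primer mix.

    The plasmid and water together make up 10 ul of the 13 ul mix;
    the remaining 3 ul are the two primers.
    """
    plasmid_ul = desired_ng / conc_ng_per_ul
    if plasmid_ul > available_ul:
        raise ValueError("plasmid is too dilute to fit in the reaction")
    return plasmid_ul, available_ul - plasmid_ul

# Hypothetical example: A260 = 0.10 for the 1:20 dilution, 66 ng desired
conc = dsdna_conc_ng_per_ul(0.10)
plasmid_ul, water_ul = template_volumes(66.0, conc)
print(conc, "ng/ul;", round(plasmid_ul, 2), "ul plasmid;", round(water_ul, 2), "ul water")
```

If the required plasmid volume exceeds 10 µl, the function raises an error rather than returning a negative water volume.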

BIO440 Fall 2007

DNA Sequencing Results Part 2

In this part of the project, we will start with raw data from the LiCor sequencer. We will clean up the raw data, and then determine whether or not our sequences really are 16S rRNA sequences. If they are 16S rRNA sequences, we will determine what kind of organisms they came from, and hence get an idea of the phylogenetic diversity of isolates from Boiling Springs Lake.

For the purposes of this project, you are to create an electronic copy of your analysis, and submit it via email to your beloved instructor. This should be a Word document entitled 'XXX(your initials)seqanalysis', e.g. MSWseqanalysis. There is a form on the class website to use for this - you can download it and then type into the spaces. There are italicized, bolded regions where you are to fill in your results.

This exercise should introduce you to the type of data that you will get and the kinds of analysis you will have to carry out in order to interpret your results. It will also introduce you to two different types of sequence analysis and manipulation software: Sequencher, an intuitive but unreasonably expensive program; and Staden, a powerful and free but non-intuitive package of programs. You will learn how to use the software on example files that are on your desktop, and then once you have obtained your own .scf files from the sequencer upstairs you will analyze your actual data.

Note: The protocols below are generalized protocols. Because this is real data (and a real research project), not all of the sequences will necessarily conform exactly to this process. You may need to be creative and try a few different approaches in order to get this to work. Remember that patience is a virtue.

Cleaning up the sequence

We are going to start with the raw data from the sequencer. This data consists of the sequence of both strands of our insert (and sometimes of the plasmid vector).

Summary. We want to:
-Open Sequencher, the sequence analysis software.
-Open up the forward and reverse sequence data files.
-Align the two sequences.

-Trim away vector sequences and any poorly sequenced regions at the ends. Note: This may actually be the most problematic part of the entire process.

Details.

-Open Sequencher, the sequence analysis software. The icon for this program is located in the pop-up menu. We will need to use demo mode for this process if we do it all at once (and you can't save your results in demo mode). This first time I just want you to understand how the software works, so use the demo version so that you know what to expect. Open Sequencher, start a new project, and under File, select import sequences. The files are in the folder on the desktop entitled BIO440 sequencing, which contains the chromatogram files from the sequencing gels. To see these files in Sequencher you will also need to change the "Files of Type" box at the bottom of the select screen from *.ABI to all.

-Open up the forward and reverse sequence data files for your first sequence. The files will end in '.ab1'. Files that you generate from the LiCor sequencer upstairs will have the ending '.scf'. Each group should clean up one of the sequences in this project to become familiar with the Sequencher software, and (if possible) save the final cleaned consensus sequence in a Word document. You will then compare your cleaned sequence to my version.

-Align the forward and reverse sequences. This is actually an alignment between one of the sequences and the reverse complement of the other sequence. This is done by highlighting the two sequences and selecting the Assemble automatically button. You should get a screen that has a Contig [0001] icon. Select this icon. If you don't get the icon, go to Assembly parameters, slide the minimum match percentage bar a little to the left, and then try again. When you select the contig, you should get a diagram displaying the alignment. Take a look at the diagram and see if it makes sense. If it does, then select Bases for a more detailed view of the alignment.
At the top of the screen, the alignment of the two sequences will be displayed. At the bottom of the screen, the consensus sequence will be displayed. Where the two sequences are in perfect agreement, the consensus sequence is unmarked. Where there is a discrepancy between the two sequences, the consensus sequence is marked with an asterisk. There may be many discrepancies at the beginning and the end of the consensus sequence, because these regions represent the very ends (poor quality) of one or the other of the sequencing reactions. Scroll through the sequence to verify this. Where one reaction is of the highest quality, the other reaction is of the lowest quality.

Go to the middle region of the sequence, where there are few asterisks. Using your cursor, select a base on the consensus sequence that is NOT marked with an asterisk. Then select Show chromatograms to see the chromatograms representing the raw sequence data. The two chromatograms should agree well. By changing the base that is selected in the consensus sequence, you can examine how the chromatograms change in the different parts of the sequence. How does the quality of the two chromatograms change as you go from the beginning of the consensus sequence to the end? What do these changes represent physically, i.e. why does the quality change?

Now you want to trim away the poorly sequenced regions at the ends of the sequence. These are the regions with numerous asterisks. (Note that in these regions, one of the two sequences is probably of very high quality [don't trim], and the other of very low quality [get rid of].) To do this, use the cursor to highlight the poor-quality data, and delete it. If Sequencher asks you if you want to 'fill from left (or right)', select yes. Use the chromatograms to try to resolve any discrepancies in the remaining portion of the sequence. If you can, try to obtain at least 900 bp of consensus sequence.

-Trim away vector sequences. To do this, we will search for the primer sequences, i.e. the primers that you used for the original 16S amplification.
The primers are degenerate; that is, there are some places where there is more than one nucleotide in the 'conserved' target site. For example, the forward primer is 5' CCTACGGGRSGCAGCAG 3', where R stands for a purine (A or G) and S stands for "strongly H-bonding" (C or G). I realize that this is a somewhat clunky approach - can you think of a better way to do this?
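One possible way to make the primer search less clunky is to let a script handle the degenerate positions. The sketch below uses only the Python standard library and the standard IUPAC nucleotide codes; the function names and the short test sequence are hypothetical, and this is not a feature of Sequencher or Staden.

```python
# Sketch: working with degenerate primers using IUPAC nucleotide codes.
import re
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles the degenerate codes (R<->Y, K<->M, ...).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]

def reverse_complement(seq):
    """Reverse complement; the result is again written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def primer_regex(primer):
    """Turn a degenerate primer into a regular expression pattern."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)

def find_primer(primer, sequence):
    """Return (start, end) of the first primer match, or None if absent."""
    match = re.search(primer_regex(primer), sequence.upper())
    return (match.start(), match.end()) if match else None

# Hypothetical consensus fragment containing one variant of the 341F site
print(find_primer("CCTACGGGRSGCAGCAG", "TTTCCTACGGGAGGCAGCAGTAAA"))  # (3, 20)
```

Because a primer can appear in either orientation in the consensus sequence, it is worth searching for both the primer and its reverse complement before trimming.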

All of the strains were produced using 341F as the forward primer. Some had 1525R as the reverse primer and others had 1391R as the reverse primer.

Primer sequences = 1391R
   5' GACGGGCGGTGTGTGC 3'
   5' GACGGGCGGTGTGTAC 3'
In opposite orientation, this =
   5' GCACACACCGCCCGTC 3'
   5' GTACACACCGCCCGTC 3'

Primer sequences = 341F
   5' CCT ACG GGR SGC AGC AG 3'
Note that this primer is a mixture of 4 slightly different primers. What are the 4 primer sequences? What is the reverse complement of these sequences?

After trimming these sequences away, paste the final cleaned consensus sequence into your document.

Analyzing sequence data using the Staden program

Using Staden

We can use the Staden programs to process sequence data instead of the Sequencher program. The advantages of using Staden are that it is free, the vector clipping function is easy and works, and you can use it on computers at home. Also, it's more powerful. However, Staden is not as user-friendly as Sequencher - it is somewhat clunky and takes getting used to.

The Staden Package has 5 programs:
   PreGap4 - for removing vector sequences & aligning different sequence files
   Gap4 - for studying aligned sequences
   Trev - trace viewer
   Spin - a set of functions including looking for restriction sites, translating nt to aa, and some alignment functions
   Console - I don't yet know what this does

To clip vector sequences and remove poor quality sequencing data from a raw sequence

To start: You need a folder on the desktop that has the sequences you want to analyze and the file pgemt.txt. This file is a text file (not a Word file) with the sequence of the pGEM-T vector. The sequences you want to analyze can be in .ab1 format or .scf. You need to know the name of this folder, e.g. My BIO440 seqs.

Using the Start prompt, select Programs > Staden Package > Pregap4.

This should open a program that gives you a window with a big blank screen with three folder tabs up top. These tabs have the names Files to Process, Configure Modules, and Textual output.

Under Configure Modules, set up the general configuration by making sure that the following boxes are checked (it may be a good idea to uncheck the other boxes):
   Estimate base accuracies
   Trace format conversion
   Initialize experiment files
   Augment experiment files
   Quality clip
   Screen for unclipped vector
   Cloning vector clip

With Cloning vector clip, you want to deselect it and then reselect it. When it is reselected, on the right hand side of the window you will be prompted to select a cloning vector. Using the browse button, select the pgemt.txt file. Then select save these parameters.

Gap4 Shotgun Assembly. After selecting Gap4 shotgun assembly, the program that will align your forward and reverse sequences, you need to select create new database, and where it says GAP4 database name, type Align. When looking at your actual sequence data, you might type your strain name (e.g. 341F17) instead of Align. Then select save these parameters.

Under Files to Process, select Add files. A new window opens. Change Files of type ABI (*.ab1) to Files of type Any (*.*). In that window, select Desktop from the icons on the left hand side, and then select your folder (e.g. My 440 Seqs). Then highlight the files you want to analyze and select open, or double-click. For example, you might have the files sample1.ab1, sample2.ab1, HSU1.scf, and HSU2.scf.

Then, click run. Some stuff should happen, and you should see the phrase Processing finished.

Using Trev to view vector clip, quality clip, and to edit your sequence

Now go to the desktop and look in your folder (My seqs). Pregap4 has created some new files. You are interested in the .exp files. There should be a .exp file for each of the sequences that you analyzed. To look at one, double-click the .exp file, and it will open in Trev.
You may have to open Trev first, then open your file. The sequence should be color-coded: the crosshatched area is bad sequence, the pink area is vector, the light grey area is good sequence data, and the dark grey area is not-as-good sequence data. You might not see any vector on the practice files, but should see some on your experimental files. Under View, select display edits. Under Edit, select sequence. This will cause a new line of sequence data to appear: the edit line. Go to the right hand side of the vector, and using the mouse select the rightmost nt of the vector. Use the delete (not backspace) key to remove the vector sequence and/or poor quality sequence.

Then, go to the far right of the sequence file. Using the quality of the chromatogram as a guide, you can delete the poor quality region of the sequence file. Note that you are editing the .exp file and not the original .ab1 or .scf file. Then, save the .exp file under the File menu. Finally, select File > save as > Plain text and give it a name that corresponds to your sequence followed by the extension .txt, for example sample1.txt. You should now have a text file that contains your sequence with the vector and poor quality sequence data trimmed. Note that this is not your aligned sequence, but only your sequence in one direction. To look at your forward and reverse aligned sequences, we will use the program Gap4.

Using Gap4 to examine your aligned sequences

On the desktop, select Start > Staden Package > Gap4. From the Gap4 window, select File > Open. In the 440 sequencing folder you have been using, there should be a file with the name Align.0.aux. Open this document. Select Edit > Edit contig. Then select OK. The aligned bases should open in a new window. Select Settings > Trace Display > Embed Traces, size 5. Select Settings > Highlight disagreements > by background color. Now, edit your sequence. I will leave the details of this up to your discretion and experimentation. Once your editing is complete, go to the left hand side of the consensus sequence and select the first nt. Then, holding down the Shift key, go to the right hand side and select the right-most nt. Copy the selected sequence, and paste it into your Word document.

Finding the closest match in GenBank, and aligning the sequences

Once you have your new, edited sequence, first run a blastn search on the nucleotide sequence to verify that it is a 16S sequence. Use the discontiguous megablast program at http://www.ncbi.nlm.nih.gov/ What is the closest match in GenBank?
What is the GenBank accession number? What are the corresponding nucleotides in the GenBank sequence? (This information can be obtained from looking at the alignments in the blast output.) If the closest match isn't from a cultured organism (for a cultured organism, genus and species will be named), then what is the closest match to a cultured organism? What is its GenBank accession number? Does this appear to be a 16S sequence? List the publication information (i.e. authors, title, and journal/date, if any). Also, list the information giving taxonomic details of the organism in the entry. What is the percent identity with the closest match, and with the closest cultured organism? Across how many nucleotides?
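BLAST reports percent identity for you, but it can help to see what the number means. The sketch below computes it for two sequences that have already been aligned (equal length, with '-' for gaps). Counting gap columns as mismatches and dividing by the alignment length are choices made for this illustration, and the function name is hypothetical.

```python
# Sketch: percent identity between two already-aligned sequences.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned columns compared)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column of two gaps carries no information
        compared += 1
        if a == b:
            matches += 1  # a base opposite a gap is counted as a mismatch
    if compared == 0:
        raise ValueError("nothing to compare")
    return 100.0 * matches / compared, compared

# Hypothetical 9-column alignment with one gap and one mismatch
identity, columns = percent_identity("ACGT-ACGT", "ACGTTACGA")
print(round(identity, 1), "% identity across", columns, "columns")
```

Reporting the number of columns alongside the percentage answers the "across how many nucleotides" part of the question: a 99% match over 100 nt is much weaker evidence than a 99% match over 900 nt.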

Next, use the forward and reverse sequences alone (not the aligned forward and reverse sequence) and carry out discontiguous megablast searches on each of these two sequences. Did all three searches (forward, reverse, and aligned) yield the same result? Describe your results.

Analysis in the Ribosomal Database Project

The RDP is at: http://rdp.cme.msu.edu/index.jsp

We are going to perform a 'Sequence Match' analysis with the small subunit sequences in the database. This allows us to find the sequences most similar to our new sequence. Paste the sequence into the provided space, leave the default settings as they are, and select submit sequences. See the 'Seq. Match Info' to interpret your results. In your own words, compare the output and utility of the blastn and the RDP analysis programs.

Next, we will create an alignment with similar 16S sequences. At RDPII, go to the 'Online Analyses' page, and use the 'Sequence Aligner' function. Click on run, cut and paste your sequence into the space provided, choose HTML format as output, and include 10 sequences. Leave the other defaults as they are. Examine the results. Is this a good alignment? Were gaps inserted? Were identities to other organisms apparent? Did the sequence match up to other sequences in the database? How closely? What do your results indicate?

Next, use the 'Classifier' program at the RDP to assign a sequence to the taxonomical hierarchy at the RDP. Interpret the output. Print out the results or paste them into your report.

Write a brief description of the organism that your sequence is likely to have come from. Do you think that it is likely this organism was isolated from Boiling Springs Lake, or do you think that this organism may represent a contaminant that was introduced during the isolation procedure?