Detecting DNA Base Modifications

SMRT Analysis of Microbial Methylomes

Background

Microbial genomes contain a variety of base modifications, most commonly methylation of adenine or cytosine residues. These modifications typically arise from restriction-modification (RM) systems, which serve as a defense mechanism in microbes, protecting the cell from invading bacteriophages and other foreign DNA [1]. In general, an RM system comprises a restriction enzyme (endonuclease) and a methylase that recognize the same sequence motif. The endonuclease recognizes and cleaves the motif in order to degrade foreign DNA. The bacterium's own DNA is protected from degradation because the same motif has been methylated, which prevents cleavage by the restriction endonuclease. DNA modifications are also known to control other biological processes in bacteria, such as the cell cycle and DNA replication, mismatch repair, gene expression, and pathogenicity [2,3].

Full characterization of methylomes has been challenging because the motifs targeted for modification are extremely variable within and across bacterial species, and most species contain more than one RM system [4]. This is further complicated by bacterial conjugation, which allows horizontal transfer of mobile genetic elements between bacteria [5,6]. With SMRT Sequencing, it is now possible to discover novel modification sites on a genome-wide scale in a particular bacterial strain [7]. SMRT Sequencing can also be used to determine the modification sites of particular methyltransferases [8].

In this document, we outline methods for base modification detection in microbial genomes using the SMRT Analysis software. We provide guidelines for sample preparation, sequencing, and downstream analyses, including the detection of modifications and subsequent motif analysis.
The document assumes an understanding of base modification detection fundamentals, as described in the Pacific Biosciences White Paper - Detecting DNA Base Modifications Using SMRT Sequencing. Throughout this technical note we provide examples of each step. The data used for the examples is available on DevNet. Open the data set home page labeled Normal E. coli, then download the files Technote E coli Native Raw Reads HDF and E coli K12 MG1655 Mutated (FASTA). Import the SMRT Cell data from the Raw Reads HDF file into SMRT Portal, and import the FASTA file as a new reference sequence (see the SMRT Portal Online Help or the Secondary Analysis Web Services API for more information). You will then be ready to follow the examples in this document. New data sets will be posted to the web site over time and may also be used to practice the analysis process.

Methods

When designing an experiment to detect base modifications on the PacBio RS, the number and types of SMRTbell libraries and the number of SMRT Cells are dictated by: (1) the amount of input DNA available; (2) the method used to calculate interpulse duration (IPD) ratios; (3) the particular modification you are analyzing; (4) the size of the genome; and (5) whether Tet conversion is used to magnify the signature of 5-methylcytosine. These factors underlie the current coverage recommendations. For a description of basic terminology, see the Pacific Biosciences White Paper - Detecting DNA Base Modifications Using SMRT Sequencing.

Workflow: Experiment Design, Isolate DNA, Template Preparation, Sequencing, Analysis.
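The coverage guidance in points 3 to 5 below can be turned into a rough planning calculation. The Python sketch below is illustrative only: it assumes coverage at each position is Poisson-distributed, that ~100X total coverage splits evenly into ~50X per strand, that each SMRT Cell yields about 100 Mb of mapped sequence (as in the E. coli example), and that 5-mC is detected through Tet conversion. All function names are hypothetical and are not part of SMRT Analysis.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), accumulated term by term."""
    term = math.exp(-mean)  # P(X = 0); adequate for the coverages discussed here
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) computed from P(X = i - 1)
        total += term
    return total


def fraction_below(threshold: int, mean_per_strand: float) -> float:
    """Expected fraction of positions with fewer than `threshold` reads on one strand."""
    return poisson_cdf(threshold - 1, mean_per_strand)


def smrt_cells_needed(genome_mb: float, total_coverage: float,
                      mapped_mb_per_cell: float = 100.0) -> int:
    """Round up the number of SMRT Cells needed to reach an average total coverage."""
    return math.ceil(genome_mb * total_coverage / mapped_mb_per_cell)


def samples_to_run(mods: set[str]) -> set[str]:
    """Samples to sequence, following the rules in point 5.

    Assumes 5-mC is detected via Tet conversion rather than as native 5-mC at 250X.
    """
    need_tet = "5-mC" in mods
    # Tet has off-target effects on 4-mC, so 4-mC always needs the native sample;
    # 6-mA on its own is assumed to need only the native sample.
    need_native = "4-mC" in mods or not need_tet
    samples = set()
    if need_native:
        samples.add("native")
    if need_tet:
        samples.add("tet")
    return samples


print(smrt_cells_needed(5, 100))               # 5 Mb genome at 100X total
print(fraction_below(25, 50))                  # ~50X per strand on average
print(fraction_below(25, 25))                  # only 25X per strand on average
print(sorted(samples_to_run({"6-mA", "4-mC", "5-mC"})))
```

Under these assumptions the 5 Mb E. coli genome at 100X works out to five SMRT Cells, within the 4-6 range quoted below, and well under 0.1% of positions are expected to fall below 25X per strand; an average of only 25X per strand would leave nearly half of the positions under the threshold.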

1. The amount of input DNA available. Adequate amounts of native (i.e., unamplified) DNA are required for methylation detection, because amplification effectively erases base modifications. When working with a limited sample, consider the coverage needed for the modification of interest. Smaller inserts require less starting DNA to generate adequate sequence coverage for kinetic analysis. However, they are not adequate for de novo assembly unless they are used in conjunction with a long-insert library.

2. The method used to calculate IPD ratios. Kinetics-based base modification analysis relies on normalizing the kinetic values for sequence context. Currently, the primary metric used for this analysis is the interpulse duration (IPD), which corresponds to the time required for a new base to bind in the active site of the sequencing polymerase after the previous base has been incorporated. We normalize by dividing the IPD in the sample of interest by the IPD of a control; the result is the IPD ratio.

The default analysis mode, and the focus of this document, uses a computational model of polymerase kinetics, called the in silico control, to calculate IPD ratios. Internal studies at PacBio have shown that the IPD for a particular base incorporation depends on a sequence context spanning approximately 12 bases, which matches the binding footprint of the DNA polymerase. Currently, SMRT Analysis supports modification identification only with the in silico control. When using the in silico control, detection accuracy may be increased by activating modification identification in SMRT Analysis. This analysis compares the modification signal to an additional computational model (of the expected positive signature) for three modification types: 6-mA, 4-mC, and Tet-converted 5-mC.

Alternatively, IPD ratios may be calculated using an amplified control. This control is created by separately sequencing an amplified version of the genomic sample of interest, in which base modifications have been erased by the amplification process. The amplified control requires an extra sequencing library, but it can produce a lower background (statistical noise) than the in silico control, provided the amplified control library has more than 80X coverage per strand. This type of control may be useful when studying modifications other than 6-mA, 4-mC, and 5-mC. However, it is not currently compatible with the modification identification feature of SMRT Analysis.

Finally, IPD ratios may be calculated by comparing two different samples. In this case, two native DNA samples are sequenced separately and compared with each other to detect differential modifications arising from different growth conditions or from strain-to-strain differences. With this approach, modifications shared between the two samples go undetected. To perform a differential analysis of identified modifications and motifs, it is easiest to run a separate analysis with the in silico control for each condition or strain and then compare the two outputs. A native DNA control helps locate regions of differential modification, but it is not currently compatible with modification identification in SMRT Analysis.

3. The particular modification you are analyzing. Coverage requirements vary with modification type because their kinetic signatures differ. For example, N6-methyladenine (6-mA) and N4-methylcytosine (4-mC) produce strong kinetic signals, whereas the signal from native 5-methylcytosine (5-mC) is weaker, so detecting native 5-mC reliably requires higher coverage. However, by using a Tet enzyme to oxidize 5-mC to 5-carboxylcytosine (5-caC), the modification signal is enhanced to a level comparable to that of 6-mA and 4-mC [9].

For example, reliable detection of 6-mA, 4-mC, or Tet-converted 5-mC requires approximately 25X coverage per strand. Because native 5-mC has a smaller and more dispersed kinetic signature, at least 10-fold higher coverage (250X per strand) is recommended in that case. More information on the relationship between coverage, modification type, and call accuracy can be found in the Pacific Biosciences White Paper - Detection and Identification of Base Modifications with Single Molecule Real-Time Sequencing Data. Because SMRT Sequencing coverage across a genome closely follows the expected Poisson distribution, we recommend targeting an average of ~100X total coverage so that the lowest-covered regions of the genome still meet the threshold of 25X coverage per strand. Note that if an amplified control is used, the coverage requirements apply to both libraries (i.e., the native DNA sample and the amplified DNA control sample should each be sequenced to an average total coverage of 100X).

4. The size of the genome. The genome size, the SMRTbell library insert size, and the coverage required to detect the modification(s) of interest directly determine the number of SMRT Cells required. As an example, the 5 Mb E. coli genome requires 4-6 SMRT Cells for reliable detection of 6-mA, 4-mC, or Tet-converted 5-mC, assuming each SMRT Cell yields approximately 100 Mb of mapped sequence (5 Mb × 100X = 500 Mb, or about five SMRT Cells).

5. Whether you are using Tet conversion to identify 5-mC. It has recently been shown that Tet converts 5-mC to 5-caC [9]. In SMRT Sequencing, this conversion amplifies the kinetic signal and reduces the coverage required to detect 5-mC to a level similar to that for 6-mA or 4-mC, as noted in point 3 above. It is important to note that the current protocols for Tet conversion of 5-mC have off-target effects on 4-mC in some sequence contexts.
Therefore, to detect 4-mC and 5-mC simultaneously, a native sample must be run separately from a Tet-converted sample. If only 6-mA and 4-mC are of interest, only the native sample needs to be run. If only 6-mA and 5-mC are of interest, only the Tet-converted sample needs to be run.

Sample Preparation and PacBio RS Run Parameters

Base modification detection requires that libraries be constructed from native genomic DNA. If the in silico control will be used, data must be generated using Sequencing Kit 2.0 (C2 chemistry). If an analysis will be performed with an amplified control or native control, both samples must be sequenced using the same sequencing kit version. If Tet1 conversion is performed to identify 5-mC sites, see the Shared Protocol - Guidelines for Using WiseGene 5-mC Tet1 Oxidation Kit for SMRT Sequencing on SampleNet. This conversion should be done before SMRTbell library preparation.

There are no additional requirements for preparing SMRTbell libraries for base modification detection. Multi-molecule analysis is adequate for motif analysis (i.e., there is no need to generate high Circular Consensus Sequence (CCS) coverage when analyzing bacterial methylomes). Sample preparation and PacBio RS run parameters will be influenced by other elements of the experimental design, including whether a de novo or resequencing workflow is used.

SMRTbell Library Preparation

If a de novo assembly (i.e., creation of a new genome reference sequence) of the microbe is generated as part of the same experiment, libraries with longer inserts (at least 10 kb) are recommended to support accurate assembly. For more information, see the Pacific Biosciences Technical Note - De Novo and Hybrid Assembly. For a resequencing experiment (i.e., alignment to a known reference), the key requirements are that the reads be long enough to map accurately to the reference sequence, and that enough DNA is available to construct libraries with inserts of the chosen size. DNA damage repair does not affect DNA modifications such as 6-mA, 4-mC, 5-hmC, and 5-mC. For instructions on preparing SMRTbell libraries, see the Pacific Biosciences Guide - Template Preparation and Sequencing.

When an experiment is designed to use an amplified control for IPD ratio analysis, a separate SMRTbell library must be prepared from whole-genome amplified (WGA) genomic material using a third-party kit (e.g., REPLI-g from QIAGEN). This amplified control library serves as the baseline for kinetic analyses and is run separately from the test library.

PacBio RS Run Parameters

Long inserts (greater than 3 kb) should be sequenced using a 1x90 minute movie protocol. Shorter inserts (less than 3 kb) should be sequenced using a 2x45 minute movie protocol. Longer inserts may also be sequenced using a 2x45 minute movie protocol to increase the data per SMRT Cell.

Example: To examine adenine methylation across the 5 Mb genome of E. coli strain K-12 substrain MG1655, we conducted a resequencing experiment. We created a 1 kb SMRTbell library and sequenced it with C2 chemistry using a 2x45 minute movie protocol. Targeting 100X coverage, we sequenced five SMRT Cells and achieved a mean coverage of 162X. The data generated is the example data set available on DevNet, as described above. To prepare 1 kb libraries, see the Pacific Biosciences Procedure & Checklist - 1 kb Template Preparation and Sequencing.

Methylome Analysis

Methylation and motif analysis is done directly in SMRT Portal v1.3.3 using a new protocol called RS_Modification_and_Motif_Analysis.1, which performs the following steps:
1. Aligns SMRT Sequencing subreads to a reference genome, producing a cmp.h5 alignment file.
2. Detects variants, producing a variant GFF track and a VCF track.
3. Stores data in the same format regardless of which type of normalization control (amplified, native, or in silico) was used to calculate the IPD ratios.
4. Generates the modifications.csv and modifications.gff files. These files contain statistics on the polymerase kinetics during sequencing at every position in the genomic sample. Positions with high IPD ratios in these files represent putative modification sites. Note that some modifications produce high IPD ratios at multiple sites within the 12-base polymerase footprint.
5. Analyzes the recurring sequence contexts of modifications across the genome and creates a summary report of these modified motifs in motif_summary.csv.
6. Identifies the locations of 6-mA, 4-mC, and Tet-converted 5-mC (using the information from step 4) and combines them with the motif information determined in step 5. If modification identification is turned on, secondary kinetic variation events are removed from the modifications.gff file. The analysis also outputs the combined modification and motif information in a motifs.gff file. For example, a secondary +5 peak for a 6-mA is removed so that only the single identified base is represented. For Tet-converted 5-mC, the two off-site peaks are removed and replaced by the single correct site of the modification. This clean-up occurs in both the modifications.gff and motifs.gff files.

Note that the RS_Modification_and_Motif_Analysis.1 protocol requires a reference assembly, which must be uploaded to SMRT Portal before setting up the job. For experiments comprising both de novo assembly and base modification detection, the assembly must be generated first and then used as the reference. Be aware that low-quality regions of an assembly, or variable sequence regions in a known reference, can produce apparent modification calls, because the true sequence context of the base calls will not match the context expected by the in silico model. Low-quality or highly variable reference sequence typically results in poor mapping of reads to the reference and difficulty detecting base modifications.
Regions of low Map QV are a strong indicator that reads have been mapped to a repeat region.

If using an amplified control for IPD ratio analysis, a separate SMRT Portal job must be run to align the unmodified DNA sequence to the reference. This alignment may be performed with any protocol that performs resequencing alignment, but the same reference sequence must be used for both the amplified control job and the base modification analysis job.

Setting Up the Job

After selecting SMRT Cells for analysis in the Design Job tab of SMRT Portal:
1. Select RS_Modification_and_Motif_Analysis.1 from the Protocol drop-down menu and click the button.
Figure 1. RS_Modification_Detection.1 Protocol Selection
2. Select the appropriate (previously uploaded) reference from the Reference drop-down menu (see Figure 2). For instructions on uploading a reference, see the SMRT Portal help section Managing reference sequences within SMRT Portal.
Figure 2. Reference Selection
3. Select Postprocessing and then, if an amplified control or native control was run, enter the Job ID of the control alignment in the Control Job ID box (see Figure 3). Note that modification identification does not currently support the use of an amplified or native control. If a Control Job ID is entered and Identify Modifications is enabled, the software ignores the Identify Modifications check box. If using the in silico control, leave the Control Job ID field blank.
Figure 3. Entering the Control Job ID
4. If using Tet-converted DNA, select Sample is Tet treated / Identify M5C Tet Modifications to identify 5-mC.

Example: To analyze 6-mA in E. coli, we used the default in silico IPD normalization with an edited version of the MG1655 clone of the K-12 strain as the reference (named ecoli_mutated and available with the sequence data files). These edits were made to correct for variants between the E. coli strain used and the available reference sequence.

Output Files

The RS_Modification_and_Motif_Analysis.1 protocol generates two reports and four data files. They are available as compressed gzip (.gz) files and can be downloaded from the Job Details page in SMRT Portal (DATA section). The two reports are called 1) Modifications and 2) Motifs. The Modifications report indicates which bases have a modification via two graphics (see Figure 4).
Figure 4. Modifications Report

The Modification QV vs. Coverage scatterplot shows modified bases as distinct clouds, which have a higher-than-expected modification QV at a given coverage. In the example figure above, the red adenines are distinctly separate. The Modification QV Histogram gives a similar indication of adenine methylation: the red line for adenine is again distinct from the lines for the other three bases. Note that these data do not incorporate the identification of specific modification types, so kinetic signatures that spread over multiple bases cause several clouds or lines to diverge from the background. This is visible above in the bulge of higher modification QVs for the C, G, and T bases, caused by the +5 secondary peak present in many 6-mA contexts. The same effect arises with 5-mC, for example, because its strongest kinetic signals occur two and six bases from the modified site. For 5-mC, a distinct separation of the C base will not necessarily appear, because there is little signal at the modified site itself; the largest signal is two bases away. Both graphics display single-site modification QVs and incorporate only kinetic information, not modification identity.

The Motifs report is a summary table of the motifs and is described in more detail in the Performing Motif Analysis section below. See Figure 13 below.

The four data files are as follows:

The modifications.csv file is a comma-separated values (CSV) file (see Table 1 and Table 3) containing a statistical analysis of each position in the reference. It is intended to support additional follow-up analysis at every genomic position. Note that when the subreads are analyzed, all IPDs in a subread are normalized by the mean IPD of that subread, which accounts for read-to-read variation in IPDs. This file is also produced when motif analysis is not active, for example with the RS_Modification_Detection analysis protocol.

The modifications.gff file is a General Feature Format (GFF) file (see Table 2), a text format designed for graphical sequence viewers. It is used for motif analysis and for modification visualization in SMRT View. It includes sequence contexts only for putative modification sites, defined as positions with p-values of 0.01 or less, which indicates that the IPD ratio at the position differs significantly from the expected background. This file is also produced when motif analysis is not active. With modification identification turned on, secondary kinetic variation events are removed from this file; for example, if a 6-mA with a secondary +5 signal is identified, the secondary signal is removed.

The motif_summary.csv file contains the genome-wide summary of the methyltransferase recognition motifs discovered in the sample.

The motifs.gff file is similar to modifications.gff but is produced only when motif analysis is active. It contains information about all sites detected as modified, all locations of each discovered motif, and the overlap between the modifications and the motifs. With modification identification turned on, secondary peaks are removed as in modifications.gff.

Many genome viewers can open the GFF files, although a small edit may be needed to make sure the reference sequence identifiers match the sequence identifiers in the GFF. SMRT View, covered in the next section, has been enhanced to take advantage of specific features of the files produced by this analysis protocol.
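Because modifications.csv reports every position, most follow-up work starts by filtering it. The Python sketch below opens a downloaded .gz file and keeps positions at or above the default score threshold of 20 from Table 1, matching column names case-insensitively (the exact capitalization in the file is an assumption). It also includes a simplified version of the per-subread IPD normalization described above; the pipeline's actual procedure may differ in detail. The function names are hypothetical.

```python
import csv
import gzip
from collections.abc import Iterable, Iterator
from typing import TextIO


def open_output(path: str) -> TextIO:
    """Open a downloaded output file, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def significant_sites(lines: Iterable[str], min_score: float = 20.0) -> Iterator[dict]:
    """Yield modifications.csv rows whose score is at least `min_score`.

    Column names are lower-cased so that, e.g., "ipdRatio" and "ipdratio"
    are treated the same.
    """
    for row in csv.DictReader(lines):
        row = {key.strip().lower(): value for key, value in row.items() if key is not None}
        if float(row["score"]) >= min_score:
            yield row


def normalize_subread_ipds(ipds: list[float]) -> list[float]:
    """Divide every IPD in one subread by that subread's mean IPD."""
    mean_ipd = sum(ipds) / len(ipds)
    return [ipd / mean_ipd for ipd in ipds]
```

For example, passing the handle returned by open_output("modifications.csv.gz") (an illustrative file name) to significant_sites yields one dictionary per retained position, keyed by lower-case column names such as tpl, strand, base, and ipdratio.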

Table 1: Fields included in the modifications.csv file when using the in silico control.

Column | Description
refid | Reference sequence tag for this observation. Same as Seqid in the .gff file. This is an internal identifier for a reference FASTA sequence. A mapping of this ID to the labels in the FASTA file is contained in the reference.info.xml file stored with the reference on the SMRT Portal server.
Tpl | Template position, starting at 1.
Strand | Native sample strand where kinetics were generated. 0 is the strand of the original FASTA and 1 is its reverse complement. In the .gff file these are marked "+" and "-", respectively.
Base | The letter representing this base: A, C, T, or G.
Score | -10 log10(p-value) score for the detection of this event, analogous to a Phred quality score. A value of 20 is the default minimum threshold for this file and corresponds to a p-value of 0.01. A score of 30 corresponds to a p-value of 0.001.
tmean | Capped mean of the IPDs observed at this position. Capped means that outlier data points are removed, which reduces the impact of random polymerase pausing events. Numerator of the IPD ratio.
terr | Capped standard error of the IPDs observed at this position (standard deviation / sqrt(coverage)).
ModelPrediction | Normalized mean IPD predicted by the in silico control model for this sequence context. Denominator of the IPD ratio.
ipdratio | tmean / ModelPrediction.
Coverage | Count of valid IPDs at this position (see the Filtering section for details).

Table 2: Fields included in the modifications.gff file.

Column | Description
Seqid | Reference tag (e.g., ref00001). Same as refid in the .csv file.
Source | Name of tool: "kinmodcall".
Type | Modification type. The generic tag "modified_base" is used for unidentified bases; for identified bases, m6a, m4c, and m5c are used.
Start | Location of modification.
End | Location of modification.
Score | -10 log10(p-value) score for the detection of this event, analogous to a Phred quality score. A value of 20 is the default minimum threshold for this file and corresponds to a p-value of 0.01. A score of 30 corresponds to a p-value of 0.001.
Strand | Native sample strand where kinetics were generated. "+" is the strand of the original FASTA and "-" is its reverse complement. In the .csv file these are marked "0" and "1", respectively.
Phase | Not applicable.
Attributes | Contains extra fields: IPDRatio is the traditional IPD ratio; context is the reference sequence from -20 bp to +20 bp around the modification, with the base at this location as the 21st character; coverage is the sequencing coverage at that position. Context is always written in 5' to 3' orientation of the template strand.
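Several fields in Tables 1 and 2 are related by simple formulas. The helper functions below are a hypothetical Python sketch of those relationships, not code from SMRT Analysis: a score maps back to its p-value (20 to 0.01, 30 to 0.001), the in silico IPD ratio is tmean divided by ModelPrediction, strand 0/1 in the CSV corresponds to +/- in the GFF, and the modified base is the 21st character of the 41-base context.

```python
import math

# Strand codes: modifications.csv uses 0/1, the GFF files use +/-.
CSV_TO_GFF_STRAND = {"0": "+", "1": "-"}


def score_to_pvalue(score: float) -> float:
    """Invert score = -10 * log10(p-value)."""
    return 10 ** (-score / 10)


def pvalue_to_score(p_value: float) -> float:
    """Phred-style score for a given p-value."""
    return -10 * math.log10(p_value)


def ipd_ratio(t_mean: float, model_prediction: float) -> float:
    """In silico control IPD ratio: capped mean IPD / model-predicted IPD."""
    return t_mean / model_prediction


def center_base(context: str) -> str:
    """Base at the modification site: the 21st character of the 41-base context."""
    if len(context) != 41:
        raise ValueError(f"expected 41 bases of context, got {len(context)}")
    return context[20]
```

Applied to the 41-base context in the motifs.gff example (Table 4), center_base returns the A of GATC, the adenine whose methylation the row reports.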

Table 3: Fields included in the modifications.csv file when using an amplified control or native control.

Column | Description
refid | Reference sequence tag for this observation. Same as Seqid in the GFF file. This is an internal identifier for a reference FASTA sequence. A mapping of this ID to the labels in the FASTA file is contained in the reference.info.xml file stored with the reference on the SMRT Portal server.
Tpl | Template position, starting at 1.
Strand | Native sample strand where kinetics were generated. 0 is the strand of the original FASTA and 1 is its reverse complement. In the .gff file these are marked "+" and "-", respectively.
Base | The letter representing this base: A, C, T, or G.
Score | -10 log10(p-value) score for the detection of this event, analogous to a Phred quality score. A value of 20 is the default minimum threshold for this file and corresponds to a p-value of 0.01. A score of 30 corresponds to a p-value of 0.001.
casemean | Mean of the case IPDs observed at this position. Numerator of the IPD ratio.
controlmean | Mean of the control IPDs observed at this position. Denominator of the IPD ratio.
casestd | Standard deviation of the case IPDs observed at this position.
controlstd | Standard deviation of the control IPDs observed at this position.
ipdratio | casemean / controlmean.
teststatistic | t-statistic of a two-sample t-test.
coverage | Mean of the case and control coverages.
controlcoverage | Count of valid IPDs in the control sequence at this position (see the Filtering section of the SMRT Pipe documentation for details).
casecoverage | Count of valid IPDs in the case sequence at this position (see the Filtering section of the SMRT Pipe documentation for details).
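The derived columns of Table 3 can be recomputed from the per-position summaries. Table 3 states only that teststatistic comes from a two-sample t-test; the Python sketch below assumes Welch's unequal-variance form, with casecoverage and controlcoverage as the sample sizes, so its values may not match the software's output exactly. The function names are hypothetical.

```python
import math


def welch_t(case_mean: float, case_std: float, case_n: int,
            control_mean: float, control_std: float, control_n: int) -> float:
    """Two-sample t statistic without assuming equal variances (Welch's form)."""
    standard_error = math.sqrt(case_std ** 2 / case_n + control_std ** 2 / control_n)
    return (case_mean - control_mean) / standard_error


def table3_derived_columns(case_mean: float, case_std: float, case_n: int,
                           control_mean: float, control_std: float,
                           control_n: int) -> dict:
    """Recompute ipdratio, teststatistic, and coverage for one position."""
    return {
        "ipdratio": case_mean / control_mean,
        "teststatistic": welch_t(case_mean, case_std, case_n,
                                 control_mean, control_std, control_n),
        "coverage": (case_n + control_n) / 2,  # mean of case and control coverages
    }
```

With equal coverages, a case mean three times the control mean gives an IPD ratio of 3, and the t statistic grows with the square root of coverage, which is one reason higher control coverage lowers the background.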

Table 4: Fields included in the motifs.gff file (generated only when motif analysis is active).

Column | Description
Seqid | Reference tag (e.g., ref00001). Same as refid in the .csv file.
Source | Name of tool: "kinmodcall".
Type | Modification type. The generic tag "modified_base" is used for unidentified bases; for identified bases, m6a, m4c, and m5c are used. A "." indicates a site where methylation was expected but was below the significance threshold during the initial kinetic analysis. This suggests a site that is possibly being demethylated in the genome.
Start | Location of modification.
End | Location of modification.
Score | -10 log10(p-value) score for the detection of this event, analogous to a Phred quality score. A value of 20 is the default minimum threshold for this file and corresponds to a p-value of 0.01. A score of 30 corresponds to a p-value of 0.001. This is the modification QV for the statistical event detection at this position only, not for the identification. For a multi-site kinetic variation event, such as Tet-converted 5-mC, this score is likely to be very low, while identificationQv (in the Attributes field, below) contains a higher score that incorporates the full multi-site signal.
Strand | Native sample strand where kinetics were generated. "+" is the strand of the original FASTA and "-" is its reverse complement. In the .csv file these are marked "0" and "1", respectively.
Phase | Not applicable.
Attributes | Contains extra fields: IPDRatio is the traditional IPD ratio; context is the reference sequence from -20 bp to +20 bp around the modification, with the base at this location as the 21st character; coverage is the sequencing coverage at that position. Context is always written in 5' to 3' orientation of the template strand. The id attribute contains the complete double-strand methyltransferase motif. The motif attribute contains the single-strand methyltransferase motif for the modification described in this row of the file. If the motif is palindromic, the id and motif attributes are the same. identificationQv is the score for the identification call, if applicable; it comes from a separate statistical test from the Score field above.

Example: context=atacgccggccataatggcgatcgacattttctcgccacgg;motif=gatc;coverage=99;ipdratio=3.71;id=GATC;identificationQv=174

Table 5: Fields included in motif_summary.csv (generated only when motif analysis is active).

Column | Description
motifstring | Detected motif sequence for this site, such as GATC.
centerpos | Position of the modification within the motif (0-based).
fraction | Fraction of the instances of this motif in the genome that are detected as modified (instances with modification QV or identification QV above the QV threshold).
ndetected | Number of instances of this motif that are detected as modified (instances with modification QV or identification QV above the threshold).
ngenome | Number of occurrences of this motif in the reference genome sequence.
grouptag | A name identifying the complete double-strand recognition motif. For paired motifs this is <motifstring1>/<motifstring2>, for example GAGA/TCTC. For palindromic or unpaired motifs this is the same as motifstring.
partnermotifstring | motifstring of the paired motif (the motif with the reverse-complementary motifstring).
meanscore | Mean modification QV of the instances of this motif that are detected as modified.
meanipdratio | Mean IPD ratio of the instances of this motif that are detected as modified.
meancoverage | Mean coverage of the instances of this motif that are detected as modified.
objectivescore | Score of this motif in the motif finder algorithm. Based on several factors, the algorithm treats motifs with higher objective scores as more confidently identified in the genome.
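The Attributes column of motifs.gff packs several key=value pairs into one field, and the id, motif, grouptag, and partnermotifstring relationships in Tables 4 and 5 come down to reverse complements. A minimal Python sketch, with hypothetical function names and attribute keys lower-cased because the exact capitalization in real files is an assumption:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (case is preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(motif: str) -> bool:
    """A motif is palindromic if it equals its own reverse complement (e.g. GATC)."""
    motif = motif.upper()
    return motif == reverse_complement(motif)


def parse_attributes(field: str) -> dict[str, str]:
    """Split a GFF Attributes column of key=value pairs separated by ';'."""
    attributes = {}
    for item in field.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attributes[key.strip().lower()] = value.strip()
    return attributes


def fraction_detected(n_detected: int, n_genome: int) -> float:
    """The fraction column of motif_summary.csv: ndetected / ngenome."""
    return n_detected / n_genome
```

Applied to the example line above, parse_attributes returns the context, motif, coverage, IPD ratio, id, and identificationQv values. GATC is palindromic, so its id and motif agree, while the reverse complement of GAGA is TCTC, its partner in the GAGA/TCTC group tag.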

Visualizing Sites of Base Modification with SMRT View

Figure 5. Example of a SMRT View Screen

The motifs.gff output file, along with modification summary data, can be viewed in SMRT View by clicking the Tachometer icon in the top toolbar of SMRT View.

Figure 6. Tachometer Icon in SMRT View Tool Bar

There are several visualization tracks:
1. modifications (regions): This track displays a bar chart for both the plus and minus strands, showing the number of modification events within each 10 kb window. It is a stacked graph that also displays the type of modification in each 10 kb window.
Figure 7. Modification Regions Showing Number of Modification Events
2. motifs: For each motif detected in the analysis, a sub-track is created to show the sites in the genome that are methylated within that motif. A marker on this track denotes the strand on which the modification was detected. If a modification falls outside all of the discovered motifs, it is placed in a track called others, with a label indicating the 7-base context. This context is always represented in 5' to 3' orientation on the template strand. The content of this track comes from the motifs.gff file.
Figure 8. Marker Showing the Strand Where Event Was Detected
3. Kinetogram: This is a reference sequence view in which both strands are shown, with a bar chart displaying the IPD ratio at each position for each strand (see Figure 9). It is visible only when the view is zoomed in. It is important to note that the strands are displayed in template space, not in read space. An adenine (A) shown in the Kinetogram track and in the subread alignment below it is the adenine in the sequencing template. This means that a higher IPD ratio above an A position indicates a potentially modified adenine in the original template; it will generally correspond to an identified 6-mA in a motif in the modification events track above the Kinetogram. In read space, the actual measured IPD corresponds to an incorporation of the complement base T, which is the base and kinetic information stored in the FASTA file and bas.h5 file. The data in the Kinetogram is the ratio of the mean IPD of the genome of interest to the mean IPD of the control at the given position, so the IPD ratio values have a baseline of 1. An IPD ratio greater than 1 means that the sequencing polymerase slowed down (relative to the control) at this base position, while an IPD ratio less than 1 indicates that it sped up. The data for this track is pulled directly from the cmp.h5 alignment file generated during the analysis job.
Figure 9. Kinetogram Displaying the IPD Ratio at Each Position for Each Strand
4. Aligned reads: By default, this area is hidden. Pressing the View Reads icon in the Details Panel title bar shows the subread alignment. This area displays Variants and Base QV by default. Coverage is displayed on the left, separated by strand (see Figure 10). Using the Preferences menu option, it is possible to switch to other subread data in this panel, including the raw IPD and raw Pulse Width.
Figure 10. Heat Map of Raw IPDs Separated by Strand
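The modifications (regions) track summarizes modification events per strand and type in 10 kb windows. The Python sketch below shows one way to reproduce such counts from a list of modification calls, for example taken from the Start, Strand, and Type columns of motifs.gff. The window boundaries (1-based, starting at position 1), the input tuple layout, and the function name are assumptions, not a description of how SMRT View computes the track.

```python
from collections import Counter
from collections.abc import Iterable


def modification_windows(sites: Iterable[tuple[int, str, str]],
                         window: int = 10_000) -> Counter:
    """Count events per (strand, window start, modification type).

    `sites` holds (1-based position, strand "+" or "-", type) tuples.
    """
    counts: Counter = Counter()
    for position, strand, mod_type in sites:
        # Windows start at positions 1, 10001, 20001, ... for 10 kb windows.
        window_start = (position - 1) // window * window + 1
        counts[(strand, window_start, mod_type)] += 1
    return counts
```

Keeping the modification type in the key mirrors the stacked layout of the track; summing over types gives the plain per-window totals.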

SMRT View visualization sessions can be saved to a file using the File/Save Project menu command. A project file can be shared with other users. (See Endnote i.)

Importing Relevant Annotations
Visualizing base modification data in the context of relevant annotation tracks, such as genes or CpG islands, can help relate putative modification events to biological function. SMRT View can import many types of data files. For more information, see the Pacific Biosciences SMRT View Help menu.

Using the Table Browser
The Table Browser can be used to look at the highest-confidence modification events. By default, the table is sorted by QV (score).

Figure 11. Table Browser View

For modifications that occur at a discovered motif, the Feature label is the motif. Modifications that pass the confidence threshold are labeled either by the type of modification (m6a, m4c, m5c) or as modified_base when the type could not be identified. Unmodified instances of the motif have the Type "unknown". Double-clicking one of the events listed in the Table Browser zooms in on that region in the graphical view.

By entering a feature label or type in the search box, you can quickly find modification events of interest. Pressing the eye icon next to the search box also filters the events displayed in the main window. For example, type unknown (without quotation marks) in the search box. This lists only the motif instances that were not detected as modified. Now press the eye icon: in the main screen, the motif tracks hide most of the motifs and show only those with Type "unknown", in other words, those that did not meet the confidence threshold to be declared modified. These may correlate with other interesting genomic regions.

Figure 12. Closer View of a Region with Motif

Performing Motif Analysis
Since methylation in bacteria generally occurs at specific sequence motifs recognized by methyltransferases, a genome-wide analysis of the modified motifs is critical to understanding the bacterium being studied. A default SMRT Portal analysis job provides several forms of motif analysis:

1. The Motif Report in SMRT Portal.
2. A downloadable file called motif_summary.csv (see Table 5), which contains similar information and can easily be opened in a spreadsheet program.
3. motifs.gff (see Table 4), which lists all of the methylated sites in the genome, all of the sites in the genome matching one of the discovered motifs, and the overlap between the two.
4. SMRT View, which allows easy visualization of the information contained in these files.
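Because motifs.gff is a tab-delimited GFF file, the Table Browser's default sort and its search filter can also be reproduced in a script. The Python sketch below is a hypothetical illustration, not a PacBio tool: it assumes standard nine-column GFF3 lines, with the modification type (for example m6A, modified_base, or unknown) in the third column, the QV in the score column, and any motif label somewhere in the attribute column. The attribute names in the test data are examples, not a documented schema.

```python
def parse_gff(lines):
    """Yield one dict per feature line of a GFF3 file such as motifs.gff.

    Blank lines and '#' comment/directive lines are skipped. Column 9 is
    split into key=value attributes; which keys exist depends on the file.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = line.split("\t", 8)
        attributes = dict(
            field.split("=", 1) for field in attrs.split(";") if "=" in field
        )
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),
            "strand": strand,
            "attributes": attributes,
        }


def by_score(features):
    """Highest score (QV) first, like the Table Browser default;
    features without a score are placed last."""
    return sorted(
        features,
        key=lambda f: float("-inf") if f["score"] is None else f["score"],
        reverse=True,
    )


def search(features, text):
    """Case-insensitive substring match on the feature type or any
    attribute value, loosely mimicking the Table Browser search box."""
    needle = text.lower()
    return [
        f for f in features
        if needle in f["type"].lower()
        or any(needle in v.lower() for v in f["attributes"].values())
    ]
```

Calling `search(parse_gff(fh), "unknown")` on an open motifs.gff file handle `fh` would then list the motif instances not called as modified, analogous to typing unknown in the Table Browser.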

Figure 13. The Motif Report in SMRT Portal

The Motif report in SMRT Portal, which is similar in content to the motif_summary.csv file, is a high-level summary of the motifs discovered in this genome. For each motif detected, statistics are shown describing how often it is detected as methylated in this genome. If coverage is very low, you may still detect the motifs, but the percentage detected will be lower. When a non-palindromic, double-stranded methyltransferase recognition site is detected, a Partner Motif is indicated, which is the complementary motif.

Example: In our sample E. coli genome, two motifs are modified almost all of the time: GATC and AACNNNNNNGTGC/GCACNNNNNNGTT. GATC is detected as modified over 99% of the time. The second motif is non-palindromic, so it is split into two rows in the motif report, AACNNNNNNGTGC and its partner GCACNNNNNNGTT, each of which is modified over 98% of the time. As expected, the two partner motifs occur in the genome the same number of times. If the paired motifs are considered together (modified sites of both motifs divided by all sites of both motifs), they are modified 98.5% of the time.

We also saw three additional motifs modified about 50% of the time. These motifs arise from a very low detection rate of 5-mC in the Dcm methyltransferase recognition motif, CCWGG. In this case, the motifs have been expanded so that the fraction methylated is high enough for the motif analysis software to call a valid motif. For this genome, the presence of these motifs suggests that an additional sequencing run using Tet-converted DNA should be performed in order to also discover the recognition sites of methyltransferases that create 5-mC modifications.

If the list of motifs differs from what was expected, or if you want to refine the list further, consider re-analyzing the data using a different modification QV threshold for the motif analysis. By default, any modification with a modification QV (confidence score) of 40 or higher is included.
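The partner-motif relationship and the pooled percentage in the example above come down to a reverse complement and a ratio of counts. The Python sketch below is an illustration under stated assumptions, not the SMRT Portal code: it assumes a motif site counts as modified when its modification QV meets the threshold (40 by default, per the text), that per-site QVs are available as a list, and the function names are hypothetical. Changing `threshold` shows why re-analysis at a different QV cutoff changes the reported percentages.

```python
# Complement table for IUPAC nucleotide codes (W and S are self-complementary).
_IUPAC_COMPLEMENT = str.maketrans("ACGTWSMKRYBDHVN", "TGCAWSKMYRVHDBN")


def partner_motif(motif):
    """Reverse complement of an IUPAC motif: the same site read 5'->3'
    on the opposite strand."""
    return motif.upper().translate(_IUPAC_COMPLEMENT)[::-1]


def is_palindromic(motif):
    """True when a motif is its own reverse complement (no partner motif)."""
    return partner_motif(motif) == motif.upper()


def fraction_modified(site_qvs, threshold=40):
    """Fraction of motif sites whose modification QV is >= threshold.

    site_qvs has one entry per occurrence of the motif in the genome;
    None marks a site with no modification call.
    """
    if not site_qvs:
        raise ValueError("no motif sites given")
    modified = sum(1 for qv in site_qvs if qv is not None and qv >= threshold)
    return modified / len(site_qvs)


def combined_fraction(*site_qvs_per_motif, threshold=40):
    """Pool the sites of partner motifs before dividing:
    (modified sites of all motifs) / (all sites of all motifs)."""
    pooled = [qv for sites in site_qvs_per_motif for qv in sites]
    return fraction_modified(pooled, threshold)
```

For example, `partner_motif("AACNNNNNNGTGC")` returns GCACNNNNNNGTT, matching the partner in the E. coli example, while GATC and CCWGG are their own reverse complements.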
By looking at the Modifications report (Figure 4), you can choose a different threshold to include more data in the motif analysis. There are two options for re-analyzing motifs with different thresholds. The first is to re-run the job in SMRT Portal. The setting is found in the Protocol Settings when creating a new analysis job (at the bottom of the Postprocessing tab). This requires re-running the entire analysis, including sequence alignment. The alternative is to download the motifmaker command-line Java program from GitHub. This is an open-source, unsupported version of the same software that analyzes data in SMRT Portal. It is quicker to run because it takes the output of the SMRT Portal modification analysis as its input (i.e., it starts after the slowest part of the analysis has already been performed). However, it may be more complicated to use, since it is a command-line program. Additional advanced tools for motif analysis are also available on DevNet at 000GtatAAC.

Example: In our sample E. coli data, the Modifications report shows strong methylation only on adenines, which is consistent with the Motif report. In the Modification QV Histogram, the red line representing adenine does not trend with the other bases, and it separates at a Modification QV of approximately 60. One option is to re-run the analysis with a new threshold of 60 and see how the motif report changes. Also try a lower setting, such as 25, to examine the resulting effect.

Another component of motif analysis is the association between a methyltransferase recognition motif and the methyltransferase gene that modifies that site. Establishing this association is important, for example, for introducing a particular modification into another bacterium or knocking it out of the one just sequenced. A common place to begin this analysis, before proceeding to lab validation, is the organism database at REBASE. This site lists a variety of bacterial genomes and has information on predicted and verified RM genes in each genome, plus predicted recognition domains for each methyltransferase or restriction endonuclease found in those genomes. By comparing the list generated by SMRT Portal with the predicted list in REBASE, it is often possible to determine which genes are active and responsible for the modifications you have discovered in your bacterial strain.

Conclusion
SMRT Sequencing allows genome-wide, single-base-resolution study of bacterial methylation, including 6-mA, 5-mC, and 4-mC. Analysis of these modifications, along with motif analysis, is possible today using SMRT Analysis, and will continue to become more accessible and automated.

References
1. Wilson GG. Organization of restriction-modification systems. Nucleic Acids Research (1991) 19(10): doi: /nar/
2. Zweiger et al. A Caulobacter DNA methyltransferase that functions only in the predivisional cell. J Mol Biol (Jan 1994) 235(2):
3. Srikhanta et al. The phasevarion: phase variation of type III DNA methyltransferases controls coordinated switching in multiple genes. Nature Reviews Microbiology (March 2010).
4. Murray et al. The Methylomes of Six Bacteria. Nucleic Acids Research (Sept).
5. Ryan KJ, Ray CG (editors). Sherris Medical Microbiology, 4th ed. McGraw Hill (2004).
6. Russi et al. Molecular Machinery for DNA Translocation in Bacterial Conjugation. In: Plasmids: Current Research and Future Trends. Caister Academic Press (2008).
7. Flusberg et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nature Methods 7:
8. Clark et al. Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing. Nucleic Acids Research (2011).
9. Yu et al. Base-Resolution Analysis of 5-Hydroxymethylcytosine in the Mammalian Genome. Cell, Volume 149, Issue 6 (17 May).

Endnotes
i Adjusting the Region view in SMRT View: To change the regional modification density histogram to a heat map, press the x key. This makes the less frequent types of modification easier to see. Press x again to change it back.
Adjusting the Details view in SMRT View: Several keyboard commands adjust the heat map view, including the size of the letters displayed, which may make it easier to read at various zoom levels. To zoom to base resolution, click the Zoom to Base toolbar icon (a magnifying glass over an A). The keyboard commands are:
R or r: make the IPD ratio bar chart larger or smaller, respectively.
D or d: make the nucleotide letters larger or smaller.
T or t: make the annotation tracks larger or smaller.
Left arrow or right arrow: after clicking in the top or bottom pane to select it, move the view to the left or right.
Up arrow or down arrow: after clicking a base modification event in the Details view, move to the next or previous event in that track.
Additionally, moving the mouse over an event shows additional details from the GFF file, such as the IPD ratio, score, coverage, and sequence context.

For Research Use Only. Not for use in diagnostic procedures. Copyright 2012, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT and SMRTbell are trademarks of Pacific Biosciences in the United States and/or certain other countries. All other trademarks are the sole property of their respective owners. PN


More information

Visualization with Excel Tools and Microsoft Azure

Visualization with Excel Tools and Microsoft Azure Visualization with Excel Tools and Microsoft Azure Introduction Power Query and Power Map are add-ins that are available as free downloads from Microsoft to enhance the data access and data visualization

More information

Avigilon Control Center Web Client User Guide

Avigilon Control Center Web Client User Guide Avigilon Control Center Web Client User Guide Version: 4.12 Enterprise OLH-WEBCLIENT-E-E-Rev2 Copyright 2013 Avigilon. All rights reserved. The information presented is subject to change without notice.

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Microsoft Outlook 2010. Reference Guide for Lotus Notes Users

Microsoft Outlook 2010. Reference Guide for Lotus Notes Users Microsoft Outlook 2010 Reference Guide for Lotus Notes Users ContentsWelcome to Office Outlook 2010... 2 Mail... 3 Viewing Messages... 4 Working with Messages... 7 Responding to Messages... 11 Organizing

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Microsoft Access 2010 Part 1: Introduction to Access

Microsoft Access 2010 Part 1: Introduction to Access CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES Microsoft Access 2010 Part 1: Introduction to Access Fall 2014, Version 1.2 Table of Contents Introduction...3 Starting Access...3

More information

Introduction to SPSS 16.0

Introduction to SPSS 16.0 Introduction to SPSS 16.0 Edited by Emily Blumenthal Center for Social Science Computation and Research 110 Savery Hall University of Washington Seattle, WA 98195 USA (206) 543-8110 November 2010 http://julius.csscr.washington.edu/pdf/spss.pdf

More information

Creating Online Surveys with Qualtrics Survey Tool

Creating Online Surveys with Qualtrics Survey Tool Creating Online Surveys with Qualtrics Survey Tool Copyright 2015, Faculty and Staff Training, West Chester University. A member of the Pennsylvania State System of Higher Education. No portion of this

More information

Analyzing A DNA Sequence Chromatogram

Analyzing A DNA Sequence Chromatogram LESSON 9 HANDOUT Analyzing A DNA Sequence Chromatogram Student Researcher Background: DNA Analysis and FinchTV DNA sequence data can be used to answer many types of questions. Because DNA sequences differ

More information

LifeScope Genomic Analysis Software 2.5

LifeScope Genomic Analysis Software 2.5 USER GUIDE LifeScope Genomic Analysis Software 2.5 Graphical User Interface DATA ANALYSIS METHODS AND INTERPRETATION Publication Part Number 4471877 Rev. A Revision Date November 2011 For Research Use

More information

Project Setup and Data Management Tutorial

Project Setup and Data Management Tutorial Project Setup and Heavy Construction Edition Version 1.20 Corporate Office Trimble Navigation Limited Engineering and Construction Division 5475 Kellenburger Road Dayton, Ohio 45424-1099 U.S.A. Phone:

More information

Using FileMaker Pro with Microsoft Office

Using FileMaker Pro with Microsoft Office Hands-on Guide Using FileMaker Pro with Microsoft Office Making FileMaker Pro Your Office Companion page 1 Table of Contents Introduction... 3 Before You Get Started... 4 Sharing Data between FileMaker

More information

CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA

CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA INTRODUCTION DNA : DNA is deoxyribose nucleic acid. It is made up of a base consisting of sugar, phosphate and one nitrogen base.the

More information

DocAve 6 Service Pack 1 Job Monitor

DocAve 6 Service Pack 1 Job Monitor DocAve 6 Service Pack 1 Job Monitor Reference Guide Revision C Issued September 2012 1 Table of Contents About Job Monitor... 4 Submitting Documentation Feedback to AvePoint... 4 Before You Begin... 5

More information

Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients

Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients Technical Paper Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients A practical introduction to SAS Information Map Studio and SAS Web Report Studio for new and experienced

More information

Adaptive Enterprise Solutions

Adaptive Enterprise Solutions Reporting User Guide Adaptive Enterprise Solutions 8401 Colesville Road Suite 450 Silver Spring, MD 20910 800.237.9785 Toll Free 301.589.3434 Voice 301.589.9254 Fax www.adsystech.com Version 5 THIS USER

More information

User Guide for TASKE Desktop

User Guide for TASKE Desktop User Guide for TASKE Desktop For Avaya Aura Communication Manager with Aura Application Enablement Services Version: 8.9 Date: 2013-03 This document is provided to you for informational purposes only.

More information

Microsoft Office System Tip Sheet

Microsoft Office System Tip Sheet Experience the 2007 Microsoft Office System The 2007 Microsoft Office system includes programs, servers, services, and solutions designed to work together to help you succeed. New features in the 2007

More information

Sequencing of DNA modifications

Sequencing of DNA modifications Sequencing of DNA modifications part of High-Throughput Analyzes of Genome Sequenzes Bioinformatics University of Leipzig Leipzig, WS 2014/15 Chemical modifications DNA modifications: 5-Methylcytosine

More information

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic

More information

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity

More information

Legal Notes. Regarding Trademarks. 2012 KYOCERA Document Solutions Inc.

Legal Notes. Regarding Trademarks. 2012 KYOCERA Document Solutions Inc. Legal Notes Unauthorized reproduction of all or part of this guide is prohibited. The information in this guide is subject to change without notice. We cannot be held liable for any problems arising from

More information

WEB TRADER USER MANUAL

WEB TRADER USER MANUAL WEB TRADER USER MANUAL Web Trader... 2 Getting Started... 4 Logging In... 5 The Workspace... 6 Main menu... 7 File... 7 Instruments... 8 View... 8 Quotes View... 9 Advanced View...11 Accounts View...11

More information

Guide for Data Visualization and Analysis using ACSN

Guide for Data Visualization and Analysis using ACSN Guide for Data Visualization and Analysis using ACSN ACSN contains the NaviCell tool box, the intuitive and user- friendly environment for data visualization and analysis. The tool is accessible from the

More information

Release Notes DAISY 4.0

Release Notes DAISY 4.0 2010 Release Notes DAISY 4.0 NEW FEATURES Inactivate/Reactivate accounts and patients Enhanced treatment planning AutoRemind electronic appointment confirmation Copyright 2010. DAISY is a registered trademark

More information

The LSUHSC N.O. Email Archive

The LSUHSC N.O. Email Archive The LSUHSC N.O. Email Archive Introduction The LSUHSC N.O. email archive permanently retains a copy of all email items sent and received by LSUHSC N.O. Academic email users. Email items will be accessible

More information

Desktop, Web and Mobile Testing Tutorials

Desktop, Web and Mobile Testing Tutorials Desktop, Web and Mobile Testing Tutorials * Windows and the Windows logo are trademarks of the Microsoft group of companies. 2 About the Tutorial With TestComplete, you can test applications of three major

More information

TOPS v3.2.1 Calendar/Scheduler User Guide. By TOPS Software, LLC Clearwater, Florida

TOPS v3.2.1 Calendar/Scheduler User Guide. By TOPS Software, LLC Clearwater, Florida TOPS v3.2.1 Calendar/Scheduler User Guide By TOPS Software, LLC Clearwater, Florida Document History Version Edition Date Document Software Trademark Copyright First Edition Second Edition 02 2007 09-2007

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Working with the new enudge responsive email styles

Working with the new enudge responsive email styles Working with the new enudge responsive email styles This tutorial assumes that you have added one of the mobile responsive colour styles to your email campaign contents. To add an enudge email style to

More information

Appendix A How to create a data-sharing lab

Appendix A How to create a data-sharing lab Appendix A How to create a data-sharing lab Creating a lab involves completing five major steps: creating lists, then graphs, then the page for lab instructions, then adding forms to the lab instructions,

More information