goldminer Tutorial Introduction

Genomic sequencing and large-scale gene expression studies often produce a large number of sequence fragments. A major challenge in bioinformatics is to identify the function of these fragments, a process commonly known as sequence annotation. This tutorial outlines the fundamental concepts of sequence annotation, its computational aspects and, in particular, how to annotate a set of ESTs (expressed sequence tags) with goldminer, developed in Dr. Xuhua Xia's lab at the University of Ottawa.

There are two major categories of computational methods for sequence annotation. The first is based on known genes in molecular databases and uses homology searches. The best representatives of this category are FASTA (Pearson and Lipman 1988) and BLAST (Altschul et al. 1990; Altschul et al. 1997). The second, best represented by GENSCAN (Burge and Karlin 1997), recognizes patterns learned from known gene structures, using two types of computational methods: neural network algorithms and hidden Markov models. Existing gene-finding software often combines both approaches, e.g., GeneMark (Hayes and Borodovsky 1998), GLIMMER (Salzberg et al. 1998), Orpheus (Frishman et al. 1998), Projector (Meyer and Durbin 2004) and YACOP (Tech and Merkl 2003).

The first category of gene-finding methods was of little use when there were few genes in the gene dictionary. However, the gene dictionary has expanded dramatically in the last decades, and it is now rare for a new gene sequence to find no match in the public databases. Many sequence annotation platforms are now based mainly on the first category of methods, especially those designed for EST annotation (Ayoubi et al. 2002; Davila et al. 2005; Koski et al. 2005; Mao et al. 2003; Martin et al. 2004; Paquola et al. 2003).
Gene annotation in large-scale gene expression studies is similar to genome annotation in that both involve a large number of sequence fragments and both search against databases of known genes. To search against protein databases, one also needs to translate the nucleotide sequences in six frames (three frames on the input strand and three on the complementary strand).

EST annotation has a few unique features. First, an actively transcribed gene (say X) tends to have more copies of its RNA, and consequently will be cloned more often, than an inactive gene (say Y). However, the ESTs derived from the RNA of gene X may not be identical (Fig. 1). For this reason, an EST annotation system is expected to have a contig assembly function. goldminer implements the contig assembly algorithm developed by Huang (1992), with a few modifications to improve performance.

EST1           ATTTAATTAAACCACGGTAAGCC
EST2                        ACGGTAAGCCTCAACCTTTTCC
EST3                                                       CAATGCTGCT
RNA         TTGATTTAATTAAACCACGGTAAGCCTCAACCTTTTCCATATGTGGTCAATGCTGCTTCC
Complement  AACUAAAUUAAUUUGGUGCCAUUCGGAGUUGGAAAAGGUAUACACCAGUUACGACGAAGG
EST4                                  AGUUGGAAAAGGUAUAC

EST1+ EST2+ EST4-: ATTTAATTAAACCACGGTAAGCCTCAACCTTTTCCATATG

Fig. 1. A schematic illustration of an RNA sequence together with its reverse-transcribed complementary strand. Four ESTs are derived from the RNA, three collinear with the RNA and the fourth collinear with the complementary strand. Contig assembly joins ESTs 1, 2 and 4, and the resulting contig is named concisely as EST1+ EST2+ EST4- to indicate that it is assembled from EST1, EST2 and the complementary strand of EST4. Correct functional annotation will show that all four ESTs map to the same gene, leading to the correct conclusion that the RNA was cloned four times.

Large-scale gene expression studies typically involve printing the ESTs onto a microarray chip.
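The contig name in Fig. 1 can be reproduced with a small sketch. This is not Huang's (1992) algorithm and not goldminer's code; it is a simple greedy extension that looks for exact suffix-prefix overlaps, and the function names and the minimum overlap of 6 bases are assumptions made for illustration. EST4 is entered here as it would be read 5' to 3' from its own strand (an assumption; the figure prints it aligned under the complementary strand), so the sketch has to reverse-complement it before it fits.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(left, right, min_len):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend_contig(ests, min_len=6):
    """Greedily extend the first EST to the right, trying both strands of each
    remaining EST in the order given. Hypothetical helper, exact matches only."""
    (first_name, contig), rest = ests[0], ests[1:]
    parts = [first_name + "+"]
    for name, seq in rest:
        for strand, candidate in (("+", seq), ("-", reverse_complement(seq))):
            k = overlap(contig, candidate, min_len)
            if k:
                contig += candidate[k:]       # append only the non-overlapping tail
                parts.append(name + strand)   # "+" = as given, "-" = complementary strand
                break
    return " ".join(parts), contig


# Sequences from Fig. 1; EST4 is written 5'->3' on the complementary strand.
ests = [
    ("EST1", "ATTTAATTAAACCACGGTAAGCC"),
    ("EST2", "ACGGTAAGCCTCAACCTTTTCC"),
    ("EST3", "CAATGCTGCT"),
    ("EST4", "CATATGGAAAAGGTTGA"),
]
name, contig = extend_contig(ests)
print(name)
print(contig)
```

EST3 is left out of the contig because it does not overlap the other fragments, even though it comes from the same RNA. This is why functional annotation, and not contig assembly alone, is needed to see that all four ESTs belong to one gene.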
If the RNA from the highly expressed gene X is represented by 100 ESTs, one naturally does not want to print all of them into separate microarray cells, because that would be wasteful. Contig assembly and EST annotation allow one to know whether a certain gene has many EST representatives, so that one can choose which ESTs to print.

Now suppose that you study gene expression in goldfish brains and have already accumulated a large number of ESTs. How are you going to know what gene products these ESTs code for? Naturally, you would first want to search your sequences against a goldfish sequence database (if there is one). If only a few goldfish genes have been sequenced, you should search your goldfish sequences against the genome of a closely related species, such as the zebrafish. If some goldfish sequences still have no good match in the zebrafish genome, you should search against genomic databases of other vertebrate species.

Sometimes a good match may mean nothing, e.g., a match against another EST or against an unannotated putative sequence. For such sequences, you will need to search against databases of protein functional classification such as pfam (Bateman et al. 1999; Bateman et al. 2004), SMART (Letunic et al. 2004), and COG (Tatusov et al. 2003). While these databases represent outstanding bioinformatics advances in protein functional classification, there are three major problems with searching against them. First, the protein families in each of these databases are overlapping subsets of the proteins with known functions (Fig. 2). It would be much better to cross-validate and merge these databases and to include all proteins with known functions. Second, a search against these databases may yield multiple matches in different protein families, leaving you wondering which one represents the more accurate functional classification. Third, searching against these databases is typically slow, which handicaps a large-scale study of gene expression with thousands of ESTs.

The Conserved Domain Database (CDD) was created to overcome (or at least alleviate) these three problems (Marchler-Bauer et al. 2005). The database imports protein families from pfam, SMART and COG, with cross-database validation and re-classification to increase the accuracy of the functional classification. It also includes proteins with known functions that are curated at NCBI but not included in the other databases.
To increase the searching speed, CDD uses the RPS-BLAST search engine, whose speed is augmented by pre-computation of much of the output.

[Figure: circles labeled pfam, SMART and COG and a region labeled "Unincluded proteins with known function", inside an outermost circle.]

Fig. 2. SMART, COG and pfam include subsets of proteins with known functional classification. The size of the circles does not reflect the size of individual databases. The outermost circle represents CDD.

The web API for CDD has not yet been formally released, and goldminer is the first software package for the scientific community that automates the search against CDD.

In this tutorial you will learn how to install goldminer and how to use it to annotate a sample set of EST sequences from goldfish. The goldminer program is large because I packed a zebrafish CDS database with it, so that one can work with the sample goldfish EST sequences by local BLASTing. In the latter part of the tutorial, you will learn how to retrieve the zebrafish CDS sequences from GenBank and how to create a local BLAST database.

Objectives:

1. Learn to do quick and dirty sequence annotation by automated local BLASTing.
2. Learn to automate the slow but accurate and informative functional annotation against databases of protein families such as the Conserved Domain Database (CDD) and pfam.
3. Automate database search and annotation against other NCBI-hosted databases.
4. Assemble contigs from sequence fragments.
5. Gain experience in creating local BLAST databases.

Procedures:

Note: Please ignore the installation step below if goldminer is already installed on your computer. This tutorial is written not only for you, but also for others who need to do the installation themselves.

1. Install Goldminer from ... Unless your computer is extremely old, all you need to do is click the Goldminer.msi file, click the Run button, and then click the Next button a few times in response to the ensuing dialog boxes. The default installation directory is C:\Program Files\Goldminer. Under this directory, three subdirectories are created during the installation process:
   a. The Plate directory, which contains a single sample file, CaNCBI.FAS, with 42 sequences from 42 goldfish mRNAs. You may put your own sequences into this same directory.
   b. The BLASTDB directory, which contains the sample zebrafish CDS BLAST files for you to practice local BLAST with the CaNCBI.FAS file.
   c. The ESTDB directory, which contains files with annotated sequences. (This directory may be missing in your installation.)
2. Sequence annotation by local BLAST.
   a. Open the EST sequence file
      i. Click Start > All Programs > Goldminer to start the program.
      ii. Click Tools > Options to set the program defaults (you do not need to do this if you use the Goldminer defaults). The EST plate directory is where you should store your unannotated sequence files. The default is GoldminerDir\Plate (where GoldminerDir is the Goldminer installation directory, C:\Program Files\Goldminer by default).
      The EST database directory stores files containing annotated or partially annotated sequences, and the default is GoldminerDir\ESTDB. The BLAST program directory is where the BLAST programs are located, and you are advised to leave it as the default. The BLAST database directory is where you store your personal local BLAST databases, and the default is GoldminerDir\BLASTDB. The default input file format is FASTA, but you can set it to another format. A simple guideline for sequence naming is to use a combination of plate ID and well ID, e.g.,
         >A1
         AACACAGGUUUA
      where A1 designates the coordinates of the well on the cloning plate. There are two types of plate in current use: the 96-well plate (rows A to H and columns 1 to 12) and the 384-well plate (rows A to P and columns 1 to 24). Of course one does not have to use the coordinates as the sequence name, but this naming convention helps associate the sequence with its physical location.
      iii. Click File > Open plate files to read in the CaNCBI file. Goldminer can recognize many different sequence formats, but FASTA is the most frequently used format in gene expression studies, hence the FASTA default.
      iv. The sequences will be displayed. At this point, we do not know what genes these sequences are, and most columns are blank.
   b. Local BLAST. All these searches take time. If you have many sequence fragments to annotate and you need to know their approximate functions quickly, then speed is a prime consideration. Remote searching is always slow, so one should do remote BLAST only with the subset of sequences that find no match by local BLAST. For this reason you should always create and install local databases to facilitate your search. You are advised to always do local BLAST first, so that only a small fraction of the sequences will then be searched against remote BLAST databases at NCBI.
      This reduces the chance of overloading the NCBI BLAST server.
      i. Click BLAST > ReBlast against genomic DB.
      ii. A dialog appears. In the EST option, click ReBlast All ESTs. In the bottom frame, leave the default unchanged, i.e., Blast against local database.

      iii. Specify the local database by clicking the Browse button. If you keep the default, you will see the zebrafish.rna file. Double-click it to select it.
      iv. Set other BLAST parameters if necessary. Leave them as default if you do not know what they mean.
      v. Click the Done button to start local BLASTing. Once the BLASTing is finished (it may take quite a while, depending on your computer speed), you will see the output with some ESTs annotated with zebrafish genes.
      vi. Many of the ESTs have now been annotated against zebrafish genes, with highly significant e-values. You may note that sequence A2 has no match.
   c. A few hidden functions
      i. Right-click anywhere in the Matched gene column and click Find. In the dialog, enter casein kinase 1 (without quotes) and click OK. You will find three sequences (D5-D7) highlighted in red (you may have to scroll down to see them). If the sequences were from your own cloning experiment, this would mean that the transcript of the casein kinase 1 gene has been cloned multiple times, and that the clones all match the same zebrafish casein kinase 1 gene (NM_ ). This provides useful information in two ways. First, the casein kinase 1 gene in goldfish is likely to be highly expressed in the brain tissue. Second, if you are studying gene expression with spotted cDNA microarrays, there is no need to spot these replicate clones of the casein kinase 1 transcript into multiple sets of probe cells. One set of probe cells is sufficient.
      ii. Right-click the GeneID entry for sequence D5 (i.e., NM_ ) and click GenBank Sequence. The annotated zebrafish casein kinase 1 gene is displayed, so that you can obtain further information about the gene.
      iii. You may also right-click the GeneID entry for sequence D5 (i.e., NM_ ) and then click Show HSP (HSP stands for high-scoring sequence pair) to see the details of the matched segments.
      iv. Click File > Save to save the sequences.
      Whenever possible, provide an informative file name and save the file to your own personal working directory.
3. Remote search against the CDD database hosted at NCBI.
   a. Click Func.Pred. > CD Search and set the parameters (which are self-explanatory) in the ensuing dialog box. If you do not know them, just use the defaults. In particular, you should not change the URL for CDD hosted at NCBI unless (in the very unlikely case) you have a local mirror of the NCBI databases.
   b. You will be asked to specify a translation table. This is because CDD is a protein database and we need to translate our nucleotide sequences into protein sequences. All known translation tables have been implemented in Goldminer. For our sequences, the first (Standard) translation table should be used. Goldminer will then translate each sequence in three frames and search all of them against the CDD database.
   c. The checkbox Use complementary sequence is for searching the database with the complementary strand of the EST sequences. For the first run, you should leave it cleared. If the input sequences find no match, you can run the search again with this check box checked.
   d. Click the Submit button to start. The search may take a long time to finish, depending on how heavily loaded the CDD server is. A progress bar is shown.
   e. Once the search is complete, most sequences will have been functionally annotated. A few of them will find no match. These may represent genes new to science.
   f. Click File > Save annotation with sequences to save the sequence annotations. Given the long waiting time that you have suffered through, it would be silly not to save the results in a secure directory.
   g. Click Func.Pred. > CD Search again. In the ensuing dialog box, select CD-Search sequences with no match, and check the Use the complementary strand check box.
   h. Click the Submit button to search the CDD database using the complementary strand of those ESTs with no match.
   i. Once the search is complete, click File > Save annotation with sequences to save the sequence annotations.
   j. Right-clicking any entry in the CDSID column and then clicking CD-Search Gene will take you to the CDD seed protein in its function group. For example, right-clicking the first CDSID, i.e., DEAD, will take you to the full annotation of the DEAD/DEAH box helicase at the CDD server.
4. Search against a pfam server. Searching against a remote pfam server is slow. If you have many ESTs and really have to use pfam, you should have a locally installed pfam server. Searching against pfam may not yield anything new after you have already searched against the CDD database.
   a. Click Func.Pred. > pfam and set the parameters. If you do not know them, just use the defaults.
   b. Click the Submit button to start. The search may take a long time to finish, so a progress bar is shown.
   c. Once the search is complete, most sequences will have been functionally annotated. A few of them will find no match, which may be due to translation in the wrong frame or strand.

   d. Click File > Save annotation with sequences to save the sequences.
   e. Click Func.Pred. > pfam again. In the ensuing dialog box, select CD-Search sequences with no match, and check the Use the complementary strand check box.
   f. Click the Submit button to start.
   g. Click File > Save annotation with sequences to save your file.
   h. Right-clicking any entry in the pfamid column and then clicking pfam Gene will take you to the pfam seed protein in its function group. For example, right-clicking the first pfamid, i.e., DEAD, will take you to the full annotation of the DEAD/DEAH box helicase at the pfam server.
5. Remote BLAST against NCBI databases. For sequences with no match after searching against all the databases of protein functional classification, your last resort is to search against GenBank in the hope of getting a match with some information for functional inference. It is often impractical to store all databases locally, because of the sheer amount of disk space needed and because it is very difficult to keep these terabyte-size databases up to date. So we will take advantage of the regularly updated databases maintained at NCBI.
   a. Click BLAST > ReBlast against genomic DB.
   b. A dialog appears. In the EST option, set the option to ReBlast ESTs with e-value greater than 0.01 (or smaller). In the bottom frame, choose the option Blast against NCBI databases. What is an e-value? What does the default e-value of 0.01 mean?
   c. Specify the NCBI database (or just leave the default of nr, which stands for non-redundant) and set other BLAST parameters if necessary (or just use the default values).
   d. Click the BLAST button to start BLASTing against the chosen NCBI database. Note that the NCBI BLAST server often needs to handle thousands of queries per hour, and is prone to being flooded. We could be selfish and send all queries to BLAST quickly, but selfishness is incompatible with a civilized society.
      So Goldminer sends only one query EST at a time and does not send another until the first has been processed. This guarantees that NCBI will never identify us as bad citizens (or the Goldminer programmer as an inconsiderate scientist). You can leave Goldminer to do its job and go about other business. Because of the slowness, a progress bar is shown to alleviate your frustration (no progress bar is shown for local BLAST, which is fairly fast).
   e. Once the BLASTing is over, those sequences that had no match or only poor matches may find new matches or stay the same as before. You may note that A2, which had no match after local BLAST, now has a good match. At this point there is still no information on functional classification.
   f. Click File > Save to save the sequences.
6. Create a local BLAST file for local BLASTing.
   a. Launch your WWW browser to ...
   b. Type in Carassius auratus (the Latin name for goldfish), or a more general taxonomic term, in the search box.
   c. Click Limit and set the Limit to dropdown box to Organism.
   d. Click Go to search for the goldfish sequences. You will get a list of at least 700 goldfish sequences.
   e. In the Display dropdown box, choose Fasta.
   f. In the Send to dropdown box, choose File, and save the sequences to Goldfish.FAS. The file can reside in any directory, but for consistency we will save it in the same directory as the goldminer program.
   g. Back in goldminer, click BLAST > Format genomic BLAST DB. In the ensuing dialog box, enter Goldfish or anything meaningful in the Title for database file and Base name for BLAST file boxes. Click Add files to be formatted and browse to where you have saved the Goldfish.FAS file. Highlight the file name so that it appears in the File name textbox. Click OK.
   h. Click the Go button to format the file. The display panel will tell you when the formatting is complete.
   i. To use this new local goldfish BLAST file for local BLASTing, repeat step 2.b, except that the local BLAST file will be Goldfish.nsq instead of zebrafish.rna.nsq.
7. Contig assembly. The contig assembly function is independent of the functional annotation. It is a bad idea to assemble the sequences first and then perform functional annotation on the reduced number of sequences. This is because two overlapping ESTs may NOT necessarily be from the same transcribed RNA and may belong to different protein families.
   a. Click Sequence > Contig Assembly.
   b. A dialog box appears. Here is a brief explanation, in case you do not know the meaning of the contig assembly options:
      i. Sequence quality options: The beginning and the end of a sequence fragment are less reliable than the middle section and may have many base-calling errors. All automatic DNA sequencers come with base-calling software that analyzes base-calling quality and lets you know which bases are inferred with little confidence. Some base-calling software allows you to trim off the unreliable ends. In that case you should change the defaults of 20 and 500 to 1 and a number greater than the length of the longest sequence fragment, respectively. In other words, you are telling Goldminer that every base in the input sequences is good.
      ii. Alignment parameters: The gap open penalty of 0 specifies local sequence alignment. The Gap extension and Mismatch scores are the penalties against gap extension and mismatch. For sequences with no base-calling errors, there should be no gaps or mismatches, so gaps and mismatches should be penalized severely (hence -6 and -6, respectively, by default). Base-calling errors increase the chance of gaps and mismatches in local sequence alignment, hence the reduced penalty for both (-2 and -3, respectively, by default).
      iii. Decision parameters: These are parameters for heuristic string-matching algorithms that increase the speed of computation. You may leave them as default.
   c. Now click the Go button. The contig assembly will be performed automatically.
   d. For sequence fragments that have been merged into one, the new sequence name will be in the form of SeqName1+ SeqName2+ SeqName3- ..., meaning that EST1 has its 3'-end overlapping the 5'-end of EST2, which in turn has its 3'-end overlapping the 5'-end of the complementary strand of EST3, and so on (the - sign indicates the complementary strand; + means the original input sequence). If one sequence is entirely embedded in another sequence, the former is omitted from the new name. In the result from the sample file, CaNCBI.FAS, H4 is entirely embedded inside A10, so H4 does not appear in the name of the assembled contig.
   e. Keep in mind that an assembled contig is a hypothesized neighbor relationship among the ESTs and may not be correct. Look at the detailed output instead of believing the result blindly.
8. A few miscellaneous items:
   a. The column width can be resized by the user.
   b. The last column is for custom annotation.
   c. Clicking the top-left cell highlights the entire sheet. Clicking a column or row heading highlights the entire column or row, respectively.
9. A number of functions are accessible from the popup menu:
   a. If a sequence tag (e.g., Seq1 above) does not have a sequence entry but you obtained the sequence later and wish to add it, left-click the sequence name (in the first column) to highlight the entire row and then right-click to access the popup menu. Click 'Change sequence' to add the new sequence information.
   b. To append an entry, right-click to access the popup menu and then click 'Append a row'.
   c. To copy an entire sheet, a column, a row or a cell (e.g., to EXCEL), first select it, then right-click to access the popup menu and click Copy.

References:

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215:
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:
Ayoubi P, Jin X, Leite S, Liu X, Martajaja J, Abduraham A, Wan Q, Yan W, Misawa E, Prade RA (2002) PipeOnline 2.0: automated EST processing and functional data sorting. Nucleic Acids Res 30:
Bateman A, Birney E, Durbin R, Eddy SR, Finn RD, Sonnhammer EL (1999) Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res 27:
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR (2004) The Pfam protein families database. Nucleic Acids Res 32:D
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78-94
Davila AM, Lorenzini DM, Mendes PN, Satake TS, Sousa GR, Campos LM, Mazzoni CJ, Wagner G, Pires PF, Grisard EC, Cavalcanti MC, Campos ML (2005) GARSA: genomic analysis resources for sequence annotation. Bioinformatics 21:
Frishman D, Mironov A, Mewes HW, Gelfand M (1998) Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26:
Hayes WS, Borodovsky M (1998) How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 8:
Huang XQ (1992) A Contig Assembly Program Based on Sensitive Detection of Fragment Overlaps. Genomics 14:18-25
Koski LB, Gray MW, Lang BF, Burger G (2005) AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics 6:151

Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res 32:D
Mao C, Cushman JC, May GD, Weller JW (2003) ESTAP--an automated system for the analysis of EST data. Bioinformatics 19:
Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Marchler GH, Mullokandov M, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Yamashita RA, Yin JJ, Zhang D, Bryant SH (2005) CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33:D
Martin DM, Berriman M, Barton GJ (2004) GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics 5:178
Meyer IM, Durbin R (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Res 32:
Paquola AC, Nishyiama MY, Jr., Reis EM, da Silva AM, Verjovski-Almeida S (2003) ESTWeb: bioinformatics services for EST sequencing projects. Bioinformatics 19:
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:
Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41
Tech M, Merkl R (2003) YACOP: Enhanced gene prediction obtained by a combination of existing methods. In Silico Biol 3:
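
Supplementary sketch: translation in six frames. Searching a protein database such as CDD requires translating each EST, in three frames on the input strand and, for a second pass, in three frames on the complementary strand (steps 3.b and 3.c). The following is a minimal illustration of that idea, not Goldminer's implementation. It assumes the Standard translation table, and the function names and the frame labels (+1 to -3) are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; an incomplete trailing codon is dropped
    and codons with ambiguous bases become X."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return the six translations keyed by frame label (+1..+3, -1..-3)."""
    seq = seq.upper().replace("U", "T")  # accept RNA-style input as well
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Running the three "+" frames first and the three "-" frames only for sequences without a match mirrors the two-pass procedure in step 3, and it halves the number of remote queries for sequences that match on the first pass.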


More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance? Optimization 1 Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance? Where to begin? 2 Sequence Databases Swiss-prot MSDB, NCBI nr dbest Species specific ORFS

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

Yale Pseudogene Analysis as part of GENCODE Project

Yale Pseudogene Analysis as part of GENCODE Project Sanger Center 2009.01.20, 11:20-11:40 Mark B Gerstein Yale Illustra(on from Gerstein & Zheng (2006). Sci Am. (c) Mark Gerstein, 2002, (c) Yale, 1 1Lectures.GersteinLab.org 2007bioinfo.mbb.yale.edu Yale

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

Algorithms in Bioinformatics I, WS06/07, C.Dieterich 47. This lecture is based on the following, which are all recommended reading:

Algorithms in Bioinformatics I, WS06/07, C.Dieterich 47. This lecture is based on the following, which are all recommended reading: Algorithms in Bioinformatics I, WS06/07, C.Dieterich 47 5 BLAST and FASTA This lecture is based on the following, which are all recommended reading: D.J. Lipman and W.R. Pearson, Rapid and Sensitive Protein

More information

Geospiza s Finch-Server: A Complete Data Management System for DNA Sequencing

Geospiza s Finch-Server: A Complete Data Management System for DNA Sequencing KOO10 5/31/04 12:17 PM Page 131 10 Geospiza s Finch-Server: A Complete Data Management System for DNA Sequencing Sandra Porter, Joe Slagel, and Todd Smith Geospiza, Inc., Seattle, WA Introduction The increased

More information
