BIOINFORMATICS TUTORIAL


Bio 242
BIOINFORMATICS TUTORIAL
Bio 242 α Amylase Lab

Sequence Searches: BLAST
Sequence Alignment: Clustal Omega
3D Structure & 3D Alignments

DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.
A PDF of this document is available on the Bio 242 website.

Acknowledgements

The Bates Bioinformatics Tutorial was originally developed as part of the Collaborative Technologies Development project. David Asanuma ('09) created the site under the guidance of Nancy Kleckner, Associate Professor of Biology, and Michael Hanrahan, Assistant Director of Research and Curricular Computing. Revision of the content is performed annually by Greg Anderson and Carolyn Lawson to keep the document up to date with the website.

Page 1 of 26 Bioinformatics Tutorial (rev )

Bioinformatics Tutorial

Bioinformatics is the acquisition, storage, arrangement, identification, analysis, and communication of information related to biology. The term was coined in 1990 with the use of computers in DNA sequence analysis. Think of it as the theoretical branch of molecular biology, much as theoretical physics relates to the general field of physics.

Now that you have obtained information about some of the chemical properties of α amylase, in this exercise you will compare the molecular structure of the enzyme among the three (or more!) species. The tutorial will guide you through finding the gene sequences using both the Entrez search and BLAST tools, and then comparing them using the Clustal Omega tool.

You will be using the online DNA and protein sequence databases that are the core of bioinformatics. There are two general types of sequence databases:

- Primary databases contain experimental results in an accessible format, but the sequences are not a population consensus. DDBJ, EMBL, and GenBank are primary databases.
- Secondary databases are curated to reflect consensus sequences from multiple experiments and usually use the primary databases as their sources.

Abbreviations

DDBJ: DNA Databank of Japan
EMBL: European Molecular Biology Laboratory
NCBI: National Center for Biotechnology Information
BLAST: Basic Local Alignment Search Tool

The standard sequence format is called FASTA. A FASTA record starts with a single definition line (beginning with ">") that identifies the sequence, followed by the sequence itself. The full database record for a sequence (the default GenBank or GenPept view) opens with header fields that include:

- a unique identification number (the accession number)
- the version number of the sequence
- the length of the sequence
- molecule type (DNA or mRNA)
- taxonomic division (for instance, INV = invertebrate)
- last release date
- source organism

Every coding sequence also has a unique protein number assigned to it, starting with AA. Reference sequences (which undergo continuing curation) are the most complete and up to date and always start with NT for DNA, NM for mRNA, or NP for protein. Hint: these are the ones you want to use if possible.

Sequence Search

Introduction: Entrez

Entrez is a data retrieval system developed by the National Center for Biotechnology Information (NCBI) that provides integrated access to a wide range of data domains, including literature, nucleotide and protein sequences, complete genomes, three-dimensional structures, and more. Entrez includes powerful search features that retrieve not only the exact search results but also related records within a data domain that might not be retrieved otherwise, and associated records across data domains. These features enable us to gather previously disparate pieces of an information puzzle for a topic of interest.
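The reference-sequence prefixes described above (NT, NM, NP) can be checked mechanically. The following is a minimal sketch, assuming only the three prefixes named in this tutorial; the function name and the first example accession are hypothetical and chosen for illustration.

```python
# Prefix-to-molecule mapping taken from the tutorial text above.
# The function name and the "NP_000001" accession are hypothetical.
REFSEQ_PREFIXES = {"NT": "DNA", "NM": "mRNA", "NP": "protein"}


def refseq_molecule_type(accession):
    """Return the molecule type for a reference-sequence accession, or None."""
    prefix = accession.strip().upper()[:2]
    return REFSEQ_PREFIXES.get(prefix)


print(refseq_molecule_type("NP_000001"))  # hypothetical reference protein
print(refseq_molecule_type("AAA50161"))   # corn record used later; not a reference sequence
```

A result of None simply means the accession does not carry one of the three prefixes listed in this tutorial; it does not mean the record is unusable.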

Effective and powerful use of Entrez requires an understanding of the available data domains, the variety of data sources and types within each domain, and Entrez's advanced search features. This tutorial uses corn (Zea mays) alpha amylase to demonstrate the wide variety of information that we can rapidly gather for a single gene. The numbers noted in the search results will of course change over time as the databases grow. The same techniques shown here can be used for any topic of interest.

The search goals are to:

- separate the wheat from the chaff by identifying a representative, well-annotated mRNA or protein sequence record
- retrieve associated literature
- identify conserved domains within the protein
- identify similar proteins
- find a resolved three-dimensional structure for the protein or, in its absence, identify structures with homologous sequence
- perform VAST alignments of 3D structures of plant and animal amylases to visualize where similarities and differences occur

Let's get started! Go to the NCBI website by entering the URL in the address field of your browser. After accessing the NCBI website, you can search for corn alpha amylase sequences in either the nucleotide or protein databases by selecting one or the other from the Database dropdown menu. Other points of interest on the NCBI home page are the PubMed link, which allows you to search for journal articles on the structure and function of alpha amylases, and the BLAST link, which allows you to search for nucleotide or protein sequences with similarity to your sequence of interest.

For now, make sure you are at the NCBI home page (click on the NCBI icon in the upper left of the NCBI page to be sure), and choose "Protein" from the search databases dropdown menu. Type "Zea mays alpha amylase" in the search box. These selections are illustrated in Figure 1 (next page). Click "Search" to proceed.
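The same search can also be expressed as a web address. This tutorial uses the NCBI website, so the sketch below is an aside: it assumes NCBI's separate programmatic interface (E-utilities, the esearch endpoint) and only builds the request URL without sending it. The function name and the retmax value are illustrative choices.

```python
from urllib.parse import urlencode

# Address of the E-utilities search endpoint (an assumption for this sketch;
# the tutorial itself performs the search through the NCBI home page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_search_url(database, term, max_results=20):
    """Build (but do not send) a search URL for the given database and term."""
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_entrez_search_url("protein", "Zea mays alpha amylase")
print(url)
```

Here "protein" plays the role of the Database dropdown choice and the term is the text typed into the search box.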

Figure 1. NCBI home page, from which Entrez searches can be performed.

Search results: Fig. 2 shows a typical results page for this search. Yours should look similar, but might be a little different depending on what new information has arrived since the screenshot was made. The sequence of interest has the accession number (identifier) AAA It is highlighted in the screenshot. How do you know this is the one you want? Click on the accession number and study the page that comes up. It should be identical to the one shown in Fig. 3.

Figure 2. Typical search results for protein sequences.

Figure 3. Typical record for an accession number.

In Figure 3, take note of the DEFINITION, SOURCE and ORGANISM, the AUTHORS of the sequence, and the TITLE and JOURNAL name of the article published about it. If you don't already have this article, you can retrieve it by clicking on the PUBMED number (in the live window) and printing the PDF version. Then find your way back to the results page.

Skip down through the FEATURES and note the ORIGIN section, which gives you the amino acid sequence of your protein. This is the sequence we'll use in a BLAST search, but the default format is not particularly helpful. All further processing of the sequence information requires that the sequence be in FASTA format.

FASTA Format: Conversion of the sequence to a universal format

Scroll to the top of your results page and note the Display dropdown box with "GenPept" selected. The GenPept format is the default setting and gives you all of the information we discussed above. However, the FASTA format is more useful for BLAST searches and alignments of sequences. Select FASTA from the menu as illustrated in Fig. 4. Your results should look like the screenshot in Fig. 5. You now see less information: a single line of identifying information (the accession number followed by a brief descriptor), and then the amino acid sequence.

Figure 4. Click FASTA to convert the sequence to the proper format for further searching.

Figure 5. FASTA conversion results.

In the live window, highlight and copy the complete amino acid sequence along with the identifying information (>gi ). From your Start menu, bring up NotePad and paste the FASTA sequence into the window. You will use this sequence in a BLAST search to identify other amino acid sequences in the NCBI databases with similarity to your sequence. Note that many of the relevant analysis tools that can use this sequence information are linked down the right side of the NCBI page. Once you are comfortable using these tools, you can work more efficiently. Minimize NotePad to return to the NCBI website.
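To make the FASTA layout concrete, here is a minimal sketch of how a text file like your NotePad file could be read: each record is an identifying line starting with ">" followed by one or more sequence lines. The two records in the example are made up for illustration and are not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new identifying line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical records, for illustration only.
example = """>seq1 hypothetical protein A
MKTAYIAK
QRQISFVK

>seq2 hypothetical protein B
MGSSHHHH
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

The sketch shows why the identifying line must be copied along with the sequence: without the ">" line, a program has no way to tell where one record ends and the next begins.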

Protein BLAST

Introduction

To access the BLAST page, in your live window, click on the NCBI icon in the upper left of the page (this takes you to the home page). Click on BLAST in the Popular Resources menu. Carefully read through the list of programs available under "Basic BLAST" (Fig. 6) and what they can do for you (Table 1) before proceeding.

Figure 6. Basic BLAST search options.

Selecting a BLAST Program

The "Basic BLAST" menu allows you to do either nucleotide or protein BLAST searches of various types. Because our sequence is a protein sequence, we will do a protein-protein BLAST (blastp). Click on this option.

Table 1. Explanation of BLAST program functions for the rest of us.

BLAST PROGRAM: Further details
nucleotide blast (or blastn): Compares a nucleotide query sequence against a nucleotide sequence database.
protein blast (or blastp): Compares an amino acid query sequence against a protein sequence database.
blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx: Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST web page because it is computationally intensive.
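Table 1 boils down to a lookup on two questions: what kind of sequence is the query, and what kind of database is searched. A minimal sketch of that decision follows; the function and argument names are hypothetical, and the mapping is taken from Table 1.

```python
# (query type, database type) -> BLAST program, following Table 1.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program; translate_both selects tblastx."""
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]


# The corn alpha amylase query is a protein searched against a protein database.
print(choose_blast_program("protein", "protein"))
```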

BLASTP Search

Paste your copied FASTA sequence into the text box under "Enter Query Sequence" (Fig. 7). Make sure the "Non-redundant protein sequences (nr)" database is selected in the Database dropdown menu under "Choose Search Set". Click on BLAST. You may see a window indicating that your query has been added to the BLAST queue. You might have to wait several seconds for your results, during which time you will see a screen like that in Fig. 8. Be patient; remember that your sequence is being compared to thousands of others!

Figure 7. The BLAST search screen.

Figure 8. The initial screen during a BLAST search.

BLASTP Results Part 1

The blastp results page (Fig. 9) shows around 100 "hits", or other protein sequences showing at least some similarity to corn alpha amylase. The illustration with the red bars is a diagrammatic representation of how your sequence (the top red bar) lines up with other sequences in the database along the primary structure of the protein (from 0 to over 400 amino acids). Note that some of the sequences lack the amino terminus of your corn alpha amylase sequence.

Figure 9. BLAST summary of related sequences. The lines show the relative alignment of the hit sequences with the query sequence.

Once your results appear, scroll down past the red diagram and you will see a list of accession numbers and descriptors for sequences in order of decreasing similarity to your sequence (Fig. 10). In fact, the first item in the list is (or should be) your sequence (check the accession number to be sure). The two scores at the right (Ident and E value) indicate the degree of similarity. Both are defined in the glossary of terms in this tutorial. You can click on any of these sequences to go to the GenPept page that describes it.

Figure 10. Descriptions of the 100 sequences most closely related to the query sequence.

For now, scroll down to the "Alignment" section of the results (Fig. 11) to see the actual amino acid sequences aligned against yours (illustrated in the screenshot below). Note the amino acid identities to get a measure of how similar the sequences are. The first should be 100% since it is the identical sequence. As you scroll down through the next several sequences, though, the percent identity should get smaller.

Figure 11. Sequence alignment information for the most closely related protein sequences. Oryza is the genus of rice.
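The E value in the hit list is tied to the bit score, K, and lambda entries in the glossary. As a rough sketch, the standard BLAST statistics convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2, and the expected number of chance hits is E = m * n * 2^(-S'), where m and n are the query and database lengths. The numbers used below are arbitrary illustrative values, not parameters from a real search.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_length, database_length):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_length * database_length * 2.0 ** (-bits)


# Arbitrary illustrative inputs (assumptions, not values from a real search).
bits = bit_score(raw_score=100, lam=0.3, k=0.1)
print(round(bits, 1))
print(expect_value(bits, query_length=400, database_length=1_000_000))
```

The practical reading matches the glossary: a higher bit score gives a smaller E value, and the same bit score is less significant when the database is larger.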

Your immediate goal using BLASTP is to locate sequences for the animal alpha amylases used in your experiments. Scroll back up slowly through the list of "hits". What species do you see? If it is not clear from the brief description, click on the accession number to get the GenPept descriptions. In fact, what you will probably find are mostly sequences from plants, some bacteria, and a few insects. Click on the "Distance tree of results" button at the top of the list of hits (see Fig. 11) to examine which organisms are represented in the list.

If human and oyster (or other bivalve species [Order Pelecypoda]) salivary alpha amylase are not found in this list of BLAST hits, how else might you find those sequences to compare to corn? To broaden your analysis a bit, also search for the sequence for barley (Hordeum vulgare). Design and carry out a strategy to find them, and once you do, copy the FASTA-formatted sequences to the same NotePad file your other sequence is in. Make sure to leave one blank line between the sequences.

How many species?

For this analysis, you may find that using more than just the three species we used in lab is very helpful for seeing larger patterns when comparing plant and animal amylases. We strongly recommend adding at least one more plant (barley is good). Adding a third plant and a third animal amylase would be even better for bringing out the patterns of similarity and difference in amino acid sequences in plants and animals.

ACCESSION NUMBER CHECK

To facilitate a broader comparison of alpha amylase among plants and animals, you should now have four accession numbers: one each for corn (Zea mays), humans (Homo sapiens), Pacific oyster (Crassostrea gigas), and barley (Hordeum vulgare). There are now sequences for amylase from two other clam genera in the databases (Cerastoderma and Corbicula), which could be used as alternatives to the Pacific oyster. Record those accession numbers below and then check with a lab instructor or TA to make sure that you have appropriate sequences before you proceed.

SPECIES: ACCESSION NUMBER
corn (Zea mays): AAA50161
barley (Hordeum vulgare):
humans (Homo sapiens):
Pacific oyster (Crassostrea gigas):
Other animal:
Other plant:

Clustal Omega: A DNA and Protein Multiple Sequence Alignment Tool

URL:

Introduction

Once you have found all four usable sequences, you will want to align them to see how similar they are. We will use the program Clustal Omega to do such an alignment. Be sure to read the information below that describes Clustal Omega and the underlying basis for sequence comparisons. When you are finished, enter the URL shown above to bring up the site that hosts the Clustal Omega program.

Clustal Omega is a general-purpose global multiple sequence alignment program for DNA or proteins, for use when you want to align 3 or more sequences (for aligning 2 sequences, use a pairwise sequence alignment tool instead). Clustal Omega produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities, and differences can be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms. Alignment scores are returned as a Percent Identity Matrix.

Multiple alignments of protein sequences are important tools in studying sequences. The basic information they provide is the identification of conserved sequence regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families.

Sequences can be aligned across their entire length (global alignment) or only in certain regions (local alignment). This is true for pairwise and multiple alignments. Global alignments need to use gaps (representing insertions/deletions), while local alignments can avoid them, aligning regions between gaps. The alignment is progressive and considers the sequence redundancy. Trees can also be calculated from multiple alignments. The program has some adjustable parameters with reasonable defaults.
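To see why a global alignment has to introduce gaps, it can help to look at the classic textbook method for aligning two sequences end to end, the Needleman-Wunsch dynamic programming algorithm. The sketch below is not what Clustal Omega does internally; it is a simplified stand-in that returns only the best global score, and the match, mismatch, and gap values are arbitrary assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global (end-to-end) alignment score of a and b (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # identical sequences
print(global_alignment_score("ACGT", "ACT"))         # one residue must face a gap
```

Because every residue of both sequences must be placed, sequences of unequal length can only be aligned globally by paying for at least one gap, which is the trade-off described above.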
Submission Form

You will use the default settings for all menus that appear at the top of the Submission Form (Fig. 12), so don't change these. Copy all of your sequences, in FASTA format including their first descriptor line, into the open frame on the Submission Form; make sure to leave one blank line between them (Fig. 12). Clustal Omega will attempt to align these amino acid sequences based on their similarities. Click Submit. Your results might take a few seconds.

Figure 12. The Clustal Omega submission form.

Alignment Results

The first screen you'll see shows the alignments of your sequences (Fig. 13a). It will be helpful to click on Show Colors to more easily see locations of similarity and difference among the sequences based on the chemical nature of the amino acid residues.

RED (residues AVFPMILW) = Small (small + hydrophobic, including aromatic except Y)
BLUE (residues DE) = Acidic
MAGENTA (residues RK) = Basic (except H)
GREEN (residues STYHCNGQ) = Hydroxyl + sulfhydryl + amine + G
GREY (other residues) = Unusual amino/imino acids, etc.

The displayed rows (except the last one, with the consensus symbols *, :, and .) are the aligned amino acid sequences; the last one is an indication of consensus, or which amino acids are conserved across the compared sequences. By default, an alignment will display the following consensus symbols denoting the degree of conservation observed in each column. Conserved means the amino acid is replaced by one having similar chemical properties.

Consensus Symbols:

" * " means that the residues, or nucleotides, in that column are identical in all sequences in the alignment.
" : " means that conserved substitutions have been observed: amino acids having strongly similar properties.
" . " means that semi-conserved substitutions are observed, i.e., amino acids having similar shape but otherwise only weakly similar properties.

Click on the Results Summary button at the top of the page. A table is returned that allows you to select multiple summaries of information about the analysis. The one you'll want is the last one, the Percent Identity Matrix; this returns the alignment scores for the pairwise comparisons of the sequences you submitted. The matrix lists the sequences by accession number by row and column (we added the red labels). The score at the intersection of a row and column is the alignment score for that pair. To help you understand the alignment score, review the description below from the Clustal Omega site FAQs.

How are pairwise alignment scores calculated?

A pairwise score is calculated for every pair of sequences that are to be aligned. These scores are presented as a matrix in the results. Pairwise scores are calculated as the number of identities (same amino acid residue) in the best alignment divided by the number of residues compared (gap positions are excluded). Thus, they tell us approximately what percentage of the two sequences have functional identity, or similarity.
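The FAQ description above translates directly into a short calculation: count the aligned positions where neither sequence has a gap, count how many of those are identical, and divide. A minimal sketch, assuming the two rows have already been aligned and use "-" for gaps; the example rows are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned rows, excluding gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap positions are excluded from the comparison
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned rows: 5 positions compared, 4 identical.
print(percent_identity("MK-TAY", "MKLTGY"))  # 80.0
```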

Figure 13. (A) A portion of a multiple sequence alignment. The number at the end of the row indicates the amino acid number in the last position of that row relative to the entire molecule. (B) Results Summary options. (C) Matrix of alignment scores.
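The color key and the " * " consensus symbol shown in the alignment can be reproduced with a few lines of code. This sketch uses the residue groups listed in this tutorial and marks only fully identical columns; the " : " and " . " symbols depend on substitution groups that are not listed here, so they are deliberately left out. The function names and the example rows are hypothetical.

```python
# Residue color groups as listed in this tutorial.
COLOR_GROUPS = {
    "RED": "AVFPMILW",
    "BLUE": "DE",
    "MAGENTA": "RK",
    "GREEN": "STYHCNGQ",
}


def residue_color(residue):
    """Return the display color for a single-letter amino acid code."""
    residue = residue.upper()
    for color, members in COLOR_GROUPS.items():
        if residue in members:
            return color
    return "GREY"  # any other residue or symbol


def identity_line(aligned_sequences):
    """Mark columns that are identical in every sequence with '*'."""
    marks = []
    for column in zip(*aligned_sequences):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


rows = ["MKT-", "MRT-", "MKTA"]  # made-up aligned rows
print(repr(identity_line(rows)))
print(residue_color("W"), residue_color("D"), residue_color("K"))
```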

Be sure to copy the alignment output and matrix score results to your NotePad file. Look through the entire sequence for areas of similarity. How much is there? Can you guess why clam/oyster and human sequences did not appear in the BLAST search with corn alpha amylase? Compare each pair of sequences to see which ones are most similar. You might need to rerun Clustal Omega with the different pairs to most efficiently determine this. Are there any areas of the sequence that you expect to be more similar between species than others (e.g., the active site)? If you don't know where the important functional domains are, you should run a search of the literature in PubMed to find out. Simply click on the NCBI icon on the active web page and choose PubMed.

Protein Structures

Conserved Domain Database (CDD)

Since you found that there are few similarities in the amino acid sequences for alpha amylase in the three organisms, how do we account for them being functionally similar? We need to take one more step and examine the three-dimensional structure of the enzymes. You can use tools on the NCBI website for this as well.

Open the NCBI main page (Fig. 14). Click on Domains and Structure on the left-hand menu bar, and then select Conserved Domain Database (CDD) under the resource tab.

Figure 14. NCBI website homepage.

On the CDD database page, click on "Search Methods" (Fig. 15).

Figure 15. Conserved Domain Database entry page.

Type (or paste) the accession number for human salivary alpha amylase into the big center search window (Fig. 16). Select the CDD database from the pull-down menu. Click on the SUBMIT button.

Figure 16. Conserved domain query submission page.

The results window should confirm that this sequence is for alpha amylase. Click on SEARCH FOR SIMILAR DOMAIN ARCHITECTURE (Fig. 17).

Figure 17. Results page from CDD query. Note that the graphic identifies the active, catalytic, and calcium-binding site regions. Select the pfam00128 accession number to continue.

In the window displaying the results for the pfam00128 group, expand the "[+] Structure" menu, which is collapsed by default. Then click on Structure View (Fig. 18a). If you are using your own computer, click on Download Cn3D to install the viewing program and follow your platform's usual instructions for program installation. On Bates laptops, the program should open the structure file automatically.

Figure 18 (A, B). Accessing the Cn3D display program.

The Cn3D application will open, enabling you to see the structure of your protein (Fig. 19). You can rotate the 3D structure by dragging it with your mouse. The catalytic active region is shown in red.

Figure 19. 3D rendering of the human salivary amylase molecule.

The color key matches the amino acid sequence information (Fig. 20) in the window that appears below the 3D representation of your protein. The first row is the query sequence. If you select a portion of the sequence by dragging the mouse, it will be highlighted in yellow on the model. The same works for individual residues.

Figure 20. Amino acid sequences of pfam00128 amylases. The first row is the query sequence.

Change the display format of Cn3D by selecting Style > Rendering Shortcuts > Worms (Fig. 21). Now you should be able to rotate the structure to clearly see the α/β barrel site in the center.

Figure 21. Commands to change the rendering style of the 3D model.

Protein Structures: Comparisons

Now that you know what the catalytic site looks like, you can search for the 3D structure of the enzymes used in this lab and see how they compare.

1. Close the CDD windows and return to the main NCBI website by clicking the NCBI logo in the upper left corner.
2. Click on STRUCTURE at the top of the page.
3. At the Structure Search Entrez, enter Human Salivary Amylase (1SMD) and click GO.
4. Click on VIEW 3D STRUCTURE.
5. Rotate the model of the enzyme. Can you see the characteristic catalytic site? This site does not show the catalytic site in red, but you can highlight a section of the sequence in the lower window, and it will also be highlighted on the model.
6. Minimize the 3D model, and go back one page. Unfortunately, there are no structure models for either corn or clams in the database, but there is one for barley. Before viewing the structure of the barley enzyme, return to the Clustal Omega page and compare the barley and corn sequences to determine whether this substitute is valid. Enter barley alpha amylase (1RPK) and click GO. Click on VIEW 3D STRUCTURE.
7. Rotate the model of the enzyme. Can you see the characteristic catalytic site? Maximize the window with the human enzyme model and compare the two side by side.

Comparing Structures with VAST (now this IS cool!!)

While Cn3D does fine with single structures, it's even better suited to displaying structure alignments of multiple proteins, i.e., it enables you to superimpose 3D structures on top of each other so that differences in structure are readily apparent. NCBI creates and maintains a database of such alignments, called VAST (Vector Alignment Search Tool), for all pairs of proteins from MMDB whose structures have some similar core regions. The VAST tool does two things for each related pair: it calculates an optimal 3D superimposition for the conserved core, and it constructs a sequence alignment based on the correlation of the 3D structures.

1. From the NCBI home page, choose the Structure database.
2. Search for human salivary alpha amylase. Somewhere on the hit list should be one with a PDB ID = 1SMD.
3. When you select 1SMD, you should get the Structure Summary page. To see the 3D structure, click on the view structure button.
4. To compare this structure with other molecules, click the VAST+ button on the right. You now have a list of similar structures. Find the structure for barley alpha amylase (1AMY). Hint: enter 1AMY for the PDB ID and click the Search within Results button.
5. Expand the entry by clicking on the + to the left of 1AMY. Click on the 3D view button to display the aligned 3D structures.

6. The default coloring for structure alignments in Cn3D uses magenta and blue for the regions aligned by the VAST algorithm, where residues aligned in 3D space are magenta and different residues are blue; unaligned regions are colored gray. Note that because of the way VAST works, the aligned regions tend to correspond to individual secondary structure elements (helices and strands) or groups of consecutive ones, while the loops outside the core vary in length and orientation and are often left unaligned.
7. There are some important differences between structure-based alignments in Cn3D and sequence alignments from common algorithms like BLAST or Clustal Omega, both in the display and in the underlying alignment data. In a structure alignment (e.g., from VAST), one residue is aligned with another because their alpha carbons are nearby in space, not because of the residue identity.
8. Try aligning a molecule that is very similar to human alpha amylase: porcine alpha amylase. Search for the PDB ID = 1PIF instead of the barley structure.
9. Alteromonas haloplanctis, the cold-adapted marine organism that Feller et al. wrote about, is in the VAST results too. Search for PDB ID = 1AQH.
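Step 7 says that a structure alignment pairs residues whose alpha carbons are close in space. A minimal sketch of that idea follows: given alpha-carbon coordinates for residue pairs in two structures that have already been superimposed, it computes the distance for each pair and the overall root-mean-square deviation (RMSD). The coordinates are invented for illustration, and this is not the VAST algorithm itself, which also has to find the superposition.

```python
import math


def ca_distance(p, q):
    """Euclidean distance between two alpha-carbon positions (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired alpha-carbon positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(ca_distance(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Invented coordinates (in angstroms) for three paired residues.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]

print(rmsd(structure_a, structure_b))
```

A small RMSD over the aligned core means the backbones follow nearly the same path in space, even when the amino acids at those positions differ.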


Glossary

Alignment: The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.

Algorithm: A fixed procedure embodied in a computer program.

Bioinformatics: The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology.

Bit score: The value S' is derived from the raw alignment score S in which the statistical properties of the scoring system used have been taken into account. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.

BLAST: Basic Local Alignment Search Tool (Altschul et al.). A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments to a query. The initial search is done for a word of length "W" that scores at least "T" when compared to the query using a substitution matrix. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S". The "T" parameter dictates the speed and sensitivity of the search. For additional details, see one of the BLAST tutorials (Query or BLAST) or the narrative guide to BLAST.

BLOSUM: Blocks Substitution Matrix. A substitution matrix in which scores for each position are derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins. Each matrix is tailored to a particular evolutionary distance. In the BLOSUM62 matrix, for example, the alignment from which scores were derived was created using sequences sharing no more than 62% identity. Sequences more identical than 62% are represented by a single sequence in the alignment so as to avoid over-weighting closely related family members. (Henikoff and Henikoff)

Conservation: Changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.

Domain: A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.

DUST: A program for filtering low complexity regions from nucleic acid sequences.

E value: Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.

FASTA: The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k tup" variable, which specifies the size of a "word". (Pearson and Lipman)

Filtering: Also known as Masking. The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. See SEG and DUST.

Gap: A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment.

Global Alignment: The alignment of two nucleic acid or protein sequences over their entire length.

H: H is the relative entropy of the target and background residue frequencies (Karlin and Altschul, 1990). H can be thought of as a measure of the average information (in bits) available per position that distinguishes an alignment from chance. At high values of H, short alignments can be distinguished from chance, whereas at lower H values, a longer alignment may be necessary. (Altschul, 1991)

Homology: Similarity attributed to descent from a common ancestor.

HSP: High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.

Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.

K: A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for search space size. The value K is used in converting a raw score (S) to a bit score (S').

Lambda: A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for the scoring system. The value lambda is used in converting a raw score (S) to a bit score (S').

Local Alignment: The alignment of some portion of two nucleic acid or protein sequences.

Low Complexity Region (LCR): Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues. The SEG program is used to mask or filter LCRs in amino acid queries. The DUST program is used to mask or filter LCRs in nucleic acid queries.

Masking: Also known as Filtering. The removal of repeated or low complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence.

Motif: A short conserved region in a protein sequence. Motifs are frequently highly conserved parts of domains.

Multiple Sequence Alignment: An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column. Clustal W is one of the most widely used multiple sequence alignment programs.

Optimal Alignment: An alignment of two sequences with the highest possible score.

Orthologous: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.

P value: The probability of an alignment occurring with the score in question or better. The P value is calculated by relating the observed alignment score, S, to the expected distribution of HSP scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P values will be those close to 0. P values and E values are different ways of representing the significance of the alignment.

PAM (Percent Accepted Mutation): A unit introduced by Dayhoff et al. to quantify the amount of evolutionary change in a protein sequence. One PAM unit is the amount of evolution that will change, on average, 1% of amino acids in a protein sequence. A PAM(x) substitution matrix is a look-up table in which scores for each amino acid substitution have been calculated based on the frequency of that substitution

26 in closely related proteins that have experienced a certain amount (x) of evolutionary divergence. Paralogous Profile Proteomics Homologous sequences within a single species that arose by gene duplication. A table that lists the frequencies of each amino acid in each position of protein sequence. Frequencies are calculated from multiple alignments of sequences containing a domain of interest. See also PSSM. The systematic analysis of protein expression in normal and diseased tissues that involves the separation, identification, and characterization of all of the proteins in an organism. PSI BLAST Position Specific Iterative BLAST An iterative search using the BLAST algorithm. A profile is built after the initial search, which is then used in subsequent searches. The process may be repeated, if desired with new sequences found in each cycle used to refine the profile. Details can be found in this discussion of PSI BLAST. (Altschul et al.) PSSM = Position specific scoring matrix The PSSM gives the log odds score for finding a particular matching amino acid in a target sequence. Query The input sequence (or other type of search term) with which all of the entries in a database are to be compared. VAST Vector Alignment Search Tool. A tool that enables superimposition of multiple 3d structures. The VAST tool does two things for each related pair: it calculates an optimal 3 D superimposition for the conserved core, and constructs a sequence alignment based on the correlation of the 3 D structures. Page 26 of 26 Bioinformatics Tutorial (rev )
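The K, Lambda, E value, and P value entries above are connected by a few short formulas from Karlin-Altschul statistics. The following Python sketch shows one way to compute them. The function names are made up for illustration, and the parameter values in the usage lines (lambda = 0.267, K = 0.041, and the sequence lengths) are assumed example numbers, not values taken from this tutorial; BLAST reports the actual lambda and K at the bottom of each search result.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score S to a bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, lam, k, query_len, db_len):
    """Expected number of chance alignments scoring at least S.

    E = m * n * 2**(-S'), where m is the query length and n is the
    total database length (the "search space" is m * n).
    """
    return query_len * db_len * 2.0 ** (-bit_score(raw_score, lam, k))


def p_value(e):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    lam, k = 0.267, 0.041
    query_len, db_len = 500, 1_000_000
    for raw in (40, 80, 120):
        e = e_value(raw, lam, k, query_len, db_len)
        print(raw, round(bit_score(raw, lam, k), 1), e, p_value(e))
```

Running the script shows the behavior described in the glossary: as the raw score rises, the E value falls, and for small E values the P value is nearly identical to the E value.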

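The Gap and Identity entries above can be made concrete by scoring an alignment that has already been built. The sketch below is a minimal illustration, not the scoring used by BLAST or Clustal Omega: the match, mismatch, and gap values are assumed numbers, the first position of a gap is charged the opening penalty and each additional position the extension penalty (some programs charge both for the first position), and identity is taken as identical columns divided by alignment length, which is only one of several conventions.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score two aligned sequences of equal length; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap_side = None  # which sequence had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Continuing a gap in the same sequence is cheaper than opening one.
            score += gap_extend if side == prev_gap_side else gap_open
            prev_gap_side = side
        else:
            score += match if x == y else mismatch
            prev_gap_side = None
    return score


def fraction_identity(a, b):
    """Identical, non-gap columns divided by the alignment length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return identical / len(a)


if __name__ == "__main__":
    seq1 = "GATTACA"
    seq2 = "GA--ACA"
    print(score_alignment(seq1, seq2))    # 5 matches, 1 gap of length 2
    print(fraction_identity(seq1, seq2))
```

Because opening a gap costs more than extending one, a single gap of length two scores better than two separate gaps of length one, which is why alignment programs tend to group insertions and deletions together.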
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

Guide for Bioinformatics Project Module 3

Guide for Bioinformatics Project Module 3 Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

BLAST. Anders Gorm Pedersen & Rasmus Wernersson BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

GenBank, Entrez, & FASTA

GenBank, Entrez, & FASTA GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,

More information

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues

More information

Biological Databases and Protein Sequence Analysis

Biological Databases and Protein Sequence Analysis Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

Clone Manager. Getting Started

Clone Manager. Getting Started Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software

More information

Structure Tools and Visualization

Structure Tools and Visualization Structure Tools and Visualization Gary Van Domselaar University of Alberta gary.vandomselaar@ualberta.ca Slides Adapted from Michel Dumontier, Blueprint Initiative 1 Visualization & Communication Visualization

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

DNA Sequencing Overview

DNA Sequencing Overview DNA Sequencing Overview DNA sequencing involves the determination of the sequence of nucleotides in a sample of DNA. It is presently conducted using a modified PCR reaction where both normal and labeled

More information

A Tutorial in Genetic Sequence Classification Tools and Techniques

A Tutorial in Genetic Sequence Classification Tools and Techniques A Tutorial in Genetic Sequence Classification Tools and Techniques Jake Drew Data Mining CSE 8331 Southern Methodist University jakemdrew@gmail.com www.jakemdrew.com Sequence Characters IUPAC nucleotide

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

Analyzing A DNA Sequence Chromatogram

Analyzing A DNA Sequence Chromatogram LESSON 9 HANDOUT Analyzing A DNA Sequence Chromatogram Student Researcher Background: DNA Analysis and FinchTV DNA sequence data can be used to answer many types of questions. Because DNA sequences differ

More information

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want 1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very

More information

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004 Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence

More information

Searching Nucleotide Databases

Searching Nucleotide Databases Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames

More information

Molecular Databases and Tools

Molecular Databases and Tools NWeHealth, The University of Manchester Molecular Databases and Tools Afternoon Session: NCBI/EBI resources, pairwise alignment, BLAST, multiple sequence alignment and primer finding. Dr. Georgina Moulton

More information

UGENE Quick Start Guide

UGENE Quick Start Guide Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.

More information

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need

More information

Tutorial. Reference Genome Tracks. Sample to Insight. November 27, 2015

Tutorial. Reference Genome Tracks. Sample to Insight. November 27, 2015 Reference Genome Tracks November 27, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com Reference

More information

Linear Sequence Analysis. 3-D Structure Analysis

Linear Sequence Analysis. 3-D Structure Analysis Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic

More information

Section I Using Jmol as a Computer Visualization Tool

Section I Using Jmol as a Computer Visualization Tool Section I Using Jmol as a Computer Visualization Tool Jmol is a free open source molecular visualization program used by students, teachers, professors, and scientists to explore protein structures. Section

More information

Visualization of Phylogenetic Trees and Metadata

Visualization of Phylogenetic Trees and Metadata Visualization of Phylogenetic Trees and Metadata November 27, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

ID of alternative translational initiation events. Description of gene function Reference of NCBI database access and relative literatures

ID of alternative translational initiation events. Description of gene function Reference of NCBI database access and relative literatures Data resource: In this database, 650 alternatively translated variants assigned to a total of 300 genes are contained. These database records of alternative translational initiation have been collected

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Genome Explorer For Comparative Genome Analysis

Genome Explorer For Comparative Genome Analysis Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/ CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction

More information

Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

More information

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Tutorial Module 5 BioMart You will learn about BioMart, a joint project developed and maintained at EBI and OiCR www.biomart.org How to use BioMart to quickly obtain lists of gene information from Ensembl

More information

Discovering Bioinformatics

Discovering Bioinformatics Discovering Bioinformatics Sami Khuri Natascha Khuri Alexander Picker Aidan Budd Sophie Chabanis-Davidson Julia Willingale-Theune English version ELLS European Learning Laboratory for the Life Sciences

More information

Database searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999

Database searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999 Dr Clare Sansom works part time at Birkbeck College, London, and part time as a freelance computer consultant and science writer At Birkbeck she coordinates an innovative graduate-level Advanced Certificate

More information

REUTERS/TIM WIMBORNE SCHOLARONE MANUSCRIPTS COGNOS REPORTS

REUTERS/TIM WIMBORNE SCHOLARONE MANUSCRIPTS COGNOS REPORTS REUTERS/TIM WIMBORNE SCHOLARONE MANUSCRIPTS COGNOS REPORTS 28-APRIL-2015 TABLE OF CONTENTS Select an item in the table of contents to go to that topic in the document. USE GET HELP NOW & FAQS... 1 SYSTEM

More information

Multiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker

Multiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker Multiple Sequence Alignment Hot Topic 5/24/06 Kim Walker Outline Why are Multiple Sequence Alignments useful? What Tools are Available? Brief Introduction to ClustalX Tools to Edit and Add Features to

More information

MEDIAplus administration interface

MEDIAplus administration interface MEDIAplus administration interface 1. MEDIAplus administration interface... 5 2. Basics of MEDIAplus administration... 8 2.1. Domains and administrators... 8 2.2. Programmes, modules and topics... 10 2.3.

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick Reference Guide

Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick Reference Guide Open Crystal Reports From the Windows Start menu choose Programs and then Crystal Reports. Creating a Blank Report Ohio University Computer Services Center August, 2002 Crystal Reports Introduction Quick

More information

Library page. SRS first view. Different types of database in SRS. Standard query form

Library page. SRS first view. Different types of database in SRS. Standard query form SRS & Entrez SRS Sequence Retrieval System Bengt Persson Whatis SRS? Sequence Retrieval System User-friendly interface to databases http://srs.ebi.ac.uk Developed by Thure Etzold and co-workers EMBL/EBI

More information

Module 10: Bioinformatics

Module 10: Bioinformatics Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior

More information

Managing your Joomla! 3 Content Management System (CMS) Website Websites For Small Business

Managing your Joomla! 3 Content Management System (CMS) Website Websites For Small Business 2015 Managing your Joomla! 3 Content Management System (CMS) Website Websites For Small Business This manual will take you through all the areas that you are likely to use in order to maintain, update

More information

Making Visio Diagrams Come Alive with Data

Making Visio Diagrams Come Alive with Data Making Visio Diagrams Come Alive with Data An Information Commons Workshop Making Visio Diagrams Come Alive with Data Page Workshop Why Add Data to A Diagram? Here are comparisons of a flow chart with

More information

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some

More information

AIM Dashboard-User Documentation

AIM Dashboard-User Documentation AIM Dashboard-User Documentation Accessing the Academic Insights Management (AIM) Dashboard Getting Started Navigating the AIM Dashboard Advanced Data Analysis Features Exporting Data Tables into Excel

More information

How To Change Your Site On Drupal Cloud On A Pcode On A Microsoft Powerstone On A Macbook Or Ipad (For Free) On A Freebie (For A Free Download) On An Ipad Or Ipa (For

How To Change Your Site On Drupal Cloud On A Pcode On A Microsoft Powerstone On A Macbook Or Ipad (For Free) On A Freebie (For A Free Download) On An Ipad Or Ipa (For How-to Guide: MIT DLC Drupal Cloud Theme This guide will show you how to take your initial Drupal Cloud site... and turn it into something more like this, using the MIT DLC Drupal Cloud theme. See this

More information

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance? Optimization 1 Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance? Where to begin? 2 Sequence Databases Swiss-prot MSDB, NCBI nr dbest Species specific ORFS

More information

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

More information

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

More information

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper

More information

Frequently Asked Questions Next Generation Sequencing

Frequently Asked Questions Next Generation Sequencing Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided

More information

ReceivablesVision SM Getting Started Guide

ReceivablesVision SM Getting Started Guide ReceivablesVision SM Getting Started Guide March 2013 Transaction Services ReceivablesVision Quick Start Guide Table of Contents Table of Contents Accessing ReceivablesVision SM...2 The Login Screen...

More information

Sequencing the Human Genome

Sequencing the Human Genome Revised and Updated Edvo-Kit #339 Sequencing the Human Genome 339 Experiment Objective: In this experiment, students will read DNA sequences obtained from automated DNA sequencing techniques. The data

More information

LESSON 9. Analyzing DNA Sequences and DNA Barcoding. Introduction. Learning Objectives

LESSON 9. Analyzing DNA Sequences and DNA Barcoding. Introduction. Learning Objectives 9 Analyzing DNA Sequences and DNA Barcoding Introduction DNA sequencing is performed by scientists in many different fields of biology. Many bioinformatics programs are used during the process of analyzing

More information

Creating Online Surveys with Qualtrics Survey Tool

Creating Online Surveys with Qualtrics Survey Tool Creating Online Surveys with Qualtrics Survey Tool Copyright 2015, Faculty and Staff Training, West Chester University. A member of the Pennsylvania State System of Higher Education. No portion of this

More information

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers. org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank

More information

Access to Moodle. The first session of this document will show you how to access your Lasell Moodle course, how to login, and how to logout.

Access to Moodle. The first session of this document will show you how to access your Lasell Moodle course, how to login, and how to logout. Access to Moodle The first session of this document will show you how to access your Lasell Moodle course, how to login, and how to logout. 1. The homepage of Lasell Learning Management System Moodle is

More information

HRS 750: UDW+ Ad Hoc Reports Training 2015 Version 1.1

HRS 750: UDW+ Ad Hoc Reports Training 2015 Version 1.1 HRS 750: UDW+ Ad Hoc Reports Training 2015 Version 1.1 Program Services Office & Decision Support Group Table of Contents Create New Analysis... 4 Criteria Tab... 5 Key Fact (Measurement) and Dimension

More information

Biological Sciences Initiative. Human Genome

Biological Sciences Initiative. Human Genome Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.

More information

Qualtrics Survey Tool

Qualtrics Survey Tool Qualtrics Survey Tool This page left blank intentionally. Table of Contents Overview... 5 Uses for Qualtrics Surveys:... 5 Accessing Qualtrics... 5 My Surveys Tab... 5 Survey Controls... 5 Creating New

More information

Getting Started With Mortgage MarketSmart

Getting Started With Mortgage MarketSmart Getting Started With Mortgage MarketSmart We are excited that you are using Mortgage MarketSmart and hope that you will enjoy being one of its first users. This Getting Started guide is a work in progress,

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

NYS OCFS CMS Contractor Manual

NYS OCFS CMS Contractor Manual NYS OCFS CMS Contractor Manual C O N T E N T S CHAPTER 1... 1-1 Chapter 1: Introduction to the Contract Management System... 1-2 CHAPTER 2... 2-1 Accessing the Contract Management System... 2-2 Shortcuts

More information

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 402 A Multiple DNA Sequence Translation Tool Incorporating Web

More information

CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS

CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS An Excel Pivot Table is an interactive table that summarizes large amounts of data. It allows the user to view and manipulate

More information

Creating and Managing Online Surveys LEVEL 2

Creating and Managing Online Surveys LEVEL 2 Creating and Managing Online Surveys LEVEL 2 Accessing your online survey account 1. If you are logged into UNF s network, go to https://survey. You will automatically be logged in. 2. If you are not logged

More information

VALUE LINE INVESTMENT SURVEY ONLINE USER S GUIDE VALUE LINE INVESTMENT SURVEY ONLINE. User s Guide

VALUE LINE INVESTMENT SURVEY ONLINE USER S GUIDE VALUE LINE INVESTMENT SURVEY ONLINE. User s Guide VALUE LINE INVESTMENT SURVEY ONLINE User s Guide Welcome to Value Line Investment Survey Online. This user guide will show you everything you need to know to access and utilize the wealth of information

More information

JOOMLA 2.5 MANUAL WEBSITEDESIGN.CO.ZA
