A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques
|
|
|
- Preston Parrish
- 10 years ago
- Views:
Transcription
1 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques HYE RY LEE, SEUNG-HEE LEE, KEON MYUNG LEE School of Electrical and Computer Engineering Chungbuk National University 12 Gaeshin, Cheongju, Chungbuk KOREA [email protected] CHAN HEE LEE Department of Microbiology Chungbuk National University 12 Gaeshin, Cheongju, Chungbuk KOREA Abstract: With the advent of various high throughput technologies in molecular biology, the accumulated biological data grow ever rapidly. It is essential to have some bioinformatics tools to automate the analysis tasks which biologists manually carry out in order to handle such large volume of biological data. This paper presents an intelligent bioinformatics tool which has been designed and developed to acquire multiple DNA sequences at a time, to translate them into amino acid sequences and to recommend most likely ones based on the biological knowledge. The tool makes use of a web robot to collect sequence data from the NCBI web site and contains an ORF(Open Reading Frame) analyzer to determine most likely amino acid sequences. Key Words : Bioinformatics, DNA Sequence Translation, ORF finder, Web robot 1 Introduction With the help of high throughput sequencing technologies and efficient computational tools, the genomes for dozens of organisms have been successfully sequenced and archived into biological databases. Many researchers have been paying their attentions to post-genome studies such as identifying the functions of genes, modeling the interaction networks like gene regulatory networks, signal transduction networks and metabolic pathway networks, and so on. Once a sequence is obtained from some biological treatment, the biological analyst tries to deduce its structural, functional and evolutionary relevance by evaluating its similarity and difference in DNA base-level or amino acid-level with respect to other sequences of interest.[1] The biological analysts usually collect the existing sequences and corresponding annotation information from public biological databases like NCBI[2] and carry out the designated analyses with web applications and stand-alone applications. Various bioinformatics tools have been being developed to find optimized or approximated solutions, to perform complicated computations, to carry out effectively the analysis tasks manually done by the analysts. This study is concerned with the analysis situation in which an analyst needs to get multiple DNA sequences from a public database and to convert them into amino acid sequences. In order to accomplish this task, she first has to enter GenBank Accession Number(GB) or Geninfo Identifier(GI) into the NCBI GenBank database[3], extracts out the DNA sequence sections from the retrieved results and store them into a file. She now translates one by one the collected sequences into amino acid sequences manually or with the help of translation program like ExPASy s Translate[4], and then stores them in a file in a required format. This paper presents an intelligent tool which has been designed and implemented to perform the above-mentioned whole steps at a time. This paper is organized as follows: Section 2 briefly reviews some related works on DNA sequence translation tools. Section 3 presents the proposed approach and the developed system in term of system architecture and its functionality. Section 4 draws the conclusions and future works. 2 Related Works Representative databases widely used to get biological information include GenBank, UniProtKB/Swiss- Prot of EMBL[5] and PIR-International[6]. They are equipped with the query interfaces and provide retrieved results in a web page or in various standard format like ASN.1, XML and FASTA. Each record in these databases consists of various fields including sequence, annotation, organism, reference, and so on. The databases have partly overlapped records for the same objects each other even
2 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, though they have different focuses, e.g., some for DNA, others for protein. In the biological databases, each record has a unique identifier called access number or other name in the specific database, with which the records are linked across the databases. GenBank is one of the most frequently used databases to which newly identified sequences are generally registered. It contains data about DNA and protein sequences, genome maps, molecular modeling database, and documentary comments.[3] For Gen- Bank, NCBI provides a Web query interface program, called Entrez, one of which functions allows to retrieve DNA sequences along with its related information with GenBank Accession Number or Geninfo Identifier.[2] Entrez can publish the retrieved results in various formats like plain text, XML, and so on, from which users have to extract the DNA sequence. ExPASy(Expert Protein Analysis System) provides a tool called Translate which translates a given DNA sequence into its corresponding amino acid sequence.[4] ExPASy is a system operated by SIB(Swiss Institute of Bioinformatics) to provide proteomics-related information. Translate allows to convert a DNA sequence at a time. Therefore, to handle multiple DNA sequences its users have to do bothersome cut-and-paste works as many times as the number of sequences to be translated. When a user collects multiple DNA sequences of interest and translates them into the corresponding amino acid sequences, she carries out the following steps at this stage of available technology: She retrieves DNA sequences by their accession numbers with NCBI Entrez, manually extracts DNA sequence parts from the retrieved results and transforms into FASTA format, then feeds DNA sequences into Ex- PASy s Translate one by one while cut-and-pasting the translated sequences into a file. This series of steps is a time-consuming and bothersome task. From this observation, we have developed a system which takes care of the whole process with the minimized user s intervention. 3 The Developed System For automatic multiple DNA sequence acquisition and translation, a system has been developed which consists of the DNA sequence acquisition module, the translation module, and the auxiliary tools such as Pattern Finder, Biostatistics Viewer, and Exporter. The DNA sequence acquisition module has been implemented using a web robot technique which plays role of collecting the DNA sequences from NCBI Gen- Bank when a set of accession numbers for DNA sequences is given. The translation module is charge of translating the collected DNA sequences into amino acid sequences in which 6 amino acid sequences are generated for each DNA sequence according to the possible reading frame positions and directions, and then the most plausible one is recommended. Pattern Finder allows to search for a specific subsequence from the selected DNA sequence. Biostatistics Viewer shows some basic statistical information for the sequences. Exporter plays role to export the processed sequences into a file. When a user makes use of this system, she can ask to collect the DNA sequences from the NCBI Gen- Bank by listing out their accession numbers or can provide directly the DNA sequences to the translation module. The DNA accession numbers to be collected are listed in a file and the file is handed over the system as the input file. The files to store the DNA sequences collected by the user are edited in the FASTA format, and the system provides a functionality to handle these files to translate into amino acid sequences. For each DNA sequence, the system takes into account 6 possible translations and recommends the most likely one at the first place and enables the user to confirm it or to choose other one from the translated sequences through the graphical user interface. The confirmed amino acid sequences are exported into a file in the FASTA or XML format. Figure 1 shows how the system works for the multiple DNA sequence translation. Amino acid sequence selection ORF view Acession number list Web robot processing DNA sequences Amino acid translation Post-translation Analysis & manipulation Pattern Finding query pages Export NCBI GenBank Biostatistic analysis Figure 1: The Operation Flow of the Developed System 3.1 The Web Robot to Collect DNA Sequences To collect multiple DNA sequences of interest from the NCBI GenBank, a web robot has been developed which interacts with the NCBI Entrez. The web robot first uploads the file containing the DNA accession numbers, then retrieves the DNA sequences corresponding to the accession numbers one by one
3 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, from the GenBank, and the collected sequences are maintained in an internal data structure and displayed through the tree-view directory graphical component. For each accession number, the web robot creates a query to be delivered to the Entrez s CGI program. In response to each query, the CGI program retrieves the corresponding record from the GenBank s Nucleotide database and sends back the retrieved result. Then, the robot parses through the received page and extracts out the DNA sequence. The collected DNA sequences are maintained in a FASTA format data structure and are used in the following translation work. The web robot helps the analysts avoid sequence-wise bothersome and time-consuming manual interactions with the Entrez system to collect the sequences. It could contribute largely to the analysis time reduction. 3.2 The DNA Sequence Translation A DNA sequence is made of base characters each of which A is for adenine, T for thymine, G for guanine, and C for cytosine, and encodes genetic information. Each consecutive three bases, called a codon, could encode an amino acid. There are 20 kinds of amino acids corresponding to 64 possible codons. Each codon has a single letter code for its amino acid, e.g., M for methionine.[8] For a DNA sequence, there are 3 possible reading frames in each the directions from 5 to 3, and from 3 to 5. A reading frame is the way in which nucleotides are read in groups of three to specify a code. The developed system generates 6 amino acid sequences corresponding to each reading frame. When a DNA sequence is selected, the graphical user interface shows its amino acid sequences and allows to choose one out of 6 sequences, as shown in Figure 2. For each reading frame, there is a sequence of codons, called ORF(open reading frame), beginning with an initiation codon (i.e., ATG) and ending with a termination codon (e.g., TAA, TGA, TAG). An ORF is a potential coding area which encodes a gene and thus is used to compose a protein when it is expressed. In biology, an ORF plays important role in determining whether its sequence encodes potentially a protein composition information by carrying out homology analysis at the amino acid level, or as a yardstick when evaluating a coded protein s size or molecular mass.[9] A meaningful ORF is usually the longest reading frame containing no in-frame stop codons. In order to search for the longest ORFs, the developed system takes into account all possible combinations between M, which is the initiation codon, and *, which denotes a termination codon. With the threshold of the minimum ORF length which can be controlled by the analysts, the longest one of size greater than the threshold is selected as the ORF at the corresponding reading frame. To tell the most likely amino acid sequence for a DNA sequence, the developed system determines the amino acid sequence with the largest ORF among the 6 possible translated amino acid sequences. The most likely sequence is displayed with which the corresponding check box is in the set state in the amino acid sequence viewer when an accession number is selected on the access number list window. This recommendation helps the analysts confirm the translated amino sequence with little effort, where they can select different translation other than the recommended one. The confirmed sequences are later exported into a file. 3.3 The ORF Mapper Figure 2: The Amino Acid Sequence Viewer Figure 3: The ORF Mapper
4 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, The ORF Mapper is the window to show the recommended ORF for the selected amino acid sequence. In the window, the possible ORFs positions are depicted by the bar graphs. The threshold for the minimum length of ORFs can be controlled by the analysts with the graphical sliderbar interface. The window displays a simple statistics about the amino acid sequence length and the longest ORF length. Figure 3 shows the ORF Mapper window where the recommended ORF is highlighted. 3.4 The Pattern Finder The Pattern finder is a tool that helps search for a pattern based in the selected reading frame. In the window, the matched portions of the DNA sequence are turned into a different color for easy spying and the number of the occurrences of the query pattern is also displayed. The patterns frequently searched in the analysis include the initiation codon, termination codon, att site, and so on. Figure 4 shows the interface of Pattern Finder. Figure 4: The Pattern Finder 3.5 The Biostatistics Viewer The Biostatistics viewer window shows some statistical information for the translated sequences. At the current version, the developed system provides the information about the composition and distribution of amino acids for the selected amino acid sequence. Figure 5 shows the biostatistics viewer window. 3.6 The Exporter For the later use and further processing, the exporter takes charge of exporting into a file both the DNA sequences collected from GenBank and the translated acid sequences which are confirmed out of 6 possible Figure 5: The Biostatistics Viewer translations for each DNA sequence. The files can be saved in either the FASTA format or an XML format. The saved sequences could be later used as a query sequence for the sequence similarity-based retrieval services like BLAST. 3.7 Implementation The system has been implemented into a standone application for the Windows environment using Microsoft Visual Basic components. Due to its necessity to access the NCBI GenBank, the application should be deployed in the computers which are Internetconnected. 4 Conclusions A role of bioinformatics tools is to reduce the analysis burden by automating time-consuming and bothersome manual manipulation works and thus to increase the productivity in the analysis work. Many molecular biological studies ask to collect a volume of DNA sequences from the public databases and to transform into amino acid sequences as a prerequisite for further studies. This paper presented a system designed and implemented to meet the above needs. The developed system is capable of collecting DNA sequences at one with the help of a web robot and recommending the most likely amino acid sequences by taking into account the longest ORFs across the possible translations. For further studies, there remains to add on more functionality to the auxiliary services like regular expression query support for the Pattern Finder, additional biostatistical information provision for the Biostatistics Viewer. We are working on the functionality implementation to collect multiple DNA sequences which match the query sequence to some extent from the public biological database with the help of the robot agent at a time and to choose the sequences matched with the descriptive query, i.e., of which annotation part or title part is compatible with the query statement with the consideration of Gene Ontology information. Acknowledgements: This research was supported by the Regional Research Centers Program of the Ministry of Education & Human Resources Development in Korea.
5 Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, References: [1] P. Baldi, S. Brunak, Bioinformatics : The Machine Learning Approach(2nd Ed.), The MIT Press, [2] NCBI National Center for Biotechnology Information, [3] NCBI GeneBank, Genbank/index.html. [4] ExPASy, [5] UniProtKB/Swiss-Prot, swissprot/. [6] Entrez, [7] R. Durbin, S. R. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis : Probabilistic models of protein and nucleic acids, The Cambridge University Press, [8] S. Aluru, Handbook of Computational Molecular Biology(Eds.), Chapman & Hall/CRC, [9] T. A. Brown, Genomes(2nd ed), Oxford, United Kingdom: Wiley-Liss, [10] A. D. Baxevanis, B.F. F. Ouellette, Bioinformatics : A Practical Guide to the Analysis of Genes and Proteins(Eds.), John Wiley & Sons, Inc., 1998.
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Bioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
Biological Sequence Data Formats
Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA
GenBank, Entrez, & FASTA
GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,
Module 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
Bioinformatics, Sequences and Genomes
Bioinformatics, Sequences and Genomes BL4273 Bioinformatics for Biologists Week 1 Daniel Barker, School of Biology, University of St Andrews Email [email protected] BL4273 and 4273π 4273π is a custom
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
Module 1. Sequence Formats and Retrieval. Charles Steward
The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.
A Tutorial in Genetic Sequence Classification Tools and Techniques
A Tutorial in Genetic Sequence Classification Tools and Techniques Jake Drew Data Mining CSE 8331 Southern Methodist University [email protected] www.jakemdrew.com Sequence Characters IUPAC nucleotide
UGENE Quick Start Guide
Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.
Bioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want
1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very
Introduction to Genome Annotation
Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT
Searching Nucleotide Databases
Searching Nucleotide Databases 1 When we search a nucleic acid databases, Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from the forward strand and 3 reading frames
Overview of Eukaryotic Gene Prediction
Overview of Eukaryotic Gene Prediction CBB 231 / COMPSCI 261 W.H. Majoros What is DNA? Nucleus Chromosome Telomere Centromere Cell Telomere base pairs histones DNA (double helix) DNA is a Double Helix
Teaching Bioinformatics to Undergraduates
Teaching Bioinformatics to Undergraduates http://www.med.nyu.edu/rcr/asm Stuart M. Brown Research Computing, NYU School of Medicine I. What is Bioinformatics? II. Challenges of teaching bioinformatics
A Web Based Software for Synonymous Codon Usage Indices
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 3 (2013), pp. 147-152 International Research Publications House http://www. irphouse.com /ijict.htm A Web
Biological Databases and Protein Sequence Analysis
Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to
LESSON 4. Using Bioinformatics to Analyze Protein Sequences. Introduction. Learning Objectives. Key Concepts
4 Using Bioinformatics to Analyze Protein Sequences Introduction In this lesson, students perform a paper exercise designed to reinforce the student understanding of the complementary nature of DNA and
REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])
820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor
Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.
13 Multiple Choice RNA and Protein Synthesis Chapter Test A Write the letter that best answers the question or completes the statement on the line provided. 1. Which of the following are found in both
13.2 Ribosomes & Protein Synthesis
13.2 Ribosomes & Protein Synthesis Introduction: *A specific sequence of bases in DNA carries the directions for forming a polypeptide, a chain of amino acids (there are 20 different types of amino acid).
Distributed Bioinformatics Computing System for DNA Sequence Analysis
Global Journal of Computer Science and Technology: A Hardware & Computation Volume 14 Issue 1 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.
BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B, [email protected]) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
Concluding lesson. Student manual. What kind of protein are you? (Basic)
Concluding lesson Student manual What kind of protein are you? (Basic) Part 1 The hereditary material of an organism is stored in a coded way on the DNA. This code consists of four different nucleotides:
PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: [email protected]
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
Linear Sequence Analysis. 3-D Structure Analysis
Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic
Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
SUBMITTING DNA SEQUENCES TO THE DATABASES
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
Analyzing A DNA Sequence Chromatogram
LESSON 9 HANDOUT Analyzing A DNA Sequence Chromatogram Student Researcher Background: DNA Analysis and FinchTV DNA sequence data can be used to answer many types of questions. Because DNA sequences differ
200630 - FBIO - Fundations of Bioinformatics
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND
DNA Sequence formats
DNA Sequence formats [Plain] [EMBL] [FASTA] [GCG] [GenBank] [IG] [IUPAC] [How Genomatix represents sequence annotation] Plain sequence format A sequence in plain format may contain only IUPAC characters
Bioinformatics: course introduction
Bioinformatics: course introduction Filip Železný Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab http://ida.felk.cvut.cz
DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!
DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other
Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
MAKING AN EVOLUTIONARY TREE
Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities
Guide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
Algorithms in Computational Biology (236522) spring 2007 Lecture #1
Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office
Annex 6: Nucleotide Sequence Information System BEETLE. Biological and Ecological Evaluation towards Long-Term Effects
Annex 6: Nucleotide Sequence Information System BEETLE Biological and Ecological Evaluation towards Long-Term Effects Long-term effects of genetically modified (GM) crops on health, biodiversity and the
LESSON 9. Analyzing DNA Sequences and DNA Barcoding. Introduction. Learning Objectives
9 Analyzing DNA Sequences and DNA Barcoding Introduction DNA sequencing is performed by scientists in many different fields of biology. Many bioinformatics programs are used during the process of analyzing
An agent-based layered middleware as tool integration
An agent-based layered middleware as tool integration Flavio Corradini Leonardo Mariani Emanuela Merelli University of L Aquila University of Milano University of Camerino ITALY ITALY ITALY Helsinki FSE/ESEC
Committee on WIPO Standards (CWS)
E CWS/1/5 ORIGINAL: ENGLISH DATE: OCTOBER 13, 2010 Committee on WIPO Standards (CWS) First Session Geneva, October 25 to 29, 2010 PROPOSAL FOR THE PREPARATION OF A NEW WIPO STANDARD ON THE PRESENTATION
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives [email protected] 2015-05-21 Functional Bioinformatics, Örebro University Vad är bioinformatik och varför
DNA and the Cell. Version 2.3. English version. ELLS European Learning Laboratory for the Life Sciences
DNA and the Cell Anastasios Koutsos Alexandra Manaia Julia Willingale-Theune Version 2.3 English version ELLS European Learning Laboratory for the Life Sciences Anastasios Koutsos, Alexandra Manaia and
Integration of data management and analysis for genome research
Integration of data management and analysis for genome research Volker Brendel Deparment of Zoology & Genetics and Department of Statistics Iowa State University 2112 Molecular Biology Building Ames, Iowa
Gene Models & Bed format: What they represent.
GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,
Sequencing the Human Genome
Revised and Updated Edvo-Kit #339 Sequencing the Human Genome 339 Experiment Objective: In this experiment, students will read DNA sequences obtained from automated DNA sequencing techniques. The data
Basic Concepts of DNA, Proteins, Genes and Genomes
Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in
DNA, RNA, Protein Synthesis Keystone 1. During the process shown above, the two strands of one DNA molecule are unwound. Then, DNA polymerases add complementary nucleotides to each strand which results
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Clone Manager. Getting Started
Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software
Introduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism )
Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins
Protein Analysis and WEKA Data Mining
GenMiner: a Data Mining Tool for Protein Analysis Gerasimos Hatzidamianos 1, Sotiris Diplaris 1,2, Ioannis Athanasiadis 1,2, Pericles A. Mitkas 1,2 1 Department of Electrical and Computer Engineering Aristotle
Translation Study Guide
Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to
DNA Sequencing Overview
DNA Sequencing Overview DNA sequencing involves the determination of the sequence of nucleotides in a sample of DNA. It is presently conducted using a modified PCR reaction where both normal and labeled
Intro to Bioinformatics
Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate
Molecular Genetics. RNA, Transcription, & Protein Synthesis
Molecular Genetics RNA, Transcription, & Protein Synthesis Section 1 RNA AND TRANSCRIPTION Objectives Describe the primary functions of RNA Identify how RNA differs from DNA Describe the structure and
Integrating Bioinformatics, Medical Sciences and Drug Discovery
Integrating Bioinformatics, Medical Sciences and Drug Discovery M. Madan Babu Centre for Biotechnology, Anna University, Chennai - 600025 phone: 44-4332179 :: email: [email protected] Bioinformatics
Processing Genome Data using Scalable Database Technology. My Background
Johann Christoph Freytag, Ph.D. [email protected] http://www.dbis.informatik.hu-berlin.de Stanford University, February 2004 PhD @ Harvard Univ. Visiting Scientist, Microsoft Res. (2002)
org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.
org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank
SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
Version 5.0 Release Notes
Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com
This document presents the new features available in ngklast release 4.4 and KServer 4.2.
This document presents the new features available in ngklast release 4.4 and KServer 4.2. 1) KLAST search engine optimization ngklast comes with an updated release of the KLAST sequence comparison tool.
Scientific databases. Biological data management
Scientific databases Biological data management The term paper within the framework of the course Principles of Modern Database Systems by Aleksejs Kontijevskis PhD student The Linnaeus Centre for Bioinformatics
AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
(http://genomes.urv.es/caical) TUTORIAL. (July 2006)
(http://genomes.urv.es/caical) TUTORIAL (July 2006) CAIcal manual 2 Table of contents Introduction... 3 Required inputs... 5 SECTION A Calculation of parameters... 8 SECTION B CAI calculation for FASTA
Gene Finding CMSC 423
Gene Finding CMSC 423 Finding Signals in DNA We just have a long string of A, C, G, Ts. How can we find the signals encoded in it? Suppose you encountered a language you didn t know. How would you decipher
Educator s Section: p. 1 7 Handout 1: p. 8-9 Handout 2: p. 10-12. Grades & Levels: High school (honors/ap) undergraduate (year 1)
ActionBioscience.org lesson To accompany the article by T. Ryan Gregory: "Genomic Puzzles Old and New" (August 2006) http://www.actionbioscience.org/genomic/gregory.html Bioinformatics (February 2007)
Databases and mapping BWA. Samtools
Databases and mapping BWA Samtools FASTQ, SFF, bax.h5 ACE, FASTG FASTA BAM/SAM GFF, BED GenBank/Embl/DDJB many more File formats FASTQ Output format from Illumina and IonTorrent sequencers. Quality scores:
Part ONE. a. Assuming each of the four bases occurs with equal probability, how many bits of information does a nucleotide contain?
Networked Systems, COMPGZ01, 2012 Answer TWO questions from Part ONE on the answer booklet containing lined writing paper, and answer ALL questions in Part TWO on the multiple-choice question answer sheet.
Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: [email protected]
Distributed Data Mining in Discovery Net. Dr. Moustafa Ghanem Department of Computing Imperial College London
Distributed Data Mining in Discovery Net Dr. Moustafa Ghanem Department of Computing Imperial College London 1. What is Discovery Net 2. Distributed Data Mining for Compute Intensive Tasks 3. Distributed
When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want
1 When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want to search other databases as well. There are very
4.2.1. What is a contig? 4.2.2. What are the contig assembly programs?
Table of Contents 4.1. DNA Sequencing 4.1.1. Trace Viewer in GCG SeqLab Table. Box. Select the editor mode in the SeqLab main window. Import sequencer trace files from the File menu. Select the trace files
Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks
Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks Semester II Paper II: Mathematics I 85 marks B.Sc. II Year
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
Library page. SRS first view. Different types of database in SRS. Standard query form
SRS & Entrez SRS Sequence Retrieval System Bengt Persson Whatis SRS? Sequence Retrieval System User-friendly interface to databases http://srs.ebi.ac.uk Developed by Thure Etzold and co-workers EMBL/EBI
DNA is found in all organisms from the smallest bacteria to humans. DNA has the same composition and structure in all organisms!
Biological Sciences Initiative HHMI DNA omponents and Structure Introduction Nucleic acids are molecules that are essential to, and characteristic of, life on Earth. There are two basic types of nucleic
Department of Food and Nutrition
Department of Food and Nutrition Faculties Professors Lee-Kim, Yang Cha, Ph.D. (M.I.T., 1973) Nutritional biochemistry, Antioxidant vitamins, Fatty acid metabolism, Brain development, and Hyperlipidemia
Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
Structure and Function of DNA
Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four
From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains
Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes
