Sequence formats and databases in bioinformatics
|
|
|
- Lynette Russell
- 10 years ago
- Views:
Transcription
1 Sequence formats and databases in bioinformatics Definitions/Basics Sequence formats Databases in Biology Dinesh Gupta Structural and Computational Biology Group ICGEB
2 What is Bioinformatics? Bioinformatics is the use of computers to solve biological and biomedical problems. Bioinformatics is the application of information technology to mine, visualize, analyze, integrate, and manage biological and genetic information, which can then be applied in, among other things, accelerating drug discovery and development. Application of tools of computation and analysis to the capture and interpretation of biological data. Biological Data management and analysis. NIH definition of Bioinformatics ( Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3 Use of Bioinformatics DNA analysis Genome sequencing Sequence assembly Sequence/gene annotations Genefinding/Sequence translation tools Sequence Similarity searching (eg. BLAST, ClustalW) Comparison between genomes Evolution of sequences (Phylogenetic analysis) Gene expression
4 Use of Bioinformatics (..contd.) Protein analysis Structure X-ray crystallography Homology based models Drug designing Sequence Sequence similarity Protein family assignments Conserved motifs Proteomics data analysis Protein Evolution
5 Uses of Bioinformatics (..contd.) Other uses: Drug designing Vaccine development Dairy technology Forensics Crop improvement Designing enzymes for detergents Genetic counseling
6 Bioinformatics: Integration of several fields Physics Computer Science Biological Science Bioinformatics Mathematics Statistics Chemistry
7 Recent events making bioinformatics more important Exponential expansion of biological information Expansion of multiple types of information Cheaper high throughput technologies Improvement in computation power Lack of standards/quality Need for micro and macro analysis Need for better algorithms
8
9 Vast Growth in (Structural) Data... but number of Fundementally New (Fold) Parts Not Increasing that Fast Total in Databank New Submissions New Folds
10 Bioinformatics Analysis? It is like any other lab analysis! You need to know your data/input sources You need to understand your methods and their assumptions You need a plan to get from point A to point B You need to understand your equipment You need to be critical and understand potential sources of error You need to interpret your results Your results need to be reproducible Your results should be testable
11 References, but not limited to: Baxevanis & Ouellette Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins 2 nd Edition. John Wiley Publishing. Gibas & Jambeck Developing Bioinformatics Computer Skills. O Reilly. Bioinformatics: Genome Sequence Analysis Mount 2001 Bioinformatics For Dummies Claverie & Notredame 2003 Introduction to Bioinformatics Lesk 2002
12 Sequence formats: Basics Why different formats? Type of information Software requirements Database requirements
13 Main file formats used in Bioinformatics ASN.1 EMBL, Swiss Prot FASTA GCG GenBank/GenPept PHYLIP PIR
14 ASN 1: Abstract Syntax Notation 1 used by NCBI Seq-entry ::= set { class phy-set, descr { pub { pub { article { title { name "Cross-species infection of blood parasites between resident and migratory songbirds in Africa" }, authors { names std { { name name { last "Waldenstroem", first "Jonas", initials "J." } }, { name name { last "Bensch", first "Staffan", initials "S." } }, { name name { last "Kiboi", first "Sam", initials "S." } }, { name name { last "Hasselquist", first "Dennis", initials "D." } }, { name name {
15 The first line of each sequence entry is the ID definition line which contains entry name, dataclass, molecule, division and sequence length. XX line contains no data, just a separator The AC line lists the accession number. DE line gives description about the sequence FT precise annotation for the sequence Sequence information SQ in the first two spaces. The sequence information begins on the fifth line of the sequence entry. The last line of each sequence entry in the file is a terminator line which has the two characters // in the first two spaces. EMBL/Swiss Prot ( ID AA03518 standard; DNA; FUN; 237 BP. XX AC U03518; XX AC U03518; XX DE Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S DE rrna and 5.8S rrna genes, partial sequence. DE rrna and 5.8S rrna genes, partial sequence. RX MEDLINE; RX PUBMED; XX FT rrna <1..20 FT /product="18s ribosomal RNA" FT misc_rna FT /standard_name="internal transcribed spacer 1 (ITS1)" FT rrna 206..>237 FT /product="5.8s ribosomal RNA" SQ Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other; aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 60 tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 120 ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 180 tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc 237 //
16 FASTA A sequence in Fasta format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greaterthan (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. >U03518 Aspergillus awamori internal transcribed spacer 1 (ITS1) AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTCTATTGTACCC TGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTGCCCCCCGGGCCCGTGCCCGC CGGAGACCCCAACACGAACACTGTCTGAAAGCGTGCAGTCTGAGTTGATTGAATGCAATCAGTTAAAACT TTCAACAATGGATCTCTTGGTTCCGGC
17 GCG Exactly one sequence Begins with annotation lines Start of the sequence is marked by a line ending with ".. This line also contains the sequence identifier, the sequence length and a checksum ID AA03518 standard; DNA; FUN; 237 BP. XX AC U03518; XX DE Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S DE rrna and 5.8S rrna genes, partial sequence. XX SQ Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other; AA03518 Length: 237 Check: aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 61 tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 121 ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 181 tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc
18 GenBank/GenPept The nucleotide (GenBank) and protein (Gen Pept) database entries are available from Entrez in this format Can contain several sequences One sequence starts with: LOCUS The sequence starts with: "ORIGIN The sequence ends with: "// LOCUS AAU bp DNA PLN 04-FEB-1995 DEFINITION Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S rrna and 5.8S rrna genes, partial sequence. ACCESSION U03518 BASE COUNT 41 a 77 c 67 g 52 t ORIGIN 1 aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 61 tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 121 ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 181 tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc //
19 Phylip format G019uabh ATACATCATA ACACTACTTC CTACCCATAA GCTCCTTTTA ACTTGTTAAA G028uaah CATAAGCTCC TTTTAACTTG TTAAAGTCTT GCTTGAATTA AAGACTTGTT GTCTTGCTTG AATTAAAGAC TTGTTTAAAC ACAAAAATTT AGAGTTTTAC TAAACACAAA ATTTAGACTT TTACTCAACA AAAGTGATTG ATTGATTGAT TCAACAAAAG TGATTGATTG ATTGATTGAT TGATTGATGG TTTACAGTAG TGATTGATTG ATGGTTTACA GTAGGACTTC ATTCTAGTCA TTATAGCTGC The first line of the input file contains the number of sequences and their length (all should have the same length) separated by blanks. The next line contains a sequence name, next lines are the sequence itself in blocks of 10 characters. Then follow rest of sequences.
20 Other formats MEGA #mega Title: infile.fasta #G019uabh ATACATCATAACACTACTTCCTACCCATAAGCTCCTTTTAACTTGTTAAAGTCTTGCTTG AATTAAAGACTTGTTTAAACACAAAAATTTAGAGTTTTACTCAACAAAAGTGATTGATTG ATTGATTGATTGATTGATGGTTTACAGTAGGACTTCATTCTAGTCATTATAGCTGCTGGC AGTATAACTGGCCAGCCTTTAATACATTGCTGCTTAGAGTCAAAGCATGTACTTAGAGTT GGTATGATTTATCTTTTTGGTCTTCTATAGCCTCCTTCCCCATCCCCATCAGTCTTAATC AGTCTTGTTACGTTATGACTAATCTTTGGGGATTGTGCAGAATGTTATTTTAGATAAGCA AAACGAGCAAAATGGGGAGTTACTTATATTTCTTTAAAGC #G028uaah CATAAGCTCCTTTTAACTTGTTAAAGTCTTGCTTGAATTAAAGACTTGTTTAAACACAAA ATTTAGACTTTTACTCAACAAAAGTGATTGATTGATTGATTGATTGATTGATGGTTTACA GTAGGACTTCATTCTAGTCATTATAGCTGCTGGCAGTATAACTGGCCAGCCTTTAATACA TTGCTGCTTAGAGTCAAAGCATGTACTTAGAGTTGGTATGATTTATCTTTTTGGTCTTCT ATAGCCTCCTTCCCCATCCCATCAGTCT
21 ReadSeq Don Gilbert May 2001 Indiana University, Bloomington, Indiana WWW Seqret A program in EMBOSS suite
22
23
24
25
26
27 The Readseq package can read most common formats: examples of all these formats are included in the readseq directory. The formats include: IG/Stanford, used by Intelligenetics and others GenBank/GB, genbank flatfile format NBRF format (SAM modifications cause this to break when sequences do not have a terminating asterix) EMBL, EMBL flatfile format GCG, single sequence format of GCG software DNAStrider, for common Mac program Fitch format, limited use Pearson/Fasta, a common format used by Fasta programs and others Zuker format, limited use. Input only. Olsen, format printed by Olsen VMS sequence editor. Input only. Phylip3.2, sequential format for Phylip programs Plain/Raw, sequence data only (no name, document, numbering) MSF multi sequence format used by GCG software PAUP's multiple sequence (NEXUS) format PIR/CODATA format used by PIR
28 Databases in Biology
29
30 Need for databases in Biology? Need for storing and communicating large datasets has grown. Need to disseminate biological information. Provide Organized data for analysis friendly retrieval. Need to make biological data available in computerreadable form.
31 Different classifications of databases Type of data nucleotide sequences protein sequences proteins sequence patterns or motifs macromolecular 3D structure gene expression data metabolic pathways proteomics data
32 Different classifications of databases. Primary or derived databases Primary databases: experimental results directly into database Secondary databases: results of analysis of primary databases Aggregate of many databases Links to other data items Combination of data Consolidation of data
33 Different classifications of databases. Technical design Flat-files Relational database (SQL) Exchange/publication technologies (HTML, CORBA, XML,...) Each one of the above are inter convertible
34 Different classifications of databases. Availability Publicly available, no restrictions Available, but with copyright Accessible, but not downloadable Academic, but not freely available Proprietary, commercial; possibly free for academics
35 Different classifications of databases. Content Protein/DNA/RNA/miRNA etc. Family: kinases Common physical properties: membrane bound, mitochondrial proteins Common chemical properties: Proteases, reductases etc. Sequences of a particular genome/species: e.g. Influenza sequences, plasmodium sequences etc. Motifs/domains
36 Where to look for databases? Search Engines Journals related to Bioinformatics Websites like: Several others websites
37
38 NAR DB issue new dbs since last year! Total >1230! ( e/a/ Complete list Searchable m1037/dc1/1 (html format), also as downloadable word file)
39
40 Database searching tips Look for links to Help or Examples Always check update dates Level of curation Try Boolean searches Be careful with UK/US spelling differences leukaemia vs leukemia haemoglobin vs hemoglobin colour vs color
41 Exercise Retrieve sequences from sequence databases Convert sequence formats Study different formats and flow of information
Lecture Outline. Introduction to Databases. Introduction. Data Formats Sample databases How to text search databases. Shifra Ben-Dor Irit Orr
Introduction to Databases Shifra Ben-Dor Irit Orr Lecture Outline Introduction Data and Database types Database components Data Formats Sample databases How to text search databases What units of information
A Tutorial in Genetic Sequence Classification Tools and Techniques
A Tutorial in Genetic Sequence Classification Tools and Techniques Jake Drew Data Mining CSE 8331 Southern Methodist University [email protected] www.jakemdrew.com Sequence Characters IUPAC nucleotide
A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques
Proceedings of the 2007 WSEAS International Conference on Computer Engineering and Applications, Gold Coast, Australia, January 17-19, 2007 402 A Multiple DNA Sequence Translation Tool Incorporating Web
GenBank, Entrez, & FASTA
GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,
Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011
Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear
Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: [email protected]
DNA Sequence formats
DNA Sequence formats [Plain] [EMBL] [FASTA] [GCG] [GenBank] [IG] [IUPAC] [How Genomatix represents sequence annotation] Plain sequence format A sequence in plain format may contain only IUPAC characters
Biological Databases and Protein Sequence Analysis
Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to
Module 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
An agent-based layered middleware as tool integration
An agent-based layered middleware as tool integration Flavio Corradini Leonardo Mariani Emanuela Merelli University of L Aquila University of Milano University of Camerino ITALY ITALY ITALY Helsinki FSE/ESEC
Integrating Bioinformatics, Medical Sciences and Drug Discovery
Integrating Bioinformatics, Medical Sciences and Drug Discovery M. Madan Babu Centre for Biotechnology, Anna University, Chennai - 600025 phone: 44-4332179 :: email: [email protected] Bioinformatics
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
(A GUIDE for the Graphical User Interface (GUI) GDE)
The Genetic Data Environment: A User Modifiable and Expandable Multiple Sequence Analysis Package (A GUIDE for the Graphical User Interface (GUI) GDE) Jonathan A. Eisen Department of Biological Sciences
org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.
org.rn.eg.db December 16, 2015 org.rn.egaccnum Map Entrez Gene identifiers to GenBank Accession Numbers org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank
Bioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
Module 1. Sequence Formats and Retrieval. Charles Steward
The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.
Sequence information - lectures
Sequence information - lectures Pairwise alignment Alignments in database searches Multiple alignments Profiles Patterns RNA secondary structure / Transformational grammars Genome organisation / Gene prediction
UGENE Quick Start Guide
Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.
A data management framework for the Fungal Tree of Life
Web Accessible Sequence Analysis for Biological Inference A data management framework for the Fungal Tree of Life Kauff F, Cox CJ, Lutzoni F. 2007. WASABI: An automated sequence processing system for multi-gene
Core Bioinformatics. Degree Type Year Semester
Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: [email protected] Teachers Use of
Distributed Data Mining in Discovery Net. Dr. Moustafa Ghanem Department of Computing Imperial College London
Distributed Data Mining in Discovery Net Dr. Moustafa Ghanem Department of Computing Imperial College London 1. What is Discovery Net 2. Distributed Data Mining for Compute Intensive Tasks 3. Distributed
Data formats and file conversions
Building Excellence in Genomics and Computational Bioscience s Richard Leggett (TGAC) John Walshaw (IFR) Common file formats FASTQ FASTA BAM SAM Raw sequence Alignments MSF EMBL UniProt BED WIG Databases
Version 5.0 Release Notes
Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com
200630 - FBIO - Fundations of Bioinformatics
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND
Basic Concepts of DNA, Proteins, Genes and Genomes
Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
SUBMITTING DNA SEQUENCES TO THE DATABASES
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
Library page. SRS first view. Different types of database in SRS. Standard query form
SRS & Entrez SRS Sequence Retrieval System Bengt Persson Whatis SRS? Sequence Retrieval System User-friendly interface to databases http://srs.ebi.ac.uk Developed by Thure Etzold and co-workers EMBL/EBI
PPInterFinder A Web Server for Mining Human Protein Protein Interaction
PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B, [email protected]) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives [email protected] 2015-05-21 Functional Bioinformatics, Örebro University Vad är bioinformatik och varför
Activity 7.21 Transcription factors
Purpose To consolidate understanding of protein synthesis. To explain the role of transcription factors and hormones in switching genes on and off. Play the transcription initiation complex game Regulation
Bioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
BIOINFORMATICS TUTORIAL
Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.
Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
Guide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks
Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks Semester II Paper II: Mathematics I 85 marks B.Sc. II Year
Bio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
Biological Sequence Data Formats
Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA
Pairwise Sequence Alignment
Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
UNIT I LESSON -1 INTRODUCTION TO BIOINFORMATICS
UNIT I LESSON -1 INTRODUCTION TO BIOINFORMATICS 1.0 Aims and Objectives 1.1 Introduction to Bioinformatics 1.2 Landmark Sequences Completed 1.3 Sequence Analysis: Sequence to Potential Function 1.4 The
BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS
BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title: Bioinformatics
Global and Discovery Proteomics Lecture Agenda
Global and Discovery Proteomics Christine A. Jelinek, Ph.D. Johns Hopkins University School of Medicine Department of Pharmacology and Molecular Sciences Middle Atlantic Mass Spectrometry Laboratory Global
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
Linear Sequence Analysis. 3-D Structure Analysis
Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic
THE GENBANK SEQUENCE DATABASE
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
Introduction to GCG and SeqLab
Oxford University Bioinformatics Centre Introduction to GCG and SeqLab 31 July 2001 Oxford University Bioinformatics Centre, 2001 Sir William Dunn School of Pathology South Parks Road Oxford, OX1 3RE Contents
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
ProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
Comparing Methods for Identifying Transcription Factor Target Genes
Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility
Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility Report of a meeting organized by the Wellcome Trust and held on 14 15 January 2003 at Fort Lauderdale,
Committee on WIPO Standards (CWS)
E CWS/1/5 ORIGINAL: ENGLISH DATE: OCTOBER 13, 2010 Committee on WIPO Standards (CWS) First Session Geneva, October 25 to 29, 2010 PROPOSAL FOR THE PREPARATION OF A NEW WIPO STANDARD ON THE PRESENTATION
Special report. Chronic Lymphocytic Leukemia (CLL) Genomic Biology 3020 April 20, 2006
Special report Chronic Lymphocytic Leukemia (CLL) Genomic Biology 3020 April 20, 2006 Gene And Protein The gene that causes the mutation is CCND1 and the protein NP_444284 The mutation deals with the cell
University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology
University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology Programme Structure - the MSc outcome will require 180 credits total (full-time only) - 60
Clone Manager. Getting Started
Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software
Teaching Bioinformatics to Undergraduates
Teaching Bioinformatics to Undergraduates http://www.med.nyu.edu/rcr/asm Stuart M. Brown Research Computing, NYU School of Medicine I. What is Bioinformatics? II. Challenges of teaching bioinformatics
Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices
overview Pipeline Pilot Enterprise Server Pipeline Pilot Enterprise Server (PPES) is a powerful client-server platform that streamlines the integration and analysis of the vast quantities of data flooding
Dr Alexander Henzing
Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander
Big Data in Drug Discovery
Big Data in Drug Discovery David J. Wild Assistant Professor & Director, Cheminformatics Program Indiana University School of Informatics and Computing [email protected] - http://djwild.info Epochs in
Bioinformatics Tools Tutorial Project Gene ID: KRas
Bioinformatics Tools Tutorial Project Gene ID: KRas Bednarski 2011 Original project funded by HHMI Bioinformatics Projects Introduction and Tutorial Purpose of this tutorial Illustrate the link between
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues
Algorithms in Computational Biology (236522) spring 2007 Lecture #1
Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office
Structure and Function of DNA
Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four
An Introduction to Genomics and SAS Scientific Discovery Solutions
An Introduction to Genomics and SAS Scientific Discovery Solutions Dr Karen M Miller Product Manager Bioinformatics SAS EMEA 16.06.03 Copyright 2003, SAS Institute Inc. All rights reserved. 1 Overview!
Bioinformatics: course introduction
Bioinformatics: course introduction Filip Železný Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab http://ida.felk.cvut.cz
P G DIPLOMA IN BIOINFORMATICS
P G DIPLOMA IN BIOINFORMATICS Name Course Code Name of the Course Credits PGD BINF 301 Introduction to Bioinformatics and Databases 2 Module I PGD BINF 302 Genome and Protein Sequence Analysis 2 Basic
Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS
Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary
Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004
Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence
Introduction to Genome Annotation
Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT
REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])
820 REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf]) (See also General Regulations) BMS1 Admission to the Degree To be eligible for admission to the degree of Bachelor
Sequencing the Human Genome
Revised and Updated Edvo-Kit #339 Sequencing the Human Genome 339 Experiment Objective: In this experiment, students will read DNA sequences obtained from automated DNA sequencing techniques. The data
RAST Automated Analysis. What is RAST for?
RAST Automated Analysis Gordon D. Pusch Fellowship for Interpretation of Genomes What is RAST for? RAST is designed to rapidly call and annotate the genes of a complete or essentially complete prokaryotic
Human Genome and Human Genome Project. Louxin Zhang
Human Genome and Human Genome Project Louxin Zhang A Primer to Genomics Cells are the fundamental working units of every living systems. DNA is made of 4 nucleotide bases. The DNA sequence is the particular
Scientific databases. Biological data management
Scientific databases Biological data management The term paper within the framework of the course Principles of Modern Database Systems by Aleksejs Kontijevskis PhD student The Linnaeus Centre for Bioinformatics
Genome Viewing. Module 2. Using Genome Browsers to View Annotation of the Human Genome
Module 2 Genome Viewing Using Genome Browsers to View Annotation of the Human Genome Bert Overduin, Ph.D. PANDA Coordination & Outreach EMBL - European Bioinformatics Institute Wellcome Trust Genome Campus
Biological Sciences Initiative. Human Genome
Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.
Databases and mapping BWA. Samtools
Databases and mapping BWA Samtools FASTQ, SFF, bax.h5 ACE, FASTG FASTA BAM/SAM GFF, BED GenBank/Embl/DDJB many more File formats FASTQ Output format from Illumina and IonTorrent sequencers. Quality scores:
GenBank: A Database of Genetic Sequence Data
GenBank: A Database of Genetic Sequence Data Computer Science 105 Boston University David G. Sullivan, Ph.D. An Explosion of Scientific Data Scientists are generating ever increasing amounts of data. Relevant
Molecular Genetics. RNA, Transcription, & Protein Synthesis
Molecular Genetics RNA, Transcription, & Protein Synthesis Section 1 RNA AND TRANSCRIPTION Objectives Describe the primary functions of RNA Identify how RNA differs from DNA Describe the structure and
