Protein Domain Identification. Doctoral Course, UAM 2008

1 Protein Domain Identification. Doctoral Course, UAM 2008. Luis Sánchez Pulido, Centro Nacional de Biotecnología, Madrid

2 Why do we analyse sequences? Proteins with known sequence: 3D? Function? Both?

3 Data Overload! Database growth by year: www3.ebi.ac.uk/services/dbstats/

4 Thanks to the recognition of homology between proteins, we can TRANSFER INFORMATION: structural and/or functional.

5 Homologues: two proteins with a common ancestor. Depending on the type of divergence, they can be: orthologues (speciation), paralogues (gene duplication) or xenologues (horizontal transfer).

6 INFORMATION TRANSFER. Structural: from homologous proteins of known structure (X-ray, NMR or EM). Functional: from experimentally characterised homologous proteins, or from their genomic or proteomic context.

7 Structure is better conserved than sequence! D'Alfonso G, Tramontano A, Lahm A. Structural conservation in single-domain proteins: implications for homology modeling. J Struct Biol. 134, (2001)

8 A remote homology example: 1NYN, 1P9Q. SBDS family.

9 Defining Remote Homology. [Plot labels: true positives, true negatives.] Rost B. (1999) Twilight zone of protein sequence alignments. Protein Eng. 12:

10 Comparisons between pairs of sequences with known structure. [Plot: identity (100, 50, 20%) against size, with the twilight zone marked; curves from Chothia & Lesk, 1986 and Rost, 1999; legend: RMSD > 3 Å, RMSD < 3 Å.]
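
The identity axis of that plot can be computed directly from a pairwise alignment. The following is a minimal sketch, assuming two already-aligned rows of equal length with "-" for gaps. The function names and the flat 20-35% band used to label the twilight zone are illustrative assumptions; the threshold in Rost (1999) depends on alignment length and is not reproduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, int]:
    """Return (% identity, number of aligned residue pairs) for two gapped rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue (no gap).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs), len(pairs)


def zone(identity: float, low: float = 20.0, high: float = 35.0) -> str:
    """Label an identity value; the cut-offs are illustrative, not Rost's curve."""
    if identity >= high:
        return "safe zone"
    if identity >= low:
        return "twilight zone"
    return "below twilight zone"
```

Reporting the number of aligned pairs next to the identity matters because a given percentage is far less convincing over a short alignment than over a long one.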

11 INFORMATION TRANSFER. Structural: from homologous proteins of known structure (X-ray, NMR or EM). Functional: from experimentally characterised homologous proteins, or from their genomic or proteomic context.

12 FUNCTION? These are homologous proteins... Their role in the cell is very different. But... all of them bind GTP.

13 Depending on the definition of function

14 The transfer of structural and/or functional information between homologous proteins is a complex task. How is it done? Divide each of the tasks into as many parts as necessary to resolve the problem.

15 Domain Definition. From a structural point of view, domains are described as structurally compact units, locally independent in function and folding, and usually characterized by a well-defined hydrophobic core. From a sequence analysis point of view, we describe domains as evolutionarily conserved regions that are present in different protein families of diverse architecture. [Figure label: Hypothetical Domain]

17 REPEATS: at the limits of domain definition. LRR, HEAT, TPR, PFTA, beta l WD40. Protein repeats. Short specialist review for the Encyclopedia of Genomics, Proteomics, and Bioinf. Ed. Wiley and Sons Ltd., UK. Courtesy of: Perez-Iratxeta C, Andrade MA (2005).

18 Protein irregularities that hinder sequence analysis: low-complexity regions; repeats, transmembrane and coiled-coil regions (high mutation rates); and fold irregularities, such as circular permutations and insertions.
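
Low-complexity regions are usually masked before a database search. The following is a rough sketch of one way to flag them, using the Shannon entropy of residue composition in a sliding window. The window size and entropy cut-off are arbitrary assumptions chosen for illustration; dedicated tools such as SEG use more careful statistics.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(seq: str, window: int = 12, max_entropy: float = 2.2) -> list[int]:
    """Return 0-based start positions of windows whose entropy is <= max_entropy."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if shannon_entropy(seq[start:start + window]) <= max_entropy
    ]
```

A window made of a single residue type has entropy 0, while a window of 12 different residues has entropy log2(12), about 3.58 bits, so lower values indicate more biased composition.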

19 The role of domains in protein evolution: shuffling, accretion and supra-domains. VERSATILITY!

20 METHODS FOR DOMAIN-ORIENTED SEQUENCE ANALYSIS

21 Starting point: common name, ID, ACC or GI, looked up in a reference database (SRS at EBI, Entrez at NCBI). Query types: sequence (MRTSRGH...), alignment, profile. Search by comparing: sequence against sequences (BLAST); sequences against profiles (Pfam); profile against sequences (PSI-BLAST or HMMer); profile against profiles (HHpred).
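
The four search strategies on this slide amount to a small lookup table. The following sketch encodes only the pairings named on the slide; the function name and the string keys are assumptions made for illustration.

```python
# Pairings taken from the slide: (query type, target type) -> tool named there.
SEARCH_TOOLS = {
    ("sequence", "sequences"): "BLAST",
    ("sequence", "profiles"): "Pfam",
    ("profile", "sequences"): "PSI-BLAST or HMMer",
    ("profile", "profiles"): "HHpred",
}


def suggest_tool(query: str, target: str) -> str:
    """Look up the tool the slide pairs with a query/target combination."""
    try:
        return SEARCH_TOOLS[(query, target)]
    except KeyError:
        raise ValueError(f"no tool listed for {query!r} against {target!r}") from None
```

For instance, `suggest_tool("profile", "profiles")` returns `"HHpred"`, the most sensitive of the four comparisons listed.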

30 Detection of homologous protein sequences. Profiles (PSSMs: PSI-BLAST, HMMs). Reciprocal!
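
To make the idea of a profile concrete, here is a minimal sketch of a position-specific scoring matrix (PSSM) built from an ungapped alignment. It assumes a uniform background frequency and a simple additive pseudocount, which is much cruder than what PSI-BLAST or an HMM does; the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds (base 2) score per residue per column, against a uniform background."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        column = [row[col] for row in alignment]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from scoring minus infinity.
            freq = (column.count(aa) + pseudocount) / denom
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum the column scores for a segment of the same length as the profile."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

Unlike a single query sequence, the matrix rewards residues that are conserved across the whole family at each position, which is why profile searches reach more distant homologues.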

38 Two main characteristics: combining multiple alignment methods AND mixing heterogeneous information.

39 Combining Multiple Alignment Methods: Clustal W, T-Coffee, Probcons, Specialist, Muscle. T-Coffee. Multiple Sequence Alignment. Copyright Cédric Notredame, 2000, all rights reserved.

40 Mixing Heterogeneous Information: Local Alignment, Global Alignment, Multiple Alignment, Specialist, Structural. T-Coffee. Multiple Sequence Alignment.

41 During this tedious work of sequence-by-sequence compilation: a lot of BLAST searches against different sequence databases (ESTs, genomic DNA and protein); EST clustering and transcript reconstruction; including gene finding (FGENESH+)... We must have faith that every new sequence we add to the alignment could cause a "phase transition" between a profile that does not detect remote homologous proteins and one that does... with known function!

42 Admiring the amazing diversity of life. GenBank, dbEST.

43 Basic References: Zuckerkandl E, Pauling L. (1965) Evolutionary divergence and convergence in proteins. Evolving Genes and Proteins, Academic Press, New York,
Questions:

44 Starting point: common name, ID, ACC or GI, looked up in a reference database (SRS at EBI, Entrez at NCBI, UNIPROT). Query types: sequence (MRTSRGH...), alignment, profile. Search by comparing: sequence against sequences (BLAST); sequences against profiles (Pfam); profile against sequences (PSI-BLAST or HMMer); profile against profiles (HHpred).

45 Detection of homologous protein sequences. Profiles (PSSMs: PSI-BLAST, HMMs). Reciprocal!
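
One reading of "Reciprocal!" is that a match is more trustworthy when it is found in both directions: a search started from family A finds B, and a search started from B finds A. The sketch below checks that condition on precomputed results. The input format (a dictionary from query name to the set of names it detected) and the function name are assumptions for illustration, not the output of any particular tool.

```python
def reciprocal_pairs(hits: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return the pairs (a, b), sorted by name, that detect each other."""
    pairs = set()
    for query, found in hits.items():
        for target in found:
            # Keep the pair only if the reverse search also recovers the query.
            if target != query and query in hits.get(target, set()):
                pairs.add(tuple(sorted((query, target))))
    return pairs
```

With `hits = {"A": {"B", "C"}, "B": {"A"}, "C": set()}`, only A and B are reported, because the search started from C does not recover A.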

47 An easier way to view a protein sequence... thanks to DOMAIN databases

50 Pfam: Ig CLAN, Family

51 Pfam: Ig CLAN, Family

57 STRING UPDATE (Version 7.0), available 1st January 2007. What is STRING? More genomes: 0.7 million proteins in 179 species; 1.5 million proteins in 373 species. Core and peripheral. STRING & SMART. More databases: MINT, HPRD, BioGRID, DIP and Reactome. New display: AJAX.

65 Domain Oriented Sequence Analysis Flow-Chart. Elements: Sequence; Sequence DataBases (GenBank, dbEST); HMMer; ALIGNMENT; HHpred; Domain Databases; DOMAIN; Hypothetical Domain; Biochemical Knowledge; Functional Hypothesis. How Do the Pieces of the Functional Assignment Puzzle Fit Together? "As you will never be sure which are the right problems to work on, most of the time that you spend in the laboratory or at your desk will be wasted. If you want to be creative, then you will have to get used to spending most of your time not being creative, to being becalmed on the ocean of scientific knowledge." Steven Weinberg

66 REAL-LIFE EXAMPLES

67 SPOC: a widely distributed domain associated with cancer, apoptosis and transcription. Sanchez-Pulido L, Rojas AM, Van Wely K, Martinez-A C, Valencia A. CNB-CSIC. DATF1_HUMAN (DIDO 1): TFS2M, PHD, dphd, SPOC.

68 Domain located in at least two architectures
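
The criterion on this slide, a conserved region that appears in at least two different domain architectures, can be checked mechanically once architectures are written down as ordered lists of domain names. The following is a small sketch; the function names and the input format are assumptions, and any protein used to exercise it other than DATF1_HUMAN would be hypothetical.

```python
from collections import defaultdict


def architectures_per_domain(
    proteins: dict[str, tuple[str, ...]]
) -> dict[str, set[tuple[str, ...]]]:
    """Map each domain name to the set of distinct architectures containing it."""
    by_domain = defaultdict(set)
    for architecture in proteins.values():
        for domain in set(architecture):
            by_domain[domain].add(architecture)
    return dict(by_domain)


def shared_domains(proteins: dict[str, tuple[str, ...]], min_architectures: int = 2) -> list[str]:
    """List the domains found in at least min_architectures distinct architectures."""
    return sorted(
        domain
        for domain, archs in architectures_per_domain(proteins).items()
        if len(archs) >= min_architectures
    )
```

Given `{"DATF1_HUMAN": ("TFS2M", "PHD", "SPOC"), "PROTEIN_X": ("RRM", "SPOC")}`, where PROTEIN_X and its RRM domain are made up for the example, `shared_domains` returns `["SPOC"]`.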

70 Gas1 is related to the GFRα family and regulates Ret signaling. Cabrera J.R., Sánchez-Pulido L., Rojas A.M., Valencia A., Mañes S., Naranjo J.R. & Mellstrom B. (2005) CNB-CSIC. Initial protein: GAS1 (Growth Arrest Specific 1). FUNCTION: regulation of apoptotic processes. At first approach: signal peptide, internal duplication, GPI anchor.

71 GAS1 & GFRα: model structure

73 Similar architecture

74 J.R. Cabrera, CNB-CSIC. BIBLIOGRAPHY. Different trophic factors or ligands: GDNF (Glial cell Derived Neurotrophic Factor), NRTN (Neurturin), ARTN (Artemin), PSPN (Persephin). The GFRα family converges in signal transduction through its interaction with the kinase Ret. And GAS1...?

75 Computational predictions supported by experimental analysis.

76 DTRGHYFASSTNDR???????????? Jorge Ruben, Santos & Ana

77 MAL protein family. MARVEL: a conserved domain involved in membrane apposition events. Sánchez-Pulido L., Martín-Belmonte F., Valencia A. & Alonso M.A. (2002) CNB-CSIC & CBM-CSIC. Initial protein: MAL. Function: membrane traffic. Apical zone of polarized epithelial cells, trans-Golgi, endosomes, membrane.

78 Diverse... but has COMMON functional elements. Rafts = cholesterol + sphingolipids. Zonula occludens. Similar phenotypes in overexpression. CONCLUSION: the MARVEL domain could be used as machinery for raft organization in membrane apposition events, such as those occurring during biogenesis of transport vesicles (e.g. MAL, physins and gyrins) or the formation of specialized close contacts (kisses) in tight junctions (e.g. occludin). Christoph Thiele, Matthew J. Hannah, Falk Fahrenholz & Wieland B. Huttner. Cholesterol binds to synaptophysin and is required for biogenesis of synaptic vesicles. Nature Cell Biology 2, (2000)

79 ?????????????????? SER, RER, endocytosis, lysosomes, secretion, vesicle, Golgi. HELP in experimental characterization

80 p40 rtt45 Tinton23 p23 Rr78 SinChan2 ppr..... Essential genes in myoblast fusion. MARVEL. DTRGHYFASSTNDR???????????? HELP in experimental characterization

81 CONCLUSION: Every protein is a WORLD of its own! "As a general guide to functional annotation, it should be kept in mind that current methods for genome analysis, even the most powerful and sophisticated of them, facilitate, but do not supplant the work of a human expert." Eugene Koonin.

82 "Of all natural systems, living matter is the one which, in the face of great transformations, preserves inscribed in its organization the largest amount of its own past history. Using Hegel's expression, we may say that there is no other system that is better 'aufgehoben' (constantly abolished and simultaneously preserved). We may ask the questions where in the now living systems the greatest amount of their past history has survived and how it can be extracted." Emile Zuckerkandl and Linus Pauling, 1965. Molecules as documents of evolutionary history. J Theor Biol. 8: Evolutionary divergence and convergence in proteins. Evolving Genes and Proteins, Academic Press, New York, [Slide label: Bioinformatics]

83 Basic References:
Zuckerkandl E, Pauling L. (1965) Evolutionary divergence and convergence in proteins. Evolving Genes and Proteins, Academic Press, New York,
Bork P, Gibson TJ. (1996) Applying motif and profile searches. Methods Enzymol. 266:
Iyer LM, Aravind L, Bork P, Hofmann K, Mushegian AR, Zhulin IB, Koonin EV. (2001) Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences. Genome Biol. 2:RESEARCH0051.
Questions:

Linear Sequence Analysis. 3-D Structure Analysis

Linear Sequence Analysis. 3-D Structure Analysis Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical properties Molecular weight (MW), isoelectric point (pi), amino acid content, hydropathy (hydrophilic

More information

EMBL-EBI Web Services

EMBL-EBI Web Services EMBL-EBI Web Services Rodrigo Lopez Head of the External Services Team SME Workshop Piemonte 2011 EBI is an Outstation of the European Molecular Biology Laboratory. Summary Introduction The JDispatcher

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

A new type of Hidden Markov Models to predict complex domain architecture in protein sequences

A new type of Hidden Markov Models to predict complex domain architecture in protein sequences A new type of Hidden Markov Models to predict complex domain architecture in protein sequences Raluca Uricaru, Laurent Bréhélin and Eric Rivals LIRMM, CNRS Université de Montpellier 2 14 Juin 2007 Raluca

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

Guide for Bioinformatics Project Module 3

Guide for Bioinformatics Project Module 3 Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

More information

ID of alternative translational initiation events. Description of gene function Reference of NCBI database access and relative literatures

ID of alternative translational initiation events. Description of gene function Reference of NCBI database access and relative literatures Data resource: In this database, 650 alternatively translated variants assigned to a total of 300 genes are contained. These database records of alternative translational initiation have been collected

More information

CDD: a curated Entrez database of conserved domain alignments

CDD: a curated Entrez database of conserved domain alignments # 2003 Oxford University Press Nucleic Acids Research, 2003, Vol. 31, No. 1 383 387 DOI: 10.1093/nar/gkg087 CDD: a curated Entrez database of conserved domain alignments Aron Marchler-Bauer*, John B. Anderson,

More information

Sequence Information. Sequence information. Good web sites. Sequence information. Sequence. Sequence

Sequence Information. Sequence information. Good web sites. Sequence information. Sequence. Sequence Sequence information Multiple Pair-wise SRS Entrez Comparisons Database searches Sequence Information Orthologue clusters Sequence Organell localisation Patterns Protein families Membrane attachment Bengt

More information

BIOINFORMATICS TUTORIAL

BIOINFORMATICS TUTORIAL Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

The Integrated Microbial Genomes (IMG) System: A Case Study in Biological Data Management

The Integrated Microbial Genomes (IMG) System: A Case Study in Biological Data Management The Integrated Microbial Genomes (IMG) System: A Case Study in Biological Data Management Victor M. Markowitz 1, Frank Korzeniewski 1, Krishna Palaniappan 1, Ernest Szeto 1, Natalia Ivanova 2, and Nikos

More information

UGENE Quick Start Guide

UGENE Quick Start Guide Quick Start Guide This document contains a quick introduction to UGENE. For more detailed information, you can find the UGENE User Manual and other special manuals in project website: http://ugene.unipro.ru.

More information

Discovering Bioinformatics

Discovering Bioinformatics Discovering Bioinformatics Sami Khuri Natascha Khuri Alexander Picker Aidan Budd Sophie Chabanis-Davidson Julia Willingale-Theune English version ELLS European Learning Laboratory for the Life Sciences

More information

Biological Databases and Protein Sequence Analysis

Biological Databases and Protein Sequence Analysis Biological Databases and Protein Sequence Analysis Introduction M. Madan Babu, Center for Biotechnology, Anna University, Chennai 25, India Bioinformatics is the application of Information technology to

More information

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:

More information

Sequence homology search tools on the world wide web

Sequence homology search tools on the world wide web 44 Sequence Homology Search Tools Sequence homology search tools on the world wide web Ian Holmes Berkeley Drosophila Genome Project, Berkeley, CA email: ihh@fruitfly.org Introduction Sequence homology

More information

Department of Microbiology, University of Washington

Department of Microbiology, University of Washington The Bioverse: An object-oriented genomic database and webserver written in Python Jason McDermott and Ram Samudrala Department of Microbiology, University of Washington mcdermottj@compbio.washington.edu

More information

200630 - FBIO - Fundations of Bioinformatics

200630 - FBIO - Fundations of Bioinformatics Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND

More information

Genome Explorer For Comparative Genome Analysis

Genome Explorer For Comparative Genome Analysis Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence

More information

BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS

BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title: Bioinformatics

More information

Teaching Bioinformatics to Undergraduates

Teaching Bioinformatics to Undergraduates Teaching Bioinformatics to Undergraduates http://www.med.nyu.edu/rcr/asm Stuart M. Brown Research Computing, NYU School of Medicine I. What is Bioinformatics? II. Challenges of teaching bioinformatics

More information

An agent-based layered middleware as tool integration

An agent-based layered middleware as tool integration An agent-based layered middleware as tool integration Flavio Corradini Leonardo Mariani Emanuela Merelli University of L Aquila University of Milano University of Camerino ITALY ITALY ITALY Helsinki FSE/ESEC

More information

P G DIPLOMA IN BIOINFORMATICS

P G DIPLOMA IN BIOINFORMATICS P G DIPLOMA IN BIOINFORMATICS Name Course Code Name of the Course Credits PGD BINF 301 Introduction to Bioinformatics and Databases 2 Module I PGD BINF 302 Genome and Protein Sequence Analysis 2 Basic

More information

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper

More information

Molecular Databases and Tools

Molecular Databases and Tools NWeHealth, The University of Manchester Molecular Databases and Tools Afternoon Session: NCBI/EBI resources, pairwise alignment, BLAST, multiple sequence alignment and primer finding. Dr. Georgina Moulton

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

More information

Cloud Ready for Bioinformatics?

Cloud Ready for Bioinformatics? IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001) Cloud Ready for Bioinformatics?

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

Biological Sequence Data Formats

Biological Sequence Data Formats Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA

More information

Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison

Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

The Galaxy workflow. George Magklaras PhD RHCE

The Galaxy workflow. George Magklaras PhD RHCE The Galaxy workflow George Magklaras PhD RHCE Biotechnology Center of Oslo & The Norwegian Center of Molecular Medicine University of Oslo, Norway http://www.biotek.uio.no http://www.ncmm.uio.no http://www.no.embnet.org

More information

A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model

A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model G.R. Fernandes¹*, D.V.C. Barbosa¹*, F. Prosdocimi¹, I.A. Pena¹, L. Santana-Santos¹,

More information

THE UNIVERSITY OF MANCHESTER Unit Specification

THE UNIVERSITY OF MANCHESTER Unit Specification 1. GENERAL INFORMATION Title Unit code Credit rating 15 Level 7 Contact hours 30 Other Scheduled teaching and learning activities* Pre-requisite units Co-requisite units School responsible Member of staff

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

Importance of Statistics in creating high dimensional data

Importance of Statistics in creating high dimensional data Importance of Statistics in creating high dimensional data Hemant K. Tiwari, PhD Section on Statistical Genetics Department of Biostatistics University of Alabama at Birmingham History of Genomic Data

More information

Geneious 4.0.2. Biomatters Ltd

Geneious 4.0.2. Biomatters Ltd Geneious 4.0.2 Biomatters Ltd 17th September 2008 2 Contents 1 Getting Started 7 1.1 Downloading & Installing Geneious.......................... 7 1.2 Using Geneious for the first time............................

More information

Integration of data management and analysis for genome research

Integration of data management and analysis for genome research Integration of data management and analysis for genome research Volker Brendel Deparment of Zoology & Genetics and Department of Statistics Iowa State University 2112 Molecular Biology Building Ames, Iowa

More information

Protein annotation and modelling servers at University College London

Protein annotation and modelling servers at University College London Nucleic Acids Research Advance Access published May 27, 2010 Nucleic Acids Research, 2010, 1 6 doi:10.1093/nar/gkq427 Protein annotation and modelling servers at University College London D. W. A. Buchan*,

More information

GenBank: A Database of Genetic Sequence Data

GenBank: A Database of Genetic Sequence Data GenBank: A Database of Genetic Sequence Data Computer Science 105 Boston University David G. Sullivan, Ph.D. An Explosion of Scientific Data Scientists are generating ever increasing amounts of data. Relevant

More information

Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane www.ebi.ac.uk

Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane www.ebi.ac.uk Three data delivery cases for EMBL- EBI s Embassy Guy Cochrane www.ebi.ac.uk EMBL European Bioinformatics Institute Genes, genomes & variation European Nucleotide Archive 1000 Genomes Ensembl Ensembl Genomes

More information

ClusterControl: A Web Interface for Distributing and Monitoring Bioinformatics Applications on a Linux Cluster

ClusterControl: A Web Interface for Distributing and Monitoring Bioinformatics Applications on a Linux Cluster Bioinformatics Advance Access published January 29, 2004 ClusterControl: A Web Interface for Distributing and Monitoring Bioinformatics Applications on a Linux Cluster Gernot Stocker, Dietmar Rieder, and

More information

AP Biology Essential Knowledge Student Diagnostic

AP Biology Essential Knowledge Student Diagnostic AP Biology Essential Knowledge Student Diagnostic Background The Essential Knowledge statements provided in the AP Biology Curriculum Framework are scientific claims describing phenomenon occurring in

More information

Move to Usability SOA Arquitecture: Undo Process Implementation

Move to Usability SOA Arquitecture: Undo Process Implementation Move to Usability SOA Arquitecture: Undo Process Implementation Hernan Merlino, Oscar Dieste, Patricia Pesado, and Ramon Garcia-Martinez Abstract This work is a new stage of an investigation in usability

More information

Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the Learning Outcomes

Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the Learning Outcomes Department/Academic Unit: DBMS/Graduate Program in Biochemistry Degree Program: MSc Degree Level Expectations, Learning Outcomes, Indicators of Achievement and the Program Requirements that Support the

More information

Core Bioinformatics. Degree Type Year Semester

Core Bioinformatics. Degree Type Year Semester Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of

More information

Global and Discovery Proteomics Lecture Agenda

Global and Discovery Proteomics Lecture Agenda Global and Discovery Proteomics Christine A. Jelinek, Ph.D. Johns Hopkins University School of Medicine Department of Pharmacology and Molecular Sciences Middle Atlantic Mass Spectrometry Laboratory Global

More information

MAKING AN EVOLUTIONARY TREE

MAKING AN EVOLUTIONARY TREE Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities

More information

Secondary Structure Prediction. Michael Tress CNIO

Secondary Structure Prediction. Michael Tress CNIO Secondary Structure Prediction Michael Tress CNIO Why do we Need to Know About Secondary Structure? Secondary structure prediction is a step towards deducing the fold. In order to arrive at the correct

More information

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Tutorial Module 5 BioMart You will learn about BioMart, a joint project developed and maintained at EBI and OiCR www.biomart.org How to use BioMart to quickly obtain lists of gene information from Ensembl

More information

investigation 3 Comparing DNA Sequences to
