Genome Viewing. Module 2. Using Genome Browsers to View Annotation of the Human Genome

Module 2: Genome Viewing
Using Genome Browsers to View Annotation of the Human Genome
Bert Overduin, Ph.D.
PANDA Coordination & Outreach
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Why Genome Browsers?
- Browse genes in their genomic context
- Display features in and around a particular gene
- Explore larger chromosomal regions
- Search and retrieve information on a genome-wide scale
- Compare genomes

Genome Browsers
- Ensembl Genome Browser: http://www.ensembl.org
- NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview
- UCSC Genome Browser: http://genome.ucsc.edu

What Distinguishes Ensembl?
- Automatic annotation for those species for which no manually annotated gene sets exist
- Data mining tool BioMart
- Direct database access and programmatic access via the Perl API
- Not only the data, but also the software code is open source

Ensembl - Organisation
- Joint project between the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI)
- Started in 1999 for the Human Genome Project
- Funded primarily by the Wellcome Trust, with additional funding by EMBL, NIH-NHGRI, NIH-NIAID, BBSRC, MRC and EU
- Team of ca. 50 people, led by Ewan Birney (EBI) and Tim Hubbard (WTSI)

Ensembl - Species
- 48 chordates, ranging from human to two Ciona species
- 3 key eukaryote model organisms: Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae

http://www.ensemblgenomes.org
- Aedes aegypti, Anopheles gambiae, Culex quinquefasciatus, 12 Drosophila species, 5 Caenorhabditis species, Ixodes scapularis
- Plasmodium falciparum, Plasmodium knowlesi, Plasmodium vivax
- Bacillus, Escherichia/Shigella, Mycobacterium, Neisseria, Pyrococcus, Staphylococcus, Streptococcus
- Arabidopsis lyrata, Arabidopsis thaliana, Brachypodium distachyon, Oryza sativa, Oryza sativa indica group, Populus trichocarpa, Sorghum bicolor, Vitis vinifera
- 7 Aspergillus species, Neosartorya fischeri, Saccharomyces cerevisiae, Schizosaccharomyces pombe

Ensembl - Data
- Genomic sequence
- Gene / transcript / protein models
- External references
- Mapped cDNAs, proteins, microarray probes, BAC clones, cytogenetic bands, repeats, markers, etc.
- Comparative data: orthologs and paralogs, protein families, whole genome alignments, syntenic regions
- Variation data: SNPs
- Regulatory data: "best guess" set of regulatory elements
- Externally stored data (Distributed Annotation System)

Ensembl Gene Models
- Automatically annotated genes for the whole genome of all species ("Ensembl genes")
- Manually annotated genes for part of the human and mouse genome ("Vega/Havana genes")

Biological Evidence
All Ensembl gene models are based on evidence from:
- UniProtKB/Swiss-Prot: proteins, manually curated
- NCBI RefSeq: proteins and mRNAs, partially manually curated
- UniProtKB/TrEMBL: translations of EMBL-Bank CDSs, automatically annotated
- EMBL-Bank / GenBank / DDBJ: primary nucleotide sequence repositories

Ensembl Genebuild
Genome assembly + Experimental evidence + Computer programs

Access to Data
- Release web site: http://www.ensembl.org
- Pre-release web site: http://pre.ensembl.org
- Archive web site: http://archive.ensembl.org
- BioMart: http://www.ensembl.org/biomart/martview and http://www.biomart.org/biomart/martview
- FTP site: ftp://ftp.ensembl.org
- Amazon Web Services: http://aws.amazon.com/publicdatasets
- MySQL: http://www.ensembl.org/info/data/mysql.html
- Perl API: http://www.ensembl.org/info/data/api.html

Ensembl Stable Identifiers
Human:
- ENSG###########: Ensembl Gene ID
- ENST###########: Ensembl Transcript ID
- ENSP###########: Ensembl Protein ID
- ENSE###########: Ensembl Exon ID
- ENSR###########: Ensembl Regulatory Feature ID
- ENSSNP###########: Ensembl SNP ID
- ENSFM###########: Ensembl Protein Family ID
Other species have a species code inserted after "ENS":
- ENSMUSG###########: a mouse (Mus musculus) gene
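The identifier scheme above can be checked mechanically. The following is a minimal Python sketch, assuming the layout shown on the slide ("ENS", an optional species code such as "MUS", a feature-type code, then 11 digits). The names parse_stable_id, StableId and FEATURE_TYPES are hypothetical helpers for illustration and are not part of any Ensembl API; identifiers in real releases may deviate from this pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Feature-type codes as listed on the slide (assumed to be exhaustive here).
FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "protein",
    "E": "exon",
    "R": "regulatory feature",
    "SNP": "SNP",
    "FM": "protein family",
}

# "ENS", optional species code (empty for human), type code, 11 digits.
_STABLE_ID = re.compile(r"ENS([A-Z]*?)(FM|SNP|G|T|P|E|R)(\d{11})")


@dataclass
class StableId:
    species_code: Optional[str]  # None means human (no species code)
    feature: str                 # e.g. "gene", "transcript"
    number: str                  # the 11-digit numeric part


def parse_stable_id(identifier: str) -> Optional[StableId]:
    """Split an identifier into its parts; return None if it does not fit."""
    match = _STABLE_ID.fullmatch(identifier)
    if match is None:
        return None
    species, type_code, number = match.groups()
    return StableId(species or None, FEATURE_TYPES[type_code], number)


if __name__ == "__main__":
    for example in ("ENSG00000139618", "ENSMUSG00000000001", "XYZ123"):
        print(example, "->", parse_stable_id(example))
```

The species code is matched non-greedily so that the type code is recognised as early as possible; the requirement of exactly 11 trailing digits is what disambiguates, for example, the "G" in "ENSMUSG" from a letter of the species code.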

Summary
- Genome browsers render the plain sequence more accessible
- Ensembl provides automatic genome annotation, yet is strongly based on experimental evidence from protein and cDNA sequences in public databases
- Ensembl heavily links to data sets from other species, as well as to external resources

Data Mining Ensembl with BioMart

BioMart
- Joint project between the European Bioinformatics Institute (EBI) and the Ontario Institute for Cancer Research (OICR)
- Originally developed for Ensembl (EnsMart)
- Website: http://www.biomart.org

Publicly Available Marts
Ensembl, Ensembl Bacteria, Ensembl Metazoa, Ensembl Protists, Dictybase, Wormbase, Gramene, Europhenome, UniProt, InterPro, HGNC, Rat Genome Database, DroSpeGe, ArrayExpress DW, Eurexpress, HapMap, GermOnLine, PRIDE, PepSeeker, VectorBase, HTGT, Pancreatic Expression Database, Reactome, EU Rat Mart, Paramecium DB, International Potato Center (CIP)
Central portal: http://www.biomart.org/biomart/martview

BioMart - Principle
- Step 1, Dataset: choose your dataset and species
- Step 2, Filters: limit your dataset
- Step 3, Attributes: specify what information you want to output
- Step 4, Results: preview and output your results
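The first three steps map directly onto the XML query document that BioMart accepts. The sketch below, in Python, only assembles such a query and does not contact any server. The dataset name "hsapiens_gene_ensembl", the filter "chromosome_name" and the attributes "ensembl_gene_id" and "external_gene_name" are assumed, illustrative names that should be checked against the mart actually in use. The functions build_query and to_xml are hypothetical helpers, not part of BioMart.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Assemble a BioMart-style query: one dataset, its filters and attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated results
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Step 1: choose the dataset (which also fixes the species).
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: filters limit which records are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Step 3: attributes select the output columns.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return query


def to_xml(query):
    """Serialise the query with the XML prolog placed before the Query element."""
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Assumed example: gene IDs and names for genes on human chromosome 21.
    q = build_query(
        dataset="hsapiens_gene_ensembl",
        filters={"chromosome_name": "21"},
        attributes=["ensembl_gene_id", "external_gene_name"],
    )
    print(to_xml(q))
```

Step 4 (Results) corresponds to submitting this document to a BioMart server and reading back the tab-separated output, which is deliberately left out here because it depends on the server and release being queried.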

Summary
- BioMart is a highly flexible tool for data mining
- Queries are defined in just 4 steps: Dataset, Filters, Attributes and Results
- Genomic regions, gene identifiers, Gene Ontology terms and many other sources of information can serve as filters
- BioMart heavily links to data sets within Ensembl and provides links to external resources

Help
- Helpdesk: helpdesk@ensembl.org
- Mailing lists: ensembl-dev@ebi.ac.uk, ensembl-announce@ebi.ac.uk
- Blog: http://ensembl.blogspot.com
- YouTube channel: http://www.youtube.com/user/ensemblhelpdesk