Tasks Monday January 21st 2006

Goals:
- To work with public databases on the internet to find gene and protein information.
- To use tools to analyse and compare DNA sequences.
- To find homologous sequences in other organisms and to learn the concept of orthologs and paralogs.
- To make a phylogenetic analysis using ClustalW.
- To analyse genome sequences from multiple organisms using VISTA.

We will make use of public DNA databases (NCBI and UCSC) and protein databases (EBI). The underlying information in the various databases is mostly identical, but visualisation and search options as well as annotation may vary.

The tasks contain questions. Please answer these questions in groups of two and write a short report.

Task 1: Homologs of the E. coli photolyase gene

The bacterium E. coli can repair UV-induced DNA damage. UV light can result in the formation of cyclobutane-thymidine dimers. The enzyme photolyase can repair this damage, but it needs visible light to be activated. The energy of a photon is absorbed by the enzyme and used by FADH to free an electron needed for repair of the DNA damage.

In this task you will search for the E. coli K12 photolyase gene and protein, and you will try to find and compare homologs in model organisms from other 'kingdoms'. You will collect information on these homologs (e.g. protein size, protein domains present). Using this information, you will try to work out the likely evolution of this gene and how it arose in various organisms.

Find the amino acid sequence of the E. coli photolyase protein at NCBI. Go to http://www.ncbi.nih.gov/ and search all databases for photolyase. Find the protein sequence whose accession starts with NP_, which means that it is the reference sequence for a given organism. How many amino acids does the protein consist of?

In the pull-down menu 'display' you can select another view. Select 'FASTA' for a short version of the sequence without extensive annotation. Copy the sequence, including its description line, which is preceded by ">", and paste it into your notepad. Save this file for later use.

As depicted in the picture above, the protein contains two important activities. We will now analyse the protein for known 'domains' using the program "Interproscan" (http://www.ebi.ac.uk/interproscan/). Copy the protein sequence into this screen and start the analysis. Which two large protein domains are found in the E. coli photolyase protein? What are the functions of these two domains?

To find homologs in other organisms you can choose to use the complete protein or one of the two domains. We will first search for homologs in the single-celled organism baker's yeast, Saccharomyces cerevisiae. Copy the E. coli photolyase amino acid sequence and find homologs in yeast using Blast (http://www.ncbi.nlm.nih.gov/blast/). Which Blast program would be best suited for this task? Paste the protein sequence in the 'search' window and limit your search to "Saccharomyces cerevisiae" in the options panel. After you have started the Blast comparison, a new screen will pop up, again showing the two domains present in the photolyase protein. Hit the 'format' button to go to the results.
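
Before collecting more sequences, the FASTA file saved earlier in this task can be checked with a short script that answers the "how many amino acids" question for every record. This is only a minimal sketch: the function name `read_fasta` and the two records in `example_text` are made up for illustration and are not real photolyase sequences. Replace `example_text` with the contents of your own notepad file.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict: description line -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Made-up toy records (assumption), NOT real database entries.
example_text = """>toy_protein_1 made-up example
MKTAYIAKQR
QISFVK
>toy_protein_2 made-up example
MSLNE
"""

records = read_fasta(example_text)
for description, sequence in records.items():
    print(description, len(sequence), "amino acids")
```

Counting residues this way should give the same length as the one shown in the NCBI record, which is a quick check that nothing was lost while copying.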
Click on the best hit, preferably again an 'NP_xxxxx' sequence, then retrieve this sequence in FASTA format and copy it to the notepad file that holds the E. coli sequence. Blast also returns an alignment of the E. coli and yeast protein sequences, but this is a local alignment that only shows those parts that match best. In this case, homology information for the start and end of the protein is missing. To make a global alignment, we will use the program Align (http://www.ebi.ac.uk/emboss/align/index.html). The alignment method is set to global alignment by default. Paste the E. coli and yeast protein sequences in the two input fields. Hit 'run' to see the aligned output. What is the most striking difference between the two sequences?

This part of the protein may play a role in the subcellular targeting of the protein. Go to the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) and search for "Phr1", which is the yeast name for this protein. What is the subcellular localisation of the protein? Could this be expected? Assuming that the domain that is present in yeast and not in E. coli is responsible for this localisation, why can E. coli do without this segment?

Now go back and try to find other homologs of the E. coli photolyase in eukaryotes. Include homologs in human, mouse and a plant, and in some organisms of your own choice. Copy all sequences in FASTA format (preferably sequences with NP_xxxxx names) to a new notepad file. Note that organisms may contain multiple different homologs! Collect all of them.

Once you have a good collection of sequences, we will compare them with each other using the multiple sequence alignment program ClustalW (http://www.ebi.ac.uk/clustalw/). Read the Frequently Asked Questions for more background on this tool. At the bottom of the page you will find the 'Upload a file' field. Select your saved notepad file and run the program. Discuss your findings in your report.

You can improve your alignment by removing distantly related sequences. Delete these sequences (e.g. E. coli) from your notepad file and reanalyse your sequences. The human and mouse genomes both contain two clear photolyase homologs: cryptochrome 1 and 2. Describe which genes are likely to be orthologs and which are paralogs.
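
The difference between the local alignment returned by Blast and the global alignment made by Align, used earlier in this task, can be illustrated with a small dynamic-programming sketch. The function name `align_score`, the simple scoring scheme (+1 match, -1 mismatch, -1 per gap position) and the two toy peptides are assumptions chosen for illustration. Real alignment programs use amino acid substitution matrices and more elaborate gap penalties, so their scores will differ.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b by dynamic programming.

    local=False: global alignment (Needleman-Wunsch), end to end.
    local=True:  local alignment (Smith-Waterman), best-matching segment only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment has to pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Made-up toy peptides (assumption): the second one carries an extra
# N-terminal segment, loosely mimicking a protein with an added targeting signal.
short_protein = "MKTAYIAK"
long_protein = "GGGGGG" + short_protein

local_score = align_score(short_protein, long_protein, local=True)
global_score = align_score(short_protein, long_protein, local=False)
print("local score:", local_score)    # 8: only the shared segment is scored
print("global score:", global_score)  # 2: the six unmatched residues cost six gaps
```

The local score ignores the extra segment, which is why the Blast output alone hides differences at the start and end of the proteins, while the global score is penalised for them.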

Task 2: Comparative genome analysis of the human cry2 locus

From Task 1 you have learnt that you can find protein sequences and identify homologs in other organisms. However, sometimes the protein sequence is not available for a given organism, or it may be questionable whether the gene structure has been predicted properly from the genome sequence. In this task, you will search for homologous regions in mouse, rat, chimpanzee, fugu and other organisms using the comparative genome browser VISTA (http://genome.lbl.gov/vista/index.shtml). You will find various programs on the VISTA home page for specific types of searches.

Go to the VISTA Browser (http://pipeline.lbl.gov/cgi-bin/gateway2) and search in the 'Human July 2003' genome for the human photolyase homolog 'cry2' by entering this term in the position field. You will now see a graphical display of the degree of conservation between the homologous human and mouse genome sequences. Try to understand the figure and its colouring using the legend.

Extend the comparison by adding more organisms using the pull-down menu on the left. Which parts of the gene are clearly conserved in all organisms? Which organisms are best suited for the identification of this kind of conserved region? Which are less suited? Explain why. Which organism would be best suited for finding conserved and potentially functional promoter elements that regulate the expression of this gene?

Zoom out by clicking on the magnification icon with the '-' sign. You will now see a larger genomic region. In the chimpanzee trace you will now see a large gap. What does this mean, and what process underlies it?
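
To get a feeling for what a conservation curve such as the one in the VISTA Browser represents, the sketch below computes percent identity in a sliding window over a pairwise alignment. It is an illustration under stated assumptions, not VISTA's actual implementation: the function name `window_identity`, the window size, the step and the two short aligned toy sequences are all made up.

```python
def window_identity(aligned_a, aligned_b, window=100, step=1):
    """Percent identity per sliding window over two aligned sequences.

    Both inputs must have the same length and use '-' for gaps.
    Returns a list of (window start, percent identity) tuples.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as conserved only if both residues agree and are not gaps.
    matches = [x == y and x != "-" for x, y in zip(aligned_a, aligned_b)]
    profile = []
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        profile.append((start, 100.0 * identical / window))
    return profile


# Made-up toy alignment (assumption): conserved ends, diverged middle.
toy_a = "ACGTACGTAC"
toy_b = "ACGTTTTTAC"
profile = window_identity(toy_a, toy_b, window=4, step=2)
for start, identity in profile:
    print(start, identity)
```

Windows that cover the diverged middle of the toy alignment score low, while windows over the matching ends score high. On a real locus the same kind of profile makes conserved exons stand out from less conserved introns.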

Task 3: Identification of functional genomic elements using phylogenetic shadowing

This morning you read the paper by Boffelli et al. on phylogenetic shadowing. This method is specifically suited to identifying small conserved elements in a genome, or lineage-specific features. For this task you will use the sequences from 10 different primates (FASTA format) from the course webpage.

Align these sequences using ClustalW. What can you conclude from this alignment? Clearly, another approach is needed to extract information from these sequences. Use the eshadow tool (http://eshadow.dcode.org/) to analyse these sequences. Experiment with the window size settings to get a clear view. What is shown in the graph? How many potentially functional segments are present in this region, and what principle underlies this hypothesis? What is the estimated size of each conserved region?
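
The idea behind looking at many closely related sequences at once can be sketched as follows: for every alignment column, count how many sequences deviate from the most common residue, then average this variation over a sliding window. This is a deliberately simplified stand-in, not the statistical model used in the paper or by eshadow. The function names `column_variation` and `smooth`, and the four-sequence toy alignment, are assumptions for illustration only.

```python
from collections import Counter


def column_variation(alignment):
    """Per-column fraction of sequences that differ from the most common residue."""
    n_sequences = len(alignment)
    variation = []
    for column in zip(*alignment):  # iterate over alignment columns
        most_common_count = Counter(column).most_common(1)[0][1]
        variation.append(1 - most_common_count / n_sequences)
    return variation


def smooth(values, window):
    """Average the values over a sliding window to reduce column-to-column noise."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up toy alignment (assumption): the first columns are invariant,
# the last columns vary between sequences.
toy_alignment = ["ACGT", "ACGA", "ACTC", "ACGG"]
variation = column_variation(toy_alignment)
smoothed = smooth(variation, window=2)
print(variation)
print(smoothed)
```

Windows with low variation across all sequences are the candidates for functional elements. Changing `window` has an effect comparable to changing the window size setting in the tool: larger windows give a smoother but less detailed curve.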

Now let's go back to the VISTA homepage and see if we can retrieve the same information using other genomes. Use the GenomeVISTA tool and use the human sequence from your sequence list to search for the genomic coordinates in the human June 2004 genome assembly. Wait until your search is finished and click the 'Vista browser' option. Add all available organisms for comparison. What can you conclude? Which organisms could also be used to identify these individual elements, and which are not very informative? What is the estimated size of each conserved region?

Close the VISTA browser window and select the 'VISTA track' option in the search results window. You are now redirected to the UCSC genome browser, which displays your results along with existing genome annotation. There is another track with conservation information, showing the cumulative conservation using information from 10 different organisms (this is not a pairwise alignment, as you have seen thus far in VISTA, but a graphical representation of a sort of ClustalW multiple alignment). What would you conclude from the 10-way alignment? What is the estimated size of each conserved region?

Under the graph you will find many options that can be displayed as well. Select the 'full' option for the sno/mirna option. Which element(s) reside in the conserved regions? What are their sizes?
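
Several questions in this task ask for the estimated size of each conserved region. If you have a per-position conservation score (for example values read off one of the graphs), region sizes can be estimated by finding runs of positions at or above a threshold. The sketch below shows one way to do this. The function name `conserved_regions`, the threshold, the minimum length and the toy scores are assumptions for illustration and are not values taken from any of the tools.

```python
def conserved_regions(scores, threshold, min_length=1):
    """Find runs of consecutive positions with score >= threshold.

    Returns (start, end, length) tuples with 0-based start and exclusive end.
    Runs shorter than min_length are ignored.
    """
    regions = []
    start = None
    for position, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = position  # a new conserved run begins here
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position, position - start))
            start = None
    # Close a run that extends to the end of the scores.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores), len(scores) - start))
    return regions


# Made-up toy conservation scores between 0 and 1 (assumption).
toy_scores = [0.2, 0.9, 0.95, 0.9, 0.3, 0.8, 0.1, 0.9, 0.9]
regions = conserved_regions(toy_scores, threshold=0.8, min_length=2)
for start, end, length in regions:
    print(f"positions {start}-{end - 1}: {length} long")
```

Note that the estimated sizes depend on the chosen threshold and minimum length, which is one reason why the different tools in this task may suggest somewhat different sizes for the same element.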