Sequence Comparison and Genome Alignment in the Human Genome

Jian Ma
PowerPoint: Casey Hanson

Introduction
The goals of this lab are as follows:
1. Gain experience using BLAST and genome browsers by looking at repeat families in the VHL gene.
2. Become familiar with BLAT and the UCSC website by discovering the identity of a mystery sequence.
3. Visualize pairwise whole-genome alignments and chromosomal rearrangements.
4. View a phylogeny-based multi-genome alignment.
5. Use UCSC tools and Galaxy to intersect annotated functional regions between human and other placental mammals.

Step 0: Download and Extract Data Files
To view and manipulate the files needed for this laboratory exercise, download the following archive:
http://veda.cs.uiuc.edu/compgen2014/labs/06_Comparative_Genomics.zip
Right-click the archive and extract its contents to your course directory. We will use the files found in:
[course_directory]/06_comparative_genomics/data/

BLAST & Genome Browser
In this exercise, we will use BLAST (Basic Local Alignment Search Tool) to search for significant occurrences of a class of transposable elements (TEs) called Short INterspersed Elements (SINEs), specifically of the ALU family, in the well-known VHL tumor suppressor gene. The goal of this exercise is to gain experience using BLAST, particularly blastn, and the UCSC Genome Browser to answer biologically relevant questions.

Step 1A: BLAST VHL in ALU Database
1. Go to the following web page: http://blast.ncbi.nlm.nih.gov/blast.cgi
2. Click nucleotide blast.
3. In the Enter Query Sequence box, paste the accession number for VHL: AF010238
4. In the Database drop-down list, select Human ALU repeat elements (alu_repeats).
5. Click the BLAST button.

Step 1B: BLAST VHL in ALU Database

Step 2A: Interpreting BLAST Results
[Figure: BLAST graphic summary. Labels: coordinates of the VHL gene; color indicates quality of match; very good matches; good matches; okay matches.]
A match is a significant similarity between a region of the query and a region of a database sequence. Lines between boxes indicate gaps between matches in the query sequence. (The next slide has a legend for interpretation.)

Step 2B: Interpreting BLAST Results
Exonic regions are less likely to have ALU repeats. Matches like these are likely to be located in intronic regions. Note the following legend for interpreting a match.
[Figure: VHL gene structure with introns and exons labeled. Legend: excellent match, good match, okay match.]

Step 3A: Examine VHL in UCSC Browser
Let's look at the structure of the VHL gene in a genome browser to verify that ALU elements are confined to the introns.
1. Go to the following web page: http://genome.ucsc.edu/
2. Click Genome Browser.
3. Select the genome Human.
4. In the search term box, type VHL.
5. Click submit.
6. Click the 2nd link: VHL (uc003bvd.3) at chr3:10183319-10195354

Step 3B: Examine VHL in UCSC Browser
1. Enter chr3:10,181,000-10,196,000 into the input box and click go.
2. Right-click on tracks NOT shown below and hide them.
3. Right-click on the RepeatMasker track and click full. It is dense by default.
4. Adjust the zoom until you get a view you are comfortable with.
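The position string entered in Step 3B can also be handled programmatically, for example when scripting several regions. The following is a minimal sketch, not part of the original lab; the function name `parse_position` is hypothetical. It assumes the usual browser convention that position strings are 1-based and include both endpoints.

```python
import re

def parse_position(position):
    """Split a UCSC-style position string into (chrom, start, end).

    Commas in the coordinates are optional, so both
    'chr3:10,181,000-10,196,000' and 'chr3:10181000-10196000' are accepted.
    """
    match = re.fullmatch(r"\s*(\w+):([\d,]+)-([\d,]+)\s*", position)
    if match is None:
        raise ValueError(f"not a position string: {position!r}")
    chrom, start, end = match.groups()
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end

chrom, start, end = parse_position("chr3:10,181,000-10,196,000")
# Assumption: both endpoints are included, so the width is end - start + 1.
print(chrom, start, end, end - start + 1)
```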

Step 3C: Examine VHL in UCSC Browser
The repeats are 3' of the gene, 5' of the gene, or in the intronic regions. This validates our hypothesis. ALUs are not the only family of SINEs located in the intronic regions. What other SINE families does VHL have? What about TE classes other than SINEs? (Answers provided in a separate PDF.)

BLAT
In this exercise, we will use BLAT (BLAST-Like Alignment Tool) to discover the identity of a mystery sequence from the human genome. The goal of this exercise is to gain experience using BLAT and the UCSC Genome Browser to answer biologically relevant questions.

BLAST v. BLAT
BLAST: Can find matches to a query in any set of GenBank sequences. Not limited to a given k-mer size. Consumes a lot of memory. Slow compared to BLAT.
BLAT: Limited to matches to a query in a particular reference genome. Indexes the genome as non-overlapping 11-mers for DNA. The index of an entire genome fits in memory (< 1 GB of RAM). Fast compared to BLAST.
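To make the k-mer point concrete, here is a minimal sketch of seed finding with an index of non-overlapping k-mers. It illustrates the idea only and is not BLAT's actual implementation: the function names and the toy sequences are hypothetical, and k is set to 4 in the example so the data stays readable (the slide above gives k = 11 for DNA).

```python
from collections import defaultdict

def build_index(genome, k=11):
    """Index the genome's non-overlapping k-mers by their start offset."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def find_seeds(query, index, k=11):
    """Look up every overlapping k-mer of the query in the index.

    Returns (query_offset, genome_offset) pairs. Stepping through the query
    one base at a time is what lets a sparse, non-overlapping genome index
    still catch matches at any alignment offset.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits

# Toy data (hypothetical): the query is a substring of the genome at offset 2.
genome = "ACGTTGCAAGGCTTAA"
query = "GTTGCAAG"
index = build_index(genome, k=4)
print(find_seeds(query, index, k=4))
```

Because only every k-th genome position is stored, the index is roughly k times smaller than one holding all overlapping k-mers, which is one way to read the memory claim on the slide. A real aligner would go on to extend and chain these seeds into alignments.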

Step 1A: BLAT the Mystery Sequence
1. Go to the following web page: http://genome.ucsc.edu/
2. Click BLAT.
3. Open our mystery sequence, located below, in Notepad:
[course_directory]/06_comparative_genomics/data/mystery_sequence.txt
4. Paste the sequence into the text area.
5. Click submit.

Step 1B: BLAT the Mystery Sequence
Screenshot of the web form for BLAT.

Step 2A: Identify Mystery Sequence
BLAT will return a list of significant matches in the genome. Investigate the matches in the list by clicking browser for each match. For example, click the first browser link here.

Step 2B: Identify Mystery Sequence
The screenshot below shows UCSC and RefSeq genes aligned to the mystery sequence, in particular CYP2A13. Examine the other matches on the previous slide in the genome browser. Keep in mind 2 questions (answers provided at the end of the document):
A. How many potential genes does the mystery sequence come from?
B. What is the relationship among these genes?

Pairwise Whole Genome Alignments
In this exercise, we will use the UCSC Genome Browser to view whole genome alignments, computed by lastz, of each of the following genomes individually to human: orangutan, mouse, dog, and opossum. We will investigate these alignments to see if we can discover chromosomal rearrangements.

Step 1: Create a Custom UCSC Track
1. Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html
2. Under the My Data tab, click Create Custom Tracks.
3. In the Paste URLs textbox, paste the following (no commas) and click submit:
chr13 58481798 58486558
4. On the next page, click Go to Genome Browser.
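The three whitespace-separated fields pasted in Step 1 form a minimal BED record (chromosome, start, end). The sketch below is an illustration added here, with hypothetical function names. It parses such a line and converts it to a browser position string, assuming the standard BED convention of a 0-based start and an exclusive end.

```python
def parse_bed_line(line):
    """Parse the first three fields of a BED line into (chrom, start, end)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    return fields[0], int(fields[1]), int(fields[2])

def bed_to_position(chrom, start, end):
    """Convert BED coordinates to a 1-based, inclusive position string.

    Assumption: BED start is 0-based and BED end is exclusive, so only the
    start shifts by one when converting.
    """
    return f"{chrom}:{start + 1:,}-{end:,}"

chrom, start, end = parse_bed_line("chr13 58481798 58486558")
print(bed_to_position(chrom, start, end))  # position string for the region
print(end - start)                         # region length in bases
```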

Step 2A: Track Addition
The track should look similar to what is below:

Step 2B: Track Addition and Removal
To get pairwise alignments, we need to turn a few tracks on and one track off. Specifically, we need to select:
1. Primate Chain/Net
2. Placental Chain/Net
3. Vertebrate Chain/Net
Underneath the Comparative Genomics tab, set these tracks to dense. Additionally, set Conservation to hide and click refresh.

Step 2C: Track Addition
The resulting view should look like the figure below. There is one problem: our species of interest are not being displayed.

Step 2D: Species Selection
To select the correct species, go back to the Comparative Genomics tab.
1. Click on the Primate Chain/Net link.
2. In the resulting window, set Chains to hide and make sure only Orangutan is selected.
3. Click submit.

Step 2E: Species Selection Continued
Repeat Step 2D for the other two tracks:
1. Placental Chain/Net
2. Vertebrate Chain/Net
Make sure your configuration resembles the screenshots below.
[Screenshots: Placental Chain/Net and Vertebrate Chain/Net configuration.]

Step 2F: Expand Tracks
On the tracks for each species, right-click and select Full. The resulting Genome Browser view (after moving the tracks to the top) should look like the following:

Step 3: Whole Genome Alignment Analysis
Investigate the tracks for each species and answer the following questions.
A. Are the sequence counterparts co-linear with respect to human? If not, is there evidence of genomic rearrangements in this region? Which kind?
B. Can you infer when these rearrangements happened evolutionarily on the diagram to the right?
(Answers provided in a separate PDF.)
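Question A asks whether the aligned blocks are co-linear. The sketch below shows one simple way to phrase that check in code. It is an illustration under stated assumptions, not how the Chain/Net tracks are computed: each block is reduced to a hypothetical tuple (human_start, other_start, strand), other-species coordinates are assumed to be given on that species' forward strand, and the toy blocks are made up.

```python
def describe_rearrangements(blocks):
    """Walk alignment blocks in human order and report breaks in co-linearity.

    blocks: iterable of (human_start, other_start, strand), strand '+' or '-'.
    Returns a list of (kind, human_start_before, human_start_after) tuples;
    an empty list means the blocks are co-linear.
    """
    ordered = sorted(blocks)  # sort by human_start
    findings = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] != prev[2]:
            # Orientation flips between neighbors: a candidate inversion edge.
            findings.append(("strand switch", prev[0], cur[0]))
        elif cur[2] == "+" and cur[1] < prev[1]:
            # Forward blocks should advance in the other genome too.
            findings.append(("order change", prev[0], cur[0]))
        elif cur[2] == "-" and cur[1] > prev[1]:
            # Reverse blocks should run backwards in the other genome.
            findings.append(("order change", prev[0], cur[0]))
    return findings

# Toy blocks (hypothetical): the middle two are inverted relative to human.
toy = [(100, 1000, "+"), (200, 1100, "+"), (300, 1300, "-"),
       (400, 1200, "-"), (500, 1400, "+")]
for finding in describe_rearrangements(toy):
    print(finding)
```

Two strand switches bracketing a run of reverse-strand blocks is the pattern one would expect from an inversion; a lone order change without a strand switch would point instead to a translocation or duplication.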

Phylogeny Based Whole Genome Alignment
In this exercise, we will use the UCSC Genome Browser to view a refined whole genome alignment of the orangutan, mouse, dog, and opossum genomes to human. This alignment is produced by Multiz, a program that takes pairwise whole genome alignments of many species and uses a phylogenetic tree to improve the alignment.

Step 1: Setup Multiz Visualization
1. Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html
2. Upload the following (no commas) as a custom track and go to the genome browser, as in the previous exercise:
chr20 61733467 61733528
3. Under the Comparative Genomics tab in the genome browser, click on Conservation.
4. Ensure that the settings shown on the next 2 slides are in place.

Step 1B: Setup Multiz Visualization

Step 1C: Setup Multiz Visualization
Once your configuration resembles the last 2 figures, click submit.

Step 2: Multiz Visualization Analysis
After rearranging tracks, the genome browser should resemble the figure below. Investigate the tracks for each species and answer the following questions:
A. Is this region highly conserved in mammals?
B. Look closely at the Multiz track. Do you see anything strange in the human sequence compared to the other species? What could be the reason for this discrepancy?
(Answers provided in a separate PDF.)
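When reading a multiple alignment by eye, as Question B asks, it can help to look for columns where the other species agree with each other but the human base differs. The following is a small sketch of that scan. The function name and the toy alignment are hypothetical; they are not data from the browser and do not reveal the answer to the question.

```python
def reference_specific_columns(alignment, reference="human"):
    """Return the column indices where all other species share one character
    and the reference row has a different one (including a gap, '-')."""
    ref = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    columns = []
    for i, base in enumerate(ref):
        column = {seq[i] for seq in others}
        if len(column) == 1 and base not in column:
            columns.append(i)
    return columns

# Toy alignment (hypothetical); all rows must have equal length.
toy_alignment = {
    "human":     "ACGTA-GT",
    "orangutan": "ACGTACGT",
    "mouse":     "ACGAACGT",
    "dog":       "ACGTACGT",
}
print(reference_specific_columns(toy_alignment))
```

In the toy data, column 3 is skipped because the other species disagree among themselves, while column 5 is reported because only the human row differs.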

Intersection of Annotated Regulatory Regions in Human and Placental Mammals
In this exercise, we will use Galaxy to intersect annotated regulatory regions in human with annotated regions in other placental mammals. We will then view the intersection in the UCSC Genome Browser.

Step 1A: Place Regulatory Data in Galaxy
1. Connect to Galaxy: https://usegalaxy.org/
2. Upload the file of predicted regulatory regions in hg19 to Galaxy:
[course_directory]/06_comparative_genomics/data/pre_mod_hg19.bed
Make sure to identify hg19 as your reference genome.
3. Acquire all conserved regions in placental mammals from the UCSC Main Table Browser in Galaxy, using the settings on the next slide.

Step 1B: Place Regulatory Data in Galaxy
1. Select Comparative Genomics for group.
2. Select 100 Vert. El (phastConsElements100way) for table.
3. Select genome for region.
4. Select Galaxy for send output to.
5. Click get output.
6. On the next screen, click Send Query to Galaxy.

Step 2: Intersect Datasets
1. Go to Operate on Genomic Intervals in Galaxy and select Intersect.
2. Select the parameters below and click Execute.
3. When finished, click display at UCSC in the history pane.
[Figure: UCSC results showing chr19 regulatory regions.]
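The Intersect operation can be pictured as follows. This is a minimal sketch of interval intersection that returns the overlapping pieces. It is not Galaxy's implementation, the function name and the toy intervals are hypothetical, and it assumes BED-style half-open coordinates (start included, end excluded).

```python
def intersect(first, second):
    """Return the overlapping pieces between two lists of
    (chrom, start, end) intervals, sorted by chromosome and start."""
    pieces = []
    for chrom_a, start_a, end_a in first:
        for chrom_b, start_b, end_b in second:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open intervals: touching ends do not overlap
                pieces.append((chrom_a, start, end))
    return sorted(pieces)

# Toy intervals (hypothetical), standing in for predicted regulatory
# modules and conserved elements.
modules = [("chr19", 100, 200), ("chr19", 500, 600)]
conserved = [("chr19", 150, 250), ("chr19", 590, 700), ("chr1", 100, 200)]
print(intersect(modules, conserved))
```

The nested loop compares every pair and is only suitable for small inputs. Genome-scale tools typically sort both inputs and sweep through them once, or use an interval tree.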

Step 3: Predicted Modules Overlap with PAX5 Regulators

Exploratory Exercise
Pick a gene of interest (VHL, CMYC, ETS1, TBP, USF2, GATA-1, ...). Visualize the intersected intervals in the UCSC Genome Browser. See how these regions correlate with results from ENCODE to assess their functional roles. We will come around to help.