Exercises for the UCSC Genome Browser (part I)

1) Find out whether the mouse TP53 gene has gene expression data available from the Genomics Institute of the Novartis Research Foundation (GNF). Click on a tissue to examine the results in more detail.
Skills: basic text search; Genome Viewer pulldown menus; data details pages.

2) Find the protein sequence for rat leptin. BLAT this sequence against the human genome to find the human homolog. Look for SNPs in the coding region of this gene. Are there any? Obtain the human DNA sequence for this region, and underline the SNPs.
Skills: obtaining protein sequence; BLAT; finding SNPs in exons; getting DNA sequence with extended case/color options.

3) Find the genomic region for the human NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog] gene. Look for ESTs that were found in neuroblastoma tissues by using the filter to color those blue in the Genome Viewer. When you are convinced that this protein is found in neuroblastoma tissue, examine the properties of this protein with the Proteome Browser (which is found as a link from Known Gene details pages). What are the pI and MW for this neuroblastoma protein?
Skills: examining ESTs; using a filter to color specific items; finding protein properties with the Proteome Browser.

4) Perform an in silico PCR to see what happens when more than one PCR product may arise. Determine the product sizes and the melting temperatures of these primers.
Skills: in silico PCR of genomic sequence; finding product size and Tm.

UCSC Exercises, version 7.0. These exercises correspond to the builds available in March 2006. The materials and slides offered are for non-commercial use only. Reproduction, distribution and/or use for commercial purposes are strictly prohibited. Copyright 2006, OpenHelix, LLC.

Step-by-step instructions for the UCSC Genome Browser exercises

1) Find out whether the mouse TP53 gene has gene expression data available from the Genomics Institute of the Novartis Research Foundation (GNF). Click on a tissue to examine the results in more detail.

1. Go to the UCSC Genome Browser homepage, genome.ucsc.edu
2. Enter the Gateway by clicking the Genome Browser link on the homepage.
3. Select Vertebrate as the clade and Mouse as the genome. Choose the most current assembly.
4. Enter the text tp53 in the text box. Click Submit.
5. From the results list, click a link that appears to be the real tp53. I will choose (AY212017, Cellular tumor antigen p53) for this example. It is the 3rd item on the list at this time.
6. On the Genome Viewer page, scroll to the Expression and Regulation group of track controls. Open the menu for the GNF U74A track. Choose Full mode.
7. Click Refresh to apply this change to the viewer.
8. Look back up in the Genome Viewer for the extensive list of tissues now present in the viewer.
9. Click on any GNF U74A track to go to the details page for this data.
10. Examine the color codes for the data. Note that higher expression is on the red scale and lower expression is on the green scale.
11. Scroll down the page to the Methods section, which provides additional information about the source and display of the data in the browser. Note that the details for this data are found in a publication that is also listed on this page.
12. Use the Back button to return to the Genome Viewer.

Special note: there are several different microarray data sets, and the sets vary by species. UCSC presents the data but does not generate it; the data are provided by other labs.

2) Find the protein sequence for rat leptin. BLAT this sequence against the human genome to find the human homolog. Look for SNPs in the coding region of this gene. Are there any? Obtain the human DNA sequence for this region, and underline the SNPs.

1. From the Gateway page, search for the rat leptin gene, using the most current assembly.
2. From the rat leptin Genome Viewer region, get the protein sequence for leptin (hint: from the Known Gene details page).
3. Copy the rat leptin protein sequence. You can take everything on the page; the FASTA formatting is fine. Use the Back button to return to the Known Gene page.
4. Now access the BLAT tool (either from the Known Gene page or from the home page).
5. Paste your leptin sequence into the BLAT text box.
6. Check the BLAT options, and choose to BLAT against the human genome, using the May 2004 assembly, with a protein sequence. Leave all other settings as default.
7. Submit your BLAT search.
8. From the BLAT results page, click the DETAILS link for the top hit. Examine the details page to see if the match is good.
9. If your match is acceptable, return to the BLAT results page. Now click BROWSER to see the Genome Viewer location of this match.
10. If you are convinced we are in the right genomic region in the viewer, we will get the DNA sequence for this region and find SNPs in exons.
11. First, choose the HIDE ALL button so we can add back just the items we care about.
12. Next, choose to see Known Genes in Full mode and SNPs in Pack mode. Click Refresh to apply these changes. There should now be only 2 tracks in your viewer.
13. Let's look at the SNPs in the context of the genomic sequence. Click the DNA link in the blue navigation bar at the top.
14. From the Get DNA page, click the Extended Case/Color options button.
15. Choose Bold for Known Genes (essentially this means exons for this purpose), and Underline for the SNPs. Also put 255 in one of the color boxes for SNPs.
16. Submit.

You should have a new page with your sequence, with the Known Gene exons in bold and the SNPs underlined and in color.

Special note: Extended case/color options list only those tracks that are currently shown in the Genome Viewer.
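For readers unfamiliar with the FASTA formatting mentioned in step 3, here is a minimal Python sketch of how such text is laid out: a header line beginning with ">" followed by one or more lines of sequence. The function name `parse_fasta` and the example record are hypothetical placeholders, not the real rat leptin entry.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up placeholder record, only to show the layout.
example = """>placeholder_protein hypothetical example record
MKTAYIAKQR
QISFVKSHFS
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence), sequence)
```

Because the header line is clearly marked, pasting the whole FASTA record into the BLAT text box works, as the exercise notes.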

3) Find the genomic region for the human NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog] gene. Look for ESTs that were found in neuroblastoma tissues by using the filter to color those blue in the Genome Viewer. When you are convinced that this protein is found in neuroblastoma tissue, examine the properties of this protein with the Proteome Browser (which is found as a link from Known Gene details pages). What are the pI and MW for this neuroblastoma protein? This exercise demonstrates the use of a FILTER for ESTs.

1. From the Gateway page, search for the human NRAS gene.
2. If you just did exercise 2, your settings will have been saved, and only 2 tracks will be shown. Click default tracks to restore the default settings.
3. On the NRAS Genome Viewer region page, click the link for the Human ESTs track above the pulldown menu in the track controls section.
4. On the Human ESTs track information and filter page, make 3 changes: a) change the Display to Pack; b) type neuroblastoma in the tissue box; c) select the color you want these ESTs to be.
5. Now click Submit, and you will return to the Genome Viewer.
6. The Genome Viewer will now display neuroblastoma ESTs in the color you selected.
7. If you are convinced that this protein is likely to be expressed in neuroblastomas, learn more about the protein properties with the Proteome Browser.
8. Click the track for NRAS in the Known Genes section.
9. On the Known Genes details page, click the link for the UCSC Proteome Browser (hint: in the Quick Links box).
10. Examine the protein properties that are available in the UCSC Proteome Browser.
Find the pI for the human NRAS protein:
Find the molecular weight of the NRAS protein:
Which KEGG pathways does this protein participate in?

Special note: Not all tracks have filters, but you can find the filtering options for a track using the links to the track information page above the pulldown menus, or by clicking the gray or blue boxes on the left side of the Genome Viewer image section. Also note that filters remain in force until you reset everything.

4) Perform an in silico PCR to see what happens when more than one PCR product may arise. Determine the product sizes and the melting temperatures of these primers.

1. Go to the UCSC Genome Browser homepage, genome.ucsc.edu
2. Enter the PCR tool by clicking either the PCR or the In Silico PCR link on the homepage.
3. Select human as the species, and the most current assembly.
4. Enter this as the FORWARD primer (with or without spaces): TTC AAG GAG GCC TTC TCC CT
5. Enter this as the REVERSE primer: CTG GGG GAG AAG CTG A
6. Click flip reverse primer.
7. Click Submit.
8. The results page will show that these particular primers would amplify 2 different genomic regions, one on chr19 and one on chr10. The product sizes differ, and the difference would be detectable.
Product size on chr19:
Product size on chr10:
This set of primers is clearly not specific for one region, if that is the goal.
9. What is the melting temperature for the primers?
Forward primer:
Reverse primer:
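As a rough illustration of two ideas in this exercise, the Python sketch below reverse-complements the reverse primer (our reading of what the flip reverse primer option does) and estimates a melting temperature with the simple Wallace rule, Tm = 2(A+T) + 4(G+C). The function names are hypothetical, and the Wallace rule is only a back-of-the-envelope approximation; it is an assumption here, not the method the In Silico PCR tool uses, so expect the values on the results page to differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Remove spaces and upper-case the primer, as in 'with or without spaces'."""
    return "".join(primer.split()).upper()


def reverse_complement(primer):
    """Complement each base, then reverse the order."""
    return clean(primer).translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough Tm estimate in degrees C (Wallace rule); an approximation only."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


forward = "TTC AAG GAG GCC TTC TCC CT"
reverse = "CTG GGG GAG AAG CTG A"

print("forward length:", len(clean(forward)))
print("reverse length:", len(clean(reverse)))
print("flipped reverse primer:", reverse_complement(reverse))
print("rough Tm, forward:", wallace_tm(forward))
print("rough Tm, reverse:", wallace_tm(reverse))
```

Comparing these rough estimates with the melting temperatures reported on the results page is a useful check that you entered the primers correctly.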

Exercises for the UCSC Gene Sorter and Table Browser (part II)

1) Find genes that are predominantly expressed in the mouse adrenal gland, determine the expression pattern of the human ortholog of one such gene, and obtain the genomic sequence of the human gene.
Skills: basic text search; Gene Sorter menus and options; configuring and filtering; connections between browsers, etc.

2) Obtain a list of SNPs in a single gene (CLOC_HUMAN) using the Table Browser.
Skills: basic table search menus and options; intersecting tables; choosing format; downloading sequence.

3) Find CpG islands in known genes on the last part of chromosome 22 of the human genome. Obtain this sequence as one FASTA record per region.
Skills: basic table search menus and options; intersecting tables; choosing format; downloading sequence.

Step-by-step instructions for the UCSC Gene Sorter and Table Browser exercises

1) Find genes expressed predominantly in the mouse adrenal gland.

1. Go to the UCSC Genome Browser homepage, genome.ucsc.edu
2. Enter the Gene Sorter by clicking either of the Gene Sorter links on the homepage.
3. Select mouse as the species. Leave the assembly as August 2005.
4. Enter a mouse gene (example: hoxa11) in the search box.
5. Sort by expression (GNF Atlas 2).
6. Click configure. Leaving the rest of the default configuration unchanged, choose median of all replicas from the TISSUES pulldown menu of Expression (GNF Atlas 2).
7. While still in the configure menu, scroll down and click the checkbox for human ortholog. (If you wish, you can click the up arrow of the human ortholog column until it is listed before the GNF Atlas 2 column, for later ease of viewing, though this may take some time.) Click submit.
8. Click the filter button and filter for genes highly expressed in the adrenal gland by putting 2.5 in the minimum box for the adrenal gland (to obtain genes that are expressed in the adrenal gland at least at this level). Click submit.
9. The resulting browser page will show genes expressed in the adrenal gland (and in other tissues, as shown in red).
10. To narrow the field to genes predominantly expressed in the adrenal gland, click the filter button again and add a maximum expression value (0.5 is a good number) for the following tissues: frontal cortex, olfactory bulb, mammary gland, brown fat, and ovary. This will filter out genes that are expressed in multiple tissues. Click submit.
11. The resulting page gives 12 genes expressed in the adrenal gland. You will notice that at least one of these genes is expressed predominantly in the adrenal gland (AK143928 is one strong possibility).
12. Click on the human ortholog name of the gene you've chosen (hint: NM_053279), which takes you to a choice between the Known Gene record and the RefSeq record. Choose the Known Gene record link, which will take you to the graphical Genome Browser view.

Find out more about this human ortholog now by clicking on the gene name in the Known Genes track. Two links on this page will be of interest to you: the Gene Sorter link under the Quick Links to Tools and Databases heading, and the sequence link under the Human Gene C8orf13 Description and Page Index heading. Clicking the former link will take you back to the Gene Sorter with the human ortholog as the entry. Here you can view the expression pattern. Clicking the latter sequence link will give you options to download the sequence of this gene (genomic, RNA, etc.).
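The two filter steps in this exercise (a minimum of 2.5 in the adrenal gland, then a maximum of 0.5 in five other tissues) amount to a simple threshold test per gene. The Python sketch below shows that logic only; the gene names and expression values are made up for illustration, and the function name is hypothetical rather than anything the Gene Sorter exposes.

```python
OTHER_TISSUES = ["frontal cortex", "olfactory bulb", "mammary gland",
                 "brown fat", "ovary"]


def predominantly_expressed(genes, target, minimum, others, maximum):
    """Return genes at or above `minimum` in `target` and at or below
    `maximum` in every tissue listed in `others`."""
    selected = []
    for name, levels in genes.items():
        if levels[target] < minimum:
            continue  # fails the first filter (step 8)
        if all(levels[tissue] <= maximum for tissue in others):
            selected.append(name)  # passes the second filter (step 10)
    return selected


# Hypothetical expression values, not taken from GNF Atlas 2.
genes = {
    "gene_a": {"adrenal gland": 3.1, "frontal cortex": 0.1,
               "olfactory bulb": 0.2, "mammary gland": 0.0,
               "brown fat": 0.3, "ovary": 0.4},
    "gene_b": {"adrenal gland": 2.8, "frontal cortex": 0.2,
               "olfactory bulb": 0.1, "mammary gland": 0.3,
               "brown fat": 0.2, "ovary": 1.9},
    "gene_c": {"adrenal gland": 0.4, "frontal cortex": 0.1,
               "olfactory bulb": 0.1, "mammary gland": 0.1,
               "brown fat": 0.1, "ovary": 0.1},
}

print(predominantly_expressed(genes, "adrenal gland", 2.5, OTHER_TISSUES, 0.5))
```

In this toy data, gene_b passes the adrenal minimum but is removed by the ovary maximum, which mirrors how the second filter removes genes expressed in multiple tissues.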

2) Obtain a list of SNPs in a single gene (CLOC_HUMAN) using the Table Browser.

1. Go to the UCSC Genome Browser homepage, genome.ucsc.edu
2. Enter the Table Browser by clicking either of the Table Browser links on the homepage.
3. Choose human and the May 2004 assembly.
4. Choose a table: choose Variation and Repeats in the group pulldown menu, SNPs in the track menu, and snp in the table menu.
5. Type CLOCK in the position box.
6. Click the look up button. You will receive a list of records that contain clock. The one we are looking for is the known gene at position chr4:56139587-56253925. (Alternatively, you could paste in a known accession number and choose that option, but we wanted to show how the look up works here.) Click the link that says CLOCK at chr4:56139587-56253925 under the known genes section. This will put the position of the gene in the position box.
7. Leave the buttons for filter, intersection and correlation as default (none).
8. Under Output Format, choose all fields from selected table in the pulldown menu.
9. Click get output. This gives a list of SNPs in the CLOCK gene, with information about each SNP in tab-delimited fields.
10. You can now copy/paste or download the resulting file for more study.

(CLOC_HUMAN is an interesting gene that encodes a protein associated with circadian rhythm sleep disorders. How would you find out more about this gene using the Genome Browser?)
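Once the tab-delimited output is saved, it can be examined with a few lines of code. The Python sketch below is one possible approach, assuming the output starts with a header line prefixed by "#" and includes chrom, chromStart, chromEnd and name columns, with a 0-based start and an exclusive end; check these assumptions against your actual file. The SNP names and positions in the sample are invented placeholders, and the function names are hypothetical.

```python
import csv


def read_table(text):
    """Parse tab-delimited text whose first line is a '#'-prefixed header."""
    lines = [line for line in text.splitlines() if line.strip()]
    lines[0] = lines[0].lstrip("#")
    return list(csv.DictReader(lines, delimiter="\t"))


def snps_in_position(rows, chrom, first, last):
    """Names of rows overlapping chrom:first-last (1-based, inclusive)."""
    hits = []
    for row in rows:
        start = int(row["chromStart"])  # assumed 0-based start
        end = int(row["chromEnd"])      # assumed exclusive end
        if row["chrom"] == chrom and start < last and end > first - 1:
            hits.append(row["name"])
    return hits


# Invented sample rows; only the position range comes from the exercise.
SAMPLE = "\n".join([
    "#chrom\tchromStart\tchromEnd\tname\tstrand",
    "chr4\t56139600\t56139601\trsEXAMPLE1\t+",
    "chr4\t56300000\t56300001\trsEXAMPLE2\t-",
    "chr5\t56139600\t56139601\trsEXAMPLE3\t+",
])

rows = read_table(SAMPLE)
print(snps_in_position(rows, "chr4", 56139587, 56253925))
```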

3) Find CpG islands in known genes on the last part of chromosome 22 of the human genome.

1. Go to the UCSC Genome Browser homepage, genome.ucsc.edu
2. Enter the Table Browser by clicking either of the Table Browser links on the homepage.
3. Choose human and the May 2004 assembly.
4. Choose a table: choose Genes and Gene Prediction Tracks in the group pulldown menu, Known Genes in the track menu, and knownGene in the table menu.
5. Type chr22:40000000-50000000 in the position box.
6. Click the intersection create button.
7. On the resulting page, choose Expression and Regulation in the group menu, CpG Islands in the track menu, and CpG Islands (cpgIslandExt) as the table. Leave the other options as default (all Known Genes records that have any overlap with CpG Islands) and click submit.
8. On the resulting page (back in the Table Browser interface), choose sequence in the output format menu. Click get output.
9. On the resulting screen, choose genomic and click submit.
10. Make sure only the 5' UTR Exons, CDS Exons, and 3' UTR Exons options are chosen (uncheck introns). Then click the One FASTA record per region option and leave the rest of the sequence retrieval options as default. Click get sequence.
11. You can now copy/paste or download the resulting file (a list of CpG islands in known genes) for more study. The resulting file will be large. In cases like this, it is best to type a file name in the output file box. This will save a FASTA-formatted text file to your computer.
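The default intersection option used in step 7, all Known Genes records that have any overlap with CpG Islands, can be pictured as an interval-overlap test. The Python sketch below illustrates that idea under the assumption of 0-based, half-open coordinates; the gene and island coordinates are invented for illustration, and the function names are hypothetical rather than part of the Table Browser.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def genes_with_islands(genes, islands):
    """Keep gene records (chrom, start, end, name) overlapping any island
    record (chrom, start, end) on the same chromosome."""
    kept = []
    for chrom, start, end, name in genes:
        if any(chrom == i_chrom and overlaps(start, end, i_start, i_end)
               for i_chrom, i_start, i_end in islands):
            kept.append(name)
    return kept


# Invented coordinates inside the chr22:40000000-50000000 window.
genes = [
    ("chr22", 40000100, 40005000, "geneA"),
    ("chr22", 41000000, 41002000, "geneB"),
]
islands = [
    ("chr22", 40000000, 40000500),
    ("chr22", 45000000, 45000300),
]

print(genes_with_islands(genes, islands))
```

Here geneA is kept because its start falls inside the first island, while geneB overlaps no island and is dropped.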