Human-Mouse Synteny in Functional Genomics Experiment

Ksenia Krasheninnikova (AU)
University of the Russian Academy of Sciences, JetBrains
krasheninnikova@gmail.com
September 18, 2012

Objectives
1. Obtain the synteny blocks between the genomes of Homo sapiens (hg18) and Mus musculus (mm9)
2. Study genetic properties of the syntenic data (genome coverage, loci)
3. Study epigenetic properties of the syntenic data (compare the methylation level in synteny blocks)

Outline
Human-Mouse Synteny
Approaches to reveal conserved regions
Evolutionary properties of transcription start sites
Enlightenment

Facts about the Human-Mouse Relation
Evolutionary distance: 75 million years of evolution
Human genome size: 3,107,677,273 bp [hg18, UCSC]
Mouse genome size: 2,716,965,481 bp [mm9, reference assembly (C57BL/6J, golden path)]
245-500 synteny blocks between human and mouse
90.2% of the human genome and 93.3% of the mouse genome lie in conserved syntenic segments [Waterston et al., Nature 420, 2002]

Synteny
What is a synteny block?
A block of genes (markers) with evolutionarily conserved order [cinteny.cchmc.org]
Segments that can be converted into conserved segments by micro-rearrangements
Usually consists of short regions of similarity (anchors) that may be interrupted by dissimilar regions and gaps [Pavel Pevzner and Glenn Tesler, 2003]

Genetic Properties of Loci: Two Strategies
1. First obtain pairwise genome alignments and use them as anchors to find synteny blocks
   Alignments: BLASTZ (local), VISTA (glocal)
   Algorithms: GRIMM-Synteny, DRIMM-Synteny, i-ADHoRe
2. Take the synteny mark-up from a database and align the syntenic regions to get single-nucleotide resolution
   Databases: Cinteny, OrthoClusterDB

Algorithms
GRIMM-Synteny:
   given anchors
   anchor graph, gap size G
   search for connected components
i-ADHoRe:
   given anchors
   genomic profiles represent alignments of homologous segments
   greedy algorithm used to construct the alignments
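The GRIMM-Synteny step named above (anchor graph, gap size G, connected components) can be illustrated with a minimal sketch. It is not the GRIMM-Synteny implementation: the anchor tuple format, the function name `cluster_anchors`, and the use of a Manhattan distance between anchor start points are assumptions made for illustration, and anchor lengths and orientation are ignored.

```python
from collections import defaultdict

def cluster_anchors(anchors, gap):
    """Group anchors into connected components of an anchor graph.

    anchors: list of (chrom_a, pos_a, chrom_b, pos_b) tuples (hypothetical format).
    Two anchors are joined when they lie on the same chromosome pair and
    their Manhattan distance |dx| + |dy| is below `gap` (the gap size G).
    """
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Quadratic pairwise scan: fine for a sketch, too slow for whole genomes.
    for i, (ca, xa, cb, ya) in enumerate(anchors):
        for j in range(i + 1, len(anchors)):
            ca2, xa2, cb2, ya2 = anchors[j]
            same_pair = (ca, cb) == (ca2, cb2)
            if same_pair and abs(xa - xa2) + abs(ya - ya2) < gap:
                parent[find(i)] = find(j)

    components = defaultdict(list)
    for i, anchor in enumerate(anchors):
        components[find(i)].append(anchor)
    return sorted(components.values())


anchors = [
    ("chr1", 100, "chr4", 1000),
    ("chr1", 150, "chr4", 1060),
    ("chr1", 5000, "chr4", 9000),
    ("chr2", 120, "chr4", 1010),
]
blocks = cluster_anchors(anchors, gap=200)
```

Each returned component is a candidate synteny block; a practical pipeline would also discard very small components and use a spatial index instead of the pairwise scan.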

Alignments and Coverage
BLASTZ
   Whole-genome local alignment, adjusted for comparison of complex genomes
   Repeat masking
   Specialized substitution matrix
   Coverage: 32.5%
VISTA
   Whole-genome alignment
   Pipeline: BLAT local alignments, Shuffle-LAGAN glocal chaining
   Sensitive to inversions
   Coverage: 7-20%
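A coverage figure like those quoted above can be read as the fraction of genome positions that fall inside at least one aligned interval. The sketch below shows one way to compute that, assuming half-open `(start, end)` intervals on a single sequence; the function name and input format are hypothetical, and the slides do not state how their percentages were derived.

```python
def coverage_fraction(intervals, genome_size):
    """Fraction of a sequence covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged run before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_size


example = coverage_fraction([(0, 10), (5, 20), (30, 40)], genome_size=100)
```

Merging overlapping intervals before summing keeps positions that are aligned more than once from being counted twice.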

Transcription Start Sites (TSS) Coverage
The transcription start site (TSS) is where a molecule of RNA polymerase II binds and where transcription of the gene into RNA begins.
Figure: Start of transcription; the yellow ellipse shows RNA polymerase II

Transcription Start Sites (TSS) Coverage
A TSS can be treated either as an exact position in the genome or as a region around it
UCSC Genome Browser data: SwitchGear TSS and Eponine TSS (experimental and machine-learning approaches)
Alternative: txstart, or a segment around cdsstart
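Treating a TSS as a region rather than a point amounts to choosing a window around the coordinate. A minimal sketch follows, assuming the TSS coordinate is already known; the function name `tss_window`, the 50 bp default, and the strand handling are illustrative assumptions rather than anything stated in the slides.

```python
def tss_window(tss, upstream=50, downstream=0, strand="+"):
    """Return a (start, end) window around a TSS coordinate.

    `upstream` and `downstream` are measured relative to the direction of
    transcription, so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


upstream_only = tss_window(1000)                 # 50 bp upstream of the TSS
symmetric = tss_window(1000, 50, 50)             # 50 bp on each side
minus_strand = tss_window(1000, strand="-")      # upstream lies at higher coordinates
```

Note that in UCSC gene tables txStart is always the lower coordinate, so for a minus-strand gene the TSS is at txEnd; the caller is assumed to pass the correct coordinate.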

Distribution of the conserved TSSs for the txstart locus [UCSC Genome Browser Data]
Figure: Cumulative frequency of conservation between TSS regions in Human-Mouse for a genome segment [txstart-50, txstart]
Figure: Cumulative frequency of conservation between the closest TSS regions in Human-Mouse for a genome segment [txstart-50, txstart+50]

Distribution of conservation of TSS for the [cdsstart - X, cdsstart] locus [UCSC Genome Browser Data]
Figure: Cumulative frequency of conservation between the TSS regions in Human-Mouse for a genome segment [cdsstart-35, cdsstart]
Figure: Cumulative frequency of conservation between the Human TSS and Mouse Genome for a genome segment [cdsstart-50, cdsstart]

Is it expected to be conserved?
Figure: Phylogeny and constrained elements from the 29 eutherian mammalian genome sequences.

Highly Conserved TSS
4748 TSSs: |txstart(mouse) - txstart(human)| < 100
Figure: Frequency of distance between the closest TSS in Human-Mouse
Figure: Cumulative distribution for distance [0, 2000], step = 100
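The distance statistics on this slide can be sketched as follows, assuming that human and mouse TSS positions have already been brought into one coordinate system (for example through the alignment); that mapping step, the function names, and the toy positions are all assumptions. The sketch finds the closest TSS in the other species for each TSS, then counts how many distances fall under each threshold from 100 to 2000 in steps of 100.

```python
from bisect import bisect_left

def closest_distances(query_positions, target_positions):
    """For each query position, the distance to the nearest target position."""
    targets = sorted(target_positions)
    if not targets:
        raise ValueError("target_positions must not be empty")
    distances = []
    for q in query_positions:
        i = bisect_left(targets, q)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = targets[max(i - 1, 0):i + 1]
        distances.append(min(abs(q - t) for t in candidates))
    return distances

def cumulative_counts(distances, max_distance=2000, step=100):
    """Pairs (threshold, number of distances <= threshold)."""
    thresholds = range(step, max_distance + 1, step)
    return [(t, sum(d <= t for d in distances)) for t in thresholds]


human_tss = [100, 1050, 5000]          # toy positions, not real data
mouse_tss_mapped = [90, 1000, 3000]    # toy positions, not real data
dist = closest_distances(human_tss, mouse_tss_mapped)
highly_conserved = sum(d < 100 for d in dist)
curve = cumulative_counts(dist)
```

Sorting the targets once and using binary search keeps the lookup at O(log n) per query, which matters when both TSS sets contain tens of thousands of entries.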

Expression of genes closest to the highly conserved TSS according to RefSeq
Figure: Expression in kidneys
red - low-distance TSS
blue - large-distance TSS
Figure: Expression in lungs

Expression of genes closest to the highly conserved TSS according to RefSeq
Figure: Expression in liver
red - low-distance TSS
blue - large-distance TSS
Figure: Expression in hypothalamus


Statistical difference in expression of genes close to low-distance TSSs compared to large-distance TSSs
Wilcoxon rank-sum test:
Lungs: p-value = 0.255
Kidney: p-value = 3.713e-09
Hypothalamus: p-value = 0.05862
Alternative hypothesis: one distribution is stochastically greater than the other
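For reference, a Wilcoxon rank-sum comparison of two expression samples can be sketched with the standard library alone. This is a generic illustration, not the analysis behind the p-values above: it uses the normal approximation with tie correction and no continuity correction, reports a two-sided p-value (an assumption, since the slide does not say whether the test was one- or two-sided), and the function name and sample values are made up.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at index i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


u_sep, p_sep = rank_sum_test([1, 2, 3], [4, 5, 6])     # fully separated toy samples
u_same, p_same = rank_sum_test([1, 2, 3], [1, 2, 3])   # identical toy samples
```

For small samples an exact test is preferable to the normal approximation; with group sizes in the thousands, as for the TSS sets here, the approximation is the usual choice.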

However
What does the 90% Human-Mouse similarity suggest?
Figure: Alignment of the 1st human chromosome against the 1st mouse chromosome [GRIMM Human-Mouse alignments at cinteny.cchmc.org]
Figure: Genes in the large green region [GRIMM Human-Mouse alignments at cinteny.cchmc.org]

Discussion
Thank you!