Human-Mouse Synteny in Functional Genomics Experiment

Ksenia Krasheninnikova
University of the Russian Academy of Sciences, JetBrains
krasheninnikova@gmail.com
September 18, 2012

Ksenia Krasheninnikova (AU) Human-Mouse Synteny September 18, 2012 1 / 20

Objectives

1. Obtain the synteny blocks between the genomes of Homo sapiens (hg18) and Mus musculus (mm9)
2. Study genetic properties of the syntenic data (genome coverage, loci)
3. Study epigenetic properties of the syntenic data (compare the methylation level in synteny blocks)

Outline

- Human-Mouse Synteny
- Approaches to reveal conserved regions
- Evolutionary properties of transcription start sites
- Enlightenment

Facts about the Human-Mouse Relation

- Evolutionary distance: 75 million years of evolution
- Human genome size: 3,107,677,273 bp [hg18, UCSC]
- Mouse genome size: 2,716,965,481 bp [mm9, Reference assembly (C57BL/6J, golden path)]
- 245-500 synteny blocks between human and mouse
- 90.2% of the human genome and 93.3% of the mouse genome lie in conserved syntenic segments [Waterston et al., Nature 420, 2002]

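To make the coverage percentages concrete, the arithmetic below converts them into base pairs. The genome sizes and fractions are the ones quoted on this slide; the function name is only illustrative.

```python
# Genome sizes (bp) and syntenic fractions as quoted on the slide.
HUMAN_BP = 3_107_677_273   # hg18
MOUSE_BP = 2_716_965_481   # mm9


def syntenic_bp(genome_size, fraction):
    """Number of base pairs that a given fraction of the genome corresponds to."""
    return round(genome_size * fraction)


human_syntenic_bp = syntenic_bp(HUMAN_BP, 0.902)
mouse_syntenic_bp = syntenic_bp(MOUSE_BP, 0.933)

print(f"human: {human_syntenic_bp:,} bp in syntenic segments")
print(f"mouse: {mouse_syntenic_bp:,} bp in syntenic segments")
```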
Synteny

What is a synteny block?

- A block of genes (markers) with evolutionarily conserved order [cinteny.cchmc.org]
- Segments that can be converted into conserved segments by micro-rearrangements
- Usually consists of short regions of similarity (anchors) that may be interrupted by dissimilar regions and gaps [Pavel Pevzner and Glenn Tesler, 2003]

Genetic Properties of Loci: Two Strategies

1. First obtain pairwise genome alignments and use them as anchors to find synteny blocks
   Alignments: BLASTZ (local), VISTA (glocal)
   Algorithms: GRIMM-Synteny, DRIMM-Synteny, i-ADHoRe
2. Take the synteny mark-up from a database and align the syntenic regions to reach single-nucleotide resolution
   Databases: Cinteny, OrthoClusterDB

Algorithms

GRIMM-Synteny
- Input: anchors
- Anchor graph, gap size G
- Search for connected components

i-ADHoRe
- Input: anchors
- Genomic profiles represent alignments of homologous segments
- A greedy algorithm is used to construct the alignments

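The anchor-graph idea can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the GRIMM-Synteny implementation: each anchor is reduced to a hypothetical (x, y) coordinate pair on a single chromosome pair, two anchors are joined when their Manhattan distance is below the gap size G, and each connected component is reported as a candidate block. Strand handling and removal of small components are left out.

```python
from collections import defaultdict


def cluster_anchors(anchors, gap):
    """Group (x, y) anchors into connected components of the anchor graph.

    Two anchors share an edge when their Manhattan distance is below `gap`
    (an assumed distance measure for this sketch).
    """
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Quadratic pairwise comparison: fine for a sketch, too slow for a genome.
    for i, (xi, yi) in enumerate(anchors):
        for j in range(i + 1, len(anchors)):
            xj, yj = anchors[j]
            if abs(xi - xj) + abs(yi - yj) < gap:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, anchor in enumerate(anchors):
        groups[find(i)].append(anchor)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical anchors: (position in genome A, position in genome B).
anchors = [(100, 5000), (400, 5350), (900, 5800), (50000, 200), (50300, 450)]
blocks = cluster_anchors(anchors, gap=1500)
for block in blocks:
    print(block)
```

With these made-up coordinates the first three anchors chain into one component even though the first and third are farther apart than G, which is the point of using connected components rather than pairwise thresholds.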
Alignments and Coverage

BLASTZ
- Whole-genome local alignment, adapted to the comparison of complex genomes
- Repeat masking
- Specialized substitution matrix
- Coverage: 32.5%

VISTA
- Whole-genome alignment
- Pipeline:
  - BLAT local alignments
  - Shuffle-LAGAN glocal chaining
- Sensitive to inversions
- Coverage: 7-20%

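A coverage figure of this kind can be computed by merging the aligned intervals and dividing the merged length by the genome length. The sketch below assumes half-open [start, end) intervals on a single sequence; the interval values are invented for illustration, and the slides do not state how the reported percentages were actually derived.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a sequence covered by possibly overlapping [start, end) intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged run, if any, and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical aligned segments on a 1000 bp sequence.
example = [(0, 100), (50, 150), (300, 400)]
fraction = covered_fraction(example, genome_length=1000)
print(f"coverage: {fraction:.1%}")
```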
Transcription Start Sites (TSS) Coverage

The transcription start site (TSS) is the position where transcription of a gene into RNA begins. RNA polymerase II binds in the promoter region surrounding it.

Figure: Start of transcription; the yellow ellipse shows RNA polymerase II.

Transcription Start Sites (TSS) Coverage

- A TSS can be treated either as an exact position in the genome or as a region around that position
- UCSC Genome Browser data: SwitchGear TSS and Eponine TSS (experimental and machine-learning approaches)
- Alternative: txstart, or a segment around cdsstart

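Treating a TSS as a region rather than a point amounts to choosing a window around it. The helper below is a hypothetical sketch: it assumes gene records that carry the lower and upper transcript coordinates plus a strand, as in UCSC gene tables, where the transcription start of a minus-strand gene is the upper coordinate. Positions are treated as plain integers, so the off-by-one conventions of any particular table are not handled.

```python
def tss_window(tx_start, tx_end, strand, upstream=50, downstream=50):
    """Return a (start, end) window around the TSS of one transcript.

    `upstream` and `downstream` are measured in the direction of transcription,
    so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        tss = tx_start
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        tss = tx_end
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return max(start, 0), end  # clip at the beginning of the chromosome


# Hypothetical transcript spanning 1000..5000 on either strand.
print(tss_window(1000, 5000, "+"))                 # window [txstart-50, txstart+50]
print(tss_window(1000, 5000, "+", downstream=0))   # window [txstart-50, txstart]
print(tss_window(1000, 5000, "-"))
```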
Distribution of the conserved TSSs for the txstart locus [UCSC Genome Browser Data]

Figure: Cumulative frequency of conservation between TSS regions in Human-Mouse for a genome segment [txstart-50, txstart]

Figure: Cumulative frequency of conservation between the closest TSS regions in Human-Mouse for a genome segment [txstart-50, txstart+50]

Distribution of conservation of TSS for the [cdsstart - X, cdsstart] locus [UCSC Genome Browser Data]

Figure: Cumulative frequency of conservation between the TSS regions in Human-Mouse for a genome segment [cdsstart-35, cdsstart]

Figure: Cumulative frequency of conservation between the Human TSS and Mouse Genome for a genome segment [cdsstart-50, cdsstart]

Is it expected to be conserved?

Figure: Phylogeny and constrained elements from the 29 eutherian mammalian genome sequences.

Highly Conserved TSS

4748 TSSs: |txstart(mouse) - txstart(human)| < 100

Figure: Frequency of distance between the closest TSS in Human-Mouse

Figure: Cumulative distribution for distance [0, 2000], step = 100

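The distance to the closest TSS and its cumulative distribution can be sketched as follows. The code assumes, as a simplification not stated on the slide, that human and mouse TSS positions have already been mapped into one common coordinate system through the alignment; all positions are made-up values.

```python
from bisect import bisect_left


def closest_distance(pos, sorted_positions):
    """Distance from `pos` to the nearest entry of a sorted, non-empty list."""
    i = bisect_left(sorted_positions, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(pos - c) for c in candidates)


def cumulative_counts(distances, max_distance=2000, step=100):
    """Count distances below each threshold step, 2*step, ..., max_distance."""
    thresholds = range(step, max_distance + step, step)
    return [(t, sum(d < t for d in distances)) for t in thresholds]


# Hypothetical TSS positions in a shared coordinate system.
human_tss = [1000, 5000, 9000]
mouse_tss_mapped = sorted([1040, 5600, 12000])

distances = [closest_distance(p, mouse_tss_mapped) for p in human_tss]
counts = cumulative_counts(distances)
highly_conserved = sum(d < 100 for d in distances)
print(distances, highly_conserved)
```

The first entry of `counts` corresponds to the "< 100" criterion used to select the highly conserved TSSs.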
Expression of genes closest to the highly conserved TSS according to RefSeq

Figure: Expression in kidneys

Figure: Expression in lungs

Legend: red = low-distance TSS, blue = large-distance TSS

Expression of genes closest to the highly conserved TSS according to RefSeq

Figure: Expression in liver

Figure: Expression in hypothalamus

Legend: red = low-distance TSS, blue = large-distance TSS

Statistical difference in expression of genes close to low-distance TSS compared with large-distance TSS

Wilcoxon rank-sum test:
- Lungs: p-value = 0.255
- Kidney: p-value = 3.713e-09
- Hypothalamus: p-value = 0.05862

Alternative hypothesis: one distribution is stochastically greater than the other

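For reference, the rank-sum statistic behind such p-values can be sketched in plain Python. This is a minimal two-sided version using the normal approximation with a tie correction and no continuity correction; it is not necessarily the routine that produced the numbers above, and the expression values are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (rank sum of x, z score, p-value). Ties receive mid-ranks.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_x - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided tail of the standard normal
    return rank_sum_x, z, p


# Hypothetical expression values for two gene groups.
low_distance = [1.2, 2.3, 3.1, 4.8, 5.0]
large_distance = [6.1, 7.4, 8.2, 9.9, 10.5]
w, z, p = rank_sum_test(low_distance, large_distance)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

For samples this small an exact test would be preferable; the normal approximation is shown only because it keeps the sketch short.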
However

What does the 90% Human-Mouse similarity suggest?

Figure: Alignment of human chromosome 1 against mouse chromosome 1 [GRIMM Human-Mouse alignments at cinteny.cchmc.org]

Figure: Genes in the large green region [GRIMM Human-Mouse alignments at cinteny.cchmc.org]

Discussion

Thank you!