A Brief Introduction on DNase-Seq Data Aanalysis



Similar documents
The Segway annotation of ENCODE data

Nebula A web-server for advanced ChIP-seq data analysis. Tutorial. by Valentina BOEVA

Comparing Methods for Identifying Transcription Factor Target Genes

How many of you have checked out the web site on protein-dna interactions?

How Sequencing Experiments Fail

Current Motif Discovery Tools and their Limitations

GMQL Functional Comparison with BEDTools and BEDOPS

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Genome-wide measurements of protein-dna interaction by chromatin immunoprecipitation

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Interaktionen von Nukleinsäuren und Proteinen

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Lectures 1 and February 7, Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

PrimePCR Assay Validation Report

PrimePCR Assay Validation Report

Analysis of gene expression data. Ulf Leser and Philippe Thomas

Outline. interfering RNA - What is dat? Brief history of RNA interference. What does it do? How does it work?

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey

RNA & Protein Synthesis

Real-time quantitative RT -PCR (Taqman)

Chapter 2. imapper: A web server for the automated analysis and mapping of insertional mutagenesis sequence data against Ensembl genomes

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Microarray Technology

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, b.patel@griffith.edu.

BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS

DNA Sequencing Troubleshooting Guide

Central Dogma. Lecture 10. Discussing DNA replication. DNA Replication. DNA mutation and repair. Transcription

A Primer of Genome Science THIRD

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial

How To Understand How Gene Expression Is Regulated

Recombinant DNA & Genetic Engineering. Tools for Genetic Manipulation

Computational Genomics. Next generation sequencing (NGS)

Real-time PCR: Understanding C t

Error Tolerant Searching of Uninterpreted MS/MS Data

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

Activity 7.21 Transcription factors

ONLINE SUPPLEMENTAL MATERIAL. Allele-Specific Expression of Angiotensinogen in Human Subcutaneous Adipose Tissue

Next Generation Sequencing: Technology, Mapping, and Analysis

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

Illumina TruSeq DNA Adapters De-Mystified James Schiemer

Pairwise Sequence Alignment

An Introduction to Genomics and SAS Scientific Discovery Solutions

Control of Gene Expression

PreciseTM Whitepaper

FOR REFERENCE PURPOSES

DNA Sequence Analysis

Quantitative proteomics background

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

IMBB Genomic DNA purifica8on

Biotechnology: DNA Technology & Genomics

Appendix C DNA Replication & Mitosis

Molecular Biology Techniques: A Classroom Laboratory Manual THIRD EDITION

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

Module 10: Bioinformatics

THE LIFE SCIENCE DASHBOARD

Interaktionen von RNAs und Proteinen

Next Generation Sequencing

Genome accessibility is widely preserved and locally modulated during mitosis

Genomic DNA Extraction Kit INSTRUCTION MANUAL

50 g 650 L. *Average yields will vary depending upon a number of factors including type of phage, growth conditions used and developmental stage.

ChIP TROUBLESHOOTING TIPS

Multiprotocol Label Switching (MPLS)

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Genetics Lecture Notes Lectures 1 2

Molecular Cloning, Product Brochure

Speed Matters - Fast ways from template to result

Introduction to Genome Annotation

Introduction Bioo Scientific

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Design of conditional gene targeting vectors - a recombineering approach

Faculty of Medicine. Settore disciplinare: BIO/10. functional domains. Monica Soldi. IFOM-IEO Campus, Milan. Matricola n. R08407

Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Description: Molecular Biology Services and DNA Sequencing

Frequently Asked Questions Next Generation Sequencing

HCS Exercise 1 Dr. Jones Spring Recombinant DNA (Molecular Cloning) exercise:

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa

Molecular Computing Athabasca Hall Sept. 30, 2013

Replication Study Guide

Chapter 18 Regulation of Gene Expression

Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne

mrna EDITING Watson et al., BIOLOGIA MOLECOLARE DEL GENE, Zanichelli editore S.p.A. Copyright 2005

Gene Mapping Techniques

Single Nucleotide Polymorphisms (SNPs)

Bioinformatics Unit Department of Biological Services. Get to know us

Genetomic Promototypes

Control of Gene Expression

Automated Library Preparation for Next-Generation Sequencing

G E N OM I C S S E RV I C ES

GeneProf and the new GeneProf Web Services

Transcription:

A Brief Introduction on DNase-Seq Data Aanalysis Hashem Koohy, Thomas Down, Mikhail Spivakov and Tim Hubbard Spivakov s and Fraser s Lab September 13, 2014

1

Introduction DNaseI is an enzyme which cuts duplex DNA at a rate that depends strongly its chromatin environment. In Combination with HTS technology, it can be used to infer genome wide landscape of open chromatin regions. Using this technique, systematic identification of hundreds of thousands of DNaseI Hypersensitive Sites (DHS) per cell type has been possible(encode). In what follows, we provide a brief introduction about this technique with a particular emphasis on peak calling step.

Currently Existing DNase-Seq Protocoles Double Hit Protocole Developed in John Stam Lab in University of Washington, and has been used greatly for detection of DHS in ENCODE project. Single Hit Protocol Developed in Greg Crawford Lab in Duke University. It has been used for detection DHS in ENCODE. This protocol is also in great use by some other researchers world-wide. ATAC-Seq Developed in Greenland Lab in Stanford University. This is a very new protocol (published 2013) and has been reported to be very fast and very efficient.

Two Commonly Used Protocols End Capture (Duke) and Double Hit (UW) Protocol Ligate Biotinylated Linker1 Mmel Digested Ligate Linker2 PCR Amplification Sequencing End Capture Protocol: Greg Crawford Lab, Duke Double Hit Protocol: John Stam Lab, UW

ChIP-Seq vs DNase-Seq Note that DNase HS is different from its sister DNase Footprinting ChIP-Seq: Nature Reviews, Peter J. Park, 2009 DNase HS: Duke Protocol

TF ChIP-Seq vs DNase-Seq Some key differences between TF ChIP-Seq and DNase-Seq: In ChIP-Seq data, a protein is usually in bound or unbound position, whereas DNaseI shows a more generic behaviour, representing the openness of the chromatin to any regulatory feature; DNase HS are strand-independent and therefore no need to shift size or tag extension; DNase HS data sometimes shows less enrichment over wider regions (a kind of Mixed-Source).

DNase-Seq Data Analysis Similar to any other ChIP-Seq data, there are various steps: Sequencing, Mapping and Quality Controls Sequencing is getting cheaper, providing us with more data! Mapping possibly is still the most computationally expensive part. Peak Calling Gauging the statistical significance of reads enrichment which is generally known as Peak Calling is very central to ChIP-Seq data analysis. Post Peak Calling Analysis Different directions and purposes, including differential binding analysis, motif discovery, detection of regulatory regions, Genome segmentation and so on

Context-Specificity of ChIP-Seq Data Different protein classes have distinct mode of interactions: Point-Source These factors and chromatin marks are localised specifically and have high signal-to-noise ration Broad-Source These factors are associated with wide genomic domains, generating broad but more noisy signals; e.g. H3K9me3, H3K36me3 Mixed-Source These factors show a point-source style signal at some regions whereas more broader in other regions e.g. RNA Pol II

DNaseI Peak Calling Algorithms Note that, a peak caller is an algorithms and/or a model that is applied to gauge the statistical significance of enrichment of the short read tags. Context specificity of ChIP data among some other reasons has give rise to numerous peak callers. MACS and F-SEQ are considered among the best reported peak callers for the DNaseI-Seq. Throughout this tutorial, We will try to illustrate how to use these two peak callers for detection statistically significant open chromatin regions. We also show you very briefly how to visualise your data as the very fist step of validating your results.

F-Seq An histogram-based (number of tags per bin) approach is, possibly, the most naivest for gauging the enrichment of short read tags; However, it suffers from some problems including boundary effects and selection of bin width; To overcome, F-Seq suggested in which a Kernel Density Estimator(with mean 0 and variance 1) is applied to obtain the distribution of reads: p(x) = 1 i=n K( x x i ) nb b i=1 F-Seq has been implemented in Java, easy to use, though, doesn t support some commonly used file formats.

MACS The most used peak callers for ChIP-Seq data; It has been reviewed and benchmarked in different studies; At the time of development, the emphasis was on handling shift size and local biases from sequencability and mappability; A Poisson model is employed for identification of statistically signicant enriched regions; MACS has been implemented in python and is relatively fast. It is user friendly and fairly well-supported.