Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics



Christopher Benner, PhD
Director, Integrative Genomics and Bioinformatics Core (IGC)
iDASH Webinar, April 17th, 2015

Overview for Webinar:
- Quick introduction to the wider world of next-generation sequencing (NGS)
- Overview of HOMER, our software for NGS analysis
- Using advanced NGS assays to understand B cell development and the generation of antibody repertoires
- Quick teaser on how innovative NGS assays and genetics can enhance our understanding of transcriptional mechanisms

Next-Generation Sequencing
Large consortia:
- 1000 Genomes Project
- TCGA (cancer)
- many, many more
Illumina sequencing can sequence any DNA fragment from 0-600 bp in length

NGS Innovation
- RNA-Seq (i.e. gene expression)
- GRO-Seq (i.e. transcription rates)
- ChIP-Seq (i.e. DNA:protein interactions)

Graphic from Illumina Inc.

HOMER (Hypergeometric Optimization of Motif EnRichment)
http://homer.salk.edu
Next-generation Sequencing Analysis for Quantitative Genomics
- Software suite for the UNIX command-line environment (works downstream of the manufacturer's pipeline and mapping to the reference genome)
- Quality control for experiments
- Basic and advanced analysis, annotation, and visualization capabilities
- General framework handles data from different types of quantitative sequencing (ChIP-Seq/RNA-Seq/GRO-Seq/DNase-Seq/etc.)
- Can work with any organism
- Regulatory element analysis:
  - De novo motif discovery
  - Sort out spatial relationships between sequence features
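The "hypergeometric" in HOMER's name refers to scoring motif enrichment in target sequences relative to a background set. The following is a minimal sketch of that kind of test, not HOMER's actual implementation: the function name, its parameters, and the example counts are all hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total_seqs, total_with_motif, target_seqs, target_with_motif):
    """Upper-tail hypergeometric p-value (illustrative, not HOMER's code).

    Probability of seeing at least `target_with_motif` motif-containing
    sequences when `target_seqs` sequences are drawn without replacement
    from `total_seqs` sequences, of which `total_with_motif` contain the motif.
    """
    max_hits = min(target_seqs, total_with_motif)
    denominator = comb(total_seqs, target_seqs)
    tail = sum(
        comb(total_with_motif, k)
        * comb(total_seqs - total_with_motif, target_seqs - k)
        for k in range(target_with_motif, max_hits + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 1000 sequences in total (targets plus background),
    # 100 of them contain the motif; 200 are targets, 60 of which contain it.
    p = hypergeom_enrichment_p(1000, 100, 200, 60)
    print(f"enrichment p-value: {p:.3g}")
```

A smaller p-value means the motif occurs in the target set more often than random sampling from the combined set would explain.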

Overview of HOMER

HOMER Functionality
- Any organism with a FASTA file can be analyzed with HOMER
- Model organisms are preconfigured with annotation information: human, mouse, rat, zebrafish, Drosophila, C. elegans, yeast, S. pombe, Arabidopsis
- Genomes annotated on the UCSC Genome Browser are easy to incorporate, but any custom genome can be added with annotation files (e.g., GTF files)

HOMER Tutorials (on website)

Best way to develop NGS analysis methods: do it in the context of research!
[Diagram: Biology, Bioinformatics, NGS Methods Development]

Interplay between epigenetics, spatial genome conformation, and transcription in B-lymphocyte development

Why study the transition from pre-pro-B to pro-B cells?
- Lineage commitment: pro-B cells cannot dedifferentiate back to hematopoietic stem cells, whereas pre-pro-B cells can be used to reconstitute the whole immune system
- Antibody recombination: pro-B cells are paused at the exact stage when VDJ recombination is set to occur
- B cell marker expression: key cell-surface markers and transcription factors are induced in pro-B cells, including CD19, Ebf1 (Early B cell factor), Pax5, and Foxo1

Mapping the Epigenome

Unbiased Discovery of Regulatory Features in pro-B cells

Relationship between Transcription Factors and Epigenetic Modifications

Unbiased Discovery of Lineage-Determining Transcription Factors
Ebf1- and E2A-deficient mice fail to make pro-B cells

Hi-C: Mapping 3D interactions in the genome
[Figure label: GRO-Seq]
Hi-C method from Lieberman-Aiden et al., Science 2009

Most significant interactions in the genome occur at epigenetically modified locations
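A statement like this rests on intersecting interaction anchors with epigenetically marked regions. The sketch below shows one simple way to compute that overlap. The interval format (chromosome, start, end, half-open), the function names, and the toy coordinates are assumptions for illustration and do not come from the webinar data.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def fraction_overlapping(anchors, marked_regions):
    """Fraction of anchors that overlap at least one marked region.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    if not anchors:
        return 0.0
    # Group marked regions by chromosome so each anchor is only
    # compared against regions on its own chromosome.
    regions_by_chrom = {}
    for chrom, start, end in marked_regions:
        regions_by_chrom.setdefault(chrom, []).append((start, end))
    hits = sum(
        1
        for chrom, start, end in anchors
        if any(
            intervals_overlap(start, end, r_start, r_end)
            for r_start, r_end in regions_by_chrom.get(chrom, [])
        )
    )
    return hits / len(anchors)


if __name__ == "__main__":
    # Toy, made-up coordinates.
    anchors = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    marks = [("chr1", 150, 160), ("chr2", 300, 400)]
    print(fraction_overlapping(anchors, marks))
```

For genome-scale inputs a sorted or indexed interval structure would be preferable to this quadratic scan. The simple version is kept here for readability.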

Cell-type specific interactions often change their DNA methylation status

Genome Organization into topological domains
[Figure panels: pre-pro-B, pro-B]
TAD definition by Dixon et al. 2012

CTCF binding site is directional
CTCF only makes interactions with other CTCF sites in a specific direction along the DNA, determined by the orientation of the motif
[Figure labels: 5' boundary of TAD, 3' boundary of TAD]
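To make the idea of motif orientation concrete, the sketch below scans both strands of a sequence for a motif and lists pairs of sites whose motifs point toward each other. Several things here are assumptions: the motif is a short made-up string and not the real CTCF motif, matching is exact where a real scan would use a position weight matrix, and the "+ upstream, - downstream" pairing rule is one reading of the directionality described on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_oriented_sites(sequence, motif):
    """Return sorted (position, strand) tuples for exact motif matches.

    '+' means the motif itself matches the given sequence; '-' means its
    reverse complement matches, i.e. the site lies on the opposite strand.
    """
    sites = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            sites.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(sites)


def facing_pairs(sites):
    """Pairs (upstream_pos, downstream_pos) with '+' upstream and '-' downstream.

    Assumption: a '+' site looks downstream and a '-' site looks upstream,
    so only such facing pairs are candidates for an interaction.
    """
    return [
        (pos_a, pos_b)
        for i, (pos_a, strand_a) in enumerate(sites)
        for pos_b, strand_b in sites[i + 1:]
        if strand_a == "+" and strand_b == "-"
    ]


if __name__ == "__main__":
    toy_motif = "CCAGT"  # hypothetical, non-palindromic toy motif
    toy_sequence = "AACCAGTTTTTACTGGAA"
    sites = find_oriented_sites(toy_sequence, toy_motif)
    print(sites)
    print(facing_pairs(sites))
```

A non-palindromic motif is needed for orientation to be meaningful. A palindromic motif would match both strands at the same position.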

Clusters of CTCF sites form Super Anchors
[Figure panels: pre-pro-B, pro-B]

[Figure labels: Igh, Firre, Foxo1]
- Borrowing from Richard Young's Super Enhancer concept, we can define over 2500 CTCF super anchors in the data
- Only 25% of CTCF sites are found at boundaries. However, nearly 50% of Super Anchors are found at the boundaries of topological domains.
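The webinar does not spell out how super anchors were called. As a rough illustration of the super-enhancer-style idea it borrows from, the sketch below stitches nearby sites into clusters and ranks the clusters by summed signal. The stitching distance, the signal values, and the function names are all hypothetical, and the actual procedure may differ.

```python
def cluster_sites(sites, max_gap):
    """Stitch sites on one chromosome into clusters.

    `sites` is a list of (position, signal) tuples. A site joins the current
    cluster when it lies within `max_gap` bp of the cluster's last site.
    """
    clusters = []
    for position, signal in sorted(sites):
        if clusters and position - clusters[-1]["end"] <= max_gap:
            current = clusters[-1]
            current["end"] = position
            current["n_sites"] += 1
            current["signal"] += signal
        else:
            clusters.append(
                {"start": position, "end": position, "n_sites": 1, "signal": signal}
            )
    return clusters


def top_clusters(clusters, n):
    """Return the n clusters with the highest summed signal."""
    return sorted(clusters, key=lambda c: c["signal"], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up positions and signal values for illustration only.
    toy_sites = [
        (100, 1.0), (150, 2.0),
        (10000, 5.0), (10040, 1.0), (10090, 3.0),
        (50000, 4.0),
    ]
    for cluster in top_clusters(cluster_sites(toy_sites, max_gap=100), n=2):
        print(cluster)
```

Ranking by summed signal, not by site count alone, lets a cluster of a few very strong sites outrank a cluster of many weak ones. Which criterion suits a given data set is a judgment call.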

Overview of Immunoglobulin Heavy Chain Locus

Igh Locus in the Genome (~3 Mb)
To generate full repertoires of antibodies, each V region needs to find a way to interact with the D regions to recombine
[Figure label: Top Super Anchor]

V regions in the Igh locus are associated with CTCF sites
In addition, each CTCF site associated with V regions is in a consistent orientation

CTCF Orientation at D/J regions

Igh Locus Model
[Figure labels: VD recombination target; Top Super Anchor (looping backstop)]

Summary
- NGS is a lot more than genome sequencing
- Integration of different data types empowers discovery where any given data type alone falls short
- The DNA sequence (CTCF motifs and their orientation) dictates the structure of the genome to accomplish critical tasks such as VDJ recombination

Future Directions: Leveraging Genetics

Thanks!