High-throughput sequencing and big data: implications for personalized medicine?



Similar documents
Human Genome and Human Genome Project. Louxin Zhang

Genome and DNA Sequence Databases. BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009

How To Understand The Science Of Genomics

History of DNA Sequencing & Current Applications

The Human Genome Project. From genome to health From human genome to other genomes and to gene function Structural Genomics initiative

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools.

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Genetic diagnostics the gateway to personalized medicine

Embargoed until 14:30 CEST European time, 13:30 BST UK, 8:30 Eastern US summer time Contacts:

Intro to Bioinformatics

Human Genome Organization: An Update. Genome Organization: An Update

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Structure and Function of DNA

OpenMedicine Foundation (OMF)

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

Grand V Challenge We must improve human health, nutrition and wellness of the U.S. population

DNA Sequencing and Personalised Medicine

BioBoot Camp Genetics

Big data in cancer research : DNA sequencing and personalised medicine

Canadian Microbiome Initiative

Ten years ago marked the official end of the Human Genome Project, but really it was just the beginning

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

TRACKS GENETIC EPIDEMIOLOGY

How Can Institutions Foster OMICS Research While Protecting Patients?

Basic Concepts of DNA, Proteins, Genes and Genomes

A leader in the development and application of information technology to prevent and treat disease.

Q&A: Kevin Shianna on Ramping up Sequencing for the New York Genome Center

FACULTY OF MEDICAL SCIENCE

The Human Genome Project

Mitochondrial DNA Analysis

14.3 Studying the Human Genome

Biological Sciences Initiative. Human Genome

ITT Advanced Medical Technologies - A Programmer's Overview

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

The 100,000 genomes project

A Primer of Genome Science THIRD

The National Institute of Genomic Medicine (INMEGEN) was

U.S. Meat Animal Research Center Clay Center, NE

CCR Biology - Chapter 9 Practice Test - Summer 2012

Biology & Big Data. Debasis Mitra Professor, Computer Science, FIT

Give a NOD to diabetes:

1865 Discovery: Heredity Transmitted in Units

DNA and the Cell. Version 2.3. English version. ELLS European Learning Laboratory for the Life Sciences

The Healing Power of Data

Human Genetics: Online Resources

All your base(s) are belong to us

Localised Sex, Contingency and Mutator Genes. Bacterial Genetics as a Metaphor for Computing Systems

SAP HANA Enabling Genome Analysis

2019 Healthcare That Works for All

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Graduate School of Excellence Exzellenz-Graduiertenschule

A Genomic Timeline Tim Shank 2003

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa

Deliverable First report on sample storage, DNA extraction and sample analysis processes

Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider

Methods for Promoting Public Dialogue on the Use of Residual Newborn Screening Samples for Research

Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College

Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology

University Uses Business Intelligence Software to Boost Gene Research

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Searching Nucleotide Databases

NGS and complex genetics

SEQUENCING INITIATIVE SUOMI (SISU) SYMPOSIUM SPEAKERS August 26, 2014

Biotechnical Engineering (BE) Course Description

PERSONALIZED MEDICINE: Tailoring Health Care for the Information Age

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute

Sharing Data from Large-scale Biological Research Projects: A System of Tripartite Responsibility

MEMORANDUM. The Establishment of the Center for Integrated Animal Genomics (CIAG) at Iowa State University

Databases and platforms for data analysis from NGS of MTB

PROYECTO GENOMA HUMANO. Medicina Molecular 2011 Maestría en Biología Molecular Médica- UBA

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Biological Sciences B.S. Degree Program Requirements (Effective Fall, 2015)

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Integration of genomic data into electronic health records

Personalized medicine in China s healthcare system

M The Nucleus M The Cytoskeleton M Cell Structure and Dynamics

AmphoraNet: Taxonomic Composition Analysis of Metagenomic Shotgun Sequencing Data

Teaching Bioinformatics to Undergraduates

Lessons from the Stanford HIV Drug Resistance Database

Course Descriptions. I. Professional Courses: MSEG 7216: Introduction to Infectious Diseases (Medical Students)

G E N OM I C S S E RV I C ES

Cystic Fibrosis Webquest Sarah Follenweider, The English High School 2009 Summer Research Internship Program

2 The Human Genome Project

CURRICULUM VITAE Rosella Visintin. Assistant professor of SEMM (European School of Molecular Medicine). IFOM-IEO Campus Milan - Italy

Institutional Partnership Program

International CEMarin Omics Workshop: Omics Techniques for the Study of Marine Organisms and Ecosystems

ATIP Avenir Program Applicant s guide

FOR TEXTILES. o Kuchipudi-I] textiles. Outdoor Activity Based Courses. Camp

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

EMBL. International PhD Training. Mikko Taipale, PhD Whitehead Institute/MIT Cambridge, MA USA

UCF, College of Medicine BS Biotechnology MS Biotechnology/MBA Professional Science Masters Program in Biotechnology/MBA

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

NIH/NIGMS Trainee Forum: Computational Biology and Medical Informatics at Georgia Tech

13.4 Gene Regulation and Expression

Personalized Medicine and IT

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

BIOLOGICAL SCIENCES REQUIREMENTS [63 75 UNITS]

Transcription:

High-throughput sequencing and big data: implications for personalized medicine? Dominick J. Lemas, PhD Postdoctoral Fellow in Pediatrics - Neonatology Mentor: Jacob E. (Jed) Friedman, PhD

What is Big Data? 1) Top Tech phrase of 2013 1 2) Messy, Noisy, Imprecise 3) Datafication 4) Repurposed 5) N= All 6) Privacy? 1 http://www.languagemonitor.com/

What is the human genome? 1. Human body contains trillions of cells. 2. Each cell contains a nucleus. 3. Each nucleus contains 2 complete genomes. 4........ OR if the genome was a book! There are 23 chapters chromosomes Each chapter contains several hundred stories genes Each story is composed of paragraphs exons Interrupted by advertisements introns Each paragraph is made up of words codons Each word is written in letters bases... A,C,T,G Genome. Matt Ridley. 1999

Goals of Human Genome Project 1) Generate working draft of 90% of the human genome (2001). 2) Obtain complete, high-quality genomic sequence (2003). 3) Make all data publically available. 4) Develop novel sequencing technologies. 5) Map Sequence Variation. 6) Interpret functions of genome. 7) Develop comparative genomic strategies. 8) Ethical, legal and social implications (ELSI). 9) Bioinformatics and Computational Biology 10) Training

Benefits of Sequencing Human Genome 1) Molecular Medicine 2) Energy and Environmental Applications 3) Bioarcheology, anthropology, evolution, human migration 4) DNA forensics 5) Agriculture, livestock breeding, and bioprocessing

Sequencing Milestones: the early days 1953 1977 1985 1988 Watson & Crick Fredrick Sanger Charles DeLisi James D. Watson Collins. 2001.Nature 422, 835-847

Sequencing Milestones: HGP Francis Collins Bermuda Principle Craig Venter H. influenzae Sequenced C. elegans Sequenced D. Melanogaster Sequenced Collins. 2001.Nature 422, 835-847

The International Human Genome Sequencing Consortium G5- Completed Bulk of Sequencing Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S. The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, U.K. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S. U. S. Department of Energy Joint Genome Institute, Walnut Creek, Calif., U.S. Baylor College of Medicine Human Genome Sequencing Center, Houston, Tex., U.S. RIKEN Genomic Sciences Center, Yokohama, Japan Genoscope and CNRS UMR-8030, Evry, France GTC Sequencing Center, Waltham, Mass., U.S. Department of Genome Analysis, Jena, Germany Beijing Genomics Institute/Human Genome Center, Beijing, China Multimegabase Sequencing Center, Seattle, Wash., U.S. Stanford Genome Technology Center, Stanford, Calif., U.S. Stanford Human Genome Center, Stanford, Calif., U.S. University of Washington Genome Center, Seattle, Wash., U.S. Department of Molecular Biology, Tokyo, Japan University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, U.S. University of Oklahoma's Advanced Center for Genome Technology, Norman, Okla., U.S. Max Planck Institute for Molecular Genetics, Berlin, Germany Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S. GBF - German Research Centre for Biotechnology, Braunschweig, Germany

Sanger Sequencing 1975-1991 DNA Template The ABI Prism 3700 /3730 1992-2000 s $300,000/machine Sequence 50-100K bp/hr

Whole-Genome Sequencing Break genome into small pieces called reads Sequence reads Assemble Reads

Computational Challenges Coverage? Imputation? Alignment? Formatting? Analysis?

Where does the data live?

Sequencing has gotten Cheaper and Faster Cost of one human genome HGP: $ 3 billion 2004: $30,000,000 2008: $100,000 2010: $10,000 2011: $4,000 2012-13: $1,000???: $300

BIG DATA & Sequence Whole-Genome Shotgun (WGS) GenBank http://plone.hpcf.upr.edu/members/humberto/class/2006/bioinformatics/bioinformatics ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

So What Did We Learn? <3% of genome encodes ~20,000 genes. More than half of genome is repetitive. Identification of ~2,850 gene impact rare diseases. ~1,100 markers affecting common disease & ~150 targets for cancer. Big Science can win. Cost of sequencing per base has been reduced by magnitude of ~100k. Lander. 2011. Nature 470, 187-197

Emerging Applications of Sequence data Gene Expression? Human Genome Sequence Genetic Variation? Microbial Variation? Metabolites? Epigenetic Variation?

Mapping Genetic Variation ~7 billion people ~8.7 million species ~37 trillion cells/human

Obesity is Complex Obesity Environment Genes 18

Disease Risk Gene-Environment Interactions Genotype A/A A/B B/B Environment 19

Variation in the Human Microbiome Microbe: tiny living organism, such as bacterium, fungus, protozoan, or virus. Microbiome: collectively all the microbes in the human body; a community of microbes.

Variation in Human Microbiome? Relative to humans: 9 in 10 cells are microbial! ~1000 different species. ~150x more genes. ~3lbs of microbes in the human gut. ~60% of stool by dry mass is microbial. Mass of Earth 5.9 x 10 24 kg Mass of Moon 7.3 x 10 22 kg Primary functions of the microbiome: Stimulate the development of our immune system. Resist colonization by pathogens. Extract energy from food.

Does the microbiome impact maternal metabolism during pregnancy? Koren et al 2012 We hypothesize the maternal microbiome in mothers will directly affect the development of the infants microbiome and adiposity during the first year of life.

Time Overview of Specific Aims Pregnancy 4 months Maternal Phenotype NW OW/Ob T2D GDM SA1 Maternal Microbiome Stool Human Milk (HM) SA2 Infant Microbiome Stool SA3 Infant Adiposity

How Do You Measure the Microbiome? Target Highly Conserved Bacterial Gene Illumina MiSeq Personal Sequencer Who is present? DNA samples Target All Bacterial Genomes What are they doing?

Moving toward a Personalized Medicine? But has sequencing the human genome improved health?

Five Domains of Genomic Research Green. 2011. Nature 470, 204-213

Bioinformatics and Computational Biology to the Rescue? Data Analysis- existing tools are becoming inadequate to analyze data. Data Integration- Need to harmonize disparate data types. Visualization- Need to accommodate multi-dimensional data. Computational Tools and Infrastructure- storage, capacity, privacy. Training- Need biologist, computer science, informatics, data science...... Green. 2011. Nature 470, 204-213

Recommended Reading

Special Thanks to: My mentors Jacob E. Friedman, PhD Daniel N. Frank, PhD Stephanie A. Santorico, PhD Linda A. Barbour, MD, MSPH Dana Dabelea, MD, PhD Funding Support: ADA/Glaxo-SmithKline T32-HD07186 Questions?