Big data in cancer research : DNA sequencing and personalised medicine

Big data in cancer research: DNA sequencing and personalised medicine
Philippe Hupé, Conférence BIGDATA, 04/04/2013

Deciphering the cancer genome with high-throughput technologies
[Figure: normal karyotype versus cancer karyotype]
Cancer is a gene disease.
Sequence the cancer genome (i.e. read its DNA sequence) to:
- understand the molecular mechanisms of tumoral progression
- tailor the therapy to each patient individually
Use high-throughput sequencing methods (next-generation sequencing).

30 years ago... the era of DNA sequencing
Walter Gilbert (Harvard), Nobel laureate, 1980. Co-inventor of the Maxam-Gilbert DNA sequencing method, published in 1977, the same year as Frederick Sanger's method.
"I expect that within a few years, our technology will be able to sequence one megabase/technician-year. At that rate 100 technicians could sequence the genome in 30 years. An effort to improve the technology over a 10-year period should raise the rate by a factor of 10." (The Scientist, October 20, 1986)

Evolution of sequencing technologies and decreasing cost

Year | Genome | Cost ($) | Duration | Technology | Nb. of scientists
2003 | HGP | 2,700,000,000 | 13 years | Sanger | 2,800
2007 | Venter | 100,000,000 | 4 years | Sanger | 31
2008 | Watson | 1,500,000 | 4.5 months | Roche 454 | 27
2009 | | 50,000 | 4 weeks | Helicos | 3

Sources: Pushkarev et al. (2009), Wadman et al. (2008)
Next-generation platforms: Roche 454, Illumina, SOLiD, Helicos
In 2013, it costs around $5,000 to sequence a human genome in one week with one technician (about 1,500 times faster in elapsed time than the 30 years of Gilbert's estimate).
Toward the $1,000 genome
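The comparison with Gilbert's 1986 estimate can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the slides; the variable names and the choice of 52 weeks per year are assumptions made for illustration. It shows that the "1,500 times" figure matches the ratio of elapsed times (30 years versus one week), while the ratio of technician effort is larger still.

```python
# Figures quoted in the slides; names and the 52-week year are illustrative assumptions.
GILBERT_TECHNICIANS = 100      # technicians in Gilbert's 1986 estimate
GILBERT_YEARS = 30             # years needed at one megabase per technician-year
WEEKS_PER_YEAR = 52

HGP_COST_USD = 2_700_000_000   # Human Genome Project, 2003
COST_2013_USD = 5_000          # one genome, one week, one technician (2013)

# Elapsed time: 30 years versus 1 week.
elapsed_speedup = GILBERT_YEARS * WEEKS_PER_YEAR

# Effort: 100 technicians for 30 years versus 1 technician for 1 week.
effort_speedup = GILBERT_TECHNICIANS * GILBERT_YEARS * WEEKS_PER_YEAR

# Cost: 2003 project cost versus 2013 price per genome.
cost_ratio = HGP_COST_USD // COST_2013_USD

print(f"elapsed-time speed-up: {elapsed_speedup}x")
print(f"technician-effort speed-up: {effort_speedup}x")
print(f"cost ratio 2003/2013: {cost_ratio}x")
```

The elapsed-time ratio comes out at 1,560, which is presumably the origin of the rounded figure on the slide.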

Data tsunami in cancer research
Low-cost sequencing + availability to every lab = informatic challenges
Cost is divided by 2 every:
- CPU (Moore's law): 18 months
- Storage (Kryder's law): 12-14 months
- Network (Butters' law): 9 months
- NGS: 5 months
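To see why these halving times create an informatic challenge, one can compare how much each cost falls over the same period. The following is a minimal sketch assuming a clean exponential decay with the halving times quoted on the slide; the function name, the five-year horizon and the 13-month midpoint for storage are illustrative assumptions.

```python
def cost_reduction(years: float, halving_months: float) -> float:
    """Factor by which a cost is divided after `years`, given its halving time."""
    return 2 ** (12 * years / halving_months)

# Halving times in months, as quoted on the slide.
HALVING_MONTHS = {
    "CPU (Moore)": 18,
    "Storage (Kryder, midpoint of 12-14)": 13,
    "Network (Butters)": 9,
    "NGS": 5,
}

HORIZON_YEARS = 5  # illustrative horizon, not from the slides

for name, months in HALVING_MONTHS.items():
    factor = cost_reduction(HORIZON_YEARS, months)
    print(f"{name}: cost divided by {factor:,.0f} in {HORIZON_YEARS} years")
```

Under these assumptions sequencing gets about 4,000 times cheaper in five years while computing gets about 10 times cheaper, so the data grow much faster than the capacity to process, store and move them.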

Next-generation sequencing... some figures...
Sequencing with an Illumina HiSeq 2500:
- 6 billion sequences; 1 sequence = 100 bases (A, T, C, G)
- 1 experiment = 600 billion bases = 200,000 copies of Les Misérables
- 1 Tb of data (per week)
- Human genome = 3 billion bases = 1,000 copies of Les Misérables
The dictionary analogy:
- Reference human genome (known sequence) = dictionary
- Cancer genome = a wrong copy of the dictionary
- In cancer, genes (= words) contain mutations (= mistakes): gene1 = GIRAFFE becomes gene1 = GILAFFE
- Cancer creates new words (= fusion genes): gene1 = GIRAFFE and gene2 = ZEBRA give new gene = GIBRA
The 6 billion sequences are compared to the reference genome to find the mutations and fusion genes, taking into account the fact that the sequencer itself makes errors when reading the sequence.
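The dictionary analogy translates directly into string comparison. The sketch below is a toy illustration on the words from the slide, not a real variant caller; both function names are hypothetical.

```python
def point_mutations(reference: str, observed: str) -> list[tuple[int, str, str]]:
    """List (position, reference letter, observed letter) for each mismatch."""
    if len(reference) != len(observed):
        raise ValueError("words must have the same length")
    return [(i, r, o) for i, (r, o) in enumerate(zip(reference, observed)) if r != o]


def fusion_breakpoints(word: str, left: str, right: str) -> list[tuple[str, str]]:
    """Find every way to write `word` as a prefix of `left` plus a suffix of `right`."""
    hits = []
    for cut in range(1, len(word)):
        head, tail = word[:cut], word[cut:]
        if left.startswith(head) and right.endswith(tail):
            hits.append((head, tail))
    return hits


mutations = point_mutations("GIRAFFE", "GILAFFE")
fusions = fusion_breakpoints("GIBRA", "GIRAFFE", "ZEBRA")
print(mutations)  # position 2: R replaced by L
print(fusions)    # GI (from GIRAFFE) + BRA (from ZEBRA)
```

Real data differ from this toy in scale (billions of reads) and in the presence of sequencing errors, which is why statistical methods are needed on top of plain comparison.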

Extraction of the biological signal from the raw data
- Development of algorithms and statistical methods
- Interdisciplinary work with bioinformaticians, computer scientists, biologists, mathematicians, statisticians and algorithmists
- HPC infrastructure
[Figure: pieces of the cancer genome (short reads: CGAGCTG, ACGAGCT, TCCTAGC, GCTCCTA, TTTACGA, AGCTCCT, TTTACGA, AGCTCCT, ACGACTT, ACTACGA, GGCCAAC, CGGCCAA, AGCTGCG, CGAGCTG, CTACGAG, CATCTAC) aligned against the reference genome sequence (= dictionary); positions that differ from the reference are the mutations]
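Aligning short reads to the reference and separating true mutations from sequencing errors can be sketched as follows. This is a deliberately naive illustration: it scans every position of a small made-up reference and keeps a mutation only when several reads agree. The reference string, the reads, the function names and the thresholds are all assumptions; production pipelines rely on indexed aligners and statistical variant callers.

```python
from collections import Counter, defaultdict


def best_position(read, reference, max_mismatches=1):
    """Return (start, mismatches) of the best placement of `read`, or None."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


def call_mutations(reads, reference, min_support=2, max_mismatches=1):
    """Pile reads up on the reference and report well-supported differences."""
    pileup = defaultdict(Counter)
    for read in reads:
        hit = best_position(read, reference, max_mismatches)
        if hit is None:
            continue  # read cannot be placed confidently
        start, _ = hit
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    calls = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if base != reference[pos] and count >= min_support:
            calls.append((pos, reference[pos], base, count))
    return calls


# Hypothetical 20-base reference; the tumour carries C->T at position 10.
reference = "ACGTTGCAAGCTTAGGCATC"
reads = [
    "GCAAGTTT",  # covers the mutation
    "AAGTTTAG",  # covers the mutation
    "GTTTAGGC",  # covers the mutation
    "ACGATGCA",  # single read with a sequencing error at position 3
]
calls = call_mutations(reads, reference)
print(calls)
```

The mismatch seen in only one read is discarded as a likely sequencing error, while the one supported by three reads is reported as a mutation.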

Visualisation of the significant fusions
[Figure: intra-chromosome fusions]
Source: MCF-7 breast cancer cell line, Hampton et al., Genome Research 2009

Application to personalised medicine: the SHIVA clinical trial
Question: molecularly targeted therapy > conventional therapy?
[Figure: trial design. Molecular profile → molecular abnormality → targeted agent, compared with chemotherapy]
Aim: compare the efficacy of molecularly targeted therapy based on tumour molecular profiling versus conventional therapy in patients with refractory cancer.

SHIVA clinical trial: the workflow
- Patient's inclusion and biopsy at the clinic
- Shipment to the CRB (biological resource centre) and to pathology
- DNA extraction
- Shipment of DNA to the Affymetrix platform: Affymetrix Cytoscan HD, then bioinformatics analysis (detection of amplified/deleted genes), giving the list of amplified/deleted genes
- Validation of amplified/deleted genes by IHC; IHC for oestrogen, progesterone and androgen receptors (RO/RP/RA)
- Shipment of DNA to the sequencing platform: Ion Torrent sequencing, then bioinformatics analysis (detection of mutated genes)
- Bioinformatics integration
- Elaboration of a report that is sent to the Molecular Biology Board
- Therapeutic decision
Time frame shown on the slide: 4 weeks

The therapeutic decision is based on a report listing the molecular abnormalities
- Simple decision rules: if STK11 is mutated, then targeted therapy = everolimus
- Other simple rules are used for the other targeted therapies
- Cancer biology is much more complex, and these naive rules need to be improved
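A rule of this kind is essentially a lookup from a mutated gene to a targeted agent. The sketch below encodes only the single rule stated on the slide; the table and function names are hypothetical, and the trial's other rules are deliberately not reproduced because the slides do not list them.

```python
# Only the rule quoted in the slides is encoded; the trial's other rules are not listed there.
DECISION_RULES = {
    "STK11": "everolimus",
}


def propose_targeted_therapies(mutated_genes, rules=DECISION_RULES):
    """Return the sorted, de-duplicated agents matched by the patient's mutated genes."""
    return sorted({rules[gene] for gene in mutated_genes if gene in rules})


# Illustrative patient report: gene names other than STK11 are arbitrary examples.
report = ["TP53", "STK11"]
proposal = propose_targeted_therapies(report)
print(proposal)
```

Such a table ignores interactions between abnormalities, which is the limitation the slide points out when it calls these rules naive.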

Cancer is a complex disease
[Figure: multiple biological layers; interactions between chemical species]
The multidimensional nature of cancer (genome, proteome, epigenome, kinome, etc.) has to be considered to unravel the complexity of the disease. Mathematical models and computational systems biology are definitely needed to improve current decision rules and to understand the emergent properties of cancer cells. To perform such integrative analyses with sophisticated mathematical models, this multidimensional information must be integrated within an efficient information system.

Data integration is a major challenge in cancer research
[Figure: private data (medical images, copy number, clinical, NGS, MS, gene expression, phenotyping, biobank, RPPA) and public data (Reactome, TCGA, CCLE, ICGC)]
A large Volume of patients' data is disseminated across a large Variety of databases, which increase in size at a huge Velocity. In order to extract most of the hidden Value from these data, we must face challenges at:
- the technical level: develop a powerful informatics architecture
- the organisational and management levels: define the procedures to collect data with the highest confidence and quality
- the scientific level: create sophisticated mathematical models to predict the disease evolution and the patient's risk
At Institut Curie we are currently building an information system to fully integrate all the molecular, biological and clinical data.

Can we dream of an online prediction system to help therapeutic prediction?
[Figure: private data (LIMS NGS, LIMS RPPA, LIMS gene expression, clinical data) and public data (Reactome, ...) are each connected through a wrapper to a centralised bioinformatics database / virtual database, which feeds the therapeutic decision]
- Every day, for several patients, information is collected: pathological complete response (pCR), survival, response to therapy, molecular profiles, etc.
- Integrative analyses aim at building signatures to predict disease evolution (e.g. risk of metastasis)
- Re-evaluate the prediction rules in real time, taking this new information into account
- Apply online machine learning techniques
[Figure: timeline. Training of mathematical models, prediction of pCR for a new patient, observed pCR, ... over time]
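The idea of re-evaluating a prediction rule each time a new outcome is observed can be illustrated with an online logistic model updated by one stochastic gradient step per patient. This is a minimal sketch under strong assumptions: the slides do not say which model is used, the single numeric feature stands in for a molecular signature, and the class name, learning rate and toy data stream are all hypothetical.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one patient at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Predicted probability of the outcome (here: pCR) for one patient."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, observed):
        """Adjust the model once the real outcome (0 or 1) has been observed."""
        error = self.predict_proba(features) - observed
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Toy stream: a positive signature value is followed by pCR, a negative one is not.
model = OnlineLogisticModel(n_features=1)
for _ in range(200):
    for features, observed_pcr in (([1.0], 1), ([-1.0], 0)):
        predicted = model.predict_proba(features)  # prediction for the new patient
        model.update(features, observed_pcr)       # re-evaluation after the outcome

print(round(model.predict_proba([1.0]), 3), round(model.predict_proba([-1.0]), 3))
```

Each patient is first predicted and then used for training, which mirrors the prediction, observation, update cycle shown on the slide's timeline.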

Towards P4 medicine
The term P4 medicine was coined by Leroy Hood (president of the Institute for Systems Biology).
The practice of medicine is mainly reactive, i.e. the physician reacts to the disease state of the patient and little is done to prevent the occurrence of the disease.
Predictive medicine was first introduced by Jean Dausset (Nobel prize in medicine, 1980).
P4 medicine:
- Predictive: consider the genetic background of the individual and his or her environment
- Preventive: adapting lifestyle, taking preventive drugs
- Personalised: tailor the treatment to the unique features of the individual (such as the patient's genetic background, the tumour's genetic and epigenetic landscape, the living environment)
- Participatory: many healthcare options, which require in-depth exchanges between the individual and his or her physician
P4 medicine = managing a patient's health instead of managing a patient's disease

Big basket with a large variety of data

Data integration + mathematical models leverage new information

Welcome to GATTACA