Acceleration for Personalized Medicine Big Data Applications




Zaid Al-Ars
Computer Engineering (CE) Lab, Delft Data Science, Delft University of Technology

Introduction: Definition & relevance
Personalized medicine is the customization of healthcare, with medical decisions, practices, and products tailored to the individual patient.
It is an example of a societally critical, highly demanding big data application domain.

Introduction: Scientific and societal challenges
- Exponentially growing data volumes
- Increasing complexity of analysis
- Both computational and data challenges

Introduction: Scientific and societal challenges
- Urgent clinical diagnostics, for example targeted cancer & neonatal diagnostics
  -> We provide techniques to reduce compute time
- Cost prohibitive for society: more patients & diseases to be treated
  -> We provide techniques to reduce cost
[Figure: compute cost versus compute time]

Introduction: Master class outline
- Introduction and background
  - Field of personalized medicine
  - Challenges and opportunities
  - Relations to other big data fields
- Computational big data pipeline
  - Stages of a typical personalized medicine pipeline
  - Methods to reduce computation time
  - Methods to reduce pipeline cost
- Solution demonstration

Background: Field of personalized medicine
Vision: P4 medicine, that is, medicine that is predictive, preventive, personalized, and participatory

Background: Field of personalized medicine
Sources of personalized information:
- Measurements of vitals & body data
- Regular blood, spit, urine, etc. testing
- Genome data sequencing

Background: Field of personalized medicine
Measurements of vitals & body data
- Pros
  - Body is minutely and continuously monitored
  - Corporate support from big industry
- Cons
  - Use is not yet clear
  - Health risks are not monitored
-> Not known if applications in health are possible

Background: Field of personalized medicine
Regular blood, spit, urine, etc. testing
- Pros
  - Measurement of hundreds of molecules in the body
  - Direct correlation to health risk
- Cons
  - Still too expensive
  - No specific health advice yet possible
-> Possible future use if cost becomes manageable

Background: Field of personalized medicine
Genome data sequencing
- Pros
  - Detailed knowledge of genetic information
  - Known markers to diagnose disease
- Cons
  - Huge computational effort
-> Can be used today if the computational effort becomes manageable

Background: DNA-based diagnostics

Background: DNA-based diagnostics
- DNA mutation results in abnormal cell behavior
- Some mutations cause cells to divide without control, causing cancer
- Cancer can be diagnosed by identifying which mutations are in the DNA
-> Cancer diagnostics is the main use for DNA data today


Big data pipeline: Computational big data pipeline
Three main stages:
1. Data generation (GENERATE): generate and store DNA data using specialized compression techniques
2. Data analysis (ANALYZE): accelerate genetic analysis algorithms (mapping & variant calling) on hardware
3. Data visualization (INTERPRET): understand the analyzed genetic data to make clinical decisions for the patient

Big data pipeline: Data generation
- DNA processing passes through 3 stages
  - Sequence generation
  - Data analysis
  - Result interpretation
- Sequence generation faces size bottlenecks
[Figure: logarithmic vertical axis, years 2003 to 2011. Source: Lincoln D. Stein, "The case for cloud computing in genome informatics," Genome Biology, 11:207, 2010.]

Big data pipeline: Data analysis
- Growth of throughput of data generation is faster than growth in CPU processing capacity
  - Growth is exponential
  - Need for rapidly increasing processing capacity
[Figure: DNA sequencing (bp/day) and CPU speed (M inst./s) on a logarithmic axis, years 2003 to 2011. Source: Po-Ru Loh, Michael Baym & Bonnie Berger, "Compressive genomics," Nature Biotechnology, 30:627-630, 2012.]

Big data pipeline: Data interpretation
- Relative cost of interpretation is increasing
  - Number of sequenced genomes increases
  - Cross-referencing multiple genomes to identify correlations
  - Need for innovative DNA visualization
- Sequence generation faces size bottlenecks
[Figure: relative share (0% to 100%) of genotyping and interpretation, 2012 versus 2020. Source: Ingo Helbig, "Be literate when the exome goes clinical," http://channelopathist.net/, June 6, 2012.]

Big data pipeline: Current solution
- Current solution: increasing capacity in local or cloud clusters
  - Not always the best solution
[Figure: growth in DNA and CPU computational complexity; DNA sequencing (bp/day) and CPU speed (M inst./s) on a logarithmic axis, years 2003 to 2011. Source: Po-Ru Loh, Michael Baym & Bonnie Berger, "Compressive genomics," Nature Biotechnology, 30:627-630, 2012.]

Big data pipeline: CE lab solution: compression
- Domain-specific compression
  - Enables high compression rate
  - Allows reduced infrastructure footprint
- Possible transparent compression from and to the file system
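
To make the idea of domain-specific compression concrete, here is a minimal sketch. It assumes reads contain only the four bases A, C, G, and T, so each base fits in 2 bits instead of an 8-bit character. The function names pack_bases and unpack_bases are hypothetical, and this is an illustration of the general principle, not the CE Lab's actual technique.

```python
# Illustrative 2-bit packing of nucleotide sequences (assumes only A/C/G/T).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Place each 2-bit code at its slot inside the byte.
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)


def unpack_bases(packed: bytes, length: int) -> str:
    """Recover the original sequence; the length must be stored separately."""
    return "".join(
        BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 3]
        for i in range(length)
    )
```

Under these assumptions the packed form is a quarter of the size of plain text. Real sequencing files also carry quality scores and ambiguous bases, which this sketch does not handle.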

Big data pipeline: CE lab solution: acceleration
- Hybrid core computing
  - Means using dedicated computing chips for specific algorithms
  - Next to traditional general-purpose CPUs (e.g., Intel processors)
- Dedicated chips use FPGAs (field-programmable gate arrays), such as those from Xilinx
- Small compute elements are recreated in hardware
- Computations can be parallelized tens of times over
- Becoming mainstream: used by Intel, IBM, Microsoft, Facebook, etc.

Big data pipeline: CE lab solution: acceleration
- Compare and align nucleotide or protein sequences
- Algorithm scores every possible alignment
  - Each cell of the matrix compares elements of the query and the database
  - Much parallelism, both within & between sequences
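
The slide does not name the alignment algorithm. As a hedged illustration, the sketch below assumes a Smith-Waterman style local alignment, where every cell of a scoring matrix compares one element of the query with one element of the database sequence. The scoring values (match, mismatch, gap) and the function name local_alignment_score are assumptions chosen for illustration.

```python
def local_alignment_score(query: str, database: str,
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Return the best local alignment score between two sequences.

    Fills a (len(query)+1) x (len(database)+1) matrix H where H[i][j] is the
    best score of any alignment ending at query[i-1] and database[j-1].
    """
    rows, cols = len(query) + 1, len(database) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Score for pairing the two elements this cell compares.
            s = match if query[i - 1] == database[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # extend along the diagonal
                H[i - 1][j] + gap,    # gap in the database sequence
                H[i][j - 1] + gap,    # gap in the query sequence
            )
            best = max(best, H[i][j])
    return best
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time (parallelism within a sequence pair), and separate query/database pairs are independent of each other (parallelism between sequences).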

Big data pipeline: CE lab solution: distribution
- Efficient utilization of available hardware resources
  - Less hardware is used for the same algorithms
- Tuning of the hardware-software system to the use case
  - More parallelism extracted from algorithms
[Figure: scheduling diagram of tasks P1 to Pn and tasks S1 to Sn]
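
As a rough sketch of extracting parallelism by distributing independent tasks over the available workers, the example below splits a list of reads into chunks and processes the chunks side by side. The per-chunk task (counting G and C bases), the function names, and the use of a thread pool are all assumptions made for illustration; they are not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(chunk):
    """Hypothetical per-chunk task: count G and C bases in a list of reads."""
    return sum(read.count("G") + read.count("C") for read in chunk)


def split(reads, n_chunks):
    """Split reads into at most n_chunks contiguous chunks of similar size."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def parallel_gc_count(reads, workers=4):
    """Run the per-chunk task on all chunks concurrently and sum the results."""
    chunks = split(reads, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(gc_count, chunks))
```

For example, parallel_gc_count(["ACGT", "GGCC", "AATT"], workers=2) distributes the three reads over two workers. A thread pool keeps the sketch simple; CPU-bound work in practice would more likely use separate processes or dedicated hardware.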

Big data pipeline: CE lab solution: distribution
- Higher performance
  - 5x to 25x speed gains
- Energy saving
  - Up to 90% power reduction
- Easy to use, program, and manage
  - Standard Linux ecosystem
  - Transparent to the user
- Well suited for bioinformatics
  - Inherent parallelism exploited by pipelining
  - Small data types use logic efficiently

Next steps: Delft Data Science research agenda
CE Lab provides a holistic approach to optimizing big data infrastructure:
1. Addressing big data storage limitations: effective compression techniques
2. Addressing big data computational time: acceleration of big data algorithms
3. Addressing big data system cost: effective utilization of system resources
[Figure: storage limitations, computational bottlenecks, infrastructure cost optimizations]

Next steps: Collaboration opportunities
- Collaborations on big data infrastructure
  - Work together on industrially relevant challenges
  - Transfer of expert knowledge to organizations
- CE Lab is leading research in
  - Pipeline-wide performance optimization
  - Integrated system cost optimization
- Large network of leading technology providers
  - IBM, Intel, Altera, etc.

Next steps: Contact for further discussion
Contact for further discussion on collaborations or questions/feedback:
Zaid Al-Ars
CE Lab / TUDelft
Mekelweg 4, 2628 CD Delft
Email: z.al-ars@tudelft.nl
Web: ce.ewi.tudelft.nl/zaid
Tel: 015 27 89097

Next steps: Future prospects
- Genetic analysis has significant potential
  - Personalized medicine
  - Preemptive intervention
  - Trait selection and enhancement
  - Etc.
- Early detection & cure of diabetes with iPOP: 60 terabytes of data

Next steps: Solution demonstration