SAP HANA Enabling Genome Analysis

1 SAP HANA Enabling Genome Analysis
Joanna L. Kelley, PhD, Postdoctoral Scholar, Stanford University
Enakshi Singh, MSc, HANA Product Management, SAP Labs LLC

2 Outline
  Use cases
  Genomics review
  Challenges in genomic analysis
  SAP HANA as the solution
  Rethinking the genomics pipeline
  Vision for the future

3 Use Case 1: Clinician
Identify Clinically Actionable Genetic Variants (e.g. Causing Tumor Formation) in Order to Deliver Personalized Medical Treatment
Needs:
  Real-Time Comparison of Variants to Assess Causal Ones
  Access to all Patient-Specific Data Anytime and Anywhere

4 Use Case 2: Researcher
Identify Causal Variants or Mutations in Cohorts (> 10,000 Individuals) Suffering from Diseases of Interest, e.g. Autism
Needs:
  Comparison of Variants in Diseased and Healthy Cohorts
  Flexible Queries to Verify Hypotheses in Real-Time

5 What is the Genome? GENOMICS Today ~3500 known diseases caused by DNA changes

6 Genetic Variants
  Chromosomes are like chapters in a book
  Genes are like sentences in a chapter
  Genetic variations (or mutations) are like misspelled words or missing sentences
  Any two individuals are 99.9% identical in their DNA
  The 0.1% of unique DNA is what makes us different

7 Human Genome
  The entire set of genetic information is called our genome
  A single genome consists of 3.2 billion base pairs of DNA, spread across 23 chromosome pairs
  Sex chromosomes

8 Why is this a big data problem?

9 Why is this a big data problem?
Sboner et al., Genome Biology 2011

10 Genomics Data & SAP HANA
How big?
  Tens to hundreds of billions of records
  Tens to hundreds of terabytes of genome sequence data
Need for speed?
  Clinical Environments
    Premature & New-born babies
  Researchers
    Interactive speed
Why HANA?
  Speed
    In-Memory, Column Store
    Efficient Caching, Compression, Late Materialization
  Versatility
    SQL Language
    Application Builder

11 SAP HANA
Due to the Power of Mathematics and Distributed Computing, SAP HANA can Predictably Complete any Information Processing Task, However Complex, Within a Given Time-Window.
  Scanning 3MB/msec/core
  Inserting 1.5M Records/sec
  Aggregating 12.5M Records/sec/core
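To put the scan figure in perspective, here is a back-of-the-envelope sketch of how long a full scan would take at the quoted 3 MB/msec/core. The data size and core count in the usage note are hypothetical, 1 TB is taken as 10^6 MB, and perfect linear scaling across cores is assumed, which real systems rarely achieve.

```python
# Scan rate quoted on the slide: 3 MB per millisecond per core.
SCAN_MB_PER_MS_PER_CORE = 3.0


def scan_seconds(data_terabytes: float, cores: int) -> float:
    """Estimate full-scan time in seconds, assuming ideal linear scaling.

    Assumes 1 TB = 1,000,000 MB; the core count is an illustrative input,
    not a figure from the presentation.
    """
    megabytes = data_terabytes * 1_000_000
    milliseconds = megabytes / (SCAN_MB_PER_MS_PER_CORE * cores)
    return milliseconds / 1000.0


if __name__ == "__main__":
    # Hypothetical example: 10 TB of data scanned by 1,000 cores.
    print(f"{scan_seconds(10, 1000):.2f} s")
```

Under these assumptions, 10 TB on 1,000 cores comes out at a few seconds, which is the order of magnitude the "interactive speed" requirement calls for.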

12 SAP HANA +

13 Genomics Pipeline
Sequencing Service/Lab (e.g. Biologist) -> Computational Pipeline (e.g. Bioinformatician) -> Computational Analysis (e.g. Clinicians and Researchers)
Sequencing -> Alignment -> Variant Calling -> Annotation and Analysis
Patient Samples -> Raw DNA Reads -> Mapped Genome -> Discovered Variants -> Follow-up and Validation

14 Genomics Pipeline
  Numerous open-source/commercial tools for alignment
  Common tools: BWA-SW & SOAP
  Raw DNA reads aligned to human reference genome
  Algorithms must be tolerant to slight variations in reads (from the reference)
  SLOW process

15 Genomics Pipeline
Faster
  BWA-SW 28.3h
  SAP HANA 3.6h
Higher Accuracy
  BWA-SW 0.53% misaligned
  SAP HANA 0.35% misaligned
  BWA-SW 0.34% unaligned
  SAP HANA 0.14% unaligned
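The alignment figures above can be turned into a speedup factor and relative error reductions. The sketch below only does arithmetic on the numbers quoted on the slide; the helper names are made up for illustration.

```python
def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved runtime is than the baseline."""
    return baseline / improved


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which a rate dropped relative to the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Figures quoted on the slide (BWA-SW vs. SAP HANA).
    print(f"runtime speedup:      {speedup(28.3, 3.6):.1f}x")
    print(f"misaligned reduction: {relative_reduction(0.53, 0.35):.0%}")
    print(f"unaligned reduction:  {relative_reduction(0.34, 0.14):.0%}")
```

On the quoted numbers this is a speedup of a little under 8x, with roughly a third fewer misaligned reads and over half fewer unaligned reads.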

16 Genomics Pipeline
  Identifying and analyzing frequency of variants
  Identifying variant leading to condition of interest
  Searching through literature databases for info on variant of interest

17 Annotation & Analysis
Output of variant calling commonly stored in Variant Call Format (VCF)
Contains:
  Positions and states of variants identified
  Quality score of each variant
  Additional meta-data for each variant
Annotation - common query:
  Report SNPs (Single Nucleotide Polymorphisms) Failing Quality Control
Common tool: UCSC Genome Browser
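To make the "SNPs failing quality control" query concrete, here is a minimal pure-Python sketch over VCF text. It is not how SAP HANA or the UCSC Genome Browser implements the query. The sample records are made up, a SNP is taken to be a variant whose REF and ALT alleles are all single bases, and "failing QC" is taken to mean a FILTER value other than PASS or "." (not yet filtered).

```python
# Made-up VCF content for illustration (tab-separated, as in the VCF spec).
SAMPLE_VCF = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t10583\tvar1\tG\tA\t50\tPASS\t.
1\t10611\tvar2\tC\tG\t12\tq20\t.
1\t13302\tvar3\tC\tT\t8\tq20\t.
1\t13327\tvar4\tGA\tG\t9\tq20\t.
"""


def failing_snps(vcf_text: str) -> list:
    """Return (chrom, pos, id, filter) for SNPs whose FILTER is not PASS."""
    failed = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        chrom, pos, vid, ref, alt, _qual, flt = line.split("\t")[:7]
        # Simplified SNP test: every allele is exactly one base long.
        is_snp = len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))
        if is_snp and flt not in ("PASS", "."):
            failed.append((chrom, int(pos), vid, flt))
    return failed


if __name__ == "__main__":
    for record in failing_snps(SAMPLE_VCF):
        print(record)
```

In this sample, var2 and var3 are reported; var1 passes QC and var4 is a deletion rather than a SNP.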

18 Annotation & Analysis
Analysis - common queries:
  Compute the alternative allele frequency for each variant in a genomic region (Chromosome 1, positions )
  Compute the total number of missing genotypes for each individual
Common tool: VCFtools
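The first analysis query can be sketched in a few lines of Python. This is an illustration of the computation only, not the VCFtools or SAP HANA implementation. The records and the region bounds (chromosome 1, positions 100 to 1000) are hypothetical, and the alternative allele frequency is taken as the share of non-reference alleles among all called alleles, with missing calls excluded.

```python
# Made-up VCF data lines: 8 fixed columns, FORMAT, then three samples.
SAMPLE_LINES = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t150\t.\tA\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "1\t900\t.\tG\tC\t.\tPASS\t.\tGT\t0/1\t./.\t0/0",
    "1\t5000\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t1/1\t1/1",
    "2\t150\t.\tT\tG\t.\tPASS\t.\tGT\t0/1\t0/1\t0/1",
]


def alt_allele_frequency(genotypes):
    """Share of non-reference alleles among called alleles, or None."""
    called = [
        allele
        for gt in genotypes
        for allele in gt.replace("|", "/").split("/")
        if allele != "."  # "." marks a missing allele call
    ]
    if not called:
        return None
    return sum(1 for allele in called if allele != "0") / len(called)


def region_frequencies(vcf_lines, chrom, start, end):
    """Map position -> alternative allele frequency within a region."""
    result = {}
    for line in vcf_lines:
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pos = int(fields[1])
        if fields[0] == chrom and start <= pos <= end:
            # The genotype (GT) is the first colon-separated sample subfield.
            genotypes = [sample.split(":")[0] for sample in fields[9:]]
            result[pos] = alt_allele_frequency(genotypes)
    return result


if __name__ == "__main__":
    print(region_frequencies(SAMPLE_LINES, "1", 100, 1000))
```

In a column store, the same query amounts to a range filter on chromosome and position followed by an aggregation per variant.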

19 Genomics Pipeline
Report SNPs (Single Nucleotide Polymorphisms) Failing Quality Control
  UCSC sec
  SAP HANA 1.25 sec
  82x faster
Compute the Alternative Allele Frequency for Each Variant in a Genomic Region (Chromosome 1, Positions 100, ,000)
  VCFtools 259 sec
  SAP HANA 0.43 sec
  600x faster
Compute the Total Number of Missing Genotypes for Each Individual
  VCFtools 548 sec
  SAP HANA 2 sec
  270x faster

20 Total # of Missing Genotypes
Sample #   SNP Genotypes
1          A/A   A/T   G/G   A/A
2          A/A   ./.   ./.   A/A
3          A/C   ./.   ./.   A/A
4          A/A   ./.   ./.   A/A
5          A/C   ./.   ./.   C/C
6          ?     ./.   ./.   C/C
7          C/C   A/T   G/G   C/C
8          C/C   A/T   G/T   C/C
9          C/C   ./.   ./.   C/C
10         C/C   ./.   ./.   A/C
./. = Missing Genotype
SNPs with high rate of missingness: potential problem
Compute the Total Number of Missing Genotypes for Each Individual
  VCFtools 548 sec
  SAP HANA 2 sec
  270x faster
Source: Shaila Musharoff / Bustamante Lab / Stanford
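The missing-genotype count can be reproduced on the ten samples shown on this slide. The sketch below is a plain-Python illustration, not the VCFtools or SAP HANA implementation. Only "./." is counted as missing; the "?" entry for sample 6 is kept as it appears in the transcript and is not counted, which is an assumption.

```python
MISSING = "./."

# Genotype table as it appears on the slide (sample number -> four SNPs).
TABLE = {
    1: ["A/A", "A/T", "G/G", "A/A"],
    2: ["A/A", "./.", "./.", "A/A"],
    3: ["A/C", "./.", "./.", "A/A"],
    4: ["A/A", "./.", "./.", "A/A"],
    5: ["A/C", "./.", "./.", "C/C"],
    6: ["?", "./.", "./.", "C/C"],  # "?" kept verbatim from the transcript
    7: ["C/C", "A/T", "G/G", "C/C"],
    8: ["C/C", "A/T", "G/T", "C/C"],
    9: ["C/C", "./.", "./.", "C/C"],
    10: ["C/C", "./.", "./.", "A/C"],
}


def missing_per_sample(table):
    """Total number of missing genotypes for each individual."""
    return {
        sample: sum(1 for gt in genotypes if gt == MISSING)
        for sample, genotypes in table.items()
    }


def missingness_per_snp(table):
    """Fraction of individuals with a missing genotype at each SNP."""
    rows = list(table.values())
    return [
        sum(1 for row in rows if row[i] == MISSING) / len(rows)
        for i in range(len(rows[0]))
    ]


if __name__ == "__main__":
    print(missing_per_sample(TABLE))
    print(missingness_per_snp(TABLE))
```

The per-SNP view shows why missingness matters: the second and third SNPs are missing in 7 of the 10 samples, so they are the ones the slide flags as a potential problem.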

21 Moving Forward
SAP HANA to contribute to all parts of the pipeline

22 The Future
Enable Clinicians to:
  Make Evidence-Based Therapy Decisions at the Patient's Bed
  Supervise High-Risk Patients to Prevent Emergencies
Enable Researchers to:
  Investigate the Genomes of Millions of High-Risk Patients on a Cluster < 10M USD
  Analyze the Results in Real-Time

23 Pathway: A molecular pathway is a signaling cascade in a cell with proteins as key components
Drug: Compound designed to cure diseases
METABOLOMICS, PROTEOMICS, TRANSCRIPTOMICS, GENOMICS

24 Vision for Personalized Medicine
Information and Feedback within the Window of Opportunity
Patients, Doctors, Insurers, Researchers
Real-Time Data Capture and Analysis
SAP HANA Healthcare Platform
Genomics, Electronic Medical Records, Annotations, ...
All Relevant Medical Information

25

26 THANK YOU FOR PARTICIPATING
Please provide feedback on this session by completing a short survey via the event mobile application.
SESSION CODE: 3503
For ongoing education on this area of focus, visit

How Real-time Analysis turns Big Medical Data into Precision Medicine?

How Real-time Analysis turns Big Medical Data into Precision Medicine? Medical Data into Dr. Matthieu-P. Schapranow GLOBAL HEALTH, Rome, Italy August 27, 2014 Important things first: Where to find additional information? Online: Visit http://we.analyzegenomes.com for latest

More information

High Performance Compu2ng Facility

High Performance Compu2ng Facility High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,

More information

Towards Integrating the Detection of Genetic Variants into an In-Memory Database

Towards Integrating the Detection of Genetic Variants into an In-Memory Database Towards Integrating the Detection of Genetic Variants into an 2nd International Workshop on Big Data in Bioinformatics and Healthcare Oct 27, 2014 Motivation Genome Data Analysis Process DNA Sample Base

More information

The National Institute of Genomic Medicine (INMEGEN) was

The National Institute of Genomic Medicine (INMEGEN) was Genome is...... the complete set of genetic information contained within all of the chromosomes of an organism. It defines the particular phenotype of an individual. What is Genomics? The study of the

More information

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine

More information

What s New in Pathway Studio Web 11.1

What s New in Pathway Studio Web 11.1 1 1 What s New in Pathway Studio Web 11.1 Elseiver is pleased to announce the release of Pathway Studio Web 11.1 for all database subscriptions (Mammal, Mammal+ChemEffect+DiseaseFx, Plant). This release

More information

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo Preparing the scenario for the use of patient s genome sequences in clinic Joaquín Dopazo Computational Medicine Institute, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB),

More information

Cancer Genomics: What Does It Mean for You?

Cancer Genomics: What Does It Mean for You? Cancer Genomics: What Does It Mean for You? The Connection Between Cancer and DNA One person dies from cancer each minute in the United States. That s 1,500 deaths each day. As the population ages, this

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

Big Data for Population Health and Personalised Medicine through EMR Linkages

Big Data for Population Health and Personalised Medicine through EMR Linkages Big Data for Population Health and Personalised Medicine through EMR Linkages Zheng-Ming CHEN Professor of Epidemiology Nuffield Dept. of Population Health, University of Oxford Big Data for Health Policy

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

How Can Institutions Foster OMICS Research While Protecting Patients?

How Can Institutions Foster OMICS Research While Protecting Patients? IOM Workshop on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials How Can Institutions Foster OMICS Research While Protecting Patients? E. Albert Reece, MD, PhD, MBA Vice

More information

Predictive Analytics and the Big Data Challenge

Predictive Analytics and the Big Data Challenge Predictive Analytics and the Big Data Challenge Andrei Grigoriev, MBA, MSc Sr. Director, Custom Development EMEA SAP Nice, April 2014 What is Predictive Analytics Predictive analytics is about analyzing

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

Integration of genomic data into electronic health records

Integration of genomic data into electronic health records Integration of genomic data into electronic health records Daniel Masys, MD Affiliate Professor Biomedical & Health Informatics University of Washington, Seattle Major portion of today s lecture is based

More information

Cloud-Based Big Data Analytics in Bioinformatics

Cloud-Based Big Data Analytics in Bioinformatics Cloud-Based Big Data Analytics in Bioinformatics Presented By Cephas Mawere Harare Institute of Technology, Zimbabwe 1 Introduction 2 Big Data Analytics Big Data are a collection of data sets so large

More information

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM Aniket Bochare - aniketb1@umbc.edu CMSC 601 - Presentation Date-04/25/2011 AGENDA Introduction and Background Framework Heterogeneous

More information

Version 5.0 Release Notes

Version 5.0 Release Notes Version 5.0 Release Notes 2011 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) +1.734.769.7249 (elsewhere) +1.734.769.7074 (fax) www.genecodes.com

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology

University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology Programme Structure - the MSc outcome will require 180 credits total (full-time only) - 60

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins The Future of the Electronic Health Record Gerry Higgins, Ph.D., Johns Hopkins Topics to be covered Near Term Opportunities: Commercial, Usability, Unification of different applications. OMICS : The patient

More information

Acceleration for Personalized Medicine Big Data Applications

Acceleration for Personalized Medicine Big Data Applications Acceleration for Personalized Medicine Big Data Applications Zaid Al-Ars Computer Engineering (CE) Lab Delft Data Science Delft University of Technology 1" Introduction Definition & relevance Personalized

More information

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

An example of bioinformatics application on plant breeding projects in Rijk Zwaan An example of bioinformatics application on plant breeding projects in Rijk Zwaan Xiangyu Rao 17-08-2012 Introduction of RZ Rijk Zwaan is active worldwide as a vegetable breeding company that focuses on

More information

OpenCB a next generation big data analytics and visualisation platform for the Omics revolution

OpenCB a next generation big data analytics and visualisation platform for the Omics revolution OpenCB a next generation big data analytics and visualisation platform for the Omics revolution Development at the University of Cambridge - Closing the Omics / Moore s law gap with Dell & Intel Ignacio

More information

Integration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes

Integration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes Integration of Genetic and Familial Data into Electronic Medical Records and Healthcare Processes By Thomas Kmiecik and Dale Sanders February 2, 2009 Introduction Although our health is certainly impacted

More information

Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data

Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data Yi Wang, Gagan Agrawal, Gulcin Ozer and Kun Huang The Ohio State University HiCOMB 2014 May 19 th, Phoenix, Arizona 1 Outline

More information

A leader in the development and application of information technology to prevent and treat disease.

A leader in the development and application of information technology to prevent and treat disease. A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Simplifying Data Interpretation with Nexus Copy Number

Simplifying Data Interpretation with Nexus Copy Number Simplifying Data Interpretation with Nexus Copy Number A WHITE PAPER FROM BIODISCOVERY, INC. Rapid technological advancements, such as high-density acgh and SNP arrays as well as next-generation sequencing

More information

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples DATA Sheet Single-Cell DNA Sequencing with the C 1 Single-Cell Auto Prep System Reveal hidden populations and genetic diversity within complex samples Single-cell sensitivity Discover and detect SNPs,

More information

The Need for BIG DATA Processing

The Need for BIG DATA Processing TECHNOLOGY LIKE NEVER BEFORE EBDC Dresden 7. October 2014 The Need for BIG DATA Processing Petra Streng Solution Manager SAP SE Industry Business Unit Life Sciences Markus Tempel Global Lead Big Data Analytics

More information

Biomedical Big Data and Precision Medicine

Biomedical Big Data and Precision Medicine Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types

More information

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime Stephan Schindewolf, SAP SE, July 13, 2015 Facts per Decision Need Decision

More information

How To Change Medicine

How To Change Medicine P4 Medicine: Personalized, Predictive, Preventive, Participatory A Change of View that Changes Everything Leroy E. Hood Institute for Systems Biology David J. Galas Battelle Memorial Institute Version

More information

Human Genome Organization: An Update. Genome Organization: An Update

Human Genome Organization: An Update. Genome Organization: An Update Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion

More information

Integrating Bioinformatics, Medical Sciences and Drug Discovery

Integrating Bioinformatics, Medical Sciences and Drug Discovery Integrating Bioinformatics, Medical Sciences and Drug Discovery M. Madan Babu Centre for Biotechnology, Anna University, Chennai - 600025 phone: 44-4332179 :: email: madanm1@rediffmail.com Bioinformatics

More information

Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider

Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider Daniel Masys, M.D. Affiliate Professor Biomedical and Health Informatics University of Washington, Seattle

More information

Module 1. Sequence Formats and Retrieval. Charles Steward

Module 1. Sequence Formats and Retrieval. Charles Steward The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.

More information

Genetic diagnostics the gateway to personalized medicine

Genetic diagnostics the gateway to personalized medicine Micronova 20.11.2012 Genetic diagnostics the gateway to personalized medicine Kristiina Assoc. professor, Director of Genetic Department HUSLAB, Helsinki University Central Hospital The Human Genome Packed

More information

Attacking the Biobank Bottleneck

Attacking the Biobank Bottleneck Attacking the Biobank Bottleneck Professor Jan-Eric Litton BBMRI-ERIC BBMRI-ERIC Big Data meets research biobanking Big data is high-volume, high-velocity and highvariety information assets that demand

More information

Understanding West Nile Virus Infection

Understanding West Nile Virus Infection Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,

More information

<Insert Picture Here> The Evolution Of Clinical Data Warehousing

<Insert Picture Here> The Evolution Of Clinical Data Warehousing The Evolution Of Clinical Data Warehousing Srinivas Karri Principal Consultant Agenda Value of Clinical Data Clinical Data warehousing & The Big Data Challenge

More information

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti Data deluge (and its applications) Prologue Data is becoming cheaper and cheaper to produce and store Driving mechanism is parallelism on sensors, storage, computing Data directly produced are complex

More information

Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator

Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator Organization and analysis of NGS variations. Alireza Hadj Khodabakhshi Research Investigator Why is the NGS data processing a big challenge? Computation cannot keep up with the Biology. Source: illumina

More information

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Single nucleotide polymorphisms or SNPs (pronounced "snips") are DNA sequence variations that occur

More information

Cystic Fibrosis Webquest Sarah Follenweider, The English High School 2009 Summer Research Internship Program

Cystic Fibrosis Webquest Sarah Follenweider, The English High School 2009 Summer Research Internship Program Cystic Fibrosis Webquest Sarah Follenweider, The English High School 2009 Summer Research Internship Program Introduction: Cystic fibrosis (CF) is an inherited chronic disease that affects the lungs and

More information

Outline. Personal profile & research interests. Rheumatology research in Ireland. Current standing. Future plans

Outline. Personal profile & research interests. Rheumatology research in Ireland. Current standing. Future plans Outline Personal profile & research interests Rheumatology research in Ireland Current standing Future plans Personal profile 1983 MB Queens University 1990-3 ARUK Clinical Research Fellowship 1990-93

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE AP Biology Date SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE LEARNING OBJECTIVES Students will gain an appreciation of the physical effects of sickle cell anemia, its prevalence in the population,

More information

Milk protein genetic variation in Butana cattle

Milk protein genetic variation in Butana cattle Milk protein genetic variation in Butana cattle Ammar Said Ahmed Züchtungsbiologie und molekulare Genetik, Humboldt Universität zu Berlin, Invalidenstraβe 42, 10115 Berlin, Deutschland 1 Outline Background

More information

Vision for the Cohort and the Precision Medicine Initiative Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Precision

Vision for the Cohort and the Precision Medicine Initiative Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Precision Vision for the Cohort and the Precision Medicine Initiative Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Precision Medicine Initiative: Building a Large U.S. Research Cohort

More information

Single Nucleotide Polymorphisms (SNPs)

Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Polymorphisms (SNPs) Additional Markers 13 core STR loci Obtain further information from additional markers: Y STRs Separating male samples Mitochondrial DNA Working with extremely degraded

More information

PREDA S4-classes. Francesco Ferrari October 13, 2015

PREDA S4-classes. Francesco Ferrari October 13, 2015 PREDA S4-classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.

More information

Personalized Medicine and IT

Personalized Medicine and IT Personalized Medicine and IT Data-driven Medicine in the Age of Genomics www.intel.com/healthcare/bigdata Ketan Paranjape General Manager, Life Sciences Intel Corp. @Portlandketan 1 The Central Dogma of

More information

University Uses Business Intelligence Software to Boost Gene Research

University Uses Business Intelligence Software to Boost Gene Research Microsoft SQL Server 2008 R2 Customer Solution Case Study University Uses Business Intelligence Software to Boost Gene Research Overview Country or Region: Scotland Industry: Education Customer Profile

More information

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013 ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and

More information

Information for patients and the public and patient information about DNA / Biobanking across Europe

Information for patients and the public and patient information about DNA / Biobanking across Europe Information for patients and the public and patient information about DNA / Biobanking across Europe BIOBANKING / DNA BANKING SUMMARY: A biobank is a store of human biological material, used for the purposes

More information

HIV NOMOGRAM USING BIG DATA ANALYTICS

HIV NOMOGRAM USING BIG DATA ANALYTICS HIV NOMOGRAM USING BIG DATA ANALYTICS S.Avudaiselvi and P.Tamizhchelvi Student Of Ayya Nadar Janaki Ammal College (Sivakasi) Head Of The Department Of Computer Science, Ayya Nadar Janaki Ammal College

More information

Core Facility Genomics

Core Facility Genomics Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray

More information

Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the

Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the Chapter 5 Analysis of Prostate Cancer Association Study Data 5.1 Risk factors for Prostate Cancer Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the disease has

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS

IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS 29 OCTOBER 2015 DR. DIRK J. EVERS BACKGROUND TreatmentMAP

More information

What is Pharmacogenomics? Personalization of Medications for You! Michigan State Medical Assistants Conference May 6, 2006

Debra Duquette, MS, CGC, Genomics Coordinator, Epidemiology Services Division, Department

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Valeria Michelacci. 10th Annual Workshop of the National Reference Laboratories for E. coli in the EU, Rome, November 5th 2015. Molecular typing

LifeScope Genomic Analysis Software 2.5

USER GUIDE. Graphical User Interface. DATA ANALYSIS METHODS AND INTERPRETATION. Publication Part Number 4471877 Rev. A. Revision Date November 2011. For Research Use

Custom TaqMan Assays For New SNP Genotyping and Gene Expression Assays. Design and Ordering Guide

Design and Ordering Guide. For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use. Information in

How does genetic testing work?

What is a genetic test? A genetic test looks at to find changes (variants) that cause disease or put you at greater risk to develop disease. DNA is the code our bodies use

Lessons from the Stanford HIV Drug Resistance Database

Bob Shafer, MD, Department of Medicine and by Courtesy Pathology (Infectious Diseases), Stanford University. Outline: Goals and rationale for HIVDB

GENOMICS SERVICES

THE NEW YORK GENOME CENTER. NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. NEXT-GE

Text file One header line meta information lines One line : variant/position

Software. Calling: GATK, SAMTOOLS mpileup, Varscan, SOAP. VCF format: text file; one header line; meta-information lines; one line per variant/position. ##fileformat=VCFv4.1 ##fileDate=20090805 ##source=myImputationProgramV3.1

PHYSIOLOGY. THE STUDY OF LIFE, and how genes, cells, tissues, and organisms function.

What is PHYSIOLOGY? Physiologists teach and mentor students in both the classroom and laboratory. Physiologists apply

Personalized Medicine: Humanity's Ultimate Big Data Challenge. Rob Fassett, MD, Chief Medical Informatics Officer, Oracle Health Sciences

2012 Oracle Corporation. Proprietary and Confidential. Humanity

Major US Genomic Medicine Programs: NHGRI's Electronic Medical Records and Genomics (eMERGE) Network

Dan Roden, Member, National Advisory Council For Human Genome Research, Genomic Medicine Working Group

Genomes and SNPs in Malaria and Sickle Cell Anemia

Introduction to Genome Browsing with Ensembl. The vast amount of information in biological databases today demands a way of organising and accessing

Genetics 1. Defective enzyme that does not make melanin. Very pale skin and hair color (albino)

We all know that children tend to resemble their parents. Parents and their children tend to have similar appearance because children inherit genes from their parents and these genes influence

Genetic Testing in Research & Healthcare

We Innovate Healthcare. Human genetic testing is a growing science. It is used to study genes

Investigating the genetic basis for intelligence

Steve Hsu, University of Oregon and BGI, www.cog-genomics.org. Outline: a multidisciplinary subject. 1. What is intelligence? Psychometrics 2. g and GWAS: a

SNPbrowser Software v3.5

Product Bulletin: SNP Genotyping. A Free Software Tool for the Knowledge-Driven Selection of SNP Genotyping Assays. Easily visualize SNPs integrated with a physical map, linkage disequilibrium

SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequence Variants

Dr. Xiuwen Zheng, Department of Biostatistics, University of Washington, Seattle. Introduction: Thousands of gigabyte

Outcome Data, Links to Electronic Medical Records. Dan Roden Vanderbilt University

Coordinating Center. Type II Diabetes Case Algorithm. * Abnormal lab = Random glucose > 200 mg/dl, Fasting glucose > 125 mg/dl,

GOBII. Genomic & Open-source Breeding Informatics Initiative

My Background: BS Animal Science, University of Tennessee; MS Animal Breeding, University of Georgia. Random regression models for longitudinal

The M.U.R.D.O.C.K. Study

Measurement to Understand Reclassification of Disease Of Cabarrus/Kannapolis. Jessica Tenenbaum, PhD and many, many others. MURDOCK Study: Measurement to Understand the Reclassification

SNP Essentials The same SNP story

HOW SNPS HELP RESEARCHERS FIND THE GENETIC CAUSES OF DISEASE. One of the findings of the Human Genome Project is that the DNA of any two people, all 3.1 billion molecules of it, is more than

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey

Outline: 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions. 1. What is a microarray? A microarray

Genomic Selection in Dairy Cattle. AQUAGENOME Applied Training Workshop, Sterling. Hans Daetwyler, The Roslin Institute and R(D)SVS

Dairy introduction. Overview: Traditional breeding. Genomic selection. Advantages

This fact sheet describes how genes affect our health when they follow a well understood pattern of genetic inheritance known as autosomal recessive.

In summary: Genes contain the instructions for

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, 9 October 2015

GenBank: A Database of Genetic Sequence Data

Computer Science 105, Boston University. David G. Sullivan, Ph.D. An Explosion of Scientific Data: Scientists are generating ever increasing amounts of data. Relevant

Processing Genome Data using Scalable Database Technology. My Background

Johann Christoph Freytag, Ph.D., freytag@dbis.informatik.hu-berlin.de, http://www.dbis.informatik.hu-berlin.de. Stanford University, February 2004. PhD @ Harvard Univ. Visiting Scientist, Microsoft Res. (2002)

Implementation of Pharmacogenomics in Clinical Practice: Barriers and Potential Solutions

Molecular Pathology: Principles in Clinical Practice. KT Jerry Yeo, Ph.D., University of Chicago. Email: jyeo@bsd.uchicago.edu

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Chapter 1 - The Molecular Basis of Cancer. 1. Inside Cancer, http://www.insidecancer.org/ From the Dolan DNA Learning Center Cold

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Genomic, Proteomic and Transcriptomic Lab, High Performance Computing and Networking Institute, National Research Council, Italy

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Goal: This tutorial introduces several websites and tools useful for determining linkage disequilibrium

14.3 Studying the Human Genome

Lesson Objectives: Summarize the methods of DNA analysis. State the goals of the Human Genome Project and explain what we have learned so far. Lesson Summary: Manipulating

Hacking Brain Disease for a Cure

Magali Haas, CEO & Founder. #P4C2014 Innovator Presentation. Brain Disease is Personal. The Reasons We Fail in CNS: Major challenges hindering CNS drug development include: