Data Management Tools: practical approaches and lessons learned when scaling up a computing and data environment to keep up with the pace of data
- Rosemary Green
- 8 years ago
Transcription
1 Data Management Tools: practical approaches and lessons learned when scaling up a computing and data environment to keep up with the pace of data intensive research
2 Declaration of Potential Conflicts-of-Interest, Consulting and Corporate Collaborators NHCA Group
3 Scientific research is now data driven: analysis costs now exceed data generation costs, most data goes unanalyzed, and the most competitive institutes will be those that embrace informatics as a discipline.
4 Computations in the life and medical sciences are unique I: Emphasis on symbolic/integer (non-floating point) intense computations (yet with floating point capabilities). Diverse types of computations that are continuously evolving and demand different hardware, software and compute environment configurations. Emphasis on a mix of computing technologies (microprocessor, GPGPU, FPGA) with an objective to build the capability to optimize different codes (or code parts) on the different platforms for maximum performance in a pipeline. Emphasis on scalable, sustainable hardware/software/environmental architectures and installations (including support staff). Emphasis on data intense computations, so significant storage and bandwidth to/from the HPC installation and to/from storage to processors is essential.
5 Computations in the life and medical sciences are unique II: Emphasis on providing answers through very simple web interfaces (especially for biomedical researchers, health care providers, or applications that appeal to lay people) by creating or porting applications that demand real-time HPC intense resources. Life and medical scientists want to solve cancer, not become programmers. The installation would have readily available en masse data from major public and local databases and a semantic web approach to gathering and accessing the larger data and knowledge bases that are available and essential to extract new knowledge. Typical datasets, such as NextGen human genome sequences and medical images, are TB in size, and some projects reach PB in size.
6 Informatics involves hardware, software, and expertise, because scientists want answers. These should be thought of as an integrated mix that is continuously evolving and complementary, not competitive. Computing hardware (local, centralized/supercomputer centers, cloud). Computing software (many, varied, continuously evolving, few standards, best software becomes proprietary, comparisons of different implementations are biased, and most important, there is little funding for sustaining software). Data analysts (many flavors: programmers, informaticians, bioinformaticians, statisticians, clinical informaticians, anthropologists).
7 Local computing is primarily done on a machine we developed: SHADOWFAX. A heterogeneous computing environment for data intensive computations: ~2,524 CPUs, >12 TB RAM (Dell/Intel); ~27,000 GPUs (NVIDIA); 8 FPGA hybrid core systems (Convey); ~0.8 PB disk arrays (DDN); 100 PB Sun/Oracle tape storage system.
8 Local computing is primarily done on a machine we developed: SHADOWFAX. With local synchronized copies of major databases: Medline, arXiv, PubMed Central, GenBank, SwissProt, 1000 Genomes Project, The Cancer Genome Atlas, Wikipedia. Designed to meet the needs of applications that demand HPC: deep sequencing assembly and analysis, molecular modeling, simulations, proteomics analysis, text mining, Health IT. Deals with vendors greatly controlled cost.
9 Data Analysis Core (DAC) provides turnkey study design, monitoring and analysis. Projects are diverse: ~80 projects completed in 2 years. Genome assembly focuses on nonhuman and especially challenging genomes: turkey, bacteria, insects (butterfly), fish. Genome variation discovery and annotation projects. RNAseq: multiple projects ranging from binary comparisons to multifactor time studies. miRNA expression and discovery. SNP population studies. Metagenomic studies. Co-authored papers where contributions are made to the science: 9 published, 4 submitted, ~10 currently in draft. Core personnel directly participate in grant applications: USDA grant submitted (PI), 7 grants submitted as co-PI.
10 A data intense example NextGen sequence analysis and exploitation
11 NextGen DNA sequence analysis is now the rate limiting step. The cost of sequencing has dropped from $3B/genome to ~$1K/genome. New genomes are sequenced daily. It is estimated that there are 30,000 complete human genomes, with 15,000 of these in the public domain. Analysis has focused on Single Nucleotide Polymorphisms (SNPs), which are single letter changes in the DNA code. For complex diseases like cancer, heart disease and mental disorders, extensive work has still explained only 10-20% of the known genetic component. Recent research indicates that, due to experimental measurement noise, perhaps most of the measured variations are false positives. Data analysis pipelines are built from a number of standard tools. There are many public and proprietary analysis pipelines, and their performance accuracy is highly contested. Truth Data is just beginning to be assembled. Different types of DNA sequencing do not cross-validate.
12 Microsatellites, or repetitive DNA sequences, are particularly challenging. Microsatellites, also called Simple Sequence Repeats or Short Tandem Repeats, are an understudied portion of the genome because they are considered part of our Junk DNA or, more recently, Dark Matter DNA; research focus has been on Single Nucleotide Polymorphisms (SNPs). Microsatellites have known value: long used for paternity and forensic testing and linked to neurological diseases (e.g. Huntington's and Fragile-X). None of the major genomic research projects have focused on Microsatellites: not the Human Genome Project, 1000 Genomes Project, The Cancer Genome Atlas, ENCODE or the iCOGS study.
13 Microsatellite myths dispelled, enabling new discoveries. Myth 1: Accurate and efficient analysis of the ~1 million Microsatellites is not possible. Microsatellite genotypes in the 1000 Genomes Project and The Cancer Genome Atlas were demonstrated to be only 20% accurate [1]; a new proprietary algorithm is 96% accurate. Myth 2: Microsatellites are hyper-variable, and will therefore not be useable in genotype-phenotype association studies. Analysis of 1,200 healthy genomes demonstrated that 98% of the ~150,000 microsatellites in genes are highly invariant. Myth 3: Heritable and spontaneous components of disease will be explained by SNPs. The recent iCOGS study involving over 200,000 subjects demonstrated that known and new SNPs explain less than 50% of heritability in breast, ovarian and prostate cancer. [1] McIver, 2010
14 Research Pipeline: Download and rebuild thousands of healthy and affected genomes. Create genotype distributions for healthy and affected populations. Compute Fisher's Exact Test p-value for each of ~1 million loci and rank results. Identify Patterns of Informative Microsatellites (PIM) from loci that pass Bonferroni and Benjamini-Hochberg False Discovery Rate tests. Manually review, do QC, compute sensitivity and specificity. Annotate with ontologies, literature, input from experts. Validate PIM with sequencing of well-characterized samples. Business analysis; product definition; IP. Publish; translate, regulatory approval, reimbursement; team with established clinical services co.
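The statistical core of the pipeline above (a per-locus Fisher's exact test followed by Bonferroni and Benjamini-Hochberg filtering) can be sketched in plain Python. This is only an illustration under stated assumptions, not the presenters' implementation: each locus is reduced to a hypothetical 2x2 table (for example, variant versus reference genotype counts in affected versus healthy genomes), and the function names are invented for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(pvalues, alpha=0.05):
    """Indices of tests that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank k whose p-value is at or below (k / m) * alpha.
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])
```

With roughly one million loci, the Bonferroni threshold at alpha = 0.05 is about 5e-8 per locus, which is why the much less conservative Benjamini-Hochberg procedure is commonly applied alongside it.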
15 Genomeon has created a unique library of over 8000 genomes from 1000 Genomes Project and The Cancer Genome Atlas with corrected microsatellites: Healthy Population representing many ethnicities. Ovarian cancer. Breast cancer. Brain cancer: Glioma; Glioblastoma; Medulloblastoma. Lung cancer. Prostate cancer. Melanoma. Autism.
16 Comparative analysis has yielded new actionable clinical diagnostics and drug targets for cancer, for example Breast Cancer
17 Pattern of 55 informative microsatellites differentiates Breast Cancer germlines from healthy germlines. Sensitivity = 88%, Specificity = 77%. BRCA1/2 positive samples.
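For readers less familiar with the two metrics quoted on this slide, a minimal sketch of how sensitivity and specificity are computed from classification counts follows. The counts in the usage lines are hypothetical, chosen only so that the arithmetic reproduces 88% and 77%; the slide does not give the underlying sample sizes.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of affected samples that the pattern correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of healthy samples that the pattern correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT from the slides): 100 affected and 100 healthy germlines.
print(f"sensitivity = {sensitivity(88, 12):.0%}")
print(f"specificity = {specificity(77, 23):.0%}")
```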
18 Genes proximate to 55 BC Informative Loci: 52 loci are in genes, 3 loci are intergenic. Of the 52 loci, 1 is in an exon and 4 are in untranslated regions, while the rest are intronic, located very close to the intron-exon boundary. Many of the genes are known to be alternatively spliced and are differentially expressed, both of which imply mechanism. Ontologies: notch signaling, genome stability, alternative splicing, programmed cell death, cell cycle and apoptosis. 32 of the 52 genes were previously associated with cancer, 18 with breast cancer. Several genes are known and highly pursued drug targets; new targets include several kinase and membrane bound proteins. 11 of the 52 genes are targets of or affected by pharmaceuticals, including 5 that are prescribed or in clinical trials for BC.
19 Applications of these microsatellite loci variations: Cancer Risk Diagnostics - Microsatellite profiling for increased risk of cancer, and the tissues at highest risk. Companion/Treatment Diagnostics - Many informative microsatellites are functional elements implicated in therapeutic response. Clinical Trial Support - Use of microsatellite profile to differentiate subpopulations in clinical trials. Drug Targets - Identification of large number of genes previously unassociated with cancer, many with functions associated with cancer processes. Toxicology - Quantification of stress induced exposures/stressors via microsatellite mutation screen. Prognosis - Comparison of microsatellite variations between germlines and tumors. Non-cancer Diseases - PTSD, Autism, MS, cardiac diseases, aging.
20 Another data intense example: Text analytics to quantify publication ethics violations and fraud... but let's talk about that and the fallout later.
21 Lessons Learned: Informatics has become a critical bottleneck, is evolving quickly, is expensive and requires continuous investment ($, people, recognition...), but it is here to stay and is required to be competitive.
22 A few things to keep in mind. Grow and evolve: Systems (hardware/cloud, software and people) should be obtained in smaller, diverse chunks (including for jobs with high memory requirements, fast database access, intense parallel message passing) and grow as demand grows, to take advantage of Moore's Law, changing requirements, and vendor competition. Systems should and can be operational on day one. Provide for public-facing real time web services. Verification and retention: Data AND analysis history must be verified and retained; this will be required and will make one more competitive. Restrictions: Access to public databases is variable; for example, TCGA cannot be downloaded/analyzed in the cloud, and there are minimum systems/personnel required for access.
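One common way to make "data AND analysis history" verifiable is to store a checksum of every input and output next to a record of the step that produced it. The sketch below assumes SHA-256 digests and an invented record layout (`make_record`, `verify_record` and the field names are illustrative, not from the talk); a real system would also capture software versions and store the records durably.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def make_record(step, params, input_bytes, output_bytes):
    """Describe one analysis step together with digests of its input and output."""
    return {
        "step": step,
        "params": params,
        "input_sha256": sha256_hex(input_bytes),
        "output_sha256": sha256_hex(output_bytes),
    }


def verify_record(record, input_bytes, output_bytes):
    """True only if both retained files still match the digests in the record."""
    return (
        record["input_sha256"] == sha256_hex(input_bytes)
        and record["output_sha256"] == sha256_hex(output_bytes)
    )


# Hypothetical step name, parameters and data, for illustration only.
raw = b"ACGTACGTACGT"
result = b"locus\tgenotype\n1\t12/12\n"
record = make_record("call_microsatellites", {"min_reads": 10}, raw, result)
print(json.dumps(record, indent=2))
print(verify_record(record, raw, result))
```

Because the record is plain JSON, it can be archived with the data and re-checked at any later time, for example before a result is delivered to a funder or journal.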
23 A few things to keep in mind. Security: Server security via multiple layers, limited access, invisibility. Collaboratoriums are hard to secure; most times simple solutions are best (Google Drive). Varying and changing requirements of institutes, governments, projects. Liability: Material Transfer Agreements now involve data and are getting more complex. Release of data, software, etc. Uncertainty: Changing demand, fluctuating funding, and impact of breakthroughs. HIPAA: Clinical data access may not be possible even behind their firewalls, driven by fear of loss of control, discovering an adverse event or comparisons across practices. Access to data. Commercialization/translation: Patentability, proprietary/trade secret. The world's best bioinformatics company is worth... The world's worst pharma company is worth...
24 Cloud computing is not yet the answer to computing in the life and medical sciences. Locality/Dependencies: Where is the data, and what about data that must be merged from many sources? Compute match: Some jobs require non-standard hardware configurations for performance; some genomic assemblies require 2+ TB of memory, and some simulations require extremely high data exchange/update rates. Bandwidth: Getting the data to the cloud from local sources can be limiting, as will be cases where data is moved from cloud to cloud. Cost: The initial cost is low, but the sustained cost can be high, and in academic settings, funding to support work beyond 3 years is very difficult. Security: There are HIPAA compliant clouds, but there are issues with acceptance. Storage: Costs are still high for sustained storage. Known amounts of local storage drive scientists to be economical in experimental design. Unknowns: What happens when the cloud goes down? What happens if a supplier goes out of business? What if
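The bandwidth point can be made concrete with back-of-the-envelope arithmetic: transfer time is data size in bits divided by usable link speed. The link speeds and the `efficiency` factor below are assumptions chosen for illustration; the slides do not state the network capacity available.

```python
TB = 10**12   # bytes in a terabyte (decimal)
PB = 10**15   # bytes in a petabyte (decimal)
GBIT = 10**9  # bits per second in 1 Gbit/s


def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Ideal time to move size_bytes over a link, scaled by a usable fraction."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


# Hypothetical scenarios: a TB-scale dataset and a PB-scale project.
for label, size, speed in [("1 TB over 1 Gbit/s", TB, GBIT),
                           ("1 PB over 10 Gbit/s", PB, 10 * GBIT)]:
    hours = transfer_seconds(size, speed) / 3600
    print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Even at full line rate, a petabyte-scale project takes more than a week to move over a 10 Gbit/s link, and real-world efficiency is usually well below 1.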
25 One possible solution: Create and support critical-mass-sized entities that span AIRI members, so that members together take advantage of scale.
26 Discipline-specific informatics entities: condo computing organization where members buy in and excess capacity is available to new/unfunded researchers. Mix of compute technologies and bulk purchasing. Best of class software, algorithms and data warehouses. Automated pipelines. Data analysts as independent researchers and as collaborators. Complete data analysis solutions: computing, statistics, experimental design, data monitoring/archiving, data and analysis reproducibility validation/checking. Data and analysis delivery portals (required by funders and journals). Critical mass so all needed expertise and infrastructure is available and continuously upgraded to meet changing needs.
27 Thank you. Any Questions?
More informationDigital Catapult. The impact of Big Data in a Connected Digital Economy Future of Healthcare. Mark Wall Big Data & Analytics Leader.
1 Digital Catapult The impact of Big Data in a Connected Digital Economy Future of Healthcare Mark Wall Big Data & Analytics Leader March 12 2014 Catapult is a Technology Strategy Board programme Agenda
More informationCompliance and the Cloud. Guiding principles and architecture for addressing Life Science compliance in the cloud
Compliance and the Cloud Guiding principles and architecture for addressing Life Science compliance in the cloud Life Sciences Industry Unit Microsoft Corporation June 2012 ii Legal Disclaimers The information
More informationBBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
More informationTRANSLATIONAL BIOINFORMATICS 101
TRANSLATIONAL BIOINFORMATICS 101 JESSICA D. TENENBAUM Department of Bioinformatics and Biostatistics, Duke University Durham, NC 27715 USA Jessie.Tenenbaum@duke.edu SUBHA MADHAVAN Innovation Center for
More informationHow To Make Cancer A Clinical Sequencing
10 this time, it s Personal In what is an exciting era in the evolution of oncology treatment, this special feature by Deborah J. Ausman explores how Next-Generation Sequencing and Convergent Informatics
More informationMUTATION, DNA REPAIR AND CANCER
MUTATION, DNA REPAIR AND CANCER 1 Mutation A heritable change in the genetic material Essential to the continuity of life Source of variation for natural selection New mutations are more likely to be harmful
More informationIntegrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps
White Paper Healthcare Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps Executive Summary The Transformation Lab at Intermountain Healthcare in Salt Lake City, Utah,
More informationBig data in cancer research : DNA sequencing and personalised medicine
Big in cancer research : DNA sequencing and personalised medicine Philippe Hupé Conférence BIGDATA 04/04/2013 1 - Titre de la présentation - nom du département émetteur et/ ou rédacteur - 00/00/2005 Deciphering
More informationFour Things You Must Do Before Migrating Archive Data to the Cloud
Four Things You Must Do Before Migrating Archive Data to the Cloud The amount of archive data that organizations are retaining has expanded rapidly in the last ten years. Since the 2006 amended Federal
More informationINTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE E15
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE DEFINITIONS FOR GENOMIC BIOMARKERS, PHARMACOGENOMICS,
More informationNew solutions for Big Data Analysis and Visualization
New solutions for Big Data Analysis and Visualization From HPC to cloud-based solutions Barcelona, February 2013 Nacho Medina imedina@cipf.es http://bioinfo.cipf.es/imedina Head of the Computational Biology
More informationInternational Stem Cell Registry
International Stem Cell Registry Importance of Stem Cells Stem cells are model systems for the study of development and disease. Pluripotent stem cells offer new tools for drug design and discovery. Pluripotent
More informationImprove Cooperation in R&D. Catalyze Drug Repositioning. Optimize Clinical Trials. Respect Information Governance and Security
SINEQUA FOR LIFE SCIENCES DRIVE INNOVATION. ACCELERATE RESEARCH. SHORTEN TIME-TO-MARKET. 6 Ways to Leverage Big Data Search & Content Analytics for a Pharmaceutical Company Improve Cooperation in R&D Catalyze
More informationMatteo di Tommaso FDA-PhUSE March 2013 Vice President, Research Business Technology Chair, PRISME Forum
Pharma R&D IT & The Cloud Matteo di Tommaso FDA-PhUSE March 2013 Vice President, Research Business Technology Chair, PRISME Forum This presentation outlines a general technology direction. Pfizer Inc has
More informationHow To Change Medicine
P4 Medicine: Personalized, Predictive, Preventive, Participatory A Change of View that Changes Everything Leroy E. Hood Institute for Systems Biology David J. Galas Battelle Memorial Institute Version
More informationThe Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO
The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO The Supercomputing Company Supercomputing Big Data Because some great things never change One other thing that hasn t changed. Cray
More informationWeb-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic
More informationIntegration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes
Integration of Genetic and Familial Data into Electronic Medical Records and Healthcare Processes By Thomas Kmiecik and Dale Sanders February 2, 2009 Introduction Although our health is certainly impacted
More informationTestimony of. Paul Misener Vice President for Global Public Policy, Amazon.com. Before the
Testimony of Paul Misener Vice President for Global Public Policy, Before the United States House of Representatives Committee on Energy and Commerce Subcommittee on Communications and Technology Subcommittee
More informationThe National Institute of Genomic Medicine (INMEGEN) was
Genome is...... the complete set of genetic information contained within all of the chromosomes of an organism. It defines the particular phenotype of an individual. What is Genomics? The study of the
More informationNIH s Genomic Data Sharing Policy
NIH s Genomic Data Sharing Policy 2 Benefits of Data Sharing Enables data generated from one study to be used to explore a wide range of additional research questions Increases statistical power and scientific
More informationTop Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation
Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely
More informationHigh Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University
High Performance Spatial Queries and Analytics for Spatial Big Data Fusheng Wang Department of Biomedical Informatics Emory University Introduction Spatial Big Data Geo-crowdsourcing:OpenStreetMap Remote
More informationParadigm Changes Affecting the Practice of Scientific Communication in the Life Sciences
Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences Prof. Dr. Martin Hofmann-Apitius Head of the Department of Bioinformatics Fraunhofer Institute for Algorithms and
More informationVad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives
Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives Dirk.Repsilber@oru.se 2015-05-21 Functional Bioinformatics, Örebro University Vad är bioinformatik och varför
More informationBig Data Visualization for Genomics. Luca Vezzadini Kairos3D
Big Data Visualization for Genomics Luca Vezzadini Kairos3D Why GenomeCruzer? The amount of data for DNA sequencing is growing Modern hardware produces billions of values per sample Scientists need to
More informationGenetic diagnostics the gateway to personalized medicine
Micronova 20.11.2012 Genetic diagnostics the gateway to personalized medicine Kristiina Assoc. professor, Director of Genetic Department HUSLAB, Helsinki University Central Hospital The Human Genome Packed
More informationHETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation
HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM Aniket Bochare - aniketb1@umbc.edu CMSC 601 - Presentation Date-04/25/2011 AGENDA Introduction and Background Framework Heterogeneous
More information