Towards Integrating the Detection of Genetic Variants into an In-Memory Database
Transcription
1 Towards Integrating the Detection of Genetic Variants into an In-Memory Database
2nd International Workshop on Big Data in Bioinformatics and Healthcare, Oct 27, 2014
2 Motivation: Genome Data Analysis Process
DNA Sample → Base Sequencing → Read Alignment → Variant Calling → Data Annotation → Analysis Results
- Next-generation sequencing (NGS) requires an adapted analysis workflow: higher error rates, shorter reads
- The base sequencing step produces output within a few hours
- Subsequent processing steps take days, up to several weeks
3 Motivation: The Next-Generation Sequencing Data Deluge
- The NGS growth pattern is more remarkable than Moore's law → addressing the data deluge with more computing power is no option
- For variant calling, options to improve data processing remain: single-threaded processing, data stored in files on disk
[Figure: main memory cost per megabyte and sequencing cost per megabase, cost in USD over date, 01/12/01 to 01/12/13]
4 IMDB Building Blocks
- Combined column and row store
- Map/Reduce
- Single and multi-tenancy
- Insert only for time travel
- Real-time replication
- Working on integers
- Active/passive data store
- Minimal projections
- Group key
- Dynamic multithreading
- Bulk load of data
- Object-relational mapping
- No aggregate tables
- Data partitioning
- Any attribute as index
- On-the-fly extensibility
- Analytics on historical data
- Multi-core/parallelization
- Lightweight compression
- SQL interface on columns and rows
- Reduction of software layers
- Text retrieval and extraction engine
- No disk
6 Different Types of Genetic Variants
- AACTG vs. ATCTG: Single Nucleotide Polymorphism (SNP)
- AACTG vs. AA_TG: Insertion or Deletion (InDel)
- AACTG vs. GTCAA: Structural Variation (SV)
Different calling strategies apply to variant types of increasing complexity: SNP calling (single-/multi-sample), InDel calling
→ Focus here on single-sample SNP calling
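The three example pairs on this slide can be told apart with a simple string comparison. The following is a minimal sketch, not the method used in the talk: the function name `classify_variant` and the "possible SV" label are illustrative assumptions, and real structural-variant detection needs far more than comparing two short aligned strings.

```python
def classify_variant(ref: str, sample: str) -> str:
    """Classify the difference between two aligned, equal-length segments.

    '_' marks a gap, as in the slide's AA_TG example.
    """
    if len(ref) != len(sample):
        raise ValueError("segments must be aligned to equal length")
    # Indices where the aligned segments disagree.
    mismatches = [i for i, (r, s) in enumerate(zip(ref, sample)) if r != s]
    if not mismatches:
        return "none"
    if any(ref[i] == "_" or sample[i] == "_" for i in mismatches):
        return "InDel"
    if len(mismatches) == 1:
        return "SNP"
    # Several substituted bases: only a hint, not a real SV call.
    return "possible SV"


for sample in ("ATCTG", "AA_TG", "GTCAA"):
    print("AACTG vs.", sample, "->", classify_variant("AACTG", sample))
```

With the slide's inputs, the first pair differs in exactly one base, the second contains a gap, and the third differs in four of five bases.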
7 Our Contribution: Integrating SNP Calling into an In-Memory Database
- SNP calling implemented as a core component of the database
- Invocation of SNP calling via stored procedure call:
  CALL "_SYS_AFL"."CALL_SNPS"(SAMIMPORT.NA19240, REFERENCE.HG19CHR1, 'chr1', 20, 20, 30, 40, VARIANTS.OUTPUT);
- Built-in parallel scheduling and resource management of distinct SNP calling steps
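The slide does not say how the database schedules the SNP calling steps in parallel. One plausible reading, shown here purely as an assumption, is that the chromosome is split into position ranges that are processed independently and merged afterwards. All names (`partition`, `call_range`, `call_parallel`) are hypothetical, and the per-range "caller" is a trivial stand-in that only compares one observed base per position with the reference.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(start, end, chunk):
    """Split [start, end] into inclusive ranges of at most `chunk` positions."""
    return [(lo, min(lo + chunk - 1, end)) for lo in range(start, end + 1, chunk)]


def call_range(reference, observed, lo, hi):
    """Stand-in caller: report positions whose observed base differs from the reference."""
    return [
        (pos, reference[pos], observed[pos])
        for pos in range(lo, hi + 1)
        if pos in observed and observed[pos] != reference[pos]
    ]


def call_parallel(reference, observed, start, end, chunk=2, workers=4):
    """Run the stand-in caller on each range concurrently and merge in position order."""
    ranges = partition(start, end, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `ranges`, so the merged result stays sorted.
        parts = list(pool.map(lambda r: call_range(reference, observed, *r), ranges))
    return [call for part in parts for call in part]


reference = {1: "A", 2: "A", 3: "C", 4: "T", 5: "G"}
observed = {1: "A", 2: "T", 3: "C", 4: "T", 5: "G"}
print(call_parallel(reference, observed, 1, 5))
```

Python threads are used only to show the structure; they do not provide the CPU parallelism that a database engine's own scheduler would.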
8 Our Contribution: SNP Calling Data Artifacts
- Reference Genome: base sequence for comparison, stored position-wise
- Read Alignments: reads mapped to the reference genome, stored in a table conforming to the SAM format
- Variant/SNP Calls: detected SNPs, stored in a table conforming to the VCF format
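To make the three artifacts concrete, the sketch below models a read alignment and a variant call with a small subset of the SAM and VCF columns and groups the read bases by reference position. The class and function names are illustrative assumptions, not the talk's table schema, and the sketch assumes every read aligns without gaps or clipping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    """Subset of SAM columns."""
    qname: str
    rname: str
    pos: int   # 1-based leftmost mapping position, as in SAM
    seq: str
    qual: str  # base qualities, Phred+33 encoded


@dataclass
class VariantCall:
    """Subset of VCF columns."""
    chrom: str
    pos: int
    ref: str
    alt: str


def pileup(reads):
    """Group (base, Phred quality) pairs by (chromosome, position).

    Assumption: each read matches the reference gap-free over its full length.
    """
    columns = defaultdict(list)
    for read in reads:
        for offset, (base, q) in enumerate(zip(read.seq, read.qual)):
            columns[(read.rname, read.pos + offset)].append((base, ord(q) - 33))
    return columns


reads = [
    ReadAlignment("r1", "chr1", 1, "AAC", "III"),
    ReadAlignment("r2", "chr1", 2, "TC", "I5"),
]
for key, column in sorted(pileup(reads).items()):
    print(key, column)
```

Each resulting column holds all base occurrences at one position, which is the input a position-wise genotype calculation works on.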
9 Our Contribution: Genotype Calling Formula
- Genotype calling = deriving the actual genotype at a particular position
- Assign a probability to all possible genotypes depending on the given data
- P(G_i) = uniform for all genotypes G_i, i.e. 1 [remainder lost in transcription]
- D_j = all base occurrences at a particular position j
- G_i = genotype for which to calculate the probability
- H_l = haploid part of genotype G_i
- b_{j,k} = base quality score of the particular base d_{j,k}
→ Formula applied by GATK's UnifiedGenotyper
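The formula itself is not part of the transcript. The sketch below shows a commonly described diploid Bayesian model that fits the symbols listed on the slide: a uniform prior over genotypes, and for each observed base an equal mix of the likelihoods under the two haploid parts. It is an illustration under stated assumptions, not necessarily the exact formula from the talk: the ten unordered genotypes over A, C, G, T, the error split of eps/3 per wrong base, and all function names are assumptions.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# Assumption: genotypes are unordered pairs of bases, which gives 10 of them.
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def base_likelihood(observed, allele, phred):
    """P(observed base | haploid allele), with error rate derived from the Phred score."""
    # Cap eps at 0.75 so a quality of 0 is uninformative instead of yielding log(0).
    eps = min(10 ** (-phred / 10), 0.75)
    return 1 - eps if observed == allele else eps / 3


def genotype_posteriors(column):
    """column: list of (base, phred) observations at one position."""
    prior = 1 / len(GENOTYPES)  # uniform prior over all genotypes
    log_scores = {}
    for h1, h2 in GENOTYPES:
        score = math.log(prior)
        for base, phred in column:
            # Each haploid part explains the observation with probability 1/2.
            score += math.log(
                0.5 * base_likelihood(base, h1, phred)
                + 0.5 * base_likelihood(base, h2, phred)
            )
        log_scores[h1 + h2] = score
    # Normalize in log space to avoid underflow at high coverage.
    peak = max(log_scores.values())
    weights = {g: math.exp(s - peak) for g, s in log_scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


column = [("A", 30)] * 5 + [("G", 30)] * 5
posteriors = genotype_posteriors(column)
print(max(posteriors, key=posteriors.get))
```

For a column with five high-quality A and five high-quality G observations, the heterozygous genotype AG receives nearly all of the posterior mass.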
10 Our Contribution: Experiment Results
- Data: 68.8M chr1 read alignments from the 1000 Genomes Project
- Performance speedup of up to 22x for IMDB-based SNP calling
- GATK's runtime depends on the system's I/O capabilities
- Lower boundary for our approach around 369 s
[Figure: duration (seconds) over covered positions on chromosome 1 (millions), GATK vs. IMDB]
11 Conclusion
- Running SNP calling within an in-memory database satisfies expectations: main memory availability, built-in parallelization strategies
→ Memory access is the new bottleneck
- SNP calling runtime improves by up to a factor of 22 compared to GATK
- Further evaluations of runtime performance and result set quality
- Extension of the statistical formula to incorporate other aspects
12 Keep in contact with us.
Cindy Fähnrich, M. Sc.
Dr.
Hasso Plattner Institute, Enterprise Platform & Integration Concepts
August-Bebel-Str, Potsdam, Germany