Is Grid or Cloud Computing Suitable for Normalization of Microarray Data and Resampling Approaches for SNP Data?
- Ross Beasley
- 8 years ago
Slide 1: Is Grid or Cloud Computing Suitable for Normalization of Microarray Data and Resampling Approaches for SNP Data?
for Computational (Bio-)Statistics, LMU Munich, Institute for Medical Information Sciences, Biometry, and Epidemiology
Slide 2: Outline
1. Project 1: Normalization of Microarray Data
2. Project 2: Resampling Approaches on SNP Data
3. Outlook & Questions
Slide 3: Project 1: Normalization of Microarray Data
Slide 4: Benefit of Add-on Normalization and Correct Normalization in Resampling Setups
- Evaluation of prediction models and the effect of normalization on prediction quality:
  - strict separation of training and test set in each resampling iteration
  - the test set has to be preprocessed separately
- Possible approaches:
  - each node performs one resampling step: each node needs to be sent the whole AffyBatch object or needs access to all .cel files
  - each node operates on a small fraction of the microarrays (as in affypara): more communication between nodes during the actual computation (works for the normalization step only)
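Add-on normalization can be illustrated with quantile normalization, a common choice for Affymetrix data (an assumption here; the slides do not name a specific method). The reference distribution is estimated on the training arrays only, and each test array is then mapped onto that frozen reference, so the test arrays never influence how the training arrays are normalized. A minimal pure-Python sketch; the function names are hypothetical and ties are broken by position rather than averaged:

```python
def quantile_reference(train_arrays):
    """Reference distribution: rank-wise mean of the sorted training arrays.

    Each array is a list of probe intensities; all arrays have the same length.
    """
    sorted_arrays = [sorted(a) for a in train_arrays]
    n_arrays = len(sorted_arrays)
    return [sum(vals) / n_arrays for vals in zip(*sorted_arrays)]


def map_to_reference(array, reference):
    """Replace each value by the reference value of the same rank (ties broken by position)."""
    order = sorted(range(len(array)), key=lambda i: array[i])
    normalized = [0.0] * len(array)
    for rank, i in enumerate(order):
        normalized[i] = reference[rank]
    return normalized


def addon_quantile_normalize(train_arrays, test_arrays):
    """Normalize the training arrays, then map test arrays onto the frozen training reference."""
    reference = quantile_reference(train_arrays)  # estimated from training data only
    train_norm = [map_to_reference(a, reference) for a in train_arrays]
    test_norm = [map_to_reference(a, reference) for a in test_arrays]
    return train_norm, test_norm
```

Because the reference depends only on the training arrays, each resampling iteration can recompute it from its own training indices, which keeps training and test set strictly separated.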
Slide 5: Illustration - Effect of Normalization [figure]
Slide 6: Solution at the LRZ Cluster
Each node performs the normalization step and the subsequent classification step separately:
- the host reads in the .cel files and broadcasts them to all nodes as an AffyBatch (faster than reading from hard disk)
- the host sends the training indices to each node
- different normalization and classification methods are performed separately on each node
- the host collects the classification performances
Embarrassingly parallel; scaled quite well.
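The host/worker scheme above can be sketched as follows. The talk used R on the LRZ cluster; this Python sketch only mirrors the structure, with a nearest-centroid classifier and a `ThreadPoolExecutor` standing in (both assumptions) for the actual normalization/classification methods and the cluster's worker processes. The full data set is bound to the task once (the analogue of the broadcast), each task receives only a list of training indices, and the host collects one error rate per resampling iteration:

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def resampling_iteration(X, y, train_idx):
    """One resampling step on a worker: fit on training samples only, return the test error rate."""
    train_set = set(train_idx)
    test_idx = [i for i in range(len(y)) if i not in train_set]
    # Class centroids are computed from the training samples only.
    centroids = {}
    for label in set(y[i] for i in train_idx):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign the class whose centroid is nearest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    errors = sum(predict(X[i]) != y[i] for i in test_idx)
    return errors / len(test_idx)


def run_resampling(X, y, n_iter, train_frac=2 / 3, n_workers=4, seed=1):
    """Host: draw all training index sets, farm them out, collect the performances."""
    rng = random.Random(seed)
    n = len(y)
    splits = [rng.sample(range(n), int(train_frac * n)) for _ in range(n_iter)]
    # Bind the full data once; each submitted task then only carries its index list.
    task = partial(resampling_iteration, X, y)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(task, splits))
```

Threads share memory, so the "broadcast" here is free; with worker processes or MPI the same binding becomes an explicit one-time transfer. For CPU-bound pure-Python work, CPython's GIL limits the speedup from threads, so a process pool or MPI workers would be used in practice.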
Slide 7: Speedup
[figure: speedup plot, x-axis: threads]
- good speedup up to 125 workers
- computation time reduced from 124 hours to less than 90 minutes (125 workers)
- multicore (R) at HLRB II
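As a rough check of these figures, assuming the 124 hours refer to a sequential run, the speedup and parallel efficiency on $p = 125$ workers satisfy:

```latex
S_{125} = \frac{T_1}{T_{125}} > \frac{124 \cdot 60\ \text{min}}{90\ \text{min}} \approx 82.7,
\qquad
E_{125} = \frac{S_{125}}{125} > 0.66
```

So at least about 66% of the ideal linear speedup is retained at 125 workers.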
Slide 8: Suitable Task for Grids/Clouds?
- Solution 1: broadcast of the AffyBatch on grids/clouds?
  - requires 12 GB RAM or more for large microarray studies (can that be guaranteed on grids or clouds?)
- Solution 2: split-data approach
  - not realizable via broadcasts
  - how to transfer the respective parts of the data to the nodes? Either the host reads them in and sends them to the nodes, or the nodes read from a database themselves
  - steady communication between nodes
  - waiting times for the slowest node in the grid/cloud
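The split-data approach (Solution 2) needs communication during the computation itself. For quantile normalization, one way to organize it (a sketch under assumptions, not necessarily how affypara implements it) is in two rounds: each node sends the rank-wise sums of its sorted arrays to the host, the host combines them into the reference distribution and sends it back, and each node then normalizes its own arrays locally. All names are hypothetical, and the rounds are simulated sequentially:

```python
def local_rank_sums(chunk):
    """Node, round 1: rank-wise sums of the node's sorted arrays, plus the array count."""
    sorted_arrays = [sorted(a) for a in chunk]
    return [sum(vals) for vals in zip(*sorted_arrays)], len(chunk)


def combine_reference(partials):
    """Host: merge the nodes' partial sums into the global reference distribution."""
    total_arrays = sum(count for _, count in partials)
    n_probes = len(partials[0][0])
    return [sum(sums[r] for sums, _ in partials) / total_arrays for r in range(n_probes)]


def local_normalize(chunk, reference):
    """Node, round 2: map each local array onto the reference by rank."""
    result = []
    for a in chunk:
        order = sorted(range(len(a)), key=lambda i: a[i])
        normalized = [0.0] * len(a)
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result


def split_quantile_normalize(chunks):
    """Simulate the two communication rounds, one chunk of arrays per node."""
    partials = [local_rank_sums(c) for c in chunks]  # nodes -> host
    reference = combine_reference(partials)          # host -> nodes
    return [local_normalize(c, reference) for c in chunks]
```

Per node, only one vector with one entry per probe travels in each direction instead of the arrays themselves; the price is that no node can start round 2 before the slowest node has finished round 1, which is the waiting-time concern raised on the slide.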
Slide 9: Project 2: Resampling Approaches on Genome-Wide SNP Data
Slide 10: Resampling on Genome-Wide SNP Data
Two-step approach:
1. variable selection over all SNPs
   - primarily univariate methods (filtering)
   - wrapper approaches?
2. regression/classification using the selected SNPs only
   - resembles a classification/regression task on microarrays, since only a small number of SNPs is selected
   - constructing a single classifier is computationally not very demanding
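The filtering step can be illustrated with a univariate genotype test. The sketch below assumes genotypes coded 0/1/2 (minor-allele counts) and a binary phenotype coded 0/1, scores each SNP with a Pearson chi-square statistic on the 2 × 3 phenotype-by-genotype table (one common filter; the slides do not specify the statistic), and keeps the k highest-scoring SNPs. It takes explicit sample indices so that, within resampling, the filter sees the training samples only:

```python
def genotype_chi2(genotypes, phenotype, samples):
    """Pearson chi-square statistic for the 2 x 3 table of phenotype (0/1) by genotype (0/1/2)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for s in samples:
        table[phenotype[s]][genotypes[s]] += 1
    n = len(samples)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][g] + table[1][g] for g in range(3)]
    stat = 0.0
    for r in range(2):
        for g in range(3):
            expected = row_tot[r] * col_tot[g] / n
            if expected > 0:  # skip genotype classes that do not occur
                stat += (table[r][g] - expected) ** 2 / expected
    return stat


def filter_snps(snp_matrix, phenotype, samples, k):
    """Return indices of the k SNPs with the largest chi-square statistic.

    snp_matrix[j][s] is the genotype of sample s at SNP j.
    """
    scores = [genotype_chi2(snp, phenotype, samples) for snp in snp_matrix]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

The second step would then fit a classifier on the selected columns only, which is why it resembles the microarray setting.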
Slide 11: Possible Realization
- Filter step:
  - each node could perform all gene selection steps for a small fraction of the data set
  - no need to send the whole data set to all nodes
  - point-to-point communication or individual reading from a database?
  - results collected by the host
- Classification/regression step:
  - each node performs one resampling iteration using the respective variables only
  - heterogeneity and waiting times probably not problematic
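Splitting the filter step by SNPs works because top-k selection decomposes exactly: if every node returns its local top k (score and index) for its block of SNPs, the global top k is contained in the union of these local lists, and the host only has to merge them. A minimal sketch, with a hypothetical score (absolute difference in mean genotype between cases and controls) standing in for whatever filter statistic is used; for several resampling iterations, a node would repeat this for each training index set:

```python
import heapq


def mean_diff_score(genotypes, phenotype, samples):
    """Hypothetical filter score: |mean genotype in cases - mean genotype in controls|."""
    cases = [genotypes[s] for s in samples if phenotype[s] == 1]
    controls = [genotypes[s] for s in samples if phenotype[s] == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def node_top_k(snp_block, offset, phenotype, samples, k):
    """Worker: score its block of SNPs, return the local top k as (score, global index)."""
    scored = [(mean_diff_score(snp, phenotype, samples), offset + j)
              for j, snp in enumerate(snp_block)]
    return heapq.nlargest(k, scored)


def host_merge(local_results, k):
    """Host: the global top k is the top k of the union of the local top-k lists."""
    return heapq.nlargest(k, (item for result in local_results for item in result))


def distributed_filter(snp_matrix, phenotype, samples, k, n_nodes):
    """Split the SNPs into contiguous blocks, one per node, and merge the local results."""
    block = -(-len(snp_matrix) // n_nodes)  # ceiling division
    local_results = [
        node_top_k(snp_matrix[start:start + block], start, phenotype, samples, k)
        for start in range(0, len(snp_matrix), block)
    ]
    return [index for _, index in host_merge(local_results, k)]
```

Each node sends back only k (score, index) pairs per resampling iteration, so host-node traffic stays small regardless of the number of SNPs; ties are broken by the larger SNP index, identically on every node count.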
Slide 12: Illustration - Different Implementations [figure]
Slide 13: Outlook & Questions
Slide 14: Some questions:
- Are workers allowed to communicate with other workers?
- Is it possible to send or broadcast whole genome-wide SNP data within a sensible time frame?
- Is there a guaranteed amount of RAM that can be specified?
- What happens if one node produces an error? Can the respective task be performed on another node without aborting the whole job?
- What is more efficient: the host sending the respective parts of the data to each node, or the nodes reading their part of the data from a database?
- How efficient are broadcasts (are they used at all)?
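On the question of node failures: one common pattern, independent of any particular grid or cloud middleware (whose own fault-tolerance features are not covered here), is for the host to catch a failed task and resubmit it, up to a fixed number of attempts, instead of aborting the whole job. A minimal sketch using Python's `concurrent.futures`; the retry limit is a hypothetical parameter:

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_resubmission(task, inputs, max_attempts=3, n_workers=4):
    """Run task(x) for every x; resubmit failed tasks up to max_attempts times.

    Returns (results, failed): results maps input position -> value, failed maps
    input position -> last exception for tasks that never succeeded.
    """
    results, failed = {}, {}
    attempts = {i: 0 for i in range(len(inputs))}
    pending = list(range(len(inputs)))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while pending:
            # Submit one round; wait for all of it before resubmitting failures.
            futures = {i: pool.submit(task, inputs[i]) for i in pending}
            pending = []
            for i, fut in futures.items():
                attempts[i] += 1
                try:
                    results[i] = fut.result()
                    failed.pop(i, None)
                except Exception as exc:  # a failure affects only this task
                    failed[i] = exc
                    if attempts[i] < max_attempts:
                        pending.append(i)  # resubmit; the pool may run it on another worker
    return results, failed
```

Tasks that still fail after the last attempt are reported back instead of stopping the run, so the host can decide whether the missing resampling iterations are acceptable.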
Slide 15: Last but not least...
- What about the infrastructure? Data storage, databases, maintenance, limits?
- Sharing of data/results (community)
- Which R package is used for parallel computing?
- Installation of own R packages (+ Bioconductor)
- Problems with parallel R computing and heterogeneous hardware
- Help and support? Test runs? Speedups for certain tasks?
Slide 16: Thank you for your attention.