Challenges of Managing Scientific Workflows in High-Throughput and High-Performance Computing Environments


1 Challenges of Managing Scientific Workflows in High-Throughput and High-Performance Computing Environments
Ewa Deelman, USC Information Sciences Institute
http://
Funding from DOE, NSF, and NIH

2 Community Archives: Galactic Plane Atlas 2015
- 18 million input images (~2.5 TB)
- 900 output images (2.5 GB each, 2.4 TB total)
- Measuring the global star formation rate in the galaxy
- Studying the energetics of the interaction of molecular clouds with the interstellar medium
- Determining whether coagulation or fragmentation governs the formation of massive stars
- Assessing the supernova rate in the Galaxy
Bruce Berriman, John Good

3 Southern California Earthquake Center CME
2010: CyberShake 1.0, the world's first physics-based probabilistic seismic hazard map.
Tom Jordan, Scott Callaghan, Phil Maechling

4 2011 Duncan Brown Peter Couvares

5 Improving Soybean Productivity: SoyKB
Results made available to the community
SNP and indel calling, quality assessment, genomic annotation, etc.
Trupti Joshi

6 Outline
- Pegasus: From Virtual Data to Workflows
- Challenges in Managing Workflows in Distributed Environments
- Challenges in Managing Workflows in HPC systems
- Conclusions

7 GriPhyN Project, the beginning of Pegasus
Carl Kesselman, Ian Foster, Miron Livny, Paul Avery

8 LIGO Prototype 2001
Recipe of how to generate virtual data -> workflow
Gaurang Mehta, Kent Blackburn, Albert Lazzarini, Roy Williams

9 Demonstration at SC 2001: From Virtual Data Concepts to Design Decisions
- Abstract workflows
- Resource discovery
- Planning
- Separation of planning and execution
- Leveraged remote job submission

10 First Mention of Pegasus in Print 2002 Jim Blythe Yolanda Gil Exploration of using AI planning technologies for workflow mapping, semantic template-based composition, metadata propagation

11 2002: Pegasus: the concept of data reuse
- Introduced the notion of data reduction
- And later workflow-level checkpointing
Karan Vahi

12 Leveraging existing technologies
- DAGMan workflow execution engine, since day 1
- Keeps track of job dependencies
- Through Condor schedd/Condor-G, submits jobs to remote resources
  - Manages individual job execution
  - Provides detailed execution logs
- Scalability: can handle millions of jobs
- Reliability
  - Job retries
  - Rescue DAG
- Provides user support
Pegasus grants: 2007 and 2012 (NSF)
Miron Livny, Kent Wenger
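The two reliability features named on this slide, job retries and a rescue DAG, can be illustrated with a minimal sketch. It assumes a serial executor that stops at the first job whose retries are exhausted and returns the set of completed jobs, so that a later run can skip them. The names `run_with_retries` and `flaky` are hypothetical; this is neither DAGMan's interface nor its rescue-file format.

```python
def run_with_retries(order, run_job, max_retries, done=()):
    """Run jobs in dependency order, retrying each failed job.

    Returns (completed_jobs, failed_job). The completed set plays the role of
    a rescue record: pass it back as `done` to resume without re-running it.
    """
    done = set(done)
    for job in order:
        if job in done:
            continue  # finished in an earlier run, skip it
        for _attempt in range(1 + max_retries):
            if run_job(job):
                done.add(job)
                break
        else:
            return done, job  # retries exhausted: stop and report
    return done, None


# Hypothetical flaky environment: B succeeds on its 2nd attempt, C on its 4th.
attempts = {}

def flaky(job):
    attempts[job] = attempts.get(job, 0) + 1
    needed = {"B": 2, "C": 4}.get(job, 1)
    return attempts[job] >= needed


order = ["A", "B", "C", "D"]
first_done, first_failed = run_with_retries(order, flaky, max_retries=1)
second_done, second_failed = run_with_retries(order, flaky, max_retries=1,
                                              done=first_done)
```

In the first run the retry masks B's transient failure, while C exhausts its retries and halts the run. The second run resumes from the recorded state, so A and B are not executed again.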

13 2002: National Virtual Observatory
- Developed solutions for uniform access to astronomy archives
- Developed and promoted tools to support astronomy data analysis
- Montage: a major NVO tool, important workflow benchmark
Alex Szalay (JHU), Jim Gray, Bruce Berriman, John Good
[Figure: Montage workflow producing a science-grade mosaic of the sky. Stages: Reprojection, Background Rectification, Co-addition. Inputs: Image1, Image2, Image3. Tasks: Project, Diff, Fitplane, BgModel, Background, Add. Output: mosaic.]

14 Workflows can be simple!
[Figure: independent jobs J1, J2, ..., Jn]

15 Science-grade Mosaic of the Sky
Platform: Amazon M1 large with 2 cores
Table columns: size of mosaic (degrees square); number of input data files; number of tasks; number of intermediate files; total data footprint (GB); cumulative wall time
Cumulative wall time for the five rows: 11 mins; 43 mins; 1 hour, 56 mins; 3 hours, 42 mins; 6 hours, 45 mins
[Other table values not recoverable]

16 Workflows have different computational needs
- SoCal Map needs 239 of those
- MPI codes: ~12,000 CPU hours
- Post processing: 2,000 CPU hours
- Data footprint: ~800 GB
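As a rough worked example of what these figures imply for a whole map, the sketch below multiplies them out. It assumes, as the wording "239 of those" suggests but the slide does not state outright, that the CPU-hour figures are per workflow and that the 239 workflows do not share computation.

```python
# Figures quoted on the slide; treating them as per-workflow costs is an assumption.
SITES = 239
MPI_CPU_HOURS = 12_000   # "~12,000 CPU hours" for the MPI codes
POST_CPU_HOURS = 2_000   # post-processing


def map_cpu_hours(sites, mpi_hours, post_hours):
    """Total CPU hours if every site runs one full workflow."""
    return sites * (mpi_hours + post_hours)


total_cpu_hours = map_cpu_hours(SITES, MPI_CPU_HOURS, POST_CPU_HOURS)
print(f"{total_cpu_hours:,} CPU hours")  # prints "3,346,000 CPU hours"
```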

17 Desired Workflow Management Properties
- Submit locally, run globally
- Automate to the max
- Make use of the right resources
- Provide reliability and performance (time to solution)
- Help inspect the results
- Provide performance information

18 Outline
- Scientific Workflows
- Pegasus: From Virtual Data to Workflows
- Pegasus Today
- Challenges in Managing Workflows in Distributed Environments
- Challenges in Managing Workflows in HPC systems
- Conclusions

19 Related Work Workflow systems [Goble et al 2007] [Ludaescher et al 2007] [Bavoil et al 2005] [Marru et al 2007] [Mesirov et al 2009] [Rex et al, 2003] [Taylor et al 2005] [Yu et al 2010] [Zhao et al 2007] Unique contribution: Planning and Optimization of workflow execution

20 Our Approach
- Analysis representation
  - Support a declarative representation for the workflow
  - Represent the workflow structure as a Directed Acyclic Graph (DAG)
  - Tasks operate on files
  - Use hierarchical representations to achieve scalability
- System (plan for the resources, execute the plan, manage tasks)
  - Layered architecture; each layer is responsible for a particular function
  - Mask errors at different levels of the system
  - Modular: composed of well-defined components, where different components can be swapped in
  - Use and adapt existing graph and other relevant algorithms
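The representation described here, tasks that operate on files and are arranged as a DAG, can be sketched in a few lines. The example below is illustrative only: it assumes each task simply declares its input and output file names, and derives the edges from matching producers to consumers. The function names and the file names are hypothetical, loosely modelled on the Montage stages shown earlier, and are not Pegasus's own data model.

```python
from collections import deque


def build_dag(tasks):
    """tasks: name -> (input_files, output_files). Returns parent -> set of children."""
    producer = {f: name for name, (_, outputs) in tasks.items() for f in outputs}
    children = {name: set() for name in tasks}
    for name, (inputs, _) in tasks.items():
        for f in inputs:
            if f in producer:  # files with no producer are raw inputs
                children[producer[f]].add(name)
    return children


def topological_order(children):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    indegree = {name: 0 for name in children}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for kid in sorted(children[name]):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    if len(order) != len(children):
        raise ValueError("workflow is not acyclic")
    return order


tasks = {
    "project1": (["image1.fits"], ["proj1.fits"]),
    "project2": (["image2.fits"], ["proj2.fits"]),
    "diff": (["proj1.fits", "proj2.fits"], ["diff.fits"]),
    "add": (["proj1.fits", "proj2.fits", "diff.fits"], ["mosaic.fits"]),
}
dag = build_dag(tasks)
order = topological_order(dag)
```

Because the dependencies are inferred from files, the user only states what each task reads and writes; the ordering falls out of the graph.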

21 Our Approach: Submit locally, Compute globally
[Figure: a local resource holds the work definition and the Workflow Management System, which sends work and data to data storage and to execution sites: campus cluster, NSF XSEDE, DOE leadership-class systems, Open Science Grid, academic clouds (FutureGrid, Chameleon, etc.), and Amazon/Google cloud]

22 Pegasus Workflow Management System
- A workflow compiler
  - Input: abstract workflow description, resource-independent
  - Auxiliary info (catalogs): available resources, data, codes
  - Output: executable workflow with concrete resources
  - Automatically locates physical locations for both workflow tasks and data
  - Transforms the workflow for performance and reliability
- A workflow engine (DAGMan)
  - Executes the workflow on local or distributed resources (HPC, clouds)
  - Task executables are wrapped with pegasus-kickstart and managed by Condor schedd
- Provenance and execution traces are collected and stored
- Traces and DB can be mined for performance and overhead information

23 Outline
- Pegasus: From Virtual Data to Workflows
- Pegasus Today
- Challenges in Managing Workflows in Distributed Environments
- Challenges in Managing Workflows in HPC systems
- Conclusions

24 Challenges in High-Throughput Infrastructures
- Failures in the execution environment or application
- Data storage limitations on execution sites
- Performance
  - Small workflow tasks
- Heterogeneous execution architectures
  - Different file systems (shared/non-shared)
  - Different system architectures (Cray XT, Blue Gene, ...)
  - Mismatch between tasks and architecture

25 Generating executable workflows
- APIs for workflow specification (DAX: DAG in XML): Java, Perl, Python
- Information catalogs

26 Data reuse for Performance and Reliability
- Sometimes intermediate data is already available
- Want to restart the workflow from where it left off (workflow-level checkpointing)
[Figure: workflow reduction through data reuse, in three panels. Abstract workflow: f.ip -> A -> f.a; f.a -> B -> f.b -> D -> f.d; f.a -> C -> f.c -> E -> f.e; f.d and f.e -> F -> f.out. File f.d exists somewhere, so reuse it: mark jobs D and B to delete, then delete job D and job B.]
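The reduction shown on this slide can be sketched as a backward walk from the requested output: a job is kept only if it produces a file that is still needed and not already available. This is a minimal illustration using the slide's own job and file names; the function `reduce_workflow` is a hypothetical name, and the actual Pegasus planner may differ in its details.

```python
def reduce_workflow(jobs, targets, available):
    """Return the set of jobs that still have to run.

    jobs:      name -> (input_files, output_files)
    targets:   files the user wants at the end
    available: files that already exist (raw inputs and reusable intermediates)
    """
    producer = {f: name for name, (_, outputs) in jobs.items() for f in outputs}
    needed_jobs = set()
    seen = set()
    stack = list(targets)
    while stack:
        f = stack.pop()
        if f in seen or f in available:
            continue  # already handled, or no need to regenerate it
        seen.add(f)
        if f not in producer:
            raise KeyError(f"no job produces {f} and it is not available")
        job = producer[f]
        if job not in needed_jobs:
            needed_jobs.add(job)
            stack.extend(jobs[job][0])  # the job's inputs are now needed too
    return needed_jobs


# The workflow from the slide.
jobs = {
    "A": (["f.ip"], ["f.a"]),
    "B": (["f.a"], ["f.b"]),
    "C": (["f.a"], ["f.c"]),
    "D": (["f.b"], ["f.d"]),
    "E": (["f.c"], ["f.e"]),
    "F": (["f.d", "f.e"], ["f.out"]),
}
kept = reduce_workflow(jobs, targets=["f.out"], available={"f.ip", "f.d"})
removed = set(jobs) - kept
```

With f.d available, job D is no longer needed, and B is dropped in turn because its only consumer is gone. Job A stays, since C still requires f.a. The same walk covers restarting after a failure: files written before the failure count as available.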

27 Storage limitations
- Small amount of space
- Automatically add tasks to clean up data no longer needed
- LIGO was running on Open Science Grid resources, processing TBs of data within a single workflow
Gurmeet Singh
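One way to picture the automatic cleanup tasks is the sketch below. It assumes a file can be deleted once every job that reads or writes it has finished, and it compares the peak storage of a serial run with and without cleanup. The function names, the three-job chain, and the file sizes are hypothetical, chosen only to make the effect visible.

```python
def file_users(jobs):
    """Map each file to the set of jobs that read or write it."""
    users = {}
    for name, (inputs, outputs) in jobs.items():
        for f in list(inputs) + list(outputs):
            users.setdefault(f, set()).add(name)
    return users


def cleanup_jobs(jobs, keep):
    """One cleanup job per file not in `keep`; its parents are the file's users."""
    return {f"cleanup_{f}": parents
            for f, parents in file_users(jobs).items() if f not in keep}


def peak_footprint(jobs, order, sizes, keep, cleanup):
    """Peak storage used when running `order` serially, with or without cleanup."""
    users = file_users(jobs)
    produced = {f for _, outputs in jobs.values() for f in outputs}
    on_disk = set(sizes) - produced  # inputs staged in before the run
    peak = sum(sizes[f] for f in on_disk)
    done = set()
    for name in order:
        on_disk |= set(jobs[name][1])
        peak = max(peak, sum(sizes[f] for f in on_disk))
        done.add(name)
        if cleanup:
            # drop files whose users have all finished, unless marked to keep
            on_disk = {f for f in on_disk if f in keep or not users[f] <= done}
    return peak


jobs = {"A": (["in"], ["x"]), "B": (["x"], ["y"]), "C": (["y"], ["out"])}
sizes = {"in": 1, "x": 4, "y": 4, "out": 1}
order = ["A", "B", "C"]
keep = {"out"}

extra_jobs = cleanup_jobs(jobs, keep)
peak_without = peak_footprint(jobs, order, sizes, keep, cleanup=False)
peak_with = peak_footprint(jobs, order, sizes, keep, cleanup=True)
```

In this toy chain the peak drops from 10 to 8 units. The saving grows with the amount of intermediate data that can be discarded before later stages run.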

28 Montage Astronomy Workflow Rizos Sakellariou 1.25GB versus 700 MB Arun Ramakrishnan

29 LIGO Workflows
- Full workflow: 185,000 nodes, 466,000 edges, 10 TB of input data, 1 TB of output data
- 166 nodes
- Need additional restructuring
- 26% improvement, 56% improvement

30 Storage limitations Variety of file system deployments: shared vs non-shared User workflow Mats Rynge Allows us to run easily on Open Science Grid, Clouds

31 Workflow Restructuring to improve application performance
- Cluster short-running jobs together to achieve better performance
- Issues
  - Each job has scheduling overheads
  - Execution sites have limits on the number of job submissions
  - Clustered tasks can reuse common input data, so there are fewer data transfers
- Level-based clustering
- Label-based clustering
- Time-based clustering
[Figure: a workflow with a root job A, a level of B and C jobs, and a final job D, shown before and after clustering]
Gurmeet Singh
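Of the three strategies listed, level-based clustering is the simplest to sketch: compute each job's depth in the DAG and pack jobs of the same depth into groups of a fixed size, so each group is submitted as one job. The code below is an illustration under that assumption; the function names and the `cluster_size` parameter are hypothetical, not Pegasus configuration options.

```python
from collections import defaultdict


def job_levels(parents):
    """parents: job -> list of parent jobs. Level = 1 + deepest parent level."""
    levels = {}

    def level(job):
        if job not in levels:
            levels[job] = 1 + max((level(p) for p in parents[job]), default=0)
        return levels[job]

    for job in parents:
        level(job)
    return levels


def cluster_by_level(parents, cluster_size):
    """Group jobs on the same level into clusters of at most `cluster_size` jobs."""
    levels = job_levels(parents)
    by_level = defaultdict(list)
    for job in sorted(parents):
        by_level[levels[job]].append(job)
    clusters = []
    for lvl in sorted(by_level):
        same_level = by_level[lvl]
        for i in range(0, len(same_level), cluster_size):
            clusters.append(same_level[i:i + cluster_size])
    return clusters


# A fan-out/fan-in workflow in the spirit of the slide's figure.
parents = {
    "A": [],
    "B1": ["A"], "B2": ["A"], "B3": ["A"], "B4": ["A"],
    "D": ["B1", "B2", "B3", "B4"],
}
clusters = cluster_by_level(parents, cluster_size=2)
```

Six submissions become four here. Jobs on the same level never depend on each other, so grouping them cannot introduce a cycle.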

32 Explored different clustering methods via simulation and experimentation with Pegasus
- WorkflowSim, available on GitHub
- Mimics a Workflow Management System
  - Workflow Mapper, Workflow Engine, Clustering Engine, Workflow Scheduler
  - To support research in robustness
- System overhead reduction
- Monetary cost
- Energy usage
Weiwei Chen

33 Southern California Earthquake Center: CyberShake PSHA Workflow (2009)
- Description
  - Builders ask seismologists: "What will the peak ground motion be at my new building in the next 50 years?"
  - Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA)
- 239 workflows
- Each site in the input map corresponds to one workflow
- Each workflow has:
  - 820,000 tasks

34 Evolution of SCEC CyberShake
Probabilistic Seismic Hazard Analysis: "What will the peak ground motion be at my house in the next 50 years?"
Each map = 286 sites, 30M SUs
Titan (pilots), Blue Waters

35 CyberShake: Computing Needs Over Time 16 Million Files

36 Solutions
- Cluster tasks
- Use pilot jobs to dynamically provision a number of resources at a time
- Develop an MPI-based workflow management engine to manage sub-workflows
[Figure: a workflow with job A, a level of B jobs, a level of C jobs, and job D, shown with clustered tasks and with tasks placed inside a pilot job over time]

37 Custom Workflow Engines Needed for Different Execution Sites
- For single tasks
- For clustered tasks
- For non-shared file system environments
- Pegasus MPI-Cluster for HPC systems
  - A master/worker task scheduler for running fine-grained workflows on batch systems
  - Runs as an MPI job
  - Uses MPI to implement the master/worker protocol
  - Works on most HPC systems
  - Requires: MPI, a shared file system, and fork()
  - Allows sub-graphs of a Pegasus workflow to be submitted as monolithic MPI jobs to remote resources
Gideon Juve
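The master/worker scheduling idea can be sketched without MPI. In the illustration below a thread pool stands in for the worker ranks: the master hands out every task whose parents have finished and releases new tasks as results come back. This is an assumption-laden stand-in for the pattern only; `run_master_worker` is a hypothetical name, and nothing here reflects how Pegasus MPI-Cluster is implemented.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_master_worker(parents, run_task, n_workers):
    """Run a DAG of tasks on a pool of workers; returns tasks in completion order.

    parents: task -> iterable of parent tasks. Threads stand in for MPI workers.
    """
    remaining = {task: set(ps) for task, ps in parents.items()}
    running = {}   # future -> task name
    finished = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while remaining or running:
            # Master: dispatch every task whose dependencies are satisfied.
            for task in sorted(t for t, ps in remaining.items() if not ps):
                del remaining[task]
                running[pool.submit(run_task, task)] = task
            if not running:
                raise ValueError("cycle or unknown parent in workflow")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise any worker exception
                finished.append(task)
                for ps in remaining.values():
                    ps.discard(task)  # release children waiting on this task
    return finished


parents = {"A": [], "B1": ["A"], "B2": ["A"], "B3": ["A"], "D": ["B1", "B2", "B3"]}
executed = []
finished = run_master_worker(parents, executed.append, n_workers=2)
```

The whole sub-graph runs inside one process pool, which mirrors the slide's point that a sub-workflow can be submitted to the batch system as a single monolithic job.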

38 CyberShake: Addressing Computing Challenges
- Hierarchical workflows, task clustering/glideins + Corral frontend
- Hierarchical workflows, PMC, merged SeisPSA jobs, in-memory rupture variation generation
- Use of GPU and CPU executables, PMC
[Figure: core days and makespan per system. Systems: NCSA Mercury, TACC Ranger, USC/HPCC, Ranger, NICS Kraken, Blue Waters, Blue Waters, Stampede, Blue Waters]

39 Pegasus Releases: Interleaving Research and Software Development
[Timeline figure; milestone labels listed in extraction order]
Nightly builds; Demonstration at SC'02; Support for GT4; Bug fixes; New partitioning and clustering; First standalone Pegasus release; Initial AWS Cloud support; Hierarchical workflows; User notifications, better debugging tools; Pegasus MPI-cluster; Online monitoring dashboard; Builds done using Docker; Development started
Research: Data Cleanup, 2003; Data footprint: 2007, 2013; Cloud: 2008
Data Cleanup; Support for RC and MDS; Stable release; Workflow Partitioning; Task clustering, Stable release; Moved to NMI B&T; Pegasus Lite; Ensemble Manager, Google cloud; Major performance improvements; Moved code to GitHub; New data transfer tools, pegasus-statistics, pegasus-plots; Python API for creating workflows

40 Outline
- Pegasus: From Virtual Data to Workflows
- Pegasus Today
- Challenges in Managing Workflows in Distributed Environments
- Challenges in Managing Workflows in HPC systems
- Conclusions

41 Trends in HPC scaling that will affect applications and runtime software
- Increased power awareness
  - Future HPC systems need to reduce power consumption (O(100))
- Deep memory hierarchies
  - DRAM, NVRAM, SSD, HDD, long-term storage
  - Increased gap between memory access time and FLOPS
- Heterogeneous architecture
  - Billions of elements
  - CPUs and GPUs
- Novel I/O solutions as part of overall architecture
  - Burst buffers (storage with computing attached)
- More faults
  - Would need more checkpointing, but that's expensive

42 Increased power awareness
- Data movement is a significant part of the overall energy consumption
- Data movement within the HPC system needs to be carefully coordinated
  - Implies that data delivery to/from the system needs to be power-aware as well
  - Need to develop new algorithms that take into account data locality on the system
- Some computations need to be done in-situ without writing to storage
  - In-situ workflows which coordinate processing and visualization
  - SCEC: in-situ post-processing
  - Need to decide which data to keep; this impacts provenance as well

43 What we learned in distributed-area WMS we can apply to HPC application management
- Reliability: WMS deal with task failures, problems accessing data, resource failures, and others
  - Investigate how data replication techniques can be used to improve fault tolerance, while minimizing the impact on energy consumption
  - Explore tradeoffs between data re-computation and data retrieval from DRAM/NVM/disk (time to solution and energy consumption)
- Provenance capture and reproducibility: WMS capture provenance information about the creation, planning, and execution
  - Up to now, the approach has been to save everything, which is a problem
  - Provenance capture may need to adapt to the behavior of the application (coarse and fine levels of detail, compression)
  - May want to automatically re-run parts of the computation and reproduce the results and a more detailed provenance trail on demand
- What we need: Workflow/Applications Performance/Behavior Modeling across scales

44 Issues in Workflow Performance Modeling and Prediction
- Workflows involve
  - Diverse applications
  - A number of heterogeneous, distributed resources (compute, storage, networks)
  - A layered software stack: workflow engine, local scheduler, remote scheduler, runtime system
  - Wide-area data movement services, on-site file system, memory
- Varied and sometimes sparse monitoring systems
  - Usually good network data
  - Usually good low-level performance data on compute nodes
  - Poor scheduler and data storage data
- Lack of end-to-end performance/resource usage models
- Lack of benchmarks that exercise the end-to-end system
- Lack of traces that capture the workflow characteristics

45 To advance the Science of Workflows we need
- Workflow/Applications Performance/Behavior Modeling: understand the resource needs and behavior (performance, energy usage) of the workflow applications across scales: workflow ensemble, workflow instance, down to individual tasks and code segments
  - Need community benchmarks, execution traces and profiles
    - a beginning, has 11 workflow applications, most with multiple runs, synthetic workflow generator, simulator for the wide area
    - Beginning of a performance archive
  - Methodologies for predicting the behavior of applications
    - On current infrastructures
    - On future infrastructures
    - That can inform future infrastructures
- Fundamentally new software and software compositions that can seamlessly work across the scales, particularly in the data management area (wide area networks, deep memory hierarchies)

46 To Advance Computing for Science we need to
- Enhance the ease of use of software tools
- Can you do science as an app?
- Should we take a new look at Virtual Data?
- Still much more to do!

47 How to make progress?
- Apply your knowledge to real world problems
- Abstract real problems to generate new techniques and knowledge
- Interleave software development and research
- Test ideas in real settings
- Exercise patience with yourself, explain the need for patience to funding agencies


More information
