Addressing Crosscutting Compute/Data Challenges

Size: px
Start display at page:

Download "Addressing Crosscutting Compute/Data Challenges"

Transcription

1 Rutgers Discovery Informatics Institute (RDI 2 ) New Jersey s Center for Advanced Computation Addressing Crosscutting Compute/Data Challenges Manish Parashar parashar@rutgers.edu Manish Parashar, Rutgers University CDSE - Many crosscutting challenges Computing Multicore; large and increasing core counts, deep memory hierarchies (TH-2: 54.9 PF, 3.12 M Cores, 1.4 PB, 25 MW) New prgm. model, concerns (fault tolerance, energy, etc) New models & technologies: Clouds, grids, hybrid manycore, accelerators, deep storage hierarchies, Data Generating more data than in all of human history: preserve, mine, share? Software Complex applications on coupled compute-data-networked environments, tools needed Modern apps: lines, many groups contribute, take decades to develop, very long lifetimes People Multidisciplinary expertise essential! Appropriate academic program, career tracks 2 1

2 Complex Coupled Application Workflows Tight coupling Coupled simulations/models run concurrently and exchange data frequently E.g., Coupled fusion simulations, multi-block subsurface modeling simulations Loose coupling Coupling and data exchange are less frequent, and often asynchronous and possibly opportunistic E.g., Combustion simulations DNS + LES coupling, material science simulations Data-flow (data-driven) coupling Workflows expressed as dataflow based DAG where data produced by a simulation flows through one or more data processing pipelines E.g., End-to-end simulations workflows with data analytics Ensemble workflows Multiple fan-out stages composed of large numbers of loosely coupled or independent simulations E.g., Uncertainty quantification, history matching, data assimilation Example: Asynchronous Replica Exchange 2

3 Scientific Simulations: A Big Data Challenge Scientific simulations running on high-end computing systems generate huge amounts of data! If a single core produces 2MB/minute on average, one of these machines could generate simulation data between ~170TB per hour -> ~700PB per day -> ~1.4EB per year Successful scientific discovery depends on a comprehensive understanding of this enormous simulation data How we enable the computation scientists to efficiently manage and explore extreme scale data: find the needles in haystack?? Challenges Faced by Traditional HPC Data Pipelines Data analysis challenge Can current data mining, manipulation and visualization algorithms still work effectively on extreme scale machine? I/O challenge Increasing performance gap: disks are outpaced by computing speed Figure. Traditional data analysis pipeline Data movement challenge Lots of data movement between simulation and analysis machines, between coupled mutli-physics simulation components -> longer latencies Improving data locality is critical: do work where the data resides! Energy challenge Future extreme systems are designed to have low-power chips however, much greater power consumption will be due to memory and data movement! The costs of data movement are increasing and dominating! 3

4 Coupling Cycles of Innovation Interoperability/Standards? 7 Addressing Complexity Programming System programming model, languages/ abstraction syntax + semantics abstract machine execution context and assumptions infrastructure, middleware & runtime Hide v/s expose? virtual homogeneity ease of programming robustness? cross-layer adaptation & optimization the inverted stack Applications Programming System (Programming Model + Abstract Machine) Abstraction (Support abstract behaviors/assumptions) Grid Middleware Infrastructure Virtualization (Create and manage VOs and VMs) Virtual Machine Virtual Machine Virtual Machine Virtual Organization Organization Organization Organization Conceptual Implementation Abstractions? Conceptual and Implementation Models for the Grid, M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 19, No. 3, March 2005.! 4

5 Software Engineering and CDS&E Many sources of complexity complexity from the problem (complex physics) complexity of the end-to-end application workflow complexity from coordination & coupling across codes and research teams complexity from the codes and how they are developed and implemented complexity from the underlying disruptive infrastructure Challenge: Manage complexity while maintaining performance/scalability Coupling cycles of innovation Software Engineering in the Large v/s Software Engineering in the Small? Software management of large systems v/s smaller but highly complex systems Well defined v/s rapidly evolving requirements/execution environments Primary complexity from size v/s application requirements Relative importance of performance Design Principles Separation of Concerns Abstractions Interoperability and standardization Complexity suggests a SOA approach Service Oriented Architecture (SOA): Software as a composition of services Used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams Service: a well-defined, self-contained, and independently developed software element that does not depend on the context or state of other services. Abstraction & separation Computations from compositions and coordination Interface from implementations Existing and proven concept - widely accepted/used by the enterprise computing community For example, extreme-scale data analysis services routinely run at Yahoo & Google, for near real-time web data analysis, for rapid deployment of new web services, etc. 5

6 Examples of some relevant services I/O services: Efficient and adaptable I/O support I/O services should be componentized so that it allows codes to switch between and tune different methods easily. Example: a user may easily switch between file-based and memorybased I/O or switch formats without changing any code, all the while maintaining the same data model when switching I/O components. Code-coupling services: Code-coupling services includes codes that can run on the same or different machines, codes that are tightly coupled (memory-to-memory), as well as codes that are loosely coupled (exchange of data through files). Automatic online (in-transit) data processing services: Efficient data movement services for distributing and archiving data Example: generating summary statistics, graphs, images, movies, etc. Monitoring Services: dynamic real-time monitoring of large-scale simulations (during execution) Automated provenance collections services: Data, system workflow, application, etc. ADIOS: Adaptable I/O System Provides portable, fast, scalable, easy- to- use, metadata rich output Simple API Layered so;ware architecture: Allows plug- ins for different I/O implementacons Abstracts the API from the method used for I/O Beyond I/O coupling, coordinacon, in- situ/in- transit, etc. Open source: hgp:// support/ center- projects/adios/ Research methods from many groups: Georgia Tech, Rutgers, NCSU, Emory, Auburn, Sandia, LBL, PPPL Interface)to)apps)for)descrip/on)of)data)(ADIOS,)etc.)) Feedback) Mul/Bresolu/on) methods) Provenance) Data)Management)Services) Buffering) Data)Compression) methods) Plugins)to)the)hybrid)staging)area) )Schedule) Data)Indexing) (FastBit))methods) Workflow))Engine) Run/me)engine) Data)movement) Analysis)Plugins) Parallel)and)Distributed)File)System) Visualiza/on)Plugins) AdiosBbp) IDX) HDF5) pnetcdf) raw )data) Image)data) Viz.)Client) I/O performance of the Combustion S3D code (96K cores), the SCEC PCML3D (30K cores), and the Fine/Turbo (4K cores) codes. 35 GB/s MPI-IO OR PHDF5 ADIOS S3D SEC Fine/Turbo Simulations with and without ADIOS 6

7 ADIOS Application Partners Over 158 Publications from Application Code Contact Astrophysics+ Chimera+ T.+Mezzacappa+ Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+ CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ Materials+ M.+Wheeler+ M.+Parashar+ Fusion+ GTCCP+ S.+Ethier+ Application Code Materials+ M.+Parashar+ Application Fusion+ Code GTCCP+ Contact S.+Ethier+ Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+ Image+Analysis+ T.+Kurc+ AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+ Climate+ Contact Astrophysics+ Chimera+ T.+Mezzacappa+ Combus6on+ S3D+ J.+Chen+ AMR+ Chombo+ B.+Van+Straalen+ CFD+ Fine/Turbo+ M.+Gon6er+ FusionCEdge+ XGC1+ C.+S.+Chang+ FusionCEdge+ XGC0+ S.+H.+Ku+ Geoscience+ AWPCODC+ Y.+Cui+ Materials+ LAMMPS+ A.+Frachioni+ Weather+ GRAPES+ W.+Xue+ Nuclear+ HFODD+ H.+Nam+ Rela6vity+ Maya+ P.+Laguna+ Sub+surface+ M.+Wheeler+ J.+Fu+ Climate+ CAM+ K.+Evans+ Application Code Contact Fusion+ GTS+ W.+Wang+ Fusion+ M3DCC1+ S.+Jardin+ Fusion+ M3DCK+ G.+Y.+Fu+ Fusion+ GTC+ Z.+Lin+ Fusion+ GEM+ S.+Parker+ Quantum+ QLG2Q+ M.+Soe+ Image+Analysis+ T.+Kurc+ AMR+ Boxlib+ J.+Bell+ Rela6vity+ Cactus+ G.+Allen+ Weather+ NASA+ T.+Clune+ Materials+ QMC+ J.+Kim+ Climate+ J.+Fu+ Climate+ CAM+ K.+Evans+ The ADIOS Approach The overarching design philosophy of our framework is based on the Service-Oriented Architecture Used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams Applications constructed by assembling services based on a universal view of their functionality using a welldefined API Service implementations can be changed easily Integrated simulation can be assembled using these services 7

8 ADIOS SOA: I/O Componentization End users should be able to select the most efficient I/O method for their code, with minimal effort in terms of code alternations/updates. Performance-driven choices should not prevent data from being stored in the desired file format, since this is crucial for later data analysis. Have efficient ways of identifying and selecting certain data for analysis, to help end users cope with the flood of data being produced. Make it easy to introduce new research I/O methods, without changing your code. allow the best CDS&E researchers to contribute in application science projects. Make it easy to allow I/O to do more than just I/O à code coupling, in situ visualization. Rethinking the Data Management Pipeline Hybrid Staging + In-Situ & In-Transit Execution Exploit multi-levels in-memory Hybrid Data Staging to: Decrease the gap between CPU and IO speeds Dynamically deploy and execute data analytical or pre-processing operations either in-situ or in-transit Improve IO write performance 8

9 In-situ/In-transit Data Management & Analytics l l l l Virtual shared-space programming abstraction l Simple API for coordination, interaction and messaging Distributed, associative, in-memory object store l Online data indexing, flexible querying Adaptive cross-layer runtime management l Hybrid in-situ/in-transit execution Efficient, high-throughput/low-latency asynchronous data transport Overview of Research RU Programming abstractions / system DataSpaces/XpressSpaces: Interaction, coordination and messaging abstraction for coupled scientific workflow [HPDC10, CCGrid10,11, HiPC12, CCPE13] ActiveSpaces: Dynamic code deployment for in-staging data processing [IPDPS11] Runtime mechanisms Data-centric Task Mapping: Reduce data movement and increase intranode data sharing [IPDPS12, DISC12] In-situ & In-transit Data Analytics: Simulation-time analysis of large volume data by combining in-situ and in-transit execution [SC 12] Cross-layer Adaptation: adaptive cross-layer approach for dynamic data management in large scale simulation-analysis workflow High-throughput/low-latency asynchronous data transport DART: Network independent transport library for high speed asynchronous data extraction and transfer [HPDC08] 9

10 DataSpaces: A Shared Space Abstraction for the Hybrid Staging Area [HPDC 10] Semantically-specialized virtual shared space abstraction in the staging area Shared (distributed) data objects Simple put/get/query API Supports coupled app. workflows Provide a global-view programming abstraction consistent with the PGAS model (UPC, GA) DataSpaces (HPDC10) Constructed on-the-fly on hybrid staging nodes Indexes data for quick access and retrieval Provides asynchronous coordination and interaction support Complements existing interaction/ coordination mechanisms Figure. Graphical view of the shared space model (derived from Tuple space) for coordination/ interaction Code coupling using DataSpaces Transparent data redistribution Complex geometry-based queries In-space (online) data transformation and manipulations The Replica Exchange Algorithm A powerful sampling algorithm that preserves canonical distributions Allows for efficient crossing of high energy barriers that separate thermodynamically stable states Can significantly reduce sampling times as compared to formulations based on constant temperatures Algorithm overview Several copies, or replicas, of the system are simulated in parallel at different temperatures using walkers Walkers occasionally swap temperatures to allow them to bypass enthalpic barriers by moving to a high temperature Exchanges governed by a probability condition to ensure detailed balance Replica exchange can significantly impact the fields of structural biology and drug design structure based drug design associated with protein misfolding, for example, structure based drug design and binding affinity optimization molecular basis of human diseases associated with protein misfolding 10

11 Parallel Replica Exchange General formulation requires dynamic and complex coordination and communication patterns between the walkers Pair-wise and asynchronous Dependent on the current state (temperature, energy) of replicas, which are changing dynamically Implementation based on commonly used parallel programming frameworks is challenging Message passing frameworks, e.g. MPI, require matching sends and receives to be explicitly defined for each interaction Implementations use restrictive formulations using synchronous exchanges Exchanges between neighboring temperatures only Asynchronous Replica Exchange using DataSpaces DataSpaces provides a semantically specialized virtual shared space abstraction to support scalable asynchronous replica exchange for molecular dynamics applications Enables general non-nearest neighboring temperature exchanges Exchanges are decoupled and asynchronously and dynamically determined Communications are decentralized and peer-to-peer, and occur in parallel Operation Walkers post temperature range of interest to the shared space using post Walkers discover partners and negotiate exchange Walkers execute exchange (handshake) 11

12 Example: Salsa Operation Summary Unprecedented opportunities for dramatic insights through computation! Challenge: Manage complexity while maintaining performance/scalability. complexity from the problem (complex physics) complexity from the codes and how they are developed and implemented complexity of underlying infrastructure (disruptive hardware trends) complexity from coordination across codes and research teams Overarching philosophy Abstraction & Separation through Service Orientation Independence in development, deployment, execution Computations from compositions and coordination; Interface from implementations Existing and proven concepts - widely accepted/used by the enterprise computing community ADIOS Reducing barriers from a scientists perspective Ease-of-use, simple code integration and maintainability Minimizing performance impact Addressing unique requirements of high-end CDS&E 12

13 Thank You!! EPSi Edge Physics Simulation Manish Parashar, Ph.D. Prof., Dept. of Electrical & Computer Engr. Rutgers Discovery Informatics Institute (RDI 2 ) Cloud & Autonomic Computing Center (CAC) Rutgers, The State University of New Jersey parashar@rutgers.edu WWW: rdi2.rutgers.edu 13

Addressing Big Data Challenges in Simulation-based Science

Addressing Big Data Challenges in Simulation-based Science Addressing Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI2) NSF Center for Cloud and Autonomic Computing (CAC) Department of Electrical and

More information

Big Data Challenges in Simulation-based Science

Big Data Challenges in Simulation-based Science Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI 2 ) Department of Computer Science *Hoang Bui, Tong Jin, Qian Sun, Fan Zhang, ADIOS Team, and

More information

Big Data Challenges in Simulation-based Science

Big Data Challenges in Simulation-based Science Big Data Challenges in Simulation-based Science Manish Parashar* Rutgers Discovery Informatics Institute (RDI2) NSF Center for Cloud and Autonomic Computing (CAC) Department of Electrical and Computer

More information

Big Data Challenges in Simulation-based Science

Big Data Challenges in Simulation-based Science Big Data Challenges in Simulation-based Science Manish Parashar Rutgers Discovery Informatics Institute (RDI 2 ) NSF Center for Cloud and Autonomic Computing (CAC) http://parashar.rutgers.edu *Hoang Bui,

More information

New Jersey Big Data Alliance

New Jersey Big Data Alliance Rutgers Discovery Informatics Institute (RDI 2 ) New Jersey s Center for Advanced Computation New Jersey Big Data Alliance Manish Parashar Director, Rutgers Discovery Informatics Institute (RDI 2 ) Professor,

More information

Building Platform as a Service for Scientific Applications

Building Platform as a Service for Scientific Applications Building Platform as a Service for Scientific Applications Moustafa AbdelBaky moustafa@cac.rutgers.edu Rutgers Discovery Informa=cs Ins=tute (RDI 2 ) The NSF Cloud and Autonomic Compu=ng Center Department

More information

Exploring Software Defined Federated Infrastructures for Science

Exploring Software Defined Federated Infrastructures for Science Exploring Software Defined Federated Infrastructures for Science Manish Parashar NSF Cloud and Autonomic Computing Center (CAC) Rutgers Discovery Informatics Institute (RDI 2 ) Rutgers, The State University

More information

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Big Data Mining Services and Knowledge Discovery Applications on Clouds Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades

More information

BSC vision on Big Data and extreme scale computing

BSC vision on Big Data and extreme scale computing BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,

More information

Beyond Embarrassingly Parallel Big Data. William Gropp www.cs.illinois.edu/~wgropp

Beyond Embarrassingly Parallel Big Data. William Gropp www.cs.illinois.edu/~wgropp Beyond Embarrassingly Parallel Big Data William Gropp www.cs.illinois.edu/~wgropp Messages Big is big Data driven is an important area, but not all data driven problems are big data (despite current hype).

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

Cluster, Grid, Cloud Concepts

Cluster, Grid, Cloud Concepts Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of

More information

Big Data Management in the Clouds and HPC Systems

Big Data Management in the Clouds and HPC Systems Big Data Management in the Clouds and HPC Systems Hemera Final Evaluation Paris 17 th December 2014 Shadi Ibrahim Shadi.ibrahim@inria.fr Era of Big Data! Source: CNRS Magazine 2013 2 Era of Big Data! Source:

More information

The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO

The Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO The Supercomputing Company Supercomputing Big Data Because some great things never change One other thing that hasn t changed. Cray

More information

Data Semantics Aware Cloud for High Performance Analytics

Data Semantics Aware Cloud for High Performance Analytics Data Semantics Aware Cloud for High Performance Analytics Microsoft Future Cloud Workshop 2011 June 2nd 2011, Prof. Jun Wang, Computer Architecture and Storage System Laboratory (CASS) Acknowledgement

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Visualization and Data Analysis

Visualization and Data Analysis Working Group Outbrief Visualization and Data Analysis James Ahrens, David Rogers, Becky Springmeyer Eric Brugger, Cyrus Harrison, Laura Monroe, Dino Pavlakos Scott Klasky, Kwan-Liu Ma, Hank Childs LLNL-PRES-481881

More information

HPC in the age of (Big)Data

HPC in the age of (Big)Data HPC in the age of (Big)Data The fusion of supercomputing and Big, Fast Data Analytics Ugo Varetto (uvaretto@cray.com) CRAY EMEA Research Lab Agenda Outlook Requirements Implementation 2 Outlook Cray s

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India 3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India Call for Papers Cloud computing has emerged as a de facto computing

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Scientific Computing's Productivity Gridlock and How Software Engineering Can Help

Scientific Computing's Productivity Gridlock and How Software Engineering Can Help Scientific Computing's Productivity Gridlock and How Software Engineering Can Help Stuart Faulk, Ph.D. Computer and Information Science University of Oregon Stuart Faulk CSESSP 2015 Outline Challenges

More information

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India 1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India Call for Papers Colossal Data Analysis and Networking has emerged as a de facto

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Scientific versus Business Workflows

Scientific versus Business Workflows 2 Scientific versus Business Workflows Roger Barga and Dennis Gannon The formal concept of a workflow has existed in the business world for a long time. An entire industry of tools and technology devoted

More information

HPC technology and future architecture

HPC technology and future architecture HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr

More information

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

BlobSeer: Towards efficient data storage management on large-scale, distributed systems : Towards efficient data storage management on large-scale, distributed systems Bogdan Nicolae University of Rennes 1, France KerData Team, INRIA Rennes Bretagne-Atlantique PhD Advisors: Gabriel Antoniu

More information

Business Transformation for Application Providers

Business Transformation for Application Providers E SB DE CIS IO N GUID E Business Transformation for Application Providers 10 Questions to Ask Before Selecting an Enterprise Service Bus 10 Questions to Ask Before Selecting an Enterprise Service Bus InterSystems

More information

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS

More information

HPC Programming Framework Research Team

HPC Programming Framework Research Team HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro

More information

Towards Scalable Visualization Plugins for Data Staging Workflows

Towards Scalable Visualization Plugins for Data Staging Workflows Towards Scalable Visualization Plugins for Data Staging Workflows David Pugmire Oak Ridge National Laboratory Oak Ridge, Tennessee James Kress University of Oregon Eugene, Oregon Jeremy Meredith, Norbert

More information

Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture

Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture 1 B. Kamala 2 B. Priya 3 J. M. Nandhini 1 2 3 ABSTRACT The global economic recession and the shrinking budget

More information

CORBA and object oriented middleware. Introduction

CORBA and object oriented middleware. Introduction CORBA and object oriented middleware Introduction General info Web page http://www.dis.uniroma1.it/~beraldi/elective Exam Project (application), plus oral discussion 3 credits Roadmap Distributed applications

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Accelerating Hadoop MapReduce Using an In-Memory Data Grid Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Using an In-Memory Data Grid for Near Real-Time Data Analysis SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses

More information

Grid Computing Vs. Cloud Computing

Grid Computing Vs. Cloud Computing International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid

More information

Distribution transparency. Degree of transparency. Openness of distributed systems

Distribution transparency. Degree of transparency. Openness of distributed systems Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science steen@cs.vu.nl Chapter 01: Version: August 27, 2012 1 / 28 Distributed System: Definition A distributed

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Service Oriented Architecture

Service Oriented Architecture Service Oriented Architecture Charlie Abela Department of Artificial Intelligence charlie.abela@um.edu.mt Last Lecture Web Ontology Language Problems? CSA 3210 Service Oriented Architecture 2 Lecture Outline

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

Big Data Analytics. Chances and Challenges. Volker Markl

Big Data Analytics. Chances and Challenges. Volker Markl Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD

More information

Hank Childs, University of Oregon

Hank Childs, University of Oregon Exascale Analysis & Visualization: Get Ready For a Whole New World Sept. 16, 2015 Hank Childs, University of Oregon Before I forget VisIt: visualization and analysis for very big data DOE Workshop for

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Autonomics, Cyberinfrastructure Federation, and Software-Defined Environments for Science

Autonomics, Cyberinfrastructure Federation, and Software-Defined Environments for Science Autonomics, Cyberinfrastructure Federation, and Software-Defined Environments for Science Manish Parashar NSF Cloud and Autonomic Computing Center (CAC) Rutgers Discovery Informatics Institute (RDI 2 )

More information

Appendix 1 ExaRD Detailed Technical Descriptions

Appendix 1 ExaRD Detailed Technical Descriptions Appendix 1 ExaRD Detailed Technical Descriptions Contents 1 Application Foundations... 3 1.1 Co- Design... 3 1.2 Applied Mathematics... 5 1.3 Data Analytics and Visualization... 8 2 User Experiences...

More information

DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study

DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study DISTRIBUTED SYSTEMS AND CLOUD COMPUTING A Comparative Study Geographically distributed resources, such as storage devices, data sources, and computing power, are interconnected as a single, unified resource

More information

NASA's Strategy and Activities in Server Side Analytics

NASA's Strategy and Activities in Server Side Analytics NASA's Strategy and Activities in Server Side Analytics Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at the ESGF/UVCDAT Conference Lawrence Livermore National Laboratory

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Aneka Aneka is a market oriented Cloud development and management platform with rapid application development and workload distribution capabilities.

More information

Manjrasoft Market Oriented Cloud Computing Platform

Manjrasoft Market Oriented Cloud Computing Platform Manjrasoft Market Oriented Cloud Computing Platform Innovative Solutions for 3D Rendering Aneka is a market oriented Cloud development and management platform with rapid application development and workload

More information

Data Requirements from NERSC Requirements Reviews

Data Requirements from NERSC Requirements Reviews Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community

More information

The Role of the Operating System in Cloud Environments

The Role of the Operating System in Cloud Environments The Role of the Operating System in Cloud Environments Judith Hurwitz, President Marcia Kaufman, COO Sponsored by Red Hat Cloud computing is a technology deployment approach that has the potential to help

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

Service-Oriented Architectures

Service-Oriented Architectures Architectures Computing & 2009-11-06 Architectures Computing & SERVICE-ORIENTED COMPUTING (SOC) A new computing paradigm revolving around the concept of software as a service Assumes that entire systems

More information

NASA s Big Data Challenges in Climate Science

NASA s Big Data Challenges in Climate Science NASA s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014 1 2 7-km GEOS-5 Nature Run

More information

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver 1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution

More information

2 (18) - SOFTWARE ARCHITECTURE Service Oriented Architecture - Sven Arne Andreasson - Computer Science and Engineering.

2 (18) - SOFTWARE ARCHITECTURE Service Oriented Architecture - Sven Arne Andreasson - Computer Science and Engineering. Service Oriented Architecture Definition (1) Definitions Services Organizational Impact SOA principles Web services A service-oriented architecture is essentially a collection of services. These services

More information

Performance Monitoring of Parallel Scientific Applications

Performance Monitoring of Parallel Scientific Applications Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure

More information

Relational Databases in the Cloud

Relational Databases in the Cloud Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating

More information

JOURNAL OF OBJECT TECHNOLOGY

JOURNAL OF OBJECT TECHNOLOGY JOURNAL OF OBJECT TECHNOLOGY Online at www.jot.fm. Published by ETH Zurich, Chair of Software Engineering JOT, 2008 Vol. 7, No. 8, November-December 2008 What s Your Information Agenda? Mahesh H. Dodani,

More information

In situ data analysis and I/O acceleration of FLASH astrophysics simulation on leadership-class system using GLEAN

In situ data analysis and I/O acceleration of FLASH astrophysics simulation on leadership-class system using GLEAN In situ data analysis and I/O acceleration of FLASH astrophysics simulation on leadership-class system using GLEAN Venkatram Vishwanath 1, Mark Hereld 1, Michael E. Papka 1, Randy Hudson 2, G. Cal Jordan

More information

Percipient StorAGe for Exascale Data Centric Computing

Percipient StorAGe for Exascale Data Centric Computing Percipient StorAGe for Exascale Data Centric Computing per cip i ent (pr-sp-nt) adj. Having the power of perceiving, especially perceiving keenly and readily. n. One that perceives. Introducing: Seagate

More information

Hardware- and Network-Enhanced Software Systems for Cloud Computing

Hardware- and Network-Enhanced Software Systems for Cloud Computing The HARNESS Project: Hardware- and Network-Enhanced Software Systems for Cloud Computing Prof. Alexander Wolf Imperial College London (Project Coordinator) Cloud Market Strata SaaS Software as a Service

More information

IO Informatics The Sentient Suite

IO Informatics The Sentient Suite IO Informatics The Sentient Suite Our software, The Sentient Suite, allows a user to assemble, view, analyze and search very disparate information in a common environment. The disparate data can be numeric

More information

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Cloud computing insights from 110 implementation projects

Cloud computing insights from 110 implementation projects IBM Academy of Technology Thought Leadership White Paper October 2010 Cloud computing insights from 110 implementation projects IBM Academy of Technology Survey 2 Cloud computing insights from 110 implementation

More information

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

More information

A Study on Service Oriented Network Virtualization convergence of Cloud Computing

A Study on Service Oriented Network Virtualization convergence of Cloud Computing A Study on Service Oriented Network Virtualization convergence of Cloud Computing 1 Kajjam Vinay Kumar, 2 SANTHOSH BODDUPALLI 1 Scholar(M.Tech),Department of Computer Science Engineering, Brilliant Institute

More information

Big Data in HPC Applications and Programming Abstractions. Saba Sehrish Oct 3, 2012

Big Data in HPC Applications and Programming Abstractions. Saba Sehrish Oct 3, 2012 Big Data in HPC Applications and Programming Abstractions Saba Sehrish Oct 3, 2012 Big Data in Computational Science - Size Data requirements for select 2012 INCITE applications at ALCF (BG/P) On-line

More information

PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D.

PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD. Natasha Balac, Ph.D. PACE Predictive Analytics Center of Excellence @ San Diego Supercomputer Center, UCSD Natasha Balac, Ph.D. Brief History of SDSC 1985-1997: NSF national supercomputer center; managed by General Atomics

More information

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner

Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Recent and Future Activities in HPC and Scientific Data Management Siegfried Benkner Research Group Scientific Computing Faculty of Computer Science University of Vienna AUSTRIA http://www.par.univie.ac.at

More information

How To Understand The Concept Of A Distributed System

How To Understand The Concept Of A Distributed System Distributed Operating Systems Introduction Ewa Niewiadomska-Szynkiewicz and Adam Kozakiewicz ens@ia.pw.edu.pl, akozakie@ia.pw.edu.pl Institute of Control and Computation Engineering Warsaw University of

More information

September 25, 2007. Maya Gokhale Georgia Institute of Technology

September 25, 2007. Maya Gokhale Georgia Institute of Technology NAND Flash Storage for High Performance Computing Craig Ulmer cdulmer@sandia.gov September 25, 2007 Craig Ulmer Maya Gokhale Greg Diamos Michael Rewak SNL/CA, LLNL Georgia Institute of Technology University

More information

Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture

Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture 1 B. Kamala 2 B. Priya 3 J. M. Nandhini - 1 AP-II, MCA Dept, Sri Sai Ram Engineering College, Chennai, kamala.mca@sairam.edu.in

More information

A standards-based approach to application integration

A standards-based approach to application integration A standards-based approach to application integration An introduction to IBM s WebSphere ESB product Jim MacNair Senior Consulting IT Specialist Macnair@us.ibm.com Copyright IBM Corporation 2005. All rights

More information

EHR Data Preservation

EHR Data Preservation Emerging Computational Approaches to Interoperability the Key to Long Term Preservation of EHR Data William W. Stead, M.D. Associate Vice Chancellor for Health Affairs Chief Strategy & Information Officer

More information

Digital libraries of the future and the role of libraries

Digital libraries of the future and the role of libraries Digital libraries of the future and the role of libraries Donatella Castelli ISTI-CNR, Pisa, Italy Abstract Purpose: To introduce the digital libraries of the future, their enabling technologies and their

More information

Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise

Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise Introducing Unisys All in One software based weather platform designed to reduce server space, streamline operations, consolidate

More information

Welcome to the Force.com Developer Day

Welcome to the Force.com Developer Day Welcome to the Force.com Developer Day Sign up for a Developer Edition account at: http://developer.force.com/join Nicola Lalla nlalla@saleforce.com n_lalla nlalla26 Safe Harbor Safe harbor statement under

More information

Principles and characteristics of distributed systems and environments

Principles and characteristics of distributed systems and environments Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single

More information

VStore++: Virtual Storage Services for Mobile Devices

VStore++: Virtual Storage Services for Mobile Devices VStore++: Virtual Storage Services for Mobile Devices Sudarsun Kannan, Karishma Babu, Ada Gavrilovska, and Karsten Schwan Center for Experimental Research in Computer Systems Georgia Institute of Technology

More information

Figure 1: Illustration of service management conceptual framework

Figure 1: Illustration of service management conceptual framework Dagstuhl Seminar on Service-Oriented Computing Session Summary Service Management Asit Dan, IBM Participants of the Core Group Luciano Baresi, Politecnico di Milano Asit Dan, IBM (Session Lead) Martin

More information

Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29. Jean-Louis Pazat. IRISA/INSA Rennes, FRANCE MYRIADS Project Team

Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29. Jean-Louis Pazat. IRISA/INSA Rennes, FRANCE MYRIADS Project Team Towards the Magic Green Broker Jean-Louis Pazat IRISA 1/29 Jean-Louis Pazat IRISA/INSA Rennes, FRANCE MYRIADS Project Team Towards the Magic Green Broker Jean-Louis Pazat IRISA 2/29 OUTLINE Clouds and

More information

Data Mining Governance for Service Oriented Architecture

Data Mining Governance for Service Oriented Architecture Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY alibek@tr.ibm.com Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,

More information

NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing

NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing Purpose of the Workshop In October 2014, the President s Council of Advisors on Science

More information

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007 Data Management in an International Data Grid Project Timur Chabuk 04/09/2007 Intro LHC opened in 2005 several Petabytes of data per year data created at CERN distributed to Regional Centers all over the

More information

SERVICE ORIENTED ARCHITECTURE

SERVICE ORIENTED ARCHITECTURE SERVICE ORIENTED ARCHITECTURE Introduction SOA provides an enterprise architecture that supports building connected enterprise applications to provide solutions to business problems. SOA facilitates the

More information

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications. What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications. 2 Contents: Abstract 3 What does DDS do 3 The Strengths of DDS 4

More information

for my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste

for my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste Which infrastructure Which infrastructure for my computation? Stefano Cozzini Democrito and SISSA/eLAB - Trieste Agenda Introduction:! E-infrastructure and computing infrastructures! What is available

More information

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain

More information

Exploiting peer group concept for adaptive and highly available services

Exploiting peer group concept for adaptive and highly available services Exploiting peer group concept for adaptive and highly available services Muhammad Asif Jan Centre for European Nuclear Research (CERN) Switzerland Fahd Ali Zahid, Mohammad Moazam Fraz Foundation University,

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Scientific Computing Programming with Parallel Objects

Scientific Computing Programming with Parallel Objects Scientific Computing Programming with Parallel Objects Esteban Meneses, PhD School of Computing, Costa Rica Institute of Technology Parallel Architectures Galore Personal Computing Embedded Computing Moore

More information

HPCOR 2014. Data Management Policies Co-Chairs: Bill Allcock, Julia White. HPCOR 2014, June 18-19, Oakland, CA 1

HPCOR 2014. Data Management Policies Co-Chairs: Bill Allcock, Julia White. HPCOR 2014, June 18-19, Oakland, CA 1 HPCOR 2014 Data Management Policies Co-Chairs: Bill Allcock, Julia White HPCOR 2014, June 18-19, Oakland, CA 1 Contributors Bill Allcock, ANL Sasha Ames, LLNL Ashley Barker, ORNL Laura Biven, DOE Jeff

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information