Data Management/Visualization on the Grid at PPPL. Scott A. Klasky, Stephane Ethier, Ravi Samtaney

The Problem
- Simulations at NERSC generate GBs to TBs of data.
- The transfer time needed for practical visualization is too long.
- Using standard ftp we get about 600 KB/s; the GTC dataset is over 3 TB, so the transfer would take roughly 2 months.
- This interferes with the productivity of our research program and leaves valuable data unexamined.
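As a rough sanity check, 3 TB at about 600 KB/s does work out to roughly two months; the short Python sketch below makes the arithmetic explicit (the byte counts are approximations).

```python
# Back-of-the-envelope check of the transfer-time estimate above.
dataset_bytes = 3 * 1024**4      # ~3 TB of GTC output (approximate)
ftp_rate_bps = 600 * 1024        # ~600 KB/s over standard ftp

seconds = dataset_bytes / ftp_rate_bps
days = seconds / 86400
print(f"~{days:.0f} days (~{days / 30:.1f} months)")   # roughly 60 days
```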

Visualization/Data Management: 3 categories of data and techniques

Small (fewer than 1x10^6 mesh points) [1D/2D + time]; <200 MB/run
- Often consists of many signals; commonplace in experimental datasets.
- MDS+ is used for this type of data: grab data from a server to the local client. Since the dataset is small, the transfer time is small (see the sketch after this list).
- Scalable display walls allow one to present many functions at one time. [NOTE: NO NEED for a cluster to drive a wall of fewer than 20 screens.] Keep the cost down!
- IDL is the de facto standard for this type of visualization in the fusion community.
- We are developing Elvis as part of the Fusion Grid (http://www.fusiongrid.org). [Feibush/Klasky/McCune]
  - Written in Java. Fully collaborative, with a multi-tier architecture; send the minimal amount of information needed to collaborate.
  - Web-based graphics. Links to IDL math routines.
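For illustration, the client-pull pattern described for small data might look like the following with the MDSplus Python bindings; the server name, tree, shot number, and signal below are placeholders, not actual PPPL or FusionGrid values.

```python
# Hypothetical sketch: pull one small signal from an MDSplus server to the
# local client, the access pattern described above for "small" data.
# Host, tree, shot, and signal names are placeholders.
from MDSplus import Connection   # thin-client (mdsip) interface

conn = Connection("mds-server.example.org")   # placeholder host
conn.openTree("experiment_tree", 12345)       # placeholder tree / shot

# Fetch a single 1D signal and its time base; for signals of this size the
# transfer is quick, so no local copy of the data is needed.
signal = conn.get(r"\some_signal").data()
time = conn.get(r"dim_of(\some_signal)").data()

print(signal.shape, time.shape)
```

Because each signal is at most a few megabytes, fetching it on demand like this is cheaper than maintaining local mirrors of the experimental database.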

Visualization/Data Management

Medium (fewer than 10x10^6 mesh points) [typically 2D/3D + time]; <50 GB/simulation
- Can be analyzed on one PC.
- We use HDF5 with parallel I/O via MPI-2 (a minimal sketch follows this list).
  - Broad community support.
  - Fast (see http://w3.pppl.gov/~pletzer/performance/hdf5-netcdf/index.html).
- The data is analyzed many times, so grabbing every signal over the network does not make sense: roughly a 1-minute transfer for medium signals and a 10-minute transfer for large signals.
- Locality of data is CRUCIAL for this type of data!
- We use AVS/Express 6 ($500/license one-time cost, with a $2,000 yearly maintenance fee).
- We continue to try other visualization products, but they do not meet our needs (IDL, OpenDX, SCIRun, VTK, Amira).
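A minimal sketch of the HDF5 + MPI-2 parallel-write pattern, assuming mpi4py and an h5py build linked against parallel HDF5 (the production I/O is written against the HDF5 C/Fortran APIs; the mesh sizes and dataset name here are made up). Run with something like `mpiexec -n 4 python write_field.py`.

```python
# Each MPI rank writes its own slab of one field into a shared HDF5 file;
# no data is gathered to rank 0.
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx_global, ny = 4096, 1024            # placeholder mesh dimensions
nx_local = nx_global // size          # simple 1D slab decomposition

local = np.random.rand(nx_local, ny)  # stand-in for this rank's field data

with h5py.File("field.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("potential", (nx_global, ny), dtype="f8")
    # Write this rank's contiguous slab into the global dataset.
    dset[rank * nx_local:(rank + 1) * nx_local, :] = local
```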

Visualization/Data Management

Large (more than 100x10^6 mesh points) [3D + time]; n TB/simulation
- A key area of research, which must use parallel visualization!
- Locality of data is even more CRUCIAL for this type of data.
- Remote visualization is common for this type of data; possibly look at streaming video from a visualization supercomputer to analyze results (see the downsampling sketch after this list).
- We often give up features for usability.
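One simple way to trade features for usability in remote visualization, sketched below under the assumption of an HDF5 output file: read a strided, downsampled version of a single time step on the server side and ship only that preview to the remote client. The file layout, dataset name, and stride are hypothetical.

```python
# Illustrative sketch only: build a reduced-resolution preview of one time
# step rather than moving the full n-TB field across the WAN.
import h5py

STRIDE = 4            # keep every 4th point in each direction
TIME_STEP = 42        # placeholder time index

with h5py.File("big_run.h5", "r") as f:
    dset = f["potential"]                            # shape: (nt, nx, ny, nz)
    preview = dset[TIME_STEP, ::STRIDE, ::STRIDE, ::STRIDE]

# `preview` is ~1/64 the size of the full time step: small enough to send to
# a remote client, at the cost of resolution (features traded for usability).
print(preview.shape, preview.nbytes / 2**20, "MiB")
```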

Our Proposed Solution

Data Grids
- Support the existing AVS/Express-based data analysis infrastructure at PPPL, which depends on having local copies of the data.
- This entails leveraging the emerging DOE Science Grid infrastructure to provide efficient, automated data migration and mirroring between PPPL and NERSC (a simplified mirroring sketch follows this list).

Remote/Grid-Distributed Visualization
- Build upon the tools already inside the Cactus framework and the NERSC/NGI-developed Visapult system.
- Provides support for interactive remote visualization of large fusion datasets.

Advanced Display Technologies
- Leverage existing advanced display technologies: high-resolution tiled display walls assembled at both PPPL and NERSC.
- Build tools for parallel data analysis: PPPL is working with IBM Research on a parallel MPI version of OpenDX.
- Use Visapult technologies to transfer parallel visualization output to clients running AVS/Express.

[Diagram: data is transferred simultaneously from the NERSC SP to HPSS and, over ESnet, to PPPL at ~800 GB/day (9 MB/s); wall-based visualization at PPPL uses OpenDX + MPI on display nodes pared01 through pared12; remote visualization runs against NERSC.]
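The automated migration/mirroring idea could be prototyped with a loop like the one below. This is a deliberately simplified sketch: the paths are placeholders, and "transfer-tool" stands in for whatever grid-enabled mover (e.g. a GridFTP client) is actually deployed.

```python
# Simplified mirroring loop: decide what needs to move from the NERSC side to
# the PPPL mirror, then hand each file to a placeholder transfer command.
import os
import subprocess

SRC = "/nersc/archive/gtc_run"     # placeholder source path
DST = "/pppl/mirror/gtc_run"       # placeholder destination path

def needs_copy(src_file: str, dst_file: str) -> bool:
    """Copy if the mirror is missing the file or holds a stale/short version."""
    if not os.path.exists(dst_file):
        return True
    s, d = os.stat(src_file), os.stat(dst_file)
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime

os.makedirs(DST, exist_ok=True)
for name in sorted(os.listdir(SRC)):
    src, dst = os.path.join(SRC, name), os.path.join(DST, name)
    if needs_copy(src, dst):
        # Placeholder: a real deployment would invoke a grid-enabled mover here.
        subprocess.run(["transfer-tool", src, dst], check=True)
```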

What we need from MICS/DOE
- 2 CS postdocs for 2 years: 1 at PPPL, 1 at NERSC.
- 2 graduate students (Picasso program): 1 at PPPL, 1 at NERSC.
- OC12 ($100K).

Our Definition of a Princeton Grid Computer
- A group of trusted cluster computers that WANT to share resources at certain times.
- Standard operating system: Linux.
- Keep directory structures similar across the clusters.
- Parallel Virtual File System (separate on each cluster).
- Keep firewall interaction to a minimum!!
- Part of TCF: 19 nodes (38 AMD processors); gigabit network. PPPL.
- 10 nodes (20 AMD processors); gigabit network. P.U.
- 10 nodes (20 AMD processors); gigabit network. GFDL.

Exploratory Analysis Grid Linked Environment (EAGLE)

Objective: use the Cactus computational framework to create a comprehensive grid-based simulation and analysis environment.
- Provides important capabilities such as:
  - Rapid prototyping
  - Parallelization
  - Architecture-independent checkpointing, remote visualization, and computational monitoring (a portable-checkpoint sketch follows this slide)
  - Computing in a distributed, heterogeneous environment
- Future realistic simulations of fusion devices may require more computing power than any single machine can provide.
  - Enables the creation of large-scale multi-physics codes that span the spectrum from small-scale microturbulence to device-scale simulations.
  - A move to the distributed/metacomputing paradigm can help address these needs.
- Executes on wide-area, distributed, heterogeneous computing resources.
- Complements the existing SciDAC-funded National Fusion Collaboratory.
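As a generic illustration of what architecture-independent checkpointing means in practice, the sketch below writes restart state to a self-describing, portable format (HDF5) so a run could be resumed on a machine with a different architecture. This is not the Cactus checkpointing implementation, only an illustration of the idea; all names are hypothetical.

```python
# Portable checkpoint/restart sketch: state goes into a self-describing HDF5
# file rather than a raw, architecture-dependent binary dump.
import h5py
import numpy as np

def write_checkpoint(path, step, time, fields):
    """fields: dict mapping variable name -> numpy array of the global state."""
    with h5py.File(path, "w") as f:
        f.attrs["step"] = step
        f.attrs["time"] = time
        for name, data in fields.items():
            f.create_dataset(name, data=data)

def read_checkpoint(path):
    with h5py.File(path, "r") as f:
        fields = {name: f[name][...] for name in f}
        return int(f.attrs["step"]), float(f.attrs["time"]), fields

# Example: checkpoint a toy 2D field, then restart from it.
write_checkpoint("restart.h5", 100, 0.25, {"density": np.ones((64, 64))})
step, t, state = read_checkpoint("restart.h5")
print(step, t, state["density"].shape)
```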