Experience of Data Transfer to the Tier-1 from a DIRAC Perspective



Experience of Data Transfer to the Tier-1 from a DIRAC Perspective
Lydia Heck, Institute for Computational Cosmology
Manager of the DiRAC-2 Data Centric Facility, COSMA

Talk layout
- Introduction to DiRAC
- The DiRAC computing systems
- What is DiRAC?
- What type of science is done on the DiRAC facility?
- Why do we need to copy data to RAL?
- Copying data to RAL: network requirements
- Collaboration between DiRAC and RAL to produce the archive
- Setting up the archiving tools
- Archiving
- Open issues
- Conclusions

Introduction to DiRAC
- DiRAC (Distributed Research utilising Advanced Computing), established in 2009 with DiRAC-1
- Supports research in theoretical astronomy, particle physics and nuclear physics
- Funded by STFC with infrastructure money allocated by the Department for Business, Innovation and Skills (BIS)
- The running costs, such as staff costs and electricity, are funded by STFC

Introduction to DiRAC, cont'd
- 2009, DiRAC-1: 8 installations across the UK, of which COSMA-4 at the ICC in Durham is one; still a loose federation.
- 2011/2012, DiRAC-2: major funding of £15M for e-infrastructure; 5 installations were identified in a peer-judged bidding process, and the successful bidders were scrutinised and interviewed by representatives for BIS to check that we could deliver by a tight deadline.

Introduction to DiRAC, cont'd
- DiRAC has a full management structure.
- Computing time on the DiRAC facility is allocated through a peer-reviewed procedure.
- Current director: Dr Jeremy Yates, UCL
- Current technical director: Prof Peter Boyle, Edinburgh

The DiRAC computing systems
- Blue Gene: Edinburgh
- Cosmos: Cambridge
- Complexity: Leicester
- Data Centric: Durham
- Data Analytic: Cambridge

The Blue Gene @ DiRAC, Edinburgh
- IBM Blue Gene, 98,304 cores
- 1 Pbyte of GPFS storage
- Designed around (lattice) QCD applications

COSMA @ DiRAC (Data Centric), Durham
- Data Centric system: IBM iDataPlex
- 6,720 Intel Sandy Bridge cores
- 53.8 TB of RAM
- FDR10 InfiniBand, 2:1 blocking
- 2.5 Pbyte of GPFS storage (2.2 Pbyte used!)

Complexity @ DiRAC, Leicester
- Complexity: HP system
- 4,352 Intel Sandy Bridge cores
- 30 Tbyte of RAM
- FDR InfiniBand, 1:1 non-blocking
- 0.8 Pbyte of Panasas storage

Cosmos @ DiRAC (SMP), Cambridge
- COSMOS: SGI shared-memory system
- 1,856 Intel Sandy Bridge cores
- 31 Intel Xeon Phi coprocessors
- 14.8 Tbyte of RAM
- 146 Tbyte of storage

HPCS @ DiRAC (Data Analytic), Cambridge
- Data Analytic system: Dell
- 4,800 Intel Sandy Bridge cores
- 19.2 Tbyte of RAM
- FDR InfiniBand, 1:1 non-blocking
- 0.75 Pbyte of Lustre storage

What is DiRAC?
- A national service run, managed and allocated by the scientists who do the science, funded by BIS and STFC.
- The systems are built around and for the applications with which the science is done.
- We do not rival a facility like ARCHER, as we do not aspire to run a general national service.
- DiRAC is classed by STFC as a major research facility, on a par with the big telescopes.

What is DiRAC?, cont'd
Long projects with a significant amount of CPU hours, typically allocated for 3 years on a specific system. Examples for 2012-2015:
- Cosmos (dp002): ~20M CPU hours on Cambridge Cosmos
- Virgo (dp004): 63M CPU hours on Durham Data Centric
- UK-MHD (dp010): 40.5M CPU hours on Durham Data Centric
- UK-QCD (dp008): ~700M CPU hours on Edinburgh Blue Gene
- Exeter (dp005): ~15M CPU hours on Leicester Complexity
- HPQCD (dp019): ~20M CPU hours on Cambridge Data Analytic

What type of science is done on DiRAC?
For the highlights of science carried out on the DiRAC facility please see: http://www.dirac.ac.uk/science.html
Specific example: large-scale structure calculations with the Eagle run
- 4096 cores, ~8 GB RAM/core
- 47 days of running, i.e. 4096 x 47 x 24 = 4,620,288 CPU hours
- 200 TB of data

Why do we need to copy data (to RAL)?
- Original plan: each research project should make provisions for storing its research data, which requires additional storage resources at the researchers' home institutions.
- There is not enough provision, and more will require additional funds.
- Data creation is considerably above expectation.
- If disaster struck, many CPU hours of calculations would be lost.

Why do we need to copy data (to RAL)?, cont'd
- Research data must now be shared with, and made available to, interested parties.
- Installing DiRAC's own archive requires funds, and currently there is no budget.
- We needed to get started: Jeremy Yates negotiated access to the RAL archive system.
- Acquire expertise; identify bottlenecks and technical challenges (submitting 2,000,000 files created an issue at the file servers).
- How can we collaborate and make use of previous experience?
- AND: copy data!

Copying data to RAL: network requirements
Network bandwidth situation for Durham:
- Now: 300-400 Mbyte/sec currently possible; required investment and collaboration from DU CIS; upgrade to 6 Gbit/sec to JANET (Sep 2014)
- Will be 10 Gbit/sec by the end of 2015; the infrastructure is already installed
- Identified Durham-related bottleneck: the FIREWALL

Copying data to RAL: network requirements, cont'd
Network bandwidth situation for Durham, investment to bypass the external campus firewall:
- Two new routers (~£80k), configured for throughput with a minimal ACL, enough to safeguard the site
- Internal firewalls being deployed as part of the new security infrastructure, essential for such a venture
- Security now relies on the front-end systems of Durham DiRAC and Durham GridPP

Copying data to RAL: network requirements, cont'd
Result for COSMA and GridPP in Durham:
- Guaranteed 2-3 Gbit/sec, with bursts of up to 3-4 Gbit/sec (3 Gbit/sec outside of term time)
- Pushed Durham GridPP's network performance from the bottom 3 in the country into the top 5 of the UK GridPP sites
- Achieves up to 300-400 Mbyte/sec throughput to RAL when archiving, depending on file sizes

Collaboration between DiRAC and GridPP/RAL
- The Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation.
- Huge thanks to Jens Jensen and Brian Davies: there were many emails exchanged, many questions asked and many answers given.
- Resulting document: "Setting up a system for data archiving using FTS3" by Lydia Heck, Jens Jensen and Brian Davies.

Setting up the archiving tools
Identify appropriate hardware, which could mean extra expense:
- Need the freedom to modify and experiment with it; cannot have HPC users logged in and working!
- Free to apply the very latest security updates
- Requires an optimal connection to the storage: an InfiniBand card

Setting up the archiving tools, cont'd
Create an interface to access the file/archiving service at RAL using the GridPP tools:
- GridFTP (from the Globus Toolkit, which also provides Globus Connect)
- Trust anchors (egi-trustanchors)
- VOMS tools (emi3-xxx)
- FTS3 client tools (CERN)
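As a rough illustration of the client stack listed above, the commands below sketch how the tools might be installed and checked on a RHEL-type archive node. The package names (ca-policy-egi-core, voms-clients, fts-client, globus-gass-copy-progs) and the GridFTP endpoint are assumptions for illustration, not taken from the talk, and the relevant repositories (EGI trust anchors, EMI/UMD, CERN FTS3) must already be configured.

    # Install the client stack (package names are indicative and may differ)
    yum install -y ca-policy-egi-core        # EGI trust anchors (CA certificates)
    yum install -y voms-clients              # VOMS proxy tools
    yum install -y fts-client                # FTS3 command-line clients
    yum install -y globus-gass-copy-progs    # provides globus-url-copy (GridFTP)

    # Quick sanity check of GridFTP connectivity (endpoint is hypothetical)
    globus-url-copy -list gsiftp://gridftp.example.ac.uk/dirac/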

Archiving
- Long-lived VOMS proxy? myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation
- How to create a proxy and delegation that lasts weeks, even months? Still an issue.
- Current route: grid-proxy-init and fts-transfer-delegation
- grid-proxy-init -valid HH:MM
- fts-transfer-delegation -e time-in-seconds
- This creates a proxy that lasts up to the certificate lifetime.
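A minimal sketch of the grid-proxy-init / fts-transfer-delegation route described above, assuming a hypothetical FTS3 endpoint at https://fts3.example.ac.uk:8446 and assuming the endpoint is passed with -s as for the other FTS3 client commands; the lifetimes are examples only, and the delegation can never outlive the user certificate itself.

    # Create a Grid proxy valid for 7 days (168 hours)
    grid-proxy-init -valid 168:00

    # Delegate the credential to the FTS3 server; -e requests the delegation
    # lifetime in seconds (604800 s = 7 days), capped by the certificate lifetime
    fts-transfer-delegation -s https://fts3.example.ac.uk:8446 -e 604800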

Archiving, cont'd
- Large files: optimal throughput, limited by network bandwidth.
- Many small files: limited by latency; use the -r flag to fts-transfer-submit to re-use the connection.
- Transferred so far: ~40 Tbytes and ~2M files since 20 August, a challenge to the FTS service at RAL.
- User education is needed on creating lots of small files.
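For illustration, a single source/destination submission with session re-use might look like the sketch below; the FTS3 endpoint and the storage URLs are hypothetical, and in practice many small files would be grouped into one job so that -r lets them share a single GridFTP connection.

    # Submit a transfer job; -r asks FTS3 to re-use the transfer connection,
    # which matters most when a job contains many small files
    fts-transfer-submit -s https://fts3.example.ac.uk:8446 -r \
        gsiftp://cosma-gridftp.example.ac.uk/cosma5/data/dp004/snap_027.hdf5 \
        srm://srm.example.ac.uk/dirac/durham/dp004/snap_027.hdf5

    # The submission prints a job ID that can be polled for progress
    fts-transfer-status -s https://fts3.example.ac.uk:8446 <job-id>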

Open issues
- Ownership and permissions are not preserved; the archiving depends on a single admin to carry it out.
- What happens when the content of directories changes? Complete new archive sessions? The current approach tries to archive all the files again, but then fails because the files already exist; it should behave more like rsync.

Conclusions
- With the right network speed we can archive the DiRAC data to RAL.
- The documentation has to be completed and shared with the system managers at the other DiRAC sites.
- Each DiRAC site will have its own dirac0x account.
- Start archiving, and keep on archiving.
- Collaboration between DiRAC and GridPP/RAL DOES work!
- Can we aspire to more?