Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC
Elisa Lanciotti, Arnau Bria, Gonzalo Merino - PIC Tier1, Barcelona
5th EGEE User Forum, Uppsala, April 12th 2010
www.eu-egee.org (EGEE and gLite are registered trademarks - Enabling Grids for E-sciencE)

Contents
- Current model to distribute VO-specific software to WLCG sites
- Some issues with the current model
- An alternative model to distribute software packages to the WNs using web caching, focusing on the LHCb use case
  - Description of the setup at PIC
  - Functionality tests and preliminary results
- A second alternative model using CVMFS
  - Prototype at PIC and first tests
- Summary and outlook

Current model
- In a distributed computing model such as WLCG, the software of VO-specific applications has to be distributed efficiently to every site of the Grid.
- Application software is currently installed in a shared area of the site, visible to all worker nodes (WNs) of the site (NFS, AFS or other).
- The software is installed by jobs which run on the SGM node, a privileged node of the computing farm where the shared area is mounted in write mode.
[Diagram: within the PIC LAN, the SGM node mounts the NFS server read-write (RW), while the WNs mount it read-only (RO).]
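For illustration, a minimal sketch of how such a shared software area is typically exported and mounted; the hostnames and paths are hypothetical examples, not the actual PIC configuration:

    # /etc/exports on the NFS server: read-write for the SGM node,
    # read-only for the worker nodes (hypothetical hosts and paths)
    /srv/vo-software  sgm01.example.org(rw,sync,no_root_squash)  wn*.example.org(ro,sync)

    # On the SGM node (installation jobs write here)
    mount -t nfs nfs01.example.org:/srv/vo-software /opt/vo-software

    # On each WN (jobs only read the software)
    mount -t nfs -o ro nfs01.example.org:/srv/vo-software /opt/vo-software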

Some issues with the current model
Some issues observed with this model:
- NFS scalability issues.
- The shared area is sometimes not reachable (not properly mounted on the WN, or the NFS server is too loaded).
- NFS locked by SQLite (a known bug when NFS is mounted in read-write mode).
- Software installation at many Grid sites is a tough task (job failures, resubmissions, tag publication...).
- Limited quota per VO in the shared area: if VOs want to install new releases and keep the old ones, they have to ask for a quota increase.
For LHCb: in case the software area is not reachable, jobs can fall back to downloading the software from a web server and installing the application locally.
- Drawback: network latency to download the application tarballs through the WAN.

An alternative model using HTTP caching
- The software is accessed through HTTP from a central web server at CERN.
- A web cache at the site stores all the packages currently in use. Squid is a very good solution for this: it optimises network traffic and provides very efficient caching.
- WNs access the software through the LAN.
- Potentially more scalable: local caches on the WNs and peer-to-peer transfer among WNs.
[Diagram: origin web server at CERN, a web cache and HTTP proxy server (Squid) in the PIC LAN, and the WNs.]
- The disk cache on the WN is estimated at about 3 GB, computed on the basis of the LHCb use case: the last 3-4 DaVinci (data analysis) versions in use plus the official production software.
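To make the caching setup concrete, below is a minimal squid.conf sketch for the WN-local Squid of the two-level hierarchy described in the next slide, forwarding cache misses to the site-level Squid; hostnames, ports and sizes are illustrative assumptions, not the actual PIC configuration:

    # squid.conf fragment for the Squid running on a worker node (hypothetical values)
    http_port 3128
    # small on-disk cache of ~2 GB, as estimated for the LHCb use case
    cache_dir ufs /var/spool/squid 2048 16 256
    # roughly 32 MB of memory per GB of disk cache
    cache_mem 64 MB
    # forward cache misses to the site-level Squid (parent cache)
    cache_peer squid.site.example parent 3128 0 no-query default
    never_direct allow all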

Test setup at PIC Tier1
Two-level hierarchy:
- One main Squid server acting both as HTTP proxy and as the main cache at the site.
- One Squid on each WN with a small cache (2-3 GB) and a limited memory requirement (recommended: 32 MB of memory for each GB of disk cache).
- A test queue configured with only these WNs, visible through ce-test.pic.es.
Test job: a bash script which
- exports VO_LHCB_SW_DIR=. and http_proxy=localhost:3128
- executes install_project for DaVinci (v25r1, v25r2, v24r7)
- runs DaVinci (the analysis software of LHCb)
No development required: the standard LHCb installation tools already foresee local installation. A sketch of such a test job is shown below.
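A minimal sketch of what the test job could look like, reconstructed from the steps above; the exact install_project invocation and the final run step are illustrative assumptions, not the script actually used in the tests:

    #!/bin/bash
    # Install the LHCb software locally on the WN, fetching it over HTTP
    # through the WN-local Squid instead of reading the NFS shared area.
    export VO_LHCB_SW_DIR=.              # install into the job's working directory
    export http_proxy=localhost:3128     # route downloads through the local Squid

    # install_project is the standard LHCb installation tool; the exact
    # command-line syntax below is an assumption
    for version in v24r7 v25r1 v25r2; do
        python install_project.py -b DaVinci "$version"
    done

    # ...then set up the environment from the local installation and run
    # DaVinci as in a normal analysis job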

Functionality tests
First run: the job triggers the software download from the software repository at CERN.
- The software packages are not cached on the WN, so the Squid on the WN forwards the request to the parent cache. The parent cache does not have the objects cached either, so it sends the HTTP request to the origin server.
- About 5 minutes to download one release (600 MB), see the plot below.
Following runs: no network traffic. The installation time is dominated by the decompression of the software tarballs: about 1 minute to uncompress one release on an 8-core Xeon X5355 @ 2.66 GHz box, but it increases dramatically with the number of concurrent jobs (disk contention).
[Plot: download of one release to 3 WNs at the same time - network traffic on the parent cache and on the WN.]
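A simple way to verify this cache behaviour, as a hedged example (log locations may differ between installations), is to count hits and misses in the Squid access log:

    # On the WN-local Squid: the first run should show mostly misses...
    grep -c TCP_MISS /var/log/squid/access.log
    # ...while repeated runs of the same release should be served from cache
    grep -c TCP_HIT /var/log/squid/access.log
    # On the parent cache, requests forwarded to the origin server at CERN
    # appear as misses fetched via DIRECT (or via a further peer).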

Advantages and drawbacks
Advantages with respect to the current model:
- No need for software installation jobs site by site.
- Ease of maintenance at sites: standard web caches, no need to worry about the software area size or the per-VO quota.
- No need to mount the shared area on all WNs (the source of quite a lot of GGUS tickets at many sites).
Drawbacks:
- Unnecessary network traffic: the software packages are delivered as tar archives, of which the jobs on average need only about 10%.
- Even in the best case (the package is already cached on the WN and no network traffic is generated) there is an overhead, because the tar archives have to be uncompressed every time.
Possible solution: CVMFS, a network file system being developed at CERN in the framework of the CernVM project.

Yet another alternative model: CVMFS to access software
- CVMFS is a client-server file system developed to deliver software distributions to virtual machines.
- The software resides in the form of binaries on a repository server.
- The software repository is seen by the remote system as a read-only directory tree.
- CVMFS is implemented as a FUSE module.
- Files are accessed through the HTTP(S) protocol, avoiding firewall issues.
- Web caching reduces the network latency to a minimum.
[Diagram: the client mounts the remote repository as a read-only file system, keeping a file catalogue and a local cache; data are downloaded zlib-compressed over HTTP from the origin web server hosting the software repository.]
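Because the repository appears as a read-only directory tree and the transport is plain HTTP, both aspects can be illustrated with standard tools; in the sketch below the mount point, hostnames and URLs are hypothetical examples, not the actual setup:

    # The mounted repository looks like an ordinary read-only directory tree
    ls /cvmfs/lhcb.cern.ch/

    # The underlying transport is plain HTTP, so reachability of the origin
    # web server can be checked with standard tools, optionally through the
    # site proxy (hostnames are hypothetical)
    curl -x http://squid.site.example:3128 -I http://cvmfs-server.example.org/lhcb/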

Using CVMFS to distribute software to 'real' worker nodes at PIC
Test setup at PIC: CVMFS installed on a WN at PIC. The LHCb software repository, hosted by a CERN web server, is mounted on the WN as a read-only file system.
Preliminary results:
- First run of a test job which runs DaVinci v25r1: it takes 15 minutes. About 300 MB are transferred through the WAN and cached locally.
- The actual content of the local cache on the WN is around 700 MB (it also contains the catalogue files needed by the CVMFS client).
- Running two more DaVinci versions (v24r7, v25r2) increases the local cache by only 100 MB (many files are common).
- Second run of the same version: 35 s, no network traffic.
[Plot: incoming network traffic to the WN while running DaVinci v25r1 - about 300 kb/s during 15 minutes.]
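A hedged sketch of how the cache and traffic figures quoted above can be checked on the WN; the cache location is an assumption and depends on the CVMFS client configuration:

    # Size of the local CVMFS cache, measured before and after each run
    du -sh /var/cache/cvmfs      # hypothetical cache location

    # Bytes received on the WN network interface, read before and after a
    # run to estimate the WAN traffic of that run
    grep eth0 /proc/net/dev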

Next to do
More tests are foreseen, also in coordination with the CVMFS developers. Even though CVMFS was designed to be used in virtual machines in the framework of the CernVM project, they consider this application on real WNs interesting.
- Compare the network traffic to the total amount of data transferred (this gives an idea of the quality of the compression).
- Run N concurrent jobs which download data from the same repository, to see how the load increases on the origin web server hosting the repository.
- Set up a Squid proxy local to the site that mirrors the main web server hosting the software repository (to load-balance the traffic), and compare the network latency in the two cases.

Summary and outlook
The setup based on Squid has some advantages with respect to the current model of software distribution, but also presents some drawbacks:
- Need to install Squid on every WN.
- Unnecessary network traffic to transfer big tar archives.
- Serious overhead at the beginning of the job to uncompress the tar archives.
- Not yet decided whether to continue this testing activity.
The setup based on CVMFS has just been started and only some preliminary results are available. It looks promising and more tests are foreseen, also in collaboration with the CVMFS developers. So far no major drawback has been observed. Possible problem: the CVMFS repository has to be kept up to date; it needs to be verified whether the VO takes care of that.

Acknowledgements
- Jakob Blomer (CERN) for his support with CVMFS
- Hubert DeGaudenzi (CERN) for his support with the LHCb installation tools
Thank you for your attention! Questions? Feedback to lanciotti@pic.es