Quiz name: HUST15 Attendee Survey Date: 11/20/2015 Question with Most Correct Answers: #0 Total Questions: 15 Question with Fewest Correct Answers: #0



Similar documents
Lustre Monitoring with OpenTSDB

Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK

Chao Chen 1 Michael Lang 2 Yong Chen 1. IEEE BigData, Department of Computer Science Texas Tech University

Sun Constellation System: The Open Petascale Computing Architecture

The RWTH Compute Cluster Environment

Scaling Graphite Installations

FLOW-3D Performance Benchmark and Profiling. September 2012

Improved metrics collection and correlation for the CERN cloud storage test framework

Automating Big Data Benchmarking for Different Architectures with ALOJA

Genome Analysis in a Dynamically Scaled Hybrid Cloud

TEST AUTOMATION FRAMEWORK

MPI / ClusterTools Update and Plans

Hadoop Cluster Applications

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University

High Performance Computing Facility Specifications, Policies and Usage. Supercomputer Project. Bibliotheca Alexandrina

OLCF Best Practices. Bill Renaud OLCF User Assistance Group

Lustre & Cluster. - monitoring the whole thing Erich Focht

Lustre * Filesystem for Cloud and Hadoop *

Lynx Design System Delivering Higher Productivity and Predictability in IC Design

A High Performance Computing Scheduling and Resource Management Primer

locuz.com HPC App Portal V2.0 DATASHEET

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM

CRIBI. Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012

Monitoring Tools for Large Scale Systems

Database Services for CERN

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015

How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Microsoft HPC. V 1.0 José M. Cámara (checam@ubu.es)

Proactive Performance Monitoring Using Metric Extensions and SPA

SAS Application Performance Monitoring for UNIX

HPC Wales Skills Academy Course Catalogue 2015

Building Platform as a Service for Scientific Applications

Bright Cluster Manager

Grid Engine Basics. Table of Contents. Grid Engine Basics Version 1. (Formerly: Sun Grid Engine)

The Asterope compute cluster

Copyrighted , Address :- EH1-Infotech, SCF 69, Top Floor, Phase 3B-2, Sector 60, Mohali (Chandigarh),

Big Data Mining Services and Knowledge Discovery Applications on Clouds

whitepaper Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R

Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R

/ Cloud Computing. Recitation 5 September 29 th & October 1 st 2015

How To Monitor Infiniband Network Data From A Network On A Leaf Switch (Wired) On A Microsoft Powerbook (Wired Or Microsoft) On An Ipa (Wired/Wired) Or Ipa V2 (Wired V2)

Workshop & Chalk n Talk Catalogue Services Premier Workshop & Chalk n Talk Catalogue

Introduction to Database as a Service

SLURM Workload Manager

bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 24.

Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)

Part I Courses Syllabus

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

Deploying complex applications to Google Cloud. Olia Kerzhner

An HPC Application Deployment Model on Azure Cloud for SMEs

Monitoring and Alerting

Uncovering degraded application performance with LWM 2. Aamer Shah, Chih-Song Kuo, Lucas Theisen, Felix Wolf November 17, 2014

Parallel Processing using the LOTUS cluster

New and Improved Lustre Performance Monitoring Tool. Torben Kling Petersen, PhD Principal Engineer. Chris Bloxham Principal Architect

Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools

Bernie Velivis President, Performax Inc

Tools and strategies to monitor the ATLAS online computing farm

CA NSM System Monitoring Option for OpenVMS r3.2

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Big data blue print for cloud architecture

A Brief Introduction to Apache Tez

Self service for software development tools

Moving beyond hardware

Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab

How To Scale A Server Farm

<Insert Picture Here> Private Cloud with Fusion Middleware

vrealize Operations Manager Customization and Administration Guide

Scalable Architecture on Amazon AWS Cloud

CloudCenter Full Lifecycle Management. An application-defined approach to deploying and managing applications in any datacenter or cloud environment

Performance Testing at Scale

SCF/FEF Evaluation of Nagios and Zabbix Monitoring Systems. Ed Simmonds and Jason Harrington 7/20/2009

Load Testing at Yandex. Alexey Lavrenuke

SQL Server Solutions GETTING STARTED WITH. SQL Safe Backup

Mitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform

Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC

Sun Grid Engine, a new scheduler for EGEE

SCI Briefing: A Review of the New Hitachi Unified Storage and Hitachi NAS Platform 4000 Series. Silverton Consulting, Inc.

Interoperability between Sun Grid Engine and the Windows Compute Cluster

A web-based contact center performance analytics application that gathers information from across your customer interaction technologies and provides

Big Data Evaluator 2.1: User Guide

Cloud Utilization for Online Price Intelligence

Performance White Paper

The Complete Performance Solution for Microsoft SQL Server

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

LMT Lustre Monitoring Tools

Matlab on a Supercomputer

HPC-Nutzer Informationsaustausch. The Workload Management System LSF

Effective Java Programming. efficient software development

The GRID and the Linux Farm at the RCF

DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP LTM with Apache Tomcat and Apache HTTP Server

PHP on IBM i: What s New with Zend Server 5 for IBM i

Unifying IT How Dell Is Using BMC

Using Parallel Computing to Run Multiple Jobs

SRNWP Workshop. HP Solutions and Activities in Climate & Weather Research. Michael Riedmann European Performance Center

Cloud-pilot.doc SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A

Monitoring HTCondor with Ganglia

Current Status of FEFS for the K computer

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Transcription:

Quiz name: HUST15 Attendee Survey Date: 11/20/2015 Question with Most Correct Answers: #0 Total Questions: 15 Question with Fewest Correct Answers: #0 1. Who are you? 5/29 A Developer 1/29 B HPC User 13/29 C System Administrator 12/29 D User Support 10/29 E Manager 2. How do users manage their environment on your HPC system? 0/29 A Nothing 10/29 B TCL modules 16/29 C Lmod 0/29 D SoftEnv 2/29 E Dotkit 4/29 F Other 3. If using modules (or similar), what module naming scheme do you use? 10/29 A Flat 4/29 B Categorized 15/29 C Hierarchical 1/29 D Something else 0/29 E N/A 4. Do you automatically generate module files? 8/29 A Yes 19/29 B No, we write them by hand. 2/29 C Don't know. 5. Do you use tools to build/install scientific software? 13/29 A Hand-written scripts 12/29 B Build tool Page 1 of 8

7/29 C Binary Packages (RPM, deb, etc.) 12/29 D No. We build manually. 6. Do you collaborate with other sites on software deployment? 6/29 A Yes 9/29 B No 12/29 C Sometimes 3/29 D I don't know 7. Are you using tools to track executable/library use at your site? 7/29 A Yes 13/29 B No 6/29 C Sometimes 2/29 D I don't know. 8. Do you test software builds? 5/29 A Automatically, with each install 8/29 B Manually (always) 18/29 C Manually (sometimes) 1/29 D No testing. 0/29 E I don't know 9. Do you test the performance of your HPC software? 6/29 A Yes 5/29 B No 16/29 C Sometimes 1/29 D I don't know. 10. Do you monitor the performance of your software over time? 2/29 A Yes 16/29 B No 8/29 C Sometimes 1/29 D I don't know 11. What are the biggest user support issues you face at your HPC site? Anon a1e80 Page 2 of 8

providing many tools and versions for systems without internet access Anon b4a30 - Anon c5384 Inefficient use of job scheduler. Anon 13d81 Lack of support staff Anon 5d124 Users don't know how debug problems or what information to provide when they request help. Anon eb997 reducing the number of requests that we can automate checks for Anon 6dc9e Increasingly naive userbase which requires more and better tools to help them. Anon 4b42b understanding performance bottlenecks. Anon 1ad40 Not enough time Anon 1a831 Getting users to use the system as intended with a constantly shifting and renewing user base Anon db4cc Not enough support staff Anon 1077a Naively written job scripts, lack of fundamental understanding of HPC concepts and limitations Anon 9f591 Lack of times/staff Anon 22ff8 batch scheduling parallel file system performance Anon 54955 not enough staff Anon 4bd21 Building their custom code and running code. Anon 5b047 User failure to comply woth best practices. Anon d0729. Anon 624dd Retention of high quality staff; Process improvement Anon 93ce4 Bad Code Anon 1d9e0 Page 3 of 8

Would you please help me install xxxx? (Translation: please let me know when you're done installing this for me.) Anon 80edc software installation requests, issues with resource manager/scheduler (Torque/MOAB) Anon 3d70f staff time for installation of software Anon 8be08 * Increasing number of projects and users generates increasing and even more varying support load * Root-causing failed jobs (primarily OOM issues) * Help with getting jobs to run correctly * Software installation requests Anon 2b92d. Anon e4620 Education 12. What is the highest level of support your provide for your users? 0/29 A None 8/29 B Email 4/29 C Telephone hotline with dedicated tech support 2/29 D Telephone hotline with dedicated technical experts 19/29 E Close collaboration between support experts and application teams 2/29 F Other 13. What are the most important performance metrics for your HPC applications? Anon a1e80 time to solution io performance uptime/availability/sla Anon b4a30 - Anon c5384 Time to solution. Anon 13d81 number of users, student users Anon 5d124 Time to result for user. Anon eb997 repeatable, predictable execution times Anon 6dc9e. Anon 4b42b Page 4 of 8

run time, queue wait. Anon 1ad40 Utilization, time to finish. Anon 1a831 Queuing times Annual user surveys Anon db4cc None Anon 1077a Depends on the application Anon 9f591 None Anon 22ff8 node utilization Anon 54955 happy end users Anon 4bd21 Efficiency Anon 5b047 cpu usage, ib traffic Anon d0729 I/O related Anon 624dd Time to solution; Memory usage; I/O Anon 93ce4 Science Output Anon 1d9e0 Scalability understood in the broadest sense -- e.g. good citizenship with respect to Lustre when running at scale. Anon 80edc runtime, memory usage, scalability Anon 3d70f memory usage, cpu utilization Anon 8be08 * mpi library time * memory (max, min, avarage etc) * i/o load (max, min, avarage etc) * cpu usage (max, min, avarage etc) Anon 2b92d mem/core usage Anon e4620 Memory usage Page 5 of 8

14. Could your user support issues be solved by better automation? If so, what kind of tools would you use? Anon a1e80 n/a Anon c5384 I don't think so, unfortunately... Anon 13d81 yes, build tools, modules Anon 5d124 Yes. Very interested in many of the tools presented at HUST'14 and HUST'15. Anon eb997 automated environment checks Anon 6dc9e Would help, both for admin and users. Looking at build tools & user file mgmt tools. Anon 4b42b certainly. auto profiling and user notification. Anon 1ad40 Probably. XALT, EasyBuild, etc. Anon 1a831 Most likely. Tools which automatically present performance issues and other problems directly to the user as well as provides systemwide views for the administrator. The frontend should be both CLI based and a web GUI that has powerful capabilities to do ad hoc data exploration. Anon db4cc Some could. Anon 1077a Some could be automated. Run-time monitoring tools would help. Anon 9f591 No Anon 22ff8 some could be solved, automated monitoring of i/o may help Anon 54955 sw package expertise and many of the tools in the morning session were super great!!! Anon 4bd21 Some. Anon 5b047 no Anon 624dd We use a locally-developed tool tailored to our needs Anon 93ce4 performance detection Anon 1d9e0 Tools that detect, prevent, correct common mistakes. Anon 80edc Page 6 of 8

YES, and we do that already (EasyBuild for software installation) Anon 3d70f yes, and will be. easybuild, spack, and more extensive monitoring for problem identification. Anon 8be08 Not solved, but helped by: * Per-job resource utilization reports * Automatic OOM events reports * Automatic feedback from the scheduler reg. expected queue time, as well as some submit-time sanity checks Anon 2b92d Definitely. Would like to make use of better job analysis tools as described in the workshop. Anon e4620 Yes, job monitoring would help 15. What steps could be taken to build wider collaboration among HPC sites? Anon a1e80 n/a Anon c5384 A robust and standardized installation procedure, which removes "customization" for path, environmental vars etc from documentations and others. Anon 13d81 normalize support tool stack Anon 5d124 The HUST workshop has been a great start, provides a venue for tools work. Discussing the need with program managers/funding agencies, will also help community get resources to develop these tools. Ensuring that tools are flexible so that the can be adapted to the ecentricities at each site is also important. Anon eb997 continued venues like hust Anon 6dc9e HUST is a great start. Is there something similar for cluster sysadmin? Only see occasional BOFs. Is operations mgmt within HUST's scope? Anon 4b42b it would help if sites took a look at whats out there before creating new stuff. i found it odd that with so many performance monitoring tools out there (collectl, collectd, ganglia, etc. ) that Tacc decided to start from scratch with tacc stats. wouldn't it make more sense to work within the community? Anon 1ad40 Not sure. Anon 1a831 Contributing recipes / configuration management playbooks into a central repository. The OpenHPC initiative's repository could perhaps provide a good platform for this. Anon db4cc Workshops, mailing list, web site Anon 1077a I don't know. Page 7 of 8

Anon 9f591 More events like HUST Anon 22ff8 hust is a great start awareneas and availability of automated tools in opensource community Anon 54955 I would love to help establish a formal HUST user group!! Count me in...skalwani@yahoo.com Anon 4bd21 Open source tools with good documentation Anon 5b047 na Anon 624dd More workshops for HPC support staff; mailing lists targetted at support staff Anon 93ce4 Sites actually using open tools, and not re-writing tools every time.. For example: taccstats, at the time it was written multiple tools were available to gather metrics, collectd, ganglia, nwperf, etc. and to visualize, cview, ganglia, and others. But it seems they decided to start all over, which makes just one more tool to look at and decide to use. if they had added plug-ins to collectd say, then many people would have access to them, they would have a base of open source developers looking at code, and all the visualization and analytics codes would be able to directly use the data. Anon 1d9e0 Frankly, more liberal ops/support budgets. Collaboration is rewarding and valuable. But those alligators need attention now! Anon 80edc OpenHPC is a good step, workshops like HUST, BoF sessions like Getting Scientific Software Installed a common mailing list specific to HPC user support!!! Anon 3d70f just evangelism Anon 8be08 This workshop helps. Anon 2b92d unfortunately we're a much smaller site with limited staff - this leave little time for spending time on being involved in the collaborations that we'd like. More funding would be great - of course - but we're working towards being even more involved. Anon e4620 Don't know Page 8 of 8