Automatic Script Generation Based on User-System Interactions



Automatic Script Generation Based on User-System Interactions
M. Golik, M. Bubak, and T. Szepieniec, ACC Cyfronet AGH
CGW'14, Kraków, 28.10.2014

Background

A user wants to conduct a simulation: opens a terminal, stays in the local shell or connects via SSH to a cluster, sets up the environment, executes all needed commands (probably multiple times with different parameters), saves files, exits. Later the user wants to conduct the same experiment again, so he/she has to repeat all commands from memory, or write a script, or find a script on the Internet or elsewhere and then make the necessary changes, or...?

Creating the script

Approaches: start from scratch, or use Bash history as a base? Tools/helpers: templates, Bash history, guides, lists of needed commands. Things to consider: language (Bash, Python), program arguments and variables, validation, environment, special requirements (PBS directives, different filesystems, ...).

Why not Bash history?

pwdd                        # misspelled
pwd                         # not important
cd experiment               # does not change data
ls                          # not important
experiment.exe input.txt    # ok
experiment.exe input.txt    # again?
echo 0 > input.txt          # ok
experiment.exe input.txt    # input changed?
experiment.exe input.txt 2  # different argument?

Problems: repetitions, constants, misspelled commands, no validation.
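The limitation above can be sketched in a few lines. This is a hypothetical illustration (command names mirror the slide): naive history cleanup can drop exact consecutive repeats, but it cannot detect typos or tell a redundant re-run from an intentional one, because plain history carries no data-flow information.

```python
# Hypothetical sketch: why raw Bash history is a poor base for a script.
history = [
    "pwdd",                        # misspelled
    "pwd",
    "cd experiment",
    "experiment.exe input.txt",
    "experiment.exe input.txt",    # exact repeat -- redundant or intentional?
]

def naive_cleanup(cmds):
    """Drop consecutive duplicates; everything else must stay,
    since plain text gives no way to validate or interpret commands."""
    out = []
    for c in cmds:
        if not out or out[-1] != c:
            out.append(c)
    return out

print(naive_cleanup(history))
# The typo "pwdd" survives, and the dropped repeat may have been deliberate.
```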

Proposed solution

Track users' actions (executed programs and opened files) and, using the connections between them, automatically create optimized scripts. Transparently track users' interactions with the system, provide analytic tools, be simple for beginners but provide advanced features for demanding users, apply special validation or analysis (PBS directives, I/O characteristics, ...). Targeted audience: users without programming knowledge. Users who can create scripts by themselves can still take advantage of the analytical parts of this tool or use its output as a base for improvements.

Concept

The concept is based on the idea of a data flow: a logical chain of files and processes connected by data reads and writes. The flow starts at an input file and goes through intermediate programs and files, finally ending at the final output file(s). Following this chain backwards recreates the workflow executed by the user. It can also be used for automatic parallelization!
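The backward walk over the chain can be sketched as follows. This is an illustrative simplification (files and processes share one mapping, and each artifact is assumed to have a single producer); the names come from the Turbomole example shown later in the talk.

```python
# Sketch of the data-flow idea: files and processes form a chain linked by
# reads and writes; walking it backwards from the chosen output file
# recreates the workflow. Single-producer assumption for simplicity.
produced_by = {
    "coord": "x2t",          # file "coord" was written by process "x2t"
    "x2t": "water.xyz",      # process "x2t" read file "water.xyz"
    "ridft.log": "ridft",    # file "ridft.log" was written by "ridft"
    "ridft": "coord",        # process "ridft" read file "coord"
}

def backtrace(target):
    """Follow the chain backwards from the target output file."""
    chain = [target]
    while chain[-1] in produced_by:
        chain.append(produced_by[chain[-1]])
    return list(reversed(chain))

print(backtrace("ridft.log"))
# ['water.xyz', 'x2t', 'coord', 'ridft', 'ridft.log']
```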

Architecture of the Proof-of-Concept implementation: modules

Tracer (data gathering), Parser (parsing and filtering), Analyser (text analysis), Visualizer (graphical analysis), Generator (script generation).

Tracer

Transparent! Saves shell history, environment, system calls and additional metadata. Uses standard system packages: GNU Bash and strace. Sits between shell and system... not between shell and user.
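One plausible way such a tracer could sit between shell and system is to wrap a Bash session in strace. The snippet below only builds a candidate command line; the flags are illustrative standard strace options (-f follows child processes, -o writes the raw trace to a file), not necessarily the ones the actual tool uses.

```python
# Hypothetical invocation sketch: wrap a Bash session in strace so every
# syscall of the session (and its children) lands in a trace file.
import shlex

def tracer_cmd(logfile="session.trace"):
    # -f: follow forks; -e trace=...: restrict to the syscalls of interest;
    # -o: write the raw trace to logfile; bash: the traced interactive shell.
    return shlex.split(
        f"strace -f -e trace=open,read,write,close,execve -o {logfile} bash"
    )

print(tracer_cmd())
```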

Example tracing session 9

Parser

Parses the files generated by the Tracer, generates an Internal Representation (IR) in an open, standard, text format, provides simple statistics, and can be used on a different machine than the Tracer.

lines_read : 4284
total_syscalls : 4179
% syscalls : 97.54
handled_calls : 4179
unhandled_calls : 0
% handled : 100.0
==============================
2446 : write
567 : open
539 : read
466 : close
33 : fcntl
29 : dup2
28 : clone
8 : pipe
8 : execve
7 : socket
6 : unlink
1 : chdir
1 : dup
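Statistics like the ones above could be derived by tallying syscall names from the raw trace. This is a sketch against the common strace line format (`name(args) = result`); real strace output has more variants (signals, unfinished/resumed calls, PID prefixes with -f) that a production parser would have to handle.

```python
# Sketch: count syscall names from strace-style lines such as
#   open("input.txt", O_RDONLY) = 3
import re
from collections import Counter

SYSCALL = re.compile(r"^(\w+)\(")

def count_syscalls(lines):
    counts = Counter()
    for line in lines:
        m = SYSCALL.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    'open("input.txt", O_RDONLY) = 3',
    'read(3, "0\\n", 4096) = 2',
    'write(1, "0\\n", 2) = 2',
    'close(3) = 0',
    'open("out.txt", O_WRONLY|O_CREAT, 0644) = 3',
]
print(dict(count_syscalls(sample)))
```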

Analyser and Visualizer

Use the IR for textual and graphical presentation of the gathered data; do not modify the IR. Can be used to present data about: file and process relations, read and write operations, missing files, processes executed explicitly and implicitly, I/O characteristics, and more... Can also be used for debugging.

Analyser text analysis example

[32276] ridft [r: 260499 B (in 44 op), w: 6555 B (in 584 op)]...
ridft.log [bytes: 5324; op: 389; flags: O_WRONLY O_CREAT O_TRUNC]
control [bytes: 24078; op: 845; flags: O_RDWR O_CREAT]
coord [bytes: 434; op: 2; flags: O_RDWR O_CREAT]

Visualizer graphical analysis example

Script generator

Can create three types of scripts: simple executable scripts; scripts with PBS directives (based on the gathered data); scripts that create other scripts, e.g. a script that first asks about input parameters and then creates multiple executable scripts with different parameters and proper PBS directives. Can create scripts in any language (currently Bash only). A parameter sweep should be executed using the best available mechanism, like multiple executables or array jobs, etc. In addition to recreating the user's workflow, it can add standard checks like checking for the presence of an input file or argument type validation. Can perform automatic parallelization!
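The core of such a generator can be sketched as a template over one traced command and its input/output files. This is a hypothetical minimal version mirroring the generated scripts shown on the following slides (input-existence check, exit-status check); the real tool derives these from the IR.

```python
# Hypothetical generator sketch: emit a Bash script for one traced command,
# adding input validation and an exit-status check around it.
def generate(cmd, infile, outfile):
    return "\n".join([
        "#!/bin/bash",
        f"if [[ ! -f {infile} ]]; then "
        f"echo 'File [{infile}] not found!'; exit 1; fi",
        f"{cmd} {infile} > {outfile} 2>&1",
        f"if [[ $? -ne 0 ]]; then "
        f"echo 'Command {cmd} exited with error'; exit 2; fi",
    ])

print(generate("x2t", "water.xyz", "coord"))
```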

Script generator example

[0] /mnt/auto/people/plgmgolik/turbomole/ridft.log
[1] /mnt/auto/people/plgmgolik/turbomole/control
...
Please provide number of targeted file [7]: 0
Generating backtrace from file [0] /mnt/auto/people/plgmgolik/turbomole/ridft.log
--- SCRIPT START ---
#!/bin/bash
ridft > ridft.log 2>&1
if [[ $? -ne 0 ]]; then echo 'Command ridft exited with error'; exit 2; fi
--- SCRIPT END ---

Script with additional validation

#!/bin/bash
if [[ ! -f water.xyz ]]; then echo 'File [water.xyz] not found!'; exit 1; fi
x2t water.xyz >coord 2>preparation.txt
if [[ $? -ne 0 ]]; then echo 'Command x2t exited with error'; exit 2; fi
...

Automatic parallelization

F1, F2, F3, F4, F5 (circles): files; P1, P2, P3 (rectangles): processes. The F1->P1->F2 data flow can run in parallel to F3->P2->F4. P3 has to wait for F2 and F4, i.e. P1 and P2 have to exit (successfully).
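The scheduling rule can be sketched directly from the read/write relations, assuming the labels F1..F5 and P1..P3 from the diagram: a process may start once every file it reads exists, and independent chains run concurrently.

```python
# Sketch of the parallelization rule: group processes into "waves" that
# can execute in parallel, given which files each process reads and writes.
reads  = {"P1": {"F1"}, "P2": {"F3"}, "P3": {"F2", "F4"}}
writes = {"P1": {"F2"}, "P2": {"F4"}, "P3": {"F5"}}

def waves(initial_files):
    """Repeatedly start every process whose inputs already exist."""
    have, pending, out = set(initial_files), set(reads), []
    while pending:
        ready = {p for p in pending if reads[p] <= have}
        if not ready:          # unsatisfiable dependencies: stop
            break
        out.append(sorted(ready))
        for p in ready:
            have |= writes[p]  # finished processes make their outputs exist
        pending -= ready
    return out

print(waves({"F1", "F3"}))
# [['P1', 'P2'], ['P3']] -- P1 and P2 in parallel, then P3
```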

Embracing the social aspect

The idea is similar to the highly popular Docker platform: allow users to create common-usage scripts. Users can learn new software by looking up its common-usage script in a database, and everyone can fork a script and share an improved version. It could also possibly be used for sharing input data and results...

Development focus

Finalizing and releasing the specification of the IR format and protocols, better heuristics, graphical tools for data-flow manipulation and parameter matching, a social platform and/or additional simple utilities for data sharing, etc., a non-Bash version, system-wide tracing?

Summary

The designed tools can aid users in writing scripts, additionally providing useful routines like input validation and automatic parallelization. They can help understand what a program is doing ("strace on steroids"), help optimize I/O usage, help use the PBS system comfortably, and apply special site-specific optimizations.

SVN: mg.grid.cyf-kr.edu.pl/svn/monkey
email: m.golik@cyfronet.pl