Remote Solve Manager (RSM)
ANSYS Release 12.1
November 2009

ANSYS, Inc.
Southpointe
275 Technology Drive
Canonsburg, PA 15317
ansysinfo@ansys.com
http://www.ansys.com
(T) 724-746-3304
(F) 724-514-9494
ANSYS, Inc. is certified to ISO 9001:2008.
Copyright and Trademark Information

© 2009 SAS IP, Inc. All rights reserved. Unauthorized use, distribution or duplication is prohibited.

ANSYS, ANSYS Workbench, Ansoft, AUTODYN, EKM, Engineering Knowledge Manager, CFX, FLUENT, HFSS and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries. ICEM CFD is a trademark used by ANSYS, Inc. under license. CFX is a trademark of Sony Corporation in Japan. All other brand, product, service and feature names or trademarks are the property of their respective owners.

Disclaimer Notice

THIS ANSYS SOFTWARE PRODUCT AND PROGRAM DOCUMENTATION INCLUDE TRADE SECRETS AND ARE CONFIDENTIAL AND PROPRIETARY PRODUCTS OF ANSYS, INC., ITS SUBSIDIARIES, OR LICENSORS. The software products and documentation are furnished by ANSYS, Inc., its subsidiaries, or affiliates under a software license agreement that contains provisions concerning non-disclosure, copying, length and nature of use, compliance with exporting laws, warranties, disclaimers, limitations of liability, and remedies, and other provisions. The software products and documentation may be used, disclosed, transferred, or copied only in accordance with the terms and conditions of that software license agreement.

ANSYS, Inc. is certified to ISO 9001:2008.

U.S. Government Rights

For U.S. Government users, except as specifically granted by the ANSYS, Inc. software license agreement, the use, duplication, or disclosure by the United States Government is subject to restrictions stated in the ANSYS, Inc. software license agreement and FAR 12.212 (for non-DOD licenses).

Third-Party Software

See the legal information in the product help files for the complete Legal Notice for ANSYS proprietary software and third-party software. If you are unable to access the Legal Notice, please contact ANSYS, Inc.

Published in the U.S.A.
Table of Contents

The Remote Solve Manager (RSM) ... 1
  Setup ... 1
    Windows ... 2
      Installing RSM Services ... 2
      Uninstalling RSM Services ... 2
  Quick Start ... 2
  Quick Configuration ... 4
  User Accounts and Passwords ... 5
    Alternate Accounts ... 5
    Manually Running the Password Application ... 5
    UNIX/Linux Accounts When Using RSH/SSH ... 6
  Terminology ... 6
  Capabilities ... 7
  File Handling ... 7
  Cancelling Jobs on Vista ... 8
  User Interface ... 9
    Menu Bar ... 10
    Tree View ... 11
    List View ... 13
    Progress Pane ... 14
    Tray Context Menu ... 14
    Accounts Dialog ... 15
  Administration ... 15
  Integration with Unix/Linux using RSH/RCP ... 19
  Integration with Unix/Linux using SSH/SCP ... 21
  Additional Unix/Linux Considerations ... 22
  Integration with Platform LSF Cluster on Windows ... 23
  Integration with LSF and PBS Clusters on Unix/Linux ... 25
  Integration with Microsoft Compute Cluster ... 28
Customization Guide ... 29
  Customizing an Existing Job ... 29
  RSM Integration ... 30
  Manually Submitting Jobs ... 30
Index ... 33
The Remote Solve Manager (RSM)

Introduction

Remote Solve Manager is a job queuing system. Any computer with RSM installed can act as the Client, Solve Manager or Compute Server. These different functions are often referred to as the tiers of the software architecture. Further, a computer with RSM installed may act simultaneously as two or all three of the tiers.

The Client is the computer that submits the job to the Solve Manager. The Client machine has the end-user application installed, such as ANSYS Workbench.

The Solve Manager is configured with Compute Servers and Queues. Jobs are submitted to a Queue and the Solve Manager is responsible for dispatching the job to an idle Compute Server. The RSM administrator decides whether users should use the Solve Manager on their local machine or a central Solve Manager. This will depend on the number of users and compute resources. The most effective use of RSM is to designate one computer as the Solve Manager for central management of compute resources. All clients then submit jobs to one or more Queues configured for that Solve Manager and their jobs are dispatched as compute resources become available.

The Compute Server is the computer on which the Job is run. A Compute Server may be the local machine (localhost) or a remote Windows computer. Compute Server Proxy settings are used to utilize Unix/Linux computers and integrate with third party queuing systems (for example, LSF, PBS, Microsoft Compute Cluster). Once Compute Servers are configured, they are added to a Queue. A Queue can contain multiple Compute Servers. Jobs must specify a Queue when they are submitted to the Solve Manager.

RSM is really the underlying services that dispatch and run jobs. Included with RSM is an administration and status application that allows you to view and manage local and remote Solve Managers. This application is not required for RSM to function; however, for the purposes of this documentation we may refer to this application and RSM as being one and the same. This application may also be referred to as the RSM user interface.

The RSM user interface is a single-instance application that, by default, loads in the Windows system tray. Double-clicking on the tray icon will open the application main window. With the RSM user interface you can filter jobs by status, manage queues and servers of local and remote Solve Managers, monitor the progress of jobs, and delete jobs. In addition, the Desktop Alert is a small popup dialog that appears in the bottom right portion of the screen near the system tray. It provides limited status information without opening the main window. Clicking on the desktop alert dialog will also open the main window.

Setup

When delivered with ANSYS Workbench products, RSM is automatically installed during the setup. RSM will be automatically removed when Workbench is uninstalled. Administrator privilege is not required to install RSM on client machines.

You can install RSM by itself only if ANSYS Workbench products are not being installed. This would typically be done when a machine is acting solely as an RSM Manager and/or Compute Server. In this case, the RSM services are required for connectivity with remote machines.
To install RSM and its Windows services, locate the RSM.TGZ file on the distribution media. This is a tar/gzip formatted file for use on Windows or Linux computers. Extract the files (WinZip can be used on Windows) to a desired location, maintaining the directory structure.

Windows

Topics related to RSM Windows setup are as follows:

Installing RSM Services
Uninstalling RSM Services

Installing RSM Services

RSM Services are required if the computer will be accessed by remote Clients (Manager Service required) or a remote Manager (Compute Server service required [aka ScriptHost]). Use the following steps to install the Manager and Compute Server services as required. Administrative privileges are required to perform these steps.

1. Be sure no Ans.Rsm.* processes are running in Task Manager.
2. Locate Ans.Deployment.Config.RSM.exe in the [RSMInstallDir]/bin directory.
3. Open a command prompt in this directory.
4. Run the command Ans.Deployment.Config.RSM.exe -mgr -svr.

Uninstalling RSM Services

To uninstall, run Ans.Deployment.Unconfig.RSM.exe. If services have been installed by an administrator, an administrator must uninstall them in order for the services to be unregistered. Once services have been unregistered, the RSM installation directory can simply be deleted.

Quick Start

Overview

To use RSM, you must first understand the following primary functions:

Manager - The Manager is the central RSM service that contains a configuration of Queues and Compute Servers. Its main function is to dispatch Jobs to an available computing resource.

Compute Server - A Compute Server is the machine on which jobs are run. In most cases, the Compute Server refers to a remote machine, but it can also refer to your local machine (localhost). Compute Servers are created using the RSM user interface.

Queue - A Queue is a list of Compute Servers available to run Jobs. When a job is sent to a Queue, the Manager selects an idle Compute Server in the list. Queues are created using the RSM user interface.

Job - A Job is defined by a Job Template that specifies input and output files and the Job Script that should be executed on the Compute Server. The Job Script is written specifically to run an application on the Compute Server, such as the ANSYS Solvers. Jobs are submitted from the client application (e.g. ANSYS Workbench).

RSM is typically used in the following ways:

- The Client machine submits jobs through RSM running locally on the same machine, directly to itself. Here, the Client, Manager, and the Compute Server are all on the local machine.
- The Client machine submits jobs to the Manager running locally on the Client machine, which then may assign a remote Compute Server to run the Job.
- The Client machine submits jobs to a Manager running on a remote machine which is also acting as the Compute Server.
- The Client machine submits jobs to a Manager running on a remote machine, which then assigns the Job to a remote Compute Server. In this case, 3 or more separate machines are involved.
Quick Configuration

Local Background Use

If RSM is to be used solely for local background jobs, it is not necessary to cache your password with the Manager. Additional configuration may be required to utilize remote compute resources.

Working with a Remote Manager

The RSM client application can be used to monitor and configure multiple RSM Managers. To add a remote RSM Manager:

1. Open the RSM user interface: if the RSM user interface is already running, double-click on the tray icon to open the main window. If not, you can also use the Start menu to launch the RSM client application.
2. In the RSM window select Tools > Options.
3. In the name field, enter the name of the remote Manager machine. This machine must have the RSM Manager service installed.
4. Press the Add button and then OK. The remote Manager and all of its Queues and Compute Servers appear in the tree view.
5. Passwords are cached on the RSM Manager machine, so the password will have to be set again. Right-click on the remote Manager machine in the tree view and select Set Password as described below.

To use a remote Manager, a user must cache their Windows account password with the Manager so that processes can be run on their behalf on remote machines. The following procedure describes how to do this.

1. In the RSM user interface, right-click on the Manager and select Set Password... In a Windows environment, this will open a command window. In a Linux environment, the password application will run in the terminal where the RSM user interface was started. If the user interface was not started in a terminal, the password application can be run manually (see Manually Running the Password Application (p. 5)).
2. Press Enter at the username prompt to set the password for the currently logged-in user.
3. Type the password and confirm.
4. Press any key to exit the command window.

To utilize compute resources that are remote to the Manager machine, you will have to create a Compute Server and a Queue.

Create a Compute Server: Right-click on the Servers folder and select Add. The property dialog for the new Compute Server will open. If the Compute Server is running Windows, only the machine name in the top section is required. If the Compute Server involves integration with a Linux/UNIX machine or another Job scheduler, see Integration with Unix/Linux using SSH/SCP (p. 21) or Integration with Unix/Linux using RSH/RCP (p. 19) for integration details.

Note: Once created, be sure to test the Compute Server configuration. Select the new server, right-click and select Test Server. Use the log window to validate the test.
Create a Queue: Right-click on the Queues folder and select Add. When the new Queue property dialog appears, provide an informative name and select the Compute Server created in the previous step.

Now RSM is ready for Jobs to be submitted from the Client application. The Client application user will need to know the Manager machine name and the Queue name. The Client application will have some mechanism to allow the user to specify the RSM Manager machine and Queue. For example, ANSYS Workbench Mechanical has Solve Process Settings for this purpose.

User Accounts and Passwords

Unless using RSM solely on the client computer, one of the first things a user should do is cache their Windows account password with the local (My Computer) and/or a remote RSM Manager. RSM caches Windows account passwords so that it can run Jobs on a Compute Server on behalf of the user. The passwords are encrypted and stored by the Manager. Each Manager has a list of registered Windows accounts. Anytime a password changes, it must be updated in RSM. See the Tree View (p. 11) section under User Interface (p. 9) for details about setting account passwords.

Alternate Accounts

Alternate accounts are a new feature for RSM 12.1. Alternate accounts allow the client user to run jobs under an alternate account on a specific RSM Compute Server. The primary or owner account is considered to be the account under which the user is communicating with RSM. This is generally the account used when using the client application (e.g. Workbench). Alternate accounts are then added to the owner account. In the example below, ANSYS\johnd is the owner account and johndoe is an alternate account to be used on the Compute Server(s) specified in the Accounts Dialog.

Setting password for primary (default), alternate or new alternate account.
Existing alternate accounts: johndoe
Enter user name (ANSYS\johnd): johndoe
Enter password for ANSYS\johnd: ********
Re-enter password: ********
Password set for johndoe: Your password has been encrypted and stored. It can only be decrypted and used to run jobs on behalf of ANSYS\johnd.

Manually Running the Password Application

It is normally not necessary to run the password application manually. The scenario where this might be necessary is on Linux when the terminal used to start the RSM user interface is not available. The password application will run in the original terminal used to start the RSM interface, so if that terminal was later closed, the output and prompts will not be visible. A simpler solution may be to stop and restart the RSM interface, but these instructions are included in the event that a general solution is desired.

The password application is named Ans.Rsm.Password.exe and is located in the [RSMInstall]\bin directory. On Windows, the application can be run directly. On Linux, the application must be run using the Mono runtime (installed with the Workbench Framework). An example script named rsmpassword is located in [RSMInstall]/Config/tools/linux. This script must be edited to specify the Workbench Framework path and platform.
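For example, the following non-interactive invocation caches a password for the alternate account johndoe owned by ANSYS\johnd (the Manager machine name "mymanager" and the password are illustrative; the flags are those documented in the usage below):

Ans.Rsm.Password.exe -m mymanager -a johndoe -o ANSYS\johnd -p MyPassword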
Running the password application with no arguments shows the usage as follows:

Usage: Ans.Rsm.Password.exe [-m manager][-a account][-o owner][-p password]
  -m manager: RSM Manager machine (default = localhost).
  -a account: Target account. If no -o owner, this is a primary account.
  -o owner: Account owner. Sets the password for an alternate account specified with -a.
  -p password: Account password.
  -? or -h: Show usage.
NOTES: If no -a or -p, this is normal interactive mode.
Accounts can be entered as username or DOMAIN\username.

UNIX/Linux Accounts When Using RSH/SSH

If the Windows and UNIX account names are the same (e.g. DOMAIN\johnd on Windows and johnd on UNIX), then no additional configuration is required. If the account name is different, then an account can be specified in the Compute Server properties proxy section. Client applications may also have a mechanism to specify an alternate account name. For example, in the Workbench Solve Process Settings Advanced dialog a UNIX account can be set. Remember that RSH/SSH must be configured for password-less access (see Integration with LSF and PBS Clusters on Unix/Linux (p. 25)). RSM does not store UNIX passwords for use with RSH/SSH.

Terminology

Client: The Client computer submits the Job to the Solve Manager via an end-user application such as ANSYS Workbench. The Client must specify a Queue.

Manager: This computer has Queues and Compute Servers configured. The Manager assigns Jobs to idle Compute Servers.

Compute Server: The Compute Server runs the Job Script. The Compute Server can be a Windows-based computer or a non-Windows operating system equipped with Mono [http://www.mono-project.com/] (as of version 12.1). The Job Script will perform some task (e.g. run a finite element solver). If the Job Script will be running an application, the application must be installed on the Compute Server.

Queue: A list of one or more Compute Servers. A Job specifies a Queue when submitted to the Manager. The Manager dispatches queued jobs to available Compute Servers.

Job: A Job is defined by input and output files and a Job Script to run that will perform some task. Jobs can be submitted to a local (My Computer) or remote Manager. An RSM component loaded into the client application (e.g. ANSYS Workbench) handles the transfer of input/output files to/from the Manager.

Job Template: An XML file that defines the Job: inputs, outputs and Job Script. Job Templates are located in [RSMInstallDirectory]\Config\xml on the Manager machine.

Job Script: One or more code files defined in a Code Template. The Job Script file(s) is sent to the Compute Server, compiled and run.

Code Template: An XML file containing code files (for example, C#, VB, JScript), references and support files required for the Job. Code Templates are located in [RSMInstallDirectory]\Config\xml on the Manager machine.
Compute Proxy: A Compute Proxy is defined in the Compute Server properties dialog. Proxy settings are used by Job Scripts to integrate with Unix/Linux machines (when the native RSM service is not available) and third party job schedulers (for example, LSF, PBS, Microsoft Compute Cluster).

Native Linux: Prior to version 12.1, RSM could only be installed on Windows-based computers. As of version 12.1, RSM can be installed and run on Linux-based computers where Mono is supported. With RSM services running natively on Linux, SSH/RSH are no longer required for Windows-Linux connectivity.

Mono: Mono, the open source development platform based on the .NET Framework, allows developers to build Linux and cross-platform applications with improved developer productivity. Mono's .NET implementation is based on the ECMA standards for C# and the Common Language Infrastructure. See http://www.mono-project.com/.

Capabilities

RSM supports Windows XP and Windows Server 2003 and later versions (32-bit and x64). As of version 12.1, RSM also natively supports non-Windows operating systems equipped with Mono (an open source equivalent to Microsoft .NET). Natively means that RSM is installed and runs directly on that platform. Windows and Linux installations can then freely interoperate without additional configuration (e.g. RSH or SSH).

RSM supports non-Windows operating systems via the Compute Proxy settings of the Compute Server and customized Job Scripts. Built-in Job Scripts for the ANSYS Solvers utilize the RSH and SSH network communication protocols. Support for other protocols is possible through customization of Job Scripts (see Customization Guide (p. 29)).

Third party job schedulers such as LSF, PBS, and Microsoft Compute Cluster (MSCC) are also supported using Compute Proxy settings and custom Job Scripts. Built-in Job Scripts for the ANSYS Solvers integrate with LSF (Windows and Linux), PBS (Linux only) and MSCC in this way. Additional job schedulers can be integrated via customization of Job Scripts (see Customization Guide (p. 29)).

File Handling

Input files are generally transferred from the Client working directory, to the Manager project directory, to the Compute Server working directory where the Job is run. Output files generated by the Job are immediately transferred back to the Manager's project storage when the Job finishes. The files are stored there until the Client application requests to download the output files. The following topics provide more details about how RSM handles files.

Client

The location of files on the client machine is controlled by the client application (e.g. Workbench). When the client submits a job to RSM it specifies a directory where inputs can be found and where output files should be placed. Refer to the client application documentation to determine where input files are placed when submitting jobs to RSM. Input files are copied to the Manager machine (see below) immediately when the job is submitted.

Manager

If the client is using the Manager on the same computer (e.g. "My Computer"), the Manager will simply use the client's file location as an optimization. Otherwise, when a job is submitted the Manager creates a Project directory that will contain all the inputs and outputs for the job. The base project directory location is controlled with the Manager properties dialog (see Solve Manager Properties (p. 15)). All job files are stored in
this location until the client releases the job. The job can also be deleted manually using the RSM user interface.

Compute Server

If the Compute Server is the same machine as the Manager (e.g. the machine name is "localhost"), then, as an optimization, the job will run directly in the Manager's project directory. Otherwise, the Compute Server will create a temporary directory in the location defined by the Working Directory property (see Creating a Compute Server (p. 16)). When this property is blank, the system TMP variable is used. When the job is done, output files are immediately copied back to the Manager project directory and the temporary directory is deleted.

Compute Proxies (Unix Integration)

If the RSH/SSH mechanism is used to utilize a remote Unix compute resource, the Working Directory property in the lower portion of the Compute Server properties dialog (see Creating a Compute Server (p. 16)) controls where files are located. If this field is empty, the account's home directory is used as the base location. In either case a unique temporary directory is created.

Third Party Schedulers

When using the RSM job scripts that integrate with a third party scheduler (e.g. LSF, PBS, MSCC), the above file handling rules apply to the extent that RSM is involved. When the third party scheduler runs its job it may be necessary to copy files to/from the cluster node that was selected by that scheduler.

Network File Systems

Examples of network file systems include Windows shared folders or directories on Unix computers shared to PCs with Samba. These shared folders can be used to minimize file transfers. RSM places marker files in the client, manager, and compute server directories to uniquely identify the Job. If the Manager is able to find (by recursively searching subfolders) the Client's marker in the Project storage area, it will use that folder rather than copying the files to a separate folder. Similarly, if the Compute Server finds (by recursively searching subfolders) the Manager's marker, it will also use that location rather than copying files unnecessarily. Remember that while this leverages operating system level drivers optimized for network file manipulation, files are still located on remote hard drives, so there will still be significant network traffic. Each customer will have to determine the RSM configuration that best utilizes network resources.

Cancelling Jobs on Vista

Job cancellation is not currently fully functional when running RSM with Windows Vista. This problem only occurs when RSM is used locally and no RSM services are installed (non-admin mode). In this situation, processes continue running even though the RSM Job has been cancelled. In the case of an ANSYS solver job, for example, the top level RSM process (Ans.Rsm.TaskHost.exe) starts the solver launcher (e.g. ansys121.exe), which then starts one or more ansys.exe processes. These ansys.exe processes may continue to run after a job is cancelled.

This problem occurs because RSM uses the Windows Job Object to wrap all processes started during the course of a job. When the job is cancelled, the Job Object is terminated (Windows API call TerminateJobObject), causing the RSM Job and all child processes to be terminated. With Windows Vista, the Windows API call AssignProcessToJobObject fails to add the top level RSM job process (Ans.Rsm.TaskHost.exe) to the Windows Job Object due to insufficient permissions.
Workaround

ANSYS recommends that you install the Compute Server service (Ans.Rsm.ScriptHostService.exe), which runs under the System account and has sufficient permissions (see Windows (p. 2)). A potential drawback of this workaround is that your Windows password will have to be stored with the RSM manager (see User Accounts and Passwords (p. 5)).

User Interface

This section describes the functional elements of the RSM user interface (client application). The functional elements of the main window include the following:

Menu Bar: Provides selectable menu options.

Toolbar:
- Show drop-down list - filters the jobs that will display. Choices are: All Jobs, Completed, Running, Queued, and Failed.
- Delete button - deletes a currently selected job or jobs. The function is the same as using the Remove option of the List View context menu or the Delete key.
- All Owner Jobs button - displays or hides jobs that belong to owners other than yourself. The function is the same as using View > All Owner Jobs in the Menu Bar.
- Progress Pane button - displays or hides the progress pane. The function is the same as using View > Progress Pane in the Menu Bar.

Tree View: Shows configured queues and servers for one or more Solve Managers.

List View: Displays current jobs. You may delete jobs from this area.

Progress Pane: Displays the job progress and log messages.

Status Bar: Located at the bottom of the interface window, it contains two panes: the Information pane shows the status of the application; the Job count pane displays the number of jobs displayed in the list view.

Tray: The tray context menu is a duplication of most options available via the main menu.
In addition to the user interface elements described above, the Remote Solve Manager user interface provides two other interactive elements:

RSM Tray Icon - Double-clicking the tray icon displays the main window. The tray icon changes based on the status of jobs (no jobs running, at least one job is running, at least one job has failed). A tool tip is available on the tray icon to display the current status of jobs.

Desktop Alert - The Desktop Alert automatically displays when jobs are active. It displays the running, queued and completed jobs. The number of queued, running and completed jobs is also displayed in the window title. Tool tips are available on the individual jobs that show the full name of the job. If all jobs are finished, the desktop alert will disappear automatically. If you wish to hide the desktop alert, use the menu options or tray context (right-click) menu to turn it off. Closing the desktop alert will not cause it to hide itself permanently; it will pop up again as long as jobs are active unless turned off. Finished jobs can be hidden by using the Tools > Options menu or the tray context menu.

The following sections describe further details of the RSM interface:

Menu Bar
Tree View
List View
Progress Pane
Tray Context Menu
Accounts Dialog

Menu Bar

The menu bar provides the following functions.

File menu:
- Close: Hides the main window. The application continues to run in the tray. Use the tray menu Exit option to exit the application.

Options menu:
- Always On Top: Main window remains in front of all other windows unless minimized.
- Hide When Minimized: If checked, RSM will not appear in the task bar when minimized. It will only appear as a tray icon.

View menu:
- All Owner Jobs: Displays or hides jobs that belong to owners other than yourself (if checked, shows your jobs and other owners'; otherwise displays only jobs which you own).
- Progress Pane: Displays or hides the progress pane.
- Refresh Now: Forces the list view to update immediately, regardless of the update speed setting.
- Update Speed: Provides the following submenu selections:
  - High - updates the display automatically every 2 seconds.
  - Normal - updates the display automatically every 4 seconds.
  - Low - updates the display automatically every 8 seconds.
  - Paused - the display will not automatically update.
- Desktop Alert: Enables/disables the desktop alert window.

Tools menu:
- Remove: Deletes the selected job or jobs.
- Submit a Job...: Displays the manual job submit dialog.
- Options: Displays the Remote Solve Manager options window.

Help menu:
- ANSYS Remote Solve Manager Help: Displays the Help system in another browser window.
- About ANSYS Remote Solve Manager: Provides information about the program.

Tree View

The tree view contains a list of servers and queues set up on either your local machine (shown as My Computer) or a remote Solve Manager. You can set passwords and configure queues and servers via the context (right-click) menus of the tree nodes. See the Administration (p. 15) section for further details.

When a Solve Manager is selected, the Properties, Accounts and Set Password options are available. When the Queues node is selected, only the Add option is available.
When a Queue is selected, the Properties and Delete options are available.

When a Compute Server is selected, either under a Queue or under the Compute Servers node, the Properties and Test Server options are available. The Delete option also becomes available if the server is selected under Compute Servers; however, the server must not be assigned to any Queue in order to be deleted.

When the Compute Servers node is selected in the tree, only the Add option is available.
For more information on using the Tree Context Menu options, see the Administration section.

List View

You can sort the displayed fields by selecting the appropriate column to sort. You may delete jobs that belong to you by clicking the Delete button in the toolbar, clicking Remove in the context menu, or pressing the Delete key. The List View supports the deletion of multiple jobs: highlight the jobs you wish to remove and follow the above steps to delete them. If you delete a job, the job may not be removed from the List View immediately. It will disappear the next time the List View is refreshed.

Note: If a Job is Running you cannot remove it. Use the List Context Menu Cancel option first and wait for the Job to complete. Sending a cancel command to a Job will allow it to clean up processes spawned by the Job. There may also be a job stopping option in the application that submitted the job (e.g. the ANSYS Workbench Mechanical Stop Solution command).

List View Context Menu Options

The List View context menu provides the following options:

Inquire - Inquire about a running Job. This action depends on the type of Job being run. Generally, the Inquire command will run some additional Job Script code to perform some action on a running Job. It can also bring back intermediate output and progress files.

Cancel - Cancel a running Job. The Job Script will implement Cancel code to clean up spawned processes and perform any necessary cleanup.

Remove - Same as the main menu.

Set Priority - See Changing Job Priority (p. 14) below.

The bottom option (if available) is where Job add-ins can provide specific functionality for that type of Job. A Job add-in may display specific information about the running Job, such as an output file (Inquire may be required to update the file). The add-in functionality allows for more detailed information about a Job without running the client software that submitted the Job. The add-in can also be used on a different computer than where the Job was originally submitted.
Changing Job Priority

When jobs are submitted they have a default priority of Normal. To change the priority of a Queued job, right-click on the job name, choose Set Priority from the context menu, and change the priority. The higher priority jobs in a queue will run first. Changing the priority of a job that is in any state except Queued will have no effect. Only RSM Administrators can change a job priority to the highest level.

Progress Pane

This pane provides Log Messages about the Job. You can copy the log messages using the context menu or the Copy (Ctrl+C) key combination. The log will automatically scroll to keep the most recent messages in view. To stop automatic scrolling, select one or more lines in the log. To resume automatic scrolling, Ctrl-select the selected line(s) so that no lines are selected.

Note: When making a support call concerning RSM functionality, the complete contents of the progress pane should be sent.

Tray Context Menu

By right-clicking the Tray icon, the following options become available:

Options: Displays the Options dialog box, which includes:
- Solve Managers list - the list of available Solve Manager machines. You can Add, Delete, or Change machine names in the list.
- Desktop Alert Settings check boxes, which include the following: Show Running Jobs, Show Pending Jobs, Show Completed Jobs.

Help: Same as the Menu Bar option.

About: Same as the Menu Bar option.

All Owner Jobs: Same as the Menu Bar option.

Desktop Alert: Same as the Menu Bar option.

Open Job Status: Displays the main application window.
Exit: Exits the RSM client application.

Accounts Dialog

The Accounts dialog is opened by right-clicking on the Manager node in the tree view and selecting Accounts.... Alternate accounts are added when the password is set using the password application (see User Accounts and Passwords (p. 5)). After adding the alternate account(s), use the Accounts dialog to specify the Compute Server(s) where that account should be used. Accounts can be removed here by right-clicking on the account and selecting Remove.

Administration

RSM administrators must satisfy one of the following requirements:

- They are a Windows administrator on the Solve Manager machine (they are in the local or domain Administrators group).
- They are a member of the RSM Admins group on the Solve Manager machine. If this method is used, the RSM services (ANSYS JobManager Service V12 and ANSYS ScriptHost Service V12) may need to be restarted for the changes to take effect.

A user with RSM administrator privileges can create/modify Compute Servers and Queues, set Job priority, and delete Jobs of any user.

Solve Manager Properties

The following properties can be modified for a Solve Manager by right-clicking on the Manager node in the tree view and selecting Properties:

Job Cleanup Period: The time in HH:MM:SS format (hours:minutes:seconds) that the job will remain in the list after its status is Released.

Project Directory: The base location where the Manager will store input and output files for a Job. A unique subdirectory is created for each Job. This location can be on the local disk of the Manager computer or it could be a network share (e.g. \\fileserver\rsmprojects).
Compress Threshold (MB): The threshold at which files will be compressed prior to transfer. There is always a trade-off between the time it takes to compress/decompress versus the time to transfer. The appropriate value will depend on the specific network environment.

Creating a Compute Server

To create a Compute Server, right-click on the Compute Servers node for the desired Solve Manager and select Add. For example, to create a Compute Server in your local Solve Manager, select the Compute Servers node under My Computer. The Compute Server Properties dialog will appear. A short description of each field is displayed in the bottom portion of the dialog box when a field is selected.

1. General

Friendly Name: This name can be any descriptive string for the Compute Server. It is initialized to Server (date time) to guarantee its uniqueness. Examples of friendly names include Bob's Computer, or My Computer to Linux Proxy.

Machine Name: Currently, this is always a Windows-based computer that has RSM installed. Use localhost for the local machine or an actual network machine name. If using localhost, note that this means the Solve Manager machine is also acting as the Compute Server.
Enabled: If True, this Compute Server can accept Jobs.

Working Directory: Directory on the Compute Server where the Job is run. Job Scripts, input and output files are located in this directory. If blank, the system temporary directory defined by the TMP environment variable is used. RSM users must have write access and full permissions to this directory.

Logging Enabled: If True, a log file is created in the Windows temporary directory on the Solve Manager machine. The file name will have the form RSM_machine.log.

Max Running Jobs: The maximum number of simultaneous jobs this Compute Server is allowed to run. In previous versions, to run multiple jobs on one Compute Server it was necessary to create duplicate servers differing only in name and add them to the same Queue. This is no longer necessary. This is especially useful when the Job is simply forwarding the work to a proxy job scheduler (e.g. LSF, PBS).

Start Time: Time in 24-hour HH:MM:SS format at which the Compute Server can begin accepting Jobs. Use 00:00:00 for both Start Time and End Time for 24-hour availability. Start Time is very useful if a Compute Server or application licenses are only available at certain times of the day.

End Time: Time in 24-hour HH:MM:SS format at which the Compute Server stops accepting jobs. Jobs already running at the end time will continue to run. Use 00:00:00 for both Start Time and End Time for 24-hour availability.

2. Compute Proxy

Keyword: Keywords invoke customized Job Scripts. For example, the standard Code Template for an ANSYS Solver job is AnsysCode.xml. If the keyword RSH is entered, the Code Template AnsysCode_RSH.xml is used, which contains a different set of code files designed to communicate with another computer using the RSH/RCP protocol. The remaining proxy settings are used exclusively by the custom Job Script.

Machine Name: This proxy setting is normally used to specify a non-Windows compute proxy machine. This value is accessed with the Task.ProxyMachine property in the Job Script. The setting can be used for any purpose by a custom Job Script.

Account Name: This proxy setting is normally used to specify an account name on a non-Windows compute proxy machine. This value is accessed with the Task.ProxyAccount property in the Job Script. The setting can be used for any purpose by a custom Job Script. A blank entry is allowed.

Working Directory: This proxy setting is normally used to specify a working directory on the non-Windows compute proxy machine. This value is accessed with the Task.ProxyWorkingDirectory property in the Job Script. The setting can be used for any purpose by a custom Job Script. A blank entry is allowed.

Custom 1: This is a reserved field that can be used to send parameters to a customized Job Script. The value is accessed with the Task.Custom1 property.

Custom 2: This is a reserved field that can be used to send parameters to a customized Job Script. The value is accessed with the Task.Custom2 property. (A brief sketch of how a custom Job Script reads these fields follows the Testing a Compute Server section below.)

Testing a Compute Server

A Compute Server configuration can be tested by right-clicking on the Compute Server node in the tree and selecting Test Server. This will run a test Job using the settings provided. By viewing the Log Messages in the Progress Pane, you can determine if the actual Job will work.
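For illustration, the fragment below sketches how a custom Job Script might consume the Compute Proxy settings described above. The Task object and its property names are those documented in this section; the surrounding script scaffolding and the SubmitToScheduler helper are hypothetical stand-ins for your own customization (see Customization Guide (p. 29)).

// Sketch only: reading Compute Proxy settings inside a custom Job Script (C#).
// Task.Proxy*/Custom* are the documented job-script values;
// SubmitToScheduler is a hypothetical helper representing your own code.
string proxyMachine = Task.ProxyMachine;           // Compute Proxy "Machine Name"
string proxyAccount = Task.ProxyAccount;           // Compute Proxy "Account Name" (may be blank)
string proxyWorkDir = Task.ProxyWorkingDirectory;  // Compute Proxy "Working Directory" (may be blank)
string scratchPath  = Task.Custom1;                // reserved field, e.g. a scratch location
SubmitToScheduler(proxyMachine, proxyAccount, proxyWorkDir, scratchPath);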
Creating a Queue

To create a Queue, right-click on the Queues node for the desired Solve Manager and select Add. For example, to create a Queue in your local Solve Manager, select the Queues node under My Computer. The Queue Properties dialog will appear. The Queue Properties dialog contains the following:

Name: This field should contain a descriptive name for the Queue. Examples of Queue names include Local Queue, Linux Servers, or MS Compute Cluster. If the Compute Server(s) of the Queue have Start/End Time specified, it may be useful to indicate this to users (e.g. Night Time Only).

Enabled: If True, the Solve Manager will dispatch queued Jobs to available Compute Servers. If False, Jobs will remain in a Queued state until the Queue is enabled.

Priority: This value can be Low, Below Normal, Normal, Above Normal, or High. When determining the next Job to run, the Manager will pull Jobs from the highest priority Queue first. A common use for this setting is to have a separate, higher priority Queue for smaller Jobs, so that they will be processed first before running large Jobs that will tie up the computing resource for a long period of time.

Assigned Servers: Check the Compute Servers that will be utilized for this Queue. A Queue can contain more than one Compute Server. A Compute Server can also be a member of more than one Queue.

Computers with Multiple Network Interface Cards (NIC)

Server Side Configuration

This special case requires manual editing of the JobManagerService.exe.config and ScriptHostService.exe.config files on the multi-NIC machine, adding machineName="ip_address" to the TCP channel configuration. Do not make any other changes. Restart both the ANSYS JobManager Service V12 and the ANSYS ScriptHost Service V12 in Control Panel > Administrative Tools > Services after making the change. For example:

...
<channels>
  <channel ref="tcp" port="9012" secure="false" machineName="1.2.3.4">
    <serverProviders>
      <formatter ref="binary" typeFilterLevel="Full"/>
    </serverProviders>
  </channel>
</channels>
...

To determine the correct IP address to use, run the ping utility at the command prompt from a separate computer using the multi-NIC computer's DNS name. For example:

C:\>ping multinicmachine
Pinging multinicmachine.mycompany.com [10.2.10.32] with 32 bytes of data:
Reply from 10.2.10.32: bytes=32 time=56ms TTL=61

In this example, the IP address 10.2.10.32 represents the network that clients see.

Client Side Configuration

If the above manual configuration has not resolved a client-server communication problem, it may be that the client has multiple NICs. For example, a virtual NIC used for a VPN connection can cause a conflict (even if not connected). For this configuration, follow the same process as above, except that the client configuration file is Ans.Rsm.ClientApi.dll.config. This file will have to be created. When this file exists it will be used; when not found, a default configuration is used. Copy and paste the contents below:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.runtime.remoting>
    <application>
      <channels>
        <channel ref="tcp" port="0" secure="true" machineName="1.2.3.4">
          <clientProviders>
            <formatter ref="binary" typeFilterLevel="Full"/>
          </clientProviders>
        </channel>
      </channels>
    </application>
  </system.runtime.remoting>
</configuration>

Integration with Unix/Linux using RSH/RCP

Remote Solve Manager supports using RSH/RCP (remote shell/remote copy) in custom Job Scripts. The built-in Job Scripts for the ANSYS Solver have been tested using the RSH client (rsh.exe) provided by Windows.

Note: Windows Server 2008 and Vista versions do not include the RSH client. RSH integration with Unix is not officially supported on these platforms. It may be possible to use a third party RSH client on these Windows versions with some customization; however, PuTTY SSH is the recommended alternative.
RSH Configuration

In order to send Jobs to a remote Unix/Linux machine using RSH, RSH must be configured to allow access from a specific user and a specific machine. This is done by editing the .rhosts file in the home directory of the Unix/Linux account under which the job will run. The entry is of the form:

windows_machine_name windows_username

or

windows_machine_ip windows_username

For example, if the Windows domain account is COMPANY\johnd and the Windows Compute Server is named jdoexp06, the .rhosts entry would be:

jdoexp06 johnd

In some network environments it is necessary to use the fully qualified name of the Windows computer:

jdoexp06.company.com johnd

Along with the Test Server feature of the RSM user interface, the following troubleshooting commands can be used to verify the RSH configuration. These commands are executed at the Windows command prompt on the Compute Server that will be communicating directly with the Unix/Linux machine.

1. If the Windows and Unix/Linux account names match, this command will list the home directory on the remote machine:

C:\>rsh unixmachine ls -l

2. If the Windows and Unix/Linux account names are different, this command will list the home directory of the unixaccount on the remote machine:

C:\>rsh unixmachine -l unixaccount ls -l
Integration with Unix/Linux using SSH/SCP

Remote Solve Manager supports using SSH/SCP (secure shell, secure copy) in custom Job Scripts. The built-in Job Scripts for the ANSYS Solver have been tested using the PuTTY SSH client (http://www.chiark.greenend.org.uk/~sgtatham/putty).

PuTTY SSH Configuration

In order to send RSM jobs to a remote Unix/Linux machine using SSH, SSH must be configured to allow access from a Windows machine. SSH configuration involves creating a cryptographic key on the Windows Compute Server and placing the public portion of the key on the Unix/Linux machine. Use the following configuration steps to set up PuTTY SSH:

1. Download and install PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html. If this link is invalid, perform a web search for "putty".

2. Create a cryptographic key using PuTTYgen (puttygen.exe).

   a. Click Generate.
   b. Move the mouse as instructed.
   c. Change the key comment to include your machine name and Windows username.
   d. Save the private key file without a passphrase, for example, <drive>:\Program Files\Putty\id_rsa.ppk. Using a passphrase will result in jobs hanging at the prompt to enter the passphrase. Be sure to secure the private key file by some other means. For example, if the key is only being used by you, save it to the My Documents folder where only you and Administrators will have access to the file. If multiple users will use the same key, allow the owner full control, then create a group and allow only users in that group to read this file.
   e. Move the public portion of the key to the Unix machine. This requires editing the ~/.ssh/authorized_keys file on the Unix machine. This can be done in a telnet session on the Windows machine. Copy all the text from the box under "Public key for pasting" and paste it into ~/.ssh/authorized_keys. All of this text should be one line. If the authorized_keys file
The Remote Solve Manager (RSM) does not exist, create one. Alternatively, paste into a text file and get that file to the Unix machine for editing. 3. Modify environment variables using Explorer>My Computer>Properties>Advanced. a. Add PuTTY install directory to Path, such as C:\Program Files\putty. You must add to the system environment variable area at the bottom of the dialog. b. Add a KEYPATH environment variable with full path to private key file, such as <drive>:\program Files\Putty\id_rsa.ppk. Use a user variable if the key file is used only by you. Use a system variable if other users are sharing the key file. If users will have their own key file in My Documents, set KEYPATH = %USERPROFILE%\My Documents\id_rsa.ppk. This will expand to <drive>:\documents and Settings\ <user> \My Documents\id_rsa.ppk. c. Reboot the Windows machine for RSM to pick up environment changes. 4. As an initial test, run the following from the command prompt (quotes around %KEYPATH% is required): plink -i %KEYPATH% unixlogin@unixmachinename pwd a. If plink prompts you to store the key in cache, select Yes. b. If plink prompts you to trust key, select Yes. 5. Set up a server in the RSM using "PSSH" keyword as shown above and use "Test Server..." to verify the configuration. Additional Unix/Linux Considerations Unix/Linux Path Configuration On Unix/Linux machines, the RSM Unix/Linux job scripts (currently keywords RSH, SSH, PSSH) require that the job executable be in the path when running the remote command. For example, the built-in scripts to run the ANSYS solver require that the ANSYS command ( ansys121 for version 12.1) can be found in the path. If the job is not running properly, check the job log in the RSM windows for "Command not found". If this is the case, the Unix/Linux account will need to be configured so that this command is found. Remote command clients like RSH generally use the remote account's default shell for running commands. For example, if the account's default shell is CSH, the following line needs to be added to the.cshrc file (path may be different for your environment): set path = ( $path /ansys_inc/v110/ansys/bin ) Note ~ (tilde) representation of the home directory is not supported for use in RSM paths (e.g. working directory in Compute Server properties). Note Different shells use different initialization files from the account's home directory and may have a different syntax than shown above. Refer to the man page for the specific shell or consult the machine administrator. 22
Shell Compatibility and Aliases

RSM works with the Bourne shell (/bin/sh), the Bourne Again shell (/bin/bash), the Korn shell (/bin/ksh) and /bin/tcsh. Accounts intended to run RSM jobs should not include any aliases for commonly used Unix commands such as awk, ls, pwd, ps, and so on.

RSH and SSH Job Limitations

The Windows RCP and PuTTY SCP clients have file size limitations that RSM circumvents by splitting and joining very large files (greater than 2GB). The Windows compute server and the Unix/Linux compute proxy machine may also have file system limitations beyond the control of RSM. In order to transfer files larger than approximately 2GB, the Unix/Linux machine must be configured with large file support and the Windows file system must be NTFS.

If any job output file is not successfully retrieved, all job output files are left on the Compute Proxy machine. Consult the Job log for the temporary directory name used for the job. These files can then be retrieved manually and loaded by the ANSYS Workbench Mechanical Application. See Writing and Reading ANSYS Files in the Mechanical Help for details.

The Windows RCP client has exhibited hanging behavior under some conditions. Other third party clients are available if this becomes a problem. The supplied RSH Job Scripts assume rsh.exe and rcp.exe are in the user's path, so they may need to be customized to utilize another RSH/RCP client (see Customization Guide (p. 29)).

Note: Windows Server 2008 and Vista versions do not include the RSH client. RSH integration with Unix is not officially supported on these platforms. It may be possible to use a third party RSH client on these Windows versions with some customization; however, PuTTY SSH is the recommended alternative.

An alternative SCP client is WinSCP (http://www.winscp.com), which resolves the 4GB file size limitation. At the time of this writing, the PSSH Job Scripts do not support this SCP client. Secure FTP (SFTP) is another alternative that is not currently supported. Many third party SFTP clients are available online. Utilizing WinSCP or an SFTP client will require customization of Job Scripts (see Customization Guide (p. 29)). Before embarking on Job Script customization, consult ANSYS Technical Support or your ANSYS Software Distributor to check if new Job Scripts have been made available after release.

Explicit Dynamics Systems

RSM currently does not support UNIX/Linux connections for Explicit Dynamics systems. Only Windows to Windows connections are currently supported for Explicit Dynamics systems.

Integration with Platform LSF Cluster on Windows

To integrate with a Windows-based LSF cluster, RSM must be installed on at least one node of the cluster, and that node must be an LSF submission host (the bsub and lsrcp [requires RES service] commands are available). Create a Compute Server whose Machine Name (e.g. lsfclusternode1) is the name of the node on which RSM is installed. If this machine is also the RSM Manager, then lsfclusternode1 in the Machine Name field can be replaced with localhost.

The Working Directory and Proxy Working Directory values are the same physical location. The network share representation will be the staging area for job inputs and outputs and will be accessed from the LSF execution host. This directory and share must exist on all nodes. In this example, there must be a D:\RSMTemp and a share \\nodename\RSMTemp on all nodes.

The RSM Job Script will submit the job to LSF with the following bsub command. This code is located in the file ...\RSM\Config\scripts\LSFUtilities.cs.

bsub -q queue -n numcpus -o stdout.out -e stderr.out cscript.exe //nologo LSF_job.js control.txt

The number of CPUs is dependent on the specific job. For example, for the ANSYS solver, the number of CPUs is parsed from the command line coming from Workbench. The LSF job then is actually a JScript named LSF_job.js located in ...\RSM\Config\scripts. This script will:

1. Read the control file containing paths, inputs and outputs.
2. Make temporary directories on all nodes assigned for the job.
3. Copy inputs to the working directory of the execution host.
4. Run the command (e.g. solver).
5. Copy outputs to the staging folder on the submission host.
6. Clean up.

Note: For a Server Test (...\RSM\Config\scripts\TestServer_LSF.cs), the queue is hardcoded to normal and the number of CPUs is set to 2. Edit this file if these default settings are not appropriate.

To complete the configuration, create an RSM Queue and add this Compute Server. The RSM Queue name must match the LSF queue name exactly. Jobs can now be submitted to this Queue and will be forwarded to the LSF queue for scheduling.

A new feature added at version 12 is the Max Running Jobs property. Previously, multiple RSM Compute Servers would need to be created to allow multiple, simultaneous jobs to be submitted to the cluster. You can now set this property to the value appropriate to your cluster. Note that the RSM Job could be in a Running state but LSF may not yet be able to execute the job due to limited resources. Refer to the RSM log pane to determine the LSF job ID and state.
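As a quick sanity check that the node really is a working LSF submission host, standard LSF commands can be run from a command prompt on that node before configuring RSM (the queue name here is illustrative; use a queue that exists in your cluster):

C:\>bqueues
C:\>bsub -q normal -o test.out "echo LSF submission works"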
Integration with LSF and PBS Clusters on Unix/Linux

To integrate with an LSF or PBS cluster on Linux, first establish RSH or SSH connectivity to one node of the cluster. Refer to Integration with Unix/Linux using RSH/RCP (p. 19) or Integration with Unix/Linux using SSH/SCP (p. 21) for instructions on how to configure client and cluster nodes for password-less access. The selected cluster node must be a submission host for the scheduler (i.e. the bsub command for LSF, or the qsub command for PBS, is available).

In the example Compute Server properties described below, the Machine Name in the top section is localhost because the RSM Manager and Compute Server are running on the same machine (i.e. the Compute Server is local with respect to the Manager). Set Max Running Jobs to the maximum number of jobs that should be submitted to the cluster before queuing jobs in RSM. The Compute Proxy settings in the bottom portion control the integration as follows:

Keyword: Currently supported keywords are LSFSSH and LSFRSH (LSF cluster via SSH and RSH, respectively), and PBSSSH and PBSRSH (PBS cluster via SSH and RSH, respectively). Additional third party job schedulers may be integrated in the future.

Machine Name: This is a cluster node previously configured for password-less access via SSH or RSH.

Working Directory: This directory is the file staging path. The RSM job will create a temporary directory here (e.g. /staging/dh3h543j.djn). This directory must be mounted on all execution hosts so that the LSF/PBS job has access.

Custom 1: For this integration, this field is used to specify local scratch space on the execution nodes. This path must exist on all nodes. The LSF/PBS job script will create temporary directories in this location on all assigned nodes using the LSF/PBS job ID (e.g. /scratch/job1234).
Test the Compute Server

After the Compute Server is created, test the configuration by right-clicking the server and selecting Test Server from the context menu.

Note
For a Server Test (...RSM\Config\scripts\TestServerUnixTP.cs), the queue is hardcoded to normal for LSF and workq for PBS. Also, the number of CPUs is set to 2. Edit this file if these default settings are not appropriate.

Create a Queue

The Compute Server must now be added to a new Queue before jobs can be submitted from the client application (i.e. ANSYS Workbench). The name of the queue must EXACTLY match the name of the queue on the cluster.

RSM Job Script Customization (required)

An RSM job script ([RSMInstall]\Config\scripts\LSFUnixUtils.cs or PBSUnixUtils.cs) must be customized to enter the number of nodes and the number of CPUs per node. This information, along with the number of CPUs requested, is used to determine the job submission command (bsub for LSF or qsub for PBS). The pertinent C# code in the script is shown below.

// CUSTOMIZE HERE: # of execution nodes available
// This variable will be modified below for the qsub select option
int numberclusternodes = 3;

// CUSTOMIZE HERE: # of CPUs per node.
// If this varies, use a value such that numberclusternodes * numbercpupernode
// doesn't exceed the actual CPUs available
int numbercpupernode = 4;
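These two values bound what the script can request from the scheduler. The shipped script derives a node count (numbernodesrequested) from them and from the number of CPUs the job asks for; the exact logic is in LSFUnixUtils.cs or PBSUnixUtils.cs, but a minimal sketch of the idea, with a hypothetical numprocs value, might look like this:

// Minimal sketch (not the shipped code): derive how many nodes to request.
// numprocs would normally be parsed from the solver command line.
int numprocs = 8;
// Ceiling division: enough nodes to hold all requested CPUs.
int numbernodesrequested = (numprocs + numbercpupernode - 1) / numbercpupernode;
// Never request more nodes than the cluster actually has.
if (numbernodesrequested > numberclusternodes)
    numbernodesrequested = numberclusternodes;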
General Details of the Integration

RSM is essentially forwarding the job to a third-party job scheduler. The RSM job must build and execute the job submission command of the scheduler specified through the keyword. The RSM job doesn't do any real work itself, but it monitors the status of the job it has submitted to LSF or PBS. Specific details for each scheduler are discussed further below.

The job sent to the Linux cluster is a shell script that performs the following tasks:

1. Make a temporary directory in the scratch directory and move there.
2. Make temporary directories on the other nodes assigned for the job.
3. Copy inputs to the working directory of the execution host.
4. Run the command (e.g. ANSYS solver).
5. Copy outputs back to the staging folder on the submission host.
6. Clean up the temporary directories.

LSF Details

Shell scripts that are submitted to LSF are located in [RSMInstall]\Config\scripts. For a server test the name is lsfjob_generic.sh. For an ANSYS solver job the file is named lsfjob_ansys.sh. Any additional solvers that are supported will follow the same convention. These shell scripts are actually a generic base portion that is appended to a dynamically generated top portion, resulting in a shell script for a specific job (lsfjob.sh). The dynamic portion contains the variable values for the job, including the number of processors requested, input file names, scratch location, etc.

The following are example bsub commands that can be chosen in LSFUnixUtils.cs. To use a form of the command, remove the comment characters (//) from the beginning of the line. Only one line should be used at a time. Customize the bsub command as needed.

//string submitjobstr = string.Format("bsub -q {0} -n {1} -R \"span[hosts={2}]\" -o stdout.out -e stderr.out ./lsfjob.sh", queue, numprocs, numbernodesrequested);
//string submitjobstr = string.Format("bsub -q {0} -n {1} -R \"span[ptile={2}]\" -o stdout.out -e stderr.out ./lsfjob.sh", queue, numprocs, corespernode);
string submitjobstr = string.Format("bsub -q {0} -n {1} -o stdout.out -e stderr.out ./lsfjob.sh", queue, numprocs);

To monitor the job, the RSM job parses the output of the bjobs command. The default polling frequency of 30 seconds can be customized in LSFUnixUtils.cs.
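For illustration, suppose the ptile form is uncommented and the job uses queue normal, 8 processors, and 4 cores per node (all values hypothetical). The formatted submission command would be:

bsub -q normal -n 8 -R "span[ptile=4]" -o stdout.out -e stderr.out ./lsfjob.sh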
PBS Details

Shell scripts that are submitted to PBS are located in [RSMInstall]\Config\scripts. For a server test the name is pbsjob_generic.sh. For an ANSYS solver job the file is named pbsjob_ansys.sh. Any additional solvers that are supported will follow the same convention. These shell scripts are actually a generic base portion that is appended to a dynamically generated top portion, resulting in a shell script for a specific job (pbsjob.sh). The dynamic portion contains the variable values for the job, including the number of processors requested, input file names, scratch location, etc.

The following qsub command is located in PBSUnixUtils.cs. Customize the qsub command as needed.

string submitjobstr = string.Format("qsub -q {0} -l select={1}:ncpus={2}:mpiprocs={2} -l place=scatter:excl -o stdout.out -e stderr.out ./pbsjob.sh", queue, numbernodesrequested, numprocs / numbernodesrequested);

To monitor the job, the RSM job parses the output of the qstat command. The default polling frequency of 30 seconds can be customized in PBSUnixUtils.cs.
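For illustration, with a queue named workq, 8 requested CPUs, and 2 nodes (all values hypothetical), the formatted command would be:

qsub -q workq -l select=2:ncpus=4:mpiprocs=4 -l place=scatter:excl -o stdout.out -e stderr.out ./pbsjob.sh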
Integration with Microsoft Compute Cluster

To integrate with a Microsoft Compute Cluster (MSCC), RSM must be installed on the head node. You can then choose to use the Manager and Compute Server (aka ScriptHost) services or just the Compute Server service. If the latter is chosen, the Manager is running on the client or on a central, dedicated RSM Manager machine.

Create a Compute Server as shown below, where msccheadnode is the name of the cluster head node. If using the Manager on the head node, this can be replaced with localhost.

The Working Directory and Proxy Working Directory values are the same physical location. The network share representation will be the staging area for job inputs and outputs and will be accessed from the node where MSCC chooses to run the job. This directory and share must exist on all nodes. In this example, there must be a D:\RSMTemp and share \\nodename\rsmtemp on all nodes.

The MSCC job is actually a JScript named MSCC_job.js located in ...\rsm\config\scripts. This script will do the following:

1. Read a control file containing paths, inputs and outputs.
2. Make temporary directories on all nodes assigned for the job.
3. Copy inputs to the working directory of the execution host.
4. Run the command (e.g. solver).
5. Copy outputs to the staging folder on the submission host.
6. Clean up.

Now create a Queue and add this Compute Server. Jobs can now be submitted to this Queue and will be forwarded to MSCC for scheduling.

The number of CPUs/nodes allocated by MSCC is controlled by the Job Script implementation. For example, the Mechanical application contains a Max number of utilized processors setting that is passed along on the solver command line. The command line is parsed in the Job Script and this information is passed on to MSCC. The number of CPUs requested is reported in the RSM log pane.

Passwords

As of version 12.1, RSM no longer requires users to manually cache their Windows password with the Microsoft Compute Cluster. Each RSM job will run the hpcutils.exe tool prior to submitting the job to the cluster. This tool programmatically does the equivalent of cluscfg setcreds.

Mixed Domains

RSM can be used when the client computer and the cluster are in different domains. The assumption is that the client computer and user account are on the corporate domain and the cluster is in its own domain. In this case, the cluster domain must be configured to have a one-way trust with the corporate domain. That is, the cluster domain trusts the corporate domain but not vice versa. Corporate domain users must be able to use cluster resources (e.g., log in to a cluster node as CORPORATE\user). If the cluster administrator can add corporate domain accounts as cluster users, then this trust was likely configured when the cluster domain was created.

Multiple Network Interface Cards

Cluster nodes, especially the head node, generally have multiple network interface cards (NICs) to facilitate separate public and private networks. See Computers with Multiple Network Interface Cards (NIC) in the Administration section for configuration instructions.

Customization Guide

Before embarking on Job Script customization, consult ANSYS Technical Support or your ANSYS Software Distributor to check whether new Job Scripts have been made available after release.

Customizing an Existing Job

Job Templates, Code Templates, and code for the Mechanical application are included with the RSM installation and are located in:

[installdir]\rsm\config\xml and \scripts
Visual Studio projects are supplied in the scripts folder to aid in editing.

Editing Scripts

The Job Script files are currently delivered in the C# language. Theoretically, it is possible to use other .NET-supported languages (e.g. Visual Basic, JScript), but this has not been officially tested. C++ may also be supported in a future release.

Using Keywords

Keywords are used to invoke specialized scripts that integrate with non-native operating systems (e.g. Unix, Linux) and third-party job schedulers (e.g. LSF, MS Compute Cluster). The Keyword is appended to the base Job Code Template name. For example, if the code template is named ServerTestCode.xml and the Keyword is LSF, RSM will first look for ServerTestCode_LSF.xml. Keywords are the best way to invoke custom scripts without affecting the existing jobs.

Job Templates

Job Templates define the Code Template (see below) and the inputs and outputs of a Job. The Job Template for a Server Test job is shown below.

<?xml version="1.0"?>
<jobtemplate>
  <script>servertestcode.xml</script>
  <debug>true</debug>
  <cleanup>true</cleanup>
  <inputs>
    <file type="ascii">*.in</file>
  </inputs>
  <outputs>
    <file type="ascii">*.out</file>
  </outputs>
</jobtemplate>

Code Template

A Code Template for a Server Test Job is shown below. The template contains sections for the actual code files, referenced assemblies (.dlls), and support files (such as localized strings). These files are transferred to the Compute Server and processed by the ScriptHostService on that computer.

<?xml version="1.0"?>
<codedefinition>
  <codefiles>
    <codefile>servertest.cs</codefile>
  </codefiles>
  <references/>
  <supportfiles/>
</codedefinition>
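To illustrate the keyword mechanism, a keyword-specific code template is simply a copy of the base template that points at the keyword-specific script. The following is a hypothetical ServerTestCode_LSF.xml, following the schema shown above; the referenced code file name is illustrative and should be adjusted to the actual script being invoked (for example, the TestServer_LSF.cs script mentioned earlier):

<?xml version="1.0"?>
<codedefinition>
  <codefiles>
    <codefile>testserver_lsf.cs</codefile>
  </codefiles>
  <references/>
  <supportfiles/>
</codedefinition>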
RSM Integration

Submitting completely new jobs (as opposed to customizing an existing job) involves using the RSM API (Ans.Rsm.ClientAPI.dll). Contact ANSYS Technical Support for information on integrating an application with RSM.

Manually Submitting Jobs

Jobs can be manually submitted to RSM using Tools>Submit a Job... in the RSM window as shown below. The job is submitted to the Manager currently selected in the Tree View (e.g. My Computer).

The following items are required to manually submit a job:

Queue: This drop-down list will be preloaded with all the queues currently defined for the selected Manager.

Job Template: This is the job template file path relative to the Manager machine. If a remote Manager is being used, a customized job template will have to be copied to a known location on that machine. In the case of a remote Manager, the Browse... option is meaningless.

Working Directory: This is the directory where the job input files are located.
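As an illustration, a manual submission of a Server Test job to a local Manager might use values like the following (the queue name, template file name, and paths are hypothetical):

Queue: Normal
Job Template: [installdir]\RSM\Config\xml\ServerTestJob.xml
Working Directory: C:\Temp\rsmtest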
Index

R
remote solve manager
    accounts, 5
    administration, 15
    capabilities, 7
    customization, 29
    description, 1
    LSF - unix/linux, 25
    LSF - Windows, 23
    Microsoft compute cluster, 28
    passwords, 5
    terminology, 6
    unix/linux limitations, 22
    unix/linux MPI, 22
    unix/linux path configurations, 22
    unix/linux RSH/RCP, 19
    unix/linux SSH/SCP, 21
    user interface, 9
RSH/RCP configuration, 19

S
SSH/SCP configuration, 21