Moab and Fabriscale's Fabric Manager
White Paper
Unleashing the Power of your Data Center

Executive Summary

Today's data center must operate at the highest level of efficiency so that all types of workloads are scheduled and executed with minimum effort. Adaptive Computing powers many of the world's largest private/hybrid cloud and technical computing environments with Moab, its award-winning workload orchestration and scheduling software. Fabriscale Technologies specializes in fabric management software, with an emphasis on smart algorithms that simplify network configuration, management, monitoring, and routing. Integrated, the two products provide workload-aware network monitoring: organizations can monitor their workloads and the impact of those workloads on the network topology through Fabriscale's Fabric Manager (FFM).

Workload-aware Network Monitoring with Moab & FFM Integration

When administering an HPC network, it is important to have an effective monitoring tool for identifying problems such as overloaded links and switches, network ports that are down, and other network device failures. The Fabriscale Fabric Manager (FFM) provides functionality for monitoring bandwidth at the network, switch, and port level. Advanced dashboards can then display these performance results at different granularities. Figure 2 shows the network activity dashboard, which displays send and receive traffic for each compute node as well as the inter-switch traffic. The user can enable or disable individual plots by interacting with the graph.

How it Works

The integration uses Moab Web Services (MWS) to communicate with FFM: through MWS, Moab updates FFM with detailed job information. FFM then tracks and graphically maps job relationships within its dashboard, allowing you to visualize the relationship between the jobs, the network fabric, and the compute nodes.
The dashboard is navigated with simple mouse movements: hover over an icon to see summary information, or click it to obtain more information about a compute node, network, or job.
Figure 2: The network traffic dashboard showing an overview of the network activity.

By aggregating and analyzing the topology description, routing tables, and event logs, it is possible to annotate performance metrics with network event information, helping the administrator to better understand the network status. In addition, with the integration of Moab's workload management capabilities, performance data can be annotated with scheduling events. It then becomes easy to identify correlations between job submission/completion and network utilization by overlaying performance plots with Moab scheduling events, as illustrated by the blue vertical arrows in Figure 2. More details about performance plots and network events are available by hovering the mouse over the plots and annotations, as shown in Figure 3.

Figure 3: The plot and event details available by hovering over plots and annotations.
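The idea of overlaying scheduling events on performance plots can be sketched in a few lines of Python. The following is only an illustration under stated assumptions, not FFM's or Moab's implementation: the `JobEvent` type, the `utilization_shift` helper, the 30-second window, and the sample trace are all hypothetical. It compares the mean bandwidth just before and just after a scheduling event, which is one simple way to quantify the correlation an administrator would otherwise read off the plot.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JobEvent:
    """A scheduling event (hypothetical shape, not the MWS schema)."""
    time: float   # seconds since the start of the trace
    job_id: str
    kind: str     # "start" or "end"


def utilization_shift(samples, event, window=30.0):
    """Return mean bandwidth after the event minus mean bandwidth before it.

    `samples` is a list of (time, bandwidth) tuples sorted by time.
    Returns None when either side of the window contains no samples.
    """
    times = [t for t, _ in samples]
    lo = bisect_left(times, event.time - window)
    mid = bisect_left(times, event.time)
    hi = bisect_right(times, event.time + window)
    before = [bw for _, bw in samples[lo:mid]]
    after = [bw for _, bw in samples[mid:hi]]
    if not before or not after:
        return None
    return mean(after) - mean(before)


# Made-up trace: traffic on one link in MB/s, sampled every 10 seconds.
samples = [(0, 100), (10, 100), (20, 100), (30, 400), (40, 400), (50, 400)]
events = [JobEvent(time=30, job_id="job.1", kind="start")]

# Annotate each event with the change in utilization around it.
shifts = {e.job_id: utilization_shift(samples, e) for e in events}
```

In this made-up trace the job start coincides with a jump in traffic, so the shift for `job.1` is large and positive; an event with no samples on one side of the window yields `None` instead of a misleading number.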
The integration with Moab also includes a jobs dashboard that presents a jobs overview, as shown in Figure 4.

Figure 4: The job dashboard showing an overview of the load situation.

The job dashboard provides the administrator with the following information (numbers correspond to Figure 4):

1. Job map of the topology with nodes color-coded according to job ID.
2. List of running jobs.
3. List of the last five completed jobs.
4. Timeline of the total number of jobs, grouped by job status (queued or running).
5. Timeline of the total number of jobs running, grouped by user.
6. Timeline of the total number of jobs running, grouped by group.

The job dashboard gives an overview of the current scheduling situation and, when required, more details for a specific job can be retrieved by clicking the job ID.
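The aggregations behind such a dashboard are straightforward, as the following Python sketch suggests. It is an illustration under assumptions only: the job records, their field names, and the helpers `count_by` and `node_to_job` are hypothetical and do not reflect the MWS data model or FFM internals. It derives one snapshot of the per-status, per-user, and per-group counts, plus the node-to-job mapping that a color-coded job map would need.

```python
from collections import Counter

# Hypothetical job records; field names are illustrative only.
jobs = [
    {"id": "job.1", "user": "alice", "group": "physics",
     "status": "running", "nodes": ["n01", "n02"]},
    {"id": "job.2", "user": "bob", "group": "chem",
     "status": "running", "nodes": ["n03"]},
    {"id": "job.3", "user": "alice", "group": "physics",
     "status": "queued", "nodes": []},
]


def count_by(records, field):
    """Count job records by the value of one field (status, user, group)."""
    return Counter(record[field] for record in records)


def node_to_job(records):
    """Map each node to the running job that occupies it."""
    return {
        node: record["id"]
        for record in records
        if record["status"] == "running"
        for node in record["nodes"]
    }


running = [job for job in jobs if job["status"] == "running"]

by_status = count_by(jobs, "status")    # one point on the status timeline
by_user = count_by(running, "user")     # one point on the per-user timeline
by_group = count_by(running, "group")   # one point on the per-group timeline
job_map = node_to_job(jobs)             # input for a color-coded job map
```

Recording these counters at regular intervals would produce the timelines; assigning one color per distinct job ID in `job_map` would produce the map.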
The dashboard shown in Figure 5 provides the administrator with the following information (numbers correspond to Figure 5):

1. Details about the job, such as command, user, and nodes.
2. Details about the network path (route) used for all nodes involved in the job.
3. Traffic sent for all involved nodes and switches.
4. Traffic received for all involved nodes and switches.

Figure 5: The job details dashboard showing information about a specific job.

The combination of these four dashboards provides a basic tool for monitoring your HPC network. The high-level network traffic, job, and job details dashboards, along with the additional hover details, will help you to visualize the health of your network.
Summary

The Fabriscale Fabric Manager (FFM) provides the basic tools required for workload-aware network monitoring through job scheduler integration. Its detailed network visibility, combined with job scheduler information and real-time reporting of network performance metrics, helps you to proactively identify, troubleshoot, and resolve issues before they impact your business.

Become a Pilot User

Both Adaptive Computing and Fabriscale run a pilot program to provide organizations with the opportunity to evaluate Moab and FFM prior to purchase or in anticipation of future funding. The pilot program offers organizations the opportunity to observe first-hand the unique features and benefits provided by Fabriscale's integration with Moab. Learn more at http://fabriscale.com/pilot.

Contact

Adaptive Computing
Adaptive Computing Enterprises Inc.
1712 S. East Bay Blvd., Suite 300
Provo, UT 84606 USA
Telephone: +1 (801) 717-3700
Fax: +1 (801) 717-3738
Email: sales@adaptivecomputing.com

Fabriscale
Martin Linges vei 17
1364 Fornebu
Norway
Telephone: +47 92 46 78 42
Email: info@fabriscale.com
Web: http://fabriscale.com/

EMEA Office
Adaptive Computing Limited
3000 Cathedral Hill
Guildford GU2 7YB UK
Telephone: +44 (0) 1483 243578
Fax: +44 (0) 1483 243666
Email: sales@adaptivecomputing.com