The Importance of Software License Server Monitoring




NetworkComputer: How Shorter Running Jobs Can Help In Optimizing Your Resource Utilization

White Paper

Introduction

Semiconductor companies typically run jobs by queuing them up and using a job scheduler to dispatch them onto the available cores in the server farm while pulling EDA tool licenses from a license server. Organizations face two primary goals that are somewhat at odds with each other: first, maximizing the utilization of server farms and software licenses; second, running jobs with minimal latency so that users are not unduly delayed. The easiest way to satisfy both high utilization and low latency is to maximize the number of short-duration jobs. Just as it is much easier to fill a bucket with sand than with large rocks, short jobs give the scheduler far more flexibility in choosing which jobs to run and when. Short-duration jobs do not block or occupy a resource for long, so they are unlikely to impede the progress of higher-priority jobs arriving in the queue.

Job Scheduling Challenges in Today's Environment

Many job schedulers are not tuned to handle large numbers of short jobs, as their maximum queue size is limited. In today's semiconductor design environment, some tasks inevitably generate huge numbers of short jobs, most notably characterizing an entire standard cell library across a number of process corners, voltages, frequencies, etc. In practice, if short jobs are encouraged, the queue can quickly grow to millions of jobs or more. In addition, the dispatch rate of many of today's schedulers is too low, a byproduct of their cycle-based architecture, which limits them to dispatching jobs on a regular interval such as a minute boundary. If a job runs for just a few seconds, the scheduler will not notice, and the available hardware and software licenses will sit idle until the next cycle boundary. While there are many other important facets of job schedulers (e.g., crash recovery and ease of installation), this white paper focuses on the challenge of scheduling large numbers of short-duration jobs.

Relying on short jobs might sound impractical, but it turns out that somewhere between 30% and 50% of all jobs in a typical semiconductor organization run for less than 1 minute, and often over 50% run for less than 5 minutes. Fewer than 10% of jobs run for 4 hours or more. The graphs below show the actual job mixes from real semiconductor development organizations, one an IP company and one an SoC company. (Figure 1)
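The idle-time cost of a fixed dispatch cycle can be sketched with simple arithmetic. The 60-second heartbeat matches the minute boundary mentioned above; the job lengths below are assumptions for illustration:

```python
# Sketch (not from the paper's data): effective core utilization when a
# cycle-based scheduler only dispatches on a fixed heartbeat.
# Assumption: one core running back-to-back short jobs, 60 s heartbeat.

HEARTBEAT = 60.0  # seconds between dispatch cycles (assumed)

def cycle_based_utilization(job_seconds: float) -> float:
    """A job shorter than the heartbeat occupies its slot until the next
    cycle boundary, so the core does useful work for only job_seconds
    out of every HEARTBEAT seconds."""
    if job_seconds >= HEARTBEAT:
        return 1.0  # long jobs are essentially unaffected by the heartbeat
    return job_seconds / HEARTBEAT

for d in (5, 15, 30, 60):
    print(f"{d:>3}-second jobs -> {cycle_based_utilization(d):.0%} utilization")
```

Under these assumptions a farm full of 15-second jobs would run at only 25% of its potential, which is the waste an event-driven dispatcher avoids.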

Figure 1: EDA Job Mix for Different Organizations

If the preponderance of small jobs still seems hard to believe, keep in mind that these figures cover an entire organization. Some engineering groups, such as physical design, do not themselves generate many short jobs, but other parts of the organization, such as design verification or library characterization, do. This underscores the importance of running the entire job mix from a single queue, so that the scheduler has both short-running and long-running jobs to trade off against each other. In practice, however, many semiconductor organizations adapt to their existing scheduler's inability to handle short jobs efficiently by batching small jobs into larger ones, which poses additional challenges:

- Jobs may wait longer before they start to run, because the scheduler penalizes them for being long-running jobs.
- Jobs take longer to run, because there is no (or limited) opportunity to run the smaller jobs in parallel on different machines.
- Overall throughput of the server farm is reduced, since there may be thousands of cores but only dozens of jobs to run.

For instance, if a large workload of one-minute jobs is run on 60 servers, then on average a job finishes approximately every second. This means that machines and licenses free up quickly for later high-priority jobs, even those that might

require many cores simultaneously, such as running place and route with a modern EDA tool. Furthermore, if the scheduler can reserve machines for large jobs requiring multiple servers, even a job that requires 10 servers will normally only have to wait about 10 seconds for them to become available. Without reservation, of course, 10 servers will never free up at the same time, as the scheduler would immediately launch a new job on each freed server. This is why a large number of short jobs, paradoxically, also makes scheduling large jobs easier.

NetworkComputer: Addressing EDA Job Scheduling Challenges

Performance & Scalability

Runtime's NetworkComputer is a full-featured, agile scheduler optimized for today's EDA workloads. Probably the most important difference between NetworkComputer and other popular schedulers is its event-driven architecture, which allows it to schedule a new job immediately as compute resources and software licenses become available. Traditional cycle-based schedulers operate differently: they have a heartbeat, typically a minute, at which they wake up, see what resources have been freed and what new jobs have arrived in the queue, and dispatch jobs according to the resources available at the cycle boundary. If the heartbeat is a minute, there will be an average delay of 30 seconds before a job is launched. Figure 2 illustrates the elapsed-time impact of cycle-based dispatch latency on the turn-around time of 100,000 jobs running on 500 cores, with an average of 5 minutes per job. The results clearly illustrate NetworkComputer's performance advantage over cycle-based alternatives, with a turn-around-time advantage ranging from ~20 minutes to ~2.5 hours for a single iteration. This advantage has a compounding effect when the total number of jobs and the expected number of iterations in a typical semiconductor design environment are taken into consideration.
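The Figure 2 scenario can be sanity-checked with back-of-the-envelope arithmetic. The workload parameters come from the text (100,000 jobs, 500 cores, 5 minutes per job); the assumption that a cycle-based scheduler adds an average delay of half a heartbeat per job, and the heartbeat values tried, are mine:

```python
# Sanity check of the cited turn-around-time range (assumptions mine,
# not the paper's measured data): each job's effective runtime grows by
# the average dispatch delay, i.e. half the scheduler heartbeat.

JOBS, CORES, JOB_MIN = 100_000, 500, 5.0

def makespan_hours(avg_dispatch_delay_s: float) -> float:
    """Idealized makespan: total work divided evenly across cores, with
    the per-job dispatch delay added to every job's runtime."""
    per_job_s = JOB_MIN * 60 + avg_dispatch_delay_s
    return JOBS * per_job_s / CORES / 3600

baseline = makespan_hours(0)            # event-driven, ~zero dispatch delay
for heartbeat in (12, 60, 90):          # assumed cycle lengths, in seconds
    extra = makespan_hours(heartbeat / 2) - baseline
    print(f"{heartbeat:>3}s heartbeat -> ~{extra * 60:.0f} extra minutes")
# -> ~20, ~100, and ~150 extra minutes respectively
```

Under these assumptions, heartbeats between roughly 12 and 90 seconds reproduce the ~20-minute to ~2.5-hour spread quoted above.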

Figure 2: Turn-Around-Time Impact of Job Scheduler Dispatch Latency

The difference between an event-driven and a cycle-based scheduler is minimal for large jobs, since a delay of a minute is insignificant compared to a job that lasts several hours. Now consider the case when the same workload is split into many short jobs. In this scenario, what matters is not only the scheduler's efficiency and its ability to keep the server farm loaded, but also the delay associated with scheduler latency and dispatch time. When each short job is delayed 30 seconds or more, there is a compounding effect: not only does the current job have to wait, but so does every later job that uses the same resources. Another consideration is that there are simply many more jobs to run if the same workload is done as short jobs rather than large ones. In this scenario, scheduler overhead becomes a critical factor, as it adds up quickly over a large number of jobs. The scheduler's memory footprint and dispatch latency are also important considerations, as they affect the scheduler's resource consumption and scalability. NetworkComputer has been designed specifically for EDA job mixes as a versatile, full-featured, high-performance scheduler. Featuring the lowest dispatch latency, NetworkComputer's small memory footprint enables it to handle over 2M jobs on a 32-bit server and over 5M on a 64-bit server. Its event-driven architecture allows it to respond immediately when resources such as servers or software licenses are freed up.

End-User Benefits

Two communities of users benefit greatly from NetworkComputer's features. The first is the actual chip designers, who create the jobs and wait for them to complete. NetworkComputer significantly reduces the delay between creating a job and getting results back, improving designer productivity and user experience. Designers are no longer under pressure to batch up a lot of work into big workloads, as they would be with cycle-based schedulers. Furthermore, NetworkComputer is fast enough that interactive users do not see the difference between running locally and running on a server farm. The second benefit is at the enterprise and IT level. NetworkComputer achieves higher utilization of hardware and software license resources than other schedulers. At the margin, this means either less hardware and fewer software licenses, or faster turnaround for the same investment. It also means that designers no longer need to spend their time trying to beat the scheduler by doing unusual things such as adapting their methodology to the scheduler's limitations. In fact, about the only change needed is to break jobs up into more short jobs, which is behavior to be encouraged. Two additional features of NetworkComputer help here: controllability and observability. Management needs to be able to dynamically set and alter policy as to which jobs are the most critical, and then see that the changes are effective. At the same time, it is important that every user can see that scheduling is being done fairly and understand what is delaying their jobs. It is much more efficient for an organization to schedule a single queue of jobs onto all the available hardware than to compensate for the limitations of the scheduler with multiple queues, dedicated pools of hardware, and dedicated software licenses.
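The single-queue argument can be illustrated with a small thought experiment (the workloads and pool sizes are hypothetical, not from the paper): two groups with unequal load finish sooner sharing one pool than with dedicated half-size pools:

```python
import math

# Toy illustration (numbers are mine): the same total workload on one
# shared pool of cores versus two dedicated pools of half the size.

def makespan(n_jobs: int, cores: int, job_s: float = 10.0) -> float:
    """All jobs are equal length, so the pool runs ceil(n/cores) rounds."""
    return math.ceil(n_jobs / cores) * job_s

busy, quiet = 900, 100                    # jobs from two engineering groups
shared = makespan(busy + quiet, 100)      # single queue, 100 cores
dedicated = max(makespan(busy, 50),       # two pools of 50 cores each:
                makespan(quiet, 50))      # the busy pool is the bottleneck

print(f"shared pool: {shared:.0f}s   dedicated pools: {dedicated:.0f}s")
# -> shared pool: 100s   dedicated pools: 180s
```

The dedicated split is 80% slower here because the quiet group's pool sits mostly idle while the busy group's pool is overloaded, which is exactly the underutilization the next paragraph describes.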
Inevitably, these dedicated resources turn out to be heavily underutilized, resulting in additional cost for the organization in both dollars and delay. As an example, one company was unable to fully utilize its software licenses with its existing scheduler, maxing out at around 80%. NetworkComputer brought software license utilization close to 100% using the exact same compute resources and software licenses. Furthermore, as the company acquired more hardware and software licenses, it continued to achieve near-100% utilization on its investment. (Figure 3)
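To put the utilization gain in perspective: only the ~80% figure comes from the text, while the pool size and the 98% target below are assumptions for illustration:

```python
# What-if arithmetic (only the 80% starting point is from the paper):
# raising license utilization shows up either as extra throughput or as
# licenses that no longer need to be bought.

LICENSES = 1000                 # assumed license pool size
OLD_UTIL, NEW_UTIL = 0.80, 0.98

effective_old = LICENSES * OLD_UTIL   # license-slots doing useful work
effective_new = LICENSES * NEW_UTIL

extra_throughput = effective_new / effective_old - 1   # ~22.5% more work
equivalent_pool = effective_old / NEW_UTIL             # pool matching old output

print(f"~{extra_throughput * 100:.1f}% more work from the same {LICENSES} licenses,")
print(f"or the old workload with only ~{equivalent_pool:.0f} licenses")
```

In other words, closing an 18-point utilization gap is roughly equivalent to buying a fifth more licenses.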

Figure 3: NetworkComputer Capacity Utilization

Another customer analyzed their existing scheduler against NetworkComputer for dispatching a large workload of 300,000 jobs. The existing scheduler ran the workload in about 39 hours; NetworkComputer ran the same workload on the same hardware in about 10.5 hours, against a theoretical limit of 9.5 hours. (Figure 4) Put another way, NetworkComputer required only about a quarter of the compute resources and software licenses of the existing scheduler, resulting in significant financial savings.

Figure 4: NetworkComputer vs. Competition TAT Advantage

Finally, consider the example of characterizing a cell library with 500 cells at 3 voltages and 16 process corners. That is a total of 24,000 jobs, all of them short. If

each job is delayed by 45 seconds due to cycle-based scheduling, the delays account for 300 hours of additional server time. How much turnaround delay this introduces depends on the number of servers allocated to the project; assuming a customer is using 20 servers, the dispatch delay alone adds an extra 15 hours to the turnaround time. Meanwhile, the large jobs being processed on the same hardware can still be scheduled fairly: if a big, long-running place-and-route job is launched, the reservation system can quickly acquire all the necessary hardware and software licenses and launch the job in a timely manner.

While the advantages mentioned above apply in the majority of cases, there are corner cases where customers may not fully realize them. These include:

- Compute farms with large core counts: such farms may already have an excellent job turnover rate. Additional small jobs formed by breaking up larger ones will bring only a limited improvement, which needs to be weighed against the cost of refactoring the jobs and any scheduler overhead.
- Software license checkout time: checking out a license takes a finite amount of time, which can become significant for short jobs. The added load on the license daemon should also be considered.
- Uniform workloads may be easier to optimize separately: for example, a design verification workload may benefit from being run on a separate cluster, since the normal costs of a dedicated cluster can be mitigated by design verification's ability to use all the resources all the time.

Summary

NetworkComputer is a full-featured, high-performance scheduler specifically designed to handle EDA job mixes as efficiently as possible, maximizing the use of compute resources and software licenses while minimizing latency. As shown above, most jobs in semiconductor development groups are short, or can be made short by breaking up large jobs that have been bundled together. In that environment, NetworkComputer runs a representative job mix in about one quarter of the time of cycle-based schedulers.