The Importance of Software License Server Monitoring


NetworkComputer: Meeting The Job Scheduling Challenges of Organizations of All Sizes

White Paper

Introduction

Every semiconductor design group uses a job scheduler of some sort, whether it is specified by a corporate IT department or selected by the group itself. At a high level, the function of a job scheduler is simple: be aware of what is in the queue and what resources are available (hardware and software licenses), and make good decisions about which job to schedule when. In practice, a number of subtler issues are just as important: job mix, prioritization and ease of support, to name a few.

While every company is unique, the issues surrounding job scheduling are often related to the size of the company, or more accurately the size of the compute farm and design teams. Small companies with limited resources (100-500 cores) tend to focus on using their job scheduler to solve a single problem, such as maximizing the use of their precious simulation licenses. In this case, job schedulers are typically managed by the designers themselves using a commercial or freeware solution, or even internally developed scripts. With limited resources, small companies are well aware of the tradeoff between investing effort in the scheduler and doing the real work. There is no enthusiasm for bringing an expensive enterprise-class job scheduler into the organization, as it would be both cost and resource prohibitive. The most important factor in a job scheduler for this type of organization is making optimal use of the limited number of software licenses, since buying more is rarely an option. They therefore rely heavily on every feature a scheduler offers to maximize their software license usage.

A medium-sized company (500-3,000 cores) depends more on the visibility and transparency of what is being scheduled, since there are likely multiple design teams competing for the same resources, while the administration and management of job schedulers falls on system administrators shared with several other groups.
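The core loop described above, matching queued jobs against available cores and licenses, can be sketched in a few lines of Python. This is a minimal illustration only: the names (`Job`, `schedule_pass`, the license features) are hypothetical, and a real scheduler layers prioritization, fairness and pre-emption on top of this greedy pass.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int      # cores the job needs
    license: str    # license feature the job needs

def schedule_pass(queue, free_cores, free_licenses):
    """One greedy pass: dispatch every queued job whose cores AND
    license feature are currently available; return the dispatched jobs
    and the cores left over."""
    dispatched = []
    for job in list(queue):
        if job.cores <= free_cores and free_licenses.get(job.license, 0) > 0:
            free_cores -= job.cores
            free_licenses[job.license] -= 1
            queue.remove(job)
            dispatched.append(job)
    return dispatched, free_cores

queue = [Job("sim1", 1, "vcs"), Job("char1", 2, "liberate"), Job("sim2", 1, "vcs")]
done, left = schedule_pass(queue, free_cores=3,
                           free_licenses={"vcs": 1, "liberate": 1})
# only one "vcs" license exists, so sim2 stays queued even though a core is free
```

Even this toy version shows why license awareness matters: a scheduler that looked only at cores would have dispatched sim2 and left it stalled waiting for a license.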
With multiple teams competing for the same resources, it becomes essential to be able to customize the job scheduler to accommodate varying requirements, and to have the flexibility and visibility to ensure that resource allocations are made consistent with company priorities.

Departments in large organizations (greater than 3,000 cores) face a different set of issues, as their available resources are used for a wide range of purposes. Large organizations are typically supported by a sizable IT department managing a single general-purpose job scheduler. However, a semiconductor design team within a large entity may find itself at odds with the internally supported solution, since its specific job mix might require more queue slots and higher dispatch rates than the job scheduler can handle, as is typically the case with library characterization, design verification and large tape-out jobs. For such a group, the most important factor may simply be the ability to remove its problem jobs from the IT organization's sanctioned scheduler, running a departmental-level scheduler alongside or on top of the corporate solution.

Common Scheduling Problems

Despite the somewhat different use cases of the organizations described above, there is a lot of commonality in how they schedule their workloads, and thus in the type of job scheduler that can address their needs.

Software License Utilization

The biggest problem facing any organization looking to acquire or adopt a scheduler remains maximizing the utilization of its compute resources and software licenses. Semiconductor design is unusual in that the cost of the software licenses required is many times higher than the cost of the hardware on which they run. In this scenario, it is of utmost importance to select a job scheduler that ensures maximum utilization of software licenses.

Complexity

One of the most common problems is the complexity of the job scheduler and the expertise required to administer the solution. For example, maintaining a complex scheduler that requires an expert from the IT organization will not suit a small or medium-sized company, as it takes time away from design resources.

Adaptability

Job schedulers need the correct feature set to handle the typical mix of jobs that a semiconductor design group produces. Typically this is a varied mix consisting of a large number of small jobs that run for short periods of time, as is the case in design verification or library characterization, along with a limited number of extremely large jobs that run for hours or even days (e.g. during the final runs of tape-out). In particular, it is important that the easy-to-schedule small jobs do not cause excessive wait times for the large jobs.
There is also a need for a provision to remove roadblocks when a lower-priority, long-running job is consuming a critical resource. Having the right mix of features to facilitate setting and managing job priorities and organizational policies is an essential requirement for a job scheduler in the semiconductor design environment.

Scalability

If a job scheduler is too slow or cannot scale to a large number of jobs, it invariably penalizes small jobs: they either run too slowly or, in the worst case, cannot even be queued without being manually batched. This encourages designers to combine small jobs into larger ones, preventing them from being executed in parallel and occupying resources such as software licenses for a long time, ultimately having a negative impact on large jobs too.

Visibility

Transparency is another important consideration for job schedulers. If the job scheduler's queue and resource allocations are too opaque, the design groups who are the true users will not be able to adapt their job submission behavior to the current state of the project, queue and resource availability. Some aspects of design, such as functional verification or fault grading, are never complete; however, it makes little sense to flood the compute farm with a large number of these jobs when critical or interactive jobs are waiting. It makes more sense to look at availability and see what other jobs are competing for resources before making job submission decisions or setting self-imposed job limits. All organizations follow this methodology to some extent, running large jobs overnight while keeping some interactive resources free during the day.

NetworkComputer: Job Scheduler for All Organizations

Runtime's NetworkComputer is a full-featured, high-performance yet agile scheduler designed with the EDA environment in mind.

Scalability & Performance

NetworkComputer's efficient memory model allows it to keep track of more than a million jobs at any given time. Unlike cycle-based solutions, where jobs are dispatched at some fixed boundary (e.g. every minute or so), NetworkComputer's event-driven architecture allows it to dispatch jobs the moment available resources are identified. Furthermore, NetworkComputer's dispatch latency is only a few milliseconds. These two factors eliminate the attractiveness of batching small jobs into large ones, reducing the latency until the last of a collection of small jobs completes while increasing the utilization of the server farm by running as many small jobs at the same time as the hardware can accommodate.
Figure 1 illustrates the impact of cycle-based scheduler dispatch latency on the turn-around time for 100,000 jobs running on 500 cores, averaging 5 minutes per job. The results clearly illustrate NetworkComputer's performance advantage over cycle-based alternatives, with a turn-around-time advantage ranging from 23 minutes to roughly 2.5 hours for a single iteration. This advantage has a compounding effect when the total number of jobs and the expected number of iterations in a typical semiconductor design environment are taken into consideration.
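The scale of this effect can be approximated with a back-of-the-envelope model (my own sketch, not taken from the paper): assume the 100,000 jobs run in uniform waves of 500, and a cycle-based scheduler adds one full dispatch interval of idle time per wave. The dispatch intervals tried below are assumptions; the paper does not state which intervals its figure compares.

```python
JOBS, CORES, RUNTIME_MIN = 100_000, 500, 5
waves = JOBS // CORES                   # 200 back-to-back waves of jobs
ideal_min = waves * RUNTIME_MIN         # 1000 min with zero dispatch latency

for cycle_s in (7, 30, 60):             # assumed cycle-based dispatch intervals
    extra_min = waves * cycle_s / 60    # one idle interval per wave
    print(f"{cycle_s:>2}s cycle: +{extra_min:.0f} min over ideal")
```

Under these assumptions a 7-second dispatch interval costs about 23 extra minutes per iteration, matching the low end of the range quoted above, while a one-minute interval costs over three hours; the ~2.5-hour figure would correspond to an interval somewhere in between.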

Figure 1: Turn-Around-Time Impact of Job Scheduler Dispatch Latency

Software License Utilization

NetworkComputer is a software-license-aware job scheduler with full visibility into both the licenses that are in use and those that are available, along with a full understanding of the licenses required by the queued jobs. For example, NetworkComputer will not dispatch a job knowing that the desired software license is not yet available, as doing so would prevent another job, whose license is available, from using the allocated compute resources. Unlike other batch systems, whose knowledge of license availability is limited to a simple count of running jobs, NetworkComputer keeps track of actual license availability for jobs both in and out of the queue.

NetworkComputer also keeps track of whether jobs are actively using the software licenses they have claimed, and can reallocate idle licenses to jobs waiting for them, a procedure known as license reconciliation. This is useful for composite jobs in which scarce licenses are only used during a portion of the job. The resulting holistic view of both hardware and software resources enables efficient job scheduling.

Figure 2 shows the license utilization of the simple workload mentioned above, comparing various job dispatch delays. The results demonstrate the impact of the job scheduler's architecture and efficiency on overall software license utilization. In this scenario, NetworkComputer achieves a software license utilization of 98.4%, compared to maximum license utilizations ranging from 82.7% to 95.1% for the comparable cycle-based implementations. At software license utilizations near 80%, it becomes increasingly difficult to justify the procurement of additional software licenses; most organizations opt instead to increase their license utilization, which can be nearly impossible if the root cause is the dispatch latency of the job scheduler itself.
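License reconciliation, reclaiming a claimed-but-idle license for a job that is waiting on it, can be illustrated with a small sketch. All names here are hypothetical, and in practice the in-use information would come from querying the license server rather than from an in-memory set.

```python
def reconcile(claims, in_use, waiting):
    """claims:  {job: license feature it has claimed}
       in_use:  set of jobs currently holding an actual checkout
       waiting: {job: feature it is waiting for}
    Hand each claimed-but-idle license to a job waiting for that feature."""
    reassigned = {}
    for job, feature in claims.items():
        if job not in in_use:                       # claimed but not checked out
            for waiter, wanted in list(waiting.items()):
                if wanted == feature:
                    reassigned[waiter] = feature    # waiter may start now
                    del waiting[waiter]
                    break
    return reassigned

# A composite job 'post1' claimed a scarce 'calibre' license but is in a
# phase that doesn't use it; 'drc7' is waiting for exactly that feature.
out = reconcile({"post1": "calibre"}, in_use=set(), waiting={"drc7": "calibre"})
```

This is exactly the composite-job case described above: the scarce license is released to the waiting job during the phase in which the holder is not using it.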

Figure 2: Job Scheduler Dispatch Delay Impact On Software License Utilization

Visibility

NetworkComputer has a single job queue for all jobs, the simplest possible approach for users. Other schedulers end up with a proliferation of task- or resource-specific queues (e.g. a tape-out queue), limiting end-user visibility while creating unnecessary confusion. NetworkComputer has a number of mechanisms for displaying both the current job mix running on the hardware and what is waiting in the queue to run. This gives end users a high level of visibility into the status of their jobs, while providing system administrators with the tools to make administrative changes to meet organizational priorities, such as changing job priorities or manually pre-empting a job (Figure 3).

Figure 3: NetworkComputer Provides Users With Added Visibility

Adaptability

Job schedulers need to be equipped with the necessary tools for allocating resources according to some policy. NetworkComputer handles this with a system called Fair Share, which allocates shares of the resources hierarchically: a share for each group, a share for each sub-group, down to the level of individuals. Within each level, the allocated shares can be changed quickly to ensure that the changing requirements of the group are reflected in the allotted shares. Fair Share looks back through a window of time and attempts to ensure that everyone gets their fair proportion of the total resources, in line with their shares during that period. This ensures that when there is resource contention, resources are allocated according to a predetermined policy. Unlike a simple priority system, Fair Share ensures that even low-priority jobs get their share of resources.

It is equally critical that users have full visibility into what share of the compute resources their jobs are getting as it is for management to be able to set priorities as project objectives change or as the design moves through the natural cycle towards tape-out. For sharing to work well in most organizations, it needs to be visible and seen to reflect project and organizational priorities. However, fair does not necessarily mean equal: lower-priority jobs should get fewer resources, but should not be stalled or delayed forever.

Figure 4 illustrates a scenario where a single user is running a total of five jobs, with weights ranging from 30 to 200 assigned to the tasks. In this example, mars is assigned a weight of 200 out of a total of 500, a 40% target share, but in reality it has received only a 16.75% share. As a result, mars is assigned a rank value of 0, making it the highest-priority job for access to compute resources. During the same period, pluto has managed to receive an actual share of 46.90% against its assigned target of 20%, and is hence assigned the lowest rank value of 4.
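The ranking in Figure 4 can be reproduced with a simple sketch: order jobs by how far their actual share falls below their weighted target. Only mars's and pluto's numbers appear in the text above; the other three job names, weights and actual shares below are made up so that the weights total 500 and the shares total 100%.

```python
def fair_share_ranks(weights, actual):
    """Rank jobs for dispatch: the most under-served job (actual share
    furthest below its weighted target) gets rank 0."""
    total = sum(weights.values())
    target = {job: 100 * w / total for job, w in weights.items()}
    order = sorted(weights, key=lambda job: actual[job] - target[job])
    return {job: rank for rank, job in enumerate(order)}

weights = {"mars": 200, "pluto": 100, "venus": 90, "terra": 80, "ceres": 30}
actual  = {"mars": 16.75, "pluto": 46.90, "venus": 14.35, "terra": 12.0,
           "ceres": 10.0}
ranks = fair_share_ranks(weights, actual)
# mars (40% target, 16.75% actual) ranks 0; pluto (20% target, 46.90%) ranks 4
```

With any plausible values for the three made-up jobs, mars stays at rank 0 and pluto at rank 4, since they are by far the most under- and over-served.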

Figure 4: NetworkComputer Provides Full Visibility Into Fair Share Settings

Another important consideration facing today's design organizations is ensuring that difficult-to-schedule jobs, those requiring many cores, specialized hardware, large amounts of RAM or scarce license resources, are not blocked by lower-priority but easier-to-schedule jobs. There are two ways in which this can happen. First, a job that requires multiple cores may wait forever if the job scheduler aggressively schedules smaller jobs on every core as soon as it is freed. Second, a job that requires a specific resource (such as a core attached to an emulator) may be blocked by a job running on that core that did not require any emulation resources. NetworkComputer addresses both of these issues with a reservation concept: it notices that the problem is occurring and leaves cores unscheduled for a time, waiting for enough of them to be freed to dispatch the large job. Reservation also provides a mechanism to ensure that jobs that do not require specialized resources do not start if jobs that require those resources are waiting in the queue.

A powerful feature of NetworkComputer is pre-emption. This capability allows rules to be set in advance, or applied via manual intervention, to prevent long-running jobs that have already started from blocking higher-priority jobs. NetworkComputer features three different methods of pre-emption:

Cancel and Re-Queue: A lower-priority job is killed and put back on the queue to restart at a later time, when the necessary resources are available.

Suspend and Restart: A lower-priority job is suspended while its resources are made available to a pending higher-priority job. Upon completion of the high-priority job, the suspended job is restarted.

Soft Pre-emption: Both hardware and software resources are reserved for waiting jobs.
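The reservation idea, holding freed cores back until a starved large job can fit rather than backfilling small jobs onto them, can be sketched as follows. The names are hypothetical, and a real policy would also bound how long cores may be held back.

```python
def next_dispatch(free_cores, queue):
    """queue is ordered by priority. If the head job does not fit yet,
    reserve the free cores for it instead of backfilling smaller jobs,
    so a many-core job is never starved by a stream of 1-core jobs."""
    if not queue:
        return None
    head = queue[0]
    if head["cores"] <= free_cores:
        return queue.pop(0)   # head job fits: dispatch it
    return None               # hold (reserve) the freed cores for the head

queue = [{"name": "tapeout", "cores": 32}, {"name": "sim", "cores": 1}]
assert next_dispatch(8, queue) is None           # 8 cores held for 'tapeout'
assert next_dispatch(32, queue)["name"] == "tapeout"
```

Without the reservation, the 1-core sim job would grab every freed core and the 32-core tape-out job could wait indefinitely; with it, the scheduler accepts brief idleness in exchange for bounded waiting times on the largest jobs.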
Summary

NetworkComputer is a versatile, high-performance, full-featured scheduler designed with EDA in mind, meeting the requirements and challenges of small, medium and large companies.

NetworkComputer can handle the demands of a small company. Small companies benefit greatly from the end-user visibility afforded by NetworkComputer's single-queue architecture, and can leverage its Fair Share and pre-emption capabilities to maximize their resource utilization without IT administration overhead.

Medium-sized companies with multiple design teams sharing a common job scheduler can leverage NetworkComputer's ability to easily handle a heterogeneous job mix, along with the combination of Fair Share priority, reservation and pre-emption, to ensure that even the most diverse mixture of jobs is scheduled with efficient resource utilization and competitive job latency. Furthermore, NetworkComputer provides the visibility necessary to understand which jobs are waiting and why.

NetworkComputer's unlimited queue size and fast dispatch rate make it an ideal solution for large companies, as it eliminates any need to batch up smaller jobs. NetworkComputer will efficiently schedule small jobs in and around the large jobs while ensuring that high-priority large jobs are not unnecessarily delayed. Reservation and pre-emption allow the largest jobs to run in a timely manner. The mixture of speed and simplicity, visibility and controllability gives a department its own view of the available resources and allows it to tune them to its specific needs. In this scenario, NetworkComputer works on top of, or alongside, the existing IT/corporate-sanctioned job scheduler, offloading certain departmental workloads, eliminating bottlenecks and improving the user experience.

In summary, Runtime's NetworkComputer's high-performance event-based architecture, combined with a rich set of policy management features and single-queue transparency, is the right solution for small, medium and large organizations.