NC State University Model for Providing Campus High Performance Computing Services



Eric Sills, PhD
Director of Advanced Computing, Office of Information Technology, NC State University

"Sustainable" evokes the feeling of perpetual motion - start it and it sustains itself - but sustainability actually requires nearly continuous ongoing work, adaptation, and adjustment.

North Carolina State University's (NC State's) central academic information technology unit began offering high performance computing (HPC) services in July 2003. NC State's HPC activity was implemented urgently in response to the closing of the North Carolina Supercomputing Center (NCSC), which, although it had operated for 13 years, ultimately could not be sustained. Well over half of the use of NCSC resources was by NC State researchers, and there were no central HPC services at NC State. When it was announced that NCSC would close at the end of June 2003, it was clear to NC State administrators that they faced a serious situation: a large number of graduate student thesis research projects, and a large number of valuable research grants, depended on the availability of HPC resources.

The primary compute resource at NCSC in 2003 was a large IBM SP system that was three years old and was being purchased on a five-year payment schedule of about $2M per year, with additional operational costs in excess of $1M annually for system software, power, and cooling. The total annual NCSC budget was more than $5M, with much of the remaining funds used to cover staff costs for an operations group, a systems group, and a scientific support group. It was immediately clear that attempting to replicate the cost structure of NCSC was not feasible for NC State. A lower cost alternative was needed - and quickly.

A group of researchers and academic IT administrators looked at cost models and determined that a minimal HPC service using commodity servers to create a Linux cluster could be provided to the campus with $1.5M in annual funding. NC State's Chancellor, Provost, and Vice Chancellor for Research reviewed the proposal and agreed to provide annual funding of $800K (only about half the funding that appeared to be needed for a minimal service). The Provost's Office and the Research Office each agreed to provide $345K annually, with the remaining $110K provided by the central academic IT unit (which at the time reported to the Provost). This base HPC funding provided for three positions (one primarily systems focused, one primarily applications focused, and an architect spanning systems and applications), a modest suite of system and application software, and compute and storage hardware.

Rather than purchase a single cluster financed over a number of years, the strategy adopted was to incrementally add to the cluster each year, buying hardware with three-year warranties and retiring hardware as it fell out of warranty (or became obsolete for HPC use).

In part due to limited data center space and in part due to performance characteristics, blade servers were selected for the cluster compute hardware.

It was recognized that NC State's base HPC funding level was insufficient to provide the capacity to meet existing HPC needs. A Partners Program was developed to leverage the base HPC investment by attracting faculty partner participation to expand the central resource, rather than having faculty buy small departmental or individual group clusters (of which there were already many on campus). Over time, the central HPC hardware funds have increasingly been used to provide infrastructure - racks, chassis, networking, and storage - to support partner-purchased compute resources. The agreements made with partners, in the form of memoranda of understanding, specify that partners can use the quantity of resources they purchase as much as they need; when those resources are not being used by the partner owner, they are made available to the general campus HPC user community. Overall, our experience has been that about half of the cycles from partner resources are used by the partners and about half are available for use by the general campus community.

NC State HPC services are available at no charge to all NC State faculty members using a fair share scheduling algorithm. All projects receive an equal base share based on the quantity of cluster resources purchased with base funds. Partners receive additional shares based on the quantity of resources they have added to the cluster. Partner participation in 2003 was small, but increased quickly and has been about $500K annually over the last few years [detailed annual reports are available online at http://hpc.ncsu.edu/documents/annual_reports]. Since 2003, the HPC base funding experienced an initial period of decline but has recently been stable at about $690K annually.

In 2004, NC State's central HPC group began a collaboration with NC State's College of Engineering to provide application delivery services to student-owned computers. The objective was to give students access to the applications they needed for their work anywhere their computers had a network connection, regardless of the operating system being used. This was motivated by the increasingly large fraction of students arriving on campus with their own computers. The ability to deliver applications to students' computers enhances the value of those computers and increases safety, as students are able to work effectively from their dorms and apartments rather than travel, potentially late at night, to campus computer labs. Distance education was another motivation: coming to a campus computer lab is typically not feasible for distance education students, which limits the courses that can be offered through distance education. This HPC/Engineering collaborative project was named the Virtual Computing Lab (VCL), the objective being to deliver the computer lab virtually to students wherever they were, whenever they wanted, on the platform of their choice. VCL services are also available to faculty and staff to serve their needs.
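Before turning to the synergy between VCL and HPC, it may help to make the fair share model described above concrete. The following is a minimal sketch in Python, using hypothetical project names, node counts, and a hypothetical helper function; it is an illustration of the idea, not NC State's actual scheduler configuration.

    # Hypothetical sketch of the partner fair-share idea: every project gets an
    # equal slice of the base-funded capacity, and partners receive additional
    # share proportional to the nodes they have contributed.

    def compute_shares(projects, base_nodes, partner_nodes):
        """Return a dict mapping project -> fair-share weight.

        projects      : list of project names
        base_nodes    : number of nodes purchased with base funds
        partner_nodes : dict mapping project -> nodes contributed by that partner
        """
        base_share = base_nodes / len(projects)              # equal split of base capacity
        return {p: base_share + partner_nodes.get(p, 0)      # plus the partner's own nodes
                for p in projects}

    # Example with made-up numbers: 100 base-funded nodes, three projects,
    # one of which contributed 40 partner nodes.
    shares = compute_shares(["geo", "chem", "physics"], 100, {"chem": 40})
    print(shares)   # geo and physics get ~33.3; chem gets ~73.3

A scheduler using weights like these lets a partner consume at least the capacity it purchased while leaving idle partner cycles available to the rest of the campus community.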

While it may not be immediately obvious that this VCL service complements HPC, the two services turn out to be extremely synergistic. The concept of delivering applications on demand had already been developed, but what the HPC team recognized was that the architecture and tools used for managing the commodity cluster could also be used effectively to deploy any arbitrary image onto a cluster node - not just the standard HPC compute node image. The College of Engineering team developed a web interface that manages authentication and resource scheduling while connecting to the HPC cluster management tool on the back end to handle resource provisioning for Linux or Microsoft Windows based environments.

While HPC demand by NC State researchers consistently exceeds available capacity, the imbalance grows worse during course breaks and summers, when faculty and graduate students have more time to focus on their research. With the addition of VCL hardware resources - identical to HPC hardware resources and managed with the same tools - the VCL resources can be dynamically reallocated to HPC use when not needed by students, simply by loading the HPC compute node image in place of the student application image and automatically adjusting the network connections for the blade servers. Since 2004, the number of VCL server resources has grown to a size similar to the number of HPC server resources. While VCL demand does not disappear over summers and course breaks, it is significantly reduced, allowing the HPC resource pool to be significantly expanded during these periods of increased demand for HPC resources.

Another synergy between HPC and VCL is staff resources. Even with the significant expansion of hardware since 2003 (nearing 2,000 physical servers distributed across three data centers), these systems are still managed by one systems person using the freely available cluster management tool xCAT. This scale is also a result of the strategy of incrementally adding hardware rather than making large monolithic cluster purchases. There are continuously a few new blade servers being added to the cluster - a steady stream of work, not the tidal wave of work associated with the installation of a new multi-hundred node cluster.

While not part of the original objectives for VCL, VCL has provided significant additional HPC capabilities through dynamic application delivery. The most widely used is the ability for HPC users to reserve a dedicated login node for resource intensive interactive tasks, such as visualization or interactive meshing. Previously, these tasks were performed on shared login nodes and degraded performance both for other users and for the user of the resource intensive application. VCL also provides the capability to dynamically provision a cluster on demand, loading a specified image onto a number of nodes. Previously, requests from HPC users to test customized operating systems were not feasible to accommodate, but VCL allows this to be done routinely and by the user, without requiring support staff time. A user can generate an image with a customized operating system and have it loaded on any number of physical servers through a single web interface request.

In retrospect, what has been developed with VCL is a cloud computing environment built on a commodity HPC hardware architecture. Today the VCL-managed cloud delivers both HPC services and instructional applications, and it is likely to soon deliver administrative applications as well.
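The node repurposing described above can be pictured as a thin orchestration layer on top of the cluster management tool. The sketch below is a rough illustration in Python, assuming xCAT-style commands and made-up node and image names; it is not the actual VCL provisioning code, and it omits the accompanying network reconfiguration step.

    import subprocess

    # Hypothetical sketch: repurpose an idle VCL blade as an HPC compute node by
    # asking the cluster management tool (xCAT-style commands assumed) to load a
    # different image and reboot the node. Node and image names are made up.

    HPC_IMAGE = "rhel-hpc-compute"      # assumed name of the HPC compute node image
    VCL_IMAGE = "win-student-apps"      # assumed name of a student application image

    def reprovision(node: str, image: str) -> None:
        """Point the node at a new OS image and power-cycle it."""
        subprocess.run(["nodeset", node, f"osimage={image}"], check=True)
        subprocess.run(["rpower", node, "boot"], check=True)

    # During a course break, move idle VCL blades into the HPC pool ...
    for node in ["blade101", "blade102"]:
        reprovision(node, HPC_IMAGE)

    # ... and give them back when the semester starts.
    for node in ["blade101", "blade102"]:
        reprovision(node, VCL_IMAGE)

Because the same image-loading mechanism works for any image, the same step serves both the seasonal VCL-to-HPC reallocation and the on-demand clusters with customized operating systems mentioned above.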
The VCL provides additional HPC resources during periods of peak HPC resource demand while allowing student application delivery to leverage HPC staff expertise. Both services can be provided from the shared environment more cost effectively than if they were implemented in separate silos.

Sustainability

There are a number of important points that have enabled the NC State HPC activity to be sustained for the past seven years and that appear to position the activity well to be sustained moving forward.

Base Funding - In spite of some early decreases, base funding for HPC has remained relatively constant. This has allowed the hiring of key staff and the creation of a hardware nucleus to attract additional investment. Having some staff support is important in attracting HPC partners, as is having some compute and storage capability onto which partners can build.

Incremental Expansion - The strategy of incrementally expanding the HPC cluster, rather than financing the purchase of larger systems, has yielded a number of benefits: the work of installing new hardware is a relatively small, nearly constant effort; newly released, improved hardware becomes available within the cluster very quickly; and faculty with grant funds can add cluster resources at any time, not just at the discrete points when a new cluster is being acquired.

Multi-use Cloud Computing Environment - Providing student application delivery services and HPC services from a common hardware environment managed by shared staff allows for higher hardware utilization, lower staff cost, and greater diversity of funding sources. For the cost of sharing a systems person, HPC services gain significant additional compute resources during peak demand periods.

Valuable Services - It is critical that the HPC services meet the needs of researchers and enable them to make good progress. One objective of providing central HPC services is to make researchers more productive - their group's effort can be devoted to research rather than diverted to systems administration and maintenance of a local cluster - and to help them win new grants rather than request no-cost extensions. This productivity is critical for maintaining the base HPC funding, and HPC users' total active grant values, annualized grant values, and annualized F&A values are all important metrics for NC State's HPC activity.

Cost Effective Services - The services provided to NC State researchers by NCSC were much better in several regards than the services currently available on campus: there was a much more extensive software library, much broader and deeper applications support, and (relative to the time) a higher level of computational capability - but there was also a much higher annual cost. NC State's campus HPC services have sought to find cost effective compute and storage hardware that faculty partners can afford to augment, to hire a small staff with very broad skills rather than a large staff of highly specialized positions, and to be very selective in making software purchases.

Partners - The most important factor in the sustainability of the campus HPC resources is the partners: faculty who invest their funds to expand the central resources. Partners have purchased the vast majority of the nodes in the HPC cluster and have also added significant quantities of storage to the system. Partners also become advocates for the central HPC services within their colleges and departments, recruiting additional partners and maintaining support for the base HPC funding.

Having sustained the campus activity for seven years does not make it sustainable in its current form forever. There are significant budget pressures that require activities to continually demonstrate their value and cost effectiveness. Moving to a cloud computing model based on a commodity HPC cluster architecture has positioned NC State's campus HPC services well for the immediate future. However, changes will come, and it will be important to remain agile and to continue to adapt to new opportunities to deliver needed HPC services more cost effectively.