UC High Performance Computing
Position Description (PD)
HPC System and Application Support Specialist - NeSI Fixed Term
Position Number: 18962
July 2015

Our Vision

We have a vision of people prepared to make a difference: tangata tū, tangata ora. Our mission is to contribute to society through knowledge in chosen areas of endeavour by promoting a world class learning environment known for attracting people with the greatest potential to make a difference. We seek to be known as a University where knowledge is created, critiqued, disseminated and protected and where research, teaching and learning take place in ways that are inspirational and innovative. Looking towards 2023, the 150th anniversary of our founding, the primary components of our strategy are to Challenge, Concentrate and Connect.

Purpose of UC High Performance Computing

The purpose of UC High Performance Computing (HPC) is to deliver HPC services to the research community across the University and to promote the reputation of the University of Canterbury (UC) as a centre of expertise in HPC.

New Zealand eScience Infrastructure Collaboration

The New Zealand eScience Infrastructure (NeSI, https://www.nesi.org.nz/) is a collaboration of five institutions: the University of Auckland, the University of Canterbury, the National Institute of Water and Atmospheric Research Limited, Landcare Research New Zealand Limited and the University of Otago. NeSI, working in partnership with the Crown, provides High Performance Computing (HPC), analytics and consultancy services to the NZ Research sector, Government Initiatives/Agencies and Industrial Research. NeSI's primary aim is to grow the computing and analytics capability of researchers to ensure New Zealand's future prosperity.

Role, Purpose and Scope

This position consists of two roles in the NeSI Collaboration:

System Engineer (0.5 FTE): The HPC System Engineer works as a technical expert in the NeSI Platforms Service Line and a trusted adviser to the NeSI Platforms Manager.
He/she maintains a keen awareness of developments in the science and HPC arena, and will contribute to strategies, plans and projects that enhance NeSI services. Human Resources rs_msc35 Page 1 of 9 Date issued: 17-Jun-15
This position is responsible for the operation and availability of NeSI HPC Platforms and is expected to work with ICT Services in UC to obtain and monitor the infrastructure required by the HPC platforms in UC. This position also works with other NeSI colleagues to support HPC users. There are opportunities for this position to be involved in the decommissioning of existing NeSI HPC Platforms, and in the procurement and commissioning of future HPC Platforms.

Application Support Specialist (0.5 FTE): The Application Support Specialist is a member of the Application Support Team in the NeSI Solutions Service Line. The team supports researchers directly by compiling, deploying and tuning research-related software on one or more NeSI HPC platforms. The team is responsible for the NeSI Service Desk function and works with members of the NeSI Platforms Service Line to resolve both internal and external incidents or service requests, which usually relate to machine access, job submission, the use of specific research-related software installed on NeSI HPC platforms, or the setup of a workflow. The team is also responsible for identifying and escalating research project opportunities to the Computational Science Team.
Key Relationships

Reporting Relationships

Responsible to: UC HPC Director
Reports to: NeSI Site Manager (Staff Management), NeSI Solutions Manager (Functional Management), NeSI Platforms Manager (Functional Management)
Responsible for: None

Functional Relationships

The HPC System Engineer will develop and maintain excellent relationships with the following colleagues, customers and clients for the purposes stated below.

Internal Relationships

Who the job holder works or interacts with inside the University, with the frequency and purpose of these interactions:

Site Manager - NeSI (on demand): staff management reporting.
ICT Services staff (on demand): coordinate with UC ICT Services to obtain ICT services required by NeSI services.
University administration staff (on demand): obtain University services.

External Relationships

Who the job holder works or interacts with outside the University, with the frequency and purpose of these interactions:

NeSI Platforms Manager (at least weekly): obtain new tasks for a new project, provide updates on ongoing projects and business-as-usual activities, surface new opportunities solicited from researchers, and escalate issues that require the Management Team's intervention.
NeSI Platforms Service Line members (weekly): inform other members of the NeSI Platforms Service Line of current workload and system issues, and provide assistance to others when required.
NeSI Solutions Manager or Application Support Team Lead (at least weekly): obtain new tasks for a new project, provide updates on ongoing projects and business-as-usual activities, surface new opportunities solicited from researchers, and escalate issues that require the Management Team's intervention.
NeSI Application Support Team members (weekly): inform other members of the Application Support Team of current workload and achievements on research projects, and provide assistance to others when required.
NeSI team members (on demand): support other NeSI team members to deliver NeSI services and projects.
End users of NeSI services (on demand): work with researchers who use NeSI services to optimise their use of those services.
IT Service/Infrastructure Providers (on demand): coordinate with external IT service providers to obtain services or infrastructure required by NeSI services.

Salary Range

This position is full-time (nominally 37.5 hours per week = 1 FTE) and for a fixed term until 30th June 2018. This position is in Band 6.

Delegations

Human Resources: Has involvement in training / guiding staff within the organisation or manages large and complex projects, but without line management responsibility.

Financial (Budgetary and Expenditure Limits)
Budget Expenditure: No authority to commit to expenditure
Purchase Orders: No authority to approve or issue purchase orders
Purchase Card (P-Card): Monthly limit of $10,000

Correspondence: No authority to sign external correspondence
Key Result Areas

1. Supporting Researchers

Researchers are satisfied when using NeSI platforms and services.

  1. Support NeSI team members in the development, testing and deployment of new and revised NeSI services;
  2. Assist new users to gain access to NeSI HPC platforms and services;
  3. Implement and adapt complex operating systems and related software to meet NeSI team members' and users' requirements;
  4. Provide direct support to researchers with their workflow and scientific application needs to assist them in achieving their research outcomes;
  5. Install, port, optimise, verify and maintain research-related software and its documentation on NeSI platforms;
  6. Respond to incidents lodged by NeSI team members or end users in accordance with their assigned priority, which is based on impact and urgency. Escalate further to the NeSI Site Manager in UC, the NeSI Solutions Manager or the NeSI Platforms Manager if necessary;
  7. Contribute to the documentation and knowledge base used by end users and NeSI team members;
  8. Maintain user identity information in accordance with UC's and NeSI's security and identity policies;
  9. Assist in planning and running workshops and training courses to grow the NeSI user community.

2. NeSI HPC Platforms and Services

NeSI HPC Platforms and services achieve their designed availability and service levels.

  1. Ensure NeSI HPC platforms and services are available at the agreed level of service through appropriate service design and administration;
  2. Undertake and optimise day-to-day system administration of NeSI HPC platforms and their resources, including but not limited to backups, resource utilisation monitoring, and storage and system tuning;
  3. Work with other NeSI team members to maintain high levels of HPC platform utilisation through automating and streamlining workload management wherever possible;
  4. Establish monitoring, alerting and reporting for the performance and utilisation of HPC platforms and services to support capacity planning;
  5. Identify and report on system software and hardware problem trends and suggest appropriate tests and potential solutions;
  6. Undertake problem analysis and resolution in accordance with the approved Problem Management and Change Management processes;
  7. Review diagnoses of software or hardware failures and recommended solutions;
  8. Research, design, specify and plan the technical implementation of new NeSI services and enhancements;
  9. Maintain up-to-date, accurate and complete documentation and configuration data for HPC platforms and services, in accordance with the approved Change Management and Configuration Management processes;
  10. Ensure software used by the HPC platforms is up to date and proactively managed, including pre-release testing and validation against operational systems as part of the approved Change Management process;
  11. Ensure the HPC platforms and services comply with NeSI access and service policies;
  12. Have adequate peer-reviewed disaster recovery documentation in an approved repository.

3. NeSI Application Support Team

Knowledge, skills and practices are well shared and established within the team.

  1. Achieve a high level of understanding of NeSI platforms, research-related software and HPC software tools;
  2. Prepare documentation of activities and procedures to a standard that permits other members of the NeSI Solutions Service Line to perform the task as needed;
  3. Communicate regularly with the NeSI Solutions Manager and other members of the service line regarding progress on duties and projects. Prepare regular reports, as required, and attend regular meetings;
  4. Complete tasks in a timely and accurate manner, and complete timesheets that record activities in the standard form;
  5. Train other staff as required.

4. Vendor Relationship Management

Vendors achieve service levels specified in Service Agreements.

  1. Monitor all services supplied by vendors to support the operation of HPC platforms and services and report on their performance;
  2. Lodge all system incidents with vendors, respond to their follow-up requests for information, and monitor and track progress on these incidents through to resolution. Escalate incidents as appropriate;
  3. Assist vendors' staff to fulfil their duties when they are resolving a reported incident or planning a system upgrade.

5. Support NeSI's overall objective

The NeSI Computational Science Team contributes to NeSI's overall objective.

  1. Maintain positive relationships with users and colleagues in the NeSI Solutions Services Group, NeSI Platform Services Group, and research users, encouraging open and constructive communication at all times;
  2. Promote knowledge-sharing within the NeSI Solutions Services Group, NeSI Platform Services Group, and the wider research community;
  3. Contribute to building and maintaining a positive work environment and culture that enables NeSI colleagues to perform to their potential;
  4. Use NeSI shared tools (as appropriate) to share knowledge of NeSI HPC services and systems, thus avoiding sole expert status and helping to develop the technical expertise of others;
  5. As needed, visit other NeSI sites to maintain understanding of the staff and systems based there;
  6. Adhere to NeSI's standards of professionalism and ethics;
  7. Actively contribute to the development of NeSI plans, technology roadmaps, service designs, and annual budgets, providing expert-level technical advice based on detailed knowledge of HPC, NeSI user needs, and HPC industry developments and practices;
  8. Support NeSI reporting requirements.

6. University Service

The University is assisted with the attainment of its strategic objectives through the provision of commitment and contribution to the wider wellbeing of your College/Service Unit.

  1. Participate in projects in line with your College/Service Unit's strategic objectives.
  2. Keep current with, and comply with, UC systems, policies and procedures and relevant legislation, and constantly look for ways to improve processes and procedures.
  3. Contribute to the University's image as a good place to work and study through the provision of high quality, professional services and showing courtesy and respect in interactions.
  4. Demonstrate an honest respect for and appreciation of biculturalism and diversity by supporting fair treatment and equal opportunities for all.
  5. Contribute to the sustainability efforts of the University through the responsible use of resources and equipment.
  6. Demonstrate commitment to providing students with an educational environment that incorporates the UC Graduate Attributes: employability and entrepreneurship, community engagement, internationalisation, and contribution to a bi-cultural New Zealand in a multi-cultural society.

7. Health and Safety

A safe and healthy working and learning environment is maintained at all times.

  1. Comply with Occupational Health and Safety Legislation and Regulations.
  2. Observe all University of Canterbury safe work policies, procedures and instructions.
  3. Take responsibility for your own health and safety and ensure no action or inaction on your own part harms others in the workplace.

8. Projects or Other Duties

Carry out other duties which may reasonably be required by your Manager from time to time in the course of the University's business and which fit the role's purpose as stated, and for which the position holder is qualified or has received adequate training or instruction.
Professional Development and Review (PD&R)

The University has a Professional Development and Review (PD&R) process, which is undertaken annually. During this process, the Manager and Staff Member will discuss and agree what contribution the Staff Member is expected to make during the review period towards achieving the University's objectives. Objectives (consistent with the Key Result Areas and Behaviours in this Position Description and the Department / Unit / College's Business Plan), performance measures (indicators of achievement) and the support (including development) required by the Staff Member to achieve these objectives will be agreed.

NeSI Team Memorandum of Understanding

MANAGEMENT
Day-to-day functional management and allocation of work will be the responsibility of the Services Manager with advice from the Site Manager. If any dispute or difference arises between a local site employee and the Service Line Manager(s), both parties shall use their best endeavours to resolve such dispute or difference in the spirit of co-operation and good faith. If the parties are unable to resolve the matter themselves, they will participate in mediation with a mutually acceptable third party appointed if necessary by the Board.

PERFORMANCE OBJECTIVE & DEVELOPMENT PLANNING
The Site Manager, in collaboration with the relevant Services Manager(s), will agree performance objectives and development plans with the employee by [date] of each year in accordance with their local processes.

PERFORMANCE & DEVELOPMENT REVIEW PROCESS
The Site Manager will lead the performance review process in accordance with local policy and procedures, with the relevant Services Line Manager(s) providing second level review, feedback, and support where applicable. In accordance with local performance and development review progress meetings, the Site Manager will discuss performance feedback with the employee, taking into consideration input from the Services Line Manager(s).

ANNUAL SALARY REVIEW PROCESS
The annual salary review process shall be administered in accordance with local policies and procedures.

For any dispute or difference between site staff and their employer, local standard employment procedures and policies for resolving disputes shall apply.
UC High Performance Computing
Person Specification
HPC System and Application Support Specialist - NeSI Fixed Term

Education

A bachelor's degree in a computationally intensive field is essential and a postgraduate degree in the same field is preferred.

Technical or Professional Knowledge, Skills, and Experience

Essential

At least five years' experience in the operation of services based on Linux and/or Unix, preferably including some mission-critical services.
At least two years' experience in scientific computing HPC software and operating systems, such as GPFS (parallel file system), TSM/HSM (storage management) and xCAT (configuration management).
Demonstrable capability in software management (including installation, package management, testing and revision control) and user support of HPC applications in science or engineering.
Knowledge of the principles and practices of HPC systems and MPI/OpenMP.
Application programming experience in C/C++ and/or Fortran.
Knowledge of UNIX command-line programming, scripting languages and programming tools in a Linux environment.
Familiarity with batch queuing systems (SLURM etc.).
High-level analytical, conceptual and problem-resolution skills.
Excellent planning, organising and time management skills.
Able to independently represent NeSI.
Ability to identify and satisfy customer requirements.
Ability to work both independently and collaboratively as part of a wider team.
Strong work ethic and an ability to work well under pressure.

Preferred Skills and Knowledge

Prior experience with MPI, OpenMP, CUDA/accelerator programming and using various scientific applications.
Debugger and IBM/Intel compiler experience.
A genuine interest in different sciences.
Employment Checks

Candidates who successfully reach the final stages of the selection process for this role will be required to undergo Employment Checks, inclusive of but not limited to Qualification and Criminal History and/or Police Vetting checks. A satisfactory report from the relevant agency will be a condition of employment. The University will, however, make the final decision as to whether the appropriate standard has been met.

Behaviours

These are the abilities, attributes and personal characteristics that the staff member will need to consistently display in order to achieve their Key Result Areas (KRAs), that is, to do the job effectively. These behaviours describe how someone does the job, whilst KRAs describe what is to be done.

Customer Focus: Developing and sustaining productive customer relationships and making their needs a primary focus of one's actions.
Contributing to Team Success: Actively participating as a member of a team to move the team toward the completion of goals.
Work Standards: Setting high standards of performance for self and others, and assuming responsibility for successful completion of tasks.
Continuous Learning: Actively identifying new areas for learning, seizing learning opportunities, and learning through the application of newly gained knowledge and skills.
Communicating with Impact: Clearly conveying information and ideas through a variety of media in a manner that engages the audience and helps them understand and retain the message.
Valuing Diversity: Appreciating and making best use of the diverse capabilities, insights and ideas of all individuals, and understanding differences in style, ability and motivation.
Building Trust: Interacting with others in a way that gives them confidence in one's intentions and those of UC.
Creativity: Creating novel ideas and approaches to tackle the tasks and responsibilities of one's role.