BENCHMARKING VISUALIZATION TOOL



Transcription:

Copyright 2014 Splunk Inc.
BENCHMARKING VISUALIZATION TOOL
J. Green, Computer Scientist
High Performance Computing Systems, Los Alamos National Laboratory

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Introduction: High Performance Computing @ LANL

High Performance Computing (HPC)
A.k.a. Supercomputing
Providing super-sized computers (distributed systems) for numerically intensive / data intensive computations

Our Nation, Our Lab, Our Mission
- Ensure our goals align with the Lab's mission, which aligns with the National Nuclear Security Administration goals
- Provide state-of-the-art platforms that satisfy stakeholders' requirements
National Security Mission: Nuclear Non-Proliferation; National Safety and Security
LANL's Mission: Apply Scientific Excellence to National Security Missions
HPC's Mission: Enable Scientific Discovery via World Class High Performance Computing Resources

How Fast Is Fast?
Petascale: 96 cabinets, ~9,000 nodes, 100,000s of cores
Looking to Exascale!

Presentation Overview

Sections Covered
- Base-lining for Rapid Intervention via Continual Testing
- Systems Monitoring and Test Data Correlation
- Test Results Analysis

Introduction: Drivers to Automate Testing

Why Test?
- Ensure that Delivered Components Match Performance Specifications
- Test: Validation of Computational Accuracy, Sustained Performance
[ Computer Life Cycle ]

LANL's High Performance Computing Testing Strategy
Proactive Testing to Improve Reliability
- Acceptance Testing
- Integration Testing
- Correctness Testing
- Regression Testing
- Performance Testing
- Software Testing
- Fault Tolerance Testing
- Resilience Testing
- Parameter Studies
[ omg that's a lot of testing ]

Solution Decision Tree
Decide on Solution:
- COTS Solution: DOESN'T EXIST w/o SEVERE MODIFICATION. Do I want to rely on someone else when this thing breaks?
- Status Quo: CONTINUE TO HACK ON RUN SCRIPTS
- WRITE OWN TOOL TAILORED TO OUR NEEDS. Do I really want to be responsible when this thing breaks?
Requirements: Extensibility? Ease of Use? Labor Intensity? Sustainability? Manpower Req'ts? Ease of Deployment? Standard Data Output? Etc.

Data Flow Diagram for New Test Harness
[Figure: data flow diagram; labeled components include DB CONNECT and SPLUNK APP INTERFACE]
Initial design plan for developing a more robust test harness, presented at the Salishan Conference on High-Speed Computing, 2011

Base-Lining for Rapid Intervention via Continual Testing

Categories of Performance Tests
- Memory Bandwidth Tests
- IO Bandwidth Tests
- CPU Speed Tests
- Accelerator Speed Tests
- Infiniband (IB) Tests
- Mini Applications (Total System Tests)

Memory Bandwidth Testing
- Stream Memory Bandwidth Test (McCalpin, et al.)
- Performs 4 computations
- Measures main memory bandwidth per processor
- Triad is the money computation: it indicates the performance expected with typical scientific computations
- Expect tight performance; variances from baseline indicate a problem
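The triad computation and the baseline check described above can be sketched as follows. This is an illustration only, not the test harness's code: the function names, the 5% tolerance, and the 8-byte element size are assumptions, and a real run would use the STREAM benchmark itself (written in C or Fortran) rather than pure Python. STREAM's four kernels are Copy, Scale, Add, and Triad; Triad is nominally counted as three arrays of memory traffic (two reads and one write).

```python
def triad(b, c, q):
    """STREAM-style triad: a[i] = b[i] + q * c[i] (pure-Python illustration)."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def triad_bandwidth_gbs(n, seconds, bytes_per_element=8):
    """Nominal triad bandwidth in GB/s for arrays of n elements.

    Counts 3 arrays of traffic (read b, read c, write a), as STREAM does.
    """
    return 3 * n * bytes_per_element / seconds / 1e9


def deviates_from_baseline(measured, baseline, tolerance=0.05):
    """True when a measurement strays from the baseline by more than tolerance.

    The 5% default is a hypothetical threshold, not a value from the talk.
    """
    return abs(measured - baseline) / baseline > tolerance
```

In a continual-testing setup, each new triad result would be passed through a check like `deviates_from_baseline` against the machine's recorded baseline, and any node that trips it would be flagged for investigation.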

CPU/GPU Performance Testing
- Floating Point Operations Per Second (FLOP/s) is the typical measure of computational performance
- HPL: High Performance Linpack (Dongarra, et al.)
- "FLOPs are free," as per the theme of SC09
- Enter HPCG
- Scalable Heterogeneous Cluster Benchmark (Spafford)
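As a point of reference for the FLOP/s measure above, HPL derives its reported rate from a fixed operation count for solving a dense n-by-n linear system, (2/3)n^3 + (3/2)n^2 floating-point operations, divided by the wall-clock time. A minimal sketch of that arithmetic follows; the function name and the sample inputs are assumptions for illustration, not figures from the talk.

```python
def hpl_gflops(n, seconds):
    """GFLOP/s for an HPL run of problem size n that took `seconds`.

    Uses the conventional HPL operation count: (2/3) n^3 + (3/2) n^2.
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9
```

Because the operation count is fixed by n, two runs at the same problem size can be compared directly by their times, which is what makes the result usable as a baseline.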

I/O in HPC: Potentially the Biggest Bottleneck!
[Figure: bursty file-system performance; baseline represented by yellow line]
- Parallel I/O follows patterns of [n to n] or [n to 1] writes and reads
- Hidden in these system calls are file open, file close, and stat operations, which can add unknown overhead to the operation
- These can create a burdensome load on file systems and overhead for the application if not programmed optimally (e.g., open file handles, metadata overhead if too many files are opened simultaneously)
- File-system testing helps to identify potential failures and load impacts on running jobs
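The two write patterns named above can be illustrated with a serial stand-in. This sketch is an assumption-laden simplification: a loop plays the role of the parallel ranks, ordinary local files replace a parallel file system, and the function and file names are hypothetical. A real application would typically use MPI-IO or a similar library.

```python
import os
import tempfile


def write_n_to_n(directory, blocks):
    """[n to n]: each rank writes its block to its own file.

    Costs one open/close per rank, so metadata load grows with rank count.
    """
    paths = []
    for rank, block in enumerate(blocks):
        path = os.path.join(directory, f"rank{rank}.dat")
        with open(path, "wb") as f:
            f.write(block)
        paths.append(path)
    return paths


def write_n_to_1(path, blocks):
    """[n to 1]: every rank writes its block into one shared file.

    Rank i writes at offset i * block_size; blocks must be equal length.
    """
    block_size = len(blocks[0])
    with open(path, "wb") as f:
        f.truncate(block_size * len(blocks))  # pre-size the shared file
    for rank, block in enumerate(blocks):
        with open(path, "r+b") as f:  # each rank opens the shared file
            f.seek(rank * block_size)
            f.write(block)
    return path
```

Even in this toy form, both patterns perform one open and one close per rank; the difference is whether those operations land on many files or contend for a single one, which is the kind of hidden overhead file-system tests are meant to expose.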

Whole Machine Performance Overview
The supercomputer operates at 197 teraflops/sec. Collectively, it houses 9,856 compute cores and 19.7 terabytes of memory. It will give users working on unclassified projects access to 86.3 million central processing unit core hours/yr. Wolf will initially be working on modeling the climate, materials, and astrophysical bodies and system. [1]
[1] Wolf, a New Supercomputer, Up and Running at Los Alamos National Lab, http://machinedesign.com/news/wolf-new-supercomputer-and-running-los-alamos-national-lab
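The figures quoted above are mutually consistent, which a short calculation confirms. The sketch below assumes a 365-day year of continuous availability and decimal unit prefixes (1 TB = 1,000 GB); the variable names are illustrative only.

```python
CORES = 9_856              # compute cores, from the quoted article
PEAK_TFLOPS = 197.0        # quoted machine performance
MEMORY_TB = 19.7           # quoted total memory
HOURS_PER_YEAR = 24 * 365  # assumes the machine is available all year

# Core hours available per year: one hour per core per wall-clock hour.
core_hours_per_year = CORES * HOURS_PER_YEAR

# Per-core shares, using decimal prefixes (an assumption).
gflops_per_core = PEAK_TFLOPS * 1e3 / CORES
memory_gb_per_core = MEMORY_TB * 1e3 / CORES
```

The core-hour total comes to 86,338,560, matching the quoted 86.3 million, and the per-core shares work out to roughly 20 GFLOP/s and 2 GB of memory per core.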

Systems Monitoring and Test Data Correlation

Monitoring the Test Harness

Consistent Testing
[ Credit for this view: Dominic Manno ]

Utilization Statistics Per Machine

Test Results Analysis

Raw Data Parser
Post Process Raw Test Data:
DateStamp=$Date TestName=$TestName OS=$OS-Version MachineName=$MachineName NumNodes=$NumNodes TestMetric=$Measurement etc.
Must differentiate data by:
- Test Name/version
- System Name
- Resources Used
- Software Versions
Or valid results comparison is impossible!
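A record in the key=value layout shown above can be parsed and grouped for comparison along these lines. This is a minimal sketch, not the actual parser: the function names and the sample values are hypothetical, and it assumes that values contain no spaces.

```python
def parse_record(line):
    """Split a 'Key=Value Key=Value ...' record into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '=' such as a trailing 'etc'
            fields[key] = value
    return fields


def comparison_key(fields):
    """Fields that must match before two results can be compared."""
    return (
        fields.get("TestName"),
        fields.get("MachineName"),
        fields.get("NumNodes"),
        fields.get("OS"),
    )


def comparable(a, b):
    """True when two parsed records describe the same test configuration."""
    return comparison_key(a) == comparison_key(b)
```

Grouping by a key like this before charting keeps a result from one machine, node count, or software version from being plotted against a baseline taken under different conditions.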

Other Important Monitoring Panels

Prototype for New Test Views
MPI BW Communication Visualization Tool Prototype: utilizing a weighted radial line graph to visualize inter-nodal communication speeds. Prabhu Singh Khalsa, Scientist, Los Alamos National Laboratory

Future Plans

Going forward...
- Integrate Fully New Test Harness Database Collection Into Splunk Vis.
- Fully Develop Custom Test Visualizations to Suit Specific Teams' Needs
- Use Monitoring (System / User) Data to Enhance Information
- Team Specific Test Dashboards
- Fully Implement Monitoring Infrastructure Changes to Leverage Scalability Enhancements

Acknowledgements
- Testing is Crucial, Test Development is Iterative / Evolving
- Thanks for Patience from Administrative Teams
- Thanks for Resources from Management / Oversight
- Thanks to Monitoring Team for Infrastructure Improvements
- Thanks to Dominic Manno / Ben Turrubiates, New Mexico Tech: Excellent Work, Diligence, Valuable Contributions
- Craig Idler, Scientist: Enhancements to Gazebo + Pavilion Test Harnesses
- Mike Mason, Scientist: Admin Assistance, Splunk Guidance

Questions?

Special Offer: Try Splunk MINT Express for Free!
Splunk MINT offers a fast path to mobile intelligence. How fast? Find out with a 6-month trial*
- Register for your free trial: http://mint.splunk.com/conf2014offer
- Download the Splunk MINT SDKs
- Add the Splunk MINT line of SDK code and publish**
- Start getting digital intelligence at your fingertips!
*Offer valid for .conf2014 attendees and coworkers of attendees only.
**Trial allows monitoring of up to 750,000 monthly active users (MAUs).

THANK YOU