Performance Analysis and Optimization Tool
|
|
|
- Steven Osborne
- 10 years ago
- Views:
Transcription
1 Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL Performance Analysis Team, University of Versailles
2 Introduction Performance Analysis Develop performance analysis and optimization tools: MAQAO Framework and Toolsuite Establish partnerships Optimize industrial applications
3 Introduction Performance Analysis Understand the performance of an application How well it behaves on a given machine What are the issues? Generally a multifaceted problem Maximizing the number of views = better understand Use techniques and tools to understand issues Once understood Optimize application
4 Introduction Compilation chain Compiler remains your best friend Be sure to select proper flags (e.g., -xavx) Pragmas: Unrolling, Vector alignment O2 V.S. O3 Vectorisation/optimisation report
5 Introduction MAQAO Tool Open source (LGPL 3.0) Currently binary release Available for x86-64 and Xeon Phi Looking forward in porting MAQAO on BlueGene
6 Introduction MAQAO Tool Easy install Packaging : ONE (static) standalone binary Easy to embeed Audience User/Tool developer: analysis and optimisation tool Performance tool developer: framework services TAU: tau_rewrite (MIL) ScoreP: on-going effort (MIL)
7 Introduction MAQAO Tool
8 Introduction MAQAO Tool Scripting language Lua language : simplicity and productivity Fast prototyping MAQAO Lua API : Access to services
9 Introduction MAQAO Tool Built on top of the Framework Loop-centric approach Produce reports We deal with low level details You get high level reports
10 Introduction MAQAO Tool A lot of tools! Which one to use? When Our approach/experience: decision tree Currently working on HPC Multi-node > Node > Socket > Core Classify IO/Memory/MPI/OpenMP/Application PAMDA methodology to be published: 7 th Parallel Tools Workshop
11 Outline Introduction Pinpointing hotspots Functions, loops MPI characterization Code quality analysis
12 Pinpointing hotspots Measurement methodology MAQAO Profiling Instrumentation Through binary rewriting High overhead / More precision Sampling Hardware counter through perf_event_open system call Very low overhead / less details
13 Pinpointing hotspots Parallelism level SPMD Program level SIMD Instruction level By default MAQAO only considers system processes and threads
14 Pinpointing hotspots Display Display functions and their exclusive time Associated callchains and their contribution Loops Innermost loops can then be analyzed by the code quality analyzer module (CQA) Command line and GUI (HTML) outputs
15 Pinpointing hotspots GUI snapshot (1/4)
16 Pinpointing hotspots GUI snapshot (2/4)
17 Pinpointing hotspots GUI snapshot (3/4)
18 Pinpointing hotspots GUI snapshot (4/4)
19 Outline Introduction Pinpointing hotspots Functions, loops MPI characterization Code quality analysis
20 Pinpointing hotspots MPI characterization: Introduction Our methodology Coarse grain: overview, global trends/patterns Fine grain: filtering precise issues Tracing issues Scalability Memory size: can we reduce it? Trace size: can we reduce it? IO s wall: remove it?
21 Pinpointing hotspots MPI characterization: overview Scalable coarse grain analysis Web Visualizer MPI APPLICATION MAQAO profile.js
22 Pinpointing hotspots MPI characterization: key concepts Online profiling Aggregated metrics No traces No IOs (only one result file) Reduced memory footprint Scalable on 100+ procs
23 Pinpointing hotspots MPI characterization: visualization Web based visualizer Only requires a web browser
24 Pinpointing hotspots MPI characterization: analyses (1/5) MPI primitives high level profile (hits,time,size)
25 Pinpointing hotspots MPI characterization: analyses (2/5) Function scattering over time
26 Pinpointing hotspots MPI characterization: analyses (3/5) Probability densities
27 Pinpointing hotspots MPI characterization: analyses (4/5) Topology view (1/2) Selector Force driven topology view Information Panel Color scale 3 D topology
28 Pinpointing hotspots MPI characterization: analyses (4/5) Topology view (2/2) Click on a node in the force layout to display its information Hover a node to see its MPI rank Hover an edge to see its value
29 Pinpointing hotspots MPI characterization: analyses (5/5) Communication matrix (1/3)
30 Pinpointing hotspots MPI characterization: analyses (5/5) Communication matrix (2/3) Zoom
31 Pinpointing hotspots MPI characterization: analyses (5/5) Communication matrix (3/3)
32 Pinpointing hotspots MPI characterization: next steps Fine grained analyses should be: Investigated using (MPI) tracing tools And filtering on specific processes/events of interest detected thanks to this tool
33 Outline Introduction Pinpointing hotspots Functions, loops MPI characterization Code quality analysis
34 Static performance modeling Introduction Main performance issues: Core level Multicore interactions Communications Most of the time core level is forgotten
35 Static performance modeling Goals Static performance model Targets innermost loops source loop V.S. assembly loop Take into account processor (micro)architecture Assess code quality Estimate performance Degree of vectorization Impact on micro architecture ASM Loop 1 Source Loop [email protected] ASM Loop 2 ASM Loop 4 ASM Loop 5 ASM Loop 3
36 Static performance modeling Model Simulates the target (micro)architecture Instructions description (latency, uops dispatch...) Machine model For a given binary and micro-architecture, provides Quality metrics (how well the binary is fitted to the micro architecture) Static performance (lower bounds on cycles) Hints and workarounds to improve static performance
37 Static performance modeling Metrics Vectorization (ratio and speedup) Allows to predict vectorization (if possible) speedup and increase vectorization ratio if it s worth High latency instructions (division/square root) Allows to use less precise but faster instructions like RCP (1/x) and RSQRT (1/sqrt(x)) Unrolling (unroll factor detection) Allows to statically predict performance for different unroll factors (find main loops)
38 Static performance modeling Output example (1/2)
39 Static performance modeling Output example (2/2)
40 MAQAO Tool Thanks for your attention! Questions?
MAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
Performance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power
A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015 1. Context/Motivations
How To Visualize Performance Data In A Computer Program
Performance Visualization Tools 1 Performance Visualization Tools Lecture Outline : Following Topics will be discussed Characteristics of Performance Visualization technique Commercial and Public Domain
Debugging and Profiling Lab. Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma [email protected]
Debugging and Profiling Lab Carlos Rosales, Kent Milfeld and Yaakoub Y. El Kharma [email protected] Setup Login to Ranger: - ssh -X [email protected] Make sure you can export graphics
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems
David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM
-------- Overview --------
------------------------------------------------------------------- Intel(R) Trace Analyzer and Collector 9.1 Update 1 for Windows* OS Release Notes -------------------------------------------------------------------
Load Imbalance Analysis
With CrayPat Load Imbalance Analysis Imbalance time is a metric based on execution time and is dependent on the type of activity: User functions Imbalance time = Maximum time Average time Synchronization
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
OpenFOAM: Computational Fluid Dynamics. Gauss Siedel iteration : (L + D) * x new = b - U * x old
OpenFOAM: Computational Fluid Dynamics Gauss Siedel iteration : (L + D) * x new = b - U * x old What s unique about my tuning work The OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox is a
HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
End-user Tools for Application Performance Analysis Using Hardware Counters
1 End-user Tools for Application Performance Analysis Using Hardware Counters K. London, J. Dongarra, S. Moore, P. Mucci, K. Seymour, T. Spencer Abstract One purpose of the end-user tools described in
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
Building an energy dashboard. Energy measurement and visualization in current HPC systems
Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 [email protected] SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators
A Brief Survery of Linux Performance Engineering. Philip J. Mucci University of Tennessee, Knoxville [email protected]
A Brief Survery of Linux Performance Engineering Philip J. Mucci University of Tennessee, Knoxville [email protected] Overview On chip Hardware Performance Counters Linux Performance Counter Infrastructure
Understanding applications using the BSC performance tools
Understanding applications using the BSC performance tools Judit Gimenez ([email protected]) German Llort([email protected]) Humans are visual creatures Films or books? Two hours vs. days (months) Memorizing
Recent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, [email protected] Abstract 1 Interconnect quality
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Keys to node-level performance analysis and threading in HPC applications
Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION
Chapter 2 Parallel Architecture, Software And Performance
Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program
A Case Study - Scaling Legacy Code on Next Generation Platforms
Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy
Using the Windows Cluster
Using the Windows Cluster Christian Terboven [email protected] aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster
Scalability evaluation of barrier algorithms for OpenMP
Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science
High Performance Computing in the Multi-core Area
High Performance Computing in the Multi-core Area Arndt Bode Technische Universität München Technology Trends for Petascale Computing Architectures: Multicore Accelerators Special Purpose Reconfigurable
High Performance Computing. Course Notes 2007-2008. HPC Fundamentals
High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
Wiggins/Redstone: An On-line Program Specializer
Wiggins/Redstone: An On-line Program Specializer Dean Deaver Rick Gorton Norm Rubin {dean.deaver,rick.gorton,norm.rubin}@compaq.com Hot Chips 11 Wiggins/Redstone 1 W/R is a Software System That: u Makes
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
Matlab on a Supercomputer
Matlab on a Supercomputer Shelley L. Knuth Research Computing April 9, 2015 Outline Description of Matlab and supercomputing Interactive Matlab jobs Non-interactive Matlab jobs Parallel Computing Slides
Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led
Microsoft Enterprise Search for IT Professionals Course 10802A; 3 Days, Instructor-led Course Description This three day course prepares IT Professionals to administer enterprise search solutions using
MAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 [email protected] 1.866.963.0424 www.simplehelix.com 2 Table of Contents
The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver
1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution
Sequential Performance Analysis with Callgrind and KCachegrind
Sequential Performance Analysis with Callgrind and KCachegrind 4 th Parallel Tools Workshop, HLRS, Stuttgart, September 7/8, 2010 Josef Weidendorfer Lehrstuhl für Rechnertechnik und Rechnerorganisation
CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson
CS 3530 Operating Systems L02 OS Intro Part 1 Dr. Ken Hoganson Chapter 1 Basic Concepts of Operating Systems Computer Systems A computer system consists of two basic types of components: Hardware components,
Best practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization
An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended
Instruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
Resource Aware Scheduler for Storm. Software Design Document. <[email protected]> Date: 09/18/2015
Resource Aware Scheduler for Storm Software Design Document Author: Boyang Jerry Peng Date: 09/18/2015 Table of Contents 1. INTRODUCTION 3 1.1. USING
Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
Large-Data Software Defined Visualization on CPUs
Large-Data Software Defined Visualization on CPUs Greg P. Johnson, Bruce Cherniak 2015 Rice Oil & Gas HPC Workshop Trend: Increasing Data Size Measuring / modeling increasingly complex phenomena Rendering
How To Program With Adaptive Vision Studio
Studio 4 intuitive powerful adaptable software for machine vision engineers Introduction Adaptive Vision Studio Adaptive Vision Studio software is the most powerful graphical environment for machine vision
SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, 2012. Abstract. Haruna Cofer*, PhD
White Paper SGI High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems Haruna Cofer*, PhD January, 2012 Abstract The SGI High Throughput Computing (HTC) Wrapper
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)
Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania) Outline Introduction EO challenges; EO and classical/cloud computing; EO Services The computing platform Cluster -> Grid -> Cloud
CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content
Advances in Networks, Computing and Communications 6 92 CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content Abstract D.J.Moore and P.S.Dowland
Data Structure Oriented Monitoring for OpenMP Programs
A Data Structure Oriented Monitoring Environment for Fortran OpenMP Programs Edmond Kereku, Tianchao Li, Michael Gerndt, and Josef Weidendorfer Institut für Informatik, Technische Universität München,
FLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux
White Paper Real-time Capabilities for Linux SGI REACT Real-Time for Linux Abstract This white paper describes the real-time capabilities provided by SGI REACT Real-Time for Linux. software. REACT enables
Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10
Performance Counter Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Hardware Unit for event measurements Performance Monitoring Unit (PMU) Originally for CPU-Debugging
Delivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture
White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive
DB2 for i5/os: Tuning for Performance
DB2 for i5/os: Tuning for Performance Jackie Jansen Senior Consulting IT Specialist [email protected] August 2007 Agenda Query Optimization Index Design Materialized Query Tables Parallel Processing Optimization
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
INTEL PARALLEL STUDIO XE EVALUATION GUIDE
Introduction This guide will illustrate how you use Intel Parallel Studio XE to find the hotspots (areas that are taking a lot of time) in your application and then recompiling those parts to improve overall
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs ABSTRACT Xu Liu Department of Computer Science College of William and Mary Williamsburg, VA 23185 [email protected] It is
Cluster performance, how to get the most out of Abel. Ole W. Saastad, Dr.Scient USIT / UAV / FI April 18 th 2013
Cluster performance, how to get the most out of Abel Ole W. Saastad, Dr.Scient USIT / UAV / FI April 18 th 2013 Introduction Architecture x86-64 and NVIDIA Compilers MPI Interconnect Storage Batch queue
Software Engineering Best Practices. Christian Hartshorne Field Engineer Daniel Thomas Internal Sales Engineer
Software Engineering Best Practices Christian Hartshorne Field Engineer Daniel Thomas Internal Sales Engineer 2 3 4 Examples of Software Engineering Debt (just some of the most common LabVIEW development
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup
Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to
Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda
Objectives At the completion of this module, you will be able to: Understand the intended purpose and usage models supported by the VTune Performance Analyzer. Identify hotspots by drilling down through
Microsoft Private Cloud Fast Track
Microsoft Private Cloud Fast Track Microsoft Private Cloud Fast Track is a reference architecture designed to help build private clouds by combining Microsoft software with Nutanix technology to decrease
Evaluation of CUDA Fortran for the CFD code Strukti
Evaluation of CUDA Fortran for the CFD code Strukti Practical term report from Stephan Soller High performance computing center Stuttgart 1 Stuttgart Media University 2 High performance computing center
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected]
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected] Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.
Vendor Update Intel 49 th IDC HPC User Forum Mike Lafferty HPC Marketing Intel Americas Corp. Legal Information Today s presentations contain forward-looking statements. All statements made that are not
Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation
Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03
An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management
An Oracle Technical White Paper November 2011 Oracle Solaris 11 Network Virtualization and Network Resource Management Executive Overview... 2 Introduction... 2 Network Virtualization... 2 Network Resource
Petascale Software Challenges. Piyush Chaudhary [email protected] High Performance Computing
Petascale Software Challenges Piyush Chaudhary [email protected] High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
An Introduction to Parallel Computing/ Programming
An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European
Optimization tools. 1) Improving Overall I/O
Optimization tools After your code is compiled, debugged, and capable of running to completion or planned termination, you can begin looking for ways in which to improve execution speed. In general, the
Analysis report examination with CUBE
Analysis report examination with CUBE Brian Wylie Jülich Supercomputing Centre CUBE Parallel program analysis report exploration tools Libraries for XML report reading & writing Algebra utilities for report
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens
A Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment
Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment Nuno A. Carvalho, João Bordalo, Filipe Campos and José Pereira HASLab / INESC TEC Universidade do Minho MW4SOC 11 December
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp
Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since
Parallel Algorithm for Dense Matrix Multiplication
Parallel Algorithm for Dense Matrix Multiplication CSE633 Parallel Algorithms Fall 2012 Ortega, Patricia Outline Problem definition Assumptions Implementation Test Results Future work Conclusions Problem
Testing & Assuring Mobile End User Experience Before Production. Neotys
Testing & Assuring Mobile End User Experience Before Production Neotys Agenda Introduction The challenges Best practices NeoLoad mobile capabilities Mobile devices are used more and more At Home In 2014,
