Keys to node-level performance analysis and threading in HPC applications
Thomas GUILLET (Intel; Exascale Computing Research)
IFERC seminar, 18 March 2015

2 Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #

3 Application performance: a multiscale problem
Scales: Microarch, Core, Socket, Node, Cluster
- Multicore: vector ISA, cores, cache hierarchies
- Manycore: new vector ISAs, MPI+OMP?, memory/core?
- Optimization space is getting larger
Goal of this presentation:
- Provide keys to application performance and threading analysis
- Based on characterization & projection experience with full applications

4 Node-level performance
From algorithm to execution:
- Choice of algorithm or scheme -> Source code implementation -> Binary code -> Actual execution
- Programmer: data access patterns
- Compiler: vectorization, code generation
- Architecture: cache behavior, execution pathologies
- Optimizations: memory bandwidth/data reuse; vectorization/code quality
2 main performance factors (at first order):
- Memory (DRAM) bandwidth demand
- Computation: Flops (but also non-flop instructions sometimes), use of execution units
Key questions:
- What are the requirements of my algorithm, in terms of compute vs. memory transfers?
- What performance can I expect?
- Where am I with respect to ideal performance?
- How can I get closer to ideal?

5 Flops, bytes & arithmetic intensity
Arithmetic intensity = Flop/byte: a measure of the balance between compute and ideal data transfer for a particular kernel.

DAXPY (Triad)

    ! one multiply and one add per element
    do i=1,n
      y(i) = y(i) + a*x(i)
    end do

- Read x: 8N bytes
- Read y: 8N bytes
- Compute y: 2N Flops
- Write y: 8N bytes
- Flop/byte = 2/24 ≈ 0.083

3D Stencil (Gauss-Seidel)

    ! in-place update: five adds and one multiply per cell
    do k=1,n
      do j=1,n
        do i=1,n
          x(i,j,k) = ONE_SIXTH * ( &
            x(i+1,j,k) + x(i-1,j,k) + &
            x(i,j+1,k) + x(i,j-1,k) + &
            x(i,j,k+1) + x(i,j,k-1))
        end do
      end do
    end do

- Read x: 8N^3 bytes
- Compute update: 6N^3 Flops
- Write new x: 8N^3 bytes
- Flop/byte = 6/16 = 0.375

Source code level analysis:
- Count floating point operations
- Count bytes (arrays) read & written, assuming perfect reuse (infinite cache): this is the ideal case

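The source-level counting above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python; the helper name `arithmetic_intensity` is illustrative and not part of the original slides, and the counts assume double precision (8 bytes per value) and perfect reuse.

```python
DOUBLE = 8  # bytes per double-precision value (assumption: 64-bit reals)


def arithmetic_intensity(flops_per_elem, bytes_per_elem):
    """Ideal flop/byte ratio, assuming perfect reuse (infinite cache)."""
    return flops_per_elem / bytes_per_elem


# DAXPY: read x, read y, write y; one multiply and one add per element.
daxpy_ai = arithmetic_intensity(2, 3 * DOUBLE)

# 3D stencil: read x once, write x once; five adds and one multiply per cell.
stencil_ai = arithmetic_intensity(6, 2 * DOUBLE)

print(f"DAXPY   flop/byte = {daxpy_ai:.3f}")
print(f"Stencil flop/byte = {stencil_ai:.3f}")
```

Both kernels sit well below 1 flop/byte, so on most current nodes they would be expected to be limited by memory bandwidth rather than by compute.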
6 Compute vs. bandwidth analysis
References: Quantitative System Performance, D. Lazowska, J. Zahorjan, G. Graham, K. Sevcik; Williams et al.
[Figure: roofline plot of log GFLOP/s (performance) vs. log Flop/byte (arithmetic intensity), showing the compute-bound ceiling, the ideal execution at the theoretical Flop/byte and the actual execution at the actual Flop/byte; arrows labeled "Vectorization, code generation" and "Data reuse, cache optims"]
Actual vs. ideal execution:
- Efficiency (% peak) depends on microarch.
- Finite cache size will reduce flop/byte
Measuring data for actual execution:
- GFlop/s derived from code performance: GFlop/s = Gcells/s × Flops/cell
- DRAM bandwidth: Intel VTune Amplifier XE or open source tools; requires root access or special kernel module
- Flop/byte = (GFlop/s) / (GB/s)

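As a rough illustration of the compute vs. bandwidth bound and of the two measurement formulas above, here is a minimal Python sketch. The function names and all machine numbers (300 GFlop/s peak, 80 GB/s DRAM bandwidth, 0.5 Gcells/s, 12 GB/s measured traffic) are hypothetical values chosen for the example, not data from the presentation.

```python
def roofline_gflops(flop_per_byte, peak_gflops, dram_gbs):
    """Attainable GFlop/s: the lower of the compute ceiling and the
    bandwidth-limited slope (flop/byte times GB/s)."""
    return min(peak_gflops, flop_per_byte * dram_gbs)


def actual_flop_per_byte(gcells_per_s, flops_per_cell, measured_gbs):
    """Actual arithmetic intensity from measured throughput and DRAM traffic:
    GFlop/s = Gcells/s * Flops/cell, then Flop/byte = (GFlop/s) / (GB/s)."""
    gflops = gcells_per_s * flops_per_cell
    return gflops / measured_gbs


# Hypothetical node: 300 GFlop/s peak, 80 GB/s DRAM bandwidth.
PEAK_GFLOPS = 300.0
DRAM_GBS = 80.0

# Ideal stencil intensity (6 flops / 16 bytes) lands on the bandwidth slope.
bound_stencil = roofline_gflops(0.375, PEAK_GFLOPS, DRAM_GBS)

# A kernel with 10 flop/byte is limited by the compute ceiling instead.
bound_dense = roofline_gflops(10.0, PEAK_GFLOPS, DRAM_GBS)

# Hypothetical measurement: 0.5 Gcells/s at 6 flops/cell with 12 GB/s of traffic.
measured_ai = actual_flop_per_byte(0.5, 6, 12.0)

print(bound_stencil, bound_dense, measured_ai)
```

In this made-up measurement the actual intensity (0.25) is lower than the ideal 0.375, which is the kind of gap one would expect when a finite cache forces extra memory traffic.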
7 Illustration: GYSELA kernels on Xeon
2 sockets, Xeon E (Sandy Bridge, 2.6 GHz)
This kernel is BW bound when vectorized, but compute bound when not vectorized!

8 Illustration: GYSELA kernels on Xeon Phi
Xeon Phi 7120 (16 GB GDDR, 61 cores, 1.2 GHz)
- Efficiency drops for complex loop bodies
- Smaller caches incur more memory traffic

9 Node-level characterization: Wrap Up
Simple compute vs. bandwidth characterization («roofline»)
- Helps determine max performance expectations
- Helps identify optimization directions
Can be complemented by quick analysis tricks
- Measure time on 1 full node (available bandwidth = $BW_1$), and write: $T_{1,\mathrm{full}} = T_{\mathrm{compute}} + T_{\mathrm{bw}}$
- Measure time on 2 half-filled nodes (available bandwidth = $BW_2 > BW_1$), and write: $T_{2,\mathrm{half}} = T_{\mathrm{compute}} + T_{\mathrm{bw}} \cdot (BW_1 / BW_2)$
- Solve for $T_{\mathrm{compute}}$ and $T_{\mathrm{bw}}$ to estimate the «memory-boundedness» of the app on this architecture
- Also useful for quick projections across similar architectures
General trends on Xeon Phi
- Smaller caches incur more memory demand
- In-order core, complex vector ISA: compiler and code generation matter
So far, we assumed good parallelism (no threading or MPI issues)

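The two-measurement trick is a pair of linear equations in the compute and bandwidth times, so it can be solved in closed form. The sketch below is one way to write that down in Python; the function names and the timings (100 s on one full node, 80 s on two half-filled nodes, with twice the available bandwidth) are assumptions made up for illustration.

```python
def split_compute_bw(t_full, t_half, bw_full, bw_half):
    """Solve  t_full = Tc + Tb  and  t_half = Tc + Tb * (bw_full / bw_half)
    for the compute time Tc and the bandwidth time Tb."""
    if bw_half <= bw_full:
        raise ValueError("the half-filled run must have more available bandwidth")
    ratio = bw_full / bw_half
    t_bw = (t_full - t_half) / (1.0 - ratio)
    t_compute = t_full - t_bw
    return t_compute, t_bw


def project_time(t_compute, t_bw, bw_ref, bw_new):
    """First-order projection to a similar machine with a different bandwidth,
    keeping the compute time unchanged (a strong simplification)."""
    return t_compute + t_bw * (bw_ref / bw_new)


# Hypothetical timings: 100 s on one full node, 80 s on two half-filled nodes,
# where the available bandwidth doubles.
t_compute, t_bw = split_compute_bw(100.0, 80.0, bw_full=1.0, bw_half=2.0)
memory_boundedness = t_bw / (t_compute + t_bw)

print(t_compute, t_bw, memory_boundedness)
```

With these numbers the split is 60 s of compute and 40 s of bandwidth-limited time, i.e. the run is about 40% memory-bound on the full node.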
10 Shared memory: To thread or not to thread?
Why is threading interesting in applications?
- Allows «larger» MPI ranks (for domain decomp.) for the same problem
  - May improve surface/volume ratio
  - Amortizes memory footprint of MPI runtime
- Allows dynamic load balancing for imbalanced applications
What could possibly go wrong?
- Amdahl's law strikes back
  - On computation: getting good coverage is hard
  - On communications
- MPI+X is not intrinsically «better» than MPI: 4x1 vs. 1x4

11 Illustration: CFD application
Configurations with {#ranks} x {#threads} = 24 cores
[Figure: temporal loop wtime [s] (measured vs. Amdahl projection), footprint/core [MB] and app instructions/core, plotted against OMP threads/rank]

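A rough sketch of how an Amdahl-style projection at fixed core count could be computed. The model is an assumption and not necessarily the one behind the figure: when the number of cores is fixed, a rank with t threads holds t times the work of a rank in the pure-MPI run; the threaded fraction of that work (the coverage) is shared among the t threads, while the non-threaded fraction runs on the master thread only. MPI time is ignored. The function name and the numbers are hypothetical.

```python
def projected_wtime(t_pure_mpi, coverage, threads):
    """Projected wall time with `threads` OpenMP threads per rank at fixed
    total core count, relative to the pure-MPI (1 thread/rank) time.

    Each rank holds `threads` times more work than in the pure-MPI run:
      threaded part:     coverage * threads * t / threads = coverage * t
      non-threaded part: (1 - coverage) * threads * t
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return t_pure_mpi * (coverage + (1.0 - coverage) * threads)


# Hypothetical case: 100 s with 24 ranks x 1 thread, 95% threaded coverage.
for threads in (1, 2, 4, 6, 12):
    ranks = 24 // threads
    print(f"{ranks:2d}x{threads:<2d} -> {projected_wtime(100.0, 0.95, threads):6.1f} s")
```

Under this model even 95% coverage leads to a noticeable slowdown at 12 threads per rank, which is why coverage matters so much as thread counts grow.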
12 Illustration: CFD application
Configurations with {#ranks} x {#threads} = 24 cores
[Figure: CFD app example, wall time [s] on master thread, split into OMP, Serial and MPI, for Ranks x Threads = 24x1, 12x2, 6x4, 4x6, 2x12]
- OMP: wtime spent inside OpenMP parallel regions
- Serial: non-threaded computation wtime («Amdahl's law on threads»)
- MPI: wtime spent in MPI library grows with # threads

13 Can threading help with imbalance?
[synthetic data for illustration]
Imbalance time = max - mean
[Figure: two panels of per-core time vs. core id]
- Small-scale 50% imbalance: shared mem dynamic load balancing may be effective against imbalance
- Large-scale 50% imbalance: shared mem dynamic load balancing ineffective alone against imbalance

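The difference between the two cases can be illustrated with synthetic per-core times, in the same spirit as the slide. The sketch below assumes ideal dynamic load balancing inside each rank (so a rank's time is the mean over its cores) and no balancing between ranks; the 8-core layouts and function names are made up for the example.

```python
def imbalance_time(times):
    """Imbalance time = max - mean."""
    return max(times) - sum(times) / len(times)


def rank_times(core_times, threads):
    """Group consecutive cores into ranks of `threads` threads and assume ideal
    dynamic load balancing inside each rank: rank time = mean over its cores."""
    return [
        sum(core_times[i:i + threads]) / threads
        for i in range(0, len(core_times), threads)
    ]


# Synthetic per-core times on 8 cores, both with max - mean = 50% of the mean.
small_scale = [0.5, 1.5] * 4            # neighbouring cores differ
large_scale = [0.5] * 4 + [1.5] * 4     # one half of the machine is overloaded

for name, times in (("small-scale", small_scale), ("large-scale", large_scale)):
    before = imbalance_time(times)
    after = imbalance_time(rank_times(times, threads=2))
    print(f"{name}: imbalance {before:.2f} -> {after:.2f} with 2 threads/rank")
```

With 2 threads per rank the small-scale imbalance disappears, while the large-scale imbalance is unchanged: threads within a rank can only share work that already lives in that rank.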
14 Threading and imbalance: Highly imbalanced adaptive mesh refinement code
[Figure: wall time [s] on the master thread of one rank, split into OMP, Serial and MPI, for Ranks x Threads = 24x1, 12x2, 8x3, 6x4, 4x6, 2x12]
- OMP computation scales less than ideally
- Threading helps reduce extreme MPI imbalance
- But Amdahl's law still overtakes at high thread counts

15 OpenMP: things to watch for in apps
Code coverage (a.k.a. Amdahl's law)
- Extensive coverage is critical for scalability
- Can be very tedious/impossible to achieve for flat-profile applications
- Coarse threading (rather than loop-level) helps, but reimplementing MPI doesn't
Granularity
- Important metric = average wall time of OpenMP regions
- Compare to OpenMP barrier/sync time
Both points grow in importance on Xeon Phi
- Lots of threads: coverage grows in importance
- Limited memory/core: short loops
VTune profiling can help diagnose both issues

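The granularity check can be turned into a back-of-the-envelope estimate. The sketch below is a deliberately simple model, assuming each OpenMP region pays one fixed barrier/sync cost on top of its useful work; the function name and the timings (a 10 microsecond sync cost, regions of 1 ms and 10 microseconds) are hypothetical, not measurements.

```python
def region_efficiency(avg_region_wtime, sync_wtime):
    """Fraction of a parallel region's wall time spent on useful work,
    assuming one fixed barrier/sync cost per region."""
    if avg_region_wtime <= 0.0 or sync_wtime < 0.0:
        raise ValueError("times must be positive")
    return avg_region_wtime / (avg_region_wtime + sync_wtime)


SYNC = 10e-6  # hypothetical barrier/sync cost: 10 microseconds

coarse = region_efficiency(1e-3, SYNC)   # 1 ms regions
fine = region_efficiency(10e-6, SYNC)    # 10 microsecond regions (short loops)

print(f"coarse regions: {coarse:.2%} useful, fine regions: {fine:.2%} useful")
```

When the average region time approaches the sync cost, half of the time or more goes to synchronization, which is the situation short loops on many threads tend to produce.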
16 Wrap-up
Careful performance analysis is essential to guide code optimizations
- Set pragmatic performance targets
- Collect data on application behavior
Simple compute vs. bandwidth model can provide:
- Robust first-order characterization
- Insights into specific or second-order effects
Threading can help address some strong-scaling issues
- Amortize halo overheads, level out imbalance
- No magic: obtaining good coverage is hard work
Threading: an important adjustment variable for
- Heterogeneous computing resources (e.g. symmetric mode)
- Available memory/core
More informationIntel True Scale Fabric Architecture. Enhanced HPC Architecture and Performance
Intel True Scale Fabric Architecture Enhanced HPC Architecture and Performance 1. Revision: Version 1 Date: November 2012 Table of Contents Introduction... 3 Key Findings... 3 Intel True Scale Fabric Infiniband
More informationBest Practices for Increasing Ceph Performance with SSD
Best Practices for Increasing Ceph Performance with SSD Jian Zhang Jian.zhang@intel.com Jiangang Duan Jiangang.duan@intel.com Agenda Introduction Filestore performance on All Flash Array KeyValueStore
More informationOverview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationIntel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms
Intel Cloud Builders Guide Intel Xeon Processor-based Servers RES Virtual Desktop Extender Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Client Aware Cloud with RES Virtual
More informationDesigning and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
More informationWhat is in Your Workstation?
Product Brief E3-1200 Family What is in Your Workstation? Why choose E3-based workstations versus i3, i5 and i7 -based desktops -based workstations represent the premier platform used by industry innovators
More information<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization
An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationHardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
More information