SciDAC Software Infrastructure for Lattice Gauge Theory
|
|
|
- Dennis Carr
- 9 years ago
- Views:
Transcription
1 SciDAC Software Infrastructure for Lattice Gauge Theory DOE meeting on Strategic Plan --- April 15, 2002 Software Co-ordinating Committee Rich Brower Boston University Carleton DeTar University of of Utah Robert Edwards Jefferson Laboratory Don Holmgren Fermi National Laboratory Bob Mawhinney Columbia University/BNL Celso Mendes University of of Illinois Chip Watson Jefferson Laboratory
2 SciDAC Software Infrastructure Goals Create a unified programming environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware Major Software Tasks I. QCD API and Code Library II. Optimize Network Communication III. Optimize Lattice QCD Kernels IV. Application Porting and Optimization V. Data Management and Documentation VI. Execution Environment
3 Participants in Software Development Project Bob Mawhinney Columbia ---- Chulwoo Jung BNL 100% Sept 1, 2001 Chris Miller BNL 100% Konstantin Petrov BNL 100% June 1, 2002 Don Holmgren FNAL 40% Jim Simone FNAL 35% Simon Epsteyn FNAL 10% Amitoj Sing FNAL 100% Jan 22, 2002 Daniel A. Reed Illinois ---- Celso L. Mendes Illinois 35% Oct 1, 2001
4 Robert Edwards Jlab ---- Chip Watson Jlab 33% Walt Akers Jlab 100% Jan 1, 2002 Jie Chen Jlab 100% Jan 1, 2002 Andrew Pochinsky MIT 100% Richard Brower BU 30% New Hire BU 100% (Oct 1, 2002) Carleton DeTar Utah ---- James Osborn Utah 50% Sept 1, 2001 Doug Toussaint Arizona ---- Eric Gregory Arizona 50% Oct 15, 2001
5 Lattice QCD extremely uniform Dirac operator: ( µ µ ) ψ ( ) ( ) = + ( ) Dψ x iga x x µ Lattice Operator: 1 Dψ ( x) = U x ψ x+ µ U x µ ψ x µ 2 ( ) ( ˆ) ( ˆ) ( ˆ) a µ Periodic or very simple boundary conditions SPMD: Identical sublattices per processor
6 QCD-API Level Structure Level 3 Dirac Operators, CG Routines etc. C, C++, etc. (Organized by MILC or SZIN or CPS etc.) QDP_XXX Level 2 Data Parallel QCD Lattice Operations (overlapping Algebra and Messaging) A = SHIFT(B, mu) * C; Global sums, etc Lattice Wide Linear Algebra (No Communication) Lattice Wide Data Movement (Pure Communication, non-blocking) e.g. A = B * C e.g Atemp = SHIFT(A, mu) QLA_XXX Level 1 QMP_XXX Single Site Linear Algebra API SU(3), gamma algebra etc. Message Passing API (Know about mapping of Lattice onto Network Geometry)
7 I. Design & Documentation of QCD-API Major Focus of Software Co-ordinating Committee Working documents on Published documents to appear on Design workshops: Jlab: Nov. 8-9, 2001, Feb 2, 2002 Next Workshop: MIT/BU: June, 2002 after Lattice 2002 Goal: C and C++ implementation for community review by Lattice 2002 in Boston, MA. Foster Linux style contributions to level 3 API library functions.
8 Data Parallel paradigm on top of Message Passing Data layout over processors Basic uniform operations across lattice: C(x) = A(x)*B(x) Map grid onto virtual machine grid. API should hide subgrid layout and subgrid faces communicated between nodes. Implement API without writing a compiler.
9 API Design Criteria Routines are extern C functions callable from C and Fortran: extern functions <==> C++ methods. Overlapping of computation and communications. Hide data layout: Constructor, destructors. Query routines to support limited number of deftypes. Support for multi-process or multi-threaded computations hidden from user control. Functions do not (by default) make conversions of arguments from one layout into another layout. An error is generated if arguments are in incompatible.
10 II. Level 1 MP-API implementation Definition of MP interface (Edwards, Watson) Bindings for C, C++ and eventually Fortran. see doc Implementation of MP-API over MPI subset (Edwards) Implementation of C++ MP-API for QCDOC (Jung) Myrinet optimization using GM (Jie Chen) Port of MILC code to level 1 MP-API (DeTar, Osborn)
11 Performance Considerations for Level 2 Data layout over processors Overlapping communications and computations: C(x)=A(x)*shift(B,mu): The face of a subgrid is sent nonblocking to a neighboring node, e.g. in the forward direction. The neighboring node, in the backward direction, sends its face into a preallocated buffer. While this is going on, the operation is performed on the interior sites. A wait is issued and the operation is performed on the face.
12 Lazy Evaluation for Overlapping Comm/Comp Consider the equation dest(x) = src1(x)*src2(x+nu) ; (for all x) or decomposed as tmp(x) = src2(x+mu); dest(x) = src1(x)*tmp(x) Implementation 1: As two functions: Shift(tmp, src2, mu,plus); Multiply(dest, src1, tmp); Implementation 2: Shift also return its result: Multiply(dest, src1, Shift(src2, mu,plus));
13 Data Types Fields have various types (indices): Color: ( ), Spin: Γ, ψ ( ), ( ) U ij x i x Q ij αβ α αβ x Index type ( i.e the fiber over base lattice site ) Gauge : Product(Matrix(Nc),Scalar) Dirac: Product(Vector(Nc),Vector(Ns)) Scalars: Scalar Propagators: Product(Matrix(Nc),Matrix(Ns))? Support Red/Black sublattices & other subsets (Mask?) Support compatible operations on types: i ( )* Γ * ψ ( ) ij U x αβ α x Matrix(color)*Matrix(spin)*Vector(color,spin)
14 C Naming Convention for Level 2 void QCDF_mult_T3T1T2_op3(Type3 *r, const Type1 *a,consttype2*b) T3, T1, T2 are short for the type Type1, Type2 and Type3 : LatticeGaugeF, LatticeDiracFermionF, LatticeHalfFermionF, LatticePropagatorF op3 are options like nnr r = a*b nnn r = -a*b ncr r = a*conj(b) ncn r = -a*conj(b) cnr r = conj(a)*b cnn r = -conj(a)*b ccr r = conj(a)*conj(b) ccn r = -conj(a)*conj(b) nna r = r + a*b nns r = r a*b nca r = r + a*conj(b) ncs r = r a*conj(b) cna r = r + conj(a)*b cna r = r conj(a)*b cca r = r + conj(a)*conj(b) ccs r = r - conj(a)*conj(b)
15 Data Parallel Interface for Level 2 Unary operations: operate on one source into a target Lattice_Field Shift(Lattice_field source, enum sign, int direction); void Copy(Lattice_Field dest, Lattice_Field source, enum option); void Trace(double dest, Lattice_Field source, enum option); Binary operations: operate on two sources into a target void Multiply(Lattice_Field dest, Lattice_Field src1, Lattice_Field src2, enum option); void Compare(Lattice_Bool dest, Lattice_Field src1, Lattice_Field src2, enum compare_func); Broadcasts: broadcast throughout lattice void Fill(Lattice_Field dest, float val); Reductions: reduce through the lattice void Sum(double dest, Lattice_Field source);
16 III. Linear Algebra: QCD Kernels First draft of Level 1 Linear Algebra API (DeTar, Edwards, Pochinksy) Vertical slice for QCD API (Pochinsky) API conformant example of Dirac CG MILC implementation (Osborn) Optimize on Pentium 4 SSE & SSE2 code: for MILC (Holgrem, Simone, Gottlieb) for SZIN l (Edwards, McClendon)
17 Single Node Performance in Megaflops for single-node Dirac operator, 1.7 GHz Pentium 4 Xeon Megaflops bit C 32bit SSE 64bit C 64bit SSE ^4 8*4^3 16*4^3 8^4 lattice size
18 IV. Application Porting & Optimization MILC: ( revision version 6_15oct01) QCDOC ASIC simulation of MILC (Calin, Christan, Toussaint, Gregory) Prefetching Strategies (Holgren, Simone, Gottlieb) SZIN: (new documentation and revision) (Edwards) Implementation on top of QDP++ (Edwards, Pochinsky) Goal: efficient code for P4 by Summer 2002 CPS (Columbia Physics System) Software Testing environment running on QCDSP (Miller) Native OS & fabric for MP-API (Jung)
19 V. Data Archives and Data Grid File formats and header Build on successful example of NERSC QCD archive Extend to include lattice sets, propagators, etc. Consider XML for ascii headers Control I/O for data files Search user data using SQL to find locations. Lattice Portal Replicate data (multi-site), global tree structure. SQL-like data base for storing data and retrieving Web based computing batch system and uniform scripting tool.
20 VI. Performance and Exec. Environment Performance Analysis Tool: SvPABLO instrumentation of MILC (Celso) Extension through PAPI interface to P4 architecture (Dongarra) FNAL Tools: Trace Tools extension to Pentium 4 and instrumentation of MILC (Rechenmacher, Holmgen, Matsumura) FNAL rgang (parallel command dispatcher) FermiQCD (DiPierro) Cluster Tools: ( Holmgren, Watson ) Building, operations, monitoring, BIOS update, etc
21 SvPablo Instrumentation of MILC
Status of the US LQCD Infrastructure Project
Status of the US LQCD Infrastructure Project Bob Sugar Allhands Meeting, April 6-7, 2006 p.1/15 Overview SciDAC-2 Proposal The LQCD Computing Project Allhands Meeting, April 6-7, 2006 p.2/15 SciDAC-2 Proposal
Workflow Requirements (Dec. 12, 2006)
1 Functional Requirements Workflow Requirements (Dec. 12, 2006) 1.1 Designing Workflow Templates The workflow design system should provide means for designing (modeling) workflow templates in graphical
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
End-user Tools for Application Performance Analysis Using Hardware Counters
1 End-user Tools for Application Performance Analysis Using Hardware Counters K. London, J. Dongarra, S. Moore, P. Mucci, K. Seymour, T. Spencer Abstract One purpose of the end-user tools described in
1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents HPC Overview. Cluster Overview. FLOPS. 2 Bull, 2011 Bull Extreme Computing HPC Overview Ares, Gerardo, HPC Team HPC concepts HPC: High Performance
Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
An Introduction to Android
An Introduction to Android Michalis Katsarakis M.Sc. Student [email protected] Tutorial: hy439 & hy539 16 October 2012 http://www.csd.uoc.gr/~hy439/ Outline Background What is Android Android as a
QCD as a Video Game?
QCD as a Video Game? Sándor D. Katz Eötvös University Budapest in collaboration with Győző Egri, Zoltán Fodor, Christian Hoelbling Dániel Nógrádi, Kálmán Szabó Outline 1. Introduction 2. GPU architecture
Building an Inexpensive Parallel Computer
Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University
Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
Performance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
MPI / ClusterTools Update and Plans
HPC Technical Training Seminar July 7, 2008 October 26, 2007 2 nd HLRS Parallel Tools Workshop Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI MPI / ClusterTools Update and Plans Len Wisniewski
BLM 413E - Parallel Programming Lecture 3
BLM 413E - Parallel Programming Lecture 3 FSMVU Bilgisayar Mühendisliği Öğr. Gör. Musa AYDIN 14.10.2015 2015-2016 M.A. 1 Parallel Programming Models Parallel Programming Models Overview There are several
Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles [email protected] hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
Lattice QCD Performance. on Multi core Linux Servers
Lattice QCD Performance on Multi core Linux Servers Yang Suli * Department of Physics, Peking University, Beijing, 100871 Abstract At the moment, lattice quantum chromodynamics (lattice QCD) is the most
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Interoperability between Sun Grid Engine and the Windows Compute Cluster
Interoperability between Sun Grid Engine and the Windows Compute Cluster Steven Newhouse Program Manager, Windows HPC Team [email protected] 1 Computer Cluster Roadmap Mainstream HPC Mainstream
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX
Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy
Solution for private cloud computing
The CC1 system Solution for private cloud computing 1 Outline What is CC1? Features Technical details Use cases By scientist By HEP experiment System requirements and installation How to get it? 2 What
Use of ROOT in Geant4
Use of ROOT in Geant4 A.Dotti, SLAC I. Hrivnacova, IPN Orsay W. Pokorski, CERN ROOT Users Workshop, 11-14 March 2013, Saas-Fee Outline Analysis tools in Geant4 Use of Root in Geant4 testing Experience
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
MPLAB TM C30 Managed PSV Pointers. Beta support included with MPLAB C30 V3.00
MPLAB TM C30 Managed PSV Pointers Beta support included with MPLAB C30 V3.00 Contents 1 Overview 2 1.1 Why Beta?.............................. 2 1.2 Other Sources of Reference..................... 2 2
Using the Windows Cluster
Using the Windows Cluster Christian Terboven [email protected] aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster
Introduction. What is an Operating System?
Introduction What is an Operating System? 1 What is an Operating System? 2 Why is an Operating System Needed? 3 How Did They Develop? Historical Approach Affect of Architecture 4 Efficient Utilization
Large-Data Software Defined Visualization on CPUs
Large-Data Software Defined Visualization on CPUs Greg P. Johnson, Bruce Cherniak 2015 Rice Oil & Gas HPC Workshop Trend: Increasing Data Size Measuring / modeling increasingly complex phenomena Rendering
ST810 Advanced Computing
ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview
Overlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm [email protected] [email protected] Department of Computer Science Department of Electrical and Computer
Integrated Open-Source Geophysical Processing and Visualization
Integrated Open-Source Geophysical Processing and Visualization Glenn Chubak* University of Saskatchewan, Saskatoon, Saskatchewan, Canada [email protected] and Igor Morozov University of Saskatchewan,
Client/Server Computing Distributed Processing, Client/Server, and Clusters
Client/Server Computing Distributed Processing, Client/Server, and Clusters Chapter 13 Client machines are generally single-user PCs or workstations that provide a highly userfriendly interface to the
Lua as a business logic language in high load application. Ilya Martynov [email protected] CTO at IPONWEB
Lua as a business logic language in high load application Ilya Martynov [email protected] CTO at IPONWEB Company background Ad industry Custom development Technical platform with multiple components Custom
Lecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
What is An Introduction
What is? An Introduction GenICam_Introduction.doc Page 1 of 14 Table of Contents 1 SCOPE OF THIS DOCUMENT... 4 2 GENICAM'S KEY IDEA... 4 3 GENICAM USE CASES... 5 3.1 CONFIGURING THE CAMERA... 5 3.2 GRABBING
A Brief Survery of Linux Performance Engineering. Philip J. Mucci University of Tennessee, Knoxville [email protected]
A Brief Survery of Linux Performance Engineering Philip J. Mucci University of Tennessee, Knoxville [email protected] Overview On chip Hardware Performance Counters Linux Performance Counter Infrastructure
Big Data Visualization on the MIC
Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth [email protected] Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth
Building Applications Using Micro Focus COBOL
Building Applications Using Micro Focus COBOL Abstract If you look through the Micro Focus COBOL documentation, you will see many different executable file types referenced: int, gnt, exe, dll and others.
Lesson Objectives. To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization
Lesson Objectives To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization AE3B33OSD Lesson 1 / Page 2 What is an Operating System? A
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
Parallel Processing using the LOTUS cluster
Parallel Processing using the LOTUS cluster Alison Pamment / Cristina del Cano Novales JASMIN/CEMS Workshop February 2015 Overview Parallelising data analysis LOTUS HPC Cluster Job submission on LOTUS
Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007
Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer
Virtualization of a Cluster Batch System
Virtualization of a Cluster Batch System Christian Baun, Volker Büge, Benjamin Klein, Jens Mielke, Oliver Oberst and Armin Scheurer Die Kooperation von Cluster Batch System Batch system accepts computational
Application Development Guide: Building and Running Applications
IBM DB2 Universal Database Application Development Guide: Building and Running Applications Version 8 SC09-4825-00 IBM DB2 Universal Database Application Development Guide: Building and Running Applications
High Performance Computing
High Performance Computing Trey Breckenridge Computing Systems Manager Engineering Research Center Mississippi State University What is High Performance Computing? HPC is ill defined and context dependent.
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
Scalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 29.
bwgrid Treff MA/HD Sabine Richling, Heinz Kredel Universitätsrechenzentrum Heidelberg Rechenzentrum Universität Mannheim 29. September 2010 Richling/Kredel (URZ/RUM) bwgrid Treff WS 2010/2011 1 / 25 Course
Introduction to Linux and Cluster Basics for the CCR General Computing Cluster
Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY 14203 Phone: 716-881-8959
Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago
Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University
Part I Courses Syllabus
Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
A High Performance Computing Scheduling and Resource Management Primer
LLNL-TR-652476 A High Performance Computing Scheduling and Resource Management Primer D. H. Ahn, J. E. Garlick, M. A. Grondona, D. A. Lipari, R. R. Springmeyer March 31, 2014 Disclaimer This document was
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics
22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC
Cluster Computing in a College of Criminal Justice
Cluster Computing in a College of Criminal Justice Boris Bondarenko and Douglas E. Salane Mathematics & Computer Science Dept. John Jay College of Criminal Justice The City University of New York 2004
How To Develop Android On Your Computer Or Tablet Or Phone
AN INTRODUCTION TO ANDROID DEVELOPMENT CS231M Alejandro Troccoli Outline Overview of the Android Operating System Development tools Deploying application packages Step-by-step application development The
OpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
Streamline Computing Linux Cluster User Training. ( Nottingham University)
1 Streamline Computing Linux Cluster User Training ( Nottingham University) 3 User Training Agenda System Overview System Access Description of Cluster Environment Code Development Job Schedulers Running
64-Bit versus 32-Bit CPUs in Scientific Computing
64-Bit versus 32-Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie Ruhr-Universität Bochum March 2004 1/25 Outline 64-Bit and 32-Bit CPU Examples
PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN
1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction
Introduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
Chapter 11 I/O Management and Disk Scheduling
Operatin g Systems: Internals and Design Principle s Chapter 11 I/O Management and Disk Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles An artifact can
benchmarking Amazon EC2 for high-performance scientific computing
Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received
KITES TECHNOLOGY COURSE MODULE (C, C++, DS)
KITES TECHNOLOGY 360 Degree Solution www.kitestechnology.com/academy.php [email protected] [email protected] Contact: - 8961334776 9433759247 9830639522.NET JAVA WEB DESIGN PHP SQL, PL/SQL
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva [email protected] Introduction Intel
Parallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
Improved LS-DYNA Performance on Sun Servers
8 th International LS-DYNA Users Conference Computing / Code Tech (2) Improved LS-DYNA Performance on Sun Servers Youn-Seo Roh, Ph.D. And Henry H. Fong Sun Microsystems, Inc. Abstract Current Sun platforms
Symmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called
Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
GPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles [email protected] Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
The full setup includes the server itself, the server control panel, Firebird Database Server, and three sample applications with source code.
Content Introduction... 2 Data Access Server Control Panel... 2 Running the Sample Client Applications... 4 Sample Applications Code... 7 Server Side Objects... 8 Sample Usage of Server Side Objects...
Performance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
HPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
Oracle9i Release 2 Database Architecture on Windows. An Oracle Technical White Paper April 2003
Oracle9i Release 2 Database Architecture on Windows An Oracle Technical White Paper April 2003 Oracle9i Release 2 Database Architecture on Windows Executive Overview... 3 Introduction... 3 Oracle9i Release
Building an energy dashboard. Energy measurement and visualization in current HPC systems
Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 [email protected] SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators
Performance Analysis of Mixed Distributed Filesystem Workloads
Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt Motivation Hadoop-tailored filesystems (e.g. CloudStore)
GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy
Inmagic Content Server Standard and Enterprise Configurations Technical Guidelines
Inmagic Content Server v1.3 Technical Guidelines 6/2005 Page 1 of 15 Inmagic Content Server Standard and Enterprise Configurations Technical Guidelines Last Updated: June, 2005 Inmagic, Inc. All rights
Going Linux on Massive Multicore
Embedded Linux Conference Europe 2013 Going Linux on Massive Multicore Marta Rybczyńska 24th October, 2013 Agenda Architecture Linux Port Core Peripherals Debugging Summary and Future Plans 2 Agenda Architecture
Thin@ System Architecture V3.2. Last Update: August 2015
Thin@ System Architecture V3.2 Last Update: August 2015 Introduction http://www.thinetsolution.com Welcome to Thin@ System Architecture manual! Modern business applications are available to end users as
Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South
Supercomputing 2004 - Status und Trends (Conference Report) Peter Wegner
(Conference Report) Peter Wegner SC2004 conference Top500 List BG/L Moors Law, problems of recent architectures Solutions Interconnects Software Lattice QCD machines DESY @SC2004 QCDOC Conclusions Technical
ASSEMBLY PROGRAMMING ON A VIRTUAL COMPUTER
ASSEMBLY PROGRAMMING ON A VIRTUAL COMPUTER Pierre A. von Kaenel Mathematics and Computer Science Department Skidmore College Saratoga Springs, NY 12866 (518) 580-5292 [email protected] ABSTRACT This paper
An Incomplete C++ Primer. University of Wyoming MA 5310
An Incomplete C++ Primer University of Wyoming MA 5310 Professor Craig C. Douglas http://www.mgnet.org/~douglas/classes/na-sc/notes/c++primer.pdf C++ is a legacy programming language, as is other languages
Turbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
How To Visualize Performance Data In A Computer Program
Performance Visualization Tools 1 Performance Visualization Tools Lecture Outline : Following Topics will be discussed Characteristics of Performance Visualization technique Commercial and Public Domain
INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism
Intel Cilk Plus: A Simple Path to Parallelism Compiler extensions to simplify task and data parallelism Intel Cilk Plus adds simple language extensions to express data and task parallelism to the C and
