Program Coupling and Parallel Data Transfer Using PAWS
Slide 1: Program Coupling and Parallel Data Transfer Using PAWS
Sue Mniszewski, CCS-3; Pat Fasel, CCS-3; Craig Rasmussen, CCS-1
Slide 2: Primary Goal of PAWS
Provide the ability to share data between two separate, parallel applications, using the maximum bandwidth available.
Slide 3: Challenges
- Coupling of parallel applications
- Avoid serialization
- Provide several synchronization strategies
- Connect parallel applications with unequal numbers of processes
- Be extensible to new, user-defined parallel distribution strategies
Slide 4: PAWS Design Goals
- Rewrite of PAWS 1, with the same general philosophy
- Uses the Cheetah/MPI run-time
- C++, C, and Fortran APIs
- Separation of layout and data (Ports & Views)
- Still single-threaded
- Channel abstraction
- Action caching (no barriers!)
Slide 5: PAWS Port Concept
- Separates layout from data
- Has a shape, but no data type
- Ports are connected, not data
- Allows multiple data objects to use a single port
- Scalar ports for int, float, double, string
- View ports for rectilinear parallel data
Slide 6: Sample PAWS Environment
[Diagram: an Application and a Visualization component, both registered with the PAWS Controller, with a parallel transfer path between them]
Slide 7: PAWS Controller
- Application Database: registered applications
- Port Database: registered ports
- Initializes connections between ports
Slide 8: PAWS Application
- Must be told where to find the Controller
- Uses the PAWS API to:
  - Register the application and its ports with the Controller
  - Query the Controller databases
  - Make connections
  - Send/receive data
  - Disconnect ports or the application from the Controller
Slide 9: Running Applications
[Diagram: a script drives the PAWS Launcher/Batch (bsub), which uses mpirun to start App1, the PAWS Controller, and App2 as Cheetah/MPI groups]
Slide 10: PAWS Networks
The PAWS Launcher/Controller accepts a script to start applications and connect ports. The Controller holds the registered applications and ports.
Example Controller script (connecting port A of ta1 to port B1 of ta2, and port B2 of ta2 to port C of ta3):
    connectpawsports ta1::a ta2::b1
    connectpawsports ta2::b2 ta3::c
    paws go ta1 ta2 ta3
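To make the script syntax concrete, the lines below sketch a Controller script for the sample environment on slide 6, coupling a simulation application to a visualization application. The application names (sim, vis) and the port name (data) are hypothetical, and the commands assume the same connectpawsports / go syntax shown above.

    connectpawsports sim::data vis::data
    paws go sim vis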
Slide 11: Parallel Redistribution
Each application can partition data differently:
- Different numbers of processes
- Different parallel layouts
Slide 12: PAWS Data Model for Rectilinear Data
Parallel data objects in PAWS have two parts:
- A global domain G, and a list of subdomains { Di } (see the decomposition sketch after this slide)
- Allocated blocks of memory, and mappings (known as views) from memory to the { Di } subdomains
G and { Di } are all that is needed to compute a message schedule. Views and allocated memory locations can change.
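As an illustration of the G / { Di } split, the standalone sketch below builds a one-dimensional block decomposition: a global domain G divided into one contiguous subdomain Di per process. It is not PAWS code; the Domain struct and blockDecompose function are hypothetical names used only to make the data model concrete.

    #include <cstdio>
    #include <vector>

    // A hypothetical zero-based, inclusive 1D index range, standing in for
    // the domains G and Di discussed on this slide.
    struct Domain { long lower, upper; };

    // Split the global domain G into one contiguous subdomain per process.
    std::vector<Domain> blockDecompose(Domain G, int nprocs) {
      std::vector<Domain> subdomains;
      long size = G.upper - G.lower + 1;
      long chunk = size / nprocs, extra = size % nprocs;
      long lo = G.lower;
      for (int p = 0; p < nprocs; ++p) {
        long len = chunk + (p < extra ? 1 : 0);   // spread the remainder
        subdomains.push_back({lo, lo + len - 1});
        lo += len;
      }
      return subdomains;
    }

    int main() {
      Domain G{0, 127};                              // global domain, e.g. one axis of 128^3 data
      std::vector<Domain> D = blockDecompose(G, 3);  // { D1, D2, D3 }
      for (size_t i = 0; i < D.size(); ++i)
        std::printf("D%zu = [%ld, %ld]\n", i + 1, D[i].lower, D[i].upper);
      return 0;
    }

The sending and receiving sides will in general produce different { Di } lists from the same G, which is exactly the situation the message schedule has to reconcile.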
Slide 13: Example of G and { Di } in PAWS
[Diagram: a global domain G partitioned into subdomains D1, D2, and D3]
Slide 14: PAWS Data Model: Views
Views describe how data is mapped from user storage to the subdomain blocks.
[Diagram: views V1 and V2 map the user's allocated storage to subdomains D1 and D2 in the PAWS data model]
Slide 15: PAWS Data Model: Types of Views
Ways that we can specify the view domain Vi:
- Strided: a normal D-dimensional domain, zero-based. 1D example: Vi = {0, Asize-1, 1} means all of Ai (a 1D sketch follows this slide).
- Indirection list: a list of M indirection points, each a D-dimensional zero-based index into the Ai domain (future). 1D example: Vi = { {3,4}, {8,1}, {4,4} } is a 3-element list.
- Function view: generate the data from a user-provided function (or functor), instead of copying from memory (future).
- The user could implement his/her own type of view; this will be the user-extensibility method in PAWS.
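As a worked example of the strided notation, the standalone sketch below expands a 1D view Vi = {first, last, stride} into the element indices it selects. The function name is hypothetical and this is not PAWS code.

    #include <cstdio>
    #include <vector>

    // Expand a 1D strided view {first, last, stride} (zero-based, inclusive)
    // into the list of element indices it selects.
    std::vector<long> expandStridedView(long first, long last, long stride) {
      std::vector<long> indices;
      for (long i = first; i <= last; i += stride)
        indices.push_back(i);
      return indices;
    }

    int main() {
      const long Asize = 10;
      // Vi = {0, Asize-1, 1} selects all of Ai, as in the slide's 1D example.
      std::vector<long> all = expandStridedView(0, Asize - 1, 1);
      // Vi = {0, Asize-1, 2} would select every other element.
      std::vector<long> evens = expandStridedView(0, Asize - 1, 2);
      std::printf("all: %zu elements, evens: %zu elements\n", all.size(), evens.size());
      return 0;
    }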
Slide 16: PAWS Data: Row- or Column-major
- Organization of data in memory (row- or column-major) is a characteristic of the data storage, not the data model.
- The data model always stores sizes in the same format: dimension information is listed left-to-right, first dimension to last.
- Row- or column-major information is only used to calculate the final location in memory during transfers (the sketch after this slide shows that calculation).
- So row- or column-major order is a characteristic of a view; specify it when you specify a view.
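The points above suggest keeping dimension sizes in one left-to-right order and applying row- or column-major order only when computing a memory offset during a transfer. The standalone sketch below shows that offset calculation; the flatten function is a hypothetical illustration, not the PAWS transfer code.

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Compute the linear memory offset of a multi-dimensional index.
    // 'sizes' and 'index' are both listed left-to-right, first dimension to
    // last, as in the PAWS data model; only this function applies the ordering.
    std::size_t flatten(const std::vector<std::size_t>& index,
                        const std::vector<std::size_t>& sizes,
                        bool rowMajor) {
      std::size_t offset = 0, strideAcc = 1;
      if (rowMajor) {
        // Row-major: the last dimension varies fastest.
        for (std::size_t d = index.size(); d-- > 0; ) {
          offset += index[d] * strideAcc;
          strideAcc *= sizes[d];
        }
      } else {
        // Column-major: the first dimension varies fastest.
        for (std::size_t d = 0; d < index.size(); ++d) {
          offset += index[d] * strideAcc;
          strideAcc *= sizes[d];
        }
      }
      return offset;
    }

    int main() {
      std::vector<std::size_t> sizes = {4, 3};   // a 4 x 3 array
      std::vector<std::size_t> ij = {2, 1};      // element (2,1)
      assert(flatten(ij, sizes, true)  == 2 * 3 + 1);  // row-major offset
      assert(flatten(ij, sizes, false) == 1 * 4 + 2);  // column-major offset
      return 0;
    }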
Slide 17: Registering a Port
Information specified when you register ports from the user's program:
- Initial data model info: G and { Di }
- Input or Output
- Port type: scalar or distributed
Slide 18: Connecting Ports
What information do both sides of a connection share?
- Data model info (G and { Di }) for both sides
- The dynamic or static nature of the layout on both sides
- Send/receive message schedules, if they can be computed (a sketch of this computation follows this slide)
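To show why G and { Di } are enough to compute a message schedule, the standalone sketch below intersects every sender subdomain with every receiver subdomain; each non-empty overlap becomes one message. This illustrates the principle only, with hypothetical names, and is not the actual PAWS scheduling code.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Hypothetical zero-based, inclusive 1D index range (a subdomain Di).
    struct Domain { long lower, upper; };

    // One entry of a message schedule: which sender talks to which receiver,
    // and over which region of the global domain.
    struct Message { int sender, receiver; Domain region; };

    // Intersect every sender subdomain with every receiver subdomain.
    std::vector<Message> buildSchedule(const std::vector<Domain>& send,
                                       const std::vector<Domain>& recv) {
      std::vector<Message> schedule;
      for (int s = 0; s < (int)send.size(); ++s)
        for (int r = 0; r < (int)recv.size(); ++r) {
          long lo = std::max(send[s].lower, recv[r].lower);
          long hi = std::min(send[s].upper, recv[r].upper);
          if (lo <= hi)                       // non-empty overlap => one message
            schedule.push_back({s, r, {lo, hi}});
        }
      return schedule;
    }

    int main() {
      // The sender splits [0,127] over 2 processes; the receiver over 3.
      std::vector<Domain> send = {{0, 63}, {64, 127}};
      std::vector<Domain> recv = {{0, 42}, {43, 85}, {86, 127}};
      for (const Message& m : buildSchedule(send, recv))
        std::printf("sender %d -> receiver %d : [%ld, %ld]\n",
                    m.sender, m.receiver, m.region.lower, m.region.upper);
      return 0;
    }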
Slide 19: Synchronous Data Transfer
- Data is transferred when both sides indicate they are ready: one side does a send(), the other a receive().
- Before transfer, users must describe G, { Di }, and the assignment of each Di to the processors when registering a Port.
- The user provides the views Vi, and pointers to the allocated blocks of memory, in the send() and receive() calls.
Slide 20: PAWS Code Example

Send (C++):

    // REGISTER WITH PAWS
    Paws::Application app(argc, argv);

    // CREATE PORTS
    Paws::Representation rep(wholedom, mydom, myrank);
    Paws::Port A("A", rep, Paws::PAWS_OUT, Paws::PAWS_DISTRIBUTED);
    Paws::Port I("I", Paws::PAWS_OUT, Paws::PAWS_SCALAR);
    app.addport(&I);
    app.addport(&A);

    // ALLOCATE DATA AND CREATE VIEW
    int* data = new int[mysize];
    Paws::ViewArray<int> view(Paws::PAWS_COLUMN);
    view.addviewblock(data, myallocdom, myviewdom);

    // READY
    app.ready();

    // SEND DATA
    int iter = 100;
    I.send(&iter);
    for (i = 0; i < iter; i++) {
      A.send(view);
    }

    // CLEANUP
    app.finalize();

Receive (C):

    /* REGISTER WITH PAWS */
    paws_initialize(argc, argv);

    /* CREATE PORTS */
    int rep = paws_representation(wholedom, mydom, myrank);
    int B = paws_view_port("b", rep, PAWS_IN);
    int J = paws_scalar_port("j", PAWS_IN);

    /* ALLOCATE DATA AND CREATE VIEW */
    int* data = (int*) malloc(mysize * sizeof(int));
    int view = paws_view(PAWS_INTEGER, PAWS_ROW);
    paws_add_view_block(view, data, myallocdom, myviewdom);

    /* READY */
    paws_ready();

    /* RECEIVE DATA */
    paws_scalar_receive(J, &iter, PAWS_INTEGER, 1);
    for (i = 0; i < iter; i++) {
      paws_view_receive(B, view);
    }

    /* CLEANUP */
    paws_finalize();
Slide 21: Dynamically Resizing Data
What if one side decides to resize or repartition?
- The other side must find out about the new size or distribution
- A new message schedule must be calculated
- Memory may have to be reallocated
PAWS provides an interface that:
- Lets the user find out the new global domain G and subdomains { Di } for the other side of the connection
- Lets the user reallocate their data if necessary
- Supports resize events from either side of the connection
Slide 22: Resize Code Example

Send:

    Application app(argc, argv);

    // CREATE EMPTY REP AND VIEW
    Representation rep(myrank);
    Port A("A", rep, PAWS_OUT, PAWS_DISTRIBUTED);
    ViewArray<int> view(PAWS_COLUMN);

    app.ready();

    // RESIZE DATA ON EVERY SEND
    for (i = 0; i < iter; i++) {
      int* data = new int[newmysize];
      rep.update(newwholedom, newmydom);
      view.update(data, newmyallocdom, newmyviewdom);
      A.resize(rep);
      A.send(view);
    }

    app.finalize();

Receive:

    Application app(argc, argv);

    // CREATE EMPTY REP AND VIEW
    Representation rep(myrank);
    Port B("B", rep, PAWS_IN, PAWS_DISTRIBUTED);
    ViewArray<int> view(PAWS_COLUMN);

    app.ready();

    // WAIT FOR SIZE ON EVERY RECEIVE
    for (i = 0; i < iter; i++) {
      B.resizeWait();
      newwholedom = B.domain();
      newmydom = redistribute(newwholedom, rank);
      int* data = new int[newmysize];
      rep.update(newwholedom, newmydom);
      view.update(data, newmyallocdom, newmyviewdom);
      B.update(rep);
      B.receive(view);
    }

    app.finalize();
Slide 23: Preliminary Results
[Plot: PAWS timing comparison for 128 x 128 x 128 data over 100 iterations; synchronous send/receive time in seconds versus the number of processors (send:receive)]
Slide 24: Future Directions
- Asynchronous data transfer: get(), unlock(), lock(), wait(), check()
- MPI-2 communications
- More parallel data structures: particles, unstructured, trees, etc.
- Better scripting interface
- Common Component Architecture (CCA)-style framework