Cloud-based OpenMP Parallelization Using a MapReduce Runtime. Rodolfo Wottrich, Rodolfo Azevedo and Guido Araujo, University of Campinas
1 Cloud-based OpenMP Parallelization Using a MapReduce Runtime. Rodolfo Wottrich, Rodolfo Azevedo and Guido Araujo, University of Campinas
2
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (my_rank != 0) {
        /* Every non-zero rank sends its greeting to rank 0. */
        sprintf(greeting, "Greetings from process %d of %d!", my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Rank 0 prints its own greeting, then one from each other rank. */
        printf("Greetings from process %d of %d!\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }
    MPI_Finalize();
3-4 (Same MPI code as on slide 2; slide 4 adds the label "OpenMR".)
5 OpenMP: directive-annotated code
    #pragma omp parallel for private(i)
    for (i = 0; i < N; i++) {
        a[i] = 2*a[i];
    }
6 MapReduce
- Framework for distributed processing
- Big Data
- Functional programming concepts
7
    map({1,2,3,4}, (*2)) → {2,4,6,8}
    reduce({1,2,3,4}, (*)) → {24}
8
    for (i = 0; i < 16; i++) {
        a[i] = 2*a[i];
    }
9 OpenMR
10 OpenMR
11 OpenMR: Syntax
    #pragma omp mapreduce for input(a[]) \
            output(sum) reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += 2*a[i];
    }
12
    #pragma omp mapreduce data
    int a[10000];
    #pragma omp mapreduce data
    int b[10000:nodes][10000];
    #pragma omp mapreduce data
    int c[10000:nodes][10000:nodes];
13 OpenMR: Preparing the MR job - before
    #pragma omp mapreduce for input(a[]) \
            output(sum) reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += 2*a[i];
    }
14-16 OpenMR: Preparing the MR job - after (built up over three slides)
    for (i = 0; i < N; i++) {
        file_write(omr_input, &i, 1);   // one input record per iteration
    }
    file_write(omr_data, &a, N);        // the input array
    omr_push_files();
    omr_trigger_and_wait();             // shown from slide 15
    omr_retrieve_output();              // shown on slide 16
    sum += file_read(omr_output);       // shown on slide 16
17 OpenMR: Preparing the MR job - mapper
    int main() {
        sum = 0;
        file_read(omr_data, &a, N);
        while (buffer = getline()) {
            i = getiteration(buffer);
            sum = 2*a[i];              // The actual loop body
            printf("0\t%d\n", sum);    // Print output
        }
    }
18 OpenMR: Preparing the MR job - reducer
    int main() {
        sum = 0;
        while (buffer = getline()) {
            tmp = getvalue(buffer);
            sum += tmp;
        }
        printf("%d\n", sum);           // Print output
    }
19 OpenMR: Classes of applications
- DOALL loops
- No explicit synchronization constructs
20 Benchmarks
Experiments: benchmarks with code equivalent to OpenMR
- SPEC OMP2012 (compute-bound)
- Rodinia (compute-bound)
- Synthetic benchmarks (I/O-bound)
21
- SPEC: 358.botsalgn, 372.smithwa
- Rodinia: b+tree, lavaMD, myocyte
- Synthetic: Vector Add, Dot Product, Matrix-vector Multiplication, Matrix-matrix Multiplication
22 Experimental Setup
1. Baseline: Intel Xeon (8-core) x 3
2. Cloud experiments: Amazon AWS: S3 + EC2 + EMR
   - 235 EC2 instances (Intel Xeon)
   - 1 m1.small
   - 234 c1.medium (468 vCPUs)
23 Results: 358.botsalgn + Amazon EMR
24 Results: 372.smithwa + Amazon EMR
25 Results: lavaMD + Amazon EMR
26 Results: Synthetic + Amazon EMR
27 Related work

                 Based on               Target              Elasticity        Fault tolerance  Programmability
MPI              ---                    Cloud CPUs          No                No               -
OpenMP           Pthreads               Local CPUs
MapReduce        ---                    Cloud CPUs          Yes               Yes              -/+
OpenACC          OpenMP, OpenCL, CUDA   Local accelerators
SnuCL            OpenCL                 Cloud accelerators  No                No               -/+
PGAS             ---                    Cloud CPUs          No                No (X10)         -/+
Elastic OpenMP   OpenMP                 Cloud CPUs          Yes (vertical)    No               +
OpenMR           OpenMP, MapReduce      Cloud CPUs          Yes (horizontal)  Yes              ++
28 Cloud-based OpenMP Parallelization Using a MapReduce Runtime. Rodolfo Wottrich, Rodolfo Azevedo and Guido Araujo, University of Campinas
