McMPI. Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected]

Size: px
Start display at page:

Download "McMPI. Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected]"

Transcription

1 McMPI Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected]

2 Outline Yet another MPI library? Managed-code, C#, Windows McMPI, design and implementation details Object-orientation, design patterns, communication performance results Threads and the MPI Standard Pre- End Points proposal ideas

3 Why Implement MPI Again? Parallel program, distributed memory => MPI library Most (all?) MPI libraries written in C MPI Standard provides C and FORTRAN bindings C++ can use the C functions Other languages can follow the C++ model Use the C functions Alternatively, MPI can be implemented in that language Removes inter-language function call overheads but May not be possible to achieve comparable performance

4 Why Did I Choose C#? Experience and knowledge I gained from my career in software development My impression of the popularity of C# in commercial software development My desire to bridge the gap between high-performance programming and high-productivity programming One of the UK research councils offered me funding for a PhD that proposed to use C# to implement MPI

5 C# Myths C# only runs on Windows Not such a bad thing 3 of the Top500 machines use Windows Not actually true Mono works on multiple operating systems C# is a Microsoft language Not such a bad thing resources, commitment, support, training Not actually true C# follows ECMA and ISO standards C# is slow like Java Not such a bad thing expressivity, readability, re-usability Not actually true no easy way to prove this conclusively C# and its ilk are not things we need to care about Not such a bad thing they will survive/thrive, or not, without us Not actually true popularity trumps utility

6 McMPI Design & Implementation Desirable features of code Isolation of concerns -> easier to understand Human readability -> easier to maintain Compiler readability -> easier to get good performance Object-orientation can help with isolation of concerns So can modularisation and judiciously reducing LOC per code file Design patterns can help with human readability So can documentation and useful in-code comments Choice of language & compiler can help with performance So can coding style and detailed examination of compiler output What is the best compromise?

7 Communication Layer Abstract class factory design pattern Similar to plug-ins Enables addition of new functionality without re-compilation of the rest of the library All communication modules: Implement the same Abstract Device Interface (ADI) Isolate the details of their implementation from other layers Provide the same semantics and capabilities Reliable delivery Ordering of delivery Preservation of message boundaries Message = fixed size envelope information and variable size user data

8 Communication Layer UML

9 Protocol Layer Bridge design pattern Enables addition of new functionality without re-compilation of the rest of the library All protocol messages: Implement inherit from the same base class Isolate the details of their implementation from other layers Modify state of internal shared data structures independently Shared data structures (message queues ) Unexpected queue message envelope at receiver before receive Request queue receive called before message envelope arrival Matched queue at receiver waiting for message data to arrive Pending queue message data waiting at sender

10 Protocol Layer UML

11 Interface Layer Simple façade design pattern Translates MPI Standard-like syntax into protocol layer syntax Will become adapter design pattern For example, when custom data-types are implemented Current version of McMPI covers parts of MPI 1 only Initialisation and finalisation Administration functions, e.g. to get rank and size of communicator Point-to-point communication functions ready, synchronous, standard (not buffered) blocking, non-blocking, persistent Previous version had collectives Implemented on top of point-to-point Using hypercube or binary tree algorithms

12 McMPI Implementation Overview

13 Performance Results Introduction 1 Shared-memory results hardware details Number of Nodes: 1 Armari Magnetar server CPUs per Node: 2 Intel Xeon E5420 Threads per CPU: 4 Quad-core, no hyper-threading Core Clock Speed: 2.5GHz Front-side bus 1333MHz Level 1 Cache: 4x2x32KB Data & instruction per core Level 2 Cache: 2x6MB One per pair of cores Memory per Node: 16GB DDR2 667MHz Network Hardware: 2xNIC Intel 82575EB Gigabit Ethernet Operating System: WinXP Pro 64bit with SP3 version

14 Performance Results Introduction 2 Distributed-memory results hardware details Number of Nodes: 18 Dell PowerEdge 2900 CPUs per Node: 2 Intel Xeon 5130 Fam 6 mod 15 step 6 Threads per CPU: 2 Dual-core, no hyper-threading Core Clock Speed: 2.0GHz Front-side bus 1333MHz Level 1 Cache: 2x2x32KB Data & instruction per core Level 2 Cache: 1x4MB One per CPU Memory per Node: 4GB DDR2 533MHz Network Hardware: 2xNIC BCM5708C NetXtreme II GigE Operating System: Win2008 Server x64, SP2 version

15 Latency (µs) Shared-memory Latency MPICH2 Shared Memory MS-MPI Shared Memory McMPI thread-to-thread ,024 2,048 4,096 8,192 16,384 32,768 Message Size (bytes)

16 Bandwidth (Mbit/s) Shared-memory Bandwidth 70,000 60,000 50,000 40,000 30,000 20,000 McMPI thread-to-thread MPICH2 shared-memory MS-MPI shared-memory 10, ,096 8,192 16,384 32,768 65, , , ,288 1,048,576 Message Size (bytes)

17 Latency (µs) Distributed-memory Latency McMPI Eager MS-MPI ,024 2,048 4,096 8,192 16,384 32,768 Message Size (bytes)

18 Bandwidth (Mbit/s) Distributed-memory Bandwidth 1, McMPI Rendezvous McMPI Eager MS-MPI 0 4,096 8,192 16,384 32,768 65, , , ,288 1,048,576 Message Size (bytes)

19 Thread-as-rank Threading Level McMPI allows MPI_THREAD_AS_RANK as input for the MPI_INIT_THREAD function McMPI creates new threads during initialisation Not needed MPI_INIT_THREAD must be called enough times McMPI uses thread-local storage to store rank Not needed each communicator handle can encode rank Thread-to-thread message delivery is zero-copy Direct copy from user send buffer to user receive buffer Any thread can progress MPI messages

20 Thread-as-rank MPI Process Diagram created by Gaurav Saxena MSc, 2013

21 Thread-as-rank MPI Standard Is thread-as-rank compliant with the MPI Standard? Does the MPI Standard allow/support thread-as-rank? Ambiguous/debatable at best The MPI Standard assumes MPI process = OS process Call MPI_INIT or MPI_INIT_THREAD twice in one OS process Erroneous by definition or results in two MPI processes? MPI Standard thread compliant prohibits thread-as-rank To maintain a POSIX-process-like interface for MPI process End-points proposal violates this principle in exactly the same way Other possible interfaces exist

22 Thread-as-rank End-points Similarities Multiple threads can communicate reliably without using tags Thread rank can be stored in thread-local storage or handles Most common use-case likely requires MPI_THREAD_MULTIPLE Differences Thread-as-rank part of initialisation and active until finalisation End-points created after initialisation and can be destroyed Thread-as-rank has all possible ranks in MPI_COMM_WORLD End-points only has some ranks in MPI_COMM_WORLD Thread-as-rank cannot create ranks but may need to merge ranks End-points can create ranks and does not need to merge ranks

23 Thread-as-rank MPI Forum Proposal? Short answer: no Long answer: not yet, it s complicated More likely to be suggested amendments to end-points proposal Thread-as-rank is a special case of end-points Standard MPI_COMM_WORLD replaced with an end-points communicator during MPI_INIT_THREAD Thread-safety implications are similar (possibly identical?) Advantages/opportunities similar Thread-to-thread delivery rather than process-to-process delivery Work-stealing MPI progress engine or per-thread message queues

24 Questions?

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Boosting Data Transfer with TCP Offload Engine Technology

Boosting Data Transfer with TCP Offload Engine Technology Boosting Data Transfer with TCP Offload Engine Technology on Ninth-Generation Dell PowerEdge Servers TCP/IP Offload Engine () technology makes its debut in the ninth generation of Dell PowerEdge servers,

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering

More information

Informatica Ultra Messaging SMX Shared-Memory Transport

Informatica Ultra Messaging SMX Shared-Memory Transport White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade

More information

benchmarking Amazon EC2 for high-performance scientific computing

benchmarking Amazon EC2 for high-performance scientific computing Edward Walker benchmarking Amazon EC2 for high-performance scientific computing Edward Walker is a Research Scientist with the Texas Advanced Computing Center at the University of Texas at Austin. He received

More information

1000Mbps Ethernet Performance Test Report 2014.4

1000Mbps Ethernet Performance Test Report 2014.4 1000Mbps Ethernet Performance Test Report 2014.4 Test Setup: Test Equipment Used: Lenovo ThinkPad T420 Laptop Intel Core i5-2540m CPU - 2.60 GHz 4GB DDR3 Memory Intel 82579LM Gigabit Ethernet Adapter CentOS

More information

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, [email protected] Abstract 1 Interconnect quality

More information

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology

The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology 3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related

More information

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,

More information

Intel DPDK Boosts Server Appliance Performance White Paper

Intel DPDK Boosts Server Appliance Performance White Paper Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks

More information

Building an Inexpensive Parallel Computer

Building an Inexpensive Parallel Computer Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

More information

Symmetric Multiprocessing

Symmetric Multiprocessing Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June [email protected],[email protected],

More information

Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics

Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics Why Compromise? A discussion on RDMA versus Send/Receive and the difference between interconnect and application semantics Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source

More information

Computer Systems Structure Input/Output

Computer Systems Structure Input/Output Computer Systems Structure Input/Output Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Examples of I/O Devices

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

Intel Pentium 4 Processor on 90nm Technology

Intel Pentium 4 Processor on 90nm Technology Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended

More information

Minimum Hardware Specifications Upgrades

Minimum Hardware Specifications Upgrades Minimum Hardware Specifications Upgrades http://www.varian.com/hardwarespecs Eclipse TM treatment planning system Hardware V 11.0 1 TPS Version 11.0 Minimum Hardware Specifications [DELL OS supported upgrade

More information

Kashif Iqbal - PhD [email protected]

Kashif Iqbal - PhD Kashif.iqbal@ichec.ie HPC/HTC vs. Cloud Benchmarking An empirical evalua.on of the performance and cost implica.ons Kashif Iqbal - PhD [email protected] ICHEC, NUI Galway, Ireland With acknowledgment to Michele MicheloDo

More information

Accelerating From Cluster to Cloud: Overview of RDMA on Windows HPC. Wenhao Wu Program Manager Windows HPC team

Accelerating From Cluster to Cloud: Overview of RDMA on Windows HPC. Wenhao Wu Program Manager Windows HPC team Accelerating From Cluster to Cloud: Overview of RDMA on Windows HPC Wenhao Wu Program Manager Windows HPC team Agenda Microsoft s Commitments to HPC RDMA for HPC Server RDMA for Storage in Windows 8 Microsoft

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

System Requirements Table of contents

System Requirements Table of contents Table of contents 1 Introduction... 2 2 Knoa Agent... 2 2.1 System Requirements...2 2.2 Environment Requirements...4 3 Knoa Server Architecture...4 3.1 Knoa Server Components... 4 3.2 Server Hardware Setup...5

More information

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 Computer Science the study of algorithms, including Their formal and mathematical properties Their hardware realizations

More information

Summary. Key results at a glance:

Summary. Key results at a glance: An evaluation of blade server power efficiency for the, Dell PowerEdge M600, and IBM BladeCenter HS21 using the SPECjbb2005 Benchmark The HP Difference The ProLiant BL260c G5 is a new class of server blade

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

LS DYNA Performance Benchmarks and Profiling. January 2009

LS DYNA Performance Benchmarks and Profiling. January 2009 LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The

More information

Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster

Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster Toward a practical HPC Cloud : Performance tuning of a virtualized HPC cluster Ryousei Takano Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Architecting High-Speed Data Streaming Systems. Sujit Basu

Architecting High-Speed Data Streaming Systems. Sujit Basu Architecting High-Speed Data Streaming Systems Sujit Basu stream ing [stree-ming] verb 1. The act of transferring data to or from an instrument at a rate high enough to sustain continuous acquisition or

More information

An Oracle White Paper September 2013. Advanced Java Diagnostics and Monitoring Without Performance Overhead

An Oracle White Paper September 2013. Advanced Java Diagnostics and Monitoring Without Performance Overhead An Oracle White Paper September 2013 Advanced Java Diagnostics and Monitoring Without Performance Overhead Introduction... 1 Non-Intrusive Profiling and Diagnostics... 2 JMX Console... 2 Java Flight Recorder...

More information

Business white paper. HP Process Automation. Version 7.0. Server performance

Business white paper. HP Process Automation. Version 7.0. Server performance Business white paper HP Process Automation Version 7.0 Server performance Table of contents 3 Summary of results 4 Benchmark profile 5 Benchmark environmant 6 Performance metrics 6 Process throughput 6

More information

Building a Private Cloud with Eucalyptus

Building a Private Cloud with Eucalyptus Building a Private Cloud with Eucalyptus 5th IEEE International Conference on e-science Oxford December 9th 2009 Christian Baun, Marcel Kunze KIT The cooperation of Forschungszentrum Karlsruhe GmbH und

More information

Scaling Database Performance in Azure

Scaling Database Performance in Azure Scaling Database Performance in Azure Results of Microsoft-funded Testing Q1 2015 2015 2014 ScaleArc. All Rights Reserved. 1 Test Goals and Background Info Test Goals and Setup Test goals Microsoft commissioned

More information

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed and Cloud Computing Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading

More information

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12 XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5

More information

IT Business Management System Requirements Guide

IT Business Management System Requirements Guide IT Business Management System Requirements Guide IT Business Management 8.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced

More information

RDMA over Ethernet - A Preliminary Study

RDMA over Ethernet - A Preliminary Study RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Outline Introduction Problem Statement

More information

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003

Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Virtual Machines. www.viplavkambli.com

Virtual Machines. www.viplavkambli.com 1 Virtual Machines A virtual machine (VM) is a "completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with either software

More information

Using the Windows Cluster

Using the Windows Cluster Using the Windows Cluster Christian Terboven [email protected] aachen.de Center for Computing and Communication RWTH Aachen University Windows HPC 2008 (II) September 17, RWTH Aachen Agenda o Windows Cluster

More information

Figure 1A: Dell server and accessories Figure 1B: HP server and accessories Figure 1C: IBM server and accessories

Figure 1A: Dell server and accessories Figure 1B: HP server and accessories Figure 1C: IBM server and accessories TEST REPORT SEPTEMBER 2007 Out-of-box comparison between Dell, HP, and IBM servers Executive summary Dell Inc. (Dell) commissioned Principled Technologies (PT) to compare the out-of-box experience of a

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

PARALLELS CLOUD SERVER

PARALLELS CLOUD SERVER PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...

More information

Introduction to Web Services

Introduction to Web Services Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies

More information

Middleware Lou Somers

Middleware Lou Somers Middleware Lou Somers April 18, 2002 1 Contents Overview Definition, goals, requirements Four categories of middleware Transactional, message oriented, procedural, object Middleware examples XML-RPC, SOAP,

More information

Networking Driver Performance and Measurement - e1000 A Case Study

Networking Driver Performance and Measurement - e1000 A Case Study Networking Driver Performance and Measurement - e1000 A Case Study John A. Ronciak Intel Corporation [email protected] Ganesh Venkatesan Intel Corporation [email protected] Jesse Brandeburg

More information

Running applications on the Cray XC30 4/12/2015

Running applications on the Cray XC30 4/12/2015 Running applications on the Cray XC30 4/12/2015 1 Running on compute nodes By default, users do not log in and run applications on the compute nodes directly. Instead they launch jobs on compute nodes

More information

A low-cost, connection aware, load-balancing solution for distributing Gigabit Ethernet traffic between two intrusion detection systems

A low-cost, connection aware, load-balancing solution for distributing Gigabit Ethernet traffic between two intrusion detection systems Iowa State University Digital Repository @ Iowa State University Graduate Theses and Dissertations Graduate College 2010 A low-cost, connection aware, load-balancing solution for distributing Gigabit Ethernet

More information

Power Comparison of Dell PowerEdge 2950 using Intel X5355 and E5345 Quad Core Xeon Processors

Power Comparison of Dell PowerEdge 2950 using Intel X5355 and E5345 Quad Core Xeon Processors Power Comparison of Dell PowerEdge 2950 using Intel X5355 and E5345 Quad Core Xeon Processors By Scott Hanson and Todd Muirhead Dell Enterprise Technology Center Dell Enterprise Technology Center dell.com/techcenter

More information

RightNow November 09 Workstation Specifications

RightNow November 09 Workstation Specifications RightNow November 09 Workstation Specifications This document includes the workstation specifications required for using RightNow November 09. Additional requirements for Outlook Integration, RightNow

More information

Improved LS-DYNA Performance on Sun Servers

Improved LS-DYNA Performance on Sun Servers 8 th International LS-DYNA Users Conference Computing / Code Tech (2) Improved LS-DYNA Performance on Sun Servers Youn-Seo Roh, Ph.D. And Henry H. Fong Sun Microsystems, Inc. Abstract Current Sun platforms

More information

AMD Opteron Quad-Core

AMD Opteron Quad-Core AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced

More information

Determining Your Computer Resources

Determining Your Computer Resources Determining Your Computer Resources There are a number of computer components that must meet certain requirements in order for your computer to perform effectively. This document explains how to check

More information

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer. Guest lecturer: David Hovemeyer November 15, 2004 The memory hierarchy Red = Level Access time Capacity Features Registers nanoseconds 100s of bytes fixed Cache nanoseconds 1-2 MB fixed RAM nanoseconds

More information

WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed

WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed WinBioinfTools: Bioinformatics Tools for Windows Cluster Done By: Hisham Adel Mohamed Objective Implement and Modify Bioinformatics Tools To run under Windows Cluster Project : Research Project between

More information

ECLIPSE Performance Benchmarks and Profiling. January 2009

ECLIPSE Performance Benchmarks and Profiling. January 2009 ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster

More information

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005

Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005 Compute Cluster Server Lab 3: Debugging the parallel MPI programs in Microsoft Visual Studio 2005... 1

More information

Datacenter Operating Systems

Datacenter Operating Systems Datacenter Operating Systems CSE451 Simon Peter With thanks to Timothy Roscoe (ETH Zurich) Autumn 2015 This Lecture What s a datacenter Why datacenters Types of datacenters Hyperscale datacenters Major

More information

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs.

High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs. High-performance vnic framework for hypervisor-based NFV with userspace vswitch Yoshihiro Nakajima, Hitoshi Masutani, Hirokazu Takahashi NTT Labs. 0 Outline Motivation and background Issues on current

More information

PCI Express High Speed Networks. Complete Solution for High Speed Networking

PCI Express High Speed Networks. Complete Solution for High Speed Networking PCI Express High Speed Networks Complete Solution for High Speed Networking Ultra Low Latency Ultra High Throughput Maximizing application performance is a combination of processing, communication, and

More information

The Bus (PCI and PCI-Express)

The Bus (PCI and PCI-Express) 4 Jan, 2008 The Bus (PCI and PCI-Express) The CPU, memory, disks, and all the other devices in a computer have to be able to communicate and exchange data. The technology that connects them is called the

More information

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

3.4 Planning for PCI Express

3.4 Planning for PCI Express 3.4 Planning for PCI Express Evaluating Platforms for Performance and Reusability How many of you own a PC with PCIe slot? What about a PCI slot? 168 Advances in PC Bus Technology Do you remember this

More information

Vocera Voice 4.3 and 4.4 Server Sizing Matrix

Vocera Voice 4.3 and 4.4 Server Sizing Matrix Vocera Voice 4.3 and 4.4 Server Sizing Matrix Vocera Server Recommended Configuration Guidelines Maximum Simultaneous Users 450 5,000 Sites Single Site or Multiple Sites Requires Multiple Sites Entities

More information

IBM Europe Announcement ZG08-0232, dated March 11, 2008

IBM Europe Announcement ZG08-0232, dated March 11, 2008 IBM Europe Announcement ZG08-0232, dated March 11, 2008 IBM System x3450 servers feature fast Intel Xeon 2.80 GHz/1600 MHz, 3.0 GHz/1600 MHz, both with 12 MB L2, and 3.4 GHz/1600 MHz, with 6 MB L2 processors,

More information

Decomposition into Parts. Software Engineering, Lecture 4. Data and Function Cohesion. Allocation of Functions and Data. Component Interfaces

Decomposition into Parts. Software Engineering, Lecture 4. Data and Function Cohesion. Allocation of Functions and Data. Component Interfaces Software Engineering, Lecture 4 Decomposition into suitable parts Cross cutting concerns Design patterns I will also give an example scenario that you are supposed to analyse and make synthesis from The

More information

64-Bit versus 32-Bit CPUs in Scientific Computing

64-Bit versus 32-Bit CPUs in Scientific Computing 64-Bit versus 32-Bit CPUs in Scientific Computing Axel Kohlmeyer Lehrstuhl für Theoretische Chemie Ruhr-Universität Bochum March 2004 1/25 Outline 64-Bit and 32-Bit CPU Examples

More information

Comparative performance test Red Hat Enterprise Linux 5.1 and Red Hat Enterprise Linux 3 AS on Intel-based servers

Comparative performance test Red Hat Enterprise Linux 5.1 and Red Hat Enterprise Linux 3 AS on Intel-based servers Principled Technologies Comparative performance test Red Hat Enterprise Linux 5.1 and Red Hat Enterprise Linux 3 AS on Intel-based servers Principled Technologies, Inc. Agenda Overview System configurations

More information

InterScan Web Security Virtual Appliance

InterScan Web Security Virtual Appliance InterScan Web Security Virtual Appliance Sizing Guide for version 6.0 July 2013 TREND MICRO INC. 10101 N. De Anza Blvd. Cupertino, CA 95014 www.trendmicro.com Toll free: +1 800.228.5651 Fax: +1 408.257.2003

More information

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable

More information

Introduction to Hybrid Programming

Introduction to Hybrid Programming Introduction to Hybrid Programming Hristo Iliev Rechen- und Kommunikationszentrum aixcelerate 2012 / Aachen 10. Oktober 2012 Version: 1.1 Rechen- und Kommunikationszentrum (RZ) Motivation for hybrid programming

More information

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers

JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers JBoss Seam Performance and Scalability on Dell PowerEdge 1855 Blade Servers Dave Jaffe, PhD, Dell Inc. Michael Yuan, PhD, JBoss / RedHat June 14th, 2006 JBoss Inc. 2006 About us Dave Jaffe Works for Dell

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

COS 318: Operating Systems. I/O Device and Drivers. Input and Output. Definitions and General Method. Revisit Hardware

COS 318: Operating Systems. I/O Device and Drivers. Input and Output. Definitions and General Method. Revisit Hardware COS 318: Operating Systems I/O and Drivers Input and Output A computer s job is to process data Computation (, cache, and memory) Move data into and out of a system (between I/O devices and memory) Challenges

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Accelerating High-Speed Networking with Intel I/O Acceleration Technology White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

10.2 Requirements for ShoreTel Enterprise Systems

10.2 Requirements for ShoreTel Enterprise Systems 10.2 Requirements for ShoreTel Enterprise Systems The ShoreTel Enterprise Edition system is scalable. For economy, ShoreTel Enterprise customers provide their own server hardware, allowing them to build

More information

LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig

LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015 Hermann Härtig ISSUES starting points independent Unix processes and block synchronous execution who does it load migration mechanism

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

Microsoft Exchange Server 2003 Deployment Considerations

Microsoft Exchange Server 2003 Deployment Considerations Microsoft Exchange Server 3 Deployment Considerations for Small and Medium Businesses A Dell PowerEdge server can provide an effective platform for Microsoft Exchange Server 3. A team of Dell engineers

More information

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009 ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory

More information

Autodesk Revit 2016 Product Line System Requirements and Recommendations

Autodesk Revit 2016 Product Line System Requirements and Recommendations Autodesk Revit 2016 Product Line System Requirements and Recommendations Autodesk Revit 2016, Autodesk Revit Architecture 2016, Autodesk Revit MEP 2016, Autodesk Revit Structure 2016 Minimum: Entry-Level

More information

Legal Notices... 2. Introduction... 3

Legal Notices... 2. Introduction... 3 HP Asset Manager Asset Manager 5.10 Sizing Guide Using the Oracle Database Server, or IBM DB2 Database Server, or Microsoft SQL Server Legal Notices... 2 Introduction... 3 Asset Manager Architecture...

More information

Effective Java Programming. efficient software development

Effective Java Programming. efficient software development Effective Java Programming efficient software development Structure efficient software development what is efficiency? development process profiling during development what determines the performance of

More information

Programming Languages

Programming Languages Programming Languages Programming languages bridge the gap between people and machines; for that matter, they also bridge the gap among people who would like to share algorithms in a way that immediately

More information

Configuring and using DDR3 memory with HP ProLiant Gen8 Servers

Configuring and using DDR3 memory with HP ProLiant Gen8 Servers Engineering white paper, 2 nd Edition Configuring and using DDR3 memory with HP ProLiant Gen8 Servers Best Practice Guidelines for ProLiant servers with Intel Xeon processors Table of contents Introduction

More information