McMPI. Managed-code MPI library in Pure C# Dr D Holmes, EPCC [email protected]
2 Outline
- Yet another MPI library?
  - Managed-code, C#, Windows
- McMPI, design and implementation details
  - Object-orientation, design patterns, communication performance results
- Threads and the MPI Standard
  - Pre-"End Points" proposal ideas
3 Why Implement MPI Again?
- Parallel program, distributed memory => MPI library
- Most (all?) MPI libraries written in C
- MPI Standard provides C and FORTRAN bindings
- C++ can use the C functions
- Other languages can follow the C++ model
  - Use the C functions
- Alternatively, MPI can be implemented in that language
  - Removes inter-language function call overheads, but
  - May not be possible to achieve comparable performance
4 Why Did I Choose C#?
- Experience and knowledge I gained from my career in software development
- My impression of the popularity of C# in commercial software development
- My desire to bridge the gap between high-performance programming and high-productivity programming
- One of the UK research councils offered me funding for a PhD that proposed to use C# to implement MPI
5 C# Myths
- C# only runs on Windows
  - Not such a bad thing: 3 of the Top500 machines use Windows
  - Not actually true: Mono works on multiple operating systems
- C# is a Microsoft language
  - Not such a bad thing: resources, commitment, support, training
  - Not actually true: C# follows ECMA and ISO standards
- C# is slow like Java
  - Not such a bad thing: expressivity, readability, re-usability
  - Not actually true: no easy way to prove this conclusively
- C# and its ilk are not things we need to care about
  - Not such a bad thing: they will survive/thrive, or not, without us
  - Not actually true: popularity trumps utility
6 McMPI Design & Implementation
- Desirable features of code
  - Isolation of concerns -> easier to understand
  - Human readability -> easier to maintain
  - Compiler readability -> easier to get good performance
- Object-orientation can help with isolation of concerns
  - So can modularisation and judiciously reducing LOC per code file
- Design patterns can help with human readability
  - So can documentation and useful in-code comments
- Choice of language & compiler can help with performance
  - So can coding style and detailed examination of compiler output
- What is the best compromise?
7 Communication Layer
- Abstract class factory design pattern
  - Similar to plug-ins
  - Enables addition of new functionality without re-compilation of the rest of the library
- All communication modules:
  - Implement the same Abstract Device Interface (ADI)
  - Isolate the details of their implementation from other layers
  - Provide the same semantics and capabilities
    - Reliable delivery
    - Ordering of delivery
    - Preservation of message boundaries
- Message = fixed size envelope information and variable size user data
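The communication-layer ideas on this slide can be illustrated with a minimal sketch. It is written in Python rather than C# and is not McMPI's code: the class names, the registry-style factory (a simplification of the abstract class factory pattern) and the four-field envelope layout are all assumptions made for illustration.

```python
import struct
from abc import ABC, abstractmethod
from collections import deque

# Hypothetical fixed-size envelope: source rank, destination rank, tag and
# payload length, each an unsigned 32-bit integer in network byte order.
ENVELOPE = struct.Struct("!IIII")


def pack_message(source, dest, tag, payload):
    """Prefix variable-size user data with a fixed-size envelope."""
    return ENVELOPE.pack(source, dest, tag, len(payload)) + payload


def unpack_message(frame):
    """Split a frame back into its envelope fields and user data."""
    source, dest, tag, length = ENVELOPE.unpack_from(frame)
    payload = frame[ENVELOPE.size:ENVELOPE.size + length]
    return source, dest, tag, payload


class CommunicationModule(ABC):
    """Stand-in for the device interface that every module implements."""

    @abstractmethod
    def send(self, frame):
        ...

    @abstractmethod
    def receive(self):
        ...


class LoopbackModule(CommunicationModule):
    """Toy in-memory device: reliable, ordered, boundary-preserving."""

    def __init__(self):
        self._frames = deque()

    def send(self, frame):
        self._frames.append(frame)

    def receive(self):
        return self._frames.popleft()


class ModuleFactory:
    """Registry-based factory: new devices register without changing callers."""

    _registry = {}

    @classmethod
    def register(cls, name, module_class):
        cls._registry[name] = module_class

    @classmethod
    def create(cls, name):
        return cls._registry[name]()


ModuleFactory.register("loopback", LoopbackModule)

device = ModuleFactory.create("loopback")
device.send(pack_message(0, 1, 7, b"hello"))
device.send(pack_message(0, 1, 8, b"world!"))
first = unpack_message(device.receive())
second = unpack_message(device.receive())
```

Callers only ever see `CommunicationModule`, so a new transport could be added by registering another subclass, which is the plug-in-like property the slide describes.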
8 Communication Layer UML
9 Protocol Layer
- Bridge design pattern
  - Enables addition of new functionality without re-compilation of the rest of the library
- All protocol messages:
  - Inherit from the same base class
  - Isolate the details of their implementation from other layers
  - Modify state of internal shared data structures independently
- Shared data structures (message queues)
  - Unexpected queue: message envelope at receiver before receive
  - Request queue: receive called before message envelope arrival
  - Matched queue: at receiver waiting for message data to arrive
  - Pending queue: message data waiting at sender
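The interplay between the receiver-side queues can be sketched as follows. This is a simplified Python illustration, not McMPI's implementation: the `Receiver` class and its method names are hypothetical, matching is on exact (source, tag) pairs only, and the sender-side pending queue is omitted.

```python
from collections import deque


class Receiver:
    """Toy receiver holding an unexpected queue and a request queue."""

    def __init__(self):
        self.unexpected = deque()  # envelopes that arrived before a receive
        self.requests = deque()    # receives posted before the envelope arrived
        self.matched = []          # (request, envelope) pairs awaiting data

    def on_envelope(self, source, tag):
        """An envelope arrives: match the oldest fitting request, else queue it."""
        for request in self.requests:
            if request == (source, tag):
                self.requests.remove(request)
                self.matched.append((request, (source, tag)))
                return True
        self.unexpected.append((source, tag))
        return False

    def post_receive(self, source, tag):
        """A receive is called: match the oldest fitting envelope, else queue it."""
        for envelope in self.unexpected:
            if envelope == (source, tag):
                self.unexpected.remove(envelope)
                self.matched.append(((source, tag), envelope))
                return True
        self.requests.append((source, tag))
        return False


receiver = Receiver()
early = receiver.on_envelope(0, 5)    # no receive posted yet: unexpected queue
hit = receiver.post_receive(0, 5)     # matches the queued envelope
late = receiver.post_receive(1, 9)    # no envelope yet: request queue
hit_again = receiver.on_envelope(1, 9)  # matches the queued request
```

Scanning each queue from the front keeps matching in arrival order, which is one simple way to respect the ordering guarantee the communication layer provides.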
10 Protocol Layer UML
11 Interface Layer
- Simple façade design pattern
  - Translates MPI Standard-like syntax into protocol layer syntax
  - Will become adapter design pattern
    - For example, when custom data-types are implemented
- Current version of McMPI covers parts of MPI 1 only
  - Initialisation and finalisation
  - Administration functions, e.g. to get rank and size of communicator
  - Point-to-point communication functions
    - ready, synchronous, standard (not buffered)
    - blocking, non-blocking, persistent
- Previous version had collectives
  - Implemented on top of point-to-point
  - Using hypercube or binary tree algorithms
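As an illustration of a collective built from point-to-point operations, the sketch below simulates a broadcast over a binary tree rooted at rank 0. It is a Python model only: the function names and the child numbering (2r+1 and 2r+2) are assumptions, and the slides do not say which tree layout the earlier McMPI version used.

```python
def binary_tree_children(rank, size):
    """Children of `rank` in a binary tree rooted at rank 0."""
    return [child for child in (2 * rank + 1, 2 * rank + 2) if child < size]


def broadcast(value, size):
    """Simulate a broadcast from rank 0 using only point-to-point sends."""
    received = {0: value}  # rank -> value that rank now holds
    sends = []             # (sender, receiver) pairs in issue order
    frontier = [0]
    while frontier:
        next_frontier = []
        for rank in frontier:
            for child in binary_tree_children(rank, size):
                sends.append((rank, child))  # stands in for a send/receive pair
                received[child] = received[rank]
                next_frontier.append(child)
        frontier = next_frontier
    return received, sends


received, sends = broadcast("payload", size=6)
```

Every rank other than the root receives exactly once, so a broadcast over `size` ranks needs `size - 1` point-to-point messages, spread over roughly log2(size) tree levels.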
12 McMPI Implementation Overview
13 Performance Results Introduction 1
Shared-memory results hardware details
- Number of Nodes: 1 (Armari Magnetar server)
- CPUs per Node: 2 (Intel Xeon E5420)
- Threads per CPU: 4 (Quad-core, no hyper-threading)
- Core Clock Speed: 2.5GHz (Front-side bus 1333MHz)
- Level 1 Cache: 4x2x32KB (Data & instruction per core)
- Level 2 Cache: 2x6MB (One per pair of cores)
- Memory per Node: 16GB (DDR2 667MHz)
- Network Hardware: 2xNIC (Intel 82575EB Gigabit Ethernet)
- Operating System: WinXP Pro (64bit with SP3 version)
14 Performance Results Introduction 2
Distributed-memory results hardware details
- Number of Nodes: 18 (Dell PowerEdge 2900)
- CPUs per Node: 2 (Intel Xeon 5130, Fam 6 mod 15 step 6)
- Threads per CPU: 2 (Dual-core, no hyper-threading)
- Core Clock Speed: 2.0GHz (Front-side bus 1333MHz)
- Level 1 Cache: 2x2x32KB (Data & instruction per core)
- Level 2 Cache: 1x4MB (One per CPU)
- Memory per Node: 4GB (DDR2 533MHz)
- Network Hardware: 2xNIC (BCM5708C NetXtreme II GigE)
- Operating System: Win2008 Server (x64, SP2 version)
15 Shared-memory Latency
[Figure: latency (µs) against message size (bytes) for MPICH2 shared memory, MS-MPI shared memory and McMPI thread-to-thread]
16 Shared-memory Bandwidth
[Figure: bandwidth (Mbit/s) against message size (bytes) for McMPI thread-to-thread, MPICH2 shared-memory and MS-MPI shared-memory]
17 Distributed-memory Latency
[Figure: latency (µs) against message size (bytes) for McMPI Eager and MS-MPI]
18 Distributed-memory Bandwidth
[Figure: bandwidth (Mbit/s) against message size (bytes) for McMPI Rendezvous, McMPI Eager and MS-MPI]
19 Thread-as-rank Threading Level
- McMPI allows MPI_THREAD_AS_RANK as input for the MPI_INIT_THREAD function
- McMPI creates new threads during initialisation
  - Not needed: MPI_INIT_THREAD must be called enough times
- McMPI uses thread-local storage to store rank
  - Not needed: each communicator handle can encode rank
- Thread-to-thread message delivery is zero-copy
  - Direct copy from user send buffer to user receive buffer
- Any thread can progress MPI messages
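The thread-as-rank idea, with the rank held in thread-local storage, can be sketched in Python as below. This is a loose analogy and not McMPI's API: `my_rank`, `send`, `receive`, the per-rank mailboxes and the ring exchange are all hypothetical, and handing over an object reference only approximates the direct buffer-to-buffer copy described on the slide.

```python
import queue
import threading

_local = threading.local()  # per-thread storage for the rank
SIZE = 4
mailboxes = [queue.Queue() for _ in range(SIZE)]  # one inbox per rank
results = {}
results_lock = threading.Lock()


def my_rank():
    """Look up the calling thread's rank from thread-local storage."""
    return _local.rank


def send(dest, obj):
    # The object reference is handed over; nothing is serialised.
    mailboxes[dest].put((my_rank(), obj))


def receive():
    return mailboxes[my_rank()].get()


def worker(rank):
    _local.rank = rank  # set once, at "initialisation"
    right = (my_rank() + 1) % SIZE
    send(right, f"hello from {my_rank()}")
    source, text = receive()
    with results_lock:
        results[my_rank()] = (source, text)


threads = [threading.Thread(target=worker, args=(r,)) for r in range(SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread behaves like an MPI rank: after initialisation it never passes its rank around explicitly, and `send` and `receive` find it through the thread-local lookup.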
20 Thread-as-rank MPI Process Diagram created by Gaurav Saxena MSc, 2013
21 Thread-as-rank MPI Standard
- Is thread-as-rank compliant with the MPI Standard?
- Does the MPI Standard allow/support thread-as-rank?
  - Ambiguous/debatable at best
- The MPI Standard assumes MPI process = OS process
- Call MPI_INIT or MPI_INIT_THREAD twice in one OS process
  - Erroneous by definition or results in two MPI processes?
- MPI Standard "thread compliant" prohibits thread-as-rank
  - To maintain a POSIX-process-like interface for MPI process
  - End-points proposal violates this principle in exactly the same way
- Other possible interfaces exist
22 Thread-as-rank End-points
- Similarities
  - Multiple threads can communicate reliably without using tags
  - Thread rank can be stored in thread-local storage or handles
  - Most common use-case likely requires MPI_THREAD_MULTIPLE
- Differences
  - Thread-as-rank part of initialisation and active until finalisation
  - End-points created after initialisation and can be destroyed
  - Thread-as-rank has all possible ranks in MPI_COMM_WORLD
  - End-points only has some ranks in MPI_COMM_WORLD
  - Thread-as-rank cannot create ranks but may need to merge ranks
  - End-points can create ranks and does not need to merge ranks
23 Thread-as-rank MPI Forum Proposal?
- Short answer: no
- Long answer: not yet, it's complicated
- More likely to be suggested amendments to end-points proposal
  - Thread-as-rank is a special case of end-points
  - Standard MPI_COMM_WORLD replaced with an end-points communicator during MPI_INIT_THREAD
- Thread-safety implications are similar (possibly identical?)
- Advantages/opportunities similar
  - Thread-to-thread delivery rather than process-to-process delivery
  - Work-stealing MPI progress engine or per-thread message queues
24 Questions?
