InfiniBand, PCI Express, and Intel Xeon Processors with Extended Memory 64 Technology (Intel EM64T)
White Paper
InfiniBand, PCI Express, and Intel Xeon Processors with Extended Memory 64 Technology (Intel EM64T)
Towards a Perfectly Balanced Computing Architecture

1.0 The Problem

The performance and efficiency of applications attract a significant amount of attention from computing users and suppliers alike. This is easy to understand: users want to know the productivity and return on investment of the systems they are considering purchasing. Manufacturers, for their part, will often develop special and costly hardware, or tune an existing platform to peak the performance for a given application. Often the customer receives an efficient solution for his specific application only to discover:

- Performance is not broadly applicable to other applications, or even to different datasets
- The system cost is high and the system is costly to maintain
- The solution is not very scalable beyond the delivered configuration

In seeking solutions to this problem, many lessons have been learned about application efficiency, including concepts regarding the balance of CPU throughput, memory performance, and I/O subsystem performance.

Stender Way, Santa Clara, CA. Document Number 2279WP
InfiniHost III Ex HCA Architecture

The ultimate solution to this scenario is to achieve a broad range of application performance and efficiency out of a low-cost, high-volume, industry-standard computing architecture. This is a key motivation behind industry standards like PCI Express and InfiniBand, and the subject of this paper.

2.0 System Balance in Computing Platforms

System balance is the ability of a system to maximize processor productivity by feeding the compute processors' demand for data. For example, parallel applications such as simulation and modeling are both compute and communication intensive. These applications must perform millions of calculations and then exchange intermediate results before beginning another iteration. To maximize processor performance, this exchange of data must be fast enough to prevent the compute processors from sitting idle, waiting for data. Thus, the faster the processors, the greater the bandwidth required and the lower the latency that can be tolerated.

In addition, different applications stress different aspects of the computing system. A balanced system takes into account the needs of the applications and matches memory, I/O, and interconnect performance with computing power to keep the processors fed. It is therefore clear that system and application efficiency are influenced by three basic elements of the computing architecture:

1. Central Processing Unit (CPU) throughput
2. Memory sub-system performance
3. Input/Output (I/O) sub-system performance

A weakness in any one of these three legs results in a potentially crippling degradation in efficiency and overall platform performance. Thus it is critical to architect a platform that is balanced across all three of these important elements.
For example, the implicit Finite Element Analysis (FEA) used in the dynamic simulation of automotive or aerospace structures is very demanding of memory bandwidth, requiring, on average, 1.2 bytes per floating point operation performed. In these applications, processors waste many idle cycles waiting for memory access on systems where memory bandwidth is not matched to CPU performance. This mismatch translates into longer run times and fewer jobs that can be processed. A poorly balanced 128-processor cluster may deliver only the same performance as a well-balanced 64-processor system, wasting expensive capital, computing, and management resources.

Another example: over the past few decades, standard I/O technologies have not kept pace with improvements in CPU and memory performance, creating a system imbalance that impacts overall platform performance and scalability. For parallel applications this is double trouble, since the I/O subsystem usually has double duty: both clustering and storage traffic stress the I/O channel to keep the CPU fully utilized. This has resulted in an I/O bottleneck that limits the overall achievable system performance and demands a platform architecture upgrade.

3.0 Industry Solutions to Improve System Balance

Fortunately, several years ago, key system and I/O architects recognized this looming unbalanced-platform scenario and developed several key new technologies, including 64-bit addressing, InfiniBand, and PCI Express (PCIe) I/O interfaces, to address these potential limitations. The result is a platform upgrade that delivers a highly balanced compute architecture, achieved with the combination of:

- Intel Xeon processors
- Intel Extended Memory 64 Technology (EM64T) and DDR2 SDRAM memory
- An I/O sub-system with InfiniBand and PCI Express
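Returning to the FEA example above: the 1.2 bytes-per-FLOP ratio translates directly into a memory-bandwidth requirement. The following sketch is illustrative only; the sustained FLOP rate used is an assumed value, not a figure from this paper:

```python
BYTES_PER_FLOP = 1.2   # implicit FEA figure quoted in the text

def required_mem_bw_gbs(sustained_gflops: float) -> float:
    """Memory bandwidth (GB/s) needed to keep an FEA code fed,
    given a sustained floating-point rate in GFLOP/s."""
    return sustained_gflops * BYTES_PER_FLOP

# A hypothetical CPU sustaining 5 GFLOP/s would need about 6 GB/s of
# memory bandwidth, well beyond a 2.4 GB/s DDR-300 sub-system.
print(required_mem_bw_gbs(5.0))  # -> 6.0
```

Whenever the delivered memory bandwidth falls below this requirement, the processor idles, which is exactly the imbalance described above.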
4.0 Amdahl's Law and the Weakest Link

Amdahl's law (named for Gene Amdahl) is one of the most basic concepts in computer architecture, and it points out the importance of having a balanced system architecture. Amdahl's law states that the performance gain that can be obtained by improving some portion (sub-system) of a computer is limited by the fraction of time that this sub-system contributes to the overall processing task. Mathematically, the law can be expressed as:

    Speedup = Two / Tw    (EQ 1)

where:
Two = execution time for the entire task without the sub-system improvement
Tw = execution time with the sub-system improvement

Thus speedup represents how much faster the task will run with the sub-system enhancement. What is important to recognize is that if the contribution of one element of the overall system dominates the total execution time, then performance improvements in the other two components will have little effect on overall performance. This is fairly intuitive: if one sub-system contributes 95% of the total run time, it does not make sense to expend effort optimizing the sub-systems that contribute only the remaining 5% of run time. Instead it makes sense to focus on the weakest link.

An example helps to make this clearer. Consider the case of a distributed (clustered) database with the entire database image distributed across 16 nodes. Oracle 10g Grid Database is a good example of this type of distributed system, and it realizes significantly improved price/performance versus typical big-box symmetric multi-processing (SMP) machines. Oracle distributes data across all the nodes in the cluster with its Cache Fusion architecture.
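EQ 1 can be exercised with a small helper. The following Python sketch is illustrative and not code from the original paper:

```python
def amdahl_speedup(t_without: float, t_with: float) -> float:
    """EQ 1: Speedup = Two / Tw, the ratio of task execution time
    before and after a single sub-system improvement."""
    return t_without / t_with

# If one sub-system accounts for 95 of 100 time units, then even making
# everything else infinitely fast only shrinks the run time to 95 units:
print(amdahl_speedup(100.0, 95.0))  # ~1.05x speedup at best
```

The weakest link caps the achievable speedup, which is why the examples that follow attack the dominant (I/O) term first.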
A typical low-level operation in this type of architecture requires a given node to fetch a large (say 64KByte) block of data from another machine, store the data in memory, and perform some processing on the data (e.g., search for the largest value in a record of such values). Fundamentally, then, there are three elements that contribute to the overall compute time:

1. Get the data from the other node (I/O)
2. Fetch the data from memory
3. Process the data and generate a result

Consider the following cluster architecture:

- Processor: Intel 2.8GHz Xeon CPUs
- Chipset/Memory: DDR 300MHz, 128 bits wide
- I/O: PCI-X and Gigabit Ethernet (later we will compare to a system utilizing Intel EM64T, PCI Express, and InfiniBand)

The first task is to get the data from the other node. With an I/O sub-system based on PCI-X and Gigabit Ethernet, it requires on the order of 1100us to transfer 64KBytes of data. The next step is to get the data from memory. Assuming the memory operates at a 300MHz data rate with a 128-bit-wide bus at 50% bus efficiency (a conservative figure), the data can be fetched in about 27us. Finally, the data is processed by the CPU. The amount of work done in this step is variable and highly dependent on the actual processing task at hand, but for concreteness, assume that the algorithm being performed requires on the order of 3 instructions per byte. For a 2.8GHz processor, the processing contribution is thus about 70us.
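The three contributions can be reproduced with straightforward arithmetic. The sketch below assumes one instruction per cycle for the CPU term; that assumption is mine, not stated in the paper:

```python
BLOCK = 64 * 1024                 # 64KByte block, in bytes

# 1. I/O: PCI-X + Gigabit Ethernet, ~1100us per 64KB (figure from the text)
t_io = 1100.0                     # us

# 2. Memory: 300MHz data rate x 128-bit (16-byte) bus x 50% efficiency
mem_bw = 300e6 * 16 * 0.5         # bytes/sec (2.4 GB/s)
t_mem = BLOCK / mem_bw * 1e6      # ~27.3us

# 3. CPU: ~3 instructions per byte, assuming 1 instruction per cycle at 2.8GHz
t_cpu = BLOCK * 3 / 2.8e9 * 1e6   # ~70.2us

total = t_io + t_mem + t_cpu      # ~1197.5us, dominated by the I/O term
print(f"I/O {t_io:.1f}us, memory {t_mem:.1f}us, CPU {t_cpu:.1f}us, total {total:.1f}us")
```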
Thus the total execution time is:

Sub-System    Execution Time Contribution
I/O           ~1100.0us
Memory        27.3us
Processing    70.2us
Total         ~1197.5us

Clearly the first term dominates the total execution time.

5.0 Limited Performance Gains in an Un-Balanced Platform

Now consider the speedup achieved when the CPU clock frequency is increased from 2.8GHz to 3.4GHz. While this represents a substantial improvement in CPU performance, the improvement in overall run time is actually fairly small. The data is summarized as:

Sub-System    Execution Time Contribution
I/O           ~1100.0us
Memory        27.3us
Processing    57.8us
Total         ~1185.1us

Thus the overall run time improves by only about 1% despite the significantly larger improvement in CPU clock frequency. Similarly, a boost in memory transfer rate from 300MHz to 400MHz results in only a 0.6% overall performance improvement. Combining both the CPU speedup and the memory bandwidth speedup results in a paltry 1.6% improvement in overall execution time. Despite substantial improvements in both CPU and memory performance, the overall performance is barely improved.

Clearly this relatively small speedup is a result of focusing on improvements in the CPU and memory sub-systems without addressing the largest contributor to overall run time, in this case the I/O contribution. For other applications that are compute- rather than I/O-intensive, the processing contribution might dominate, and considerably better speedup would be achieved. Nonetheless, for a very large class of applications (such as the clustered database application described here) I/O is extremely important. More importantly, servers are general-purpose machines, and one can never know exactly which applications they will be required to support.
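These limited gains follow directly from the baseline numbers. The sketch below recomputes them under the same assumptions as before (3 instructions per byte, one instruction per cycle):

```python
BLOCK = 64 * 1024                               # 64KByte block

t_io  = 1100.0                                  # us, PCI-X + GigE (unchanged)
t_mem = BLOCK / (300e6 * 16 * 0.5) * 1e6        # ~27.3us
t_cpu = BLOCK * 3 / 2.8e9 * 1e6                 # ~70.2us
base  = t_io + t_mem + t_cpu                    # ~1197.5us

# CPU upgrade only: 2.8GHz -> 3.4GHz
t_cpu_34 = BLOCK * 3 / 3.4e9 * 1e6              # ~57.8us
cpu_gain = 1 - (t_io + t_mem + t_cpu_34) / base         # ~1.0%

# Memory upgrade only: 300MHz -> 400MHz
t_mem_400 = BLOCK / (400e6 * 16 * 0.5) * 1e6    # ~20.5us
mem_gain = 1 - (t_io + t_mem_400 + t_cpu) / base        # ~0.6%

# Both upgrades together
both_gain = 1 - (t_io + t_mem_400 + t_cpu_34) / base    # ~1.6%
print(f"CPU only: {cpu_gain:.1%}, memory only: {mem_gain:.1%}, both: {both_gain:.1%}")
```

As long as the fixed 1100us I/O term dominates, no amount of CPU or memory tuning moves the total meaningfully.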
Thus it is vital that the entire system architecture be balanced, so that significant performance gains in one sub-system actually result in substantial overall speedup.

6.0 Upgrading the Platform with PCI Express and InfiniBand

Clearly the I/O component dominates the overall run time. Fortunately, system architects from Intel and the leading server vendors recognized the need for improved I/O performance and defined new technologies such as InfiniBand and PCI Express to address the I/O bottleneck, and Intel EM64T to expand the performance and scalability of the memory sub-system. New server platforms with PCI Express and Intel EM64T memory sub-systems are beginning to ship from major server vendors, and Mellanox is now shipping an 8x PCI Express version of the InfiniHost HCA that matches perfectly with
these platforms. The HCA features both the new 8x PCI Express interface and a new HCA device with improved caching and a greater rate of I/O operations that transparently improve application speed.

[Figure: Typical Server Architecture vs. PCI Express Architecture. In the legacy platform, the InfiniBand HCA sits behind a shared PCI-X bus and PCI-X bridge. In the new platform, a dual 10Gb/s InfiniBand HCA connects directly to the memory controller over PCI Express, alongside DDR2 memory and a Xeon with EM64T: one fat serial pipe.]

System architecture advancements with PCI Express and InfiniBand both simplify and improve the performance of next-generation servers. The new server platform architectures bring I/O closer to the CPU and memory sub-system. Previous generations of server architectures required data to traverse PCI-X I/O bridges to reach the CPU and memory sub-system. With InfiniBand and PCI Express there is one fat serial pipe running directly between servers and the CPU and memory sub-systems. This reduces chip count and complexity, improves both bandwidth and latency, and, as will be shown, improves overall system-level balance and performance.

This improvement in I/O performance is achievable simply by adding an InfiniBand Host Channel Adapter card to an 8X-enabled PCI Express server platform. Both are industry-standard components available as off-the-shelf products from multiple system vendors. The Supermicro platform shown in Figure 1, Supermicro Platform 6014H-82-1U DP High Performance Server with PCI Express Slot, featuring a 3.4GHz Xeon CPU, E7520 chipset, 4GB of memory, and an 8X PCI Express slot with a dual-port InfiniBand HCA, is a good example of this architecture.
Figure 1. Supermicro Platform 6014H-82-1U DP High Performance Server with PCI Express Slot (dual Intel Xeon CPUs with EM64T, DDR2 memory, 8X PCI Express slot)

Figure 2. The InfiniHost III Ex device based Low Profile 8x PCI Express HCA Adapter Card
With InfiniBand and PCI-X, the delivered I/O bandwidth improves to more than eight times that of Gigabit Ethernet. With dual InfiniBand ports and PCI Express, the improvement is even more dramatic, achieving over 20Gb/s of net delivered data bandwidth. Obviously the combination of InfiniBand and PCI Express yields impressive bandwidth improvements to the system architecture, which translate into improved block transfer latency.

7.0 Balanced Platform Speedup with Intel EM64T, PCI Express, and InfiniBand

Thus the stage is set for a platform upgrade that adds PCI Express and InfiniBand, resulting in considerably better I/O performance. The combination of PCI Express and InfiniBand delivers effective I/O bandwidth of over 900MB/sec for 64KByte blocks! This blistering bandwidth cuts the time to fetch a 64KByte block from a remote node from ~1100us to 72.8us. Examining the overall performance with the I/O upgrade to InfiniBand and PCI Express yields:

Sub-System                        Execution Time Contribution
I/O (InfiniBand & PCI Express)    72.8us
Memory (300MHz/128 bits)          27.3us
Processing (2.8GHz CPU)           70.2us
Total                             170.3us

Now we're talking! The total execution time has been reduced from ~1197.5us to ~170us, or about 86%! Clearly, focusing on the largest contributor to the total execution time yields impressive speedup. Better still, the platform is now balanced, so that speedup of the other sub-systems will generate substantial additional performance improvements. Now, with the PCI Express and InfiniBand I/O sub-system, consider upgrading the CPU from 2.8GHz to 3.4GHz:

Sub-System                        Execution Time Contribution
I/O (InfiniBand & PCI Express)    72.8us
Memory (300MHz/128 bits)          27.3us
Processing (3.4GHz CPU)           57.8us
Total                             157.9us
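The figures in the two tables above can likewise be reproduced under the same sketch assumptions as before (3 instructions per byte, one instruction per cycle, 900MB/s effective InfiniBand + PCI Express bandwidth):

```python
BLOCK = 64 * 1024                              # 64KByte block

t_mem = BLOCK / (300e6 * 16 * 0.5) * 1e6       # ~27.3us
t_cpu = BLOCK * 3 / 2.8e9 * 1e6                # ~70.2us

t_io_old = 1100.0                              # us, PCI-X + GigE
t_io_new = BLOCK / 900e6 * 1e6                 # ~72.8us at 900MB/s effective

old_total = t_io_old + t_mem + t_cpu           # ~1197.5us
new_total = t_io_new + t_mem + t_cpu           # ~170.3us
reduction = 1 - new_total / old_total          # ~86% less run time

# On the balanced platform, a 3.4GHz CPU finally pays off:
t_cpu_34 = BLOCK * 3 / 3.4e9 * 1e6             # ~57.8us
speedup = new_total / (t_io_new + t_mem + t_cpu_34)    # ~1.078x, i.e. ~7.8%
print(f"run time cut by {reduction:.0%}; CPU upgrade now worth {speedup:.3f}x")
```

With no single term dominating, each of the three sub-systems is now worth upgrading, which is the definition of a balanced platform.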
By moving to the faster CPU, the total execution time is reduced by 12.4us, for an overall performance improvement of 7.8%. Similarly, moving to the faster 400MHz memory sub-system further reduces the run time to 151.1us, for an overall performance improvement of 11.3% from the baseline with the PCI Express and InfiniBand I/O sub-system. Clearly, the move to PCI Express and InfiniBand yields the biggest performance increase, but by re-establishing a balanced system architecture, this move also allows additional performance gains to be achieved by increasing the performance of other system components. Together, the Intel Xeon processor with EM64T and the PCI Express and InfiniBand I/O sub-system achieve the ideal of the balanced computing platform architecture.

8.0 No Software Hurdles

Frequently, the adoption of new platform architectures is slowed significantly by the requirements for new software. Fortunately, in this case the move to PCI Express is completely transparent. The InfiniBand driver software is structured to migrate transparently from PCI-X to PCI Express while providing complete backwards compatibility. Furthermore, the InfiniBand software is fully interoperable, so heterogeneous clusters using both PCI-X and PCI Express platforms can be created. Therefore, the investments made by Mellanox, as well as its customers, in software drivers, APIs, and applications are preserved and readily usable on the new platforms.

9.0 Summary

A balanced computing architecture must address equally the triad of processing, memory, and I/O in order to achieve optimized overall system performance. Neglecting any one element means that performance gains in the other two elements are squandered and do not result in significant overall performance improvements.
The combination of new high-performance Xeon processors, higher-bandwidth Intel EM64T memory technology, and a dramatically improved I/O sub-system with PCI Express and InfiniBand yields this balanced platform. Once balance has been restored to the overall system architecture, performance gains in each of the elements yield substantial gains in overall performance. It is expected that advanced processors and memory sub-systems will continue to track Moore's law and deliver increasing clock speed and performance. Fortunately, the combination of InfiniBand and PCI Express has re-balanced the platform such that these advances actually deliver benefits at the system level. Furthermore, both InfiniBand and PCI Express have roadmaps to continue increasing performance (with double-data-rate and even quad-data-rate signalling, fatter pipes, etc.). In short, the industry is delivering new processor, memory, and I/O technologies just in time to keep the steady advance in cost-effective system-level performance marching along.

Mellanox, InfiniBridge, InfiniHost and InfiniScale are registered trademarks of Mellanox Technologies, Inc. InfiniBand (TM/SM) is a trademark and service mark of the InfiniBand Trade Association. All other trademarks are claimed by their respective owners.
More informationImproving Grid Processing Efficiency through Compute-Data Confluence
Solution Brief GemFire* Symphony* Intel Xeon processor Improving Grid Processing Efficiency through Compute-Data Confluence A benchmark report featuring GemStone Systems, Intel Corporation and Platform
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationDell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III
White Paper Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III Performance of Microsoft SQL Server 2008 BI and D/W Solutions on Dell PowerEdge
More informationAdvanced Core Operating System (ACOS): Experience the Performance
WHITE PAPER Advanced Core Operating System (ACOS): Experience the Performance Table of Contents Trends Affecting Application Networking...3 The Era of Multicore...3 Multicore System Design Challenges...3
More informationThe Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage
The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...
More informationStovepipes to Clouds. Rick Reid Principal Engineer SGI Federal. 2013 by SGI Federal. Published by The Aerospace Corporation with permission.
Stovepipes to Clouds Rick Reid Principal Engineer SGI Federal 2013 by SGI Federal. Published by The Aerospace Corporation with permission. Agenda Stovepipe Characteristics Why we Built Stovepipes Cluster
More informationImproved LS-DYNA Performance on Sun Servers
8 th International LS-DYNA Users Conference Computing / Code Tech (2) Improved LS-DYNA Performance on Sun Servers Youn-Seo Roh, Ph.D. And Henry H. Fong Sun Microsystems, Inc. Abstract Current Sun platforms
More informationSoftware-defined Storage at the Speed of Flash
TECHNICAL BRIEF: SOFTWARE-DEFINED STORAGE AT THE SPEED OF... FLASH..................................... Intel SSD Data Center P3700 Series and Symantec Storage Foundation with Flexible Storage Sharing
More informationPARALLELS CLOUD SERVER
PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationSCI Briefing: A Review of the New Hitachi Unified Storage and Hitachi NAS Platform 4000 Series. Silverton Consulting, Inc.
SCI Briefing: A Review of the New Hitachi Unified Storage and Hitachi NAS Platform 4000 Series Silverton Consulting, Inc. StorInt Briefing Written by: Ray Lucchesi, President and Founder Published: July,
More informationOPTIMIZING SERVER VIRTUALIZATION
OPTIMIZING SERVER VIRTUALIZATION HP MULTI-PORT SERVER ADAPTERS BASED ON INTEL ETHERNET TECHNOLOGY As enterprise-class server infrastructures adopt virtualization to improve total cost of ownership (TCO)
More informationRoCE vs. iwarp Competitive Analysis
WHITE PAPER August 21 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...4 Summary...
More informationCan High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
More informationMellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct
Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 Direct Increased Performance, Scaling and Resiliency July 2012 Motti Beck, Director, Enterprise Market Development Motti@mellanox.com
More informationThe Benefits of Virtualizing
T E C H N I C A L B R I E F The Benefits of Virtualizing Aciduisismodo Microsoft SQL Dolore Server Eolore in Dionseq Hitachi Storage Uatummy Environments Odolorem Vel Leveraging Microsoft Hyper-V By Heidi
More informationFault Tolerant Servers: The Choice for Continuous Availability
Fault Tolerant Servers: The Choice for Continuous Availability This paper discusses today s options for achieving continuous availability and how NEC s Express5800/ft servers can provide every company
More informationIntel Itanium Architecture
Intel Itanium Architecture Roadmap and Technology Update Dr. Gernot Hoyler Technical Marketing EMEA Intel Itanium Architecture Growth MARKET Over 3x revenue growth Y/Y* More than 10x growth* in shipments
More informationDell PowerEdge Servers 2009 - Memory
Dell PowerEdge Servers 2009 - Memory A Dell Technical White Paper By Paul Benson Dell Enterprise Development February 2009 1 THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL
More informationWhite Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor
White Paper Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor Introduction ADL Embedded Solutions newly introduced PCIe/104 ADLQM67 platform is the first ever PC/104 form factor board to
More informationAdvances in Virtualization In Support of In-Memory Big Data Applications
9/29/15 HPTS 2015 1 Advances in Virtualization In Support of In-Memory Big Data Applications SCALE SIMPLIFY OPTIMIZE EVOLVE Ike Nassi Ike.nassi@tidalscale.com 9/29/15 HPTS 2015 2 What is the Problem We
More informationLustre Networking BY PETER J. BRAAM
Lustre Networking BY PETER J. BRAAM A WHITE PAPER FROM CLUSTER FILE SYSTEMS, INC. APRIL 2007 Audience Architects of HPC clusters Abstract This paper provides architects of HPC clusters with information
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationHP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing
More informationNetworking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
More informationServer Virtualization: Avoiding the I/O Trap
Server Virtualization: Avoiding the I/O Trap How flash memory arrays and NFS caching helps balance increasing I/O loads of virtualized servers November 2010 2 Introduction Many companies see dramatic improvements
More informationBuilding High-Performance iscsi SAN Configurations. An Alacritech and McDATA Technical Note
Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Building High-Performance iscsi SAN Configurations An Alacritech and McDATA Technical Note Internet SCSI (iscsi)
More information3G Converged-NICs A Platform for Server I/O to Converged Networks
White Paper 3G Converged-NICs A Platform for Server I/O to Converged Networks This document helps those responsible for connecting servers to networks achieve network convergence by providing an overview
More informationNew!! - Higher performance for Windows and UNIX environments
New!! - Higher performance for Windows and UNIX environments The IBM TotalStorage Network Attached Storage Gateway 300 (NAS Gateway 300) is designed to act as a gateway between a storage area network (SAN)
More informationSUN ORACLE EXADATA STORAGE SERVER
SUN ORACLE EXADATA STORAGE SERVER KEY FEATURES AND BENEFITS FEATURES 12 x 3.5 inch SAS or SATA disks 384 GB of Exadata Smart Flash Cache 2 Intel 2.53 Ghz quad-core processors 24 GB memory Dual InfiniBand
More informationEMC XtremSF: Delivering Next Generation Performance for Oracle Database
White Paper EMC XtremSF: Delivering Next Generation Performance for Oracle Database Abstract This white paper addresses the challenges currently facing business executives to store and process the growing
More information3.2 Limitations of Software Transports
1.0 Preface InfiniBand and TCP in the Data Center White Paper The InfiniBand Architecture is designed to allow streamlined operation of enterprise and internet data centers by creating a fabric that allows
More informationECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009
ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory
More informationMicrosoft Office SharePoint Server 2007 Performance on VMware vsphere 4.1
Performance Study Microsoft Office SharePoint Server 2007 Performance on VMware vsphere 4.1 VMware vsphere 4.1 One of the key benefits of virtualization is the ability to consolidate multiple applications
More informationMS Exchange Server Acceleration
White Paper MS Exchange Server Acceleration Using virtualization to dramatically maximize user experience for Microsoft Exchange Server Allon Cohen, PhD Scott Harlin OCZ Storage Solutions, Inc. A Toshiba
More informationPedraforca: ARM + GPU prototype
www.bsc.es Pedraforca: ARM + GPU prototype Filippo Mantovani Workshop on exascale and PRACE prototypes Barcelona, 20 May 2014 Overview Goals: Test the performance, scalability, and energy efficiency of
More informationThe virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.
White Paper Virtualized SAP: Optimize Performance with Cisco Data Center Virtual Machine Fabric Extender and Red Hat Enterprise Linux and Kernel-Based Virtual Machine What You Will Learn The virtualization
More informationPerformance Optimization Guide
Performance Optimization Guide Publication Date: July 06, 2016 Copyright Metalogix International GmbH, 2001-2016. All Rights Reserved. This software is protected by copyright law and international treaties.
More informationAccelerating Data Compression with Intel Multi-Core Processors
Case Study Predictive Enterprise Intel Xeon processors Intel Server Board Embedded technology Accelerating Data Compression with Intel Multi-Core Processors Data Domain incorporates Multi-Core Intel Xeon
More informationSX1012: High Performance Small Scale Top-of-Rack Switch
WHITE PAPER August 2013 SX1012: High Performance Small Scale Top-of-Rack Switch Introduction...1 Smaller Footprint Equals Cost Savings...1 Pay As You Grow Strategy...1 Optimal ToR for Small-Scale Deployments...2
More informationPCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation
PCI Express Impact on Storage Architectures and Future Data Centers Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
More informationVirtualization of the MS Exchange Server Environment
MS Exchange Server Acceleration Maximizing Users in a Virtualized Environment with Flash-Powered Consolidation Allon Cohen, PhD OCZ Technology Group Introduction Microsoft (MS) Exchange Server is one of
More informationx64 Servers: Do you want 64 or 32 bit apps with that server?
TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called
More informationIBM Europe Announcement ZG08-0232, dated March 11, 2008
IBM Europe Announcement ZG08-0232, dated March 11, 2008 IBM System x3450 servers feature fast Intel Xeon 2.80 GHz/1600 MHz, 3.0 GHz/1600 MHz, both with 12 MB L2, and 3.4 GHz/1600 MHz, with 6 MB L2 processors,
More information