Comparing the performance of the Landmark Nexus reservoir simulator on HP servers
WHITE PAPER
Landmark Software & Services
SOFTWARE AND ASSET SOLUTIONS

Comparing the performance of the Landmark Nexus reservoir simulator on HP servers
Including a comparison with the VIP reservoir simulator
S. Crockett, Landmark, and S. Devere, HP

Introduction

This paper discusses the results of a benchmarking study conducted with Landmark Nexus reservoir simulation software to determine its performance characteristics on a variety of HP ProLiant x86_64 servers, and to provide recommendations for server configurations appropriate for optimal Nexus software performance. The tests and configurations were chosen to show likely performance results for a wide variety of server hardware. If specific datasets or configurations are of interest, see the For more information section.

Configurations Tested

The computer systems used in this study were clusters of server nodes connected by a high-speed network or by Gigabit Ethernet. Each server node was configured with two processors, which is the most common server configuration used for high-performance computing. Servers with quad-core processors from both Intel and AMD were tested, as were servers with dual-core Intel processors. HP does not currently manufacture servers with dual-core AMD processors, as these are no longer part of AMD's current processor product line. Both rack-mount and blade servers were tested. The HP ProLiant DL server line was used for rack-mount servers; the HP BladeSystem c-Class portfolio, with HP ProLiant BL blades, was used for blade servers.

Processor speed

Both Intel and AMD provide ranges of processors with similar architectures but different clock speeds. Rather than test every possible processor, a representative was chosen from each processor class: not the one with the highest clock speed, but one with a reasonable price and power utilization relative to its performance. For example, for testing the HP ProLiant BL460c, the 3.0 GHz Intel Xeon 5450 processor was chosen over the 3.16 GHz Intel Xeon 5460 processor, since the processor performance difference was not significant relative to the differences in price and power consumption.

Network

The Nexus reservoir simulator is a parallel application that uses HP-MPI to communicate between processes on multiple servers. Parallel runs of the Nexus simulator require the input model data to be decomposed into sub-domains. The Nexus model chosen for this testing had 64 sub-domains, which allowed performance to be measured using up to 64 cores. Most of the clusters we evaluated had 32 cores, meaning we used eight servers when testing dual-core processors and four servers when testing quad-core processors. We performed some runs on clusters with 64 cores; these clusters had eight servers, each with two quad-core processors. Gigabit Ethernet, InfiniBand (IB) DDR (double data rate) ConnectX, and 10 Gigabit Ethernet were tested and compared.
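As a concrete illustration of the decomposition arithmetic above, the short sketch below (illustrative only; the function and the values it prints are not part of the benchmark itself) maps a parallel process count to the number of fully loaded servers required for the 2p4c and 2p8c configurations.

```python
# Illustrative sketch (not Nexus or HP-MPI code): how many fully loaded servers
# a parallel run would need, given the per-node core count and the
# 64-sub-domain decomposition of the spe10_64grids model used in this study.
import math

MAX_SUBDOMAINS = 64  # the test model was decomposed into 64 sub-domains

def servers_needed(processes: int, cores_per_node: int) -> int:
    """Number of fully loaded nodes required for a run with `processes` ranks."""
    if processes > MAX_SUBDOMAINS:
        raise ValueError("the model cannot use more processes than sub-domains")
    return math.ceil(processes / cores_per_node)

# 2p4c = two dual-core processors (4 cores/node); 2p8c = two quad-core (8 cores/node)
for label, cores in (("2p4c", 4), ("2p8c", 8)):
    print(label, {n: servers_needed(n, cores) for n in (4, 8, 16, 32, 64)})
# 2p4c -> {4: 1, 8: 2, 16: 4, 32: 8, 64: 16}; 2p8c -> {4: 1, 8: 1, 16: 2, 32: 4, 64: 8}
```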
Memory

Tested configurations had 2 GB/core, which was suitable for these tests. Most Nexus data sets will be able to run on a similarly configured cluster with no swapping to disk. The servers with dual-core processors were configured with eight 1 GB memory DIMMs; those with quad-core processors used eight 2 GB memory DIMMs. All used 667 MHz DIMMs (DDR2 PC2-5300).

File system

Since I/O generally is not a determining factor in the performance of Nexus simulation runs, two striped 15K-RPM disks attached locally on each server were used to store all files used and generated by the tests.

Software

Nexus R5000.1 and HP-MPI were used. Landmark Nexus software enables fully implicit, fully coupled surface-to-subsurface simulation for a comprehensive look at an oil reservoir. The Nexus reservoir simulator couples the surface network model with the subsurface model in a way that allows a simultaneous solution of the two models, which is computed both faster and more accurately than a loosely coupled solution. Furthermore, multiple reservoirs can be modeled simultaneously with a shared surface network, which is necessary to accurately model many of today's off-shore oil developments. HP-MPI is the only message-passing interface supported by Landmark for the R5000 release of Nexus software and VIP software, and is provided for both Linux and Windows operating systems.

Specific test configurations

Servers with dual-core processors are annotated as 2p4c, signifying two processors with a total of four cores. Servers with quad-core processors are marked as 2p8c. HP BladeSystem server names begin with BL, while rack-mount model names begin with DL.

HP ProLiant model                Processor model (code name)      Processor speed   Server config   Front-side bus speed (MHz)
DL160 G5                         Intel Xeon 5272 ("Wolfdale")     3.4 GHz           2p4c            1,600
BL460c G1, DL360 G5 or DL380 G5  Intel Xeon 5160 ("Woodcrest")    3.0 GHz           2p4c            1,333
BL465c G5 and DL165 G5           AMD Opteron 2384 ("Shanghai")    2.7 GHz           2p8c            n/a
BL465c G5 and DL165 G5           AMD Opteron 2356 ("Barcelona")   2.3 GHz           2p8c            n/a
DL160 G5                         Intel Xeon 5472 ("Harpertown")   3.0 GHz           2p8c            1,600
BL460c G5                        Intel Xeon 5450 ("Harpertown")   3.0 GHz           2p8c            1,333

Landmark Data Set

The spe10_64grids data set was used for these benchmarks. This data is derived from Model 2 of the Tenth Society of Petroleum Engineers (SPE) Comparative Solution Project, in which a waterflood of a large geostatistical model was modeled. The reservoir modeled had the characteristics of a Brent sequence, in which the upper 70 feet represent the Tarbert formation and the lower 100 feet represent the Upper Ness formation. The model had slightly more than 1.1 million cells, of which 766,000 were active. The grid was decomposed into 64 subgrids, which allowed the model to be run in parallel using up to 64 cores. The model was run to simulate 2,000 days of production from four wells, with a fifth well as a water injector; all five wells were vertical. Conditions in the reservoir were maintained such that no free gas was present in the reservoir throughout the entire run.

Performance Summary

People involved in purchasing decisions for HPC systems often use different criteria to determine the best solution for their needs. Some find that the solution with the shortest runtime is best. Others prefer a solution in which the ratio of run speed to system cost is maximized. Still others are most concerned with parallel scalability. Since these are all valid ways to characterize the performance of a solution, all will be presented below, first in summary, then in detail.
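Each of these criteria reduces to a simple calculation over measured elapsed times and cluster list prices. The sketch below is a minimal illustration with made-up numbers, assuming one common definition of the price-performance ratio (price divided by relative performance, lower is better); it is not data from this study, and it is not necessarily the exact formula used for the figures later in this paper.

```python
# Illustrative only: the three evaluation criteria described above, computed
# from elapsed times and cluster list prices. All numbers here are made-up
# placeholders, not measurements from this paper.

runs = {
    # cluster label: (elapsed seconds for a 32-way run, U.S. list price in dollars)
    "cluster_a": (1200.0, 80_000.0),
    "cluster_b": (1500.0, 60_000.0),
}
baseline = "cluster_b"
t_base = runs[baseline][0]

for name, (elapsed, price) in runs.items():
    relative_perf = t_base / elapsed        # absolute performance, bigger is better
    price_perf = price / relative_perf      # price-performance, lower is better
    print(f"{name}: relative performance {relative_perf:.2f}, "
          f"price-performance {price_perf:,.0f}")

# Parallel scalability: speed-up of a multi-server run over a single server.
t_one_server, t_eight_servers = 9000.0, 1400.0   # hypothetical elapsed times
print(f"scaling on 8 servers: {t_one_server / t_eight_servers:.1f}x")
```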
As newer servers become available, this report will be updated, but these fundamental methods of looking at the data will always apply.

Nexus software: Breakthrough performance

The Nexus simulator was designed with performance and parallel scalability as important features. This differentiates it from most commercially available reservoir simulators, in which performance was often sacrificed as features were added, or in which parallel capabilities were added long after the original commercial release. This focus on performance has led Nexus software to be faster than other simulators by a wide margin; for some models, Nexus simulation runs require less than one-fifth the time of other simulators. Nexus models with a single reservoir and simple wells with a minimal surface network generally do not show as much performance improvement as models with multiple reservoirs and more complex surface networks. However, even for simpler cases, such as the benchmark data chosen for this study, Nexus simulation software can provide a significant performance benefit. This can be seen in Figure 1, which shows Nexus software to be more than three times faster on this benchmark model than Landmark's VIP reservoir simulator, for the tested range of process counts.

Figure 1: Comparison of Landmark's two reservoir simulators at various process counts. (Chart: Nexus software vs. VIP relative performance for 4-, 8-, 16- and 32-way parallel runs, IB DDR, HP-MPI, spe10_64grids; relative performance, bigger is better for Nexus, plotted against number of cores.)

Absolute performance comparison

Looking at absolute performance is useful to those requiring the best throughput. Of the servers using Xeon processors, the HP ProLiant DL160 G5 with dual-core Xeon 5272 processors performed best. Of the servers using AMD Opteron processors, the ProLiant BL465c G5 and DL165 G5 Opteron 2384 solution performed best.

Price-performance ratio

The price-performance ratio shows which solution delivers the best performance when scaled by the U.S. list price of the server cluster. This analysis does not include the cost of the software licenses, so it exaggerates the importance of the price difference between clusters. The BL465c G5 and DL165 G5 with AMD Opteron 2356 processors using Gigabit Ethernet (GigE) had the best price-performance ratio, followed by the BL465c G5 and DL165 G5 with AMD Opteron 2384 ("Shanghai") processors using both IB DDR and GigE.

Scaling and network

Here we compare the relative performance of multiple servers to a single server, to see the benefit of using the cluster for large parallel runs. Scalability is most important when the users' workflow gives them the option of running multiple serial jobs, a smaller number of parallel jobs, or a mix. Better scalability improves the relative efficiency of including a higher fraction of parallel jobs in the workflow. High-speed interconnects such as InfiniBand are recommended for optimal scalability when running Nexus software. The benefit of a high-speed interconnect can be seen in the comparisons between GigE, InfiniBand, and 10 GbE.

Benchmark Results and Discussion

General overview

There is no performance difference between blade systems and rack-mount servers if the internal components are identical. There are, however, some HP blade products that do not have equivalent rack-mount servers, and the converse is true as well. For example, the ProLiant DL160 G5 rack-mount servers with Xeon 5472 and 5272 processors have front-side bus speeds of 1,600 MHz, which are not offered in the blade BL460c counterpart. When an application is demanding on the memory subsystem, as Nexus software is, there is a benefit to the faster front-side bus.
This can be seen when comparing the results of the Xeon 5472 processor to the Xeon 5450 processor, both of which are 3.0 GHz quad-core processors. The front-side bus of the blade system (1,333 MHz) is slower than that of the rack-mount server (1,600 MHz), and although the processor clock and the other components are identical, the server with the faster front-side bus performs much better on the Nexus simulator benchmark run.

Multi-core processor suitability for Nexus simulation

The benchmark runs presented here have been performed with all cores on the blades or servers in use, except for cases where the number of Nexus processes is less than the number of cores available in a single node. Runs using all cores in a node, however, must also share access to memory across the cores. All the cores in a single processor share the same path to memory; a quad-core processor, having more cores, gives each core a smaller fraction of that path than a dual-core processor does. Thus, when all cores are in use, the older quad-core Harpertown processor had poorer performance for Nexus software than two Wolfdale dual-core processors with the same processor speed. Because Nexus simulator performance is so dependent upon access to memory, as noted above, servers with the highest memory bandwidth per core perform best.

One way to increase per-core performance on a server built with quad-core processors is to reduce the load on the server by using only half the cores in the system, effectively treating each quad-core processor as though it were dual core. Timing results from runs done in this manner are shown in Figure 2. The left chart shows the relative performance of runs using a fixed number of servers, either half loaded or fully loaded; the number of cores used in the fully loaded case is double the number used in the half-loaded case. Under these conditions, the runs with more cores are always faster, but doubling the number of cores used results in only minimal performance gain. The right chart shows the relative performance of two sets of runs using the same number of active cores. For one set of runs, the minimal number of servers was used, with all cores of each processor in use, while, for the other, each processor had only half its cores in use, which requires doubling the number of servers. The set of runs using more servers but only partially loading them had much better performance than the set of runs using fully loaded servers. The benefit of running with only half the cores of a server will vary, depending on the server's architecture. The results shown are for the DL160 G5 Xeon 5472 "Harpertown"; similar tests on the BL465c Opteron 2384 "Shanghai" saw less improvement between the two sets of runs: the runs with twice the number of half-loaded servers were only modestly faster than the runs using fully loaded servers.

Figure 2: Comparison of half-loaded processors to fully loaded processors on the DL160 G5 Xeon 5472 (one, two, four and eight servers, IB DDR ConnectX, HP-MPI, spe10_64grids; 2 cores per processor half-loaded vs. 4 cores per processor fully loaded). The time to run a job is always reduced by adding more cores (left chart), but there is a benefit to spreading those cores across additional servers if they are available (right chart).

Absolute performance results

Figures 3 and 4 show the absolute performance of the servers tested, relative to the BL460c Xeon 5450 cluster. For clusters using InfiniBand (specifically IB DDR ConnectX), at all tested process counts, the best-performing server node is the DL160 G5 (or BL460c G5) with 3.4 GHz Xeon 5272 dual-core processors. When Gigabit Ethernet is the private network for MPI communications, this same server node is the fastest for all tested cases except for runs with eight parallel processes. The reason for this exception is that an eight-way parallel job fits in a single node for all the other systems tested, but not for this cluster with dual-core processors. The cluster interconnect is thus being used in the eight-way run only on the cluster built from dual-core processors.
The off-node communications take relatively more time for this cluster than intra-node communications do on the other clusters, which decreases the relative performance of this cluster. Note that for higher process counts, where all clusters are using multiple nodes, the cluster built from dual-core processors again leads in performance.

Figure 3: Comparison of HP server relative performance using IB, with the BL460c Xeon 5450 as a baseline. (Chart: relative performance for 8-, 16- and 32-way parallel runs, IB DDR ConnectX, HP-MPI, spe10_64grids, plotted against number of cores.)

Figure 4: Comparison of HP server relative performance using Gigabit Ethernet, with the BL460c Xeon 5450 as a baseline. (Chart: relative performance for 8-, 16- and 32-way parallel runs, Gigabit Ethernet, HP-MPI, spe10_64grids, plotted against number of cores.)

Price-performance ratio

The price data used for this calculation is the U.S. list price of the cluster hardware as of November. The price for blade configurations includes the entire enclosure, while, for rack-mount servers, it includes one 42U rack. Red Hat Linux with 9x5 one-year technical support is also included in the price. This analysis does not include the cost of Nexus software licenses, which leads to more weight being given to hardware pricing differences than an actual user would see. The head node for Xeon-based clusters is a ProLiant DL380 G5 with Xeon 5450 processors, and for Opteron-based clusters it is a DL385 G5 with AMD Opteron 2356 processors, regardless of the configuration used in the compute nodes. The clusters with the best price-performance are built from DL165 G5 server nodes using either Opteron 2384 or 2356 processors, and use a Gigabit Ethernet interconnect for MPI traffic. The DL165 G5 Opteron 2356 is used as the baseline in Figure 5, with other servers shown relative to it.

Figure 5: Price-performance ratio of a range of 32-core clusters of servers (price-performance, lower is better; 32-way parallel, HP-MPI, spe10_64grids).

Scaling

In a multi-user environment where the users have jobs of widely varying size and duration, the ability to add processors to a job in an efficient manner allows a fixed hardware resource to be used effectively. Large jobs can be submitted in parallel to work around memory limitations; additionally, jobs of long duration can be run more quickly in parallel. If the overall solution (hardware, software, and data) does not scale well, the overall effectiveness of the solution will be limited when large or long-duration jobs must be run. When reviewing scalability information, the absolute performance results shown in Figures 3 and 4 must also be considered; a high-performing server may scale less effectively than a lower-performing server, and thereby be less effective as part of a cluster.

Figure 6: Comparison using high-speed interconnects (IB DDR ConnectX and 10 GbE), showing scaling from one to eight servers, where all cores in each server are used. (Chart: scaling relative to one server for runs up to 32 or 64 cores, HP-MPI, spe10_64grids, plotted against number of servers.)
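The scaling factor plotted in Figures 6 and 7 is simply the elapsed time on one server divided by the elapsed time on n servers for the same model. The sketch below shows that calculation, along with the implied parallel efficiency, using hypothetical runtimes rather than measured results from this study.

```python
# Illustrative only: the "scaling relative to 1 server" metric shown in
# Figures 6 and 7, computed from hypothetical elapsed times (not measured data).

elapsed_by_servers = {1: 8000.0, 2: 4300.0, 4: 2500.0, 8: 1400.0}  # seconds

t_one = elapsed_by_servers[1]
for servers, elapsed in sorted(elapsed_by_servers.items()):
    scaling = t_one / elapsed            # speed-up relative to a single server
    efficiency = scaling / servers       # 1.0 would be perfect linear scaling
    print(f"{servers} server(s): scaling {scaling:.2f}x, efficiency {efficiency:.0%}")
```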
Figure 7: Comparison using a Gigabit Ethernet interconnect, showing scaling from one to eight servers, where all cores in each server are used. (Chart: scaling relative to one server for runs up to 32 or 64 cores, HP-MPI, spe10_64grids, plotted against number of servers.)

Figure 8: Relative performance of high-speed interconnects to Gigabit Ethernet, showing the performance advantage of high-speed interconnects. (Chart: relative performance to Gigabit Ethernet, bigger is better for IB; IB DDR ConnectX except where noted, HP-MPI, spe10_64grids, plotted against number of cores.)

Scalability is a useful way to determine the benefit of running on a cluster. In Figures 6 and 7 we compare the performance of a single server with that of a cluster of multiple servers to see the benefit of the cluster for runs of up to 32 processes (in some cases, up to 64 processes). In all cases, the servers are fully loaded. When using servers with dual-core processors, we compare a single server to two, four, and eight servers; when using servers with quad-core processors, we compare the single server to two and four servers. This provides comparisons of 32-process runs in all cases. Figure 6 shows results for high-speed interconnects (IB DDR ConnectX and 10GbE), while Figure 7 shows results for Gigabit Ethernet. Note that most of the clusters configured with high-speed interconnects reduce the runtime of the model by a factor of more than 3 when four servers are used rather than only one, and some reduce the runtime by a factor approaching 7 when eight servers are used. When Gigabit Ethernet is used instead, most clusters of four servers cannot achieve a speed-up of 3, and none of the eight-node clusters can provide a speed-up of 5.

Network: High-speed interconnects vs. Gigabit Ethernet

The equations used by the Nexus simulator to describe the physical behavior of the reservoir being modeled have a strong dependence on the pressure field throughout the model, because pressure differences are usually the major driver of the flow of fluids in the reservoir. Changes in pressure in one area of the reservoir affect all areas of the reservoir. During a parallel simulation, reservoir pressure data in each subdomain must be communicated to other subdomains. This places a significant load on the private network connecting the cluster nodes. Thus, the performance of the interconnect network can play a significant role in determining the overall performance of parallel Nexus simulation runs.

The server systems tested for this paper, whether blades or rack-mount systems, have two built-in Gigabit Ethernet ports, one of which can be used for the private network. Optional add-on hardware can be used to provide a network with lower latency and higher bandwidth than the built-in Gigabit Ethernet. The two options tested were InfiniBand (specifically, DDR ConnectX IB) and 10 Gigabit Ethernet (10GbE). Both showed significant improvements in Nexus parallel performance when compared to the built-in Gigabit Ethernet. Figure 8 shows the performance of the high-speed interconnects relative to the built-in Gigabit Ethernet. The 10 Gigabit Ethernet was available only on one cluster, which was built from servers that used dual-core Xeon 5160 (3.0 GHz) processors. Note that for the clusters built from servers using quad-core processors, the 32-core run requires only four servers; even at this small server count, the high-speed interconnects can provide a benefit as large as 50 percent. The 10 Gigabit Ethernet performance is comparable to, but slightly slower than, the InfiniBand performance in the one case where a direct comparison can be made.

Figure 9: Comparison of IB, 10 Gigabit Ethernet, and 1 Gigabit Ethernet for a specific server type. (Chart: scaling relative to one server for runs up to 32 cores, HP-MPI, spe10_64grids; DL380 G5 2p4c with IB DDR Ex, BL460c G1 2p4c with Chelsio and Blade 10GbE uDAPL, and DL380 G5 2p4c with GigE, plotted against number of servers.)

Looking at the three interconnect types in isolation, as shown in Figure 9, it is evident that both high-speed interconnects provide a significant speed-up over Gigabit Ethernet, with InfiniBand slightly faster than 10 Gigabit Ethernet. Because the two high-speed interconnects perform in a similar manner for Nexus software, non-performance characteristics may be the deciding factor for a customer choosing the most appropriate technology for a site. Selecting a high-speed interconnect that already exists elsewhere in your facility is a reasonable choice, if there is one. If not, select based on price, and be sure to include cable costs in the analysis.
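To make the communication pattern described above more concrete, the following sketch shows a generic boundary ("halo") exchange of pressure values between neighboring sub-domains, written with mpi4py purely for illustration. It is not Nexus or HP-MPI code, and the array sizes and names are arbitrary; it simply shows the kind of per-timestep message traffic whose cost is set by the latency and bandwidth of the private network.

```python
# Generic illustration (not Nexus code) of the per-timestep boundary exchange a
# domain-decomposed reservoir simulator performs: each rank owns one sub-domain
# and swaps its boundary pressure values with its neighbours. The interconnect's
# latency and bandwidth determine the cost of this step.
# Requires mpi4py and an MPI library; run, for example: mpirun -np 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank holds a 1-D strip of cell pressures plus one ghost cell on each side.
ncells = 1000
pressure = np.full(ncells + 2, float(rank), dtype=np.float64)

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange boundary values with neighbouring sub-domains (deadlock-free Sendrecv).
comm.Sendrecv(pressure[1:2], dest=left, recvbuf=pressure[-1:], source=right)
comm.Sendrecv(pressure[-2:-1], dest=right, recvbuf=pressure[0:1], source=left)

print(f"rank {rank}: ghost cells = {pressure[0]}, {pressure[-1]}")
```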
For more information

blades/components/c-class-components.html
blades/components/ethernet/10gb-bl-c/index.html

Further assistance

Contact Michael Mott or Ken Glass for assistance with hardware specifications, such as power and cooling information and/or custom benchmarks. Contact John Davidson with Nexus software requests.

Server and pricing info

The online HP Product Bulletin can be used to see which processors are available for each server. The QuickSpecs, search tools, and a rudimentary mechanism for list price lookup are all available there.

Nexus and VIP are trademarks of Halliburton. Intel and Xeon are trademarks of Intel. AMD and Opteron are trademarks of AMD. HP BladeSystem and ProLiant are trademarks of HP. All other marks are marks of their respective owners.

10 Gb Ethernet Switches and Adapters

The following 10 Gb Ethernet products were used to achieve the performance results of the Landmark Nexus application on the HP platform:

HP c-Class BladeSystem
- One HP 10 Gb Ethernet BL-c Switch. This 10 Gb HP switch is designed to deliver high-performance 10 Gb networking throughput at a breakthrough price. With 200 Gb (400 Gb full duplex) of aggregate switching capacity, it allows full utilization of the HP BladeSystem multi-core capabilities. For more information, see hpc/oil-gas.html.
- Eight Chelsio S320EM HP BladeSystem 10 Gb Ethernet adapters. Chelsio's S320 family of 10 GbE adapters, using Unified Wire technology, delivers high throughput with low latency. Interprocess communication and storage networking traffic can be handled concurrently on a single S320E-class adapter.

HP ProLiant DL server cluster
- One RackSwitch G8100. The RackSwitch G8100 from BLADE Network Technologies is a top-of-rack switch that offers 480 Gb of network throughput, optimal for high-performance computing applications requiring the highest bandwidth and lowest latency. With flexible airflow options and variable-speed fans, it allows for variable mounting options in the server rack. For more information, see oil-gas.html.
- Eight Chelsio S320E 10 Gb Ethernet adapters, with the same characteristics as the S320EM adapters described above.
(c) 2009 Halliburton. All rights reserved. Sales of Halliburton products and services will be in accord solely with the terms and conditions contained in the contract between Halliburton and the customer that is applicable to the sale.
