Revolutionising High Performance Computing with Supermicro Solutions Using NVIDIA Tesla
- Collin Allen
NVIDIA Tesla GPUs are Revolutionizing Computing

The high performance computing (HPC) industry's need for computation is increasing as large and complex computational problems become commonplace across many industry segments. Traditional CPU technology, however, is no longer capable of scaling in performance sufficiently to address this demand. The parallel processing capability of the GPU allows it to divide complex computing tasks into thousands of smaller tasks that can be run concurrently. This ability is enabling computational scientists and researchers to address some of the world's most challenging computational problems up to several orders of magnitude faster. Conventional CPU computing architecture can no longer support the growing HPC needs.

[Figure: Performance vs. VAX over time, showing CPU performance growth per year slowing while GPU performance continues to scale, yielding a 100x performance advantage. Source: Hennessy & Patterson, Computer Architecture: A Quantitative Approach (CAAQA), 4th Edition.]
Latest GPU SuperServer, SuperBlade and NVIDIA Maximus Certified SuperWorkstation Solutions from Supermicro

Super Micro Computer, Inc. (NASDAQ: SMCI), a global leader in high-performance, high-efficiency server technology and green computing, will showcase its latest graphics processing unit (GPU) enabled X9 server and workstation solutions at the NVIDIA GPU Technology Conference (GTC) in May in San Jose, CA. Supermicro's GPU solutions support Intel Xeon E family processors and feature greater memory capacity (up to 256GB for servers and 512GB in workstations), higher performance I/O and connectivity with PCI-E 3.0, 10GbE and 4x QDR (40Gb) InfiniBand support (GPU SuperBlade), as well as innovative, energy-efficient power saving technologies. Supermicro X9 solutions also feature the highest density GPU computing available today. The non-blocking architecture supports 4 GPUs per 1U in a standard, short depth 32" rack chassis, and the SuperBlade can fit 30 GPUs in 7U, another industry first from Supermicro. Combined with the latest GPUs based on the NVIDIA Kepler architecture, the X9 platform offers industry professionals one of the most powerful, accelerated and green computing solutions available on the market.

"Supermicro is transforming the high performance computing landscape with our advanced, high-density GPU server and workstation platforms," said Charles Liang, President and CEO of Supermicro. "At GTC, we are showcasing our new generation X9 SuperServer, SuperBlade™ and latest NVIDIA Maximus certified SuperWorkstation systems, which deliver groundbreaking performance, reliability, scalability and efficiency. Our expanding lines of GPU-based computing solutions empower scientists, engineers, designers and many other professionals with the most cost-effective access to supercomputing performance."
Hybrid Computing: Accelerating Research, Scientific, Engineering, Computational Finance and Design Applications

- 1U SuperServer 1027GR-TQF: up to 4x GPUs
- 2U SuperServer 2027GR-TRF: up to 6x GPUs
- 7U SuperBlade SBI-7127RG: up to 20x GPUs
- SuperWorkstation 7047GR-TRF: 4x NVIDIA Tesla GPUs + 1x NVIDIA Quadro GPU

These select GPU enabled servers and workstations are a sampling of Supermicro's vast array of GPU solutions. Visit us at the San Jose McEnery Convention Center in May at GTC Booth #75 to see Supermicro's latest GPU products. For a complete look at Supermicro's extensive line of high performance, high efficiency GPU solutions, and to keep up with Supermicro's evolving line of NVIDIA Maximus powered SuperWorkstations, visit Supermicro online.
NVIDIA TESLA™ / Hybrid Computing Solutions 2010

SuperWorkstations

Server Grade Performance for Multimedia, Engineering, and Scientific Applications. Performance. Efficiency. Expandability. Reliability.

Optimized Solutions: Video Editing, Digital Content Creation, MCAD, CAE, Financial, SW Development, Scientific, Oil and Gas

- Whisper-quiet 21dB acoustics models
- Up to redundant Platinum Level (95%+) digital power supplies
- Best value for your IT investment
- Up to 512 GB DDR3 memory and 8 hot-swap HDDs
- Up to 5 GPUs supported with PCI-E 3.0
- Server-grade design for 24x7 operation

The 7047GR-TRF is Supermicro's latest high-end, enterprise-class X9 SuperWorkstation, with NVIDIA Maximus certification. This system accelerates design and visualization tasks with an NVIDIA Quadro GPU, while providing dedicated processing power for simultaneous compute-intensive tasks such as simulation and rendering with up to four NVIDIA Tesla C2075 GPUs. The upcoming 7047GR-TPRF SuperWorkstation supports passively cooled GPUs, making it ideal for high performance trading (HPT) applications. X9 systems feature dual Intel Xeon E family processors, maximized memory and non-blocking native PCI-E 3.0 configurations, along with redundant Platinum level high-efficiency (94%+) power supplies.

7047GR-TRF
NVIDIA Maximus certified, supporting NVIDIA Quadro and up to 4 NVIDIA Tesla GPUs for real-time 3D design, visualization, simulation and accelerated rendering. GPU server for mission-critical applications, enterprise server, large database, e-business, on-line transaction processing, oil & gas, medical applications.
- Dual Intel Xeon processor E family; Socket R (LGA 2011)
- 8 hot-swap 3.5" SATA HDD bays, 3x 5.25" peripheral drive bays, 1x 3.5" fixed drive bay
- 16 DIMMs support up to 512GB DDR3 1600MHz reg. ECC memory
- 4 (x16) PCI-E 3.0 (support 4 double-width GPU cards), 2 (x8) PCI-E 3.0 (1 in x16), and 1 (x4) PCI-E 2.0 (in x8) slots
- I/O ports: 2 GbE, 1 Video, 1 COM/Serial, 9 USB 2.0
- System management: built-in server management tool (IPMI 2.0, KVM/media over LAN) with dedicated LAN port
- 4 hot-swap PWM cooling fans and 2 hot-swap rear fans
- 1620W Redundant Platinum Level Power Supplies
High Performance GPU Servers

The Foundation of a Flexible and Efficient HPC & Data Center Deployment

X9 SuperServers provide a wide range of configurations targeting high performance computing (HPC) applications. Systems include the 1027GR-TQF, offering up to 4 double-width GPUs in 1U for maximum compute density in a compact 32" short depth, standard rack mount format. The 2U 2027GR-TRF supports up to 6 GPUs and is ideal for scalable, high performance computing clusters in scientific research fields, with a 2027GR-TRFT model available supporting dual-port 10GBase-T for increased bandwidth and reduced latency. The GPU SuperBlade SBI-7127RG packs the industry's highest compute density of 30 GPUs in 7U, delivering ultimate processing performance for applications such as oil and gas exploration.

2027GR-TRFT / 2027GR-TRF (-FM475/-FM409)
GPU server for mission-critical applications, enterprise server, large database, e-business, on-line transaction processing, oil & gas, medical applications.
- Dual Intel Xeon processor E family; Socket R (LGA 2011)
- 10 hot-swap 2.5" SATA HDD bays (4 SATA2, 6 SATA3)
- 8 DIMMs support up to 256GB DDR3 1600MHz reg. ECC memory
- 4 (x16) PCI-E 3.0 (support 4 double-width GPU cards), 1 (x8) PCI-E 3.0 (in x16), and 1 (x4) PCI-E 2.0 (in x16) slots
- I/O ports: 2 GbE/10GBase-T (TRFT), 1 Video, 1 COM/Serial, 2 USB
- Heavy duty fans w/ optimal fan speed control
- 1800W Redundant Platinum Level Power Supplies
- -FM475/409: 4x NVIDIA M2075/M2090 GPU cards integrated

Brains, Brains and More Brains!!!

Professor Wu Feng of Virginia Tech's Synergy Laboratory is the main brain behind VT's latest and greatest supercomputer, HokieSpeed. HokieSpeed is a Supermicro supercluster of SuperServer 2026GT-TRFs with thousands of CPU/GPU cores, or brains. In Nov. 2011, HokieSpeed ranked 11th for energy efficiency on the Green500 List and 96th on the world's fastest supercomputing Top500 List. Dr. Feng's Supermicro powered HokieSpeed boasts a single-precision peak of 455 teraflops, or 455 trillion operations per second, and a double-precision peak of 240 teraflops, or 240 trillion operations per second.

Watch video*: SuperServers in action: Virginia Tech HokieSpeed Supercomputer
* Use your phone, smartphone or tablet PC with QR reader software to read the QR code.
SuperServers: High Performance GPU Servers

1017GR-TF (-FM275/-FM209)
GPU server for mission-critical applications, enterprise server, oil & gas, financial, 3D rendering, chemistry, HPC.
- Single Intel Xeon processor E family; Socket R (LGA 2011)
- 6 hot-swap 2.5" SATA HDD bays
- 8 DIMMs support up to 256GB DDR3 1600MHz reg. ECC memory
- 2 (x16) PCI-E 3.0 (for 2 GPU cards) and 1 (x8) PCI-E 3.0 slot
- I/O ports: 2 GbE, 1 Video, 2 USB
- Counter-rotating fans w/ optimal fan speed control
- 1400W Platinum Level Power Supply w/ Digital Switching
- -FM175/109: 1x NVIDIA M2075/M2090 GPU card integrated
- -FM275/209: 2x NVIDIA M2075/M2090 GPU cards integrated

5017GR-TF (-FM275/-FM209)
GPU server for mission-critical applications, enterprise server, oil & gas, financial, 3D rendering, chemistry, HPC.
- Single Intel Xeon processor E family; Socket R (LGA 2011)
- 3 hot-swap 3.5" SATA HDD bays
- 8 DIMMs support up to 256GB DDR3 1600MHz reg. ECC memory
- 2 (x16) PCI-E 3.0 (for 2 GPU cards) and 1 (x8) PCI-E 3.0 (for IB card)
- I/O ports: 2 GbE, 1 Video, 2 USB
- Counter-rotating fans w/ optimal fan speed control
- 1400W Platinum Level Power Supply w/ Digital Switching
- -FM175/109: 1x NVIDIA M2075/M2090 GPU card integrated
- -FM275/209: 2x NVIDIA M2075/M2090 GPU cards integrated

1027GR-TRFT / 1027GR-TRF (-FM375/-FM309) / 1027GR-TSF
GPU server for mission-critical applications, enterprise server, oil & gas, financial, 3D rendering, chemistry, HPC.
- Dual Intel Xeon processor E family; Socket R (LGA 2011)
- 4 hot-swap 2.5" SATA2/3 HDD bays
- 8 DIMMs support up to 256GB DDR3 1600MHz reg. ECC memory
- 3 (x16) PCI-E 3.0 and 1 (x8) PCI-E 3.0 (in x16) slots
- I/O ports: 2 GbE/10GBase-T (TRFT), 1 Video, 1 COM/Serial, 2 USB
- Heavy duty fans w/ optimal fan speed control
- 1800W Redundant Platinum Level Power Supplies (TRF); 1800W Platinum Level Power Supply (TSF)
- -FM375/309: 3x NVIDIA M2075/M2090 GPU cards integrated
GPU SuperBlade Solutions

Supermicro offers GPU blade solutions optimized for HPC applications, low-noise blade solutions for offices and SMB, as well as personal supercomputing applications. With acoustically optimized thermal and cooling technologies, it achieves < 50dB with 10 DP server blades and features Platinum Level high-efficiency (94%+), N+1 redundant power supplies.

Space Optimization
When housed within a 19" EIA-310D industry-standard 42U rack, SuperBlade servers reduce server footprint in the datacenter. Power, cooling and networking devices are removed from each individual server and positioned to the rear of the chassis, thereby reducing the required amount of space while increasing flexibility to meet changing business demands. Up to twenty DP blade nodes can be installed in a 7U chassis. Compared to the rack space required by twenty individual 1U servers, the SuperBlade provides over 65% space savings.

SBI-7127RG: Up to 120 GPUs + CPUs per 42U rack!
- Up to 2 Tesla M2090/M2075/M2070Q/M2070/2050 GPUs
- Up to 2 Intel Xeon E series processors
- Up to 256GB DDR3 1600/1333/1066 ECC DIMM
- 1 internal SATA Disk-On-Module, 1 USB flash drive
- 2 PCI-E 2.0 x16 full-height, full-length expansion slots
- Onboard BMC for IPMI 2.0 support: KVM over IP, remote Virtual Media
- 4x QDR (40Gb) InfiniBand or 10Gb Ethernet supported via optional mezzanine card
- Dual-port Gigabit Ethernet NIC

SBI-7126TG: Up to 20 GPUs + 20 CPUs per 7U!
- Up to 2 Tesla M2090/M2070Q/M2070/2050 GPUs
- Up to 2 Intel Xeon 5600/5500 series processors
- Up to 96GB DDR3 1333/1066 ECC DIMM
- 1 internal SATA Disk-On-Module, 1 USB flash drive
- 2 PCI-E 2.0 x16 or 2 PCI-E 2.0 x8 slots (full-height, maximum length 9.75")
- Onboard BMC for IPMI 2.0 support: KVM over IP, remote Virtual Media
- Dual IOH per blade
- Dual 40Gb InfiniBand or 10Gb Ethernet supported via optional mezzanine card
- Dual-port Gigabit Ethernet NIC
- Redundant GBX connectors
Turnkey Preconfigured GPU Clusters: Ready-to-Use Solutions for Science and Research

Access the power of GPU computing with an off-the-shelf, ready to deploy Tesla GPU cluster. These preconfigured solutions from Supermicro and NVIDIA provide a powerful tool for researchers and scientists to advance their science through faster simulations. GPU Test Drive is the easiest way to start using GPUs, which offer supercomputing-scale HPC performance at substantially lower cost and power. Experience a significant performance increase in a wide range of applications from various scientific domains. Supercharge your research with a preconfigured Tesla GPU cluster today!

SRS-14URKS-GPUS-11 (10 TFlops)
- GPU Nodes: 4
- CPU: 2x Westmere X GHz
- Memory: 24 GB per node
- Default GPU/Node: 2x M2090
- Network Cards/Node: 1x InfiniBand
- Storage/Node: 2x 500GB HDD (1TB/Node)

SRS-14URKS-GPUS-12 (20 TFlops)
- GPU Nodes: Head node
- CPU: 2x Westmere X GHz
- Memory: 48 GB per node
- Default GPU/Node: 2x M2090
- Network Cards/Node: 1x InfiniBand
- Storage/Node: 2x 500GB HDD (1TB/Node)

SRS-42URKS-GPUS-13 (42 TFlops)
- GPU Nodes: Head node
- CPU: 2x Westmere X GHz
- Memory: 48 GB per node
- Default GPU/Node: 2x M2090
- Network Cards/Node: 1x InfiniBand
- Storage/Node: 2x 1TB HDD (2TB/Node)
Supermicro is Committed to Supporting Tesla Kepler GPU Accelerators¹

Tesla K10 GPU Computing Accelerator
Optimized for single precision applications, the Tesla K10 is a throughput monster based on the ultra-efficient Kepler architecture. The accelerator board features two Kepler GPUs and delivers up to 2x the performance for single precision applications compared to the previous generation Fermi-based Tesla M2090 in the same power envelope. With an aggregate performance of 4.58 teraflops peak single precision and 320 gigabytes per second of memory bandwidth for both GPUs put together, the Tesla K10 is optimized for computations in seismic, signal, image processing, and video analytics.

Tesla K20 GPU Computing Accelerator
Designed for double precision applications and the broader supercomputing market, the Tesla K20 delivers 3x the double precision performance compared to the previous generation Fermi-based Tesla M2090, in the same power envelope. The Tesla K20 features a single Kepler GPU that includes the Dynamic Parallelism and Hyper-Q features. With more than one teraflop peak double precision performance, the Tesla K20 is ideal for a wide range of high performance computing workloads including climate and weather modeling, CFD, CAE, computational physics, biochemistry simulations, and computational finance.
TECHNICAL SPECIFICATIONS

| Specification | Tesla K10² | Tesla K20 |
| Peak double precision floating point performance (board) | — | To be announced |
| Peak single precision floating point performance (board) | 4.58 teraflops | To be announced |
| Number of GPUs | 2x GK104 | 1x GK110 |
| CUDA cores | 2x 1536 | To be announced |
| Memory size per board (GDDR5) | 8 GB | To be announced |
| Memory bandwidth for board (ECC off)³ | 320 GBytes/sec | To be announced |
| GPU computing applications | Seismic, image, signal processing, video analytics | CFD, CAE, financial computing, computational chemistry and physics, data analytics, satellite imaging, weather modeling |
| Architecture features | SMX | SMX, Dynamic Parallelism, Hyper-Q |
| System | Servers only. Available May 2012 | Servers and workstations |

¹ Products and availability are subject to confirmation.
² Tesla K10 specifications are shown as an aggregate of two GPUs.
³ With ECC on, 12.5% of the GPU memory is used for ECC bits. For example, 6 GB total memory yields 5.25 GB of user available memory with ECC on.
NVIDIA Tesla Kepler GPU Computing Accelerators

Tesla Kepler: the world's fastest and most power-efficient accelerator for x86 systems.

The Fastest, Most Efficient HPC Architecture

With the launch of the Fermi GPU in 2009, NVIDIA ushered in a new era in the high performance computing (HPC) industry based on a hybrid computing model, where CPUs and GPUs work together to solve computationally intensive workloads. In just a couple of years, NVIDIA Fermi GPUs came to power some of the fastest supercomputers in the world as well as tens of thousands of research clusters globally. Now, with the new Tesla Kepler family, NVIDIA raises the bar for the HPC industry yet again.

Comprising 7.1 billion transistors, the Kepler GK110 GPU is an engineering marvel created to address the most daunting challenges in HPC. Kepler is designed from the ground up to maximize computational performance with superior power efficiency. The architecture has innovations that make hybrid computing dramatically easier, applicable to a broader set of applications, and more accessible. Tesla Kepler is a computational workhorse with teraflops of integer, single precision, and double precision performance and the highest memory bandwidth. The first GK110-based product will be the Tesla K20 GPU computing accelerator. What follows is a quick summary of the three most important features in Tesla Kepler: SMX, Dynamic Parallelism, and Hyper-Q. For further details on additional architectural features, please refer to the Kepler GK110 whitepaper.

[Figure 1: SMX: 192 CUDA cores, 32 Special Function Units (SFU), and 32 Load/Store units (LD/ST). Compared with the Fermi SM (32 cores plus control logic), the Kepler SMX delivers 3x performance per watt.]

SMX: Next Generation Streaming Multiprocessor

At the heart of Tesla Kepler is the new SMX unit, which comprises several architectural innovations that make it not only the most powerful Streaming Multiprocessor (SM) we've ever built, but also the most programmable and power-efficient.
Dynamic Parallelism: Creating Work On-the-Fly

One of the overarching goals in designing the Kepler GK110 architecture was to make it easier for developers to take advantage of the immense parallel processing capability of the GPU. To this end, the new Dynamic Parallelism feature enables Tesla Kepler to dynamically spawn new threads by adapting to the data, without going back to the host CPU. This effectively allows more of a program to be run directly on the GPU, as kernels now have the ability to independently launch additional workloads as needed. Any kernel can launch another kernel and can create the necessary streams, events, and dependencies needed to process additional work without host CPU interaction. This simplified programming model is easier to create, optimize, and maintain. It also creates a programmer-friendly environment by maintaining the same syntax for GPU-launched workloads as for traditional CPU kernel launches. Dynamic Parallelism broadens what applications can now accomplish with GPUs in various disciplines: applications can launch small and medium sized parallel workloads dynamically where it was previously too expensive to do so.
[Figure 2: Without Dynamic Parallelism, the CPU launches every kernel onto the GPU. With the new feature, Tesla Kepler can launch nested kernels, eliminating the need to communicate with the CPU.]

Hyper-Q: Maximizing the GPU Resources

Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and slashing CPU idle times. This feature increases the total number of connections between the host and the Tesla Kepler GPU by allowing 32 simultaneous, hardware-managed connections, compared to the single connection available with Fermi. Hyper-Q is a flexible solution that allows connections for both CUDA streams and Message Passing Interface (MPI) processes, or even threads from within a process. Existing applications that were previously limited by false dependencies can see up to a 32x performance increase without changing any existing code.

Hyper-Q offers significant benefits for use in MPI-based parallel computer systems. Legacy MPI-based algorithms were often created to run on multi-core CPU-based systems. Because the workload that could be efficiently handled by CPU-based systems is generally smaller than that available using GPUs, the amount of work passed in each MPI process is generally insufficient to fully occupy the GPU. While it has always been possible to issue multiple MPI processes to run concurrently on the GPU, these processes could become bottlenecked by false dependencies, forcing the GPU to operate below peak efficiency. Hyper-Q removes the false dependency bottlenecks and dramatically increases the speed at which MPI processes can be moved from the system CPU(s) to the GPU for processing. Hyper-Q promises to be a performance boost for MPI applications.

Conclusion

Tesla Kepler is engineered to deliver ground-breaking performance with superior power efficiency while making GPUs easier than ever to use.
SMX, Dynamic Parallelism, and Hyper-Q are three important innovations in Tesla Kepler that bring these benefits to reality for our customers. For further details on additional architectural features, please refer to the Kepler GK110 whitepaper.

[Figure 3: Hyper-Q allows all streams to run concurrently, each using a separate work queue. In the Fermi model, concurrency was limited by false dependencies caused by the single hardware work queue.]

To learn more about NVIDIA Tesla, visit NVIDIA online.
Programming Environment for GPU Computing: NVIDIA's CUDA

The CUDA architecture has the industry's most robust language and API support for GPU computing developers, including C, C++, OpenCL, DirectCompute, and Fortran. NVIDIA Parallel Nsight, a fully integrated development environment for Microsoft Visual Studio, is also available. Used by more than six million developers worldwide, Visual Studio is one of the world's most popular development environments for Windows-based applications and services. By adding functionality specifically for GPU computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before. The latest version, CUDA 4.0, adds a host of exciting new features to make parallel computing easier; among them is the ability to relieve bus traffic by enabling direct GPU-to-GPU communication.

[CUDA ecosystem on all major platforms: integrated development environments, languages & APIs, libraries, mathematical packages, tools & partners, consultants, training & certification, and research & education. Personalized CUDA education courses can be ordered online.]

Accelerate Your Code Easily with OpenACC Directives: Get 2x Speed-up in 4 Weeks or Less

Accelerate your code with directives and tap into the hundreds of computing cores in GPUs. With directives, you simply insert compiler hints into your code and the compiler automatically maps compute-intensive portions of your code to the GPU. By starting with a free, 30-day trial of PGI directives today, you are working with the technology that is the foundation of the OpenACC directives standard.
OpenACC is:
- Easy: simply insert hints in your codebase
- Open: run the single codebase on either the CPU or GPU
- Powerful: tap into the power of GPUs within hours

OpenACC Directives
The OpenACC Application Program Interface describes a collection of compiler directives that specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators. The directives and programming model defined in this document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

Watch video*: PGI Accelerator, technical presentation at SC11
* Use your phone, smartphone or tablet PC with QR reader software to read the QR code.
What is GPU Computing?

GPU computing is the use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing.

History of GPU Computing

Graphics chips started as fixed function graphics pipelines. Over the years, these graphics chips became increasingly programmable.

GPU Acceleration in Life Sciences: Tesla Bio Workbench

The NVIDIA Tesla Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a computational laboratory capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, many times faster through the use of NVIDIA Tesla GPUs.

[Chart: JAC NVE benchmark, relative performance (normalized ns/day): a simulation on 2 Intel Xeon quad-core CPUs plus 4 Tesla M2090 GPUs runs 50% faster than the same simulation on 192 quad-core CPUs of the Kraken supercomputer.]

The Tesla Bio Workbench consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms. Complex molecular simulations that had previously been possible only on supercomputing resources can now be run on an individual workstation, optimizing the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based clusters of servers to simulate large molecules and systems that would otherwise have required a supercomputer.

Applications that are accelerated on GPUs include:
- Molecular Dynamics & Quantum Chemistry: AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (Quantum Chemistry), VMD
- Bioinformatics: CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU

For more information, visit the Tesla Bio Workbench site.
GPU Acceleration in Life Sciences: AMBER

Researchers today are solving the world's most challenging and important problems. From cancer research to drugs for AIDS, computational research is bottlenecked by simulation cycles per day: more simulations mean faster time to discovery. To tackle these difficult challenges, researchers frequently rely on national supercomputers for computer simulations of their models. GPUs offer every researcher supercomputer-like performance in their own office. Benchmarks have shown four Tesla M2090 GPUs significantly outperforming the existing world record set on CPU-only supercomputers.

AMBER node processing comparison:
- 4 Tesla M2090 GPUs + 2 CPUs: 69 ns/day
- 192 quad-core CPUs: 46 ns/day

Recommended hardware configuration:
- Workstation: 4x Tesla C2070, dual-socket quad-core CPU, 24 GB system memory
- Server: up to 8x Tesla M2090s in a cluster, dual-socket quad-core CPU per node, 128 GB system memory
- Server: 8x Tesla M2090, dual-socket quad-core CPU, 128 GB system memory

GROMACS

GROMACS is a molecular dynamics package designed primarily for the simulation of biochemical molecules like proteins, lipids, and nucleic acids that have many complicated bonded interactions. The CUDA port of GROMACS enabling GPU acceleration supports Particle-Mesh-Ewald (PME), arbitrary forms of non-bonded interactions, and implicit solvent Generalized Born methods.

[Figure 4: Absolute performance of GROMACS running CUDA- and SSE-accelerated non-bonded kernels with PME on 3-12 CPU cores and 1-4 GPUs. Simulations with cubic and truncated dodecahedron cells, pressure coupling, as well as virtual interaction sites enabling 5 fs time steps are shown.]
AMBER and NAMD 5x Faster: Take a Free and Easy Test Drive Today

Run your molecular dynamics simulation 5x faster. Take a free test drive on a remotely-hosted cluster loaded with the latest GPU-accelerated applications, such as AMBER and NAMD, and accelerate your results. Simply log on and run your application as usual; no GPU programming expertise required. Try it now and see how you can reduce simulation time from days to hours.

GPU Test Drive: try the Tesla GPU Test Drive today!
- Step 1: Register
- Step 2: Upload your data
- Step 3: Run your application and get faster results

[Chart: AMBER CPU vs. GPU performance, ns/day (higher is better), CPU only vs. CPU + GPU on 1, 2 and 4 nodes: Cellulose NPT up to 3.5x faster. AMBER benchmark config, per node: dual Tesla M2090 GPU (6GB), dual Intel 6-core X5670 (2.93 GHz), AMBER 11 + Bugfix17, CUDA 4.0, ECC off.]

[Chart: NAMD CPU vs. GPU performance, ns/day (higher is better), CPU only vs. CPU + GPU on 1, 2 and 4 nodes: STMV up to 6.5x faster. NAMD benchmark config, per node: dual Tesla M2090 GPU (6GB), dual Intel 4-core Xeon (2.4 GHz), NAMD 2.8, CUDA 4.0, ECC on.]

Learn more and order today at NVIDIA's GPU Test Drive site.
GPU Accelerated Engineering

ANSYS: Supercomputing from Your Workstation with NVIDIA Tesla GPUs

A new feature in ANSYS Mechanical leverages graphics processing units to significantly lower solution times for large analysis problem sizes. By Jeff Beisheim, Senior Software Developer, ANSYS, Inc.

With ANSYS Mechanical 14.0 and NVIDIA professional GPUs, you can:
- Improve product quality with 2x more design simulations
- Accelerate time-to-market by reducing engineering cycles
- Develop high fidelity models with practical solution times

How much more could you accomplish if simulation times could be reduced from one day to just a few hours? As an engineer, you depend on ANSYS Mechanical to design high quality products efficiently. To get the most out of ANSYS Mechanical 14.0, simply upgrade your Quadro GPU or add a Tesla GPU to your workstation, or configure a server with Tesla GPUs, and instantly unlock the highest levels of ANSYS simulation performance.

Future Directions
As GPU computing trends evolve, ANSYS will continue to enhance its offerings as necessary for a variety of simulation products. Certainly, performance improvements will continue as GPUs become computationally more powerful and extend their functionality to other areas of ANSYS software.

20x Faster Simulations with GPUs: Design Superior Products with CST Microwave Studio

What could product engineers achieve if a single simulation run-time were reduced from 48 hours to 3 hours? CST Microwave Studio is one of the most widely used electromagnetic simulation packages, and some of the largest customers in the world today are leveraging GPUs to introduce their products to market faster and with more confidence in the fidelity of the product design.
Recommended Tesla configurations:
- Workstation: 4x Tesla C2070, dual-socket quad-core CPU, 48 GB system memory
- Server: 4x Tesla M2090, dual-socket quad-core CPU, 48 GB system memory

[Chart: relative performance vs. CPU: 2x CPU + 1x C2070 is 9x faster with a Tesla GPU; 2x CPU + 4x C2070 is faster still with Tesla GPUs. Benchmark: BGA models, 2M to 128M mesh models, CST MWS transient solver. CPU: 2x Intel Xeon X5620. Single GPU run only on 50M mesh model.]
GPU Acceleration in Computer Aided Engineering

MSC Nastran, Marc
5x performance boost with a single GPU over a single core; >1.5x with 2 GPUs over 8 cores.
- The Nastran direct equation solver is GPU accelerated: real, symmetric sparse direct factorization
- Handles very large fronts with minimal use of pinned host memory
- Impacts SOL101 and SOL400, which are dominated by MSCLDL factorization times
- More of Nastran (SOL108, SOL111) will be moved to the GPU in stages
- Multi-GPU support for both Linux and Windows: with DMP > 1, multiple fronts are factorized concurrently on multiple GPUs, 1 GPU per matrix domain
- NVIDIA GPUs include Tesla 20-series and Quadro 6000, CUDA 4.0 and above

[Chart: SOL101, 3.4M DOF, total and solver speed-ups on test system fs0 (NVIDIA PSG cluster node: Linux, 96GB memory, Tesla C2050, Nehalem 2.27GHz, 2.2 TB SATA 5-way striped RAID). Configurations compared: 1 core, 1 GPU + 1 core, 2 GPUs + 2 cores (DMP=2), 4 cores (SMP), 8 cores (DMP=2); reported speed-ups include 4.6x and 5.6x for GPU runs over a single core and 1.6x and 1.8x over the multi-core runs.]

GPU Accelerated Engineering: SIMULIA Abaqus/Standard Reduces Engineering Simulation Times by Half

As products get more complex, the task of innovating with confidence has become ever more difficult for product engineers. Engineers rely on Abaqus to understand the behavior of complex assemblies or of new materials. With GPUs, engineers can run Abaqus simulations twice as fast: a leading car manufacturer and NVIDIA customer reduced the simulation time of an engine model from 90 minutes to 44 minutes with GPUs. Faster simulations enable designers to simulate more scenarios to achieve, for example, a more fuel efficient engine.

[Chart: Abaqus 6.12 multi-GPU execution, speed-up vs. CPU only (24 cores, 2 hosts, 48 GB memory per host), with 1 GPU/host and 2 GPUs/host, for models from 1.5 to 3.8 million equations; speed-ups range from roughly 1.5x to 1.9x.]
Visualize and Simulate at the Same Time on a Single System: NVIDIA Maximus Technology

Engineers, designers, and content creation professionals are constantly challenged to find new ways to explore and validate more ideas faster. This often involves creating content with both visual design and physical simulation demands: for example, designing a car or creating a digital film character, then understanding how air flows over the car or how the character's clothing moves in an action scene. Unfortunately, the design and simulation processes have often been disjointed, occurring on different systems or at different times.

Introducing NVIDIA Maximus

NVIDIA Maximus-powered workstations solve this challenge by combining the visualization and interactive design capability of NVIDIA Quadro GPUs with the high-performance computing power of NVIDIA Tesla GPUs in a single workstation. Tesla companion processors automatically perform the heavy lifting of photorealistic rendering or engineering simulation computation. This frees up CPU resources for the work they are best suited for (I/O, running the operating system, and multitasking) and allows the Quadro GPU to be dedicated to powering rich, full-performance, interactive design.

Reinventing the Workflow

With Maximus, engineers, designers, and content creation professionals can stay productive and work with maximum complexity in real time.

[Diagram: On a traditional workstation, each design step waits for a CPU simulation step. On an NVIDIA Maximus workstation, simulations run on the GPU while subsequent designs proceed, so multiple design iterations overlap with simulation. Faster iterations = faster time to market.]

Watch video*: NVIDIA Maximus Technology Helps Drive the Silver Arrow Mercedes-Benz Concept Car
* Use your phone, smartphone or tablet PC with QR reader software to read the QR code.
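The Maximus division of labor described above, Quadro for interactive graphics and Tesla for compute, can be sketched as a simple device-role assignment. This is a hypothetical illustration: real applications would query the CUDA or driver APIs for device properties, and the product names below are examples, not a discovery mechanism.

```python
# Assign each GPU in a workstation a role by product line, mirroring
# the Maximus split: Tesla boards take simulation/rendering compute,
# everything else (e.g. Quadro) stays dedicated to interactive display.
# The name-prefix test is an illustrative stand-in for a real device query.

def assign_roles(gpu_names):
    """Map each GPU name to 'compute' or 'graphics'."""
    roles = {}
    for name in gpu_names:
        roles[name] = "compute" if name.startswith("Tesla") else "graphics"
    return roles
```

With a Maximus-style pair, `assign_roles(["Quadro 6000", "Tesla C2075"])` keeps the Quadro free for full-performance interactive design while the Tesla absorbs the simulation load.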
Faster Simulations, More Job Instances

Simulation analysis and CAD: With Maximus technology, you can perform simultaneous structural or fluid dynamics analysis with applications such as ANSYS while running your design application, including SolidWorks and PTC Creo. For more information, visit Nvidia Maximus Technology.

[Chart: Relative performance vs. 2 CPU cores, comparing 8 CPU cores + Tesla C2075, 8 CPU cores, and 2 CPU cores.]

Maximus Performance for 3ds Max 2012 with iray

Photorealistic rendering and CAD: With Maximus technology, you can perform rapid photorealistic rendering of your designs in applications such as 3ds Max or Bunkspeed while still using your system for other work. For more information, visit quadro-3ds-max-uk.

[Chart: Relative performance vs. 8 CPU cores for a Tesla C2075 paired with Quadro 6000, 5000, 4000, or 2000 cards, plus Quadro-only and 8-CPU-core baselines.]

Maximus Performance for CATIA Live Rendering

Ray tracing with CATIA Live Rendering: With Maximus technology, photorealistic rendering is interactive, and it allows you to simultaneously run other applications without bogging down your system. For more information, visit quadro-catia-uk.

[Chart: Relative performance vs. 8 CPU cores for a Tesla C2075 paired with a Quadro 6000 or Quadro 4000, plus Quadro-only and 8-CPU-core baselines.]

Adobe Premiere Pro Price/Performance*

Fast, fluid editing with Premiere Pro: With Maximus technology, you can relieve the pressure of getting more done in less time. For more information, visit PremiereproCS5_uk.

[Chart: Adobe Mercury Playback Engine (MPE) percentage value increase for a Tesla C2075 + Quadro 2000 and for Quadro 6000, 5000, 4000, and 2000 cards vs. CPU cores alone.]

* Adobe Premiere Pro results obtained with 5 layers and 6 effects per layer, output to H.264 on a Dell T7500 (48 GB, Windows 7) at 1440 x 1080 resolution. Price/performance calculated using cost per system and number of clips possible per hour.
© 2012 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Tesla, CUDA, GigaThread, Parallel DataCache and Parallel Nsight are trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated. Features, pricing, availability, and specifications are all subject to change without notice.
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING