Ethernet in the World's Top500 Supercomputers


White Paper
Ethernet in the World's Top500 Supercomputers
Updated June 2006

Introduction: The Top500 Supercomputing Elite

The world's most powerful supercomputers continue to get faster. According to top500.org, which maintains the list of the 500 supercomputers with the highest Linpack performance, the aggregate performance of the listed computers has grown 21% in the last seven months and 65% in the last year. This growth rate is slower than for other recent lists, but continues to compare favorably with the rate of improvement predicted by Moore's Law (2x every 18 months). The #1 supercomputer on the June 2006 list is still the BlueGene/L, whose performance is unchanged at 280.6 TeraFlops, while the performance needed to hold the #500 position has again risen since November.

The somewhat slower rate of performance improvement is notable throughout the list. At the top, seven of the Top 10 systems from the November 2005 list were able to maintain a Top 10 position; for the November 2005 list, only four systems from the June 2005 list held onto their Top 10 status. At the bottom of the new list, performance improvement caused 158 systems from November to be de-listed, compared to more than 200 systems that fell off the previous time.

All 500 listed supercomputers use architectures that employ large numbers of processors (from as many as 131,072 to as few as 40) to achieve very high levels of parallel performance. A typical modern supercomputer is based on a large number of networked compute nodes dedicated to parallel execution of applications, plus a number of I/O nodes that deal with external communications and with access to data storage resources. Top500.org categorizes the supercomputers on its list in the following way:

Clusters: Parallel computer systems assembled from commercially available systems/servers and networking devices, with each compute or I/O node a complete system capable of standalone operation. The current list includes 364 cluster systems (up from 360 in 11/05), including the #6 and #7 systems in the Top 10.

Constellations: Clusters in which the number of processors in a multi-processor compute node (typically an n-way Symmetric Multi-Processing or SMP node) exceeds the number of compute nodes. There are 38 constellations listed in the Top500 (up from 36 in 11/05), with the highest performing system listed at #5. Not counting the #5 computer, the highest performing constellation is at #67.

Massively Parallel Processors (MPPs): Parallel computer systems comprised in part of specialized, purpose-built computing and/or networking systems that are not commercially available as separate components. MPPs include vector supercomputers, DM-MIMD (distributed memory, multiple instruction stream/multiple data stream), and SM-MIMD (shared memory, multiple instruction stream/multiple data stream) supercomputers available from HPC computer vendors such as IBM, Cray, SGI, and NEC. MPP systems account for 98 entries on the current list (down from 104 in 11/05), including #1 through #4 and three more of the Top 10.

For a more detailed discussion of supercomputer designs and topologies, see the following white papers on the Force10 Networks website: "Building Scalable, High Performance Cluster/Grid Networks: The Role of Ethernet" and "Ethernet in High Performance Computing Clusters".

Among the major trends in recent Top500 lists is the emergence of highly scalable switched Gigabit Ethernet as the most widely deployed system interconnect (SI).
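To make the three categories concrete, the sketch below applies the rules as described: a system built from commodity parts counts as a constellation when the processors per SMP compute node exceed the number of compute nodes, otherwise as a cluster, while purpose-built systems fall into the MPP bucket. The field names and sample records are hypothetical illustrations, not actual Top500 entries.

```python
def classify(system):
    """Bucket a system record using the Top500 categories described above.

    Fields (hypothetical): 'nodes' = number of compute nodes,
    'procs_per_node' = processors per compute node,
    'commodity' = True if built from commercially available servers/switches.
    """
    if system["commodity"]:
        # Constellation: more processors inside each SMP node than nodes in the system.
        if system["procs_per_node"] > system["nodes"]:
            return "constellation"
        return "cluster"
    # Purpose-built compute or interconnect that is not sold separately.
    return "MPP"

# Illustrative records only (not actual Top500 entries).
examples = [
    {"name": "small SMP farm", "nodes": 16,  "procs_per_node": 32, "commodity": True},
    {"name": "rack cluster",   "nodes": 512, "procs_per_node": 2,  "commodity": True},
    {"name": "vector system",  "nodes": 640, "procs_per_node": 8,  "commodity": False},
]

for s in examples:
    print(s["name"], "->", classify(s))
```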
The remainder of this document focuses on the key roles that Ethernet plays as a multi-purpose interconnect technology used by virtually all supercomputers on the Top500 list.

The Rise of Clusters and Ethernet Cluster Interconnect

Cluster systems have become the dominant category of supercomputer largely because of the unmatched price/performance ratio they offer. As shown by the top curve in Figure 1, the total number of clusters on the list continues to grow, with an increase of 26% in the last two years. As clusters have become both more powerful and more cost-effective, they have helped to make High Performance Computing (HPC) considerably more accessible to corporations for speeding up numerous parallel applications in research, engineering, and data analysis. As shown by the middle curve in Figure 1, the number of Top500 clusters owned by industrial enterprises has grown by 53% over the last two years. The Top500 list places supercomputer owners in the following categories: industry, research, government, vendor, academic, and classified.

Figure 1. Growth of cluster systems on the Top500

Over the last three years, the adoption of HPC clusters by industrial enterprises has spurred a 26% increase in the total number of industrial systems on the Top500, as shown by Figure 2. Clusters have now become the dominant computer architecture for the industrial component of the Top500. In June 2006, clusters account for over 88% of the 257 industrial supercomputers on the list.

Figure 2. Growth of industry-owned systems on the Top500

The cost-effectiveness of clusters as a category of supercomputer is driven by three major factors:

1. Availability of high volume server products incorporating very high performance single and multi-core microprocessors, minimizing hardware costs. Enterprises can even build high performance clusters using the same models of server already being deployed in the data center for mainstream IT applications.

2. Linux is the cluster operating system of choice, minimizing software licensing costs. Linux is the operating system used by 367 supercomputers on the list, up from 334 one year ago. Most Linux systems on the list are clusters, although some MPPs, including the IBM BlueGenes and Cray XT3s, also run Linux. Because Linux is increasingly popular as an enterprise server operating system, no new expertise is required and applications can readily be migrated from conventional Linux servers to Linux clusters.

3. Gigabit Ethernet (GbE) is the most cost-effective networking system for cluster interconnect (inter compute-node communication). GbE is particularly attractive to enterprises because it is already a familiar mainstream technology in data centers and campus LAN networks. In addition, most high end Linux servers come with integral GbE at no extra cost. As shown by the bottom curve on Figure 1, GbE is the cluster interconnect for 94% of the industrial enterprise clusters on the Top500 (212 out of 226). The top performing GbE cluster, at #27 on the list with performance of 12.3 TeraFlops, is a Geoscience industry computer built by IBM. The system is a BladeCenter LS20 with 5,000 AMD Opteron processor cores.

Networking for Supercomputers

Regardless of whether the supercomputer architecture is a cluster, constellation, or MPP, the compute nodes that house the multiple processors must be supported by a network or multiple networks to provide the system connections for the following functions:

IPC Fabric: Also known as simply the "Interconnect", an essential aspect of multi-processor supercomputing is the interprocessor communications (IPC) that allow large numbers of processors/compute nodes to work in a parallel, yet coordinated, fashion. Depending on the application, the bandwidth and latency of transfers between processors can have a significant impact on overall performance. For example, processors may waste time in an idle state waiting to receive intermediary results from other processors. All compute nodes are connected to the IPC fabric. In some cases, I/O nodes are also connected to this fabric.

Management Fabric: A separate management fabric allows system nodes to be accessed for control, troubleshooting, and status/performance monitoring without competing with the IPC for bandwidth. In general, every compute node and I/O node is attached to the management fabric.

I/O Fabric: This fabric connects the supercomputer I/O nodes to the outside world, providing user access and connection to complementary computing resources over the campus LAN, WAN, or Internet.

Storage Fabric: A common practice is to attach file servers or other storage subsystems to I/O nodes. This isolates the compute nodes from the overhead of storage access. A separate storage fabric may be used to provide connectivity between I/O nodes and file servers. In a few cases, the compute nodes are attached to storage resources via a SAN, which then acts as the storage fabric.

Top500.org focuses most of its attention on the Interconnect (IPC) fabric because that is obviously the network connection that has the primary impact on system performance. Figure 3 shows the number of supercomputers on recent Top500 lists that use each type of IPC Interconnect fabric. The continued rapid growth of GbE to its current position as the #1 Interconnect fabric in the Top500 (51% of the 6/06 list) reflects the accelerated adoption of clusters discussed previously. Virtually all the GbE IPC fabric systems shown in the chart are clusters.

Figure 3. Growth of Gigabit Ethernet vs. other IPC interconnects in the Top500

In Figure 3, the "Other" category includes vendor-specific Interconnect fabrics that computer vendors incorporate in their products, such as the IBM and Cray 3D torus networks, IBM's SP and Federation networks, the NEC crossbar, and the SGI NumaLink. Myrinet and Quadrics are commercially available proprietary HPC switching systems that were both specifically designed to provide low latency IPC system interconnect.

Figure 4. System interconnects of the Top 10 supercomputers (June 2006 Top500 list). For each system, the table summarizes the processor count, system type (DM-MIMD, SM-MIMD, cluster, or vector), IPC interconnect fabric, control/management network, external network I/O, storage fabric, and Linpack TeraFlops; the individual systems are described in detail later in this document.
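The latency and bandwidth sensitivity described for the IPC fabric is commonly characterized with a simple two-node MPI ping-pong test. The sketch below is a minimal illustration of such a test; it assumes the mpi4py package and an MPI launcher are available on the cluster, which is an assumption rather than anything specified in this paper, and the message sizes are arbitrary examples.

```python
# Minimal MPI ping-pong sketch (assumes mpi4py is installed; run with e.g.
# "mpirun -np 2 python pingpong.py"). Rank 0 sends a message to rank 1,
# which echoes it back; half the round-trip time approximates one-way latency,
# and large messages give an estimate of effective bandwidth.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
REPS = 100

for size in (8, 4096, 1_048_576):          # bytes per message (example sizes)
    buf = np.zeros(size, dtype="u1")
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - t0
    if rank == 0:
        one_way = elapsed / (2 * REPS)
        print(f"{size:>8} B: latency ~{one_way * 1e6:8.1f} us, "
              f"bandwidth ~{size / one_way / 1e6:8.1f} MB/s")
```

On a GbE cluster the small-message result is dominated by host-side TCP/IP processing plus switch latency; that host component is what the TOE and RDMA NICs discussed below aim to reduce.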

InfiniBand is an industry standard, low latency, general purpose system interconnect. InfiniBand is the only IPC interconnect besides GbE that is capturing a growing share of the Top500 list. As this chart indicates, the Top500 list as a whole is moving away from proprietary interconnect technologies. On the June 2006 list, all but one of the Quadrics systems are listed as clusters, while Myrinet systems include 53 clusters and 33 constellations (i.e., nearly all the constellations in the Top500 use Myrinet as the IPC Interconnect fabric). The recent decline in the number of Myrinet systems on the list is due partly to the growth of Ethernet clusters and partly to the decline in the number of constellations that make the list (down to 38 in June 2006 from 70 in June 2005).

Networking for the Top 10 Supercomputers

Figure 4 provides an overview of the Top 10 systems on the June 2006 top500.org list with respect to the networking fabrics deployed in each of the functional areas described above. As can be noted from column four of the table, all of the systems in the Top 10 use low latency IPC Interconnect fabrics to help achieve high performance. Six of the systems in the table rely on the computer vendors' proprietary interconnects, while the clusters use commercially available interconnects (InfiniBand or Quadrics). It is notable that commercially available proprietary interconnects are losing popularity in the Top 10 as well as throughout the list.

Figure 5 summarizes typical performance levels for these more specialized interconnects, as well as Gigabit Ethernet and 10 Gigabit Ethernet, in terms of Message Passing Interface (MPI) latency and bandwidth. There are a number of Ethernet NIC technologies now on the market (RDMA, TOE, kernel bypass, etc.) that reduce the host component of IP/Ethernet MPI latency. Therefore, MPI latency for Ethernet is expected to continue to decline towards a figure closer to the switch latency, on the order of 10 microseconds. The impact of TOE NICs is seen in the last row of the table. The TOE data comes from recent testing of 10 Gigabit Ethernet cluster interconnect by Chelsio Communications, a leading Ethernet NIC supplier, and Los Alamos National Laboratory. The results demonstrate compelling performance levels compared with Myrinet and InfiniBand. 10 GbE switches and TOE NICs are now available at volume price levels, with additional improvements to come over the next couple of years. As these technologies ride further down the cost curve, Ethernet clusters should be able to continue to enhance their share of the Top500 by delivering ever-improving performance even without significant increases in processor counts.

As shown in columns five and six of Figure 4, all of the Top 10 supercomputers use switched Gigabit Ethernet or Fast Ethernet networks as the management fabric and general I/O fabric. Gigabit Ethernet is also the predominant fabric used to connect I/O nodes to file server resources.
Figure 5. Latency and bandwidth of IPC interconnect fabrics. The table compares MPI short-message, single-hop latency and unidirectional MPI bandwidth (MB/s) for NumaLink 4 (SGI), the IBM 3D torus, QsNet II (Quadrics), the Cray SeaStar 3D torus, InfiniBand 4X (Voltaire), the NEC 640 x 640 crossbar, Myrinet XP2 (Myricom), Gigabit Ethernet, and 10 Gigabit Ethernet (the last with and without TOE). Source: IBM, NEC, Sandia, Chelsio, and SGI.

Therefore, although none of these Top 10 systems is categorized as a system with GbE interconnect, all of the systems make extensive use of Fast Ethernet, Gigabit Ethernet, and/or 10 Gigabit Ethernet for non-IPC system interconnect.

A more complete description of the Top 10 supercomputers on the list is included toward the end of this document. If the other 490 systems in the Top500 were examined as closely, we would expect to see that scalable Ethernet switching always plays an important system interconnect role in more than one of the four required functional areas.

Top500 Performance by System Type

Figure 6 is a column chart that shows the Linpack performance of all systems in the Top500, where the type of system is identified by the color of the column. Clusters are increasingly dominant between #100 and #500 on the list, accounting for 80% of the systems. In addition, clusters have made significant inroads among the Top 100 positions on the list. Clusters now occupy 45 positions in the Top 100, including 37 with low latency interconnects and 7 with GbE interconnect. If current trends continue, we can expect to see clusters becoming even more dominant for at least the next one or two list iterations.

Figure 6. Performance of Top500 computers by system architecture

Networking for Gigabit Ethernet Clusters

For supercomputer clusters that use GbE as the Interconnect fabric, switched Ethernet technology can be chosen to satisfy all of the system networking requirements. Figure 7 provides a conceptual example of how this may be done. Highly scalable switches with 10 GbE and GbE ports are connected in a mesh forming a "fat tree" that serves as the IPC fabric connecting the compute nodes and the I/O nodes. Additional 10 GbE or GbE ports in the mesh of switches serve as the storage fabric that provides connectivity between the I/O nodes and file servers. Another set of switch ports and logical interfaces can play the role of an I/O fabric connecting the supercomputer I/O nodes to external resources and users. A separate set of meshed Fast Ethernet switches can be used to construct an out-of-band management fabric. Fast Ethernet has more than adequate bandwidth for the management fabric function and is very inexpensive. Frequently the management fabric can be built using re-purposed high density Fast Ethernet switches previously used for server connectivity in data centers or earlier generations of clusters.

Figure 7. Cluster using Ethernet for all four system interconnect fabrics
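To give a sense of the switching scale behind a GbE cluster fabric of the kind just described, the sketch below works through the port arithmetic for a two-tier fat tree with GbE ports facing the nodes and 10 GbE uplinks into a second tier. The node count, per-switch port counts, and uplink count are hypothetical examples, not a recommendation or a description of any particular product.

```python
# Back-of-envelope port math for a two-tier "fat tree" GbE cluster fabric.
# All parameters are illustrative assumptions, not vendor specifications.
import math

nodes = 1024                 # compute + I/O nodes attached with GbE
leaf_gbe_ports = 48          # GbE node-facing ports per leaf switch (assumed)
leaf_10gbe_uplinks = 4       # 10 GbE uplinks per leaf switch (assumed)

leaves = math.ceil(nodes / leaf_gbe_ports)
uplink_capacity_gbps = leaf_10gbe_uplinks * 10
downlink_capacity_gbps = leaf_gbe_ports * 1
oversubscription = downlink_capacity_gbps / uplink_capacity_gbps

spine_ports_needed = leaves * leaf_10gbe_uplinks   # 10 GbE ports on the second tier

print(f"leaf switches:      {leaves}")
print(f"oversubscription:   {oversubscription:.1f}:1 per leaf")
print(f"spine 10 GbE ports: {spine_ports_needed}")
```

Adding uplinks per leaf, or using chassis switches dense enough to collapse the tree to a single tier, lowers the oversubscription ratio and with it the contention that parallel applications see on the IPC fabric.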

The Top 10 Supercomputers on the June 2006 Top500 List

This section provides additional information on the systems in the Top 10. Information is limited to that which the owners or vendors of the systems have placed on their web sites or elsewhere on the Internet. Links to some of these information sources are included in the Appendix at the end of the document.

#1 Lawrence Livermore National Labs (LLNL) Blue Gene/L

The highest performing supercomputer on the Top500 list is the LLNL Blue Gene/L, whose performance has risen to 280.6 TeraFlops from 137 TeraFlops a year earlier. Blue Gene/L is an MPP system with 65,536 dual-processor compute nodes and 1,024 I/O nodes. The compute nodes run a stripped down version of the Linux kernel and the I/O nodes run a complete version of the Linux operating system. The full system consists of 64 racks, with each rack housing 1,024 compute nodes and 16 I/O nodes.

Blue Gene/L (BG/L) uses three specialized networks for IPC: a 3D torus with 1.4 Gbps of bidirectional bandwidth for the bulk of message passing via MPI, a tree network for collective operations, and a synchronization barrier/interrupt network. Interfaces for all three of these networks are integrated on the node processor ASICs, as shown in Figure 8.

Figure 8. Block diagram of the Blue Gene/L processor ASIC

In addition to the IPC networks, further connectivity is provided by two separate Ethernet networks, as shown in Figure 9. Each compute node has a 100/1000 Ethernet interface dedicated to control and management, including system boot, debug, and performance/health monitoring (control information can also be transmitted via the ASIC's JTAG interface). Each of the 1,024 I/O nodes uses Gigabit Ethernet for file access and external communications. Therefore, the LLNL BG/L system incorporates 65,536 ports of Fast Ethernet or GbE in the control/management network and 1,024 ports of GbE in the I/O and file server network.

Figure 9. High level view of the Blue Gene/L system

One of the key design guidelines for the BG/L was to optimize performance per watt of power consumed rather than maximizing performance per processor. The result is the ability to integrate 1,024 dual-processor compute nodes into a rack 0.9 m wide, 0.9 m deep, and 1.9 m high that consumes 27.5 kW of total power. For example, BG/L yields 25 times more performance per kW than the NEC Earth Simulator at #10 on the current list.

Because of the large number of nodes in a single rack, more than 85% of the inter-node connectivity is contained within the racks. The corresponding dramatic reduction in connectivity across racks allows for higher density, higher reliability, and a generally more manageable system. Because the design philosophy led to a very large number of processors, the decision was made to provide the system with a very robust set of Reliability, Availability, and Serviceability (RAS) features. The BG/L design team was able to exploit the flexibility afforded by an ASIC-level design to integrate a number of RAS features typically not found on commodity servers used in cluster implementations.

As supercomputers continue to scale up in processor count, RAS is expected to become an increasingly critical aspect of HPC system design.

BG/L has been designed to be applicable to a broad range of applications in the following categories: simulations of physical phenomena, real-time data processing, and off-line data analysis. Accordingly, IBM has made the BG/L into a standard product line, which it intends to sell to both the traditional HPC market and the broader enterprise market. The Linux-based IBM eServer Blue Gene Solution is available from 1 to 64 racks, with peak performance up to 5.7 TeraFlops per rack. A one-rack entry version sells for approximately $1.5M. This price/performance point is likely to be attractive for enterprises with compute-intensive, mission critical applications that can be accelerated through parallelization. As a result of this eServer Blue Gene Solution initiative, we can expect to see an increasing number of Blue Gene systems appearing on the Top500 list for some time to come. There are 24 eServer Blue Gene Solution computers on the current list.

#2 IBM Thomas J. Watson Research Center Blue Gene/W

At #2 on the Top500 list, with performance of 91.3 TeraFlops, is another Blue Gene system installed at the IBM Thomas J. Watson Research Center (BG/W). BG/W uses the same system design as BG/L but is a 20-rack system with 20,480 compute nodes and 320 I/O nodes. Therefore, the IBM BG/W system incorporates 20,480 ports of Fast Ethernet or GbE in the control/management network and 320 ports of GbE in the I/O and file server network.

#3 Lawrence Livermore National Laboratory ASC Purple IBM

At #3, with upgraded performance of 75.8 TeraFlops for the June 2006 list, is the ASC Purple built by IBM for LLNL. ASC Purple currently consists of 1,536 nodes, including 1,280 compute nodes and 128 I/O nodes. Purple comprises 131 node racks, 90 disk racks, and 48 switch racks. Each p575 Squadron IH node is an 8-way SMP server that is powered by eight Power5 microprocessors running at 1.9 GHz and is configured with 32 GB of memory.

As shown in Figure 10, the ASC Purple IPC interconnect fabric is an IBM 3-stage, dual-plane Federation switch with 1,536 dual ports. The switch array is built from multi-port switch elements and 9,216 cables, and the fabric provides 8 GBps of peak bi-directional bandwidth. Purple has 2 million gigabytes of storage furnished by SATA and FibreChannel RAID arrays with over 11,000 disks; more than 2,000 FibreChannel links are required for storage access. In addition, the system has two Squadron 64-way Power5 nodes, logically partitioned into four login nodes. Each login node has eight 10 GbE ports for parallel FTP access to the archive and two GbE ports for NFS and SSH (login) traffic. System management functions are facilitated with a separate Ethernet management fabric with over 1,536 Fast Ethernet ports.

Figure 10. High level view of the ASC Purple system
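A quick bit of arithmetic on the login-node figures above shows how much external Ethernet capacity ASC Purple's design adds up to; the sketch below is only a port-count calculation based on the numbers quoted in this paper.

```python
# Aggregate external I/O arithmetic for the ASC Purple login nodes, using only
# the counts quoted above (four login nodes, eight 10 GbE ports each, plus two
# GbE ports each for NFS/SSH traffic).
login_nodes = 4
ten_gbe_ports_per_login = 8
gbe_ports_per_login = 2

ftp_ports = login_nodes * ten_gbe_ports_per_login          # 32 x 10 GbE
ftp_capacity_gbps = ftp_ports * 10                          # 320 Gbps to the archive
nfs_ssh_ports = login_nodes * gbe_ports_per_login           # 8 x GbE
nfs_ssh_capacity_gbps = nfs_ssh_ports * 1

print(f"parallel FTP: {ftp_ports} x 10 GbE = {ftp_capacity_gbps} Gbps")
print(f"NFS/SSH:      {nfs_ssh_ports} x GbE = {nfs_ssh_capacity_gbps} Gbps")
```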

#4 NASA Columbia SGI Altix 3700

The fourth system on the Top500 list, at 51.9 TeraFlops, is the NASA Columbia system consisting of 20 SGI Altix 3700 Superclusters, as shown in Figure 11. Each Supercluster contains 512 Itanium 2 processors with 1 Terabyte of global shared memory across the cluster, and each Supercluster runs a single image of the Linux operating system. The primary fabric for IPC is NumaLink, a low-latency proprietary SGI interconnect with 24 Gbps of bidirectional bandwidth. Each supernode is also connected with InfiniBand and two 10 Gigabit Ethernet ports for I/O and storage system access. Therefore, this system design requires 40 ports of 10 Gigabit Ethernet switching.

Figure 11. NASA's Columbia System

#5 Commissariat à l'Énergie Atomique (CEA) Tera-10

The fifth computer on the list is the Tera-10 supercomputer, with performance of 42.9 TeraFlops, owned by the French nuclear energy agency. The Tera-10 is a Linux cluster of Bull NovaScale 602 servers consisting of 544 compute nodes and 56 I/O nodes, as shown conceptually in Figure 12. Each NovaScale 602 server node has 16 Itanium 2 processors in an SMP configuration. The SMP is based on FAME (Flexible Architecture for Multiple Environments) internal switches that are used to provide individual processors with access to I/O and shared memory. Note that the system has been incorrectly categorized as a constellation in the Top500 list. Quadrics QsNet II provides the IPC fabric connecting compute and I/O nodes, FibreChannel on the I/O nodes is used for storage connect, and Ethernet is used for data I/O and management.

Figure 12. CEA's Tera-10 System

#6 Sandia National Labs Thunderbird Dell PowerEdge Cluster

At the #6 position, the Sandia Thunderbird is the second highest performing cluster on the list, with 38.3 TeraFlops of performance. Thunderbird is constructed of 4,096 compute nodes consisting of Dell PowerEdge servers. Each PowerEdge server has two single-core Intel 64-bit (EM64T) Xeon 3.6 GHz processors, for a total of 8,192 processors. The IPC fabric is provided by 10 Gbps InfiniBand. A large switched Ethernet network with 4,600 GbE ports and forty 10 GbE ports serves as the management fabric, I/O fabric, and storage fabric of the cluster. The management fabric spans the compute nodes, the InfiniBand switches, and the storage.

#7 Tokyo Institute of Technology TSUBAME Sun Fire Cluster

The #7 computer on the list, at 38.2 TeraFlops, is the Tokyo Tech TSUBAME, based on 655 Sun Fire x64 servers with a total of 10,480 AMD Opteron processor cores and Sun InfiniBand-attached storage. Each Sun Fire server uses a Galaxy 4 8-way SMP processor configuration. All nodes are interconnected via InfiniBand DDR (20 Gbps) for IPC communications, as well as for storage interconnect and network I/O via an InfiniBand/Ethernet gateway. The TSUBAME, therefore, is based on the version of a converged server fabric being promoted by InfiniBand vendors. The Sun Fire servers also use ClearSpeed's Advance floating-point co-processors to accelerate floating point operations. The Advance board can reportedly deliver 25 GigaFlops of number-crunching performance while consuming only 10 watts of power. The Advance co-processor is a multi-core, special-purpose parallel processor implemented as a system on a chip. The co-processor uses a MultiThreaded Array Processor (MTAP) with 96 floating point cores, a high-speed network interconnecting them, and dedicated DDR2 memory.
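The appeal of a co-processor like the Advance board is easiest to see as performance per watt. The sketch below works from the figures reported above (25 GigaFlops and 10 watts per board); the one-board-per-server count is a deliberately hypothetical example used for illustration, not TSUBAME's actual configuration.

```python
# Performance-per-watt arithmetic for a floating-point co-processor, using the
# reported 25 GFlops / 10 W per board. The boards-per-node value is hypothetical.
board_gflops = 25.0
board_watts = 10.0
boards_per_node = 1          # assumption for illustration only
nodes = 655                  # Sun Fire servers in TSUBAME (from the text)

print(f"co-processor efficiency: {board_gflops / board_watts:.1f} GFlops per watt")

added_tflops = nodes * boards_per_node * board_gflops / 1000
added_kw = nodes * boards_per_node * board_watts / 1000
print(f"with {boards_per_node} board per node: +{added_tflops:.1f} peak TFlops "
      f"for +{added_kw:.1f} kW")
```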

Clusters based on multi-core SMP compute nodes and multi-core co-processors, perhaps interconnected in a grid of clusters, appear to be a fruitful direction in the pursuit of higher performance that is less constrained by either physical size or power consumption difficulties.

#8 Forschungszentrum Juelich (FZJ) JUBL BlueGene/L

At #8 on the Top500 list, with performance of 37.3 TeraFlops, is another Blue Gene system installed at FZJ. The Juelicher BlueGene/L (JUBL) uses the same system design as BG/L but is an 8-rack system with 8,192 compute nodes and 288 I/O nodes. Therefore, the JUBL system incorporates 8,192 ports of Fast Ethernet or GbE in the control/management network and 288 ports of GbE in the I/O and file server network.

#9 Sandia National Labs Red Storm Cray XT3

Sandia worked closely with Cray to develop Thor's Hammer, the first supercomputer to use the Red Storm architecture. Cray has now leveraged the design to create its next generation product, the Cray XT3 supercomputer, and Thor's Hammer is now listed as the Red Storm Cray XT3. Red Storm currently comprises 5,184 dual-core Opteron compute nodes housed in 108 cabinets. In addition, there are 256 service and I/O nodes housed in 16 cabinets. The compute nodes run a microkernel derived from Linux and developed at Sandia, while the service and I/O nodes run a complete version of Linux. The system architecture allows the number of processors to be increased to 30,000, potentially upping performance from the current 36.2 TeraFlops.

The installation at Sandia will operate as a partitioned network configuration with a classified section (Red) and an unclassified section (Black), as shown in Figure 13. The machine can be rapidly reconfigured to switch 50% of all the compute nodes between the classified and unclassified sections. In Figure 13, the switchable compute cabinets are shown in white. In normal operation, three-quarters of the compute nodes are in either the Red or Black section.

Red Storm uses a 27 x 16 x 24 3D torus IPC fabric to interconnect its compute nodes. The peak bi-directional bandwidth of each link is 7.6 GBps, with a sustained bandwidth in excess of 4 GBps. The torus leverages the Opteron's HyperTransport interfaces and is based on Cray's SeaStar chip. The Cray torus interconnect carries all message passing traffic as well as the traffic between the compute nodes and the I/O nodes, as shown in Figure 13. As with other clusters, Fast Ethernet is used for management of the compute and I/O nodes, for a total of over 2,582 ports. In addition, Red Storm will incorporate more than 80 ports of 10 GbE to connect the system to file servers and another 40 ports for external I/O to other computing resources such as a new "Visualization Cluster" for 3D modeling.

Figure 13. High level view of the Sandia Red Storm
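Part of the attraction of a 3D torus such as Red Storm's (or Blue Gene's) is that each node needs only six links while the worst-case hop count grows slowly with system size. The sketch below computes the node count, maximum hop distance, and link count for a torus with the 27 x 16 x 24 dimensions quoted above; it illustrates general properties of the topology, not Cray's routing implementation.

```python
# Basic properties of a 3D torus interconnect with wrap-around links in each
# dimension. Dimensions taken from the Red Storm description above.
dims = (27, 16, 24)

nodes = dims[0] * dims[1] * dims[2]
# With wrap-around, the farthest node in each dimension is floor(d / 2) hops away.
max_hops = sum(d // 2 for d in dims)
# Each node has 6 links (+x, -x, +y, -y, +z, -z); each link is shared by 2 nodes.
links = nodes * 6 // 2

print(f"torus positions:     {nodes}")      # 10,368
print(f"max hops:            {max_hops}")   # 13 + 8 + 12 = 33
print(f"bidirectional links: {links}")
```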

#10 NEC Earth Simulator System (ESS)

The #10 system is the NEC Earth Simulator, in Japan. The Earth Simulator is a special purpose machine, made by NEC with the same vector processing technology used in the NEC SX-6 commercial product. The decision by NEC to base the design entirely on vector processors was something of a departure from previous approaches to supercomputer design.

The Earth Simulator consists of 640 shared memory vector supercomputers that are connected by a massive high-speed interconnect network. The interconnection network (IN) consists of a 640 x 640 single-stage crossbar switch with approximately 100 Gbps of bi-directional bandwidth per port. The aggregate switching capacity of this interconnect network is over 63 Tbps. This high level of performance was achieved by splitting the switch into 128 data switch units, each consisting of a byte-wide 640 x 640 switch. The 128 data switch units are housed in 65 racks and require over 83,000 cables.

Each supercomputer node contains eight vector processors, each with a peak performance of 8 GFlops, and a high-speed memory of 16 GBytes. The total number of processors is 5,120 (8 x 640), which translates to a total of approximately 40 TeraFlops of peak performance and a total main memory of 10 Terabytes. However, the SX-6 processors consume considerable power and space. With only 16 processors per rack, 320 racks are required for the processors alone. A special building, 65 m x 50 m in area, was constructed to house the ESS, as shown in Figure 14.

Figure 14. Special building for the Earth Simulator

The system layout for the ESS is similar to the one shown in Figure 15, which is from a large NEC SX-8 based system that adheres to the same general architecture. The compute nodes are connected by three switched networks: the 640 x 640 crossbar (IN or IXS), GbE for I/O and management, and a Fibre Channel SAN for storage access. Therefore, the ESS uses a total of 640 ports of GbE switching.

Figure 15. Earth Simulator block diagram
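The Earth Simulator's headline numbers follow directly from the per-node and per-port figures above; the sketch below simply reproduces that arithmetic as a consistency check on the roughly 40 TeraFlops of peak performance and 63+ Tbps of crossbar capacity.

```python
# Aggregate peak performance and crossbar capacity of the Earth Simulator,
# computed from the per-node and per-port figures quoted above.
nodes = 640
processors_per_node = 8
gflops_per_processor = 8
crossbar_gbps_per_port = 100        # approximate bi-directional bandwidth per port

peak_tflops = nodes * processors_per_node * gflops_per_processor / 1000
crossbar_tbps = nodes * crossbar_gbps_per_port / 1000

print(f"peak performance:   ~{peak_tflops:.0f} TeraFlops")   # ~41
print(f"aggregate crossbar: ~{crossbar_tbps:.0f} Tbps")       # ~64
```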

Conclusion

Switched Ethernet technology is making an increasingly significant contribution to the advancement of supercomputing and HPC. Within the Top500, Ethernet has achieved the following milestones:

GbE is now the leading IPC fabric, used by 51% of the supercomputers on the list.

GbE is the leading IPC fabric for clusters; 69% of clusters use GbE.

The cost-effectiveness of GbE is helping make supercomputing accessible to more industrial enterprises; 94% of all industrial clusters use GbE for the IPC fabric.

Driven by GbE cluster technology, supercomputing is being more widely adopted by industry. With the growth in industrial clusters that began in earnest in June 2003, 88% of all industrial supercomputers are now clusters and 51% of the supercomputers on the list are now owned by industrial enterprises.

Although they are not listed as being based on GbE interconnect, the Top 10 supercomputers in the world make extensive use of high density Fast Ethernet, GbE, and 10 GbE switching for non-IPC fabric functions: management, network I/O, and storage I/O.

The cost-effectiveness and accessibility of supercomputing based on GbE clusters have been well demonstrated within the Top500. This is encouraging more enterprises to identify opportunities to derive business benefit from parallel applications and HPC, even in areas such as financial analysis and database processing. As a result, GbE clusters are expected to continue to grow in significance, both as a mainstream technology of the enterprise data center and as a component of the Top500 list.

Appendix: Links for Additional Information

Top500 lists and database: top500.org

#1 LLNL Blue Gene/L: more on how the control Ethernet is used, and general information on IBM's Blue Gene/L

#2 BGW, IBM's Blue Gene at Watson: /4.BGW_Overview.pdf and, on the applications, /7.BGW_Mission_Utilization.pdf

#3 LLNL ASC Purple

#4 NASA Columbia

#5 Commissariat à l'Énergie Atomique (CEA) Tera-10

#6 Sandia Thunderbird

#7 Tokyo Institute of Technology TSUBAME Sun Fire Cluster

#8 Forschungszentrum Juelich (FZJ) JUBL BlueGene/L

#9 Sandia Red Storm: NewsRelease.html and pr10_21_03.html

#10 Earth Simulator

Force10 Networks, Inc., 350 Holger Way, San Jose, CA, USA

2006 Force10 Networks, Inc. All rights reserved. Force10 Networks and the Force10 logo are registered trademarks, and EtherScale, FTOS, SFTOS, and TeraScale are trademarks of Force10 Networks, Inc. All other brand and product names are trademarks or registered trademarks of their respective holders. Information in this document is subject to change without notice. Certain features may not yet be generally available. Force10 Networks, Inc. assumes no responsibility for any errors that may appear in this document.


Brocade Solution for EMC VSPEX Server Virtualization Reference Architecture Brocade Solution Blueprint Brocade Solution for EMC VSPEX Server Virtualization Microsoft Hyper-V for 50 & 100 Virtual Machines Enabled by Microsoft Hyper-V, Brocade ICX series switch,

More information

Using PCI Express Technology in High-Performance Computing Clusters

Using PCI Express Technology in High-Performance Computing Clusters Using Technology in High-Performance Computing Clusters Peripheral Component Interconnect (PCI) Express is a scalable, standards-based, high-bandwidth I/O interconnect technology. Dell HPC clusters use

More information

Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing

Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing Microsoft Windows Compute Cluster Server 2003 Partner Solution Brief Finite Elements Infinite Possibilities. Virtual Simulation and High-Performance Computing Microsoft Windows Compute Cluster Server Runs

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Building Enterprise-Class Storage Using 40GbE

Building Enterprise-Class Storage Using 40GbE Building Enterprise-Class Storage Using 40GbE Unified Storage Hardware Solution using T5 Executive Summary This white paper focuses on providing benchmarking results that highlight the Chelsio T5 performance

More information

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband

Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband Intel Cluster Ready Appro Xtreme-X Computers with Mellanox QDR Infiniband A P P R O I N T E R N A T I O N A L I N C Steve Lyness Vice President, HPC Solutions Engineering slyness@appro.com Company Overview

More information

State of the Art Cloud Infrastructure

State of the Art Cloud Infrastructure State of the Art Cloud Infrastructure Motti Beck, Director Enterprise Market Development WHD Global I April 2014 Next Generation Data Centers Require Fast, Smart Interconnect Software Defined Networks

More information

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Optimizing Infrastructure Support For Storage Area Networks

Optimizing Infrastructure Support For Storage Area Networks Optimizing Infrastructure Support For Storage Area Networks December 2008 Optimizing infrastructure support for Storage Area Networks Mission critical IT systems increasingly rely on the ability to handle

More information

Michael Kagan. michael@mellanox.com

Michael Kagan. michael@mellanox.com Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service

More information

New Storage System Solutions

New Storage System Solutions New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems

More information

Network Bandwidth Measurements and Ratio Analysis with the HPC Challenge Benchmark Suite (HPCC)

Network Bandwidth Measurements and Ratio Analysis with the HPC Challenge Benchmark Suite (HPCC) Proceedings, EuroPVM/MPI 2005, Sep. 18-21, Sorrento, Italy, LNCS, Springer-Verlag, 2005. c Springer-Verlag, http://www.springer.de/comp/lncs/index.html Network Bandwidth Measurements and Ratio Analysis

More information

HPC Update: Engagement Model

HPC Update: Engagement Model HPC Update: Engagement Model MIKE VILDIBILL Director, Strategic Engagements Sun Microsystems mikev@sun.com Our Strategy Building a Comprehensive HPC Portfolio that Delivers Differentiated Customer Value

More information

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

QUADRICS IN LINUX CLUSTERS

QUADRICS IN LINUX CLUSTERS QUADRICS IN LINUX CLUSTERS John Taylor Motivation QLC 21/11/00 Quadrics Cluster Products Performance Case Studies Development Activities Super-Cluster Performance Landscape CPLANT ~600 GF? 128 64 32 16

More information

SummitStack in the Data Center

SummitStack in the Data Center SummitStack in the Data Center Abstract: This white paper describes the challenges in the virtualized server environment and the solution Extreme Networks offers a highly virtualized, centrally manageable

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

The Bus (PCI and PCI-Express)

The Bus (PCI and PCI-Express) 4 Jan, 2008 The Bus (PCI and PCI-Express) The CPU, memory, disks, and all the other devices in a computer have to be able to communicate and exchange data. The technology that connects them is called the

More information

High Performance Computing in the Multi-core Area

High Performance Computing in the Multi-core Area High Performance Computing in the Multi-core Area Arndt Bode Technische Universität München Technology Trends for Petascale Computing Architectures: Multicore Accelerators Special Purpose Reconfigurable

More information

You re not alone if you re feeling pressure

You re not alone if you re feeling pressure How the Right Infrastructure Delivers Real SQL Database Virtualization Benefits The amount of digital data stored worldwide stood at 487 billion gigabytes as of May 2009, and data volumes are doubling

More information

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at www.frankdenneman.

All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at www.frankdenneman. WHITE PAPER All-Flash Arrays Weren t Built for Dynamic Environments. Here s Why... This whitepaper is based on content originally posted at www.frankdenneman.nl 1 Monolithic shared storage architectures

More information

7 Real Benefits of a Virtual Infrastructure

7 Real Benefits of a Virtual Infrastructure 7 Real Benefits of a Virtual Infrastructure Dell September 2007 Even the best run IT shops face challenges. Many IT organizations find themselves with under-utilized servers and storage, yet they need

More information

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN 1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

More information

Martin County Administration. Information Technology Services. Proposal For. Storage Area Network Systems. Supporting WinTel Server Consolidation

Martin County Administration. Information Technology Services. Proposal For. Storage Area Network Systems. Supporting WinTel Server Consolidation Martin County Administration Information Technology Services Proposal For Storage Area Network Systems Supporting WinTel Server Consolidation February 17, 2005 Version: DRAFT 1.4 Tim Murnane, Systems Administrator

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

Virtualizing the SAN with Software Defined Storage Networks

Virtualizing the SAN with Software Defined Storage Networks Software Defined Storage Networks Virtualizing the SAN with Software Defined Storage Networks Introduction Data Center architects continue to face many challenges as they respond to increasing demands

More information

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC The Right Data, in the Right Place, at the Right Time José Martins Storage Practice Sun Microsystems 1 Agenda Sun s strategy and commitment to the HPC or technical

More information