White Paper, April 2005

CONFIGURING DDR2 MEMORY ON DELL PLATFORMS BASED ON THE INTEL E7520 AND E7525 CHIP SETS

Greg Darnell, Lead Engineer, Dell Server and Storage Performance Analysis
Sundar Iyengar, Senior Performance Analyst, Intel Corporation

This white paper was jointly developed by Dell and Intel.

In the summer of 2004, Dell introduced new dual-processor PowerEdge servers and Dell Precision workstations equipped with Intel Xeon processors with Extended Memory 64 Technology (EM64T). The workstations are based on the Intel E7525 chip set, formerly code-named Tumwater, and most of the servers are based on the Intel E7520 chip set, formerly code-named Lindenhurst. These chip sets support the latest memory technology, Double Data Rate 2 (DDR2).¹

The new DDR2 interface enables the industry to continue to implement faster and higher-density main memory solutions. As processor performance improves at the rate of Moore's Law, DDR2 technology provides the memory subsystem performance improvements needed to balance overall system performance.

The E7520 and E7525 chip sets have flexible DDR2 memory subsystem implementation options. Dell worked closely with Intel to design server and workstation platforms that provide flexible DDR2 memory configuration options to meet a variety of customer requirements. These platforms feature dual-channel, 400-MHz DDR2 memory architectures and have six DIMM slots. Customers can choose different memory configurations to meet the capacity, performance, and other requirements of the applications that will run on the systems.

In this white paper, we provide guidance on how to configure Dell PowerEdge and Dell Precision systems based on the E7520 and E7525 chip sets. We begin with a discussion of the memory support provided by these chip sets on Dell platforms. A glossary of the memory terminology used in this paper is presented in the Memory Terminology section. We continue with memory configuration guidance, supported by performance benchmarks where helpful.

Chip Set Memory Support

The E7520 and E7525 chip sets provide two independent channels for controlling DDR2 400-MHz memory. Each channel supports up to four memory slots, for a total of eight slots, and each channel is limited to four ranks of memory, or a total of eight ranks. (See Memory Terminology for a discussion of ranks.) Dell platforms based on these chip sets support up to three DIMMs per channel and up to 16 GB of memory.

Figure 1 depicts Dell's six-slot systems based on the E7520 and E7525 chip sets.

Figure 1. Six-Slot Dell Systems: Dell Precision and PowerEdge systems based on the Intel E7520 and E7525 chip sets have two memory channels and six DIMM slots.

The availability of two independent channels is very important for good memory performance. Two channels provide twice the maximum theoretical memory bandwidth of a single-channel design and allow more independent parallel operations, which improves performance. Dell strongly recommends a minimum configuration of one DIMM per channel; in fact, Dell Precision workstations do not support a single-DIMM configuration.

1. See Memory Terminology for a discussion of memory terminology, including DDR2, DIMMs, ranks, and banks.
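To put the dual-channel figure in concrete terms, peak theoretical bandwidth follows directly from the numbers above: DDR2-400 performs 400 million transfers per second over a 64-bit (8-byte) data bus per channel. The following is a minimal sketch of that arithmetic in C; sustained bandwidth in practice will be lower than these peak figures.

    #include <stdio.h>

    int main(void)
    {
        const double transfers_per_sec  = 400e6;  /* DDR2-400: 400 MT/s */
        const double bytes_per_transfer = 8.0;    /* 64-bit data bus per channel */

        for (int channels = 1; channels <= 2; channels++) {
            double peak = transfers_per_sec * bytes_per_transfer * channels;
            printf("%d channel(s): %.1f GB/s theoretical peak\n",
                   channels, peak / 1e9);
        }
        return 0;  /* prints 3.2 GB/s for one channel, 6.4 GB/s for two */
    }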
Interleaving Support

The E7520 and E7525 chip sets support more concurrent memory activities via interleaving than previous Intel chip sets. Interleaving refers to the way that the memory banks on a DDR2 DRAM can be accessed in parallel to enhance performance. Without interleaving, memory accesses are sequential. With interleaving, when one bank is read or written, another bank in the same or a different DIMM can simultaneously be activated for another command, a third can be readied for activation, and so forth. With 1-Gb DDR2 DRAM devices, as many as six activities can be in progress simultaneously.

Adding ranks of DDR2 memory to a system increases the number of banks available for interleaving. For this reason, additional ranks in a memory configuration can improve performance on applications that benefit from the improved memory performance associated with more-efficient interleaving. For example, many applications that are run on high-performance computing clusters (HPCC) and workstation systems used for high-end graphics applications can take advantage of interleaving. The interleaving mode used can further improve performance on these types of applications.

The E7520 and E7525 chip sets operate in two different interleave modes, symmetric and asymmetric, determined by the types and quantities of DIMMs installed. Symmetric mode is used when eight identical ranks are installed in the system. This means that all eight ranks are composed of DRAM devices of the same density: either 256 Mb, 512 Mb, or 1 Gb. A common symmetric configuration is four dual-rank DIMMs. All other configurations operate in asymmetric mode. The main difference between the modes is that symmetric mode uses lower-order address bits for interleaving than asymmetric mode, thus breaking contiguous memory into smaller chunks. The performance impact of lower-order address bits can be seen in memory benchmarks that use a relatively small amount of memory, such as STREAM, which typically uses 25-50 MB. Most actual applications use larger amounts of memory and see little performance advantage. However, for certain compute-intensive scientific applications, symmetric-mode configurations can result in better performance.
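The benefit of additional banks can be illustrated with a toy simulation. The sketch below models each bank as busy for a fixed number of cycles after it is touched, so a stream of accesses stalls less often as the bank count grows. This illustrates the interleaving principle only; it is not a model of the E7520/E7525 address decoding, and the access pattern, busy time, and bank counts are arbitrary assumptions.

    #include <stdio.h>
    #include <stdlib.h>

    #define ACCESSES 1000000
    #define BUSY     8          /* assumed cycles a bank stays busy per access */

    /* Count how often a random access stream must wait for a busy bank. */
    static long stalls(int banks)
    {
        long busy_until[64] = {0};
        long now = 0, stall_count = 0;

        for (long i = 0; i < ACCESSES; i++, now++) {
            int b = rand() % banks;        /* bank touched by this access */
            if (busy_until[b] > now) {     /* bank still busy: stall until free */
                stall_count++;
                now = busy_until[b];
            }
            busy_until[b] = now + BUSY;
        }
        return stall_count;
    }

    int main(void)
    {
        /* With 4-bank DRAMs, one rank exposes 4 banks, four ranks 16, and so on. */
        for (int banks = 4; banks <= 64; banks *= 2)
            printf("%2d banks: %ld stalled accesses\n", banks, stalls(banks));
        return 0;
    }

Doubling the number of banks roughly halves the chance that two nearby accesses collide on the same bank, which is the effect the rank-count benchmarks later in this paper measure on real hardware.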
Memory Terminology

Platforms based on the E7520 and E7525 chip sets use single-rank and dual-rank DDR2-400 registered ECC DIMMs built from x4 and x8 DRAM parts. Here, we briefly discuss these and other terms.

Double Data Rate (DDR): Memory interface technology with a data transfer rate that is twice the clock rate. DDR2 400 MHz is the latest generation of DDR technology and provides a number of evolutionary improvements over DDR. Because of its potential for increased operating frequency in high-capacity systems, DDR2 400 MHz provides higher bandwidth than DDR 266 MHz and DDR 333 MHz. Despite its higher frequency, DDR2 operates with lower I/O and core voltages (1.8 volts instead of 2.5), providing significant power savings.

Dual Inline Memory Module (DIMM): Memory module found in most server, workstation, and desktop computers. DIMMs are available in various sizes such as 256 MB, 512 MB, 1 GB, and 2 GB.

Dynamic RAM (DRAM) Device: The basic building block of a DIMM. Each DIMM is composed of DRAM devices of a particular memory density. Currently available memory densities are 256 megabits (Mb), 512 Mb, and 1 gigabit (Gb). Each DRAM device typically used for server and workstation DIMMs may be x4 or x8, depending on its number of data outputs: a x4 DRAM device has four data outputs and a x8 has eight. A number of DRAM devices are connected to provide the desired DIMM capacity. For instance, a 512-MB DIMM can be built with eight x8, 512-Mb DRAM devices, providing 512 MB of total capacity and 64 data outputs. An additional x8 DRAM device can be added to support error correcting code (ECC), for a total of 72 data outputs. All E7520 and E7525 platforms support ECC memory.

ECC: Refers to a set of extra bits used to detect and correct memory errors. The memory bus operates at a width of 64 bits, with an additional 8 bits of ECC, for a total of 72 bits. ECC also refers to the 72-bit width of the word provided by a DIMM that supports ECC.

Ranks: A set of DRAM devices on a DIMM that provides eight bytes (64 bits) of data outputs, or nine bytes (72 bits) in ECC implementations. All of the DRAM devices in a rank are tied to a chip-select signal. Using x4 DRAM devices, a rank of ECC memory is composed of 72/4, or 18, DRAMs. Similarly, using x8 DRAM devices, a rank is composed of only 72/8, or 9, DRAMs. A DIMM module may contain one or two ranks. For example, a 512-MB ECC DIMM built with nine x8, 512-Mb DRAM devices is a single-rank module; usually, these DRAM devices are mounted on one side of the DIMM. If the DIMM contains nine additional x8, 512-Mb DRAM devices tied to a second chip-select signal, the DIMM is dual-rank with a total capacity of 1 GB. The DRAM devices that make up the second rank are usually mounted on the other side of the DIMM. Thirty-six x4, 1-Gb DRAM devices are required to construct a 4-GB, dual-rank ECC DIMM. This DIMM is called a stacked DIMM because devices are often, but not always, physically stacked on top of each other, each set forming two ranks.

Banks: Each DRAM device on a DIMM is organized into a number of banks that can be accessed simultaneously, thus increasing performance. DDR DRAMs have four banks. Depending on density, DDR2 DRAMs have four or eight banks.

Registered and Unbuffered DIMMs: For electrical loading reasons, when a platform supports a large number of DIMMs, the clock and address signals are buffered within each DIMM to strengthen the signal. This type of DIMM is called a registered (or buffered) DIMM. Platforms based on the E7520 and E7525 chip sets require registered DIMMs.

[Figure: A single-rank 512-MB DIMM composed of eight x8, 512-Mb DRAMs, yielding 8 x 8 = 64 data outputs. This set of DRAMs provides 64 bits of data output and is referred to as a rank of memory. A single rank of ECC memory would contain an additional x8, 512-Mb DRAM to provide 8 more bits for error correction. Each DRAM is organized into four banks that can be accessed simultaneously.]
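The rank arithmetic in this glossary reduces to a few divisions, sketched below with the example values used above (a 72-bit ECC word and 512-Mb devices). The variable names are ours, for illustration only.

    #include <stdio.h>

    int main(void)
    {
        const int word_bits  = 72;      /* 64 data bits + 8 ECC bits per rank */
        const int density_mb = 512;     /* 512-Mb DRAM devices, as in the example */
        const int widths[]   = {4, 8};  /* x4 and x8 data outputs per device */

        for (int i = 0; i < 2; i++) {
            int w            = widths[i];
            int devices      = word_bits / w;  /* DRAMs in one ECC rank */
            int data_devices = 64 / w;         /* devices holding data, not ECC */
            int rank_mbytes  = data_devices * density_mb / 8;  /* Mb -> MB */

            printf("x%d: %2d devices per ECC rank, %4d MB of data per rank\n",
                   w, devices, rank_mbytes);
        }
        return 0;  /* x4: 18 devices, 1024 MB; x8: 9 devices, 512 MB */
    }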
Reliability Features

The E7520 and E7525 chip sets support three additional memory configuration options for server environments: chipfail, memory mirroring, and DIMM sparing. Chipfail allows continued operation in the presence of a DRAM chip failure; this feature requires DIMMs composed of x4 DRAM chips and the use of both memory channels. Memory mirroring maintains two copies of all data in the memory subsystem, one on each channel. DIMM sparing holds one DIMM per channel in reserve, to be brought online if another DIMM in the channel becomes defective. DIMM sparing and memory mirroring are mutually exclusive.

In the following sections, we provide memory configuration guidance for Dell PowerEdge servers and Dell Precision workstations based on the Intel E7520 and E7525 chip sets. We begin with the top considerations when configuring for best performance.

Configuring DDR2 Memory for Best Performance

The memory subsystem can affect overall system performance in different ways. The three main attributes of the memory subsystem that affect performance are:

Memory size
Memory bandwidth or throughput
Memory latency or response time

Memory Size

Total memory size usually has the most dramatic effect on workload performance. If an application and its data do not fit in memory, disk I/O will be required. Disk accesses take several orders of magnitude longer than memory accesses and will have a severe negative impact on overall performance. When configuring memory, the primary consideration should be to provide sufficient memory for the application environment. Once this decision is made, the guidelines in the following sections can be used to optimize for memory bandwidth and latency.

Memory Bandwidth and Latency

Memory bandwidth is the amount of data moved through the memory system over a certain period of time. In this paper, we focus mainly on the bandwidth between system memory and the processors. In some applications, the bandwidth between system memory and the I/O subsystem can be more critical; however, the optimizations presented in this paper to improve bandwidth to the processors also improve bandwidth to the I/O subsystem. In contrast, memory latency refers to the average time, as seen by the processor, for each memory read, which includes the time required to transfer data between the processor and memory and the delays spent waiting in any queues in between. In most systems, as memory bandwidth utilization rises, latency increases. However, when more parallelism is available in the memory subsystem, more memory accesses can be performed simultaneously, and latency is less affected. For applications that move large amounts of data between the processor and memory, optimizing for bandwidth is more important than optimizing for latency.
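One common way to reason about the interaction of bandwidth, latency, and parallelism described above is Little's Law: sustained bandwidth is roughly the number of outstanding requests times the transfer size, divided by the latency of each request. The sketch below uses hypothetical figures, not measurements from these platforms, to show why a subsystem that supports more simultaneous accesses sustains more bandwidth at the same latency.

    #include <stdio.h>

    int main(void)
    {
        const double line_bytes = 64.0;   /* one cache-line transfer */
        const double latency_ns = 100.0;  /* assumed loaded read latency */

        for (int outstanding = 1; outstanding <= 16; outstanding *= 2) {
            /* Little's Law: throughput = concurrency x transfer size / latency */
            double gbs = outstanding * line_bytes / latency_ns; /* bytes/ns = GB/s */
            printf("%2d outstanding reads: %5.2f GB/s sustained\n",
                   outstanding, gbs);
        }
        return 0;
    }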
The most effective step that customers can take to increase bandwidth is to choose dual-channel memory configurations. The E7520 and E7525 chip sets provide two independent channels for controlling DDR2 400-MHz memory, and populating both channels with at least one DIMM doubles the theoretical memory bandwidth. The associated performance increase can be significant, as shown in the STREAM microbenchmark results presented in Figure 2 in the next section. However, small random accesses are more common in most business applications. In these cases, improving memory latency may have a more significant positive effect on system performance if the bandwidth and latency provided by the system design are insufficient to meet an application's needs.
Latency can be reduced by increasing parallelism through dual-channel configurations. It can also be reduced through interleaving efficiencies, which are discussed in Performance Benchmarks.

Once you have chosen the amount of memory and configured it in a dual-channel configuration, the next considerations depend on the demands of your application environment. Table 1 summarizes Dell's recommended memory configuration priorities for Dell Precision workstations and for general-purpose and HPCC PowerEdge servers.

Priority   General-Purpose Servers              HPCC Servers                         Workstations
1          Sufficient memory for applications   Sufficient memory for applications   Sufficient memory for applications
2          Two channels                         Two channels                         Two channels
3          Reliability features (chipfail,      Total ranks                          Expandability
           mirroring, or DIMM sparing)
4          Expandability                        Expandability                        Total ranks
5          Total ranks

Table 1. Memory Configuration Priorities by Platform

The top two priorities for all three platforms are to configure the system with sufficient memory for the applications being run and to populate both memory channels. The third priority varies by platform. For general-purpose servers, reliability features (chipfail, mirroring, and DIMM sparing) are extremely important. In contrast, workstation customers are usually more concerned with having adequate memory expansion capability to meet the increasing demands of workstation-class applications. For these customers, it is important to initially populate the memory slots appropriately so that there are open slots for expansion. In addition, because the chip sets support a maximum of eight ranks of memory (four per channel), the initial configuration must not exceed six ranks of memory, or three ranks per channel.² This configuration allows two single-rank DIMMs to be added in the future.

Finally, there are cost considerations when configuring memory. For instance, dual-rank DIMMs of a particular capacity, such as 2 GB, are usually less expensive than a single-rank DIMM of the same capacity.³ For this reason, it is advisable to use dual-rank DIMMs when feasible. See Configuring Dell Precision Workstations and PowerEdge Servers later in this paper for a more detailed discussion of single- vs. dual-rank memory configuration options for Dell systems based on the E7520 and E7525 chip sets.

The third priority for HPCC servers differs from the other platforms. It can be important for HPCC applications to maximize the number of ranks of memory installed. As mentioned earlier, increasing the number of ranks also increases the number of banks in the memory configuration. As shown in the following section, this can result in more-efficient interleaving and, thus, better memory performance for HPCC applications. Interleaving efficiencies can also benefit high-end workstation applications that rely heavily on floating-point operations.

2. Most platforms based on the E7520/E7525 chip sets require that the slots be populated in pairs: for each DIMM in Channel A, there must be a corresponding DIMM in Channel B. The exception to this rule is platforms that are specially designed to support a single DIMM.
3. The reason for this is that dual-rank DIMMs are usually composed of lower-density DRAMs, which are less expensive than the higher-density DRAMs used on single-rank DIMMs of the same capacity.

Performance Benchmarks

In this section, we present microbenchmark results produced by the Dell performance labs. We begin with a set of microbenchmarks that measures the performance of the memory subsystem only. The benchmark results suggest that there can be performance benefits associated with increasing the number of ranks in an HPCC server or a high-end workstation memory configuration.
There can also be performance benefits associated with symmetric-mode memory configurations. In addition, we present server and workstation application benchmarks. In each benchmark, the minimum memory configuration exceeds the memory requirements of the benchmark so that the performance impacts of bandwidth and latency optimizations can be isolated.
Microbenchmarks

Memory-intensive microbenchmarks, in some cases designed specifically to measure the characteristics of a memory subsystem, can be used to find the optimal configurations for best memory bandwidth and latency. Three popular benchmarks that are sensitive to memory characteristics are STREAM, LMbench, and SPEC CPU2000. STREAM is designed to measure sustainable memory bandwidth as seen in very large, vector-style scientific applications. LMbench measures a wide variety of system performance characteristics; a derivative of the LMbench memory latency subtest, mmem, is useful for measuring memory latency on Windows-based systems. SPEC CPU2000 is a benchmark created by the Standard Performance Evaluation Corporation (SPEC) to compare compute-intensive workloads on different computer systems.

Both STREAM and SPEC CPU2000 include a considerable number of sequential memory accesses, for which efficient memory interleaving is critical for best bandwidth. The STREAM and SPEC CPU benchmarks are designed to simulate computing workloads similar to HPCC workloads. The LMbench latency test performs pointer chasing, that is, jumping from place to place in memory while transferring only a small amount of data. In this test, the speed of an individual memory access is critical and memory bandwidth is much less important.

STREAM Benchmark Results

The STREAM benchmark is composed of four sub-benchmarks: copy, scale, add, and triad. We focus on the most commonly quoted single STREAM metric, triad. This metric measures the performance of a vector operation common to scientific and other applications that perform intensive matrix math.

Figure 2 shows triad performance under a variety of memory configurations. The chart shows a clear progression of performance increases as the memory subsystem becomes more and more optimized. Starting with the top performance bar, the first and largest increase comes from the change from one to two memory channels. The following performance bars show that performance increases due to interleaving efficiencies as the number of DIMM ranks increases. The final two eight-rank configurations are symmetric (eight identical ranks) and show a measurable improvement over the eight-rank asymmetric configuration.

Figure 2. Relative STREAM Triad Performance Across Varying Numbers of Ranks (normalized results; higher is better)
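For reference, the triad operation measured in Figure 2 is a single vector loop. The sketch below shows only the core of the kernel; the official STREAM benchmark adds timing, repetition, and result verification, and the array size here (three 16-MB arrays, about 48 MB total) simply mirrors the 25-50 MB working set noted earlier.

    #define N 2000000                 /* 2M doubles per array: 16 MB each */

    static double a[N], b[N], c[N];

    /* STREAM triad: a(i) = b(i) + q * c(i), a bandwidth-bound vector update */
    void triad(double q)
    {
        for (long i = 0; i < N; i++)
            a[i] = b[i] + q * c[i];
    }

Each iteration reads two doubles and writes one, so the reported rate is dominated by how fast the memory subsystem can stream the three arrays, which is why this benchmark responds so strongly to channel count and interleaving.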
SPEC CPU2000 Benchmark Results

The SPEC CPU2000 benchmarks also show sensitivity to memory bandwidth. The rate benchmarks of CPU2000 measure the throughput of a system by running multiple copies of integer and floating-point workloads simultaneously. Figure 3 presents the benchmark results. A dual-processor system was used for this measurement, and each processor executed one copy of the workload. Although the STREAM results show an almost 20 percent difference between four- and eight-rank configurations, the maximum difference in CPU2000 floating-point rate between four and eight ranks is about 15 percent. The integer-rate benchmark is usually less sensitive to memory because most of its component workloads fit in the processor cache and do not access system memory frequently. As a result, the maximum difference between four- and eight-rank configurations is just over 5 percent.

Figure 3. Relative Impact of Memory Organization on Dual-Processor System SPECrate 2000 Benchmark Results

LMbench Benchmark Results

Mmem latency measurements clearly show the benefits of the additional parallelism that extra ranks can provide. When running alone, mmem measures unloaded memory latency, which refers to the delay when there is minimal load on the memory subsystem. As Figure 4 shows, unloaded latency does not vary as the number of DIMM ranks increases. Because mmem issues a single-threaded stream of reads, there is no benefit to parallelism in this case. However, when STREAM is run simultaneously with mmem to add a heavy load to the memory subsystem, the extra parallelism of multiple ranks results in significantly reduced latency. Latency in the eight-rank configuration increases 50 percent under load, while the four-rank configuration increases 75 percent under load and the two-rank, single-channel configuration shows an increase of 125 percent under load.

Figure 4. Loaded vs. Unloaded Latency: LMbench Benchmark Results
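The pointer-chasing approach used by the latency test can be sketched in a few lines: each load's address depends on the result of the previous load, so the processor cannot overlap the accesses, and the average time per step approximates memory read latency. This is a simplified illustration; the real LMbench test also randomizes the chain to defeat hardware prefetching, which the fixed stride below does not.

    #include <stdlib.h>

    /* Build a chain of pointers, then walk it. Every dereference depends on
       the previous one, so the walk is serialized and latency-bound. */
    void *chase(size_t entries, size_t steps)
    {
        void **ring = malloc(entries * sizeof *ring);
        for (size_t i = 0; i < entries; i++)
            ring[i] = &ring[(i + 8) % entries];  /* fixed stride for brevity */

        void **p = ring;
        while (steps--)
            p = (void **)*p;   /* one dependent load per step */

        return p;  /* return the pointer so the loop is not optimized away */
    }

Timing many steps of this walk while STREAM runs in another process is, in outline, how a loaded-latency comparison like the one above can be reproduced.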
Microbenchmark Result Conclusions

These microbenchmark results suggest that there can be HPCC performance benefits associated with increasing the number of ranks and, thus, interleaving efficiencies. A small number of HPCC applications may be able to take advantage of the even-higher interleaving efficiency achieved under symmetric-mode configurations. In addition, the floating-point results in the SPEC CPU2000 benchmark suggest that workstations can benefit from increasing the number of ranks when running high-end applications that rely heavily on floating-point operations.

The workloads in these microbenchmarks are not typical of the memory workload presented by mainstream server and workstation applications. In the next section, we explore the effect of increasing the number of ranks on mainstream server and workstation application benchmarks.

Server Workloads

Memory accesses in typical server applications are much more likely to be random than sequential. For this reason, they do not benefit as much from more-efficient interleaving. Figure 5 shows the effect of different rank configurations on a transaction-processing database workload while holding memory size constant. Here we see very little difference between configurations: only a 2 percent difference in transaction rate between two and eight ranks.

Figure 5. Relative Performance Impact of Memory Organization on a Database Server Benchmark

Figure 6 shows the relative performance of different numbers of ranks in a Web server benchmark environment. Here, the results for two, four, and six ranks are virtually identical, while the eight-rank symmetric configuration provides an extra 4 percent of throughput.

Figure 6. Relative Web Server Throughput

These benchmark scenarios are much more intensive than standard customer usage, typically involving sustained CPU utilization of 100 percent. Thus, benchmark tests demonstrate the effects of the memory change to a greater degree than will be seen in a customer environment. These results suggest that there is little benefit in additional ranks and symmetric-mode configurations for mainstream server applications.

Workstation Workloads

The SPEC Application Performance Characterization Group (SPECapc) was formed to address graphics performance evaluation based on actual software applications. Benchmarks produced by SPECapc are ideal for demonstrating typical workstation application performance. SPECapc for 3ds max 6 and SPECapc for Maya 5.0 were chosen as the benchmarks likely to show the least and the most effect from memory performance. The results are presented in Figures 7 and 8. As the charts show, 3ds max 6 shows a maximum of 1 percent difference between two and eight ranks, while Maya 5.0 shows a significant 4 percent difference from the smallest to the largest number of ranks.

Figure 7. SPECapc for 3ds max 6

Figure 8. SPECapc for Maya 5.0

These results suggest that there is little perceptible performance benefit for typical workstation applications associated with increasing the number of ranks or with symmetric-mode configurations. However, the microbenchmark results presented in Figure 3 suggest that there can be performance benefits for memory-intensive applications such as Nastran and Ansys, and for high-end 3D workstation applications such as Dassault Catia.
Summary

The primary consideration when configuring DDR2 memory in Dell platforms based on the E7520/E7525 chip sets is to ensure that the system has a sufficient quantity of memory for the expected workload. Systems that have insufficient memory will rely more on paging less-used memory areas to and from a hard drive, and the performance penalties of paging far outweigh any benefits to be gained from optimizing memory bandwidth or latency.

Dell also strongly recommends that a minimum of two DIMMs be installed, one per memory channel, so that applications can take advantage of dual memory channels. This configuration is also required to enable the chipfail feature. Although entry-level, single-DIMM configurations are supported on PowerEdge servers, most applications will show better performance with memory configurations that use both channels.

The next considerations for memory configuration depend on the application environment and system type. The benchmark results in this paper show that many HPCC applications benefit greatly from additional memory performance. In these environments, increasing the number of ranks should take precedence over other considerations. This should also be a priority for certain high-end workstation applications, where interleaving may improve floating-point performance. On the other hand, the performance of most general-purpose server and workstation applications is not significantly enhanced by increasing the number of ranks. Instead, other considerations are more important, such as the ability to upgrade system memory in the future and, in the case of servers, reliability features such as chipfail, mirroring, and DIMM sparing.

Configuring Dell Precision Workstations and PowerEdge Servers

The memory subsystems of Dell server and workstation systems based on the Intel E7520/E7525 chip sets are carefully architected to strike a balance between performance, upgradability, cost, and system size. Dell found the sweet spot to be six memory slots, which can be populated in a variety of ways to meet customer requirements, but which enable a small chassis size and competitive system pricing.

Figure 9 shows how a Dell six-slot system can be configured with 8 GB of memory to meet two different customer requirements. If future expansion is not required, the six DIMM slots can be populated with 8 GB of memory less expensively than a four-slot system. Figure 9 also shows how the DIMM slots can be populated to allow for future expansion. Table 2 presents the options for populating Dell's DDR2 memory slots (1A, 1B, 2A, 2B, 3A, and 3B) with single- and dual-rank memory.

Table 2. Configuration Options for Dell Precision and PowerEdge Systems Based on Intel E7520/E7525 Chip Sets (columns: Slot 1A, Slot 1B, Slot 2A, Slot 2B, Slot 3A, Slot 3B)

NOTE: The single-channel configuration shown in Table 2 is an entry-level configuration on PowerEdge servers that is not supported on Dell Precision workstations.
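The population rules discussed in this paper, DIMMs installed in matched pairs across the two channels and at most four ranks per channel, lend themselves to a quick sanity check. The sketch below encodes only the constraints stated here; model-specific rules may be stricter, so treat it as illustrative.

    #include <stdio.h>

    /* ranks[i] = ranks per DIMM in slot pair i (pairs 1A/1B, 2A/2B, 3A/3B);
       0 means the pair is empty. Each pair puts one identical DIMM on each
       channel, so ranks per channel is the sum over the pairs. */
    static int config_ok(const int ranks[3])
    {
        int per_channel = 0;
        for (int i = 0; i < 3; i++) {
            if (ranks[i] < 0 || ranks[i] > 2)
                return 0;              /* DIMMs are single- or dual-rank */
            per_channel += ranks[i];
        }
        return per_channel <= 4;       /* chip set limit: 4 ranks per channel */
    }

    int main(void)
    {
        int expandable[3] = {2, 1, 0}; /* 3 ranks/channel, one pair left open */
        int overloaded[3] = {2, 2, 1}; /* 5 ranks/channel: exceeds the limit */

        printf("expandable: %s\n", config_ok(expandable) ? "ok" : "invalid");
        printf("overloaded: %s\n", config_ok(overloaded) ? "ok" : "invalid");
        return 0;
    }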
Figure 9. Configuring a Six-Slot Dell System

Four-slot comparison: A four-slot system is maxed out at 8 GB of memory using four 2-GB DIMMs, which are the highest-capacity DIMMs commonly available today.

Lower-cost configuration: If future expansion is not a priority, the Dell six-slot system can be configured with 8 GB of memory more economically than the four-slot system by replacing two 2-GB DIMMs with less-expensive 1-GB DIMMs.

Future expansion: If future expansion is required, the Dell six-slot system can be configured with a mix of dual- and single-rank 2-GB DIMMs, leaving two slots available for future expansion. The configuration shown here uses six of the eight total ranks supported by the chip set, leaving two ranks for expansion. Alternatively, the expansion slots could be equipped with two single-rank 2-GB DIMMs for a maximum configuration of 12 GB of memory.

For More Information

For more information, see the following:

Introducing DDR2 Memory in Eighth-Generation Dell PowerEdge Servers for Improved Performance, Dell Power Solutions, October 2004: www.dell.com/downloads/global/power/ps4q04-20040168-radhakrishnan.pdf

Intel white paper, New DDR2 Memory Offers Advantages for Dual-Processor Servers, September 2004: www.intel.com/update/departments/standards/st09042.pdf

Technical data on Intel E7520/E7525 chip sets: http://developer.intel.com/design/chipsets/index.htm

Intel DDR2 website: www.intel.com/technology/memory
Appendix

The following is system configuration information for each benchmark presented in this paper.

Figure 2, STREAM Triad Benchmark: PowerEdge 2800, dual 3.6-GHz Intel Xeon processors with 1-MB L2 cache, processor version D0, BIOS X23, Hyper-Threading Technology (HT) off, Sequential Memory Access on, Microsoft Windows Server 2003 (32-bit with PAE enabled).

Figure 3, SPEC CPU2000 Benchmarks: PowerEdge 2800, dual 3.6-GHz Intel Xeon processors with 1-MB L2 cache, HT off, Sequential Memory Access on; Red Hat Enterprise Linux, version 3, update 2, for EM64T with Intel 8.1 EM64T compilers.

Figure 4, LMbench Benchmark: PowerEdge 2800, single 3.6-GHz Intel Xeon processor with 1-MB L2 cache, BIOS X23, HT off, Sequential Memory Access on, Microsoft Windows Server 2003.

Figure 5, Database Server Benchmark: PowerEdge 2850, single 3.6-GHz Intel Xeon processor with 1-MB L2 cache, BIOS X23, HT on, Sequential Memory Access on, Microsoft Windows Server 2003, SQL Server 2000.

Figure 6, Web Server Benchmark: PowerEdge 1850, dual 3.6-GHz Intel Xeon processors, HT on, version X23 BIOS, SUSE Linux Enterprise Server 9, Zeus Web Server 4.2r4.

Figures 7 and 8, Workstation Benchmarks: Dell Precision Workstation 670; dual 3.6-GHz Intel Xeon processors with 1-MB L2 cache and 800-MHz system bus, processor version D0; Windows XP Professional, SP1; HT off; 2 GB of DDR2-400 registered ECC memory; NVIDIA Quadro FX 3400 with video driver version 65.62; Western Digital 160-GB Serial ATA 7200-rpm hard drive; version A02 BIOS.

Disclaimer

The performance tests and results in this document were measured using specific configurations in the Dell performance labs as of November 2004. Actual performance results will vary based on the user's system configuration and applications; these results should be used as a relative indicator only. Buyers should consider their own usage and consult sources of performance data such as the Transaction Processing Performance Council at www.tpc.org, SPEC at www.spec.org, and BAPCo at www.bapco.com. For the latest SPEC benchmark results, visit www.spec.org.

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

© 2005 Dell Inc. All rights reserved. Trademarks used in this text: Dell, the DELL logo, Dell Precision, and PowerEdge are trademarks of Dell Inc.; Intel is a registered trademark and Xeon is a trademark of Intel Corporation; Microsoft and Windows are registered trademarks of Microsoft Corporation; Red Hat is a registered trademark of Red Hat, Inc.; SPEC and SPECrate are registered trademarks and SPECapc is a service mark of the Standard Performance Evaluation Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.