AMD Opteron Quad-Core: A Brief Overview
Daniele Magliozzi, Politecnico di Milano
Opteron Memory Architecture
Native quad-core design: four cores on a single die for more efficient data sharing.
Enhanced cache structure.
Integrated memory controller.
Together these sustain multi-threaded application throughput, fitting the needs of modern servers and workstations.
3 levels of dedicated & shared cache
Four different caches accelerate instruction execution and data processing.
L1 Instruction Cache: 64 Kbyte, 2-way set-associative, 64-byte line length, LRU replacement; used for instruction loads, instruction prefetching, instruction predecoding, and branch prediction.
L1 Data Cache: 64 Kbyte, 2-way set-associative, write-allocate and write-back with LRU replacement, divided into eight banks (16 bytes wide), with a prefetcher and a 3-cycle load-to-use latency.
L2 Cache: contains only victim or copy-back blocks from L1.
L3 Cache: dynamically shared, non-inclusive victim cache, with blocks allocated on L2 victims/copy-backs. A hit in L3 can either leave the data there (for data accessed by multiple cores) or remove the data from L3 and place it solely in L1 (for data accessed by a single core).
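The L1 figures above fix the cache geometry. As a minimal sketch, assuming the textbook tag/index/offset split (the slides do not say which address bits the hardware actually uses for indexing), the number of sets and the address fields can be computed as follows. The function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, ways, and line length are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# L1 parameters from the slide: 64 Kbyte, 2-way, 64-byte lines.
sets, offset_bits, index_bits = cache_geometry(64 * 1024, 2, 64)
print(sets, offset_bits, index_bits)                      # 512 6 9
print(split_address(0x12345, offset_bits, index_bits))    # (2, 141, 5)
```

With 512 sets of two lines each, two addresses that share the same 9 index bits compete for the same pair of lines, which is where the LRU policy comes into play.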
DDR2 SDRAM with integrated memory controller
SDRAM: stores data in memory cells that are driven by a clock signal, synchronizing their operation with an external data bus.
DDR2 SDRAM (second-generation double data rate synchronous dynamic random access memory): data is transferred on both the rising and falling edges of the clock (a technique called "double pumping").
Improvement: for the same memory-cell clock, the external data bus runs at twice the clock rate of its predecessor (DDR), giving twice the bandwidth.
Memory Controller: integrated on-die; it manages the flow of data to and from main memory, optimizing memory performance and bandwidth per CPU and reducing the latency inherent in front-side bus architectures.
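The bandwidth claim can be checked with simple arithmetic. The sketch below assumes a 64-bit module data bus and uses DDR-400 and DDR2-800 as illustrative speed grades; neither is named in the slides. Both grades run their memory cells at 200 MHz, but DDR2 doubles the external bus clock.

```python
def peak_bandwidth_mb_s(bus_clock_mhz, bus_width_bits=64, transfers_per_clock=2):
    """Peak transfer rate in MB/s for a double-pumped data bus.

    bus_clock_mhz is the external bus clock; two transfers happen per
    clock cycle (rising and falling edge).
    """
    mega_transfers_per_s = bus_clock_mhz * transfers_per_clock
    return mega_transfers_per_s * bus_width_bits // 8


ddr_400 = peak_bandwidth_mb_s(200)    # DDR:  200 MHz bus clock
ddr2_800 = peak_bandwidth_mb_s(400)   # DDR2: 400 MHz bus clock, same cell clock
print(ddr_400, ddr2_800)              # 3200 6400
```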
Direct Connect Architecture
The front-side bus is eliminated; the cores are connected directly to the memory controller, to the I/O subsystem, and to other processors through high-bandwidth HyperTransport links.
This improves overall system performance and efficiency by eliminating the bottlenecks inherent in legacy front-side bus architectures.
HyperTransport Technology
High-speed, low-latency, point-to-point, unidirectional links between two devices, capable of extremely fast signaling (up to 800 MHz clock speed) and compatible with the PCI interface.
Packetized bus: addresses, data, and commands are sent along the same wires, allowing narrower links that are easier to route.
HT System: a processor with a HyperTransport port (called the HyperTransport host), the HyperTransport bus, and any I/O channels connected to it.
Differential signaling (employed by the links): two wires are used for each signal, and the received value is the difference between the two. It does not suffer from the problems associated with the single-ended signaling of high-speed parallel buses (bouncing signals, interference, cross-talk).
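Why differential signaling resists interference can be shown with a toy model. This is only an illustrative sketch, not the HyperTransport electrical specification: voltages, noise values, and function names are made up, and the noise is assumed to hit both wires of a pair equally.

```python
def transmit(bits, swing=1.0):
    """Encode each bit as a (positive wire, negative wire) voltage pair."""
    return [(swing, -swing) if b else (-swing, swing) for b in bits]


def add_common_noise(pairs, noise):
    """Add the same disturbance to both wires of each pair."""
    return [(p + n, m + n) for (p, m), n in zip(pairs, noise)]


def receive_differential(pairs):
    """Decode from the difference of the two wires; common noise cancels."""
    return [1 if (p - m) > 0 else 0 for p, m in pairs]


def receive_single_ended(pairs, threshold=0.0):
    """Decode from one wire against a fixed threshold, for comparison."""
    return [1 if p > threshold else 0 for p, _ in pairs]


bits = [1, 0, 1, 1, 0]
noise = [2.5, 2.5, -3.0, 0.4, 1.7]   # hypothetical interference per bit time
noisy = add_common_noise(transmit(bits), noise)

print(receive_differential(noisy))   # [1, 0, 1, 1, 0]  (matches the input)
print(receive_single_ended(noisy))   # [1, 1, 0, 1, 1]  (corrupted)
```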
HyperTransport Technology (Switch Topology)
HyperTransport supports multiple connection topologies: daisy chain, switch, star.
Switch Topology: the host communicates directly with the switch chip, which in turn manages multiple independent slaves, including tunnels, bridges, and end-device chips (in effect, parallel daisy chains).
Each port on the switch benefits from the full bandwidth of the HyperTransport I/O link because the switch directs the flow of electrical signals between the slave devices connected to it.
AMD Virtualization
To allow multiple operating systems to run on the same physical platform, a software platform layer (the hypervisor) decouples the operating system from the underlying hardware. It also acts as a translation layer for guest virtual addresses. It can operate in two ways:
Software: the hypervisor modifies the guest source code so that the guest cooperates with it, or controls the guest's privileged operations at run time.
Hardware-assisted virtualization: the hypervisor uses a set of processor extensions (e.g. AMD-V) to intercept and emulate guest privileged operations.
With AMD-V, the hypervisor specifies how the processor should handle privileged operations within the guest itself, without transferring control to the hypervisor. This makes switching between virtual machines more efficient, helping improve performance, and effectively isolates virtual machines for secure operation.
Rapid Virtualization Indexing (RVI)
With paging enabled, the operating system defines a set of page tables, which the page walker (implemented in processor hardware) uses to translate linear addresses to physical addresses.
Guest page table (gPT): under virtualization, another level of translation is needed. The hypervisor can manage it in software (with a shadow page table) or in hardware.
Nested page tables (nPT): set up by the hypervisor and used by the page walker, which handles the second level of translation itself. This reduces the overheads found in equivalent shadow-paging implementations, and recent translations are stored in an internal translation look-aside buffer (TLB).
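The two levels of translation can be sketched with a toy model. This is a strong simplification under stated assumptions: each page table is a flat dictionary rather than a multi-level structure, a 4 Kbyte page size is assumed, and the class and attribute names are hypothetical. In real hardware the guest table accesses themselves also go through the nested tables, which this sketch ignores.

```python
PAGE_SIZE = 4096  # assumed page size


class NestedWalker:
    """Toy page walker: guest virtual -> guest physical -> host physical."""

    def __init__(self, guest_pt, nested_pt):
        self.guest_pt = guest_pt    # guest virtual page -> guest physical page
        self.nested_pt = nested_pt  # guest physical page -> host physical page
        self.tlb = {}               # guest virtual page -> host physical page
        self.walks = 0              # number of full two-level walks performed

    def translate(self, guest_virtual_addr):
        vpn, offset = divmod(guest_virtual_addr, PAGE_SIZE)
        if vpn not in self.tlb:
            # TLB miss: walk both levels, then cache the combined result.
            self.walks += 1
            guest_physical_page = self.guest_pt[vpn]
            self.tlb[vpn] = self.nested_pt[guest_physical_page]
        return self.tlb[vpn] * PAGE_SIZE + offset


walker = NestedWalker(guest_pt={0: 5, 1: 7}, nested_pt={5: 42, 7: 13})
print(hex(walker.translate(0x1010)))  # 0xd010
print(hex(walker.translate(0x1020)))  # 0xd020, served from the TLB
print(walker.walks)                   # 1
```

The second lookup hits the cached translation, so no further walk is needed; this is the role the slide assigns to the TLB.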
Power Performance
Enhanced AMD PowerNow! with Independent Dynamic Core: allows processors and cores to operate at various voltages and frequencies.
AMD CoolCore Technology: reduces processor energy consumption by turning off unused parts of the processor.
AMD Smart Fetch Technology: allows a core to enter a "halt" state and draw less power, reducing CPU power consumption.
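The benefit of lowering both voltage and frequency can be estimated with the standard first-order CMOS approximation, in which dynamic power is proportional to C * V^2 * f. That formula and the scaling factors below are assumptions for illustration; they are not taken from the slides or from AMD data.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to the nominal operating point.

    Uses the first-order model P ~ C * V^2 * f with capacitance unchanged.
    """
    return voltage_scale ** 2 * frequency_scale


# Hypothetical low-power state: 80% of nominal voltage, 50% of nominal frequency.
print(round(relative_dynamic_power(0.8, 0.5), 2))  # 0.32
```

Because voltage enters squared, a modest voltage reduction saves more power than the same relative reduction in frequency alone.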
Opteron Quad-Core: 3rd-generation optimizations
1. Load-Execute Instructions (for floating-point or integer operands)
2. Write-Combining (merging multiple memory-write cycles in a 64-byte buffer)
3. Branches That Depend on Random Data (avoid conditional branches on random data)
4. Loop Unrolling
5. Pointer Arithmetic in Loops (using the loop counter as an index into memory arrays)
6. Explicit Load Instructions
7. Reuse of Dead Registers
8. ccNUMA (cache-coherent non-uniform memory access)
9. Prefetch and Streaming Instructions
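As an illustration of item 4, the sketch below shows the shape of a loop unrolled by four, with a single loop counter used as the array index (as in item 5). These optimizations are normally applied in C or assembly; Python is used here only to show the structure of the transformation, and no speed-up should be expected from it. The function names are hypothetical.

```python
def dot(a, b):
    """Reference dot product: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Dot product unrolled by four with independent partial sums."""
    n = len(a)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    s0 = s1 = s2 = s3 = 0.0
    i = 0
    while i < limit:
        # Four independent accumulators per iteration: fewer loop-control
        # instructions and no dependency chain through a single sum.
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    for j in range(limit, n):  # remainder loop for the last n % 4 elements
        s0 += a[j] * b[j]
    return (s0 + s1) + (s2 + s3)


a = [float(x) for x in range(1, 11)]
b = [2.0] * 10
print(dot(a, b), dot_unrolled4(a, b))  # 110.0 110.0
```

Note that the unrolled version adds terms in a different order, so with general floating-point data the two results may differ in the last bits.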
Some Technical Data
Core Speed:               2800 MHz
System Bus Speed:         2200 MHz
Integrated Memory Speed:  2200 MHz
Wattage:                  75 W
L1 Cache Size:            64 Kbyte (x 4 caches)
L2 Cache Size:            512 Kbyte (x 4 caches)
L3 Cache Size:            6144 Kbyte (x 1 cache)