Central Processing Units (CPUs)

Contents:
Overview
CPU Manufacturers
Current Intel and AMD Offerings
Evolution of Intel Processors
S-Spec Code
Basic Components of a CPU
The CPU Die and Package
How CPUs Work
Four Stages of CPU Operation
Pipelining
Superscalar Technology
Deep Pipelines
Registers
64-Bit Processors
Advantages and Disadvantages of 64-bit Processing
Advantages
Disadvantages
The System Clock
Over-Clocking
Cache Memory
Level 1 (L1) Cache
Level 2 (L2) Cache
Level 3 (L3) Cache
Hyper-Threading
Turbo Boost

Overview

The Central Processing Unit (CPU) is a small integrated circuit, which may contain millions of transistors, that executes the sequence of instructions found in computer programs. It is the most complex and significant component in a computer.

CPU Manufacturers

The two major manufacturers of CPUs in the world are Intel and Advanced Micro Devices (AMD). AMD is a 5.4 billion dollar corporation headquartered in Sunnyvale, California, and Intel is a 43.6 billion dollar corporation headquartered in Santa Clara, California. In 3Q10, Intel held 80.4% of the CPU market (a loss of 0.3%) while AMD held 19.2% (a gain of 0.2%). VIA Technologies held 0.4% (a gain of 0.1%). These statistics were obtained from: xtremesystems.org/forums/showthread.php?p=4623580

Current Intel and AMD Offerings

The table below provides a representative sampling of current AMD and Intel CPU offerings, ranked by price. The table includes CPU speed (frequency) and thermal design power (TDP). TDP represents a CPU's maximum power usage; cooling system designers use it to build cooling systems with adequate heat dissipation capacity. Source: neoseeker.com/articles/hardware/reviews/intel_i7_2600k_i5_2500k/.
Vendor  Processor name          Frequency    TDP    Price
Intel   Core i7-980x EE         3.33GHz      130W   $950
Intel   Core i7-970             3.20GHz      130W   $880
Intel   Core i7-960             3.20GHz      130W   $580
Intel   Core i7-950             3.06GHz      130W   $570
Intel   Core i7-875k            2.93GHz      95W    $330
Intel   Core i7 2600K           3.4-3.8GHz   95W    $317
Intel   Core i5-680             3.60GHz      73W    $310
Intel   Core i5-670             3.46GHz      73W    $300
Intel   Core 2 Duo E8600        3.33GHz      65W    $290
Intel   Core i7-870             2.93GHz      95W    $285
Intel   Core i7-950             3.06GHz      130W   $280
Intel   Core i7-930             2.80GHz      130W   $280
Intel   Core 2 Quad Q9550       2.83GHz      95W    $275
AMD     Phenom II X6 1100T BE   3.3GHz       125W   $265
Intel   Core 2 Quad Q9505       2.83GHz      95W    $240
AMD     Phenom II X6 1090T      2.3-3.2GHz   125W   $229
Intel   Core i5-2500k BE        2.7GHz       95W    $215
Intel   Core i5-661             3.33GHz      87W    $210
Intel   Core i5-760             2.80GHz      95W    $209
Intel   Core i5-660             3.33GHz      73W    $208
AMD     Phenom II X6 1075T      3.0GHz       125W   $200
Intel   Core i5-655k            3.20GHz      73W    $200
Intel   Core i5-750             2.66GHz      95W    $200
Intel   Core 2 Duo E8500        3.16GHz      65W    $195
AMD     Phenom II X4 970 BE     3.5GHz       125W   $180
Intel   Core i5-650             3.20GHz      73W    $180
AMD     Phenom II X6 1055T      2.8GHz       125W   $179
Intel   Core 2 Quad Q8400       2.66GHz      95W    $170
Intel   Core 2 Duo E8400        3.00GHz      65W    $168
AMD     Phenom II X4 965 BE     3.4GHz       125W   $160
Intel   Core i3-560             3.33GHz      73W    $150
Intel   Core 2 Duo E7600        3.06GHz      65W    $150
Intel   Core 2 Quad Q8300       2.50GHz      95W    $150
AMD     Phenom II X4 955 BE     3.2GHz       125W   $145
AMD     Phenom II X4 945        3.0GHz       95W    $136
AMD     Phenom II X4 925        2.8GHz       95W    $130
Intel   Core i3-550             3.20GHz      73W    $130
Intel   Core 2 Duo E7500        2.93GHz      65W    $125
AMD     Athlon II X4 645        3.0GHz       95W    $118
AMD     Phenom II X2 565        3.4GHz       80W    $115
Intel   Core i3-540             3.06GHz      73W    $105
AMD     Phenom II X2 560 BE     3.3GHz       80W    $100
Intel   Core i3-530             2.93GHz      73W    $100
AMD     Athlon II X4 640        3.0GHz       95W    $100
Intel   Pentium G6950           2.80GHz      73W    $100
Intel   Pentium E6800           3.33GHz      65W    $100
AMD     Athlon II X4 635        2.9GHz       95W    $99
AMD     Phenom II X2 555 BE     3.2GHz       80W    $90
AMD     Athlon II X3 455        3.3GHz       95W    $87
Intel   Pentium E6700           3.20GHz      65W    $87
Intel   Pentium E6600           3.06GHz      65W    $87
Intel   Pentium E6500           2.93GHz      65W    $80
Intel   Pentium E5700           3.00GHz      65W    $80
AMD     Athlon II X3 450        3.2GHz       95W    $79
AMD     Athlon II X2 265        3.3GHz       65W    $75
AMD     Athlon II X3 445        3.0GHz       95W    $74
Intel   Pentium E5400           2.70GHz      65W    $70
Intel   Pentium E5500           2.8GHz       65W    $70
AMD     Athlon II X2 260        3.2GHz       65W    $68
AMD     Athlon II X2 255        3.1GHz       65W    $63
Intel   Pentium E3500           2.70GHz      65W    $63
AMD     Athlon II X2 250        3.0GHz       65W    $59
AMD     Athlon II X2 245        2.9GHz       65W    $58
Intel   Celeron E3400           2.6GHz       65W    $54
Intel   Celeron E3300           2.50GHz      65W    $52
Intel   Celeron 430             1.80GHz      35W    $42
AMD     Sempron 145             2.8GHz       45W    $37
AMD     Sempron 140             2.7GHz       45W    $33

Evolution of Intel Processors

In the 30 years or so of PC evolution, there have been approximately ten generations of Intel processors, as shown in the table below. During the 7th generation of CPUs, the x86-64 instruction set was introduced, which allows 64-bit processing. Previous generations used the x86 (32-bit) instruction set. The x86-64 instruction set is backwards compatible with the x86 instruction set.

Generation  Introduced  CPUs                                 Architecture   Process (nm)  Instruction Set
1           1979        8086, 8088                           P1             3200          x86
2           1982        286                                  P2             1500          x86
3           1988        386                                  P3             1500-1000     x86
4           1991        486                                  P4             1000-600      x86
5           Mar 1993    Pentium, Pentium MMX                 P5             800-250       x86
6           1995        Pentium Pro, II, III; Celeron; Xeon  P6             350-65        x86
7           Jun 2001    Pentium 4; Celeron; Xeon             Netburst       180-65        x86, x86-64
8           Jul 2006    Core 2; Celeron; Xeon                Core           65-45         x86-64
9           Nov 2008    Core i; Celeron; Xeon                Nehalem        45-32         x86-64
9           Mar 2008    Atom                                 Atom           45            x86 & x86-64
7-9         2001        Itanium I, II, Tukwila               Itanium        180-65        Itanium
10          Jan 2011    Core i (2nd Generation)              Sandy Bridge   32 - (22)     x86-64

The Intel Itanium processor, which is generally used only in servers, has an instruction set that is incompatible with the mainstream x86-64 and x86 instruction sets.

Up through the Pentium 4 (Netburst) era, the primary means of estimating the performance of a processor was CPU speed, measured in megahertz or gigahertz. Today, a more accurate way of estimating CPU performance is the number of transistors it contains. The two charts below illustrate this trend. The first chart depicts the maximum speed of major Intel CPU architectures, from earliest to latest. The second chart depicts the number of transistors per CPU architecture.

[Chart: maximum speed (GHz) of major Intel CPU architectures, earliest to latest. Data labels: 0.01, 0.025, 0.033, 0.1, 0.3, 1.4, 3.8, 3.3, 3.47]
You can see from the chart above that CPU speed peaked in the Pentium 4 era and has since leveled out. Over the same period, however, CPU performance has increased significantly. Speed alone is not an accurate means of judging the performance of a modern CPU.

[Chart: number of transistors (millions) per Intel CPU architecture, earliest to latest. Data labels: 0.03, 0.13, 0.86, 1.4, 3.3, 44, 376, 820, 2300]

You can see from the chart above that early CPUs, such as the 8086 and 8088, contained fewer than 30,000 transistors. Modern CPUs, on the other hand, contain billions of transistors. CPU performance over the years has generally paralleled transistor count.

S-Spec Code

The S-Spec code is a five-character string used by Intel to identify the stepping of a processor model. It is printed on each processor and can be used to identify it, as shown in the diagram.

[Diagram: markings on an Intel Pentium 4 package, reading "Intel Pentium 4 1.5 GHz/256/400/1.75 SL5N8 Costa Rica 3137A118-0411 INTEL '01 4138A008", with the S-Spec code (SL5N8) labeled.]

A stepping is a revision of the processor silicon, and there are two types: full layer and metal layer. A full-layer stepping is one in which all the masks used to create the silicon are changed. A metal-layer stepping is one in which only the metal layer masks are changed, which allows for fewer changes.

A stepping name consists of a capital letter followed by a number, for example A0. The letter is incremented for full-layer steppings and the number is incremented for metal-layer steppings. As an example, the first full-layer stepping from A0 would typically be B0, and the first metal-layer stepping from A0 would be A1. When a processor stepping changes, a new S-Spec code is generated. However, some steppings can have more than one S-Spec code.
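The stepping naming rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_stepping` and the `"full"` / `"metal"` labels are made up for this example, and resetting the number to 0 after a full-layer change is an assumption based on the A0 to B0 case given in the text.

```python
def next_stepping(current: str, change: str) -> str:
    """Return the stepping name that would typically follow `current`.

    `current` is a capital letter followed by a number, e.g. "A0".
    `change` is "full" (all masks changed) or "metal" (metal masks only).
    """
    letter, number = current[0], int(current[1:])
    if change == "full":
        # Full-layer stepping: increment the letter.
        # Assumption: the number restarts at 0, as in A0 -> B0.
        return f"{chr(ord(letter) + 1)}0"
    if change == "metal":
        # Metal-layer stepping: keep the letter, increment the number.
        return f"{letter}{number + 1}"
    raise ValueError("change must be 'full' or 'metal'")


print(next_stepping("A0", "full"))   # B0
print(next_stepping("A0", "metal"))  # A1
```

Real stepping histories do not always follow this pattern exactly, which is why the text says "typically".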
Intel recommends that users go to ark.intel.com/ to find information about their processors. The search box in the upper right-hand corner finds processor specifications by S-Spec code, processor number, processor codename, product order code, or brand name. You can also use the menu to drill down to the processor you are looking for.

Basic Components of a CPU

A CPU consists of the following basic components:

Component               Description
Control Unit            Contains the CPU instruction set and extensions.
Arithmetic Logic Unit   Performs integer arithmetic, which is involved in probably 90% of calculations.
Floating Point Unit     Performs decimal (floating-point) arithmetic.
Registers               Small internal holding areas for data and instructions.

The CPU Die and Package

A CPU die is a section of silicon wafer on which a CPU is fabricated. Modern CPU circuitry consists of millions of transistors. The size of the transistors used in a die is measured in nanometers (nm) and is known as the manufacturing process. Leading-edge CPUs use a process in the range of 32 nm and have a design that may incorporate the following features:

- Multiple execution units (i.e., cores)
- Three levels of integrated cache (i.e., L1, L2, L3)
- Integrated memory controller [1]
- Integrated graphics controller [1]
- Integrated graphics processor [1]
- Integrated PCIe bus [1]
- Network-on-a-chip design (i.e., QuickPath Interconnect)

A CPU die is mounted on a small circuit board (substrate) known as a package. Traces in the substrate electrically connect the die to an array of pins or lands, which makes electrical contact with the motherboard. The package may also include an integrated heat spreader, which serves as the mating surface for a heat sink. Packages are manufactured to meet formal CPU socket specifications and guidelines. Most modern packages are pin grid array (PGA), land grid array (LGA), or their derivatives. A package can contain multiple dies.
[1] These components were originally part of the Northbridge within the motherboard chipset.

How CPUs Work

Four Stages of CPU Operation

The four stages of CPU operation are, in order, Fetch, Decode, Execute, and Write, as illustrated in the diagram below.
The fetch stage involves retrieving an instruction from program memory (either main memory or cache). After the instruction is fetched, the program counter is incremented by the length of the instruction word. In the decode stage, the instruction is interpreted according to the CPU's instruction set and the correct course of action is determined. Instructions are carried out in the execute stage using components such as the arithmetic logic unit (ALU), floating point unit (FPU), and registers. Finally, the results of the execute stage are saved to cache or main memory in the write stage. After the write stage, the entire process repeats.

The time required to execute instructions varies according to architecture, as shown in the table below:

Generation  CPUs                 Instruction Throughput
1           8086                 12 cycles per instruction
2, 3        286, 386             4.5 cycles per instruction
4           486                  2 cycles per instruction
5           Pentium [1]          1 cycle per instruction
6-9         Pentium 4 - Core i   3 or more instructions per cycle
7-9         Itanium              6 instructions per cycle

Notes:
1. Pentium started the trend of using multiple pipelines.
2. Generation 6 started the trend of deep pipelines.

Pipelining

With older CPUs, an entire instruction had to be completed before the next one was started. That meant that three fourths of the components of a four-stage CPU were idle at any given time. With modern CPUs, instructions are processed in production-line fashion so that components are ideally never idle: while one instruction is being executed, the next is being decoded and the one after that is being fetched. This concept is known as pipelining, and it is illustrated in the diagrams below. As you might expect, pipelining enhances CPU performance.

[Diagrams: Instruction Execution Without Pipelining; Instruction Execution With Pipelining]

Superscalar Technology
The fifth-generation (P5) Pentium processor marked the beginning of what Intel calls superscalar technology, which is the use of multiple pipelines. Each pipeline can execute instructions independently. This means that a CPU with two pipelines would theoretically have twice the performance of a single-pipeline CPU running at the same speed.

Deep Pipelines

Breaking stages down into smaller sub-stages, in conjunction with increasing clock speed, is considered a valid method of increasing performance, up to a point. The problem with this approach is that increasing CPU speed also increases heat generation and power consumption, so a balance has to be struck. The table below indicates pipeline depth for select Intel CPUs.

Generation  Processor            Pipeline Depth
6           Pentium III          10
7           Pentium 4            20
7           Pentium 4 Prescott   31
8           Core 2               14
9           Core i5 / i7         14

The deep pipelines of the Generation 7 (Netburst) processors caused severe heat problems for Intel. This is one reason that the Netburst architecture was abandoned and the follow-on architectures (Core and Nehalem) were built with power saving in mind from the ground up.

Registers

Registers are areas of memory within the execution unit of a CPU where values are temporarily held during processing. Typical register widths today are 32 or 64 bits, which influences the size of the instructions and the size of the data that can be used in arithmetic calculations. As shown in the table below, registers have grown in size as CPUs have evolved since 1971. Registers and their interconnections are also known as the internal data bus.

Width (bits)
Registers  Data Bus  Address Bus  Release Date  CPU                                           Addressable Memory  Applications
4          4         ?            1971          4004                                          0.64 KB             Pocket calculator
8          8         16           1974          8080                                          64 KB               Small CP/M-based computers
16         8         20           1979          8088                                          1 MB                First IBM computer running CP/M
16         16        20           1978, 1982    8086, 286                                     1 MB                IBM PC-AT; 16-bit operating systems (DOS) and applications
32         16        24           1988          386 SX, SL                                    16 MB               16-bit (DOS kernel) and 32-bit operating systems and applications
32         32        32           1985-1989     386 DX, 486                                   4 GB                16-bit (DOS kernel) and 32-bit operating systems and applications
32         64        32           1996-2003     K5, K6, Athlon, XP                            4 GB                16-bit (DOS kernel) and 32-bit operating systems and applications
32         64        36           1993-2004     Pentium Pro, 1-4                              64 GB               16-bit (DOS kernel) and 32-bit operating systems and applications
64         64        40           2005          Pentium D, Pentium Ext                        1 TB                32- and 64-bit operating systems and applications
64         64        40           2006-2008     Core 2                                        1 TB                32- and 64-bit operating systems and applications
64         16        40           2003-2009     Athlon 64, Athlon II, Phenom I/II             1 TB                32- and 64-bit operating systems and applications
64         4         40           2009          Core i7 Bloomfield, Core i3/i5/i7 Lynnfield   1 TB                32- and 64-bit operating systems and applications

64-Bit Processors

The advent of 64-bit registers gave rise to 64-bit operating systems that use 64-bit instruction sets. The x86-64 instruction set is the de facto standard for 64-bit processors today. It was developed by Advanced Micro Devices (AMD) and first used in an Opteron processor in 2003. One key to its success is that it is fully backwards compatible with the x86 instruction set and 32-bit operating systems.

AMD markets the x86-64 architecture as AMD64. It is implemented in AMD's Athlon 64, Athlon 64 FX, Athlon 64 X2, Athlon II, Athlon X2, Opteron, Phenom, Phenom II, Turion 64, Turion 64 X2, and later Sempron processors.

Intel 64 is Intel's implementation of x86-64 and is compatible with AMD64. It is used in Pentium D, Pentium Extreme Edition, and Pentium Dual-Core processors, in the Atom 230, 330, D510, and N450, and in all versions of the Core 2, Core i7, Core i5, and Core i3 processors.

Advantages and Disadvantages of 64-bit Processing

According to Microsoft, the following are some advantages and disadvantages of using a 64-bit version of Windows Vista.

Advantages

- Increased memory support beyond the 4-GB addressable memory space that is available in a 32-bit operating system
- Increased program performance for programs that are written to take advantage of a 64-bit operating system
- Enhanced security features

Disadvantages
- 64-bit device drivers may not be available for one or more devices in the computer.
- Device drivers must be digitally signed.
- 32-bit device drivers are not supported.
- 32-bit programs may not be fully compatible with a 64-bit operating system.
- It may be difficult to locate programs that are written specifically for a 64-bit operating system.
- Not all hardware devices may be compatible with a 64-bit version of Windows Vista.

The System Clock

The speed of a CPU is measured in megahertz (MHz) or gigahertz (GHz), which are one million and one billion cycles per second, respectively. Desktop processors topped the 1 GHz mark in 2000, 2 GHz in 2001, and 3 GHz in 2002. CPU speed has since leveled off and generally does not exceed 4 GHz. One reason for the 4 GHz limit is the amount of heat generated at higher speeds. Another is power consumption, which is problematic for laptops.

Most PCs run at a base frequency that is determined by a quartz crystal oscillator located on the motherboard, known as the system clock. CPUs run faster than the system clock through circuitry that multiplies the motherboard's base frequency by a factor known as the clock multiplier. The multiplier value is derived from manufacturer testing that determines the highest frequency at which the CPU can operate and still be guaranteed to be stable. This manufacturer estimate of maximum speed is generally conservative and may leave some leeway for users to increase the CPU's speed by over-clocking.

Over-Clocking

Over-clocking a CPU is accomplished by changing the CPU multiplier value in the system BIOS. Except for some special high-end processors, however, CPU manufacturers often lock this setting so that it cannot be changed. This is to deter processor relabeling, in which an inexpensive processor is over-clocked, relabeled, and sold as a more expensive processor.
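The relationship between the system clock, the clock multiplier, and CPU speed is simple arithmetic, and a short Python sketch may make it concrete. The function names and the numbers used here (a 200 MHz system clock with multipliers of 15 and 17) are hypothetical values chosen for illustration; they do not describe any particular processor.

```python
def cpu_frequency_mhz(system_clock_mhz: float, multiplier: float) -> float:
    """CPU speed is the system clock multiplied by the clock multiplier."""
    return system_clock_mhz * multiplier


def overclock_gain_percent(stock_multiplier: float, new_multiplier: float) -> float:
    """Percentage speed increase from raising the multiplier."""
    return (new_multiplier / stock_multiplier - 1) * 100


# Hypothetical example values, not taken from a real CPU.
SYSTEM_CLOCK_MHZ = 200.0
STOCK_MULTIPLIER = 15
OVERCLOCKED_MULTIPLIER = 17

stock = cpu_frequency_mhz(SYSTEM_CLOCK_MHZ, STOCK_MULTIPLIER)
overclocked = cpu_frequency_mhz(SYSTEM_CLOCK_MHZ, OVERCLOCKED_MULTIPLIER)
gain = overclock_gain_percent(STOCK_MULTIPLIER, OVERCLOCKED_MULTIPLIER)

print(f"Stock:        {stock:.0f} MHz")
print(f"Over-clocked: {overclocked:.0f} MHz ({gain:.1f}% faster)")
```

With these assumed values, raising the multiplier by two steps moves the CPU from 3000 MHz to 3400 MHz. Whether a real processor remains stable at the higher setting depends on the leeway left by the manufacturer, as described above.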
Cache Memory

As mentioned above, CPU speeds have remained relatively constant over the last several years, yet performance has improved dramatically. One of the reasons for this has been the substantial increase in the use of cache memory. Cache is special high-speed static random access memory (SRAM) that resides either on the CPU die itself or very close to it. SRAM is faster than conventional dynamic random access memory (DRAM) because it does not have to be continually refreshed.

The function of cache is to store data that were pre-fetched or recently used and are likely to be needed again in the near future. The CPU always tries to read from cache first. When it finds data in cache (a cache hit), considerable time is saved over having to read it from main memory.

There are currently three levels of cache: Level 1, Level 2, and Level 3. The numerical part of each label (i.e., 1, 2, and 3) indicates both the order in which the levels were added to CPU design and the order in which they are searched by the CPU.

Level 1 (L1) Cache

L1 cache is a relatively small amount of on-die memory that can be searched very quickly. Because of very smart predictive controllers, current L1 cache has about a 90% hit rate. L1 cache was first used in the 486 processor. The 486DX had 8 KB of cache when it was released in 1989, and the 486DX4 had 16 KB of cache when it was released in 1994. Modern processors have up to 64 KB of L1 cache.

Level 2 (L2) Cache

L2 cache is secondary cache that is searched if L1 cache has a miss. The Pentium Pro processor, introduced in 1995, was the first to have L2 cache integrated into the CPU package, but not into the CPU die itself. The Pentium II version of the Celeron processor, introduced in 1997, was the first to have L2 cache (128 KB) integrated into the CPU die. Integrated L2 cache has been a standard CPU design feature ever since. Just like L1 cache, L2 cache has a hit rate of about 90%. Because L2 catches about 90% of the roughly 10% of requests that miss L1, the combined hit rate of both caches is approximately 99%.

Level 3 (L3) Cache

The first use of L3 cache was with the Pentium 4 Extreme Edition in 2004. L3 cache is particularly useful in multicore processors, where it is shared among all the cores.

Hyper-Threading

Hyper-Threading (HT) technology allows the operating system to execute two application threads simultaneously. HT-capable operating systems view an HT-capable processor as two logical processors. With Windows, one can observe the operation of each logical CPU on the Performance tab of the Windows Task Manager dialog, as shown below.

The Intel 3.06 GHz Pentium 4A (Northwood), released in 2002, was the first processor to use HT. Subsequent Intel Pentium 4 processors, as well as Core i5 and Core i7 processors, were also equipped to use HT. However, HT was not implemented on the Intel Core 2 family of processors and is not available on any AMD processor.

Turbo Boost

The Intel Nehalem processors (Core i5, i7) have a feature called Turbo Boost that, in conjunction with the operating system, can automatically bump the clock multiplier up or down, changing the CPU speed in increments of 133 MHz, depending on the workload, temperature, current, and power. The system clock on these processors is 133 MHz.
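Because each Turbo Boost step raises the clock multiplier by one, each step adds one system-clock increment (133 MHz) to the CPU speed. The Python sketch below lists the speeds reachable from a given starting point. The 133 MHz figure comes from the text above; the function name, the base multiplier of 20, and the limit of two extra steps are hypothetical values chosen only for illustration.

```python
SYSTEM_CLOCK_MHZ = 133  # system clock quoted in the text for Nehalem processors


def turbo_frequencies(base_multiplier: int, max_extra_steps: int) -> list[int]:
    """List the CPU speeds (MHz) for each multiplier from base up to the limit.

    Each extra step raises the multiplier by one, which adds
    SYSTEM_CLOCK_MHZ to the CPU speed.
    """
    return [
        SYSTEM_CLOCK_MHZ * (base_multiplier + step)
        for step in range(max_extra_steps + 1)
    ]


# Hypothetical processor: base multiplier 20, up to two Turbo Boost steps.
for step, mhz in enumerate(turbo_frequencies(20, 2)):
    print(f"+{step} step(s): {mhz} MHz")
```

How many steps are actually applied at any moment is decided by the processor based on workload, temperature, current, and power, which this sketch does not attempt to model.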