ELE 356 Computer Engineering II Section 1 Foundations Class 6 Architecture
History ENIAC Video 2 tj
History: Mechanical Devices: Abacus
History: Mechanical Devices: The Antikythera Mechanism
  - Oldest known scientific calculator
  - From the Greek period
  - Over 30 gears
  - Used to find the positions of the sun, moon, and planets
  - Predicted eclipses
History: Mechanical Devices: Sectors
  - Developed in the late 1500s
  - Adapted to a broad range of approximate computations
  - Basic arithmetic, calculating areas and volumes, converting currencies
History: Mechanical Devices: Slide Rule
  - Developed in the 1600s
  - Widely used until the 1970s
  - Adapted to a broad range of approximate computations
  - Basic arithmetic, trig, logs
  - Specialized applications developed (nuclear, space, military)
History: Mechanical Devices: Leibniz Stepped Reckoner calculator
  - Developed in 1673
  - First four-function calculator
  - Based on drum-shaped gears
  - Led to almost 300 years of derivative devices
History: Mechanical Devices: Babbage Analytical Engine
  - Started in 1834
  - Never completed
  - Processing unit, memory unit, punch card input
  - First programmable computer
History: Mechanical Devices: Advanced Mechanical Calculators
History: Mechanical Devices: Electric motors added to speed things up
History: Early Computers: Harvard Mark I
  - 1937-1944
  - Electromechanical
  - 50 ft shaft
  - Hand programmed
History: Early Computers: ENIAC
  - 1943-1945
  - All-electronic, built from vacuum tubes
  - Hand programmed
History: Early Computers: Manchester Small-Scale Experimental Machine ("Baby")
  - 1948
  - First stored-program computer
  - Program memory was tube based
History: Early Computers: EDVAC
  - Reworked the ENIAC design to use the stored-program concept
  - First to use paper tape to input programs
History: Mass-Produced Computers: UNIVAC
  - 1951
  - Console for control
  - Used magnetic tape for storage
History: Mass-Produced Computers: IBM
  - 1958
  - First mass-produced transistor-based computer
History: Minicomputers
  - PDP-8, LINC, PDP-11, VAX-11/780, Wang 2200
History: Personal Computers
  - Intel 8008
  - Homemade computers
History: Personal Computers
  - TRS-80, Apple II, Commodore PET: publicly available in 1977
  - IBM PC, 1981: huge adoption, software explodes
  - Laptops, 1989: mobility
History: TRS-80 video
Classes of Processors
  General Purpose Processor
    - User programmable
    - Intended to run end-user-selected programs
    - Application independent
    - PowerPoint, Chrome, Twitter, Angry Birds, ...
  Embedded Processor
    - Not user programmable
    - Programmed by the manufacturer
    - Application driven
    - Non-smart phones, appliances, missiles, automobiles, ...
    - Very wide and very deep applications profile
Classes of Processors: General Purpose Processor: Key Characteristics
  - 32/64-bit operations
  - Support non-real-time/time-sharing operating systems
  - Support complex memory systems
    - Multi-level cache
    - DRAM
    - Virtual memory
  - Support DMA-driven I/O
  - Complex CPU structures
    - Pipelining
    - Superscalar execution
    - Out-of-order execution (OOO)
    - Floating-point hardware
Classes of Processors: General Purpose Processor: Examples
  - ARM 7, 9, Cortex-A8, A9, A15
  - Intel Pentium, Core i3/i5/i7
  - AMD Phenom, Athlon, Opteron
  - Apple A4, A5
  - TI OMAP
Classes of Processors: Embedded Processor: Key Characteristics
  - 4/8/16/32-bit operations
  - Support real-time operating systems
  - Relatively simple memory systems
  - Memory-mapped I/O
  - Simple CPU structures
    - Few registers
    - Limited instructions
  - Support for multiple I/O schemes
  - Wide range of peripheral support
    - A/D
    - D/A
    - Sensors
  - Extensive interrupt support
Classes of Processors: Embedded Processor: Examples
  - Motorola/Freescale 68K, HC11, HCS12
  - ARM Cortex-Rx, Cortex-Mx
  - Atmel AVR
Instruction Sets
  CISC: Complex Instruction Set Computer
    - The name didn't even exist until RISC was defined
    - Used in most processors until about 1980
    - One instruction holds multiple actions
      - Load data from a location, add, write data to a new location
    - Instructions were often designed to emulate high-level language constructs
  RISC: Reduced Instruction Set Computer
    - Developed in the 1980s
    - Most prevalent architecture today
    - Sometimes called a load/store architecture
    - Instructions are simple
      - Load data from a location
      - Add
      - Store data to a location
    - RISC dominates today because it is much easier to take advantage of advanced structures like pipelining, superscalar execution, and OOO
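The difference between the two styles can be sketched with a toy machine. This is only an illustration: the mnemonics (`ADDM`, `LOAD`, `ADD`, `STORE`), the register names, and the addresses are all hypothetical and do not come from any real instruction set. The CISC-style program does the whole load/add/write in one instruction, while the RISC-style program needs four simple ones to reach the same result.

```python
# Toy model: memory is a dict of address -> value, registers a dict of name -> value.
# All mnemonics (ADDM, LOAD, ADD, STORE) are hypothetical, not from a real ISA.

def run(program, memory):
    """Execute a list of (opcode, operands...) tuples; return the instruction count."""
    regs = {}
    for op, *args in program:
        if op == "ADDM":      # CISC style: mem[dst] = mem[a] + mem[b] in one instruction
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]
        elif op == "LOAD":    # RISC style: register <- memory
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":     # RISC style: register <- register + register
            dst, r1, r2 = args
            regs[dst] = regs[r1] + regs[r2]
        elif op == "STORE":   # RISC style: memory <- register
            reg, addr = args
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown opcode {op}")
    return len(program)

cisc_program = [("ADDM", 0x20, 0x10, 0x14)]
risc_program = [
    ("LOAD", "r1", 0x10),
    ("LOAD", "r2", 0x14),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 0x20),
]

cisc_mem = {0x10: 5, 0x14: 7, 0x20: 0}
risc_mem = dict(cisc_mem)
cisc_count = run(cisc_program, cisc_mem)
risc_count = run(risc_program, risc_mem)
print(cisc_count, risc_count, cisc_mem[0x20], risc_mem[0x20])
```

Both programs leave the same sum at address 0x20. The RISC version uses more instructions, but each one does a single simple step, which is what makes it easier to pipeline.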
Architectural Configurations: Instruction / Data Structures
  - SISD: Single Instruction, Single Data
  - SIMD: Single Instruction, Multiple Data
  - MISD: Multiple Instruction, Single Data
  - MIMD: Multiple Instruction, Multiple Data
  [Figure: four block diagrams (SISD, SIMD, MISD, MIMD) showing INSTRUCTIONS and DATA streams feeding one or more processors (P)]
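A rough way to see the SISD/SIMD distinction is to count how many add instructions are issued for the same data. The sketch below is a plain Python simulation, not real vector hardware; the function names and the lane width of 4 are assumptions chosen for illustration.

```python
def sisd_add(a, b):
    """SISD: one add instruction per pair of data elements."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions

def simd_add(a, b, lanes=4):
    """SIMD: one add instruction operates on up to `lanes` element pairs at once.

    The lane width of 4 is an assumed value, not taken from any specific CPU.
    """
    result, instructions = [], 0
    for i in range(0, len(a), lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return result, instructions

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
sisd_result, sisd_count = sisd_add(a, b)
simd_result, simd_count = simd_add(a, b)
print(sisd_count, simd_count, sisd_result == simd_result)
```

For eight elements the SISD version issues 8 adds and the 4-lane SIMD version issues 2, with identical results.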
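Architectural Configurations: Memory Bus Structure
  - von Neumann: one unified memory for instructions and data
  - Harvard: separate instruction memory and data memory
  [Figure: block diagrams of both structures; labels: UNIFIED MEMORY, INSTRUCTION MEMORY, DATA MEMORY, ADDRESS, CONTROL, ALU, STATUS]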
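One practical consequence of the two structures is how memory accesses can overlap. The sketch below is a deliberately simplified model with assumed rules: every instruction needs one fetch cycle, some also need one data access, a von Neumann machine serializes both on its single memory, and a Harvard machine overlaps the data access of one instruction with the fetch of the next. Real processors are more complicated; the function names are hypothetical.

```python
def von_neumann_cycles(needs_data):
    """Unified memory: instruction fetch and data access take turns."""
    return sum(1 + (1 if d else 0) for d in needs_data)

def harvard_cycles(needs_data):
    """Separate memories: the data access of instruction i overlaps the fetch of i+1.

    Only a data access by the last instruction adds an extra cycle (assumed model).
    """
    if not needs_data:
        return 0
    return len(needs_data) + (1 if needs_data[-1] else 0)

# LOAD, LOAD, ADD, STORE: True means the instruction also accesses data memory.
program = [True, True, False, True]
vn = von_neumann_cycles(program)
hv = harvard_cycles(program)
print(vn, hv)
```

Under these assumptions the four-instruction program takes 7 memory cycles on the unified memory and 5 with separate instruction and data memories.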
Architectural Configurations: Memory Bus Structure: Modified Harvard
  [Figure: block diagram; labels: UNIFIED MEMORY, INSTRUCTION MEMORY, DATA MEMORY, ADDRESS, CONTROL, ALU, STATUS]
Architectural Configurations: Cache Memory: Modified Harvard
  [Figure: the Modified Harvard block diagram, annotated]
  - The memories in the diagram are often augmented by cache memories or are caches themselves
Architectural Configurations: Cache Memory
  - Cache memory stores relatively small amounts of data or program code for a relatively short time
  - Sits between the processor and main memory
  - Fast
    - Caches are kept small to make them fast
    - Allows the processor to run faster than main memory alone would allow
  - Leverages temporal locality
    - If you have recently used a piece of data, you are more likely to use it again
  - Leverages spatial locality
    - Program code and data structures are generally contiguous in memory
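Why a small, fast cache lets the processor outrun main memory can be shown with a short average-access-time calculation. This is a sketch under assumed numbers: the 1 ns cache time, 100 ns main memory time, and the hit rates are illustrative values, and the model assumes a miss costs a cache check plus a main memory access.

```python
def effective_access_time(hit_rate, cache_ns, main_ns):
    """Average access time: every access checks the cache, misses also go to main memory."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * main_ns

# Assumed timings for illustration only.
CACHE_NS = 1.0
MAIN_NS = 100.0

for hit_rate in (0.0, 0.50, 0.95, 1.0):
    t = effective_access_time(hit_rate, CACHE_NS, MAIN_NS)
    print(f"hit rate {hit_rate:.2f}: about {t:.1f} ns per access")
```

With good locality the hit rate is high, and the average access time approaches the cache time rather than the main memory time.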
Architectural Configurations: Cache Memory: Basic Operation
  - The processor requests a byte of program or data
  - The system first checks whether the byte is already in the cache
    - If yes (a cache hit): read the byte and continue
    - If no (a cache miss):
      - Stall, or allow the processor to do something else
      - Read the byte from main memory into the cache
      - Read the byte from the cache and continue
  - If the cache is full and a new byte needs to be loaded, several methods can be used to choose which existing byte to remove
    - LRU: the least recently used byte is removed
    - FIFO: the oldest byte loaded is removed
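The hit/miss flow and the two replacement methods can be sketched as a small simulation. It follows the byte-at-a-time description on the slide; the class name `ByteCache`, the capacity of 2, and the access trace are assumptions made for the example, and real caches work on multi-byte lines rather than single bytes.

```python
from collections import OrderedDict

class ByteCache:
    """Tiny byte-granularity cache with LRU or FIFO replacement (illustrative only)."""

    def __init__(self, capacity, policy="LRU"):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("LRU", "FIFO"):
            raise ValueError("policy must be 'LRU' or 'FIFO'")
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()  # address -> byte, first entry is next to be evicted
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.entries:            # cache hit
            self.hits += 1
            if self.policy == "LRU":
                self.entries.move_to_end(address)  # mark as most recently used
            return self.entries[address]
        self.misses += 1                        # cache miss
        if len(self.entries) >= self.capacity:
            # Evicts the least recently used byte (LRU) or the oldest loaded byte (FIFO).
            self.entries.popitem(last=False)
        self.entries[address] = main_memory[address]  # load from main memory
        return self.entries[address]

main_memory = bytes(range(16))
trace = [0, 1, 0, 2, 0]

lru = ByteCache(capacity=2, policy="LRU")
fifo = ByteCache(capacity=2, policy="FIFO")
for addr in trace:
    lru.read(addr, main_memory)
    fifo.read(addr, main_memory)
print(lru.hits, lru.misses, fifo.hits, fifo.misses)
```

On this trace LRU gets 2 hits and 3 misses, while FIFO gets 1 hit and 4 misses, because FIFO evicts address 0 even though it was just reused.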
Architectural Configurations: Pipelining: No Pipeline
  - Execute = fetch instruction, decode, execute, write back
  - Each clock cycle takes 4 us

  Clock cycle | 0       | 1     | 2   | 3   | 4     | 5
  Waiting     | A B C D | B C D | C D | D   |       |
  CPU execute |         | A     | B   | C   | D     |
  Retired     |         |       | A   | A B | A B C | A B C D

Architectural Configurations: Pipelining: Pipeline
  - Break complex tasks into smaller chunks
  - Start the next instruction as soon as each subtask is complete
  - Each clock cycle takes 1 us

  Clock cycle | 0       | 1     | 2   | 3 | 4 | 5 | 6   | 7     | 8
  Waiting     | A B C D | B C D | C D | D |   |   |     |       |
  Fetch       |         | A     | B   | C | D |   |     |       |
  Decode      |         |       | A   | B | C | D |     |       |
  Execute     |         |       |     | A | B | C | D   |       |
  Write back  |         |       |     |   | A | B | C   | D     |
  Retired     |         |       |     |   |   | A | A B | A B C | A B C D
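The timing difference between the two slides can be checked with a short calculation. This sketch uses the slide's numbers (four instructions A to D, four stages, 1 us per stage, so 4 us per unpipelined instruction) and assumes an ideal pipeline with equal stage times and no stalls; the function names are hypothetical.

```python
def unpipelined_time(n_instructions, n_stages, stage_time_us):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages * stage_time_us

def pipelined_time(n_instructions, n_stages, stage_time_us):
    """Ideal pipeline: the first instruction fills the pipe, then one finishes per stage time."""
    if n_instructions == 0:
        return 0
    return (n_stages + n_instructions - 1) * stage_time_us

no_pipe = unpipelined_time(4, 4, 1)
pipe = pipelined_time(4, 4, 1)
print(no_pipe, pipe, round(no_pipe / pipe, 2))
```

For the four instructions on the slides, execution takes 16 us without a pipeline and 7 us with one. As the number of instructions grows, the speedup approaches the number of stages.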
Architectural Configurations: Superscalar
  - Parallelism at the micro-architecture level
Architectural Configurations: Modern Example