SOC architecture and design

system-on-chip (SOC)
  processors: become components in a system
SOC covers many topics
  processor: pipelined, superscalar, VLIW, array, vector
  storage: cache, embedded and external memory
  interconnect: buses, network-on-chip
  impact: time, area, power, reliability, configurability
  customisability: specialized processors, reconfiguration
  productivity/tools: model, explore, re-use, synthesise, verify
  examples: crypto, graphics, media, network, comm, security
  future: autonomous SOC, self-optimising/verifying design
our focus
  overview, processor, memory

wl 2015 10.1

iPhone SOC

[figure: processor (1 GHz ARM Cortex A8), memory and I/O blocks]
Source: UC Berkeley

Basic system-on-chip model

[figure]

AMD's Barcelona multicore processor

  4 out-of-order cores (Core 1 to Core 4)
  1.9 GHz clock rate
  65nm technology
  3 levels of caches, including 512KB L2 per core and 2MB shared L3
  integrated Northbridge

[figure: die layout]
Source: http://www.techwarelabs.com/reviews/processors/barcelona/

SOC vs processors on chip

with lots of transistors, designs move in 2 ways:
  complete system on a chip
  multi-core processors with lots of cache

                System on chip                     Processors on chip
processor       multiple, simple, heterogeneous    few, complex, homogeneous
cache           one level, small                   2-3 levels, extensive
memory          embedded, on chip                  very large, off chip
functionality   special purpose                    general purpose
interconnect    wide, high bandwidth               often through cache
power, cost     both low                           both high
operation       largely stand-alone                need other chips

Processor types: overview

Processor type   Architecture / Implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control

Processors for SOCs

SOC                                 Basic ISA     Processor description
Freescale e600: signal processing   PowerPC       Superscalar with vector extension
ClearSpeed CSX600: general          Proprietary   Array processor with 96 processing elements
PlayStation 2: gaming               MIPS          Pipelined with 2 vector coprocessors
ARM VFP11: general                  ARM           Configurable vector coprocessor

Sequential and parallel machines

basic single stream processors
  pipelined: overlap operations in a basic sequential machine
  superscalar: transparent concurrency
  VLIW: compiler-generated concurrency
multiple streams, multiple functional units
  array processors
  vector processors
  multiprocessors

Pipelined processor

Instruction #1   IF  ID  AG  DF  EX  WB
Instruction #2       IF  ID  AG  DF  EX  WB
Instruction #3           IF  ID  AG  DF  EX  WB
Instruction #4               IF  ID  AG  DF  EX  WB
                 Time ->

Superscalar and VLIW processors

Instruction #1   IF  ID  AG  DF  EX  WB
Instruction #2   IF  ID  AG  DF  EX  WB
Instruction #3       IF  ID  AG  DF  EX  WB
Instruction #4       IF  ID  AG  DF  EX  WB
Instruction #5           IF  ID  AG  DF  EX  WB
Instruction #6           IF  ID  AG  DF  EX  WB
                 Time ->

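The timing in the pipelined and superscalar/VLIW diagrams can be sketched in a few lines of Python. This is a minimal sketch that assumes an ideal pipeline with no stalls or hazards. The function names and the `issue_width` parameter are illustrative and not from the slides. The stage names are taken from the diagrams and are commonly read as instruction fetch, instruction decode, address generation, data fetch, execute and write back.

```python
# Stage names as they appear in the slide diagrams.
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]

def start_cycle(i, issue_width=1):
    """Cycle in which instruction i (0-based) enters IF.

    Assumption: ideal pipeline, issue_width instructions start per cycle.
    """
    return i // issue_width

def timing_rows(num_instructions, issue_width=1):
    """Build one text row per instruction, staggered by its start cycle."""
    rows = []
    for i in range(num_instructions):
        pad = ["  "] * start_cycle(i, issue_width)
        rows.append(f"Instruction #{i + 1}  " + " ".join(pad + STAGES))
    return rows

def total_cycles(num_instructions, issue_width=1):
    """Cycles until the last instruction leaves WB."""
    return start_cycle(num_instructions - 1, issue_width) + len(STAGES)

# Pipelined: one instruction issued per cycle.
print("\n".join(timing_rows(4)))
# Superscalar / VLIW with two instructions issued per cycle (assumed width).
print("\n".join(timing_rows(6, issue_width=2)))
```

With one instruction issued per cycle, each extra instruction adds one cycle. With two issued per cycle, each extra pair adds one cycle. Whether the pairing is found by hardware (superscalar) or by the compiler (VLIW) does not change this idealised timing.
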
Superscalar and VLIW: hardware for parallelism control

[figure]

Array processors

n PEs, each with memory; neighbour communications
one instruction issued to all PEs
instruction format: mask | op | dest | sr1 | sr2
  perform op if condition = mask
  operand can come from neighbour

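The masked broadcast described on this slide can be sketched as follows. This is an illustrative model only. Representing each PE as a Python dict, using a ring so that the neighbour is the PE to the left, and the name `broadcast` are all assumptions rather than details from the slides.

```python
import operator

def broadcast(pes, mask, op, dest, sr1, sr2, sr2_from_neighbour=False):
    """Issue one instruction to every PE.

    pes: list of dicts, each holding a 'cond' flag and named registers.
    A PE performs op only if its condition equals mask.
    Assumption: the neighbour is the PE to the left, with wrap-around.
    """
    n = len(pes)
    # Read all operands first so every PE sees pre-instruction values.
    operands = []
    for i, pe in enumerate(pes):
        source = pes[(i - 1) % n] if sr2_from_neighbour else pe
        operands.append((pe[sr1], source[sr2]))
    # Then write results, only in PEs whose condition matches the mask.
    for pe, (a, b) in zip(pes, operands):
        if pe["cond"] == mask:
            pe[dest] = op(a, b)

# Four PEs; odd-numbered PEs have their condition flag set.
pes = [{"cond": i % 2, "r0": i, "r1": 10 * i, "r2": 0} for i in range(4)]
broadcast(pes, mask=1, op=operator.add, dest="r2", sr1="r0", sr2="r1",
          sr2_from_neighbour=True)
result = [pe["r2"] for pe in pes]
print(result)
```

Only the PEs whose condition matches the mask update their destination register. The others ignore the instruction, which is how a single instruction stream handles data-dependent behaviour.
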
Vector processors

vector registers, eg 8 sets x 64 elements x 64 bits
vector instructions: VR3 = VR2 VOP VR1

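A vector instruction of the form VR3 = VR2 VOP VR1 applies one operation to every element pair. The sketch below models the register file with the sizes quoted on the slide. The names `vregs` and `vop` are illustrative, and wrapping results to 64 bits is an assumption about how overflow would be handled.

```python
import operator

NUM_VREGS = 8              # 8 vector registers, as on the slide
VLEN = 64                  # 64 elements per register
MASK64 = (1 << 64) - 1     # assumption: results wrap to 64 bits

vregs = [[0] * VLEN for _ in range(NUM_VREGS)]

def vop(op, dst, src_a, src_b):
    """VR[dst] = VR[src_a] op VR[src_b], applied element by element."""
    vregs[dst] = [op(a, b) & MASK64
                  for a, b in zip(vregs[src_a], vregs[src_b])]

vregs[1] = list(range(VLEN))   # VR1 = 0, 1, 2, ...
vregs[2] = [2] * VLEN          # VR2 = 2, 2, 2, ...
vop(operator.mul, 3, 2, 1)     # VR3 = VR2 * VR1
print(vregs[3][:8])
```
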
Memory addressing: three levels

[figure] (each segment contains pages for a program/process)

User view of memory: addressing

a program: process address (offset + base + index)
  virtual address: from page address and process/user id
segment table: process base and bound (for each process)
  system address: process base + page address
pages: active localities in main/real memory
  virtual address: page table lookup to physical address
  page miss: virtual pages not in page table
TLB (translation look-aside buffer): recent translations
  TLB entry: corresponding real and (virtual, id) address
  a few hashed virtual address bits address TLB entries
  if virtual, id = TLB (virtual, id) then use translation

TLB and Paging: Address translation

[figure: Virtual Address -> (recent translations) / (find process) process base -> System Address -> (find page) -> Physical Address]

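The translation path (TLB first, then segment table and page table on a TLB miss) can be sketched as below. This is a simplified model under stated assumptions, not a description of any particular processor. The page size, the direct-mapped TLB, the XOR hash, and the names `MMU`, `PageMiss` and `translate` are all hypothetical.

```python
PAGE_BITS = 12   # assumption: 4KB pages
TLB_SETS = 16    # assumption: small direct-mapped TLB

class PageMiss(Exception):
    """Raised when the system page is not in the page table."""

class MMU:
    def __init__(self, segment_table, page_table):
        self.segment_table = segment_table  # id -> (base, bound), in pages
        self.page_table = page_table        # system page -> physical page
        self.tlb = [None] * TLB_SETS        # entries: (vpage, id, ppage)
        self.tlb_hits = 0

    def translate(self, pid, vaddr):
        vpage = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        # A few hashed virtual address bits select the TLB entry.
        index = (vpage ^ pid) % TLB_SETS
        entry = self.tlb[index]
        if entry is not None and entry[:2] == (vpage, pid):
            # (virtual, id) matches the TLB entry: use the stored translation.
            self.tlb_hits += 1
            ppage = entry[2]
        else:
            # Find process: the segment table gives base and bound.
            base, bound = self.segment_table[pid]
            if vpage >= bound:
                raise ValueError("address outside segment bound")
            spage = base + vpage            # system address (page part)
            # Find page: page table lookup, or page miss.
            if spage not in self.page_table:
                raise PageMiss(spage)
            ppage = self.page_table[spage]
            self.tlb[index] = (vpage, pid, ppage)
        return (ppage << PAGE_BITS) | offset

mmu = MMU(segment_table={7: (100, 4)}, page_table={101: 5})
first = mmu.translate(7, 0x1234)    # TLB miss, walks both tables
second = mmu.translate(7, 0x1238)   # same page, served from the TLB
print(hex(first), hex(second), mmu.tlb_hits)
```
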
SOC interconnect

interconnecting multiple active agents requires
  bandwidth: capacity to transmit information (bps)
  protocol: logic for non-interfering message transmission
bus
  AMBA (Adv. Microcontroller Bus Architecture) from ARM, widely used for SOC
  bus performance: can determine system performance
network on chip
  array of switches
  statically switched: eg mesh
  dynamically switched: eg crossbar

Design cost: product economics

increasingly product cost determined by
  design costs, including verification
  not marginal cost to produce
manage complexity in die technology by
  engineering effort
  engineering cleverness
design effort often dictated by product volume

[figure: design time and effort vs basic physical tradeoffs; balance point depends on n, number of units]

Design complexity

[figure: processors]

Cost: product program vs engineering

[figure labels: Product cost; Fixed costs; Variable costs; Engineering costs; Fixed project costs; Manufacturing costs; Marketing, sales, administration; Labor costs; Chip design; Verify & test; Software; CAD support; Engineering; Mask costs; CAD programs; Capital equipment]

Example: two scenarios

fixed costs K_f, support costs 0.1 x function(n), and variable costs K_v x n, so
  total cost = K_f + 0.1 x function(n) + K_v x n
design gets more complex, while production costs decrease
  K_f increases while K_v decreases
  if same price, requires higher volumes to break even
when compared with 1995, in 2015
  K_f increased by 10 times
  K_v decreased by the same factor

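The break-even argument can be checked numerically. In the sketch below every figure is hypothetical: the slide gives no prices or cost values, and it does not define function(n), which is taken here to be n itself. The only relationship carried over from the slide is that K_f grows tenfold and K_v shrinks tenfold between the two scenarios.

```python
def total_cost(n, k_f, k_v, support=lambda n: n):
    """Fixed cost + support cost + variable cost for n units.

    Assumption: 'support' stands in for the unspecified function(n).
    """
    return k_f + 0.1 * support(n) + k_v * n

def break_even_volume(price, k_f, k_v, support=lambda n: n):
    """Smallest n at which revenue covers total cost.

    Assumes profit grows with n once the per-unit margin is positive.
    """
    def profit(n):
        return price * n - total_cost(n, k_f, k_v, support)

    hi = 1
    while profit(hi) < 0:          # find an upper bound by doubling
        hi *= 2
        if hi > 10**12:
            raise ValueError("no break-even point found")
    lo = hi // 2                   # profit(lo) < 0 here, or lo == 0
    while lo + 1 < hi:             # binary search for the crossover
        mid = (lo + hi) // 2
        if profit(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

PRICE = 100                        # hypothetical, same in both years
n_1995 = break_even_volume(PRICE, k_f=1_000_000, k_v=50)
n_2015 = break_even_volume(PRICE, k_f=10_000_000, k_v=5)
print(n_1995, n_2015)
```

At the same price, the scenario with the higher fixed cost needs a larger volume to break even, even though each unit is cheaper to produce.
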
More recent: higher NRE

[figure comparing 1995 and 2015]

IP: Intellectual Property

[figure]

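Answers to Unassessed Coursework 5

1. $\mathrm{rdl}_1\, R = \mathrm{snd}\, [-]^{-1} \,;\, R$
   $\mathrm{rdl}_{n+1}\, R = \mathrm{snd}\, \mathrm{apr}_n^{-1} \,;\, \mathrm{rsh} \,;\, \mathrm{fst}\, (\mathrm{rdl}_n\, R) \,;\, R$

2. $P0 = \mathrm{rdl}_n\, \mathrm{Pcell} \,;\, \pi_1$
   $\langle\langle s, x\rangle, a\rangle \;\mathrm{Pcell}\; \langle sx + a,\, x\rangle$

3. $\mathrm{rdl}_n\, R = \mathrm{row}_n\, (R \,;\, \pi_2^{-1}) \,;\, \pi_2$
   $P1 = \mathrm{loop}\, (\mathrm{row}_n\, \mathrm{Pcell1} \,;\, \mathrm{fst}\, (\mathrm{map}_n\, D)) \,;\, \pi_1$
   $\langle\langle s, x\rangle, a\rangle \;\mathrm{Pcell1}\; \langle a, \langle sx + a,\, x\rangle\rangle$

4. $\mathrm{loop}\, (\mathrm{row}_n\, R) = (\mathrm{loop}\, R)^n$
   Proof: induction on n (see www.doc.ic.ac.uk/~wl/papers/scp90.pdf)
   $P1 = P2 \,;\, [D, D]^{-n}$
   $P2 = (\mathrm{loop}\, (\mathrm{Pcell1} \,;\, [D, [D, D]]))^n$
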
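The cell in answer 2 maps <<s,x>, a> to <sx+a, x>, which is one step of Horner's rule for polynomial evaluation. The sketch below reads rdl as an ordinary left fold over the coefficient list and ignores the relational and timing aspects of the notation, so it is an interpretation rather than a definition. The names `pcell` and `p0` are illustrative.

```python
from functools import reduce

def pcell(state, a):
    """One cell: <<s, x>, a> is related to <s*x + a, x>."""
    s, x = state
    return (s * x + a, x)

def p0(s0, x, coeffs):
    """Left reduction of pcell over coeffs, then select the first component."""
    return reduce(pcell, coeffs, (s0, x))[0]

# Evaluate 1*x^2 + 2*x + 3 at x = 2, starting from s = 0.
value = p0(0, 2, [1, 2, 3])
print(value)
```
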