
ABSTRACT
INTRODUCTION
  HISTORY
  ARM LIMITED
  LICENSING ISSUES
THE ARM ARCHITECTURE
  ARCHITECTURE DEFINITION
  ARCHITECTURE VARIANTS
  GENERAL ISA FEATURES
    Programmer's Model
    The ARM coprocessor interface
    Conditional Execution
    Multiple Register Transfer Operation
    Fold shifts/rotates into ALU operation
  OPERATING SYSTEM SUPPORT
    Coprocessor Number 15
    MMU architecture
    Synchronization
    Context switching
  AMBA INTERFACE
  ARMV4T
    Thumb Instruction Set
  ARMV5
    ARM DSP Extensions
    Jazelle
  ARMV6
    Media Instructions
    Thumb-2
  SUMMARY
ARM IMPLEMENTATIONS
  ARM7TDMI
  ARM9TDMI
  ARM
  XSCALE
  AMULET
  AMULET
CONCLUSION
REFERENCES

Abstract

The ARM processor is a Reduced Instruction Set Computer (RISC), a type of microprocessor that recognizes a relatively limited number of instructions. One advantage of RISC processors is that they can execute instructions very quickly because the instructions are so simple. Another important advantage is that RISC chips require fewer transistors, which makes them cheaper to design and produce. The architecture differs from most major microprocessors, such as the SPARC and Itanium processors, in that it is designed mainly for embedded systems. ARM processors are, in general, processors that implement the ARM architecture specification. This report discusses the ARM architecture, its interesting features, its architectural extensions, implementations of the architecture, and some historical notes.

1 Introduction

1.1 History

The ARM processor originated at a small company named Acorn Computers Limited in England between 1983 and 1985. At that time, Acorn developed computers for the BBC (British Broadcasting Corporation). Owing to the BBC's popularity, the computer, which was built around the 8-bit 6502 microprocessor, became the dominant machine in British schools. It also flourished in the hobbyist market and was used in research laboratories and higher education establishments.

To create a successor to the BBC microcomputer, Acorn needed a microprocessor better than the 6502. At the time, most 16-bit CISC microprocessors were slower than standard memory parts. The BBC was reluctant to adopt such slow parts, because the 6502 actually had better interrupt response. When Acorn was refused access to Intel's microprocessor, it decided to create its own processor from scratch. Having chosen to develop its own microprocessor, Acorn had to develop the whole platform and sell it as a complete product. Acorn therefore developed the microprocessor, the system board and the software (operating system) needed to run the system.
With just over 400 employees in total, Acorn could not invest in the development of a complex microprocessor, and the company did not have the experience needed to make a commercial one. Acorn therefore had to produce a better design with a fraction of the usual design effort. Fortunately, Acorn came across papers published by a group of students who had developed the Berkeley RISC I processor. The processor design was simple, with no complex instructions to ruin the interrupt latency, and it was suggested that this kind of design was the way of the future. An engineering team at Acorn, led by Roger Wilson and Steve Furber, started development of the ARM, which stood for Acorn RISC Machine. The resulting microprocessor later became known as the ARM1, a prototype that was never sold commercially.

1.2 ARM Limited

The ARM1 processor was completed by 1985, and the first "real" production systems were launched with the ARM2 the following year. The ARM2 featured a 32-bit data bus and a 26-bit address bus, with 16 registers. The ARM2 was the simplest useful processor in the world, with only 30,000 transistors (compare the older Motorola 68000, with about 68,000). Much of this simplicity comes from not having microcode (which represents about 1/4 to 1/3

the size of the 68000) and from not including any cache. This simplicity led to excellent low-power characteristics, and yet the ARM2 performed better than the 286.

Soon after that, Acorn started working with Apple Computer on newer versions of the ARM core. The work was so important that in 1990 Acorn spun off the design team into a separate company named Advanced RISC Machines (keeping the ARM acronym). The first CPU produced by Advanced RISC Machines was the ARM6. The ARM6 design was a true 32-bit CPU with 32-bit addressing, while otherwise remaining similar to earlier models. The first models were released in 1991, and Apple used the ARM6-based ARM610 as the basis for its Apple Newton PDA.

Today the company is simply named ARM Limited, and it continues to be the leader in embedded RISC microprocessor technology. In 2002, ARM Limited was the leading provider of 32-bit embedded RISC microprocessors, with 75% of the market. ARM's success was due to:
- Common architecture
- High performance
- Low power consumption
- Low system cost

ARM provided solutions for:
- Embedded real-time systems for mass storage, automotive, industrial and networking applications
- Secure applications: smartcards and SIMs
- Open platforms running complex operating systems

1.3 Licensing Issues

ARM's business model is driven mainly by licensing its technology to other companies. The company does not fabricate its own microprocessors. Rather, it licenses the technology to companies such as Intel and Motorola, which use the ARM architecture in their own microprocessors and market them to consumers. ARM provides the following types of licenses.

Implementation License
This license scheme is the most popular and has been purchased by over a hundred companies around the world. It gives the licensee complete information to design and manufacture integrated circuits containing an ARM core. ARM provides the core as either a hard or a soft macrocell.
Hard cores are process- and technology-dependent implementations, whereas soft cores are delivered as HDL (Hardware Description Language) code. Soft cores can be used on various processes but are not optimized for any of them.

Foundry License
This license is targeted at fabless semiconductor vendors that develop and sell ARM core-based products manufactured by licensed companies. The license provides all the key elements and views needed to design an ARM-Powered system-on-chip.

Architecture License
The architecture license allows the licensee to develop its own CPU implementations compliant with ARM's Instruction Set Architecture. An architecture licensee must have extensive design resources and the highest level of implementation expertise. Intel is a very good example of a company developing

its own CPU based on ARM's ISA. The Intel XScale is based on ARM's ISA and provides additional instructions.

Academic License
The basic building blocks of the core are provided to allow simulation and the design of prototype parts for academic research. A core simulation environment can thus be created to support further research on the ARM architecture for academic purposes.

2 The ARM Architecture

The ARM architecture has been developed and improved over the years to supply the additional computing power required by current embedded systems while maintaining low power consumption and high code density. This section discusses the features of the ARM architecture and the features that were added in each major revision of the architecture.

2.1 Architecture definition

The architecture describes the rules for how the microprocessor will behave, without constraining or specifying how it will be built. The definition of the architecture specifies the interface with the outside world, which allows operating system, application and development support to be planned and implemented. In detail, the microprocessor architecture defines the following.

The processor's instruction set
The instruction set definition provides the list of instructions available to the programmer. In the ARM there are additional ISAs that complement the original ARM ISA. These include the Thumb instructions and the SIMD media instructions.

The programmer's model
The programmer's model defines the registers that are available to the programmer at any one time. It defines how the program counter is represented and how context switches affect the register sets. In the ARM architecture, a link register and a stack pointer are used for branching and context switching.

How the processor interfaces with its closest memory resources
A processor cannot be effective working on its own. To expand the capability of the processor core, support is provided for working with additional devices. The architecture specifies how the core interacts with additional devices such as coprocessors. It also defines the bus interfaces through which the core communicates.

2.2 Architecture Variants

ARMv1
ARMv1 was the first processor architecture developed at Acorn. The architecture was implemented in the ARM1 processor.
This first architecture was very simple, with only 26-bit addressing, so it was not a fully 32-bit processor. It had no multiply instructions and no coprocessor support.

ARMv2
The first commercial chip marketed by Acorn was the ARM2. The ARM2 featured a 32-bit data bus and a 26-bit address bus, with 16 registers. With only 30,000 transistors, it was one of the simplest useful processors in the world at that time. Unlike most CISC processors, the ARM2 does not use microcode. The ARMv2 architecture added multiply instructions with a 32-bit result and coprocessor support.

ARMv2a
Acorn introduced the ARM3 chip with an on-chip cache. The features added in ARMv2a include the atomic load-and-store (SWP) instruction and the use of coprocessor 15 as the system control coprocessor to manage the cache.

ARMv3
The ARM6 was the first processor developed and marketed after the team responsible for the ARM was spun off into Advanced RISC Machines in 1990. The ARM6 was sold as a macrocell, as a stand-alone processor and as an integrated CPU with an on-chip cache, MMU and write buffer. The ARM6 was used in the Apple Newton PDA. The ARMv3 architecture

had 32-bit addressing and separate CPSR and SPSRs, and it added the undefined and abort modes to allow coprocessor emulation and virtual memory support in supervisor mode.

The architectures discussed above have been replaced by newer architectures, which are the ones supported by most embedded systems today. The chart below shows the timeline of the introduction of the new architectures and of their implementations.

Figure 2-1 Architectural Timeline

ARMv4
ARMv4 adds signed and unsigned half-word and signed byte load and store instructions, and reserves some of the SWI space for architecturally defined operations. The system mode is introduced, and several unused corners of the instruction space are defined as undefined instructions for future use. ARMv4 also introduces the Thumb instruction set, which consists of 16-bit instructions. These instructions result in higher code density.

ARMv5
Version 5 of the ARM architecture improves ARM and Thumb instruction interworking, adds the count leading zeros (CLZ) instruction and introduces more architecture variants:
- E: enhanced DSP instructions, including saturated arithmetic operations and 16-bit multiply operations
- J: support for the new Java state, offering hardware and optimized software acceleration of byte code execution

ARMv6
ARMv6 is the most recent version of the architecture. ARMv6 includes all the TEJ enhancements, namely the Thumb instruction set, the DSP enhancements and Jazelle technology (Java). ARMv6 also includes improved memory management support, multiprocessing features and new media instructions. The media instructions, which are SIMD instructions, bring additions to the programmer's model.

2.3 General ISA features

The Berkeley RISC I architecture became the basis for the ARM processor. However, only certain elements were incorporated into the ARM architecture. The ARM architecture can be described as RISC with some CISC features. As a result, the ARM achieves power efficiency and a small core size while obtaining better code density than a pure RISC processor.

The following features of the Berkeley RISC design were incorporated into the ARM architecture:
- A load-store architecture. Memory is accessed only by load and store instructions. All other instructions operate only on registers.
- Fixed-length 32-bit instructions.
- 3-address instructions. A 32-bit instruction usually names 2 source registers and 1 destination register.

The following features of the Berkeley RISC design were not adopted.

Register windows
A register window is the range of registers that is visible at any one time. In the Berkeley RISC processors, there are 32 visible registers. On procedure entry and exit, the visible window moves to give access to a new set of registers, which removes the need to save the values of the previous registers to memory. This saves the time spent on memory accesses. However, it incurs the cost of implementing a large number of registers in hardware, which increases chip size and power consumption. To limit those costs, the ARM architecture makes only 17 registers visible at any one time.

Delayed branches
A delayed branch exposes the bubble, or extra cycle, that a branch instruction needs to decide whether the branch is taken. This extra cycle can be used to execute an instruction, usually one that is not affected by the outcome of the branch. Delayed branches were left out because they remove the atomicity of individual instructions, which interacts badly with multi-issue implementations of the ARM microprocessor.
Leaving them out also avoids more complex exception handling.

Single-cycle execution of all instructions
Single-cycle execution of all instructions is possible only if data and instructions are held in separate memories, so that both can be accessed at the same time to fetch an instruction and to read or write data. Earlier versions of the ARM use a single memory for both data and instructions, so instructions that access data in memory take multiple clock cycles. When more than one cycle is needed, the extra cycles are used to do other useful work, such as supporting auto-indexing addressing modes. This reduces the total number of ARM instructions needed to perform any sequence of operations, which improves performance and code density.

2.3.1 Programmer's Model

Figure 2-2 ARM's visible registers

The ARM has a 32-bit RISC processor core. It receives 32-bit instructions from the instruction memory. In total, the ARM has 37 32-bit integer registers. When writing user-level programs, only the 15 general-purpose 32-bit registers r0 to r14, the program counter (r15) and the current program status register (CPSR) need to be considered. The ARM supports 8-bit, 16-bit and 32-bit data types. User-level programs execute in User mode, whereas system-level software can execute in 5 other modes (fiq, svc, abort, irq and undefined).

The ARM itself is pipelined, so it uses ILP (instruction-level parallelism) to overlap the execution of instructions. Older ARM implementations used the von Neumann architecture (unified instruction and data memory). To increase memory bandwidth and lower memory latency, later implementations adopted the Harvard architecture.

2.3.2 The ARM coprocessor interface

The ARM supports a general-purpose extension of its instruction set through the addition of hardware coprocessors. The important features of the coprocessor architecture are as follows:
- There is support for up to 16 logical coprocessors.
- Each coprocessor can have up to 16 private registers of any reasonable size; they are not limited to 32 bits.
- Coprocessors use a load-store architecture, with instructions to perform internal operations on registers, instructions to load and save registers from and to memory, and instructions to move data to or from an ARM register.

Coprocessors communicate with the ARM core through a handshaking protocol. Each coprocessor instruction is passed to the coprocessor through this handshake before it is executed.
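The limits of 16 coprocessors and 16 registers per coprocessor follow from the 4-bit fields in the coprocessor instruction encodings. As a minimal sketch, the C function below packs the fields of a coprocessor register transfer (MRC or MCR) into a 32-bit instruction word. The function name `encode_cp_transfer` and its parameter names are hypothetical and chosen for illustration; the bit positions follow the standard ARM encoding of these two instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 32-bit encoding of a coprocessor
 * register transfer.
 *   to_arm = 1 gives MRC (coprocessor register -> ARM register)
 *   to_arm = 0 gives MCR (ARM register -> coprocessor register) */
uint32_t encode_cp_transfer(unsigned cond, int to_arm, unsigned cp_num,
                            unsigned opc1, unsigned rd,
                            unsigned crn, unsigned crm, unsigned opc2)
{
    /* Coprocessor numbers and register numbers are 4-bit fields,
     * hence at most 16 coprocessors with 16 registers each. */
    assert(cond < 16 && cp_num < 16 && rd < 16 && crn < 16 && crm < 16);
    assert(opc1 < 8 && opc2 < 8);

    return ((uint32_t)cond << 28)                /* condition field      */
         | (0xEu << 24)                          /* 1110: cp reg transfer */
         | ((uint32_t)opc1 << 21)                /* coprocessor opcode 1 */
         | ((uint32_t)(to_arm ? 1 : 0) << 20)    /* direction bit        */
         | ((uint32_t)crn << 16)                 /* coprocessor register */
         | ((uint32_t)rd << 12)                  /* ARM register         */
         | ((uint32_t)cp_num << 8)               /* coprocessor number   */
         | ((uint32_t)opc2 << 5)                 /* coprocessor opcode 2 */
         | (1u << 4)                             /* marks a register transfer */
         | (uint32_t)crm;                        /* second cp register   */
}
```

For example, `encode_cp_transfer(0xE, 1, 15, 0, 0, 1, 0, 0)` corresponds to `MRC p15, 0, r0, c1, c0, 0`, an unconditional read of a coprocessor 15 register into r0.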

2.3.3 Conditional Execution

In most RISC and CISC processors, conditional branches are used to skip instructions that should not be executed. The ARM instruction set has an unusual feature: every instruction (with the exception of Thumb instructions) is conditionally executed.

    int gcd(int i, int j)
    {
        while (i != j) {
            if (i > j)
                i -= j;
            else
                j -= i;
        }
        return i;
    }

            b      test
    loop    subgt  Ri,Ri,Rj
            suble  Rj,Rj,Ri
    test    cmp    Ri,Rj
            bne    loop

Conditional execution avoids branch instructions when generating code for small if statements. Its cost is that the condition field significantly cuts down the encoding space available for displacements in memory access instructions.

Figure 2-3 The ARM condition code field

Each instruction has a 4-bit condition code, so every instruction can be made conditional. An instruction mnemonic is made conditional by appending one of the defined two-letter suffixes (EQ, NE, GE, LT, GT, etc.).

2.3.4 Multiple Register Transfer Operation

The ARM multiple register transfer instructions allow any subset of the 16 registers visible in the current operating mode to be loaded from or stored to memory. These instructions are normally used on procedure entry and return to save and restore workspace registers. They are also useful for high-bandwidth memory block copy routines.

Figure 2-4 Multiple register data transfer instruction binary encoding
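To make the conditional execution scheme of Section 2.3.3 concrete, the following C sketch models how the 4-bit condition field is checked against the N, Z, C and V flags, and then replays the assembly GCD loop with that check. It is an illustrative model under stated assumptions, not a description of any particular core: the names `cmp_flags`, `cond_passed` and `gcd_model` are hypothetical, while the condition numbering (EQ = 0 through AL = 14) and the flag tests are the standard ARM definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard ARM condition codes, in encoding order (EQ = 0 ... AL = 14). */
enum cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

struct flags { bool n, z, c, v; };

/* Flags produced by CMP a, b: the subtraction a - b, result discarded. */
struct flags cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua - ub;
    struct flags f;
    f.n = (r >> 31) != 0;                        /* result is negative  */
    f.z = (r == 0);                              /* result is zero      */
    f.c = (ua >= ub);                            /* no borrow occurred  */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;   /* signed overflow     */
    return f;
}

/* Decide whether an instruction with condition cc executes. */
bool cond_passed(enum cond cc, struct flags f)
{
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;
    case CC: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && (f.n == f.v);
    case LE: return f.z || (f.n != f.v);
    case AL: return true;
    }
    return false;
}

/* Replays the assembly loop: only CMP sets the flags, so SUBGT and
 * SUBLE are both judged against the same comparison. */
int32_t gcd_model(int32_t i, int32_t j)
{
    assert(i > 0 && j > 0);
    for (;;) {
        struct flags f = cmp_flags(i, j);   /* test: cmp   Ri,Rj    */
        if (!cond_passed(NE, f))            /*       bne   loop     */
            break;
        if (cond_passed(GT, f)) i -= j;     /* loop: subgt Ri,Ri,Rj */
        if (cond_passed(LE, f)) j -= i;     /*       suble Rj,Rj,Ri */
    }
    return i;
}
```

The loop body contains no branch at all: exactly one of the two subtractions takes effect on each pass, which is the saving that conditional execution offers for small if statements.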

2.3.5 Fold shifts/rotates into ALU operation

Another unique feature of the ARM ISA is the ability to fold shift and rotate operations into a normal data processing instruction. Shifts and rotates can therefore be performed together with arithmetic, logical and register-to-register move instructions. The second operand can be shifted before it is processed and the result stored in the destination register. For example, a += (j << 2) can be rendered as a single instruction on the ARM. This makes an ARM program denser than would be expected from a typical RISC processor. Fewer instructions need to be fetched from memory, which reduces bandwidth consumption on the memory bus.

2.4 Operating system support

2.4.1 Coprocessor Number 15

Coprocessor number 15 (CP15) is needed on ARM CPUs used in embedded systems that require a full memory management unit with address translation capabilities. It expands the capability of the ARM core. CP15 is an on-chip coprocessor that controls the operation of the on-chip cache or caches, the memory management or protection unit, the write buffer, the prefetch buffer, the branch target cache and the system configuration signals.

2.4.2 MMU architecture

In general-purpose applications, where the range and number of application programs is unknown, the ARM CPU requires a memory management unit with address translation. The MMU translates virtual addresses into physical addresses. It also controls memory access permissions and aborts illegal accesses. The MMU architecture in a typical ARM processor uses a 2-level page table with table-walking hardware. A TLB stores recently used page translations for fast lookup. The MMU is accessed and controlled through the CP15 registers.

In the ARM MMU architecture, memory mapping is performed at several different granularities. The units are:
- Sections: 1 MB blocks of memory.
- Large pages: 64 KB blocks of memory. Access control is applied to individual 16 KB subpages.
- Small pages: 4 KB blocks of memory. Access control is applied to individual 1 KB subpages.
- Tiny pages: 1 KB blocks of memory.

The ARM MMU architecture introduces domains, which are groups of sections and/or pages that have particular access permissions. Domains enable a number of different processes to run with the same translation tables while remaining protected from each other. Each process therefore need not have its own translation tables.

2.4.3 Synchronization

When a system runs multiple processes that share data structures, there must be a control mechanism to ensure correct behaviour when two or more processes want to read or write the data at the same time. For example, if process A wants to increment X, it has to read the value of X and then write the incremented value back to X. If the operating system interrupts A between the read and the write and allows process B to read X and write a new value to it, X ends up with an incorrect value, because A then overwrites B's update using the stale value it read earlier. What should happen is that only

one process can access the variable at any one time. The other process must wait until no other process is accessing the data. In other words, mutually exclusive access is required: some sort of lock is needed to prevent another process from accessing the data before the current process has finished its operation.

The ARM architecture supports synchronization by providing a swap (SWP) instruction. The instruction is similar to an atomic test-and-set instruction, so it is uninterruptible. A register is set to the "busy" value, and this register is then swapped with the memory location containing the lock. If the loaded value is "free", the process can continue. If it is "busy", the process spins on the lock until it gets the "free" result.

Why do other mainstream processors not use this approach? The swap instruction accesses memory every time it executes. If a process spins on a lock, it takes up a lot of memory bandwidth and so impedes performance.

2.4.4 Context switching

A process runs in a context, which consists of all the state (variables, state registers) that the process needs to run properly. This state includes the values of all the processor's registers, including the program counter, the stack pointer and so on. When a process switch takes place, the context of the old process must be saved and that of the new process restored. The architectural support for register saving and restoring on the ARM recognizes the difficulty of saving and restoring user registers from a privileged mode, and provides special instructions to assist in this task. Code running in a non-user mode can therefore save and restore the user registers from an area of memory addressed by a non-user mode register.

2.5 AMBA Interface

AMBA is an on-chip bus specification that details a strategy for the interconnection and management of the functional blocks that make up a System-on-Chip (SoC). It is an open standard, so the specification is available to the public.
It allows the developer to achieve a first-time-right result when combining and integrating one or more CPUs or signal processors with multiple peripherals. IP (Intellectual Property) developers can thus develop their own products without having to worry about connectivity. AMBA promotes a reusable design methodology by defining a common backbone for SoC modules.

2.6 ARMv4T

2.6.1 Thumb Instruction Set

The Thumb instruction set addresses the issue of code density. It is a compressed form of a subset of the ARM instruction set. Thumb instructions map onto ARM instructions, and the Thumb programmer's model maps onto the ARM programmer's model. The ARM instruction pipeline contains a dynamic decompression stage, so Thumb instructions execute as normal ARM instructions. Decompression does not affect performance because the expansion is done by dedicated hardware within the chip.
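As a rough illustration of the mapping from Thumb onto ARM instructions, the C sketch below expands one Thumb instruction format, ADD Rd, #imm8, into the ARM instruction that has the same effect, ADDS Rd, Rd, #imm8 with the "always" condition. The function name `thumb_add_imm8_to_arm` is hypothetical, and the sketch covers only this single format; the real decompressor is hardware and handles the whole Thumb instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expand the 16-bit Thumb "ADD Rd, #imm8" into
 * the 32-bit ARM "ADDS Rd, Rd, #imm8" (condition AL).
 * Returns 0 for any other Thumb instruction, which this sketch
 * does not handle. */
uint32_t thumb_add_imm8_to_arm(uint16_t thumb)
{
    /* Bits 15:11 = 00110 identify ADD Rd, #imm8. */
    if ((thumb & 0xF800u) != 0x3000u)
        return 0;

    uint32_t rd   = (thumb >> 8) & 0x7u;   /* 3-bit register: r0..r7 only */
    uint32_t imm8 = thumb & 0xFFu;         /* 8-bit immediate             */

    /* 0xE29 = cond AL, immediate operand, opcode ADD, S bit set.
     * The 2-address Thumb form becomes a 3-address ARM form by using
     * Rd as both the first source and the destination register. */
    return 0xE2900000u | (rd << 16) | (rd << 12) | imm8;
}
```

The expansion shows two properties of Thumb described in this section: the 3-bit register field can name only r0 to r7, and the implicit condition and flag-setting behaviour of the short form become explicit fields in the ARM form.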

Figure 2-5 ARM and Thumb visible registers

In the Thumb programmer's model, there are only 8 generally visible registers, which map onto registers r0 to r7 of the ARM programmer's model. The use of register r13 as the stack pointer is purely a software convention in the ARM programmer's model; in Thumb, however, there is no choice, because this use is hardwired. The CPSR determines the mode of operation (Thumb or ARM). The mode is switched by executing a Branch and Exchange (BX) instruction.

Similarities to ARM:
- Load-store architecture
- Support for 8-bit byte, 16-bit half-word and 32-bit word data types
- 32-bit unsegmented memory

Differences from ARM:
- Most Thumb instructions are executed unconditionally
- Data processing instructions use a 2-address format instead of the 3-address format of the ARM
- Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding

Results (typical case):
- Thumb code requires 70% of the space of ARM code
- Thumb code uses 40% more instructions
- Thumb code uses 30% less external memory power than ARM code

The downside of using Thumb is the overhead of switching between the 32-bit and 16-bit instruction sets. When a Branch and Exchange (BX) instruction is executed, the whole pipeline is flushed.
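The state switch performed by BX can be sketched in a few lines of C. In this minimal model, bit 0 of the branch target selects the instruction set and is copied into the T bit of the CPSR (bit 5), while the remaining bits become the new program counter. The structure `cpu_state` and the function `branch_exchange` are hypothetical names for illustration, and the model ignores the pipeline flush and everything else a real core does.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_T (1u << 5)   /* T bit of the CPSR: 1 = Thumb, 0 = ARM */

/* Hypothetical, heavily reduced processor state. */
struct cpu_state {
    uint32_t pc;
    uint32_t cpsr;
};

/* Model of BX Rm, where target holds the value of Rm. */
void branch_exchange(struct cpu_state *cpu, uint32_t target)
{
    assert(cpu != 0);
    if (target & 1u)
        cpu->cpsr |= CPSR_T;     /* odd target: continue in Thumb state */
    else
        cpu->cpsr &= ~CPSR_T;    /* even target: continue in ARM state  */
    cpu->pc = target & ~1u;      /* bit 0 is not part of the address    */
}
```

Because the instruction set is selected by the target address itself, a caller can return to either ARM or Thumb code with the same BX instruction.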

2.7 ARMv5

2.7.1 ARM DSP Extensions

Dedicated DSP processors and coprocessors usually consume too much power and require additional silicon area. The ARM processor has been found to be suitable even for DSP calculations. The DSP extensions were introduced to broaden the suitability of the ARM CPU family for applications that require intensive signal processing, while the processor core retains the power efficiency of a high-performance RISC.

The ARM DSP extensions feature:
- Single-cycle 16x16 and 32x16 MAC implementations
- Zero-overhead saturation extension support
- New instructions to load and store pairs of registers, with enhanced addressing modes
- A new CLZ instruction, which improves normalization in arithmetic operations and improves divide performance
- Full support in the ARMv5TE and ARMv6 architectures

Applications:
- Audio encode/decode (MP3, AAC, WMA)
- MPEG4 decode
- Voice and handwriting recognition
- Embedded control
- Bit-exact algorithms (GSM-AMR)

2.7.2 Jazelle

Java software acceleration
Optimized Java virtual machines typically offer sufficient memory efficiency. However, they cannot provide adequate performance for high-end applications unless a high-performance processor is used, in which case cost and power constraints cannot be met. JIT compilers bypass the Java Virtual Machine (JVM) for much of the byte code interpretation. Typical JIT compilers are more than 100 KB in size, which takes up a large amount of memory. A JIT compiler is also slow to start, resulting in pauses and disruptions to user input.

Java hardware acceleration
Dedicated Java processors bring significant overhead and additional integration and development complexity. These processors are dedicated to Java execution and must work alongside other processors to support existing applications. Java coprocessors translate Java byte code into the instructions of the existing core. Coprocessors require extra space for their gates and extra power to operate.
They tend to run slowly because they are loosely coupled to the core processor.

The ARMv5 architecture added a third instruction set, Java byte code, to the ISA. The Jazelle extension was added to support Java acceleration technology, which is particularly suited to designs with a small memory footprint. Along with the new instruction set comes additional instruction set support for entering and exiting Java applications, real-time interrupt handling, and debug support for mixed Java/ARM applications. The Jazelle technology reuses all existing processor resources, without the need to re-engineer the existing architecture or to add cost, power or memory resources:
- The J-bit is set in the CPSR to mark the mode of operation

- All processor state related to Java execution is stored in the normal ARM register set
- Any interrupt routine that saves state on entry and restores it on exit is compatible with Jazelle

In most systems the JVM is implemented in software and therefore runs dramatically slower than a hardware implementation. The Jazelle technology implements the JVM in hardware. To reduce die size and improve performance, Jazelle is implemented in the ARM pipeline as an FSM (Finite State Machine) rather than as a traditional microcoded engine. Surprisingly, the hardware logic amounts to only 12,000 gates.

2.8 ARMv6

In summary, ARMv6 provides the following improvements over previous ARM architectures:
- Media processing extensions
  o 2x faster MPEG4 encode/decode
  o 2x faster audio DSP
- Improved cache architecture
  o Physically addressed caches
  o Reduction in cache flush/refill
  o Reduced overhead in context switches
- Improved exception and interrupt handling
  o Important for improving performance in real-time tasks
- Unaligned and mixed-endian data support
  o Simpler data sharing and application porting, and memory savings

The architecture includes all the Thumb, DSP extension and Jazelle enhancements. The following are the new extensions that were added in the ARMv6 architecture.

2.8.1 The Media Instructions

The media instructions enable more efficient software implementations of high-performance media applications. Over 60 SIMD instructions have been added to the architecture. The SIMD instructions provide performance improvements of between 2x and 4x, depending on the multimedia application. They support operations on four 8-bit or two 16-bit values at once, including parallel add and subtract, selection, packing and unpacking. The new ISA also supports dual 16-bit multiply-add and multiply-subtract operations. Because one 32-bit register can now contain up to four 8-bit values, new status bits have to be defined for each of those values.
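As a rough illustration of why per-value status bits are needed, the C sketch below models a four-lane unsigned byte add in the style of the ARMv6 UADD8 instruction, together with a byte selection in the style of SEL. Each lane records its own carry in one of four "greater than or equal" bits, which ARMv6 calls GE[3:0]. The function names and signatures are assumptions made for this sketch; on a real core the GE bits live in the CPSR rather than in a separate variable.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a UADD8-style operation: add the four bytes of a and b
 * lane by lane. Bit n of *ge is set when lane n produced a carry. */
uint32_t uadd8(uint32_t a, uint32_t b, unsigned *ge)
{
    uint32_t result = 0;
    *ge = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t sum = ((a >> (8 * lane)) & 0xFFu)
                     + ((b >> (8 * lane)) & 0xFFu);
        result |= (sum & 0xFFu) << (8 * lane);   /* keep the low 8 bits */
        if (sum >= 0x100u)
            *ge |= 1u << lane;                   /* per-lane status bit */
    }
    return result;
}

/* Model of a SEL-style operation: for each lane, take the byte from a
 * when the lane's GE bit is set, otherwise take the byte from b. */
uint32_t sel(uint32_t a, uint32_t b, unsigned ge)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t mask = 0xFFu << (8 * lane);
        result |= ((ge >> lane) & 1u) ? (a & mask) : (b & mask);
    }
    return result;
}
```

A single carry flag could not describe the outcome of four independent additions, which is why the status has to be kept per lane. The selection step then lets later code choose between two sets of bytes without any branches.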
Six new status bits have been added to the programmer's model:
- GE[3:0] bits
  o SIMD status bits ("greater than or equal to") for each 8/16-bit slice
- E-bit
  o Indicates the current load/store endian setting of the core; it can be set and cleared with the SETEND instruction
- A-bit
  o Indicates whether imprecise data abort exceptions are masked

ARMv5TE: 5 cycles in a single-cycle implementation

    SMULTT  Real,Ra,Rb       ;Real = Ra.real*Rb.real
    SMULBB  Temp,Ra,Rb       ;Temp = Ra.imag*Rb.imag
    SUB     Real,Real,Temp   ;Real = Ra.real*Rb.real - Ra.imag*Rb.imag
    SMULTB  Imag,Ra,Rb       ;Imag = Ra.real*Rb.imag
    SMLABT  Imag,Ra,Rb,Imag  ;Imag = Ra.real*Rb.imag + Ra.imag*Rb.real

ARMv6: 2 cycles in a single-cycle implementation

    SMUSD   Real,Ra,Rb       ;Real = Ra.real*Rb.real - Ra.imag*Rb.imag
    SMUADX  Imag,Ra,Rb       ;Imag = Ra.real*Rb.imag + Ra.imag*Rb.real

Figure 2-6 16-bit Complex Multiply

The example above shows how code density can be improved on the ARMv6 architecture while the number of clock cycles needed to perform the same operation is reduced.
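The arithmetic performed by the two ARMv6 instructions in the complex multiply can be written out in C. The sketch below is a model under stated assumptions rather than ARM's definition: the helper names are hypothetical, and each complex number is assumed to be packed with its real part in the bottom halfword and its imaginary part in the top halfword, which is the packing under which SMUSD yields real*real - imag*imag directly. SMUSD subtracts the product of the top halfwords from the product of the bottom halfwords, and SMUADX adds the two cross products.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the bottom or top halfword of a 32-bit register value. */
static int32_t lo16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

static int32_t hi16(uint32_t x)
{
    int32_t v = (int32_t)(x >> 16);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Assumed packing: real part in bits 15:0, imaginary part in bits 31:16. */
uint32_t pack_complex(int16_t re, int16_t im)
{
    return (uint32_t)(uint16_t)re | ((uint32_t)(uint16_t)im << 16);
}

/* SMUSD-style: bottom*bottom - top*top. This cannot overflow 32 bits. */
int32_t smusd(uint32_t a, uint32_t b)
{
    return lo16(a) * lo16(b) - hi16(a) * hi16(b);
}

/* SMUADX-style: bottom*top + top*bottom. */
int32_t smuadx(uint32_t a, uint32_t b)
{
    int64_t sum = (int64_t)lo16(a) * hi16(b) + (int64_t)hi16(a) * lo16(b);
    /* The sum overflows only when all four halfwords are -32768; the
     * result then wraps. The real instruction also sets the Q flag in
     * that case, which this sketch omits. */
    if (sum > INT32_MAX)
        return INT32_MIN;
    return (int32_t)sum;
}
```

With this packing, (3 + 4i) * (1 + 2i) gives smusd = 3*1 - 4*2 = -5 and smuadx = 3*2 + 4*1 = 10, which is the expected product -5 + 10i.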

    Architecture    Cycles/4 pixels
    ARMv5TE         18 cycles
    ARMv6           3 cycles

Figure 2-7 Implementing Sum of Absolute Differences

The table above shows that the Sum of Absolute Differences operation takes just 3 clock cycles on the ARMv6 architecture, compared with 18 clock cycles on ARMv5TE.

2.8.2 Thumb-2

The Thumb instruction set is an extension to the 32-bit ARM architecture that enables very high code density. This density, however, typically comes at the expense of performance, because code has to switch from Thumb back to ARM mode for operations that cannot be performed in Thumb mode. In addition, although a single Thumb instruction is equivalent to a single ARM instruction, more 16-bit Thumb instructions are needed to accomplish the same overall function.

The ARM Thumb-2 core technology:
- Introduces new 16-bit Thumb instructions for improved program flow
- Provides new 32-bit Thumb instructions derived from their ARM instruction equivalents. These instructions include coprocessor access, privileged instructions and other special functions (SIMD)

The ARM 32-bit ISA has also been improved. Thumb instructions are thus no longer limited to 16-bit instructions but include 32-bit instructions too.

The performance of Thumb-2 technology:
- Performance similar to code based on the ARM ISA
- 5 percent smaller than Thumb high-density code
- 2-3 percent faster than Thumb high-density code

2.9 Summary

    Architecture  Thumb  DSP  Jazelle  Media  TrustZone  Thumb-2
    v4T           x
    v5TE          x      x
    v5TEJ         x      x    x
    v6            x      x    x        x
    v6Z           x      x    x        x      x
    v6T2          x      x    x        x                 x

T: Thumb, E: DSP instructions, J: Jazelle, T2: Thumb-2

3 ARM Implementations

The principal current ARM processor core products offer a choice of cost, complexity and performance points from which the most effective solution can be selected. Each core is intended to be embedded in a CPU design or a microcontroller unit.

Core                                      Architecture
ARM1                                      v1
ARM2                                      v2
ARM2aS, ARM3                              v2a
ARM6, ARM600, ARM610, AMULET1, AMULET2    v3
ARM7, ARM700, ARM710                      v3
ARM7TDMI, ARM710T, ARM720T, ARM740T       v4T
StrongARM, ARM8, ARM810                   v4
ARM9TDMI, ARM920T, ARM940T, AMULET3       v4T
ARM9E-S                                   v5TE
ARM10TDMI, ARM1020E, XScale               v5TE
ARM11                                     v6

Figure 3-1 ARM Architecture Implementations

3.1 ARM7TDMI

The ARM7 family is now the lowest-end ARM core family and is used in personal audio players, entry-level wireless handsets and two-way pagers. The ARM7TDMI evolved from the ARM6, which was the first core to implement the 32-bit address space programming model. The name ARM7TDMI stands for:

- ARM7: a 3 volt, 32-bit integer core
- T: Thumb 16-bit instruction set
- D: on-chip Debug support, to halt the processor in response to a debug request
- M: an enhanced Multiplier, yielding a 64-bit result instead of a 32-bit result
- I: EmbeddedICE hardware, to provide breakpoint and watchpoint support

The ARM7TDMI uses a 3-stage pipeline with a multicycle execution stage:

- Fetch: the instruction is fetched from memory and placed in the instruction pipeline
- Decode: the instruction is decoded and the datapath control signals are prepared
- Execute: the register bank is read, an operand is shifted, and the ALU result is generated and written back to a destination register

The ARM7TDMI register bank has two read ports and one write port. One additional read port and one additional write port are provided to give special access to r15, the program counter. Thus, incrementing the program counter does not limit the number of registers that can be read or written.

The following interfaces are supported:

- Memory interface
- MMU interface
- Coprocessor interface
- Debug interface
- JTAG interface

Process         0.35 µm
Transistors     74,209
MIPS            60
Metal layers    3
Core area       2.1 mm²
Power           87 mW
Vdd             3.3 V
Clock           0-66 MHz
MIPS/W          690

Figure 3-2 ARM7TDMI characteristics

3.2 ARM9TDMI

The ARM9TDMI represents a major improvement over the ARM7TDMI. It adopts a 5-stage pipeline, in place of the 3-stage pipeline of the ARM7TDMI, to increase the maximum clock rate. Separate instruction and data memory ports allow an improved CPI (clock cycles per instruction); in other words, the ARM9TDMI adopts a Harvard-style architecture.

The improvements in the ARM9TDMI owe a lot to the StrongARM pipeline, which is broadly similar. There are, however, two major differences from the StrongARM pipeline:

- The StrongARM has a dedicated branch adder which operates in parallel with the register read stage. The ARM9TDMI uses the main ALU for branch target calculations. This costs an additional clock cycle for a taken branch but yields a smaller and simpler core.
- The StrongARM was designed for a particular process technology where the timing paths could be carefully managed. The ARM9TDMI is more flexible and readily portable to new processes.

Thumb instruction decoding differs from the ARM7TDMI: hardware now decodes both ARM and Thumb instructions directly, and Thumb instructions are no longer converted to ARM instructions. The ARM9TDMI employs a static branch prediction scheme.

Although the core has a Harvard-style architecture, a single unified memory can still be used. Doing so, however, requires a complex high-speed memory subsystem, particularly the caches, which complicates the implementation and draws more power. Most embedded systems therefore provide separate instruction and data memories.

The ARM9E-S is a synthesizable version of the ARM9TDMI core and is 30% larger than the ARM9TDMI on the same process. It occupies 2.7 mm² on a 0.25 µm CMOS process.
Figure 3-3 ARM7TDMI and ARM9TDMI pipeline comparison

The figure above shows how the 3-stage pipeline of the ARM7 is reorganized into the 5-stage pipeline of the ARM9. The longer pipeline allows the clock frequency to be doubled on the same process technology. Separate stages are now allocated to data memory access and register write-back. Thus, the next instruction can be executed while the current instruction reads from or writes to memory. Instructions which do not require memory access still take 5 clock cycles to complete, although nothing is done in the Memory stage.

Process         0.25 µm
Transistors     111,000
MIPS            220
Metal layers    3
Core area       2.1 mm²
Power           150 mW
Vdd             2.5 V
Clock           [...] MHz
MIPS/W          1500

Figure 3-4 ARM9TDMI characteristics

3.3 ARM11

The ARM11 microarchitecture is the basis of the new family of ARM11 cores. It is also the first to implement the ARMv6 instruction set architecture. It was developed to meet the needs of next-generation wireless and portable consumer products at low power and low cost.

The ARM11 currently supports cache sizes of 4-64 KB. With the first cores in the family ranging from [...] MHz, the microarchitecture represents a major step in system performance. Future versions will achieve over 1 GHz. The ARM11 allows developers to trade off performance against power to match the particular application.

The ARM11 is available in both synthesizable and semi-custom hard macrocell implementations. With the synthesizable version, developers can target their own semiconductor processes. The hard macrocells, in contrast, are aimed at the highest-performance, speed-sortable applications, and each is optimized for one particular process.

The ARM11 cores have a synthesis-friendly pipeline structure, so the HDL implementation of the ARM11 is designed to work with commercially available synthesis tools. Additionally, the microarchitecture features:

- Thumb / Thumb-2: code compression
- Enhanced DSP: DSP processing
- Jazelle: Java acceleration

Figure 3-5 ARM11 pipeline organization

The new microarchitecture has an 8-stage pipeline, resulting in 40% higher throughput compared to previous cores. An 8-stage pipeline can impair efficiency by introducing excessive delay or latency in the system, so extensive use is made of forwarding within the pipeline. Delays in the pipeline are also avoided by using branch prediction to predict the flow of instructions. The result of these optimizations is the same effective latency as the 5-stage pipeline found in the ARM9 family cores.

Two prediction schemes are used in the ARM11:

- Dynamic branch predictor: a 64-entry, 4-state branch target address cache is maintained and holds the majority of the most recent branches. If the branch has been encountered before, a prediction is made based on its previous outcome.
- Static branch predictor: used when the dynamic branch predictor cannot find a record of the branch instruction. If the branch goes backwards, the predictor assumes it is a loop and takes the branch. If the branch is a forward branch, the branch is not taken.

As seen from Figure 3-5, the ARM11 deploys separate datapaths for ALU, multiply-accumulate (MAC) and load/store (LS) instructions. This enables out-of-order completion: slow operations continue processing while independent instructions proceed in parallel. Thus, if there is a stall in the load/store pipeline, data processing can still continue in the ALU and MAC datapaths.

The ARM11 has improved memory access. The core supports non-blocking and hit-under-miss operation in the memory system. In a simple pipeline, a cache miss stalls the pipeline whenever an instruction requests data that is not yet available. On the ARM11, a miss is a non-blocking operation; the pipeline is stalled only when three successive misses are encountered.

A full 64-bit processor is still considered excessive in terms of power and area for the embedded systems market. However, the ARM11 uses 64-bit data paths without the need for a fully 64-bit processor implementation.
A 64-bit data bus connects the processor integer unit to the instruction and data caches, and the coprocessors to the integer unit. Thus, two 32-bit instructions can be fetched in a single clock cycle. Additionally, load- and store-multiple instructions can transfer 64 bits (two ARM registers) every cycle. In effect, the ARM11 achieves 64-bit performance where it matters, but at a 32-bit cost.

Process         0.13 µm
Core area       2.7 mm²
Power           150 mW
Vdd             1.2 V
Clock           [...] MHz
mW/MHz          0.4

Figure 3-6 ARM11 characteristics
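The two ARM11 branch prediction schemes described in this section can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the ARM11 implementation: the "4 state" entries are modelled as 2-bit saturating counters, the 64-entry cache is assumed to be direct-mapped on the word address of the branch, and the class and method names are hypothetical.

```python
class BranchPredictor:
    """Toy model: 64-entry dynamic predictor with a static fallback."""

    ENTRIES = 64  # size given in the text; the indexing scheme is an assumption

    def __init__(self):
        # index -> (branch address, 2-bit counter); counter values 2 and 3 mean "taken"
        self.table = {}

    def _index(self, pc):
        # Assumed direct mapping on the word address of the branch
        return (pc >> 2) % self.ENTRIES

    def predict(self, pc, target):
        """Return True if the branch at pc is predicted taken."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1] >= 2          # dynamic: follow the recorded history
        return target < pc                # static: backward taken, forward not taken

    def update(self, pc, taken):
        """Record the actual outcome of the branch at pc."""
        index = self._index(pc)
        entry = self.table.get(index)
        if entry is not None and entry[0] == pc:
            counter = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)
        else:
            counter = 2 if taken else 1   # new entry starts in a weak state
        self.table[index] = (pc, counter)


predictor = BranchPredictor()
print(predictor.predict(0x100, 0x080))  # True: unseen backward branch, assumed to be a loop
print(predictor.predict(0x100, 0x200))  # False: unseen forward branch
predictor.update(0x100, True)
print(predictor.predict(0x100, 0x200))  # True: history now overrides the static rule
```

The static rule only matters the first time a branch is seen, or after its entry has been displaced; once a history exists, the direction of the branch is ignored.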

3.4 XScale

XScale is a microprocessor core developed by Intel. The initiative began when Intel took over Digital Semiconductor in 1998, thereby inheriting the StrongARM CPU. The StrongARM had been developed by Digital Equipment Corporation in collaboration with ARM Limited, with the objective of creating a high-end processor with much higher performance.

Figure 3-7 Intel XScale Core Architecture Features

The XScale fully implements the integer instruction set architecture of the ARMv5TE. Thus, the XScale supports the Thumb ISA as well as the DSP-enhanced operations. The core features a 7-stage integer pipeline, in contrast to the 5-stage pipeline of the StrongARM processor. The processor has 32 KB, 32-way set-associative instruction and data caches.

A 128-entry Branch Target Buffer (BTB) is used to predict the outcome of branch-type instructions. The buffer stores the target addresses of branch-type instructions and predicts the next address to present to the instruction cache when the current instruction address is that of a branch.

Similar to the ARM11, a hit-under-miss feature allows execution to continue even while a cache miss is being processed. Improving on the StrongARM, a debug unit for use with Multi-ICE is implemented to support breakpoints and traces.

The XScale core provides a few extensions to the existing ARMv5 architecture to support the demands of the increasingly demanding embedded systems market:

- A DSP coprocessor (CP0) is added to increase the performance and precision of audio processing algorithms. It contains a 40-bit accumulator and 8 new instructions.
- The page table descriptors gain an extra bit: the existing C and B bits are extended with an additional X bit. A P bit is also added to the first-level descriptors to allow an ASSP to identify a new memory attribute.
- Additional functionality has been added to coprocessor 15. CP15 configures the MMU, caches, buffers and other system attributes.
- Coprocessor 14 is created. CP14 contains the performance monitor registers and the trace buffer registers.
- Other enhancements were made to the Event Architecture, covering instruction cache and data cache parity error exceptions, breakpoint events and imprecise external data aborts.

Figure 3-8 Intel XScale Pipeline Organization

The longer pipeline of the XScale architecture has several disadvantages:

- Larger branch misprediction penalty. If the prediction is incorrect, a penalty of 4 cycles is imposed, in contrast with only 1 cycle for the StrongARM.
- Larger load-use delay. When a value is loaded from memory, the next instruction cannot use it immediately. The resulting bubbles must be filled by an optimizing compiler so that time is not wasted stalling the whole pipeline.
- Certain instructions (LDM, STM) incur a few extra cycles of delay on the Intel XScale core compared to StrongARM processors.
- Decode and register file lookups are spread over 2 cycles in the Intel XScale core, instead of 1 cycle in its predecessors.

The pipeline above, which is similar to that of the ARM11, is able to execute memory, MAC and data processing instructions in parallel. Although instructions are issued in order, the main execution, memory and MAC pipelines have different execution times, so instructions may finish out of order. Register scoreboarding is used in the MAC pipeline. A register dependency occurs when a previous MAC or load instruction is about to modify a register whose value has not yet been returned to the register file. Only the destinations of MAC operations and memory loads are scoreboarded.

The table below summarizes the features and comparisons among the ARM implementations.

3.5 AMULET

Synchronous design has been bogged down by many problems. As the size of a design increases, it becomes very hard to synchronize the various components on silicon. Because a synchronous design depends on an externally supplied clock, it is very important for each component to get its timing right. Delays due to large distances on silicon can compromise the validity of data on the wires. The practical problems incurred by clocked designs are:

- Clock skew occurs when different components receive the clock signal at different times because of their different distances from the clock source. This leads to circuit malfunction if clock frequencies keep increasing.
- Higher clock rates lead to excessive power consumption.
- Electromagnetic interference is caused by the global synchrony of circuits. This can impede the performance and functionality of the circuit which the chip is designed to control.

For these reasons, asynchronous techniques have been explored. The AMULET processor cores are fully asynchronous implementations of the ARM architecture. They are self-timed and operate without any externally supplied clock. The AMULET was developed at the University of Manchester as a research project in asynchronous design. The benefits of the AMULET are:

- Clock skew is non-existent because a global clock is not used.
- Transitions occur in the circuit only in response to a request to carry out useful work. The continuous drain caused by the clock signal is avoided, so power savings can be achieved.
- The circuit within an asynchronous design emits less electromagnetic radiation, because the internal activity within the chip is less coherent.
- A synchronous chip is designed to perform under worst-case conditions. With an asynchronous design, there is the potential to achieve typical-case performance.

AMULET3

Figure 3-9 AMULET3

AMULET3 is the third-generation asynchronous ARM processor from the AMULET family. It is a fully functional microprocessor with support for interrupts and memory faults. The chip supports ARM architecture version 4T, including the 16-bit Thumb instruction set. One of the objectives of the project is to achieve compatibility with the ARM9TDMI, but with an asynchronous design.

Features include:

- 15% fewer cycles/instruction than AMULET2
- Low latency load/store with asynchronous out-of-order completion
- Unrestricted register forwarding
- Branch target prediction and branch fetch suppression
- Very low power "sleep" mode
- Dual ("Harvard") bus interface
- 0.35 µm, 3-layer metal process

Figure 3-10 AMULET3 Pipeline

The 6 pipeline stages of the AMULET3 are as follows:

- Prefetch: instruction prefetch unit, which also includes a branch target buffer (BTB)
- Decode & Register read: instruction decode (ARM and Thumb), register read and forwarding stage
- Execute: execute stage, which includes the shifter, multiplier and ALU
- Data Interface: data memory interface
- Reorder Buffer: the reorder buffer
- Register Write: register result write-back stage

All of the above components operate autonomously. The performance of the microarchitecture is provided below:

Process         0.35 µm
Transistors     113,000
MIPS            120
Metal layers    3
Core area       3 mm²
Power           154 mW
Vdd             3.3 V
Clock           none
MIPS/W          [...]
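The MIPS/W rows in the characteristics tables follow from the MIPS and power rows. As a rough cross-check, the Python sketch below recomputes the ratio from the MIPS and power figures listed in this report for three of the cores. The function name mips_per_watt is hypothetical, and the results are simple arithmetic on the quoted numbers, not measured values.

```python
def mips_per_watt(mips, power_mw):
    """Power efficiency in MIPS per watt, given MIPS and power in milliwatts."""
    return mips / (power_mw / 1000.0)


# (MIPS, power in mW) as listed in the characteristics tables of this report
cores = {
    "ARM7TDMI": (60, 87),
    "ARM9TDMI": (220, 150),
    "AMULET3": (120, 154),
}

for name, (mips, power_mw) in cores.items():
    print(f"{name}: {mips_per_watt(mips, power_mw):.0f} MIPS/W")
# ARM7TDMI: 690 MIPS/W
# ARM9TDMI: 1467 MIPS/W
# AMULET3: 779 MIPS/W
```

The ARM7TDMI result matches the 690 MIPS/W listed in Figure 3-2, and the ARM9TDMI result is consistent with the rounded 1500 MIPS/W in Figure 3-4.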

4 Conclusion

This report has provided an insight into the nature of the ARM processor and how its implementations work, specifically for embedded applications. The ARM architecture addresses the needs of embedded systems and sits at a completely different level from desktop and workstation microprocessors (e.g. Pentium, Itanium, SPARC). ARM Limited licenses its cores and architectures to other companies, which either integrate the cores into their own products or manufacture a new chip based on the licensed architecture. ARM does not fabricate its own chips.

The different architecture versions have been discussed, and their support for various software issues has been analysed. Finally, the various implementations of the architecture were examined, particularly the ARM7TDMI, ARM9TDMI, ARM11 and the Intel XScale, and comparisons were made among them. An overview of AMULET, the research implementation of the ARM architecture, was also given, together with the motivation for why asynchronous microprocessor design may be the way of the future.

The ARM processor combines the benefits of RISC architectures with some far from trivial components. The right combination of architectural features and the simplicity of the ARM core have made ARM cores among the most widely used processor cores in the embedded systems world. The ARM has evolved with the needs of the market, particularly the media market (with the advent of the ARMv6). There are still more challenges ahead, and it remains to be seen what other variants of the ARM will be revealed in the future.

ARM Microprocessor and ARM-Based Microcontrollers

ARM Microprocessor and ARM-Based Microcontrollers ARM Microprocessor and ARM-Based Microcontrollers Nguatem William 24th May 2006 A Microcontroller-Based Embedded System Roadmap 1 Introduction ARM ARM Basics 2 ARM Extensions Thumb Jazelle NEON & DSP Enhancement

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

An Introduction to the ARM 7 Architecture

An Introduction to the ARM 7 Architecture An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new

More information

ARM Architecture. ARM history. Why ARM? ARM Ltd. 1983 developed by Acorn computers. Computer Organization and Assembly Languages Yung-Yu Chuang

ARM Architecture. ARM history. Why ARM? ARM Ltd. 1983 developed by Acorn computers. Computer Organization and Assembly Languages Yung-Yu Chuang ARM history ARM Architecture Computer Organization and Assembly Languages g Yung-Yu Chuang 1983 developed by Acorn computers To replace 6502 in BBC computers 4-man VLSI design team Its simplicity it comes

More information

Exception and Interrupt Handling in ARM

Exception and Interrupt Handling in ARM Exception and Interrupt Handling in ARM Architectures and Design Methods for Embedded Systems Summer Semester 2006 Author: Ahmed Fathy Mohammed Abdelrazek Advisor: Dominik Lücke Abstract We discuss exceptions

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution

EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

Design Cycle for Microprocessors

Design Cycle for Microprocessors Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types

More information

The ARM Architecture. With a focus on v7a and Cortex-A8

The ARM Architecture. With a focus on v7a and Cortex-A8 The ARM Architecture With a focus on v7a and Cortex-A8 1 Agenda Introduction to ARM Ltd ARM Processors Overview ARM v7a Architecture/Programmers Model Cortex-A8 Memory Management Cortex-A8 Pipeline 2 ARM

More information

Mobile Processors: Future Trends

Mobile Processors: Future Trends Mobile Processors: Future Trends Mário André Pinto Ferreira de Araújo Departamento de Informática, Universidade do Minho 4710-057 Braga, Portugal maaraujo@mail.pt Abstract. Mobile devices, such as handhelds,

More information

CISC, RISC, and DSP Microprocessors

CISC, RISC, and DSP Microprocessors CISC, RISC, and DSP Microprocessors Douglas L. Jones ECE 497 Spring 2000 4/6/00 CISC, RISC, and DSP D.L. Jones 1 Outline Microprocessors circa 1984 RISC vs. CISC Microprocessors circa 1999 Perspective:

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy

More information

Cortex-A9 MPCore Software Development

Cortex-A9 MPCore Software Development Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

Five Families of ARM Processor IP

Five Families of ARM Processor IP ARM1026EJ-S Synthesizable ARM10E Family Processor Core Eric Schorn CPU Product Manager ARM Austin Design Center Five Families of ARM Processor IP Performance ARM preserves SW & HW investment through code

More information

Instruction Set Design

Instruction Set Design Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,

More information

StrongARM** SA-110 Microprocessor Instruction Timing

StrongARM** SA-110 Microprocessor Instruction Timing StrongARM** SA-110 Microprocessor Instruction Timing Application Note September 1998 Order Number: 278194-001 Information in this document is provided in connection with Intel products. No license, express

More information

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

More information

IA-64 Application Developer s Architecture Guide

IA-64 Application Developer s Architecture Guide IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve

More information

Intel StrongARM SA-110 Microprocessor

Intel StrongARM SA-110 Microprocessor Intel StrongARM SA-110 Microprocessor Product Features Brief Datasheet The Intel StrongARM SA-110 Microprocessor (SA-110), the first member of the StrongARM family of high-performance, low-power microprocessors,

More information

Preface. Aims. Audience. The ARM

Preface. Aims. Audience. The ARM Preface Aims Audience This book introduces the concepts and methodologies employed in designing a system-on-chip (SoC) based around a microprocessor core and in designing the microprocessor core itself.

More information

Which ARM Cortex Core Is Right for Your Application: A, R or M?

Which ARM Cortex Core Is Right for Your Application: A, R or M? Which ARM Cortex Core Is Right for Your Application: A, R or M? Introduction The ARM Cortex series of cores encompasses a very wide range of scalable performance options offering designers a great deal

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

Low Power AMD Athlon 64 and AMD Opteron Processors

Low Power AMD Athlon 64 and AMD Opteron Processors Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD

More information

Overview of the Cortex-M3

Overview of the Cortex-M3 CHAPTER Overview of the Cortex-M3 2 In This Chapter Fundamentals 11 Registers 12 Operation Modes 14 The Built-In Nested Vectored Interrupt Controller 15 The Memory Map 16 The Bus Interface 17 The MPU 18

More information

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

More information

Architecture and Implementation of the ARM Cortex -A8 Microprocessor

Architecture and Implementation of the ARM Cortex -A8 Microprocessor Architecture and Implementation of the ARM Cortex -A8 Microprocessor October 2005 Introduction The ARM Cortex -A8 microprocessor is the first applications microprocessor in ARM s new Cortex family. With

More information
