CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE
- Mitchell Summers
5.1 INTRODUCTION

Finite State Machines (FSMs) are important components of digital systems. Therefore, techniques for area-efficient and fast implementation of FSMs are of great interest. The implementation of an FSM is strongly determined by the way codes are assigned to its states. The state assignment problem can be stated as that of assigning codes to the states of an FSM while optimizing a given criterion. It has received considerable attention from researchers (Ellen et al 1992), because it is an important step in the process of sequential circuit synthesis. Some of the reported state assignment tools are analysed in the literature: Villa and Sangiovanni-Vincentelli (1990) for area minimization of PLA implementations, and Lin and Newton (1989) for multilevel implementations. Classical approaches for reducing the complexity of the next state and output functions involve reducing input or output dependencies, state splitting, etc. (Zvi Kohavi 1996, Frederick Hennie 1968, Demers et al 1989).

The state assignment problem (Luca Benini et al 1998) can also be viewed as that of the decomposition of an FSM. Recently, decomposition of FSMs has attracted the attention of researchers for area minimization and power reduction. Some of these approaches towards area minimization are Decomposition and Factorization (Srinivas Devadas and A. R. Newton 1989), Modify and Restore (Rama et al 1994, Chakraborthy 1994) and Decomposition as Constrained
Covering (Pranav Ashar 1991). The decomposition and factorization approach tries to extract the repetitive parts of a finite state machine, implement them only once, and pass control to them whenever a particular transition occurs. Since the repetitive part of the finite state machine is implemented only once, some area reduction can be achieved. However, there may not be exactly repetitive parts in an FSM. This difficulty is resolved by the Modify and Restore approach, which involves modifying the next state and output functions to extract repetitive parts and restoring the original functions using a restoring PLA and Ex-OR gates. In this approach, the restoring PLA might itself require a large area. The constrained covering approach has put forward strategies for handling various kinds of topologies such as cascade-cascade, cascade-parallel, and parallel-parallel. These approaches are highly dependent on the structure of the machine.

There are essentially two approaches to machine decomposition. If the original state graph is partitioned into several pieces, with each piece being implemented by a separate machine with a wait state, then exactly one machine is active at any instant while the others remain in the reset state (Jose et al 1998). In this case, decomposition is viewed as the join of two disjoint partitions on the set of states. An alternative approach to decomposing an FSM is based on factoring the original state graph, and it is the focus of this research.

In this research, decomposing an FSM into two interacting machines is proposed, for area-effective as well as high-performance implementation. In contrast to Jose et al (1998), this decomposition is a factoring of the original machine (with N states) into two much smaller machines (with √N states each), and this factoring can be viewed as the meet of the two orthogonal partitions of the set of states of the original
machine. It is to be noted that the two smaller machines are both active simultaneously.

5.2 LOOKUP ENGINE ARCHITECTURE

The basic architecture for the lookup engine, which performs the address lookup, is shown in Figure 5.1. Reconfigurable hardware is essentially a circuit whose behavior can be modified on the fly. The hardware implementation is in the form of a programmable FSM. The state transition table can be loaded onto it by the processor.

Figure 5.1 Lookup engine architecture

The processor computes the FSM for a given routing database of address prefixes and then compiles it into a format appropriate for programming the reconfigurable hardware. If a routing update changes the routing database, the state machine is recomputed. In that case, either the entire FSM has to be reprogrammed or changes have to be made to some part of the FSM graph. All the approaches (except CAM-based solutions) require
several memory accesses and, thus, the memory bandwidth is one of the major performance bottlenecks. The FSM-based architecture can be efficiently implemented using Flip-Flops (FF), and all the memory accesses can be reduced to accesses to high-speed registers. The implementation can, thus, scale with VLSI technology. This research presents a way to generate an efficient FSM for the routing database and evaluates the lookup speed of such an approach.

5.3 IP ADDRESS LOOKUP SCHEMES

The performance of an IP address lookup algorithm is characterized by two parameters. One is the lookup time, i.e., the time required to determine the output interface corresponding to a destination IP address. Since routing table entries may change due to route updates, the time required by an IP lookup algorithm to respond to changes in the routing database is the other parameter. This is termed the update time. IP address lookup engines can be broadly classified into two categories: one based on content addressable memories (CAM) and the other based on a processor memory combination. This work creates a third category, which is based on programmable FSMs.

5.3.1 CAM-based solutions

In this model, the address lookup can be performed using a ternary CAM (TCAM) (NetLogic Microsystems 2001). In a TCAM, a mask of bits can be specified per word. The routing table entries are stored in the order of decreasing prefix lengths. The longest prefix match, thus, corresponds to the first entry among all the entries that match the destination IP address. A TCAM is an attractive solution for high-speed IP address lookup; however, TCAMs with large sizes are
typically very expensive. Historically, the CAM technology has also not kept pace with the dynamic random access memory (DRAM) technology in terms of storage density. TCAMs are also very poor in terms of update time, though some progress (P. Gupta et al 2000) has recently been made in this direction.

5.3.2 Processor memory based solutions

In this model, the routing table entries are present in memory and the lookup algorithm runs on a processor. The objective of an IP lookup algorithm is to organize the routing database in such a manner that as few memory accesses as possible are required during the actual lookup operation. For backbone routers with a large routing database, architectures that use off-chip DRAMs are usually employed. One measure of the lookup speed of an algorithm is the number of DRAM accesses that are required. New memory technologies such as synchronous DRAM (SDRAM), RAMBUS, and double data rate DRAM (DDR-DRAM) employ some form of parallel banks of memory, and interleaving can be performed to hide memory access latency. As pointed out in Eatherton (1999), each memory technology introduces some tradeoffs, and IP lookup algorithms need to be carefully tuned across memory architectures to extract the best performance.

One of the simplest ways to store the routing database of address prefixes in memory is in the form of a 1-bit trie. A trie is a tree-like data structure where the prefix bits are used to create tree branches. Several modifications to the basic 1-bit trie have been proposed in the literature. Path compression techniques (Morrison 1968) can be used to remove those nodes from the tree that have only one child. The missing nodes are denoted by a skip value that indicates how many nodes have been skipped on the one-way path. Instead of 1-bit tries, multibit tries (Srinivasan et al 1998) can also be used. Unlike in a 1-bit trie, where each node branches to its children depending upon the value of a single bit, in multibit tries the branching depends upon the value of several bits taken together. The search also proceeds by inspecting several bits simultaneously. The number of bits examined at a time is called the stride length. The strides can be of fixed length or of variable lengths at different levels of the tree. The address prefixes need to be converted into prefixes with lengths equal to the stride. The length of the strides offers a tradeoff between memory and search speed. The optimal strides can be computed using the prefix length distribution (V. Srinivasan et al 1998). In LC Tries (S. Nilsson et al 1999), each complete subtree of height h is converted into a subtree of height 1 with 2^h children. Thus, a 1-bit trie gets converted into a multibit trie. In Gupta et al (1998), a multibit trie with fixed stride length is implemented using memory banks. This is, however, achieved at the expense of large memory size.

Though the above algorithms have provided novel techniques to arrange the prefixes in an intelligent manner, it is believed that the scalability of processor memory solutions is limited by the fact that the lookup operation requires DRAM accesses. Despite considerable progress, the DRAM technology has not kept pace with the processor technology.

5.4 FSM FOR LOOKUP ENGINE

To illustrate the basic approach, generating an FSM from the 1-bit trie structure is analyzed first. The procedure for generating the 1-bit trie begins at the root node for each prefix. The bits in the prefix are examined one by one. If the bit is zero, then the left node is formed (if not already present); otherwise, if the bit is one, then the right node is formed. To generate an FSM, each node in the resulting 1-bit trie can be
associated with a state in an FSM. The 1-bit trie and the corresponding FSM for the prefix database are illustrated in Figure 5.2. This FSM is called a 1-bit FSM. The state transition table for this FSM is given in Table 5.1. The state corresponding to an address prefix stores the corresponding output interface. To perform a lookup, the destination IP address bits are applied in serial order and the state machine makes a transition from one state to another depending upon the bit. If a state representing a valid interface is encountered, the state number is stored. The IP address bits are applied until a node whose next state is FINAL is encountered. The search is then terminated and the output interface number corresponding to the last stored state is retrieved. In the given example, if the destination IP address is 110, the states that would be traversed are S1, S3, and S10 and the output would be 6.

Table 5.1 State transition table

Current state   Input bit   Next state   Output
S1              0           S2           -
S1              1           S3           -
S2              0           S4           -
S3              1           S5           3
S4              0           S6           2
S4              1           S7           1
S5              0           S8           2
S2              1           Final        -
S3              0           Final        -
S5              1           Final        -
S6              0           Final        -
S6              1           Final        -
S7              0           Final        -
S7              1           Final        -
S8              0           Final        -
S8              1           Final        -

In the worst case, 32 states might have to be traversed for an IP lookup, but note that these are not memory accesses and, hence, could be quite fast. For practical routing databases, the number of states in the state machine would be large. The number of states in the FSM generated for the Mae-East, FUNET, and RIPE routing databases can be calculated. The results are summarized in Table 5.2.

Figure 5.2 1-bit trie corresponding to database
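The serial lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix database, the state numbering, and the helper names `build_one_bit_fsm` and `lookup` are hypothetical and do not reproduce Table 5.1 or Figure 5.2. Each trie node becomes a state, a missing transition stands for the FINAL state, and the output of the last valid state seen is returned.

```python
FINAL = "FINAL"  # sentinel for "no further transition"


def build_one_bit_fsm(prefixes):
    """Build a 1-bit FSM from {prefix_bits: output_interface}.

    Returns (transitions, outputs), where transitions maps
    (state, bit) -> next state and outputs maps state -> interface.
    """
    transitions = {}
    outputs = {}
    state_of = {"": "S1"}  # trie path -> state name; the root is S1
    for prefix in sorted(prefixes, key=lambda p: (len(p), p)):
        path = ""
        for bit in prefix:
            parent = state_of[path]
            path += bit
            if path not in state_of:
                state_of[path] = f"S{len(state_of) + 1}"
            transitions[(parent, bit)] = state_of[path]
        outputs[state_of[path]] = prefixes[prefix]
    return transitions, outputs


def lookup(transitions, outputs, address_bits, start="S1"):
    """Apply address bits serially; return the last stored output."""
    state = start
    best = outputs.get(state)
    for bit in address_bits:
        state = transitions.get((state, bit), FINAL)
        if state == FINAL:
            break  # search terminates at FINAL
        if state in outputs:
            best = outputs[state]  # remember the latest valid interface
    return best


# Hypothetical prefix database (not the one used in the chapter).
example_prefixes = {"0": 1, "01": 2, "11": 3, "110": 6}
transitions, outputs = build_one_bit_fsm(example_prefixes)
print(lookup(transitions, outputs, "110"))
print(lookup(transitions, outputs, "0111"))
```

With this toy database, the address 110 walks through three states below the root and ends on the state holding the longest matching prefix, while an address with no matching prefix yields `None`.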
Table 5.2 Number of states in FSM

Database   Number of entries   Number of states
FUNET
RIPE

The large number of states may result in an inefficient hardware implementation and higher delays. Therefore, a structured approach is followed in this work, where the FSM graph is partitioned into smaller machines, each containing some maximum number of states. The partitioning of the FSM graph is done with a view to minimizing the area of the chip and making the performance of the chip predictable. Each machine is made reconfigurable by introducing memory cells. When one machine completes the processing of a packet, the packet is handed over to an appropriate machine by the central block. Various methods for decomposing the state machine into smaller state machines by exploiting the structure of FSM graphs have been investigated.

5.5 PATH-COMPRESSED TRIES

While binary tries allow the representation of arbitrary length prefixes, they have the characteristic that long sequences of one-child nodes may exist. Since these bits need to be inspected even though no actual branching decision is made, search time can be longer than necessary in some cases. Also, one-child nodes consume additional memory. In an attempt to improve time and space performance, this research uses a technique called path-compression.
Path-compression consists of collapsing one-way branch nodes. When one-way branch nodes are removed from a trie, additional information must be kept in the remaining nodes so that the search operation can be performed correctly. There are many ways to exploit the path-compression technique; the one corresponding to the binary trie is shown in Figure 5.3. Note that the two nodes 01 and 10 are removed. However, a list of prefixes must be maintained in some of the nodes. Because one-way branch nodes are now removed, the search can jump directly to the bit where a significant decision is to be made, bypassing the inspection of some bits. As a result, a bit number field must be kept to indicate which bit is the next bit to be inspected.

Figure 5.3 Path-compressed FSM

In Figure 5.3 these bit numbers are shown next to the nodes. Moreover, the bit strings of prefixes must be explicitly stored. A search in this kind of path-compressed trie proceeds as follows. The algorithm performs a descent in the trie, as usual. For instance, consider finding the best matching prefix (BMP) of an address beginning with a given bit pattern in the path-compressed trie shown in Figure 5.3.
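A minimal sketch of path-compression, under simplifying assumptions, is given here. It is not the exact scheme of Figure 5.3: in this simplified variant every node that carries a prefix is kept, only prefix-free one-child nodes are collapsed, and the bit number of a node is taken to be the length of its stored bit string plus one. The class `Node`, the functions `build_trie`, `compress`, `search`, and `count_nodes`, and the example prefixes are all hypothetical.

```python
class Node:
    def __init__(self, path):
        self.path = path      # bit string from the root, stored explicitly
        self.port = None      # output interface if this node is a prefix
        self.children = {}    # '0' or '1' -> Node


def build_trie(prefixes):
    """Build a plain binary trie from {prefix_bits: port}."""
    root = Node("")
    for prefix, port in prefixes.items():
        node = root
        for bit in prefix:
            if bit not in node.children:
                node.children[bit] = Node(node.path + bit)
            node = node.children[bit]
        node.port = port
    return root


def compress(node):
    """Collapse one-way branch nodes that carry no prefix (in place)."""
    for bit, child in list(node.children.items()):
        while child.port is None and len(child.children) == 1:
            child = next(iter(child.children.values()))
        node.children[bit] = child
        compress(child)
    return node


def search(root, addr):
    """Return the port of the best matching prefix (BMP), or None."""
    best = None
    node = root
    while node is not None:
        if not addr.startswith(node.path):
            break                     # skipped bits do not match: stop
        if node.port is not None:
            best = node.port          # remember the BMP so far
        pos = len(node.path)          # bit number minus one
        if pos >= len(addr):
            break
        node = node.children.get(addr[pos])
    return best


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


# Hypothetical prefix table used only for illustration.
trie = build_trie({"0": "a", "01000": "b", "011": "c", "1": "d"})
nodes_before = count_nodes(trie)
compress(trie)
nodes_after = count_nodes(trie)
print(nodes_before, nodes_after, search(trie, "01011"))
```

In this toy trie the two prefix-free nodes on the one-way path to prefix b are removed, so the comparison against the explicitly stored bit string is what rejects b for the address 01011 and leaves prefix a as the BMP.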
Searching of the FSM starts at the root node and, since its bit number is 1, the first bit of the address is inspected. The first bit is 0 in this example, so the search goes to the left. Since the node is marked as a prefix, the prefix a is compared with the corresponding part of the address. Since the node's bit number is 3, the operation skips the second bit of the address and inspects the third one. This bit is 0, so the search goes to the left. Again, a check is made to find whether the prefix b matches the corresponding part of the address (01011). Since they do not match, the search stops and the last remembered BMP (prefix a) is the correct BMP.

Path-compression was first proposed in a scheme called PATRICIA, but this scheme does not support longest prefix matching. Sklower (1991) proposed a scheme with modifications for longest prefix matching (Broder et al 2001). In fact, this variant was originally designed to support not only prefixes but also more general non-contiguous masks. Since this feature was never really used, current implementations differ somewhat from Sklower's original scheme. For example, the BSD version of the path-compressed trie (referred to as the BSD trie) is essentially the same as the one just described. The basic difference is that in the BSD scheme, the trie is first traversed without checking the prefixes at internal nodes.

5.6 TOPOGRAPHICAL BREAKDOWN OF THE FSM

In this work, a simple method to implement the FSM for IP address lookup and packet classification is designed. It is assumed that the database is static, i.e., it is not being updated. Initially, the Finite State Machine for IP address lookup is generated. First, a one bit trie structure is generated using the prefix table. In the one bit trie, each node stores the prefix, the output interface number, a pointer to its parent, and pointers to its children if present. For each prefix, start at the root node
at the top of the one bit trie. Next, while looking at the bits in the prefix from the left, if the bit is 0, create the left node if absent. If the bit is 1, create the right node if absent. Now change the current node to the left node if the bit was 0 or to the right node if the bit was 1. This process continues until all the bits in the prefix are exhausted. The output interface number stated in the prefix table is then assigned to the current node. Consider a database having the following entries (prefix, port). Such a one bit trie structure is shown in Figure 5.4.

Figure 5.4 Topographical breakdown of FSM

Each node shown in Figure 5.4 can be associated with a state in the corresponding FSM. The output is the output interface number. The FSM corresponding to the given database consists of a large number of states. Handling such a large FSM is not practical. Therefore, to obtain optimal performance and minimum area, the large FSM has to be broken down into smaller FSMs. Some approaches for breaking the one bit trie FSM into machines have been discussed in Kobayashi (2000).
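As a rough illustration of breaking a one bit trie FSM into smaller machines, the sketch below names every trie node by its bit path, fills machines of a fixed capacity in depth-first order, and counts the edges that cross machine boundaries. The greedy depth-first fill is an assumption made only for illustration and is not the partitioning procedure of this chapter; the function names, the capacity of 3, and the prefix list are likewise hypothetical.

```python
def trie_states(prefixes):
    """Every node of the one bit trie, named by its bit path from the root."""
    states = {""}
    for prefix in prefixes:
        for i in range(1, len(prefix) + 1):
            states.add(prefix[:i])
    return states


def partition(states, capacity):
    """Greedy split: walk the trie depth-first, filling one machine at a time."""
    machine_of = {}
    current, used = 0, 0
    stack = [""]
    while stack:
        state = stack.pop()
        if used == capacity:          # current machine is full, open a new one
            current, used = current + 1, 0
        machine_of[state] = current
        used += 1
        for bit in "10":              # push '1' first so '0' is visited first
            if state + bit in states:
                stack.append(state + bit)
    return machine_of


def cut_edges(machine_of):
    """Count trie edges whose two end states sit in different machines."""
    return sum(
        1 for s in machine_of
        if s and machine_of[s] != machine_of[s[:-1]]
    )


# Hypothetical prefix list used only for illustration.
states = trie_states(["0", "01000", "011", "1"])
machine_of = partition(states, capacity=3)
num_machines = len(set(machine_of.values()))
print(len(states), num_machines, cut_edges(machine_of))
```

The count returned by `cut_edges` is the quantity a better partitioning would try to reduce, since each crossing edge corresponds to a hand-over between machines.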
One possible solution for generating machines is to break the FSM topographically according to the implementation capacity of the machines. The criterion for topographical breaking is to minimize the total number of edges going into or coming out of the machines. It is based on orthogonal partitioning of the original large FSM into smaller machines, which can be executed in parallel and their results combined to yield the final output.

5.7 PERFORMANCE EVALUATION

As discussed previously, the large FSM is broken down into smaller machines using orthogonal decomposition. The inputs to each of the machines are the Present State of the machine (PS), the Next State of the machine (NS) and the external inputs. Due to orthogonal decomposition, the original FSM consisting of N states is broken down into these two machines; this is O(N). The goal here is to minimize the number of multi-edges. Thus, the final graph is one in which no parallel edges are formed between a pair of nodes.

The machine operates in two modes, viz. (1) Route lookup mode and (2) Update mode. Simulation results indicate that it takes 5 nsec for the signal to traverse the critical path and generate the signals for the next state. Since each partition contains 3 states, it would take 15 nsec for a complete lookup. However, this does not include some of the delays in the feedback path and some buffer chain delays. Hence, a delay of T_rl^F = 20 nsec per lookup would be a conservative estimate. In the update mode, the bit stream generated by the processor is loaded serially into the scan chain, and then these values are loaded stepwise into the memory cells in each of the rows of the Programmable Logic Array. The length of the scan
chain is 200. The clock signal has a time period of 100 psec. Thus it takes 20 nsec to load the scan chain. The time required to load the memory cells is about 5 nsec. There are 80 such rows to be loaded to complete the reconfiguration. Hence, the total time required to update is T_u^F = (20 + 5) nsec × 80 = 2 µsec. A major bottleneck is the generation of the actual bit stream after getting the data from the database, generating the FSM and doing the optimal coding. This time is estimated to be around 2 min.

Timing summary after reduction

The path-compressed design is run in Xilinx tools to find the time taken for a complete search of any binary input, and the results are analyzed. The clock speed and maximum frequency obtained are given below.

Minimum period : 8.519 nsec (Maximum Frequency: 117.385 MHz)
Minimum input arrival time before clock : 8.630 nsec
Maximum output time required after clock : 5.880 nsec

Timing summary before reduction

Speed Grade: -6
Minimum period : nsec (Maximum Frequency : MHz)
Minimum input arrival time before clock : nsec
Maximum output time required after clock : 6.788 nsec
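The timing figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (scan chain length, clock period, row load time, row count, critical path delay, states per partition, and the minimum period after reduction); the constant and variable names are chosen here for illustration.

```python
# Figures taken from the text; integer picoseconds avoid rounding issues.
SCAN_CHAIN_LENGTH = 200        # bits in the scan chain
CLOCK_PERIOD_PS = 100          # scan clock period in psec
ROW_LOAD_PS = 5_000            # about 5 nsec to load the memory cells of a row
ROWS = 80                      # rows to load for a full reconfiguration
CRITICAL_PATH_NS = 5           # delay to generate the next-state signals
STATES_PER_PARTITION = 3
MIN_PERIOD_NS = 8.519          # minimum period after reduction

scan_load_ps = SCAN_CHAIN_LENGTH * CLOCK_PERIOD_PS      # time to fill the chain
update_ps = (scan_load_ps + ROW_LOAD_PS) * ROWS          # full update time
lookup_ns = CRITICAL_PATH_NS * STATES_PER_PARTITION      # ideal lookup time
max_freq_mhz = 1_000 / MIN_PERIOD_NS                     # 1 / period, in MHz

print(f"scan chain load: {scan_load_ps / 1_000:.0f} nsec")
print(f"update time:     {update_ps / 1_000_000:.0f} usec")
print(f"lookup time:     {lookup_ns} nsec before feedback and buffer delays")
print(f"max frequency:   {max_freq_mhz:.3f} MHz")
```

At the conservative estimate of 20 nsec per lookup, this corresponds to 50 million lookups per second.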
5.8 CONCLUSION

An FSM generated from a 1-bit trie has been considered so far. Multibit tries and their variants can also be considered within the same framework. As indicated earlier, the VLSI architecture can be optimized for area by using dynamic reconfiguration. This can improve the packing properties of the architecture.
31 CHAPTER 2 An Algorithm for Performing Routing Lookups in Hardware 1 Introduction This chapter describes a longest prefix matching algorithm to perform fast IPv4 route lookups in hardware. The chapter
More informationEE 42/100 Lecture 24: Latches and Flip Flops. Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad
A. M. Niknejad University of California, Berkeley EE 100 / 42 Lecture 24 p. 1/20 EE 42/100 Lecture 24: Latches and Flip Flops ELECTRONICS Rev B 4/21/2010 (2:04 PM) Prof. Ali M. Niknejad University of California,
More informationInstructor Notes for Lab 3
Instructor Notes for Lab 3 Do not distribute instructor notes to students! Lab Preparation: Make sure that enough Ethernet hubs and cables are available in the lab. The following tools will be used in
More informationPROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics
PROGETTO DI SISTEMI ELETTRONICI DIGITALI Digital Systems Design Digital Circuits Advanced Topics 1 Sequential circuit and metastability 2 Sequential circuit - FSM A Sequential circuit contains: Storage
More informationLatch Timing Parameters. Flip-flop Timing Parameters. Typical Clock System. Clocking Overhead
Clock - key to synchronous systems Topic 7 Clocking Strategies in VLSI Systems Peter Cheung Department of Electrical & Electronic Engineering Imperial College London Clocks help the design of FSM where
More informationSequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )
Sequential Logic (Materials taken from: Principles of Computer Hardware by Alan Clements ) Sequential vs. Combinational Circuits Combinatorial circuits: their outputs are computed entirely from their present
More informationInterconnection Network Design
Interconnection Network Design Vida Vukašinović 1 Introduction Parallel computer networks are interesting topic, but they are also difficult to understand in an overall sense. The topological structure
More informationComputer Organization & Architecture Lecture #19
Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of
More informationRAM & ROM Based Digital Design. ECE 152A Winter 2012
RAM & ROM Based Digital Design ECE 152A Winter 212 Reading Assignment Brown and Vranesic 1 Digital System Design 1.1 Building Block Circuits 1.1.3 Static Random Access Memory (SRAM) 1.1.4 SRAM Blocks in
More informationSoC IP Interfaces and Infrastructure A Hybrid Approach
SoC IP Interfaces and Infrastructure A Hybrid Approach Cary Robins, Shannon Hill ChipWrights, Inc. ABSTRACT System-On-Chip (SoC) designs incorporate more and more Intellectual Property (IP) with each year.
More informationSwitch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
More information5. Classless and Subnet Address Extensions 최 양 희 서울대학교 컴퓨터공학부
5. Classless and Subnet Address Extensions 최 양 희 서울대학교 컴퓨터공학부 1 Introduction In the original IP addressing scheme, each physical network is assigned a unique network address Individual sites can have the
More informationSystolic Computing. Fundamentals
Systolic Computing Fundamentals Motivations for Systolic Processing PARALLEL ALGORITHMS WHICH MODEL OF COMPUTATION IS THE BETTER TO USE? HOW MUCH TIME WE EXPECT TO SAVE USING A PARALLEL ALGORITHM? HOW
More informationCROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING
CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility
More informationModule 15: Network Structures
Module 15: Network Structures Background Topology Network Types Communication Communication Protocol Robustness Design Strategies 15.1 A Distributed System 15.2 Motivation Resource sharing sharing and
More informationSMALL INDEX LARGE INDEX (SILT)
Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song
More informationSequential Circuit Design
Sequential Circuit Design Lan-Da Van ( 倫 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2009 ldvan@cs.nctu.edu.tw http://www.cs.nctu.edu.tw/~ldvan/ Outlines
More informationFault Modeling. Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults. Transistor faults Summary
Fault Modeling Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults Single stuck-at faults Fault equivalence Fault dominance and checkpoint theorem Classes of stuck-at
More informationOpen Flow Controller and Switch Datasheet
Open Flow Controller and Switch Datasheet California State University Chico Alan Braithwaite Spring 2013 Block Diagram Figure 1. High Level Block Diagram The project will consist of a network development
More informationChapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe
More informationSystem Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1
System Interconnect Architectures CSCI 8150 Advanced Computer Architecture Hwang, Chapter 2 Program and Network Properties 2.4 System Interconnect Architectures Direct networks for static connections Indirect
More informationReconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra
More informationOperating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師
Lecture 7: Distributed Operating Systems A Distributed System 7.2 Resource sharing Motivation sharing and printing files at remote sites processing information in a distributed database using remote specialized
More informationPower Reduction Techniques in the SoC Clock Network. Clock Power
Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a
More informationComputer Systems Structure Main Memory Organization
Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory
More informationIntroduction to Parallel Computing. George Karypis Parallel Programming Platforms
Introduction to Parallel Computing George Karypis Parallel Programming Platforms Elements of a Parallel Computer Hardware Multiple Processors Multiple Memories Interconnection Network System Software Parallel
More informationIndex Terms Domain name, Firewall, Packet, Phishing, URL.
BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet
More informationFuzzy Active Queue Management for Assured Forwarding Traffic in Differentiated Services Network
Fuzzy Active Management for Assured Forwarding Traffic in Differentiated Services Network E.S. Ng, K.K. Phang, T.C. Ling, L.Y. Por Department of Computer Systems & Technology Faculty of Computer Science
More informationDesign Verification & Testing Design for Testability and Scan
Overview esign for testability (FT) makes it possible to: Assure the detection of all faults in a circuit Reduce the cost and time associated with test development Reduce the execution time of performing
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationEE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000. ILP Execution
EE482: Advanced Computer Organization Lecture #11 Processor Architecture Stanford University Wednesday, 31 May 2000 Lecture #11: Wednesday, 3 May 2000 Lecturer: Ben Serebrin Scribe: Dean Liu ILP Execution
More informationA Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin
A Comparison Of Shared Memory Parallel Programming Models Jace A Mogill David Haglin 1 Parallel Programming Gap Not many innovations... Memory semantics unchanged for over 50 years 2010 Multi-Core x86
More informationExploiting Stateful Inspection of Network Security in Reconfigurable Hardware
Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Shaomeng Li, Jim Tørresen, Oddvar Søråsen Department of Informatics University of Oslo N-0316 Oslo, Norway {shaomenl, jimtoer,
More informationAlgorithms for Advanced Packet Classification with Ternary CAMs
Algorithms for Advanced Packet Classification with Ternary CAMs Karthik Lakshminarayanan UC Berkeley Joint work with Anand Rangarajan and Srinivasan Venkatachary (Cypress Semiconductor) Packet Processing
More informationModule 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1
Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 5 Memory-I Version 2 EE IIT, Kharagpur 2 Instructional Objectives After going through this lesson the student would Pre-Requisite
More informationWe r e going to play Final (exam) Jeopardy! "Answers:" "Questions:" - 1 -
. (0 pts) We re going to play Final (exam) Jeopardy! Associate the following answers with the appropriate question. (You are given the "answers": Pick the "question" that goes best with each "answer".)
More information(Refer Slide Time: 02:17)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #06 IP Subnetting and Addressing (Not audible: (00:46)) Now,
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin
More informationIn-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
More informationAsynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow
Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Bradley R. Quinton Dept. of Electrical and Computer Engineering University of British Columbia bradq@ece.ubc.ca
More informationAnalysis of MapReduce Algorithms
Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model
More informationArchitecture of distributed network processors: specifics of application in information security systems
Architecture of distributed network processors: specifics of application in information security systems V.Zaborovsky, Politechnical University, Sait-Petersburg, Russia vlad@neva.ru 1. Introduction Modern
More informationChapter 14: Distributed Operating Systems
Chapter 14: Distributed Operating Systems Chapter 14: Distributed Operating Systems Motivation Types of Distributed Operating Systems Network Structure Network Topology Communication Structure Communication
More informationCHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL
CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter
More informationMeasuring the Performance of an Agent
25 Measuring the Performance of an Agent The rational agent that we are aiming at should be successful in the task it is performing To assess the success we need to have a performance measure What is rational
More informationChapter 13. Disk Storage, Basic File Structures, and Hashing
Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing
More informationWEEK 8.1 Registers and Counters. ECE124 Digital Circuits and Systems Page 1
WEEK 8.1 egisters and Counters ECE124 igital Circuits and Systems Page 1 Additional schematic FF symbols Active low set and reset signals. S Active high set and reset signals. S ECE124 igital Circuits
More informationNAND Flash FAQ. Eureka Technology. apn5_87. NAND Flash FAQ
What is NAND Flash? What is the major difference between NAND Flash and other Memory? Structural differences between NAND Flash and NOR Flash What does NAND Flash controller do? How to send command to
More informationBig Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
More informationBITWISE OPTIMISED CAM FOR NETWORK INTRUSION DETECTION SYSTEMS. Sherif Yusuf and Wayne Luk
BITWISE OPTIMISED CAM FOR NETWORK INTRUSION DETECTION SYSTEMS Sherif Yusuf and Wayne Luk Department of Computing, Imperial College London, 180 Queen s Gate, London SW7 2BZ email: {sherif.yusuf, w.luk}@imperial.ac.uk
More informationChapter 12 File Management
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access
More information