SUB-MICRON TECHNOLOGY DEVELOPMENT AND SYSTEM-ON-CHIP (SoC) DESIGN - DATA COMPRESSION CORE

NASIR SHAIKH HUSIN

Final Report for IRPA Vot 72319
Fakulti Kejuruteraan Elektrik
Universiti Teknologi Malaysia

DECEMBER 2002
ABSTRACT

Data compression removes redundancy from the source data and thereby increases the storage capacity of a storage medium or the efficiency of data transmission over a communication link. Although several data compression techniques have been implemented in hardware, they are not flexible enough to be embedded in more complex applications. Data compression software, meanwhile, cannot keep up with the demands of high-speed computing applications. Because of these deficiencies, in this project we develop a parameterized lossless universal data compression IP core for high-speed applications. The design of the core is based on the combination of the Lempel-Ziv-Storer-Szymanski (LZSS) compression algorithm and Huffman coding. The resulting IP core offers a data-independent throughput and can process one symbol in every clock cycle. The design is described in parameterized VHDL code to enable a user to make a suitable compromise between resource constraints, operating speed and compression saving, so that it can be adapted to any target application. Implemented on an Altera FLEX10KE FPGA device, the design offers a performance of 800 Mbps at an operating frequency of 50 MHz. This IP core is suitable for high-speed computing applications and for storage systems.
ABSTRAK

Data compression removes redundancy from the source data and thereby increases the storage capacity of a storage medium or the efficiency of data transmission over a communication link. Although several data compression techniques have been implemented in hardware, they are not easily modified for use in more complex applications. Data compression software, meanwhile, cannot support the needs of high-speed computing applications. Because of these deficiencies, in this project we build a parameterized universal lossless data compression IP core for high-speed applications. The core design is based on a combination of the Lempel-Ziv-Storer-Szymanski (LZSS) compression algorithm and Huffman coding. The resulting IP core has a data-independent throughput and can process one symbol in every clock cycle. The core design is written in parameterized VHDL code to enable the user to make a suitable compromise between resource constraints, operating speed and compression saving, so that it can be adapted to any target application. Based on an implementation in an Altera FLEX10KE FPGA device, the design achieves a performance of 800 Mbps at an operating frequency of 50 MHz. This IP core is suitable for high-speed computing applications or for storage systems.
TABLE OF CONTENTS

ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.3 Scope of Work
1.4 Contributions of the Research
1.5 Organization of Report
1.6 Summary

CHAPTER II LITERATURE REVIEW
2.1 Data Compression Background
2.2 Data Compression Algorithms
2.3 Previous Work on Implementations of Data Compression
2.4 Reconfigurability and Reusability
2.5 Systolic Architecture for LZ77/LZSS
2.6 Discussion and Summary

CHAPTER III METHODOLOGIES AND TOOLS
3.1 Research Procedure
3.2 Timing Simulation in Design Verification
3.3 Hardware Evaluation System for Test and Performance Analysis
3.4 Hardware and Software Tools
3.5 Discussion and Summary

CHAPTER IV DESIGN OF DATA COMPRESSION HARDWARE
4.1 Overview of Compression Hardware System
4.2 Design of Compression Core
4.3 Compression Interface Controller
4.4 Design of Decompression Core
4.5 Decompression Interface Controller
4.6 Summary

CHAPTER V DESIGN VERIFICATION AND PERFORMANCE ANALYSIS
5.1 Design Verification
5.1.1 Results of Timing Simulation
5.1.2 Results of Hardware Test
5.2 Performance Analysis
5.2.1 Performance Metrics Applied
5.2.2 Performance Analysis: Clock Speed and Area
5.2.3 Performance Analysis: Compression Ratio
5.2.4 Performance Analysis: Comparison with Other Implementations
5.3 Summary

CHAPTER VI CONCLUSIONS
6.1 Concluding Remarks
6.2 Future Work

REFERENCES
LIST OF TABLES

3.1 Compression Benchmark Data Set
5.1 Parameter Configuration of the PCI-Based Data Compression Hardware Core
5.2 Standard Data Compression Benchmark Data Sets
5.3 Latency of Compression Core
5.4 Latency of Decompression Core
5.5 Area and Speed Evaluation Results
5.6 Compression Ratio
5.7 Computation Time for Compression
5.8 Computation Time for Decompression
5.9 Comparison of Data Compression Hardware Using Text Data
6.1 Specifications of the Data Compression Embedded Core
LIST OF FIGURES

1.1 Evaluation System
2.1 Broadcast/Reduce Systolic Architecture
4.1 Compression System
4.2 Functional Block Diagram of the Compression Subsystem
4.3 Functional Block Diagram of the Decompression Subsystem
4.4 Block Diagram of Compression_Unit
4.5 Block Diagram of Compression_Interface
4.6 Block Diagram of Decompression_Unit
4.7 Block Diagram of Decompression_Interface
LIST OF ABBREVIATIONS

API - Application Programming Interface
ASIC - Application Specific Integrated Circuit
CMOS - Complementary Metal Oxide Semiconductor
EAB - Embedded Array Block
FIFO - First In First Out
FLEX - Flexible Logic Element MatriX
FPGA - Field Programmable Gate Array
GUI - Graphical User Interface
I/O - Input/Output
IC - Integrated Circuit
IP - Intellectual Property
JTAG - Joint Test Action Group
LC - Logic Cell
LE - Logic Element
LPM - Library of Parameterized Modules
LZ - Lempel-Ziv
LZ77 - Lempel-Ziv-77
LZSS - Lempel-Ziv-Storer-Szymanski
OS - Operating System
PC - Personal Computer
PCI - Peripheral Component Interconnect
PE - Processing Element
RAM - Random Access Memory
SoC - System on Chip
UTM - Universiti Teknologi Malaysia
VHDL - Very high speed integrated circuit Hardware Description Language
VLSI - Very Large Scale Integration
CHAPTER I INTRODUCTION

This project is a collaboration between MIMOS, which acted as the Project Leader, and Universiti Teknologi Malaysia, Universiti Putra Malaysia, Universiti Sains Malaysia and Universiti Malaya. The main objective of the project is to develop a 0.35-micron integrated circuit (IC) fabrication technology. MIMOS can already fabricate ICs using 0.8-micron technology; however, this technology is several generations behind the state-of-the-art fabrication process, which is currently a 0.15-micron process. With this project, the gap can be reduced. All current processes are called sub-micron processes because the smallest feature that can be fabricated is less than 1 micron.

When the fabrication process is successfully upgraded, local research institutions or companies can contract with MIMOS to fabricate their designs. To help these clients, MIMOS must have many sub-system designs ready to be embedded by clients into their applications. These sub-systems should be in the form of IP (intellectual property) cores, where the design is described in a hardware description language like VHDL, and must be parameterizable so that customers can easily modify the cores to suit their applications. Organizations that do not want to fabricate their designs can instead use FPGAs (Field Programmable Gate Arrays). FPGA devices nowadays can accommodate very large designs, integrating many sub-systems into a single device. These new FPGA devices give rise to the so-called SoC (System-on-Chip) design methodology, where sub-system integration and software interfacing have to be given serious consideration. The entire project is distributed as follows:
MIMOS (Project Leader) is responsible for sub-micron process development and device design. UTM is responsible for IP core design and SoC design methodology. UPM is responsible for sub-micron process and device simulations. USM is responsible for process development test and characterization. UM is also responsible for core design.

UTM chose to develop a data compression IP core. The objective is to produce a reusable data compression core such that a user can choose a compromise between resource constraints, speed and compression level to suit a variety of high-speed applications. The design is based on the Lempel-Ziv-Storer-Szymanski (LZSS) compression algorithm combined with Huffman coding. The rest of this report describes the development of this data compression core. In this chapter, the challenge of data compression implementation is discussed. The objectives and scope of this work are also presented.

1.1 Problem Statement

Data compression is a cost-effective way to increase the effective communication bandwidth or the effective storage capacity without a significant increase in cost. It removes redundant information inherent in the source data, thereby enabling a communication link to transmit the same amount of data in fewer bits. For storage systems, fewer bits are actually stored, thus increasing the effective storage capacity. The compression core must be able to handle any kind of source data so that it can be used in real-world applications. The implementation has to be fast enough to satisfy high-speed applications. Finally, the design must not be so complicated that its hardware requirements become excessive.

Several data compression techniques have been implemented in either software or hardware. Data compression software, which runs on a personal
computer (PC), is generally slow and thus not suitable for high-speed computing applications. Dedicated, stand-alone compression ICs are also available (e.g., IBM ALDC1-40S and STAC Electronics Hi/fn 9600). These ICs are fixed hardware and cannot be customized or embedded into a larger system. Therefore, the core design must be parameterizable and must take into account resource constraints, speed of operation and compression ratio.

1.2 Objectives

The objectives of the research are:
1. To design an embedded data compression core for high-speed applications. The design is parameterizable to facilitate reusability for future applications.
2. To develop a PC-based hardware evaluation system that connects a host PC to the compression core through a PCI-based communication bus. This evaluation system is used to verify the functionality of the compression core.

1.3 Scope of Work

The project can be divided into two phases. The first phase is to design a data compression core, with parameterized VHDL used as design entry. This core should have a speed of at least 100 Mbits per second and a compression ratio comparable with other reported implementations. The second phase is to develop a PC-based hardware evaluation system. The system consists of GUI software running on a host PC and an evaluation platform. This platform employs a commercial PCI controller to handle the complex PCI bus communication protocol. An integration module consisting of various basic interface blocks is then developed and used to handle the data transfer protocol between the PCI controller and the compression hardware. The evaluation system is shown in Figure 1.1.
[Figure 1.1: Evaluation System - a GUI program and PCI driver on the host PC communicate over the PCI bus with a PCI plug-in card (the evaluation platform) containing the PCI controller, integration module and compression core.]

The implementation for this project is limited to Altera devices only, because Altera LPM modules are used to instantiate RAM and FIFO blocks. For implementation on other devices, all the LPM modules should be replaced with equivalent modules for the chosen device.

1.4 Contributions of the Research

This research work has led to the following outcomes:
1. The development of a parameterizable embedded data compression core with performance competitive with other high-performance ASIC or FPGA data compression hardware. It processes a symbol per cycle at a clock speed of 50 MHz, offering a data-independent throughput of 50 Msymbols per second. Since the design is parameterizable, it can be easily interfaced
with any system and enables a user to choose a suitable compromise between resource constraints, speed and compression ratio.
2. The development of a PC-based hardware evaluation system that allows a host PC to access the hardware resources in the compression coprocessor via the PCI communication bus interface. This evaluation system facilitates verification of the coprocessor design.
3. With a suitable arrangement of compression and decompression units in an embedded core environment, we can obtain a real-time data compression system that can work in either half-duplex or full-duplex mode.

1.5 Organization of Report

This report is organized into six chapters. The first chapter introduces the motivation of this research. Chapter II contains the literature review that supports the rationale behind this work and surveys solutions to the research problem. Chapter III describes the methodologies for this project, including the design verification methods, tools and techniques used. Chapter IV presents the hardware design of the data compression core. Chapter V reports the results obtained from testing the designed core; the results are analyzed to validate the compression core design, and the performance analysis of the core is also included in this chapter. Finally, the last chapter summarizes the research findings and suggests potential future work.
1.6 Summary

This chapter provides an overview of the project. The challenge of data compression implementation is discussed, and the objectives for the project are identified.
CHAPTER II LITERATURE REVIEW

In this chapter we review data compression techniques and algorithms. Hardware implementations of data compression, as well as reconfigurability and reusability issues, are also discussed.

2.1 Data Compression Background

Data compression is the science and the art of transforming data from a given source so as to obtain a representation that is as compact as possible, with at most a tolerable loss of fidelity [Nelson 1996]. Such compaction may be necessitated by storage constraints, bandwidth limitations, or communications channel capacity. Depending on the tolerability of loss, data compression techniques can be classified as either lossless or lossy. A lossless technique means that the restored data is identical to the original. It is necessary for many types of data, such as executable code, where the loss of even a single bit cannot be tolerated. A lossy technique means that the restored data approximates the original. This technique might be suitable for data that represent images or voice, where the changes made to the original data resemble a small amount of additional noise. A lossy technique can therefore produce greater savings than a lossless technique.

Data compression techniques are widely used to reduce the data to be stored or transmitted. In addition, when the amount of data to be transmitted is reduced, the
communication channel capacity is effectively increased, resulting in a higher data transfer rate [Hifn 1999]. Data compression can also be applied in the field of cryptography [Lim 2001], due to some similarities between data compression and encryption techniques.

2.2 Data Compression Algorithms

Most of the widely used data compression algorithms are based on Lempel-Ziv (LZ) compression [Lempel and Ziv 1977, 1978]. In LZ compression, the previously seen symbols in the source data are used to build a dictionary (or encoding table) adaptively. This dictionary is then used to detect repeating symbols in the incoming source data. When repeating symbols are detected, they are replaced with a form whose length is as small as possible. This replacement method is known as textual substitution. Since the dictionary is adaptively generated from the source data, prior knowledge of the source properties is not required. Therefore, LZ compression and its variants are universal data compression algorithms.

The Lempel-Ziv-77 (LZ77) algorithm, proposed by Abraham Lempel and Jacob Ziv in 1977, is the first LZ compression algorithm [Lempel and Ziv 1977]. LZ77 is a lossless compression algorithm. It achieves a very good compression ratio for many types of data, but it involves a computation-intensive comparison during the matching process. Furthermore, the LZ77 codeword is not optimal and contains redundancy. To overcome this redundancy problem, a more efficient encoding is provided by the Lempel-Ziv-Storer-Szymanski (LZSS) algorithm [Storer and Szymanski 1982]. Unlike LZ77, which produces fixed-length codewords, LZSS produces variable-length codewords. An LZSS codeword is a position-length pair when the match length is not zero; otherwise, the explicit symbol (only the extension symbol) is output. This modification results in a better compression ratio than that of the LZ77 algorithm.
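To make the textual substitution concrete, the following C fragment is a minimal behavioral sketch of greedy LZSS encoding. The window size, the minimum and maximum match lengths, and the printed output format are illustrative assumptions, not the parameters of the core developed in this project.

    #include <stdio.h>
    #include <string.h>

    #define WINDOW    4096  /* sliding-window (dictionary) size: illustrative */
    #define MIN_MATCH 3     /* shorter matches are cheaper as explicit symbols */
    #define MAX_MATCH 18    /* longest match representable in the length field */

    /* Greedy LZSS: emit either a <position,length> pair or an explicit symbol. */
    static void lzss_encode(const unsigned char *src, size_t n)
    {
        size_t i = 0;
        while (i < n) {
            size_t best_len = 0, best_pos = 0;
            size_t start = (i > WINDOW) ? i - WINDOW : 0;
            for (size_t j = start; j < i; j++) {   /* search the whole window */
                size_t len = 0;
                while (len < MAX_MATCH && i + len < n && src[j + len] == src[i + len])
                    len++;
                if (len > best_len) { best_len = len; best_pos = i - j; }
            }
            if (best_len >= MIN_MATCH) {           /* position-length codeword */
                printf("<%zu,%zu> ", best_pos, best_len);
                i += best_len;
            } else {                               /* explicit (literal) symbol */
                printf("'%c' ", src[i]);
                i += 1;
            }
        }
        putchar('\n');
    }

    int main(void)
    {
        const char *s = "abracadabra abracadabra";
        lzss_encode((const unsigned char *)s, strlen(s));
        return 0;
    }

The inner window search is what makes a sequential implementation slow; the hardware core instead performs this search with the systolic array reviewed in Section 2.5, comparing the incoming symbol against all dictionary positions in parallel.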
In general, the lengths in LZ77/LZSS codewords are not uniform. Smaller lengths occur more frequently, which suggests that Huffman coding [Huffman 1952] can be used to encode the codewords further and yield additional compression saving. The general strategy in Huffman coding is to construct a code table that is optimized for known characteristics of the source data. It allows codeword length to vary from symbol to symbol, with fewer bits assigned to frequently occurring symbols and more bits to seldom-used symbols. In many practical applications, Huffman coding is applied after an encoding process that produces non-uniformly distributed codewords. With this strategy, better compression savings can be achieved.

2.3 Previous Work on Implementations of Data Compression

Software implementations can be divided into two types. The first type emphasizes compression ratio at the expense of speed. This includes software like gzip, LZH and LZEXE, in which compression is achieved with a combination of the LZ77/LZSS and Huffman coding algorithms. The second type emphasizes speed rather than compression ratio. This includes Stac's Stacker and Microsoft's DoubleSpace, compression programs in which data read from or written to the hard disk is compressed using the LZ77 algorithm or its variants.

To achieve high performance in both speed and compression ratio, data compression needs to be implemented in hardware. Several parallel processing hardware architectures that provide high throughput have been proposed. [Gonzalez-Smith and Storer 1985, Storer 1992] introduced several systolic-array-based architectures, differing in the manner in which the comparison output is transmitted. [Stauffer and Hirschberg 1993] evaluated the performance of these systolic-array-based architectures, including the Match Tree, Broadcast/Reduce, and Wrap architectures [Gonzalez-Smith and Storer 1985, Zito-Wolf 1990, Storer 1992]. These systolic array architectures are elaborated further in Section 2.5.
In 1995, the LZ77 algorithm was made into a commercial product, IBM's ALDC [Craft 1995]. In 1997, [Benchop 1997] implemented the combination of LZSS and Huffman coding with a dictionary size of up to 8192 bytes in a VLSI circuit using a 0.35 µm process technology. VLSI implementations are application-specific, hence they need to be replaced when the target application changes. FPGA-based implementations, on the other hand, provide an advantage over custom VLSI implementations in that they are reconfigurable, and therefore flexible and reusable. These reconfigurability and reusability issues are discussed further in the next section.

FPGA-based hardware implementation of compression grew rapidly in the late 1990s. In 2000, [Huang 2000] from Stanford University implemented the LZ77 algorithm on four Xilinx 4036XLA FPGA chips, with a throughput of 100 Mbits per second. In the same year, the System Design Group from Loughborough University [Nunez and Jones 2000] introduced an FPGA-based hardware implementation of their X-MatchPRO algorithm on a 0.25 µm FLASH-CMOS Actel A500K ProASIC FPGA device, offering a throughput of 100 Mbytes per second at a 25 MHz clock.

2.4 Reconfigurability and Reusability

Reusability promotes the use of an existing design (product) without significant modification of the original design. This helps to speed up design and lower the cost of implementation. Reconfigurability makes it possible to re-architect an entire design within an existing component (product). In reconfigurable hardware, designs are implemented in a Field Programmable Gate Array (FPGA) device. An FPGA is a type of user-programmable gate array that can provide the functionality of a custom VLSI circuit. It is ideal for rapid prototyping, where a chip-level design can be implemented without having to fabricate the microchip.
2.5 Systolic Architecture for LZ77/LZSS

A systolic array is composed of a collection of linearly connected processing elements (PEs), where each processing element contains local units for processing, control and storage. The total processing time for a complete task is partitioned into steps by a global clock so that the entire array operates synchronously. Several VLSI implementations of systolic array architectures for LZ77/LZSS proposed by [Gonzalez-Smith and Storer 1985, Storer 1992] are reviewed here.

For a fixed-size dictionary of N symbols, the first systolic architecture, called the Match Tree architecture, employs 4N - 1 processors. The first 2N processors are configured as a systolic array and the remaining 2N - 1 processors are arranged as a binary tree (the match tree) attached to the systolic array. It is able to process a new symbol in a given clock cycle (a step), but the drawback is that the data rate (throughput) is dependent on the size of the dictionary.

Broadcast/Reduce is the second architecture, which is also built from a systolic array and binary trees of processors. For a dictionary of size N, it consists of 3N - 2 processors. N processors are arranged in a systolic array and the remaining processors are configured as two binary trees (called the broadcast tree and the reduce tree, each containing N - 1 processors) synchronized with the systolic array (as shown in Figure 2.1). It is also able to process a new symbol in every clock cycle, and the data rate is independent of the size of the dictionary.

The third architecture is called the Wrap architecture. It consists of a bidirectional systolic array, where the trees of the previous two architectures are removed. For a dictionary of size N, the systolic array consists of N/2 PEs connected by a two-way communication channel. It allows a new input to be processed on every cycle, but the PEs and data paths of this architecture are more complicated and increase the processing time of each cycle.
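As a rough software model of the Broadcast/Reduce idea, the C sketch below broadcasts one input symbol to N processing elements, each holding a dictionary symbol and a running match flag, and then reduces the per-PE flags to a single match position. It is a behavioral sketch under assumed structure and names, not the report's VHDL design; a real array also shifts the dictionary contents and tracks match lengths.

    #include <stdbool.h>
    #include <stdio.h>

    #define N 8                       /* dictionary size: illustrative only */

    /* One processing element: a dictionary symbol and a running match flag. */
    typedef struct {
        unsigned char dict_symbol;
        bool matching;                /* still part of a candidate match? */
    } PE;

    /* One clock step: broadcast in_sym to all PEs (broadcast tree), update
       each match flag, then reduce the flags to one match position (reduce
       tree).  The loop is sequential here but conceptually fully parallel. */
    static int systolic_step(PE pe[N], unsigned char in_sym)
    {
        int match_pos = -1;
        for (int i = 0; i < N; i++) {
            pe[i].matching = pe[i].matching && (pe[i].dict_symbol == in_sym);
            if (pe[i].matching && match_pos < 0)
                match_pos = i;        /* the reduce tree would pick one match */
        }
        return match_pos;             /* -1 means no dictionary position matches */
    }

    int main(void)
    {
        const unsigned char dict[N] = { 'a', 'b', 'r', 'a', 'c', 'a', 'd', 'a' };
        PE pe[N];
        for (int i = 0; i < N; i++)
            pe[i] = (PE){ dict[i], true };
        printf("match at position %d\n", systolic_step(pe, 'a'));
        return 0;
    }

Because every PE works on the same cycle, one input symbol is consumed per clock regardless of the data, which is the data-independent throughput property exploited by the compression core in Chapter IV.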
[Figure 2.1: Broadcast/Reduce Systolic Architecture - the input symbol enters through the broadcast tree, is matched in the systolic array, and the match result is collected through the reduce tree.]

2.6 Discussion and Summary

Data compression techniques are widely used in a variety of applications, especially in data storage and high-speed communications. They can be classified as either lossless or lossy. For many real-world applications, universal data compression is needed because it provides good compression performance on source data with unknown characteristics. LZ77 and LZSS are two such universal data compression techniques, and the LZSS algorithm offers a better compression saving than LZ77. To achieve high-speed operation, a systolic architecture is adopted in the LZSS hardware design.
As state-of-the-art digital design moves to SoC technology, it is desirable that a design be made reusable and reconfigurable, in order to support rapid integration of new and existing technologies. Hence, this work uses a platform-based approach to develop a library of parameterized components supplemented by verification support and documentation. FPGA-based hardware is chosen for the design implementation.
CHAPTER III METHODOLOGIES AND TOOLS

This chapter describes the design methodology used in this research project. The research procedure, design verification methods, and tools used are discussed.

3.1 Research Procedure

The digital design is carried out using a top-down approach. This involves a hierarchical and modular design methodology that divides a module into sub-modules and then repeats this operation on the sub-modules until the complexity of the sub-modules is at an appropriately comprehensible level of detail [Weste 1993]. The design is completely described in parameterized VHDL code using UTM's VHDLmg tool [Khalil and Koay 1999]. Leonardo Spectrum is used to synthesize the VHDL code to produce a netlist, and this netlist is compiled using Altera MAX+plus II. For implementation in an Altera FPGA device, the MegaWizard Plug-In Manager from MAX+plus II is used to infer predefined components in VHDL code (such as RAM, FIFO buffers and pci_mt64). A black box is then specified in VHDLmg to represent each predefined module.

While the hardware core is being produced, a software implementation of the algorithm is also developed using Microsoft Visual C++. Testbenches for the hardware core are generated from this software, and they are used to verify the functionality of the hardware core during the design verification stage. In the application
development stage, a real-time data compression platform is developed using Microsoft Visual C++. It serves as a test application to assess the feasibility of the hardware core in handling real-time applications, and the performance of the resulting design is benchmarked.

3.2 Timing Simulation in Design Verification

A testbench for the data compression hardware consists of source data as input and the corresponding compressed data as expected output. For timing simulation, the testbench has to be translated into the I/O format of the MAX+plus II simulator, where both the source data and the compressed data are represented as a series of byte codes. The translation of the testbench is done using a GUI program written in Microsoft Visual C++. Based on the translated testbench, test vectors for timing simulation are created using Microsoft Excel and then exported to the MAX+plus II simulator. After simulation, the design is verified by comparing the simulation result to the translated output testbench.

Because of the difficulty of obtaining the most suitable test vectors, timing simulation is very time-consuming, especially when simulating a large testbench that involves a lot of handshaking. It is therefore more practical to conduct design verification using a hardware evaluation system.

3.3 Hardware Evaluation System for Test and Performance Analysis

In the hardware evaluation system, the design is programmed into FPGA-based hardware. By connecting the FPGA to a host PC, the design can be interfaced to software, which can concurrently generate the required test vectors. Compared to the timing simulation method, this method reduces the time required for design verification. In this work, the Peripheral Component Interconnect (PCI) bus is employed
to connect the FPGA to the host PC, due to its high data transfer bandwidth. The PCI bus is able to transfer 32-bit data with a 33 MHz clock, resulting in a throughput of about 1 Gbps (32 bits x 33 MHz) [PCI 1998]. A PCI controller is required to interface the design to the PCI bus. The Altera pci_mt64 MegaCore [Altera 2001] is the PCI controller used; it can act as a master or target on the PCI bus, and provides flexible general-purpose PCI interfaces that can be customized for specific peripheral requirements. To access the design from software, a PCI device driver is needed. NuMega DriverAgent [Compuware 1999] is used to develop the PCI device driver. DriverAgent is a program used as an aid in writing Windows-based device drivers. It provides a set of Application Programming Interface (API) functions to directly access and control hardware on various bus standards (such as PCI and ISA).

For the performance analysis of the compression core, three types of data sets are used to evaluate the compression ratio achieved by the design. Table 3.1 lists the types of compression benchmark data together with their sources. These data sets are chosen to assess the viability of the selected universal data compression algorithms in handling different types of data. The first data set is commonly used to test text data compression, the second data set is used to evaluate the selected algorithms on data with different symbol probabilities, and the third data set is standard image data used to test the feasibility of the design in image applications.

Table 3.1: Compression Benchmark Data Set

No  Data Type  Source
1   Text       Calgary corpus (ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus/)
2   Binary     UNIX drand48() pseudorandom number generator
3   Image      CCITT Fax Standard Image (ftp://ftp.funet.fi/pub/graphics/misc/test-images/)
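As an aside, binary test data of the kind listed in Table 3.1 can be reproduced with the UNIX drand48() generator. The sketch below writes a file of pseudorandom bytes; the file name, size and seed are illustrative assumptions, since the report does not state exactly how the B7-100k, B9-100k and B97-100k sets were derived.

    #include <stdio.h>
    #include <stdlib.h>   /* drand48()/srand48() are POSIX; compile with
                             -D_XOPEN_SOURCE=500 if needed */

    /* Write n pseudorandom bytes produced by drand48() to the given file.
       drand48() returns a double uniformly distributed in [0.0, 1.0). */
    static int write_random_bytes(const char *path, size_t n, long seed)
    {
        FILE *f = fopen(path, "wb");
        if (!f)
            return -1;
        srand48(seed);
        for (size_t i = 0; i < n; i++)
            fputc((int)(drand48() * 256.0), f);
        return fclose(f);
    }

    int main(void)
    {
        /* 100 KB of uniform random bytes, a stand-in for one of the
           100k binary benchmark files (hypothetical file name). */
        return write_random_bytes("binary-100k.bin", 100 * 1024, 1L) == 0 ? 0 : 1;
    }

Uniform random bytes are essentially incompressible, so such a file probes the worst case; skewed symbol probabilities can be obtained by thresholding the drand48() output instead of scaling it.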
3.4 Hardware and Software Tools

Several tools are required to carry out this project. The following is a list of the software and hardware tools used in this work.
1. PCI-based hardware development board: a configurable computing FLEX10KE PCI card from Altera, with an EPF10K200SFC672-1 FPGA onboard, used for verification and test of the design. The design is programmed into the FPGA through the JTAG chain.
2. PCI controller core: a master/target PCI controller core, pci_mt64, from Altera, used to interface our design to the PCI bus.
3. VHDLmg: a graphical VHDL design entry tool, developed by UTM, used for hierarchical and modular VHDL code development.
4. Leonardo Spectrum 2001_1a_28_OEM_Altera: a VHDL synthesis tool.
5. Altera MAX+plus II 10.1: a design entry, compilation, simulation and implementation software tool.
6. Microsoft Visual C++ Version 6.0: a Windows-based C++ compiler used to create the GUI test program.
7. NuMega DriverAgent: a device driver development tool used to create the PCI device driver and to provide a set of APIs to the GUI test program for accessing and controlling the hardware under test.
8. Microsoft Excel 2000: a spreadsheet program used to create the test vectors for timing simulation.
3.5 Discussion and Summary

The research procedure, design verification methods, and tools used have been presented. Timing simulation and hardware evaluation are the two methods used to verify the design. Since design verification using timing simulation is very time-consuming, especially for large testbenches, the hardware evaluation system is mainly used for both design verification and performance analysis.
CHAPTER IV DESIGN OF DATA COMPRESSION HARDWARE

In this chapter, the design of the proposed compression core is presented. This includes the design of the compression and decompression subsystems, with their specific interfacing modules. The designs are described completely in parameterized VHDL code, so that they can be easily modified to meet the design requirements of a specific target application.

4.1 Overview of Compression Hardware System

Figure 4.1 gives a diagram of the proposed compression system. The general operation of the compression subsystem is as follows:
1. when the compression subsystem is ready to receive data, it collects source data from the host system;
2. when the host system is ready to receive data, the compression subsystem sends the compressed data to the host system;
3. the status is monitored to determine the availability of the compression system.
The decompression subsystem works similarly.

The functional block diagram of the compression subsystem is shown in Figure 4.2. Compression_Interface is the controller that interfaces the compression core, labeled Compression_Unit, to the host system. The compression core is the kernel of the design, where the combination of LZSS and Huffman encoding is
implemented. Figure 4.3 shows the functional block diagram of the decompression subsystem. It consists of a decompression core and an interface controller.

[Figure 4.1: Compression System - the host system (PC) exchanges input data, output data and status signals with the compression system, which contains the compression and decompression subsystems.]

[Figure 4.2: Functional Block Diagram of the Compression Subsystem - the interface controller exchanges handshaking signals with the host, passing source data to the compression core and compressed data back out.]

The designs of these modules are presented in the following sections. Section 4.2 explains the design of the Compression_Unit, and Section 4.3 describes the Compression_Interface. Sections 4.4 and 4.5 describe the design of the decompression subsystem.
[Figure 4.3: Functional Block Diagram of the Decompression Subsystem - the interface controller exchanges handshaking signals with the host, passing compressed data to the decompression core and restored data back out.]

4.2 Design of Compression Core (Compression_Unit)

Figure 4.4 shows the block diagram of the Compression_Unit. It consists of three hierarchical blocks: the LZSS Coder, the Fixed Huffman Coder and the Data Packer. All the modules are synchronously clocked. The LZSS Coder performs the LZSS encoding of source data obtained from the Compression_Interface, and the Fixed Huffman Coder re-encodes the LZSS codewords to achieve a better compression ratio. Finally, the Data Packer packs the variable-length codes from the Fixed Huffman Coder into fixed-length output packets and sends them to the Compression_Interface.

[Figure 4.4: Block Diagram of Compression_Unit - LZSS Coder, Fixed Huffman Coder and Data Packer in a pipeline.]

The LZSS Coder is designed to perform the LZSS encoding at high speed. The algorithm of LZSS encoding is mapped onto a systolic architecture to obtain a data-independent throughput.
The Fixed Huffman Coder uses a table-lookup procedure on a predefined code table to perform the Huffman encoding. Besides the Huffman encoding itself, the Fixed Huffman Coder also computes the length of each codeword; this information is used by the Data Packer to pack the variable-length Huffman codewords into fixed-length output packets. The Data Packer consists of two registers and a queue. It allows exactly one fixed-length data word to be sent to the Compression_Interface in every cycle.
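The C sketch below illustrates this table-lookup encoding and packing: each symbol class is looked up in a predefined (fixed) Huffman table, and the resulting variable-length codes are accumulated and emitted as fixed-width packets, which is the role played by the Data Packer. The four-entry table and the 16-bit packet width are illustrative assumptions, not the core's actual code table or packet format.

    #include <stdint.h>
    #include <stdio.h>

    /* A predefined (fixed) Huffman table entry: code bits and code length. */
    typedef struct { uint32_t code; int len; } HuffEntry;

    /* Hypothetical 4-class table: "0", "10", "110", "111" -- shorter codes
       are assigned to the more frequent classes. */
    static const HuffEntry table[4] = {
        { 0x0, 1 }, { 0x2, 2 }, { 0x6, 3 }, { 0x7, 3 },
    };

    #define PACKET_BITS 16            /* fixed-length output packet width */

    static uint64_t bitbuf = 0;       /* bit accumulator (two registers, here one) */
    static int bits_in_buf = 0;

    /* Encode one symbol class; emit a 16-bit packet whenever enough bits
       have accumulated.  A final partial packet would be flushed with
       padding at the end of a block (omitted for brevity). */
    static void encode_symbol(int cls)
    {
        bitbuf = (bitbuf << table[cls].len) | table[cls].code;
        bits_in_buf += table[cls].len;
        while (bits_in_buf >= PACKET_BITS) {
            uint16_t packet = (uint16_t)(bitbuf >> (bits_in_buf - PACKET_BITS));
            bits_in_buf -= PACKET_BITS;
            bitbuf &= ((uint64_t)1 << bits_in_buf) - 1;   /* keep leftover bits */
            printf("packet: 0x%04x\n", packet);
        }
    }

    int main(void)
    {
        const int stream[] = { 0, 0, 1, 3, 2, 0, 1, 0, 3, 3, 2, 1, 0, 0, 2, 3 };
        for (int i = 0; i < 16; i++)
            encode_symbol(stream[i]);
        return 0;
    }

In hardware the same job is done with the register pair and queue mentioned above, so that one fixed-width word can leave the Compression_Unit on every clock.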
4.3 Compression Interface Controller (Compression_Interface)

The architecture of the Compression_Interface is shown in Figure 4.5. The controllers are the main units in this module; each controller is grouped together with two FIFO buffers. The Input Controller extracts input for the Compression_Unit, while the Output Controller collects the output from the Compression_Unit.

[Figure 4.5: Block Diagram of Compression_Interface - the Input Controller with the Interface_Data_In and Poll_Amount_In FIFOs, and the Output Controller with the Interface_Data_Out and Poll_Amount_Out FIFOs, connect the host system to the compression core.]

4.4 Design of Decompression Core (Decompression_Unit)

Figure 4.6 shows the block diagram of the Decompression_Unit. It consists of a Data Unpacker, a Fixed Huffman Decoder, an LZSS codeword FIFO buffer, and an LZSS Expander. All the modules are synchronously clocked. The Data Unpacker unpacks the fixed-length compressed input packets from the Decompression_Interface, and the unpacked data is decoded by the Fixed Huffman Decoder to restore the LZSS codewords. The FIFO buffer stores the LZSS codewords and supplies them to the LZSS Expander. Finally, the LZSS Expander restores the symbols corresponding to each LZSS codeword and sends them to the Decompression_Interface.

[Figure 4.6: Block Diagram of Decompression_Unit - Data Unpacker, Fixed Huffman Decoder, LZSS Codeword FIFO and LZSS Expander in a pipeline.]

The Data Unpacker consists of a register and a queue. The Fixed Huffman Decoder performs Huffman decoding using a predefined code table. During the decoding process, a Huffman codeword is extracted from the unpacked data and its validity is evaluated. If a valid Huffman codeword is detected, the decoder restores the corresponding LZSS codeword from the predefined code table. In addition, the decoder determines the length of the extracted Huffman codeword. The LZSS Expander performs the LZSS decoding procedure. It consists of a codeword analyzer, a dictionary, a delay tree, and a symbol generator.
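For the decode direction, the following sketch mirrors that table-lookup procedure: bits are drawn from the unpacked stream one at a time and checked against a predefined table until a valid codeword is recognized, which also yields its length. It reuses the same illustrative four-entry table assumed in the encoding sketch above, not the core's actual table.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical fixed Huffman table: "0", "10", "110", "111". */
    typedef struct { uint32_t code; int len; } HuffEntry;

    static const HuffEntry table[4] = {
        { 0x0, 1 }, { 0x2, 2 }, { 0x6, 3 }, { 0x7, 3 },
    };

    /* Decode one symbol class from `bits` starting at bit index *pos
       (MSB-first within each byte).  Returns the class, or -1 if no
       valid codeword is found within the maximum code length. */
    static int decode_symbol(const uint8_t *bits, size_t nbits, size_t *pos)
    {
        uint32_t acc = 0;
        for (int len = 1; len <= 3 && *pos < nbits; len++) {
            size_t p = (*pos)++;
            acc = (acc << 1) | ((bits[p / 8] >> (7 - p % 8)) & 1);
            for (int k = 0; k < 4; k++)                 /* table lookup */
                if (table[k].len == len && table[k].code == acc)
                    return k;                           /* valid codeword */
        }
        return -1;                                      /* invalid / exhausted */
    }

    int main(void)
    {
        /* Bitstream "0 10 110 111 0" packed MSB-first: 0101 1011 10 -> 0x5B 0x80 */
        const uint8_t stream[] = { 0x5B, 0x80 };
        size_t pos = 0;
        int cls;
        while ((cls = decode_symbol(stream, 10, &pos)) >= 0)
            printf("class %d\n", cls);
        return 0;
    }

Recognizing the codeword and its length in the same step is what lets the decoder advance its bit pointer immediately and keep the pipeline fed on every cycle.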
4.5 Decompression Interface Controller (Decompression_Interface)

Figure 4.7 shows the block diagram of the Decompression_Interface, which is similar to the Compression_Interface. The design consists of three FIFO buffers and two controllers. The Input Controller determines whether the decompression hardware is ready to receive a new series of compressed input packets from the host system. The Interface_Data_In FIFO stores the compressed input packets and supplies them to the Decompression_Unit. The Output Controller is grouped together with the Interface_Data_Out FIFO and the Poll_Amount_Out FIFO, and is used to collect output from the Decompression_Unit.

[Figure 4.7: Block Diagram of Decompression_Interface - the Input Controller with the Interface_Data_In FIFO, and the Output Controller with the Interface_Data_Out and Poll_Amount_Out FIFOs, connect the host system to the decompression core.]

4.6 Summary

The hierarchical hardware designs of the compression core, the decompression core, and their specific interfacing modules have been described. The designed cores are accessible from any host system through proper interfacing procedures.
CHAPTER V DESIGN VERIFICATION AND PERFORMANCE ANALYSIS

This chapter reports the results obtained from testing the data compression hardware design using the hardware evaluation system. It starts with design verification, followed by the performance analysis.

5.1 Design Verification

In the following sub-sections, design verification results using timing simulation and hardware test are presented. Section 5.1.1 describes the timing simulation results and Section 5.1.2 presents the results obtained from the hardware test.

Taking into account factors such as the resource limitations of the FPGA device and the 32-bit communication interface supported by the PCI device driver, the PCI-based data compression hardware core (labeled PCI_Chip) is configured for hardware implementation according to the parameters listed in Table 5.1. The actual hardware is the EPF10K200SFC672-1 FPGA device, the device available on the Altera PCI development board. The device has 9984 logic cells (LCs) grouped into 1248 logic array blocks (LABs) for complex logic functions, and its memory bits are grouped into 24 embedded array blocks (EABs) for memory functions [Altera 1999].
Table 5.1: Parameter Configuration of the PCI-Based Data Compression Hardware Core

PARAMETER        VALUE
SymbolWIDTH      16
DicLEVEL         7
MAXWIDTH         5
IniDicValue      0
InterfaceWIDTH   5
PollAmount

5.1.1 Results of Timing Simulation

For timing simulation, the design parameters are also set according to the configuration listed in Table 5.1. Several simple source data sets are encoded by the compression core, and the encoded (compressed) data are then decoded using the decompression core. The restored output is found to be identical to the original source data. These timing simulation results prove that the decompression core is able to restore the original source data compressed by the compression core. Since timing simulation is very time-consuming, only one test vector from the standard data set, paper4.txt (a benchmark file from the Calgary text compression corpus), is chosen for the timing simulation. This test vector is applied successfully, proving correct operation of both the compression and decompression cores.

5.1.2 Results of Hardware Test

The complete compression and decompression design is further verified through the PC-based compression hardware evaluation system.
All the compression benchmark data sets listed in Table 5.2 are tested using this hardware evaluation system.

Table 5.2: Standard Data Compression Benchmark Data Sets

No   Data Set                             Data Type    Source
1-3  Paper4.txt, Bib.txt, E.coli          Text Data    Calgary corpus
4-6  B7-100k, B9-100k, B97-100k           Binary Data  UNIX drand48() pseudorandom number generator
7-9  Ccitt1.pbm, Ccitt2.pbm, Ccitt3.pbm   Image Data   CCITT Fax Standard Image

The mechanism of the design verification process in the hardware evaluation system is similar to that of the timing simulation. By comparing the results with those obtained from software compression and decompression, the compression and decompression operations performed by the hardware are proven to be correct.

5.2 Performance Analysis

This section describes the performance metrics chosen to evaluate the data compression core, and reports the corresponding performance data. Section 5.2.1 describes the performance metrics, and Sections 5.2.2 to 5.2.4 report the results of the performance analysis.
5.2.1 Performance Metrics Applied

The performance of the data compression core is analyzed using three main performance metrics. The first metric is area: the number of logic cells (LCs) and memory bits of the EPF10K200SFC672-1 device used to implement the design. It provides a measure of the hardware cost of fabricating the design.

The second metric is compression saving, which represents the efficiency of the compression algorithm. This is normally given as the compression ratio:

    Compression Ratio (CR) = SCD / SSD    (1)

where SCD is the size of the compressed data and SSD is the size of the source data. When compressing several source files, the compression saving is measured using the weighted average compression ratio:

    Weighted Average CR = (sum of SCD over all files) / (sum of SSD over all files)    (2)

The third metric is operation speed. Speed performance measures the time required to compress source data or to restore compressed data. It is related to throughput, defined as

    Throughput (TP) = SSD / T    (3)

where T is the time to compress the source data or to restore it. The throughput value is affected by the latency of the pipelined implementation. Tables 5.3 and 5.4 summarize the latency of our design.

Table 5.3: Latency of Compression Core

Function               Pipelined Component    Latency (clock cycles)
LZSS Coding            Systolic Array         1
                       Reduce Tree            DicLEVEL
                       Codeword Generator     1
Fixed Huffman Coding   Fixed Huffman Coder    1
Pack Data              Data Packer            1
Total Latency                                 4 + DicLEVEL
Table 5.4: Latency of Decompression Core

Function                Pipelined Component    Latency (clock cycles)
Unpack Data             Data Unpacker          1
Fixed Huffman Decoding  Fixed Huffman Decoder  1
Store LZSS Codeword     LZSS Codeword FIFO     1
LZSS Decoding           Codeword Analyzer      1
                        Dictionary             1
                        Symbol Generator       1
Total Latency                                  6

Since our design processes a symbol every clock cycle, the throughput is

    TP = f * SymbolWIDTH    (4)

where f is the clock frequency and SymbolWIDTH is the length of a symbol in bits. From equations (3) and (4), the time to compress or decompress data is

    T ≈ NoOfSymbol / f    (5)

where NoOfSymbol = SSD / SymbolWIDTH. Since typically NoOfSymbol >> total latency, the effect of latency on throughput, and thus on the time T, can be neglected. For example, with f = 50 MHz and SymbolWIDTH = 16 bits, equation (4) gives the 800 Mbps throughput quoted in the abstract.

5.2.2 Performance Analysis: Clock Speed and Area

The maximum clock speed is obtained from the timing analyzer in MAX+plus II. For the design configured according to the parameters listed in Table 5.1, the area and speed results are summarized in Table 5.5.
Table 5.5: Area and Speed Evaluation Results

Evaluation                 Compression Core   Decompression Core   PCI_Chip
Area: LEs
Area: Memory Bits
Speed: Frequency (MHz)
Speed: Throughput (Mbps)

5.2.3 Performance Analysis: Compression Ratio

In order to evaluate the compression ratio and computation time, the hardware evaluation system is clocked at the PCI clock frequency of 33 MHz, resulting in a throughput of 528 Mbps [calculated using equation (4)]. The compression ratio achieved is shown in Table 5.6. A better compression ratio could be achieved if the design were implemented in a larger device, because it could accommodate a larger dictionary.

The times taken to execute the compression and decompression operations are shown in Tables 5.7 and 5.8. The hardware evaluation system executes the compression operation faster than software running on a 1 GHz PIII CPU with 1 GB RDRAM. The computation time for the evaluation system is longer than the actual compression time for the compression core calculated using equation (5), because the whole operation performed by the evaluation system involves considerable software overhead. These software overheads include:
1. division of processor time among many tasks by the OS;
2. data transfer between the host PC and the PCI_Chip.
Table 5.6: Compression Ratio

    Data Set                      Compression Ratio
    Text Data      Paper4.txt     -
                   Bib.txt        -
                   E.coli         -
    Binary Data    B7-100k        -
                   B9-100k        -
                   B97-100k       -
    Image Data     Ccitt1.pbm     -
                   Ccitt2.pbm     -
                   Ccitt3.pbm     -
    Weighted Average Compression Ratio    -

Table 5.7: Computation Time for Compression (times in ms)

    Data Set      No. of     Compression Core    PC-Based Evaluation    Software
                  Symbols    (33 MHz)            System (33 MHz)        (1 GHz PIII CPU)
    Paper4.txt    -          -                   -                      -
    Bib.txt       -          -                   -                      -
    E.coli        -          -                   -                      -
    B7-100k       -          -                   -                      -
    B9-100k       -          -                   -                      -
    B97-100k      -          -                   -                      -
    Ccitt1.pbm    -          -                   -                      -
    Ccitt2.pbm    -          -                   -                      -
    Ccitt3.pbm    -          -                   -                      -
    Total Computation Time   -                   -                      -
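The bottom row of Table 5.6 is computed with equation (2) rather than by averaging the per-file ratios. With illustrative figures (not values from the table): a 100 kB file compressing to 40 kB (CR = 0.40) and a 50 kB file compressing to 30 kB (CR = 0.60) give

    Weighted Average CR = (40 + 30) / (100 + 50) = 70 / 150 ≈ 0.47,

whereas the simple mean of the per-file ratios would be 0.50; weighting by source size lets the larger file dominate.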
Data transfer times for the PCI bus, system memory and hard disk vary according to the mode of operation, that is, whether the operation is a read or a write. Since the compressed data is smaller than the source data, the data transfer times for compression and decompression are not equal. Table 5.8 shows that the PC-based evaluation system takes longer to execute the decompression operation.

Table 5.8: Computation Time for Decompression (times in ms)

    Data Set      No. of     Decompression Core    PC-Based Evaluation    Software
                  Symbols    (33 MHz)              System (33 MHz)        (1 GHz PIII CPU)
    Paper4.txt    -          -                     -                      -
    Bib.txt       -          -                     -                      -
    E.coli        -          -                     -                      -
    B7-100k       -          -                     -                      -
    B9-100k       -          -                     -                      -
    B97-100k      -          -                     -                      -
    Ccitt1.pbm    -          -                     -                      -
    Ccitt2.pbm    -          -                     -                      -
    Ccitt3.pbm    -          -                     -                      -
    Total Computation Time   -                     -                      -
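A simple transfer-time model suggests why (the model is illustrative; the report does not derive one). Compression reads the full source and writes back only the smaller compressed file, while decompression must write back the full source size:

    T(transfer, compression)   ≈ SSD / BW(read) + SCD / BW(write)
    T(transfer, decompression) ≈ SCD / BW(read) + SSD / BW(write)

Since SCD < SSD and writes to system memory and disk are typically no faster than reads, decompression moves the larger amount of data over the slower leg, so its end-to-end time is longer.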
Performance Analysis: Comparison with Other Implementations

This section compares our compression core with other data compression implementations, in terms of speed and compression ratio. Area is not compared because each design is implemented in a different process technology.

Table 5.9 compares the designed hardware against several popular high-performance ASIC and FPGA lossless data compression devices. Our data compression core offers performance competitive with currently available ASIC and FPGA data compression hardware. In addition, it is parameterizable, which makes it reusable in future applications.

Table 5.9: Comparison of Data Compression Hardware Using Text Data

    DEVELOPER             Technische Universiteit    System Design Group,       Universiti Teknologi
                          Eindhoven [Benchop 1997]   Loughborough University    Malaysia
                                                     [Nunez 2000]
    PROCESS TECHNOLOGY    0.35 micron gate array     0.25 micron FLASH-CMOS     FPGA
                                                     Actel A500K ProASIC        EPF10K200SFC672-1
    ALGORITHM             LZH                        X-MatchPRO                 LZSS + Huffman
    THROUGHPUT            100 Mbps                   800 Mbps                   800 Mbps
    PARAMETERIZABLE       No                         No                         Yes
    COMPRESSION RATIO     -                          -                          -

Summary

The results of design verification and performance analysis are reported in this chapter. The results obtained from timing simulation and hardware evaluation prove the functionality of our design. We also report the performance metrics of our design and compare them with those of other data compression implementations.
CHAPTER VI

CONCLUSIONS

This chapter summarizes the research findings, followed by suggestions for potential future work.

6.1 Concluding Remarks

This project describes the design of a data compression embedded core based on the combination of the LZSS data compression algorithm and fixed Huffman coding. Because the hardware is described in parameterized VHDL, an industry-standard language, the design can be tailored to the requirements of a specific application. The data compression core achieves a throughput of at least 100 Mbps and a high compression ratio. The embedded core offers a data-independent throughput, processing one symbol every clock cycle. Implemented in an Altera FLEX10KE device, the design offers a throughput of up to 800 Mbps at an operating frequency of 50 MHz, making it suitable for high-speed and real-time computing applications. Table 6.1 summarizes the general specifications of the data compression embedded core. The core can also be used as a reusable IP core for future applications; for example, it can be integrated with other IP cores (such as a cryptography core) to realize a more complex application (such as a network security system).
Table 6.1: Specifications of the Data Compression Embedded Core

    Function                 A real-time data compression and decompression engine
    Algorithm                Combination of Lempel-Ziv-Storer-Szymanski (LZSS) and Huffman coding
    Architecture Features    Systolic array, parallel, pipelined
    Other Features           Parameterizable: compromise between speed, compression ratio and resources
                             Data-independent throughput
                             Suitable for real-time and high-speed applications
    Performance              For the parameter configuration listed in Table 5.1, a throughput of
                             800 Mbps and the compression ratios reported in Table 5.6 are achieved

A PC-based hardware evaluation system for the data compression core was also developed. The system connects a host PC to the embedded core through the PCI bus. A user-friendly GUI program facilitates verification of the data compression hardware core. The GUI program can be used to evaluate the efficiency of a chosen configuration and the effect of the design parameters on the compression ratio, and a customized version of the data compression core can be generated through it.
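As an illustration of the parameterization described above, the sketch below shows what a top-level entity in the spirit of this core might look like. The generic names SymbolWIDTH and DicLEVEL are taken from this report, but the port list and default values are assumptions for illustration, not the core's actual interface.

    -- Hypothetical top level of a parameterized LZSS + Huffman compression core.
    -- SymbolWIDTH and DicLEVEL are parameter names used in this report; the
    -- ports and default values below are illustrative assumptions only.
    library ieee;
    use ieee.std_logic_1164.all;

    entity compression_core is
      generic (
        SymbolWIDTH : positive := 16;  -- symbol length in bits (throughput = f * SymbolWIDTH)
        DicLEVEL    : positive := 8    -- dictionary depth; the reduce tree adds DicLEVEL cycles of latency
      );
      port (
        clk        : in  std_logic;
        reset      : in  std_logic;
        sym_in     : in  std_logic_vector(SymbolWIDTH - 1 downto 0);  -- one symbol accepted per clock
        sym_valid  : in  std_logic;
        code_out   : out std_logic_vector(SymbolWIDTH - 1 downto 0);  -- packed compressed stream
        code_valid : out std_logic
      );
    end entity compression_core;

Changing the generics at instantiation time trades area (a deeper dictionary costs more logic cells and pipeline latency) against compression ratio, which is exactly the compromise the GUI program helps the user explore.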
6.2 Future Work

Further enhancements can be made based on the work in this project. Adaptive Huffman coding can be implemented in place of the fixed Huffman coder in the design. The adaptive Huffman coding proposed by [Vitter 1987] achieves a better compression saving than fixed Huffman coding and requires no prior knowledge of the source data characteristics: it estimates those characteristics during the compression process and then modifies the code table it currently uses, generating adaptive codewords that remain optimal for the current estimates.

The Wrap architecture proposed by [Gonzalez-Smith 1985, Storer 1992] is also worth studying. It reduces resource utilization; however, its datapaths are more complicated, resulting in a lower throughput.

The interface block of the hardware evaluation platform can be enhanced to support hardware under test with other types of I/O. This would allow the PC-based hardware evaluation system to test many other cores, not just the compression core.
REFERENCES

Altera Corporation (1999). Altera Device Data Book. Altera Corporation.

Altera Corporation (2001). PCI MegaCore Function User Guide. Altera Corporation.

Benchop, L.C. (1997). Lossless Data Compression in VLSI. Ph.D. Thesis, Technische Universiteit Eindhoven.

Compuware Corporation (1999). NuMega DriverAgent Help. Compuware Corporation.

Craft, D.J. (1995). ALDC and a Pre-Processor Extension, BLDC, Provide Ultra Fast Compression for General-Purpose and Bit-Mapped Image Data. Proceedings of the Data Compression Conference.

Gonzalez-Smith, M.E. and Storer, J.A. (1985). Parallel Algorithms for Data Compression. Journal of the ACM, Vol. 32.

Hi/fn Inc. (1999). Data Compression Processor. Hi/fn Inc.

Huang, W.J. (2000). A Reliable LZ Data Compressor on Reconfigurable Coprocessors. Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines.

Huffman, D.A. (1952). A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, Vol. 40.

Khalil, M. and Koay, K.H. (1999). VHDL Module Generator: A Rapid-Prototyping Design Entry Tool for Digital ASICs. Jurnal Teknologi.
Lempel, A. and Ziv, J. (1977). A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, Vol. IT-23.

Lempel, A. and Ziv, J. (1978). Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, Vol. IT-24.

Lim, H. (2001). Compression as the Best Way to Encrypt Data. HMaxF Ultimate Recursive Lossless Compression Research.

Nelson, M. (1996). The Data Compression Book. Hungry Minds.

Nunez, J.L. and Jones, S. (2000). X-MatchPRO 100 Mbyte/second FPGA-Based Lossless Data Compressor. Proceedings of Design, Automation and Test in Europe.

PCI Special Interest Group (1998). PCI Local Bus Specification, Revision 2.2.

Stauffer, L.M. and Hirschberg, D.S. (1993). Parallel Text Compression. Technical Report, University of California.

Storer, J.A. and Szymanski, T.G. (1982). Data Compression via Textual Substitution. Journal of the ACM, Vol. 29.

Storer, J.A. (1992). Image and Text Compression. Kluwer Academic Publishers.

Vitter, J.S. (1987). Design and Analysis of Dynamic Huffman Codes. Journal of the ACM, Vol. 34.

Weste, N.H.E. (1993). Principles of CMOS VLSI Design. Addison-Wesley.

Zito-Wolf, R.J. (1990). A Broadcast/Reduce Architecture for High-Speed Data Compression. Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing.