CRAIS: A Crossbar-Based Interconnection Scheme on FPGA for Big Data


Wang C, Li X, Zhou XH. CRAIS: A crossbar-based interconnection scheme on FPGA for big data. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(1): Jan DOI /s

Chao Wang 1 (王超), Member, CCF, ACM, IEEE
Xi Li 1,2 (李曦), Senior Member, CCF, Member, ACM, IEEE
Xue-Hai Zhou 1 (周学海), Senior Member, CCF, Member, ACM, IEEE

1 School of Computer Science, University of Science and Technology of China, Hefei , China
2 School of Software Engineering, University of Science and Technology of China, Suzhou , China

{cswang, llxx,

Received July 15, 2014; revised December 12,

Abstract
On-chip interconnection has posed significant challenges in the multiprocessor system on chip (MPSoC) design paradigm, especially in the big data era. With respect to the state of the art, crossbar-based interconnection methodologies are still efficient for FPGA-based small-scale heterogeneous MPSoCs. This paper proposes a crossbar-based on-chip interconnection scheme, named CRAIS. CRAIS utilizes reconfigurable crossbar interconnections between microprocessors and intellectual property (IP) cores in an MPSoC. The hardware interconnection can be dynamically reconfigured during execution. Empirical results on an FPGA prototype demonstrate that CRAIS can achieve more than 7X speedup compared with the state-of-the-art StarNet approach, while it utilizes only 21%-35% of the hardware resources of StarNet.

Keywords
interconnect, big data, crossbar, multiprocessor system on chip

1 Introduction

With the wide application of cloud computing, mobile Internet and networking, social and enterprise big data pervades our daily life. The vast amounts of data generated at an explosive speed and the fast growth rate of global data are unprecedented [1]. This makes our life more convenient and at the same time poses significant challenges to computer researchers.
Novel big data applications are generally considered to have the following four characteristics.
Volume. Data magnitude grows from TB to PB and even to ZB; this calls for special care in data mining and processing techniques.
Variety. More and more data is semi-structured or unstructured, such as social media, web pages, images, and videos.
Velocity. Not only the existing volume but also the generation speed of data is tremendous, which requires high-speed data transfer.
Real-Time Requirements and Low Value Density. The efficiency of data mining should be maintained.

These characteristics inevitably bring new challenges. Massive data makes data security increasingly difficult. What is worse, the capabilities for processing and analyzing large data are far below the ideal level: big data needs high-speed information transmission together with real-time analysis and processing. Furthermore, low value density makes data mining more difficult. The collection and analysis of big data is quite time consuming, which poses a significant challenge for real-time processing techniques. If we want to extract useful information from the sea of data, we have to collect all the data that may be potentially useful. However, it costs much time to transfer massive amounts of data. Therefore accelerating big data applications is becoming more and more important.

Regular Paper
Special Section on Computer Architecture and Systems for Big Data
This work was supported by the National Natural Science Foundation of China under Grant Nos , , , , and , the Natural Science Foundation of Jiangsu Province of China under Grant No. SBK , the Fundamental Research Funds for the Central Universities of China under Grant No. WK , the Open Project of State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CAS) under Grant No. CARCH201407, and the Strategic Priority Research Program of CAS under Grant No. XDA
Corresponding Author
2015 Springer Science + Business Media, LLC & Science Press, China

Chao Wang et al.: CRAIS: Crossbar-Based Interconnection Scheme on FPGA for Big Data

Meanwhile, over the last few years, heterogeneous computing modes based on Field Programmable Gate Arrays (FPGA) [2], Coarse Grained Reconfigurable Architecture (CGRA) [3] and Graphic Processing Units (GPU) have been regarded as efficient acceleration platforms for data-intensive computing domains, such as machine learning [4-5] and genome sequencing [6]. As an alternative to big data computation on conventional symmetric multiprocessors, heterogeneous computing can efficiently exploit the benefits of a wide range of computing resources. Among these approaches, custom and reconfigurable accelerators using FPGA can deliver much higher computational performance at lower power consumption, which makes them a credible option for changing and diverse big data applications. As a result, very high throughput and energy efficiency can potentially be achieved using reconfigurable architectures on FPGA devices, such as RAMP [7] and MOLEN [8]. This is particularly true for most dataflow processing and stream applications, where data-level parallelism is regarded as the major evaluation metric.

On the other end of the spectrum, reconfigurable parallel computing techniques enable fast prototyping of dataflow applications, ranging from a single case study to a series of similar applications. It is natural to regard each microprocessor, digital signal processor or Intellectual Property (IP) accelerator as a dedicated function unit; hence the multiprocessor system on chip (MPSoC) solution can effectively help researchers with high-level abstraction and platform-based system design.
As such, there exists tremendous potential for applying reconfigurable computing techniques to dataflow-related fields, including biomedical systems, social networks, machine learning, and deep belief networks.

MPSoC is one of the most promising architectures in the very large scale integration (VLSI) computing domain. Heterogeneous MPSoCs, which consist of both microprocessors and IP cores, are becoming dominant in diverse embedded applications. However, due to limited flexibility and unsatisfactory performance, on-chip data communication has been regarded as one of the major bottlenecks of MPSoC architecture [9]. Widespread big data applications also demand that the side effects of reconfigurable computing be handled carefully. It is becoming increasingly difficult to design a feasible system with such growing complexity and explosive data scale, especially on existing FPGA platforms, which remain unsatisfactory.

In traditional SoC design, on-chip interconnection primarily considers bus-based schemes for single-processor architectures [10]. It was predicted a long time ago that bus-based architectures would run out of performance as SoCs grow with more and more hardware resources. In addition, the flexibility of bus-based architectures is significantly constrained by fixed data path interconnections [11]. To address this problem, many novel interconnection approaches have been presented in the past decade [9], such as network on chip (NoC), crossbar-based and ring-based interconnections. These approaches have been widely used in designing scalable on-chip communication schemes for better structure and modularity. Among these state-of-the-art studies, crossbar-based interconnection can adapt to different applications at runtime; therefore it has been widely applied in heterogeneous research paradigms, in particular for small-scale embedded systems.
Crossbar-based schemes have proved quite successful in commercial products as efficient on-chip communication. However, few studies use a crossbar to quickly prototype the interconnection on FPGA-based MPSoCs. As the number of processors can be dynamically reconfigured on FPGA, the on-chip interconnections need to be adjusted at the same time. In particular, it is desirable to download different program contexts into the same hardware. This feature also enables the system to accept new hardware logic after fabrication. Furthermore, due to hardware reuse, reconfiguration can significantly decrease hardware cost, as well as shorten the time-to-market (TTM) of MPSoCs.

To tackle the above challenges, the motivation of this work is to propose an FPGA-based adaptive interconnection based on the crossbar for big data applications. The framework helps researchers construct a fast prototyping evaluation platform, which integrates hardware accelerators and peripherals with high bandwidth and acceptable hardware cost. We propose CRAIS, a crossbar-based interconnection scheme on heterogeneous MPSoC. We claim the following contributions.

1) We present the CRAIS interconnection with runtime reconfiguration. CRAIS is a hardware crossbar-based interconnection module that can be reconfigured automatically during task execution. Furthermore, it can buffer tasks that cannot run immediately.

2) CRAIS is an adaptable interconnection. Because big data applications are varied and change fast, the interconnection scheme should be adaptive. To achieve this goal, we design the CRAIS module so that it can be adapted to: data paths between two custom processors or IP cores, the reconfiguration of functional IP cores on FPGA, and the reconfiguration of data width for IP cores.

3) We propose an FPGA prototype with the processing work flow. A hardware prototype has been built on a state-of-the-art FPGA development board with dynamic partial reconfiguration technologies. In the comparative performance study, area cost and power consumption are evaluated for CRAIS. Based on the reconfigurable hardware, the work flow is described in terms of state transfers.

J. Comput. Sci. & Technol., Jan. 2015, Vol.30, No.1

The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 proposes the CRAIS hardware architecture and components. The work flow is described in Section 4. Section 5 introduces the prototype system and experimental results. Finally Section 6 concludes the paper.

2 Related Work

In the past decades, on-chip interconnection has been one of the major research areas in MPSoC architecture for embedded systems. In this section, we give a brief overview of the state-of-the-art related work.

First, the most popular interconnect architectures include hierarchical bus, crossbar, mesh, NoC, etc. Among these interconnections, bus-based schemes are quite efficient in traditional uniprocessor architectures. However, the major drawback of bus-based schemes is that they cannot be modified after fabrication, which brings significant challenges in flexible architecture design. Meanwhile, as the demand for flexibility grows, adaptive interconnections and customized architectures have become dominant in the interconnection research field. For example, both [12] and [13] utilize NoC to organize adaptive on-chip interconnect architectures.
FreeRider [14] presents a non-local adaptive NoC routing with packet-carried propagation of congestion information. Schleupen et al. [15] used FLEXIO and SERDES bus interfaces between the CELL processor and the FPGA. The EIB ring bus of the IBM CELL processor [16] allows 12 bus transactions to take place simultaneously. Meanwhile, mesh-based NoC is becoming popular in state-of-the-art approaches due to its high flexibility and scalability. For instance, Intel has developed an 80-core processor that uses a 2D mesh network and thereby achieves 16 bytes (per router) per cycle at a 5-GHz operating frequency [17]. Besides, ring-based NoC provides an alternative option for fast on-chip communication. For example, ring road NoC [18] has implemented multiple packet-switched rings with an onion-style floor plan. Other designs such as Torus Ring [19] and the Hybrid Ring/Mesh Based NoC topology [20] have been quite successful in the market. However, a major weakness of NoC-based interconnections is the inevitable hardware cost and high design complexity, which dramatically increases the burden on architectural designers and delays the TTM of chip products.

In order to evaluate interconnection concepts, fast prototype design and verification based on FPGA are becoming very popular [21]. For example, Kumar et al. [22] referred to an FPGA design flow to provide MPSoC generation from high-level descriptions. Similarly, an approach to application mapping on FPGA is proposed in [23]. Faruque et al. [24] proposed a runtime adaptive on-chip communication scheme, but it only modifies the buffer allocation, leaving the communication channel unchanged. Zheng et al. [25] introduced a hybrid communication reconfigurable network on chip for MPSoC, which uses a TDMA shared bus for inter-cluster communication. Gohringer and Becker [26] presented a high performance reconfigurable multi-processor.
The concept is quite feasible for real FPGA hardware, but the paper focuses on design flow and software tool chains. Recently a star network has been illustrated on FPGA [27], but the interconnection can only be reconfigured statically. Wang et al. [28] proposed a crossbar-based interconnection scheme, but did not present a detailed processing flow.

To sum up, crossbar-based interconnection schemes have been widely used in MPSoC architectures. Table 1 lists the typical state-of-the-art interconnection schemes. The weaknesses of current research motivate this paper.

1) The volume of data has renewed interest in crossbar-related research areas. Heterogeneous accelerators have been demonstrated to be an efficient solution for modern data-intensive applications. As a consequence, it is necessary to build a fast prototyping system using FPGA-based hardware platforms. To tackle this problem, the on-chip interconnection scheme is a primary issue to consider. For fast-changing and varied accelerators, the flexibility of crossbar-based interconnection suits the demand for scalability.

Table 1. Typical State-of-the-Art Interconnection Schemes

Type      Reference      Description                        Weakness
NoC       [12] and [13]  Utilize NoC to organize adaptive   Over area utilization
NoC       [15]           FLEXIO and SERDES bus              Specific to CELL processor
NoC       [14]           Non-local adaptive                 Homogeneous processor
Ring      [16]           EIB ring bus allows 12-bus         One-way connection
Ring      [19]           Torus ring                         Homogeneous processor
Ring      [18]           Multiple packet-switched rings     Homogeneous processor
Mesh      [17]           High throughput 2D mesh network    For large-scale processors
Mesh      [20]           Hybrid ring/mesh based NoC         Homogeneous processor
Mesh      [29]           36-core with snoopy coherence      Homogeneous processor
Crossbar  [28]           Crossbar-based interconnection     Lack of processing details
StarNet   [30]           StarNet for FPGA based             Not adaptive at runtime

2) Another consideration is the area utilization of the FPGA chip. This has been a big issue for crossbar-based interconnection schemes, as they consume much of the chip area. However, the growing density of FPGA chips now enables the integration of many more hardware accelerators. Therefore, a high-bandwidth crossbar can be applied to connect the heterogeneous accelerators.

3) In conventional crossbar architectures, most studies focus either on special-purpose processors, such as network processors [31], or on improving the I/O bandwidth [32]. However, adaptability, one of the most important features of FPGA, should also be considered. In this paper, we take into account the self-reconfiguration of the crossbar interconnection, which matches the varied and changing big data applications.

3 Architecture and Concepts

To give a high-level view of the CRAIS interconnection, we first propose a heterogeneous MPSoC platform, on which the processors and IP cores are connected through the CRAIS module.
Software tasks run on microprocessors, while hardware tasks are spawned from processors to the hardware IP blocks (HWIP) for hardware acceleration.

3.1 Infrastructure and Components

The system architecture overview is illustrated in Fig.1. In particular, the heterogeneous MPSoC platform consists of the following components.

Processor Array. Multiple symmetric microprocessors are integrated to run general software applications. For performance, each processor has a pair of one-way FIFO (First-In-First-Out) based peer-to-peer links connected to CRAIS, one for sending and one for receiving data.

IP Core Array. Heterogeneous HWIPs are integrated into the platform to run specific tasks with hardware acceleration. These IP cores are also connected to the CRAIS module with FIFO-based links.

CRAIS Interconnection. All the function units (FUs, including processors and HWIPs) are connected to the crossbar-based CRAIS module. The interconnection paths can be dynamically reconfigured according to various circumstances. There are three key modules implemented inside CRAIS. First, a configuration controller is responsible for operating the configuration process. Second, a queue module, shared by the processor array and the IP core array, buffers tasks temporarily when the current task cannot run immediately. Last but not least, a semaphore indicates whether the system is busy, providing guidance for path configuration.

On-Chip Interconnect and Peripherals. In order to support system debugging, selected peripherals are integrated, such as a DDR DRAM controller, Ethernet controller, System ACE controller, UART, timer and interrupt controller. All these peripheral modules are integrated with on-chip interconnects, such as CoreConnect PLB and OPB.

3.2 Communication Protocol and Arbitration

As introduced above, CRAIS is in charge of the configuration for data communication between FUs.
Each FU is connected to CRAIS with a pair of one-way FIFO-based links. To ensure execution performance and flexibility, Xilinx Fast Simplex Links (FSL) [33] are utilized as the basic channels for demonstration.

Fig.1. Heterogeneous MPSoC hardware platform. The CRAIS module is utilized to connect all the processors and IP cores. (Blocks shown: IP core array of hardware IP cores; processor array; CRAIS with task queue, configuration crossbar and semaphore; on-chip interconnect with System ACE, Ethernet, DDR DRAM, interrupt, timer and UART.)

FIFO-Based Connection

The processor needs to offload its tasks to the HWIP before execution. Compared with traditional bus interconnections, a FIFO structure is a better choice for large-scale burst data transfer. The Xilinx FSL bus offers a uniform communication interface between the MicroBlaze core and hardware blocks through the FIFO buffer. Fig.2 illustrates the FSL bus signals used to transfer messages. FSL provides a straightforward data path from the register files in a master-slave manner. When tasks are dispatched, the microprocessor acts as the master module while the HWIP is regarded as the slave, and vice versa when results are returned. Therefore, the read and write bus operations can be invoked directly through special instructions. In the Xilinx MicroBlaze instruction set architecture (ISA), the GET instructions (get, nget, cget and ncget) and the PUT instructions (put, nput, cput and ncput) are designed to move data between processor register files and FSL buffers. Similarly, each HWIP is also connected to CRAIS with a pair of FSLs. The data between CRAIS and the HWIP is driven by hard-wired signals.

Fig.2. FSL bus signals. FSL is composed of two-way links [33]. (Master signals: FSL_M_Clk, FSL_M_Data, FSL_M_Control, FSL_M_Write, FSL_M_Full. Slave signals: FSL_S_Clk, FSL_S_Data, FSL_S_Control, FSL_S_Read, FSL_S_Exists.)

Communication Protocol with FSL Extensions

In order to utilize FSL for runtime reconfiguration, the communication transaction is divided into two phases: a configuration phase and a data transfer phase.
Since default FSL bus transactions do not support two-stage data communication, we reuse some FSL signals for a custom communication protocol. In the configuration stage, the configuration request is sent through the FSL_M_Control signal, which indicates whether the current bus transaction is a configuration request or an actual data transfer. At the same time, the FSL_M_Data signal carries the identification of the target HWIP. After the CRAIS module receives a task request from a specific processor or IP core, it first checks the busy status of the target IP core and then responds through the FSL interface. Thereafter, the system is woken and enters the data transfer stage for general burst data communication.

Similarly, when the task execution on the IP block is finished, results are also returned via a two-step bus transaction procedure. In contrast to the task request stage, the data-returning transaction is raised by the HWIP, which plays the master role during the results-returning operation.

Arbitration and Queue

Due to FPGA area limitations, only a small number of FUs can be integrated into the system. An arbitration function module is therefore implemented in the configuration controller to handle the case in which more than one master module requests the same target at the same time. In this situation, a result-returning operation always gets higher priority than a new task-issuing request. Moreover, if multiple requests arrive at CRAIS at the same time (e.g., multiple IP cores return results to the same originating microprocessor), a round-robin arbitration scheme determines the priorities dynamically.

If the data transfer of the current task is not finished, competing tasks are buffered in the queue, each with its master ID, slave ID and additional configuration (e.g., data scale and burst mode flag). When the data transfer of the ongoing task is finished, the head request of the queue is released automatically. At this time, the configuration controller in CRAIS reconfigures the interconnection according to the specific task ID stored in the queue. After the data path of CRAIS has been reconfigured, the stored request is released from the queue, and the corresponding semaphore is updated at the same time.

4 Work Flow

Based on the MPSoC architecture and CRAIS components proposed in the previous section, this section describes the detailed work flow of CRAIS interconnect reconfiguration and the data transfer process.

4.1 Interconnection Reconfiguration

Fig.3 illustrates the processing flow chart for both new requests and stored requests.
Fig.3(a) illustrates the processing flow for a new request, while Fig.3(b) shows the processing flow for a stored request. Stored requests are released from the queue when the current request is finished. Consequently, the difference between the two flows is that the CRAIS module can guarantee the availability of the crossbar when a stored request is released. The procedure is given in Algorithm 1 in six steps.

Algorithm 1. Adaptive Interconnection Process
Step 1: FU sends a task request to CRAIS module
Step 2: CRAIS receives request
    Detect whether the target slave module is available
    If the target is free
        goto Step 3
    Else
        goto Step 5
Step 3: Set the path between the master and the slave
    Update the status of Semaphore
    Return an acknowledgement signal
Step 4: Data are transferred through CRAIS data path
    In the end of the transmission, goto Step 6
Step 5: Check whether Queue is full
    If Queue is full
        No more requests are acceptable, return busy
    Else
        Store the requests into Queue
Step 6: Check whether Queue is empty
    If Queue is empty
        Exit
    Else
        Release the head task of Queue
        Decode task information
        goto the Step 3

Each FU (including the microprocessor and HWIP) can act as the master module during data transfer. Moreover, every master should send the configuration parameters to the configuration controller prior to the path configuration. On receiving a request, CRAIS first detects whether the target slave module is available (step 2: lines 3-7). If the target is available, the path can be configured directly (step 4: line 11); otherwise there is a running task using the data path, which means the current task should be stored in the queue. However, before storing the task in the queue, CRAIS must check whether the queue is full (step 5: lines 13-17). The task can be stored only when the queue is not full.
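The six steps of Algorithm 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the RTL of the paper: the class name `Crais`, the method names, the `queue_depth` parameter and the string return values are hypothetical, and the rule that a queued head request keeps waiting if its own target is still in use is an assumption made here for safety.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Crais:
    """Toy model of the CRAIS controller (all names are illustrative)."""
    queue_depth: int = 4                          # assumed queue capacity
    busy: set = field(default_factory=set)        # semaphore: slave IDs in use
    paths: dict = field(default_factory=dict)     # crossbar paths: slave -> master
    queue: deque = field(default_factory=deque)   # buffered (master, slave) requests

    def request(self, master, slave):
        """Steps 1, 2, 3 and 5: handle a new task request from a function unit."""
        if slave not in self.busy:                # Step 2: target is free
            self._configure(master, slave)        # Step 3
            return "ack"
        if len(self.queue) >= self.queue_depth:   # Step 5: queue is full
            return "busy"
        self.queue.append((master, slave))        # Step 5: store the request
        return "queued"

    def _configure(self, master, slave):
        """Step 3: set the path and update the semaphore."""
        self.paths[slave] = master
        self.busy.add(slave)

    def finish(self, slave):
        """End of Step 4, then Step 6: free the path and serve the queue head."""
        self.busy.discard(slave)
        self.paths.pop(slave, None)
        if not self.queue:                        # Step 6: queue empty, exit
            return None
        master, target = self.queue[0]
        if target in self.busy:                   # assumption: head keeps waiting
            return None
        self.queue.popleft()                      # release the head task
        self._configure(master, target)           # goto Step 3
        return (master, target)
```

A request for a free target is acknowledged at once, a request for an occupied target is queued until `finish` is called for that target, and a request that finds the queue full is rejected with a busy response.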
When the data transfer is finished, the head task of the queue is released and CRAIS configures the data path accordingly (step 6: lines 18-24).

4.2 State Chart

Based on the hardware connection, each task passes through several states from the time it arrives until it is finished. Fig.4 describes the state transfer chart. In particular, CRAIS has three states: BUSY, IDLE and FREE. At startup, all the FUs are available and the queue is empty. In this situation, no tasks are running in the system, so the system is in the IDLE state. When a new request arrives, CRAIS enters the BUSY state. The semaphore is set, and then the crossbar is configured.
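The three states can be sketched as a small transition function, including the FREE-state behaviour drawn in Fig.4 (back to IDLE when the queue is empty, back to BUSY otherwise). This is an illustrative model only: the names `State` and `next_state` and the event strings `"new_request"` and `"finish"` are assumptions, not identifiers from the paper.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    BUSY = "BUSY"
    FREE = "FREE"


def next_state(state, event=None, queue_empty=True):
    """Return the next CRAIS state for a given event (hypothetical event names)."""
    if state is State.IDLE and event == "new_request":
        return State.BUSY      # semaphore is set, crossbar is configured
    if state is State.BUSY and event == "finish":
        return State.FREE      # current task has completed
    if state is State.FREE:
        # FREE is transient: check the queue and leave immediately
        return State.IDLE if queue_empty else State.BUSY
    return state               # any other combination leaves the state unchanged
```

For example, `next_state(State.FREE, queue_empty=False)` models releasing the head task of the queue and returning to the BUSY state.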

When the task is finished, CRAIS enters the FREE state. On entering the FREE state, the queue module is checked: if all the tasks in the queue are finished, CRAIS goes to the IDLE state; otherwise, it releases the head task of the queue and goes back to the BUSY state.

Fig.3. Process flow chart. (a) Processing flow chart for the new request. (b) Processing flow chart for the stored request.

Fig.4. Finite state machine. (States: IDLE, BUSY, FREE. Transition labels: New Request, Finish, Queue Empty, Queue Not Empty. Actions: Set Semaphore, Configure Matrix, Clear Queue.)

5 Performances and Experimental Results

This section introduces the prototyping system and experimental results. We have implemented CRAIS with Verilog HDL, and we use both a simulation platform and Xilinx FPGA prototyping boards in the experiments.

5.1 Simulation Platform

In order to evaluate the system performance, CRAIS is implemented at the register-transfer level (RTL) with Verilog HDL, and simulated with the Vivado and ModelSim tools. Fig.5 illustrates the simulation diagram of the CRAIS module in ModelSim. The task execution is divided into four phases: configuration, data transfer (task sent), execution, and data transfer (results received). The two transfer phases represent input data sent and results returned, respectively. The reconfiguration stage before the data transfer takes less than 10 clock cycles on FPGA, which is insignificant for most hardware executions (e.g., more than cycles for JPEG encoding IP cores).

Fig.5. Simulation results in ModelSim software tool. (Phases shown: configuration, data, execution, data.)

5.2 FPGA Prototype with Case Studies

To evaluate CRAIS on a real FPGA prototyping platform, we have implemented the hardware with three MicroBlaze processors and two HWIPs as case studies. In all cases, the design is synthesized on a Xilinx XUPV5 development board, equipped with a Xilinx Virtex-5 XC5VLX110T FPGA. The test cases are described below.

Inverse Discrete Cosine Transform (IDCT) implements a standard IDCT block as a single FSL slave module.

Advanced Encryption Standard (AES) is an important encryption algorithm. We have implemented both the encryption and the decryption modules.

Along with the hardware modules, we also employ three MicroBlaze processors to execute software computing tasks and control tasks. Microblaze is a PowerPC architecture compatible soft core with RISC ISA, and is optimized for Xilinx FPGA implementation. In the experiment, we use MicroBlaze version 7.20.a (with a clock frequency of 125 MHz, local memory of 8 KB, and no configurable instruction or data cache). Each MicroBlaze accesses its private BRAM block via the local memory bus (LMB). The whole platform is constructed and set up using Xilinx Vivado Design Suite. The block diagram is illustrated in Fig.6.

Furthermore, to facilitate the interconnection, all the above modules are connected to the CRAIS module via FSL-based channels. The interrupt controller, timer controller, and RS232 controller are connected to the three MicroBlaze processors via the PLB bus.

Fig.6. Prototyping system constructed on Xilinx V5 FPGA. Specification: EDK version: 11.3, architecture: VIRTEX 5, part: XC5VLX50T. (Blocks shown: IDCT IP core, AES IP core, CRAIS, three computing MicroBlaze processors each with bram_block and lmb_bram_if_cntlr on DLMB/ILMB, and the mb_plb processor local bus with slaves xps_intc, xps_timer and xps_uartlite.)

5.3 Performance Evaluation

To compare against the state of the art, we measured the transfer time of tasks against the StarNet interconnection [30], which utilizes Xilinx Fast Simplex Links (FSL) as the basic structure to connect the scheduler with heterogeneous processing elements, including processors and hardware IP cores. Blocking and non-blocking application interfaces are provided for high-level programming. Taking advantage of the reconfigurable attributes of FPGA, the architecture employs a star network for on-chip connection between the central MicroBlaze processor and the heterogeneous IP cores. Every processor/IP is connected to the scheduler through a pair of one-way peer-to-peer channels, and FSL is utilized for demonstration. The advantage of the star network is that when a hardware accelerator is reconfigured, only the corresponding links need to be adjusted. The effect of the network reconfiguration leaves the rest of the MPSoC system unchanged, which can greatly increase flexibility and help construct the prototype system. However, because all the data must be transferred via the central processor, the unavoidable latency is an important issue. Another weakness is that runtime adaptability is not enabled and the communication needs to be arbitrated when more accelerators are integrated.

The speedup is measured as the major performance metric, as illustrated in Fig.7.

Fig.7. CRAIS vs StarNet: the average speedup is up to more than 7.0X. (Axes: data amount (×40 bytes) vs average time (cycles). Series: StarNet, CRAIS without configuration time, CRAIS with configuration time.)

The x-axis in Fig.7 represents the data amount, and the y-axis stands for the average transfer time (ATT) for each word.
In Fig.7, we measure the transfer time of StarNet and of the CRAIS module. Note that the CRAIS module needs to configure the data paths prior to the data transfer between the processor and the accelerators. The configuration time is independent of the data to be transferred. Therefore, the transfer time is reported both with and without the configuration time. For both approaches, ATT decreases as the data amount increases, because the scheduling overhead becomes insignificant. When the data scale is relatively small (e.g., from 4 to 16 words), the speedup is approximately between 2X and 4X. When the data amount is increased to more than 128 words, the speedup stabilizes at about 7X on average, as the average transfer time of both CRAIS and StarNet goes flat.

The experimental results show that the CRAIS approach achieves better performance than the state-of-the-art StarNet-based interconnection scheme. Because the configuration time is independent of the data to be transferred, the gap between the ATT with and without configuration time shrinks as more data is transferred. In particular, the gap is approximately 10% for 4×40 bytes (25 cycles without configuration vs 27.5 cycles with configuration), and no more than 1% for bytes (7 cycles without configuration vs 7.06 cycles with configuration).

Besides the speedup against StarNet, the performance of CRAIS with different bandwidths is a significant indicator of scalability. Therefore, we evaluate the total transfer time with different data amounts. The experimental results are illustrated in Fig.8. The execution time grows from 100 cycles to 896 cycles when the data amount is increased from 4×40 bytes to bytes. When the data amount is less than bytes, CRAIS is not scalable enough to achieve a flat average transfer time. Meanwhile, when CRAIS is employed with more than bytes, we can conclude that CRAIS demonstrates good scalability: the average transfer time stays at approximately 7 cycles.

Fig.8. Scalability analysis of the CRAIS module, which reveals the total transfer time for data amount from 4×40 bytes to bytes. (x-axis: data amount (×40 bytes); y-axis: total time (cycles).)

5.4 Hardware Overhead

To evaluate hardware resource utilization, we synthesized the prototyping system in the Xilinx Vivado tool set with different bandwidths of the CRAIS scheme (8, 16, and 32, respectively). Fig.9 presents the hardware cost of the CRAIS module for different data widths. When the data width is 64, the module cannot be synthesized for the device because of the limited IO block resources. The XC5VLX110T FPGA device contains only 640 IO blocks. At a data width of 64 bits, each connected block has at least 64×2=128 bits, excluding the request, control and status signals. For demonstration, we implemented five FUs, which have 128×5=640 signals in total. The data signals alone therefore already occupy all 640 IO blocks, so the full set of signals exceeds the upper bound of the FPGA device. Therefore, in the following experiments, we set the CRAIS width to 16.

Fig.9. Hardware cost for CRAIS module of different data widths. (x-axis: data width (bit); y-axis: hardware used; series: slice registers, slice LUTs, IO blocks.)

Fig.10 compares the hardware cost of CRAIS with that of other modules, including the adder (AES) and IDCT cores. For computational HWIPs such as the adder and IDCT, the chip area utilization of the IP cores is mainly consumed by logic signals. Moreover, CRAIS has at least five IP block connection interfaces, which makes HWIP occupy even more hardware resources.

Fig.10. Hardware cost comparison of the CRAIS module and computational modules. (Modules: AES, IDCT, 16-bit CRAIS.)

We measure the hardware cost of the entire system, including three MicroBlaze processors, CRAIS, FSL, PLB bus, two HWIP memory blocks and peripherals.
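The IO-block bound described in Section 5.4 can be checked with a few lines of Python. This is a sketch under the figures stated in the text (a factor of two per functional unit at a given data width, five FUs, 640 IO blocks on the XC5VLX110T). The function names are hypothetical, reading the factor of two as one input and one output port is an assumption, and request, control and status signals are deliberately left out, so the result is only a lower bound on the signals needed.

```python
IO_BLOCKS_AVAILABLE = 640  # XC5VLX110T, as stated in the text
NUM_FUS = 5                # functional units implemented for demonstration


def data_signals(width_bits: int, num_fus: int = NUM_FUS) -> int:
    """Lower bound on IO signals: 2 * width per FU (assumed in + out port)."""
    return 2 * width_bits * num_fus


def leaves_room_for_control(width_bits: int) -> bool:
    """True if data signals alone use fewer than all available IO blocks."""
    return data_signals(width_bits) < IO_BLOCKS_AVAILABLE


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, data_signals(width), leaves_room_for_control(width))
```

At a width of 64 the data signals alone reach 640, leaving nothing for request, control and status lines, which is the arithmetic behind the synthesis failure described in the text. The sketch is not meant to reproduce the IOB utilization reported by the synthesis tool.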
The report is listed in Table 2, which shows that the entire system takes approximately 10% of the slice registers, 8% of the block RAM, and 9% of the slice LUTs. For the width of 16, 19% of the I/O ports are taken. The DSP resources are occupied by the IDCT IP core, which uses a DSP implementation for acceleration.

Table 2. Hardware Cost of the Entire System (columns: Resource, Status, Percentage; rows: Registers, LUTs, BRAM, IOBs, BUFGs, BUFIOs, DSP48Es)

We compare the system hardware cost of CRAIS with that of the StarNet interconnection presented in [30], as depicted in Fig.11. Except for the slice registers, CRAIS uses much less of every hardware resource than the state-of-the-art StarNet. In particular, the slice LUTs and the DSPs take only 35% and 21% of those of StarNet, respectively.

Fig.11. System hardware cost comparison between CRAIS and StarNet interconnection. (Resources: slice registers, slice LUTs, RAMB36_EXPs, external IOBs, DSP48Es; series: CRAIS, StarNet.)

5.5 Power Consumption

We use the Xilinx XPower Analyzer to analyze the power consumption of the CRAIS prototyping system. Fig.12 illustrates the power estimation of the CRAIS modules with 8-bit, 16-bit, and 32-bit widths, the adder, and the IDCT, respectively. The results show that the quiescent power of each module is similar, about 1.2 W. Meanwhile, as the data paths are configured at runtime, the CRAIS module also has dynamic power, and the bandwidth has a strong effect on the dynamic power, which ranges from 0.05 W to 0.15 W.

Fig.12. Power estimation of the CRAIS modules. (x-axis: modules, namely 8-bit, 16-bit, and 32-bit CRAIS, AES, and IDCT; y-axis: power (W).)

Table 3 lists the power consumption of the FPGA-based MPSoC prototype. From the table, we can conclude that a large proportion (23.41%) of the power is consumed by the I/O blocks. Table 4 shows that the quiescent power accounts for 93.45% of the total power of the MPSoC system, which also indicates that the runtime configuration is power saving compared with the static processor and HWIP modules.

Table 3. Power Consumption of Prototyping System (columns: Name, Power (W), Percentage; rows: IOs, Clocks, BRAMs, PLLs, Signals, Logic, Total)

Table 4. Quiescent and Dynamic Power Consumption (columns: Quiescent Power (W), Dynamic Power (W), Total Power (W), Junction Temperature (°C))

Moreover, we find that the power consumption of the CRAIS-based MPSoC system is small compared with that of microprocessors. For example, the MIPS R14000, which is fabricated in 0.13 µm technology, occupies 204 mm² and consumes 17 W at 500 MHz. In contrast, the entire CRAIS-based MPSoC system, which includes three MicroBlaze processors and two HWIPs, consumes only 36% of the power of that processor alone.

Conclusions

This paper presented a reconfigurable crossbar-based on-chip interconnection scheme, CRAIS, which can reconfigure the on-chip interconnection data paths on the FPGA at run time. Each microprocessor/HWIP is connected to CRAIS with a pair of peer-to-peer links. Empirical results on an FPGA prototyping system demonstrate that CRAIS supports runtime interconnect adaptation efficiently.
The speedup in average data transfer time over the state-of-the-art StarNet approach is more than 7.0X. Furthermore, CRAIS uses only 21% to 35% of the hardware resources of StarNet and consumes an insignificant amount of power. The initial results are promising, but there are still directions worth pursuing. First, as the area of a single FPGA chip is seriously constrained, we are seeking highly efficient interconnections between multiple FPGA chips. Second, we are also extending the CRAIS architecture to support dynamic reconfiguration of IP cores.

Acknowledgement

The authors deeply appreciate the anonymous reviewers for their insightful comments and suggestions.

References

[1] Howe D, Costanzo M, Fey P et al. Big data: The future of biocuration. Nature, 2008, 455:
[2] Singh S. Computing without processors. Communications of ACM, 2011, 54(8):
[3] Huang Y, Ienne P, Temam O et al. Elastic CGRAs. In Proc. ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2013, pp

[4] Chen T, Du Z, Sun N et al. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine learning. In Proc. the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2014, pp
[5] Chen Y, Luo T, Liu S et al. DaDianNao: A machine-learning supercomputer. In Proc. the 47th IEEE/ACM International Symposium on Microarchitecture, December
[6] Wang C, Li X, Chen P et al. Heterogeneous cloud framework for big data genome sequencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics, (preprint)
[7] Wawrzynek J, Patterson D, Oskin M et al. RAMP: Research accelerator for multiple processors. IEEE Micro, 2007, 27(2):
[8] Panainte E, Bertels K, Vassiliadis S. The Molen compiler for reconfigurable processors. ACM Transactions on Embedded Computing Systems, 2007, 6(1): Article No. 6.
[9] Benini L, De Micheli G. Networks on chips: A new SoC paradigm. IEEE Computer, 2002, 35(1):
[10] Wolf W, Jerraya A, Martin G. Multiprocessor system-on-chip (MPSoC) technology. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2008, 27(10):
[11] Dally W J, Towles B. Route packets, not wires: On-chip interconnection networks. In Proc. the 38th Annual Design Automation Conference, June 2001, pp
[12] Tuan V, Katsura N, Matsutani H et al. Evaluation of a multicore reconfigurable architecture with variable core sizes. In Proc. IEEE International Symposium on Parallel & Distributed Processing, May
[13] Tuan V M, Amano H. A mapping method for multi-process execution on dynamically reconfigurable processors. In Proc. the International Conference on Field-Programmable Technology, December 2007, pp
[14] Liu S, Chen T, Li L et al. FreeRider: Non-local adaptive network-on-chip routing with packet-carried propagation of congestion information. IEEE Transactions on Parallel and Distributed Systems, (to appear).
[15] Schleupen K, Lelaich S, Mannion R et al. Dynamic partial FPGA reconfiguration in a prototype microprocessor system. In Proc. the International Conference on Field Programmable Logic and Applications, August 2007, pp
[16] Kistler M, Perrone M, Petrini F. Cell multiprocessor communication network: Built for speed. IEEE Micro, 2006, 26(3):
[17] Hoskote Y, Vangal S, Singh A et al. A 5-GHz mesh interconnect for a teraflops processor. IEEE Micro, 2007, 27(5):
[18] Samuelsson H, Kumar S. Ring road NoC architecture. In Proc. Norchip Conference, November 2004, pp
[19] Kwark J W, Jhon C S. Torus ring: Improving performance of interconnection network by modifying hierarchical ring. Parallel Computing, 2007, 33(1):2-20.
[20] Bourduas S, Zilic Z. A hybrid ring/mesh interconnect for network-on-chip using hierarchical rings for global routing. In Proc. the 1st International Symposium on Networks-on-Chip, May 2007, pp
[21] Madsen J, Stidsen T, Kjaerulf P et al. Multi-objective design space exploration of embedded system platforms. In Proc. the IFIP TC 10 Working Conference on Distributed and Parallel Embedded Systems, October 2006, pp
[22] Kumar A, Hansson A, Huisken J et al. An FPGA design flow for reconfigurable network-based multi-processor systems on chip. In Proc. the Design, Automation & Test in Europe Conference & Exhibition, April
[23] Dittmann F, Gotz M, Rettberg A. Model and methodology for the synthesis of heterogeneous and partially reconfigurable systems. In Proc. IEEE International Parallel and Distributed Processing Symposium, March
[24] Faruque M, Ebi T, Henkel J. Runtime adaptive on-chip communication scheme. In Proc. IEEE/ACM International Conference on Computer-Aided Design, November 2007, pp
[25] Zheng L, Cai J, Du M et al. Hybrid communication reconfigurable network on chip for MPSoC. In Proc. the 24th IEEE International Conference on Advanced Information Networking and Applications, April 2010, pp
[26] Gohringer D, Becker J. High performance reconfigurable multi-processor-based computing on FPGAs. In Proc. IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, April
[27] Wang C, Zhang J, Zhou X et al. A flexible high speed star network based on peer to peer links on FPGA. In Proc. the 9th IEEE International Symposium on Parallel and Distributed Processing with Applications, May 2011, pp
[28] Wang C, Li X, Zhou X et al. CRAIS: A crossbar based adaptive interconnection scheme. In Proc. the 8th International Symposium on Reconfigurable Computing: Architectures, Tools and Applications, March 2012, pp
[29] Daya B, Chen C, Subramanian S et al. SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering. In Proc. the 41st ACM/IEEE International Symposium on Computer Architecture, June 2014, pp
[30] Wang C, Li X, Zhang J et al. A star network approach in heterogeneous multiprocessors system on chip. The Journal of Supercomputing, 2012, 62(3):
[31] Freitas H, Carvalho M, Amaral A et al. Reconfigurable crossbar switch architecture for network processors. In Proc. IEEE International Symposium on Circuits and Systems, May
[32] Young S, Alfke P, Fewer C et al. A high I/O reconfigurable crossbar switch. In Proc. the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, April 2003, pp
[33] Rosinger H P. Connecting customized IP to the MicroBlaze soft processor using the Fast Simplex Link (FSL) channel. XILINX® XAPP529, May notes/xapp529.pdf, Dec

Chao Wang received his B.S. and Ph.D. degrees from University of Science and Technology of China, Hefei, in 2006 and 2011 respectively, both in computer science. He is an associate professor with the School of Computer Science, University of Science and Technology of China, Hefei. His research interests focus on multicore and reconfigurable computing. He has authored more than 60 publications and patents. He is now an editorial board member of MICPRO, IET CDT, IJHPSA and IJBPIM. He serves as the publicity chair of HiPEAC 2015 and ISPA 2014, and as a guest editor for TCBB and IJPP. He is a member of CCF, ACM, and IEEE.

Xi Li received his Ph.D. degree in computer science from University of Science and Technology of China (USTC), Hefei. Now he is the deputy vice dean of the School of Software Engineering, USTC, Suzhou, and also an associate professor of the School of Computer Science, USTC, Hefei. He directs the research programs in the Embedded System Laboratory of the School of Computer Science, examining various aspects of embedded systems with a focus on performance, availability, flexibility and energy efficiency. Prof. Li is a senior member of CCF and a member of ACM and IEEE.

Xue-Hai Zhou received his B.S., M.S., and Ph.D. degrees, all in computer science, from the School of Computer Science, University of Science and Technology of China, Hefei. Now he is a professor in the School of Computer Science, University of Science and Technology of China. Prof. Zhou has published over 100 international journal and conference articles in the areas of software engineering, operating systems, and distributed computing systems. Prof. Zhou is a senior member of CCF and a member of ACM and IEEE.

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of

More information

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah (DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation

More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

- Nishad Nerurkar. - Aniket Mhatre

- Nishad Nerurkar. - Aniket Mhatre - Nishad Nerurkar - Aniket Mhatre Single Chip Cloud Computer is a project developed by Intel. It was developed by Intel Lab Bangalore, Intel Lab America and Intel Lab Germany. It is part of a larger project,

More information

LogiCORE IP AXI Performance Monitor v2.00.a

LogiCORE IP AXI Performance Monitor v2.00.a LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology................................................................. 9 Applications......................................................................

More information

A Dynamic Link Allocation Router

A Dynamic Link Allocation Router A Dynamic Link Allocation Router Wei Song and Doug Edwards School of Computer Science, the University of Manchester Oxford Road, Manchester M13 9PL, UK {songw, doug}@cs.man.ac.uk Abstract The connection

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

Design and Verification of Nine port Network Router

Design and Verification of Nine port Network Router Design and Verification of Nine port Network Router G. Sri Lakshmi 1, A Ganga Mani 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Pragathi Engineering College, Andhra

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip

Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip Introduction to Exploration and Optimization of Multiprocessor Embedded Architectures based on Networks On-Chip Cristina SILVANO silvano@elet.polimi.it Politecnico di Milano, Milano (Italy) Talk Outline

More information

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search

More information

What is a System on a Chip?

What is a System on a Chip? What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex

More information

Accelerate Cloud Computing with the Xilinx Zynq SoC

Accelerate Cloud Computing with the Xilinx Zynq SoC X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce

More information

FlexPath Network Processor

FlexPath Network Processor FlexPath Network Processor Rainer Ohlendorf Thomas Wild Andreas Herkersdorf Prof. Dr. Andreas Herkersdorf Arcisstraße 21 80290 München http://www.lis.ei.tum.de Agenda FlexPath Introduction Work Packages

More information

Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs

Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs Edson L. Horta 1 and John W. Lockwood 2 1 Department of Electronic Engineering, Laboratory of Integrated Systems, EPUSP

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches

3D On-chip Data Center Networks Using Circuit Switches and Packet Switches 3D On-chip Data Center Networks Using Circuit Switches and Packet Switches Takahide Ikeda Yuichi Ohsita, and Masayuki Murata Graduate School of Information Science and Technology, Osaka University Osaka,

More information

On-Chip Communications Network Report

On-Chip Communications Network Report On-Chip Communications Network Report ABSTRACT This report covers the results of an independent, blind worldwide survey covering on-chip communications networks (OCCN), defined as is the entire interconnect

More information

FPGA area allocation for parallel C applications

FPGA area allocation for parallel C applications 1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University

More information

Introduction to System-on-Chip

Introduction to System-on-Chip Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: Embedded Systems - , Raj Kamal, Publs.: McGraw-Hill Education Lesson 7: SYSTEM-ON ON-CHIP (SoC( SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY 1 VLSI chip Integration of high-level components Possess gate-level sophistication in circuits above that of the counter,

More information

Operating System Support for Multiprocessor Systems-on-Chip

Operating System Support for Multiprocessor Systems-on-Chip Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.

More information

Multiprocessor System-on-Chip

Multiprocessor System-on-Chip http://www.artistembedded.org/fp6/ ARTIST Workshop at DATE 06 W4: Design Issues in Distributed, CommunicationCentric Systems Modelling Networked Embedded Systems: From MPSoC to Sensor Networks Jan Madsen

More information

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION

CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION CONSTRAINT RANDOM VERIFICATION OF NETWORK ROUTER FOR SYSTEM ON CHIP APPLICATION T.S Ghouse Basha 1, P. Santhamma 2, S. Santhi 3 1 Associate Professor & Head, Department Electronic & Communication Engineering,

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : Sebastien.Le-Beux@polymtl.ca 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

NIOS II Based Embedded Web Server Development for Networking Applications

NIOS II Based Embedded Web Server Development for Networking Applications NIOS II Based Embedded Web Server Development for Networking Applications 1 Sheetal Bhoyar, 2 Dr. D. V. Padole 1 Research Scholar, G. H. Raisoni College of Engineering, Nagpur, India 2 Professor, G. H.

More information

Load Balancing & DFS Primitives for Efficient Multicore Applications

Load Balancing & DFS Primitives for Efficient Multicore Applications Load Balancing & DFS Primitives for Efficient Multicore Applications M. Grammatikakis, A. Papagrigoriou, P. Petrakis, G. Kornaros, I. Christophorakis TEI of Crete This work is implemented through the Operational

More information

Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC)

Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC) Real-time Processor Interconnection Network for FPGA-based Multiprocessor System-on-Chip (MPSoC) Stefan Aust, Harald Richter Department of Computer Science Clausthal University of Technology Julius-Albert-Str.

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet Professor Jiann-Liang Chen Friday, September 23, 2011 Wireless Networks and Evolutional Communications Laboratory

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

FPGA-based MapReduce Framework for Machine Learning

FPGA-based MapReduce Framework for Machine Learning FPGA-based MapReduce Framework for Machine Learning Bo WANG 1, Yi SHAN 1, Jing YAN 2, Yu WANG 1, Ningyi XU 2, Huangzhong YANG 1 1 Department of Electronic Engineering Tsinghua University, Beijing, China

More information

Pre-tested System-on-Chip Design. Accelerates PLD Development

Pre-tested System-on-Chip Design. Accelerates PLD Development Pre-tested System-on-Chip Design Accelerates PLD Development March 2010 Lattice Semiconductor 5555 Northeast Moore Ct. Hillsboro, Oregon 97124 USA Telephone: (503) 268-8000 www.latticesemi.com 1 Pre-tested

More information

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Shaomeng Li, Jim Tørresen, Oddvar Søråsen Department of Informatics University of Oslo N-0316 Oslo, Norway {shaomenl, jimtoer,

More information

Reconfigurable System-on-Chip Design

Reconfigurable System-on-Chip Design Reconfigurable System-on-Chip Design MITCHELL MYJAK Senior Research Engineer Pacific Northwest National Laboratory PNNL-SA-93202 31 January 2013 1 About Me Biography BSEE, University of Portland, 2002

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Topology adaptive network-on-chip design and implementation
