Tolerating SEU Faults in the Raw Architecture

Karandeep Singh #, Adnan Agbaria +*, Dong-In Kang #, and Matthew French #
# USC Information Sciences Institute, Arlington VA, USA {karan, dkang,
+ IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel adnan@il.ibm.com
* Work performed at USC/ISI

Abstract

This paper describes software fault tolerance techniques to mitigate SEU faults in the Raw architecture, a single-chip, parallel, tiled computing architecture. The techniques we use are efficient Checkpointing and Rollback of processor state, Break-pointing, Selective Replication of code, and Selective Duplication of tiles. They can be implemented entirely in software, without any changes to the architecture, are transparent to the user, and are designed to meet the run-time performance and throughput requirements of the system. We illustrate these techniques by applying them to a matrix multiply kernel mapped onto Raw. The proposed techniques are also applicable to other tiled architectures and to parallel systems in general.

1. Introduction

Multi-core tiled computing architectures are becoming increasingly popular because of their good performance, throughput, and power efficiency. Multiple cores also provide inherent redundancy that enables better fault tolerance and recovery. Raw (developed at the Massachusetts Institute of Technology) [Taylor02] is a general-purpose tiled parallel processing architecture with a compiler-programmable interconnection network. The current Raw chip has 16 processor tiles, and the design can scale to a larger number of tiles on future chips. The interconnection network extends off-chip, allowing large fabrics of Raw chips to be built.

To mitigate faults on Raw, we periodically checkpoint the processor state, which is stored in off-chip stable storage. The state to be checkpointed includes data caches, data in memory, registers, and the states of the on-chip networks. Two levels of checkpoints, asynchronous and synchronous, are taken to improve performance. Selective Replication of code and Selective Duplication of tiles are used for fault and failure detection. Breakpoints are inserted in the code after every replication or duplication to compare the corresponding states and detect faults. Our analysis shows that these techniques mitigate a high percentage of SEU (single event upset) faults in Raw, with no VLSI or architecture modifications.

2. Raw Architecture

The Raw architecture consists of 16 processing tiles connected in a mesh by two types of high-performance pipelined networks: static and dynamic. The Raw chip is implemented in IBM's 180 nm, 1.8 V, 6-layer CMOS 7SF SA-27E copper process. It has a peak throughput of 6.8 GFLOPS at 425 MHz. Each tile has two processors: one MIPS-type compute processor with 32 KB each of data and instruction cache, and one switch processor with 64 KB of instruction cache. The compute processor has an 8-stage, in-order, single-issue, MIPS-style processing pipeline and a 4-stage, single-precision, pipelined FPU.

Two of the interconnecting networks are static 2-D point-to-point mesh networks. They are optimized to route single-word quantities of data (without any headers), and their routes are determined at compile time. Raw also has two dynamically routed networks. The general dynamic network is used for data transfer among tiles, while the memory dynamic network is used to access off-chip memory [Taylor02]. A block diagram of the Raw architecture is shown in Figure 1.

Raw's exposed ISA allows parallel applications to exploit all of the chip resources, including gates, wires, and pins, and it performs very well across a large class of stream and embedded computing applications [Taylor04]. Compared to a 180 nm Pentium-III using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are hand coded and optimized [Taylor04].
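The 6.8 GFLOPS peak figure quoted in Section 2 is consistent with 16 tiles each completing one floating-point operation per cycle at 425 MHz. The short sketch below only checks that arithmetic; the one-operation-per-cycle rate is an assumption made here for illustration (it is not stated in the text), and the function name is hypothetical.

```python
def peak_gflops(tiles: int, clock_mhz: float,
                flops_per_cycle_per_tile: float = 1.0) -> float:
    """Peak throughput of a tiled chip in GFLOPS.

    flops_per_cycle_per_tile = 1.0 is an assumption: one pipelined
    single-precision FPU per tile issuing one operation per cycle.
    """
    return tiles * clock_mhz * 1e6 * flops_per_cycle_per_tile / 1e9


raw_peak = peak_gflops(tiles=16, clock_mhz=425)
print(f"{raw_peak:.1f} GFLOPS")  # prints "6.8 GFLOPS"
```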

Figure 1: Raw architecture (source: [Taylor04])

3. Fault Tolerance Techniques

In this work we assume that an SEU fault [NVSR02] can happen at any time. We would like to mitigate SEUs with minimal modification to the Raw architecture and, preferably, without any hardware modification. We therefore adopt software fault tolerance techniques that are (1) transparent to the user and (2) efficient enough to meet performance requirements. We use a combination of the following techniques for fault detection and tolerance: selective replication, selective duplication, checkpoint/restart, breakpoints, and TMR.

A breakpoint (BP) is used in Raw for error detection. The compiler (or user) inserts BPs in the program code so that errors can be detected. The location of the BPs depends on the fault detection techniques applied. For example, as shown later, we insert BPs after code duplication and after any selective replication. A BP is purely a detection aid: it is the point at which states are compared.

In selective replication (SR) [GS96], the compiler and/or user selects some code to be replicated. During execution, the selected code runs simultaneously on two different tiles in Raw. Replication may require some synchronization to ensure total ordering of message delivery [Birman97]. Although replication is a proven technique for providing fault tolerance, we use SR for failure detection in Raw. Specifically, we trace the states of the two replicas, and BPs are inserted after SR to detect any SEU that may have occurred in one of the replicas.

In selective duplication (SD), the compiler and/or user selects some instructions to be duplicated in the code. After any code duplication, a BP is inserted to detect any SEU in those instructions. In both cases, BPs are what check for a possible SEU after duplication or replication.

SR is more expensive than SD, both in the resources it occupies and in the cost of comparing the replica states at a BP. Therefore, we try to use SD wherever it can detect SEUs without the need to synchronize with other tiles. SD cannot detect an SEU in the networks, but it can detect one in a register. We use a heuristic function, based on a cost/effectiveness tradeoff, to decide which code should be replicated and which should be duplicated.

We use checkpoint/restart to provide fault tolerance in Raw. Checkpoint/Restart (C/R) is a way to provide persistence and fault tolerance in both uniprocessor and distributed systems [EAWJ02]. Checkpointing is the act of saving an application's state to stable storage during its execution, while restart is the act of restarting the application from a checkpointed state. In the Raw architecture, recovering a tile from a failure requires a checkpoint of the tile's state, which consists of the data caches, the data in memory, the registers, and the state on the networks. During a checkpoint, all of this state must be saved for each tile. Figure 2 shows what needs to be checkpointed.
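Returning to the detection side of this section, selective duplication followed by a breakpoint can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' compiler transformation: the names `run_duplicated`, `breakpoint_check`, `SEUDetected`, and the `fault` hook (which simulates a bit flip in the first result) are all hypothetical and exist only for this example.

```python
class SEUDetected(Exception):
    """Raised at a breakpoint when the duplicated results disagree."""


def breakpoint_check(primary, duplicate):
    """The BP: compare the two results and flag any mismatch."""
    if primary != duplicate:
        raise SEUDetected(f"mismatch at breakpoint: {primary!r} != {duplicate!r}")
    return primary


def run_duplicated(op, *args, fault=None):
    """Execute `op` twice on the same tile, then compare at a breakpoint.

    `fault` is a hypothetical hook used only to simulate an SEU that
    corrupts the first result (for example, a bit flip in a register).
    """
    first = op(*args)
    if fault is not None:
        first = fault(first)
    second = op(*args)  # the duplicated instruction sequence
    return breakpoint_check(first, second)


def flip_bit(value, bit=3):
    """Simulated SEU: invert one bit of an integer result."""
    return value ^ (1 << bit)


def multiply_accumulate(acc, a, b):
    return acc + a * b


# Fault-free run: both copies agree and the result passes the BP.
print(run_duplicated(multiply_accumulate, 10, 3, 4))  # prints 22

# Simulated SEU: the copies disagree and the BP reports it.
try:
    run_duplicated(multiply_accumulate, 10, 3, 4, fault=flip_bit)
except SEUDetected as err:
    print("detected:", err)
```

Because both copies run on the same tile, a sketch like this needs no inter-tile synchronization, which is the cost advantage of SD over SR noted above; by the same token it says nothing about faults on the networks.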

[Figure 2: Checkpointing technique in Raw. Labels: the network buffers, the registers, the log file (logs are saved during the execution).]

In our approach, to achieve an efficient C/R mechanism, we apply a new application-based incremental checkpointing scheme. During a checkpoint, each tile saves only the following: all the registers, the state of the networks, and a log file. The log file is created during execution and is flushed at every checkpoint; it is what implements our incremental checkpointing technique. Instead of using page faults or copy-on-write as presented in [Plank97], we use a new technique in which the compiler identifies all the data structures modified between two consecutive checkpoints and inserts the log calls. A special Log call records the modified data structure in the log file. Specifically, every data structure a is logged after its last modification and before the next checkpoint. Using compiler-based analysis, we can identify all the data structures that need to be logged between every two consecutive checkpoint calls [Muchnick97]. Figure 3 presents an example of code in which the compiler inserts Log calls for logging the modified data structures.

    Chkpt()      // Checkpoint # i
    a = f(..).
    b = g(..)
    a =          // New modification for a
    a =          // Last modification for a
    Log (a)
    Chkpt()      // Checkpoint # i+1

Figure 3: An example of using Log and Chkpt calls in the code

Usually, the Raw architecture runs multi-task applications in which every tile runs a task, and the tasks communicate with each other over the on-chip networks. Checkpoint and restart is more complicated in distributed settings (such as Raw), where an application is composed of several processes, each possibly running on a different computer. To restart such an application, one has to choose a collection of checkpoints, one from each process, that corresponds to a consistent state of the distributed application.
A distributed application's state is not consistent if it represents a situation in which some message m has been received by some process, but the event of sending m is not in the checkpoint collection. A collection of checkpoints that corresponds to a consistent distributed state forms a recovery line. As Figure 4 illustrates, if a failure occurs when no collection of checkpoints taken during the execution forms a recovery line, the application has to be restarted from the beginning, regardless of the number of checkpoints already taken. This domino effect was identified in [Randell75] as the reason a recovery line may not exist in an execution with checkpoints, which means that a recovery line is not guaranteed unless special care is taken.

To ensure recovery lines, we adopt the d-bc (d-bounded Cycle) distributed checkpointing protocol presented in [AAFV04]. This is a two-level checkpointing protocol. In Raw, we allow every tile to take its checkpoint independently; we denote this local checkpoint by CL1. To avoid the domino effect, however, we also force all the tiles to coordinate their checkpoints to create a consistent global checkpoint, which we denote by CL2. Since CL1 is local and does not require synchronization, it is cheaper than CL2 in terms of size and overhead and occurs more frequently than CL2.
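Two pieces of the checkpointing scheme described above can be sketched together: the Log/Chkpt incremental log of Figure 3, and the test for whether a set of local checkpoints forms a recovery line. This is an illustrative model only. The class `TileCheckpointer`, the function `is_recovery_line`, the use of a Python dictionary as stand-in stable storage, and the event-index representation of messages are all assumptions made here; registers and network state, which the real checkpoint also saves, are omitted, and the d-bc protocol itself is not modeled.

```python
import copy


class TileCheckpointer:
    """Incremental checkpointing for one tile's data structures."""

    def __init__(self, initial_memory):
        self.memory = dict(initial_memory)         # live application data
        self.stable = copy.deepcopy(self.memory)   # stand-in for stable storage
        self.log_buffer = {}                       # the per-interval log file

    def log(self, name):
        """Log(a): record a modified data structure after its last update."""
        self.log_buffer[name] = copy.deepcopy(self.memory[name])

    def chkpt(self):
        """Chkpt(): flush the log to stable storage and start a new interval."""
        self.stable.update(self.log_buffer)
        self.log_buffer = {}

    def rollback(self):
        """Restore the state saved by the most recent checkpoint."""
        self.memory = copy.deepcopy(self.stable)
        self.log_buffer = {}


def is_recovery_line(checkpoints, messages):
    """Return True if the chosen checkpoints form a consistent state.

    checkpoints: {process: index of the last local event it covers}
    messages:    iterable of (sender, send_event, receiver, recv_event)

    The state is inconsistent if some message is recorded as received
    while its send event is not covered by the sender's checkpoint.
    """
    for sender, send_event, receiver, recv_event in messages:
        received = recv_event <= checkpoints[receiver]
        sent = send_event <= checkpoints[sender]
        if received and not sent:
            return False
    return True


# Incremental log: only `a` is logged, then restored after a corruption.
tile = TileCheckpointer({"a": 0, "b": 0})
tile.memory["a"] = 1
tile.memory["a"] = 2      # last modification of a
tile.log("a")
tile.chkpt()
tile.memory["a"] = 99     # simulated corruption after the checkpoint
tile.rollback()
print(tile.memory["a"])   # prints 2

# Message sent by T1 at event 3 and received by T2 at event 4.
msgs = [("T1", 3, "T2", 4)]
print(is_recovery_line({"T1": 2, "T2": 5}, msgs))  # prints False
print(is_recovery_line({"T1": 3, "T2": 5}, msgs))  # prints True
```

In the first check the receive is inside T2's checkpoint but the send is after T1's, which is the inconsistent situation the text defines; moving T1's checkpoint past the send removes the inconsistency.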

[Figure 4: Inconsistent checkpoint due to message exchanges. Labels: tiles T1 and T2, message, checkpoint, inconsistent checkpoints, failure (X).]

Figure 5 presents a running example of our fault tolerance techniques in Raw, including SR, SD, BPs, and C/R. In this example, tile T7 implements SR for T1, and T5 implements SR for T2. After every SR there are BPs to check for possible errors. The BPs and checkpoints are inserted in the code when the application is compiled, so each tile is aware of the techniques applied to it during execution.

[Figure 5: An example that shows our fault tolerance techniques in Raw. Labels: global and consistent checkpoints; T1 (SD, join, BP); T7 (replica); T2; T5 (replica); legend: Breakpoint, CL1 Checkpoint, CL2 Checkpoint.]

4. Analysis and a Sample Application

We define Reliability as the percentage of time that an application can run without resetting the system on an SEU. We derive the reliability analytically, using area information for the processor and information about the possible effects of an SEU on each area. Here, we present an implementation of fault-tolerant matrix multiplication on the Raw processor and derive the reliability of the implementation.

The estimated areas of the functional components of a tile in the Raw processor are shown in Figure 6. A tile consists of a tile processor and a switch processor. The tile processor is estimated to take 60.1% of a tile's area, and the switch processor 39.9%.

[Figure 6(a): Tile processor. Columns: Components, Area, Error, Result. Rows: Dcache, Dcache, FPU, ALU, Fetch Unit, GPR, SPR, Event Counters, A*, Total.]

[Figure 6(b): Switch processor. Columns: Components, Area, Error, Result. Rows: Control, Switch Processor, SN Data, SN, DN Data, DN, MN Data, MN, B**, Total.]

Figure 6: Estimated area information of a Raw tile and its fault susceptibility when fault-tolerant Matrix Multiplication runs on it.
(*: 50% of the cache is prone to SEUs at any given time. Of that, 33% of the area is assumed to be occupied by instructions and 66% by operands. 5% of the instruction area and 25% of the operand area are assumed to be critical.)
(**: About 50% of the cache is assumed to be filled, and 50% of that is assumed to be critical.)

An implementation of FT Matrix Multiplication using our fault tolerance techniques is shown in Figure 7. The base algorithm multiplies input matrices A and B and produces the result matrix C in a streaming fashion. Columns of matrix B are read by the tiles in the upper row (tiles 1 and 2 in Figure 7). Rows of matrix A are read by the tiles in the leftmost column (tiles 4, 8, 12 in Figure 7). The result matrix C is collected by the tiles in the rightmost column, except the tile in the upper row (tiles 7, 11, 15 in Figure 7). Computations are done by the remaining tiles (tiles 5, 6, 9, 10, 13, 14 in Figure 7).

Different FT techniques are applied to each part of the algorithm and mapping. Temporal Triple Modular Redundancy (TMR) is applied to the input and output parts, which makes correction of a single SEU possible for the inputs. The overhead of temporal TMR at the input nodes is justified by the longer computation time of the computation nodes. Code duplication and local checkpoint and rollback are used for the computation nodes. Since the inputs are protected by software TMR, an SEU on a computation node is confined to that node, and local checkpoint and rollback can tolerate an SEU in the computation part.
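The temporal TMR applied to the input and output tiles can be sketched as three executions of the same operation on one tile followed by a majority vote. This is a minimal sketch under assumptions: the helper names `temporal_tmr` and `make_faulty_reader` are hypothetical, the reader that corrupts one of its calls merely simulates a single SEU, and results are compared as hashable tuples for convenience.

```python
from collections import Counter


def temporal_tmr(operation, *args):
    """Run `operation` three times in sequence and majority-vote the results.

    A single corrupted run is outvoted by the other two. Results must be
    hashable (for example, tuples) so that they can be counted.
    """
    results = [operation(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one run was corrupted")
    return value


def make_faulty_reader(row, corrupt_call):
    """Build a reader that returns `row`, corrupting one chosen call.

    The corruption (a flipped low bit in the first element) stands in
    for an SEU hitting the input tile during that read.
    """
    calls = {"n": 0}

    def read():
        calls["n"] += 1
        data = list(row)
        if calls["n"] == corrupt_call:
            data[0] ^= 1
        return tuple(data)

    return read


reader = make_faulty_reader((4, 8, 12), corrupt_call=2)
print(temporal_tmr(reader))  # prints (4, 8, 12)
```

The three sequential runs are what make the scheme "temporal": redundancy is bought with time on the same tile rather than with extra tiles, which matches the observation above that the input tiles have slack relative to the computation tiles.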
For better protection, system monitoring processes are replicated in tile 0 and tile 3, which are not used by the base algorithm.

(a) Mapping

(b) Used techniques

    Function            Node                   FT Technique
    Column Input        1, 2                   Temporal TMR
    Row Input           4, 8, 12               Temporal TMR
    Computation         5, 6, 9, 10, 13, 14    Code Duplication
    Result Output       7, 11, 15              Temporal TMR
    System Monitoring   0, 3                   Code Replication

Figure 7: Mapping of FT Matrix Multiplication on Raw Processor and techniques used

The reliability of FT Matrix Multiplication on Raw is estimated using the hardware area information shown in Figure 6. The effect of an SEU on each functional component is estimated conservatively. For example, an SEU on instruction cache control logic is always (100%) assumed to lead to a system reset. The percentage of system resets when an SEU occurs on a functional component is presented in the rows titled Error in Figure 6 (a) and (b). The overall percentage of system reset due to an SEU on a functional component

is presented in the rows titled Result in Figure 6 (a) and (b). Based on these assumptions and hardware estimates, the reliability of FT Matrix Multiplication is 89.72%.

The performance of FT Matrix Multiplication is determined by the performance of the computation nodes. Since Code Duplication is used for the computation algorithm, we expect the performance in terms of FLOPS to be slightly less than half that of the base algorithm.

5. Conclusion

We present software fault tolerance techniques to mitigate SEU faults in the Raw architecture. We describe these fault detection and mitigation techniques, which can be implemented in software and do not require any hardware changes to the architecture. We demonstrate the techniques on a sample application (matrix multiplication) and present analytical evaluations of their performance and reliability.

6. Acknowledgements

Effort sponsored through the Department of the Interior National Business Center under grant number NBCH. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

References

[AAFV04] A. Agbaria, H. Attiya, R. Friedman, and R. Vitenberg. Quantifying Rollback Propagation in Distributed Checkpointing. Journal of Parallel and Distributed Computing. 64(3): , March,

[Birman97] K. P. Birman. Building Secure and Reliable Network Applications. Manning Publishing Company and Prentice Hall. December,

[EAWJ02] E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Computing Surveys. 34(3): , September,

[GS96] R. Guerraoui and A. Schiper. Fault-Tolerance by Replication in Distributed Systems. Reliable Software Technologies - Ada-Europe'96. pp ,

[Muchnick97] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers

[NVSR02] B. Nicolescu, R. Velazco, M. Sonza Reorda, M. Rebaudengo, and M. Violante. A Software Fault Tolerance Method for Safety-Critical Systems: Effectiveness and Drawbacks. In Proceedings of the 15th IEEE Symposium on Integrated Circuits and Systems Design. pp , September, 2002, Porto Alegre, Brazil.

[Plank97] J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS Department of Computer Science, University of Tennessee. July,

[Randell75] B. Randell. System Structure for Software Fault Tolerance. IEEE Transactions on Software Engineering. SE-1: , June,

[Taylor02] Michael B. Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jae-Wook Lee, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro, Mar/Apr

[Taylor04] Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. Proceedings of International Symposium on Computer Architecture, June


More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

MULTI-SPERT PHILIPP F ARBER AND KRSTE ASANOVI C. International Computer Science Institute,

MULTI-SPERT PHILIPP F ARBER AND KRSTE ASANOVI C. International Computer Science Institute, PARALLEL NEURAL NETWORK TRAINING ON MULTI-SPERT PHILIPP F ARBER AND KRSTE ASANOVI C International Computer Science Institute, Berkeley, CA 9474 Multi-Spert is a scalable parallel system built from multiple

More information

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications

Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications Efficient Interconnect Design with Novel Repeater Insertion for Low Power Applications TRIPTI SHARMA, K. G. SHARMA, B. P. SINGH, NEHA ARORA Electronics & Communication Department MITS Deemed University,

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Attaining EDF Task Scheduling with O(1) Time Complexity

Attaining EDF Task Scheduling with O(1) Time Complexity Attaining EDF Task Scheduling with O(1) Time Complexity Verber Domen University of Maribor, Faculty of Electrical Engineering and Computer Sciences, Maribor, Slovenia (e-mail: domen.verber@uni-mb.si) Abstract:

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1

CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level. -ORACLE TIMESTEN 11gR1 CASE STUDY: Oracle TimesTen In-Memory Database and Shared Disk HA Implementation at Instance level -ORACLE TIMESTEN 11gR1 CASE STUDY Oracle TimesTen In-Memory Database and Shared Disk HA Implementation

More information

Design and Implementation of the Heterogeneous Multikernel Operating System

Design and Implementation of the Heterogeneous Multikernel Operating System 223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,

More information

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)

More information

Load Balancing in Distributed Data Base and Distributed Computing System

Load Balancing in Distributed Data Base and Distributed Computing System Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Controlling a Dot Matrix LED Display with a Microcontroller

Controlling a Dot Matrix LED Display with a Microcontroller Controlling a Dot Matrix LED Display with a Microcontroller By Matt Stabile and programming will be explained in general terms as well to allow for adaptation to any comparable microcontroller or LED matrix.

More information

Low-Overhead Hard Real-time Aware Interconnect Network Router

Low-Overhead Hard Real-time Aware Interconnect Network Router Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering

More information

Six Strategies for Building High Performance SOA Applications

Six Strategies for Building High Performance SOA Applications Six Strategies for Building High Performance SOA Applications Uwe Breitenbücher, Oliver Kopp, Frank Leymann, Michael Reiter, Dieter Roller, and Tobias Unger University of Stuttgart, Institute of Architecture

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin

Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394

More information

A Practical Approach of Storage Strategy for Grid Computing Environment

A Practical Approach of Storage Strategy for Grid Computing Environment A Practical Approach of Storage Strategy for Grid Computing Environment Kalim Qureshi Abstract -- An efficient and reliable fault tolerance protocol plays an important role in making the system more stable.

More information

Spacecraft Computer Systems. Colonel John E. Keesee

Spacecraft Computer Systems. Colonel John E. Keesee Spacecraft Computer Systems Colonel John E. Keesee Overview Spacecraft data processing requires microcomputers and interfaces that are functionally similar to desktop systems However, space systems require:

More information

USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES

USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES 1 ALIREZA POORDAVOODI, 2 MOHAMMADREZA KHAYYAMBASHI, 3 JAFAR HAMIN 1, 3 M.Sc. Student, Computer Department, University of Sheikhbahaee,

More information

Chapter 6, The Operating System Machine Level

Chapter 6, The Operating System Machine Level Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General

More information

A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture

A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture Yangsuk Kee Department of Computer Engineering Seoul National University Seoul, 151-742, Korea Soonhoi

More information

Communication Networks. MAP-TELE 2011/12 José Ruela

Communication Networks. MAP-TELE 2011/12 José Ruela Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin

More information

CHAPTER 11: Flip Flops

CHAPTER 11: Flip Flops CHAPTER 11: Flip Flops In this chapter, you will be building the part of the circuit that controls the command sequencing. The required circuit must operate the counter and the memory chip. When the teach

More information

An Operating System for Multicore and Clouds

An Operating System for Multicore and Clouds An Operating System for Multicore and Clouds Mechanisms and Implementataion David Wentzlaff, Charles Gruenwald III, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller, Anant Agarwal

More information

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7

ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:

Tools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available: Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.

More information

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could

More information

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of

More information

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Availability Digest. MySQL Clusters Go Active/Active. December 2006 the Availability Digest MySQL Clusters Go Active/Active December 2006 Introduction MySQL (www.mysql.com) is without a doubt the most popular open source database in use today. Developed by MySQL AB of

More information

Real-Time (Paradigms) (51)

Real-Time (Paradigms) (51) Real-Time (Paradigms) (51) 5. Real-Time Communication Data flow (communication) in embedded systems : Sensor --> Controller Controller --> Actor Controller --> Display Controller Controller Major

More information

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND -Based SSDs Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim Department of Computer Science and Engineering, Seoul National University,

More information

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah (DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

Digital Signal Controller Based Automatic Transfer Switch

Digital Signal Controller Based Automatic Transfer Switch Digital Signal Controller Based Automatic Transfer Switch by Venkat Anant Senior Staff Applications Engineer Freescale Semiconductor, Inc. Abstract: An automatic transfer switch (ATS) enables backup generators,

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Fault Tolerant Matrix-Matrix Multiplication: Correcting Soft Errors On-Line.

Fault Tolerant Matrix-Matrix Multiplication: Correcting Soft Errors On-Line. Fault Tolerant Matrix-Matrix Multiplication: Correcting Soft Errors On-Line Panruo Wu, Chong Ding, Longxiang Chen, Teresa Davies, Christer Karlsson, and Zizhong Chen Colorado School of Mines November 13,

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do

More information

Switch Fabric Implementation Using Shared Memory

Switch Fabric Implementation Using Shared Memory Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today

More information

On Demand Loading of Code in MMUless Embedded System

On Demand Loading of Code in MMUless Embedded System On Demand Loading of Code in MMUless Embedded System Sunil R Gandhi *. Chetan D Pachange, Jr.** Mandar R Vaidya***, Swapnilkumar S Khorate**** *Pune Institute of Computer Technology, Pune INDIA (Mob- 8600867094;

More information

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN

TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN USING DIFFERENT FOUNDRIES Priyanka Sharma 1 and Rajesh Mehra 2 1 ME student, Department of E.C.E, NITTTR, Chandigarh, India 2 Associate Professor, Department

More information

A Framework for Highly Available Services Based on Group Communication

A Framework for Highly Available Services Based on Group Communication A Framework for Highly Available Services Based on Group Communication Alan Fekete fekete@cs.usyd.edu.au http://www.cs.usyd.edu.au/ fekete Department of Computer Science F09 University of Sydney 2006,

More information

Multithreading Lin Gao cs9244 report, 2006

Multithreading Lin Gao cs9244 report, 2006 Multithreading Lin Gao cs9244 report, 2006 2 Contents 1 Introduction 5 2 Multithreading Technology 7 2.1 Fine-grained multithreading (FGMT)............. 8 2.2 Coarse-grained multithreading (CGMT)............

More information

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications 1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency

More information

Dynamic resource management for energy saving in the cloud computing environment

Dynamic resource management for energy saving in the cloud computing environment Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan

More information

HRG Assessment: Stratus everrun Enterprise

HRG Assessment: Stratus everrun Enterprise HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at

More information

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

A Generic Network Interface Architecture for a Networked Processor Array (NePA) A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction

More information

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip www.ijcsi.org 241 A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip Ahmed A. El Badry 1 and Mohamed A. Abd El Ghany 2 1 Communications Engineering Dept., German University in Cairo,

More information

High Performance Computer Architecture

High Performance Computer Architecture High Performance Computer Architecture Volker Lindenstruth Lehrstuhl für Hochleistungsrechner Archittektur Ruth-Moufang Str. 1 email: ti@compeng.de URL: www.compeng.de Telefon: 798-44100 Volker Lindenstruth

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information