Tolerating SEU Faults in the Raw Architecture
Karandeep Singh #, Adnan Agbaria +*, Dong-In Kang #, and Matthew French #
# USC Information Sciences Institute, Arlington VA, USA {karan, dkang,
+ IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel adnan@il.ibm.com

Abstract
This paper describes software fault tolerance techniques to mitigate SEU faults in the Raw architecture, which is a single-chip parallel tiled computing architecture. The fault tolerance techniques we use are efficient Checkpointing and Rollback of processor state, Break-pointing, Selective Replication of code and Selective Duplication of tiles. Our fault tolerance techniques can be implemented entirely in software, without any changes to the architecture. They are transparent to the user and designed to meet the run-time performance and throughput requirements of the system. We illustrate these techniques by applying them to a matrix multiply kernel mapped onto Raw. The proposed techniques are also applicable to other tiled architectures (and to parallel systems in general).

1. Introduction
Multi-core tiled computing architectures are becoming increasingly popular because of their good performance, throughput, and power efficiency. Multiple cores also provide an inherent redundancy that enables better fault tolerance and recovery. Raw (developed at Massachusetts Institute of Technology) [Taylor02] is a general purpose tiled parallel processing architecture with a compiler-programmable interconnection network. The current Raw chip has 16 processor tiles, and the design can scale to a larger number of tiles on future chips. The interconnection network extends off-chip, allowing large fabrics of Raw chips to be built. To mitigate faults on Raw, we periodically checkpoint the processor state, which is stored in off-chip stable storage. The state to be checkpointed includes data caches, data in memory, registers, and states of on-chip networks.
Two-level asynchronous and synchronous checkpoints are taken to improve performance. Selective Replication of code and Selective Duplication of tiles are used for fault/failure detection. Breakpoints are inserted in the code after every replication/duplication to compare the corresponding states and detect faults. Our analysis shows that our techniques mitigate a high percentage of SEU (single event upset) faults in Raw, with no VLSI or architecture modifications.

2. Raw Architecture
The Raw architecture consists of 16 processing tiles connected in a mesh using two types of high performance pipelined networks: static and dynamic. The Raw chip is implemented in IBM's 180 nm 1.8V 6-layer CMOS 7SF SA-27E copper process. It has a peak throughput of 6.8 GFLOPS at 425 MHz. Each tile has two processors: one MIPS-type compute processor with 32 KB each of data and instruction cache, and one switch processor with 64 KB of instruction cache. The compute processor has an 8-stage in-order single-issue MIPS-style processing pipeline and a 4-stage single-precision pipelined FPU. Two of the interconnecting networks are static 2-D point-to-point mesh networks, which are optimized to route single-word quantities of data (without any headers); their routes are determined at compile time. There are also two dynamically routed networks in Raw. The general dynamic network is used for data transfer among tiles, while the memory dynamic network is used to access off-chip memory [Taylor02]. A block diagram of the Raw architecture is shown in Figure 1. Raw's exposed ISA allows parallel applications to exploit all of the chip resources, including gates, wires and pins, and it performs very well across a large class of stream and embedded computing applications [Taylor04].
Compared to a 180 nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are hand-coded and optimized [Taylor04].

* Work performed at USC/ISI
Figure 1: Raw architecture (source [Taylor04])

3. Fault Tolerance Techniques
In this work we assume that an SEU fault [NVSR02] can happen at any time. We would like to mitigate SEUs with minimum modification to the Raw architecture and, preferably, without any hardware modification. We therefore consider and adopt software fault tolerance techniques that are (1) transparent to the user and (2) efficient enough to meet performance requirements. Mainly, we use a combination of the following techniques for fault detection and tolerance: selective replication, selective duplication, checkpoint/restart, breakpoints, and TMR.

A breakpoint (BP) is used in Raw for error detection. The compiler (or user) inserts BPs in the code of the program to be able to detect errors. The location of the BPs in the program depends on the fault detection techniques applied. For example, as shown later, we insert BPs after code duplication and after any selective replication. Note that BPs serve only to detect faults.

In selective replication (SR) [GS96], the compiler and/or user selects some code to be replicated. As a result, during execution, the selected code runs simultaneously on two different tiles in Raw. Because of the replication, some synchronization may be required to ensure total ordering of message delivery [Birman97]. Although replication is a well-established technique for providing fault tolerance, we use SR for failure detection in Raw. Specifically, we trace the states of the two replicas. BPs are inserted after SR to detect any SEU that may happen in one of the replicas.

In selective duplication (SD), the compiler and/or user selects some instructions to be duplicated in the code. After any code duplication, a BP is inserted to detect any SEU in the instruction. In short, we use BPs to check for any possible SEU after both code duplication and replication. SR is more expensive than SD in terms of resources and of the cost of comparing the states of the replicas during a BP.
Therefore, we use SD wherever it can detect SEUs without requiring synchronization with other tiles. Although SD cannot detect an SEU in the networks, it may detect one in a register. We use a heuristic function to determine which code needs to be either replicated or duplicated. This function chooses between replication and duplication based on a cost/effectiveness tradeoff.

We use checkpoint/restart to provide fault tolerance in Raw. Checkpoint/Restart (C/R) is a way to provide persistence and fault tolerance in both uniprocessor and distributed systems [EAWJ02]. Checkpointing is the act of saving an application's state to stable storage during its execution, while restart is the act of restarting the application from a checkpointed state. In the Raw architecture, in order to recover a tile from a failure, we need to checkpoint the state of the tile, which consists of the data caches, the data in memory, the registers, and the state on the networks. So, during a checkpoint, we need to save all of these states for each tile. Figure 2 presents the information that needs to be checkpointed.
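The interplay of duplication, breakpoints, and checkpoint/restart described above can be illustrated with a minimal Python sketch. It is not the Raw implementation: the names `run_protected` and `make_step` are hypothetical, an in-memory copy stands in for the checkpoint in stable storage, and the SEU is simulated by flipping one bit of the result on a chosen call.

```python
import copy


def run_protected(state, step, max_rollbacks=3):
    """Run `step` twice from the same checkpoint and compare at a breakpoint.

    Returns (new_state, rollbacks). All names here are hypothetical.
    """
    checkpoint = copy.deepcopy(state)  # stand-in for a checkpoint in stable storage
    for rollbacks in range(max_rollbacks + 1):
        first = step(copy.deepcopy(checkpoint))   # original execution
        second = step(copy.deepcopy(checkpoint))  # duplicated execution
        if first == second:                       # breakpoint: compare the two states
            return first, rollbacks
        # Mismatch: discard both results and restart from the checkpoint.
    raise RuntimeError("states still disagree after repeated rollbacks")


def make_step(fault_on_call=None):
    """Multiply-accumulate step that flips one bit on a chosen call (simulated SEU)."""
    calls = {"n": 0}

    def step(s):
        calls["n"] += 1
        s["acc"] += s["x"] * s["y"]
        if calls["n"] == fault_on_call:
            s["acc"] ^= 1 << 3  # injected single-bit upset in the result
        return s

    return step


# The second execution is corrupted, so the breakpoint forces one rollback.
result, rollbacks = run_protected({"x": 6, "y": 7, "acc": 0}, make_step(fault_on_call=2))
```

In this sketch a mismatch at the breakpoint only signals that one of the two executions was upset, not which one, so both are discarded and the step is re-executed from the checkpoint.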
Figure 2: Checkpointing technique in Raw (labels: the network buffers, the registers, the log file; logs are saved during the execution)

In our approach, to achieve an efficient C/R mechanism, we apply a new application-based incremental checkpointing scheme. With this approach, during a checkpoint, each tile saves only the following: all the registers, the state of the networks, and a log file. This log file is created during the execution and is flushed at every checkpoint. The log file implements our incremental checkpointing technique. Instead of using page faults or copy-on-write as presented in [Plank97], we use a new technique that involves the compiler in identifying all the data structures modified between two consecutive checkpoints. The compiler inserts the log calls. By calling a special Log call, the modified data structures are logged in the log file. Specifically, for every data structure a, we log a after its last modification and before the next checkpoint. Using compiler-based analysis, we can identify all the data structures that need to be logged between every two consecutive checkpoint calls [Muchnick97]. Figure 3 presents an example of code in which the compiler inserts Log calls for logging the modified data structures.

    Chkpt()      // Checkpoint # i
    a = f(..)
    b = g(..)
    a = ..       // New modification for a
    a = ..       // Last modification for a
    Log (a)
    Chkpt()      // Checkpoint # i+1

Figure 3: An example of using Log and Chkpt calls in the code

Usually, the Raw architecture runs multi-task applications in which every tile runs a task. The tasks communicate with each other using the on-chip networks. The problem of checkpoint and restart is more complicated in distributed settings (such as Raw), where an application is composed of several processes, each possibly running on a different computer. In order to restart such an application, one has to choose a collection of checkpoints, one from each process, that together correspond to a consistent state of the distributed application.
A distributed application's state is not consistent if it represents a situation in which some message m has been received by some process, but the event of sending m is not in the checkpoint collection. A collection of checkpoints that corresponds to a consistent distributed state forms a recovery line. As presented in Figure 4, if a failure occurs when no collection of checkpoints taken during the execution forms a recovery line, then the application has to be restarted from the beginning, regardless of the number of checkpoints already taken. The domino effect, identified in [Randell75], is the reason a recovery line may not exist in an execution with checkpoints: a recovery line is not guaranteed unless special care is taken.

To ensure recovery lines, in this work we adopt the d-bc (d-bounded Cycle) distributed checkpointing protocol presented in [AAFV04]. This is a two-level checkpointing protocol. In Raw, we allow every tile to take its checkpoint independently. We denote this local checkpoint by CL1. However, to avoid the domino effect, we force all the tiles to coordinate their checkpoints periodically to create a consistent global checkpoint. We denote this global checkpoint by CL2. Since CL1 is local and does not require synchronization, CL1 is cheaper than CL2 in terms of size and overhead and occurs more frequently than CL2.
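The consistency condition for a recovery line stated above can be checked mechanically. The following Python sketch is only an illustration of that definition, not of the d-bc protocol itself; the `Message` record, the per-tile local timestamps, and both function names are assumptions introduced for the example. A checkpoint at local time 0 stands for the initial state of a tile.

```python
from itertools import product
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_time: int   # local time on the sending tile
    receiver: str
    recv_time: int   # local time on the receiving tile


def is_recovery_line(cut, messages):
    """`cut` maps each tile to the local time of its chosen checkpoint.

    The cut is inconsistent if some message is recorded as received
    while its send event lies after the sender's checkpoint.
    """
    for m in messages:
        received = m.recv_time <= cut[m.receiver]
        sent = m.send_time <= cut[m.sender]
        if received and not sent:
            return False
    return True


def latest_recovery_line(checkpoints, messages):
    """Brute-force search for the most recent consistent set of checkpoints."""
    tiles = sorted(checkpoints)
    best = None
    for times in product(*(checkpoints[t] for t in tiles)):
        cut = dict(zip(tiles, times))
        if is_recovery_line(cut, messages):
            if best is None or sum(times) > sum(best.values()):
                best = cut
    return best


# T1 sends at its local time 5; T2 receives at its local time 3.
msgs = [Message("T1", 5, "T2", 3)]
line = latest_recovery_line({"T1": [0, 4], "T2": [0, 6]}, msgs)
```

In this example the checkpoint of T2 at time 6 records the message as received, but no checkpoint of T1 records it as sent, so T2 has to roll back to its initial state. This is the kind of rollback propagation that coordinated CL2 checkpoints are meant to bound.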
Figure 4: Inconsistent checkpoint due to message exchanges (legend: tiles T1 and T2, message, checkpoint, inconsistent checkpoints, X = failure)

Figure 5 presents a running example of our fault tolerance techniques in Raw. This includes SR, SD, BPs, and C/R. In this example, tile T7 implements SR for T1. Similarly, T5 implements SR for T2. Note that after every SR we have BPs to check for any possible errors. Note also that the BPs and checkpoints are inserted in the code during the compilation of the application, so each tile is aware of the techniques that it applies during the execution.

Figure 5: An example that shows our fault tolerance techniques in Raw (legend: tiles T1 and T2 with replicas T7 and T5; SD, join, BP; breakpoint, CL1 checkpoint, CL2 checkpoint; global and consistent checkpoints)

4. Analysis and a Sample Application
We define Reliability as the percentage of time that an application can run without resetting the system on an SEU. We derive the reliability analytically, using the area information of the processor and the possible effects of an SEU fault on each area. Here, we present an implementation of fault-tolerant matrix multiplication on the Raw processor and derive the reliability of the implementation. The estimated area of each functional component of a tile in the Raw processor is shown in Figure 6. A tile consists of a tile processor and a switch processor. The tile processor is estimated to take 60.1% of a tile's area, and the switch processor is estimated to take 39.9% of a tile's area.

(a) Tile processor. Columns: Components, Area, Error, Result. Row labels in order: Dcache Dcache FPU ALU Fetch Unit GPR SPR Event Counters A* Total
(b) Switch processor. Columns: Components, Area, Error, Result. Row labels in order: Control Switch Processor SN Data SN DN Data DN MN Data MN B** Total

Figure 6: Estimated area information of a Raw tile and its fault susceptibility when fault tolerant Matrix Multiplication runs on it
* : 50% of the cache is assumed to be prone to SEUs at any given time. Of that, 33% of the area is assumed to be occupied by instructions and 66% by operands. 5% of the instruction area and 25% of the operand area are assumed to be critical.
** : About 50% of the cache is assumed to be filled, and 50% of that is assumed to be critical.

An implementation of FT Matrix Multiplication using our fault tolerance techniques is shown in Figure 7. The base algorithm multiplies input matrices A and B and produces result matrix C in a streaming fashion. Columns of matrix B are read by the tiles in the upper row (tiles 1 and 2 in Figure 7). Rows of matrix A are read by the tiles in the leftmost column (tiles 4, 8, 12 in Figure 7). Result matrix C is collected by the tiles in the rightmost column, except the tile in the upper row (tiles 7, 11, 15 in Figure 7). Computations are done by the remaining tiles (tiles 5, 6, 9, 10, 13, 14 in Figure 7).

Different FT techniques are applied to each part of the algorithm and mapping. Temporal Triple Modular Redundancy (TMR) is applied to the input and output parts, which makes single SEU correction possible for inputs. The overhead of temporal TMR at the input nodes is justified by the longer computation time of the computation nodes. Code duplication and local checkpoint and rollback techniques are used for the computation nodes. Since the inputs are protected by software TMR, an SEU on a computation node is confined within that node, and the local checkpoint and rollback technique can tolerate an SEU on the computation part.
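The temporal TMR applied at the input and output nodes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code that runs on Raw: `temporal_tmr` and `make_reader` are hypothetical names, a matrix row is modelled as a tuple, and the SEU is simulated by corrupting one of the three reads.

```python
from collections import Counter


def temporal_tmr(read, *args):
    """Perform the same read three times in sequence and majority-vote the results.

    Results must be hashable (for example, a matrix row stored as a tuple).
    """
    results = [read(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the three executions")
    return value


def make_reader(row, corrupt_on_call=None):
    """Hypothetical input routine that flips one bit of the row on a chosen call."""
    calls = {"n": 0}

    def read():
        calls["n"] += 1
        if calls["n"] == corrupt_on_call:
            return (row[0] ^ 0x10,) + tuple(row[1:])  # simulated SEU in the first element
        return tuple(row)

    return read


# The second of the three reads is corrupted; the vote still returns the correct row.
row = temporal_tmr(make_reader((1, 2, 3), corrupt_on_call=2))
```

Because the three executions are spread over time on a single tile rather than over three tiles, the cost is extra execution time at that node, which matches the overhead argument made above for the input nodes.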
System monitoring processes are replicated in tiles 0 and 3, which are not used by the base algorithm, for better protection.

Function            Node                   FT Technique
Column Input        1, 2                   Temporal TMR
Row Input           4, 8, 12               Temporal TMR
Computation         5, 6, 9, 10, 13, 14    Code Duplication
Result Output       7, 11, 15              Temporal TMR
System Monitoring   0, 3                   Code Replication

(a) Mapping (b) Used techniques
Figure 7: Mapping of FT Matrix Multiplication on Raw Processor and techniques used

The reliability of the FT Matrix Multiplication on Raw is estimated using the hardware area information shown in Figure 6. The effect of an SEU on each functional component is estimated conservatively. For example, an SEU on the instruction cache control logic is always (100%) assumed to lead to a system reset. The percentage of system resets when an SEU occurs on a functional component is presented in the rows titled Error in Figure 6 (a) and (b). The overall percentage of system reset due to an SEU on a functional component
is presented in the rows titled Result in Figure 6 (a) and (b). Based on these assumptions and estimates of the hardware, the reliability of FT Matrix Multiplication is 89.72%. The performance of the FT Matrix Multiplication is determined by the performance of the computation nodes. Since Code Duplication techniques are used for the computation algorithm, we expect the performance in terms of FLOPS to be slightly less than half that of the base algorithm.

5. Conclusion
We present software fault tolerance techniques to mitigate SEU faults in the Raw architecture. In this work, we describe these fault detection and mitigation techniques, which can be implemented in software and do not require any hardware changes to the architecture. We demonstrate these techniques with a sample application (matrix multiplication) and also present analytical evaluations of the performance and reliability of our techniques.

6. Acknowledgements
Effort sponsored through the Department of the Interior National Business Center under grant number NBCH. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

References
[AAFV04] A. Agbaria, H. Attiya, R. Friedman, and R. Vitenberg. Quantifying Rollback Propagation in Distributed Checkpointing. Journal of Parallel and Distributed Computing, 64(3), March.
[Birman97] K. P. Birman. Building Secure and Reliable Network Applications. Manning Publishing Company and Prentice Hall. December.
[EAWJ02] E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Computing Surveys, 34(3), September.
[GS96] R. Guerraoui and A. Schiper. Fault-Tolerance by Replication in Distributed Systems. Reliable Software Technologies - Ada-Europe'96.
[Muchnick97] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers.
[NVSR02] B. Nicolescu, R. Velazco, M. Sonza Reorda, M. Rebaudengo, and M. Violante. A Software Fault Tolerance Method for Safety-Critical Systems: Effectiveness and Drawbacks. In Proceedings of the 15th IEEE Symposium on Integrated Circuits and Systems Design, September 2002, Porto Alegre, Brazil.
[Plank97] J. S. Plank. An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance. Technical Report UT-CS, Department of Computer Science, University of Tennessee. July.
[Randell75] B. Randell. System Structure for Software Fault Tolerance. IEEE Transactions on Software Engineering, SE-1, June.
[Taylor02] Michael B. Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jae-Wook Lee, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro, Mar/Apr.
[Taylor04] Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. Proceedings of International Symposium on Computer Architecture, June.
More informationINTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 ISSN 0976 6464(Print)
More informationLoad Balancing in Distributed Data Base and Distributed Computing System
Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationControlling a Dot Matrix LED Display with a Microcontroller
Controlling a Dot Matrix LED Display with a Microcontroller By Matt Stabile and programming will be explained in general terms as well to allow for adaptation to any comparable microcontroller or LED matrix.
More informationLow-Overhead Hard Real-time Aware Interconnect Network Router
Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering
More informationSix Strategies for Building High Performance SOA Applications
Six Strategies for Building High Performance SOA Applications Uwe Breitenbücher, Oliver Kopp, Frank Leymann, Michael Reiter, Dieter Roller, and Tobias Unger University of Stuttgart, Institute of Architecture
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
More informationSnapshots in Hadoop Distributed File System
Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any
More informationLizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin
BUS ARCHITECTURES Lizy Kurian John Electrical and Computer Engineering Department, The University of Texas as Austin Keywords: Bus standards, PCI bus, ISA bus, Bus protocols, Serial Buses, USB, IEEE 1394
More informationA Practical Approach of Storage Strategy for Grid Computing Environment
A Practical Approach of Storage Strategy for Grid Computing Environment Kalim Qureshi Abstract -- An efficient and reliable fault tolerance protocol plays an important role in making the system more stable.
More informationSpacecraft Computer Systems. Colonel John E. Keesee
Spacecraft Computer Systems Colonel John E. Keesee Overview Spacecraft data processing requires microcomputers and interfaces that are functionally similar to desktop systems However, space systems require:
More informationUSING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES
USING REPLICATED DATA TO REDUCE BACKUP COST IN DISTRIBUTED DATABASES 1 ALIREZA POORDAVOODI, 2 MOHAMMADREZA KHAYYAMBASHI, 3 JAFAR HAMIN 1, 3 M.Sc. Student, Computer Department, University of Sheikhbahaee,
More informationChapter 6, The Operating System Machine Level
Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General
More informationA Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture
A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture Yangsuk Kee Department of Computer Engineering Seoul National University Seoul, 151-742, Korea Soonhoi
More informationCommunication Networks. MAP-TELE 2011/12 José Ruela
Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin
More informationCHAPTER 11: Flip Flops
CHAPTER 11: Flip Flops In this chapter, you will be building the part of the circuit that controls the command sequencing. The required circuit must operate the counter and the memory chip. When the teach
More informationAn Operating System for Multicore and Clouds
An Operating System for Multicore and Clouds Mechanisms and Implementataion David Wentzlaff, Charles Gruenwald III, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller, Anant Agarwal
More informationISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7
ISSCC 2003 / SESSION 4 / CLOCK RECOVERY AND BACKPLANE TRANSCEIVERS / PAPER 4.7 4.7 A 2.7 Gb/s CDMA-Interconnect Transceiver Chip Set with Multi-Level Signal Data Recovery for Re-configurable VLSI Systems
More informationGraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
More informationTools Page 1 of 13 ON PROGRAM TRANSLATION. A priori, we have two translation mechanisms available:
Tools Page 1 of 13 ON PROGRAM TRANSLATION A priori, we have two translation mechanisms available: Interpretation Compilation On interpretation: Statements are translated one at a time and executed immediately.
More informationOperating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest
Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could
More informationHardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy
Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of
More informationAvailability Digest. MySQL Clusters Go Active/Active. December 2006
the Availability Digest MySQL Clusters Go Active/Active December 2006 Introduction MySQL (www.mysql.com) is without a doubt the most popular open source database in use today. Developed by MySQL AB of
More informationReal-Time (Paradigms) (51)
Real-Time (Paradigms) (51) 5. Real-Time Communication Data flow (communication) in embedded systems : Sensor --> Controller Controller --> Actor Controller --> Display Controller Controller Major
More informationSOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs
SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND -Based SSDs Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim Department of Computer Science and Engineering, Seoul National University,
More informationDigitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation
More informationChapter 2 Heterogeneous Multicore Architecture
Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is
More informationDigital Signal Controller Based Automatic Transfer Switch
Digital Signal Controller Based Automatic Transfer Switch by Venkat Anant Senior Staff Applications Engineer Freescale Semiconductor, Inc. Abstract: An automatic transfer switch (ATS) enables backup generators,
More informationArchitectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
More informationFault Tolerant Matrix-Matrix Multiplication: Correcting Soft Errors On-Line.
Fault Tolerant Matrix-Matrix Multiplication: Correcting Soft Errors On-Line Panruo Wu, Chong Ding, Longxiang Chen, Teresa Davies, Christer Karlsson, and Zizhong Chen Colorado School of Mines November 13,
More informationA Deduplication File System & Course Review
A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror
More informationCentralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
More informationSwitch Fabric Implementation Using Shared Memory
Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today
More informationOn Demand Loading of Code in MMUless Embedded System
On Demand Loading of Code in MMUless Embedded System Sunil R Gandhi *. Chetan D Pachange, Jr.** Mandar R Vaidya***, Swapnilkumar S Khorate**** *Pune Institute of Computer Technology, Pune INDIA (Mob- 8600867094;
More informationTRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN
TRUE SINGLE PHASE CLOCKING BASED FLIP-FLOP DESIGN USING DIFFERENT FOUNDRIES Priyanka Sharma 1 and Rajesh Mehra 2 1 ME student, Department of E.C.E, NITTTR, Chandigarh, India 2 Associate Professor, Department
More informationA Framework for Highly Available Services Based on Group Communication
A Framework for Highly Available Services Based on Group Communication Alan Fekete fekete@cs.usyd.edu.au http://www.cs.usyd.edu.au/ fekete Department of Computer Science F09 University of Sydney 2006,
More informationMultithreading Lin Gao cs9244 report, 2006
Multithreading Lin Gao cs9244 report, 2006 2 Contents 1 Introduction 5 2 Multithreading Technology 7 2.1 Fine-grained multithreading (FGMT)............. 8 2.2 Coarse-grained multithreading (CGMT)............
More informationA New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications
1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture
More information1. Memory technology & Hierarchy
1. Memory technology & Hierarchy RAM types Advances in Computer Architecture Andy D. Pimentel Memory wall Memory wall = divergence between CPU and RAM speed We can increase bandwidth by introducing concurrency
More informationDynamic resource management for energy saving in the cloud computing environment
Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan
More informationHRG Assessment: Stratus everrun Enterprise
HRG Assessment: Stratus everrun Enterprise Today IT executive decision makers and their technology recommenders are faced with escalating demands for more effective technology based solutions while at
More informationA Generic Network Interface Architecture for a Networked Processor Array (NePA)
A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction
More informationA CDMA Based Scalable Hierarchical Architecture for Network- On-Chip
www.ijcsi.org 241 A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip Ahmed A. El Badry 1 and Mohamed A. Abd El Ghany 2 1 Communications Engineering Dept., German University in Cairo,
More informationHigh Performance Computer Architecture
High Performance Computer Architecture Volker Lindenstruth Lehrstuhl für Hochleistungsrechner Archittektur Ruth-Moufang Str. 1 email: ti@compeng.de URL: www.compeng.de Telefon: 798-44100 Volker Lindenstruth
More informationBigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
More information