Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems



Similar documents
LA-UR- Title: Author(s): Intended for: Approved for public release; distribution is unlimited.

How To Fix A 3 Bit Error In Data From A Data Point To A Bit Code (Data Point) With A Power Source (Data Source) And A Power Cell (Power Source)

Implementation Details

Use of Reprogrammable FPGA on EUCLID mission

Combinational Controllability Controllability Formulas (Cont.)

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

Chapter 13: Verification

CLASS: Combined Logic Architecture Soft Error Sensitivity Analysis

How To Write An Fpa Programmable Gate Array

7a. System-on-chip design and prototyping platforms

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

CHAPTER 5 FINITE STATE MACHINE FOR LOOKUP ENGINE

Fault Modeling. Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults. Transistor faults Summary

Modeling Sequential Elements with Verilog. Prof. Chien-Nan Liu TEL: ext: Sequential Circuit

Meeting the Demands of Robotic Space Applications with CompactPCI

Sequential Logic. (Materials taken from: Principles of Computer Hardware by Alan Clements )

Design Verification & Testing Design for Testability and Scan

Digital Design Verification

Dr Markus Hagenbuchner CSCI319. Distributed Systems

Network (Tree) Topology Inference Based on Prüfer Sequence

MultiPARTES. Virtualization on Heterogeneous Multicore Platforms. 2012/7/18 Slides by TU Wien, UPV, fentiss, UPM

White Paper FPGA Performance Benchmarking Methodology

Aims and Objectives. E 3.05 Digital System Design. Course Syllabus. Course Syllabus (1) Programmable Logic

SAN Conceptual and Design Basics

Reviving smart card analysis

Introduction to CMOS VLSI Design (E158) Lecture 8: Clocking of VLSI Systems

Example-driven Interconnect Synthesis for Heterogeneous Coarse-Grain Reconfigurable Logic

Rapid System Prototyping with FPGAs

Embedded Systems Lecture 9: Reliability & Fault Tolerance. Björn Franke University of Edinburgh

Digital Systems Design! Lecture 1 - Introduction!!

Lecture 7: Clocking of VLSI Systems

Contents. System Development Models and Methods. Design Abstraction and Views. Synthesis. Control/Data-Flow Models. System Synthesis Models

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Latches, the D Flip-Flop & Counter Design. ECE 152A Winter 2012

Selective SWIFT-R. A flexible software-based technique for soft error mitigation in low-cost embedded systems

Programming NAND devices

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Index Terms Domain name, Firewall, Packet, Phishing, URL.

Hardware Trojans Detection Methods Julien FRANCQ

System-on. on-chip Design Flow. Prof. Jouni Tomberg Tampere University of Technology Institute of Digital and Computer Systems.

Multi-layer Structure of Data Center Based on Steiner Triple System

Verification and Validation of Software Components and Component Based Software Systems

KEEP IT SYNPLE STUPID

Engineering Change Order (ECO) Support in Programmable Logic Design

ESP-CV Custom Design Formal Equivalence Checking Based on Symbolic Simulation

Upon completion of unit 1.1, students will be able to

Structural Cloud Audits that Protect Private Information

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

Testing Low Power Designs with Power-Aware Test Manage Manufacturing Test Power Issues with DFTMAX and TetraMAX

Digital Systems. Role of the Digital Engineer

Testing & Verification of Digital Circuits ECE/CS 5745/6745. Hardware Verification using Symbolic Computation

Boolean Network Models

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

High Availability Server Supports Dependable Infrastructures

HyperAccess Access Control System

Introduction to Digital System Design

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17

Theory of Logic Circuits. Laboratory manual. Exercise 3

Introduction to Programmable Logic Devices. John Coughlan RAL Technology Department Detector & Electronics Division

So far we have investigated combinational logic for which the output of the logic devices/circuits depends only on the present state of the inputs.

Having read this workbook you should be able to: recognise the arrangement of NAND gates used to form an S-R flip-flop.

9/14/ :38

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

1. Relational database accesses data in a sequential form. (Figures 7.1, 7.2)

High-Level Synthesis for FPGA Designs

Emulated Digital Control System Validation in Nuclear Power Plant Training Simulators

Chapter 2 Clocks and Resets

Certification Authorities Software Team (CAST) Position Paper CAST-13

Switch Fabric Implementation Using Shared Memory

Modeling Latches and Flip-flops

SOCWIRE: A SPACEWIRE INSPIRED FAULT TOLERANT NETWORK-ON-CHIP FOR RECONFIGURABLE SYSTEM-ON-CHIP DESIGNS

EXPERIMENT 8. Flip-Flops and Sequential Circuits

System Interconnect Architectures. Goals and Analysis. Network Properties and Routing. Terminology - 2. Terminology - 1

PROGETTO DI SISTEMI ELETTRONICI DIGITALI. Digital Systems Design. Digital Circuits Advanced Topics

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

SOFTWARE-IMPLEMENTED SAFETY LOGIC Angela E. Summers, Ph.D., P.E., President, SIS-TECH Solutions, LP

(Refer Slide Time: 2:03)

Engr354: Digital Logic Circuits

CNES fault tolerant architectures intended for electronic COTS components in space applications

PIONEER RESEARCH & DEVELOPMENT GROUP

SDLC Controller. Documentation. Design File Formats. Verification

81110A Pulse Pattern Generator Simulating Distorted Signals for Tolerance Testing

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Hunting Asynchronous CDC Violations in the Wild

Software Defined Radio Architecture for NASA s Space Communications

FSM Decomposition and Functional Verification of FSM Networks

Design Compiler Graphical Create a Better Starting Point for Faster Physical Implementation

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Chapter 4 Software Lifecycle and Performance Analysis

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

ECE380 Digital Logic

Memory Elements. Combinational logic cannot remember

Attaining EDF Task Scheduling with O(1) Time Complexity

BOSS/EVERCONTROL OS /Middleware Target ultra high Dependability. Abstract

SPACECRAFT AREA NETWORK (SCAN) FOR PLUG AND PLAY OF DEVICES

No Room for Error: Creating Highly Reliable, High-Availability FPGA Designs

路 論 Chapter 15 System-Level Physical Design

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Transcription:

Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems Melanie Berg 1, Kenneth LaBel 2 1.AS&D in support of NASA/GSFC Melanie.D.Berg@NASA.gov 2. NASA/GSFC Kenneth.A.LaBel@NASA.gov 1

Acknowledgements Some of this work has been sponsored by the NASA Electronic Parts and Packaging (NEPP) Program and the Defense Threat Reduction Agency (DTRA). Thanks is given to the NASA Goddard Radiation Effects and Analysis Group (REAG) for their technical assistance and support. REAG is led by Kenneth LaBel and Jonathan Pellish. Contact Information: Melanie Berg: NASA Goddard REAG FPGA Principal Investigator: Melanie.D.Berg@NASA.GOV 2

Acronyms Binary decision diagram (BDD) Block Triple Modular Redundancy (BTMR) Combinatorial logic (CL) Computer Aided Design (CAD) Device under test (DUT) Distributed triple modular redundancy (DTMR) Dual interlocked storage cell (DICE) Edge-triggered flip-flops (DFFs) Equivalence checking (EC) Error detection and correction (EDAC) Error rate (de/dt) Fault tolerance (FT) Field programmable gate array (FPGA) Formal verification (FV) Global triple modular redundancy (GTMR) Hardware description language (HDL) Input output (I/O) Linear energy transfer (LET) Local triple modular redundancy (LTMR) Mitigation Window (MW) Mean time to failure (MTTF) Operational frequency (fs) Power on reset (POR) Radiation Effects and Analysis Group (REAG) Single Error Correct Double Error Detect (SECDED) Single event functional interrupt (SEFI) Single event effects (SEEs) Single event latch-up (SEL) Single event transient (SET) Single event upset (SEU) Single event upset cross-section (σ SEU ) Static random access memory (SRAM) Static timing analysis (STA) Triple modular redundancy (TMR) 3

Abstract We propose a method for triple modular redundancy (TMR) insertion verification that satisfies the process for reliable and trusted systems. If a system is expected to be protected using TMR, improper insertion can jeopardize the reliability and security of the system. Due to the complexity of the verification process and the complexity of digital designs, there are currently no available techniques that can provide complete and reliable confirmation of TMR insertion. 4

Overview This presentation addresses the challenge of confirming that TMR has been inserted without corruption of functionality and with correct application of the expected TMR topology. The proposed verification method combines the usage of existing formal analysis tools with a novel search-detect-and-verify tool. This presentation does not address the strength/performance of the selected TMR. The susceptibility of a circuit post-tmr insertion should be evaluated using accelerated testing. Selection of the appropriate TMR scheme is based on the device type, requirements, and design- flush frequency. 5

HDL: Hardware description language STA: Static timing analysis EC: Equivalence checking Functional Specification FPGA User Design Flow HDL Behavioral Simulation Output of synthesis is a gate net list that represents the given HDL function. Synthesis Place and Route STA, EC, and Gate Level Simulation Performed by user with manufacturer FPGA specific design tools Create Configuration STA, and back annotated Gate Level Simulation User creates a design that is mapped into a manufacturer provided FPGA 6

Triple Modular Redundancy (TMR) 7

TMR Schemes Use Majority Voting MajorityVoter = I1 I 2 + I0 I 2 + I0 I1 I0 I1 I2 Majority Voter 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 1 1 1 0 1 1 1 1 1 Triplicate and Vote 8

Triplicate and Vote: But Implementation Schemes Are Not All the Same TMR Schemes are differentiated by how the triplication is performed and where the voters are placed S a ed ata put ca ot be oted out a d ca cause upset a t ee s D SET Q D CLR SET CLR Q Q Q V O T E R D SET Q CLR Singular Data Path: Localized Triple Modular redundancy (LTMR) Q Redundant Data Path: Distributed TMR (DTMR) or Global TMR (GTMR = XTMR) But it s not this easy!!!!!!!!!!!!!!!!!!!! 9

Various TMR Schemes Block diagram of block TMR (BTMR): a complex function containing combinatorial logic (CL) and flip-flops (DFFs) is triplicated as three black boxes; majority voters are placed at the outputs of the triplet. Block diagram of local TMR (LTMR): only flipflops (DFFs) are triplicated and datapaths stay singular; voters are brought into the design and placed in front of the DFFs. Block Diagram of distributed TMR (DTMR): the entire design is triplicated except for the global routes (e.g., clocks); voters are brought into the design and placed after the flip-flops (DFFs). DTMR masks and corrects most 10 single event upsets (SEUs).

HDL: Hardware description language Functional Specification FPGA User Design Flow HDL Output of synthesis is a gate net list that represents the given HDL function. Synthesis Place and Route Create Configuration TMR can be inserted during synthesis or post synthesis. If inserted post synthesis, the gate level netlist is replicated, ripped apart, and voters + feedback are inserted. 11

Concerns during the TMR Insertion Process Synthesis TMR insertion: Did the tool insert the correct type of TMR? Did the user correctly select the expected type of TMR? Does the tool implement the correct the topology of the expected TMR? Is the functionality implemented as expected? Post-synthesis TMR insertion: Is the functionality broken during the netlist replication and voter+feedback insertion process? Is the functionality implemented as expected? Only DTMR (or XTMR) is offered in available tools. Who is involved in the tool development process? How is the tool development environment controlled and verified? 12

Current Verification Process of TMR When using a tool the designer usually assumes the insertion to be reliable. Not good enough: I received a call from an organization that used synopsis to insert TMR. They stated that the results were compatible to the design that had no mitigation. This is because the version of the tool only had LTMR available. LTMR should never be used with SRAM-based FPGAs. This has been reported. Group didn t take into account nor verify the topology of the inserted TMR. Other suggestions to TMR verification: Simulation, Fault injection, and Formal Verification. None of these suggestions will suffice! 13

Verification of TMR Insertion Verification is two fold: Verification that the functionality was not broken during the TMR insertion process. Verification that the topology of the TMR scheme matches the expected methodology. Simulation and fault injection: Although any type verification helps, simulation and fault injection is ineffective as a total proof because of its limited exploration of state-space. Limited state-space traversal cannot ensure that all nodes have been mitigated as expected. Reliable verification is compromised. 14

Formal Verification (FV): Created to address the state-space traversal challenge. 15

Formal Verification: Binary Decision Diagram (BDD) Synchronous circuits are modeled using BDDs. Verification of logic is performed by searching BDDs: Find a given logic group, and search through to terminal nodes while evaluating logic function. Types of searches are mostly event based: A is always triggered if B occurs. A is never triggered if B occurs. A is triggered exactly n- cycles after B occurs. 16

Challenges of BDD Implementation BDD nodes do not represent circuit elements (e.g., DFFs or combinatorial logic blocks). BDDs are logic representations of the circuit under evaluation. Logic representation of a design requires exponentially more nodes than circuit graphs. In most cases, full designs cannot be represented in one BDD. In order to model a design using BDDs: Designs are intelligently partitioned, and Groups of redundant logic nodes are collapsed into one group. 17

Collapse of Redundant Logic 18

Challenges of Formal Verification and TMR Verification The intention of TMR implementation is to insert redundancy. However, FV BDD modeling removes redundant nodes. Hence, FV, as currently implemented, cannot prove that: redundancy exists for all nodes and that it is inserted correctly. 19

FV and Equivalence Checking (EC) Equivalence checking is a watered-down version of FV. It is used to prove that the synthesized logic output matches the designer generated HDL. Hence, EC can be used to prove that the insertion of TMR has not broken functionality. However, once again, it cannot be used to verify that the expected TMR has been inserted correctly. EC tools are readily available. 20

Proposed TMR Verification Methodology Functional Verification: Use EC. Topology Verification: Use a circuit graph (model) to verify TMR topology with cone-of-logic theory. A cone-of-logic is defined by selecting an element under analysis (vertex) and performing a backward trace towards its base components. The trace starts at the cone-of-logic vertex (in this case a voter or another DFF) and ends at the vertex s previous stage of flip-flops (DFFs) or a system input (cone-of-logic base elements). The goal is to verify whether the three copies (TMR domains) of logic that feed each voter under analysis are equivalent and the voters are inserted as expected 21

BTMR Topological Verification First Stage of Algorithm: Voter and Connected DFF Evaluation Search for the voter under analysis; i.e., first stage vertex. This voter is directly connected to a device output. It will be referred to as output-voter. Establish the direct cone-of-logic that feeds the output-voter. For proper synchronous design, it is expected that the first cone-of-logic stage consist solely of the triplicated DFFs and the output-voter. Verify that the cones-of-logic per TMR domain are equivalent copies of each other. This step is simply checking that three of the same copies of DFFs are directly connected to the voter output. 22

BTMR Topological Verification Second Stage of Algorithm: Internal Circuitry Evaluation For the following iterations, the vertex for each cone-of-logic is no longer a voter. Alternatively, the vertices are DFFs. The Start Points for each cone s backward trace remain DFFs or system inputs. Recursive iterations within the algorithm are based on cones-of-logic and are exhausted when all system inputs that are associated with the first stage cone-of-logic (i.e., the voter) are reached and verified. Perform a backwards trace to the previous stage of DFFs (cone-of-logic base components); and establish the cone-of-logic vertex (DFF) under analysis. Search for the two other TMR domain vertex-dff copies and establish their cones-of-logic. Verify that the three cones-of-logic are equivalent. 23

DTMR Topological Verification First Iteration of Algorithm: Identify all system output DFFvoter pairs. As a clarification, these components are physically connected to the DTMR design s output. Establish a cone-of-logic for each of the system output DFFvoter pairs. Search and identify the TMR replicas of the cones-of-logic. Compare cone-of-logic replicas across TMR domains. Verify that each cone-of-logic is equivalent to its copies. 24

DTMR Topological Verification Second Iteration of Algorithm: After performing the first iteration, each of the DFF-voter pairs that were labeled as coneof-logic base elements now become cone-of-logic vertices. The recursive procedure terminates when all paths have been traced back to system inputs. 25

Conclusion The decision to mitigate a design with TMR is associated with a system with significantly high expectations of reliability and trust. Improper TMR insertion can comprise a mission along with security. This presentation describes a novel TMR insertion verification process with two primary concerns: proof that TMR insertion leaves functionality intact; and proof that TMR insertion is topologically implemented as expected. Computer aided design (CAD) tools exist that can verify original (unmitigated) functionality is intact. It is important to take into account that simply proving original functionality is intact does not definitively prove that mitigation is inserted correctly. We suggest performing an additional stage of TMR insertion verification. This stage evaluates design topology. We refer to the proposed topological evaluation tool as searchdetect-and-verify. 26