CNES fault tolerant architectures intended for electronic COTS components in space applications



Similar documents
Modeling a GPS Receiver Using SystemC

Verification of Triple Modular Redundancy (TMR) Insertion for Reliable and Trusted Systems

SAN Conceptual and Design Basics

CS 6290 I/O and Storage. Milos Prvulovic

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

MultiPARTES. Virtualization on Heterogeneous Multicore Platforms. 2012/7/18 Slides by TU Wien, UPV, fentiss, UPM

ELEC 5260/6260/6266 Embedded Computing Systems

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

Software Engineering for Real- Time Systems.

Hardware RAID vs. Software RAID: Which Implementation is Best for my Application?

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

Contents. System Development Models and Methods. Design Abstraction and Views. Synthesis. Control/Data-Flow Models. System Synthesis Models

Codesign: The World Of Practice

NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Spacecraft Computer Systems. Colonel John E. Keesee

Attaining EDF Task Scheduling with O(1) Time Complexity

Development. Igor Sheviakov Manfred Zimmer Peter Göttlicher Qingqing Xia. AGIPD Meeting April, 2014

Eureka Technology. Understanding SD, SDIO and MMC Interface. by Eureka Technology Inc. May 26th, Copyright (C) All Rights Reserved

Use of Reprogrammable FPGA on EUCLID mission

RAID Technology Overview

Chapter 13 Selected Storage Systems and Interface

Networking Remote-Controlled Moving Image Monitoring System

Memory Testing. Memory testing.1

Boeing B Introduction Background. Michael J. Morgan

Architectures and Platforms

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05

Embedded Systems. introduction. Jan Madsen

MsC in Advanced Electronics Systems Engineering

Lecture 3 - Model-based Control Engineering

CMS Level 1 Track Trigger

ELECTROTECHNIQUE IEC INTERNATIONALE INTERNATIONAL ELECTROTECHNICAL

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

MICROPROCESSOR. Exclusive for IACE Students iacehyd.blogspot.in Ph: /422 Page 1

Non-Isolated Analog Voltage/Current Output module IC695ALG704 provides four configurable voltage or current output channels. Isolated +24 VDC Power

Fault Tolerant Servers: The Choice for Continuous Availability on Microsoft Windows Server Platform

4 non-safe digital I/O channels 2 IO-Link Master V1.1 slots. Figure 1. Figure 2. Type code. TBPN-L1-FDIO1-2IOL Ident no

CryptoFirewall Technology Introduction

Open Architecture Design for GPS Applications Yves Théroux, BAE Systems Canada

Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai Jens Onno Krah

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Radiation effects on space electronics. Jan Kenneth Bekkeng, University of Oslo - Department of Physics

Embedded Systems Lecture 9: Reliability & Fault Tolerance. Björn Franke University of Edinburgh

FPGAs in Next Generation Wireless Networks

Using RAID6 for Advanced Data Protection

Architectures for Distributed Real-time Systems

CCS Hardware Test and Commissioning Plan

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels

Flash Corruption: Software Bug or Supply Voltage Fault?

MSITel provides real time telemetry up to 4.8 kbps (2xIridium modem) for balloons/experiments

From Control Loops to Software

Serial Communications

COMP 7970 Storage Systems

A Gigabit Transceiver for Data Transmission in Future HEP Experiments and An overview of optoelectronics in HEP

WEA-Base. User manual for load cell transmitters. UK WEA-Base User manual for load cell transmitters Version 3.2 UK

Application Note 120 Communicating Through the 1-Wire Master

Lesson 7: SYSTEM-ON. SoC) AND USE OF VLSI CIRCUIT DESIGN TECHNOLOGY. Chapter-1L07: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

EEM870 Embedded System and Experiment Lecture 1: SoC Design Overview

RAID HARDWARE. On board SATA RAID controller. RAID drive caddy (hot swappable) SATA RAID controller card. Anne Watson 1

IMA for space Status and Considerations

Operating Systems. Lecture 03. February 11, 2013

A Model Based Approach for Safety Analysis Embedding Altarica in Alstom MBSE Process

Introduction to Programmable Logic Devices. John Coughlan RAL Technology Department Detector & Electronics Division

PROFINET IO Diagnostics 1

CFD Implementation with In-Socket FPGA Accelerators

Chapter 1 Lesson 3 Hardware Elements in the Embedded Systems Chapter-1L03: "Embedded Systems - ", Raj Kamal, Publs.: McGraw-Hill Education

How To Develop An Iterio Data Acquisition System For A Frustreo (Farc) (Iterio) (Fcfc) (For Aterio (Fpc) (Orterio).Org) (Ater

Certification of a Scade 6 compiler

Measurement and Analysis Introduction of ISO7816 (Smart Card)

Computer Systems Structure Input/Output

Hardware in the Loop (HIL) Testing VU 2.0, , WS 2008/09

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

Inside Track Research Note. In association with. Enterprise Storage Architectures. Is it only about scale up or scale out?

Assurance Cases and Test Design Analysis

VtRES Towards Hardware Embedded Virtualization Technology: Architectural Enhancements to an ARM SoC. ESRG Embedded Systems Research Group

Digital Signal Controller Based Automatic Transfer Switch

7a. System-on-chip design and prototyping platforms

Am186ER/Am188ER AMD Continues 16-bit Innovation

I/O. Input/Output. Types of devices. Interface. Computer hardware

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design.

VMware View 4 with PCoIP I N F O R M AT I O N G U I D E

Distributed Databases

Trap list of NEC ESMPRO Agent Ver.4.5 (Syslog Monitor)

USE OF SCILAB FOR SPACE MISSION ANALYSIS AND FLIGHT DYNAMICS ACTIVITIES

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Developments in Point of Load Regulation

Transcription:

4ème Atelier Thématique du RIS Nlles architectures & techno des processeurs et SdF Aix-en-Provence 20 novembre 2002 CNES fault tolerant architectures intended for electronic COTS components in space applications Michel PIGNOL CNES DTS/AE/SEA/IL 18 avenue Edouard Belin 31401 Toulouse Cedex 4 - FRANCE Tel.: +33 (0)5 61 27 43 61 michel.pignol@cnes.fr CNES 1

Outline Motivation with regard to electronic commercial components (COTS) embedded in satellites Needs for low cost fault tolerant architectures Presentation of two CNES low cost fault tolerant architectures Conclusion 2

Motivation wrt HW COTS (1/2) of choice in HiRel (RadHard - RadTol) ICs accelerated since W. Perry memorandum RadHard ICs perfo. << COTS perfo. Ambitious satellite missions will require higher and higher performances Needs for next satellite generation mass / power consumption / cost / planning A possible solution is: electronic commercial components (COTS) 3

Motivation wrt HW COTS (2/2) Possession cost Component level Mission dependant Engineering cost Purchase cost Rad-Hard/Tol HiRel?? Engin. cost Engin. cost Purch. cost COTS COTS = Amplification effect at the system level System level Important gains: Mission feasibility Mission performance Evolution capability Volume Mass Power consumption Thanks to COTS PERFORMANCES Main motivation = System level gains 4

Fault tolerant architecture needs Total integrated dose SEL COTS techno. OK ICs selection, local shielding COTS techno. OK ICs selection, local antilatching system SEE (upset + transient inside combinational logic) COTS are sensitive We have to live with! HW COTS requires fault tolerant architectures Need for low cost fault tolerant architectures 5

Two CNES fault tolerant archi. Avoid heavy triplication Fault tolerance cost (volume, mass, pw cons.) Scientific missions: HW duplication minimised DMT Application missions: HW duplication accepted DT2 90-95 % > 99,99 % SEE coverage 6

DMT key elements DMT - Duplex Multiplexé dans le Temps (duplex in time) DMT technology = time replication Concept Up-to-date COTS µp High processing perfo. Detection Two virtual channels run on a single physical channel (a single µp) Recovery Based on a safe context storage (safe wrt direct SEE and not accessible by a faulty µp) Granularity for detect./recov. Macro-granularity 7

Macro-granularity Simple example of a real time SW architecture: Real time interrupt Task A OBT RTC - Real Time Cycle One iteration of a given task (see next slide) Task B ACS Task C Therm Task D Scrubbing Task C operational cycle preemption Macrogranularity for detection and recovery easiness of the implementation t 8

Detection: DMT sequencing Without DMT Acq # N Iteration #N of a given task Processing + Commands generation # N Cds temporarily stored in table for waiting actuation during Cds phase Commands (Cds) generated to actuators With DMT 1 Acq # N 2 Acq # N Comp. acq # N 1 Processing # N 2 Processing # N Comp. results # N Cds gene. # N Acq In context In param. VC1 VC2 VC1 - Virtual Channel #1 VC2 Acq phase Processing phase Cds phase No in/out Cds Out context Out param. 9

Problem of recovery wrt duplex A duplex is able to detect comparison (cf. previous slide) A duplex is not able to recover no information is available for determining which is the healthy/faulty channel (unlike within a triplex architecture) Specific mechanisms are required for implementing a recovery within a duplex architecture 10

Two recovery modes Forward recovery µp = 2 x perfo. Real time interrupt Task #i VC1+VC2 Ctxt #N-1 #N #N+1 #N+2 Cds #N-1 SEE Cds #N Cds #N+1 Cds #N+2 Cds #N are missing (e.g. tasks with control-loop) Backward recovery µp < 4 x perfo. Task #i VC1+VC2 Ctxt #N-1 #N #N #N+1 #N+2 Cds #N-1 SEE VC1+VC2Cds Cds #N #N Cds #N+1 Mix: forward and backward recovery Cds #N+2 Iteration #N (i.e. VC1 + VC2) is played again (tasks having specific time relationship) 11

HW Address checking Mem & I/O access checking: Three critical operations... tasks context storage Cesam SEE Faulty µp µp.. SW comparison. access to Cds bus performed with a high level of safety, thanks to an HW address monitoring (FPGA / ASIC named CESAM, designed to be SEU free) 12

DMT features (1/2) Microprocessor impact The SEE tolerance requires 50 % of the computing performance... thus the µp must have twice the nominal computing performance Detection SEE wrt acq. Acquisitions replication & comparison SEE wrt processing Processing replication & comparison of the main processing results (task context, out parameters, Cds) A single µp with error free comparison even in presence of faults [thanks to CESAM] Safety Erroneous Cds can t be generated No Cds issued before a complete check of proces. results & HW monitoring [CESAM] Between Cds computation and Cds generation, Cds are stored inside internal tables 13

DMT features (2/2) Real time recovery A given iteration of a task where an error is detected: is cancelled (forward recovery) It is like if the iteration #N of this task is missing, the task having an hole inside its commands generation cycle (typically for tasks with control loop algorithms ) or is executed again (backward recovery) But, in this case, the µp must have up to 4 time the computing perfo. (typically for tasks having specific time relationship not compatible with one RTC delay) Mix: e.g. task#i with forward, task#j with backward Recovery modes are allowed thanks to a safe storage of context data [thanks to CESAM] I/O must be centralised at the beginning (for acquisition) and at the end (for actuation) of each task 14

Tolerance performance DMT performances Fail operational: > 95 % (measured) Fail op. if ETR protected: > 98 to 99 % (theory) Fail safe: > 99,99 % (theory, measure DMT requires a µp having about twice the nominal computing performance in course) The recurring cost is not impacted by DMT The SW development cost is impacted, but not a lot thanks to a DMT pre-compiler 15

DT2 key elements DT2 - Dual Duplex Tolerant to Transients DT2 technology = macro-synchronised master/checker archi. & minimal HW duplic. Concept The duplication is limited to the PUC (Processing Unit Core) µp + companion chip + µp s mem Detection Two minimun physical channels (PUC) Recovery Based on a safe context storage (safe wrt direct SEE and not accessible by a faulty µp) Granularity for detect./recov. Macro-granularity 16

DT2 architecture PUC without DT2 Processing Unit Core with DT2 Mem PUC#1 Mem Mem PUC#2 Edac Edac Edac µp µp Error µp Comp. chip Cesam Syclopes Cesam Other parts (Bckpl bus - I/O bus) Other parts of the computer (Backplane bus - I/O bus) 17

DT2 features A large part of DMT features are also true for DT2 except the following ones: Microprocessor impact No more impact on the µp computing performance but the SEE tolerance requires 50 % of the total computing perfo. Detection SEE wrt processing Processing duplication & comparison of the main processing results (task context, out parameters, Cds) Memory and I/O access checking Limited to the tasks context storage function CESAM (Mem. access check) / SYCLOPES (Synchro. & Comp. & I/O) FPGAs / ASICs designed to be SEU free 18

DT2 performances Tolerance performance Fail operational & safe: > 99,99 % (theory) The recurring cost is a little bit increased by DT2, but less than an heavy triplex The SW development cost is less impacted than with DMT The protection of a COTS RTOS needs specific mechanisms (CNES mechanisms validation through breadboard in course) 19

Conclusion Main motivation for HW COTS is system level gains Low cost fault tolerant architectures are required to be able to embed electronic commercial components inside satellites The CNES DMT and DT2 patented architectures minimise costly HW expansion; SW impact (mainly for DMT) is reduced thanks to a pre-compiler DMT and DT2 don t require costly developments DMT and DT2 are application generic 20