Embedded Real-Time Systems (TI-IRTS) Safety and Reliability Patterns B.D. Chapter 9. 405-456



Similar documents
Safety Critical & High Availability Systems

Spacecraft Computer Systems. Colonel John E. Keesee

Fault Tolerance & Reliability CDA Chapter 3 RAID & Sample Commercial FT Systems

Domains. Seminar on High Availability and Timeliness in Linux. Zhao, Xiaodong March 2003 Department of Computer Science University of Helsinki

MultiPARTES. Virtualization on Heterogeneous Multicore Platforms. 2012/7/18 Slides by TU Wien, UPV, fentiss, UPM

A distributed data processing architecture for real time intelligent transport systems

Service & Support. Higher-level safe switch-off of the power supply of functionally nonsafe standard modules? Wiring Examples.

Value Paper Author: Edgar C. Ramirez. Diverse redundancy used in SIS technology to achieve higher safety integrity

Basic Components of a Spacecraft Computer. Hardware, Software, and Documentation

University of Paderborn Software Engineering Group II-25. Dr. Holger Giese. University of Paderborn Software Engineering Group. External facilities

Xanadu 130. Business Class Storage Solution. 8G FC Host Connectivity and 6G SAS Backplane. 2U 12-Bay 3.5 Form Factor

Outline. Failure Types

DEDICATED TO EMBEDDED SOLUTIONS

Sheet 7 (Chapter 10)

Intellicus Enterprise Reporting and BI Platform

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

Operating Systems 4 th Class

Silver Peak Virtual Appliances

Software Safety Basics

VMware vsphere 5 Quick Start Guide

High Availability Design Patterns

Do AUTOSAR and functional safety rule each other out?

Configuring PROFINET

Safety Requirements Specification Guideline

Support of Windows Server 2012 The NCP Secure Enterprise VPN Server supports the Windows Server 2012 (64 bit) operating system.

Basic Fundamentals Of Safety Instrumented Systems

Data Storage - II: Efficient Usage & Errors

Manage the RAID system from event log

Handling Hyper-V. In this series of articles, learn how to manage Hyper-V, from ensuring high availability to upgrading to Windows Server 2012 R2

Programmable Systems The H41q and H51q System Families

Open Architecture Design for GPS Applications Yves Théroux, BAE Systems Canada

UPSentry Smart User s Manual. Shutdown Management Software. for Mac OS X 10.2

Pervasive PSQL Meets Critical Business Requirements

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Electronic Power Control

HRG Assessment: Stratus everrun Enterprise

Data Link Layer Overview

The functionality and advantages of a high-availability file server system

ERROR DETECTION AND CORRECTION

Health monitoring & predictive analytics To lower the TCO in a datacenter

External Storage 200 Series. User s Manual

How To Run An Ultravm Cloud Server

PSM/SAK Event Log Error Codes

Propulsion Gas Path Health Management Task Overview. Donald L. Simon NASA Glenn Research Center

UC CubeSat Main MCU Software Requirements Specification

STM32 F-2 series High-performance Cortex-M3 MCUs

TABLE OF CONTENT

Simnet Registry Repair User Guide. Edition 1.3

Software Engineering. Computer Science Tripos 1B Michaelmas Richard Clayton

VIA RAID Installation Guide

Guide to SATA Hard Disks Installation and RAID Configuration

A Comparison of Client and Enterprise SSD Data Path Protection

ABB Technology Days Fall 2013 System 800xA Server and Client Virtualization. ABB Inc 3BSE en. October 29, 2013 Slide 1

Troubleshooting PCs What to do when you don't know what to do!

Linear Motion and Assembly Technologies Pneumatics Service. Industrial Ethernet: The key advantages of SERCOS III

Red Hat Enterprise linux 5 Continuous Availability

Microsoft SQL Server Guide. Best Practices and Backup Procedures

CPU ARM926EJ-S, 200MHz. Fast Ethernet 10/100 Mbps port. 6 digital input 2 digital open-drain alarm output

Busch-Watchdog Presence EIB Type:

Highly Available Tessitura

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Sample of Hardware Equipment Acceptance Form

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

SNMP Web card. User s Manual. Management Software for Uninterruptible Power Supply Systems

Chapter 10 Troubleshooting

Virtual Integrated Design Getting started with RS232 Hex Com Tool v6.0

Chapter 11 I/O Management and Disk Scheduling

Transmitter Interface Program

The Therac 25 A case study in safety failure. Therac 25 Background

BrightStor ARCserve Backup for Windows

EMC DATA DOMAIN DATA INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY

WinClon CC. Network-based System Deployment and Management Tool. A Windows Embedded Partner

Distribution One Server Requirements

IBM ^ xseries ServeRAID Technology

BECKHOFF. Application Notes. Data Storage on PC's and PLC's. For additional documentation, please visit.

C3306 LOCKOUT/TAGOUT FOR AUTHORIZED EMPLOYEES. Leader s Guide. 2005, CLMI Training

Chapter 12 Network Administration and Support

Creating A Highly Available Database Solution

Availability and Disaster Recovery: Basic Principles

Dr Markus Hagenbuchner CSCI319. Distributed Systems

Safety and Hazard Analysis

Allen-Bradley/Rockwell

Maintaining the Content Server

Solaris For The Modern Data Center. Taking Advantage of Solaris 11 Features

Dependable Systems Course. Introduction. Dr. Peter Tröger

PoNET kbd48cnc. User s manual

OSW TN002 - TMT GUIDELINES FOR SOFTWARE SAFETY TMT.SFT.TEC REL07

Embedded Systems Lecture 9: Reliability & Fault Tolerance. Björn Franke University of Edinburgh

List of courses MEngg (Computer Systems)

Systems I: Computer Organization and Architecture

Design Patterns for Safety-Critical Embedded Systems

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

NVIDIA RAID Installation Guide

IMPORTANCE OF COMPUTING HEALTH MANAGEMENT

CS 6290 I/O and Storage. Milos Prvulovic

Transcription:

Embedded Real-Time Systems (TI-IRTS) Safety and Reliability Patterns B.D. Chapter 9. 405-456 Version: 10-5-2010

Agenda Introduction to safety Patterns: 1. Protected Single Channel Pattern 2. Homogeneous Redundancy Pattern 3. Triple Modular Redundancy Pattern 4. Heterogeneous Redundancy Pattern 5. Monitor-Actuator Pattern 6. Sanity Check Pattern 7. Watchdog Pattern 8. Safety Executive Pattern Slide 2

Safety and Reliability Safety defined as freedom from accident and losses Reliability refers to the probability that a system will continue to function for a specified period of time Both requires some kind of redundancy to identify the dangerous condition or fault to take corrective action Slide 3

Safety versus Reliability Systems without safety impacts Systems with a fail-safe state Reliability Safety Systems without a fail-safe state Slide 4

Safety is a System Issue The system is either safe or it isn t not the software, not the electronics, not the mechanics It is the interactions of all these elements that determines system safety Slide 5

Single Point Failures Most experts consider a device safe: only when any single-point failure cannot lead to an incident Slide 6

Common Mode Failures A common mode failure: is a failure in multiple control paths due to a common or shared fault Example: a cardiac pacemaker with a watchdog circuit controlling a CPU both driven by the same crystal if the crystal fails e.g. doubling the frequency the heart could be paced by 210 beats per min. Slide 7

Example: Unsafe Linear Accelerator «processor» Linear Accelerator «actuator» Beam Emitter Set Intensity Start Beam End Beam The CPU can be a common mode failure «sensor» Beam Detector Beam Intensity Beam Duration Slide 8

Safe Linear Accelerator «processor» Linear Accelerator Set Intensity Self-test status watchdog service Emergency Shutdown Self-test status Start Beam End Beam «processor» Safety Processor Radiation Sensor «actuator» Beam Emitter Set Intensity Start Beam End Beam off «sensor» Beam Detector Beam Intensity Beam Duration «switch» Cut-off Switch on off on off «actuator» Mechanical Curtain Slide 9

Faults and Fail-Safe State Faults come in two flavors: Systematic faults (errors or design faults) Random faults (failures) Fail-safe state Many, but not all, safety-critical systems have a condition of existence known to be safe Slide 10

Fail-Safe State Different types of safe failure modes: Off state Emergency stop (cutting power) Production stop (finish current task) Protection stop (shuts down immediately) Partial shutdown (degraded functional level) Hold (no functionality) Manual, or external control (via external input) Restart (reboot or restart) Slide 11

Achieving Safety Mechanisms to identify data corruption: Parity simple 1-bit parity identifies single-bit errors Hamming codes multiple parity bits to identify n-bit errors and repair (n-1)-bit errors Checksums Cyclic Redundancy Checks (CRCs) Homogenous multiple storage Complement multiple storage store data in 1 s complement to protect against RAM faults Slide 12

A Safety Development Process Slide 13

1. Protected Single Channel Pattern The Protected Single Channel Pattern uses a single channel to handle sensing and actuation. Will not be able to continue to function in the presence of persistent faults, but it may be able to handle transient faults. Slide 14

Protected Single Channel Pattern Structure Slide 15

Protected Single Channel Pattern Example Slide 16

2. Homogeneous Redundancy Pattern The Homogeneous Redundancy Pattern uses replicated channels with a switchto-backup policy in the case of an error. The pattern improves reliability by addressing random faults (failures). Provides no protection against systematic faults. Slide 17

Homogeneous Redundancy Pattern Structure Backup channel Slide 18

Homogeneous Redundancy Pattern Example Slide 19

3. Triple Modular Redundancy Pattern The Triple Modular Redundancy Pattern operates three channels in parallel used to enhance reliability and safety in situations where there is no fail-safe state. Slide 20

Triple Modular Redundancy Pattern Structure Slide 21

Triple Modular Redundancy Pattern Example Slide 22

4. Heterogeneous Redundancy Pattern The Heterogeneous Redundancy Pattern uses multiple channels that have independent designs and/or implementations. Can also detect systematic faults. Slide 23

Heterogeneous Redundancy Pattern Structure Slide 24

Heterogeneous Redundancy Pattern Example Slide 25

5. Monitor-Actuator Pattern In the Monitor-Actuator Pattern an independent sensor maintains a watch on the actuation channel looking for an indication that the system should be commanded to its fail-safe state. Slide 26

Monitor-Actuator Pattern Structure Slide 27

Monitor-Actuator Pattern Example Slide 28

6. Sanity Check Pattern The Sanity Check Pattern is a variant of the Monitor-actuator pattern. The pattern exists to ensure that the actuation is approximately correct. Slide 29

Sanity Check Pattern Structure Slide 30

Sanity Check Pattern Example Slide 31

7. Watchdog Pattern The Watchdog Pattern checks that the internal computational processing is proceeding as expected. Can detect timebase faults and deadlocks. It may be combined with any other safety patterns in this chapter. Slide 32

Watchdog Pattern Structure Slide 33

Watchdog Pattern State Machine Slide 34

Watchdog Pattern Example Slide 35

8. Safety Executive Pattern The Safety Executive Pattern provides a safety executive to oversee the coordination of potentially multiple channels when safety measures must be actively applied. Slide 36

Safety Executive Pattern Structure Slide 37

Safety Executive Pattern Example Slide 38

Summary Safety is a system architecture/design issue 1. Protected Single Channel Pattern 2. Homogeneous Redundancy Pattern 3. Triple Modular Redundancy Pattern 4. Heterogeneous Redundancy Pattern 5. Monitor-Actuator Pattern 6. Sanity Check Pattern 7. Watchdog Pattern 8. Safety Executive Pattern Slide 39