SFTA, SFMECA AND STPA APPLIED TO BRAZILIAN SPACE SOFTWARE




Carlos H N Lahoz
Instituto de Aeronautica e Espaco (IAE), Instituto Tecnologico da Aeronautica (ITA), BRAZIL
STAMP/STPA Workshop 2014, 25-27 March 2014, MIT Campus

Agenda
- Context of this work
- Space Software - Case Study
- Combined approach SFTA+SFMECA
- STPA
- Considerations

Context of this work
- This work reports some results of a research project performed at IAE/Brazil, in which dependability techniques were applied to a space computer system
- SFTA and SFMECA were conducted on the system software specification (SSS) in a case study of a hypothetical spacecraft software
- STPA is being applied to one scenario in order to evaluate what additional information it can give about how the behavioral safety constraints can be violated

Space Software - Case Study

Functions in the Sequence of Flight Events (SFE) dataflow:
- FTC (flight time calculation): function responsible for calculating the time in the pre-flight and flight phases (Lower Time Limit, LTL, and Upper Time Limit, UTL)
- RED (reference-events detection): function responsible for detecting the reference-events expected during the pre-flight and all flight phases, using the vehicle acceleration obtained by the inertial sensor (IS) and the flight time. List of RED events: RED_PRE, RED_A, RED_A6, RED_B, RED_C, RED_D
- REP (related-events production): function responsible for generating the related-events linked to the reference-events that must be controlled by the OBC
- CAC (control actuators command): function responsible for generating the data used by the channel activation commands sent to the rocket actuators system (AS), such as the movable nozzles

[Figure: Sequence of Flight Events (SFE) dataflow in the onboard computer (OBC). Labels in the diagram: Inertial System data (IS), Ground System data (GS), Actuator System data (AS), Events Table (ET) (*) with lower time limit (LTL) and upper time limit (UTL); signals acceleration, IS_OK, NAV_ON, flight time, reference_event_ok, related_event_ok, actuator_on. (*) event_accel OR specific_time_limit]

SFTA+SFMECA combined approach

Converse combination approach:
- According to the system's function requirements and the failure definition, this technique selects one or more specific undesirable events as the top events to build the corresponding SFTA
- After the qualitative analysis, some important basic events are selected
- These events are analyzed and evaluated by the FMECA procedure
- According to the result of the SFMECA, further analysis and calculation of the fault tree can be carried out

[Figure: Converse combination approach]

SFTA+SFMECA combined approach

Four steps:
- Step 1 - Preparation for techniques application: evaluating the SFTA level (specification or code level) and tailoring the SFMECA table
- Step 2 - SFTA analysis: to look at the software faults related to resources (data) and tasks (functions) that could cause a hazard
- Step 3 - SFMECA analysis: using ELICERE guidewords to classify failure modes from SFTA
- Step 4 - Identify compensating provisions: in order to suggest new nonfunctional requirements

SFTA+SFMECA: step 1

Classes of Failures: ELICERE resource guidewords

Absent
- Other approaches (*): Omission (total), No
- Description: resource not provided; hardware failure; lack or loss of messages; lack of input values of a sensor; lack of input values or output; failure to receive the required data; loss of data due to hardware failure; sensor failure to send the data

Incorrect
- Other approaches (*): Commission, Omission (partial), More, Less, Reverse, Part of, Other than
- Description: bad data; any resource that does not correctly describe the use of the system or its operating environment; spurious or unexpected signals in the output of a device; error values for routine firing of triggers; incomplete data structure; lack of some data in a sequence; resource was greater or less than required; only part of the resource was offered; offered opposite resource; another resource was offered; information delivered with wrong value

Wrong Timing
- Other approaches (*): Early, Before, Late, After
- Description: device start out of time specified; device start out of order specified; obsolete data used to the control decision; spurious data; inadvertent or flawed that occur only with some entries; resource provided before the time required; resource provided after the required time; ABDC sequence occurs in a sequence of events that should be ABCD

Duplicated
- Other approaches (*): Commission (repetition), As well as
- Description: additional resource offered; saturated data; duplicate data; overflow; resource offered when not required; a data item from an expected communication is repeated when it should not be

(*) CHAZOP (Nimmo; Nunns and Eddershaw, 1987), SHAZOP (Burns and Pitlado, 1993), SFMEA (Lutz and Woodhouse, 1996), SHARD/LISA (Pumfrey, 2000)
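The guideword table above can be read as a lookup from the guidewords of the other approaches to the four ELICERE classes. The following is a minimal sketch of that reading in Python; the names `Elicere`, `OTHER_APPROACHES` and `to_elicere` are illustrative assumptions and do not come from the slides.

```python
from enum import Enum


class Elicere(Enum):
    """The four ELICERE resource guidewords listed in the table."""
    ABSENT = "Absent"
    INCORRECT = "Incorrect"
    WRONG_TIMING = "Wrong Timing"
    DUPLICATED = "Duplicated"


# Guidewords of the other approaches (CHAZOP, SHAZOP, SFMEA, SHARD/LISA),
# grouped under the ELICERE class they appear with in the table.
OTHER_APPROACHES = {
    Elicere.ABSENT: {"omission (total)", "no"},
    Elicere.INCORRECT: {"commission", "omission (partial)", "more", "less",
                        "reverse", "part of", "other than"},
    Elicere.WRONG_TIMING: {"early", "before", "late", "after"},
    Elicere.DUPLICATED: {"commission (repetition)", "as well as"},
}


def to_elicere(guideword: str) -> Elicere:
    """Map a guideword from another approach to its ELICERE class."""
    key = guideword.strip().lower()
    for elicere, others in OTHER_APPROACHES.items():
        if key in others:
            return elicere
    raise KeyError(f"no ELICERE class recorded for {guideword!r}")


if __name__ == "__main__":
    print(to_elicere("Late").value)
    print(to_elicere("As well as").value)
```

A lookup like this is only as complete as the table it is copied from; guidewords that the table does not list raise `KeyError` rather than being guessed.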

SFTA+SFMECA: step 2

SFTA: top-down (deductive) technique that focuses on how errors, or even normal functioning of the system, can lead to hazards.
- Top event = hazard (system software requirements not met)
- Basic events = set of possible causes (software requirements not met)

SFTA+SFMECA: step 2

SFTA application for Sequence of Flight Events:

(SFE) Failure in the Sequence of Flight Events
  - (REP) related-events not produced
  - (FTC) flight time calculation error
  - (RED) reference-events detection wrong
      - (RED_PRE) IS Comm NOT OK
      - (RED_A) Lift Off NOT detected
      - (RED_A6) 2E Ignition is NOT detected
      - (RED_B) 1E Separation is NOT detected
      - (RED_C) 2E Burnout is NOT detected
      - (RED_D) 3E Burnout is NOT detected
  - (CAC) failure to control actuators command
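The fault tree fragment above can be sketched as a small tree structure whose leaves are the basic events handed over to SFMECA in step 3. This is a hedged illustration only: the `Event` class is hypothetical, the slide does not state the gate types (the sketch assumes every gate is an OR gate), and REP, FTC and CAC appear as leaves only because the slide does not develop them further.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    """A fault tree node; a node without children is a basic event."""
    name: str
    description: str
    children: list[Event] = field(default_factory=list)

    def basic_events(self) -> list[Event]:
        """Collect the leaves below this node, left to right."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.basic_events())
        return found

    def occurs(self, failed: set[str]) -> bool:
        """Assumption: every gate is an OR gate (not stated in the slide)."""
        if not self.children:
            return self.name in failed
        return any(child.occurs(failed) for child in self.children)


SFE = Event("SFE", "Failure in the Sequence of Flight Events", [
    Event("REP", "related-events not produced"),
    Event("FTC", "flight time calculation error"),
    Event("RED", "reference-events detection wrong", [
        Event("RED_PRE", "IS Comm NOT OK"),
        Event("RED_A", "Lift Off NOT detected"),
        Event("RED_A6", "2E Ignition is NOT detected"),
        Event("RED_B", "1E Separation is NOT detected"),
        Event("RED_C", "2E Burnout is NOT detected"),
        Event("RED_D", "3E Burnout is NOT detected"),
    ]),
    Event("CAC", "failure to control actuators command"),
])

if __name__ == "__main__":
    for event in SFE.basic_events():
        print(event.name, "-", event.description)
```

Under the OR-gate assumption, any single failed leaf is enough to raise the top event, for example `SFE.occurs({"RED_B"})`.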

SFTA+SFMECA: step 3

SFMECA: bottom-up (inductive) method used to find potential system problems.

SFMECA is applied to the SFTA basic events, identifying:
- potential failure modes (guidewords)
- consequences, severity
- criticality
- possible compensating provisions

SFTA+SFMECA: step 3

SFMECA application for "Inertial System Communication Not OK":

- Failure Mode: RED_PRE, IS Comm NOT OK (Inertial System communication is not working)
- Failure Class: Incorrect Data
- Potential Cause:
    - Incorrect information that the rocket is ready to fly, OR
    - Incorrect information indicating that the IS is ready, OR
    - Incorrect information of the longitudinal acceleration of the vehicle for the detection of reference events, OR
    - Incorrect control flag to start the execution of each control algorithm, OR
    - Incorrect information of the time (flight time)
    - Wrong data time of the reference event RED_PRE and the instant of starting communication from IS to OBC, OR
    - Incorrect time instant of starting the vehicle flight (from FTC)
- Effect: Mission loss
- Severity: 5
- Criticality: B

SFTA+SFMECA: step 4

Compensating provision:
- Ensure that the event that starts the communication with the IS and OBC is correctly identified (consistency)

STPA Step 0: Establish the fundamentals
- Define what an "accident" is for the system and what an unacceptable loss is
- For the SFE: an accident is the software not being able to perform the sequence of flight events, causing loss of mission

STPA Step 0: Define the system hazards (H) and their safety constraints (SC)

- H1 = Failure on RED_PRE
  SC1 = ensure the correct communication with the IS to activate the pre-flight event
- H2 = Failure on RED_A
  SC2 = the software must receive the NAV_ON to initialize the flight time
- H3 = Failure on RED_A6
  SC3 = the ignition of the second rocket stage (2E) must be detected
- H4 = Failure on RED_B
  SC4 = the separation of the first rocket stage (1E) must be detected
- H5 = Failure on RED_C
  SC5 = the burnout of 2E must be detected
- H6 = Failure on RED_D
  SC6 = the burnout of 3E must be detected
- H7 = Failure on actuation command
  SC7 = verify if the channels are actuated
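The pairing of hazards and safety constraints on this slide can be kept as two small tables with a traceability check between them. The sketch below is an assumption about how such a check might look; `constraint_for` and `uncovered_hazards` are hypothetical helper names, and the pairing by index (H1 with SC1, and so on) follows the numbering on the slide.

```python
HAZARDS = {
    "H1": "Failure on RED_PRE",
    "H2": "Failure on RED_A",
    "H3": "Failure on RED_A6",
    "H4": "Failure on RED_B",
    "H5": "Failure on RED_C",
    "H6": "Failure on RED_D",
    "H7": "Failure on actuation command",
}

SAFETY_CONSTRAINTS = {
    "SC1": "ensure the correct communication with the IS to activate the pre-flight event",
    "SC2": "the software must receive the NAV_ON to initialize the flight time",
    "SC3": "the ignition of the second rocket stage (2E) must be detected",
    "SC4": "the separation of the first rocket stage (1E) must be detected",
    "SC5": "the burnout of 2E must be detected",
    "SC6": "the burnout of 3E must be detected",
    "SC7": "verify if the channels are actuated",
}


def constraint_id(hazard_id: str) -> str:
    """Pair hazards and constraints by index: H4 -> SC4."""
    return "SC" + hazard_id[1:]


def constraint_for(hazard_id: str) -> str:
    """Return the safety constraint text paired with a hazard."""
    return SAFETY_CONSTRAINTS[constraint_id(hazard_id)]


def uncovered_hazards() -> list[str]:
    """List hazards that have no paired safety constraint."""
    return [h for h in HAZARDS if constraint_id(h) not in SAFETY_CONSTRAINTS]


if __name__ == "__main__":
    for hazard_id, text in HAZARDS.items():
        print(f"{hazard_id} ({text}) -> {constraint_for(hazard_id)}")
    print("uncovered:", uncovered_hazards())
```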

STPA Step 0: Define a basic control structure

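STPA Step 1: Identify potentially inadequate (unsafe) control actions of the system that could lead to a hazardous state (unsafe control)

Like the ELICERE guidewords, STPA classifies four types of unsafe control action: not providing causes hazard; providing causes hazard; wrong timing or order causes hazard; stopped too soon or applied too long.

Send IS_OK
- Not providing causes hazard: IS_OK not sent
- Providing causes hazard: Not applicable
- Wrong timing or order causes hazard: Not applicable
- Stopped too soon or applied too long: Not applicable

Provide acceleration
- Not providing causes hazard: IS does not supply the data
- Providing causes hazard: IS supplies incorrect data
- Wrong timing or order causes hazard: IS supplies the data with a long delay
- Stopped too soon or applied too long: IS bus stops functioning

Transmit NAV_ON
- Not providing causes hazard: NAV_ON not supplied by GS
- Providing causes hazard: Not applicable
- Wrong timing or order causes hazard: NAV_ON supplied by GS after 1E burnout
- Stopped too soon or applied too long: Not applicable

Detect acceleration
- Not providing causes hazard: RED does not acquire the data
- Providing causes hazard: Not applicable
- Wrong timing or order causes hazard: RED acquires data outside the time window
- Stopped too soon or applied too long: RED stops acquiring data during the flight

Set actuation command
- Not providing causes hazard: CAC does not set the A/D channel
- Providing causes hazard: CAC provides the wrong actuation
- Wrong timing or order causes hazard: CAC provides the actuation at the wrong time
- Stopped too soon or applied too long: Not applicable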
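The unsafe control action table above lends itself to a simple tabular encoding, from which the applicable entries can be enumerated for Step 2. The sketch below is illustrative only: the names `UCA_TYPES`, `UCA_TABLE` and `unsafe_control_actions` are assumptions, `None` stands for "Not applicable", and the cell wording is lightly edited from the slide.

```python
# The four STPA unsafe-control-action types, in the column order of the table.
UCA_TYPES = (
    "not providing causes hazard",
    "providing causes hazard",
    "wrong timing or order causes hazard",
    "stopped too soon or applied too long",
)

# One row per control action; None marks a "Not applicable" cell.
UCA_TABLE = {
    "Send IS_OK": (
        "IS_OK not sent", None, None, None),
    "Provide acceleration": (
        "IS does not supply the data",
        "IS supplies incorrect data",
        "IS supplies the data with a long delay",
        "IS bus stops functioning"),
    "Transmit NAV_ON": (
        "NAV_ON not supplied by GS", None,
        "NAV_ON supplied by GS after 1E burnout", None),
    "Detect acceleration": (
        "RED does not acquire the data", None,
        "RED acquires data outside the time window",
        "RED stops acquiring data during the flight"),
    "Set actuation command": (
        "CAC does not set the A/D channel",
        "CAC provides the wrong actuation",
        "CAC provides the actuation at the wrong time", None),
}


def unsafe_control_actions(action=None):
    """Return (control action, UCA type, description) for applicable cells."""
    rows = []
    for name, cells in UCA_TABLE.items():
        if action is not None and name != action:
            continue
        for uca_type, description in zip(UCA_TYPES, cells):
            if description is not None:
                rows.append((name, uca_type, description))
    return rows


if __name__ == "__main__":
    for row in unsafe_control_actions("Set actuation command"):
        print(" | ".join(row))
```

Filtering by one control action, as in the usage line, gives the starting list for the causal analysis of that action.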

STPA Step 2: Identify causes of unsafe control actions

Hazardous control behavior identified in this case study:
- no feedback on the actuation command from the I/O channels, i.e. no information to the SFE on whether the first stage (1E) was physically separated after the activation of the respective digital (output) channels

STPA Step 2: SFE causes of unsafe control actions from "set actuation command"

Controller: SFE
- acceleration is not supplied
- acceleration is not acquired
- UTL is not achieved
- the channel is not actuated
- the channel is actuated at the wrong time
- no feedback of the actuation channel

Control process: verify if the channels are actuated (A/D channels)

STPA Step 2: Develop mitigations for "set actuation command"
- The onboard software should read the data from the 1E movable nozzle actuation channel (input), located in the 2E, to check whether the value is zero. A zero value in this channel means that the 1E was physically separated
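The mitigation above closes the feedback loop by reading back an input channel after the separation command. A minimal sketch of that check follows, under loud assumptions: `read_nozzle_channel` is a hypothetical callable standing in for whatever I/O routine the onboard software uses, and the function names are not taken from the slides. Only the rule "zero means 1E physically separated" comes from the source.

```python
from typing import Callable


def first_stage_separated(channel_value: int) -> bool:
    """A zero value on the 1E movable nozzle actuation channel (input)
    means that the first stage (1E) was physically separated."""
    return channel_value == 0


def confirm_separation(read_nozzle_channel: Callable[[], int]) -> bool:
    """Read the feedback channel once and report whether 1E separated.

    read_nozzle_channel is a hypothetical stand-in for the real input
    routine; it is assumed to return the raw channel value as an integer.
    """
    return first_stage_separated(read_nozzle_channel())


if __name__ == "__main__":
    # Simulated readings: a non-zero value before separation, zero after.
    print(confirm_separation(lambda: 37))
    print(confirm_separation(lambda: 0))
```

How many samples to read and how to react to a missing confirmation are design decisions that the slide leaves open, so the sketch does not attempt them.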

Considerations: case study
- The integrated use of SFTA (top events) and SFMECA (basic events) for software dependability analysis allowed identifying gaps in meeting requirements
    - SFTA produced 62 gates and 170 basic events
- Most of the SFE basic events that had been identified by SFTA were also identified in the STPA hazard analysis
- The STPA unsafe control action "no feedback of the actuation channel" is not clearly identified by SFTA+SFMECA
- Although STPA was not used extensively in the project, it provides a structured process for hazard analysis that apparently helps to reduce the analytical burden

Considerations about (S)FTA & (S)FMECA
- FMECA results are presented in a less intuitive way: tabular format (Hong, L. & Binbin, L. 2009)
- The effort to use FTA is twice that of STPA (Yahia, H. & Fawzy, E., STPA Workshop 2013)
- If FTA or FMEA focuses only on the physical architecture, without considering control system propagation paths and feedback mechanisms, some safety requirements may be missed (Sundaram, P. & Hartfelder, D., STPA Workshop 2013)

Considerations about STPA
- Domain expertise and a level of familiarity with control engineering are needed (Malakis, S., STPA Workshop 2012)
- In the case of multiple controllers, it is important to understand the interaction (interference) among controllers. However, it is difficult (Ujiie, R. & Ishimatsu, T., STPA Workshop 2012)
- STPA analyzes not only safety aspects, but also functional goals (Thomas, J., STPA Workshop 2012)
- STPA addresses misbehaviors due to software problems and may help address regulatory concerns (Torok, R. & Geddes, B., STPA Workshop 2013)

Considerations about STPA
- Use of STPA allowed the design team to identify more causal factors for quality losses than FMEA or FTA, including component interactions, software flaws, omissions and external noises (Goerges, S., STPA Workshop 2013)
- How to develop real-time constraints? (Yahia, H. & Fawzy, E., STPA Workshop 2013)
- Likely to require a facilitator for new users and dependent on the analysis boundary (Torok, R. & Geddes, B., STPA Workshop 2013)
- The third step of STPA needs a lot of effort, time and deep knowledge for examining the controllers with process models (Abdulkhaleq, A., STPA Workshop 2013)

Thank you

Carlos H N Lahoz
lahozchnl@iae.cta.br
55-12-997131177 (mobile)
Instituto de Aeronautica e Espaco (IAE), Sao Jose dos Campos, Brazil

This study is part of the ongoing research project "Verification and Validation of Space Projects - Software Systems", sponsored by the National Council for Scientific and Technological Development (CNPq), Brazil (Process 559973-2010-1)