Safety-Critical Firmware What can we learn from past failures?




Safety-Critical Firmware: What can we learn from past failures?
Michael Barr & Dan Smith
Webinar: September 9, 2014

MICHAEL BARR, CTO
- BSEE/MSEE and Firmware Developer
- Consultant and Trainer (1999-present)
- Former Adjunct Professor
  - University of Maryland, Johns Hopkins University
- Former Editor-in-Chief; Columnist; Conference Chair
- Expert witness
  - Unintended acceleration injuries; smartphone and set-top patents
- Author of 3 books and 70+ articles/papers

Copyright Barr Group. All rights reserved.

BARR GROUP
The Embedded Systems Experts
- Barr Group helps companies make their embedded systems safer and more secure.
- barrgroup.com

UPCOMING PUBLIC BOOT CAMPS
- Embedded SOFTWARE Boot Camp
  - October 20-24 near Detroit, Michigan
- Embedded ANDROID Boot Camp
  - October 27-31 in Costa Mesa, California
- Embedded SECURITY Boot Camp
  - November 3-7 in Dallas, Texas
- http://barrgroup.com/training-calendar

UPCOMING PUBLIC 1-DAY TRAINING
- Firmware Defect Prevention for Safety-Critical Devices
  - September 23rd near Detroit, Michigan
  - http://barrgroup.com/courses/1day/safety-critical
- Overview
  - Focus on cost-effective defect prevention best practices
  - For engineers and managers in safety-critical fields

DAN SMITH, PRINCIPAL ENGINEER
- BSEE from Princeton
- 20+ years of embedded systems design
  - Fields: control systems, telecom/datacom, medical devices, defense, transportation
  - Roles: engineer, instructor, speaker, consultant
  - Numerous RTOSes, processors, platforms
- Focus on secure, safe, fault-tolerant systems

OVERVIEW OF TODAY'S WEBINAR
- Goal
  - Examine past software failures in critical systems
  - Learn how to avoid repeating the past
- Key Takeaways
  - Failures often traceable to preventable defects
  - Combination of education, process and vigilance
- Prerequisites
  - Knowledge of C (and perhaps a bit of C++)

CRITICAL SYSTEMS
- Defined: A (safety) critical system can cause injury or death when it malfunctions.
- Other disciplines, weighty concerns:
  - High Security Systems (access control, military)
  - High Availability Systems (grid, mobile/cellular, internet)
  - Mission critical systems (unmanned exploration)
- Commission / Omission

LOOKING THROUGH A KEYHOLE
- Much more to developing safety-critical systems
  - Planning, staffing, training, budgeting
  - Product specifications, requirements, test plans
  - Hardware, mechanical, redundancy, fail-safes
  - Modeling / simulation, formal proofs, fuzzing
  - Testing, validation, verification
- Presentation covers only implementation phase
  - Specifically, firmware development

ROLE OF FIRMWARE IN CRITICAL SYSTEMS
- Increasing role of firmware in:
  - Automobiles & transportation in general
  - Mobile electronics (phone, GPS, etc.)
  - Medical devices
  - We could go on & on & on
- More functionality being pushed into firmware
  - Operations formerly handled by hardware
  - Greater complexity, greater potential for problems

RIPPED FROM THE HEADLINES
Source:
- http://en.ria.ru/world/20140828/192413515/galileo-satellites-incident-likely-result-of-software-errors.html
- http://spectrum.ieee.org/tech-talk/aerospace/satellites/two-galileo-satellites-are-parked-in-the-wrong-spots

HINDSIGHT
- Of course the cause & fix is obvious!
  - Then why are the same mistakes repeated over & over?!?!
- Similar lessons from security (e.g. buffer overflow)
- Point isn't to criticize or taunt
  - Avoid repeating the same mistakes

THERAC-25
Images: http://hci.cs.siue.edu/nsf/files/semester/week13-2/ppt-text/slide13.html

THERAC: WHAT HAPPENED?
- Hardware interlocks were designed out
  - Previous generations had them, replaced with software
- Early sign of software's increasing safety responsibility
- Race conditions & improper machine settings
  - High energy beam activated without spreader plate
  - One byte-counter overflowed at just the wrong time
- Result: 100x radiation dosage
  - At least 6 patients harmed, 3 killed
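The byte-counter overflow listed above is easy to reproduce in C. The following is a minimal sketch, not the Therac-25 code (which the findings note was written in assembly): the flag `check_required_count` and all function names are hypothetical, chosen only to show how an 8-bit "non-zero means check required" flag silently reads as zero after 256 increments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: non-zero means "a safety check is still required". */
static uint8_t check_required_count;

/* Fragile: incrementing an 8-bit flag wraps to 0 on every 256th call,
 * at which point the flag reads as "no check required". */
void mark_check_required_fragile(void)
{
    check_required_count++;
}

/* Safer: assign a fixed non-zero value, so no wrap-around is possible. */
void mark_check_required_safe(void)
{
    check_required_count = 1u;
}

bool check_is_required(void)
{
    return check_required_count != 0u;
}

void clear_check_required(void)
{
    check_required_count = 0u;
}
```

Calling `mark_check_required_fragile()` 256 times after a clear leaves `check_is_required()` returning false, while the assignment-based variant stays true no matter how often it is called.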

THERAC: FINDINGS
- Atomic Energy of Canada Limited (AECL):
  - Immature and inadequate software development process ("untestable software")
  - Incomplete reliability modeling & failure mode analysis
  - No (independent) review of critical software
  - Improper software re-use from older models
  - Improper inter-task synchronization
- Also notable:
  - System implemented in assembly language
  - System used own in-house operating system

ARIANE 5 / FLIGHT 501
- Successor to smaller Ariane 4 rocket
  - Designed to carry larger, heavier payloads
  - Today: standard launch vehicle for ESA
- June 1996: Maiden flight
- Payload: Cluster
  - Four 1200-kg spacecraft
  - Mission: study Earth's magnetosphere
Image Source: https://en.wikipedia.org/wiki/file:ariane_501_cluster.svg

FLIGHT 501 FAILURE
- 37 seconds into launch
  - Both inertial navigation systems malfunction & crash
  - Thrusters steered into extreme & incorrect orientations
  - Vehicle departed from intended flight path
- Flight termination system
  - Mechanical stresses triggered deliberate self-destruction
  - Fortunately that worked as intended!!
- Cost: approximately $370M

FLIGHT 501 - CAUSE
- Inertial navigation system (SRI) re-used from Ariane 4
- Flight 501 had much greater horizontal velocity
  - Conversion: 64-bit floating point to 16-bit integer
  - Variable holding horizontal velocity overflowed
  - Overflow checks omitted for efficiency
- Implementation language was Ada
  - Typically regarded as a safer implementation language
- Software where error occurred:
  - Not needed after launch!
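The conversion at the heart of the Flight 501 cause can be guarded with a simple range check. The flight software was written in Ada, so the following C is only an analogous sketch; the function name `double_to_int16_checked` is an assumption made for illustration. In C the hazard is arguably worse than in Ada, because converting an out-of-range floating-point value to an integer type is undefined behavior rather than a raised exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a double to int16_t only if the value is representable.
 * Returns false (and leaves *out untouched) when the input is out of
 * range or NaN, so the caller must decide how to handle the failure. */
bool double_to_int16_checked(double value, int16_t *out)
{
    /* Written as a negated "in range" test so that NaN is rejected too:
     * every comparison involving NaN evaluates to false. */
    if (!(value >= (double)INT16_MIN && value <= (double)INT16_MAX)) {
        return false;
    }
    *out = (int16_t)value;  /* in range, so the conversion is well defined */
    return true;
}
```

The design choice here is to force the caller to handle the failure path explicitly; saturating to `INT16_MAX` would be another option, but it hides the fact that the input assumption was violated.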

MISRA C:2012
- Directive 4.1: Run-time failures shall be minimized.
- C's run-time environment is very light-weight
  - Unchecked array access, divide by 0, dynamic allocation
- Implication:
  - Burden is on you, the programmer
- Tactic: Extensive (dynamic) run-time checking

ASSERTIONS
- Software assertions
  - Used to confirm programmer's assumptions at runtime
- Also a form of documentation
- C language: assert() (header file <assert.h>)
  - Expression is expected to evaluate to TRUE

    bool isinrange(int lower_bound, int upper_bound, int value)
    {
      assert(lower_bound <= upper_bound);
      ...
    }

- What if expression evaluates to FALSE?
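Returning to Directive 4.1, the "extensive run-time checking" tactic can be sketched for two of the hazards named on that slide: unchecked array access and divide by zero. All names below (`SAMPLE_COUNT`, `samples`, `sample_get`, `div_checked`) are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 8u

static int32_t samples[SAMPLE_COUNT];

/* Bounds-checked read: reports failure instead of reading past the array. */
bool sample_get(size_t index, int32_t *out)
{
    if (index >= SAMPLE_COUNT || out == NULL) {
        return false;
    }
    *out = samples[index];
    return true;
}

/* Checked division: rejects divide-by-zero and the INT32_MIN / -1 case,
 * whose mathematical result does not fit in an int32_t. */
bool div_checked(int32_t num, int32_t den, int32_t *out)
{
    if (out == NULL || den == 0 || (num == INT32_MIN && den == -1)) {
        return false;
    }
    *out = num / den;
    return true;
}
```

Returning a status and writing the result through a pointer keeps the error path visible at every call site, at the cost of slightly noisier code.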

REMOVING ASSERTIONS
- Cost of assertions
  - Run time (CPU), code size
- Removing / disabling assertions
  - Typically by defining NDEBUG at compile time
    Assertions turn into whitespace
  - Often done just before shipping / production
- Ship what you tested
- Parachutes and pennies
  - And seatbelts

FLIGHT 501 - LESSONS
- Re-use of software isn't always an automatic win
- Mixing data types (e.g. int & float) can be problematic
  - It's not just C that suffers from such problems
- Don't execute unnecessary software
- Disable assertions (sanity checks) at your own risk
- Consideration of failure modes is important too
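For context on the two assertion slides: when a standard `assert()` expression is false, the macro writes a diagnostic to standard error and calls `abort()`; when `NDEBUG` is defined, it expands to nothing. One way to "ship what you tested" is an assertion macro that ignores `NDEBUG` and routes failures to a project-defined handler. The sketch below assumes hypothetical names (`ALWAYS_ASSERT`, `on_assert_failure`, `assert_failures`), and the body of `is_in_range` is a guess at what the elided slide function might do.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical failure counter, kept only so the effect is observable. */
unsigned assert_failures;

/* Hypothetical hook: a real system might log the event, enter a safe
 * state, or reset. Here it only counts and reports the failure. */
void on_assert_failure(const char *expr, const char *file, int line)
{
    assert_failures++;
    fprintf(stderr, "ASSERT FAILED: %s (%s:%d)\n", expr, file, line);
}

/* Unlike assert(), this macro is not affected by NDEBUG. */
#define ALWAYS_ASSERT(expr) \
    ((expr) ? (void)0 : on_assert_failure(#expr, __FILE__, __LINE__))

bool is_in_range(int lower_bound, int upper_bound, int value)
{
    ALWAYS_ASSERT(lower_bound <= upper_bound);
    return (value >= lower_bound) && (value <= upper_bound);
}
```

Whether the handler should halt, reset, or continue in a degraded mode is a system-level safety decision; this sketch deliberately leaves it open.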

WHAT ABOUT TESTING?
- Testing is necessary and important
  - But not sufficient
- Testing does not prove the absence of bugs
  - Some bugs escape to the field
    And are often very difficult to reproduce
  - Tests are software, too
    Bugs aren't limited to production code
- Tests are just one part of an overall quality strategy

PATRIOT MISSILE SYSTEM

PATRIOT MISSILE FAILURE
- February 25, 1991
  - 28 U.S. soldiers dead; 100+ wounded
  - Single deadliest incident for U.S.

THE PATRIOT SOFTWARE BUG
- Two versions of system time
  - Clock 1: integer ticks (one tick = 0.1 s)
  - Clock 2: fixed-point representation
    3.25 s: 000000000000000000000011.010000000000000000000000
- Problem: no exact representation of 0.1 decimal (base 10) in binary ("non-terminating")
  - Conversion from integer ticks to floating point values results in rounding (about 1 part in a million)
- After 100 hours (360,000 seconds), this is ~0.34 seconds
  - But what does that translate to in terms of distance?
GAO Report: https://www.fas.org/spp/starwars/gao/im92026.htm
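The ~0.34 second figure can be reproduced with a few lines of arithmetic. The sketch below assumes that 0.1 was chopped to 23 fractional bits, a figure taken from widely cited analyses of the incident rather than from the slide itself; the constant `FRACTION_BITS` and both function names are introduced only for this illustration.

```c
#include <stdint.h>

#define FRACTION_BITS  23        /* assumption: fractional bits kept for 0.1 */
#define TICKS_PER_HOUR 36000.0   /* one tick = 0.1 s */

/* Error, in seconds, introduced per tick by chopping 0.1 to FRACTION_BITS. */
double tick_error_seconds(void)
{
    const double scale = (double)(1UL << FRACTION_BITS);
    /* The integer cast discards the bits that do not fit (truncation). */
    const double chopped = (double)(uint32_t)(0.1 * scale) / scale;
    return 0.1 - chopped;
}

/* Accumulated clock error, in seconds, after the given uptime. */
double clock_drift_seconds(double uptime_hours)
{
    return uptime_hours * TICKS_PER_HOUR * tick_error_seconds();
}
```

Under that assumption the per-tick error is about 0.000000095 s, which is roughly one part in a million of 0.1 s, and `clock_drift_seconds(100.0)` comes out at about 0.34 s, in line with the slide.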

PERILS OF FLOATING POINT, 1

    #include <stdio.h>

    void test1(void) {
      float f = 0.1f; printf("%0.6f\n", f);
      f += 0.1f;      printf("%0.6f\n", f);
    }

    void test2(void) {
      float f = 0.1f; printf("%0.9f\n", f);
      f += 0.1f;      printf("%0.9f\n", f);
    }

    int main(void) {
      test1();
      test2();
      return 0;
    }

Output:

    0.100000
    0.200000
    0.100000001
    0.200000003   ?!?!?

PERILS OF FLOATING POINT, 2

    #include <stdio.h>

    void test3(void) {
      float f1 = 0.1f, f2 = 0.3f;
      f1 += 0.3f; f1 += 0.7f;
      for (int i = 0; i < 8; ++i) {
        f2 += 0.1f;
      }
      printf("%0.9f\n%0.9f\n", f1, f2);
    }

    int main(void) {
      test3();
      return 0;
    }

Output:

    1.100000024
    1.100000143

Larger error accumulation due to rounding on each iteration of loop
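The accumulation effect shown in the two listings above grows with the number of additions. A common alternative, sketched here with hypothetical function names, is to keep the exact integer tick count and convert to seconds once, so that only a single rounding step is involved.

```c
/* Accumulates 0.1f once per tick: every addition rounds to the nearest
 * float, and those rounding errors add up as the loop runs. */
float elapsed_by_accumulation(unsigned long ticks)
{
    float t = 0.0f;
    for (unsigned long i = 0; i < ticks; ++i) {
        t += 0.1f;
    }
    return t;
}

/* Keeps the exact integer tick count and converts once at the end. */
double elapsed_from_ticks(unsigned long ticks)
{
    return (double)ticks / 10.0;
}
```

On a typical IEEE 754 single-precision target, one hour of ticks (36,000) accumulated this way should land visibly away from 3600 seconds, whereas `elapsed_from_ticks(36000UL)` is exactly 3600.0. The exact accumulated value depends on the platform's floating-point evaluation, so no specific figure is claimed here.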

ACCUMULATED ERROR

    Uptime (h)   Error (s)   Shift (m)
         1         .0034           7
         8         .0275          55
        20         .0687         137
       100         .3433         687

GAO Report: https://www.fas.org/spp/starwars/gao/im92026.htm

PATRIOT MISSILE FAILURE: LESSONS
- Testing will not catch all problems
- Mixing floating point and fixed point/integer operations can be tricky
  - In fact using floating point alone can be tricky
- Tracking time (or any precise quantity)
  - Be consistent
  - Understand precision, rounding and conversion

MARS CLIMATE ORBITER
Source: https://upload.wikimedia.org/wikipedia/commons/1/19/mars_climate_orbiter_2.jpg

UNITS ARE IMPORTANT
- Ultimately, computers calculate things
  - Most calculations involve dimensions & units
  - Pressure (kPa), velocity (m/s), flow (l/m), etc.
- Common unit mistakes in calculations
  - Same fundamental dimension, different system
    e.g. SetVelocityMetersPerSec(MPH_55);
  - Disagreement in fundamental dimensions
    e.g. SetAcceleration((pos2 - pos1) / time);

DIMENSIONAL ANALYSIS
- In C, no unit information in standard types
  e.g. int speed = 1234;
  - Is that 123.4 meters per second?
  - Is that 123.4 miles per hour?
  e.g. float calcpress(float force, float area);
  - What are the units for force & area?
- Can we use the language's type system to help?
  - Yes, and static analysis, too (e.g. Flexelint 9)

USING FLEXELINT 9 TO EXPOSE DIMENSION / UNIT PROBLEMS

     1  // Dimensional analysis demonstration.
     2  // Report whenever a variable (such as v) typed as a Velocity
     3  // is assigned anything other than a Velocity or a Met/Sec.
     4
     5  //lint -strong( AcJcX, Met, Sec, Velocity = Met/Sec )
     6  typedef double Met, Sec, Velocity;
     7
     8  Velocity speed( Met d, Sec t ) {
     9    Velocity v;
    10    v = d / t;              // ok
    11    v = 1 / t;              // nope!
    12    v = (3.5/t) * d;        // ok

Lint output:

    v = 1 / t;              // warning
    dimensional2.c  11  warning 632: assignment to strong type 'Met/Sec' in context: assignment
    dimensional2.c  11  warning 633: assignment from a strong type '1/Sec' in context: assignment

C: DON'T USE NAKED NUMBERS
- Consider an object-oriented approach
  - Create different types (classes) for different units

    #include <stdint.h>

    typedef uint32_t SPEED1;
    typedef uint16_t SPEED2;

    typedef struct foo_tag {
      SPEED1 SpeedInCmPerSec;
    } SPEED_CM_S;

    typedef struct foo_tag2 {
      SPEED2 SpeedInMilesPerHour;
    } SPEED_M_H;

    // Below routines would have built-in bounds checking, etc.
    void ctor1speedcmpersec(SPEED_CM_S *obj, SPEED1 initspeedcmpersec);
    void ctor2speedcmpersec(SPEED_CM_S *obj, SPEED_M_H const *speedin);
    void adjustspeedcmpersec(SPEED_CM_S *current, SPEED_CM_S const *adjustment);

EVEN BETTER: USE C++
- C++ is a perfect fit for this problem
  - Stronger type system than C
  - Templates & metaprogramming enforce dimensional correctness at compile time
- More information:
  - Paper: Scott Meyers, "Dimensional Analysis in C++" [1]
  - See Boost::Units [2]

[1] http://se.ethz.ch/~meyer/publications/others/scott_meyers/dimensions.pdf
[2] http://www.boost.org/doc/libs/1_56_0/doc/html/boost_units/dimensional_analysis.html
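To make the plain C approach concrete, here is a minimal sketch of one bounds-checked conversion between two struct-wrapped unit types. It does not reproduce the prototypes on the slide: the type names, the function `speed_cm_s_from_mph`, and the limit `SPEED_MPH_MAX` are all hypothetical, picked only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct struct types: the compiler rejects passing one where the other
 * is expected, which two bare integer typedefs would silently allow. */
typedef struct { uint32_t value; } speed_cm_s_t;  /* centimeters per second */
typedef struct { uint16_t value; } speed_mph_t;   /* miles per hour */

#define SPEED_MPH_MAX 200u   /* hypothetical plausibility limit */

/* 1 mile = 160934.4 cm and 1 hour = 3600 s, so 1 mph = 44.704 cm/s.
 * Returns false and leaves *out untouched if the input is implausible. */
bool speed_cm_s_from_mph(speed_cm_s_t *out, speed_mph_t in)
{
    if (out == NULL || in.value > SPEED_MPH_MAX) {
        return false;
    }
    /* Integer arithmetic, rounded to nearest: mph * 44704 / 1000. */
    out->value = ((uint32_t)in.value * 44704u + 500u) / 1000u;
    return true;
}
```

With these types, a call such as `speed_cm_s_from_mph(&cm, cm)` fails to compile because the second argument has the wrong struct type, which is exactly the class of mix-up the "naked numbers" slide warns about.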

FILTERING OUT THE DEFECTS
- Coding Standard (e.g. MISRA)
- Static Analysis
- Formal Code Inspection
- Testing (multiple levels)

KEY TAKEAWAYS
- No such thing as bug-free software
- Testing is not sufficient
- Defense in depth, just like security
  - Coding Standard / safe subset (e.g. MISRA standard)
  - Process (static analysis, code inspections)
  - Knowledge is a pro-active approach
- Always better to prevent than to find & fix

FURTHER READING
- "Haven't found that glitch", Dr. David Cummings
  http://articles.latimes.com/2010/mar/11/opinion/la-oewcummings12-2010mar12
- "Mars Code", Gerard J. Holzmann
  Text: http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext
  Video: http://vimeo.com/84991949
- Better Embedded System SW (Phil Koopman)
  http://betterembsw.blogspot.com/

QUESTION & ANSWER

ADDITIONAL RESOURCES
- Paper: Top 10 Bug-Killing Coding Standard Rules
  barrgroup.com/embedded-systems/how-to/bug-killing-Standards-for-Embedded-C
- Michael Barr's Blog: Barr Code
  http://embeddedgurus.com/barr-code/
- Training: Barr Group's Upcoming Public Courses
  barrgroup.com/training-calendar

CONCLUSION