DEDICATED TO EMBEDDED SOLUTIONS
RELIABILITY IN SUBSEA ELECTRONICS
TECHNIQUES TO OBTAIN HIGH RELIABILITY
STIG-HELGE LARSEN, KARSTEN KLEPPE
DATA RESPONS, 2012-10-16
AGENDA
- Introduction
- Analysis and Design Techniques
- Reliability Predictions
- FRACAS and Data Processing Techniques
- Production and Repair
- Testing
- Reliability Program Planning
THIS IS DATA RESPONS
We are a full-service, independent technology company and a leading player in the embedded solutions market.
ESTABLISHED: 1986. Listed on the Oslo Stock Exchange (Ticker: DAT)
CERTIFICATIONS: ISO 9001:2008, ISO 14001:2004, OHSAS 18001:2007
EMPLOYEES: 465
CUSTOMISATION
CUSTOM SPECIFICATION: Physical size, Interfaces, Functionality, Performance, Power demands, Regulations, Standards
CHOICE OF TECHNOLOGY: Operating systems, Software architecture, Hardware platform, Processor architecture, Memory and storage, Communication & I/O, Display and touch
EXTREME CONDITIONS: Humidity, Altitude, Temperature, Vibration, Salt spray, Shock, EMC
EXAMPLE: CURRENT SENSOR BOARD
- Measuring range: 0.2 to 1.2 A AC
- Accuracy: better than ± 1.0 %
- CAN bus interface
- 4-20 mA outputs
- Qualified according to ISO 13628-6 for Subsea Production Control Systems
- Based on a Hall effect current sensor
RELIABILITY IN SUBSEA ELECTRONICS
INTRODUCTION
Reliability in Data Respons:
- Reliability study
- IEC 61508
- QA system
Reliability: the ability of an item to perform a required function under stated conditions for a specified period of time.
Availability: the proportion of time for which the equipment is able to perform its function.
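The availability definition above can be sketched as a simple ratio of time in service to total observed time. The function name and the example hours are illustrative assumptions, not figures from the presentation.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Proportion of the observed time in which the equipment could perform its function."""
    total_hours = uptime_hours + downtime_hours
    if total_hours <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total_hours


# Hypothetical example: 8700 h able to operate and 60 h unavailable in one year.
print(f"Availability: {availability(8700.0, 60.0):.4f}")
```

For subsea equipment the downtime term tends to dominate the discussion, because low accessibility makes each repair long and costly.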
SUBSEA
Characteristics:
- Relatively low volumes
- Need for high reliability
- Low accessibility
- High cost in case of replacements
KEY POINTS
Techniques to obtain high reliability in electronics
Topic areas, relevant themes and key points:
- Design Techniques and Analysis
- Root Causes of Failures
- Failure Reporting and Corrective Actions System
- Automated Testing
- Accelerated Stress Testing
- Reliability Program Plan
ANALYSIS AND DESIGN TECHNIQUES
- Start with an evaluation of the relationships between different parts of the system
- Evaluate different design alternatives
- Follow design guidelines
- Use design checklists
- Arrange design reviews
- Perform stress analysis and derating of components
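A derating check compares the stress applied to a component with its rated value and requires the ratio to stay below a chosen limit. The sketch below is a minimal illustration. The function names, the 25 V capacitor and the 0.5 limit are hypothetical; real limits come from the derating guideline used in the project.

```python
def stress_ratio(applied: float, rated: float) -> float:
    """Ratio of applied stress to rated stress (same unit for both)."""
    if rated <= 0:
        raise ValueError("rated value must be positive")
    return applied / rated


def is_within_derating(applied: float, rated: float, max_ratio: float) -> bool:
    """True if the component is used at or below the allowed stress ratio."""
    return stress_ratio(applied, rated) <= max_ratio


# Hypothetical example: a capacitor rated for 25 V with a 50 % voltage derating limit.
print(is_within_derating(applied=12.0, rated=25.0, max_ratio=0.5))  # ratio 0.48
print(is_within_derating(applied=16.0, rated=25.0, max_ratio=0.5))  # ratio 0.64
```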
Failure Mode, Effects and Criticality Analysis (FMECA)
- identifies potential failure modes
- lists the effects of failures
- gives a basis for eliminating mission-critical, single-point failures
[Figure: FMECA block diagram. Labels: Hardware Design, Failure Modes, Component Data[Base], FMECA, Failure Effects, Failure Rate & Criticality Numbers]
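The presentation does not say how its criticality numbers are calculated. One common quantitative definition, taken here as an assumption, is the failure mode criticality number of MIL-STD-1629A: the conditional probability of the effect (beta) times the failure mode ratio (alpha) times the part failure rate times the operating time. The class, the function names and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    alpha: float  # failure mode ratio: share of the part's failures that occur in this mode
    beta: float   # conditional probability that the mode leads to the stated effect


def mode_criticality(mode: FailureMode, failure_rate_per_hour: float, hours: float) -> float:
    """Failure mode criticality number: beta * alpha * failure rate * operating time."""
    return mode.beta * mode.alpha * failure_rate_per_hour * hours


def item_criticality(modes: list[FailureMode], failure_rate_per_hour: float, hours: float) -> float:
    """Sum of the mode criticality numbers; pass only modes of one severity class."""
    return sum(mode_criticality(m, failure_rate_per_hour, hours) for m in modes)


# Hypothetical part with two failure modes, 1e-6 failures/h, 10 000 operating hours.
modes = [FailureMode("open circuit", alpha=0.6, beta=1.0),
         FailureMode("parameter drift", alpha=0.4, beta=0.1)]
print(item_criticality(modes, failure_rate_per_hour=1e-6, hours=10_000.0))
```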
Failure Mode, Effects and Diagnostic Analysis (FMEDA)
- includes diagnostic coverage (the ability of any automatic diagnostics to detect failures)
[Figure: FMEDA block diagram. Labels: Hardware Design, Failure Modes, Failure Effects, Component Data[Base], FMEDA, Failure Rate & Criticality Numbers, Diagnostic Coverage]
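Diagnostic coverage can be expressed as a number. In IEC 61508 it is the fraction of the dangerous failure rate that the automatic diagnostics detect. The sketch below assumes that definition. The function name and the failure rates are hypothetical.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """Detected dangerous failure rate divided by the total dangerous failure rate."""
    total_dangerous = lambda_dd + lambda_du
    if total_dangerous <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / total_dangerous


# Hypothetical rates in failures per hour: 9e-7 detected, 1e-7 undetected.
print(f"DC = {diagnostic_coverage(9e-7, 1e-7):.0%}")
```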
FMECA - EXAMPLE OF DA FORM 7611
FMECA - EXAMPLE OF DA FORM 7612
Redundancy
- duplicating critical parts, usually in the form of a backup or fail-safe
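The benefit of duplicating a critical part can be estimated with the textbook formula for parallel units. The sketch assumes that the units fail independently, that one working unit is enough, and that switching to the backup never fails. Those assumptions are not from the presentation, and common-cause failures make real gains smaller.

```python
def parallel_reliability(unit_reliability: float, units: int) -> float:
    """Reliability of `units` identical, independent units where one working unit suffices."""
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if units < 1:
        raise ValueError("at least one unit is required")
    # The system fails only if every unit fails.
    return 1.0 - (1.0 - unit_reliability) ** units


# Hypothetical unit with reliability 0.90 over the mission time.
for n in (1, 2, 3):
    print(n, round(parallel_reliability(0.90, n), 4))
```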
Software Development Plan
- Describes the software development methodology and techniques, including reviews, coding standard, and testing.
- A key aspect of the software reliability program.
- Software reliability depends on the number of software faults.
- Testing is very important for software: every individual unit, integration, and the full system.
Design for Test (DFT)
- makes it easier to implement low-level manufacturing tests
Built-In Test (BIT)
- to achieve high reliability at a lower cost
Automatic Reset Features
- restart on critical events, lack of communication, or improper software operation
[Figure: Typical Board with Boundary-Scan Components. Source: Corelis]
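The automatic reset idea can be illustrated with a watchdog: the software must report that it is alive at regular intervals, and a missed deadline triggers a restart. The class below is a minimal sketch of that logic only. Its name and methods are hypothetical, and on a real board the timer is often a hardware watchdog rather than software.

```python
import time
from typing import Callable


class SoftwareWatchdog:
    """Reports expiry when kick() has not been called within timeout_s seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested without waiting
        self._last_kick = clock()

    def kick(self) -> None:
        """Called by healthy software, e.g. after each successful communication cycle."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        """True when a restart should be triggered."""
        return self._clock() - self._last_kick > self._timeout_s


# Usage with a simulated clock (a hypothetical 2 s timeout).
now = [0.0]
watchdog = SoftwareWatchdog(timeout_s=2.0, clock=lambda: now[0])
now[0] = 3.0  # no kick for 3 s, e.g. communication was lost
print(watchdog.expired())
```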
Thermal Analysis
- a good working temperature for every chip, to achieve the required design for reliability and performance
Electromagnetic Analysis
- good electromagnetic compatibility (EMC) design, for correct operation of different equipment in the same electromagnetic environment
Accelerated Testing
- using high stresses to get failures quickly
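How much faster failures arrive under high stress depends on the stress type and the failure mechanism. For temperature, one widely used model is the Arrhenius equation, shown here as an illustrative assumption because the presentation names no model. The activation energy and the temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(activation_energy_ev: float,
                           use_temp_c: float,
                           stress_temp_c: float) -> float:
    """Acceleration factor between a use temperature and a higher stress temperature."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


# Hypothetical case: 0.7 eV activation energy, 55 °C in use, 125 °C in the test chamber.
factor = arrhenius_acceleration(0.7, use_temp_c=55.0, stress_temp_c=125.0)
print(f"One test hour corresponds to about {factor:.0f} hours in use")
```

The result applies only to failure mechanisms that follow this temperature dependence; vibration or humidity stresses need other models.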
Root Cause Analysis (RCA)
- to correct or eliminate root causes
- a tool of continuous improvement
Reliability Growth Analysis
- collecting, modeling, analyzing and interpreting data
- to learn what improvement has been achieved in the reliability of a product
RELIABILITY PREDICTIONS
- A quick reliability analysis of the designed system is needed
- MTBF (mean time between failures) is often used as a measure of reliability
- An MTBF figure is restricted to operation under stated conditions
- It is important to use a relevant prediction calculation procedure
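A quick prediction of this kind can be sketched as a parts-count calculation. It assumes constant failure rates and a series system in which any part failure fails the board. Those assumptions and the example failure rates are illustrative and are not taken from the presentation.

```python
import math


def series_mtbf_hours(failure_rates_per_hour: list[float]) -> float:
    """MTBF of a series system: the inverse of the summed part failure rates."""
    total_rate = sum(failure_rates_per_hour)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 / total_rate


def reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of surviving the mission, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical part failure rates in failures per hour.
rates = [2e-7, 5e-7, 3e-7]
mtbf = series_mtbf_hours(rates)
print(f"MTBF: {mtbf:.0f} h, R(1 year): {reliability(mtbf, 8760.0):.4f}")
```

Note that operating for one MTBF gives a survival probability of only about 37 % under these assumptions, which is one reason MTBF should be read as an indicator and not as an expected lifetime.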
[Figure: Excerpt from the reliability analysis checklist in MIL-HDBK-217]
Factors that affect the MTBF figures from vendors:
- Prediction methods
- Predefined conditions
- Quality level of components
- The source and assumptions for the base failure rate of each component type
The vendor's assumptions need to be understood. MTBF is an indicator of reliability.
What is the use of reliability predictions?
- assessment of whether reliability goals (e.g. MTBF) can be reached
- identification of potential design weaknesses
- evaluation of alternative designs and life-cycle costs
- the provision of data for system reliability and availability analysis
FRACAS & DATA PROCESSING TECHNIQUES
FRACAS: Failure Reporting And Corrective Action System
DATA ANALYSIS: PARETO CHART
- Pareto chart: highlights the most important among a (typically large) set of factors.
- The most frequent fault causes will vary from item to item.
- "No fault found" and "Root cause unknown" will often amount to a large part of all cases.
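The data behind a Pareto chart is a count per fault cause, sorted from most to least frequent, with a running cumulative percentage. The sketch below builds that table from a list of recorded causes. The function name and the sample records are hypothetical and do not reproduce the data in the presentation.

```python
from collections import Counter


def pareto_table(fault_causes: list[str]) -> list[tuple[str, int, float]]:
    """Rows of (cause, count, cumulative percent), most frequent cause first."""
    counts = Counter(fault_causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows


# Hypothetical failure reports collected in a FRACAS database.
reports = ["Power circuit"] * 5 + ["No fault found"] * 3 + ["Firmware"] * 2
for cause, count, cumulative_pct in pareto_table(reports):
    print(f"{cause:<16} {count:>3} {cumulative_pct:6.1f} %")
```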
DATA ANALYSIS: NO FAULT FOUND
Some possible reasons for no fault found (NFF):
- a rare failure that is hard to recreate (e.g. failure under special conditions)
- the failure comes and goes (e.g. a loose connection)
- there has never been a fault on the item
DATA ANALYSIS: INTERMITTENT FAILURES
Intermittent failures:
- The system performs incorrectly only under certain conditions, but not under others.
- An item with an intermittent failure can cause the same system failure again if it is reinstalled, and can therefore generate high costs.
DATA ANALYSIS: PARETO CHART, EXAMPLE SUMMARIZED
The following categories in particular need attention:
1. Power circuit
2. PCB production / assembly
3. Input/output circuit
4. Firmware
5. Connectors or internal cables
Also often relevant for some items:
6. Secondary storage / external memory (disk)
7. Mechanical damage
8. Batteries
9. Software
10. CPU module
11. Others, for instance: short circuit, internal memory (RAM) fault, defective fan, errors in procedure, design fault
PRODUCTION AND REPAIR
Some relevant topics:
- Errors during production tests and field errors will correlate
- Follow-up of suppliers
- Production batch volume for electronics
- Saving test data so that analysis is easy
- ISO 20815 standard: Production assurance and reliability management
- IPC-A-610: Acceptability of Electronic Assemblies
- IPC J-STD-001: Requirements for Soldered Electrical and Electronic Assemblies
IPC product classes:
- CLASS 1: General Electronic Products
- CLASS 2: Dedicated Service Electronic Products
- CLASS 3: High Performance Electronic Products
Rework implies a risk for the reliability, and therefore:
- there should be requirements about the maximum allowed rework
- rework should be substantiated and documented for each serial number
IPC-7711/7721 is the IPC standard for rework, modification and repair.
HANDLING ELECTRONIC ASSEMBLIES
Electrostatic discharge (ESD) can occur with no visible signs of damage.
Two simple principles of electrostatic safe handling are:
1. Only handle sensitive components in an ESD Protected Area (EPA).
2. Protect sensitive devices outside the EPA using ESD protective packaging.
TESTING
AUTOMATED TESTING
Why automated testing?
- human errors can be minimized
- more thorough testing
- enables monitoring of variations in test results
- several tests can be run very quickly to find potential points of failure
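Monitoring variation in test results can be as simple as comparing each new measurement with limits derived from earlier units that passed. The sketch below uses the mean plus or minus three standard deviations. That limit rule, the function names and the voltage readings are assumptions made for illustration.

```python
import statistics


def control_limits(baseline: list[float], n_sigma: float = 3.0) -> tuple[float, float]:
    """Lower and upper limits from the mean and sample standard deviation of a baseline."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # requires at least two baseline values
    return mean - n_sigma * sigma, mean + n_sigma * sigma


def out_of_limits(measurements: list[float], limits: tuple[float, float]) -> list[float]:
    """Measurements that fall outside the limits and deserve a closer look."""
    low, high = limits
    return [m for m in measurements if not (low <= m <= high)]


# Hypothetical 5 V rail readings logged by an automated test system.
limits = control_limits([5.00, 5.02, 4.98, 5.01, 4.99])
print(out_of_limits([5.01, 5.20, 4.97], limits))
```

A reading can be inside the pass/fail specification and still outside these limits, which is the kind of drift that manual testing rarely records.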
Automatic Optical Inspection (AOI)
- takes time to set up correctly
[Figure: Example from Axiomtek]
Automated X-Ray Inspection (AXI)
- in many ways similar to AOI, except that it can look through IC packages
In-Circuit Test (ICT)
- often limited when the contact pins cannot get access to the board
[Figure: ICT example from RNS International]
Manufacturing Defect Analyzer (MDA)
- does not check the operation of ICs
JTAG Boundary Scan
- widely used
- allows much of a board to be tested with only minimal access
- its standard is IEEE 1149.1
- boundary-scan integrated circuits (ICs) are connected serially on a board
[Figure: Typical Board with Boundary-Scan Components. Source: Corelis]
Functional Automatic Test System
- uses equipment for testing the function of a circuit
[Figure: Example of a software-defined test system from National Instruments]
Built-In Test (BIT)
- good accessibility to the hardware
- often less expensive tests
Loop back test
- connecting transmitter and receiver on the same board
Some form of external tests will usually be required in addition to self-diagnostics.
- For testing external interfaces that use a standard protocol, a software tool can be purchased for testing and data logging.
- By analyzing data from testing, production areas that need attention and improvement can be pinpointed.
STRESS TESTING - ISO 13628 PART 6
ISO 13628 part 6 for subsea production control systems:
Qualification and EMC (electromagnetic compatibility):
- Shock
- Vibration
- Temperature
- EMC tests
ESS (Environmental Stress Screening) during production:
- Random vibration
- Thermal cycling
- Burn-in
- Final functional test
BATH TUB CURVE
HALT - HIGHLY ACCELERATED LIFE TESTING
HALT
- provokes, within a relatively short period of time, failures commonly seen after long-term use
- leads to corrective measures: either changes to the design or changes in the production process
Source: Turin Networks
Typical HALT tests are:
- Cold Step Test
- Hot Step Test
- Rapid Temperature Cycling Test (e.g. 60 °C/minute ramp rate)
- Stepped Vibration (random) Test
- Combined Environment Stress
HASS - HIGHLY ACCELERATED STRESS SCREENING
HASS
- the production equivalent of HALT
- finds defects induced by the manufacturing/production process
[Figure: Common screen varieties. Source: Turin Networks]
RELIABILITY PROGRAM PLAN
Reliability Program Plan
- includes the required activities, methods, analyses, tools, and test strategies for the system
- is important for reaching the required reliability
WWW.DATARESPONS.COM