Using the Prognostic Health Management Program on the Air Force Next Generation Reusable Launch Vehicle

Len Losik, Ph.D. 1
Failure Analysis, Capitola CA 95010

The Air Force is planning to replace the EELV with a reusable booster, promising a 50% cost savings per launch. However, the Air Force made similar claims for the EELV program that were never realized, and the EELV is being considered for cancellation as a result. To convince skeptics of the promises made for the NGRLV, the NGRLV will need to draw on proven practices and technologies that have produced real reductions in life-cycle cost. Such data is available from the Air Force's F-35 JSF program, which achieved its savings by adding intelligence to its avionics equipment. The technologies used on the Air Force's F-35 offer the NGRLV a further reduction in launch cost by leveraging an integrated vehicle health management (IVHM) program, which lowers life-cycle cost in addition to the savings from using a reusable booster. The cost of the ground support needed for an NGRLV may be similar to the ground support for a fighter aircraft, so the NGRLV can leverage the latest proven practices and technology from the Air Force F-35 JSF to demonstrate the cost savings to NGRLV skeptics. The lower NGRLV operating and maintenance costs result from adding intelligence and decision making to the NGRLV equipment using a prognostic and health management program, while simultaneously increasing NGRLV reliability and availability and decreasing the risk of a catastrophic failure. This is achieved by stopping the premature failure of on-board equipment. An IVHM program requires the on-board equipment to have embedded predictive algorithms that continuously and automatically measure equipment remaining usable life. Equipment usable life can also be measured manually using spacecraft equipment analog telemetry and predictive algorithms to demodulate the equipment telemetry behavior.
A factory ATP identifies the spacecraft equipment that fails performance testing so that it can be repaired or replaced before launch. A prognostic analysis measures equipment usable life, and its results identify the equipment that has passed performance testing but will fail prematurely with certainty. An IVHM allows the NGRLV to be lighter and to offer reliability and availability that surpass all previous space vehicles. Using an IVHM, the NGRLV will automatically notify ground support personnel prior to landing which equipment needs to be repaired or replaced, allowing a faster turnaround. This paradigm provides the reliability and availability of on-board equipment that eludes today's expendable launch vehicles. This paper will define the IVHM program for the NGRLV, including the changes necessary to the NGRLV design that will further refine the operational and rapid-turnaround procedures and the autonomous operations proven on the F-35 JSF, further decreasing the NGRLV life-cycle cost. This paper will identify the practices and technology that offer the necessary reduction in operational and maintenance costs for the NGRLV, making Air Force space missions cheaper, safer and more reliable.

1 President and Chief Technical Officer, 501 Plum Street, Suite 48, Capitola CA, Senior Member

I. Introduction

[14] The current paradigm for space systems mission success is that space vehicle safety, maintainability and supportability are a function of reliability R_act and redundancy. R_act is the reliability based on the ratio of successes to attempts. R_cal is the reliability that results from using stochastic equations to calculate reliability. This paradigm produced the greatest catastrophic space vehicle failures and the premature failures of over 25% of all
military, NASA and commercial space missions, making getting to space and working in space unsafe. The new paradigm for mission success, developed by the Air Force for the new F-35 Joint Strike Fighter, is that safety, maintainability and supportability are a function of reliability, redundancy and prognostics and health management (PHM) capabilities. This same paradigm is well suited to the space industry, because telemetry is used on all spacecraft for a variety of reasons. Telemetry was borrowed from the jet aircraft flight-test community at Edwards Air Force Base over 60 years ago. At our suggestion to the NASA HQ Safety and Mission Assurance Dept., NASA HQ published a PHM plan for use on all future NASA aircraft. We have requested that NASA adopt the PHM paradigm on all future manned and unmanned space missions as well.

Figure 1. An Artist Concept for the Low Cost, Highly Supportable Air Force Next Generation Reusable Launch Vehicle to Replace the Air Force EELV.

The reason that the F-35 won DOD funding was its adoption of PHM, and the PHM program could be leveraged to upgrade the logistics and supportability paradigm used in today's space systems. The F-35 vehicle design drivers were reliability, supportability, serviceability and maintainability, all of which are defined in its autonomic logistics system (ALS), which lowered the life-cycle cost of the F-35 JSF by 50% compared with existing fighter programs. The Air Force F-35 JSF Autonomic Logistics System utilizes sustaining engineering, a 24/7 help desk with access to electronic joint-service technical data, intelligent maintenance management with global supply chain accessibility and new support equipment management. The flight and maintenance training includes an integrated training program, on-demand maintenance training and vehicle software reuse to provide a highly supportable aircraft.
The F-35 JSF has a smart and reliable design incorporating prognostics and health management and remove-and-replace (R/R) maintenance in its condition-based maintenance (CBM) program. [15] The Air Force JSF autonomic logistics information system is a distributed information system and functions as an enterprise resource solution that is secure, scalable and deployable. Integrated support means that design data is available to support predictive algorithms that measure remaining usable life, so that any unit can be removed and replaced before failure occurs. An intelligent vehicle means that prognostics and health management (PHM) is utilized on all systems. The vehicle flight operations are integrated with training for optimal mission performance, a high sortie rate and a low logistics footprint. The F-35 ALS has integrated, common, joint pilot/maintainer training that is modular, with flexible pilot training. Mission planning accommodates unexpected maintenance actions and the automatic ordering of spares, which avoids lengthy down time. Parts are tracked by serial number, which makes it easy to notice trends with a particular part or with operating conditions and to detect design defects. The ALS provides full asset visibility across the fleet, including parts life tracking that allows users to know the actual condition of the vehicle.
Figure 2. The Systems on the $135M Reliability Centered Air Force F-35 Joint Strike Fighter Employing Intelligent, Self-Prognostic Equipment Supported using a Condition-Based Maintenance (CBM) Program.

II. Why Logistics, Supportability, Reliability, Serviceability and Availability are Important Design Parameters and Should Replace Equipment Performance in Priority for Space Missions

At most large aerospace companies and organizations, design engineers are physically separated from the reliability and logistics engineers. The design engineer is responsible for the space system's physical design and performance, while the reliability and logistics engineers calculate the reliability and logistics on paper, almost independently from the design engineer, using stochastic equations provided by the procurement contract in a probability reliability analysis (PRA). All the technical information needed by the reliability and logistics engineers is provided in the procurement contract, so little technical interchange is necessary with the design engineers, who are concerned with performance.

Table 1. Results from a 1989 Aerospace Corporation Study that Included 60 Air Force Procured Satellites that did not Employ Intelligent, Self-Prognostic Equipment, Proving that using PRA to Quantify Reliability and Dynamic Environmental Factory Acceptance Performance Testing to Confirm Equipment Performance Produces Space Vehicles Whose Reliability is Dominated by Premature Failures.
Stochastic equations were adopted around 1960 to quantify equipment reliability, maintainability and availability, not because they were the right equations to use but because accurately quantifying the desired behavior was believed to be too expensive, too slow or impossible. The stochastic equations use terms that provide results that seem important but are just conjecture and speculation. Stochastic equations were developed as the equations that engineers start with to quantify the desired behavior. They were never intended to be the final equations, since they use almost random data, do not reflect any real equipment behavior and are not the desired tools for quantifying future equipment behavior.

Figure 3. The Achieved Reliability of U.S. ICBMs and Launch Vehicles not Employing Intelligent, Self-Prognostic Equipment from 1957 through 2004 (Aerospace Corporation, 2005). Horizontal axis: Year of Launch.

Since the premature failure of spacecraft and launch vehicle subsystem equipment is proprietary information, it is not available to the public from the space systems builders. However, it may be available from official sources that have contractual access to it. Based on industry sources such as Aerospace Corporation, Futron Corporation and Frost & Sullivan, the reliability of space systems is dominated by premature equipment failures, that is, failures that occur weeks or months after beginning of life. When the reliability of equipment is dominated by premature failures, it demonstrates that the cause of the premature failures is not understood and that they cannot be stopped. The high number of premature space systems equipment failures, together with the failures that occur during normal lifetime and at end of life, requires an extensive and expensive logistics program and spares. Prior to the F-35, only aircraft suffered more premature failures than satellites and launch vehicles.
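If the achieved reliability in Figure 3 is read as R_act from the Introduction (the ratio of successes to attempts), it can be computed directly from a launch record, with no stochastic model involved. The short Python sketch below illustrates that calculation only. The function name and the launch record are hypothetical and are not data from Figure 3.

```python
from itertools import accumulate


def achieved_reliability(outcomes):
    """Return the running ratio of successes to attempts after each launch.

    outcomes: iterable of booleans, True for a successful launch.
    """
    outcomes = list(outcomes)
    # Running count of successes after each attempt.
    successes = accumulate(1 if ok else 0 for ok in outcomes)
    return [s / n for n, s in enumerate(successes, start=1)]


# Hypothetical launch record (illustrative only, not taken from Figure 3).
launches = [False, False, True, False, True, True, True, False, True, True]
history = achieved_reliability(launches)
print(f"R_act after {len(launches)} launches: {history[-1]:.2f}")
```

Plotting `history` against launch date would give a curve of the kind usually described as a learning curve: early failures weigh heavily on the ratio, and it levels off only as attempts accumulate.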
[13] Space mission failure rates are often as high as 25%/year and result from a failure of equipment either on the launch vehicle or on the spacecraft. The likelihood of a mission-critical premature equipment failure occurring on an Air Force spacecraft is 70% within the first 45 days after arriving in space, so extensive and complex logistics programs are needed. The commercial space industry counters the high premature failure rate with insurance policies that often cost over $25M for the launch and much more for the first year of on-orbit use. The cost of the launch and first-year on-orbit insurance policy is recovered through higher service charges paid by customers. The U.S. taxpayer absorbs the huge financial losses from both NASA and Air Force procured launch vehicles and satellites that fail prematurely.

To identify the cause of premature failures, research was completed that identified equipment with parts suffering from accelerated aging. [9] Accelerated aging occurs when at least one part is aging prematurely. As a part ages prematurely, it causes non-repeatable transient events (NRTEs) in the equipment's performance data, including analog telemetry. [1] The use of telemetry is a major design requirement on NASA, military and commercial satellites. Although telemetry is not separated out as a single cost, its use affects all major design drivers and system cost in many ways. In the past, an NRTE was misdiagnosed as systemic noise and overlooked. Today, a prognostic analysis is used to illustrate and identify the presence of an NRTE in equipment performance data or telemetry. A prognostic
analysis is an engineering analysis that uses past performance data to predict equipment remaining usable life. A prognostic analysis cannot use speculation to identify the cause of an NRTE that is present. The heart of the condition-based maintenance program is the prognostic (scientific) analysis, which is completed either automatically by equipment with embedded model-based or data-driven predictive algorithms, or manually by trained personnel. [11] Equipment with embedded predictive algorithms is considered intelligent equipment because it makes decisions related to replacement by continuously measuring remaining usable life and alerts the right personnel through the on-board telemetry system when a failure is expected in the near future. Intelligent equipment will improve the logistics of space systems and raise the reliability of space systems to a level never before achieved.

Predictive algorithms measure the equipment remaining usable life invasively, or the measurement is completed manually by engineering personnel trained in completing a scientific analysis of the data. For existing systems or equipment without embedded algorithms, the prognostic analysis is done manually by personnel trained in completing a scientific analysis of the engineering data from the equipment. [6] A scientific analysis is necessary because of the addition of probability reliability analysis (PRA) to procurement contracts to quantify equipment mission life. PRA was added not because it was the right approach, but because it was the best idea at the time. After PRA was adopted in the space industry, an engineering analysis was allowed to use speculation to explain the behavior in the test data used to measure equipment remaining usable life and to guess at the sources of the non-repeatable transient events that are used to measure remaining usable life. The current procurement contracts for military and NASA space systems only require equipment performance to be measured and confirmed by the contractor.
The usable life is not measured but calculated as the reliability using stochastic equations in a PRA.

III. The History of the U.S. ICBM and its Relationship to Today's Reliability, Logistical and Supportability Paradigm used in Space Missions

According to different sources, military and commercial space missions fail prematurely at a rate of about 25%/year. When a military space system fails prematurely, military program officials may change the success criteria to achieve the congressionally approved failure rates, as occurred on the EELV, so all military references are suspect. The commercial space industry is buoyed by the insurance industry, whose premiums are paid for in the startup cost and recovered through higher prices for satellite services. The taxpayer suffers the financial losses for both NASA and military space missions. The CBM reduces the size and complexity of the logistical system necessary for space exploration by reducing the need for replacement equipment through the use of equipment that will not fail prematurely.

[4] Figure 3 illustrates the reliability of many U.S. ICBMs and ICBM-derived launch vehicles from the very first test until 2004. The reliability of these systems is often described as needing a learning curve because of the extended time, measured in decades, that passes before reliability levels off and tends towards 90%. These are steep learning curves, as described in game theory, and they demonstrate that no experience was shared between vehicles in their development and test. The Air Force launch vehicles that are based on the ICBM with the same name include the Atlas, Titan and Delta (Thor). Each was designed and developed by the newly established Air Force at the Western Development Laboratory in Los Angeles CA during the early 1950s using concurrent engineering. These U.S. ICBM programs came after paper studies funded by the Army-Air Force in the late 1940s. The studies were motivated by the successful development of the German V-2 ballistic missile.
Concurrent engineering was used by the Army-Air Force to speed the deployment of U.S. ICBMs to counter the ICBM threat posed by the Soviet Union in 1953, which had a fully developed ICBM with a nuclear bomb as its payload. Because the causes of the premature equipment failures in the fleet of U.S. ICBMs could not be identified, many procurement contract changes were made after the Atlas, Titan and Thor (Delta) ICBMs were designed and fielded, in the hope of improving their reliability. The additions to space systems procurement contracts include:
1.) Using parts designed to withstand the rigors of space,
2.) Parts screening, hoping to minimize the use of parts that will fail prematurely,
3.) Quality control program,
4.) Redundancies,
5.) Dynamic environmental qualification and acceptance factory testing program (ATP),
6.) Equipment analog and discrete telemetry measurements borrowed from the jet-aircraft flight-test community and
7.) Probability reliability analysis (PRA) using stochastic equations to quantify equipment and vehicle reliability as a probability.

Stochastic equations use almost random factors to arrive at results that appear to be meaningful but are not related to the behavior they are being used to quantify. The use of PRA, borrowed from the merchant shipping industry where it was used to calculate insurance premiums, allowed contractors to be paid even when a launch vehicle failed prematurely, as long as their reliability calculations met the contract requirements. PRA, which is Markov-based, is used when the desired behavior is believed to be too
expensive or not possible to quantify accurately. The Markov property includes behavior being instantaneous and random.

Figure 4. An Example of the Theoretical Weibull Cost-to-Reliability Relationship for Specific Parts Quality Levels used in Today's Reliability Paradigm for a Performance-Based Space System that Illustrates the Cost to Increase Reliability a Small Amount can Increase Exponentially. Note that 100% and 0% Reliability do not Exist when Reliability is Defined in Probabilistic Terms.

When probability reliability analysis is used to quantify reliability, the cost of increasing reliability by a small percentage grows much faster than the reliability itself.

[2] Table 1 shows the results of a study completed in 1989 by Aerospace Corporation to prove that the cost of requiring space vehicle suppliers to complete dynamic environmental factory testing was justified because such testing was an effective tool for increasing space vehicle reliability. Table 1 shows that in the 60 satellites purchased by the Air Force from a variety of suppliers, an average of 4 mission-critical failures occurred during factory acceptance testing (ATP) that would have occurred in space if the ATP had not been completed, thus justifying the cost of the factory ATP. Table 1 also shows that after 47 of the 60 satellites were launched, a mission-critical failure occurred within 45 days on 70% of the satellites. Since equipment that passed equipment-level ATP failed during vehicle ATP and failed after arriving in space, the authors proved that an ATP identifies the equipment that fails during ATP, but they failed to recognize that the ATP itself does not identify the equipment that passes ATP but will fail prematurely. With equipment failure rates as high as those in the study, a logistical system capable of providing large quantities of equipment quickly is needed.
[7] Today, the reliability of space vehicle equipment and space vehicles continues to be specified to contractors as a probability, which allows space vehicle suppliers to deliver some space vehicles that fail prematurely without financial penalty. If the space vehicle supplier misses the contractually agreed delivery date, a financial penalty is levied on the supplier, which motivates company test personnel to overlook test data that would slow down testing and increase the risk of missing the delivery date.

[5] Figure 5 summarizes research conducted by NASA GSFC Code 570 personnel to quantify the number of premature equipment failures that occurred on NASA and military spacecraft between 1990 and 2001. The study was completed in 2001, but its message is still pertinent to today's spacecraft reliability. [12] NASA adopted an integrated vehicle health management (IVHM) and prognostic health management (PHM) program in 2009 to increase safety and mission assurance on all future NASA aircraft.
Figure 5. Results from a NASA GSFC Study Completed to Quantify the Number of Premature Satellite Subsystem Equipment Failures on NASA GSFC Procured Satellites without Intelligent, Self-Prognostic Equipment from 1990 to 2001.

[3] Figure 6, from Futron Corporation, illustrates the number of unclassified military satellites that failed prematurely (red) from 1959 until 2009 compared to the total number launched (blue). The data was originally generated to illustrate that whenever the number of premature satellite failures per year increased, the number would immediately decrease after the military focused significant attention on the contractor, demonstrating that the premature failures were related to contractor personnel.

The processes and practices used in the manufacture and test of the Atlas, Titan and Delta ICBMs were adopted by other industries whose companies did not receive payment when their products failed prematurely. A few of these companies determined that when their products failed prematurely, they did not fail instantaneously and randomly but exhibited behavior that could be observed in other like products, whose remaining usable life could then be accurately predicted. The behavior that preceded a failure in equipment and products became known as a failure model, and failure models were the first to be used with pattern recognition software in model-based predictive algorithms. The performance data in failure models is collected from circuits embedded in either electrical or mechanical equipment. Failure models are generated using equipment analog telemetry and any available performance data acquired during factory testing or in the field. Other industries have added telemetry systems to their equipment and products to measure equipment and product usable life. For products for which failure models were not available, data-driven predictive algorithms were developed.
Data-driven algorithms use the data from equipment testing and from equipment in use to determine the equipment's normal, baseline behavior, against which the behavior that precedes a failure can be identified. Data-driven algorithms are better suited to aerospace equipment because aerospace equipment is generally produced in relatively small quantities, unlike consumer products that are often produced by the millions.

Using the behavior that precedes a product failure, the condition-based maintenance (CBM) program was developed. It leverages the presence of the symptoms that precede a failure to determine whether action is needed. The CBM replaces the routine maintenance program, in which equipment is replaced and actions are completed on a predetermined schedule rather than according to need. In the CBM, equipment remaining usable life is predicted either by embedded predictive algorithms or manually by completing a prognostic analysis using equipment telemetry. Predictive algorithms convert the equipment telemetry, which is performance data, into a measurement of usable life, thus allowing equipment performance to be measured and confirmed and remaining usable life to be known simultaneously.
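The data-driven approach described above can be sketched in a few lines of Python: learn a baseline from factory-test telemetry, flag in-use samples that depart from it, and apply a condition-based rule. This is a minimal illustration under stated assumptions, not the algorithm used on the F-35 or any fielded system. The function names, the 4-sigma threshold, the two-event rule and the voltage values are all hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(test_samples):
    """Summarize normal behavior from factory-test telemetry as (mean, std)."""
    return mean(test_samples), stdev(test_samples)


def flag_transients(samples, baseline, k=4.0):
    """Return indices of samples more than k standard deviations from baseline.

    The k-sigma rule is an assumed stand-in for whatever detector a real
    predictive algorithm would use to find transient events.
    """
    mu, sigma = baseline
    return [i for i, x in enumerate(samples) if abs(x - mu) > k * sigma]


def maintenance_action(flagged, min_events=2):
    """Condition-based rule: act only when enough precursor events are seen."""
    return "schedule replacement" if len(flagged) >= min_events else "no action"


# Hypothetical bus-voltage telemetry in volts (made-up values).
factory = [28.0, 28.1, 27.9, 28.0, 28.2, 27.8, 28.1, 27.9]
in_use = [28.0, 28.1, 29.6, 27.9, 28.0, 26.3, 28.1]

flagged = flag_transients(in_use, learn_baseline(factory))
print(flagged, maintenance_action(flagged))
```

The point of the sketch is the contrast with schedule-based maintenance: the action depends on what the telemetry shows, not on elapsed operating hours.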
Figure 6. The Number of (Unclassified) Military Satellites Launched (blue) and the Number of (Unclassified) Military Satellites that Failed Prematurely (red) from 1959 that did not use Intelligent, Self-Prognostic Equipment (Futron Corp).

IV. The Reliability-Centered Space System

With a few exceptions, preventive maintenance has been considered the most advanced and effective maintenance available for use by many organizations. A Preventive Maintenance program (PMP) is based on the assumption of a "fundamental cause-and-effect relationship between scheduled maintenance and operating reliability." This assumption was based on the intuitive belief that as mechanical parts wear out, the reliability of any equipment is related to its operating age. It therefore followed that the more frequently equipment was repaired or overhauled, the better protected it was against the likelihood of failure. The problem that drove the cost was determining what age was necessary to assure reliable operations.

Figure 7. The Intelligent, Self-Prognostic, Reliability-Centered Army/Rockwell Collins Shadow UAV using Embedded Predictive Algorithms from its PHM Program to Measure On-Board Equipment Remaining Usable Life for Down-Linking in Real-Time to Shadow Maintenance Crew.
Some experts reached the conclusion that a maintenance policy based exclusively on a maximum operating duration would have little or no effect on the failure rate, no matter what the limit. In separate independent studies, it was found that a difference existed between the perceived and the intrinsic design life for the majority of equipment and components. It was discovered that in many systems, some equipment greatly exceeded the perceived or stated design life, while other equipment failed prematurely.

Reliability-Centered Maintenance (RCM) programs provide the optimum mix of reactive, time- or interval-based, condition-based and proactive maintenance practices. The application of each strategy is shown in Fig. 8. These principal maintenance strategies, rather than being applied independently, are integrated to take advantage of their respective strengths in order to maximize facility and equipment reliability while minimizing life-cycle costs.

Figure 8. Components of a Reliability Centered Maintenance Program.

RCM includes reactive, time-based, condition-based and proactive tasks. In addition, a user should quantify the system's boundaries and performance envelopes, system and equipment functions, functional failures and failure modes, all of which are critical components of the RCM program. Preventive Maintenance (PM) programs assume that failure probabilities can be determined statistically for individual machines and components, which is not true, and that parts can be replaced or adjustments performed in time to preclude failure. For example, a common practice has been to replace or renew bearings after a set number of operating hours, assuming that the bearing failure rate increases with time in service.
Fortunately, advances using a prognostic and health management (PHM) program have made it possible in many cases to identify the precursors of failure, quantify equipment condition and schedule the appropriate replacement with a higher degree of confidence than was possible with strictly interval-based maintenance, which relies on usually erroneous estimates of when a component might fail. It has been discovered that there are many different equipment failure characteristics, only a small number of which are age- or use-related. This new knowledge has increased the emphasis on Condition Monitoring (CM), often referred to as Condition-Based Maintenance (CBM), which has reduced the reliance on time-based preventive maintenance.

It should not be inferred from the above that all interval-based maintenance should be replaced by condition-based maintenance. In fact, interval-based maintenance is appropriate where abrasive, erosive or corrosive wear takes place, where material properties change due to fatigue, embrittlement, etc. and/or where a clear correlation between age and functional reliability exists. In addition, for those systems or components where no failure consequences exist in terms of mission, environment, safety, security or Life-Cycle Cost (LCC), maintenance should not be performed, i.e., the equipment should be run to failure and replaced.
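The selection rules in the preceding paragraphs (run to failure when a failure has no consequence, interval-based maintenance where wear or age governs, condition-based maintenance where precursors can be observed) can be written as a small decision function. The Python sketch below is only one possible reading of that text. The class and field names, the order of the checks and the fallback to a proactive task are assumptions made for illustration, not a reproduction of any published RCM logic tree.

```python
from dataclasses import dataclass


@dataclass
class Equipment:
    """Hypothetical summary of the facts an RCM review would establish."""
    name: str
    failure_has_consequence: bool  # mission, environment, safety, security or LCC
    age_related_wear: bool         # abrasive/erosive/corrosive wear, fatigue, etc.
    precursors_observable: bool    # failure precursors visible in performance data


def select_strategy(item: Equipment) -> str:
    """Pick a maintenance strategy using the rules stated in the text."""
    if not item.failure_has_consequence:
        return "run to failure"
    if item.age_related_wear:
        return "interval-based maintenance"
    if item.precursors_observable:
        return "condition-based maintenance"
    # Assumed fallback: a proactive task such as a design change.
    return "proactive task"


# Hypothetical examples.
for item in (
    Equipment("cabin lamp", False, False, False),
    Equipment("landing gear tire", True, True, False),
    Equipment("power converter", True, False, True),
):
    print(item.name, "->", select_strategy(item))
```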
Figure 9. An Example of a Reliability Centered Maintenance (RCM) Logic Tree

The concept of RCM has been adopted across many government and industry operations as a strategy for performing maintenance. RCM applies maintenance strategies based on the consequence and cost of failure. In addition, RCM seeks to minimize maintenance and improve reliability throughout the life cycle by using proactive techniques such as improved design specifications, integration of condition monitoring in the commissioning process and the Age Exploration (AE) process.

V. Using a Reliability Centered, CBM with the Air Force's Reusable Space Booster

The Air Force is developing a reusable space booster (RSB) to replace the EELV, which has exceeded all worst-case cost projections for launching NASA and military satellites and is being reviewed for termination. The performance of the RSB has been defined as that of the EELV, offering the Air Force an opportunity to design a reliability-centered RSB rather than an exotic performance-based RSB. The RSB is being studied by several agencies within the Air Force that hope to use vehicle Isp performance as the RSB design driver rather than reliability, maintainability, serviceability and supportability.

Figure 10 shows the classified Air Force/Boeing/NASA X-37, an unmanned spacecraft that has been launched several times to obtain flight and handling characteristics data for the future reusable space booster at a Mach 25 reentry velocity. The hope is to obtain information that will allow the Air Force to quantify flight behavior for inclusion in the contractor procurement documents for the RSB. The next launch of the X-37 is planned for October 2012 using an Atlas V 501.
Figure 10. The Classified Boeing/NASA/Air Force 30 ft. X-37, Human Rated Reusable Spacecraft without Intelligent, Self-Prognostic Equipment, Launched to Obtain Long-Duration, Automated Space Flight Data, Reentry Flight Characteristic Data and Automated Payload Deployment Data.

Figure 11. An Artist Concept for the Air Force's $25B Unmanned Reusable Space Booster Maintained Like the F-35 JSF Employing Intelligent, Self-Prognostic Equipment throughout with a System Availability and Supportability Exceeding the EELV, Costing 75% Less per Launch than the EELV.

Only the RSB availability requirement has been defined by the Air Force, with the payload lift capability to LEO defined as the same as that of the EELV. Studying extreme payload lift performance options is therefore not necessary for the RSB, but it will probably be done because the design engineers are performance based and not reliability/availability based. The RSB will function as an aircraft during a brief portion of its flight and as a space booster to get its payload to space. As a reusable aircraft, the RSB can benefit from the same logistical program adopted for the Air Force's F-35 Joint Strike Fighter, in which a condition-based maintenance (CBM) program, also known as a predictive maintenance program (PMP), decreased the life-cycle cost by 50% compared with all previous jet fighter aircraft. The CBM program is being back-fitted on existing Navy and Air Force fighters that use a routine-based maintenance program, in order to lower their life-cycle cost. The F-35 won funding during peacetime with no superpower enemy defined. It did so because of the low life-cycle cost resulting from the CBM. All future manned and unmanned fighter aircraft will use the CBM.

Autonomic Logistics (AL) is a seamless, embedded solution that integrates current performance, operational parameters, current configuration, scheduled upgrades and maintenance, component history, predictive diagnostics
(prognostics) and health management, and service support based on the CBM program used on the Air Force's F-35. Essentially, AL performs efficient behind-the-scenes monitoring, maintenance and prognostics to support the aircraft and ensure its continued good health.

Commonality: Commonality is the key to affordability on the assembly line; in shared platforms; in common space systems that enhance maintenance, field support and service interoperability; and in almost 100 percent commonality of the avionics suite. Component commonality across all three variants reduces unique spares requirements and the logistics footprint. In addition to reduced flyaway costs, the CBM is designed to integrate new technologies easily during its entire life cycle.

Figure 12. The Intelligent, Prognostic-Based, Reliability Centered $130M Air Force F-35 Joint Strike Fighter Designed to use a Condition-Based Maintenance Program Employing Embedded Predictive Algorithms in all Electrical and Mechanical Equipment Lowering Life Cycle Cost by 50% with Equipment that does not Fail Prematurely.

After over-procuring several hundred of each of the highly unreliable ICBMs in the 1950s, the Air Force was directed by the U.S. government to use the highly unreliable and expensive Atlas, Delta (Thor) and Titan ICBMs to launch U.S. military and government payloads. They initially achieved space mission reliability as low as 40%. Although each ICBM/launch vehicle has been updated and upgraded many times, the basic Atlas and Delta ICBMs are still the dominant U.S. launch vehicles. In the 1960s, the companies that readied the Atlas, Delta and Titan for launch expanded their customer base to include NASA and civil payloads. This was approved by Congress to decrease the cost of launching military satellites by spreading the non-recurring costs of launching each ICBM over many more payloads. In the mid 1970s, the U.S.
and international launch vehicle markets were dominated by Atlas and Delta, and the companies behind them became unresponsive to the international, and specifically the European, satellite customer base. The Ariane launch vehicle was developed by a French company called Arianespace and it was dedicated to supporting the European military and civil satellite market.

The RSB can benefit from a CBM by exploiting the key elements of the CBM: affordability, survivability, maintainability and supportability. This is done by enhancing flight safety, increasing system availability, eliminating false alarms such as cannot-duplicate (CND) and retest-OK (RTOK) events during maintenance, and reducing life cycle costs.

Autonomic Logistics (AL): The CBM autonomic logistics system monitors the health of the aircraft systems in flight, downlinks that information to the ground, and triggers personnel, equipment and parts to be pre-positioned for quick turnaround of the aircraft. The AL is a natural evolution of legacy diagnostic capabilities coupled with the added functions, capabilities, and benefits offered by new space-flight-proven technologies. Ultimately, this automated approach results in the higher launch rates necessary to support planned scheduled flight rates and increases in military space missions without any improvements. Autonomic logistics is also something of a mind reader. Through a system called prognostics and health management, computers use accumulated data to keep track of when a part is predicted to fail. With this aid, maintainers can fix or replace a part before it fails and keep
the aircraft ready to fly. Like the rest of the program, the autonomic logistics system is on a fast track. It has to be available to support the air vehicle during operational test and evaluation. Because logistics support accounts for two-thirds of a reusable booster's life cycle cost, a reusable space booster will achieve unprecedented levels of reliability and maintainability, combined with a highly responsive support and training system linked with the latest in logistical information technology. The spacecraft will be ready for flight anytime and anywhere. 15

The F-35 is designed to reduce operational and support costs by increasing reliability and reducing required maintenance. Such high reliability enables rapid deployment with minimum support equipment. The cost to operate and maintain the F-35 is 50 percent less than that for the aircraft it replaced. For decades, the concept of repairing new aircraft came only after the aircraft was built. Then, it had to conform to an existing logistics structure. The JSF autonomic logistics system is built concurrently with the air vehicle so that it performs with a level of information accuracy, best value, and total life cycle cost from the start.

VI. What is a Prognostic & Health Management Program? 8

PHM is an enhanced diagnostic process for determining the ability of a component to perform its function(s); it allows a high degree of fault detection and fault isolation capability with a very low false alarm rate. It leverages the presence of accelerated aging in equipment performance data, including analog telemetry, to identify the equipment that will pass ATP and fail prematurely after arriving in space. Prognostics measures actual material condition or state of health, which includes predicting and determining the remaining useful life and remaining performance life of components.
Health management systems make intelligent, informed, appropriate decisions about maintenance and logistics actions based on diagnostics/prognostics information, available resources and operational demand.

VII. Goals of PHM

Enhance Space Vehicle Safety, Reliability and Availability - The engineering and management decisions for preventing a failure are different from those for reacting to a failure after one occurs.
Reduce Maintenance Manpower, Spares & Repair Costs - A 50% reduction in cost can be expected when adopting a CBM over a routine maintenance program, as achieved on the Air Force's F-35 JSF.
Maximize Lead Time for Maintenance & Parts Procurement - The equipment informs operational personnel when it needs replacing and when it will fail.
Eliminate Scheduled Inspections and Enable CBM - Actions are needed only when the equipment informs personnel. Opportunistic maintenance reduces aircraft (A/C) down time.
Provide Real-Time Notification & Health Reporting - Informs ground personnel only of what needs to be known immediately, downlinks information and answers in flight, and informs maintenance and the auto-log of the rest.
Aid in Decision Making & Resource Management - The equipment, rather than a replacement schedule, is the source of information for determining what equipment needs to be replaced and when, so that only equipment that needs replacing is replaced.
Reduce Life Cycle Costs - Personnel are deployed and actions taken only when needed, based on the intelligent equipment.
Eliminate CNDs & RTOKs - Non-repeatable transient events are tied to equipment end of life and not systemic noise.
Detect Incipient Faults & Monitor until Just Prior to Failure - Equipment change-out can be conducted at the time of failure or prior to the failure to manage failures to a positive conclusion.
Identify Potentially Catastrophic Failures Weeks/Months Before They Occur - Allows time for contingency procedures to be developed and rehearsed if necessary, stopping surprise equipment failures that may increase the risk of total mission failure because of unexpected equipment responses that may remain hidden while recovery procedures are implemented.
Use Limited Specialized Sensors - Uses existing telemetry system sensors and data links.
Take Maximum Advantage of the Smart Digital Spacecraft - Allows leveraging the digital data communications systems.
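The real-time notification and lead-time goals above can be sketched as a simple triage step: reports whose predicted remaining usable life falls within an urgency window are downlinked immediately, and the rest are logged for maintenance planning. This is only an illustration. The `HealthReport` type, the `triage_reports` function, the equipment names and the 30-day window are hypothetical and do not come from the F-35 or RSB programs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class HealthReport:
    """Hypothetical self-prognostic report from one piece of equipment."""
    equipment_id: str
    remaining_life_days: float  # predicted remaining usable life


def triage_reports(
    reports: Sequence[HealthReport], urgent_within_days: float
) -> Tuple[List[HealthReport], List[HealthReport]]:
    """Split reports into (downlink now, log for maintenance planning).

    Urgent reports are sorted so the shortest remaining life comes first.
    """
    urgent = sorted(
        (r for r in reports if r.remaining_life_days <= urgent_within_days),
        key=lambda r: r.remaining_life_days,
    )
    deferred = [r for r in reports if r.remaining_life_days > urgent_within_days]
    return urgent, deferred


# Illustrative values only; the 30-day window is an assumption.
reports = [
    HealthReport("battery-2", 400.0),
    HealthReport("gyro-1", 12.0),
    HealthReport("transmitter-a", 25.0),
]
urgent, deferred = triage_reports(reports, urgent_within_days=30.0)
print("Downlink now:", [r.equipment_id for r in urgent])
print("Log for planning:", [r.equipment_id for r in deferred])
```

Sorting the urgent list by remaining life gives ground personnel the longest possible lead time on the item closest to failure, which is the point of the lead-time goal.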
VIII. PHM Constituent Functions and Processes

Fault Detection - Identifies that equipment will experience a disruption of service sometime in the near future.
Fault Isolation - Identifies which equipment will experience a disruption of service in the near future.
Advanced Diagnostics - Identifies the equipment with at least one part suffering from accelerated aging.
Predictive Diagnostics/Remaining Useful Life Predictions - Predicts the remaining usable life of equipment with at least one part suffering from accelerated aging.
Component Life Tracking - Allows the parts that experience accelerated aging to be identified and tracked.
Performance Degradation Trending - Trends equipment analog telemetry to ensure that normal aging occurs.
False Alarm Mitigation - Using only flight-proven predictive algorithms means that there will be no false alarms.
Warranty Guarantee Tracking - Provides the data enabling new business practices such as eliminating product/equipment warranty.
Selective Fault Reporting - Tells ground personnel only what needs to be known and what action must be taken immediately; aids in decision making and resource management.
Fault Accommodation and Possible Reconfiguration - Automatic redundancy switching when necessary.
Information Management - Right information to the right people at the right time.

IX. Maintenance Programs

The goal of maintenance is to avoid or mitigate the consequences of equipment failure. This may be done by preventing the failure before it occurs. Preventive maintenance is designed to preserve and restore equipment reliability by replacing worn components before they fail. Preventive maintenance activities include partial or complete overhauls at specified periods, oil changes, lubrication and so on. In addition, workers can record equipment deterioration so they know to replace or repair worn parts before they cause system failure. The ideal preventive maintenance program would prevent all equipment failure before it occurs.
Reactive Maintenance Program - Maintenance is performed only after a machine fails or experiences problems.

Preventive Maintenance Program - Preventive maintenance can be described as the maintenance of equipment or systems before faults occur. It can be divided into planned maintenance and condition-based maintenance. The main difference between the two is how the time at which maintenance should be performed is determined. While preventive maintenance is generally considered worthwhile, there are risks such as equipment failure or human error involved when performing preventive maintenance, just as in any maintenance operation. Preventive maintenance as scheduled overhaul or scheduled replacement provides two of the three proactive failure management policies available. Preventive maintenance is conducted to keep equipment working and/or extend the life of the equipment, while corrective maintenance, sometimes called "repair," is conducted to get equipment working again.

Predictive Maintenance Program - Predictive maintenance techniques help determine the condition of in-service equipment to predict when maintenance should be performed. This approach offers cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. The main value of predictive maintenance is to allow convenient scheduling of corrective maintenance and to prevent unexpected equipment failures. The key is "the right information at the right time." By knowing which equipment needs maintenance, maintenance work can be better planned (spare parts, people etc.) and what would have been "unplanned stops" are transformed into shorter and fewer "planned stops," thus increasing plant availability. Other advantages include increased equipment lifetime, increased safety, fewer surprise accidents with negative impact, and optimized spare parts handling.
Condition-based maintenance attempts to evaluate the condition of equipment by performing periodic or continuous (online) equipment condition monitoring. The ultimate goal of predictive maintenance is to perform maintenance at a scheduled point in time when the maintenance activity is most cost-effective and before the equipment's performance degrades past an acceptable threshold.
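The idea of scheduling maintenance before performance crosses a threshold can be illustrated with a minimal sketch: fit a straight line to a slowly degrading telemetry parameter and extrapolate to the time at which the fitted trend reaches the threshold. The linear degradation model, the function name `estimate_time_to_threshold`, the sample values and the threshold are all assumptions made for illustration; they are not the predictive algorithms used on the F-35 or described in this paper.

```python
from typing import Optional, Sequence


def estimate_time_to_threshold(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Extrapolate a least-squares line to the threshold.

    Returns the time remaining after the last sample until the fitted trend
    reaches `threshold`, or None if the trend is flat or moving away from it.
    Assumes the parameter has not yet crossed the threshold.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    if slope == 0:
        return None  # no trend, so no predicted crossing
    fitted_now = intercept + slope * times[-1]
    remaining = (threshold - fitted_now) / slope
    return remaining if remaining >= 0 else None


# Illustrative values only: a bus voltage sampled every 10 days.
days = [0.0, 10.0, 20.0, 30.0]
volts = [28.0, 27.5, 27.0, 26.5]
remaining_days = estimate_time_to_threshold(days, volts, threshold=25.0)
print("Estimated days until threshold:", remaining_days)
```

A straight-line fit is the simplest possible trend model; real degradation is often nonlinear, so an estimate like this is best treated as a planning aid that is refreshed as new telemetry arrives.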
Figure 13. The Relative Cost of Different Maintenance Programs. (Appleby Reliability, 2012) 16

Condition-based maintenance is in contrast to time- and/or operation-count-based maintenance, where a piece of equipment is maintained whether it needs it or not. Time-based maintenance is labor intensive, ineffective in identifying problems that develop between scheduled inspections, and is not cost-effective. The "predictive" component of predictive maintenance stems from the goal of predicting the future trend of the equipment's condition. This approach uses principles of statistical process control to determine at what point in the future maintenance activities will be appropriate. Most predictive analysis is performed while equipment is in service, thereby minimizing disruption of normal system operations.

Reliability-Centered Maintenance Program - Emphasizes the use of predictive maintenance techniques in addition to traditional preventive measures. When properly implemented, RCM provides the tools for achieving the lowest asset Net Present Costs (NPC) for a given level of performance and risk. One area that is often overlooked is how to transfer the predictive maintenance data efficiently to a Computerized Maintenance Management System (CMMS), so that the equipment condition data is sent to the right equipment object in the CMMS in order to trigger maintenance planning, execution and reporting. Unless this is achieved, the solution is of limited value, at least if the predictive maintenance solution is implemented on a medium to large system with tens of thousands of pieces of equipment.

X. The RCM Principles

RCM is Function Oriented - RCM preserves system or equipment function, not just operability for operability's sake. Redundancy of function, through multiple pieces of equipment, improves functional reliability but increases life cycle cost in terms of procurement and operating costs.
RCM is System Focused - RCM is more concerned with maintaining system functionality than with individual component function.
RCM treats failure statistics in an actuarial manner. The relationship between operating age and the failures experienced is important. RCM is not overly concerned with simple failure rate; it seeks to know the conditional probability of failure at specific ages (the probability that failure will occur in each given operating age bracket).
RCM Acknowledges Design Limitations - The RCM objective is to maintain the inherent availability of the equipment, recognizing that changes in inherent reliability are the province of design rather than of maintenance. Maintenance can, at best, only achieve and maintain the level of reliability for equipment that was provided for by design. However, RCM recognizes that maintenance feedback from the PHM improves on the original design. In addition, RCM recognizes that a difference often exists between the calculated design life and the actual design life and addresses this through the Age Exploration (AE) process.
RCM is Driven by Safety, Security and Economics - Safety and security must be ensured at any cost; cost-effectiveness then becomes a criterion.
RCM Defines Failure as "Any Unsatisfactory Condition" - Therefore, failure may be either a loss of function (operation ceases) or a loss of acceptable quality (operation continues but impacts quality).
RCM must address the failure mode and consider the failure mode characteristics. RCM tasks must be effective in reducing the number of failures while remaining cost-effective.
RCM Acknowledges Three Types of Maintenance Tasks - Time-directed (preventive maintenance), condition-directed (CM) and failure-finding (one of several aspects of proactive maintenance). Time-directed tasks are scheduled when appropriate. Condition-directed tasks are performed when conditions indicate they are needed. Failure-finding tasks detect hidden functions that have failed without giving evidence of pending failure. Additionally, performing no maintenance, Run-to-Failure, is a conscious decision and is acceptable for some equipment.
RCM is a Learning System - RCM gathers and remembers data from the results achieved and feeds this data back to improve design and future maintenance. This feedback is an important part of the proactive/preventive maintenance element of the RCM program.

XI. Space Vehicle Design Drivers

Although rocket scientists and engineers can design performance-based systems, they are not trained to design reliability-centered systems. The design drivers for a space launch system generally include the maximum payload size and weight, and system reliability and availability through supportability. Increases in reliability, quantified using probabilistic reliability analysis, are made by the reliability engineer, who selects more expensive, better-screened parts (a.k.a. space qualified) and more equipment for more redundancy.
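The trade between redundancy and cost mentioned above can be made concrete with the textbook formula for units in parallel, where a function is lost only if every redundant unit fails: R_sys = 1 - (1 - R)^n. The sketch below assumes identical units that fail independently, which is a simplifying assumption and not a claim about any particular vehicle; the unit reliability and relative cost are illustrative values.

```python
def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n identical, independent units in parallel.

    The function survives as long as at least one unit survives.
    Independence of failures is an assumption of this sketch.
    """
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("unit_reliability must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return 1.0 - (1.0 - unit_reliability) ** n_units


UNIT_RELIABILITY = 0.90  # assumed, for illustration only
UNIT_COST = 1.0          # relative procurement cost per unit (assumed)

for n in range(1, 4):
    r = parallel_reliability(UNIT_RELIABILITY, n)
    print(f"{n} unit(s): reliability {r:.3f}, relative cost {n * UNIT_COST:.1f}")
```

Under these assumptions each added unit buys a smaller reliability gain (0.900, 0.990, 0.999) while procurement cost, and on a launch vehicle mass, grows linearly.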
The only requirements defined for the Air Force RSB have been the payload capability to match the EELV, vehicle availability and reliability.

Figure 14. An Artist's Concept for the Congressionally Mandated Space Launch System (SLS) for Launching up to 6 Astronauts to Mars, Managed by the Marshall Space Flight Center, that does not Require Contractors to Identify the Equipment that will Fail Prematurely for Replacement before Launch, thus Increasing Risk and Reducing Safety for the On-Board Astronauts.
Quantifying the turnaround time for the subsequent launch will be important to the designers. The CBM allows the equipment to self-prognose so that only the equipment that needs to be replaced is identified, ordered and replaced, and this information can be made available to the maintenance personnel via the on-board telemetry system before the RSB returns for landing. Although the weight of the payload forces the RSB to be launched in a vertical position to minimize the weight and size of the RSB landing gear, much as the NASA Space Shuttle was, the RSB will be able to utilize the CBM. A CBM allows the fastest turnaround, similar to the turnaround time for an Air Force F-35 JSF, meeting the expected high availability requirement for the RSB.

XII. The Game Changing Technology using Proprietary Predictive Algorithms used on NASA, Air Force and Commercial Satellites, Missiles and Launch Vehicles

A predictive algorithm includes a series of actions, resulting in a scientific analysis, made by personnel trained to identify the presence of accelerated aging that is related to equipment end-of-life. Accelerated aging is caused by at least one part that is decreasing in performance faster than all other parts and eventually causes non-repeatable transient events (NRTE) in analog telemetry or performance data of any type. In the past, when processing speeds were low and software used in test caused glitches that resulted in an NRTE, all NRTEs were misattributed to test equipment or communications resources and ignored. With processing speeds in GHz and telemetry remaining in Hz and kHz, any NRTE is caused by the equipment or product under test. The behavior in telemetry that is illustrated by predictive algorithms has not been identifiable using an engineering analysis. Using predictive algorithms as part of a prognostic and health management program will prevent surprise and premature failures from occurring.
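As a rough illustration of what flagging a transient event in analog telemetry can look like, the sketch below marks samples that deviate sharply from the median of the preceding samples. This is a generic robust-outlier check (rolling median with median absolute deviation), swapped in for illustration; it is not the proprietary predictive algorithms discussed in this paper. The function name `find_transients`, the window length, the multiplier `k`, the noise floor `min_scale` and the sample data are all assumptions.

```python
from statistics import median
from typing import List, Sequence


def find_transients(
    samples: Sequence[float],
    window: int = 5,
    k: float = 6.0,
    min_scale: float = 0.05,
) -> List[int]:
    """Return indices of samples that look like isolated transients.

    A sample is flagged when it differs from the median of the previous
    `window` samples by more than k times that window's median absolute
    deviation (MAD). `min_scale` is a floor on the MAD so that very quiet
    telemetry does not trigger on ordinary quantization noise.
    """
    flagged: List[int] = []
    for i in range(window, len(samples)):
        history = list(samples[i - window:i])
        baseline = median(history)
        mad = median([abs(x - baseline) for x in history])
        scale = max(mad, min_scale)
        if abs(samples[i] - baseline) > k * scale:
            flagged.append(i)
    return flagged


# Illustrative telemetry: a steady reading near 5.0 with one spike.
telemetry = [5.0, 5.1, 5.0, 4.9, 5.0, 5.1, 7.5, 5.0, 5.1, 4.9, 5.0]
print("Transient sample indices:", find_transients(telemetry))
```

Using the median rather than the mean keeps a single spike from distorting the baseline for the samples that follow it. Relating a flagged transient to equipment end of life is a separate analysis step that this sketch does not attempt.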
Using diagnostic technology, personnel are trained to react with an engineering analysis called a failure analysis after a failure occurs. An engineering analysis allows speculation and conjecture when insufficient engineering data is available for a more conclusive result, thus often associating the failure with the wrong cause.

Figure 15. The Relationship between Time-Series Diagnostic Data, Diagnostic Analysis Results (Diagnostics), Prognostic Data, Prognostic Analysis Results (Prognosis) and Prednostic Data and Prednostic Analysis Results (Remaining Usable Life).

The tools in predictive diagnostics identify the presence of accelerated aging from at least one part whose performance has degraded sufficiently to cause a non-repeatable transient event (NRTE). After processing with predictive algorithms, the NRTE is visible in equipment telemetry or performance data of any type and is related to equipment end of life with certainty. The equipment that will fail prematurely is identified for replacement by using a scientific analysis, thus preventing a failure from occurring. Changing the paradigm from one of reaction to prevention requires training in completing a scientific analysis. The academic and on-the-job experience necessary to complete the scientific analysis is acquired with special training. Proprietary predictive algorithms simply relate past equipment non-repeatable transient events that are identifiable in equipment engineering test data
with equipment end of life. The relationship of time-series data (telemetry), prognostics (predictive diagnostics) and prednostics (determining remaining usable life) is as follows. Figure 15 defines the relationship between time-series diagnostic data, such as equipment telemetry (performance information), and equipment remaining usable life. The analyze operator is the predictive algorithms. In predictive diagnostics, the analysis of diagnostic data/analog telemetry results in a diagnostic analysis (current technology). The analysis of the results from the diagnostic data yields prognostic data. The analysis of the results from the prognostic data yields prednostic data, which is the equipment remaining usable life. In a prognostic or scientific analysis, the analysis operator is accomplished by the predictive algorithms.

Table 2. An Example of a List of Proprietary Dynamic, Data-Driven Predictive Algorithms for Measuring Spacecraft Equipment Usable Life in Days/Weeks/Months, used Successfully on Commercial, Military and NASA Spacecraft and Launch Vehicles and Pioneered on the Air Force GPS Block I Satellites. 10

Number  Algorithm Name                Purpose of Algorithm
1       Baseline Analysis             Identifies short- and long-term normal data behavior
2       Change Analysis               Determines change from normal behavior
3       Comparison Analysis           Determines when a change in normal behavior is occurring
4       Day of Failure                Searches large data sets for common behavior during the same time
5       *Digital Processing           Replaces outliers, improving image accuracy and resolution
6       Discrimination Analysis       Identifies behavior that has changed from normal behavior
7       Mathematical Modeling         Generates normal behavior from inadequate data
8       Multi-Variant Limit Analysis  Simultaneous analysis across several variables
9       Rate Change Analysis          Identifies magnitude of change of behavior
10      Remaining Usable Life         Determines remaining usable life
11      Statistical Sampling          Reduces amount of data without eliminating desired behavior
12      State Change Analysis         Identifies data to be evaluated
13      Super Impositioning           Identifies data to be analyzed further for failure signature
14      *Super Precision              Improves data integrity
15      *Telemetry Authentication     Improves data integrity
16      Virtual Telemetry             Creates normal data behavior when none is available
17      Data Integration              Creates image for analysis
18      Dataset Generation            Creates data set manually when access is not available

*Algorithms necessary only when RF or noise from communications equipment may be present

XIII. Conclusion

Space systems logistical and supportability programs were originally developed for expendable systems such as the Atlas, Titan and Delta (THOR) launch vehicles. These logistical systems are expensive and complex and must provide large quantities of unneeded equipment and goods due to the extremely high number of premature equipment failures that occur during all phases of design, manufacturing, launch readiness and launch. This is true for NASA, commercial and military space systems that utilize the routine maintenance program that was developed to support a performance-based design.
Because (non-aerospace) companies must suffer the financial losses associated with producing equipment and products that fail prematurely, these companies developed the prognostics and health management program to stop the manufacture of products and equipment that fail prematurely. The PHM improves space systems reliability, availability, serviceability and supportability greatly by eliminating premature equipment failures, requiring far less equipment to be shipped and far less equipment to be replaced.
Using the condition-based maintenance program as part of a prognostics and health management program, the mission life of existing and future space systems, defined by the traditional bathtub reliability curve, increases with no change to the equipment design, as illustrated by a hot tub curve, while the cost to produce and maintain space systems greatly decreases. The PHM requires equipment and products to provide real-time telemetry, and many industries are adding telemetry and data acquisition to their products and equipment to stop premature and surprise failures. Aerospace equipment already uses telemetry, so no design changes are needed to benefit from a PHM if the prognostic analysis is done manually. If the algorithms are embedded in the equipment, a redesign will be necessary, but requalification can be by similarity. The tools that allow the logistics programs for space systems to increase availability and supportability result from adopting the reliability-centered program, which includes using model-based and data-driven predictive algorithms to measure space systems equipment remaining usable life at any location, at any time. Equipment with a measured remaining usable life of less than one year will fail prematurely and so should be replaced before launch or, if detected in space, the surprise failure should be managed to a positive conclusion through a variety of available actions. Model-based or data-driven predictive algorithms allow a scientific analysis, which replaces the engineering analysis, to be completed using existing equipment performance data, including equipment analog telemetry, to measure equipment remaining usable life, allowing a condition-based maintenance program to be employed. The engineering analysis allows conjecture and speculation and is thus inadequate for producing equipment that will not fail prematurely.
The CBM identifies the right equipment needed at the right time, thus stopping equipment replacement based on a schedule rather than actual wear-out. The Air Force F-35 JSF adopted the CBM program and decreased the program life cycle cost by 50% over existing fighter programs, winning DOD funding during peacetime with no identified superpower enemy. Using the CBM on existing space systems, either by back-fitting equipment with predictive algorithms or by completing a prognostic analysis manually using personnel trained to complete a scientific analysis, will improve space logistics reliability, maintainability and supportability and decrease space systems life cycle cost by as much as 50%. Using a CBM on existing and new space systems such as the Air Force RSB will allow the Air Force to procure space systems equipment that will not fail prematurely. Such systems will also exceed their mission life, allowing fewer spacecraft to be procured, lowering overall space program cost and space systems budget demands, and eliminating the embarrassing premature failures of NASA, commercial and military space systems that are too important and too expensive to fail.

REFERENCES

1. Military.com, http://air Force Telemetry Satellite Careers/USMilitary_com.mht, December 2011.
2. Hamberg, Otto and Tosney, William, The Effectiveness of Satellite Dynamic Environmental Acceptance Tests, Aerospace Corporation, 1989.
3. Futron Corporation, Military Satellite Reliability, from 1959 to Today, Bethesda, Maryland, 2009.
4. Chang, I-Shih, Aerospace Corporation, Launch Vehicle Reliability, Crosslink, 2001.
5. Robertson, Brent, Stoneking, Eric, Satellite GN&C Anomaly Trends, NASA GSFC, Code 570, Greenbelt MD, paper number AAS 03-071, 2002.
6. Losik, Len, Upgrading the Space Flight Factory Acceptance Testing for Equipment and Space Vehicle Design, Manufacture, Test and Integration, 2009 AIAA Space Conference Proceedings.
7. Losik, Len, Predicting Hardware Failures and Estimating Remaining-Usable-life from Telemetry, SanLen Publishing, Sacramento, CA, 2004, ISBN 978-0-9767491-9-6.
8. Cheng, Shunfeng and Pecht, Michael, Multivariate State Estimation Technique for Remaining Usable Life Prediction of Electronic Products, Association for the Advancement of Artificial Intelligence, CALCE, 2007.
9. Pecht, Michael G., Prognostics and Health Management of Electronics, Wiley Publications, CALCE Electronics Products and Systems, University of Maryland, 2008.
10. Losik, Len, Stopping Launch Pad Delays and Launch Failures, Satellite Infant Mortalities and On-Orbit Satellite Failures Using Telemetry Prognostic Technology, Proceedings from the International Telemetry Conference, Las Vegas NV, October 2007.
11. Data-Driven Predictive Algorithms Users Guide V2.25, Failure Analysis, Capitola CA.
12. http://www.aeronautics.nasa.gov/nra_pdf/ivhm_tech_plan_c1.pdf
13. Frost & Sullivan, Commercial Communications Satellite Bus Reliability Analysis, August 2007.
14. Hess, Andrew, Joint Strike Fighter, Diagnostic, Prognostic and Health Management a Thirty Year Retrospective, presentation, NASA ISHEM Conf., Napa Valley, Oct 2005, Joint Strike Fighter Program Office.
15. Hess, Andrew, The Joint Strike Fighter (JSF), Prognostics and Health Management Program, JSF Program Office presentation, 2006.
16. http://www.applebyreliability.com
17. http://www.wbdg.org/resources/rcm.php?r=wbdg_approach