Reliability-Centered Maintenance: A New Approach



Similar documents
What is Reliability Centered Maintenance? A Brief History of RCM

Reliability centered maintenance: managing cost and risk

2012 Americas School of Mines

Understanding maintenance strategies

Chapter 5 Types of Maintenance Programs

Enhance Production in 6 Steps Using Preventive Maintenance

Increase Equipment Uptime Through Robust Enterprise Asset Management. For the Upstream, Midstream & Downstream Sectors of the Oil & Gas Industry

case study Metro St. Louis Predictive Monitoring Summary Introductory Overview ORGANIZATION: PROJECT NAME:

Airline Fleet Maintenance: Trade-off Analysis of Alternate Aircraft Maintenance Approaches

Utilizing Real-Time Information in Enterprise Asset Management Systems

Better-performing Assets Make Companies

Operations & Maintenance 101 Maintenance Strategies and Work Practices to Reduce Costs

Asset Management 101

Actuarial Processes Delivering Business Performance Improvement Through Business Process Outsourcing

IMPLEMENTING A RELIABILITY CENTERED MAINTENANCE PROGRAM AT NASA'S KENNEDY SPACE CENTER

Chapter 9 Reliability Centered Maintenance

High Availability White Paper

Scheduled Maintenance

Advantages of Outsourced Clinical Engineering Model Outweigh In-House Model. By Richard P. Miller, President and CEO, Virtua Health

Flour Milling Process Analysis Get more out of your production with High Resolution in-line analysis. ProFoss. Dedicated Analytical Solutions

Mozzarella Process Analysis Get more out of your production with High Resolution in-line analysis. ProFoss. Dedicated Analytical Solutions

ENTERPRISE ARCHITECTUE OFFICE

Creating a supply chain control tower in the high-tech industry

AIRCRAFT SERVICES COMPONENT SERVICES ENGINE SERVICES LINE MAINTENANCE TRAINING SERVICES ENGINE PARTS REPAIR SERVICES

IT SERVICE MANAGEMENT: HOW THE SAAS APPROACH DELIVERS MORE VALUE

SAM Benefits Overview

How B2B Customer Self-Service Impacts the Customer and Your Bottom Line. zedsuite

Predictive Maintenance

Optimizing the Data Center for Today s State & Local Government

Asset Vulnerability: The Six Greatest Risks Facing IT Asset Inventory and Management and the Single Automated Solution

Industry Solution. Predictive Asset Analytics at Power Utilities

Modernize your Monitor Pro software with less risk, less cost and less effort

PREMIER SERVICES MAXIMIZE PERFORMANCE AND REDUCE RISK

Autonomous Maintenance

Revitalising your Data Centre by Injecting Cloud Computing Attributes. Ricardo Lamas, Cloud Computing Consulting Architect IBM Australia

EQUIPMENT MANAGER S GUIDE

Obbie Lephauphau Snr Mechanical Engineer Asset Management department Sasol Mining, South Africa

What you need to know about cloud backup: your guide to cost, security and flexibility.

Agile Governance. Thought Leadership

Optimization of Clinical Engineering in Private Healthcare

Capitalizing on The Internet of Things

Smarter grids, cleaner power, and the future of utility asset management

Data Management Practices for Intelligent Asset Management in a Public Water Utility

The Next Generation APC by Schneider Electric InfraStruxure

Predictive Maintenance

Customer Needs Management and Oracle Product Lifecycle Analytics

Condition Based Maintenance:

Service Lifecycle Management Solutions

Federal Data Centers: The Build vs. Buy Decision

2009 NASCIO Recognition Award Nomination State of Georgia

FAC 2.1: Data Center Cooling Simulation. Jacob Harris, Application Engineer Yue Ma, Senior Product Manager

Features. Emerson Solutions for Abnormal Situations

POWER. Your Partners in Availability POWER

Symantec Residency and Managed Services

Operations, Maintenance, and Commissioning

White Paper. Making the case for PPM

Predictive Maintenance in a Mission Critical Environment

SAP Preventive Maintenance The Core and More. Len Harms - Vesta

Program Summary. Criterion 1: Importance to University Mission / Operations. Importance to Mission

PROACTIVE ASSET MANAGEMENT

Bob Steibly, GenesisSolutions

Offshore Wind Farm Layout Design A Systems Engineering Approach. B. J. Gribben, N. Williams, D. Ranford Frazer-Nash Consultancy

City of Richmond Capital Planning and Asset Management

Proactive Asset Management with IIoT and Analytics

I D C T E C H N O L O G Y S P O T L I G H T. F l e x i b l e Capacity: A " Z e r o C a p i t a l " Platform w ith On- P r emise Ad va n t a g e s

Infor Enterprise Asset Management (EAM)

Extend and optimize the life of your plant the modular Life Cycle Service portfolio

It s Time to Save for Retirement. The Benefit of Saving Early and the Cost of Delay

Moving Service Management to SaaS Key Challenges and How Nimsoft Service Desk Helps Address Them

IBM Information Technology Services Global sourcing.

TOP TEN CONSIDERATIONS

FACILITIES MAINTENANCE

Managing business risk

Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R

Qlik UKI Consulting Services Catalogue

Procurement s Role in Plant Reliability and Safety. Reduce Instrumentation Costs Without Sacrificing Reliability and Safety

Hands-on Sessions Each attendee will use a computer to walk-through training material.

Comparison of CMS Preventive Maintenance Regulations

From RCM to CMMS: How to implement the RCM recommendations in EXP and Maximo. Presented October 14, 2009 Chung Yee Keong

FaciliWorks. Desktop CMMS Software. Making Maintenance Manageable

Fluid Automation: The Unsung Contributor to Plant Economic Performance. A Management Brief From ASCO Numatics

CDW Technology Services

Exclusive Stock for total peace of mind. Your private reserve of original spare parts

BEST PRACTICE GUIDE TO MAXIMIZE ROI OF ENERGY MANAGEMENT SYSTEMS. Bob Zak SVP, Facility Solutions Ecova

case study Coverity Maintains Software Integrity of Sun Microsystems Award-Winning Storage Products

ASSIGNING MONETARY VALUES TO PERFORMANCE IMPROVEMENTS

The Leading Reward of Maintenance Management

Aftermarket Services Helping to manage your products post-launch.

Overload Protection in a Dual-Corded Data Center Environment

How to Use Supply Chain Design to Craft Successful M&A Activities

Navigating Among the Clouds. Evaluating Public, Private and Hybrid Cloud Computing Approaches

How Lean Manufacturing Adds Value to PCB Production

Avoiding Revenue Cycle Disasters While Implementing Enterprise Integrated Systems

ASSET MANAGEMENT. a best practices checklist WHAT IS ASSET MANAGEMENT? HERE IS WHAT YOU WILL LEARN:

uptimeplus.co.uk predictive maintenance strategy galileo Supported by

Data Center Audit April 2015

PC & EMBEDDED CONTROL TRENDS

ENTELEC 2002 SCADA SYSTEM PERIODIC MAINTENANCE

Condition Monitoring for Predictive Maintenance

Annual funding statement

Transcription:

Page 1 of 6 Reliability-Centered Maintenance: A New Approach Reducing data center risk and cost By Lee Kirby June 05, 2012 In today s complex data center environment, operators must continuously juggle multiple integrated mechanical, electrical, and data systems to minimize the risk of critical failure. Data center managers traditionally have relied on preventive maintenance (PM) programs to manage risk and keep facilities and systems running. Reliability and performance are the goals, and detailed maintenance routines include tasks such as overhauling a rotary uninterruptible power supply (UPS) or generator and replacing system components on a schedule to prevent failure during live operation. However, some of these standard operating procedures are not optimal and can even introduce more risk into the system. For example, replacing certain non-critical components as part of routine maintenance can actually increase the risk of a critical failure, rather than reduce it. While it may be counterintuitive, data center managers would actually be better served by letting certain components run-to-fail (continue operating in place until the component wears out or fails naturally.)

Page 2 of 6 Reliability-centered maintenance (RCM) is a different approach that uses analysis to determine the appropriate failure management strategy for each component in a system to ensure performance in the most cost effective way. RCM seeks to preserve system function over equipment function, not just operation for operation s sake. This approach can yield a quick return for any data center. THE HISTORY OF RCM RCM was first developed by the aviation industry to manage system-wide maintenance. Airline maintenance is a high-stakes endeavor: in a data center, a critical component failure may cause system downtime; in the airline industry, critical component failures can have potentially life-threatening consequences. Early PM programs were designed based on the assumption that periodic system overhauls are needed to ensure reliability and safety (see figure 1). Replacing components before they had a chance to wear out and fail seemed a prudent approach. However, as maintenance costs began to rise in the 1960s without delivering any corresponding increase in reliability, airlines began to question this assumption. The FAA formed a Maintenance Steering Group (MSG) to research the issue and recommend new approaches. The full findings of the FAA s Maintenance Steering Group were incorporated in a 1968 report titled, MSG-1. The MSG determined that overhaul maintenance is indeed flawed. The overhaul philosophy assumes that most equipment suffers from a bathtub curve (see figure 2): there is an initial break-in period during which risk of failure is high, followed by a fairly long stable operations phase, and lastly a definable wear-out point. In an overhaul, components are removed from the system prior to their defined wear-out point and replaced with new components. Surprisingly, the study found that implementation of overhauls in many cases provided no improvement in safety or reliability. In fact, in some cases, the performance of the system actually worsened after an overhaul. Overhauls increase system risk for two reasons: As maintenance is performed, there is always some potential for the introduction of maintenance (human) error As new components are introduced into the system, a certain percentage will suffer from infant mortality (failure during the initial break-in period) In the airline industry at that time as in the data center industry today overhaul limits were often not determined by analysis of the actual performance parameters of components, leading to high maintenance costs with little benefit. Data center maintenance is all about reliability, but without knowing it, data center managers are introducing risks into their environment by doing things the same way they have always done them. It is counter-intuitive, but sometimes it might be better to not maintain something until it fails, said Gary Lee, president/ceo, Universal Asset Management. In fact, many components have no wear out characteristic; there isn t a universal bathtub curve, but many failure rate profiles for various components (see figure 3). For a component with pronounced wear out, for example, it may be most cost-effective and less risky to replace the component proactively as part of a routine overhaul. But for other components with different failure rate curves, Run-to-fail might be more cost effective. For equipment with a constant or decreasing failure rate, the

Page 3 of 6 overhaul approach means a substantial portion of the useful life of the component population is being sacrificed. RCM RCM was first applied to the Boeing 747. The key to RCM is an analytical process that looks at the reliability of various components and the severity of the consequences if component failure occurs. The consequences are the measurable impact of an equipment failure in the areas of operation, performance, safety, environmental impact, and economics. An organization can evaluate the value of each maintenance task by considering all of these factors. RCM represents a sea change from the way data centers are typically managed today, where the priority is on preventing or minimizing failures. With an RCM approach, however, the goal is not necessarily to prevent all failures, but to manage them cost-effectively, without compromising safety or performance, while meeting mission requirements. RCM reduces the consequences of failures to an acceptable level. The focus is not on component failure, but on function failure. The airline industry and NASA have proven that reliability and costs can improve significantly with an effective reliability-centered maintenance program. The data center industry can learn in a lot of ways from other industries, affording it the opportunity to adopt and implement reliability-centered maintenance as business demands continue to increase in this sector, said Jason Schafer, research manager, Datacenter Technologies, 451 Research. KEY ELEMENTS OF RCM In the context of RCM, there are a multiple maintenance disciplines that, if used in concert, will optimize system performance while minimizing maintenance costs. These are PM actions and corrective maintenance. PM actions and corrective maintenance are taken to preserve functionality, reduce unplanned downtime, and minimize impacts to mission performance. By their nature, PM actions require some level of investment in activities that go beyond simple corrective maintenance such as inspecting, monitoring, refurbishing, or replacing. The RCM process helps organizations evaluate the trade-off between investment in these activities and overall operating costs. Corrective maintenance or run-to-fail (RTF, also referred to as reactive maintenance or repair) responds to failures only after they occur as a result of deterioration in equipment condition from active use. RTF may be the most effective approach for many types of equipment where the consequences of a failure are deemed to be minimal or acceptable. RCM analysis compares the risk and cost of failure against the cost of PM activities that would be required to mitigate or prevent that failure. RCM employs an integrated failure management strategy that helps determine the proper balance between planned (preventive) and run-to-fail approaches. For non-critical system components, for example a single cooling fan in an array of fans, the impact of a failure would be minimal. The RCM approach includes identifying components where investment in redundancy and backup yields a better net cost than investing in PM. For example, if a system is configured so that 30 percent of fans in an array can fail before system performance is compromised, an organization can save time and money by delaying maintenance until 25 percent of fans have failed, rather than replacing each failed fan on an ongoing ad-hoc basis, or engaging in routine fan overhauls before failure occurs. Letting each fan run to the end of its life extracts maximum value from the original equipment investment.

Page 4 of 6 Predictive Testing & Inspection (PT&I). PT&I is a cornerstone of RCM. By understanding the statistical performance and failure parameters of each component, identifying key indicators that are the precursors to failure, and by setting up inspection or sensor regimens to monitor for those precursor signs in critical components, an organization can design the optimal RCM program. Based on ongoing PT&I monitoring, maintenance teams can respond to real-time data to repair or replace equipment when data indicates the condition is deteriorating/at risk of failure. Proactive Maintenance. In environments like data centers where the consequences of failure can vary dramatically, proactive maintenance provides a solid body of evidence-based information and data that help organizations better understand and plan for performance parameters, and make decisions based on sound technical and economic justification. For example, proactive maintenance resources and activities can incorporate equipment specification, failed-part analysis, and reliability engineering. Other Activities. A comprehensive RCM approach also includes actions to address issues that compromise acceptable levels of reliability. These actions can include redesign, procedural changes training, improvements to maintenance manuals, or insertion of new technology in the maintenance or operational environment. For example, figure 4 shows a facility dashboard that integrates multiple factors to help track and predict cooling capacity. BENEFITS OF RCM RCM is an integrated approach: it employs PM, PT&I, run-to-fail, and proactive maintenance techniques to take advantage of each of their respective strengths to ensure facility and equipment operability and efficiency, while providing the required reliability and availability, at the lowest cost. NASA, in its February 2002 Reliability Centered Maintenance Guide for Facilities and Collateral Equipment, explains, RCM seeks the optimal mix of Condition-Based Actions, other Time- or Cycle -Based actions, or a Run-to-Failure approach it is an ongoing process that gathers data from operating systems performance and uses this data to improve design and future maintenance. These maintenance strategies, rather than being applied independently, are integrated to take advantage of their respective strengths in order to optimize facility and equipment operability and efficiency while minimizing life-cycle costs. The benefits of implementing RCM in the data center environment are compelling and have already been proven in the even more demanding maintenance environment of the airline industry. For data center managers, RCM benefits include: Improved Performance. Maintenance organizations can focus on the most critical equipment elements, with shorter work lists reducing extensive and costly shut downs. There are fewer burn-in problems from unnecessary replacements, and more efficient identification of unreliable components. Higher Quality. The discipline of the RCM process results in a better understanding of equipment capacity and capability, a better set of equipment set-up specification and requirements, confirmation or redefinition of equipment-operating procedures, and a clearer definition of maintenance tasks and objectives. Cost Effectiveness. Using an RCM approach dramatically reduces unnecessary routine maintenance, helps prevent or even eliminate expensive failures, and introduces defined

Page 5 of 6 decision guidelines for acquiring new maintenance technology. The NASA RCM Guide states, The flexibility of the RCM approach to maintenance ensures that the proper type of maintenance is performed on equipment when it is needed. Maintenance that is not cost effective is identified and not performed. Life Cycle Costs. RCM reduces life-cycle costs by optimizing maintenance workloads and providing a clearer view of spares and staffing requirements. It also enables organizations to realize a longer useful life of expensive equipment items through the use of condition-based maintenance techniques. The NASA RCM Guide explains, the cost of repair decreases as failures are prevented and preventive maintenance tasks are replaced by condition monitoring. The net effect is a reduction of both repair and a reduction in total maintenance cost. Often energy savings are also realized from the use of PT&I techniques. Enhanced Maintenance Data. RCM provides a better understanding of equipment in its operating context, which in turn leads to more accurate and complete drawings and manuals, and allows maintenance schedules to be more adaptable to changing circumstances. RCM RESULTS Currently, data center managers are unknowingly introducing risks into their environment by following traditional, overhaul-oriented maintenance regimens. The conservative nature of data center operations has stalled the adoption of proven methods from other industries. However, with the right combination of analysis, tools, and procedures, an RCM program can be implemented smoothly to reduce cost and risk. In the data center industry, where 86 percent of downtime is due to infrastructure failure and human error (according to a study commissioned by Sun Microsystems), an effective RCM program can have significant impact. RCM yields results very quickly; most organizations can complete an RCM review on existing equipment and achieve substantial benefits in less than a year. It is also an ideal approach for determining the maintenance requirements of new equipment of all kinds. When applied correctly, it transforms both the maintenance requirements themselves and the way in which the maintenance function as a whole is perceived, said Lee. Most significantly, past experience has demonstrated that data centers can realize substantial savings in maintenance costs. The NASA RCM Guide reports that savings of 30 percent to 50 percent in the annual maintenance budget are often obtained through the introduction of a balanced RCM program. As energy, labor, and operating costs continue to increase in the data center industry, there is a compelling case for the adoption of RCM. Lee Kirby heads Skanska's Strategic Operations Group, which leverages the best practices of the Mission Critical Center of Excellence to ensure maximum returns and optimal performance for data

Page 6 of 6 center facilities. He comes to Skanska with a proven ability to build and run global services and operations organizations. He has successfully led several technology start-ups and turnarounds and has managed world-class global operations. Prior to joining Skanska, he was the key contributor to the growth of the data center operations business at Lee Technologies.