Flexible asset monitoring system improves railway infrastructure availability

F.L.J. van den Bos 1, R. Koopal 2, N.J. Steentjes 1 and T.S. Jones 3

1 Inspectation, VolkerRail Nederland B.V., Utrecht, The Netherlands
2 Asset Management - Infra Systems - Signalling, ProRail B.V., Utrecht, The Netherlands
3 National Instruments, Newbury, UK

Keywords: Analysis, assets, condition, monitoring, real-time

Abstract

Dutch rail infrastructure manager ProRail is tasked with increasing the availability of its network and has set a goal of zero avoidable faults. Because ProRail outsources maintenance of the rail network to contractors, new contracts will be based on performance and the number of faults rather than on unit-based charging. In addition, penalties will be issued for failing to meet the requirements of a contract. Condition-based maintenance is considered the most effective strategy to significantly reduce service-affecting faults, especially when combined with real-time online monitoring to gather condition data on assets. This paper discusses the delivery of an application for Zeeland, a performance-based maintenance contract in the Netherlands. The contractor VolkerRail, in cooperation with ProRail, used real-time monitoring to reduce avoidable faults and improve the availability of the network. The paper describes the flexible asset monitoring system itself, which was developed by VolkerRail's measurement and inspection department Inspectation.

1 Introduction

The Netherlands has the busiest rail network in Europe. In 2012 more than 3.3 million train journeys were made on the Dutch track. The Dutch rail network is owned by the Dutch Government and managed by ProRail. ProRail manages 7,033 kilometres of track, with 2,731 level crossings (of which 1,614 are secured), 7,195 turnouts and 11,683 signals. The mission of ProRail is to make the rail network safer, more reliable, more punctual and more sustainable [1].
The main goal in maintenance is to increase the availability of the network by reducing faults. The ambition is zero avoidable faults [1]. Because ProRail has outsourced the maintenance of the rail network, achieving this goal requires better performance from contractors. To this end, ProRail has changed the way it procures maintenance: new maintenance contracts are performance based instead of unit based. Performance is no longer a matter of best effort but has serious financial consequences. To survive in this new market, contractors must move from corrective and time-based preventive maintenance to condition-based preventive maintenance. Condition Based Maintenance (CBM) is considered the most effective maintenance strategy under a performance contract. An essential part of CBM is gathering condition information on the assets that need to be maintained. Besides inspecting assets periodically, visually or by taking measurements, real time monitoring is also possible.

2 Real time monitoring in performance contract Zeeland

2.1. The network in Zeeland

The geographical area of performance contract Zeeland lies in the provinces of Zeeland (most of the track) and Brabant in the Netherlands, and contains the following main lines:
- Lage Zwaluwe - Roosendaal
- Roosendaal - Vlissingen

Besides the two main lines, the contract also covers the tracks in the industrial areas Sloe and Terneuzen.

2.2. Incentives and applicability

The main incentives for real time monitoring are:
- Reduction of recurring faults: historical status and condition information on assets gives more insight during Root Cause Analysis (RCA).
- Reduction of repair time: real time status and condition information on assets gives mechanics faster insight into the cause of a fault during repair.
- Reduction of faults: trends in the condition information of assets can be used for maintenance planning.
- Improvement of assets: insight into the performance of assets, based on status and condition information, can help engineers to devise system improvements and modifications.

Whether real time monitoring is applicable as a tool for CBM depends on the following criteria:
1. The failure modes, effects and causes identified by Failure Mode and Effect Analysis (FMEA) must be predictable from real time status and condition data.
2. The required status and condition data is already available and accessible within an asset's information or control system, or it can be obtained by measurement with available sensor and data acquisition technology.
3. The business case must be positive, which means that the failures with the highest risk must be measurable at an affordable price.

During the tender for performance contract Zeeland in 2011, VolkerRail investigated these criteria. The overall outcome of the research was positive, and VolkerRail decided to improve reliability and availability based on the potential of real time monitoring. VolkerRail won the contract and has introduced real time monitoring on a large scale. The next sections briefly summarise the outcome of the research.

2.3. Failure mode and effect analysis (FMEA)

Based on the FMEA, several objects are prone to failure, especially the following:
1. Light signals
2. Turnouts (also known as sets of points or switches)
3. Track circuits
4. Level crossings

Further analysis of these objects gave more insight into the failures, their causes and the possible indicators that can provide the necessary status and condition data. A subset of indicators for a turnout is given in Table 1.

| Failure | Cause | Indicators |
|---|---|---|
| No power available to drive the electric motor | Defect in relay; defect in circuit between relay and electric motor | Voltage of the power supply (no voltage or very low voltage); status of relay (the relay that should be active is down instead of up); motor current (no current) |
| Blockage of electric motor or point blades | Frozen point blades | Motor current (current is maximum) or force on point blades (force is too high); track temperature (below 0 °C) |

Table 1: Different failures, causes and indicators for a turnout

2.4. Availability of status and condition data

Many failures are related to signalling.
The most commonly used signalling in the Netherlands dates from the Marshall Plan period after World War Two and consists of relay interlocking. Relay interlocking offers no electronic interface from which status and condition data can be read, so the required data must be measured with additional sensor and data acquisition technology. Based on the outcome of the FMEA, the required sensor technology can be summarised as:
- Voltage transducers
- Current transducers or clamps
- Force sensors
- Temperature sensors
- Potential-free (so-called "dry") contacts on relays

These sensors are all commercially available in a wide variety of types. Unfortunately, it is not always possible to use dry contacts on a relay. In particular, the so-called B-relays do not always have spare dry contacts available for monitoring purposes. It was therefore necessary to develop a special B-relay monitoring unit to determine whether the relay is up or down. This unit was developed in cooperation with the Dutch firm Wesemann.

For deployment of data acquisition technology in the field, the following requirements were determined to be the most important:
- Ruggedness: rail infrastructure is a very harsh electromagnetic environment because of long wires, relay switching, high voltages and motors turning on and off.
- Short time to market: the monitoring system had to be deployed and operational within a year.
- Capability to include various sensors, such as current, voltage, temperature and digital levels.
- Capability to measure at different sample rates, from 2 kS/s up to 30 kS/s.
- Online buffering of sensor data, online data reduction and key parameter determination.
- System synchronisation through GPS: sensors are distributed over different relay houses and cabinets and connected to different DAQ devices, but their measurements must be related to each other.
- One software stack for all DAQ devices that is remotely upgradable and can be configured independently per device.
- 3G connectivity to transfer the data over cellular networks to a central database.
- System health monitoring.
- Automated analysis, alarming and reporting.

After some research on available data acquisition technology, the choice was made to use CompactRIO hardware and LabVIEW system design software. Other data loggers and monitoring systems either did not offer a high enough sample rate or did not have enough buffering capacity to hold the data for later transfer over 3G Universal Mobile Telecommunications System (UMTS) networks. Because systems may need to sample at 5 kS/s, or even up to 30 kS/s, an industrial PC or NI CompactRIO was the only viable solution. CompactRIO is small and rugged and supports a wide range of I/O signals. When combined with the SEA CompactRIO 3G module, it provides GPS synchronisation and UMTS communication. Because of the wide bandwidth of the I/O modules and the harsh EMC environment inside relay houses and cabinets, it was necessary to develop an interface board with input protection circuits for the I/O and a watchdog. Besides the NI C Series modules, the SEA GPS and GPRS modules were used to synchronise systems and to communicate over wireless cellular networks for data transfer and remote upgrades. The interface board and the LabVIEW software were designed in cooperation with National Instruments Alliance Partner INCAA Computer B.V.

2.5. Positive business outcome

To keep the business case positive, it was not possible to monitor all objects, especially those with high installation costs such as in-track force and temperature sensors.
Therefore, locations where indicators are concentrated and installation costs are lower were prioritised. Fortunately, this is the case for relay houses and cabinets.

Real time monitoring is not always the best solution. This was, for example, the case with light signals: the business case showed that it is much better to eliminate the risk of failure by replacing the light bulbs with LEDs. Based on the results of the research, the business case turned out to be positive if the monitoring system can reduce failures resulting in train delay by more than 30%.

2.6. Scope and parameters

Real time monitoring was applied to the following objects:
1. 16 relay houses:
   - Temperature (inside and outside)
   - Power supply voltages (230 Vac, 136 Vdc, 110 Vac and 12 Vdc)
   - Relay status power supply (GDPR, POR and POPR)
2. 71 relay cabinets:
   - Temperature (inside)
3. 200 turnouts:
   - Motor current (ac or dc)
   - Relay status and control (NWZR/RWZR and NWPR/RWPR)
   - Relay status track circuit (TR)
4. 298 track circuits:
   - Local and track current (torque) on TR
5. 55 level crossings:
   - Bell current (dc)
   - Barrier level: 0-6 contact
   - Motor current (dc)
   - Power supply voltage (12 Vdc)
   - Relay status crossing (AR, XR, XGNR and XKTER)
   - Relay status power supply (TG/LG, POR and POPR)

In total, approximately 2000 analogue and 5000 digital signals are continuously monitored.

3 FlexMonitoring: a flexible asset monitoring system

3.1. System architecture

Figure 2: System architecture of FlexMonitoring

One of the key design choices in FlexMonitoring is where data intelligence is extracted from the raw data. Data intelligence determines the status and condition of assets from the measurement data and compares this information with pre-defined settings. When an event occurs, the system sets an alarm or a warning, depending on its urgency. The analysis can be performed in two locations: locally on the DAQ device or centrally on the server. FlexMonitoring centralises data intelligence on the server because of the following advantages:
- Aggregation of data from different DAQ devices and multiple assets.
- All DAQ devices can use the same software; only per-device configuration is necessary.
- Future system extensibility to add other DAQ devices or PLCs.

The centralised data intelligence in FlexMonitoring is implemented as the Event Server Application (ESA), which is discussed in Section 3.5.

3.2. Sensors

The system can work with all commercially available industrial analogue sensors that have a 4-20 mA output signal, and it can supply active sensors with 24 Vdc. The digital inputs on the DAQ device were specially developed for status determination of railway relays, but can also be used with any other relay that has dry contacts. The DAQ device also has dedicated inputs for temperature sensors.

When measuring in signalling circuits, special attention has to be paid to isolation: there must be no galvanic coupling between the sensor and the signalling circuit. This limits the range of sensors that can be used for monitoring in railway infrastructure. Commonly used sensors are the Points Condition Monitoring (PCM) range of current clamps manufactured by LEM.

A newly developed sensor is the B-relay monitoring unit. It determines whether a relay is up or down based on the presence of the magnetic field produced by the relay's coil. Because the magnetic field strength varies between relay types, the unit contains a microcontroller and a memory holding the settings for the different relay types. This sensor is used when there are no free contacts available on a B-relay.

3.3. Data acquisition device

The DAQ devices continuously measure on all active channels. Depending on the defined measurement type, a device captures the data in a specific way (e.g. periodically, on an analogue level trigger or on a digital state change) and performs local preprocessing to reduce data transmission over the limited-bandwidth cellular network. Several watchdog mechanisms record system health and state, including GPS signal strength, memory usage and system temperature. In the unlikely case of an error, the system fails safe and resets either the communication modem or the CompactRIO unit. The DAQ device thus monitors its own performance, a crucial part of a reliable measurement system.

The DAQ devices are available in two types: a small variant (K type) with 20 analogue and 20 digital inputs, and a big variant (G type) with 30 analogue and 60 digital inputs. The G type can be expanded with an extra 30 analogue or 60 digital inputs. Figure 3 shows the hardware architecture of the G type DAQ device.

Figure 3: Hardware architecture of the G type DAQ device

3.4. Measurement data

On the server side, a LabVIEW application receives the measurement data and stores it in a SQL database. This application can also upgrade the DAQ devices and monitors their health. The data in this database can be viewed with the FlexMonitoring Web Viewer, a web-based user interface specially designed for test engineers and service mechanics to analyse assets.

3.5. Event server application (ESA)

Processing of the measurement data is done in the ESA. The ESA combines the signals from various DAQ devices into an event and stores the data in a high-performance SQL database.
This means that the ESA has to know the relation between the measured signals and the assets they belong to, as well as a start trigger for processing the incoming data and which calculations to perform. When processing measurement data, the ESA first identifies the asset, starts a thread to collect all corresponding data and then uses a state machine to detect the end of the event. When the event is complete, the thread is closed and a scenario engine is started to determine what happened during the event. For a turnout this could be, for example, a train passage or a movement of the blades. All scenarios for a turnout are translated into algorithms. These algorithms describe the interpretation of analogue and digital data and are used, for example, to determine whether or not the turnout received a command to move from its normal to its reverse position. The algorithms are built from a library containing functions such as addition, subtraction, multiplication and division, but can also perform more advanced operations such as differentiation, integration and standard deviation.

When the ESA has determined the scenario, for example a train passage, other parameters can be calculated, such as the speed and direction of the train. Other calculations can include the energy dissipated in an electric motor, the peak motor current and the total time needed to move from normal to reverse. Once the calculations are complete, the values are compared with upper and lower thresholds for alarm and warning conditions.

3.6. Web application

The data in the central database can be accessed via the FlexMonitoring web application, which is specially designed for displaying graphs, summarising data, alarm handling and event analysis. Figures 4 and 5 show screenshots of the FlexMonitoring web application. Alarms and warnings can also be sent to users by SMS or e-mail notification.

4 Results

4.1. Examples of automated fault analysis

Figure 4: Example events for turnout 209, emplacement Roosendaal, on 4 October 2013 at 1:50 PM

Figure 4 shows three sequential events within 1 minute at turnout 209 on emplacement Roosendaal. The events were identified automatically by the ESA. The first two events show a deviation from ideal operation and the last one shows normal operation. Both deviations were urgent, so the ESA raised an alarm for each.

The left graph shows the event "Steering normal & turnout not in end position". This means that the points move, but do not reach the exact end position; something could be blocking the points, or the mechanical adjustment could be poor. The ESA selects this event because the 209 NWPR digital line stays low.

The middle graph shows the next event, "Steering reverse". Because turnout 209 does not reach its end position after the command to the normal position, the rail traffic controller sends the turnout back to the reverse position. The middle graph shows that the turnout reaches the end position in reverse (the 209 RWPR digital line is high). For the ESA this event is still an alarm, because there is no fall of the 209 NWPR after the command to reverse, meaning that the turnout was not in control beforehand.

The right graph shows the next event: the rail traffic controller tries a second time to send the turnout to the normal position. This time the points reach the end position and a train can pass. When a rail traffic controller sees that the points of a turnout do not reach their end position, in most cases they will retry, as in this example, and do not register the fault. Without real time monitoring such faults remain undetected until the fault occurs more often and the traffic controller finally registers it. In this case VolkerRail could carry out preventive maintenance on this turnout, preventing any train delay.

Figure 5: Example events for turnout 83A/B on emplacement Roosendaal on 2 October 2013 at around 5:08 AM

Figure 5 shows two sequential events identified by the ESA within 1 minute on turnout 83A/B, emplacement Roosendaal. Both events show a fault condition, and the ESA raised a warning (indicated by the yellow warning sign with an exclamation mark). Both graphs show the event "Turnout in & out normal end position", meaning that the points are not continuously in their end position. This results in a jittering end-position relay, in this case due to bad adjustment. The ESA identified this event because the 83 NWPR digital line falls low while there is no command to the opposite position. This kind of fault is typically undetectable for the train traffic controller.

VolkerRail is also responsible for real time condition monitoring on another contract, Systeemsprong Wissels, located in the provinces of Noord-Holland and Utrecht. This deployment has a more recent version of the FlexMonitoring web application. Examples of this are shown in Figures 6 and 7.
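The scenario engine of Section 3.5 and the events in Figures 4 and 5 can be illustrated with a minimal Python sketch. It assumes a hypothetical `TurnoutEvent` record in which each relay line (NWZR, RWZR, NWPR, RWPR) is a list of 0/1 samples and the motor current is a list of amperes. The rule set, the scenario name for the "not in control before" case, the 0.5 A running threshold, the use of motor running time as a stand-in for throw time, and all limit values are illustrative assumptions, not the ESA's actual algorithms or settings. The sketch shows the order described in the text: determine the scenario from the digital lines first, then calculate key parameters and compare them with lower and upper warning and alarm thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical limit table: name -> (low_alarm, low_warning, high_warning, high_alarm).
Limits = Dict[str, Tuple[float, float, float, float]]


@dataclass
class TurnoutEvent:
    """One turnout event sampled at a fixed rate (hypothetical layout).

    Digital lines hold 1 while the relay is up and 0 while it is down.
    """
    sample_rate_hz: float
    nwzr: List[int]              # control relay: steer to normal
    rwzr: List[int]              # control relay: steer to reverse
    nwpr: List[int]              # detection relay: in normal end position
    rwpr: List[int]              # detection relay: in reverse end position
    motor_current_a: List[float]


def classify(ev: TurnoutEvent) -> Tuple[str, str]:
    """Return (scenario, severity) using simplified rules read from Figures 4 and 5."""
    if any(ev.nwzr) or any(ev.rwzr):
        to_normal = any(ev.nwzr)
        name = "Steering normal" if to_normal else "Steering reverse"
        target = ev.nwpr if to_normal else ev.rwpr    # relay that must come up
        opposite = ev.rwpr if to_normal else ev.nwpr  # relay that must fall
        if target[-1] == 0:
            # The points moved but never reached the commanded end position.
            return name + " & turnout not in end position", "alarm"
        if opposite[0] == 0:
            # No fall of the opposite detection relay: the turnout was
            # not in control before the command.
            return name + " & turnout not in control before", "alarm"
        return name, "ok"
    if ev.nwpr[0] == 1 and 0 in ev.nwpr:
        # NWPR drops without any command: jittering end-position relay.
        return "Turnout in & out normal end position", "warning"
    return "No command", "ok"


def key_parameters(ev: TurnoutEvent, running_threshold_a: float = 0.5) -> Dict[str, float]:
    """Peak motor current and the time the motor draws more than the threshold."""
    running = sum(1 for i in ev.motor_current_a if i > running_threshold_a)
    return {
        "peak_current_a": max(ev.motor_current_a),
        "throw_time_s": running / ev.sample_rate_hz,
    }


def check_limits(params: Dict[str, float], limits: Limits) -> str:
    """Compare each parameter with its lower and upper warning/alarm limits."""
    worst = "ok"
    for name, value in params.items():
        low_alarm, low_warn, high_warn, high_alarm = limits[name]
        if value < low_alarm or value > high_alarm:
            return "alarm"
        if value < low_warn or value > high_warn:
            worst = "warning"
    return worst


if __name__ == "__main__":
    limits: Limits = {
        "peak_current_a": (0.5, 1.0, 4.0, 6.0),  # illustrative values only
        "throw_time_s": (0.05, 0.1, 5.0, 8.0),
    }
    # Third event of Figure 4 with made-up samples: a successful throw to normal.
    ev = TurnoutEvent(10.0, [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 1],
                      [1, 0, 0, 0], [0.0, 3.2, 3.0, 0.1])
    params = key_parameters(ev)
    print(classify(ev), params, check_limits(params, limits))
```

With the same made-up encoding, the first two events of Figure 4 both classify as alarms (the first because NWPR never comes up, the second because NWPR was already down before the command to reverse), and the Figure 5 pattern of NWPR dropping without a command classifies as a warning. Classifying before calculating matters because which parameters are meaningful, such as train speed versus throw time, depends on what happened during the event.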
Figure 6: FlexMonitoring example of abnormal motor current due to friction on a turnout

Figure 7: FlexMonitoring example trend of maximum motor current and its reduction after replacement with a new motor

4.2. Performance of real time monitoring

Since 15 April 2013, Inspectation has reviewed the FlexMonitoring event data daily during office hours. Alarms and warnings are processed and communicated to the VolkerRail maintenance team in Zeeland. The performance of FlexMonitoring on turnouts has been analysed for the period April 2013 to March 2014. A summary of the results is presented in Table 2.

| Type of deviation | Number |
|---|---|
| Not urgent, carried out in the next maintenance slot | 45 |
| Urgent, could be prevented with immediate maintenance (> 60 minutes) | 15 |
| Urgent, less train hindrance with immediate maintenance (10 to 60 minutes) | 18 |
| Fault, too little time advantage (< 10 minutes) | 22 |
| Fault, could not be detected * | 8 |

Table 2: Summary of the performance on turnouts for the period April 2013 to March 2014

As stated previously, the goal is a reduction in faults of more than 30% in order to deliver a worthwhile return on investment. Calculating an exact percentage is challenging, as there are many external factors and the exact impact of preventive maintenance cannot be predicted: a fault might never have developed into a train delay. It is therefore best to state that the reduction in faults lies within the following bounds:
- Lower bound of 24%: preventable urgent faults (15) as a percentage of the faults identified with possible train delay (63 = 15 + 18 + 22 + 8).
- Upper bound of 56%: preventable urgent faults (15) and non-urgent faults (45) as a percentage of all faults (108).

5 Conclusion

The examples presented in Figures 4, 5, 6 and 7 show that, with real time monitoring, fault conditions can be detected automatically. The event server application (ESA) is capable of performing automated scenario analysis on a wide range of signals and objects. The output of the ESA consists of graphs, summary items, alarms and warnings:
- Graphs enable human interpretation of events.
- Summary items form the basis of condition trend analysis and are used for long-term maintenance planning.
- Alarms and warnings act as triggers for short-term maintenance.

Besides inspecting assets periodically, visually or by taking measurements, VolkerRail now also uses real time monitoring as an input for maintenance planning. Since 15 April 2013, short-term maintenance actions have been taken on the automatically generated alarms and warnings in performance contract Zeeland. The performance results given in Table 2 are promising for the improvement of railway infrastructure availability.

References

[1] ProRail jaarverslag 2012 (ProRail annual report 2012), ProRail, Utrecht, The Netherlands, 2013.