Using the Nessus Vulnerability Scanner on Control Systems

By Dale Peterson

All too often we hear stories about the IT Department or a consultant running a vulnerability scan that takes down a key control system server or component, with potentially devastating effects on the underlying process. This is usually followed by the statement that SCADA and other control systems should never be scanned. In fact, many experts and training courses recommend that control systems never be scanned for vulnerabilities.

At Digital Bond, we take the opposite position: the fragility and latent vulnerabilities commonly found in control systems are precisely why they must be scanned for cyber vulnerabilities. Scanning just needs to be done properly. In this paper we describe Digital Bond's proven scanning methodology, using the market-leading Nessus scanner as an example.

A Vivid Example

Digital Bond performed an assessment on a very large and critical SCADA system, and the assessment included vulnerability scanning of SCADA servers. A simple, non-intrusive scan that identified the operating system and open ports caused a critical SCADA application port to close. The SCADA server no longer communicated with other servers and required a reboot. All this from a test that the scanner labeled as "safe."

This may sound like a bad thing or even a disaster, but it was not. The server being tested was one of a redundant pair, so operations were not affected. After isolating the specific test that caused the port to close, the asset owner and Digital Bond contacted the SCADA application vendor. The vendor quickly verified the vulnerability and issued a patch within the week. The asset owner applied the patch, and a major vulnerability was removed from the SCADA system.

Without scanning, the vulnerability would have existed in the SCADA environment for years. The test that caused the crash was likely to be one of the first tests an attacker would run if he penetrated the security perimeter.
It is also a test that a well-meaning but misguided IT Department staffer would run against an entire subnet to identify vulnerabilities, thereby taking down both the primary and failover servers and affecting operations.

Of course, not all vendors will, or can, be so responsive with security patches. However, even without a patch, the asset owner can put compensating controls in place to lessen the risk, as discussed later in this paper. This is just another element of risk management that must be addressed.

There is no excuse for SCADA systems to crash or operate improperly when scanned with a tool like Nessus. This is unacceptable in the non-mission-critical IT world; why should it be acceptable for much more important systems and applications? Unfortunately, the SCADA community suffers from very low expectations. Help change that by pushing your vendors to deliver systems that can withstand the first-level attacks of port scanners and broad-based scanning tools.
Scanning Control Systems

Control systems are both hard and easy to scan. They are hard to scan because they crash very easily. Many proprietary SCADA and DCS applications were not designed with security in mind. The software was written for a closed environment with little attention to secure coding practices. If vendors like Microsoft, Cisco and Oracle still ship latent vulnerabilities despite their vast expenditures on security, one can imagine why control system applications often fail during scans.

In addition to the control system vendor's code, SCADA systems leverage third-party components such as protocol stacks and web server applications. All too often the vendors have made poor component choices and introduced vulnerabilities.

Another common reason vulnerabilities make it into production is poor Quality Assurance (QA) testing. Vendors perform only positive testing, confirming that the system works properly in normal operation. Failures occur when bits, bytes and packets are sent in an unexpected manner. Scanners often send exactly these unexpected bytes and packets, which the application then handles improperly, causing a crash.

The good news is that control systems are typically deployed with redundancy in place. It is one of the few security strengths found even in older, legacy systems. This redundancy can be leveraged in the scanning process to allow rigorous testing, even testing that leads to server crashes, without jeopardizing operations.

Planning The Scan

The first step is to identify the different types of systems that need to be tested. For example, a SCADA system may have Realtime Servers, Historians, HMIs on one or more Operating Systems (OS), OPC or ICCP Servers, Terminal Servers, PI Servers and a variety of other servers or workstations. In addition to the servers and workstations, add infrastructure components such as Routers, Switches, Communication Servers and Firewalls to the list.
Hopefully these systems will not crash when scanned, but we have seen many older Communication Servers crash even under light scanning.

Now that you have your list, determine how you will scan each item on it without affecting operations. Scanning in a highly realistic lab environment would be the first choice, but such a lab is often not available. In this paper we assume the production network will be scanned. Here are the rules Digital Bond follows before scanning any device on a control system:

- Assume the device being scanned will go down, and be confident that the loss of that one device will not affect operations.
- Have the System Administrator participate in the testing.
- Have a plan to recover as quickly as practical. Most often this is simply a reboot, but a plan to recover or rebuild the system should be available.

If the asset owner cannot rebuild the system quickly from an image, this identifies a different vulnerability. All the redundancy in the world will not help if a zero-day worm gets into the control center and destroys any system with an IP address. The ability to restore quickly using basic IT techniques is an area for improvement in many control systems.

As mentioned earlier, redundancy is the key. Typically, you should scan the failover (non-active) system. The only exception to this rule is when the non-active system keeps ports and services closed until it becomes active. In this case, scanning the non-active system will not provide accurate results, and a decision needs to be made based on confidence in the failover procedure and the impact of a failover on the process. In some cases we have disabled the failover and made the second system active in an isolated subnet.

Confidence in redundancy and failover is an important discussion beyond scanning. If the asset owner is not confident that failover will work, that identifies another security issue. This lack of confidence may mean the device is not patched appropriately, because of a fear of failover during the patching process, and there is a real possibility that failover will not work in an emergency.

The final situation limiting the ability to scan is the case where a critical system does not have redundancy. Statements such as "this system can never go down because it would severely affect people's lives" and "this system is not redundant" are incompatible. Any system that would cause an unacceptable impact if brought down by scanning should have automated redundancy. Asset owners are deluding themselves if they believe a server will never crash from a hardware fault, memory leak or other cause unrelated to an attack.

Digital Bond's policy and recommendation is not to scan any critical system that lacks redundancy and cannot be quickly rebuilt. In these cases, the missing redundancy and recovery capability are rated Exposures, our highest vulnerability finding rating, and should be addressed immediately.
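The planning rules above can be expressed as a small decision procedure. The following Python sketch is illustrative only: the `Host` and `SystemType` records, the `plan_scan` function and the action labels (`scan`, `report_exposure`, `manual_review`) are hypothetical names, not part of Nessus or any Digital Bond tool, and the mapping from rules to decisions is our reading of the text. A `manual_review` result means the asset owner and System Administrator need to decide, not that anything is automated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Host:
    ip: str
    active: bool  # True if this host is currently the active member
    # True if the standby keeps ports/services closed until it becomes active
    services_closed_on_standby: bool = False


@dataclass
class SystemType:
    name: str  # e.g. "Realtime Server", "Historian", "HMI"
    hosts: List[Host] = field(default_factory=list)
    critical: bool = True
    quick_rebuild: bool = False  # can be rebuilt quickly from an image


def plan_scan(system: SystemType) -> Tuple[str, Optional[str]]:
    """Return (action, ip) for one system type; labels are hypothetical."""
    if not system.hosts:
        raise ValueError(f"no hosts listed for {system.name}")
    if len(system.hosts) < 2:
        if system.critical and not system.quick_rebuild:
            # Policy in the text: do not scan; rate the gap as an Exposure.
            return ("report_exposure", None)
        if system.critical:
            # Not redundant but rebuildable: its loss still affects operations.
            return ("manual_review", None)
        # Non-critical single host: its loss is assumed acceptable.
        return ("scan", system.hosts[0].ip)
    standby = next((h for h in system.hosts if not h.active), None)
    if standby is None or standby.services_closed_on_standby:
        # Scanning the standby would be inaccurate: decide on failover or an
        # isolated-subnet approach together with the asset owner.
        return ("manual_review", None)
    return ("scan", standby.ip)  # normal case: scan the failover system
```

For example, a redundant pair of Realtime Servers yields the standby's IP address, while a lone critical Communication Server that cannot be rebuilt from an image yields `("report_exposure", None)`.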
Scanning With Nessus

The Nessus Vulnerability Scanner is the most popular broad-based scanner and is commonly used by internal and external teams performing security assessments. It has a large and growing library of plugins, the individual scanning tests, covering a wide variety of systems and services. Nessus is available free of charge from Tenable Network Security's website, www.tenablesecurity.com.

Digital Bond has worked with Tenable Network Security to develop an initial set of SCADA plugins for ICCP and OPC, Modbus TCP, and DNP3 servers, as well as some PLCs and SCADA applications. Documentation on the SCADA plugins is available in the Resource Section of Digital Bond's web site, and the SCADA plugins are available from Tenable's Direct Feed [1].

[1] Tenable charges $1200 a year for access to their Direct Feed. Most non-SCADA plugins, such as operating system or application plugins, are available free of charge seven days after their release.

In this paper we cover scanning with the standard Nessus plugins; the same approach can be used with any broad-based scanning tool.

Once you have prepared to scan following the advice in the previous section, select one device to scan, that is, one IP address only. Most of the problems that have occurred in scanning SCADA networks have resulted from scanning an entire subnet and bringing many systems down simultaneously.
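One way to enforce the one-IP-address rule is to validate the target before it is ever handed to the scanner. This sketch uses only Python's standard `ipaddress` module; the function name `single_scan_target` is hypothetical, and it accepts only literal IP addresses (hostnames are rejected), which is an assumption you may want to relax.

```python
import ipaddress


def single_scan_target(target: str) -> str:
    """Return the single IP address in `target`, or raise if it is wider.

    Accepts "10.1.1.5", "10.1.1.5/32" or an IPv6 address; rejects subnets
    such as "10.1.1.0/24" and anything that is not a literal IP address.
    """
    # strict=False lets "10.1.1.5/24" parse (as 10.1.1.0/24) so it can be refused.
    net = ipaddress.ip_network(target.strip(), strict=False)
    if net.num_addresses != 1:
        raise ValueError(
            f"{target!r} covers {net.num_addresses} addresses; scan one host at a time"
        )
    return str(net.network_address)
```

Calling it on every entry of the scanner's target list before a run is a cheap guard against the "entire subnet" mistake described above.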
After selecting the host, you need to determine which plugins to run. There are two approaches to answering this question:

1. Run a port scan on the host to determine what ports are open and help select the appropriate tests. This is generally our preference; we often use nmap for this port scan.
2. Select the appropriate tests based on your knowledge of the host, and include port scans. This may be more appropriate if you only have a short time window to run one scan.

The Nessus plugin categories are:

- AIX Local Security Checks
- Backdoors
- CGI abuses
- CGI abuses : XSS
- Cisco
- Debian Local Security Checks
- Default Unix Accounts
- Denial of Service
- FTP
- Fedora Local Security Checks
- Finger Abuses
- Firewalls
- FreeBSD Local Security Checks
- Gain a shell remotely
- Gain root remotely
- General
- Gentoo Local Security Checks
- HP-UX Local Security Checks
- MacOS X Local Security Checks
- Mandrake Local Security Checks
- Misc.
- NIS
- Netware
- Peer-To-Peer File Sharing
- Port Scanners
- RPC
- Red Hat Local Security Checks
- Remote file access
- SCADA
- SMTP problems
- SNMP
- Service detection
- Slackware Local Security Checks
- SuSE Local Security Checks
- Unix Security Policy
- Useless services
- Web Servers
- Windows
- Windows : Microsoft Bulletins
- Windows : User Management

Obviously many of these plugins will not apply, so you should limit even rigorous testing to useful plugins. Don't select the Windows plugins for a SCADA server running on HP-UX. Similarly, there is no reason to run Unix Security Policy, NIS, Red Hat, AIX or Cisco plugins against a Windows device. Any security professional competent to run Nessus should be able to work with a SCADA System Administrator to select the appropriate tests.
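Approach 1 can be partly automated: run nmap against the single host, read its XML output, and use the open ports plus the known operating system to shortlist plugin families from the list above for a person to review. This is a sketch under stated assumptions. The nmap options in `nmap_command` (`-sT` TCP connect scan, `-p-` all TCP ports, `-T2` polite timing, `-oX -` XML to standard output) are one cautious-looking choice, not a Digital Bond recommendation, and should be agreed with the System Administrator. The `PORT_FAMILIES` and `OS_FAMILIES` tables are hypothetical starting points (502/tcp Modbus TCP, 20000/tcp DNP3, 102/tcp ICCP). Denial of Service is deliberately never suggested.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

# Hypothetical port-to-family hints; family names come from the Nessus list above.
PORT_FAMILIES = {
    21: ["FTP"],
    25: ["SMTP problems"],
    79: ["Finger Abuses"],
    80: ["Web Servers", "CGI abuses"],
    102: ["SCADA"],    # ICCP (ISO-TSAP)
    111: ["RPC"],
    443: ["Web Servers", "CGI abuses"],
    502: ["SCADA"],    # Modbus TCP
    20000: ["SCADA"],  # DNP3
}
# Hypothetical OS-to-family hints; extend per site.
OS_FAMILIES = {
    "windows": ["Windows", "Windows : Microsoft Bulletins", "Windows : User Management"],
    "hp-ux": ["HP-UX Local Security Checks", "Unix Security Policy", "Default Unix Accounts"],
}
ALWAYS = ["General", "Service detection"]


def nmap_command(ip: str) -> List[str]:
    """Build (but do not run) an nmap command for one host; options are assumptions."""
    return ["nmap", "-sT", "-p-", "-T2", "-oX", "-", ip]


def open_tcp_ports(nmap_xml: str) -> Set[int]:
    """Collect open TCP ports from nmap's XML output."""
    root = ET.fromstring(nmap_xml)
    ports = set()
    for port in root.iter("port"):
        state = port.find("state")
        if port.get("protocol") == "tcp" and state is not None and state.get("state") == "open":
            ports.add(int(port.get("portid")))
    return ports


def suggest_families(open_ports: Iterable[int], os_name: str) -> List[str]:
    """Shortlist plugin families for human review; never includes Denial of Service."""
    families = set(ALWAYS)
    families.update(OS_FAMILIES.get(os_name.lower(), []))
    for port in open_ports:
        families.update(PORT_FAMILIES.get(port, []))
    return sorted(families)
```

In use, the XML could come from `subprocess.run(nmap_command(ip), capture_output=True, text=True, check=True).stdout`, run only against the single approved host. The shortlist is a starting point for the conversation with the System Administrator, not a replacement for it.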
Once you have identified the appropriate set of plugins for a device, create a plugin set and save it for future use. For example, you may have one plugin set for SCADA UNIX servers, another for HMIs, and a third for OPC servers on Windows.

To maximize the effectiveness of the scan, you should add login credentials and other information to the Nessus configuration. For example, many of the Windows tests require a user ID and password for an account with Administrator privileges on the Windows system. Some will say adding this information is cheating because an attacker will not have it. This is true if the project is a blind penetration test. However, Digital Bond does not recommend penetration tests for SCADA systems, because they are more likely to cause outages and do not provide as much information as a security assessment. The security assessment's goal is to identify vulnerabilities, and providing this account information better achieves that goal with less risk of an outage.

The final decision is whether to run the dangerous plugins, such as denial of service tests. These plugins are likely to crash a vulnerable system. Typically Digital Bond will not run them, because vulnerability to denial of service conditions can often be determined by identifying missing patches and configurations. It is a judgment call in each security assessment.

Analyzing the Scan Results

The path for analyzing scan results depends on whether the host survived the scan or stopped operating properly. Remember that the purpose of a broad-based vulnerability scanner such as Nessus is to identify missing patches, weak or default configurations, and the services and applications running on the host; Nessus is not designed to find zero-day vulnerabilities in poorly designed applications. Surviving a safe Nessus scan is a minimal, low-bar criterion for any host.
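Deciding whether the host survived is easier with a before-and-after comparison. The sketch below records which TCP ports accept a connection before the scan and compares that with the state afterward; a comparison of this kind would detect a port that closes, as in the example at the start of this paper. `reachable_ports` and `lost_services` are hypothetical helpers. A successful TCP connect only shows that something is listening, not that the application behind the port is healthy, and the check is itself network traffic, so agree on it with the System Administrator. It complements, and does not replace, checking the running services on the host itself.

```python
import socket
from typing import Iterable, Set


def reachable_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Set[int]:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = set()
    for port in ports:
        try:
            # A completed TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.add(port)
        except OSError:  # refused, timed out, unreachable, ...
            pass
    return reachable


def lost_services(before: Set[int], after: Set[int]) -> Set[int]:
    """Ports that were reachable before the scan but not after it."""
    return before - after
```

For example, take `baseline = reachable_ports(ip, expected_ports)` before the scan and check `lost_services(baseline, reachable_ports(ip, baseline))` afterward; any non-empty result means the host did not come through the scan intact.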
Scanning Caused a Crash

If the host stopped operating properly during the scan, the next step is to determine which plugin or plugins caused the problem. If the host stopped working properly but did not completely crash or hang, go to the system and see which services and applications are no longer running. This will help narrow down the plugin that caused the problem. Sometimes the scan results will also provide hints about which plugin caused the problem, but don't rely on the output, because hosts behave in a variety of ways after a service crashes. In many cases the Nessus scan results will look better after a service has crashed, since the crashed service can no longer be tested.

Determining which plugin caused the system to stop operating properly is not always easy. It can involve reducing the number of tests and trial and error. The most difficult scenario is when some combination of plugins causes the fault. Work should continue until you are able to isolate the plugin or combination of plugins that consistently causes the fault.

Once the offending plugin or plugins are identified, you should provide the details to the control system application vendor. Most IT vendors have processes in place for reporting vulnerabilities, but this is not yet common in the control systems community. Fortunately, most asset owners maintain a support relationship with their vendors and have points of contact for reporting problems. The key point is to ensure the vendor understands the severity of the newly discovered vulnerability and the urgency of resolving it.
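The "reduce the number of tests and trial and error" step can be made systematic. The sketch below swaps in a general-purpose technique that the methodology above does not itself prescribe: a simplified form of delta debugging (ddmin). It reruns chunks of the plugin list, and the list minus each chunk, keeping any subset that still reproduces the fault, so it also narrows down faults caused by a combination of plugins. `isolate_fault` is a hypothetical name, and `faults` is a hypothetical callback you would supply: it should restore the test host, run only the given plugins against it, and return True if the host stopped operating properly. Every call is a real scan that may crash the host, so budget time for reboots.

```python
from math import ceil
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def isolate_fault(plugins: Sequence[T], faults: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `plugins` to a small subset for which `faults` still returns True."""
    items = list(plugins)
    if not faults(items):
        raise ValueError("the full plugin set does not reproduce the fault")
    n = 2  # number of chunks to split the current candidate list into
    while len(items) >= 2:
        size = ceil(len(items) / n)
        chunks = [items[i:i + size] for i in range(0, len(items), size)]
        for chunk in chunks:
            if faults(chunk):
                items, n = chunk, 2  # one chunk alone reproduces the fault
                break
        else:
            for i in range(len(chunks)):
                rest = [p for j, c in enumerate(chunks) if j != i for p in c]
                if faults(rest):
                    items, n = rest, max(n - 1, 2)  # chunk i was not needed
                    break
            else:
                if n >= len(items):
                    break  # cannot split further: every remaining plugin matters
                n = min(len(items), n * 2)  # try finer chunks
    return items
```

The result is the plugin or plugin combination to report to the vendor and to remove from the plugin set before rerunning Nessus.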
There are differing opinions on what additional vulnerability disclosure is appropriate, and this is covered in detail in Digital Bond's Vulnerability Disclosure blog category. After many years of vendor inaction, even when faced with a very large, very unhappy customer, Digital Bond now reports newly discovered vulnerabilities to US-CERT at the same time as the vendor. US-CERT has worked effectively with the vendors, and applied pressure to them, to accelerate the development of security patches for zero-day vulnerabilities.

The plugins that caused the crash should be removed from the plugin set, and Nessus should be rerun.

Scanning Completes Successfully

Eventually you will be able to run a scan that completes successfully with the host still operating properly. At this point, review the scan results to identify potential vulnerabilities. Nessus and other scanners typically rate the severity of their findings, but these ratings should be used only as a guideline.

Probably the biggest issue with broad-based scanners is false positives. They are getting better, but if you read the scan output carefully you will see the word "possible" used many times. This is the correct way for a scanner to err: false positives can be addressed, while false negatives leave the vulnerability undiscovered.

All vulnerabilities identified in the scan need to be verified before being added to an assessment report. Verifying a vulnerability can include inspecting the configuration, secondary testing or even exploiting the vulnerability. Exploit tools such as the open source Metasploit can provide a dramatic demonstration of how a missing patch can be exploited to give an attacker control of an HMI or critical server. In a number of assessments, demonstrating unauthorized remote control of an operational HMI was very helpful in getting senior management to recognize the need for improved security.
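The rule that nothing goes into the report unverified can be enforced with simple bookkeeping. In this sketch the `Finding` record and the `triage` and `tentative` functions are hypothetical. `verified` is set by a person after configuration review, secondary testing or exploitation, and `assessed_severity` is the analyst's rating for this control system, not the scanner's. `tentative` merely flags scanner output that uses the word "possible" so it stands out during verification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Finding:
    host: str
    plugin: str
    scanner_severity: str  # the scanner's rating, a guideline only
    output: str  # the scanner's explanatory text
    verified: Optional[bool] = None  # None: not yet checked by a person
    assessed_severity: Optional[str] = None  # analyst's rating for this system


def tentative(finding: Finding) -> bool:
    """True if the scanner itself hedged, e.g. 'possible buffer overflow'."""
    return "possible" in finding.output.lower()


def triage(findings: List[Finding]) -> Tuple[List[Finding], List[Finding], List[Finding]]:
    """Split findings into (report, still_to_verify, false_positives)."""
    report, to_verify, false_positives = [], [], []
    for f in findings:
        if f.verified is False:
            false_positives.append(f)  # keep out of the report entirely
        elif f.verified and f.assessed_severity is not None:
            report.append(f)  # verified and re-rated for this control system
        else:
            to_verify.append(f)  # unverified, or verified but not yet re-rated
    return report, to_verify, false_positives
```

Only the first list should ever reach an assessment report; the second is the analyst's remaining work.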
It is critical that false positives be removed from any report and that severity ratings be evaluated with an understanding of the true risk to the control system. Beware of assessments that simply run the tool and deliver its report without this analysis. A report that reaches senior management with many false positives or incorrectly rated findings can cause a large amount of unproductive time spent explaining why these are not issues.

Remediation recommendations for the Nessus findings are straightforward and are likely to include patches, configuration changes and disabling services. The same care taken in scanning should be applied to any change to a production system. Change management and testing in control systems is a topic worthy of its own whitepaper, and includes working with the application vendor, lab testing and phased deployment.

The scanning process described in this paper scanned only one of each type of system. Any vulnerability identified on the sample host is likely to be found on all systems of that type, so ensure that remediation occurs on all hosts of that type.

While eliminating the vulnerability identified in the scan is important and straightforward, remediation does not stop there. It is important to determine the root cause of the vulnerability to prevent it from occurring again. A common example is missing patches. Applying the patches will resolve the vulnerability identified in the scan, but the root cause is a problem in the patch management process. If the patch management process is not corrected, new security-related patches are unlikely to be applied in the future.

When Direct Remediation Is Impossible

It is not unusual to scan a control system host, find a vulnerability and report it to the vendor, only to hear back from the vendor: "Yes, this is a problem, but there is no fix planned or scheduled." Obviously this is an unacceptable answer, but it is also a fact of life. For example, the vendor may say that applying a patch will cause the SCADA application to fail, or that a default account cannot be removed or changed.

If a vulnerability cannot be addressed directly, the security team must identify compensating controls that reduce the risk to an acceptable level. Examples of common compensating controls include internal firewalls or access control lists, bandwidth limitations, custom IDS/IPS signatures, and shortening the recovery time in case of an incident. Compensating controls often require innovative thinking.

Future Scans

Networks do not remain static, and new vulnerabilities are identified every week of the year, so all networks should be periodically retested. After the first scan, an asset owner should have a set of Nessus plugins that provides the appropriate information without affecting operations. These plugin sets can be used for future scans or for more thorough scanning of the SCADA systems.

While you may be confident enough in the process to scan more than one host at a time, never scan a set of hosts that would cause an unacceptable impact on operations if all of them failed. For example, many Operator stations have at least two HMIs. A set of IP addresses could be created and scanned that includes only one HMI from each Operator station. The same philosophy can be followed for redundant servers. It may be possible to assess the entire control center with a tested plugin set in two or three well-planned scans.
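The one-member-per-group rule lends itself to a small scheduling helper. This sketch assumes you have already listed each redundancy group (an Operator station's HMIs, a redundant server pair) with its members' IP addresses; `scan_batches` is a hypothetical name. It takes at most one member from each group per batch, so every batch leaves at least one member of every group untouched, and it refuses any group with fewer than two members. It knows nothing about dependencies between groups, which still need human review.

```python
from itertools import zip_longest
from typing import Dict, List


def scan_batches(groups: Dict[str, List[str]]) -> List[List[str]]:
    """Build scan batches holding at most one member of any redundancy group.

    `groups` maps a group name to the IP addresses of its members. The number
    of batches equals the size of the largest group.
    """
    for name, members in groups.items():
        if len(members) < 2:
            # Scanning the only member could take the function down entirely.
            raise ValueError(f"group {name!r} is not redundant; do not batch-scan it")
    # Column k of zip_longest holds the k-th member of every group (or None).
    columns = zip_longest(*groups.values())
    return [[ip for ip in column if ip is not None] for column in columns]
```

With two HMIs per Operator station and redundant server pairs, this yields two batches, consistent with assessing a control center in a few well-planned scans.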
Of course, always err on the side of caution and anticipate that scan-related problems could occur.

Final Thoughts

Vulnerability scanning is an important, high-profile part of a security assessment, but it is only one part of the assessment. In this paper, we touched on some other elements such as analysis of recovery capability, redundancy and patch management. Digital Bond security assessments include a review of administrative and technical security controls by interview and inspection. Some of these activities include:

- Analysis of firewall, router and switch configurations
- Analysis of OS configurations
- Analysis of SCADA, DCS and EMS security configurations
- Interviews with Managers, Operators, Engineers and System Administrators
- Review of applicable security policies
- Review and audit of key procedures such as change control and backup
- Analysis of availability related to component failure and widespread disaster
- Analysis of physical security of cyber assets