Requirements for Deterministic Control Systems

Michael Roa
Naval Engineering Department, American Bureau of Shipping, Alexandria, VA, USA

Wayne Cantrell
Factory Automation Systems, Siemens Industry Automation, Johnson City, TN, USA

David Cartes, Ph.D.
Center for Advanced Power Systems, Florida State University, Tallahassee, FL, USA

Mark Nelson
Engineering Department, Northrop Grumman Newport News Shipbuilding, Newport News, VA, USA

Abstract

This paper will explain what a deterministic system is and why it is important for real-time mission critical control systems to be deterministic, and will provide examples of the types of network protocols that should be selected in order to ensure deterministic behavior. In evaluating determinism, the need to consider not only network throughput and latency, but also delay variability or jitter, will be discussed. Methods of measuring determinism will be discussed, as well as the types of analysis and testing that need to be performed in order to validate that a network exhibits deterministic behavior. The paper will offer suggestions for new rules and requirements to be added to naval and commercial rules to ensure that mission critical or vital automation systems are deterministic.

I. INTRODUCTION

As shipboard computer-based automation systems have evolved, there has been a steady progression towards system architectures based on high-speed integrated networks where data is shared over a common network. Due to the critical nature of real-time control and automation networks, the selection of network protocols has to take reliability and responsiveness into consideration. A sluggish or unreliable response on a mission critical automation network can have disastrous consequences, especially on systems that control and monitor critical machinery systems such as propulsion, steering, electrical plant and firefighting/damage control.
These types of systems require a fast response time to ensure action is taken in time to prevent damage to equipment or the vessel, and high reliability to ensure that control and monitoring data always gets delivered during critical automation processes, even during periods of heavy network traffic. Lastly, physical redundancy is also needed to ensure the system can withstand single failures and that data has an alternate path to take in the event of a loss of a communication path or equipment. When selecting network protocols and topologies, system designers must take into consideration not only the speed of the network but also its reliability and redundancy in order to minimize the probability of the system either (a) not responding quickly enough, (b) not responding due to failure of data to get delivered, or (c) failing due to a lack of redundancy.

This paper will discuss what requirements need to be emphasized when selecting an automation system network protocol in order to ensure that the system will be able to perform in a reliable and timely manner. Specifically, this paper will focus on the reliability or Quality of Service aspect of the system, which is generally expressed as the determinism of the system.

II. BACKGROUND

The key performance characteristic of an automation system network protocol is determinism. The determinism of a protocol is its ability to support predictable and stable transmission of control parameters between the devices attached to the network. The data transfer must be completed in a defined time period and confirmation must be provided. For real-time control, determinism is the primary measure of network reliability and Quality of Service (QoS). Determinism has taken on more emphasis as automation system networks have been evolving from legacy multilayered systems to Industrial Ethernet based systems that utilize a system-wide common protocol.
In a typical legacy automation system network arrangement, three separate communication networks are provided for three separate purposes: (a) the information network, (b) the control network, and (c) the device level networks. Each of these separate federated networks is optimized for its intended purpose. Different networking protocols and architectures are typically employed at these three levels. Figure 1 provides a typical example of this three level system architecture.
Recently a number of field bus protocols have migrated to switched Ethernet based protocols in order to enhance communications capabilities and make it easier to integrate automation system networks from the HMI level down to the field devices. Figure 2 provides a typical example of an automation system with an Industrial Ethernet type architecture where the top level information networks (enterprise level) are seamlessly connected to the field bus device level sub-networks through redundant Ethernet links between network switches. While better integration is the main advantage of this trend, there were initially some concerns regarding the use of Ethernet based protocols on real-time automation networks.

FIGURE 1. TYPICAL THREE LEVEL AUTOMATION SYSTEM NETWORK

Device Level [1] - At the lowest level the 'Device (field bus) Network' collects a large number of relatively small data items (a small number of bits or bytes each). It typically operates in a repetitive cycle, and the speed of the cycle determines the responsiveness of the controller that needs the information.

Information Level (Enterprise Level) - At the other extreme is the enterprise network. This is primarily used to transfer large units of information on an irregular basis. Examples are sending electronic mail messages, downloading Web pages, making ad-hoc Structured Query Language (SQL) queries, printing documents, and fetching computer programs from file servers. Here the figure of merit is the information throughput rather than the time delay when attempting to transfer a single data item across the network. The networks used at this level must be able to support wide variations in the amount of network data, and degrade gracefully by slowing down all traffic equally if presented with a large workload. Internet based networking protocols such as TCP/IP and UDP are typically used at this level.

Control Level - In between the Device Level and the Information Level is the Control Network level.
Here there is an attempt to mix routine scanning of data values with on-demand signaling of alarm conditions, along with transfer of large items such as controller and device programs, batch reports and process recipes. Unfortunately there are many network protocols available to be used at this level, including widespread industry protocols such as Profibus, WorldFIP, Modbus Plus, ControlNet, Foundation Fieldbus and Genius bus. Even worse, the design characteristics of each are sufficiently different to make seamless interconnection very difficult. In particular, all these networks have their own techniques for addressing, error checking, statistics gathering, and configuration, and this imposes complications even when the underlying data itself is carried consistently.

[1] Hirschmann Network Systems White Paper, Real Time Services (QoS) In Ethernet Based Industrial Automation Networks

FIGURE 2. TYPICAL INDUSTRIAL ETHERNET NETWORK

Three common Ethernet based field bus protocols that have evolved to Industrial Ethernet networking protocols are EtherNet/IP, Modbus/TCP, and PROFINET. EtherNet/IP has evolved from DeviceNet, Modbus/TCP has evolved from Modbus, and PROFINET has evolved from Profibus.

EtherNet/IP

EtherNet/IP, based on Ethernet TCP or UDP IP, is a stack extension for automation industry communication. The 'IP' in EtherNet/IP stands for Industrial Protocol. EtherNet/IP was introduced towards the end of In EtherNet/IP the upper-level Control and Information Protocol (CIP), which is already used in ControlNet and DeviceNet, is adapted to Ethernet TCP/IP and UDP/IP respectively.

Modbus/TCP

Modbus/TCP is a derivative of the Modbus protocol and the open specification, based on Ethernet and standard TCP/IP, mounts directly on Layer 4. It defines a simple structured,
open and widely used transmission protocol for master-slave communication.

PROFINET

The first version of PROFINET used Ethernet for non-time-critical communication of higher-level devices and Profibus-DP field bus technology for real-time domains, integrated to a higher level by proxies. In its second release, PROFINET provided two communication mechanisms over Ethernet: the standard non-real-time communication channel uses TCP/IP, while the second channel, bypassing Layers 3 and 4 of the OSI reference model, provides more deterministic communication. The protocol reduces data length in order to minimize the throughput time in the communication stack. For optimal communication performance PROFINET prioritizes packets as specified in IEEE 802.1p. For real-time communication the highest priority (priority 7) is used.

Table 1 provides a brief summary of the major characteristics of the three major Ethernet based field bus protocols that are typically employed on industrial and marine automation systems:

Protocol: EtherNet/IP (IP = Industrial Protocol)
Description: Defined by Rockwell. Supported by the ODVA with some 250 members. Main producer is Rockwell for controllers, I/O, HMI and drives; Accu-Sort Systems, Datalogic and Sick: bar code readers; Acromag, Phoenix and Wago: I/O; Bosch Rexroth, Parker Hannifin and SMC: valves. About 21 certified products. In spring 2004 General Motors declared that it would standardize on EtherNet/IP for its automation programs.
Method: Based on normal TCP/IP with alternative UDP/IP as an object embedding protocol. CIP (Common Industrial Protocol) transports I/O data, configuration and diagnostics over normal Ethernet. Nondeterministic with reaction time down to 10 ms. Synchronization (CIP Sync - IEC 61588) can be added. Bandwidth for TCP/IP %.
Real-time performance: Cyclic communication ms. Synchronization about 10 µs.
Functions: Field bus migration over bridges for ControlNet and DeviceNet (installed base 2.5m nodes), which use the same CIP application protocol. Drive control with moderate cycle time and synchronization. Safety protocol planned but not yet ready and approved.

Protocol: Modbus/TCP
Description: Defined by Schneider Electric. Supported by the user group Modbus-IDA. The original Modbus protocol (e.g. RS485) in use since Migration to Ethernet easy to implement and widely spread. Probably the most used Ethernet solution so far. About 90 products, mostly from suppliers of remote I/O who have multiple choice of interface.
Method: Based on normal TCP/IP embedding Modbus, a very simple protocol using a request/reply model. The solution is nondeterministic and reaction time is 20 ms at best. Real-time with RTPS (Realtime Publisher Subscriber) can be added. This uses UDP/IP to improve performance but not to true real-time standards. Bandwidth for TCP/IP %.
Real-time performance: Cyclic communication ms. No synchronization.
Functions: Connection of Modbus to Ethernet. Simple protocol for I/O and reading/writing in registers.

Protocol: PROFINET
Description: Defined by Profibus International with more than 1200 members and regional organizations in 25 countries on all continents. More than 25 companies with 100+ products. Beckhoff: PLC and I/O; Comsoft: I/O controller; Control: I/O controller and gateways; Danfoss, Rexroth, SEW: drives and motion control; HMS: I/O and I/O controller, Interbus proxy, gateways; Hilscher: I/O and I/O controller, gateways and software; Phoenix Contact: Interbus proxy, I/O and I/O controller; Siemens: PLC, drives and motion control, I/O and I/O controller, PC cards, ASICs; Wago: I/O; Yokogawa: PLC. German auto industry, comprising Audi, BMW, DaimlerChrysler and Volkswagen, declared on Profinet.
Method: Normal TCP/IP is used for most functions. This includes configuration, parameterization and CBA (Component Based Automation). No restrictions on TCP/IP traffic. For I/O and other real-time functions down to 1 ms, direct addressing and prioritized messages are used (RT channel). No restrictions for TCP/IP traffic but shorter delays can occur in switches due to the priority.
Real-time performance: Cycle times from 250 µs with 30 axis and 50% TCP/IP. 150 axis in 1 ms. Synchronization <1 µs.
Functions: CBA, transparent migration for Profibus (installed base 13m nodes) and Interbus (installed base: 7.5m nodes). Other field bus migrations in progress. Safety protocol based on PROFIsafe available with approval expected shortly. Other profiles from Profibus for completion over the next two years. New profiles like MES functions, starting with Maintenance, are implemented.

TABLE 1. MAJOR CHARACTERISTICS OF THE THREE MAJOR ETHERNET BASED FIELD BUS PROTOCOLS [2]

Discussion: Requirements in Current Marine Standards

While it has been established by the automation industry that determinism is the key characteristic for a critical real-time control system network, most current marine standards do not directly address this topic. Of more concern is that there seems to be no marine standard in existence that defines a minimum level of determinism in terms of a quantifiable, measurable characteristic. In general, the commercial shipbuilding rules and regulations loosely address some key network performance characteristics such as minimum response times on control actions, redundant data links, and avoidance of unacceptable data transmission delays (overloading of network). However, the classification society rules, flag state regulations, and industry standards fall short of specifying a minimum level of determinism or even requiring protocols that exhibit deterministic behavior.
In the new draft dot standard IEEE 45.2, Shipboard Control and Automation Systems, a definition for the term deterministic has recently been added and a requirement for control systems to be deterministic has also been incorporated as follows:

Deterministic: A performance that responds to input signals fast enough to keep a sustained operation moving predictably at its required speed. For purposes of this guide this means that the maximum (1) jitter times, (2) error handling times, (3) task execution times, (4) hardware response times and (5) communication times are known and specified, and further means that a certain control application may be used without additional consideration of a variability in response time or execution time. Such a system is deterministic for only those certain systems.

IEEE Std. 45.2, Clause goes on to require: Operating systems, if used for time critical control tasks, shall be controlled with real-time deterministic operating systems.

[2] Source:

For military vessels, the ABS Naval Vessel Rules (NVR) have added a significant number of key network performance characteristics that must be adhered to on mission critical control and automation networks. The NVR has specific key network performance criteria established for reconfiguration/failover time, operational availability (Ao), network switch-to-switch communication, backbone capacity, survivability, and redundancy. However, determinism is not directly addressed.

The following table summarizes what current commercial and naval marine standards and rules require with respect to automation system network performance.

Standard ABS Steel Vessel Rules ABS Naval Vessel Rules USCG Flag Regulations 46 CFR Part 62 IEEE 45.2 Determinism (Quality of Service) Automation Network Characteristics Redundancy Speed Survivability Operational Availability Ao No Yes Yes 2 second response time No Yes Yes defines response Nominal Timeliness for several systems in msecs No Yes No No No Yes definition provided and Deterministic systems required No Yes No Yes Yes No No No

INFORMATION LEVEL PROTOCOLS

At the information network level, Internet based protocols are typically employed. Additionally, TCP/IP is almost universally used on top of Ethernet to provide the network and transport layers to deal with the issues of routing and end-to-end data integrity in the open systems interconnect model.
The following is a summary of these protocols.

The Internet Protocol (IP) is the principal communications protocol used for relaying datagrams (packets) across an internetwork using the Internet Protocol Suite. Responsible for routing packets across network boundaries, it is the primary protocol that establishes the Internet. IP is the primary protocol in the Internet Layer of the Internet Protocol Suite and has the task of delivering datagrams from the source host to the destination host solely based on their addresses. For this purpose, IP defines addressing methods and structures for datagram encapsulation. Historically, IP was the connectionless datagram service in the original Transmission Control Program, the other being the connection-oriented Transmission Control Protocol (TCP). The Internet Protocol Suite is therefore often referred to as TCP/IP. The first major version of IP, now referred to as Internet Protocol Version 4 (IPv4), is the dominant protocol of the Internet, although the successor, Internet Protocol Version 6 (IPv6), is in active, growing deployment worldwide.

The Transmission Control Protocol (TCP) is one of the core protocols of the Internet Protocol Suite. TCP is one of the two original components of the suite, complementing the Internet Protocol (IP), and therefore the entire suite is commonly referred to as TCP/IP. TCP provides the service of exchanging data directly between two hosts on the same network, whereas IP handles addressing and routing messages across one or more networks. In particular, TCP provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. TCP is the protocol that major Internet applications rely on, applications such as the World Wide Web, e-mail, and file transfer.
Other applications, which do not require reliable data stream service, may use the User Datagram Protocol (UDP) which provides a datagram service that emphasizes reduced latency over reliability. TCP provides a point-to-point channel for applications that require reliable communications. The Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP) and Telnet are all examples of applications that require a reliable communication channel. The User Datagram Protocol (UDP) is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol (IP) network without requiring prior communications to set up special transmission channels or data paths. UDP uses a simple transmission model without implicit hand-shaking dialogues for providing reliability, ordering, or data integrity. Thus, UDP provides an unreliable service and datagrams may arrive out of order, appear duplicated, or go missing without notice. UDP assumes that error checking and correction is either not necessary or performed in the application, avoiding the overhead of such processing at the network interface level. Time-sensitive applications often use UDP because dropping packets is preferable to waiting for delayed packets, which may not be an option in a real-time system. If error correction
facilities are needed at the network interface level, an application may use the Transmission Control Protocol (TCP) or Stream Control Transmission Protocol (SCTP), which are designed for this purpose. UDP's stateless nature is also useful for servers answering small queries from huge numbers of clients. Unlike TCP, UDP is compatible with packet broadcast (sending to all on the local network) and multicasting (sending to all subscribers).

CONTROL LEVEL PROTOCOLS

At the control network level, the current trend is for Ethernet protocols to be employed. In addition to setting standards for network speed and responsiveness, a minimum level of determinism should be specified as well in order to ensure that the network will reliably perform under worst case data loading conditions. For this reason, an emphasis should be placed on selecting deterministic protocols when specifying an automation network. Being fast is not good enough; the system needs to be fast as well as reliable under all data loading conditions and configurations. Certain protocols are inherently deterministic by their nature, while other types such as shared Ethernet are inherently nondeterministic. However, by using switches and duplex links, Ethernet protocols can be made to behave deterministically.

When the IEEE standard for Ethernet was first adopted, Ethernet was considered unsuitable at the control and device level. The network protocol was (and still is) nondeterministic. Device response times could not be guaranteed because of data collisions and the delays in retransmitting data. Standard Ethernet employs a Carrier Sense Multiple Access (CSMA) contention based protocol where each user shares a common medium and must sense whether the line is available before transmitting data (listen-before-transmit). Because there is a possibility of data collisions and transmission delays, CSMA protocols are considered nondeterministic.
Because of this lack of determinism, Ethernet deployment was initially restricted to the information and enterprise levels of the automation network. Over time it has penetrated into the process and field bus network levels through the addition of features such as switches and full duplex transmission lines (see Figure 2). Ethernet's non-determinism has been vastly reduced by advanced switching technologies that allow multiple devices to transmit and receive data simultaneously over multiple network loops. Ethernet switches can divide networks into virtual local area networks that segment devices into logical work groups. Ethernet switches also typically have a fast internal backbone, which helps eliminate collisions among data packets. Fast switches can quickly swap network lines, thereby responding very rapidly to anomalies such as missed communications, power failures and device failures.

Another factor which has improved the determinism of Ethernet is the use of full duplex switched infrastructure. Early Ethernet implementations used hubs. Networks using hubs experience data collisions, which impact determinism. Ethernet switches enable deterministic behavior through use of a full duplex store and forward mechanism. Full duplex communication ensures that there are no collisions or delays for packets going in opposite directions on the same cable. Store and forward switches wait until they have fully received each packet before sending it on to the destination address.

Adding some intelligence to the switch improves quality of service and adds queue management capabilities. By assigning a priority to time-sensitive data, intelligent Ethernet switches can elevate that traffic above lower-priority data. This ensures that high-priority traffic always traverses the network even if the network becomes congested. Switches also have capabilities that can classify, reclassify, police, mark and even drop incoming data packets as application priorities require.
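The prioritization behavior described above can be illustrated with a minimal sketch of a single switch egress port using strict-priority queuing. This is not taken from any switch vendor's implementation: the class name `StrictPriorityPort`, the frame labels, and the rule that a larger number (0 to 7, the range used by IEEE 802.1p) is always more urgent are assumptions made for illustration only.

```python
import heapq
from itertools import count


class StrictPriorityPort:
    """Hypothetical model of one switch egress port with strict-priority queuing."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: keeps FIFO order within one priority

    def enqueue(self, frame, priority):
        """Queue a frame with a priority from 0 (lowest) to 7 (highest)."""
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in 0..7")
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), frame))

    def dequeue(self):
        """Return the next frame to transmit, or None if the port is idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


port = StrictPriorityPort()
port.enqueue("HMI screen update", priority=2)
port.enqueue("SNMP poll", priority=0)
port.enqueue("I/O scan data", priority=7)  # arrives last, is transmitted first

order = [port.dequeue() for _ in range(3)]
print(order)  # ['I/O scan data', 'HMI screen update', 'SNMP poll']
```

The point of the sketch is only that the time-critical frame never waits behind lower-priority traffic that is still queued; a frame already on the wire is not interrupted, so some residual delay remains in a real switch.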
Quality of Service (QoS) is also a known technique used to prioritize traffic on a network based on the application type or destination port. QoS helps to tighten the delay/jitter times. Implementations typically involve labeling different classes of data with a particular priority (from 0 to 7). For example, an administrator could set I/O scanning traffic at the end-node to a higher priority than other application traffic (HMI, SNMP, etc.). When the switch receives traffic from the end-node, the I/O scanning data will be sent out ahead of other traffic (even if it has been received after the other traffic) through the prioritization applied by the end-node.

Operating speeds across even conventional wire cabling have increased from 10 megabits/second (Mbps) to 100 Mbps. Automatic switches that negotiate 10/100 Mbps are common, optimizing speed and service, as well as allowing a mix of 10 Mbps and 100 Mbps devices on the same network. The higher speed reduces the probability of data collisions and lost data, and delays are no longer an issue. For most factory and marine automation applications, 100 Mbps Ethernet is deterministic enough. Today, 1 gigabit/second (Gbps) is available, and 10 Gbps is near. Gigabit Ethernet is primarily targeted for enterprise-wide backbone networks, and it is showing up in distributed control system backbones in the process and marine industries.

Advances have been made in industry using Ethernet in high speed applications (i.e. motion control), in what is called Industrial Ethernet. These protocols provide the ability to tunnel interrupts through the Ethernet switch within 0.25 to 5 milliseconds. Coordination with the controller can reduce this to less than one microsecond.

DEVICE LEVEL PROTOCOLS

At the device level, field bus type protocols are typically employed. Field bus protocols are truly
deterministic. They are time-slot allocation protocols such as Time Division Multiplexing (TDM) and token passing protocols such as token bus and token ring.

Time-division multiplexing (TDM) is a type of digital or (rarely) analog multiplexing in which two or more signals or bit streams are transferred apparently simultaneously as sub-channels in one communication channel, but are physically taking turns on the channel. The time domain is divided into several recurrent timeslots of fixed length, one for each sub-channel. A sample byte or data block of sub-channel 1 is transmitted during timeslot 1, sub-channel 2 during timeslot 2, etc. One TDM frame consists of one timeslot per sub-channel plus a synchronization channel and sometimes an error correction channel before the synchronization. After the last sub-channel, error correction, and synchronization, the cycle starts all over again with a new frame, starting with the second sample, byte or data block from sub-channel 1, etc. One weakness of TDM protocols is that they waste bandwidth if some nodes have nothing to send, and a loss of synchronization to the start-of-frame signal results in the loss of all communications (single point of failure). Furthermore, under heavy traffic conditions, system responses to input and output can vary considerably with time. [3]

Token bus is a network implementing the token ring protocol over a "virtual ring" on a coaxial cable. A token is passed around the network nodes and only the node possessing the token may transmit. If a node doesn't have anything to send, the token is passed on to the next node on the virtual ring. Each node must know the address of its neighbor in the ring, so a special protocol is needed to notify the other nodes of connections to, and disconnections from, the ring. Token bus was standardized by IEEE standard It is mainly used for industrial applications.
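The reason token passing is regarded as deterministic can be shown with a small worked sketch: if each node may hold the token for at most a fixed time, the wait for the token to come back has a hard upper bound. The function names and the parameters `max_hold_us` and `token_pass_us` are hypothetical and do not come from any token bus standard, and the sketch assumes the token is never lost or duplicated.

```python
import random


def worst_case_token_wait_us(num_nodes, max_hold_us, token_pass_us):
    """Upper bound (microseconds) on the wait for the token to return to a node.

    Assumes every other node holds the token for at most max_hold_us and each
    hand-over takes token_pass_us. Token loss and recovery are ignored.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) * max_hold_us + num_nodes * token_pass_us


def simulate_token_wait_us(num_nodes, max_hold_us, token_pass_us, rng):
    """Simulate one rotation after node 0 releases the token."""
    waited = 0
    for _ in range(num_nodes - 1):
        waited += token_pass_us                # token travels to the next node
        waited += rng.randint(0, max_hold_us)  # node sends nothing (0) or up to the limit
    waited += token_pass_us                    # token returns to node 0
    return waited


rng = random.Random(1)  # fixed seed so the run is repeatable
bound = worst_case_token_wait_us(num_nodes=8, max_hold_us=500, token_pass_us=20)
observed = max(simulate_token_wait_us(8, 500, 20, rng) for _ in range(1000))
print(bound, observed <= bound)  # 3660 True
```

However the other nodes behave within their hold limit, the simulated wait never exceeds the computed bound, which is the property a contention based protocol cannot offer.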
Token ring local area network (LAN) technology is a local area network protocol which resides at the data link layer (DLL) of the OSI model. It uses a special three-byte frame called a token that travels around the ring. Token possession grants the possessor permission to transmit on the medium. Token ring frames travel completely around the loop. Stations on a token ring LAN are logically organized in a ring topology, with data being transmitted sequentially from one ring station to the next and a control token circulating around the ring controlling access. This token passing mechanism is shared by ARCNET, token bus, and FDDI, and has theoretical advantages over the stochastic CSMA/CD of Ethernet.

A weakness common to both of these token-based protocols is that the token may be lost due to error and may be difficult to recover quickly. Errors may also result in generation of multiple tokens. Accordingly, most protocols employ a method to automatically recover tokens.

[3] Determinism in Industrial Computer Control Network Applications, January 1995, Echelon White Paper

Measuring and Test Methods to Determine if a System is Deterministic

As the move to Industrial Ethernet continues on the manufacturing floor, a key issue of concern is end-to-end performance. Determinism, the ability to ensure that a packet is sent and received in a specific period of time, is an important design goal for industrial networks. Performance tests for switched and routed networks have shown that it is possible to provide real-time communication on the network domain. Measuring determinism means accurately characterizing the worst case time to exchange information end to end, no matter what other network traffic is occurring. The throughput, latency time and jitter time of this response are the quantified measures of determinism.
These terms are defined as follows:

Throughput - In communication networks, such as Ethernet or packet radio, throughput or network throughput is the average rate of successful message delivery over a communication channel. This data may be delivered over a physical or logical link, or pass through a certain network node. The throughput is usually measured in bits per second (bit/s or bps), and sometimes in data packets per second or data packets per time slot.

Latency - A measure of the time delay experienced by a system. Latency in a packet-switched network is measured either one-way (the time from the source sending a packet to the destination receiving it), or round-trip (the one-way latency from source to destination plus the one-way latency from the destination back to the source). Round-trip latency is more often quoted, because it can be measured from a single point.

Jitter - In the context of computer networks, the term jitter is often used as a measure of the variability over time of the packet latency across a network. A network with constant latency has no variation (or jitter). Packet jitter is expressed as an average of the deviation from the network mean latency. However, for this use, the term is imprecise. The standards-based term is packet delay variation (PDV). PDV is an important quality of service factor in assessment of network performance.

In evaluating determinism it is important to consider not only network throughput and latency, but also delay variability or jitter. These measurements must also be made in the presence of a prescribed traffic load, with precise hardware timing for transmission and reception, ensuring accuracy in the presence of congestion. Performance tests can be conducted to characterize determinism by directly measuring point-to-point network throughput, loss, latency and jitter. The characterization can take place across an individual switch during vendor evaluation, or across the entire network.
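As a worked illustration of the three measures defined above, the following sketch computes them from per-packet send and receive timestamps. It is only an assumption-laden example: the function name `determinism_metrics`, the dictionary keys, and the sample timestamps are invented for illustration, every packet is assumed to have been delivered, and both timestamp lists are assumed to come from synchronized clocks. Jitter is computed as the average absolute deviation from the mean latency, following the definition given in the text.

```python
from statistics import mean


def determinism_metrics(send_us, recv_us, payload_bits):
    """Compute throughput, latency and jitter from timestamps in microseconds.

    send_us[i] and recv_us[i] are the send and receive times of packet i.
    payload_bits is the size of each packet in bits (assumed identical).
    """
    if not send_us or len(send_us) != len(recv_us):
        raise ValueError("need one receive timestamp per send timestamp")
    latencies = [r - s for s, r in zip(send_us, recv_us)]
    mean_latency = mean(latencies)
    # Jitter: average absolute deviation of each latency from the mean latency.
    jitter = mean([abs(lat - mean_latency) for lat in latencies])
    duration_us = max(recv_us) - min(send_us)
    if duration_us <= 0:
        raise ValueError("measurement window must be longer than zero")
    throughput_bps = payload_bits * len(latencies) / (duration_us / 1_000_000)
    return {
        "throughput_bps": throughput_bps,
        "mean_latency_us": mean_latency,
        "worst_latency_us": max(latencies),  # the figure that matters for deadlines
        "jitter_us": jitter,
    }


# Four 512-bit packets sent every 10 ms; latencies alternate between 2 ms and 3 ms.
metrics = determinism_metrics(
    send_us=[0, 10_000, 20_000, 30_000],
    recv_us=[2_000, 13_000, 22_000, 33_000],
    payload_bits=512,
)
print(metrics)
```

In a test campaign, the worst-case latency and the jitter from such a calculation would be compared against the limits specified for the control application, with the measurement repeated under the prescribed traffic load.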
Performance tests for network components (switches) have shown that it is possible to provide real-time communication on the network domain making use of Quality of Service. [4]

CONCLUSION:

There is a strong technical rationale for adding requirements for mission critical control and automation system networks to be deterministic. Defining a minimum level of determinism, as well as techniques for evaluating and measuring the level of determinism pre-installation (simulation at the factory) and post-installation (dockside shipboard testing), should also be considered.

RECOMMENDATIONS:

(a) Add a requirement for mission critical vital automation systems to be deterministic in terms of throughput and system latency.
(b) Add a requirement that explains how to measure determinism for a given system.
(c) Add requirements for submittals to be prepared and submitted for review and approval in order to show objective evidence that the system will exhibit deterministic behavior (i.e. network performance studies).
(d) Add requirements for factory and shipboard testing to be conducted to demonstrate that the system exhibits determinism.

[4] Schneider Industrial Ethernet white paper, Oct 07, Improving Determinism of Industrial Applications Over Ethernet