UniGrids Streaming Framework: Enabling Streaming for the New Generation of Grids

Krzysztof Benedyczak 1, Aleksander Nowiński 2, Krzysztof Nowiński 2, and Piotr Bała 1,2

1 Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Chopina 12/18, PL Toruń, Poland
2 Interdisciplinary Center for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5a, PL Warsaw, Poland

Abstract. We present a new infrastructure for high-performance streaming in OGSA/WSRF-compliant grids. The UniGrids Streaming Framework (UGSF) works with UnicoreGS as the WSRF hosting environment. The paper discusses the advantages of combining SOAP-based control with highly efficient streaming. The UGSF components, the streaming server and the WSRF web service, are described along with a detailed performance analysis that includes a comparison to standard solutions. Some applications based on the UGSF are also presented.

1 Introduction

Current trends in grid technology are clearly focused on OGSA (Open Grid Services Architecture), which implies the use of web services. Detailed guidelines on how to build grid services are given by the WSRF specifications. The consensus on the importance of this approach has many reasons, of which interoperability is the most significant. The WSRF and related specifications allow developers to create grid software that is compatible with other WSRF implementations. Moreover, as web services technology is widely adopted in B2B applications, one can draw on existing experience and adopt available solutions. A good example is the BPEL specification, defined for business processes, which is now being used as a tool to define grid workflows. These reasons form a solid base for OGSA, which plays a vital role in grids nowadays. We cannot forget, however, that web services also have disadvantages. Here we focus on two of them that are crucial for data streaming.
The first and most important drawback of web services is efficiency. Web services technology is based on the SOAP protocol, which results in extensive use of XML. The obvious consequence is a large overhead for even a simple operation: the SOAP engine has to perform a lot of XML parsing and encoding. Moreover, XML data encoding is very verbose and therefore inefficient. In most streaming applications such data overhead is undesirable. The other problem is data streaming itself: SOAP is message driven, and an XML document must be fully read before it can be parsed (see footnote 1).

B. Kågström et al. (Eds.): PARA 2006, LNCS 4699, pp , © Springer-Verlag Berlin Heidelberg 2007

Because of these disadvantages, web services technology cannot be regarded as suitable for interactive, real-time applications. It is hard to imagine a scientist steering an interactive device when the latency of every operation is measured in seconds. Of course, various XML technologies such as binary encodings, streaming processing of XML, MTOM, or TCP bindings can be used to boost the performance of web services. We are sure that in some cases it is possible to build streaming technology on top of such optimised web services. The obvious advantage of such an approach is better integration with existing web services agents (such as the new UniGrids gateway), but little else. It is also clear that such solutions can be useful only for less demanding streaming applications.

To solve the problem we have developed a hybrid system: a platform for building any type of streaming service that is managed in a WSRF-compliant way. The solution is highly responsive and efficient.

2 System Design

The UGSF system is based on the WSRF-compliant version of the well-recognized UNICORE middleware. UnicoreGS is used as the WSRF hosting environment. The aim of the UniGrids Streaming Framework (UGSF) is to provide direct data streaming and steering for applications. The main part of UGSF is the UGSF core, a middleware that allows developers to create dedicated streaming solutions (see footnote 2). Substantial effort was made to prepare a system in which the creation of a specialized solution is as easy and quick as possible. Every system based on the UGSF uses the core together with some application-dependent code. The UGSF core provides the basic functionality common to all streaming applications, which includes the creation and shut-down of a connection.
The system is designed in such a way that a group of versatile software pieces can be reused. A good example is the component that locates a UnicoreGS job's working directory and accesses its contents. The UGSF core consists of a UGSF Web Service part, a Streaming Server part, and a library for creating clients. The use of the last component is optional. The UGSF Web Service takes advantage of WSRF capabilities. It is used to control the set of available stream types, to create new streams, and to manage already created ones. The Streaming Server part is managed by a UGSF Web Service and performs the streaming. The client library simplifies the creation of client-side software. The overall architecture is shown in Figure 1.

Footnote 1: It is worth noting that intensive efforts are currently under way to eliminate this issue, and hopefully new generations of SOAP engines (such as AXIS 2 or XFire, with support for StAX) will solve it.
Footnote 2: We use this term whenever we refer to the basic framework without the actual stream implementations, which are also included in the UGSF distribution.
Fig. 1. The general architecture of UGSF

The UGSF core is complemented with stream implementations. Each of these consists of two parts: a streaming server module and a web service module. The web service module implements the control operations specific to the stream implementation. The streaming server module deals with the wire streaming protocol and with data consumption and acquisition. The general pattern of UGSF usage is as follows:

- The UGSF installation is configured by the system administrator, who defines so-called stream types. Every stream type is one stream implementation with some configuration parameters (which cannot be modified by users).
- The user chooses a stream type and creates an instance of it. If the implementation stipulates that some user parameters are needed, the user must supply them. As a result, a reference to the newly created stream management WS-Resource is returned.
- The user can invoke any common (provided by the UGSF core) or special (defined by the stream implementation) operation on the WS-Resource assigned to the created stream. The resource properties contain, among other things, information about how to connect to the UGSF streaming server to start streaming.
- The user connects to the UGSF streaming server and starts the data transfer. The connection can be controlled via the web service interface.

The Streaming Server and the clients built for UGSF are grid-enabled. The UGSF can therefore be used to let legacy applications benefit from grid technology (e.g. grid authorization) by using already developed stream implementations. To complete the general overview of the UGSF, we present details about the underlying transport-level protocol. In principle, the UGSF is highly flexible and
can be used with any application-level protocol. At the transport level, however, only TCP is currently supported. This decision was motivated by several factors. Our first aim was to support tunneling of streamed data through the UNICORE gateway, which can operate only on TCP connections. Another reason is that typical use of grid middleware requires highly secure and reliable connections (e.g. scientific applications that stream video must not lose any frames, in contrast to typical multimedia scenarios where such loss is acceptable). This is much easier to implement in a general framework based on TCP/TLS. Nevertheless, UDP entry points may be added in future versions of the UGSF. This will involve some redesign of the UGSF Streaming Server.

2.1 UGSF Web Service

The UGSF Web Service component consists of two kinds of web services. The base one (called StreamingFrameworkService) is responsible for connection authorisation and for the creation and setup of a stream. During this process a new WS-Resource (called StreamManagementService) is created with a dedicated web service interface. This WS-Resource acts as the controller of an active streaming connection. The StreamingFrameworkService is a WS-Resource that maintains the list of StreamManagementServices. It could be argued that this is a perfect case for the WSRF Service Group, which is a federation of WS-Resources. Unfortunately, the Service Group cannot be used here because of security restrictions: the WSRF specification does not permit filtering of a Service Group's content, so every user would be able to see other users' streams. The StreamingFrameworkService allows users to get a list of available streaming services and to set up a connection to a specific streaming service. A list of both owned and accessible streams is available (see section 2.3 for details).
In addition, the StreamingFrameworkService has an administrative interface, which lets a system administrator enable and disable particular stream types on the fly. Service reconfiguration, such as the addition or removal of stream types, is also possible. For each created stream, an instance of the StreamManagementService allows the user to perform the operations that are universal to all streams. These include shutting the stream down (by means of the WS-Lifetime interface), getting the status and statistics of the connection, and pausing or resuming streaming. This functionality can easily be enriched by the developer, who can extend the StreamManagementService with additional operations. The enriched implementations are free to consume any special XML configuration that is supplied to the StreamingFrameworkService and required for service setup and creation. We have also developed an additional service called StreamingFrameworkFactory, which allows site administrators to create base UGSF services and to configure them initially. The developed service follows the pattern of the UniGrids atomic services.
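The division of work between the two services can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: it does not use WSRF or SOAP, the method names (`set_type_enabled`, `create_stream`, `list_streams`, `pause`, `resume`, `destroy`) are hypothetical, and only the two class names are taken from the text. For brevity the listing is filtered by owner only, whereas the real service also reports streams that are accessible to the caller.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class StreamManagementService:
    """In-memory stand-in for the per-stream WS-Resource (no real WSRF here)."""
    stream_id: int
    stream_type: str
    owner: str
    status: str = "running"

    def pause(self):
        self.status = "paused"

    def resume(self):
        self.status = "running"

    def destroy(self):
        # Stands in for shutting the stream down via the WS-Lifetime interface.
        self.status = "destroyed"


@dataclass
class StreamingFrameworkService:
    """Toy model of the base service: enabled stream types plus created streams."""
    enabled_types: set = field(default_factory=set)
    _streams: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def set_type_enabled(self, stream_type, enabled):
        # Administrative operation: switch a stream type on or off on the fly.
        if enabled:
            self.enabled_types.add(stream_type)
        else:
            self.enabled_types.discard(stream_type)

    def create_stream(self, user, stream_type):
        if stream_type not in self.enabled_types:
            raise ValueError(f"stream type {stream_type!r} is not enabled")
        stream = StreamManagementService(next(self._ids), stream_type, user)
        self._streams[stream.stream_id] = stream
        return stream

    def list_streams(self, user):
        # Per-caller filtering, which a plain WSRF Service Group would not provide.
        return [s for s in self._streams.values()
                if s.owner == user and s.status != "destroyed"]
```

In this model a user who calls `list_streams` never sees streams created by somebody else, which is the property that motivated keeping the list inside the StreamingFrameworkService.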
Fig. 2. Services and modules of UGSF components

2.2 UGSF Streaming Server

The UGSF Streaming Server is a stand-alone, modular application that performs streaming to and from the target system. The server is tightly connected to the UGSF Web Service, which maintains the stream definitions. The communication between them is done with Java RMI. The server is modular and highly configurable. Dedicated modules were created to access the actual streaming data source. Such a module gives access to the output of a running job and can also provide the job with input, if required. On the other hand, there are stream modules that do not need any job to cooperate with. A module that gives access to a physical resource such as a video camera is a good example. Another is a module that enables grid usage of legacy TCP or UDP servers. There is also a whole class of auxiliary modules that act without any external resources. These modules, for example, convert input data from one format to another.

For a particular site there can be more than one Streaming Server, all operated by a single UGSF Web Service. Each server is able to handle multiple stream modules. Another setup is also possible: a single Streaming Server can be managed by more than one UGSF Web Service. However, such a scenario is of little practical value.

Access to the UGSF Streaming Server is accomplished with a special protocol. Currently the protocol is trivial, but it may be developed into a more complex one when new features are needed. Access to the Streaming Server is provided by exchangeable entry modules. More than one entry module can be turned on simultaneously. Every stream can be accessed through many entry protocols, and the application can choose the one it prefers or understands.
Currently, HTTP and HTTPS entry points with simple POST-based protocols are available (in fact, there is a single entry point that can be configured to use TLS or not). The system is ready to support other protocols as well.
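The choice between several enabled entry protocols can be sketched as a simple preference match. The function name `choose_entry_point` and the protocol labels are hypothetical; the text does not specify how a client learns which entry points a stream offers, so the sketch simply assumes both lists are already known.

```python
def choose_entry_point(offered, understood):
    """Return the first client-preferred protocol that the stream offers.

    offered    -- entry protocol names enabled on the server for a stream
    understood -- protocol names the client supports, most preferred first
    """
    offered_set = set(offered)
    for name in understood:
        if name in offered_set:
            return name
    raise LookupError("no common entry protocol")
```

A client that prefers TLS but can fall back to plain HTTP would call `choose_entry_point(["http", "https"], ["https", "http"])` and connect through the HTTPS entry point.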
2.3 Advanced Features and Security

In addition to the basic infrastructure for creating streaming connections, the UGSF provides a set of advanced stream-related operations. These operations focus on the creation of sophisticated data flows. By the term data flow we mean here a composition of one or more streams between servers and/or clients, created for one application.

Every stream implementation can contain more than one flow. A flow is a synonym for a connection: if one stream maintains three flows, then it is possible to open three concurrent connections to this stream. This provides an opportunity to create more advanced streams with a clear separation of logical flows of data, including the separation of input and output.

UGSF streams have metadata attached. For every flow the supported formats, among other things, are defined. It is possible to specify more than one format for a single flow, as well as to state which format combinations are supported. Any flow can carry two-way traffic, but it is suggested that a flow should be used only for input or only for output whenever possible (that is, it should be one-directional). When two-way traffic is required, two flows are preferred. Streams designed this way are much easier to integrate into data flows.

In order to enable the composition of data flows other than trivial ones (i.e. client-server), UGSF offers a connect operation. It instructs an already created flow of one stream to leave its passive state and to actively initiate a connection to another flow. It is also possible to create a flow with cloning ability. Such a flow can be used to dynamically create new flows in a stream implementation. A good example of the cloning feature is a multiplexer, which basically manages two flows, the input and the output. The output flow has cloning ability, and the user can clone it multiple times. As a result, the user can fork the input into an arbitrary number of outputs.
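The multiplexer described above can be modelled in a few lines. This is a minimal sketch, not the UGSF implementation: the class and method names are hypothetical, and each output flow is represented by an in-memory buffer instead of a network connection.

```python
from collections import deque


class Multiplexer:
    """Toy multiplexer: one input flow forked into cloned output flows."""

    def __init__(self):
        self.outputs = []

    def clone_output(self):
        # Each clone is an independent buffer standing in for one connection.
        flow = deque()
        self.outputs.append(flow)
        return flow

    def feed(self, chunk):
        # Copy every input chunk to all output flows that exist right now.
        for flow in self.outputs:
            flow.append(chunk)
```

In this sketch an output that is cloned after streaming has started receives only the chunks that arrive after the clone was made; whether a real stream implementation replays earlier data is a design choice the text does not prescribe.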
Up to now we have not covered one significant aspect of the UGSF system: security. The main question here is: what are the requirements for opening a connection to the Streaming Server? The simplest approach is to allow access to the stream only to its owner. Unfortunately, such a method is not sufficient for more complicated scenarios, such as server-server connections. As an example, consider a data flow in which server A is the source of data. This data should be processed by server B, and finally the output should be received by the user U. If U creates the appropriate stream instances on A and B, B will not be allowed to access A's stream; only U will be.

To solve this problem, every flow is assigned a token, the identity of its owner, and an access policy. The token is a large unique number. The access policy is defined by the creator of the stream and describes who is authorised to contact the flow. The default policy is owner-only. In this case only a user with a certificate matching the flow owner's certificate can open a connection. The user also has to present the flow token for identification purposes. Please note that the token value is not sensitive, as it is valid only after a connection is established
using a valid certificate. A policy can also allow public (unrestricted) access, or allow an explicitly specified entity to access the stream.

In the matter of UGSF security we still have some work to do. We would like to provide XACML support for policy description. Some form of trust delegation should also be supported to achieve better integration with standard grid trust delegation (but this is a matter of better system cohesion).

3 Applications

The UGSF system includes several basic stream implementations. The first one is TCPStream, which can be seen as a grid version of an SSH tunnel. It has functionality similar to such a tunnel, with the obvious advantage that users do not need shell access to the grid site. Moreover, they are authorised in the same way as for any other UniGrids service. Using the client software for creating TCP tunnels that is available in the UGSF package, we have used UGSF to steer an Advantech ADAM/5000TCP device with an existing application. The ADAM/5000TCP is a Modbus Ethernet device. The UGSF is very useful here because Modbus Ethernet devices are in general insecure and must be protected by firewalls. This example shows that, with UGSF, a whole range of Ethernet devices can be secured and grid-enabled merely by adding a few lines to the UGSF configuration file.

The TCPStream is accompanied by UDPStream, which performs a task that is not widely available in other software. It tunnels UDP datagrams over the TCP protocol, maintaining UDP sessions in a manner similar to that used by firewalls (packets are scanned for changes of destination ports).

The UGSF also implements a FileStream. It serves as a streaming version of a file access service. The FileStream is able to detect file growth and allows file content to be streamed as new data is appended. Clearly, this solution is targeted at receiving the results of an arbitrary simulation in real time.
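The file-growth detection that FileStream relies on can be approximated by remembering the last offset that was sent and reading only what was appended since. The sketch below is an assumption about one simple way to do this, not the UGSF code; the function name `read_new_data` is hypothetical, and it does not handle files that are truncated or replaced.

```python
import os


def read_new_data(path, offset):
    """Return (data, new_offset) for bytes appended to `path` since `offset`."""
    size = os.path.getsize(path)
    if size <= offset:
        # Nothing new yet; the caller may poll again later.
        return b"", offset
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size - offset)
    return data, offset + len(data)
```

A streaming loop would call this function periodically, forward any non-empty `data` to the client, and carry `new_offset` into the next call.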
To monitor grid job results there is another stream, called IVisStream, which is a simple extension of FileStream. In addition to the FileStream features, it supports locating the files produced by a given grid job.

Currently we are working on more universal stream types that will support data flow creation (e.g. the Multiplexer stream for splitting an arbitrary flow into multiple ones) and on generic support for video streaming, which is a necessity for many streaming applications. We chose Theora as our native codec. Streams to compress raw video (and to decompress it) will be available shortly.

4 Performance

During the design of the UGSF, our aim was not to introduce any penalty on throughput (except for the enforcement of TCP and, in most cases, the use of SSL). This was achieved because, after stream setup, the developer can use an arbitrary protocol on the open socket connection. The UGSF core does not add any extra data to the
opened stream. To check the performance and to see how tunneling over a particular stream affects it, we ran performance tests on the TCPStream. For the tests we used a netcat TCP session and compared a direct connection to one tunneled via TCPStream. Two machines running the Fedora Core 4 operating system (kernel version FC4) were used. The systems were interconnected with 100 Mbit Ethernet. The server machine (B) was equipped with an Intel Xeon 2.4 GHz CPU and the client machine (A) with an AMD Athlon CPU.

Fig. 3. The performance of a plain netcat stream versus netcat over a UGSF TCP tunnel (over a 100 Mbit network)

As shown in Figure 3, the plain UGSF tunnel performs nearly the same as the plain netcat connection. The SSL version is, of course, slightly slower, but the difference is still tiny and acceptable in most usage scenarios. We have also looked at the CPU usage reported by the Streaming Server. The server consumed less than 15% of the CPU time on A and about 2-3% more CPU time on machine B. When SSL was turned on, the CPU-intensive encryption increased the utilization of the systems to about 50 and 55 percent, respectively. To summarize, the results are promising: in practice there is nothing to optimize. The CPU operations on recent hardware do not affect the throughput of 100 Mbit streams, and there is still some CPU power left. Moreover, the CPU-intensive operations are mostly those coming from the SSL socket implementation of the Java toolkit.

The most interesting topic is the comparison of operations invoked by means of standard web service calls with an analogous system based on the UGSF. Web services operate exclusively on messages. The AXIS 1.x environment used in the UniGrids project limits them to complete message exchanges instead of real streaming. In theory, one-way web services could resolve this issue. However, this is problematic from the server side when a client is behind a firewall or NAT. Some progress could be made by using HTTP 1.1 persistent connections, but currently this (along with other needed functionality) is not available in AXIS 1.x.

Table 1. Performance comparison of the UGSF and a web service based implementation. The data was sent in small chunks in two directions. RQ stands for request and RS for response. The third column contains the total number of full message exchanges (i.e. sending a request and receiving a response) per second.

Message sizes        Implementation           (RQ + RS) per second   Relative speedup
RQ 16B/RS 10kB       web service/unicoregs
RQ 10kB/RS 10kB      web service/unicoregs
RQ 16B/RS 10kB       UGSF/Java DataStream
RQ 10kB/RS 10kB      UGSF/Java DataStream

In order to run the comparison tests we developed a trivial UnicoreGS service with only one operation, which consumes and returns a configurable amount of raw data (Base64 encoded). The results of running a series of operations on this service are given in the first two rows of Table 1. A small client-server application was also prepared to test the UGSF version. It was used through the UGSF TCPStream, with Java DataStream as the internal protocol. As can be seen from the last column of Table 1, the speedup is more than 20. In fact, this is the minimal performance gain. In reality UGSF can be used much more effectively: by implementing a specialised UGSF stream type, the two extra data hops introduced by the generic TCPStream and its client can be eliminated.
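One component of the web service overhead, the encoding size, can be estimated directly. The sketch below compares a Base64 payload wrapped in a minimal SOAP-style envelope with the same payload sent as a 4-byte big-endian length prefix followed by raw bytes, which is roughly what a Java `DataOutputStream.writeInt` followed by a raw write would produce. The envelope layout, the `data` element name, and both function names are assumptions made for illustration; they are not the messages used in the tests, and the sketch does not measure parsing or HTTP costs.

```python
import base64
import struct


def soap_like_message(payload: bytes) -> bytes:
    """Wrap the payload, Base64 encoded, in a minimal SOAP-style envelope."""
    body = base64.b64encode(payload).decode("ascii")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soapenv:Body><data>" + body + "</data></soapenv:Body>"
        "</soapenv:Envelope>"
    ).encode("utf-8")


def framed_message(payload: bytes) -> bytes:
    """Prefix the raw payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_framed(message: bytes) -> bytes:
    """Recover the payload from a message produced by framed_message."""
    (length,) = struct.unpack(">I", message[:4])
    return message[4:4 + length]
```

For a 10 kB payload the framed message is 4 bytes longer than the data, while Base64 alone inflates it by about one third before any XML is added. Size therefore explains only a small part of the measured difference; most of it comes from message processing on both sides.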
Moreover, in many cases streaming applications can benefit from parallel streaming, whereas in the tests we used synchronised message exchanges. The test results were obtained on the same machines as above. We would like to mention that there is a lot to improve in the web service version too. A better SOAP engine (e.g. AXIS 2), and the use of its features, can give a substantial performance boost. UnicoreGS is also still in development, and no optimisations have been made yet.

5 Summary

The presented development is focused on various applications in which UGSF will have the opportunity to prove its value. We are considering device access and remote steering, video transmission, and scientific image processing. Of course, visualisation and real-time monitoring of computations are also of interest, as has already been presented for the UNICORE middleware. The UGSF includes two stream implementations that allow tunneling of connections to both TCP and UDP legacy servers on the grid site. Services to stream the changing content of grid jobs in real time are also ready to be used. Support for data flow creation encourages the use of UGSF in a component-driven way, where already created
stream implementations are reused in larger applications. In general, the developed infrastructure opens the field to numerous applications that require on-line data streaming and steering.

This work was supported by European Commission under IST grant UniGrids (No ).

References

1. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project (2002)
2. Czajkowski, K., Ferguson, F.D., Foster, I., Frey, J., Graham, S., Sedukhin, I., Snelling, D., Tuecke, S., Vambenepe, W.: The WS-Resource Framework, Version 1.2. OASIS (Organization for the Advancement of Structured Information Standards) (April 2006), resource-1.2-spec-os.pdf
3. Andrews, T., Curbera, F., Dholakia, H., Goland, Y., Klein, J., Leymann, F., Liu, K., Roller, D., Smith, D., Thatte, S., Trickovic, I., Weerawarana, S.: Business Process Execution Language for Web Services, Version 1.1. OASIS (2003)
4. Axis 2 project (August 2006)
5. XFire project (August 2006)
6. Gudgin, M., Mendelsohn, N., Nottingham, M., Ruellan, H.: SOAP Message Transmission Optimization Mechanism, W3C Recommendation (January 25, 2005)
7. UniGrids project website (August 2006)
8. UNICORE at SourceForge (August 2006)
9. Godik, S., Moses, T.: eXtensible Access Control Markup Language, Version 1.1. OASIS Committee Specification (August 07, 2003), cs-xacml-specification-1.1.pdf
10. Modbus Organization, Inc. and Schneider Automation Inc.: MODBUS Application Protocol Specification, vol. 1.1 (August 2006)
11. Fielding, R., et al.: RFC Hypertext Transfer Protocol HTTP/1.1, Section 8.1: Persistent Connections. The Internet Society (1999)
12. Theora I Specification. Xiph.org Foundation (March 7, 2006), I spec.pdf
13. Bała, P., Benedyczak, K., Nowiński, A., Nowiński, K.S.: Real-time visualisation for Unicore middleware. In: Wyrzykowski, R., Dongarra, J.J., Meyer, N., Waśniewski, J. (eds.) PPAM LNCS, vol. 3911, pp Springer, Heidelberg (2006)