Centralized logging system based on WebSockets protocol

Radomír Sohlich sohlich@fai.utb.cz
Jakub Janoštík janostik@fai.utb.cz
František Špaček spacek@fai.utb.cz

Abstract: The era of distributed systems and mobile devices brings new challenges in monitoring and controlling remote components. Components are usually monitored through log records. To obtain a comprehensive view of a distributed system, the logged information usually has to be centralized. There are many centralized logging solutions, such as Syslog, Graylog2, Logstash or the cloud service Loggly, that gather log messages and data from remote components and devices. These solutions are generally based on one-way data transfer from client to server. The simplest solutions synchronize log files to obtain data from remote components. More sophisticated solutions periodically read a web service of the remote system or expose other protocol endpoints, such as the syslog protocol. This paper proposes a centralized logging solution based on WebSocket technology. Section 4 describes the features, architecture and communication scheme. Section 5 compares the proposed solution with existing applications. Section 6 discusses future work and enhancements of the proposed system.

Key Words: Centralized logging, Log4j, WebSockets, Syslog, Graylog2

1 Introduction

The era of emerging software with distributed architecture highlights the difficulty of monitoring and analyzing the functionality of remote components. The simplest way to track the behavior of system elements is to log their operations at runtime. These data provide a record of the program flow as well as information describing a system failure or malfunction. The trivial logging solution is to write the data to local storage. This is sufficient if the whole system is located on the same machine.
The problem occurs if the system is spread across multiple devices and the components write logs to local files. In this case the information about the behavior of the whole system is located in separate files, which must be merged and analyzed. There are two general approaches to the problem of distributed system logging. Both are based on centralizing the information on a single machine; they differ in how the data are collected.

In the first approach, the components of the system write log records to their local storage and a subsystem periodically synchronizes its log storage with the remote component. Alternatively, the log server sends a request to a specified source and receives the information from that source. The shortcoming of this solution is that the entire log file needs to be synchronized and the log records are not available in the centralized component in real time.

The second approach is based on exposing a receiver that communicates over a specific protocol. The remote component then sends the log messages directly to the log server. Alternatively, the remote component could contain a thick logging client, which connects directly to remote storage (e.g. a database). In both approaches the log server is just a passive receiver of log data.

This paper proposes an experimental implementation based on the second approach with some enhancements in server functionality. The solution is built on lightweight WebSocket technology, NoSQL storage and a Java application server. The main improvement is that WebSocket communication is used not only to send log data, but also to control the client settings and functionality.

Organization of paper

Section 3 describes the requirements and gives a general description of the solution. Section 4 describes the architecture and technologies. Section 5 contains a comparison with another centralized logging solution. Section 6 summarizes the results of testing and discusses future work.
ISBN: 978-1-61804-262-0 103
2 Related work

This problem area is well covered, so existing solutions were studied (mostly implementations for the Java platform). There are some widely used systems and libraries that use one of the approaches mentioned above.

Graylog2 [11] is a log capture and analysis tool. It has flexible input types, including syslog, plaintext and GELF. Additionally, it is able to read from an HTTP API. Graylog2 uses MongoDB [12] as storage and Elasticsearch [13] to analyze and search through the log records.

Another related solution is Syslog-ng [14]. It supports a client-server mode, which is based on configuring one instance of Syslog-ng on the client machine to transfer log messages to the server machine through a specified channel (e.g. a UDP or TCP connection or the syslog protocol). A syslog-ng driver can also be used to write messages directly to remote storage (e.g. SQL or NoSQL storage). Syslog-ng does not provide a log analysis tool; this feature must be provided by a third-party tool.

Logback brings a concept very similar to Graylog2, but does not provide complete functionality for log analysis. To store and analyze logs, Elasticsearch must be integrated.

Loggly is a commercial cloud service, commonly described as logging as a service. The service is capable of gathering logs from every popular programming language or platform, and the data can be sent using almost every protocol (Syslog TCP, Syslog UDP, Syslog TCP w/ TLS, or HTTP/S). The disadvantage of this solution is that the system must be connected directly or indirectly (over a proxy) to the Internet.

3 Requirements

The analysis of the related projects reveals the main requirements for the proposed implementation:
- multi-platform (Linux, Windows) server solution
- flexible NoSQL storage for log records
- user friendly web interface
- access through REST API
- client transfer protocol widely supported across commonly used programming languages
- lightweight client implementation
- easily implementable message format
- simple configuration from server side
- open-source

4 Architecture

The high-level architecture is very simple and is based on the client-server model [2]. The server side consists of an application that receives and processes logs, an application server and a persistence layer.

Fig. 1: Architecture design

4.1 Communication

As the solution requires communication in both directions (client to server, server to client), a suitable technology had to be selected. To ensure simplicity and versatility, a web-based protocol is preferred. Many two-way communication protocols have been designed that use the HTTP transport layer to benefit from the existing infrastructure (authentication, secure transport, proxies). However, these protocols are tradeoffs between efficiency and reliability, as HTTP was not originally designed for bidirectional communication [6]. The WebSocket protocol was designed to replace these tradeoffs. The protocol works over the existing HTTP infrastructure and is designed to use the standard ports 80 and, for secure transport, 443. After micro-benchmarks comparing the HTTP alternatives with the WebSocket protocol, WebSocket technology was selected. One of the advantages of the WebSocket protocol is that it uses a single TCP connection for the communication and avoids repeatedly opening connections, which would reduce performance. Like basic HTTP, the WebSocket protocol has wide support across programming platforms, so clients can be implemented for various platforms.

The log messages are JSON formatted and sent in WebSocket text frames to and from the client. The JSON format was chosen for its flexibility and its support in many programming languages.
The JSON log
message contains all the standard fields common in logging. There is also a field for an arbitrary object to be logged. This feature simplifies data-mining operations on log records.

The remote reconfiguration of a logging client is implemented via text frames in a special format different from the standard log message. These frames travel in the opposite direction, from server to client. The idea behind this feature is that the server can remotely control the settings of each client, such as the log level or the identification of the component.

The communication scheme in Fig. 2 shows the entire process of establishing a connection and exchanging messages. After the WebSocket handshake, the server sends an initial configuration message to the client, which contains the log level (in this case FINE) and the identification of the component, if it is preconfigured by the log server admin. After this exchange, the client sends log messages of the appropriate level. The reconfiguration message shows how the log level is set remotely (in this case to INFO).

Fig. 2: Communication scheme

4.2 Server

The server part is a Java Enterprise application running on the Wildfly [3] application server. The application implements a WebSocket endpoint for logging clients. The server contains the remote control logic, a user interface and an additional REST API that gives access to the functionality designed for log analysis and remote control of clients. The user interface consists of client configuration, log analysis and a search engine.

The persistence layer is based on the MongoDB NoSQL database. It was chosen for its flexibility and because it can be easily integrated with advanced indexing, searching and analysis tools (e.g. Elasticsearch, Kibana, Hadoop). The solution moves the logic of writing and processing log messages to the server side. The server implementation uses the MongoDB Java driver to write logs and to process the log messages asynchronously. Asynchronous writing increases throughput.
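The message exchange described in section 4.1 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java implementation. The paper does not specify the wire format, so every field name (`type`, `level`, `component`, `payload`), the `LoggingClient` class and the component name `billing` are assumptions made for illustration; the numeric level values follow `java.util.logging`, which is also an assumption based on the level names FINE and INFO used above. The WebSocket transport itself is left out.

```python
import json
import time

# Assumed level ordering, taken from java.util.logging (FINE < INFO < WARNING < SEVERE).
LEVELS = {"FINE": 500, "INFO": 800, "WARNING": 900, "SEVERE": 1000}


class LoggingClient:
    """Hypothetical client-side state; sending frames over WebSocket is omitted."""

    def __init__(self):
        self.level = "INFO"    # threshold used until the server sends a configuration
        self.component = None  # identification assigned by the log server

    def handle_server_frame(self, text):
        """Apply a configuration text frame sent from server to client."""
        frame = json.loads(text)
        if frame.get("type") == "config":  # assumed discriminator field
            self.level = frame.get("level", self.level)
            self.component = frame.get("component", self.component)

    def build_log_frame(self, level, message, payload=None):
        """Return a JSON text frame, or None if the level is below the threshold."""
        if LEVELS[level] < LEVELS[self.level]:
            return None
        record = {
            "type": "log",
            "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
            "level": level,
            "component": self.component,
            "message": message,
        }
        if payload is not None:
            record["payload"] = payload  # arbitrary JSON-serializable object
        return json.dumps(record)


client = LoggingClient()
# Initial configuration after the handshake: level FINE and a component name.
client.handle_server_frame('{"type": "config", "level": "FINE", "component": "billing"}')
sent = client.build_log_frame("FINE", "cache warmed", {"entries": 42})
# Later reconfiguration: the server raises the threshold to INFO.
client.handle_server_frame('{"type": "config", "level": "INFO"}')
suppressed = client.build_log_frame("FINE", "no longer sent")
```

In this sketch the FINE record is serialized while the client is configured for FINE and is suppressed once the server switches the client to INFO, which mirrors the sequence shown in the communication scheme.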
4.3 Client

Thanks to WebSocket technology, the client can be implemented in various languages (C++, .NET, Java, JavaScript, Python and others). The experimental client is implemented in the Java programming language using the Jetty WebSocket Client API implementation. Serialization of LogMessage is implemented with the Jackson library. If the connection to the log server is not available, the client caches records, and once the connection is established again, it sends all cached logs to the server.

5 Comparison

To test the proposed implementation against an existing solution, the log4j2 NoSQL appender was chosen as the nearest matching solution. This comparison measures the performance of the logging clients, where the log4j2 NoSQL appender uses the MongoDB Wire Protocol to transfer serialized messages. The custom client uses the WebSocket protocol as described above. The methodology of the comparison is as follows:

- create a logger object
- insert k log records (text logs, logs with exception)
- measure the duration of the insert operation

The benchmark is also implemented as a Java application, as Log4j2 is a Java library. The measurements were carried out on an empty database collection and in separate runs. Every measurement was repeated 40 times. The insertion of 1000 log records was chosen as the most representative sample size, considering that a common application does not insert more than hundreds of logs through one Logger instance.

When there is no additional object (exception) to serialize, the proposed solution shows a higher average time to insert 1000 logs. Figure 3 compares the average duration of inserting 1000 info log messages. The value measured for the experimental implementation is close to that of the log4j2 appender. Figure 4 displays the average duration of inserting 1000 log records containing an exception object. In this case the experimental implementation achieved a lower time.
This result is caused by the simpler implementation of exception serialization and also by moving the persistence operations to the server. Also, the average duration is nearly constant.
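The measurement procedure described in this section (create a logger, insert k records, measure the duration, repeat) can be sketched as a small timing harness. This is an illustrative Python sketch under stated assumptions, not the Java benchmark used for the reported figures: the function names (`make_record`, `run_once`, `benchmark`, `make_sender`) are hypothetical, and an in-memory list stands in for the log server so that the example runs without a network or database.

```python
import json
import statistics
import time
import traceback


def make_record(i, with_exception):
    """Build one log record; optionally attach a formatted exception."""
    record = {"level": "INFO", "message": f"log record {i}"}
    if with_exception:
        try:
            raise ValueError(f"failure {i}")
        except ValueError:
            record["exception"] = traceback.format_exc()
    return record


def run_once(send, k, with_exception):
    """Time the serialization and sending of k records; returns seconds."""
    start = time.perf_counter()
    for i in range(k):
        send(json.dumps(make_record(i, with_exception)))
    return time.perf_counter() - start


def benchmark(make_sender, k=1000, repeats=40, with_exception=False):
    """Repeat the measurement and return (mean duration, all durations)."""
    durations = []
    for _ in range(repeats):
        send = make_sender()  # fresh logger and empty storage for every run
        durations.append(run_once(send, k, with_exception))
    return statistics.mean(durations), durations


# Stand-in for the log server: an in-memory list that is emptied before each run.
sink = []


def make_sender():
    sink.clear()
    return sink.append


mean_plain, runs_plain = benchmark(make_sender, with_exception=False)
mean_exc, runs_exc = benchmark(make_sender, with_exception=True)
```

The defaults `k=1000` and `repeats=40` mirror the values stated above. Swapping `make_sender` for a factory that returns a real client's send function would time that client instead of the in-memory stand-in.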
Fig. 3: Average duration of inserting 1000 logs without exception

Fig. 4: Average duration of inserting 1000 logs with exception

6 Conclusion and future work

As described in the paper, there is a wide array of centralized logging solutions, from the simplest file replication to sophisticated cloud services like Loggly. We proposed a centralized logging system that benefits from the WebSocket protocol as a widely supported solution for bidirectional communication. The protocol also uses existing infrastructure. The experimental solution is based on the Java platform, but clients could be implemented in other programming languages.

The solution was compared with the existing Log4j2 NoSQL appender. The benchmark shows that even the non-optimized version of the implementation is comparable to the existing, widely used Log4j2 appender. The tests also show that the time to send a log record remains stable when the log record contains an exception object.

On the other hand, the comparison also reveals that there is room for optimization. The serialization of log messages could be improved, as it is a performance bottleneck of the whole system. For Java, Python and C++, Protocol Buffers [15] could be a solution, but using this technology would eliminate the versatility of the message format. There are also new opportunities to explore in remote configuration and control of client functionality. In the proposed system the reconfiguration of the log level and the component name is implemented, but further attributes and even remote functions could be added, e.g. gathering information about the remote system (utilization, source usage), depending on the client platform.

References:
[1] RFC6455. The WebSocket Protocol. Internet Engineering Task Force (IETF), 2011. Available from: https://tools.ietf.org/html/rfc6455
[2] BERSON, Alex. Client-server architecture. McGraw-Hill, 1992.
[3] Wildfly [online]. 2013 [cit. 2014-10-29]. Available from: http://wildfly.org/
[4] Mozilla Developer Network: WebSockets [online]. 2014 [cit. 2014-10-29]. Available from: https://developer.mozilla.org/en-US/docs/WebSockets
[5] Qt Project: Qt WebSockets C++ Classes [online]. http://qt-project.org/doc/qt-5/qtwebsocketsmodule.html
[6] RFC6202. Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP. University of Rome Tor Vergata: Internet Engineering Task Force (IETF), 2011. Available from: https://tools.ietf.org/html/rfc6202
[7] CROCKFORD, Douglas. The application/json media type for JavaScript Object Notation (JSON). 2006.
[8] ABUBAKAR, Yusuf; ADEYI, ThankGod S.; AUTA, Ibrahim Gambo. Performance Evaluation of NoSQL Systems using YCSB in a resource Austere Environment. Performance Evaluation, 2014, 7.8.
[9] The State of Logging in Java 2013. In: VAN CAMP, Balder. Zeroturnaround [online]. 2013 [cit. 2014-10-29]. Available from: http://zeroturnaround.com/rebellabs/the-stateof-logging-in-java-2013/
[10] APACHE SOFTWARE FOUNDATION. Apache Log4j 2 [online]. 2014 [cit. 2014-10-29]. Available from: http://logging.apache.org/log4j/2.x
[11] TORCH GMBH - THE GRAYLOG2 COMPANY. GRAYLOG2 [online]. http://www.graylog2.org/
[12] MONGODB, Inc. MongoDB [online]. http://www.mongodb.com/
[13] ELASTICSEARCH BV. Elasticsearch [online]. http://www.elasticsearch.org
[14] BALABIT IT SECURITY. Syslog-ng: The Foundation of Log Management [online]. 2014 [cit. 2014-10-29]. Available from: http://www.balabit.com/networksecurity/syslog-ng
[15] GOOGLE, Inc. Google Developers: Protocol Buffers [online]. 2014 [cit. 2014-10-29]. Available from: https://developers.google.com/protocol-buffers