Scalable Logging Solutions on Cloud

Saurabh Phaltane*, Anand Nahar** & Nikhil Garge***
*Amazon and Linux Admin, Cloud, e-zest Solutions, Pune, Maharashtra, INDIA. E-Mail: saurabh.phaltane{at}gmail{dot}com
**Subject Matter Expert, Amdocs, Pune, Maharashtra, INDIA. E-Mail: andy9391{at}gmail{dot}com
***Software Engineer, e-zest Solutions, Pune, Maharashtra, INDIA. E-Mail: nikhilgarge007{at}gmail{dot}com

Abstract: Centralized logging from scalable servers on cloud, for estimating, analyzing and predicting performance, is a need of the day for system administrators: aggregating logs and running advanced analytics over them in real time is a major tool for tracking and debugging critical issues in such environments. The paper revolves around the challenges of centralized logging, the different approaches to it, and our experience of implementing logging solutions in Amazon VPC environments, applicable to cloud environments in general. It discusses the approaches that can be used to aggregate logs in cloud environments and presents an overview of architecture designs for doing so. It also presents a case study of configuring and optimizing Graylog2 on cloud under high-volume logging conditions.

Keywords: Analytics; AWS; Centralized Logging; Cloud Computing; ElasticSearch; Graylog; Logstash; Optimization; Rsyslog; VPC.

Abbreviations: Amazon Web Services (AWS); Transmission Control Protocol (TCP); Virtual Private Cloud (VPC).

I. INTRODUCTION

The era of cloud and cloud computing has put forth new metrics of agility and scalability. The targeted efforts and energy investments of sys admins have taken a significant new direction, with developers and administrators adopting new approaches to managing and monitoring applications on cloud.
Scalability and elasticity bring a new set of challenges to be addressed, and log analytics, for the purpose of breaking down an actual event and determining the root cause of an application performance problem, still maintains its central role. Shipping logs from different scalable environments to a centralized location is a long-standing concern, and sys admins expect cleaner, faster and easier modules for analytics over the collected logs in order to predict, analyze and enhance application performance. According to Rajiv Bhandari & Nitin Mishra (2011), cloud computing is above all a service-oriented model, stored information is exploding (total storage growth of about 54%), and large scientific computations in fields such as medicine, forecasting and healthcare demand ever faster processing capacity.

II. SCOPE

The scope of the paper revolves around the different challenges of and approaches to collecting and analyzing logs in scalable cloud environments, where agility and reliability are the major concerns. The paper puts forth different approaches to centralizing logs, presents the case of Graylog, and reports the test results obtained after optimizing the logging solution in a high-volume logging environment.

III. UNDERSTANDING THE DIFFERENT LOGS

The list below highlights the different logs found in Linux environments and the information each provides, as described by Ramesh Natarajan (2014).
1. /var/log/messages: General system messages generated from system boot onwards, which may include kern, auth, mail, etc.
2. /var/log/dmesg: Information about the hardware devices that the kernel detects during the boot process.
3. /var/log/auth.log: User authorization and access-level permission logs.
4. /var/log/boot.log: Information logged when the system boots.
5. /var/log/daemon.log: Logs written by the daemons running in the background.
6. /var/log/dpkg.log: dpkg package-installer logs generated during package installation, update and removal.
7. /var/log/kern.log: Kernel-level logs, a crucial piece in debugging kernel-level issues.
8. /var/log/lastlog: The most recent login information for all users.
ISSN: 2321-2381 2014 Published by The Standard International Journals (The SIJ) 208
9. /var/log/maillog: Mail logs that help in debugging sendmail, mail relay and mail-queue issues.
10. /var/log/user.log: Information about all user-level logs.
11. /var/log/Xorg.x.log: Log messages from the X server.
12. /var/log/alternatives.log: Information from update-alternatives is logged into this file.
13. /var/log/btmp: Information about failed login attempts.
14. /var/log/cups: Printer and print-server logs.
15. /var/log/anaconda.log: Installation-level logs are stored in anaconda.log.
16. /var/log/yum.log: Package and patch-level logs are generated and stored in the yum logs.
17. /var/log/cron: Cron logs, useful in debugging the cron jobs set on the servers.
18. /var/log/secure: The most important log to track when security issues, intrusion attempts, etc. need to be debugged.
19. /var/log/wtmp: Contains login records; with wtmp we can find who has logged into the system.

IV. MORE LOGS

Apart from the logs stated above, system performance is determined by application-level logs, which may arise from:
Web servers:
o Access logs
o Error logs
Performance logs and metrics
Application logs:
o Program exceptions (e.g. Java exceptions)
o General application exceptions
o Custom application-generated exceptions

According to Jan Waldman, providers of web content were the first to miss more detailed and sophisticated reports based on server logs: they need to detect behavioral patterns, paths, trends, etc., and simple statistical methods do not satisfy these needs, so an advanced approach must be used. The events and logs are many; however, collecting logs to a centralized location, generating alerts, analyzing the collected data in a smart way, and acting on point-in-time triggers in any application that convey definite and fruitful information are all necessary to analyze, debug and predict the performance of applications on cloud.
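As a toy illustration of the kind of smart analytics over collected logs described above, the sketch below counts failed SSH login attempts per source IP in /var/log/secure-style records; the sample lines, field layout and alert threshold are illustrative, not taken from the paper:

```python
import re
from collections import Counter

# Illustrative /var/log/secure-style lines (sample data, not real logs).
SAMPLE_LOG = """\
Jun 10 03:12:01 web01 sshd[4321]: Failed password for root from 203.0.113.7 port 51122 ssh2
Jun 10 03:12:05 web01 sshd[4321]: Failed password for invalid user admin from 203.0.113.7 port 51130 ssh2
Jun 10 03:13:44 web01 sshd[4400]: Accepted publickey for deploy from 198.51.100.2 port 40022 ssh2
Jun 10 03:14:02 web01 sshd[4433]: Failed password for root from 198.51.100.9 port 40100 ssh2
"""

FAILED_RE = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def failed_logins_by_ip(log_text):
    """Count failed SSH login attempts per source IP."""
    return Counter(FAILED_RE.findall(log_text))

def suspicious_ips(log_text, threshold=2):
    """IPs at or above the alert threshold (threshold is illustrative)."""
    return [ip for ip, n in failed_logins_by_ip(log_text).items() if n >= threshold]

if __name__ == "__main__":
    print(failed_logins_by_ip(SAMPLE_LOG))
    print(suspicious_ips(SAMPLE_LOG))
```

In a real deployment the same counting would of course run over the centrally aggregated stream rather than a local file.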
The logs stated above cover wide domains: the entire health of a system is determined by their output, and predicting and determining application performance becomes a major concern in cloud environments where machines are remotely located and dynamically created and destroyed on scale-up and scale-down. According to Wolfgang Ley & Uwe Ellerman, one approach is to have a human expert define a set of message patterns to find, along with desired actions to be taken when encountering them; the effort invested in writing and maintaining these rules is proportional to the number of message types and the rate at which they change. Another approach, according to Vaarandi (2002), focuses on visualizing the log data in a useful way. According to Osmar R. Zaane et al., (1998), there are over 30 commercially available applications for web log analysis and many more freely available on the Internet; regardless of their price, they are disliked by their users and considered too slow, inflexible and difficult to maintain. Effective measures to centralize logs in real time and to process them into real-time alerts in scalable, dynamic environments are a real point of concern for applications on cloud. The issue can be addressed by various methods of log aggregation; below we highlight those approaches and the method by which we achieved logging in a big environment of 200+ Linux servers.

V. LOGGING SOLUTIONS AND DISTRIBUTED LOGGING

According to Distributed Syslog Architectures with syslog-ng Premium Edition (2013), when implementing a distributed system logging infrastructure, you must ensure that the following requirements are fulfilled:
The messages sent by the end systems arrive at the server (reliable transfer).
No messages are lost when the network or the server is temporarily down (disk buffer).
Communication to the central server is encrypted, so third parties cannot gain access to sensitive data (SSL/TLS support).
The identity of the end systems is verified, so it is not possible to inject fake log messages into the central logs.

The solutions we consider are Rsyslog, Logstash and Graylog.

5.1. Rsyslog

According to Peter Matulis (2008), the most widely used logging solution, and the preferred aggregator for the major system logs on Linux/Unix systems, is rsyslog [http://www.rsyslog.com/doc/master/, 2014]. It provides various facilities to send logs to remote locations over TCP or UDP, full tracking and logging control in environments, and advanced filtering capabilities. As a tool to collect, transform and transfer or centralize logs, it certainly deserves the administrators' praise as a Swiss-army knife.

Figure 1: Rsyslog Centralized Logging Schematic View

According to Peter Matulis (2008), rsyslog helps achieve:
Event analysis
Event reporting
Event remediation
Event viewing
An event-logging architecture

An efficient architecture design is an important consideration when designing centralized logging with rsyslog.

Figure 2: Sys-Logger Architecture

5.2. LogStash

According to [http://logstash.net/docs/1.4.2/, 2014], Logstash is the next solution; available in both open-source and proprietary versions, it gives you the power to dig into your logs, slice and dice them, and mine your data. With its easily configurable scripts and its wide choice of diverse plug-ins, the tool integrates with many input and output solutions. A more interesting fact about Logstash is its advanced support for Drools-based log mangling and its functionality to crop wide raw log data into parseable, manageable, meaningful data for analysis. The simple syntactical format gives great relief in configuration:

input { ... }
filter { ... }
output { ... }

LogStash provides wide support for easy-to-use parsers that are more user-friendly than Perl-based parsing syntaxes. The three blocks with the relevant filters make log processing a great go. LogStash processes logs in real time and is certainly among the best tools available for analyzing and making more out of your data. Grok-based filters can be specified; for a log line such as:

55.3.244.1 GET /index.html 15824 0.043

the pattern could be:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

5.3. Graylog2

For a free and open-source, scalable logging solution with advanced analytical capabilities, Graylog tops the list. According to [https://github.com/graylog2/graylog2-server, 2014], the Graylog2 server is the core component of a Graylog installation and acts as the centralized logger.
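The grok fields shown in the LogStash example above are essentially named regular expressions; a minimal Python approximation is sketched below (the simplified regexes stand in for grok's IP, WORD, URIPATHPARAM and NUMBER patterns):

```python
import re

# Rough stand-ins for the grok patterns above (simplified, illustrative).
LOG_RE = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3})\s+"   # %{IP:client}
    r"(?P<method>\w+)\s+"                        # %{WORD:method}
    r"(?P<request>\S+)\s+"                       # %{URIPATHPARAM:request}
    r"(?P<bytes>\d+)\s+"                         # %{NUMBER:bytes}
    r"(?P<duration>[\d.]+)"                      # %{NUMBER:duration}
)

def parse_access_line(line):
    """Parse one access-log line into a field dict, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

if __name__ == "__main__":
    print(parse_access_line("55.3.244.1 GET /index.html 15824 0.043"))
```

Grok's advantage over raw regexes of this kind is precisely that the named patterns are maintained in a shared library instead of being rewritten per log format.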
The Graylog2 server logs the incoming messages arriving on its UDP/TCP port into the appropriate ElasticSearch database. Written in Java, with advanced analytical capabilities, its core logic is to efficiently use the REST API of ElasticSearch. We employed a combination of the above technologies and achieved a scalable logging solution on cloud. Following Peter Matulis (2008), an rsyslog agent running on each individual node aggregated the data and pushed the logs over to the centralized logger; in /etc/rsyslog.conf:

*.* @your_logging_server

This minimal addition to our rsyslog configuration was able to deliver the logs to a centralized logging repository on cloud. Application-level logs can be polled and piped using the imfile module of rsyslog, which relieves you of the major task of shipping arbitrary log files to a central location:

$ModLoad imfile   # needs to be done just once
# File 1
$InputFileName /path/to/file1
$InputFileTag File-tag1:
$InputFileStateFile stat_pointer
$InputFileSeverity severity_of_log
$InputFileFacility local6
$InputRunFileMonitor

This provided a classic solution for serialized web-server logs (Apache, nginx, etc.); the imfile extension added value by logging every single-line message very easily. The major concern is application logs that are not serialized, such as those generated by Tomcat applications, which really make it necessary to use advanced logging techniques. The GELF format [http://graylog2.org/resources/gelf, 2014] for Graylog is one useful format for logging such non-serialized messages. As per Anton Yakimov (2013), the log4j configuration looks like the following:

# Define the graylog2 destination
log4j.appender.graylog2=org.graylog2.log.GelfAppender
log4j.appender.graylog2.graylogHost=graylog2.example.com
log4j.appender.graylog2.originHost=my.machine.example.com
log4j.appender.graylog2.facility=gelf-java
log4j.appender.graylog2.layout=org.apache.log4j.PatternLayout
log4j.appender.graylog2.extractStacktrace=true
log4j.appender.graylog2.addExtendedInformation=true
log4j.appender.graylog2.additionalFields={'environment': 'DEV', 'application': 'MyAPP'}

# Send all INFO logs to graylog2
log4j.rootLogger=INFO, graylog2

Once the logs are concentrated at a single location, the Graylog centralized logger needs only to index them and run the real analytics over them. Graylog, and similarly even LogStash, provides high-end capabilities for ingesting log messages. The core of the Graylog logger is ElasticSearch, a framework based on Apache Lucene, which performs the entire magic of indexing the logs. Bypassing Graylog and developing a custom logger against this indexer is entirely possible.
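A GELF message is a JSON document (version, host, short_message, timestamp, level, plus underscore-prefixed custom fields), typically zlib-compressed and sent over UDP. A minimal Python sketch of building and shipping such a payload follows; the server name, port and custom field values are placeholders, not values from the paper:

```python
import json
import socket
import time
import zlib

def make_gelf_payload(short_message, host="my.machine.example.com", level=6):
    """Build a zlib-compressed GELF message (fields per the GELF spec)."""
    record = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,               # syslog severity: 6 = informational
        "_application": "MyAPP",      # custom fields are prefixed with "_"
    }
    return zlib.compress(json.dumps(record).encode("utf-8"))

def send_gelf(payload, server="graylog2.example.com", port=12201):
    """Fire-and-forget the compressed payload over UDP (server is a placeholder)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (server, port))
    finally:
        sock.close()

if __name__ == "__main__":
    payload = make_gelf_payload("deployment finished")
    # Round-trip check: the payload decompresses back to the JSON record.
    print(json.loads(zlib.decompress(payload))["short_message"])
```

In practice the existing GELF libraries (such as the log4j appender configured above) handle this framing for you.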
But the scalability Graylog provides is really remarkable:
Flexibility to pre-parse the logs with Grok-based regex filters
Ability to index logs with tags and filters
Almost real-time filtering and index management in the ElasticSearch nodes

Figure 3: Graylog2 Architecture

The setup depicted in Figure 3 provides a robust configuration for AWS environments. Each and every log hitting Graylog can be evaluated against Drools regex rules to store the message in the most effective and usable format. According to [http://graylog2.org/resources/documentation/general/rewriting], the Drools functionality comes with built-in support, enabled by pointing the Graylog configuration at the graylog.drl rules file; the set of rules in graylog.drl can be customized to parse the relevant logs and extract the important fields from each message. Along with Graylog's own message parsing, GELF loggers available in diverse languages, including Perl, Python, Java, .NET, etc., make it easy to ship application-level logs to Graylog. A GELF message can be pre-parsed and sent as a JSON string, optionally zipped, over UDP to the Graylog server. The Graylog server can be scaled up to multiple instances logging into the same ElasticSearch cluster in events of high message volume.

Optimization of the Graylog Server for High-Load Conditions

In high-load environments where the rate exceeds 200 messages per second, we observed that the Graylog server needs some performance tuning. We worked through the configuration under a load of 500 messages per second and found the following settings optimal. For the batch size, the number of messages delivered to ElasticSearch at a time, we found the default of 5000 to work great:
output_batch_size = 5000
The default number of buffer processors can be increased; however, we found that increasing parallel processing showed up as high CPU usage, so we consider the following configuration optimal for the message rates above. Raising these numbers when your buffers are filling up is the usual recommendation, but we recommend not increasing the values here:

processbuffer_processors = 5
outputbuffer_processors = 5

For the processor wait strategy, blocking is best suited when high throughput is required; however, when performance can be traded for lower CPU load, the yielding strategy worked for us:

processor_wait_strategy = yielding

For optimal performance tuning, the ring size comes in very handy when very high logging volume is expected. Based on the cache space available, we arrived at a ring size of 2048 for the above scenario:

ring_size = 2048

These are a few of the parameters we had to tune for optimal performance under high-load conditions; there is no specific formula to calculate the values, but the benchmarked values above work well. Apart from these configurations, ElasticSearch needs configuration and optimization of its own, and we found that proper allocation of the heap can certainly sort out many issues. Allocating a heap of about 70% of the available RAM on a dedicated machine can effectively decrease the load and optimize the garbage-collector run cycles. The Java heap settings Xmx and Xms are recommended to be equal (Xmx = Xms) for optimal performance of the ElasticSearch node. Running multiple ElasticSearch nodes helps achieve high availability and disaster recovery in high-volume logging environments, and zen auto-discovery is the method we preferred.
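The heap-sizing rule of thumb above (about 70% of RAM on a dedicated node, with Xms equal to Xmx) can be turned into concrete JVM flags; the small sketch below does that arithmetic. The 70% figure is the paper's benchmark; the function name and flag formatting are merely illustrative:

```python
def es_heap_flags(total_ram_mb, heap_fraction=0.70):
    """Derive equal -Xms/-Xmx JVM flags from available RAM, per the rule above."""
    heap_mb = int(total_ram_mb * heap_fraction)
    # Xms == Xmx avoids costly heap resizing, as recommended for ElasticSearch.
    return ["-Xms{}m".format(heap_mb), "-Xmx{}m".format(heap_mb)]

if __name__ == "__main__":
    # e.g. a dedicated 8 GB (8192 MB) ElasticSearch machine
    print(" ".join(es_heap_flags(8192)))
```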
VI. CONCLUSION

Log aggregation on cloud, in scalable environments where machines are dynamically created and destroyed, brings a new set of challenges to be addressed; an efficient architectural design built with the right tools and technologies around a robust log aggregator becomes a necessity in order to predict, analyze and debug critical issues and to plan proactive actions as necessary. The solutions provided in this tutorial highlight a few ways to address the issue of centralized logging, and the Graylog case study gives an overview of our experimentation and the results achieved in a high-volume Amazon VPC environment of about 200+ servers logging on average 500+ messages per second. In large environments where the security of data and logs is important, the solutions stated above can help analyze security and performance and predict failures to a considerable extent.

REFERENCES

[1] Rsyslog. http://www.rsyslog.com/
[2] GELF. http://graylog2.org/gelf
[3] Wolfgang Ley & Uwe Ellerman. Logsurfer. http://www.cert.dfn.de/eng/logsurf/
[4] Rajiv R. Bhandari & Nitin Mishra (2011), "Encrypted IT Auditing and Log Management on Cloud Computing", IJCSI.
[5] Jan Waldman, "Log File Analysis", Technical Report.
[6] Peter Matulis (2008), "Centralised Logging with Rsyslog".
[7] Osmar R. Zaane, Man Xin & Jiawei Han (1998), "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs", Proc. Advances in Digital Libraries (ADL 98), Santa Barbara, CA, USA, Pp. 19-29.
[8] BalaBit IT Security Ltd. (2013), Whitepaper: "Distributed Syslog Architectures with syslog-ng Premium Edition".
[9] Ramesh Natarajan (2014), Thegeekstuff.com.
[10] R. Vaarandi (2002), "SEC - A Lightweight Event Correlation Tool", IEEE IPOM'02 Proceedings.
[11] Anton Yakimov (2013), https://github.com/t0xa.

Saurabh Phaltane. Saurabh is working at e-Zest Solutions Ltd in the Cloud Computing domain as a Linux and AWS administrator.
An AWS Certified Solutions Architect, he completed his graduation from Maharashtra Institute of Technology, Pune (Pune University). With profound experience in configuring and managing virtual environments, he has published research work on Apache web-server monitoring in the IJSER international journal. Learning, discovering new opportunities and leading an entrepreneurial life with profound technopreneurial skills motivate him.

Anand Nahar. Anand Nahar is a Subject Matter Expert (SME) at Amdocs (since 5th Aug. 2013) and works in Software Development and Implementation (SD&I) at Amdocs. He completed his undergraduate education (B.E.) at Maharashtra Institute of Technology, Pune (Pune University). He worked on a real-time physics simulation engine on heterogeneous platforms using OpenCL and OpenGL as his final-year project. Not only does he cherish taking part in various technical events, but he is also part of many cultural and extracurricular events.

Nikhil Garge. Nikhil Garge is working at e-zest Solutions Ltd in the Cloud Computing domain as a Software Engineer. He completed his graduation from Pune University with a bachelor's degree in Information Technology. With profound experience in configuring and managing virtual environments, Nikhil has interests in configuring and developing applications on Amazon Cloud.