and integrating with Elasticsearch, Hadoop and Cassandra
November 21, 2014
About me
My name is Valentin Fischer-Mitoiu and I work for the University of Vienna, more specifically in a group called Domainis (Internet domain administration). We operate the nameservers, monitoring and much more for nic.at, the .at registry. I'm a monitoring guy and sysadmin, working daily with systems like Logstash, Icinga, Elasticsearch, Hadoop and Cassandra.
Scope of this presentation
1. This presentation is meant as a walk-through for getting started with log monitoring using Logstash.
2. It also describes possible integrations of Logstash in big data environments.
Logstash: What is it?
Logstash is a tool that lets you send, receive and manage log files from various sources. It appeared in 2011 and has attracted a lot of interest because of its flexibility and configurability. It is based on Ruby, works as an agent/daemon and provides a high level of abstraction in its configuration, similar to Puppet.
Logstash: How we use it
We started by testing it to see what types of logs we could manage with it. The initial setup processed logs related to DNS, login, PostgreSQL, SOAP, EPP and the system. We then started using it in production, feeding it logs from various sources.
Logstash: Why we use it
We use Logstash because we want to use its power in managing logs and to generate events based on specific triggers. We already cover states with Icinga, and adding Logstash would cover the whole monitoring spectrum.
Logstash: Deploying fast and simple
Deploying Logstash is quite simple; the challenge is in writing a good configuration. In most cases it's recommended to start with a centralized setup, which means having a shipper, a queue, an indexer and some storage system.
Logstash: Overview of a centralized setup
Logstash: Hardware requirements
To find out how much hardware you need for your Logstash setup, first think about how much data you process daily and whether you need to search it. Another factor is the complexity of your configuration, especially the filters.
Logstash: Working with inputs
This is what a file input looks like:

input {
  file {
    codec          => plain
    path           => "/dns/logs/processing_queue/*.tasks"
    start_position => "beginning"
    type           => "processing_tasks"
  }
}
Logstash: Working with filters
This is what a filter looks like:

filter {
  grok {
    add_tag        => [ "processed_tasks" ]
    match          => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
    tag_on_failure => [ "failed_tasks" ]
  }
}
Logstash: Working with outputs
This is what an output looks like:

output {
  elasticsearch {
    cluster => "storage_cluster"
    codec   => plain
  }
}
Logstash: Working with Elasticsearch
Elasticsearch is a distributed RESTful search and analytics engine. This gives Logstash great searching power using the Lucene syntax. It is also a powerful storage system that can make use of indexes, facets, templates and much more.
Logstash: Elasticsearch overview
Logstash: Working with Kibana
Kibana is a web interface used to search (query) the Elasticsearch cluster. The interface can be organized into dashboards, each one containing a different layout. Each layout can have pies, maps, charts, histograms, tables, trends, etc.
Logstash: Dashboard overview
Logstash: Deployment example
You now know the basics; an actual deployment would look like this:
1. Download a package from www.elasticsearch.org
2. Set up your shipper
3. Set up your queue
4. Set up your indexer
5. Set up your storage
Logstash: Setting up your shipper/forwarder
A shipper/forwarder is an agent that usually contains some input (a port, a file, etc.) that reads data and sends it to an output (a queue, if using a centralized setup). To set up your shipper you generally need an input and an output section. You can also do preprocessing in most shipping agents.
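As a sketch, a minimal shipper configuration might pair a file input with a Redis output; the path, host and key below are made-up examples, not taken from our setup:

```
input {
  file {
    path => "/var/log/syslog"
    type => "syslog"
  }
}
output {
  redis {
    host      => "queue.example.org"
    data_type => "list"
    key       => "logstash"
  }
}
```

The redis output pushes each event onto a list on the queue host, where an indexer can later pick it up.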
Logstash: Shipping logs
Log Courier (lightweight, fast, secure, stable)
Logstash (heavy, fast, stable, well developed)
Logstash-forwarder (fast, secure, well developed)
Nxlog (no Java, lightweight, mainly used for Windows logs)
Beaver (lightweight, fast, Python)
Logstash: Setting up your indexer
An indexer is a Logstash agent that contains some logic (filters) and does some processing on messages. To set up your indexer you generally need an input, a filter and an output section.
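A minimal indexer sketch, assuming shippers push events onto a Redis list named "logstash" (the host name and grok pattern here are illustrative, not from our setup):

```
input {
  redis {
    host      => "queue.example.org"
    data_type => "list"
    key       => "logstash"
  }
}
filter {
  grok {
    match => [ "message", "%{SYSLOGLINE}" ]
  }
}
output {
  elasticsearch {
    cluster => "storage_cluster"
  }
}
```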
Logstash: An example of a running indexer
Logstash: Setting up your queue
In order to have a more robust and centralized setup, we decided to use a queue. It will keep all your messages and wait for indexers to process them. The most used queues with Logstash are Redis and RabbitMQ.
Logstash: Setting up your storage cluster
In order to store all our logs, we will set up an Elasticsearch cluster. We will also use it because of its advanced search capabilities. Configuration is fairly simple: edit elasticsearch.yml and make the necessary changes. After that, set the maximum RAM limit for your ES node (ES_MAX_MEM).
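As a minimal sketch, the relevant elasticsearch.yml changes might look like this (the cluster and node names are examples):

```
# elasticsearch.yml
cluster.name: storage_cluster
node.name: "es-node-1"
# with multiple nodes, disabling multicast and listing peers explicitly is common
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es-node-2", "es-node-3"]
```

The heap limit is then set in the environment before starting the node, e.g. ES_MAX_MEM=4g; a common rule of thumb is about half of the machine's RAM.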
Logstash: Setting up your Kibana instance
Kibana is a tool used to visualize the data stored in Elasticsearch. You can run complex queries using the Lucene language and graph the data in various charts, maps, etc., e.g. MESSAGES WHERE LOG_SOURCE=(shipper1, shipper2).
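In Kibana's query box that pseudo-query would be written in Lucene syntax; assuming the events carry a log_source field (a made-up name for this example), it could look like:

```
log_source:("shipper1" OR "shipper2")
```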
Logstash: Debugging and errors
Logstash is pretty clear and descriptive when it comes to errors, but you can always troubleshoot further using debug mode (--debug). Another useful tip is sending your logs to the stdout output to see more information if you have issues with your message format. To see all the flags, just use -h or --help.
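A sketch of such a debug output section, using the rubydebug codec to pretty-print every event that reaches the output stage:

```
output {
  stdout {
    codec => rubydebug
  }
}
```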
Logstash: Getting help
If you need help you can join the #logstash IRC channel on the Freenode network. You will find a large community and people willing to help you. I'm also present there under the nickname vali. You can also consult the official documentation at http://logstash.net
Logstash: Developing your own add-ons
To develop your own add-on (input, filter or output) you need Ruby knowledge. You can find more information at http://logstash.net/docs/1.4.2/extending/ Basically, you follow a standard structure, add your methods and required arguments, then do some processing with the incoming event, modify it or return a value.
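As a sketch of that standard structure (based on the 1.4.x extending docs; the plugin name and its option are invented for illustration, and this only runs inside Logstash, not standalone):

```ruby
# hypothetical minimal filter plugin: logstash/filters/example.rb
require "logstash/filters/base"
require "logstash/namespace"

class LogStash::Filters::Example < LogStash::Filters::Base
  # name used in the config file: filter { example { ... } }
  config_name "example"
  milestone 1

  # a required string setting: filter { example { message => "hello" } }
  config :message, :validate => :string, :required => true

  def register
    # one-time setup goes here
  end

  def filter(event)
    return unless filter?(event)
    # modify the incoming event, then flag it as matched
    event["example"] = @message
    filter_matched(event)
  end
end
```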
Logstash: Integrating with big data systems In our case, integrating with big data systems was done via KairosDB. This acts as a middleware and allows us to use as storage system Hadoop or Cassandra. In our case the sending was done via the opentsdb output. And custom scripts sending directly to KairosDB via a TCP port.
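A sketch of what such a custom script could do, assuming KairosDB's telnet-style line protocol ("put <metric> <timestamp-ms> <value> <tag>=<value>"); the metric, host and port here are made up:

```ruby
require "socket"

# Build a KairosDB telnet-protocol "put" line for one data point.
def kairos_put_line(metric, timestamp_ms, value, tags = {})
  tag_str = tags.map { |k, v| "#{k}=#{v}" }.join(" ")
  "put #{metric} #{timestamp_ms} #{value} #{tag_str}".strip
end

line = kairos_put_line("logstash.events.count", 1416528000000, 42, "host" => "indexer1")
# => "put logstash.events.count 1416528000000 42 host=indexer1"

# Sending it is then a plain TCP write (host and port are examples):
# TCPSocket.open("kairosdb.example.org", 4242) { |s| s.puts(line) }
```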
Logstash: Using Hadoop
The target is to develop a big data system that allows us to store events/data from various systems/sources forever. The system has to be queryable through a simple interface, with its data returned in a useful format (JSON) that can be graphed easily.
Logstash: Using Cassandra
The same idea as using HBase + HDFS, but exploring a different storage system. We can still use Hive, Pig and MapReduce queries. It is simpler to set up.
Logstash: Using Elasticsearch
Elasticsearch has the same characteristics as other NoSQL systems. Some advantages over Cassandra and Hadoop:
- direct access from Logstash, no need for any other middleware
- a great web interface that allows easy access to the data, with excellent graphing options
- a powerful API to retrieve data
Logstash: Conclusion
In our case, Logstash seems to be a great candidate for covering the log management spectrum. Its ease of use, great collection of plugins and simple configuration make it a great tool for managing logs. It is definitely worth exploring for this kind of monitoring.
Questions This would be a great time to ask questions. :)
The end Thank you!