syslog-ng 3.0
Monitoring logs with Nagios
Scheidler Balázs, balazs.scheidler@balabit.hu
Table of Contents
- Short introduction to syslog
- The syslog-ng story
- Changes in the log processing landscape
- New vision for syslog-ng
- New features in syslog-ng 3.0
- Practical example: monitoring logs with Nagios
Introduction to syslog I.
Introduction to syslog II.
The original system log was written by operators and recorded:
- time and date
- host
- explanation of the event
With this background, it is no wonder that when Eric Allman invented syslog, its format became basically the same:
  May 18 09:17:01 bzorp CRON[2284]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
  May 20 09:07:50 bzorp sshd[1847]: Failed password for bazsi from 10.10.2.1 port 42511 ssh2
  May 20 09:07:52 bzorp sshd[1852]: Accepted password for user from 10.10.2.1 port 42512 ssh2
  May 20 09:07:52 bzorp sshd[1856]: pam_unix(sshd:session): session opened for user bazsi
  May 20 09:07:54 bzorp sshd[1856]: pam_unix(sshd:session): session closed for user bazsi
It even lacks the year in its header: that information is implied, just like in the old syslog book :)
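To make the "implied year" concrete, here is a small Python sketch that attaches a year to a header timestamp such as "May 20 09:07:50". The rollover rule (use the current year, or last year if that would put the message in the future) is a common heuristic assumed here, not something taken from syslog or syslog-ng.

```python
from datetime import datetime


def infer_timestamp(bsd_stamp: str, now: datetime) -> datetime:
    """Attach a year to a BSD-style syslog timestamp such as 'May 20 09:07:50'.

    Heuristic (an assumption of this sketch): use the current year, unless
    that puts the message in the future, in which case it was logged last
    year (e.g. a 'Dec 31' entry read on Jan 1).
    """
    for year in (now.year, now.year - 1):
        try:
            # Prepend the year explicitly, since the header itself has none.
            candidate = datetime.strptime(f"{year} {bsd_stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # e.g. 'Feb 29' in a non-leap year
        if candidate <= now:
            return candidate
    raise ValueError(f"cannot place {bsd_stamp!r} before {now}")
```

For example, infer_timestamp("Dec 31 23:59:59", datetime(2009, 1, 1, 0, 5)) places the entry in 2008, which is exactly the guess a human reader of the old log book would make.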
Introduction to syslog III.
Jokes aside, here are the important properties that make syslog what it is today:
- when something happens, the device emits a message to the system log (instead of being passively monitored)
- syslog messages are unstructured
- it is trivial to add logging to an application, and it is also trivial to send many details (debug and troubleshooting info)
- syslog has (and always had) facilities to collect all logs from all devices company-wide
- in a lot of cases syslog is the only connection to the operator (think of embedded devices like a switch or a router)
Because of the above, syslog is ubiquitous: it is common ground for network equipment and servers alike.
syslogd, the original UNIX syslog stuff
- syslogd was developed as a subsystem of sendmail (one of the first mail transport agents on UNIX systems)
- It could centralize log messages across a network, but had various shortcomings:
  - it uses UDP transport, which loses messages (up to 90+% in extreme cases)
  - the original facility-based filtering does not cover all systems, especially non-UNIX ones
- Nevertheless it was:
  - very simple to use and deploy
  - good enough for about 20 years
  - good enough to standardize all kinds of equipment on
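Facility-based filtering works on a single number in front of each message, PRI = facility * 8 + severity, taken from a fixed list of facilities; devices that are not UNIX systems typically have to squeeze into local0..local7. A minimal Python sketch of that encoding, plus a fire-and-forget UDP sender illustrating why delivery is not guaranteed (the default host and port are assumptions for illustration):

```python
import socket

# Facility codes from the BSD syslog convention (partial table: codes 12-15
# are left out because their names differ between implementations).
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "ftp": 11,
    **{f"local{i}": 16 + i for i in range(8)},
}
FACILITY_NAMES = {code: name for name, code in FACILITIES.items()}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility: str, severity: str) -> int:
    """PRI = facility * 8 + severity; this number is all a classic filter sees."""
    return FACILITIES[facility] * 8 + SEVERITIES.index(severity)


def decode_pri(pri: int) -> tuple:
    facility_code, severity_code = divmod(pri, 8)
    # Codes missing from the partial table are returned as plain numbers.
    return FACILITY_NAMES.get(facility_code, str(facility_code)), SEVERITIES[severity_code]


def send_udp(message: str, facility: str, severity: str,
             host: str = "127.0.0.1", port: int = 514) -> None:
    """UDP gives no acknowledgement: a dropped datagram is lost silently."""
    packet = f"<{encode_pri(facility, severity)}>{message}".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

For instance, an authentication failure at severity "crit" travels as <34>, and a router logging to local7 at "info" as <190>.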
The syslog-ng story
- Designed for central log collection from the beginning
- First release in 1998
- The most widespread syslogd alternative, part of various Linux distributions (Debian, SUSE, Ubuntu, Fedora, ...)
- Operates in multiple global networks with tens of thousands of devices: @nasa.gov, @lanl.gov, @hq5.army.mil, ...
- Development funded by BalaBit:
  - Open Source Edition: continuing the OSE success
  - Premium Edition: commercial edition released in 2007
  - syslog-ng Store Box: appliance version released in 2008
The reasons for collecting logs are shifting
- Earlier, logs were collected primarily for IT management reasons: troubleshooting and forensics, and only in case of an incident
- The focus is changing:
  - security incident management (SIEM)
  - regulatory reporting (user login/logout, etc.)
  - alerting based on correlated/aggregated information
- The point is:
  - earlier, logs were processed by humans, when there was a need
  - these days logs need to be processed regularly and automatically
  - the content of the messages becomes more and more important
New vision for syslog-ng
- Since the needs change, syslog-ng needs to change too: the syslog-ng vision needs adjustments
- Being a log transport infrastructure is important, but not enough
- syslog-ng is a log router: it sends messages to further log analysis devices, does prefiltering and aids analysis
- The content of messages matters: extracting the information from messages is crucial
- syslog-ng is a great integration platform and is in a good position to influence the syslog message flow
- syslog-ng 3.0, with its new features, is a step in this new direction
New features I.
- Transport infrastructure enhancements:
  - latest syslog standardization work: supports the new RFC 5426
  - capable of converting between old- and new-style syslog formats
  - encrypted transport: TLS-encrypted connections
  - about 70% better performance than syslog-ng 2.0
- Features of previous syslog-ng versions:
  - no message loss: TCP-based transport with flow control
  - portability: supports a wide variety of UNIX systems and architectures
  - IPv6 support
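RFC 5426 defines how messages in the RFC 5424 format travel over UDP. Assuming "old-style" means the BSD (RFC 3164) line shown earlier and "new-style" means the RFC 5424 header layout, the conversion can be pictured with the Python sketch below. It is not syslog-ng's code; in particular, treating the sender's clock as UTC is an assumption, since the old header carries no time zone.

```python
import re
from datetime import datetime

# Old-style (BSD) line: <PRI>Mmm dd hh:mm:ss HOST TAG[PID]: MSG
BSD_RE = re.compile(
    r"<(?P<pri>\d{1,3})>"
    r"(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) "
    r"(?P<app>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)"
)


def bsd_to_rfc5424(line: str, year: int) -> str:
    """Rewrite a BSD syslog line into the RFC 5424 header layout:
    <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    """
    m = BSD_RE.match(line)
    if m is None:
        raise ValueError(f"not a BSD syslog line: {line!r}")
    # The old header has no year and no time zone: the caller supplies the
    # year, and this sketch simply assumes the timestamp is UTC ("Z").
    ts = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    procid = m["pid"] or "-"  # "-" is the RFC 5424 NILVALUE
    # MSGID and STRUCTURED-DATA do not exist in the old format, so both are "-".
    return f"<{m['pri']}>1 {ts:%Y-%m-%dT%H:%M:%S}Z {m['host']} {m['app']} {procid} - - {m['msg']}"
```

The PRI value carries over unchanged; everything the old header lacks (year, time zone, message ID, structured data) has to be supplied or left as NILVALUE.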
New features II.
- syslog-ng is a log router: all syslog messages go through syslog-ng
- Simply storing them in files is not enough:
  - send them to further devices along the chain (Splunk, ArcSight, MARS, etc.)
  - send them to home-grown scripts
  - performance is crucial (hence the 70% improvement)
- syslog-ng is in a good position to preprocess logs:
  - classification
  - filtering
  - alerting
  - preliminary analysis
New features III.
- Content-related functions:
  - messages are unstructured, so information needs to be extracted
  - classification is important in selecting/analyzing logs
  - name-value pair support
- Extract information from messages:
  - csv-parser(): parse CSV-like formats (like Nagios logs)
  - db-parser(): based on a log format database, extract variable information into name-value pairs (more on this later)
- Rewrite the contents of messages:
  - the rewrite framework can change any textual component of the log message
  - fix up messages before analysis (set() and subst())
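A rough Python picture of the two rewrite operations, assuming a message is a dict of textual fields such as HOST and MESSAGE (the field names and the count parameter belong to this sketch, not to syslog-ng): set() overwrites a field, subst() performs a regex search-and-replace inside one.

```python
import re


def rewrite_set(msg: dict, field: str, value: str) -> None:
    """Overwrite one textual field of the message (the idea behind set())."""
    msg[field] = value


def rewrite_subst(msg: dict, pattern: str, replacement: str,
                  field: str = "MESSAGE", count: int = 1) -> None:
    """Regex search-and-replace within one field (the idea behind subst()).

    count=1 replaces the first match only, count=0 replaces every match;
    this parameter is part of the sketch, not a syslog-ng option.
    """
    msg[field] = re.sub(pattern, replacement, msg[field], count=count)
```

A typical fix-up before analysis: collapse runs of spaces so that a pattern written with single spaces still matches, e.g. rewrite_subst(msg, r" {2,}", " ", count=0), or give a device a canonical name with rewrite_set(msg, "HOST", "switch8").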
Log processing pipeline in 2.0
A log statement is a linear chain: Source → Filter → Destination, for example:
- Source: tcp();
- Filter: program("nagios");
- Destinations: file("nagios.log"); file("/var/log/nagios.log");
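Log processing tree in 3.0
In 3.0 the log path is a tree whose nodes are sources (S), parsers (P), rewrite rules (R), filters (F) and destinations (D). Elements shown in the diagram:
- rewrite: subst("foo", "$PROGRAM");
- parsers: csv-parser(); db-parser();
- filter: match("violation" value(".classifier.class"));
- destination: file("nagios.log");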
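The difference between the 2.0 chain and the 3.0 tree is that a message can fan out into several branches, each with its own parsers, filters and destinations. A toy Python model of that idea (node names mimic the diagram, but all functions here are hypothetical stand-ins, not syslog-ng internals):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


@dataclass
class Node:
    """One element of the processing tree (source, parser, rewrite, filter or
    destination). `step` returns the (possibly modified) message, or None to
    stop this branch, which is how a non-matching filter behaves here."""
    name: str
    step: Callable[[Message], Optional[Message]]
    children: List["Node"] = field(default_factory=list)

    def process(self, msg: Message) -> None:
        out = self.step(dict(msg))  # copy, so sibling branches see the unmodified message
        if out is not None:
            for child in self.children:
                child.process(out)


def parse_state(msg: Message) -> Message:
    # Stand-in parser: split "host;service;state;..." into name-value pairs.
    msg.update(zip(["nagios.host", "nagios.service", "nagios.state"], msg["MESSAGE"].split(";")))
    return msg


def only_critical(msg: Message) -> Optional[Message]:
    return msg if msg.get("nagios.state") == "critical" else None


def store_into(store: List[Message]) -> Callable[[Message], Message]:
    def write(msg: Message) -> Message:
        store.append(msg)
        return msg
    return write


all_msgs: List[Message] = []
critical_msgs: List[Message] = []

# source -> parser -> { destination(all), filter(critical) -> destination(critical) }
tree = Node("source", lambda m: m, [
    Node("csv-parser", parse_state, [
        Node("file(all)", store_into(all_msgs)),
        Node("critical?", only_critical, [Node("file(critical)", store_into(critical_msgs))]),
    ]),
])

for line in ["switch8;ping;ok;hard;1;ping OK", "test6;ping;critical;hard;1;critical Host Unreachable"]:
    tree.process({"MESSAGE": line})
```

Both messages reach the "all" branch, while only the test6 message passes the filter into the "critical" branch; in a 2.0-style chain the same result would need two separate log statements.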
Getting at content, parsers I.
- A parser is an element in the processing tree that:
  - analyzes the content of the syslog message
  - extracts variable information from messages
  - associates the extracted information with the message as name-value pairs
- Name-value pairs can be used wherever macros can be used: filenames, SQL columns, rewrite rules, etc.
- Two kinds of parsers are supported right now:
  - csv-parser() to parse CSV and similar formats
  - db-parser() to parse any kind of message based on a message pattern database
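"Usable wherever macros are" can be pictured as plain template expansion. The Python sketch below is an illustration, not syslog-ng's template engine: it resolves $NAME and ${dotted.name} references from a dict of macros and extracted pairs, and the file path in the example is hypothetical.

```python
import re
from typing import Dict

# ${any.name} or $NAME, roughly how templates refer to macros and name-value pairs.
MACRO_RE = re.compile(r"\$\{([^}]+)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def expand(template: str, values: Dict[str, str]) -> str:
    def lookup(m: re.Match) -> str:
        name = m.group(1) or m.group(2)
        # Unknown names become "" here; that is a choice made for this sketch.
        return values.get(name, "")
    return MACRO_RE.sub(lookup, template)
```

With a pair extracted by a parser, a per-host file name such as "/var/log/nagios/${nagios.host}.log" expands to "/var/log/nagios/switch8.log"; built-in macros like $HOST and extracted pairs are looked up the same way.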
csv-parser()
- A simple parser that understands the Comma Separated Values format (though it is not limited to commas)
- Each column is parsed into a name-value pair
- Practical examples: Nagios notification logs, Apache logs
  CURRENT SERVICE STATE: switch8;ping;ok;hard;1;ping OK Packet loss = 0%, RTA = 4.10 ms
  CURRENT SERVICE STATE: switch9;ping;ok;hard;1;ping OK Packet loss = 0%, RTA = 3.13 ms
  CURRENT SERVICE STATE: tcamon;ping;ok;hard;1;ping OK Packet loss = 0%, RTA = 1.57 ms
  CURRENT SERVICE STATE: tcamon scb;ping;critical;hard;1;critical Host Unreachable (10.1.31.2)
  CURRENT SERVICE STATE: test1;ping;ok;hard;1;ping OK Packet loss = 0%, RTA = 1.61 ms
  CURRENT SERVICE STATE: test6;ping;critical;hard;1;critical Host Unreachable (10.100.0.6)
- Drawback: each csv-parser() recognizes only one specific format, so with many formats the syslog-ng config file easily becomes crowded.
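What a CSV-style parser does with the Nagios lines above can be sketched in a few lines of Python. The column names are hypothetical, and the sketch strips the fixed "CURRENT SERVICE STATE: " prefix itself; how a real csv-parser() configuration handles that prefix is not shown on this slide.

```python
from typing import Dict, Optional

PREFIX = "CURRENT SERVICE STATE: "
# Hypothetical column names, in the field order Nagios uses for these lines.
COLUMNS = ["nagios.host", "nagios.service", "nagios.state",
           "nagios.state_type", "nagios.attempt", "nagios.output"]


def parse_service_state(line: str) -> Optional[Dict[str, str]]:
    """Split one 'CURRENT SERVICE STATE:' line into name-value pairs."""
    if not line.startswith(PREFIX):
        return None  # a different format: this parser recognizes exactly one
    # maxsplit keeps any ';' inside the plugin output in the last column.
    fields = line[len(PREFIX):].split(";", len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, fields))
```

The early return for other prefixes is the drawback in miniature: every additional log format needs its own parser definition.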
db-parser()
- Recognizes logs based on a log pattern database
- The syslog-ng config file contains only one parser reference, so it is easy to follow:
    parser p_db { db-parser(); };
    log { source(src); parser(p_db); destination(dst); };
- Additional things it does:
  - associates the classification: ${.classifier.class}
  - associates the ID of the matching pattern: ${.classifier.rule_id}
  - extracts information into name-value pairs, usable just like other macros
The pattern database
- The on-disk format is XML, which is loaded at startup
- It does not use regular expressions, because:
  - regexps are difficult to write properly (think of matching an IPv6 address)
  - regexps are even more difficult to understand once written
  - regexps do not scale to a large number of patterns
  - regexps do not scale to a high number of events/sec
- Performance: pattern matching costs about 10-15% of performance, relative to simply storing logs in files
- The algorithm is O(1) in the number of patterns: only the length of the patterns counts
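Why the number of patterns does not matter can be illustrated with a prefix tree. The Python sketch below is a deliberately simplified, word-level stand-in for syslog-ng's matcher, not its actual code: literal words are found with one dict lookup, and placeholders such as @NUMBER:name@, @STRING:name@ and @ANYSTRING:name@ are tried as fallback edges. Here STRING just takes one whitespace-separated word, ANYSTRING rejoins the rest with single spaces, and a literal glued to a placeholder (as in "IN=@STRING:PF.IN_IFACE@") is not supported; all three are simplifications.

```python
import re
from typing import Dict, List, Optional, Tuple

# Placeholder tokens understood by this sketch.
PARSER_RE = re.compile(r"@(NUMBER|STRING|ANYSTRING):([^@]+)@$")


class TrieNode:
    def __init__(self) -> None:
        self.literals: Dict[str, "TrieNode"] = {}            # exact words: one dict lookup
        self.parsers: List[Tuple[str, str, "TrieNode"]] = []  # (type, name, child)
        self.rule: Optional[Tuple[str, str]] = None           # (rule_id, class) if a pattern ends here


class PatternDB:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, rule_id: str, cls: str, pattern: str) -> None:
        node = self.root
        for token in pattern.split():
            m = PARSER_RE.match(token)
            if m is None:
                node = node.literals.setdefault(token, TrieNode())
            else:
                node = self._parser_child(node, m.group(1), m.group(2))
        node.rule = (rule_id, cls)

    @staticmethod
    def _parser_child(node: TrieNode, kind: str, name: str) -> TrieNode:
        for k, n, child in node.parsers:
            if (k, n) == (kind, name):
                return child  # patterns with a common prefix share edges
        child = TrieNode()
        node.parsers.append((kind, name, child))
        return child

    def match(self, message: str) -> Optional[Dict[str, str]]:
        return self._walk(self.root, message.split(), 0, {})

    def _walk(self, node: TrieNode, words: List[str], i: int,
              values: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(words):
            if node.rule is None:
                return None
            rule_id, cls = node.rule
            return {".classifier.rule_id": rule_id, ".classifier.class": cls, **values}
        child = node.literals.get(words[i])
        if child is not None:
            found = self._walk(child, words, i + 1, values)
            if found is not None:
                return found
        for kind, name, child in node.parsers:
            if kind == "ANYSTRING":  # swallow the rest of the message
                found = self._walk(child, words, len(words), {**values, name: " ".join(words[i:])})
            elif kind == "NUMBER" and not words[i].isdigit():
                continue
            else:  # a matching NUMBER, or a STRING: take exactly one word
                found = self._walk(child, words, i + 1, {**values, name: words[i]})
            if found is not None:
                return found
        return None
```

Each step costs one dict lookup plus the few placeholder edges at that node, so the work grows with the length of the message rather than with the number of rules (backtracking aside). A regex-per-rule approach, by contrast, has to try every rule on every message.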
Pattern examples
Parsing packet filter and Nagios service notification logs:
<patterndb version='1' pub_date='2009-04-17'>
  <program name='pf'>
    <pattern>pf</pattern>
    <rule id='1' class='pf'>
      <pattern>@STRING:PF.VERDICT@ @STRING:PF.CHAIN:/@ IN=@STRING:PF.IN_IFACE@ OUT= MAC=@STRING:PF.MAC::@ SRC=@IPv4:PF.SRC_IP@ DST=@IPv4:PF.DST_IP@ LEN=@NUMBER:PF.PKT_LEN@ TOS=@STRING:PF.TOS@ PREC=@STRING:PF.PREC@ TTL=@NUMBER:PF.TTL@ ID=@NUMBER:PF.ID@ DF PROTO=@STRING:PF.PROTO@ SPT=@NUMBER:PF.SRC_PORT@ DPT=@NUMBER:PF.DST_PORT@ WINDOW=@NUMBER:PF.TCP_WINDOW@ RES=@STRING:PF.RES@ SYN URGP=@NUMBER:PF.TCP_URGP@</pattern>
    </rule>
  </program>
  <program name='nagios'>
    <pattern>nagios</pattern>
    <rule id='2' class='alert'>
      <pattern>SERVICE NOTIFICATION: @ESTRING:nagios.contact:;@;@ESTRING:nagios.host:;@;@ESTRING:nagios.service:;@;@ESTRING:nagios.state:;@;@ESTRING:nagios.notify_script:;@;@ANYSTRING:nagios.output@</pattern>
    </rule>
  </program>
</patterndb>
Using extracted data
db-parser() extracts information from log messages and associates name-value pairs with the message. Let's put them into an SQL table:
    destination d_nagiosdb {
      sql(type(pgsql)
          host(localhost) database(logs)
          username(...) password(...)
          table("nagios_alerts")
          columns("date timestamp", "contact", "host", "service", "state", "output")
          values("$fulldate", "${nagios.contact}", "${nagios.host}", "${nagios.service}", "${nagios.state}", "${nagios.output}")
          indexes("date", "contact", "host"));
    };
- Every column needs a matching entry in values(): six columns, six values
- We could do the same with all Nagios message types, each with a separate table
- An alternative to NDOUtils :)
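To see what the destination does with each message, here is a Python sketch of the same table using the standard sqlite3 module instead of PostgreSQL (a substitution made only so the example is self-contained). Each row takes one date plus the five name-value pairs extracted by db-parser(); the sample values in the check below are hypothetical.

```python
import sqlite3

COLUMNS = ("date", "contact", "host", "service", "state", "output")
# One name-value pair per column after "date", as extracted by db-parser().
KEYS = ("nagios.contact", "nagios.host", "nagios.service", "nagios.state", "nagios.output")


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS nagios_alerts "
                 "(date TEXT, contact TEXT, host TEXT, service TEXT, state TEXT, output TEXT)")
    for col in ("date", "contact", "host"):  # mirrors indexes("date", "contact", "host")
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_nagios_alerts_{col} ON nagios_alerts ({col})")


def store_alert(conn: sqlite3.Connection, date: str, pairs: dict) -> None:
    # Parameterized insert: six columns, six placeholders, six values.
    conn.execute(
        f"INSERT INTO nagios_alerts ({', '.join(COLUMNS)}) VALUES ({', '.join('?' * len(COLUMNS))})",
        (date, *(pairs[k] for k in KEYS)),
    )
```

Keeping the column list and the key list side by side makes a count mismatch between columns() and values() easy to spot.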
Monitoring logs with Nagios
- We want to monitor whether a given string appears in the system log
- Nagios has several plugins to do this:
  - check_log.sh in Nagios plugins
  - check_log.pl in mundle Nagios plugins
- Possible problems with these solutions:
  - they use regexps (slow and difficult to write)
  - they hardly scale to large log files:
    - check_log uses diff to get the differences to look at
    - check_log.pl keeps state, but applies each monitored regexp to each line iteratively: O(N*M) for N lines and M regexps
- These problems make these tools basically unusable for large-scale deployments
Automatic log checking with Nagios
- Collect the logs via syslog
- Add patterns to the patterndb that describe the log messages you want to be notified about
- Classify the patterns as nagios.critical or nagios.warning
- Notify Nagios about matching log messages:
  - a syslog-ng program() destination with the output template("${.classifier.class} $DATE $HOST $MSG\n");
  - a script that reads each line and sends the result to Nagios via NSCA
- No need to read log files from disk: syslog-ng does the heavy lifting, and the rest is just integration
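A minimal sketch of the bridge script in Python. Assumptions, none of which come from the slide: $DATE expands to the three-token BSD form ("May 20 09:07:50"), the passive service in Nagios is called "syslog", and send_nsca is called with -H and -c using a hypothetical host and config path. send_nsca reads tab-separated lines of host, service description, return code and plugin output on stdin; the Nagios return codes are 0 OK, 1 WARNING, 2 CRITICAL.

```python
import subprocess
import sys
from typing import Optional

# Map the patterndb classes to Nagios return codes (1 = WARNING, 2 = CRITICAL).
RETURN_CODES = {"nagios.warning": 1, "nagios.critical": 2}


def to_nsca_line(line: str, service: str = "syslog") -> Optional[str]:
    """Turn '<class> <Mmm dd hh:mm:ss> <host> <message>' into a send_nsca input line.

    Returns None for lines whose class is not one we alert on.
    """
    # split(None, 5) also copes with space-padded days such as 'May  1'.
    parts = line.rstrip("\n").split(None, 5)
    if len(parts) < 6:
        return None
    cls, _month, _day, _time, host, msg = parts
    code = RETURN_CODES.get(cls)
    if code is None:
        return None
    # Tabs are field separators for send_nsca, so keep them out of the output text.
    return f"{host}\t{service}\t{code}\t{msg.replace(chr(9), ' ')}\n"


def main() -> None:
    # syslog-ng's program() destination feeds us one formatted message per line.
    for line in sys.stdin:
        out = to_nsca_line(line)
        if out is not None:
            subprocess.run(
                ["send_nsca", "-H", "nagios.example.com", "-c", "/etc/send_nsca.cfg"],  # hypothetical host/config
                input=out, text=True, check=False,
            )


if __name__ == "__main__":
    main()
```

Starting one send_nsca process per matching line keeps the sketch simple; with a high alert rate, batching several lines into one send_nsca call would be the obvious refinement.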
Other noteworthy features in 3.0
- BalaBit-supported, free binary packages for free UNIX platforms (Linux, BSD)
- log statements can be embedded to form a tree-like log processing structure
- support for character encodings
- support for include files
- support for time zone names (like "Europe/Berlin")
- automatic restarts in the unlikely case of a crash
- support for Perl Compatible Regular Expressions (PCRE) and shell-like globs
- statistics framework to collect more stats
Further plans
- Community-built pattern database:
  - BalaBit has already released some patterns for its SSB product
  - we want to do this transparently, with the help of the community
- Classification improvements:
  - support for multiple tags per message (as in tag clouds)
  - tags can then be used for even more flexible filtering
- SQL output improvements:
  - put the SQL schema into the pattern database
- Transport improvements:
  - compression without TLS, application-layer ACKs, ...
Summary
- The syslog-ng vision has been adjusted: syslog-ng is not merely a log transport infrastructure anymore
- Its new features peek into the log analysis sphere
- The new power is combined with its log transport capabilities
- Practical examples
Thanks for listening. Any questions?
Mailing list: syslog-ng@lists.balabit.hu
Author: bazsi@balabit.hu
Web: www.balabit.com