UNIVERSITY OF NEBRASKA AT OMAHA Logging on a Shoestring Budget James Harr jharr@unomaha.edu
Agenda
- The Tools
  - ElasticSearch
  - Logstash
  - Kibana
  - redis
- Composing a Log System
- Q&A, Conclusions, Lessons Learned
Tools
- ELK
  - ElasticSearch
  - Kibana
  - LogStash
- redis
JSON
JavaScript Object Notation: the ELK Stack's Data Format
Scalars:
- string: "James Harr"
- number: 3.14159
- true/false/null: true, false, null
Complex Types:
- Object (name/value): {"first":"james", "last":"harr", "age":30}
- List (array of values): [1, 2, 3, "you get the idea", null]
JSON - An Example
  {
    "first": "James",
    "last": "Harr",
    "age": 30,
    "facebook": null,
    "twitter": "DNABlob",
    "googleplus": "james.harr",
    "emails": [
      {"type":"work", "email":"jharr@unomaha.edu", "reply_rate":0.9},
      {"type":"home", "email":"james.harr@gmail.com", "reply_rate":0.1}
    ],
    "tags": [ "network", "unomaha", "nebraska", "nerd" ]
  }
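To make the type mapping concrete, here is a minimal Python sketch that parses a trimmed copy of the example document with the standard `json` module. The variable names (`doc`, `person`, `work_emails`) are illustrative assumptions and not part of the talk.

```python
import json

# Trimmed copy of the example document from the slide.
doc = """
{
  "first": "James",
  "last": "Harr",
  "age": 30,
  "facebook": null,
  "emails": [
    {"type": "work", "email": "jharr@unomaha.edu", "reply_rate": 0.9},
    {"type": "home", "email": "james.harr@gmail.com", "reply_rate": 0.1}
  ],
  "tags": ["network", "unomaha", "nebraska", "nerd"]
}
"""

person = json.loads(doc)

# JSON objects become dicts, lists become lists, and null becomes None.
work_emails = [e["email"] for e in person["emails"] if e["type"] == "work"]
```

Nested objects and lists can be walked with ordinary dict and list operations, which is roughly what a field query in Kibana does against a stored document.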
ElasticSearch
- Document Database
- Stores JSON
- Indexes everything
- No foreign keys
- No transactions
- Scalable
- Fast, I/O Friendly
- Easy to administer
Kibana
WebUI to query ElasticSearch and visualize data
- Full-text search
- Search by field
- Shareable dashboards
- Widget-Based UI
- Lists, Charts, Maps, etc.
LogStash
"logstash is a unix pipe on steroids" - John Vincent
LogStash - Hello World
  input {
    stdin { codec => "plain" }
  }
  output {
    stdout { codec => "rubydebug" }
  }
LogStash - Conditionals
  filter {
    if [message] =~ /DHCP[^ ]+/ {
      mutate { add_tag => [ "dhcp" ] }
      grok { }            # settings not shown on the slide
    }
  }
  output {
    elasticsearch { }     # settings not shown on the slide
    if "dhcp" in [tags] {
      tcp {
        codec => json_lines
        host => "security"
        port => 1234
      }
    }
  }
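The conditional logic on this slide can be mimicked in a few lines of Python. This is only a sketch of the control flow, not how Logstash is implemented; the function names `apply_filter` and `route` and the sample messages are made up for illustration.

```python
import re

DHCP_RE = re.compile(r"DHCP[^ ]+")  # same regex as the filter condition


def apply_filter(event):
    """Tag the event with "dhcp" when its message matches the pattern."""
    # Logstash's =~ is an unanchored match, so search() is the equivalent.
    if DHCP_RE.search(event["message"]):
        event.setdefault("tags", []).append("dhcp")
    return event


def route(event):
    """Return the names of the outputs that would receive this event."""
    outputs = ["elasticsearch"]      # unconditional output
    if "dhcp" in event.get("tags", []):
        outputs.append("tcp")        # extra copy for tagged events
    return outputs
```

Every event reaches the store, while only events tagged in the filter stage are also copied to the second destination.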
LogStash - GROK
  filter {
    grok {
      match => { message => "SRC=(?<src_addr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" }
    }
  }
LogStash - GROK
  filter {
    grok {
      match => { message => "SRC=%{IP:src_addr}" }
    }
  }
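The hand-written regex version is an ordinary named capture group, which can be tried outside Logstash. Below is a minimal Python sketch; Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the helper name `extract_src` and the sample log line are assumptions for illustration. Note that the regex does not range-check octets.

```python
import re

# Same expression as the slide, using Python's named-group syntax.
SRC_RE = re.compile(r"SRC=(?P<src_addr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")


def extract_src(message):
    """Return {"src_addr": ...} if the message has a SRC= field, else {}."""
    m = SRC_RE.search(message)
    return m.groupdict() if m else {}
```

For example, `extract_src("IN=eth0 SRC=192.0.2.10 DST=198.51.100.7")` yields a dict with a single `src_addr` key, which is the field grok would add to the event.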
LogStash - GROK
Match Patterns:
- %{PATTERN:field}
- %{PATTERN:field:int}
- %{PATTERN:field:float}
Pattern Library:
- 306 built-in patterns, tested
- Reasonably easy to add your own
  $ (cd patterns; grep -vcE '^$|^#' *)
  aws:6
  bacula:47
  bro:4
  exim:12
  firewalls:44
  grok-patterns:76
  haproxy:7
  java:13
  junos:4
  linux-syslog:10
  mcollective:1
  mcollective-patterns:2
  mongodb:7
  nagios:61
  postgresql:1
  rails:7
  redis:2
  ruby:2
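One way to picture what the `%{PATTERN:field:type}` syntax does is as a small macro expander over regular expressions. The sketch below is a toy stand-in, not grok's actual implementation: the three entries in `PATTERNS` are simplified versions written for this example, and `compile_grok` and `grok` are hypothetical helper names.

```python
import re

# Tiny stand-in pattern library; the real one ships hundreds of entries.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "WORD": r"\w+",
}
CASTS = {"int": int, "float": float}
TOKEN_RE = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(expr):
    """Expand %{PATTERN:field:type} tokens into a regex plus a cast table."""
    casts = {}

    def expand(m):
        name, field, cast = m.groups()
        body = PATTERNS[name]
        if field is None:
            return f"(?:{body})"          # match but do not capture
        if cast:
            casts[field] = CASTS[cast]    # remember the requested type
        return f"(?P<{field}>{body})"

    return re.compile(TOKEN_RE.sub(expand, expr)), casts


def grok(expr, message):
    """Return the captured fields, or None when the message does not match."""
    regex, casts = compile_grok(expr)
    m = regex.search(message)
    if m is None:
        return None
    # Fields without a declared type stay strings.
    return {k: casts.get(k, str)(v) for k, v in m.groupdict().items()}
```

Without the `:int` or `:float` suffix a captured value stays a string, which matters once the field is stored and you want to sum or graph it.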
LogStash - GROK
%{HAPROXYHTTP} translates to:
  %{SYSLOGTIMESTAMP:syslog_timestamp} %{IPORHOST:syslog_server} %{SYSLOGPROG}: %{IP:client_ip}:%{INT:client_port} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request}/%{INT:time_queue}/%{INT:time_backend_connect}/%{INT:time_backend_response}/%{NOTSPACE:time_duration} %{INT:http_status_code} %{NOTSPACE:bytes_read} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn}/%{INT:feconn}/%{INT:beconn}/%{INT:srvconn}/%{NOTSPACE:retries} %{INT:srv_queue}/%{INT:backend_queue} (\{%{HAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )?"(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?"
LogStash - GeoIP
  filter {
    grok {
      match => { message => "SRC=%{IP:src_addr}" }
    }
    geoip {
      source => "src_addr"
      target => "src_geo"
    }
  }
LogStash - statsd
  output {
    if "firewall" in [tags] {
      statsd {
        host => "localhost"
        count => [ "firewall.%{rule_name}.bytes", "%{bytes}" ]
      }
      statsd {
        host => "localhost"
        count => [ "firewall.%{rule_name}.hits", "1" ]
      }
    }
  }
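On the wire, a statsd counter is a short text datagram of the form `name:value|c`. The Python sketch below builds the two counters from this slide for a single event. It is an illustration under assumptions: the helper names, the sample `rule_name`, and the use of UDP port 8125 (the usual statsd default) are not from the talk, and any metric-name prefix the real plugin may add is left out.

```python
import socket


def statsd_counters(event):
    """Build statsd counter lines ("name:value|c") for a firewall event."""
    if "firewall" not in event.get("tags", []):
        return []
    rule = event["rule_name"]
    return [
        f"firewall.{rule}.bytes:{int(event['bytes'])}|c",  # add bytes seen
        f"firewall.{rule}.hits:1|c",                        # one hit per event
    ]


def send(lines, host="localhost", port=8125):
    """Fire-and-forget each counter line to statsd over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
```

statsd sums these increments and periodically flushes the totals to graphite, so per-rule byte and hit rates can be graphed without querying the log store.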
Inputs, Filters, Outputs
Inputs:
- stdin
- file
- eventlog (win32)
- twitter
- snmptrap
- tcp, udp
  - codec => syslog
  - codec => netflow
  - codec => json_lines
- redis
- rabbitmq
Filters:
- grok
- multiline
- mutate
- drop
- clone
- metrics
- dns
- geoip
- useragent
- anonymize
- elapsed
- elasticsearch
Outputs:
- stdout
- file
- redis
- rabbitmq
- tcp, udp
- elasticsearch
- mongodb
- nagios
- opentsdb
- statsd
- graphite
redis
Message Queue Server
Queue: like a mailbox
- Can have multiple senders.
- Can have multiple receivers.
- Each message goes to one receiver.
- No receiver: messages pile up.
Channel (pub/sub): like the radio
- Can have multiple publishers.
- Can have multiple subscribers.
- Each message goes to all subscribers.
- No subscriber: message is lost.
- Publisher is not held up.
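The difference between the two delivery models is easy to see in a small in-memory simulation. The Python sketch below does not talk to redis at all; the `Queue` and `Channel` classes are hypothetical stand-ins that only model the semantics listed above. In real redis, a queue is typically a list used with push and blocking-pop commands, and a channel uses PUBLISH/SUBSCRIBE.

```python
from collections import deque


class Queue:
    """Mailbox semantics: each message is taken by exactly one receiver."""

    def __init__(self):
        self._items = deque()

    def push(self, msg):
        self._items.append(msg)          # piles up if nobody is receiving

    def pop(self):
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


class Channel:
    """Radio semantics: every current subscriber gets a copy, nothing is stored."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, msg):
        for callback in self._subscribers:
            callback(msg)
        return len(self._subscribers)    # 0 means the message was dropped
```

Pushing to the queue with no receiver keeps the message for later, while publishing to a channel with no subscriber returns 0 and the message is gone.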
Composing a Log System
Logstash is not a single service
- Split up concerns.
- Use queues to deal with bursts, errors.
- Use channels to troubleshoot.
[Diagram legend: Logstash Process, Redis Queue, Redis Channel, Database / Store]
Composing a Log System
General Architecture - Start Simple
[Diagram: collector -> queue -> analyzer -> ES, with Kibana querying ES]
- Keep collectors simple: reliability and speed are your goal here.
- Analyzer is the workhorse: can increase threads, run multiple.
- Queues are vital: you will mess up your analyzer. Queues help avoid losing logs.
Composing a Log System
Channels - for duplicating data
[Diagram: collector, queue, analyzer, ES, Kibana; a "received" channel feeding two forwarders, each sending to a remote host (tcp)]
- Channels: useful when reliable delivery isn't needed and/or data needs to be replicated.
Composing a Log System
Archiving
[Diagram: collector, queue, analyzer, ES, Kibana; an "archive" queue feeding an archiver that writes /log/yyyy-mm-dd/host.log.gz]
- Archive to file: gzip compresses data well and fast.
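The archive layout on this slide (one directory per day, one gzip file per host) can be sketched in Python with the standard `gzip` module. This is an illustration of the file layout only, not the Logstash archiver itself; `archive_path`, `archive_line`, and the root directory argument are hypothetical names.

```python
import gzip
import os
from datetime import date


def archive_path(root, day, host):
    """Build <root>/yyyy-mm-dd/<host>.log.gz, mirroring the slide's layout."""
    return os.path.join(root, day.isoformat(), f"{host}.log.gz")


def archive_line(root, day, host, line):
    """Append one log line to the host's compressed file for that day."""
    path = archive_path(root, day, host)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Mode "at" appends a new gzip member; readers see the members concatenated.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(line.rstrip("\n") + "\n")
    return path
```

Opening the file once per line keeps the sketch short but compresses poorly; a long-running archiver would more likely keep each file open and write lines in batches.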
Composing a Log System
Debugging with Channels
[Diagram: collector, queue, analyzer, ES, Kibana; channels collector_out and analyzer_out; a debug-tool writing to stdout]
- Debug with Channels: channels can be used to sniff what's going on with the log system.
- throttle filter is your friend.
Composing a Log System
What we use today
[Diagram, components as labeled:]
- Inputs:
  - tcp/5043 - lumberjack: Linux Logs
  - tcp/514 - syslog: Generic dump
  - tcp/3003 - syslog: Palo Alto FW/IPS logs
  - tcp/4739 - NetFlow/IPFIX: NetFlow
- Logstash processes: collector, analyzer (two shown), archiver, nf-collector
- Redis queues: queue, archive
- Redis channels: received, parsed
- Stores: ES [logstash], ES [panos], ES [netflow], file.gz
- Also shown: statsd, Graphite, Kibana
Q&A
Thanks!
Appendix - Resources
- LogStash Website
- Kibana Website
- HTTP server config (reverse proxy w/ auth)
- github.com/jamesharr/logstash - Snippet(s) of my Logstash config
- github.com/elasticsearch/curator - Log curation
Other Talks
- www.infoq.com/presentations/elasticsearch
- youtu.be/ruufnog29m4 - Jordan Sissel
- youtu.be/fwmnb4-t8vo - More Jordan Sissel
Appendix - Related Projects
- fluentd (integrates well)
- graylog2 (ES frontend)
- github.com/elasticsearch/logstash-forwarder - Log forwarder for resource-constrained systems
- statsd - count things, add things, periodically send them to graphite
- graphite - mrtg, but runs as a service
- opentsdb - graphite, but runs on HBase (good luck)