Using Logstash and Elasticsearch analytics capabilities as a BI tool

Pashalis Korosoglou, Pavlos Daoglou, Stefanos Laskaridis, Dimitris Daskopoulos
Aristotle University of Thessaloniki, IT Center
Outline

- Technical stuff (Logstash, Elastic, Kibana, Ansible)
- Motivation for monitoring
- Software licenses
- Other use cases
- Summary and next steps
Logstash

- Written in JRuby
- Applicable well beyond log files
- Plethora of core and community-contributed plugins:
  - I/O plugins
  - Filtering plugins
  - Codecs
- "Take this message and parse/compute/save stuff on the wire"
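The plugin types above fit together in a single pipeline configuration file. A minimal sketch (the stdin input and kv-based parsing are illustrative assumptions, not our production setup):

```
# Minimal Logstash pipeline: read an event from stdin,
# split "key: value" lines into fields, pretty-print the result.
input {
  stdin { }
}

filter {
  kv {
    field_split => "\n"   # one key/value pair per line
    value_split => ":"    # "key: value" separator
  }
}

output {
  stdout { codec => rubydebug }  # print the parsed event for inspection
}
```

In production the stdout output would typically be replaced (or complemented) by an elasticsearch output.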
A very simple pipe example

Input message:

    serviceuri: node03.domain.gr
    hostname: node03.domain.gr
    serviceflavour: service_i
    sitename: SITENAME
    metricStatus: OK
    metricName: org.ldap.freshness
    summarydata: OK: freshness=70s, entries=1
    gatheredat: nagios.domain.gr
    timestamp: 2015-06-08T19:42:31Z
    nagiosName: org.ldap.freshness
    servicetype: service_i
    eot

Parsed event:

    {
        "@timestamp"     => "2015-06-08T19:42:31.000Z",
        "hostname"       => "node03.domain.gr",
        "serviceflavour" => "service_i",
        "sitename"       => "SITENAME",
        "metricstatus"   => "OK",
        "metricname"     => "org.ldap.freshness",
        "freshness"      => 70,
        "entries"        => 1,
        "gatheredat"     => "nagios.domain.gr",
        "probe"          => "org.ldap.freshness",
        "servicetype"    => "service_i"
    }
Logstash forwarders & Lumberjack

- Logstash-forwarder is a lightweight forwarding service
  - Keeps track of the offset within each log file
  - Failure resistant
  - Supports multiple file inputs
- Lumberjack is a collection service
  - Basically one of many input plugins available
  - Uses zlib for compression
  - Secure transmission of logs via OpenSSL
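A sketch of the two ends (hostnames, ports and certificate paths are hypothetical): logstash-forwarder ships selected files over the Lumberjack protocol, and Logstash receives them via the lumberjack input plugin.

```
# logstash-forwarder (client side), e.g. /etc/logstash-forwarder.conf
{
  "network": {
    "servers": [ "logstash.example.org:5043" ],
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/flexlm/*.log" ],
      "fields": { "type": "flexlm" }
    }
  ]
}
```

```
# Logstash (server side): lumberjack input plugin
input {
  lumberjack {
    port            => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key         => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}
```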
Architecture Overview
Architecture Overview Logstash forwarder(s) configuration
ElasticSearch (Elastic)

- Distributed data store with near-real-time search capabilities
- Built on top of Apache Lucene
- Exposes an HTTP RESTful API (e.g. for querying)
- Multitenant architecture
- Highly available
  - Shard replication
- Supports 3rd-party plugins (e.g. HQ, head, etc.)
- Apache 2.0 license
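Everything is driven through that REST API. A hypothetical search against a default logstash-* index (host, index and field names are assumptions), counting OK probe results per service flavour with an ES 1.x terms aggregation:

```
curl -XGET 'http://localhost:9200/logstash-*/_search?pretty' -d '
{
  "query": { "match": { "metricstatus": "OK" } },
  "aggs": {
    "per_flavour": { "terms": { "field": "serviceflavour" } }
  }
}'
```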
Elastic, RDBMS & Hadoop concepts

- ES document -> row in an RDBMS table
- ES index -> RDBMS database
  - A collection of documents
- ES mapping -> RDBMS schema definition
- ES shards -> Hadoop splits
  - Each shard is actually a Lucene index
  - An ES index splits into shards
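To illustrate the mapping/schema analogy, a hypothetical ES 1.x mapping for license-checkout documents (index, type and field names are illustrative, not our actual schema):

```
PUT /licenses/_mapping/checkout
{
  "checkout": {
    "properties": {
      "username":     { "type": "string", "index": "not_analyzed" },
      "feature":      { "type": "string", "index": "not_analyzed" },
      "elapsed.time": { "type": "integer" },
      "@timestamp":   { "type": "date" }
    }
  }
}
```

`not_analyzed` keeps usernames and feature names as single terms, so aggregations group on the whole value instead of tokenized fragments.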
Elastic, RDBMS & Hadoop concepts

- Replication:
  - 5 primary shards by default
  - 1 replica for each shard
  - Replicas can't be assigned to the same node as their primary shard
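These defaults can be overridden per index at creation time; a minimal sketch against the ES 1.x REST API (the index name is hypothetical):

```
PUT /licenses
{
  "settings": {
    "number_of_shards":   5,
    "number_of_replicas": 1
  }
}
```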
Kibana

- Node.js frontend for Elastic
- Allows (realtime) visualisation of data
- Flexible interface
  - One can add, remove, move and modify rows and graphs
- Allows different search queries
- Allows save, import, export and share operations for dashboards
Software @ Auth

- More than 20 annual contracts signed (+ a few perpetual)
- The majority relies on the FlexLM service
- Expenditures/year: ~100K
- Use cases:
  - Which departments use software X?
  - Which departments use software X for educational or research purposes?
  - How often is software X's Y component used?
Software @ Auth

The problem(s) with flex logs:

    23:29:06 (deamon) TIMESTAMP 6/3/2015
    0:36:51 (deamon) OUT: "feature" someone@somewhere
    0:39:04 (deamon) IN: "feature" someone@somewhere
    0:54:47 (deamon) DENIED: "feature" someone@somewhere (Licensed number of users already reached. (-4,342))
    0:54:47 (deamon) UNSUPPORTED: "feature" (PORT_AT_HOST_PLUS ) someone@somewhere (License server system does not support this feature. (-18,327))
    0:54:47 (deamon) OUT: "feature" someone@somewhere (2 licenses)
    1:08:08 (deamon) IN: "feature" someone@somewhere (2 licenses)
    1:08:31 (deamon) OUT: "feature" someone@somewhere
    1:10:09 (deamon) IN: "feature" someone@somewhere
    1:13:43 (deamon) UNSUPPORTED: "feature" (PORT_AT_HOST_PLUS ) someone@somewhere (License server system does not support this feature. (-18,327))
    3:16:44 (lmgrd) TIMESTAMP 6/4/2015
Software @ Auth

Our solution (via logstash filtering):

    {
        "_type": "deamon",
        "_source": {
            "message": "19:07:17 (deamon) IN: \"feature\" someone@somewhere",
            "@version": "1",
            "@timestamp": "2015-06-08T16:07:17.000Z",
            "host": "tracker01",
            "tags": [ "taskterminated", "elapsed", "elapsed.match" ],
            "feature": "\"feature\"",
            "username": "someone",
            "hostname": "somewhere",
            "elapsed.time": 67,
            "elapsed.timestamp_start": "2015-06-08T16:06:10.000Z"
        }
    }
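A reconstructed sketch of the kind of filtering involved, pairing each OUT (check-out) with the matching IN (check-in) via the elapsed filter; the grok pattern and the choice of unique_id_field are illustrative assumptions, not our exact production filters:

```
filter {
  # Parse the FlexLM daemon lines: time, action, feature, user@host
  grok {
    match => { "message" => "%{TIME:time} \(%{WORD:vendor}\) %{WORD:action}: %{QS:feature} %{USERNAME:username}@%{HOSTNAME:hostname}" }
  }
  # Tag check-outs as task starts and check-ins as task ends ...
  if [action] == "OUT" { mutate { add_tag => [ "taskstarted" ] } }
  if [action] == "IN"  { mutate { add_tag => [ "taskterminated" ] } }
  # ... so the elapsed filter can compute how long each license was held
  elapsed {
    start_tag       => "taskstarted"
    end_tag         => "taskterminated"
    unique_id_field => "username"
  }
}
```

The elapsed filter is what produces the `elapsed.time` and `elapsed.timestamp_start` fields seen in the event above.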
Software @ Auth (screenshots)
Software @ Auth (screenshots)
Software @ Auth

- Decisions on which contracts to renew, and with how many seat licenses, depend on our accounting/monitoring data
- Actual scenarios/decisions:
  - Renew the contract for software X but reduce the number of floating licenses
  - Renew the annual license for software X but don't renew component Y
Other Use Cases?

- Web services
- Accounting
- Resources usage
- Environmental monitoring
- Logins and brute-force attempts
- Performance metrics
- Any log file (?)
Web services

    filter {
      if [type] == "httpd" {
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
        geoip {
          source => "clientip"
        }
        mutate {
          convert => { "bytes" => "integer" }
        }
        date {
          match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
      }
    }
Web services
Accounting (local HPC resource)
Accounting (local HPC resource)
Logins (successful and attacks)
Resources Usage
Re-playing

- Log files are still kept in central syslog
- Scratch Elastic completely and everything is reproducible ad hoc:
  - Filters (via Ansible)
  - Log files (via central syslog)
Reporting

- The Elastic API is not reachable from outside
- What if we want to send reports to our users?
- Using the phantomjs framework and rasterize.js we can generate custom weekly, monthly or annual reports in PDF format and share them with our users
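For instance, rasterize.js takes a URL, an output file and a paper size, so a dashboard can be rendered headlessly to PDF (the dashboard URL and output name below are hypothetical):

```
phantomjs rasterize.js \
  'http://kibana.example.org/#/dashboard/licenses' \
  weekly-report.pdf A4
```

A cron job around this command is enough to mail out periodic reports.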
Summary

- The wealth sometimes hidden away in our log files is enormous
- ELK should not be considered a replacement for central logging
  - Rather, it's best treated as an addition to an existing stack
- ELK has helped us in:
  - Indexing data from log files
  - Searching through log files
  - Visualizing data and gaining useful business insight
Next steps

- Performance monitoring via Nagios/Icinga probes & metrics
- Combination with the Hadoop stack
  - Safekeeping cold data
  - Performing combined aggregated queries
  - Lambda (λ) architecture prototype
- Upgrade Elastic and Kibana to 1.5.x
- Apply data retention policies and use Elastic's repository features for long-term storage
Questions?

support@.gr