Log Analysis as a Service using open source scalable systems. Gurvinder Singh Dahiya, Uninett AS Belgrade Security Workshop, 20.03.




Log Analysis as a Service using open source scalable systems Gurvinder Singh Dahiya, Uninett AS Belgrade Security Workshop, 20.03.2015

Motivation
- Distributed systems
- Centralized interface to logs
- Easier access
- Detection of hidden patterns
- Access to logs across the organization
- Centralized alerts and anomaly detection across services
Image: http://www.themeparkreview.com/tatsumediaday/tatsumediaday57.jpg

Challenges
- Different formats and logging methods
- Different requirements for processing
- Different dashboards
- Various alert requirements
Image: http://img72.imageshack.us/img72/3885/nephew2logs.jpg

What is a log?

    Jun 2 07:40:34 scintilla kernel: [77262.488918] hid-generic 0003:046D:0A15.001C: input,hidraw4: USB HID v1.00 Device [Logitech Logitech G35 Headset] on usb-0000:00:1d.0-1.5.4/input3

    [2014-06-02 14:08:39,870][INFO ][cluster.metadata update_mapping [mail] (dynamic) ] [pltrd003] [mail-2014.06.02]

    2014-06-02T12:11:25.271Z 158.36.2.74 https://idp.feide.no feide:sso ntnu.no [u'urn:mace:feide.no:services:no.ntnu.ssowrapper'] 1401711085.27

    158.38.213.3 - - [02/Jun/2014:14:12:48 +0200] "POST /es/logstash-2014.05.26/_search HTTP/1.1" 200 329628 "https://logs.uninett.no/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36"

    DEBUG 2014-06-02 14:15:13,371 [Thread-110743] no.uninett.agora.user.userrolesiteutil Querying for liferay sites for roles: [fs]

TIMESTAMP + DATA = LOG

Time Formats
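The sample logs in this deck carry at least five timestamp layouts: syslog without a year, a comma-millisecond layout, ISO 8601 with a trailing Z, Unix epoch seconds, and the Apache access-log layout. A minimal Python sketch of normalizing them to UTC could look like the following. The function name `parse_timestamp`, the default year, the default time zone for layouts that carry no offset, and an English locale for month names are all assumptions made for illustration; they are not taken from the talk.

```python
from datetime import datetime, timezone

# Layouts that carry their own UTC offset (Apache access log, ISO 8601).
AWARE_FORMATS = ("%d/%b/%Y:%H:%M:%S %z", "%Y-%m-%dT%H:%M:%S.%f%z")
# Layouts without an offset (comma-millisecond layout, syslog with a year prepended).
NAIVE_FORMATS = ("%Y-%m-%d %H:%M:%S,%f", "%Y %b %d %H:%M:%S")


def parse_timestamp(raw, default_year=2014, default_tz=timezone.utc):
    """Return an aware UTC datetime for any of the layouts listed above."""
    raw = raw.strip()
    # Unix epoch seconds, e.g. "1401711085.27".
    try:
        return datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        pass
    # Spell a trailing "Z" as an explicit zero offset before using %z.
    candidate = raw[:-1] + "+0000" if raw.endswith("Z") else raw
    for fmt in AWARE_FORMATS:
        try:
            return datetime.strptime(candidate, fmt).astimezone(timezone.utc)
        except ValueError:
            continue
    # Syslog has no year, so a second attempt prepends the assumed year.
    for text in (raw, f"{default_year} {raw}"):
        for fmt in NAIVE_FORMATS:
            try:
                naive = datetime.strptime(text, fmt)
            except ValueError:
                continue
            # The zone is a guess for these layouts: the log line does not say.
            return naive.replace(tzinfo=default_tz).astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp layout: {raw!r}")


for sample in ("Jun 2 07:40:34", "2014-06-02 14:08:39,870",
               "2014-06-02T12:11:25.271Z", "1401711085.27",
               "02/Jun/2014:14:12:48 +0200"):
    print(sample, "->", parse_timestamp(sample).isoformat())
```

The year and zone defaults are the weak point of any such parser: a syslog line alone cannot say which year or zone it was written in, which is one reason a central pipeline benefits from stamping events on receipt.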

A log is human readable...

    Jun 2 07:40:34 scintilla kernel: [77262.488918] hid-generic 0003:046D:0A15.001C: input,hidraw4: USB HID v1.00 Device [Logitech Logitech G35 Headset] on usb-0000:00:1d.0-1.5.4/input3

But machine parsable... maybe?

    Jun 2 07:40:34 scintilla kernel: [77262.488918] hid-generic 0003:046D:0A15.001C: input,hidraw4: USB HID v1.00 Device [Logitech Logitech G35 Headset] on usb-0000:00:1d.0-1.5.4/input3

Apache Regex..

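Evolution of log processing
Stage 1: Single host

    $ grep -I "Invalid user " /var/log/auth.log* | awk '{ print $10; }'

Stage 2: A handful of hosts

    #!/bin/sh
    # Run the same search on every host over SSH and print field 10,
    # the source address of the failed login.
    USER=root
    KEY=/root/public_key.pub
    for HOST in server1 server2 server3 server4
    do
        # Quote the remote command so the pattern and the glob reach the remote shell intact.
        ssh -l "$USER" -i "$KEY" "$HOST" \
            "grep -I 'Invalid user ' /var/log/auth.log*" | awk '{ print $10; }'
    done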

Evolution of log processing
Stage 3: Lots of servers

    $ grep -I "Invalid user " /rsyslog/*/auth.log* | awk '{ print $10; }'
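For comparison, the grep and awk pipeline from these stages can be sketched in Python, with the addition of a per-address count. The sample lines and the function name `invalid_user_sources` are made up for illustration; real input would come from the auth.log files named on the slides.

```python
from collections import Counter


def invalid_user_sources(lines):
    """Count the 10th whitespace-separated field of 'Invalid user' lines."""
    counts = Counter()
    for line in lines:
        if "Invalid user " not in line:
            continue
        fields = line.split()
        if len(fields) >= 10:
            counts[fields[9]] += 1  # same field as awk's $10
    return counts


# Hypothetical sshd lines in the usual syslog layout.
sample = [
    "Jun  2 07:40:34 server1 sshd[1234]: Invalid user admin from 203.0.113.5",
    "Jun  2 07:40:39 server1 sshd[1240]: Invalid user test from 203.0.113.5",
    "Jun  2 07:41:02 server1 sshd[1251]: Accepted publickey for root from 198.51.100.7",
    "Jun  2 07:42:10 server2 sshd[2210]: Invalid user oracle from 198.51.100.9",
]

for address, hits in invalid_user_sources(sample).most_common():
    print(address, hits)
```

Like the shell version, this relies on a fixed field position, so it breaks as soon as a log source uses a different layout. That fragility is part of what the later stages try to address.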

Evolution of log processing
Stage 4: Start using Splunk

Evolution of log processing
The first one is free: 500 MB per day

Evolution of log processing
Incoming invoice from Splunk

Evolution of log processing
Stage 5: Open source scalable solutions
- Logstash
- Elasticsearch
- Kibana
- ZeroMQ
- Logstash-forwarder
- Rsyslog
- Statsd

Logstash
Turns this:

    192.168.0.74 - - [13/May/2014:04:28:55 -0500] "GET /robots.txt HTTP/1.1" 301 303 "-" "Mozilla/5.0 (compatible; DSASE/1.0; bot@gmail.com)"

Into:

    {
      "client address": "192.168.0.74",
      "user": null,
      "timestamp": "2014-05-13T04:28:55-0500",
      "verb": "GET",
      "path": "/robots.txt",
      "query": null,
      "http version": 1.1,
      "response code": 301,
      "bytes": 303,
      "referrer": null,
      "user agent": "Mozilla/5.0 (compatible; DSASE/1.0; bot@gmail.com)"
    }
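To make the transformation concrete, here is a rough Python sketch of the same idea: a regular expression for the Apache combined log format that yields a structured record with the field names used on the slide. This is not Logstash's own pattern or implementation; the regex, the helper names, and the choice to keep the HTTP version as a string are assumptions for illustration.

```python
import re
from datetime import datetime

# One named group per field of the Apache combined log format.
COMBINED = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def dash_to_none(value):
    """Apache writes '-' for a missing value; map it to None."""
    return None if value == "-" else value


def parse_access_line(line):
    """Return a dict for one access-log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    path, _, query = match["request"].partition("?")
    when = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    size = dash_to_none(match["bytes"])
    return {
        "client address": match["client"],
        "user": dash_to_none(match["user"]),
        "timestamp": when.isoformat(),
        "verb": match["verb"],
        "path": path,
        "query": query or None,
        "http version": match["version"],
        "response code": int(match["status"]),
        "bytes": None if size is None else int(size),
        "referrer": dash_to_none(match["referrer"]),
        "user agent": match["agent"],
    }


line = ('192.168.0.74 - - [13/May/2014:04:28:55 -0500] "GET /robots.txt HTTP/1.1" '
        '301 303 "-" "Mozilla/5.0 (compatible; DSASE/1.0; bot@gmail.com)"')
record = parse_access_line(line)
print(record)
```

Once every line is a record with typed fields, questions such as "which clients received 301 responses" become field lookups instead of positional text matching.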

Elasticsearch
- Document-oriented
- Free text search and analytics engine
- JSON
- Apache Lucene
- No schema
- Mapping types
- Horizontally scalable, distributed
- REST API
- Vibrant ecosystem
- Completely open source

Elasticsearch
- Index
  - Logical collection of data; might be time based
  - Analogous to a database
- Sharding
  - Splits logical data over several machines
  - Write scalability
  - Control of data flows
- Replication
  - Read scalability
  - Removes single points of failure (SPOF)
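Two of these ideas can be sketched in a few lines of Python: a time-based index name of the form seen in the sample logs (for example `logstash-2014.05.26`), and the way a document is assigned to one primary shard by hashing a routing key modulo the number of primaries. Elasticsearch uses its own hash function internally; `zlib.crc32` below is only a stand-in, and both function names are hypothetical.

```python
import zlib
from datetime import date


def daily_index(prefix, day):
    """Build a per-day index name such as 'logstash-2014.05.26'."""
    return f"{prefix}-{day:%Y.%m.%d}"


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (by default the document id) to a primary shard number.

    crc32 is a stand-in hash for illustration, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


print(daily_index("logstash", date(2014, 5, 26)))
for doc_id in ("event-1", "event-2", "event-3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 10))
```

Because the shard number depends on the count of primaries, that count is fixed when an index is created. With time-based indices, a new shard count simply takes effect from the next day's index.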

Elasticsearch Shards and Replication
Index allocation with 10 shards and a replication factor of 3
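As a rough illustration of what such an allocation means, the sketch below builds an index settings body with the `number_of_shards` and `number_of_replicas` settings and counts the shard copies the cluster has to place. The slide's replication figure of 3 is read here as three replica copies per primary, which is an assumption; if it means three copies in total, `replicas` would be 2. The function names are hypothetical.

```python
import json


def index_settings(primary_shards, replicas):
    """Settings body for creating an index with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary plus its replicas is stored somewhere in the cluster."""
    return primary_shards * (1 + replicas)


body = index_settings(primary_shards=10, replicas=3)
print(json.dumps(body, indent=2))
print("shard copies to allocate:", total_shard_copies(10, 3))
```

Elasticsearch does not place a replica on the same node as its primary, so replicas only add fault tolerance when the cluster has enough nodes to hold them.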

Kibana
- JavaScript-based web application
- No dependency except a web server (changing a bit in Kibana 4)
- Visualizes data stored in Elasticsearch
- Dynamic panel and dashboard creation
- Multiple panel types:
  - Tables
  - Pie charts
  - Maps
  - Text
  - Trends
  - Histograms

Kibana Visualization

Kibana Visualization

Lego Brick-Built

Authentication & Authorization
- Sensitive information
- Privacy concerns
- Supports LDAP groups
- Supports Feide, the Norwegian single sign-on solution
- A Node.js app, forked from another kibana-proxy
- Access is granted per service by the service owner
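The per-service access rule can be sketched as a lookup from service name to the groups its owner has approved. The actual proxy is a Node.js app whose code is not shown in the slides, so the Python below is only an illustration of the idea; the ACL contents, group names, and function names are all hypothetical.

```python
# Hypothetical ACL: each service owner lists the LDAP groups allowed to see its logs.
SERVICE_ACL = {
    "mail": {"mail-admins"},
    "eduroam": {"noc", "eduroam-ops"},
    "idp": {"noc"},
}


def allowed_services(user_groups, acl):
    """Return the services whose approved groups overlap the user's groups."""
    groups = set(user_groups)
    return sorted(service for service, approved in acl.items() if groups & approved)


def may_query(user_groups, service, acl):
    """True if the user belongs to at least one group approved for the service."""
    return bool(set(user_groups) & acl.get(service, set()))


print(allowed_services(["noc"], SERVICE_ACL))
print(may_query(["students"], "mail", SERVICE_ACL))
```

A proxy in front of Kibana and Elasticsearch could apply such a check to each request and reject queries for services the user's groups do not cover.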

Data Model
- Normalize the diversity of information
- Logsource field for host information
- Username
- src_ip, dest_ip, src_port, dest_port
- _service type
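A minimal sketch of such normalization: each parser emits whatever field names its log format suggests, and a final step renames them to the common fields listed above. The alias table is invented for illustration, and reading the slide's last item as a field named `_service` is an assumption.

```python
# Hypothetical aliases: common field name -> names different parsers might emit.
ALIASES = {
    "logsource": ("logsource", "host", "hostname"),
    "username": ("username", "user", "uid"),
    "src_ip": ("src_ip", "clientip", "source_address"),
    "dest_ip": ("dest_ip", "dst", "destination_address"),
    "src_port": ("src_port", "sport"),
    "dest_port": ("dest_port", "dport"),
}


def normalize(event, service):
    """Copy known fields of a parsed event into the common data model."""
    normalized = {"_service": service}
    for target, candidates in ALIASES.items():
        for name in candidates:
            if name in event:
                normalized[target] = event[name]
                break  # the first matching alias wins
    return normalized


web_event = {"host": "scintilla", "user": "alice", "clientip": "158.38.213.3", "dport": 443}
print(normalize(web_event, "web"))
```

With shared field names, one dashboard or alert (for example "logins per src_ip") can run across services without knowing each source's original format.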

Log Analysis as a Service (LaaS)
- It's up and running
- Pilot is offered until April 14, 2015; production starts April 15, 2015
- In pilot with the Geant SA7-T1 activity
- Collaboration with GarrNet and Heanet
- The following institutions are sending logs:
  - Høgskolen i Oslo-Akershus
  - Høgskolen i Hedmark
  - Høgskolen i Østfold
  - Høgskolen i Ålesund
  - Handelshøyskolen
  - Uninett AS
- Currently receiving 60 GB of data per day
- Total sliding data size is currently 2 TB

Log Analysis as a Service (LaaS)
- It's becoming an ecosystem
- The Suricata IDS service is coming up and can send its logs to LaaS
- The Edudbg service uses LaaS as a backend to provide debugging information to all eduroam users in Norway
- Institutions can share data with each other without giving access to the servers themselves
- Data can be shared with external consultants as well
- Collaborating on parsers saves resources and shares the benefit
- Collaborating on dashboards for similar log services
