Geo-Identification of Web Users through Logs using ELK Stack

Similar documents
Log Analysis with the ELK Stack (Elasticsearch, Logstash and Kibana) Gary Smith, Pacific Northwest National Laboratory

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia

Developing an Application Tracing Utility for Mule ESB Application on EL (Elastic Search, Log stash) Stack Using AOP

Logging on a Shoestring Budget

Processing millions of logs with Logstash

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M

Log management with Logstash and Elasticsearch. Matteo Dessalvi

Andrew Moore Amsterdam 2015

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com DataDirect Networks. All Rights Reserved.

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Arti Tyagi Sunita Choudhary

Web Log Analysis for Identifying the Number of Visitors and their Behavior to Enhance the Accessibility and Usability of Website

Efficient Management of System Logs using a Cloud Radoslav Bodó, Daniel Kouřil CESNET. ISGC 2013, March 2013

Monitoring Linux and Windows Logs with Graylog Collector. Bernd Ahlers Graylog, Inc.

Information Retrieval Elasticsearch

Comparative Analysis of Open-Source Log Management Solutions for Security Monitoring and Network Forensics

Setting Up SSL on IIS6 for MEGA Advisor

Mobile Analytics. mit Elasticsearch und Kibana. Dominik Helleberg

A New Approach to Network Visibility at UBC. Presented by the Network Management Centre and Wireless Infrastructure Teams

Introduction. Background

Using NXLog with Elasticsearch and Kibana. Using NXLog with Elasticsearch and Kibana

Using Logstash and Elasticsearch analytics capabilities as a BI tool

Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining

Cyclope Internet Filtering Proxy. - Installation Guide -

Why should you look at your logs? Why ELK (Elasticsearch, Logstash, and Kibana)?

Blackboard Open Source Monitoring

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING

ANALYSING SERVER LOG FILE USING WEB LOG EXPERT IN WEB DATA MINING

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon Max Putas

Includes: Building an SEO- friendly Website: A Comprehensive Checklist

XpoLog Center Suite Log Management & Analysis platform

W3Perl A free logfile analyzer

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Log managing at PIC. A. Bruno Rodríguez Rodríguez. Port d informació científica Campus UAB, Bellaterra Barcelona. December 3, 2013

Using TestLogServer for Web Security Troubleshooting

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

Reliable log data transfer

Distributed Framework for Data Mining As a Service on Private Cloud

Apache Logs Viewer Manual

vcenter Chargeback User s Guide vcenter Chargeback 1.0 EN

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

Security Correlation Server Quick Installation Guide

Web Traffic Capture Butler Street, Suite 200 Pittsburgh, PA (412)

Using elasticsearch, logstash and kibana to create realtime dashboards

Discover the best keywords for your online marketing campaign

LogProcess v1.0 User Guide

Using IPM to Measure Network Performance

NEFSIS DEDICATED SERVER

EXPERT STRATEGIES FOR LOG COLLECTION, ROOT CAUSE ANALYSIS, AND COMPLIANCE

Bernd Ahlers Michael Friedrich. Log Monitoring Simplified Get the best out of Graylog2 & Icinga 2

Integration of IT-DB Monitoring tools into IT General Notification Infrastructure

Synergy Controller Cloud Storage Features and Benefits

A Comparative Study of Different Log Analyzer Tools to Analyze User Behaviors

Citrix NetScaler 10.5 Essentials for ACE Migration CNS208; 5 Days, Instructor-led

Considerations In Developing Firewall Selection Criteria. Adeptech Systems, Inc.

Volume SYSLOG JUNCTION. User s Guide. User s Guide

CNS-205 Citrix NetScaler 10 Essentials and Networking

GFI Product Manual. Administration and Configuration Manual

System Requirement Specification for A Distributed Desktop Search and Document Sharing Tool for Local Area Networks

Citrix NetScaler 10 Essentials and Networking

HP IMC User Behavior Auditor

DocuShare User Guide

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Alfresco Enterprise on Azure: Reference Architecture. September 2014

A Study of Web Traffic Analysis

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Real-Time Web Log Analysis and Hadoop for Data Analytics on Large Web Logs

Usage Analysis Tools in SharePoint Products and Technologies

Powering Monitoring Analytics with ELK stack

FileMaker Server 12. FileMaker Server Help

Lesson 7 - Website Administration

Firewall Stateful Inspection of ICMP

Advanced Install & Configuration Guide

Delivering Smart Answers!

USM Web Content Management System

EMC Data Protection Search

Centralized logging system based on WebSockets protocol

How To Use Elasticsearch

Network setup and troubleshooting

Evaluating the impact of research online with Google Analytics

Web Application Threats and Vulnerabilities Web Server Hacking and Web Application Vulnerability

Synergy Controller Cloud Storage Features and Benefits

Adobe Marketing Cloud Using FTP and sftp with the Adobe Marketing Cloud

Using SNMP to Obtain Port Counter Statistics During Live Migration of a Virtual Machine. Ronny L. Bull Project Writeup For: CS644 Clarkson University

NetIQ SecureLogin includes new features, improves usability, and resolves several previous issues.

Unless otherwise noted, all references to STRM refer to STRM, STRM Log Manager, and STRM Network Anomaly Detection.

enicq 5 System Administrator s Guide

Upgrade to Webtrends Analytics 8.7: Best Practices

StreamServe Persuasion SP4 StreamServe Connect for SAP - Business Processes

Search and Information Retrieval

The syslog-ng Premium Edition 5F2

Network Monitoring with SNMP

SonicWALL Global Management System Reporting Guide Standard Edition

Clusterpoint Network Traffic Security System. User manual

Integrity Checking and Monitoring of Files on the CASTOR Disk Servers

Click Stream Data Analysis Using Hadoop

How to Make the Client IP Address Available to the Back-end Server

Sonian Getting Started Guide October 2008

Transcription:

Geo-Identification of Web Users through Logs using ELK Stack Tarun Prakash Ms. Misha Kakkar Kritika Patel Amity University, Uttar Pradesh Assistant Professor, Amity University, Uttar Pradesh Amity University, Uttar Pradesh Email: tarun_prakash@outlook.com Email:mkakkar@amity.edu Email:kritika.abes@gmail.com Abstract With the Internet penetration rate going higher, huge amount of log files are being generated, which contains hidden information having enormous business value. To unlock the hidden returns, log management system helps in making business decisions. Although, a lot of log management exist but they either fail to scale or are costly. Here efforts have been made to solve the shortcomings of prevailing log analyzer tools and this paper demonstrates the working of ELK ecosystem i.e. Elasticsearch, Logstash and Kibana clubbed together to efficiently analyze the log files and provide an interactive and easily understandable insights. Log management systems built on ELK stack are desired to analyze large log data sets while making the whole computation process easy to monitor through an interactive interface. Being from open source community ELK stack has many useful features for log analysis. Elasticsearch is used as Indexing, storage and retrieval engine. Logstash acts as a Log input slicer and dicer and output writer while Kibana performs Data visualization using dashboards. By implementing ELK ecosystem we have efficiently geo-identify the website users traffic using logs. Keywords Elasticsearch, logstash, kibana, Geo-identification, log analysis I. INTRODUCTION Log management system deals with large volumes of computer generated log records (also called as audit records, audit trails, event-logs, etc.). The process involves log collection, centralized aggregation, long-term retention, log analysis, log searching and reporting. A stream of messages in time sequence often contains log entity. Logs are either stored on a disk or supplied as a data stream to a log collector. Logs are being generated in order to help in debugging the operations in an application. The semantics and syntax of data in log messages are usually application or system specific. Log analysis systems must decode messages within the context of an application or system in order to make meaningful comparisons for messages within different log files. As every computing device is generating logs and if only web server logs case is considered then also it amounts to a very large data set. The analysis of these logs not only helps the companies in their decision making but also in improving their products and services. However, most of the small-scale companies cannot afford these log management system. Another problem is the usability of the log management products as most of the available products require technical expertise in-order to process and interpret the logs data. In a nutshell, the three critical problem components for log analysis are Volume, Cost and Usability. Elasticsearch is capable in indexing and searching. Elasticsearch has search and list information accessible RESTfully as JSON over HTTP and can easily scale. It's under the Apache 2.0 permit and is based on top of Apache's Lucene project. Elasticsearch is a content indexing web search tool. The best representation to show Elasticsearch is the pages of a book. As we flip to the back of a book, and look for a word and afterward discover the reference page. This implies that instead of looking at the content strings straightforwardly, Elasticsearch makes an index of the content and it looks to the indexes as opposed to the contents. Thus, it is quick. Logstash is a tool for managing events and logs. It is used to collect logs, parse them, and store them for later use (like, for searching). If we store them in Elasticsearch, we can view and analyze them with Kibana. The Kibana web interface is an adjustable dashboard that we can stretch out and change to suit our surroundings. It permits tables and diagrams creation and in addition complex representations. The Kibana web interface utilizes the Apache Lucene question linguistic structure to permit us to make inquiries. We can seek any of the fields contained in a Logstash, for instance, message, syslog_program, and so on. We can utilize Boolean rationale with AND, OR, and NOT. Fig 1: ELK Ecosystem This paper demonstrates the application of ELK stack to Geo- Identify the web users through logs. As ELK stack is an open source technology, thus it doesn t involve paying high licensing fees. 978-1-4673-8203-8/16/$31.00 c 2016 IEEE 606

II. REALTED WORK Various development works has been going on since the last decade to build log analytics systems. Artyom [1] did a comprehensive analysis on popular open source log management tools such as Graylog2, ELK, and ELSA. His study was based on performance testing, which showed ELSA was fastest and was able to handle 14000 logs per second but ELK secured impact on usability factor. Christopher [2] in his blog analyzed Yahoo's historical stock data [3] using ELK stack and build the dashboard showing the volume of stocks, if Average high and Average low. Aarvik [4] build a real time data analytics tool and tested logs of his Debian Operating system. Tanmay [5] integrated apache log data with GeoCity [6] data, which transforms IP address into exact locations with attributes like longitude, altitude, city, state and country. William [7] in his blog post used Docker [8] container and deployed ELK stack in order to monitor infrastructure. Jettro [9] In his project post scanned directory structure of jpg files and extracted meta data from images, utilizing the ELK stack he build a dashboard which shows details about aperture, focal length and iso. Bogdan [10] build a video card monitoring solution using ELK stack to have the information about the performance when gaming. In a realpython [11] blog a real time twitter sentiment analysis dashboard utilizing Elasticsearch and Kibana was built. Below Table 1 shows a comparative study of GrayLog2, ELK and ELSA. Table 1: Feature Overview of Graylog2, ELK, ELSA [1] Step 2: The Logstash server after receiving the logs and process the different queues to store the data in the local folders. Step 3: Elasticsearch performs different operations (e.g. optimize the data structures, create search indexes, group information) Step 4: Kibana reads Logstash data structures and present them to the users using custom layouts, dashboards and filters. Fig 2: ELK Overview In this study centralized ELK server was set up and Logstashforwarder is used to send the logs to the ELK server. There are two classes of Logstash hosts: Hosts running the Logstash as an event forwarder that sends us application, administration, and host logs to a central Logstash server. Central Logstash hosts running some blend of storage, indexer, archiver, search and web interface programming which get, process, and store logs. A fundamental setup record for Logstash has 3 areas: 1. Input 2. Filter 3. Output Through Inputs we are passing log information to Logstash. a. file: It reads from a document on the file system, much like the UNIX charge "tail - f" b. Syslog: It listens on the port 514 for syslog messages and parses as indicated by RFC3164 group a. Lumberjack: It contains events sent in the logger convention. It is generally called as logstash-forwarder. III. METHODS AND METHODOLOGIES ELK Ecosystem Working Overview: Step 1: The Logstash-forwarder reads the local log files by the user and then log file were sent to the Logstash server (port 5000) encrypted, using the Logstash configuration file. Filters are workhorses for handling inputs in the Logstash chain. They are regularly joined with conditionals so as to perform a certain activity on an occasion, on the off chance that it coordinates specific criteria. Some valuable channels: a. Grok: It Parses discretionary content and structures it. Grok is at present the most ideal route in Logstash to parse unstructured log information into something organized and queryable. With 120 examples 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence) 607

dispatched implicit to Logstash, it is more than likely we will discover one that addresses our issues! b. Drop: The transform channel permits us to do general transformations to fields. We can rename, uproot, supplant, and alter as required. c. Geoip: Adds data about geological area of IP addresses (and presentations astounding outlines in Kibana) Outputs are the last step of the Logstash pipeline Elasticsearch: Elasticsearch is the best approach to store data in a proficient and effortlessly queryable manner. File: composes occasion information to a document on plate. Statsd: An administration that "listens for measurements, similar to counters and clocks, sent over UDP and sends totals to one or more pluggable backend administrations". IV. EXPERIMENTAL SET UP For the purpose of this study - Elasticsearch -1.6.0, Logstash- 1.5.1, Kibana-4.1.0 is used. Log File Description: 74.125.44.145 - - [11/Mar/2015:00:07:32 +0100] "GET /feed HTTP/1.1" 304 0 "" "FeedBurner/1.0 (http://www.feedburner.com)" HSI-KBW-134-3-54-231.hsi14.kabel-badenwuerttemberg.de - - [11/Mar/2015:00:13:12 +0100] "GET /favicon.ico HTTP/1.1" 200 894 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" Username: It resembles the visitor through the IP address. Visiting Path: It provides the path traversed or url being hit directly or indirectly through links by user. Path Traversed: It provide the path traversed within the website. Time Stamp: It provides the amount of time being spent on web site. It is also termed as session. Page last visited: It provides the page that was being visited by user Success rate: It provides the number of downloads being done by user. If there is any purchase being made as software or anything than it adds to the success. User Agent: It provides browser being used at client side. URL: It proves the requested resources being access by user like html page. Request Type: It provides the type of method being used like Get, Post. V. EXPERIMENTATION Access Logs were loaded into data using Logstash into Elasticsearch and then create reports in Kibana using the same index. Loading logs data into Elasticsearch using Logstash Step 1. Loading of log data into text file. Each log line looks similar to fig 3 storing information like caller IP address, the target resource, etc. Fig 3: Sample log file Logstash Configuration:To make Logstash work we need three things: Input definition from where the data comes Filters definition on how our input data will be processed Output where our processed data will be send Fig 4: Logstash Configuration Here at the input section the path specifies the location of the data file, the type is our name for the data and the start position is set to the beginning that means that to process the whole file from the start. In the filter section, first we have to process access_log type and we have two filters grok and geoip. The grok filter is responsible for parsing the access log, it takes the unstructured log from the defined input and give each log line format. The second filter is the geoip, which adds geo information to our data. For the geoip filter we are required to specify thesource property, which holds information about the field that holds the IP address we are interested in in our case it is the clientip field generated by the grok filter. In the output section, host, port and protocol are specified. Next index template is defined by using the indexproperty, which in our case will result indices like logs_2015.03.21, logs_2015.03.22 608 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence)

Step 2. Starting elasticsearch, logstash cluster from command prompt and start kibana services. To configure kibana select Use event times to create index names and Use event times to create index names, Index pattern inverval to match indices in our case daily and the Index name or pattern is set to format [logs_]yyyy.mm.dd. We also have to choose the time based field @timestamp and then hit Create.We have to select the aggregation type. As we intend to find the geo location we need to choose Geo coordinates aggregation, which will be based on the IP addresses information we configured Logstash to extract and add.in the below screen we can the Geo map which shows the Red, Yellow dots based on the number of hits from the IP address of the coloured region as follows: the Hit count of users and generates heat map which is easily understandable and offers other useful visualization schemes. Here we can see that number of hit counts of users from the various regions is being shown in different colors on a map. We can also identity the Geo Grid that has highest hit counts and lowest counts and based on that, we can interpret our user base. In this paper we performed the geo identification of users based on the access logs using open source stack ELK, we have used a small log data set just to demonstrate the usability of ELK stack but the real usefulness of ELK stack will be on Big data sets as without much of installation steps and being the product from open source community, it is cost effective and had huge active contributor base which makes it more competitive as compared to other products for log management activities. Fig 5: GeoMap with Users Fig 7: Hit Counts of Users in Geo-Map Moreover, ELK stack can be used for tracking fraudulent activities, security breaches and it looks promising for Internet of things monitoring which is the future. ELK is much effective in building real time dashboard for analyzing tweets and other analytics. VI. Fig 6: Tabular Data of Hit Counts RESULTS AND DISCUSSIONS ELK as a log management system is very useful in terms of usability as anyone can interpret the results and based on that they can have some insights. Below screen shows the Cluster health monitoring through a GUI.The figure 5 shows REFERENCES [1] Artyom Churilin, "Choosing an open-source log management system for small business," Master s Thesis, Faculty of Information Technology, Tallin University of Technology, Tallinn, Estonia. [2] Christopher (2015, April 15).Visualizing data with Elasticsearch, Logstash and Kibana[Online].Available:http://blog.webkid.io/visualizedatasets-with-elk/ [3]Yahoo Stock history data [Online].Available: http://finance.yahoo.com [4]Anders Aarvik (2014, April 04).A bit on ElasticSearch + Logstash + Kibana (The ELK stack)[online].available:http://aarvik.dk/a-bit-onelasticsearch-logstash-kibana-the-elk-stack/ [5]Tanmay Deshpande (2015,May 10).Log Analytics using Elasticsearch, Logstash and Kibana[Online].Available:http://hadooptutorials.co.in/tutorials/ elasticsearch/log-analytics-using-elasticsearch-logstashkibana.html# 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence) 609

[6]GeoLite Legacy Downloadable Database[Online].Avaialble:http://dev.maxmind.com/geoip/leg acy/geolite/ [7]William Durand (2014, December 17).Elasticsearch, Logstash & Kibana with Docker [Online].Avaialble:http://williamdurand.fr/2014/12/17/elasticse arch-logstash-kibana-with-docker/ [8]Docker [Online].Available: https://www.docker.com/ [9]Jettro Coenradie (2013, November 28).Use Kibana to analyze your images[online].avaialble:https://blog.trifork.com/2013/11/28/ use-kibana-to-analyze-your-images/ [10]Bogdan Dumitresc (2014, January 28)Using Logstash, Elasticsearch and Kibana to monitor your video card[online].avaialble:http://blog.trifork.com/2014/01/28/usin g-logstash-elasticsearch-and-kibana-to-monitor-your-videocard-a-tutorial/ [11]Real Python (2014, November 13) Twitter Sentiment - Python, Docker, Elasticsearch, Kibana [Online].Available:https://realpython.com/blog/python/twittersentiment-python-docker-elasticsearch-kibana/ [12] L.K. Joshila Grace, V. Maheswari, and Dhinaharan Nagamalai Web Log Data Analysis and Mining in Proc CCSIT-2011, Springer CCIS, Vol 133, pp 459-469, 2011. 610 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence)