Centralized and structured log file analysis with Open Source and Free Software tools


Bachelor Thesis
Summer Semester 2013
at Fachhochschule Frankfurt am Main, University of Applied Sciences
Department of Computer Science and Engineering
towards Bachelor of Science Computer Science
submitted by Jens Kühnel

Centralized and structured log file analysis with Open Source and Free Software tools

1. Supervisor: Prof. Dr. Jörg Schäfer
2. Supervisor: Prof. Dr. Matthias Schubert
topic received:
thesis delivered:

Abstract

This thesis gives an overview of the Open Source and Free Software tools available for centralized and structured log file analysis. This includes the tools to convert unstructured logs into structured logs and the different possibilities to transport these logs to a central analyzing and storage station. The different storage and analyzing tools will be introduced, as well as the different web front ends to be used by the system administrator. At the end, different tool chains that are well tested in this field will be introduced.

Revisions
Rev. 269: Official Bachelor thesis sent to FH
Rev. 273: Removal of Affidavit, fix of page number left/right

Table of Contents

1 Introduction
    Selection criteria
    Programs that are included in this thesis
    What this thesis is not covering
        Hadoop
        Programs that are not included in this thesis
    Structure of this thesis
    History of log files
2 Definitions
    Log file
    Centralized log file
    Definition structured log files
    Definition Open Source and Free Software
    Definition Log File Analysis
3 Components and Functions
    Formats
        Semi structured logs
            BSD syslog (RFC3164)
            Modern syslog (RFC 5424)
        Structured logs
            CEE
            GELF
            JSON-logstash
            Systemd journal
            Windows Event Log
            Auditlog
            Intrusion Detection Message Exchange Format (IDMEF)
            Other formats
    Collector/Shipper
        File
        Sockets, named pipes and STDIN
        Local Windows Eventlog
        Compare collector / shipper
    Transport
        Syslog
        AMQP
        STOMP
        Ømq/ZMTP
        Redis
        Lumberjack
        Remote Windows Eventlog
        Compare Transports
    Transformation/Normalization
        Pattern-DB
        Liblognorm
        Octopussy
        Grok
        Heka
        Filter_regex
        nxlog
    Storage
        Log files
        SQL
        NoSQL
        Compare Storage
    Analysis
        nxlog
        SEC
        Sagan
        Logstash and metrics
        Graylog2
    Visual output
4 Tools
    Multi purpose tools
        Syslog-ng
        Rsyslog
        Graylog2
        Logstash
        Node-Logstash
        ELSA
        octopussy
        nxlog
        Heka
    Output
        Webpage
            LogAnalyzer
            Kibana 2
            Kibana 3
        Graphs
            StatsD
            Graphite
            Fnordmetric
                Fnordmetric Classic
                Fnordmetric Enterprise
                Fnordmetric UI
    Storage
        mysql
        MongoDB
        ElasticSearch
    Transports
        redis
        rabbitmq
        ActiveMQ
        Ømq
    Collector/Shipper
        Fluentd
        flume
        awesant
        beaver
        lumberjack
        eventlog-to-syslog
        woodchuck
        ncode/logix
        syslog-shipper
        remote_syslog
        systemd/journal2gelf
    Analysis
5 Toolchains
    Possible toolchains
    Toolchain Features
        Accepting structured log files
        Reliable transport
        High availability
        User separation and LDAP
        Size of rule base
        Log Analysis
        Install
        Speed
        Summary
6 Conclusion
    Short summary about every major tool
    Future
    Optimal toolchain

Bibliography

ActiveMQ-Cluster: The Apache Software Foundation, Features > Clustering, 2011
ActiveMQ-Features: The Apache Software Foundation, Connectivity > Cross Language Clients, 2011
ActiveMQ-SSL: The Apache Software Foundation, How do I use SSL, 2011
AMQP: OASIS, AMQP A General-Purpose Middleware Standard, 2011
CBE: IBM, Understanding Common Base Events Specification V1.0.1, 2004
CEEFields: MITRE Corporation, CEE Core Field Dictionary, 2012
Churilin2013: Artyom Churilin, CHOOSING AN OPEN-SOURCE LOG MANAGEMENT SYSTEM FOR SMALL BUSINESS, 2013
Chuvakin2008: Dr. Anton A. Chuvakin, CEE Logging Standard, 2008
Chuvakin2013: Dr. Anton A. Chuvakin, Logging and Log Management, 2013
Czanik2013: Peter Czanik, PatternDB git moved and updated, 2013
ELSA-UserGuide: User Guide for ELSA, 2013
ELSAQuickstart: Martin Holste, ELSA Quickstart, 2011
FLUENTD-FAQ: unknown, FAQ, 2013
FlumeUserGuide: The Apache Software Foundation, Flume User Guide, unknown
FreeSoftware: Free Software Foundation, The Free Software Definition, 2013
GELF: Lennart Koopmann, Graylog Extended Log Format, 2011
Gerhards2007: Rainer Gerhards, why does the world need another syslogd? (aka rsyslog vs. syslog-ng), 2007
Gerhards2008: Rainer Gerhards, why you can't build a reliable TCP protocol without app-level acks..., 2008
Gerhards2011: Rainer Gerhards, Log Normalization Systems and CEE Profiles, 2011
Gerhards2011-2: Rainer Gerhards, Using rsyslog mmnormalize module effectively with Adiscon LogAnalyzer, 2011
Gerhards2013: Rainer Gerhards, How to sign log messages through signature provider Guardtime, 2013
Gerhards2013-2: Rainer Gerhards, rsyslog's first signature provider: why Guardtime?, 2013
Gheorghe2012: Radu Gheorghe, Using Elasticsearch for logs, 2012
Gilcher2012: Florian Gilcher, ElasticSearch pre-flight checklist, 2012
Guzdial1993: Mark Guzdial, Deriving Software Usage Patterns from Log Files, 1993
HekaIntro: Mozilla Foundation, Introducing Heka, 2013
Hintjens2013: Pieter Hintjens, Code Connected Volume 1 - Learning ZeroMQ, 2013
Holste2011: Martin Holste, Fighting APT with Open-source Software, Part 1: Logging, 2011
Hwang2011: Eric Hwang, Sam Rash, Data Freeway: Scaling out to Realtime, 2011
JOURNALFIELDS: Lennart Poettering, systemd.journal-fields Special journal fields, 2012
JOURNALJSON: Joe Rayhawk, Nis Martensen, Journal JSON Format, 2013
Köbschall: Dr. Gerd Köbschall, personal interview, 2013
Malpass2011: Ian Malpass, Measure Anything, Measure Everything, 2011
MSEventLog: Microsoft, MSDN Event Logging, 2013
MSEVENTSCHEMA: unknown / Microsoft, Windows Event Schema, 2013
nxlog: Botond Botyanszki, NXLOG Community Edition Reference Manual, 2009
nxlog-var-warning: Botond Botyanszki, NXLOG Community Edition Reference Manual, 2009
OctopussyInstallation: unknown, Octopussy Installation
Ømq: Pieter Hintjens, ØMQ - The Guide, 2013
OpenSource: Open Source Initiative, The Open Source Definition, unknown
OSArch2012: Amy Brown, Greg Wilson, The Architecture Of Open Source Applications, 2012
Poettering2012: Lennart Poettering, Forward Secure Sealing (FSS) is finally coming to +systemd's journal., 2012
RabbitMQ: GoPivotal, Inc., What can RabbitMQ do for you?, unknown
RabbitMQ-SSL: GoPivotal, Inc., SSL Support, unknown
Redis: unknown, Introduction to Redis, unknown
redis-security: unknown, Redis Security, unknown
RELP: Rainer Gerhards, RELP - The Reliable Event Logging Protocol, 2008
RFC3164: C. Lonvick, RFC3164: The BSD syslog Protocol, 2001
RFC3339: G. Klyne, C. Newman, Date and Time on the Internet: Timestamps, 2002
RFC4627: D. Crockford, The application/json Media Type for JavaScript Object Notation (JSON), 2006
RFC4765: H. Debar, D. Curry, B. Feinstein, The Intrusion Detection Message Exchange Format (IDMEF), 2007
RFC5424: R. Gerhards, RFC5424: The Syslog Protocol, 2009
RFC5426: A. Okmianski, Transmission of Syslog Messages over UDP, 2009
SEC: Dr. Risto Vaarandi, SEC - simple event correlator, 2013
Seguin2013: Karl Seguin, The Little Redis Book, 2013
Shao2011: Zheng Shao, Real-time Analytics at Facebook, 2011
Sissel2012: Jordan Sissel, Proposal: new logstash event schema, 2012
Sissel2013: Jordan Sissel, lumberjack
Sissel2013-2: Jordan Sissel, MITRE's CEE is a failure for profit., 2013
SLES2013: SuSE Enterprise Team, Release Notes for SUSE Linux Enterprise Server 11 Service Pack 2, 2013
STOMP: unknown, STOMP Protocol Specification, Version 1.2, 2012
Turnbull2013: James Turnbull, The Logstash Book, 2013
ULM: J. Abela, T. Debeaupuis, Universal Format for Logger Messages, 1999
Vaarandi2012: Dr. Risto Vaarandi, Dr. Michael R. Grimaila, Security Event Processing with Simple Event Correlator, 2012
Valdman2001: Jan Valdman, Log File Analysis, 2001
XML Format: Extensible Markup Language (XML), 2013
ZMTP: iMatix, 15/ZMTP - ZeroMQ Message Transport Protocol, 2012
ZMTP-CURVE: Pieter Hintjens, 26/CurveZMQ Authentication and Encryption Protocol, 2013

Illustration Index

Illustration 1: Log Infrastructure
Illustration 2: Octopussy rule creation
Illustration 3: rsyslog in/out plugins
Illustration 4: Graylog2 web page
Illustration 5: Logstash web page
Illustration 6: ELSA web page
Illustration 7: Octopussy home page
Illustration 8: LogAnalyzer web page
Illustration 9: Kibana 2 web page
Illustration 10: Kibana 3 web page
Illustration 11: Elasticsearch with HEAD plugin
Illustration 12: Possible toolchains (red=storage, yellow=normalize, white=webpages, blue=shipper)

Index of Tables

Table 1: Tools used in this thesis
Table 2: Tools not used in this thesis
Table 3: collector/shipper Overview
Table 4: Transport Overview
Table 5: Feature: accepting structured log files
Table 6: Feature: reliable transport
Table 7: Feature: high availability
Table 8: Feature: User separation and LDAP
Table 9: Feature: size of rule base
Table 10: Feature: log analysis
Table 11: Feature: easy to install
Table 12: Feature: overview
Table 13: Total Overview: Part
Table 14: Total Overview: Part


1 Introduction

Log files are a central part of the work of a system administrator. Whenever something goes wrong, the first look is normally into one log file or another. For something so fundamental to the proper working and management of a network of computers, it is fascinating how few tools are available and how unstructured log files really are. For a long time the only practically useful "de facto standard" was [RFC3164], which was written to describe the syslog protocol that was and is used in different Unix systems. It offers UDP transmission to a central log file server. This syslog server only sends unstructured or semi-structured messages.

In the last couple of years a strong movement came up to put log files onto a more structured path. It also has a strong emphasis on modern tools like NoSQL and JSON. This thesis gives an overview of the current state of structured log files. It shows the different formats that are used today, which tools can help with the creation of a structured log infrastructure, and what is still missing or can be improved.

1.1 Selection criteria

This thesis aims to give an overview of the available Open Source and Free Software tools, using the following criteria for selecting the right tools. The criteria are based on the requirements of multiple companies the author has worked for in the past and present. The selection is aimed at medium-sized and large companies.

- Combines data from many sources and different formats. The enormous number of log file analyzers that can only analyze one log file format is ignored in this thesis. This includes tools like awstats, analog etc. that only analyze apache logs.
- Easy to use for average system administrators (Windows and Linux). This necessitates usable documentation.
- Has to be structured to ease analysis.
- Generates structured log files out of unstructured text log files, for an easy transition from unstructured to structured logs.
- Secure and reliable transport (lost messages are to be avoided, but message delivery does not need to be guaranteed).
- Data should be stored and processed redundantly to avoid a single point of failure.
- Fast enough to get the data from thousands of machines on a "normal" server (8-16 cores, 16-64 GB RAM).
- Contains only Open Source and Free Software that can run under the direct control of the user / administrator. All parts of the system must conform to this rule. Only with Open Source and Free Software is it possible to audit the software in use to check for compliance with existing regulations. Log files can contain personal information and should not be stored in the cloud for privacy reasons.

- This thesis does not include OpenCore Software. OpenCore Software is software that is offered as Free and Open Source Software, with special features only available in a closed source version, often called "Pro" or "Enterprise". The missing features can make the free version impossible to work with. Encryption in particular is not an optional feature. An OpenCore Software product will not accept a new feature like encryption, because it would hurt the sales of the closed software product. A real Open Source project will gladly accept code donations to support a new feature like encryption.
- The tools have to analyze the data "on the fly". An analysis during the night in a batch job is too slow in a modern IT world.
- Active development is necessary for all parts of the system. There are a lot of dead Open Source programs available, but without active development no new features and bug fixes will be available. Of course anyone could take the code and continue to develop it, but the sheer number of such projects does not allow including them in this thesis. A project is considered dead in this thesis if it has had no release in two years, or no commit to the code management in one year.

1.2 Programs that are included in this thesis

Program               Language       License               Function
logstash              Java/Ruby      Apache 2.0            TCNASO
graylog2              Java/Ruby      GPLv3                 ASO
rsyslog               C              GPLv3/LGPL            TCN
syslog-ng             C              LGPLv2.1/GPLv2        TCN
node-logstash         Javascript     Apache 2.0            TCNO
octopussy             perl           GPLv2/GPLv3           NSAO
nxlog                 C              GPLv2/LGPLv2          CAN
Heka                  Go             MPL v2.0              CN
woodchuck             Ruby           MIT                   C
awesant               perl           GPL                   C
beaver                Python         MIT                   C
lumberjack            Ruby/(C go)    Apache 2.0            C
syslog-shipper        Ruby           BSD                   C
remote_syslog         Ruby           BSD                   C
fluentd               Ruby           Apache 2.0            C
flume                 Java           Apache 2.0            C
ncode/logix           Python         Apache 2.0            C
systemd/journal2gelf  Python         BSD                   C
eventlog-to-syslog    C++            BSD                   C
elasticsearch         Java           Apache 2.0            S
mongodb               c              AGPLv3 / Apache 2.0   S
redis                 C              BSD                   T
rabbitmq              Erlang         MPL v1.1              T
activemq              Java           Apache 2.0            T
0MQ                   C              LGPLv3+               T
SEC                   perl           GPLv2                 A
Sagan                 C              GPLv2                 A
StatsD                Javascript     MIT                   O
Graphite              Python         Apache 2.0            O
Fnordmetric           Java/Ruby      MIT                   O
Kibana2               Ruby           BSD                   O
Kibana3               Javascript     Apache 2.0            O
LogAnalyzer           PHP            GPLv3                 O

Table 1: Tools used in this thesis

About column "Function": T=Transport, C=Collector, N=Normalization, A=Analyze, S=Storage, O=Output

About column "License": The author is not a lawyer and the licenses shown here are the ones shown on the website, README or LICENSE file. I did not check every file and this is not a license analysis.

1.3 What this thesis is not covering

This thesis will not cover monitoring for availability or performance. The chance that a process dies without an error message that could be analyzed is much too high. For that kind of monitoring I suggest tools like the Open Monitoring Distribution (OMD), nagios or Zabbix.

Also not included in this thesis is compliance with different laws and regulations, like PCI DSS, FISMA, HIPAA, or best practice frameworks such as ISO 27000 and COBIT. For more information see Chapter 19 of [Chuvakin2013]. Compliance with data protection laws is not covered either, even though tools for data anonymization are sometimes shown.

Tools that are only designed to work with Intrusion Detection Systems and use log file analysis only as a small part of a larger design are not described here. For this reason the tools OSSIM and ossec.net are not included.

Hadoop

The Hadoop ecosystem with the Hadoop Distributed File System (HDFS) is the reference for Open Source big data management. HDFS is a distributed and scalable filesystem based on the Hadoop infrastructure. It allows the MapReduce mechanism to bring computation and data storage close together, often on the same machine. To analyze, query and summarize the data, the Hive data warehouse and the HBase non-relational database can be used. Hadoop is an Apache project and uses the Java platform. Access to the data is available from different programming languages. The setup and programming are quite complicated compared to the other solutions covered in this thesis, but Hadoop can handle much bigger datasets of multiple petabytes. It is possible to use Hadoop to create a log analyzing infrastructure, but building a Hadoop infrastructure simply for log file analysis is oversized.

There is a Hadoop subproject named chukwa that creates a log analyzing platform on top of Hadoop, but this project is almost dead or dying, with no mail on the mailing list for 6 months and only some one- and two-line bug fixes from a single developer. This does not meet the definition of a dead project, but nevertheless it will not be included in this thesis, because of the necessity for Hadoop. Scribe, a tool that runs on top of Hadoop, was used by Facebook to analyze log files, but this project was apparently abandoned by Facebook and replaced by a closed source tool called Calligraphus [Hwang2011] [Shao2011].

Therefore Hadoop based solutions will not be covered in this thesis, including Hive, HBase, Hadoop Distributed Filesystem, Thrift, Avro, OpenTSDB and pig. The Apache project flume is included because it can write not only to Hadoop, but also to other data storages.

Programs that are not included in this thesis

The following programs are not included in this thesis.

Program        Reason for exclusion
splunk         Not Open Source/Free Software
loggly         cloud based, not Open Source/Free Software
ntsyslog       dead project, last release 2007
OSSIM          log management only in Closed Source version, open core
Sguil          dead project, last release 2011
logsurfer      dead project, last release 2011
scribe         dead project, abandoned by facebook
loghound       dead project, last release 2004
Snare          OpenCore, encryption only in closed source version
Project Lasso  dead project, last release 2008, creator under new ownership
Riemann        only taking logs created by own log library
gelfino        dead project, last commit 2012
ossec.net      log management only a very small part, no structured log files
Hadoop Tools   See chapter Hadoop
chukwa         See chapter Hadoop
scribe         See chapter Hadoop
Thrift         See chapter Hadoop
Avro           See chapter Hadoop
OpenTSDB       See chapter Hadoop
dendrite       No usable documentation, this project was only active for 3 weeks, with 8 commits all together
loges          No usable documentation
logtail        No license attached
               dead project, last commit Feb. 2012
Graphtastic    No license, no commit in over a year

Table 2: Tools not used in this thesis

1.4 Structure of this thesis

This thesis is divided into 6 main chapters. The first chapter, Introduction, is the current chapter. It contains the selection criteria that have been used to select the programs and a short history of log files. The chapter Definitions defines the basics necessary to understand the rest of the thesis. The chapter Components and Functions shows the different parts that are needed for a centralized and structured log file analysis. The chapter Tools shows the different tools that are available in the Free Software and Open Source world and the functionalities they offer. In the chapter Toolchains the different tools are put together to create some examples of a centralized and structured log analysis system. The chapter Conclusion closes the thesis with a conclusion and an attempt to get a glimpse into the future of structured log file analysis.

1.5 History of log files

Long before computers, parish registers, in which all baptisms, marriages and burials are recorded, could be considered one of the first log files. A lot of similarities can be found here: one line per entry, append only and a semi-structured format. The name log comes from the nautical log, a device that was used to measure the speed of a boat. The measurements were written into a log book to get an overview of the progress of the journey.

In computer science, the first computers used small light bulbs to show the status of the machine, and a good operator could look at the "blinkenlights" and know the problem. When hard copy terminals were widespread, the first terminal, called the console, was used to print out the status messages of the system. When hard discs were introduced, log files came into existence. According to Dr. Gerd [Köbschall], Head of Department for Cash & Derivatives IT Operations at Deutsche Börse AG, Eschborn: "I was able to detect that the machine crashed on the changed blinking rhythm on a HP. When I was working on a Control Data Corporation 1700 (created 1966) it used a teletype as a console and normally it would ring a bell when a crash occurred. Later in 1977 when OpenVMS was developed, it wrote the status messages not only to the hard copy terminal, but to the OpenVMS operator log and still does that on the current OpenVMS machines that run the Xetra systems, powering the Frankfurt Stock Exchange".

One of the first standardizations in computer log files was the creation of syslog in 1980 by Eric Allman [OSArch2012]. Initially created as a log mechanism for sendmail, it was later introduced into the BSD distribution and became the de facto standard of logging in all Unix systems. But not all Unix programs write log messages with syslog. On a modern Linux system apache, exim and samba are three examples that write their own files by default, primarily for performance reasons. A lot of services still use syslog, including most mail servers, cron, pam, inetd and ntp.

The first successful large scale introduction of structured log files was the development of the Windows Eventlog with Windows NT 3.1. The Windows Eventlog is based on a binary format, but can be queried with a .NET-based interface [MSEventLog]. In Windows Vista and Windows Server 2008 the Windows Event Logging API was replaced by the Windows Event Log API, extending the possibilities of the API.

2 Definitions

2.1 Log file

The definition of a log file from Mark Guzdial [Guzdial1993], "discrete recordings of user actions during software use", is not universal enough, because it does not fit most log files on Unix or Windows based systems, where hardware and system messages are also stored in log files. Jan Valdman in [Valdman2001] uses a much wider definition: "Current software application often produce (or can be configured to produce) some auxiliary text files known as log files". This is better suited to what an average system administrator will understand by a log file, but is very vague about what is really stored in log files. Dr. Chuvakin in [Chuvakin2013] on page 2 defines it as: "A log messages is what a... device... generates in response to some sort of stimuli". The same author used a different definition in the company presentation for LogLogic in 2008 [Chuvakin2008]: "Log = message generated by an IT system to record whatever event happening". The logstash developer Jordan Sissel defines a log message as "timestamp plus data" in [Sissel2012]. This may be a simplified view, but for the development of a log file analysis tool it is the only thing that is consistent across all log files.

My definition is more developer-oriented, because very often the reason for writing a log message is not understandable from the outside: "A log file contains the information the developer of an application thought to be helpful and interesting in the current state of the software, together with the timestamp when this state occurred."

Log data, log entries and log messages are all different names for the content of a log file. Most log entries are contained in one line, but that is not true for all log entries, for example Python or Java stack traces.

2.2 Centralized log file

The definition of log file contains the word "file". For most computer users this implies a normal file on the filesystem. A program that writes to a normal log file should only append data to the log file and never change it after it is written. Security extensions like SELinux or the Windows file permissions allow enforcing the limitation to append to a log file only.

A long standing tradition in Unix networks is syslog. Syslog was written in 1980 by Eric Allman as a log mechanism for the famous sendmail program [OSArch2012]. An early add-on made it possible to send log information to a central syslog server to have a centralized view of all information.

"Centralized" in this thesis is defined as a way to see and query all log files of a defined group of machines (normally all machines managed by a group of people) in one webpage, database or filesystem. The log files do not need to and should not be stored on a single machine. Instead the data should be stored on multiple machines for redundancy reasons, but the data should be the same on all machines.

2.3 Definition structured log files

Eric Allman in [OSArch2012] also wrote that he thinks syslog was very well designed, but specified that the only thing he should have changed was: "I would pay more attention to making the syntax of logged messages machine parseable - essentially, I failed to predict the existence of log monitoring."

The log messages are sent and stored in different log formats. As Dr. Chuvakin wrote in a presentation in 2008 [Chuvakin2008]: "log format=layout of the log messages in the form of fields, separators, delimiters, tag etc."

The log file formats can be sorted into four categories as defined by R. Gerhards in [Gerhards2011]:

- "Semi-structured": In this form the log entries are still largely free form, with some structures already included, but there is "no clear distinction between field values, field delimiters and noise data".
- "Weakly structured": In this form the log entries are "like CSV-based formats, needs external information to understand fields".
- "Strongly structured or full structured": In a strongly structured log file all information is stored in a structure of the format and "structured data, field names and values are provided". Possible formats are XML and JSON among others.
- "Strongly structured based on a validating profile": Even a strongly structured file format can be refined when the "field names, values and semantics are provided" and can be checked.

Typical problems with structured logs without validation profiles are for example:

- undefined field names: "ip" or "ip-address" or "ipaddress"
- format: date as defined in rfc3164 (May 23 21:22:23) or in rfc3339/iso8601 ( T21:22: :00)
- datatype: host containing only IP addresses or hostname or both

There should be a fifth category that was not defined by R. Gerhards: "unstructured" log files. Such unstructured log files are written by the Linux and the Darwin kernel, for example. Here no structure whatsoever is found. The different formats that are in use in the field will be described later in this thesis.

2.4 Definition Open Source and Free Software

This thesis will only look at tools that are both Open Source Software as defined by the Open Source Initiative [OpenSource] and Free Software as defined by the Free Software Foundation [FreeSoftware]. The reason for that is not only price, as is often perceived by managers and accountants, but the possibility to extend the software as the user sees fit. It also avoids the pitfalls of proprietary software, like the discontinuation of software products by the manufacturer, which forces a replacement of the software because of missing support/licenses, or license restrictions based on log sizes.

2.5 Definition Log File Analysis

Log file analysis is the process of extracting usable information from the log files. This can be statistical analysis, like the percentage of error messages per host, how many mails were sent or how often someone tried to guess a password via ssh. Dependency or correlation analysis could also be done, for example detecting that a user tried to log in two times unsuccessfully and the third attempt was successful. A third possibility is a Bayesian analysis, to detect unusual error messages.
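The first two kinds of analysis can be illustrated with a minimal Python sketch. The event list, the field names (host, user, status) and both function names are assumptions made up for this illustration; they are not taken from any of the tools discussed in this thesis.

```python
from collections import defaultdict

# Hypothetical, already structured login events in chronological order.
EVENTS = [
    {"host": "web1", "user": "alice", "status": "failure"},
    {"host": "web1", "user": "alice", "status": "failure"},
    {"host": "web2", "user": "bob", "status": "success"},
    {"host": "web1", "user": "alice", "status": "success"},
    {"host": "web2", "user": "bob", "status": "failure"},
]


def failure_percentage_per_host(events):
    """Statistical analysis: share of failed events per host, in percent."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for event in events:
        total[event["host"]] += 1
        if event["status"] == "failure":
            failed[event["host"]] += 1
    return {host: 100.0 * failed[host] / total[host] for host in total}


def success_after_failures(events, threshold=2):
    """Correlation analysis: users who succeed directly after at least
    `threshold` consecutive failed attempts."""
    streak = defaultdict(int)  # consecutive failures per user
    flagged = []
    for event in events:
        user = event["user"]
        if event["status"] == "failure":
            streak[user] += 1
        else:
            if streak[user] >= threshold:
                flagged.append(user)
            streak[user] = 0
    return flagged


if __name__ == "__main__":
    print(failure_percentage_per_host(EVENTS))
    print(success_after_failures(EVENTS))
```

Both functions only work because the events are already structured; on free-text log lines the same questions would first require a parsing step.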

3 Components and Functions

To create a "centralized and structured log file analysis with Free Software and Open Source tools" a lot of programs have to interact. In this chapter the different components that are necessary will be explained.

Illustration 1: Log Infrastructure

Illustration 1 shows the different ways a log message can travel from the program that created it to the Storage and Visual Output. The first step is the creation of a log message by the program, also known as the log source. The log message can be created in a lot of different formats; the chapter Formats introduces the most common ones. The chapter Transport introduces the different mechanisms to send structured and unstructured messages. The most commonly used tool for storing and transporting log messages on Unix based systems is syslog. It is both a log format and a transport format in one.

Not all programs can send their log data using syslog. Some are only able to write into a log file on the filesystem. The Collector/Shipper reads the data from disc and forwards it to the transport or directly to the central hub. The Transformation/Normalization step takes the log data that is not in a predefined structure and converts it into the requested structure. This can be done centrally, or on the same machine where the log source is running. The Analysis phase tries to find problems, attacks and other abnormalities inside the log data. The Storage saves the log data and normally indexes it to speed up searches. This can be done by traditional SQL systems as well as NoSQL systems. The Visual output includes graphs, but also webpages to simplify the searching and analyzing of the data.
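As a rough illustration of the Transformation/Normalization step, the following Python sketch turns one unstructured text line into a structured event and serializes it as JSON for the transport. The log line layout, the regular expression and the field names are assumptions chosen for this example and do not reproduce the rule syntax of any tool covered later.

```python
import json
import re

# Hypothetical pattern for one kind of unstructured message.
PATTERN = re.compile(
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+) port (?P<port>\d+)"
)


def normalize(raw_line, host):
    """Convert a raw log line into a structured event (a dict).

    Lines that do not match the pattern are passed through with only
    the host and the original message, so nothing is lost.
    """
    match = PATTERN.search(raw_line)
    if match is None:
        return {"host": host, "msg": raw_line}
    event = {"host": host, "msg": raw_line, "action": "login", "status": "failure"}
    event.update(match.groupdict())
    event["port"] = int(event["port"])  # store the port as a number, not text
    return event


def ship(event):
    """Serialize the structured event for transport to the central hub."""
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    line = "Failed password for root from 192.0.2.10 port 22 ssh2"
    print(ship(normalize(line, "web1")))
```

Keeping the original line in the msg field is a design choice of this sketch: the structured fields make searching easy, while the raw text stays available if the pattern turns out to be wrong.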

3.1 Formats

The main problem of log files, besides the huge amount of log data created by all the different programs in a company, is the many different log formats. The huge number of unstructured log files can only be handled individually, but the semi-structured and structured log files make it possible to create rules and transformations to convert them all into a unified format. Most of the structured log files are based on one of the existing file formats JSON or XML. The [XML Format] was created by the World Wide Web Consortium, but is often considered to be too complex and hard to read for humans. The JSON format was created by Douglas Crockford and is standardized in [RFC4627]. It is a very simple format that is easily read.

Semi structured logs

Of the huge number of semi-structured logs that exist in the field, only two are really defined and used. They are shown in the next sections.

BSD syslog (RFC3164)

The BSD (sometimes called traditional) syslog format was documented in [RFC3164] after being used for several years. It already showed a very simple semi-structure. It contains the following fields in the header:

- PRI, which contains Facility (0-23) and Severity (0 emergency to 7 debug)
- Timestamp in the format Mmm dd hh:mm:ss (Aug 12 23:12:14)
- hostname or IP address

RFC3164 has some very big limitations: the log entry is limited to 1024 characters, and the transport is defined only for UDP and is therefore not reliable.

Modern syslog (RFC 5424)

These limitations of the traditional syslog were removed during the standardization of a new syslog format in [RFC5424], written in 2009 by R. Gerhards, developer of rsyslog. It is now used by all current syslog implementations like rsyslog and syslog-ng. The changes include removal of the 1024 byte limitation and a switch in the time format to the [RFC3339] standard, itself a sub-version of ISO 8601. The RFC5424 time format looks like this: T21:23:20.43Z or T23:23: :00. The first is written in UTC, the second is written in local timezone, but both define the same time.
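Two details of these header fields can be checked with a short Python sketch. The PRI value combines both numbers as facility * 8 + severity, so it can be split again with an integer division. The two timestamps in the usage part are hypothetical values chosen for this illustration, and the helper names are assumptions as well.

```python
from datetime import datetime

SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity).

    PRI is facility * 8 + severity, so with facility 0-23 and
    severity 0-7 the valid range is 0 to 191.
    """
    if not 0 <= pri <= 191:
        raise ValueError("PRI out of range: %r" % pri)
    return divmod(pri, 8)


def parse_rfc3339(text):
    """Parse an RFC 3339 timestamp into a timezone-aware datetime.

    datetime.fromisoformat() only accepts a trailing "Z" from
    Python 3.11 on, so it is rewritten as an explicit UTC offset.
    """
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    facility, severity = decode_pri(34)
    print(facility, SEVERITIES[severity])

    # Hypothetical example values: one instant, written in UTC and in UTC+2.
    utc = parse_rfc3339("2013-05-23T21:23:20.430Z")
    local = parse_rfc3339("2013-05-23T23:23:20.430+02:00")
    print(utc == local)
```

Because both parsed values carry their UTC offset, comparing them compares the actual instants and not the text representation.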
Also notice the use of yeas and sub-second pecession anothe nuisance of the old syslog fomat. RFC5424 also added suppot fo IPv6 addesses, fo UTF-8, a equied suppot fo TLS and suppot fo additional fields like the folloing: 12 Vesion=1 (RFC5425) App-Name

23 PROCID MSGID othe stuctued data fo example: oigin ip addess entepiseid (simila to SNMP) softae softae vesion meta sequenceid sysuptime With RFC5424 the possibility of a stuctue inside syslog as added. Since the ceation in 2009 a lot of implementations ae aleady available that suppot RC5424. But it still is idely used to tanspot unstuctued log data, because it does not equie stuctued log data Stuctued logs The Semi-stuctued log files fom RFC5424 ee only consideed to be the fist step. Thee ae some othe log file stuctues available. Most of the stuctued logs also contain the field definitions CEE The Common Event Expession (CEE) is a standadization effot stated in 2007 and lead by the MITRE Coopeation, a not-fo-pofit oganization that is ell knon in the IT industy fo the Common Vulneabilities and Exposues (CVE) numbes. The poject ceated not only a standadization document that defines a log fomat (called CEE Pofile) hich is encoding neutal (called CEE Log syntax), but also a CEE Log Tanspot. Vesion 1.0-beta1 specified an event fomat that could be encoded in JSON o XML files. The encoding fomat is intechangeable, because the field names togethe ith file types have been standadized in [CEEFields]. Thee fields ae equied to be pesent: host (hostname o IP addess), pname (pocess name) and time. To integate CEE into an existing syslog infastuctue a special heade "@cee:" as ceated that is used as a pefix in font of a valid JSON message. With this pefix it is possible to send CEE messages via BSD (RFC3164) and moden syslog(rfc5425) infastuctues. Poject Lumbejack is an Open Souce poject to implement CEE into Open Souce poducts. Thee is also a pogam ith the same name. To diffeentiate the names "Poject Lumbejack" and "Lumbejack" ae used. Poject Lumbejack is suppoted by both moden syslog implementations syslog and syslog-ng. Togethe ith Red Hat they have spaned the pojects "ceelog utils" and "libumbelog" to ease the implementation in Open Souce pojects. 
Both rsyslog and syslog-ng are ready for CEE in their current versions, but beyond that there is almost no usage of CEE in the Open Source world.
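The "@cee:" prefix convention described above can be illustrated with a minimal Python sketch. The function names and the sample values are assumptions made for this illustration only; the sketch merely places the three required fields (host, pname, time) into a JSON object and puts the "@cee:" header in front of it, so that the result can travel as the message part of an ordinary syslog line.

```python
import json

# The header that marks the rest of the syslog message as CEE/JSON.
CEE_COOKIE = "@cee:"


def to_cee_payload(host, pname, timestamp, **extra):
    """Build a syslog message body carrying a CEE event (illustrative helper)."""
    # host, pname and time are the three fields the text names as required.
    event = {"host": host, "pname": pname, "time": timestamp}
    event.update(extra)
    return CEE_COOKIE + " " + json.dumps(event, sort_keys=True)


def from_cee_payload(payload):
    """Return the decoded event, or None for a plain unstructured message."""
    if not payload.startswith(CEE_COOKIE):
        return None
    return json.loads(payload[len(CEE_COOKIE):])


if __name__ == "__main__":
    body = to_cee_payload("system.example.com", "auth",
                          "2013-01-01T12:00:00+00:00", action="login")
    print(body)
    print(from_cee_payload(body))
```

A receiver that does not know the header simply sees an unstructured text message, which is why the convention works on existing syslog infrastructures.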

    {
      "host":"system.example.com",
      "pid":123,
      "time":" t12:38: :00",
      "msgid":"abc",
      "msg":"my event message",
      "app":"application",
      "pname":"auth",
      "sev":10,
      "action":"login",
      "status":"success"
    }

Text 1: CEE Example log entry

All this effort looked really promising until May 2013, when MITRE lost funding from the US government and stopped the standardization project and the mailing list. But even before this, there were already some strong opposing voices. Among others the logstash developer Jordan Sissel wrote in [Sissel2013-2] that the specifications left too many options to choose from, so that it is possible to create tools that are CEE compliant but cannot talk with each other, for example when one enforces JSON and the other enforces XML.

GELF

[GELF] is the Graylog Extended Log Format that was created in 2010 as the native format of the graylog2 server. GELF is not a real standard, but simply the format that graylog2 specified. GELF is not only a log format, but also a transport protocol. The definition of the GELF format is stored in the GIT repository of graylog2 and can therefore be changed by the graylog2 developers without notice. Like CEE, the format uses JSON, but it also specifies that the JSON must be compressed with zlib or gzip. The following fields are specified in [GELF]:

- version: GELF spec version "1.0" (string); MUST be set by client library.
- host: the name of the host or application that sent this message (string); MUST be set by client library.
- short_message: a short descriptive message (string); MUST be set by client library.
- full_message: a long message that can e.g. contain a backtrace and environment variables (string); optional.
- timestamp: UNIX microsecond timestamp (decimal); SHOULD be set by client library.
- level: the level equal to the standard syslog levels (decimal); optional, default is 1 (ALERT).
- facility: (string or decimal) optional, MUST be set by server to GELF if empty.
- line: the line in a file that caused the error (decimal); optional.
- file: the file (with path if you want) that caused the error (string); optional.
- _[additional field]: every other field you send and prefix with a _ (underscore) will be treated as an additional field.

    {
      "version": "1.0",
      "host": "www1",
      "short_message": "Short message",
      "full_message": "Backtrace here\n\nmore stuff",
      "timestamp": ,
      "level": 1,
      "facility": "payment-backend",
      "file": "/var/www/somefile.rb",
      "line": 356,
      "_user_id": 42,
      "_something_else": "foo"
    }

Text 2: GELF message

The GELF format is quite widely used, with support not only in graylog, but also in logstash, nxlog and others.

JSON-logstash

The logstash project, led by programmer Jordan Sissel, created its own log format. This format is not really specified, but it is also used by other programs like fluentd and flume. In this format there are six fields that are all required, and specific extensions by the different applications are added into the "fields" field.

    {
      => "pork.example.com",
      => "apache",
      => [],
      => {
        "client" => " ",
        "duration_usec" => 240,
        "status" => 404,
        "request" => "/favicon.ico",
        "method" => "GET",
        "referer" => "-"
      },
      "@timestamp" => " T14:53: "
    }

Text 3: logstash JSON format

Systemd journal

Systemd is a new init system that is widely used on new Linux distributions. It is the default init system for Fedora, Mandriva, OpenSuSE and many others. One part of this system is a new log service named journal which started in . The journal was again an attempt to create a new default structured log format for Linux. The fields in the journal are separated into 3 different kinds [JOURNALFIELDS]:

- Address fields (__ prefix, double underline): Address fields are only usable inside the journal and should not be used outside.
- Trusted fields (_ prefix, single underline): Trusted fields are implicitly added by the journal and cannot be set by the log client. This includes the _PID, _UID and _EXE fields.
- User fields (no prefix): All other fields are user fields and can be specified by every log client itself. There are some predefined fields: MESSAGE, PRIORITY, ERRNO, CODE_FILE, CODE_LINE, CODE_FUNC, SYSLOG_FACILITY, SYSLOG_IDENTIFIER and SYSLOG_PID.

A very important field is MESSAGE_ID. This is a UUID field that makes it possible to assign a unique identifier to every kind of log message. All messages that are generated at the same state by the same program should have the same message ID.

The journal internally uses a binary format, but the journalctl command can export the log entries into different formats, including JSON [JOURNALJSON] and an export format. The export format can be used to send journal entries across the network and looks like a list of environment variables. The journal format allows the same field to be used multiple times inside one entry; this is mapped into a JSON array.
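The mapping of repeated fields into an array can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a description of journalctl itself: it only handles the plain-text KEY=value lines of the export format, assumes that entries are separated by an empty line, and skips the separate encoding used for binary-safe fields. The function name and the sample data are made up for this example.

```python
def parse_journal_export(text):
    """Parse text-only journal export data into a list of dicts.

    A field that occurs more than once inside one entry is collected
    into a list, mirroring the JSON array mapping described above.
    """
    entries = []
    current = {}
    for line in text.splitlines():
        if not line:
            # An empty line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # Binary-safe fields are encoded differently; skipped in this sketch.
            continue
        if key in current:
            existing = current[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                current[key] = [existing, value]
        else:
            current[key] = value
    if current:
        entries.append(current)
    return entries


if __name__ == "__main__":
    sample = "MESSAGE=first\n_PID=1\nTAG=a\nTAG=b\n\nMESSAGE=second\n"
    for entry in parse_journal_export(sample):
        print(entry)
```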

The systemd journal is very deeply integrated within systemd and is only available on Linux. To support other operating systems, rsyslog author Rainer Gerhards created the library liblogging as a journal replacement library that is available on all operating systems.

    {
      "_SERVICE":"systemd-logind.service",
      "MESSAGE":"User harald logged in",
      "MESSAGE_ID":"422bc3d271414bc8bc9570f222f24a9",
      "_EXE":"/lib/systemd/systemd-logind",
      "_COMM":"systemd-logind",
      "_CMDLINE":"/lib/systemd/systemd-logind",
      "_PID":"4711",
      "_UID":"0",
      "_GID":"0",
      "_SYSTEMD_CGROUP":"/system/systemd-logind.service",
      "_CGROUPS":"cpu:/system/systemd-logind.service",
      "PRIORITY":"6",
      "_BOOT_ID":"422bc3d271414bc8bc95870f222f24a9",
      "_MACHINE_ID":"c686f3b205dd48e0b43ceb6eda479721",
      "_HOSTNAME":"waldi",
      "slogin_user":"500"
    }

Text 4: systemd journal log entry in JSON-pretty

Windows Event Log

From NT 3.5 until Windows XP and Windows Server 2003, Windows used the Windows Event Log, internally called Event Tracing for Windows. This was replaced by a new version called Windows Eventing with Windows Vista and Windows Server. Windows Eventing entries can be exported and displayed as XML, or viewed with the Windows Event logs. The Event Schema XML Schema Definition is only available in the Windows SDK, but a textual description is available in [MSEVENTSCHEMA].

    <Event xmlns="
      <System>
        <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{ A5BA-3E3B0328C30D}" />
        <EventID>4672</EventID>
        <Version>0</Version>
        <Level>0</Level>
        <Task>12548</Task>
        <Opcode>0</Opcode>
        <Keywords>0x </Keywords>
        <TimeCreated SystemTime=" T06:51: Z" />
        <EventRecordID>2341</EventRecordID>
        <Correlation />
        <Execution ProcessID="516" ThreadID="480" />
        <Channel>Security</Channel>
        <Computer>WIN-M5PCUTLMBMT</Computer>
        <Security />
      </System>
      <EventData>
        <Data Name="SubjectUserSid">S </Data>
        <Data Name="SubjectUserName">SYSTEM</Data>
        <Data Name="SubjectDomainName">NT AUTHORITY</Data>
        <Data Name="SubjectLogonId">0x3e7</Data>
        <Data Name="PrivilegeList">SeAssignPrimaryTokenPrivilege SeTcbPrivilege SeSecurityPrivilege SeTakeOwnershipPrivilege SeLoadDriverPrivilege SeBackupPrivilege SeRestorePrivilege SeDebugPrivilege SeAuditPrivilege SeSystemEnvironmentPrivilege SeImpersonatePrivilege</Data>
      </EventData>
    </Event>

Text 5: Windows Eventlog XML file

Auditlog

The audit log is the log file of the Linux auditing system. With the help of the auditing system a Linux system administrator can monitor file changes, logins, logouts, successful and unsuccessful authentications and SELinux violations, and can even trace every available syscall and the result of this syscall. Normally the auditing system writes its logs to /var/log/audit/audit.log and uses this format:

    type=user_auth msg=audit( :133952): user pid=17126 uid=0 auid= ses= subj=system_u:system_r:local_login_t:s0-s0:c0.c1023 msg='op=pam:authentication acct="root" exe="/bin/login" hostname=? addr=? terminal=tty2 res=success'

Text 6: Auditlog

Intrusion Detection Message Exchange Format (IDMEF)

The Intrusion Detection Message Exchange Format (IDMEF) is an XML based file format that is defined in [RFC4765], but is still in the experimental phase. This format can be used by snort and suricata.

Other formats

There are a lot of other formats defined; almost every company has created its own log format. Here is a list of other log formats that are known, but deemed not important enough to be described here in detail.

- Common Event Format: a definition created by the company ArcSight that uses a pipe (|) to separate fields.
- Security Device Event Exchange (SDEE): created by Cisco as an XML based log format for their intrusion prevention system.
- Universal Format for Logger Messages ([ULM]): this IETF format uses a space separated list of key=value fields and was created in 1999 as an internet draft, but was deprecated in the same year.
- Common Base Event ([CBE]): a log format created by IBM and also based on XML.

Other log formats

- Apache: the format of the access log can be specified very freely, including writing a JSON-based format into the access log. The format of the error log cannot be changed.
- log4j (tomcat, JBoss): log4j is an Open Source Java library that is used by a lot of Java programs, incl. tomcat and JBoss. With the help of the plugin architecture it is very easy to add support for different structured log files.
- mod_security: mod_security is a web application firewall based on apache, nginx or IIS. It uses its own log format with multiple footers, headers, trailers and bodies. There are special tools to manage this, like AuditConsole from jwall.org.

- Python: Python comes with its own built-in logging class. Because Python has also shipped with JSON since 2.6, it is quite easy to write JSON log files with any Python program.
- Ruby: Ruby also comes with its own built-in logging class, and changing the log format to JSON is possible with the extension (also known as a gem) named logging by TwP.

3.2 Collector/Shipper

A lot of programs support syslog, which directly allows them to send log data to a central log server, but not all do. To support these kinds of programs a collector or shipper is needed.

File

Most shipper or collector tools read existing log files line by line and send the lines to a central log server via a predefined transport. To speed up the detection of new lines in the log file, most tools use a mechanism like inode notification or size-based change detection to avoid rereading every log file entry again.

Sockets, named pipes and STDIN

Another possibility to get log entries into the shipper are sockets and named pipes. The advantage is that no disk I/O is necessary to get the log entries into the shipper, but when the shipper is not running the log messages cannot be handled and messages could be lost.

Another possibility is reading the messages from STDIN. With this mechanism the log source starts the shipper as a subprocess and sends the messages via STDIN (file descriptor 0). The log source should check if the shipper is still running and restart the shipper when necessary.

Whatever mechanism is used, the log shipper sends the collected log messages via one or more of several transport mechanisms to the next step of the log analyzing toolset.

Local Windows Eventlog

Some shippers, when run on a Windows system, can read the Windows Event log and send its entries to a central server. Because the Event log is already structured, sending the entries in an unstructured log format should be avoided. But sadly enough that is what most tools do.
Compare collector / shipper

[Table 3: collector/shipper Overview. Columns: flat file; directory structure; multiple files with *; STDIN/STDOUT; unix domain socket; named pipe; eventlog local Windows; systemd journal; spool message during downtime. Rows (programs): rsyslog, syslog-ng, logstash, woodchuck, awesant, beaver, lumberjack, syslog-shipper, remote_syslog, fluentd, flume, Heka, nxlog, node-logstash, systemd/journal2gelf, eventlog-to-syslog. The cell values are not recoverable.]

3.3 Transport

There are different transport mechanisms to bring the log messages to a central server. The transport defines the wire protocol that is used to send the messages. Because of the sensitive nature of log files they should be encrypted when sent between machines. Not all transport mechanisms support that. A transport should also be reliable, making sure that no message is lost on the way.

Syslog

The BSD syslog standard [RFC3164] also includes a transport protocol that is based on UDP and uses port 514. The replacement [RFC5424] does not include a transport protocol itself, but requires all implementations to support Transport Layer Security (TLS) with TCP-Port . A UDP based transport using port 514 is described in [RFC5426]. These syslog protocols are the most used log transports in the field.

With the help of TCP sessions the transport of the log messages is quite reliable, but there are cases in which the loss of data sent via TCP cannot be avoided. The rsyslog author Rainer Gerhards describes this in his blog post [Gerhards2008]. This problem led to the development of [RELP], which implements app level acknowledgment. This creates a much more reliable transport, but it is missing encryption. RELP with encryption is on the TODO list of Gerhards, but is not yet stable. Until then it would be possible to use stunnel to add TLS encryption on top. The problem with RELP is that it is not a standard and is not used in any other tool, like syslog-ng.
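To make the wire format of the UDP syslog transport more concrete, the following Python sketch builds an RFC5424 style message and sends it as a single datagram. It is a simplified illustration: the helper names, the default server address and the sample values are assumptions, the structured data part is always left empty ("-"), and nothing here adds the encryption or the acknowledgments discussed above.

```python
import socket
from datetime import datetime, timezone


def format_rfc5424(facility, severity, hostname, app_name, msg,
                   procid="-", msgid="-", timestamp=None):
    """Format a syslog message in the RFC5424 layout (no structured data)."""
    # The priority value combines facility and severity.
    pri = facility * 8 + severity
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    return f"<{pri}>1 {timestamp} {hostname} {app_name} {procid} {msgid} - {msg}"


def send_udp(message, server="127.0.0.1", port=514):
    """Send one message as a UDP datagram; delivery is not acknowledged."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (server, port))


if __name__ == "__main__":
    line = format_rfc5424(4, 6, "host1", "sshd", "session opened", procid="1738")
    print(line)
    send_udp(line)
```

Because UDP gives no feedback to the sender, a lost datagram goes unnoticed, which is the weakness that the TCP based transports and RELP try to address.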

3.3.2 AMQP

The Advanced Message Queuing Protocol [AMQP] is an open, standardized application layer protocol for message middleware. AMQP was developed for the financial industry, but is now used for a lot of different purposes. One main advantage is interoperability. AMQP is an application layer protocol, and it is possible to let multiple AMQP servers (aka. AMQP brokers) from different vendors talk with each other, similar to http or smtp. The other main advantage is its reliability, because it can be very tightly controlled that no message that has entered an AMQP system can be lost. AMQP uses special terms to describe its components:

- An Exchange is where the log messages are sent to by the programs that "produce" them.
- The Queue is where the log messages are read from.
- The Bindings connect Exchanges and Queues with one another.
- The Broker is the AMQP server.

AMQP supports both username and password authentication as well as SASL authentication. It also supports TLS encryption, see Part 5 of the [AMQP] standard. AMQP is used by the following message servers: Apache Qpid, Apache ActiveMQ, RabbitMQ as well as others.

STOMP

[STOMP], the Simple (or Streaming) Text Oriented Messaging Protocol, is a protocol similar to AMQP, but instead of a binary format it uses a text based format very similar to http. It is so simple that a telnet session is enough for some basic usages. Because of the text based format it is very verbose and takes much more bandwidth than necessary. It also lacks some features that are available in AMQP. STOMP in the current version 1.2 supports username and password authentication, but encryption is not available. It is possible to use stunnel to put an encryption layer around STOMP. STOMP is used by two servers that also speak AMQP: Apache ActiveMQ and RabbitMQ (with plugin).

Ømq/ZMTP

[Ømq], also known as ZeroMQ or 0MQ, is another message queue system. Unlike STOMP and AMQP it is not built for interoperability, but there are multiple implementations available.
Ømq is a library that does not need a dedicated broker and is designed to be very easy to use and very fast. This simplifies setup enormously and is the main reason why it is used quite often for log transport. The transport protocol is called [ZMTP], but it is not widely used outside the Ømq project itself. A new version of Ømq called CurveZMQ was created in 2013 to bring encryption support to Ømq. The new transport protocol is named [ZMTP-CURVE], but it is very new and no stable release has been created yet.

Redis

Redis is a key-value store and belongs to the so-called NoSQL databases. For the use as a log transport the built-in feature called "channels" is used to create a publish-subscribe messaging infrastructure. The redis transport does not support encryption. Only a password authentication scheme without usernames is available. More in the redis chapter.

3.3.6 Lumberjack

Lumberjack is the transport protocol used by the shipping tool with the same name. This is not to be confused with Project Lumberjack, which belongs to the CEE initiative. The developer is Jordan Sissel, who also created logstash. He created the protocol because he needed a transport that supported "encrypted, trusted, compressed, latency-resilient, and reliable transport of events" [Sissel2013]. More about lumberjack on page

Remote Windows Eventlog

Microsoft Windows also has its own transport mechanism. This mechanism is primarily used by Microsoft itself, but access to the client component is available via an API on a Windows Server. It is therefore possible to collect all Eventlogs from all machines in a complete Windows domain and send them to the central machine without having to install the shipper on all machines. Sadly no Open Source and Free Software tool supports this at the moment. It was part of Project Lasso, but this project is dead now and does not support the new log format used since Windows Vista and Server

Compare Transports

[Table 4: Transport Overview. Columns: BSD syslog udp (RFC3164); IETF syslog udp (RFC5424); IETF syslog tcp (RFC5424); IETF syslog tcp tls (RFC5425); TLS encrypted channel; RELP; http; websocket; redis; amqp (QPID, ActiveMQ, RabbitMQ); Stomp (ActiveMQ, RabbitMQ); 0MQ; gelf; lumberjack; SNMP; statsd; graphite; graphtastic; varnishlog; kafka. Rows (programs): logstash, graylog2, rsyslog, syslog-ng, node-logstash, nxlog, octopussy, Heka, woodchuck, awesant, beaver, syslog-shipper, lumberjack, systemd/journal2gelf, eventlog-to-syslog, ncode/logix, flume, fluentd, remote_syslog. The cell values are not recoverable.]

3.4 Transformation/Normalization

Most of the log messages that are sent today are not yet in a structured log format. To bring structure into these log messages a transformation or normalization is necessary. This step detects the different kinds of messages and creates key-value pairs out of the unstructured log messages. There are different ways to do this.

A regular expression (regex) based system could be used to achieve this, but managing and writing large regular expressions can be cumbersome and error prone. Most of the tools work on a different basis. This approach is called sample based or pattern based. Here the parsing is done based on fixed strings, and matching is done with predefined field types.

The regular expression to parse a line like this:

    sshd[1738]: Accepted password for root from port ssh2

Would be:

    sshd\[[0-9]+\]: Accepted (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) for [^[:space:]]+ from [^[:space:]]+ port [0-9]+ ssh2

It is easier to maintain a ruleset like this:

    sshd[!PID!]: Accepted !AUTHMETHOD! for !username! from !ipaddress! port !portnumber! ssh2

The transformation can be done on the central server or on every client itself. The central transformation has the advantage that the rules are stored in one place and can be changed quite easily, but the CPU usage can be a problem in large setups. To avoid that, the normalization can be spread out to multiple nodes, or the work can be done on the client side before the transport. The CPU load on every client is quite small, but the distribution of the rule set can be problematic if no configuration management like puppet, chef or ansible is already in place.

Pattern-DB

The Pattern-DB is part of syslog-ng and is normally compiled into the syslog-ng binary. The documentation of Pattern-DB is very complete, and syslog-ng has a GIT repository where it collects rules for different services. The patterns themselves are stored inside an XML structure and include test messages and examples. The Pattern-DB is very actively maintained and also allows messages to be correlated. This allows a mail server to save sender and recipient of a mail into one log entry, or to correlate the logon and logoff times to save the duration of a login. To parse the example from above the rule should look like this:

    sshd [@NUMBER:PID:@]: ssh

Liblognorm

The Liblognorm tool is developed by the rsyslog creator Rainer Gerhards and is built as a library so that other tools can use this normalization as well. Liblognorm includes a small tool called "normalizer" to check the rulesets; it creates JSON messages out of normal log files. This makes rule writing much easier. The documentation is somewhat limited, but sufficient to create rules. There is no adequate rule library, however, so all rules have to be created by oneself. Liblognorm is not only used by rsyslog but also by the Sagan project. They have created a rule library, the only one available.

    rules=sshd [%pid:number%:] Accepted %auth_method:word% for %username:word% from %src-ip:ipv4% port %src-port:number% ssh

Octopussy

Octopussy is a log management system that uses its own log normalization. The rule base is quite extensive, but it can only be used by the Octopussy system, because it is an integrated part. The patterns are stored in an XML file and can be edited and created with the help of the Octopussy webpage. This makes it very easy for a system administrator to create new patterns.
The example line from above would be found by this rule:

    <@REGEXP("ssh\S+"):daemon@>[<@PID:pid@>]: <@REGEXP("Accepted password for.+"):msg@>
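The pattern based tools in this section share one idea: a rule consists of fixed strings plus named, typed placeholders. How such a rule can be turned into a regular expression with named groups is sketched below in Python. This is not the implementation of any of the tools mentioned here. The !NAME! placeholder syntax is modeled on the illustrative ruleset at the beginning of this section, and the field type table, the function name and the sample log line (including its IP address and port) are assumptions made for this example.

```python
import re

# Hypothetical field types: each placeholder name maps to a regex fragment.
FIELD_TYPES = {
    "PID": r"[0-9]+",
    "AUTHMETHOD": r"[a-z/-]+",
    "USERNAME": r"\S+",
    "IPADDRESS": r"\S+",
    "PORTNUMBER": r"[0-9]+",
}


def compile_rule(rule):
    """Translate a rule with !NAME! placeholders into a compiled regex."""
    # re.split with a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"!([A-Z]+)!", rule)
    fragments = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            # Fixed string: must match literally.
            fragments.append(re.escape(part))
        else:
            # Placeholder: becomes a named group of the predefined type.
            fragments.append(f"(?P<{part}>{FIELD_TYPES[part]})")
    return re.compile("".join(fragments))


RULE = ("sshd[!PID!]: Accepted !AUTHMETHOD! for !USERNAME! "
        "from !IPADDRESS! port !PORTNUMBER! ssh2")

if __name__ == "__main__":
    # Made-up sample line; 192.0.2.10 is a documentation address.
    line = "sshd[1738]: Accepted password for root from 192.0.2.10 port 51234 ssh2"
    match = compile_rule(RULE).fullmatch(line)
    print(match.groupdict() if match else "no match")
```

The rule author only maintains the readable rule line, while the error prone regex is generated, which is the maintenance advantage the pattern based approach aims for.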

Illustration 2: Octopussy rule creation

Grok

The grok library was created by logstash developer Jordan Sissel and is available for other tools to use. Grok itself is based on regex, but it makes writing rules easier because it allows regex patterns to be named and these names to be used instead. To match our example from above the pattern could look like this:

    sshd [%{NUMBER:pid}:] Accepted %{WORD:auth_method} for %{WORD:username} from %{IPORHOST:src-ip} port %{NUMBER:src-port} ssh2

A nice additional tool available for grok is grokdiscovery. This tool takes a sample log message and tries to predict the pattern that could be used to normalize this message. Of course the result is not always directly usable, but it speeds up the creation of rules with grok.

Heka

Mozilla's Heka includes its own transformation. It is based on regex, but it is easier to read, because it includes the variable name inside the regex. The following is an example for parsing the Apache combined log file format. Some lines that would have defined the types of the fields were deleted here.

    match_regex = '/^(?P<RemoteIP>\S+) \S+ \S+ \[(?P<Timestamp>[^\]]+)\] "(?P<Method>[A-Z]+) (?P<Url>[^\s]+)[^"]*" (?P<StatusCode>\d+) (?P<RequestSize>\d+) "(?P<Referer>[^"]*)" "(?P<Browser>[^"]*)"/'
    timestamplayout = "02/Jan/2006:15:04: "

Filter_regex

The Node-Logstash tool uses a pure regex based normalization. The configuration is a lot more error prone, as is visible in this example:

    {
      "regex": "^<(\\S+)>(\\S+\\s+\\S+\\s+\\d+:\\d+:\\d+) (\\S+) ([^:\\[]+)\\[?(\\d*)\\]?:\\s+$accepted \
        (gssapi(-with-mic|keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) \
        for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))$",
      "fields": "syslog_priority,timestamp,@source_host,syslog_program,syslog_pid,auth_method,username,src-ip,src-port",
      "numerical_fields": "syslog_pid","src-port"
      "date_format": "MMM DD HH:mm:ss Z"
    }

nxlog

Nxlog also offers some limited transformation between formats. It can, for example, convert a Windows Eventlog entry to a JSON or GELF message, but it cannot convert an unstructured log format into a structured one.

3.5 Storage

The traditional way to store log messages is a log file. This may be bad for searches, but there are some advantages to it. In most cases some kind of database system should be used.

Log files

Traditional log files may feel antiquated, but they have the big advantage that they remain readable in the future. Log files that are 10 or even 30 years old can be read today, if the physical medium is still readable. New features make log files even more interesting. Since rsyslog version 7.4 it is possible to create signed log messages with the help of Guardtime [Gerhards2013]. This uses a Keyless Signature Infrastructure and a hash tree or Merkle tree to put multiple small log messages together, and then uses linked timestamps to make them tamperproof. The cryptographic information is shown in [Gerhards2013-2] and at www.openksi.org.

Rsyslog's approach is targeted at writing to a log file. It cannot be used before a message is sent to the central log server. This is a design decision that comes from the idea that not all log messages that are sent to a central server will be saved.

Systemd's journal also has a signing feature. It uses Forward Secure Sealing (FSS) to achieve a similar objective. Instead of the Keyless Signature Infrastructure it uses a cryptographic key that is displayed as ASCII and as a QR code during creation. This can be scanned and used to check if the log file has been altered.
The log files are stored locally and can be deleted by an attacker. This problem is acknowledged by the author in [Poettering2012].

SQL

The idea to use SQL to save syslog data is not new. Both rsyslog and syslog-ng have been supporting SQL databases for a long time. The problem is that the unstructured log message cannot really be split up, so the table structure of such a database is quite simple. Only the syslog structure (log host, date, facility and priority) can be stored in separate columns, and the message is one long string field. There are some possibilities to speed up searches via full text search extensions like Sphinx.

Storing structured log data in an SQL database is not easier, because there are too many different fields possible. The fixed schema of SQL is not flexible enough to be used for that.
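The simple table structure described above can be sketched with Python's built-in sqlite3 module. The table and column names are assumptions chosen for this illustration and do not reproduce the schema of rsyslog, syslog-ng or any other tool. The sketch shows that only the syslog header fields become columns, while everything else ends up in one text column that can only be searched with a substring match.

```python
import sqlite3


def create_store(conn):
    """Create the simple table: syslog header fields plus one message string."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS syslog (
               id INTEGER PRIMARY KEY,
               host TEXT NOT NULL,
               received TEXT NOT NULL,
               facility INTEGER NOT NULL,
               priority INTEGER NOT NULL,
               message TEXT NOT NULL
           )"""
    )


def store(conn, host, received, facility, priority, message):
    """Insert one log line using a parameterized statement."""
    conn.execute(
        "INSERT INTO syslog (host, received, facility, priority, message) "
        "VALUES (?, ?, ?, ?, ?)",
        (host, received, facility, priority, message),
    )


def search(conn, needle):
    """Substring search over the unstructured message column."""
    cursor = conn.execute(
        "SELECT host, message FROM syslog WHERE message LIKE ? ORDER BY id",
        (f"%{needle}%",),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_store(connection)
    store(connection, "host1", "2013-07-01T12:00:00", 4, 6,
          "sshd[1738]: Accepted password for root")
    store(connection, "host2", "2013-07-01T12:00:05", 3, 6, "cron job started")
    print(search(connection, "Accepted password"))
```

A question such as "which user logged in" cannot be answered by a column lookup in this layout, because the user name is buried inside the message string.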

The tools ELSA (see page 35) and rsyslog's LogAnalyzer (see page 38) use MySQL (see page 41) to store the log messages in an SQL database. Rsyslog and syslog-ng, as well as other tools, support other SQL dialects as well, some with the help of the DBI library, some with native support.

NoSQL

The problem with the schema and structured log files was one of the reasons to move to a NoSQL database, primarily a document store. A document store is a NoSQL database, like MongoDB and Elasticsearch, that stores documents in JSON, XML or other data formats. These documents can be indexed and replicated to speed up searches and make the system more reliable. There are primarily two NoSQL databases used for log management, MongoDB (see page 41) and Elasticsearch (see page 42).

Compare Storage

[Table 5: Storage Overview. Columns: mongodb; hadoop; elasticsearch; SQL (DBI or native). Rows (programs): logstash, graylog2, rsyslog, syslog-ng, node-logstash, nxlog, Heka, fluentd, flume. The surviving cell notes ("logstash format", "graylog format", "DBI", "N/DBI") cannot be assigned to cells reliably.]

3.6 Analysis

Simply storing the normalized log data is not enough. To get more use out of the log files, the data in them has to be analyzed. The main reasons to analyze the log data are to detect problems and attacks and to correlate events. Some events only need to be noticed if a lot of them occur. One logon error is nothing to worry about; 1'000 logon errors are not normal and should be checked. A 404 error on a webpage is ok; 1'000 per second are not. This kind of analysis should be done automatically, based on a written rule set.

nxlog

Nxlog has an analyzing functionality. It has a special module called event correlator (pm_evcorr), but it also supports simple statistical counters like RATE, COUNT, AVG or the change rate of the RATE, called GRAD. This makes it possible to create some simple analysis, but there are some problems with this, as written in the nxlog documentation [nxlog-var-warning].

With the event correlator module it is possible to create rules that ignore messages which arrive too often, to avoid being flooded by warnings. It offers the command "pairs", which looks for events that have a matching pair; the login and logout messages of a user are a good example of such a pair. The command "absent" will search for broken pairs, where the second part does not arrive inside a certain timeframe. The following example will send a warning if the field "Message" containing "login failure" is detected 3 times in 60 seconds.

    <Thresholded>
        Condition $Message =~ /^login failure/
        Threshold 3
        Interval 60
        Exec $raw_event = "login guessing in progress";
    </Thresholded>

SEC

The Simple Event Correlator (SEC) is a universal event processing tool that can be used not only for log files, but for fraud detection and other event correlation as well. SEC is written in Perl and uses regex to correlate the messages. As written on the [SEC] webpage:

"SEC reads lines from files, named pipes, or standard input, matches the lines with patterns (like regular expressions or Perl subroutines) for recognizing input events, and correlates events according to the rules in its configuration file(s). SEC can produce output by executing external programs (e.g., snmptrap or mail), by writing to files, by sending data to TCP and UDP based servers, by calling precompiled Perl subroutines, etc."

The following example from [Vaarandi2012] shows a rule that checks ssh, apache and iptables/netfilter for attacks and sends a mail when an attack is detected:

    type=eventgroup3
    ptype=regexp
    pattern=sshd\[\d+\]: Failed \S+ for (?:invalid user )?\S+ from ([\d.]+) port \d+ ssh2
    thresh=3
    ptype2=regexp
    pattern2=^([\d.]+) \S+ \S+ \[[^]]+\] [^ ]+ HTTP\/[\d.]+ 4\d+ \d+
    thresh2=1
    ptype3=regexp
    pattern3=kernel: IN=\S+ OUT= MAC=\S+ SRC=([\d.]+)
    thresh3=5
    desc=Repeated probing from $1
    action=pipe Repeated probing from host $1 /bin/mail root@localhost
    window=120
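Both the nxlog and the SEC example count matching events inside a time window and react when a threshold is reached. The following Python sketch shows one possible way to implement such a check with a sliding window per key (for example per source address). It is an illustration only: whether nxlog or SEC handle the window boundaries in exactly this way is not claimed, and the class and method names are made up for this example.

```python
from collections import defaultdict, deque


class ThresholdDetector:
    """Report when `threshold` events for one key fall inside `interval` seconds."""

    def __init__(self, threshold, interval):
        self.threshold = threshold
        self.interval = interval
        # One queue of event timestamps per key.
        self._seen = defaultdict(deque)

    def observe(self, key, timestamp):
        """Record one event; return True when the threshold is reached."""
        window = self._seen[key]
        window.append(timestamp)
        # Drop events that have fallen out of the time window.
        while window and timestamp - window[0] >= self.interval:
            window.popleft()
        if len(window) >= self.threshold:
            # Start counting from scratch after raising the alarm.
            window.clear()
            return True
        return False


if __name__ == "__main__":
    detector = ThresholdDetector(threshold=3, interval=60)
    for second in (0, 10, 20, 30):
        if detector.observe("192.0.2.10", second):
            print(f"login guessing in progress (t={second}s)")
```

Keeping a separate queue per key is what allows one noisy host to trigger an alarm without the events of other hosts being counted against it.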

3.6.3 Sagan

Sagan is a real-time log analysis and correlation tool and is written in multithreaded C. Sagan rules look similar to the rules of the Snort Intrusion Detection System (IDS), to simplify rule management with oinkmaster and similar tools. The log messages have to be delivered in a special pipe (|) separated format via a FIFO socket. As an output it can write directly to a log file or use barnyard2 to write to an SQL database. This is the same mechanism that is used by snort. It uses liblognorm for normalization and its own rules, which are similar to snort rules. The following rule will create a warning if more than 5 authentication failures are detected inside a 300 second timeframe:

    drop tcp $EXTERNAL_NET any -> $HOME_NET $SSH_PORT (msg:"[openssh] PAM Authentication failure - Brute force [5/5]"; content: "Authentication failure"; classtype: unsuccessful-user; reference: url,wiki.quadrantsec.com/bin/view/main/ ; normalize: openssh; program: sshd; after: track by_src, count 5, seconds 300; threshold: type limit, track by_src, count 5, seconds 300; fwsam: src, 1 day; sid: ; rev:5;)

Logstash and metrics

Logstash can be used for some analysis jobs. There is a metrics plugin that can create rate calculations for 1, 5 and 15 minutes, as well as min, max, stddev and avg. The problem is that there is no way to use it directly; the results can only be forwarded via JSON to another tool, like a grapher as explained in the next chapter. Also missing is the possibility to check if a specific user has mistyped his password a certain number of times in the last couple of minutes.

Graylog2

Graylog2 can put messages which are selected by a search query into a message stream. When a certain number of messages arrive in a stream, it can trigger an alarm and send mails, use Jabber or call an external plugin. It can also forward all messages from a stream to an output plugin like an external paging service, but it also lacks checks against things like password guessing.
Streams only work with new messages that arrive, not with messages that are already stored in the elasticsearch database.

3.7 Visual output

All the collected, normalized and analyzed log files can be stored, but without a visual output no one will notice. There are multiple web applications that can show different aspects of the log messages. Most are integrated into a log tool, like octopussy, graylog2 or ELSA (see chapter Multi purpose tools on page 31). There are two kibana projects that work with logstash, but are developed separately. Kibana 3 is even usable with other tools like graylog, as long as they use timestamps and elasticsearch as storage. More about the different web front ends in chapter Webpage on page 33. If you only want some graphs to be added to an existing web site, special graphing tools are available. These graphing tools can be found in chapter Graphs on page

4 Tools

Most of the tools used for structured log file analysis offer multiple components in one program. In the last chapter the different parts that are necessary for the creation were introduced. This chapter shows the different tools with all parts that are built into them. It begins with the multi purpose tools, followed by the different outputs, then storage, transport, shipper/collector and finally the analysis tools.

4.1 Multi purpose tools

Some tools can be used for a wide range of purposes, while others are only created for one specific purpose. This section begins with the multi purpose tools.

Syslog-ng

Syslog-ng is a syslog server created by Balabit.com, a Hungary-based company. Syslog-ng exists in two versions: an Open Source Edition (OSE) and a Premium Edition. The latter is only available for paying customers with support, and it is not Open Source. Because of this, syslog-ng was almost removed from this thesis, but it is used quite extensively and the missing features are not big enough to warrant the removal. The features missing in the OSE version include: handling of multiline messages, encrypted log files, reliable log transfer, client-side failover and buffering log messages persistently to hard disk in case the destination becomes unreachable. Whenever syslog-ng is mentioned in this thesis, the OSE version is meant.

Syslog-ng was the default syslog in SuSE Enterprise Linux (SLES) and OpenSuSE, but it is being replaced by rsyslog [SLES2013]. It is unknown if that was because of the OpenCore nature of the development, or to be in sync with other distributions like Debian and Red Hat.

Syslog-ng can be used not only to collect syslog messages, but also as a shipper that reads the files directly. Reading multiple files with wildcards is only supported by the closed source Premium Edition. The same goes for the handling of missing syslog servers. When the target server is not available, the log messages are not stored, but lost. There is a huge number of plugins available inside syslog-ng, including writing to SQL databases, MongoDB and AMQP.
With the help of an AMQP cluster it is possible to make sure that syslog-ng does not lose messages when a server is down. Syslog-ng does support encryption out of the box.

Syslog-ng has its own normalization tool called Pattern-DB with a huge amount of predefined rules. This ruleset is very actively maintained [Czanik2013]. See the Pattern-DB chapter on page 25.

The documentation of syslog-ng is very good, well structured and extensive. The OpenCore nature of syslog-ng is a big problem, but the huge and actively maintained Pattern-DB is something that is not available anywhere else.

Rsyslog

Rsyslog started as a replacement for the traditional syslog and as an opponent for the existing syslog-ng. As developer Rainer Gerhards wrote in [Gerhards2007] it was developed to be a real Free Software and Open Source alternative, because syslog-ng has become a dual-licensed open core product. Rsyslog is completely Open Source and Free Software and support is available from the German company of the original author named Adiscon.com.

Rsyslog is the default syslog server for Fedora, OpenSUSE, Debian and Red Hat Enterprise Linux and is available in every major Linux distribution. It supports a large number of input and output plugins as shown in Illustration 3.

Illustration 3: rsyslog in/out plugins

The possibility to write to mysql makes it possible to use the LogAnalyzer tool from page 38. One of the unique features of rsyslog is the RELP plugin, which makes it possible to make sure that syslog messages are really received by the server. This only works with rsyslog, because it is not a standard. Together with the disc based queue it is very easy to make sure no message gets lost. This can also be done via the Ømq plugin. Both these reliable mechanisms suffer from a lack of encryption. Only the encryption of the normal syslog traffic is available. The development of encrypted RELP is under way but not available yet.

Rsyslog offers a normalization library named "Liblognorm" that is described on page 25. The documentation of rsyslog is strange, because a lot of interesting features are only explained in blog posts from the main author.

Graylog2

Graylog2 is a complete, Free Software and Open Source log management solution, created by Lennart Koopmann and supported by torch.sh, a German based company. It stores the data in an elasticsearch cluster and the statistics and graph data in a MongoDB. The log messages can be sent via syslog in an unstructured way or in graylog2's own structured format named GELF.
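To make the idea of a structured GELF message more concrete, the following Python sketch builds a GELF-style JSON document and compresses it with zlib, as is commonly done for UDP transport. It assumes the field names of the published GELF specification (version, host, short_message, timestamp, level, and additional fields prefixed with an underscore). The helper names build_gelf and encode_gelf and all sample values are hypothetical and not taken from graylog2 itself.

```python
import json
import time
import zlib


def build_gelf(host, short_message, level=6, timestamp=None, **extra):
    """Return a dict shaped like a GELF message (assumed spec version 1.1)."""
    message = {
        "version": "1.1",              # assumption: current published spec version
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": level,                # syslog severity, 6 = informational
    }
    # Additional (structured) fields carry an underscore prefix in GELF.
    for key, value in extra.items():
        message["_" + key] = value
    return message


def encode_gelf(message):
    """Serialize the message to JSON and compress it with zlib."""
    return zlib.compress(json.dumps(message).encode("utf-8"))


if __name__ == "__main__":
    msg = build_gelf("web01", "login failed", level=4, user="alice")
    print(msg)
    print(len(encode_gelf(msg)), "bytes after compression")
```

The point of the sketch is that the structure (here the hypothetical field user) travels with the message, so the receiving side does not have to parse it out of a free-text line again.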

Graylog2 supports the creation of multiple graylog2 instances, writing to the same elasticsearch cluster. This allows the creation of a failover setup. Together with the AMQP support in graylog2 it is possible to have different graylog2 nodes connecting to the same AMQP broker infrastructure. In this setup graylog2 nodes will automatically distribute the messages to share the load.

The biggest problem of graylog2 is the fixed requirement of a specific elasticsearch version. This happens because graylog2 adds its own elasticsearch node into the cluster. This cluster node can be configured to store data itself or to solely forward the data to other data nodes. This can create some problems, because the elasticsearch development is quite fast, and you have to use an old version to run graylog2. Another problem with elasticsearch is that it handles deletion of old log files not based on date, but only on the size of the log files. This makes it easy to manage the disc space, but it is not known how many days of log files are available.

The webpage of graylog2 is more than a simple dashboard; it allows sorting messages into streams. Streams are query results that can be used for monitoring and alerting other programs, as written on page 30. Streams can very easily be created from the webpage and be put into categories to make handling of a huge amount of streams easier. The web interface also supports adding admin messages to log entries, if a special problem is known. It is possible to write a regular expression, and for every log entry that fits this expression an automatic message is added to the web page. Graylog2 offers to remove sensitive information like passwords. Streams can also trigger alarms, send mails or jabber messages, or call an external plugin. All messages of a stream can also be forwarded to an output plugin. To round it up, graylog2 allows putting machines into host groups.

Illustration 4: Graylog2 web page

Logstash

Logstash is the "swiss army knife" of log management.
It contains everything from transports, to collecting local sources, to normalizing log messages, to storing data in elasticsearch, up to a webpage to query the data from elasticsearch.

Logstash development was started in 2009 by Pete Fitchman and Jordan Sissel. Logstash has a huge number of input and output plugins as seen on page 57, as well as filter plugins. The filter plugins include plugins that can be used to: anonymize data, convert to GELF, JSON, KV, XML and other formats, resolve IP addresses into geo coordinates, apply grok as described in the chapter Grok on page 26, merge multiline messages (like stack traces) into one message, split one message into two, translate numbers into text (like error codes into error messages) or resolve IP addresses into hostnames.

The documentation is very good, and includes a very helpful introduction. If more information is required, a logstash book written by James Turnbull is also available [Turnbull2013].

Logstash's own webpage is very limited in its usage. It can be used to query elasticsearch, but it is missing a lot of other features offered by the competitors. The big advantage is the simple installation. When logstash is already running, a simple command line with the parameter "web" starts the web server.

Several reliable transports are available, like AMQP and syslog with RELP. It also supports encrypted syslog, but only without RELP. With tools like lumberjack it supports transports that are both reliable and encrypted as well. Because the logstash webpage is so limited, the tools Kibana 2 or Kibana 3 are often used instead.

Illustration 5: Logstash web page
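The multiline merging mentioned in the filter list above can be illustrated with a small Python sketch. It is not Logstash's implementation. It only assumes one simple and common rule, namely that a line starting with whitespace (like the frames of a stack trace) continues the previous event. The function name merge_multiline is hypothetical.

```python
def merge_multiline(lines):
    """Merge continuation lines (leading whitespace) into the previous event."""
    events = []
    for line in lines:
        # A line that starts with whitespace is treated as a continuation,
        # provided there is a previous event to attach it to.
        if line[:1].isspace() and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


if __name__ == "__main__":
    raw = [
        "ERROR request failed",
        "  at handler()",
        "  at main()",
        "INFO request ok",
    ]
    for event in merge_multiline(raw):
        print(repr(event))
```

With this rule the four input lines become two events, so the stack trace stays attached to the error message it belongs to.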

4.1.5 Node-Logstash

Node-logstash is a reimplementation of logstash written in Javascript, based on node.js and developed by Bertrand Paquet. It also uses elasticsearch, but it is not limited to a specific version. Node-logstash does not yet have the same number of plugins as the original. The grok plugin is missing, and a replacement plugin called filter_regex works with a regex rule base. This is shown in the chapter Node-Logstash on page 26. The filter plugins are: add_source_host, add_timestamp, compute_date_field, compute_field, grep, json_field, multiline, mutate_replace, reverse_dns, split, syslog_pri and regex. So a lot of the functionality is missing. A reliable transport is available with redis, but there is no encryption available whatsoever. The project started very recently in July 2012, but is very actively developed.

ELSA

The Enterprise Log and Search Application is a combination of syslog-ng, mysql and sphinx. It is written primarily in Perl by Martin Holste, who started the project. As written in [Holste2011] the development was driven by the need to create a logging server that could be queried very fast.

According to the [ELSA-UserGuide], it uses the Pattern-DB from syslog-ng for normalization, and forwards the normalized log messages to the Perl programs. These send the messages via bulk load to the mysql database. The sphinx server indexes the new data every few hours to gain speed and to work with large chunks of data. After a defined amount of time the data is moved from the MyISAM table to a table of type ARCHIVE.

The query language is based on the google query language, which makes it very easy to use, but it does not support wildcard searches, which most other web front ends offer. Google also provides a lot of images, Javascript and css files, which makes it impossible to use ELSA without internet access. For a tool with such security and privacy related data, it is surprising that it gets most of the files from google.
The installation is a little unusual. Where most tools tell you to install a list of requirements and then install the package, ELSA only offers an install script that does the installation. Separated into "node" and "web", it installs a lot of programs like mysql, apache, gcc, header files for different development packages and a huge amount of cpan modules. It also downloads syslog-ng and sphinx and compiles them. It also downloads its own source and cpanm from the web. The nice thing is that at the end it runs a self-check that puts some messages into syslog and tests if these are correctly stored and indexed.

It is possible to create multiple ELSA nodes, but these nodes are not replicated; instead every node runs independently with its own messages, databases and index server. The webpage will send the necessary queries to all ELSA nodes in the cluster, so it looks like all information is stored in one datasource. This has the advantage that every node is robust: if a node is missing, the information of this node is missing too, but everything else is unaffected.

ELSA also offers alerting, a plugin architecture and host checks that inform about hosts that are not sending messages anymore. A nice feature is the possibility of defining log classes and defining which user can access which message. This makes it possible to define that a web developer can access the web logs, but not the ssh or audit logs. It also offers dashboards for a better overview over different kinds of log messages. The documentation is quite extensive, but to ask questions it is necessary to have a google account.

Illustration 6: ELSA web page

octopussy

Octopussy or 8pussy is a quite old project and was started in 2005 by Sebastien Thebert. It is written in Perl and is available as a Debian archive or as source code. It brings its own normalization library as shown on page 25. It uses rsyslog to accept the messages and sends them via a fifo into the octopussy dispatcher. The dispatcher sends the messages to the parser (for the normalization). For every host that sends messages to octopussy a new parser is started. Because every octo_parser has a resident set size of 24 Mebibytes, this can become a problem in large setups with thousands of hosts.

On the webpage it can be defined which services are running on the host, and the ruleset for each service is added automatically. A huge amount of predefined log formats or services are available, not only Linux and Windows services, but also MacOS, Netscreen, Ironport, F5 and Cisco. It is also possible to add one's own services and rules. There is even a wizard that shows all unidentified log messages and helps with the creation of rules for these log messages. There is LDAP authentication available, as well as alerts, reporting and rrd-based graphics.

The big problem is the data storage. There is a mysql database required, but the logs are stored based on the detected service in a compressed cleartext file. These files are stored in a date based file hierarchy with one log file for every minute of the day, up to 1440 files per directory. This can limit the search speed. A fast search is possible, but only if the search is limited to a specific service, because only these logs have to be uncompressed and searched.
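The two numbers given above can be turned into a quick back-of-the-envelope calculation. The Python sketch below only uses the figures stated in the text (24 Mebibytes resident set size per parser, one parser per host, one file per minute). The host count of 2000 is a hypothetical example, and real memory use will differ because of shared pages.

```python
PARSER_RSS_MIB = 24          # resident set size per octo_parser, from the text
MINUTES_PER_DAY = 24 * 60    # one log file per minute gives 1440 files per day


def parser_memory_gib(hosts):
    """Rough upper bound of parser memory in GiB for a number of hosts."""
    return hosts * PARSER_RSS_MIB / 1024


if __name__ == "__main__":
    hosts = 2000  # hypothetical setup size
    print(hosts, "hosts need about", parser_memory_gib(hosts), "GiB for parsers")
    print(MINUTES_PER_DAY, "files per service directory and day")
```

Under these assumptions a setup with 2000 hosts would already need close to 47 GiB of memory for the parsers alone.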

Illustration 7: Octopussy home page

nxlog

Nxlog is a very universal log collector and shipper, together with some analyzing and normalization capabilities. It is written in multithreaded C and is created and supported by the Hungary based company Nxsec. The code is only released on SourceForge as a tar.gz archive and no source code repository like SVN or GIT is available. Nxlog is OpenCore software; some features are only available with the "Enterprise" version. This includes better event correlation, an HTTP REST API, SNMP input and remote Windows event collection. But the important features are available in the Open Source version.

The architecture is based on plugins, but these are called modules here. The extension modules add support for message formats like syslog, GELF and JSON, or a multiline message parser to handle Java stack traces. There are input and output modules as well as process modules with support for memory and disk buffers for bridging server gaps. It also offers event correlation and message de-duplication. Encryption is available as an input and output module. The analyzing and event correlation is described on page 28. The [nxlog] documentation is quite complete and includes a lot of examples.

4.1.9 Heka

In April 2013 the Service Team of the Mozilla Foundation announced the first public release of Heka on their page [HekaIntro]. Heka is primarily designed as a shipper, but has some support for normalization. It is written in Go, but can be extended in the language Lua as well. It uses RabbitMQ as the primary transport, but does not support the TLS encryption of AMQP. It can write directly to elasticsearch since version 0.3, released in July. Heka is a very young product, but with the support of the Mozilla Foundation it could become very interesting in the coming months. Heka has some log normalization features as shown on page 26. The documentation is surprisingly thorough for such a young project.

4.2 Output

Some webpages and graphics generators are tool independent. These are shown here.

Webpage

The webpages shown here do not belong to a special tool or framework, but are developed independently.

LogAnalyzer

LogAnalyzer is created by the same developer as rsyslog. Rsyslog is used to write syslog data into a MySQL database, and this data is shown via this webpage. Strictly speaking this tool should not be shown here, because it does not use structured log data; instead it only uses the semi-structured log data from syslog. But there is a possibility to extend LogAnalyzer to handle structured data as described on the rsyslog webpage [Gerhards2011-2].

Kibana 2

Kibana 2 is a web front end for accessing the data written to elasticsearch by logstash. Kibana 2 is based on Ruby and needs a lot of Ruby gems. It is suggested that these are installed with the help of the Ruby gem bundler. Kibana 2 offers interactive graphs and can show trends and the distribution of fields in the log data. It can even create dashboards or rss feeds based on lucene queries. Kibana 1 was based on php, but is long abandoned.

Illustration 9: Kibana 2 web page

Kibana 3

Kibana 3 is a new version of Kibana written completely in HTML5 and Javascript by the elasticsearch team. The code running in the web browser talks directly with the elasticsearch server. It is not yet considered stable, but is working quite well already. The difference between kibana 2 and 3 is not only the programming language, but the way kibana 3 works without a fixed format in elasticsearch. It is possible to use it on other data as well, as long as there is a time field. It is even possible to use the graylog2 format with kibana 3.

Because of the use of HTML5 it does not need anything special on the server side, but it needs a modern web browser on the client side and therefore it does not work with older versions of "Internet Explorer", which are used at too many companies. Kibana 3 is more like a web based dashboard creation tool than a simple dashboard. It ships with an example for logstash, but it can be very easily extended and rewritten without a single line of code.

Illustration 10: Kibana 3 web page

Graphs

The different web front ends can create some simple graphs, but if the graphs should be used in some other website like monitoring, a special graph tool is needed.

StatsD

StatsD was created by Etsy to follow their religion of the "Church of Graphs. If it moves, we track it." as stated in [Malpass2011]. StatsD is a simple event tracking system written in Javascript based on Node.js. It receives the status changes via a UDP socket. A new counter does not need to be created; simply start adding data to a new counter and graphs will automatically be created. For very frequently hit counters it is possible to send only every tenth or every hundredth event to StatsD and it will still be correctly stored in the counter. StatsD does not create graphs itself, but normally sends the data on to graphite to generate the graphs.

Graphite

Graphite is used quite often together with StatsD, but can be used without it. Graphite is written in Python and uses the Twisted and Django frameworks. Internally it uses whisper as a database for time-series data (similar to rrd), carbon is the data point receiver and a web application is available to display the graphs, also called metrics. The internal architecture is explained in chapter 7 of [OSArch2012].

Every graph or metric has a path that is used to specify the graph and can help organize it as well, like company.websites.logging.auth.user.error. The graphite messages are sent in a format like this:

path_to_graph value unixtimestamp

The graphite server takes this information and stores it aggregated in the whisper database, based on the configuration for this tree of graphs, and generates the graphs to be used by a webpage.

Fnordmetric

Fnordmetric is a collection and visualization framework for time series data. There are two backends to choose from plus a web GUI.

Fnordmetric Classic

Fnordmetric Classic is written in Ruby and uses a redis NoSQL database for storage. It is a Ruby framework to write webpages with graphs, using Ruby as a Domain Specific Language with pre-built widgets to ease development. It receives data as JSON via a TCP/UDP port or via HTTP POST.

Fnordmetric Enterprise

Fnordmetric Enterprise is written in Scala and runs on a JVM. Fnordmetric can be used as a replacement for statsd+graphite, but the API to receive data is different. Fnordmetric Enterprise can receive data via TCP/UDP or via an HTTP websocket. The big difference is that the name of the metric contains the metric type like mean, sum, min/max etc.

Fnordmetric UI

The fnordmetric UI is an HTML5 application framework that connects to one of the two fnordmetric backends. It is possible to create a new html page or integrate it into an existing one, with only some Javascript addons including fnordmetric itself and jquery.

4.3 Storage

mysql

Mysql is the world's most used Open Source and Free Software relational database. Created by the MySQL AB company in 1995, it was bought by Sun and later by Oracle. Since Oracle acquired MySQL, the Open Source developers including the original authors were not happy with the way Oracle handled the project. They therefore created a fork named MariaDB. Both MariaDB and MySQL are quite compatible.
In this thesis the term mysql refers to both databases, MariaDB and MySQL. Because mysql is so famous, it will not be introduced here further.

MongoDB

MongoDB was one of the earliest NoSQL databases and is one of the most used ones of its kind. MongoDB uses a binary representation of JSON called BSON. MongoDB is very old for a NoSQL system and has a production ready release. MongoDB is developed and supported by the US based MongoDB, Inc. MongoDB is a document database that can store data schema free. To create high availability setups it supports a master-slave configuration. This normally runs in an asynchronous mode, so the databases are not always in sync. To split huge databases it uses sharding, where the data is distributed to different machines based on a shard key. It also supports map-reduce to distribute data and aggregation operations.

ElasticSearch

Elasticsearch (ES) is also document oriented like MongoDB, but it is designed as a pure search engine based on the Apache lucene search library instead. It provides realtime data analytics and can be distributed over multiple machines, both for load reasons as well as to improve availability. It has full text search capabilities and can store queries inside elasticsearch and execute them faster when needed. As a query language it uses the lucene syntax that includes searches in fields like this:

host_source:testmachine AND error

Elasticsearch is very easy to set up because it only needs a Java runtime, the Java jar file and a config file with the cluster name in it. A cluster is a collection of multiple elasticsearch instances that can talk with each other. To create a cluster with elasticsearch machines, simply make sure multicast is available and all nodes use the same cluster name. There is a special tutorial for using elasticsearch to store logs [Gheorghe2012].

Elasticsearch uses JSON as the document format. The JSON documents that should be stored inside ES can be put there with an HTTP PUT request. To improve performance multiple JSON documents can be stored in one request using the bulk API. To improve performance even further a so called river plugin can be used to push documents into the ES instance. Another type of plugin, called "site plugins", includes management web front ends. The site plugin "elasticsearch-head" is used very often and allows a fast and easy overview of the cluster and can make changes as well. There are several other site plugins available that can give an insight into the performance and resource usage of elasticsearch.

Illustration 11: Elasticsearch with HEAD plugin
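The bulk request mentioned above can be sketched as follows. The Python example only builds the request body and does not talk to a server. It assumes the newline delimited layout of the Elasticsearch bulk API, where every document is preceded by an action line. The "_type" entry reflects Elasticsearch versions of that period and is an assumption, since newer versions no longer use it. The function name build_bulk_body, the index name and the sample documents are hypothetical.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline delimited bulk body: one action line per document."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch where to index the next document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the JSON document itself.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    documents = [
        {"host_source": "testmachine", "message": "error while reading file"},
        {"host_source": "testmachine", "message": "service restarted"},
    ]
    print(build_bulk_body("logstash-2013.06.01", "logs", documents), end="")
```

The resulting text would be sent in a single HTTP request to the _bulk endpoint, so that two documents cost one round trip instead of two.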

Elasticsearch uses sharding and replicas to speed up access and distribute the load. Sharding is used to split up the data to be distributed to multiple machines inside a cluster. The default number of shards is five. In this configuration, if a cluster has more than five machines, there are some machines that do not store any data. So it is helpful to create at least as many shards as there are nodes in the cluster. These shards can also be replicated to other machines inside the cluster to work as a failover in case of a lost node. Changing the number of shards in an index can be very complicated, so it is easier to create the correct number of shards at creation time. Adding additional replicas on the other hand is very easy.

As Florian Gilcher wrote in [Gilcher2012] it is quite easy to create a split brain situation. This can happen when the elasticsearch nodes are distributed in two data centers and the connection is severed. In case of a split brain situation, there is no possibility to reintegrate the two sides; it is necessary to delete one side and use only the other. This will lead to loss of data, if it is not handled correctly up front.

4.4 Transports

Syslog is nice to send log messages via a network, but when it comes to reliability and security the following dedicated transport systems can offer some nice alternatives.

redis

[Redis] is an in-memory database that supports replication with a master-slave setup, as well as persistence with the help of snapshot and journal files. For the use as a log transport the built-in feature called channels is used to create a publish-subscribe messaging infrastructure. Together with the replication support it is possible to create a highly available setup, but this is still in development and not supported yet. Redis does not support data encryption and offers only password authorization as described in [redis-security].
The documentation is very good and if this is not enough there is a free book written by Karl Seguin called "a little introduction book about redis" at [Seguin2013].

rabbitmq

[RabbitMQ] is an Open Source message broker that is developed by RabbitMQ Inc, a London based company, now owned by VMware. It is written in Erlang and is based on the Open Telecom Platform. Erlang is a functional programming language and famous for its possibility to restart the program in parts, without a complete restart or losing connectivity or function. RabbitMQ also contains access libraries for a lot of different programming languages. RabbitMQ includes gateways to talk with AMQP, STOMP and MQTT. Transport encryption with TLS is available built-in, using the openssl library as written in [RabbitMQ-SSL].

RabbitMQ can be set up to cluster multiple machines in a local network into a single logical broker. This makes it possible to restart single machines without any service interruption. It is also possible to mirror queues over several machines to ensure that in case of a hardware failure no messages are lost. All this high availability is always paid for with a performance penalty. The documentation is very extensive and contains examples for all use cases.

ActiveMQ

ActiveMQ is also an Open Source and Free Software message broker. It is written in Java and released under the supervision of the Apache foundation. The client is available in many languages and it supports not only AMQP and STOMP, but also XMPP (formerly Jabber) and a RESTful web API as written in [ActiveMQ-Features]. According to [ActiveMQ-Cluster] ActiveMQ also supports clustering in different flavors: from a failover cluster, which makes sure that the clients can send messages to the broker even when a node is down, to a master-slave setup, where the messages of one node are stored on a second machine to be sent if the master node goes down. ActiveMQ supports encryption as written in [ActiveMQ-SSL]. The ActiveMQ documentation is complete and several books are available.

Ømq

[Ømq], 0MQ and zeromq are three different spellings for the same program. Ømq is a socket library that can send messages to another process. This can happen inside the same process via inproc, to other processes on the same machine via IPC, or to processes on other machines with the help of TCP or multicast connections. The advantage is that it does not need a special broker server running. Or as Pieter Hintjens wrote in [Ømq] - The Guide: "Ømq... looks like an embeddable networking library but acts like a concurrency framework."

As written on page 22 Ømq does not support encryption, but the development has started to support this. Because it does not have a specialized broker it does not support clustering. It is designed for speed, not for reliability. Ømq is created by iMatix, a Belgium based company which provides commercial support offerings. The documentation is very good and can also be obtained as a book [Hintjens2013].

4.5 Collector/Shipper

The possibilities of the different shippers/collectors can be seen in the overview.

Fluentd

Fluentd is a very universal shipper/collector developed by Sadayuki Furuhashi. It is written in C for the performance relevant parts and the rest in Ruby, as written in the [FLUENTD-FAQ]. It is sponsored by the company Treasure Data in California, which offers a cloud based log analysis platform and uses fluentd to send the data to their cloud. It uses JSON as the native log format and can be considered a syslog for structured logs. It uses a plugin architecture, with support for output and input plugins. There are over 150 plugins available for fluentd and developing a plugin is very easy.
The elasticsearch output plugin writes in the same format as logstash. Fluentd does not support any encrypted transports, but it supports reliable transport and persistent disc and memory buffers in case a target server is down. It also supports a highly available setup, where multiple fluentd instances send the messages to a log aggregator running on two machines. The sender will switch over to the backup aggregator when the primary is down. Fluentd can be installed as the fluentd Ruby gem or as a td-agent build as a deb or rpm package. The td-agent version has a slower release cycle and is more thoroughly tested, but the gem version gets new features faster.

flume

The Apache flume project is "a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data" according to the [FlumeUserGuide]. Flume is written in Java and supports multiple Hadoop mechanisms. It is included in this thesis because it also supports other inputs and outputs, such as an elasticsearch output that writes like logstash. To write to elasticsearch it is necessary to add the elasticsearch and lucene-core jars into the lib directory of the flume installation, because it uses the same mechanism of using an elasticsearch node to add the data to the cluster.

The input mechanism is called source in the flume documentation and the output is called sink. Sources and sinks are connected with channels. Flume does not support encryption outside of Hadoop, but memory and disk based persistence in case of a server downtime is available. The [FlumeUserGuide] is very extensive with a lot of examples.

awesant

Awesant is a Perl based shipper that supports redis and the rabbitmq client library. With the help of a second awesant instance running on the redis machine, it is possible to use encrypted redis.

beaver

Beaver is a Python based shipper that supports redis and 0mq and uses the rabbitmq client library to access AMQP and STOMP based servers. A nice feature of beaver is the support for an ssh tunnel to be created at startup. This also makes it possible to create an encrypted connection to 0mq and redis.

lumberjack

The logstash developer needed a shipper that was not Java based and had very low memory and CPU requirements. To fulfill this need he created lumberjack, because there was no other system supporting that. There are two implementations of lumberjack, a Ruby and Go based one and a Ruby and C based one. Both are still receiving patches, but the Go based system is much more actively developed and is stored in the master branch of GIT. According to its creator [Sissel2013] it is an "encrypted, trusted, compressed, latency-resilient, and reliable transport of events". It uses OpenSSL as a base library and uses X509 certificates to check the server certificate. It is possible to use client certificates as well. Lumberjack does not support caching in case of a connection problem, but it is possible to configure multiple logstash servers that are used in case of a connection problem.

eventlog-to-syslog

Eventlog-to-syslog is an Open Source and Free Software Windows based Eventlog shipper. It is based on the source code from Curtis Smith (Purdue University) and is now developed by Sherwin Faria (Rochester Institute of Technology). This tool supports both Windows Eventlog formats, before and after Windows 2008 and Windows Vista.
It is written in C++ and sends the Eventlogs to a syslog server. It supports the traditional syslog format via UDP and TCP and also supports server failover in case of an unreachable syslog server.

woodchuck

Woodchuck is a very simple Ruby based shipper that only supports redis as output and therefore has no crypto support.

ncode/logix

Logix is a very simple Python based log shipper. It accepts UDP syslog messages and sends them in GELF format to an AMQP server to be read by graylog.

syslog-shipper

Syslog-shipper is a very simple shipping tool written in Ruby. It can only read in multiple files and send them to a syslog server. It will add the syslog header if requested, uses TCP and supports TLS encryption.
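What "adding the syslog header" means can be sketched in a few lines of Python. The example assumes the traditional BSD syslog layout (RFC 3164), where the priority value is facility * 8 + severity and the timestamp has the form "Mmm dd hh:mm:ss" with a space padded day. Whether syslog-shipper produces exactly this layout is an assumption. The function name add_syslog_header and the sample values are hypothetical.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def add_syslog_header(message, host, tag, facility=1, severity=6, when=None):
    """Prefix a raw log line with a BSD syslog (RFC 3164 style) header."""
    when = when or datetime.now()
    # Priority combines facility and severity into one number.
    pri = facility * 8 + severity
    # Fixed English month names; the day of month is padded with a space.
    stamp = "%s %2d %02d:%02d:%02d" % (
        MONTHS[when.month - 1], when.day, when.hour, when.minute, when.second)
    return "<%d>%s %s %s: %s" % (pri, stamp, host, tag, message)


if __name__ == "__main__":
    print(add_syslog_header("disk almost full", "web01", "myapp"))
```

A line read from a plain log file has none of this information, which is why a file based shipper has to supply host name, tag and priority itself before sending the line to a syslog server.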

remote_syslog

remote_syslog is a Ruby tool that also reads files and sends them to a remote server via syslog. It supports TLS encryption with client certificates and can detect new log files when using globs. If it is configured to look for files like /somewhere/*.log and a new log file appears, it will be collected. It can even do some basic parsing.

systemd/journal2gelf

Journal2gelf is a very simple Python based shipper created in 2012 that takes systemd's journal messages in JSON from STDIN and sends them to a graylog2 server via GELF.

4.6 Analysis

Most analysis tools belong to a multi purpose tool. There are only two independent analysis tools: Sagan, described on page 30, and SEC.

5 Toolchains

There are multiple ways to build a structured log analysis solution, based on the tools described in this thesis. Selecting the correct solution is not an easy task. Chapter 4 tries to give an overview of the different advantages and disadvantages of these tools.

5.1 Possible toolchains

Because of the large number of possibilities to combine the different tools it is necessary to get an overview of the toolchains which can be built. Because there are so many collectors/shippers available, this thesis will start from the webpage side to show an overview of the possible tool stacks. The tool LogAnalyzer does not really support structured log file analysis, even though some additional fields can be added. This is not a real structured solution and will not be considered in this part of the thesis.

Illustration 12: Possible toolchains (red=storage, yellow=normalize, white=webpages, blue=shipper)

As seen in Illustration 12, ELSA and Octopussy are two special cases, because they are both not modular like the other tools, but combine different tools in a predefined way. This makes it harder to build things on top of the tools. The tools fluentd and flume could be used to write the log data directly to elasticsearch, but then no normalization would be possible. Both are therefore not included in the list of toolchains. Kibana 3 can access elasticsearch in graylog2 format, but only by creating the dashboard completely from scratch, therefore the dashed line.

The webpages can access the data storage in parallel, as they do not make changes to the data. Because of that they will be ignored in the collection of toolchains. This results in the following toolchains:

Elasticsearch (graylog format) - graylog2 - logstash
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)
Elasticsearch (logstash format) - rsyslog (liblognorm)
ELSA
Octopussy

These six toolchains will be discussed in this chapter.

5.2 Toolchain Features

To get a better overview of the different toolchains, some features will be compared. This includes features like high availability, size of rule base and ease of installation.

Accepting structured log files

To select a solution it is necessary to know what kind of log files are going to be processed. This analysis should not only be done for the current log files, but should also include thoughts about what kind of log files need to be processed in the future. If programs start to write structured log files in the future, it would be bad if these messages cannot be used by the selected toolkit.

If the selected solution needs to accept already structured log files, both ELSA and Octopussy can be removed from the selection because they cannot accept already structured log files. Their only input is semi-structured syslog messages. Both have the possibility to normalize different kinds of syslog messages, but structured log messages cannot be sent in. The idea of denormalization and re-normalization will be ignored here, because of possible parsing errors and performance reasons.

Program                                                   Accepts structured log files
Elasticsearch (graylog format) - graylog2 - logstash      yes
Elasticsearch (graylog format) - graylog2 - nxlog         yes
Elasticsearch (logstash format) - logstash (grok)         yes
Elasticsearch (logstash format) - rsyslog (liblognorm)    yes
ELSA                                                      no
Octopussy                                                 no

Table 6: Feature: accepting structured log files

Reliable transport

The log files should not be lost on the way to the central log server. This can be avoided by storing the log data locally should the log server be unavailable.
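The idea of storing log data locally while the log server is unavailable can be sketched as a small spooling sender in Python. This is a simplified illustration and not the mechanism of any of the tools discussed here. The class name SpoolingSender, the spool file and the send callable are hypothetical, and the sketch assumes that a failed delivery raises OSError. If the connection fails in the middle of a flush, lines may be delivered twice, so this gives at-least-once delivery only.

```python
import os


class SpoolingSender:
    """Send log lines; keep them in a local spool file while the server is down."""

    def __init__(self, send, spool_path):
        self.send = send              # callable(line), raises OSError on failure
        self.spool_path = spool_path  # local file used as a buffer

    def ship(self, line):
        try:
            # Deliver older, spooled lines first to keep the original order.
            self.flush()
            self.send(line)
        except OSError:
            # Server not reachable: append the line to the local spool.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(line + "\n")

    def flush(self):
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as spool:
            pending = spool.read().splitlines()
        for line in pending:
            self.send(line)
        # Only remove the spool after every pending line was delivered.
        os.remove(self.spool_path)
```

A shipper built this way loses no message as long as the local disc is intact, which is the property the following comparison calls reliable transport.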

Both graylog2 and logstash support AMQP for reliable message transfer. Logstash supports a huge number of other reliable inputs. ELSA is based on syslog-ng and only accepts syslog messages. Syslog-ng does not support reliable syslog transfer. Octopussy uses rsyslog to receive log messages; because of the RELP transport of rsyslog it is possible to get reliable syslog transport with Octopussy.

Program                                                 Reliable transport
Elasticsearch (graylog format) - graylog2 - logstash
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)
Elasticsearch (logstash format) - rsyslog (liblognorm)
ELSA                                                    no
Octopussy

Table 7: Feature: reliable transport

5.2.3 High availability

Log servers are an important part of the server infrastructure. To make sure the server is always running, it is advisable to create a high availability setup. This is normally done in the form of multiple servers running in parallel and writing to the storage in parallel. In case of a disaster it is helpful to distribute the data to different servers. Cluster software (like Red Hat, Veritas, VMware vMotion) is not considered here, as this is a general solution that can be used with every software.

With graylog2 multiple graylog2 nodes can be created, which access the same elasticsearch cluster. Only one of the graylog2 servers needs to be configured as master, because it has to run some cleanup jobs, but that function can be easily moved.

Logstash also supports writing to the same elasticsearch cluster as explained in chapter seven of the logstash book by [Turnbull2013].

ELSA also supports multiple nodes, but these nodes always write into their own local database. If a node goes down, it is possible to write to a different node and the data is stored there. A query will be sent to all nodes and therefore all log messages from before and after the problem are integrated into one view.

Octopussy does not support a failover solution.
Program                                                 high availability   distributed store
Elasticsearch (graylog format) - graylog2 - logstash
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)
Elasticsearch (logstash format) - rsyslog (liblognorm)
ELSA                                                                        no
Octopussy                                               no                  no

Table 8: Feature: high availability
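The approach described for ELSA, where a query is sent to all nodes and the answers are integrated into one view, can be sketched in a few lines. This is only an illustration under stated assumptions and not ELSA's implementation: `make_node` and `query_all_nodes` are hypothetical names, and each node is modeled as a function that returns (timestamp, message) tuples sorted by timestamp.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def make_node(records):
    """Hypothetical stand-in for one node with its own local database."""
    ordered = sorted(records)

    def search(query):
        # Return all records whose message contains the query string.
        return [(ts, msg) for ts, msg in ordered if query in msg]

    return search


def query_all_nodes(nodes, query):
    """Send the same query to every node and merge the answers by timestamp."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        per_node = list(pool.map(lambda node: node(query), nodes))
    # heapq.merge combines already sorted inputs into one sorted stream.
    return list(heapq.merge(*per_node))


# Messages before the outage went to node_a, later ones to node_b.
node_a = make_node([(1, "sshd: login failed"), (2, "cron: job started")])
node_b = make_node([(3, "sshd: login ok")])
merged = query_all_nodes([node_a, node_b], "sshd")
```

Because every node is asked, the caller does not need to know which node received a message during the failure.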

5.2.4 User separation and LDAP

A central log system can be very helpful with the correlated and centralized access to all log files, but sooner or later other people outside of the operators will want access to some of the log files. Having some kind of user separation is helpful here, making it possible to give certain users access to only certain kinds of log messages. Defining which user gets access to which log files should be an easy task. To handle user creation and central password management the solution should support LDAP or Active Directory.

Graylog2 makes it possible to create log streams and give users the permission to access the streams as readers.

Logstash does not have users and therefore cannot give a limited view of the log messages.

ELSA has the possibility to activate user logons. In the default setting everybody who can access the webpage can read all log files. With dashboards it is possible to store queries and give users access to the log messages that this query returns.

In Octopussy the user management is very detailed. It is possible to create read-only users that can read all log files, but cannot make any changes to the configuration. There are also restricted users that can be limited to the log data of certain devices, services, alerts or special reports.

Program                                                 User separation   LDAP
Elasticsearch (graylog format) - graylog2 - logstash
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)       no                no
Elasticsearch (logstash format) - rsyslog (liblognorm)  no                no
ELSA
Octopussy

Table 9: Feature: User separation and LDAP

5.2.5 Size of rule base

The creation of the rules for log normalization can be a very tedious job. It is nice to be able to access a huge number of them without writing them oneself.
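What a single normalization rule does, and why writing many of them is tedious, can be sketched as follows. The rule name, the regular expression and the field names are made up for this illustration and are not taken from the rule base of any tool compared here.

```python
import re

# Hypothetical rule set: one regular expression with named groups per message type.
RULES = {
    "sshd_failed_login": re.compile(
        r"Failed password for (?P<user>\S+) "
        r"from (?P<src_ip>\S+) port (?P<src_port>\d+)"
    ),
}


def normalize(message):
    """Return a structured dict for the first matching rule, else None."""
    for name, pattern in RULES.items():
        match = pattern.search(message)
        if match:
            return {"rule": name, **match.groupdict()}
    return None


event = normalize("Failed password for root from 10.0.0.5 port 22 ssh2")
unknown = normalize("Failed password for invalid user admin from 10.0.0.5 port 22 ssh2")
```

The second message is only a slight variation of the first, yet it does not match and would need a rule of its own. This is why a large prepared rule base saves a lot of work.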
Program                                                 # of prepared rules
Elasticsearch (graylog format) - graylog2 - logstash    0
Elasticsearch (graylog format) - graylog2 - nxlog       4
Elasticsearch (logstash format) - logstash (grok)       0
Elasticsearch (logstash format) - rsyslog (liblognorm)  0
ELSA                                                    ~130
Octopussy                                               ~1500

Table 10: Feature: size of rule base

5.2.6 Log Analysis

The centralized logs should be analyzed to detect attacks and other anomalies. But not all tools have this required feature available.

Program                                                 log analysis
Elasticsearch (graylog format) - graylog2 - logstash    limited
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)       limited
Elasticsearch (logstash format) - rsyslog (liblognorm)  limited
ELSA                                                    no
Octopussy

Table 11: Feature: log analysis

5.2.7 Install

Even when the installation is only done once, problems during installation discourage the use of the program. The installation should run on different Linux distributions; for this thesis three distributions are selected. The toolchains are installed on Red Hat Enterprise Linux (RHEL) 6.4, Debian 7 and Ubuntu to test the installation.

Elasticsearch, graylog2 and logstash are all Java based tools. Thus the installation is very easy, as it only needs a Java runtime environment and some basic startup scripts. The difficulty of the installation of graylog2 and logstash is the dependency on a specific version of elasticsearch.

Rsyslog is prepackaged for all three distributions and is also available in the current version directly from the author.

Nxlog is not prepackaged by any distribution, but prepared packages for all three distributions are available from the project website.

ELSA uses an installation script. The script did not work during the test for this thesis with Debian 7 and RHEL 6.4, even though they are listed as supported platforms on the webpage [ELSAQuickstart].

Octopussy offers a tar.gz archive and a built Debian package. The Debian package works with Debian, as well as with Ubuntu. The tar.gz archive was installable with RHEL 6.4. All three installations are described on the [OctopussyInstallation] web page.

Program                                                 Debian 7  Ubuntu  RHEL 6.4
Elasticsearch (graylog format) - graylog2 - logstash
Elasticsearch (graylog format) - graylog2 - nxlog
Elasticsearch (logstash format) - logstash (grok)
Elasticsearch (logstash format) - rsyslog (liblognorm)
ELSA                                                    no                no
Octopussy

Table 12: Feature: easy to install

5.2.8 Speed

The performance of the log system can have a very large impact on the decision for a log system.
Measuring the performance can be very complicated, because of the different architectures of the log toolchains. For example ELSA is using a batch job based system that accepts messages and stores them in a queue; this queue will be imported once a minute as written in chapter Capabilities on the webpage [ELSA-UserGuide]. The indexing is done "every few hours" for performance reasons. This cannot be fairly tested against an elasticsearch storage, which indexes the data during arrival.

Also the way to distribute load is very different between the toolchains. Where Octopussy can only run on one machine, ELSA can run on many machines, each with its local database. The elasticsearch based tools can distribute the data to many data nodes and share the data between data centers.

The master thesis of [Churilin2013] tries to handle this kind of problem. In that thesis the performance was tested with four different toolchains: Graylog2, logstash and rsyslog are writing to elasticsearch and ELSA is writing to MySQL. The result of the performance test is that ELSA is almost ten times faster than graylog or logstash and still five times faster than rsyslog. Octopussy was not covered in that thesis.

The use of a virtual machine on top of a Windows workstation could lead to a lot of noise in the data. Because no raw data was published it is not possible to check the standard deviation of the tests. The setup used suggests that the standard deviation was quite high. The very limited amount of RAM (only 2GB) per machine could put the Java based systems, with their large memory requirements, at a big disadvantage.

A fair test of all toolchains would also need to include the normalization phase. To create a fair test it has to be made sure that the number of rules is the same everywhere, otherwise the larger pattern size of syslog-ng would be construed as a disadvantage.

One possibility to handle all the problems would be to test every tool separately, as far as this is possible. Tools that can perform multiple tasks, such as logstash, would be tested separately for every task. In case of logstash this would constitute a test for: using it as shipper, writing data to elasticsearch and using it for normalization with grok. With the help of the Forced Flow Law it would be possible to calculate the speed of the whole system, but the number of tests to be done for such an endeavor would be too large for this thesis.

Because of all these problems no performance data is published in this thesis.
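How separately measured components could be combined with the Forced Flow Law can be sketched as follows. The law states that the throughput of a component k is X_k = V_k * X_0, where X_0 is the throughput of the whole system and V_k the average number of visits a message makes to component k. Since no component can exceed its own measured capacity C_k, the system throughput is bounded by the smallest C_k / V_k. All names and numbers below are made up for the illustration and are not measurements from this thesis.

```python
def component_throughput(system_throughput, visits):
    """Forced Flow Law: X_k = V_k * X_0 for every component k."""
    return {name: v * system_throughput for name, v in visits.items()}


def max_system_throughput(capacity, visits):
    """Upper bound on X_0: no component may exceed its own measured capacity."""
    return min(capacity[name] / v for name, v in visits.items() if v > 0)


# Hypothetical capacities in messages per second, measured per tool in isolation.
capacity = {"shipper": 20000.0, "grok": 6000.0, "elasticsearch": 8000.0}
# Hypothetical visit counts: here every message is written to the storage twice.
visits = {"shipper": 1.0, "grok": 1.0, "elasticsearch": 2.0}

bound = max_system_throughput(capacity, visits)
load = component_throughput(bound, visits)
```

With these assumed numbers the storage is the bottleneck, because it has to handle twice the message rate of the whole system.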
5.3 Summary

The six possible toolchains all have their advantages and disadvantages. Table 13 is a summary of all the features compared in this chapter. This overview will not be used to declare a winner, because what is important or not always depends on the use case of the company.

All six toolchains were running in a test environment for several weeks, without any major incidents. The functionalities differ much between the tools, but all are stable and they can be used on a daily basis.

Program                                                 Reliable transport  high availability  distributed store  User separation  LDAP  # of prepared rules  log analysis  Debian 7  Ubuntu  RHEL 6.4
Elasticsearch (graylog format) - graylog2 - logstash                                                                                     0                    limited
Elasticsearch (graylog format) - graylog2 - nxlog                                                                                        4
Elasticsearch (logstash format) - logstash (grok)                                                                 no               no    0                    limited
Elasticsearch (logstash format) - rsyslog (liblognorm)                                                            no               no    0                    limited
ELSA                                                    no                                     no                                        ~130                 no            no                no
Octopussy                                                                   no                 no                                        ~1500

Table 13: Feature: overview

6 Conclusion

It is already possible to create a centralized structured log file analysis infrastructure and there are a lot of different tools available that can help create such a log infrastructure. Which tools or toolchains are selected is up to the user to decide. This thesis will only show the advantages and disadvantages of the different tools and the functionalities that they offer. Defining a winner for every use case would be unprofessional.

6.1 Short summary about every major tool

As a short overview here are all major tools with a very short summary about the advantages and disadvantages as experienced by the author:

Syslog-ng: The huge normalization ruleset is very nice, the missing reliable syslog transport because of OpenCore is not.
Rsyslog: Very flexible syslog server, the usability of the normalization tool is limited by the missing rules.
Graylog2: It wants structure, but it will work without it. Can handle permissions of log messages.
Logstash: Swiss army knife of log management. "Everybody sees everything" makes it unusable for some companies.
Node-logstash: Can be a nice Java free alternative for logstash, but it is not there yet.
ELSA: Structured log files cannot be used. The fixation on Google and Google tools is disconcerting for some users. Designed for great speed.
Octopussy: Everything will be analyzed as a structured log file, but then stored in a normal file which slows searches.
Nxlog: It transports logs and can do a lot of changes and analyzing on the way. The OpenCore nature is not as bad as others.
Heka: Very young, but already very usable, could become a great transporter.
kibana 2: Nice webpage, but will be overshadowed by its newer cousin.
kibana 3: Great flexible web front end for elasticsearch, as soon as it is finished.
statsd + graphite: Often used grapher tool chain, not really covered in this thesis.
mysql: Everybody knows it, MySQL will be replaced by MariaDB.
Elasticsearch: A very easy tool that eases a lot of problems. The split brain situation can be very dangerous in larger setups, if not handled correctly.
fluentd: Very universal transporter written in Ruby.
flume: Very universal transporter written in Java, destined for Hadoop.

6.2 Future

The future for structured log files is very uncertain. The CEE standardization process is dead and no revival is in sight, because of uncertain funding. Project Lumberjack could become the new standard, if the creators push it hard enough. Systemd's Journal could also be the new standard, but will only be available for Linux. GELF and Logstash are already in use, but are both not real standards. JSON appears to be the only unifying base format everybody is working with, but no decision on which fields should be used and how they should be named has been reached yet.

6.3 Optimal toolchain

If the author should dream up a perfect log solution it would be this:

A webpage for the creation of missing normalization rules like octopussy.
A prepared normalization ruleset like in pattern-db or octopussy.
Logstash format for elasticsearch with graylog2 functionalities.
Both kibana 3 and graylog2 as web front ends possible.
Easy correlation and analysis based on the normalized data.
Statsd+graphite graphs are prepared and automatically filled with data.
Logstash input/output capabilities without the memory overhead of Java.
A universal structured log format that is used by developers and tools alike.
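The observation from section 6.2, that JSON is the shared base format while the field names are not, can be illustrated with a small converter between two layouts. The field names used here are assumptions based on the GELF 1.1 payload (`host`, `short_message`, `timestamp`, additional fields prefixed with `_`) and the logstash version 1 event layout (`@timestamp`, `@version`, `message`, `host`); they should be checked against the respective specifications. The function `gelf_to_logstash` is a hypothetical helper and not part of either tool.

```python
from datetime import datetime, timezone


def gelf_to_logstash(gelf):
    """Map a GELF-style dict to a logstash-style dict (illustrative mapping only)."""
    ts = datetime.fromtimestamp(gelf["timestamp"], tz=timezone.utc)
    millis = ts.microsecond // 1000
    event = {
        # GELF carries seconds since the epoch, logstash an ISO 8601 string.
        "@timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis,
        "@version": "1",
        "message": gelf["short_message"],
        "host": gelf["host"],
    }
    # GELF marks user defined fields with a leading underscore.
    for key, value in gelf.items():
        if key.startswith("_"):
            event[key[1:]] = value
    return event


gelf_event = {
    "version": "1.1",
    "host": "web01",
    "short_message": "login failed",
    "timestamp": 1372680000,
    "level": 4,
    "_user": "root",
}
logstash_event = gelf_to_logstash(gelf_event)
```

In this sketch the GELF field `level` is simply dropped, because the target layout assumed here has no agreed counterpart for it. Every such decision has to be made again for each pair of formats, which a universal structured log format would avoid.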

Appendix: Abbreviations

AMQP   Advanced Message Queuing Protocol
ELSA   Enterprise Log Search and Archive
FIFO   First in - first out
GELF   Graylog Extended Log Format
GIT    Open Source code management tool
GNU    GNU's Not Unix
GPL    GNU General Public License
GUI    Graphical User Interface
JSON   JavaScript Object Notation
KV     Key Value
LDAP   Lightweight Directory Access Protocol
RFC    Request for Comments
RHEL   Red Hat Enterprise Linux
SVN    Subversion - an Open Source code management tool
STDIN  Standard Input
TCP    Transmission Control Protocol
TLS    Transport Layer Security
UDP    User Datagram Protocol
URL    Uniform Resource Locator
XML    Extensible Markup Language

Table 14: Total Overview: Part 1

Programs: syslog-ng, node-logstash, Heka, woodchuck, awesant, beaver, lumberjack, syslog-shipper, remote_syslog, flume, nxlog, fluentd, octopussy, graylog2, rsyslog, logstash, ncode/logix, systemd/journal2gelf, eventlog-to-syslog
Inputs and transports: lumberjack, gelf, 0MQ, Stomp (ActiveMQ, RabbitMQ), amqp (QPID, ActiveMQ, RabbitMQ), http, redis, RELP, websocket, TLS encrypted channel, IETF syslog tcp tls (RFC5425), IETF syslog tcp (RFC5424), IETF syslog udp (RFC5424), BSD syslog udp (RFC3164), spool message during downtime, systemd journal, local Windows eventlog, named pipe, unix domain socket, flat file, STDIN/STDOUT, directory structure / multiple files with *

Table 15: Total Overview: Part 2

Programs: node-logstash, nxlog, Heka, flume, syslog-ng, fluentd, graylog2, rsyslog, logstash
Outputs: SQL (DBI or native), elasticsearch, hadoop, mongodb, graphite, kafka, statsd, varnishlog, SNMP, graphtastic, linux accounting log, other
Entries under "other": log4j, irc, twitter, xmpp, nagios, amazon; nagios, DBI; fnordmetric, couchdb, own log libs; avro, JMS, HBase, solr, JDBC, irc


More information

Instructions to help you complete your enrollment form for HPHC's Medicare Supplemental Plan

Instructions to help you complete your enrollment form for HPHC's Medicare Supplemental Plan Instuctions to help you complete you enollment fom fo HPHC's Medicae Supplemental Plan Thank you fo applying fo membeship to HPHC s Medicae Supplement plan. Pio to submitting you enollment fom fo pocessing,

More information

Effect of Contention Window on the Performance of IEEE 802.11 WLANs

Effect of Contention Window on the Performance of IEEE 802.11 WLANs Effect of Contention Window on the Pefomance of IEEE 82.11 WLANs Yunli Chen and Dhama P. Agawal Cente fo Distibuted and Mobile Computing, Depatment of ECECS Univesity of Cincinnati, OH 45221-3 {ychen,

More information

Channel selection in e-commerce age: A strategic analysis of co-op advertising models

Channel selection in e-commerce age: A strategic analysis of co-op advertising models Jounal of Industial Engineeing and Management JIEM, 013 6(1):89-103 Online ISSN: 013-0953 Pint ISSN: 013-843 http://dx.doi.og/10.396/jiem.664 Channel selection in e-commece age: A stategic analysis of

More information

Experiment 6: Centripetal Force

Experiment 6: Centripetal Force Name Section Date Intoduction Expeiment 6: Centipetal oce This expeiment is concened with the foce necessay to keep an object moving in a constant cicula path. Accoding to Newton s fist law of motion thee

More information

Armored Car Insurance Application

Armored Car Insurance Application Amoed Ca Insuance Application Applicant Details: Fist named insued: _ Please attach list of any additional insueds to be included fo coveage. Addess: City/State/Zip: Effective date: Expiation date: Additional

More information

883 Brochure A5 GENE ss vernis.indd 1-2

883 Brochure A5 GENE ss vernis.indd 1-2 ess x a eu / u e a. p o.eu c e / :/ http EURAXESS Reseaches in Motion is the gateway to attactive eseach caees in Euope and to a pool of wold-class eseach talent. By suppoting the mobility of eseaches,

More information

Instituto Superior Técnico Av. Rovisco Pais, 1 1049-001 Lisboa E-mail: [email protected]

Instituto Superior Técnico Av. Rovisco Pais, 1 1049-001 Lisboa E-mail: virginia.infante@ist.utl.pt FATIGUE LIFE TIME PREDICTIO OF POAF EPSILO TB-30 AIRCRAFT - PART I: IMPLEMETATIO OF DIFERET CYCLE COUTIG METHODS TO PREDICT THE ACCUMULATED DAMAGE B. A. S. Seano 1, V. I. M.. Infante 2, B. S. D. Maado

More information

DOCTORATE DEGREE PROGRAMS

DOCTORATE DEGREE PROGRAMS DOCTORATE DEGREE PROGRAMS Application Fo Admission 2015-2016 5700 College Road, Lisle, Illinois 60532 Enollment Cente Phone: (630) 829-6300 Outside Illinois: (888) 829-6363 FAX: (630) 829-6301 Email: [email protected]

More information

Cisco 2811 and 2821 Integrated Services Router with AIM-VPN/SSL-2

Cisco 2811 and 2821 Integrated Services Router with AIM-VPN/SSL-2 Cisco 2811 an 2821 Integate Sevices Route ith AIM-VPN/SSL-2 FIPS 140-2 Non Popietay Secuity Policy Level 2 Valiation Vesion 1.5 Septembe 8, 2008 Copyight 2005 Cisco Systems, Inc. This ocument may be feely

More information

Epdf Sulf petroleum, Eflecti and Eeflecti

Epdf Sulf petroleum, Eflecti and Eeflecti ANALYSIS OF GLOBAL WARMING MITIGATION BY WHITE REFLECTING SURFACES Fedeico Rossi, Andea Nicolini Univesity of Peugia, CIRIAF Via G.Duanti 67 0615 Peugia, Italy T: +9-075-585846; F: +9-075-5848470; E: [email protected]

More information

Towards Automatic Update of Access Control Policy

Towards Automatic Update of Access Control Policy Towads Automatic Update of Access Contol Policy Jinwei Hu, Yan Zhang, and Ruixuan Li Intelligent Systems Laboatoy, School of Computing and Mathematics Univesity of Westen Sydney, Sydney 1797, Austalia

More information

Continuous Compounding and Annualization

Continuous Compounding and Annualization Continuous Compounding and Annualization Philip A. Viton Januay 11, 2006 Contents 1 Intoduction 1 2 Continuous Compounding 2 3 Pesent Value with Continuous Compounding 4 4 Annualization 5 5 A Special Poblem

More information

Cisco 3825 and Cisco 3845. Integrated Services Routers. with AIM-VPN/SSL-3

Cisco 3825 and Cisco 3845. Integrated Services Routers. with AIM-VPN/SSL-3 Cisco 3825 an Cisco 3845 Integate Sevices Routes ith AIM-VPN/SSL-3 FIPS 140-2 Non Popietay Secuity Policy Level 2 Valiation Vesion 1.5 Septembe 8, 2008 Copyight 2007 Cisco Systems, Inc. This ocument may

More information

California s Duals Demonstration: A Transparent. Process. Margaret Tatar Chief, Medi-Cal Managed Care Division. CA Coo 8/21/12

California s Duals Demonstration: A Transparent. Process. Margaret Tatar Chief, Medi-Cal Managed Care Division. CA Coo 8/21/12 Califonia s Duals Demonstation: A Tanspaent and Inclusive Stakeholde Pocess Magaet Tata Chief, Medi-Cal Managed Cae Division Depatment of Health Cae Sevices 1 Stakeholde Engagement 1. 2. Inclusive Building

More information

Confirmation of Booking

Confirmation of Booking The Pesentes Rebecca Mogan Rebecca is a Taxation Consultant with the NTAA and has ove 15 yeas tax expeience. Rebecca holds a Bachelo of Ats and Law and a Mastes of Taxation. Rebecca has pesented a numbe

More information

Secure Smartcard-Based Fingerprint Authentication

Secure Smartcard-Based Fingerprint Authentication Secue Smatcad-Based Fingepint Authentication [full vesion] T. Chales Clancy Compute Science Univesity of Mayland, College Pak [email protected] Nega Kiyavash, Dennis J. Lin Electical and Compute Engineeing Univesity

More information

The Incidence of Social Security Taxes in Economies with Partial. Compliance: Evidence from the SS Reform in Mexico

The Incidence of Social Security Taxes in Economies with Partial. Compliance: Evidence from the SS Reform in Mexico The Incidence of Social Secuity Taxes in Economies ith Patial Compliance: Evidence fom the SS Refom in Mexico Gecia M. Maufo Abstact Looking at impovements in social secuity benefits in Mexico, this pape

More information

The impact of migration on the provision. of UK public services (SRG.10.039.4) Final Report. December 2011

The impact of migration on the provision. of UK public services (SRG.10.039.4) Final Report. December 2011 The impact of migation on the povision of UK public sevices (SRG.10.039.4) Final Repot Decembe 2011 The obustness The obustness of the analysis of the is analysis the esponsibility is the esponsibility

More information

YARN PROPERTIES MEASUREMENT: AN OPTICAL APPROACH

YARN PROPERTIES MEASUREMENT: AN OPTICAL APPROACH nd INTERNATIONAL TEXTILE, CLOTHING & ESIGN CONFERENCE Magic Wold of Textiles Octobe 03 d to 06 th 004, UBROVNIK, CROATIA YARN PROPERTIES MEASUREMENT: AN OPTICAL APPROACH Jana VOBOROVA; Ashish GARG; Bohuslav

More information

Chapter 1: Introduction... 7 1-1. BELSORP analysis program... 7 1-2. Required computer environment... 8

Chapter 1: Introduction... 7 1-1. BELSORP analysis program... 7 1-2. Required computer environment... 8 1 [Table of contents] Chapte 1: Intoduction... 7 1-1. BELSORP analysis pogam... 7 1-. Requied compute envionment... 8 Chapte : Installation of the analysis pogam... 9-1. Installation of the WIBU-KEY pogam...

More information

Data Center Demand Response: Avoiding the Coincident Peak via Workload Shifting and Local Generation

Data Center Demand Response: Avoiding the Coincident Peak via Workload Shifting and Local Generation (213) 1 28 Data Cente Demand Response: Avoiding the Coincident Peak via Wokload Shifting and Local Geneation Zhenhua Liu 1, Adam Wieman 1, Yuan Chen 2, Benjamin Razon 1, Niangjun Chen 1 1 Califonia Institute

More information

Ou Appoach and Types of attack

Ou Appoach and Types of attack BlueBoX: A Policy diven, Host Based Intusion Detection system Suesh N. Chai Pau Chen Cheng IBM Thomas J. Watson Reseach Cente Yoktown Heights, NY 10598, U.S.A. schai,pau @watson.ibm.com Abstact In this

More information

Welcome to the Cloud Stream. Sponsored by:

Welcome to the Cloud Stream. Sponsored by: Welcome to the Cloud Steam Sponsoed by: Entepise Cloud (HEC) Hanessing the Powe of eal- Time Business with the Simplicity of the Cloud Ben Lingwood Diecto HEC GtM Entepise Cloud - Oveview Announced May

More information

An Approach to Optimized Resource Allocation for Cloud Simulation Platform

An Approach to Optimized Resource Allocation for Cloud Simulation Platform An Appoach to Optimized Resouce Allocation fo Cloud Simulation Platfom Haitao Yuan 1, Jing Bi 2, Bo Hu Li 1,3, Xudong Chai 3 1 School of Automation Science and Electical Engineeing, Beihang Univesity,

More information

Chris J. Skinner The probability of identification: applying ideas from forensic statistics to disclosure risk assessment

Chris J. Skinner The probability of identification: applying ideas from forensic statistics to disclosure risk assessment Chis J. Skinne The pobability of identification: applying ideas fom foensic statistics to disclosue isk assessment Aticle (Accepted vesion) (Refeeed) Oiginal citation: Skinne, Chis J. (2007) The pobability

More information

Model-Driven Engineering of Adaptation Engines for Self-Adaptive Software: Executable Runtime Megamodels

Model-Driven Engineering of Adaptation Engines for Self-Adaptive Software: Executable Runtime Megamodels Model-Diven Engineeing of Adaptation Engines fo Self-Adaptive Softwae: Executable Runtime Megamodels Thomas Vogel, Holge Giese Technische Beichte N. 66 des Hasso-Plattne-Instituts fü Softwaesystemtechnik

More information

An application of stochastic programming in solving capacity allocation and migration planning problem under uncertainty

An application of stochastic programming in solving capacity allocation and migration planning problem under uncertainty An application of stochastic pogamming in solving capacity allocation and migation planning poblem unde uncetainty Yin-Yann Chen * and Hsiao-Yao Fan Depatment of Industial Management, National Fomosa Univesity,

More information

Cisco 1841 Integrated Services Router with AIM-VPN/SSL-1. And. Cisco 2801 Integrated Services Router with AIM-VPN/SSL-2

Cisco 1841 Integrated Services Router with AIM-VPN/SSL-1. And. Cisco 2801 Integrated Services Router with AIM-VPN/SSL-2 Cisco 1841 Integate Sevices Route ith AIM-VPN/SSL-1 An Cisco 2801 Integate Sevices Route ith AIM-VPN/SSL-2 FIPS 140-2 Non Popietay Secuity Policy Level 2 Valiation Vesion 1.7 Octobe 13, 2009 Copyight 2009

More information

METHODOLOGICAL APPROACH TO STRATEGIC PERFORMANCE OPTIMIZATION

METHODOLOGICAL APPROACH TO STRATEGIC PERFORMANCE OPTIMIZATION ETHODOOGICA APPOACH TO STATEGIC PEFOANCE OPTIIZATION ao Hell * Stjepan Vidačić ** Željo Gaača *** eceived: 4. 07. 2009 Peliminay communication Accepted: 5. 0. 2009 UDC 65.02.4 This pape pesents a matix

More information

FI3300 Corporate Finance

FI3300 Corporate Finance Leaning Objectives FI00 Copoate Finance Sping Semeste 2010 D. Isabel Tkatch Assistant Pofesso of Finance Calculate the PV and FV in multi-peiod multi-cf time-value-of-money poblems: Geneal case Pepetuity

More information

Exam #1 Review Answers

Exam #1 Review Answers xam #1 Review Answes 1. Given the following pobability distibution, calculate the expected etun, vaiance and standad deviation fo Secuity J. State Pob (R) 1 0.2 10% 2 0.6 15 3 0.2 20 xpected etun = 0.2*10%

More information

The LCOE is defined as the energy price ($ per unit of energy output) for which the Net Present Value of the investment is zero.

The LCOE is defined as the energy price ($ per unit of energy output) for which the Net Present Value of the investment is zero. Poject Decision Metics: Levelized Cost of Enegy (LCOE) Let s etun to ou wind powe and natual gas powe plant example fom ealie in this lesson. Suppose that both powe plants wee selling electicity into the

More information

Automated Hydraulic Drilling Rigs. HHSeries

Automated Hydraulic Drilling Rigs. HHSeries Automated Hydaulic Dilling Rigs HHSeies The Shape of Things to Come CUSTOMSOLUTIONS HH600 Rig The HH Hydaulic Hoist Seies Eveything about the HH Seies is designed fo speed, safety and efficiency. You can

More information