Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt>



Intelligent Log Analyzer André Restivo <andre.restivo@portugalmail.pt> 9th January 2003

Abstract

Server administrators often have to analyze server logs to determine whether something is wrong with their machines. In larger systems these logs can become too large to inspect manually, so tools have been developed to aid the server administrator in this task. Most of these tools force the administrator to define what he expects to find in the logs when everything is working correctly, and report all the lines that don't match what was defined as normal. This means that an enormous effort must be put into configuring this kind of log analyzer. This report tries to define a different kind of log analyzer, one that doesn't need a big setup effort and that can learn what is and what isn't normal as it inspects the system logs.

Contents

1 Introduction
1.1 Objectives
1.2 Motivation
1.3 State of the Art
1.4 Concept
1.4.1 Clustering
1.4.2 Pre-processing
1.4.3 Event Matching
1.4.4 Cluster Format Analysis
1.4.5 Statistical Analysis
1.4.6 Alarm Reporting
1.4.7 User feedback
2 System Design
2.1 Architecture
2.2 Modules
2.2.1 Log Supplier
2.2.2 Pre-Processor
2.2.3 Clustering
2.2.4 Matching
2.2.5 Alarm Reporting
2.2.6 User Feedback
3 Conclusions and Future Work
3.1 Possible Improvements
3.1.1 Clustering and Matching
3.1.2 Statistical Analysis
3.1.3 Persistence
3.2 Conclusions

List of Figures

2.1 Log Supplier Class Diagram
2.2 Pre Processor Class Diagram
2.3 Clustering Module Class Diagram
2.4 Matcher Module Class Diagram
3.1 Clustering Failure Example

Chapter 1 Introduction

1.1 Objectives

The main objective of this work is to prove that it is possible to implement an alternative to current log analyzers. This log analyzer should be easy to use and should work on any kind of system without modification. It isn't in the scope of this project to create a fully functional application, but to prove that such an application is viable. This will enable the completion of the project in its projected time frame and can provide a solid base for future work in this area.

1.2 Motivation

During my professional life I often have contact with people who have to manage somewhat complex server systems. These systems are normally composed of several machines, each one of them writing around 3000 lines of logs per minute. It somewhat confuses me that not much is done with this enormous quantity of information. Normally these logs are only inspected when a problem arises when, in my point of view, they should be used in a more pro-active way, trying to detect that something is not right before it is noticed. The main reason system administrators don't inspect these system logs is the same reason why they are so important: the quantity of information is so big that there isn't any possibility a human can go through all of it. This leaves us with two possibilities: either disregard the logs completely until something breaks, or use some kind of tool to help us in this task. Several log analyzer tools exist, but unfortunately they normally have either a very time-consuming setup phase or are made for one specific application's logs. This creates some space for a different kind of log analyzer, one that needs to know very little about the system it is working on and that learns from historical data, removing the initial setup work from the system administrator. In this way we hope that more system administrators start using the information present in their system logs in a more productive way, instead of using them just to find out what went wrong.

1.3 State of the Art

A quick search on the web for log analyzer tools will render an enormous quantity of links and information. If you look a little further into the tools presented, you will see that they all belong to one of these categories:

Analyzers for one specific product. The best-known example is webalizer, a web server log analyzer. These kinds of applications are oriented towards one specific product.

Analyzers where the user has to specify what he is looking for. An example of this kind of log analyzer is the swatch package. To use this analyzer the user has to specify which log files he wants to monitor and has to pick some regular expressions that match the log events he wants to be warned about. This approach has two problems: it has a very time-consuming setup phase, and it doesn't consider problems that were never seen by the system administrator (because if he has never seen them he can't specify a regular expression for them).

This, of course, opens space for a different type of log analyzer that has the best of both worlds. That is, a log analyzer that has a simple setup, works with any kind of log, and expects the unexpected.

1.4 Concept

Log files normally consist of a series of lines. These lines are commonly known as events and describe an action that happened on the system. These actions can be of many types, for example: a mail that was sent, someone that logged in to the machine, the machine rebooted, someone accessed a web page,... A human can easily group these events by similarity, as each type of event always has the same structure with only some tokens changed (we will refer to these event groups as clusters from now on). If a log analyzer could also do this type of grouping, it could already do some type of statistical analysis, checking for new event types and for events that suddenly changed their frequency. This type of analysis can be seen as finding strange behaviors in the log file. This approach could easily detect problems like, for instance, a recurrent process that suddenly stopped running, or a strange increase in failed login attempts. But there is more we can do. If we look into each event cluster we will see that there are some tokens that are often different from one event to the other, while other tokens remain the same. The tokens that change (which we will from now on refer to as Variants) could also be the subject of a statistical analysis. In this way each Variant would be composed of a set of Alternatives, and each Alternative can be analyzed separately. By doing this kind of analysis we could find problems like, for instance, a user logging in for the first time to a machine he had never logged in to before. Of course this all seems very easy, but if you were paying attention you will have noticed that we said humans could easily group these kinds of events. So this is our first problem: how to group events into clusters automatically, without having to tell the application which kinds of events we are expecting.

1.4.1 Clustering

Clustering, at first glance, seems to be the most difficult aspect of this whole idea. A first approach would be to group events that have a big set of common tokens between them. This would work great if all events were like this:

User arestivo logged in at 15:06 12/04/2002 from the following IP 141.23.43.231

This kind of log event would have a lot of invariant tokens that would enable us to easily match it to another similar event. Unfortunately not all events are like this one. Take, for instance, events like these:

GET http://www.example.com/somefile/something/
GET http://127.0.0.1/

In this case a human can easily spot that they are of the same type, but to a machine using the approach mentioned before they seem to be completely different types of events. The reason the human had a better performance in this example is that the human was able to identify the second token of each event as being a URL. The human's mental structure for this example will be something like:

GET {URL1}
GET {URL2}

Now it's easy, the machine would say! This means that we need some kind of action that transforms the first set of events into the second set. We will call this the pre-processing phase.

1.4.2 Pre-processing

It's not easy at all for a computer to have the kind of behavior we are looking for without having some knowledge of the system it is working on. For this reason we have chosen to slightly bend the objectives that we proposed to achieve. This means that some kind of information about the logs the application is analyzing must be given. The idea is to create a set of rules, each composed of a regular expression and the name of what that regular expression represents. For instance, if we know our logs have dates in the form dd/mm/yyyy, we could insert the following rule:

\d\d/\d\d/\d\d\d\d - DATE

This will of course be much less time consuming than writing a regular expression for every event we want to be informed of. Other common rules could cover: e-mail addresses, URLs, time, error codes,...
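To make the pre-processing idea concrete, here is a minimal sketch in Java (the language later chosen for the prototype) of how such rules could be applied. The PreProcessor class, its method names, and the rule set are illustrative assumptions, not the prototype's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A sketch of the pre-processing phase: each rule pairs a regular expression
// with the keyword that replaces every match. All names are illustrative.
public class PreProcessor {
    private final Map<String, String> rules = new LinkedHashMap<>();

    public void addRule(String regex, String keyword) {
        rules.put(regex, keyword);
    }

    // Apply every rule, in insertion order, to one raw log event.
    public String process(String event) {
        String result = event;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replaceAll(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        PreProcessor pp = new PreProcessor();
        pp.addRule("\\d\\d/\\d\\d/\\d\\d\\d\\d", "DATE");
        pp.addRule("\\d\\d:\\d\\d", "TIME");
        pp.addRule("\\S+@\\S+", "EMAIL");
        System.out.println(pp.process("11/05/1999 14:05 Mail sent to user@somewhere.com"));
        // prints: DATE TIME Mail sent to EMAIL
    }
}
```

Since rules are applied in insertion order, more specific patterns (like full dates) should be registered before more general ones.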

1.4.3 Event Matching

We now have our events in a much easier format to work with. This doesn't mean that matching events as belonging to the same cluster is now an easy job. In fact it is still very complicated to determine if two logs are of the same type. Several algorithms can be used to perform this task. Just to mention a few:

Using the Levenshtein distance between strings
Counting the number/percentage of common tokens
Counting the number/percentage of common characters
Counting the number/percentage of Variants that would be found if we were to merge the two events
...

As we can see, an enormous number of algorithms can be used to perform this task. Besides that, each algorithm can be fine-tuned by changing internal thresholds. The approach we will use takes all of this into consideration. We will use several algorithms and give each one of them an initial weight. Each algorithm will be queried about the similarity of two events, the results obtained will be weighted using each algorithm's specific weight factor, and a global answer will be generated. Algorithms whose answers are closer to the global answer will see their weights increased, while algorithms that failed to answer correctly will see their weight in the final answer diminished. Besides that, each algorithm will be informed of its performance, which will allow it to fine-tune its internal thresholds. This kind of unsupervised, or self-supervised, learning approach is often called co-evolution.

1.4.4 Cluster Format Analysis

After grouping events in clusters we also have to find out each cluster's format. This means that we have to find which tokens are Variants and which are always constant. When we first form a cluster we start with a single event. This happens when we have an event that doesn't fit in any of the existing clusters. At that time the cluster format is obvious, as there aren't any Variants to be spotted yet. As the second event arrives we can just run through each of the events' tokens and compare whether they are equal. If they are, then we have found an invariant token; if they are different, then we have found a Variant. It's not really as easy as it seems: when we find a Variant, some kind of synchronizing between the two events must be performed. We will discuss the algorithms behind this process later.

1.4.5 Statistical Analysis

As we have mentioned before, we can do some kind of statistical analysis on both clusters and Variant Alternatives. The system being developed will only do some frequency analysis. This analysis will try to find events, or parts of events, that violate what is considered normal for that event. The idea is to save the maximum and minimum distance reported between events of the same type. If an event starts appearing more quickly than it used to, an alarm is generated; on the other hand, if an event fails to appear when expected, another kind of alarm is generated. We have to be extra careful with this type of analysis. As we have seen before, we want this type of system to remove some of the work the user has when analyzing logs or setting up an analyzer, so we can't flood the user with lots of alarms. This can be solved in the following ways:

Have an initial training phase where the analyzer will learn what is and what isn't normal. This phase will be similar to the production phase but with alarms disabled.

Allow some monitoring of a new kind of event before reporting alarms for it. For instance, if a new application is installed we don't want to start sending alarms for each new event type in the event log.

Don't send more than one alarm for each kind of event without having the user give feedback about that alarm.

Allow some tolerance when doing the statistical analysis. This tolerance should be easily modifiable by the user.

On the other hand, we could have alarms failing to be reported because during the training phase a certain event frequency threshold was violated. This can be solved by having the maximum and minimum frequencies slowly approach each other.

1.4.6 Alarm Reporting

Besides generating alarms, we also have to find a way of reporting them to the users. As we don't want to force the user to constantly check for new alarms, the easiest solution is to send the alarms in the form of e-mails. Most e-mail reading applications have some kind of warning system that tells the user a new e-mail has arrived. An advantage of this solution is that getting some user feedback can be easily implemented by including some clickable links in the e-mail message.

1.4.7 User feedback

User feedback on the alarm reports is the only way the user can control what kind of alarms he receives. As we have seen, this feedback will be given by selecting links in the alarm reports received by e-mail. These feedback links can be the following:

Don't send me more alarms for this cluster
Don't send me more alarms for this Variant
Don't send me more alarms for this Variant Alternative
Increase or decrease the tolerance for this alarm type
This was a Major/Minor/Warning alarm

These are just the more basic feedback links to be included. More feedback alternatives will probably be identified as soon as the system is more mature.
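The frequency analysis sketched in Section 1.4.5 could look something like the following Java fragment. All names here are assumptions made for illustration, and this simplified version only checks the thresholds when a new event arrives, so on its own it cannot notice an overdue event:

```java
// A sketch of the frequency analysis: the tracker records the shortest and
// longest interval seen between events of one cluster and raises an alarm
// when a new interval falls outside those bounds by more than a
// user-adjustable tolerance factor. All names are illustrative.
public class FrequencyTracker {
    private long lastSeen = -1;
    private long minInterval = Long.MAX_VALUE;
    private long maxInterval = 0;
    private final double tolerance; // 0.5 allows a 50% deviation

    public FrequencyTracker(double tolerance) { this.tolerance = tolerance; }

    /** Returns an alarm name, or null if the event's timing looks normal. */
    public String observe(long timestamp) {
        if (lastSeen < 0) { lastSeen = timestamp; return "FIRST_EVENT"; }
        long interval = timestamp - lastSeen;
        lastSeen = timestamp;
        String alarm = null;
        if (minInterval != Long.MAX_VALUE) { // bounds learnt from earlier events
            if (interval < minInterval * (1 - tolerance)) alarm = "TOO_FREQUENT";
            else if (interval > maxInterval * (1 + tolerance)) alarm = "TOO_RARE";
        }
        minInterval = Math.min(minInterval, interval); // learn the new bounds
        maxInterval = Math.max(maxInterval, interval);
        return alarm;
    }

    public static void main(String[] args) {
        FrequencyTracker t = new FrequencyTracker(0.5);
        System.out.println(t.observe(0));  // prints: FIRST_EVENT
        System.out.println(t.observe(10)); // prints: null
        System.out.println(t.observe(11)); // prints: TOO_FREQUENT
    }
}
```

Because the learnt bounds only ever widen here, a fuller version would also let them slowly approach each other, as discussed above for thresholds poisoned during training.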

Chapter 2 System Design

Having seen what kind of system we want, it's now time to see how it will be implemented. In this chapter we will talk about the various modules that are part of the analyzer and about the system's global architecture.

2.1 Architecture

As we have already mentioned, large systems usually produce around 3000 log events per minute on each machine. This means that performance is a big issue when trying to implement a log analyzer, especially because we don't want to affect the monitored system's performance, disturbing it more than we are helping. To try to minimize the impact on the computer's performance, the system was designed to be easily distributed across different machines. Several different modules were designed without any restrictions on the machine they would be working on. As this is merely a prototype, we opted to implement the system in Java and use RMI for communication between the various modules. If we can prove this kind of system can work, we can shift the system to a more performance-oriented language like C or C++.
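As an illustration of how the distributed modules might hand events to each other, here is a minimal local sketch: each module writes processed events into a bounded buffer and the next module in the chain blocks until new events are available. In the prototype the connection would go over RMI; a standard BlockingQueue stands in for that plumbing here, and all class names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A local sketch of module chaining: a supplier thread fills a bounded
// buffer, a consumer drains it, and both block when the buffer is full or
// empty. All names are illustrative, not the prototype's actual code.
public class PipelineSketch {
    public static final String EOF = "EOF"; // sentinel ending the stream

    // Consumer side: retrieve events until the supplier signals the end.
    static List<String> consume(BlockingQueue<String> buffer) throws InterruptedException {
        List<String> events = new ArrayList<>();
        String event;
        while (!(event = buffer.take()).equals(EOF)) events.add(event);
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100);
        // Supplier side: push raw events, blocking whenever the buffer is full.
        Thread supplier = new Thread(() -> {
            try {
                buffer.put("11/05/1999 14:05 Mail sent to user@somewhere.com");
                buffer.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        supplier.start();
        System.out.println(consume(buffer).size()); // prints: 1
        supplier.join();
    }
}
```

The bounded buffer gives the same back-pressure the report describes: a module only keeps processing while its output buffer isn't full.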

2.2 Modules

As mentioned before, the application will be composed of several modules. Each of these modules will process some data and store it in a buffer; while the buffer isn't full it will continue to process more data. The following module in the event chain will retrieve the processed events and, if there is nothing to retrieve, it will wait for a notification that new processed events are ready. We will now see which modules exist and what each one of them is responsible for.

2.2.1 Log Supplier

Before we can start processing the log files we first must retrieve them. The reason we have chosen to make this a separate module is that it will be easier if for some reason we want to change the way logs are gathered (for instance, retrieving them via FTP instead of reading them from the local file system). Another advantage of this approach is that it enables retrieving logs from separate files or even from separate machines. To prevent continuous access to the hard drive this module, like most of the others, will have an internal buffer where log events are stored before being sent to the next module.

Figure 2.1: Log Supplier Class Diagram

2.2.2 Pre-Processor

The Pre-Processor module will connect to any existing Log Supplier and apply a set of rules. These rules are read from a file upon startup and consist of a regular expression and a keyword. The application will then match each log event against each regular expression and, if a match succeeds, it will substitute the matched tokens with the specified keyword. For example, this log:

11/05/1999 14:05 Mail sent to user@somewhere.com

would be transformed into:

DATE TIME Mail sent to EMAIL

Figure 2.2: Pre Processor Class Diagram

2.2.3 Clustering

The Clustering module will be one of the most important modules. It will also be a LogConsumer, able to read log events from any Log Supplier, be it a Log Pre-Processor or a Log Reader. For each log received it will begin by trying to match it to an existing cluster. It will start by calculating the distance between the received log and each one of the existing clusters. To calculate this distance it will use the Matching module, which we will see in detail next. If the nearest cluster is closer than a pre-defined threshold then it will add the log event to that cluster; if it isn't, it will create a new cluster containing only that new event. To add a log event to an existing cluster, the Cluster Manager will have to merge the current cluster format with the new event. This merging operation will work in the following manner (this is a simplified version of the algorithm):

1. Let s1 and s2 be pointers to each string's first token.
2. While not finished do {
3. If s1 and s2 point to the same token, then add that token to the final result and increment both pointers.
4. Else, try to find the token pointed to by s1 in the remainder of the second string, and vice-versa.
5. If found, then add the tokens between s2 and the position where the token was found as a new Variant. If a Variant already existed there, add them as a new Alternative. Assign the found token's position to s2 and increment s1.
6. If not found, then add the token at s2 as a new Variant. If a Variant already existed there, add it as a new Alternative. Increment both pointers.
7. }

For example, suppose we have something like:

A B {VARIANT1} C D E
A B H C F G D E

We would start with s1 and s2 pointing to the token A; while the tokens pointed to by both are the same we just add them to the final result, so the final result starts as A B. As soon as we get a mismatch between the two pointers we look for each token in the other string. We wouldn't find them, so we would add a Variant to the final result and, as a Variant already existed in that position, we would add H as an Alternative of the already existing Variant. We would resume with token C, and when we reached D and F we would have another mismatch. This time we would find the token D in the second string, so we would add another Variant to the final result and add F G as an Alternative. At this moment the final result would be A B {VARIANT1} C {VARIANT2}. We would resume with token D and wouldn't find any other mismatches until the end. Thus our final result would be A B {VARIANT1} C {VARIANT2} D E, and we would have H as a new Alternative to Variant 1 and F G as the first Alternative to Variant 2.

ClusterLogs and Alternatives will be the objects that we analyze statistically. Accordingly, they implement the Alarmable interface, meaning alarms can be raised referring to them. Each also keeps statistical information: the maximum and minimum frequencies with which the object appeared and a counter for the number of events where it was found. Three types of alarms can be generated by these objects:

First Event alarms, raised when an Alarmable object is first spotted.

Minimum Frequency Threshold alarms, raised when an Alarmable object starts appearing more quickly than usual (the minimum distance between events is violated).

Maximum Frequency Threshold alarms, raised when an Alarmable object fails to appear when expected (the maximum distance between events is exceeded).

This last alarm type is the most difficult to detect, because it isn't generated by the arrival of a new event but by the lack of one. This means that each time an event is generated we would have to check whether any of the existing Alarmable objects has crossed its Maximum Frequency Threshold. Of course this would be too time consuming. The solution is to keep all the Alarmable objects ordered by the time of their next possible Maximum Frequency Threshold alarm and only verify the first few of them.

Figure 2.3: Clustering Module Class Diagram

2.2.4 Matching

Finding out whether two separate events belong to the same cluster is also a difficult task. The main difficulty is that lots of possible algorithms exist and each one of them seems to be better suited to a specific log format. We decided to use this difficulty in our favor by using as many algorithms as possible and letting them help each other evolve. We did this by defining a single Match Manager. This Manager is responsible for deciding how well a particular event fits a certain cluster. The Match Manager has several other objects, called Matchers, each able to do the same type of work. It starts by having the same degree of confidence in each one of them and, as they answer its queries, it changes the degree of confidence it has in each one (based on the answers received). In this way it is able to do a weighted sum of all the Matchers' answers and return that sum as the result. Besides altering the degree of confidence in each Matcher, the Manager also informs them of their performance, so that each one of them can adjust its internal parameters. This type of cooperation between several algorithms is often called co-evolution.

Figure 2.4: Matcher Module Class Diagram

2.2.5 Alarm Reporting

An important part of the application will be the actual alarm reporting. Besides sending the alarms to the user, this module will also be responsible for filtering them. This will prevent the user from getting flooded with too many alarms. Some rules that will be implemented are:

Don't send alarms while in the training phase.

Don't send more than one alarm for each Alarmable object until the user gives feedback about the last alarm (eventually allow up to 5 alarms at a time and warn the user whenever he forgets to give feedback for too long).

Don't send statistical alarms until enough information about the Alarmable object has been gathered.

Alarms will be sent by e-mail. This module will also be responsible for constructing the alarm messages, including all the event information, the alarm type and the feedback links.

2.2.6 User Feedback

The User Feedback module will be the only way the user has to configure the way the system behaves. Along with the alarm text, the user will receive a set of links that he can click to alter the alarm behavior. When the user clicks a feedback link he will be sent to a web page informing him that the feedback information has been received. The same web page will record the feedback information in a database that is constantly monitored by the User Feedback module. In that way the Feedback module will be able to use that information to change several internal parameters.
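The weighted co-evolution scheme of the Match Manager described in Section 2.2.4 might be sketched as follows. The class names, the weight-update rule, and the example Matcher are all assumptions made for illustration, not the prototype's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// A sketch of the matcher ensemble: each Matcher scores the similarity of an
// event against a cluster format, the manager combines the scores with
// per-matcher weights, and the weights drift towards the matchers whose
// answers were closest to the combined answer. All names are illustrative.
public class MatcherManager {
    private final List<BiFunction<String, String, Double>> matchers = new ArrayList<>();
    private final List<Double> weights = new ArrayList<>();

    public void addMatcher(BiFunction<String, String, Double> m) {
        matchers.add(m);
        weights.add(1.0); // equal initial confidence in every matcher
    }

    public double similarity(String event, String cluster) {
        double total = 0, weightSum = 0;
        double[] answers = new double[matchers.size()];
        for (int i = 0; i < matchers.size(); i++) {
            answers[i] = matchers.get(i).apply(event, cluster);
            total += answers[i] * weights.get(i);
            weightSum += weights.get(i);
        }
        double global = total / weightSum;
        // reward matchers close to the global answer, punish the rest
        for (int i = 0; i < matchers.size(); i++) {
            double error = Math.abs(answers[i] - global);
            weights.set(i, Math.max(0.1, weights.get(i) * (1.1 - error)));
        }
        return global;
    }

    public static void main(String[] args) {
        MatcherManager mm = new MatcherManager();
        // percentage of common tokens, one of the algorithms listed in 1.4.3
        mm.addMatcher((a, b) -> {
            String[] ta = a.split(" "), tb = b.split(" ");
            int common = 0;
            for (String t : ta) for (String u : tb) if (t.equals(u)) { common++; break; }
            return (double) common / Math.max(ta.length, tb.length);
        });
        System.out.println(mm.similarity("GET URL", "GET URL")); // prints: 1.0
    }
}
```

Here the update rule simply scales each weight by how close the Matcher's answer was to the weighted consensus, with a floor so no Matcher is silenced forever; the real design would also feed the performance figure back so each Matcher can tune its own internal thresholds.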

Chapter 3 Conclusions and Future Work

3.1 Possible Improvements

As stated in the objectives section, the scope of this project was to create a prototype for a new type of log analyzer based on the statistical analysis of events. This prototype has, of course, room for a lot of improvements.

3.1.1 Clustering and Matching

The main modules of the application are the Clustering and Matching modules; in these two modules lies the heart of the analyzer. Both have a lot of room for improvement. At this moment clustering is done by joining a new log event with the most similar of the existing clusters. If all clusters are too different from the log event being analyzed, a new cluster is created. This can create two problems:

A log event can be assigned to a cluster and later a better cluster can appear. Figure 3.1 is a good example: notice how the first 4 events seemed to make a good cluster, but as soon as more events arrived we can clearly see 2 different clusters.

As clusters grow they might start touching each other, and it might be a good idea to merge them.
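For reference, the merge operation that these improvements would refine (Section 2.2.3) can be sketched in a much-simplified form. This version only re-synchronises by searching for the format's token ahead in the event (not "and vice-versa"), and it omits the bookkeeping of Alternatives and of trailing tokens; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// A much-simplified sketch of merging an event into a cluster format:
// matching tokens are kept, existing {VARIANT} slots absorb event tokens
// until the next invariant token, and any other mismatch becomes a new
// variant. All names are illustrative assumptions.
public class ClusterMerge {
    public static List<String> merge(List<String> format, List<String> event) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < format.size() && j < event.size()) {
            String ft = format.get(i);
            if (ft.equals(event.get(j))) {          // invariant token: keep it
                out.add(ft); i++; j++;
            } else if (ft.startsWith("{VARIANT")) { // existing variant slot:
                out.add(ft); i++;                   // skip event tokens until
                String next = i < format.size() ? format.get(i) : null;
                while (j < event.size() && !event.get(j).equals(next)) j++;
            } else {
                int k = event.subList(j, event.size()).indexOf(ft);
                out.add("{VARIANT}");               // new variant; the skipped
                if (k >= 0) j += k;                 // tokens would become its
                else { i++; j++; }                  // first Alternative
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(
            List.of("A", "B", "{VARIANT1}", "C", "D", "E"),
            List.of("A", "B", "H", "C", "F", "G", "D", "E")));
        // prints: [A, B, {VARIANT1}, C, {VARIANT}, D, E]
    }
}
```

Run on the worked example from Section 2.2.3, this sketch reproduces the final format A B {VARIANT1} C {VARIANT} D E; recording H and F G as Alternatives is exactly the bookkeeping that the improvements above would have to extend.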

Figure 3.1: Clustering Failure Example

The first of these two problems seems to be the most complicated to solve, as it would require the analyzer to remove a log event from a cluster. That would mean saving a lot more information about each Alternative and a lot more processing. The matching of two log events could also benefit from some improvements, namely the implementation of new algorithms. New algorithms could even be based on genetic algorithms or neural networks, as each algorithm receives information about its performance that could be used to train these kinds of algorithms.

3.1.2 Statistical Analysis

At this moment only some simple statistical analyses are performed. It shouldn't be hard to come up with new useful analysis concepts and incorporate them into the application. Some ideas might be:

Do some numerical analysis: instead of treating all the Alternatives as text, try to identify which ones are numerical and do some kind of trend analysis on them.

Try to find out which words are a sign of alarms and apply that knowledge to other clusters.

Do a more elaborate analysis than just testing maximum and minimum frequencies.

...

3.1.3 Persistence

In the real world, a log analyzer that learns from historical data, such as this one, doesn't have much use if it isn't capable of saving its own state. As this is only a prototype, not much effort has been put into making the system persistent. The most logical approach seems to be to use a relational database to save the internal state between system runs.

3.2 Conclusions

As the result of this project a prototype has been developed. The feedback module has not yet been implemented, but all the other modules are functional. The matching and clustering modules are working much better than expected, but some work still needs to be done on the event merging function. The amount of alarms generated is still overwhelming, but without extensive tests and training we can't reach a clear conclusion about the usability of the system. We proposed to show that it was possible to create a log analyzer without any knowledge of the logs' domain. By trying to match similar events and join them in clusters, merging them and extracting a common format for each cluster, we were able to statistically analyze the retrieved information. An architecture for this log analyzer has been proposed and explained in detail, and a prototype has been developed. The results achieved show that it is possible to use this kind of technique in log analysis, although a lot of work still needs to be done to make this possible in real time.
