An Agent-based Bayesian Forecasting Model for Enhanced Network Security




J. PIKOULAS, W.J. BUCHANAN, Napier University, Edinburgh, UK.
M. MANNION, Glasgow Caledonian University, Glasgow, UK.
K. TRIANTAFYLLOPOULOS, University of Warwick, UK.

Abstract

Security has become a major issue in many organisations, but most systems still rely on operating systems, and a user ID and password system, to provide user authentication and validation. They also tend to be centralized in their approach, which makes them open to attack. This paper presents a distributed approach to network security using agents, and presents a novel application of the Bayesian forecasting technique to predict user actions. The Bayesian method has been used in the past for weather forecasting and has been expanded so that it can be used to provide enhanced network security by trying to predict user actions. With this, a system can determine if a user is acting unpredictably or has changed their normal working pattern. Results are also given which show that the new model can predict user actions, and a set of experiments is proposed for further exploitation of the method.

1. Introduction

Computer security is a major concern for organizations. Whilst security violations can be caused by external users (hackers), Carter and Catz [1] have shown that the primary threat comes from individuals inside an organisation. Hence much more emphasis has to be placed on internal security mechanisms. External network attacks can be categorised [4] into IP spoofing attacks [5], packet-sniffing [6], sequence number prediction attacks and trust-access attacks. Categories of internal attack include password attacks [7], session hijacking attacks, shared library attacks, social engineering attacks, and technological vulnerability attacks.

Computer network security programs can be categorised as follows [3]:

Security enhancement software. This enhances or replaces an operating system's built-in security software (for example, Mangle It, Passwd+ and Shadow).
Authentication and encryption software. This encrypts and decrypts computer files (for example, Kerberos, MD5, RIPEM, and TIS Firewall Toolkit).
Security monitoring software. This monitors different operations of a computer network and outputs the results to system administrators (for example, Abacus Sentry, COPS, Tripwire and Tiger).
Network monitoring software. This monitors users' behaviour or monitors incoming or outgoing traffic (for example, Argus, Arpwatch and ISS).
Firewall software and hardware. This runs on the Internet/intranet entrance to a computer network, and checks all incoming network traffic for its contents at the network and transport layers of the OSI model. At the network layer, typically the Internet Protocol (IP) addresses are filtered for their source and/or destination, and at the transport layer, the TCP ports are monitored (thus FTP and TELNET traffic could be blocked for incoming data traffic, but SMTP (electronic mail) could be allowed).

These methods are generally centralised applications with no real-time response and have no mechanism to foresee future user events. These methods also have a central focal point for security (typically a main server), which could itself become the focus of an attack (such as a denial-of-service attack, where the server is bombarded with hoax requests, which eventually reduces its quality of service to its clients). The method involved in this research is distributed, and does not depend on a central point of failure. It also gathers user behavioural information and makes a prediction of what the user might do in the future.

This paper presents a distributed approach to network security using agents, and presents a novel application of the Bayesian forecasting technique to predict user actions. The Bayesian method has, in the past, been used for weather forecasting and has been expanded so that it can be used to provide enhanced network security by trying to predict user actions. With this, a system can determine if a user is acting unpredictably or has changed their normal working pattern. Results are also given which show that the new model can predict user actions, and a set of experiments is proposed for further exploitation of the method.
In choosing a computer network security solution, the dominant issues are: cost; the desired level of security; and the characteristics of the existing operating system environment. Three mechanisms for illegal behaviour detection are commonly used in computer network security programs [8], and can be applied to all five categories of computer security program.

Statistical Anomaly Detection

Statistical anomaly detection systems analyse audit-log data to detect abnormal behaviour [9]. A profile of expected online behaviour for a normal user is predefined and derived from how an organisation expects a user to behave and from a system administrator's experience of the way a user is expected to use the resources of a system. Typically, the audit logs are analysed for statistical patterns of events in typical operations, from which usage patterns are determined. These patterns are compared to the user's profile. The system then warns the administrator that there has been a possible intrusion when a profile is different from a usage pattern. The major drawback with this technique is that it cannot predict extreme changes in user behaviour, as changes in a user's behaviour normally identify a security breach.

Rule Based Detection

Rule-based detection systems use a set of generalised rules that define abnormal behaviour [10, 12, 13]. These rules are formed by analysing previous patterns of attack by different people. The drawback of this system is that the basic rules are predefined by system administrators, and cannot detect any new attack techniques. If a user exhibits behaviour that is not prescribed by the existing rules, the user can harm the system without being detected.

Hybrid Detection

Hybrid detection systems are a combination of statistical anomaly detection and rule-based detection systems. These typically use rules to detect known methods of intrusion and statistically based methods to detect new methods of intrusion. CMDS (Computer Misuse Detection System) [14] is a security-monitoring package that provides a method to watch for intrusions, such as bad logins or file modifications. It also monitors for difficult detection problems such as socially engineered passwords, trusted user file browsing and data theft that might indicate industrial espionage.
CMDS supports a wide variety of operating systems and application programs. Its drawback is that it uses statistical analysis to make additional rules for the system, so it can only detect attack patterns that have been used in the past and identified as attack patterns, or that have been predefined by the system operators. It also generates long reports and graphs of the system performance that need to be interpreted by a security expert.

4 Intrusion Detection System

We used a multivariate statistical model because our problem is a linear multivariate problem, and it is simpler, faster and more accurate to use a linear model than a non-linear model such as a neural network [24]. To test this, an intelligent agent security enhancement software system was constructed, in which a core software agent resides on one server in a Windows NT network system and user-end software agents reside in each user workstation. The software for each type of agent was written in Sun Java JDK Version 1.2 in a Microsoft Windows NT Version 4 environment running over a 10/100 Mbps network. There is one server and 10 clients. Figure 1 shows a core agent communicating with many user agents. A communication thread is a unique process that the core agent creates to transmit data to the user-end agent in response to a message transmitted from the user-end agent. Unique processes enable the core agent to communicate with each user agent effectively and efficiently, thereby enabling a fast response to network monitoring. Once the core agent has responded to a user agent, the process is killed. The system uses a hybrid detection technique, where invalid behaviour is determined by comparing a user's current behaviour with their typical behaviour and by comparing their current behaviour with a set of general rules governing valid behaviour formed by systems administrators. Typical behaviour is contained in a user historical profile.

[Figure 1: Agent Environment Topology]

The user agent software has four components:

A sensor.
The sensor monitors the various software applications (such as a word processor or a spreadsheet) that are currently being run by the user on that workstation. When a user logs in, the sensor polls the user's activity every five seconds and records the user's identifier and each application's name and process identifier.
A transmitter. After the first polling by the sensor, the transmitter sends this information to the core agent. The core agent then responds by sending a user historical profile. With an audit-log file for a period of one month, we observed that the size of an average user profile was between 400 KB and 600 KB, with a download time of between three and five seconds.
A profile reader. The profile reader reads the user's historical profile.
A comparator. This compares the user's historical profile with the information read by the sensor. If the current behaviour profile does not fall within the accepted behaviour pattern defined by the user historical profile, the comparator provides the transmitter with the following information: user identifier, invalid behaviour type and corresponding invalid behaviour type data. This is then sent to the core agent.

When invalid behaviour occurs, several courses of action are available, such as:

1. Send a warning message to the system administrator or end user.
2. Kill the specific application that has caused the invalid behaviour.
3. Prevent the end user from running any further applications.

Cases 2 and 3 can be achieved locally at the client workstation, and in Case 1, the user agent informs the core agent and the core agent informs the systems administrator. The user agent terminates when a user logs off.

Figure 2 shows the complete model for the forecasting system, where a core agent reads the user profile, which is then received by the user agent. The user agent then compares the actual usage against the forecast. Eventually, when the user logs off, the user profile is updated and sent back to the core agent. In the traditional method of forecasting, a user event would be averaged over long time intervals (Figure 3).
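The comparator step described above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the profile is reduced to a set of application names the user normally runs, and the `Violation` record, the `compare` function and the behaviour type label `unknown_application` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Report handed to the transmitter (field names are assumptions)."""
    user_id: str
    behaviour_type: str
    data: str


def compare(user_id, running_apps, profile_apps):
    """Return one Violation per running application absent from the profile.

    running_apps: set of application names seen by the sensor in one poll.
    profile_apps: set of application names in the user's historical profile.
    """
    unexpected = sorted(running_apps - profile_apps)
    return [Violation(user_id, "unknown_application", app) for app in unexpected]


# Example poll: one application is outside the historical profile.
reports = compare("user42", {"wordproc", "portscan"}, {"wordproc", "spreadsheet"})
```

In a fuller version the profile would also carry expected usage levels per application, so that the comparison could flag unusual amounts of use as well as unknown programs.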
[Figure 2: Agent forecasting model]

[Figure 3: Traditional method of generating user profile for applications. The figure contrasts averaging the usage over the login period (requires large amounts of storage; gaps in data reduce prediction) with the forecasting method (less storage; faster processing).]

5 Prediction Model

When our intrusion detection system is installed, the prediction part monitors the user behaviour 15 times. After that, it evaluates itself five times. After this it is ready to make an accurate prediction. Our model has three stages of operation:

1. Observation stage. In this stage the model monitors the user and records their behaviour.
2. Evaluation stage. In this stage the model makes a prediction, monitors the user's actual movements and calculates the result. This stage is critical, because the model modifies itself according to the environment that it operates in.
3. One-step prediction. In this stage the model makes a single-step prediction. For example, assume that the user has logged in 15 times and the model is configured and ready to start predicting user moves. Instead of making a five- or ten-step prediction, like other mathematical models, our model makes a prediction for the next step.

When the user logs in and out, the model takes the actual behaviour of the user, compares it with the one-step prediction that it made before, and calculates the error. The next time a prediction is made for this user, it also includes the data of the last user behaviour. With this procedure we aim to maximise the accuracy of the prediction system. The proposed forecasting method also improves on the traditional method by requiring much less memory storage. Figure 4 shows a generic model for the prediction, using parameters for a given window size (n), time unit and prediction number (z).
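The three stages and the one-step error calculation can be sketched as follows. This is only an illustration under stated assumptions: the counts 15 and 5 come from the text, but the function names are hypothetical and the forecast used here is a plain mean of the history, a stand-in for the Bayesian model rather than the authors' method.

```python
from statistics import mean

OBSERVATION_LOGINS = 15  # logins that are only observed (from the text)
EVALUATION_LOGINS = 5    # logins used for self-evaluation (from the text)


def stage_for_login(login_number):
    """Return the stage name for a 1-based login count."""
    if login_number < 1:
        raise ValueError("login_number must be >= 1")
    if login_number <= OBSERVATION_LOGINS:
        return "observation"
    if login_number <= OBSERVATION_LOGINS + EVALUATION_LOGINS:
        return "evaluation"
    return "prediction"


def one_step_errors(usage, forecast=mean):
    """Forecast each login after the observation window from all earlier
    logins, then record actual minus predicted usage."""
    errors = []
    for t in range(OBSERVATION_LOGINS, len(usage)):
        predicted = forecast(usage[:t])  # uses data up to the previous login
        errors.append(usage[t] - predicted)
    return errors


# A user with steady usage, then one unusual final session.
errors = one_step_errors([0.25] * 19 + [0.5])
```

Each prediction only looks one step ahead and always includes the most recent observed session, which is the property the text relies on.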

[Figure 4: Forecasting calculation. Application usage (%) against time unit (i), with parameters δ, β, window size (n) and prediction number (z); the window is stored when the user logs off. Sample parameters: n = 15, z = 5, time unit = 1 hour.]

The general multivariate dynamic linear model (DLM) is given by the following equations:

$$Y_t = F_t' \theta_t + v_t, \quad v_t \sim N[0, \Sigma] \qquad (1)$$

$$\theta_t = G_t \theta_{t-1} + \omega_t, \quad \omega_t \sim N[0, W_t] \qquad (2)$$

We use multivariate models because we want to incorporate and forecast several variables simultaneously. Note that the parameters $\theta_t$ change both deterministically (through $G_t$) and stochastically (through the variance $W_t$), which makes the model dynamic. Standard ARIMA (Auto-Regressive Integrated Moving Average) models are a special and restrictive case of the above model, obtained by setting $F_t = F$, $G_t = G$ and $W_t = W$ (all three components constant over time). This is restrictive, since all these components are likely to change over time: for example, the relationship in (1) changes over time, and there are other external sources of variation (such as extra subjective information about a variable). Moreover, equation (2) is not observable. This means that we are never going to see any evolution or trend in a diagram or a graph. This is a hidden model, and we cannot assume $W_t$ to be constant over time.

There is another large problem that we cannot ignore in multivariate models: the variance matrix $\Sigma$ will not be known. Often, in standard time series, it is assumed known and the analysis moves on to another problem. In practice, however, it is extremely difficult to set it as a known matrix. It is very difficult to propose what variance to use for a system where 20 applications are considered and only 20 or 30 vectors are collected as data. For all these reasons we need to consider the dynamic models. Also, the system can provide forecasts as far ahead as we like, and these proved very accurate according to the results. For this purpose we used a Bayesian framework, which means that at time $t$ we have some kind of knowledge, that is, a subjective belief, expressed in terms of a distribution. This is the prior distribution of $(\theta_t \mid D_{t-1})$ at time $t$.
In other words, it is what we know before $Y_t$ becomes available. Once this happens, we revise this prior belief, using the likelihood function, to find the posterior distribution $(\theta_t \mid D_t)$, or revised belief, which is better and more accurate. Then, by simple calculations, we find the prior for time $t+1$, and we calculate the posterior at $t+1$ only when the data $Y_{t+1}$ comes into the system (in our case, the real behaviour of the user). The model used becomes:

Autoregressive moving average model. The general model introduced by Box and Jenkins (1976) includes autoregressive as well as moving average parameters, and explicitly includes differencing in the formulation of the model. Specifically, the three types of parameters in the model are: the autoregressive parameters (p), the number of differencing passes (d), and the moving average parameters (q). In the notation introduced by Box and Jenkins, models are summarized as ARIMA(p, d, q); so, for example, a model described as (0, 1, 2) contains 0 (zero) autoregressive (p) parameters and 2 moving average (q) parameters, which were computed for the series after it was differenced once.

Identification. As mentioned earlier, the input series for ARIMA needs to be stationary, that is, it should have a constant mean, variance, and autocorrelation through time. Therefore, the series usually first needs to be differenced until it is stationary (this also often requires log transforming the data to stabilize the variance). The number of times the series needs to be differenced to achieve stationarity is reflected in the d parameter (see the previous paragraph). In order to determine the necessary level of differencing, one should examine the plot of the data and the autocorrelogram. Significant changes in level (strong upward or downward changes) usually require first-order non-seasonal (lag = 1) differencing; strong changes of slope usually require second-order non-seasonal differencing. Seasonal patterns require respective seasonal differencing (see below).
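The differencing passes counted by the d parameter can be made concrete with a short Python sketch. The helper names `difference` and `integrate` are hypothetical, introduced only for illustration; `integrate` is simply the inverse operation that returns a once-differenced series to its original scale.

```python
def difference(series, d=1):
    """Apply d passes of lag-1 differencing to a sequence of numbers."""
    values = list(series)
    for _ in range(d):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def integrate(differences, first_value):
    """Undo one differencing pass, given the first value of the original series."""
    values = [first_value]
    for step in differences:
        values.append(values[-1] + step)
    return values


level_changes = difference([1, 3, 6, 10])        # one pass removes the level
slope_changes = difference([1, 3, 6, 10], d=2)   # two passes remove the slope
restored = integrate(level_changes, first_value=1)
linear_trend = difference([2 * t + 1 for t in range(5)])
```

A series with a linear trend becomes constant after one pass, while a series whose slope keeps changing needs a second pass, which matches the rule of thumb given above.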
If the estimated autocorrelation coefficients decline slowly at longer lags, first-order differencing is usually needed. However, one should keep in mind that some time series may require little or no differencing, and that over-differenced series produce less stable coefficient estimates. At this stage we also need to decide how many autoregressive (p) and moving average (q) parameters are necessary to yield an effective, but still efficient, model of the process (that is, with the fewest parameters and greatest number of degrees of freedom among all models that fit the data). In practice, the values of the p or q parameters are rarely greater than two (see below for more specific recommendations).

Estimation and Forecasting. At the next step (estimation), the parameters are estimated (using function minimization procedures), so that the sum of squared residuals is minimised. The estimates of the parameters are used in the last stage (forecasting) to calculate new values of the series (beyond those included in the input data set) and confidence intervals for those predicted values. The estimation process is performed on transformed (differenced) data; before the forecasts are generated, the series needs to be integrated so that the forecasts are expressed in values compatible with the input data. This automatic integration feature is represented by the letter I in the name of the ARIMA methodology.

In addition to the standard autoregressive and moving average parameters, ARIMA models may also include a constant, as described above. The interpretation of a statistically significant constant depends on the model that is fitted. Specifically: if there are no autoregressive parameters in the model, then the expected value of the constant is µ, the mean of the series; if there are autoregressive parameters in the series, then the constant represents the intercept. If the series is differenced, the constant represents the mean or intercept of the differenced series. For example, if the series is differenced once, and there are no autoregressive parameters in the model, the constant represents the mean of the differenced series, and therefore the linear trend slope of the un-differenced series.

ARIMA models are similar to our model. They use the existing data to calculate the parameters of the model. But suppose, for example, that some external information is available: we may know that a particular user x, although he does not have an illegal user profile, is very likely at a specific point in time to perform a huge invasion of an important application.
ARIMA will try to change the parameters to adjust the model, but even in this case it is doubtful how well the model will do across all the applications. With our DLM this is not a problem: we simply add the external information to the prior information we have. This is named expert intervention, and the revised posterior takes into account the new knowledge. Our system is not assumed perfect when the model is fitted, and we let information, no matter what its source, help us learn and improve the system. Our model is slightly different from the one we use for illustration purposes. We find recurrence relationships, which are more natural than the long overall formulae that ARIMA works out. We note that because ARIMA is quite complicated, many practitioners end up with a very simple subclass of ARIMA model, sometimes without even stating the assumptions. This produces results that sometimes do not correspond to the real application. The only difficulty with DLMs is the specification of the initial values, so that the algorithm may be put into practice. In general this has to be resolved through the experience of the individual practitioner. In our case, we have to specify the following: $m_0$, $C_0$, $S_0$, $n_0$, $\beta$, $\delta$, $F$.

$m_0$ is the mean of $(\theta_0 \mid D_0)$ and $C_0$ is its variance. The choices made are:

$m_0 = 0$. This is set when we expect that the prior distribution $(\theta_0 \mid D_0)$ (the distribution of the parameter $\theta$ at time 0 given $D_0$, any initial information which is explicitly known) will not give any drift to $Y_1$. We expect this to happen, but we are not sure, so there is an uncertainty, which is expressed by the variance $C_0$. It is natural and common policy to assume $C_0 = I$, the identity matrix. But care must be taken: when we are very uncertain about our choice we MUST increase the diagonal elements of $C_0$. Of course, this affects all the following results somehow, but the approach is more realistic. In general we will have more data vectors than 15 or 20 (our case), hence the initial values will dominate the actual estimates at a decreasing rate.

$C_0 = I$.
This is motivated by our belief that $m_0$ is not important to the following values of $Y_t$, $t = 1, \ldots$

$S_0$ is typically, almost always, set to $I$, and it has no special meaning. The only one we can find is that it is chosen so that, according to the formula used to calculate $S_t$, $S_0$ leads to acceptable results (symmetric matrices).

$n_0$ can be set to 0 (which implies $n_1 = 1$, without great loss) or to 1 (which implies $n_1 = \beta + 1$). The choice of $n_0$ is not crucial, since there is a theorem that states that $S_t$ converges to $\Sigma$ as $t$ goes to infinity, and this does not depend on $n_0$. But it must take small values.

$\delta$. The choice of $\delta$ is discussed in detail in Ameen and Harrison (1982a), where it is shown that it must satisfy $0.85 < \delta < 1$ and be quite high. Thus we have set it to 0.95.

$\beta$. $\beta$ is a discount factor as well. In this document we state that it has to be smaller than $\delta$, as, in general, $S_t$ is not as much influenced by the data as $m_t$ is. Note that $\delta$ is in $A_t$ and so it influences $m_t$.
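To show how these initial values enter a recursive update, here is a minimal Python sketch of a scalar (single-application) local-level DLM with two discount factors. It is an assumption-laden simplification, not the authors' multivariate recurrences: it takes $F = G = 1$, uses the textbook discount form in which the prior variance is $C_{t-1}/\delta$, and updates the variance estimate with $n_t = \beta n_{t-1} + 1$, which reproduces the two cases for $n_1$ quoted above. The names `DLMState` and `dlm_step` and the value $\beta = 0.90$ are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DLMState:
    m: float = 0.0  # posterior mean of theta (m0 = 0)
    C: float = 1.0  # posterior variance of theta (scalar analogue of C0 = I)
    S: float = 1.0  # estimate of the observation variance (S0 = 1)
    n: float = 1.0  # auxiliary count for S (n0 = 1; n0 = 0 is also allowed)


def dlm_step(state, y, delta=0.95, beta=0.90):
    """One prior -> forecast -> posterior cycle for a scalar local-level model."""
    a = state.m                  # prior mean (G = 1)
    R = state.C / delta          # prior variance, inflated by the discount delta
    f = a                        # one-step forecast (F = 1)
    Q = R + state.S              # forecast variance
    e = y - f                    # one-step forecast error
    A = R / Q                    # adaptive coefficient; delta acts through A
    n = beta * state.n + 1.0     # n0 = 0 gives n1 = 1; n0 = 1 gives n1 = beta + 1
    S = state.S + (state.S / n) * (e * e / Q - 1.0)  # discounted variance estimate
    m = a + A * e                # posterior mean
    C = (S / state.S) * (R - A * A * Q)              # posterior variance
    return DLMState(m, C, S, n), f, e


# Feed twenty identical sessions of 0.3 hours of usage.
state = DLMState()
for _ in range(20):
    state, forecast, error = dlm_step(state, 0.3)
```

Starting from $m_0 = 0$, the posterior mean moves quickly towards the observed usage in the first few steps and then adapts more slowly, which is the decreasing influence of the initial values described in the text.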

The components are defined as:

$m_0$: The mean of the influence of $D_0$, our initial information, on $\theta_1$ and $Y_1$.
$C_0$: Dispersion of the above influence.
$S_0$: No meaning; it is an auxiliary quantity for $S_t$.
$n_0$: No meaning; it is an auxiliary quantity for $n_t$.
$\beta$: Factor of the influence of the data on the estimate $S_t$.
$\delta$: Factor of the influence of the data on the estimate $m_t$.
$F$: A basic quantity that expresses the linearity of the model and gives different trends to the several values of $Y_t$, both for time series analysis (what has happened in the past) and forecasting (what will happen in the future).

Finally, we make clear that when we say factor in the above explanation we do not mean a percentage. Factor means discount factor, which means that the estimates $m_t$ and $S_t$ are discounted, at different rates, since both are influenced by the data.

[Figure 5: Forecasting calculation with intervention. Application usage (%) against time unit (i), with parameters δ, β and window size (n); the window is stored when the user logs off, and at intervention points additional exceptional data (ω) varies the sensitivity of the system. Sample parameters: n = 15, z = 5, time unit = 1 hour.]

Intervention

Intervention is a mechanism for improving the prediction accuracy. It is used when there is additional information about the future behaviour of the system, which can be added to the model prior to the prediction. For example, if there are users who are keen on using illegal software, or new users about whose behaviour there is not enough information, then by applying the intervention mechanism we increase the accuracy of the model and can make more accurate predictions (Figure 5). We can observe this in the results figures, where our model's prediction is very close to the actual user's behaviour for application number one at the specific time t = 19. We achieved this accuracy by applying the intervention technique. We can also observe that the ARIMA model did not make any prediction for this particular user behaviour.
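One common way to express an intervention in a DLM is to shift the prior mean and inflate the prior variance before the next observation arrives, so that the posterior follows an unusual observation more closely. The Python sketch below illustrates that idea for a single scalar quantity; it is an assumption about the mechanism, not the authors' implementation, and the function names and numbers are hypothetical.

```python
def intervene(prior_mean, prior_var, mean_shift=0.0, extra_var=0.0):
    """Add external (expert) information to the prior before forecasting."""
    if extra_var < 0:
        raise ValueError("extra_var must be non-negative")
    return prior_mean + mean_shift, prior_var + extra_var


def posterior_mean(prior_mean, prior_var, obs_var, y):
    """Standard normal update: move the prior mean towards y by a factor A."""
    adaptive = prior_var / (prior_var + obs_var)
    return prior_mean + adaptive * (y - prior_mean)


# The user normally spends about 0.05 of an hour in the application,
# but an unusual session of 0.45 is observed.
plain = posterior_mean(0.05, 0.01, obs_var=0.01, y=0.45)

# With intervention: extra prior uncertainty makes the model more responsive.
mean_i, var_i = intervene(0.05, 0.01, extra_var=0.09)
with_intervention = posterior_mean(mean_i, var_i, obs_var=0.01, y=0.45)
```

With these illustrative numbers the intervened posterior lands much closer to the unusual observation than the plain one, which is the sensitivity change that Figure 5 attributes to the intervention.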
Results and experiments

The first set of experiments was carried out to test that our security environment works, to obtain some results from our proposed statistical model, and to compare it with other statistical models.

[Figure 6: The Real Observations of the Model. Panel title: Prediction for Application 1 (using model). Series: real observations; learning phase; using prediction model. Axes: invasion time (hours), 0.05 to 0.5, against time (hours), 1 to 20. Parameters: n = 15, z = 5, time unit = 1 hour.]

Our environment is vastly improved by the use of the prediction mechanism. The system does not use the real-time data that it gathers solely for real-time detection. Adding the prediction model to our environment increases its functionality and its usability.

Figure 8 shows one user who logged on to the system 20 times and had a one-hour session each time. We monitored all of the user's moves and all the applications used. In our prediction model we had only three applications to predict. The values range from 0 to 1, where 1 denotes a full hour. So, for example, 0.3 means that the user used the program for 0.3 of the hour (18 minutes) in that specific hour of system usage. We used our prediction mechanism for the last five observations.

As we can observe from the results, if we compare the graphs in Figure 6, which are the real observations for the three applications, with the graphs in Figure 7, we can see that the two figures are almost identical. We cannot say the same when we compare the real readings with the results of the ARIMA model. The ARIMA results are less precise with respect to the actual readings, and they fail to predict the action of the user in application 1 at time interval nineteen, whereas our model predicted it with a very close figure.

[Figure 7: The proposed Model 5 step prediction. Panel title: Prediction for Application 1 (using ARIMA). Series: real observations; learning phase; using ARIMA model. Axes: invasion time (hours), 0.05 to 0.5, against time (hours), 1 to 20. Parameters: n = 15, z = 5, time unit = 1 hour.]

Evaluation

Our proposed model is a multivariate linear model that is simple, fast and adaptive. It requires far less preparation than other models; a neural net model, for example, requires the weights to be decided before the model is built.

Our proposed environment reacted as expected to all the tests that were applied. The monitoring of the user behaviour was successful and the overhead on the system resources was minimal. There was a 1 to 2% increase in CPU usage while the environment was monitoring the user's moves, and the prediction task took only two seconds to complete with the three applications. For a fully operational system with 20-25 applications, we estimate that it will take no more than five seconds.

Our proposed environment collects information about the user every five seconds. The prediction procedure takes place at the end of each hour of a user's use of the system. If the user logs off before one hour completes, the prediction is calculated once the user has logged on to the system again and completed the hour.

Another difference of our model is that the statistical models in use now work inside acceptable parameters only because they make too many assumptions about the initial parameters, a factor that we believe makes them give results that do not represent actual situations.

Future Work

The experiments conducted up to now were set up to verify that the environment works and to show that our proposed statistical model gives better results than the models that are widely used up to now. In the next stage, we are planning to expand the number of our experiments, the number of applications that we use, and the number of users involved. We are also planning to fully exercise our model by instructing users to show some extreme behaviour for short periods and normal behaviour for long periods, so that we can see whether the model detects and predicts extreme user behaviour (Figure 8).

[Figure 8: Experimental setup. Labels: application usage (%); δ, β; intervention; ω; variation of prediction window (1 to 100); variation of time unit (10 min to 1 hour); variation of window size (10 to 500).]

8 References

[1] Carter and Katz, Computer Crime: an emerging challenge for law enforcement, FBI Law Enforcement Bulletin, pp 1-8, December 1996.
[2] Roger Blake, Hackers in The Mist, Northwestern University, December 2, 1994.
[3] National Institutes of Health, Center for Information Technology, http://www.alw.nih.gov/security/securityprog.html#commercial, October 1998.
[4] W.J. Buchanan, Handbook of Data Communications and Networks, Kluwer, 1998.
[5] SamsNet, A Hacker's Guide to Protecting Your Internet Site and Network, URL: http://mx.nsu.ru/Max_Security/ch28/ch28.htm
[6] NetworkICE Corporation, Packet Sniffing, http://www.networkice.com/advice/Underground/Hacking/Methods/Technical/Packet_sniffing/default.htm
[7] Alan Ramsbottom, FAQ: NT Cryptographic Password Attacks & Defences, 1997, http://www.omikron.de/~ecr/nthack/samfaq.htm.
[8] Chris Herringshaw, Detecting Attacks on Networks, IEEE Computer Magazine, pp 16-17, Dec. 1997.
[9] Debra Anderson, Detecting Unusual Program Behavior Using the NIDES Statistical Component, IDS Report SRI Project 2596, Contract Number 910097C (Trusted Information Systems) under F30602-91-C-0067 (Rome Labs), 1995.
[10] T. Lunt, H. Javitz, A. Valdes, et al., A Real-Time Intrusion Detection Expert System (IDES), SRI Project 6784,
Feb. 1992. SRI International Technical Report.
[11] J Pikoulas and K Triantafyllopoulos, Multivariate Regression for Predicting Behaviour in a Software Agent Computer Security System, 20th International Symposium on Forecasting, Lisbon, Portugal, June 21, 2000.
[12] Sandeep Kumar and Gene Spafford, A Pattern Matching model for Misuse Intrusion Detection, Proceedings of the 17th National Computer Security Conference, Oct. 1994.
[13] Mark Crosbie and Gene Spafford, Active Defence of a Computer System using Autonomous Agents, COAST Group, Dept. of Computer Science, Purdue University, Technical Report (95-008), 2-3, Feb 1995.
[14] The Computer Misuse Detection System, http://www.cmds.net/, 1998.
[15] Pikoulas J, Mannion M and Buchanan W, Software Agents and Computer Network Security, the 7th IEEE International Conference on the Engineering of Computer Based Systems, pp 211-217, Apr. 2000.
[16] Jean O. Dickey, Christian L. Keppenne, and Steven L. Marcus, Forecasting Regional Climate Change with Advanced Statistical Methods, Jet Propulsion Laboratory, California Institute of Technology, Pasadena.
[17] Professor Hossein Arsham, Statistical Data Analysis: Prove it with Data, University of Baltimore, http://ubmail.ubalt.edu/~harsham/stat-data/opre330.htm.
[18] Carlin B. and T. Louis, Bayes and Empirical Bayes Methods for Data Analysis, Chapman and Hall, 1996.
[19] Stanford University, GENSCAN: A Powerful tool for Gene Prediction, Vol. 8, N. 1, 1999.
[20] Steven L. Salzberg, Arthur L. Delcher, Simon Kasif and Owen White, Microbial gene identification using interpolated Markov models, Nucleic Acids Research, Vol. 26, No. 2, pp. 544-548, Oxford University Press, 1998.
[21] The Great Lakes Forecasting System, The Ohio State University (OSU) and the National Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research Laboratory (GLERL), http://superior.eng.ohiostate.edu/main/noframes/about.html
[22] Sandia National Laboratories, A Smart, Agent based simulation model, http://www-aspen.cs.sandia.gov/, Feb. 2000.
[23] J.R.M. Ameen and P.J. Harrison, Normal discount models, Journal of Statistics, 1985.
[24] Georges A. Darbellay and Marek Slama, Forecasting the short term demand for electricity. Do neural networks stand a better chance?, International Journal of Forecasting, pp. 71-83, 2000.
