Statistical and Fuzzy Approach for Database Security

Gang Lu, Junkai Yi
School of Information Science and Technology
Beijing University of Chemical Technology
Beijing 100029, China
sizheg@126.com

Kevin Lü
Brunel University, Uxbridge UB8 3PH, UK

Abstract

This paper describes a new type of database anomaly by introducing the concept of Cumulated Anomaly. A Dubiety-Determining Model (DDM), a detection model for Cumulated Anomaly based on statistical and fuzzy set theories, is proposed. DDM measures the dubiety degree of each database transaction quantitatively. A software system architecture that supports DDM for monitoring database transactions is designed. We also implemented the system and tested it. Our experimental results show that the DDM method is feasible and effective.

1. Introduction

The number of security-breaking attempts originating inside an organization is increasing steadily [5][8]. These attacks are usually made by "authorized" users of the system. Typically, in one type of intrusion, an attacker who is authorized to modify data records under certain constraints deliberately hides his intention to change data beyond those constraints by spreading the changes over different operations and different transactions. Often, in this type of attack, each individual transaction is legitimate; however, the accumulated results of the attacker's operations are malicious.

The existing Intrusion Detection Systems (IDS) can be grouped into two classes: (1) misuse detection, which maintains a database of known intrusion techniques or behaviors and detects intrusions by comparing users' behaviors against the database [7][8]; (2) anomaly detection, which analyzes user behaviors and the statistics of a process in a normal situation, and checks whether the system is being used in a different manner [3][9]. In general, misuse detection models cannot detect new, unknown intrusions [7]. Anomaly detection needs to maintain the records of users' behaviors and the statistics for normal usage, which are referred to as profiles. The profiles tend to be large.
As a result, detecting intrusions requires a large amount of system resources, and detection decisions are delayed. If attackers hide their operations in other places, anomaly detection may not even be able to detect them. It is fair to say that neither anomaly detection nor misuse detection would be able to effectively detect Cumulated Anomaly. New techniques need to be investigated.

In this study, we investigate Cumulated Anomaly and propose a model for its detection. In this model, the detection rules are set up manually based on the statistical properties of intrusions amongst the normal transactions. In addition, membership functions [5] from fuzzy set theory, with their parameters specified in the detection rules, are applied in the model to monitor and present the possibility of intrusions in real time. Membership functions assist detection rules in indicating the likelihood of a transaction being intrusive. If a transaction is identified by a detection rule as a possible intrusion, the rule is said to match the transaction. An indicator (degree) within the interval [0, 1] is then calculated. This indicator represents the dubiety degree of the transaction. Therefore, this model is named the Dubiety-Determining Model (DDM). With this method, the dubiety of various types of database transactions can be quantitatively denoted in a unified way. By showing the dubiety degrees of database transactions, the model can detect possible anomalies when their dubiety degrees are high.

The rest of the paper is organized as follows. Section 2 briefly reviews some related work. Section 3 describes the DDM method. Design and implementation issues are discussed in Section 4. In Section 5, the experimental results are introduced. Section 6 is the conclusion.

2. Related work

The widespread use of databases and the invaluable data held in them make it vital to detect any intrusion or intrusion attempt made on databases. Therefore, building on the development of intrusion detection for computer systems, intrusion detection for databases is becoming an imperative need. Besides access policies, roles, administration procedures, physical security, security models, and data inference, misuse detection and anomaly detection for databases have also received attention. Christina Yip Chung, Michael Gertz and Karl Levitt developed DEMIDS, a misuse detection system tailored to relational database systems [2]. Francesco M. Malvestuto, Mauro Mezzini and Marina Moscarini propose an approach to avoid releasing summary statistics that could lead to the disclosure of confidential individual data in [4]. In [8] and [10], Sin Yeung Lee, Wai Lup Low and Pei Yuen Wong describe an algorithm that summarizes the raw transactional SQL queries into compact regular expressions. All of them have pointed out that the content of transactions can be used to abstract the users' profiles, which are then used during misuse detection or anomaly detection. However, to make the detection results more precise, some quantitative approaches should be employed.

In existing research on database intrusion detection, fuzzy set theory is mainly used together with other theories, such as neural networks, in building profiles for anomaly detection [1][9][11]. For example, [6] uses fuzzy Adaptive Resonance Theory (ART) and a neural network to detect anomalous intrusions in database operations by monitoring the connection activities to a database. This motivates us to integrate fuzzy set theory with intrusion detection techniques to deal with Cumulated Anomaly in databases precisely and in real time.

3. Dubiety-Determining Model (DDM)

Given a metric for a random variable $X$ and $n$ observations $X_1, \ldots, X_n$, the purpose of the statistical sub-model of $X$ is to determine whether a new observation $X_{n+1}$ is abnormal with respect to the previous observations. The mean $\mathit{avg}$ and the standard deviation $\mathit{stdev}$ of $X_1, \ldots, X_n$ are defined as:

$$\mathit{avg} = \frac{X_1 + X_2 + \cdots + X_n}{n} \tag{1}$$

$$\mathit{stdev} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \mathit{avg})^2}{n}} \tag{2}$$

A new observation $X_{n+1}$ is defined to be abnormal if it falls outside a confidence interval that is $d$ standard deviations from the mean, which is denoted by $CI$:

$$CI = \mathit{avg} \pm \mathit{dev} \tag{3}$$

where $\mathit{dev} = d \times \mathit{stdev}$ with $d$ as a parameter. Note that 0 (or null) occurrences should be included so as not to bias the data. This model can be applied to variant cases such as event counters accumulated over a fixed time interval. Therefore, it applies to the case of Cumulated Anomaly.

Membership functions are used to measure the dubiety degree of each transaction. For each transaction, a value of the variable $X$ can be observed. It can be mapped into the interval [0, 1] by a membership function. We define 0 to mean completely acceptable and 1 to mean anomalous, or completely unacceptable. Values between 0 and 1 are called dubiety degrees. In this way, the dubiety of transactions can be denoted in a unified form.

An appropriate membership function is the basis of quantitative analysis of fuzzy attributes and plays a key role in fuzzy mathematics. The most widely used functions include S-shaped functions ($F_S$), Z-shaped functions ($F_Z$) and π-shaped functions ($F_\pi$). U-shaped functions ($F_U$) are defined as the complements of π-shaped functions, as Figure 1 shows. In Figure 1, we assume that $a \le b \le c$. It is straightforward to prove that when $a = b = c$, $F_S$ and $F_Z$ both have only two values, 0 and 1, while $F_\pi$ only takes the value 0 and $F_U$ only takes the value 1. By adjusting the values of $a$, $b$ and $c$, the shapes of $F_\pi$ and $F_U$ can be changed.

Figure 1. The curves of the membership functions
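As a minimal sketch, the statistical sub-model of equations (1) to (3) could be written in Python as follows. The function names `confidence_interval` and `is_abnormal`, the default d = 2 and the sample data are illustrative assumptions rather than part of the model, and the population form of the standard deviation (divisor n) is likewise an assumption.

```python
import math


def confidence_interval(observations, d):
    """Return (low, high) = avg -/+ d * stdev for the given observations."""
    n = len(observations)
    if n == 0:
        raise ValueError("at least one observation is required")
    avg = sum(observations) / n
    # Population form (divisor n); this choice is an assumption of the sketch.
    stdev = math.sqrt(sum((x - avg) ** 2 for x in observations) / n)
    dev = d * stdev
    return avg - dev, avg + dev


def is_abnormal(observations, new_observation, d=2.0):
    """True if the new observation falls outside the confidence interval."""
    low, high = confidence_interval(observations, d)
    return new_observation < low or new_observation > high


# Hypothetical event counts per time interval; zero counts would be kept too.
history = [2, 4, 4, 4, 5, 5, 7, 9]
print(confidence_interval(history, d=1))   # (3.0, 7.0)
print(is_abnormal(history, 8, d=1))        # True
```

A larger d widens the interval and makes the check more tolerant; the model leaves the choice of d to the user.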
A set $P$ containing $n$ observations $X_1, \ldots, X_n$ of a metric for a random variable $X$, i.e. $P = \{X_i \mid i = 1, 2, \ldots, n\}$, can be obtained. In $P$, there must be a minimum $X_{min}$ and a maximum $X_{max}$. The mean of all the elements in $P$ is $\mathit{avg}$, as defined in (1). We define $CI = [X_{min}, X_{max}]$. Thus, by assigning $X_{min}$, $\mathit{avg}$ and $X_{max}$ to the membership function parameters $a$, $b$ and $c$, respectively, any observation of a metric for a random variable $X$ can be mapped to a real number in [0, 1]. This real number denotes the dubiety degree of an observation of $X$. The values of $X_{min}$, $\mathit{avg}$ and $X_{max}$ can be obtained by existing approaches.

Because $X_{min}$ and $X_{max}$ are both in $CI$, $F(X_{min}) < 1$ and $F(X_{max}) < 1$ must hold (meaning that $X_{min}$ and $X_{max}$ do not cause an anomaly), where $F \in \{F_S, F_Z, F_\pi, F_U\}$. As a result, we have the definitions of the four types of membership functions shown in Figure 2. The parameter $\alpha$ can be assigned a proper value by users according to the application. Nevertheless, it is recommended that $\alpha$ not be much less than 1, so that the result values in $(b, c]$ remain distinguishable.

4. Architecture based on DDM

The architecture for database transaction monitoring based on DDM is designed as shown in Figure 3. The user interface (UI) provides tools for interaction, which include Setting Rules and the display of Dubiety-Determining Results. Setting Rules allows users to set up monitoring policies. These monitoring policies are then formatted and transferred into Detection Rules Base by Mapping to Rules. The information about each database transaction is organized into Audits Base by Sensor. Event Analyzing selects every new audit record from Audits Base, and then checks it against the detection rules in Detection Rules Base. Finally, Event Analyzing calculates the dubiety degree for the audit record, and forwards the results to Dubiety-Determining Result.

The other main components of the architecture are as follows.

Audits Base and Detection Rules Base. Audits Base stores the audit records generated by Sensor, while Detection Rules Base stores the detection rules defined manually.
Setting Rules. Used to define detection rules, this component specifies which attributes of transactions to monitor, what types of membership functions to use, what the values of the parameters in the membership functions are, and so on.

Mapping to Rules. Once the monitoring policy and the membership function have been decided, Mapping to Rules translates them into the format of detection rules to be stored in Detection Rules Base.

Sensor. This module monitors the transactions of application databases in real time. By analyzing each transaction processed, it collects information about the transaction, and then stores it in Audits Base.

Event Analyzing. This is the centre of the whole architecture. The monitoring algorithm is implemented in this module. Each record in Audits Base is processed by the Event Analyzing module and matched against the rules in Detection Rules Base. The value of the monitored attribute is then obtained. By substituting this value into the membership function defined in the rule, the result of the function is calculated as the degree of dubiety.

There are two basic data structures required in DDM: Audit Record and Detection Rule. Audit Record records the information about each database transaction. Detection Rule specifies the format of the detection rules. The details of the two structures are defined as follows.

Audit Record. This data structure is a 6-tuple recording the information of each database transaction:

<AID, UID, SQLText, Time_stamp, Data1, Data2>

$$F_S(x, a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{1}{2}\left(\dfrac{x-a}{b-a}\right)^2, & a < x \le b \\ \dfrac{\alpha+1}{2} - \dfrac{\alpha}{2}\left(\dfrac{c-x}{c-b}\right)^2, & b < x \le c,\ 0 < \alpha < 1 \\ 1, & x > c \end{cases}$$

$$F_Z(x, a, b, c) = 1 - F_S(x, a, b, c)$$

$$F_\pi(x, a, b, c) = \begin{cases} F_S\left(x, a, \dfrac{a+b}{2}, b\right), & x \le b \\ F_Z\left(x, b, \dfrac{b+c}{2}, c\right), & x > b \end{cases}$$

$$F_U(x, a, b, c) = 1 - F_\pi(x, a, b, c)$$

Figure 2. The definitions of the membership functions

Figure 3. The architecture for database transaction monitoring based on DDM
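The definitions in Figure 2 translate almost directly into code. The following Python sketch is one possible reading of them; the function names are hypothetical, and the default α = 0.9 is simply the value used later in the experiment.

```python
def f_s(x, a, b, c, alpha=0.9):
    """S-shaped membership function; assumes a <= b <= c and 0 < alpha < 1."""
    if x <= a:
        return 0.0
    if x <= b:
        # Here a < x <= b, so b - a > 0 and the division is safe.
        return 0.5 * ((x - a) / (b - a)) ** 2
    if x <= c:
        # Here b < x <= c, so c - b > 0; the value at x = c is (alpha + 1) / 2.
        return (alpha + 1) / 2 - (alpha / 2) * ((c - x) / (c - b)) ** 2
    return 1.0


def f_z(x, a, b, c, alpha=0.9):
    """Z-shaped function, the complement of the S-shaped one."""
    return 1.0 - f_s(x, a, b, c, alpha)


def f_pi(x, a, b, c, alpha=0.9):
    """Pi-shaped function: rises like F_S up to b, then falls like F_Z."""
    if x <= b:
        return f_s(x, a, (a + b) / 2, b, alpha)
    return f_z(x, b, (b + c) / 2, c, alpha)


def f_u(x, a, b, c, alpha=0.9):
    """U-shaped function, the complement of the pi-shaped one."""
    return 1.0 - f_pi(x, a, b, c, alpha)


# With a = 0, b = 3, c = 5, the S-shaped degree grows with the observed value.
for value in (0, 3, 5, 6):
    print(value, f_s(value, 0, 3, 5))
```

Because each branch is only reached when its denominator is strictly positive, the degenerate case a = b = c needs no special handling in this sketch.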
AID is the identifier of each audit record.
UID records the user name of the transaction.
SQLText records the content of the SQL statement of the transaction.
Time_stamp records the time when the transaction is executed.
Data1 is the first data field that the transaction relates to, for example the data value before an update.
Data2 is the second data field that the transaction relates to, for example the data value after an update.

To make it clearer, from now on in this paper we will use the term audit record instead of transaction.

Detection Rule. This data structure is a 10-tuple defining the format of the detection rules:

<RID, UID, Action, Obj1, Obj2, Condition, Time_window, Mon_type, Function, Enable>

RID, starting with the letter R, is the identifier of each detection rule.
UID indicates which user the rule is aimed at.
Action indicates what type of operation the rule is related to, such as select, update, delete and so on.
Obj1 and Obj2 record the database objects (table, view, procedure, and so on) for which the rule is valid. Obj1 is the first object that Action refers to, such as a table, a view or a procedure. Obj2 is the second one. If Obj1 is a table or a view, Obj2 will be a field name.
Condition indicates the condition of Action. Usually it is the condition part (where clause) of the SQL statement.
Time_window specifies a number of hours as a time range. The audit records that occurred within that time range before the one currently being checked are sought by the rule.
Mon_type is the type of monitor. It has two values: C and S. C is used for counting occurrences and S is used for recording the sum value.
Function is a sub-tuple recording the information of the membership function used by the rule:

<FID, A, B, C>

FID specifies which type of membership function to use. It has four values: S means $F_S$, Z means $F_Z$, P means $F_\pi$, and U means $F_U$. A, B and C store the values of a, b and c respectively (see the definition of the membership functions).
Enable is a switch. When it is 1, the rule is valid; otherwise, it is not.

5. Experimental results

The experiments are performed on the DBMS of Microsoft SQL Server 2000 on Microsoft Windows Server 2003 SP1, to show whether DDM can discover Cumulated Anomaly behaviors. The example database Northwind of SQL Server is used in this study. Its table Products stores product-related data, including ProductID and UnitPrice. Suppose there is a product whose ProductID is 9 in Products. Assume a member of staff, A, is authorized to modify the UnitPrice of Product 9. However, if the UnitPrice has been changed too much or too often, it could be suspicious. It is defined that UnitPrice should not be changed more than 4 times in 30 days, and the sum of the changed values should not be more than 3 pounds in 90 days. Audits Base and Detection Rules Base are built according to the two basic structures defined.

Data. 30000 normal audit records are stored in the database. Their schema is described in Section 4. Their Time_stamps (system clock) span a period of three months. The values of the field SQLText are common database operations in the form of SQL statements, including selecting data from a table, updating the data in a table, inserting data into or deleting data from a table, executing a procedure, and opening a database. Following the above assumptions, 12 additional audit records for A's updates of the UnitPrice of Product 9 are constructed and mixed into the existing 30000 audit records. These 12 records are distributed over the range of three months.

The Detection Rules Base (described in Section 4) contains two typical detection rules, listed in Table 1 (the Enable column is omitted to keep the table from becoming too wide). For example, R02 is used to monitor the audit records with A as UID and update [Products] set UnitPrice=p where ProductID=9 as SQLText (p is a number). The data items before and after the update operation are recorded in the fields Data1 and Data2.
When an audit record R that meets the conditions of R02 occurs, the algorithm seeks the audit records meeting the conditions of R02 that occurred within the 2160 hours before R, and sums up the differences between Data1 and Data2 in each of them. Then, the sum is substituted into the membership function defined in R02. Finally, the result value of the function is calculated as the dubiety degree of that audit record. As this is a real-time process, an audit record is examined as soon as it arrives.
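The matching and scoring steps described above can be sketched in Python as follows. This is an illustrative reading under several assumptions, not the implemented system: the class and function names, the substring-based `matches` helper, the sign convention Data2 minus Data1, and the choice to include the current record in the count or sum are all hypothetical. When several rules match, the highest degree is returned.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def f_s(x, a, b, c, alpha=0.9):
    """S-shaped membership function as defined in Figure 2."""
    if x <= a:
        return 0.0
    if x <= b:
        return 0.5 * ((x - a) / (b - a)) ** 2
    if x <= c:
        return (alpha + 1) / 2 - (alpha / 2) * ((c - x) / (c - b)) ** 2
    return 1.0


def f_pi(x, a, b, c, alpha=0.9):
    """Pi-shaped function built from F_S and its complement F_Z."""
    if x <= b:
        return f_s(x, a, (a + b) / 2, b, alpha)
    return 1.0 - f_s(x, b, (b + c) / 2, c, alpha)


# FID codes mapped to membership functions.
MEMBERSHIP = {
    "S": f_s,
    "Z": lambda x, a, b, c: 1.0 - f_s(x, a, b, c),
    "P": f_pi,
    "U": lambda x, a, b, c: 1.0 - f_pi(x, a, b, c),
}


@dataclass
class AuditRecord:
    aid: int
    uid: str
    sql_text: str
    time_stamp: datetime
    data1: float
    data2: float


@dataclass
class DetectionRule:
    rid: str
    uid: str
    action: str
    obj1: str
    obj2: str
    condition: str
    time_window: int  # hours
    mon_type: str     # "C" counts occurrences, "S" sums changed values
    fid: str
    a: float
    b: float
    c: float
    enable: int = 1


def matches(rule, rec):
    """Hypothetical, deliberately crude matcher based on substring checks."""
    sql = rec.sql_text.lower()
    parts = (rule.obj1, rule.obj2, rule.condition)
    return (rule.enable == 1 and rec.uid == rule.uid
            and sql.startswith(rule.action.lower())
            and all(p.lower() in sql for p in parts))


def dubiety(rec, history, rules):
    """Highest dubiety degree of rec over all matching rules (0.0 if none).

    history holds the earlier audit records and must not contain rec itself.
    """
    degrees = []
    for rule in rules:
        if not matches(rule, rec):
            continue
        start = rec.time_stamp - timedelta(hours=rule.time_window)
        window = [r for r in history
                  if matches(rule, r) and start <= r.time_stamp <= rec.time_stamp]
        window.append(rec)  # assumption: the current record is included
        if rule.mon_type == "C":
            value = len(window)
        else:
            value = sum(r.data2 - r.data1 for r in window)
        degrees.append(MEMBERSHIP[rule.fid](value, rule.a, rule.b, rule.c))
    return max(degrees, default=0.0)
```

A real Sensor would need proper SQL parsing rather than substring checks; for instance, the condition ProductID=9 would also match ProductID=90 in this sketch.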
Table 1. The two detection rules

RID  UID  ACTION  Obj1      Obj2       CONDITION    TIME_WINDOW  MON_TYPE  FID  A     B  C
R01  A    update  Products  UnitPrice  ProductID=9  720          C         S    0     3  5
R02  A    update  Products  UnitPrice  ProductID=9  2160         S         U    -3.0  0  3

It can be seen from Table 1 that the two rules are both designed to monitor A's operations of updating the UnitPrice of Product 9. R01 monitors the number of occurrences of the operation over 30 days (720 hours), while R02 monitors the accumulated values modified over 90 days (2160 hours).

Results. In this experiment, we let $\alpha = 0.9$. As a result, $F_S(X_{max}) = F_S(c) = (\alpha + 1)/2 = 0.95$. The experiment contains three tests. In Test 1 only R01 is enabled. In Test 2 only R02 is enabled. Both R01 and R02 are enabled in Test 3 to show the combined results.

Figure 4 shows all results. Figure 4 (a) shows the value of UnitPrice after each of A's updates. Figure 4 (b) shows the monitoring result of using rule R01. We can see that the dubiety degree increases gradually. However, it never reaches 1, which means that no anomaly is detected by R01. Figure 4 (c) shows the results of monitoring the modified UnitPrice of Product 9 over 90 days by R02. The dubiety degree gets closer and closer to 1, and at the end it reaches 1. According to the definition of DDM, an anomaly may have occurred. When R01 and R02 are both enabled in Test 3, the results are as shown in Figure 4 (d). Figure 4 (d) can also be regarded as the combination of Figure 4 (b) and Figure 4 (c), obtained by selecting, for each AID, the point with the higher dubiety degree between (b) and (c). In general, when several detection rules match the same audit record, the highest dubiety degree amongst these rules is selected. From the results, we can see that A's operations cause an anomaly.

6. Conclusion

A new type of database anomaly, Cumulated Anomaly, is investigated. A new detection method, the Dubiety-Determining Model (DDM), has been proposed for it. Based on DDM, an architecture for database transaction monitoring is designed and implemented.
Tests have been performed to verify the effectiveness of our novel method. The results suggest that our method is capable of identifying suspicious user behaviors. We are currently considering developing a method based on DDM for general anomaly detection in databases.

Figure 4. The result of the experiment

References

[1] Chan Man Kuok, Ada Fu, Man Hon Wong. Mining fuzzy association rules in databases. SIGMOD Record, 1998, 27(1), 41-46.
[2] Chung C Y, Gertz M, Levitt K. DEMIDS: A Misuse Detection System for Database Systems. In: The Third Annual IFIP TC-11 WG 11.5 Working Conf. on Integrity and Internal Control in Information Systems, 1999.
[3] Darren Mutz, Fredrik Valeur, Giovanni Vigna. Anomalous system call detection. ACM Transactions on Information and System Security, Vol. 9, No. 1, February 2006, 61-93.
[4] Francesco M. Malvestuto, Mauro Mezzini, Marina Moscarini. Auditing sum-queries to make a statistical database secure. ACM Transactions on Information and System Security, Vol. 9, No. 1, February 2006, 31-60.
[5] Pedrycz Witold, Gomide Fernando. An Introduction to Fuzzy Sets: Analysis and Design. Cambridge, Mass.: MIT Press, 1998.
[6] Rung Ching Chen, Cheng Chia Hsieh. An anomaly intrusion detection on database operation by fuzzy ART neural network. Proceedings of IC 2004, 839-844.
[7] Sato I., Okazaki Y., Goto S. An improved intrusion detecting method based on process profiling. Transactions of the Information Processing Society of Japan, vol. 43, no. 11, Nov. 2002, 3316-26.
[8] Sin Yeung Lee, Wai Lup Low, Pei Yuen Wong. Learning fingerprints for a database intrusion detection system. ESORICS 2002, LNCS 2502, 264-279.
[10] Tian-Qing Zhu, Ping Xiong. Optimization of membership functions in anomaly detection based on fuzzy data mining. Proceedings of 2005 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 05EX1059), Vol. 4, 1987-92, 2005.
[11] Wai Lup Low, Joseph Lee, Peter Teoh. DIDAFIT: Detecting intrusions in databases through fingerprinting transactions. ICEIS 2002.