Intelligent Agents and Fraud Detection

Jia Wu and Jongwoo Park

1. Introduction

Fraud has plagued the telecommunication industry, financial institutions, and other organizations for a long time. The types of fraud addressed in this paper include cellular communication fraud, credit card transaction fraud, and computer intrusion. These frauds cost businesses millions of dollars per year. As a result, fraud detection has become an important and urgent task for these businesses. A number of methods have been implemented to detect fraud, drawing on both statistical approaches (e.g. data mining) and hardware approaches (e.g. firewalls, smart cards). Data mining is currently a popular way to combat fraud because of its effectiveness.

Hand et al. define data mining as a well-defined procedure that takes data as input and produces output in the form of models or patterns. In other words, the task of data mining is to analyze a massive amount of data and to extract usable information that we can interpret for future use. In doing so, we have to define a clear goal for the data mining and find the structure of models or patterns that fits the given data set. Once we have the right model for the data, we can use it to predict future events by classifying the data.

In terms of data mining, fraud detection can be understood as a classification problem. Input data is analyzed with an appropriate model to determine whether it implies fraudulent activity. A classification model is developed by recognizing the patterns of past fraudulent behavior. The model can then be used to flag suspicious activity in new data.

One limitation of using data mining alone in fraud detection is efficiency. Data mining and model construction take a lot of time, which prevents them from detecting fraud in real time. This is a serious drawback since, on many occasions such as online credit card transactions, we need to detect fraudulent activity in a very short period of time. Otherwise, the loss could be huge.

With the rapid development of information technologies, many new methods that exploit the power of IT to detect fraud have been created. One of these recent methods is to use intelligent agents for fraud detection, which combines computer technologies with data mining knowledge. In this paper we examine the use of intelligent agents in fraud detection. By intelligent agents we mean computer programs that can act on behalf of a person to do various jobs.

Intelligent agents can automate a large portion of the fraud detection process and require little human intervention. Additionally, intelligent agents do not stick to one model or rule. They can construct new models and rules for fraud detection with their machine learning capabilities, so it is harder to deceive them than other fraud detection programs. Besides, in a multi-agent system, many intelligent agents can work in parallel and cooperate with each other. This not only accelerates the detection process but also increases detection accuracy. Moreover, intelligent agents can be deployed online for real-time detection, an extremely desirable feature for online credit card fraud detection and network intrusion detection.

The rest of the paper is organized as follows. Section 2 examines a variety of frauds, namely cellular fraud, credit card transaction fraud, and computer intrusion. Section 3 discusses various data mining algorithms for detecting fraud and one pattern comparison algorithm. Section 4 describes three different types of intelligent agents. Section 5 gives some applications of intelligent agents for fraud detection. Section 6 presents our proposed areas for further research on this topic, in which we attempt to apply intelligent agents to fraud detection in continuous auditing. Section 7 concludes.

2. Types of Frauds

2.1 Frauds in Mobile Communications

In the United States, fraud in mobile communications costs the industry hundreds of millions of dollars per year (Walters and Wilkinson 1994, Steward 1997). The nature of mobile communication networks makes it easy for criminals to commit fraud and hard to trace them. One of the most widespread and costly frauds in this area is cloning fraud. A mobile phone is identified by two numbers, the mobile identification number (MIN) and the electronic serial number (ESN). Cloning occurs when a criminal uses a mobile communication scanner to steal the MIN and ESN of a legitimate subscriber and programs them into another phone. Afterwards, the illegitimate user can make unlimited calls that are billed to the legitimate user. On one hand, cloned phones attract many illicit users because the calls are free and untraceable. On the other hand, the fraudulent use of cloned phones costs mobile communication service providers millions of dollars in lost revenue. In addition, because calls made from cloned phones are very difficult to trace, criminals and terrorists can take advantage of them to perpetrate more serious crimes.

2.2 Frauds in Credit Card Transactions

Credit card fraud has long been a headache for credit card companies. With the growth of online business around the world, the number of credit card frauds has also increased drastically. A criminal can either steal the physical credit card to use offline or just obtain the card number to use online. Losses from credit card fraud are higher than those from mobile communication fraud since the former usually involves large transaction amounts. Like mobile communication fraud, credit card fraud is not easy to trace.

2.3 Intrusions in Computer Systems

Intrusion detection plays a vital role in today's networked environment. Intrusions into computer systems include unauthorized users penetrating the systems and authorized users abusing their privileges. Intrusion into computer systems is the most widespread type of fraud since it is easy to commit. Furthermore, it is very difficult to trace

the intruders because they may hide in any corner of the world so long as they have an Internet connection.

3. Fraud Detection Algorithms

Fraud detection is founded on data mining techniques such as classification and association rules. Research on fraud detection has focused on pattern matching, in which abnormal patterns are distinguished from normal ones. We focus on the Detector Constructor framework, called DC-1, proposed by Fawcett and Provost (1997) for telephone call fraud detection, and on the intrusion detection framework proposed by Lee, Stolfo, and Mok (1998).

3.1 Detector Constructor Systems (DC-1) (Fawcett and Provost, 1997)

Fawcett and Provost's (1997) approach focuses on the sensitivity and profile of individual accounts, and on aggregating them to obtain better predictive power. They apply their approach to the account history of cellular calls. The Detector Constructor framework (hereafter DC-1) starts by analyzing available call records, including defrauded calls.

(1) Classification Rule Learning

First, based on the given history of an account, the calls of the account are labeled as fraudulent or legitimate (non-fraudulent), and a local set of rules for the account is searched for. For example, for one specific account, the following classification rule is devised:

(Time-of-Day = Night) AND (Location = Bronx) → Fraud, with certainty factor = 0.89

The certainty factor is defined as a simple frequency-based probability estimate. This rule means that a call made at night from the Bronx can be considered fraudulent with a probability of 89%.
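
As a rough illustration of how such a certainty factor could be computed, the following Python sketch estimates it as the fraction of an account's calls matching the rule's conditions that are labeled fraudulent. The Call record, the certainty_factor function, and the sample call history are hypothetical and chosen only so that the estimate comes out near 0.89; they are not taken from DC-1 itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    time_of_day: str  # e.g. "Night" or "Day" (hypothetical encoding)
    location: str     # e.g. "Bronx"
    fraudulent: bool  # label taken from the account history


def certainty_factor(calls, time_of_day, location):
    """Fraction of calls matching the rule's conditions that are labeled fraudulent.

    Returns None when no call matches, since no estimate can be made.
    """
    matching = [
        c for c in calls
        if c.time_of_day == time_of_day and c.location == location
    ]
    if not matching:
        return None
    return sum(c.fraudulent for c in matching) / len(matching)


# Hypothetical history for one account: 9 night calls from the Bronx,
# 8 of them labeled fraudulent, plus 20 legitimate daytime calls.
calls = (
    [Call("Night", "Bronx", True)] * 8
    + [Call("Night", "Bronx", False)] * 1
    + [Call("Day", "Manhattan", False)] * 20
)

cf = certainty_factor(calls, "Night", "Bronx")
print(round(cf, 2))
```

With this invented history, 8 of the 9 matching calls are fraudulent, so the estimate rounds to 0.89. A rule with few matching calls would give an unreliable estimate, which is one reason a rule found in a single account cannot be trusted on its own.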

However, since the rules generated this way are specific to a single account, a set of general a priori rules that can serve as fraud indicators is still required. To generate rules that apply to as many accounts as possible, they devise an algorithm controlled by two parameters, T_rules and T_accts. T_rules is a threshold on the number of rules required to cover each account, and T_accts is the number of accounts in which a rule must have been found to be selected at all. A rule is selected after an account has been examined with a certain number of rules and the rule has been applied to a certain number of accounts. The list of rules generated from each account is reviewed. Finally, the rule that appears most frequently in the list for the entire account set is chosen. [Refer to Appendix I for their algorithm for rule selection for DC-1].

(2) Construction of Profiling Monitors

After rules are selected, a set of monitors is built. The purpose of profiling monitors is to investigate the sensitivity of individual accounts to the general rules. The construction of profiling monitors consists of two stages, a profiling stage and a usage stage. In the profiling stage, a general rule is applied to a portion of an account's legitimate usage to evaluate the account's normal activity. In other words, the legitimate activity of an account is summarized into profiling monitors through the use of templates. The statistics of the account's normal activity are saved with that account. Later, in the usage stage, the monitor is applied to the whole of the account's usage, one account-day at a time. The resulting statistics can be used to examine how abnormal the account's usage is on each day.

During this process, the profiling monitors are built by the monitor constructor, which is a set of templates. These templates examine the conditions of the rules, and each rule-template pair is finally instantiated as a profiling monitor. For example, templates are built from various statistical expressions, such as a threshold monitor and a standard deviation monitor. In the threshold monitor, a binary output is produced according to whether the user's behavior on a day exceeds the threshold defined for the portion of the day. In the standard deviation monitor, different output values

are defined according to how much the user's behavior on a day deviates from the rule's condition defined for the portion of the day.

(3) Combination of Evidence from the Monitors

To improve the confidence of the detection, the outputs of the monitors are combined using evidence obtained by applying the monitors to sample data. For example, the generated monitors are applied to a sample account-day, and their outputs, indicating whether fraudulent activity is detected or not, are expressed as a result vector for that day. The evidence about the account-day, that is, whether it truly contains fraud or not, is supplied together with the outputs. The outputs are then weighted, and the combination is trained with a threshold on the weighted sum. Hence, more confidence can be placed in monitors with larger weights, which helps prevent false alarms. Some of the resulting monitors may still be redundant or ineffective. To reduce the number of monitors, they propose a sequential forward selection process. Finally, fraud detectors are selected from the monitors combined with evidence.

3.2 Intrusion Detection Framework (Lee, Stolfo, and Mok, 1998)

Lee, Stolfo, and Mok (1998) design an intrusion detection framework using data mining techniques. Intrusion detection techniques fall largely into two categories, anomaly detection and misuse detection. Anomaly detection focuses on extracting normal (non-fraudulent) usage patterns and finding deviations from them. Misuse detection, on the other hand, captures the patterns of previous intrusions and the vulnerable spots of a system from historical audit data; an intrusion attempt is then compared with these previously identified patterns.
Their intrusion detection framework starts from the observation that the network traffic audit data may record a series of access failures resulting from intrusion attempts. Therefore, it is possible to detect intrusions (fraudulent behavior) by using classification and association rules together with episode analysis.
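
The association rules referred to above are characterized by two measures, support and confidence. The Python sketch below computes both by simple counting over a small set of audit records. The record layout, the attribute names (service, auth), and the data are assumptions made for illustration, not values from Lee, Stolfo, and Mok's experiments.

```python
def count(records, itemset):
    """Number of records that contain every item in itemset."""
    return sum(1 for r in records if itemset <= r)


def support(records, itemset):
    """Fraction of all records that contain itemset."""
    return count(records, itemset) / len(records)


def confidence(records, lhs, rhs):
    """Among records containing lhs, the fraction that also contain rhs."""
    n_lhs = count(records, lhs)
    if n_lhs == 0:
        return None  # the rule never fires, so confidence is undefined
    return count(records, lhs | rhs) / n_lhs


# Hypothetical audit records, each a set of attribute=value items.
records = (
    [frozenset({"service=telnet", "auth=failed"})] * 3
    + [frozenset({"service=telnet", "auth=ok"})] * 1
    + [frozenset({"service=http", "auth=ok"})] * 6
)

lhs = {"service=telnet"}
rhs = {"auth=failed"}

s = support(records, lhs | rhs)    # 3 of 10 records
c = confidence(records, lhs, rhs)  # 3 of the 4 telnet records
print(f"service=telnet -> auth=failed [c={c}, s={s}]")
```

In this invented data set, failed telnet logins make up 30% of all records, and 75% of telnet connections end in a failed login, so the rule would be reported with confidence 0.75 and support 0.3.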

(1) Association Rules and Frequent Episode Rules

Their framework starts with the expression of an association rule, X → Y [c, s]. X and Y are item sets, that is, subsets of the attributes (columns) of the data set. s is the support of the rule, support(X ∪ Y), and c is its confidence, support(X ∪ Y) / support(X). Association rules are mined with the Apriori algorithm (Agrawal and Srikant, 1994), in which frequent item sets of length 1 are successively joined into longer candidate item sets, while candidates containing infrequent subsets are pruned. If the support of an item set is greater than a given threshold, the item set is considered frequent. For example, trn → rec.humor [0.3, 0.1] means that reading rec.humor with trn accounts for 10% of a user's activities, and that when the user invokes trn, 30% of the time the user reads rec.humor.

To capture the sequential nature of events, they use the concept of frequent episodes based on minimal occurrences devised by Mannila and Toivonen (1996). A frequent episode rule is written X, Y → Z [c, s, window]. This expression denotes an episode in which X precedes Y and Y precedes Z, occurring with the given confidence and support. Each occurrence has a width (time interval) that is less than the value of window.

(2) Introduction of the Axis Attributes and Reference Attributes

To prevent meaningless patterns from being generated, they devise the concept of axis attributes and reference attributes. Axis attributes are the attributes essential to the construction of association patterns, so an item set must contain them to yield a meaningful association pattern. For example, if the service provided by a network connection is important, the service attribute becomes an axis attribute. The association pattern can then be expressed as follows [refer to Table 1 of Appendix II]:

(service = smtp, src_bytes = 200, dst_bytes = 300, flag = SF),

(service = telnet, flag = SF) → (service = http, src_bytes = 200), [0.2, 0.1, 2s]

In addition, they devise the concept of reference attributes. They find that in some intrusion patterns one attribute plays the role of a subject, and the action attributes refer to that subject attribute. For example, the web log records show that the sequence /images, /images, /shuttle/missions/sts-71 is requested by the same remote host, his.moc.kw [refer to Table 2 of Appendix II]. The next time the same sequence of requests is recognized, it is possible to check whether the new sequence refers to the same subject attribute. If it does not, the episode can be dropped from the candidate patterns. By devising axis and reference attributes and defining a frequent episode algorithm, they state that pattern finding can be improved.

(3) Level-wise Approximate Mining

In some cases, a pattern with a low frequency matters. However, if the support threshold is lowered to capture a less frequent but important pattern, the number of rules may grow large. To prevent this, they propose level-wise approximate mining. First, the episodes with high-frequency axis attribute values are searched. Second, the episodes with low-frequency axis attribute values are searched by lowering the support threshold, while the old high-frequency axis attribute values are retained. Since the episodes for the old high-frequency axis attribute values have already been found, only the new infrequent values, which are relevant to the pattern search, need to be considered for new patterns. For example, assuming the following association and episode rule is generated,

(service = smtp, src_bytes = 200), (service = smtp, src_bytes = 200) → (service = smtp, dst_bytes = 300), [0.3, 0.3, 2s]

in the second level-wise rule, the axis attribute value is changed from smtp (the frequent one) to http (the infrequent one) and the support threshold is decreased to 0.1.

(service = smtp, src_bytes = 200), (service = http, src_bytes = 200) → (service = smtp, dst_bytes = 300), [0.4, 0.1, 2s]

[Refer to Appendix III for the algorithm for level-wise approximate mining of frequent episodes]

3.3 Algorithms for Pattern Comparisons (Lee, Stolfo, and Mok, 1999)

Their data come from simulated intrusion attempts with attacking programs. They observe that the patterns of a normal traffic data set differ from the patterns of simulated intrusion attacks, and suggest that by iteratively comparing the two sets of patterns it is possible to isolate the patterns of intrusion attacks. During this process, axis attributes and reference attributes are crucial for rapid pattern comparison. Through pattern comparisons, fraud patterns can be generated.

(1) Encoding Scheme

First, after level-wise approximate mining with axis and reference attributes, a candidate classifier is merged and selected. Once we have the frequent patterns of normal traffic and of intrusion attacks, each pattern can be encoded as a series of numbers that can be compared. The encoded numbers are then compared, and the most or least similar patterns are selected according to the purpose.

Suppose the table of data has n attributes, so that each row can represent an association. Some rows have the full set of attributes, while others are missing some. First, the attributes of an association are ordered by (user-defined) importance, such as flag, axis attributes, reference attributes, and

so on. Second, the positions of missing attributes are filled with the null value, 0. Therefore, an association can be expressed in the complete and ordered form (A_1 = v_1, A_2 = v_2, ..., A_n = v_n). For example, associations are encoded as shown in Table 3 [refer to Table 3 of Appendix II]. A different number is assigned to each distinct value of an attribute, and 0 is assigned to a missing attribute. Since the flag attribute tells whether the association comes from normal traffic (SF) or from an intrusion attack (SO), it comes first in order of importance in the encoding.

(2) Comparing Two Patterns

After the associations are encoded, an episode is mapped by combining its associations. For example, after encoding, association X becomes x = x_1 x_2 ... x_n, association Y becomes y = y_1 y_2 ... y_n, and association Z becomes z = z_1 z_2 ... z_n. An episode X, Y → Z can then be expressed as the one-dimensional series of numbers x_1 z_1 y_1 x_2 z_2 y_2 ... x_n z_n y_n. For example, if an intrusion attack episode is given as

(flag = SO, service = http), (flag = SO, service = http) → (flag = SO, service = http) [0.93, 0.03, 2],

the encoded episode becomes [encoded value missing]. When a normal traffic episode is given as

(flag = SF, service = http), (flag = SF, service = icmp_echo) → (flag = SF, service = http),

the encoded episode becomes [encoded value missing]. Consequently, two episodes can be compared by subtracting one from the other and taking the absolute value of the difference at each digit (the diff score). For example, for the two episodes above, the diff score d_x1 d_z1 d_y1 d_x2 d_z2 d_y2 ... d_xn d_zn d_yn is [value missing].

Based on this method, they give the following procedure for selecting patterns: (1) Encode all patterns. (2) For each pattern from the

intrusion dataset, calculate its diff score with each normal pattern; keep the lowest diff score as the intrusion score for this pattern. (3) Output all patterns that have non-zero intrusion scores, or a user-specified top percentage of patterns with the highest intrusion scores.

4. Intelligent Agents in Fraud Detection

Fraud detection is a non-trivial task in this age of information explosion. It faces three major challenges.

First, fraud detection usually involves a large amount of data. In the United States, there were 25 million cellular phone users in 1997, who made about 30 million calls per day (Abu-Hakima et al., 1997), and these numbers are estimated to have doubled in the last three years. In Spain, more than 1.2 million Visa card operations take place in a given day, 98% of them handled online (Dorronsoro et al., 2001). Detecting fraud in such a high volume of data is worse than finding a needle in a haystack: it is easy to tell a needle from hay, but hard to tell fraudulent activities from legitimate ones since they look similar.

Second, fraud detection needs to be highly accurate. Although the total amount of fraudulent activity is very high, the fraud rate is low relative to the gigantic volume of legitimate operations. For credit card transactions in general, the fraud rate is 0.93%, and for online credit card transactions it is 1.97%. Thus, a good fraud detection mechanism should both catch frauds and keep false alarms down. In statistical terms, it should be low in both Type I and Type II error. On one hand, a low fraud coverage rate increases losses for service providers. On the other hand, a high false alarm rate can irritate customers and drive them away from the companies' business. As a result, no prediction success rate below 99.9% is acceptable (Brause and Hepp, 1999).

Third, frauds need to be detected fast. The expansion of telecommunication networks, the growth of e-business, and the wide deployment of computer systems have brought convenience to legitimate users and criminals alike. A criminal can commit many

frauds with high dollar amounts in a short period of time. Moreover, legitimate users and customers will lose patience if they have to wait too long for a fraud check on an operation or transaction. Thus, we need to detect fraud in a very short period of time. Otherwise, the damage will be costly and the business will lose customers.

It is hard for traditional fraud detection methods to satisfy these requirements. For example, the traditional data mining method for fraud detection requires all data to reside in the computer's main memory, which is impossible when a huge volume of data is involved. Besides, traditional fraud detection methods suffer from either a low coverage rate or a high false alarm rate.

Fraud detection methods can also be circumvented. If a mobile phone service provider requires a Personal Identification Number before a call is made, the criminal can clone that number. If a computer administrator deploys a firewall to block illicit computer use, an intruder can find a configuration weakness in the firewall and bypass it. Many fraud models constructed with traditional statistical methods generate numerous false alarms, and if criminals change their patterns, the fraud models can be rendered useless.

Moreover, traditional fraud detection methods are usually slow because they require a lot of human intervention. Traditional data mining requires a person to sample a data set, analyze it, establish fraud models, and eventually apply the models for fraud detection. If a new fraud pattern emerges, the process has to be repeated. This process needs to be executed offline, and it usually takes a long time.

Intelligent agents can overcome these obstacles in fraud detection. First, since a data set is handled by a number of agents, each agent only needs to deal with one small piece of the data set. If these agents are deployed on different computers, each piece can reside in the main memory of its computer. Second, intelligent agents do not stick to one rule or model to detect fraud. They are able to derive new rules or models when they receive new inputs. In addition, multiple rules or models can be taught to intelligent agents to improve fraud detection. In short, intelligent agents are intelligent enough to defeat sophisticated criminals. Last but not least, intelligent agents can

rapidly detect fraud through cooperation, and they can be placed online to detect fraud in real time.

In this paper, we discuss three major types of intelligent agents for fraud detection. The first type is a classification learning multi-agent system, the second is Java Agents for Meta-learning (JAM), and the third is artificial neural network agents.

4.1 Classification Learning Multi-agent System

Classification learning agents, or rule-based agents, have been studied extensively by many researchers. This paper examines a classification learning multi-agent system specifically designed to detect mobile communication fraud. This system was proposed by Abu-Hakima et al. It consists of three types of agents, namely the Personal Communication Agents (PCAs), the Mobility Network Agents (MNAs), and the Fraud Breaking Agents (FBAs).

Personal Communication Agents

An important function of PCAs is to set up a user profile. A PCA can monitor and log all outgoing telephone numbers, calling times and durations, and receivers' information. After PCAs have gathered this information, they store it in a user profile database, and compare and analyze it against the user's previous calling history. The user's calling pattern can be generated using the DC-1 algorithm described earlier. For example, one of the user's calling patterns may be that the user makes business calls from 9:00 AM to 5:00 PM and personal calls from 5:00 PM to 11:30 PM. Therefore, if the user makes a business call after 11:30 PM, it is quite possible that the call is fraudulent. The PCA compares the latest user communication with the historical information stored in the user profile database. If it finds that the call is atypical, it tries to inform the user by another means of communication, such as a pager or a regular phone. If the user cannot be reached, the PCA switches off the ongoing phone call.
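
The PCA behavior described above can be sketched as a small decision procedure. The Python code below is only an illustration under simplifying assumptions: the profile is a fixed table of time windows per call type rather than a pattern learned with DC-1, and the names PROFILE, is_atypical, and handle_call are hypothetical.

```python
from datetime import time

# Hypothetical user profile: the usual time window for each type of call,
# mirroring the example of business and personal calling hours in the text.
PROFILE = {
    "business": (time(9, 0), time(17, 0)),
    "personal": (time(17, 0), time(23, 30)),
}


def is_atypical(call_type, call_time, profile=PROFILE):
    """A call is atypical if it falls outside the usual window for its type."""
    window = profile.get(call_type)
    if window is None:
        return True  # no history for this type of call
    start, end = window
    return not (start <= call_time <= end)


def handle_call(call_type, call_time, user_reachable):
    """Decide what the PCA does with an ongoing call."""
    if not is_atypical(call_type, call_time):
        return "allow"
    # Atypical call: try to reach the user through another channel first.
    if user_reachable:
        return "notify_user"
    # The user cannot be reached, so the ongoing call is switched off.
    return "disconnect"


print(handle_call("business", time(10, 0), user_reachable=True))
print(handle_call("business", time(23, 45), user_reachable=True))
print(handle_call("business", time(23, 45), user_reachable=False))
```

A real PCA would weigh several profile features at once (called number, duration, location) instead of a single time window, but the escalation order of allow, notify, and disconnect follows the description above.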

Mobility Network Agents

The MNAs, which are expected to reside in the mobile switching center, interact with the mobile service subscriber's PCAs to create a better user profile. MNAs can provide billing information about a user to the PCAs on a continuous basis, and from this information the PCAs can update their user profile database. If an MNA detects a suspicious call, it alerts the PCA. The PCA then either alerts the user about the suspicious call or monitors the call information for additional evidence that the call is fraudulent.

Fraud Breaking Agents

Equipped with one or more classification algorithms such as DC-1, Bayes, Ripper, and CART, a Fraud Breaking Agent specializes in detecting fraudulent calling patterns. Those patterns include long international calls, simultaneous calls originating from one cell phone, and calls to known criminal centers or suspicious regions. FBAs also reside in the mobile switching center. Based on FBA information, the MNA alerts the PCA to check the user profile for any numbers and characteristics matching the suspicious calls.

Each of the PCAs, MNAs, and FBAs thus provides an additional level of protection against fraudulent calls. They interact with each other using various algorithms and check different databases to ensure a low number of false alarms. It would be a very good fraud detection system if deployed in the real world.

4.2 Java Agents for Meta Learning

The JAM system is a distributed, scalable, and portable agent-based data mining system developed by a research group at Columbia University. A JAM system consists of two levels of agents: the base level agents and a higher level agent. To detect fraud, the JAM system needs to compute a fraud detector that judges whether a transaction or an operation is fraudulent. In JAM this fraud detector is called a classifier.
JAM uses a machine learning process called meta-learning to compute the classifier (Chan and Stolfo

1993). In the meta-learning process, a training data set is divided into several small subsets, one distributed to each base agent. Each base agent then computes a base classifier, a model of its data subset, using one of the Bayes, C4.5, CART, ID3, or Ripper algorithms, or the intrusion detection framework mentioned earlier. Next, all the base classifiers are delivered to the higher level agent. Each base classifier is tested for prediction accuracy against a separate subset of the training data, called a validation set. Through these tests, the higher level agent learns the characteristics and performance of the base classifiers. It then integrates these independently computed base classifiers into one higher level classifier, called a meta classifier, again using one of the Bayes, C4.5, CART, ID3, or Ripper algorithms, or the intrusion detection framework. The meta classifier is the model of the global data set, and the JAM system can use it to detect fraud.

The JAM system has several advantages over the traditional data mining method for fraud detection. The computation of base classifiers is a distributed process, so the base agents can be placed in different locations to deal with different data sets. This has two implications for fraud detection in credit card transactions. First, each bank usually has its own confidential data set and established fraud detection mechanism. It would be better if all the data sets and existing fraud detection algorithms were shared among the banks, but they normally do not want to exchange confidential data sets and information with others. Therefore, if we place the base agents of JAM at each bank's data set rather than building a centralized data set, the system can fully leverage the collective fraud detection wisdom of the different banks without breaching their confidentiality requirements.
Second, each base agent only needs to handle one data subset as opposed to a huge centralized data set, which greatly reduces the agent's workload.

Furthermore, the machine learning process in a JAM system has two levels. Compared with most other data mining methods, this can lead to a better fraud detector. The fraud model obtained will be globally optimal rather than locally optimal. The model is improved through the meta-learning process.
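
The two-level process can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not JAM's implementation: each base learner is a one-feature decision stump on the transaction amount instead of Bayes, C4.5, CART, ID3, or Ripper, and the meta classifier is a lookup table learned from the base classifiers' predictions on a validation set. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def train_stump(subset):
    """Base learner: choose the amount threshold that best separates fraud
    from legitimate transactions in this agent's local subset."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({amount for amount, _ in subset}):
        correct = sum((amount >= threshold) == label for amount, label in subset)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda amount, t=best_threshold: amount >= t


def train_meta(base_classifiers, validation):
    """Meta learner: map each combination of base predictions to the label
    most often observed with that combination on the validation set."""
    votes = defaultdict(Counter)
    for amount, label in validation:
        key = tuple(clf(amount) for clf in base_classifiers)
        votes[key][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    def meta(amount):
        key = tuple(clf(amount) for clf in base_classifiers)
        if key in table:
            return table[key]
        # Combination never seen in validation: fall back to a majority vote.
        return sum(key) * 2 > len(key)

    return meta


# Hypothetical local data sets of (amount, is_fraud) held by two banks.
bank_a = [(20, False), (35, False), (900, True), (1200, True)]
bank_b = [(15, False), (60, False), (400, True), (700, True)]
validation = [(30, False), (500, True), (1000, True), (50, False)]

# Only the trained classifiers leave each bank, never the raw records.
base = [train_stump(bank_a), train_stump(bank_b)]
meta = train_meta(base, validation)

print(meta(600), meta(25))
```

Here the two stumps disagree on a 600 transaction (bank A's threshold is 900, bank B's is 400), and the meta classifier resolves the disagreement from what it saw on the validation set. Keeping training local to each data set is what lets the banks combine their models without exchanging confidential records.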

16 Owing to these desirable features, the JAM system shows good results in detecting credit card frauds and computer intrusions when tested in lab. 4.3 Artificial Neural Network Lippman (1987) defines the Artificial Neural Network as a statistical information processing mechanisms composed of numerous distributed processing unit or nodes that perform simultaneous computations and communicate using adaptable interconnections called weights. Artificial Neural Network (hereafter ANN) consists of nodes residing in three layers including input, output, and hidden layers. Although the input and output nodes are determined by the user according to purposes, the hidden nodes serving as connectors between input and output layers are established by the network itself through training. Desouza (2001) defines that, in general, the processes of ANN comprise three stages such as training, testing, and deployment. In the training process, different weights are assigned nodes and layers by different training algorithms using the past data. Then, by using testing data extracted from the past data, it is possible to evaluate whether ANN can operate as desired. This combination of training and testing will be performed repeatedly until the model is obtained. There are two types of training methods: supervised and unsupervised methods. In the supervised training method, both the input and the desired result are provided. And the output is compared with the desired data until the predetermined accuracy is obtained by changing different links and weights assigned to ANN. However, in the unsupervised training method, only input data are given and human users do not compare output data with the desired results. Seymour (2000) states that ANN may be the best solutions in the situations where rule selection is difficult in terms of speed and complexity. He argues that ANN is preferred in two main reasons. 
First, the arithmetic characteristics of ANN make it good at handling large volumes of data; ANN focuses on pattern identification rather than data analysis. Second, ANN can keep adjusting the weights on its links as data accumulate during training. Therefore, it can adapt easily and quickly to changes in the input. This is the main reason that ANN is considered well suited to fraud detection applications that deal with large amounts of data.

5. Applications of Intelligent Agents in Fraud Detection

Agent-based applications in general, and neural network agent systems in particular, have already been used in fraud detection. Furthermore, they have great potential for wide adoption in the future.

5.1 Applications of JAM

JAM has been applied in a lab test environment to detect fraud in credit card transactions and computer intrusions. For the credit card fraud test, the JAM research team used data sets provided by Chase and First Union. The two data sets, which were developed over years by experienced bank personnel for fraud detection, share a number of common properties. These include a hashed credit card number, scores produced by a commercial authorization/detection system, the date and time of the transaction, past payment information of the card user, the amount of the transaction, and so on. Each of the two bank data sets also contains some important proprietary properties (PF). This causes a data schema integration problem, which can be solved by two methods. One method is to learn a local model using PF information, later exchange the PF information between the two data sets, and compute a new local model. The other is to learn a model using PF information and hold it locally without exchange.

The team sampled 84,000 records out of 500,000 records from the two data sets for the learning process. The purpose of learning is to identify fraudulent characteristics in the 30 attribute fields in order to establish a fraud model. They applied five algorithms, Bayes, C4.5, CART, ID3 and Ripper, to both the base classifier and meta-classifier learning processes. The results indicate that Ripper and CART could produce the best base classifiers, and Bayes could generate the best and most stable meta-classifier. Ripper and CART could catch 80% of the fraudulent transactions while raising false alarms on 16% of the legitimate transactions. In comparison, Bayes could catch 80% of the fraudulent transactions but raise false alarms on only 13% of the transactions. At the opposite end, ID3 was found to be the worst algorithm in overall performance.

JAM was also tested for detecting intrusions in computer systems. One command called lpt in the LINUX operating system can be abused by an intruder to cause a buffer overflow. JAM was applied to find out whether the command was sent by the legitimate user or by an intruder. In this context, the agents are trained using the intrusion detection framework, which includes axis and reference attributes, level-wise approximate mining, and pattern comparison. The result of the test indicated good performance.

5.2 Application of Neural Network

PayPal has successfully implemented a neural network system, which brings huge revenues for the company. PayPal is an online person-to-person (P2P) payment company. It allows one user to pay another user through e-mail. To use the PayPal service, both payer and payee must register beforehand and link their PayPal accounts to either a bank account or a credit card account. To complete a payment, the payer logs into his or her PayPal account and tells PayPal the e-mail address of the payee and the payment amount. PayPal then transfers the amount from the payer's PayPal account to the payee's account and charges a service fee for the transaction. With the PayPal service, a user does not need to register with a credit card company to receive a credit card payment from another user. This is very convenient for individuals and small online businesses that need to make online P2P payment transactions.
However, this type of online P2P payment system suffers from many illegal transactions. Since customers are usually not liable for the losses, it is crucial for merchants to stop these frauds. Apart from PayPal, there were several other companies doing similar business. However, because of large fraud losses, these companies went out of business one after another. PayPal has survived primarily because of its fraud detection system, named Igor. Igor incorporates both old and new techniques from the field of artificial intelligence. It is a rule-based expert system equipped with neural network technology. Igor knows a series of rules (for example, if the recipient is associated with a known terrorist group, then block the payment). Also, Igor's pattern detection algorithm can monitor user activities and learn new types of fraud over time. If a user keeps opening new PayPal accounts linked with the same set of credit cards, Igor will learn the scam through data mining and watch the user's payment activities more cautiously. The fraud rate at PayPal is around 0.5%, which is much better than the average fraud rate of 1.13% for online merchants.

6. Further Research Areas

Intelligent agents for fraud detection can be applied to many areas. One of these areas is continuous auditing. Continuous auditing is a promising field which can automate the auditing process and provide audit reports on a continuous basis. However, one weakness of continuous auditing is the possibility of management fraud. Due to the lack of human intervention, management fraud is more likely to occur. A multi-agent system for fraud detection can address this problem. Agents can be deployed at supply chain partners' sites, at the company's general ledger level, and at the company's financial statement level. The agents at the partners' sites can monitor transaction activities and can also interact with the general ledger level agents to verify data accuracy. If there is an unusual transaction, these partner site agents can raise an alarm. After a transaction is completed, all the transaction data will be collected by the general ledger level agents and then delivered to the financial statement level agents.
After the delivery, the financial statement level agents will summarize the information and create a set of financial statements. Then these agents will compare the data in the reports with those in historical financial reports to check overall reasonableness. If the data are suspicious, the agents at the financial statement level will alert the human auditor. All these agents will be created and deployed by the CPA firms to ensure the auditor's independence. The agents at the general ledger level and the financial statement level should be XBRL-compliant. With the agents' aid, analytical procedures, substantive tests of balances, and tests of details of balances can be performed automatically. The financial data are double-checked, against both historical data and partners' information, to prevent management fraud.

7. Conclusion

Intelligent agents can play an important role in the fraud detection domain. They are robust enough to defeat sophisticated fraudsters, fast enough to minimize fraud damages, and scalable enough to tackle huge volumes of data. Intelligent agents will eventually be the ultimate means of fighting fraud. However, there is still a long way to go before the wide adoption of intelligent agents for fraud detection. The accuracy of fraud detection needs to be improved, the reliability of the agents needs to be ensured, and the costs of building and deploying these agents need to be reduced.

Besides, at this point, research on fraud detection in the accounting field, especially from the point of view of continuous auditing, does not seem to be active. Several reasons can be suggested. First, unlike intrusion detection in computer networks and fraud detection for calling cards, it is much harder to find particular patterns or episodes in accounting data. According to Lee, Stolfo, and Mok (1999, Mining in a Data-flow Environment), truly automated fraud detection has not yet been thoroughly researched. In other words, research on anomaly detection has not yet produced solid results. Therefore, fraud detection tools always lag behind newly developed fraud schemes, since the record of a fraud scheme must be learned before the detector can be trained. Because it is very hard to distinguish abnormal activities that are truly fraudulent from those that are unusual but legitimate, implementing anomaly detection seems difficult. If all the participants in the industry can share their historical fraud data and fraud classifiers, wide adoption of fraud detection using intelligent agents could be realized in the near future.

Appendix I. Rule selection and covering algorithm used by DC-1

Given:
    Accts: set of all accounts
    Rules: set of all fraud rules generated from Accts
    T_rules: (parameter) number of rules required to cover each account
    T_accts: (parameter) number of accounts in which a rule must have been found
Output:
    S: set of selected rules

/* Initialization */
S = { };
for (a ∈ Accts) do Cover[a] = 0;
for (r ∈ Rules) do
    Occur[r] = 0;        /* Number of accounts in which r occurs */
    AcctsGen[r] = { };   /* Set of accounts generating r */
end for

/* Set up Occur and AcctsGen */
for (a ∈ Accts) do
    R_a = set of rules generated from a;
    for (r ∈ R_a) do
        Occur[r] := Occur[r] + 1;
        add a to AcctsGen[r];
    end for
end for

/* Cover Accts with Rules */
for (a ∈ Accts) do
    R_a = list of rules generated from a;
    sort R_a by Occur;
    while (Cover[a] < T_rules) do
        r := highest-occurrence rule from R_a;
        remove r from R_a;
        if (r ∉ S and Occur[r] ≥ T_accts) then
            add r to S;
            for (a_2 ∈ AcctsGen[r]) do
                Cover[a_2] = Cover[a_2] + 1;
            end for
        end if
    end while
end for

*source: Fawcett and Provost (1997), Adaptive Fraud Detection
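The listing above can be turned into a short runnable sketch. The Python below follows the same three phases (initialization, counting occurrences, covering), under a few assumptions that are not in the source: rules are represented by string identifiers, ties in occurrence count are broken alphabetically, the function name `select_rules` and the example rule names are invented, and the covering loop also stops when an account runs out of candidate rules so that it always terminates.

```python
from collections import defaultdict

def select_rules(rules_by_account, t_rules, t_accts):
    """Greedy rule selection following the Appendix I listing.

    rules_by_account: dict mapping account id -> iterable of rule ids
    t_rules: number of selected rules that should cover each account
    t_accts: minimum number of accounts in which a rule must occur
    """
    selected = []                              # S, as a list to keep selection order
    cover = {a: 0 for a in rules_by_account}   # Cover[a]
    occur = defaultdict(int)                   # Occur[r]
    accts_gen = defaultdict(set)               # AcctsGen[r]

    # Set up Occur and AcctsGen.
    for acct, rules in rules_by_account.items():
        for r in set(rules):
            occur[r] += 1
            accts_gen[r].add(acct)

    # Cover accounts with rules.
    for acct, rules in rules_by_account.items():
        # Ascending sort, so pop() returns the highest-occurrence rule.
        candidates = sorted(set(rules), key=lambda r: (occur[r], r))
        # Assumption: also stop when no candidate rules remain.
        while cover[acct] < t_rules and candidates:
            r = candidates.pop()
            if r not in selected and occur[r] >= t_accts:
                selected.append(r)
                for a2 in accts_gen[r]:
                    cover[a2] += 1
    return selected

# Hypothetical accounts and the fraud rules generated from each.
accounts = {
    "acct1": ["night_calls", "bronx_origin"],
    "acct2": ["night_calls"],
    "acct3": ["night_calls", "long_duration"],
    "acct4": ["long_duration", "bronx_origin"],
}
loose = select_rules(accounts, t_rules=1, t_accts=2)
strict = select_rules(accounts, t_rules=1, t_accts=3)
```

With `t_accts=2`, the rule `night_calls` covers the first three accounts and `long_duration` is then added to cover `acct4`; with `t_accts=3`, only `night_calls` is general enough to be selected and `acct4` stays uncovered.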

Appendix II.

Table 1. Network Connection Records

Time stamp   Duration   Service   Src_bytes   Dst_bytes   Flag
                        telnet                            SF
                        ftp                               SF
                        smtp                              SF
                        telnet                            SF
                        smtp                              SF
                        smtp                              SF
                        http                              REJ
                        smtp                              SF

*source: Lee, Mok and Stolfo (2000), Adaptive Intrusion Detection: a Data Mining Approach

Table 2. Web Log Records

Timestamp   Remote host (subject)       Action   Request (action)
1           his.moc.kw                  GET      /images
1.1         his.moc.kw                  GET      /images
1.3         his.moc.kw                  GET      /shuttle/missions/sts
            taka10.taka.is.uec.ac.jp    GET      /images
3.2         taka10.taka.is.uec.ac.jp    GET      /images
3.5         taka10.taka.is.uex.ac.jp    GET      /shuttle/missions/sts-71
8           rjenkin.hip.cam.org         GET      /images
8.2         rjenkin.hip.cam.org         GET      /images
9           rjenkin.hip.cam.org         GET      /shuttle/missions/sts-71

*source: Lee, Mok and Stolfo (2000), Adaptive Intrusion Detection: a Data Mining Approach

Table 3. Encoding scheme (Encodings of Associations)

Association                                                              Encoding
(flag = SF, service = http, src_bytes = 200)
(service = icmp_echo, dst_host = host B)
(flag = S0, service = http, src_host = host A)
(service = user_app, src_host = host A)
(flag = SF, service = icmp_echo, dst_host = host B, src_host = host C)

*source: Lee, Mok, and Stolfo (1999), Mining in a Data-flow Environment: Experience in Network Intrusion Detection

Appendix III. Level-wise Approximate Mining of Frequent Episodes

Input: the terminating minimum support s_0, the initial minimum support s_i, and the axis attribute(s)
Output: frequent episode rules Rules

Begin
    R_restricted = ∅;
    scan database to form L = {large 1-itemsets that meet s_0};
    s = s_i;
    while (s ≥ s_0) do begin
        find serial episodes from L: each pattern must contain at least one axis attribute value that is not in R_restricted;
        append new axis attribute values to R_restricted;
        append episode rules to the output rule set Rules;
        s = s/2;
    end while
end

*source: Lee, Mok, and Stolfo (2000), Adaptive Intrusion Detection: a Data Mining Approach
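The control loop of the procedure above can be sketched in Python. Only the level-wise loop (halving the support and restricting axis values that have already been reported) follows the listing. The pattern miner is a deliberately simplified stand-in: it finds frequent attribute-value itemsets within single records instead of serial episodes over a time window, and it skips the initial scan for large 1-itemsets. The function names and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(records, min_support, max_size=2):
    """Stand-in miner (assumption): itemsets of (attribute, value) pairs whose
    relative frequency over the records is at least min_support."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(records)
    return {pattern for pattern, c in counts.items() if c / n >= min_support}

def level_wise_mining(records, axis, s_init, s_term):
    """Start at support s_init and halve it until it drops below s_term."""
    restricted = set()   # R_restricted: axis values already reported
    rules = []           # output patterns, in discovery order
    s = s_init
    while s >= s_term:
        found = []
        for pattern in sorted(frequent_itemsets(records, s)):
            axis_values = {v for a, v in pattern if a == axis}
            # Keep patterns that contain an axis value not yet restricted.
            if axis_values and not axis_values <= restricted:
                found.append(pattern)
        for pattern in found:
            restricted.update(v for a, v in pattern if a == axis)
            rules.append(pattern)
        s /= 2
    return rules

# Hypothetical connection records with "service" as the axis attribute.
records = (
    [{"service": "smtp", "flag": "SF"}] * 4
    + [{"service": "http", "flag": "SF"}] * 2
    + [{"service": "telnet", "flag": "SF"}, {"service": "telnet", "flag": "REJ"}]
)
rules = level_wise_mining(records, axis="service", s_init=0.5, s_term=0.1)
```

In this toy run the frequent `smtp` patterns are reported at the initial support of 0.5, and the rarer `http` and `telnet` patterns appear only after the support is halved to 0.25. Once every service value is restricted, the last pass at 0.125 adds nothing, which is how the procedure avoids flooding the output with low-support patterns about axis values that are already covered.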

References:

Abu-Hakima, S., Toloo, M., White, T., A Multi-Agent Systems Approach for Fraud Detection in Personal Communication Systems, IJCAI-97 Workshop on Intelligent Adaptive Agents, Portland, Oregon, 1997
Agrawal, R., Srikant, R., Fast Algorithms for Mining Association Rules, In: Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994
Brause, R., Langsdorf, T., Hepp, M., Neural Data Mining for Credit Card Fraud Detection, Working paper, J.W. Goethe University, Comp. Sc. Dep. Report, Frankfurt, Germany, 1999
Brause, R., Langsdorf, T., Hepp, M., Credit Card Fraud Detection by Adaptive Neural Data Mining, Internet Bericht, Frankfurt, Germany, 1999
Cannady, J., The Application of Artificial Neural Networks to Misuse Detection: Initial Results.
Chan, P., Stolfo, S., Toward Parallel and Distributed Learning by Meta-Learning, In: AAAI Workshop in Knowledge Discovery in Databases, 1993, pp
Desouza, K., Modeling The Human Brain: Artificial Neural Networks, 2001 (submitted to Journal of the Information Technology Professional, A publication of the Computer Society, IEEE)
Dorronsoro, J., Ginel, F., Sanchez, C., Cruz, C.S., Neural Fraud Detection in Credit Card Operations, Paper Draft, Madrid, Spain, 2001
Fawcett, T., Provost, F., Adaptive Fraud Detection, Data Mining and Knowledge Discovery 1,2, Kluwer Academic Publishers, Boston, Massachusetts, 1997, pp
Lee, W. et al., Real Time Data Mining based Intrusion Detection
Lee, W., Stolfo, S., and Mok, K., Mining Audit Data to Build Intrusion Detection Models, In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York, NY,
Lee, W., Stolfo, S., Mok, K., Mining in a Data-flow Environment: Experience in Network Intrusion Detection, In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-99), 1999
Lee, W., Stolfo, S., and Mok, K., Adaptive Intrusion Detection: a Data Mining Approach, Kluwer Academic Publishers,

Mannila, H., Toivonen, H., Discovering Generalized Episodes Using Minimal Occurrences, In: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, 1996
Patrick, B., Choi, J.H., Assessing the Risk of Management Fraud Through Neural Network Technology, Auditing: A Journal of Practice & Theory, Vol. 16, No. 1, 1997
Prodromidis, A., Stolfo, S., Mining Databases with Different Schemas: Integrating Incompatible Classifiers, In: Proceedings of Fourth International Conference of Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1998, pp
Prodromidis, A.L., Stolfo, S., Agent-Based Distributed Learning Applied to Fraud Detection, CUCS working paper, New York, NY, 1999
Seymour, B., How Neural Network Technology Can Tackle the Growing Telecom Fraud Problem, Information Security Bulletin, April, 2000, pp
Steward, S., Lighting the way in 97, Cellular Business, 23, January, 1997
Stolfo, S., Prodromidis, A., Chan, P.K., JAM: Java Agents for Meta-learning over Distributed Databases, In: Proceedings of Second International Workshop Multistrategy Learning, Center for Artificial Intelligence, George Mason University, Fairfax, VA, 1993
Walters, D., Wilkinson, W., Wireless fraud, now and in the future: A view of the problem and some solutions, Mobile Phone News, October,