PLAANN as a Classification Tool for Customer Intelligence in Banking

EUNITE World Competition in domain of Intelligent Technologies
The Research Report

Ireneusz Czarnowski and Piotr Jedrzejowicz
Department of Information Systems, Gdynia Maritime University
Morska 83, 81-225 Gdynia, Poland
{irek, pj}@am.gdynia.pl

Abstract. The paper is a report describing the use of PLAANN as a classification tool for customer intelligence in banking. PLAANN is a software tool consisting of several ANN-based classifiers, which are trained using the population learning algorithm (PLA). The paper briefly reviews the PLA, explains its application to training ANN, describes the functions and operation modes of PLAANN and, finally, reports on experiments with the set of patterns provided under the EUNITE World Competition.

1 Introduction

One of the major application categories of artificial neural networks (ANN) is classification. The idea is to recognize given patterns and assign them to a typically much smaller number of groups; the group is the output of the network in this type of application. This is accomplished by training the neural network using sample patterns and their correct answers. When the neural network is properly trained, it can, hopefully, give correct answers not only for the sample patterns, but also for new similar patterns. The main advantages of the approach include the ability to tolerate imprecision and uncertainty and still achieve tractability, robustness, and low cost in practical applications.

Since training a neural network for a practical application is often very time consuming, extensive research is being carried out in order to accelerate this process. Another problem with ANN training methods is the danger of being caught in a local optimum. Hence, researchers look not only for algorithms that train neural networks quickly but rather for quick algorithms that are not likely, or less likely, to get trapped in a local optimum.
One of the possible approaches to training ANN is to use population-based methods, which are known to be useful in solving a variety of difficult computational problems [5, 6, 7]. Unfortunately, population-based approaches quite often require substantial computational resources, rendering the approach impracticable. A possible solution is to combine the robustness of the population-based methods with the efficiency and speed of heuristic algorithms and neighbourhood search techniques. The idea led to a variant of the population-based approach called the population learning algorithm (PLA), proposed in [4].

The possibility of applying population learning algorithms to train ANN has been investigated in earlier papers of the authors [1, 2, 3]. Several versions of the PLA have been designed, implemented and applied to solving a variety of benchmark problems. Initial results were promising, showing good or very good performance of the PLA as a tool for ANN training [2].

The paper focuses on describing the features and uses of the proposed PLAANN (Population Learning Algorithm and Artificial Neural Network) classification tool designed for solving classification problems in banking intelligence. The following sections give a brief description of the PLA, provide information on the intelligent technology used to implement the tool, describe data handling procedures, discuss experimental results and suggest how to use the PLAANN classifier as an adaptive learning system. Conclusions include some comments on features of the technologies used.

2 Population Learning Algorithm

The population learning algorithm is a population-based method inspired by analogies to social education processes in which a diminishing number of individuals enter more and more advanced learning stages. In the PLA an individual represents a coded solution of the considered problem. Initially, a number of individuals, known as the initial population, is randomly generated. Increasing the initial population size can be considered as a means of diversification, helping to escape from local optima. Once the initial population has been generated, individuals enter the first learning stage. It involves applying some, possibly basic and elementary, improvement schemes or conducting simple learning sessions.
These can be based on some local search procedures. The improved individuals are then evaluated and the better ones pass to the subsequent stages. A strategy of selecting better or more promising individuals must be defined and duly applied. At the following stages the whole cycle is repeated. Individuals are subject to improvement and learning, either individually or through information exchange, and the selected ones are again promoted to a higher stage, with the remaining ones dropped from the process. At the final stage the remaining individuals are reviewed and the best one represents a solution to the problem at hand.

The learning process at early stages can be run in parallel. Individuals are then grouped into classes with possibly different curricula, that is, different improvement schemes. At a certain level the best individuals from all groups join together to form higher-level groups, where improvement and learning are still carried out in parallel. At some stage the selected individuals are brought together to complete their education. At different stages of the process, different improvement schemes and learning procedures are applied. These gradually become more and more sophisticated and, possibly, time consuming, as there are fewer and fewer individuals to be taught. Finally, after having passed all the prescribed stages, the final population is analysed with a view to selecting the fittest individual.

START
Set the number of learning stages N. Set the initial population size m.
Define learning/improvement procedures LEARN_i(P), i = 1...N, operating on a population of individuals P.
Define selection procedures SELECT_i(P), i = 1...N, operating on a population of individuals P.
Generate the initial population.
Set P := initial population.
i := 1
LEARN_i(P)
SELECT_i(P)
i := i + 1
i > N?  NO: return to LEARN_i(P).  YES: consider the best individual from P as a solution.
END
Fig. 1. General idea of the population learning algorithm

3 The PLAANN Classifier

The proposed PLAANN classifier is a software tool based on a set of artificial neural networks trained by a dedicated population learning algorithm. The main functions of PLAANN include:
- pre-processing input data sets (both train and test),
- training artificial neural networks,
- classification of patterns.

3.1 Pre-processing Input Data Sets

To assure efficiency of the classifier it has been decided to apply the Input Data Transformation Algorithm, whose role is to decrease the size of the training data set by first partitioning it into clusters of similar patterns and afterwards considering only a representation of similar patterns. The number of such representatives, known as the inspiration level, has to be set by the user at the fine-tuning phase. The choice of its value requires finding a compromise between the time needed for training ANN, the quality of classification results and the size of the available input data set. Pseudo-code of the Input Data Transformation Algorithm is shown in Fig. 2.

Procedure Input_data_transformation
{N - number of patterns; n - number of attributes; X - input data set, a matrix of n columns and N rows; x_ij - data set element (i = 1..N; j = 1..n); T_tr - training data set; T_ts - testing data set}
Begin
  Transform X normalizing each x_ij into interval <0,1> and then rounding it to 0 or 1;
  For j = 1 to n do
    Calculate $Sum_j = \sum_{i=1}^{N} x_{ij}$;
  End for
  For i = 1 to N do
    Calculate $I_i = \sum_{j=1}^{n} x_{ij} \, Sum_j$;
  End for
  Map elements of X into t subsets, each containing data elements with identical values of I_i, where t is the number of different values of I_i (i = 1...N);
  Construct training set T_tr taking k patterns from each thus created subset, where k ≤ inspiration_level;
  Construct testing set T_ts = X \ T_tr;
End
Fig. 2. Pseudo-code of the Input Data Transformation Algorithm

3.2 ANN Training Algorithm

To train the artificial neural networks an implementation of the population learning algorithm, originally proposed in [4], is used. In order to increase efficiency of the approach it has been decided to use a parallel computing environment based on PVM (Parallel Virtual Machine). This allows for running parallel learning processes or groups of such processes and thus speeding up ANN training.
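The Input Data Transformation Algorithm of Fig. 2 can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation. The function name `input_data_transformation`, the min-max normalisation scheme, the rounding threshold of 0.5 and the random choice of representatives with a fixed seed are assumptions made for illustration only. The sketch also takes as many representatives as are available when a cluster holds fewer patterns than the inspiration level.

```python
import random
from collections import defaultdict


def input_data_transformation(X, inspiration_level, seed=0):
    """Return (train_idx, test_idx) for a list of N patterns with n attributes each."""
    rng = random.Random(seed)
    n = len(X[0])
    # Min-max normalise each attribute into [0, 1] (assumed scheme), then round to 0 or 1.
    lo = [min(row[j] for row in X) for j in range(n)]
    hi = [max(row[j] for row in X) for j in range(n)]
    B = [
        [1 if hi[j] > lo[j] and (row[j] - lo[j]) / (hi[j] - lo[j]) >= 0.5 else 0
         for j in range(n)]
        for row in X
    ]
    # Sum_j: sum of the binarised values in column j.
    col_sum = [sum(b[j] for b in B) for j in range(n)]
    # I_i: value computed for pattern i as the sum over j of x_ij * Sum_j.
    coeff = [sum(b[j] * col_sum[j] for j in range(n)) for b in B]
    # Group pattern indices by identical I_i value.
    clusters = defaultdict(list)
    for i, value in enumerate(coeff):
        clusters[value].append(i)
    # Draw at most `inspiration_level` representatives from each cluster.
    train = []
    for members in clusters.values():
        k = min(inspiration_level, len(members))
        train.extend(rng.sample(members, k))
    chosen = set(train)
    # The testing set holds every pattern not selected for training.
    test = [i for i in range(len(X)) if i not in chosen]
    return sorted(train), test


X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.0], [1.0, 0.0]]
train_idx, test_idx = input_data_transformation(X, inspiration_level=1)
```

In this toy data set the five patterns fall into three clusters, so an inspiration level of 1 yields three training patterns and two testing patterns.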

A neural network learns patterns by adjusting its weights. The learning process can be considered as a search for weights of connections between neurons such that the network can output the correct target pattern for each input pattern. Search processes aiming at finding the required weights are carried out within the proposed tool as a population learning scheme, in accordance with the principles of the population learning algorithm.

Since the discussed approach assumes multiple classifiers working in parallel, each group of processes is used to train a single classifier (see section 3.3). Processes within such a group deal with independent populations of weight vectors (here called individuals) using similar learning and improvement procedures. Processes within a group exchange information by forwarding their best individuals to other processes. Search within such a group is performed by the so-called slave workers, while information exchange and process coordination are handled by the master worker (one for each group of processes).

The following features characterize the proposed parallel implementation of the PLA:
- The master worker defines the number of slave workers and the size of the initial population for each of them.
- Each slave worker uses identical learning/improvement procedures.
- The master worker activates parallel processing.
- After completing each stage, the workers inform the master about the best solution found so far.
- The master worker compares the received values and sends out the best solution to all the workers, replacing their current worst individual.
- The master worker can stop computations if the desired quality level of the objective function has been achieved. This level is defined at the beginning of computations by setting the desired value of the mean squared error on a given set of training patterns.
- Slave workers can also stop computations if the above condition has been met.
- Computation is carried out for the predefined number of stages.

Generally the PLA code, which is run by each slave worker, is based on the following assumptions:
- An individual is a vector of real numbers from the predefined interval, each representing the weight of the respective link between neurons in the considered ANN.
- The initial population of individuals is generated randomly from the predefined interval.
- There are five learning/improvement procedures used: standard mutation, local search, non-uniform mutation, gradient mutation and gradient adjustment.
- There is a common selection criterion for all stages. At each stage, individuals with fitness below the current average are rejected.

The improvement procedures require some additional comments. The first procedure, standard mutation, modifies an individual by generating new values of two randomly selected elements within the individual. If the fitness function value has improved, the modification is accepted.

The second learning/improvement procedure, local search, involves a mutual exchange of values between two randomly selected elements (often called chromosomes) within an individual. If the fitness function value of the individual has improved after such an exchange, the modification is accepted.
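The assumptions listed above can be put together in a short, sequential Python sketch of a single PLA run. It is an illustration only and not the authors' code. The function names, the default parameter values and the choice of applying one procedure per stage in rotation are assumptions, and only the first two improvement procedures are shown. Fitness is expressed through a user-supplied `error` function (lower is better), so rejecting individuals with below-average fitness becomes keeping those whose error does not exceed the current average.

```python
import random


def standard_mutation(ind, error, rng, lo=-1.0, hi=1.0):
    """Redraw two randomly selected elements; accept only if the error decreases."""
    cand = list(ind)
    for pos in rng.sample(range(len(cand)), 2):
        cand[pos] = rng.uniform(lo, hi)
    return cand if error(cand) < error(ind) else ind


def local_search(ind, error, rng):
    """Swap two randomly selected elements; accept only if the error decreases."""
    cand = list(ind)
    a, b = rng.sample(range(len(cand)), 2)
    cand[a], cand[b] = cand[b], cand[a]
    return cand if error(cand) < error(ind) else ind


def pla(error, length, pop_size=30, stages=5, iterations=20, seed=0):
    """Run a simplified PLA and return the best weight vector found."""
    rng = random.Random(seed)
    # Initial population: random weight vectors from the interval [-1, 1].
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(pop_size)]
    procedures = [standard_mutation, local_search]
    for stage in range(stages):
        # Assumption: one improvement procedure per stage, used in rotation.
        learn = procedures[stage % len(procedures)]
        for _ in range(iterations):
            pop = [learn(ind, error, rng) for ind in pop]
        # Selection: reject individuals that are worse than the current average.
        errors = [error(ind) for ind in pop]
        average = sum(errors) / len(errors)
        pop = [ind for ind, e in zip(pop, errors) if e <= average]
    return min(pop, key=error)


# Toy objective standing in for the mean squared error of a network.
best = pla(lambda w: sum(x * x for x in w), length=4)
```

Because the best individual never has an error above the average, the selection step cannot empty the population, and because a modification is accepted only when it improves the individual, the returned solution is never worse than the best member of the initial population.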

The third learning/improvement procedure, non-uniform mutation, involves modifying an individual by repeatedly adjusting the value of a randomly selected element (in this case a real number) until the fitness function value has improved or until a given number of consecutive improvement attempts have failed. This number has to be set at the fine-tuning phase. The value of the adjustment is calculated as:

$\Delta(t, y) = y \left( 1 - r^{(1 - t/T)\, r} \right)$,

where r is a uniformly distributed real number from (0, 1], T is equal to the length of the vector representing an individual and t is the current number of the adjustment.

The fourth improvement procedure, gradient mutation, changes two randomly selected elements within an individual by incrementing or decrementing their values. The direction of change (increment/decrement) is random, with both directions having probability 0.5. The value of the change is proportional to the gradient of the individual. If the fitness function value of the individual has improved, the modification is accepted. The number of iterations for procedures 1, 2 and 4 has to be set at the fine-tuning phase.

The fifth learning/improvement procedure, gradient adjustment, adjusts the value of each element of the individual by a constant value $\Delta$ proportional to its gradient. $\Delta$ is calculated as $\Delta = \alpha \xi$, where $\alpha$, known as the momentum, is the factor determining the size of the step in the direction of $\xi$. $\alpha$ takes values from (0, 1]. In the proposed algorithm its value iterates starting from 1 with a step equal to 0.02. $\xi$ is a vector determining the direction of search and is equal to the gradient of the individual.

3.3 Pattern Classification

In general, the proposed classification tool is composed of K independent classifiers, one for each of the considered pattern classes. After all K classifiers have been trained, each one becomes an expert in recognizing patterns belonging to a single class.
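As an aside on the non-uniform mutation of Section 3.2, the behaviour of the adjustment value can be checked with a small Python sketch. It assumes that the adjustment has the form $\Delta(t, y) = y (1 - r^{(1 - t/T) r})$; the sign with which the adjustment is applied, the clipping to the weight range and the cap of t at T are further assumptions, since the text does not specify them. All function and parameter names are hypothetical.

```python
import random


def delta(t, y, T, r):
    """Adjustment value y * (1 - r ** ((1 - t / T) * r)) for attempt number t."""
    return y * (1.0 - r ** ((1.0 - t / T) * r))


def non_uniform_mutation(ind, error, max_attempts, rng, lo=-1.0, hi=1.0):
    """Adjust one random element until the error improves or attempts run out."""
    T = len(ind)
    pos = rng.randrange(T)
    for t in range(1, max_attempts + 1):
        r = 1.0 - rng.random()  # uniformly distributed on (0, 1]
        step = delta(min(t, T), ind[pos], T, r)  # assumption: t is capped at T
        cand = list(ind)
        # Assumption: the sign is chosen at random and the result is clipped to [lo, hi].
        cand[pos] = max(lo, min(hi, cand[pos] + rng.choice((-1.0, 1.0)) * step))
        if error(cand) < error(ind):
            return cand
    return ind


mutated = non_uniform_mutation([0.5, -0.4, 0.9], lambda w: sum(x * x for x in w),
                               max_attempts=5, rng=random.Random(1))
```

Under this form the exponent shrinks to zero as t approaches T, so the adjustment vanishes for late attempts, while early attempts can change the element by a larger fraction of its current value.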
In the case of the banking customers classification problem there are two classes (K = 2) and two independent classifiers trained to recognize active and, respectively, non-active customers. Each classifier is an artificial neural network of the MLP structure with 3 layers: input, hidden and output. The number of neurons in the input layer is equal to the number of attributes of the input pattern. The hidden layer has 15 neurons and the output layer consists of a single neuron. The range of weights is [-1, 1] and the sigmoid activation (transfer) function has the sigmoid gain value set to 1.0.

The proposed tool takes a decision on classifying the input pattern in the following three steps:
1. The input pattern is read by each of the two classifiers (each specializing in a different class).
2. Each of the classifiers produces an output value C_i, i = 1, 2, which is a real number from the (0, 1] interval.
3. If C_1 > C_2 then the input pattern is classified as belonging to class one. Otherwise it is considered as belonging to class two.

4 Using PLAANN

PLAANN has been coded in C++ and is ready to be run on Unix and Linux platforms. The parallel version has been implemented to be executed under the PVM environment. A sequential, non-parallel version of the tool is also available. There are three modes of running PLAANN:
1. Training classifiers and producing classification results using two data sets, one with training and another with testing patterns.
2. Training classifiers and producing classification results using a single input data set, which is automatically partitioned into training and testing data sets. The training data set is generated by the Input Data Transformation Algorithm applied to the original input data set. The testing data set includes all the remaining instances from the original input set which have not been included in the training set.
3. Classifying test patterns using already trained classifiers.

All input data sets need to be stored in a shared catalogue of the PVM environment. The catalogue can be configured through NFS to assure access to data by all concurrent groups of processes. When running PLAANN the user has to specify the mode and provide the names of the respective data files (training, test or input). The second mode allows the user to experiment with different inspiration levels with a view to finding the best compromise between efficiency of computations and quality of the classification process. In each mode the user is expected to set the parameters controlling the classification process. This requires modifying the configure file, whose example content is shown in Fig. 3.
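To make the role of parameters such as the number of hidden units, the sigmoid gain and the range of weights more concrete, the following Python sketch builds two MLPs with the structure described in Section 3.3 and applies the decision rule C_1 > C_2. It is a sketch under stated assumptions, not the PLAANN code: the networks carry random, untrained weights, no bias terms are used because the text does not mention them, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x, gain=1.0):
    """Sigmoid transfer function with a configurable gain."""
    return 1.0 / (1.0 + math.exp(-gain * x))


def random_mlp(n_inputs, n_hidden=15, lo=-1.0, hi=1.0, rng=None):
    """Create untrained weights for an input-hidden-output MLP with one output neuron."""
    rng = rng or random.Random(0)
    hidden_w = [[rng.uniform(lo, hi) for _ in range(n_inputs)] for _ in range(n_hidden)]
    output_w = [rng.uniform(lo, hi) for _ in range(n_hidden)]
    return hidden_w, output_w


def mlp_output(pattern, net, gain=1.0):
    """Forward pass: returns the single output value C of one classifier."""
    hidden_w, output_w = net
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, pattern)), gain) for ws in hidden_w]
    return sigmoid(sum(w * h for w, h in zip(output_w, hidden)), gain)


def classify(pattern, net_class1, net_class2, gain=1.0):
    """Decision rule: class one if C_1 > C_2, otherwise class two."""
    c1 = mlp_output(pattern, net_class1, gain)
    c2 = mlp_output(pattern, net_class2, gain)
    return 1 if c1 > c2 else 2


rng = random.Random(42)
net1 = random_mlp(36, rng=rng)  # 36 inputs, as in the competition data
net2 = random_mlp(36, rng=rng)
label = classify([0.5] * 36, net1, net2)
```

In the actual tool the weights would come from PLA training of each classifier on its own class; the decision step itself stays a single comparison of the two outputs, with ties resolved in favour of class two.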
Number of hidden units in the middle layer        15
Value of the sigmoid gain                         1.0
Range of weights                                  -1.0 1.0
Number of improvement procedures (max 5)          5
Size of the initial population (max 1000)         150
Number of iterations for improvement procedure    80
Number of slave workers (max 32)                  5
Inspiration level                                 10
Fig. 3. Example content of the configure file

As the final result PLAANN generates a classified test file containing classification results. The file has the same structure as the training file. Besides, the tool automatically generates an additional file named info_results. This file summarizes classification results. An example content of the info_results file is shown in Fig. 4.

Running mode                                          2
Execution time [s]                                    805
Measure of accuracy on test data                      80
Number of patterns (input data set)                   12000
Number of classes                                     2
Number of patterns (training data set)                364
Number of patterns in class 1 (training data set)     180
Number of patterns in class 2 (training data set)     184
Number of patterns (test data set)                    11636
Fig. 4. Example contents of the info_results file

With the three operating modes available, PLAANN can be used within an adaptive-learning loop. Initially, depending on the user requirements, PLAANN should be used in mode 1 or 2. Mode 1 would be used when there is a need to classify an existing set of patterns and, at the same time, the user has at hand a set of reliable training data. More often, however, the user would have a set of patterns with classification results and would like to have a classifier able to classify future patterns of a similar kind. In such a case mode 2 should be used to train the tool and to estimate the efficiency and quality of classification. If both are satisfactory, the user should run the system in mode 3 to classify incoming patterns.

After a while, the adaptive loop can be closed. The initial data set used at the previous stage and the new patterns obtained within a certain time period are merged to produce a new input data set for mode 2. The frequency of such mergers and repeated training (perhaps also involving a scheme for deleting the oldest patterns) would depend on the needs of the user and the structure of the patterns.

5 Experiment Results

PLAANN has been used to produce a set of classification results for the Customer Intelligence in the Bank problem as provided by the organizers of the EUNITE world competition in domain of intelligent technologies. The results have been stored in the file client_test.txt (attached to this paper), consisting of 12000 testing patterns to which information about the class has been added by PLAANN.
To train the tool, the set of patterns provided in the file client_train.txt has been used. The computational experiment carried out to evaluate the quality of the proposed PLAANN implementation has been based on using the tool in mode 2, which makes it possible to choose a satisfactory inspiration level during the fine-tuning phase. The client_train.txt file contains 12000 patterns, each with 36 attributes and a class value provided.

In the experiment aiming at evaluating PLAANN performance in application to the Customer Intelligence in the Bank problem, the tool has been run with inspiration levels equal to 5, 10, 15 and 50. The training data sets produced by the Input Data Transformation Algorithm have had 219, 272, 346 and 732 patterns, respectively. This is a substantial reduction of the training set size as compared with the original 12000 patterns. The reduced training set still preserves the basic features of the analysed data. This can be seen intuitively in Fig. 5 and 6, where the initial distribution of values for attributes 35 and 36 (Fig. 5) can be compared with the distribution generated by the Input Data Transformation Algorithm (Fig. 6).

[Figure: scatter plot of Non-active and Active patterns; horizontal axis: dimension #35, vertical axis: dimension #36]
Fig. 5. The initial distribution of values for the attributes 35 and 36

[Figure: scatter plot of Non-active and Active patterns; horizontal axis: dimension #35, vertical axis: dimension #36]
Fig. 6. The reduced distribution of values for the attributes 35 and 36 with 272 patterns and inspiration level = 10

In the computational experiment PLAANN has been run 20 times for each inspiration level (5, 10, 15, and 50). Each time, the patterns representing the respective cluster of similar patterns have been randomly drawn from the available similar patterns. Their number has been determined by the inspiration level. Characteristics of the classifications thus obtained, averaged over 20 runs, are shown in Table 1.

Table 1. Performance of PLAANN in the classification experiment

                      Accuracy of classification
Insp.      Training data              Testing data               Training
level      mean     max     min       mean     max     min       time [s]
5          100%     100%    100%      78%      80.3%   76.6%     43
10         100%     100%    99%       79.7%    81.2%   74%       159
15         99.3%    100%    98.4%     80.1%    81.9%   78.6%     375
50         99%      100%    97.5%     80.8%    81.7%   78.8%     786
Averaged   100%     -       -         80%                        341

Overall performance of PLAANN seems quite satisfactory. It can be expected that the tool, applied to the customer intelligence in banking problem, would assure about 80% of correct classification decisions. It is also clear that increasing the inspiration level leads to better performance in terms of classifier quality at the cost of higher computation time. The experiment has been carried out on a Sun Challenge R4400 workstation with 12 processors. The number of slave workers used by the master varied in different runs from 5 to 15 and has been chosen randomly.

6 Summary

The paper is a report describing the use of PLAANN as a classification tool for the customer intelligence in banking problem. PLAANN is a software tool consisting of several ANN-based classifiers, which are trained using the population learning algorithm (PLA).

The main inference engine within the proposed approach is the population learning algorithm. It is a population-based method inspired by analogies to social education processes in which a diminishing number of individuals enter more and more advanced learning stages. In the PLA an individual represents a coded solution of the considered problem. Initially, a number of individuals, known as the initial population, is randomly generated. Increasing the initial population size can be considered as a means of diversification, helping to escape from local optima. Once the initial population has been generated, individuals enter the first learning stage.
It involves applying some, possibly basic and elementary, improvement schemes or conducting simple learning sessions. These can be based on some local search procedures. The improved individuals are then evaluated and the better ones pass to the subsequent stages.

To solve the classification problem posed by the organizers of the EUNITE World Competition in domain of Intelligent Technologies, a software tool called PLAANN has been proposed and implemented. PLAANN has been coded in C++ and is ready to be run on Unix and Linux platforms. The parallel version has been implemented to be executed under the PVM environment. A sequential, non-parallel version of the tool is also available. There are three modes of running PLAANN:
- Training classifiers and producing classification results using two data sets, one with training and another with testing patterns.
- Training classifiers and producing classification results using a single input data set, which is automatically partitioned into training and testing data sets.
- Classifying test patterns using already trained classifiers.

With the three operating modes available, PLAANN can be used within an adaptive-learning loop, which makes it easy to retrain the classifiers as often as needed to assure adaptability to a changing environment.

The computational experiments carried out indicate that PLAANN can be a useful tool for the customer intelligence in banking problem. Its estimated performance level is about 80% of correct classifications.

References

1. Czarnowski, I., Jedrzejowicz, P., Ratajczak, E.: Population Learning Algorithm - Example Implementations and Experiments. Proceedings of the Fourth Metaheuristics International Conference, Porto (2001) 607-612
2. Czarnowski, I., Jedrzejowicz, P.: Population Learning Metaheuristic for Neural Network Training. Proceedings of the Sixth International Conference on Neural Networks and Soft Computing (ICNNSC), Zakopane (2002)
3. Czarnowski, I., Jedrzejowicz, P.: Application of the Parallel Population Learning Algorithm to Training Feed-forward ANN. Proceedings of the Euro-International Symposium on Computational Intelligence (E-ISCI), Kosice (2002)
4. Jedrzejowicz, P.: Social Learning Algorithm as a Tool for Solving Some Difficult Scheduling Problems. Foundations of Computing and Decision Sciences 24 (1999) 51-66
5. Goldberg, D.E.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Boston (1989)
6. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin (1992)
7. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Boston (1996)