An Improved Data Mining Technique Combined Apriori Algorithm with Ant Colony Algorithm and its Application

School of Information Engineering, Hebei University of Technology, Tianjin 300401, China
E-mail: liguodong23423@163.com
doi:10.4156/dcta.vol5.issue8.27

Abstract
In this paper, the Apriori algorithm is improved and applied to the data mining of substation operating data, and an ant colony algorithm is applied to obtain the optimal allocation of reactive power in substations. In this ant colony algorithm, the state transition probability formula is amended and the parameters are adjusted dynamically. An ant's choice of the next node is determined by a tabu table built from the confidence levels produced by data mining. The switching strategy of the capacitor sets is given by an online algorithm.

Keywords: Ant Colony Algorithm, Data Mining, Reactive Power Optimization

1. Introduction
An electric power system is a large-scale, nonlinear, interconnected system. It is difficult for operators to extract useful information from the operating data that a power system accumulates continuously. Data mining can take full advantage of these data to reveal the principles and rules that govern the power system through association analysis, classification and prediction, clustering analysis, outlier analysis, and so on [1-3]. Data mining technology has been applied in many fields, such as credit card management and churn analysis, and most researchers focus on data mining models [4-6]. In power systems, however, traditional data mining techniques face new challenges: ever-increasing amounts of data are produced at high rates, and the analysis often has to be carried out in real time under time constraints. The ant colony algorithm (ACA) is a relatively recent heuristic for solving combinatorial optimization problems [7].
In recent years, research on the ant colony algorithm has focused on improving the traditional algorithm, for example for TSP optimization problems, and on extending it to other areas such as data mining and knowledge discovery [8-10]. Paper [11] adjusts the pheromone adaptively within pheromone limits to further alleviate the stagnation problem and improve the search ability of ACA. Paper [12] applies ACA to rapid microgrid power management under complex constraints and objectives, including environmental, fuel/resource availability and economic considerations.

Reactive power plays an important role in supporting real power flow by maintaining voltage stability and system reliability. The available reactive power capability of the system has to be deployed optimally so that bus voltages are kept within specified limits. The purpose of reactive power dispatch is to determine the proper amount and location of reactive support subject to several constraints. Paper [13] focuses on the voltage/reactive power problem while keeping the real power flows fixed at values determined from a base-case load flow analysis. In paper [14], optimal power dispatch is solved by time-varying acceleration coefficients particle swarm optimization (TVAC-PSO), and a comprehensive model for reactive power pricing in an ancillary services market is proposed. Paper [15] presents an efficient genetic algorithm (GA) based reactive power optimization approach that minimizes the total support cost from generators and reactive compensators.

This paper focuses on extracting useful data for effective decision-making in reactive power optimization. It describes the concepts of, and improvements to, the Apriori association rule algorithm and the ant colony algorithm. The improved Apriori algorithm is applied to extract useful information for the ACA from the large amount of operating data recorded in the substation. An overall model based on the Apriori algorithm and the ant colony algorithm is established for reactive power optimization, and an example substation is used to illustrate the application of the proposed models in an automatic voltage and reactive power control system. Based on historical data, the proposed method obtains the optimal operating conditions that guide practical operation.

2. Data mining
2.1. Principle of Association Rules Method
An association rule is written simply as A ⇒ B, where A ⊂ I, B ⊂ I, A ∩ B = ∅, and I is the set of all items. The support level and the confidence level of A ⇒ B are

$$\mathrm{support}(A \Rightarrow B) = P(A \cup B), \qquad \mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)} \tag{1}$$

where support_count(A ∪ B) is the number of records that contain A ∪ B, and support_count(A) is the number of records that contain A. The support level indicates the statistical importance of an association rule in the whole data set, and the confidence level indicates its credibility. In general, useful association rules are those with both high support and high confidence. The data mining process can be divided into two parts: (i) mining the large (frequent) itemsets whose support level is not less than the pre-set minimum support; (ii) generating from these itemsets the association rules whose confidence level is not less than the pre-set minimum confidence.

2.2. Improved Apriori Algorithm
The Apriori algorithm, proposed by Agrawal in 1994, works level by level and includes two main steps: (i) generate the candidate k-itemsets from the frequent (k-1)-itemsets; (ii) calculate the support level of each candidate by scanning the database and matching patterns. Two drawbacks follow: the candidate set can be very large, and the database is scanned repeatedly.
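The two steps above, together with the support and confidence of (1), can be illustrated with a minimal textbook Apriori sketch. It is the standard level-wise algorithm, not the authors' partition-based variant, and the item names in the demo (U_low, Q_deficit, C1_on, ...) are hypothetical discretized substation states.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of records that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset: support} for all frequent itemsets.

    `transactions` is a list of sets; `min_support` is a fraction in (0, 1].
    """
    n = len(transactions)
    items = {item for t in transactions for item in t}
    # Frequent 1-itemsets
    current = {}
    for item in items:
        s = frozenset([item])
        sup = support_count(s, transactions) / n
        if sup >= min_support:
            current[s] = sup
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: one database scan per level
        current = {}
        for c in candidates:
            sup = support_count(c, transactions) / n
            if sup >= min_support:
                current[c] = sup
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Rules A => B with confidence = support(A u B) / support(A), as in (1)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for a in combinations(itemset, r):
                a = frozenset(a)
                # Every subset of a frequent itemset is frequent, so frequent[a] exists
                conf = sup / frequent[a]
                if conf >= min_confidence:
                    out.append((a, itemset - a, sup, conf))
    return out


if __name__ == "__main__":
    # Hypothetical discretized substation records (item names are illustrative only)
    records = [
        {"U_low", "Q_deficit", "C1_on"},
        {"U_low", "Q_deficit", "C1_on"},
        {"U_low", "Q_deficit", "C2_on"},
        {"U_high", "Q_surplus", "C1_off"},
    ]
    for a, b, sup, conf in rules(apriori(records, 0.5), 0.9):
        print(sorted(a), "=>", sorted(b), f"support={sup:.2f} confidence={conf:.2f}")
```

Each level costs one full pass over the records in `support_count`, which is exactly the repeated scanning that the improved method below tries to avoid.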
An improved method without these two drawbacks is applied to data mining in the historical database of the substations. It is described as follows: (i) Preprocess the original data by partitioning. The substation database is divided into nine zones according to the reactive power requirement and the bus voltage, and only the data outside the normal operating zone are processed. This saves time and speeds up access, because only the corresponding zone of the database is scanned instead of the whole database. (ii) Classify the data by similarity search according to the operating conditions of the central substation, so that the associations found in the selected data better meet the requirements of practical operation.
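Step (i) can be sketched as follows, assuming the nine zones form a 3x3 grid of voltage bands (low, normal, high) by reactive power bands (surplus, normal, deficit), with the normal zone in the center. The band limits, the `Record` fields and the zone numbering are hypothetical; the paper does not give its zone boundaries.

```python
from dataclasses import dataclass


@dataclass
class Record:
    voltage: float      # measured bus voltage
    q_shortfall: float  # reactive power shortfall (positive = deficit, negative = surplus)


def band(value, low, high):
    """0 if below the band, 1 if inside it, 2 if above it."""
    if value < low:
        return 0
    if value > high:
        return 2
    return 1


def zone(rec, u_limits, q_limits):
    """Zone number 0..8 on a 3x3 voltage/reactive-power grid; zone 4 is normal operation."""
    return 3 * band(rec.voltage, *u_limits) + band(rec.q_shortfall, *q_limits)


def partition(records, u_limits, q_limits):
    """Group records by zone and drop the normal zone, which is not mined."""
    zones = {}
    for rec in records:
        z = zone(rec, u_limits, q_limits)
        if z != 4:
            zones.setdefault(z, []).append(rec)
    return zones


if __name__ == "__main__":
    # Hypothetical limits for a 35 kV bus and a +/-1 unit reactive-power dead band
    u_limits, q_limits = (34.0, 37.0), (-1.0, 1.0)
    data = [Record(35.5, 0.2), Record(33.5, 2.0), Record(37.5, -1.5)]
    for z, recs in sorted(partition(data, u_limits, q_limits).items()):
        print(z, recs)
```

Mining then runs separately on each non-empty zone, so a query about, say, low voltage with a reactive deficit touches only the records of that one zone.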

3. Online optimal algorithm and overall model
3.1. Model of Ant Colony Algorithm
Let m be the number of ants, b_i(t) the number of ants at element i at moment t, τ_ij(t) the pheromone (information) on path (i, j) at moment t, and d_ij (i, j = 1, 2, ..., n) the distance between cities i and j. At the beginning, τ_ij(0) = C, where C is a constant. While ant k (k = 1, 2, ..., m) moves, it uses the pheromone on the paths to choose the next path. The state transition probability of ant k moving from city i to city j at moment t is

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{s \in \mathit{allowed}_k} [\tau_{is}(t)]^{\alpha}\,[\eta_{is}(t)]^{\beta}}, & j \in \mathit{allowed}_k \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where allowed_k = {0, 1, ..., n-1} - tabu_k is the set of cities that ant k may choose in the next step; η_ij(t) is the heuristic (visibility) factor, usually taken as 1/d_ij; and α and β weight the pheromone and the heuristic information, respectively. The artificial ants have memory: tabu_k (k = 1, 2, ..., m) records the cities ant k has already visited, and it is updated dynamically as the search evolves. After one cycle of n steps, each ant has visited all the cities, and each path traversed by an ant is a solution. The pheromone on each path is then updated as

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{3}$$

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t) \tag{4}$$

where ρ ∈ [0, 1) is the evaporation (volatility) factor and 1 - ρ is the pheromone residual factor. Δτ_ij^k(t) is the pheromone left by ant k between city i and city j:

$$\Delta\tau_{ij}^{k}(t) = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ passes path } (i, j) \text{ in this cycle} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where Q is the pheromone intensity and L_k is the total length of the path that ant k passed in this cycle. The calculation ends after several cycles, when the stop condition is met.

3.2. Improvement of Ant Colony Algorithm
The improvements to the ant colony algorithm are: (i) Selection of parameters: the parameters are adjusted dynamically. At the beginning they are set to small values to avoid "false positive feedback" and "solution loss"; after a certain number of cycles they are increased to improve the solution quality. (ii) Modification of the parameters: the state transition probability in (2) is modified according to the results of data mining.
The higher the confidence level and the pheromone concentration of a path, the greater the probability that the ants choose it.

If ant k passes path (i, j), Δτ_ij^k(t) is modified to

$$\Delta\tau_{ij}^{k}(t) = \frac{Q\,(1+p)}{L_k} \tag{6}$$

where p is the confidence level. The tabu table is established according to the results of data mining, and it is updated after each ant's choice until a new optimal strategy is found. (iii) Selection of paths: First, the reactive power supplied by the capacitor sets in all the substations is calculated to establish all the working states. The probable strategies are found by comparing the reactive power shortfall with the calculated reactive power, and strategies with a large difference are discarded. The remaining states are numbered, and their confidence levels are found through data mining. Second, the path selection strategy of the basic ant colony algorithm is adjusted: the probability that ants choose a path is set to the confidence level of the mined association rule, and the tabu table of probable choices is listed. The next path is then determined from the tabu table without randomness. The initial tabu table is built from the results of the offline data mining.

3.3. Overall Model
For a substation in centralized control mode in China, the proposed control strategy of switching capacitors for the optimal allocation of reactive power is shown in Fig. 1. First, the association rules between the central station and the controlled stations are established from the historical databases. Second, the established results are compared with the measured data. Then the optimal solution is calculated according to the evaluation function, namely, the optimization goals.

[Figure 1. Proposed strategy. Blocks: historical data; new data; preparation of the data; data mining with Apriori (offline); confidence level; rules and knowledge; the actual power grid; real-time collected data; ant colony algorithm (online); optimization goals; the optimal strategy (output).]

The proposed strategy can be divided into two parts: offline and online.
The input of the offline part is the historical databases, and its output is the association rules and their confidence levels, calculated from the historical data by the Apriori algorithm. The frequent itemsets are mined according to the principle that their frequencies are not less than the pre-set minimum support frequency, and the corresponding strong association rules are derived from them. In the online part, the ant colony algorithm finds the optimal reactive power regulation strategy based on the association rules output by the offline part, and the updated output of the offline part interacts with the online strategy.

3.4. Target Function
The power loss between two points i and j can be represented as

$$f_{ij} = \left(\frac{P_{ij}}{U}\right)^{2} l_{ij}\,\lambda_{ij} \tag{7}$$

where P_ij is the power transported between i and j, l_ij is the length of the transmission line, and λ_ij is the related comprehensive coefficient. The total power loss can be represented as

$$F_1 = \sum_{i=1}^{n} f_i \tag{8}$$

The voltage deviation of node i is

$$f_{2i} = \frac{|U_i - U_i^{*}|}{\Delta U_i} \tag{9}$$

and the total voltage deviation of all nodes is

$$F_2 = \sum_{i=1}^{n} \frac{|U_i - U_i^{*}|}{\Delta U_i} \tag{10}$$

where n is the number of nodes excluding the slack bus, U_i^* is the set value of the voltage of node i, and ΔU_i is the set maximum deviation of the voltage of node i. The mathematical model of the reactive power optimization can be represented as

$$\min_{N} C = \omega_1 F_1 + \omega_2 F_2 \tag{11}$$

where ω_1 and ω_2 are the weight coefficients; N is the set of indices of the available capacitors; E = [e_1, e_2, ..., e_n]^T is the vector of the states of the available capacitors, with e_i = 1 if capacitor i is switched in and e_i = 0 if it is disconnected; and F_1 and F_2 are functions of E. The constraints can be represented as follows: (i) The constraint of power balance

$$\begin{aligned} P_i - U_i \sum_{j=1}^{n} U_j \left(G_{ij}\cos\delta_{ij} + B_{ij}\sin\delta_{ij}\right) &= 0 \\ Q_i - U_i \sum_{j=1}^{n} U_j \left(G_{ij}\sin\delta_{ij} - B_{ij}\cos\delta_{ij}\right) &= 0 \end{aligned} \tag{12}$$

where P_i is the injected active power, Q_i is the injected reactive power, U_i and U_j are the node voltages, G_ij is the conductance between i and j, B_ij is the susceptance between i and j, and δ_ij is the electrical angle difference between i and j. (ii) The constraints on reactive power, node voltage, transformer taps and switching frequency

$$Q_{Ci\min} \le Q_{Ci} \le Q_{Ci\max}, \quad U_{i\min} \le U_i \le U_{i\max}, \quad T_{i\min} \le T_i \le T_{i\max}, \quad C_{i\min} \le C_i \le C_{i\max}, \quad i = 1, 2, \ldots, n \tag{13}$$

where Q_Cimin and Q_Cimax are the minimum and maximum available reactive power; U_imin and U_imax are the minimum and maximum voltage amplitudes of node i; [T_imin, T_imax] is the adjustment range of adjustable transformer i; C_i is the switching frequency; and C_imin and C_imax are the limits of C_i. If C_i reaches C_imax, the capacitor is disabled for the remaining time.

3.5. Calculation of Target Function
(i) Target function for the TSP method: The problem of reactive power optimization in substations can be regarded as a TSP problem, in which a capacitor set is regarded as a city and a switching state as the path between two cities. The function in (11) can then be described as

$$\min\left(\sum_{i=1}^{n} ts\big(s(e_i)\big) + ts\big(s(e_{n+1})\big)\right) \tag{14}$$

where ts(s(e_{n+1})) represents the change of the target function when reactive power is injected at the newly added node.
(ii) Constraint conditions: Because the constraints in (13) are represented in the tabu table, the constraints on voltage and on changes of the transformer taps can be ignored here. The switching frequency of the capacitor sets satisfies C_imin ≤ C_i ≤ C_imax. If C_i reaches C_imax and stays there for a period of time, capacitor C_i is not allowed to switch again, and its value is set to zero for the remaining time.

4. Case study
The improved algorithm is applied to an example system for which daily operating data are available. Fig. 2 shows the simplified study example. A center substation (C, as in Fig. 1)
has nineteen controlled substations: three 110 kV substations and sixteen 35 kV substations. All these substations are equipped with reactive compensators and on-load tap-changing transformers, as shown in Table 1. The parameters are set to 0.5, 1 and 0.4 during the first quarter of the calculation period and to 1, 3 and 0.8 afterwards, and Δτ_ij^k(t) is calculated by (6). The pheromone on the paths is thus enlarged, and the computation needed to find the optimal solution is reduced.
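The following is a minimal sketch of the ant colony loop of Section 3 with the confidence-weighted deposit (6) and a two-stage parameter schedule. Several choices here are assumptions, not the authors' implementation: the three scheduled parameters are taken to be (α, β, ρ); η_ij = 1/d_ij; ants select nodes by roulette wheel per (2) rather than by the deterministic tabu-table rule of Section 3.2 (iii); and the `conf` matrix of rule confidences and the four-node distance matrix are hypothetical toy inputs, not the substation data.

```python
import random


def transition_probs(i, allowed, tau, eta, alpha, beta):
    """Eq. (2): p_ij proportional to tau_ij^alpha * eta_ij^beta over allowed nodes j."""
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed]
    total = sum(weights)
    return [w / total for w in weights]


def tour_length(tour, dist):
    """Length of a closed tour."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def aco(dist, conf, n_ants=10, n_cycles=40, q=1.0, c0=1.0, seed=0):
    """ACO sketch with confidence-weighted pheromone deposit, eq. (6).

    dist[i][j]: path cost; conf[i][j]: hypothetical rule confidence p in [0, 1]
    supporting the move i -> j.
    """
    rng = random.Random(seed)
    n = len(dist)
    tau = [[c0] * n for _ in range(n)]  # tau_ij(0) = C
    eta = [[1.0 / dist[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    best_tour, best_len = None, float("inf")
    for cycle in range(n_cycles):
        # Assumed mapping of the Section 4 schedule onto (alpha, beta, rho)
        if cycle < n_cycles // 4:
            alpha, beta, rho = 0.5, 1.0, 0.4
        else:
            alpha, beta, rho = 1.0, 3.0, 0.8
        delta = [[0.0] * n for _ in range(n)]
        for _ in range(n_ants):
            tour = [rng.randrange(n)]  # doubles as the ant's tabu list
            while len(tour) < n:
                i = tour[-1]
                allowed = [j for j in range(n) if j not in tour]
                probs = transition_probs(i, allowed, tau, eta, alpha, beta)
                tour.append(rng.choices(allowed, weights=probs)[0])
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                delta[i][j] += q * (1.0 + conf[i][j]) / length  # eq. (6)
        for i in range(n):
            for j in range(n):
                tau[i][j] = (1.0 - rho) * tau[i][j] + delta[i][j]  # eq. (3)
    return best_tour, best_len


if __name__ == "__main__":
    dist = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
    conf = [[0.0] * 4 for _ in range(4)]
    print(aco(dist, conf))
```

With p = 0 the deposit reduces to the basic rule (5); a mined rule with confidence p scales the deposit on its path by up to a factor of 2.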

Fig. 3(a), (b) and (c) show the evaluation results when the reactive power difference of the 110 kV buses changes continuously. Case (I) aims at the minimum network loss, i.e., ω_1 = 1, ω_2 = 0 in (11); case (II) aims at the minimum node voltage deviation, i.e., ω_1 = 0, ω_2 = 1 in (11).

[Figure 2. A real electric system]

Table 1. Configuration of the compensated reactive power in the example substation

Node No.   Distance (km)   Available reactive power (kvar)
1          35              24
2          25              36
3          100             24
4          78              36
5          43              24
6          65              36
7          73              24
8          53              36
9          67              30
10         36              30
11         36              12
12         37              18
13         56              12
14         38              18
15         47              12
16         56              18
17         67              12
18         86              18
19         33              12
17/18      28              0

The evaluation function is

$$F = \omega_1 F_1 + \omega_2 F_2 + \sum_{i \in NL} C_i f_{2i}$$

where F_1 is given by (8), f_2i by (9), and NL is the set of nodes with f_2i > 1. If the voltage of a node exceeds its given maximum deviation, the corresponding coefficient C_i is increased as a penalty. When the 35 kV bus coupler switch S1 is open and the 110 kV bus coupler switch S2 is closed, the compensation results are shown in Fig. 3(a). When S1 is closed and S2 is open, the results are shown in Fig. 3(b). When both S1 and S2 are open, the results are shown in Fig. 3(c).

[Figure 3. The comparison of reactive compensation]

From Fig. 3, it can be concluded that the overall compensation result with the optimized strategies is better than that of the old switching method (III). The evaluation coefficient equals zero when the system is fully compensated. The reactive power is over-compensated because the capacitors in Table 1 regulate reactive power in discrete steps.
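The evaluation function above can be sketched directly. This assumes the reconstructed forms of (8) and (9) (absolute deviation divided by the allowed deviation) and takes the per-line losses f_i as precomputed inputs; all numbers in the demo are hypothetical.

```python
def voltage_deviations(u, u_set, du_max):
    """f_2i = |U_i - U_i*| / dU_i for every non-slack node, as in (9)."""
    return [abs(ui - us) / d for ui, us, d in zip(u, u_set, du_max)]


def evaluation(line_losses, u, u_set, du_max, w1, w2, penalty):
    """F = w1*F1 + w2*F2 + sum over i in NL of C_i * f_2i.

    line_losses: per-line losses f_i, summed into F1 as in (8).
    penalty[i]: coefficient C_i, applied only when f_2i > 1 (node i in NL).
    """
    f2 = voltage_deviations(u, u_set, du_max)
    F1 = sum(line_losses)
    F2 = sum(f2)
    pen = sum(c * f for c, f in zip(penalty, f2) if f > 1.0)
    return w1 * F1 + w2 * F2 + pen


if __name__ == "__main__":
    # Hypothetical two-node case: node 2 violates its allowed deviation
    losses, u, u_set, du = [0.5, 0.25], [36.0, 33.0], [35.0, 35.0], [2.0, 1.0]
    print("case (I): ", evaluation(losses, u, u_set, du, 1.0, 0.0, [10.0, 10.0]))
    print("case (II):", evaluation(losses, u, u_set, du, 0.0, 1.0, [10.0, 10.0]))
```

The penalty term dominates whenever a node leaves its band, so the ant colony search is pushed back toward voltage-feasible switching states regardless of the weights ω_1 and ω_2.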

5. Conclusions
An example substation system is used to test the algorithm proposed in this paper. The experimental results show that the reactive power optimization method based on data mining can improve system efficiency, reduce power loss, and is of great significance for stable operation.

6. References
[1] Qi Luo, Advancing Knowledge Discovery and Data Mining, Knowledge Discovery and Data Mining (WKDD 2008), 23-24 Jan. 2008, pp. 3-5.
[2] Xindong Wu, Data mining: artificial intelligence in data analysis, Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2004), 2004, p. 7.
[3] Aihua Li, Lingling Zhang, A Study of the Gap from Data Mining to Its Application with Cases, International Conference on Business Intelligence and Financial Engineering (BIFE '09), 24-26 July 2009, pp. 464-467.
[4] J.A. Steele, J.R. McDonald, C. D'Arcy, Knowledge discovery in databases: applications in the electrical power engineering domain, IEE Colloquium on IT Strategies for Information Overload (Digest No. 1997/340), 3 Dec. 1997, pp. 8/1-8/4.
[5] LI Jianqiang, NIU Chenglin, LIU Jizhen, Application of Data Mining Technique in Optimizing the Operation of Power Plants, Journal of Power Engineering, Vol. 26, No. 6, pp. 830-835.
[6] E. Cesario, D. Talia, Distributed Data Mining Models as Services on the Grid, International Conference on Data Mining Workshops, 2008, pp. 486-495.
[7] Dingli Song, Bingru Yang, Zhen Peng, Weiwei Fang, Study of cost-sensitive ant colony data mining algorithm, International Conference on Industrial Mechatronics and Automation (ICIMA 2009), 15-16 May 2009, pp. 488-491.
[8] L. Admane, K. Benatchba, M. Koudil, M. Drias, S. Gharout, N. Hamani, Using ant colonies to solve data-mining problems, IEEE International Conference on Systems, Man and Cybernetics, 2004 (4): 3151-3157.
[9] P. S. Shelokar, V. K. Jayaraman, B. D. Kulkarni, An ant colony classifier system: application to some process engineering problems, Computers and Chemical Engineering, 2004 (28): 1577-1584.
[10] WANG Zhigang, YANG Lixi, CHEN Genyong, Ant Colony Algorithm for Distribution Network Planning, Proceedings of the EPSA, 2002, 14(6): 73-76.
[11] Yi Shen, Mingxin Yuan, Yunfeng Bu, Study on adaptive planning strategy using ant colony algorithm based on predictive learning, Control and Decision Conference, 2009, pp. 3030-3035.
[12] C.M. Colson, M.H. Nehrir, C. Wang, Ant colony optimization for microgrid multi-objective power management, Power Systems Conference and Exposition, 2009, pp. 1-7.
[13] Ali Abdulhadi Noaman, Concentric Circular Array Antenna Null Steering Synthesis by Using Modified Hybrid Ant Colony System Algorithm, IJACT, Vol. 2, No. 2, pp. 144-157, 2010.
[14] Alaa Aljanaby, Ku Ruhana Ku-Mahamud, Norita Md. Norwawi, Interacted Multiple Ant Colonies Optimization Framework: an Experimental Study of the Evaluation and the Exploration Techniques to Control the Search Stagnation, IJACT, Vol. 2, No. 1, pp. 78-85, 2010.
[15] S. Janakiraman, V. Vasudevan, ACO based Distributed Intrusion Detection System, JDCTA, Vol. 3, No. 1, pp. 66-72, 2009.