Dynamic intelligent cleaning model of dirty electric load data


Energy Conversion and Management 49 (2008)

Zhang Xiaoxing a,*, Sun Caixin b
a State Key Laboratory of Power Transmission Equipment & System Security and New Technology, Chongqing University, Chongqing, China
b The Key Laboratory of High Voltage Engineering and Electrical New Technology, Ministry of Education, Electrical Engineering College of Chongqing University, Chongqing, PR China

Received 13 January 2006; received in revised form 16 April 2006; accepted 19 August 2007
Available online 25 October 2007

Abstract
The load database derived from the supervisory control and data acquisition (SCADA) system contains a certain amount of dirty data, so the data must be carefully and reasonably adjusted before they are used for electric load forecasting or power system analysis. This paper proposes a dynamic and intelligent data cleaning model based on data mining theory. First, building on fuzzy soft clustering, the Kohonen clustering network is improved so that it performs fuzzy c-means soft clustering in parallel. Then, the proposed dynamic algorithm automatically finds new clustering centers (the characteristic curves of the data) as the sample data are updated. Finally, the network is combined with a radial basis function neural network (RBFNN) to form an intelligent adjusting model that identifies the dirty data. The model is fast and dynamic, which makes it suitable for real-time calculation, and its efficiency and accuracy are demonstrated by tests on electrical load data from Chongqing.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Dirty data; Data mining; Kohonen clustering network; RBF neural network; Dynamic adjusting

1. Introduction
High accuracy in power system load forecasting improves the security of the power system and reduces generation costs.
Load forecasting is closely related to power system operations such as dispatch scheduling, preventive maintenance planning for generators and reliability evaluation of the power system. In addition, accurately estimated loads are key data for electricity price forecasting in electric power markets. Many studies have sought to improve prediction accuracy using various methods, such as regression models, expert systems, artificial neural networks, fuzzy inference and hybrid algorithms [1–7].

* Corresponding author. Tel.: x8215. E-mail address: mikezxx@tom.com (X. Zhang).

Because of transmission errors in the information channel, faults of the remote terminal unit (RTU) and similar causes, the load data derived from the supervisory control and data acquisition (SCADA) system contain some dirty data. Using these data directly can reduce the accuracy of load forecasting, so it is necessary to identify and adjust the dirty data, which is an important step of data mining [8]. Various methods have been proposed for this purpose, but there is still no systematic method that solves the problem effectively in all respects. Sequential probability ratio analysis has been used as an outlier detection tool for stationary time series [9], but it requires information about the data set parameters, such as the data distribution, which is unknown in many cases. Learning vector quantization (LVQ) has been used to remove dirty data in Ref. [10]. This method treats the data as an array of vectors: if one element of a vector is dirty, the whole vector is eliminated. Because it cannot identify the exact location of the dirty data, a great deal of useful information is lost at the same time. In this paper, a dynamic and intelligent three-layer model based on data mining theory is proposed.
The first layer extracts the characteristic curves from the load data using the Kohonen clustering network improved by the fuzzy soft clustering algorithm. In the second layer, a radial basis function neural network (RBFNN) is used to construct a pattern classifier that identifies dirty data. In the third layer, each dirty value is replaced by the weighted sum of the two values at the same position in the two characteristic curves with maximal membership grade. As the sample data are updated, the proposed dynamic clustering algorithm automatically searches for new clustering centers, that is, new characteristic curves. The model remedies the deficiencies of the methods cited above and offers high accuracy, real-time operation and dynamic updating. Its efficiency and accuracy are demonstrated by tests on electrical load data from Chongqing.

2. Principle and structure of the intelligent adjusting model of dirty data
Similarity and smoothness are two important characteristics of electrical load curves. The peak times of daily curves are generally the same, and neighboring points usually differ little, whereas dirty data clearly destroy this smoothness. Similarity, however, is preserved because the amount of dirty data is small. Therefore, characteristic patterns can be extracted by the clustering algorithms of data mining from many load curves, even curves that contain dirty data; the characteristic curve can then be separated from a load curve by a classification algorithm, and the dirty data finally recognized.
The structure of the model is shown in Fig. 1. The first layer is an improved Kohonen network, the FKCN (fuzzy Kohonen clustering network). The curve under check, $x_j$, is the input of the FKCN.
If the characteristic curve corresponding to a neuron has the greatest similarity to $x_j$, that neuron outputs 1 and excites the corresponding RBF sub-network. The second layer consists of one RBF sub-network for each clustering center; after training, each sub-network identifies dirty data and locates them accurately. If an output cell of the RBF is close or equal to 1, the corresponding input cell is dirty data. The third layer adjusts the dirty data. The principles of each layer are described in detail below.

Load data clustering (the first layer)
Data clustering is used to extract the characteristic curves from the load data. Clustering algorithms assess the interaction among patterns by organizing them into clusters such that patterns within a cluster are more similar to each other than to patterns in other clusters. Neural networks such as Kohonen clustering networks (KCNs) have been applied successfully to pattern recognition and clustering [11–14]. One advantage of this approach is that it needs no prior knowledge of the number of clusters in the data set. However, KCNs suffer from several major problems [15]. First, KCNs are heuristic procedures, so termination is not based on optimizing any model of the process or its data. Second, the final weight vectors usually depend on the input sequence. Third, different initial conditions usually yield different results. Fourth, several parameters of KCN algorithms, such as the learning rate, the size of the update neighborhood and the strategy for altering these two parameters during learning, must be varied from one data set to another to achieve useful results. A fuzzy Kohonen clustering network (FKCN) model was proposed by Tsao and Bezdek in Ref. [15].
This method overcomes some of the difficulties described above by combining the best features of the self-organizing structure of KCNs with the fuzzy clustering model of the FCM. In this paper, the FKCN algorithm is employed to cluster the load data.

Fig. 1. The intelligent adjusting model of dirty data (first layer: normalization of the training data and the FKCN network, which receives the data to be cleaned; second layer: RBF sub-networks RBF 1, ..., RBF J−1, RBF J, ..., RBF N with 0/1 outputs; third layer: adjustment, giving the data after cleaning).
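The third-layer replacement rule outlined above, given in full as Eqs. (13)–(15) in the third-layer subsection, can be illustrated with a short Python sketch. It is not the authors' code: the function name `amend` is hypothetical, indices are 0-based (the paper counts points from 1), the dirty span t1..t2 is assumed to have a clean neighbor on each side, and the two best-matching centers with their membership grades are assumed to be known already.

```python
def amend(x, t1, t2, v1, v2, u1, u2, min_sub=0.2):
    """Sketch of Eqs. (13)-(15): replace the dirty span x[t1..t2] (inclusive).

    x      -- daily load curve containing dirty data
    v1, v2 -- characteristic curves (clustering centers) with the maximal
              and sub-maximal membership grades u1 >= u2
    Assumes 0-based indices with 1 <= t1 <= t2 <= len(x) - 2, so the clean
    neighbors x[t1 - 1] and x[t2 + 1] exist.
    """

    def rescale(v):
        # Eqs. (14)/(15): scale v to the level of x using the average ratio
        # x/v at the two clean points just outside the dirty span.
        k = (x[t1 - 1] / v[t1 - 1] + x[t2 + 1] / v[t2 + 1]) / 2.0
        return [k * vt for vt in v]

    if u2 < min_sub:
        # Sub-maximal grade below 0.2: use the best-matching curve alone.
        w1, w2 = 1.0, 0.0
    else:
        # Eq. (13): membership-weighted combination of the two curves.
        w1, w2 = u1 / (u1 + u2), u2 / (u1 + u2)

    v1s, v2s = rescale(v1), rescale(v2)
    cleaned = list(x)
    for t in range(t1, t2 + 1):
        cleaned[t] = w1 * v1s[t] + w2 * v2s[t]
    return cleaned


# Usage: a spike at index 2 of an otherwise flat curve.
x = [2.0, 2.0, 9.0, 2.0, 2.0]
v1 = [1.0, 1.0, 1.0, 1.0, 1.0]
v2 = [1.0, 1.0, 3.0, 1.0, 1.0]
print(amend(x, 2, 2, v1, v2, u1=0.5, u2=0.5))  # [2.0, 2.0, 4.0, 2.0, 2.0]
print(amend(x, 2, 2, v1, v2, u1=0.5, u2=0.1))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```

The rescaling step matters because the characteristic curves are normalized (Eq. (16)) while the curve being repaired is in load units; the boundary ratios bring both to the same level before they are mixed.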

Kohonen clustering networks (KCNs)
The Kohonen model is a neural network that simulates the hypothesized self-organization process carried out in the human brain when input data are presented [11]. The network consists of two layers: an input layer formed by a set of units (one for each feature of the input) and an output layer formed by units, or neurons, arranged in a two-dimensional grid. Each neuron has an associated vector of coefficients, which can be interpreted as the weights of the edges that connect the p input nodes to the c output nodes. The c weight vectors $v_i$ of the network are adjusted during learning. Given an input vector, the neurons in the output layer compete among themselves, and the winner (the neuron whose weight vector is closest to the input) updates its weights and those of a set of predefined neighbors. The process continues until the weight vectors stabilize. A learning rate that decreases with time must be defined to force termination, and the update neighborhood must also be defined and reduced with time. The KCN algorithm is described in Ref. [11].

Fuzzy c-means algorithms (FCM)
Fuzzy c-means clustering [16–18] groups similar objects into the same class, but the resulting partition is fuzzy: patterns are not assigned exclusively to a single class but partially to all classes. The goal is to optimize the clustering criterion so as to achieve high intra-cluster similarity and low inter-cluster similarity using p-dimensional feature vectors. The theoretical basis of the method is only briefly reviewed here. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote a data set in which each element is a p-dimensional vector; X is to be partitioned into c fuzzy clusters.
A c-partition of X can be represented by $u_{ik}$, which takes values in the interval [0, 1] and gives the membership of $x_k$ in cluster i, $1 \le i \le c$, $1 \le k \le n$. In general, $[u_{ik}]$ is written as a $c \times n$ matrix U that satisfies

$$\sum_{i=1}^{c} u_{ik} = 1. \tag{1}$$

The fuzzy c-means algorithm consists of an iterative optimization of the objective function

$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m D_{ik}, \tag{2}$$

where the parameter $m \in (1, \infty)$ determines the fuzziness of the partition; in this paper, m = 2.0. Here $v = \{v_1, v_2, \ldots, v_c\}$, where $v_i$ is the cluster center of class i, and

$$D_{ik} = (d_{ik})^2 = \lVert x_k - v_i \rVert_A^2, \tag{3}$$

where $d_{ik}$ is the distance in the A-norm from $x_k$ to $v_i$ (A is any positive definite $p \times p$ matrix). For a given partition, the cluster centers are calculated as

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}, \tag{4}$$

and a new partition is obtained as

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1}. \tag{5}$$

The iterative optimization continues until a stopping criterion is met, usually when the distance between the U matrices of successive iterations falls below a threshold ε:

$$E_t = \lVert U_t - U_{t-1} \rVert < \varepsilon. \tag{6}$$

FCM is a gradual optimization process with slow convergence.

FKCN
The fuzzy Kohonen clustering network [15] is a neural network that combines the two methods described above, KCNs and FCM. This self-organizing network has two layers, input and output. The input layer has p nodes, where p is the number of features, and the output layer has c nodes, where c is the number of clusters to be found. Every input node is fully connected to all output nodes, with an adjustable weight $v_i$ assigned to each connection. Given an input vector, the neurons in the output layer update their weights according to a predefined learning rate α.
This approach integrates the fuzzy membership $u_{ik}$ from the FCM into the following update rule:

$$v_{i,t} = v_{i,t-1} + \alpha_{ik,t} (x_k - v_{i,t-1}), \tag{7}$$

where the learning rate α is defined as

$$\alpha_{ik,t} = (u_{ik,t})^{m_t}, \tag{8}$$

$$m_t = m_0 - \frac{(m_0 - 1)\, t}{T}, \tag{9}$$

$m_0$ is any positive constant greater than one, t is the current iteration and T is the iteration limit. The steps of the algorithm are:

Step 1: Fix c, and set ε to any small positive constant.
Step 2: Initialize the weight vectors (cluster centers) $v_0 = \{v_{1,0}, v_{2,0}, \ldots, v_{c,0}\}$. Choose $m_0 > 1$ and the maximal number of iterations T.
Step 3: For t = 1, 2, ..., T:
(a) Compute all learning rates $\alpha_{ik,t}$ as defined in Eq. (8).
(b) Update all weight vectors $v_{i,t}$ with

$$v_{i,t} = v_{i,t-1} + \frac{\sum_{k=1}^{n} \alpha_{ik,t} (x_k - v_{i,t-1})}{\sum_{s=1}^{n} \alpha_{is,t}}. \tag{10}$$

(c) Compute $E_t$ for the stopping criterion; if $E_t < \varepsilon$, stop, else go to the next t.
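The FKCN steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `memberships` and `fkcn` are hypothetical; it uses the Euclidean norm (A = I in Eq. (3)); initial centers are supplied by the caller or drawn at random from the data, since Step 2 does not fix an initialization; Eq. (6) is applied with the largest absolute change in U; and $m_t$ is floored at 1.001 so that the exponent 2/(m_t − 1) in Eq. (5) stays finite at t = T. Memberships are evaluated in log space, which is algebraically the same as Eq. (5) but avoids overflow as $m_t$ approaches 1.

```python
import math
import random


def memberships(x, centers, m):
    """Fuzzy membership of sample x in each cluster, Eq. (5) with A = I."""
    d = [math.dist(x, v) for v in centers]
    if min(d) == 0.0:
        # x coincides with a center: it belongs entirely to that cluster.
        i0 = d.index(0.0)
        return [1.0 if i == i0 else 0.0 for i in range(len(d))]
    # u_i = d_i^(-q) / sum_j d_j^(-q) with q = 2/(m-1), evaluated in log space.
    q = 2.0 / (m - 1.0)
    w = [-q * math.log(di) for di in d]
    w_max = max(w)
    e = [math.exp(wi - w_max) for wi in w]
    total = sum(e)
    return [ei / total for ei in e]


def fkcn(X, c, m0=2.0, T=100, eps=1e-6, init=None, seed=0):
    """Fuzzy Kohonen clustering network, Steps 1-3 (sketch)."""
    # Step 2: initial centers (caller-supplied, or c random samples; assumption).
    source = init if init is not None else random.Random(seed).sample(X, c)
    V = [list(v) for v in source]
    n = len(X)
    U_prev = None
    for t in range(1, T + 1):
        # Eq. (9): m_t falls linearly from m0 toward 1 (floored at 1.001).
        m_t = max(m0 - (m0 - 1.0) * t / T, 1.001)
        # Step 3(a): memberships via Eq. (5), learning rates via Eq. (8).
        U = [memberships(x, V, m_t) for x in X]
        for i in range(c):
            a = [U[k][i] ** m_t for k in range(n)]
            s = sum(a)
            if s == 0.0:
                continue
            # Step 3(b), Eq. (10): move v_i by the alpha-weighted average
            # of the corrections (x_k - v_i).
            V[i] = [v + sum(a[k] * (X[k][p] - v) for k in range(n)) / s
                    for p, v in enumerate(V[i])]
        # Step 3(c): stop when U changes by less than eps (max-abs form of Eq. 6).
        if U_prev is not None:
            E_t = max(abs(U[k][i] - U_prev[k][i]) for k in range(n) for i in range(c))
            if E_t < eps:
                break
        U_prev = U
    return V


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers = fkcn(data, c=2, init=[[0.0, 0.0], [10.0, 10.0]])
print([[round(z, 2) for z in v] for v in centers])  # [[0.33, 0.33], [10.33, 10.33]]
```

As $m_t$ decreases toward 1, the memberships become nearly crisp, so the centers settle close to the plain means of the two groups; with real load data, each sample would be a normalized 96-point daily curve.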

Dynamic soft clustering by using SFKN
The sample data form a time sequence and should be updated dynamically as time elapses, so dirty data adjustment is also a dynamic process. In this paper, a detection threshold $u_0$ is introduced, and the algorithm is as follows:

Step 1: Initialize the dynamic detection threshold $u_0$.
Step 2: Introduce $x_j^+$ and $x_j^-$, where $x_j^+$ denotes sample data newly added to the data set and $x_j^-$ denotes eliminated sample data. The current sample set can be expressed as $X = \{x_j\} + \{x_j^+\} - \{x_j^-\}$.
Step 3: Calculate $u_{i(j-j^-)}$, the membership grade of the remaining data $x_j - x_j^-$ toward the clustering center vector $v_i$, and take its maximum over the remaining data. If this maximum is less than $u_0$, eliminate $v_i$, update c = c − 1, and set to 0 all weights of the FKCN connections to this node; if it is greater than or equal to $u_0$, retain $v_i$.
Step 4: Calculate $u_{ij^+}$, the membership grade of $x_j^+$ toward each clustering center $v_i$, and take its maximum over the centers. If this maximum is less than $u_0$, continue with Step 5; if it is greater than or equal to $u_0$, the algorithm finishes.
Step 5: Introduce a new clustering center $v_i^+$ with initial value $x_j^+$, set c = c + 1 and keep the other clustering centers unchanged; then use $x_j^+$ as the input of the FKCN and calculate the new clustering centers with the FKCN algorithm.

In Step 5, most of the clustering centers remain unchanged, since the network structure changes only slightly.
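Steps 3 to 5 of the dynamic algorithm can be sketched as follows. The sketch is hypothetical in several respects: the names `dynamic_update` and `memberships` are illustrative, memberships are computed with Eq. (5) at the paper's m = 2.0 using the Euclidean norm, Step 3 is read as "drop a center when no remaining curve reaches a grade of $u_0$ toward it", and the FKCN re-training at the end of Step 5 is not shown.

```python
import math


def memberships(x, centers, m=2.0):
    """FCM membership grades of x toward each center (Eq. 5, Euclidean norm)."""
    d = [math.dist(x, v) for v in centers]
    if min(d) == 0.0:
        i0 = d.index(0.0)
        return [1.0 if i == i0 else 0.0 for i in range(len(d))]
    # u_i = d_i^(-q) / sum_j d_j^(-q), q = 2/(m-1), computed in log space.
    q = 2.0 / (m - 1.0)
    w = [-q * math.log(di) for di in d]
    w_max = max(w)
    e = [math.exp(wi - w_max) for wi in w]
    total = sum(e)
    return [ei / total for ei in e]


def dynamic_update(remaining, added, centers, u0, m=2.0):
    """Steps 3-5 of the dynamic clustering algorithm (sketch).

    remaining -- curves left after removing the eliminated samples x_j^-
    added     -- newly added curves x_j^+
    centers   -- current clustering centers v_i
    u0        -- detection threshold
    Returns the updated centers; FKCN re-training in Step 5 is not shown.
    """
    # Step 3: drop a center if no remaining curve has a grade >= u0 toward it.
    if remaining and centers:
        U = [memberships(x, centers, m) for x in remaining]
        centers = [v for i, v in enumerate(centers) if max(u[i] for u in U) >= u0]
    # Steps 4-5: a new curve whose best grade is below u0 seeds a new center.
    for x in added:
        if not centers or max(memberships(x, centers, m)) < u0:
            centers = centers + [list(x)]
    return centers


centers = [[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]]
remaining = [[0.1, 0.0], [10.0, 10.1]]  # nothing is left near [20, 0]
added = [[5.0, 5.0], [0.0, 0.2]]
print(dynamic_update(remaining, added, centers, u0=0.8))
# [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
```

Because the grades of Eq. (5) sum to one over the centers (Eq. (1)), the largest grade of any curve is at least 1/c, so $u_0$ must exceed 1/c for Step 4 ever to add a center; the paper does not give a value for $u_0$, and 0.8 above is only illustrative.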
Because most clustering centers are unchanged, the membership grades of the original data toward them also remain unchanged, so the parameters of the original network can be reused in the new network, which therefore converges quickly.

Pattern classification of dirty data (the second layer)
In the second layer of the model, an RBF network is used to construct a pattern classifier for dirty data because of its fast convergence and strong classification ability.

Radial basis function network
The radial basis function (RBF) network [18–21] is a three-layer neural network with input, hidden and output layers. The input-layer connections are not weighted, so each hidden node receives every input value unaltered. The hidden nodes are radial basis function units whose transfer function, in contrast to the monotonic sigmoid function of back-propagation networks, is non-monotonic. The output nodes are simple summations. The hidden-layer transfer function of an RBF network is often a Gaussian function

$$a_i = \exp\left( -\frac{\lVert x - v_i \rVert^2}{\sigma_i^2} \right), \tag{11}$$

where $a_i$ is the activation of the ith hidden node, $x \in \mathbb{R}^p$ is an input vector, $v_i$ is the center vector of the ith node, $\sigma_i$ is the bandwidth of the ith node, and $\lVert \cdot \rVert$ denotes the Euclidean norm. The output $y_j$ of the network is

$$y_j = \sum_{i=1}^{m} w_{ji} a_i, \tag{12}$$

where $w_{ji}$ is the connection weight between the hidden layer and the output layer and m is the number of hidden nodes.

The dirty data locating algorithm
Each clustering center from the FKCN corresponds to an RBFNN, and the value of the clustering center is used as the center of the Gaussian function of that RBF.
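With clustering centers serving as Gaussian centers, the forward pass of one RBF sub-network, Eqs. (11) and (12), can be sketched as below. The function name `rbf_forward` is hypothetical, the activation uses the usual Gaussian form with the squared Euclidean distance, and the sketch does not show how the output weights $w_{ji}$ or bandwidths $\sigma_i$ are trained, which the text does not specify.

```python
import math


def rbf_forward(x, centers, sigmas, W):
    """Output of a Gaussian RBF network for one input vector x.

    centers -- hidden-node center vectors v_i
    sigmas  -- hidden-node bandwidths sigma_i
    W       -- output weights, W[j][i] = w_ji
    """
    # Eq. (11): Gaussian activation of each hidden node.
    a = [math.exp(-math.dist(x, v) ** 2 / s ** 2) for v, s in zip(centers, sigmas)]
    # Eq. (12): each output node is a weighted sum of the hidden activations.
    return [sum(w_ji * a_i for w_ji, a_i in zip(row, a)) for row in W]


y = rbf_forward([0.0, 0.0], centers=[[0.0, 0.0], [1.0, 0.0]],
                sigmas=[1.0, 1.0], W=[[1.0, 0.0], [0.0, 1.0]])
print(y)  # first output 1.0 (input sits on a center), second exp(-1)
```

In the model, each sub-network has 96 inputs and 96 outputs, one per daily load point, so `W` would be a 96-row matrix.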
Each RBF's input layer has 96 nodes (corresponding to the 96 load points per day), and its output layer also has 96 nodes. Suppose that only one dirty data point is present and the rest of the data are normal; with 96 sampling points, the number of dirty-data patterns is 96 × 2 = 192. The input and output sample data sets are created as follows:

Step 1: Choose the clustering center $v_i$ as the input of the ith RBFNN, that is, $x_0 = v_i$, with the corresponding output $y_0 = (0, 0, \ldots, 0)$.
Step 2: Add a deviation e to the first element of $v_i$, $x_1 = (v_i(1) + e, v_i(2), \ldots, v_i(p))$; this produces a sample containing dirty data, with output $y_1 = (1, 0, \ldots, 0)$. Add the deviation to the second element of $v_i$, $x_2 = (v_i(1), v_i(2) + e, \ldots, v_i(p))$; this produces another sample containing dirty data, with output $y_2 = (0, 1, \ldots, 0)$. Continue this operation for the remaining elements of $v_i$ to obtain a sample data set with positive deviation.
Step 3: Change the deviation e to −e and replace 1 in the output vector by −1. Repeat Step 2 to obtain a sample data set with negative deviation.

The trained network can identify and locate dirty data accurately however they occur in the curve, whether as a single dirty element or as a series.

Recognition and adjustment of dirty data (the third layer)
Dirty data are amended in the third layer. Each dirty value located in the second layer is replaced by the weighted sum of the two values at the same position in the two characteristic curves with maximal membership grade. If the sub-maximal membership grade is less than 0.2, the value of the characteristic curve with maximal membership grade is used alone. For example, if dirty data exist in the curve $x_j$ from point $t_1$ to point $t_2$, and $v_{i1}$ and $v_{i2}$ are the two clustering centers with maximal membership grade, the amended data are

$$x'_j(t) = \frac{u_{i1,j}}{u_{i1,j} + u_{i2,j}}\, v'_{i1}(t) + \frac{u_{i2,j}}{u_{i1,j} + u_{i2,j}}\, v'_{i2}(t), \tag{13}$$

$$v'_{i1}(t) = v_{i1}(t) \left[ \frac{x_j(t_1 - 1)}{v_{i1}(t_1 - 1)} + \frac{x_j(t_2 + 1)}{v_{i1}(t_2 + 1)} \right] \Big/ 2, \tag{14}$$

$$v'_{i2}(t) = v_{i2}(t) \left[ \frac{x_j(t_1 - 1)}{v_{i2}(t_1 - 1)} + \frac{x_j(t_2 + 1)}{v_{i2}(t_2 + 1)} \right] \Big/ 2, \tag{15}$$

where $t \in [t_1, t_2]$.

3. The analysis of results
Workday and weekend data are fed into the FKCN separately, because these two kinds of load curves are clearly different. This reduces the training computation and the number of clustering centers, increases the calculation speed and improves the efficiency of the model. The following example uses electrical load data from April to September 2003 of the Jiangbei power supply bureau in Chongqing, China.

Normalization of load data
The system mainly considers the similarity and smoothness of the curves. Because differences in curve amplitude affect the similarity of the curves, the load is normalized to eliminate this influence:

$$x'_L(i) = \frac{x_L(i)}{\sum_{i=1}^{96} x_L(i)}. \tag{16}$$

3.2. Training results
The clustering centers and the curves after normalization are shown in Fig. 2.

Fig. 2. Normalized curves and clustering center of one type of loads.

Adjusting results
Load data from October 2003 (outside the training set) were selected at random and adjusted. Fig. 3 shows a typical load curve with dirty data, together with the amendment and its result.

Fig. 3. Identification of dirty data. 1: load curve; 2: clustering center with maximal membership grade; 3: clustering center with sub-maximal membership grade; 4: curve after adjustment.

Comparison of accuracy between FKCN and KCNs
To illustrate the advantages of the FKCN employed in this paper, the FKCN in the first layer of the model was replaced by the general Kohonen network. Dirty data were added artificially to the daily load data of October 2003. The results of the two methods are shown in Table 1: the accuracy of the proposed method is higher than that of the KCN method.

Table 1. Comparison of two amendment models (columns: dirty data points; error before adjustment (%); general Kohonen (%); SFKN (%)).

Dynamic updating algorithm
To verify the efficiency of this algorithm, dirty data on 5 days in December 2003 were first adjusted without the dynamic updating algorithm (with sample data from April to September 2003). The model with the dynamic updating algorithm was then applied, with the sample data set updated up to the day before the identification day. The results are shown in Table 2. The dynamic updating algorithm performs satisfactorily: its error is smaller than that of the non-dynamic algorithm. Because it uses the most recently adjusted clustering centers and vectors, the membership grades of the load curves toward the clustering centers increase. Thus, the dynamic updating algorithm improves not only the accuracy of identification but also the adjustment precision of the dirty data.

Table 2. The result of random checks of load data (columns: date no.; count of dirty data; non-dynamic algorithm: failed to judge, misjudged; dynamic updating algorithm: failed to judge, misjudged; total).

4. Conclusion
The examples show that the FKCN algorithm improves the capability of Kohonen clustering networks and obtains the clustering centers more quickly and reasonably, overcoming the disadvantages of the Kohonen algorithm. The proposed dynamic updating algorithm adjusts the clustering centers automatically on the basis of newly added data, and the RBF networks identify the exact location of dirty data owing to their strong pattern recognition ability. The dynamic intelligent adjusting model proposed in this paper processes data dynamically with higher accuracy and faster convergence.

References
[1] Rahman S, Bhatnagar R. An expert system based algorithm for short term load forecast. IEEE Trans Power Syst 1988;3(2).
[2] Mori H, Kobayashi H. Optimal fuzzy inference for short-term load forecasting. IEEE Trans Power Syst 1996;11(1).
[3] Song Kyung-Bin, Baek Young-Sik, Hun Hong Dug, Jang Gilsoo. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans Power Syst 2005;20(1).
[4] Kim KH. Development of fuzzy expert system for short-term load forecasting on special days. IEEE Trans Power Syst 1998;47(7).
[5] Nazarko J, Zalewski W. The fuzzy regression approach to peak load estimation in power distribution systems. IEEE Trans Power Syst 1999;4.
[6] Charytoniuk W, Chen M-S. Very short-term load forecasting using artificial neural networks. IEEE Trans Power Syst 2000;15(1).
[7] Ling SH, Leung FHF, Lam HK, et al. Short-term electric load forecasting based on a neural fuzzy network. IEEE Trans Ind Electron 2003;50(6).
[8] Fayyad UM, et al., editors. Advances in knowledge discovery and data mining. AAAI Press/MIT Press.
[9] Cho Kokyo. Outlier detection for stationary time series. J Stat Plan Infer 2001.
[10] Karayiannis NB. An axiomatic approach to soft learning vector quantization and clustering. IEEE Trans Neural Networks 1999;10(5).
[11] Kohonen T. Self-organization and associative memory. 3rd ed. Berlin: Springer.
[12] Huntsberger T, Ajjimarangsee P. Parallel self-organization feature maps for unsupervised pattern recognition. Int J Gen Syst 1989.
[13] Hartigan J. Clustering algorithms. New York: Wiley.
[14] Dubes R, Jain A. Algorithms that cluster data. Englewood Cliffs: Prentice Hall.
[15] Tsao EC, Bezdek JC. Fuzzy Kohonen clustering networks. Pattern Recogn 1994;27(5).
[16] Dubois D, Prade H. Fuzzy sets and systems: theory and applications. New York: Academic Press.
[17] Bezdek JC. Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
[18] Broomhead DS, Lowe D. Multivariable functional interpolation and adaptive networks. Complex Syst 1988;2.
[19] Moody TJ, Darken CJ. Fast learning in networks of locally tuned processing units. Neural Comput 1989;1.
[20] Bishop CM. Neural networks for pattern recognition. Oxford: Clarendon Press.
[21] Chen S, Cowan CFN, Grant PM. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Networks 1991;2:302–9.